Similarweb research supports Twitter’s assertion that bots represent less than 5% of monetizable users, while also indicating that between 21% and 29% of Twitter’s monetizable content comes from bots.
Twitter management says that only 5% of monetizable user accounts on its social networking service are controlled by bots – a claim hotly disputed by Elon Musk as he tries to extract himself from a commitment to buy the company. One of many arguments about the value of Twitter as a company, or as an investment, revolves around whether bots make up a big share of what Twitter calls monetizable daily active users (mDAUs) – the living, breathing humans advertisers care about reaching.
Similarweb research suggests bots are not a large percentage of Twitter’s monetizable user base. While other research has focused on trying to estimate the size of the bot population, Similarweb’s data science team took the opposite approach. Working with a dataset enriched by an enrolled panel of web and mobile app users, our researchers attempted something Twitter management has suggested was likely impossible: creating its own estimate of the mDAU audience.
Similarweb researchers found no evidence to support Musk’s claim that up to 20% of mDAUs are bots; based on our findings, Twitter’s 5% bot estimate looks reasonable. Although the research focused on the behavior of known human users on Twitter, it allowed the team to make inferences about the remaining share of activity on the social network attributable to automated programs, or bots, that post or repost content.
“We looked at the problem of verifying the reported mDAU by Twitter from a unique perspective,” Similarweb CTO Ron Asher said. “We use a panel where we have confidence that the panelists are real human users. With that information in hand and the proper models, we were able to accurately analyze the population of the reported mDAU by Twitter. In a nutshell, we believe that our data supports Twitter’s position on this point.”
However, Similarweb’s analysis also suggests that bots generate between 20.8% and 29.2% of the content posted to Twitter. This is based on the observation that a small number of accounts seem to generate most of the content on the site, and on third-party research estimating that bots generate 1.57 times as much content as human users. That may not undermine Twitter’s mDAU statistics, but it does suggest fewer opportunities for the genuine human interaction that makes Twitter valuable.
To be clear, our study leans on the work of other researchers for estimates of bot activity and behavior. Those sources are spelled out in more detail in the accompanying research report. Our original research focuses instead on positive identification of human activity on Twitter, the categorization of interactions with the service (active versus inactive accounts and authenticated versus unauthenticated users), and patterns of content creation.
Similarweb is a publicly traded (NYSE: SMWB) digital intelligence company that analyzes web and mobile app data for insights. Similarweb customers include marketing and product development organizations who use its solutions to optimize their online businesses, as well as investors and others who seek to better understand the competitive position of organizations in the digital world. Customers range from midsize businesses to the Fortune 500, including major names like Morgan Stanley, Google, Adobe, and PepsiCo. The insights Similarweb provides would not be possible without the backing of a data science team that applies artificial intelligence and other analytic techniques to modeling the data.
- Similarweb’s estimates of Twitter’s mDAU correlate with Twitter’s reported results over the past twelve months. Referring to the most recent point of comparison as an example, Similarweb estimates that US mDAU in the second quarter of 2022 numbered 40.3 million, which is only 2.8% lower than the 41.5 million US mDAU Twitter reported for the period.
- Similarweb estimates that 19% of mDAU are responsible for all of the content created on Twitter.
- Assuming Twitter’s estimate that bots make up 5% of mDAU is correct, Similarweb estimates that 20.8% to 29.2% of Twitter’s monetizable content is created by bots, which seems likely to influence the overall user experience for real users.
With the right datasets and approach, Twitter’s mDAU can be estimated from an external point of view
On May 16, 2022, Twitter CEO Parag Agrawal published a series of tweets addressing the company’s official estimate that 5% of mDAUs are spam accounts the company has been unable to weed out. “Our estimate [of the mDAU of real users] is based on multiple human reviews (in replica) of thousands of accounts, that are sampled at random, consistently over time, from accounts we count as mDAUs,” he wrote, adding that “Unfortunately, we don’t believe that this specific estimation can be performed externally.”
The data scientists at Similarweb took that as a challenge and began working to determine what could be deduced about the character of Twitter’s network from the outside.
After an exhaustive investigation, the researchers concluded that it is possible to estimate the mDAU of Twitter using Similarweb’s extensive datasets (user panels, direct measurements, publicly available data) with statistical analysis that they believe yields results within a reasonable degree of error. The results are shown below:
“Regardless of how this research may or may not play as a story for the media, this project was worthwhile for our data science team to undertake for its own purposes, forcing us to stretch our capabilities for understanding patterns of traffic and user behavior for social networking services,” said Shai Dekel, VP of Artificial Intelligence for Similarweb. “From our perspective, understanding human interaction with the web and mobile apps is the most important thing – and still allows us to make inferences about the activity that does not originate with a person.”
Data suggests that a small number of mDAU on Twitter create virtually all of the monetizable content
Similarweb data scientists found a notable trend in the content creation activity of Twitter’s mDAU. In the figure below, we provide the researchers’ estimates of Twitter’s US user distribution between authenticated and unauthenticated users (people who either have no Twitter account or view Twitter content without logging in). Our researchers further segmented the authenticated Twitter users into active and inactive users, where an active user is defined as one who created content at least once within the relevant time period. Content can be a tweet or a reply to a tweet. Interestingly, a significant number of unauthenticated users seem to access Twitter web content through mobile devices, choosing to bypass the app.
Distribution of authenticated and unauthenticated Twitter mDAU in the United States
Based on this activity distribution, our data scientists concluded that only about 19% of real, authenticated US Twitter users generate content on any given day.
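The segmentation described above can be sketched in a few lines of Python. This is a minimal illustration, not Similarweb’s pipeline: the `DailyUser` record, its fields, and the `segment` and `active_share` helpers are hypothetical, and we assume a user counts as active on a given day if they posted at least one tweet or reply, following the definition of an active user given above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DailyUser:
    # Hypothetical per-user record for one day of observation.
    authenticated: bool  # logged in to a Twitter account that day
    tweets: int = 0      # original tweets posted that day
    replies: int = 0     # replies posted that day


def segment(users: Iterable[DailyUser]) -> dict[str, int]:
    """Count users in each segment: unauthenticated, or authenticated active/inactive."""
    counts = {"unauthenticated": 0, "authenticated_active": 0, "authenticated_inactive": 0}
    for u in users:
        if not u.authenticated:
            counts["unauthenticated"] += 1
        elif u.tweets + u.replies > 0:
            # Active: created at least one tweet or reply in the period.
            counts["authenticated_active"] += 1
        else:
            counts["authenticated_inactive"] += 1
    return counts


def active_share(counts: dict[str, int]) -> float:
    """Fraction of authenticated users who created any content."""
    authenticated = counts["authenticated_active"] + counts["authenticated_inactive"]
    return counts["authenticated_active"] / authenticated if authenticated else 0.0


# Toy input for illustration only; these are not Similarweb panel figures.
day = [
    DailyUser(authenticated=False),
    DailyUser(authenticated=True, tweets=1),
    DailyUser(authenticated=True),
    DailyUser(authenticated=True, replies=2),
    DailyUser(authenticated=True),
]
counts = segment(day)
print(counts, active_share(counts))
```

With this toy input, two of the four authenticated users posted something, so the active share is 0.5; the 19% figure in the text comes from Similarweb’s own data, which this sketch does not attempt to reproduce.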
The 5% bots estimate doesn’t matter as much as what those bots seem to do
As many observers have pointed out, identifying and classifying bots is not easy. Any software program that posts or reposts Twitter content without direct human supervision could be called a bot, but not all bots are evil. Twitter provides an official API for bot programming, in essence acknowledging the value of bots that tweet out weather reports or retweet posts on specific topics, like the WordPress bot you can follow if you want to see posts that mention #WordPress, wordpress.org or WordCamp training events. The issue is more with bots that evade that officially sanctioned path, faking human interaction. Often, they do this so they can artificially inflate sentiment for and against stocks, products, politicians, or causes. Twitter says it tries to identify and eliminate as many of these frauds as possible but admits those countermeasures are not perfect.
Twitter has stated that it believes up to 5% of its mDAU at any time are bots generating content via tweets or replies. To evaluate this assertion, our data scientists first identified and eliminated any bias that bot activity might introduce into Similarweb’s own sample of user accounts, concluding that no more than 0.07% of the sampled accounts might be suspect. That quality check gave them confidence that their analysis of human activity on the network was consistent with Twitter’s estimate that the remaining 5% of mDAU are bots.
So, assuming Twitter’s estimate that 5% of mDAU are bots is correct, and combining it with our researchers’ estimate that 19% of mDAU create all content, approximately 20.8% of the US content on Twitter would be generated by bots. This estimate is consistent with SparkToro’s research, which estimates the fraction of total Twitter activity attributable to bots. However, that’s only the low estimate.
Given that bots generate an estimated 1.57 times as much content as humans, our data suggests that bots likely generate 29.2% of US content on Twitter.
In short, Similarweb estimates that 20.8% to 29.2% of US content on Twitter was generated by bots, rather than posted by humans.
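The published range appears to follow from simple arithmetic. The sketch below is our reconstruction, not Similarweb’s actual model: it assumes that the 5% of bots and the 19% of content-creating humans are both measured against the same mDAU base, that every bot posts content, and that `bot_rate_ratio` is the hypothetical name for how much content one bot produces relative to one active human (1.0 for the low estimate, 1.57 for the high one). Under those assumptions the bot share of content is (0.05 × r) / (0.05 × r + 0.19).

```python
def bot_content_share(bot_share: float, human_creator_share: float,
                      bot_rate_ratio: float = 1.0) -> float:
    """Fraction of content produced by bots (hypothetical reconstruction).

    bot_share: bots as a fraction of mDAU (Twitter's 5% estimate -> 0.05).
    human_creator_share: content-creating humans as a fraction of mDAU (0.19).
    bot_rate_ratio: content per bot relative to content per active human.
    Assumes every bot creates content.
    """
    bot_content = bot_share * bot_rate_ratio
    return bot_content / (bot_content + human_creator_share)


low = bot_content_share(0.05, 0.19)          # bots post at the same rate as active humans
high = bot_content_share(0.05, 0.19, 1.57)   # bots post 1.57 times as much
print(f"{low:.1%} to {high:.1%}")            # prints: 20.8% to 29.2%
```

That these inputs land on exactly 20.8% and 29.2% suggests this is roughly how the range was derived, but the full research report remains the authoritative description of the method.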
Context matters when assessing monetizable users on Twitter
Twitter management introduced the concept of mDAU as a means to help explain the value of the platform to external stakeholders. Until now, there has been no external view of this metric: no thorough analysis of the estimated number of mDAUs or of the split between human- and bot-created content.
Similarweb’s research shows it is possible to estimate these metrics based on external measures, and our data scientists believe it’s important to continue evaluating these numbers to understand Twitter’s monetizable potential as a company and the value of its network to users and advertisers. We found no evidence that Twitter’s management is wrong to claim that bots represent only 5% of mDAUs, but we see cause for concern that a significant amount of Twitter content (20.8% to 29.2% of US content) may be bot-generated.
The Similarweb data scientists who conducted this research include Moran Abelev, Ariel Azia PhD, Shlomi Babluki, Eyal Ben-Eliezer PhD, Shai Dekel PhD, Itzhak Dvash PhD, Raphael Peretz and Adam Soffer.
The Similarweb Insights & Communications team is available to pull additional or updated data on request for the news media (journalists are invited to write to firstname.lastname@example.org). When citing our data, please reference Similarweb as the source and link back to the most relevant blog post or similarweb.com/corp/blog/insights/.
Contact: For more information, please write to email@example.com.
Citation and permission of use: Please refer to Similarweb as a digital intelligence platform. To use the data presented in this report, we request a link back to www.similarweb.com and to this blog post.
Report By: Raymond “RJ” Jones, VP Communications & Insights, and David Carr, Sr. Insights Manager
Disclaimer: This report is presented for informational purposes only. Under no circumstances are the materials to be considered or relied upon in any manner as legal or investment advice. All data, reports and other materials provided or made available by Similarweb are estimations and extrapolations based on data obtained from third parties. Similarweb shall not be responsible for the accuracy or completeness of the data, reports or materials presented herein, and shall have no liability for any decision made or action taken by any third party based in whole or in part on the data, reports and materials.