How many bots are there on Twitter? The question is difficult to answer and misses the point

By Kai-Cheng Yang, Indiana University and Filippo Menczer, Indiana University

Twitter reports that fewer than 5% of accounts are fakes or spammers, commonly referred to as “bots”. Since his offer to buy Twitter was accepted, Elon Musk has repeatedly questioned these estimates, even going so far as to dismiss a public response from CEO Parag Agrawal.

Musk later put the deal on hold and demanded more proof.

So why are people arguing over the percentage of bot accounts on Twitter?

As the creators of Botometer, a widely used bot detection tool, our group at the Indiana University Observatory on Social Media has been investigating inauthentic accounts and social media manipulation for over a decade. We brought the concept of the social bot to the foreground and first estimated their prevalence on Twitter in 2017.

Based on our knowledge and experience, we believe that estimating the percentage of bots on Twitter has become a very difficult task, and debating the accuracy of the estimate might miss the point. Here’s why.

What exactly is a bot?

To measure the prevalence of problematic accounts on Twitter, a clear definition of targets is necessary. Common terms such as “fake accounts”, “spam accounts”, and “bots” are used interchangeably, but they have different meanings. Fake or bogus accounts are those that impersonate people. Accounts that mass-produce unsolicited promotional content are defined as spammers. Bots, on the other hand, are accounts controlled in part by software; they can post content or perform simple interactions, like retweeting, automatically.
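To make that definition concrete, here is a minimal sketch of the kind of automation involved, written against Tweepy’s v2-style client; the credentials, search query and overall setup are placeholder assumptions, not a description of any particular bot.

```python
# Minimal sketch of an automated account: software that finds and retweets
# matching posts with no human in the loop. Credentials and the query are
# placeholders; assumes Tweepy's v2 Client interface.
import tweepy

client = tweepy.Client(
    bearer_token="BEARER_TOKEN",          # placeholder credentials
    consumer_key="API_KEY",
    consumer_secret="API_SECRET",
    access_token="ACCESS_TOKEN",
    access_token_secret="ACCESS_SECRET",
)

# Search recent tweets about a topic and retweet each one automatically,
# the kind of simple interaction that makes an account a bot.
response = client.search_recent_tweets(query="flood warning -is:retweet", max_results=10)
for tweet in response.data or []:
    client.retweet(tweet.id)
```

A benign version of this pattern, such as rebroadcasting disaster alerts, is exactly the sort of helpful bot discussed below; the same mechanics can just as easily push spam.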

These types of accounts often overlap. For example, you can create a bot that pretends to be a human to automatically post spam. Such an account is at once a bot, a spammer and a fake. But not all fake accounts are bots or spammers, and vice versa. Arriving at an estimate without a clear definition only produces misleading results.

Defining and distinguishing account types can also inform appropriate interventions. Fake accounts and spam degrade the online environment and violate platform policy. Malicious bots are used to spread misinformation, inflate popularity, exacerbate conflict with negative and inflammatory content, manipulate opinions, influence elections, commit financial fraud and disrupt communication. However, some bots can be harmless or even useful, for example by helping to disseminate news, deliver disaster alerts and conduct research.

Simply banning all bots is not in the best interests of social media users.

For simplicity, researchers use the term “inauthentic accounts” to refer to the collection of fake accounts, spammers, and malicious bots. This is also the definition that Twitter seems to be using. However, it’s unclear what Musk has in mind.

Hard to count

Even when consensus is reached on a definition, there remain technical challenges in estimating prevalence.

Networks of coordinated accounts spreading COVID-19 information from untrustworthy sources on Twitter in 2020. Pik-Mai Hui

External researchers do not have access to the same data as Twitter, such as IP addresses and phone numbers. This hinders the public’s ability to identify inauthentic accounts. But even Twitter acknowledges that the actual number of inauthentic accounts could be higher than it has estimated, because detection is difficult.

Inauthentic accounts evolve and develop new tactics to evade detection. For example, some fake accounts use AI-generated faces as their profile pictures. These faces can be indistinguishable from real ones, even to humans. Identifying such accounts is difficult and requires new technologies.

Another difficulty is posed by coordinated accounts that appear to be normal individually but act so similarly to one another that they are almost certainly controlled by a single entity. Yet they are like needles in the haystack of hundreds of millions of daily tweets.
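A crude illustration of why such coordination is detectable in principle: many distinct accounts posting identical text is one simple signal. The sketch below uses made-up posts and a naive exact-match rule; real coordinated campaigns are far subtler.

```python
from collections import defaultdict

# Toy example of one simple coordination signal: many accounts posting the
# exact same text. The posts below are invented for illustration.
posts = [
    ("acct_a", "Buy $COIN now, 100x guaranteed!"),
    ("acct_b", "Buy $COIN now, 100x guaranteed!"),
    ("acct_c", "Buy $COIN now, 100x guaranteed!"),
    ("acct_d", "Just adopted a kitten!"),
]

# Group accounts by the text they posted.
accounts_by_text = defaultdict(set)
for account, text in posts:
    accounts_by_text[text].add(account)

# Flag groups of distinct accounts sharing identical content.
for text, accounts in accounts_by_text.items():
    if len(accounts) >= 3:
        print(f"possible coordination ({len(accounts)} accounts): {text!r}")
```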

Finally, inauthentic accounts can evade detection through techniques such as swapping handles or automatically posting and deleting large volumes of content.

The distinction between inauthentic and genuine accounts is becoming increasingly blurred. Accounts can be hacked, bought or rented, and some users “give” their credentials to organizations that post on their behalf. As a result, so-called cyborg accounts are controlled by both algorithms and humans. Similarly, spammers sometimes post legitimate content to disguise their activity.

We observed a wide spectrum of behaviors mixing characteristics of bots and people. Estimating the prevalence of inauthentic accounts requires applying a simplistic binary classification: authentic or inauthentic account. No matter where the line is drawn, mistakes are inevitable.
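One way to see why the line matters: bot detectors such as Botometer produce a continuous score rather than a yes-or-no label, so the estimated prevalence shifts with the cutoff chosen. The scores below are invented purely for illustration.

```python
# Made-up bot scores (0 = likely human, 1 = likely bot) for ten accounts.
bot_scores = [0.05, 0.12, 0.33, 0.48, 0.51, 0.62, 0.77, 0.91, 0.15, 0.58]

def estimated_prevalence(scores, threshold):
    """Fraction of accounts labeled 'inauthentic' at a given cutoff."""
    flagged = sum(score >= threshold for score in scores)
    return flagged / len(scores)

for threshold in (0.3, 0.5, 0.7):
    pct = 100 * estimated_prevalence(bot_scores, threshold)
    print(f"threshold {threshold}: {pct:.0f}% flagged as inauthentic")

# Moving the cutoff from 0.7 to 0.3 changes the estimate substantially,
# even though the underlying accounts have not changed.
```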

Missing the big picture

The recent debate’s focus on estimating the number of Twitter bots oversimplifies the issue and misses the point of quantifying the harm caused by online abuse and manipulation by inauthentic accounts.

Screenshot of the BotAmp app comparing likely bot activity around two topics on Twitter. Kaicheng Yang

Using BotAmp, a new tool in the Botometer family that anyone with a Twitter account can use, we found that the presence of automated activity is not evenly distributed. For example, chatter about cryptocurrency tends to show more bot activity than chatter about cats. Therefore, whether the overall prevalence is 5% or 20% makes little difference to individual users; their experiences with these accounts depend on whom they follow and the topics they care about.
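BotAmp’s internals are not described here, but the underlying idea can be sketched: sample the accounts active in each conversation, score them with a bot detector, and compare the averages. The topic names and scores below are made up solely to illustrate the comparison.

```python
from statistics import mean

# Hypothetical bot scores (0 = likely human, 1 = likely bot) for accounts
# sampled from two conversations; a real tool would obtain these from a
# bot detector rather than hard-coding them.
scores_by_topic = {
    "cryptocurrency": [0.8, 0.7, 0.9, 0.4, 0.6, 0.75, 0.85],
    "cats":           [0.1, 0.2, 0.05, 0.3, 0.15, 0.25, 0.1],
}

for topic, scores in scores_by_topic.items():
    print(f"{topic}: average bot score {mean(scores):.2f}")

# The comparison between topics matters more to an individual user's
# experience than the site-wide percentage of bots.
```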

Recent evidence suggests that inauthentic accounts may not be solely responsible for the spread of misinformation, hate speech, polarization and radicalization. These issues usually involve many human users. For example, our analysis shows that misinformation about COVID-19 was openly spread on both Twitter and Facebook by verified, high-profile accounts.

Even if it were possible to accurately estimate the prevalence of inauthentic accounts, this would not solve these problems. A significant first step would be to recognize the complex nature of these issues. This will help social media platforms and policy makers craft meaningful responses.

Kai-Cheng Yang, PhD student in Computer Science, Indiana University and Filippo Menczer, Professor of Informatics and Computer Science, Indiana University

This article is republished from The Conversation under a Creative Commons license. Read the original article.