In the lead up to the 2020 US Presidential Election Twitter implemented new labels for government officials, organizations and state-affiliated media accounts. This was a follow on from their previous ban on state-backed political advertising in 2019.
By their own description Twitter apply these labels to:
- Accounts of key government officials, including foreign ministers, institutional entities, ambassadors, official spokespeople, and key diplomatic leaders. At this time, our focus is on senior officials and entities who are the official voice of the state abroad.
- Accounts belonging to state-affiliated media entities, their editors-in-chief, and/or their senior staff.
Here is an example of a government official (look for the label “United Kingdom government official” just underneath their Twitter handle).
More importantly the label takes up significant screen real estate in each of that user’s tweets:
And here is a government organization:
Known government run media organizations have labels too:
But it’s important to note that not all government run accounts will have the labels. Here is the verified account for the Prime Minister of Pakistan.
How the Prime Minister of Pakistan’s office gets verified and not added to a list of known government accounts is hard to imagine. It probably says something about how difficult it is to uniformly apply these labels, and also keep them up to date. It would take some research to find out, but it may also say something about the political biases of the United States and Silicon Valley. In case you were wondering, no the current Prime Minister of Pakistan doesn’t have a label either:
Over in the Documenting the Now Slack we had some questions recently about where to find these government labels in the data our tools collect. We took a look and it doesn’t appear that these labels are made available through either the v1.1 or v2 API endpoints. If this is wrong please get in touch!
Not making this information available is unfortunate because these labels are highly significant pieces of metadata for researchers who are studying the influence of government in social media conversations. They also provide a window in on how Twitter themselves see the governments of the world through their categorization rules and processes.
Since this information is only available through the web interface I recently spent some time adding functionality to the snscrape utility to extract the labels, as well as the label URLs. The label URLs are useful because, while they aren’t as specific as the label descriptions, they can sometimes be used to group together different language variants of the same label.
This morning I tested it out by collecting 60,000 tweets that mention the word “covid19”.
snscrape --jsonl twitter-search covid19 > results.jsonl
I then wrote a simple program to read the JSON and count the labels that were present:
|China state-affiliated media
|Iran state-affiliated media
|Çin devletine bağlı medya
|Медиј који сарађује са владом Србија
|China government official
|Thailand government organization
|Russia state-affiliated media
|Média affilié à un État, Russie
|Média affilié à un État, Chine
|Representante gubernamental de Cuba
|Organisation du gouvernement - France
|Organización gubernamental de España
|Cuba - Funzionario di Stato
|Medios afiliados al gobierno, China
|Russia government account
|Lembaga pemerintah Indonesia
|Organización gubernamental de Cuba
|Canada government official
|Cina - Organizzazione governativa
|Medios afiliados al gobierno, Honduras
|Italia - Funzionario di Stato
However for that same dataset of tweets there were only these three Label URLs used:
So while the URLs don’t provide anywhere near the level of granularity that the descriptions do, they could be useful for grouping together language variants of the same label like:
- China state-affiliated media
- Média affilié à un État, Chine
- Medios afiliados al gobierno, China
Right now my modification to the snscrape tool is in a pull request. If you think it might be useful in your own work please go and give it a thumbs up. Normally I’m not a huge advocate of scraping social media. But when it comes to data that is not available through sanctioned channels (APIs), and the data is being created (and gatekeeped) by powerful entities we aren’t left with much of a choice.
Update 2021-09-20: the PR has been merged, so it’s part of snscrape proper now!