I don’t use Twitter much anymore, but I still help maintain some tools for working with it, like twarc. Twitter announced a big change: they are now going to allow certain users to edit tweets.

I haven’t read much about the reception of this, but I do know it has been requested for a long time. I do like how Twitter took a tip from Wikipedia and journalistic practice by implementing it with a publicly viewable version history. I also like that each version gets its own tweet id and a distinct URL. They could have chosen to have the previous version of a tweet redirect to the latest, but I’m glad they didn’t.

Sam Hames quickly modified twarc so that it collects version history information from the API automatically, so users don’t have to remember to request it. This was released as part of twarc v2.12.0 over the weekend. This means when you collect data from Twitter you will get some extra information about whether a tweet has previous versions.

I thought it might be interesting to experiment with a small twarc plugin that would let you find edited tweets in previously collected tweets, which is now available as twarc-edits.

To use it you first need to collect some data:

$ twarc2 search politics tweets.jsonl

Then you can extract any edited tweets:

$ twarc2 edits tweets.jsonl edits.jsonl

If you don’t care about keeping the full dataset you can chain them together in a pipeline:

$ twarc2 search politics | twarc2 edits - edits.jsonl
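For a rough sense of what this kind of filtering involves, here is a minimal sketch. It is not the twarc-edits code itself. It assumes each line of the file is either a Twitter v2 API response page with a data list or a single tweet object, and that edited tweets can be recognized by having more than one id in their edit_history_tweet_ids field. The edited_tweets function name is made up for illustration.

```python
import json


def edited_tweets(lines):
    """Yield tweets that have more than one version in their edit history.

    Assumes each line is either a single tweet object or a Twitter v2 API
    response page whose "data" key holds a list of tweets.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        # A response page keeps its tweets under "data"; otherwise treat
        # the whole object as one tweet.
        data = obj.get("data", obj)
        tweets = data if isinstance(data, list) else [data]
        for tweet in tweets:
            # edit_history_tweet_ids lists every version of the tweet,
            # so more than one id means the tweet has been edited.
            if len(tweet.get("edit_history_tweet_ids", [])) > 1:
                yield tweet
```

Passing an open file handle for tweets.jsonl to edited_tweets and writing each result out with json.dumps would give something similar in spirit to edits.jsonl.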

If it were possible to search for edited tweets via the API there would be no need for this plugin. But it doesn’t seem possible to do that yet?

After a bit of experimentation it seems that edited tweets do appear in Search API results, although I’m not sure about the filter stream. It also looks like timeline API requests will not include earlier versions of edited tweets, and only return the most current version, which makes sense I suppose?

Before realizing the edit functionality hadn’t been rolled out to the US yet, I had started working on a small program to look for edits from members of the US Congress. But after noodling on this for a bit I decided it made more sense to have a generic way of composing edit filtering with other twarc functions, rather than having it bound up in one particular program.

For example I wanted to follow all the members’ Twitter accounts, and watch for edits. But now that can be done by adding some filter stream rules:

$ twarc2 stream-rules add "from:1343627416120532992 OR from:1339633259349745670 OR from:1337452596093587462 OR from:1344481217685692416 OR from:1345536897838436353 OR from:1343646545955213312"
$ twarc2 stream | twarc2 edits - edits.jsonl

How do you get all the Twitter user ids for US Congresspeople? With the (amazing) unitedstates/congress-legislators dataset of course! You have to join two YAML files together, so here’s a little program I was using to do that:

#!/usr/bin/env python3

# Use https://github.com/unitedstates/ data to get JSON for current members of
# congress including their social media accounts.
#
# Note: you will need to pip install requests and pyyaml

import json
import yaml
import requests

def main():
    print(json.dumps(get_people(), indent=2))

def get_people():
    people = {}

    for person in get("https://raw.githubusercontent.com/unitedstates/congress-legislators/main/legislators-current.yaml"):
        person['social'] = None
        people[person['id']['bioguide']] = person

    # join social media data, skipping anyone missing from the current legislators file
    for social in get("https://raw.githubusercontent.com/unitedstates/congress-legislators/main/legislators-social-media.yaml"):
        bioguide = social['id']['bioguide']
        if bioguide in people:
            people[bioguide]['social'] = social['social']

    return list(people.values())

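def get(url):
    """Fetch and parse some YAML.
    """
    resp = requests.get(url)
    # fail loudly on an HTTP error instead of trying to parse an error page
    resp.raise_for_status()
    return yaml.safe_load(resp.text)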

if __name__ == "__main__":
    main()
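
The user ids in that JSON still need to be turned into filter stream rules. Here is a rough sketch of how that could go. It assumes the numeric user id lives in a twitter_id field inside each person’s social data, and that a single rule can hold at most 512 characters (the real limit depends on your API access level, so treat max_length as a guess). The twitter_ids and build_rules names are made up for illustration. Clauses are joined with OR, since a plain space between operators in a rule means AND.

```python
def twitter_ids(people):
    """Collect Twitter user ids from the joined legislator records."""
    ids = []
    for person in people:
        # social is None for anyone without a social media entry
        social = person.get("social") or {}
        # twitter_id is assumed to be the field holding the numeric user id
        if social.get("twitter_id"):
            ids.append(str(social["twitter_id"]))
    return ids


def build_rules(ids, max_length=512):
    """Pack from: clauses into as few rules as fit within max_length."""
    rules = []
    current = ""
    for user_id in ids:
        clause = f"from:{user_id}"
        candidate = f"{current} OR {clause}" if current else clause
        if len(candidate) > max_length and current:
            # the current rule is full, so start a new one with this clause
            rules.append(current)
            current = clause
        else:
            current = candidate
    if current:
        rules.append(current)
    return rules
```

Each string returned by build_rules(twitter_ids(people)) could then be passed to twarc2 stream-rules add as its own rule.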