When Google Met WikiLeaks

When Google Met WikiLeaks by Julian Assange
My rating: 4 of 5 stars

This book is primarily the transcript of a conversation between Julian Assange and Eric Schmidt (then CEO of Google) and Jared Cohen, recorded for their book The New Digital Age. The transcript is also available in its entirety (fittingly) on the WikiLeaks website along with the actual audio of the conversation. The transcript is book-ended by several essays: Beyond Good and “Don’t Be Evil”, The Banality of “Don’t Be Evil” (also published in the New York Times) and Deliver Us from “Don’t Be Evil”.

Assange read The New Digital Age and wasn’t happy with the framing of the conversation, or with how little of his interview was included. When Google Met WikiLeaks is Assange’s attempt to reframe the discussion in terms of the future of publishing, information and the Internet. In particular, Assange takes issue with Schmidt and Cohen’s assertion that:

The information released on WikiLeaks put lives at risk and inflicted serious diplomatic damage.

Schmidt and Cohen offer no source for this bold assertion, and in a note they equate WikiLeaks with minimally enabling espionage, again with no citation. Assange makes the case that WikiLeaks is actually in the business of publishing and journalism, not secretly selling information for private gain. I think Assange succeeds in this but, more importantly, he presents a view of the near future of the Internet, presaged by WikiLeaks, that is actually interesting and compelling. The transcript itself is heavily annotated with footnotes, many of which cite URLs that are archived at archive.today.

For me the most interesting parts of the book center on what Assange calls the Naming of Things:

The naming of human intellectual work and our entire intellectual record is possibly the most important thing. So we all have words for different objects, like “tomato.” But we use a simple word, “tomato,” instead of actually describing every little aspect of this god damn tomato…because it takes too long. And because it takes too long to describe this tomato precisely we use an abstraction so we can think about it so we can talk about it. And we do that also when we use URLs. Those are frequently used as a short name for some human intellectual content. And we build all of our civilization, other than on bricks, on human intellectual content. And so we currently have system with URLs where the structure we are building our civilization out of is the worst kind of melting plasticine imaginable. And that is a big problem.


Transcript of secret meeting between Julian Assange and Google CEO Eric Schmidt

This particular section goes on to talk about some really interesting topics, such as the effects of right to be forgotten laws, DNS, BitTorrent magnet URIs, how not to pick ISPs, hashing algorithms, digital signatures, public key cryptography, Bitcoin, Namecoin, flood networks, and distributed hash tables. The fascinating thing is that Schmidt is asking Assange for these details to understand how WikiLeaks operates, but Assange’s response is to discuss some general technologies that may influence a new kind of Web of documents. A Web where identity matters, where documents are signed and mirrored, republished and resilient.
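
To make the naming idea a little more concrete, here is a minimal sketch in Python of a name that is derived from a document's content using a hash, rather than from the place it happens to live. This is only an illustration of the general technique: the `sha256:` prefix and the function names are made up for this example, and it is not a magnet URI or anything specific that Assange or WikiLeaks describes.

```python
import hashlib


def content_name(data: bytes) -> str:
    """Derive a name from the content itself (hypothetical 'sha256:' convention)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify(name: str, data: bytes) -> bool:
    """Check that a copy fetched from any mirror still matches its name."""
    return content_name(data) == name


original = b"Transcript of a conversation"
name = content_name(original)

# A faithful mirror produces the same name; an altered copy does not.
mirror_copy = b"Transcript of a conversation"
tampered = b"Transcript of a conversation (edited)"

print(verify(name, mirror_copy))  # True
print(verify(name, tampered))     # False
```

Unlike a URL, a name like this cannot quietly come to point at different content: if the bytes change, the name no longer matches, no matter which mirror served them.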

Assange has been largely demonized by the mainstream press, and this book humanizes him quite a bit. It’s hard not to think of him in the Ecuadorian Embassy in London (where he will have been for 1500 days tomorrow) quietly adding footnotes to the transcript, and archiving web content.

OR Books’ role in printing this content on paper, for bookshelves everywhere, is another aspect of this process of replication. Hats off to them for putting this project together.


Languages on Twitter.

There have been some interesting visualizations of languages in use on Twitter, like this one done by Gnip and published in the New York Times. Recently I’ve been involved in some research on a particular topical collection of tweets. One angle that’s been particularly relevant for this dataset is language. When perusing some of the tweet data we retrieved from Twitter’s API we noticed that there were two lang properties in the JSON. One was attached to the embedded user profile stanza, and the other was a top level property of the tweet itself.

We presumed that the user profile language was the language the user (who submitted the tweet) had selected, and that the second language on the tweet was the language of the tweet itself. The first is what Gnip used in its visualization. Interestingly, Twitter’s own documentation for the /get/statuses/:id API call only shows the user profile language.

When you send a tweet you don’t indicate what language it is in. For example you might indicate in your profile that you speak primarily English, but send some tweets in French. I can only imagine that detecting language for each tweet isn’t a cheap operation for the scale that Twitter operates at. Milliseconds count when you are sending 500 million tweets a day, in real time. So at the time I was skeptical that we were right…but I added a mental note to do a little experiment.

This morning I noticed my friend Dan had posted a tweet in Hebrew, and figured now was as good a time as any.

I downloaded the JSON for the tweet from the Twitter API and sure enough, the user profile had language en and the tweet itself had language iw, which is the deprecated ISO 639-1 code for Hebrew (the current code is he). Here’s the raw JSON for the tweet, search for lang:

{
  "contributors": null,
  "truncated": false,
  "text": "\u05d0\u05e0\u05d7\u05e0\u05d5 \u05e0\u05ea\u05d2\u05d1\u05e8",
  "in_reply_to_status_id": null,
  "id": 540623422469185537,
  "favorite_count": 2,
  "source": "<a href=\"http://tapbots.com/software/tweetbot/mac\" rel=\"nofollow\">Tweetbot for Mac</a>",
  "retweeted": false,
  "coordinates": null,
  "entities": {
    "symbols": [],
    "user_mentions": [],
    "hashtags": [],
    "urls": []
  },
  "in_reply_to_screen_name": null,
  "id_str": "540623422469185537",
  "retweet_count": 0,
  "in_reply_to_user_id": null,
  "favorited": true,
  "user": {
    "follow_request_sent": false,
    "profile_use_background_image": true,
    "profile_text_color": "333333",
    "default_profile_image": false,
    "id": 17981917,
    "profile_background_image_url_https": "https://pbs.twimg.com/profile_background_images/3725850/woods.jpg",
    "verified": false,
    "profile_location": null,
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/524709964905218048/-CuYZQQY_normal.jpeg",
    "profile_sidebar_fill_color": "DDFFCC",
    "entities": {
      "description": {
        "urls": []
      }
    },
    "followers_count": 1841,
    "profile_sidebar_border_color": "BDDCAD",
    "id_str": "17981917",
    "profile_background_color": "9AE4E8",
    "listed_count": 179,
    "is_translation_enabled": false,
    "utc_offset": -18000,
    "statuses_count": 14852,
    "description": "",
    "friends_count": 670,
    "location": "Washington DC",
    "profile_link_color": "0084B4",
    "profile_image_url": "http://pbs.twimg.com/profile_images/524709964905218048/-CuYZQQY_normal.jpeg",
    "following": true,
    "geo_enabled": false,
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/17981917/1354047961",
    "profile_background_image_url": "http://pbs.twimg.com/profile_background_images/3725850/woods.jpg",
    "name": "Dan Chudnov",
    "lang": "en",
    "profile_background_tile": true,
    "favourites_count": 1212,
    "screen_name": "dchud",
    "notifications": false,
    "url": null,
    "created_at": "Tue Dec 09 02:56:15 +0000 2008",
    "contributors_enabled": false,
    "time_zone": "Eastern Time (US & Canada)",
    "protected": false,
    "default_profile": false,
    "is_translator": false
  },
  "geo": null,
  "in_reply_to_user_id_str": null,
  "lang": "iw",
  "created_at": "Thu Dec 04 21:47:22 +0000 2014",
  "in_reply_to_status_id_str": null,
  "place": null
}
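
Rather than searching the JSON by eye, the two lang properties can be pulled out with a few lines of Python. This is just a sketch: the function names are mine, the sample below is a trimmed copy of the tweet above, and the small table of deprecated codes (iw, in, ji) is something I've added for convenience rather than anything Twitter provides.

```python
import json

# Deprecated ISO 639 codes and their current replacements (added for illustration).
DEPRECATED_CODES = {"iw": "he", "in": "id", "ji": "yi"}


def tweet_languages(tweet):
    """Return (profile_lang, tweet_lang) exactly as found in a tweet dict."""
    profile_lang = (tweet.get("user") or {}).get("lang")
    tweet_lang = tweet.get("lang")
    return profile_lang, tweet_lang


def normalize(code):
    """Map a deprecated language code to its current form, if there is one."""
    return DEPRECATED_CODES.get(code, code)


# A trimmed version of the tweet JSON shown above.
raw = r'''
{
  "text": "\u05d0\u05e0\u05d7\u05e0\u05d5 \u05e0\u05ea\u05d2\u05d1\u05e8",
  "lang": "iw",
  "user": {"screen_name": "dchud", "lang": "en"}
}
'''

tweet = json.loads(raw)
profile_lang, tweet_lang = tweet_languages(tweet)
print("profile:", profile_lang)                              # profile: en
print("tweet:", tweet_lang, "->", normalize(tweet_lang))    # tweet: iw -> he
```

Run over a whole collection of tweets, the same two functions would make it easy to count how often the profile language and the detected tweet language disagree.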

Although tweets are short they certainly can contain multiple languages. I was curious what would happen if I tweeted two words, one in English and one in French.

When I fetched the JSON data for this tweet the language of the tweet was indicated to be pt or Portuguese! As far as I know neither testing nor essai are Portuguese.

This made me think perhaps the tweet was a bit short so I tried something a bit longer, with the number of words in each language being equal.

This one came across with lang fr. So having the text be a bit longer helped in this case. Admittedly this isn’t a very sound experiment, but it seems interesting and useful to see that Twitter is detecting language in tweets. It isn’t perfect, but that shouldn’t be surprising at all given the nature of human language. It might be useful to try a more exhaustive test using a more complete list of languages to see how it fares. I’m adding another mental note…

Inter-face


Image from page 315 of “The elements of astronomy; a textbook” (1919)

Every document, every moment in every document, conceals (or reveals) an indeterminate set of interfaces that open into alternate spaces and temporal relations.

Traditional criticism will engage this kind of radiant textuality more as a problem of context than a problem of text, and we have no reason to fault that way of seeing the matter. But as the word itself suggests, “context” is a cognate of text, and not in any abstract Barthesian sense. We construct the poem’s context, for example, by searching out the meanings marked in the physical witnesses that bring the poem to us. We read those witnesses with scrupulous attention, that is to say, we make our detailed way through the looking glass of the book and thence to the endless reaches of the Library of Babel, where every text is catalogued and multiple cross-referenced. In making the journey we are driven far out into the deep space, as we say these days, occupied by our orbiting texts. There objects pivot about many different points and poles, the objects themselves shapeshift continually and the pivots move, drift, shiver, and even dissolve away. Those transformations occur because “the text” is always a negotiated text, half perceived and half created by those who engage with it.

Radiant Textuality by Jerome McGann