The Side of Pompidou Center
by Frank Kovalchek

Last week Stephen Wolfram opened the FEDLINK Innovation Talk series at the Library of Congress. Wolfram is truly an intellectual maverick (check out his Wikipedia page) and kind of an archetype for what is typically meant by the word genius. I haven’t read A New Kind of Science, or used Wolfram Alpha or Mathematica very much. Perhaps if I knew more of the details behind his thinking I wouldn’t have left the talk feeling a little bit disappointed.

I would have liked to hear him reflect on what motivated him to spend 25 years building a platform to make “knowledge computable”. He clearly has a vision for this work, and it would have been fun to hear about it, and about where he sees this type of work going in the next 25 years. Perhaps some discussion of whether there are limits to making knowledge computable, and whether knowledge itself can be thought of without human intention as part of the equation. It would also have been interesting to hear a few more technical details about the platform itself and how it is orchestrated. Maybe I wasn’t listening closely enough, but what we got instead was a lot of pointing, clicking, typing and talking to Wolfram Alpha, showing off how it could answer questions like a good reference librarian, with some quite funny jokes along the way.

But, to be fair, he did mention a few interesting things, especially during the (too brief) question and answer period at the end. Here are the things I took mental note of that I still remember a week later.

I hadn’t heard of the Computable Document Format before, but I guess it’s been around for a few years. CDF is Wolfram’s own custom file format that makes data visualizations interactive.

… the CDF standard is a computation-powered knowledge container—as everyday as a document, but as interactive as an app.

It seems to work as a browser plugin that bridges to Mathematica. One nice side effect of CDF is that it makes the underlying data available. Another is that any CDF document composed with FreeCDF is automatically licensed CC-BY-SA. You can also pay for EnterpriseCDF, which provides more licensing options, as well as the ability to add what looks like DRM to CDF documents. The documentation talks about it being a standard, and Wikipedia says it has an assigned MIME type (application/cdf), but I can’t seem to find a specification for it, or even a registration of the MIME type with IANA. Considering the level of interactivity that documents have on the Web now, with open tools like D3 that sit on top of features from HTML5 and JavaScript, it’s hard to get terribly excited about CDF.

Wolfram also mentioned work on something he called the Wolfram Data Format (WDF). I can’t seem to find much information about it on the Web. It sounded like something akin to the Resource Description Framework (RDF), for describing entities, their attributes, and their relations … and it seemed to be used primarily for getting data into and out of the knowledgebase that Wolfram Alpha sits on top of. During the Q/A session someone asked about Wolfram’s views on Linked Data, and he knew enough about it to say that RDF wasn’t expressive enough for his needs. He wasn’t terribly clear about how it fell short: I remember an example about needing to concisely express the positions of Mercury and Venus at various points in time. In my experience I’ve found that RDF gave me plenty of rope.
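For what it’s worth, one common way to express time-indexed facts like planetary positions in RDF is the n-ary relation pattern: instead of a single planet → position triple, each observation becomes its own node that carries the body, the time, and the coordinates. Here is a minimal sketch of that pattern using plain Python tuples as triples rather than an RDF library. The example.org URIs, property names, and helper functions are hypothetical, and the coordinates are placeholder numbers, not real ephemeris data.

```python
# Hypothetical namespace; these URIs are illustrative, not a published vocabulary.
EX = "http://example.org/astro#"

# A triple is (subject, predicate, object), as in RDF.
Triple = tuple[str, str, object]


def position_observation(obs_id: int, planet: str, timestamp: str,
                         ra_deg: float, dec_deg: float) -> list[Triple]:
    """Describe one time-indexed position as its own node (n-ary relation
    pattern), so the planet, the time and the coordinates all hang off it."""
    node = f"_:obs{obs_id}"  # blank-node-style label for the observation
    return [
        (node, EX + "body", EX + planet),
        (node, EX + "time", timestamp),
        (node, EX + "rightAscensionDeg", ra_deg),
        (node, EX + "declinationDeg", dec_deg),
    ]


def position_at(graph: list[Triple], planet: str, timestamp: str):
    """Return (ra, dec) for a planet at a given time, or None if absent."""
    nodes = {s for s, p, o in graph if p == EX + "body" and o == EX + planet}
    for node in nodes:
        props = {p: o for s, p, o in graph if s == node}
        if props.get(EX + "time") == timestamp:
            return props[EX + "rightAscensionDeg"], props[EX + "declinationDeg"]
    return None


# Placeholder values only, to show the shape of the data.
graph: list[Triple] = []
graph += position_observation(1, "Mercury", "2012-01-01T00:00:00Z", 10.0, -5.0)
graph += position_observation(2, "Venus", "2012-01-01T00:00:00Z", 20.0, -15.0)
graph += position_observation(3, "Mercury", "2012-02-01T00:00:00Z", 12.5, -3.0)

print(position_at(graph, "Mercury", "2012-02-01T00:00:00Z"))  # (12.5, -3.0)
```

The cost of the pattern is verbosity: every observation takes several triples, which may be part of what Wolfram meant by wanting something more concise. It is a trade-off in notation, though, not something RDF is unable to say.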

There is a Pro version of Wolfram Alpha that lets you do a variety of things you can’t do in the free version. The most interesting one of these is that it lets you upload your own data in a bunch of different formats for analysis by Alpha. Presumably this data could be added to the Wolfram Alpha knowledgebase, and help form what Wolfram called the Wolfram Repository.

The R word is pretty charged at my place of work, and I imagine it might be at yours too. Collectively, many dollars have been spent creating systems, certification guidelines, and research about what the digital repository might be. As with many e-this and e-that words, the word digital doesn’t add as much to the meaning of digital repository as the word repository does. Wolfram Alpha defines repository as:

A facility where things can be deposited for storage and safekeeping

Historically, repositories of knowledge have been found in the form of libraries, archives and museums that are sometimes part of larger institutions like schools, universities, societies, governments, businesses, or personal collections. So Wolfram wants the research community to use Wolfram Alpha as a repository. The carrot here is that data that is uploaded to the Wolfram Repository will be directly citable in CDF documents. During his response about Linked Data, Wolfram commented on how often URLs break, and how they weren’t suitable for linking papers to their underlying data. The solution that he seemed to propose is that data would be citable as long as writers used his document format, editing tools, and repository.

Indeed, when asked about the role of libraries and the library profession, Wolfram said that in his view the role of the librarian will be to help educate people who have data, and to help make that data computable by massaging it into the correct format. What he didn’t say (but I heard) was that the correct format was WDF, and that it would be made computable by pushing it into the Wolfram Alpha data repository.

Don’t get me wrong, I think his vision of a future in which libraries help researchers work with data is a compelling one. It’s an extension of a trend over the past 10-15 years in which libraries have built statistical, textual, and geographic data collections and made them available with educational services around them. Certainly getting data into and out of Wolfram Alpha, and making it citable by CDF documents, could be a component of this work.

But what was missing from Wolfram’s presentation was a vision for how we build data repositories collaboratively, across cultural, corporate and socio-political borders. There were glimpses of an amazing system that he has built, with algorithms and meta-algorithms for choosing among them … but it wasn’t clear how to add your own algorithms, introspect on the decisions being made, or see the sources of data used in its computations.

Above all, I didn’t hear Wolfram describe how his platform includes the Web as an essential part of its architecture. I know I’m biased towards the Web, but Tim Berners-Lee’s enduring insight is that the design of the Web needed (and still needs) to be open. Sometimes open systems can seem ugly (hence the picture of the Pompidou Center above), since they show you the guts of things. Occasionally things can get nasty when parties have opposing interests. But it’s extremely important to try. How do we build a future where libraries, archives and museums collect locally and build repositories of data for systems like Wolfram Alpha, Wikipedia, Google’s Knowledge Graph, and Facebook’s Open Graph in a sustainable way? Libraries aren’t well-equipped to build these types of systems themselves, since the state of the art is always changing. But these institutions ought to be in a good position to serve as trusted partners, tied to the interests of particular knowledge communities, that can help make data available to systems like Wolfram Alpha.