This week we dove into some readings about information retrieval. The literature on the topic is pretty vast, so luckily we had Doug Oard on hand to walk us through it. The readings on deck were Liu (2009), Chapelle, Joachims, Radlinski, & Yue (2012), and Sanderson & Croft (2012). The first two had some pretty technical, mathematical components that were kind of intimidating and over my head. But the basic gist of both of them was understandable, especially with the context that Oard provided.

Oard’s presentation in class was Socratic: he posed questions for us to answer, helped answer them himself, and let those answers lead on to other questions. We started with what information, retrieval, and research are. I must admit to being a bit frustrated about returning to this definitional game with information. It feels so slippery, but we basically agreed that it is a social construct and moved on to lower-level questions: what is data, what is a database, and what are the feature sets of information retrieval? The feature sets discussion was interesting because we have basically worked with three different feature sets: descriptions of things (think of catalog records), content (e.g. the contents of books), and user behavior.

We then embarked on a pretty detailed discussion of user behavior, and how interleaved evaluation lets computer systems adaptively tweak the many parameters that tune information retrieval algorithms based on what users actually click. Companies like Google and Facebook have the user attention to be able to deploy these adaptive techniques to evolve their systems. I thought it was interesting to reflect on how academic researchers are then almost required to work with these large corporations in order to deploy their research ideas. I also thought it was interesting to consider how having a large set of users who expect to use your product in a particular way might become a straitjacket of sorts, and perhaps over time lead to a calcification of ideas and techniques. This wasn’t a fully formed thought, but it seemed that this purely statistical and algorithmic approach to design lacked some creative energy that is fundamentally human, even though the technique has human behavior at its center.
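To make the interleaving idea concrete for myself, here is a minimal sketch (in Python) of team-draft interleaving, one common variant in this literature. The document IDs, function names, and click handling are my own illustration rather than code from the readings; the point is just that two rankers’ results get merged into one list, and clicks are credited back to whichever ranker contributed the clicked result.

```python
import random

def team_draft_interleave(ranking_a, ranking_b):
    """Merge two rankings into one interleaved list, remembering which
    ranker 'owns' each result (team-draft style)."""
    interleaved = []
    team_a, team_b = set(), set()
    all_docs = set(ranking_a) | set(ranking_b)
    while len(interleaved) < len(all_docs):
        # The ranker with the smaller team picks next; ties go to a coin flip.
        if len(team_a) < len(team_b) or (len(team_a) == len(team_b)
                                         and random.random() < 0.5):
            order = [(ranking_a, team_a), (ranking_b, team_b)]
        else:
            order = [(ranking_b, team_b), (ranking_a, team_a)]
        for ranking, team in order:
            # Each ranker contributes its highest-ranked not-yet-shown document.
            doc = next((d for d in ranking if d not in interleaved), None)
            if doc is not None:
                interleaved.append(doc)
                team.add(doc)
                break
    return interleaved, team_a, team_b

def credit_clicks(clicked_docs, team_a, team_b):
    """Credit each click to the ranker that contributed that result; the
    ranker with more credited clicks 'wins' this impression."""
    wins_a = sum(1 for d in clicked_docs if d in team_a)
    wins_b = sum(1 for d in clicked_docs if d in team_b)
    return wins_a, wins_b

# Hypothetical example: ranker A returns [d1, d2, d3], ranker B returns
# [d2, d4, d1]. The interleaved list mixes both, and a click on d4 counts
# toward B. Aggregated over many queries, this is the kind of user-behavior
# signal a system can use to decide which ranker (or parameter setting) wins.
```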

I guess it’s nice to think that the voice of the individual matters, and that we’re not just dumbing all our designs down to the lowest common denominator between us. I think this class successfully steered me away from the competitive space of information retrieval, even though my interest in appraisal and web archives moves in that direction, particularly with respect to focused crawling. Luckily a lot of the information retrieval research in this area has been done already, but what is perhaps lacking are system designs that incorporate the decisions of the curator or archivist more. If not, I guess I can fall back on my other research area: the history of standards on the Web.

References

Chapelle, O., Joachims, T., Radlinski, F., & Yue, Y. (2012). Large-scale validation and analysis of interleaved search evaluation. ACM Transactions on Information Systems (TOIS), 30(1), 6.
Liu, T.-Y. (2009). Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3(3), 225–331.
Sanderson, M., & Croft, W. B. (2012). The history of information retrieval research. Proceedings of the IEEE, 100(Special Centennial Issue), 1444–1451.