This week in my qualitative methods class we all looked for qualitative datasets on the Web in various data repositories and considered how easily they could be reused (secondary analysis). Because of their confidential nature, qualitative studies are harder to find. Even when pseudonyms are used, it can often be very easy to identify individuals and organizations in interview and observational data.

I felt like I cheated a bit because of the dataset I chose, since it was specifically created for reuse. I was a bit surprised that it seemed difficult to track the dataset upstream, to see how datasets are cited in the literature, but there were some links. It was a literal academic exercise, but I think I may be revisiting this dataset at some point.


Dataset

Joyce, Mary, António Rosas, and Philip N. Howard. Global Digital Activism Data Set, 2013. ICPSR34625-v2. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2014-06-12. http://doi.org/10.3886/ICPSR34625.v2 or http://digital-activism.org/projects/gdads/

Article

Edwards, Frank and Howard, Philip N. and Joyce, Mary, Digital Activism and Non‐Violent Conflict (2013). Available at SSRN: http://ssrn.com/abstract=2595115 or http://dx.doi.org/10.2139/ssrn.2595115

Summary:

This dataset was created by the Digital Activism Research Project at the University of Washington in order to help build a collective understanding of the effects of digital technology on political phenomena around the world, and to guide future data collection efforts. The study is particularly aimed at broadening the prevalent research focus on cyberwar and cyberterror to include how Information and Communications Technologies (ICTs) are deployed to further civic engagement and other non-violent political practices. It also aims to provide a broad-based qualitative and quantitative dataset for an emerging field of study that cuts across a variety of disciplines, and is wider in scope than the more prevalent individual case study approach. The authors did publish some initial findings based on the dataset in a project report, which found:

  • digital activism is civil, non-violent and rarely involves malicious computer hacking
  • Facebook and Twitter dominate global activism, but there is a long tail of regional uses
  • diversity of tool use has a positive impact on the success of activist campaigns (no Facebook or Twitter revolution)

But this study is unique because it inverts the usual positioning of the scholarly research article and the underlying dataset, by making the dataset the primary outcome, with an attendant report.

Reflection on Data:

The data was assembled by 40 coders, using hundreds of cases drawn from reviewed news stories by professional and citizen journalists, dated between 1982 and 2012. The data itself comprises two datasets:

  1. a collection of 1,180 coded cases from 151 countries collected through snowball sampling between 1982 and 2012. The coded dataset includes 51 properties.
  2. a collection of 426 coded cases from 100 countries from 2010 to 2012, that includes 28 properties.

The v2 dataset is more compact and concentrated, and the authors believe it to be of higher quality because their criteria for what counts as activism became more refined. I think in some ways the v1 dataset could be thought of as a pilot study. In addition to the Excel and CSV files for the cases and their codes, the dataset also includes source files that map the case ID to the news story URL, along with details about how the story was discovered.

Secondary Research Questions:

The researchers put a great deal of effort into building and packaging this dataset for reuse. The report includes lots of details about sampling strategy and inclusion criteria. It even has a CC-BY-NC license, so it is clear how it can be reused without necessarily needing to contact the authors. Since the study used Global Voices Online, MobileActive.org, InformationActivism.org and Wikipedia as sources for the news articles, I would be interested in doing some analysis of the comments or discussion around the articles. I am particularly interested in the role that the Wikipedia articles played as gatekeepers to the news stories. Were Wikipedias in multiple languages used? Was there a core set of users that were actively enriching articles with links to the media? I think this could be performed using documentary analysis, similar to the larger dataset.
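
As a rough starting point for the Wikipedia question, here is a minimal Python sketch that tallies a list of source URLs by host and by Wikipedia language edition. It assumes the URLs have already been pulled out of the dataset's source files (I have not checked the column names, so nothing here depends on them), and the function names `wikipedia_language` and `tally_sources` are my own placeholders rather than anything that ships with the dataset.

```python
from collections import Counter
from urllib.parse import urlparse


def wikipedia_language(url):
    """Return the language subdomain of a Wikipedia URL, or None otherwise.

    Assumption: language editions live at <lang>.wikipedia.org, optionally
    with a mobile ".m" label in between (e.g. en.m.wikipedia.org).
    """
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    is_wikipedia = len(parts) >= 3 and parts[-2:] == ["wikipedia", "org"]
    if is_wikipedia and parts[0] != "www":
        return parts[0]
    return None


def tally_sources(urls):
    """Count URLs per host and per Wikipedia language edition."""
    hosts = Counter()
    languages = Counter()
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if not host:
            # Skip rows that do not contain a parseable absolute URL.
            continue
        hosts[host] += 1
        lang = wikipedia_language(url)
        if lang is not None:
            languages[lang] += 1
    return hosts, languages
```

Calling `tally_sources` on the URL column from the source files would give two counters: one showing how the cases are spread across sites, and one showing which Wikipedia language editions appear at all. That would only answer the "multiple languages" question; looking at who added the links would still require the article revision histories.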