These are some notes that I shared internally at work after attending It Will Never Work in Theory yesterday. The full list of presenters is near the bottom of this page. As you can see, the presentations were largely from researchers who study software development practices. But the thing I really liked was that almost all the presenters made an effort to bring their findings back to how they could be applied in practice. The presentations were also short and very high quality. The event bridged the gap between research about software development and software development practitioners. Look for the videos to be released in the next few weeks. Here are the talks that jumped out at me while listening.

  • Alberto Bacchelli: studied code reviews on GitHub, and found that the ordering of the files matters (files nearer the bottom were commented on less). Annotations from the author can help guide review, as can reviewing files in non-linear order. I think I’ve seen Justin Littman doing some of the former with PRs.
  • Mairieli Wessel: GitHub Actions aren’t just a question of tooling; it is better to think of them as software dependencies that can also change social interactions between developers.
  • Marian Petre: studied high-performing dev teams and found that they treated bugs as opportunities for understanding their codebases better, rather than just as things that need to be quickly squashed. See http://oro.open.ac.uk/72243/1/SWSI-2020-04-0117.R2_Lopez.pdf and https://mitpress.mit.edu/books/software-design-decoded.
  • Davide Fucci: examined how teams that said they did TDD actually worked in practice. Instrumented IDEs to track what was being edited and when, and found that some developers worked for a long time on tests and then a long time on the implementation, and that things worked better when the cycles were more granular (bit of test, bit of implementation). Also found that TDD can come at an emotional cost for developers.
  • Mei Nagappan: looked at bias in code review by examining 2.5M PRs on GitHub. Found very troubling indications that PRs by developers classified as non-white are fewer, and less likely to be merged. Non-white developers were 10 times more likely to merge PRs from other non-white developers than white developers. The classification method in this paper is potentially controversial: they used something called NamePrism, but justified it based on the observation that developers had biases about names too. https://arxiv.org/pdf/2104.06143.pdf
  • Kelly Blincoe: studied the effects of destructive criticism in code reviews. Destructive criticism is generally understood as inconsiderate and/or non-specific comments. Asked developers to respond to constructive and destructive reviews, and inquired whether each would help improve code, whether it was valid, and whether it was appropriate. Found that destructive criticism was often perceived as valid, but that it wouldn’t help improve code and wasn’t appropriate. Destructive criticism tends to reduce task completion and working with the other developer, and to hurt the health of the project. Devs agree destructive criticism is harmful and will cause a negative reaction, but are divided about whether it is ok if it helps improve the code. There isn’t a one-size-fits-all approach to code review. https://storage.googleapis.com/pub-tools-public-publication-data/pdf/8b1f394be0986d5e861431d20dd59c4fc875389e.pdf
  • Peggy Storey: researches the different motivations for measuring productivity and their flaws. Qualitative measures of satisfaction and well-being have significant impacts on product quality and velocity. https://dl.acm.org/doi/10.1145/3454122.3454124
  • Cat Hicks: fostering a learning culture on teams. People are always looking for clues to whether or not they are in a safe place for learning. Code that works requires continual learning. Learning Debt is when learning is necessary but discouraged, so developers’ learning becomes hidden, covert and unhappy (padded into estimates, etc). This is a learned behavior that then becomes a cycle. https://www.catharsisinsight.com/_files/ugd/fce7f8_2a41aa82670f4f08a3e403d196bcc341.pdf

The online commenting didn’t work, for which the organizers were very apologetic. But I think some attendees managed to ask questions via email…which honestly is an interesting idea :) Someone commented on how important GitHub was to doing some of this research, which really highlighted how important it is as a site for understanding software development. It would be interesting to see how/if anything changes when looking at organizations’ private repositories.

There are several things I’d like to follow up on here, especially how to create cultures of learning on teams, effective ways of thinking about errors as opportunities, and how to enrich review practices. I will definitely stay tuned for the next It Will Never Work in Theory, which provided a very accessible view into the use of empirical methods to study software development. It’s an area somewhere at the intersection of computer science and information studies that I didn’t really get exposed to much in my schooling, maybe because I was focused more on the empirical study of data. But data is code, and code is data…