I find debugging a very interesting topic of study, so when I stumbled on the paper “Studying the advancement in debugging practice of professional software developers”, I had to write a blog post about its contents.
The Paper and the Surveys
As stated already, the paper is called “Studying the advancement in debugging practice of professional software developers”, by Michael Perscheid, Benjamin Siegmund, Marcel Taeumel, and Robert Hirschfeld. Published in the Software Quality Journal, March 2017 (Volume 25, Issue 1).
The work appears to have been performed around 2013, judging from the reference to a previous publication of a subset of the work in 2014. This version of the paper seems to have been written in 2015 and finished in early 2016. Thus, it is a few years old. But on the other hand, does debugging change much over time?
The paper collects information gathered by two methods: a series of in-depth interviews with programmers at four German firms, along with an online survey of a few hundred programmers (also in Germany). This is not a big enough study to make statistically valid generalizations – but who cares about that? I view debugging as a craft, and the observations were sufficiently similar that we can definitely learn something from the paper.
The companies they interviewed all built web applications, which means that there is less information on OS-level and close-to-the-machine errors. It is worth noting, though, that using a language higher-level than C/C++ does not necessarily get rid of bugs.
The paper notes that most real projects use a large variety of languages, which agrees with my experience. The companies they interviewed each listed some ten different languages in use, including mini-languages like SVG. The survey turned up a list of 64 different programming languages in use across some 300 answers!
The interviewed developers presented a rather consistent set of methodologies for tackling bugs that show up at run-time:
Search. One common thread across many programmers is the usefulness of easy source code navigation and search. Many programmers interviewed would search the source code to find the location of a failure (which is non-trivial for a large software project).
Search for similar. Another use of search is to find code similar to the failing code and compare the two to see if there are differences. This provides a quick way to find common gotchas in the code.
Recent changes. Another common method was to use the fact that source code management systems can point out what has changed recently. If a new bug is encountered, it does seem reasonable to see what changed recently and could have introduced the bug.
All these methods build on the core value of being able to read code when doing debug. Reading code for debug is a bit different from reading code in the abstract, though – when debugging, you are usually trying to understand what the code does under specific circumstances, given some information about the state of the program.
As an aside, I see these three code inspection methods being used extensively by the developers that I work closely with. When I find a new bug in Simics, I tend to go talk to the people responsible for a particular subsystem, and search + recent changes usually find the suspect piece of code in short order.
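The “recent changes” method is what tools like `git bisect` automate: a binary search over commit history for the first commit where a check starts failing. A minimal sketch of that logic, with an invented commit list and `is_broken()` check standing in for a real repository and test:

```python
# Sketch of the idea behind bisecting recent changes: binary search
# over an ordered history where everything before the culprit is good
# and everything after it is bad. History and check are invented.

def find_first_bad(commits, is_broken):
    """Return the index of the first commit for which is_broken() is
    True, assuming good commits strictly precede bad ones."""
    lo, hi = 0, len(commits) - 1
    first_bad = hi
    while lo <= hi:
        mid = (lo + hi) // 2
        if is_broken(commits[mid]):
            first_bad = mid   # remember candidate, look earlier
            hi = mid - 1
        else:
            lo = mid + 1      # still good, look later
    return first_bad

# Toy history of ten commits; commit 6 introduced the bug.
history = list(range(10))
print(find_first_bad(history, lambda c: c >= 6))  # → 6
```

With n recent commits, this needs only about log2(n) test runs, which is why narrowing by recent changes is so effective in practice.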
Controlled experiments. In both the interviews and the survey, a common method for debugging was to set up experiments to try to narrow down a bug. I think this makes a ton of sense and it is one of the most common methods I apply myself when trying to understand what is going on and/or providing a nice reproducer for a bug report. Changing a single input to a command or function or program run is a great way to figure out just what is causing something to happen. For some reason, this kind of “bracketing” is one of the hardest things to do well for inexperienced programmers.
Logging? It is worth noting that the developers interviewed were not terribly keen on logging, instead preferring to use symbolic debuggers and step through source code. This surprises me, since I find logging to be the most important debug tool for complex systems, but maybe their bugs had a stronger tendency to stay within the confines of one program or piece of software. In the survey part, it should be noted that “printf()” was the most widely known method, but that debuggers were used slightly more often. Formal asserts were also common.
Reverse debuggers were viewed skeptically; the interviewed developers felt that rerunning a program was a sufficient methodology for working backwards. That is surprising, but probably indicates that their software was more deterministic than most. In the survey, only about half of all participants had heard of them, and only some 10-15% had actually tried to use one. It seems the technology just wasn’t mature enough at the time for widespread pick-up.
Reproducing bugs was not mentioned explicitly in the paper, even though it is implicit in the comments around hard-to-fix bugs. In many cases, these were likely hard-to-reproduce bugs. I think this is a bit amiss, since to me reproducibility is one of the most common problems seen in practice. Concepts like reverse debugging and record-replay directly address this problem as one of the more interesting ones in deployed software. There was also no mention of the differences between the developers’ systems and the systems where problems happened – maybe this is less of an issue for web applications that you maintain yourself than for software you ship to a wide variety of customer systems. Maybe.
Learning to Debug
For the interview part of the paper, debugging is presented as a practical skill learnt on the job, and without a formal frame to describe it. It is worth quoting what the paper says about the developers they interviewed, as I think it perfectly describes a craft rather than a science or even engineering discipline:
They learned debugging either by doing or from demonstration by colleagues. Not surprisingly, they have difficulties describing their approach. While they can speak about development processes in general on an abstract level, they resort to showing and examples when speaking about debugging.
In the survey part, about half the participants claimed to have learnt debugging at university in formal training. Most of them only did debugging in a single course, which is not a whole lot… As also stated in the paper, there is a lot that could be done in formal training to help developers get better at debugging. One particular method that they cite, and that I really like, is the sharing of war stories – debugging seems to be a field where reading about how other people solved different problems provides ideas for how you might approach future bugs. A very good idea is to do this inside an organization, maintaining “bug logs” that new developers on a team can learn from – it sounds like something definitely worth trying!
Bugs and Hard Bugs
In the survey part, they looked at the distribution of easy and hard bugs. As could be expected, most bugs are easy, but the hard bugs can be really hard. Debugging is a common part of everyday software development, but it does not seem to dominate their time.
The hard bugs tended to take more than a week to fix, and some were never fixed at all. The hard bugs were categorized. The most common categories of hard bugs were:
- Design faults – the bug was due to a mistake in the overall design. These were the most common hard bugs and would point to the usefulness of somehow being better at testing designs and specifications at an early stage.
- Parallelism – as commonly observed, parallel behaviors are harder to debug than serial code.
- Memory errors – the “typical” problems with memory management.
- Vendor issues – external code from vendors. I would suspect this gets more common as we adopt more and more libraries in our coding (I highly recommend the article “Surviving Software Dependencies – Software reuse is finally here but comes with risks” by Russ Cox, in ACM Queue or Communications of the ACM, for more on the potential mess we end up with when including code without thought).
I think all these hard bugs share the property that they go beyond a single program or the programmer’s own code. Interactions between components are harder to grasp than a serial flow in a single piece of code, and when those interactions become timing-dependent or dependent on opaque black boxes, things become a lot more complex. This is where logging comes in, at least in my book of standard methods.
The survey also pointed out that some bugs were so hard that they never got properly fixed, just worked around. There were also bugs that just went away, but no actual root cause was ever properly identified. A bit scary, but understandable in a complex system.
This paper was interesting, even with the inherent limitations of the methodology employed. Doing a broader study than just a few anecdotes is useful, and I learnt something from reading it.
Not mentioned above was the paper’s recurring references to scientific debugging – considering debugging as a scientific experiment, in the sense of observing what is going on, building a hypothesis about it, and doing experiments to prove or disprove the hypothesis. There is definitely something to that idea, but I am not sure just how core it is to debugging. It also goes a bit counter to one of my favorite principles, which is to “stop thinking and look”. Premature hypothesis formation might limit the avenues explored to understand the system behavior. Somehow, a scoping-down of an issue would seem like the core first step in debugging.
It was also refreshing not to have a discussion around Heisenbugs and Bohr bugs.
The paper contains some nice quotes and observations about debugging, such as:
- “Debugging is twice as hard as writing the program in the first place”, attributed to Kernighan and Plauger, Elements of Programming Style, from 1978. I would tend to agree that this is definitely the case.
- A hard bug is one where there are “large gaps between root cause and failure”, citing a 1997 Communications of the ACM paper by Eisenstadt.
- The same source says that “bugs that render tools inapplicable” are also hard.
- The authors discovered this in their survey: “many of the bugs remembered as the hardest are due to parallel execution and do not fit in the existing categories”. True.
- They cited one of my favorite debug texts of all time, the 9 indispensable rules by Agans.