Debug, Design, and Microsoft Data – Observations from Uppsala

It used to be that Microsoft was the big, boring, evil company that nobody found very inspiring. Today, with competition from Google and Apple as well as a strong internal research department, Microsoft feels very different. There are really interesting and innovative ideas and papers coming out of Microsoft today. It seems that their investments in research and software engineering are generating very sophisticated software tools (and good software products).

I have recently seen a number of examples of what Microsoft does with the user feedback data they collect from their massive installed base. I am not talking about Google-style personal information collection, but rather anonymous collection of user interface and error data in a way that is designed more to build better products than to target ads.

The first paper is “Debugging in the (very) large: ten years of implementation and experience” by Kinshumann et al., Communications of the ACM, July 2011. This paper describes how Microsoft uses the data they collect from Windows Error Reporting (you know, the little dialog boxes that appear every once in a while on Windows when a program has crashed or frozen, or when Windows has recovered from a crash).

Microsoft has a number of heuristics that look at the data collected, grouping the bug reports into buckets. Ideally, each bucket corresponds to a single root cause for possibly quite different errors. They automatically analyze the errors and generate metadata about the error reports that can be used to generate statistics and allow database queries to be performed over all collected error reports. Heuristics include walking through chains of threads blocked on synchronization objects to determine which one is the actual cause of a hang, and finding the thread and stack frame most likely to contain the root cause of an error. Heuristics are applied both on the client and the server, but mostly on the server. This is technically very hard to do right, and I can appreciate the huge amount of work that has gone into engineering it.
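To make the hang heuristic a bit more concrete, here is a minimal sketch in Python of walking a chain of blocked threads. It is my own illustration, not code from the paper: the function name `find_hang_root` and the `waits_on` mapping are hypothetical, and the real analysis certainly deals with far messier data.

```python
from typing import Dict, List, Optional


def find_hang_root(start: str, waits_on: Dict[str, Optional[str]]) -> List[str]:
    """Follow the chain of blocked threads beginning at `start`.

    `waits_on` (a hypothetical input format) maps a thread id to the id of
    the thread owning the synchronization object it is blocked on, or to
    None if the thread is not blocked on another thread.

    Returns the chain of thread ids visited. The last entry is either the
    thread everyone else is ultimately waiting for, or a thread already in
    the chain, which indicates a deadlock cycle.
    """
    chain = [start]
    seen = {start}
    current = start
    while True:
        owner = waits_on.get(current)
        if owner is None:
            # current is not waiting on anyone: likely culprit of the hang
            return chain
        chain.append(owner)
        if owner in seen:
            # we came back to a thread we already visited: deadlock
            return chain
        seen.add(owner)
        current = owner
```

For example, if the UI thread waits on a worker that in turn waits on an I/O thread, the walk ends at the I/O thread, which is where a developer would want to start looking.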

With this huge pile of information, a new debugging method becomes available: statistics-driven bug finding and prioritization at large scale. The introduction to the paper puts it very well:

Beyond mere debugging from error reports, WER enables a new form of statistics-based debugging. WER gathers all error reports to a central database. In the large, programmers can mine the error report database to prioritize work, spot trends, and test hypotheses. Programmers use data from WER to prioritize debugging so that they fix the bugs that affect the most users, not just the bugs hit by the loudest customers. WER data also aids in correlating failures to co-located components. For example, WER can identify that a collection of seemingly unrelated crashes all contain the same likely culprit—say a device driver—even though its code was not running at the time of failure.

For a product manager like me, used to working with individual bug reports in bug reporting systems and trying to manually assess the importance of each error, this is nothing short of a dream. Instead of trying to guess how many users might be affected by a bug, Microsoft can run queries against the error report database and get a fairly accurate idea of how common a certain error is in the user base. This has allowed them to address the most common errors first, leading to Windows and Office becoming more stable for more users in recent generations. They can also pinpoint which device drivers are causing the most issues, and put pressure on vendors to clean up their act.
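The kind of query described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the report format (machine id, module, function) and the name `rank_buckets` are made up for the example, and bucketing on a single module and function pair is far cruder than the heuristics the paper describes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical report format: (machine_id, faulting_module, faulting_function)
Report = Tuple[str, str, str]
Bucket = Tuple[str, str]


def rank_buckets(reports: Iterable[Report]) -> List[Tuple[Bucket, int]]:
    """Group reports into buckets and rank buckets by distinct machines hit.

    Counting distinct machines rather than raw reports keeps one machine
    that crashes over and over from dominating the ranking.
    """
    machines: Dict[Bucket, Set[str]] = defaultdict(set)
    for machine_id, module, function in reports:
        machines[(module, function)].add(machine_id)
    ranked = [(bucket, len(ids)) for bucket, ids in machines.items()]
    # Most affected machines first; ties broken by bucket name for stability
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

The top of the resulting list is the bug that affects the most users, which is exactly the prioritization that is so hard to do from individual bug reports.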

I wonder where else you can really apply this idea of statistical debugging. You need a large user base, on systems that are connected to the Internet so you can collect data, made up of users who are comfortable with providing direct feedback to you as a vendor. Apparently, Apple has the same kind of feature built into iOS, with more than 100 million users who do not seem too concerned about strong privacy. Presumably, Google can do the same thing with Android, at least for its use in phones. Mozilla has a crash reporter, so I guess it makes sense in the consumer space.

But when your user base counts in thousands of seats and half of these are in the defense sector behind air gaps, it is harder to apply. Products that call home are not looked upon too kindly in the professional field, as secrecy and confidentiality are very important to big companies. Industrial embedded products like telecom infrastructure might have sufficient volume of code and computer hardware to form a basis for statistical reporting, as long as operators agree to provide the information to the hardware vendors.

Another example of how Microsoft makes use of their collected data is in UI design. The blog post “Improvements in Windows Explorer” by Steven Sinofsky, from the Windows 8 blog, discusses how Windows Explorer has evolved over the years, and how it is now getting a radical redesign based on usage data. Microsoft is in an enviable position here, having collected information about what millions of users are doing. That definitely beats relying on inspiration or testing with a few users in a classic user interface lab.

I have seen quite a few people criticize this blog post from a variety of angles – from the fact that they are not data-driven enough and keep rarely-used buttons in the ribbon to the fact that they remove somebody’s favorite function. It is also the case that the measurements can only tell you which functions people are using from what is available today – if you want to invent new things, data like this might not be very helpful.

Fortunately, Microsoft also seems to have taken a cue from Linux and is allowing much more user customization than before. For me, this is great news, as I seem to have a user profile quite far from the mainstream. We have not seen Windows 8 in its final form just yet, but hopefully this approach will be applied to other parts of that GUI overhaul too. There are professional Windows users who need an OS that makes even very esoteric operations easy to access, and that makes customization of things like the start menu possible. Hopefully, we do not get washed away by the flood of data from regular users.

For some reason, I feel that bug reporting is not as sensitive to the user style as GUI design – Windows and driver bugs would seem to be more evenly distributed as they depend more on hardware than on software. At least it seems to me that Windows is more stable today than it was a couple of years ago.

One thought on “Debug, Design, and Microsoft Data”

qwerty says:

2011 November 21 at 15:19

“At least it seems to me that Windows is more stable today
than it was a couple of years ago.” – Well said
Windows 7 freezed on me only one time in like 2 1/2 years.
Linux desktop is a joke and Android crashes even in the simulator.
When i experienced a planned maintenance electrical power outage recently it felt weird.
The silence and darkness were refreshing. I started to long the preelectricity days : no constant humming or blinking, everything organic without a button in sight.
