The Adversarial Approach to Compilation

Poul-Henning Kamp has written a series of columns for ACM Queue and Communications of the ACM. He is pointed, always controversial, and often quite funny. One recent column was called "The Most Expensive One-Byte Mistake", which discusses the bad design decision of using null-terminated strings (with the associated buffer-overrun risks that would have been easily avoided with a length+data-style string format). Well worth a read. A key part of the article is the dual observation that compilers are starting to address the efficiency problems of null-terminated strings – and that such heavily optimizing compilers are quite often very hard to use.

He quotes a wonderful line from an anonymous Convex C3800 programmer who described his experience with the optimizing compiler as "having to program as if the compiler was my ex-wife's lawyer". I can recognize and understand the sentiment. The C language, in particular, is full of corner cases where behavior is "undefined" or "implementation-defined". The exact behavior of some statements is also pretty complex – and as a compiler writer, you end up having to work with the rules of the language like a lawyer works the law – the thinking is not all that different. And if you take this approach to the extreme, the description of the compiler as an adversarial lawyer is spot-on.

Unfortunately, it is quite easy in C (and trivial in C++) to wander into such territory without noticing and without doing anything that looks obviously wrong to an average programmer. For example, the program I described in my recent post about a bug demo is, strictly speaking, relying on undefined behavior, and the compiler is theoretically free to generate a program that reformats the hard drive – or does nothing at all. In practice, I don't think compiler writers strive to generate truly creative unexpected results. Rather, if the compiler detects an unclear statement in the program, it will typically emit a warning.

Still, I can identify with this kind of adversarial relationship with a compiler. It really feels like being in a game with a rules lawyer, where you read the rules (the programming language book) and try to figure out how to play the game better than your opponent.


3 thoughts on “The Adversarial Approach to Compilation”

  1. Funny coincidence — I just submitted a paper where one of the experiments involves a compiler that gives a random result whenever the compiled program executes an undefined behavior relating to integer operations (divide by zero, shift past bitwidth, signed overflow). Perhaps unsurprisingly, quite a few programs in SPEC INT 2006 stop working as expected. Of course, satan’s compiler is a conforming C and C++ implementation.
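     The three integer cases listed in this comment can all be guarded against in portable code by checking the operands before performing the operation. A minimal sketch in C (the helper names are mine, purely for illustration):

     ```c
     #include <limits.h>

     /* Each helper returns 1 on success, or 0 if the operation would be
      * undefined; the result is written through *out only on success. */

     /* Division is undefined for b == 0 and for INT_MIN / -1 (overflow). */
     int safe_div(int a, int b, int *out) {
         if (b == 0 || (a == INT_MIN && b == -1))
             return 0;
         *out = a / b;
         return 1;
     }

     /* Left shift is undefined for negative operands, for shift counts
      * at or past the bit width, and for results that do not fit. */
     int safe_shl(int a, int s, int *out) {
         if (a < 0 || s < 0 || s >= (int)(sizeof(int) * CHAR_BIT)
                 || a > (INT_MAX >> s))
             return 0;
         *out = a << s;
         return 1;
     }

     /* Signed addition is undefined on overflow in either direction. */
     int safe_add(int a, int b, int *out) {
         if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
             return 0;
         *out = a + b;
         return 1;
     }
     ```

     Checking before the operation, as above, is the only portable option, since by the time the undefined operation has executed, the standard no longer constrains what the program does.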

  2. If we take it as a given that compilers are going to silently exploit undefined behaviors, there’s only one way forward: we need tools that let users know about undefined behaviors. Of course these exist, but only for smallish subsets of the ~200 undefined behaviors in C99. I don’t even know of anyone who claims to have a comprehensive list of undefined behaviors in C++11 (unlike C99, the standard lacks a list). So anyway: tools are the answer and they lag far behind the compilers’ ability to exploit undefined behaviors.

  3. John Regehr: So anyway: tools are the answer and they lag far behind the compilers’ ability to exploit undefined behaviors.

    Scary, but true. But the alternative, Ada-style restrictive behavior, is probably just as bad. A bit like memory consistency models – too rigid, and they are safe but slow; too weak, and things are fast but impossible to use completely correctly. We need some better-defined middle ground, it seems.

    Thanks for the information – the SPEC behavior is interesting. As you say, not surprising.
