Reverse History Part Two – Research

This is the second post in my series on the history of reverse execution, covering various early research papers. It is clear that reverse debugging has been considered a good idea for a very long time. Sadly though, not a practical one (at the time). The idea is too obvious to be considered new. Here are some of the papers that I have found, going back before reverse debugging got started for real in actual products (around 2003) as well later on for interesting research papers that did not make it into products.  It is worth noting that products/useful software has become more common in recent times as the way that reverse debugging ideas get expressed.

When searching through the literature using the ACM Digital Library (and some Yahoo and Google web searches), I find quite a large body of research literature dating to the late 2000s. Apparently, the appearance of commercial reverse debuggers have inspired research into the topic – and our increasingly powerful machines have made heavier solutions feasible (at least for single programs).

Disclaimer – this is not really an exhaustive survey of the field, but some of the more interesting papers that I have stumbled on.

This post has been updated several times after its first publication, to include papers I have found in the meantime. Last update 2017-11-02, to include the Jockey paper from 2005.

1973. The first actual mention of reverse debugging I have found is a short 1973 Communications of the ACM article called “Reversible Execution” by Marvin Zelkowitz. Basically, this paper proposed to apply a backtracking facility built into a PL/I compiler to support AI research (a precursor to Prolog execution semantics from what it sounds like) to also replay parts of a normal program for debugging. This in essence would enable replay of a program execution, based on a log of how the program ran and the values of variables changed. Basically, a forward-only record-replay solution for user-level single-threaded programs.

1987. Thomas LeBlanc and John Mellor-Crummey, “Debugging Parallel Programs with Instant Replay“, IEEE Transactions on Computers. This paper covers record-replay debugging of parallel user-level programs on the BBN Butterfly. They record all interaction between processes (threads) by intrumenting the OS calls that are used to communicate and synchronize between processes. Programs also sometimes had to be changed so that their communication protocols allowed the instrumentation to work properly. Thus, the approach did not apply to unchanged code either at source or binary level. Processes still compute their results, only the order or interactions are enforced. In this way, the overhead was kept usefully low. The approach does not work well for programs that deal with asynchronous interrupts and hardware interaction by the software, since there is no good way to hook and replay such events in their implementation.

The most important concept here is the use of re-execution of code to reconstruct values of a program, assuming deterministic computation as long as asynchronous interactions can be controlled. This is the basis of all reverse execution approaches based on checkpointing and re-execution. Not sure if they invented the idea, but this is the earliest distinct mention I have seen of this.

1988. Stuart Feldman and Channing Brown, “IGOR: a system for program debug via reversible execution“, 1988 ACM SIGPLAN and SIGOPS workshop on Parallel and distributed debugging (PADD) . This paper looks at the problem of how to debug user-level single-threaded code on a UNIX-like system. The programs need to be recompiled to add a small amount of support, in particular to help with restarting
execution. It is essentially a record-replay approach, but capable of placing the execution of a program at any point in its past execution. They save regular checkpoints during program execution (every tenth of a second or so seemed appropriate in their testing). The checkpoints only include the data that has been changed since the previous checkpoint (they added an OS facility to mark all pages that a program writes between checkpoints). A unique feature is the ability to look at the value of a variable as it is stored in successive checkpoints, and to “go to the point in time where variable X has value V” (assuming a variable is monotonically increasing or decreasing, like an outermost loop counter). To do the detailed positioning, they use an interpreter for the target system, essentially mixing native and interpreted execution in their debugger.

1995. (Update since the original posting of this blog post) An unnamed effort by Michael Elizabeth Chastain implemented a record-replay debugger for Linux programs, based on instrumenting all kernel calls. See my blog post from 2016 for what few details there are.

1996. Mark Russinovich (who I believe is the man behind the recent novel “Zero Day“) and Bryce Cogswell, “Replay for Concurrent Non-Deterministic Shared-Memory Applications“, PLDI 1996. In this paper, instrumented system libraries and binary code modification is used to enable deterministic replay of multithreaded applications. The binary modification is used to implement an instruction counter to provide a logical time-base, and instrumented system libraries are used to force a serialized (but interleaved) execution of the program. It does not run the program in parallel on a multiprocessor, but rather runs it on  a single processor. User-level, instrumented and intrusive, parallel on single processor.

2000. Bob Boothe, “Efficient Algorithms for Bidirectional Debugging“, PLDI 2000. I actually saw this paper being presented live at the PLDI conference in Vancouver. At the time, I was working on embedded compilers and debuggers, and our little group from IAR looked at what was being presented at said “yeah, might work on a big desktop, but you can’t do that in a real machine”. Why was that the case? Essentially, what Boothe did was to use the Unix fork call to spawn off checkpoints of a program as it was executing. This was a smart way to achieve checkpointing using existing OS facilities. In addition, he recorded the results of system calls to replay them during replay.

Time was given by counters in target program (compiled-into it as part of the debugging build). Most of the paper was spent on basic algorithms for how to do “previous” (step back 1 line), “before” (inverse of finish), “buntil” (backwards until, looking at data conditionals to determine when to stop). His use of “backwards” rather than “reverse” as the name of the concept does make this paper a bit harder to find. While limited to a single threaded user-level process, this paper is probably the first to introduce reverse debugging in the sense of the primary user interface being the ability to step backwards and check for breakpoints backwards in time. A good idea in the UI design was the undo command that jumped the debugger back to the last position where you stopped. Having worked with reverse debugging quite a lot myself, this operation makes a lot of sense as an addition to standard run-control commands.

2002. Tankgut Akgul and Vincent Mooney, “Instruction-level Reverse Execution for Debugging“, at PASTE 2002. This paper is an odd one, as it was neither really reverse nor practical. The approach is based on doing a static analysis of the assembly code of a program to generate code that could reverse operations based on some logging of destructive operations. Interesting, but not really practical as I see it. One noteworthy tidbit in the paper is the measurement that logging only destructive operations is some 50 x more efficient than logging all changes.

2002. George Dunlap, Samuel King, Sukru Cinar, Murasa Bazrai, and Peter Chen, “ReVirt: enabling intrusion analysis through virtual-machine logging and replay“, at OSDI 2002. This paper uses the User-Mode Linux (UML) paravirtual machine to checkpoint and replay an operating system. I think this is the first use of a true virtual machine to achieve reverse execution, even though it is a paravirtual machine rather than a full virtual machine. The replay starts from a single checkpoint at the start of execution of the guest system, so moving to a certain point in time can be very time-consuming. It does support reverse breakpoints. To move to a certain point in time, they count executed instructions, not actual time. The use of a paravirtual solution precludes the use of many real device drivers and debugging issues related to interaction with most real devices. It is uniprocessor, system-level, unmodified target (except the change to make the OS itself run paravirtualized on top of UML).

2003. Bil Lewis and Mireille Ducasse, “Using events to debug Java programs backwards in time“, OOPSLA 2003. This paper targets programs running on top of a Java Virtual Machine, resulting in an Omniscient Debugger. They instrument the target program to log all state changes. A big lock is used to serialize all threads into a single interleaved execution. The implementation is admitted to be almost worst-case inefficient, but the debugger was still considered useful for some real work. The debug UI does feature backwards breakpoints, so it qualifies as actual reverse debugging.

Note that in principle, backwards debugging and record-replay should be possible to implement quite efficiently for a language virtual machine like the JVM, if you can modify the VM itself. This has indeed been done today, for example by Microsoft for their CLR and in the use of full-system simulators (and other simulators) to implement reverse debugging.

2005. Yasushi Saito. Jockey: “A user-space library for record-replay debugging”. HP Labs technical report 2005-46, and also published at USENIX 2005. The system mentioned in this paper, Jockey, seems to have disappeared since. The idea was to insert a special library into Linux user-space programs that would record them and allow replay in a deterministic manner. It worked by patching non-deterministic instructions (RDTSC) and with intercept code for operating-system calls. It is some ways an ancestor to Mozilla RR. Jockey did not do reverse debugging, only deterministic replay. It only worked on single-threaded programs, with is a rather big limitation. What is interesting is the target application: it was built to help debug a distributed system consisting of a large number of separate processes communicating over the network. In such a context it makes a lot of sense to record, replay, and debug one process in isolation – not much value is typically gained from trying to build a “global” view across multiple processes, and it is fundamentally really hard to debug such a process as the system is running live. This type of program with external input dependencies causing non-deterministic and non-repeatable execution is an excellent case for record-replay and reverse debug.

2005. Samuel King, George Dunlap, and Peter Chen, “Debugging Operating Systems with Time-Traveling Virtual Machines“, USENIX 2005. This paper builds on the foundation of ReVirt, but adds checkpoints during a run to make the replay to a certain point in time more efficient. They also introduce the differential storing of disk state in the checkpointing (just like done in Simics). Overall, it is very similar to the replay approach taken by VmWare a few years later. Still single-processor. The paper contains a nice list of example bugs for which reverse debugging work much better than cyclic debugging. They also reference Simics as something that might do something with reverse, but do not quite know what (this is right after the launch of Simics reverse execution as discussed in the next post).

2006. Sanjay Bhansali, Wen-Ke Chen, Stuart de Jong, Andrew Edwards, Ron Murray, Milenko Drinic, Darek Mihocka, and Joe Chau, “Framework for Instruction-level Tracing and Analysis of Program Executions“, Virtual Execution Environments (VEE), 2006. This paper presents the underlying tracing and replay system that is used in the WinDbg Time Travel Debugger that was released in 2017.

The approach used is to apply binary translation (using a toolkit called Nirvana) to the code during recording, to get a handle on all memory access. By adding this interpreting layer, it is possible to accurately trace multiple simultaneously executing threads (this might be the first system to do that). The same toolkit can also be used to re-execute the code for analysis, provided with the code, initial state, and any values read from memory during the execution of a snippet. On top Nirvana, they build a tool called iDNA that will write and read traces to allow an execution to be repeated. The traces are portable across hosts, which is very handy for record-replay debugging.

The approach works for multithreaded user-level programs on multicore machines, running a Windows operating system. It does not require access to program source code, and it can handle code-generating runtimes and self-modifying code.

2007. Paul Brook and Daniel Jacobowitz, “Reversible Debugging“, GCC Developer’s Summit 2007. This paper presented some early work on implementing reverse debugging inside the Qemu full-system simulator. Logically, it follows in the steps of the Simics reverse debugger announced in 2005 (see the third post in this series of blogs). They use gdb as their frontend, and the paper introduced the idea of setting the default execution direction of the debugger. This UI choice was later adopted in the reversible gdb 7.0 released in 2009. The paper contains a good discussion of some of the practical issues involved in performing operations like reverse-step to go back a line, or stepping back up through a function call. It is a good explanation of the basic problems.

The actual reversing is implemented in Qemu using a checkpoint and deterministic replay, along with logging of any operation that is not deterministic. Their main test case is actually using the semihosting variant of Qemu where IO calls are intercepted and passed on to the host. Just like all other such solutions, they record the return values and do not repeat side-effects. It is system-level, single processor, unmodified target software, and time is based on counting executed instructions.

The Qemu architecture makes enforcing reversibility across all devices fairly difficult, since each device manages its own interaction with the user. One interesting tidbit in the paper is that Qemu by default triggers timer interrupts based on host time, but these had to be made fully virtual to be reversible.

2015. (Update since the original posting of this blog post) Pavel Dovgalyuk, Denis Dmitriev, and Vladimir Makarov, “Don’t Panic: Reverse Debugging of Kernel Drivers“. An approach using record-replay of hardware interactions from an OS running inside a Qemu virtual amachine, allowing for reverse debug of drivers that talk to a physical piece of hardware rather than a simulated hardware device.

Summary The above is a small selection from all the papers published on this topic, but I think they capture the most common methods used. Basically, almost all of the approaches require some kind of change to the target software to enable reversibility. Interestingly, the first few commercial solutions for reverse debug that appeared did not take that approach, but rather targeted unmodified programs using a different set of fundamental techniques. See the next blog post for that part of reverse history.