In this final part of my series on the history of reverse debugging, I will look at the products that finally made reverse debugging available in commercially packaged form, not just in research prototypes. Part one of this series provided a background on the technology, and part two discussed research papers on the topic going back to the early 1970s. The first commercial product featuring reverse debugging was launched in 1999, and since then there has been a steady trickle of new products up until today.
1999. In 1999, Lauterbach launched the Context Tracking System, CTS. Lauterbach is a big player in the embedded debug market with their TRACE32 debugger. CTS is based on a trace from a hardware unit or from an instruction-set simulator, and the essence of what CTS adds is the ability to go back in time and investigate past state. The available description is not quite clear on the capabilities, mentioning backwards stepping but not backwards breakpoints. However, based on discussions with Lauterbach experts, it is clear that there is a “step backwards until condition holds” command available, which is in essence enough to implement reverse breakpoints. Thus, the Lauterbach CTS has to count as the first commercial reverse debugging solution. It is cross-target, system-level, and uniprocessor, like the later Time Machine.
2003. The embedded tools company Green Hills launched their Time Machine feature in their well-known MULTI debugger.
This was the second commercial-grade product to include reverse debugging. The implementation was based on tracing the execution of a program on actual hardware, using a debug probe and a JTAG debug interface. The trace box would capture several gigabytes of execution data, and the debugger then performed operations based on this trace. To check a backwards breakpoint, you scan back over the trace until you find a matching state or operation (such as a memory access or the execution of a particular instruction address). The main limitation of the method is that the trace buffer can only capture a few seconds of execution on a typical embedded processor running at hundreds of MHz. It only works for a single processor, and it does not capture IO actions (except as memory-mapped IO). It is system-level, cross-target, and uniprocessor.
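Checking a backwards breakpoint against such a captured trace is essentially a linear scan from the current position toward the beginning of the buffer. A minimal sketch of the idea in Python (the trace record format and field names are my own illustration, not Green Hills'):

```python
from collections import namedtuple

# One record per executed instruction; real trace formats are far more compact.
TraceRecord = namedtuple("TraceRecord", "index pc mem_addr mem_is_write")

def reverse_breakpoint(trace, current, match):
    """Scan backwards from position `current` for the most recent record
    satisfying `match` (e.g. a PC test or a memory-access test)."""
    for i in range(current - 1, -1, -1):
        if match(trace[i]):
            return i   # the new "current time" within the trace
    return None        # breakpoint not hit before the start of the buffer

# Example: reverse-continue until address 0x4000 was last written.
trace = [
    TraceRecord(0, 0x1000, 0x4000, True),
    TraceRecord(1, 0x1004, None, False),
    TraceRecord(2, 0x1008, 0x4000, True),
    TraceRecord(3, 0x100C, None, False),
]
hit = reverse_breakpoint(trace, 4,
                         lambda r: r.mem_is_write and r.mem_addr == 0x4000)
# hit == 2
```

The same scan with a PC predicate gives a backwards code breakpoint; since the buffer is finite, the search can fail, which mirrors the few-seconds capture limit described above.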
Extending this kind of trace to multicore has proven hard, since getting a synchronized trace out of several processors at once is very difficult. Debug hardware may appear in the next few years that can support a time-stamped, consistent trace of multiple cores, and with such hardware, the Time Machine approach could well be extended to multicore.
2005. Simics 3.0 was launched by Virtutech (later acquired by Wind River and Intel) with full-system reverse execution and reverse debugging. The Simics approach was unique so far, being based on a full-system simulator. By simulating the entire target, it is trivial to reverse (and put reverse breakpoints on) changes to memory, persistent storage like disks, and hardware devices. Since all device models in the simulator are deterministic in their implementation, re-executing hardware events like interrupts and IO outputs is just as easy as re-executing code on the main processor, something that had eluded all previous approaches. Recording is used at the interface between the simulator and the outside world, such as user interaction over graphics displays and serial ports and connections to the real-world network. The software stack is unmodified and system-level, and the simulator can handle multiple processors and even multiple machines in a network as a unit. The use case is normally cross-target (even if a system identical to the host can be simulated, it would work like a cross target logically). Time is handled by counting clock cycles on all processors in the system, and reverse debugging can position the simulation at any point in time based on the virtual time.
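Positioning a deterministic simulation at an arbitrary virtual time can be implemented by restoring the nearest earlier checkpoint and re-executing forward to the target cycle. A toy sketch of that general scheme (the simulator below is a stand-in of my own, not Simics code):

```python
import copy

class ToySimulator:
    """A deterministic toy 'machine': the state is updated the same way
    every time a given cycle is executed."""
    def __init__(self):
        self.cycle = 0
        self.state = {"acc": 0}

    def step(self):
        self.cycle += 1
        self.state["acc"] += self.cycle   # deterministic update rule

def run_with_checkpoints(sim, cycles, interval):
    """Run forward, saving a full snapshot every `interval` cycles."""
    checkpoints = {0: copy.deepcopy(sim)}
    for _ in range(cycles):
        sim.step()
        if sim.cycle % interval == 0:
            checkpoints[sim.cycle] = copy.deepcopy(sim)
    return checkpoints

def goto_cycle(checkpoints, target):
    """'Reverse execution': restore the nearest checkpoint at or before
    `target`, then deterministically re-execute forward."""
    base = max(c for c in checkpoints if c <= target)
    sim = copy.deepcopy(checkpoints[base])
    while sim.cycle < target:
        sim.step()
    return sim

sim = ToySimulator()
cps = run_with_checkpoints(sim, cycles=100, interval=10)
past = goto_cycle(cps, 42)   # the exact state the simulation had at cycle 42
```

Because every step is deterministic, the reconstructed state is identical to the one seen during the original run; this is the property that makes re-executing interrupts and IO just as easy as re-executing code.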
There is a cost in execution speed from simulation rather than direct execution, and an intrusion effect from running on a simulator rather than on a physical machine; this affects the timing of events even though the software stack itself is not modified. Still, the ability to run a complete, real software stack with no modifications whatsoever is fairly rare in the world of reverse debuggers.
Simics shipped with a modified gdb that talked the gdb serial protocol to Simics and accessed reverse execution with some new debugger commands, as well as extensions to the gdb serial protocol. This was offered to the gdb community, but not accepted. However, prompted by this, the gdb community started to discuss reverse execution. Some interesting old threads can still be found, such as http://sourceware.org/ml/gdb/2005-05/msg00225.html. Clearly, at that point in time Virtutech did not really explain how Simics worked, and some pretty bad proposals for how to do reverse were floated in the community. In the end, the gdb serial design turned out the right way: the remote debug backend reverses itself, and gdb simply asks it to do so. This separation of concerns is important for creating practical reverse debugging solutions that can use any debugger backend.
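The reverse-execution extensions that eventually entered the gdb serial protocol are the “bs” (backward step) and “bc” (backward continue) packets, framed like any other packet as $&lt;payload&gt;#&lt;checksum&gt;. A small sketch of the framing:

```python
def rsp_packet(payload: str) -> str:
    """Frame a gdb remote-serial-protocol packet: $<payload>#<checksum>,
    where the checksum is the modulo-256 sum of the payload bytes,
    written as two lowercase hex digits."""
    checksum = sum(payload.encode("ascii")) % 256
    return f"${payload}#{checksum:02x}"

print(rsp_packet("bs"))   # backward single-step  -> $bs#d5
print(rsp_packet("bc"))   # backward continue     -> $bc#c5
```

The backend (a simulator, trace engine, or VM) replies with an ordinary stop reply; gdb never needs to know how the backend travels backwards, which is exactly the separation of concerns described above.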
2006. Undo Software launched the first Linux-targeting, host-based reverse debugger, UndoDB. It is described as a bidirectional debugger (the same terminology as the Boothe 2000 PLDI paper). It is user-level and does support reverse breakpoints (as well as data breakpoints, also known as watchpoints, which are really useful). It handles multiple threads (at least as of version 3.0), but from the description of the recording technology used, I believe they have to serialize their execution. The implementation is based on checkpointing and re-execution, with recording of all non-deterministic events like IO. There is a feature to move to a certain point in time, based on “simulated nanoseconds”. These are not really nanoseconds, but values which are guaranteed to increase even between two instructions (presumably sub-nanosecond resolution, since on a >1 GHz CPU a single-cycle instruction takes less than one nanosecond).
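Checkpoint-and-re-execution schemes only work if re-running the program is deterministic, so every non-deterministic input has to be recorded live and substituted during replay. A minimal sketch of this general record-and-substitute pattern (my own illustration of the technique, not Undo's implementation):

```python
import random

class EventLog:
    """Record results of non-deterministic calls live; substitute them on replay."""
    def __init__(self):
        self.events = []
        self.replaying = False
        self.pos = 0

    def intercept(self, nondeterministic_call):
        if self.replaying:
            result = self.events[self.pos]    # substitute the recorded result
            self.pos += 1
        else:
            result = nondeterministic_call()  # really perform it, and record
            self.events.append(result)
        return result

    def start_replay(self):
        self.replaying, self.pos = True, 0

def program(log):
    # Two 'non-deterministic' inputs; the rest of the computation is deterministic.
    a = log.intercept(lambda: random.randint(0, 1000))
    b = log.intercept(lambda: random.randint(0, 1000))
    return a * 2 + b

log = EventLog()
first = program(log)    # live run: inputs are recorded
log.start_replay()
again = program(log)    # replay: identical execution, guaranteed
assert first == again
```

A real implementation intercepts at the system-call and signal level rather than inside the application, but the principle is the same: replayed runs consume the log instead of the outside world.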
There is a nice description of how it works on their online man page. It is worth noting that while it presents a gdb-style interface, the command set is distinct from what gdb introduced with its reverse execution support in 2009: UndoDB uses a “b” prefix for backwards commands rather than “r” for reverse. In some ways, UndoDB is in direct competition with the gdb reverse target, but it is much faster and has more features.
2008. The Rogue Wave (at the time, an independent company) TotalView debugger gained support for reverse debugging with the ReplayEngine add-on. TotalView is an old mainstay in the HPC market, having been around since at least 1993. Indeed, it was initially developed for the BBN Butterfly computer, and thus it might have had a brush with reverse execution as far back as the 1987 paper cited in my previous blog post.
Judging from the available materials, TotalView can clearly step back in various ways. However, it is not clear that it triggers breakpoints when going backwards; thus, it has to count as record-replay debugging rather than reverse debugging. The base of the implementation is extensive instrumentation of the runtime system of the target computer. The implementation builds on the fact that the target programs tend to be cluster programs that use MPI to communicate – and thus a large part of the communication between threads is explicit, easily intercepted, and recorded. There is also an existing infrastructure of checkpoint and restart for parallel MPI programs, built to support fault tolerance, that was used as the base of the implementation. Finally, in a slightly ugly hack, they make each multi-threaded program run on a single processor by means of a big lock. In this way, all that needs to be replayed is the interleaving of threads on a single processor, a far more tractable problem than trying to replicate a true parallel execution in a new session.
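Forcing all threads onto one processor means that only the interleaving order has to be recorded to reproduce a run. A toy sketch of that record-and-replay idea, using Python generators as stand-ins for threads that yield at every possible switch point (my own illustration, not TotalView's mechanism):

```python
import random

def worker(tid, n):
    # A deterministic 'thread' that yields at each potential switch point.
    for i in range(n):
        yield (tid, i)

def run_serialized(make_threads, schedule=None):
    """Run threads one at a time, as if under a single big lock.
    Record mode (schedule=None): pick the next thread non-deterministically
    and log each choice. Replay mode: follow the recorded schedule exactly."""
    runnable = dict(enumerate(make_threads()))
    sched = iter(schedule) if schedule is not None else None
    log, trace = [], []
    while runnable:
        tid = next(sched) if sched is not None else random.choice(list(runnable))
        log.append(tid)
        try:
            trace.append(next(runnable[tid]))   # let the chosen thread run one step
        except StopIteration:
            del runnable[tid]                   # thread finished
    return log, trace

make = lambda: [worker(0, 3), worker(1, 2)]
log, trace1 = run_serialized(make)              # record some interleaving
_, trace2 = run_serialized(make, schedule=log)  # replay it exactly
assert trace1 == trace2
```

Because the threads themselves are deterministic, replaying the scheduling decisions reproduces the entire execution; a truly parallel run would admit no such compact log.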
2008. VmWare officially launched a record-replay debugger based on their virtual machine technology with VmWare Workstation 6.5. It is single-processor, system-level (but really only supported for user-level debugging), and cross-target (since the VM is not exactly the same hardware as the host); the time model is based on the virtual machine, which I believe is cycle-based. It was mostly used for record-replay debugging of non-deterministic software bugs, but could also do reverse debugging, including reverse data breakpoints. It was based on snapshots and deterministic re-execution, plus recording of all non-deterministic device accesses (not all devices in the VmWare hardware emulation layer are deterministic). Going back to a snapshot was a very heavy operation (I tried it), since you had to restore the entire target memory (which quickly got into gigabytes). The hardware supported in the VM was quite limited, and things like CD-ROMs and floppies could not be part of a record/replay session. Replay logs could be moved between hosts.
The VmWare reverse debug functionality was removed in VmWare Workstation version 8 in 2011, since it required a large investment and was apparently not used by very many VmWare users. This indicates that building developer-oriented functionality into a technology base fundamentally driven by the needs of deployed virtual machines is hard. There are contradictions between the two goals, as the determinism and control needed for a good reverse debugger are not necessarily consistent with maximum performance for running virtual machines in a production setting.
2009. gdb 7.0 added support for reverse execution (an effort that began in 2006). The built-in “record” target supports reverse debugging of user-level, single-threaded programs on the same host. The command set for reverse debugging is fairly full-featured, but a bit quirky, with a “set exec-direction” command that makes regular run-control commands work in reverse. The record technology is quite slow, since it basically records the effect of each and every instruction run in the program. It is really more similar to the technology underlying the Green Hills Time Machine (a complete trace of all operations) than to solutions that rerun code to reconstruct state, such as Simics, UndoDB, ReVirt, and the bidirectional debugger.
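Recording the effect of every instruction makes a reverse step a pure undo operation: restore the old values of whatever the last instruction wrote. A toy sketch of the idea (the data structures are my own illustration, not gdb's internals):

```python
class UndoRecorder:
    """Log the old value of every location an 'instruction' writes,
    so stepping backwards is just popping and applying undo entries."""
    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []   # one entry per executed instruction

    def execute(self, writes):
        # `writes` maps locations (registers/addresses) to new values;
        # save the old values before applying the instruction's effect.
        self.undo_log.append({loc: self.memory[loc] for loc in writes})
        self.memory.update(writes)

    def reverse_step(self):
        for loc, old in self.undo_log.pop().items():
            self.memory[loc] = old

mem = {"r0": 0, "r1": 0}
rec = UndoRecorder(mem)
rec.execute({"r0": 7})             # instruction 1
rec.execute({"r1": 9, "r0": 8})    # instruction 2
rec.reverse_step()                 # undo instruction 2
# mem == {"r0": 7, "r1": 0}
```

The cost is obvious: log size and logging overhead grow with every instruction executed, which is why this approach is slow compared to checkpoint-and-re-execute designs.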
In addition to its built-in target, gdb can also control external reversible debug systems over the gdb serial protocol. This made the changes to gdb-serial created by Virtutech for Simics in 2005 part of the mainline gdb release. Several tools support the command set, including VmWare, UndoDB, and Simics. There was also a set of MI commands added to basically let Eclipse use gdb as a backend for reverse debug, including using it to control external tools via gdb-serial. How this happened is quite a long story, and I made a small contribution to the gdb code base myself in the process. Read about this here.
2009. Eclipse CDT added support for reverse execution, using gdb 7.0 reverse as the initial backend. As noted above, this lets Eclipse also use other reverse debugging backends (Eclipse uses the gdb-MI interface to gdb to control the debug session). This is noteworthy since it means that the buttons to control reverse execution are now part of CDT, making it much easier to use Eclipse to build a frontend for any reversible backend. Eclipse is not really a debugger itself, just an interface to a debugger.
2009. Microsoft Visual Studio got record-replay debugging with IntelliTrace. It is strictly about replay debugging, including the nice ability to send traces around between developers; there are no backwards breakpoints. The support is limited to programs running on top of the .net runtime system, meaning that it does not apply to classic Windows software. Using the CLR virtual machine as the implementation basis should make the implementation easier, cleaner, and faster compared to a machine-level native solution. It is user-level, single-threaded, and host-based. The time concept is unknown. It seems to record only part of the past state, meaning that not all system-investigation operations can be performed in the past.
2011. Adobe demonstrated (but has not yet launched) reverse debugging in their Flash Builder programming environment; a nice video is posted on the Adobe website. It seems to be based on the virtual machine that Flash runs on, and includes what looks like pretty powerful backwards data-analysis tools. In a blog post, the developer describes some of the features, which to me seem to indicate some pretty heavy recording.
2011. VmWare removed support for reverse debugging.
2015. Mozilla rr was launched, an open-source, modern implementation of user-mode reverse debugging with explicit support for record-replay debugging.
2016. Simulics was launched, a closed-source commercial reverse debugger based on full-system simulation, aimed at the embedded systems market.
In researching these commercial tools, I am pretty sure there is at least one that has been lost. A company called Visicomp launched a Java debugger called RetroVue in 2002, which supposedly did allow backwards debugging in some way. However, it seems that this tool was not really practical, being too slow for actual use, and it has since disappeared without anyone picking up its legacy. The technology was apparently pretty much like that of the Omniscient Debugger, presented in 2003 and described in my blog post on reverse execution research.