Last year (2015), a paper called “Don’t Panic: Reverse Debugging of Kernel Drivers” was presented at ESEC/FSE, the joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering. The paper was written by Pavel Dovgalyuk, Denis Dmitriev, and Vladimir Makarov from the Russian Academy of Sciences. It describes a rather interesting approach to Linux kernel device driver debugging, using a deterministic variant of Qemu along with record/replay of hardware interactions. I think this is the first published instance of using reverse debugging in a simulator together with real hardware.
The twist in the paper is to use real hardware devices with a target OS running on a virtual platform. In this way, the kernel drivers for the hardware run inside a controllable environment with much greater debug access than running directly on the hardware. It also avoids having to model the hardware inside of Qemu, which is convenient if you already have the hardware. While most of the target system is simulated in Qemu, accesses to particular hardware devices are forwarded to the physical hardware attached to the host machine. All replies and asynchronous inputs (including network packets and serial-port input from the user) are recorded so that they can later be replayed. The virtual platform also regularly saves snapshots or checkpoints of its state.
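To make the mechanics a bit more concrete, here is a minimal sketch in C of what such an event log could look like: each inbound event is tagged with the virtual instruction count at which it was delivered, so that replay can re-inject it at exactly the same point. The structure and function names are my own invention for illustration, not the paper’s or Qemu’s actual format.

```c
#include <stdint.h>
#include <stdio.h>

/* One logged event: an inbound reply, packet, or byte from the real world. */
typedef struct {
    uint64_t icount;  /* virtual instruction count at delivery */
    uint32_t kind;    /* e.g. USB completion, network packet, UART byte */
    uint32_t len;     /* payload length in bytes */
} event_header;

/* Recording run: append the event to the log as it is delivered. */
static void record_event(FILE *log, uint64_t icount, uint32_t kind,
                         const void *payload, uint32_t len)
{
    event_header hdr = { icount, kind, len };
    fwrite(&hdr, sizeof hdr, 1, log);
    fwrite(payload, 1, len, log);
}

/* Replay run: read the next event; the simulator delivers it when the
 * virtual instruction count reaches hdr->icount. Returns 1 on success,
 * 0 at end of log, -1 on a malformed entry. */
static int next_event(FILE *log, event_header *hdr,
                      void *payload, uint32_t max_len)
{
    if (fread(hdr, sizeof *hdr, 1, log) != 1)
        return 0;
    if (hdr->len > max_len)
        return -1;
    return fread(payload, 1, hdr->len, log) == hdr->len ? 1 : -1;
}
```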
Reverse debugging is achieved in the standard way of jumping back to a snapshot and replaying from that point forward. Gdb is used to control the reverse debugging. Setting up kernel-module debugging for the target system requires manually finding the load address of the drivers, which is what you usually have to do anyway, unless you manage to use OS awareness to figure out the load address from the outside. Note that you cannot use an on-target gdb agent to debug the kernel in this fashion, as all code on the target will be reversed.
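For readers who have not seen it before, the snapshot-plus-replay trick behind a command like gdb’s reverse-continue can be sketched roughly as follows. The helper functions here are assumed primitives of a deterministic simulator (the names are mine, not Qemu’s): restore the nearest snapshot before the current point, replay forward while remembering the last breakpoint hit strictly before it, then replay once more to land on that hit.

```c
#include <stdint.h>

/* Assumed simulator primitives (hypothetical names, for illustration only). */
extern uint64_t icount(void);                       /* current instruction count */
extern uint64_t nearest_snapshot_before(uint64_t);  /* snapshot at or before a point */
extern void     load_snapshot(uint64_t);
extern void     run_until(uint64_t limit);          /* deterministic replay; stops
                                                       at a breakpoint or at limit */
extern int      stopped_at_breakpoint(void);

/* "Reverse-continue": find the last breakpoint hit strictly before the
 * current point by restoring an earlier snapshot and replaying forward. */
uint64_t reverse_continue(uint64_t current)
{
    uint64_t last_hit = 0;                          /* 0 means no earlier hit found */

    load_snapshot(nearest_snapshot_before(current));
    while (icount() < current) {
        run_until(current);
        if (icount() < current && stopped_at_breakpoint())
            last_hit = icount();                    /* remember it, keep replaying */
    }
    if (last_hit) {                                 /* replay again to land on it */
        load_snapshot(nearest_snapshot_before(last_hit));
        while (icount() < last_hit)
            run_until(last_hit);
    }
    return last_hit;                                /* 0 if nothing was found */
}
```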
The case study is based on USB, which seems reasonable. With USB, the virtual platform can indeed see all transactions between the driver and the real hardware device. Generalizing this to DMA-capable devices such as PCIe devices would, I guess, be very difficult, since intercepting inbound DMA is not normally something a PCIe system supports. The approach would thus seem limited to devices with well-behaved interfaces.
Analysis and Questions
One thing that is not clear from the paper is whether the replay is done within the same session, or whether a recording can be used to replay a bug in a new Qemu session. That is, is this a pure reverse-debugging solution, or also a record/replay solution?
Also, precisely how is determinism in the core of Qemu achieved? Since Qemu often uses “wild” modeling techniques, such as using a host timer instead of virtual timers, something must have been done here. Presumably, all input from such implementation mechanisms (which are fairly unsound from a determinism perspective) ends up being recorded too.
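My guess, illustrated with a sketch of my own (not Qemu code), is that each such host-derived value is treated the same way as device input: the recording run asks the host and logs the answer, and the replay run returns the logged answer instead of asking the host again.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical wrapper around a host time source: deterministic across
 * record and replay because the replay run never consults the host. */
uint64_t deterministic_host_ns(FILE *log, int replaying)
{
    uint64_t ns = 0;

    if (replaying) {
        if (fread(&ns, sizeof ns, 1, log) != 1)
            return 0;                           /* log exhausted: treat as an error */
    } else {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);    /* the nondeterministic host source */
        ns = (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
        fwrite(&ns, sizeof ns, 1, log);         /* log it for the replay run */
    }
    return ns;
}
```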
The paper does not provide details on the virtual target system, but I assume it contains only a single processor. Would this work for multiple processors? My gut feeling is that it would not, at least not without tightening up how multiple processors are scheduled.
There would seem to be some interesting time sensitivity in the interaction with the host hardware. The virtual platform might well have different timing behavior from an actual real platform. That does not prevent record-replay from working, but it might affect observed behaviors. As long as the virtual platform is “fast enough” this is likely OK.
If we instead built a virtual platform model of the hardware, timing would both be synchronized automatically and be trivially repeatable across runs. It would work for any type of hardware, regardless of the nature of the interface, and it would be available before hardware, without requiring access to the physical hardware. On the other hand, such modeling is really only reasonably performed by the silicon vendor. If you are an external party developing a driver for a piece of hardware you have available, using the actual hardware unit makes a ton of sense, and this kind of approach could be a really helpful addition to the toolbox. Similar things can be done with Simics, using record/replay while communicating with the real world.
The cited performance numbers for recording (up to 10% slowdown) and replay (which takes 2x to 3x more time than executing without replay) make sense. However, the performance given for Simics in the paper is not really correct for this context – the WODA 2013 paper they cite, which claims Simics is 40x slower than Qemu, does not report relevant Simics performance numbers. In that paper, Simics speed is measured with a memory tracer attached, which is a fairly large drag on performance – it forces the simulator to drop back to interpreter mode. When not using tracing, Qemu and Simics speeds tend to be comparable – which is not surprising given that they both use JIT technology for non-Intel targets and some form of virtualization for Intel targets. Simics can also use virtualization to speed up execution with record/replay, providing very fast recording runs to be fed into a separate reverse-debugging run. Disclosure: I work for Intel, in the Simics team. Your performance from Simics will always depend on the use case, target system, host system, and nature of the software load.
Summary
In summary, the approach described here is an interesting mix of virtual machines and real hardware. Recording the hardware interface of the real hardware and replaying into a virtual machine achieves record-replay and reverse debugging for drivers without having to model the hardware. There are both limitations and benefits to this kind of approach, but overall it is a nice addition to the set of reverse debugging research projects.