I am just finishing reading the chapters of the Processor and System-on-Chip Simulation book (to which I contributed a chapter), and just read through the chapter about the Tensilica instruction-set simulator (ISS) solutions, written by Grant Martin, Nenad Nedeljkovic and David Heine. They have a slightly different architecture from most other ISS solutions, since they have an inherently variable target in the configurable and extensible Tensilica cores. However, the more interesting part of the chapter was the discussion of system modeling beyond the core. In particular, how they deal with interrupts to the core in the context of a temporally decoupled simulation.
This is a small detail, but one where I have always had a feeling that some fundamental assumption was missing in my discussions with various people from the hardware design community. It always seemed that hardware designers assumed a different basic design – and Grant Martin explained very well just what that was. They only check for interrupts at the beginning of a time slice. This makes interrupts less precise relative to the executing code, but it also keeps the core interpreter fairly simple, since all it has to do is churn through instructions.
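To make the difference concrete, here is a minimal C++ sketch of such a core loop (the names SliceCore, run_one_instruction, and take_interrupt are made up for illustration and are not from the Tensilica ISS):

// Hypothetical sketch of a core that only samples interrupts at the
// start of each time slice and otherwise just churns through instructions.
#include <cstdint>

struct SliceCore {
    uint64_t now = 0;               // current simulated time (cycles)
    bool     irq_pending = false;   // latched by device models

    void run_slice(uint64_t slice_length) {
        // Interrupts are only checked here, once per time slice...
        if (irq_pending) {
            take_interrupt();
            irq_pending = false;
        }
        // ...after which the core runs undisturbed to the end of the slice.
        uint64_t end = now + slice_length;
        while (now < end)
            now += run_one_instruction();   // returns cycles consumed
    }

    void take_interrupt();
    uint64_t run_one_instruction();
};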
There is another solution, employed in Simics, where the processor can take an interrupt at any point in a time quantum. To do this, the processor needs to be aware of what is going to happen. The essence of the solution is to have devices call the processor and tell it that they intend to interrupt it at some point T in time. The processor simulator then makes sure to stop and give the device model a chance to act at that exact point in time. This solution is easily generalized to cover all time callbacks needed to drive device work. A significant part of the responsibility for running the event-driven simulation is thus moved into the processor core.
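As a rough illustration of this scheme (not actual Simics code; post_event, run_until, and the Event type are invented for the sketch), the core keeps an event queue and never executes past the earliest posted event:

#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical sketch: devices post callbacks at absolute times, and the
// core stops at exactly those times, so an interrupt can be delivered at
// the precise instruction boundary it was announced for.
struct Event {
    uint64_t time;
    std::function<void()> action;    // e.g. a device raising an interrupt
    bool operator>(const Event& o) const { return time > o.time; }
};

struct EventDrivenCore {
    uint64_t now = 0;
    std::priority_queue<Event, std::vector<Event>, std::greater<Event>> events;

    // Called by a device model: "call me back (or interrupt me) at time t".
    void post_event(uint64_t t, std::function<void()> action) {
        events.push({t, std::move(action)});
    }

    void run_until(uint64_t quantum_end) {
        while (now < quantum_end) {
            // Stop at the next interesting point: an event or the quantum end.
            uint64_t stop = quantum_end;
            if (!events.empty() && events.top().time < stop)
                stop = events.top().time;

            while (now < stop)
                now += run_one_instruction();

            // Give device models a chance to act at exactly this time.
            while (!events.empty() && events.top().time <= now) {
                auto ev = events.top();
                events.pop();
                ev.action();         // may raise an interrupt into the core
            }
        }
    }

    uint64_t run_one_instruction();
};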
Making the event queue visible to the processor also gives the processor a chance to hypersimulate, or skip idle time. Since it knows the next point in time that something will happen (either the end of a time quantum or an event posted by a device), it can very easily, safely, and repeatably jump forward in time without any impact on simulation semantics.
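Hypersimulation then falls out almost for free. Continuing the hypothetical sketch above, and assuming the core also has a bool idle flag set by a wait-for-interrupt instruction and cleared when an interrupt is delivered, the main loop only needs one extra branch:

// Hypothetical variant of the earlier run_until: if the core is idle,
// nothing observable can happen before the next event or the end of the
// quantum, so simulated time can be advanced in a single jump.
void EventDrivenCore::run_until(uint64_t quantum_end) {
    while (now < quantum_end) {
        uint64_t stop = quantum_end;
        if (!events.empty() && events.top().time < stop)
            stop = events.top().time;

        if (idle) {
            now = stop;                      // skip idle time in one jump
        } else {
            while (now < stop)
                now += run_one_instruction();
        }

        // Dispatch due events exactly as in the previous sketch.
        while (!events.empty() && events.top().time <= now) {
            auto ev = events.top();
            events.pop();
            ev.action();
        }
    }
}

Since the skip only ever jumps to a point in time the core would have reached anyway, the result is the same as if every idle instruction had been executed, which is why it is safe and repeatable.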
When dealing with multiple processors, this means that each processor will see precise interrupts from the devices that are close to it. Timers and IO interrupts tend to work closely with a certain processor for a prolonged period of time. Interrupts between processors sometimes suffer a time-quantum delay, but that is no worse than the solution of checking all interrupts at time-quantum boundaries.
Qemu uses a solution that is a mix of the two. According to the 2005 Usenix paper, devices do call into the processor to announce an interrupt, but this is handled by “soon” returning to the processor main loop. The processors are not responsible for keeping track of interrupts, which makes it imprecise and not very repeatable exactly when interrupts will happen.
Thus, we can see that there are a few different ways to implement interrupts in virtual platforms. Each approach comes from a different tradition and features different trade-offs.
I was a bit surprised by the comment in the Tensilica chapter that only correctly synchronized programs will work on a temporally decoupled simulation. In my experience, temporal decoupling is transparent to software functionality – all software runs. The perceived timing of operations can be different, and some tightly-coupled code might behave in suboptimal ways, but it certainly runs and works. It even lets you observe parallel-code errors.
Temporal decoupling is necessary in any fast platform, and its effect on semantics is really minor. With the simple tweak of having a processor know when interrupts might happen, it also does not affect the device-processor interface very much, maintaining very tight synchronization between processors and the hardware they control.
The statement that only correctly synchronised programmes work on a temporally decoupled simulation is really meant to contrast proper handling of synchronisation with an assumption that synchronisation works only with totally cycle-accurate simulation – in a multiprocessor system. For example, suppose that correct programme behaviour relies on event A occurring on processor X before event B occurs on processor Y. (And yes, some people do write software with these kinds of implicit timing assumptions.) If you run it on a temporally decoupled simulation, out of order, where “cycles” are no longer cycle accurate, and if you use methods such as a direct memory interface in place of modelling memory transactions accurately over a bus, then you might end up with event A occurring on processor X after event B occurs on processor Y. In that case, the system simulation might go very wrong – and often will. This is because synchronisation is assumed based entirely on “correct” timing, rather than on something more explicit and formalised. Is this good software practice? No – but software can be (and all too often is) written with these kinds of implicit assumptions in it. So yes, the software runs, but I would question whether it always “works”. That is what we meant by “correct synchronisation”, although we clearly were not clear enough in stating it.
Thanks for the other kind words.
Grant Martin
Yes. With that kind of timing-dependent coding, anything but a 100% accurate system model might lead to failure. And any change to the system will lead to failure in the real world… it sounds more like the old-fashioned way of coding discussed at http://jakob.engbloms.se/archives/1360
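To illustrate the kind of implicit timing assumption Grant describes, here is a made-up C++ fragment (not from any real code base): core X produces a value and core Y consumes it, with no synchronization beyond an assumed delay.

#include <cstdint>

uint32_t compute_result();            // some computation on core X
void     busy_wait_cycles(uint32_t);  // crude delay loop on core Y
void     use_result(uint32_t);

// Shared memory between the two cores: no lock, no flag, no atomic.
volatile uint32_t shared_value = 0;

// Runs on processor X: "event A".
void producer() {
    shared_value = compute_result();
    // Implicit assumption: Y will not read for at least N cycles.
}

// Runs on processor Y: "event B".
void consumer() {
    busy_wait_cycles(1000);           // "surely X is done by now"
    use_result(shared_value);         // wrong if B effectively runs before A
}

On cycle-accurate hardware (or a cycle-accurate model) the delay loop may happen to hide the race; under temporal decoupling, Y may run its whole quantum before X gets to execute at all, and the bug becomes visible.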