I just found a recent paper on the topic of parallel simulation of computer systems. Christoph Schumacher et al. published an article at CODES+ISSS in October 2010 titled “parSC: Synchronous Parallel SystemC Simulation on Multicore Architectures”. Essentially, parallel SystemC.
This is very much a hot topic: for the past few years, everyone has been looking for ways to run various kinds of simulators in parallel. We had some good discussions on this just last Wednesday at a seminar at KTH where I was presenting Simics.
The approach taken in this paper is different from what is done in tools like Simics (as I briefly discussed at MCC 2009 and the SiCS Multicore Days 2008). They do not exploit temporal decoupling or islands running at different local times. Instead, they keep a single global clock for the entire simulation and parallelize the work done within each cycle.
The key to making this both beneficial and practical is that the work done per cycle must be far greater than the cost of driving the simulation forward “between” cycles. In a high-level TLM model, where the work per cycle might be as small as a single host instruction (a JIT-translated simple integer instruction), this approach obviously would not work at all. However, this work explicitly targets clock-cycle-level simulations, where the work per cycle per hardware unit can be very large. The paper discusses actions taking 1000 to 2000 host cycles per step, and at that level of effort there is definitely potential for parallel gain.
What is nice about the approach is that they peg the semantics to a sequential reference, which aids debugging. Thanks to the very tight synchronization, it would also seem to be deterministic, at least on the same host (the SystemC kernel can in theory behave differently on different hosts).
They do have one example that simulates a shared-memory multiprocessor using temporally decoupled CPU models (100 target cycles per invocation, probably 1000 to 10000 host cycles). This achieves fairly neat speedups on a very symmetric case. However, it comes at the cost of making the simulation nondeterministic – even in the single-threaded case, which is pretty scary.
Overall, an interesting paper showing that there is more to be discovered in parallel simulation.