Threading or Not as a Hardware Modeling Paradigm

Traditional hardware design languages like Verilog were designed to model naturally concurrent behavior, and they naturally leaned on a concept of threads to express it. This idea of independent threads was carried over into the design of SystemC, where it manifested as cooperative multitasking using a user-level threading package. While threads might at first glance look “natural” as a modeling paradigm for hardware simulation, they are really not a good choice for high-performance simulation.

In practice, threading as a paradigm for software models of hardware circuits connected to a programmable processor brings more problems than it provides benefits in terms of “natural” modeling.

As I see it, the main alternative modeling paradigm is a classic event-driven system, where all activity is triggered by events and the associated code runs to completion. Execution then occurs as a series of simulation steps in various parts of the system, rather than as a set of (pseudo)concurrent tasks.

Threaded Problems

The most common complaint about threading is performance. This has become very clear in the case of using SystemC for transaction-level modeling. All advice on how to write good, fast TLM code tells us to use SC_METHODs, which are essentially callbacks rather than active objects in their own right. Note that SystemC models found in the wild are often built on SC_THREADs despite this advice, as that is the “easiest” way to do things. Some convenience mechanisms that are part of the OSCI TLM-2.0 library also rely on threads to convert between AT-style asynchronous and LT-style synchronous function calls (which is pretty unavoidable, but not applicable in the realm of high-performance simulation for virtual platforms).

Furthermore, using threading as a paradigm (even cooperative, single-active-thread threading like in SystemC or classic MacOS) brings with it the problems of concurrent programming: you suddenly need to protect data structures against conflicting accesses, worry about deadlocks, and deal with similar concurrency issues. Without threads, all such issues go away.

Note that using threading as a modeling paradigm with truly concurrent execution of models gives the execution all the problems of parallel programs, especially non-deterministic execution and hard-to-find bugs. At least a cooperative multitasking system tends to be deterministic in the way it goes wrong.

Threading as a hardware-model programming style therefore makes concurrent multithreaded simulation harder rather than easier to achieve, especially if the simulation system specifies an interleaved model of execution as its semantics, as SystemC does. In that case, there is no way to really make SystemC parallel without adding parallelism as some extra library.

However, one of the biggest practical problems with threading is inspecting, changing, and checkpointing simulation state. With threads, you end up having state stored in local variables on the stacks in the system, as well as in processor registers, the program counter, and other places that are hard to get to from the outside. This is not just me saying this; I found it well put in the sampalib white paper:

Using threads means that part of the simulation state is in stacks, which may limit the ability to persist the state of the simulation in checkpoints.

Using wait() implies context switch which are costly in terms of simulation speed, and thus often discouraged in guidelines for modeling SystemC™ models

To drive this point further, all libraries for general program state serialization that I have seen (for C++ and Java, for example) also rely on explicit state stored in objects, and explicitly do not support the “transient” state held in local variables and the program counter. Essentially, only heap-allocated objects are handled by serialization solutions.
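To illustrate the contrast, here is a minimal sketch (the `TimerModel` type and the save/restore functions are hypothetical, not any real library API) of why explicit state is trivial to checkpoint: when everything lives in declared member fields, a checkpoint is just a walk over those fields. There is no equivalent way to capture the locals and program counter of a suspended thread.

```cpp
#include <sstream>
#include <string>

// Hypothetical device model whose entire state is explicit member data.
struct TimerModel {
    unsigned long counter = 0;  // explicit state: trivially serializable
    bool          enabled = false;
};

// Checkpointing is just a matter of walking the declared fields.
std::string save(const TimerModel& m) {
    std::ostringstream os;
    os << m.counter << ' ' << m.enabled;
    return os.str();
}

TimerModel restore(const std::string& checkpoint) {
    std::istringstream is(checkpoint);
    TimerModel m;
    is >> m.counter >> m.enabled;
    return m;
}
```

A thread suspended inside a `wait()` would instead carry part of its state in stack frames and registers, which no save function of this kind can reach.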

Event-Driven Solutions

An event-driven transaction-level hardware simulation is coded in a different way from a naive threaded implementation (but not that differently from a more sophisticated threaded program).

Each device model has to make its state explicit as a set of variables, and preferably also declare them for access by external tools using something like GreenSocs GreenControl or Simics attributes. It also has to expose a set of functions to be called when events happen or when other devices in the simulation send a transaction into the device model.

Additionally, you should encapsulate all state in a model inside the model object and not expose it for direct access from the outside. A pure object-oriented style with accessor functions for everything is required for best modularity.
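A sketch of what such a model can look like, assuming a plain C++ setting (the `UartModel` class and its register layout are invented for illustration, not any real Simics or GreenSocs interface): all state is private and explicit, transactions arrive as function calls, and accessors expose the declared state to external tools.

```cpp
#include <cstdint>

// Illustrative device model: explicit state, encapsulated behind accessors,
// with transaction entry points as plain member functions.
class UartModel {
public:
    // Transaction entry point: called when another model sends a write.
    void write_register(uint32_t offset, uint32_t value) {
        if (offset == 0x0)      tx_data_ = value & 0xFF;
        else if (offset == 0x4) control_ = value;
    }
    // Transaction entry point for reads; runs to completion, no wait().
    uint32_t read_register(uint32_t offset) const {
        if (offset == 0x0) return tx_data_;
        if (offset == 0x4) return control_;
        return 0;
    }
    // Accessors through which an external tool can inspect declared state.
    uint32_t tx_data() const { return tx_data_; }
    uint32_t control() const { return control_; }

private:
    uint32_t tx_data_ = 0;  // all state is explicit and private
    uint32_t control_ = 0;
};
```

Because every piece of state is a declared member reachable through an accessor, inspection and checkpointing need no cooperation from a thread scheduler.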

The advantages of this model are clear:

  • Concurrency problems are reduced, since each function call will run to completion before any other object or function is activated. There is no need to worry about shared data variables, as they should not exist.
  • Checkpointing and inspection is facilitated, since all state is now explicit and declared.
  • Performance is typically increased, since there is no need to do context switches between threads. Locality is also increased by having functions run to completion before returning.
  • True concurrency is easier to achieve, since each model can quite easily be treated as a local-state, shared-nothing, explicit-message-passing component similar to an Erlang process. This makes it possible for the simulation scheduler to run multiple models concurrently on multiple host threads. For more on this topic, see my SiCS Multicore Days 2008 presentation on how Simics was threaded.

The downside is that some people consider the programming style more complicated. This is really a matter of appearance over substance: event-driven programming tends to be more robust and easier to follow in the long run, since threaded programming makes too many things implicit.

Here is a basic example of a thread that does some periodic work.

Threaded style:

  loop forever:
    do work...
    wait(some time)

Event-driven style, where we just repost an event each time we are called:

  do work...
  post event(some time, Time_callback)
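The reposting pattern above can be sketched concretely with a minimal event-driven kernel: a time-ordered queue of callbacks, each run to completion. The `Scheduler` class and `do_work` function here are my own illustration, not the API of any particular simulator.

```cpp
#include <functional>
#include <queue>
#include <vector>

// A pending callback at a simulated point in time.
struct Event {
    long time;
    std::function<void()> callback;
    bool operator>(const Event& o) const { return time > o.time; }
};

// Minimal event-driven kernel: pop the earliest event, run it to completion.
class Scheduler {
public:
    long now = 0;
    void post(long delay, std::function<void()> cb) {
        queue_.push({now + delay, std::move(cb)});
    }
    void run_until(long end) {
        while (!queue_.empty() && queue_.top().time <= end) {
            Event e = queue_.top();
            queue_.pop();
            now = e.time;
            e.callback();  // runs to completion, no context switch
        }
    }
private:
    std::priority_queue<Event, std::vector<Event>, std::greater<Event>> queue_;
};

// Periodic work expressed by reposting the event from its own callback.
void do_work(Scheduler& s, int& work_count) {
    ++work_count;                                 // "do work..."
    s.post(10, [&] { do_work(s, work_count); });  // repost for next period
}
```

Note that no state lives between invocations except what is explicitly stored (`now`, the queue, `work_count`); there is no stack frame suspended in a `wait()`.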

Another advantage of event-driven models is that the paradigm makes it clear that you need to be able to accept any call into the model at any time. This makes for more robust code, since it is quite easy to (intentionally or by mistake) encode an expectation about the sequence of activity in a threaded model that might not match what actually happens at run-time. In particular, the state of any protocol being acted on needs to be explicitly rather than implicitly represented.

There is much more to be said on how to code in this style, but there are long papers out there to read on this.

High-Performance Event-Driven Simulation

Note that in high-performance virtual-platform-style simulation, processors are usually a special case in both threaded and event-driven styles, because the flow of instructions they execute constitutes very many very small actions that cannot afford a context switch between each one. Here, the advantage of the event-driven model is even clearer, given some special-casing of processors. This is another long story that I will not reiterate here, but basically, most events as discussed above will be memory accesses from a processor reading and writing device registers, and each such memory access can be handled in a single simulation step. There is no need to switch context or do anything but handle a simple function call. By not having a wait() call to deal with, this mechanism can be kept simple and cheap, which is essentially what using an SC_METHOD in SystemC gives you. But in the complete absence of SC_THREADs and their ilk, many other things can be optimized even better.
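The single-simulation-step claim can be made concrete with a sketch of a memory map that routes a processor's load or store directly to a device handler (the `Device`, `StatusDevice`, and `MemoryMap` types are invented for illustration; a real simulator would add caching and error handling). The whole access is one virtual call, with no scheduler involvement.

```cpp
#include <cstdint>
#include <map>

// A device exposes register access as plain function calls.
struct Device {
    virtual uint32_t read(uint32_t offset) = 0;
    virtual void write(uint32_t offset, uint32_t value) = 0;
    virtual ~Device() = default;
};

// Toy device with one explicit status register at offset 0.
struct StatusDevice : Device {
    uint32_t status = 0x1;
    uint32_t read(uint32_t offset) override { return offset == 0 ? status : 0; }
    void write(uint32_t offset, uint32_t value) override {
        if (offset == 0) status = value;
    }
};

// Maps address ranges to devices; assumes every accessed address is mapped.
class MemoryMap {
public:
    void map(uint32_t base, Device* d) { devices_[base] = d; }
    // One simulation step per access: a direct call, no wait(), no threads.
    uint32_t load(uint32_t addr) {
        auto it = devices_.upper_bound(addr);
        --it;  // device with the largest base <= addr
        return it->second->read(addr - it->first);
    }
    void store(uint32_t addr, uint32_t value) {
        auto it = devices_.upper_bound(addr);
        --it;
        it->second->write(addr - it->first, value);
    }
private:
    std::map<uint32_t, Device*> devices_;
};
```

Each register access costs one map lookup and one virtual call, which is the kind of per-access cost a fast virtual platform can afford where a thread context switch could not be.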

The End

What I wanted to provide in this almost-article-length post was an idea of the problems that I see threads causing as a modeling paradigm for hardware models, and the advantages offered by a reactive event-driven style. For some reason, this is misunderstood in the modeling community at large, probably because most operating systems and simulation systems in common use today present various forms of threads as the way to model concurrent behavior. However, threads as a prominent user-level programming model are known to be bad in many ways… and modeling is no exception to this rule.

Note that I realize that threads are needed at some level in order to take advantage of multicore hardware, but I think they are best hidden inside a framework that presents simpler, understandable semantics to the user.
