Cadence technical blogger Jason Andrews wrote a short piece a couple of days ago on his perception that host-based execution is becoming unnecessary thanks to fast virtual platforms. In “Is Host-Code Execution History”, he tells the story of a technique from long ago where a target program was executed directly on the host, with memory accesses captured and passed to a Verilog simulator. The problem being solved was the lack of a simulator for the MIPS processor in use, and the solution was pretty fast and easy to use. Quite interesting, and well worth a read.
However, like all host-compiled execution (which I also like to call API-level simulation), it suffered from some problems, and virtual platforms today can offer the speed of host-compiled simulation without those problems.
The problems are these:
Most companies that use host-code execution today use “explicit access”. This means they require all places in the code that access the hardware to call read() and write() functions, so every hardware access goes through a common set of functions. They then use #ifdef to change the hardware accesses to call the simulator when doing verification with host-code execution; when running on the target system, pointer dereferences are used.
This is where implicit access came in. It provided a way to automatically trap pointer dereferences that read and write hardware locations and convert the load or store instruction into a simulated read or write. For reads, it would put the result into the proper host CPU register, and the user had no idea that a line of C code would magically turn into a bus transaction on a Verilog BFM.
Yes, that is a right pain, and I have seen lots of solutions for it, none of which have the elegant simplicity of a processor simulation. The “implicit access” system is basically trying to trap memory accesses without overtly changing the source code of a program. I guess the best way to do this is binary instrumentation, but it is still very hard to get working right and robustly. A simulator is just much simpler in principle here.
Jason continues later on:
Given the hassle of host-code execution, I would prefer to cross compile the software and run the target instruction set. Beyond the implicit or explicit access issue, this also eliminates issues with differences in data type sizes, data structure layout, byte order (endianness), and other differences between the host and target processor.
That is absolutely true! Jason does not mention the additional fun of what happens when the target is running an OS that is happily fielding interrupts, scheduling software tasks, and so on. Also, having to maintain a separate build target and maybe a code variant is very expensive, process-wise. The expense that a good virtual platform incurs can be recouped pretty quickly once such friction costs are factored in.
So I guess I pretty much agree with all that Jason is saying, and I thank him for mentioning Simics. Thanks for the insights into what was done in the 1990s; it is always interesting to get pointers to old, fundamental, and interesting work.
About how the virtual platforms actually work inside: it is not that complicated in principle (but pretty hairy to get quite right and fast in practice). You have to simplify the timing of the target processor, you have to convert from target processor binaries to host binary format using some kind of just-in-time compilation technique (also called dynamic binary translation or code morphing), and you have to provide some kind of direct access to target memory for the target processor simulation (like the DMI feature in SystemC TLM-2.0, but usually the difficult bits are on the CPU side of that, not the memory side). The most interesting bit is how to build the surrounding system model so that it does not slow the CPU model down, and for this I can recommend a couple of pieces of writing: