There is an eternal debate going on in virtual platform land over what the right kind of abstraction is for each job. Depending on background, people favor different levels. For those with a hardware background, more details tend to be the comfort zone, while for those with a software background like myself, we are quite comfortable with less details. I recently did some experiments about the use of quite low levels of hardware modeling details for early architecture exploration and system specification.
A common question from simulation users to us simulation providers is “can I simulate a machine with N cores”, where N is “large”. As if running lots of cores was a simulation system or even a hardware problem. In almost all cases, the problem is with software. Creating an arbitrary configuration in a virtual platform is easy. Creating a software stack for that arbitrary platform is a lot harder, since an SMP software stack needs to understand about the cores and how they communicate.
Essentially, what you need is a hardware design that has addressing room for lots of cores, and a software stack that is capable of using lots of cores — even if such configurations do not exist in hardware. Unfortunately, since software is normally written to run on real existing machines, there tends to be unexpected limitations even where scalability should be feasible “in principle”.
Here is the story of how I convinced Linux to handle more than two cores in a virtual MPC8641D machine.
This is a small Linux SMP programming tip, which I had a hard time finding documented clearly anywhere on the web. I guess people won’t find it here either, but with some luck some search engine will pick up on this.
Sometimes it is very reassuring that certain things do not work when tested in practice, especially when you have been telling people that for a long time. In my talks about Debugging Multicore Systems at the Embedded Systems Conference Silicon Valley in 2006 and 2007, I had a fairly long discussion about relaxed or weak memory consistency models and their effect on parallel software when run on a truly concurrent machine. I used Dekker’s Algorithm as an example of code that works just fine on a single-processor machine with a multitasking operating system, but that fails to work on a dual-processor machine. Over Christmas, I finally did a practical test of just how easy it was to make it fail in reality. Which turned out to showcase some interesting properties of various types and brands of hardware and software.