Unknown to most, IBM has one of the world’s longest records of using virtual platforms for software and firmware development and verification. This project has been ongoing since at least the days of the zSeries 900 machines, through z990, z9, and now z10. An excellent article on this virtual platform and its uses is found in the IBM Journal of Research and Development, number 1, 2009, . It is called “IBM System z10 Firmware Simulation”, by Körner et al.
A common question from simulation users to us simulation providers is “can I simulate a machine with N cores”, where N is “large”. As if running lots of cores was a simulation system or even a hardware problem. In almost all cases, the problem is with software. Creating an arbitrary configuration in a virtual platform is easy. Creating a software stack for that arbitrary platform is a lot harder, since an SMP software stack needs to understand about the cores and how they communicate.
Essentially, what you need is a hardware design that has addressing room for lots of cores, and a software stack that is capable of using lots of cores — even if such configurations do not exist in hardware. Unfortunately, since software is normally written to run on real existing machines, there tends to be unexpected limitations even where scalability should be feasible “in principle”.
Here is the story of how I convinced Linux to handle more than two cores in a virtual MPC8641D machine.
The best way to learn something is to try, fail, and then try again. That is how I just learned the basics of multiprocessor interrupt management. For an educational setup, I have been creating a purely virtual virtual platform from scratch. This setup contains a large number of processors with local memory, and then a global shared memory, as well as a means for the processors to interrupt each other in order to notify about the presence of a message or synchronize in general. Getting this really right turned out to be not so easy.
I have an article about ecosystem enablement for new hardware, co-authored with Richard Schnur of Freescale published in the December 2008 issue of EDA Tech Forum. The core concept is that a virtual platform solution makes it possible to get a new chip to market faster with better software support, and even enables virtual design-in of a chip at OEM customers before hardware becomes available. The article builds on our joint experience with the QorIQ P4080 launch in the Summer of 2008, where we had several operating systems and middleware packages in place at the moment the chip was announced. EDA Tech Forum requires registration, but it was still free, and there are many other good articles available.
Traditional hardware design languages like Verilog were designed to model naturally concurrent behavior, and they naturally leaned on a concept of threads to express this. This idea of independent threads was brought over into the design of SystemC, where it was manifested as cooperative multitasking using a user-level threading package. While threads might at first glance look “natural” as a modeling paradigm for hardware simulations, it is really not a good choice for high-performance simulation.
In practice, threading as a paradigm for software models of hardware circuits connected to a programmable processor brings more problems than it provides benefits in terms of “natural” modeling.
Now I am home again, and some days have passed since the IP 08 panel discussion about software and hardware virtual platforms. This was an EDA hardware-oriented conference, and thus the audience was quite interested in how to tie things to hardware design. Any case, it was a fun panel, and Pierre Bricaud did a good job of moderating and keeping things interesting.
I have an article appearing in the latest issue of Elektronik i Norden, about using virtual platforms for multicore computer systems. It is framed in the context of the Freescale multicore push, in particular the QorIQ P4080, and addresses the common issues of debug, execution speed, and the need to zoom in on details every once in a while.
On Wednesday this week, I will take part of a panel discussion about virtual platforms and using them for software development, at the IP08 conference in Grenoble in France. We have a good crew, including Markus Willems from Synopsys, Peter Flake from ELDA, and Loic le Toumelin from TI (who I have not met before).
I just read an interesting paper from the 2004 Embedded System’s Conference (ESC) written by Gary Stringham. It is called “ASIC Design Practices from a Firmware Perspective” and straddles the boundary between hardware design and driver software development. It was good to see someone take the viewpoint of “how you actually program a hardware device is as important as what it does”. Gary seems to understand both the hardware design and implementation view of things, as well as that of the embedded software engineer. To me, that seems to be a fairly rare combination of skills, to the detriment of our entire economy of computer system development.
I just read a fairly interesting book about the British Spitfire fighter plane of World War 2. The war bits were fairly boring, actually, but the development story was all the more interesting. I find it fascinating to read about how aviation engineers in the 1930s experiment and guess their way from the slow unwiedly biplanes of World War 1 and the 1920s to the sleek very fast aircraft of 1940 and beyond. It is a story that also has something tell us about contemporary software development and optimization.
Over the past few weeks there was a interesting exchange of blog posts, opinions, and ideas between Frank Schirrmeister of Synopsys and Ran Avinun of Cadence. It is about virtual platforms vs hardware emulation, and how to do low-power design “properly”. Quite an interesting exchange, and I think that Frank is a bit more right in his thinking about virtual platforms and how to use them. Read on for some comments on the exchange.
To continue from last week’s post about my Linux device driver and hardware teaching setup in Simics, here is a lesson I learnt this week when doing some performance analysis based on various hardware speeds.
There are times when working with virtual hardware and not real hardware feels very liberating and efficient (not to mention safe). Bringing up, modifying, and extending operating systems is one obvious such case. Recently, I have been preparing an open-source-based demonstration and education systems based on embedded PowerPC machines, and teaching myself how to do Linux device drivers in the process. This really brought out the best in virtual platform use.
Cadence technical blogger Jason Andrews wrote a short piece a couple of days ago on his perception that host-based execution is becoming unncessary thanks to fast virtual platforms. In “Is Host-Code Execution History“, he tells the story of a technique from long time ago where a target program was executed directly on the host, and memory accesses captured and passed to a Verilog simulator. The problem being solved was the lack of a simulator for the MIPS processor in use, and the solution was pretty fast and easy to use. Quite interesting, and well worth a read.
However, like all host-compiled execution (which I also like to call API-level simulation) it suffered from some problems, and virtual platforms today might offer the speed of host-compiled simulation without all the problems.
Chip Design Magazine published an article by me in their August/September 2008, about Getting Software into the Hardware Design Loop. The article is about the technical and marketing aspects of how chip designers can get early feedback from software and systems designers, early in the hardware design process. The vehicle for this? Virtual platforms, obviously.
The article/editoral “Using virtual platforms to improve AdvancedTCA software development practice” is now up at CompactPCI and AdvancedTCA Systems, an online and paper journal for the rack-based market. It is about our experience at Virtutech in using virtual platforms to drive system and software development for “pretty large” target systems, even those based on standard hardware.
And really, there is no such thing as a standard embedded system. Even if you use a standard backplane and buy off-the-shelf boards and cards to put in it, the combination of cards and added mezzanine cards makes each system quite unique. If you could use completely standard PC hardware for your system with no custom additions or special IO units, the thing would in likelihood not actually be an embedded system.
I just read the panel interview at the start of the latest issue (Number 4, 2008) of ACM Queue. Here, you have Bryan Cantrill of Sun (the man behind dTrace) bemoan the difficulty of testing faults. In particular:
Part of the reason I’m interested in virtualization is as a development methodology. It has not delivered on this, but one of the things that I ask is can I use virtualization to automate someone pulling the Ethernet cable out of the jack? I can get a lot closer to simulating it if you let me create a toy virtual machine than I can running on the live machine.
Well, this already exists. It is a common feature to any virtual platform that is not a datacenter-oriented runtime engine like VmWare, Xen, LPAR, and its ilk. Doing fault injection is a primary use case for virtual platforms, especially for larger servers and systems featuring redundancy and fault tolerance.
I just read an opinion-provoking piece “Software developer attitudes: just get on with it” by Frank Schirrmeister, as well as the article “Life imitating art: Hardware development imitating software development” by Glenn Perry that he linked to. Both these articles touch on the long-standing question of who does development the “best” in computing. I have heard these arguments many times, where software developers think that there is something mythical about hardware development that makes things work so much better with much fewer bugs, and hardware people looking at the speed of development and fanciful fireworks of coding that software engineers can do. It could be a case of the grass always looking greener on the other side… but there are some concrete things that are relevant here.
In the July 2008 issue of IEEE Computer, there is short article called “In Praise of Scripting: Real Programming Pragmatism“, by Ronald P. Loui, a professor at Washington University (WUSTL). The article deals with the issue of what is the appropriate first language to teach new CS (Computer Science) students, and considers that a “scripting” langauge like Python or Ruby might be way better than Java (no doubt about that I think).
What can this teach us for the purpose of simulation and the creation of models of computer system hardware for the purpose of simulation? Maybe a fair bit…
As might be evident from this blog, I do have a certain interest in history and the history of computing in particular. One aspect where computing and history collide in a not-so-nice way today is in the archiving of digital data for the long term. I just read an article at Forskning och Framsteg where they discuss some of the issues that use of digital computer systems and digital non-physical documents have on the long-term archival of our intellectual world of today. Basically, digital archives tend to rot in a variety of ways. I think virtual platform technology could play a role in preserving our digital heritage for the future.
The book “Taxonomies for the Development and Verification of Digital Systems“, edited by Brian Bailey, Grant Martin, and Thomas Andersson, was published in 2005 by Springer Verlag. It is a legacy of the defunct VSIA, and presents an attempt to bring order to nomenclature and taxonomies in the chip design field (its scope is defined to be broader than that, but in essence, the book is about SoC design for the most part).
In a funny coincidence, I published an article at SCDSource.com about the need for cycle-accurate models for virtual platforms on the same day that ARM announced that they were selling their cycle-accurate simulators and associated tool chain to Carbon Technology. That makes one wonder where cycle-accuracy is going, or whether it is a valid idea at all… is ARM right or am I right, or are we both right since we are talking about different things?
Let’s look at this in more detail.
I have another opinion piece published over at SCDsource.com. The title, “Why virtual platforms need cycle-accurate models“, was their creation, not mine, and I think it is a little bit off the main message of the piece.The follow-up discussion is also fairly interesting.
The key thing that I want to get across is that we need virtual platforms where we can spend most of our time executing in a fast, not-very-detailed mode to get the software somewhere interesting. Once we get to the interesting spot, we can then switch to more detailed models to get detailed information about the software behavior and especially its low-level timing. Getting to that point in detailed mode is impossible since it would take too much time.
This is something that computer architecture researchers have been doing for a very long time, just look at how toolsets like SimpleScalar and Simics with the Wisconsin GEMS system use fast mode for “positioning” and more detailed execution for “measurement”. It is also what is now commercial with the Simics Freescale QorIQ P4080 Hybrid virtual platform. Tensilica also have the ability to switch mode in their toolchain.
See an upcoming post for more on how to get at the cycle-accurate models – this was just to point out that that the article is there, for symmetry with previous posts about my articles popping up in places.
The slides from the Power Architecture Conference in München and Paris are now online (and have been for a few weeks) at the Power.org site for the event. Some interesting things there about Power Architecture in particular but also virtual platforms were an almost main theme of the show.
YouTube – Freescale QorIQ P4080 Hybrid Simulation is a video of a demo of the QorIQ P4080 hybrid simulation. Cool of Freescale to be publishing it like this, I think it is a very smart move!
Updated: Here is the video inline, let’s see if this works.
Only half an hour ago, the embargoes lifted. Freescale announced its new QorIQ series of multicore (and some single- and dual-core) processors. For the top-end of that line, the P4080, Freescale and Virtutech (where I work, remember) have developed a virtual platform solution to help Freescale customers get to working products faster. The virtual platform is available now, and is already running several operating systems including VxWorks, QNX, and a variety of Linuxes. Apart from the fairly large scale of this SoC, the really new part of the virtual platform is the so-called Hybrid solution, where the fast models are combined with detailed models from Freescale themselves. This creates a cycle-level detailed model with validated timing, “from the source” — but without the performance issues of having to run everything at great level of detail. Rather, you use the fast model to steer the simulation of a workload to an interesting spot, and then turn up the level of detail then and there. You can also select which components of the chip are actually detailed and which parts are modeled with the fast functional models, avoiding the incredible slow-down of running and entire virtual platform at a great level of detail.
If you happen to be at the FTF in Orlando, do come by and look at the demos!
I have been involved in this work for the past year, and it is wonderful to finally see it coming out and be able to talk about it.
As a follow-up to my previous post on the scope of ESL, I found a nice tidbit in an EETimes article… basically saying that hardware design is declining inside the typical system houses.
SystemC TLM-2.0 has just been released, and on the heels of that everyone in the EDA world is announcing various varieties of support. TLM-2.0-compliant models, tools that can run TLM-2.0 models, and existing modeling frameworks that are being updated to comply with the TLM-2.0 standard. All of this feeds a general feeling that the so-called Electronic System Level design market (according to Frank Schirrmeister of Synopsys, the term was coined by Gary Smith) is finally reaching a level of maturity where there is hope to grow the market by standards. This is something that has to happen, but it seems to be getting hijacked by a certain part of the market addressing the needs of a certain set of users.
There is more to virtual platforms than ESL. Much more. Remember the pure software people.
Edit: Maybe it is more correct to say “there is more to virtual platforms than SoC”, as that is what several very smart comments to this post has said. ESL is not necessarily tied to SoC, it is in theory at least a broader term. But currently, most tools retain an SoC focus.
Being a bit of a computer history buff, I am often struck by how most key concepts and ideas in computer science and computer architecture were all invented in some form or the other before 1970. And commonly by IBM. This goes for caches, virtual memory, pipelining, out-of-order execution, virtual machines, operating systems, multitasking, byte-code machines, etc. Even so, I have found a quite extraordinary example of this that actually surprised me in its range of modern techniques employed. This is a follow-up to a previous post, after having actually digested the paper I talked about earlier.
On Tuesday next week, I will be presenting at the Power Architecture Conference (PAC) in München, Germany. The topics will be multicore debug using virtual hardware, and the new Simics Accelerator technology. Especially Simics Accelerator is pretty interesting technology.
It is a simple idea, using multiple host cores to run a virtual platform, with fairly amazing results. Now, using a single computer we can run fairly incredible simulations that were the realm of pure fantasy just a few years ago. We also got a nice new little box to demonstrate it with, an eight-core Dell with 16 GB of RAM. With 64-bit Linux, this thing makes my Core 2 Duo laptop with 32-bit Vista look like yesteryear’s snail… And creates that giggling feeling that a really impressive new toy brings up in even the most grown up boys. Booting a 16-machine network of PowerPC boards was so fast it was not demoworthy. I think we have to up the ante to some 100 target machines to make it interesting, and I have no doubt that a combination of multithreading and idle-loop optimization will make that thing be usefully interactive from the target command lines. There are many other wild things we could try on that demo box, once it gets back from the Power Architecture Conferences tour.