There are times when working with virtual hardware rather than real hardware feels very liberating and efficient (not to mention safe). Bringing up, modifying, and extending operating systems is one obvious such case. Recently, I have been preparing an open-source-based demonstration and education system built around embedded PowerPC machines, teaching myself how to write Linux device drivers in the process. This really brought out the best in virtual platform use.
I keep looking out for interesting examples of parallel software, and there is a constant trickle of them. This past week I spotted a couple of new ones in the EDA field: SPICE simulation and chip timing analysis.
Cadence technical blogger Jason Andrews wrote a short piece a couple of days ago on his perception that host-based execution is becoming unnecessary thanks to fast virtual platforms. In “Is Host-Code Execution History”, he tells the story of a technique from long ago where a target program was executed directly on the host, with memory accesses captured and passed to a Verilog simulator. The problem being solved was the lack of a simulator for the MIPS processor in use, and the solution was pretty fast and easy to use. Quite interesting, and well worth a read.
However, like all host-compiled execution (which I also like to call API-level simulation) it suffered from some problems, and virtual platforms today might offer the speed of host-compiled simulation without all the problems.
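To make the mechanics concrete, here is a minimal sketch (in C, with every name invented purely for illustration, not taken from the article) of how host-code execution looks from the software side: the application is compiled for and runs on the host, and only the device accesses are funneled through functions that would hand them over to a hardware simulator.

```c
/* Hypothetical sketch of host-code execution: the application runs natively
 * on the host, but device accesses are routed to a hardware simulator
 * instead of touching real registers. All names here are made up. */
#include <stdint.h>
#include <stdio.h>

/* In a real setup these would hand the access to e.g. a Verilog simulator
 * over some communication channel; here we just log them. */
static uint32_t bus_read32(uint32_t addr)
{
    printf("bus read  @ 0x%08x\n", (unsigned)addr);
    return 0; /* the value would come back from the simulator */
}

static void bus_write32(uint32_t addr, uint32_t value)
{
    printf("bus write @ 0x%08x = 0x%08x\n", (unsigned)addr, (unsigned)value);
}

/* Device access macros used instead of raw pointer dereferences. */
#define REG_READ(addr)        bus_read32(addr)
#define REG_WRITE(addr, val)  bus_write32((addr), (val))

#define UART_BASE   0x80001000u
#define UART_STATUS (UART_BASE + 0x0)
#define UART_TX     (UART_BASE + 0x4)

int main(void)
{
    /* Target code runs directly on the host; only the I/O goes to the
     * simulated hardware. */
    while (REG_READ(UART_STATUS) & 0x1)
        ; /* spin while the simulated UART reports busy */
    REG_WRITE(UART_TX, 'A');
    return 0;
}
```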
Chip Design Magazine published an article by me in their August/September 2008 issue, about Getting Software into the Hardware Design Loop. The article covers the technical and marketing aspects of how chip designers can get feedback from software and systems designers early in the hardware design process. The vehicle for this? Virtual platforms, obviously.
And really, there is no such thing as a standard embedded system. Even if you use a standard backplane and buy off-the-shelf boards and cards to put in it, the combination of cards and added mezzanine cards makes each system quite unique. If you could use completely standard PC hardware for your system with no custom additions or special IO units, the thing would in all likelihood not actually be an embedded system.
There were some interesting comments on how to define efficiency in a world of plentiful cores. The theme from my previous blog post, “Real-Time Control when Cores Become Free”, came up several times during the talks, panels, and discussions. It seems that this year, everybody agreed that we are heading toward 100s or 1000s of “self-respecting” cores on a single chip, and that with that kind of core count, it is not very important to keep them all busy at all times at any cost. As I stated earlier, cores and instructions are now free, while other aspects are limiting, turning the classic optimization imperatives of computing on their head. Operating systems will become more about space-sharing than time-sharing, and it might make sense to dedicate processing cores to the sole job of impersonating peripheral units or doing polling work. Operating systems can also be simplified when the job of time-sharing is taken away, even if communication and resource management might well bring in some new and interesting issues.
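As a rough illustration of the “dedicate a core to polling” idea (my own toy C sketch, not something from the talks), here one thread does nothing but spin on a shared mailbox, playing the role that an interrupt mechanism would otherwise play:

```c
/* Sketch: a "wasted" core whose only job is to poll a mailbox and service
 * whatever shows up, instead of relying on interrupts. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int mailbox = 0;   /* 0 = empty, otherwise a request code */
static atomic_int done    = 0;

static void *polling_core(void *arg)
{
    (void)arg;
    while (!atomic_load(&done)) {
        int req = atomic_exchange(&mailbox, 0);
        if (req != 0)
            printf("service request %d\n", req);
        /* No sleeping, no interrupts: this core's only job is to poll. */
    }
    return NULL;
}

int main(void)
{
    pthread_t poller;
    pthread_create(&poller, NULL, polling_core, NULL);

    for (int i = 1; i <= 3; i++) {
        while (atomic_load(&mailbox) != 0)
            ;                        /* wait for the poller to pick it up */
        atomic_store(&mailbox, i);   /* "send" work to the dedicated core */
    }
    while (atomic_load(&mailbox) != 0)
        ;                            /* let the last request drain */

    atomic_store(&done, 1);
    pthread_join(poller, NULL);
    return 0;
}
```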
So, what is efficiency in this kind of environment?
Everybody seems to think the launch of the Google Chrome browser is very important and cool. Probably because Google itself is considered important and cool. I am a bit more skeptical about the whole Google thing, as they seem to be building themselves into a pretty dangerous monopoly… but there are some interesting architectural and parallel computing aspects to Chrome — and Internet Explorer 8, it turns out.
Part of the reason I’m interested in virtualization is as a development methodology. It has not delivered on this yet, but one of the things I ask is: can I use virtualization to automate someone pulling the Ethernet cable out of the jack? I can get a lot closer to simulating that if you let me create a toy virtual machine than I can running on the live machine.
Well, this already exists. It is a common feature of any virtual platform that is not a datacenter-oriented runtime engine like VMware, Xen, LPAR, and their ilk. Fault injection is a primary use case for virtual platforms, especially for larger servers and systems featuring redundancy and fault tolerance.
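To see why this is so natural in a simulator, consider a toy device model sketch in plain C (no real virtual-platform API is used here): the Ethernet link is just a flag in the model, and “pulling the cable” is a one-line fault-injection call that a test script could make at any point.

```c
/* Illustrative sketch only: the link is just a flag in the device model,
 * and a fault-injection hook flips it. */
#include <stdbool.h>
#include <stdio.h>

struct sim_nic {
    bool link_up;
    unsigned long dropped;
};

/* Called by the simulated network when a frame arrives for this NIC. */
static void nic_receive_frame(struct sim_nic *nic, const char *frame)
{
    if (!nic->link_up) {
        nic->dropped++;          /* cable "unplugged": silently drop */
        return;
    }
    printf("delivered frame: %s\n", frame);
}

/* The fault-injection hook a test script would call. */
static void nic_set_link(struct sim_nic *nic, bool up)
{
    nic->link_up = up;
    printf("link is now %s\n", up ? "up" : "down");
}

int main(void)
{
    struct sim_nic nic = { .link_up = true, .dropped = 0 };

    nic_receive_frame(&nic, "hello");      /* arrives normally */
    nic_set_link(&nic, false);             /* "pull the cable" */
    nic_receive_frame(&nic, "lost");       /* dropped by the model */
    nic_set_link(&nic, true);              /* plug it back in */
    printf("frames dropped while unplugged: %lu\n", nic.dropped);
    return 0;
}
```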
What can this teach us about simulation and about creating models of computer system hardware? Maybe a fair bit…
As might be evident from this blog, I do have a certain interest in history, and the history of computing in particular. One aspect where computing and history collide in a not-so-nice way today is the archiving of digital data for the long term. I just read an article in the Swedish popular science magazine Forskning och Framsteg discussing some of the issues that the use of digital computer systems and non-physical digital documents creates for the long-term archiving of today’s intellectual world. Basically, digital archives tend to rot in a variety of ways. I think virtual platform technology could play a role in preserving our digital heritage for the future.
The book “Taxonomies for the Development and Verification of Digital Systems”, edited by Brian Bailey, Grant Martin, and Thomas Andersson, was published in 2005 by Springer Verlag. It is a legacy of the now-defunct VSIA, and presents an attempt to bring order to the nomenclature and taxonomies of the chip design field (its scope is defined to be broader than that, but in essence the book is mostly about SoC design).
In early July, Cadence announced their new “C2S” C-to-silicon compiler. This event was marked with some excitement and blogging in the EDA space (SCDSource, EDN-Wilson, CDM-Martin, to give some links for more reading). At its core, I agree that what they are doing is fairly cool — taking an essentially hardware-unrelated sequential program in C and creating hardware from it. It is the kind of heavy technology that I have come to admire in the EDA space.
In a funny coincidence, I published an article at SCDSource.com about the need for cycle-accurate models for virtual platforms on the same day that ARM announced that they were selling their cycle-accurate simulators and associated tool chain to Carbon Design Systems. That makes one wonder where cycle-accuracy is going, or whether it is a valid idea at all… is ARM right or am I right, or are we both right since we are talking about different things?
I have another opinion piece published over at SCDsource.com. The title, “Why virtual platforms need cycle-accurate models”, was their creation, not mine, and I think it is a little bit off the main message of the piece. The follow-up discussion is also fairly interesting.
The key thing that I want to get across is that we need virtual platforms where we can spend most of our time executing in a fast, not-very-detailed mode to get the software somewhere interesting. Once we get to the interesting spot, we can switch to more detailed models to get detailed information about the software behavior and especially its low-level timing. Getting to that point entirely in detailed mode is impractical, since it would simply take too much time.
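Here is a hand-wavy sketch of that two-speed idea, with all types, functions, and numbers invented for illustration: run a fast functional model until the software reaches the interesting spot, then continue with the detailed (and much slower) model.

```c
/* Sketch of running fast until something interesting happens, then
 * switching to a detailed (slow) model. Purely illustrative. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct cpu_state { uint64_t pc; uint64_t cycles; };

/* Fast functional step: no timing detail, just forward progress. */
static void fast_step(struct cpu_state *s)     { s->pc += 4; s->cycles += 1; }

/* Detailed step: models pipeline and caches, so it costs much more host time. */
static void detailed_step(struct cpu_state *s) { s->pc += 4; s->cycles += 3; }

static bool reached_interesting_spot(const struct cpu_state *s)
{
    return s->pc >= 0x1000;   /* e.g. entry of the function under study */
}

int main(void)
{
    struct cpu_state s = { .pc = 0, .cycles = 0 };

    while (!reached_interesting_spot(&s))
        fast_step(&s);                      /* boot, load, get into position */

    printf("switching to detailed mode at pc=0x%llx\n",
           (unsigned long long)s.pc);

    for (int i = 0; i < 100; i++)           /* now pay for the detail */
        detailed_step(&s);

    printf("done, %llu simulated cycles\n", (unsigned long long)s.cycles);
    return 0;
}
```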
See an upcoming post for more on how to get at the cycle-accurate models – this was just to point out that the article is there, for symmetry with previous posts about my articles popping up in various places.
The slides from the Power Architecture Conference in München and Paris are now online (and have been for a few weeks) at the Power.org site for the event. There is some interesting material there about Power Architecture in particular, but virtual platforms were also almost a main theme of the show.
Only half an hour ago, the embargoes lifted. Freescale announced its new QorIQ series of multicore (and some single- and dual-core) processors. For the top end of that line, the P4080, Freescale and Virtutech (where I work, remember) have developed a virtual platform solution to help Freescale customers get to working products faster. The virtual platform is available now, and is already running several operating systems including VxWorks, QNX, and a variety of Linuxes. Apart from the fairly large scale of this SoC, the really new part of the virtual platform is the so-called Hybrid solution, where the fast models are combined with detailed models from Freescale themselves. This creates a cycle-level detailed model with validated timing, “from the source” — but without the performance issue of having to run everything at a great level of detail. Rather, you use the fast model to steer the simulation of a workload to an interesting spot, and then turn up the level of detail then and there. You can also select which components of the chip are detailed and which parts are modeled with the fast functional models, avoiding the incredible slow-down of running an entire virtual platform at a great level of detail.
If you happen to be at the FTF in Orlando, do come by and look at the demos!
I have been involved in this work for the past year, and it is wonderful to finally see it coming out and being able to talk about it.
SystemC TLM-2.0 has just been released, and on the heels of that everyone in the EDA world is announcing various varieties of support: TLM-2.0-compliant models, tools that can run TLM-2.0 models, and existing modeling frameworks being updated to comply with the TLM-2.0 standard. All of this feeds a general feeling that the so-called Electronic System Level design market (according to Frank Schirrmeister of Synopsys, the term was coined by Gary Smith) is finally reaching a level of maturity where there is hope of growing the market through standards. This is something that has to happen, but it seems to be getting hijacked by a certain part of the market addressing the needs of a certain set of users.
There is more to virtual platforms than ESL. Much more. Remember the pure software people.
Edit: Maybe it is more correct to say “there is more to virtual platforms than SoC”, as several very smart comments on this post have said. ESL is not necessarily tied to SoC; in theory at least, it is a broader term. But currently, most tools retain an SoC focus.
Being a bit of a computer history buff, I am often struck by how most key concepts and ideas in computer science and computer architecture were invented in some form or other before 1970. And commonly by IBM. This goes for caches, virtual memory, pipelining, out-of-order execution, virtual machines, operating systems, multitasking, byte-code machines, etc. Even so, I have found a quite extraordinary example that actually surprised me with the range of modern techniques it employed. This is a follow-up to a previous post, now that I have actually digested the paper I talked about earlier.
On Tuesday next week, I will be presenting at the Power Architecture Conference (PAC) in München, Germany. The topics will be multicore debug using virtual hardware, and the new Simics Accelerator technology. Simics Accelerator in particular is pretty interesting technology.
It is a simple idea, using multiple host cores to run a virtual platform, with fairly amazing results. Using a single computer, we can now run simulations that were the realm of pure fantasy just a few years ago. We also got a nice new little box to demonstrate it with, an eight-core Dell with 16 GB of RAM. With 64-bit Linux, this thing makes my Core 2 Duo laptop with 32-bit Vista look like yesteryear’s snail… and it creates that giggling feeling that a really impressive new toy brings out in even the most grown-up boys. Booting a 16-machine network of PowerPC boards was so fast it was not demoworthy. I think we have to up the ante to some 100 target machines to make it interesting, and I have no doubt that a combination of multithreading and idle-loop optimization will keep such a setup usefully interactive from the target command lines. There are many other wild things we could try on that demo box, once it gets back from the Power Architecture Conferences tour.
By means of a trip down virtualization history, I found a real gem in a 1969 paper called A program simulator by partial interpretation, by Kazuhiro Fuchi, Hozumi Tanaka, Yuriko Manago, and Toshitsugu Yuba of the Japanese Government Electrotechnical Laboratory. It was published at the second Symposium on Operating Systems Principles (SOSP) in 1969. It describes a system where regular target instructions are directly interpreted, and any privileged instructions are trapped and simulated. Very similar to how VMware does it for x86, or any other modern virtualization solution.
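As a toy illustration of the partial-interpretation idea (nothing like the real 1969 system or VMware's actual code, just my own sketch), consider an interpreter loop where ordinary instructions execute “directly” and only privileged ones are caught and handed to a simulation routine:

```c
/* Toy illustration of trap-and-simulate: ordinary instructions run
 * "directly", only privileged ones are intercepted. */
#include <stdio.h>

enum opcode { OP_ADD, OP_LOAD, OP_IO_OUT, OP_HALT };  /* OP_IO_OUT is privileged */

struct insn { enum opcode op; int arg; };

static int acc;   /* a one-register "machine" */

static void simulate_privileged(const struct insn *i)
{
    /* In a real system, this is where control passes to the simulator/VMM. */
    printf("trapped privileged instruction: OUT %d (acc=%d)\n", i->arg, acc);
}

int main(void)
{
    const struct insn program[] = {
        { OP_ADD, 5 }, { OP_ADD, 7 }, { OP_IO_OUT, 1 }, { OP_HALT, 0 },
    };

    for (const struct insn *i = program; ; i++) {
        switch (i->op) {
        case OP_ADD:    acc += i->arg; break;            /* "direct" execution */
        case OP_LOAD:   acc  = i->arg; break;
        case OP_IO_OUT: simulate_privileged(i); break;   /* trap and simulate */
        case OP_HALT:   return 0;
        }
    }
}
```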
The Register has a few podcasts in addition to their website, and the one called “Semicoherent Computing” has turned into a very nice series of interviews with interesting people from the computer industry. I recently listened to their interview from September 2007 with David Ditzel of Transmeta fame. He had a lot to say about the history of computing, as well as interesting things about where computing is heading. Well worth a listen! Particularly interesting highlights…
Now ESC SV 2008 is over. I really enjoyed going to the show this year and presenting on simulation for embedded systems. The topic must be heating up: some fifty people listened to the talk, which is really very good. I hope they learnt how to build good transaction-level hardware models and got some idea of how to apply this to their own projects. Hopefully, I can come back next year for ESC 2009 (update: this did not happen) and do it again (even though the recent travel trouble makes it a less attractive idea to fly back here right now…).
It must have been Google Alerts that sent me a link to the HOTOS 2007 (Hot Topics in Operating Systems) paper by Tal Garfinkel, Keith Adams, Andrew Warfield, and Jason Franklin called Compatibility is not Transparency: VMM Detection Myths and Realities. This paper is slightly less than a year old today, so it is old by blog standards and quite recent by research paper standards. It deals with the interesting problem of whether a virtual machine can be made undetectable to software running on it — software that is actively trying to detect it. Their conclusion is that it is not feasible, and I agree with that. The reason WHY that is the case can use some more discussion, though… and here is my take on that issue from a Simics/embedded systems virtualization perspective.
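One classic style of detection argument is based on timing. The sketch below (x86 with GCC only, with made-up iteration counts, and not taken from the paper) times a batch of CPUID instructions, which typically cause a world switch under a VMM, against a batch of plain arithmetic; an anomalously large ratio hints at a VMM. It only illustrates the general kind of check discussed in this literature, not a reliable test.

```c
/* Timing-based VMM-detection sketch. x86 + GCC only; illustrative, not robust. */
#include <cpuid.h>
#include <stdio.h>
#include <time.h>

static double elapsed_ms(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void)
{
    enum { N = 100000 };
    unsigned int eax, ebx, ecx, edx;
    volatile unsigned long sink = 0;
    struct timespec t0, t1, t2;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++)
        __get_cpuid(0, &eax, &ebx, &ecx, &edx);  /* traps to the VMM, if any */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    for (int i = 0; i < N; i++)
        sink += i * 2654435761u;                 /* baseline: pure arithmetic */
    clock_gettime(CLOCK_MONOTONIC, &t2);

    double cpuid_ms = elapsed_ms(t0, t1), alu_ms = elapsed_ms(t1, t2);
    printf("cpuid: %.2f ms, alu: %.2f ms, ratio %.1f\n",
           cpuid_ms, alu_ms, cpuid_ms / (alu_ms > 0 ? alu_ms : 1e-9));
    /* A very large ratio is a hint, no more than that, of running under a VMM. */
    return 0;
}
```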
Power.org publishes a quarterly newsletter over at www.power.org/news/newsletter. In the April 2008 issue it features a short article by me introducing Simics 4.0 and Simics Accelerator, the way in which Virtutech Simics takes advantage of multicore processors to simulate large target systems using a multithreaded simulator.
I have an article at SCDSource.com about how virtual platform creation needs to become more efficient, and about Virtutech's current solution to that issue: DML, the Device Modeling Language. There is no need to repeat the contents here, just head over to www.scdsource.com/article.php?id=166 to read it! I really think that DML has something to contribute in the world of virtual platforms. We need to find ways to be more efficient about how we create models, and that means creating a better programming language.
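To show the kind of boilerplate I mean, here is a plain-C sketch (deliberately not DML, and with a made-up device) of the register decode and dispatch code that every memory-mapped device model ends up containing, and that a dedicated modeling language can generate for you:

```c
/* Hand-written register decode for a made-up timer device: the kind of
 * repetitive code a modeling language can generate from a declaration. */
#include <stdint.h>
#include <stdio.h>

struct timer_dev {
    uint32_t ctrl;     /* offset 0x0 */
    uint32_t counter;  /* offset 0x4, read-only from software */
};

static uint32_t timer_read(struct timer_dev *d, uint32_t offset)
{
    switch (offset) {
    case 0x0: return d->ctrl;
    case 0x4: return d->counter;
    default:  printf("read of unmapped offset 0x%x\n", (unsigned)offset); return 0;
    }
}

static void timer_write(struct timer_dev *d, uint32_t offset, uint32_t value)
{
    switch (offset) {
    case 0x0: d->ctrl = value; break;
    case 0x4: /* writes to the counter are ignored */ break;
    default:  printf("write to unmapped offset 0x%x\n", (unsigned)offset); break;
    }
}

int main(void)
{
    struct timer_dev d = { 0, 42 };
    timer_write(&d, 0x0, 1);                                  /* enable the timer */
    printf("counter = %u\n", (unsigned)timer_read(&d, 0x4));  /* read it back */
    return 0;
}
```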
So what is SCDSource? It is a quite good news and analysis site covering the electronics industry, EDA, virtual platforms, and other themes close to my heart. SCDSource was started in October 2007, and has produced a series of good and interesting articles since. They tend to actually write articles rather than just repeat press releases, and to report from interesting panels at events like DATE, ESC, and Multicore Expo.
It just dawned on me recently (and it must surely have been obvious to those working on configuring AMP — asymmetric multiprocessing — systems) that in an AMP setup, the operating systems involved actually know about each other and have to account for the fact that they are sharing a single processor chip with other operating systems. So you cannot just take two single-core operating system images from an existing multiple-processor (local memory) solution, put them on a single chip, and expect things to just work. You do need to prepare the boot process and find a way to nicely share the common I/O devices, timers, accelerator engines, and other resources on the chip. This is materially different from a virtualized setup.
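As a made-up illustration of the kind of “who owns what” agreement an AMP configuration needs, picture a static partition table that each OS image is built against (all names and values below are invented):

```c
/* Hypothetical static AMP partitioning: each OS image is told up front
 * which cores, memory window, and devices belong to it. */
#include <stdint.h>
#include <stdio.h>

struct amp_partition {
    const char *os_image;
    uint32_t    core_mask;      /* bit n set = core n belongs to this OS */
    uint64_t    mem_base;       /* private memory window */
    uint64_t    mem_size;
    const char *owned_devices;  /* devices only this OS may touch */
};

static const struct amp_partition partitions[] = {
    { "linux.img", 0x0F, 0x00000000, 512u << 20, "eth0, uart0" },
    { "rtos.img",  0xF0, 0x20000000, 256u << 20, "can0, uart1" },
    /* Shared resources such as the interrupt controller and the DDR
     * controller must be initialized by only one of the partitions. */
};

int main(void)
{
    for (size_t i = 0; i < sizeof partitions / sizeof partitions[0]; i++) {
        const struct amp_partition *p = &partitions[i];
        printf("%s: cores 0x%02x, mem 0x%llx + %llu MB, devices: %s\n",
               p->os_image, (unsigned)p->core_mask,
               (unsigned long long)p->mem_base,
               (unsigned long long)(p->mem_size >> 20), p->owned_devices);
    }
    return 0;
}
```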
Some more thoughts on how to program multicore machines that did not make it into my original posting from last week. Some of this was discussed at the multicore day, and some of it I have been thinking about for some time now.
One of the best ways to handle any hard problem is to make it “somebody else’s problem”. In computer science this is also known as abstraction, and it is a very useful principle for designing more productive programming languages and environments. Basically, the idea I am after is to let a programmer focus on the problem at hand, leaving somebody else to fill in the details and map the problem solution onto the execution substrate.
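A tiny sketch of what this can look like in practice (my own toy example, with a deliberately simple-minded made-up runtime): the programmer hands a pure per-element function and the data to a parallel_map, and how the work gets spread over cores is somebody else's problem.

```c
/* Sketch: the caller never sees threads; the "runtime" decides the mapping. */
#include <pthread.h>
#include <stdio.h>

#define NWORKERS 4

struct slice { double *data; int begin, end; double (*f)(double); };

static void *worker(void *arg)
{
    struct slice *s = arg;
    for (int i = s->begin; i < s->end; i++)
        s->data[i] = s->f(s->data[i]);
    return NULL;
}

/* The abstraction: apply f to every element, in parallel, somehow. */
static void parallel_map(double *data, int n, double (*f)(double))
{
    pthread_t threads[NWORKERS];
    struct slice slices[NWORKERS];
    for (int w = 0; w < NWORKERS; w++) {
        slices[w] = (struct slice){ data, w * n / NWORKERS,
                                    (w + 1) * n / NWORKERS, f };
        pthread_create(&threads[w], NULL, worker, &slices[w]);
    }
    for (int w = 0; w < NWORKERS; w++)
        pthread_join(threads[w], NULL);
}

static double square(double x) { return x * x; }

int main(void)
{
    double v[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    parallel_map(v, 8, square);              /* no threads visible here */
    for (int i = 0; i < 8; i++)
        printf("%.0f ", v[i]);
    printf("\n");
    return 0;
}
```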