In my series (well, I have one previous post about checkpointing) about misunderstood simulation technology items, the turn has come to the most difficult of all it seems: determinism. Determinism is often misunderstood as meaning “unchanging” or “constant” behavior of the simulation. People tend to assume that a deterministic simulation will not reveal errors due to nondeterministic behavior or races in the modeled system, which is a complete misunderstanding. Determinism is a necessary feature of any simulation system that wants to be really helpful to its users, not an evil that hides errors.
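To make the distinction concrete, here is a minimal sketch, not taken from any real simulator: two simulated cores perform a non-atomic increment (a load followed by a store) on a shared counter. The names `run`, `seed`, and `increments` are assumptions made up for this illustration. The only source of variation is the seeded scheduler, so a given seed always produces the same result, yet the lost-update race in the modeled system still shows up.

```python
import random


def run(seed, increments=50):
    """Simulate two cores doing non-atomic increments of a shared counter.

    The interleaving is chosen by a seeded pseudo-random scheduler, so the
    whole run is a pure function of `seed` (hypothetical parameter name).
    """
    rng = random.Random(seed)  # the only source of variation in the run
    shared = 0
    # Each core alternates between a "load" step and a "store" step.
    cores = [{"reg": 0, "left": increments, "phase": "load"} for _ in range(2)]

    while any(core["left"] for core in cores):
        runnable = [core for core in cores if core["left"]]
        core = rng.choice(runnable)  # scheduler picks who executes one step
        if core["phase"] == "load":
            core["reg"] = shared          # read the shared counter
            core["phase"] = "store"
        else:
            shared = core["reg"] + 1      # write back a possibly stale value
            core["phase"] = "load"
            core["left"] -= 1
    return shared


if __name__ == "__main__":
    # Same seed twice: identical outcome, which is what makes a bug repeatable.
    print(run(7), run(7))
    # Different seeds: different interleavings, each one exactly repeatable.
    print([run(seed) for seed in range(5)])
```

A correct program would always end at 100 here. Any seed that ends lower has exposed the race, and because the run is deterministic, that exact failing interleaving can be replayed as often as needed.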
One thing that surprises me is how rare the feature of checkpointing or snapshotting is in the land of virtual platforms, despite the obvious benefits of that feature. Indeed, checkpointing was one of the first cool things demonstrated to me when I joined Virtutech back in 2002. Today, I could not ever imagine doing without it. Not having checkpointing is like having a word processor where you only get to save once, when your document is finished, with no option of saving intermediate states.
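The idea can be sketched in a few lines. This is a toy under stated assumptions, not how Simics or any other virtual platform implements it: `TinySim`, its fields, and its `checkpoint`/`restore` methods are hypothetical names, and the whole simulation state is assumed to be serializable with Python's standard `pickle` module.

```python
import pickle


class TinySim:
    """A toy simulation whose complete state lives in plain Python data."""

    def __init__(self):
        self.cycle = 0
        self.regs = [0, 0, 0, 0]
        self.memory = {}

    def step(self):
        # Arbitrary but deterministic state update, standing in for a model.
        self.cycle += 1
        index = self.cycle % len(self.regs)
        self.regs[index] += self.cycle
        self.memory[self.cycle] = sum(self.regs)

    def run(self, steps):
        for _ in range(steps):
            self.step()

    def checkpoint(self):
        """Serialize the entire state so it can be saved at any point."""
        return pickle.dumps(self.__dict__)

    @classmethod
    def restore(cls, blob):
        """Rebuild a simulation from a previously saved checkpoint."""
        sim = cls()
        sim.__dict__.update(pickle.loads(blob))
        return sim


if __name__ == "__main__":
    sim = TinySim()
    sim.run(1000)                  # the expensive part, e.g. booting an OS
    saved = sim.checkpoint()       # save an intermediate state
    sim.run(10)

    replay = TinySim.restore(saved)
    replay.run(10)                 # continue from the saved point instead
    print(replay.regs == sim.regs)
```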
But not everyone seems to consider this an important feature, judging from its relative rarity in the world of EDA and virtual platforms. Why is this? Let’s look at some possible explanations.
There is an eternal debate going on in virtual platform land over what the right kind of abstraction is for each job. Depending on background, people favor different levels. For those with a hardware background, more detail tends to be the comfort zone, while those with a software background, like myself, are quite comfortable with fewer details. I recently did some experiments on the use of quite low levels of hardware modeling detail for early architecture exploration and system specification.
In FLOSS Weekly issue 57, about 20 minutes into the show, Randal Schwartz and Leo Laporte express genuine surprise that the XBMC media player application is all in C++. That is pretty telling: some parts of the computing world are indeed moving on to more modern pastures like Python, Perl, Ruby, and even Objective-C (for the Mac people). It is quite a contrast to the EDA world, where C++ is still considered the new shiny thing, as I have lamented before… thanks for that small but golden genuine surprise, Randal and Leo!
The 44th episode of the Stackoverflow podcast contains an interesting discussion on what I have liked to call “the tyranny of syntax”. They note that for some reason people are scared of anything that does not look like C, but still lament some of the less good design patterns in C, such as the fact that closing braces have no annotations as to what is being closed. They also talk about the use of “little languages”, and an old favorite song of mine.
Frank Schirrmeister of Synopsys recently published a blog post called “Busting Virtual Platform Myths – Part 1: ‘Virtual Platforms are for application software only’”. In it, he refutes a claim by Eve that virtual platforms are for application-level software development only, basically claiming that they are mostly for driver and OS development and citing some Synopsys-Virtio Innovator examples of such uses. In his view, most application software is being developed using host-compiled techniques. I want to add to this refutation by noting that application software is surely a very important, and large, use case for virtual platforms.
This is a small Linux SMP programming tip, which I had a hard time finding documented clearly anywhere on the web. I guess people won’t find it here either, but with some luck some search engine will pick up on this.
Edited on 2009-Feb-01, to include the link to the illustrated guide that really helps you get there faster. Thanks Simon! Also, promoted to front page, original post was put up on 2008-Nov-09.
Thanks to Simon Kågström’s post (and the even better second-generation with screenshots) about using Eclipse for the Linux kernel, I have a much nicer work environment now for my ongoing work in learning Linux device drivers on PowerPC, which has helped me work my way through several hard-to-figure-out system calls.
The best way to learn something is to try, fail, and then try again. That is how I just learned the basics of multiprocessor interrupt management. For an educational setup, I have been creating a purely virtual virtual platform from scratch. This setup contains a large number of processors with local memory, and then a global shared memory, as well as a means for the processors to interrupt each other in order to notify about the presence of a message or synchronize in general. Getting this really right turned out to be not so easy.
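The shape of such a setup can be sketched as follows. This is purely illustrative and not the actual platform described above: the class `Platform`, its fields, and the protocol (write the message to shared memory first, then raise the interrupt, and have the handler drain everything that is queued) are assumptions chosen for the example.

```python
from collections import deque


class Platform:
    """Toy model: N processors, local memories, shared mailboxes, interrupts."""

    def __init__(self, n_cpus):
        # One mailbox per processor, living in the global shared memory.
        self.mailboxes = [deque() for _ in range(n_cpus)]
        # One pending-interrupt bit per processor (interrupts coalesce).
        self.irq_pending = [False] * n_cpus
        # Per-processor local memory, here just a list of received messages.
        self.local_memory = [[] for _ in range(n_cpus)]

    def send(self, src, dst, payload):
        """Processor `src` notifies processor `dst` about a new message."""
        self.mailboxes[dst].append((src, payload))  # 1. write the message
        self.irq_pending[dst] = True                # 2. then raise the interrupt

    def service(self, cpu):
        """Interrupt handler for `cpu`; returns the number of messages handled."""
        if not self.irq_pending[cpu]:
            return 0
        # Acknowledge first, then drain: a message that arrives while we are
        # draining sets the bit again instead of being silently lost.
        self.irq_pending[cpu] = False
        handled = 0
        while self.mailboxes[cpu]:
            self.local_memory[cpu].append(self.mailboxes[cpu].popleft())
            handled += 1
        return handled


if __name__ == "__main__":
    platform = Platform(4)
    platform.send(0, 3, "hello")
    platform.send(1, 3, "sync")     # second interrupt coalesces with the first
    print(platform.service(3), platform.local_memory[3])
```

Because the pending bit carries no count, the handler in this sketch cannot assume one interrupt means one message; it has to empty the mailbox each time it runs.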
Traditional hardware design languages like Verilog were designed to model naturally concurrent behavior, and they naturally leaned on a concept of threads to express this. This idea of independent threads was brought over into the design of SystemC, where it was manifested as cooperative multitasking using a user-level threading package. While threads might at first glance look “natural” as a modeling paradigm for hardware simulations, it is really not a good choice for high-performance simulation.
In practice, threading as a paradigm for software models of hardware circuits connected to a programmable processor brings more problems than it provides benefits in terms of “natural” modeling.
My hosting service just told me to update to WordPress 2.7 — as the previous version had known security holes. So I did, and after I upgraded, the blog itself broke.
Just after the first post on the front page, there was a nasty error message:
Fatal error: Only variables can be passed by reference in .../functions.php on line ...
Now I am home again, and some days have passed since the IP 08 panel discussion about software and hardware virtual platforms. This was an EDA hardware-oriented conference, and thus the audience was quite interested in how to tie things to hardware design. In any case, it was a fun panel, and Pierre Bricaud did a good job of moderating and keeping things interesting.
To continue from last week’s post about my Linux device driver and hardware teaching setup in Simics, here is a lesson I learnt this week when doing some performance analysis based on various hardware speeds.
I have recently discovered stackoverflow.com and I must say it is something I very much recommend. The idea is simple, and the details rich and interesting.
The two days of the SiCS Multicore Days are now over, and it was a really fun event this year too. I will be writing a few things inspired by the event, and here is the first.
Kunle Olukotun’s presentation on the work of the Stanford Pervasive Parallelism lab included a diagram where they showed a range of domain-specific languages (DSL) being compiled to a universal implementation language. That language is currently Scala, and in the end all applications end up being compiled into Scala byte codes, which are then optimized and dynamically reoptimized and executed on a particular hardware system based on the properties of that system. Fundamentally, the problem of creating and compiling a DSL, and combining program segments written in different DSLs, is solved by interposing a layer of indirection.
But this idea got me thinking about what the best such intermediary might be for large-scale general deployment.
This might appear as a stretched analogy, but it struck me as obvious when I tried playing the Lego Racers boardgame with my 3-year-old this weekend. The game is ranked pretty low on Boardgamegeek, and deservedly so. The promise and premise is great: use Lego cars to race around a track and pick up new pieces to modify the powers of your car… sounds like great fun. Right? But it is not, and that’s where my analogy with the age of software comes in.
This was a refreshingly different post: Too Many Cores, not Enough Brains:
More importantly, I believe the whole movement is misguided. Remember that we already know how to exploit multicore processors: with now-standard multithreading techniques. Multithreaded programming is notoriously difficult and error-prone, so the challenge is to invent techniques that will make it easier. But I just don’t see vast hordes of programmers needing to do multithreaded programming, and I don’t see large application domains where it is needed. Internet server apps are architected to scale across a CPU farm far beyond the limits of multicore. Likewise CGI rendering farms. Desktop apps don’t really need more CPU cycles: they just absorb them in lieu of performance tuning. It is mostly specialized performance-intensive domains that are truly in need of multithreading: like OS kernels and database engines and video codecs. Such code will continue to be written in C no matter what.
The argument at core is that multicore is about performance, and performance optimization is generally something that we do prematurely rather than focussing on how to solve the core problem in the best way. You have to respect Jonathan Edwards, and often this is true: programmers optimize themselves into a horrible design that is also slow.
I just read an opinion-provoking piece “Software developer attitudes: just get on with it” by Frank Schirrmeister, as well as the article “Life imitating art: Hardware development imitating software development” by Glenn Perry that he linked to. Both these articles touch on the long-standing question of who does development the “best” in computing. I have heard these arguments many times, where software developers think that there is something mythical about hardware development that makes things work so much better with far fewer bugs, and hardware people look with envy at the speed of development and the fanciful fireworks of coding that software engineers can do. It could be a case of the grass always looking greener on the other side… but there are some concrete things that are relevant here.
I just listened to another Floss Weekly show, Number 36, where they interviewed Jan Lehnardt of the CouchDB project. CouchDB is very interesting, in that it is a database designed for replication, redundancy, and thus massive parallelism. It was initially written by Damien Katz on his own, but now it is an Apache Foundation project sponsored by IBM. The most interesting thing is that Damien decided in 2006 to rewrite the C++ prototype he had in Erlang, and did so in just a few months if I understood my Erlang friends right. So here we have a really good parallel program written in a true parallel language.
In the March/April 2008 issue of ACM Queue, there is an article on GPU Programming by Kayvon Fatahalian and Mike Houston of Stanford that I found a very interesting read. It presents and analyzes the programming model of modern GPUs, in the most coherent and understandable way that I have seen so far. The PC GPU has a model for programming parallel hardware that might be a good pattern for other areas of processing. Programmers do not have to write explicitly parallel code; the machinery and hardware take care of ensuring parallel behavior, as long as the code follows the assumptions made in the model.
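A rough analogy in ordinary Python, and not actual GPU code, looks like this. The programmer writes only `shade`, a kernel that is a pure function of one input element, and a separate `launch` function decides how to spread the work. Both names are hypothetical; the thread pool from the standard `concurrent.futures` module merely stands in for the GPU's scheduling machinery.

```python
from concurrent.futures import ThreadPoolExecutor


def shade(pixel):
    """Kernel: a pure function of one element, touching no shared state."""
    r, g, b = pixel
    gray = (299 * r + 587 * g + 114 * b) // 1000  # integer luma approximation
    return (gray, gray, gray)


def launch(kernel, data, workers=4):
    """Stand-in for the runtime: applies the kernel to every element.

    How many workers run, and in what order elements are processed, is
    invisible to the kernel author. Results come back in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(kernel, data))


if __name__ == "__main__":
    image = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (10, 20, 30)]
    print(launch(shade, image))
```

The assumption the model relies on is visible in the kernel: since each invocation depends only on its own element, the result is the same whether one worker or many are used.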
In the July 2008 issue of IEEE Computer, there is a short article called “In Praise of Scripting: Real Programming Pragmatism”, by Ronald P. Loui, a professor at Washington University (WUSTL). The article deals with the issue of what is the appropriate first language to teach new CS (Computer Science) students, and considers that a “scripting” language like Python or Ruby might be way better than Java (no doubt about that I think).
What can this teach us for the purpose of simulation and the creation of models of computer system hardware for the purpose of simulation? Maybe a fair bit…
Model-driven architecture (MDA) or model-based development is an idea that, as I see it, comes from the automotive field. To me, it means that you use some tool that is capable of modeling both a computer controller system and the environment being controlled to create a simulation world where computer control and environment meet and the characteristics of the controller can be ascertained quickly. The key is to not have to convert controller algorithms to concrete code, and not have to run concrete code on concrete hardware against physical prototypes to test the controllers. Today, this seems to be applied to many fields where you are creating control systems (automotive, aviation, robotics). The tools are math-based like MATLAB and LabVIEW, along with special programming environments based on UML and StateCharts.
What is interesting is that most of these tools are graphical in nature. And they do seem to work quite well, which is quite surprising given the otherwise poor record of graphical programming as opposed to text-based programming. There was a pile of graphical programming environments in the 1980s, none of which amounted to much. What survived and prospered were the good old text-based languages like C, C++, Java, Visual Basic, etc. In practice, it seems like it is very hard to beat sequential text when it is time to actually get code working. More efficient programming seems to boil down to having to write less text and having text which is easier to write (for example, dynamic typing, rich libraries, garbage collection, and other modern language features that remove intellectual burdens from the programmer).
But graphics do seem to work for domain-specific cases (like control engineering or signal processing), especially for data-flow-style problems. And for abstract architecture work. So there has to be something to it… but what?
In early July, Cadence announced their new “C2S” C-to-silicon compiler. This event was marked with some excitement and blogging in the EDA space (SCDSource, EDN-Wilson, CDM-Martin, to give some links for more reading). At core, I agree that what they are doing is fairly cool — taking an essentially hardware-unrelated sequential program in C and creating hardware from it. The kind of heavy technology that I have come to admire in the EDA space.
But I have to ask: why start with C?
SystemC TLM-2.0 has just been released, and on the heels of that everyone in the EDA world is announcing various varieties of support. TLM-2.0-compliant models, tools that can run TLM-2.0 models, and existing modeling frameworks that are being updated to comply with the TLM-2.0 standard. All of this feeds a general feeling that the so-called Electronic System Level design market (according to Frank Schirrmeister of Synopsys, the term was coined by Gary Smith) is finally reaching a level of maturity where there is hope to grow the market by standards. This is something that has to happen, but it seems to be getting hijacked by a certain part of the market addressing the needs of a certain set of users.
There is more to virtual platforms than ESL. Much more. Remember the pure software people.
Edit: Maybe it is more correct to say “there is more to virtual platforms than SoC”, as that is what several very smart comments to this post have said. ESL is not necessarily tied to SoC; it is, in theory at least, a broader term. But currently, most tools retain an SoC focus.
A very interesting idea that has been bandied around for a while in manycore land is the notion that in the future, we will see a total inversion in today’s cost intuition for computers. Today, we are all versed in the idea that processor cores and processing times are quite precious, while memory is free. For best performance, you need to care about the cache system, but in the end, the goal is to keep those processor pipelines as busy as possible. Processors have traditionally been the most expensive part of a system, and ideas such as Integrated Modular Avionics are invented to make the best use of a resource perceived as rare and expensive…
But is that really always going to be true? Is it reasonable to think of CPU cores as being free but other resources as expensive? And what happens to program and system design then?
I got another email from my friend with the thesis that processors will become ever more homogeneous as time goes on, while I believe in a relative heterogenezation (is that a word?) of computer architecture with many special-purpose accelerators and helper processors. This argument is put forward in a previous blog post. In this round, the arguments for homogenization are from the gaming world.
Sometimes it is very reassuring that certain things do not work when tested in practice, especially when you have been telling people that for a long time. In my talks about Debugging Multicore Systems at the Embedded Systems Conference Silicon Valley in 2006 and 2007, I had a fairly long discussion about relaxed or weak memory consistency models and their effect on parallel software when run on a truly concurrent machine. I used Dekker’s Algorithm as an example of code that works just fine on a single-processor machine with a multitasking operating system, but that fails to work on a dual-processor machine. Over Christmas, I finally did a practical test of just how easy it was to make it fail in reality. Which turned out to showcase some interesting properties of various types and brands of hardware and software.
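The failure mode can be illustrated with a toy model. This is not the test described above, which ran on real hardware; it is a minimal sketch that models only the flag handshake at the entry of Dekker's algorithm (no turn variable, no retry loop), with each processor given a hypothetical store buffer. The function name `flag_handshake` and its parameters are assumptions made up for this example.

```python
def flag_handshake(store_buffers, fence=False):
    """Run the flag handshake at the entry of Dekker's algorithm once.

    Each CPU sets its own flag, then reads the other CPU's flag, and
    believes it may enter the critical section if that flag reads as 0.
    Returns (cpu0_enters, cpu1_enters) for one fixed interleaving.
    """
    memory = {"flag0": 0, "flag1": 0}   # globally visible memory
    buffers = [[], []]                  # per-CPU pending stores (toy model)

    def store(cpu, addr, value):
        if store_buffers:
            buffers[cpu].append((addr, value))  # not yet visible to others
        else:
            memory[addr] = value                # visible immediately

    def drain(cpu):
        for addr, value in buffers[cpu]:
            memory[addr] = value
        buffers[cpu].clear()

    def load(cpu, addr):
        if fence:
            drain(cpu)                          # fence: flush own stores first
        for pending_addr, value in reversed(buffers[cpu]):
            if pending_addr == addr:            # own pending stores win
                return value
        return memory[addr]

    # Fixed interleaving: both CPUs raise their flag, then both check the other.
    store(0, "flag0", 1)
    store(1, "flag1", 1)
    other_seen_by_0 = load(0, "flag1")
    other_seen_by_1 = load(1, "flag0")
    return (other_seen_by_0 == 0, other_seen_by_1 == 0)


if __name__ == "__main__":
    print("no store buffers:     ", flag_handshake(False))
    print("store buffers:        ", flag_handshake(True))
    print("store buffers + fence:", flag_handshake(True, fence=True))
```

In this model, with stores visible immediately both processors see the other's flag and neither enters on the first try, which is the case the full algorithm resolves with its turn variable. With store buffers both read a stale 0 and both enter, breaking mutual exclusion. Flushing the buffer before the load prevents both from entering for this interleaving.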
The book “Multicore Programming – Increasing Performance through Software Multithreading” by Shameem Akhter and Jason Roberts is part of a series of books put out by Intel in their multicore software push. In case you have not noticed, Intel has a huge market push currently where they give seminars, publish articles and books, and give curricula to universities in order to get more parallel software in place. I read this book recently, and here is a short review.
Most of the time when talking about the impact of multicore processing on software, we complain that it makes the software more complicated because it has to cope with the additional complexities of parallelism. There are some cases, however, when moving to multicore hardware allows a software structure to be simplified. The case of Integrated Modular Avionics (IMA) and the honestly idiotic design of the ARINC 653 standard is one such case.