I recently read a couple of articles on multicore that felt a bit like jumping back in time. In IEEE Spectrum, David Patterson of Berkeley’s parallel computing lab brings up just how hard it is to program in parallel, and argues that this makes the wholesale move to multicore something like a “Hail Mary pass” for the computer industry. In Computer World, Chris Nicols at NICTA in Australia asks what you will do with a hundred cores, implying that there is not much you can do today. While both articles make some good points, I also think they should be taken with a grain of salt. Things are better than they make them seem.
David Patterson’s article is very similar in its message to what we used to hear five years ago as everyone woke up and panicked when single-core computing ran out of steam. Chris Nicols is extrapolating from the current state in desktop PCs, and asking how the programs we run today will work when you have scores of cores rather than two or four.
The main message in both articles is that software needs to adapt to multicore, and that this is not happening as quickly as it needs to. It is clear that automatic parallelization of existing code is a non-starter, and the Computer World article proposes the use of domain-specific languages (which I agree is a very good way to go). David Patterson is less clear on what he thinks is a good programming model for multicore, leaving that as an open problem. He does point out that we have quite a few success stories in parallel computing, and I think it is important not to underestimate the set of problems that are amenable to parallelization. In the embedded field in particular, we have many naturally parallel problem domains. For example, networking offers abundant parallelism, since many independent clients, servers, and data streams are active at the same time.
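To make that concrete, here is a minimal sketch (my own illustration, not from either article) of why such workloads parallelize so naturally: each data stream can be handled by its own thread, with no state shared between them and hence no locking. The stream processing itself is a hypothetical stand-in for real packet work.

```c
/* Minimal sketch: independent data streams, one worker thread each.
 * Because the streams share no state, the work is embarrassingly
 * parallel and scales with the number of cores. */
#include <pthread.h>
#include <stdio.h>

#define NUM_STREAMS 4  /* e.g., one worker per core */

/* Hypothetical per-stream processing, standing in for real packet work. */
static void *process_stream(void *arg)
{
    long stream_id = (long)arg;
    /* ... read packets for this stream, process them, forward them ... */
    printf("worker %ld: processing its own stream\n", stream_id);
    return NULL;
}

int main(void)
{
    pthread_t workers[NUM_STREAMS];

    /* One thread per stream: no sharing, no locks needed. */
    for (long i = 0; i < NUM_STREAMS; i++)
        pthread_create(&workers[i], NULL, process_stream, (void *)i);
    for (int i = 0; i < NUM_STREAMS; i++)
        pthread_join(workers[i], NULL);
    return 0;
}
```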
However, both articles also miss some of the things that are happening to make multicore easier to use, especially in embedded. My favorite example of a technology that nobody seems to talk about outside of the embedded field is the hypervisor. With a hypervisor (such as the Wind River Hypervisor), you can take an existing distributed system and consolidate it onto a single multicore device, continuing a long tradition of multiple-processor programming in embedded. Also, the hardware-based debug tools that are available for embedded systems, including multicore systems, do not seem to register at all with mainstream researchers. That is really a shame, and hopefully we can start to change it with conferences like S4D, where we bring industrial debugging experience to the academic community.
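The consolidation idea is easiest to see as a partition table. The sketch below is purely illustrative, with hypothetical types and values; it is not the actual Wind River Hypervisor configuration format. The point is that each guest that used to run on its own board gets a static slice of cores and memory on one multicore device.

```c
#include <stdio.h>

/* Illustrative only: a static partition table mapping formerly
 * standalone boards onto cores of a single multicore device. */
struct partition {
    const char *name;        /* guest OS that used to run on its own board */
    int         first_core;  /* first core statically assigned to it */
    int         num_cores;   /* how many cores it owns */
    unsigned    mem_mb;      /* private memory carved out for it */
};

static const struct partition consolidated_system[] = {
    { "control-plane-linux", 0, 2, 512 },  /* formerly a separate Linux board */
    { "dataplane-rtos",      2, 2, 256 },  /* formerly a separate RTOS board */
};

int main(void)
{
    unsigned n = sizeof consolidated_system / sizeof consolidated_system[0];
    for (unsigned i = 0; i < n; i++) {
        const struct partition *p = &consolidated_system[i];
        printf("%s: cores %d-%d, %u MB\n", p->name,
               p->first_core, p->first_core + p->num_cores - 1, p->mem_mb);
    }
    return 0;
}
```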
Another aspect missing from both articles is any discussion of the debug and test of parallel software. Coding a parallel program is one thing; making sure it works is quite another. I think this is an interesting problem in its own right, as exposing parallel bugs is often just as hard as writing the buggy code to begin with. Wind River Simics is a tool that can be used to really stress multicore software, thanks to its ability to vary configuration parameters and inject extra delays into the target system. Simics is also a very good tool for debugging multicore software, thanks to its controlled, deterministic execution environment. I have already discussed this in a Wind River blog post, and I will not repeat the argument here.
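As an illustration of the kind of bug that timing variation exposes (my own minimal example, not taken from the blog post), consider the classic unsynchronized read-modify-write below. On many runs the final count looks correct; under a different schedule, or with injected delays widening the race window, updates are silently lost.

```c
/* A data race: two threads increment a shared counter without a lock.
 * counter++ compiles to a load, an add, and a store, and those can
 * interleave between threads, losing updates nondeterministically. */
#include <pthread.h>
#include <stdio.h>

#define ITERATIONS 100000

static long counter;  /* shared, deliberately unprotected */

static void *increment(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++)
        counter++;  /* racy read-modify-write */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* Expected 200000; a smaller value means updates were lost. */
    printf("counter = %ld (expected %d)\n", counter, 2 * ITERATIONS);
    return 0;
}
```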
Post scriptum: setting some facts straight
I want to quickly correct some factual mistakes in the Computer World article. The processor power wall he discusses is not fixed at 130W: in embedded systems the limit is often much lower (a mobile phone drawing 130W would not be very popular, and in networking infrastructure the magic limit today seems to be around 30W), while in certain mainframe computers it is higher. So it all depends, and the Itanium example is irrelevant. The same goes for the claimed 4 GHz ceiling on clock speed: it has been surpassed by several companies, most notably IBM, which clocked the Power6 at 4.7 GHz a few years ago. However, he is right that we will see cores multiplying in almost all systems as single cores run out of speed increases.
As Jack Ganssle said, us embedded folks get no respect.
Hi Jakob, I also took the liberty of commenting on David’s article, here: http://a-vajda.eu/blog/?p=532. It’s not exactly targeted at the embedded domain, but it does provide a way to use hundreds to thousands of cores for virtually *any* application.
Hope to see you at the MC day next week!