Freescale QorIQ P4080 is out — with Simics support

Only half an hour ago, the embargoes lifted. Freescale announced its new QorIQ series of multicore (and some single- and dual-core) processors. For the top end of that line, the P4080, Freescale and Virtutech (where I work, remember) have developed a virtual platform solution to help Freescale customers get to working products faster. The virtual platform is available now, and is already running several operating systems including VxWorks, QNX, and a variety of Linuxes.

Apart from the fairly large scale of this SoC, the really new part of the virtual platform is the so-called Hybrid solution, where the fast models are combined with detailed models from Freescale themselves. This creates a cycle-level detailed model with validated timing, “from the source” — but without the performance cost of having to run everything at a great level of detail. Rather, you use the fast model to steer the simulation of a workload to an interesting spot, and then turn up the level of detail then and there. You can also select which components of the chip are actually detailed and which parts are modeled with the fast functional models, avoiding the incredible slow-down of running an entire virtual platform at a great level of detail.
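
To make the idea concrete, here is a minimal conceptual sketch of per-component fidelity selection. All the names are invented for illustration; this is not the Simics API or the Freescale model interface, just a picture of fast and detailed models coexisting in one platform and being switched at an interesting point.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Conceptual sketch only: each component exposes both a fast functional step
// and a detailed (cycle-level) step; the platform chooses per component which
// one to run, and can switch when the workload reaches an interesting spot.
struct Model {
    virtual ~Model() = default;
    virtual void step_fast(std::uint64_t cycles) = 0;      // functional, approximate timing
    virtual void step_detailed(std::uint64_t cycles) = 0;  // cycle-level, validated timing
};

struct Component {
    std::unique_ptr<Model> model;
    bool detailed = false;            // per-component fidelity selection
};

struct Platform {
    std::vector<Component> components;
    std::uint64_t now = 0;

    // Advance the whole platform; each component runs at its own fidelity.
    void run(std::uint64_t cycles) {
        for (auto& c : components) {
            if (c.detailed) c.model->step_detailed(cycles);
            else            c.model->step_fast(cycles);
        }
        now += cycles;
    }

    // "Turn up the level of detail then and there" for one component.
    void make_detailed(std::size_t idx) { components[idx].detailed = true; }
};
```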

If you happen to be at the FTF in Orlando, do come by and look at the demos!

I have been involved in this work for the past year, and it is wonderful to finally see it coming out and be able to talk about it.

Is SoC (was: ESL) all there is to virtual platforms?

SystemC TLM-2.0 has just been released, and on the heels of that, everyone in the EDA world is announcing support in various forms: TLM-2.0-compliant models, tools that can run TLM-2.0 models, and existing modeling frameworks being updated to comply with the TLM-2.0 standard. All of this feeds a general feeling that the so-called Electronic System Level design market (according to Frank Schirrmeister of Synopsys, the term was coined by Gary Smith) is finally reaching a level of maturity where there is hope of growing the market through standards. This is something that has to happen, but it seems to be getting hijacked by a certain part of the market addressing the needs of a certain set of users.
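
For concreteness, here is a hedged sketch of what a minimal TLM-2.0-compliant target looks like: a memory model using the standard generic payload and the blocking transport interface via the convenience sockets. The socket, payload, and b_transport names come from the TLM-2.0 standard and its utilities; the memory module itself is purely illustrative and not taken from any vendor's library.

```cpp
#include <cstdint>
#include <cstring>
#include <systemc>
#include <tlm>
#include <tlm_utils/simple_target_socket.h>

// A minimal loosely-timed TLM-2.0 memory target.
struct SimpleMemory : sc_core::sc_module {
    tlm_utils::simple_target_socket<SimpleMemory> socket;
    unsigned char storage[0x1000];

    SC_CTOR(SimpleMemory) : socket("socket") {
        socket.register_b_transport(this, &SimpleMemory::b_transport);
        std::memset(storage, 0, sizeof storage);
    }

    // Blocking transport: service the read or write and add a fixed latency.
    void b_transport(tlm::tlm_generic_payload& trans, sc_core::sc_time& delay) {
        std::uint64_t addr = trans.get_address();
        unsigned int  len  = trans.get_data_length();

        if (addr + len > sizeof storage) {
            trans.set_response_status(tlm::TLM_ADDRESS_ERROR_RESPONSE);
            return;
        }
        if (trans.is_read())
            std::memcpy(trans.get_data_ptr(), &storage[addr], len);
        else if (trans.is_write())
            std::memcpy(&storage[addr], trans.get_data_ptr(), len);

        delay += sc_core::sc_time(10, sc_core::SC_NS);
        trans.set_response_status(tlm::TLM_OK_RESPONSE);
    }
};
```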

There is more to virtual platforms than ESL. Much more. Remember the pure software people.

Edit: Maybe it is more correct to say “there is more to virtual platforms than SoC”, as several very smart comments to this post have pointed out. ESL is not necessarily tied to SoC; in theory at least, it is a broader term. But currently, most tools retain an SoC focus.

Continue reading “Is SoC (was: ESL) all there is to virtual platforms?”

The 1970 rule strikes again: Virtual Platform Principles in 1967

Being a bit of a computer history buff, I am often struck by how most key concepts and ideas in computer science and computer architecture were invented in some form or other before 1970. And commonly by IBM. This goes for caches, virtual memory, pipelining, out-of-order execution, virtual machines, operating systems, multitasking, byte-code machines, etc. Even so, I have found a quite extraordinary example that surprised me with the range of modern techniques it employed. This is a follow-up to a previous post, now that I have actually digested the paper I talked about earlier.

Continue reading “The 1970 rule strikes again: Virtual Platform Principles in 1967”

Power Architecture Conference München 2008

On Tuesday next week, I will be presenting at the Power Architecture Conference (PAC) in München, Germany. The topics will be multicore debug using virtual hardware, and the new Simics Accelerator technology. Simics Accelerator in particular is pretty interesting technology.

It is a simple idea, using multiple host cores to run a virtual platform, with fairly amazing results. Now, using a single computer, we can run simulations that were the realm of pure fantasy just a few years ago. We also got a nice new little box to demonstrate it with, an eight-core Dell with 16 GB of RAM. With 64-bit Linux, this thing makes my Core 2 Duo laptop with 32-bit Vista look like yesteryear’s snail… And it creates that giggling feeling that a really impressive new toy brings out in even the most grown-up boys. Booting a 16-machine network of PowerPC boards was so fast it was not demo-worthy. I think we have to up the ante to some 100 target machines to make it interesting, and I have no doubt that a combination of multithreading and idle-loop optimization will make that setup usefully interactive from the target command lines. There are many other wild things we could try on that demo box, once it gets back from the Power Architecture Conferences tour.

Tri-core or Tricore or TriCore(tm)

I do find it kind of funny when marketing names go bad or collide in unexpected ways. There is this fairly old Infineon combined DSP/MCU core called TriCore (the name means it is at once a RISC, a DSP, and an MCU). It was a nice name, easy to recognize, easy to pronounce, unlike the competition at the time. Today, though, we are seeing multicore chips with three cores on the die. So what are these, if not tri-core chips, by analogy with single-, dual-, quad-, oct-, and so on? And this makes the hyphen very necessary. For example, the recent Freescale StarCore 8113 chip with three cores has its press release explicitly headed tri-core, with a hyphen. I guess marketing would have liked the more visually pleasing tricore moniker, alongside dualcore, which looks fairly established.

Ah well, not to mention the fun Infineon will have if it launches a triple-core TriCore device. Maybe in a third-generation TriCore 3? The power of three, indeed. TriTriTriCore, possibly?

Real-time control when cores become free

A very interesting idea that has been bandied around for a while in manycore land is the notion that in the future, we will see a total inversion of today’s cost intuition for computers. Today, we are all versed in the idea that processor cores and processing time are quite precious, while memory is free. For best performance, you need to care about the cache system, but in the end, the goal is to keep those processor pipelines as busy as possible. Processors have traditionally been the most expensive part of a system, and ideas such as Integrated Modular Avionics were invented to make the best use of a resource perceived as rare and expensive…

But is that really always going to be true? Is it reasonable to think of CPU cores as being free but other resources as expensive? And what happens to program and system design then?

Continue reading “Real-time control when cores become free”

Virtual Platform by Virtualization Extensions — 1969

On a trip down virtualization history, I found a real gem: a 1969 paper called A program simulator by partial interpretation, by Kazuhiro Fuchi, Hozumi Tanaka, Yuriko Manago, and Toshitsugu Yuba of the Japanese Government Electrotechnical Laboratory, published at the second Symposium on Operating Systems Principles (SOSP) in 1969. It describes a system where regular target instructions are run directly, and any privileged instructions are trapped and simulated. This is very similar to how VMware does it for x86, or any other modern virtualization solution.
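
As a rough illustration of the principle (my own sketch, not code from the paper or from any real virtual machine monitor), the heart of such a simulator is a dispatch where the common case runs at full speed and only the rare, privileged case falls into the simulator:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Sketch of "partial interpretation": ordinary instructions take the fast
// path (in a real monitor, direct execution on the host CPU; stubbed out
// here), while privileged instructions trap into the simulator, which
// updates its model of the machine state instead.
enum class Kind { Ordinary, Privileged };

struct Instruction {
    Kind          kind;
    std::uint32_t opcode;
};

struct MachineState {
    std::uint64_t pc = 0;
    bool interrupts_enabled = true;
};

// Fast path: stands in for native execution, not simulation.
static void run_directly(const Instruction&, MachineState& s) { s.pc += 4; }

// Slow path: the simulator emulates the effect of the privileged operation.
static void emulate_privileged(const Instruction& i, MachineState& s) {
    switch (i.opcode) {
        case 0x1: s.interrupts_enabled = false; break;   // hypothetical "disable interrupts"
        case 0x2: s.interrupts_enabled = true;  break;   // hypothetical "enable interrupts"
        default:  std::printf("unhandled privileged opcode %u\n",
                              static_cast<unsigned>(i.opcode));
    }
    s.pc += 4;
}

void run(const Instruction* program, std::size_t n, MachineState& s) {
    for (std::size_t i = 0; i < n; ++i) {
        if (program[i].kind == Kind::Ordinary)
            run_directly(program[i], s);        // the common, cheap case
        else
            emulate_privileged(program[i], s);  // the rare, trapped case
    }
}
```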

Continue reading “Virtual Platform by Virtualization Extensions — 1969”

David Ditzel Interview at The Register/Semicoherent Computing

The Register has a few podcasts in addition to their website, and the one called “Semicoherent Computing” has turned into a very nice series of interviews with interesting people from the computer industry. I recently listened to their interview from September 2007 with David Ditzel of Transmeta fame. He had a lot to say about the history of computing, as well as interesting things on where computing is going. Well worth a listen! Particularly interesting highlights…

Continue reading “David Ditzel Interview at The Register/Semicoherent Computing”

Power Architecture Newsletter Article

Power.org publishes a quarterly newsletter over at www.power.org/news/newsletter. The April 2008 issue features a short article by me introducing Simics 4.0 and Simics Accelerator, the way in which Virtutech Simics takes advantage of multicore processors to simulate large target systems using a multithreaded simulator.

Heterogeneous vs homogeneous systems, revisited

I got another email from my friend with the thesis that processors will become ever more homogeneous as time goes on, while I believe in a relative heterogenization (is that a word?) of computer architecture, with many special-purpose accelerators and helper processors. This argument was put forward in a previous blog post. In this round, the arguments for homogenization come from the gaming world.

Continue reading “Heterogeneous vs homogeneous systems, revisited”

IBM z6: Multicore, Accelerators

The IBM mainframe family that started with the S/360 back in the 1960s is still going strong. The naming has been interesting in recent years, going from S/390 to z900 to z990 to z9.

Continue reading “IBM z6: Multicore, Accelerators”

VIA/Centaur Isaiah: Multicore when needed, not earlier than that

Scott Wasson at techreport.com has a short but fairly good write-up on the microarchitecture of VIA’s new ‘Isaiah’ processor core, developed by their subsidiary Centaur Technology. What is interesting is the part on multicore processing. Scott quotes Glenn Henry from Centaur:

He points out that most people don’t and shouldn’t care what type of CPU they have in their PCs, so long as it gets the job done. When Centaur started, Henry says, they had to develop engineers with a different mindset, not “faster is better.” He set a series of targets involving die size limits and a ship date, and then directed his people to make the processor fast enough within those constraints that people would want to buy it.

Indeed, once you’ve absorbed the Centaur mindset, Henry’s answers to questions become somewhat predictable. Will Isaiah go multi-core? It can; it’s built that way, and Henry thinks Intel’s approach of a shared L2 cache makes sense. But he scoffs at the notion that people need multiple cores in basic computing devices right now. Henry says Centaur will go to multiple cores if it needs that level of performance or if Intel convinces people they have to have it.

It is a very interesting and different way to go about processor design: aiming for “good enough for what people do now”. A Skoda, not a BMW, to use an old automotive analogy. But note that while in some markets “good enough” also means “bleak and boring”, this is not necessarily the case in personal computing.

In computers, the software and system construction are what make one PC stand out from another, not the raw performance of the processor. And as good enough in the VIA case also means fairly radically low power, you can build some cool and compact solutions from something like the Isaiah, provided that you absolutely want it to run a commodity OS like Windows or standard x86 Linux (which makes sense for a home machine).

If you care to do a bit more customization and create true consumer electronics, you can easily get the same features at half the power budget using an ARM, MIPS, or embedded POWER core.

One bit that strikes me as interesting is that Henry thinks of multicore as a way to increase performance, rather than as a way to decrease power. Maybe the additional size of a second core has something to do with this, but other players in the market, most notably ARM, are using multicore as a way to reduce power consumption at a given level of performance. It could be that in the x86 world, anything slower than the current Isaiah design would simply be too slow single-threaded to be viable for running Windows.

Dekker’s Algorithm Does not Work, as Expected

Sometimes it is very reassuring that certain things do not work when tested in practice, especially when you have been telling people so for a long time. In my talks about Debugging Multicore Systems at the Embedded Systems Conference Silicon Valley in 2006 and 2007, I had a fairly long discussion of relaxed or weak memory consistency models and their effect on parallel software when run on a truly concurrent machine. I used Dekker’s Algorithm as an example of code that works just fine on a single-processor machine with a multitasking operating system, but that fails to work on a dual-processor machine. Over Christmas, I finally did a practical test of just how easy it is to make it fail in reality, which turned out to showcase some interesting properties of various types and brands of hardware and software.
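
Here is a minimal reconstruction of the kind of test I mean, assuming C++11 threads (the original experiment predates C++11, so this is a sketch rather than the exact code used). Dekker’s algorithm relies on “store my flag, then load the other thread’s flag” happening in that order; with plain (even volatile) accesses there is no such guarantee on a real multiprocessor, so both threads can end up in the critical section and the counter comes out below 2*N.

```cpp
#include <cstdio>
#include <thread>

const int N = 1000000;
volatile int flag[2] = {0, 0};       // "I want to enter" flags
volatile int turn = 0;               // whose turn it is to back off
volatile int counter = 0;            // protected, in theory, by the algorithm

// Classic Dekker entry protocol, written naively: no memory fences.
static void lock(int me) {
    int other = 1 - me;
    flag[me] = 1;
    while (flag[other]) {            // a store->load fence is needed between the
        if (turn != me) {            // store to flag[me] and this load; without it,
            flag[me] = 0;            // hardware reordering lets both threads pass
            while (turn != me) { }
            flag[me] = 1;
        }
    }
}

static void unlock(int me) {
    turn = 1 - me;
    flag[me] = 0;
}

static void worker(int me) {
    for (int i = 0; i < N; ++i) {
        lock(me);
        counter = counter + 1;       // deliberately non-atomic: exposes any race
        unlock(me);
    }
}

int main() {
    std::thread t0(worker, 0), t1(worker, 1);
    t0.join();
    t1.join();
    // Expect 2*N if mutual exclusion held; on a true multiprocessor it is
    // typically less, because the naive protocol breaks under reordering.
    std::printf("counter = %d (expected %d)\n", static_cast<int>(counter), 2 * N);
}
```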

Continue reading “Dekker’s Algorithm Does not Work, as Expected”

Book Review: Intel’s Multicore Programming Book

The book “Multicore Programming – Increasing Performance through Software Multithreading” by Shameem Akhter and Jason Roberts is part of a series of books put out by Intel in their multicore software push. In case you have not noticed, Intel currently has a huge market push going on, where they give seminars, publish articles and books, and provide curricula to universities in order to get more parallel software in place. I read this book recently, and here is a short review.
Continue reading “Book Review: Intel’s Multicore Programming Book”

When Multicore makes Things Simpler, like IMA

Most of the time when talking about the impact of multicore processing on software, we complain that it makes the software more complicated because it has to cope with the additional complexities of parallelism. There are some cases, however, where moving to multicore hardware allows a software structure to be simplified. Integrated Modular Avionics (IMA) and the honestly idiotic design of the ARINC 653 standard is one such case.
Continue reading “When Multicore makes Things Simpler, like IMA”

Øredev 2007

Just like in 2006, I went to the Øredev conference in Malmö and presented a workshop using Virtutech Simics. This year, I worked with Jonas Svennebring from Freescale, and we created a workshop around parallelizing network-processing software to run on a multicore Freescale processor. The workshop went reasonably well, and the participants definitely learned something about what we were trying to get across, even though we did not have much time to actually complete the programming assignments.

Continue reading “Øredev 2007”

Homogeneous and Heterogeneous Multicore vs Programmers

An old colleague just sent me an email bringing up a discussion we had last year, where he was a strong proponent of the homogeneous model of a multiprocessor. The root of that discussion was the difference between the Xbox 360 and PlayStation 3 processors. The Xbox 360 has a three-core, two-threads-per-core homogeneous PowerPC main processor called the Xenon (plus a graphics processor, obviously), while the PS3 has a Cell processor with a single two-threaded PowerPC core and seven SPEs, Synergistic Processing Elements (basically DSP-like SIMD machines).

In the game business, it is clear that the Xenon CPU is considered easier to code for. This means that even though the Cell processor clearly has higher theoretical raw performance, in practice the two machines are about equal in power, since it is harder to make use of the Cell. Which seems to be an accepted fact.

So here, homogeneous systems do appear to have an easier time with programmers. However, I do not believe that this extends to all systems, all the time, everywhere.

Continue reading “Homogeneous and Heterogeneous Multicore vs Programmers”

Parallel Processing Requires Parallel IO

One common use case for multicore processing, on the desktop and elsewhere, is “doing many things at the same time”. You could be running many user-interface programs at once, like the “typical today’s teenager” template of tens of IM clients, web sessions, email conversations, music and video players, movie downloads, etc. Or it could be a more business-like mix of background indexing of hard drives, backups being taken, downloads of large business files, software builds, and source-code repository updates.

I have been running both of these modes to some extent, and the main problem with them, at least on a PC, is that while the processors might be good at multitasking and sharing the CPU load, my IO system is annoyingly non-parallel.

Continue reading “Parallel Processing Requires Parallel IO”

Applications that can make use of more compute power (e.g., iPod Video)

A question that pops up quite often when computer architects and representatives from firms like Intel encounter a crowd today is “but just what do you need more computing power for?”. Most regular users are fairly happy with the speed at which they process words, surf the web, read email, do IP phone calls, crunch numbers in Excel, and perform other common tasks. It is hard to perceive the need for more speed in everyday tasks, unlike a decade or two ago when you could definitely ask for improvement. I remember scrolling a page in PageMaker on a Mac SE (8 MHz 68000). You counted the clicks and waited for the screen to jump, redraw, jump, redraw, stabilize… quite a different experience from working with a modern computer, where far more complex software still responds instantaneously to almost anything you do.

Continue reading “Applications that can make use of more compute power (e.g., iPod Video)”

SICS Multicore Day 2007 – More on Programming

Some more thoughts on how to program multicore machines that did not make it into my original posting from last week. Some of this was discussed at the multicore day, and the rest I have been thinking about for some time now.

One of the best ways to handle any hard problem is to make it “somebody else’s problem”. In computer science this is also known as abstraction, and it is a very useful principle for designing more productive programming languages and environments. Basically, the idea I am after is to let the programmer focus on the problem at hand, leaving somebody else to fill in the details and map the problem solution onto the execution substrate.
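
To make that concrete, here is a small illustrative sketch (mine, not from any of the talks): the application code below only states what should be done to each element, and a tiny library routine decides how to split the work across however many host cores happen to be available. In a real system, that “somebody else” would be the language runtime, a framework, or the operating system rather than twenty lines of hand-written code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <thread>
#include <vector>

// A toy data-parallel loop: the caller supplies the per-element work, the
// library decides how many threads to use and how to split the index range.
template <typename T, typename F>
void parallel_for_each(std::vector<T>& data, F body,
                       unsigned workers = std::thread::hardware_concurrency()) {
    if (workers == 0) workers = 1;                  // hardware_concurrency may return 0
    std::size_t chunk = (data.size() + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        std::size_t lo = w * chunk;
        std::size_t hi = std::min(data.size(), lo + chunk);
        if (lo >= hi) break;
        pool.emplace_back([&data, body, lo, hi] {
            for (std::size_t i = lo; i < hi; ++i) body(data[i]);
        });
    }
    for (auto& t : pool) t.join();
}

int main() {
    std::vector<double> samples(1000000, 1.0);
    // The problem at hand: scale every sample. No threads visible here.
    parallel_for_each(samples, [](double& x) { x *= 2.0; });
    std::printf("samples[0] = %f\n", samples[0]);
}
```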

Continue reading “SICS Multicore Day 2007 – More on Programming”

SICS Multicore Day August 31

The SICS Multicore Day August 31 was a really great event! We had some fantastic speakers presenting the latest industry research view on multicores and how to program them. Marc Tremblay did the first presentation in Europe of Sun’s upcoming Rock processor. Tim Mattson from Intel tried hard to provoke the crowd, and Vijay Saraswat of IBM presented their X10 language. Erik Hagersten from Uppsala University provided a short scene-setting talk about how multicore is becoming the norm.

Continue reading “SICS Multicore Day August 31”

Real-Time in Sweden (RTiS) 2007

RTiS 2007 just took place in Västerås, Sweden. It is a biennial event where Swedish real-time research (and that really means embedded in general these days) presents new results and summarizes the results of the past two years. For someone who has worked in the field for ten years, it really feels like a gathering of friends and old acquaintances. And there are always some fresh new faces. Due to a scheduling conflict, I was only able to make it to day one of two.

I presented a short summary of a paper that a colleague at Virtutech and I wrote last year together with Ericsson and TietoEnator, on the Simics-based simulator for the Ericsson CPP system (see the publications page for 2006 and soon for 2007). I also presented the Simics tool and demoed it in the demo session. Overall, it was nice to talk to the mixed academic-industrial audience.

Continue reading “Real-Time in Sweden (RTiS) 2007”