Heterogeneous vs homogeneous systems, revisited

I got another email from my friend with the thesis that processors will become ever more homogeneous as time goes on, while I believe in a relative heterogenization (is that a word?) of computer architecture, with many special-purpose accelerators and helper processors. That argument was put forward in a previous blog post. In this round, the arguments for homogenization come from the gaming world.

Continue reading “Heterogeneous vs homogeneous systems, revisited”

IBM z6: Multicore, Accelerators

The IBM mainframe family that started with the S/360 back in the 1960s is still going strong. The naming has been interesting in recent years, going from S/390 to z900 to z990 to z9.

Continue reading “IBM z6: Multicore, Accelerators”

VIA/Centaur Isaiah: Multicore when needed, not earlier than that

Scott Wasson at the techreport.com has a short but fairly good write-up on the microarchitecture of VIA's new ‘Isaiah’ processor core, developed by their subsidiary Centaur Technology. What is interesting is the part on multicore processing. Scott is quoting Glenn Henry from Centaur:

He points out that most people don’t and shouldn’t care what type of CPU they have in their PCs, so long as it gets the job done. When Centaur started, Henry says, they had to develop engineers with a different mindset, not “faster is better.” He set a series of targets involving die size limits and a ship date, and then directed his people to make the processor fast enough within those constraints that people would want to buy it.

Indeed, once you’ve absorbed the Centaur mindset, Henry’s answers to questions become somewhat predictable. Will Isaiah go multi-core? It can; it’s built that way, and Henry thinks Intel’s approach of a shared L2 cache makes sense. But he scoffs at the notion that people need multiple cores in basic computing devices right now. Henry says Centaur will go to multiple cores if it needs that level of performance or if Intel convinces people they have to have it.

It is a very interesting and different way to go about processor design: aiming for “good enough for what people do now”. A Skoda, not a BMW, to use an old automotive analogy. Note that while in some markets “good enough” also means “bleak and boring”, this is not necessarily the case in personal computing.

In computers, the software and the system construction are what make one PC stand out from another, not the raw performance of the processor. And since “good enough” in the VIA case also means radically low power, you can build some cool and compact solutions around something like the Isaiah, provided that you absolutely want it to run a commodity OS like Windows or standard x86 Linux (which makes sense for a home machine).

If you care to do a bit more customization and create true consumer electronics, you can easily get the same features at half the power budget using an ARM, MIPS, or embedded POWER core.

One bit that strikes me as interesting is that Henry thinks of multicore as a way to increase performance, rather than as a way to decrease power. Maybe the additional die area of a second core has something to do with this, but other players in the market, most notably ARM, are using multicore as a way to reduce power consumption at a given level of performance. It could be that in the x86 world, anything slower than the current Isaiah design would just be too slow single-threaded to be viable running Windows.
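To see why the ARM-style argument works, consider the standard dynamic power relation P ≈ C·V²·f: halving the frequency usually lets you lower the supply voltage too, which pays off quadratically. Here is a back-of-the-envelope sketch in C; the voltage and frequency points are my own illustrative assumptions, not figures for any real core.

```c
#include <stdio.h>

/* Back-of-the-envelope dynamic power: P ~ C * V^2 * f, with the
   capacitance factored out. The voltage/frequency points are
   illustrative assumptions, not measurements of any real core. */
int main(void)
{
    double p_one = 1.2 * 1.2 * 1.0;        /* one core: 1.2 V, full clock  */
    double p_two = 2 * (0.9 * 0.9 * 0.5);  /* two cores: 0.9 V, half clock */

    /* Same aggregate throughput (assuming the work parallelizes),
       at roughly 44% less power for the dual-core option. */
    printf("one fast core:  %.2f\n", p_one);   /* prints 1.44 */
    printf("two slow cores: %.2f\n", p_two);   /* prints 0.81 */
    return 0;
}
```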

Dekker’s Algorithm Does not Work, as Expected

Sometimes it is very reassuring that certain things do not work when tested in practice, especially when you have been telling people so for a long time. In my talks about Debugging Multicore Systems at the Embedded Systems Conference Silicon Valley in 2006 and 2007, I had a fairly long discussion about relaxed or weak memory consistency models and their effect on parallel software running on a truly concurrent machine. I used Dekker’s Algorithm as an example of code that works just fine on a single-processor machine with a multitasking operating system, but that fails to work on a dual-processor machine. Over Christmas, I finally did a practical test of just how easy it is to make it fail in reality, which turned out to showcase some interesting properties of various types and brands of hardware and software.
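To make the failure mode concrete, here is a minimal C sketch of the classic two-thread Dekker entry protocol; the pthreads harness and the counter variable are my own illustration, not code from the talks. On a uniprocessor with a multitasking OS this serializes the increments just fine, but on a true multiprocessor, even x86’s relatively strong memory model allows the store to our own flag to be reordered with the load of the other thread’s flag, so both threads can enter the critical section at once.

```c
#include <pthread.h>
#include <stdio.h>

/* Classic two-thread Dekker entry protocol. Plain volatile variables
   on purpose: the point is that this is exactly what breaks under a
   memory model that reorders stores and loads. */
volatile int flag[2] = {0, 0};
volatile int turn = 0;
int counter = 0;                 /* "protected" by the lock below */

void lock(int self)
{
    int other = 1 - self;
    flag[self] = 1;
    /* Without a fence here, the store to flag[self] may be ordered
       after the load of flag[other]; both threads can then read 0
       and enter the critical section simultaneously. */
    while (flag[other]) {
        if (turn != self) {
            flag[self] = 0;
            while (turn != self)
                ;                /* busy-wait for our turn */
            flag[self] = 1;
        }
    }
}

void unlock(int self)
{
    turn = 1 - self;
    flag[self] = 0;
}

void *worker(void *arg)
{
    int self = (int)(long)arg;
    for (int i = 0; i < 1000000; i++) {
        lock(self);
        counter++;               /* should be perfectly serialized */
        unlock(self);
    }
    return NULL;
}

int main(void)
{
    pthread_t t0, t1;
    pthread_create(&t0, NULL, worker, (void *)0L);
    pthread_create(&t1, NULL, worker, (void *)1L);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    printf("counter = %d (expected 2000000)\n", counter);
    return 0;
}
```

Compiled without optimization and run on a machine with two real cores, this should come up short of 2,000,000; inserting a full memory fence between the flag store and the flag load (or using sequentially consistent atomics) restores mutual exclusion.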

Continue reading “Dekker’s Algorithm Does not Work, as Expected”

Book Review: Intel’s Multicore Programming Book

The book “Multicore Programming – Increasing Performance through Software Multithreading” by Shameem Akhter and Jason Roberts is part of a series of books put out by Intel in their multicore software push. In case you have not noticed, Intel currently has a huge market push where they give seminars, publish articles and books, and provide curricula to universities, all in order to get more parallel software in place. I read this book recently, and here is a short review.
Continue reading “Book Review: Intel’s Multicore Programming Book”

When Multicore makes Things Simpler, like IMA

Most of the time when talking about the impact of multicore processing on software, we complain that it makes the software more complicated because it has to cope with the additional complexities of parallelism. There are some cases, however, when moving to multicore hardware allows a software structure to be simplified. Integrated Modular Avionics (IMA) and the honestly idiotic design of the ARINC 653 standard is one such case.
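As a hypothetical sketch of that simplification (the partition names and data structures here are mine, not from the ARINC 653 standard): a single-core IMA system time-multiplexes its partitions through a static major-frame schedule, while a multicore part can simply dedicate a core to each partition and drop the time multiplexing entirely.

```c
/* Hypothetical illustration -- names and structures are mine, not
   taken from the ARINC 653 standard. */

/* Single core: partitions are time-multiplexed through a static
   major-frame schedule, so every partition's timing is entangled
   with everyone else's. */
struct partition_slot {
    const char *partition;   /* avionics application partition */
    int         window_ms;   /* its time window in the major frame */
};

static const struct partition_slot major_frame[] = {
    { "flight_mgmt", 20 },
    { "displays",    10 },
    { "maintenance",  5 },
    /* ...repeats every frame, whether a partition has work or not */
};

/* Multicore: give each partition its own core, and the shared
   time-slicing machinery above simply disappears. */
static const char *core_owner[] = {
    "flight_mgmt",   /* core 0 */
    "displays",      /* core 1 */
    "maintenance",   /* core 2 */
};
```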
Continue reading “When Multicore makes Things Simpler, like IMA”

Homogeneous and Heterogeneous Multicore vs Programmers

An old colleague just sent me an email bringing up a discussion we had last year, where he was a strong proponent of the homogeneous model of a multiprocessor. The root of that discussion was the difference between the Xbox 360 and Playstation 3 processors. The Xbox 360 has a three-core, two-threads-per-core homogeneous PowerPC main processor called the Xenon (plus a graphics processor, obviously), while the PS3 has a Cell processor with a single two-threaded PowerPC core and seven SPEs, Synergistic Processing Elements (basically DSP-like SIMD machines).

In the game business, it is clear that the Xenon CPU is considered easier to code for. This means that even though the Cell processor clearly has higher theoretical raw performance, in practice the two machines are about equal in power, since it is harder to make full use of the Cell. That seems to be an established fact.

So here, homogeneous systems do appear to have it easier among programmers. However, I do not believe that this extends to all systems, all the time, everywhere.

Continue reading “Homogeneous and Heterogeneous Multicore vs Programmers”

Parallel Processing Requires Parallel IO

One common use case for multicore processing on the desktop and elsewhere is “doing many things at the same time”. You could be running many user-interface programs at once, like the “typical teenager of today” template of tens of IM clients, web sessions, email conversations, music and video players, movie downloads, etc. Or it could be more business-like: background indexing of hard drives, backups being taken, large business files downloading, software compiling, source code repositories updating, etc.

I have been working in both of these modes to some extent, and the main problem with them, at least on a PC, is that while the processors might be good at multitasking and sharing the CPU load, my IO system is annoyingly non-parallel.

Continue reading “Parallel Processing Requires Parallel IO”

Applications that can make use of more compute power (e.g., iPod Video)

A question that pops up quite often when computer architects and representatives from firms like Intel encounter a crowd today is: “just what do you need more computing power for?”. Most regular users are fairly happy with the speed at which they process words, surf the web, read email, make IP phone calls, crunch numbers in Excel, and perform other common tasks. It is hard to perceive the need for more speed in everyday tasks, unlike a decade or two ago when you could definitely ask for improvement. I remember scrolling a page in PageMaker on a Mac SE (8 MHz 68000): you counted the clicks and waited for the screen to jump, redraw, jump, redraw, stabilize… quite a different experience from working with a modern computer, where far more complex software still responds instantaneously to almost anything you do.

Continue reading “Applications that can make use of more compute power (e.g., iPod Video)”

SICS Multicore Day 2007 – More on Programming

Some more thoughts on how to program multicore machines that did not make it into my original posting from last week. Some of this was discussed at the multicore day, and other parts I have been thinking about for some time now.

One of the best ways to handle any hard problem is to make it “somebody else’s problem”. In computer science this is also known as abstraction, and it is a very useful principle for designing more productive programming languages and environments. Basically, the idea I am after is to let a programmer focus on the problem at hand, leaving somebody else to fill in the details and map the problem solution onto the execution substrate.
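As a small illustration in C with OpenMP (my example, not one from the talks): the programmer only asserts that the loop iterations are independent, and the compiler and runtime, the “somebody else”, decide how to map them onto whatever cores exist.

```c
/* The programmer states the *what* (the iterations are independent);
   the OpenMP compiler and runtime own the *how*: thread count,
   chunking, and the mapping onto however many cores the machine has. */
void scale(float *data, int n, float factor)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        data[i] *= factor;
}
```

Compile with -fopenmp (gcc) and the same source runs unchanged on one core or sixteen; the mapping is, precisely, somebody else’s problem.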

Continue reading “SICS Multicore Day 2007 – More on Programming”

SICS Multicore Day August 31

The SICS Multicore Day August 31 was a really great event! We had some fantastic speakers presenting the latest industry research view on multicores and how to program them. Marc Tremblay gave the first presentation in Europe of Sun’s upcoming Rock processor. Tim Mattson from Intel tried hard to provoke the crowd, and Vijay Saraswat of IBM presented their X10 language. Erik Hagersten from Uppsala University provided a short scene-setting talk about how multicore is becoming the norm.

Continue reading “SICS Multicore Day August 31”

Real-Time in Sweden (RTiS) 2007

RTiS 2007 just took place in Västerås, Sweden. It is a biennial event where Swedish real-time research (and that really means embedded in general these days) presents new results and summarizes the work of the past two years. For someone who has worked in the field for ten years, it really feels like a gathering of friends and old acquaintances, plus always some fresh new faces. Due to a scheduling conflict, I was only able to make it to day one of two.

I presented a short summary of a paper that a colleague at Virtutech and I wrote last year together with Ericsson and TietoEnator, on the Simics-based simulator for the Ericsson CPP system (see the publications page for 2006, and soon for 2007). I also presented the Simics tool and demoed it in the demo session. Overall, it was nice to talk to the mixed academic-industrial audience.

Continue reading “Real-Time in Sweden (RTiS) 2007”