SiCS Multicore Days: The Debate Points

It has been a week now, and sometimes it is good to let impressions sink in and get processed a bit before writing about an event like the SiCS Multicore Days. Overall, the event was serious fun; I found the speakers very insightful, and the panel discussion and audience questions added even more information.

What was quite striking this year was the greater difference of opinion between the speakers. I guess that in 2007, most of the discussion was on the level of “ouch, here comes multicore and what are we going to do about it”. This year we got a bit deeper: with one more year of experience and massive research work, the collective world of multicore has made some progress and gained insights. And that is when the differences start to show up; the fact that we have differences of opinion tells us that we are starting to dig into details and are turning up different answers due to different viewpoints and user experiences.

So where were the differences this time?

  • Heterogeneous vs homogeneous cores (on a single chip). Kunle Olukotun clearly supported the heterogeneous style (which is what you get with Sun’s Niagara, for which he designed the basis). Erik Hagersten was more interested in the difference between thin and fat cores sharing the same basic ISA, while Anant Agarwal was strongly in favor of completely homogeneous systems (which is what they build at Tilera). In my biased view, the pure energy-efficiency argument for heterogeneity is always going to prevail. See some of my previous blog posts on this topic for background.
  • Domain-specific vs general-purpose programming languages. The same sides here, with Kunle advocating domain-specific languages, and Anant and David Padua more in the general-purpose camp. I like domain-specific better; it seems to rhyme more with what I see people actually doing today to increase overall programming productivity.
  • Memory bottleneck or not? The most interesting discussion came when memory bandwidth and cache sizes were discussed. One quite common school of thought over the past few years teaches that caches per core will shrink, and that the bandwidth for getting data into and out of a chip is going to be a severe restriction on what can be done. Not everyone on the panel agreed: there was the idea (mostly from Kunle) that the massive bandwidths and low latencies achievable within a chip (compared to between chips in a classic multiprocessor built from discrete processors) could make this less of a problem. Personally, I think this is going to be some kind of problem, but maybe not as severe a one, since passing data around faster might reduce the need to store it temporarily. Despite the need for more bandwidth, nobody really agreed with Erik’s thought that maybe it makes sense to build chips that do not max out on the number of cores they contain, but rather try to balance core count with achievable IO bandwidth (a back-of-envelope sketch of that balance follows this list). That idea has some merit.
  • Core counts. Moore’s law tells us there are going to be thousands of cores on a chip fairly soon… but if we do not manage to make good use of them, maybe the growth in core counts will slow soon. Putting four or six or eight cores into a general-purpose system makes sense today, but more than that might turn out to be a waste for the vast majority of users who do not have problems to solve or programs to run that can make use of more than that. In the same vein, maybe it is better to have slightly fewer, more powerful cores than the maximum number of minimalistic cores, considering the state of the software available today. So it sounds like a fairly divergent future here.
  • Shared memory or local memories? Most of the panel seemed to be in the camp proposing that shared memory is too convenient not to have, even when it really is bad for you. There were several bad jokes comparing shared memory to alcohol, and the moderator of the panel suggested that a good way to avoid the hangover of shared memory is to stay drunk… whatever that means in practice.
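To make Erik’s balancing argument a little more concrete, here is a back-of-envelope sketch in C. All the numbers in it (total off-chip bandwidth, per-core demand) are made-up illustrative assumptions, not figures from the panel; the point is only that per-core bandwidth shrinks as core counts grow, while the pins on the package do not multiply at the same rate.

    /* Back-of-envelope sketch: how per-core memory bandwidth shrinks as
     * core counts grow. All numbers are illustrative assumptions. */
    #include <stdio.h>

    int main(void)
    {
        const double chip_bandwidth_gb_s = 25.0;  /* assumed total off-chip bandwidth */
        const double per_core_demand_gb_s = 2.0;  /* assumed demand of one memory-hungry core */
        const int core_counts[] = {4, 8, 16, 32, 64};

        for (size_t i = 0; i < sizeof core_counts / sizeof core_counts[0]; i++) {
            double per_core = chip_bandwidth_gb_s / core_counts[i];
            printf("%2d cores: %5.2f GB/s per core -> %s\n",
                   core_counts[i], per_core,
                   per_core >= per_core_demand_gb_s ? "enough" : "bandwidth-starved");
        }
        return 0;
    }

With these assumed numbers the chip is balanced at around a dozen cores; beyond that, adding cores mostly adds silicon waiting for memory, which is exactly the trade-off Erik was pointing at.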

Some things were generally agreed upon, though.

  • Programming is an issue, shared memory or local memory or whatever. The ideas about how to solve it varied, however, as discussed above.
  • Cores will be plentiful, and operating systems that focus on sharing time on a single very valuable core are an idea of the past. The keyword for the future is spatial sharing, along with reducing the overhead of management (I have some previous blog posts on this topic, especially on IMA and real-time control when cores are free); a small core-pinning sketch after this list illustrates the idea.
  • Virtualization and the ability to isolate partitions of a multicore chip from each other are necessary mechanisms. Running multiple different operating systems on a single chip will be quite normal, probably under the control of some global hypervisor.
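As a minimal user-level sketch of what spatial sharing can look like, here is a small Linux/pthreads example that pins each worker thread to a core of its own instead of letting the scheduler time-share everything. The core numbers are arbitrary, and the GNU affinity extension is an assumption about the platform, not something discussed at the event.

    /* Minimal sketch of spatial sharing: give each worker a core of its own
     * instead of time-sharing all cores among all threads.
     * Assumes Linux with the GNU pthread affinity extension; build with -pthread. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    static void *worker(void *arg)
    {
        long core = (long)arg;
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET((int)core, &set);                /* restrict this thread to one core */
        pthread_setaffinity_np(pthread_self(), sizeof set, &set);

        /* ... the real work would go here, with the core to itself ... */
        printf("worker pinned to core %ld\n", core);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, (void *)1L);  /* arbitrary core 1 */
        pthread_create(&t2, NULL, worker, (void *)2L);  /* arbitrary core 2 */
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }

When cores are plentiful, deciding who runs where shrinks to a one-time assignment like this rather than continuous scheduling, which is the point of the spatial-sharing argument.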

Any comments on this from my small audience? I think the topics under discussion are quite fascinating and the kind of issues on which the success of major chip design projects will be decided. A good architecture with a good programming model has a great chance of success (as long as it looks like a continuation of something existing :)).

7 thoughts on “SiCS Multicore Days: The Debate Points”

  1. Re homogeneous vs heterogeneous architecture. It seems that you are saying that, given a certain piece of software, it is possible to assemble custom hardware that runs that software more energy-efficiently than some general-purpose hardware. I guess this is hard to argue with.

    In this interview, http://arstechnica.com/articles/paedia/gpu-sweeney-interview.ars/, Tim Sweeney seems to argue that homogeneous architectures are better because “…it could dramatically simplify the toolset and the processes for creating software.”, and heterogeneous is worse because “…a lot of the complexity is unnecessary and makes load-balancing more difficult.”

    Perhaps it would be possible to compile a list of arguments for and against homogeneous/heterogeneous architectures (hopefully arguments that everyone can agree on), and then use those arguments to reason about which architecture is better for running different sets of software.

  2. Thanks for the link!

    I can see his point, but it is equally important to recognize why GPUs are not the same as CPUs today: if it were as simple as ease of programming trumping raw power, the GPU would be dead. But even today’s fairly general GPUs are orders of magnitude more efficient than general-purpose processors at churning through their target workloads. And nothing is going to change that.

    As I see it, an important facet of programming is that form follows function: a good program should be designed for the environment it is going to work in and the manipulations and computations it is supposed to achieve. For graphics, this means that program structure is still quite domain-driven, which can be exploited by domain-specific architectures. Not being domain-specific is really going to make the hardware quite inefficient.

    Just look at how much more efficiently a multithreaded architecture cuts through web-server workloads compared to single-threaded processors. A “general-purpose” computer is a Swiss Army knife: decent at a lot of things, great at nothing. When you need true greatness, you specialize the architecture to suit the domain.

    Also, as soon as power and efficiency per unit of chip area become a real issue (which they really are not on a PC, where even laptops have quite generous power budgets of 60 to 90 W, and desktops reach up to 900 W today, I heard), heterogeneity becomes much more attractive. On battery power, specialization really helps.

  3. Hi Jakob,
    I’m glad to see you liked the panel debate.
    On the issue of heterogeneous vs homogeneous: there is one thing that is often overlooked, the issue of cost vs benefit when it comes to chip design. Designing a new chip costs roughly the same irrespective of its nature (at least there isn’t an order-of-magnitude difference), but the more specialized a chip is, the smaller the market it can address. Hence, I would argue that more generic chips with good interconnects, memory architecture etc. will have greater economic viability.
    I’ll soon post an article on this to my blog as well, http://www.a-vajda.eu/blog
    Cheers,
    Andras (the one who moderated the panel)

  4. Andras, that is a good point, but I think it overlooks the running costs of the chip. If you can get an order of magnitude better power efficiency, the cost might be worth it. The alternative is quite often that you cannot keep up at all: no CPU can do routing or graphics rendering or pattern matching quite as fast as specialized hardware.

    The cost of designing in an existing accelerator is often quite small for a particular chip design. Most chips are mostly recombinations of existing IP, not brand-new from-scratch development.

    But it is a sliding scale of economics, and weighing the chip design cost against run-time energy efficiency, processing latencies, and total system cost is not an easy tradeoff. I think that a general system will always be less efficient, and the question is really when that lack of efficiency makes it inapplicable and therefore cuts off a part of the market that a more specialized chip can address. There is a reason that Intel chips are rare in embedded systems: 100 W is simply a bit much to stomach for many systems just to get a few processor cores.

    Too bad that question never got asked that directly in the panel.
