An old colleague just sent me an email bringing up a discussion we had last year, where he was a strong proponent for the homogeneous model of a multiprocessor. The root of that discussion was the difference between the Xbox 360 and Playstation 3 processors. The Xbox 360 has a three-core, two-threads-per-core homogeneous PowerPC main processor called the Xenon (plus a graphics processor, obviously), while the PS3 has a Cell processor with a single two-threaded PowerPC core and seven SPEs, Synergistic Processing Elements (basically DSP-like SIMD machines).
In the game business, it is clear that the Xenon CPU is considered easier to code for. This means that even though the Cell processor clearly has higher theoretical raw performance, in practice the two machines seem to be about equal in power, since it is harder to make full use of the Cell.
So here, homogeneous systems do appear to have it easier among programmers. However, I do not believe that that extends to all systems, all the time, everywhere.
From a strict hardware properties perspective, it is clear that the more specialized a solution is, the more efficient it is going to be. It is going to use less energy and/or take less time to do a particular task. For example, it is common wisdom in signal processing that an ASIC (or FPGA) with a certain function hard-wired into it is always the fastest and most efficient solution. A DSP is more flexible but still far more efficient than a general-purpose processor. The same holds in the physical world. A Swiss army knife is certainly versatile, but also a fairly lousy knife if considered as just a knife. There is a reason good craftsmen always have lots of obscure specialized tools, and that is that a specialized tool always does the job best when used by a skilled person.
So where does this leave computer architecture? Some people would like all processors in a system to be the same, since that supposedly simplifies the programming problem. For example, Tom Leonard at Valve claimed in a presentation at GDC 2007 that a single type of processor is preferable to the current PC mix of general CPU, graphics processor, physics processor, AI processor, etc. That might well simplify their particular programming problem, which is games on general-purpose PCs.
On the other hand, heterogeneous hardware systems seem to be all the rage right now. The Cell processor is certainly the most publicized case and one of the mainstream trail blazers, but it is by no means alone in advocating a heterogeneous hardware architecture as the best solution to real-world problems. In the PC mainstream, AMD and Intel are both looking into combining graphics processor cores with general processor cores on the same die. Sun’s Niagara 2/UltraSparc T2 chip adds security accelerators to a server processor.
In the embedded field, you have the classic mobile phone chips like the Texas Instruments OMAP series, where an ARM core is combined with one or more DSP cores, as well as IO peripherals and accelerators for various tasks like media processing. Many other vendors have similar offerings for mobile devices; it is not just TI.
In more stationary embedded systems, the Freescale MPC8572 is a typical design, where you have multiple processor cores and various smart accelerators on a chip. The processor cores do not dominate the area; rather, the peripherals and accelerators do. Other chips with the same type of mix of IO units, accelerators and offload engines, and processor cores are PA Semi's PA1682M and the various Octeon devices from Cavium.
What all of these devices tend to feature are broadly speaking five types of hardware blocks:
- Fully programmable general-purpose processors like x86, Power, MIPS, ARM.
- Fully programmable domain-oriented processors like DSPs and GPUs, where you have programs that run self-contained but usually under the control of a general-purpose processor.
- Configurable advanced accelerators like pattern-matching engines and security engines and MPEG decoders. They can complete significant amounts of work by themselves, and take over entire functions from the software, but the overall sequencing of tasks is controlled by a programmable processor.
- Fixed-function hardware devices like Ethernet controllers (that can have a fair bit of smarts around them, including TCP/IP offload), serial controllers, PCI controllers, which are configured and driven small step by small step from the processor.
- Infrastructure devices that are needed in order to make a system run at all, like interrupt controllers, memory controllers, coherency controllers, timers, and debug support. All systems need them, and their properties are not really differentiators (other than the bandwidth of the memory controller).
I think there is no way that all these categories are going to collapse into a single type of general-purpose processor core. Since chip customers always want to build their hardware with the least number of chips, space on the few chips actually being used is going to be at a premium. If the choice is between a large number of features and high peak performance using specialized processors and accelerators on the one hand, and a single general core on the other, I think most of the time the general core is going to lose. As long as you have some idea of the application domain of your chip, the specialized accelerators and processors are the better choice.
Also, some types of problems are singularly unsuitable for general-purpose processors with their sequential programming model with a small set of registers and a large serially-accessed memory. For example, looking up a value in a table is very easy to do in parallel in hardware, and very inefficient to do in software. Even if you have multiple cores attacking the lookup issue, it cannot approach the speed of inherently parallel raw logic. Work that has to be performed very regularly with minimal jitter, like clocking bits onto a communications bus, is also hard to do well on a processor and very easy to do in hardware. There are many other such examples.
So I think hardware designers exercise good sense when they keep insisting on heterogeneous system designs and System-on-Chip designs. The heterogeneous systems might be perceived as harder to program, but they gain so much efficiency that there simply is no choice. It is not simply the case of hardware designers making their job easier at the cost of software programmers.
Furthermore, heterogeneous systems can actually be easier to architect programs for if the structure of the hardware maps well to the domain. In a sense, the hardware comes with the software architecture predefined by the hardware structure. Dividing up graphics onto a GPU and AI onto a CPU makes sense for programming games, for example. Doing signal processing on a set of dedicated DSPs and leaving the coordination and control task to a CPU is sensible as a system architecture for a mobile phone base station.
If the hardware architecture is bad for the task at hand, that is obviously bad, but it is really a question of choosing the right architecture for the application rather than an inherent problem with specialized architectures. A car is pretty poor at crossing oceans, and boats work pretty badly on the highway. Dividing up work clearly onto different pieces of hardware tends to isolate faults and reduce the possible negative interference between tasks, since it reduces the number of shared resources. Shared resources are ever the bane of performance and stability for parallel systems. Dedicating various pieces of hardware to different tasks also helps solve the IO issue that I discussed earlier.

Heterogeneous systems force you to use different styles of expressing your algorithms for different types of processing elements; a DSP programmer has to think differently from a general-purpose programmer. But I think that most of the time, that extra complexity buys you so much performance that it is worth it.
This question of different types of programming skills also means that the size of the project is relevant for how heterogeneous an architecture can be. With more programmers involved, it is easier to have programmers specialize in different types of architectures. Thus, a large project can likely do a better job of using a large number of different processing elements than a small project, where there is less scope for specialization. Quite simply, the more you do something, the better you get at it.
I think one of the main reasons why people consider heterogeneous systems harder to program is that the programming tools are usually poorer for more specialized architectures. Since there are fewer users of a Freescale QUICCEngine than of an x86 CPU, the tools will be fewer and usually less rich in features. If your background is programming desktop Windows systems, the tool support for programming a Cell SPE will strike you as quite poor in comparison. But that is not sufficient reason to give up on heterogeneous systems… rather, it is a call for standardization of how to access the same types of functionality across systems and preferably across vendors. The more different systems that use the same programming interface, the better and richer the tools will be.
So where does this long post (or maybe one should call it a short article) leave us in terms of conclusions? I think the following will always hold true:
- Heterogeneous systems are here to stay for simple efficiency reasons.
- Heterogeneous systems typically provide the best hardware support for a particular application domain.
- Programming heterogeneous systems is, in essence, really no more difficult than programming homogeneous systems.
- Tools for homogeneous systems derived from popular desktop and server computing systems tend to be better than tools for other systems, and this makes the homogeneous systems apparently easier to program.