Probably thanks to the yearly Mobile World Congress, there has been a slew of announcements of mobile application processors recently. Everything is ARM-based, but the chips show quite some variety in the CPU core configurations used. Indeed, I think this variety has something to say about the general state of multicore.
My starting point is the marketing hype surrounding core counts (something we saw in PCs a few years ago). An “octacore” device sure sounds powerful, right? Beats an old dual-core any day, right? To some extent this is just hype, but it also shows that there is a real need for more performance in phones and tablets. I have also been looking for a new TV lately, and was quite flabbergasted to see several brands touting dual-core processors as a key feature of their offerings. Processor specs for selling a dumb thing like a TV? Things are weird. Sure, Tim Cook of Apple blasted this as being stupid marketing “because they cannot provide a great experience”, but the truth of the matter is that when vendors compete within a single software ecosystem, hardware specs are what they compete on. The experience of Android or Windows is mostly the same across devices, so differentiation has to come from the hardware, and core counts are easy to understand. Just like horsepower in cars.
But the key question is: can we really use more cores than two or four in any meaningful way?
Looked at objectively, there are four ways to improve performance of the processor portion of an SoC:
- More cores
- Better microarchitecture
- Higher clock frequency
- Allow the device to go hotter and use more power
In ARM land, there is clearly room for all four, while in Intel PCs, it is hard to make more than minor improvements in anything except core count. So the mobile space is a more interesting competitive arena than most. Microarchitecture is where Qualcomm and Apple have shown that you can do better than ARM's standard cores.
Some smart and brave souls at ST-Ericsson (one of whom I met many years ago and have great respect for) put out an interesting whitepaper on the quad-core mania, making the case that their new very-high-clock-frequency dual Cortex-A9 is more useful than a lower-clocked quad-core Cortex-A9. Basically, software typically does not make good use of multiple cores, while driving the clock frequency higher immediately accelerates the software that people really care about. When it comes to reducing the latency of operations, higher clocks tend to be hard to beat. Still, some marketing person at ST-Ericsson decided that quad is the word of the day, and dubbed their NovaThor L8580 2.5 GHz dual-core an “eQuad”, for something equivalent to a quad. Silly. Had the core been a bit more modern, I think this could have been a real winner. For most real applications, it should be much, much faster than the last-generation 1.0 to 1.5 GHz Cortex-A9 dual-cores or quad-cores.
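To put rough numbers on the dual-versus-quad argument, here is a minimal back-of-the-envelope sketch. It is my own illustration, not something from the whitepaper. It assumes Amdahl's law, performance that scales linearly with clock frequency on the same Cortex-A9 core, and a quad running at 1.5 GHz (the top of the range mentioned above). The function name and the chosen parallel fractions are made up for the example.

```python
def relative_speed(cores: int, clock_ghz: float, parallel_fraction: float) -> float:
    """Speed relative to one core at 1 GHz, using Amdahl's law.

    Assumption: performance scales linearly with clock frequency, and the
    parallel part of the work splits perfectly across all cores.
    """
    serial_fraction = 1.0 - parallel_fraction
    return clock_ghz / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Compare a 2.5 GHz dual-core against a 1.5 GHz quad-core for
    # software with varying degrees of parallelism.
    for p in (0.0, 0.5, 0.8, 0.95):
        dual = relative_speed(2, 2.5, p)
        quad = relative_speed(4, 1.5, p)
        winner = "dual" if dual > quad else "quad"
        print(f"parallel={p:.2f}  dual={dual:.2f}  quad={quad:.2f}  -> {winner}")
```

Under these assumptions, the quad only pulls ahead when nearly all of the work is parallel, which is rarely the case for interactive client software.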
Another interesting new release was the Renesas Mobile MP6530. It is a chip targeting midrange smartphones, using an ARM bigLITTLE setup with two Cortex-A15 and two Cortex-A7 cores in a single core complex. What I found interesting here was that, for the first time, I saw a bigLITTLE setup described as activating both types of cores at once (look at the diagram on the press release page). Until now, I have only seen designs that switch between using only the A7 cores and using only the A15s. It was just a matter of time before software matured to this point (the OS scheduler needs to understand that it is dealing with different types of cores, and behave accordingly), but it is still good to see it happening. I expect the same idea to be used on other mixed-core setups; it would be strange if it were Renesas-specific.
This setup makes eminent sense to me. Having lived with multicore PCs for several years now, it is clear to me that having more than one core is useful in any multitasking environment. There is enough work simmering even in a lightly used system that using two cores clearly reduces latencies and provides a smoother experience with less locking-up of the UI. As long as nothing heavy is running, you can use the two power-efficient A7s and get a nice long battery life. What is interesting is what happens when a heavy task with high demands on latency appears. I think a good example is a web browser doing a rendering pass or running a JavaScript application, or a game with heavy AI (which seems to be pretty serial in general). In this case, to get a snappy user experience, you can kick in a powerful core at high frequency and get the job done quickly. Perfect: just activate one of the A15 cores and keep the rest of the background noise running on the A7s. If something really heavy comes along, you activate the second A15 core too.
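The placement policy described above can be sketched in a few lines. This is a toy model of my own and not how any real OS scheduler works. The `Task` type, the load numbers, and the assumption that an A15 has about three times the capacity of an A7 are all hypothetical.

```python
from dataclasses import dataclass

A7_CAPACITY = 1.0   # one A7 core is the unit of capacity (assumption)
A15_CAPACITY = 3.0  # assumed: an A15 is roughly three times as fast


@dataclass
class Task:
    name: str
    load: float  # hypothetical demand, in units of one A7 core


def place(tasks, little=2, big=2):
    """Map tasks to cores: light work on A7s, heavy work on A15s.

    A15 cores are filled in order, so a second A15 is only used
    once the first one has no room left.
    """
    littles = [f"A7-{i}" for i in range(little)]
    bigs = [f"A15-{i}" for i in range(big)]
    cores = {name: [] for name in littles + bigs}
    load = {name: 0.0 for name in cores}

    # Place the heaviest tasks first so they get first pick of the big cores.
    for task in sorted(tasks, key=lambda t: t.load, reverse=True):
        if task.load <= A7_CAPACITY:
            # Background noise: spread over the power-efficient cores.
            target = min(littles, key=lambda c: load[c])
        else:
            # Heavy task: first big core with room, else the least loaded one.
            fits = [c for c in bigs if load[c] + task.load <= A15_CAPACITY]
            target = fits[0] if fits else min(bigs, key=lambda c: load[c])
        cores[target].append(task.name)
        load[target] += task.load
    return cores


if __name__ == "__main__":
    demo = [Task("mail sync", 0.2), Task("music", 0.3), Task("browser render", 2.5)]
    for core, names in place(demo).items():
        print(core, names if names else "(idle, can stay powered down)")
```

With only one heavy task in the mix, the second A15 never receives any work and could stay powered down, which is exactly the behavior that makes the 2+2 configuration attractive.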
So far, so good. But what about the new octacores sporting four Cortex-A15 cores alongside four Cortex-A7 cores? To me, this seems strange. Assuming that we are able to use any combination of cores, this setup would only make sense if four A7s could somehow do a better job than a single A15 while using less power. I heard some numbers indicating that the magic difference is about a factor of 3 in performance… so in that case, just what are all these A7s there for? It would make more sense with a hexacore device: two A7s for background noise, and then you turn on one to four A15 cores to process progressively heavier software tasks. I have no doubt that we can imagine and invent software that can make good use of any number of powerful cores, but where is the case for a large number of weak cores in a client machine (it makes perfect sense on a server)?
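The four-A7s-versus-one-A15 question can be made concrete with a little arithmetic. This is a rough sketch under assumptions of my own. It takes the factor of 3 above at face value, applies Amdahl's law, and looks at speed only (power is not modeled). If a fraction p of a task is parallel, n little cores finish it in r * (1 - p + p/n) times the time of one big core that is r times faster, so they only break even when p >= (1 - 1/r) / (1 - 1/n).

```python
def break_even_parallel_fraction(little_cores: int, perf_ratio: float):
    """Smallest parallel fraction at which the little cores match one big core.

    perf_ratio is how many times faster one big core is than one little core
    (assumed to be about 3 here). Returns None when the little cores can
    never catch up, even with perfectly parallel work.
    """
    if little_cores <= 1:
        return None
    p = (1.0 - 1.0 / perf_ratio) / (1.0 - 1.0 / little_cores)
    return p if p <= 1.0 else None


if __name__ == "__main__":
    for n in (2, 3, 4):
        p = break_even_parallel_fraction(n, 3.0)
        if p is None:
            print(f"{n} A7s: never as fast as one A15")
        else:
            print(f"{n} A7s: need a parallel fraction of at least {p:.2f}")
```

Under these assumptions, two A7s can never match one A15, and four A7s need a workload that is almost entirely parallel, which supports the suspicion that the extra little cores rarely pay off on a client device.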
Considering the existence of a reference platform from ARM with two A15s and three A7s, you can clearly mix the numbers. No need for the same number of A7s and A15s. Maybe this will change as the use of truly heterogeneous multiprocessing spreads within the ARM ecosystem, while right now the safe bet is a quad + quad setup where you switch from one cluster to another. I have no real data and can only speculate.
What I would like to see done is to run long-term CPU profiling on an all-cores-active-all-the-time bigLITTLE platform and see just how much use you get out of each type of core. Unfortunately, I do not have access to such a platform right now, and maybe nothing like it will be on the market for quite some time.
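As a starting point for that kind of profiling, here is a minimal sketch of per-core utilization sampling. It assumes a Linux-based device (such as Android) that exposes the usual per-CPU counters in `/proc/stat`. Which `cpuN` entries are A7s and which are A15s is platform-specific and not something this code can know. The function names are mine.

```python
def parse_proc_stat(text):
    """Return {cpu_name: (busy_ticks, total_ticks)} for each per-core line."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip non-CPU lines and the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Fields: user nice system idle iowait irq softirq steal
        ticks = [int(x) for x in fields[1:9]]
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)
        total = sum(ticks)
        result[fields[0]] = (total - idle, total)
    return result


def utilization(before, after):
    """Fraction of time each core was busy between two snapshots."""
    util = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        elapsed = total1 - total0
        util[cpu] = (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return util


if __name__ == "__main__":
    import time

    # Take two snapshots ten seconds apart and print per-core utilization.
    with open("/proc/stat") as f:
        first = parse_proc_stat(f.read())
    time.sleep(10)
    with open("/proc/stat") as f:
        second = parse_proc_stat(f.read())
    for cpu, u in sorted(utilization(first, second).items()):
        print(f"{cpu}: {u:.1%} busy")
```

Logging these samples over days of normal use, and grouping them by core type, would show how often the big cores are actually needed.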
In conclusion, I think there is some merit to bumping up mobile processors to at least quad, and there is definitely potential in mixing fast and slow cores. Making good use of eight cores seems a bit unlikely though (unless you switch between two separate clusters). One wonders if that silicon area could not have been used for something else instead.