Nvidia recently announced that their already-known “Kal-El” quad-core ARM Cortex-A9 SoC actually contains five processor cores, not just the four a “normal” quad-core would have. They call the architecture “Variable SMP”, and it is a pretty smart design: the kind where you think “I should have thought of that”, which is the best sign of something truly good.
It is common practice in multicore computing today to dynamically change the clock frequency of a processor and to turn cores on and off in order to match the available compute power to the current workload. Such operations tend to be limited in scope: processors have a minimum clock frequency below which further scaling stops making sense, and the memory system often requires all cores to run at the same frequency. Operating systems also tend to want to work with homogeneous sets of cores, as that keeps scheduling reasonably straightforward. This is probably what has kept the idea of “small + large” cores of the same ISA out of mainstream SMP design, despite all its advantages in principle.
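To make the conventional approach concrete, here is a minimal sketch of a utilization-driven governor that scales a single shared clock and hotplugs cores. The frequency table, the utilization thresholds, and the names `CpuState` and `adjust` are hypothetical illustrations, not taken from any real kernel governor or from Kal-El.

```python
from dataclasses import dataclass

# Hypothetical frequency steps in MHz, shared by all active cores.
# These are NOT Kal-El's actual operating points.
FREQ_STEPS_MHZ = [216, 456, 760, 1000, 1300]
MAX_CORES = 4

@dataclass
class CpuState:
    active_cores: int
    freq_index: int  # index into FREQ_STEPS_MHZ

def adjust(state: CpuState, utilization: float) -> CpuState:
    """One governor step based on average utilization in [0.0, 1.0].

    Assumed policy: under high load, raise the clock before bringing
    another core online; under low load, take cores offline before
    lowering the clock.
    """
    cores, idx = state.active_cores, state.freq_index
    if utilization > 0.85:
        if idx < len(FREQ_STEPS_MHZ) - 1:
            idx += 1          # raise the shared clock first
        elif cores < MAX_CORES:
            cores += 1        # then bring another core online
    elif utilization < 0.30:
        if cores > 1:
            cores -= 1        # shed cores first
        elif idx > 0:
            idx -= 1          # then lower the clock
    return CpuState(cores, idx)
```

Note the floor: once the state reaches one core at the lowest frequency step, a governor like this has nothing left to cut. That is exactly the limit the fifth core is meant to push past.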
Now, Nvidia has managed to implement some of that idea in Kal-El.
The key observation is that if you can turn cores on and off, then once you get down to a single active core, the set of active cores is by definition homogeneous, regardless of what kind of core it is. Changing the nature of that core should then be much easier, since there is only a single core to contend with.
What Nvidia does in Kal-El is to add a fifth, low-power core to the main group of four high-performance cores. The fifth core is architecturally identical (ARM Cortex-A9), so the system state can be moved from a high-performance core to the low-power core without undue complexity. Indeed, this is all done in hardware, so the OS (typically Android) thinks it is running on a homogeneous quad-core. When the system is lightly loaded and the OS decides to keep only a single core on, the hardware can detect that the load is really light and effectively swap the active core for a low-power-optimized version.
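The switching decision, together with the reverse switch back to a main core when load grows again, can be sketched as a small state machine. This is a guess at the logic as described, not Nvidia's implementation; the thresholds, the hysteresis band, and the names `Cluster` and `next_cluster` are assumptions.

```python
from enum import Enum

class Cluster(Enum):
    LOW_POWER = "companion"   # the fifth, low-power Cortex-A9
    PERFORMANCE = "main"      # the four high-performance Cortex-A9 cores

# Hypothetical thresholds, as a fraction of one core's capacity.
# The gap between them provides hysteresis so the hardware does not
# bounce between cores when the load hovers around a single value.
ENTER_LOW_POWER_BELOW = 0.25
LEAVE_LOW_POWER_ABOVE = 0.60

def next_cluster(current: Cluster, os_online_cores: int, load: float) -> Cluster:
    """Decide which physical core(s) back the OS's view of the CPU."""
    if os_online_cores > 1:
        # Migration only makes sense with a single active core; with
        # more cores online the system stays on the homogeneous main cores.
        return Cluster.PERFORMANCE
    if current is Cluster.PERFORMANCE and load < ENTER_LOW_POWER_BELOW:
        return Cluster.LOW_POWER
    if current is Cluster.LOW_POWER and load > LEAVE_LOW_POWER_ABOVE:
        return Cluster.PERFORMANCE
    return current
```

The OS never calls anything like `next_cluster` itself: it only decides how many cores are online, and the (assumed) hardware policy maps that decision onto physical cores.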
Once more compute power is needed, the hardware invisibly slips back to the first high-performance core, and the OS can then start increasing clocks and turning on cores as usual. It is effectively the same as a regular ARM Cortex-A9 quad-core setup, but with better low-power performance. The following graph from the Nvidia white paper shows it pretty clearly (the red text is my added comment):
Note the slope of the green line (the low-power core): that core is not a good one if you want high performance. It is optimized to scale within a range of low compute-power requirements, rather than to provide the best performance per watt at the high end. Using Variable SMP, Nvidia lets us have both.
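Why having both core types pays off can be sketched numerically: given a power-versus-performance curve for each core type, pick whichever needs less power for the required performance. The sample points below are invented to mimic the general shape described (a cheap but steep curve for the low-power core, a flatter curve with a higher starting point for the main cores); they are not read off Nvidia's graph, and the names `power_at` and `cheaper_core` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical (performance, power) samples in arbitrary units,
# NOT data from the Nvidia white paper.
LOW_POWER_CURVE = [(0.0, 0.05), (1.0, 0.20), (2.0, 0.70)]
HIGH_PERF_CURVE = [(0.0, 0.30), (1.0, 0.45), (2.0, 0.65), (4.0, 1.20)]

def power_at(curve, perf):
    """Linearly interpolate power at a performance level.

    Returns None if the core cannot deliver that performance.
    """
    perfs = [p for p, _ in curve]
    if perf < perfs[0] or perf > perfs[-1]:
        return None
    i = bisect_left(perfs, perf)
    if perfs[i] == perf:
        return curve[i][1]
    (p0, w0), (p1, w1) = curve[i - 1], curve[i]
    return w0 + (w1 - w0) * (perf - p0) / (p1 - p0)

def cheaper_core(perf):
    """Return which core type needs less power for the given performance."""
    low = power_at(LOW_POWER_CURVE, perf)
    high = power_at(HIGH_PERF_CURVE, perf)
    if high is None:
        raise ValueError(f"no core can deliver performance {perf}")
    if low is None:
        return "high-performance"  # beyond the low-power core's range
    return "low-power" if low <= high else "high-performance"
```

With these made-up curves, the low-power core wins at light loads, and the crossover falls just below a performance level of 2.0, after which the main cores are cheaper as well as faster.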
- ArsTechnica has a short summary
- There does not seem to be much more right now; everyone is really just reiterating the points from the white paper.