An interesting idea that has been bandied about for a while in manycore land is that the future will bring a total inversion of today’s cost intuition for computers. Today, we are all used to the idea that processor cores and processing time are precious, while memory is free. For best performance you need to care about the cache system, but in the end the goal is to keep the processor pipelines as busy as possible. Processors have traditionally been the most expensive part of a system, and ideas such as Integrated Modular Avionics were invented to make the best use of a resource perceived as rare and expensive…
But is that really always going to be true? Is it reasonable to think of CPU cores as being free and other resources as expensive? And what happens to program and system design then?
This idea was brought up again in edition 33 of the TimeSys LinuxLink Radio podcast, in an interview with Parallax, the creators of the Propeller chip. In this chip, they abandon interrupts and instead dedicate entire processors to various IO tasks. The cores stay in a very low-power mode until something interesting happens on the IO pins, and then wake up and process it. No interrupts, no interrupt handlers, almost zero latency. Very good for real-time systems. Your intuition just screams “why not share the IO load on a single processor?” That is the old wisdom talking: processors are expensive. Having a bunch of them just waiting in deep sleep for something to happen seems absolutely nonsensical. And since the current Propeller chip only has eight cores, it seems a bit limiting, and to be crying out for timesharing and/or virtualization to handle a few more than eight tasks…
But is it really?
Consider the alternative design: a much more powerful processor running an operating system with all its associated overhead. For hard real-time control, you either use a static time-driven scheduler or use event-driven programs together with an analysis that makes sure you always meet all deadlines. These solutions have a hard time reaching 100 percent CPU usage and usually involve a fairly high interrupt latency. For really short real-time latencies, you end up vastly overprovisioning the processor so that the overhead cycles of taking an interrupt pass as quickly as possible. With caches, you either have to lock them or once again vastly overprovision the processor to meet guaranteed latencies.
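To make the "analysis that makes sure you meet all deadlines" a bit more concrete, here is a minimal sketch of one classic form of it: response-time analysis for fixed-priority preemptive scheduling on a single core. The task set, the time units, and the way interrupt overhead is simply added to every activation are all my own assumptions for illustration, not numbers from any real system.

```python
def response_time(tasks, i):
    """Worst-case response time of tasks[i], or None if it misses its deadline.

    tasks: list of (wcet, period) in integer time units, highest priority
    first; each deadline is assumed to equal the period.
    """
    wcet, period = tasks[i]
    r = wcet
    while True:
        # Every higher-priority task preempts once per release within r.
        # -(-r // t_j) is an integer ceiling of r / t_j.
        interference = sum(-(-r // t_j) * c_j for c_j, t_j in tasks[:i])
        nxt = wcet + interference
        if nxt > period:
            return None  # deadline missed
        if nxt == r:
            return r  # fixed point reached
        r = nxt


def with_overhead(tasks, overhead):
    """Charge a fixed interrupt/context-switch cost to every activation."""
    return [(c + overhead, t) for c, t in tasks]


# Hypothetical task set (wcet, period); not measured from anything real.
TASKS = [(1, 4), (2, 6), (3, 12)]

ideal = [response_time(TASKS, i) for i in range(len(TASKS))]
loaded = [response_time(with_overhead(TASKS, 1), i) for i in range(len(TASKS))]
print(ideal, loaded)
```

With zero overhead, all three tasks in this made-up set meet their deadlines. Charging a single extra time unit per activation makes the two lower-priority tasks miss theirs, which is the kind of effect that pushes you toward a much faster processor than the useful work alone would need.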
The interesting question, which I do not have really good numbers for, is which solution is the least expensive hardware-wise for a given problem of, say, ten critical real-time tasks: ten simple cores doing one thing each, or a single core shared by all ten tasks. Since it seems that, in general, making a processor ten times faster requires either ten times the clock frequency or a significantly larger die, the answer is not clear-cut. Especially not with a tough latency constraint.
If you factor in the cost of software development and validation of system functionality, I am pretty sure that the multiple simple cores approach is very competitive. Instead of analyzing a complex concurrent software system full of potentially nasty shared resources and unknown software paths, you can handle ten small programs running on simple cores with simple timing, and with shared resources that are at least obvious from the hardware design.
There will be shared resources, be sure of that. If nothing else, the memory controller will be an issue. The Propeller chip solves this by sharing memory access between the cores strictly in time, following a fixed static schedule. Predictability before all!
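A fixed static schedule of this kind is easy to reason about. The sketch below models a plain round-robin slot table and computes how long a core waits for its turn. The eight cores match the chip discussed here, but the slot length of two cycles and the rule that an access starts exactly at the beginning of the core's own slot are assumptions of mine, not figures from the Propeller documentation.

```python
def wait_cycles(core, request_cycle, n_cores=8, slot_cycles=2):
    """Cycles a core waits until the start of its next memory slot.

    Assumed model: slots rotate round-robin, core k owns the slot that
    starts at cycle k * slot_cycles in every period of
    n_cores * slot_cycles cycles, and an access can only begin at the
    start of the core's own slot.
    """
    period = n_cores * slot_cycles
    slot_start = core * slot_cycles
    return (slot_start - request_cycle) % period


def worst_case_wait(core, n_cores=8, slot_cycles=2):
    """Largest wait over every possible request time within one period."""
    period = n_cores * slot_cycles
    return max(wait_cycles(core, t, n_cores, slot_cycles) for t in range(period))


print([worst_case_wait(core) for core in range(8)])
```

The point is that the bound depends only on the schedule itself. In this model, every core has the same worst case of one period minus one cycle, no matter what the other cores are doing, which is exactly the property you want when arguing about guaranteed latencies.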
Note that the EU STREP project called MERASA is looking into something that could be related to this idea, with 2 to 16 cores per chip with lots of predictability piled on.
Selling such a chip as a standard part would seem to make sense as well, as most control systems are fairly low-volume. The Propeller chip is very small, eight cores only, but if you look at what Ambric and Picochip are doing, it is easy to envision a controller part with twenty or a hundred little cores on it. That could be really cool 🙂
If you have a high-volume part, I guess you could build that control ASIC yourself today, around something like a Tensilica pile of cores, which have already been proven to scale to hundreds of cores in custom ASICs.
A half-way design in this style is found in the Freescale 5514-15-16 parts, which combine a compute core with a smaller core to handle all the interrupts. If nothing else, this validates the fundamental idea of interrupts being an issue. Putting all interrupts on a secondary core could be seen as a bandaid on a bandaid… but it sure makes sense if the cores are not considered free.
It will be interesting to see what the future brings.