More from the SiCS Multicore Day 2008.
There were some interesting comments on how to define efficiency in a world of plentiful cores. The theme from my previous blog post, “Real-Time Control when Cores Become Free”, came up several times during the talks, panels, and discussions. It seems that this year, everybody agreed that we are heading toward 100s or 1000s of “self-respecting” cores on a single chip, and that with that kind of core count, it is not too important to keep them all busy at all times at any cost. As I stated earlier, cores and instructions are now free, while other aspects are limiting, turning the classic optimization imperatives of computing on their head. Operating systems will become more about space-sharing than time-sharing, and it might make sense to dedicate processing cores to the sole job of impersonating peripheral units or doing polling work. Operating systems can also be simplified when the job of time-sharing is taken away, even if communications and resource management might well bring in some new and interesting issues.
So, what is efficiency in this kind of environment?
It was clear from both the panel discussion and discussions over lunch that programmer productivity and predictability are things that can be traded against an absolute 100% load on all cores. Just as making 100% use of main memory is not usually a design goal today, making 100% use of all processor cores is not a reasonable goal tomorrow. Some resources are so plentiful that it makes sense not to try to push usage to the limit.
With 100s of cores, it is quite likely that even for the most performance-demanding loads, like LTE decoding, it is not worth the herculean effort to get all cores running at full speed all the time. Getting 80% to 90% of the cores working on a workload is probably a good tradeoff.
Another tradeoff you can make is to increase determinism and debuggability by assigning tasks and schedules in a more static and predictable way. Instead of trying to balance loads across the cores, tasks could be assigned in some static or semi-static manner, so that the execution of a system can be repeated with some chance of success. That should not be too hard if all cores run a static cyclic scheduler, for example, or even a single task on each core. Dynamic scheduling might well be a global suboptimization in a world with plenty of cores, as it just makes things more complex for a fairly small increase in actual efficiency. You could also imagine putting debug agents and code on certain cores just to help you get better insight into what the system is doing. A bit like I blogged about after last year’s Multicore Day, when I asked designers to put more silicon into debug functionality. Maybe in a device with 100s of cores, we allocate some cores to debug as well (I do not think we can do without dedicated debug circuitry, as that is needed for things like stopping cores quickly).
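To make the static-assignment idea concrete, here is a minimal Python sketch (my own illustration, not anything shown at the event; the task names and core count are made up) of building a fixed task-to-core schedule once, up front, and then running it as a cyclic executive so that every cycle produces the same trace:

```python
# Sketch of a static cyclic schedule: each core runs a fixed, repeating
# task list, so execution order is reproducible run after run.
# Task names and core count are hypothetical examples.

def make_static_schedule(tasks, num_cores):
    """Assign tasks to cores round-robin at build time, not at run time."""
    schedule = {core: [] for core in range(num_cores)}
    for i, task in enumerate(tasks):
        schedule[i % num_cores].append(task)
    return schedule

def run_cycle(schedule):
    """One major cycle: every core executes its slots in a fixed order."""
    trace = []
    for core in sorted(schedule):
        for task in schedule[core]:
            trace.append((core, task))  # stands in for actually running the task
    return trace

schedule = make_static_schedule(["sensor", "filter", "control", "log"],
                                num_cores=2)
```

Because the assignment is computed once and each core walks its slots in a fixed order, a run can be replayed exactly, which is the debuggability win that the static approach buys compared to a dynamic load balancer.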
When I heard this, my gut reaction was that “hey, that is not particularly environmental” — any kind of waste of resources is anathema to the ecologically friendly society we need to build over the next 10-20 years. But then someone pointed out that a key part of the efficiency equation is that you turn off the unused cores and accelerators so they do not use any power. And since the cores are a resource that keeps increasing in count from basically the same use of resources (manufacturing a chip costs about the same amount of energy and materials regardless of geometry, but with finer geometries you pack double the number of cores onto it), it should be fine. It should also be noted that multicore computing by itself allows for more efficient processing units, for a variety of reasons.
Robustness also tends to increase if you have some slack in your system. For example, most hard real-time systems insist on not being more than 80% loaded or so (on a single CPU), even at the worst of tested times, to leave some margin for the inevitable unexpected situations. For a device with 100s of cores, you might also want to spare some cores in case hardware faults crop up in certain parts of the chip. Then you can shift loads to other cores (which obviously requires a pretty resilient interconnect to make any sense).
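Both forms of slack can be sketched together. Here is a small, hypothetical Python illustration (the 80% ceiling matches the rule of thumb above, but the class name, spare-core policy, and numbers are my own illustrative assumptions) of a core pool that refuses to load any core past its headroom and keeps spare cores idle so work can move there when a core fails:

```python
# Sketch of headroom-based placement: never load a core past 80%, and
# hold spare cores in reserve so a failed core's work can move there.
# The ceiling and the core/load figures are illustrative only.

HEADROOM = 0.8  # rule-of-thumb ceiling from hard real-time practice

class CorePool:
    def __init__(self, num_cores, num_spares):
        # Active cores plus a reserve of spares, all initially idle.
        self.load = {c: 0.0 for c in range(num_cores + num_spares)}
        self.spares = set(range(num_cores, num_cores + num_spares))
        self.failed = set()

    def place(self, utilization):
        """Put a task on the least-loaded active core, respecting headroom."""
        active = [c for c in self.load
                  if c not in self.spares and c not in self.failed]
        core = min(active, key=lambda c: self.load[c])
        if self.load[core] + utilization > HEADROOM:
            return None  # no capacity left: reject rather than overload
        self.load[core] += utilization
        return core

    def fail_core(self, core):
        """Shift a failed core's load onto a spare.

        This only makes sense with a resilient interconnect, per the text."""
        spare = self.spares.pop()
        self.load[spare] = self.load.pop(core)
        self.failed.add(core)
        return spare
```

The interesting design choice is that `place` returns `None` instead of overloading a core: in a world of plentiful cores, rejecting work (or waking another core) is cheaper than losing the timing margin.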
This final point brings me to my final thought on this way of building computing systems: in some ways, we get closer to physical engineering habits when cores are free. We do not build bridges with the minimum amount of concrete and steel needed to handle the load we expect. Instead, there is a margin of error of a factor of three or five or so, to make sure that even in the most unexpected of circumstances, the bridge will still stand. In a similar way, we might be able to use lots of free cores to engineer software systems that have far more resilience in them than today’s systems, which keep trying to make maximum use of the resource of clock cycles and instruction processing count. I do not quite know how that kind of system would look, but the analogy is very interesting.