When I was working on my PhD in WCET – Worst-Case Execution Time analysis – our goal was to predict, as precisely as possible, the number of cycles that a processor would take to execute a certain piece of code. We and other groups designed analyses for caches, pipelines, even branch predictors, along with ways to take into account information about program flow and variable values.
The complexity of modern processors – even a decade ago – was such that predictability was very difficult to achieve in practice. We used to joke that a complex enough processor would be like a random number generator.
Funnily enough, it turns out that someone has been using processors just like that. Guess that proves the point, in some way.
I was recently introduced to the HAVEGE project – HArdware Volatile Entropy Gathering and Expansion – run at IRISA in Rennes, France, from what seems to be around 2002 to 2006. The main author, Andre Seznec, has also published in the WCET field. Today, the same idea is found nicely packaged in the HAVEGED code base for Linux, found at http://www.issihosts.com/haveged.
The idea behind HAVEGE is to run a piece of code that is designed to incur cache misses, confuse branch predictors, and generally strain the prediction mechanisms of a processor. In this way, the timing of the code will fluctuate even though it is basically straight-line code with no decision-making. These timing variations can be captured by reading a high-resolution timer such as the x86 processor’s TSC (Time Stamp Counter), or some other source that can report the execution time of a piece of code.
The key advantage of such a source of randomness is that it is easy to quickly acquire lots of randomness (or entropy in crypto language), and it is also impossible to predict the results. For cryptographic applications, this unpredictability from the perspective of an outside observer is very important, as it makes random numbers generated based on this much stronger in the face of an attack.
I think HAVEGE offers a good example of how to make lemonade from lemons. If we conclude that processor timing cannot be predicted, consider that fact as a feature for cryptography rather than as a problem for WCET.
The first paper on HAVEGE is called “Hardware Volatile Entropy Gathering and Expansion: Generating unpredictable random numbers at user level“, IRISA internal report 1492, October 2002. It presents the core idea a little differently from later papers: they measure the cache and TLB effects on randomness, assuming that the key to randomness is the effect of interrupts, where OS code disturbs the cache and TLB entries used by the program. An underlying assumption is that if you just run a program in isolation, the caching and speculation mechanisms will converge to a good state for the program, with little or no timing variation as a result.
I wonder if that is still true on a modern machine. Their measurements were performed on a mid-1990s UltraSPARC II, which is in-order and much simpler than current Intel Core processors. Even an ARM Cortex-class processor is much more complex. I would really like to see measurements of the inherent randomness in today’s processors, without any recourse to interrupts and hardware actions to disturb the picture. I wonder if you would still see variations in the execution time of a body of code due to the different periods of various hardware mechanisms, or if it all converges to maximum throughput and minimal hardware latencies for all parts of the pipeline. For some reason, I have my doubts that the hardware would be that ideal in practice.
What makes the randomness of the actual hardware hard to evaluate is that the available codebase is the HAVEGE code, which is an “expansion” of the basic HAVEG idea. The expansion couples a PRNG to the collection of entropy from the hardware, in order to produce much more random noise (in terms of random bits per second) than the hardware alone would provide. While very practical, this also serves to obscure the fundamental randomness of the hardware from direct measurement.
Essentially, HAVEGE generates a ton of random data that appears to be of high quality in the tests provided. But that data mixes three factors into a single measurement:
- Hardware low-level random fluctuations (cache, pipeline, branch predictor)
- Hardware coarse-grained variation (interrupt timing, the time taken to perform OS actions in response to interrupts)
- The effectiveness of the PRNG code
Picking these three apart would be interesting, and it is a shame that there seems to be no recent evaluation of HAVEGE.