I previously blogged about the HAVEGE algorithm, which is billed as extracting randomness from microarchitectural variations in modern processors. Since it was supposed to rely on hardware timing variations, I wondered what would happen if I ran it on Simics, which does not model the processor pipeline, caches, or branch predictor. Wouldn’t that make the randomness of HAVEGE go away?
I got HAVEGE up on a Simics x86 target model running Linux fairly quickly, and ran the two provided tests: ent, a quick entropy test, and nist, which is supposedly much more thorough.
To my surprise, both said the randomness we got was totally acceptable. This would seem to invalidate the fundamental assumption of HAVEGE – that it needs to collect randomness from hardware in order to produce good-quality randomness. To try to understand a bit more of what was going on, I took a look at the execution using Simics Analyzer (the dredd.motherboard.processor lines are the processors; the orange part is the HAVEGE program, yellow is the kernel):
Zooming in a bit:
We can see that the program is regularly interrupted by the OS, which could be a source of random timing variations. The number of instructions the OS executes on each interrupt should vary, which would perturb the time stamp counter values read by the HAVEGE program. That could be sufficient to cause random variations, essentially showing that HAVEGE works well from OS noise alone – even on an otherwise idle machine.
However, at this point I started to have my doubts. Something did not feel right.
So I tried to remove all variations from the HAVEGE program. I replaced the “HARDTICKS” macro in HAVEGE with the constant 0 (zero) rather than reading the time stamp counter of the processor. This immediately failed the randomness test.
However, when I used the constant 1 (one) instead, the ent test passed. Even nist came close, failing only a single test of the 426 executed.
Thus, the conclusion is that we do not know how well HAVEGE's collection of hardware randomness works, since the evaluation software is too weak. In essence, we cannot tell whether the hardware randomness collection matters at all, because the proposed measurement hides it behind a pretty good PRNG algorithm.
Ideally, we would need a measurement that evaluates the predictability of the generated numbers, or at least one that can correctly estimate how much low-level hardware timing variation contributes to the quality of the final random numbers. Unfortunately, that is not the case here, which throws the entire idea into doubt.