There have been quite a few security exploits and covert channels based on timing measurements in recent years. Some examples include Spectre and Meltdown, Etienne Martineau’s technique from Def Con 23, the technique by Maurice et al from NDSS 2017, and attacks on crypto algorithms that observe the execution time of cryptographic operations. There are many more examples, and it is clear that measuring time, in particular in order to tell cache hits and cache misses apart, is a very useful primitive. Thus, it would seem to make sense to make it harder for software to measure time, by reducing the precision of timing sources or adding jitter to them. But it seems such attempts are rather useless in practice.
[Updated 2018-01-29 with a note on ARC SEM110-120 processors]
So – it will be possible to measure time even if we reduce the precision of system calls that provide time, add jitter, make the result fuzzy, or even randomize the output of RDTSC. All that is required is that instructions execute at a reasonably steady pace – which is where this discussion gets really interesting.
A recurring idea in “Fantastic Timers” is that you just need something that repeats regularly to have a clock. It is nice if you can tie it to real time, but that is not necessary. A cache miss and a cache hit are so far apart in timing that you can spot them using abstract time (plus some clever calibration). Their techniques work sometimes because the runtime engine is rather steady in its pacing, and sometimes because the hardware makes the code run at a steady pace in a loop.
Here is where we can see how modern architectures manage to be both random and predictable at the same time. At a large scale, I have long claimed that time-precise coding is a mostly dead practice. Hardware today has so many mechanisms in place based on speculation, dynamic behaviour, long-term execution history, etc., that trying to make a safe prediction about the execution time of a piece of code is pretty pointless.
But on the other hand… on the small scale, code running on a modern processor can actually be very steady. If you take a loop or loop nest, and assume that the code is in the L1 cache, the branch predictor has been taught what the code does, and the data prefetcher is clued into the access patterns, then the result is a rather stable execution time. The processor is approaching or at the minimum time for the code. After all, the purpose of the many dynamic prediction and caching mechanisms in a processor is to eventually let the core pipeline get through useful instructions at the best possible pace. This is designed to be analyzable, to allow optimization of tight compute algorithms.
I cannot see how we could intentionally throw this out – timing attacks could in principle be thwarted by a processor that would occasionally just have random hiccups and make instructions waste time. But such intentional hiccups would have to be on the order of level 2 or level 3 cache miss penalties to matter. They would also have to be randomly distributed at higher levels – you do not want averaging over long runs to defeat the mechanism, so the hiccup rate would have to vary widely over time as well. The processor would have to incur the equivalent of TLB misses, page faults, and hardware interrupts – in the middle of regular user code that is not actually suffering from any of these.
Thus, obfuscating the timing of code could only be done at the cost of a lot of performance. It would make code optimization a whole lot harder, since a slow run might be caused by nothing other than bad luck in the built-in processor timing lottery. It would make validating the functionality and performance design of a processor pure hell, when you have no real idea what the timing is supposed to be for a given sequence of instructions.
I don’t think anyone would like such an environment. And I believe it would not help anyway.
If nothing else, because there are attacks that exploit micro-architectural features and side effects but do not rely on measuring time at all. For example, a Black Hat USA presentation in 2015 showed how to communicate between processes by measuring and controlling the amount of out-of-order execution going on in the processor. The Rowhammer attack does not depend on measuring time either; it comes down to analog behavior in the hardware.
Update 2018-01-29: Just found out about the Synopsys ARC SEM-110 and SEM-120 processor cores that were launched in 2016. These processors do intentionally randomize code execution time by adding NOPs and delay loops to code that needs to be secured (not the code that might be snooping on it). Interesting to see someone actually do that. It should be noted however that the feature is intended to thwart “eavesdropping” side-channel attacks like measuring the power, energy, and time of critical code. Sounds like a sensible concept, actually.
However, it has little relevance to what is happening right now in the “big” processor core space. These ARC cores are three-stage-pipeline in-order processors that do not have any speculative mechanisms to use as side channels. I guess you could still use Rowhammer on them, in case they were ever attached to regular DIMMs. But they are immune to Spectre and similar attacks by design.