Timing Measurements and Security

There have been quite a few security exploits and covert channels based on timing measurements in recent years. Some examples include Spectre and Meltdown, Etienne Martineau’s technique from Def Con 23, the technique by Maurice et al from NDSS 2017, and attacks on crypto algorithms by observing the timing of execution. There are many more examples, and it is clear that measuring time, in particular in order to tell cache hits and cache misses apart, is a very useful primitive.  Thus, it seems to make sense to make it harder for software to measure time, by reducing the precision of or adding jitter to timing sources. But it seems such attempts are rather useless in practice.

[Updated 2018-01-29 with a note on ARC SEM110-120 processors]

I found a nice book chapter from early 2017, “Fantastic Timers and Where to Find Them: High-Resolution Microarchitectural Attacks in JavaScript”, by Michael Schwarz, Clémentine Maurice, Daniel Gruss, and Stefan Mangard, which shows that useful timing estimates can be achieved pretty much by simply running code. No explicit clock sources are needed. The paper uses clever techniques to establish useful time measurements – in JavaScript, none the less – even when the precision of the provided system time source (performance.now) is radically reduced, or the API call removed entirely. In the end, it is enough to be able to have a counter that counts at a steady pace while being referenced by the code that wants a time source.

The paper shows several ways of constructing such a counter in JavaScript, where the language and run-time system does make it at least slightly difficult. If you can run basic compiled code on the machine (even as user-level code inside a virtual machine) it should be utterly trivial to do the same.

So – it will be possible to measure time even if we remove the precision of system calls that provide time, add jitter, make the result fuzzy, or even randomize the output from RDTSC. All that is required is that executing instructions have a reasonably steady pace – which is where this discussion gets really interesting.

Obfuscating Time?

A recurring idea in “Fantastic Timers” is that you just need something that repeats regularly to have a clock. It is nice if you can tie it to real time, but that is not necessary. A cache miss and a cache hit are so far apart in timing that you can spot them using abstract time (plus some clever calibration). Their techniques work sometimes because the runtime engine is rather steady in its pacing, and sometimes because the hardware makes the code run at a steady pace in a loop.

Here is where we can see how modern architectures manage to be both random and predictable at the same time.  At a large scale, I have long claimed that time-precise coding is a mostly dead practice. Hardware today has so many mechanisms in place based on speculation, dynamic behaviour, long-term execution history, etc., that trying to make a safe prediction about the execution time of a piece of code is pretty pointless.

But on the other hand… on the small scale code running on a modern processor can actually be very steady. If take a loop or loop nest, and assume that the code is in the L1 cache, the branch predictor has been taught what the code does, and the data fetch predictor is clued into the access patterns, then a rather stable execution time is the result. The processor is approaching or at the minimum time for the code. After all, the purpose of the many dynamic prediction and caching mechanisms in a processor is to eventually let the core pipeline get through useful instructions at the best possible pace.  This is designed to be analyzable, to allow optimization of tight compute algorithms.

I cannot see how we could intentionally throw this out – timing attacks could in principle be thwarted by a processor that would occasionally just do random hiccups and make instructions waste time. But such intentional hiccups would have to be on the order of level 2 or level 3 cache miss penalties to matter. They would also have to be randomly distributed at a higher level too – you don’t want to allow average over long times to defeat the mechanism, so therefore it would have to vary widely over time as well. The processor would have to incur the equivalence of TLB misses, page faults, hardware interrupts – in the middle of regular user code not suffering from any of this in practice.

Thus, obfuscating the timing of code would only be done at the cost of a lot of performance. It would make code optimization a whole lot harder, since a slow run might be caused by nothing else than bad luck in the built-in processor timing lottery. It would make validating the functionality and performance design of a processor pure hell, when you have no real idea what the timing is supposed to be for a certain sequence of instructions.

I don’t think anyone would like such an environment.  And it would not help anyway I believe.

If nothing else, since there are attacks that exploit micro-architectural features and side-effects — but that do not rely on measuring time at all. For example, there was a Black Hat USA presentation in 2015 that showed how to communicate between processes by measuring and controlling the amount of out-of-order execution going on in the processor.  The Rowhammer attack does not depend on measuring time, it comes down to analog behaviors in the hardware.

Update 2018-01-29: Just found out about the Synopsys ARC SEM-110 and SEM-120 processor cores that were launched in 2016. These processors do intentionally randomize code execution time by adding NOPs and delay loops to code that needs to be secured (not the code that might be snooping  on it).  Interesting to see someone actually do that. It should be noted however that the feature is intended to thwart “eavesdropping” side-channel attacks like measuring the power, energy, and time of critical code.  Sounds like a sensible concept, actually.

However, it has little relevance to what is happening right now in the “big” processor core space. These ARC cores are three-stage-pipeline in-order processors that do not have any speculative mechanisms to use as side channels. I guess you could still use rowhammer on them, in case they were ever attached to regular DIMMs. But they are immune to Spectre and similar by design.

Conclusion

Thus, it would seem that reducing the precision and making timing sources more jittery won’t really help with the core problem. It is probably a good idea to do this in JavaScript to make it harder to do exploits, but it is not a panacea. It appears that in the end, it is the side channels themselves that have to be suppressed. Which is not a particularly appealing statement to make, since side channels by definition are not designed into a system. They are discovered as side-effects of otherwise reasonable decisions. In the end, there is no replacement for an adversarial mind-set, and putting resources into thinking about how things can be broken, not just made to work in the first place.

One thought on “Timing Measurements and Security”

  1. Very cool. The only true way is to get back to 8086 without all the funny things around.
    Second way is Elbrus. It’ll not let you fool strides as they are unpredictable by the nature…

Leave a Reply

Your email address will not be published. Required fields are marked *