• About Jakob Engblom and this blog
Observations from Uppsala Computer Simulation, Virtual Platforms, Embedded Programming, Multicore and More (by Jakob Engblom)

Execution Time is Random, How Useful

2011 February 13 23:49 / 6 Comments / Jakob

When I was working on my PhD in WCET – Worst-Case Execution Time analysis - our goal was to utterly precisely predict the precise number of cycles that a processor would take to execute a certain piece of code.  We and other groups designed analyses for caches, pipelines, even branch predictors, and ways to take into account information about program flow and variable values.

The complexity of modern processors – even a decade ago – was such that predictability was very difficult to achieve in practice. We used to joke that a complex enough processor would be like a random number generator.

Funnily enough, it turns out that someone has been using processors just like that.  Guess that proves the point, in some way.

I was recently introduced to the concept of the HAVEGE project – HArdware Volatile Entropy Gathering and Expansion, run at IRISA in Rennes in France from what seems to be around 2002 to 2006.  The main author, Andre Seznec, has also published in the WCET field. Today, the same idea is found nicely packaged in the HAVEGED code base for Linux, found at http://www.issihosts.com/haveged.

The idea behind HAVEGE is to run a piece of code that is designed to incur cache misses, confuse branch predictors, and generally strain the prediction mechanisms of a processor. In this way, the timing of the code will fluctuate even though it is basically straight-line code with no decision-making. These timing variations can be captured by reading a high-resolution timer such as the x86 processor’s TSC (Time Stamp Counter), or some other source that can report the execution time of a piece of code.

The key advantage of such a source of randomness is that it is easy to quickly acquire lots of randomness (or entropy in crypto language), and it is also impossible to predict the results. For cryptographic applications, this unpredictability from the perspective of an outside observer is very important, as it makes random numbers generated based on this much stronger in the face of an attack.

I think HAVEGE offers a good example of how to make lemonade from lemons.  If we conclude that processor timing cannot be predicted, consider that fact as a feature for cryptography rather than as a problem for WCET.

The first paper on HAVEGE is called “Hardware Volatile Entropy Gathering and Expansion: Generating unpredictable random numbers at user level“, IRISA internal report 1492, October 2002. It presents the core idea a little differently from later papers.  In it, they measure the cache and TLB effects on randomness, assuming the key to randomness being the effects of interrupts where OS code affect the cache and TLB entries used by the program.  An underlying assumption is that if you just run a program in isolation, the caching and speculation mechanism will converge to a good state for the program, with no or little timing variation as a result.

I wonder if that is still true on a modern machine. Their measurements were performed on a mid-1990s UltraSPARC II, which is in-order and much simpler than current Intel Core processors. Even an ARM Cortex-class processor is much more complex.  I would really like to see measurements about the inherent randomness in today’s processors, without any recourse to interrupts and hardware actions to disturb the picture.  I wonder if you would still see variations in the execution time of a body of code due to the different periods of various hardware mechanisms, or if it all converges to maximum throughput and minimal hardware latencies for all parts of the pipeline. For some reason, I have my doubts that the hardware would be that ideal in practice.

What makes the randomness of the actual hardware hard to evalutate  is that the available codebase is the HAVEGE code, which is an “expansion” of the basic HAVEG idea. The expansion being to couple a PRNG to the collection of entropy from the hardware, in order to produce much more random noise (in terms of random bits per second) than just the hardware would provide. While very practical, this also serves to obscure the fundamental randomness of the hardware from direct measurement.

Essentially, HAVEGE generates a ton of random data that appears to be of high quality in the tests provided.  But that data mixes three factors into a single measurement:

  • Hardware low-level random fluctuations (cache, pipeline, branch predictor)
  • Hardware coarse-grained variation (interrupt timing, the time taken
    to perform OS actions in response to interrupts)
  • The effectiveness of the PRNG code

Picking these three apart would be interesting, and it is a shame that there seems to be no recent evaluation of HAVEGE.

Tweet
Posted in: computer architecture, security / Tagged: Andre Szenec, HAVEG, HAVEGE, random number generation, wcet

6 Thoughts on “Execution Time is Random, How Useful”

  1. Dr. Guillem Bernat on 2011 February 17 at 19:12 said:

    This is a very interesting concept. The PROARTIS EU project (http://www.proartis-project.eu/) that started in 2009 pushes that idea much further. The aim is to design processors that have randomised timing behaviour *by design* so that only probabilistic analyis are possible.

    For example, use random cache replacement policies. This makes static prediction of which memory accesses are cache hits or misses impossible, but the behaviour can be analysed probabilistically to derive the probability distribution that a particular execution of the program will exceed X cycles with a probability lower than 10E-12 (or any arbitrary low number).

    The idea is to introduce these sort of mechanism across the whole architecture. CPU, caches, memory latencies, bus arbitration protrocols, RTOS and applications. The benefit is that even though pathological worst-case scenarios can still happen, they are provable (in the proof mathematical sense) that the probability of happening together is negligible.

    Guillem Bernat

  2. Jakob on 2011 February 17 at 21:50 said:

    That is yet another angle on randomness: as a way to smooth out large-scale variations by encouraging small-scale variations. Making hardware locally variable but globally predictable. Do you take crytographic aspects into account in the architecture?

  3. Pingback: Observations from Uppsala » Evaluating HAVEGE Randomness

  4. Guillem Bernat on 2011 February 21 at 10:42 said:

    The key issue is not only to smooth large scale variations, but to break the systematic nature of some pathological cases.

    consider the following code

    for(i=0;i<N;i++) {
    a();
    b();
    c();
    d();
    e();
    }

    In a processor with cache associativity of 4, you can have 2 very different behaviours.

    1) If the linker happens to put the code spread out across the cache, then the typical scenario of "the first iteration all functions have instruction cache misses" and then "any following iteration of the loop, all functions hit the instruction cache" applies

    2) however, there is a pathological case…

    (note to the reader. Stop here, and try to think how could it be that (1) does not apply!!!).

    The problem is that if all these functions share the same cache lines, then every execution of the loop would result in cache misses. The difference in performance is non linear (and can be actually quite brutal 5X or 10X depending on the memory access latency). This happens for some particular cache replacements policies, e.g. LRU. And similar scenarios can be build for other cache policies.

    The problem here is therefore that a "small" change in the code layout, can have a tremendous change in the execution time of the program. The second key issue is that this behaviour is systematic. Every iteration will have the same behaviour and therefore the pathological case (e.g. all executions miss the cache, will happen every time).

    By adding randomisation in the cache replacement, for example, you may lose some average case performance, but the probability of that pathological case from happening, e.g. every instruction access is a cache miss can not happen every time. It happens only with exponentially decreasing probability.

    –
    Guillem Bernat

  5. Jakob on 2011 February 27 at 22:18 said:

    @Guillem Bernat
    I have to apologize – this reply got stuck in the spam filter, and I did not see it until today. Bad of me. Thanks for the insightful reply, Bernat!

  6. Jakob on 2011 February 27 at 22:21 said:

    Guillem Bernat :
    By adding randomisation in the cache replacement, for example, you may lose some average case performance, but the probability of that pathological case from happening, e.g. every instruction access is a cache miss can not happen every time. It happens only with exponentially decreasing probability.

    Do you also randomize cache placement in some way? That could be interesting, implementing a fully associative cache with random replacement. For a case like this, after some bad luck, eventually you would expect the cache to settle down with all code mapped since that is hte only scenario not changing the cache contents :)

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Post Navigation

← Previous Post
Next Post →

Recent Posts

  • Military Science Fiction – The Books Blur Together
  • Wind River Blog: Starting & Configuring Simics
  • Wind River Blog:
  • Nudge Theory and Graphical User Interfaces
  • Wind River Blog: Collaborating with Recording Checkpoints
  • Wind River Blog: Simics 4.8 is Here
  • A Few Electrons too Many
  • Wind River Blog: Visuality NQ CIFS Server on Simics
  • Everything in the Cloud?
  • Wind River Blog: TCF and Simics
  • Off-Topic: Moving Bad Piggies Save Games
  • Two Cores, Four Cores, Eight Cores – Mobile Variety
  • Bliss: Failing to Pivot for Ideology
  • Wind River Blog and Movie: Demo of Simics Debugging
  • Simulation vs Reality in Schlock Mercenary

Categories

  • appearances (30)
  • articles (21)
  • blogging (10)
  • books (7)
  • business issues (31)
  • computer architecture (35)
  • conferences (34)
  • EDA (50)
    • ESL (35)
  • embedded (78)
    • embedded software (57)
    • embedded systeme (50)
  • general research (6)
  • history (32)
    • general history (7)
    • history of computing (26)
  • off-topic (94)
    • biking (5)
    • board games (1)
    • computer games (3)
    • desktop software (35)
    • food and drink (1)
    • funny (12)
    • gadgets (24)
    • Politics (3)
    • popular culture (5)
    • trains (5)
    • transportation (10)
    • travel (10)
    • websites (3)
  • parallel computing (92)
    • multicore computer architecture (51)
    • multicore debug (22)
    • multicore software (65)
  • programming (109)
  • review (8)
  • security (19)
  • teaching (7)
  • testing (9)
  • uncategorized (12)
  • virtual things (131)
    • computer simulation technology (68)
    • virtual machines (18)
    • virtual platforms (99)
    • virtualization (14)
  • Wind River Blog (43)

Tags

ARM blog commentary Cadence Checkpointing clock-cycle models Communications of the ACM computer architecture conference cycle accuracy debugging Domain-specific languages eclipse embedded freescale G900 heterogeneous homogeneous IBM Intel iPod lego linux mobile phones multicore off-topic office 2007 operating systems p4080 podcast commentary power architecture rant research reverse debugging reverse execution S4D SiCS Multicore days Simics simulation software tools Sun SystemC video virtualization Vista Windows

1

  • F-Secure Blog

Blogs and news

  • Andras Vajda's blog (on multicore)
  • Embedded in Academia (John Regehr)
  • Grant Martin
  • Jack Ganssle
  • My Wind River Blog
  • Security Now podcast
  • Secworks (Joachim Strömbergson)
  • Simon Kågström
  • Synopsys View from the Top
  • Worse Than Failure

Archives

  • June 2013 (3)
  • May 2013 (4)
  • April 2013 (1)
  • March 2013 (4)
  • February 2013 (1)
  • January 2013 (3)
  • December 2012 (2)
  • November 2012 (2)
  • October 2012 (1)
  • September 2012 (6)
  • August 2012 (4)
  • July 2012 (4)
  • June 2012 (3)
  • May 2012 (4)
  • April 2012 (2)
  • March 2012 (3)
  • February 2012 (1)
  • January 2012 (6)
  • December 2011 (2)
  • November 2011 (3)
  • October 2011 (4)
  • September 2011 (5)
  • August 2011 (4)
  • July 2011 (3)
  • June 2011 (4)
  • May 2011 (7)
  • April 2011 (1)
  • March 2011 (3)
  • February 2011 (5)
  • January 2011 (1)
  • December 2010 (4)
  • November 2010 (3)
  • October 2010 (5)
  • September 2010 (5)
  • August 2010 (5)
  • July 2010 (6)
  • June 2010 (5)
  • May 2010 (3)
  • April 2010 (4)
  • March 2010 (3)
  • February 2010 (4)
  • January 2010 (7)
  • December 2009 (6)
  • November 2009 (6)
  • October 2009 (7)
  • September 2009 (6)
  • August 2009 (7)
  • July 2009 (11)
  • June 2009 (5)
  • May 2009 (10)
  • April 2009 (7)
  • March 2009 (8)
  • February 2009 (9)
  • January 2009 (12)
  • December 2008 (8)
  • November 2008 (9)
  • October 2008 (9)
  • September 2008 (10)
  • August 2008 (13)
  • July 2008 (12)
  • June 2008 (8)
  • May 2008 (9)
  • April 2008 (10)
  • March 2008 (7)
  • February 2008 (8)
  • January 2008 (5)
  • December 2007 (5)
  • November 2007 (7)
  • October 2007 (7)
  • September 2007 (12)
  • August 2007 (9)
  • July 2007 (2)
© Copyright 2013 - Observations from Uppsala
Infinity Theme by DesignCoral / WordPress