Observations from Uppsala Computer Simulation, Virtual Platforms, Embedded Programming, Multicore and More (by Jakob Engblom)

Simulation Determinism: Necessary or Evil?

2009 April 19 21:36 / Jakob

In my series on misunderstood simulation technology (well, so far there is one previous post, about checkpointing), it is now time for what seems to be the most difficult topic of all: determinism. Determinism is often misunderstood as meaning “unchanging” or “constant” behavior of the simulation. People tend to assume that a deterministic simulation will not reveal errors caused by nondeterministic behavior or races in the modeled system, which is a complete misunderstanding. Determinism is a necessary feature of any simulation system that aims to be truly helpful to its users, not an evil that hides errors.

What?

Determinism really means this:

  • Given a certain initial state
  • And a certain sequence of external inputs
  • The end result and state of the simulation will always be the same

The key point is that both the starting state and the sequence of external inputs must be the same in order to get the same result. If either of them changes, you may well get a different result. Implementing a deterministic simulator requires all internal events and activities in the simulator to be performed in the same order and at the same virtual time in every simulation run. This means that the state of the host computer environment must not be allowed to affect simulator execution, which in turn means that all sorting of internal events has to be done in a defined order in every instance.
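As an illustration, here is a minimal Python sketch of an event queue that follows this rule. It is not how Simics or any other particular simulator is implemented, and the names `Event`, `EventQueue`, `post`, `run`, and `demo` are hypothetical. Events are ordered by virtual time and, for events posted to the same time, by a posting sequence number, so host properties such as memory addresses, hash values, or wall-clock time cannot influence the order.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable


@dataclass(order=True)
class Event:
    time: int  # virtual time at which the event fires
    seq: int  # posting order; breaks ties between simultaneous events
    name: str = field(compare=False)
    action: Callable[[], None] = field(compare=False)


class EventQueue:
    """Hypothetical minimal event queue. Events are ordered only by
    (virtual time, posting sequence number), never by host-dependent
    values such as object addresses, hash values, or wall-clock time."""

    def __init__(self) -> None:
        self._heap: list[Event] = []
        self._seq = itertools.count()
        self.now = 0

    def post(self, delay: int, name: str, action: Callable[[], None]) -> None:
        # Schedule relative to the current virtual time.
        heapq.heappush(self._heap, Event(self.now + delay, next(self._seq), name, action))

    def run(self) -> list[tuple[int, str]]:
        trace = []
        while self._heap:
            ev = heapq.heappop(self._heap)
            self.now = ev.time
            trace.append((ev.time, ev.name))
            ev.action()  # actions may post further events
        return trace


def demo() -> list[tuple[int, str]]:
    q = EventQueue()
    q.post(10, "E", lambda: None)
    q.post(10, "F", lambda: None)  # same virtual time as E: posting order decides
    q.post(5, "A", lambda: q.post(5, "B", lambda: None))  # A schedules B at time 10
    return q.run()
```

Calling `demo()` any number of times gives the same trace, with E before F because E was posted first, and B last among the time-10 events because it was posted last. The sequence number is the key design choice: a binary heap gives no guarantee about the relative order of equal keys, so the tie-break has to be defined explicitly.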

I have a story about how hard that can be in practice. I once talked to some compiler developers who had a problem: recompiling the same program with the same set of compiler options could produce different results, even on the same machine. The cause was that each run of the compiler started in a different overall system state, and this could affect where the OS memory allocation functions placed items in memory. It turned out that in some cases, the precise values of the pointers to the items in a complex data structure were used by standard libraries to order iteration over the nodes in the data structures. Thus, a different memory allocation pattern gave a different iteration order and a different traversal order of nodes, and in the end an almost arbitrarily different result. The correct solution was to use a defined lexical ordering for traversal and iteration, not anything dependent on the state of the host machine. It is no different in a simulator: to be deterministic, define the order of everything.
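The same trap is easy to fall into in modern languages. In this Python sketch (the `Node` class and both functions are hypothetical, and the compiler in the story was of course not written this way), a set of objects hashed by identity is iterated in an order that depends on their memory addresses, much like the pointer-ordered traversal above. Sorting by a lexical key removes that dependency.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False keeps object's identity-based __hash__
class Node:
    name: str


def traverse_host_dependent(nodes: set[Node]) -> list[str]:
    # A set of identity-hashed objects iterates in an order derived from
    # their memory addresses, i.e. from the host allocator's current state.
    return [n.name for n in nodes]


def traverse_deterministic(nodes: set[Node]) -> list[str]:
    # A defined lexical order: independent of where the nodes were allocated.
    return [n.name for n in sorted(nodes, key=lambda n: n.name)]
```

Both functions visit the same nodes; only the second one is guaranteed to visit them in the same order on every run.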

Why?

The crucial benefit that determinism brings to a simulation in general, and to a virtual platform in particular, is repeatable debugging. With determinism and an appropriate recording mechanism (and, most practically, checkpointing), you can rely on being able to repeat a run that triggers a bug any number of times, with precisely the same sequence of events in the simulation. In particular, events that are visible to and relevant for the software running on the virtual platform will happen in the same order, at the same virtual times, and at the same points relative to the instructions executed. For multicore and parallel computing systems this is incredibly powerful, and something that simply cannot be achieved on physical hardware, due to its inherent randomness and chaotic behavior (see my 2006 and 2007 ESC Silicon Valley talks, on my publications and presentations pages, for more on this).
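To make the record-and-repeat idea concrete, here is a toy sketch, assuming a simulator whose only source of variation is external input. The names `simulate` and `capture_live_inputs` are hypothetical, and the "live" inputs are faked with `random.SystemRandom`. Once the inputs are recorded as (virtual step, value) pairs, replaying them from the same initial state reproduces the run exactly.

```python
import random


def simulate(initial_state: int, inputs: list[tuple[int, int]], steps: int) -> list[int]:
    """Hypothetical toy simulator. The internal update is a fixed function of
    the state, and each external input (step, value) is applied at a fixed
    virtual time, so the same initial state plus the same inputs always
    produce the same history."""
    pending = dict(inputs)  # virtual step -> input value
    state = initial_state
    history = []
    for step in range(steps):
        if step in pending:
            state ^= pending[step]  # external input, e.g. a byte from a serial port
        state = (state * 1103515245 + 12345) % 2**31  # deterministic internal update
        history.append(state)
    return history


def capture_live_inputs(steps: int) -> list[tuple[int, int]]:
    # Stand-in for the outside world: values and arrival times are not
    # under the simulator's control, so they are recorded as they arrive.
    rng = random.SystemRandom()
    return [(s, rng.randrange(256)) for s in range(steps) if rng.random() < 0.2]
```

Two captures of live input will almost certainly differ, but any single recording can be replayed through `simulate` as many times as needed, with an identical history each time.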

If you can assume that the simulation infrastructure and the simulation platform are stable, determinism also makes it easier to debug the simulation itself. A bug in a simulation model is often repeatable, and with determinism it is easy to feed the same external stimulus sequence to the module again and again while debugging it.

Determinism also makes it easy to detect changes in the behavior of a simulation: if the same simulation setup produces a different result or final simulation state, you know that something in the setup (model, model parameters, or software) has changed. There is no randomness that causes changes without some fundamental parameter being changed. Such boring, reliable behavior is generally exactly what you want when testing and debugging large, complex systems.
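One simple way to exploit this in a test setup, sketched here under the assumption that the final simulation state can be dumped as a JSON-serializable dictionary, is to store a fingerprint of a known-good final state and compare later runs against it. `state_fingerprint` and `check_against_reference` are hypothetical helpers, not part of any simulator API.

```python
import hashlib
import json


def state_fingerprint(state: dict) -> str:
    """Hash a canonical serialization of a (hypothetical, JSON-serializable)
    simulation state. sort_keys=True makes the encoding independent of
    dictionary insertion order, so only real differences change the hash."""
    blob = json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def check_against_reference(final_state: dict, reference: str) -> bool:
    # With a deterministic simulator, a mismatch means something in the setup
    # (model, model parameters, or software) changed; it is never just noise.
    return state_fingerprint(final_state) == reference
```

The canonical serialization matters as much as the simulator's determinism: hashing a representation whose layout depends on the host would reintroduce exactly the kind of variation discussed above.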

Obviously, once determinism becomes a requirement, missing determinism in a model is a bug in itself, and finding such bugs can certainly be an interesting exercise.

Why Not?

Just as for checkpointing, one reason not to implement determinism is that it is hard, as discussed above.

The most common reason people give for avoiding determinism is that they want to explore alternatives within their simulation. Basically, there is a need for variability that would seem to be at odds with determinism. The typical argument is that “if my simulation model contains a non-deterministic choice, I want the simulation to expose that and not just make the same decision every time”. This is where determinism tends to be considered evil. However, this argument is not correct.

Take the case where, at some point P in a simulation run, two different events E and F can fire (since they are both posted to the same point in virtual time). A deterministic simulator will always select the same one of them. This is necessary to reap the system-level benefits discussed above. However, nothing prevents us from explicitly programming a departure from this behavior into our system, introducing controlled and repeatable variation. In such a setup, a random decision is made in each simulation run, but the outcome of any particular run can be repeated by setting the same random seed parameter.
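A minimal sketch of such controlled variation, using a hypothetical `SeededEventQueue` in Python: events are still ordered strictly by virtual time, but ties at the same time are broken by a key drawn from `random.Random(seed)`. Without a seed, the queue falls back to the fixed posting order. The names and the choice of tie-break key are assumptions made for illustration.

```python
import heapq
import itertools
import random
from typing import Optional


class SeededEventQueue:
    """Hypothetical event queue: events are ordered by virtual time; events
    posted to the same virtual time are ordered by a key drawn from a seeded
    RNG. With seed=None the tie-break is the fixed posting order."""

    def __init__(self, seed: Optional[int] = None) -> None:
        self._rng = random.Random(seed) if seed is not None else None
        self._seq = itertools.count()
        self._heap: list[tuple[int, float, int, str]] = []

    def post(self, time: int, name: str) -> None:
        tiebreak = self._rng.random() if self._rng is not None else 0.0
        heapq.heappush(self._heap, (time, tiebreak, next(self._seq), name))

    def drain(self) -> list[tuple[int, str]]:
        order = []
        while self._heap:
            time, _, _, name = heapq.heappop(self._heap)
            order.append((time, name))
        return order


def run(seed: Optional[int]) -> list[tuple[int, str]]:
    q = SeededEventQueue(seed)
    for name in ["E", "F", "G", "H"]:
        q.post(100, name)  # all posted to the same virtual time P = 100
    q.post(50, "early")  # strictly earlier events are never reordered
    return q.drain()
```

Logging the seed with each test run is all it takes to reproduce a failing ordering later. Because the random key only breaks ties, it never reorders events at different virtual times.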

This brings the best of both worlds: variation that exposes issues where there is potential nondeterminism or lack of synchronization in the model, and perfect repeatability of the resulting target software and simulation system behavior. Two events becoming ready at the same time can, in general, be considered a sign of missing synchronization in the model, and such a behavior randomizer will expose that at several different levels. But uncontrolled randomness is not the answer.

Another common misconception is that at a higher level, determinism in a virtual platform means that target software will always run in the same way. That is not true, and misses the importance of state in the deterministic behavior equation. If the initial state when a program starts is different, a different execution will result. If software is run on top of any non-trivial operating system, there is plenty of such variation. In one of our simplest Simics demos, we show this by running an intentionally buggy race-condition-ridden program. Each time it is run, it hits a different number of race conditions. But thanks to determinism (best demoed using reverse execution), we can repeat each run perfectly.
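The effect can be mimicked in a few lines. This is a toy model, not the Simics demo: two hypothetical threads increment a shared counter without synchronization, and the interleaving is derived only from an `initial_state` value standing in for the operating system state. Different initial states lose different numbers of updates, but any given initial state always loses the same number.

```python
import random


def lost_updates(initial_state: int, increments: int = 1000) -> int:
    """Toy deterministic model of two threads doing unsynchronized
    read-modify-write increments of a shared counter. The interleaving is
    derived only from initial_state (a hypothetical stand-in for OS state
    such as timer phase or run-queue contents). Returns how many of the
    2 * increments updates were lost to the race."""
    scheduler = random.Random(initial_state)  # deterministic given initial_state
    counter = 0
    # Per thread: [increments left, value read but not yet written back (or None)]
    threads = [[increments, None], [increments, None]]
    while True:
        runnable = [t for t in threads if t[0] > 0]
        if not runnable:
            break
        t = runnable[scheduler.randrange(len(runnable))]
        if t[1] is None:
            t[1] = counter  # step 1: read the shared counter
        else:
            counter = t[1] + 1  # step 2: write back, possibly clobbering the other thread
            t[1] = None
            t[0] -= 1
    return 2 * increments - counter
```

Running `lost_updates` with a range of initial states shows the variation, while calling it twice with the same initial state shows the repeatability.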

Thus, determinism is not equal to constant behavior or lack of variation.

The reverse argument

Finally, determinism is the simplest way to implement reverse execution: if you have recording, determinism, and checkpointing, you can easily reverse the execution virtually by going back to a checkpoint and replaying the execution from that point. If you stop one instruction before the current instruction, you have in essence stepped backwards one step in time. This is how both VMware and Simics implement reverse execution and debugging, and it could not happen without determinism.
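The scheme can be sketched in a few lines, assuming a toy machine whose only nondeterminism is recorded external input. `ToyMachine`, `reverse_step`, and `run_with_checkpoint` are hypothetical names, and this is not how VMware or Simics structure their implementations; a practical implementation would likely keep several checkpoints to keep replay distances short, while this sketch keeps only one.

```python
import copy


class ToyMachine:
    """Hypothetical deterministic machine. Each step applies a fixed update to
    its registers plus any recorded external input for that step."""

    def __init__(self) -> None:
        self.step_count = 0
        self.regs = {"a": 1, "b": 0}

    def step(self, inputs: dict[int, int]) -> None:
        a, b = self.regs["a"], self.regs["b"]
        self.regs = {"a": (a + b + inputs.get(self.step_count, 0)) % 1000, "b": a}
        self.step_count += 1


def reverse_step(checkpoint: ToyMachine, inputs: dict[int, int], current: ToyMachine) -> ToyMachine:
    """Step back one instruction: restore the checkpoint, then replay
    deterministically (with the same recorded inputs) to current - 1."""
    target = current.step_count - 1
    if target < checkpoint.step_count:
        raise ValueError("cannot reverse past the checkpoint")
    machine = copy.deepcopy(checkpoint)  # restoring must not disturb the saved checkpoint
    while machine.step_count < target:
        machine.step(inputs)
    return machine


def run_with_checkpoint(steps: int, inputs: dict[int, int]):
    """Take a checkpoint at step 0, run forward, and record each state
    (for checking only; reverse execution itself does not need it)."""
    machine = ToyMachine()
    checkpoint = copy.deepcopy(machine)
    states = []
    for _ in range(steps):
        machine.step(inputs)
        states.append(dict(machine.regs))  # states[i] = regs after i + 1 steps
    return checkpoint, machine, states
```

Calling `reverse_step` repeatedly walks backwards one step at a time; each call costs a replay from the checkpoint, which is exactly why only a deterministic simulator can offer this.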

Posted in: computer simulation technology, multicore debug, programming, virtual platforms / Tagged: debugging, determinism, multicore, repeatability, reverse execution, Simics, VMWare

© Copyright 2013 - Observations from Uppsala