<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Observations from Uppsala &#187; VMWare</title>
	<atom:link href="http://jakob.engbloms.se/archives/tag/vmware/feed" rel="self" type="application/rss+xml" />
	<link>http://jakob.engbloms.se</link>
	<description>Computer Technology: Simulation, Virtualization, Virtual Platforms, Embedded, Multicore and Multiprocessing (by Jakob Engblom)</description>
	<lastBuildDate>Sun, 29 Jan 2012 19:45:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<image>
    <title>Observations from Uppsala</title>
    <url>http://jakob.engbloms.se/favicon.png</url>
    <link>http://jakob.engbloms.se</link>
    <width>32</width>
    <height>32</height>
    <description>Observations from Uppsala - http://jakob.engbloms.se</description>
    </image>		<item>
		<title>Reverse History Part Three &#8211; Products</title>
		<link>http://jakob.engbloms.se/archives/1564?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1564#comments</comments>
		<pubDate>Sun, 08 Jan 2012 19:51:57 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[history of computing]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[gdb]]></category>
		<category><![CDATA[Green Hills]]></category>
		<category><![CDATA[Lauterbach]]></category>
		<category><![CDATA[Multi]]></category>
		<category><![CDATA[reverse debug]]></category>
		<category><![CDATA[reverse execution]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[TotalView]]></category>
		<category><![CDATA[UndoDB]]></category>
		<category><![CDATA[VMWare]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1564</guid>
		<description><![CDATA[In this final part of my series on the history of reverse debugging I will look at the products that launched around the mid-2000s and that finally made reverse debugging available in a commercially packaged product and not just research prototypes. Part one of this series provided a background on the technology and part two [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-icon.png"><img class="alignleft size-full wp-image-1550" title="reverse icon" src="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-icon.png" alt="" width="62" height="62" /></a>In this final part of my series on the history of reverse debugging I will look at the products that launched around the mid-2000s and that finally made reverse debugging available in commercially packaged products rather than just research prototypes. <a href="http://jakob.engbloms.se/archives/1547">Part one</a> of this series provided background on the technology and <a href="http://jakob.engbloms.se/archives/1554">part two</a> discussed various research papers on the topic going back to the early 1970s. The first commercial product featuring reverse debugging was launched in 2003, and since then there has been a steady trickle of new products.</p>
<p><span id="more-1564"></span></p>
<p><strong>2003</strong>. The embedded tools company Green Hills launched the <a href="http://www.ghs.com/news/20030930_best_of_show.html">Time Machine</a> feature in its well-known MULTI debugger. I consider this the start of commercial reverse debugging, as it was the first commercial-grade product to include reverse debugging. The implementation was based on tracing the execution of a program on actual hardware, using a debug probe and a &#8220;JTAG&#8221; debug interface. The trace box would capture several gigabytes of execution data, and the debugger then performed operations based on this trace. To check a backwards breakpoint, the debugger scans back over the trace until it finds a matching state or operation (such as a memory access or an instruction address being executed). The main limitation of the method is that the trace buffer can only capture a few seconds of execution on a typical embedded processor running at a few hundred MHz. It only works for a single processor, and it does not capture IO actions (except as memory-mapped IO). It is system-level and cross-target.</p>
<p>Extending this kind of trace to multicore has proven difficult, since getting a synchronized trace out of several processors is very hard. Debug hardware that supports a consistent, time-stamped trace of multiple cores might appear in the next few years, and with such hardware, the Time Machine approach could well be extended to multicore.</p>
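<p>The backwards breakpoint check described above amounts to a linear scan back through the captured trace. The following Python sketch illustrates the idea under simplifying assumptions: the hypothetical <code>TraceEntry</code> record holds just an instruction address and an optional data address, whereas real probe traces are compressed and much richer. It is not how Green Hills implemented Time Machine.</p>

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TraceEntry:
    """One captured step (hypothetical format): the instruction address and,
    for loads and stores, the data address that was accessed."""
    pc: int
    mem_addr: Optional[int] = None


def reverse_break(trace: Sequence[TraceEntry], now: int,
                  pc: Optional[int] = None,
                  mem_addr: Optional[int] = None) -> Optional[int]:
    """Scan backwards from trace index `now` (exclusive) for the most recent
    entry matching an instruction breakpoint (`pc`) or a data breakpoint
    (`mem_addr`). Returns the index to stop at, or None if the scan reaches
    the start of the trace buffer without a match."""
    for i in range(now - 1, -1, -1):
        entry = trace[i]
        if pc is not None and entry.pc == pc:
            return i
        if mem_addr is not None and entry.mem_addr == mem_addr:
            return i
    return None


trace = [TraceEntry(0x100), TraceEntry(0x104, 0x2000), TraceEntry(0x108),
         TraceEntry(0x100), TraceEntry(0x104, 0x2000)]
print(reverse_break(trace, now=5, pc=0x100))         # 3
print(reverse_break(trace, now=3, mem_addr=0x2000))  # 1
```

<p>Returning <code>None</code> models the limitation mentioned above: anything older than the trace buffer is simply gone.</p>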
<p><strong>2005</strong>. <a href="http://www.windriver.com/products/simics/">Simics </a>3.0 was launched by Virtutech (later acquired by Wind River and Intel) with full-system reverse execution and reverse debugging. The Simics approach was also unique, being based on a full-system simulator. By simulating the entire target, it is trivial to reverse (and put reverse breakpoints on) changes to memory, persistent storage like disks, and hardware devices. Since all device models in the simulator are deterministic in their implementation, re-executing hardware events like interrupts and IO outputs is just as easy as re-executing code on the main processor, something that had eluded all previous approaches. Recording is used at the interface between the simulator and the outside world, such as user interaction over graphics displays and serial ports and connections to the real-world network. The software stack is unmodified and system-level, and the simulator can handle multiple processors and even multiple machines in a network as a unit. The use case is normally cross-target (even if a system identical to the host can be simulated, it would work like a cross target logically). Time is handled by counting clock cycles on all processors in the system, and reverse debugging can position the simulation at any point in time based on the virtual time.</p>
<p>There is a cost in execution speed from simulation rather than direct execution, and an intrusion effect from running on a simulator rather than on a physical machine. This affects the <a href="http://jakob.engbloms.se/archives/97">timing of events</a>, even with a software stack that is not modified. Still, the ability to run a complete, real software stack without any modifications is fairly rare in the world of reverse debuggers.</p>
<p>Simics shipped with a modified gdb that used the gdb serial protocol to talk to Simics, and accessed reverse execution through some new debugger commands as well as extensions to the gdb serial protocol. This was offered to the gdb community, but not accepted. However, prompted by this, the gdb community started to discuss reverse execution. Some interesting old threads can still be found, such as <a href="http://sourceware.org/ml/gdb/2005-05/msg00225.html">http://sourceware.org/ml/gdb/2005-05/msg00225.html</a>. Clearly, at that point in time Virtutech did not really explain how Simics worked, and some pretty bad proposals for how to do reverse execution were floated in the community. In the end, the gdb serial design turned out the right way: it assumes that the remote target reverses itself and that <a href="http://sourceware.org/ml/gdb/2005-05/msg00235.html">gdb just asks it to do so</a>. This separation of concerns is important for creating practical reverse debugging solutions that can use any debugger backend.</p>
<p><strong>2005</strong>. Also in 2005, Lauterbach launched the <a href="http://www.lauterbach.com/cts.html">Context Tracking System, CTS</a>. Lauterbach is a big player in the embedded debug market, with its TRACE32 debugger. CTS can be seen as its reply to the Time Machine debugger. CTS is also based on a trace from a hardware unit or from an instruction-set simulator. However, from the available information it also appears to be more limited &#8211; you can step back, go back in time, and replay forward, but there is no mention of actual backwards breakpoints (even today, six years later). Thus, I count this as record-replay rather than reverse debugging. Like Time Machine, it is cross-target, system-level, and uniprocessor.</p>
<p><strong>2006</strong>. <a href="http://undo-software.com/undodb_about.html">Undo Software</a> launched the first Linux-targeting host-based reverse debugger, <a href="http://undo-software.com/pressrelease-1.html">UndoDB</a>. It is described as a <em>bidirectional</em> debugger (the same terminology as the Boothe 2000 PLDI paper). It is user-level and supports reverse breakpoints (and data breakpoints, also known as watchpoints, which are really useful). It handles multiple threads (at least in version 3.0), but from the description of the recording technology used, I believe it has to serialize their execution. The implementation is based on checkpointing and re-execution, with recording of all non-deterministic events like IO. There is a feature to move to a certain point in time, based on &#8220;simulated nanoseconds&#8221;. These are not really nanoseconds, but values that are guaranteed to increase even between two instructions (which probably means that they are finer than nanoseconds; on a CPU running above 1 GHz, single-cycle instructions will indeed take less than one nanosecond).</p>
<p>There is a nice description of how it works on their <a href="http://www.undo-software.com/undodb-gdb.1.html">online man page</a>. It is worth noting that they call it &#8220;gdb&#8221;, but the command set is distinct from what gdb introduced with its reverse execution in 2009. They use the &#8220;b&#8221; prefix for backwards commands rather than &#8220;r&#8221; for reverse. In a way, UndoDB is in direct competition with the gdb reverse target, but it is much faster and has more features.</p>
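<p>A rough sketch of how checkpointing and re-execution can support reverse breakpoints follows. It is a toy model rather than UndoDB's implementation: the program state is a single integer, the hypothetical <code>step</code> function stands in for executing one instruction, and recorded non-deterministic inputs are assumed to be folded into it. Time is simply an instruction count, which, like UndoDB's &#8220;simulated nanoseconds&#8221;, increases with every instruction.</p>

```python
from typing import Callable, Dict, Optional

State = int  # toy model: the entire program state is a single integer


class Recorder:
    """Checkpoint-and-re-execution sketch. `step` must be deterministic;
    in a real tool, recorded non-deterministic inputs make it so."""

    def __init__(self, step: Callable[[State], State], initial: State,
                 interval: int = 4) -> None:
        self.step = step
        self.interval = interval
        self.checkpoints: Dict[int, State] = {0: initial}
        self.time = 0
        self.state = initial

    def run(self, n: int) -> None:
        """Execute forward, taking a checkpoint every `interval` steps."""
        for _ in range(n):
            self.state = self.step(self.state)
            self.time += 1
            if self.time % self.interval == 0:
                self.checkpoints[self.time] = self.state

    def goto(self, t: int) -> None:
        """Restore the nearest checkpoint at or before t, then re-execute."""
        base = max(c for c in self.checkpoints if c <= t)
        self.time, self.state = base, self.checkpoints[base]
        while self.time < t:
            self.state = self.step(self.state)
            self.time += 1

    def reverse_continue(self, hit: Callable[[State], bool]) -> Optional[int]:
        """Move to the latest earlier time where `hit` holds, replaying one
        checkpoint interval at a time, newest interval first."""
        now = self.time
        upper = now
        for start in sorted((c for c in self.checkpoints if c < now),
                            reverse=True):
            self.goto(start)
            last = None
            while self.time < upper:
                if hit(self.state):
                    last = self.time
                self.state = self.step(self.state)
                self.time += 1
            if last is not None:
                self.goto(last)
                return last
            upper = start
        self.goto(now)  # no earlier hit: stay where we were
        return None


r = Recorder(step=lambda s: s + 1, initial=0)
r.run(10)
print(r.reverse_continue(lambda s: s % 3 == 0))  # 9
print(r.reverse_continue(lambda s: s % 3 == 0))  # 6
```

<p>Note that the whole interval has to be replayed even after a hit is found, since a later hit in the same interval is the one a reverse breakpoint should stop at.</p>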
<p><strong>2008</strong>. The TotalView debugger (today owned by Rogue Wave, but at the time developed by an independent company) gained support for reverse debugging with the <a href="http://www.roguewave.com/products/totalview-family/replayengine.aspx">ReplayEngine</a> add-on. TotalView is an old mainstay in the HPC market, having been around since <a href="https://computing.llnl.gov/tutorials/totalview/#Overview">at least 1993</a>. Indeed, it was initially developed for the <a href="http://en.wikipedia.org/wiki/BBN_Butterfly">BBN Butterfly computer</a>, and thus it might have had some contact with reverse execution as far back as the 1987 paper cited in my <a href="http://jakob.engbloms.se/archives/1554">previous blog post</a>.</p>
<p>Judging from <a href="http://www.roguewave.com/documents.aspx?Command=Core_Download&amp;EntryId=739">the available materials</a>, TotalView clearly can step back in various ways. However, it is not clear that it triggers breakpoints when going backwards. Thus, it has to count as record-replay debugging rather than reverse debugging. The implementation is based on extensive instrumentation of the runtime system of the target computer. It builds on the fact that the target programs tend to be clustered programs that use MPI to communicate &#8211; and thus a large part of the communication between threads is explicit and easily intercepted and recorded. The implementation also reuses an existing checkpoint-and-restart infrastructure for parallel MPI programs, originally built to support fault tolerance. Finally, in a slightly ugly hack, they make each multi-threaded program run on a single processor using a big lock. In this way, all that needs to be replayed is the interleaving of threads on a single processor, a far more tractable problem than trying to replicate a truly parallel execution in a new session.</p>
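<p>The effect of the big lock can be illustrated with a toy scheduler in Python, where the hypothetical threads are generators and each <code>yield</code> is a point where a thread switch may happen. Because only one thread runs at a time, the entire interleaving reduces to a list of thread IDs, which can be recorded and replayed. This is a sketch of the general idea only, not of how ReplayEngine is built.</p>

```python
import random
from typing import Callable, Dict, Generator, List, Optional, Tuple

Thread = Generator[None, None, None]


def run_serialized(make_threads: Callable[[Dict[str, int]], List[Thread]],
                   schedule: Optional[List[int]] = None,
                   seed: Optional[int] = None) -> Tuple[Dict[str, int], List[int]]:
    """Run all threads as if under one big lock: exactly one thread advances
    per step. With schedule=None, the interleaving is picked at random and
    recorded; passing a recorded schedule replays the same interleaving."""
    shared = {"counter": 0}
    threads = make_threads(shared)
    live = list(range(len(threads)))
    rng = random.Random(seed)
    recorded: List[int] = []
    while live:
        tid = rng.choice(live) if schedule is None else schedule[len(recorded)]
        recorded.append(tid)
        try:
            next(threads[tid])  # run this thread up to its next yield
        except StopIteration:
            live.remove(tid)
    return shared, recorded


def racy_workers(shared: Dict[str, int]) -> List[Thread]:
    """Two threads doing unsynchronized read-modify-write increments."""
    def worker() -> Thread:
        for _ in range(3):
            tmp = shared["counter"]
            yield                        # a thread switch may happen here
            shared["counter"] = tmp + 1  # may overwrite another thread's update
            yield
    return [worker(), worker()]


state, sched = run_serialized(racy_workers, seed=1)
replayed, _ = run_serialized(racy_workers, schedule=sched)
print(state == replayed)  # True
```

<p>Different seeds can lose different numbers of updates, but any recorded schedule reproduces its run exactly.</p>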
<p><strong>2008</strong>. VmWare officially launched a record-replay debugger based on its virtual machine technology with <a href="http://www.replaydebugging.com/2008/08/vmware-workstation-65-reverse-and.html">VmWare Workstation 6.5</a>. It was single-processor, system-level (but really only supported for user-level debugging), and cross-target (since the VM is not exactly the same hardware as the host), with a time model based on the virtual machine, which I believe is cycle-based. It was mostly used for record-replay debugging of non-deterministic software bugs, but could also do reverse debugging, including reverse data breakpoints. It was based on snapshots and deterministic re-execution, plus recording of all non-deterministic device accesses (not all devices in the VmWare hardware emulation layer are deterministic). Going back to a snapshot was a very heavy operation (I tried it), since the entire target memory had to be restored, which quickly got into gigabytes. The hardware supported in the VM was quite limited, and things like CD-ROMs and floppies could not be part of a record/replay session. Replay logs could be moved between hosts.</p>
<p>The VmWare reverse debug functionality was removed from VmWare Workstation version 8 in 2011, since it required a large investment and apparently was not used by very many VmWare users. This indicates that building developer-oriented functionality into a technology base that is fundamentally driven by the needs of deployed virtual machines is hard. There are contradictions between these two goals, as the determinism and control needed for a good reverse debugger are not necessarily consistent with maximum performance for virtual machines running in a production setting.</p>
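<p>The split between deterministic re-execution and recorded non-deterministic device accesses can be sketched as below. The <code>DeviceLog</code> class and the <code>guest</code> function are hypothetical illustrations, not VmWare code; a real VM has to log many kinds of events, such as device reads and interrupt timing, rather than a single stream of integers.</p>

```python
import random
from typing import Callable, List, Optional


class DeviceLog:
    """Record/replay of non-deterministic device reads (hypothetical API).
    Recording: values come from the real source and are appended to the log.
    Replaying: values are served from the log instead of the device."""

    def __init__(self, source: Optional[Callable[[], int]] = None,
                 log: Optional[List[int]] = None) -> None:
        self.replaying = log is not None
        self.source = source
        self.log: List[int] = list(log) if log is not None else []
        self.pos = 0

    def read(self) -> int:
        if self.replaying:
            value = self.log[self.pos]
            self.pos += 1
        else:
            value = self.source()
            self.log.append(value)
        return value


def guest(dev: DeviceLog, steps: int) -> int:
    """Deterministic guest code whose only non-determinism is device input."""
    acc = 0
    for _ in range(steps):
        acc = (acc * 31 + dev.read()) % 1_000_003
    return acc


recording = DeviceLog(source=lambda: random.randrange(256))
first = guest(recording, 10)
# The log is plain data, so it could be saved and replayed on another host.
second = guest(DeviceLog(log=recording.log), 10)
print(first == second)  # True
```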
<p><strong>2009</strong>. gdb 7.0 added support for reverse execution (work that began in 2006). The built-in &#8220;record&#8221; target supports reverse debugging of user-level single-threaded programs on the same host. The command set for reverse debugging is fairly full-featured, but a bit quirky, with a &#8220;<a href="http://sourceware.org/gdb/news/reversible.html">set exec-direction</a>&#8221; command that makes the regular run-control commands work in reverse. The record technology is quite slow, since it records the effect of each and every instruction executed by the program.</p>
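<p>Recording the effect of every instruction can be pictured as an undo log. The sketch below shows the general technique in Python; it is not gdb's actual implementation, and instructions are modelled, hypothetically, as a dictionary of the locations they write. It also shows where the cost comes from: every executed instruction adds an entry to the log.</p>

```python
from typing import Dict, List, Optional, Tuple

UndoRecord = List[Tuple[str, Optional[int]]]  # (location, old value or None)


class UndoMachine:
    """Toy recording target: before an instruction writes any location, the
    old value is saved, so a reverse step restores the saved values."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}  # registers and memory, by name
        self.log: List[UndoRecord] = []

    def execute(self, writes: Dict[str, int]) -> None:
        """Execute one 'instruction', modelled as the locations it writes."""
        self.log.append([(loc, self.state.get(loc)) for loc in writes])
        self.state.update(writes)

    def reverse_step(self) -> None:
        """Undo the most recent instruction."""
        for loc, old in reversed(self.log.pop()):
            if old is None:
                del self.state[loc]  # location did not exist before
            else:
                self.state[loc] = old


m = UndoMachine()
m.execute({"pc": 4, "r0": 1})
m.execute({"pc": 8, "r0": 2, "mem_0x10": 7})
m.reverse_step()
print(m.state)  # {'pc': 4, 'r0': 1}
```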
<p>In addition to its built-in target, gdb can also control external reversible debug systems over the gdb serial protocol. This made the gdb-serial changes created by Virtutech for Simics in 2005 part of the mainline gdb release. <a href="http://sourceware.org/gdb/news/reversible.html">Several tools support the command set</a>, including VmWare, UndoDB, and Simics. A set of MI commands was also added to let Eclipse use gdb as a backend for reverse debugging, including using it to control external tools via gdb-serial. How this happened is quite a long story, and I made a small contribution to the gdb code base myself in the process. Read about it <a href="http://jakob.engbloms.se/archives/1065">here</a>.</p>
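<p>On the wire, the separation of concerns is visible in how little gdb has to send. In the gdb remote serial protocol, reverse execution is requested with the <code>bc</code> (backward continue) and <code>bs</code> (backward single step) packets, and the target does all the actual reversing. The helper below is a minimal sketch of the standard packet framing only; a real stub also handles acknowledgements, escaping, and the stop replies that follow.</p>

```python
def rsp_frame(payload: str) -> str:
    """Frame a gdb remote serial protocol packet as $payload#checksum, where
    the checksum is the sum of the payload bytes modulo 256, written as two
    lowercase hex digits. (Escaping of special characters is not handled.)"""
    checksum = sum(payload.encode("ascii")) % 256
    return f"${payload}#{checksum:02x}"


# gdb's requests to a reversible remote target:
print(rsp_frame("bc"))  # $bc#c5  (backward continue)
print(rsp_frame("bs"))  # $bs#d5  (backward single step)
```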
<p><strong>2009</strong>. Eclipse CDT added support for <a href="http://www.eclipse.org/community/training/webinars/090526_CDT_Webinar.pdf">reverse execution</a>, using gdb 7.0 reverse as the initial backend. As noted above, this also lets Eclipse use other reverse debugging backends (Eclipse uses the gdb-MI interface to control the debug session). This is noteworthy since the buttons to control reverse execution are now part of the CDT, making it much easier to use Eclipse to build a frontend for any reversible backend. Eclipse is not really a debugger, just an interface to a debugger.</p>
<p><strong>2009</strong>. Microsoft Visual Studio <a href="http://blogs.msdn.com/b/ianhu/archive/2009/05/13/historical-debugging-in-visual-studio-team-system-2010.aspx">got record-replay debugging with IntelliTrace</a>. It is strictly about replay debugging, including the nice ability to send traces around between developers. There are no backwards breakpoints. The support is limited to programs running on top of the .NET runtime system, meaning that <a href="http://msdn.microsoft.com/en-us/library/dd264915.aspx">it does not apply to classic Windows software</a>. Using the CLR virtual machine as the implementation basis should make the implementation easier, cleaner, and faster compared to a machine-level native solution. It is user-level, single-threaded, and host-based. Its time concept is unknown.</p>
<p><strong>2011</strong>. Adobe demonstrated (but did not launch) reverse debugging in its Flash Builder programming environment. A <a href="http://tv.adobe.com/watch/max-2011-sneak-peeks/max-2011-sneak-peek-reverse-debugging-in-flash-builder/">nice video is posted on the Adobe website</a>. It seems to be based on the virtual machine that Flash runs on, and includes what look like pretty powerful backwards data analysis tools. In a <a href="http://anirudhsasikumar.net/blog/2011.12.15.html">blog post</a>, the developer describes some of the features, which to me seem to indicate some pretty heavy recording.</p>
<p><strong>Final notes.</strong> In researching these commercial tools, I also found what seems to be a lost one. A company called Visicomp launched a Java debugger called RetroVue in 2002, which supposedly did allow backwards debugging in some way. However, it seems that this tool was not really practical, being too slow for actual use. It seems to have since disappeared, without anyone picking up its legacy. The technology was apparently quite similar to the Omniscient Debugger presented in 2003, which I described in the <a href="http://jakob.engbloms.se/archives/1554">blog post on reverse execution research</a>.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1564"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1564" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1564" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1564/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Driving an Old Canon Scanner using a VM</title>
		<link>http://jakob.engbloms.se/archives/842?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/842#comments</comments>
		<pubDate>Wed, 15 Jul 2009 18:43:50 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[desktop software]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[Canon]]></category>
		<category><![CDATA[LIDE30]]></category>
		<category><![CDATA[scanner]]></category>
		<category><![CDATA[USB]]></category>
		<category><![CDATA[virtualization]]></category>
		<category><![CDATA[Vista]]></category>
		<category><![CDATA[VMWare]]></category>
		<category><![CDATA[Windows]]></category>
		<category><![CDATA[XP]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=842</guid>
		<description><![CDATA[I have an old Canon LIDE 30 scanner that I purchased sometime late in 2003. At that time, it was connected to a PC running Windows XP, and drivers worked just fine. However, after I got my new computer in early 2009, with Vista 64, there are no more drivers available. There is a funny [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-843" style="margin-left: 5px; margin-right: 5px;" title="lide30" src="http://jakob.engbloms.se/wp-content/uploads/2009/07/lide30.gif" alt="lide30" width="100" height="67" />I have an old <a href="http://www.canon-europe.com/For_Home/Product_Finder/Scanners/Flatbed/LIDE30/index.asp">Canon LIDE 30 </a>scanner that I purchased sometime late in 2003. At that time, it was connected to a PC running Windows XP, and drivers worked just fine. However, after I got my new computer in early 2009, with Vista 64, there are no more drivers available. There is a funny way around this though, using a virtual machine.</p>
<p><span id="more-842"></span>What I ended up doing to keep using my scanner (whose hardware is still very much intact and solid) is fairly obvious: I installed my old Windows XP license on a VMWare virtual machine (I had the good luck to have a full license with physical media), and then install the Canon LIDE30 driver on that virtualized XP.</p>
<p>VMWare Player is sufficient to let me attach the physical scanner to the virtual machine&#8217;s USB interface, and drive it without the host Vista 64 machine being any the wiser. To get the scanned pictures out, I have to resort to drag-and-drop, as I have failed to get shared folders to work with Player for some unknown reason.</p>
<p>The end result can be pretty complex&#8230; To send some emails from my work computer with scans from this scanner attached, I had to:</p>
<ul>
<li>Scan on the virtual XP machine</li>
<li>Drag-and-drop to the Pictures folder on my Vista 64 machine</li>
<li>Use file-sharing in Windows to move the files to my work laptop</li>
<li>Attach in Outlook</li>
</ul>
<p>Workable. It is also a pretty good demo of the power afforded by modern consumer operating systems. Imagine trying to do that in 1995&#8230; would not have been quite as fun.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/842"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/842" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/842" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/842/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Simulation Determinism: Necessary or Evil?</title>
		<link>http://jakob.engbloms.se/archives/734?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/734#comments</comments>
		<pubDate>Sun, 19 Apr 2009 20:36:02 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[determinism]]></category>
		<category><![CDATA[multicore]]></category>
		<category><![CDATA[repeatability]]></category>
		<category><![CDATA[reverse execution]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[VMWare]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=734</guid>
		<description><![CDATA[In my series (well, I have one previous post about checkpointing) about misunderstood simulation technology items, the turn has come to the most difficult of all it seems: determinism. Determinism is often misunderstood as meaning &#8220;unchanging&#8221; or &#8220;constant&#8221; behavior of the simulation. People tend to assume that a deterministic simulation will not reveal errors due [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-735" style="margin-left: 10px; margin-right: 10px;" title="gears" src="http://jakob.engbloms.se/wp-content/uploads/2009/04/gears.png" alt="gears" width="56" height="57" />In my series (well, I have one previous post about <a href="http://jakob.engbloms.se/archives/714"><em>checkpointing</em></a>) about misunderstood simulation technology items, the turn has come to the most difficult of all it seems: <em>determinism.</em> Determinism is often misunderstood as meaning &#8220;unchanging&#8221; or &#8220;constant&#8221; behavior of the simulation. People tend to assume that a deterministic simulation will not reveal errors due to nondeterministic behavior or races in the modeled system, which is a complete misunderstanding. Determinism is a necessary feature of any simulation system that wants to be really helpful to its users, not an evil that hides errors.</p>
<p><span id="more-734"></span></p>
<h2>What?</h2>
<p>Determinism really means this:</p>
<ul>
<li>Given a certain initial state</li>
<li>And a certain sequence of external inputs</li>
<li>The end result and state of the simulation will always be the same</li>
</ul>
<p>The key point is that both the starting state and the sequence of external inputs must be the same in order to get the same result. If either of these changes, you may well get a different result. Implementing a deterministic simulator requires all internal events and activities in the simulator to be performed in the same order and at the same time in each simulation run. This means that the state of the host computer environment cannot be allowed to affect the simulator execution, which in turn means that all sorting of internal events has to be done in a defined order in all instances.</p>
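<p>One common way to give internal events a defined order is to key a priority queue on virtual time plus a sequence number assigned when the event is posted. The Python sketch below is an illustration under that assumption, not the Simics scheduler: ties are broken by posting order, never by host-dependent values such as object addresses.</p>

```python
import heapq
import itertools
from typing import Callable, List, Tuple

Event = Tuple[int, int, str, Callable[[], None]]  # (time, seq, name, action)


class EventQueue:
    """Discrete-event queue ordered by (virtual time, posting sequence).
    The sequence number is unique, so heap comparisons never reach the
    name or action, and nothing host-dependent affects the order."""

    def __init__(self) -> None:
        self._heap: List[Event] = []
        self._seq = itertools.count()
        self.now = 0

    def post(self, time: int, name: str, action: Callable[[], None]) -> None:
        heapq.heappush(self._heap, (time, next(self._seq), name, action))

    def run(self) -> List[str]:
        """Dispatch all events (including newly posted ones) in order and
        return their names."""
        order: List[str] = []
        while self._heap:
            self.now, _, name, action = heapq.heappop(self._heap)
            action()
            order.append(name)
        return order


q = EventQueue()
q.post(10, "E", lambda: None)
q.post(5, "A", lambda: q.post(10, "late", lambda: None))
q.post(10, "F", lambda: None)
print(q.run())  # ['A', 'E', 'F', 'late']
```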
<p>I have a story about how hard that can be in practice. I once talked to some compiler developers who had the issue that when recompiling the same program with the same set of compiler options, the results might come out different, even on the same machine. The problem was that each run of the compiler was done in a different overall system state, and this might affect where the OS memory allocation functions placed items in memory. It turned out that in some cases, the precise values of the <em>pointers</em> to the items in a complex data structure were used by standard libraries to handle iteration over nodes in the data structures. Thus, a different memory allocation pattern gave a different iteration order and a different traversal order of nodes, and in the end an almost arbitrarily different result. The correct solution they had to implement was to use a defined lexical ordering to traverse and iterate, not anything dependent on the state of the host machine. It is no different in a simulator: define the order of <em>everything</em>, in order to be deterministic.</p>
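<p>The same pitfall is easy to reproduce in Python: in CPython, objects that do not define their own equality hash by identity, which is derived from their memory address, so iterating over a set of such objects can depend on memory layout. The hypothetical <code>Node</code> class below illustrates the kind of fix the compiler developers applied: sort by a stable key (here the node name) before iterating.</p>

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(eq=False)  # eq=False keeps object's identity-based __hash__
class Node:
    name: str
    succs: List["Node"] = field(default_factory=list)


def traversal_order(roots: Set[Node]) -> List[str]:
    """Depth-first traversal whose order depends only on node names.
    Iterating a set of identity-hashed nodes directly could follow
    memory addresses, so every iteration is sorted by name first."""
    seen: Set[int] = set()
    order: List[str] = []

    def visit(node: Node) -> None:
        if id(node) in seen:
            return
        seen.add(id(node))
        order.append(node.name)
        for succ in sorted(node.succs, key=lambda n: n.name):
            visit(succ)

    for root in sorted(roots, key=lambda n: n.name):
        visit(root)
    return order


a, b, c = Node("a"), Node("b"), Node("c")
a.succs = [c, b]
print(traversal_order({c, a}))  # ['a', 'b', 'c']
```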
<h2>Why?</h2>
<p>The crucial benefit that determinism brings to a simulation in general and a virtual platform in particular is <em>repeatable debugging</em>. With determinism and an appropriate recording mechanism (and, most practically, <a href="http://jakob.engbloms.se/archives/714">checkpointing</a>), you can rely on being able to repeat a run that results in a bug any number of times, with precisely the same sequence of events in the simulation. In particular, events that are visible to and relevant for the software running on the virtual platform occur in the same order, and at the same time relative to the instructions executed. Especially for multicore and parallel computing systems this is incredibly powerful, and something that just cannot be achieved on physical hardware (due to its inherent randomness and chaotic behavior; see my 2006 and 2007 ESC Silicon Valley talks for more on this, on my <a href="http://www.engbloms.se/jakob_publications.html">publications</a> and <a href="http://www.engbloms.se/jakob_presentations.html">presentations</a> pages).</p>
<p>If you assume stability of the simulation infrastructure and the simulation platform, determinism also makes debugging the simulation itself easier. Often, a bug in a simulation model is repeatable, and with determinism, it is easy to repeat the same external stimulus sequence to the module and debug it repeatably.</p>
<p>Determinism also makes it easy to detect changes in the behavior of a simulation: if the same simulation setup results in a different result or final simulation state, you know something in the setup (model, model parameters, or software) changed. There is no randomness that causes changes without some fundamental parameter being changed. Such boring, reliable behavior is generally exactly what you want when testing and debugging large, complex systems.</p>
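<p>In practice, such change detection can be as simple as comparing a digest of the final simulation state against the digest from a reference run. The sketch below assumes, hypothetically, that the state can be exported as a JSON-serializable dictionary; serializing with sorted keys keeps the digest independent of insertion order, in the same spirit as defining the order of everything.</p>

```python
import hashlib
import json
from typing import Any, Dict


def state_fingerprint(state: Dict[str, Any]) -> str:
    """SHA-256 digest of a canonical JSON form of the simulation state."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


reference = state_fingerprint({"cycles": 1200, "mem": {"0x10": 7}, "pc": 64})
rerun = state_fingerprint({"pc": 64, "mem": {"0x10": 7}, "cycles": 1200})
changed = state_fingerprint({"pc": 64, "mem": {"0x10": 8}, "cycles": 1200})
print(reference == rerun, reference == changed)  # True False
```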
<p>Obviously, once determinism becomes a requirement, missing determinism in a model is a bug in itself &#8212; and finding such bugs can certainly be interesting exercises.</p>
<h2>Why Not?</h2>
<p>Just like for checkpointing, one reason not to implement determinism is that it is hard, as discussed above.</p>
<p>The most common reason that people claim to want to avoid determinism is that they want to explore alternatives within their simulation. Basically, there is a need for <em>variability </em>that would seem to be at odds with determinism. The typical argument is that &#8220;if my simulation model contains a non-deterministic choice, I want the simulation to expose that and not just make the same decision every time&#8221;. This is where determinism tends to be considered <em>evil</em>. However, this argument is not correct.</p>
<p>Take the case where, at some point P in a simulation run, there are two different events <em>E</em> and <em>F</em> that can fire (since they are both posted to the same point in virtual time). A deterministic simulator will always select the same one first. This is necessary to reap the system-level benefits discussed above. However, nothing prevents us from explicitly programming a deviation from this behavior into our system, <em>introducing controlled and repeatable variation.</em> In such a setup, a random decision is made in each simulation run, but the outcome of any particular run can be repeated by setting the same random seed parameter.</p>
<p>This brings the best of both worlds: variation to expose issues where there is potential non-determinism or lack of synchronization in the model, and perfect repeatability of the resulting issues in target software and simulation system behavior. Two events being ready at the same time can in general be considered a sign of lacking synchronization in the model, and such a randomizer of behavior will expose that at several different levels. But uncontrolled randomness is not the answer.</p>
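<p>A minimal sketch of such controlled variation, assuming events are simple <code>(time, name)</code> pairs: ties at the same virtual time are broken by draws from a seeded, private random generator, so different seeds can explore different orders of <em>E</em> and <em>F</em>, while the same seed always reproduces the same order. The function is a hypothetical illustration, not a Simics API.</p>

```python
import heapq
import itertools
import random
from typing import List, Tuple


def fire_order(events: List[Tuple[int, str]], seed: int) -> List[str]:
    """Order (time, name) events by virtual time. Events posted to the same
    time are ordered by a draw from a seeded RNG, with posting order as a
    final tie-breaker, so each seed gives one repeatable order."""
    rng = random.Random(seed)  # private generator: no hidden global state
    seq = itertools.count()
    heap: List[Tuple[int, float, int, str]] = []
    for time, name in events:
        heapq.heappush(heap, (time, rng.random(), next(seq), name))
    return [heapq.heappop(heap)[3] for _ in range(len(heap))]


events = [(5, "P"), (10, "E"), (10, "F")]
print(fire_order(events, seed=7) == fire_order(events, seed=7))  # True
print({tuple(fire_order(events, seed=s)) for s in range(50)})
```

<p>The seed becomes just another simulation parameter: a run that exposes a problem can be rerun with exactly the same event order.</p>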
<p>Another common misconception is that at a higher level, determinism in a virtual platform means that target software will always run in the same way. That is not true, and misses the importance of state in the deterministic behavior equation. If the initial state when a program starts is different, a different execution will result. If software is run on top of any non-trivial operating system, there is plenty of such variation. In one of our simplest Simics demos, we show this by running an intentionally buggy race-condition-ridden program. Each time it is run, it hits a different number of race conditions. But thanks to determinism (best demoed using reverse execution), we can repeat each run perfectly.</p>
<p>Thus, determinism is not equal to constant behavior or lack of variation.</p>
<h2>The reverse argument</h2>
<p>Finally, determinism is the simplest way to implement reverse execution: if you have recording, determinism, and checkpointing, you can virtually reverse the execution by going back to a checkpoint and replaying the execution from that point. If you stop one instruction before the current instruction, you have in essence stepped backwards one step in time. This is how both VMWare and Simics implement reverse execution and debugging, and it could not happen without determinism.</p>
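<p>The mechanism can be sketched in a few lines of Python. This is a toy under stated assumptions, not how VMWare or Simics are actually implemented. The <code>ToyMachine</code> and <code>ReversibleSimulator</code> classes are hypothetical, the machine is deterministic by construction, and all external input is recorded so that it can be fed back in during replay. Stepping backwards restores the closest earlier checkpoint and replays forward to the target instruction:</p>

```python
import copy


class ToyMachine:
    """Hypothetical deterministic target: the next state depends only on the
    current state and the external input consumed by that step."""

    def __init__(self):
        self.step_count = 0
        self.acc = 1

    def step(self, external_input):
        # Stand-in for executing one instruction.
        self.acc = (self.acc * 1103515245 + 12345 + external_input) % 2**31
        self.step_count += 1


class ReversibleSimulator:
    """Reverse execution = checkpoints + recorded inputs + deterministic replay.
    Assumes recording starts at step 0."""

    def __init__(self, machine, checkpoint_interval=100):
        self.machine = machine
        self.interval = checkpoint_interval
        self.input_log = []  # input_log[k] is the input consumed by step k
        self.checkpoints = {machine.step_count: copy.deepcopy(machine)}

    def run(self, inputs):
        # Running forward from a reversed position starts a new timeline:
        # drop the recorded inputs and checkpoints of the old future.
        now = self.machine.step_count
        del self.input_log[now:]
        self.checkpoints = {s: c for s, c in self.checkpoints.items() if s <= now}
        for value in inputs:
            self.input_log.append(value)
            self.machine.step(value)
            if self.machine.step_count % self.interval == 0:
                self.checkpoints[self.machine.step_count] = copy.deepcopy(self.machine)

    def reverse(self, steps=1):
        """Step backwards: restore the closest earlier checkpoint, then replay
        the recorded inputs, stopping `steps` instructions before 'now'."""
        target = self.machine.step_count - steps
        base = max((s for s in self.checkpoints if s <= target), default=None)
        if base is None:
            raise ValueError("no checkpoint early enough to reverse to that point")
        self.machine = copy.deepcopy(self.checkpoints[base])
        while self.machine.step_count != target:
            self.machine.step(self.input_log[self.machine.step_count])
```

<p>The checkpoint interval trades memory for replay time: reversing costs at most about one interval of re-execution. Running forward again after reversing discards the recorded future, which corresponds to taking a new path from that point.</p>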
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/734/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Checkpointing: Meaningless, Difficult, or just Overlooked?</title>
		<link>http://jakob.engbloms.se/archives/714?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/714#comments</comments>
		<pubDate>Thu, 09 Apr 2009 19:56:16 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Checkpointing]]></category>
		<category><![CDATA[Macintosh]]></category>
		<category><![CDATA[Mambo]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[VMWare]]></category>
		<category><![CDATA[ZX Spectrum]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=714</guid>
		<description><![CDATA[One thing that surprises me is how rare the feature of checkpointing or snapshotting is in the land of virtual platforms, despite the obvious benefits of that feature. Indeed, checkpointing was one of the first cool things demonstrated to me when I joined Virtutech back in 2002. Today, I could not ever imagine doing without [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-737" style="margin-left: 10px; margin-right: 10px;" title="gears1" src="http://jakob.engbloms.se/wp-content/uploads/2009/04/gears1.png" alt="gears1" width="56" height="57" />One thing that surprises me is how rare the feature of checkpointing or snapshotting is in the land of virtual platforms, despite its obvious benefits. Indeed, checkpointing was one of the first cool things demonstrated to me when I joined Virtutech back in 2002. Today, I cannot imagine working without it. Not having checkpointing is like having a word processor where you only get to save once, when your document is finished, with no option of saving intermediate states.</p>
<p>But not everyone seems to consider this an important feature, judging from its relative rarity in the world of EDA and virtual platforms. Why is this? Let&#8217;s look at some possible explanations.</p>
<p><span id="more-714"></span></p>
<p>But first, let&#8217;s examine the subject of this post a bit more. What is checkpointing, precisely?</p>
<h2>What?</h2>
<p>In short, it is the ability of a virtual platform or virtualization environment to save the state of an executing simulation to disk (or to memory, or some other storage) and later bring the saved state back and continue the simulation as if nothing had happened.</p>
<p>In detail, there are four operations that need to be supported for this to be truly useful:</p>
<p><img class="aligncenter size-full wp-image-715" title="checkpoints" src="http://jakob.engbloms.se/wp-content/uploads/2009/04/checkpoints.png" alt="checkpoints" width="632" height="494" /></p>
<ul>
<li>Saving and restoring to the same simulation system on the same host machine (i.e., into the exact same program binary for the simulation).</li>
<li>Restoring on a different machine (where different can mean a machine with a different word-length, endianness, and operating system).</li>
<li>Restoring into a bug-fixed version of the same simulation model.</li>
<li>Restoring into a completely different simulation model that happens to have the same state.</li>
</ul>
<h2>Why?</h2>
<p>Let&#8217;s look at some use cases for checkpointing:</p>
<p>The last of the four operations above is very interesting, since it carries with it the ability to change abstraction level. It is used to exactly this effect in IBM Mambo (see a <a href="http://www.research.ibm.com/journal/rd/502/peterson.html">2006 IBM paper that you now have to buy due to an annoying change in IBM policy</a>), and in Simics for the Freescale QorIQ P4080 as well. It is also well exploited by academic research frameworks for Simics, such as <a href="http://www.cs.wisc.edu/gems/">GEMS</a> and <a href="http://www.ece.cmu.edu/~simflex/">SimFlex</a>. Essentially, the idea is to position the simulation at the point of interest using a fast mode, and then switch over to a detailed mode. The advantage of doing this via a checkpoint is that you can farm out the experiments across many different hosts, save the precise starting point for future regression tests, and try different detailed settings from a known common starting position.</p>
<p>The most obvious use for checkpoints is to avoid repeating simulation work that does not add value, in particular the booting of operating systems. A modern OS boot easily takes billions of instructions (say 10 seconds on a dual-core gigahertz machine&#8230; do the math). Being able to save a simulation effort like this for instant reuse is such a standard part of how I work with virtual platforms that I cannot imagine the pain of not having it.</p>
<p>Checkpointing is also a useful communications tool: it makes it possible for any user of a virtual platform to precisely communicate the system state and configuration to anybody else with access to the same virtual platform system. (Note that a checkpoint, at least in Simics land, contains the list of objects in the simulation and how they are connected, so you do not need any other description of the simulation setup.) This helps in debugging models: a user testing a model can easily package up problems and report them to the modeling team. It also helps in debugging software running on the virtual platform, as a tester can package up the precise system state right before a bug hits and send it back to development. Incredibly powerful! Here, portability of checkpoints across hosts, as well as across model versions, is obviously very important. Once you have a fix for a model bug, you test it using the checkpoint and check that things now proceed as they should.</p>
<p>Checkpointing also comes in handy as a backup-save ability when configuring an interactive target system. In many cases, loading and configuring software on a target is a very valuable activity that is hard to repeat exactly. Adding software, configuring it, starting servers, assigning network addresses, and configuring communications paths for backplanes can take a lot of time. On physical machines, or on virtual platforms without checkpointing, a mistake means going back and starting over. With checkpointing, you can incrementally save your work as you go along. This is a common use case for the snapshotting ability in VmWare, for example, but it works equally well for embedded targets modeled as virtual platforms.</p>
<p>There are more uses; the paragraphs above just scratch the surface of the utility of checkpoints.</p>
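<p>The fast-to-detailed switch can be sketched in Python as follows. This is a hypothetical toy, not the Mambo, Simics, GEMS, or SimFlex interface. The assumption is that the checkpoint holds only architectural state, which both a fast functional model and a detailed timing model understand, while the detailed model&#8217;s own microarchitectural state (here a toy cache) starts out cold after the switch:</p>

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ArchState:
    """Architectural state: the part both abstraction levels agree on,
    and therefore the part that goes into the checkpoint."""
    pc: int = 0
    regs: list = field(default_factory=lambda: [0, 0, 0, 0])


class FastModel:
    """Hypothetical functional model: correct results, no timing."""

    def __init__(self, state=None):
        # Copy so that one checkpoint can seed many independent experiments.
        self.state = copy.deepcopy(state) if state is not None else ArchState()

    def run(self, n):
        for _ in range(n):
            self.execute_one()

    def execute_one(self):
        s = self.state
        s.regs[s.pc % 4] += s.pc  # stand-in for a real instruction
        s.pc += 1

    def checkpoint(self):
        return copy.deepcopy(self.state)


class DetailedModel(FastModel):
    """Hypothetical detailed model: same architectural behavior plus timing.
    Its toy cache is not part of the checkpoint and starts cold."""

    def __init__(self, state, miss_penalty=20, line_size=8):
        super().__init__(state)
        self.miss_penalty = miss_penalty
        self.line_size = line_size
        self.cache = set()
        self.cycles = 0

    def execute_one(self):
        line = self.state.pc // self.line_size
        self.cycles += 1 if line in self.cache else self.miss_penalty
        self.cache.add(line)
        super().execute_one()
```

<p>Typical use: run the fast model to the point of interest, take one checkpoint, and start several detailed runs from it with different settings (for example, different <code>miss_penalty</code> values), possibly on different hosts. The architectural results agree with a pure fast run; only the timing differs.</p>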
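<p>To do that math under some explicit assumptions (roughly one instruction per clock cycle per core, and a hypothetical simulator speed of 100 million instructions per second, neither of which comes from a specific product):</p>

```latex
\begin{aligned}
N_{\text{boot}} &\approx 10\,\text{s} \times 2\ \text{cores} \times 10^{9}\,\tfrac{\text{cycles}}{\text{s}} \times 1\,\tfrac{\text{instruction}}{\text{cycle}} = 2 \times 10^{10}\ \text{instructions} \\
t_{\text{sim}} &\approx \frac{2 \times 10^{10}\ \text{instructions}}{10^{8}\ \text{instructions/s}} = 200\,\text{s}
\end{aligned}
```

<p>Under those assumptions, every boot costs more than three minutes of simulation, compared to the near-instant restore of a checkpoint.</p>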
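<p>A minimal sketch of such a self-describing, portable checkpoint in Python, assuming that every model exposes its complete state as named attributes. The <code>Uart</code> and <code>InterruptController</code> classes and the <code>get_attributes</code>/<code>set_attributes</code> methods are hypothetical, not the Simics API. Connections are stored as object names, so the checkpoint alone is enough to rebuild the configuration, and JSON text carries no host word length or endianness. Restoring into a bug-fixed model version works as long as the new version supplies defaults for attributes that older checkpoints lack:</p>

```python
import json


class Uart:
    """Hypothetical device model whose whole state is in named attributes."""

    def __init__(self, name):
        self.name = name
        self.baud = 115200
        self.tx_fifo = []       # attribute added in a bug-fixed model version
        self.irq_target = None  # connection: the *name* of another object

    def get_attributes(self):
        return {"baud": self.baud, "tx_fifo": self.tx_fifo, "irq_target": self.irq_target}

    def set_attributes(self, attrs):
        self.baud = attrs["baud"]
        # Older checkpoints predate tx_fifo: fall back to its reset value.
        self.tx_fifo = list(attrs.get("tx_fifo", []))
        self.irq_target = attrs["irq_target"]


class InterruptController:
    """Hypothetical interrupt controller model."""

    def __init__(self, name):
        self.name = name
        self.pending = []

    def get_attributes(self):
        return {"pending": self.pending}

    def set_attributes(self, attrs):
        self.pending = list(attrs["pending"])


def save_checkpoint(objects):
    """Serialize every object's class and attributes as host-neutral text."""
    return json.dumps(
        {name: {"class": type(obj).__name__, "attrs": obj.get_attributes()}
         for name, obj in objects.items()},
        sort_keys=True,
    )


def restore_checkpoint(text, registry):
    """Rebuild the whole configuration from the checkpoint alone.
    `registry` maps class names to the (possibly newer) model classes."""
    objects = {}
    for name, entry in json.loads(text).items():
        obj = registry[entry["class"]](name)
        obj.set_attributes(entry["attrs"])
        objects[name] = obj
    return objects
```

<p>Because the object list and connections travel inside the checkpoint, a tester can send the text to a developer, who restores it with the current model classes passed in as the registry.</p>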
<h2>Why Not?</h2>
<p>But despite the obvious benefits, this feature is very rarely found in virtual platforms. I can see three main lines of argument:</p>
<ul>
<li><em>Meaningless</em>: for tests comprising only short software runs of a few million or tens of millions of instructions, rerunning from scratch is fast enough, or the changes between runs are major enough, that checkpointing seems pointless. I can buy that, but only until the simple target becomes part of a larger context. If a DSP, for example, is part of a big system setup, you want to save its state even if it is only running a few small million-instruction loops.</li>
<li><em>Difficult</em>: I think this might be the most important explanation. Doing checkpointing right puts requirements on the simulation kernel and on all processor and device models. All models have to be coded with discipline so that all state is available and can be set at any point in time. In particular, this means that explicit threading of the kind employed in SystemC SC_THREAD processes is out, since part of the state of such a model lives on the thread&#8217;s stack, where it cannot be read out or set. It must also be admitted that certain types of models, like detailed processor models, can be very difficult to serialize to and deserialize from disk, simply due to the enormous intricacy of their implementations. But had they been designed with checkpointing in mind from the start, it would have been less difficult.</li>
<li><em>Overlooked</em>: The virtual platform was designed without thinking of checkpointing. Alternatively, no customers asked for it, so it was not built.</li>
</ul>
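<p>The problem with thread-style models can be demonstrated in Python, using a generator as a rough analogy for an SC_THREAD. Both are coroutines whose state partly lives in a suspended execution context; this is an analogy, not SystemC code. The generator cannot be serialized, while an equivalent model written as an explicit state machine can be saved and restored trivially:</p>

```python
import pickle


def blinker_thread():
    """Thread-style model: the LED state is a local variable, and the
    'where am I' state is the suspended position inside the loop."""
    led = False
    while True:
        led = not led
        yield led  # stand-in for wait(): suspend until the next tick


class BlinkerStateMachine:
    """Method-style model: all state is in an attribute that can be read or set."""

    def __init__(self):
        self.led = False

    def tick(self):
        self.led = not self.led
        return self.led


if __name__ == "__main__":
    thread = blinker_thread()
    next(thread)
    try:
        pickle.dumps(thread)
    except TypeError as err:
        print("thread-style model cannot be checkpointed:", err)

    machine = BlinkerStateMachine()
    machine.tick()
    restored = pickle.loads(pickle.dumps(machine))
    print("restored LED state:", restored.led)
```

<p>A model written in the second style can also be put back into any state at any time, which is exactly what restoring a checkpoint requires.</p>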
<p>I find the last argument very interesting, since I have seen what happens once people have tried checkpointing. In my experience, once users of a virtual platform have tried checkpointing, they want it. It goes from an interesting idea to a must-have feature very quickly. Arguments about why it is hard, or why they can do without it, no longer work, as they have seen how things should be done.</p>
<p>For me, it is akin to my first encounter with a Macintosh computer, and the concept of &#8220;undo&#8221; in programs. Before that, I was happily editing code on a ZX Spectrum, in an environment where &#8220;undo&#8221; meant &#8220;manually remember how it looked and change it back&#8221;. I had no problems with that, but once I saw how things could be done, there was no going back.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/714/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>VMM Detection Myths and Realities from a Simics and Embedded Perspective</title>
		<link>http://jakob.engbloms.se/archives/97?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/97#comments</comments>
		<pubDate>Sun, 20 Apr 2008 00:02:21 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[virtualization]]></category>
		<category><![CDATA[Andrew Warfield]]></category>
		<category><![CDATA[HOTOS]]></category>
		<category><![CDATA[Jason Franklin]]></category>
		<category><![CDATA[Keith Adams]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[Tal Garfinkel]]></category>
		<category><![CDATA[Temporal decoupling]]></category>
		<category><![CDATA[Timing attack]]></category>
		<category><![CDATA[Virtual machine detection]]></category>
		<category><![CDATA[VMWare]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=97</guid>
		<description><![CDATA[It must have been Google Alerts that send me a link to the HOTOS 2007 (Hot Topics in Operating Systems) paper by Tal Garfinkel, Keith Adams, Andrew Warfield, and Jason Franklin called Compatibility is not Transparency: VMM Detection Myths and Realities. This paper is slightly less than a year old today, so it is old [...]]]></description>
			<content:encoded><![CDATA[<p>It must have been Google Alerts that sent me a link to the <a href="http://www.usenix.org/events/hotos07/">HOTOS 2007</a> (Hot Topics in Operating Systems) paper by Tal Garfinkel, Keith Adams, Andrew Warfield, and Jason Franklin called <a href="http://www.usenix.org/events/hotos07/tech/full_papers/garfinkel/garfinkel_html/">Compatibility is not Transparency: VMM Detection Myths and Realities</a>. This paper is slightly less than a year old today, so it is old by blog standards and quite recent by research paper standards. It deals with the interesting problem of whether a virtual machine can be made undetectable by the software running on it, in particular software that is actively trying to detect it. Their conclusion is that it is not feasible, and I agree. The reason WHY that is the case could use some more discussion, though&#8230; and here is my take on that issue from a Simics/embedded systems virtualization perspective.</p>
<p><span id="more-97"></span></p>
<p>Their most important assumption is that the VMM cannot be tailored to avoid detection by any particular piece of software, but has to be sufficiently like the real thing to fool a detector the first time it appears. They argue from the perspective of virtualization solutions like VmWare that aim at high performance before all else. The virtual PCs generated by VmWare, Parallels, KQemu, and others are all compatible with physical PCs (they run the same software) but are not at all identical in detail. So, in the words of the paper, they are not transparent. This means that they are quite easy to spot.</p>
<p>Some functional differences are holes that VMMs can quite easily plug. The paper shows, for example, how interference from the VMM can make the TLB appear to have a different size than on the physical hardware. This can obviously be fixed in the VMM, at a cost in performance. The reason such differences exist is that VMMs are optimized for performance at almost any cost. As long as the requisite operating systems run as they should, the VMM is fine even if it does not actually correspond to any particular existing physical machine. This is a testament to the tolerance of modern operating systems towards their hardware. Basically, any OS that probes the hardware and discovers what is there will work fine as long as the (virtual) hardware exposes devices that it can recognize. This is quite different from the 1970s or 1980s, when an OS would expect a very particular hardware setup with very peculiar timing in order to run at all. Thus, making a VMM totally identical to some physical machine is a waste of effort and performance.</p>
<p>Paravirtual approaches, like Xen and what Sun does on Niagara and IBM does on its Power servers, where the OS is modified to use drivers for a purely virtual hardware/software interface, are an obvious generalization of the VmWare compatibility approach. Compatible versus transparent/invisible virtualization is really only an issue in the x86 PC world, since all other datacenter architectures are virtual by definition and all operating systems work towards a standard virtual layer. In such an environment, I have a hard time seeing how the question posed in the paper even makes sense. You are always virtualized, period.</p>
<p><strong>Embedded Virtual Platforms</strong></p>
<p>Anyhow, back to the main thread. There is still a large set of targets where transparency and compatibility are of interest. x86 PCs are one such target, and the question is also interesting for older architectures (Alpha, Vax, and older generations of Sun and IBM machines). In particular, it is an important topic for embedded systems, where you want to use virtual or simulated approaches to develop and test software. As part of that software development process on a virtual machine, you could potentially be examining malware of various kinds. A good, not-too-hypothetical example is mobile phone viruses.</p>
<p>If we look at embedded system virtual platforms, the functionality of the simulator is usually more complete, and more like a particular physical machine, than that of a VmWare-style datacenter VMM. This is partially due to embedded software stacks tending to be a bit pickier about what they run on, and partially due to the simple fact that the goal really IS to expose the hardware/software interface of a particular piece of hardware as closely as possible. Also, since these are usually cross-target simulators (Power Architecture on x86, for example), there is no performance gain from using features of the host directly. So items like TLB counts, memory layout, memory content, flash memory programming, etc. are all going to be functionally identical to the physical machine.</p>
<p><strong>Timing is Key</strong></p>
<p>Thus, just like for a patched VmWare-style VMM as discussed in the article, the main attack vector remains <em>timing</em>.</p>
<p>The best way, according to the authors, to spot a VMM is to look for timing differences compared to the behavior on normal hardware. Despite the inherent variability of typical hardware, there are cases where VMMs by necessity deviate by detectable amounts. I would say that this means a factor of five or more, consistently over many runs of a test case.</p>
<p>The authors discuss whether tools like Virtutech Simics could be used to overcome this problem in the context of x86 PCs. I think the main argument for something like Simics for this purpose is that by simulating the entire hardware platform and providing all timing measurements from a strong virtual time base, you do not see the types of time differences that can be used to detect a &#8220;normal&#8221; VMM. However, since the paper considers Simics and SimNow (from AMD) to be about ten times slower than native hardware, you can always detect them using a non-local time source. That is likely true. But it is less obviously true for an embedded target, where the simulator running on a fast PC might well be just as fast as the target.</p>
<p><strong>The Multicore Timing Attack</strong></p>
<p>A more intriguing aspect of embedded virtual platforms that could be used to detect them is how the simulation of multicore machines is handled. For performance reasons, simulators use <em>temporal decoupling</em>, where each virtual processor is run for a &#8220;long&#8221; time slice before switching to the next. We discussed the effect of this in a recent presentation at the Multicore Expo (<a href="http://jakob.engbloms.se/archives/89">link to previous blog post</a>), and some of that data is worth repeating.</p>
<p>Here is a slide explaining how temporal decoupling works:</p>
<p><img class="aligncenter size-full wp-image-105" style="vertical-align: middle;" title="temporaldecoupling-what-it-is" src="http://jakob.engbloms.se/wp-content/uploads/2008/04/temporaldecoupling-what-it-is.png" alt="Illustration of temporal decoupling" width="500" height="375" /></p>
<p>So what does this mean in practice for detecting that you are running in a virtual machine?</p>
<p>It means that the communication latency between parallel threads is proportional to the length of the time slice. If you have two threads progressing in parallel doing spinlocks, on a real machine they will be stealing the lock from each other all the time. On a temporally decoupled simulator, you will instead see one thread take the lock and then recapture it a few times before the other thread gets it. This effect was captured by a simple test program that we wrote, and the data is shown in the slide below:</p>
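<p>The mechanism behind this effect can be reproduced with a small Python model. It is a hypothetical sketch, not the test program used for the slides, and its cycle counts are made-up parameters chosen only to show the trend. Two virtual CPUs repeatedly take a spinlock and do some work inside and outside the critical section, and they are run with temporal decoupling: each executes <code>quantum</code> cycles before the other CPU gets to run. The model reports the average number of consecutive acquisitions by the same CPU:</p>

```python
def average_run_length(quantum, total_cycles=100_000, crit=5, noncrit=5):
    """Two CPUs contending for one spinlock under temporal decoupling.

    Each CPU executes `quantum` cycles back-to-back before switching to the
    other CPU. Returns how many times in a row, on average, the same CPU
    acquires the lock (1.0 means strict alternation).
    """
    lock_owner = None
    phase = ["acquire", "acquire"]  # per CPU: acquire, critical, noncritical
    left = [0, 0]                   # cycles left in the current phase
    acquisitions = []               # CPU ids in acquisition order

    def cycle(cpu):
        nonlocal lock_owner
        if phase[cpu] == "acquire":
            if lock_owner is None:  # test-and-set succeeds
                lock_owner = cpu
                acquisitions.append(cpu)
                phase[cpu], left[cpu] = "critical", crit
            # else: spin, burning this cycle, and retry on the next one
        elif phase[cpu] == "critical":
            left[cpu] -= 1
            if left[cpu] == 0:
                lock_owner = None   # release the lock
                phase[cpu], left[cpu] = "noncritical", noncrit
        else:
            left[cpu] -= 1
            if left[cpu] == 0:
                phase[cpu] = "acquire"

    for _ in range(0, total_cycles, quantum):
        for cpu in (0, 1):          # temporal decoupling: one slice per CPU
            for _ in range(quantum):
                cycle(cpu)

    runs = 1 + sum(1 for a, b in zip(acquisitions, acquisitions[1:]) if a != b)
    return len(acquisitions) / runs


if __name__ == "__main__":
    for q in (1, 10, 100, 1000, 10000):
        print(f"time slice {q:>5}: {average_run_length(q):8.1f} acquisitions in a row")
```

<p>With a time slice of one cycle, the two CPUs strictly alternate on the lock; with long time slices, one CPU takes the lock dozens of times in a row before the other CPU even gets to run.</p>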
<p><img class="aligncenter size-full wp-image-106" title="temporaldecoupling-visible-disturbance" src="http://jakob.engbloms.se/wp-content/uploads/2008/04/temporaldecoupling-visible-disturbance.png" alt="Visible disturbance from temporal decoupling" width="500" height="375" /></p>
<p>The program here is running two threads in parallel, updating a shared variable, with three types of locking for the accesses:</p>
<ul>
<li>No locking at all</li>
<li>A local lock to each thread being used (&#8220;fake locking&#8221;)</li>
<li>A proper lock</li>
</ul>
<p>The interesting behavior is the execution time of the program for each of these locking styles. Obviously, running with no lock is the fastest, and with proper locking the slowest. The relative speed of these is the factor to consider. On real hardware, this program shows a very steep increase in execution time when using proper locking. On the simulator, as seen above, the difference in execution time between fake locking and proper locking is significantly smaller when using a long time slice than when using a short time slice. The behavior on physical machines is much more like that observed at a time slice length of ten than that at a time slice of 10000.</p>
<p>Normally, a multiprocessor simulator with any ambition to be fast has to use a time slice of 1000 or more. Thus, detecting that you are running inside a simulator is quite simple. If the outside-world time seems right, check whether you see strange timing behavior when using locks. Since high speed requires a long time slice, you cannot have both correct real-world timing and the large performance difference between locking styles seen on real hardware. On the other hand, if the behavior with locking seems reasonable, you should check the real-world time, as a simulator with a short time slice will be way slower than the real world.</p>
<p>The paper authors note a similar aspect in desktop/server x86 VMM detection. They discuss &#8220;performance cliffs&#8221; that appear when doing &#8220;unusual&#8221; things. For example, VmWare is engineered assuming minimal use of self-modifying code. Performance is much worse if you use it extensively, and this can be used to detect VmWare quite effectively. This effect is quite similar to the time slice effect in embedded virtual platforms.</p>
<p>I hope you enjoyed this fairly long rant. We have not even begun to exhaust this topic&#8230; Luckily, these discrepancies only very rarely impact the usefulness of virtual platforms, since most software, even on an embedded system, does not care about detailed timing like this. In the example above, we still see the lock contention, so we know that the lock increases execution time; we just do not get a complete picture of what that means in absolute terms. We will still find missing locks and overused locks.</p>
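<p>For reference, here is a hypothetical Python reconstruction of the kind of test program described above; the actual program is not shown in this post. Note that CPython&#8217;s global interpreter lock serializes the threads, so the timings it prints say little about real hardware, and a C version using native threads would be needed for meaningful measurements. The structure is the point: the same loop is run with no lock, with a per-thread &#8220;fake&#8221; lock, and with one shared lock:</p>

```python
import threading
import time


def run(mode, iterations=200_000):
    """Two threads increment one shared counter.

    mode: "none"   - no locking at all
          "fake"   - each thread takes its own private lock (no mutual exclusion)
          "proper" - both threads take the same lock
    Returns (elapsed seconds, final counter value).
    """
    shared_lock = threading.Lock()
    counter = [0]

    def worker():
        private_lock = threading.Lock()
        lock = shared_lock if mode == "proper" else private_lock
        for _ in range(iterations):
            if mode == "none":
                counter[0] += 1
            else:
                with lock:
                    counter[0] += 1

    threads = [threading.Thread(target=worker) for _ in range(2)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, counter[0]


if __name__ == "__main__":
    for mode in ("none", "fake", "proper"):
        elapsed, total = run(mode)
        print(f"{mode:>6}: {elapsed:.3f} s, counter = {total}")
```

<p>Only with proper locking is the final counter guaranteed to be twice the iteration count; the other two modes provide no mutual exclusion and may lose updates.</p>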
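<p>The resulting two-probe check can be written down as a small Python function. This is a sketch of the reasoning only: the ratio definitions and thresholds are illustrative assumptions, not calibrated values, and the measurements themselves (against an external time source, and from the locking benchmark) are assumed to be taken elsewhere:</p>

```python
def looks_simulated(wallclock_ratio, lock_slowdown,
                    max_clock_skew=1.5, min_lock_slowdown=3.0):
    """Hypothetical two-probe detection check.

    wallclock_ratio: elapsed time measured inside the guest divided by the
        elapsed time reported by an external (non-local) time source;
        about 1.0 on real hardware.
    lock_slowdown: run time with proper locking divided by run time with
        fake (thread-local) locking, measured inside the guest.
    The default thresholds are illustrative assumptions only.
    """
    clock_ok = 1 / max_clock_skew <= wallclock_ratio <= max_clock_skew
    locks_ok = lock_slowdown >= min_lock_slowdown
    # A fast simulator (long time slice) keeps up with real time but flattens
    # lock contention; a slow one (short time slice) gets the locks right but
    # falls behind the external clock. Only real hardware passes both probes.
    return not (clock_ok and locks_ok)


if __name__ == "__main__":
    print(looks_simulated(1.0, 10.0))  # plausible physical machine
    print(looks_simulated(1.0, 1.2))   # fast simulator, long time slice
    print(looks_simulated(0.1, 10.0))  # slow simulator, short time slice
```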
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/97/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

