<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Observations from Uppsala</title>
	<atom:link href="http://jakob.engbloms.se/feed" rel="self" type="application/rss+xml" />
	<link>http://jakob.engbloms.se</link>
	<description>Computer Technology: Simulation, Virtualization, Virtual Platforms, Embedded, Multicore and Multiprocessing (by Jakob Engblom)</description>
	<lastBuildDate>Sun, 29 Jan 2012 19:45:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<image>
    <title>Observations from Uppsala</title>
    <url>http://jakob.engbloms.se/favicon.png</url>
    <link>http://jakob.engbloms.se</link>
    <width>32</width>
    <height>32</height>
    <description>Observations from Uppsala - http://jakob.engbloms.se</description>
    </image>		<item>
		<title>Off-Topic: Ticket-to-Ride Pocket is Broken</title>
		<link>http://jakob.engbloms.se/archives/1606?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1606#comments</comments>
		<pubDate>Sun, 29 Jan 2012 19:45:28 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[board games]]></category>
		<category><![CDATA[highscore]]></category>
		<category><![CDATA[ticket to ride]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1606</guid>
		<description><![CDATA[Ticket to Ride is a nice real-world board game that is generally considered one of the best family and gateway games (and a decent game even for experienced gamers). We recently got it for our iPod Touches, and the weakness of the computer players quickly turned it from &#8220;I wonder if I can win this [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.boardgamegeek.com/boardgame/9209/ticket-to-ride"><img class="alignleft size-full wp-image-1607" style="margin: 10px 5px;" title="ttr pocket logo" src="http://jakob.engbloms.se/wp-content/uploads/2012/01/ttr-pocket-logo.png" alt="" width="60" height="60" />Ticket to Ride </a>is a nice real-world board game that is <a href="http://www.boardgamegeek.com/boardgame/9209/ticket-to-ride">generally considered one of the best family and gateway games </a>(and a decent game even for experienced gamers). We recently <a href="http://itunes.apple.com/us/app/ticket-to-ride-pocket/id471857988?mt=8">got it for our iPod Touches</a>, and the weakness of the computer players quickly turned it from &#8220;I wonder if I can win this game&#8221; into &#8220;let&#8217;s shoot for the highest score possible&#8221;.</p>
<p>Chasing high scores is fairly typical for computer games &#8211; playing against human beings you are motivated to win, even if you win by scoring a measly 75 points&#8230; while against the computer it becomes about beating your own old scores. Unfortunately, this also turns repetitive after a while, due to some small design flaws that really should be easy to fix.</p>
<p><span id="more-1606"></span>What I have found is that I believe there is an optimum high-score strategy in the game, due to the very uneven spread of tickets and a very bad computer player.</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2012/01/IMG_00961.png"><img class="aligncenter size-full wp-image-1609" title="IMG_0096" src="http://jakob.engbloms.se/wp-content/uploads/2012/01/IMG_00961.png" alt="" width="354" height="112" /></a></p>
<p>My way to play for a high score (not necessarily a highscore, though) is to essentially build the exact same network each time, and then stop and pull as many tickets as I can that fit into this network. When lucky, 15 tickets can fit inside and none miss, and the score will be 250 or more. If not lucky, some tickets will miss, and the score will be lower.</p>
<p>The reason this style of play works is that the computer player is blind to what you do and does not try to stop you from connecting up obvious points. In a real game, if you see someone starting to build a route that looks familiar, you will tend to play a blocking move. In the iPod version, you can have two 20-car segments separated by a single two-segment track&#8230; and the computer just ignores it and keeps plugging away at its own goal. Also, when playing with humans, you cannot just say &#8220;aw, I got some godawful starting tickets, let&#8217;s all start over with something better&#8221;. With just yourself and the computer, that&#8217;s very natural.</p>
<p>The critical network is this:</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2012/01/IMG_0097.png"><img title="IMG_0097" src="http://jakob.engbloms.se/wp-content/uploads/2012/01/IMG_0097.png" alt="" width="614" height="371" /></a></p>
<p>Tracks in green are definitely on the critical network, while the yellow parts depend on exactly which tickets show up. Sometimes, Boston is included, sometimes not. The red circles mark cities which are never on any tickets, and which thus are useless.</p>
<p>The connect-the-corners strategy is good even in the normal real-world Ticket to Ride &#8211; but with observant human opponents, it is very hard to pull off. Also, it seems to me that the tickets in the iPod version are different from those in the original board game, and that they are a bit less balanced.</p>
<p>What I think Days of Wonder should do is copy Angry Birds and just release some free updates containing a ton of new tickets. If there were more tickets connecting from mid-land (Helena, Salt Lake City, Las Vegas) to the unused cities on the east coast, it would be much more variable. Also, the computer opponents could well be made a bit more aggressive and more inclined to block.</p>
<p>In the iPod version of <a href="http://itunes.apple.com/us/app/carcassonne/id375295479?mt=8">Carcassonne</a>, the computer styles of play vary much more between different personalities, and some of them are very good at playing offensively and blocking you. A similar development could make Ticket to Ride for iPod a bit more exciting to play. As it is right now, it feels like the game has been conquered and won and is pretty pointless to play. The chance of grabbing a higher score is rapidly diminishing, as the likelihood of getting just the right set of tickets, with the computer getting a non-conflicting set and being fairly slow in building, feels very small. It will happen if you just play enough times &#8211; but where is the fun in that?</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1606/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Wind River Blog: Fault Injection with Simics</title>
		<link>http://jakob.engbloms.se/archives/1600?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1600#comments</comments>
		<pubDate>Mon, 23 Jan 2012 20:57:43 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[testing]]></category>
		<category><![CDATA[Wind River Blog]]></category>
		<category><![CDATA[fault injection]]></category>
		<category><![CDATA[Simics]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1600</guid>
		<description><![CDATA[There is a new post at my Wind River blog, about how you actually do fault injection in Simics. This particular post is pretty detailed, showing the actual architecture of a fault injector in Simics, not just &#8220;yes you can do it&#8221;. It includes actual diagrams of system components and how you can insert fault [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="Wind River Logo" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" />There is a <a href="http://blogs.windriver.com/tools/2012/01/making-a-faulty-serial-port.html">new post </a>at my Wind River blog, about how you actually do <a href="http://blogs.windriver.com/tools/2012/01/making-a-faulty-serial-port.html">fault injection in Simics</a>. This particular post is pretty detailed, showing the actual architecture of a fault injector in Simics, not just &#8220;yes you can do it&#8221;. It includes actual diagrams of system components and how you can insert fault injection into an existing system, so it is a bit more technical than most my Wind River blog posts that tend to be more conceptual.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1600/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Reverse History Part Three &#8211; Products</title>
		<link>http://jakob.engbloms.se/archives/1564?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1564#comments</comments>
		<pubDate>Sun, 08 Jan 2012 19:51:57 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[history of computing]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[gdb]]></category>
		<category><![CDATA[Green Hills]]></category>
		<category><![CDATA[Lauterbach]]></category>
		<category><![CDATA[Multi]]></category>
		<category><![CDATA[reverse debug]]></category>
		<category><![CDATA[reverse execution]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[TotalView]]></category>
		<category><![CDATA[UndoDB]]></category>
		<category><![CDATA[VMWare]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1564</guid>
		<description><![CDATA[In this final part of my series on the history of reverse debugging I will look at the products that launched around the mid-2000s and that finally made reverse debugging available in a commercially packaged product and not just research prototypes. Part one of this series provided a background on the technology and part two [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-icon.png"><img class="alignleft size-full wp-image-1550" title="reverse icon" src="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-icon.png" alt="" width="62" height="62" /></a>In this final part of my series on the history of reverse debugging I will look at the products that launched around the mid-2000s and that finally made reverse debugging available in a commercially packaged product and not just research prototypes. <a href="http://jakob.engbloms.se/archives/1547">Part one </a>of this series provided a background on the technology and <a href="http://jakob.engbloms.se/archives/1554">part two </a>discussed various research papers on the topic going back to the early 1970s. The first commercial product featuring reverse debugging was launched in 2003, and then there have been a steady trickle of new products up until today.</p>
<p><span id="more-1564"></span></p>
<p><strong>2003</strong>. The embedded tools company Green Hills launched their <a href="http://www.ghs.com/news/20030930_best_of_show.html">Time Machine</a> feature in their well-known MULTI debugger. I consider this the start of commercial reverse debugging, as it was the first commercial-grade product to include reverse debugging. The implementation was based on tracing the execution of a program on actual hardware, using a debug probe and a &#8220;JTAG&#8221; debug interface. The trace box would capture several gigabytes of execution data, and then the debugger performed operations based on this trace. To check a backwards breakpoint, you scan back over the trace until you find a matching state or operation (such as a memory access or an instruction address that is being executed). The main limitation of the method is that the trace buffer can only capture a few seconds of execution on a typical embedded processor running at hundreds of MHz. It only works for a single processor, and it does not capture IO actions (except as memory-mapped IO). It is system-level and cross-target.</p>
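<p>To make the backwards scan over a trace a bit more concrete, here is a minimal Python sketch. It assumes a hypothetical and much simplified trace format (one record per executed instruction, holding an instruction address and an optional memory address). It is not the Time Machine trace format or algorithm, just an illustration of the general idea of searching backwards through recorded execution.</p>

```python
from collections import namedtuple

# Hypothetical, simplified trace record: one entry per executed instruction.
# Real hardware traces are compressed and carry far more detail.
TraceEntry = namedtuple("TraceEntry", ["pc", "mem_addr"])

def reverse_continue(trace, position, break_pc=None, watch_addr=None):
    """Scan backwards from `position` and return the index of the most
    recent earlier entry that hits the breakpoint or the watchpoint.
    Returns None if the start of the trace buffer is reached first."""
    for index in reversed(range(position)):
        entry = trace[index]
        if break_pc is not None and entry.pc == break_pc:
            return index
        if watch_addr is not None and entry.mem_addr == watch_addr:
            return index
    return None

trace = [
    TraceEntry(pc=0x1000, mem_addr=None),
    TraceEntry(pc=0x1004, mem_addr=0x2000),
    TraceEntry(pc=0x1008, mem_addr=None),
    TraceEntry(pc=0x1004, mem_addr=0x2004),
    TraceEntry(pc=0x100C, mem_addr=None),
]

# Starting at the last entry, run backwards to the breakpoint at 0x1004.
hit = reverse_continue(trace, position=4, break_pc=0x1004)
print(hit)
```

<p>Returning None when the scan runs off the start of the buffer corresponds to the limitation noted above: anything older than what the trace buffer holds simply cannot be reached.</p>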
<p>Extending this kind of trace to multicore has proven hard, since it is difficult to get a synchronized trace out of several processors. There might be debug hardware coming out in the next few years that can indeed support a time-stamped consistent trace of multiple cores, and with such hardware, the Time Machine approach could well be extended into multicore.</p>
<p><strong>2005</strong>. <a href="http://www.windriver.com/products/simics/">Simics </a>3.0 was launched by Virtutech (later acquired by Wind River and Intel) with full-system reverse execution and reverse debugging. The Simics approach was also unique, being based on a full-system simulator. By simulating the entire target, it is trivial to reverse (and put reverse breakpoints on) changes to memory, persistent storage like disks, and hardware devices. Since all device models in the simulator are deterministic in their implementation, re-executing hardware events like interrupts and IO outputs is just as easy as re-executing code on the main processor, something that had eluded all previous approaches. Recording is used at the interface between the simulator and the outside world, such as user interaction over graphics displays and serial ports and connections to the real-world network. The software stack is unmodified and system-level, and the simulator can handle multiple processors and even multiple machines in a network as a unit. The use case is normally cross-target (even if a system identical to the host can be simulated, it would work like a cross target logically). Time is handled by counting clock cycles on all processors in the system, and reverse debugging can position the simulation at any point in time based on the virtual time.</p>
<p>There is a cost in execution speed from simulation rather than direct execution, and an intrusion effect from running on a simulator rather than on a physical machine. This affects the <a href="http://jakob.engbloms.se/archives/97">timing of events</a>, even with a software stack that is not modified. Still, the fact that you can run a complete real software stack with no modifications needed before starting to run the target system is fairly rare in the world of reverse debuggers.</p>
<p>Simics shipped with a modified gdb that talked gdb serial to Simics and accessed reverse execution with some new debugger commands as well as extensions to the gdb serial protocol. This was offered to the gdb community, but not accepted. However, prompted by this, the gdb community started to discuss reverse execution. Some interesting old threads can still be found, such as <a href="http://sourceware.org/ml/gdb/2005-05/msg00225.html">http://sourceware.org/ml/gdb/2005-05/msg00225.html</a>. Clearly, at that point in time Virtutech did not really explain how Simics worked, and there were some pretty bad proposals floated in the community for how to do reverse. In the end, the gdb serial design did turn out the right way, assuming the remote debugger would reverse itself and <a href="http://sourceware.org/ml/gdb/2005-05/msg00235.html">gdb would just ask it to do so</a>. This separation of concerns is important to creating practical reverse debugging solutions that can use any debugger backend.</p>
<p><strong>2005</strong>. Also in 2005, Lauterbach launched the <a href="http://www.lauterbach.com/cts.html">Context Tracking System, CTS</a>. Lauterbach is a big player in the embedded debug market, with their TRACE32 debugger. CTS can be seen as their reply to the Time Machine debugger. CTS is also based on a trace from a hardware unit or from an instruction-set simulator. However, from the available information it also appears to be more limited &#8211; you can step back and go back in time and replay forward, but there is no mention of actual backwards breakpoints (even today, six years later). Thus, I count this as record-replay rather than reverse debug. It is cross-target, system-level, and uniprocessor like Time Machine.</p>
<p><strong>2006</strong>. <a href="http://undo-software.com/undodb_about.html">Undo Software </a>launched the first Linux-targeting host-based reverse debugger, <a href="http://undo-software.com/pressrelease-1.html">UndoDB</a>. It is described as a <em>bidirectional</em> debugger (the same terminology as the Boothe 2000 PLDI paper). It is user-level and does support reverse breakpoints (and data breakpoints, also known as watchpoints, which is really useful). It handles multiple threads (at least in 3.0), but from the description of the recording technology used I believe they have to serialize their execution. The implementation is based on checkpoint and re-execution, with recording of all non-deterministic events like IO. There is a feature to move to a certain point in time, based on &#8220;simulated nanoseconds&#8221;. These are not really nanoseconds, but values which are guaranteed to increase even between two instructions (which probably means that they are sub-nanosecond units, since on a &gt; 1 GHz CPU single-cycle instructions will indeed take less than one nanosecond).</p>
<p>There is a nice description of how it works on their <a href="http://www.undo-software.com/undodb-gdb.1.html">online man page</a>. It is worth noting that they call it &#8220;gdb&#8221;, but the command set is distinct from what gdb introduced with its reverse execution in 2009. They use the &#8220;b&#8221; prefix for backwards commands rather than &#8220;r&#8221; for reverse. In some ways, UndoDB is in direct competition with the gdb reverse target, but it is much faster and has more features.</p>
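<p>The checkpoint and re-execution scheme mentioned above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how UndoDB is implemented: the ToyMachine and Recorder classes, the checkpoint interval, and the use of a step counter as the time base are all made up for illustration. The point is only that going back in time becomes restoring an earlier checkpoint and then running forward again while feeding in the recorded non-deterministic inputs.</p>

```python
import bisect
import copy
import random

class ToyMachine:
    """A tiny deterministic 'program': each step depends only on the
    current state and on an input value supplied from outside."""
    def __init__(self):
        self.state = {"steps": 0, "total": 0}

    def step(self, input_value):
        self.state["total"] += input_value
        self.state["steps"] += 1

class Recorder:
    """Hypothetical recorder: periodic checkpoints plus an input log."""
    def __init__(self, machine, checkpoint_interval=4):
        self.machine = machine
        self.interval = checkpoint_interval
        self.input_log = []      # recorded non-deterministic inputs
        self.checkpoints = {}    # step number mapped to a saved state copy

    def run_forward(self, steps, input_source):
        for _ in range(steps):
            now = self.machine.state["steps"]
            if now % self.interval == 0:
                self.checkpoints[now] = copy.deepcopy(self.machine.state)
            value = input_source()          # non-deterministic event
            self.input_log.append(value)    # record it for later replay
            self.machine.step(value)

    def go_to(self, target_step):
        """Restore the latest checkpoint at or before the target step,
        then re-execute forward using the recorded inputs."""
        keys = sorted(self.checkpoints)
        base = keys[bisect.bisect_right(keys, target_step) - 1]
        self.machine.state = copy.deepcopy(self.checkpoints[base])
        for step in range(base, target_step):
            self.machine.step(self.input_log[step])

rng = random.Random()
recorder = Recorder(ToyMachine())
recorder.run_forward(10, lambda: rng.randint(0, 9))
final_total = recorder.machine.state["total"]

recorder.go_to(6)    # travel back to step 6
recorder.go_to(10)   # replay forward again to the end of the recording
print(recorder.machine.state["total"] == final_total)
```

<p>A reverse breakpoint in such a scheme would be found by re-executing from a checkpoint and remembering the last point where the breakpoint condition held. The sketch leaves that out, and also ignores what should happen to the log if execution continues forward from an earlier point.</p>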
<p><strong>2008</strong>. The TotalView debugger from Rogue Wave (at the time, it was an independent company) gained support for reverse debugging, with the <a href="http://www.roguewave.com/products/totalview-family/replayengine.aspx">ReplayEngine </a>add-on. TotalView is an old mainstay in the HPC market, having been around since <a href="https://computing.llnl.gov/tutorials/totalview/#Overview">at least 1993</a>. Indeed, it was developed initially for the <a href="http://en.wikipedia.org/wiki/BBN_Butterfly">BBN Butterfly computer</a>, and thus it might have had some contact with reverse execution as far back as the 1987 paper cited in my <a href="http://jakob.engbloms.se/archives/1554">previous blog post</a>.</p>
<p>Judging from <a href="http://www.roguewave.com/documents.aspx?Command=Core_Download&amp;EntryId=739">the available materials</a>, TotalView can clearly step back in various ways. However, it is not clear that it triggers breakpoints when going backwards. Thus, it has to count as record-replay debugging rather than reverse debugging. The base of the implementation is extensive instrumentation of the runtime system of the target computer. The implementation builds on the fact that the target programs tend to be clustered programs that use MPI to communicate &#8211; and thus a large part of the communication between threads is explicit and easily intercepted and recorded. There is also an existing infrastructure of checkpoint and restart for parallel programs using MPI to support fault tolerance that was used as the base of the implementation. Finally, in a slightly ugly hack, they make each multi-threaded program run on a single processor by means of a big lock. In this way, all that needs to be replayed is the interleaving of threads on a single processor, a far more tractable problem compared to trying to replicate a true parallel execution in a new session.</p>
<p><strong>2008</strong>. VMware officially launched a record-replay debugger based on their virtual machine technology with <a href="http://www.replaydebugging.com/2008/08/vmware-workstation-65-reverse-and.html">VMware Workstation 6.5</a>. It is single-processor, system-level (but really only supported for user-level debugging), and cross-target (since the VM is not exactly the same hardware as the host); the time model is based on the virtual machine, which I believe is cycle-based. It was mostly used for record-replay debug of non-deterministic software bugs, but could also do reverse debugging including reverse data breakpoints. It was based on snapshot and deterministic re-execution, plus recording of all non-deterministic device accesses (not all devices in the VMware hardware emulation layer are deterministic). Going back to a snapshot was a very heavy operation (I tried it) since you had to restore the entire target memory (which quickly got into gigabytes). The hardware supported in the VM was quite limited, and things like CD-ROMs and floppies could not be part of a record/replay session. Replay logs could be moved between hosts.</p>
<p>The VMware reverse debug functionality was removed from VMware Workstation version 8 in 2011, since it required a large investment and apparently was not used by very many VMware users. This indicates that trying to build developer-oriented functionality into a technology base that is fundamentally driven by the needs of deployed virtual machines was hard. There are contradictions between these two goals, as the determinism and control needed for a good reverse debugger is not necessarily consistent with maximum performance for running virtual machines in a production setting.</p>
<p><strong>2009</strong>. gdb 7.0 added support for reverse execution (work that began in 2006). The built-in &#8220;record&#8221; target supports reverse debugging on user-level single-threaded programs on the same host. The command set for reverse debugging is fairly full-featured, but is a bit quirky with a &#8220;<a href="http://sourceware.org/gdb/news/reversible.html">set exec-direction</a>&#8221; command that makes regular run-control commands work in reverse. The record technology is quite slow since it basically records the effect of each and every instruction run in the program.</p>
<p>In addition to its built-in target, gdb can also control external reversible debug systems over the gdb serial protocol. This made the changes to gdb-serial created by Virtutech for Simics in 2005 part of the mainline gdb release. <a href="http://sourceware.org/gdb/news/reversible.html">Several tools support the command set</a>, including VMware, UndoDB, and Simics. There was also a set of MI commands added to basically let Eclipse use gdb as a backend for reverse debug, including using it to control external tools via gdb-serial. How this happened is quite a long story, and I made a small contribution to the gdb code base myself in the process. Read about this <a href="http://jakob.engbloms.se/archives/1065">here</a>.</p>
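<p>The idea of recording the effect of every instruction, as the built-in gdb record target is described as doing above, can be illustrated with a small undo log. The Python sketch below is a toy register machine with made-up names (RecordingMachine, execute, reverse_step); it does not reflect gdb's actual data structures. It does show why the approach is simple but slow: every single instruction pays the cost of saving the old values it is about to overwrite.</p>

```python
class RecordingMachine:
    """Toy register machine with a per-instruction undo log.
    Hypothetical illustration only, not gdb internals."""
    def __init__(self):
        self.regs = {"pc": 0, "r0": 0, "r1": 0}
        self.undo_log = []   # one entry per executed instruction

    def execute(self, dest, value):
        # Save the old values of everything this instruction changes.
        self.undo_log.append({"pc": self.regs["pc"], dest: self.regs[dest]})
        self.regs[dest] = value
        self.regs["pc"] += 1

    def reverse_step(self):
        # Undo the most recent instruction by restoring the saved values.
        if not self.undo_log:
            return False     # nothing recorded, cannot go further back
        self.regs.update(self.undo_log.pop())
        return True

machine = RecordingMachine()
machine.execute("r0", 5)
machine.execute("r1", 7)
machine.execute("r0", 9)
machine.reverse_step()       # undoes the write of 9 to r0
print(machine.regs)
```

<p>After the reverse step, r0 holds 5 again and the program counter is back at 2, since only the last instruction was undone.</p>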
<p><strong>2009</strong>. Eclipse CDT added support for <a href="http://www.eclipse.org/community/training/webinars/090526_CDT_Webinar.pdf">reverse execution</a>, using gdb 7.0 reverse as the initial backend. As noted above, this lets Eclipse also use other reverse debugging backends (Eclipse uses the gdb-MI interface to gdb to control the debug session). This is noteworthy since it meant that the buttons to control reverse execution are now part of the CDT, making it much easier to use Eclipse to build a frontend to any reversible backend. Eclipse is not really a debugger, just an interface to a debugger.</p>
<p><strong>2009</strong>. Microsoft Visual Studio <a href="http://blogs.msdn.com/b/ianhu/archive/2009/05/13/historical-debugging-in-visual-studio-team-system-2010.aspx">got record-replay debugging with IntelliTrace</a>. It is strictly about replay debugging, including the nice ability to send traces around between developers. There are no backwards breakpoints. The support is limited to programs running on top of the .NET runtime system, meaning that <a href="http://msdn.microsoft.com/en-us/library/dd264915.aspx">it does not apply to classic Windows software</a>. Using the CLR virtual machine as the implementation basis should make the implementation easier, cleaner, and faster compared to a machine-level native solution. It is user-level, single-threaded, and host-based. Time concept is unknown.</p>
<p><strong>2011</strong>. Adobe demonstrated (not launched) reverse debugging in their Flash Builder programming environment. A <a href="http://tv.adobe.com/watch/max-2011-sneak-peeks/max-2011-sneak-peek-reverse-debugging-in-flash-builder/">nice video is posted on the Adobe website</a>. It seems to be based on the virtual machine that Flash runs on, and includes what looks like pretty powerful backwards data analysis tools. In a <a href="http://anirudhsasikumar.net/blog/2011.12.15.html">blog post</a>, the developer describes some of the features, which to me seem to indicate some pretty heavy recording.</p>
<p><strong>Final notes.</strong> In researching these commercial tools, there also seems to be a lost one. A company called Visicomp launched a Java debugger called RetroVue in 2002 which supposedly did allow backwards debugging in some way. However, it seems that this tool was not really practical, being too slow for actual use. It seems to have disappeared since, without anyone picking up its legacy. The technology was apparently pretty much like the Omniscient Debugger that was presented in 2003 and that I described in the <a href="http://jakob.engbloms.se/archives/1554">blog post on reverse execution research</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1564/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Reverse History Part Two &#8211; Research</title>
		<link>http://jakob.engbloms.se/archives/1554?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1554#comments</comments>
		<pubDate>Sun, 08 Jan 2012 19:42:59 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[history of computing]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Bil Lewis]]></category>
		<category><![CDATA[Bryce Cogswell]]></category>
		<category><![CDATA[Channing Brown]]></category>
		<category><![CDATA[Daniel Jacobowitz]]></category>
		<category><![CDATA[George Dunlap]]></category>
		<category><![CDATA[John Mellor-Crummey]]></category>
		<category><![CDATA[Mark Russinovich]]></category>
		<category><![CDATA[Marvin Zelkowitz]]></category>
		<category><![CDATA[Mireille Ducasse]]></category>
		<category><![CDATA[Murasa Bazrai]]></category>
		<category><![CDATA[omniscient debugger]]></category>
		<category><![CDATA[Paul Brook]]></category>
		<category><![CDATA[Peter Chen]]></category>
		<category><![CDATA[qemu]]></category>
		<category><![CDATA[reverse debugging]]></category>
		<category><![CDATA[reverse execution]]></category>
		<category><![CDATA[ReVirt]]></category>
		<category><![CDATA[Samuel King]]></category>
		<category><![CDATA[Stuart Feldman]]></category>
		<category><![CDATA[Sukru Cinar]]></category>
		<category><![CDATA[Tankgut Akgul]]></category>
		<category><![CDATA[Thomas LeBlanc]]></category>
		<category><![CDATA[TTVM]]></category>
		<category><![CDATA[Vincent Mooney]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1554</guid>
		<description><![CDATA[This is the second post in my series on the history of reverse execution, covering various early research papers. It is clear that reverse debugging has been considered a good idea for a very long time. Sadly though, not a practical one (at the time). The idea is too obvious to be considered new. Here [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-icon.png"><img class="alignleft size-full wp-image-1550" title="reverse icon" src="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-icon.png" alt="" width="62" height="62" /></a>This is the second post in my series on the history of reverse execution, covering various early research papers. It is clear that reverse debugging has been considered a good idea for a very long time. Sadly though, not a practical one (at the time). The idea is too obvious to be considered new. Here are some papers that I have found dating from the time before &#8220;practical&#8221; reverse debug which for me starts in 2003 (as well as a couple of later entrants).</p>
<p><span id="more-1554"></span></p>
<p>When searching through the literature using the ACM Digital Library (and some Yahoo and Google web searches), I find quite a large body of research literature dating to the late 2000s. Apparently, the appearance of commercial reverse debuggers has inspired research into the topic &#8211; and our increasingly powerful machines have made heavier solutions feasible (at least for single programs).</p>
<p>Disclaimer &#8211; this is not really an exhaustive survey of the field, but some of the more interesting papers that I have stumbled on.</p>
<p><strong>1973</strong>. The first actual mention of reverse debugging I have found is a short 1973 Communications of the ACM article called &#8220;<a href="http://dx.doi.org/10.1145/362342.362360">Reversible Execution</a>&#8221; by <a href="http://www.cs.umd.edu/~mvz/">Marvin Zelkowitz</a>. Basically, this paper proposed to apply a backtracking facility built into a PL/I compiler to support AI research (a precursor to Prolog execution semantics from what it sounds like) to also replay parts of a normal program for debugging. This in essence would enable replay of a program execution, based on a log of how the program ran and the values of variables changed. Basically, a forward-only record-replay solution for user-level single-threaded programs.</p>
<p><strong>1987</strong>. Thomas LeBlanc and John Mellor-Crummey, &#8220;<a href="http://dx.doi.org/10.1109/TC.1987.1676929">Debugging Parallel Programs with Instant Replay</a>&#8221;, IEEE Transactions on Computers. This paper covers record-replay debugging of <em>parallel</em> user-level programs on the <a href="http://en.wikipedia.org/wiki/BBN_Butterfly">BBN Butterfly</a>. They record all interaction between processes (threads) by instrumenting the OS calls that are used to communicate and synchronize between processes. Programs also sometimes had to be changed so that their communication protocols allowed the instrumentation to work properly. Thus, the approach did not apply to unchanged code either at source or binary level. Processes still compute their results; only the order of interactions is enforced. In this way, the overhead was kept usefully low. The approach does not work well for programs that deal with asynchronous interrupts and hardware interaction by the software, since there is no good way to hook and replay such events in their implementation.</p>
<p>The most important concept here is the use of re-execution of code to reconstruct values of a program, assuming deterministic computation as long as asynchronous interactions can be controlled. This is the basis of all reverse execution approaches based on checkpointing and re-execution. Not sure if they invented the idea, but this is the earliest distinct mention I have seen of this.</p>
<p><strong>1988</strong>. Stuart Feldman and Channing Brown, &#8220;<a href="http://dl.acm.org/citation.cfm?doid=69215.69226">IGOR: a system for program debug via reversible execution</a>&#8221;, 1988 ACM SIGPLAN and SIGOPS workshop on Parallel and distributed debugging (PADD). This paper looks at the problem of how to debug user-level single-threaded code on a UNIX-like system. The programs need to be recompiled to add a small amount of support, in particular to help with restarting execution. It is essentially a record-replay approach, but capable of placing the execution of a program at any point in its past execution. They save regular checkpoints during program execution (every tenth of a second or so seemed appropriate in their testing). The checkpoints only include the data that has been changed since the previous checkpoint (they added an OS facility to mark all pages that a program writes between checkpoints). A unique feature is the ability to look at the value of a variable as it is stored in successive checkpoints, and to &#8220;go to the point in time where variable X has value V&#8221; (assuming a variable is monotonically increasing or decreasing, like an outermost loop counter). To do the detailed positioning, they use an interpreter for the target system, essentially mixing native and interpreted execution in their debugger.</p>
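<p>The &#8220;go to the point in time where variable X has value V&#8221; idea can be illustrated with a small sketch. This is not IGOR itself: the toy program, the checkpoint interval, and the names <code>run</code> and <code>goto_value</code> are all assumptions made for the example. It only shows why a monotonic variable allows a binary search over the checkpoints, followed by fine-grained stepping from the chosen checkpoint.</p>

```python
from bisect import bisect_right

def run(state, steps):
    """Toy deterministic program (hypothetical): advance `state` by `steps` steps."""
    for _ in range(steps):
        state["i"] += 1              # monotonically increasing loop counter
        state["acc"] += state["i"]   # some other state we may want to inspect
    return state

# Record phase: save a checkpoint every INTERVAL steps of the run.
INTERVAL = 100
checkpoints = []                     # list of (step_number, saved_state)
state = {"i": 0, "acc": 0}
for step in range(0, 1000, INTERVAL):
    checkpoints.append((step, dict(state)))
    run(state, INTERVAL)

def goto_value(target):
    """Return (step, state) at the first point where state['i'] == target."""
    # Binary search over the checkpointed values of i.
    # This is only valid because i is monotonically increasing.
    values = [saved["i"] for _, saved in checkpoints]
    idx = bisect_right(values, target) - 1
    if idx == -1:
        raise ValueError("target lies before the first checkpoint")
    step, saved = checkpoints[idx]
    state = dict(saved)
    # Fine positioning: single-step forward from the chosen checkpoint.
    while state["i"] != target:
        run(state, 1)
        step += 1
    return step, state
```

<p>In this sketch, <code>goto_value(250)</code> restores the checkpoint taken at step 200 and single-steps 50 times. The single-stepping loop plays the role that the interpreter plays in the paper.</p>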
<p><strong>1996</strong>. Mark Russinovich (who I believe is the man behind the recent novel &#8220;<a href="http://www.zerodaythebook.com/">Zero Day</a>&#8221;) and Bryce Cogswell, &#8220;<a href="http://dl.acm.org/citation.cfm?doid=231379.231432">Replay for Concurrent Non-Deterministic Shared-Memory Applications</a>&#8221;, PLDI 1996. In this paper, instrumented system libraries and binary code modification are used to enable deterministic replay of multithreaded applications. The binary modification is used to implement an instruction counter to provide a logical time-base, and instrumented system libraries are used to force a serialized (but interleaved) execution of the program. It does not run the program in parallel on a multiprocessor, but rather runs it on a single processor. User-level, instrumented and intrusive, parallel on single processor.</p>
<p><strong>2000</strong>. Bob Boothe, &#8220;<a href="http://dx.doi.org/10.1145/349299.349339">Efficient Algorithms for Bidirectional Debugging</a>&#8221;, PLDI 2000. I actually saw this paper being presented live at the PLDI conference in Vancouver. At the time, I was working on embedded compilers and debuggers, and our little group from <a href="http://www.iar.se">IAR</a> looked at what was being presented and said &#8220;yeah, might work on a big desktop, but you can&#8217;t do that in a real machine&#8221;. Why was that the case? Essentially, what Boothe did was to use the Unix fork call to spawn off checkpoints of a program as it was executing. This was a smart way to achieve checkpointing using existing OS facilities. In addition, he recorded the results of system calls to replay them during replay.</p>
<p>Time was given by counters in the target program (compiled into it as part of the debugging build). Most of the paper was spent on basic algorithms for how to do &#8220;previous&#8221; (step back 1 line), &#8220;before&#8221; (inverse of finish), &#8220;buntil&#8221; (backwards until, looking at data conditionals to determine when to stop). His use of &#8220;backwards&#8221; rather than &#8220;reverse&#8221; as the name of the concept does make this paper a bit harder to find. While limited to a single threaded user-level process, this paper is probably the first to introduce reverse debugging in the sense of the primary user interface being the ability to step backwards and check for breakpoints backwards in time. A good idea in the UI design was the <em>undo</em> command that jumped the debugger back to the last position where you stopped. Having worked with reverse debugging quite a lot myself, this operation makes a lot of sense as an addition to standard run-control commands.</p>
<p><strong>2002</strong>. Tankgut Akgul and Vincent Mooney, &#8220;<a href="http://dl.acm.org/citation.cfm?doid=634636.586101">Instruction-level Reverse Execution for Debugging</a>&#8220;, at PASTE 2002. This paper is an odd one, as it was neither really reverse nor practical. The approach is based on doing a static analysis of the assembly code of a program to generate code that could reverse operations based on some logging of destructive operations. Interesting, but not really practical as I see it. One noteworthy tidbit in the paper is the measurement that logging only destructive operations is some 50 x more efficient than logging all changes.</p>
<p><strong>2002</strong>. George Dunlap, Samuel King, Sukru Cinar, Murasa Bazrai, and Peter Chen, &#8220;<a href="http://dx.doi.org/10.1145/844128.844148">ReVirt: enabling intrusion analysis through virtual-machine logging and replay</a>&#8220;, at OSDI 2002. This paper uses the User-Mode Linux (UML) paravirtual machine to checkpoint and replay an operating system. I think this is the first use of a true virtual machine to achieve reverse execution, even though it is a paravirtual machine rather than a full virtual machine. The replay starts from a single checkpoint at the start of execution of the guest system, so moving to a certain point in time can be very time-consuming. It does support reverse breakpoints. To move to a certain point in time, they count executed instructions, not actual time. The use of a paravirtual solution precludes the use of many real device drivers and debugging issues related to interaction with most real devices. It is uniprocessor, system-level, unmodified target (except the change to make the OS itself run paravirtualized on top of UML).</p>
<p><strong>2003</strong>. Bil Lewis and Mireille Ducasse, &#8220;<a href="http://dx.doi.org/10.1145/949344.949367">Using events to debug Java programs backwards in time</a>&#8220;, OOPSLA 2003. This paper targets programs running on top of a Java Virtual Machine, resulting in an <em>Omniscient Debugger</em>. They instrument the target program to log all state changes. A big lock is used to serialize all threads into a single interleaved execution. The implementation is admitted to be almost worst-case inefficient, but the debugger was still considered useful for some real work. The debug UI does feature backwards breakpoints, so it qualifies as actual reverse debugging.</p>
<p>Note that in principle, backwards debugging and record-replay should be possible to implement quite efficiently for a language virtual machine like the JVM, if you can modify the VM itself. This has indeed been done today, for example by Microsoft for their CLR and in the use of full-system simulators (and other simulators) to implement reverse debugging.</p>
<p><strong>2005</strong>. Samuel King, George Dunlap, and Peter Chen, &#8220;<a href="http://www.usenix.org/events/usenix05/tech/general/king.html">Debugging Operating Systems with Time-Traveling Virtual Machines</a>&#8221;, USENIX 2005. This paper builds on the foundation of ReVirt, but adds checkpoints during a run to make the replay to a certain point in time more efficient. They also introduce the differential storing of disk state in the checkpointing (just as is done in Simics). Overall, it is very similar to the replay approach taken by VMware a few years later. Still single-processor. The paper contains a nice list of example bugs for which reverse debugging works much better than cyclic debugging. They also reference Simics as something that might do something with reverse, but do not quite know what (this is right after the launch of Simics reverse execution as discussed in the next post).</p>
<p><strong>2007</strong>. Paul Brook and Daniel Jacobowitz, &#8220;<a href="http://www.linuxsymposium.org/archives/GCC/Reprints-2007/jacobowitz-reprint.pdf">Reversible Debugging</a>&#8220;, GCC Developer&#8217;s Summit 2007. This paper presented some early work on implementing reverse debugging inside the Qemu full-system simulator. Logically, it follows in the steps of the Simics reversible debugger announced in 2005 (see the <a href="http://jakob.engbloms.se/archives/1564">third post </a>in this series of blogs). They use gdb as their frontend, and the paper introduced the idea of setting the default execution direction of the debugger. This UI choice was later adopted in the reversible gdb 7.0 released in 2009. The paper contains a good discussion of some of the practical issues involved in performing operations like reverse-step to go back a line, or stepping back up through a function call. It is a good explanation of the basic problems.</p>
<p>The actual reversing is implemented in Qemu using a checkpoint and deterministic replay, along with logging of any operation that is not deterministic. Their main test case is actually using the semihosting variant of Qemu where IO calls are intercepted and passed on to the host. Just like all other such solutions, they record the return values and do not repeat side-effects. It is system-level, single processor, unmodified target software, and time is based on counting executed instructions.</p>
<p>The Qemu architecture makes enforcing reversibility across all devices fairly difficult, since each device manages its own interaction with the user. One interesting tidbit in the paper is that Qemu by default triggers timer interrupts based on host time, but these had to be made fully virtual to be reversible.</p>
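<p>The common pattern of logging the return values of non-deterministic operations and feeding them back during replay, without repeating the side effects, can be sketched as follows. This is a minimal illustration and not code from Qemu or any of the papers above; the <code>Recorder</code>, <code>Replayer</code>, and <code>program</code> names are hypothetical, and a random number generator stands in for an IO call.</p>

```python
import random

class Recorder:
    """Runs a non-deterministic call for real and logs its return value."""
    def __init__(self):
        self.log = []

    def call(self, func):
        value = func()
        self.log.append(value)
        return value

class Replayer:
    """Returns the logged values in order instead of calling the real function."""
    def __init__(self, log):
        self.log = list(log)
        self.pos = 0

    def call(self, func):
        value = self.log[self.pos]   # func is never invoked: no side effect is repeated
        self.pos += 1
        return value

def program(env):
    """Deterministic except for the values obtained through env.call()."""
    total = 0
    for _ in range(5):
        # Stand-in for an IO call whose result depends on the outside world.
        total = total * 31 + env.call(lambda: random.randint(0, 9))
    return total

recorder = Recorder()
original = program(recorder)                  # recording run
replayed = program(Replayer(recorder.log))    # replay run, driven by the log
```

<p>Since everything outside <code>env.call()</code> is deterministic, <code>replayed</code> equals <code>original</code> no matter what the random generator would return the second time around.</p>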
<p><strong>Summary</strong> The above is a small selection from all the papers published on this topic, but I think they capture the most common methods used. Basically, almost all of the approaches require some kind of change to the target software to enable reversibility. Interestingly, the first few commercial solutions for reverse debug that appeared did not take that approach, but rather targeted unmodified programs using a different set of fundamental techniques. See the next blog post for that part of reverse history.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1554/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Reverse History Part One</title>
		<link>http://jakob.engbloms.se/archives/1547?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1547#comments</comments>
		<pubDate>Sun, 08 Jan 2012 18:40:20 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[history of computing]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Checkpointing]]></category>
		<category><![CDATA[record-replay]]></category>
		<category><![CDATA[replay]]></category>
		<category><![CDATA[reverse debugging]]></category>
		<category><![CDATA[reverse execution]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1547</guid>
		<description><![CDATA[For some reason, when I think of reverse execution and debugging, the sound track that goes through my head is a UK novelty hit from 1987, &#8220;Star Trekkin&#8221; by the Firm. It contains the memorable line &#8220;we&#8217;re only going forward &#8217;cause we can&#8217;t find reverse&#8220;. To me, that sums up the history of reverse debugging [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-icon.png"><img class="alignleft size-full wp-image-1550" title="reverse icon" src="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-icon.png" alt="" width="62" height="62" /></a>For some reason, when I think of reverse execution and debugging, the sound track that goes through my head is a UK novelty hit from 1987, &#8220;<a href="http://en.wikipedia.org/wiki/Star_Trekkin%27">Star Trekkin</a>&#8221; by the Firm. It contains the memorable line &#8220;<a href="http://www.youtube.com/watch?v=FCARADb9asE">we&#8217;re only going forward &#8217;cause we can&#8217;t find reverse</a>&#8220;. To me, that sums up the history of <em>reverse debugging</em> nicely. The only reason we are not all using it every day is that practical reverse debugging has not been available until quite recently.  However, in the past ten years, I think we can say that software development has indeed found reverse.  It took a while to get there, however. This is the first of a series of blog posts that will try to cover some of the history of reverse debugging. The text turned out to be so long that I had to break it up to make each post usefully short. <a href="http://jakob.engbloms.se/archives/1554">Part two </a>is about research, and <a href="http://jakob.engbloms.se/archives/1564">part three </a>about products.<br />
<span id="more-1547"></span></p>
<p>Let&#8217;s start with background and definitions.</p>
<p>To me, the key defining factor of <strong>reverse debugging</strong> is the ability to <em>apply breakpoints in reverse</em> &#8211; essentially, to be able to go to the previous occurrence of a breakpoint just as well as going to the next. The implementation allowing this might vary.</p>
<p>The contrast to reverse debugging is classic <strong>cyclic debugging</strong> where you run and rerun a program with a problem, looking at variables and setting breakpoints during each run to zoom in on an issue. For cyclic debugging to work, you pretty much require each run to behave the same way. The program has to be fundamentally deterministic. This is usually the case for non-interactive single-threaded programs, but not the case for real-time programs, parallel programs, or programs that involve some kind of asynchronous input/output (reading and writing files is a special case of IO that can indeed be deterministic since there is usually no interference from the environment in those operations).</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-21.png"><img class="aligncenter size-full wp-image-1589" title="reverse 2" src="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-21.png" alt="" width="376" height="470" /></a></p>
<p>Thus, for non-deterministic programs and rare errors, reverse debug is what you want. Run once, hit the error, reverse to diagnose it.</p>
<p>In addition to <strong>reverse debug</strong> and <strong>cyclic debug</strong>, we also have <strong>record-replay</strong> debug. In such a system, you record an execution and later replay it forward. Debugging is strictly forward: you cannot step back in time instruction by instruction, nor trigger breakpoints backwards in time. Record-replay debug is a way to allow cyclic debugging on non-deterministic, highly variable program runs. This can be implemented with a lot less debugger complexity than a true reverse debugger, even if the underlying runtime system is almost identical to that of a reverse debugger.</p>
<p>So, how do you implement reverse debugging?</p>
<p><strong>Reverse execution</strong> is one way to implement reverse debug. In reverse execution, the execution system has the ability to move backwards in time and put the entire system in the state it was in at some previous point in time. You can step the system back instruction by instruction or cycle by cycle. You can also continue the execution forward from any point in time, throwing away the history of asynchronous inputs to actually take a new execution path.</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-1.png"><img class="aligncenter size-full wp-image-1548" title="reverse 1" src="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-1.png" alt="" width="369" height="186" /></a></p>
<p>Typically, reverse execution is implemented by using checkpoints of previous system states plus a deterministic way to re-execute the system from those checkpoints. The key technical problems to be solved are how to checkpoint the system and how to make re-execution deterministic (note that <a href="http://blogs.windriver.com/engblom/2010/09/deterministic-but-unpredictable.html">determinism does not imply invariant or predictable behavior</a>, just that you can reliably recreate a particular execution). You also need to record and replay anything that you cannot re-execute, such as user interactions or network communications with the world outside the controlled system.</p>
<p>Schematically, it works like this. Note how simulation time always moves forward even though the logical system time sometimes moves backwards:</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-31.png"><img class="aligncenter size-full wp-image-1586" title="reverse 3" src="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-31.png" alt="" width="456" height="342" /></a></p>
<p>Record-replay debug is usually implemented in a way similar to this, without the ability to go back in time and usually without the intermediate checkpoints.</p>
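<p>To make the checkpoint-and-re-execute scheme concrete, here is a minimal sketch of a reverse breakpoint built on top of it. Everything in it is an assumption for illustration: the toy <code>Machine</code>, the checkpoint interval, and the <code>ReverseDebugger</code> class are hypothetical and do not correspond to any particular product. Going back in time is done by restoring the latest checkpoint at or before the target time and running forward again, which relies on the machine being deterministic.</p>

```python
import copy

class Machine:
    """Toy deterministic machine (hypothetical): a program counter and one register."""
    PROGRAM_LENGTH = 7

    def __init__(self):
        self.time = 0     # number of executed steps, used as logical time
        self.pc = 0
        self.reg = 0

    def step(self):
        self.reg += self.pc
        self.pc = (self.pc + 1) % self.PROGRAM_LENGTH
        self.time += 1

class ReverseDebugger:
    def __init__(self, machine, interval=10):
        self.machine = machine
        self.interval = interval
        self.checkpoints = {0: copy.deepcopy(machine)}

    def run_forward(self, steps):
        for _ in range(steps):
            self.machine.step()
            if self.machine.time % self.interval == 0:
                self.checkpoints[self.machine.time] = copy.deepcopy(self.machine)

    def goto(self, time):
        """Restore the latest checkpoint at or before `time`, then re-execute."""
        base = max(t for t in self.checkpoints if time >= t)
        self.machine = copy.deepcopy(self.checkpoints[base])
        while time > self.machine.time:
            self.machine.step()

    def reverse_continue(self, breakpoint_pc):
        """Move to the most recent earlier time at which pc == breakpoint_pc."""
        start = end = self.machine.time
        for base in sorted(self.checkpoints, reverse=True):
            if base >= end:
                continue
            # Replay one checkpoint interval forward, remembering the last hit.
            self.goto(base)
            last_hit = None
            while end > self.machine.time:
                if self.machine.pc == breakpoint_pc:
                    last_hit = self.machine.time
                self.machine.step()
            if last_hit is not None:
                self.goto(last_hit)
                return last_hit
            end = base    # no hit here, search the previous interval
        self.goto(start)  # no earlier hit at all: stay where we were
        return None
```

<p>After <code>run_forward(25)</code>, calling <code>reverse_continue(3)</code> in this sketch only ever executes the machine forward, yet the debugger ends up at an earlier logical time. That is the same effect as in the diagram above.</p>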
<p><strong>Trace-based</strong> debugging is based on recording everything a system does into a trace, and once the recording is done, the debugger works on the trace rather than using the log to drive an actual system. Reverse debug is implemented by recreating the state of a system by reading the trace, and finding points in the trace where breakpoint conditions are true. This is also known as <em>post-mortem</em> <em>debug,</em> since you debug after the target system has finished executing (typically). A log can be captured by a hardware device, or by software being instrumented to log everything that is going on. The technology can also be used to implement record-replay debugging.</p>
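<p>A trace-based debugger can be sketched in a few lines, assuming a trace format of <code>(time, variable, new value)</code> tuples. The format and the function names below are invented for this example; real traces from hardware or instrumented software look quite different. The point is that both state reconstruction and backwards breakpoints are just queries over the recorded data, with no live system involved.</p>

```python
def record_trace(n):
    """Run a toy program and log every variable write as (time, name, new_value)."""
    trace = []
    a, b = 0, 1
    for t in range(n):
        a, b = b, a + b                      # one Fibonacci step
        trace.append((2 * t, "a", a))
        trace.append((2 * t + 1, "b", b))
    return trace

def state_at(trace, time):
    """Rebuild the variable values as of `time` by replaying writes from the trace."""
    state = {"a": 0, "b": 1}                 # known initial values
    for t, name, value in trace:
        if t > time:
            break
        state[name] = value
    return state

def reverse_find(trace, before, name, predicate):
    """Time of the most recent write to `name` earlier than `before`
    whose value satisfies `predicate`, or None if there is none."""
    for t, n, value in reversed(trace):
        if before > t and n == name and predicate(value):
            return t
    return None
```

<p>Here <code>reverse_find</code> acts as a backwards data breakpoint: called repeatedly with the previously returned time as <code>before</code>, it walks back through the hits one by one.</p>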
<p>There are some other important distinctions between different approaches to reverse debugging that we need to keep in mind as we review the history of the field.</p>
<ul>
<li><strong>System or user-level</strong>? Do you debug a system, including the OS, or just a user-level application running on top of the operating system? User-level debug can often be solved in simpler ways than system-level debug, but also suffers from some limitations. There are also times where system-level is simpler.</li>
<li><strong>Single-threaded or multi-threaded</strong>? Can you debug multiple threads, processes, processors, or just a single processor or user-level thread? A single thread of control greatly simplifies the problem.</li>
<li><strong>Cross-target or host-based</strong>? Do you debug programs running on the same host  as the debugger, or can it also target remote targets or cross targets (such as embedded systems)?</li>
<li><strong>Instrumented or unchanged programs</strong>? Quite a few solutions for reverse debug tried over the years involve compiling programs in a special way, using instrumented OS libraries, or other implementation variants where the program debugged is not identical to the eventual deployed program.</li>
</ul>
<p>Given this background, the next post will cover early research, and the third post the beginning of commercial products.</p>
<p>Note that on Stack Overflow, reverse debug still does not seem to be a particularly big thing. The reverse-debugging tag has all of <a href="http://stackoverflow.com/questions/tagged/reverse-debugging">10 members</a>, while other topics like C# and Android programming have millions&#8230; so maybe reverse is not quite mainstream yet. Or maybe it is just the case that Stack Overflow has more web-style developers than low-level developers in its membership.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1547/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Keynote on System-Level Debug</title>
		<link>http://jakob.engbloms.se/archives/1568?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1568#comments</comments>
		<pubDate>Sun, 01 Jan 2012 20:12:33 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[appearances]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[embedded systeme]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1568</guid>
		<description><![CDATA[I have now posted the slides from my keynote talk at the S4D 2011 conference to the presentations list on my regular home page. The topic of that talk was &#8220;System-Level Debug&#8221;, something which has started to interest me in recent years. Essentially, as the (embedded) software systems we are developing get more and more [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/09/S4D1.jpg"><img class="alignleft size-full wp-image-941" title="S4D" src="http://jakob.engbloms.se/wp-content/uploads/2009/09/S4D1.jpg" alt="" width="143" height="62" /></a>I have now posted the slides from my keynote talk at the S4D 2011 conference to the <a href="http://www.engbloms.se/jakob_presentations.html">presentations list </a>on my regular home page. The topic of that talk was &#8220;System-Level Debug&#8221;, something which has started to interest me in recent years.</p>
<p><span id="more-1568"></span>Essentially, as the (embedded) software systems we are developing get more and more complex, we need to update our debuggers to keep up with the times. This means moving beyond looking at single processors, threads, programs, or even nodes &#8211; and making debuggers that support working with multiple contexts at once. For example, debugging a couple of different boards in a rack from a single debug session. This would mean supporting multiple processor architectures, execution modes, and operating systems inside a single debugger session. Debuggers need to orient themselves towards handling software that is already running in a target, rather than starting the target software. The debugger becomes more of a passive observer of the target system than an active component.</p>
<p>Take a look!</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1568/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Touch the Screen vs Press a Button</title>
		<link>http://jakob.engbloms.se/archives/1536?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1536#comments</comments>
		<pubDate>Mon, 26 Dec 2011 09:02:48 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[gadgets]]></category>
		<category><![CDATA[Blackberry]]></category>
		<category><![CDATA[BlackBerry Torch]]></category>
		<category><![CDATA[G900]]></category>
		<category><![CDATA[GUI design]]></category>
		<category><![CDATA[iPod]]></category>
		<category><![CDATA[iPod Nano]]></category>
		<category><![CDATA[iPod Touch]]></category>
		<category><![CDATA[keyboard]]></category>
		<category><![CDATA[SonyEricsson]]></category>
		<category><![CDATA[touch screen]]></category>
		<category><![CDATA[UIQ]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1536</guid>
		<description><![CDATA[Is the touchscreen the end-all of user interfaces for mobile devices? There were rumors in early 2011 that the iPad2 would lose all physical buttons (which did not come true, obviously).  To me, that sounds like a really good and bad idea. Good, in the sense that a device that is all a big screen [...]]]></description>
			<content:encoded><![CDATA[<p>Is the touchscreen the end-all of user interfaces for mobile devices? There were rumors in <a href="http://news.cnet.com/8301-17938_105-20028516-1/rumor-no-home-button-for-ipad-2-and-next-iphone/">early 2011 that the iPad2 would lose all physical buttons </a>(which did not come true, obviously).  To me, that sounds like a really good and bad idea. Good, in the sense that a device that is all a big screen certainly looks nice. Bad, since it would be much less user-friendly than a device with some real physical buttons to press.</p>
<p>I have been thinking about this subject lately, after using a <a href="http://www.reghardware.com/2010/10/13/review_smartphone_blackberry_torch_9800/">BlackBerry Torch 9800</a> as my work phone for a few months.  I like the device a lot, but there are certainly some rough edges and some places where there is a UI conflict between touching the screen and pressing the buttons. At the same time, I am using both an <a href="http://jakob.engbloms.se/archives/28">iPod Nano 3G</a>, and a couple of iPod Touches. I used to have SonyEricsson Symbian-based P900, P990i, and <a href="http://jakob.engbloms.se/archives/310">G900</a> smart phones which also were combined touch/press devices with a stylus.</p>
<p><span id="more-1536"></span>I think it is clear that using a physical keyboard is preferable to an on-screen keyboard for typing serious amounts of text. On the iPod, entering a URL or search term in a browser just feels fiddly, compared to the ease of composition with the physical keyboard on the BlackBerry. I cannot get over the feeling that an iPod Touch or iPhone is really best as a consumption device, but that a BlackBerry is a superb creation device. On a BB, I can type quite long emails, while an iOS device feels more appropriate for the occasional short messages. To me, this is an important aspect, as I tend to use the smart phone as a two-way email communications device.</p>
<p>The slide-out keyboard solution on the BB Torch (and many other phones from many different vendors) seems just right for this, giving you a full-screen device for reading tasks, with a full keyboard when needed.</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/12/blackberry-torch-9800-keyboard-out.png"><img class="aligncenter size-full wp-image-1537" title="blackberry torch 9800 keyboard out" src="http://jakob.engbloms.se/wp-content/uploads/2011/12/blackberry-torch-9800-keyboard-out.png" alt="" width="300" height="181" /></a><br />
On my old G900, I used T9 with a 0 to 9 keypad, and that worked surprisingly well.  A problem both with the BB and definitely with an on-screen keyboard is that using it single-handed is very hard.  On a screen, your fingers get no feedback about where you are on the keyboard, and the full keyboard of the BB is a bit too small to reliably use with a single hand.  I noted this in my <a href="http://jakob.engbloms.se/archives/310">G900</a> review three years ago, and it still holds true. The G900 looked like this:</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/12/sonyericsson-g900.png"><img class="aligncenter size-full wp-image-1538" title="sonyericsson g900" src="http://jakob.engbloms.se/wp-content/uploads/2011/12/sonyericsson-g900.png" alt="" width="101" height="200" /></a>Since a primary use of our iPods is gaming, I have noted some different UI principles for games.  There seems to be one class of games where touch makes perfect sense, like Angry Birds.  Press and draw to load, swipe to move the display, pinch to zoom &#8211; perfect and logical. Another class of games consists of obvious ports of mouse-based games, like Plants vs Zombies.  In such games, the touch screen is essentially used as it was back in the stylus era. It is just another way to generate single-point clicks.</p>
<p>Finally, we have the category of games that really would work best with a physical controller, like PacMan. Displaying a four-way control key setup on the screen does not get close to the right feeling.</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/12/pacman-game-on-iphone.png"><img class="aligncenter size-full wp-image-1539" title="pacman game on iphone" src="http://jakob.engbloms.se/wp-content/uploads/2011/12/pacman-game-on-iphone.png" alt="" width="215" height="226" /></a>I think the SonyEricsson <a href="http://www.reghardware.com/2011/04/19/review_sony_ericsson_xperia_play/">Xperia Play</a> was a brilliant idea &#8211; imagine an Apple device with that little built-in game controller.  It seems that SonyEricsson has not quite managed to execute on the idea, but in an ecosystem that generates variation like Android, this kind of device should have a place.</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/12/sonyericsson-xperia-ray-controller.png"><img class="aligncenter size-full wp-image-1540" title="sonyericsson xperia ray controller" src="http://jakob.engbloms.se/wp-content/uploads/2011/12/sonyericsson-xperia-ray-controller.png" alt="" width="300" height="192" /></a><br />
However, I have a hard time seeing the Swiss-army-knife design of a phone having both a slide-out keyboard and a slide-out game controller. It would be neat, but mechanically I don&#8217;t think it would work very well.</p>
<p>On to navigation.  The idea of touch gestures (in particular swipes) to navigate around the UI as popularized by Apple in the first iPhone is great in many ways.  My youngest child has used it since she was two, and it just works quite naturally.  However, there are still cases where a plain old navigation key works better. And having both is best.</p>
<p>Holding a device in one hand and scrolling a web page is faster with the navigation key on my BB than using touch on an iPod. The old G900 was even more efficient: just hold the down key &#8211; no physical motion needed at all. The iPod would be so much better if it could just have a scroll wheel on the side or something so you could use it one-handed for reading without having to put a finger on top of the screen (and quite often accidentally clicking links in the process).</p>
<p>I also like the menu key on the BB (and on my old SonyEricsson phones), as a way to quickly get to the most important functions. This seems hard to emulate on a touch screen in a good way. In their touch-enabled phones, BB has tried it with a press-and-hold action &#8211; if you press down on the screen, a little action palette opens. Which does not quite work for me.</p>
<p>Another great advantage of some well-chosen physical buttons is that they offer immediate access to certain functions.  On a touch-screen device, I find that you have to dig through several steps to get into any function.  Home screens with the most commonly accessed functions are all well and good, but a direct key to dial a call or activate the camera is faster.  It cannot support all functions, obviously, but it lets you bring up the most common and important functions quickly. This is an aspect where my iPod Nano is much better than an iPod Touch. On the iPod Nano, you can have it in a pocket and just hit the device to move to the next song or pause. On an iPod Touch, you have to look at the screen and find the right spot to hit.</p>
<p>On the other hand, the Nano would have been even better with a couple of buttons to increase/decrease volume rather than the scroll wheel. When out running, having to hold the thing in both hands to make simple adjustments is really annoying. It also does not work with gloves on, which is a big problem with all touch devices as well as touch-wheels &#8211; they require naked fingers, which is annoying in a climate where gloves are on for at least half the year. <a href="http://jakob.engbloms.se/archives/290">As I said before</a>, someone should put a design center in Luleå or Novosibirsk or Alaska, and then see what kinds of devices come out.</p>
<p>A full keyboard like the one on the BB also has the nice property of providing many points of access to shortcuts.  It might seem quite backwards of me, but I like the speed of action you get with the keyboard shortcuts in the BB email application. It has a lot in common with classic computer UIs, actually. It is hard to afford immediate access to 20+ functions on a touch screen without creating a very cluttered interface.  On the other hand, a touch interface done right makes it simple to access system-level things quickly by just touching the status icons on the screen &#8211; you get out of the application faster than if you had to use the keyboard to navigate up to some button.  For some reason, neither the iOS devices nor the BB does this right, while my old SonyEricsson smartphones all let me access detailed status by hitting the battery icon or network icons.</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/12/g900-icons1.png"><img class="aligncenter size-full wp-image-1542" title="g900 icons" src="http://jakob.engbloms.se/wp-content/uploads/2011/12/g900-icons1.png" alt="" width="347" height="113" /></a><br />
However, after singing the praises of the keyboard and touch screen in combination, I must admit that there is a big downside to a combined touch/press device. UI design is harder when you have to afford both a good keyboard-controlled interface and a touch interface. The BB Torch I have suffers from this quite badly &#8211; having to afford both touch and non-touch interaction, as well as non-touch versions of the device itself, does mean that the UI is not as streamlined and elegant as what you can get on a pure touch device. I often find myself jumping between keyboard and touch as I cannot quite complete what I want to do using just one or the other interaction mode, which really should not be necessary. Only having to think about a single type of input does seem to make things simpler for UI designers, even if it also sometimes precludes creating truly efficient applications.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1536/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fujitsu Server Fault Injection Robot</title>
		<link>http://jakob.engbloms.se/archives/1530?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1530#comments</comments>
		<pubDate>Sun, 11 Dec 2011 20:53:25 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[fault injection]]></category>
		<category><![CDATA[fujitsu]]></category>
		<category><![CDATA[Masafumi Matsuo]]></category>
		<category><![CDATA[server]]></category>
		<category><![CDATA[Yuichi Kurita]]></category>
		<category><![CDATA[Yuji Uchiyama]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1530</guid>
		<description><![CDATA[Fault Injection is a topic that has fascinated me for a long time. Not just the area of software-to-software fault injection, but more so how you inject faults into hardware using hardware (and how to conveniently approximate this using a simulator). I just stumbled on a short interesting note about such hardware-actuated fault injection in [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/12/fujitsulogga.png"><img class="alignleft size-full wp-image-1531" style="margin: 10px 5px;" title="fujitsulogga" src="http://jakob.engbloms.se/wp-content/uploads/2011/12/fujitsulogga.png" alt="" width="57" height="47" /></a>Fault Injection is a topic that has fascinated me for a long time. Not just the area of <a href="http://en.wikipedia.org/wiki/Fault_injection">software-to-software fault injection</a>, but more so how you inject faults into hardware using hardware (and how to conveniently approximate this using a <a href="http://blogs.windriver.com/engblom/2010/10/the-virtual-basil-fawlty.html">simulator</a>). I just stumbled on a short interesting note about such hardware-actuated fault injection in a Fujitsu article.</p>
<p><span id="more-1530"></span>The <a href="http://www.fujitsu.com/global/news/publications/periodicals/fstj/">Fujitsu Scientific and Technical Journal</a> is the Fujitsu equivalent of IBM&#8217;s Journal of Research and Development. Thankfully, the FSTJ is still free while IBM has erected a paywall around their articles. <a href="http://www.fujitsu.com/global/news/publications/periodicals/fstj/archives/vol47-2.html">Number 2 of 2011</a> had the theme of servers, and there is an article about <a href="http://www.fujitsu.com/downloads/MAG/vol47-2/paper07.pdf">Quality Assurance for Stable Server Operation</a> by Masafumi Matsuo, Yuji Uchiyama, and Yuichi Kurita.</p>
<p>The article describes the process of ensuring that the final servers that are shipped to customers (from what seems to be the SPARC-based line of Fujitsu computers, even though it might actually apply equally to their mainframes and x86-based servers) are as stable as possible. Apart from designing things right, this also requires testing the fault handling and recovery operations.</p>
<p>I found it noteworthy that they do a lot of configuration testing, where various hardware and software configurations are played off against each other. In this way, corner cases are explored and coverage of the actual configurations that customers will be using becomes more likely (it is always dangerous to only test on one or a few configurations). They push memory system and processor loads to very high levels to ensure continued operation even in extreme cases, and also try to push the actual chips to make sure they will operate reliably in a wide range of environmental conditions. Indeed, a large focus is placed on pure physical reliability, as that is the basis for system reliability.</p>
<p>The best part, however, is on page four of the article, where they show the physical fault injection robot that is applied to the chips mounted on boards. This robot  shorts out individual pins on chips, clamping them to zero volts. It goes over all pins, and the test system checks what happens to the system in each case. Some kind of exhaustive testing going on here.</p>
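<p>To make the pin-by-pin sweep concrete, here is a minimal sketch in Python. It is purely my own illustration and not how the Fujitsu robot or test system is actually programmed: the pin names, the <code>toy_system</code> stand-in for the real board, and the outcome labels are all made-up assumptions.</p>
<pre>
```python
from collections import Counter


def sweep_pins(pin_names, run_with_pin_grounded):
    """Ground each pin in turn and record how the system reacts."""
    results = {}
    for pin in pin_names:
        results[pin] = run_with_pin_grounded(pin)
    return results


def toy_system(pin):
    # Hypothetical stand-in for the real board plus test system.
    if pin == "CLK":
        return "hang"
    if pin.startswith("D"):
        return "corrected"
    return "no effect"


pins = ["CLK", "D0", "D1", "NC"]
outcomes = sweep_pins(pins, toy_system)
summary = Counter(outcomes.values())

for pin in pins:
    print(pin, outcomes[pin])
```
</pre>
<p>The point of the sketch is only the structure: one fault at a time, every pin visited, and an outcome recorded for each one so that the results can be summarized afterwards.</p>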
<p>Neat. I have heard other stories of physical fault injection, ranging from complex mechanisms like passing computer boards through irradiation chambers all the way to brutally simple tasks like putting an axe into a board to break it or pulling boards out of racks to simulate a sudden catastrophic failure. I would like to see more of just how these things are done in the real world. I suspect there are quite a few interesting robotics setups out there that do fault injection.</p>
<p>In any case, the article offered an interesting glimpse of many of the techniques used to make computer systems robust and reliable. Recommended.</p>
<p>It ends by noting that deeply consolidated SoC designs and aggressive dynamic power management are challenging from a testing and observation perspective, as well as creating more single points of failure. If a single system on chip fails, that system is all gone&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1530/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Debug, Design, and Microsoft Data</title>
		<link>http://jakob.engbloms.se/archives/1527?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1527#comments</comments>
		<pubDate>Sat, 19 Nov 2011 15:38:23 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[Communications of the ACM]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[Kinshuman Kinshumann]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Steven Sinofsky]]></category>
		<category><![CDATA[Windows]]></category>
		<category><![CDATA[Windows 8]]></category>
		<category><![CDATA[Windows Explorer]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1527</guid>
		<description><![CDATA[It used to be that Microsoft was the big, boring, evil company that nobody felt was very inspiring. Today, with competition from Google and Apple as well as a strong internal research department, Microsoft feels very different. There are really interesting and innovative ideas and papers coming out of Microsoft today.  It seems that their [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/07/windows-phone-logo.png"><img class="alignleft size-full wp-image-1205" title="windows phone logo" src="http://jakob.engbloms.se/wp-content/uploads/2010/07/windows-phone-logo.png" alt="" width="66" height="58" /></a>It used to be that Microsoft was the big, boring, evil company that nobody felt was very inspiring. Today, with competition from Google and Apple as well as a strong internal research department, Microsoft feels very different. There are really interesting and innovative ideas and papers coming out of Microsoft today.  It seems that their investments in research and software engineering are generating very sophisticated software tools (and good software products).</p>
<p>I have recently seen a number of examples of what Microsoft does with the user feedback data they collect from their massive installed base. I am not talking about Google-style personal information collection, but rather anonymous collection of user interface and error data in a way that is designed more to build better products than to target ads.</p>
<p><span id="more-1527"></span>The first paper is &#8220;<a href="http://dl.acm.org/citation.cfm?id=1965749&amp;bnc=1">Debugging in the (very) large: ten years of implementation and experience</a>&#8221; by Kinshumann et al., Communications of the ACM, July 2011. This paper describes how Microsoft uses the data they collect from Windows Error Reporting (you know, the little dialog boxes that appear every once in a while on Windows when a program has crashed or frozen, or Windows has recovered from a crash).</p>
<p>Microsoft has a number of heuristics that look at the data collected, grouping the bug reports into buckets. Ideally, each bucket corresponds to a single root cause for possibly quite different errors. They automatically analyze the errors and generate metadata about the error reports that can be used to generate statistics and allow database queries to be performed over all collected error reports.  Heuristics include walking through chains of threads blocked on synchronization objects to determine which one is the actual cause of a hang, and finding the most likely thread and stack frame for containing the root cause of an error.  Heuristics are applied both on the client and the server, but mostly on the server. This is technically very hard to do right, and I can appreciate the huge amount of work that has gone into engineering this.</p>
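<p>The idea of walking a chain of blocked threads can be sketched in a few lines of Python. This is my own toy version and certainly not the WER implementation: I assume a hypothetical <code>waits_on</code> map from each blocked thread to the thread that holds the object it is waiting for, and the thread names are invented.</p>
<pre>
```python
def find_hang_culprit(waits_on, start):
    """Follow the chain of blocked threads beginning at `start`.

    `waits_on` maps a blocked thread to the thread holding the object it
    waits for. Returns (culprit, is_deadlock).
    """
    seen = [start]
    current = start
    while current in waits_on:
        current = waits_on[current]
        if current in seen:
            # We came back to a thread already on the chain: a deadlock cycle.
            return current, True
        seen.append(current)
    # The chain ended at a thread that is not itself blocked on anything.
    return current, False


# Toy data: the UI thread waits on a worker, which waits on an I/O thread.
waits_on = {"ui": "worker", "worker": "io"}
culprit, deadlock = find_hang_culprit(waits_on, "ui")
print(culprit, deadlock)

# Toy data for a deadlock: two threads waiting on each other.
cycle_culprit, cycle_deadlock = find_hang_culprit({"a": "b", "b": "a"}, "a")
print(cycle_culprit, cycle_deadlock)
```
</pre>
<p>In the first case the frozen UI thread is only a symptom, and the thread at the end of the chain is the one worth looking at. A real system obviously has to deal with many kinds of synchronization objects and incomplete data, which is where the hard engineering comes in.</p>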
<p>With this huge pile of information, a new debugging method becomes available: statistics-driven bug finding and prioritization at large scale.  The introduction to the paper puts it very well:</p>
<blockquote><p>Beyond mere debugging from error reports, WER enables a new form of statistics-based debugging. WER gathers all error reports to a central database. In the large, programmers can mine the error report database to prioritize work, spot trends, and test hypotheses. Programmers use data from WER to prioritize debugging so that they fix the bugs that affect the most users, not just the bugs hit by the loudest customers. WER data also aids in correlating failures to co-located components. For example, WER can identify that a collection of seemingly unrelated crashes all contain the same likely culprit—say a device driver—even though its code was not running at the time of failure.</p></blockquote>
<p>For a product manager like me, used to working with individual bug reports in bug reporting systems and trying to manually assess the importance of each error, this is nothing short of a dream.  Instead of trying to guess how many users can be impacted by a bug, Microsoft can run queries against the error report database and get a fairly accurate idea of how common a certain error is in the user base.  This has allowed them to address the most common errors first, leading to Windows and Office becoming more stable for more users in recent generations.  They can also pinpoint which device drivers are causing the most issues, and put pressure on vendors to clean up their act.</p>
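<p>As a rough illustration of bucketing and prioritization, here is a small Python sketch. It is a deliberately naive assumption of mine, not the heuristics from the paper: I bucket on a made-up signature of (module, function), and the report fields and sample data are hypothetical.</p>
<pre>
```python
from collections import defaultdict


def bucket_reports(reports):
    """Group error reports by a simple signature, tracking affected machines."""
    buckets = defaultdict(set)
    for report in reports:
        signature = (report["module"], report["function"])
        buckets[signature].add(report["machine"])
    return buckets


def rank_buckets(buckets):
    """Order buckets by the number of distinct machines hit, largest first."""
    return sorted(buckets.items(), key=lambda item: len(item[1]), reverse=True)


# Hypothetical sample data: machine m2 reports the same crash twice.
reports = [
    {"machine": "m1", "module": "gfx.sys", "function": "draw"},
    {"machine": "m2", "module": "gfx.sys", "function": "draw"},
    {"machine": "m2", "module": "gfx.sys", "function": "draw"},
    {"machine": "m3", "module": "app.exe", "function": "save"},
]

ranked = rank_buckets(bucket_reports(reports))
for signature, machines in ranked:
    print(signature, len(machines))
```
</pre>
<p>Counting distinct machines rather than raw reports is one way to avoid being misled by a single very unlucky (or very loud) user, which is exactly the kind of bias that manual bug triage tends to suffer from.</p>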
<p>I wonder where else you can really apply this idea of statistical debugging. You need a large user base, on systems which are connected to the Internet so you can collect data, and users who are comfortable with providing direct feedback to you as a vendor.  Apparently, Apple has the same kind of feature built into iOS, with more than 100 million users who seem not to be too interested in strong privacy.  Presumably, Google can do the same thing with Android, at least for its use in phones. Mozilla has a crash reporter, so I guess it makes sense in the consumer space.</p>
<p>But when your user base counts in thousands of seats and half of these are in the defense sector behind air gaps, it is harder to apply. Products that call home are not looked upon kindly in the professional field, as secrecy and confidentiality are very important to big companies. Industrial embedded products like telecom infrastructure might have a sufficient volume of code and computer hardware to form a basis for statistical reporting &#8211; as long as operators agree to provide the information to the hardware vendors.</p>
<p>Another example of how Microsoft makes use of their collected data is in UI design. The blog post &#8220;<a href="http://blogs.msdn.com/b/b8/archive/2011/08/29/improvements-in-windows-explorer.aspx">Improvements in Windows Explorer</a>&#8221;, by Steven Sinofsky, on the Windows 8 blog discusses how Windows Explorer has evolved over the years, and how it is now getting a radical redesign based on usage data.  Microsoft is in an enviable position here, having collected information about what millions of users are doing.  It definitely beats inspiration or trying with a few users in a classic user interface lab.</p>
<p>I have seen quite a few people criticize this blog post from a variety of angles &#8211; from the fact that they are not data-driven enough and keep rarely-used buttons in the ribbon to the fact that they remove somebody&#8217;s favorite function.  It is also the case that the measurements can only tell you which functions people are using from what is available today &#8211; if you want to invent new things, data like this might not be very helpful.</p>
<p>Fortunately, Microsoft also seems to have taken a cue from Linux and is allowing much more user customization than before. For me, this is great news, as I seem to have a user profile quite far from the mainstream.  We have not seen Windows 8 in its final form just yet, but hopefully this approach will be applied to other parts of that GUI overhaul too.  There are professional Windows users who need an OS that makes even very esoteric operations easy to access, and makes customizations of things like the start menu possible.  Hopefully, we do not get washed away by the flood of data from regular users.</p>
<p>For some reason, I feel that bug reporting is not as sensitive to the user style as GUI design &#8211; Windows and driver bugs would seem to be more evenly distributed as they depend more on hardware than on software. At least it seems to me that Windows is more stable today than it was a couple of years ago.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1527/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Wind River Blog: Interview with a Networked Simics User</title>
		<link>http://jakob.engbloms.se/archives/1524?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1524#comments</comments>
		<pubDate>Wed, 16 Nov 2011 15:58:20 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[testing]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Wind River Blog]]></category>
		<category><![CDATA[Dan Poirot]]></category>
		<category><![CDATA[RTI]]></category>
		<category><![CDATA[Simics]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1524</guid>
		<description><![CDATA[There is a new post at my Wind River blog, an interview with Dan Poirot at RTI who is using Simics to model and test heterogeneous, distributed, networked systems.]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="Wind River Logo" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" />There is a new post at my Wind River blog, an <a href="http://blogs.windriver.com/engblom/2011/11/simics-for-distributed-systems-an-interview-with-dan-poirot.html">interview with Dan Poirot at RTI</a> who is using Simics to model and test heterogeneous, distributed, networked systems.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1524/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>DV* 30 Years</title>
		<link>http://jakob.engbloms.se/archives/1520?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1520#comments</comments>
		<pubDate>Sun, 13 Nov 2011 20:32:34 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[history of computing]]></category>
		<category><![CDATA[off-topic]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[DVL]]></category>
		<category><![CDATA[DVP]]></category>
		<category><![CDATA[Uppsala University]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1520</guid>
		<description><![CDATA[On the very binary date of 11-11-11, my alma mater, the computer science (DV, for datavetenskap) education at Uppsala University celebrated its thirty years&#8217; anniversary. It was a great classic student party in the evening with a nice mix of old alumni and fresh-faced students. Lots of singing and some nice skits on stage. Great [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/11/dv30%C3%A5r-100x96.jpg"><img class="alignleft size-full wp-image-1521" style="margin: 5px 10px;" title="dv30år (100x96)" src="http://jakob.engbloms.se/wp-content/uploads/2011/11/dv30%C3%A5r-100x96.jpg" alt="" width="100" height="96" /></a>On the very binary date of 11-11-11, my alma mater, the computer science (DV, for datavetenskap) education at Uppsala University celebrated its <a href="http://www.datavetenskap.nu/jubileum/">thirty years&#8217; anniversary</a>. It was a great classic student party in the evening with a nice mix of old alumni and fresh-faced students. Lots of singing and some nice skits on stage. Great fun, and my voice has still not recovered. It also got me thinking about what it is that we really do as computer scientists.</p>
<p><span id="more-1520"></span>As David Alan Grier would have said, this kind of event tends to serve to build the professional identity of a group of people. Computer science is not a profession per se, but it is clear that the Uppsala computer science students (almost 2000 have started since 1981) have a particular culture that comes back to us very easily when amongst our peers. I think it is based in the art and practice of <strong>programming</strong>.</p>
<p>The talks and discussions during the dinner often went back to the defining experiences of our student days, and these experiences were mostly about night hacks and classic labs. A speaker told us about how in 1983 they tried to program a simple computer built by the department itself, and how it turned out that 3 out of 4 machines were broken. The idea that they expected the program to just work the first time it loaded made me look up my <a href="http://blogs.windriver.com/engblom/2011/05/twenty-thirty-and-sixty-years-ago.html">favorite debugging quote</a>. Somebody reminded me (19 years after the deed) about the time I made a Prolog program exhaust all memory on a Sun server, by performing an exhaustive search for a problem with no solution.</p>
<p>I remembered how some students I taught (while still an undergraduate myself, a common practice at DV) spent 24 hours a day for almost a week in a lab room trying to get their operating systems to boot on some MIPS-based lab machines. In particular, the group that was gripped by ambition and tried to turn on the MMU. Each time they reset the machine and tried to boot their OS, they would see a serial terminal spewing out diagnostic text and then stopping cold&#8230; as they failed again to make it work (they passed the course anyway).</p>
<p>This all indicates the fundamental importance of programming to computer science students. That is also what we believed was our core mission when I was a student &#8211; to go out into the world and create great software. Many still do, even if quite a few of us have left day-to-day coding to become project leaders and outright managers. Can&#8217;t say I program all that much myself, apart from some demos and virtual machine scripts, but I still find the topic incredibly interesting and important.</p>
<p>The event also touched on institutional memory and the longevity of data. The very ambitious anniversary committee had produced a brand new version of the classic &#8220;Manualen&#8221;, a song book first produced in 1995. In it, I found a text I wrote in 1995 about what a computer scientist actually does (a bit pompous, as can be expected from a proud student) as well as a photo of myself from a 1996 cover of the DV student magazine &#8220;Blurgel&#8221;. However, in both cases, these had been reproduced from paper copies. There was no digital memory in place for these 15-year-old pieces of data.</p>
<p>I actually still have both paper copies in good shape and the data &#8211; on a multisession CD-R created in 1999 on a Mac. My current computers, all being Windows-based, cannot extract the data. I know a Mac can still read them, but it is not clear that the data can be used today. It is likely in some old version of <a href="http://en.wikipedia.org/wiki/Pagemaker">Aldus PageMaker</a> and with no file extensions to guide you as to what each file is. The images are greyscale or black-and-white high-resolution TIFF files I believe (from the era before PNG), once again with no file extensions to hint at what is what. It underscores the importance <a href="http://jakob.engbloms.se/archives/180">of actual running programs and systems</a> as a way to access our digital past. The data might all be there and readable, but with no software to interpret it, what can you do? I actually <a href="http://jakob.engbloms.se/archives/19">noted the same</a> four years ago for the somewhat late 25-year anniversary.</p>
<p>Overall, a very memorable evening, and kudos to the organizers for putting it all together.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1520/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Jan Bosch: Software Provocateur</title>
		<link>http://jakob.engbloms.se/archives/1516?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1516#comments</comments>
		<pubDate>Sat, 29 Oct 2011 18:09:21 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[business issues]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Jan Bosch]]></category>
		<category><![CDATA[Lindholmen Software Development Day]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1516</guid>
		<description><![CDATA[Last week, I had the honor of presenting at and attending the talks of the Lindholmen Software Development Day. The first keynote speaker was Professor Jan Bosch from Chalmers, who did his best to provoke, prod, and shock the audience into action to change how they do software. While I might not agree with everything [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/10/lindholmen-logo.png"><img class="alignleft size-full wp-image-1517" style="margin: 5px;" title="lindholmen logo" src="http://jakob.engbloms.se/wp-content/uploads/2011/10/lindholmen-logo.png" alt="" width="57" height="55" /></a>Last week, I had the honor of presenting at and attending the talks of the <a href="http://www.lindholmen.se/sv/node/17005">Lindholmen Software Development Day</a>. The first keynote speaker was Professor <a href="http://janbosch.com/Jan_Bosch/Jan_Bosch.html">Jan Bosch</a> from <a href="http://www.chalmers.se/sv/Sidor/default.aspx">Chalmers</a>, who did his best to provoke, prod, and shock the audience into action to change how they do software. While I might not agree with everything he said, overall it was a very enjoyable and insightful talk.</p>
<p><span id="more-1516"></span></p>
<p>Jan Bosch is a professor in the software engineering center at Chalmers in Gothenburg. He is a Dutchman who has worked all over the world. He used to run Nokia&#8217;s research labs in Finland, and then moved to Intuit in Silicon Valley. Now, he is back in Scandinavia at Chalmers. What I think he tried to do was to impress the audience with the software development ethos and style of Silicon Valley. I got a feeling that he wanted to make it felt that Europe is maybe behind on modern software development.</p>
<p>However, I am not always sure how the experience from web companies like Google and Amazon can translate into the development of software for systems like telecoms and vehicles (which formed a large part of the audience).</p>
<p>The main points of the talk, and my comments:</p>
<p><strong>Speed is more important than anything else</strong>. If we can get a 10% efficiency improvement, it should be used to cut lead times by 10%, not to reduce cost while keeping the same lead time. The earlier we get software out, the bigger our revenue. This means that you should focus engineering efforts on shortening lead times rather than reducing man-hours worked &#8211; and if you get the lead times shorter, you will automatically get efficiency improvements. I must say I agree with this point.</p>
<p>We should <strong>turn R&amp;D into an experimental system</strong> &#8211; quickly implement features and ideas and test them in some way. Maybe not with external customers, internal users might do just as well (that&#8217;s how Apple works, for example, by evaluating many candidate designs through prototypes that are never seen by the outside). This is a good idea, but I think it might be hard to find the resources to do this in many cases &#8211; Apple is kind of extreme in that it rides high, is immensely profitable, and releases only a few products each year.</p>
<p>Once we have experiments, <strong>product planning should be based on data</strong>, not on opinion. Sometimes, vision is needed, but that should be turned into small experiments to validate ideas, not big long-term full-blown plans based on opinion. This sounds very nice &#8211; but getting hard data can be very hard in practice.</p>
<p>Initially, I felt that this would conflict with the concept of the &#8220;BHAG&#8221; &#8211; Big Hairy Audacious Goals &#8211; from the book &#8220;<a href="http://en.wikipedia.org/wiki/Built_to_Last:_Successful_Habits_of_Visionary_Companies">Built to Last</a>&#8221;. There, vision that transcends the current state of affairs is crucial to build products that really change the world. You cannot base that on data &#8211; if you ask people what they need, they will not reply with what they do not know. But I guess that if you have a great idea, it is a good idea to vet it with some quick experiments to ascertain that it makes sense at all, before betting the firm on it. So, I guess there is no real conflict here.</p>
<p><strong>Quickly test new features with customers</strong>. He sees the world switching to a model where the <em>producer decides when an upgrade is applied</em> to the market, not the consumers. This is already in place for things like Facebook, Google, and Apple iOS devices. I wonder though if this works for professional software and systems. The customers that I know have to plan the rollout of new versions carefully in order to not disrupt projects that rely on them. When someone pays for a piece of software, they tend to want to have something to say about new features too. Consider the uproar that happens every time Facebook changes its UI &#8211; if you had paid a hundred thousand dollars for the system, I think Facebook would have listened rather more to their users than they do now. As long as there is no alternative, people will grudgingly accept and learn to live with it &#8211; even though the new version might be strictly worse for them than the old version.</p>
<p>When users do have a choice, they often stay with an older version. Just consider the vast numbers of PCs out there still running Windows XP. In some way, I feel that this freedom to choose and control your own computing environment is being threatened by this trend of the web.</p>
<p>If you can recruit willing beta testers in your user community, this is great. But you need quite a few customers in place to be able to have enough beta testers to have anything like a representative sample. For a consumer application with tens or hundreds of millions of users, this is not too hard. For a professional piece of software tooling with fewer than a hundred different customers, you tend to get three loud voices &#8211; and we are essentially back to opinion rather than data. It would have been interesting to discuss this with Jan, but I never managed to grab him during the day. There is also the time it takes to upgrade a user and have them test a new version &#8211; even if the upgrade does not require any changes to their existing code or systems, they still might need training on the new system to fully use it.</p>
<p>The counter to this is obvious: if things are this complicated to use and deploy, maybe we should think about making deployment and use simpler? And not hide behind &#8220;it is hard and we have always done things this way&#8221;. Make no mistake &#8211; I would love to be able to do this, especially with things like debuggers and other daily tools of the programming trade.</p>
<p>Still, I do think that stability is sometimes a hard requirement. When you have a system that you want to maintain for decades, it is very sound to freeze the versions of critical tools used to develop it and stick to these, to minimize the risk of failure due to changes. If the system is certified, you basically have to. Amazon does not build flight control systems. Their system going down is certainly going to cause economic havoc, but nobody will die.</p>
<p>Create an <strong>after market</strong> &#8211; customers will come to expect new features over time, and might well pay for them. This &#8220;app market&#8221; idea for software features keeps coming up in current discussions on architecture and software design. My guess is that it will actually work, once professional users get influenced enough by the overall consumerization of IT to take certain patterns for granted. Selling new software for car engine control or new features into a telecom switch sounds pretty sane &#8211; and is sometimes already practiced today.</p>
<p><strong>Get to know the customer</strong> &#8211; at Intuit, each engineer famously follows a customer for one day each year to understand their life and get empathy with their users. Knowing your customer is key: the better you know what your customer wants, the more they will take to your product. This idea is something that I have actually seen being tried in the past, with mixed results. Once again, in a business-to-business world this requires some selling to happen, and customers might be sensitive to information leakage and secrecy. However, having a product expert available to them for a few days to help them work better and run their systems more efficiently is often of great value to the customer. If your business has a consulting or services arm, this can be part of regular business &#8211; as long as you let core development engineers consult a bit and not keep them all hidden inside the company.</p>
<p>When it came down to practical software development practice, Jan Bosch was all about agile, automatic building, testing, and deploying, and standard modern practice. He retold the Amazon &#8220;<a href="http://www.brainwatt.com/two-pizza-rule/">Two Pizza Rule</a>&#8221; &#8211; i.e., keep groups small. Keep groups self-governing, based on quantitative assessments of their output, and guide them with goals, not detailed requirements. That certainly works for a certain class of highly motivated and skilled labor that tends to concentrate in Silicon Valley &#8211; but in the real world, we also have to deal with less able developers that do need leadership to perform and do the right thing. Not everyone can be totally selective in hiring, unfortunately.</p>
<p>Still, a provocative talk that really did get me thinking about how we do things. Which I think was Jan&#8217;s goal to start with, so I guess that means mission accomplished. If you ever get the chance to listen to Jan, take it!</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1516/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Wind River Blog: Why Simics will not run Super Mario</title>
		<link>http://jakob.engbloms.se/archives/1510?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1510#comments</comments>
		<pubDate>Fri, 14 Oct 2011 09:27:46 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[Wind River Blog]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1510</guid>
		<description><![CDATA[On my Wind River blog, I just posted a fairly long post about simulation abstraction levels. It was inspired by a cool article in ArsTechnica about Nintendo emulators, and the costs and benefits of being ever more faithful to the hardware.]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="Wind River Logo" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" />On my Wind River blog, I just posted a <a href="http://blogs.windriver.com/wind_river_blog/2011/10/why-simics-wont-run-super-mario-.html">fairly long post about simulation abstraction levels</a>. It was inspired by a cool article in <a href="http://arstechnica.com/gaming/news/2011/08/accuracy-takes-power-one-mans-3ghz-quest-to-build-a-perfect-snes-emulator.ars">ArsTechnica</a> about Nintendo emulators, and the costs and benefits of being ever more faithful to the hardware.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1510/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>GPGPU for Instruction-Set Simulation &#8211; Maybe, Maybe not</title>
		<link>http://jakob.engbloms.se/archives/1506?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1506#comments</comments>
		<pubDate>Sat, 08 Oct 2011 19:17:58 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[parallel computing]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[CCGrid]]></category>
		<category><![CDATA[cycle accuracy]]></category>
		<category><![CDATA[GPGPU]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[simulation]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1506</guid>
		<description><![CDATA[I just read a quite interesting article by Christian Pinto et al., &#8220;GPGPU-Accelerated Parallel and Fast Simulation of Thousand-core Platforms&#8221;, published at the CCGRID 2011 conference. It discusses some work in using a GPGPU to run simulations of massively parallel computers, using the parallelism of the GPU to speed the simulation. Intriguing concept, but the [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2008/05/coreshrink1.png"><img class="alignleft size-full wp-image-125" style="margin: 5px 10px;" title="coreshrink1" src="http://jakob.engbloms.se/wp-content/uploads/2008/05/coreshrink1.png" alt="" width="100" height="100" /></a>I just read a quite interesting article by Christian Pinto et al., &#8220;<a href="http://infoscience.epfl.ch/record/164471">GPGPU-Accelerated Parallel and Fast Simulation of Thousand-core Platforms</a>&#8221;, published at the <a href="http://www.ics.uci.edu/~ccgrid11/">CCGRID 2011</a> conference. It discusses some work in using a GPGPU to run simulations of massively parallel computers, using the parallelism of the GPU to speed the simulation. Intriguing concept, but the execution is not without its flaws and it is unclear, at least from the paper, just how well this generalizes, scales, or compares to parallel simulation on a general-purpose multicore machine.</p>
<p><span id="more-1506"></span>The paper describes a simulation for a network-on-chip based homogeneous system containing &#8220;ARM-subset&#8221; ISS instances with local instruction and data caches, some local RAM, and also some shared RAM. Each core runs its own local software load; there is no SMP operating system. All communication between cores is over shared memory, using explicit operations across the NoC. All cores run a single cycle before they check communications from their neighbors.</p>
<p>This last point is crucial to understanding why this is feasible at all &#8211; in general, simulating a general shared-memory multiprocessor machine on a shared-memory multiprocessor falls down on the synchronization overhead. If your simulation semantics dictate that you synchronize every cycle anyway, and you do not try to optimize each core simulator, there is clearly decent room for parallel execution. By including the cache, they increase scalability, since there is more work per target cycle that can be run in isolation.</p>
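<p>To make the lock-step argument concrete, here is a minimal Python sketch of a simulation that synchronizes after every target cycle. It is not the paper&#8217;s implementation: the <code>Core</code> class, the <code>run_lockstep</code> function, and the ring-style traffic pattern are all hypothetical names and assumptions chosen for illustration. The point is only that each core&#8217;s cycle touches nothing but its own state, so the order in which cores run within a cycle (or whether they run in parallel) cannot change the result.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """Toy core (hypothetical): an accumulator plus an inbox filled only at cycle boundaries."""
    core_id: int
    acc: int = 0
    inbox: list = field(default_factory=list)

    def step(self):
        """Run one target cycle using only local state; return outgoing messages."""
        # Consume whatever was delivered at the previous barrier.
        self.acc += self.core_id + 1 + sum(self.inbox)
        self.inbox = []
        # Assumed traffic pattern: send the accumulator to the right-hand neighbor.
        return [(self.core_id + 1, self.acc)]


def run_lockstep(num_cores, cycles, order=None):
    """Simulate `cycles` target cycles, synchronizing all cores after each one."""
    cores = [Core(i) for i in range(num_cores)]
    order = list(range(num_cores)) if order is None else list(order)
    for _ in range(cycles):
        # Phase 1: every core runs one cycle in isolation (this is the parallelizable part).
        outgoing = {i: cores[i].step() for i in order}
        # Phase 2: barrier. Messages are delivered in a fixed order that does not depend on `order`.
        for src in range(num_cores):
            for dest, value in outgoing[src]:
                cores[dest % num_cores].inbox.append(value)
    return [core.acc for core in cores]


if __name__ == "__main__":
    print(run_lockstep(4, 2))
    print(run_lockstep(4, 2, order=[3, 1, 0, 2]))
```

<p>Both calls in the example should print the same accumulator values, since the execution order inside a cycle is irrelevant once all communication is deferred to the barrier. That independence is what would let each core map to its own GPU thread; the cost is one synchronization per simulated cycle.</p>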
<p>After reading the article, I am impressed by their work &#8211; just getting this to run is quite an achievement. But there are quite a few questions which are not really answered in the article and which are crucial to understanding just how well GPGPUs could be used for this kind of ISS work.</p>
<ul>
<li>The targeted level of abstraction is a bit confusing. The authors claim it is &#8220;instruction accurate and not cycle accurate&#8221;, but still simulate caches and cycle-based communications across the NoC. If I read the paper right, communications will take a varying number of cycles depending on the distance for messages to travel. This is more detailed than a typical &#8220;instruction accurate&#8221; simulator.</li>
<li>The target system does not run an OS &#8211; that might (but I do not know) be an advantage for their approach, since it probably implies less variation in the instruction flow in cores, potentially enhancing the amount of time that all ISSes in a thread group in the GPU can execute the same instruction. This would seem crucial, as if each ISS was running a totally different program, the instruction execution part of the code would be running serialized.</li>
<li>They should really try to run the same kind of simulation on a high-end x86 CPU like an Intel Sandy Bridge with 8 or more hardware threads. I wonder if their scaling might not work just as well there &#8211; and with a much faster serial execution engine. This should give a much more relevant point of comparison for GPU vs CPU execution of the simulator than&#8230;</li>
<li>the comparison object they use right now, a JIT-accelerated multicore simulation using OVP, which seems pretty irrelevant since it is not doing the same thing at all. That simulator does not simulate the caches or NoC, just a large number of isolated processors. They also do not run a parallel program on OVP, but rather a large number of single-core Fibonacci and Dhrystone programs. Thus, the fact that OVP uses a large temporal decoupling time slice does not matter for semantics. It just does not seem like a very relevant comparison point. OVP and their simulator try to solve different problems &#8211; fast execution of general code vs. performance profiling of massively parallel machines.</li>
<li>As I understand it, the given &#8220;S-MIPS&#8221; numbers in the evaluation tell us the total number of MIPS that we get out across all target cores. That seems to peak around 2000 &#8211; which isn&#8217;t necessarily that fantastic if we compare to high-performance ISS work in general where a few GIPS is definitely achievable. It is pretty good considering the level of detail here, though, where I would expect a normal ISS + cache simulator to produce at most a few MIPS. Once again, the authors need to be a bit more precise as to what they compare to what.</li>
<li>Not having an MMU and not implementing any interrupts or exceptions in the target machines avoids a large part of the complexity of a real ISS. That complexity might well be too much for the quite rigid execution environment of a GPGPU.</li>
<li>They missed that Simics, unique among instruction-accurate mainstream simulators, has been <a href="http://jakob.engbloms.se/archives/128">parallel</a> since version 4.0.</li>
</ul>
<p>So, overall, this paper does not really tell us much about whether a GPGPU can be used for instruction-set simulation in general. It does tell us that it might be doable, but there are many crucial complications which are not addressed.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1506/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Adversarial Approach to Compilation</title>
		<link>http://jakob.engbloms.se/archives/1504?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1504#comments</comments>
		<pubDate>Sun, 02 Oct 2011 19:07:55 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[ACM Queue]]></category>
		<category><![CDATA[Communications of the ACM]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1504</guid>
		<description><![CDATA[Poul-Henning Kamp has written a series of columns for the ACM Queue and Communications of the ACM. He is pointed, always controversial, and often quite funny. One recent column was called &#8220;The Most Expensive One-Byte Mistake&#8221;, which discusses the bad design decision of using null-terminated strings (with the associated buffer overrun risks that would have [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2008/08/acm-queue-logo.gif"><img class="alignleft size-full wp-image-249" style="margin: 5px 10px;" title="acm-queue-logo" src="http://jakob.engbloms.se/wp-content/uploads/2008/08/acm-queue-logo.gif" alt="" width="150" height="91" /></a>Poul-Henning Kamp has written a series of columns for the <a href="http://queue.acm.org">ACM Queue</a> and <a href="http://cacm.acm.org/">Communications of the ACM</a>. He is pointed, always controversial, and often quite funny. One recent column was called &#8220;<a href="http://queue.acm.org/detail.cfm?id=2010365">The Most Expensive One-Byte Mistake</a>&#8221;, which discusses the bad design decision of using null-terminated strings (with the associated buffer overrun risks that would have been easily avoided with a length+data-style string format). Well worth a read. A key part of the article is the dual observation that compilers are starting to try to solve the efficiency problems of null-terminated strings &#8211; and that such heavily optimizing compilers are quite often very hard to use.</p>
<p><span id="more-1504"></span>He has a wonderful line from an anonymous Convex C3800 programmer who described his experience with the optimizing compiler as &#8220;<em>having to program as if the compiler was my ex-wife&#8217;s lawyer</em>&#8221;. I can recognize and understand the sentiment. The C language, in particular, is full of corner cases where behavior is &#8220;undefined&#8221; or &#8220;implementation dependent&#8221;. The exact behavior of some statements is also pretty complex &#8211; and as a compiler writer, you end up having to work with the rules of the language like a lawyer works the law &#8211; the thinking is not all that different. And if you take this approach to the extreme, the description of the compiler as an adversarial lawyer is spot-on.</p>
<p>Unfortunately, it is quite easy in C (and trivial in C++) to wander into such territory without noticing and without doing anything that is obviously wrong for an average programmer. For example, <a href="http://jakob.engbloms.se/archives/1489">the program I described in my recent post about a bug demo</a> is strictly speaking using undefined behavior, and the compiler is theoretically free to generate a program that reformats the hard drive or does nothing at all. In practice, I don&#8217;t think compiler designers strive to generate truly creative unexpected results. Rather, if the compiler detects an unclear statement in the program, it would emit a warning.</p>
<p>Still, I can identify with this kind of adversarial relationship with a compiler. It really feels like being in a game with a <a href="http://en.wikipedia.org/wiki/Rules_lawyer">rules lawyer</a>, where you read the rules (programming language book) and try to figure out how to play them better than your opponent.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1504/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>EETimes Articles on Simics</title>
		<link>http://jakob.engbloms.se/archives/1500?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1500#comments</comments>
		<pubDate>Fri, 23 Sep 2011 19:26:38 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[EETimes]]></category>
		<category><![CDATA[Simics]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1500</guid>
		<description><![CDATA[I just had two articles published in the Embedded Design part of the EETimes. First, &#8220;Rethink your project planning with a virtual platform&#8221;, which talks about how virtual platforms can change your entire project planning. Essentially, by reducing project friction and risks related to hardware availability, software integration, and show-stopper bugs, you can make projects work [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2008/07/eetimes.png"><img class="alignright size-full wp-image-155" title="eetimes logo" src="http://jakob.engbloms.se/wp-content/uploads/2008/07/eetimes.png" alt="" width="127" height="56" /></a><a href="http://jakob.engbloms.se/wp-content/uploads/2011/09/simics-logo.png"><img class="alignleft size-full wp-image-1501" style="margin: 5px 10px;" title="simics logo" src="http://jakob.engbloms.se/wp-content/uploads/2011/09/simics-logo.png" alt="" width="44" height="44" /></a>I just had two articles published in the Embedded Design part of the <a href="http://www.eetimes.com">EETimes</a>.</p>
<p>First, &#8220;<a href="http://www.eetimes.com/design/embedded/4226939/Rethink-your-project-planning-with-a-virtual-platform?Ecosystem=embedded">Rethink your project planning with a virtual platform</a>&#8221;, which talks about how virtual platforms can change your entire project planning. Essentially, by reducing project friction and risks related to hardware availability, software integration, and show-stopper bugs, you can make projects work much better.</p>
<p>Then we have &#8220;<a href="http://www.eetimes.com/design/embedded/4227781/Transporting-bugs-with-virtual-checkpoints?Ecosystem=embedded">Transporting bugs with virtual checkpoints</a>&#8221;, which is a shorter, popular-science version of the paper I published last year at <a href="http://jakob.engbloms.se/archives/1231">S4D</a>. This describes how you can use checkpointing in a virtual platform to communicate bugs across time, space, and teams.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1500/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Nvidia &#8220;Kal-El&#8221; Variable SMP</title>
		<link>http://jakob.engbloms.se/archives/1496?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1496#comments</comments>
		<pubDate>Fri, 23 Sep 2011 19:16:33 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[multicore computer architecture]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1496</guid>
		<description><![CDATA[Nvidia recently announced that their already-known &#8220;Kal-El&#8221; quad-core ARM Cortex-A9 SoC actually contains five processor cores, not just four as a &#8220;normal&#8221; quad-core would. They call the architecture &#8220;Variable SMP&#8221;, and it is a pretty smart design. The kind where you think, &#8220;I should have thought of that&#8221;, which is the best sign of something [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/09/nvidia-logo.jpg"><img class="alignleft size-full wp-image-1497" style="margin: 5px 10px;" title="nvidia logo" src="http://jakob.engbloms.se/wp-content/uploads/2011/09/nvidia-logo.jpg" alt="" width="48" height="48" /></a>Nvidia <a href="http://blogs.nvidia.com/2011/09/quad-core-kal-el’s-stealth-fifth-core-lets-it-save-on-energy/">recently announced</a> that their already-known &#8220;Kal-El&#8221; quad-core ARM Cortex-A9 SoC actually contains five processor cores, not just four as a &#8220;normal&#8221; quad-core would. They call the architecture &#8220;Variable SMP&#8221;, and it is a pretty smart design. The kind where you think, &#8220;I should have thought of that&#8221;, which is the best sign of something truly good.</p>
<p><span id="more-1496"></span>It is common practice in multicore computing today to dynamically change the clock frequency of a processor and turn cores on and off in order to adjust the compute power available to the current workload. Such operations tend to be limited in scope, as processors have minimum clock frequencies that make sense, and often the memory system requires all cores to be at the same frequency. Operating systems also tend to want to work with homogeneous sets of cores, as that makes scheduling reasonably straightforward. This is probably what has kept the idea of &#8220;small + large&#8221; cores of the same ISA out of the mainstream of SMP design, despite all its advantages in principle.</p>
<p>Now, Nvidia has managed to implement some of that idea in Kal-El.</p>
<p>The key observation is that if you can turn cores on and off, once you get down to a single active core, any system is by definition homogeneous across all cores regardless of what that core is. Changing the nature of this core should then be much easier, since there is only a single core to contend with.</p>
<p>What Nvidia does in Kal-El is to add a fifth low-power core to the main group of four high-performance cores. The fifth core is architecturally identical (ARM Cortex-A9), so that the system state can be moved between the high-performance cores and the low-power core without undue complexity. Indeed, this is all done in hardware, so the OS (typically, Android) thinks it is running on a homogeneous quad-core. When the system is lightly loaded and the OS decides to keep only a single core on, the hardware can detect that the load is <em>really</em> light, and effectively change the nature of the active core to a low-power-optimized version.</p>
<p>Once more compute power is needed, the hardware invisibly slips back to the first high-power core, and then the OS can start increasing clocks and turning on cores as usual. It is effectively the same as a regular ARM Cortex-A9 quad-core setup, but with better low-power performance. The following graph from the Nvidia <a href="http://www.nvidia.com/content/PDF/tegra_white_papers/tegra-whitepaper-0911b.pdf">white paper</a> shows it pretty clearly (red text is my added comment):</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/09/tegra-1.png"><img class="aligncenter size-full wp-image-1498" title="tegra kal-el power curve" src="http://jakob.engbloms.se/wp-content/uploads/2011/09/tegra-1.png" alt="" width="655" height="446" /></a></p>
<p>Note the slope of the green line: that core is not a good one if you want high performance. It is optimized to scale within a range of low compute-power requirements, rather than provide the best performance per watt at the high end. Using Variable SMP, Nvidia lets us have both.</p>
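<p>The switching rule described above is simple enough to sketch in a few lines of C. This is only an illustration of the idea, not Nvidia's implementation: the real decision is made in hardware, and the function name, the load measure, and the threshold parameter are all assumptions made for the example.</p>
<pre>/* Which physical core backs the single core the OS has left on. */
enum backing_core { LOW_POWER_CORE, HIGH_PERF_CORE };

/* Hypothetical policy; light_load_percent is an assumed threshold. */
enum backing_core choose_backing_core(int os_cores_online,
                                      int load_percent,
                                      int light_load_percent)
{
  /* More than one core on: the OS must see identical cores */
  if (os_cores_online &gt; 1) {
    return HIGH_PERF_CORE;
  }
  /* One core on: use the low-power core only for really light load */
  if (load_percent &lt; light_load_percent) {
    return LOW_POWER_CORE;
  }
  return HIGH_PERF_CORE;
}</pre>
<p>With four cores online the result is always the high-performance core; with one core online, the answer depends only on whether the load is below the threshold.</p>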
<p>Neat.</p>
<p>More reading:</p>
<ul>
<li><a href="http://arstechnica.com/gadgets/news/2011/09/tegra-3-includes-5th-stealth-core-to-optimize-power-efficiency.ars">ArsTechnica</a> has a short summary</li>
<li>There does not seem to be much more right now; everyone is really just reiterating the points from the white paper.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1496/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Wind River Blog: Stop, Think, and Tie Your Shoes Right</title>
		<link>http://jakob.engbloms.se/archives/1492?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1492#comments</comments>
		<pubDate>Wed, 21 Sep 2011 18:09:13 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[business issues]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Wind River Blog]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1492</guid>
		<description><![CDATA[There is a new post at my Wind River blog, which might seem to be about shoes but which is really about process improvement. In particular, it is about the need for companies to let their employees take a step or two back and look at what they are doing and what they could do better. It is [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="Wind River Logo" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" /> <a href="http://jakob.engbloms.se/wp-content/uploads/2011/09/shoes-2.jpg"><img class="alignright size-full wp-image-1494" style="margin: 5px 10px;" title="shoes 2" src="http://jakob.engbloms.se/wp-content/uploads/2011/09/shoes-2.jpg" alt="" width="333" height="300" /></a> There is a <a href="http://blogs.windriver.com/tools/2011/09/stop-think-and-tie-your-shoes-right.html">new post</a> at my Wind River blog, which might seem to be about shoes but which is really about process improvement. In particular, it is about the need for companies to let their employees take a step or two back and look at what they are doing and what they could do better.</p>
<p>It is way too common to be so busy running around being inefficient that there is no time to think about how to become more efficient. Change also requires some discipline to actually keep pushing at habits until they change for the better.</p>
<p><a href="http://blogs.windriver.com/tools/2011/09/stop-think-and-tie-your-shoes-right.html">All of this can be illustrated by tying shoes.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1492/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>My Bug Doesn&#8217;t Work!</title>
		<link>http://jakob.engbloms.se/archives/1489?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1489#comments</comments>
		<pubDate>Wed, 14 Sep 2011 03:27:07 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[embedded software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[bugs]]></category>
		<category><![CDATA[compilers]]></category>
		<category><![CDATA[demo]]></category>
		<category><![CDATA[VxWorks]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1489</guid>
		<description><![CDATA[Every once in a while I need to build demo setups to show debugging in action. As I have blogged before, finding a good bug when you need one isn&#8217;t always easy.  The solution is to try to invent artificial bugs, and I was very happy when I managed to stage a buffer overrun in [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/10/butterfly.png"><img class="alignleft size-full wp-image-982" title="butterfly" src="http://jakob.engbloms.se/wp-content/uploads/2009/10/butterfly.png" alt="" width="90" height="91" /></a>Every once in a while I need to build demo setups to show debugging in action. As I have blogged before, <a href="http://jakob.engbloms.se/archives/975">finding a good bug when you need one isn&#8217;t always easy</a>.  The solution is to try to invent artificial bugs, and I was very happy when I managed to stage a buffer overrun in a VxWorks program.</p>
<p>It is a very nice demo in which you first start a periodic program A, which prints the value of an incrementing counter every target second.  You then run a supposedly unrelated program B, which causes the values that program A prints to become corrupted.  Perfect for showing off reverse execution and data breakpoints in reverse as you go from the point where the corrupted value is printed to the piece of code that overwrote the variable.</p>
<p>But then I ported the demo to a new platform&#8230; and the bug didn&#8217;t work anymore. My bug had caught a bug and was now not working, or at least not the way I expected it to. What had happened?<br />
<span id="more-1489"></span><br />
Very simple. I changed the compiler I used. Since my bug relied on something the C standard does not specify, namely how global variables are laid out in memory, the change was totally valid and really to be expected.  Still, it was interesting to see how things played out&#8230; in the end, we got a different bug from the same code thanks to the change.</p>
<p>The code is essentially the following, with some simplifications that make it easier to read for those not familiar with VxWorks, and ignoring all the code to start tasks initially.</p>
<pre>// Global variables
int     iDataArray[100];
int     myWdISRcount;
WDOG_ID myWatchDogId;

// Periodic task - program A
void myWdISR(void)
{
  /* Increment ISR invocation count */
  myWdISRcount = myWdISRcount+1;
  printf("wd Fired %d times\n",myWdISRcount);

  /* Start off next invocation */
  wdStart (
    myWatchDogId,
    WD_INTERVAL,
    (FUNCPTR) myWdISR,
    (int) NULL
  );   
}

// Overwrite code - program B
int myCompletelySafeRoutine(void)
{
  uint32_t *a;
  uint32_t  i;
  // View the int array as 32-bit words
  a = (uint32_t *) iDataArray;
  // Deliberate bug: the &lt;= makes this loop write one
  // word beyond the limit of iDataArray
  for (i=0; i&lt;=100; i++) {
    a[i] = 0x7fffffff;
  }
  return OK;
}</pre>
<p>In the original setup, compiled for a Power Architecture target, iDataArray ended up right before myWdISRcount in memory.  Thus, the buffer overflow changed the value of the counter from something like 10 to 0x7fffffff.  Very noticeable in the printouts from the periodic task.</p>
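<p>A quick way to see which layout a given build produced is to measure the gap between the end of the array and the variable you hope to hit. The helper below is a minimal sketch; gap_after_array is a made-up name, and converting pointers to integers like this is implementation-defined, although it behaves as expected on flat-memory targets.</p>
<pre>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

/* Byte distance from the end of array[0..n-1] to obj.
   0 means obj starts exactly where the array ends;
   a negative value means obj lies before that point. */
intptr_t gap_after_array(const int *array, size_t n, const void *obj)
{
  uintptr_t end = (uintptr_t) (array + n);
  return (intptr_t) ((uintptr_t) obj - end);
}</pre>
<p>In the demo, printing gap_after_array(iDataArray, 100, &amp;myWdISRcount) would give 0 for a layout like the Power Architecture one described above, and something else when the variables have been placed differently.</p>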
<p>When I changed to an x86 target (using a compiler from the same family, but obviously with a different code generator since the target was different), the variable order in memory changed and it seems that we got iDataArray placed last.  Suddenly, the effect of running the safe code was that nothing happened at all.  A bit annoying for a demo.  Some small source-code changes and a recompile later, the effect was instead to crash the target with a triple fault (page fault inside a page fault handler). Seems the program now managed to corrupt some kernel state. While impressive as a bug, it was not quite what I was looking for.</p>
<p>I then changed the compiler type to compiler 2, and the data layout changed once more.  This produced a very useful bug, but it took me a while to actually realize it.  Now, when program B was run, program A stopped.  This looked like a bug in my program, and I actually started trying to fix it &#8211; until I realized that this was the bug I was looking for.  Having program B kill program A is just as good a bug as corrupting a counter value, after all.  In this case, the array overrun hits the myWatchDogId variable, and when that gets corrupted, the wdStart call ignores the request since it does not recognize the ID it gets.</p>
<p>So, in the end, I got a bug that was just as good as the first, and arguably a bit more intriguing. It is still obviously a contrived example &#8211; but I think that a good demo or lab exercise can be artificial as long as it gets the point across.  Judging from how people who have done the lab react, the goal seems to have been achieved.</p>
<p>The moral of the story is really that compilers are free to change things which are explicitly implementation-defined or not specified at all in the C standard. That is a good thing as it gives the compiler freedom to optimize the code. If you want to control how variables are laid out in memory, I guess you have to resort to linker scripts or similar &#8211; but that was too much pain for me in this case. Just changing things around until I got a good bug, and then freezing the binary (and not recompiling it ever again) is a sufficient strategy.</p>
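<p>For a demo that has to survive recompilation, a lighter alternative to linker scripts is to put the array and its victim in the same struct, since C lays out struct members in declaration order. This is a sketch of that idea rather than what the lab actually uses; demo_state and fill_words are hypothetical names, and the static assertion (C11) guards the assumption that the compiler inserts no padding between the two members.</p>
<pre>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

#define DATA_WORDS 100

struct demo_state {
  int32_t data[DATA_WORDS];
  int32_t counter;          /* always placed after data */
};

_Static_assert(offsetof(struct demo_state, counter)
               == DATA_WORDS * sizeof(int32_t),
               "counter must directly follow data");

/* Writes nwords 32-bit words starting at the beginning of *s.
   nwords == DATA_WORDS + 1 spills exactly one word into counter. */
void fill_words(struct demo_state *s, size_t nwords, int32_t value)
{
  unsigned char *bytes = (unsigned char *) s;
  size_t i;
  for (i = 0; i &lt; nwords; i++) {
    memcpy(bytes + i * sizeof value, &amp;value, sizeof value);
  }
}</pre>
<p>Going through a byte pointer over the whole struct keeps the write within one object, so the overrun behaves the same with any compiler. The price is that the bug is less surprising to someone reading the source.</p>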
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1489/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Wind River Blog: Surfing the Web with Netscape 4</title>
		<link>http://jakob.engbloms.se/archives/1485?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1485#comments</comments>
		<pubDate>Wed, 14 Sep 2011 03:06:25 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[history of computing]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[Wind River Blog]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1485</guid>
		<description><![CDATA[Just for fun, I tried to surf the web of today using a Netscape 4 browser from 2001. The result: not exactly useful. Netscape 4 was bad back then, and it does not work at all with the current style of web coding.]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="Wind River Logo" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" />Just for fun, I <a href="http://blogs.windriver.com/engblom/2011/09/surfing-the-web-with-netscape-4-1.html">tried to surf the web of today using a Netscape 4 browser</a> from 2001.</p>
<p>The result: not exactly useful. Netscape 4 was bad back then, and it does not work at all with the current style of web coding.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1485/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

