<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Observations from Uppsala &#187; programming</title>
	<atom:link href="http://jakob.engbloms.se/archives/category/programming/feed" rel="self" type="application/rss+xml" />
	<link>http://jakob.engbloms.se</link>
	<description>Computer Technology: Simulation, Virtualization, Virtual Platforms, Embedded, Multicore and Multiprocessing (by Jakob Engblom)</description>
	<lastBuildDate>Sun, 29 Jan 2012 19:45:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<image>
    <title>Observations from Uppsala</title>
    <url>http://jakob.engbloms.se/favicon.png</url>
    <link>http://jakob.engbloms.se</link>
    <width>32</width>
    <height>32</height>
    <description>Observations from Uppsala - http://jakob.engbloms.se</description>
    </image>		<item>
		<title>Reverse History Part Three &#8211; Products</title>
		<link>http://jakob.engbloms.se/archives/1564?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1564#comments</comments>
		<pubDate>Sun, 08 Jan 2012 19:51:57 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[history of computing]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[gdb]]></category>
		<category><![CDATA[Green Hills]]></category>
		<category><![CDATA[Lauterbach]]></category>
		<category><![CDATA[Multi]]></category>
		<category><![CDATA[reverse debug]]></category>
		<category><![CDATA[reverse execution]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[TotalView]]></category>
		<category><![CDATA[UndoDB]]></category>
		<category><![CDATA[VMWare]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1564</guid>
		<description><![CDATA[In this final part of my series on the history of reverse debugging I will look at the products that launched around the mid-2000s and that finally made reverse debugging available in a commercially packaged product and not just research prototypes. Part one of this series provided a background on the technology and part two [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-icon.png"><img class="alignleft size-full wp-image-1550" title="reverse icon" src="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-icon.png" alt="" width="62" height="62" /></a>In this final part of my series on the history of reverse debugging I will look at the products that launched around the mid-2000s and that finally made reverse debugging available in a commercially packaged product and not just research prototypes. <a href="http://jakob.engbloms.se/archives/1547">Part one </a>of this series provided a background on the technology and <a href="http://jakob.engbloms.se/archives/1554">part two </a>discussed various research papers on the topic going back to the early 1970s. The first commercial product featuring reverse debugging was launched in 2003, and then there have been a steady trickle of new products up until today.</p>
<p><span id="more-1564"></span></p>
<p><strong>2003</strong>. The embedded tools company Green Hills launched their<br />
<a href="http://www.ghs.com/news/20030930_best_of_show.html">Time Machine</a> feature in their well-known MULTI debugger. I consider this the start of commercial reverse debugging, as it was the first<br />
commercial-grade product to include reverse debugging. The implementation was based on tracing the execution of a program on actual hardware, using a debug probe and a &#8220;JTAG&#8221; debug interface. The trace box would capture several gigabytes of execution data, and then the debugger performed operations based on this trace. To check a backwards breakpoint, you scan back over the trace until you find a matching state or operation (such a memory access or instruction address that is being executed). The main limitation of the method is that the trace buffer can only capture a few seconds of execution on a typical 100s of MHz embedded processor. It only works for a single processor, and it does not capture IO actions (except as memory-mapped IO). It is system-level and cross-target.</p>
<p>Extending this kind of trace to multicore has proven hard, since getting a synchronized trace out of several processors is very hard. There might be debug hardware coming out in the next few years that can indeed support a time-stamped consistent trace of multiple cores, and with such hardware, the Time Machine approach could well be extended into multicore.</p>
<p><strong>2005</strong>. <a href="http://www.windriver.com/products/simics/">Simics </a>3.0 was launched by Virtutech (later acquired by Wind River and Intel) with full-system reverse execution and reverse debugging. The Simics approach was also unique, being based on a full-system simulator. By simulating the entire target, it is trivial to reverse (and put reverse breakpoints on) changes to memory, persistent storage like disks, and hardware devices. Since all device models in the simulator are deterministic in their implementation, re-executing hardware events like interrupts and IO outputs is just as easy as re-executing code on the main processor, something that had eluded all previous approaches. Recording is used at the interface between the simulator and the outside world, such as user interaction over graphics displays and serial ports and connections to the real-world network. The software stack is unmodified and system-level, and the simulator can handle multiple processors and even multiple machines in a network as a unit. The use case is normally cross-target (even if a system identical to the host can be simulated, it would work like a cross target logically). Time is handled by counting clock cycles on all processors in the system, and reverse debugging can position the simulation at any point in time based on the virtual time.</p>
<p>There is a cost in execution speed from simulation rather than direct execution, and an intrusion effect from running on a simulator rather than on a physical machine. This affects the <a href="http://jakob.engbloms.se/archives/97">timing of events</a>, even with a software stack that is not modified. Still, the fact that you can run a complete real software stack with no modifications needed before starting to run the target system is fairly rare in the world of reverse debuggers.<strong></strong></p>
<p><strong></strong>Simics shipped with a modified gdb that talked gdb serial to Simics and accessed reverse execution with some new debugger commands as well as extensions to the gdb serial protocol. This was offered to the gdb community, but not accepted. However, prompted by this, the gdb community started to discuss reverse execution. Some interesting old threads can still be found, such as <a href="http://sourceware.org/ml/gdb/2005-05/msg00225.html">http://sourceware.org/ml/gdb/2005-05/msg00225.html</a>. Clearly, at that point in time Virtutech did not really explain how Simics worked, and there were some pretty bad proposals floated in the community for how to do reverse. In the end, the gdb serial design did turn out in the right way, assuming<br />
the remote debugger would reverse itself and <a href="http://sourceware.org/ml/gdb/2005-05/msg00235.html">gdb would just ask it to do so</a>. This separation of concerns is important to creating practical reverse debugging solutions that can use any debugger backend.</p>
<p><strong>2005</strong>. Also in 2005, Lauterbach launched the <a href="http://www.lauterbach.com/cts.html">Context Tracking System, CTS</a>. Lauterbach is a big player in the embedded debug market, with their TRACE32 debugger. CTS can be seen as their reply to the Time Machine debugger. CTS is also based on a trace from a hardware unit or from an instruction-set simulator. However, from the available information is also appears to be more limited &#8211; you can step back and go back in time and replay forward, but there is no mention of actual backwards breakpoints (even today, six years later).  Thus, I count this as record-replay rather than reverse debug. It is cross-target, system-leve, and uniprocessor like Time Machine.</p>
<p><strong>2006</strong>. <a href="http://undo-software.com/undodb_about.html">Undo Software </a>launched the first Linux-targeting host-based reverse debugger, <a href="http://undo-software.com/pressrelease-1.html">UndoDB</a>. It is described as a <em>bidirectional</em> debugger (the same terminology as the Boothe 2000 PLDI paper). It is user-level, does do reverse breakpoints (and data breakpoints, also known as watchpoints, which is really useful). It handles multiple threads (at least in 3.0), but from the description of the recording technology used I believe they have to serialize their execution. The implementation is based on checkpoint and re-execution, with recording of all non-deterministic events like IO. There is a feature to move to a certain point in time, based on &#8220;simulated nanoseconds&#8221;. These are not really nanoseconds, but values which are guaranteed to increase even between two instructions (which probably means that they are sub-nanoseconds and on a &gt; 1GHz CPU single-cycle instructions will indeed take less than one nanosecond).</p>
<p>There is a nice description of how it works on their <a href="http://www.undo-software.com/undodb-gdb.1.html">online man page</a>. It is worth noting that they call it &#8220;gdb&#8221;, but the command set is distinct from what gdb introduced with its reverse execution in 2009. They use the &#8220;b&#8221; prefix for backwards commands rather than &#8220;r&#8221; for reverse.  In some way, UndoDB is in direct competition with the gdb reverse target, but it is much much faster and has more features.</p>
<p><strong>2008</strong>. The Rogue Wave (at the time, it was an independent company) TotalView debugger gained support for reverse debugging, with the <a href="http://www.roguewave.com/products/totalview-family/replayengine.aspx">ReplayEngine </a>add-on. TotalView is an old mainstay in the HPC market, having been around since <a href="https://computing.llnl.gov/tutorials/totalview/#Overview">at least 1993</a>. Indeed, it was developed initially for the <a href="http://en.wikipedia.org/wiki/BBN_Butterfly">BBN Butterfly computer</a>, and thus it might have had a touch with reverse execution as far back as the 1987 paper cited in my <a href="http://jakob.engbloms.se/archives/1554">previous blog post</a>.</p>
<p>Judging from <a href="http://www.roguewave.com/documents.aspx?Command=Core_Download&amp;EntryId=739">the available materials</a>, TotalView can clearly can step back in various ways. However, it is not clear that it triggers breakpoints when going backwards. Thus, it has to count as record-replay debugging rather than reverse debugging. The base of the implementation is extensive instrumentation of the the runtime system of the target computer.  The implementation builds on the fact that the target programs tend to b clustered programs that use MPI to communicate &#8211; and thus a large part of the communication between threads is explicit and easily intercepted and recorded.  There is also an existing infrastructure of checkpoint and restart for parallel programs using MPI to support fault tolerance that was used as the base of the implementation.  Finally, in a slightly ugly hack, they make each multi-threaded program run on a single processor by a big lock. In this way, all that needs to be replayed is the interleaving of threads on a single processor, a far more tractable problem compared to trying to replicate a true parallel execution in a new session.</p>
<p><strong>2008</strong>. VmWare officially launched a record-replay debugger based on their virtual machine technology with <a href="http://www.replaydebugging.com/2008/08/vmware-workstation-65-reverse-and.html">VmWare Workstation 6.5</a>. Single-processor, system-level (but really only supported for user-level debugging), cross target (since the VM is not really the absolutely same hardware as the host), time model is based on the virtual machine which I believe is cycles-based. Mostly used for record-replay debug of non-deterministic software bugs, but could also do reverse debugging including reverse data breakpoints. Based on snapshot and deterministic re-execution, plus recording of all non-deterministic device accesses (not all devices in the VmWare hardware emulation layer are deterministic). Going back to a snapshot was a very heavy operation (I tried it) since you had to restore the entire target memory (quickly got into gigabytes). The hardware supported in the VM was quite limited, and things like CD-ROMs and floppies could not be part of a record/replay session. Replay logs could be moved between hosts.</p>
<p>The VmWare reverse debug functionality was removed from VmWare workstation version 8 in 2011, since it required a large investment and was not apparently used by very many VmWare users. This indicates that trying to build developer-oriented functionality into a technology base that is fundamentally driven by the need of deployed virtual machines was hard. There are contradictions between these two goals, as the determinism and control needed for a good reverse debugger is not necessarily consistent with maximum performance for running virtual machines in a production setting.</p>
<p><strong>2009</strong>. gdb 7.0 added support for reverse execution (a work that began in 2006). The built-in &#8220;record&#8221; target supports reverse debugging on user-level single-threaded programs on the same host. The command set for reverse debugging is fairly full-featured, but is a bit quirky with a &#8220;<a href="http://sourceware.org/gdb/news/reversible.html">set direction</a>&#8221; command that makes regular run-control commands work in reverse. The record technology is quite slow since it basically records the effect of each and every instruction run in the program.</p>
<p>In addition to its built-in target, gdb can also control external reversible debug systems over the gdb serial protocol. This made the changes to gdb-serial created by Virtutech for Simics in 2005 part of the mainline gdb release. <a href="http://sourceware.org/gdb/news/reversible.html">Several tools support the command set</a>, including VmWare, UndoDB, and Simics. There was also a set of MI commands added to basically let Eclipse use gdb as a backend for reverse debug, including using it to control external tools via gdb-serial. How this happened is quite a long story, and I made a small contribution to the gdb code base myself in the process. Read about this <a href="http://jakob.engbloms.se/archives/1065">here</a>.</p>
<p><strong>2009</strong>. Eclipse CDT added support for <a href="(http://www.eclipse.org/community/training/webinars/090526_CDT_Webinar.pdf">reverse execution</a>, using gdb 7.0 reverse as the initial backend. As noted above, this lets Eclipse also use other reverse debugging backends (Eclipse uses the gdb-MI interface to gdb to control the debug session). This is noteworthy since it meant that the buttons to control reverse execution are now part of the CDT, making it much easier to use Eclipse to build a frontend to any reversible backend. Eclipse is not really a debugger, just an interface to a debugger.</p>
<p><strong>2009</strong>. Microsoft Visual Studio <a href="http://blogs.msdn.com/b/ianhu/archive/2009/05/13/historical-debugging-in-visual-studio-team-system-2010.aspx">got record-replay debugging with IntelliTrace</a>. It is strictly about replay debugging, including the nice ability to send traces around between developers. There are no backwards breakpoints. The support is limited to programs running on top of the .net runtime system, meaning that <a href="http://msdn.microsoft.com/en-us/library/dd264915.aspx">it does not apply to classic Windows software</a>. Using the CLR virtual machine as the implementation basis should make the implementation easier, cleaner, and faster compared to a machine-level native solution. It is user-level, single-threaded, and host-based. Time concept is unknown.</p>
<p><strong>2011</strong>. Adobe demonstrated (not launched) reverse debugging in their Flash Builder programming environment. A <a href="http://tv.adobe.com/watch/max-2011-sneak-peeks/max-2011-sneak-peek-reverse-debugging-in-flash-builder/">nice video is posted on the Adobe website</a>. Seems to be based on the virtual machine that flash runs on, and includes what looks like pretty powerful backwards data analysis tools. In a <a href="http://anirudhsasikumar.net/blog/2011.12.15.html">blog post</a>, the developer describes some of the features, which to me seem to indicate some pretty heavy recording.</p>
<p><strong>Final notes.</strong>In researching these commercial tools, there also seems to be a lost one. A company called Visicomp launched a Java debugger called RetroVue in 2002 which supposedly did allow backwards debugging in some way. However, it seems that this tool was not really practical, being too slow for actual use. It seems to have disappeared since without anyone picking up its legacy. The technology was apparently pretty much like the Omniscient Debugger presented in 2003 and which I described in the <a href="http://jakob.engbloms.se/archives/1554">blog post on reverse execution research</a>.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1564"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1564" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1564" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1564/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Reverse History Part Two &#8211; Research</title>
		<link>http://jakob.engbloms.se/archives/1554?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1554#comments</comments>
		<pubDate>Sun, 08 Jan 2012 19:42:59 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[history of computing]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Bil Lewis]]></category>
		<category><![CDATA[Bryce Cogswell]]></category>
		<category><![CDATA[Channing Brown]]></category>
		<category><![CDATA[Daniel Jacobowitz]]></category>
		<category><![CDATA[George Dunlap]]></category>
		<category><![CDATA[John Mellor-Crummey]]></category>
		<category><![CDATA[Mark Russinovich]]></category>
		<category><![CDATA[Marvin Zelkowitz]]></category>
		<category><![CDATA[Mireille Ducasse]]></category>
		<category><![CDATA[Murasa Bazrai]]></category>
		<category><![CDATA[omniscient debugger]]></category>
		<category><![CDATA[Paul Brook]]></category>
		<category><![CDATA[Peter Chen]]></category>
		<category><![CDATA[qemu]]></category>
		<category><![CDATA[reverse debugging]]></category>
		<category><![CDATA[reverse execution]]></category>
		<category><![CDATA[ReVirt]]></category>
		<category><![CDATA[Samuel King]]></category>
		<category><![CDATA[Stuart Feldman]]></category>
		<category><![CDATA[Sukru Cinar]]></category>
		<category><![CDATA[Tankgut Akgul]]></category>
		<category><![CDATA[Thomas LeBlanc]]></category>
		<category><![CDATA[TTVM]]></category>
		<category><![CDATA[Vincent Mooney]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1554</guid>
		<description><![CDATA[This is the second post in my series on the history of reverse execution, covering various early research papers. It is clear that reverse debugging has been considered a good idea for a very long time. Sadly though, not a practical one (at the time). The idea is too obvious to be considered new. Here [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-icon.png"><img class="alignleft size-full wp-image-1550" title="reverse icon" src="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-icon.png" alt="" width="62" height="62" /></a>This is the second post in my series on the history of reverse execution, covering various early research papers. It is clear that reverse debugging has been considered a good idea for a very long time. Sadly though, not a practical one (at the time). The idea is too obvious to be considered new. Here are some papers that I have found dating from the time before &#8220;practical&#8221; reverse debug which for me starts in 2003 (as well as a couple of later entrants).</p>
<p><span id="more-1554"></span></p>
<p>When searching through the literature using the ACM Digital Library (and some Yahoo and Google web searches), I find quite a large body of research literature dating to the late 2000s. Apparently, the appearance of commercial reverse debuggers have inspired research into the topic &#8211; and our increasingly powerful machines have made heavier solutions feasible (at least for single programs).</p>
<p>Disclaimer &#8211; this is not really an exhaustive survey of the field, but some of the more interesting papers that I have stumbled on.</p>
<p><strong>1973</strong>. The first actual mention of reverse debugging I have found is a short 1973 Communications of the ACM article called &#8220;<a href="http://dx.doi.org/10.1145/362342.362360">Reversible Execution</a>&#8221; by <a href="http://www.cs.umd.edu/~mvz/">Marvin Zelkowitz</a>. Basically, this paper proposed to apply a backtracking facility built into a PL/I compiler to support AI research (a precursor to Prolog execution semantics from what it sounds like) to also replay parts of a normal program for debugging. This in essence would enable replay of a program execution, based on a log of how the program ran and the values of variables changed. Basically, a forward-only record-replay solution for user-level single-threaded programs.</p>
<p><strong>1987</strong>. Thomas LeBlanc and John Mellor-Crummey, &#8220;<a href="http://dx.doi.org/10.1109/TC.1987.1676929">Debugging Parallel Programs with Instant Replay</a>&#8220;, IEEE Transactions on Computers. This paper covers record-replay debugging of <em>parallel</em> user-level programs on the <a href="http://en.wikipedia.org/wiki/BBN_Butterfly">BBN Butterfly</a>. They record all interaction between processes (threads) by intrumenting the OS calls that are used to communicate and synchronize between processes. Programs also sometimes had to be changed so that their communication protocols allowed the instrumentation to work properly. Thus, the approach did not apply to unchanged code either at source or binary level. Processes still compute their results, only the order or interactions are enforced. In this way, the overhead was kept usefully low. The approach does not work well for programs that deal with asynchronous interrupts and hardware interaction by the software, since there is no good way to hook and replay such events in their implementation.</p>
<p>The most important concept here is the use of re-execution of code to reconstruct values of a program, assuming deterministic computation as long as asynchronous interactions can be controlled. This is the basis of all reverse execution approaches based on checkpointing and re-execution. Not sure if they invented the idea, but this is the earliest distinct mention I have seen of this.</p>
<p><strong>1988</strong>. Stuart Feldman and Channing Brown, &#8220;<a href="http://dl.acm.org/citation.cfm?doid=69215.69226">IGOR: a system for program debug via reversible execution</a>&#8220;, 1988 ACM SIGPLAN and SIGOPS workshop on Parallel and distributed debugging (PADD) . This paper looks at the problem of how to debug user-level single-threaded code on a UNIX-like system. The programs need to be recompiled to add a small amount of support, in particular to help with restarting<br />
execution. It is essentially a record-replay approach, but capable of placing the execution of a program at any point in its past execution. They save regular checkpoints during program execution (every tenth of a second or so seemed appropriate in their testing). The checkpoints only include the data that has been changed since the previous checkpoint (they added an OS facility to mark all pages that a program writes between checkpoints). A unique feature is the ability to look at the value of a variable as it is stored in successive checkpoints, and to &#8220;go to the point in time where variable X has value V&#8221; (assuming a variable is monotonically increasing or decreasing, like an outermost loop counter). To do the detailed positioning, they use an interpreter for the target system, essentially mixing native and interpreted execution in their debugger.</p>
<p><strong>1996</strong>. Mark Russinovich (who I believe is the man behind the recent novel &#8220;<a href="http://www.zerodaythebook.com/">Zero Day</a>&#8220;) and Bryce Cogswell, &#8220;<a href="http://dl.acm.org/citation.cfm?doid=231379.231432">Replay for Concurrent Non-Deterministic Shared-Memory Applications</a>&#8220;, PLDI 1996. In this paper, instrumented system libraries and binary code modification is used to enable deterministic replay of multithreaded applications. The binary modification is used to implement an instruction counter to provide a logical time-base, and instrumented system libraries are used to force a serialized (but interleaved) execution of the program. It does not run the program in parallel on a multiprocessor, but rather runs it on  a single processor. User-level, instrumented and intrusive, parallel on single processor.</p>
<p><strong>2000</strong>. Bob Boothe, &#8220;<a href="http://dx.doi.org/10.1145/349299.349339">Efficient Algorithms for Bidirectional Debugging</a>&#8220;, PLDI 2000. I actually saw this paper being presented live at the PLDI conference in Vancouver. At the time, I was working on embedded compilers and debuggers, and our little group from <a href="http://www.iar.se">IAR </a>looked at what was being presented at said &#8220;yeah, might work on a big desktop, but you can&#8217;t do that in a real machine&#8221;. Why was that the case? Essentially, what Boothe did was to use the Unix fork call to spawn off checkpoints of a program as it was executing. This was a smart way to achieve checkpointing using existing OS facilities. In addition, he recorded the results of system calls to replay them during replay.</p>
<p>Time was given by counters in target program (compiled-into it as part of the debugging build). Most of the paper was spent on basic algorithms for how to do &#8220;previous&#8221; (step back 1 line), &#8220;before&#8221; (inverse of finish), &#8220;buntil&#8221; (backwards until, looking at data conditionals to determine when to stop). His use of &#8220;backwards&#8221; rather than &#8220;reverse&#8221; as the name of the concept does make this paper a bit harder to find. While limited to a single threaded user-level process, this paper is probably the first to introduce reverse debugging in the sense of the primary user interface being the ability to step backwards and check for breakpoints backwards in time. A good idea in the UI design was the <em>undo</em> command that jumped the debugger back to the last position where you stopped. Having worked with reverse debugging quite a lot myself, this operation makes a lot of sense as an addition to standard run-control commands.</p>
<p><strong>2002</strong>. Tankgut Akgul and Vincent Mooney, &#8220;<a href="http://dl.acm.org/citation.cfm?doid=634636.586101">Instruction-level Reverse Execution for Debugging</a>&#8220;, at PASTE 2002. This paper is an odd one, as it was neither really reverse nor practical. The approach is based on doing a static analysis of the assembly code of a program to generate code that could reverse operations based on some logging of destructive operations. Interesting, but not really practical as I see it. One noteworthy tidbit in the paper is the measurement that logging only destructive operations is some 50 x more efficient than logging all changes.</p>
<p><strong>2002</strong>. George Dunlap, Samuel King, Sukru Cinar, Murasa Bazrai, and Peter Chen, &#8220;<a href="http://dx.doi.org/10.1145/844128.844148">ReVirt: enabling intrusion analysis through virtual-machine logging and replay</a>&#8220;, at OSDI 2002. This paper uses the User-Mode Linux (UML) paravirtual machine to checkpoint and replay an operating system. I think this is the first use of a true virtual machine to achieve reverse execution, even though it is a paravirtual machine rather than a full virtual machine. The replay starts from a single checkpoint at the start of execution of the guest system, so moving to a certain point in time can be very time-consuming. It does support reverse breakpoints. To move to a certain point in time, they count executed instructions, not actual time. The use of a paravirtual solution precludes the use of many real device drivers and debugging issues related to interaction with most real devices. It is uniprocessor, system-level, unmodified target (except the change to make the OS itself run paravirtualized on top of UML).</p>
<p><strong>2003</strong>. Bil Lewis and Mireille Ducasse, &#8220;<a href="http://dx.doi.org/10.1145/949344.949367">Using events to debug Java programs backwards in time</a>&#8220;, OOPSLA 2003. This paper targets programs running on top of a Java Virtual Machine, resulting in an <em>Omniscient Debugger</em>. They instrument the target program to log all state changes. A big lock is used to serialize all threads into a single interleaved execution. The implementation is admitted to be almost worst-case inefficient, but the debugger was still considered useful for some real work. The debug UI does feature backwards breakpoints, so it qualifies as actual reverse debugging.</p>
<p>Note that in principle, backwards debugging and record-replay should be possible to implement quite efficiently for a language virtual machine like the JVM, if you can modify the VM itself. This has indeed been done today, for example by Microsoft for their CLR and in the use of full-system simulators (and other simulators) to implement reverse debugging.</p>
<p><strong>2005</strong>. Samuel King, George Dunlap, and Peter Chen, &#8220;<a href="http://www.usenix.org/events/usenix05/tech/general/king.html">Debugging Operating Systems with Time-Traveling Virtual Machines</a>&#8220;, USENIX 2005. This paper builds on the foundation of ReVirt, but adds checkpoints during a run to make the replay to a certain point in time more efficient. They also introduce the differential storing of disk state in the checkpointing (just like done in Simics). Overall, it is very similar to the replay approach taken by VmWare a few years later. Still single-processor. The paper contains a nice list of example bugs for which reverse debugging work much better than cyclic debugging. They also reference Simics as something that might do something with reverse, but do not quite know what (this is right after the launch of Simics reverse execution as discussed in the next post).</p>
<p><strong>2007</strong>. Paul Brook and Daniel Jacobowitz, &#8220;<a href="http://www.linuxsymposium.org/archives/GCC/Reprints-2007/jacobowitz-reprint.pdf">Reversible Debugging</a>&#8220;, GCC Developer&#8217;s Summit 2007. This paper presented some early work on implementing reverse debugging inside the Qemu full-system simulator. Logically, it follows in the steps of the Simics reversible debugger announced in 2005 (see the <a href="http://jakob.engbloms.se/archives/1564">third post </a>in this series of blogs). They use gdb as their frontend, and the paper introduced the idea of setting the default execution direction of the debugger. This UI choice was later adopted in the reversible gdb 7.0 released in 2009. The paper contains a good discussion of some of the practical issues involved in performing operations like reverse-step to go back a line, or stepping back up through a function call. It is a good explanation of the basic problems.</p>
<p>The actual reversing is implemented in Qemu using a checkpoint and deterministic replay, along with logging of any operation that is not deterministic. Their main test case is actually using the semihosting variant of Qemu where IO calls are intercepted and passed on to the host. Just like all other such solutions, they record the return values and do not repeat side-effects. It is system-level, single processor, unmodified target software, and time is based on counting executed instructions.</p>
<p>The Qemu architecture makes enforcing reversibility across all devices fairly difficult, since each device manages its own interaction with the user. One interesting tidbit in the paper is that Qemu by default triggers timer interrupts based on host time, but these had to be made fully virtual to be reversible.</p>
<p><strong>Summary</strong> The above is a small selection from all the papers published on this topic, but I think they capture the most common methods used. Basically, almost all of the approaches require some kind of change to the target software to enable reversibility. Interestingly, the first few commercial solutions for reverse debug that appeared did not take that approach, but rather targeted unmodified programs using a different set of fundamental techniques. See the next blog post for that part of reverse history.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1554"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1554" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1554" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1554/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Reverse History Part One</title>
		<link>http://jakob.engbloms.se/archives/1547?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1547#comments</comments>
		<pubDate>Sun, 08 Jan 2012 18:40:20 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[history of computing]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Checkpointing]]></category>
		<category><![CDATA[record-replay]]></category>
		<category><![CDATA[replay]]></category>
		<category><![CDATA[reverse debugging]]></category>
		<category><![CDATA[reverse execution]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1547</guid>
		<description><![CDATA[For some reason, when I think of reverse execution and debugging, the sound track that goes through my head is a UK novelty hit from 1987, &#8220;Star Trekkin&#8221; by the Firm. It contains the memorable line &#8220;we&#8217;re only going forward &#8217;cause we can&#8217;t find reverse&#8220;. To me, that sums up the history of reverse debugging [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-icon.png"><img class="alignleft size-full wp-image-1550" title="reverse icon" src="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-icon.png" alt="" width="62" height="62" /></a>For some reason, when I think of reverse execution and debugging, the sound track that goes through my head is a UK novelty hit from 1987, &#8220;<a href="http://en.wikipedia.org/wiki/Star_Trekkin%27">Star Trekkin</a>&#8221; by the Firm. It contains the memorable line &#8220;<a href="http://www.youtube.com/watch?v=FCARADb9asE">we&#8217;re only going forward &#8217;cause we can&#8217;t find reverse</a>&#8220;. To me, that sums up the history of <em>reverse debugging</em> nicely. The only reason we are not all using it every day is that practical reverse debugging has not been available until quite recently.  However, in the past ten years, I think we can say that software development has indeed found reverse.  It took a while to get there, however. This is the first of a series of blog posts that will try to cover some of the history of reverse debugging. The text turned out to be so long that I had to break it up to make each post usefully short. <a href="http://jakob.engbloms.se/archives/1554">Part two </a>is about research, and <a href="http://jakob.engbloms.se/archives/1564">part three </a>about products.<br />
<span id="more-1547"></span></p>
<p>Let&#8217;s start with background and definitions.</p>
<p>To me, the key defining factor of <strong>reverse debugging</strong> is the ability to <em>apply breakpoints in reverse</em> &#8211; essentially, to be able to go to the previous occurence of a breakpoint just as well as going to the next. The implementation allowing this might vary.</p>
<p>The contrast to reverse debugging is classic <strong>cyclic debugging</strong> where you run and rerun a program with a problem, looking at variables and setting breakpoints during each run to zoom in on an issue. For cyclic debugging to work, you pretty much require each run to behave the same way. The program has to be fundamentally deterministic. This is<br />
usually the case for non-interactive single-threaded programs, but not the case for real-time programs, parallel programs, or programs that involve some kind of asynchronous input/output (reading and writing files is a special case of IO that can indeed be deterministic since there is usually no interference from the environment in those operations).</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-21.png"><img class="aligncenter size-full wp-image-1589" title="reverse 2" src="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-21.png" alt="" width="376" height="470" /></a></p>
<p>Thus, for non-deterministic programs and rare errors, reverse debug is what you want. Run once, hit the error, reverse to diagnose it.</p>
<p>In addition to <strong>reverse debug</strong> and <strong>cyclic debug</strong>, we also have<strong> record-replay</strong> debug. In such a system, you record an execution and later replay it forward. Debugging is strictly forward: you cannot step back in time instruction by instruction, nor trigger breakpoints backwards in time. Record-replay debug is a way to allow cyclic debugging on non-deterministic highly variable program runs. This can be implemented with a lot less debugger complexity than a true reverse debugger, even if the underlying runtime system is almost identical to reverse debugging.</p>
<p>So, how do you implement reverse debugging?</p>
<p><strong>Reverse execution</strong> is one way to implement reverse debug. In reverse execution, the execution system has the ability to move backwards in time and put the entire system in the state it was in at some previous point in time.  You can step the system back instruction by instruction or cycle by cycle.  You can also continue the execution<br />
forward from any point in time, throwing away the history of asynchronous inputs to actually take a new execution path.</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-1.png"><img class="aligncenter size-full wp-image-1548" title="reverse 1" src="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-1.png" alt="" width="369" height="186" /></a></p>
<p>Typically, reverse execution is implemented by using checkpoints of previous system states plus a deterministic way to re-execute the system from those checkpoints. The key technical problems to be solved is how to checkpoint the system and how to make re-execution deterministic (note that <a href="http://blogs.windriver.com/engblom/2010/09/deterministic-but-unpredictable.html">determinism does not imply invariant or predictable behavior</a>, just that you can reliable recreate a particular execution). You also need to record and replay anything that you cannot re-execute, such as user interactions or network communications with the world outside the controlled system.</p>
<p>Schematically, it works like this, note how simulation time always moves forward even though the logical system time sometimes moves backwards:</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-31.png"><img class="aligncenter size-full wp-image-1586" title="reverse 3" src="http://jakob.engbloms.se/wp-content/uploads/2011/12/reverse-31.png" alt="" width="456" height="342" /></a></p>
<p>Record-replay debug is usually implemented in a way similar to this, without the ability to go back in time and usually without the intermediate checkpoints.</p>
<p><strong>Trace-based</strong> debugging is based on recording everything a system does into a trace, and once the recording is done, debugger works on the trace rather than using the log to drive an actual system.  Reverse debug is implemented by recreating the state of a system by reading the trace, and finding points in the trace where breakpoint conditions are true. This is also known as <em>post-mortem</em> <em>debug,</em> since you debug after the target system has finished executing (typically). A log can be captured by a hardware device, or by software being instrumented to log everything that is going on. The technology can also be used to implement record-replay debugging.</p>
<p>There are some other important distinctions between different approaches to reverse debugging that we need to keep in mind as we review the history of the field.</p>
<ul>
<li><strong>System or user-level</strong>? Do you debug a system, including the OS, or just a user-level application running on top of the operating system? User-level debug can often be solved in simpler ways than system-level debug, but also suffers from some limitations. There are also times where system-level is simpler.</li>
<li><strong>Single-threaded or multi-threaded</strong>? Can you debug multiple threads, processes, processors, or just a single processor or user-level thread? A single thread of control greatly simplifies the problem.</li>
<li><strong>Cross-target or host-based</strong>? Do you debug programs running on the same host  as the debugger, or can it also target remote targets or cross targets (such as embedded systems)?</li>
<li><strong>Instrumented or unchanged programs</strong>? Quite a few solutions for reverse debug tried over the years involve compiling programs in a special way, using instrumented OS libraries, or other implementation variants where the program debugged is not identical to the eventual deployed program.</li>
</ul>
<p>Given this background, the next post will cover early research, and the third post the beginning of commercial products.</p>
<p>Note that on Stack Overflow, reverse debug does not seem to be particularly big thing still. The reverse-debugging tag has all of <a href="http://stackoverflow.com/questions/tagged/reverse-debugging">10 members</a>, while other topics like C# and Android programming has millions&#8230; so maybe reverse is not quite mainstream yet. Or maybe it is just the case that Stack Overflow has more web-style developers than low-level developers in their membership.</p>
<p>&nbsp;</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1547"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1547" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1547" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1547/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Keynote on System-Level Debug</title>
		<link>http://jakob.engbloms.se/archives/1568?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1568#comments</comments>
		<pubDate>Sun, 01 Jan 2012 20:12:33 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[appearances]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[embedded systeme]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1568</guid>
		<description><![CDATA[I have now posted the slides from my keynote talk at the S4D 2011 conference to the presentations list on my regular home page. The topic of that talk was &#8220;System-Level Debug&#8221;, something which has started to interest me in recent years. Essentially, as the (embedded) software systems we are developing get more and more [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/09/S4D1.jpg"><img class="alignleft size-full wp-image-941" title="S4D" src="http://jakob.engbloms.se/wp-content/uploads/2009/09/S4D1.jpg" alt="" width="143" height="62" /></a>I have now posted the slides from my keynote talk at the S4D 2011 conference to the <a href="http://www.engbloms.se/jakob_presentations.html">presentations list </a>on my regular home page. The topic of that talk was &#8220;System-Level Debug&#8221;, something which has started to interest me in recent years.</p>
<p><span id="more-1568"></span>Essentially, as the (embedded) software systems we are developing get more and more complex, we need to update our debuggers to keep up with the times. This means moving beyond looking at single processors, threads, programs, or even nodes &#8211; and making debuggers that support working with multiple contexts at once. For example, debugging a couple of different boards in a rack from a single debug session. This would mean supporting multiple processor architectures, execution modes, and operating systems inside a single debugger session. Debuggers need to orient themselves towards handling software that is already running in a target, rather than starting the target software. The debugger becomes more of a passive observer of the target system than an active component.</p>
<p>Take a look!</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1568"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1568" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1568" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1568/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Debug, Design, and Microsoft Data</title>
		<link>http://jakob.engbloms.se/archives/1527?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1527#comments</comments>
		<pubDate>Sat, 19 Nov 2011 15:38:23 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[Communications of the ACM]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[Kinshuman Kinshumann]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Steven Sinofsky]]></category>
		<category><![CDATA[Windows]]></category>
		<category><![CDATA[Windows 8]]></category>
		<category><![CDATA[Windows Explorer]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1527</guid>
		<description><![CDATA[It used to be that Microsoft was the big, boring, evil company that nobody felt was very inspiring. Today, with competition from Google and Apple as well as a strong internal research department, Microsoft feels very different. There are really interesting and innovative ideas and paper coming out of Microsoft today.  It seems that their [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/07/windows-phone-logo.png"><img class="alignleft size-full wp-image-1205" title="windows phone logo" src="http://jakob.engbloms.se/wp-content/uploads/2010/07/windows-phone-logo.png" alt="" width="66" height="58" /></a>It used to be that Microsoft was the big, boring, evil company that nobody felt was very inspiring. Today, with competition from Google and Apple as well as a strong internal research department, Microsoft feels very different. There are really interesting and innovative ideas and paper coming out of Microsoft today.  It seems that their investments in research and software engineering are generating very sophisticated software tools (and good software products).</p>
<p>I have recently seen a number of examples of what Microsoft does with the user feedback data they collect from their massive installed base. I am not talking about Google-style personal information collection, but rather anonymous collection of user interface and error data in a way that is more designed to built better products than targeting ads.</p>
<p><span id="more-1527"></span>The first paper is &#8220;<a href="http://dl.acm.org/citation.cfm?id=1965749&amp;bnc=1">Debugging in the (very) large: ten years of implementation and experience</a>&#8221; by Kinshumann et al, Communications of the ACM, July 2011. This paper describes how Microsoft uses of the data they collect from Windows Error Reporting (you know, the little dialog boxes that appear every once in a while on Windows when a program has crashed or frozen, or Windows restored from a crash).</p>
<p>Microsoft has a number of heuristics that look at the data collected, grouping the bug reports into buckets. Ideally, each bucket corresponds to a single root cause for possibly quite different errors. They automatically analyze the errors and generate metadata about the error reports that can be used to generate statistics and allow database queries to be performed over all collected error<br />
reports.  Heuristics include walking through chains of threads blocked on synchronization objects to determine which one is the actual cause of a hang, and finding the most likely thread and stackframe for containing the root cause of an error.  Heuristics are applied both on the client and the server, but mostly on the server. Technically very hard to do right, I can appreciate the huge amount of work that has gone into engineering this.</p>
<p>With this huge pile of information, a new debugging method becomes available: statistics-driven bug finding and prioritization at large scale.  The introduction to the paper puts it very well:</p>
<blockquote><p>Beyond mere debugging from error reports, WER enables a new form of statistics-based debugging. WER gathers all error reports to a central database. In the large, programmers can mine the error report database to prioritize work, spot trends, and test hypotheses. Programmers use data from WER to prioritize debugging so that they fix the bugs that affect the most users, not just the bugs hit by the loudest customers. WER data also aids in correlating failures to co-located components. For example, WER can identify that a collection of seemingly unrelated crashes all contain the same likely culprit—say a device driver—even though its code was not running at the time of failure.</p></blockquote>
<p>For a product manager like me, used to working with individual bug reports in bug reporting systems and trying to manually assess the importance of each error, this is nothing short of a dream.  Instead of trying to guess how many users can be impacted by a bug, Microsoft can run queries against the error report database and get a fairly accurate idea of how common a certain error is in the user base.  This has allowed them to address the most common errors first, leading to Windows and Office becoming more stable for more users in recent generations.  They can also pinpoint which device drivers are causing the most issues, and putting pressure on vendors to clean up their act.</p>
<p>I wonder where else you really apply this idea of statistical debugging. You need a large user base, in systems which are connected to the Internet so you can collect data, and who are comfortable with providing direct feedback to you as a vendor.  Apparently, Apple has the same kind of feature built into iOS, with more than 100 million users which seem not to be too interested in strong privacy.  Presumably, Google can do the same thing with Android, at least its use in phones. Mozilla has a crash reporter, so I guess it makes sense in the consumer space.</p>
<p>But when your user base counts in thousands of seats and half of these are in defense sector beyond air-gaps, it is harder to apply. Products that call home are not taken to kindly in the professional field, as secrecy and confidentiality is very important to big companies. Industrial embedded products like telecom infrastructure might have sufficient volume of code and computer hardware to form a basis for statistical reporting &#8211; as long as operators agree to provide the information to the hardware vendors.</p>
<p>Another example of how Microsoft makes use of their collected data is in UI design. The blog post &#8220;<a href="http://blogs.msdn.com/b/b8/archive/2011/08/29/improvements-in-windows-explorer.aspx">Improvements in Windows Explorer</a>&#8220;, by Steven Sinofsky, from the Windows 8 blog discusses how Windows Explorer has evolved over the years, and how it is now getting a radical redesign based on usage data.  Microsoft is an enviable position here, having collected information about what millions of users are doing.  Definitely beats inspiration or trying with a few users in a classic user interface lab.</p>
<p>I have seen quite a few people criticize this blog post from a variety of angles &#8211; from the fact that they are not data-driven enough and keep rarely-used buttons in the ribbon to the fact that they remove somebody&#8217;s favorite function.  It is also the case that the measurements can only tell you which functions people are using from what is available today &#8211; if you want to invent new things, data like this might not be very helpful.</p>
<p>Fortunately, Microsoft also seems to have taken a clue from Linux and is allowing much more user customization than before. For me, this is great news, as I seem to have a user profile quite far from the mainstream.  We have not seen Windows 8 in its final form just yet, but hopefully this approach will be applied to other parts of that GUI overhaul too.  There are professional Windows users who need an OS that makes even very esoteric operations easy to access, and customizations of things like the start menu possible.  Hopefully, we do not get washed away by the flood of data from regular users.</p>
<p>For some reason, I feel that bug reporting is not as sensitive to the user style as GUI design &#8211; Windows and driver bugs would seem to be more evenly distributed as they depend more on hardware than on software. At least it seems to me that Windows is more stable today<br />
than it was a couple of years ago.</p>
<p>&nbsp;</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1527"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1527" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1527" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1527/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>DV* 30 Years</title>
		<link>http://jakob.engbloms.se/archives/1520?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1520#comments</comments>
		<pubDate>Sun, 13 Nov 2011 20:32:34 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[history of computing]]></category>
		<category><![CDATA[off-topic]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[DVL]]></category>
		<category><![CDATA[DVP]]></category>
		<category><![CDATA[Uppsala University]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1520</guid>
		<description><![CDATA[On the very binary date of 11-11-11, my alma mater, the computer science (DV, for datavetenskap) education at Uppsala University celebrated its thirty years&#8217; anniversary. It was a great classic student party in the evening with a nice mix of old alumni and fresh-faced students. Lots of singing and some nice skits on stage. Great [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/11/dv30%C3%A5r-100x96.jpg"><img class="alignleft size-full wp-image-1521" style="margin: 5px 10px;" title="dv30år (100x96)" src="http://jakob.engbloms.se/wp-content/uploads/2011/11/dv30%C3%A5r-100x96.jpg" alt="" width="100" height="96" /></a>On the very binary date of 11-11-11, my alma mater, the computer science (DV, for datavetenskap) education at Uppsala University celebrated its <a href="http://www.datavetenskap.nu/jubileum/">thirty years&#8217; anniversary</a>. It was a great classic student party in the evening with a nice mix of old alumni and fresh-faced students. Lots of singing and some nice skits on stage. Great fun, and my voice has still not recovered. It also got me thinking about it is that we really do as computer scientists.</p>
<p><span id="more-1520"></span>As David Alan Grier would have said, this kind of event tends to serve to build the professional identity of a group of people. Computer science is not a profession per se, but it is clear that the Uppsala computer science students (almost 2000 has started since 1981) has a particular culture that comes back to us very easily when amongst our peers. I think it is based in the art and practice of <strong>programming</strong>.</p>
<p>The talks and discussions during the dinner often went back to the defining experiences of our student days, and these experiences were mostly about night hacks and classic labs. A speaker told us about how in 1983 they tried to program a simple computer built by the department itself, and how it turned out that 3 out 4 machines were broken. There idea that they expected the program to just work the first time it loaded made me look up my <a href="http://blogs.windriver.com/engblom/2011/05/twenty-thirty-and-sixty-years-ago.html">favorite debugging quote</a>. Somebody reminded me (19 years after the deed) about the time I made a Prolog program exhaust all memory on a Sun server, by performing an exhaustive search for a problem with no solution).</p>
<p>I remembered how some students I taught (while still an undergraduate myself, a common practice at DV) spent 24 hours a day for almost a week in a lab room trying to get their operating systems to boot on some MIPS-based lab machines. In particular, the group that was gripped by ambition and tried to turn on the MMU. Each time they reset the machine and tried to boot their OS, they would see a serial terminal spewing out diagnostics text and then stopping cold&#8230; as they failed again to make it work (they passed the course anyway).</p>
<p>This all indicates the fundamental importance of programming to computer science students. That is also what we believed was our core mission when I was a student &#8211; to go out into the world and create great software. Many still do, even if quite a few of us have left day-to-day coding to become project leaders and outright managers. Can&#8217;t say I program all that much myself, apart from some demos and virtual machine scripts, but I still find the topic incredibly interesting and important.</p>
<p>The event also touched on institutional memory and the longevity of data. The very ambitious anniversary committee had produced a brand new version of the classic &#8220;Manualen&#8221;, a song book first produced in 1995. In it, I found a text I wrote in 1995 about what a computer scientist actually does (a bit pompous, as can be expected from a proud student) as well as a photo of myself from a 1996 cover of the DV student magazine &#8220;Blurgel&#8221;. However, in both cases, these had been reproduced from paper copies. There was no digital memory in place of these 15-year-old pieces of data.</p>
<p>I actually still have both paper copies in good shape and the data &#8211; on a multisession CD-R created in 1999 on a Mac. My current computers all being Windows-based cannot extract the data. I know a Mac can still read them, but it is not clear that the data can be used today. It is likely in some old version of <a href="http://en.wikipedia.org/wiki/Pagemaker">Aldus PageMaker </a>and with no file extensions to guide you as to what each file is. The images are greyscale or black-and-white high-resolution TIFF files I believe (from the era before PNG), once again with no file extensions to hint at what is what. It underscores the importance <a href="http://jakob.engbloms.se/archives/180">of actual running programs and systems </a>as a way to access our digital past. The data might all be there and readable, but with no software to interpret it, what can you do? I actually <a href="http://jakob.engbloms.se/archives/19">noted the same</a> four years ago for the somewhat late 25 years anniversary.</p>
<p>Overall, a very memorable evening, and kudos to the organizers for putting it all together.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1520"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1520" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1520" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1520/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Jan Bosch: Software Provocateur</title>
		<link>http://jakob.engbloms.se/archives/1516?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1516#comments</comments>
		<pubDate>Sat, 29 Oct 2011 18:09:21 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[business issues]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Jan Bosch]]></category>
		<category><![CDATA[Lindholmen Software Development Day]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1516</guid>
		<description><![CDATA[Last week, I had the honor of presenting at and attending the talks of the Lindholmen Software Development Day. The first keynote speaker was Professor Jan Bosch from Chalmers, who did his best to provoke, prod, and shock the audience into action to change how they do software. While I might not agree with everything [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/10/lindholmen-logo.png"><img class="alignleft size-full wp-image-1517" style="margin: 5px;" title="lindholmen logo" src="http://jakob.engbloms.se/wp-content/uploads/2011/10/lindholmen-logo.png" alt="" width="57" height="55" /></a>Last week, I had the honor of presenting at and attending the talks of the <a href="http://www.lindholmen.se/sv/node/17005">Lindholmen Software Development Day</a>. The first keynote speaker was Professor <a href="http://janbosch.com/Jan_Bosch/Jan_Bosch.html">Jan Bosch </a>from <a href="http://www.chalmers.se/sv/Sidor/default.aspx">Chalmers</a>, who did his best to provoke, prod, and shock the audience into action to change how they do software. While I might not agree with everything he said, overall it was very enjoyable and insightful talk.</p>
<p><span id="more-1516"></span></p>
<p>Jan Bosch is a professor in the software engineering center at Chalmers in Gothenburg. He is a dutchman who has worked all over the world. He used to run Nokia&#8217;s research labs in Finland, and then moved to Intuit in Silicon valley. Now, he is back in Scandinavia at Chalmers. What I think he tried to do was to impress the audience with the software development ethos and style of Silicon Valley. I got a feeling that he wanted to make it felt that Europe is maybe behind on modern software development.</p>
<p>However, I am not always sure how the experience from web companies like Google and Amazon can translate into the development of software for systems like telecoms and vehicles (which formed a large part of the audience).</p>
<p>The main points of the talk, and my comments:</p>
<p><strong>Speed is more important than anything else</strong>. If we can get 10% efficiency improvement out of our , it should be used to cut lead times by 10%, not reduce cost and keep the same time. The earlier we get software out, the bigger our revenue. This means that you should focus engineering efforts on shortening lead times rather than reducing man hours worked- and if you get the lead times shorter, you will automatically get efficiency improvements. I must say I agree with this point.</p>
<p>We should <strong>turn R&amp;D into an experimental system</strong> &#8211; quickly implement features and ideas and test them in some way. Maybe not with external customers, internal users might do just as well (that&#8217;s how Apple works, for example, by evaluating many candidate designs by prototypes that are never seen by the outside). This is a good idea, but I think it might be hard to find the resources to do this in many cases &#8211; Apple is kind of extreme in that it rides high, is immensely profitable, and release only a few products each year.</p>
<p>Once we have experiments, <strong>product planning should be based on data</strong>, not on opinion. Sometimes, vision is needed, but that should be turned into small experiments to validate ideas, not big long-term full-blown plans based on opinion. This sounds very nice &#8211; but getting hard data can be very hard in practice.</p>
<p>Initially,  I felt that this would conflict with the concept of the &#8220;BHAG&#8221; &#8211; Big Hairy Audacious Goals &#8211; from the book &#8220;<a href="http://en.wikipedia.org/wiki/Built_to_Last:_Successful_Habits_of_Visionary_Companies">Built to Last</a>&#8220;. There, vision that transcends the current state of affairs is crucial to build products that really change the world. You cannot base that on data &#8211; if you ask people what they need, they will not reply with what they do not know. But I guess that if you have a great idea, it is a good idea to vet it with some quick experiments to ascertain that it makes sense at all, before betting the firm on it. So, I guess there is no real conflict here.</p>
<p><strong>Quickly test new features with customers</strong>. He sees the world switching to a model where the <em>producer decides when an updrade is applied</em> to the market, not the consumers. Already in place for things like Facebook, Google, and Apple iOS devices. I wonder though if this works for professional software and systems. The customers that I know have to plan the rollout of new versions carefully in order to not disrupt projects that rely on them. When someone pays for a piece of software, they tend to want to have something to say about new features too. Consider the uproar that happens every time Facebook change their UI &#8211; if you had paid a hundred thousand dollars for the system, I think Facebook would have listened rather more to their users than they do now. As long as there is no alternative, people will grudgingly accept and learn to live with it &#8211; even though the new version might be strictly worse for them than the old version.</p>
<p>When users do have a choice, they often stay with older version. Just consider the vast numbers of PCs out there still running Windows XP. In some way, I feel that this freedom to choose and control your own computing environment is being threatened by this trend of the web.</p>
<p>If you can recruit willing beta testers in your user community, this is great. But you need quite a few customers in place to be able to have enough beta testers to have anything like a representative sample. For a consumer application with tens or hundreds of millions of users, this is not too hard. For a professional piece of software tooling with less than a hundred different customers, you tend to get three loud voices &#8211; and we are essentially back to opinion rather than data. It would have been interesting to discuss this with Jan, but I never managed to grab him during the day to discuss this. There is also the time it takes to upgrade a user and have them test a new version &#8211; even if the upgrade does not require any changes to their existing code or systems, they still might need training on the new system to fully use it.</p>
<p>The counter to this is obvious: if things are this complicated to use and deploy, maybe we should think about making deployment and use simpler?  And not hide behind &#8220;it is hard and we have always done things this way&#8221;. Make no mistake &#8211; I would love to be able to this, especially with things like debuggers and other daily tools of the programming trade.</p>
<p>Still, I do think that stability is sometimes a hard requirement. When you have a system that you want to maintain for decades, it is very sound to freeze the versions of critical tools used to develop it and stick to these, to minimize the risk of failure due to changes. If the system is certified, you basically have to. Amazon does not build flight control systems. Their system going down is certainly going to cause economic havoc, but nobody will die.</p>
<p>Create an <strong>after market</strong> &#8211; customers will come to expect new features over time, and might well pay for them. This &#8220;app market&#8221; idea for software features keeps coming up in current discussions on architecture and software design. My guess is that it will actually work, once professional users get influenced enough by the overall consumerization of IT to take certain patterns for granted. Selling new software for car engine control or new features into a telecom switch sounds pretty sane &#8211; and is sometimes already practiced today.</p>
<p><strong>Get to know the customer</strong>- at Intuit, each engineer famously follows a customer for one day each year to understand their life and get empathy with their users. Knowing your customer is key, the better you know what your customer wants, the more they will take to your product. This idea is something that I have actually seen being tried in the past, with mixed results. Once again, in a business-to-business world this requires some selling to happen, and customers might be sensitive to information leakage and secrecy. However, often the value of having a product expert available to them for a few days to help them work better and run their systems more efficiently is  great idea. If your business has a consulting or services arm, this can be part of regular business &#8211; as long as you let core development engineers consult a bit and not keep them all hidden inside the company.</p>
<p>When it came down to practical software development practice, Jan Bosch was all about agile, automatic building, testing, and deploying, and standard modern practice. He retold the Amazon &#8220;<a href="http://www.brainwatt.com/two-pizza-rule/">Two Pizza Rule</a>&#8221; &#8211; i.e., keep groups small. Keep groups self-governing, based on quantitative assessments of their output, and guide them with goals, not detailed requirements. That certainly works for a certain class of highly motivated and skilled labor that tends to concentrate in Silicon Valley &#8211; but in the real world, we also have to deal with less able developers that do need leadership to perform and do the right thing. Not everyone can be totally selective in hiring, unfortunately.</p>
<p>Still, a provocative talk that really did get me thinking about how we do things. Which I think was Jan&#8217;s goal to start with, so I guess that means mission accomplished. If you ever get the chance to listen to Jan, take it!</p>
<p>&nbsp;</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1516"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1516" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1516" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1516/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>GPGPU for Instruction-Set Simulation &#8211; Maybe, Maybe not</title>
		<link>http://jakob.engbloms.se/archives/1506?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1506#comments</comments>
		<pubDate>Sat, 08 Oct 2011 19:17:58 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[parallel computing]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[CCGrid]]></category>
		<category><![CDATA[cycle accuracy]]></category>
		<category><![CDATA[GPGPU]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[simulation]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1506</guid>
		<description><![CDATA[I just read a quite interesting article by Christian Pinto et al, &#8220;GPGPU-Accelerated Parallel and Fast Simulation of Thousand-core Platforms&#8220;, published at the CCGRID 2011 conference. It discusses some work in using a GPGPU to run simulations of massively parallel computers, using the parallelism of the GPU to speed the simulation. Intriguing concept, but the [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2008/05/coreshrink1.png"><img class="alignleft size-full wp-image-125" style="margin: 5px 10px;" title="coreshrink1" src="http://jakob.engbloms.se/wp-content/uploads/2008/05/coreshrink1.png" alt="" width="100" height="100" /></a>I just read a quite interesting article by Christian Pinto et al, &#8220;<a href="http://infoscience.epfl.ch/record/164471">GPGPU-Accelerated Parallel and Fast Simulation of Thousand-core Platforms</a>&#8220;, published at the <a href="http://www.ics.uci.edu/~ccgrid11/">CCGRID 2011 </a>conference. It discusses some work in using a GPGPU to run simulations of massively parallel computers, using the parallelism of the GPU to speed the simulation. Intriguing concept, but the execution is not without its flaws and it is unclear at least from the paper just how well this generalizes, scales, or compares to parallel simulation on a general-purpose multicore machine.</p>
<p><span id="more-1506"></span>The paper describes a simulation for a network-on-chip based homogeneous system containing a &#8220;ARM-subset&#8221; ISS instances with local instruction and data caches, some local RAM, and also some shared RAM. Each core runs its own local software load, there is no SMP operating system. All communication between cores is over shared memory, using explicit operations across the NoC. All cores run a single cycle before they check communications from their neighbors.</p>
<p>This last point is crucial to understanding why this is feasible at all &#8211; in general, simulating a general shared-memory multiprocessor machine on a shared-memory multiprocessor falls down on the synchronization overhead. If your simulation semantics dictate that you synchronize every cycle anyway, and you do not try to optimize each core simulator, there is clearly decent room for parallel execution. By including the cache, they increase scalability, since there is more work per target cycle that can be run in isolation.</p>
<p>After reading the article, I am impressed by their work &#8211; just getting this to work is pretty good work. But there are quite a few questions which are not really answered in the article and which are crucial to understanding just how well GPGPUs could be used for this kind of ISS work.</p>
<ul>
<li>The targeted level of abstraction is a bit confusing. The authors claim it is &#8220;instruction accurate and not cycle accurate&#8221;, but still simulate caches and cycle-based communications across the NoC. If I read the paper right, communications will take a varying number of cycles depending on the distance for messages to travel. This is more detailed than a typical &#8220;instruction accurate&#8221; simulator.</li>
<li>The target system does not run an OS &#8211; that might (but I do not know) be an advantage for their approach, since it probably implies less variation in the instruction flow in cores, potentially enhancing the amount of time that all ISSes in a thread group in the GPU can execute the same instruction. This would seem crucial, as if each ISS was running a totally different program, the instruction execution part of the code would be running serialized.</li>
<li>They should really try to run the same kind of simulation on a high-end x86 CPU like an Intel Sandy Bridge with 8 or more hardware threads. I wonder if their scaling might not work just as well there &#8211; and with a much faster serial execution engine. This should give  a much more relevant point of comparison for GPU vs CPU execution of the simulator than&#8230;</li>
<li>the comparison object they use right now, a JIT-accelerated multicore simulation using OVP seems pretty irrelevant since it is not doing the same thing at all. That simulator does not simulate the caches or NoC, just a large number of isolated processors. They also do not run a parallel program on OVP, but rather a large number of single-core fibonacci and dhrystone programs. Thus, the fact that OVP uses a large temporal decoupling time slice does not matter for semantics. It just does not seem like a very relevant comparison point. OVP and their simulator try to solve different problems &#8211; fast execution of general code vs. performance profiling of massively parallel machines.</li>
<li>As I understand it, the given &#8220;S-MIPS&#8221; numbers in the evaluation tell us the total number of MIPS that we get out across all target cores. That seems to peak around 2000 &#8211; which isn&#8217;t necessarily that fantastic if we compare to high-performance ISS work in general where a few GIPS is definitely achievable. It is pretty good considering the level of detail here, though, where i would expect a normal ISS + cache simulator to produce at most a few MIPS. Once again, the authors need to be a bit more precise as to what they compare to what.</li>
<li>Not having an MMU and not implementing any interrupts or exceptions in the target machines avoids a large part of the complexity of a real ISS. That complexity might well be too much for the quite rigid execution environment of a GPGPU.</li>
<li>They missed that Simics, unique among instruction-accurate mainstream simulators, is <a href="http://jakob.engbloms.se/archives/128">parallel </a>since version 4.0.</li>
</ul>
<p>So, overall, this paper does not really tell us much whether a GPGPU can be used for instruction-set simulation in general. It does tell us that it might be doable, but there are many crucial complications which are not addressed.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1506"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1506" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1506" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1506/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Adversarial Approach to Compilation</title>
		<link>http://jakob.engbloms.se/archives/1504?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1504#comments</comments>
		<pubDate>Sun, 02 Oct 2011 19:07:55 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[ACM Queue]]></category>
		<category><![CDATA[Communications of the ACM]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1504</guid>
		<description><![CDATA[Paul Henning-Kamp has written a series of columns for the ACM Queue and Communications of the ACM. He is  pointed, always controversial, and often quite funny. One recent column was called &#8220;The Most Expensive One-Byte Mistake&#8220;, which discusses the bad design decision of using null-terminated strings (with the associated buffer overrun risks that would have [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2008/08/acm-queue-logo.gif"><img class="alignleft size-full wp-image-249" style="margin: 5px 10px;" title="acm-queue-logo" src="http://jakob.engbloms.se/wp-content/uploads/2008/08/acm-queue-logo.gif" alt="" width="150" height="91" /></a>Paul Henning-Kamp has written a series of columns for the <a href="http://queue.acm.org">ACM Queue </a>and <a href="http://cacm.acm.org/">Communications of the ACM</a>. He is  pointed, always controversial, and often quite funny. One recent column was called &#8220;<a href="http://queue.acm.org/detail.cfm?id=2010365">The Most Expensive One-Byte Mistake</a>&#8220;, which discusses the bad design decision of using null-terminated strings (with the associated buffer overrun risks that would have been easily avoided with a length+data-style string format). Well worth a read. A key part of the article is the dual observation that compilers are starting to try to solve the efficiency problems of null-terminated strings &#8211; and that such heavily optimizing compilers quite often very hard to use.</p>
<p><span id="more-1504"></span>He has a wonderful line from an anonymous Convec C3800 programmer who accounted his experience with the optimizing compiler as &#8220;<em>having to program as if the compiler was my ex-wife&#8217;s lawyer</em>&#8220;. I can recognize and understand the sentiment. The C language, in particular, is full of corner cases where behavior is &#8220;undefined&#8221; or &#8220;implementation dependent&#8221;. The exact behavior of some statements is also pretty complex &#8211; and as a compiler writer, you end up having to work with the rules of the language like a lawyer works the law &#8211; the thinking is not all that different. And if you take this approach to the extreme, the description of the compiler as an adversarial lawyer is spot-on.</p>
<p>Unfortunately, it is quite easy in C (and trivial in C++) to wander into such territory without noticing and without doing anything that is obviously wrong for an average programmer. For example, <a href="http://jakob.engbloms.se/archives/1489">the program I described in my recent post about a bug demo</a> is strictly speaking using undefined behavior, and the compiler is theoretically free to generate a program that reformats the harddrive or does nothing at all. In practice, I don&#8217;t think compiler designers strive to generate truly creative unexpected results. Rather, if you detect unclear statement in the program, you would emit a warning.</p>
<p>Still, I can identify with this kind of adversarial relationship with a compiler. It really feels like being in a game with a <a href="http://en.wikipedia.org/wiki/Rules_lawyer">rules lawyer</a>, where you read the rules (programming language book) and try to figure out how to play the better than your opponent.</p>
<p>&nbsp;</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1504"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1504" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1504" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1504/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>EETimes Articles on Simics</title>
		<link>http://jakob.engbloms.se/archives/1500?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1500#comments</comments>
		<pubDate>Fri, 23 Sep 2011 19:26:38 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[EETimes]]></category>
		<category><![CDATA[Simics]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1500</guid>
		<description><![CDATA[I just had two articles published the Embedded Design part of the EETimes. First, &#8220;Rethink your project planning with a virtual platform&#8220;, which talks about how virtual platforms can change your entire project planning. Essentially, by reducing project friction and risks related to hardware availability, software integration, and show-stopper bugs, you can make projects work [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2008/07/eetimes.png"><img class="alignright size-full wp-image-155" title="eetimes logo" src="http://jakob.engbloms.se/wp-content/uploads/2008/07/eetimes.png" alt="" width="127" height="56" /></a><a href="http://jakob.engbloms.se/wp-content/uploads/2011/09/simics-logo.png"><img class="alignleft size-full wp-image-1501" style="margin: 5px 10px;" title="simics logo" src="http://jakob.engbloms.se/wp-content/uploads/2011/09/simics-logo.png" alt="" width="44" height="44" /></a>I just had two articles published the Embedded Design part of the <a href="http://www.eetimes.com">EETimes</a>.</p>
<p>First, &#8220;<a href="http://www.eetimes.com/design/embedded/4226939/Rethink-your-project-planning-with-a-virtual-platform?Ecosystem=embedded">Rethink your project planning with a virtual platform</a>&#8220;, which talks about how virtual platforms can change your entire project planning. Essentially, by reducing project friction and risks related to hardware availability, software integration, and show-stopper bugs, you can make projects work much better.</p>
<p>Then we have &#8220;<a href="http://www.eetimes.com/design/embedded/4227781/Transporting-bugs-with-virtual-checkpoints?Ecosystem=embedded">Transporting bugs with virtual checkpoints</a>&#8220;, which is a shorter, popular science, version of the paper I published last year at <a href="http://jakob.engbloms.se/archives/1231">S4D</a>. This describes how you can use checkpointing in a virtual platform to communicate bugs across time, space, and teams.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1500"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1500" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1500" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1500/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Wind River Blog: Stop, Think, and Tie Your Shoes Right</title>
		<link>http://jakob.engbloms.se/archives/1492?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1492#comments</comments>
		<pubDate>Wed, 21 Sep 2011 18:09:13 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[business issues]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Wind River Blog]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1492</guid>
		<description><![CDATA[There is a new post at my Wind River blog, which could seem to be about shoes but which is really about process improvement. In particular, the need for companies to let their employees take a step or two back and look at what they are doing and what they could do better. It is [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="Wind River Logo" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" /> <a href="http://jakob.engbloms.se/wp-content/uploads/2011/09/shoes-2.jpg"><img class="alignright size-full wp-image-1494" style="margin: 5px 10px;" title="shoes 2" src="http://jakob.engbloms.se/wp-content/uploads/2011/09/shoes-2.jpg" alt="" width="333" height="300" /></a> There is a <a href="http://blogs.windriver.com/tools/2011/09/stop-think-and-tie-your-shoes-right.html">new post </a>at my Wind River blog, which could seem to be about shoes but which is really about process improvement. In particular, the need for companies to let their employees take a step or two back and look at what they are doing and what they could do better.</p>
<p>It is way too common to be so busy running around being inefficient that there is no time to think about how to become more efficient. Change also requires some discipline to actually keep pushing at habits until they change for the better.</p>
<p><a href="http://blogs.windriver.com/tools/2011/09/stop-think-and-tie-your-shoes-right.html">All of this can be illustrated by tying shoes. </a></p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1492"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1492" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1492" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1492/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>My Bug Doesn&#8217;t Work!</title>
		<link>http://jakob.engbloms.se/archives/1489?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1489#comments</comments>
		<pubDate>Wed, 14 Sep 2011 03:27:07 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[embedded software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[bugs]]></category>
		<category><![CDATA[compilers]]></category>
		<category><![CDATA[demo]]></category>
		<category><![CDATA[VxWorks]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1489</guid>
		<description><![CDATA[Every once in a while I need to build demo setups to show debugging in action. As I have blogged before, finding a good bug when you need one isn&#8217;t always easy.  The solution is to try to invent artificial bugs, and I was very happy when I managed to stage a buffer overrun in [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/10/butterfly.png"><img class="alignleft size-full wp-image-982" title="butterfly" src="http://jakob.engbloms.se/wp-content/uploads/2009/10/butterfly.png" alt="" width="90" height="91" /></a>Every once in a while I need to build demo setups to show debugging in action. As I have blogged before, <a href="http://jakob.engbloms.se/archives/975">finding a good bug when you need one isn&#8217;t always easy</a>.  The solution is to try to invent artificial bugs, and I was very happy when I managed to stage a buffer overrun in a VxWorks program.</p>
<p>It is pretty very nice demo in which you first start a period program A, which prints the value of an incrementing counter every target second.  You then run a supposedly unrelated program B, resulting in the values that program A prints to become corrupted.  Perfect to show off reverse execution and data breakpoints in reverse as you go from the point where the corrupted value is printed to the piece of code that overwrote the variable.</p>
<p>But then I ported the demo to a new platform&#8230; and the bug didn&#8217;t work anymore. My bug had caught a bug and was now not working, or at least not they way I expected it to. What had happened?<br />
<span id="more-1489"></span><br />
Very simple. I changed the compiler I used. Since my bug relied on an unspecified behavior in C, the change was totally valid and really expected.  Still, it was interesting to see how things played out&#8230; in the end, we got a different bug from the same code thanks to the change.</p>
<p>The code is essentially the following, with some simplifications that make it easier to read for those not familiar with VxWorks, and ignoring all the code to start tasks initially.</p>
<pre>// Global variables
int     iDataArray[100];
int     myWdISRcount;
WDOG_ID myWatchDogId;

// Periodic task - program A
void myWdISR(void)
{
  /* Increment ISR invocation count */
  myWdISRcount = myWdISRcount+1;
  printf("wd Fired %d times\n",myWdISRcount);

  /* Start off next invocation */
  wdStart (
    myWatchDogId,
    WD_INTERVAL,
    (FUNCPTR) myWdISR,
    (int) NULL
  );   
}

// Overwrite code - program B
int myCompletelySafeRoutine(void)
{
  uint32_t *a,i;
  a = iDataArray;   
  // This loop writes one word beyond the
  // limit of the iDataArray
  for (i=0; i&lt;=100; i++) {
     a[i] = 0x7fffffff;
   }   
   return OK;
}</pre>
<p>In the original setup, compiled for a Power Architecture target, iDataArray ended up right before myWdISRcount in memory.  Thus, the buffer overflow changed the value of the counter from something like 10 to 0x7fffffff.  Very noticeable in the printouts from the periodic task.</p>
<p>When I changed to an x86 target (using a compiler from the same family, but obviously with a different code generator since the target was different), the variable order in memory changed and it seems that we got iDataArray placed last.  Suddenly, the effect of running the safe code was that nothing happened at all.  A bit annoying for a demo.  Some small source-code changes and a recompile later, the effect was instead to crash the target with a triple fault (page fault inside a page fault handler). Seems the program now managed to corrupt some kernel state. While impressive as a bug, it was not quite what I was looking for.</p>
<p>I then changed the compiler type to compiler 2, and the data layout changed once more.  This produced a very useful bug, but it took me a while to actually understand this.  Now, when program B was run, program A stopped.  This looked like a bug in my program, and I actually started trying to fix this &#8211; until I realized that this was the bug I was looking for.  Running program B kills program A is just as good a bug as corrupting a counter value, after all.  In this case, the array overrun hits the myWatchDogId variable, and when that gets corrupted, the wdStart call ignores the request since it does not recognize the ID it gets.</p>
<p>So, in the end, I got a bug that was just as good as the first, and arguably a bit more intruiging. It is still obviously a contrived example &#8211; but I think that a good demo or lab exercise can be artificial as long as it gets the point across.  Judging from how people who have done the lab reacts, the goal seems to have been<br />
achieved.</p>
<p>The moral of the story is really that compilers are free to change things which are explicitly implementation-defined or not specified at all in the C standard. That is a good thing as it gives the compiler freedom to optimize the code. If you want to control how variables are laid out in memory, I guess you have to resort to linker scripts or similar &#8211; but that was too much pain for me in this case. Just changing things around until I got a good bug, and then freezing the binary (and not recompiling it ever again) is a sufficient strategy.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1489"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1489" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1489" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1489/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Wind River Blog: How to Get Virtual</title>
		<link>http://jakob.engbloms.se/archives/1474?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1474#comments</comments>
		<pubDate>Tue, 02 Aug 2011 12:10:57 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Wind River Blog]]></category>
		<category><![CDATA[development methodology]]></category>
		<category><![CDATA[Functional models]]></category>
		<category><![CDATA[Modeling]]></category>
		<category><![CDATA[Simics]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1474</guid>
		<description><![CDATA[There is a new post at my Wind River blog, about how you build virtual platforms with Simics. The post is more about the methodology than the nature of models, cycle accuracy, endianness, and all the other details of virtual platform modeling. I have written about modeling methodology on this blog too, and in particular I [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="Wind River Logo" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" />There is a new post at my Wind River blog, about how you <a href="http://blogs.windriver.com/engblom/2011/08/how-to-get-virtual-.html">build virtual platforms with Simics</a>. The post is more about the methodology than the nature of models, cycle accuracy, endianness, and all the other details of virtual platform modeling. I have written about modeling methodology on this blog too, and in particular I would recommend looking at &#8220;<a href="../archives/1317">Two perspectives on modeling</a>&#8220;.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1474"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1474" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1474" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1474/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Register Design Languages &#8211; DSL or not?</title>
		<link>http://jakob.engbloms.se/archives/1462?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1462#comments</comments>
		<pubDate>Wed, 27 Jul 2011 20:20:25 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[ESL]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Domain-specific languages]]></category>
		<category><![CDATA[Gary Stringham]]></category>
		<category><![CDATA[Register Design Languages]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1462</guid>
		<description><![CDATA[Recently, Gary Stringham has been running a series of interviews with providers of register design tools on his website. Register design tools seems to be an active area with several small companies (and some open-source tools) fighting for the market. I have written about Gary Stringham and register designs before, and it is an area [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/04/gears1.png"><img class="alignleft size-full wp-image-737" style="margin: 5px 10px;" title="gears1" src="http://jakob.engbloms.se/wp-content/uploads/2009/04/gears1.png" alt="" width="56" height="57" /></a>Recently, Gary Stringham has been running a series of <a href="http://www.garystringham.com/rdt.shtml">interviews with providers of register design tools on his website</a>. Register design tools seems to be an active area with several small companies (and some open-source tools) fighting for the market. I have written about Gary Stringham and register designs <a href="http://jakob.engbloms.se/archives/358">before</a>, and it is an area that keeps fascinating me. There is something about the task of register design that keeps it separate from the main hardware design languages, tools, and flows.The different approaches taken by the tools supporting the register design task also illustrates some points about programming language standards, domain-specific languages, and exchange formats that I want to address.</p>
<p><span id="more-1462"></span>First of all, what makes register design such a special task? My guess is because it is a higher-level aspect of the design: it  describes an interface, which can be implemented in many different ways  in hardware. Regular HDLs do not help you reason about register layouts in a good way. The register specification should also be used in other places than the hardware: in address definitions for device drivers, in manuals, and other documentation. All of this indicates that the information needs to be explicit, high-level, and in a format that facilitates automatic processing by tools. It basically screams for a domain-specific language (DSL) at some level of ambition (read my old post about how <a href="http://jakob.engbloms.se/archives/747">languages are grown from problems </a>for more background on just how simple a DSL can be).</p>
<p>If you look at the register design tool vendors that Gary interviewed, you can essentially see two different approaches to how to support the task of register design. In particular, the input language differs quite radically between the tools.</p>
<p>There is <em>domain-specific programming language</em> approach. A register-design DSL makes it easy to work with register designs, and pretty hard to do anything else (imagine writing a sorting algorithm in a register-design language &#8211; I don&#8217;t think it can be done, and if it can be done, the result has to be absolutely contorted). The DSL approach assumes a willingness  on the part of a user to learn a new language, but it seems that for this domain, the languages are simple enough that learning them saves you time in the end. Especially if you maintain large register maps that keep changing or is updated and extended across generations of chips (indeed, ease of maintenance is an undersold aspect of most DSLs in my experience).</p>
<p>The underlying assumption of the DSL approach is that the entry language is not too important, as long as you can export data into other formats. To me, this makes sense. From a single source with sufficient descriptive power, you can ideally generate both HDL code to implement the decoder in hardware, as well as documentation, header files, and maybe even skeletons for virtual platform models. And standard formats like IP-XACT to feed into any other tool.</p>
<p>It does not matter if there is a single tool using each input language, since the outputs are what matters. This leaves the vendors free to invent on the input side, and provide a really powerful tool.</p>
<p>The second approach is <em>transformation-based</em>. Its idea is to not use a dedicated input language, but rather powerful import functions to read whatever specifications already exist in text files, Microsoft Word documents, Microsoft Excel spreadsheets, Framemaker documentation files, IP-XACT, or whatever you can come up with.  The assumption is that the tool is really about transforming data and not about compiling a language. Register designs kind of already exist, and since there is no standard language to compile, you just import whatever there is. It makes the assumption that the user base is not willing to learn a new language to handle register design. A funky side-effect is that we might end up actually doing programming work in <a href="http://www.garystringham.com/rdt/AgnisysInterview.shtml">Word docs</a>.</p>
<p>Still, the transformation approach ends up generating the same outputs as the DSL approach. In  both cases, some of the outputs can be considered &#8220;industry standards&#8221;, like IP-XACT, while others are decidedly ad-hoc, like Excel sheets. To me, this demonstrates that the true value in standards is to enable tool interoperability, and not so much as a way to provide inputs.</p>
<p>Indeed, there is a clear difference between good input languages and good exchange formats. An input language tends to drive for expressive power and human readability, and it is OK to have a heavy compilation process associated with it. An exchange format like IP-XACT does not need to be human-readable, nor efficient to code in. It needs to be easy to parse and compile, so that as many tools as possible can work on it.</p>
<p>A typical example from the field of register design is dealing with repeated groups of registers (such as banks of registers for a DMA channel) and parametrized register designs. The input languages used by the tools that Gary Stringham covers all allow this &#8211; while the standard for register design exchange, IP-XACT, only uses a flat compiled map. It just lists all registers with sizes and offsets, not how these offsets were arrived at or if there is some logical grouping or iterated structure. Quite OK for a tool to work on, but not very useful as a repository for the actual information.</p>
<p>This distinction is worth to keep in mind.</p>
<p>I definitely side with the DSL idea, as that fits my idea of a good tool, but I can see why many people find the transformation from other formats attractive. In all cases, the goal is to have an original design specification that is as clear, succinct, and flexible as possible.</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1462"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1462" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1462" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1462/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Wind River Blog: Software at the Toddler Stage</title>
		<link>http://jakob.engbloms.se/archives/1445?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1445#comments</comments>
		<pubDate>Thu, 30 Jun 2011 09:48:56 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[Wind River Blog]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1445</guid>
		<description><![CDATA[There is a new post at my Wind River blog, about the &#8220;Toddler stage&#8221; of software in general and Simics 4.6 in particular. To me, software in an early stage of development shares many traits with a toddler: The software will sometimes behave as expected, but more often than not it will not. It will [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="Wind River Logo" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" />There is a new post at my Wind River blog, about the &#8220;<a href="http://blogs.windriver.com/engblom/2011/06/software-at-the-toddler-stage.html">Toddler stage</a>&#8221; of software in general and Simics 4.6 in particular. To me, software in an early stage of development shares many traits with a toddler:</p>
<blockquote><p>The software will sometimes behave as expected, but more often than not  it will not. It will do something else, or crash, or generally do the  wrong thing.  It is very much like a toddler &#8211; you can rely on it to  some extent, but you never know when it will decide to do something  unexpected, funny, or just throw a fit over something that would have  seemed inconsequential.</p>
<p>&nbsp;</p></blockquote>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1445"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1445" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1445" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1445/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Software is Concrete &#8211; is it?</title>
		<link>http://jakob.engbloms.se/archives/1411?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1411#comments</comments>
		<pubDate>Sat, 07 May 2011 20:13:30 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[embedded software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Robert Howe]]></category>
		<category><![CDATA[verum]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1411</guid>
		<description><![CDATA[I this quote: Software is Concrete. Once poured it becomes extremely difficult and very expensive to change. It comes from a blog post by Robert Howe, CEO of Verum, a company selling formal-methods-based and model-based programming tools. It does capture something of the phenomenon we all know: that software can be pretty darn hard to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/11/construct.png"><img class="alignleft size-full wp-image-1318" title="construct" src="http://jakob.engbloms.se/wp-content/uploads/2010/11/construct.png" alt="" width="96" height="96" /></a>I this quote:</p>
<blockquote><p><em>Software is Concrete. Once poured it becomes extremely difficult and very expensive to change.</em></p></blockquote>
<p>It comes from a <a href="http://community.verum.com/blog/veritas.aspx?post=%E2%80%9Csoft%E2%80%9D_in_software_is_an_illusion">blog post by Robert Howe, CEO of Verum</a>, a company selling formal-methods-based and model-based programming tools. It does capture something of the phenomenon we all know: that software can be pretty darn hard to change, once it has shipped and is in use. It fits well with the fact that the later bugs are found, the more expensive they are to fix.</p>
<p>But it also provoked quite a bit of opposition when I put the quote up on Facebook, and I have to agree that maybe not all is as simple as that blog makes it out to be.</p>
<p><span id="more-1411"></span>The main problem as I see it is that it compares the development of software to the construction phase of a bridge (or other civil engineering structure). When software development is really much more like the architecture and exploration work that goes on long before the bridge is starting to be built. The blog post recognizes this in a way, I admit that. But you cannot escape the feeling that the fundamental thinking behind it all is waterfall-based. It never says so, but it feels like that.</p>
<p>Which is a shame &#8211; I cannot see that there is any fundamental dichotomy between being iterative, agile, and changing things as requirements clarify, and using model-based development and formal methods.  The idea of model-based development is to program at a higher level of abstraction to get more efficiency. And programming at a higher level of abstraction fundamentally <em>supports</em> agile and iterative development, since the amount of &#8220;code&#8221; you must change is smaller. Model-based development ought to support drastic and quick changes to software just like modern dynamic programming languages and powerful frameworks support it. Add in some model checking to find important classes of bugs (like deadlocks and dead-end states) automatically, and you have a very agile programming environment.</p>
<p>In software development, I don&#8217;t think there needs to be a great step from design and architecture to implementation. It seems to be more gradual, you build code as you go along and ideas get more and more detailed, and slowly you get to something fairly concrete and indeed fixed. The trick is to make this process gradual and find issues early, so that they are cheap to fix.</p>
<p>Overall, it seems that software is more like clay than concrete. It slowly goes from soft to hard&#8230; but you have a chance to make changes during the process.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1411"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1411" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1411" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1411/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>msys git &#8211; error could not allocate cygwin heap</title>
		<link>http://jakob.engbloms.se/archives/1403?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1403#comments</comments>
		<pubDate>Wed, 04 May 2011 11:12:04 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[desktop software]]></category>
		<category><![CDATA[off-topic]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[bugs]]></category>
		<category><![CDATA[msysgit]]></category>
		<category><![CDATA[tortoisegit]]></category>
		<category><![CDATA[Windows]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1403</guid>
		<description><![CDATA[I am using TortoiseGit on Windows for a while now, and it works OK. However, today, it just stopped working. The error I got persistently was: 0 [main] us 0 init_cheap: VirtualAlloc pointer is null, Win32 error 487 AllocationBase 0x0, BaseAddress 0x68540000, RegionSize 0x480000, State 0x10000 c:\msysgit\bin\sh.exe: *** Couldn't reserve space for cygwin's heap, Win32 [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/05/tortoise-git-logo.png"><img class="alignleft size-full wp-image-1404" title="tortoise git logo" src="http://jakob.engbloms.se/wp-content/uploads/2011/05/tortoise-git-logo.png" alt="" width="95" height="55" /></a>I am using <a href="http://code.google.com/p/tortoisegit/">TortoiseGit </a>on Windows for a while now, and it works OK. However, today, it just stopped working. The error I got persistently was:</p>
<pre>0 [main] us 0 init_cheap: VirtualAlloc pointer is null,
Win32 error 487 AllocationBase 0x0, BaseAddress 0x68540000,
RegionSize 0x480000, State 0x10000
c:\msysgit\bin\sh.exe:
*** Couldn't reserve space for cygwin's heap, Win32 error 0
</pre>
<p><span id="more-1403"></span>More than mildly annoying.</p>
<p>I tried searching the web, and found a number of discussions on similar issues. It was not easy to find one that worked, but in the end it turns out that playing with the base address of the <tt>msys-1.0.dll</tt> file worked. The error is not really in TortoiseGit per se, but rather in <a href="http://code.google.com/p/msysgit/">msysgit </a> (which tortoisegit relies on to actually do its work).</p>
<p>The magic incantation that I wound up using:</p>
<pre>c:\msysgit\bin&gt;rebase.exe -b 0x50000000 msys-1.0.dll</pre>
<p>Posting it here for the benefit of any other poor soul who is hit by the same. Apparently, you might have to change to use different other addresses.</p>
<p>The details of my setup for reference:</p>
<ul>
<li>msysgit 1.7.4</li>
<li>tortoisegit 1.6.5.0</li>
<li>Windows 7, 64-bit</li>
</ul>
<p>&nbsp;</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1403"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1403" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1403" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1403/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Wind River Blog: Being Helpful or Being Correct?</title>
		<link>http://jakob.engbloms.se/archives/1366?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1366#comments</comments>
		<pubDate>Fri, 11 Feb 2011 08:12:23 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[embedded software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Wind River Blog]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1366</guid>
		<description><![CDATA[   ]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="Wind River Logo" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" />There is a new post at my Wind River blog, about <a href="http://blogs.windriver.com/engblom/2011/02/being-helpful-or-simply-correct.html">warnings in virtual platforms</a>. It is an art to add good warnings to virtual platform models, and just being correct visavi the hardware behavior is not necessarily that helpful for a software developer. A virtual platform should warn about suspicious operations, even if they are technically &#8220;correct&#8221;.</p>
<p>I also have to apologize for the slow blogging in January of 2011. There was too much going on at work and quite a few days taking care of sick kids. Hopefully, the pace can improve going forward.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1366"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1366" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1366" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1366/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Modeling Endianness</title>
		<link>http://jakob.engbloms.se/archives/1336?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1336#comments</comments>
		<pubDate>Sun, 26 Dec 2010 15:58:19 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[big-endian]]></category>
		<category><![CDATA[endianess]]></category>
		<category><![CDATA[hardware modeling]]></category>
		<category><![CDATA[little-endian]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1336</guid>
		<description><![CDATA[Endianness is a topic in computer architecture that can give anyone a headache trying to understand exactly what is happening and why. In the field of computer simulation, it is a pervasive problem that takes some thinking to solve in an efficient, composable, and portable way. This blog post describes how I am used to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/12/egg.png"><img class="alignleft size-full wp-image-1337" style="margin: 5px 10px;" title="egg" src="http://jakob.engbloms.se/wp-content/uploads/2010/12/egg.png" alt="" width="74" height="66" /></a><a href="http://en.wikipedia.org/wiki/Endianness">Endianness </a>is a topic in computer architecture that can give anyone a headache trying to understand exactly what is happening and why. In the field of computer simulation, it is a pervasive problem that takes some thinking to solve in an efficient, composable, and portable way.</p>
<p>This blog post describes how I am used to working with endianness in virtual platforms, and why this approach makes sense to me. There are other ways of dealing with endianness, with different trade-offs and overriding goals.</p>
<h2><span id="more-1336"></span>Fundamentals</h2>
<p>What is endianness? In my way of looking at it, it is the arbitrary solution to the problem you get when a large unit of information (say, a 32-bit word) needs to be stored as a set of smaller units (say, 8-bit bytes). When this happens, you need to split the large unit into smaller units, and decide on how to order the smaller units. There is no objectively better or worse way to do this &#8211; as long as the result is unambiguous and based on positional numerics (i.e., no roman numerals, please), it is hard to claim that one order is better than another.</p>
<p>We use &#8220;endianness&#8221; all time without really thinking about it, when we write regular decimal numbers. In our <a href="http://en.wikipedia.org/wiki/Hindu_numerals">standard </a>base-10 decimal writing system, any value &gt;9 has to be written down using multiple digits. The order we use is a big endian representation: the most significant numbers come first in our reading order (hundreds before tens before single digits, etc.).</p>
<p>In computer architecture, we have three main schools of endianness:</p>
<ul>
<li>No endian, where we never break things down to bytes but always operate on equal-size words (not very common in practice, but certain machines like the Microchip PIC have instruction ROMs as wide as the instructions, and no way to address components of the intructions)</li>
<li>Big endian, BE, where the most significant bytes are put first in order of ascending addresses. I.e., the &#8220;big end&#8221; comes first.</li>
<li>Little endian, LE, where the least significant bytes are put first</li>
<li>&#8220;Middle endian&#8221;, where the ordering differs for different sizes of data (<a href="http://en.wikipedia.org/wiki/Endianness">Wikipedia </a>mentions this, but I have never seen an example). I have heard stories about chips that also used different endianness to store data by different instructions (by misdesign, I am not referring to the Power Architecture load/store byte-reversed instructions).</li>
</ul>
<p>BE is the traditional choice of IBM and the major early RISC chips, with Power Architecture, MIPS, SPARC, and the zSeries as the most important representatives. LE is the choice of x86, and more recently ARM. MIPS also seems to be gravitating towards LE, probably as a way to make x86 software slightly easier to port. Note that even though some processor cores are described as endianness-neutral, that really means that they can run as either LE or BE. In practice, particular chip designs incorporating such cores tend to lean heavily towards one endianness, since devices are designed for a particular endianness.</p>
<h2>The Software View</h2>
<p>For me, the most important view of endianness is how the software sees it. When a program is running on any current architecture, it logically sees memory as an array of bytes. Inside the memory chips, we have a very different physical layout, usually with words much wider than a byte, as well as an addressing scheme that is not one-dimensional. The interconnect (&#8220;bus&#8221;) moving data from a processor to memory and back is a complex system containing caches, buses of different widths (usually 64 bits or more), memory controllers, cache controllers, bus bridges, and other devices. All of this is usually completely invisible to software, as illustrated below:</p>
<p style="text-align: center;"><img class="aligncenter" style="margin-top: 5px; margin-bottom: 5px;" title="endianness 1" src="http://jakob.engbloms.se/wp-content/uploads/2010/12/endianness-1.png" alt="" width="504" height="389" /></p>
<p style="text-align: left;">Basically, the bus system is invisible. The important endianness property as far as software is concerned is the order in which bytes are put into memory, and memory is considered as an array of bytes (since a byte is the smallest unit of addressing). If you look at the memory of a computer system using a debugger, this is the view you will get &#8211; both for on-target and off-target debuggers like ICE units and JTAG debuggers. Each memory access (store or load) will logically pass a small array of bytes into some position in the very large array that is memory.</p>
<h2 style="text-align: left;">The Modeling View</h2>
<p style="text-align: left;">Modeling endianness is not optional when building a virtual platform. The software will at some point assume a certain relationship between word layouts and byte addresses in memory (such as overlaying a byte array on an integer in a C union), or when interpreting network packets (which are defined to use BE byte order, and therefore network code has to convert values to native endian to process them).</p>
<p style="text-align: left;">If you start from the software view of endianness and memory, the obvious simulation model for memory operations is to maintain the array of bytes view of memory matching the physical target.</p>
<ul>
<li>Each memory access from a simulated processor gets turned into a transaction in the simulator.</li>
<li>The transaction has variable size, matching the size of the memory access operation issued by the processor.</li>
<li>The transaction contains a sequence of bytes, in the same order as they would end up in target memory on a physical machine. I.e., the order reflects the endianness of the processor.</li>
<li>The transaction has a starting address (byte-based) matching the memory access the processor issues.</li>
<li>The contents of the memory model in the simulation is an array of bytes, and its content matches what you would find on the physical target &#8211; the logical software view of the target.</li>
<li>The bus system connecting the processor to the memory is basically considered as a black box that just moves the transaction to memory.</li>
</ul>
<p>The above is very easy to implement, and actually a very convenient implementation for someone used to the software view of hardware. The only thing that remains to be considered is how a processor simulator is implemented in practice.</p>
<p>In a typical processor simulator, you represent the target system registers using words of the same size as the target processor uses. I.e., for a 32-bit processor, you use 32-bit words on the host to represent the contents of a register. As the processor model is running, the contents of the register might have to stored in data structures internal to the processor (such as an array of words representing the register file). Naturally, such data structures are kept in host endianness since they are just plain compiled C code. As the processor model runs, arithmetic is carried out using host endianness.</p>
<p>Actually, usually no endianness is involved as the values are considered as words. Remember that a word does not have endianness until it is broken down into bytes and someone actually looks at the bytes. In particular, an operation like</p>
<pre>uint8  a;
uint32 b;
a = (b &amp; 0xff)</pre>
<p>will pick up the 8 lowest bits of a word on any processor. The code is logically working inside of registers and is perfectly portable. However, the result of</p>
<pre>uint32 *c;
*c = b;
a = *((uint8 *)c);</pre>
<p>will pick up the first (at the lowest address) byte stored in memory when b was written &#8211; which is the same as the above on an LE processor, but different on a BE processor. The crucial observation here is that the latter variant contains an explicit store of a word, and an explicit load of a byte. Thus, endianness enters as we store the word (the byte load has no endianness, as it is loading the smallest unit of addressability).</p>
<p>What this means is that a processor simulator will have to do an explicit ordering of bytes as it is writing out values to memory. The simulator will need to take a word it has represented in &#8220;host order&#8221; (as it is within the simulator itself) and convert it to the byte order of the target processor. If the two match, such as simulating a little-endian ARM target on a (always little-endian) x86 host, nothing needs to be done. If they do not match, such as simulating a big-endian PPC target on an x86 host, the bytes have to be swapped before being sent to simulated memory.</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/12/endianness-2.png"><img class="aligncenter size-full wp-image-1340" title="endianness 2" src="http://jakob.engbloms.se/wp-content/uploads/2010/12/endianness-2.png" alt="" width="422" height="368" /></a>When the processor does a load, it similarly has to swap the bytes being read from memory (if using different target and host endianness).</p>
<p>As soon as we leave the processor simulator, the order of bytes in transactions and simulated memory has to defined and managed in a host-independent way. This is crucial to enable  snapshots of memory to be <a href="http://blogs.windriver.com/engblom/2010/08/transporting-bugs-with-checkpoints.html#more">shared across hosts, time, and space</a>, and simply to allow the simulation to work correctly. The semantics of the simulation must be defined by the simulator, not by the nature of the host.</p>
<p>Note that as an optimization, quite often we do not create an explicit transaction, but rather use the optimization of letting the processor simulator write directly to the representation of the target memory in the memory simulator. In this design, the target memory representation is just an array of bytes mirroring the contents that the processor would see on a physical target.</p>
<p>Let&#8217;s go through this with a simple example. We assume we are on an x86 host. Our processor simulator contains a 32-bit register with the value 0&#215;01020304. This value is endianless until we have to send it to simulated memory, it is just a value of 32 bits. We write it to target memory at address 0&#215;100</p>
<p>On a simulated LE target, the memory write will result in a transaction containing the byte sequence (0&#215;04, 0&#215;03, 0&#215;02, 0&#215;01) &#8211; lowest byte comes first. The memory model will store this with 0&#215;04 at address 0&#215;100, 0&#215;03 at 0&#215;101, etc. The processor model can achieve this effect by simply doing a host-native word store to the memory array.</p>
<p>On a simulate BE target, the memory write will result in a transaction containing (0&#215;01, 0&#215;02, 0&#215;03, 0&#215;04). In memory, 0&#215;01 will be stored at address 0&#215;100, 0&#215;02 at 0&#215;101, etc. To store this word correctly, the processor model will have to do a byte swap operation on the word before writing it out to memory. Such a byte swap operation might seem expensive, but the evidence does not indicate that it matters. All the fastest instruction-set simulators use this method internally as far as I know (Wind River Simics, Imperas OVP, Qemu, IBM Mambo), which to me indicates that the design works well on a simulation system level.</p>
<h2>Device Models</h2>
<p>Device models are the main part of a functional simulator for a computer system. They also have endianness, as they expose memory-mapped interfaces to software. To deal with devices in a consistent manner, they will interpret inbound memory transactions using their local register endianness. This makes it simple and reliable to simulate systems where the processor and the devices have different endianness.</p>
<p>Systems with mixed device endianness is very common, mostly thanks to PCI. PCI is defined to use little-endian byte ordering in all memory accesses, as it originated in the x86 world. PCI is still being used in almost all computer systems, and thus LE PCI devices are being connected to BE processors.</p>
<p>Internally, a device model will also use words to represent data. When data is written to a device, it will interpret the bytes in the write transaction using its local order. When data is read from a device, it will fill in the data in the read transaction using its local order.This makes device drivers that byte-swap incoming data from an LE PCI device on a BE processor work just like they do on physical hardware.</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/12/endianness-3.png"><img class="aligncenter size-full wp-image-1341" title="endianness 3" src="http://jakob.engbloms.se/wp-content/uploads/2010/12/endianness-3.png" alt="" width="473" height="414" /></a>This makes endianness a local property of the device. The same device model can be used without change in both an LE and a BE target system. This mirrors reality: PCI devices are used in all kinds of systems, and the devices do not change, and neither do the models have to.</p>
<p>In some systems, the designers try to hide the RISC-processor-to-PCI endianness mismatch by making the hardware swap bytes around as they move from the memory bus into the PCI subsystem. If this is the case in a target system, the simplest simulation method is to insert an byte-swapping intermediary on the path from the processor to the devices. This will do an extra byte swap on all transactions passing by, and things will work correctly (note that this byte swap has to be defined to work on a certain word length, and if transactions are bigger than this length, you will also have to order the words).</p>
<p>Note that as long as all units involved on the path from a device to a processor use the same word length, you can replace all the byte swapping operations with a simple flag. This flag will indicate if a transaction has been swapped or not. For example, when we have a BE processor talking to a BE device, on an LE host. The BE processor will flag the transaction as &#8220;wrong-endian&#8221; as it sends it out but actually store the bytes in LE order in the transaction. The BE device will check the flag and realize that it is wrong-endian too. And since two wrongs make a right, it does not have  to swap the bytes either but can copy the transaction contents directly into its internal registers.</p>
<h2>Dealing with Data</h2>
<p>There are other things you want to do with a memory image in a virtual platform apart from reading and writing it from a processor. One particular task is to move data into and out of memory model in order to load code and data, as well as to save the state of the system. The representation of a memory as an array of bytes works very well for this approach, since it corresponds naturally to how software files are created on the host. Since most software files are intended to be loaded by the target into target memory, they are prepared in target byte order. Another advantage of using a byte-based memory representation is that file formats like ELF can be loaded straight into virtual memory without having to convert addresses.</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/12/endianness-5.png"><img class="aligncenter size-full wp-image-1344" title="endianness 5" src="http://jakob.engbloms.se/wp-content/uploads/2010/12/endianness-5.png" alt="" width="495" height="395" /></a>The representation is also host-independent, which facilitates moving memory images from one host to another, a key part of <a href="http://jakob.engbloms.se/archives/1235">using virtual platforms as a communications mechanism</a>. Another benefit of viewing memory as an array of bytes as accessed from a processor is that debuggers can look at memory in the same way as they would when running on the same host.</p>
<h2>Summary</h2>
<p>This long post (WordPress tells me it is more than 2500 words) really only starts to scratch the surface of this fascinating topic. It has described one approach to endianness modeling, and some of the subtleties involved. There are many more subtleties that we could go into.</p>
<h2>Footnote: SystemC TLM-2.0</h2>
<p>There are other ways to model endianness. In particular, the approach described here is not used in the SystemC TLM-2.0 standard. In TLM-2.0, all data is stored in a transaction in <em>host</em> order, not target order. To model the target endianness, you instead change a descriptor array that tells the simulator about how to interpret the bytes when viewed from the target.</p>
<p>As I see it, this means that TLM-2.0 is better suited for modeling the ins and outs of a bus system, including discovering how data ends up at a target from the actions of the various components of the bus system. It models byte lanes and the width of buses, and uses host byte order for all transfers of data. In contrast, the approach described in this blog post works by modeling the documented (or intended) effect of the hardware at the software level.</p>
<p>Overall, I would say that TLM-2.0 is slightly more geared towards the &#8220;<a href="http://jakob.engbloms.se/archives/1083">design&#8221; use of modeling, rather than &#8220;describe</a>&#8220;. By modeling bus widths, actual byte lanes, and other concepts, the simulator will discover the shape and endianness of data as it arrives at a target memory or device.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1336"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1336" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1336" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1336/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Wind River Blog: Iterative Hardware-Software Interface Design</title>
		<link>http://jakob.engbloms.se/archives/1356?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1356#comments</comments>
		<pubDate>Wed, 22 Dec 2010 19:00:10 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[embedded software]]></category>
		<category><![CDATA[embedded systeme]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Wind River Blog]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1356</guid>
		<description><![CDATA[   ]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="Wind River Logo" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" />There is a new post at my Wind River blog, about <a href="http://blogs.windriver.com/engblom/2010/12/iterating-the-hardware-software-design.html">iterative hardware-software interface design</a>. It is a discussion with some examples of why hardware designers would do well to use virtual platforms to include software designers in the loop when designing new devices and their programming interfaces.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1356"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1356" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1356" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1356/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

