<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Observations from Uppsala &#187; virtual machines</title>
	<atom:link href="http://jakob.engbloms.se/archives/category/virtual/virtual-machines/feed" rel="self" type="application/rss+xml" />
	<link>http://jakob.engbloms.se</link>
	<description>Computer Technology: Simulation, Virtualization, Virtual Platforms, Embedded, Multicore and Multiprocessing (by Jakob Engblom)</description>
	<lastBuildDate>Sun, 29 Jan 2012 19:45:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<image>
    <title>Observations from Uppsala</title>
    <url>http://jakob.engbloms.se/favicon.png</url>
    <link>http://jakob.engbloms.se</link>
    <width>32</width>
    <height>32</height>
    <description>Observations from Uppsala - http://jakob.engbloms.se</description>
    </image>		<item>
		<title>Wind River Blog: Surfing the Web with Netscape 4</title>
		<link>http://jakob.engbloms.se/archives/1485?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1485#comments</comments>
		<pubDate>Wed, 14 Sep 2011 03:06:25 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[history of computing]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[Wind River Blog]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1485</guid>
		<description><![CDATA[Just for fun, I tried to surf the web of today using a Netscape 4 browser from 2001. The result: not exactly useful. Netscape 4 was bad back then, and it does not work at all with the current style of web coding. Tweet]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="Wind River Logo" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" />Just for fun, I <a href="http://blogs.windriver.com/engblom/2011/09/surfing-the-web-with-netscape-4-1.html">tried to surf the web of today using a Netscape 4 browser </a>from 2001.</p>
<p>The result: not exactly useful. Netscape 4 was bad back then, and it does not work at all with the current style of web coding.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1485"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1485" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1485" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1485/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Wind River Blog: Working Smarter, not Harder</title>
		<link>http://jakob.engbloms.se/archives/1380?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1380#comments</comments>
		<pubDate>Thu, 24 Feb 2011 13:16:19 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[embedded systeme]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[Wind River Blog]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1380</guid>
		<description><![CDATA[   ]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="Wind River Logo" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" />There is a new post at my Wind River blog, <a href="http://blogs.windriver.com/engblom/2011/02/working-faster-and-with-less-sweat.html">about how you can use a virtual platform to complete work faster</a>. Not by making the virtual platform execution of target code faster, but by optimizing the way you work and taking advantage of the features of a virtual platform.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1380"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1380" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1380" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1380/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Additional Notes on Transporting Bugs with Checkpoints</title>
		<link>http://jakob.engbloms.se/archives/1231?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1231#comments</comments>
		<pubDate>Wed, 15 Sep 2010 05:38:42 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[appearances]]></category>
		<category><![CDATA[articles]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Checkpointing]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[S4D]]></category>
		<category><![CDATA[Simics]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1231</guid>
		<description><![CDATA[This post features some additional notes on the topic of transporting bugs with checkpoints, which is the subject of a paper at the S4D 2010 conference. The idea of transporting bugs with checkpoints is some ways obvious. If you have a checkpoint of a state, of course you move it. Right? However, changing how you [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/09/S4D1.jpg"><img class="alignleft size-full wp-image-941" style="margin: 5px 10px;" title="S4D" src="http://jakob.engbloms.se/wp-content/uploads/2009/09/S4D1.jpg" alt="" width="143" height="62" /></a>This post features some additional notes on the topic of transporting bugs with checkpoints, which is the subject of a paper at the <a href="http://www.ecsi.me/s4d">S4D </a>2010 conference.</p>
<p>The idea of transporting bugs with checkpoints is some ways obvious. If you have a checkpoint of a state, of course you move it. Right? However, changing how you think about reporting bugs takes time. There are also some practical issues to be resolved. The S4D paper goes into some of the aspects of making checkpointing practical.</p>
<p><span id="more-1231"></span>In particular, we need the checkpoints to be:</p>
<ul>
<li>Portable &#8211; so that checkpoints can be copied around between computers</li>
<li>Deterministic- so that everyone opening a checkpoint sees the same behavior</li>
<li>Compact &#8211; so that they can actually be moved around without incurring undue pain</li>
<li>Differential &#8211; so that a checkpoint can build on previous state and just contain a set of changes, not the entire state of the target system</li>
</ul>
<p>Most of my paper is spent on how to make checkpoints small enough to be easily transported, and how it fits with development workflows. The requirements above would seem to be common sense, but there are checkpointing systems out there that do not fulfill them. In particular, the portability aspect is hard to get right.</p>
<p>There are other ways to achieve transportation of bugs, and this blog post will fill in on some related work that I could not fit into the paper or which I discovered only after the final version of the paper was submitted.</p>
<h3>Record-Replay Systems</h3>
<p>There seem to be boundless creativity in creating methods to record live systems and replay their inputs/outputs/internal behavior/other interesting behavior on another system to support debug or analysis or other tasks. It shows just how important the replication of bugs is to the development of systems, and just how hard it is to accurately capture a bug in practice.</p>
<p>The company called <strong>Zealcore </strong>was doing some interesting work in software-based recording of &#8220;only the relevant events&#8221;, and then replaying this on a lab machine. Their angle on the problem was to have software record a minimal trace of important events on a live system, and then control the runtime system in a lab to replicate the event trace. Making this efficient and precise was the subject of a sequence of research papers in the early 2000s. Zealcore was acquired by Enea in 2008, and I have not seen much from them since. From what I can tell, the Zealcore fundamental technology for recording on a live system (or at least the ideas) have been continued into a new company called <strong><a href="http://www.percepio.se/">Percepio</a></strong>.</p>
<p>Aa fundamental difference between these recording systems and checkpointing systems is that they do not capture the complete target system state in the way a checkpoint does. The recording is much more compact, but it does not really solve the same problem. It is not based on running the target inside a simulator (other than at the replay end). What the relative success of such recording system indicates, however, is that in many systems, there are &#8220;important&#8221; and &#8220;irrelevant&#8221; aspects of inputs and events and behaviors, and that recording and replaying only &#8220;important&#8221; aspects is often sufficient to trigger bugs.</p>
<p>You can also throw hardware at the problem.</p>
<p>Completely unexpectedly, I also found a reference to a hardware-based record/replay system in a <a href="http://cacm.acm.org/magazines/2010/8/96632-an-interview-with-edsger-w-dijkstra/fulltext">Communications of the ACM interview with Edsger Dijkstra</a> (a rerun of an <a href="http://www.cbi.umn.edu/oh/pdf.phtml?id=296">interview from 2002</a>). Apparently, during the early programming of the <strong>IBM 360</strong>, IBM realized that debugging interrupts was hard. The solution was to create a piece of special hardware which would record interrupts, and later replay them with precise timing. In this way, you achieved repeatable executions of the most difficult code there was. I must quote what Dijkstra says on this &#8220;throw money at the problem&#8221; approach:</p>
<blockquote><p>When IBM had to develop the software for the 360, they built one or two machines especially equipped with a monitor. That is an extra piece of machinery that would exactly record when interrupts took place and from where to where. And if something went wrong, it could replay it again and use the recorded history to control when interrupts would occur. So they made it reproducible, yes, but at the expense of much more hardware than we could afford. Needless to say, they never got the OS/360 right.</p></blockquote>
<p>The final comment is typical for Dijkstra&#8217;s thinking that debugging is just an indication that you did not get the program and design right from the start. That&#8217;s certainly true, and he would likely have considered my little S4D paper as an unnecessarily complicated solution to a problem that should not have existed in the first place.</p>
<p>I, however, find the idea of the monitor interesting. I think that building something like that today would be much more difficult, as chips are very highly integrated and the support for replaying interrupts would have to go right into the heart of an SoC. But it would be interesting if it could be done.</p>
<p>There is also a <a href="http://jakob.engbloms.se/archives/130">paper from 1969 that I wrote about a few years ago </a>that does include the idea of recording and replaying asynchronous external inputs to a simulator.</p>
<h3>Other Checkpointing Systems</h3>
<p>There might be some related use of checkpoints (or snapshots as they are more commonly known) in the development of game emulators.  There is clearly the ability to save game state in a portable way in emulators like MAME.  Such states can be useful to help debug the emulator, but in a different way from the approach that I presented.  In the emulator case, the state is really the state of the emulated target.  It is not the state of the emulator program itself. If game emulator snapshots were used to debug the game code, it would be the same situation as what I describe in the S4D paper.</p>
<p>As I understand it, this is more like a attaching an example document that makes a program crash to a bug report, rather than transporting the state of the emulator itself.</p>
<p>Going down in the level of abstraction, I have also been told that RTL simulators offers a similar ability and that they have used in a similar way. Since I am not at all familiar with that field, I would not comment on this in the paper.</p>
<p>Transporting RTL bugs using checkpoints makes perfect sense. In an RTL simulator, the target state is very clearly described in an unambiguous way with no  relationship to host state. Checkpointing should be easy to implement and checkpoints should be portable, anything else would be a poor implementation.  The simulation is also deterministic, assuming a reasonable implementation of the simulator. The simulated world is also encapsulated with a set of test cases, RTL simulations are too slow to be interfaced to the real world. If an RTL simulator is interfaced to something else, recording the incoming signals should be straightforward since they are at a very low level (bits, clocks, pin states).</p>
<p>The use of checkpointing with RTL also fits with a conversation I had in 2005 when Virtutech introduced reverse execution in Simics. At one of the tradeshows where we showed the technology, an older gentleman approach me and told me that he had done similar things with hardware simulators back in the 1980s. He immediately understood the implementation idea (checkpoints with deterministic replay), and sounded like he felt it was nothing much new.</p>
<p>Finally, at some other event last year, I saw an demonstration of an RTL-level tool where the trace of the execution was generated on one machine, but inspected on a different machine. That amounts to a portable trace, even if the data volumes were rather large (many GB) and essentially required the RTL simulator (or hardware accelerated emulator) to be sharing disks with the investigation machine. Still, nothing prevents such a solution from being remotely used. The main difference from what I describe is that here only the result of the execution (trace of signals) is transported, not an actual state snapshot that can be brought up to continue the execution in a different place.</p>
<p>If you have any other notes on this, please comment!</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1231"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1231" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1231" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1231/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>S4D Paper on Transporting Bugs with Checkpoints</title>
		<link>http://jakob.engbloms.se/archives/1235?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1235#comments</comments>
		<pubDate>Tue, 31 Aug 2010 18:40:55 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[appearances]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[virtualization]]></category>
		<category><![CDATA[Wind River Blog]]></category>
		<category><![CDATA[Checkpointing]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[S4D]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1235</guid>
		<description><![CDATA[I have a paper about &#8220;Transporting Bugs with Checkpoints&#8221; to be presented at the S4D (System, Software, SoC and Silicon Debug) conference in Southampton, UK, on September 15 and 16, 2010. The core concept presented is to leverage Simics checkpointing to capture and move a bug from the bug reporter to the responsible developer. It [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/09/S4D1.jpg"><img class="alignleft size-full wp-image-941" style="margin: 5px 10px;" title="S4D" src="http://jakob.engbloms.se/wp-content/uploads/2009/09/S4D1.jpg" alt="" width="143" height="62" /></a>I have a paper about &#8220;Transporting Bugs with Checkpoints&#8221; to be presented at the <a href="http://www.ecsi.me/s4d">S4D (System, Software, SoC and Silicon Debug) conference </a>in Southampton, UK, on September 15 and 16, 2010. The core concept presented is to leverage <a href="http://www.windriver.com/products/simics/">Simics </a>checkpointing  to capture and move a bug from the bug reporter to the responsible  developer. It is a fairly simple idea, but getting it to work  efficiently does require that some things are done right. See the longer <a href="http://blogs.windriver.com/engblom/2010/08/transporting-bugs-with-checkpoints.html#more">Wind River blog posting </a>about this topic for a few more details.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1235"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1235" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1235" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1235/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Wind River Blog: Interview with a Virtualization Researcher</title>
		<link>http://jakob.engbloms.se/archives/1223?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1223#comments</comments>
		<pubDate>Sun, 29 Aug 2010 07:44:15 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[virtualization]]></category>
		<category><![CDATA[Wind River Blog]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1223</guid>
		<description><![CDATA[Past Friday, I posted a new blog post in my Wind River blog. It is an interview the PhD student Girish Venkatasubramanian from the University of Florida. He is doing research on virtual machines/hypervisors and how they can be implemented more efficiently by making fairly small changes to the architecture of memory management units. The [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png"><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="Wind River Logo" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" /></a>Past Friday, I posted a new blog post in my Wind River blog. It is an <a href="http://blogs.windriver.com/engblom/2010/08/interview-with-girish-venkatasubramanian.html">interview the PhD student Girish Venkatasubramanian </a>from the University of Florida. He is doing research on virtual machines/hypervisors and how they can be implemented more efficiently by making fairly small changes to the architecture of memory management units.</p>
<p><span id="more-1223"></span></p>
<p>The area of virtualization is one that I would definitely have looked  at as an opportunity had I started out as a PhD student today. The work  of his group is a good example of how Simics is being used for <a href="http://blogs.windriver.com/engblom/2010/07/academic-simics.html">research and teaching in universities</a> around the world.</p>
<p>Going one level up in abstraction, I note that this is probably the first time I have published an actual interview. I have been active in writing things since high school, but it has pretty much always been direct writing, not interviewing.  However, I really hope that this is not the last. Having a series of user interviews on the Wind River blog could be really neat, as a way to dive deeply into some particular areas of technology. Will be interesting to see if any other university user is interested in being featured.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1223"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1223" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1223" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1223/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>VirtualBox SMP</title>
		<link>http://jakob.engbloms.se/archives/1212?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1212#comments</comments>
		<pubDate>Fri, 20 Aug 2010 18:04:40 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[multicore software]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[CECsim]]></category>
		<category><![CDATA[SMP]]></category>
		<category><![CDATA[VirtualBox]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1212</guid>
		<description><![CDATA[I listened to an interesting FLOSS Weekly interview with Adam Hall and Achim Hasenmuller of VirtualBox. For someone interested in virtual machines and hardware simulation, the interview was full of interested tidbits. I think the best part was the discussion on multiprocessing in Virtualbox. VirtualBox is able to give a guest OS up to 32 [...]]]></description>
			<content:encoded><![CDATA[<p>I listened to an interesting <a href="http://twit.tv/floss130">FLOSS Weekly interview </a>with Adam Hall and Achim Hasenmuller of <a href="http://www.virtualbox.org/">VirtualBox</a>. For someone interested in virtual machines and hardware simulation, the interview was full of interested tidbits. I think the best part was the discussion on multiprocessing in Virtualbox.</p>
<p><span id="more-1212"></span>VirtualBox is able to give a guest OS up to 32 virtual processors for its use. The system virtualizes cores so that you can allocate more cores than you have on your host. You don&#8217;t want more active cores than you have physically, or performance might suffer badly (there is a <a href="http://blogs.sun.com/jsavit/entry/virtual_smp_in_virtualbox_3">good blog post hosted by blogs.sun.com about VirtualBox 3 and its SMP support</a>, read it before Oracle decides to kill off all the old content&#8230;).</p>
<p>Second, the development of that SMP support only took some six months. But getting it to work right took 18 months. Sounds like a familiar story for parallelizing software of this kind. The interviewees made it sound like the code for this was utterly complex, and I can believe that too.</p>
<p>VirtualBox makes extensive use of hardware virtualization support on x86  hosts to enhance performance (Intel VT-X and AMD AMD-V). As they see  it, all other alternatives are inferior, including the VmWare tradition  of binary translation and patching. They claimed that VmWare actually  interpreted all of the code in a guest, but I find that a bit hard to  believe.</p>
<p>It is interesting to compare their approach to something like Simics, which was made parallel in the Spring of 2008 with Simics 4.0 Acclerator. The advantage an IT run-time virtual machine like VirtualBox has over a virtual platform like Simics in creating a parallel system is that they can make use of the cache coherence on the host to propagate information between the cores. A virtual platform cannot assume that the host and target are of the same type, which is necessary for this to make sense. It also makes VirtualBox entirely nondeterministic, but that is just plain normal for a physical computer system. One interesting intermediate form here is the IBM CECsim system, where a z-series mainframe is used to simulate a slightly different z-series mainframe, including running multiple simulated processors in parallel. CECsim also makes use of the hardware cache coherence and is nondeterministic.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1212"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1212" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1212" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1212/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>FFast: Good Idea, Too Bad About the Implementation</title>
		<link>http://jakob.engbloms.se/archives/1114?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1114#comments</comments>
		<pubDate>Sun, 11 Apr 2010 19:23:29 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[.net]]></category>
		<category><![CDATA[Antoine Trouvé]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[CLR]]></category>
		<category><![CDATA[FFast]]></category>
		<category><![CDATA[Kazuaki Murakami]]></category>
		<category><![CDATA[RAPIDO]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1114</guid>
		<description><![CDATA[I just read a short paper by Antoine Trouvé and Kazuaki Murakami from the RAPIDO 2010 workshop on &#8220;rapid simulation and performance evaluation&#8221;. The paper is &#8220;FFast: Efficient Application of Compiled Simulation Techniques To A Fast ISS Over a Virtual Machine&#8221;. It explores the interesting idea of how an existing virtual machine infrastructure can be [...]]]></description>
			<content:encoded><![CDATA[<p>I just read a short paper by Antoine Trouvé and Kazuaki Murakami from the <a href="http://www2.lifl.fr/rapido/Rapido/Program.html">RAPIDO 2010</a> workshop on &#8220;rapid simulation and performance evaluation&#8221;. The paper is &#8220;FFast: Efficient Application of Compiled Simulation Techniques To A Fast ISS Over a Virtual Machine&#8221;. It explores the interesting idea of how an existing virtual machine infrastructure can be used to build a fast instruction-set simulator, and in the extension, a full system simulator.</p>
<p>To me, this idea is worth exploring, since using a mature VM like the .net CLR (used in this paper) or a JVM would offer a shortcut to get high-quality code generation for a JIT compiler. It could also offer other benefits, as these environments support many advanced configuration and management features. I have touched on this topic before, in the posts &#8220;<a href="http://jakob.engbloms.se/archives/1008">Dream ESL Language</a>&#8221; (VM as the basis for a simulator) and &#8220;<a href="http://jakob.engbloms.se/archives/264">The JVM as Universal Parallel Glue</a>&#8221; (that a common VM can  offer huge benefits for an ecosystem).</p>
<p><span id="more-1114"></span></p>
<p>In the paper, the authors show how they have built an ISS for MIPS which runs at 1 MIPS in basic interpretive mode, but at up to 225 MIPS in the most optimized mode. Decent performance on a 2.6 GHz Core 2, but still an order of magnitude compared to the fastest commercial offerings available. However, I think these numbers are not particularly interesting or relevant.</p>
<p>First of all, they only check the performance on basic user-level programs with no I/O, since there is nothing but the CPU present and they thus cannot run an operating system. This makes the numbers essentially &#8220;peak&#8221; numbers, for small programs, which is not particularly realistic. Second, their implementation does not go straight to bytecodes, but rather to C# code. This is not how a high-performance solution would work, as it is obvious that they struggle to get performance even on these small benchmarks using that approach. Too much effort seems to be spent on gaming the C# runtime system and compiler, in my opinion.</p>
<p>Thus, the paper does not really reveal anything useful in terms of &#8220;is building a JIT ISS using a VM a viable idea&#8221;? It probably tells us that it is not necessarily a broken idea, but there is a lot of work to bring the solution up to the level of native C-based solutions.</p>
<p>It is clear that the implementation effort in this case is lower, and that the porting cost to new hosts is also very low, compared to the native C-based approaches used in current industrial solutions. C# should also be more productive than using something like C++ for building a software system.</p>
<p>The most interesting aspect of the idea, and one which the authors do not explore at all, is using the power of the .net CLR to build a dynamic full-system simulator. Using the CLR, it should be trivial to build a solution where hardware models can be separately compiled and loaded dynamically at runtime. Using .net &#8220;properties&#8221;, it might be possible to support user inspection of a running system. Maybe the .net programming tools offer some really interesting possibilities for the debugging of full-system simulators. However, none of this is currently explored, which is a real shame. I guess I could hope that the authors read this short critique  and get some more ideas for future work, as I really think that virtual platforms could be built on top of virtual machines.  That idea is worth exploring in research.</p>
<p>On the nitpicking side, as always when reviewing academic papers in this blog, the authors seem to be unaware of some very relevant previous work. In particular, they should have mentioned Qemu and Simics. They could have used Qemu for MIPS as a point of comparison to compare speeds between their approach and a native C approach. As it is right now, the reference list looks like a fairly random walk around the DAC and DATE communities, but with little insight into the actual virtual platform or full-system simulation tools available today.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1114"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1114" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1114" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1114/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SiCS Multicore Day 2009</title>
		<link>http://jakob.engbloms.se/archives/905?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/905#comments</comments>
		<pubDate>Mon, 07 Sep 2009 19:26:27 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[appearances]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[multicore computer architecture]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[Anders Landin]]></category>
		<category><![CDATA[CPP]]></category>
		<category><![CDATA[Ericsson]]></category>
		<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Hazim Shafi]]></category>
		<category><![CDATA[heterogeneous]]></category>
		<category><![CDATA[homogeneous]]></category>
		<category><![CDATA[MCC]]></category>
		<category><![CDATA[Richard Kaufmann]]></category>
		<category><![CDATA[SiCS Multicore days]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[Visual Studio 2010]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=905</guid>
		<description><![CDATA[Last Friday, I attended this year&#8217;s edition of the SiCS Multicore Day. It was smaller in scale than last year, being only a single day rather than two days. The program was very high quality nevertheless, with keynote talks from Hazim Shafi of Microsoft, Richard Kaufmann of HP, and Anders Landin of Sun. Additionally, there was a [...]]]></description>
			<content:encoded><![CDATA[<p>Last Friday, I attended this year&#8217;s edition of the <a href="http://www.sics.se/node/4360">SiCS Multicore Day</a>. It was smaller in scale than <a href="http://jakob.engbloms.se/archives/283">last year</a>, being only a single day rather than two days. The program was very high quality nevertheless, with keynote talks from <a href="http://blogs.msdn.com/hshafi/">Hazim Shafi </a>of Microsoft, Richard Kaufmann of HP, and Anders Landin of Sun. Additionally, there was a mid-day three-track session with research and industry talks from the Swedish multicore community.<span id="more-905"></span></p>
<p>I think that for next year, the organizers need to find keynote speakers that are not from the general computing multicore world. The Microsoft talk this year was a step in that direction, as it rather came from multicore programming than multicore hardware. Richard and Anders gave very interesting and good talks, no doubt about it. But it would have been nice with someone from ARM or Freescale or Tensilica or TI or ST or Ericsson or Cisco talking about the kinds of multicore embedded hardware that is being developed and used today. For example, the &#8220;next new thing&#8221; touted by the keynotes this year was GPGPU. Interesting for HPC and desktops, certainly. But pretty irrelevant for most of the people that I know. GPUs are huge, expensive, and power hungry.</p>
<p>GPGPU was one part of the theme this year. It is definitely catching on as <em>the </em>way to do number crunching in the desktop, server, and HPC world. It is not the universal panacea for any kind of parallelism, however, as Hazim and I noted in the panel discussion that ended the day. There are applications (such as <a href="http://www.virtutech.com/whitepapers/accelerator.html">parallel Simics</a>&#8230;) that scale well on general-purpose cores, but that will never ever work on GPUs. In general, the class of problems that work on GPUs is pretty limited to massive data-parallel problems like image and video manipulation.</p>
<p>In the eternal homogeneous vs heterogeneous debate (follow <a href="http://jakob.engbloms.se/archives/tag/homogeneous">the tags </a>in my blog for more posts on this topic), GPGPU was grudingly accepted as a good candidate for something that will not be homogeneized with the main processors. Additionally, Richard Kaufmann gave some hints that Intel or AMD are coming out with new chips with more accelerators on board&#8230; I guess it will be security, as is already done by Sun and <a href="http://jakob.engbloms.se/archives/80">IBM</a>. When I brought up the topic of more accelerators like pattern matching, compression, and the other things we see in chips from Freescale, Cavium, and others, the response was very &#8220;can only be economical for very high volume applications&#8221;.</p>
<p>It is striking how the GPGPU idea is bringing the classic telecommunications DSP-data plane/CPU-control plane division into the desktop and server space. Without any recognition being paid or any experience being reused from the 40 years that that has been done in telecoms and consumer electronics&#8230; as Jack Ganssle often says, us embedded folks get no respect.</p>
<p>In terms of programming, this year was all about general programming languages. Hazim from Microsoft talked about (and demoed) the quite pervasive addition of parallelism to both native C/C++ and managed .net code in Visual Studio 2010. Microsoft is dead serious about parallel programming, and are bringing out a whole set of different libraries and support structures to allow <a href="http://blogs.msdn.com/pfxteam/archive/2009/08/12/9867246.aspx">easier expression of parallel code</a>. In the &#8220;LINQ&#8221; data query language subset of C#, you could add some easy modifiers to &#8220;foreach&#8221; statements to make them parallel, for example. Having a language that is your own and which you can extend at will certainly pays off in terms of innovation here. C++ moves far slower than C#, that is becoming clearer and clearer. C# and its cousins in the .net system seem to be sneaking in lots of powerful language design ideas from places like Python, and also results from Microsoft&#8217;s powerful group of language researchers.</p>
<p>When I tried to bring up the idea of using domain-specific languages to program parallel applications, Hazim had the wonderful comment that &#8220;that might be applicable in certain domains&#8230;&#8221; &#8212; yes, that is the idea. By being narrow in terms of target domains, you gain expressive power and semantic insight that helps move programming from &#8220;how&#8221; towards &#8220;what&#8221;. But it sounds like domain-specific is a foul word inside of Microsoft &#8212; when the audience asked whether LINQ was not a exactly a domain-specific language for data access, Hazim was a pains to point out that it is Turing-complete and that someone had managed to write a Raytracer using it&#8230; interesting. This feels more political than market-based. I guess Micro</p>
<p>Richard Kaufmann had some interesting notes on throughput vs TTC (time-to-completion) jobs in servers. In the &#8220;cloud computing&#8221; era, throughput is much easier to scale: just add more servers. Classic HPC is more oriented towards TTC, as you do want your results within a reasonable time. Quite often, you can most work into a throughput-oriented style by simply running lots of jobs in parallel rather than pushing through a series of jobs sequentially. Note however that we have the entire field of real-time control, real-time communications, etc., that do not work like this. But that is not the market that HP is building servers for, or that Intel and AMD are servicing.</p>
<p>Outside the keynotes, Per Holmberg of Ericsson gave an interesting presentation on the adoption of multicore in the control plane of the <a href="http://www.ericsson.com/ericsson/corpinfo/publications/review/2002_02/161.shtml">Ericsson CPP </a>platform. The core of his talk was the observation that in these kinds of systems, multicore is not such a big revolution.</p>
<p>They have been distributed since the beginning. Thus, scaling by adding more processors (with local memories) is easy and multicore is only a packaging change from that. Also, most performance-intense operations are already offloaded onto DSP groups, network processors, ASICs, or FPGAs. There is not much parallelism left for the control plane to exploit. Essentially, only functions that unexpectedly become performance bottlenecks due to changes in traffic patterns are likely candidates for parallellization. Interesting point, and might be <a href="http://jakob.engbloms.se/archives/703">why the EETimes noted that multicore is slow to catch on in communications </a>(the article is a bit flawed).</p>
<p>Patrik Nyblom from Ericsson held a talk about how the <a href="http://www.erlang.org">Erlang </a>runtime engine was parallelized. From a practical perspective, the most interesting aspect was that this made applications parallel without changing a single line of code in the applications. Of course, applications had to be threaded to start with, but that is the most natural way in Erlang. He mentioned systems containing up to a quarter of a million threads &#8212; hard to do that in anything except Erlang.</p>
<p>He described how they had evolved from a simple implementation that worked well on synthetic benchmarks to a truly industrial-strength implementation. The difference was quite radical, as real codes feature more complex communications patterns, and make heavy use of device drivers and network stacks. This process forced the use of more and finer locks, and rethinking the balance between shared and separate heaps for threads.</p>
<p>They also had the opportunity to test their solution on a Tilera 64-core machines. This mercilessly exposed any scalability limitations in their system, and proved the conventional wisdom that going beyond 10+ cores is quite different from scaling from 1 to 8&#8230; The two key lessons they learned was that <em>no shared lock goes unpunished, </em>and <em>data has to be distributed as well as code.</em> Very interesting to hear this story from real software developers solving real problems.</p>
<p>The next multicore event taking place around here is the Second <a href="http://www.it.uu.se/research/upmarc/MCC09">Swedish WOrkshop on Multicore Computing </a>(MCC 2009), in Uppsala, November 26-27.</p>
<p>Update: note that the presentations from the event are available via <a href="http://www.multicore.se/">http://www.multicore.se/</a>.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/905"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/905" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/905" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/905/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>A Toast to Abstraction Layers</title>
		<link>http://jakob.engbloms.se/archives/888?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/888#comments</comments>
		<pubDate>Thu, 13 Aug 2009 19:41:47 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[gadgets]]></category>
		<category><![CDATA[general research]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[abstraction]]></category>
		<category><![CDATA[abstraction levels]]></category>
		<category><![CDATA[DAC 2009]]></category>
		<category><![CDATA[information hiding]]></category>
		<category><![CDATA[TheToasterProject]]></category>
		<category><![CDATA[Thomas Thwaites]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=888</guid>
		<description><![CDATA[I just found &#8220;The Toaster Project&#8220;, a Royal College of Art project where Thomas Twaites built a simple toaster from scratch. Really from scratch, going all they way back to iron ore and raw petroleum. In the process, he had to smelt ore, create plastic from petroleum, etc. It is a very interesting observation about [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-890" style="margin: 10px 5px;" title="toaster" src="http://jakob.engbloms.se/wp-content/uploads/2009/08/toaster.png" alt="toaster" width="81" height="87" />I just found &#8220;<a href="http://www.thomasthwaites.com/thomas/toaster/page2.htm">The Toaster Project</a>&#8220;, a Royal College of Art project where <a href="http://www.thomasthwaites.com/">Thomas Twaites </a>built a simple toaster from scratch. Really from scratch, going all they way back to iron ore and raw petroleum. In the process, he had to smelt ore, create plastic from petroleum, etc. It is a very interesting observation about the immense industrial complexity behind the very simple everyday items of our lives. I also think it has something to tell us computer scientists about abstraction.</p>
<p><span id="more-888"></span>What Thomas is showing is just how efficiently today&#8217;s economy manages to hide complexity from consumers (users). That toaster is just there on the shelf at a very low cost. If you take it apart, you will note that it is made from plastic which has been moulded to shape, and various bits and pieces of steel and copper wires. At that level, you feel that you could almost build it yourself. However, that is just the tip of the iceberg. What the toaster project reveals is the next level of abstraction and information hiding going on: that copper wire contains an enormously complex process in its making. From ore extraction, energy production to fuel the process, copper foundries, factories converting raw copper into wires, and a huge logistical machine to move things around.</p>
<p>In essence, we have a very nice example of information hiding and abstraction. As a user of the toaster, I do not need to understand how it works, and I do definitely not have any idea of the huge chain of suppliers leading up to its presence on my breakfast table.</p>
<p>That&#8217;s where are going with computers, but it is going to take time. Today, most users are fairly well shielded from how computers really work. Until they break down, at least. As programmers, we are less lucky. In practice, most good programmers end up understanding at least the basics of assembly language and the memory hierarchy of the machine.</p>
<p>What is hidden today is mostly the innards of the silicon. I have no real idea of how a processor works at the level of transistors and electrons. I don&#8217;t have to care about that, while any computer user fifty years ago probably had a decent understanding of the electronics. If nothing else, that was how you investigated hardware faults and actually built computers in a factory. Before integrated circuits, the electronic bits were much more exposed.</p>
<p>I think the current trend towards virtualization in the IT space and virtual platforms in the system design space is showing that the abstraction stack we are using in computing is getting deeper and more opaque. It takes some getting used to, but in the end, we have to realize that most computer programmers will be like the toaster user. All they want is a virtual toaster that toasts virtual bread in a way that lets them do their job.That is: write software that really does not care that much about the particulars of the hardware it is running on.</p>
<p>For the designer of a toaster (or even worse, the manufacturer of copper wire or the oil producer for the raw materials for the plastics), this takes some getting used to. We have to accept that in many cases, a simple abstraction is sufficient to help programmers get moving. There is no need for perfect timing accuracy or all the details of bus transactions. As long as what comes out is sufficiently similar to toast (a virtual toaster spitting out candy would be a bad abstraction), most users are happy.</p>
<p>Brian Bailey touched on this in a blog post following DAC, called &#8220;<a href="http://www.chipdesignmag.com/bailey/2009/07/30/accuracy-does-not-imply-accuracy/">Accuracy does not imply accuracy</a>&#8220;. Same idea as the toaster: you have to accept less detail, more abstraction, to get somewhere useful. Not everyone needs to go back to basics&#8230; and doing so tends to be counter productive in the end.</p>
<p>It is late now, but I think I will have toast and jam for breakfast tomorrow. Writing this got me hungry.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/888"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/888" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/888" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/888/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Driving an Old Canon Scanner using a VM</title>
		<link>http://jakob.engbloms.se/archives/842?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/842#comments</comments>
		<pubDate>Wed, 15 Jul 2009 18:43:50 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[desktop software]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[Canon]]></category>
		<category><![CDATA[LIDE30]]></category>
		<category><![CDATA[scanner]]></category>
		<category><![CDATA[USB]]></category>
		<category><![CDATA[virtualization]]></category>
		<category><![CDATA[Vista]]></category>
		<category><![CDATA[VMWare]]></category>
		<category><![CDATA[Windows]]></category>
		<category><![CDATA[XP]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=842</guid>
		<description><![CDATA[I have an old Canon LIDE 30 scanner that I purchased sometime late in 2003. At that time, it was connected to a PC running Windows XP, and drivers worked just fine. However, after I got my new computer in early 2009, with Vista 64, there are no more drivers available. There is a funny [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-843" style="margin-left: 5px; margin-right: 5px;" title="lide30" src="http://jakob.engbloms.se/wp-content/uploads/2009/07/lide30.gif" alt="lide30" width="100" height="67" />I have an old <a href="http://www.canon-europe.com/For_Home/Product_Finder/Scanners/Flatbed/LIDE30/index.asp">Canon LIDE 30 </a>scanner that I purchased sometime late in 2003. At that time, it was connected to a PC running Windows XP, and drivers worked just fine. However, after I got my new computer in early 2009, with Vista 64, there are no more drivers available. There is a funny way around this though, using a virtual machine.</p>
<p><span id="more-842"></span>What I ended up doing to keep using my scanner (whose hardware is still very much intact and solid) is fairly obvious: I installed my old Windows XP license on a VMWare virtual machine (I had the good luck to have a full license with physical media), and then install the Canon LIDE30 driver on that virtualized XP.</p>
<p>VMWare Player is sufficient to let me attach the physical scanner to the virtual machine&#8217;s USB interface, and drive it without the host Vista 64 machine being any the wiser. To get the scanned pictures out, I have to resort to drag-and-drop, as I have failed to get shared folders to work with Player for some unknown reason.</p>
<p>The end result can be pretty complex&#8230; To send some emails from my work computer including scans with this scanner, I had to:</p>
<ul>
<li> Scan on the virtual XP machine</li>
<li>Drag-and-drop to the Pictures folder on my Vista 64 machine</li>
<li>Use file-sharing in Windows to move to my work laptop</li>
<li>Attach in Outlook</li>
</ul>
<p>Workable. It is also a pretty good demo of the power afforded by modern consumer operating systems. Imagine trying to do that in 1995&#8230; would not have been quite as fun.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/842"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/842" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/842" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/842/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Pulling the Virtual Ethernet Plug</title>
		<link>http://jakob.engbloms.se/archives/248?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/248#comments</comments>
		<pubDate>Wed, 27 Aug 2008 09:25:00 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[ACM Queue]]></category>
		<category><![CDATA[bryan cantrill]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[ethernet]]></category>
		<category><![CDATA[fault injection]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=248</guid>
		<description><![CDATA[I just read the panel interview at the start of the latest issue (Number 4, 2008) of ACM Queue. Here, you have Bryan Cantrill of Sun (the man behind dTrace) bemoan the difficulty of testing faults. In particular: Part of the reason I&#8217;m interested in virtualization is as a development methodology. It has not delivered [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-249" style="margin: 5px 10px;" title="acm-queue-logo" src="http://jakob.engbloms.se/wp-content/uploads/2008/08/acm-queue-logo.gif" alt="" width="87" height="51" />I just read the <a href="http://mags.acm.org/queue/20080708/?pg=10&amp;pm=2">panel interview </a>at the start of <a href="http://mags.acm.org/queue/20080708/">the latest issue (Number 4, 2008) of ACM Queue</a>. Here, you have <a href="http://mags.acm.org/queue/20080708/?pg=14&amp;pm=2">Bryan Cantrill of Sun </a>(the man behind dTrace) bemoan the difficulty of testing faults. In particular:</p>
<blockquote><p>Part of the reason I&#8217;m interested in virtualization is as a development methodology. It has not delivered on this, but one of the things that I ask is can I use virtualization to automate someone pulling the Ethernet cable out of the jack? I can get a lot closer to simulating it if you let me create a toy virtual machine than I can running on the live machine.</p></blockquote>
<p>Well, <a href="http://www.virtutech.com/solutions/virtual_platform">this already exists</a>. It is a common feature to any virtual platform that is not a datacenter-oriented runtime engine like VmWare, Xen, LPAR, and its ilk. Doing fault injection is a primary use case for virtual platforms, especially for larger servers and systems featuring redundancy and fault tolerance.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/248"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/248" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/248" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/248/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>David Ditzel Interview at The Register/Semicoherent Computing</title>
		<link>http://jakob.engbloms.se/archives/116?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/116#comments</comments>
		<pubDate>Fri, 09 May 2008 16:10:46 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[multicore computer architecture]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[heterogeneo]]></category>
		<category><![CDATA[homogeneous]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[Niagara]]></category>
		<category><![CDATA[power architecture]]></category>
		<category><![CDATA[Rock]]></category>
		<category><![CDATA[Sun]]></category>
		<category><![CDATA[Transmeta]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=116</guid>
		<description><![CDATA[The Register has a few podcasts in addition to their website, and the one called &#8220;Semicoherent Computing&#8221; has turned into a very nice series of interviews with interesting people from the computer industry. I recently listened to their interview from September 2007 with David Ditzel of Transmeta fame. He had a lot to say about [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-medium wp-image-115" style="float: left; margin: 10px;" title="Radio Register Logo" src="http://jakob.engbloms.se/wp-content/uploads/2008/05/rtfm_logo.png" alt="TheRegister Radio Logo" width="48" height="48" />The Register has a few podcasts in addition to their website, and the one called &#8220;<a href="http://www.theregister.co.uk/hardware/semi_coherent/">Semicoherent Computing</a>&#8221; has turned into a very nice series of interviews with interesting people from the computer industry. I recently listened to <a href="http://www.theregister.co.uk/2007/09/21/scc_episode7_david_ditzel/">their interview from September 2007 with David Ditzel of Transmeta fame</a>. He had a lot to say about the history of computing, as well as interesting things on where computing is going. Well worth a listen! Particular interesting highlights&#8230;</p>
<p><span id="more-116"></span></p>
<ul>
<li>How Transmeta pointed Intel and the industry into the right direction of lower power computing.</li>
<li>How multicore processing in the mainstream world (not embedded) can be seen as a way to keep the size of processor dies up in order to keep up prices of processors and keep all the fabs busy. For someone like an Intel, if they cannot add new complexity to their processor cores (which is practically the case today in David&#8217;s opinion), they would be stuck making processors half the size with each process generation. And pretty soon they would only need one fab to push out all the processors the world would ever need. So by using multicore, they can fill all the space they have in their fabs, space that has to be filled to keep the economic engine of Intel running. They need to keep making things more complex and to keep the die size about the same in each generation.</li>
<li>How Sun did business in the old 68k days, and how they are now doing business in x86 land.</li>
<li>The Elbrus company that Sun acquired, who had a pretty innovative VLIW design, and whose leading lights are now at Intel after a tour through Sun and Transmeta.</li>
<li>The coming-back of accelerators even in server/desktop computing. David has a take on the Sun Niagara design as a potential accelerator for web work, running alongside a Rock processor in the same chassis. And talks about rumored IBM work to the same effect, using a massively threaded Power Architecture design as an accelerator for certain types of jobs together with the main Power6 processor in a pSeries server. This is another argument in my ongoing <a href="http://jakob.engbloms.se/archives/90">homogeneous vs heterogeneous a</a>rgument.</li>
</ul>
<p>So it is both history, computer architecture, and business history and assessment all in one. As I said, highly recommended!</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/116"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/116" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/116" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/116/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>VMM Detection Myths and Realities from a Simics and Embedded Perspective</title>
		<link>http://jakob.engbloms.se/archives/97?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/97#comments</comments>
		<pubDate>Sun, 20 Apr 2008 00:02:21 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[virtualization]]></category>
		<category><![CDATA[Andrew Warfield]]></category>
		<category><![CDATA[HOTOS]]></category>
		<category><![CDATA[Jason Franklin]]></category>
		<category><![CDATA[Keith Adams]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[Tal Garfinkel]]></category>
		<category><![CDATA[Temporal decoupling]]></category>
		<category><![CDATA[Timing attack]]></category>
		<category><![CDATA[Virtual machine detection]]></category>
		<category><![CDATA[VMWare]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=97</guid>
		<description><![CDATA[It must have been Google Alerts that send me a link to the HOTOS 2007 (Hot Topics in Operating Systems) paper by Tal Garfinkel, Keith Adams, Andrew Warfield, and Jason Franklin called Compatibility is not Transparency: VMM Detection Myths and Realities. This paper is slightly less than a year old today, so it is old [...]]]></description>
			<content:encoded><![CDATA[<p>It must have been Google Alerts that send me a link to the <a href="http://www.usenix.org/events/hotos07/">HOTOS 2007</a> (Hot Topics in Operating Systems) paper by Tal Garfinkel, Keith Adams, Andrew Warfield, and Jason Franklin called <a href="http://www.usenix.org/events/hotos07/tech/full_papers/garfinkel/garfinkel_html/">Compatibility is not Transparency: VMM Detection Myths and Realities</a>. This paper is slightly less than a year old today, so it is old by blog standards and quite recent by research paper standards. It deals with the interesting problem of whether a virtual machine can be made undetectable by software running on it &#8212; and software that is trying to detect it. Their conclusion is that it is not feasible, and I agree with that. The reason WHY that is the case can use some more discussion, though&#8230; and here is my take on that issue from a Simics/embedded systems virtualization perspective.</p>
<p><span id="more-97"></span></p>
<p>Their main important assumption is that the VMM cannot be tailored to avoid detection by any particular piece of software, but has to be sufficiently like the real thing to fool something the first time it appears. They discuss from the perspective of virtualization solutions like VmWare that aim at high performance before all else. The virtual PCs generated by VmWare, Parallels, KQemu, and others are all compatible with physical PCs &#8212; run the same software &#8212; but are not at all identical in detail. So they are not transparent in the words of the paper. This means that they are quite easy to spot.</p>
<p>There are some holes in functional differences that VMMs can quite easily plug. The paper shows how you can get a different-sized TLB (compared to the physical hardware), for example, from interference from the VMM. This can obviously be fixed in the VMM, at a cost in performance. The reason such differences are there is that VMMs are optimized for performance at almost any cost. As long as the requisite operating systems run as they should, the VMM is fine even if it is does actually correspond to any particular existing physical machine. This is a testament to the tolerance of modern operating systems towards their hardware. Basically, any OS that probes hardware and discovers what is there will work fine as long as the (virtual) hardware exposes devices that it can recognize. This is quite different from the 1970s or 1980s where an OS would definitely expect a very particular hardware setup with very peculiar timing to run at all. Thus, making a VMM totally identical to some physical machine is a waste of effort and performance.</p>
<p>Paravirtual approaches like Xen and what Sun has with Niagara and IBM on their Power servers, where the OS is rewritten by having drivers for a purely virtual hardware/software interface is an obvious generalization from the VmWare compatibility approach. Compatible versus transparent/invisible  virtualization is really only an issue in the x86 PC world, since all other datacenter architectures are virtual by definition and all operating systems work towards a standard virtual layer. In such an environment, I have hard time seeing that the question posed in the paper does even make sense. You are always virtualized, period.</p>
<p><strong>Embedded Virtual Platforms</strong></p>
<p>Anyhow, back to the main thread. There is still a large set of targets where transparency and compatibility are of interest. x86 PCs is one such target, it is an interesting question for older architectures (Alpha, Vax, Sun and IBM in older generations). In particular,  it is an important topic for embedded systems where you want to use virtual or simulated approaches to develop and test software. As part of that software development process on a virtual machine, you could potentially be examining malware of various kinds. A good not-too-hypothetical example are mobile phone viruses.</p>
<p>If we look at embedded system virtual platforms, the functionality of the simulator is usually more complete and more like a particular physical machine than what a VmWare-style datacenter VMM. This is partially due to embedded software stacks tending to be a bit pickier about what they run on, and partially due to the simple fact that the goal really IS to expose the hardware/software interface of a particular piece of hardware as closely as possible. Also, since this is usually cross-targets (Power Arch on x86, for example), there is no performance gain from using features of the host directly. So items like TLB counts, memory layout, memory content, flash memory programming, etc. are all going to be functionally identical to the physical machine.</p>
<p><strong>Timing is Key</strong></p>
<p>Thus, just like for a patched VmWare-style VMM as discussed in the article, the main attack vector remains <em>timing</em>.</p>
<p>The best way, according to the authors, to spot a VMM is to look for timing differences compared to the behavior on normal hardware. Despite the inherent variability of typical hardware, there are cases where VMMs by necessity vary detectable amounts. I would say this means a factor five or more over many tests of a case.</p>
<p>The authors discuss whether tools like Virtutech Simics could be used to overcome this problem in the context of x86 PCs.  I think the main argument for something like Simics for this purpose is that by simulating the entire hardware platform and providing all timing measurements from a strong virtual time base, you do not see the types of time differences that can be used to detect a &#8220;normal&#8221; VMM. However, since the paper considers Simics and SimNow (from AMD) to be about ten times slower than native hardware, you can always detect them using a non-local time source. That is likely true. But it less obviously true for an embedded target where the simulator running on a fast PC might well be just as fast as the target.</p>
<p><strong>The Multicore Timing Attack</strong></p>
<p>A more intriguing aspect of embedded virtual platforms that could be used to detect virtual platforms is how simulation of multicore machines is handled. For performance reasons, simulators use <em>temporal decoupling</em>,  where each virtual processors is run for a &#8220;long&#8221; time slice before switching to the next. We discussed the effect of this in a recent presentation at the multicore expo (<a href="http://jakob.engbloms.se/archives/89">link to previous blog post</a>), and some of that data is worth repeating.</p>
<p>Here is a slide explaining how temporal decoupling works:</p>
<p><img class="aligncenter size-full wp-image-105" style="vertical-align: middle;" title="temporaldecoupling-what-it-is" src="http://jakob.engbloms.se/wp-content/uploads/2008/04/temporaldecoupling-what-it-is.png" alt="Illustration of temporal decoupling" width="500" height="375" /></p>
<p>So what does this mean in practice for detecting that you are running in a virtual machine?</p>
<p>It means that the communication latency between parallel threads is proportional to the size of the time slicing. If you have two threads progressing in parallel doing spinlocks, on a real machine they will be stealing the lock from each other all the time. On a temporally decoupled simulator, you will rather see a behavior where you can take the lock and then recapture it a few times before missing it. This effect was captured by a simple test program that we wrote, and the data is shown in the slide below:</p>
<p><img class="aligncenter size-full wp-image-106" title="temporaldecoupling-visible-disturbance" src="http://jakob.engbloms.se/wp-content/uploads/2008/04/temporaldecoupling-visible-disturbance.png" alt="Visible disturbance from temporal decoupling" width="500" height="375" /></p>
<p>The program here is running two threads in parallel, updating a shared variable, with three types of locking for the accesses:</p>
<ul>
<li>No locking at all</li>
<li>A local lock to each thread being used (&#8220;fake locking&#8221;)</li>
<li>A proper lock</li>
</ul>
<p>The interesting behavior is the execution time of the program for each of these locking styles. Obviously, running with no lock is the fastest, and with proper locking the slowest. The relative speed of these is the factor to consider. On real hardware, this program observes a very steep increase in execution time when using proper locking. On the simulator, as seen above, the difference in execution time between fake locking and proper locking is significantly smaller when using a long time slice compared to when using a short time slice. The behavior on physical machines is much more like that observed at time slice lengths of ten than that at time slices of 10000.</p>
<p>Normally, a multiprocessor simulator with any ambition to be fast has to use a time slice of 1000 or more. Thus, detecting that you are running inside a simulator is quite simple. If the outside world time seems right, check if you can see strange timing behavior when using locks. Since high speed requires a long time slice, you cannot have both correct real-world timing and a large performance difference. And on the other hand, if the behavior with locking seems reasonable, you should check the real-world time &#8212; as a simulator with a short time slice will be way slower than the real world.</p>
<p>The paper authors note a similar aspect in desktop/server x86 VMM detection. They discuss &#8220;performance cliffs&#8221; that appear when doing &#8220;unusual&#8221; things. For example, VmWare is engineered assuming a minimum use of self-modifying code. Performance is much worse if you use it extensively, and this can be used to detect VmWare quite effectively. This effect is quite similar to the time slice effect in embedded virtual platforms.</p>
<p>Hope you enjoyed this fairly long rant. And we have not even begun exhausting the contents of this topic&#8230; luckily, these discrepancies only very rarely impact the usefulness of virtual platforms. Since most software even on an embedded system does not care about detailed timing like this. In the example above, we still see the lock contention. So we know that we are getting an increase in execution time from the lock. Only not a complete picture of what it means in absolute terms. We will still find missing locks and overused locks.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/97"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/97" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/97" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/97/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>SICS Multicore Day 2007 &#8211; More on Programming</title>
		<link>http://jakob.engbloms.se/archives/20?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/20#comments</comments>
		<pubDate>Wed, 05 Sep 2007 19:16:08 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[multicore computer architecture]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[Autocoding]]></category>
		<category><![CDATA[Dataflow]]></category>
		<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Fortran]]></category>
		<category><![CDATA[Prolog]]></category>
		<category><![CDATA[SiCS Multicore days]]></category>
		<category><![CDATA[X10]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/archives/20</guid>
		<description><![CDATA[Some more thoughts on how to program multicore machines that did not make it into my original posting from last week. Some of this was discussed at the multicore day, and others I have been thinking about for some time now. One of the best ways to handle any hard problem is to make it [...]]]></description>
			<content:encoded><![CDATA[<p>Some more thoughts on how to program multicore machines that did not make it into my original posting from last week. Some of this was discussed at the multicore day, and others I have been thinking about for some time now.</p>
<p>One of the best ways to handle any hard problem is to make it &#8220;<a href="http://en.wikipedia.org/wiki/Somebody_Else%27s_Problem">somebody else&#8217;s problem</a>&#8220;. In computer science this is also known as abstraction, and it is a very useful principle for designing more productive programming languages and environments. Basically, the idea I am after is to let a programmer focus on the problem at hand, leaving somebody else to fill in the details and map the problem solution onto the execution substrate.</p>
<p><span id="more-20"></span>This has been the eternal striving of programming language designers. We created FORTRAN to get out of assembly language, we added object orientation to reduce the amount of explicit data juggling, we designed PROLOG so that backtracking search was automatic. We have SDL and UML to generate code from graphical descriptions, and the model-driven architecture trend in automotive software. Always trying to remove implementation details from the solution description, to make it possible to tackle bigger problems with less work.</p>
<p>In the particular case of multicore and parallel computers, we have languages like <a href="http://www.erlang.org/">Erlang </a>that let you express a threaded parallel program without dealing with locks, shared memory, or other quicksand. Tim Mattson called the idea behind Erlang &#8220;shared nothing&#8221;, i.e., there is no shared mutable state and only explicit message passing between threads. This style of programming is eminently suitable to certain classes of problems. It is probably hard to code a high-performance game engine or a database in Erlang, but a control system or a telecom switch is natural. The key property is that an Erlang program is normally massively threaded by its very design, since threads are to Erlang what objects are to C++, Java, and its ilk.</p>
<p>What happens here is that the problem of how to schedule and coordinate the threads is handed on to the run-time system for Erlang. This is a huge win, since that run-time system can be created once by a small group of people, thoroughly debugged thanks to its comparatively limited size, and then used by all Erlang systems all over the world. Without changing the existing code, which is fairly amazing if you think of it. But Erlang is not really alone in applying the &#8220;smart runtime&#8221; principle to parallel problems.</p>
<p>The X10 language is really the same idea, if yet with more features in the language. Put most of the hairy stuff in a run-time system the programmer does not have to care about.</p>
<p>Game engines allow scripting the logic and tactics of individual bots or other non-player entities in a sequential fashion. This easily translates to a parallel program, since there are usually many such non-player entities in play at any particular time. The various entities can look at the world to make decisions, but that is easily done as read-only access, achieving the principle of no shared mutable state. The actions to be carried out can be queued and then managed in a coherent manner by the run-time engine. The engine is sure a hairy piece of work, but once it is done, the majority of the game design, coding, and tweaking can be done without much caring for the parallel execution of the code.</p>
<p>Another example is dataflow programs,Â  which are natural in domains such as control and streaming data processing. The compilation and run-time system has enough knowledge and sufficiently few constraints to be able to schedule individual small computation pieces on one or more processors, and stage the communications between them. Just like Erlang or games, the key issue here is the absence of shared mutable data.</p>
<p>I really think that this is the right way to tackle the era of multicore. Most problems can probably be expressed in a manner that is naturally parallel without involving explicit manipulation of shared data, and these programs can be parallelized using smart compilers and run-times rather than programmers&#8217; blood, sweat, and tears.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/20"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/20" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/20" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/20/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

