<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Observations from Uppsala &#187; virtual platforms</title>
	<atom:link href="http://jakob.engbloms.se/archives/category/virtual/virtual-platforms/feed" rel="self" type="application/rss+xml" />
	<link>http://jakob.engbloms.se</link>
	<description>Computer Technology: Simulation, Virtualization, Virtual Platforms, Embedded, Multicore and Multiprocessing (by Jakob Engblom)</description>
	<lastBuildDate>Sun, 29 Jan 2012 19:45:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<image>
    <title>Observations from Uppsala</title>
    <url>http://jakob.engbloms.se/favicon.png</url>
    <link>http://jakob.engbloms.se</link>
    <width>32</width>
    <height>32</height>
    <description>Observations from Uppsala - http://jakob.engbloms.se</description>
    </image>		<item>
		<title>Wind River Blog: Interview with a Networked Simics User</title>
		<link>http://jakob.engbloms.se/archives/1524?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1524#comments</comments>
		<pubDate>Wed, 16 Nov 2011 15:58:20 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[testing]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Wind River Blog]]></category>
		<category><![CDATA[Dan Poirot]]></category>
		<category><![CDATA[RTI]]></category>
		<category><![CDATA[Simics]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1524</guid>
		<description><![CDATA[There is a new post at my Wind River blog, an interview with Dan Poirot at RTI who is using Simics to model and test heterogeneous, distributed, networked systems.]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="Wind River Logo" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" />There is a new post at my Wind River blog, an <a href="http://blogs.windriver.com/engblom/2011/11/simics-for-distributed-systems-an-interview-with-dan-poirot.html">interview with Dan Poirot at RTI </a>who is using Simics to model and test heterogeneous, distributed, networked systems.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1524/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>EETimes Articles on Simics</title>
		<link>http://jakob.engbloms.se/archives/1500?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1500#comments</comments>
		<pubDate>Fri, 23 Sep 2011 19:26:38 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[EETimes]]></category>
		<category><![CDATA[Simics]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1500</guid>
		<description><![CDATA[I just had two articles published in the Embedded Design section of the EETimes. First, &#8220;Rethink your project planning with a virtual platform&#8221;, which talks about how virtual platforms can change your entire project planning. Essentially, by reducing project friction and risks related to hardware availability, software integration, and show-stopper bugs, you can make projects work [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2008/07/eetimes.png"><img class="alignright size-full wp-image-155" title="eetimes logo" src="http://jakob.engbloms.se/wp-content/uploads/2008/07/eetimes.png" alt="" width="127" height="56" /></a><a href="http://jakob.engbloms.se/wp-content/uploads/2011/09/simics-logo.png"><img class="alignleft size-full wp-image-1501" style="margin: 5px 10px;" title="simics logo" src="http://jakob.engbloms.se/wp-content/uploads/2011/09/simics-logo.png" alt="" width="44" height="44" /></a>I just had two articles published in the Embedded Design section of the <a href="http://www.eetimes.com">EETimes</a>.</p>
<p>First, &#8220;<a href="http://www.eetimes.com/design/embedded/4226939/Rethink-your-project-planning-with-a-virtual-platform?Ecosystem=embedded">Rethink your project planning with a virtual platform</a>&#8221;, which talks about how virtual platforms can change your entire project planning. Essentially, by reducing project friction and risks related to hardware availability, software integration, and show-stopper bugs, you can make projects work much better.</p>
<p>Then we have &#8220;<a href="http://www.eetimes.com/design/embedded/4227781/Transporting-bugs-with-virtual-checkpoints?Ecosystem=embedded">Transporting bugs with virtual checkpoints</a>&#8221;, which is a shorter, popular-science version of the paper I published last year at <a href="http://jakob.engbloms.se/archives/1231">S4D</a>. This describes how you can use checkpointing in a virtual platform to communicate bugs across time, space, and teams.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1500/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Wind River Blog: Stop, Think, and Tie Your Shoes Right</title>
		<link>http://jakob.engbloms.se/archives/1492?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1492#comments</comments>
		<pubDate>Wed, 21 Sep 2011 18:09:13 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[business issues]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Wind River Blog]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1492</guid>
		<description><![CDATA[There is a new post at my Wind River blog, which could seem to be about shoes but which is really about process improvement. In particular, the need for companies to let their employees take a step or two back and look at what they are doing and what they could do better. It is [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="Wind River Logo" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" /> <a href="http://jakob.engbloms.se/wp-content/uploads/2011/09/shoes-2.jpg"><img class="alignright size-full wp-image-1494" style="margin: 5px 10px;" title="shoes 2" src="http://jakob.engbloms.se/wp-content/uploads/2011/09/shoes-2.jpg" alt="" width="333" height="300" /></a> There is a <a href="http://blogs.windriver.com/tools/2011/09/stop-think-and-tie-your-shoes-right.html">new post </a>at my Wind River blog, which could seem to be about shoes but which is really about process improvement. In particular, the need for companies to let their employees take a step or two back and look at what they are doing and what they could do better.</p>
<p>It is way too common to be so busy running around being inefficient that there is no time to think about how to become more efficient. Change also requires some discipline to actually keep pushing at habits until they change for the better.</p>
<p><a href="http://blogs.windriver.com/tools/2011/09/stop-think-and-tie-your-shoes-right.html">All of this can be illustrated by tying shoes. </a></p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1492/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Wind River Blog: A Virtual Year</title>
		<link>http://jakob.engbloms.se/archives/1483?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1483#comments</comments>
		<pubDate>Sat, 20 Aug 2011 07:25:59 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Wind River Blog]]></category>
		<category><![CDATA[hypersimulation]]></category>
		<category><![CDATA[Simics]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1483</guid>
		<description><![CDATA[There is a new post at my Wind River blog, about hypersimulation in virtual platforms and how it lets virtual time fly much faster than real time. It was the result of a simple mistake: leaving Simics running in the background as I did other work on my machine.]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="Wind River Logo" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" />There is a new post at my Wind River blog, about <a href="http://blogs.windriver.com/engblom/2011/08/a-virtual-year.html">hypersimulation in virtual platforms and how it lets virtual time fly much faster than real time</a>. It was the result of a simple mistake: leaving Simics running in the background as I did other work on my machine.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1483/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Wind River Blog: How to Get Virtual</title>
		<link>http://jakob.engbloms.se/archives/1474?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1474#comments</comments>
		<pubDate>Tue, 02 Aug 2011 12:10:57 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Wind River Blog]]></category>
		<category><![CDATA[development methodology]]></category>
		<category><![CDATA[Functional models]]></category>
		<category><![CDATA[Modeling]]></category>
		<category><![CDATA[Simics]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1474</guid>
		<description><![CDATA[There is a new post at my Wind River blog, about how you build virtual platforms with Simics. The post is more about the methodology than the nature of models, cycle accuracy, endianness, and all the other details of virtual platform modeling. I have written about modeling methodology on this blog too, and in particular I [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="Wind River Logo" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" />There is a new post at my Wind River blog, about how you <a href="http://blogs.windriver.com/engblom/2011/08/how-to-get-virtual-.html">build virtual platforms with Simics</a>. The post is more about the methodology than the nature of models, cycle accuracy, endianness, and all the other details of virtual platform modeling. I have written about modeling methodology on this blog too, and in particular I would recommend looking at &#8220;<a href="../archives/1317">Two perspectives on modeling</a>&#8221;.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1474/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Wind River Blog: Simics 4.6 Initial Impressions</title>
		<link>http://jakob.engbloms.se/archives/1428?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1428#comments</comments>
		<pubDate>Tue, 31 May 2011 12:35:05 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Wind River Blog]]></category>
		<category><![CDATA[eclipse]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[Simics 4.6]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1428</guid>
		<description><![CDATA[There is a new post at my Wind River blog, about the new Simics 4.6 release. 4.6 has some serious new goodies in it, including an Eclipse source-code debugger and a way to build blinking lights front panels for boards.]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="Wind River Logo" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" />There is a new post at my Wind River blog, about the <a href="http://blogs.windriver.com/engblom/2011/05/simics-46-initial-impressions.html">new Simics 4.6 release.</a> 4.6 has some serious new goodies in it, including an Eclipse source-code debugger and a way to build blinking lights front panels for boards.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1428/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Disappointing SystemC Debugger Integration Paper</title>
		<link>http://jakob.engbloms.se/archives/1419?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1419#comments</comments>
		<pubDate>Wed, 25 May 2011 19:35:21 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[SystemC]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1419</guid>
		<description><![CDATA[Since I have a certain interest in debugging, I was happy to find the article &#8220;Guidelines for SystemC &#8211; Debugger Integration&#8221; at the usually interesting Design and Reuse website. However, I must say that it was pretty disappointing. The key idea of the article is to put the debug service in a thread and the debugged [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/05/debug-small.png"><img class="alignleft size-full wp-image-1421" title="debug small" src="http://jakob.engbloms.se/wp-content/uploads/2011/05/debug-small.png" alt="" width="81" height="73" /></a>Since I have a certain interest in debugging, I was happy to find the article <a href="http://www.design-reuse.com/articles/26457/guidelines-for-systemc-debugger-integration.html">&#8220;Guidelines for SystemC &#8211; Debugger Integration&#8221;</a> at the usually interesting Design and Reuse website. However, I must say that it was pretty disappointing.</p>
<p><span id="more-1419"></span>The key idea of the article is to put the debug service in a thread and the debugged SystemC system in another thread, and stop SystemC using a mutex. Yes, you have to do that.</p>
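<p>To make the thread-and-mutex idea concrete, here is a minimal sketch in Python. It is not SystemC and it is not the code from the article: <code>PausableSimulation</code>, <code>stop</code>, <code>resume</code>, and <code>debug_session</code> are hypothetical names made up for illustration, and a plain step counter stands in for the simulation kernel.</p>

```python
import threading


class PausableSimulation:
    """Toy stand-in for a simulation kernel (an assumption, not SystemC)."""

    def __init__(self, total_steps):
        self.total_steps = total_steps
        self.steps_done = 0
        # The mutex that the debug thread holds to keep the simulation stopped.
        self._run_lock = threading.Lock()

    def run(self):
        # Simulation thread: take the mutex around every step, so a debugger
        # that holds it parks the simulation at a step boundary.
        while True:
            with self._run_lock:
                if self.steps_done >= self.total_steps:
                    return
                self.steps_done += 1

    def stop(self):
        # Called from the debug thread; returns once the simulation is parked.
        self._run_lock.acquire()

    def resume(self):
        # Called from the debug thread; lets the simulation step again.
        self._run_lock.release()


def debug_session(sim):
    """Hypothetical debug service: stop, read state twice, resume."""
    sim.stop()
    first = sim.steps_done
    second = sim.steps_done  # same value: no step can run while stopped
    sim.resume()
    return first, second
```

<p>A real debug service would block on a debugger connection between <code>stop</code> and <code>resume</code> instead of reading a counter, but the synchronization pattern would be the same under these assumptions.</p>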
<p>But the really interesting part is how to connect the debugger into the virtual platform, and what that requires from the models, the processors, and the infrastructure. Unfortunately, the article is pretty silent on that. There is some talk of the breakpoint handling required in the ISS and of how to update target memory, which mostly corresponds in scope to the debug interface of SystemC TLM-2.0.</p>
<p>There is also nothing about multicore debug, how to combine temporal decoupling with debugging, or the need for repeatability across runs. Nor is there anything about breakpoints on things like hardware accesses and internal actions in the simulator.</p>
<p>Too bad.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1419/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Wind River Blog: VxWorks 64-bit using Simics</title>
		<link>http://jakob.engbloms.se/archives/1396?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1396#comments</comments>
		<pubDate>Fri, 25 Mar 2011 19:52:36 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[embedded software]]></category>
		<category><![CDATA[embedded systems]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Wind River Blog]]></category>
		<category><![CDATA[64-bit]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[VxWorks]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1396</guid>
		<description><![CDATA[There is a new post at my Wind River blog, about how Simics was used to kick-start the development of the 64-bit version of VxWorks. It is an interesting example of how to use a virtual platform as a model of something much simpler and gentler than actual hardware systems.]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="Wind River Logo" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" />There is a new post at my Wind River blog, about how <a href="http://blogs.windriver.com/engblom/2011/03/kick-starting-an-os-port.html">Simics was used to kick-start the development of the 64-bit version of VxWorks</a>. It is an interesting example of how to use a virtual platform as a model of something much simpler and gentler than actual hardware systems.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1396/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>EETimes: James Aldis on Performance Modeling</title>
		<link>http://jakob.engbloms.se/archives/1387?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1387#comments</comments>
		<pubDate>Thu, 03 Mar 2011 20:13:03 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[hardware design]]></category>
		<category><![CDATA[hardware modeling]]></category>
		<category><![CDATA[James Aldis]]></category>
		<category><![CDATA[OMAP]]></category>
		<category><![CDATA[performance optimization]]></category>
		<category><![CDATA[TI]]></category>
		<category><![CDATA[Virtio]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1387</guid>
		<description><![CDATA[James Aldis of TI has published an article in the EETimes about how Texas Instruments uses SystemC in the modeling of their OMAP2 platform. SystemC is used for early architecture modeling and performance analysis, but not really for a virtual platform that can actually run software. The article offers a good insight into the virtual [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/03/TI-logo.png"><img class="alignleft size-full wp-image-1388" style="margin: 5px 10px;" title="TI logo" src="http://jakob.engbloms.se/wp-content/uploads/2011/03/TI-logo.png" alt="" width="80" height="76" /></a>James Aldis of TI has published an article in the <a href="http://www.eetimes.com">EETimes</a> about how <a href="http://www.eetimes.com/General/DisplayPrintViewContent?contentItemId=4212778">Texas Instruments uses SystemC in the modeling of their OMAP2 platform</a>. SystemC is used for early architecture modeling and performance analysis, but not really for a virtual platform that can actually run software. The article offers a good insight into the virtual platform use of hardware designers, which is significantly different from the virtual platform use of software designers.<br />
<span id="more-1387"></span>For a software person like myself, this article offers a well-written insight into the world of hardware design and bus optimization for SoCs.</p>
<p>TI deploys two totally different platforms for hardware and software development, which makes perfect sense. The goals of a high-speed software development platform and a performance-accurate hardware design platform are so different that trying to force them together would likely just create a compromise that is bad for everybody.</p>
<p>Additionally, FPGAs are used to create timing-dependent low-level code, where you need both timing accuracy and decent speed. It is worth noting that the performance model is mostly &#8220;dataless&#8221; &#8211; it models the timing of actions and their dependencies, but not their values and computations.</p>
<blockquote><p>The different models serve different purposes, require different levels of effort to use, and become available at different times during the project. The SystemC performance model is always available first and is always the simplest to create and use. The virtual platform is the next to become available. It is used for software development and has very little timing accuracy.  TI uses Virtio technology to create this model rather than SystemC.</p></blockquote>
<p>Given the number of ultimately failed attempts I have seen at making timing and function available in the same model but as orthogonal concerns, this observation in the article is very insightful:</p>
<blockquote><p>It would appear the choice of two different technologies for the virtual platform and the performance model is inefficient, wasting potential code reuse. However, the two have completely different (almost fully orthogonal) requirements, and at module level almost no code reuse is possible.</p></blockquote>
<p>Maybe combining timing and function in one model is an impossible dream in the general case.</p>
<p>One somewhat surprising statement in the article is that there is no real software available to use in the SoC design phase. Often, virtual platforms are sold as being able to use &#8220;the real software&#8221; when designing hardware. But in the case of TI, the software is mostly written by their customers, with little available for TI to use. Thus, they are forced to design their own test cases to drive the hardware design process.</p>
<blockquote><p>The requirements on the simulation technology are first and foremost ease in creating test cases and models and credibility of results. The emphasis on test-case creation is a consequence of the complexity of the devices and of the way in which an SoC platform such as OMAP-2 is used: because the whole motivation is to be able to move from marketing requirements to RTL freeze and tape-out in a very short time; and because in many cases large parts of the software will be written by the end customer and not by the SoC provider (Texas Instruments, in this article), the performance-area-power tradeoff of a proposed new SoC must be achieved without the aid of &#8220;the software.&#8221;</p></blockquote>
<p>The platform they built is all based on clock-cycle-level interfaces (CC), which is very natural when the primary use case is hardware design.</p>
<p>The primary component optimized in the TI design process is the on-chip interconnect structure, called the &#8220;NoC&#8221; in the article. Each SoC variant is built from a set of (usually already existing) devices and processor cores. The main work of the integration is designing an appropriate NoC for the SoC. The NoC design is crucial to the actual performance level the final SoC product will have.</p>
<p>By playing with the topology, the level of concurrency, and the level of pipelining in the NoC, it&#8217;s possible to create SoCs from the same basic modules with quite different capabilities.</p>
<p>The only real instruction-set simulators used are CC-level models of DSPs, used for software optimization taking bus contention into account. No models of the ARM control cores are used. Mostly, processors are represented by stochastic or trace-driven traffic generators that put transactions on buses but do not actually run any real code.</p>
<p>The stochastic processor models are very powerful and provide traffic that is very similar to a real processor.  A very elegant property of such models is that it is very easy to change the parameters of the model to model quite different software/processor scenarios. Compared to writing real test programs for a full ISS, this is much faster and allows for the exploration of more alternatives.</p>
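<p>To make the idea concrete, here is a minimal sketch in Python of a stochastic traffic generator. It is not the TI model: the function name, the parameters (<code>issue_prob</code>, <code>read_ratio</code>, <code>addr_range</code>) and all the numbers are assumptions made up for illustration. It only shows how changing a couple of parameters produces quite different bus load without writing any target software.</p>

```python
import random

def stochastic_traffic(seed, cycles, issue_prob, read_ratio, addr_range):
    """Yield (cycle, kind, address) bus transactions from a hypothetical
    stochastic processor model; no real code is executed."""
    rng = random.Random(seed)  # fixed seed keeps runs repeatable
    lo, hi = addr_range
    for cycle in range(cycles):
        if rng.random() < issue_prob:        # does the "processor" issue this cycle?
            kind = "read" if rng.random() < read_ratio else "write"
            addr = rng.randrange(lo, hi, 4)  # word-aligned address
            yield (cycle, kind, addr)

# Two quite different software/processor scenarios from the same model
light = list(stochastic_traffic(1, 10_000, issue_prob=0.05, read_ratio=0.8,
                                addr_range=(0x1000, 0x2000)))
heavy = list(stochastic_traffic(1, 10_000, issue_prob=0.60, read_ratio=0.5,
                                addr_range=(0x1000, 0x2000)))
print(len(light), len(heavy))
```

<p>A trace-driven generator would have the same interface, but replay recorded transactions instead of drawing them from a random source.</p>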
<p>The stochastic models are used alongside function-graph breakdowns of software, essentially models that say that an application does A, then B, then C, and that maybe D can happen in parallel. This model of an application is connected to the hardware simulation and can control when things happen and what goes on in parallel. It amounts to a simple model of what an RTOS would do, to some extent.</p>
<p>Configurability is a key theme throughout the OMAP architecture exploration platform. SystemC being what it is, it is limited to configuration at start-up time, but that is perfectly sensible for an architecture exploration use case where you want to set up a platform and test its performance. Dynamic reconfiguration during a run is not that important.  TI has spent a great deal of effort in making the system easy to configure using parameter files.</p>
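<p>A function-graph breakdown of this kind can be sketched as a small dependency graph. The Python below is only an illustration under assumed names and durations: in the real platform the durations would come out of the hardware simulation rather than being fixed numbers, and nothing here reflects the actual TI tooling.</p>

```python
def schedule(tasks, deps):
    """Return {task: (start, end)} where each task starts once all of its
    predecessors have finished. tasks maps name -> duration (assumed units)."""
    times = {}

    def finish(name):
        if name not in times:
            start = max((finish(d) for d in deps.get(name, [])), default=0)
            times[name] = (start, start + tasks[name])
        return times[name][1]

    for name in tasks:
        finish(name)
    return times

# A, then B, then C; D may run in parallel with B and C once A is done
tasks = {"A": 10, "B": 20, "C": 5, "D": 15}
deps = {"B": ["A"], "C": ["B"], "D": ["A"]}
plan = schedule(tasks, deps)
print(plan)
```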
<p>The article goes into many more fascinating details on the models used.  I can only say one thing: read it, if you have any interest in these kinds of issues.</p>
<p>Good work, James!</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1387/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Interrupts and Temporal Decoupling</title>
		<link>http://jakob.engbloms.se/archives/1384?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1384#comments</comments>
		<pubDate>Sun, 27 Feb 2011 21:09:17 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[books]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Grant Martin]]></category>
		<category><![CDATA[interrupt]]></category>
		<category><![CDATA[Temporal decoupling]]></category>
		<category><![CDATA[Tensilica]]></category>
		<category><![CDATA[virtual]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1384</guid>
		<description><![CDATA[I am just finishing off reading the chapters of the Processor and System-on-Chip Simulation book (where I was part of contributing a chapter), and just read through the chapter about the Tensilica instruction-set simulator (ISS) solutions written by Grant Martin, Nenad Nedeljkovic and David Heine. They have a slightly different architecture from most other ISS [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/04/gears1.png"><img class="alignleft size-full wp-image-737" style="margin: 5px;" title="gears1" src="http://jakob.engbloms.se/wp-content/uploads/2009/04/gears1.png" alt="" width="56" height="57" /></a>I am just finishing off reading the chapters of the <a href="http://www.springer.com/engineering/circuits+%26+systems/book/978-1-4419-6174-7" target="_self"><em>Processor and System-on-Chip Simulation </em></a>book (where <a href="http://blogs.windriver.com/engblom/2011/01/processor-and-soc-simulation-book.html">I was part of contributing a chapter</a>), and just read through the chapter about the <a href="http://www.tensilica.com">Tensilica </a>instruction-set simulator (ISS) solutions written by <a href="http://www.chipdesignmag.com/martins/">Grant Martin</a>, Nenad Nedeljkovic and David Heine. They have a slightly different architecture from most other ISS solutions, since they have an inherently variable target in the configurable and extensible Tensilica cores. However, the more interesting part of the chapter was the discussion on system modeling beyond the core. In particular, how they deal with interrupts to the core in the context of a <a href="http://jakob.engbloms.se/?s=temporal+decoupling">temporally decoupled </a>simulation.</p>
<p><span id="more-1384"></span>This is a small detail, but one where I have always had a feeling that some fundamental assumption was missing in my discussions with various people from the hardware design community. It always seemed that hardware designers assumed a different basic design &#8211; and Grant Martin explained very well just what that was: they only check for interrupts at the beginning of a time slice. This makes interrupts less precise relative to the code, but also keeps the core interpreter fairly simple, since all it has to do is churn through instructions.</p>
<p>There is another solution, which is employed in Simics, where the processor can take an interrupt at any point in a time quantum. To do this, the processor needs to be aware of what is going to happen. The essence of the solution is to have devices call the processor and tell it that they intend to interrupt it at some point T in time. The processor simulator then makes sure to stop and give the device model a chance to act at that exact point in time. It is obvious that this solution is easily generalized to cover all time callbacks needed to drive device work. A significant part of the responsibility for running the event-driven simulation is moved into the processor core.</p>
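<p>The scheme can be sketched in a few lines of Python. This is a toy under stated assumptions, not the Simics API: the <code>Processor</code> class, its <code>post</code> and <code>run_quantum</code> methods, and the cycle numbers are all invented for illustration, and instruction execution is replaced by simply advancing a cycle counter.</p>

```python
import heapq

class Processor:
    """Toy temporally decoupled processor that owns an event queue."""

    def __init__(self):
        self.now = 0      # local time in cycles
        self.events = []  # heap of (time, sequence, callback)
        self._seq = 0
        self.log = []

    def post(self, when, callback):
        """Called by a device model: 'call me back at cycle `when`'."""
        heapq.heappush(self.events, (when, self._seq, callback))
        self._seq += 1

    def run_quantum(self, length):
        end = self.now + length
        while self.now < end:
            # Run instructions only up to the next event or the quantum end
            next_stop = min(end, self.events[0][0]) if self.events else end
            self.now = max(self.now, next_stop)  # stand-in for executing code
            while self.events and self.events[0][0] <= self.now:
                _, _, callback = heapq.heappop(self.events)
                callback(self)

cpu = Processor()
cpu.post(1234, lambda p: p.log.append(("timer interrupt", p.now)))
cpu.run_quantum(10_000)
print(cpu.log, cpu.now)
```

<p>In this sketch the interrupt is delivered at cycle 1234 even though the quantum is 10000 cycles long. A simulator that only checks for interrupts at the start of a time slice would deliver it at the next slice boundary instead.</p>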
<p>Making the event queue visible to the processor also gives the processor a chance to hypersimulate, or skip idle  time. Since it knows the next point in time that something will happen  (either the end of a time quantum or an event posted by a device), it  can very easily, safely, and <a href="http://blogs.windriver.com/engblom/2010/09/deterministic-but-unpredictable.html">repeatably </a>jump forward in time without any  impact on simulation semantics.</p>
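<p>A minimal sketch of idle-time skipping, again in Python and with invented names (<code>run_idle</code> is hypothetical, not taken from any simulator): when the processor is idle, the loop either steps one cycle at a time or jumps directly to the next known event. Both variants fire the same events at the same times, which is why the jump has no impact on simulation semantics.</p>

```python
def run_idle(now, quantum_end, event_times, hypersimulate):
    """Advance an idle processor to the end of the quantum.
    Returns (times at which events fired, number of simulation steps taken)."""
    pending = sorted(t for t in event_times if now <= t < quantum_end)
    fired, steps = [], 0
    while now < quantum_end:
        if hypersimulate:
            # Jump straight to the next interesting point in time
            now = pending[0] if pending else quantum_end
        else:
            now += 1  # step one idle cycle at a time
        steps += 1
        while pending and pending[0] <= now:
            fired.append(pending.pop(0))
    return fired, steps

slow = run_idle(0, 10_000, [100, 5000], hypersimulate=False)
fast = run_idle(0, 10_000, [100, 5000], hypersimulate=True)
print(slow, fast)
```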
<p>When dealing with multiple processors, this means that each processor will have precise interrupts from the devices that are close to it. Timers and IO interrupts tend to work closely with a certain processor for a prolonged period of time. Interrupts between processors suffer a time-quantum delay sometimes, but that is no worse than the solution of checking all interrupts at time-quantum boundaries.</p>
<p>Qemu uses a solution which is a mix of the two. <a href="http://www.usenix.org/event/usenix05/tech/freenix/full_papers/bellard/bellard.pdf">According to the 2005 Usenix paper</a>, devices do call into the processor to announce an interrupt, but this is handled by &#8220;soon&#8221; returning to the processor main loop. Processors are not responsible for keeping track of interrupts, making it very imprecise and not very repeatable when interrupts will happen.</p>
<p>Thus, we can see that there are a few different ways to implement interrupts in virtual platforms. Each approach comes from a different tradition and features different trade-offs.</p>
<p>I was a bit surprised by the comment in the Tensilica chapter that only  correctly synchronized programs will work on a temporally decoupled  simulation. In my experience, temporal decoupling is transparent to software functionality &#8211; all software runs. The perceived timing of operations can be different, and some tightly-coupled code might behave in suboptimal ways, but it certainly runs and works. And lets you <a href="http://blogs.windriver.com/engblom/2010/06/true-concurrency-is-truly-different-again.html">observe  parallel code errors</a>.</p>
<p>Temporal decoupling is necessary in any fast platform, and its effects on semantics are really minor. With the simple tweak of having a processor know when interrupts might happen, it will also not affect the device-processor interface very much, maintaining very tight synchronization between processors and their controlled hardware.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1384/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Evaluating HAVEGE Randomness</title>
		<link>http://jakob.engbloms.se/archives/1374?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1374#comments</comments>
		<pubDate>Thu, 17 Feb 2011 21:33:14 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[evaluation methodology]]></category>
		<category><![CDATA[HAVEGE]]></category>
		<category><![CDATA[random number generation]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1374</guid>
		<description><![CDATA[I previously blogged about the HAVEGE algorithm that is billed as extracting randomness from microarchitectural variations in modern processors. Since it was supposed to rely on hardware timing variations, I wondered what would happen if I ran it on Simics that does not model the processor pipeline, caches, and branch predictor. Wouldn&#8217;t that make the [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/02/dice.png"><img class="alignleft size-full wp-image-1371" title="dice" src="http://jakob.engbloms.se/wp-content/uploads/2011/02/dice.png" alt="" width="86" height="88" /></a>I previously blogged about the <a href="http://jakob.engbloms.se/archives/1370">HAVEGE algorithm </a>that is billed as extracting randomness from microarchitectural variations in modern processors. Since it was supposed to rely on hardware timing variations, I wondered what would happen if I ran it on Simics that does not model the processor pipeline, caches, and branch predictor. Wouldn&#8217;t that make the randomness of HAVEGE go away?</p>
<p><span id="more-1374"></span>I got HAVEGE up on a Simics x86 target model with Linux pretty quickly, and ran the two provided tests: <em>ent</em>, which is a quick entropy test, and <em>nist</em>, which is supposedly much more thorough.</p>
<p>To my surprise, they both said the randomness we got was totally acceptable. This would seem to invalidate the fundamental assumption of HAVEGE &#8211; that it needs to collect randomness from hardware in order to produce good-quality randomness. To try to understand a bit more of what was going on, I took a look at the execution using <a href="http://blogs.windriver.com/engblom/2010/05/analyzed.html">Simics Analyzer</a> (the dredd.motherboard.processor lines are the processors, and the orange part is the HAVEGE program, yellow is the kernel):</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/02/OS-scheduler-messing-with-haveged.png"><img class="aligncenter size-medium wp-image-1377" title="OS scheduler messing with haveged" src="http://jakob.engbloms.se/wp-content/uploads/2011/02/OS-scheduler-messing-with-haveged-300x128.png" alt="" width="300" height="128" /></a></p>
<p>Zooming in a bit:</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/02/OS-scheduler-messing-with-haveged-closer-look.png"><img class="aligncenter size-medium wp-image-1378" title="OS scheduler messing with haveged closer look" src="http://jakob.engbloms.se/wp-content/uploads/2011/02/OS-scheduler-messing-with-haveged-closer-look-300x128.png" alt="" width="300" height="128" /></a>We can see that the program is regularly interrupted by the OS, which could be  a reason for random timing variations. The instructions run by the OS should vary in count, which would disturb the time stamp counter values read by the HAVEGE program. That could be sufficient to cause random variations, essentially showing that HAVEGE really works well just from OS noise &#8211; even in an otherwise idle machine.</p>
<p>However, at this point I started to have my doubts. Something did not feel right.</p>
<p>So I tried to remove all variations from the HAVEGE program. I replaced the &#8220;HARDTICKS&#8221; macro in HAVEGE with the constant 0 (zero) rather than reading the time stamp counter of the processor. This immediately failed the randomness test.</p>
<p>However, when I used the constant 1 (one) instead, the <em>ent </em>test passed. And even <em>nist </em>almost passed with only a single missed test out of the 426 tests executed.</p>
<p>Thus, the conclusion is that we do not know how well HAVEGE&#8217;s collection of hardware randomness works, since the evaluation software is too weak. In essence, we do not know if the collection of hardware randomness matters or not, as the proposed measurement hides the randomness behind a pretty good PRNG algorithm.</p>
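<p>The weakness of purely statistical tests is easy to reproduce in isolation. The Python sketch below is not HAVEGE and not the <em>ent</em> program: <code>mixed_stream</code> is a stand-in generator that hashes a constant &#8220;tick&#8221; value together with a counter, and <code>bits_per_byte</code> is a plain Shannon entropy estimate over the byte histogram, similar in spirit to a quick entropy test. The output is completely predictable, yet it scores very close to the maximum of 8 bits per byte.</p>

```python
import hashlib
import math
from collections import Counter

def mixed_stream(tick_value, nbytes):
    """Stand-in generator (NOT HAVEGE): hash a constant 'tick' plus a counter."""
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        out += hashlib.sha256(f"{tick_value}:{counter}".encode()).digest()
        counter += 1
    return bytes(out[:nbytes])

def bits_per_byte(data):
    """Shannon entropy of the byte histogram, in bits per byte."""
    total = len(data)
    return -sum(n / total * math.log2(n / total) for n in Counter(data).values())

stream = mixed_stream(tick_value=1, nbytes=1 << 16)
print(round(bits_per_byte(stream), 3))
```

<p>A test of this kind measures how well the mixing function spreads values, not how unpredictable the input was.</p>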
<p>Ideally, we would need a measurement that would evaluate the predictability of the randomness generated. Or at least one that can correctly estimate the impact of the variation of low-level hardware timing on the quality of the final random numbers. Unfortunately, that is not the case here, throwing the entire idea into doubt.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1374/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Wind River Blog: Being Helpful or Being Correct?</title>
		<link>http://jakob.engbloms.se/archives/1366?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1366#comments</comments>
		<pubDate>Fri, 11 Feb 2011 08:12:23 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[embedded software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Wind River Blog]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1366</guid>
		<description><![CDATA[   ]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="Wind River Logo" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" />There is a new post at my Wind River blog, about <a href="http://blogs.windriver.com/engblom/2011/02/being-helpful-or-simply-correct.html">warnings in virtual platforms</a>. It is an art to add good warnings to virtual platform models, and just being correct with respect to the hardware behavior is not necessarily that helpful for a software developer. A virtual platform should warn about suspicious operations, even if they are technically &#8220;correct&#8221;.</p>
<p>I also have to apologize for the slow blogging in January of 2011. There was too much going on at work and quite a few days taking care of sick kids. Hopefully, the pace can improve going forward.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1366/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Simple Machine, Hard to Simulate</title>
		<link>http://jakob.engbloms.se/archives/1360?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1360#comments</comments>
		<pubDate>Sat, 01 Jan 2011 20:29:29 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[history of computing]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Communications of the ACM]]></category>
		<category><![CDATA[George Phillips]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1360</guid>
		<description><![CDATA[In the June 2010 issue of Communications of the ACM, as well as the April 2010 edition of the ACM Queue magazine, George Phillips discusses the development of a simulator for the graphics system of the 1977 Tandy-RadioShack TRS-80 home computer.  It is a very interesting read for all interested in simulation, as well as [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/01/trs80-i-name.jpg"><img class="alignleft size-full wp-image-1361" style="margin: 5px 10px;" title="trs80-i-name" src="http://jakob.engbloms.se/wp-content/uploads/2011/01/trs80-i-name.jpg" alt="" width="76" height="100" /></a>In the <a href="http://mags.acm.org/communications/201006/?folio=52&amp;CFID=4598775&amp;CFTOKEN=51596944#pg54">June 2010 issue of Communications of the ACM</a>, as well as the <a href="http://queue.acm.org/detail.cfm?id=1755886">April 2010 edition of the ACM Queue magazine</a>, George Phillips discusses the development of a simulator for the graphics system of the 1977 Tandy-RadioShack TRS-80 home computer.  It is a very interesting read for all interested in simulation, as well as a good example of just why this kind of old hardware is much harder to simulate than more recent machines.</p>
<p><span id="more-1360"></span>You really should read the article to get the full story. The short summary is that while the basic principle of the graphics display is very simple to simulate, the effect of rewriting the display contents as it is being drawn on the CRT is quite difficult to get right.</p>
<p>I found this picture of the system online, for reference of what the graphics might look like:</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/01/trs80-i.jpg"><img class="aligncenter size-full wp-image-1363" title="trs80-i" src="http://jakob.engbloms.se/wp-content/uploads/2011/01/trs80-i.jpg" alt="" width="570" height="375" /></a></p>
<p>The hardware of the TRS-80 works pretty much like my old ZX Spectrum did: a bit of memory is used to hold image data (single buffer), and then the video hardware simply reads this memory as the CRT scan goes by to display the right picture. There is no locking of the display memory during this time, so the processor can race the video hardware and modify the contents of a location in memory between scan lines. This is indeed done, and that&#8217;s what makes simulating the machine much much harder.</p>
<p>On the TRS-80, racing the scan makes it possible to (for example) increase the apparent vertical resolution of the display (since each graphics &#8220;pixel&#8221; actually consists of four vertical pixels that could not be individually addressed). On the ZX Spectrum, you could increase the <a href="http://en.wikipedia.org/wiki/ZX_Spectrum_graphic_modes">apparent vertical color resolution</a>. On the Commodore 64, I think you could change the graphics palette during redraw to allow more colors to be displayed simultaneously.</p>
<p>For a simulator, this is pure pain. The result of graphics code written for those machines essentially depends on making a very precise simulation of the timing of all processor instructions, as well as the behavior of the video hardware. As noted in the article, you need to model the memory access contention resulting from the processor writing to the display memory simultaneously with the video hardware reading it. You need to know the cycle count of each pixel, and the setup time between each row of pixels. To account for effects like dithering on a modern perfectly stable LCD display, you have to apply filters to the basic bitmap, simulating the fuzziness of the old cheap TVs that these computers used to drive. If you want to simulate the horrible tricks used to make music on the ZX Spectrum&#8217;s 1-bit sound output (as I recall it, you made noise by flipping a bit in an OUT instruction on the Z-80 CPU), you probably need to do a bit of analog waveform simulation.</p>
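<p>A toy model shows why the timing matters. In the Python sketch below, one memory cell is displayed on four consecutive scan rows, and the processor overwrites it while the frame is being scanned out. The constants <code>CYCLES_PER_ROW</code> and <code>ROWS</code> and the function <code>render_frame</code> are assumptions for illustration only and do not reflect real TRS-80 timing.</p>

```python
CYCLES_PER_ROW = 64  # assumed value, not the real TRS-80 timing
ROWS = 4             # scan rows that display the same memory cell

def render_frame(memory, cpu_writes):
    """Scan out one frame. cpu_writes is a list of (cycle, address, value)
    applied at the cycle they occur, interleaved with the scan."""
    mem = list(memory)
    writes = sorted(cpu_writes)
    shown = []
    for row in range(ROWS):
        scan_cycle = row * CYCLES_PER_ROW
        # Apply every CPU write that lands before this row is read
        while writes and writes[0][0] <= scan_cycle:
            _, addr, value = writes.pop(0)
            mem[addr] = value
        shown.append(mem[0])  # every row displays the same memory cell
    return shown

print(render_frame(["A"], [(100, 0, "B")]))  # ['A', 'A', 'B', 'B']
print(render_frame(["A"], [(40, 0, "B")]))   # ['A', 'B', 'B', 'B']
```

<p>Moving the write by 60 cycles changes the picture. A simulator that applies all writes first and draws the frame afterwards would show &#8220;B&#8221; on every row and lose the effect completely.</p>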
<p>Indeed, it seems to me that one enabler for today&#8217;s virtual platforms is that we have hardware that is not entangled in low-level timing like this. Instead, thanks to the variability of execution time in modern processors, hardware is asynchronous and tends to use interrupts or status bits in registers to report when a requested operation is complete. You do not see code of the type &#8220;do X, wait precisely Y cycles, and then do Z&#8221; anymore, and that really helps in changing the target system timing to enable fast simulation &#8211; which is what any fast virtual platform has to do, replacing a real processor with variable timing with an ISS with pretty simple instruction timing. It also means that hardware models can be simpler too, since they do not model how the hardware achieves its work, only what that work is. To loop back to the article prompting this blog post, in a modern virtual platform, all you would simulate is the specified effect of setting graphics bytes in memory. Not the incidental effect of how it is drawn onto a TV, scan-line by scan-line.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1360/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Modeling Endianness</title>
		<link>http://jakob.engbloms.se/archives/1336?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1336#comments</comments>
		<pubDate>Sun, 26 Dec 2010 15:58:19 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[big-endian]]></category>
		<category><![CDATA[endianess]]></category>
		<category><![CDATA[hardware modeling]]></category>
		<category><![CDATA[little-endian]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1336</guid>
		<description><![CDATA[Endianness is a topic in computer architecture that can give anyone a headache trying to understand exactly what is happening and why. In the field of computer simulation, it is a pervasive problem that takes some thinking to solve in an efficient, composable, and portable way. This blog post describes how I am used to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/12/egg.png"><img class="alignleft size-full wp-image-1337" style="margin: 5px 10px;" title="egg" src="http://jakob.engbloms.se/wp-content/uploads/2010/12/egg.png" alt="" width="74" height="66" /></a><a href="http://en.wikipedia.org/wiki/Endianness">Endianness </a>is a topic in computer architecture that can give anyone a headache trying to understand exactly what is happening and why. In the field of computer simulation, it is a pervasive problem that takes some thinking to solve in an efficient, composable, and portable way.</p>
<p>This blog post describes how I am used to working with endianness in virtual platforms, and why this approach makes sense to me. There are other ways of dealing with endianness, with different trade-offs and overriding goals.</p>
<h2><span id="more-1336"></span>Fundamentals</h2>
<p>What is endianness? In my way of looking at it, it is the arbitrary solution to the problem you get when a large unit of information (say, a 32-bit word) needs to be stored as a set of smaller units (say, 8-bit bytes). When this happens, you need to split the large unit into smaller units, and decide on how to order the smaller units. There is no objectively better or worse way to do this &#8211; as long as the result is unambiguous and based on positional numerics (i.e., no roman numerals, please), it is hard to claim that one order is better than another.</p>
<p>We use &#8220;endianness&#8221; all the time without really thinking about it, when we write regular decimal numbers. In our <a href="http://en.wikipedia.org/wiki/Hindu_numerals">standard </a>base-10 decimal writing system, any value &gt;9 has to be written down using multiple digits. The order we use is a big endian representation: the most significant numbers come first in our reading order (hundreds before tens before single digits, etc.).</p>
<p>In computer architecture, we have the following schools of endianness:</p>
<ul>
<li>No endian, where we never break things down to bytes but always operate on equal-size words (not very common in practice, but certain machines like the Microchip PIC have instruction ROMs as wide as the instructions, and no way to address components of the instructions)</li>
<li>Big endian, BE, where the most significant bytes are put first in order of ascending addresses. I.e., the &#8220;big end&#8221; comes first.</li>
<li>Little endian, LE, where the least significant bytes are put first.</li>
<li>&#8220;Middle endian&#8221;, where the ordering differs for different sizes of data (<a href="http://en.wikipedia.org/wiki/Endianness">Wikipedia </a>mentions this, but I have never seen an example). I have heard stories about chips that also used different endianness to store data by different instructions (by misdesign; I am not referring to the Power Architecture load/store byte-reversed instructions).</li>
</ul>
<p>BE is the traditional choice of IBM and the major early RISC chips, with Power Architecture, MIPS, SPARC, and the zSeries as the most important representatives. LE is the choice of x86, and more recently ARM. MIPS also seems to be gravitating towards LE, probably as a way to make x86 software slightly easier to port. Note that even though some processor cores are described as endianness-neutral, that really means that they can run as either LE or BE. In practice, particular chip designs incorporating such cores tend to lean heavily towards one endianness, since devices are designed for a particular endianness.</p>
<h2>The Software View</h2>
<p>For me, the most important view of endianness is how the software sees it. When a program is running on any current architecture, it logically sees memory as an array of bytes. Inside the memory chips, we have a very different physical layout, usually with words much wider than a byte, as well as an addressing scheme that is not one-dimensional. The interconnect (&#8220;bus&#8221;) moving data from a processor to memory and back is a complex system containing caches, buses of different widths (usually 64 bits or more), memory controllers, cache controllers, bus bridges, and other devices. All of this is usually completely invisible to software, as illustrated below:</p>
<p style="text-align: center;"><img class="aligncenter" style="margin-top: 5px; margin-bottom: 5px;" title="endianness 1" src="http://jakob.engbloms.se/wp-content/uploads/2010/12/endianness-1.png" alt="" width="504" height="389" /></p>
<p style="text-align: left;">Basically, the bus system is invisible. The important endianness property as far as software is concerned is the order in which bytes are put into memory, and memory is considered as an array of bytes (since a byte is the smallest unit of addressing). If you look at the memory of a computer system using a debugger, this is the view you will get &#8211; both for on-target and off-target debuggers like ICE units and JTAG debuggers. Each memory access (store or load) will logically pass a small array of bytes into some position in the very large array that is memory.</p>
<h2 style="text-align: left;">The Modeling View</h2>
<p style="text-align: left;">Modeling endianness is not optional when building a virtual platform. The software will at some point assume a certain relationship between word layouts and byte addresses in memory (such as overlaying a byte array on an integer in a C union), or when interpreting network packets (which are defined to use BE byte order, and therefore network code has to convert values to native endian to process them).</p>
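<p>As a minimal sketch of the two software patterns just mentioned, the C fragment below overlays a byte array on a word and decodes a big-endian network field. The names word_overlay_t, lowest_address_byte, and read_be32 are hypothetical and chosen only for this illustration.</p>

```c
#include <stdint.h>

/* Overlay a byte array on a 32-bit word, as target software might do. */
typedef union {
    uint32_t word;
    uint8_t  bytes[4];
} word_overlay_t;

/* Returns the byte stored at the lowest address of the word.
 * The result depends on the endianness of the machine running the code. */
static uint8_t lowest_address_byte(uint32_t value)
{
    word_overlay_t overlay;
    overlay.word = value;
    return overlay.bytes[0];
}

/* Decodes a 32-bit field stored in network (big-endian) byte order.
 * Built from shifts only, so it gives the same result on any machine. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```

<p>For the value 0x01020304, lowest_address_byte returns 0x04 on an LE machine and 0x01 on a BE machine, which is exactly the kind of dependency a virtual platform has to reproduce for the target. read_be32 returns the same value everywhere because it never looks at how a whole word is laid out in memory.</p>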
<p style="text-align: left;">If you start from the software view of endianness and memory, the obvious simulation model for memory operations is to maintain the array of bytes view of memory matching the physical target.</p>
<ul>
<li>Each memory access from a simulated processor gets turned into a transaction in the simulator.</li>
<li>The transaction has variable size, matching the size of the memory access operation issued by the processor.</li>
<li>The transaction contains a sequence of bytes, in the same order as they would end up in target memory on a physical machine. I.e., the order reflects the endianness of the processor.</li>
<li>The transaction has a starting address (byte-based) matching the memory access the processor issues.</li>
<li>The contents of the memory model in the simulation is an array of bytes, and its content matches what you would find on the physical target &#8211; the logical software view of the target.</li>
<li>The bus system connecting the processor to the memory is basically considered as a black box that just moves the transaction to memory.</li>
</ul>
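<p>The points above can be sketched in C roughly as follows. This is only an illustration under my own assumptions: the type names, the fixed 4 KiB memory size, and the 8-byte limit on a transaction are hypothetical, and a real simulator would carry more information in each transaction.</p>

```c
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 4096u   /* hypothetical size of the simulated memory */

/* One memory access: a start address, a size, and the bytes in target order. */
typedef struct {
    uint64_t addr;      /* byte-based start address */
    uint32_t size;      /* number of valid bytes in data, at most 8 here */
    uint8_t  data[8];   /* data[0] belongs at addr, data[1] at addr + 1, ... */
} mem_transaction_t;

/* Simulated memory is a plain array of bytes, as software sees it. */
typedef struct {
    uint8_t bytes[MEM_SIZE];
} memory_model_t;

/* Returns 1 if the transaction fits inside the memory and the data buffer. */
static int transaction_in_range(const mem_transaction_t *t)
{
    return t->size <= sizeof t->data && t->addr <= MEM_SIZE - t->size;
}

/* Applies a write transaction; returns 0 on success, -1 if out of range. */
static int memory_write(memory_model_t *mem, const mem_transaction_t *t)
{
    if (!transaction_in_range(t))
        return -1;
    memcpy(&mem->bytes[t->addr], t->data, t->size);
    return 0;
}

/* Fills in a read transaction; returns 0 on success, -1 if out of range. */
static int memory_read(const memory_model_t *mem, mem_transaction_t *t)
{
    if (!transaction_in_range(t))
        return -1;
    memcpy(t->data, &mem->bytes[t->addr], t->size);
    return 0;
}
```

<p>Note that the memory model never interprets the bytes. All knowledge of endianness stays with whoever creates or consumes the transaction.</p>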
<p>The above is very easy to implement, and actually a very convenient implementation for someone used to the software view of hardware. The only thing that remains to be considered is how a processor simulator is implemented in practice.</p>
<p>In a typical processor simulator, you represent the target system registers using words of the same size as the target processor uses. I.e., for a 32-bit processor, you use 32-bit words on the host to represent the contents of a register. As the processor model is running, the contents of the register might have to be stored in data structures internal to the processor (such as an array of words representing the register file). Naturally, such data structures are kept in host endianness since they are just plain compiled C code. As the processor model runs, arithmetic is carried out using host endianness.</p>
<p>Actually, usually no endianness is involved as the values are considered as words. Remember that a word does not have endianness until it is broken down into bytes and someone actually looks at the bytes. In particular, an operation like</p>
<pre>uint8  a;
uint32 b;
a = (b &amp; 0xff);</pre>
<p>will pick up the 8 lowest bits of a word on any processor. The code is logically working inside of registers and is perfectly portable. However, the result of</p>
<pre>uint32 word;
uint32 *c = &amp;word;
*c = b;
a = *((uint8 *)c);</pre>
<p>will pick up the first (at the lowest address) byte stored in memory when b was written &#8211; which is the same as the above on an LE processor, but different on a BE processor. The crucial observation here is that the latter variant contains an explicit store of a word, and an explicit load of a byte. Thus, endianness enters as we store the word (the byte load has no endianness, as it is loading the smallest unit of addressability).</p>
<p>What this means is that a processor simulator will have to do an explicit ordering of bytes as it is writing out values to memory. The simulator will need to take a word it has represented in &#8220;host order&#8221; (as it is within the simulator itself) and convert it to the byte order of the target processor. If the two match, such as simulating a little-endian ARM target on an (always little-endian) x86 host, nothing needs to be done. If they do not match, such as simulating a big-endian PPC target on an x86 host, the bytes have to be swapped before being sent to simulated memory.</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/12/endianness-2.png"><img class="aligncenter size-full wp-image-1340" title="endianness 2" src="http://jakob.engbloms.se/wp-content/uploads/2010/12/endianness-2.png" alt="" width="422" height="368" /></a>When the processor does a load, it similarly has to swap the bytes being read from memory (if using different target and host endianness).</p>
<p>As soon as we leave the processor simulator, the order of bytes in transactions and simulated memory has to be defined and managed in a host-independent way. This is crucial to enable snapshots of memory to be <a href="http://blogs.windriver.com/engblom/2010/08/transporting-bugs-with-checkpoints.html#more">shared across hosts, time, and space</a>, and simply to allow the simulation to work correctly. The semantics of the simulation must be defined by the simulator, not by the nature of the host.</p>
<p>Note that as an optimization, quite often we do not create an explicit transaction, but rather use the optimization of letting the processor simulator write directly to the representation of the target memory in the memory simulator. In this design, the target memory representation is just an array of bytes mirroring the contents that the processor would see on a physical target.</p>
<p>Let&#8217;s go through this with a simple example. We assume we are on an x86 host. Our processor simulator contains a 32-bit register with the value 0x01020304. This value is endianless until we have to send it to simulated memory; it is just a value of 32 bits. We write it to target memory at address 0x100.</p>
<p>On a simulated LE target, the memory write will result in a transaction containing the byte sequence (0x04, 0x03, 0x02, 0x01) &#8211; lowest byte comes first. The memory model will store this with 0x04 at address 0x100, 0x03 at 0x101, etc. The processor model can achieve this effect by simply doing a host-native word store to the memory array.</p>
<p>On a simulated BE target, the memory write will result in a transaction containing (0x01, 0x02, 0x03, 0x04). In memory, 0x01 will be stored at address 0x100, 0x02 at 0x101, etc. To store this word correctly, the processor model will have to do a byte swap operation on the word before writing it out to memory. Such a byte swap operation might seem expensive, but the evidence does not indicate that it matters. All the fastest instruction-set simulators use this method internally as far as I know (Wind River Simics, Imperas OVP, Qemu, IBM Mambo), which to me indicates that the design works well on a simulation system level.</p>
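<p>The store and load paths described above can be sketched in C as follows. This is an illustration under my own assumptions, not code from any of the simulators named: target_endian_t, cpu_store32, cpu_load32, and the other names are hypothetical, and the host endianness is probed at run time only to keep the sketch portable.</p>

```c
#include <stdint.h>
#include <string.h>

typedef enum { TARGET_LE, TARGET_BE } target_endian_t;

/* Reverses the four bytes of a 32-bit word. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) |
           (v << 24);
}

/* Returns 1 if the host stores the least significant byte first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Store path: a register value in host order becomes four bytes in
 * target order. A swap is needed only when host and target differ. */
static void cpu_store32(uint8_t out[4], uint32_t reg, target_endian_t target)
{
    if ((target == TARGET_LE) != host_is_little_endian())
        reg = bswap32(reg);
    memcpy(out, &reg, 4);   /* host-native store of the word */
}

/* Load path: four bytes in target order become a register value. */
static uint32_t cpu_load32(const uint8_t in[4], target_endian_t target)
{
    uint32_t reg;
    memcpy(&reg, in, 4);    /* host-native load of the word */
    if ((target == TARGET_LE) != host_is_little_endian())
        reg = bswap32(reg);
    return reg;
}
```

<p>With the register value 0x01020304, cpu_store32 produces the bytes (0x04, 0x03, 0x02, 0x01) for an LE target and (0x01, 0x02, 0x03, 0x04) for a BE target, regardless of the host. Pointing the output at the byte for address 0x100 in the simulated memory array gives the direct-write optimization mentioned earlier.</p>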
<h2>Device Models</h2>
<p>Device models are the main part of a functional simulator for a computer system. They also have endianness, as they expose memory-mapped interfaces to software. To deal with devices in a consistent manner, they will interpret inbound memory transactions using their local register endianness. This makes it simple and reliable to simulate systems where the processor and the devices have different endianness.</p>
<p>Systems with mixed device endianness are very common, mostly thanks to PCI. PCI is defined to use little-endian byte ordering in all memory accesses, as it originated in the x86 world. PCI is still being used in almost all computer systems, and thus LE PCI devices are being connected to BE processors.</p>
<p>Internally, a device model will also use words to represent data. When data is written to a device, it will interpret the bytes in the write transaction using its local order. When data is read from a device, it will fill in the data in the read transaction using its local order. This makes device drivers that byte-swap incoming data from an LE PCI device on a BE processor work just like they do on physical hardware.</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/12/endianness-3.png"><img class="aligncenter size-full wp-image-1341" title="endianness 3" src="http://jakob.engbloms.se/wp-content/uploads/2010/12/endianness-3.png" alt="" width="473" height="414" /></a>This makes endianness a local property of the device. The same device model can be used without change in both an LE and a BE target system. This mirrors reality: PCI devices are used in all kinds of systems, and the devices do not change, and neither do the models have to.</p>
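<p>A device with little-endian registers could be sketched like this. The device, its single control register, and the function names are hypothetical. The point is only that the model decodes and encodes bytes using its own local order, without asking which processor sent them.</p>

```c
#include <stdint.h>

/* Hypothetical PCI-style device with one 32-bit little-endian register. */
typedef struct {
    uint32_t control;   /* kept as a plain word inside the model */
} le_device_t;

/* Write: interpret the transaction bytes in the device's LE order. */
static void le_device_write32(le_device_t *dev, const uint8_t data[4])
{
    dev->control = (uint32_t)data[0] |
                   ((uint32_t)data[1] << 8) |
                   ((uint32_t)data[2] << 16) |
                   ((uint32_t)data[3] << 24);
}

/* Read: fill in the transaction bytes in the device's LE order. */
static void le_device_read32(const le_device_t *dev, uint8_t data[4])
{
    data[0] = (uint8_t)(dev->control & 0xffu);
    data[1] = (uint8_t)((dev->control >> 8) & 0xffu);
    data[2] = (uint8_t)((dev->control >> 16) & 0xffu);
    data[3] = (uint8_t)((dev->control >> 24) & 0xffu);
}
```

<p>If a BE processor stores 0x01020304 to this register without swapping, the transaction carries (0x01, 0x02, 0x03, 0x04) and the device sees 0x04030201, just as it would on hardware. A driver that swaps before storing gets the value it intended.</p>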
<p>In some systems, the designers try to hide the RISC-processor-to-PCI endianness mismatch by making the hardware swap bytes around as they move from the memory bus into the PCI subsystem. If this is the case in a target system, the simplest simulation method is to insert a byte-swapping intermediary on the path from the processor to the devices. This will do an extra byte swap on all transactions passing by, and things will work correctly (note that this byte swap has to be defined to work on a certain word length, and if transactions are bigger than this length, you will also have to order the words).</p>
<p>Note that as long as all units involved on the path from a device to a processor use the same word length, you can replace all the byte swapping operations with a simple flag. This flag will indicate if a transaction has been swapped or not. For example, consider a BE processor talking to a BE device on an LE host. The BE processor will flag the transaction as &#8220;wrong-endian&#8221; as it sends it out but actually store the bytes in LE order in the transaction. The BE device will check the flag and realize that it is wrong-endian too. And since two wrongs make a right, it does not have to swap the bytes either but can copy the transaction contents directly into its internal registers.</p>
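<p>A byte-swapping intermediary of this kind could be sketched as below. The function name and the policy are my own assumptions: the sketch reverses the bytes inside each word of the configured length and leaves the words themselves in their original order, which is only one of the possible choices for transactions longer than one word.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Reverses the bytes inside each word_len-byte word of a transaction,
 * in place. Words keep their position within the transaction.
 * Returns 0 on success, -1 if len is not a multiple of word_len. */
static int bridge_swap(uint8_t *data, size_t len, size_t word_len)
{
    if (word_len == 0 || len % word_len != 0)
        return -1;

    for (size_t base = 0; base < len; base += word_len) {
        for (size_t i = 0; i < word_len / 2; i++) {
            uint8_t tmp = data[base + i];
            data[base + i] = data[base + word_len - 1 - i];
            data[base + word_len - 1 - i] = tmp;
        }
    }
    return 0;
}
```

<p>With a word length of 4, an 8-byte transaction (1, 2, 3, 4, 5, 6, 7, 8) comes out as (4, 3, 2, 1, 8, 7, 6, 5). Whether the two words should also trade places depends on how the physical bridge is documented to behave.</p>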
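<p>The flag trick can be sketched as follows for a fixed 32-bit word length. The structure and function names are hypothetical. The sender never swaps, and the receiver swaps only when its own &#8220;wrongness&#8221; relative to the host differs from that of the sender.</p>

```c
#include <stdint.h>

/* A transaction that carries the word in host order plus a flag. */
typedef struct {
    uint32_t host_word;     /* value as held by the sender, in host order */
    int      wrong_endian;  /* 1 if the sender's byte order differs from the host's */
} flagged_txn_t;

/* Reverses the four bytes of a 32-bit word. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) |
           (v << 24);
}

/* Sender side: record the value and whether the sender is wrong-endian. */
static flagged_txn_t sender_emit(uint32_t value, int sender_matches_host)
{
    flagged_txn_t t;
    t.host_word = value;
    t.wrong_endian = !sender_matches_host;
    return t;
}

/* Receiver side: swap only if exactly one of the two parties is wrong-endian. */
static uint32_t receiver_accept(flagged_txn_t t, int receiver_matches_host)
{
    int receiver_wrong = !receiver_matches_host;
    return (t.wrong_endian == receiver_wrong) ? t.host_word
                                              : bswap32(t.host_word);
}
```

<p>For a BE processor and a BE device on an LE host, both sides are wrong-endian and the value 0x01020304 arrives untouched with no swaps at all. With a BE processor and an LE device, a single swap gives 0x04030201, the same result as going through explicit target-order bytes.</p>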
<h2>Dealing with Data</h2>
<p>There are other things you want to do with a memory image in a virtual platform apart from reading and writing it from a processor. One particular task is to move data into and out of the memory model in order to load code and data, as well as to save the state of the system. The representation of a memory as an array of bytes works very well for this approach, since it corresponds naturally to how software files are created on the host. Since most software files are intended to be loaded by the target into target memory, they are prepared in target byte order. Another advantage of using a byte-based memory representation is that file formats like ELF can be loaded straight into virtual memory without having to convert addresses.</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/12/endianness-5.png"><img class="aligncenter size-full wp-image-1344" title="endianness 5" src="http://jakob.engbloms.se/wp-content/uploads/2010/12/endianness-5.png" alt="" width="495" height="395" /></a>The representation is also host-independent, which facilitates moving memory images from one host to another, a key part of <a href="http://jakob.engbloms.se/archives/1235">using virtual platforms as a communications mechanism</a>. Another benefit of viewing memory as an array of bytes as accessed from a processor is that debuggers can look at memory in the same way as they would when running on the same host.</p>
<h2>Summary</h2>
<p>This long post (WordPress tells me it is more than 2500 words) really only starts to scratch the surface of this fascinating topic. It has described one approach to endianness modeling, and some of the subtleties involved. There are many more subtleties that we could go into.</p>
<h2>Footnote: SystemC TLM-2.0</h2>
<p>There are other ways to model endianness. In particular, the approach described here is not used in the SystemC TLM-2.0 standard. In TLM-2.0, all data is stored in a transaction in <em>host</em> order, not target order. To model the target endianness, you instead change a descriptor array that tells the simulator about how to interpret the bytes when viewed from the target.</p>
<p>As I see it, this means that TLM-2.0 is better suited for modeling the ins and outs of a bus system, including discovering how data ends up at a target from the actions of the various components of the bus system. It models byte lanes and the width of buses, and uses host byte order for all transfers of data. In contrast, the approach described in this blog post works by modeling the documented (or intended) effect of the hardware at the software level.</p>
<p>Overall, I would say that TLM-2.0 is slightly more geared towards the &#8220;<a href="http://jakob.engbloms.se/archives/1083">design&#8221; use of modeling, rather than &#8220;describe</a>&#8221;. By modeling bus widths, actual byte lanes, and other concepts, the simulator will discover the shape and endianness of data as it arrives at a target memory or device.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1336/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Parallel SystemC Simulation</title>
		<link>http://jakob.engbloms.se/archives/1327?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1327#comments</comments>
		<pubDate>Fri, 26 Nov 2010 19:08:47 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Christopher Schumacher]]></category>
		<category><![CDATA[CODES]]></category>
		<category><![CDATA[ISSS]]></category>
		<category><![CDATA[multicore]]></category>
		<category><![CDATA[parallelized software]]></category>
		<category><![CDATA[Rainer Leupers]]></category>
		<category><![CDATA[SystemC]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1327</guid>
		<description><![CDATA[I just found a recent paper on the topic of parallel simulation of computer systems. Christopher Schumacher et al. published an article at CODES+ISSS in October of 2010 talking about &#8220;parSC: Synchronous Parallel SystemC Simulation on Multicore Architectures&#8221;. Essentially, parallel SystemC. This is very much a hot topic: for the past few years, everyone has [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/04/gears.png"><img class="alignleft size-full wp-image-735" style="margin: 5px 10px;" title="gears" src="http://jakob.engbloms.se/wp-content/uploads/2009/04/gears.png" alt="" width="56" height="57" /></a>I just found a recent paper on the topic of parallel simulation of computer systems. Christopher Schumacher et al. published an article at <a href="http://www.public.asu.edu/~ashriva6/esweek2010/codesisss2010/">CODES+ISSS in October of 2010 </a>talking about &#8220;<a href="http://doi.acm.org/10.1145/1878961.1879005">parSC: Synchronous Parallel SystemC Simulation on Multicore Architectures</a>&#8221;. Essentially, parallel SystemC.</p>
<p><span id="more-1327"></span></p>
<p>This is very much a hot topic: for the past few years, everyone has been looking for ways to run various forms of simulators in parallel. We had some good discussions on this only last Wednesday at a seminar at KTH where I was presenting about Simics.</p>
<p>The approach taken in this paper is different from what you find being done in tools like Simics (as I briefly discussed at <a href="http://jakob.engbloms.se/archives/1023">MCC 2009 </a>and <a href="http://jakob.engbloms.se/archives/246">SiCS Multicore Days 2008</a>). They do not exploit <a href="http://jakob.engbloms.se/archives/97">temporal decoupling</a> or islands with different local time. Instead, they have a single global clock in the entire simulation, and just parallelize the work that is done during each cycle.</p>
<p>The key for this to be beneficial and practical is that the work done per cycle is far greater than the cost to drive the simulation forward &#8220;between&#8221; cycles. In a high-level TLM model where the work per cycle might be as small as a single host instruction (JIT translation of a simple integer instruction from target to host), it is obvious that this approach would not work at all. However, this work explicitly targets clock-cycle-level simulations, where the work per cycle per hardware unit can be very large. The paper discusses actions that take 1000 to 2000 host cycles per step, and at that level of effort, there is definitely some potential for parallel gain.</p>
<p>What is nice with the approach is that they do peg semantics to a sequential reference, which does aid debugging. Due to the very tight synchronization, it would seem to be deterministic, at least on the same host (the SystemC kernel can theoretically behave differently on different hosts).</p>
<p>They do have one example that is simulating a shared-memory multiprocessor using temporally decoupled CPU models (100 target cycles per invocation, probably 1000 to 10000 host cycles). This achieves fairly neat speedups on a very symmetric case. However, this comes at the cost of making the simulation nondeterministic &#8211; even for the single-threaded case, which is pretty scary.</p>
<p>Overall, an interesting paper showing that there is more to be discovered in parallel simulation.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1327/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Two Perspectives on Modeling</title>
		<link>http://jakob.engbloms.se/archives/1317?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1317#comments</comments>
		<pubDate>Fri, 19 Nov 2010 22:04:23 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[business issues]]></category>
		<category><![CDATA[virtual platforms]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1317</guid>
		<description><![CDATA[When I started learning about virtual platforms after joining Virtutech back in 2002, the guiding principle of our team was very much one of &#8220;model just enough to make the software happy &#8211; and no more&#8221;. This view was fairly uncontested at the time, and shared (implicitly or explicitly) by everybody developing virtual platforms from a [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/11/construct.png"><img class="alignleft size-full wp-image-1318" title="construct" src="http://jakob.engbloms.se/wp-content/uploads/2010/11/construct.png" alt="" width="96" height="96" /></a>When I started learning about virtual platforms after joining Virtutech back in 2002, the guiding principle of our team was very much one of &#8220;model just enough to make the software happy &#8211; and no more&#8221;. This view was fairly uncontested at the time, and shared (implicitly or explicitly) by everybody developing virtual platforms from a software perspective. There is a second perspective, though, from the hardware design world. From their viewpoint, a model needs to be complete. Both views have their merits.</p>
<h2><span id="more-1317"></span>The Software Perspective</h2>
<p>The modeling philosophy of tools like Simics (in which I include tools like Qemu, IBM Mambo, IBM Cecsim and innumerable efforts to simulate various old computers to run their software) takes the perspective of a software developer: as long as the software has something to run on that works, the completeness of the model when compared to a real machine is fairly uninteresting. I described it like this in my Embedded Systems Conference 2008 talk on virtual platforms:</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/11/bluffing-in-modeling.png"><img class="aligncenter size-full wp-image-1319" title="bluffing in modeling" src="http://jakob.engbloms.se/wp-content/uploads/2010/11/bluffing-in-modeling.png" alt="" width="394" height="370" /></a>It is a bit tongue-in-cheek but captures the essential spirit.</p>
<p>When the software you are interested in running runs, the work is done. Any effort spent on too much depth or breadth is wasted, as it adds no real value to the end user. Obviously, this is not a matter of black-or-white, all or nothing. The definition of &#8220;enough&#8221; is very context-dependent.</p>
<p>In practice, the reason that this philosophy was adopted was that the customers for the simulators (virtual platforms) were concerned about software development for standard chips that they were buying from outside parties. These chips are rarely perfect fits for a particular system, but tend to contain a superset of the functionality. In this way, the same chip can be used in many different systems, providing economies of scale for all parties involved.</p>
<p>This meant that the original specification for a virtual system would include some units that would not be modeled. In other units, only certain operation modes would be modeled (it is surprising just how many different ways conceptually simple things like Ethernet controllers or serial ports can be used). The final result would be a model like this:</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/11/OEM-1.png"><img class="aligncenter size-full wp-image-1320" title="OEM 1" src="http://jakob.engbloms.se/wp-content/uploads/2010/11/OEM-1.png" alt="" width="388" height="215" /></a>The A, B, etc. boxes are various subunits or operation modes of the hardware units. The system is an example, and any resemblance to any real chip, living or dead, is not intentional. We will make use of all categories in the legend later.</p>
<p>The key point is that some parts of the chip are left for later. The model is sufficient to run the software and solve the customer&#8217;s problem.</p>
<p>When a second target system comes along using the same basic chip, it is usually necessary to fill in some of the gaps in the original model. Extensions could be caused by using a different operating system that drives the hardware units in a different way, an application that actually uses some previously unused features of the hardware, or upgrades to the software that make more aggressive use of advanced hardware operating modes.</p>
<p>The net result would be similar to this:</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/11/OEM-2.png"><img class="aligncenter size-full wp-image-1321" title="OEM 2" src="http://jakob.engbloms.se/wp-content/uploads/2010/11/OEM-2.png" alt="" width="303" height="218" /></a>In this example, the new OS used a watchdog timer (W) that was not used by the previous OS. We are making use of more features in the Ethernet driver. The new application only uses a single CPU core to run, leaving the second core idle. After modeling the pink pieces, we are still far from a &#8220;complete&#8221; model &#8211; but we have made two customers happy and have (hopefully) hundreds of software developers hacking away delivering software.</p>
<p>Working in this way, a model is quickly extended to meet each successive customer&#8217;s need. There are likely parts that never get modeled as they are never needed. That might indicate that they never got used in practice &#8211; or, more likely, that there are other users of the chip that never requested a virtual platform for it. Which brings us to the second view of modeling, the hardware perspective.</p>
<h2>The Hardware Perspective</h2>
<p>If you are a hardware vendor, you tend to have a different view of what a virtual platform is all about. You want to equip your customers and partners with a virtual chip, and you have to assume that at least someone will be using every feature and unit of your fantastic new chip. From a hardware vendor, users expect a virtual copy of the hardware &#8211; not just a useful subset.</p>
<p>For the first user above, the resulting system state would be this:</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/11/Semi-1.png"><img class="aligncenter size-full wp-image-1322" title="Semi 1" src="http://jakob.engbloms.se/wp-content/uploads/2010/11/Semi-1.png" alt="" width="304" height="215" /></a>We have large parts which are modeled but not used. This represents a waste from the perspective of this user. But it is a waste that does not hurt the user (we assume that unused units do not slow the simulation down). If we look across more uses, the waste is much less. In the second example use above, there is no need for any additional modeling:</p>
<h2><a href="http://jakob.engbloms.se/wp-content/uploads/2010/11/semi-2.png"><img class="aligncenter size-full wp-image-1323" title="semi 2" src="http://jakob.engbloms.se/wp-content/uploads/2010/11/semi-2.png" alt="" width="299" height="213" /></a>Synthesis</h2>
<p>So far, I have presented two different perspectives. One is the one that I have been living for a long time, and the other is its opposite. In a panel debate at a conference, it makes for an excellent topic. Both sides can claim to be right, and claim that the other side is incorrect, uninformed, dangerous, or just stupid. Great fun and a great show&#8230; but also cause for severe misunderstanding and friction between proponents of the two modeling traditions, and maybe not the best way to move forward as an industry.</p>
<p>There is always room for compromise and synthesis. Let&#8217;s look at a stylized illustration of the modeling effort spent over time in these two approaches:</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/11/OEM-vs-Semi-vs-time.png"><img class="aligncenter size-full wp-image-1324" title="OEM vs Semi vs time" src="http://jakob.engbloms.se/wp-content/uploads/2010/11/OEM-vs-Semi-vs-time.png" alt="" width="486" height="335" /></a>The red blobs represent effort, and the arrows show when different users get the platform they need. The bottom case corresponds to the hardware perspective on modeling, and the top to the software perspective.</p>
<p>In this example, users 1 and 2 would get models faster by the step-by-step modeling preferred by the software perspective. User 3 would be the same, and user 4 shows that by having a complete model, there is no delay for additional modeling as new users sign up. What I want to show is that there is no absolute best-in-all-cases modeling strategy: it all depends on the circumstances.</p>
<p>The diagram is potentially a bit misleading&#8230; it is maybe not entirely reasonable to have the modeling effort for new hardware start at the same time as use-case driven modeling. Since the hardware is new, there are probably no users for it yet. More likely, software-perspective use-case-driven modeling is applied when the hardware is already complete, sold, and designed into an OEM system. The hardware-perspective completeness-driven modeling is much more applicable in a presilicon setting, where the virtual platform is used to design-in and enable early software development.</p>
<p>Still, the two approaches are not completely incompatible. Even in a presilicon hardware-driven setting, it is often possible to start to deliver partial platforms early. Key customers tend to know what they need and do not need from a future hardware platform, and are often quite willing to get something started early even if not all pieces are there yet.</p>
<p>To port the core of an operating system, not all peripheral devices need to be in place. To create a virtual platform that can talk to a network, Ethernet is needed &#8211; but the support for Serial Rapid IO or PCIe for a rack backplane can be delivered later. In this way, an eventually complete hardware-perspective virtual platform can be delivered in increments that minimize software developer waiting.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1317/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Wind River Blog: &#8220;Virtual Basil Fawlty&#8221;</title>
		<link>http://jakob.engbloms.se/archives/1292?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1292#comments</comments>
		<pubDate>Wed, 20 Oct 2010 07:59:00 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Wind River Blog]]></category>
		<category><![CDATA[Basil Fawlty]]></category>
		<category><![CDATA[fault injection]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[Wind River]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1292</guid>
		<description><![CDATA[Last week, I posted a discussion about fault injection in virtual systems, using Basil Fawlty as the perfect example of a fault injection agent.]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/10/184829-1_m1.jpg"><img class="alignleft size-full wp-image-1294" style="margin: 10px 5px;" title="John Cleese as Basil Fawlty" src="http://jakob.engbloms.se/wp-content/uploads/2010/10/184829-1_m1.jpg" alt="" width="100" height="121" /></a>Last week, I posted a <a href="http://blogs.windriver.com/tools/2010/10/the-virtual-basil-fawlty.html">discussion about fault injection in virtual systems</a>, using Basil Fawlty as the perfect example of a fault injection agent.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1292/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Wind River Blog: Virtual vs Physical Systems</title>
		<link>http://jakob.engbloms.se/archives/1285?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1285#comments</comments>
		<pubDate>Wed, 06 Oct 2010 11:19:43 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Wind River Blog]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[Wind River]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1285</guid>
		<description><![CDATA[I have a post at my Wind River blog, about the difference between virtual and physical systems. The key idea is this: Comparing virtual and physical systems is like comparing apples and apples, not apples and oranges: while apples are mostly interchangeable, there is certainly variation between them. Some apples are best for eating, some [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png"><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="Wind River Logo" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" /></a>I have a post at my Wind River blog, about <a href="http://blogs.windriver.com/engblom/2010/10/physical-or-virtual.html">the difference between virtual and physical systems</a>. The key idea is this:</p>
<blockquote><p>Comparing virtual and physical systems is like comparing apples and apples, not apples and oranges: while apples are mostly interchangeable, there is certainly variation between them. Some apples are best for eating, some are better for making sauce, some are pie material, and some are best for fermenting cider. The type you select depends on what you want to cook. The difference between physical and virtual hardware is similar: they can be used as replacements for each other to some extent, but the connoisseur can make much better use of both by looking at the differences.</p></blockquote>
<p>Go there now and read it!</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1285/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>S4D 2010</title>
		<link>http://jakob.engbloms.se/archives/1251?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1251#comments</comments>
		<pubDate>Wed, 15 Sep 2010 08:02:42 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[appearances]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[ARM]]></category>
		<category><![CDATA[Debug]]></category>
		<category><![CDATA[ESCUG]]></category>
		<category><![CDATA[FDL]]></category>
		<category><![CDATA[Infineon]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[John Aynsley]]></category>
		<category><![CDATA[Pat Brouillette]]></category>
		<category><![CDATA[S4D]]></category>
		<category><![CDATA[Simon Davidmann]]></category>
		<category><![CDATA[Southampton]]></category>
		<category><![CDATA[ST]]></category>
		<category><![CDATA[SystemC]]></category>
		<category><![CDATA[Thorsten Grötker]]></category>
		<category><![CDATA[TrustZone]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1251</guid>
		<description><![CDATA[Looks like S4D (and the co-located FDL) is becoming my most regular conference. S4D is a very interactive event. With some 20 to 30 people in the room, many of them also presenting papers at the conference, it turns into a workshop at its best. There was plenty of discussion going on during sessions and [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/09/S4D1.jpg"><img class="alignleft size-full wp-image-941" title="S4D" src="http://jakob.engbloms.se/wp-content/uploads/2009/09/S4D1.jpg" alt="" width="143" height="62" /></a>Looks like S4D (and the co-located FDL) is becoming my most regular conference. S4D is a very interactive event. With some 20 to 30 people in the room, many of them also presenting papers at the conference, it turns into a workshop at its best. There was plenty of discussion going on during sessions and the breaks, and I think we all got new insights and ideas.</p>
<p><span id="more-1251"></span></p>
<h2><a href="../wp-content/uploads/2010/09/P1140077.jpg"><img class="aligncenter size-full wp-image-1276" title="P1140077" src="../wp-content/uploads/2010/09/P1140077.jpg" alt="" width="400" height="258" /></a></h2>
<h2>S4D Talks, Themes, and Topics</h2>
<p>More is available in &#8220;<a href="http://jakob.engbloms.se/archives/1280">S4D part 2</a>&#8221;.</p>
<h3>Tracing and Instrumentation</h3>
<p>The papers presented covered a wide variety of topics from a variety of angles. Still, everybody felt that two topics kept coming back in various forms in a majority of the papers and discussions: <em>tracing</em> and <em>instrumentation</em>.</p>
<p>Code instrumentation is not a dirty word anymore. The traditional judgment that inserting probes into your software is plain bad no longer applies, at least not in the minds of the people at S4D. Instrumentation was applied to drivers, OS kernels, and regular user-level software. I think the key insight is that there is clear value in having the developers who write a piece of software also mark points of interest in the code. When analyzing a trace of an execution, that means that the information in the trace becomes meaningful to the software developers, as it is at the right level of abstraction. Instrumentation naturally produces traces, which can be fed out using shared memory, networks, special-purpose hardware, and more.</p>
<p>One of the instrumentation trace solutions presented (the SVEN system from Intel Digital Home, presented by Pat Brouillette) actually leaves the instrumentation in place in the shipping customer systems. In this way, you cannot really claim that instrumentation is intrusive &#8211; it is just part of the software, always. Customers can even activate the tracing in deployed systems, and ship the traces back to the developers for analysis of bugs found in the field. It is another approach to <a href="http://jakob.engbloms.se/archives/1231">record and replay</a> that touches on my paper on transporting bugs with checkpoints.</p>
<p>The increased interest in instrumentation probably has something to do with the nature of the systems that are being addressed. For systems using shared memory multicore hardware and general-purpose operating systems, the cost of instrumentation is easier to take than for very small constrained embedded systems. Essentially, as systems get more complex, instrumentation becomes more tractable.</p>
<p>Having instrumentation interact with hardware trace and debug functions is a neat way to build a system that is more powerful than a hardware-only or software-only system would be on its own. Especially for software stacks involving hypervisors and multiple complex operating systems, that is likely necessary.</p>
<p>Just <a href="../archives/942">like last year</a>, once we have a trace we need tools for analyzing the tons of data that come from tracing a modern system. ST talked about a tracing system that generated 100s of gigabytes of data.</p>
<p>One trace aspect that kept coming up was the need for <em>time stamps</em> on trace data. To reconcile multiple traces and understand how different concurrent units talk to each other, a global time stamping mechanism is crucial. There seems to be work on hardware to support this.</p>
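<p>As a rough illustration of why a global time stamp matters (my own sketch, not taken from any of the S4D papers), here is how per-unit traces could be reconciled into a single timeline. The event layout <code>(timestamp, unit, what)</code> and the name <code>merge_traces</code> are assumptions made up for this example; the only real requirement is that all units stamp their events from the same clock.</p>

```python
import heapq

def merge_traces(*traces):
    """Merge per-unit traces into one timeline.

    Each trace is a list of (timestamp, unit, what) tuples that is already
    sorted by timestamp, and all timestamps come from one global clock.
    """
    return list(heapq.merge(*traces, key=lambda event: event[0]))

# Two hypothetical units, each with its own locally ordered trace.
cpu0 = [(10, "cpu0", "irq enter"), (40, "cpu0", "irq exit")]
cpu1 = [(25, "cpu1", "msg send"), (30, "cpu1", "msg wait")]

timeline = merge_traces(cpu0, cpu1)
for timestamp, unit, what in timeline:
    print(timestamp, unit, what)
```

<p>Without a shared clock, the interleaving of the two units in the merged timeline would be guesswork, which is exactly the problem the hardware time stamping work is trying to remove.</p>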
<h3>Security, Secrecy, and Debug</h3>
<p>I moderated a panel on hardware support for debug, and posed the question of how to balance security and the need to debug. This generated a number of interesting answers from the panel and the audience.</p>
<p>The conflict between debuggability and secrecy is there. Even from the same customer you first get &#8220;you have to make the internal state of the controller inaccessible and hidden to avoid customers modifying their engines&#8221;&#8230; and then when a problem appears in the field, they ask for a way to analyze and trace that very same system. Hard to support both requirements in a reasonable way.</p>
<p>A sophisticated solution to debug security from companies like ARM, Infineon, and ST is debug that can be enabled using key exchange. The chips are built with a &#8220;locked door&#8221; in place, but the keys to the door are kept well-guarded. In this way the same chip can be used in development and in the field.</p>
<p>To support debug of systems involving secure modes like ARM TrustZone, ARM has defined several levels of access in their CoreSight hardware modules. This makes it possible for a debugger to be restricted to just debugging user-level code, just OS and user-level code, or all of the software stack. To me, this sounds like it could allow mobile phone manufacturers to &#8220;securely&#8221; let their application developers use hardware-based debug, without compromising operating systems or secure boot modes.</p>
<p>The classic technique of using fuses to turn off functions is also relevant, at least for systems with moderate levels of security. This can certainly be overcome using special tools to peel off the top of chips and reconnect the fuses, but the panel seemed to think that that level of attack was in general not worth protecting against. However, the audience pointed out that this was actually being done to automotive engine controllers and there are people making a good living from such antics.</p>
<h3>ESCUG Meeting</h3>
<p>The ESCUG meeting was a mix of fairly slick commercial presentations from OVP/Imperas chief Simon Davidmann and SystemC guru John Aynsley, and research presentations of varying quality.</p>
<p>One thing that struck me was that the academics spent significant time in all presentations on how their approaches were compatible with the existing SystemC structure, where they host their open-source efforts, etc. I guess that is good in that they show a certain concern for reality &#8211; but it is also a bit sad that they did not get time to actually talk that much about the core ideas they were bringing forward. I am personally much more interested in new ideas than infrastructure and project management. It does not bode well for European research if this is what people are forced to produce, in lieu of real innovation.</p>
<h3>Thorsten Grötker&#8217;s Keynote</h3>
<p>On Wednesday morning, Thorsten from Synopsys did a look back over the history of SystemC, free from product pitching. He only mentioned Synopsys in his introduction, where the high-level message was that the embedded software is really the key problem for industry today. I cannot disagree with that.</p>
<p>During the SystemC parts of his talk he did say a few things that I did not quite agree with&#8230; in particular that TLM was unknown prior to 1999. It was not called that, but it certainly existed in the field of full-system simulation. The main problem is that Thorsten only sees the EDA history of modeling, not the computer architecture and software-driven work that did simulations as far back as 1950 (the famous Gill paper), and fast simulation since at <a href="http://jakob.engbloms.se/archives/130">least 1967</a>.</p>
<p>He also claims that with SystemC you have a single language for both detailed and TLM models. That is true&#8230; but you still need multiple models, one at each level of abstraction. So yes, one language, multiple models. However, that gluability really comes with a performance and complexity cost. It makes it too easy to slip into bad modeling even in TLM.</p>
<p>An interesting theme that Thorsten picked up from John&#8217;s talk at ESCUG is the use of SystemC to model software and RTOS, using the upcoming process control extensions. If you stretch that into the area of software synthesis, it means that SystemC is going to collide with the field of model-driven software development. Will you use SystemC, coming from the hardware world, or UML/MATLAB/domain-specific languages coming from the software world? Thorsten makes the interesting point that in order to integrate with that world, SystemC will require some concepts from that world (like pins and clocks enable interaction with RTL). I am not sure that is necessarily true; I think you can just as well create point adaptors to the same effect.</p>
<h2>Getting to Southampton</h2>
<p>The <a href="http://www.soton.ac.uk/">University of Southampton</a> hosted the event, and it took place in the university lecture halls. That means that we got free, very fast WiFi (unlike any commercial conference venue I have ever seen). The university campus was full of services (unlike the desolate place that last year&#8217;s FDL/S4D chose). Housing in the <a href="http://www.soton.ac.uk/accommodation/halls/gleneyre/index.html">Glen Eyre residential halls</a> was a bit spartan but functional. It felt like being back in my days as a student living in student housing.</p>
<p>The instructions from the conference about how to get there were a bit confusing and incomplete. In practice, it is very easy to get to Southampton from both Gatwick (direct train) and Heathrow (NationalExpress bus 203). At Heathrow, I had a bit of luck with the bus to Southampton. The instructions from the NationalExpress website had me believe that I had to get from Terminal 5, where we landed, to the central bus station and then catch the bus at 15.00. As we landed 40 minutes late (14.40), this looked very hopeless&#8230; until I found the NationalExpress counter in the arrivals hall at Terminal 5 and they told me the bus would leave at 15.30. Nice, no stress. The bus to Southampton even had free WiFi on board!</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/09/P1140062.jpg"></a><a href="http://jakob.engbloms.se/wp-content/uploads/2010/09/P1140062-1.jpg"><img class="aligncenter size-full wp-image-1275" title="P1140062-1" src="http://jakob.engbloms.se/wp-content/uploads/2010/09/P1140062-1.jpg" alt="" width="400" height="246" /></a></p>
<p>Once in Southampton, you then had to take the bus U1A out to the university campus, and finding a bus stop for that was the most difficult part of the journey, actually. Some of the buses from Heathrow stop at Southampton university.</p>
<p>See also &#8220;<a href="http://jakob.engbloms.se/archives/1280">S4D Part 2</a>&#8221; for a few more tidbits from S4D.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1251/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Additional Notes on Transporting Bugs with Checkpoints</title>
		<link>http://jakob.engbloms.se/archives/1231?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1231#comments</comments>
		<pubDate>Wed, 15 Sep 2010 05:38:42 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[appearances]]></category>
		<category><![CDATA[articles]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Checkpointing]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[S4D]]></category>
		<category><![CDATA[Simics]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1231</guid>
		<description><![CDATA[This post features some additional notes on the topic of transporting bugs with checkpoints, which is the subject of a paper at the S4D 2010 conference. The idea of transporting bugs with checkpoints is in some ways obvious. If you have a checkpoint of a state, of course you move it. Right? However, changing how you [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/09/S4D1.jpg"><img class="alignleft size-full wp-image-941" style="margin: 5px 10px;" title="S4D" src="http://jakob.engbloms.se/wp-content/uploads/2009/09/S4D1.jpg" alt="" width="143" height="62" /></a>This post features some additional notes on the topic of transporting bugs with checkpoints, which is the subject of a paper at the <a href="http://www.ecsi.me/s4d">S4D</a> 2010 conference.</p>
<p>The idea of transporting bugs with checkpoints is in some ways obvious. If you have a checkpoint of a state, of course you move it. Right? However, changing how you think about reporting bugs takes time. There are also some practical issues to be resolved. The S4D paper goes into some of the aspects of making checkpointing practical.</p>
<p><span id="more-1231"></span>In particular, we need the checkpoints to be:</p>
<ul>
<li>Portable &#8211; so that checkpoints can be copied around between computers</li>
<li>Deterministic &#8211; so that everyone opening a checkpoint sees the same behavior</li>
<li>Compact &#8211; so that they can actually be moved around without incurring undue pain</li>
<li>Differential &#8211; so that a checkpoint can build on previous state and just contain a set of changes, not the entire state of the target system</li>
</ul>
<p>Most of my paper is spent on how to make checkpoints small enough to be easily transported, and how it fits with development workflows. The requirements above would seem to be common sense, but there are checkpointing systems out there that do not fulfill them. In particular, the portability aspect is hard to get right.</p>
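<p>To make the &#8220;differential&#8221; and &#8220;compact&#8221; requirements a bit more concrete, here is a minimal sketch of the idea. It is not how Simics or any other real simulator stores checkpoints; the dictionary-based state and the names <code>diff_checkpoint</code> and <code>restore</code> are invented for illustration, and the sketch assumes that the set of state items (registers, memory pages) never shrinks.</p>

```python
def diff_checkpoint(base_state, current_state):
    """Return only the items of current_state that differ from base_state.

    Assumes items are never deleted, only added or changed.
    """
    return {
        key: value
        for key, value in current_state.items()
        if key not in base_state or base_state[key] != value
    }

def restore(base_state, *diffs):
    """Rebuild a full state from a base checkpoint plus a chain of diffs."""
    state = dict(base_state)
    for diff in diffs:
        state.update(diff)
    return state

# Hypothetical target state: one register and two memory pages.
base = {"pc": 0x1000, "page0": bytes(4), "page1": bytes(4)}
after = {"pc": 0x1040, "page0": bytes(4), "page1": b"\x01\x02\x03\x04"}

delta = diff_checkpoint(base, after)   # only "pc" and "page1" are stored
rebuilt = restore(base, delta)
print(sorted(delta), rebuilt == after)
```

<p>The point of the sketch is that a bug report only needs to carry <code>delta</code> if the receiver already has <code>base</code>, which is what makes the checkpoint small enough to move around.</p>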
<p>There are other ways to achieve transportation of bugs, and this blog post will fill in on some related work that I could not fit into the paper or which I discovered only after the final version of the paper was submitted.</p>
<h3>Record-Replay Systems</h3>
<p>There seems to be boundless creativity in creating methods to record live systems and replay their inputs/outputs/internal behavior/other interesting behavior on another system to support debug or analysis or other tasks. It shows just how important the replication of bugs is to the development of systems, and just how hard it is to accurately capture a bug in practice.</p>
<p>A company called <strong>Zealcore</strong> was doing some interesting work in software-based recording of &#8220;only the relevant events&#8221;, and then replaying this on a lab machine. Their angle on the problem was to have software record a minimal trace of important events on a live system, and then control the runtime system in a lab to replicate the event trace. Making this efficient and precise was the subject of a sequence of research papers in the early 2000s. Zealcore was acquired by Enea in 2008, and I have not seen much from them since. From what I can tell, the fundamental Zealcore technology for recording on a live system (or at least the ideas) has been continued in a new company called <strong><a href="http://www.percepio.se/">Percepio</a></strong>.</p>
<p>A fundamental difference between these recording systems and checkpointing systems is that they do not capture the complete target system state in the way a checkpoint does. The recording is much more compact, but it does not really solve the same problem. It is not based on running the target inside a simulator (other than at the replay end). What the relative success of such recording systems indicates, however, is that in many systems, there are &#8220;important&#8221; and &#8220;irrelevant&#8221; aspects of inputs and events and behaviors, and that recording and replaying only &#8220;important&#8221; aspects is often sufficient to trigger bugs.</p>
<p>You can also throw hardware at the problem.</p>
<p>Completely unexpectedly, I also found a reference to a hardware-based record/replay system in a <a href="http://cacm.acm.org/magazines/2010/8/96632-an-interview-with-edsger-w-dijkstra/fulltext">Communications of the ACM interview with Edsger Dijkstra</a> (a rerun of an <a href="http://www.cbi.umn.edu/oh/pdf.phtml?id=296">interview from 2002</a>). Apparently, during the early programming of the <strong>IBM 360</strong>, IBM realized that debugging interrupts was hard. The solution was to create a piece of special hardware which would record interrupts, and later replay them with precise timing. In this way, you achieved repeatable executions of the most difficult code there was. I must quote what Dijkstra says on this &#8220;throw money at the problem&#8221; approach:</p>
<blockquote><p>When IBM had to develop the software for the 360, they built one or two machines especially equipped with a monitor. That is an extra piece of machinery that would exactly record when interrupts took place and from where to where. And if something went wrong, it could replay it again and use the recorded history to control when interrupts would occur. So they made it reproducible, yes, but at the expense of much more hardware than we could afford. Needless to say, they never got the OS/360 right.</p></blockquote>
<p>The final comment is typical for Dijkstra&#8217;s thinking that debugging is just an indication that you did not get the program and design right from the start. That&#8217;s certainly true, and he would likely have considered my little S4D paper as an unnecessarily complicated solution to a problem that should not have existed in the first place.</p>
<p>I, however, find the idea of the monitor interesting. I think that building something like that today would be much more difficult, as chips are very highly integrated and the support for replaying interrupts would have to go right into the heart of an SoC. But it would be interesting if it could be done.</p>
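<p>In software, the principle behind such a monitor can be sketched in a few lines. This is a toy of my own making, not a description of the IBM hardware: the &#8220;target&#8221; is just a counter, the function <code>run</code> and its <code>interrupt_source</code> parameter are invented names, and time is counted in loop steps rather than in real cycles.</p>

```python
import random

def run(steps, interrupt_source):
    """Toy target: a counter that an interrupt handler perturbs.

    interrupt_source(step) returns True when an interrupt fires at that step.
    Returns the final value and the list of steps where interrupts occurred.
    """
    value = 0
    log = []
    for step in range(steps):
        value += 1
        if interrupt_source(step):
            value *= 2          # the "interrupt handler"
            log.append(step)    # the monitor records when it happened
    return value, log

# Recording run: interrupts arrive at unpredictable points.
rng = random.Random()
live_value, recorded = run(100, lambda step: rng.random() > 0.95)

# Replay run: the recorded history decides when interrupts occur.
fire_at = set(recorded)
replay_value, replay_log = run(100, lambda step: step in fire_at)

print(live_value == replay_value, recorded == replay_log)
```

<p>The replay run reproduces the live run exactly because the only source of nondeterminism, the arrival time of interrupts, has been replaced by the recording.</p>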
<p>There is also a <a href="http://jakob.engbloms.se/archives/130">paper from 1969 that I wrote about a few years ago</a> that does include the idea of recording and replaying asynchronous external inputs to a simulator.</p>
<h3>Other Checkpointing Systems</h3>
<p>There might be some related use of checkpoints (or snapshots as they are more commonly known) in the development of game emulators. There is clearly the ability to save game state in a portable way in emulators like MAME. Such states can be useful to help debug the emulator, but in a different way from the approach that I presented. In the emulator case, the state is really the state of the emulated target. It is not the state of the emulator program itself. If game emulator snapshots were used to debug the game code, it would be the same situation as what I describe in the S4D paper.</p>
<p>As I understand it, this is more like attaching an example document that makes a program crash to a bug report, rather than transporting the state of the emulator itself.</p>
<p>Going down in the level of abstraction, I have also been told that RTL simulators offer a similar ability and that they have been used in a similar way. Since I am not at all familiar with that field, I did not comment on this in the paper.</p>
<p>Transporting RTL bugs using checkpoints makes perfect sense. In an RTL simulator, the target state is very clearly described in an unambiguous way with no relationship to host state. Checkpointing should be easy to implement and checkpoints should be portable; anything else would be a poor implementation. The simulation is also deterministic, assuming a reasonable implementation of the simulator. The simulated world is also encapsulated with a set of test cases, since RTL simulations are too slow to be interfaced to the real world. If an RTL simulator is interfaced to something else, recording the incoming signals should be straightforward since they are at a very low level (bits, clocks, pin states).</p>
<p>The use of checkpointing with RTL also fits with a conversation I had in 2005 when Virtutech introduced reverse execution in Simics. At one of the tradeshows where we showed the technology, an older gentleman approached me and told me that he had done similar things with hardware simulators back in the 1980s. He immediately understood the implementation idea (checkpoints with deterministic replay), and sounded like he felt it was nothing much new.</p>
<p>Finally, at some other event last year, I saw a demonstration of an RTL-level tool where the trace of the execution was generated on one machine, but inspected on a different machine. That amounts to a portable trace, even if the data volumes were rather large (many GB) and essentially required the RTL simulator (or hardware accelerated emulator) to be sharing disks with the investigation machine. Still, nothing prevents such a solution from being remotely used. The main difference from what I describe is that here only the result of the execution (trace of signals) is transported, not an actual state snapshot that can be brought up to continue the execution in a different place.</p>
<p>If you have any other notes on this, please comment!</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1231/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

