<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Observations from Uppsala &#187; computer simulation technology</title>
	<atom:link href="http://jakob.engbloms.se/archives/category/virtual/simulation-tech/feed" rel="self" type="application/rss+xml" />
	<link>http://jakob.engbloms.se</link>
	<description>Computer Technology: Simulation, Virtualization, Virtual Platforms, Embedded, Multicore and Multiprocessing (by Jakob Engblom)</description>
	<lastBuildDate>Tue, 27 Jul 2010 19:57:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
<image>
    <title>Observations from Uppsala</title>
    <url>http://jakob.engbloms.se/favicon.png</url>
    <link>http://jakob.engbloms.se</link>
    <width>32</width>
    <height>32</height>
    <description>Observations from Uppsala - http://jakob.engbloms.se</description>
    </image>		<item>
		<title>Pipeline Performance Simulator Anno 1960</title>
		<link>http://jakob.engbloms.se/archives/1126?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1126#comments</comments>
		<pubDate>Mon, 03 May 2010 19:56:50 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[history of computing]]></category>
		<category><![CDATA[clock-cycle models]]></category>
		<category><![CDATA[cycle accuracy]]></category>
		<category><![CDATA[Frederick Brooks]]></category>
		<category><![CDATA[Harwood Kolsky]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[IBM 7030]]></category>
		<category><![CDATA[ISCA]]></category>
		<category><![CDATA[pipeline]]></category>
		<category><![CDATA[Tensilica]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1126</guid>
		<description><![CDATA[I have just found what almost has to be the first cycle-accurate computer simulator in history. According to the article &#8220;Stretch-ing is Great Exercise &#8212; It Gets You in Shape to Win&#8221; by Frederick Brooks (the man behind the Mythical Man-Month) in the January-March 2010 issue of IEEE Annals of the History of Computing, IBM [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/05/4506VV3073.jpg"><img class="alignleft size-full wp-image-1128" style="margin: 5px 10px;" title="IBM Stretch panel" src="http://jakob.engbloms.se/wp-content/uploads/2010/05/4506VV3073.jpg" alt="" width="83" height="79" /></a>I have just found what almost has to be the first cycle-accurate computer simulator in history. According to the article &#8220;<a href="http://dx.doi.org/10.1109/MAHC.2010.26">Stretch-ing is Great Exercise &#8212; It Gets You in Shape to Win</a>&#8221; by Frederick Brooks (the man behind <a href="http://en.wikipedia.org/wiki/Mythical_man_month">the Mythical Man-Month</a>) in the January-March 2010 issue of IEEE Annals of the History of Computing, IBM created a simulator of the pipeline for the <a href="http://en.wikipedia.org/wiki/IBM_Stretch">IBM 7030 &#8220;Stretch&#8221; computer </a>developed from 1956 to 1961 (<a href="http://www-03.ibm.com/ibm/history/exhibits/vintage/vintage_4506VV3073.html">photo from IBM.com</a>).</p>
<p><span id="more-1126"></span></p>
<p>For those unfamiliar with the Stretch machine, it was a supercomputer developed by IBM which introduced many of the performance techniques and basic computer technologies that we all use today (most of them handed down to us via the IBM System/360). For example, it was the first to use 8-bit bytes and 64-bit floating point. It also introduced memory protection, memory interleaving, and instruction prefetching.</p>
<p>More relevant for my blog is the fact that the Stretch used the world&#8217;s first pipelined main processor, complete with interlocks to maintain program-order semantics. When developing this pipeline, Frederick Brooks claims that IBM developed a program to simulate the pipeline. This simulator was used to test the performance of the pipeline design on various test programs (this was before they were called benchmarks), and tune the design accordingly. The simulator was created by <a href="http://archive.computerhistory.org/resources/text/FindingAids/102658131.Kolsky.pdf">Harwood Kolsky</a>. There is no firm date for the pipeline simulator, but based on the development time of the Stretch, it can be dated somewhere around 1960.</p>
<p>Thus, the simulation-driven approach to computer architecture is about 50 years old by now. Should have gone to ISCA and used this as an excuse for a party I guess&#8230;</p>
<p>It is also interesting to note that the Stretch computer acquired a co-processor in 1962, to do cryptology work. This machine was the one-off <a href="http://en.wikipedia.org/wiki/IBM_7950">IBM 7950 &#8220;Harvest&#8221; </a>and was tailored for the needs of the NSA in the US. It was a seriously special-purpose hardware unit adding a few instructions to the Stretch machine, and beating any other machine at the time by a factor of about 50 to 200 on the particular NSA workloads. Sounds like the kind of performance claims made for Tensilica and other application-customized processors. 50 years ago.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1126/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>First Blog at Wind River!</title>
		<link>http://jakob.engbloms.se/archives/1121?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1121#comments</comments>
		<pubDate>Thu, 29 Apr 2010 19:14:03 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[blogging]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[Wind River]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1121</guid>
		<description><![CDATA[One of the many nice effects of the Wind River acquisition of Simics is that I will be blogging as part of the Wind River Blog network. My first post there is up now, and it is a short (at least compared to a textbook, I admit it looks terribly long for a blog post) [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png"><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="button-quicklink-blogs" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" /></a>One of the many nice effects of the Wind River acquisition of Simics is that I will be blogging as part of the Wind River Blog network. <a href="http://blogs.windriver.com/engblom/2010/04/what_is_simics_really.html">My first post there is up now</a>, and it is a short (at least compared to a textbook, I admit it looks terribly long for a blog post) overview of how Simics works inside.</p>
<p>I think it is important for users of technologically advanced tools to know a bit of how they work. A classic example of this is compilers, where I taught an ESC class almost a decade ago which is my most <a href="http://jakob.engbloms.se/archives/750">popular piece of writing to date</a>&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1121/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>FFast: Good Idea, Too Bad About the Implementation</title>
		<link>http://jakob.engbloms.se/archives/1114?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1114#comments</comments>
		<pubDate>Sun, 11 Apr 2010 19:23:29 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[.net]]></category>
		<category><![CDATA[Antoine Trouvé]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[CLR]]></category>
		<category><![CDATA[FFast]]></category>
		<category><![CDATA[Kazuaki Murakami]]></category>
		<category><![CDATA[RAPIDO]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1114</guid>
		<description><![CDATA[I just read a short paper by Antoine Trouvé and Kazuaki Murakami from the RAPIDO 2010 workshop on &#8220;rapid simulation and performance evaluation&#8221;. The paper is &#8220;FFast: Efficient Application of Compiled Simulation Techniques To A Fast ISS Over a Virtual Machine&#8221;. It explores the interesting idea of how an existing virtual machine infrastructure can be [...]]]></description>
			<content:encoded><![CDATA[<p>I just read a short paper by Antoine Trouvé and Kazuaki Murakami from the <a href="http://www2.lifl.fr/rapido/Rapido/Program.html">RAPIDO 2010</a> workshop on &#8220;rapid simulation and performance evaluation&#8221;. The paper is &#8220;FFast: Efficient Application of Compiled Simulation Techniques To A Fast ISS Over a Virtual Machine&#8221;. It explores the interesting idea of how an existing virtual machine infrastructure can be used to build a fast instruction-set simulator, and in the extension, a full system simulator.</p>
<p>To me, this idea is worth exploring, since using a mature VM like the .net CLR (used in this paper) or a JVM would offer a shortcut to get high-quality code generation for a JIT compiler. It could also offer other benefits, as these environments support many advanced configuration and management features. I have touched on this topic before, in the posts &#8220;<a href="http://jakob.engbloms.se/archives/1008">Dream ESL Language</a>&#8221; (VM as the basis for a simulator) and &#8220;<a href="http://jakob.engbloms.se/archives/264">The JVM as Universal Parallel Glue</a>&#8221; (that a common VM can  offer huge benefits for an ecosystem).</p>
<p><span id="more-1114"></span></p>
<p>In the paper, the authors show how they have built an ISS for MIPS which runs at 1 MIPS in basic interpretive mode, but at up to 225 MIPS in the most optimized mode. Decent performance on a 2.6 GHz Core 2, but still an order of magnitude slower than the fastest commercial offerings available. However, I think these numbers are not particularly interesting or relevant.</p>
<p>First of all, they only check the performance on basic user-level programs with no I/O, since there is nothing but the CPU present and they thus cannot run an operating system. This makes the numbers essentially &#8220;peak&#8221; numbers, for small programs, which is not particularly realistic. Second, their implementation does not go straight to bytecodes, but rather to C# code. This is not how a high-performance solution would work, as it is obvious that they struggle to get performance even on these small benchmarks using that approach. Too much effort seems to be spent on gaming the C# runtime system and compiler, in my opinion.</p>
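<p>To make the difference between interpretive and compiled simulation on top of a VM concrete, here is a minimal sketch. It uses Python as the host VM purely for illustration; the toy instruction set, the tuple layout, and the function names are my own assumptions and have nothing to do with the actual FFast implementation or with MIPS. The translator emits host-language source for a straight-line block once and lets the host compile it, which is roughly the "go via source code" route the paper takes with C#.</p>

```python
# Toy three-operand ISA; the opcodes and tuple layout are made up for this
# sketch and are not MIPS or anything taken from the FFast paper.
PROGRAM = [
    ("addi", 1, 0, 5),   # r1 = r0 + 5
    ("addi", 2, 0, 7),   # r2 = r0 + 7
    ("add", 3, 1, 2),    # r3 = r1 + r2
    ("add", 3, 3, 3),    # r3 = r3 + r3
]


def interpret(program, regs):
    """Interpretive mode: decode and dispatch every instruction on every run."""
    for op, rd, rs, x in program:
        if op == "addi":
            regs[rd] = regs[rs] + x
        elif op == "add":
            regs[rd] = regs[rs] + regs[x]
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs


def translate(program):
    """Compiled mode: emit host source once and let the host VM compile it."""
    lines = ["def block(regs):"]
    for op, rd, rs, x in program:
        if op == "addi":
            lines.append(f"    regs[{rd}] = regs[{rs}] + {x}")
        elif op == "add":
            lines.append(f"    regs[{rd}] = regs[{rs}] + regs[{x}]")
        else:
            raise ValueError(f"unknown opcode {op}")
    lines.append("    return regs")
    namespace = {}
    exec(compile("\n".join(lines), "translated_block", "exec"), namespace)
    return namespace["block"]


block = translate(PROGRAM)           # decoding cost is paid once, here
interpreted = interpret(PROGRAM, [0] * 4)
compiled = block([0] * 4)            # later runs skip decode and dispatch
print(interpreted, compiled)
```

<p>Both paths leave the same register contents; the translated block just does it without looking at opcodes again. A real simulator would also have to handle branches, memory, and invalidation of translated code, none of which this sketch attempts.</p>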
<p>Thus, the paper does not really reveal anything useful in terms of &#8220;is building a JIT ISS using a VM a viable idea&#8221;? It probably tells us that it is not necessarily a broken idea, but there is a lot of work to bring the solution up to the level of native C-based solutions.</p>
<p>It is clear that the implementation effort in this case is lower, and that the porting cost to new hosts is also very low, compared to the native C-based approaches used in current industrial solutions. C# should also be more productive than using something like C++ for building a software system.</p>
<p>The most interesting aspect of the idea, and one which the authors do not explore at all, is using the power of the .net CLR to build a dynamic full-system simulator. Using the CLR, it should be trivial to build a solution where hardware models can be separately compiled and loaded dynamically at runtime. Using .net &#8220;properties&#8221;, it might be possible to support user inspection of a running system. Maybe the .net programming tools offer some really interesting possibilities for the debugging of full-system simulators. However, none of this is currently explored, which is a real shame. I guess I could hope that the authors read this short critique  and get some more ideas for future work, as I really think that virtual platforms could be built on top of virtual machines.  That idea is worth exploring in research.</p>
<p>On the nitpicking side, as always when reviewing academic papers in this blog, the authors seem to be unaware of some very relevant previous work. In particular, they should have mentioned Qemu and Simics. They could have used Qemu for MIPS as a point of comparison to compare speeds between their approach and a native C approach. As it is right now, the reference list looks like a fairly random walk around the DAC and DATE communities, but with little insight into the actual virtual platform or full-system simulation tools available today.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1114/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Describe is not the same as Design</title>
		<link>http://jakob.engbloms.se/archives/1083?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1083#comments</comments>
		<pubDate>Mon, 15 Feb 2010 20:56:41 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[DML]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1083</guid>
		<description><![CDATA[The discussion on my previous blog post about &#8220;the ideal ESL language&#8221; made me think some more about the purpose of a hardware modeling or description language. If you look closely, you realize that there are two quite different goals being pursued by the tools and languages discussed there. On one hand, we have the [...]]]></description>
			<content:encoded><![CDATA[<p>The discussion on my previous blog post about &#8220;<a href="http://jakob.engbloms.se/archives/1008">the ideal ESL language</a>&#8221; made me think some more about the purpose of a hardware modeling or description language. If you look closely, you realize that there are two quite different goals being pursued by the tools and languages discussed there.</p>
<p>On one hand, we have the task of supporting the design of new hardware bits, for the purpose of creating it. On the other hand, we have the task of describing a particular design for the purpose of simulating it. These two are not necessarily the same.</p>
<p><span id="more-1083"></span>To use an <a href="http://jakob.engbloms.se/archives/1035">analogy with building a house</a>, a design language helps the architect create the house (piece of hardware). Since the architect relies on craftsmen and experts (compilers) to do detailed design (how to put in windows, where to put light switches, etc.), the high-level description does not contain all the details of the house. However, if you are trying to simulate the house (piece of hardware) so that its inhabitants (software) don&#8217;t see the difference to the real thing, the details are sometimes what matters most. For example, the precise way to operate the stove in the house is very important for familiarity, but is a detail most likely left out of the architect&#8217;s initial drawings.</p>
<p>A design language can leave many things unspecified to be filled in by a compiler, but these things can be absolutely core to a description language. In particular, programming register maps tend to be created as a not-too-important side activity in hardware design. They do not really need to be visible in higher-level ESL languages, as they can obviously be filled in later by a tool or a human. But for a description language, they are absolutely core.</p>
<p>A description language can also leave out many parts of the hardware. If the software being used or written does not use certain modes or functions of a piece of hardware, those pieces can be ignored and implemented as dummies. That means that support for dummies is very important in description languages. But dummies make little sense in a design language, as you are unlikely to design a chip with lots of area spent on dummy functions that do nothing.</p>
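<p>As a rough illustration of what "support for dummies" could look like, here is a small Python sketch. It is not DML or any real modeling language; the class names, the register names, and the offsets are invented for the example. The point is only that a dummy register should be cheap to declare and should still leave a trace when software touches it unexpectedly.</p>

```python
class Register:
    """A modeled register with real behavior: it stores what is written."""

    def __init__(self, name, reset=0):
        self.name = name
        self.value = reset

    def read(self):
        return self.value

    def write(self, value):
        self.value = value & 0xFFFFFFFF   # registers are 32 bits in this sketch


class DummyRegister(Register):
    """A register our software does not really use: reads return the reset
    value, writes are ignored but recorded so that surprises are visible."""

    def __init__(self, name, reset=0):
        super().__init__(name, reset)
        self.ignored_writes = []

    def write(self, value):
        self.ignored_writes.append(value)


class RegisterBank:
    """Maps offsets to registers: the programming register map of a device."""

    def __init__(self, registers):
        self.registers = dict(registers)

    def read(self, offset):
        return self.registers[offset].read()

    def write(self, offset, value):
        self.registers[offset].write(value)


# Hypothetical device: only the data register is modeled for real.
uart = RegisterBank({
    0x00: Register("data"),
    0x04: DummyRegister("dma_control"),
    0x08: DummyRegister("power_mode", reset=1),
})
uart.write(0x00, 0x41)
uart.write(0x04, 0xFF)    # accepted, but has no effect
print(hex(uart.read(0x00)), uart.read(0x04), uart.read(0x08))
```

<p>The recorded writes in <code>ignored_writes</code> are what tells you that a dummy needs to be promoted to a real model once the software starts depending on it.</p>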
<p>A description language can also ignore crucial aspects like power constraints and synthesis constraints. These are guidelines for a compilation step that has no bearing on the description of the hardware &#8212; the description language should describe what ended up happening, not the if, please, what, and buts that guided how we got there.</p>
<p>For virtual platform creation, you seem to need a bit of both. I maintain that most of a VP is based on old hardware that exists, which calls for languages with strong description abilities. That&#8217;s the space that <a href="http://jakob.engbloms.se/archives/99">Simics DML </a>was designed for. For the small part of the hardware that is novel, it would be nice to have some way to convert from a design language to a virtual platform. Here, I don&#8217;t really see any usable current tools or languages &#8212; SystemC is really more a design language, but if you want a virtual platform model, you have to use it as a description language. There is no automagic getting to a fast abstract model from a design-oriented description. That&#8217;s why we need new, higher-level systems that can push out decent descriptions from a design.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1083/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The Dream ESL Language</title>
		<link>http://jakob.engbloms.se/archives/1008?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1008#comments</comments>
		<pubDate>Fri, 27 Nov 2009 19:51:27 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[FDL]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1008</guid>
		<description><![CDATA[This post is a belated comment on the FDL 2009 conference that I attended some months ago. I have had some things in mind for a while, but some recent podcast listening has brought the issues to front again. What has been striking is the extent to which FDL was about languages only to a [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/08/fdllogosmall.jpg"><img class="alignleft size-full wp-image-881" style="margin: 5px 10px;" title="fdllogosmall" src="http://jakob.engbloms.se/wp-content/uploads/2009/08/fdllogosmall.jpg" alt="fdllogosmall" width="80" height="79" /></a> This post is a belated comment on the FDL 2009 conference that I attended some months ago. I have had some things in mind for a while, but some recent podcast listening has brought the issues to front again. What has been striking is the extent to which FDL was about <em>languages </em>only to a very small degree. Compared to programming-language conferences like PLDI, there was precious little innovation going on in input languages, and very little concern for the <em>programming </em>aspects of virtual platform design and hardware modeling.</p>
<p><span id="more-1008"></span>Walking to and from the conference from my hotel, I listened through a <a href="http://www.twit.tv/floss79">FLOSS Weekly interview </a>with David Heinemeier Hansson, the creator of Ruby on Rails. His approach to programming and languages was quite unlike that espoused at FDL. In his world, anything that is repeated in code should be put into the language or library. In Ruby, that is easier than in many other cases, as the language can be extended arbitrarily without recompiling the VM. His focus on programmer productivity and convenience is in stark contrast to the FDL discussions, which mostly dealt with how to simulate things in a single language, SystemC. Quite boring from a programming language perspective.</p>
<p>Another podcast that triggered thoughts on programming and how to improve it using languages was <a href="http://itc.conversationsnetwork.org/shows/detail4291.html">Stackoverflow Episode 73. </a>In the listener questions section, the topic of language evolution came up. Joel and Jeff pointed out that C# is a glowing example of a language that quickly evolves and adds useful features, including things from the field of dynamic languages. Quite interesting. They made the crucial point that backwards compatibility in a language is not really needed, as long as you can link code compiled from the old and the new languages together. So, if C# 3.0 won&#8217;t compile all C# 2.0 code, it is no big deal, as you can still have the old C# 2.0 compiler around, and then link with the new C# 3.0 code.</p>
<p>The key is linkability between modules, not the standard of the input language. Here, Microsoft&#8217;s .net system is starting to make a very impressive showing, I think. C#, VB, F#, Python, Ruby &#8212; a ton of languages all share the same common language runtime and the basic libraries of .net. After hearing a talk by Tim Harris of Microsoft UK at <a href="http://www.it.uu.se/research/upmarc/MCC09/prog">MCC 2009</a>, I am even more impressed by what .net can do.</p>
<p>.net was also the topic of the <a href="http://www.twit.tv/floss82">FLOSS Weekly interview with the team behind IronPython</a>. IronPython is Python on top of the .net framework, and the interview went into a lot of interesting details on how that has played out. The short answer is: very impressive, very smart, and very much the way things should be.</p>
<p>Note that even if the perspective is that &#8220;ESL languages describe a single hardware chip configuration, which is fixed&#8221;, having a language which is more dynamic still helps. Remember that modeling is programming, and anything that makes programming more efficient is good. All you need to do is to have a &#8220;freeze&#8221; operation that says that &#8220;this particular set of things is my design&#8221;. But you might get there by interactively adding and removing things at a command-line interface.</p>
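<p>A minimal sketch of such a freeze operation, in Python, might look as follows. The <code>Design</code> class, its methods, and the component names are all hypothetical and only meant to show the shape of the idea: edit freely while exploring, then declare the result to be the design.</p>

```python
class Design:
    """A model configuration that can be edited interactively until frozen."""

    def __init__(self):
        self._components = {}
        self._frozen = False

    def add(self, name, kind):
        self._check_mutable()
        self._components[name] = kind

    def remove(self, name):
        self._check_mutable()
        del self._components[name]

    def freeze(self):
        """Declare that the current set of components is the design."""
        self._frozen = True
        return tuple(sorted(self._components.items()))

    def _check_mutable(self):
        if self._frozen:
            raise RuntimeError("design is frozen")


design = Design()
design.add("cpu0", "mips32")
design.add("uart0", "ns16550")
design.add("eth0", "scratch-nic")
design.remove("eth0")            # changed our mind while experimenting
snapshot = design.freeze()       # from here on, the configuration is fixed
print(snapshot)
```

<p>After <code>freeze()</code>, further calls to <code>add</code> or <code>remove</code> raise an error, and the returned snapshot is the fixed description a downstream tool could consume.</p>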
<p>Working in the OSCI CCI WG, I have come to realize just how useful reflection in languages like Python is (or as we implement it in Simics). When all you have is a static C++ compiled binary, you cannot easily do things like ask objects for their type and other metadata like documentation, since it just is not there. In Python, you can do such inspection, and also extend things at run-time, which is very useful. If you want to add configuration hooks to a class, Python makes it dead easy, while C++ makes it a major pain.</p>
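<p>To show what I mean, here is a small Python sketch using only the standard library. The <code>Timer</code> class and the helper names <code>describe</code> and <code>add_config_hook</code> are made up for the example; this is not how Simics or any CCI proposal does it, just the plain-Python version of asking an object about itself and bolting a configuration hook onto a class at run-time.</p>

```python
import inspect


class Timer:
    """A periodic timer device (hypothetical model used only for this sketch)."""

    def __init__(self):
        self.period = 100
        self.enabled = False


def describe(obj):
    """Ask a live object for its type, documentation and attributes."""
    return {
        "type": type(obj).__name__,
        "doc": inspect.getdoc(type(obj)),
        "attributes": dict(vars(obj)),
    }


def add_config_hook(cls, attribute, hook):
    """Extend a class at run-time so that setting `attribute` calls `hook`."""
    private = "_" + attribute

    def getter(self):
        return getattr(self, private)

    def setter(self, value):
        hook(self, attribute, value)
        setattr(self, private, value)

    setattr(cls, attribute, property(getter, setter))


timer = Timer()
info = describe(timer)
print(info)

changes = []
add_config_hook(Timer, "period",
                lambda obj, name, value: changes.append((name, value)))
hooked = Timer()          # __init__ now goes through the hook as well
hooked.period = 250
print(changes)
```

<p>Note that the hook is installed on the class after it was written, without touching its source. Objects created before the hook was added would need their stored value migrated, which the sketch sidesteps by creating a fresh object.</p>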
<h2>The Dream ESL Language</h2>
<p>Overall, what I get from all of this is that a sound design for an &#8220;ESL&#8221; language, had we started today, would be:</p>
<ul>
<li>Basic semantics given by a virtual machine, not an input language.</li>
<li>Opportunities for several different input languages of potentially very different styles to be used, all compiling and linking into the same VM. That would open up for real innovation.</li>
<li>Extensive reflection and introspection features.</li>
<li>Dynamic reconfiguration during run-time, optionally frozen if the goal is to actually describe some hardware design for synthesis. But such synthesis would be from VM code, not some input language.</li>
</ul>
<p>Essentially, taking the approach of providing a stable interoperability layer between languages in the form of a VM, and allowing languages to be anything anyone could care to invent.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1008/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Another Layer of Virtual Indirection</title>
		<link>http://jakob.engbloms.se/archives/893?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/893#comments</comments>
		<pubDate>Sun, 23 Aug 2009 19:41:06 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[ethernet]]></category>
		<category><![CDATA[indirection]]></category>
		<category><![CDATA[networking]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=893</guid>
		<description><![CDATA[After a long break, this is another blog post in the series of &#8220;how to do modeling for virtual platforms&#8221;. The previous installments dealt with checkpointing and determinism. This post is about the use of indirection in a model to increase its flexibility and ease of use, at the cost of a bit more work [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-486" title="gears-modeling" src="http://jakob.engbloms.se/wp-content/uploads/2008/12/gears-modeling.png" alt="gears-modeling" width="62" height="65" />After a long break, this is another blog post in the series of &#8220;how to do modeling for virtual platforms&#8221;. The previous installments dealt with <a href="http://jakob.engbloms.se/archives/714">checkpointing </a>and <a href="http://jakob.engbloms.se/archives/734">determinism</a>.</p>
<p>This post is about the use of <strong>indirection </strong>in a model to increase its flexibility and ease of use, at the cost of a bit more work for the first model to be created. In particular, indirection in the sense of having explicit objects in a simulation to represent things like networks and cables connecting virtual machines.</p>
<p><span id="more-893"></span>There is a well-known saying (by <a href="http://en.wikipedia.org/wiki/David_Wheeler_%28computer_scientist%29">David Wheeler</a>) that &#8220;any problem in computer science can be solved with another layer of indirection&#8221;. Among computer architects, this is often used with the addition &#8220;&#8230;or a cache&#8221;. I think this is true, most of the time. The times that adding some indirection to an architecture for a program has simplified it &#8212; or made it feasible at all &#8212; are too many to count. It is at the very core of object-oriented programming, and the number of times you end up passing around function pointers is innumerable.</p>
<p>In the world of virtual platforms, there is one particular area where I see a pretty useful layer of indirection missing. Networks. Many virtual platform solutions offer various ways of connecting a virtual platform to a physical world, for interfaces like USB, Ethernet, or serial. Most virtual platforms achieve this by making the virtual hardware directly connect to the outside world.</p>
<p>Here is an illustration for Ethernet, where I have included a PHY in the picture. Quite often, you don&#8217;t even get that, just an Ethernet device that includes its PHY and connects out to a physical network. That&#8217;s what Qemu tends to do, for example.</p>
<p><img class="aligncenter size-full wp-image-895" title="No indirection" src="http://jakob.engbloms.se/wp-content/uploads/2009/08/No-indirection.png" alt="No indirection" width="248" height="311" />This approach is the simplest when your thinking is that you will model a single device, and simulate one virtual machine at a time, connecting to the physical network to receive stimulus. For USB, this means the useful feature of connecting a camera or USB disk on your PC to the virtual machine. And as a bonus, you can connect multiple machines together using some form of cross-connection on the PC (such as a TAP network interface).</p>
<p>However, there is a much better structure that is employed in some simulators. It is based on making each network an explicit object in the simulation, and having all virtual devices talk to the virtual network. Connections to the physical world are then handled by the virtual network, or, even better, by another device attached to the same virtual network. What you also get is the ability to connect multiple virtual devices to each other over the virtual network, and to easily write simulation modules that inspect or do fault-injection on the network traffic.</p>
<p>The picture below illustrates the idea for Ethernet:</p>
<p><img class="aligncenter size-full wp-image-896" title="indirection" src="http://jakob.engbloms.se/wp-content/uploads/2009/08/indirection.png" alt="indirection" width="369" height="356" />The cost of this architecture is that you have to create the virtual network object, and invent the interface between devices and the network. This increases the cost for the first network device you create, and if all  you are tasked with is that single device, I can see why some simulation designers took the direct route. However, if you think about the task of creating tens of devices connecting to the same type of network, the &#8220;cost&#8221; of creating a virtual network is actually negative. Using an indirect approach like this makes creating each device simpler, and each device immediately gets the benefit of all the services that have been added to the virtual network. As long as a device can connect to the virtual network, it can connect to the physical network without any extra coding or cost.</p>
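<p>The structure can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the class names, the broadcast-to-everyone delivery, and the idea of "taps" as plain callbacks. It is not the interface of Simics or any other simulator. The bridge to the physical world is just another endpoint, and inspection and fault injection are added to the network once, not to each device.</p>

```python
class VirtualNetwork:
    """An explicit network object: every frame between devices passes here."""

    def __init__(self):
        self.endpoints = []
        self.taps = []          # inspection or fault-injection callbacks

    def attach(self, endpoint):
        self.endpoints.append(endpoint)
        endpoint.network = self

    def add_tap(self, tap):
        self.taps.append(tap)

    def send(self, sender, frame):
        for tap in self.taps:
            frame = tap(frame)
            if frame is None:   # a tap may drop the frame
                return
        for endpoint in self.endpoints:
            if endpoint is not sender:
                endpoint.receive(frame)


class EthernetDevice:
    """A virtual NIC that only knows about the virtual network."""

    def __init__(self, name):
        self.name = name
        self.network = None
        self.received = []

    def transmit(self, payload):
        self.network.send(self, (self.name, payload))

    def receive(self, frame):
        self.received.append(frame)


class RealNetworkBridge(EthernetDevice):
    """Stand-in for the device that forwards frames to the physical world.
    Here it only records them; a real one would talk to a host interface."""


log = []


def record(frame):
    """Inspection tap: remember every frame and pass it on unchanged."""
    log.append(frame)
    return frame


def drop_corrupt(frame):
    """Fault-injection tap: silently lose frames marked as corrupt."""
    return None if frame[1] == "corrupt" else frame


net = VirtualNetwork()
a, b, bridge = EthernetDevice("a"), EthernetDevice("b"), RealNetworkBridge("host")
for endpoint in (a, b, bridge):
    net.attach(endpoint)
net.add_tap(record)
net.add_tap(drop_corrupt)

a.transmit("hello")
a.transmit("corrupt")
print(b.received, bridge.received, log)
```

<p>Neither <code>EthernetDevice</code> nor the bridge contains any logging or fault-injection code, yet both benefit from it. That is the "negative cost" of the indirection once there is more than one device.</p>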
<p>Encapsulating entire networks with multiple virtual machines within a single simulation session <a href="http://www.virtutech.com/whitepapers/networking.html">is also very beneficial for control, inspection, and determinism. </a>Relying on a physical connection between virtual machines makes all packets pass the unreliable and random real world on their way between machines, destroying any determinism or control you might have hoped to achieve.</p>
<p>In the world of SystemC simulation, an indirect approach like this is also a way to overcome some silly language limitations. Unbelievable as it might sound to the uninitiated, in SystemC you set up a simulation once into a single static setup (in something called the elaboration phase), and then that is what you simulate. There is no option to set up connections between modules or even add new modules to the simulation after the initial setup. Here, you can use a layer of indirection as a work-around. At the start of simulation, connect all devices that might at some point in time be connected to a particular network to that network. During simulation, configure and reconfigure the network module to only allow traffic from and to certain modules, essentially creating a useful illusion that they are connected and disconnected from the network.</p>
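<p>The work-around is easier to see in code. The sketch below is in Python, not SystemC, and only mimics the constraint: the set of ports is fixed when the network object is created, and afterwards only an "is the cable plugged in" flag changes. The class and method names are invented for the example.</p>

```python
class Port:
    """One statically wired connection point with a receive queue."""

    def __init__(self, name):
        self.name = name
        self.inbox = []


class SwitchableNetwork:
    """All ports are wired up once; links are enabled and disabled later."""

    def __init__(self, ports):
        # Fixed at construction, mimicking a setup that cannot change later.
        self._ports = {port.name: port for port in ports}
        self._active = set()

    def plug_in(self, name):
        if name not in self._ports:
            raise KeyError(f"{name} was not wired up at setup time")
        self._active.add(name)

    def unplug(self, name):
        self._active.discard(name)

    def send(self, source, payload):
        if source not in self._active:
            return                      # an unplugged cable carries nothing
        for name in sorted(self._active - {source}):
            self._ports[name].inbox.append((source, payload))


ports = [Port("board0"), Port("board1"), Port("board2")]
net = SwitchableNetwork(ports)          # the one-time static setup
net.plug_in("board0")
net.plug_in("board1")
net.send("board0", "ping")              # board2 is wired but not plugged in
net.plug_in("board2")
net.unplug("board1")
net.send("board0", "ping again")
inboxes = {port.name: port.inbox for port in ports}
print(inboxes)
```

<p>From the point of view of the software on each board, cables appear and disappear during the run, even though the underlying wiring never changes.</p>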
<p>I hope I have convinced you: if you ever build a virtual platform, make sure to make all connections indirect.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/893/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Toast to Abstraction Layers</title>
		<link>http://jakob.engbloms.se/archives/888?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/888#comments</comments>
		<pubDate>Thu, 13 Aug 2009 19:41:47 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[gadgets]]></category>
		<category><![CDATA[general research]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[abstraction]]></category>
		<category><![CDATA[abstraction levels]]></category>
		<category><![CDATA[DAC 2009]]></category>
		<category><![CDATA[information hiding]]></category>
		<category><![CDATA[TheToasterProject]]></category>
		<category><![CDATA[Thomas Thwaites]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=888</guid>
		<description><![CDATA[I just found &#8220;The Toaster Project&#8220;, a Royal College of Art project where Thomas Thwaites built a simple toaster from scratch. Really from scratch, going all the way back to iron ore and raw petroleum. In the process, he had to smelt ore, create plastic from petroleum, etc. It is a very interesting observation about [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-890" style="margin: 10px 5px;" title="toaster" src="http://jakob.engbloms.se/wp-content/uploads/2009/08/toaster.png" alt="toaster" width="81" height="87" />I just found &#8220;<a href="http://www.thomasthwaites.com/thomas/toaster/page2.htm">The Toaster Project</a>&#8220;, a Royal College of Art project where <a href="http://www.thomasthwaites.com/">Thomas Thwaites </a>built a simple toaster from scratch. Really from scratch, going all the way back to iron ore and raw petroleum. In the process, he had to smelt ore, create plastic from petroleum, etc. It is a very interesting observation about the immense industrial complexity behind the very simple everyday items of our lives. I also think it has something to tell us computer scientists about abstraction.</p>
<p><span id="more-888"></span>What Thomas is showing is just how efficiently today&#8217;s economy manages to hide complexity from consumers (users). That toaster is just there on the shelf at a very low cost. If you take it apart, you will note that it is made from plastic which has been moulded to shape, and various bits and pieces of steel and copper wires. At that level, you feel that you could almost build it yourself. However, that is just the tip of the iceberg. What the toaster project reveals is the next level of abstraction and information hiding going on: that copper wire contains an enormously complex process in its making. From ore extraction, energy production to fuel the process, copper foundries, factories converting raw copper into wires, and a huge logistical machine to move things around.</p>
<p>In essence, we have a very nice example of information hiding and abstraction. As a user of the toaster, I do not need to understand how it works, and I do definitely not have any idea of the huge chain of suppliers leading up to its presence on my breakfast table.</p>
<p>That&#8217;s where we are going with computers, but it is going to take time. Today, most users are fairly well shielded from how computers really work. Until they break down, at least. As programmers, we are less lucky. In practice, most good programmers end up understanding at least the basics of assembly language and the memory hierarchy of the machine.</p>
<p>What is hidden today is mostly the innards of the silicon. I have no real idea of how a processor works at the level of transistors and electrons. I don&#8217;t have to care about that, while any computer user fifty years ago probably had a decent understanding of the electronics. If nothing else, that was how you investigated hardware faults and actually built computers in a factory. Before integrated circuits, the electronic bits were much more exposed.</p>
<p>I think the current trend towards virtualization in the IT space and virtual platforms in the system design space is showing that the abstraction stack we are using in computing is getting deeper and more opaque. It takes some getting used to, but in the end, we have to realize that most computer programmers will be like the toaster user. All they want is a virtual toaster that toasts virtual bread in a way that lets them do their job. That is: write software that really does not care that much about the particulars of the hardware it is running on.</p>
<p>For the designer of a toaster (or even worse, the manufacturer of copper wire or the oil producer for the raw materials for the plastics), this takes some getting used to. We have to accept that in many cases, a simple abstraction is sufficient to help programmers get moving. There is no need for perfect timing accuracy or all the details of bus transactions. As long as what comes out is sufficiently similar to toast (a virtual toaster spitting out candy would be a bad abstraction), most users are happy.</p>
<p>Brian Bailey touched on this in a blog post following DAC, called &#8220;<a href="http://www.chipdesignmag.com/bailey/2009/07/30/accuracy-does-not-imply-accuracy/">Accuracy does not imply accuracy</a>&#8220;. Same idea as the toaster: you have to accept less detail, more abstraction, to get somewhere useful. Not everyone needs to go back to basics&#8230; and doing so tends to be counter productive in the end.</p>
<p>It is late now, but I think I will have toast and jam for breakfast tomorrow. Writing this got me hungry.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/888/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Can we Rely on C?</title>
		<link>http://jakob.engbloms.se/archives/885?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/885#comments</comments>
		<pubDate>Mon, 10 Aug 2009 07:49:04 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[Michael Barr]]></category>
		<category><![CDATA[rant]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=885</guid>
		<description><![CDATA[I have written several times on this blog about the odd propensity of the &#8220;EDA&#8221; business to consider the C and C++ languages &#8220;high level&#8221; languages. They are what I use almost daily for most of the demo-order programming I do, but I still don&#8217;t consider them very high-level. High-level for me is scripting (Python, [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-166" style="margin: 5px 10px;" title="whyc" src="http://jakob.engbloms.se/wp-content/uploads/2008/07/whyc.png" alt="whyc" width="100" height="106" />I have written several times on this blog about the odd propensity of the &#8220;EDA&#8221; business to consider the C and C++ languages &#8220;high level&#8221; languages. They are what I use almost daily for most of the demo-order programming I do, but I still don&#8217;t consider them very high-level. High-level for me is scripting (Python, Lua, &#8230;) or domain-specific languages (DML, Lex, Yacc, MatLab, &#8230;) or model-driven development (UML, LabView, Simulink, &#8230;) or languages which at least provide sensible and reasonably safe semantics (Erlang, Java, &#8230;).</p>
<p>However, in fact, most of the embedded industry and the &#8220;virtual platform&#8221; industry rely on C and C++ to get our daily jobs done. Question is, how much longer can we expect to do that? An interesting post at Embedded.com by Michael Barr brought back my argument that modeling needs to move up in levels of abstraction just like mainstream programming.</p>
<p><span id="more-885"></span></p>
<p>Michael Barr wrote the column &#8220;<a href="http://www.eetimes.eu/semi/218900394">Real Programmers Program in C</a>&#8220;, where he points out that knowledge of C is declining among computer science graduates. It is simply not efficient enough for simple mainstream work like creating web services and custom IT applications.</p>
<p style="padding-left: 30px;">Clever though he is, the young man admitted he wasn&#8217;t making that quote up on the spot. That &#8220;real men program in C&#8221; is part of a lingo he and his fellow computer science students developed while categorizing the usefulness of the various programming languages available to them. Exploring a bit, I learned the quiche-like phrase assigns both a high difficulty factor to the C language and a certain age group to C programmers. Put simply, C was too hard for programmers of their generation to bother mastering.</p>
<p>Obviously, if you take this argument to the extreme, you end up with the Monty Python sketch where a bunch of old men are trying to trump each other with the tough childhoods they had. In the end, they claim to have eaten just a handful of cold gravel for breakfast, walked 50 km to school, and had to clean the road each day&#8230; and kids these days, they just don&#8217;t understand&#8230;</p>
<p>But apart from the fact that kids in the western world today are very lazy and can&#8217;t stomach running 15 km to school each day, and therefore lack the toughness to match Kenyans in marathons, there is a real issue here.</p>
<p style="padding-left: 30px;">The bottom line is that embedded programmers aren&#8217;t going to stop using C anytime soon. There are several reasons for this. First, C compilers are available for the vast majority of 8-, 16-, and 32-bit CPUs. Second, C offers just the right mix of low-level and high-level language features for programming at the processor and driver level. Until the use of C starts to turn down in future such surveys, C programming skills will remain important.</p>
<p>The issue is that universities are moving up in the efficiency scale of languages, teaching students good things rather than hard things. Not all universities do (and I am trying my best to lobby for keeping assembly language and device driver programming in the core computer science curriculum whenever I can), but it is clear that the market for &#8220;general IT stuff&#8221; is so much bigger that it will attract more students to &#8220;easy&#8221; languages like Ruby and VisualBasic.</p>
<p>So we need to move both embedded programming and virtual platform technology much more in this direction to maintain  a steady influx of smart people into the field. High-level synthesis of hardware and virtual platform models from a VisualBasic form? Sounds like a stretch&#8230;</p>
<p>We also need to jump into the education system and create the courses and motivate professors to teach lower-level languages. Not all are that familiar with actual practices in industry, unfortunately.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/885/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Checkpointing in SystemC @ FDL</title>
		<link>http://jakob.engbloms.se/archives/880?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/880#comments</comments>
		<pubDate>Sat, 08 Aug 2009 19:48:26 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[ESL]]></category>
		<category><![CDATA[appearances]]></category>
		<category><![CDATA[articles]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Checkpointing]]></category>
		<category><![CDATA[FDL]]></category>
		<category><![CDATA[GreenSocs]]></category>
		<category><![CDATA[Marius Monton]]></category>
		<category><![CDATA[Mark Burton]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[SystemC]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=880</guid>
		<description><![CDATA[Along with Marius Monton and Mark Burton of GreenSocs, I will be presenting a paper on checkpointing and SystemC at the FDL, Forum on Specification and Design Languages, in late September 2009. The paper will explain how we did Simics-style checkpointing in SystemC, using the GreenSocs GreenConfig mechanisms to obtain an approximation for the Simics [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-881" style="margin: 5px;" title="fdllogosmall" src="http://jakob.engbloms.se/wp-content/uploads/2009/08/fdllogosmall.jpg" alt="fdllogosmall" width="80" height="79" />Along with Marius Monton and Mark Burton of <a href="http://www.greensocs.com">GreenSocs</a>, I will be presenting a paper on <a href="http://jakob.engbloms.se/archives/714">checkpointing </a>and <a href="http://www.systemc.org">SystemC </a>at the FDL, <a href="http://www.ecsi-association.org/ecsi/fdl/fdl09/mainpage.asp?fn=advance">Forum on Specification and Design Languages</a>, in late September 2009.</p>
<p>The paper will explain how we did <a href="http://www.virtutech.com/whitepapers/simics_checkpointing.html">Simics-style checkpointing </a>in SystemC, using the GreenSocs GreenConfig mechanisms to obtain an approximation for the Simics attribute system.</p>
<p><span id="more-880"></span>It is an approach that does not have the limitations of the &#8220;save the entire simulation process&#8221; method employed by Cadence (and I think also CoWare) in their <a href="http://jakob.engbloms.se/archives/817">SystemC checkpointing solution</a>. It does require you to mark all relevant state in your models, but the benefit from doing so is that regardless of how you change the code of a model, you can still use the same old checkpoints. It is also portable across hosts. We did have to do some patching to the OSCI SystemC kernel to draw out and reset all relevant state from the kernel. The OSCI kernel does not provide sufficient interfaces to checkpoint its state in its vanilla form.</p>
<p>The conference takes place on September 22 to 24, in Sophia Antipolis in France. Now all I have to do is figure out how to get there in the most convenient way. I expect this to be as much fun as the other EDA conferences I have been to recently (I seem to only go to such events nowadays, nothing left on the old embedded circuit for me it seems).</p>
<p>By the way, the FDL logo is really pretty. I think all long-running events should spend the time to create a recognizable logo. My old real-time conferences used to just have plain text and the <a href="http://www.ieee.org">IEEE </a>and <a href="http://www.acm.org">ACM </a>logos.</p>
<p><img class="aligncenter size-full wp-image-882" title="fdl_logo_new" src="http://jakob.engbloms.se/wp-content/uploads/2009/08/fdl_logo_new.jpg" alt="fdl_logo_new" width="435" height="159" /></p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/880/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cadence SystemC Checkpointing</title>
		<link>http://jakob.engbloms.se/archives/817?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/817#comments</comments>
		<pubDate>Sat, 13 Jun 2009 20:29:35 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Cadence]]></category>
		<category><![CDATA[Checkpointing]]></category>
		<category><![CDATA[George Frazier]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[SystemC]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=817</guid>
		<description><![CDATA[A while ago I wrote a blog post on checkpointing in virtual platforms, and what it is good for. Checkpointing has been a fairly rare feature in virtual platform tools for some reason, but it seems to be picking up some implementations. In particular, I recently noticed that Cadence added it to their simulator solutions [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-737" style="margin: 5px 10px;" title="gears1" src="http://jakob.engbloms.se/wp-content/uploads/2009/04/gears1.png" alt="gears1" width="56" height="57" />A while ago I wrote a blog post on <a href="http://jakob.engbloms.se/archives/714">checkpointing </a>in virtual platforms, and what it is good for. Checkpointing has been a fairly rare feature in virtual platform tools for some reason, but it seems to be picking up some implementations. In particular, I recently noticed that Cadence added it to their simulator solutions a while ago (2007 according to their blog posts). There are two blog posts by <a href="http://www.cadence.com/community/posts/georgef.aspx">George Frazier </a>of Cadence (&#8220;<a href="http://www.cadence.com/Community/blogs/sd/archive/2009/02/18/how-to-save-os-boot-time-in-your-systemc-virtual-platform-with-save-and-restore.aspx">saving boot time</a>&#8221; and &#8220;<a href="http://www.cadence.com/Community/blogs/sd/archive/2009/03/09/systemc-save-and-restore-part-2-advanced-usage.aspx">advanced usage</a>&#8220;) that offer some insight into what is going on.</p>
<p><span id="more-817"></span>Note that checkpointing is nothing new to RTL-level HDL simulators, since that is a much more controlled environment than a general virtual platform. I think the Cadence blog put it quite well:</p>
<blockquote><p><span id="anormal_12" class="Cadence_CS_BlogDetail_BlogText">Save and restore (or restart) has existed in HDL simulators for years, but things are trickier if SystemC is involved. For one thing, SystemC simulators use external tools for compilation and linking: i.e. gcc. They have more or less a “black box” understanding of global variables, local variables, file descriptors and heap values that make up the simulation state at any point in time. When you throw in multiple threads implemented with application-level threading packages and the fact that C++ heap objects are impractical to save programmatically, it’s easy to see why save and restore tools for HDL simulators can’t be easily extended for SystemC. </span></p></blockquote>
<p><span class="Cadence_CS_BlogDetail_BlogText">I could not say it better myself. What is interesting is that the Cadence solution does solve this problem, in a limited way, for a limited use case. I have not looked at their solution in detail (such as using it myself), but this paragraph indicates that the solution is essentially a complete memory contents dump:</span></p>
<blockquote><p><span id="anormal_12" class="Cadence_CS_BlogDetail_BlogText">During restart, all internal variables inherit the same values from the process as it existed at the time of save (for example, C variables declared static). While this behavior helps assure that SystemC state information is properly saved and restored, it can also leave variables that reference the process environment (like file descriptor and sockets to other processes) in limbo. </span></p></blockquote>
<p><span class="Cadence_CS_BlogDetail_BlogText">Doing it this way is heroic in effort but also quite limited in scope. If I look at the four operations for restoring from a checkpoint that I outlined in my previous blog post on checkpointing:</span></p>
<ul>
<li><span class="Cadence_CS_BlogDetail_BlogText">Restore to same machine, same model</span></li>
<li><span class="Cadence_CS_BlogDetail_BlogText">Restore to different machine, same model</span></li>
<li><span class="Cadence_CS_BlogDetail_BlogText">Restore to same or different machine, updated model</span></li>
<li><span class="Cadence_CS_BlogDetail_BlogText">Restore to same or different machine, completely different model</span></li>
</ul>
<p>It is clear that you can only do the first, as the solution will restore the state of an implementation of a model, not just its relevant state as is done in Simics checkpointing. The sole advantage of this approach is that it does work with arbitrary code. But it does not support any of the more powerful uses of checkpoints beyond simply not repeating work for a single user on a particular machine.</p>
<p>I don&#8217;t think a memory dump can travel even to a second machine of similar make and setup, since it will depend on the precise memory layout of a process that starts. And that is affected by DLL and shared object load order, which is hard to control. The versions of all libraries have to be exactly the same too. It is not even clear that a checkpoint survives the upgrading of the OS on the machine being used, as that will surely change things in terms of precise memory allocation. I would be happy, for the sake of the Cadence users, to be proven wrong, but in principle I think checkpointing done right requires models to be written explicitly to support it. <a href="http://stackoverflow.com/questions/184027/serialization-of-objects-no-thread-state-can-be-involved-right">Just like any serialization solution in any programming language</a>.</p>
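<p>To make the contrast with a memory dump concrete, here is a minimal sketch of a model written explicitly to support checkpointing. It is plain Python, not Simics or SystemC code, and the <code>TimerModel</code> class, its attributes, and the <code>CHECKPOINT_ATTRS</code> convention are all invented for the illustration.</p>

```python
import json


class TimerModel:
    """Hypothetical device model with explicitly declared checkpoint state."""

    # Only these attributes are part of the checkpoint.
    CHECKPOINT_ATTRS = ("count", "reload", "enabled")

    def __init__(self):
        self.count = 0
        self.reload = 100
        self.enabled = False
        self._trace_buffer = []   # implementation detail, never saved

    def tick(self):
        if self.enabled:
            self.count = self.reload if self.count == 0 else self.count - 1
            self._trace_buffer.append(self.count)

    def save(self):
        return {name: getattr(self, name) for name in self.CHECKPOINT_ATTRS}

    def restore(self, state):
        for name in self.CHECKPOINT_ATTRS:
            setattr(self, name, state[name])


timer = TimerModel()
timer.enabled = True
for _ in range(5):
    timer.tick()

# The checkpoint is plain text, independent of host memory layout.
checkpoint = json.dumps(timer.save(), sort_keys=True)

restored = TimerModel()
restored.restore(json.loads(checkpoint))
```

<p>Nothing in the saved text depends on pointers, library versions, or load order, so in this toy setting the checkpoint can move between hosts and survive a rewrite of the model internals as long as the declared attributes keep their meaning.</p>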
<p>I must admit that George Frazier does mention using checkpoints &#8220;weeks or even months&#8221; after initial save, but there is no mention of changing the code of the model in that time frame. For me, checkpoints tend to live for years: I have some nice Simics demo checkpoints that have been with us for some five years at this point in time, surviving from Simics 2.0 to 2.2 to 3.0 to 3.2 to 4.0 to 4.2&#8230; thanks to the power of the Simics &#8220;save only explicitly defined state&#8221; principle, and checkpoint updater functions that essentially rewrite old checkpoints to make them compatible with new and updated machine models.</p>
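<p>The updater idea is easy to illustrate. The sketch below is not the Simics mechanism, only a generic Python rendering of the principle; the version numbers, field names, and the <code>UPDATERS</code> table are assumptions made up for the example.</p>

```python
def update_v1_to_v2(state):
    # Hypothetical change: version 2 split "ctrl" into two separate fields.
    new = dict(state)
    ctrl = new.pop("ctrl")
    new["enabled"] = bool(ctrl % 2)
    new["irq_mask"] = ctrl // 2
    new["version"] = 2
    return new


def update_v2_to_v3(state):
    # Hypothetical change: version 3 added a prescaler, defaulting to 1.
    new = dict(state)
    new["prescaler"] = 1
    new["version"] = 3
    return new


# Each updater lifts a checkpoint by exactly one version.
UPDATERS = {1: update_v1_to_v2, 2: update_v2_to_v3}
CURRENT_VERSION = 3


def load_checkpoint(state):
    # Apply updaters in sequence until the checkpoint matches the model.
    while state["version"] != CURRENT_VERSION:
        state = UPDATERS[state["version"]](state)
    return state


old = {"version": 1, "count": 42, "ctrl": 5}
new = load_checkpoint(old)
```

<p>Because each updater only has to know about one step, an old checkpoint can be carried across many model revisions without anyone having to remember the full history.</p>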
<p>The other bit that I find interesting is what is considered the biggest headache: not the actual saving of the memory state, but how to handle open files and similar host operating-system connections:</p>
<p><span id="anormal_12" class="Cadence_CS_BlogDetail_BlogText"></p>
<blockquote><p>If a save operation is performed when the file is open then problems can arise if the program attempts to write to the same file after restore (because the file descriptor associated with the open file will be in a different state after restore).</p></blockquote>
<p>It is nice to have a way to solve this, but it is also pretty shocking that you have to solve it! It kicks off a mini-rant&#8230; A virtual platform model should <em>not </em>read or write or access other host resources directly in any way, in my rulebook for sound programming practice. All host dependencies should be handled via the simulation core and framework, in a manner that is checkpoint-safe, portable, and does not rely on any information from the host directly in the models. It is crucial to localize all such host interactions in specially written host connection modules that make sure all regular simulation modules run in a completely encapsulated and virtual world.</p>
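<p>As a rough sketch of what such a host connection module could look like, consider the following Python example. The <code>HostFileSink</code> and <code>UartModel</code> classes are hypothetical, and the strategy of reopening the file on every write is just one simple way to avoid keeping a file descriptor alive across a checkpoint.</p>

```python
import os
import tempfile


class HostFileSink:
    """Hypothetical host connection module: the only code touching the host."""

    def __init__(self, path):
        self.path = path
        self.offset = 0          # checkpointed: byte position of the next write

    def write(self, text):
        # Open, write at the recorded offset, close: no descriptor survives.
        mode = "r+b" if os.path.exists(self.path) else "wb"
        with open(self.path, mode) as f:
            f.seek(self.offset)
            f.write(text.encode("ascii"))
            f.truncate()         # discard anything written after the checkpoint
            self.offset = f.tell()

    def save(self):
        return {"offset": self.offset}

    def restore(self, state):
        self.offset = state["offset"]


class UartModel:
    """Model code sees only the sink interface, never a host file."""

    def __init__(self, sink):
        self.sink = sink

    def transmit(self, char):
        self.sink.write(char)


path = os.path.join(tempfile.mkdtemp(), "uart.log")
sink = HostFileSink(path)
uart = UartModel(sink)
for ch in "boot":
    uart.transmit(ch)
saved = sink.save()              # checkpoint taken after "boot"
for ch in "-junk":
    uart.transmit(ch)

sink.restore(saved)              # go back to the checkpoint
for ch in "-ok":
    uart.transmit(ch)

with open(path, encoding="ascii") as f:
    contents = f.read()
```

<p>The model itself holds no host state at all, so there is nothing in it that can end up in limbo after a restore. Only the sink needs to know how to get the host file back in step with the simulation.</p>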
<p>So, overall, hats off to Cadence for actually doing something, but keep in mind that it will be very limited until some discipline is exercised in modeling and state considered as something separate from the implementation.</p>
<p></span></p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/817/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Guest Blog at Cadence: &#8220;Way Worse than the Real Thing&#8221;</title>
		<link>http://jakob.engbloms.se/archives/781?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/781#comments</comments>
		<pubDate>Wed, 20 May 2009 10:45:21 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[ESL]]></category>
		<category><![CDATA[articles]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Cadence]]></category>
		<category><![CDATA[ISX]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[software testing]]></category>
		<category><![CDATA[Virtutech]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=781</guid>
		<description><![CDATA[Virtutech and Cadence yesterday announced the integration of Virtutech Simics and Cadence ISX (Incisive Software Extensions), which is essentially a directed random test framework for software. With this tool integration, you can systematically test low-level software and the hardware-software (device driver) interface of a system, leveraging a virtual platform. As part of explaining why this [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-782" style="margin-left: 5px; margin-right: 5px;" title="avataraspx" src="http://jakob.engbloms.se/wp-content/uploads/2009/05/avataraspx.jpg" alt="avataraspx" width="72" height="72" />Virtutech and Cadence yesterday announced the integration of Virtutech Simics and Cadence ISX (Incisive Software Extensions), which is essentially a directed random test framework for software. With this tool integration, you can systematically test low-level software and the hardware-software (device driver) interface of a system, leveraging a virtual platform.</p>
<p><span id="more-781"></span></p>
<p>As part of explaining why this is cool and what it means, I have a <a href="http://www.cadence.com/Community/blogs/sd/archive/2009/05/18/way-worse-than-the-real-thing.aspx">guest blog posting over at Cadence&#8217;s blog site</a>, called &#8220;Way Worse than the Real Thing&#8221;. The blog is posted under the general &#8220;TeamESL&#8221; &#8220;personality&#8221; on the blog site, which is used for people external to Cadence in the ESL space.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/781/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Simulation Determinism: Necessary or Evil?</title>
		<link>http://jakob.engbloms.se/archives/734?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/734#comments</comments>
		<pubDate>Sun, 19 Apr 2009 20:36:02 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[determinism]]></category>
		<category><![CDATA[multicore]]></category>
		<category><![CDATA[repeatability]]></category>
		<category><![CDATA[reverse execution]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[VMWare]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=734</guid>
		<description><![CDATA[In my series (well, I have one previous post about checkpointing) about misunderstood simulation technology items, the turn has come to the most difficult of all it seems: determinism. Determinism is often misunderstood as meaning &#8220;unchanging&#8221; or &#8220;constant&#8221; behavior of the simulation. People tend to assume that a deterministic simulation will not reveal errors due [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-735" style="margin-left: 10px; margin-right: 10px;" title="gears" src="http://jakob.engbloms.se/wp-content/uploads/2009/04/gears.png" alt="gears" width="56" height="57" />In my series (well, I have one previous post about <a href="http://jakob.engbloms.se/archives/714"><em>checkpointing</em></a>) about misunderstood simulation technology items, the turn has come to the most difficult of all it seems: <em>determinism.</em> Determinism is often misunderstood as meaning &#8220;unchanging&#8221; or &#8220;constant&#8221; behavior of the simulation. People tend to assume that a deterministic simulation will not reveal errors due to nondeterministic behavior or races in the modeled system, which is a complete misunderstanding. Determinism is a necessary feature of any simulation system that wants to be really helpful to its users, not an evil that hides errors.</p>
<p><span id="more-734"></span></p>
<h2>What?</h2>
<p>Determinism really means this:</p>
<ul>
<li>Given a certain initial state</li>
<li>And a certain sequence of external inputs</li>
<li>The end result and state of the simulation will always be the same</li>
</ul>
<p>The key to note is that you need to require both the starting state and the sequence of external inputs to be the same in order to get the same result. If either of these changes, you can well get a different result. Implementing a deterministic simulator requires all internal events and activities in the simulator to be performed in the same order and at the same time in each simulation run. It means that the host computer environment state cannot be allowed to affect the simulator execution, and that in turn means that all sorting of internal events has to be done in defined orders in all instances.</p>
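<p>A minimal sketch of what a defined order means for an event queue follows. It is written in Python and is not taken from any particular simulator; the <code>EventQueue</code> class and the event names are assumptions for the example. Events at the same virtual time are ordered by when they were posted, never by anything the host decides.</p>

```python
import heapq
import itertools


class EventQueue:
    """Minimal sketch: events ordered by (time, posting order), nothing else."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker for equal timestamps

    def post(self, time, name):
        # The sequence number makes the ordering total, so the heap never
        # falls back on comparing payloads or host-dependent values.
        heapq.heappush(self._heap, (time, next(self._seq), name))

    def run(self):
        fired = []
        while self._heap:
            time, _, name = heapq.heappop(self._heap)
            fired.append((time, name))
        return fired


def simulate(inputs):
    # Same initial state (an empty queue) plus same inputs gives same result.
    queue = EventQueue()
    for time, name in inputs:
        queue.post(time, name)
    return queue.run()


inputs = [(10, "timer"), (5, "uart_rx"), (10, "dma_done"), (5, "irq")]
first = simulate(inputs)
second = simulate(inputs)
```

<p>Posting order is only one possible tie-breaking rule. What matters is that some rule is written down and that it depends only on the simulation itself.</p>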
<p>I have a story about how hard that can be in practice. I once talked to some compiler developers who had the issue that when recompiling the same program with the same set of compiler options, the results might come out different, even on the same machine. The problem was that each run of the compiler was done in a different overall system state, and this might affect how the OS memory allocation functions allocated items in memory. It turned out that in some cases, the precise value of the <em>pointers </em>to the items in a complex data structure were used by standard libraries to handle iteration over nodes in the data structures. Thus, a different memory allocation pattern gave a different iteration order and a different traversal order of nodes, and in the end an almost arbitrarily different result. The correct solution they had to implement was to use a defined lexical ordering to traverse and iterate, not anything dependent on the state of the host machine. It is nothing different in a simulator: define the order of <em>everything</em>, in order to be deterministic.</p>
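<p>The same trap is easy to reproduce in miniature. In the Python sketch below (the <code>Node</code> class and both traversal functions are made up for the illustration), one traversal orders nodes by their memory address and the other by their name. Only the second is guaranteed to give the same order in every run.</p>

```python
class Node:
    def __init__(self, name):
        self.name = name


def traverse_by_address(nodes):
    # Order depends on where the allocator happened to place each object.
    # In CPython, id() is the object's memory address.
    return [n.name for n in sorted(nodes, key=id)]


def traverse_by_name(nodes):
    # Order depends only on the data itself.
    return [n.name for n in sorted(nodes, key=lambda n: n.name)]


nodes = [Node(name) for name in ("store", "load", "add", "branch")]
stable = traverse_by_name(nodes)
fragile = traverse_by_address(nodes)   # may differ from run to run
```

<p>The address-based version may well look stable in a small test like this one, which is exactly what makes this kind of bug so hard to spot.</p>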
<h2>Why?</h2>
<p>The crucial benefit that determinism brings to a simulation in general and a virtual platform in particular is <em>repeatable debugging</em>. With determinism and an appropriate recording mechanism (and most practically <a href="http://jakob.engbloms.se/archives/714">checkpointing</a>) you can rely on being able to repeat a run resulting in a bug any number of times with the precise same sequence of events in the simulation. In particular, the events visible to and relevant for the software running on the virtual platform occur in the same sequence, at the same time, and at the same point relative to the instructions executed. Especially for multicore and parallel computing systems this is incredibly powerful, and something that just cannot be achieved on physical hardware (due to its inherent randomness and chaotic behavior, see my 2006 and 2007 ESC Silicon Valley talks for more on this, at my <a href="http://www.engbloms.se/jakob_publications.html">publications </a>and <a href="http://www.engbloms.se/jakob_presentations.html">presentations </a>pages).</p>
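<p>Record and replay can be sketched in a few lines, under the assumption that the simulation is deterministic apart from its external inputs. The toy <code>run</code> function and the two input sources below are invented for the example and do not correspond to any real tool.</p>

```python
import random


def run(steps, input_source):
    """Toy simulation: state evolves from external inputs only."""
    state = 0
    log = []
    for step in range(steps):
        value = input_source(step)
        log.append((step, value))          # record every external input
        state = (state * 31 + value) % 1000003
    return state, log


def live_source(step):
    # Stand-in for the unpredictable outside world (network, user, clock).
    return random.SystemRandom().randrange(256)


def replay_source(log):
    recorded = dict(log)
    return lambda step: recorded[step]


final_state, recording = run(50, live_source)
replayed_state, _ = run(50, replay_source(recording))
```

<p>The live run is different every time, but any single run can be replayed exactly from its recording. That is the property that makes a bug found once a bug that can be found again.</p>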
<p>If you assume stability of the simulation infrastructure and the simulation platform, determinism also makes debugging the simulation itself easier. Often, a bug in a simulation model is repeatable, and with determinism, it is easy to repeat the same external stimulus sequence to the module and debug it repeatably.</p>
<p>Determinism also makes it easy to detect change in the behavior of a simulation: if the same simulation setup results in a different result or final simulation state, you know something in the setup (model, model parameters, or software) changed. There is no randomness that causes changes without some fundamental parameter being changed. Such boring reliable behavior is generally exactly what you want when testing and debugging large, complex systems.</p>
<p>Obviously, once determinism becomes a requirement, missing determinism in a model is a bug in itself &#8212; and finding such bugs can certainly be interesting exercises.</p>
<h2>Why Not?</h2>
<p>Just like for checkpointing, one reason not to do determinism is that it is hard, as discussed above.</p>
<p>The most common reason that people claim to want to avoid determinism is that they want to explore alternatives within their simulation. Basically, there is a need for <em>variability </em>that would seem to be at odds with determinism. The typical argument is that &#8220;if my simulation model contains a non-deterministic choice, I want the simulation to expose that and not just make the same decision every time&#8221;. This is where determinism tends to be considered <em>evil</em>. However, this argument is not correct.</p>
<p>If we take the case that at some point P in a simulation run there are two different events <em>E</em> and <em>F</em> that can fire (since they are both posted to the same point in virtual time), a deterministic simulator will always select the same one. This is necessary to reap the system-level benefits discussed above. However, nothing prevents us from programming a change from this behavior into our system explicitly, <em>introducing controlled and repeatable variation. </em>In such a setup, we will have a random decision being made in each simulation run, but one where the outcome in any particular run can be repeated by setting the same random seed parameter.</p>
<p>This brings the best of both worlds: variation to expose issues where there is potential non-determinism or lack of synchronization in the model, and perfect repeatability of the issues this poses in terms of target software and simulation system behavior. In general, the fact that both events are ready at the same time can be considered a sign of missing synchronization in the model, and such a randomizer of behavior will expose that at several different levels. But uncontrolled randomness is not the answer.</p>
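<p>A minimal sketch of such controlled variation, in Python, could look like the following. This is not how any particular simulator implements it. The <code>run</code> function, its <code>seed</code> parameter, and the event names are assumptions made up for illustration.</p>

```python
import heapq
import random


def run(events, seed=None):
    """Fire (time, name) events in time order and return the firing order.

    With seed=None, ties at the same virtual time are broken by posting
    order. With a seed, ties are broken by a seeded random number generator,
    so the variation is repeatable.
    """
    rng = random.Random(seed) if seed is not None else None
    queue = []
    for posting_index, (time, name) in enumerate(events):
        tie_break = rng.random() if rng else 0.0
        # Tuples compare element by element: virtual time first, then the
        # tie-break value, then posting order as the final, fixed criterion.
        heapq.heappush(queue, (time, tie_break, posting_index, name))
    fired = []
    while queue:
        _, _, _, name = heapq.heappop(queue)
        fired.append(name)
    return fired


events = [(10, "E"), (10, "F"), (20, "G")]
print(run(events))          # posting order decides the tie: E, F, G
print(run(events, seed=1))  # E and F may swap, identically in every run with seed=1
```

<p>Recording the seed together with the rest of the simulation setup is what keeps a run with randomized tie-breaking repeatable.</p>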
<p>Another common misconception is that at a higher level, determinism in a virtual platform means that target software will always run in the same way. That is not true, and misses the importance of state in the deterministic behavior equation. If the initial state when a program starts is different, a different execution will result. If software is run on top of any non-trivial operating system, there is plenty of such variation. In one of our simplest Simics demos, we show this by running an intentionally buggy race-condition-ridden program. Each time it is run, it hits a different number of race conditions. But thanks to determinism (best demoed using reverse execution), we can repeat each run perfectly.</p>
<p>Thus, determinism is not equal to constant behavior or lack of variation.</p>
<h2>The reverse argument</h2>
<p>Finally, determinism is the simplest way to implement reverse execution: if you have recording, determinism, and checkpointing, you can easily virtually reverse the execution by going back to a checkpoint and replaying the execution from that point. If you stop one instruction before the current instruction, you have in essence stepped backwards one step in time. This is how both VMware and Simics implement reverse execution and debugging. And it could not happen without determinism.</p>
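<p>The idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not the VMware or Simics implementation: the <code>Machine</code> class and <code>reverse_step</code> function are made up, and the toy machine has no external input, so no recording is needed here.</p>

```python
import copy


class Machine:
    """A toy deterministic machine: the state is a step count and a value."""

    def __init__(self):
        self.steps = 0
        self.value = 1

    def step(self):
        # Deterministic: the next state depends only on the current state.
        self.value = (self.value * 3 + 1) % 1000
        self.steps += 1


def reverse_step(machine, checkpoint):
    """Return a machine one step before `machine` by replaying from `checkpoint`."""
    target = machine.steps - 1
    if checkpoint.steps > target:
        raise ValueError("checkpoint is later than the requested point in time")
    replayed = copy.deepcopy(checkpoint)
    while replayed.steps != target:
        replayed.step()
    return replayed


m = Machine()
for _ in range(5):
    m.step()
checkpoint = copy.deepcopy(m)  # checkpoint taken after step 5
history = []
for _ in range(10):
    history.append(m.value)    # value before each of the next ten steps
    m.step()
previous = reverse_step(m, checkpoint)
print(previous.steps, previous.value == history[-1])  # 14 True
```

<p>Without determinism, the replay from the checkpoint would not be guaranteed to pass through the same states, and the "previous" state would be a guess.</p>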
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/734/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Checkpointing: Meaningless, Difficult, or just Overlooked?</title>
		<link>http://jakob.engbloms.se/archives/714?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/714#comments</comments>
		<pubDate>Thu, 09 Apr 2009 19:56:16 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Checkpointing]]></category>
		<category><![CDATA[Macintosh]]></category>
		<category><![CDATA[Mambo]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[VMWare]]></category>
		<category><![CDATA[ZX Spectrum]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=714</guid>
		<description><![CDATA[One thing that surprises me is how rare the feature of checkpointing or snapshotting is in the land of virtual platforms, despite the obvious benefits of that feature. Indeed, checkpointing was one of the first cool things demonstrated to me when I joined Virtutech back in 2002. Today, I could not ever imagine doing without [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-737" style="margin-left: 10px; margin-right: 10px;" title="gears1" src="http://jakob.engbloms.se/wp-content/uploads/2009/04/gears1.png" alt="gears1" width="56" height="57" />One thing that surprises me is how rare the feature of checkpointing or snapshotting is in the land of virtual platforms, despite the obvious benefits of that feature. Indeed, checkpointing was one of the first cool things demonstrated to me when I joined Virtutech back in 2002. Today, I could not ever imagine doing without it. Not having checkpointing is like having a word processor where you only get to save once, when your document is finished, with no option of saving intermediate states.</p>
<p>But not everyone seems to consider this an important feature, judging from its relative rarity in the world of EDA and virtual platforms. Why is this? Let&#8217;s look at some possible explanations.</p>
<p><span id="more-714"></span></p>
<p>But first, let&#8217;s examine the subject of this post a bit more. What is checkpointing, precisely?</p>
<h2>What?</h2>
<p>In short, it is the ability of a virtual platform or virtualization environment to save the state of an executing simulation to disk (or memory or something) and later bring the saved state back and continue the simulation as if nothing had happened.</p>
<p>In detail, there are four operations that need to be supported for this to be truly useful:</p>
<p><img class="aligncenter size-full wp-image-715" title="checkpoints" src="http://jakob.engbloms.se/wp-content/uploads/2009/04/checkpoints.png" alt="checkpoints" width="632" height="494" /></p>
<ul>
<li>Saving and restoring to the same simulation system on the same host machine (i.e., into the exact same program binary for the simulation).</li>
<li>Restoring on a different machine (where different can mean a machine with a different word-length, endianness, and operating system).</li>
<li>Restoring into a bug-fixed version of the same simulation model.</li>
<li>Restoring into a completely different simulation model that happens to have the same state.</li>
</ul>
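<p>As a rough sketch of these operations, assuming a trivially small device, the Python code below saves the state of a fast model as JSON text and restores it into a more detailed model of the same device. All names (<code>FastCounter</code>, <code>DetailedCounter</code>, <code>save_checkpoint</code>, <code>restore_checkpoint</code>) are hypothetical and do not correspond to the API of any real virtual platform.</p>

```python
import json


class FastCounter:
    """Fast model of a hypothetical counter device: no timing detail."""

    def __init__(self, count=0, limit=100):
        self.count = count
        self.limit = limit

    def tick(self):
        self.count = (self.count + 1) % self.limit

    def get_state(self):
        return {"count": self.count, "limit": self.limit}


class DetailedCounter(FastCounter):
    """More detailed model of the same device: also tracks cycles spent."""

    def __init__(self, count=0, limit=100):
        super().__init__(count, limit)
        self.cycles = 0   # model bookkeeping, not part of the shared state

    def tick(self):
        super().tick()
        self.cycles += 4  # assumed cost per tick, for illustration only


def save_checkpoint(model):
    # JSON text is independent of host word length and endianness.
    return json.dumps({"version": 1, "state": model.get_state()}, sort_keys=True)


def restore_checkpoint(text, model_class):
    data = json.loads(text)
    if data["version"] != 1:
        raise ValueError("unsupported checkpoint version")
    return model_class(**data["state"])


fast = FastCounter()
for _ in range(42):
    fast.tick()
saved = save_checkpoint(fast)
detailed = restore_checkpoint(saved, DetailedCounter)
detailed.tick()
print(detailed.count, detailed.cycles)  # 43 4
```

<p>The design choice that matters is that the checkpoint contains only the state the models agree on, in a format that does not depend on the host or on the internals of one particular model.</p>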
<p>The last operation is very interesting, since it carries with it the ability to change abstraction level. It is used in IBM Mambo (see a <a href="http://www.research.ibm.com/journal/rd/502/peterson.html">2006 IBM paper that you now have to buy due to an annoying change in IBM policy</a>) to exactly this effect, and in Simics for the Freescale QorIQ P4080 as well. It is also well exploited by academic research frameworks for Simics, such as <a href="http://www.cs.wisc.edu/gems/">GEMS </a>and <a href="http://www.ece.cmu.edu/~simflex/">SimFlex</a>. Essentially, the idea is to position using fast mode, and then move over to detailed mode. The advantage to doing this over a checkpoint is that you can farm out the experiments across many different hosts, save the precise starting point for future regression tests, and try different detailed settings from a known common starting position.</p>
<h2>Why?</h2>
<p>Let&#8217;s look at some use cases for checkpointing:</p>
<p>The most obvious use for checkpoints is to avoid repeating simulation work that does not add value, in particular booting of operating systems. A modern OS boot easily takes billions of instructions (say 10 seconds on a dual-core gigahertz machine&#8230; do the math). Being able to save a simulation effort like this for instant reuse is such a standard part of how I work with virtual platforms that I could not imagine the pain of not having it.</p>
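<p>Doing that math as a back-of-the-envelope estimate in Python, under the assumption (mine, purely for illustration) that both cores stay busy and retire about one instruction per cycle:</p>

```python
def boot_instructions(seconds, cores, clock_hz, instructions_per_cycle=1.0):
    """Rough instruction count for a boot that keeps all cores busy."""
    return seconds * cores * clock_hz * instructions_per_cycle


estimate = boot_instructions(seconds=10, cores=2, clock_hz=1e9)
print(f"{estimate:.0e} instructions")  # 2e+10 instructions
```

<p>That is on the order of twenty billion instructions that a checkpoint lets you skip on every later run.</p>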
<p>Checkpointing is also a useful communications tool: it makes it possible for any user of a virtual platform to precisely communicate the system state and configuration to anybody else with access to the same virtual platform system (note that a checkpoint, at least in Simics land, contains the list of objects in the simulation and how they are connected, so you do not need any other description of the simulation setup). This helps in debugging models &#8211; a user testing it can easily package problems and report them to the modeling team. And it helps in debugging software running on the virtual platform, as a tester can package up the precise system state right before a bug hits and send it back to development. Incredibly powerful! Here, portability of checkpoints across hosts is obviously very important, as well as across model versions. Once you have a fix for a model bug, you test it using the checkpoint, and check that things now proceed as they should.</p>
<p>Checkpointing also comes in handy as a backup-save ability when configuring an interactive target system. In many cases, the loading and configuration of software on a target is a very valuable and hard-to-repeat-exactly activity. Adding in software, configuring it, starting servers, assigning network addresses, configuring communications paths for backplanes can take a lot of time. On physical machines, or on virtual platforms without checkpointing, if you mess up, you have to go back and start over. With checkpointing, you can incrementally save work as you go along. This is a common use case for the snapshotting ability in VMware, for example. But it works equally well for embedded targets modeled as virtual platforms.</p>
<p>There are more uses; the paragraphs above just scratch the surface of the utility of checkpoints.</p>
<h2>Why Not?</h2>
<p>But despite the obvious benefits, this feature is very rarely found in virtual platforms. I can see three main lines of argument:</p>
<ul>
<li><em>Meaningless</em>: for tests comprising only short software runs, like a few million or tens of millions of instructions, rerunning them is fast enough, or the changes between runs are major enough, that checkpointing seems pointless. I can buy that, but only until the simple target is part of a greater context. If a DSP, for example, is part of a big system setup, you want to save its state even if it is only running a few small million-instruction loops.</li>
<li><em>Difficult</em>: I think this might be the most important explanation. Doing checkpointing right puts requirements on the simulation kernel and on all processors and device models. All models have to be coded with discipline so that all state is available and can be set at any point in time. In particular, this means that explicit threading as employed in SystemC SC_THREAD is out. It must also be admitted that certain types of models like detailed processor models can be very difficult to serialize and deserialize from disk, simply due to the enormous intricacies of their implementations. But had they been designed with checkpointing in mind from the start, it would have been less difficult.</li>
<li><em>Overlooked</em>: The virtual platform was designed without thinking of checkpointing. Alternatively, no customers asked for it, so it was not built.</li>
</ul>
<p>I find the last argument very interesting, since I can see what happens once you have tried checkpointing. In my experience, once a user of a virtual platform has tried checkpointing, they want it. It goes from an interesting idea to a must-have feature very quickly. No arguments about why it is hard or why they can do without it work, as they have seen how things should be done.</p>
<p>For me, I think it is akin to my first encounter with a Macintosh computer, and the concept of &#8220;undo&#8221; in programs. Before that, I was happily editing code on a ZX Spectrum, in an environment where &#8220;undo&#8221; meant &#8220;manually remember how it looked and change it back&#8221;. I had no problems with that, but once I saw how things could be done, there was no going back.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/714/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>I Want One&#8230; Trillion Instructions&#8230;</title>
		<link>http://jakob.engbloms.se/archives/709?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/709#comments</comments>
		<pubDate>Sat, 28 Mar 2009 21:10:31 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[ESL]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[abstraction levels]]></category>
		<category><![CDATA[device driver]]></category>
		<category><![CDATA[Dr. Evil]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[mpc8641d]]></category>
		<category><![CDATA[Simics]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=709</guid>
		<description><![CDATA[There is an eternal debate going on in virtual platform land over what the right kind of abstraction is for each job. Depending on background, people favor different levels. For those with a hardware background, more details tend to be the comfort zone, while for those with a software background like myself, we are quite [...]]]></description>
			<content:encoded><![CDATA[<p>There is an eternal debate going on in virtual platform land over what the right kind of abstraction is for each job. Depending on background, people favor different levels. For those with a hardware background, more detail tends to be the comfort zone, while those with a software background, like myself, are quite comfortable with less detail. I <a href="http://www.virtutech.com/whitepapers/wp-system_arch_spec.html">recently did some experiments about the use of quite low levels of hardware modeling details for early architecture exploration and system specification</a>.</p>
<p><span id="more-709"></span></p>
<p>It all comes down to a simple classic tradeoff that I usually illustrate like this (using more neutral ground than computer systems; and with credit to Peter Magnusson who had this slide already in place when I joined Virtutech back in 2002):</p>
<p><img class="aligncenter size-full wp-image-711" title="simulation-rule" src="http://jakob.engbloms.se/wp-content/uploads/2009/03/simulation-rule.png" alt="simulation-rule" width="457" height="341" /></p>
<p>What this is telling you is simple:</p>
<ul>
<li>You simulate something very large using large units, i.e., low level of detail; or</li>
<li>You simulate something quite small using small units, i.e., high level of detail.</li>
</ul>
<p>I wanted to test the idea that by using less detail, you can run larger test cases and therefore obtain better coverage of the overall landscape than by diving in and counting cycles in some small part of it. In the end, this made me cross the trillion instruction line: since each experiment took a few hundred billion target instructions to complete, repeating and tweaking during the development work definitely added up to more than a trillion instructions.</p>
<p>And this is where I have to put my little finger close to my mouth and say:</p>
<p style="text-align: center;"><img class="size-full wp-image-710 aligncenter" style="margin-top: 10px; margin-bottom: 10px;" title="drevil_million_dollars" src="http://jakob.engbloms.se/wp-content/uploads/2009/03/drevil_million_dollars.jpg" alt="drevil_million_dollars" width="300" height="318" /></p>
<p>&#8216;I want one trillion instructions&#8217;</p>
<p>So what did I get from these trillion instructions?</p>
<p>An interesting study in how operating system overhead can have a big impact on the profitability of hardware accelerators. By running hundreds of test cases with different assigned computation latencies for a hardware accelerator, as well as different driver models for my hardware (all running under Linux on my favorite MPC8641D), a key diagram emerged:</p>
<p style="text-align: left;"><img class="aligncenter size-full wp-image-712" style="margin-top: 10px; margin-bottom: 10px;" title="hwsw" src="http://jakob.engbloms.se/wp-content/uploads/2009/03/hwsw.png" alt="hwsw" width="872" height="507" /><a href="http://www.virtutech.com/whitepapers/wp-system_arch_spec.html">Read the paper </a>for all the details, but the key thing to note is that with a poor driver architecture, making the hardware 100 times faster resulted in zero gain in system performance. Had this experiment been performed on a bare-bones platform without a full operating system in place, I am fairly certain that the faster hardware would have been considered much more worthwhile.</p>
<p style="text-align: left;">In the end, I resorted to a driver variant where I had user-level code directly access the device programming interface via an mmap()-mapped memory region. Not pretty, essentially this was bare-metal programming wrapped inside a big cosy Linux package, but it sure was efficient compared to doing a kernel/user mode switch for each hardware operation. But even here, it turned out that making the hardware very very fast as opposed to just very fast had no benefit. It proves to me that the software has to be taken into account in full in order to properly evaluate an idea for a hardware design.</p>
<p style="text-align: left;">You could say that the poor results for acceleration here were due to my inept Linux driver programming skills, but that just underscores the key result: you have to take the software into account. If the conclusion is that a better Linux device driver programmer is needed, you have still decided that the key system bottleneck is not just the speed of the hardware, but how it is used. And that is exactly what system design needs to be about.</p>
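<p style="text-align: left;">The effect can be illustrated with a very simple Amdahl-style model in Python. This is a sketch with invented numbers, not the measurements from the paper: the function names and the microsecond values are assumptions chosen only to show the shape of the result.</p>

```python
def time_per_operation(software_overhead_us, hardware_latency_us, speedup):
    """Time for one accelerator operation as seen by the application."""
    return software_overhead_us + hardware_latency_us / speedup


def system_gain(software_overhead_us, hardware_latency_us, speedup):
    """System-level speedup from making only the hardware faster."""
    before = time_per_operation(software_overhead_us, hardware_latency_us, 1)
    after = time_per_operation(software_overhead_us, hardware_latency_us, speedup)
    return before / after


# Hypothetical numbers, not measurements from the paper.
for overhead_us in (0.0, 1.0, 100.0):
    gain = system_gain(overhead_us, hardware_latency_us=1.0, speedup=100)
    print(f"overhead {overhead_us:6.1f} us: system gain {gain:.2f}x")
```

<p style="text-align: left;">With no software overhead, the full factor of 100 reaches the application. Once the per-operation software overhead is a hundred times the hardware latency, the same hardware improvement yields a gain of about one percent.</p>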
<p style="text-align: left;">As an aside, playing around with a complete system like this, and automatically running large volumes of tests with varying parameters, was a really interesting experience. I must admit that getting to these trillions of instructions required a few hours of simulation time, but nothing that could not be solved by leaving a computer running over lunch or a long meeting. The machine was modeled using standard Simics &#8220;software timing&#8221;, i.e., without any particular cache or pipeline or bus details, and it seems that that is usually all you need. Had I increased the level of detail and slowed things down by a factor of ten or a hundred, I would never have covered such a large set of test cases and been able to evaluate as many different variants of drivers and hardware speeds.</p>
<h2 style="text-align: left;">IBM did it before me</h2>
<p style="text-align: left;">Finally, I found it interesting that an analogous experience about the effect of creating a complete software stack and testing what looks like a very good hardware idea was reported in an IBM paper from a few years ago, in &#8220;<a href="http://researchweb.watson.ibm.com/journal/rd/502/peterson.html">Application of full-system simulation in exploratory system design and development</a>&#8221;, by Peterson et al., in the IBM Journal of Research and Development. Look at the section about the &#8220;MIP Morphing&#8221; feature, which is essentially cache locking. They do use a fairly detailed simulator for the end evaluation of their performance &#8211; but the key message is that by running a full software stack, they realized that just managing the feature was too hard in a realistic software environment to make it worthwhile:</p>
<blockquote>
<p style="text-align: left;">Initially, the MIP morphing feature was well received by internal development and HPCS customers alike. The team was aware of the need to both manage this hardware feature at the OS level and provide portable abstractions to the programmer to exploit this feature in a productive way. &#8230;</p>
</blockquote>
<p style="text-align: left;">And then:</p>
<blockquote>
<p style="text-align: left;">The implementation effort was facilitated by Mambo, allowing the OS team to prototype the MIP morph idea in a controlled development environment. Taking the prototyping effort to this level of realism uncovered many complexities in supporting the MIP morph in a virtualized manner. ..</p>
</blockquote>
<p style="text-align: left;">And finally:</p>
<blockquote>
<p style="text-align: left;">By prototyping the software support that was <em>needed at the OS level and exposing the usage issues at the application programmer&#8217;s level</em>, the magnitude of the problem was exposed at its fullest. Further, the improvement in performance did not show a sufficient payback for the immense effort that would be required at the software level to support the idea, and as a result it was dropped from further consideration.</p>
</blockquote>
<p style="text-align: left;">It seems that whatever you do, IBM did it first&#8230; and it validates the idea of full-system simulation and that software is king today.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/709/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>IBM z10 Heavy-Duty Virtual Platform</title>
		<link>http://jakob.engbloms.se/archives/639?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/639#comments</comments>
		<pubDate>Sun, 15 Feb 2009 17:17:29 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[review]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[CECsim]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[z10]]></category>
		<category><![CDATA[zSeries]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=639</guid>
		<description><![CDATA[Unknown to most, IBM has one of the world&#8217;s longest records of using virtual platforms for software and firmware development and verification. This project has been ongoing since at least the days of the zSeries 900 machines, through z990, z9, and now z10. An excellent article on this virtual platform and its uses is found [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-640" style="margin: 5px;" title="ibm_z10" src="http://jakob.engbloms.se/wp-content/uploads/2009/02/ibm_z10.png" alt="ibm_z10" width="118" height="118" />Unknown to most, IBM has one of the world&#8217;s longest records of using virtual platforms for software and firmware development and verification. This project has been ongoing since at least the days of the zSeries 900 machines, through z990, z9, and now z10. An excellent article on this virtual platform and its uses is found in the <a href="http://www.research.ibm.com/journal/rd53-1.html">IBM Journal of Research and Development</a>, number 1, 2009. It is called <a href="http://www.research.ibm.com/journal/rd/531/koerner.pdf">&#8220;IBM System z10 Firmware Simulation&#8221;, by Körner et al</a>.</p>
<p><span id="more-639"></span>The z10 is the latest generation of the classic IBM mainframe family that started with S/360 back in the 1960s. The simulation for just running the firmware of these beasts makes most other virtual platforms, which focus on single SoCs for consumer or digital devices, look positively puny. It also shows that virtual platforms as a technology can scale all the way from single-core bare-metal simple machines that are useful for developing initial software for simple embedded systems up to servers and racks containing hundreds of processing units and very diverse hardware.</p>
<p>The terminology used is unusual, compared to the EDA/ESL and computer architecture research worlds. But it is good. The key concept is a &#8220;VPO&#8221;, Virtual Power On. For a computer of this class, doing Power On is a major event, and calling it a &#8220;boot&#8221; does not really cover its full complexity, involving many different layers of software running on the same and different computers. The VPO was targeted at four months prior to hardware tape-out, which means that at that point in time the virtual system would be complete and the firmware complete enough to do a power on.</p>
<p>The simulation system used for the z10 mixes IBM&#8217;s in-house <a href="http://researchweb.watson.ibm.com/journal/rd/464/vonbuttlar.html">CECsim </a>with <a href="http://www.virtutech.com/solutions/virtual_platform/power">Virtutech Simics</a>. CECsim executes the code for the <a href="http://jakob.engbloms.se/archives/80">central zSeries processors</a>, while Simics simulates the FSP-1 &#8220;flexible support processor&#8221; based on the Power Architecture. In previous generations of simulation, the FSP code had been host-compiled and run on an x86 workstation instead of running the actual Power Architecture binaries. Running the real binaries brought additional verification value to the software, finding 3 times more bugs than in the previous host-based simulation:</p>
<blockquote><p>Because the Simics environment now enables us to execute all FSP code in simulation, a far greater amount of code is simulated. Correspondingly, the number of defects found in simulation also increased, by more than 33(Table 2).</p></blockquote>
<p>The article also describes how hardware-accelerated simulation of the actual VHDL of complex new IO chips was used to validate the bits-and-cycles-level interfacing between code and the logic, as well as to validate the logic design itself.</p>
<p>Overall, the article is one of the best presentations of comprehensive use of various types of simulation tools and techniques to remove firmware defects as early as possible in the system development project.</p>
<p>For more on the history of this, I refer to a previous blog post here, &#8220;<a href="http://jakob.engbloms.se/archives/130">The 1970 rules strikes again</a>&#8221;, where I described some late 1960s mainframe simulation technology and its uses. Also, browse the back issues of the IBM JRD archives, there are lots of nuggets to be found there!</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/639/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Threading or Not as a Hardware Modeling Paradigm</title>
		<link>http://jakob.engbloms.se/archives/485?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/485#comments</comments>
		<pubDate>Thu, 01 Jan 2009 08:31:23 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Erlang]]></category>
		<category><![CDATA[multicore]]></category>
		<category><![CDATA[Reactive programming]]></category>
		<category><![CDATA[sampalib]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[SystemC]]></category>
		<category><![CDATA[Threading]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=485</guid>
		<description><![CDATA[Traditional hardware design languages like Verilog were designed to model naturally concurrent behavior, and they naturally leaned on a concept of threads to express this. This idea of independent threads was brought over into the design of SystemC, where it was manifested as cooperative multitasking using a user-level threading package. While threads might at first [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-486" style="margin: 5px 10px;" title="gears-modeling" src="http://jakob.engbloms.se/wp-content/uploads/2008/12/gears-modeling.png" alt="gears-modeling" width="62" height="65" />Traditional hardware design languages like <a href="http://en.wikipedia.org/wiki/Verilog">Verilog </a>were designed to model naturally concurrent behavior, and they naturally leaned on a concept of threads to express this. This idea of independent threads was brought over into the design of <a href="http://www.systemc.org">SystemC</a>, where it was manifested as cooperative multitasking using a user-level threading package. While threads might at first glance look &#8220;natural&#8221; as a modeling paradigm for hardware simulations, it is really not a good choice for high-performance simulation.</p>
<p>In practice, threading as a paradigm for software models of hardware circuits connected to a programmable processor brings more problems than it provides benefits in terms of &#8220;natural&#8221; modeling.</p>
<p><span id="more-485"></span></p>
<p>As I see it, the main alternative modeling paradigm is to use a classic event-driven system, where all activity is triggered by events and the associated code is run to completion. This makes execution occur in a series of simulation steps in various parts of the system, rather than as a set of (pseudo) concurrent tasks.</p>
<h2>Threaded Problems</h2>
<p>The most common complaint with threading is <strong>performance</strong>. This has become very clear in the case of using SystemC for transaction-level modeling. All advice on how to do good and fast TLM coding tells us to use SC_METHODs, which are essentially callbacks that are not active objects in their own right. Note that SystemC models found in the wild are often built on SC_THREADs despite this advice, as that is the &#8220;easiest&#8221; way to do things. Some convenience systems that are part of the OSCI TLM-2.0 library also rely on threads to convert between AT-style asynchronous and LT-style synchronous function calls (which is pretty unavoidable, but not applicable in the realm of high-performance simulation for virtual platforms).</p>
<p>Furthermore, using threading as a paradigm (even single-active-thread cooperative threads like in SystemC or classic MacOS) brings with it the <strong>problems of concurrent programming</strong>, in that you suddenly need to care about protecting data structures against conflicting accesses, worry about deadlocks, and similar concurrent programming issues. Without threads, all such issues go away.</p>
<p>Note that using threading as a modeling paradigm with truly concurrent execution of models will make the execution have all the problems of parallel programs, especially non-deterministic execution and hard-to-find bugs. At least a cooperative multitasking system tends to be deterministic in the way it goes wrong.</p>
<p>Threading as a hardware model programming style therefore makes concurrent multithreaded simulation harder rather than easier to achieve, especially if the simulation system specifies an interleaved model of execution as its semantics, which is the case for SystemC. In this case, there is no way to really make SystemC parallel without adding parallelism as some extra library.</p>
<p>However, one of the biggest practical problems with threading is the problem of <strong>inspecting, changing, and checkpointing simulation state</strong>. With threads, you end up having state stored in local variables on the stacks in the system, as well as in processor registers, the program counter, and other places that are hard to get to from the outside.  This is not just me saying this, I found this well said in the <a href="http://www.sampalib.org/doc/papers/A%20Sampalib%20and%20SystemC%20comparison.pdf">sampalib white paper </a>:</p>
<blockquote><p>Using threads means that part of the simulation state is in stacks, which may limit the ability to persist the state of the simulation in checkpoints.</p>
<p>Using wait() implies context switch which are costly in terms of simulation speed, and thus often discouraged in guidelines for modeling SystemC™ models</p></blockquote>
<p>To drive this point home, all libraries for general program state serialization that I have seen (for C++ and Java, for example) also rely on explicit state stored in objects, and explicitly do not support the &#8220;transient&#8221; state held in local variables and the program counter. Essentially, only heap-allocated objects are handled in serialization solutions.</p>
<h2>Event-Driven Solutions</h2>
<p>An event-driven transaction-level hardware simulation is coded in a different way from a naive threaded implementation (but not that differently from a more sophisticated threaded program).</p>
<p>Each device model has to make its state explicit as a set of variables, and preferably also declare these for access by an external tool using something like <a href="http://www.greensocs.com/en/projects/GreenControl">GreenSocs GreenControl </a>or <a href="http://www.virtutech.com/whitepapers/modeling.html">Simics Attributes</a>. It also has to expose a set of functions to be called when events happen or other devices in the simulation system send a transaction into the device model.</p>
<p>Additionally, you should encapsulate all state in a model inside the model object and not expose it for direct access from the outside. A pure object-oriented style with accessor functions for everything is required for best modularity.</p>
<p>The advantages of this model are clear:</p>
<ul>
<li>Concurrency problems are reduced, since each function call will run to completion before any other object or function is activated. There is no need to worry about shared data variables, as they should not exist.</li>
<li>Checkpointing and inspection are facilitated, since all state is now explicit and declared.</li>
<li>Performance is typically increased, since there is no need to do context switches between threads. Locality is also increased by having functions run to completion before returning.</li>
<li>True concurrency is easier to achieve, since each model can quite easily be considered a local-state, shared-nothing, explicit message-passing component similar to Erlang processes. This makes it possible for the simulation scheduler to run multiple models concurrently on multiple host threads. For more on this topic, see my <a href="http://jakob.engbloms.se/archives/246">SiCS Multicore Days 2008 </a><a href="http://www.engbloms.se/presentations/engblom-multicore-sics-2008.pdf">presentation on how Simics was threaded</a>.</li>
</ul>
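<p>To make the checkpointing point concrete, here is a minimal sketch in Python. It is not how Simics or any SystemC library does it; <code>TimerDevice</code>, <code>STATE_ATTRIBUTES</code>, <code>checkpoint()</code> and <code>restore()</code> are hypothetical names made up for illustration. The only idea shown is that a model whose state lives entirely in declared attributes can be saved and restored without touching any stack.</p>

```python
import json


class TimerDevice:
    """Hypothetical device model whose entire state is explicit attributes."""

    # Names of the attributes that make up the checkpointable state.
    STATE_ATTRIBUTES = ("reload", "counter", "irq_pending")

    def __init__(self, reload=0):
        self.reload = reload
        self.counter = reload
        self.irq_pending = False

    def tick(self):
        """Called by the scheduler; runs to completion and never blocks."""
        if self.counter == 0:
            self.irq_pending = True
            self.counter = self.reload
        else:
            self.counter -= 1

    def get_state(self):
        """Return the declared state as a plain dictionary."""
        return {name: getattr(self, name) for name in self.STATE_ATTRIBUTES}

    def set_state(self, state):
        """Overwrite the declared state from a dictionary."""
        for name in self.STATE_ATTRIBUTES:
            setattr(self, name, state[name])


def checkpoint(device):
    """Serialize the declared state to text (JSON is an arbitrary choice)."""
    return json.dumps(device.get_state())


def restore(text):
    """Build a fresh device and load the saved state into it."""
    device = TimerDevice()
    device.set_state(json.loads(text))
    return device
```

<p>A restored copy behaves exactly like the original from the checkpoint onwards, because there is nothing else to restore: no local variables, no program counter, no suspended call.</p>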
<p>The downside is that some people consider the programming more complicated, which is really a matter of appearance over substance: event-driven programming tends to be more robust and easier to follow in the long run, since threaded programming makes things a bit too implicit.</p>
<p>Here is the basic example of a thread that does some periodic work.</p>
<p>Threaded style:</p>
<blockquote>
<pre>Thread_for_D():
  loop forever:
    do work...
    wait(some time)</pre>
</blockquote>
<p>Event-driven style, where we just repost an event each time we are called:</p>
<blockquote>
<pre>Time_callback():
  do work...
  post event(some time, Time_callback)</pre>
</blockquote>
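<p>The event-driven pseudocode can be turned into something runnable with very little machinery. The following Python sketch assumes a toy scheduler of my own invention (<code>EventQueue</code>, <code>post()</code> and <code>run_until()</code> are hypothetical names, not the API of any real simulator) and a device that reposts its own callback each time it is called.</p>

```python
import heapq
import itertools


class EventQueue:
    """Hypothetical minimal discrete-event scheduler (not a real library API)."""

    def __init__(self):
        self.now = 0
        self._heap = []
        # Tie-breaker so that events posted for the same time run in FIFO order
        # and callbacks themselves are never compared.
        self._order = itertools.count()

    def post(self, delay, callback):
        """Schedule callback to run `delay` time units from now."""
        heapq.heappush(self._heap, (self.now + delay, next(self._order), callback))

    def run_until(self, end_time):
        """Run every event scheduled at or before end_time, in time order."""
        while self._heap and end_time >= self._heap[0][0]:
            self.now, _, callback = heapq.heappop(self._heap)
            callback()  # runs to completion; no context switch involved
        self.now = end_time


class PeriodicDevice:
    """Does some work every `period` time units by reposting an event."""

    def __init__(self, queue, period):
        self.queue = queue
        self.period = period
        self.work_done = 0  # explicit state, nothing hidden on a stack
        queue.post(period, self.time_callback)

    def time_callback(self):
        self.work_done += 1  # do work...
        self.queue.post(self.period, self.time_callback)  # repost ourselves
```

<p>With a period of 10 and a run up to time 35, the callback fires at times 10, 20 and 30. Between calls, the whole state of the device is the single <code>work_done</code> attribute plus one pending event in the queue.</p>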
<p>Another advantage of event-driven models is that such a paradigm makes it clear that you need to be able to accept any call into the model at any time. This makes for more robust code, since it is quite easy to (intentionally or by mistake) encode an expectation on the sequence of activity in a threaded model that might not be what actually happens at run-time. In particular, the state of any protocol being acted on will need to be explicitly rather than implicitly represented.</p>
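<p>As an illustration of explicit protocol state, consider a made-up two-step protocol where a command write must be followed by a data write. The sketch below is Python, and <code>CommandPort</code> and its method names are hypothetical. The protocol phase is an ordinary attribute, so every entry point can be called at any time and has to decide what an out-of-sequence call means.</p>

```python
from enum import Enum


class Phase(Enum):
    IDLE = "idle"
    AWAIT_DATA = "await_data"


class CommandPort:
    """Hypothetical two-step protocol: a command write, then a data write."""

    def __init__(self):
        self.phase = Phase.IDLE   # explicit protocol state
        self.command = None
        self.completed = []       # (command, data) pairs that finished
        self.protocol_errors = 0

    def write_command(self, command):
        if self.phase is Phase.AWAIT_DATA:
            # A new command arrived before the data: count it and start over.
            self.protocol_errors += 1
        self.command = command
        self.phase = Phase.AWAIT_DATA

    def write_data(self, data):
        if self.phase is not Phase.AWAIT_DATA:
            # Data with no command pending: record the error, change nothing.
            self.protocol_errors += 1
            return
        self.completed.append((self.command, data))
        self.command = None
        self.phase = Phase.IDLE
```

<p>A threaded version would typically wait for the command and then wait for the data, and the case of two commands in a row would simply not appear anywhere in the code.</p>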
<p>There is much more to be said on how to code in this style, but there are long papers out there to read on this.</p>
<h2>High-Performance Event-Driven Simulation</h2>
<p>Note that in high-performance virtual platform-style simulation, processors will usually be a special case in both threaded and event-driven styles. That is because the flow of instructions that they execute constitutes very many very small actions that cannot afford a context switch between each. Here, the advantage of the event-driven model is even clearer, given some special-casing of processors. This is another long story that I will not reiterate here, but basically, most events as discussed above will be memory accesses from a processor to read and write device registers, and each such memory access can be handled in a single simulation step. No need to switch context or do anything but handle a simple function call. By not having a wait() call to deal with, this mechanism can be kept simple and cheap &#8212; which is essentially using an SC_METHOD in SystemC. But in the complete absence of SC_THREADs and their ilk, many other things can be optimized even better.</p>
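<p>A rough Python sketch of a device register access as a plain function call follows. The names <code>MemoryMap</code>, <code>UartModel</code>, <code>read_register()</code> and <code>write_register()</code>, and the register layout, are assumptions for this example only and do not correspond to any real simulator interface.</p>

```python
class MemoryMap:
    """Hypothetical address decoder that routes accesses to device models."""

    def __init__(self):
        self._regions = []  # list of (base, size, device)

    def map_device(self, base, size, device):
        self._regions.append((base, size, device))

    def _find(self, address):
        for base, size, device in self._regions:
            if address >= base and base + size > address:
                return device, address - base
        raise LookupError(f"unmapped address {address:#x}")

    def read(self, address):
        # One processor load becomes one function call into the device.
        device, offset = self._find(address)
        return device.read_register(offset)

    def write(self, address, value):
        # One processor store becomes one function call into the device.
        device, offset = self._find(address)
        device.write_register(offset, value)


class UartModel:
    """Hypothetical serial port with a data and a status register."""

    DATA, STATUS = 0x0, 0x4

    def __init__(self):
        self.transmitted = []  # explicit state: bytes sent so far

    def read_register(self, offset):
        if offset == self.STATUS:
            return 1  # always ready to transmit in this sketch
        return 0

    def write_register(self, offset, value):
        if offset == self.DATA:
            self.transmitted.append(value & 0xFF)
```

<p>The processor model calls <code>read()</code> or <code>write()</code>, the device function runs to completion, and control returns. There is no thread to wake up and no stack to switch.</p>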
<h2>The End</h2>
<p>What I wanted to provide in this almost-article-length post was an idea of the problems that I see threads cause as a modeling paradigm for hardware models, and the advantages offered by a reactive event-driven style. For some reason, this is misunderstood in the modeling community at large, probably because most operating systems and simulation systems in common use today present various forms of threads as the way to model concurrent behavior. However, threads as a prominent user-level programming model are known to be bad in many ways&#8230; and modeling is no exception to this rule.</p>
<p>Note that I realize that threads are needed at some level in order to take advantage of multicore hardware, but I think they are best hidden inside a simpler framework that presents a simpler understandable semantics to the user.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/485/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Notes from the IP 08 Panel</title>
		<link>http://jakob.engbloms.se/archives/440?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/440#comments</comments>
		<pubDate>Sat, 06 Dec 2008 20:31:46 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[appearances]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[clock-cycle models]]></category>
		<category><![CDATA[DML]]></category>
		<category><![CDATA[IP08]]></category>
		<category><![CDATA[panel discussion]]></category>
		<category><![CDATA[Register Design Languages]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[SystemC]]></category>
		<category><![CDATA[SystemRDL]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=440</guid>
		<description><![CDATA[Now I am home again, and some days have passed since the IP 08 panel discussion about software and hardware virtual platforms. This was an EDA hardware-oriented conference, and thus the audience was quite interested in how to tie things to hardware design. In any case, it was a fun panel, and Pierre Bricaud did a [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-366" style="margin: 5px 10px;" title="ip08" src="http://jakob.engbloms.se/wp-content/uploads/2008/12/ip08.gif" alt="" width="147" height="63" />Now I am home again, and some days have passed since the <a href="http://www.design-reuse.com/ip08/program/panel_virtualplatform.html">IP 08 panel discussion </a>about software and hardware virtual platforms. This was an EDA hardware-oriented conference, and thus the audience was quite interested in how to tie things to hardware design. In any case, it was a fun panel, and Pierre Bricaud did a good job of moderating and keeping things interesting.</p>
<p><span id="more-440"></span></p>
<p>The panel had a clear consensus, which nobody really challenged, that virtual platforms for software development are different in kind from virtual platforms for hardware development. Indeed, the taxonomy of &#8220;hardware virtual platforms&#8221; versus &#8220;software virtual platforms&#8221; was used frequently and proved quite appropriate.</p>
<p>A software virtual platform has to be fast and its timing can be fairly approximate. Its main value, in this context, is that it can be created quickly and is useful for early software development and debug. Opinions differed, however, on how to produce them and where to go with them.</p>
<ul>
<li>Markus Willems from Synopsys had the position that they are produced in some appropriate way as a separate task from hardware development. SystemC was his language of choice.</li>
<li>Peter Flake proposed a methodology where you start by developing the software virtual platform and then refine it down towards more detailed models and finally hardware. He brought up Virtutech <a href="http://www.virtutech.com/whitepapers/virtutech_dml.html">DML </a>and <a href="http://jakob.engbloms.se/archives/358">SystemRDL</a>, as examples of languages pointing in this direction.</li>
<li>Loic Le Toumelin considered the software virtual platform as something that is generated from a common design entry point, using some form of synthesis that can also generate the hardware and the hardware virtual platform.</li>
<li>I think my realistic position right now is that a software virtual platform is created as a separate item, but that we want to make this work as short and easy as possible and that in the future, the vision is similar to Peter Flake&#8217;s: start with a software virtual platform to define the hardware-software interface.</li>
</ul>
<p>It was also interesting how much opinions differed when we got to the detailed hardware-oriented virtual platforms, the ones that tend to be clock-cycle level and in many cases attempt to be cycle-accurate (CA).</p>
<ul>
<li>Markus said that the only good way to build a CA model was to take the RTL and convert it, or run it in an FPGA prototype. He echoed the sentiments <a href="http://jakob.engbloms.se/archives/153">I wrote about in July, that ARM is getting out of cycle-accurate models and the general difficulty of creating such a model by hand</a>.</li>
<li>Peter pointed out that you can have CA models before RTL, as a design tool. I strongly agree with this model of working; it is common in industry and definitely one way to go. However, for existing hardware, I agree that RTL-to-CA seems reasonable, even if the resulting models are painfully slow.</li>
<li>Loic wanted the CA to come from the same source as the software VP, and was very keen on their being in complete agreement on semantics of the hardware.</li>
</ul>
<p>The third major discussion was about the required accuracy and fidelity-to-hardware of a virtual platform. With a consensus that a software virtual platform has to be fast and with timing approximated, it is still clear that many people are uncomfortable about this idea of not being &#8220;exactly like the hardware&#8221;.</p>
<p>For some purposes, you do need complete fidelity to the hardware timing in a CA model. Loic definitely could not accept anything less when giving a customer a virtual platform, and some people in the audience echoed the same sentiment. Most, however, agreed that most software work can be done with simple timing, and that it does not matter all that much if there are some functionality bugs or omissions in the virtual platform. It is still far better than no platform at all!</p>
<p>What is clearly needed, at least for virtual platforms close to a hardware design process, is a way to check the software virtual platform and hardware virtual platform against the functionality and maybe timing of the final RTL, at least in the cases where you have the RTL, which is far from always in my world.</p>
<p>There were some other questions about software development tools support (of course you use the same debugger and compiler as with a physical platform) and other issues where the panel was mostly in agreement. I guess some of this also indicates that virtual platforms are not yet universally understood and that most people have not really had any experience with them.</p>
<p>Overall, this was a fun panel, and I hope the audience enjoyed it too and learnt something in the process.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/440/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>The Details of Speed</title>
		<link>http://jakob.engbloms.se/archives/355?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/355#comments</comments>
		<pubDate>Sun, 23 Nov 2008 20:40:04 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[general history]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Functional models]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[Speed]]></category>
		<category><![CDATA[Spitfire]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=355</guid>
		<description><![CDATA[I just read a fairly interesting book about the British Spitfire fighter plane of World War 2. The war bits were fairly boring, actually, but the development story was all the more interesting. I find it fascinating to read about how aviation engineers in the 1930s experimented and guessed their way from the slow unwieldy [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-medium wp-image-356" title="spitfire-1" src="http://jakob.engbloms.se/wp-content/uploads/2008/11/spitfire-1.jpg" alt="" width="200" height="108" />I just read a fairly interesting book about the British <a href="http://en.wikipedia.org/wiki/Supermarine_Spitfire">Spitfire </a>fighter plane of World War 2. The war bits were fairly boring, actually, but the development story was all the more interesting. I find it fascinating to read about how aviation engineers in the 1930s experimented and guessed their way from the slow unwieldy biplanes of World War 1 and the 1920s to the sleek very fast aircraft of 1940 and beyond. It is a story that also has something to tell us about contemporary software development and optimization.</p>
<p><span id="more-355"></span></p>
<p>The Spitfire development starts with an excellent basic architecture for the aircraft, including the wing shape, the long nose, and the low pilot position. This offered a good basic design that lasted until the early jet era, and that was still competitive in 1945 (with lots of upgrades to engines and armament and equipment along the way).  What is truly fascinating is the detail work that went into turning that good design into a practical fighter aircraft.</p>
<p>Especially the performance, measured in terms of simple top speed. Here, the engineers fought a constant battle between the requirements of armament and electronics and engines and the need for as clean and streamlined an outline as possible. It was a constant attention to little details: adding things, testing their effect, redesigning or scrapping features. It is very similar to how we develop software today, where adding features might help some users &#8212; but often at the price of more complexity, longer critical paths, and lower absolute performance.</p>
<p>For example, early prototypes had a simple skid instead of a full tail wheel, as that skid was more streamlined and cheaper to build. But flying from a hard-surface airfield required a wheel, and in the end the Spitfire had to use a retractable rear wheel, which was not in the original design (RAF had decided to upgrade their airfields, and this requirement was introduced fairly late in the process). It cost some weight and complexity, but increased top speed by several miles per hour. Once again, we see the same kind of pattern in software development: you can gain performance at the cost of complexity somewhere else. Advanced optimizations tend to rely on quite complicated techniques to make the common case fast, where simpler implementations feature lower performance but shorter development time.</p>
<p>Simulation was also used in a very clever way. In one series of experiments, the question was asked whether simpler cheaper rivets with domed heads could be used instead of complicated flush rivets. To check this, they glued pea halves to a prototype aircraft to simulate various configurations, and in the end concluded that it was fine to have domed rivets on most of the body, but that the wings absolutely required flush rivets. Very ingenious experimentation I think. And a story that should be familiar to anyone who has done some optimization work on real-world software: some things that seem &#8220;necessary and right&#8221; actually do not have the expected benefit for the cost (flush rivets on the body), while others are crucial (flush rivets on the wings). You typically do not know until you have tried. Just guessing is usually a bad guide (I just read <a href="http://www.embedded.com/design/opensource/212100638;?pgno=2">an article at Embedded.com </a>about the misguided attempt to establish an &#8220;Embedded C++ subset&#8221; in the mid-1990s that is a perfect example of this).</p>
<p>Other examples of the battle for speed were that adding radio aerials reduced speed by a few mph, as did the addition of extra cooler air intakes for stronger engines. On the other hand, a stronger engine also increased speed, so it was a good tradeoff in the end. There was a short-lived little air intake to provide driving air flow to cockpit electronics that cost a few mph and was promptly removed. It is actually quite fascinating to see in all aircraft of this era how little bumps and protrusions can have a significant impact on speed and performance. It was a case of &#8220;death by a thousand cuts&#8221;, as each little feature by itself can seem insignificant, but the total effect is dramatic. Here we also see a modern analogy in software optimization: while undergraduate courses in software teach you to identify &#8220;the big bottleneck&#8221; and &#8220;use a better algorithm&#8221;, most real-world software has no primary bottleneck. And here, just improving little things all over the place will have a pretty major aggregate impact. <a href="http://www.twit.tv/167">The Twit Podcast #167 has a discussion on how this is the case for Windows 7, </a>where Microsoft has made big strides in performance by a lot of small improvements.</p>
<p>Thus, for software, you can also get &#8220;life by a thousand cuts&#8221;: by cutting out a thousand little pieces of overhead you can make your software way more lively.</p>
<p>In my world of computer simulators and virtualization solutions, this is a very familiar scenario. There are sound basic architectures (and less so), and for each type of architecture, the quality of implementation can make a marked difference in performance (and stability). I recently published a <a href="http://www.virtutech.com/whitepapers/simics_speed.html">white paper on some of these aspects for Simics</a>, which I think is a good example of a Spitfire-style design with a good basic architecture and lots of detail work to really make performance shine.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/355/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A Few Parallel EDA Tools</title>
		<link>http://jakob.engbloms.se/archives/324?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/324#comments</comments>
		<pubDate>Wed, 29 Oct 2008 12:48:58 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[multicore computer architecture]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[parallelized software]]></category>
		<category><![CDATA[SPICE]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=324</guid>
		<description><![CDATA[I keep looking out for interesting examples of parallel software, and there is a constant trickle of these. This past week I spotted a couple of new ones in the EDA field: SPICE simulation and chip timing analysis. Mentor Graphics Olympus-SoC Richard Goering at SCDSource has a good write-up of a recent announcement from Mentor Graphics [...]]]></description>
			<content:encoded><![CDATA[<p>I keep looking out for interesting examples of parallel software, and there is a constant trickle of these. This past week I spotted a couple of new ones in the EDA field: SPICE simulation and chip timing analysis.</p>
<p><span id="more-324"></span></p>
<h2>Mentor Graphics Olympus-SoC</h2>
<p>Richard Goering at SCDSource has <a href="http://www.scdsource.com/article.php?id=315">a good write-up of a recent announcement from Mentor Graphics</a> on a parallelized version of the Olympus-SoC tool suite for timing analysis. The best bit is the description of how they found parallelism in what used to be a serial program: they went down to very small components of the overall computation, and did a data-flow analysis to find independent atomic units to compute on in parallel. Here, fine-grained is the key to finding lots of parallelism, while using larger units does not work as well.</p>
<p>Quoting the article:</p>
<blockquote>
<div>“If you don’t work at the atomic level, it is very difficult to come up with tasks that are not dependent on each other,” Srinivas said. “We collect a lot of tasks, and we just keep all the cores busy all the time.” The goal, he said, is “minimal starvation” so that individual CPUs are not starved for tasks.</div>
<div>A key technology that makes this possible is what Mentor calls “pin levelization.” With this approach, each node is assigned a level number. If another node has a higher number, there is a possible dependency. Pins at the same level, however, are independent, and their tasks can be collected together into one heterogeneous chunk.</div>
</blockquote>
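<p>The levelization idea in the quote can be sketched in a few lines of Python. To be clear, this is my guess at the general technique (give each node a level one higher than the highest level among the nodes it depends on), not Mentor's actual algorithm, and <code>levelize()</code> and <code>batches_by_level()</code> are hypothetical names. It assumes the dependency graph is acyclic, and the recursion would need to be replaced by an explicit worklist for very deep graphs.</p>

```python
from collections import defaultdict


def levelize(fanin):
    """Assign each node 1 + the highest level among its fan-in (0 if none).

    `fanin` maps a node name to the list of nodes it depends on.
    Assumes the graph has no cycles.
    """
    levels = {}

    def level_of(node):
        if node not in levels:
            preds = fanin.get(node, [])
            levels[node] = 1 + max((level_of(p) for p in preds), default=-1)
        return levels[node]

    for node in fanin:
        level_of(node)
    return levels


def batches_by_level(fanin):
    """Group nodes into batches; nodes in one batch do not depend on each other."""
    grouped = defaultdict(list)
    for node, level in levelize(fanin).items():
        grouped[level].append(node)
    return [sorted(grouped[level]) for level in sorted(grouped)]
```

<p>For a graph where <code>c</code> depends on <code>a</code> and <code>b</code>, <code>d</code> depends on <code>a</code>, and <code>e</code> depends on <code>c</code> and <code>d</code>, the batches come out as <code>a, b</code>, then <code>c, d</code>, then <code>e</code>. Everything inside one batch can be handed to the worker cores at once.</p>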
<p>Go read the rest of it for nice illustrations and more background.</p>
<h2>Gemini SPICE Simulator</h2>
<p><a href="http://www.chipdesignmag.com/payne/">Daniel Payne at Chip Design writes about another fast SPICE simulator.</a> Not as much detail here, but very nice graphs from the Gemini marketing folks. Not that they could not have been done in 2D with better information density, though. SPICE simulation would seem to be fairly parallelizable, which is not too surprising, considering the inherent parallelism of the domain. But as always, implementing a program to take advantage of such domain parallelism can be harder than expected if you did not do it from scratch. Which is what the Gemini people did, apparently.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/324/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cadence on Virtual Prototypes instead of Host Execution</title>
		<link>http://jakob.engbloms.se/archives/308?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/308#comments</comments>
		<pubDate>Sun, 19 Oct 2008 21:40:37 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[ESL]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[virtualization]]></category>
		<category><![CDATA[blog commentary]]></category>
		<category><![CDATA[Cadence]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=308</guid>
		<description><![CDATA[Cadence technical blogger Jason Andrews wrote a short piece a couple of days ago on his perception that host-based execution is becoming unnecessary thanks to fast virtual platforms. In &#8220;Is Host-Code Execution History&#8221;, he tells the story of a technique from a long time ago where a target program was executed directly on the host, and [...]]]></description>
			<content:encoded><![CDATA[<p>Cadence technical blogger <a href="http://www.cadence.com/community/posts/jasona.aspx">Jason Andrews </a>wrote a short piece a couple of days ago on his perception that host-based execution is becoming unnecessary thanks to fast virtual platforms. In &#8220;<a href="http://www.cadence.com/Community/blogs/sd/archive/2008/10/17/is-host-code-execution-history.aspx">Is Host-Code Execution History</a>&#8221;, he tells the story of a technique from a long time ago where a target program was executed directly on the host, and memory accesses captured and passed to a Verilog simulator. The problem being solved was the lack of a simulator for the MIPS processor in use, and the solution was pretty fast and easy to use. Quite interesting, and well worth a read.</p>
<p>However, like all host-compiled execution (which I also like to call API-level simulation) it suffered from some problems, and virtual platforms today might offer the speed of host-compiled simulation without all the problems.</p>
<p><span id="more-308"></span></p>
<p>The problems are these:</p>
<blockquote><p><span id="anormal_12" class="Cadence_CS_BlogDetail_BlogText">Most companies that are using host-code execution today use &#8220;explicit access&#8221;.  This means they require all places in the code that access the hardware to call read() and write() functions so every hardware access goes through a common set of functions and then they use #ifdef to change the hardware accesses to call the simulator if they are doing verification with host-code execution. If they are running on the target system, then pointer dereferences are used. </span></p>
<p>&#8230;</p>
<p><span id="anormal_12" class="Cadence_CS_BlogDetail_BlogText">This is where implicit access came in. It provided a way to automatically trap pointer dereferences that were reading and writing to hardware locations and convert the load or store instruction into a simulated read or write. For reads it would put the result into the proper host CPU register and the user had no idea that a line of C code would magically turn into a bus transaction on a Verilog BFM</span></p></blockquote>
<p>Yes, that is a right pain, and I have seen lots of solutions for it, none of which have the elegant simplicity of a processor simulation. The &#8220;implicit access&#8221; system is basically trying to trap memory accesses without overtly changing the source code of a program. I guess the best way to do this is binary instrumentation, but it is still very hard to get to work right and robustly. A simulator is simply much simpler in principle here.</p>
<p>Jason continues later on:</p>
<blockquote><p><span id="anormal_12" class="Cadence_CS_BlogDetail_BlogText">Given the hassle of host-code execution I would prefer to cross compile the software and run the target instruction set. Beyond the implicit or explicit access issue, this also eliminates issues with differences in data type sizes, data structure layout, byte order (endianess) and other differences between the host and target processor. </span></p></blockquote>
<p>That is absolutely true! Jason does not mention the additional fun of what happens when the target is running an OS that is happily fielding interrupts, scheduling software tasks, etc. Also, having to maintain a separate build target and maybe code variant is very expensive, process-wise. The expense that a good virtual platform incurs can be paid for pretty quickly once such reduced friction costs are factored in.</p>
<p>So I guess I pretty much agree with all that Jason is saying, and thank him for mentioning <a href="http://www.virtutech.com/products">Simics</a>. Thanks for the insights into what was done in the 1990s; it is always interesting to get pointers to old fundamental and interesting work.</p>
<p>About how the virtual platforms actually work inside: it is not that complicated in principle (but pretty hairy to get it quite right and fast in practice). You have to simplify the timing of the target processor, you have to convert from target processor binaries to host binary format using some kind of just-in-time compilation technique (also called dynamic binary translation or code morphing), and you have to provide some kind of direct access to target memory for the target processor simulation (like the DMI feature in <a href="http://systemc.org">SystemC TLM-2.0</a>, but usually the difficult bits are on the CPU side of that, not the memory side). The most interesting bit is how to build the surrounding system model to not slow the CPU model down, and for this I can recommend a couple of pieces of writing:</p>
<ul>
<li>My ESC 2008 general intro to the subject of virtual prototypes (<a href="http://www.engbloms.se/presentations/engblom-ESC2008-class410-simulation-slides.pdf">slides</a>, <a href="http://www.engbloms.se/publications/engblom-ESC2008-class410-simulation-paper.pdf">paper</a>)</li>
<li>Virtutech white paper on <a href="http://www.virtutech.com/whitepapers/modeling.html">system modeling </a></li>
</ul>
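<p>For a feel of the translate-once-and-reuse idea behind dynamic binary translation, here is a toy Python sketch. It is only an analogy: a real translator emits host machine code for whole blocks, while this one caches a Python closure per target instruction. The three-instruction target ISA (<code>addi</code>, <code>dec_jnz</code>, <code>halt</code>) and the <code>TranslatingCpu</code> class are invented for the example.</p>

```python
class TranslatingCpu:
    """Toy CPU model that translates each target instruction once and caches it."""

    def __init__(self, program):
        self.program = program  # list of (opcode, argument) tuples
        self.cache = {}         # target pc -> translated host callable
        self.translations = 0   # how many times we had to translate
        self.acc = 0
        self.counter = 0

    def _translate(self, pc):
        """Decode the instruction at pc into a closure returning the next pc."""
        self.translations += 1
        op, arg = self.program[pc]
        if op == "addi":
            def run():
                self.acc += arg
                return pc + 1
        elif op == "dec_jnz":
            def run():
                # Decrement the counter and jump to arg while it is nonzero.
                self.counter -= 1
                return arg if self.counter != 0 else pc + 1
        elif op == "halt":
            def run():
                return None
        else:
            raise ValueError(f"unknown opcode {op!r}")
        return run

    def run(self, pc=0):
        """Execute from pc until a halt, reusing cached translations."""
        while pc is not None:
            handler = self.cache.get(pc)
            if handler is None:
                handler = self.cache[pc] = self._translate(pc)
            pc = handler()
```

<p>Running a three-instruction loop body three times executes seven instructions but translates only three, which is the whole point: the decode cost is paid once per instruction, not once per execution.</p>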
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/308/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
