<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Observations from Uppsala &#187; ESL</title>
	<atom:link href="http://jakob.engbloms.se/archives/category/eda/esl/feed" rel="self" type="application/rss+xml" />
	<link>http://jakob.engbloms.se</link>
	<description>Computer Technology: Simulation, Virtualization, Virtual Platforms, Embedded, Multicore and Multiprocessing (by Jakob Engblom)</description>
	<lastBuildDate>Sun, 29 Jan 2012 19:45:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<image>
    <title>Observations from Uppsala</title>
    <url>http://jakob.engbloms.se/favicon.png</url>
    <link>http://jakob.engbloms.se</link>
    <width>32</width>
    <height>32</height>
    <description>Observations from Uppsala - http://jakob.engbloms.se</description>
    </image>		<item>
		<title>Register Design Languages &#8211; DSL or not?</title>
		<link>http://jakob.engbloms.se/archives/1462?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1462#comments</comments>
		<pubDate>Wed, 27 Jul 2011 20:20:25 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[ESL]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Domain-specific languages]]></category>
		<category><![CDATA[Gary Stringham]]></category>
		<category><![CDATA[Register Design Languages]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1462</guid>
		<description><![CDATA[Recently, Gary Stringham has been running a series of interviews with providers of register design tools on his website. Register design tools seems to be an active area with several small companies (and some open-source tools) fighting for the market. I have written about Gary Stringham and register designs before, and it is an area [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/04/gears1.png"><img class="alignleft size-full wp-image-737" style="margin: 5px 10px;" title="gears1" src="http://jakob.engbloms.se/wp-content/uploads/2009/04/gears1.png" alt="" width="56" height="57" /></a>Recently, Gary Stringham has been running a series of <a href="http://www.garystringham.com/rdt.shtml">interviews with providers of register design tools on his website</a>. Register design tools seems to be an active area with several small companies (and some open-source tools) fighting for the market. I have written about Gary Stringham and register designs <a href="http://jakob.engbloms.se/archives/358">before</a>, and it is an area that keeps fascinating me. There is something about the task of register design that keeps it separate from the main hardware design languages, tools, and flows.The different approaches taken by the tools supporting the register design task also illustrates some points about programming language standards, domain-specific languages, and exchange formats that I want to address.</p>
<p><span id="more-1462"></span>First of all, what makes register design such a special task? My guess is because it is a higher-level aspect of the design: it  describes an interface, which can be implemented in many different ways  in hardware. Regular HDLs do not help you reason about register layouts in a good way. The register specification should also be used in other places than the hardware: in address definitions for device drivers, in manuals, and other documentation. All of this indicates that the information needs to be explicit, high-level, and in a format that facilitates automatic processing by tools. It basically screams for a domain-specific language (DSL) at some level of ambition (read my old post about how <a href="http://jakob.engbloms.se/archives/747">languages are grown from problems </a>for more background on just how simple a DSL can be).</p>
<p>If you look at the register design tool vendors that Gary interviewed, you can essentially see two different approaches to how to support the task of register design. In particular, the input language differs quite radically between the tools.</p>
<p>There is <em>domain-specific programming language</em> approach. A register-design DSL makes it easy to work with register designs, and pretty hard to do anything else (imagine writing a sorting algorithm in a register-design language &#8211; I don&#8217;t think it can be done, and if it can be done, the result has to be absolutely contorted). The DSL approach assumes a willingness  on the part of a user to learn a new language, but it seems that for this domain, the languages are simple enough that learning them saves you time in the end. Especially if you maintain large register maps that keep changing or is updated and extended across generations of chips (indeed, ease of maintenance is an undersold aspect of most DSLs in my experience).</p>
<p>The underlying assumption of the DSL approach is that the entry language is not too important, as long as you can export data into other formats. To me, this makes sense. From a single source with sufficient descriptive power, you can ideally generate both HDL code to implement the decoder in hardware, as well as documentation, header files, and maybe even skeletons for virtual platform models. And standard formats like IP-XACT to feed into any other tool.</p>
<p>It does not matter if there is a single tool using each input language, since the outputs are what matters. This leaves the vendors free to invent on the input side, and provide a really powerful tool.</p>
<p>The second approach is <em>transformation-based</em>. Its idea is to not use a dedicated input language, but rather powerful import functions to read whatever specifications already exist in text files, Microsoft Word documents, Microsoft Excel spreadsheets, Framemaker documentation files, IP-XACT, or whatever you can come up with.  The assumption is that the tool is really about transforming data and not about compiling a language. Register designs kind of already exist, and since there is no standard language to compile, you just import whatever there is. It makes the assumption that the user base is not willing to learn a new language to handle register design. A funky side-effect is that we might end up actually doing programming work in <a href="http://www.garystringham.com/rdt/AgnisysInterview.shtml">Word docs</a>.</p>
<p>Still, the transformation approach ends up generating the same outputs as the DSL approach. In  both cases, some of the outputs can be considered &#8220;industry standards&#8221;, like IP-XACT, while others are decidedly ad-hoc, like Excel sheets. To me, this demonstrates that the true value in standards is to enable tool interoperability, and not so much as a way to provide inputs.</p>
<p>Indeed, there is a clear difference between good input languages and good exchange formats. An input language tends to drive for expressive power and human readability, and it is OK to have a heavy compilation process associated with it. An exchange format like IP-XACT does not need to be human-readable, nor efficient to code in. It needs to be easy to parse and compile, so that as many tools as possible can work on it.</p>
<p>A typical example from the field of register design is dealing with repeated groups of registers (such as banks of registers for a DMA channel) and parametrized register designs. The input languages used by the tools that Gary Stringham covers all allow this &#8211; while the standard for register design exchange, IP-XACT, only uses a flat compiled map. It just lists all registers with sizes and offsets, not how these offsets were arrived at or if there is some logical grouping or iterated structure. Quite OK for a tool to work on, but not very useful as a repository for the actual information.</p>
<p>This distinction is worth to keep in mind.</p>
<p>I definitely side with the DSL idea, as that fits my idea of a good tool, but I can see why many people find the transformation from other formats attractive. In all cases, the goal is to have an original design specification that is as clear, succinct, and flexible as possible.</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1462"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1462" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1462" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1462/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>EETimes: James Aldis on Performance Modeling</title>
		<link>http://jakob.engbloms.se/archives/1387?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1387#comments</comments>
		<pubDate>Thu, 03 Mar 2011 20:13:03 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[hardware design]]></category>
		<category><![CDATA[hardware modeling]]></category>
		<category><![CDATA[James Aldis]]></category>
		<category><![CDATA[OMAP]]></category>
		<category><![CDATA[performance optimization]]></category>
		<category><![CDATA[TI]]></category>
		<category><![CDATA[Virtio]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1387</guid>
		<description><![CDATA[James Aldis of TI has published an article in the EEtimes about how Texas Instruments uses SystemC in the modeling of their OMAP2 platform. SystemC is used for early architecture modeling and performance analysis, but not really for a virtual platform that can actually run software. The article offers a good insight into the virtual [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/03/TI-logo.png"><img class="alignleft size-full wp-image-1388" style="margin: 5px 10px;" title="TI logo" src="http://jakob.engbloms.se/wp-content/uploads/2011/03/TI-logo.png" alt="" width="80" height="76" /></a>James Aldis of TI has published an article in the <a href="http://www.eetimes.com">EEtimes</a> about how <a href="http://www.eetimes.com/General/DisplayPrintViewContent?contentItemId=4212778">Texas Instruments uses SystemC in the modeling of their OMAP2 platform</a>. SystemC is used for early architecture modeling and performance analysis, but not really for a virtual platform that can actually run software. The article offers a good insight into the virtual platform use of hardware designers, which is significantly different from the virtual platform use of software designers.<br />
<span id="more-1387"></span>For a software person like myself, this article offers a well-written  insight into the world of hardware design and bus optimization for SoCs.</p>
<p>TI deploys two totally different platforms for hardware and software development, which makes perfect sense.  The goals are so different between a high-speed software development platform and performance-accurate hardware design platform that trying to force them together would likely just create a bad compromise that is bad for everybody.</p>
<p>Additionally, FPGAs are used to create timing-dependent low-level code, where you need both timing accuracy and decent speed.  It is worth noting that the performance model is mostly &#8220;dataless&#8221; &#8211; it models the timing of actions and their dependencies, but not their values and computations.</p>
<blockquote><p>The different models serve different purposes, require different levels of effort to use, and become available at different times during the project. The SystemC performance model is always available first and is always the simplest to create and use. The virtual platform is the next to become available. It is used for software development and has very little timing accuracy.  TI uses Virtio technology to create this model rather than SystemC.</p></blockquote>
<p>Given the number of ultimately failed attempts I have seen at making timing and function available in the same model but as orthogonal concerns, this observation in the article is very insightful:</p>
<blockquote><p>It would appear the choice of two different technologies for the virtual platform and the performance model is inefficient, wasting potential code reuse. However, the two have completely different (almost fully orthogonal) requirements, and at module level almost no code reuse is possible.</p></blockquote>
<p>Maybe this is an impossible dream in the general case.</p>
<p>One somewhat surprising statement in the article is that there is no real software available to use in the SoC design phase. Often, virtual platforms are sold as being able to use &#8220;the real software&#8221; when designing hardware. But in the case of TI, the software is mostly written by their customers, with little available for TI to use. Thus, they are forced to design their own test cases to drive the hardware design process.</p>
<blockquote><p>The requirements on the simulation technology are first and foremost ease in creating test cases and models and credibility of results. The emphasis on test-case creation is a consequence of the complexity of the devices and of the way in which an SoC platform such as OMAP-2 is used: because the whole motivation is to be able to move from marketing requirements to RTL freeze and tape-out in a very short time; and because in many cases large parts of the software will be written by the end customer and not by the SoC provider (Texas Instruments, in this article), the performance-area-power tradeoff of a proposed new SoC must be achieved without the aid of &#8220;the software.&#8221;</p></blockquote>
<p>The platform they built is all based on clock-cycle-level interfaces (CC), which is very natural when the primary use case is hardware design.</p>
<p>The primary component optimized in the TI design process is the on-chip interconnect structure, called the &#8220;NoC&#8221; in the article. Each SoC variant is built from a set of (usually already existing) devices and processor cores. The main work of the integration is designing an appropriate NoC for the SoC. The NoC design is crucial to the actual performance level the final SoC product will have.</p>
<p>By playing with the topology, the level of concurrency, and the level of pipelining in the NOC, it&#8217;s possible to create SoCs from the same basic modules with quite different capabilities.</p>
<p>The only real instruction-set simulators used are CC-level models of DSPs, used for software optimization taking but contention into account. No models of the ARM control cores are used. Mostly, processors are represented by stochastic or trace-driven traffic generators that put transactions on buses but do not actually run any real code.</p>
<p>The stochastic processor models are very powerful and provide traffic that is very similar to a real processor.  A very elegant property of such models is that it is very easy to change the parameters of the model to model quite different software/processor scenarios. Compared to writing real test programs for a full ISS, this is much faster and allows for the exploration of more alternatives.</p>
<p>The stochastic models are used along side function-graph breakdowns of software, essentially models that say that an application does A, then B, then C, and that maybe D can happen in parallel. This model of an application is connected to the hardware simulation and can control when things happen and what goes on in parallel. It amounts to a simple model of what an RTOS would do, to some extent.</p>
<p>Configurability is a key theme throughout the OMAP architecture exploration platform. SystemC being what it is, it is limited to configuration at start-up time, but that is perfectly sensible for an architecture exploration use case where you want to setup and platform and test its performance. Dynamic reconfiguration during a run is not that important.  TI has spent a great deal of effort in making the system easy to configure using parameter files.</p>
<p>The article goes into many more fascinating details on the models used.  I can only say one thing: read it, if you have any interest in these kinds of issues.</p>
<p>Good work, James!</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1387"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1387" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1387" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1387/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Interrupts and Temporal Decoupling</title>
		<link>http://jakob.engbloms.se/archives/1384?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1384#comments</comments>
		<pubDate>Sun, 27 Feb 2011 21:09:17 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[books]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Grant Martin]]></category>
		<category><![CDATA[interrupt]]></category>
		<category><![CDATA[Temporal decoupling]]></category>
		<category><![CDATA[Tensilica]]></category>
		<category><![CDATA[virtual]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1384</guid>
		<description><![CDATA[I am just finishing off reading the chapters of the Processor and System-on-Chip Simulation book (where I was part of contributing a chapter), and just read through the chapter about the Tensilica instruction-set simulator (ISS) solutions written by Grant Martin, Nenad Nedeljkovic and David Heine. They have a slightly different architecture from most other ISS [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/04/gears1.png"><img class="alignleft size-full wp-image-737" style="margin: 5px;" title="gears1" src="http://jakob.engbloms.se/wp-content/uploads/2009/04/gears1.png" alt="" width="56" height="57" /></a>I am just finishing off reading the chapters of the <a href="http://www.springer.com/engineering/circuits+%26+systems/book/978-1-4419-6174-7" target="_self"><em>Processor and System-on-Chip Simulation </em></a>book (where <a href="http://blogs.windriver.com/engblom/2011/01/processor-and-soc-simulation-book.html">I was part of contributing a chapter</a>), and just read through the chapter about the <a href="http://www.tensilica.com">Tensilica </a>instruction-set simulator (ISS) solutions written by <a href="http://www.chipdesignmag.com/martins/">Grant Martin</a>, Nenad Nedeljkovic and David Heine. They have a slightly different architecture from most other ISS solutions, since that they have an inherently variable target in the configurable and extensible Tensilica cores. However, the more interesting part of the chapter was the discussion on system modeling beyond the core. In particular, how they deal with interrupts to the core in the context of a <a href="http://jakob.engbloms.se/?s=temporal+decoupling">temporally decoupled </a>simulation.</p>
<p><span id="more-1384"></span>This is a small detail, but one where I have always had a feeling that some fundamental assumption was missing in my discussions with various people from the hardware design community. It always seemed that hardware designers assumed a different basic design &#8211; and Grant Martin explained it very well just what that was. They only check for interrupts at the beginning of a time slice. Which makes interrupts less precise  versus the code, but also makes the core interpreter fairly simple since all it has to do is to churn through instructions.</p>
<p>There is another solution, which is employed in Simics, where the processor can take an interrupt at any point in a time quantum. To do this, the processor needs to be aware of what is going to happen. The essentials of the solution is to have devices call the processor and tell it that they intend to interrupt it at some point T in time. The processor simulator then makes sure to stop and give the device model a chance to act at that exact point in time. It is obvious that this solution is easily generalized to cover all time callbacks needed to drive device work. A significant part of the responsibility for running the event-driven simulation is moved into the processor core.</p>
<p>Making the event queue visible to the processor also gives the processor a chance to hypersimulate, or skip idle  time. Since it knows the next point in time that something will happen  (either the end of a time quantum or an event posted by a device), it  can very easily, safely, and <a href="http://blogs.windriver.com/engblom/2010/09/deterministic-but-unpredictable.html">repeatably </a>jump forward in time without any  impact on simulation semantics.</p>
<p>When dealing with multiple processors, this means that each processor will have precise interrupts from the devices that are close to it. Timers and IO interrupts tend to work closely with a certain processor for a prolonged period of time. Interrupts between processors suffer a time-quantum delay sometimes, but that is no worse than the solution of checking all interrupts at time-quantum boundaries.</p>
<p>Qemu uses a solution which is a mix of the two. <a href="http://www.usenix.org/event/usenix05/tech/freenix/full_papers/bellard/bellard.pdf">According to the 2005 Usenix paper</a>, devices do call into the processor to announce an interrupt, but this is handled by &#8220;soon&#8221; returning to the processor main loop. Processors are not responsible for keeping track of interrupts, making it very imprecise and not very repeatable when interrupts will happen.</p>
<p>Thus, we can see that there are a few different ways to implement interrupts in virtual platforms. Each approach comes from a different tradition and features different trade-offs.</p>
<p>I was a bit surprised by the comment in the Tensilica chapter that only  correctly synchronized programs will work on a temporally decoupled  simulation. In my experience, temporal decoupling is transparent to software functionality &#8211; all software runs. The perceived timing of operations can be different, and some tightly-coupled code might behave in suboptimal ways, but it certainly runs and works. And lets you <a href="http://blogs.windriver.com/engblom/2010/06/true-concurrency-is-truly-different-again.html">observe  parallel code errors</a>.</p>
<p>Temporal decoupling is necessary in any fast platform, and its effect on semantics are really minor. With the simple tweak of having a processor know when interrupts might happen, it will also not affect the device-processor interface very much, maintaining very tight synchronization between processors and their controlled hardware.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1384"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1384" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1384" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1384/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Describe is not the same as Design</title>
		<link>http://jakob.engbloms.se/archives/1083?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1083#comments</comments>
		<pubDate>Mon, 15 Feb 2010 20:56:41 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[DML]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1083</guid>
		<description><![CDATA[The discussion on my previous blog post about &#8220;the ideal ESL language&#8221; made me think some more about the purpose of a hardware modeling or description language. If you look closely, you realize that there are two quite different goals being pursued by the tools and languages discussed there. On one hand, we have the [...]]]></description>
			<content:encoded><![CDATA[<p>The discussion on my previous blog post about &#8220;<a href="http://jakob.engbloms.se/archives/1008">the ideal ESL language</a>&#8221; made me think some more about the purpose of a hardware modeling or description language. If you look closely, you realize that there are two quite different goals being pursued by the tools and languages discussed there.</p>
<p>On one hand, we have the task of supporting the design of new hardware bits, for the purpose of creating it. On the other hand, we have the task of describing a particular design for the purpose of simulating it. These two are not necessarily the same.</p>
<p><span id="more-1083"></span>To use an <a href="http://jakob.engbloms.se/archives/1035">analogy with building a house</a>, a design language helps the architect create the house (piece of hardware). Since the architect relies on craftsmen and experts (compilers) to do detailed design (how to put in windows, where to put light switches, etc.), the high-level description does not contain all the details of the house. However, if you are trying to simulate the house (piece of hardware) so that its inhabitants (software) don&#8217;t see the difference to the real thing, the details are sometimes what matters most. For example, the precise way to operate the stove in the house is very important for familiarity, but is a detail most likely left out of the architect&#8217;s initial drawings.</p>
<p>A design language can leave many things unspecified to be filled in by a compiler, but these things can be absolutely core to a description language. In particular, programming register maps tend to be created as a not-too-important side activity in hardware design. They do not really need to be visible in higher-level ESL languages, as they can obviously be filled in later by a tool or a human. But for a description language, they are absolutely core.</p>
<p>A description language can also leave out many parts of the hardware. If the software being used or written does not use certain modes or functions of a piece of hardware, those pieces can be ignored and implemented as dummies. That means that support for dummies is very important in description languages. But dummies make little sense in a design language, as you are unlikely to design a chip with lots of area spent on dummy functions that do nothing.</p>
<p>A description language can also ignore crucial aspects like power constraints and synthesis constraints. These are guidelines for a compilation step that has no bearing on the description of the hardware &#8212; the description language should describe what ended up happening, not the if, please, what, and buts that guided how we got there.</p>
<p>For virtual platform creation, you seem to need a bit of both. I maintain that most of a VP is based on old hardware that exists, which calls for languages with strong description abilities. That&#8217;s the space that <a href="http://jakob.engbloms.se/archives/99">Simics DML </a>was designed for. For the small part of the hardware that is novel would be nice to have some way to convert from a design language to a virtual platform. Here, I don&#8217;t really see any usable current tools or languages &#8212; SystemC is really more a design language, but if you want a virtual platform model, you have to use it as a description language. There is no automagic getting to a fast abstract model from a design-oriented description. That&#8217;s why we need new, higher level systems, that can push out decent descriptions from a design.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1083"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1083" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1083" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1083/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Neat Register Design to Avoid Races</title>
		<link>http://jakob.engbloms.se/archives/1070?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1070#comments</comments>
		<pubDate>Thu, 28 Jan 2010 18:59:53 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[embedded software]]></category>
		<category><![CDATA[embedded systeme]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[64-bit computing]]></category>
		<category><![CDATA[device driver]]></category>
		<category><![CDATA[Gary Stringham]]></category>
		<category><![CDATA[high-level synthesis]]></category>
		<category><![CDATA[programming register]]></category>
		<category><![CDATA[race condition]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1070</guid>
		<description><![CDATA[In his most recent Embedded Bridge Newsletter, Gary Stringham describes a solution to a common read-modify-write race-condition hazard on device registers accessed by multiple software units in parallel. Some of the solutions are really neat! I have seen the &#8220;write 1 clears&#8221; solution before in real hardware, but I was not aware of the other [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-589" style="margin: 5px 10px;" title="racecondition" src="http://jakob.engbloms.se/wp-content/uploads/2008/01/racecondition.png" alt="racecondition" width="99" height="78" />In his most recent <a href="http://garystringham.com/newsletter.shtml?nid=039">Embedded Bridge Newsletter</a>, Gary Stringham describes a solution to a common read-modify-write race-condition hazard on device registers accessed by multiple software units in parallel. Some of the solutions are really neat!</p>
<p>I have seen the &#8220;write 1 clears&#8221; solution before in real hardware, but I was not aware of the other two variants. The idea of having a &#8220;write mask&#8221; in one half of a 32-bit word is really clever.</p>
<p>However, this got me thinking about what the fundamental issue here really is.</p>
<p><span id="more-1070"></span></p>
<p>As I see it, it is the fact that the processor cannot address small enough units atomically. The <a href="http://garystringham.com/newsletter.shtml?nid=037">read-modify-write that was used to start the discussion in the Embedded Bridge #37</a> was needed in order to get the current state of a configuration register, change some setting that only occupied a few bits in it, and write back the result to the register. The way most configuration registers that I have seen in practice works.</p>
<p>But if each setting could be given its own register, the problem would go away. Each operation would target a unique address, achieving the same effect as the bit-wise masks or write-1 solutions proposed. The core problem is that hardware tends to share settings into registers, as it has been considered too expensive to put information that might cover a range as small as [0,1] into a 32-bit register. Probably, since there is a lack of addresses for registers, you cannot have 1000 settings cause each simple device to use up 1000 words of physical addresses.</p>
<p>But is that really an issue, if we look forward?</p>
<p>It seems to me that, as 64-bit instruction sets and addressing systems penetrate down into more and more embedded systems, a simple solution would be to throw address space at the problem. I don&#8217;t think it is uneconomical to allocate huge chunks of memory space to each device, giving each setting its own register, when you have 64 bit virtual addresses to work with. There is no way you can fill up a physical memory system (guess that will some day come back to haunt me)&#8230; even the highest-end machines today only use something like 40 bits for actually addressing physical memories.</p>
<p>The software would be simpler and more robust, with virtually no cost.</p>
<p>Another solution that I have also seen starting to appear is to dispense with register settings altogether, and rather define a command API that the processor &#8220;calls&#8221; by putting in command packets into some memory area. This does require quite a bit of silicon for a decoder, but it provides for a much higher level of interaction with devices. As hardware devices get defined in successively higher-level languages (C, C++, UML, MatLab, &#8230;), and <a href="http://jakob.engbloms.se/archives/871">their programming interfaces and associated drivers get autogenerated</a>, this solution makes eminent sense.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1070"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1070" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1070" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1070/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>CoWare SystemC Checkpointing</title>
		<link>http://jakob.engbloms.se/archives/1039?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1039#comments</comments>
		<pubDate>Tue, 29 Dec 2009 12:02:24 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Checkpointing]]></category>
		<category><![CDATA[CoWare]]></category>
		<category><![CDATA[Mambo]]></category>
		<category><![CDATA[Reiner Leupers]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[SimOS]]></category>
		<category><![CDATA[Stefan Kraemer]]></category>
		<category><![CDATA[SystemC]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1039</guid>
		<description><![CDATA[Continuing on my series of posts about checkpointing in virtual platforms (see previous posts Simics, Cadence, our FDL paper), I have finally found a decent description of how CoWare does things for SystemC. It is pretty much the same approach as that taken by Cadence, in that it uses full stores a complete process state [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-735" style="margin: 5px 10px;" title="gears" src="http://jakob.engbloms.se/wp-content/uploads/2009/04/gears.png" alt="gears" width="56" height="57" />Continuing on my series of posts about checkpointing in virtual platforms (see previous posts <a href="http://jakob.engbloms.se/archives/714">Simics</a>, <a href="http://jakob.engbloms.se/archives/817">Cadence</a>, our <a href="http://jakob.engbloms.se/archives/880">FDL paper</a>), I have finally found a decent description of how <a href="http://www.coware.com">CoWare </a>does things for <a href="http://www.systemc.org">SystemC</a>. It is pretty much the same approach as that taken by Cadence, in that it uses full stores a complete process state to disk, and uses special callbacks to handle the connection to open files and similar local resources on a system. The approach is described in a paper called  &#8220;A Checkpoint/Restore Framework for SystemC-Based Virtual Platforms&#8221;, by Stefan Kraemer and Reiner Leupers of RWTH Aachen, and Dietmar Petras, and Thomas Philipp of CoWare, published at the <a href="http://soc.cs.tut.fi/2009/">International Symposium on System-on-Chip, in Tampere, Finland, in October of 2009.</a></p>
<p><span id="more-1039"></span></p>
<p>The approach taken for their checkpointing system is to save the entire state of  the running simulation program (i.e., an entire host operating system process), and later recreate the process in the same state. This gets around the need program all simulation models to explicitly support checkpointing, but it also limits the applicability of checkpointing severely. Of the <a href="http://jakob.engbloms.se/archives/714">checkpointing operations described in my previous post</a>, it only supports the bringing up of a checkpoint into the same model, on the same machine (or a machine that runs a completely identical software stack down to the exact versions of all libraries, drivers, etc., which is not very likely to happen). This is honestly admitted in the paper, I appreciate that.</p>
<p>Process-based checkpointing in this way is also used by Cadence, and just like Cadence&#8217;s solution, the problem that appears when implementing it in practice is how to handle all the OS resources opened by a process. The process itself is not enough. The resources are typically files open for input and output, connections to debuggers, and other things that reach out of the virtual world of the simulation into the real world of the host machine. The solution is just like Cadence&#8217;s solution: provide callbacks for the SystemC side of such connections that can save the state of the connection in some way, and restart the connection when a checkpoint is opened. Doing this right does require some care in what to do in which order, and the paper nicely explains this.</p>
<p>For example, you need to close down all connections before taking a checkpoint, and then restore them after taking it. This is a fairly destructive operation, compared to the Simics-style checkpointing where all you do is interrogate the state of all objects and save that. I had not thought of that before, but it makes perfect sense.</p>
<p>To me, this also points to an important architectural issue in SystemC simulations in general: what are these simulation modules doing opening files on their own anyway? In my opinion, a simulation model should never do such low-level thing, it should operate using only the defined simulation system API and simulation-level connections to other simulation modules. If you need to load test data, do that by putting it into some storage system managed by the simulator itself. In all my professional life, I have considered file I/O to be a fundamental service provided by the application framework I use, nothing that my actual payload code should have to deal with. For example, in Simics, we tend to solve the loading and checkpointing of test data vectors using a memory &#8220;image&#8221;, which supports checkpointing in itself. Then a script just loads a file into the memory image, and a device reads the memory and moves transactions into the simulated system. This means that file I/O is totally absent from the model, and also that the test data can be managed and inspected by existing infrastructure.</p>
<p>The cost of taking a checkpoint is quantified in the paper, and it is not as bad as one could fear. The size of  a checkpoint easily reaches 100s of MB for even small target systems, but saving and loading that today does not take all that long. Still, going to a system with only 16 target processors (8 ARM, 8 DSP, and 8 times 64 MB of target memory = 512 MB) generates a checkpoint taking 1.7 GB to store and 30 seconds to save.</p>
<p>Once again, it is interesting to compare this to the Simics-style approach, where only differences are saved for target memories, and only target memory that is in use needs to be saved at all. This makes for usually far more compact checkpoints, which save and open in a few seconds in most cases. A checkpoint taking 30 seconds to open in Simics is rare indeed, and basically requires a target containing many gigabytes of target memory that is all in use (not host memory).</p>
<p>The paper does raise a novel point for why checkpointing is good: it saves you the time to setup a debugger. It is certainly true that setting up a debug environment for a session does take time, but it is not made clear just how it is recreated. My impression is that really what is going on here is that the debugger remains alive and loaded, and the simulation goes back and forth in time using checkpoints. Which is perfectly valid and nice.</p>
<p>In Simics, we solve this problem in two ways: when it comes to setting up the internal debugger when opening a checkpoint, the solution is to use a script to configure it. On purpose and by design, Simics checkpoints do not store the simulation session state, only the target system state, as that is the only way to be portable over time and across widely different machines. it would be very strange to open a checkpoint a few years after it was created, by some random user, start a simulation, and have it stop just because there was some breakpoint still in place for some long-forgotten reason. In any case, the simulator-side of the debugger has to be adopted in all systems to accept checkpointing.</p>
<p>Another strange point in the paper that I would have liked to ask the authors about is the idea that you can always change the target software in the middle of a session. But what is the value of checkpointing then? Since the idea is saving the cost of booting a machine and setting up a debug session, it is hard to see what is gained by booting a machine, keeping the hardware state, but replacing the software state. The expensive operation is presumably setting up the software state? At least it is in my experience.</p>
<p>I commend the paper for actually running something more than the archetypal &#8220;ARM+DSP&#8221;, by running eight copies of an ARM+DSP subsystem. That is at least starting to look like something interesting to simulate.</p>
<h2>Reviewer Notes for the Paper</h2>
<p>Finally, I have some small critiques on the academic paper itself, of the kind that I tend to provide to paper authors when active as a reviewer for conferences.</p>
<p>The SoC conference paper itself is well-written, but I have point out that the authors for some reason ignore the history of checkpointing in full-system simulation. The seminal work here was the checkpointing system used for changing the level of abstraction from fast functional simulation to cycle-accurate simulation in the SimOS system developed at Stanford <a href="ftp://db.stanford.edu/pub/cstr/reports/csl/tr/94/631/CSL-TR-94-631.pdf">already in the early 1990s</a>. This has later been continued in both IBM Mambo and Virtutech Simics, and remains the most powerful way of doing checkpointing until this day. I don&#8217;t think the authors were completely ignorant of this work, even though finding good references can be a bit difficult. Even so, here are some to add to future versions of that paper:</p>
<ul>
<li><a href="http://portal.acm.org/citation.cfm?id=891451">SimOS: A Fast Operating System Simulation Environment</a>, Stanford CS Technical Report CSL-TR-94-631, 1994. The earliest mention of using checkpointing in a full-system simulator (for switching from fast to detailed simulation).</li>
<li>&#8220;<a href="http://portal.acm.org/citation.cfm?id=1014602">Design and validation of a performance and power simulator for PowerPC systems</a>&#8220;, from the IBM Journal of Research and Development in 2003. Mentions that IBM Mambo does checkpointing like SimOS.</li>
<li><a href="http://parsa.epfl.ch/simflex/publ/per2004.pdf">SIMFLEX: A Fast, Accurate, Flexible Full-System Simulation Framework for Performance Evaluation of Server Architecture</a>, ACM SIGMETRICS Performance Evaluation Review, 2004. Also see the <a href="http://parsa.epfl.ch/simflex/">Simflex </a>homepage. Shows an ambitious use of checkpointing in computer architecture simulations.</li>
</ul>
<p>It is also a bit disingenuous to dismiss the issue of just how the process checkpointing is performed by a reference to an early requirements paper from the <a href="https://ftg.lbl.gov/CheckpointRestart/CheckpointRestart.shtml">BLCR project</a>.  All that says is that there are lots of possible variants of implementation, nothing on what was done in this particular instance. It would have been nice to have had some more details: does the approach use a kernel module or not?</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1039"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1039" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1039" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1039/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The System, Not the Parts</title>
		<link>http://jakob.engbloms.se/archives/1035?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1035#comments</comments>
		<pubDate>Sat, 19 Dec 2009 19:38:22 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[business issues]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Peter Day]]></category>
		<category><![CDATA[podcast commentary]]></category>
		<category><![CDATA[Russel Ackoff]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1035</guid>
		<description><![CDATA[I just listened to the November 16, 2009, issue of the BBC podcast called &#8220;Peter Day&#8217;s World of Business&#8220;. It is a rerun (in memoriam) of an interview with business professor Russell Ackoff, which was originally published in 2007. The main theme of the interview is the need to shift business thinking from small details [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/12/300x300.jpg"><img class="alignleft size-full wp-image-1036" style="margin: 10px;" title="Peter days world of business" src="http://jakob.engbloms.se/wp-content/uploads/2009/12/300x300.jpg" alt="Peter days world of business" width="100" height="100" /></a>I just listened to the November 16, 2009, issue of the BBC podcast called &#8220;<a href="http://www.bbc.co.uk/podcasts/series/worldbiz/">Peter Day&#8217;s World of Business</a>&#8220;. It is a rerun (in memoriam) of an interview with business professor Russell Ackoff, which was <a href="http://news.bbc.co.uk/2/hi/business/6338527.stm ">originally published in 2007</a>.</p>
<p><span id="more-1035"></span></p>
<p>The main theme of the interview is the need to shift business thinking from small details to entire systems. From operations research where you spend lots of time understanding some process or department in great detail, to a system-level thinking where you focus on what an entire enterprise is doing.</p>
<p>For me, this struck a chord in my system-level heart&#8230; in my world of computer systems and virtual platforms, system-level is what it is so hard to get engineers to. Far too much time is spent (in my opinion) understanding, modeling, and tweaking subsystems. Far too little effort is spent on understanding the whole, how things fit together in practice, taking software, hardware, and software system evolution over time into account. The analogy is not perfect, but there are more things that are alike than are not.</p>
<p>The most interesting analysis that Russell Ackoff fires off from his perspective is that of comparing companies and architecture. An architect knows the whole of a building, but does not entirely go into details on just how it is to be built. He/she trusts the carpenters, bricklayers, and other workers to know how best to solve their local problems. Basically, applying hierarchical abstraction to the task of constructing an actual building.</p>
<p>This got me thinking some of why this is the case. I think it could be because building things (castles, cathedrals, houses, walls, pyramids, canals, &#8230;) must have been among the most complex tasks undertaken for a very long time in human history. Thanks to this long history, we have perfected the abstraction and division of labor in construction. Buildings are built in a certain way, by a certain set of crafts, since that method has been proven to work well for a very long time. So just like in the case of the design patterns craze in the late 1990&#8242;s, architecture might have something to teach us about how to build hardware/software systems too.</p>
<p>Note that for some reason, I cannot find a link to the podcast on the BBC homepage. But if you subscribe in iTunes or similar, I think you will find it. Something is not as user-friendly as it could be.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1035"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1035" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1035" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1035/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Checkpointing in SystemC @ FDL</title>
		<link>http://jakob.engbloms.se/archives/880?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/880#comments</comments>
		<pubDate>Sat, 08 Aug 2009 19:48:26 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[appearances]]></category>
		<category><![CDATA[articles]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Checkpointing]]></category>
		<category><![CDATA[FDL]]></category>
		<category><![CDATA[GreenSocs]]></category>
		<category><![CDATA[Marius Monton]]></category>
		<category><![CDATA[Mark Burton]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[SystemC]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=880</guid>
		<description><![CDATA[Along with Marius Monton and Mark Burton of GreenSocs, I will be presenting a paper on checkpointing and SystemC at the FDL, Forum on Specification and Design Languages, in late September 2009. The paper will explain how we did Simics-style checkpointing in SystemC, using the GreenSocs GreenConfig mechanisms to obtain an approximation for the Simics [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-881" style="margin: 5px;" title="fdllogosmall" src="http://jakob.engbloms.se/wp-content/uploads/2009/08/fdllogosmall.jpg" alt="fdllogosmall" width="80" height="79" />Along with Marius Monton and Mark Burton of <a href="http://www.greensocs.com">GreenSocs</a>, I will be presenting a paper on <a href="http://jakob.engbloms.se/archives/714">checkpointing </a>and <a href="http://www.systemc.org">SystemC </a>at the FDL, <a href="http://www.ecsi-association.org/ecsi/fdl/fdl09/mainpage.asp?fn=advance">Forum on Specification and Design Languages</a>, in late September 2009.</p>
<p>The paper will explain how we did <a href="http://www.virtutech.com/whitepapers/simics_checkpointing.html">Simics-style checkpointing </a>in SystemC, using the GreenSocs GreenConfig mechanisms to obtain an approximation for the Simics attribute system.</p>
<p><span id="more-880"></span>It is an approach that does not have the limitations of the &#8220;save the entire simulation process&#8221; method employed by Cadence (and I think also CoWare) in their <a href="http://jakob.engbloms.se/archives/817">SystemC checkpointing solution</a>. It does require you to mark all relevant state in your models, but the benefit from doing so is that regardless of how you change the code of a model, you can still use the same old checkpoints. It is also portable across hosts. We did have to do some patching to the OSCI SystemC kernel to draw out and reset all relevant state from the kernel. The OSCI kernel does not provide sufficient interfaces to checkpoint its state in its vanilla form.</p>
<p>The conference takes place on September 22 to 24, in Sophia Antipolis in France. Now all I have to do is figure out how to get there in the most convenient way. I expect this to be as much fun as the other EDA conferences I have been to recently (I seem to only go to such events nowadays, nothing left on the old embedded circuit for me it seems).</p>
<p>By the way, the FDL logo is really pretty. I think all long-running events should spend the time to create a recognizable logo. My old real-time conferences used to just have plain text and the <a href="http://www.ieee.org">IEEE </a>and <a href="http://www.acm.org">ACM </a>logos.</p>
<p><img class="aligncenter size-full wp-image-882" title="fdl_logo_new" src="http://jakob.engbloms.se/wp-content/uploads/2009/08/fdl_logo_new.jpg" alt="fdl_logo_new" width="435" height="159" /></p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/880"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/880" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/880" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/880/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Conquering Software with Software High-Level Synthesis</title>
		<link>http://jakob.engbloms.se/archives/871?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/871#comments</comments>
		<pubDate>Fri, 31 Jul 2009 22:12:50 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[conferences]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[DAC]]></category>
		<category><![CDATA[device driver]]></category>
		<category><![CDATA[high-level synthesis]]></category>
		<category><![CDATA[Kees Vissers]]></category>
		<category><![CDATA[Xilinx]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=871</guid>
		<description><![CDATA[This post is a follow-up to the DAC panel discussion we had yesterday on how to conquer hardware-dependent software development. Most of the panel turned into a very useful dialogue on virtual platforms and how they are created, not really discussing how to actually use them for easing low-level software development. We did get to [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-824" style="margin: 5px;" title="46daclogo" src="http://jakob.engbloms.se/wp-content/uploads/2009/07/46daclogo.gif" alt="46daclogo" width="81" height="73" />This post is a follow-up to the DAC panel discussion we had yesterday on how to conquer hardware-dependent software development. Most of the panel turned into a very useful dialogue on virtual platforms and how they are created, not really discussing how to actually use them for easing low-level software development. We did get to software eventually though, and had another good dialogue with the audience. Thanks to the tough DAC participants who held out to the end of the last panel of the last day!</p>
<p>As is often the case, after the panel has ended, I realized several good and important points that I never got around to making&#8230; and of those one struck me as worthy of a blog post in its own right.It is the issue of how high-level synthesis can help software design.</p>
<p><span id="more-871"></span>At the end of the panel, the last comment from Kees Vissers of Xilinx pointed out that high-level synthesis is a very powerful way to build hardware. I think his point was that hardware and software are not that different&#8230; but the remark also got me thinking. If it is the case that high-level synthesis currently makes hardware creation easier, can&#8217;t it also be applied to software creation? In particular, if you have a HLS description of a piece of hardware, can&#8217;t you also generate its driver software?</p>
<p>I think that makes eminent sense, since one of the hard parts of doing device drivers is just getting the use of the programming registers of a device right. The programming register interface is a really strange thing if you think about it. It is not native to either software or hardware, really.</p>
<p>In hardware, you communicate between devices using fifos or signals or packet-based mechanisms which do not in general look like programming register writes. You move data by sending a stream of data directly, not word-by-word addressing registers. Similarly, on the software side, software units call each other using functions (or higher-level OS abstractions like signals or network packets). They do not put data into addressed registers&#8230;</p>
<p>Today, high-level synthesis as practiced in industry involves describing the function of a device in pretty abstract terms, so that the compiler can make smart decisions on the implementation details. It also makes it easier to try different alternatives in the implementation, trading size, speed, and power consumption against each other. Different types of concurrency and pipelining can be explored.</p>
<p>However, once we get to the hardware-software interface, we get rudely dropped into a world of manual detailing of an interface with no tool support to explore it or validate it. Why should that really be the case? I think that the hardware-software interface requires just as much care as the internals of the device. After all, it is the external face of the device, and if that is too hard to use, users will not get the full benefit of the device. Here are some previous posts on the nature of interfaces and why they matter: <a href="http://www.garystringham.com/newsletter.shtml">1</a>, <a href="http://jakob.engbloms.se/archives/799">2</a>, <a href="http://jakob.engbloms.se/archives/770">3</a>, <a href="http://jakob.engbloms.se/archives/709">4</a>.</p>
<p>So I would propose a different take on this, where you apply synthesis at a higher-level, and generate the hardware internals, the programming register interface, and the software driver from the same source. The interface you design for a device would be a set of function calls expressed in software terms, and thus relatively easy to use from software. Let&#8217;s call this SHLS, Software High-Level Synthesis. Or maybe Software-Level Synthesis, SLS.</p>
<p>I am well aware that most operating systems do not provide an interface to device drivers consisting of function calls&#8230; but rather rely on pretty non-semantic methods like read/write/ioctl. However, that is easy to overcome by generating a user-level interface library in addition to the raw device driver. Obviously, a tool like this would need some adaptation to apply to each operating system targeted. But that does not feel like something that cannot fairly easy be handled by a template system. Compared to the complexity of synthesizing hardware this feels pretty basic.</p>
<p>If you hide the software-hardware interface inside a black box like this, it can also be implemented in quite different and interesting ways. For example, you could imagine using a small memory-mapped buffer where you enter commands and data and then ask the hardware to &#8220;parse&#8221; it, rather than using discrete registers with immediate effects. Or you could optimize the coding of the relevant hardware operations into a bit-compacted representation no human would like, but which are no problem for the machine.</p>
<p>Another option is obviously to just stop at the hardware-software interface, but let the tool help the hardware designer build the programming register interface and explore various options for it. Not having to invent a register layout from scratch should make that job much easier.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/871"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/871" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/871" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/871/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The TLM DAC</title>
		<link>http://jakob.engbloms.se/archives/865?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/865#comments</comments>
		<pubDate>Thu, 30 Jul 2009 22:47:23 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[conferences]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[DAC]]></category>
		<category><![CDATA[GreenSocs]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[SystemC]]></category>
		<category><![CDATA[tlm]]></category>
		<category><![CDATA[TLM-2.0]]></category>
		<category><![CDATA[Virtutech]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=865</guid>
		<description><![CDATA[The past few days here at DAC, a big theme has been transaction level modeling (TLM). TLM is often considered to be SystemC TLM-2.0. Most of the statements from the EDA companies are to the effect that SystemC TLM-2.0 solves the problem of combining models from different sources. Scratching the surface of this happy picture, [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-824" style="margin: 5px;" title="46daclogo" src="http://jakob.engbloms.se/wp-content/uploads/2009/07/46daclogo.gif" alt="46daclogo" width="81" height="73" />The past few days here at <a href="http://www.dac.com/46th/index.aspx">DAC</a>, a big theme has been transaction level modeling (TLM).</p>
<p>TLM is often considered to be <a href="http://www.systemc.org/apps/group_public/workgroup.php?wg_abbrev=tlmwg">SystemC TLM-2.0</a>. Most of the statements from the EDA companies are to the effect that SystemC TLM-2.0 solves the problem of combining models from different sources. Scratching the surface of this happy picture, it is clear that it is not that simple&#8230;</p>
<p><span id="more-865"></span>The issue is that even if all agree on using the TLM-2.0 standard and its default standard generic memory-mapped bus protocol and payload for the memory-map part of their device models, there are other interfaces which are not standard at this point in time.</p>
<p>For example, there is no standard way to model interrupts between devices. So any time you have interrupts in a system (which tends to be always), you need to write custom wrappers between modules to convert different ways of modeling interrupts. Even worse, the standard way to do it is to use SystemC signals, which are definitely not TLM abstractions. They take a detour through the SystemC kernel, which is quite costly.</p>
<p>The defining property (from a simulation execution perspective) of TLM is that your simulation modules talk directly to each other through <em>direct function calls</em>, rather than passing over whatever simulation kernel you happen to be using. Essentially, TLM tends to convert simulators into being much more like &#8220;regular programs&#8221;, with fewer references to the simulation kernel and its event and time handling. In my world, unless you are doing direct function calls, you are not doing TLM.</p>
<p>Note that this state of things in the SystemC world is likely to change for the better over time. <a href="http://www.greensocs.com">GreenSocs</a> announced at DAC that they are working with <a href="http://www.virtutech.com">Virtutech </a>and an unnamed other partner to create a set of TLM interfaces for other interconnects, such as signals (interrupts under another name), serial, and Ethernet.</p>
<p>But apart from all the technicalities of SystemC TLM-2.0 and how it works, the big question is just what to use TLM for, and how. Here, everyone seems to try to turn TLM into their own use cases. The most obvious application is doing fast virtual platforms, but you also have TLM use as the basis for hardware synthesis, validation, golden reference models, architectural exploration, and pretty much all other EDA design tasks.</p>
<p>Even so, the most important message for me is that the EDA industry is actually starting to get interesting in TLM. It is no longer a quaint odd thing done by some peripheral start-up companies, but rather a mainstream technology that everyone has to pay attention to.</p>
<p>Finally, I want to point out that TLM is not just SystemC. TLM is a general idea that has been in <a href="http://jakob.engbloms.se/archives/130">active use since the late 1960s</a>. It is the obvious way to model a computer, if all you are concerned about is how it looks to the software. Another current example is the <a href="http://www.virtutech.com/whitepapers/simics-tlm.html">Simics style of TLM</a> (<a href="http://www.virtutech.com/whitepapers/modeling.html">and here</a>), which is similar to but different in details from the SystemC implementation.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/865"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/865" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/865" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/865/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Cadence Industry Insight: &#8220;Virtual Platforms Unite HW and SW&#8221;</title>
		<link>http://jakob.engbloms.se/archives/784?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/784#comments</comments>
		<pubDate>Fri, 22 May 2009 06:41:09 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[embedded systeme]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Cadence]]></category>
		<category><![CDATA[Domain-specific languages]]></category>
		<category><![CDATA[ISX]]></category>
		<category><![CDATA[Richard Goering]]></category>
		<category><![CDATA[scdsource]]></category>
		<category><![CDATA[software testing]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=784</guid>
		<description><![CDATA[Another Cadence guest blog entry, about the overall impact of virtual platforms on the interaction between hardware and software designers. Essentially, virtual platforms are a great tool to make software and hardware people talk to each other more, since it provides a common basis for understanding. My entry is called &#8220;Virtual Platforms unites Hardware, Software [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-654" style="margin-left: 5px; margin-right: 5px;" title="opinion" src="http://jakob.engbloms.se/wp-content/uploads/2009/02/opinion.png" alt="opinion" width="91" height="69" />Another Cadence guest blog entry, about the overall impact of virtual platforms on the interaction between hardware and software designers. Essentially, virtual platforms are a great tool to make software and hardware people talk to each other more, since it provides a common basis for understanding.</p>
<p><span id="more-784"></span>My entry is called &#8220;<a href="http://www.cadence.com/Community/blogs/ii/archive/2009/05/21/guest-blog-virtual-platforms-unite-hardware-software-designers.aspx">Virtual Platforms unites Hardware, Software Engineers</a>&#8220;, and is presented by <a href="http://www.cadence.com/community/posts/rgoering.aspx">Richard Goering </a>(who used to be with <a href="http://www.scdsource.com">SCDSource</a>), in his &#8220;<a href="http://www.cadence.com/Community/blogs/ii/default.aspx">Industry Insights</a>&#8221; section of the Cadence community of blogs. Richard Goering has a personal post pointing in the same direction, about <a href="http://www.cadence.com/Community/blogs/ii/archive/2009/05/13/meeting-the-embedded-software-challenge.aspx?postID=17593">EDA tackling embedded software</a>. Worth reading.</p>
<p>Note that I do <em>not </em>say that hardware and software engineers should use the same <em>programming languages</em> as a result of using virtual platforms. Programming languages efficient for hardware design are quite different from those efficient for virtual platform creation, which are in turn different from good software engineering languages.  In some cases, some of them coincide, but in general, I believe in using the best tool for each job, and a programming language is just a tool. And the more designed it is for its task, the better. Some older posts of mine on this topic:</p>
<ul>
<li><a href="http://jakob.engbloms.se/archives/747">DSL: Purpose-built languages</a></li>
<li><a href="http://jakob.engbloms.se/archives/681">DSL: The tyranny of syntax</a></li>
<li><a href="http://jakob.engbloms.se/archives/283">Multicore programming and DSLs</a></li>
<li><a href="http://jakob.engbloms.se/archives/165">What is the obsession with C in EDA?</a></li>
<li><a href="http://jakob.engbloms.se/archives/157">Kunle Olukotun on DSLs</a></li>
<li><a href="http://jakob.engbloms.se/archives/709">Modeling hardware at a high level for software development</a></li>
</ul>
<p>And there is the <a href="http://jakob.engbloms.se/archives/306">ChipDesign article </a>from last year about using virtual platforms in the hardware design process all the way out to customers.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/784"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/784" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/784" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/784/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Guest Blog at Cadence: &#8220;Way Worse than the Real Thing&#8221;</title>
		<link>http://jakob.engbloms.se/archives/781?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/781#comments</comments>
		<pubDate>Wed, 20 May 2009 10:45:21 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Cadence]]></category>
		<category><![CDATA[ISX]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[software testing]]></category>
		<category><![CDATA[Virtutech]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=781</guid>
		<description><![CDATA[Virtutech and Cadence yesterday announced the integration of Virtutech Simics and Cadence ISX (Incisive Software Extensions), which is essentially a directed random test framework for software. With this tool integration, you can systematically test low-level software and the hardware-software (device driver) interface of a system, leveraging a virtual platform. As part of explaining why this [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-782" style="margin-left: 5px; margin-right: 5px;" title="avataraspx" src="http://jakob.engbloms.se/wp-content/uploads/2009/05/avataraspx.jpg" alt="avataraspx" width="72" height="72" />Virtutech and Cadence yesterday announced the integration of Virtutech Simics and Cadence ISX (Incisive Software Extensions), which is essentially a directed random test framework for software. With this tool integration, you can systematically test low-level software and the hardware-software (device driver) interface of a system, leveraging a virtual platform.</p>
<p><span id="more-781"></span></p>
<p>As part of explaining why this is cool and what it means, I have a <a href="http://www.cadence.com/Community/blogs/sd/archive/2009/05/18/way-worse-than-the-real-thing.aspx">guest blog posting over at Cadence&#8217;s blog site</a>, called &#8220;Way Worse than the Real Thing&#8221;. The blog is posted under the general &#8220;TeamESL&#8221; &#8220;personality&#8221; on the blog site, which is used for people external to Cadence in the ESL space.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/781"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/781" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/781" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/781/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>I Want One&#8230; Trillion Instructions&#8230;</title>
		<link>http://jakob.engbloms.se/archives/709?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/709#comments</comments>
		<pubDate>Sat, 28 Mar 2009 21:10:31 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[abstraction levels]]></category>
		<category><![CDATA[device driver]]></category>
		<category><![CDATA[Dr. Evil]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[mpc8641d]]></category>
		<category><![CDATA[Simics]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=709</guid>
		<description><![CDATA[There is an eternal debate going on in virtual platform land over what the right kind of abstraction is for each job. Depending on background, people favor different levels. For those with a hardware background, more details tend to be the comfort zone, while for those with a software background like myself, we are quite [...]]]></description>
			<content:encoded><![CDATA[<p>There is an eternal debate going on in virtual platform land over what the right kind of abstraction is for each job. Depending on background, people favor different levels. For those with a hardware background, more details tend to be the comfort zone, while for those with a software background like myself, we are quite comfortable with less details. I<a href="http://www.virtutech.com/whitepapers/wp-system_arch_spec.html"> recently did some experiments about the use of quite low levels of hardware modeling details for early architecture exploration and system specification</a>.</p>
<p><span id="more-709"></span></p>
<p>It all comes down to a simple classic tradeoff that I usually illustrate like this (using more neutral ground than computer systems; and with credit to Peter Magnusson who had this slide already in place when I joined Virtutech back in 2002):</p>
<p><img class="aligncenter size-full wp-image-711" title="simulation-rule" src="http://jakob.engbloms.se/wp-content/uploads/2009/03/simulation-rule.png" alt="simulation-rule" width="457" height="341" /></p>
<p>What this is telling you is simple:</p>
<ul>
<li>You simulate something very large using large units, i.e., low level of detail; or</li>
<li>You simulate something quite small using small units, i.e., high level of detail.</li>
</ul>
<p>I wanted to test the idea that by using less detail, you can run larger test cases and therefore obtain better coverage of overall landscape than diving in and counting cycles in some small part of it. In the end, this made me cross the trillion instruction line &#8212; since each experiment took a few hundred billion target instructions to complete, repeating and tweaking during the development work definitely add up to more than a trillion instructions.</p>
<p>And this is where I have put my little finger close to my mouth and say:</p>
<p style="text-align: center;"><img class="size-full wp-image-710 aligncenter" style="margin-top: 10px; margin-bottom: 10px;" title="drevil_million_dollars" src="http://jakob.engbloms.se/wp-content/uploads/2009/03/drevil_million_dollars.jpg" alt="drevil_million_dollars" width="300" height="318" /></p>
<p>&#8216;I want one trillion instructions&#8217;</p>
<p>So what did I get from these trillion instructions?</p>
<p>An interesting study in how operating system overhead can have a big impact on the profitability of hardware accelerators. By running hundreds of test cases with different assigned computation latencies of a hardware accelerators, as well as different driver models for my hardware (all running under Linux on my favorite MPC8641D), a key diagram emerged:</p>
<p style="text-align: left;"><img class="aligncenter size-full wp-image-712" style="margin-top: 10px; margin-bottom: 10px;" title="hwsw" src="http://jakob.engbloms.se/wp-content/uploads/2009/03/hwsw.png" alt="hwsw" width="872" height="507" /><a href="http://www.virtutech.com/whitepapers/wp-system_arch_spec.html">Read the paper </a>for all the details, but the key thing to note is that with a poor driver architecture, making the hardware 100 times faster resulted in zero gain in system performance. Had this experiment been performed on a bare-bones platform without a full operating system in place, I am fairly certain that the faster hardware would have been considered much more worthwhile.</p>
<p style="text-align: left;">In the end, I resorted to a driver variant where I had user-level code directly access the device programming interface via an mmap()-mapped memory region. Not pretty, essentially this was bare-metal programming wrapped inside a big cosy Linux package, but it sure was efficient compared to doing a kernel/user mode switch for each hardware operation. But even here, it turned out that making the hardware very very fast as opposed to just very fast had no benefit. It proves to me that the software has to be taken into account in full in order to properly evaluate an idea for a hardware design.</p>
<p style="text-align: left;">You could say that the poor results for acceleration here were due to my inept Linux driver programming skills, but that just underscores the key result: you have to take the software into account. If the conclusion is that a better Linux device driver programmer is needed, you have still decided that the key system bottleneck is not just the speed of the hardware, but how it is used. And that is exactly what system design needs to be about.</p>
<p style="text-align: left;">As an aside, playing around with a complete system like this, and automatically run large volumes of test with varying parameters was a really interesting experience. I must admit that getting to these trillions of instructions required  a few hours of simulation time, but nothing that could not be solved by leaving a computer running over lunch or a long meeting. The machine was modeled using standard Simics &#8220;software timing&#8221;, i.e., without any particular cache or pipeline or bus details, and it seems that that is usually all you need. Had I increased the level of detail and slowed things down by a factor of ten or a hundred, I would never have covered such a large set of test cases and been able to evaluate as many different variants of drivers and hardware speeds.</p>
<h2 style="text-align: left;">IBM did it before me</h2>
<p style="text-align: left;">Finally, I found it interesting that an analogous experience about the effect of creating a complete software stack and testing what looks like a very good hardware idea was reported in an IBM paper from a few years ago, in &#8220;<a href="http://researchweb.watson.ibm.com/journal/rd/502/peterson.html">Application of full-system simulation in exploratory system design and development</a>&#8220;, by Peterson et al, in the IBM Journal of Research and Development. Look at the section about the &#8220;MIP Morphing&#8221; feature, which is essentially cache locking. They do use a fairly detailed simulator for the end evaluation of their performance &#8211; but the key message is that by running a full software stack, they realized that just managing the feature was too hard in a realistic software environment to make it worthwhile:</p>
<blockquote>
<p style="text-align: left;">Initially, the MIP morphing feature was well received by internal development and HPCS customers alike. The team was aware of the need to both manage this hardware feature at the OS level and provide portable abstractions to the programmer to exploit this feature in a productive way. &#8230;</p>
</blockquote>
<p style="text-align: left;">And then:</p>
<blockquote>
<p style="text-align: left;">The implementation effort was facilitated by Mambo, allowing the OS team to prototype the MIP morph idea in a controlled development environment. Taking the prototyping effort to this level of realism uncovered many complexities in supporting the MIP morph in a virtualized manner. ..</p>
</blockquote>
<p style="text-align: left;">And finally:</p>
<blockquote>
<p style="text-align: left;">By prototyping the software support that was <em>needed at the OS level and exposing the usage issues at the application programmer&#8217;s level</em>, the magnitude of the problem was exposed at its fullest. Further, the improvement in performance did not show a sufficient payback for the immense effort that would be required at the software level to support the idea, and as a result it was dropped from further consideration.</p>
</blockquote>
<p style="text-align: left;">It seems that whatever you do, IBM did it first&#8230; and it validates the idea of full-system simulation and that software is king today.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/709"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/709" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/709" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/709/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Adding to Schirrmeister&#8217;s Virtual Platform Myth Busting</title>
		<link>http://jakob.engbloms.se/archives/651?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/651#comments</comments>
		<pubDate>Wed, 18 Feb 2009 12:22:43 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[embedded software]]></category>
		<category><![CDATA[embedded systeme]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[clock-cycle models]]></category>
		<category><![CDATA[cycle accuracy]]></category>
		<category><![CDATA[Eve]]></category>
		<category><![CDATA[Frank Schirrmeister]]></category>
		<category><![CDATA[freescale]]></category>
		<category><![CDATA[Grant Martin]]></category>
		<category><![CDATA[Lauro Ritazzi]]></category>
		<category><![CDATA[p4080]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[software tools]]></category>
		<category><![CDATA[Synopsys]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=651</guid>
		<description><![CDATA[Frank Schirrmeister of Synopsys recently published a blog post called &#8220;Busting Virtual Platform Myths – Part 1: “Virtual Platforms are for application software only”. In it, he is refuting a claim by Eve that virtual platforms are for application-level software-development only, basically claiming that they are mostly for driver and OS development and citing some [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-654" style="margin: 10px;" title="opinion" src="http://jakob.engbloms.se/wp-content/uploads/2009/02/opinion.png" alt="opinion" width="91" height="69" />Frank Schirrmeister of Synopsys recently published a blog post called <a href="http://www.synopsysoc.org/viewfromtop/?p=64#comment-1008">&#8220;Busting Virtual Platform Myths – Part 1: “Virtual Platforms are for application software only”</a>. In it, he is refuting a claim by Eve that virtual platforms are for application-level software-development only, basically claiming that they are mostly for driver and OS development and citing some Synopsys-Virtio Innovator examples of such uses. In his view, most appication-software is being developed using host-compiled techniques.  I want to add to this refutal by adding that application-software is surely a very important &#8212; and large &#8212; use case for virtual platforms.</p>
<p><span id="more-651"></span>The beginning of the argument was found in an <a href="http://www.edadesignline.com/howto/212200519">EDA Design Line article titled &#8220;Unified Verification for Hardware and Embedded Software Developers&#8221; </a>by Lauro Ritazzi of Eve USA. In it, he makes the following claim:</p>
<blockquote><p>While some may have achieved the scope of jump-starting software development, they only address application programs that do not require an accurate representation of the underling hardware design. They fall short when testing the interaction of the embedded software with hardware, such as firmware, device drivers, operating systems and diagnostics. For this testing, embedded software developers need an accurate model of the hardware to validate their code, while hardware designers need fairly complete software to fully validate their application specific integrated circuit (ASIC) or SoC.</p></blockquote>
<p>The interesting part here is really that jump-start is just for applications, and that OS and drivers require more details than a fast virtual platform can supply. I do not quite agree with this. But let&#8217;s first see what Frank Schirrmeister said:</p>
<blockquote><p>the majority of the software development on virtual platforms is spent on firmware, device drivers, operating system porting and diagnostics. And that is not &#8211; as one could assume &#8211; on cycle accurate models, but on functionally accurate models with only essential timing, the type of models called loosely timed (LT) in SystemC.</p></blockquote>
<p>I totally agree with this. As is evident from many different <a href="http://www.virtutech.com/casestudies">public use cases</a>, OS, BSP, and driver development is a big use of virtual platforms. For example, last summer, <a href="http://jakob.engbloms.se/archives/137">Freescale announced the QorIQ P4080 with pretty good software support </a>in terms of Linux and VxWorks operating systems, as well as some middleware stacks. All developed on Simics using an even more timing-abstracted model of the hardware.</p>
<p>However, Frank then makes the following claim that I have a harder time with:</p>
<blockquote><p>In contrast, application software is developed more often than not using completely hardware independent techniques, including cross compilation from the host development machine using development kits like Apple’s iPhone development kit.</p></blockquote>
<p>This is to some extent true, but as time goes on, I think this type of development environment is going to be less useful. Traditionally, OS vendors have had tools like VxSim and OSE SoftKernel in place to help customers &#8220;run code on their desktop&#8221;, while using the API of the operating system of choice. However, such solutions have lots of problems in how close they can get to the target.</p>
<ul>
<li>If you have any kind of third-party binary-only application, or want to use an existing binary component without lots of complex recompilation, you need a virtual platform running the underlying OS. You cannot squeeze that into a host-compiled API simulator.</li>
<li>You are not using the same compiler and code-generation settings and build settings as you are for your actual target, and this can (read: will) introduce nasty compiler version issues.</li>
<li>It forces you to maintain an additional build variant for your code, which can be pretty expensive for a complex build.</li>
<li>You are not using the real OS scheduler, device drivers, and interrupt structure found on the target system. This can have a huge impact, especially for multithreaded multiprocessor systems.</li>
<li>The API simulator needs to be kept in synch with the real software stack, and customized in the same way for any particular target. This is hard to get right (even though it has been done).</li>
<li>The API simulator does not handle heterogeneous systems very well, such as chips or boards or racks mixing two or more different OS kernels in the same system (like a DSP and a main processor OS).</li>
<li>API simulation completely falls apart when the OS is no longer the lowest level of the software stack, but you also have a hypervisor layer underneath the OSes on your target system. An API simulator simply cannot represent this kind of case.</li>
<li>Using a virtual platform and the real target binaires also fits with the very important &#8220;fly what you test, test what you fly&#8221; principle of embedded software development.</li>
</ul>
<p>For various subsets of these reasons, I see many users picking up virtual platforms as a way to streamline application development. For example, <a href="http://www.virtutech.com/news_events/pr/pr2009-02-11-595.html">NASA recently selected a virtual platform based on Simics </a>to develop the software for the new Orion spacecraft. That is going to be a complete software stack, not just OS and drivers, which tend to to be fairly off-the-shelf component for these kinds of systems. Most of the effort is on the application level, and the platform used is a virtual platform.</p>
<p>However, note that there are cases where a fast virtual platform like we are discussing here is not sufficient to validate all aspects of the code. I think the main reason we see different viewpoints on this, is that we are looking at very different types of software-hardware integration.</p>
<p>In a <a href="http://jakob.engbloms.se/archives/153">blog post I wrote last year on the dead-ness of cycle-accurate simulation</a>, Grant Martin of Tensilica pointed out that <a href="http://jakob.engbloms.se/archives/153#comment-1652">some software desperately needs cycle-accuracy </a>as it is intimately dependent on the timing of the hardware. This is certainly true for some aspects of drivers, and more so for the really early boot code.</p>
<p>Here, FPGA-based hardware-accelerated simulation of the actual design in VHDL or Verilog makes eminent sense as a way to get the details perfectly right. But that is only one part of a much greater system development puzzle, and it really only applies to very small subsystems as  it is kind of hard to fit much more than a single chip inside a hardware acceleration unit. Just as Frank Schirrmeister says, hardware accelerated simulation is very important. The nice article on the <a href="http://jakob.engbloms.se/archives/639">IBM z10 development </a>that I blogged about earlier says exactly that: for some parts of the validation, there is no way around using the actual hardware RTL design.</p>
<p>And in the end, you have to test the timing and analogue aspects of a design on physical hardware anyway. There should not be too many suprises at this stage, if you have used all of the cool current tools right. But there surely will be some &#8212; even a VHDL simulation is a simulation, and not reality, after all.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/651"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/651" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/651" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/651/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Eclipse Linux Kernel Indexing Works</title>
		<link>http://jakob.engbloms.se/archives/338?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/338#comments</comments>
		<pubDate>Sun, 01 Feb 2009 17:10:18 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[desktop software]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[uncategorized]]></category>
		<category><![CDATA[eclipse]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[Linux kernel]]></category>
		<category><![CDATA[operating systems]]></category>
		<category><![CDATA[Simon Kågström]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=338</guid>
		<description><![CDATA[Edited on 2009-Feb-01, to include the link to the illustrated guide that really helps you get there faster. Thanks Simon! Also, promoted to front page, original post was put up on 2008-Nov-09. Thanks to Simon Kågströms post (and the even better second-generation with screenshots) about using Eclipse for the Linux kernel, I have a much [...]]]></description>
			<content:encoded><![CDATA[<p><img class="size-medium wp-image-339 alignleft" style="margin: 5px 10px;" title="eclipse_wide_logo" src="http://jakob.engbloms.se/wp-content/uploads/2008/11/eclipse_wide_logo.jpg" alt="" width="131" height="68" /> <img class="size-medium wp-image-329 alignright" style="margin-left: 10px; margin-right: 10px;" title="penguin-variant" src="http://jakob.engbloms.se/wp-content/uploads/2008/11/penguin-variant.png" alt="" width="100" height="118" /> <em>Edited on 2009-Feb-01,  to include the link to the illustrated guide that really helps you get there faster. Thanks Simon! Also, promoted to front page, original post was put up on 2008-Nov-09.</em></p>
<p>Thanks to <a href="http://simonkagstrom.livejournal.com/31079.html?view=19559#t19559">Simon Kågströms post </a>(and the even better <a href="http://simonkagstrom.livejournal.com/33093.html">second-generation with screenshots</a>) about using <a href="http://www.eclipse.org">Eclipse </a>for the Linux kernel, I have a much nicer work environment now for my ongoing work in learning Linux device drivers on PowerPC, which has helped me work my way through several hard-to-figure-out system calls.<span id="more-338"></span> Here is a screenshot that I found pretty cool&#8230; the tool has found the definition and comments for the IRQ registration function:</p>
<p style="text-align: center;"><a href="http://jakob.engbloms.se/wp-content/uploads/2008/11/2008-11-09-21-51-08.png"><img class="size-medium wp-image-340 aligncenter" title="2008-11-09-21-51-08" src="http://jakob.engbloms.se/wp-content/uploads/2008/11/2008-11-09-21-51-08-300x187.png" alt="" width="300" height="187" /></a></p>
<p style="text-align: left;">2009-Feb-01:</p>
<p style="text-align: left;">I had to rebuild my indexing from scratch in the past weekend, and as a result, I have a word of warning: you have to create a &#8220;C project&#8221; in Eclipse, if you accidentally create a &#8220;Project&#8221;, the Eclipse workspace file will have the wrong name (.project instead of .cproject), and the autoconf-to-eclipse script will not work.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/338"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/338" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/338" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/338/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>EDA Tech Forum Article on Ecosystem Enablement</title>
		<link>http://jakob.engbloms.se/archives/577?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/577#comments</comments>
		<pubDate>Sat, 10 Jan 2009 21:17:54 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[business issues]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[ecosystem enablement]]></category>
		<category><![CDATA[EDA Tech Forum]]></category>
		<category><![CDATA[freescale]]></category>
		<category><![CDATA[p4080]]></category>
		<category><![CDATA[qoriq]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=577</guid>
		<description><![CDATA[I have an article about ecosystem enablement for new hardware, co-authored with Richard Schnur of Freescale published in the December 2008 issue of EDA Tech Forum. The core concept is that a virtual platform solution makes it possible to get a new chip to market faster with better software support, and even enables virtual design-in [...]]]></description>
			<content:encoded><![CDATA[<p>I have an <a href="http://www.edatechforum.com/journal/dec2008/streamlining_intro.cfm">article about ecosystem enablement for new hardware, co-authored with Richard Schnur </a>of <a href="http://www.freescale.com">Freescale</a> published in the <a href="http://www.edatechforum.com/journal/dec2008/">December 2008 issue of EDA Tech Forum</a>. The core concept is that a virtual platform solution makes it possible to get a new chip to market faster with better software support, and even enables virtual design-in of a chip at OEM customers before hardware becomes available. The article builds on our joint experience with the QorIQ P4080 launch in the Summer of 2008, where we had several operating systems and middleware packages in place at the moment the chip was announced. EDA Tech Forum requires registration, but it was still free, and there are many other good articles available.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/577"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/577" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/577" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/577/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Threading or Not as a Hardware Modeling Paradigm</title>
		<link>http://jakob.engbloms.se/archives/485?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/485#comments</comments>
		<pubDate>Thu, 01 Jan 2009 08:31:23 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Erlang]]></category>
		<category><![CDATA[multicore]]></category>
		<category><![CDATA[Reactive programming]]></category>
		<category><![CDATA[sampalib]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[SystemC]]></category>
		<category><![CDATA[Threading]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=485</guid>
		<description><![CDATA[Traditional hardware design languages like Verilog were designed to model naturally concurrent behavior, and they naturally leaned on a concept of threads to express this. This idea of independent threads was brought over into the design of SystemC, where it was manifested as cooperative multitasking using a user-level threading package. While threads might at first [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-486" style="margin: 5px 10px;" title="gears-modeling" src="http://jakob.engbloms.se/wp-content/uploads/2008/12/gears-modeling.png" alt="gears-modeling" width="62" height="65" />Traditional hardware design languages like <a href="http://en.wikipedia.org/wiki/Verilog">Verilog </a>were designed to model naturally concurrent behavior, and they naturally leaned on a concept of threads to express this. This idea of independent threads was brought over into the design of <a href="http://www.systemc.org">SystemC</a>, where it was manifested as cooperative multitasking using a user-level threading package. While threads might at first glance look &#8220;natural&#8221; as a modeling paradigm for hardware simulations, it is really not a good choice for high-performance simulation.</p>
<p>In practice, threading as a paradigm for software models of hardware circuits connected to a programmable processor brings more problems than it provides benefits in terms of &#8220;natural&#8221; modeling.</p>
<p><span id="more-485"></span></p>
<p>As I see it, the main alternative modeling paradigm is to use a classic event-driven system, where all activity is triggered by events and run the associated code to completion. This makes execution occur in a series of simulation steps in various part of the system, rather than as a set of (pseudo) concurrent tasks.</p>
<h2>Threaded Problems</h2>
<p>The most common complaint with threading is <strong>performance</strong>. This has become very clear in the case of using SystemC for transaction-level modeling. All advice in how to do good and fast TLM coding tells us to use SC_METHODs, which are essentially callbacks that are not active objects in their own right. Note that SystemC models found in the wild are often built on SC_THREADs despite this advice, as that is the &#8220;easiest&#8221; way to do things. Some convenience systems part of the OSCI TLM-2.0 library also rely on threads to convert between AT-style asynchronous and LT-style synchronous function calls (which is pretty unavoidable, but not applicable in the realm of high-performance simulation for virtual platforms).</p>
<p>Furthermore, using threading as a paradigm (even cooperative single-active-thread cooperative threads like in SystemC or classic MacOS) bring with it the <strong>problems of concurrent programming</strong>, in that you suddenly need to care about protecting data structures against conflicting accesses, worry about deadlocks, and similar concurrent programming issues. Without threads, all such issues go away.</p>
<p>Note that using threading as a modeling paradigm with truly concurrent execution of models will make the execution have all the problems of parallel programs, especially non-deterministic execution and hard-to-find bugs. At least a cooperative multitasking system tends to be deterministic in the way it goes wrong.</p>
<p>Threading as a hardware model programming style therefore makes concurrent multithreaded simulation harder rather than easier to achieve. Especially if the semantics of the simulation system specifies an interleaved model of execution as the semantics, which is the case for SystemC. In this cases, there is no way to really make SystemC parallel without adding parallelism as some extra library.</p>
<p>However, one of the biggest practical problems with threading is the problem of <strong>inspecting, changing, and checkpointing simulation state</strong>. With threads, you end up having state stored in local variables on the stacks in the system, as well as in processor registers, the program counter, and other places that are hard to get to from the outside.  This is not just me saying this, I found this well said in the <a href="http://www.sampalib.org/doc/papers/A%20Sampalib%20and%20SystemC%20comparison.pdf">sampalib white paper </a>:</p>
<blockquote><p>Using threads means that part of the simulation state is in stacks, which may limit the ability to persist the state of the simulation in checkpoints.</p>
<p>Using wait() implies context switch which are costly in terms of simulation speed, and thus often discouraged in guidelines for modeling SystemC™ models</p></blockquote>
<p>To furthermore drive this point, all librariesfor general program state serialization that I have seen (for C++ and Java, for example) also rely on explicit state stored in objects, and explicitly do not support the &#8220;transient&#8221; state held in local variables and the program counter. Essentially, only heap-allocated objects are handled in serialization solutions.</p>
<h2>Event-Driven Solutions</h2>
<p>An event-driven transaction-level hardware simulation is coded in a different way from a naive threaded implementation (but not that differently from a more sophisticated threaded program).</p>
<p>Each device model has to make its state explicit as a set of variables, and preferably also declare these for access for an external tool using something like <a href="http://www.greensocs.com/en/projects/GreenControl">GreenSocs GreenControl </a>or <a href="http://www.virtutech.com/whitepapers/modeling.html">Simics Attributes</a>. It also has to expose a set of functions to be called when events happen or other devices in the simulation system send a transaction into the device model.</p>
<p>Additionally, you should encapsulate all state in a model inside the model object and not expose it for direct access from the outside. A pure object-oriented style with accessor functions for everything is required for best modularity.</p>
<p>The advantages of this model are clear:</p>
<ul>
<li>Concurrency problems are reduced, since each function call will run to completion before any other object or function is activated. There is no need to worry about shared data variables, as they should not exist.</li>
<li>Checkpointing and inspection is facilitated, since all state is now explicit and declared.</li>
<li>Performance is typically increased, since there is no need to do context switches between threads. Locality is also increased by having functions run to completion before returning.</li>
<li>True concurrency is easier to achieve, since each model can quite easily be considered a local-state, shared-nothing, explicit message-passing component similar to Erlang threads. This makes it possible for the simulation scheduler to run multiple models concurrently on multiple host threads. For more on this topic, see my <a href="http://jakob.engbloms.se/archives/246">SiCS Multicore Days 2008 </a><a href="http://www.engbloms.se/presentations/engblom-multicore-sics-2008.pdf">presentation on how Simics was threaded</a>.</li>
</ul>
<p>The downside is that some people consider the programming more complicated. Which is really a matter of appearance over substance: event-driven programming tends to be more robust and easier to follow in the long run, since threaded programming makes things a bit too implicit.</p>
<p>Here is the basic example of a thread that does some periodic work.</p>
<p>Threaded style:</p>
<blockquote>
<pre>Thread_for_D():
  loop forever:
    do work...
    wait(some time)</pre>
</blockquote>
<p>Event-driven style, where we just repost an event each time we are called:</p>
<blockquote>
<pre>Time_callback():
  do work...
  post event(some time, Time_callback)</pre>
</blockquote>
<p>Another advantage of event-driven models is that such a paradigm makes it clear that you need to be able to accept any call into the model at any time. This makes for more robust code, since it is quite easy to (intentionally or by mistake) encode an expectation on the sequence of activity in a threaded that might not be what actually happens at run-time. In particular, the state of any protocol being acted on will need to be explicitly rather than implicitly represented.</p>
<p>There is much more to be said on how to code in this style, but there are long papers out there to read on this.</p>
<h2>High-Performance Event-Driven Simulation</h2>
<p>Note that in high-performance virtual platform-style simulation, processors will usually be a special case in both threaded and event-driven styles. That is since the flow of instructions that they execute constitute very many very small actions that cannot affort a context switch between each. Here, the advantage of the event-driven model is even clearer, given some special-casing of processors. This is another long story that I will not reiterate here, but basically, most events as discussed above will be memory accesses from a processor to read and write device registers, and each such memory access can be handled in a single simulation step. No need to switch context or do anything but handle a simple function call. By not having a wait() call to deal with, this mechanism can be kept simple and cheap &#8212; which is essentially using an SC_METHOD in SystemC. But in the complete absence of SC_THREADs and their ilk, many other things can be optimized even better.</p>
<h2>The End</h2>
<p>What I wanted to provide in this almost-article-length post was an idea for the problems that I see threads cause as a modeling paradigm for hardware models, and the advantages offered by a reactive event-driven style. For some reason, this is misunderstood in the modeling community at large, probably because most operating systems and simulation systems in common use today present various forms of threads as the way to model concurrent behavior. However, threads as a prominent user-level programming model are known to be bad in many ways&#8230; and modeling is no exception to this rule.</p>
<p>Note that I realize that threads are needed at some level in order to take advantage of multicore hardware, but I think they are best hidden inside a simpler framework that presents a simpler understandable semantics to the user.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/485"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/485" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/485" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/485/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Notes from the IP 08 Panel</title>
		<link>http://jakob.engbloms.se/archives/440?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/440#comments</comments>
		<pubDate>Sat, 06 Dec 2008 20:31:46 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[appearances]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[clock-cycle models]]></category>
		<category><![CDATA[DML]]></category>
		<category><![CDATA[IP08]]></category>
		<category><![CDATA[panel discussion]]></category>
		<category><![CDATA[Register Design Languages]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[SystemC]]></category>
		<category><![CDATA[SystemRDL]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=440</guid>
		<description><![CDATA[Now I am home again, and some days have passed since the IP 08 panel discussion about software and hardware virtual platforms. This was an EDA hardware-oriented conference, and thus the audience was quite interested in how to tie things to hardware design. Any case, it was a fun panel, and Pierre Bricaud did a [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-366" style="margin: 5px 10px;" title="ip08" src="http://jakob.engbloms.se/wp-content/uploads/2008/12/ip08.gif" alt="" width="147" height="63" />Now I am home again, and some days have passed since the <a href="http://www.design-reuse.com/ip08/program/panel_virtualplatform.html">IP 08 panel discussion </a>about software and hardware virtual platforms. This was an EDA hardware-oriented conference, and thus the audience was quite interested in how to tie things to hardware design. Any case, it was a fun panel, and Pierre Bricaud did a good job of moderating and keeping things interesting.</p>
<p><span id="more-440"></span></p>
<p>The panel had a clear consensus, which nobody really challenged, that virtual platforms for software development are different in kind from virtual platforms for hardware development. Indeed, a the taxonomy of &#8220;hardware virtual platforms&#8221; versus &#8220;software virtual platforms&#8221; was used frequently and proved quite appropriate.</p>
<p>A software virtual platform has to be fast and its timing can be fairly approximate. It main value, in this context, is that can be created quickly and is useful for early software development and debug. Opinions differed, however, on how to produce them and where to go with them.</p>
<ul>
<li>Markus Willems from Synopsys had the position that they are produced in some appropriate way as a separate task from hardware development. SystemC was his language of choice.</li>
<li>Peter Flake proposed a methodology where you start by developing the software virtual platform and then refine it down towards more detailed models and finally hardware. He brought up Virtutech <a href="http://www.virtutech.com/whitepapers/virtutech_dml.html">DML </a>and <a href="http://jakob.engbloms.se/archives/358">SystemRDL</a>, as examples of languages pointing in this direction.</li>
<li><strong> </strong>Loic Le Toumelin considered the software virtual platform as a something that is generated from a common design entry point, using some form of synthesis that can also generate the hardware and the hardware virtual platform.</li>
<li>I think my realistic position right now is that a software virtual platform is created as a separate item, but that we want to make this work as short and easy as possible and that in the future, the vision is similar to Peter Flake&#8217;s: start with a software virtual platform to define the hardware-software interface.</li>
</ul>
<p>It was also interesting in how different the opinion was when we got to the detailed hardware-oriented virtual platforms. The ones that tend to be clock-cycle level and attempt to be cycle-accurate (CA) in many cases.</p>
<ul>
<li>Markus said that the only good way to build a CA model was to take the RTL and convert it, or run it in an FPGA prototype. He echoed the sentiments <a href="http://jakob.engbloms.se/archives/153">I wrote about in July, that ARM is getting out of cycle-accurate models and the general difficulty of creating such a model by hand</a>.</li>
<li>Peter pointed out that you can have CA models before RTL, as a design tool. I strongly agree with this model of working, it is common in industry and definitely one way to go. However, for existing hardware, I agree that RTL-to-CA seems reasonable, even if the resulting models are painfully slow.</li>
<li>Loic wanted the CA to come from the same source as the software VP, and was very keen on their being in complete agreement on semantics of the hardware.</li>
</ul>
<p>The third major discussion was about the required accuracy and fidelity-to-hardware of a virtual platform. With a consensus that a software virtual platform has to be fast and with timing approximated, it is still clear that many people are uncomfortable about this idea of not being &#8220;exactly like the hardware&#8221;.</p>
<p>For some purposes, you do need complete fidelity to the hardware timing in a CA model. Loic definitely could not accept anything less when giving a customer a virtual platform, and some people in the audience echoed the same sentiment. Most, however, agreed that most software work can be done with simple timing, and that it does not matter all that much if there are some functionality bugs or omissions in the virtual platform. It is still far better than no platform at all!</p>
<p>What is clearly needed, at least for virtual platforms close to a hardware design process, is a way to check the software virtual platform and hardware virtual platform against the functionality and maybe timing of the final RTL. In the cases that you have the RTL, which is far from always in my world.</p>
<p>There were some other questions about software development tools support (of course you use the same debugger and compiler as with a physical platform) and other issues where the panel was mostly in agreement. I guess some of this also indicates that virtual platforms are not yet universally understood and that most people have not really had any experience with them.</p>
<p>Overall, this was a fun panel, and I hope the audience enjoyed it too and learnt something in the process.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/440"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/440" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/440" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/440/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Gary Stringham on Hardware Interface Design vs Virtual Platforms</title>
		<link>http://jakob.engbloms.se/archives/358?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/358#comments</comments>
		<pubDate>Sat, 29 Nov 2008 20:51:55 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[DML]]></category>
		<category><![CDATA[embedded]]></category>
		<category><![CDATA[Semifore]]></category>
		<category><![CDATA[simulation]]></category>
		<category><![CDATA[Spectareg]]></category>
		<category><![CDATA[SystemRDL]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=358</guid>
		<description><![CDATA[I just read an interesting paper from the 2004 Embedded System&#8217;s Conference (ESC) written by Gary Stringham. It is called &#8220;ASIC Design Practices from a Firmware Perspective&#8221; and straddles the boundary between hardware design and driver software development. It was good to see someone take the viewpoint of &#8220;how you actually program a hardware device [...]]]></description>
			<content:encoded><![CDATA[<p>I just read an interesting paper from the 2004 Embedded System&#8217;s Conference (ESC) written by <a href="http://www.garystringham.com">Gary Stringham</a>. It is called &#8220;ASIC Design Practices from a Firmware Perspective&#8221; and straddles the boundary between hardware design and driver software development. It was good to see someone take the viewpoint of &#8220;how you actually program a hardware device is as important as what it does&#8221;. Gary seems to understand both the hardware design and implementation view of things, as well as that of the embedded software engineer. To me, that seems to be a fairly rare combination of skills, to the detriment of our entire economy of computer system development.</p>
<p><span id="more-358"></span></p>
<p>Gary Stringham&#8217;s lists a number of tips on how to create hardware-software interfaces. Some of them are echoed on his <a href="http://www.garystringham.com/newsletter.shtml">monthly newsletter</a>, which is worth a read (even if it is a bit short on detail). Unfortunately, there seems to be no publicly available version of the text. Gary has definitely kept lecturing on the topic since, at venues like the ESC and DVCon it seems, but more recent lecture notes that I have found in the ESC proceedings are pretty sparse. I guess being a consultant teaching people to do these things for a fee makes you a bit hesitant to share all your knowledge freely with the world&#8230; I can understand that position.</p>
<p>Anyway. Some of the comments in the text indicate to me the great value that virtual platforms can bring to the actual design of hardware up front, not just as an execution vehicle for the final design, used by a software engineer who has to take whatever is given.</p>
<p>In particular, the issue of getting ASIC and Firmware designers to collaborate on the same thing at the same time. Quote:</p>
<blockquote><p>The key to designing an ASIC is to get the firmware engineers involved early. They are the customers that will be using the ASIC. Unfortunately, getting them involved early is often difficult to do because the ASIC design has to start several months before the firmware engineers will get parts</p></blockquote>
<p>And</p>
<blockquote><p>When the parts do arrive, the roles are reversed. The firmware engineers are trying to work with<br />
it while the ASIC engineers have mainly forgotten it and have moved on to new projects.</p></blockquote>
<p>The proposed solution to this problem is to involve the firmware people in the hardware design review process, which is a good idea.</p>
<p>It would be even better, however, if the firmware people could have the hardware interface to try as a live thing rather than just reading the documents. This is exactly what virtual platforms offer: quickly build a fast simple prototype of the interface, and hand it over to the software engineers to try.</p>
<p>This is something that I am currently <a href="http://jakob.engbloms.se/archives/330">exploring in some detail with Simics</a>, and that I wrote a piece about in <a href="http://chipdesignmag.com/display.php?articleId=2720&amp;issueId=31">Chip Design </a>earlier this year. Fundamentally, I think this is feasible, provided that hardware designers do not fret too much about timing details and the precise performance of the final implementation, and focus more on the programming interface design first &#8212; and then later go on and make sure the timing and performance is right.</p>
<p>It is just like software development is supposed to be done: start by designing a useful interface to a piece of functionality, and then add in the details and optimize performance within the boundary of that interface. Of course, the interface might need some adjustments to support certain optimizations, but it is quick and easy to provide a new virtual platform with a new behavior to the software engineer. Much faster than providing a new piece of silicon or even new documentation.</p>
<p>Register design tools like <a href="http://www.denali.com/en/products/systemrdl_about.jsp">SystemRDL </a>(<a href="http://www.arm.com/iqonline/news/marketnews/17617.html">now being standardized by SPIRIT</a>), <a href="http://spectareg.com/">Spectareg</a>, <a href="http://www.semifore.com/">Semifore</a>, etc. all touch on this, but all seem to be lacking the ability to actually describe what a device does beyond some simple basics like software and hardware read/write properties. You really need a full expressive language to write a truly executable model of the hardware (and I like <a href="http://www.virtutech.com/whitepapers/virtutech_dml.html">DML </a>for this).</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/358"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/358" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/358" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/358/feed</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Cadence-Ran vs Synopsys-Frank over Low-Power and Virtual Things</title>
		<link>http://jakob.engbloms.se/archives/344?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/344#comments</comments>
		<pubDate>Sat, 15 Nov 2008 22:32:11 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[blog commentary]]></category>
		<category><![CDATA[Cadence]]></category>
		<category><![CDATA[Frank Schirrmeister]]></category>
		<category><![CDATA[power analysis]]></category>
		<category><![CDATA[Ran Avinun]]></category>
		<category><![CDATA[simulation]]></category>
		<category><![CDATA[Synopsys]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=344</guid>
		<description><![CDATA[Over the past few weeks there was a interesting exchange of blog posts, opinions, and ideas between Frank Schirrmeister of Synopsys and Ran Avinun of Cadence. It is about virtual platforms vs hardware emulation, and how to do low-power design &#8220;properly&#8221;. Quite an interesting exchange, and I think that Frank is a bit more right [...]]]></description>
			<content:encoded><![CDATA[<p>Over the past few weeks there was a interesting exchange of blog posts, opinions, and ideas between Frank Schirrmeister of Synopsys and Ran Avinun of Cadence. It is about virtual platforms vs hardware emulation, and how to do low-power design &#8220;properly&#8221;. Quite an interesting exchange, and I think that Frank is a bit more right in his thinking about virtual platforms and how to use them. Read on for some comments on the exchange.</p>
<p><span id="more-344"></span><br />
The following appears to be to sequence of events:</p>
<ul>
<li>Cadence press release, in September, about their &#8220;<a href="http://www.design-reuse.com/news/19019/power-analysis-pre-rtl-exploration.html">Palladium Incisive Palladium Dynamic Power Analysis and Cadence InCyte Chip Estimator</a>&#8220;, quoting Ran:</li>
<blockquote><p><em>Cadence Incisive Palladium Dynamic Power Analysis enables SoC designers, architects and validation engineers to quickly estimate the power consumption of their system during the design phase, analyzing the effects of running various real software stacks and other real-world stimuli. The new offerings also include the Cadence InCyte Chip Estimator, which can now provide what-if power analysis through exploration of different low-power techniques. The InCyte Chip Estimator also generates automatically the Si2 Common Power Format (CPF), which helps drive architectural power specification and intent into implementation and verification.</em></p></blockquote>
<li>Frank Schirrmeister blogged &#8220;<a href="http://www.synopsysoc.org/viewfromtop/?p=50">On Chameleons, Low Power and the Marketing Power of Copy Editing</a>&#8220;, basically saying that what Cadence was selling was something that was bound to the RTL level and thus arriving with estimates pretty late in the design process, after most important architecture decisions had been made. Instead, he proposed a flow using <strong>virtual prototypes </strong>that contained a sequence of successively better estimates, from the usual initial spreadsheet to estimates actually derived from RTL later in the process (or for IP blocks that already exist). Synopsys is not alone in this, <a href="http://www.neosera.com">Neosera </a>and <a href="http://www.scdsource.com/article.php?id=82">ChipVision</a> are after similar ideas. I think this approach makes excellent sense, following the idea that getting some kind of approximate feedback from a complete system early in the process is better than getting lots of details from a small part of a system late in the process.</li>
<li>Ran Avinun then blogged a reply to Frank, at &#8220;<a href="http://http://www.cadence.com/Community/blogs/sd/archive/2008/10/30/the-power-of-cadence-system-power-flow-vs-viewing-from-the-top.aspx">The Power of Cadence System Power Flow vs. Viewing from the Top</a>&#8220;. His contention there is that virtual prototypes have their uses, but that real designers will be using hardware accelerators, as that provides the key accuracy needed to do real power work. Also, he sees the creation of a virtual platform as a big problem, and cites a number of cases where running the actual semi-final RTL with power simulation was key to project success. Also, Ran sees the time needed to create a virtual platform as a big obstacle.</li>
<li>Frank then replied to the reply, at &#8220;<a href="http://www.synopsysoc.org/viewfromtop/?p=53">Hammers, Nails and the Spirits That I Called …</a>&#8220;&#8230; where he points out that Ran has some misconceptions about virtual platforms, admits that the Cadence flow works well, but that it does miss the point of early power estimation before the design is too frozen to be much changed. There is a pretty but hard-to-read diagram in the post, from <a href="http://www.design-reuse.com/articles/12728/towards-activity-based-system-level-power-estimation.html">a 2005 article he wrote while at ChipVision in Germany</a>, pointing out the need to evaluate designs with actual test data from the real world.</li>
</ul>
<p>What do I make of all of this?</p>
<h2>Ran&#8217;s Points</h2>
<p>I must admit that I think the Palladium hardware simulation accelerator boxes are very cool pieces of hardware, which at least used to be based on custom logic systems that use several cycles of a fixed sized hardware to simulate multiples of the hardware&#8217;s based emulation capacity (so 10M capacity system can use 10 cycles per target cycle to simulate 100M, for example). However, I do agree with Frank that these are dependent on having actual RTL in place to be of much use.</p>
<p>Another issue with hardware emulators is their overall availability: compared to the number of PCs available in an organization, they are going to be very limited. As discussed in many different forums, a key advantage of a pure virtual platform is that it can turn any programmer&#8217;s PC into a target system running the real target software. Without having to book time on a limited set of physical target machines, and hardware accelerators are such limited-in-supply hardware machines. So a virtual platform is much more available to people within, and especially outside, a design organization. Also, unless you are happy to release RTL for your design to people outside your organization, hardware acceleration is going to do little to help your end users get the most out of your design, pre-silicon.</p>
<p>My final gripe with hardware emulators is their limited scope. They tend to max out at a the borders of a single chip, or less. A virtual platform, on the other hand, has much more room to scale, to include multiple chips, <a href="http://www.virtutech.com/products/simics_accelerator.html">multiple boards</a>, or even <a href="http://www.compactpci-systems.com/articles/id/?3537">complete racks</a> and networks of networks. You cannot really do that in any hardware simulation, as it involves too many billions of gates running too many billions on instructions. The general rule of simulation still applies with hardware acceleration: <a href="http://www.engbloms.se/publications/engblom-ESC2008-class410-simulation-paper.pdf">you need to increase the level of abstraction to handle larger systems</a>.</p>
<p>As to the problem faced by Ran&#8217;s customers, having RTL but no virtual platform: what were they thinking of? Seriously, if you want to do design today a virtual platform should be your starting point, not an afterthought. Time and again, we see examples today where using virtual platforms <a href="http://www.chipdesignmag.com/display.php?articleId=2720&amp;issueId=31">gets chips to customers ahead of time and provides the ability to test ideas before committing to final RTL</a>. It seems that Ran agrees with this need, but his means are different:</p>
<blockquote><p><em>&#8220;As was stated above, big reason our customers use RTL emulation platforms is for accuracy, and while virtual platforms can offer certain performance, eventually the need to accuracy becomes critical and can not be overlooked, even for initial performance and power estimation analysis. Frank seems to forget in his statement above that the average bring-up time of new virtual platforms takes 6-12 months while the average bring-up time of many emulated designs takes days.&#8221;</em></p></blockquote>
<p>The time to create a virtual platform is actually pretty short, if you do it at a sufficently abstract level of detail and don&#8217;t worry too much about cycle accuracy. Also, that bringing up of an emulation depends on having a detailed RTL-level description to start with&#8230; which is not necessarily the case. I must say that the cited six to 12 months for a VP (for a single SoC as discussed here) sounds reasonable to me &#8212; if you are building a cycle-level model that tries to emulate the final timing (<a href="http://jakob.engbloms.se/archives/153">which might not be really feasible at at all</a>). If you work at a higher level of abstraction like loosely-timed TLM, that time shrinks by a factor of ten or so. I agree that in the end, accuracy is critical &#8211; but before you get there, the approximations used by the VP will have gotten you pretty far in terms of software development and architecture testing.</p>
<p>Ran is also afraid of the lack of accuracy:</p>
<blockquote><p><em><span id="anormal_12" class="Cadence_CS_BlogDetail_BlogText"> Now, even if you build this platform successfully 9-12 months in advance, how do you know that your virtual platform representing your real design? How do you connect it to your verification and implementation environment and realistic power information? Frank seems to overlook these things. Looking at the analogy of the story described at the blog above, using a system-level platform that is not targeting the actual hardware for performance analysis and power trade-offs guarantees that the Chamelon will become a snake and you will get bitten. </span></em></p></blockquote>
<p>As with all simulations, virtual platforms needs to be used with care and understanding. It might also once again be a matter of system scale: for RTL simulation, you are looking inside a single chip, and the detailed design to save power there. With a VP, you might be looking at whether a particular OS kernel does even care to try to turn off unused hardware at all&#8230; and that might be just as important in the end as being accurate in how functional units turn on and off inside an accelerator.</p>
<p>In today&#8217;s software-driven systems that mostly consist of existing off-the-shelf hardware, not any particular SoC that is being designed right now, the large-scale behavior and smarts of the software in a setting containing lots of chips and functions is far more important than optimizations inside a chip.</p>
<h2>Frank&#8217;s Points</h2>
<p>Since Frank is a virtual platform supporter just like me, I instinctively agree with his points about VPs being pretty fast to develop and available long in advance of actual silicon. I like the way he deals with power in the ARM DevCon presentation cited (do have a look at it), but still there are some lingering doubts and issues&#8230;</p>
<p>What I have a hard time understanding is just how detailed the virtual platforms need to be. The use of SystemC TLM-2.0 LT is sensible for speed, but it seems from the DevCon presentation that the main emphasis is on AT-level (and therefore pretty slow) timing-accurate simulations that look at power cycle by cycle in the target. If that is the case, I think we could almost just as well go get ourselves a hardware accelerator, as cycle-level models (even if transaction-driven)</p>
<p>However, Frank also says this which I cannot but agree with: you should not always run around with a hammer and look at everything like a nail &#8212; any reasonable chip design process needs both virtual platforms and hardware accelerators, one cannot really replace the other:</p>
<blockquote><p><em>When discussing this matter with a friend, he pointed out rightfully so that both Ran’s and my post suffer from “Hammer and Nail-itis”. In fact, he pointed out, the combination of Cadence’s estimators (InCyte), C based synthesis, Palladium, and Synopsys virtual would be pretty powerful! It’s a good thing then that we acquired Synplicity which brought us Synplify high-level synthesis and Confirma FPGA Prototyping to Synopsys, and of course, that we have existing interfaces between our Virtual Platforms and Eve’s solutions. </em></p></blockquote>
<h2>Conclusion</h2>
<p>To me, the lesson from this discussion is clear: A virtual platform should be the starting point of a new design, but once you get down to RTL, hardware acceleration is really pretty useful. You need both, and VP should come first, not second. It is not an either-or issue, rather I expect system and chip designers to use both tools, and the only question is what should come first, which I think is naturally the simulation in the form of a virtual platform. That also allows the chip to be set into a system context, which is otherwise pretty hard before silicon arrives, and something that large system integrators are screaming for.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/344"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/344" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/344" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/344/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

