<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Observations from Uppsala &#187; Grant Martin</title>
	<atom:link href="http://jakob.engbloms.se/archives/tag/grant-martin/feed" rel="self" type="application/rss+xml" />
	<link>http://jakob.engbloms.se</link>
	<description>Computer Technology: Simulation, Virtualization, Virtual Platforms, Embedded, Multicore and Multiprocessing (by Jakob Engblom)</description>
	<lastBuildDate>Sun, 29 Jan 2012 19:45:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<image>
    <title>Observations from Uppsala</title>
    <url>http://jakob.engbloms.se/favicon.png</url>
    <link>http://jakob.engbloms.se</link>
    <width>32</width>
    <height>32</height>
    <description>Observations from Uppsala - http://jakob.engbloms.se</description>
    </image>		<item>
		<title>Interrupts and Temporal Decoupling</title>
		<link>http://jakob.engbloms.se/archives/1384?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1384#comments</comments>
		<pubDate>Sun, 27 Feb 2011 21:09:17 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[books]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Grant Martin]]></category>
		<category><![CDATA[interrupt]]></category>
		<category><![CDATA[Temporal decoupling]]></category>
		<category><![CDATA[Tensilica]]></category>
		<category><![CDATA[virtual]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1384</guid>
		<description><![CDATA[I am just finishing off reading the chapters of the Processor and System-on-Chip Simulation book (where I was part of contributing a chapter), and just read through the chapter about the Tensilica instruction-set simulator (ISS) solutions written by Grant Martin, Nenad Nedeljkovic and David Heine. They have a slightly different architecture from most other ISS [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/04/gears1.png"><img class="alignleft size-full wp-image-737" style="margin: 5px;" title="gears1" src="http://jakob.engbloms.se/wp-content/uploads/2009/04/gears1.png" alt="" width="56" height="57" /></a>I am just finishing reading the chapters of the <a href="http://www.springer.com/engineering/circuits+%26+systems/book/978-1-4419-6174-7" target="_self"><em>Processor and System-on-Chip Simulation </em></a>book (to which <a href="http://blogs.windriver.com/engblom/2011/01/processor-and-soc-simulation-book.html">I helped contribute a chapter</a>), and have just read through the chapter about the <a href="http://www.tensilica.com">Tensilica </a>instruction-set simulator (ISS) solutions written by <a href="http://www.chipdesignmag.com/martins/">Grant Martin</a>, Nenad Nedeljkovic and David Heine. They have a slightly different architecture from most other ISS solutions, since they have an inherently variable target in the configurable and extensible Tensilica cores. However, the more interesting part of the chapter was the discussion on system modeling beyond the core, in particular how they deal with interrupts to the core in the context of a <a href="http://jakob.engbloms.se/?s=temporal+decoupling">temporally decoupled </a>simulation.</p>
<p><span id="more-1384"></span>This is a small detail, but one where I have always had a feeling that some fundamental assumption was missing in my discussions with various people from the hardware design community. It always seemed that hardware designers assumed a different basic design &#8211; and Grant Martin explained very well just what that was. They only check for interrupts at the beginning of a time slice. This makes interrupt timing less precise relative to the code being executed, but it also keeps the core interpreter fairly simple, since all it has to do is churn through instructions.</p>
<p>There is another solution, which is employed in Simics, where the processor can take an interrupt at any point in a time quantum. To do this, the processor needs to be aware of what is going to happen. The essence of the solution is to have devices call the processor and tell it that they intend to interrupt it at some point T in time. The processor simulator then makes sure to stop and give the device model a chance to act at that exact point in time. This solution is easily generalized to cover all time callbacks needed to drive device work. A significant part of the responsibility for running the event-driven simulation is thereby moved into the processor core.</p>
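<p>The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not code from Simics or the Tensilica ISS: the names <code>Core</code>, <code>post</code>, <code>run_quantum</code> and <code>taken_at_slice_boundary</code> are invented here, and the toy core executes one instruction per cycle.</p>

```python
import heapq


def taken_at_slice_boundary(interrupt_time, quantum):
    """Cycle at which a core notices an interrupt if it only checks for
    pending interrupts at the start of each time slice."""
    # Round up to the next multiple of the quantum.
    return -(-interrupt_time // quantum) * quantum


class Core:
    """Hypothetical toy core that owns the event queue for its devices."""

    def __init__(self, quantum):
        self.quantum = quantum
        self.now = 0        # current cycle
        self.events = []    # min-heap of (time, sequence number, callback)
        self.posted = 0     # tie-breaker so equal times pop in posting order

    def post(self, time, callback):
        """Called by a device model: 'call me back at exactly this cycle'."""
        heapq.heappush(self.events, (time, self.posted, callback))
        self.posted += 1

    def run_quantum(self):
        """Run one time quantum, stopping at every event that falls inside it."""
        end = self.now + self.quantum
        while self.events and end > self.events[0][0]:
            time, _, callback = heapq.heappop(self.events)
            self.now = max(self.now, time)  # execute instructions up to the event
            callback(self.now)              # the device acts at that exact cycle
        self.now = end                      # finish the rest of the quantum


taken = []
core = Core(quantum=1000)
core.post(1234, taken.append)  # a timer device wants to interrupt at cycle 1234
core.run_quantum()             # cycles 0 to 1000, nothing happens
core.run_quantum()             # cycles 1000 to 2000, stops at 1234 on the way
print(taken)
print(taken_at_slice_boundary(1234, 1000))
```

<p>With a quantum of 1000 cycles, the event-queue core delivers the interrupt at cycle 1234, while the slice-boundary check would not see it until cycle 2000.</p>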
<p>Making the event queue visible to the processor also gives the processor a chance to hypersimulate, or skip idle time. Since it knows the next point in time that something will happen (either the end of a time quantum or an event posted by a device), it can very easily, safely, and <a href="http://blogs.windriver.com/engblom/2010/09/deterministic-but-unpredictable.html">repeatably </a>jump forward in time without any impact on simulation semantics.</p>
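<p>As a rough sketch of the idea (again with invented names, and assuming an idle core whose only possible wake-up points are posted events and the end of the quantum), hypersimulation amounts to replacing a loop over idle cycles with a single jump:</p>

```python
def step_idle(now, wake_time):
    """Reference behaviour: burn idle cycles one at a time until wake-up."""
    steps = 0
    while now != wake_time:
        now += 1
        steps += 1
    return now, steps


def hypersimulate(now, quantum_end, event_times):
    """Jump straight to the next point in time where something can happen.

    Assumes the core is idle, so nothing observable changes in between.
    """
    pending = [t for t in event_times if t >= now]
    return min(pending + [quantum_end])


wake = hypersimulate(100, 10_000, [7_500, 42_000])
print(wake)
print(step_idle(100, wake))
```

<p>Both variants end up at the same point in simulated time; the only difference is how much host work it takes to get there.</p>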
<p>When dealing with multiple processors, this means that each processor will have precise interrupts from the devices that are close to it. Timers and IO interrupts tend to work closely with a certain processor for a prolonged period of time. Interrupts between processors suffer a time-quantum delay sometimes, but that is no worse than the solution of checking all interrupts at time-quantum boundaries.</p>
<p>QEMU uses a solution which is a mix of the two. <a href="http://www.usenix.org/event/usenix05/tech/freenix/full_papers/bellard/bellard.pdf">According to the 2005 Usenix paper</a>, devices do call into the processor to announce an interrupt, but this is handled by &#8220;soon&#8221; returning to the processor main loop. Processors are not responsible for keeping track of interrupts, which makes the point at which an interrupt is taken imprecise and not very repeatable.</p>
<p>Thus, we can see that there are a few different ways to implement interrupts in virtual platforms. Each approach comes from a different tradition and features different trade-offs.</p>
<p>I was a bit surprised by the comment in the Tensilica chapter that only correctly synchronized programs will work on a temporally decoupled simulation. In my experience, temporal decoupling is transparent to software functionality &#8211; all software runs. The perceived timing of operations can be different, and some tightly-coupled code might behave in suboptimal ways, but it certainly runs and works. It even lets you <a href="http://blogs.windriver.com/engblom/2010/06/true-concurrency-is-truly-different-again.html">observe parallel code errors</a>.</p>
<p>Temporal decoupling is necessary in any fast platform, and its effects on semantics are really minor. With the simple tweak of having a processor know when interrupts might happen, it will also not affect the device-processor interface very much, maintaining very tight synchronization between processors and their controlled hardware.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1384/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Adding to Schirrmeister&#8217;s Virtual Platform Myth Busting</title>
		<link>http://jakob.engbloms.se/archives/651?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/651#comments</comments>
		<pubDate>Wed, 18 Feb 2009 12:22:43 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[embedded software]]></category>
		<category><![CDATA[embedded systeme]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[clock-cycle models]]></category>
		<category><![CDATA[cycle accuracy]]></category>
		<category><![CDATA[Eve]]></category>
		<category><![CDATA[Frank Schirrmeister]]></category>
		<category><![CDATA[freescale]]></category>
		<category><![CDATA[Grant Martin]]></category>
		<category><![CDATA[Lauro Ritazzi]]></category>
		<category><![CDATA[p4080]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[software tools]]></category>
		<category><![CDATA[Synopsys]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=651</guid>
		<description><![CDATA[Frank Schirrmeister of Synopsys recently published a blog post called &#8220;Busting Virtual Platform Myths – Part 1: “Virtual Platforms are for application software only”. In it, he is refuting a claim by Eve that virtual platforms are for application-level software-development only, basically claiming that they are mostly for driver and OS development and citing some [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-654" style="margin: 10px;" title="opinion" src="http://jakob.engbloms.se/wp-content/uploads/2009/02/opinion.png" alt="opinion" width="91" height="69" />Frank Schirrmeister of Synopsys recently published a blog post called <a href="http://www.synopsysoc.org/viewfromtop/?p=64#comment-1008">&#8220;Busting Virtual Platform Myths – Part 1: “Virtual Platforms are for application software only”&#8221;</a>. In it, he refutes a claim by Eve that virtual platforms are for application-level software development only, arguing instead that they are mostly used for driver and OS development and citing some Synopsys-Virtio Innovator examples of such uses. In his view, most application software is developed using host-compiled techniques. I want to add to this refutation by pointing out that application software is surely a very important (and large) use case for virtual platforms.</p>
<p><span id="more-651"></span>The beginning of the argument was found in an <a href="http://www.edadesignline.com/howto/212200519">EDA Design Line article titled &#8220;Unified Verification for Hardware and Embedded Software Developers&#8221; </a>by Lauro Ritazzi of Eve USA. In it, he makes the following claim:</p>
<blockquote><p>While some may have achieved the scope of jump-starting software development, they only address application programs that do not require an accurate representation of the underling hardware design. They fall short when testing the interaction of the embedded software with hardware, such as firmware, device drivers, operating systems and diagnostics. For this testing, embedded software developers need an accurate model of the hardware to validate their code, while hardware designers need fairly complete software to fully validate their application specific integrated circuit (ASIC) or SoC.</p></blockquote>
<p>The interesting part here is really that jump-start is just for applications, and that OS and drivers require more details than a fast virtual platform can supply. I do not quite agree with this. But let&#8217;s first see what Frank Schirrmeister said:</p>
<blockquote><p>the majority of the software development on virtual platforms is spent on firmware, device drivers, operating system porting and diagnostics. And that is not &#8211; as one could assume &#8211; on cycle accurate models, but on functionally accurate models with only essential timing, the type of models called loosely timed (LT) in SystemC.</p></blockquote>
<p>I totally agree with this. As is evident from many different <a href="http://www.virtutech.com/casestudies">public use cases</a>, OS, BSP, and driver development is a big use of virtual platforms. For example, last summer, <a href="http://jakob.engbloms.se/archives/137">Freescale announced the QorIQ P4080 with pretty good software support </a>in terms of Linux and VxWorks operating systems, as well as some middleware stacks. All of this was developed on Simics, using an even more timing-abstracted model of the hardware.</p>
<p>However, Frank then makes the following claim that I have a harder time with:</p>
<blockquote><p>In contrast, application software is developed more often than not using completely hardware independent techniques, including cross compilation from the host development machine using development kits like Apple’s iPhone development kit.</p></blockquote>
<p>This is to some extent true, but as time goes on, I think this type of development environment is going to be less useful. Traditionally, OS vendors have had tools like VxSim and OSE SoftKernel in place to help customers &#8220;run code on their desktop&#8221;, while using the API of the operating system of choice. However, such solutions have lots of problems in how close they can get to the target.</p>
<ul>
<li>If you have any kind of third-party binary-only application, or want to use an existing binary component without lots of complex recompilation, you need a virtual platform running the underlying OS. You cannot squeeze that into a host-compiled API simulator.</li>
<li>You are not using the same compiler and code-generation settings and build settings as you are for your actual target, and this can (read: will) introduce nasty compiler version issues.</li>
<li>It forces you to maintain an additional build variant for your code, which can be pretty expensive for a complex build.</li>
<li>You are not using the real OS scheduler, device drivers, and interrupt structure found on the target system. This can have a huge impact, especially for multithreaded multiprocessor systems.</li>
<li>The API simulator needs to be kept in sync with the real software stack, and customized in the same way for any particular target. This is hard to get right (even though it has been done).</li>
<li>The API simulator does not handle heterogeneous systems very well, such as chips or boards or racks mixing two or more different OS kernels in the same system (like a DSP and a main processor OS).</li>
<li>API simulation completely falls apart when the OS is no longer the lowest level of the software stack, but you also have a hypervisor layer underneath the OSes on your target system. An API simulator simply cannot represent this kind of case.</li>
<li>Using a virtual platform and the real target binaries also fits with the very important &#8220;fly what you test, test what you fly&#8221; principle of embedded software development.</li>
</ul>
<p>For various subsets of these reasons, I see many users picking up virtual platforms as a way to streamline application development. For example, <a href="http://www.virtutech.com/news_events/pr/pr2009-02-11-595.html">NASA recently selected a virtual platform based on Simics </a>to develop the software for the new Orion spacecraft. That is going to be a complete software stack, not just OS and drivers, which tend to be fairly off-the-shelf components for these kinds of systems. Most of the effort is on the application level, and the platform used is a virtual platform.</p>
<p>However, note that there are cases where a fast virtual platform like we are discussing here is not sufficient to validate all aspects of the code. I think the main reason we see different viewpoints on this is that we are looking at very different types of software-hardware integration.</p>
<p>In a <a href="http://jakob.engbloms.se/archives/153">blog post I wrote last year on the dead-ness of cycle-accurate simulation</a>, Grant Martin of Tensilica pointed out that <a href="http://jakob.engbloms.se/archives/153#comment-1652">some software desperately needs cycle-accuracy </a>as it is intimately dependent on the timing of the hardware. This is certainly true for some aspects of drivers, and more so for the really early boot code.</p>
<p>Here, FPGA-based hardware-accelerated simulation of the actual design in VHDL or Verilog makes eminent sense as a way to get the details perfectly right. But that is only one part of a much greater system development puzzle, and it really only applies to very small subsystems, as it is kind of hard to fit much more than a single chip inside a hardware acceleration unit. Just as Frank Schirrmeister says, hardware-accelerated simulation is very important. The nice article on the <a href="http://jakob.engbloms.se/archives/639">IBM z10 development </a>that I blogged about earlier says exactly that: for some parts of the validation, there is no way around using the actual hardware RTL design.</p>
<p>And in the end, you have to test the timing and analogue aspects of a design on physical hardware anyway. There should not be too many surprises at this stage, if you have used all of the cool current tools right. But there surely will be some: even a VHDL simulation is a simulation, and not reality, after all.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/651/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Grant Martin on the &#8220;Verification is 70% of the Effort&#8221; Claim</title>
		<link>http://jakob.engbloms.se/archives/361?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/361#comments</comments>
		<pubDate>Sun, 30 Nov 2008 08:16:36 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[blog commentary]]></category>
		<category><![CDATA[Grant Martin]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=361</guid>
		<description><![CDATA[Over at Taken for Granted, Grant Martin just did a very good write-up on the &#8220;accepted fact&#8221; that verification is seventy percent of a chip design effort. It is not exactly easy to prove this point, but is it really just an urban myth that has gained credibility by being repeated over and over again? [...]]]></description>
			<content:encoded><![CDATA[<p>Over at Taken for Granted, Grant Martin just did a very good write-up on <a href="http://www.chipdesignmag.com/martins/2008/11/27/the-myths-of-eda-the-70-rule/">the &#8220;accepted fact&#8221; that verification is seventy percent of a chip design effort</a>. It is not exactly easy to prove this point, but is it really just an urban myth that has gained credibility by being repeated over and over again?</p>
<p>Go over there to see what he has to say.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/361/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Book review: Taxonomies for the &#8230; Digital Systems</title>
		<link>http://jakob.engbloms.se/archives/175?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/175#comments</comments>
		<pubDate>Sat, 26 Jul 2008 18:49:58 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[books]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Brian Bailey]]></category>
		<category><![CDATA[Grant Martin]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[simulation]]></category>
		<category><![CDATA[Thomas Andersson]]></category>
		<category><![CDATA[VSIA]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=175</guid>
		<description><![CDATA[The book &#8220;Taxonomies for the Development and Verification of Digital Systems&#8221;, edited by Brian Bailey, Grant Martin, and Thomas Andersson, was published in 2005 by Springer Verlag. It is a legacy of the defunct VSIA, and presents an attempt to bring order to nomenclature and taxonomies in the chip design field (its scope is defined [...]]]></description>
			<content:encoded><![CDATA[<p class="firstHeading"><a href="http://jakob.engbloms.se/wp-content/uploads/2008/07/taxonomies-cover.png"><img class="alignleft size-medium wp-image-176" style="margin: 5px 10px;" title="taxonomies-cover" src="http://jakob.engbloms.se/wp-content/uploads/2008/07/taxonomies-cover-192x300.png" alt="" width="96" height="150" /></a>The book &#8220;<a href="http://www.springer.com/engineering/circuits+%26+systems/book/978-0-387-24019-0">Taxonomies for the Development and Verification of Digital Systems</a>&#8221;, edited by Brian Bailey, Grant Martin, and Thomas Andersson, was published in 2005 by Springer Verlag. It is a legacy of the <a href="http://vsi.org/">defunct VSIA</a>, and presents an attempt to bring order to nomenclature and taxonomies in the chip design field (its scope is defined to be broader than that, but in essence, the book is about SoC design for the most part).</p>
<p class="firstHeading"><span id="more-175"></span></p>
<p class="firstHeading">The book is obviously a collection of previous work, and the style is quite inconsistent from section to section. Not so much that it detracts from the value of the information, but it does feel a bit rushed.</p>
<p class="firstHeading">The book presents four sets of taxonomies:</p>
<ul>
<li>Models, which is by far the richest part. In addition to a definition of terms in the field of building models of digital systems, it also offers a detailed classification scheme for the models. The classification scheme uses five &#8220;resolution&#8221; axes along with an external/internal perspective to define what a model contains and at what level of resolution.</li>
<li>Functional Verification, which is really just a collection of terms.</li>
<li>Platform-Based Design, which is more of a marketing discussion in how to define and design platforms. Also mostly a definition of terms.</li>
<li>Hardware-Dependent Software (HdS), which defines terms. It has some attempts to classify software along some axes, but it does not really work out too well.</li>
</ul>
<p class="firstHeading">The main value is really in the second chapter, where it does provide a decent basis for discussing the level of abstraction at which models are created (note that accuracy is different from abstraction: a very detailed model at a very low level of abstraction can be totally off when considered from an accuracy perspective).</p>
<p class="firstHeading">However, I fear that the impact of this work has been fairly limited. No discussion on modeling that I have been participating in has really gone back to this basis and worked from there. I think a key problem here is that the material is only available as a quite expensive book from Springer, rather than as a freely downloadable document on the web. It is clear to me that spreading ideas today depends on free and easy digital access to the information&#8230; that&#8217;s why I have my own publications page up to make stuff that I have created available for reading.</p>
<p class="firstHeading">The hardware-dependent software (HdS) section gave rise to several &#8220;but they forgot X&#8221; comments on my part. For example, it is missing important aspects like processor virtualization, hypervisors, and IO virtualization (which totally change the game on HdS). Also, SMP operating systems are only given a passing reference, with the focus on uniprocessors. And why do people keep using the term &#8220;<a href="http://en.wikipedia.org/wiki/Rate-monotonic_scheduling">Rate-Monotonic Analysis</a>&#8221;? There are so many much more modern fixed-priority scheduling analysis theories, methods, and tools available that RMA is like talking about programming in assembler&#8230;</p>
<p class="firstHeading">What a choice for summer vacation reading&#8230;  and thanks to Bart Vanhournout at CoWare for the tip about the book in a discussion trying to pin down abstraction levels.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/175/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Is Cycle Accuracy a bad Idea?</title>
		<link>http://jakob.engbloms.se/archives/153?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/153#comments</comments>
		<pubDate>Fri, 11 Jul 2008 20:45:02 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[AMD]]></category>
		<category><![CDATA[ARM]]></category>
		<category><![CDATA[Axys]]></category>
		<category><![CDATA[Carbon Technology]]></category>
		<category><![CDATA[clock-cycle models]]></category>
		<category><![CDATA[CoWare]]></category>
		<category><![CDATA[cycle accuracy]]></category>
		<category><![CDATA[DEC]]></category>
		<category><![CDATA[Grant Martin]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Infineon]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[Modeling]]></category>
		<category><![CDATA[rtl]]></category>
		<category><![CDATA[scdsource]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=153</guid>
		<description><![CDATA[In a funny coincidence, I published an article at SCDSource.com about the need for cycle-accurate models for virtual platforms on the same day that ARM announced that they were selling their cycle-accurate simulators and associated tool chain to Carbon Technology. That makes one wonder where cycle-accuracy is going, or whether it is a valid idea [...]]]></description>
			<content:encoded><![CDATA[<p>In a funny coincidence, I published an article at SCDSource.com about the need for cycle-accurate models for virtual platforms on the same day that ARM announced that they were selling their cycle-accurate simulators and associated tool chain to Carbon Technology. That makes one wonder where cycle-accuracy is going, or whether it is a valid idea at all&#8230; is ARM right or am I right, or are we both right since we are talking about different things?</p>
<p>Let&#8217;s look at this in more detail.</p>
<p><span id="more-153"></span></p>
<h2>Definitions</h2>
<p>A clock-cycle (CC) model in this discussion is something that attempts to provide a cycle-by-cycle depiction of the behavior of a computer system. Usually, such models are driven by a cycle-by-cycle clock, as that is the easiest way to write and structure them.<br />
A cycle-accurate (CA) model is a CC model where the depiction is &#8220;the same&#8221; as what would happen in the real system provided they both started from the same state. </p>
<h2>What is ARM Doing?</h2>
<p>ARM seems to be passing on the tools and technologies they acquired when they bought Axys back in 2004. These tools are CC-oriented, and are aimed at hardware architects (and some really low-level software work). They make it possible to evolve a target design cycle by cycle in the simulator to get a very accurate picture of the target behavior. I think this fits Carbon very well, as they generate very accurate cycle-driven models by essentially compiling the actual RTL implementation of a piece of logic, processor, or device into something a bit faster than plain HDL simulation. Carbon models are a natural fit for the Axys tools.</p>
<p>Basically, it sounds as if ARM decided that manually creating CC-level CA models of their latest processors for use in the Axys tools (SoC Designer) was too much work and too hard to validate. Thus, they pass the whole thing on to Carbon and seem to expect Carbon to generate CA models for use with SoC Designer straight from the actual ARM implementation RTL. Carbon will have the old CC/CA models written by Axys (and later ARM), and will then generate new models for new generations of ARM chips like the Cortex-A9. I quote:</p>
<blockquote><p>&#8220;The model generation flow will be optimized and validated using the RTL code, ensuring speed and accuracy. The processor models will also leverage the Carbon model application programming interface (API) to offer a direct connection to the ARM RealView(R) Debugger. Carbon-generated models of ARM IP will offer our customers the fastest, most-accurate path for firmware development and architectural exploration.&#8221; (<a href="http://www.carbondesignsystems.com/Press/20080707%20Carbon%20Press%20Release.pdf">press release</a>)</p></blockquote>
<p>And:</p>
<blockquote><p>ARM made this decision, Cornish said, because it&#8217;s become increasingly difficult and time-consuming to develop cycle-accurate models. &#8220;We recognized it would make more sense to work with a specialist like Carbon that has technology for generating models directly from RTL,&#8221; he said. (<a href="http://www.scdsource.com/article.php?id=264">SCDSource News Piece on the deal</a>)</p></blockquote>
<h2>Feasibility of Construction</h2>
<p>The core argument here is really how easy or feasible it is to build CA models of a processor core (or any other really complex piece of logic). There are several interesting views to consider.</p>
<ol>
<li>The ARM statement is basically saying that building CA models of a processor core is very hard. It is hard to get right, hard to validate, and hard to maintain. So why even try? Better to generate the model from the RTL and let the experts in doing that do the work.</li>
<li>In my PhD thesis from 2002, I concluded that building an accurate model of a processor from public information and reverse engineering is very, very difficult, and cited a number of attempts from computer architecture and real-time systems research to build models that all turned out to have accuracy issues. I did not know much about EDA then, and ESL did not really exist. But I think the conclusion still holds water: constructing a model of a processor is hard.</li>
<li>In the SCDSource article, I make the statement that &#8220;Building cycle-accurate (CA) models is very difficult, as you need to understand and describe the implementation details of complicated hardware units. &#8230; It is quite easy to end up with something that is essentially an alternative implementation to the actual chip RTL. It is especially difficult for third parties, as it requires access to the device and processor core designers to explain the design.&#8221; This is essentially saying that you need to get inside the processor design group to get the information.</li>
<li>It is common knowledge that all great processor design teams, from the DEC Alpha to Intel x86s to AMD Opterons to IBM Power to Freescale Power to Infineon TriCore to Sun Niagara, use internal cycle-detailed simulators as their main design tools to prototype and decide how to design pipelines, memory systems, and system platforms. In this case, the simulator comes before the processor, not the other way around.</li>
<li>Tensilica has, as Grant Martin points out in comments at <a href="http://www.scdsource.com/article.php?id=266">SCDSource</a>, tools that generate both the processor and an accurate model at the same time from the same information base.</li>
<li>CoWare&#8217;s LISATek tools for describing and generating application-specific processors also claim to generate accurate models from the LISA source files, in a way similar to Tensilica. The difference is that a LISATek user describes a completely custom design in a third-party tool, while in the case of Tensilica, the tool and the design come from the same company.</li>
</ol>
<p>So where does this leave us? It makes it clear that in order to build a good cycle-accurate model, you need access to internal information and to the processor design or its design team. The CA model can be built in one of four ways:</p>
<ol>
<li>By synthesizing from the RTL, Carbon-style.</li>
<li>By synthesizing from some more abstract design description, Tensilica or LISATek-style.</li>
<li>By the design team as part of the design process.</li>
<li>By some poor guy working after the fact from specs and test cases.</li>
</ol>
<p>I think the ARM-Carbon deal (and all practical experience as well) invalidates the fourth variant. Essentially, that is what Axys had to do: build models after the fact, separate from the CPU design flow. This is a consequence of how ARM designs processors and of the fact that Axys began life outside of ARM (my guess, nota bene). It is what computer architecture researchers often want to do, but they fail at it over and over again. In fact, a common question from computer architecture newbies is whether Virtutech Simics has correct models of processors like the Intel Pentium 4 or Core 2 available to use as starting points in research. It would be nice, but sorry, we do not.</p>
<p>But the other three variants do make sense, and will all result in some kind of decent model. Which one you end up doing depends on the style of your design and quite likely on the complexity of the processor and system design. In the end, any truly revolutionary design (think Sun Rock, for example) will need a custom-written simulator, as existing tools will not have the concepts needed to model all of its ideas. It seems that simple &#8220;standard&#8221; designs that fit in the categories of &#8220;custom RISC&#8221; or &#8220;custom DSP&#8221; and that do not break new ground in computer architecture can probably be designed using tools that allow processor and simulator generation. I think that most heavy-duty general-purpose processor cores will have to take either the design-team model path or the RTL-generation path, while more accelerator-style cores can use the tools approach.</p>
<p>As a final note, there could really be two different problems being addressed here regarding &#8220;cycle accuracy&#8221;, and this might contribute to different levels of feasibility:</p>
<ul>
<li>Using the simulator to validate and optimize software performance can tolerate some errors in details as long as errors do not accumulate (see for example the &#8220;<a href="http://moss.csc.ncsu.edu/~mueller/wcet06/accepted/5.html">timing anomalies</a>&#8221; or &#8220;<a href="http://www-emsoft02.imag.fr/Programme/Engblom.pdf">unbounded long timing effects</a>&#8221; found in WCET research). It is about understanding the software behavior versus the processor design (or complex accelerator design versus input data), in small focused spots of execution.</li>
<li>Using the simulator to validate a chip design including buses and other devices that can be bus masters. This ought to require a higher level of accuracy, as the penalty for errors would seem to be greater. This is also where ARM&#8217;s SoC Designer fits in, rather than as a tool to understand the software behavior. The scope here is larger, and there is usually no idea of zooming in on detail at particular points in time.</li>
</ul>
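<p>The condition that errors must not accumulate can be made concrete with a small sketch. Everything in it is an assumption for illustration: the per-block cycle counts are made-up numbers and <code>max_drift</code> is a hypothetical helper. The real timing effects studied in WCET research are far subtler than this. The sketch only shows why two models with the same per-block error can differ greatly in usefulness.</p>

```python
from itertools import accumulate


def max_drift(reference_cycles, model_cycles):
    """Largest absolute gap between cumulative cycle counts at any block boundary."""
    errors = [m - r for r, m in zip(reference_cycles, model_cycles)]
    return max(abs(total) for total in accumulate(errors))


# Hypothetical per-basic-block cycle counts from a reference (for example RTL) run.
reference = [10, 12, 10, 12, 10, 12, 10, 12]

# Model A is off by one cycle in every block, but the errors cancel over time.
model_a = [11, 11, 11, 11, 11, 11, 11, 11]

# Model B is also off by one cycle per block, but always in the same direction.
model_b = [11, 13, 11, 13, 11, 13, 11, 13]

drift_a = max_drift(reference, model_a)
drift_b = max_drift(reference, model_b)

print(drift_a, drift_b)
```

<p>Model A never drifts more than one cycle from the reference, so it remains usable for software performance work however long the run is. Model B drifts further with every block, so its error grows with the length of the run.</p>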
<p>So where does this land us?</p>
<p>I guess that CC/CA models can be built if you have a nice inside track to the design team, and that the only sensible way to use them is as a zoom device for the places in your code where you absolutely need the details. Most of the time (say 90-95-99%) software does not need CC models, but rather something that is functionally accurate and that runs really really fast so that all software can at least be executed. That is something a CC model will never be able to do, at least not for systems using non-trivial operating systems requiring a few billion instructions just to boot&#8230;</p>
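<p>A back-of-the-envelope calculation shows why the zoom approach matters. This is a rough sketch under loudly stated assumptions: the simulation speeds below are illustrative placeholders rather than measurements of any particular simulator, and <code>wall_clock_seconds</code> is a hypothetical helper.</p>

```python
def wall_clock_seconds(instructions, detailed_fraction, fast_ips, detailed_ips):
    """Estimated host time when a fraction of the run uses the detailed (CC) model."""
    detailed = instructions * detailed_fraction
    fast = instructions - detailed
    return fast / fast_ips + detailed / detailed_ips


# All figures below are illustrative assumptions, not measurements.
BOOT_INSTRUCTIONS = 2_000_000_000  # "a few billion instructions just to boot"
FAST_IPS = 100_000_000             # assumed functional-model speed (instructions/s)
DETAILED_IPS = 100_000             # assumed cycle-level model speed (instructions/s)

all_detailed = wall_clock_seconds(BOOT_INSTRUCTIONS, 1.0, FAST_IPS, DETAILED_IPS)
all_fast = wall_clock_seconds(BOOT_INSTRUCTIONS, 0.0, FAST_IPS, DETAILED_IPS)
zoomed = wall_clock_seconds(BOOT_INSTRUCTIONS, 0.01, FAST_IPS, DETAILED_IPS)

print(f"all detailed: {all_detailed / 3600:.1f} hours")
print(f"all fast:     {all_fast:.0f} seconds")
print(f"1% detailed:  {zoomed / 60:.1f} minutes")
```

<p>With these assumed speeds, booting entirely on the detailed model takes several hours, the functional model takes about twenty seconds, and running just 1% of the instructions in detail stays within a few minutes. The exact numbers will differ for any real setup, but the ratio between the two model speeds is what drives the conclusion.</p>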
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/153/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

