<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Observations from Uppsala &#187; ESL</title>
	<atom:link href="http://jakob.engbloms.se/archives/category/eda/esl/feed" rel="self" type="application/rss+xml" />
	<link>http://jakob.engbloms.se</link>
	<description>Computer Technology: Simulation, Virtualization, Virtual Platforms, Embedded, Multicore and Multiprocessing (by Jakob Engblom)</description>
	<lastBuildDate>Tue, 27 Jul 2010 19:57:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
<image>
    <title>Observations from Uppsala</title>
    <url>http://jakob.engbloms.se/favicon.png</url>
    <link>http://jakob.engbloms.se</link>
    <width>32</width>
    <height>32</height>
    <description>Observations from Uppsala - http://jakob.engbloms.se</description>
    </image>		<item>
		<title>Describe is not the same as Design</title>
		<link>http://jakob.engbloms.se/archives/1083?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1083#comments</comments>
		<pubDate>Mon, 15 Feb 2010 20:56:41 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[DML]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1083</guid>
		<description><![CDATA[The discussion on my previous blog post about &#8220;the ideal ESL language&#8221; made me think some more about the purpose of a hardware modeling or description language. If you look closely, you realize that there are two quite different goals being pursued by the tools and languages discussed there. On one hand, we have the [...]]]></description>
			<content:encoded><![CDATA[<p>The discussion on my previous blog post about &#8220;<a href="http://jakob.engbloms.se/archives/1008">the ideal ESL language</a>&#8221; made me think some more about the purpose of a hardware modeling or description language. If you look closely, you realize that there are two quite different goals being pursued by the tools and languages discussed there.</p>
<p>On one hand, we have the task of supporting the design of new hardware, for the purpose of building it. On the other hand, we have the task of describing a particular design for the purpose of simulating it. These two are not necessarily the same.</p>
<p><span id="more-1083"></span>To use an <a href="http://jakob.engbloms.se/archives/1035">analogy with building a house</a>, a design language helps the architect create the house (piece of hardware). Since the architect relies on craftsmen and experts (compilers) to do detailed design (how to put in windows, where to put light switches, etc.), the high-level description does not contain all the details of the house. However, if you are trying to simulate the house (piece of hardware) so that its inhabitants (software) cannot tell it from the real thing, the details are sometimes what matters most. For example, the precise way to operate the stove in the house is very important for familiarity, but is a detail most likely left out of the architect&#8217;s initial drawings.</p>
<p>A design language can leave many things unspecified to be filled in by a compiler, but these things can be absolutely core to a description language. In particular, programming register maps tend to be created as a not-too-important side activity in hardware design. They do not really need to be visible in higher-level ESL languages, as they can obviously be filled in later by a tool or a human. But for a description language, they are absolutely core.</p>
<p>A description language can also leave out many parts of the hardware. If the software being used or written does not use certain modes or functions of a piece of hardware, those pieces can be ignored and implemented as dummies. That means that support for dummies is very important in description languages. But dummies make little sense in a design language, as you are unlikely to design a chip with lots of area spent on dummy functions that do nothing.</p>
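<p>As a rough illustration of why dummies matter when describing hardware, here is a minimal Python sketch. It is not DML and not how any particular tool does it; the class names, register names, and offsets are all invented for the example. The one register the software really depends on gets behavior, while the unused ones are dummies that simply remember what was written.</p>

```python
class Register:
    """A register with real behavior: writes are passed to a handler."""

    def __init__(self, name, on_write=None):
        self.name = name
        self.value = 0
        self.on_write = on_write

    def write(self, value):
        self.value = value & 0xFFFFFFFF
        if self.on_write:
            self.on_write(self.value)

    def read(self):
        return self.value


class DummyRegister(Register):
    """A dummy: stores the written value but has no side effects."""

    def __init__(self, name):
        super().__init__(name)
        self.accesses = 0  # counted so we notice if software starts using it

    def write(self, value):
        self.accesses += 1
        super().write(value)


class Device:
    """Hypothetical device with one modeled register and two dummies."""

    def __init__(self):
        self.enabled = False
        self.bank = {
            0x00: Register("ctrl", on_write=self._ctrl_written),
            0x04: DummyRegister("power_mode"),
            0x08: DummyRegister("test_mode"),
        }

    def _ctrl_written(self, value):
        self.enabled = bool(value & 1)

    def write(self, offset, value):
        self.bank[offset].write(value)

    def read(self, offset):
        return self.bank[offset].read()
```

<p>Software that writes <code>power_mode</code> and reads it back sees a plausible device, even though nothing is modeled behind that register.</p>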
<p>A description language can also ignore crucial aspects like power constraints and synthesis constraints. These are guidelines for a compilation step that has no bearing on the description of the hardware: the description language should describe what ended up happening, not the ifs, pleases, whats, and buts that guided how we got there.</p>
<p>For virtual platform creation, you seem to need a bit of both. I maintain that most of a VP is based on old hardware that already exists, which calls for languages with strong description abilities. That&#8217;s the space that <a href="http://jakob.engbloms.se/archives/99">Simics DML</a> was designed for. For the small part of the hardware that is novel, it would be nice to have some way to convert from a design language to a virtual platform. Here, I don&#8217;t really see any usable current tools or languages. SystemC is really more of a design language, but if you want a virtual platform model, you have to use it as a description language. There is no automagic way of getting to a fast abstract model from a design-oriented description. That&#8217;s why we need new, higher-level systems that can push out decent descriptions from a design.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1083/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Neat Register Design to Avoid Races</title>
		<link>http://jakob.engbloms.se/archives/1070?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1070#comments</comments>
		<pubDate>Thu, 28 Jan 2010 18:59:53 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[ESL]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[embedded systeme]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[64-bit computing]]></category>
		<category><![CDATA[device driver]]></category>
		<category><![CDATA[Gary Stringham]]></category>
		<category><![CDATA[high-level synthesis]]></category>
		<category><![CDATA[programming register]]></category>
		<category><![CDATA[race condition]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1070</guid>
		<description><![CDATA[In his most recent Embedded Bridge Newsletter, Gary Stringham describes a solution to a common read-modify-write race-condition hazard on device registers accessed by multiple software units in parallel. Some of the solutions are really neat! I have seen the &#8220;write 1 clears&#8221; solution before in real hardware, but I was not aware of the other [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-589" style="margin: 5px 10px;" title="racecondition" src="http://jakob.engbloms.se/wp-content/uploads/2008/01/racecondition.png" alt="racecondition" width="99" height="78" />In his most recent <a href="http://garystringham.com/newsletter.shtml?nid=039">Embedded Bridge Newsletter</a>, Gary Stringham describes a solution to a common read-modify-write race-condition hazard on device registers accessed by multiple software units in parallel. Some of the solutions are really neat!</p>
<p>I have seen the &#8220;write 1 clears&#8221; solution before in real hardware, but I was not aware of the other two variants. The idea of having a &#8220;write mask&#8221; in one half of a 32-bit word is really clever.</p>
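<p>To make these ideas concrete, here is a small Python sketch of how such registers could behave. The layout is my assumption for the example (the upper 16 bits of a write select which of the lower 16 bits get updated); the newsletter describes the actual variants. A write-1-to-clear status register is included for comparison.</p>

```python
class MaskedWriteRegister:
    """16 bits of settings; the upper half of each 32-bit write is a mask.

    Assumed layout for this sketch: bits 31..16 select which of bits 15..0
    the write may change. Unselected bits keep their old value, so software
    never has to read the register before writing it.
    """

    def __init__(self):
        self.value = 0

    def write(self, word):
        mask = (word >> 16) & 0xFFFF
        data = word & 0xFFFF
        self.value = (self.value & ~mask & 0xFFFF) | (data & mask)

    def read(self):
        return self.value


class WriteOneToClearRegister:
    """Status bits are set by hardware; software writes 1 to clear a bit."""

    def __init__(self):
        self.value = 0

    def hardware_set(self, bits):
        self.value |= bits & 0xFFFF

    def write(self, word):
        self.value &= ~word & 0xFFFF

    def read(self):
        return self.value


def set_field(reg, bit, on):
    """Update one bit with a single write; no read-modify-write needed."""
    mask = 1 << bit
    reg.write((mask << 16) | (mask if on else 0))
```

<p>Two software units calling <code>set_field</code> on different bits can no longer overwrite each other's settings, because no write ever carries a stale copy of the other bits.</p>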
<p>However, this got me thinking about what the fundamental issue here really is.</p>
<p><span id="more-1070"></span></p>
<p>As I see it, it is the fact that the processor cannot address small enough units atomically. The <a href="http://garystringham.com/newsletter.shtml?nid=037">read-modify-write that was used to start the discussion in the Embedded Bridge #37</a> was needed in order to get the current state of a configuration register, change some setting that only occupied a few bits in it, and write back the result to the register. That is the way most configuration registers that I have seen in practice work.</p>
<p>But if each setting could be given its own register, the problem would go away. Each operation would target a unique address, achieving the same effect as the bit-wise masks or write-1 solutions proposed. The core problem is that hardware tends to pack several settings into each register, as it has been considered too expensive to put information that might cover a range as small as [0,1] into a 32-bit register. Probably, since register addresses have been in short supply, you could not let a simple device with 1000 settings use up 1000 words of physical address space.</p>
<p>But is that really an issue, if we look forward?</p>
<p>It seems to me that, as 64-bit instruction sets and addressing systems penetrate down into more and more embedded systems, a simple solution would be to throw address space at the problem. I don&#8217;t think it is uneconomical to allocate huge chunks of memory space to each device, giving each setting its own register, when you have 64-bit virtual addresses to work with. There is no way you can fill up a physical memory system (I guess that will some day come back to haunt me)&#8230; even the highest-end machines today only use something like 40 bits for actually addressing physical memories.</p>
<p>The software would be simpler and more robust, with virtually no cost.</p>
<p>Another solution that I have also seen starting to appear is to dispense with register settings altogether, and instead define a command API that the processor &#8220;calls&#8221; by putting command packets into some memory area. This does require quite a bit of silicon for a decoder, but it provides for a much higher level of interaction with devices. As hardware devices get defined in successively higher-level languages (C, C++, UML, MATLAB, &#8230;), and <a href="http://jakob.engbloms.se/archives/871">their programming interfaces and associated drivers get autogenerated</a>, this solution makes eminent sense.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1070/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The System, Not the Parts</title>
		<link>http://jakob.engbloms.se/archives/1035?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1035#comments</comments>
		<pubDate>Sat, 19 Dec 2009 19:38:22 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[ESL]]></category>
		<category><![CDATA[business issues]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Peter Day]]></category>
		<category><![CDATA[podcast commentary]]></category>
		<category><![CDATA[Russel Ackoff]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1035</guid>
		<description><![CDATA[I just listened to the November 16, 2009, issue of the BBC podcast called &#8220;Peter Day&#8217;s World of Business&#8221;. It is a rerun (in memoriam) of an interview with business professor Russell Ackoff, which was originally published in 2007. The main theme of the interview is the need to shift business thinking from small details [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/12/300x300.jpg"><img class="alignleft size-full wp-image-1036" style="margin: 10px;" title="Peter days world of business" src="http://jakob.engbloms.se/wp-content/uploads/2009/12/300x300.jpg" alt="Peter days world of business" width="100" height="100" /></a>I just listened to the November 16, 2009, issue of the BBC podcast called &#8220;<a href="http://www.bbc.co.uk/podcasts/series/worldbiz/">Peter Day&#8217;s World of Business</a>&#8221;. It is a rerun (in memoriam) of an interview with business professor Russell Ackoff, which was <a href="http://news.bbc.co.uk/2/hi/business/6338527.stm ">originally published in 2007</a>.</p>
<p><span id="more-1035"></span></p>
<p>The main theme of the interview is the need to shift business thinking from small details to entire systems: from operations research, where you spend lots of time understanding some process or department in great detail, to system-level thinking, where you focus on what an entire enterprise is doing.</p>
<p>For me, this struck a chord in my system-level heart&#8230; in my world of computer systems and virtual platforms, the system level is where it is so hard to get engineers to go. Far too much time is spent (in my opinion) understanding, modeling, and tweaking subsystems. Far too little effort is spent on understanding the whole and how things fit together in practice, taking software, hardware, and software system evolution over time into account. The analogy is not perfect, but there are more things that are alike than are not.</p>
<p>The most interesting analysis that Russell Ackoff fires off from his perspective is that of comparing companies and architecture. An architect knows the whole of a building, but does not entirely go into details on just how it is to be built. He/she trusts the carpenters, bricklayers, and other workers to know how best to solve their local problems. Basically, this applies hierarchical abstraction to the task of constructing an actual building.</p>
<p>This got me thinking about why this is the case. I think it could be because building things (castles, cathedrals, houses, walls, pyramids, canals, &#8230;) must have been among the most complex tasks undertaken for a very long time in human history. Thanks to this long history, we have perfected the abstraction and division of labor in construction. Buildings are built in a certain way, by a certain set of crafts, since that method has been proven to work well for a very long time. So just like in the case of the design patterns craze in the late 1990s, architecture might have something to teach us about how to build hardware/software systems too.</p>
<p>Note that for some reason, I cannot find a link to the podcast on the BBC homepage. But if you subscribe in iTunes or similar, I think you will find it. Something is not as user-friendly as it could be.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1035/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Checkpointing in SystemC @ FDL</title>
		<link>http://jakob.engbloms.se/archives/880?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/880#comments</comments>
		<pubDate>Sat, 08 Aug 2009 19:48:26 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[ESL]]></category>
		<category><![CDATA[appearances]]></category>
		<category><![CDATA[articles]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Checkpointing]]></category>
		<category><![CDATA[FDL]]></category>
		<category><![CDATA[GreenSocs]]></category>
		<category><![CDATA[Marius Monton]]></category>
		<category><![CDATA[Mark Burton]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[SystemC]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=880</guid>
		<description><![CDATA[Along with Marius Monton and Mark Burton of GreenSocs, I will be presenting a paper on checkpointing and SystemC at the FDL, Forum on Specification and Design Languages, in late September 2009. The paper will explain how we did Simics-style checkpointing in SystemC, using the GreenSocs GreenConfig mechanisms to obtain an approximation for the Simics [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-881" style="margin: 5px;" title="fdllogosmall" src="http://jakob.engbloms.se/wp-content/uploads/2009/08/fdllogosmall.jpg" alt="fdllogosmall" width="80" height="79" />Along with Marius Monton and Mark Burton of <a href="http://www.greensocs.com">GreenSocs</a>, I will be presenting a paper on <a href="http://jakob.engbloms.se/archives/714">checkpointing</a> and <a href="http://www.systemc.org">SystemC</a> at the FDL, <a href="http://www.ecsi-association.org/ecsi/fdl/fdl09/mainpage.asp?fn=advance">Forum on Specification and Design Languages</a>, in late September 2009.</p>
<p>The paper will explain how we did <a href="http://www.virtutech.com/whitepapers/simics_checkpointing.html">Simics-style checkpointing</a> in SystemC, using the GreenSocs GreenConfig mechanisms to obtain an approximation for the Simics attribute system.</p>
<p><span id="more-880"></span>It is an approach that does not have the limitations of the &#8220;save the entire simulation process&#8221; method employed by Cadence (and I think also CoWare) in their <a href="http://jakob.engbloms.se/archives/817">SystemC checkpointing solution</a>. It does require you to mark all relevant state in your models, but the benefit from doing so is that regardless of how you change the code of a model, you can still use the same old checkpoints. It is also portable across hosts. We did have to do some patching to the OSCI SystemC kernel to draw out and reset all relevant state from the kernel. The OSCI kernel does not provide sufficient interfaces to checkpoint its state in its vanilla form.</p>
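<p>The core idea, marking state explicitly so that old checkpoints survive code changes, can be sketched in a few lines of Python. This only illustrates the principle; it does not use the Simics attribute system, GreenConfig, or SystemC, and every name in it is invented for the example.</p>

```python
import json


class Checkpointable:
    """Base class: subclasses list the state that matters in ATTRIBUTES."""

    ATTRIBUTES = ()

    def save(self):
        return {name: getattr(self, name) for name in self.ATTRIBUTES}

    def restore(self, state):
        # Attributes missing from an old checkpoint keep their defaults.
        for name in self.ATTRIBUTES:
            if name in state:
                setattr(self, name, state[name])


class TimerV1(Checkpointable):
    ATTRIBUTES = ("count", "reload")

    def __init__(self):
        self.count = 0
        self.reload = 100


class TimerV2(Checkpointable):
    """A later revision: new internals, same named attributes plus one."""

    ATTRIBUTES = ("count", "reload", "prescaler")

    def __init__(self):
        self.count = 0
        self.reload = 100
        self.prescaler = 1   # new state, not present in old checkpoints
        self._cache = {}     # derived data, deliberately not checkpointed


def write_checkpoint(model):
    """Serialize the marked state as text, independent of the host."""
    return json.dumps(model.save(), sort_keys=True)


def read_checkpoint(model, text):
    model.restore(json.loads(text))
```

<p>A checkpoint written by <code>TimerV1</code> can be loaded into <code>TimerV2</code>, because the checkpoint contains named values and not an image of the process memory.</p>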
<p>The conference takes place from September 22 to 24, in Sophia Antipolis in France. Now all I have to do is figure out how to get there in the most convenient way. I expect this to be as much fun as the other EDA conferences I have been to recently (I seem to only go to such events nowadays, nothing left on the old embedded circuit for me it seems).</p>
<p>By the way, the FDL logo is really pretty. I think all long-running events should spend the time to create a recognizable logo. My old real-time conferences used to just have plain text and the <a href="http://www.ieee.org">IEEE</a> and <a href="http://www.acm.org">ACM</a> logos.</p>
<p><img class="aligncenter size-full wp-image-882" title="fdl_logo_new" src="http://jakob.engbloms.se/wp-content/uploads/2009/08/fdl_logo_new.jpg" alt="fdl_logo_new" width="435" height="159" /></p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/880/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Conquering Software with Software High-Level Synthesis</title>
		<link>http://jakob.engbloms.se/archives/871?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/871#comments</comments>
		<pubDate>Fri, 31 Jul 2009 22:12:50 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[DAC]]></category>
		<category><![CDATA[device driver]]></category>
		<category><![CDATA[high-level synthesis]]></category>
		<category><![CDATA[Kees Vissers]]></category>
		<category><![CDATA[Xilinx]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=871</guid>
		<description><![CDATA[This post is a follow-up to the DAC panel discussion we had yesterday on how to conquer hardware-dependent software development. Most of the panel turned into a very useful dialogue on virtual platforms and how they are created, not really discussing how to actually use them for easing low-level software development. We did get to [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-824" style="margin: 5px;" title="46daclogo" src="http://jakob.engbloms.se/wp-content/uploads/2009/07/46daclogo.gif" alt="46daclogo" width="81" height="73" />This post is a follow-up to the DAC panel discussion we had yesterday on how to conquer hardware-dependent software development. Most of the panel turned into a very useful dialogue on virtual platforms and how they are created, not really discussing how to actually use them for easing low-level software development. We did get to software eventually though, and had another good dialogue with the audience. Thanks to the tough DAC participants who held out to the end of the last panel of the last day!</p>
<p>As is often the case, after the panel has ended, I thought of several good and important points that I never got around to making&#8230; and of those, one struck me as worthy of a blog post in its own right. It is the issue of how high-level synthesis can help software design.</p>
<p><span id="more-871"></span>At the end of the panel, the last comment from Kees Vissers of Xilinx pointed out that high-level synthesis is a very powerful way to build hardware. I think his point was that hardware and software are not that different&#8230; but the remark also got me thinking. If it is the case that high-level synthesis currently makes hardware creation easier, can&#8217;t it also be applied to software creation? In particular, if you have an HLS description of a piece of hardware, can&#8217;t you also generate its driver software?</p>
<p>I think that makes eminent sense, since one of the hard parts of doing device drivers is just getting the use of the programming registers of a device right. The programming register interface is a really strange thing if you think about it. It is not native to either software or hardware, really.</p>
<p>In hardware, you communicate between devices using FIFOs or signals or packet-based mechanisms which do not in general look like programming register writes. You move data by sending a stream of data directly, not by addressing registers word by word. Similarly, on the software side, software units call each other using functions (or higher-level OS abstractions like signals or network packets). They do not put data into addressed registers&#8230;</p>
<p>Today, high-level synthesis as practiced in industry involves describing the function of a device in pretty abstract terms, so that the compiler can make smart decisions on the implementation details. It also makes it easier to try different alternatives in the implementation, trading size, speed, and power consumption against each other. Different types of concurrency and pipelining can be explored.</p>
<p>However, once we get to the hardware-software interface, we get rudely dropped into a world of manual detailing of an interface with no tool support to explore it or validate it. Why should that really be the case? I think that the hardware-software interface requires just as much care as the internals of the device. After all, it is the external face of the device, and if that is too hard to use, users will not get the full benefit of the device. Here are some previous posts on the nature of interfaces and why they matter: <a href="http://www.garystringham.com/newsletter.shtml">1</a>, <a href="http://jakob.engbloms.se/archives/799">2</a>, <a href="http://jakob.engbloms.se/archives/770">3</a>, <a href="http://jakob.engbloms.se/archives/709">4</a>.</p>
<p>So I would propose a different take on this, where you apply synthesis at a higher level, and generate the hardware internals, the programming register interface, and the software driver from the same source. The interface you design for a device would be a set of function calls expressed in software terms, and thus relatively easy to use from software. Let&#8217;s call this SHLS, Software High-Level Synthesis. Or maybe Software-Level Synthesis, SLS.</p>
<p>I am well aware that most operating systems do not provide an interface to device drivers consisting of function calls&#8230; but rather rely on pretty non-semantic methods like read/write/ioctl. However, that is easy to overcome by generating a user-level interface library in addition to the raw device driver. Obviously, a tool like this would need some adaptation to apply to each operating system targeted. But that does not feel like something that cannot fairly easily be handled by a template system. Compared to the complexity of synthesizing hardware this feels pretty basic.</p>
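<p>A toy version of the idea, in Python: one description of the operations a device offers, from which both a register layout and the driver-side functions are generated. Everything here is hypothetical, including the operation names, the "one trigger register per operation" layout, and the <code>write_reg</code> callback standing in for the raw register access. A real tool would of course also have to generate the hardware side.</p>

```python
# One source: the operations the device offers, in software terms.
OPERATIONS = [
    ("set_baud_rate", ["rate"]),
    ("send_byte", ["value"]),
    ("reset", []),
]


def generate_register_map(operations, base=0x0, word=4):
    """Give each argument a data register and each operation a trigger."""
    layout, offset = {}, base
    for name, args in operations:
        regs = {}
        for arg in args:
            regs[arg] = offset
            offset += word
        regs["_go"] = offset  # writing here starts the operation
        offset += word
        layout[name] = regs
    return layout


def generate_driver(operations, layout, write_reg):
    """Build one function per operation on top of a raw register write."""
    driver = {}
    for name, args in operations:
        # Default arguments bind the current name/args to this function.
        def call(*values, _name=name, _args=args):
            if len(values) != len(_args):
                raise TypeError(f"{_name} expects {len(_args)} argument(s)")
            for arg, value in zip(_args, values):
                write_reg(layout[_name][arg], value)
            write_reg(layout[_name]["_go"], 1)
        driver[name] = call
    return driver
```

<p>The software author only ever sees <code>driver["set_baud_rate"](9600)</code>; which registers that touches is the tool's business, and the layout can change without the caller noticing.</p>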
<p>If you hide the software-hardware interface inside a black box like this, it can also be implemented in quite different and interesting ways. For example, you could imagine using a small memory-mapped buffer where you enter commands and data and then ask the hardware to &#8220;parse&#8221; it, rather than using discrete registers with immediate effects. Or you could optimize the coding of the relevant hardware operations into a bit-compacted representation no human would like, but which is no problem for the machine.</p>
<p>Another option is obviously to just stop at the hardware-software interface, but let the tool help the hardware designer build the programming register interface and explore various options for it. Not having to invent a register layout from scratch should make that job much easier.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/871/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The TLM DAC</title>
		<link>http://jakob.engbloms.se/archives/865?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/865#comments</comments>
		<pubDate>Thu, 30 Jul 2009 22:47:23 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[DAC]]></category>
		<category><![CDATA[GreenSocs]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[SystemC]]></category>
		<category><![CDATA[tlm]]></category>
		<category><![CDATA[TLM-2.0]]></category>
		<category><![CDATA[Virtutech]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=865</guid>
		<description><![CDATA[The past few days here at DAC, a big theme has been transaction level modeling (TLM). TLM is often considered to be SystemC TLM-2.0. Most of the statements from the EDA companies are to the effect that SystemC TLM-2.0 solves the problem of combining models from different sources. Scratching the surface of this happy picture, [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-824" style="margin: 5px;" title="46daclogo" src="http://jakob.engbloms.se/wp-content/uploads/2009/07/46daclogo.gif" alt="46daclogo" width="81" height="73" />The past few days here at <a href="http://www.dac.com/46th/index.aspx">DAC</a>, a big theme has been transaction level modeling (TLM).</p>
<p>TLM is often considered to be <a href="http://www.systemc.org/apps/group_public/workgroup.php?wg_abbrev=tlmwg">SystemC TLM-2.0</a>. Most of the statements from the EDA companies are to the effect that SystemC TLM-2.0 solves the problem of combining models from different sources. Scratching the surface of this happy picture, it is clear that it is not that simple&#8230;</p>
<p><span id="more-865"></span>The issue is that even if all agree on using the TLM-2.0 standard and its default standard generic memory-mapped bus protocol and payload for the memory-map part of their device models, there are other interfaces which are not standard at this point in time.</p>
<p>For example, there is no standard way to model interrupts between devices. So any time you have interrupts in a system (which tends to be always), you need to write custom wrappers between modules to convert different ways of modeling interrupts. Even worse, the standard way to do it is to use SystemC signals, which are definitely not TLM abstractions. They take a detour through the SystemC kernel, which is quite costly.</p>
<p>The defining property (from a simulation execution perspective) of TLM is that your simulation modules talk directly to each other through <em>direct function calls</em>, rather than passing over whatever simulation kernel you happen to be using. Essentially, TLM tends to convert simulators into being much more like &#8220;regular programs&#8221;, with fewer references to the simulation kernel and its event and time handling. In my world, unless you are doing direct function calls, you are not doing TLM.</p>
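<p>The difference can be shown with a small Python sketch, which is neither SystemC nor the TLM-2.0 API, and where all class names are made up. In the first style, an access is posted as an event to a kernel queue and delivered later. In the TLM style, the initiator simply calls a function on the target, with the bus doing nothing more than address routing.</p>

```python
from collections import deque


class Memory:
    """A trivial target that stores values by offset."""

    def __init__(self):
        self.data = {}

    def write(self, address, value):
        self.data[address] = value

    def read(self, address):
        return self.data.get(address, 0)


class EventKernel:
    """Signal-style: every access is an event the kernel delivers later."""

    def __init__(self):
        self.queue = deque()
        self.dispatched = 0

    def post(self, action):
        self.queue.append(action)

    def run(self):
        while self.queue:
            self.queue.popleft()()
            self.dispatched += 1


class Bus:
    """TLM-style: route on address and call the target function directly."""

    def __init__(self):
        self.targets = []  # list of (start, size, target)

    def map(self, start, size, target):
        self.targets.append((start, size, target))

    def _find(self, address):
        for start, size, target in self.targets:
            if start <= address < start + size:
                return target, address - start
        raise LookupError(f"nothing mapped at {address:#x}")

    def write(self, address, value):
        target, offset = self._find(address)
        target.write(offset, value)  # plain function call, no kernel involved

    def read(self, address):
        target, offset = self._find(address)
        return target.read(offset)
```

<p>With the bus, a write has taken effect by the time the call returns. With the event kernel, nothing happens until <code>run()</code> gets around to dispatching it, which is the detour that makes signal-based interrupt modeling costly.</p>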
<p>Note that this state of things in the SystemC world is likely to change for the better over time. <a href="http://www.greensocs.com">GreenSocs</a> announced at DAC that they are working with <a href="http://www.virtutech.com">Virtutech</a> and an unnamed other partner to create a set of TLM interfaces for other interconnects, such as signals (interrupts under another name), serial, and Ethernet.</p>
<p>But apart from all the technicalities of SystemC TLM-2.0 and how it works, the big question is just what to use TLM for, and how. Here, everyone seems to try to turn TLM into their own use cases. The most obvious application is doing fast virtual platforms, but you also have TLM use as the basis for hardware synthesis, validation, golden reference models, architectural exploration, and pretty much all other EDA design tasks.</p>
<p>Even so, the most important message for me is that the EDA industry is actually starting to get interested in TLM. It is no longer a quaint odd thing done by some peripheral start-up companies, but rather a mainstream technology that everyone has to pay attention to.</p>
<p>Finally, I want to point out that TLM is not just SystemC. TLM is a general idea that has been in <a href="http://jakob.engbloms.se/archives/130">active use since the late 1960s</a>. It is the obvious way to model a computer, if all you are concerned about is how it looks to the software. Another current example is the <a href="http://www.virtutech.com/whitepapers/simics-tlm.html">Simics style of TLM</a> (<a href="http://www.virtutech.com/whitepapers/modeling.html">and here</a>), which is similar to but different in details from the SystemC implementation.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/865/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Cadence Industry Insight: &#8220;Virtual Platforms Unite HW and SW&#8221;</title>
		<link>http://jakob.engbloms.se/archives/784?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/784#comments</comments>
		<pubDate>Fri, 22 May 2009 06:41:09 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[ESL]]></category>
		<category><![CDATA[articles]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[embedded systeme]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Cadence]]></category>
		<category><![CDATA[Domain-specific languages]]></category>
		<category><![CDATA[ISX]]></category>
		<category><![CDATA[Richard Goering]]></category>
		<category><![CDATA[scdsource]]></category>
		<category><![CDATA[software testing]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=784</guid>
		<description><![CDATA[Another Cadence guest blog entry, about the overall impact of virtual platforms on the interaction between hardware and software designers. Essentially, virtual platforms are a great tool to make software and hardware people talk to each other more, since they provide a common basis for understanding. My entry is called &#8220;Virtual Platforms unites Hardware, Software [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-654" style="margin-left: 5px; margin-right: 5px;" title="opinion" src="http://jakob.engbloms.se/wp-content/uploads/2009/02/opinion.png" alt="opinion" width="91" height="69" />Another Cadence guest blog entry, about the overall impact of virtual platforms on the interaction between hardware and software designers. Essentially, virtual platforms are a great tool to make software and hardware people talk to each other more, since they provide a common basis for understanding.</p>
<p><span id="more-784"></span>My entry is called &#8220;<a href="http://www.cadence.com/Community/blogs/ii/archive/2009/05/21/guest-blog-virtual-platforms-unite-hardware-software-designers.aspx">Virtual Platforms unites Hardware, Software Engineers</a>&#8221;, and is presented by <a href="http://www.cadence.com/community/posts/rgoering.aspx">Richard Goering</a> (who used to be with <a href="http://www.scdsource.com">SCDSource</a>), in his &#8220;<a href="http://www.cadence.com/Community/blogs/ii/default.aspx">Industry Insights</a>&#8221; section of the Cadence community of blogs. Richard Goering has a personal post pointing in the same direction, about <a href="http://www.cadence.com/Community/blogs/ii/archive/2009/05/13/meeting-the-embedded-software-challenge.aspx?postID=17593">EDA tackling embedded software</a>. Worth reading.</p>
<p>Note that I do <em>not </em>say that hardware and software engineers should use the same <em>programming languages</em> as a result of using virtual platforms. Programming languages efficient for hardware design are quite different from those efficient for virtual platform creation, which are in turn different from good software engineering languages. In some cases, some of them coincide, but in general, I believe in using the best tool for each job, and a programming language is just a tool. The more purpose-built it is for its task, the better. Some older posts of mine on this topic:</p>
<ul>
<li><a href="http://jakob.engbloms.se/archives/747">DSL: Purpose-built languages</a></li>
<li><a href="http://jakob.engbloms.se/archives/681">DSL: The tyranny of syntax</a></li>
<li><a href="http://jakob.engbloms.se/archives/283">Multicore programming and DSLs</a></li>
<li><a href="http://jakob.engbloms.se/archives/165">What is the obsession with C in EDA?</a></li>
<li><a href="http://jakob.engbloms.se/archives/157">Kunle Olukotun on DSLs</a></li>
<li><a href="http://jakob.engbloms.se/archives/709">Modeling hardware at a high level for software development</a></li>
</ul>
<p>And there is the <a href="http://jakob.engbloms.se/archives/306">ChipDesign article </a>from last year about using virtual platforms in the hardware design process all the way out to customers.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/784/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Guest Blog at Cadence: &#8220;Way Worse than the Real Thing&#8221;</title>
		<link>http://jakob.engbloms.se/archives/781?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/781#comments</comments>
		<pubDate>Wed, 20 May 2009 10:45:21 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[ESL]]></category>
		<category><![CDATA[articles]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Cadence]]></category>
		<category><![CDATA[ISX]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[software testing]]></category>
		<category><![CDATA[Virtutech]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=781</guid>
		<description><![CDATA[Virtutech and Cadence yesterday announced the integration of Virtutech Simics and Cadence ISX (Incisive Software Extensions), which is essentially a directed random test framework for software. With this tool integration, you can systematically test low-level software and the hardware-software (device driver) interface of a system, leveraging a virtual platform. As part of explaining why this [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-782" style="margin-left: 5px; margin-right: 5px;" title="avataraspx" src="http://jakob.engbloms.se/wp-content/uploads/2009/05/avataraspx.jpg" alt="avataraspx" width="72" height="72" />Virtutech and Cadence yesterday announced the integration of Virtutech Simics and Cadence ISX (Incisive Software Extensions), which is essentially a directed random test framework for software. With this tool integration, you can systematically test low-level software and the hardware-software (device driver) interface of a system, leveraging a virtual platform.</p>
<p><span id="more-781"></span></p>
<p>As part of explaining why this is cool and what it means, I have a <a href="http://www.cadence.com/Community/blogs/sd/archive/2009/05/18/way-worse-than-the-real-thing.aspx">guest blog posting over at Cadence&#8217;s blog site</a>, called &#8220;Way Worse than the Real Thing&#8221;. The blog is posted under the general &#8220;TeamESL&#8221; &#8220;personality&#8221; on the blog site, which is used for people external to Cadence in the ESL space.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/781/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>I Want One&#8230; Trillion Instructions&#8230;</title>
		<link>http://jakob.engbloms.se/archives/709?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/709#comments</comments>
		<pubDate>Sat, 28 Mar 2009 21:10:31 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[ESL]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[abstraction levels]]></category>
		<category><![CDATA[device driver]]></category>
		<category><![CDATA[Dr. Evil]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[mpc8641d]]></category>
		<category><![CDATA[Simics]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=709</guid>
		<description><![CDATA[There is an eternal debate going on in virtual platform land over what the right kind of abstraction is for each job. Depending on background, people favor different levels. For those with a hardware background, more details tend to be the comfort zone, while for those with a software background like myself, we are quite [...]]]></description>
			<content:encoded><![CDATA[<p>There is an eternal debate going on in virtual platform land over what the right kind of abstraction is for each job. Depending on background, people favor different levels. For those with a hardware background, more detail tends to be the comfort zone, while those with a software background, like myself, are quite comfortable with less detail. I <a href="http://www.virtutech.com/whitepapers/wp-system_arch_spec.html">recently did some experiments about the use of quite low levels of hardware modeling details for early architecture exploration and system specification</a>.</p>
<p><span id="more-709"></span></p>
<p>It all comes down to a simple classic tradeoff that I usually illustrate like this (using more neutral ground than computer systems; and with credit to Peter Magnusson who had this slide already in place when I joined Virtutech back in 2002):</p>
<p><img class="aligncenter size-full wp-image-711" title="simulation-rule" src="http://jakob.engbloms.se/wp-content/uploads/2009/03/simulation-rule.png" alt="simulation-rule" width="457" height="341" /></p>
<p>What this is telling you is simple:</p>
<ul>
<li>You simulate something very large using large units, i.e., low level of detail; or</li>
<li>You simulate something quite small using small units, i.e., high level of detail.</li>
</ul>
<p>I wanted to test the idea that by using less detail, you can run larger test cases and therefore obtain better coverage of the overall landscape than by diving in and counting cycles in some small part of it. In the end, this made me cross the trillion instruction line &#8212; since each experiment took a few hundred billion target instructions to complete, repeating and tweaking during the development work definitely adds up to more than a trillion instructions.</p>
<p>And this is where I have to put my little finger close to my mouth and say:</p>
<p style="text-align: center;"><img class="size-full wp-image-710 aligncenter" style="margin-top: 10px; margin-bottom: 10px;" title="drevil_million_dollars" src="http://jakob.engbloms.se/wp-content/uploads/2009/03/drevil_million_dollars.jpg" alt="drevil_million_dollars" width="300" height="318" /></p>
<p>&#8216;I want one trillion instructions&#8217;</p>
<p>So what did I get from these trillion instructions?</p>
<p>An interesting study in how operating system overhead can have a big impact on the profitability of hardware accelerators. By running hundreds of test cases with different assigned computation latencies for a hardware accelerator, as well as different driver models for my hardware (all running under Linux on my favorite MPC8641D), a key diagram emerged:</p>
<p style="text-align: left;"><img class="aligncenter size-full wp-image-712" style="margin-top: 10px; margin-bottom: 10px;" title="hwsw" src="http://jakob.engbloms.se/wp-content/uploads/2009/03/hwsw.png" alt="hwsw" width="872" height="507" /><a href="http://www.virtutech.com/whitepapers/wp-system_arch_spec.html">Read the paper </a>for all the details, but the key thing to note is that with a poor driver architecture, making the hardware 100 times faster resulted in zero gain in system performance. Had this experiment been performed on a bare-bones platform without a full operating system in place, I am fairly certain that the faster hardware would have been considered much more worthwhile.</p>
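<p style="text-align: left;">To see why a fixed software cost can swamp a hardware speedup, here is a back-of-the-envelope sketch in Python. It assumes a deliberately crude model in which each accelerator request costs a fixed software overhead plus the hardware latency, executed back to back. The function names and all the numbers are made up for illustration and are not taken from the paper.</p>

```python
def time_per_operation(software_overhead_us, hardware_latency_us):
    """End-to-end time for one accelerator request, as the application sees it.

    Assumption: software overhead and hardware latency simply add up.
    """
    return software_overhead_us + hardware_latency_us


def system_speedup(software_overhead_us, old_hw_us, new_hw_us):
    """Ratio of old to new end-to-end time when only the hardware gets faster."""
    old = time_per_operation(software_overhead_us, old_hw_us)
    new = time_per_operation(software_overhead_us, new_hw_us)
    return old / new


if __name__ == "__main__":
    # Hypothetical case: hardware latency drops 100x, from 10 us to 0.1 us,
    # under three different per-request software overheads.
    for overhead_us in (0.0, 10.0, 1000.0):
        gain = system_speedup(overhead_us, old_hw_us=10.0, new_hw_us=0.1)
        print(f"overhead {overhead_us:7.1f} us: system speedup {gain:.2f}x")
```

<p style="text-align: left;">With no software overhead, the full hundredfold gain shows up. Once the overhead is a hundred times larger than the original hardware latency, the same hardware improvement is worth about one percent at the system level, which is the shape of the result in the diagram.</p>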
<p style="text-align: left;">In the end, I resorted to a driver variant where I had user-level code directly access the device programming interface via an mmap()-mapped memory region. It was not pretty, as this was essentially bare-metal programming wrapped inside a big cosy Linux package, but it sure was efficient compared to doing a kernel/user mode switch for each hardware operation. But even here, it turned out that making the hardware very, very fast as opposed to just very fast had no benefit. It proves to me that the software has to be taken into account in full in order to properly evaluate an idea for a hardware design.</p>
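<p style="text-align: left;">For readers who have not seen the mmap() pattern, here is a minimal sketch of its shape in Python. It is not the driver from the experiment: the register offsets are hypothetical, and a temporary file stands in for the device node so that the sketch runs anywhere. Real code of this kind would normally be written in C with volatile accesses, since Python gives no guarantee about how a store reaches the bus.</p>

```python
import mmap
import os
import struct
import tempfile

# Hypothetical register layout, invented for this sketch.
REG_OPERAND = 0x00  # 32-bit input register
REG_RESULT = 0x04   # 32-bit output register
REGION_SIZE = mmap.PAGESIZE


def open_device_window(path):
    """Map REGION_SIZE bytes of the file at path into this process."""
    fd = os.open(path, os.O_RDWR)
    try:
        return mmap.mmap(fd, REGION_SIZE)
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed


def write_reg(window, offset, value):
    """Store a 32-bit value at a register offset, with no system call involved."""
    struct.pack_into("=I", window, offset, value)


def read_reg(window, offset):
    """Load a 32-bit value from a register offset."""
    return struct.unpack_from("=I", window, offset)[0]


def demo():
    # Stand-in for the device node: a zero-filled temporary file.
    with tempfile.NamedTemporaryFile(delete=False) as backing:
        backing.write(bytes(REGION_SIZE))
        path = backing.name
    window = open_device_window(path)
    try:
        write_reg(window, REG_OPERAND, 21)
        # A real accelerator would produce the result itself; here it is faked.
        write_reg(window, REG_RESULT, 2 * read_reg(window, REG_OPERAND))
        return read_reg(window, REG_RESULT)
    finally:
        window.close()
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

<p style="text-align: left;">The point of the pattern is that once the mapping is set up, each register access is an ordinary load or store in user space, so the per-operation kernel/user mode switch disappears.</p>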
<p style="text-align: left;">You could say that the poor results for acceleration here were due to my inept Linux driver programming skills, but that just underscores the key result: you have to take the software into account. If the conclusion is that a better Linux device driver programmer is needed, you have still decided that the key system bottleneck is not just the speed of the hardware, but how it is used. And that is exactly what system design needs to be about.</p>
<p style="text-align: left;">As an aside, playing around with a complete system like this, and automatically running large volumes of tests with varying parameters, was a really interesting experience. I must admit that getting to these trillions of instructions required a few hours of simulation time, but nothing that could not be solved by leaving a computer running over lunch or a long meeting. The machine was modeled using standard Simics &#8220;software timing&#8221;, i.e., without any particular cache or pipeline or bus details, and it seems that that is usually all you need. Had I increased the level of detail and slowed things down by a factor of ten or a hundred, I would never have covered such a large set of test cases or been able to evaluate as many different variants of drivers and hardware speeds.</p>
<h2 style="text-align: left;">IBM did it before me</h2>
<p style="text-align: left;">Finally, I found it interesting that an analogous experience about the effect of creating a complete software stack and testing what looks like a very good hardware idea was reported in an IBM paper from a few years ago, in &#8220;<a href="http://researchweb.watson.ibm.com/journal/rd/502/peterson.html">Application of full-system simulation in exploratory system design and development</a>&#8221;, by Peterson et al., in the IBM Journal of Research and Development. Look at the section about the &#8220;MIP Morphing&#8221; feature, which is essentially cache locking. They do use a fairly detailed simulator for the end evaluation of their performance &#8211; but the key message is that by running a full software stack, they realized that just managing the feature was too hard in a realistic software environment to make it worthwhile:</p>
<blockquote>
<p style="text-align: left;">Initially, the MIP morphing feature was well received by internal development and HPCS customers alike. The team was aware of the need to both manage this hardware feature at the OS level and provide portable abstractions to the programmer to exploit this feature in a productive way. &#8230;</p>
</blockquote>
<p style="text-align: left;">And then:</p>
<blockquote>
<p style="text-align: left;">The implementation effort was facilitated by Mambo, allowing the OS team to prototype the MIP morph idea in a controlled development environment. Taking the prototyping effort to this level of realism uncovered many complexities in supporting the MIP morph in a virtualized manner. ..</p>
</blockquote>
<p style="text-align: left;">And finally:</p>
<blockquote>
<p style="text-align: left;">By prototyping the software support that was <em>needed at the OS level and exposing the usage issues at the application programmer&#8217;s level</em>, the magnitude of the problem was exposed at its fullest. Further, the improvement in performance did not show a sufficient payback for the immense effort that would be required at the software level to support the idea, and as a result it was dropped from further consideration.</p>
</blockquote>
<p style="text-align: left;">It seems that whatever you do, IBM did it first&#8230; and it validates the idea of full-system simulation and that software is king today.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/709/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Adding to Schirrmeister&#8217;s Virtual Platform Myth Busting</title>
		<link>http://jakob.engbloms.se/archives/651?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/651#comments</comments>
		<pubDate>Wed, 18 Feb 2009 12:22:43 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[ESL]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[embedded systeme]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[clock-cycle models]]></category>
		<category><![CDATA[cycle accuracy]]></category>
		<category><![CDATA[Eve]]></category>
		<category><![CDATA[Frank Schirrmeister]]></category>
		<category><![CDATA[freescale]]></category>
		<category><![CDATA[Grant Martin]]></category>
		<category><![CDATA[Lauro Ritazzi]]></category>
		<category><![CDATA[p4080]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[software tools]]></category>
		<category><![CDATA[Synopsys]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=651</guid>
		<description><![CDATA[Frank Schirrmeister of Synopsys recently published a blog post called &#8220;Busting Virtual Platform Myths – Part 1: “Virtual Platforms are for application software only”. In it, he is refuting a claim by Eve that virtual platforms are for application-level software-development only, basically claiming that they are mostly for driver and OS development and citing some [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-654" style="margin: 10px;" title="opinion" src="http://jakob.engbloms.se/wp-content/uploads/2009/02/opinion.png" alt="opinion" width="91" height="69" />Frank Schirrmeister of Synopsys recently published a blog post called <a href="http://www.synopsysoc.org/viewfromtop/?p=64#comment-1008">&#8220;Busting Virtual Platform Myths – Part 1: “Virtual Platforms are for application software only”</a>. In it, he is refuting a claim by Eve that virtual platforms are for application-level software-development only, basically claiming that they are mostly for driver and OS development and citing some Synopsys-Virtio Innovator examples of such uses. In his view, most application software is being developed using host-compiled techniques. I want to add to this refutation by pointing out that application software is surely a very important &#8212; and large &#8212; use case for virtual platforms.</p>
<p><span id="more-651"></span>The beginning of the argument was found in an <a href="http://www.edadesignline.com/howto/212200519">EDA Design Line article titled &#8220;Unified Verification for Hardware and Embedded Software Developers&#8221; </a>by Lauro Ritazzi of Eve USA. In it, he makes the following claim:</p>
<blockquote><p>While some may have achieved the scope of jump-starting software development, they only address application programs that do not require an accurate representation of the underling hardware design. They fall short when testing the interaction of the embedded software with hardware, such as firmware, device drivers, operating systems and diagnostics. For this testing, embedded software developers need an accurate model of the hardware to validate their code, while hardware designers need fairly complete software to fully validate their application specific integrated circuit (ASIC) or SoC.</p></blockquote>
<p>The interesting part here is really that jump-start is just for applications, and that OS and drivers require more details than a fast virtual platform can supply. I do not quite agree with this. But let&#8217;s first see what Frank Schirrmeister said:</p>
<blockquote><p>the majority of the software development on virtual platforms is spent on firmware, device drivers, operating system porting and diagnostics. And that is not &#8211; as one could assume &#8211; on cycle accurate models, but on functionally accurate models with only essential timing, the type of models called loosely timed (LT) in SystemC.</p></blockquote>
<p>I totally agree with this. As is evident from many different <a href="http://www.virtutech.com/casestudies">public use cases</a>, OS, BSP, and driver development is a big use of virtual platforms. For example, last summer, <a href="http://jakob.engbloms.se/archives/137">Freescale announced the QorIQ P4080 with pretty good software support </a>in terms of Linux and VxWorks operating systems, as well as some middleware stacks. All developed on Simics using an even more timing-abstracted model of the hardware.</p>
<p>However, Frank then makes the following claim that I have a harder time with:</p>
<blockquote><p>In contrast, application software is developed more often than not using completely hardware independent techniques, including cross compilation from the host development machine using development kits like Apple’s iPhone development kit.</p></blockquote>
<p>This is to some extent true, but as time goes on, I think this type of development environment is going to be less useful. Traditionally, OS vendors have had tools like VxSim and OSE SoftKernel in place to help customers &#8220;run code on their desktop&#8221;, while using the API of the operating system of choice. However, such solutions have lots of problems in how close they can get to the target.</p>
<ul>
<li>If you have any kind of third-party binary-only application, or want to use an existing binary component without lots of complex recompilation, you need a virtual platform running the underlying OS. You cannot squeeze that into a host-compiled API simulator.</li>
<li>You are not using the same compiler and code-generation settings and build settings as you are for your actual target, and this can (read: will) introduce nasty compiler version issues.</li>
<li>It forces you to maintain an additional build variant for your code, which can be pretty expensive for a complex build.</li>
<li>You are not using the real OS scheduler, device drivers, and interrupt structure found on the target system. This can have a huge impact, especially for multithreaded multiprocessor systems.</li>
<li>The API simulator needs to be kept in synch with the real software stack, and customized in the same way for any particular target. This is hard to get right (even though it has been done).</li>
<li>The API simulator does not handle heterogeneous systems very well, such as chips or boards or racks mixing two or more different OS kernels in the same system (like a DSP and a main processor OS).</li>
<li>API simulation completely falls apart when the OS is no longer the lowest level of the software stack, but you also have a hypervisor layer underneath the OSes on your target system. An API simulator simply cannot represent this kind of case.</li>
<li>Using a virtual platform and the real target binaries also fits with the very important &#8220;fly what you test, test what you fly&#8221; principle of embedded software development.</li>
</ul>
<p>For various subsets of these reasons, I see many users picking up virtual platforms as a way to streamline application development. For example, <a href="http://www.virtutech.com/news_events/pr/pr2009-02-11-595.html">NASA recently selected a virtual platform based on Simics </a>to develop the software for the new Orion spacecraft. That is going to be a complete software stack, not just OS and drivers, which tend to be fairly off-the-shelf components for these kinds of systems. Most of the effort is on the application level, and the platform used is a virtual platform.</p>
<p>However, note that there are cases where a fast virtual platform like we are discussing here is not sufficient to validate all aspects of the code. I think the main reason we see different viewpoints on this is that we are looking at very different types of software-hardware integration.</p>
<p>In a <a href="http://jakob.engbloms.se/archives/153">blog post I wrote last year on the dead-ness of cycle-accurate simulation</a>, Grant Martin of Tensilica pointed out that <a href="http://jakob.engbloms.se/archives/153#comment-1652">some software desperately needs cycle-accuracy </a>as it is intimately dependent on the timing of the hardware. This is certainly true for some aspects of drivers, and more so for the really early boot code.</p>
<p>Here, FPGA-based hardware-accelerated simulation of the actual design in VHDL or Verilog makes eminent sense as a way to get the details perfectly right. But that is only one part of a much greater system development puzzle, and it really only applies to very small subsystems as  it is kind of hard to fit much more than a single chip inside a hardware acceleration unit. Just as Frank Schirrmeister says, hardware accelerated simulation is very important. The nice article on the <a href="http://jakob.engbloms.se/archives/639">IBM z10 development </a>that I blogged about earlier says exactly that: for some parts of the validation, there is no way around using the actual hardware RTL design.</p>
<p>And in the end, you have to test the timing and analogue aspects of a design on physical hardware anyway. There should not be too many surprises at this stage, if you have used all of the cool current tools right. But there surely will be some &#8212; even a VHDL simulation is a simulation, and not reality, after all.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/651/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Eclipse Linux Kernel Indexing Works</title>
		<link>http://jakob.engbloms.se/archives/338?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/338#comments</comments>
		<pubDate>Sun, 01 Feb 2009 17:10:18 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[ESL]]></category>
		<category><![CDATA[desktop software]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[uncategorized]]></category>
		<category><![CDATA[eclipse]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[Linux kernel]]></category>
		<category><![CDATA[operating systems]]></category>
		<category><![CDATA[Simon Kågström]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=338</guid>
		<description><![CDATA[Edited on 2009-Feb-01, to include the link to the illustrated guide that really helps you get there faster. Thanks Simon! Also, promoted to front page, original post was put up on 2008-Nov-09. Thanks to Simon Kågström&#8217;s post (and the even better second-generation with screenshots) about using Eclipse for the Linux kernel, I have a much [...]]]></description>
			<content:encoded><![CDATA[<p><img class="size-medium wp-image-339 alignleft" style="margin: 5px 10px;" title="eclipse_wide_logo" src="http://jakob.engbloms.se/wp-content/uploads/2008/11/eclipse_wide_logo.jpg" alt="" width="131" height="68" /> <img class="size-medium wp-image-329 alignright" style="margin-left: 10px; margin-right: 10px;" title="penguin-variant" src="http://jakob.engbloms.se/wp-content/uploads/2008/11/penguin-variant.png" alt="" width="100" height="118" /> <em>Edited on 2009-Feb-01,  to include the link to the illustrated guide that really helps you get there faster. Thanks Simon! Also, promoted to front page, original post was put up on 2008-Nov-09.</em></p>
<p>Thanks to <a href="http://simonkagstrom.livejournal.com/31079.html?view=19559#t19559">Simon Kågström&#8217;s post </a>(and the even better <a href="http://simonkagstrom.livejournal.com/33093.html">second-generation with screenshots</a>) about using <a href="http://www.eclipse.org">Eclipse </a>for the Linux kernel, I have a much nicer work environment now for my ongoing work in learning Linux device drivers on PowerPC, which has helped me work my way through several hard-to-figure-out system calls.<span id="more-338"></span> Here is a screenshot that I found pretty cool&#8230; the tool has found the definition and comments for the IRQ registration function:</p>
<p style="text-align: center;"><a href="http://jakob.engbloms.se/wp-content/uploads/2008/11/2008-11-09-21-51-08.png"><img class="size-medium wp-image-340 aligncenter" title="2008-11-09-21-51-08" src="http://jakob.engbloms.se/wp-content/uploads/2008/11/2008-11-09-21-51-08-300x187.png" alt="" width="300" height="187" /></a></p>
<p style="text-align: left;">2009-Feb-01:</p>
<p style="text-align: left;">I had to rebuild my indexing from scratch over the past weekend, and as a result, I have a word of warning: you have to create a &#8220;C project&#8221; in Eclipse. If you accidentally create a &#8220;Project&#8221;, the Eclipse workspace file will have the wrong name (.project instead of .cproject), and the autoconf-to-eclipse script will not work.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/338/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>EDA Tech Forum Article on Ecosystem Enablement</title>
		<link>http://jakob.engbloms.se/archives/577?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/577#comments</comments>
		<pubDate>Sat, 10 Jan 2009 21:17:54 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[articles]]></category>
		<category><![CDATA[business issues]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[ecosystem enablement]]></category>
		<category><![CDATA[EDA Tech Forum]]></category>
		<category><![CDATA[freescale]]></category>
		<category><![CDATA[p4080]]></category>
		<category><![CDATA[qoriq]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=577</guid>
		<description><![CDATA[I have an article about ecosystem enablement for new hardware, co-authored with Richard Schnur of Freescale published in the December 2008 issue of EDA Tech Forum. The core concept is that a virtual platform solution makes it possible to get a new chip to market faster with better software support, and even enables virtual design-in [...]]]></description>
			<content:encoded><![CDATA[<p>I have an <a href="http://www.edatechforum.com/journal/dec2008/streamlining_intro.cfm">article about ecosystem enablement for new hardware, co-authored with Richard Schnur </a>of <a href="http://www.freescale.com">Freescale</a> published in the <a href="http://www.edatechforum.com/journal/dec2008/">December 2008 issue of EDA Tech Forum</a>. The core concept is that a virtual platform solution makes it possible to get a new chip to market faster with better software support, and even enables virtual design-in of a chip at OEM customers before hardware becomes available. The article builds on our joint experience with the QorIQ P4080 launch in the Summer of 2008, where we had several operating systems and middleware packages in place at the moment the chip was announced. EDA Tech Forum requires registration, but it was still free, and there are many other good articles available.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/577/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Threading or Not as a Hardware Modeling Paradigm</title>
		<link>http://jakob.engbloms.se/archives/485?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/485#comments</comments>
		<pubDate>Thu, 01 Jan 2009 08:31:23 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Erlang]]></category>
		<category><![CDATA[multicore]]></category>
		<category><![CDATA[Reactive programming]]></category>
		<category><![CDATA[sampalib]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[SystemC]]></category>
		<category><![CDATA[Threading]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=485</guid>
		<description><![CDATA[Traditional hardware design languages like Verilog were designed to model naturally concurrent behavior, and they naturally leaned on a concept of threads to express this. This idea of independent threads was brought over into the design of SystemC, where it was manifested as cooperative multitasking using a user-level threading package. While threads might at first [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-486" style="margin: 5px 10px;" title="gears-modeling" src="http://jakob.engbloms.se/wp-content/uploads/2008/12/gears-modeling.png" alt="gears-modeling" width="62" height="65" />Traditional hardware design languages like <a href="http://en.wikipedia.org/wiki/Verilog">Verilog </a>were designed to model naturally concurrent behavior, and they naturally leaned on a concept of threads to express this. This idea of independent threads was brought over into the design of <a href="http://www.systemc.org">SystemC</a>, where it was manifested as cooperative multitasking using a user-level threading package. While threads might at first glance look &#8220;natural&#8221; as a modeling paradigm for hardware simulations, they are really not a good choice for high-performance simulation.</p>
<p>In practice, threading as a paradigm for software models of hardware circuits connected to a programmable processor brings more problems than it provides benefits in terms of &#8220;natural&#8221; modeling.</p>
<p><span id="more-485"></span></p>
<p>As I see it, the main alternative modeling paradigm is a classic event-driven system, where all activity is triggered by events and the associated code is run to completion. This makes execution occur as a series of simulation steps in various parts of the system, rather than as a set of (pseudo) concurrent tasks.</p>
<h2>Threaded Problems</h2>
<p>The most common complaint with threading is <strong>performance</strong>. This has become very clear in the case of using SystemC for transaction-level modeling. All advice on how to do good and fast TLM coding tells us to use SC_METHODs, which are essentially callbacks that are not active objects in their own right. Note that SystemC models found in the wild are often built on SC_THREADs despite this advice, as that is the &#8220;easiest&#8221; way to do things. Some convenience systems that are part of the OSCI TLM-2.0 library also rely on threads to convert between AT-style asynchronous and LT-style synchronous function calls (which is pretty unavoidable, but not applicable in the realm of high-performance simulation for virtual platforms).</p>
<p>Furthermore, using threading as a paradigm (even single-active-thread cooperative threads like in SystemC or classic MacOS) brings with it the <strong>problems of concurrent programming</strong>, in that you suddenly need to care about protecting data structures against conflicting accesses, worry about deadlocks, and deal with similar concurrent programming issues. Without threads, all such issues go away.</p>
<p>Note that using threading as a modeling paradigm with truly concurrent execution of models will make the execution have all the problems of parallel programs, especially non-deterministic execution and hard-to-find bugs. At least a cooperative multitasking system tends to be deterministic in the way it goes wrong.</p>
<p>Threading as a hardware model programming style therefore makes concurrent multithreaded simulation harder rather than easier to achieve, especially if the simulation system specifies an interleaved model of execution as its semantics, which is the case for SystemC. In this case, there is no way to really make SystemC parallel without adding parallelism as some extra library.</p>
<p>However, one of the biggest practical problems with threading is the problem of <strong>inspecting, changing, and checkpointing simulation state</strong>. With threads, you end up having state stored in local variables on the stacks in the system, as well as in processor registers, the program counter, and other places that are hard to get to from the outside. This is not just me saying this; I found it well said in the <a href="http://www.sampalib.org/doc/papers/A%20Sampalib%20and%20SystemC%20comparison.pdf">sampalib white paper</a>:</p>
<blockquote><p>Using threads means that part of the simulation state is in stacks, which may limit the ability to persist the state of the simulation in checkpoints.</p>
<p>Using wait() implies context switch which are costly in terms of simulation speed, and thus often discouraged in guidelines for modeling SystemC™ models</p></blockquote>
<p>To drive this point further, all libraries for general program state serialization that I have seen (for C++ and Java, for example) also rely on explicit state stored in objects, and explicitly do not support the &#8220;transient&#8221; state held in local variables and the program counter. Essentially, only heap-allocated objects are handled in serialization solutions.</p>
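<p>The same limitation is easy to reproduce in Python, used here purely for illustration. The sketch below is not taken from any of the tools discussed: <code>threaded_counter</code> and <code>EventDrivenCounter</code> are hypothetical names, and a generator stands in for a thread whose progress lives in local variables and a suspended program counter. The standard <code>pickle</code> module refuses to serialize the suspended generator, while the explicit attributes of the event-driven variant checkpoint and restore without trouble.</p>

```python
import pickle


def threaded_counter(limit):
    """Thread-like model: progress lives in the local variable `count`
    and in the suspended position inside the loop."""
    count = 0
    while count != limit:
        count += 1
        yield count  # stands in for wait(); execution is suspended here


class EventDrivenCounter:
    """Event-driven model: all progress is stored in explicit attributes."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def on_event(self):
        # Runs to completion on every call; nothing is left on a stack.
        if self.count != self.limit:
            self.count += 1
        return self.count


def try_checkpoint(obj):
    """Return pickled bytes, or None if the object cannot be serialized."""
    try:
        return pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return None


# Advance the thread-like model two steps, then try to checkpoint it.
gen = threaded_counter(5)
next(gen)
next(gen)
generator_checkpoint = try_checkpoint(gen)  # the suspended frame is not picklable

# Advance the event-driven model two steps and checkpoint its declared state.
model = EventDrivenCounter(5)
model.on_event()
model.on_event()
state_checkpoint = try_checkpoint(vars(model))

# Restore into a fresh object and continue from where the original stopped.
restored = EventDrivenCounter(0)
vars(restored).update(pickle.loads(state_checkpoint))
```

<p>Only the attribute dictionary is checkpointed here, which mirrors the idea of a model declaring its state to the simulator rather than the simulator trying to capture a stack.</p>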
<h2>Event-Driven Solutions</h2>
<p>An event-driven transaction-level hardware simulation is coded in a different way from a naive threaded implementation (but not that differently from a more sophisticated threaded program).</p>
<p>Each device model has to make its state explicit as a set of variables, and preferably also declare these for access for an external tool using something like <a href="http://www.greensocs.com/en/projects/GreenControl">GreenSocs GreenControl </a>or <a href="http://www.virtutech.com/whitepapers/modeling.html">Simics Attributes</a>. It also has to expose a set of functions to be called when events happen or other devices in the simulation system send a transaction into the device model.</p>
<p>Additionally, you should encapsulate all state in a model inside the model object and not expose it for direct access from the outside. A pure object-oriented style with accessor functions for everything is required for best modularity.</p>
<p>The advantages of this model are clear:</p>
<ul>
<li>Concurrency problems are reduced, since each function call will run to completion before any other object or function is activated. There is no need to worry about shared data variables, as they should not exist.</li>
<li>Checkpointing and inspection are facilitated, since all state is now explicit and declared.</li>
<li>Performance is typically increased, since there is no need to do context switches between threads. Locality is also increased by having functions run to completion before returning.</li>
<li>True concurrency is easier to achieve, since each model can quite easily be considered a local-state, shared-nothing, explicit message-passing component similar to Erlang threads. This makes it possible for the simulation scheduler to run multiple models concurrently on multiple host threads. For more on this topic, see my <a href="http://jakob.engbloms.se/archives/246">SiCS Multicore Days 2008 </a><a href="http://www.engbloms.se/presentations/engblom-multicore-sics-2008.pdf">presentation on how Simics was threaded</a>.</li>
</ul>
<p>The downside is that some people consider the programming more complicated, which is really a matter of appearance over substance: event-driven programming tends to be more robust and easier to follow in the long run, since threaded programming makes things a bit too implicit.</p>
<p>Here is the basic example of a thread that does some periodic work.</p>
<p>Threaded style:</p>
<blockquote>
<pre>Thread_for_D():
  loop forever:
    do work...
    wait(some time)</pre>
</blockquote>
<p>Event-driven style, where we just repost an event each time we are called:</p>
<blockquote>
<pre>Time_callback():
  do work...
  post event(some time, Time_callback)</pre>
</blockquote>
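<p>For readers who want something executable, here is a minimal Python sketch of the event-driven variant. It assumes a tiny hand-written scheduler; <code>EventQueue</code>, <code>PeriodicDevice</code>, and their methods are hypothetical names invented for this illustration and do not correspond to the API of SystemC, Simics, or any other simulator.</p>

```python
import heapq
import itertools


class EventQueue:
    """Minimal discrete-event scheduler: callbacks run to completion in time order."""

    def __init__(self):
        self.now = 0
        self._pending = []
        self._sequence = itertools.count()  # tie-breaker for events posted at the same time

    def post(self, delay, callback):
        """Schedule `callback` to run `delay` time units from the current time."""
        heapq.heappush(self._pending, (self.now + delay, next(self._sequence), callback))

    def run_until(self, end_time):
        """Run every event scheduled at or before `end_time`, then stop the clock there."""
        while self._pending and end_time >= self._pending[0][0]:
            self.now, _, callback = heapq.heappop(self._pending)
            callback()  # no context switch: just a function call
        self.now = end_time


class PeriodicDevice:
    """Device that does some work every `period` time units."""

    def __init__(self, queue, period):
        self.queue = queue
        self.period = period
        self.activations = []  # explicit state: the times at which work was done

    def start(self):
        self.queue.post(self.period, self.time_callback)

    def time_callback(self):
        self.activations.append(self.queue.now)           # "do work..."
        self.queue.post(self.period, self.time_callback)  # repost the event


queue = EventQueue()
device = PeriodicDevice(queue, period=10)
device.start()
queue.run_until(35)
```

<p>Everything the device knows is in its attributes, and the only thing the scheduler holds between activations is a pending callback and a timestamp.</p>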
<p>Another advantage of event-driven models is that such a paradigm makes it clear that you need to be able to accept any call into the model at any time. This makes for more robust code, since it is quite easy to (intentionally or by mistake) encode an expectation on the sequence of activity in a threaded model that might not be what actually happens at run-time. In particular, the state of any protocol being acted on will need to be explicitly rather than implicitly represented.</p>
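<p>As a small illustration of explicit protocol state, consider a made-up device with a two-step protocol: software first writes an address and then writes a data value. The Python sketch below is an assumption-laden example, not a model of any real device; <code>TransferDevice</code> and its methods are hypothetical. The point is that each entry point checks the current state instead of relying on the order in which a thread happens to wait for things.</p>

```python
class TransferDevice:
    """Hypothetical device with a two-step protocol: an address write, then a data write."""

    IDLE = "idle"
    ADDRESS_LATCHED = "address_latched"

    def __init__(self):
        self.state = self.IDLE    # explicit protocol state, visible to a checkpoint
        self.address = None
        self.memory = {}
        self.protocol_errors = 0

    def write_address(self, address):
        # Legal in any state: a new address simply restarts the transfer.
        self.address = address
        self.state = self.ADDRESS_LATCHED

    def write_data(self, value):
        if self.state != self.ADDRESS_LATCHED:
            # Data arrived without an address: count it rather than assume it cannot happen.
            self.protocol_errors += 1
            return
        self.memory[self.address] = value
        self.state = self.IDLE


device = TransferDevice()
device.write_data(1)        # out of order: handled, not a crash or a hang
device.write_address(0x10)
device.write_address(0x20)  # address rewritten before any data: also fine
device.write_data(42)
```

<p>A threaded version would typically wait for the address and then wait for the data, which silently encodes the assumption that software never does anything else in between.</p>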
<p>There is much more to be said on how to code in this style, but there are long papers out there to read on this.</p>
<h2>High-Performance Event-Driven Simulation</h2>
<p>Note that in high-performance virtual platform-style simulation, processors will usually be a special case in both threaded and event-driven styles. This is because the flow of instructions that they execute constitutes very many very small actions that cannot afford a context switch between each. Here, the advantage of the event-driven model is even clearer, given some special-casing of processors. This is another long story that I will not reiterate here, but basically, most events as discussed above will be memory accesses from a processor to read and write device registers, and each such memory access can be handled in a single simulation step. No need to switch context or do anything but handle a simple function call. By not having a wait() call to deal with, this mechanism can be kept simple and cheap &#8212; which is essentially using an SC_METHOD in SystemC. But in the complete absence of SC_THREADs and their ilk, many other things can be optimized even better.</p>
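<p>The &#8220;memory access as a simple function call&#8221; idea can be sketched as follows. This is a deliberately naive Python illustration under stated assumptions: <code>MemoryMap</code> and <code>UartModel</code> are hypothetical, the register layout is invented, and a real simulator would use a far faster lookup than a linear search.</p>

```python
class UartModel:
    """Hypothetical device with two registers: offset 0 (transmit data) and offset 4 (status)."""

    def __init__(self):
        self.transmitted = []  # bytes written by software so far
        self.status = 1        # bit 0 set: transmitter ready

    def read(self, offset):
        return self.status if offset == 4 else 0

    def write(self, offset, value):
        if offset == 0:
            self.transmitted.append(value & 0xFF)


class MemoryMap:
    """Routes processor accesses to device models with plain function calls."""

    def __init__(self):
        self._devices = []  # list of (base, size, device)

    def map(self, base, size, device):
        self._devices.append((base, size, device))

    def _find(self, address):
        for base, size, device in self._devices:
            if address in range(base, base + size):
                return device, address - base
        raise LookupError(f"no device mapped at {address:#x}")

    def read(self, address):
        device, offset = self._find(address)
        return device.read(offset)  # handled in one step, runs to completion

    def write(self, address, value):
        device, offset = self._find(address)
        device.write(offset, value)


bus = MemoryMap()
uart = UartModel()
bus.map(0x1000, 8, uart)

# What a simulated processor would do for a store followed by a load.
bus.write(0x1000, ord("A"))
ready = bus.read(0x1004)
```

<p>Each access enters the device, updates or returns explicit state, and returns to the processor model, with no thread to wake up and no stack to switch.</p>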
<h2>The End</h2>
<p>What I wanted to provide in this almost-article-length post was an idea of the problems that I see threads cause as a modeling paradigm for hardware models, and the advantages offered by a reactive event-driven style. For some reason, this is misunderstood in the modeling community at large, probably because most operating systems and simulation systems in common use today present various forms of threads as the way to model concurrent behavior. However, threads as a prominent user-level programming model are known to be bad in many ways&#8230; and modeling is no exception to this rule.</p>
<p>Note that I realize that threads are needed at some level in order to take advantage of multicore hardware, but I think they are best hidden inside a simpler framework that presents simpler, more understandable semantics to the user.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/485/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Notes from the IP 08 Panel</title>
		<link>http://jakob.engbloms.se/archives/440?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/440#comments</comments>
		<pubDate>Sat, 06 Dec 2008 20:31:46 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[appearances]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[clock-cycle models]]></category>
		<category><![CDATA[DML]]></category>
		<category><![CDATA[IP08]]></category>
		<category><![CDATA[panel discussion]]></category>
		<category><![CDATA[Register Design Languages]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[SystemC]]></category>
		<category><![CDATA[SystemRDL]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=440</guid>
		<description><![CDATA[Now I am home again, and some days have passed since the IP 08 panel discussion about software and hardware virtual platforms. This was an EDA hardware-oriented conference, and thus the audience was quite interested in how to tie things to hardware design. In any case, it was a fun panel, and Pierre Bricaud did a [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-366" style="margin: 5px 10px;" title="ip08" src="http://jakob.engbloms.se/wp-content/uploads/2008/12/ip08.gif" alt="" width="147" height="63" />Now I am home again, and some days have passed since the <a href="http://www.design-reuse.com/ip08/program/panel_virtualplatform.html">IP 08 panel discussion</a> about software and hardware virtual platforms. This was an EDA hardware-oriented conference, and thus the audience was quite interested in how to tie things to hardware design. In any case, it was a fun panel, and Pierre Bricaud did a good job of moderating and keeping things interesting.</p>
<p><span id="more-440"></span></p>
<p>The panel had a clear consensus, which nobody really challenged, that virtual platforms for software development are different in kind from virtual platforms for hardware development. Indeed, the taxonomy of &#8220;hardware virtual platforms&#8221; versus &#8220;software virtual platforms&#8221; was used frequently and proved quite appropriate.</p>
<p>A software virtual platform has to be fast and its timing can be fairly approximate. Its main value, in this context, is that it can be created quickly and is useful for early software development and debug. Opinions differed, however, on how to produce them and where to go with them.</p>
<ul>
<li>Markus Willems from Synopsys had the position that they are produced in some appropriate way as a separate task from hardware development. SystemC was his language of choice.</li>
<li>Peter Flake proposed a methodology where you start by developing the software virtual platform and then refine it down towards more detailed models and finally hardware. He brought up Virtutech <a href="http://www.virtutech.com/whitepapers/virtutech_dml.html">DML</a> and <a href="http://jakob.engbloms.se/archives/358">SystemRDL</a> as examples of languages pointing in this direction.</li>
<li>Loic Le Toumelin considered the software virtual platform as something that is generated from a common design entry point, using some form of synthesis that can also generate the hardware and the hardware virtual platform.</li>
<li>I think my realistic position right now is that a software virtual platform is created as a separate item, but that we want to make this work as short and easy as possible and that in the future, the vision is similar to Peter Flake&#8217;s: start with a software virtual platform to define the hardware-software interface.</li>
</ul>
<p>It was also interesting how much the opinions differed when we got to the detailed hardware-oriented virtual platforms, the ones that tend to be clock-cycle level and attempt to be cycle-accurate (CA) in many cases.</p>
<ul>
<li>Markus said that the only good way to build a CA model was to take the RTL and convert it, or run it in an FPGA prototype. He echoed the sentiments <a href="http://jakob.engbloms.se/archives/153">I wrote about in July, that ARM is getting out of cycle-accurate models and the general difficulty of creating such a model by hand</a>.</li>
<li>Peter pointed out that you can have CA models before RTL, as a design tool. I strongly agree with this model of working; it is common in industry and definitely one way to go. However, for existing hardware, I agree that RTL-to-CA seems reasonable, even if the resulting models are painfully slow.</li>
<li>Loic wanted the CA to come from the same source as the software VP, and was very keen on their being in complete agreement on semantics of the hardware.</li>
</ul>
<p>The third major discussion was about the required accuracy and fidelity-to-hardware of a virtual platform. With a consensus that a software virtual platform has to be fast and with timing approximated, it is still clear that many people are uncomfortable about this idea of not being &#8220;exactly like the hardware&#8221;.</p>
<p>For some purposes, you do need complete fidelity to the hardware timing in a CA model. Loic definitely could not accept anything less when giving a customer a virtual platform, and some people in the audience echoed the same sentiment. Most, however, agreed that most software work can be done with simple timing, and that it does not matter all that much if there are some functionality bugs or omissions in the virtual platform. It is still far better than no platform at all!</p>
<p>What is clearly needed, at least for virtual platforms close to a hardware design process, is a way to check the software virtual platform and hardware virtual platform against the functionality and maybe timing of the final RTL, in the cases where you have the RTL, which is far from always in my world.</p>
<p>There were some other questions about software development tools support (of course you use the same debugger and compiler as with a physical platform) and other issues where the panel was mostly in agreement. I guess some of this also indicates that virtual platforms are not yet universally understood and that most people have not really had any experience with them.</p>
<p>Overall, this was a fun panel, and I hope the audience enjoyed it too and learnt something in the process.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/440/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Gary Stringham on Hardware Interface Design vs Virtual Platforms</title>
		<link>http://jakob.engbloms.se/archives/358?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/358#comments</comments>
		<pubDate>Sat, 29 Nov 2008 20:51:55 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[DML]]></category>
		<category><![CDATA[embedded]]></category>
		<category><![CDATA[Semifore]]></category>
		<category><![CDATA[simulation]]></category>
		<category><![CDATA[Spectareg]]></category>
		<category><![CDATA[SystemRDL]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=358</guid>
		<description><![CDATA[I just read an interesting paper from the 2004 Embedded Systems Conference (ESC) written by Gary Stringham. It is called &#8220;ASIC Design Practices from a Firmware Perspective&#8221; and straddles the boundary between hardware design and driver software development. It was good to see someone take the viewpoint of &#8220;how you actually program a hardware device [...]]]></description>
			<content:encoded><![CDATA[<p>I just read an interesting paper from the 2004 Embedded Systems Conference (ESC) written by <a href="http://www.garystringham.com">Gary Stringham</a>. It is called &#8220;ASIC Design Practices from a Firmware Perspective&#8221; and straddles the boundary between hardware design and driver software development. It was good to see someone take the viewpoint of &#8220;how you actually program a hardware device is as important as what it does&#8221;. Gary seems to understand both the hardware design and implementation view of things, as well as that of the embedded software engineer. To me, that seems to be a fairly rare combination of skills, to the detriment of our entire economy of computer system development.</p>
<p><span id="more-358"></span></p>
<p>Gary Stringham&#8217;s paper lists a number of tips on how to create hardware-software interfaces. Some of them are echoed in his <a href="http://www.garystringham.com/newsletter.shtml">monthly newsletter</a>, which is worth a read (even if it is a bit short on detail). Unfortunately, there seems to be no publicly available version of the text. Gary has definitely kept lecturing on the topic since, at venues like the ESC and DVCon it seems, but more recent lecture notes that I have found in the ESC proceedings are pretty sparse. I guess being a consultant teaching people to do these things for a fee makes you a bit hesitant to share all your knowledge freely with the world&#8230; I can understand that position.</p>
<p>Anyway. Some of the comments in the text indicate to me the great value that virtual platforms can bring to the actual design of hardware up front, not just as an execution vehicle for the final design, used by a software engineer who has to take whatever is given.</p>
<p>In particular, there is the issue of getting ASIC and firmware designers to collaborate on the same thing at the same time. Quote:</p>
<blockquote><p>The key to designing an ASIC is to get the firmware engineers involved early. They are the customers that will be using the ASIC. Unfortunately, getting them involved early is often difficult to do because the ASIC design has to start several months before the firmware engineers will get parts</p></blockquote>
<p>And</p>
<blockquote><p>When the parts do arrive, the roles are reversed. The firmware engineers are trying to work with it while the ASIC engineers have mainly forgotten it and have moved on to new projects.</p></blockquote>
<p>The proposed solution to this problem is to involve the firmware people in the hardware design review process, which is a good idea.</p>
<p>It would be even better, however, if the firmware people could have the hardware interface to try as a live thing rather than just reading the documents. This is exactly what virtual platforms offer: quickly build a fast simple prototype of the interface, and hand it over to the software engineers to try.</p>
<p>This is something that I am currently <a href="http://jakob.engbloms.se/archives/330">exploring in some detail with Simics</a>, and that I wrote a piece about in <a href="http://chipdesignmag.com/display.php?articleId=2720&amp;issueId=31">Chip Design </a>earlier this year. Fundamentally, I think this is feasible, provided that hardware designers do not fret too much about timing details and the precise performance of the final implementation, and focus more on the programming interface design first &#8212; and then later go on and make sure the timing and performance is right.</p>
<p>It is just like software development is supposed to be done: start by designing a useful interface to a piece of functionality, and then add in the details and optimize performance within the boundary of that interface. Of course, the interface might need some adjustments to support certain optimizations, but it is quick and easy to provide a new virtual platform with a new behavior to the software engineer. Much faster than providing a new piece of silicon or even new documentation.</p>
<p>Register design tools like <a href="http://www.denali.com/en/products/systemrdl_about.jsp">SystemRDL </a>(<a href="http://www.arm.com/iqonline/news/marketnews/17617.html">now being standardized by SPIRIT</a>), <a href="http://spectareg.com/">Spectareg</a>, <a href="http://www.semifore.com/">Semifore</a>, etc. all touch on this, but all seem to be lacking the ability to actually describe what a device does beyond some simple basics like software and hardware read/write properties. You really need a full expressive language to write a truly executable model of the hardware (and I like <a href="http://www.virtutech.com/whitepapers/virtutech_dml.html">DML </a>for this).</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/358/feed</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Cadence-Ran vs Synopsys-Frank over Low-Power and Virtual Things</title>
		<link>http://jakob.engbloms.se/archives/344?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/344#comments</comments>
		<pubDate>Sat, 15 Nov 2008 22:32:11 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[blog commentary]]></category>
		<category><![CDATA[Cadence]]></category>
		<category><![CDATA[Frank Schirrmeister]]></category>
		<category><![CDATA[power analysis]]></category>
		<category><![CDATA[Ran Avinun]]></category>
		<category><![CDATA[simulation]]></category>
		<category><![CDATA[Synopsys]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=344</guid>
		<description><![CDATA[Over the past few weeks there was an interesting exchange of blog posts, opinions, and ideas between Frank Schirrmeister of Synopsys and Ran Avinun of Cadence. It is about virtual platforms vs hardware emulation, and how to do low-power design &#8220;properly&#8221;. Quite an interesting exchange, and I think that Frank is a bit more right [...]]]></description>
			<content:encoded><![CDATA[<p>Over the past few weeks there was an interesting exchange of blog posts, opinions, and ideas between Frank Schirrmeister of Synopsys and Ran Avinun of Cadence. It is about virtual platforms vs hardware emulation, and how to do low-power design &#8220;properly&#8221;. Quite an interesting exchange, and I think that Frank is a bit more right in his thinking about virtual platforms and how to use them. Read on for some comments on the exchange.</p>
<p><span id="more-344"></span><br />
The following appears to be the sequence of events:</p>
<ul>
<li>Cadence press release, in September, about their &#8220;<a href="http://www.design-reuse.com/news/19019/power-analysis-pre-rtl-exploration.html">Incisive Palladium Dynamic Power Analysis and Cadence InCyte Chip Estimator</a>&#8221;, quoting Ran:</li>
<blockquote><p><em>Cadence Incisive Palladium Dynamic Power Analysis enables SoC designers, architects and validation engineers to quickly estimate the power consumption of their system during the design phase, analyzing the effects of running various real software stacks and other real-world stimuli. The new offerings also include the Cadence InCyte Chip Estimator, which can now provide what-if power analysis through exploration of different low-power techniques. The InCyte Chip Estimator also generates automatically the Si2 Common Power Format (CPF), which helps drive architectural power specification and intent into implementation and verification.</em></p></blockquote>
<li>Frank Schirrmeister blogged &#8220;<a href="http://www.synopsysoc.org/viewfromtop/?p=50">On Chameleons, Low Power and the Marketing Power of Copy Editing</a>&#8221;, basically saying that what Cadence was selling was something that was bound to the RTL level and thus arriving with estimates pretty late in the design process, after most important architecture decisions had been made. Instead, he proposed a flow using <strong>virtual prototypes</strong> that contained a sequence of successively better estimates, from the usual initial spreadsheet to estimates actually derived from RTL later in the process (or for IP blocks that already exist). Synopsys is not alone in this; <a href="http://www.neosera.com">Neosera</a> and <a href="http://www.scdsource.com/article.php?id=82">ChipVision</a> are after similar ideas. I think this approach makes excellent sense, following the idea that getting some kind of approximate feedback from a complete system early in the process is better than getting lots of details from a small part of a system late in the process.</li>
<li>Ran Avinun then blogged a reply to Frank, at &#8220;<a href="http://www.cadence.com/Community/blogs/sd/archive/2008/10/30/the-power-of-cadence-system-power-flow-vs-viewing-from-the-top.aspx">The Power of Cadence System Power Flow vs. Viewing from the Top</a>&#8221;. His contention there is that virtual prototypes have their uses, but that real designers will be using hardware accelerators, as they provide the key accuracy needed to do real power work. He also sees the creation of a virtual platform, and in particular the time it takes, as a big obstacle, and cites a number of cases where running the actual semi-final RTL with power simulation was key to project success.</li>
<li>Frank then replied to the reply, at &#8220;<a href="http://www.synopsysoc.org/viewfromtop/?p=53">Hammers, Nails and the Spirits That I Called …</a>&#8221;, where he points out that Ran has some misconceptions about virtual platforms and admits that the Cadence flow works well, but argues that it misses the point of early power estimation before the design is too frozen to be much changed. There is a pretty but hard-to-read diagram in the post, from <a href="http://www.design-reuse.com/articles/12728/towards-activity-based-system-level-power-estimation.html">a 2005 article he wrote while at ChipVision in Germany</a>, pointing out the need to evaluate designs with actual test data from the real world.</li>
</ul>
<p>What do I make of all of this?</p>
<h2>Ran&#8217;s Points</h2>
<p>I must admit that I think the Palladium hardware simulation accelerator boxes are very cool pieces of hardware, which at least used to be based on custom logic systems that use several cycles of fixed-size hardware to simulate multiples of the hardware&#8217;s base emulation capacity (so a 10M-capacity system can use 10 cycles per target cycle to simulate 100M, for example). However, I do agree with Frank that these are dependent on having actual RTL in place to be of much use.</p>
<p>Another issue with hardware emulators is their overall availability: compared to the number of PCs available in an organization, they are going to be very limited. As discussed in many different forums, a key advantage of a pure virtual platform is that it can turn any programmer&#8217;s PC into a target system running the real target software. There is no need to book time on a limited set of physical target machines, and hardware accelerators are exactly such limited-in-supply machines. So a virtual platform is much more available to people within, and especially outside, a design organization. Also, unless you are happy to release RTL for your design to people outside your organization, hardware acceleration is going to do little to help your end users get the most out of your design, pre-silicon.</p>
<p>My final gripe with hardware emulators is their limited scope. They tend to max out at the borders of a single chip, or less. A virtual platform, on the other hand, has much more room to scale, to include multiple chips, <a href="http://www.virtutech.com/products/simics_accelerator.html">multiple boards</a>, or even <a href="http://www.compactpci-systems.com/articles/id/?3537">complete racks</a> and networks of networks. You cannot really do that in any hardware simulation, as it involves too many billions of gates running too many billions of instructions. The general rule of simulation still applies with hardware acceleration: <a href="http://www.engbloms.se/publications/engblom-ESC2008-class410-simulation-paper.pdf">you need to increase the level of abstraction to handle larger systems</a>.</p>
<p>As to the problem faced by Ran&#8217;s customers, having RTL but no virtual platform: what were they thinking of? Seriously, if you want to do design today, a virtual platform should be your starting point, not an afterthought. Time and again, we see examples today where using virtual platforms <a href="http://www.chipdesignmag.com/display.php?articleId=2720&amp;issueId=31">gets chips to customers ahead of time and provides the ability to test ideas before committing to final RTL</a>. It seems that Ran agrees with this need, but his means are different:</p>
<blockquote><p><em>&#8220;As was stated above, big reason our customers use RTL emulation platforms is for accuracy, and while virtual platforms can offer certain performance, eventually the need to accuracy becomes critical and can not be overlooked, even for initial performance and power estimation analysis. Frank seems to forget in his statement above that the average bring-up time of new virtual platforms takes 6-12 months while the average bring-up time of many emulated designs takes days.&#8221;</em></p></blockquote>
<p>The time to create a virtual platform is actually pretty short, if you do it at a sufficiently abstract level of detail and don&#8217;t worry too much about cycle accuracy. Also, bringing up an emulation depends on having a detailed RTL-level description to start with&#8230; which is not necessarily the case. I must say that the cited six to 12 months for a VP (for a single SoC as discussed here) sounds reasonable to me &#8212; if you are building a cycle-level model that tries to emulate the final timing (<a href="http://jakob.engbloms.se/archives/153">which might not be really feasible at all</a>). If you work at a higher level of abstraction like loosely-timed TLM, that time shrinks by a factor of ten or so. I agree that in the end, accuracy is critical &#8211; but before you get there, the approximations used by the VP will have gotten you pretty far in terms of software development and architecture testing.</p>
<p>Ran is also afraid of the lack of accuracy:</p>
<blockquote><p><em><span id="anormal_12" class="Cadence_CS_BlogDetail_BlogText"> Now, even if you build this platform successfully 9-12 months in advance, how do you know that your virtual platform representing your real design? How do you connect it to your verification and implementation environment and realistic power information? Frank seems to overlook these things. Looking at the analogy of the story described at the blog above, using a system-level platform that is not targeting the actual hardware for performance analysis and power trade-offs guarantees that the Chamelon will become a snake and you will get bitten. </span></em></p></blockquote>
<p>As with all simulations, virtual platforms need to be used with care and understanding. It might also once again be a matter of system scale: with RTL simulation, you are looking inside a single chip, and at the detailed design needed to save power there. With a VP, you might be looking at whether a particular OS kernel even tries to turn off unused hardware at all&#8230; and that might be just as important in the end as being accurate in how functional units turn on and off inside an accelerator.</p>
<p>In today&#8217;s software-driven systems that mostly consist of existing off-the-shelf hardware, not any particular SoC that is being designed right now, the large-scale behavior and smarts of the software in a setting containing lots of chips and functions is far more important than optimizations inside a chip.</p>
<h2>Frank&#8217;s Points</h2>
<p>Since Frank is a virtual platform supporter just like me, I instinctively agree with his points about VPs being pretty fast to develop and available long in advance of actual silicon. I like the way he deals with power in the ARM DevCon presentation cited (do have a look at it), but still there are some lingering doubts and issues&#8230;</p>
<p>What I have a hard time understanding is just how detailed the virtual platforms need to be. The use of SystemC TLM-2.0 LT is sensible for speed, but it seems from the DevCon presentation that the main emphasis is on AT-level (and therefore pretty slow) timing-accurate simulations that look at power cycle by cycle in the target. If that is the case, I think we could almost just as well go get ourselves a hardware accelerator, as cycle-level models (even if transaction-driven) are slow to run.</p>
<p>However, Frank also says something I cannot but agree with: you should not run around with a hammer and treat everything as a nail. Any reasonable chip design process needs both virtual platforms and hardware accelerators; one cannot really replace the other:</p>
<blockquote><p><em>When discussing this matter with a friend, he pointed out rightfully so that both Ran’s and my post suffer from “Hammer and Nail-itis”. In fact, he pointed out, the combination of Cadence’s estimators (InCyte), C based synthesis, Palladium, and Synopsys virtual would be pretty powerful! It’s a good thing then that we acquired Synplicity which brought us Synplify high-level synthesis and Confirma FPGA Prototyping to Synopsys, and of course, that we have existing interfaces between our Virtual Platforms and Eve’s solutions. </em></p></blockquote>
<h2>Conclusion</h2>
<p>To me, the lesson from this discussion is clear: a virtual platform should be the starting point of a new design, but once you get down to RTL, hardware acceleration is really pretty useful. You need both, and the VP should come first, not second. It is not an either-or issue; rather, I expect system and chip designers to use both tools. The only question is which should come first, and I think that is naturally the simulation in the form of a virtual platform. That also allows the chip to be set into a system context, which is otherwise pretty hard before silicon arrives, and is something that large system integrators are screaming for.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/344/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Shaking a Linux Device Driver on a Virtual Platform</title>
		<link>http://jakob.engbloms.se/archives/337?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/337#comments</comments>
		<pubDate>Sun, 09 Nov 2008 22:23:13 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[ESL]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[teaching]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[device driver]]></category>
		<category><![CDATA[interrupt]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[operating systems]]></category>
		<category><![CDATA[power architecture]]></category>
		<category><![CDATA[race condition]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=337</guid>
		<description><![CDATA[To continue from last week&#8217;s post about my Linux device driver and hardware teaching setup in Simics, here is a lesson I learnt this week when doing some performance analysis based on various hardware speeds. First some background. A key idea in the setup is to use the approach of assuming some processing time for [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-medium wp-image-329" style="margin: 5px 10px;" title="penguin-variant" src="http://jakob.engbloms.se/wp-content/uploads/2008/11/penguin-variant.png" alt="" width="100" height="118" />To continue from <a href="http://jakob.engbloms.se/archives/330">last week&#8217;s post </a>about my Linux device driver and hardware teaching setup in <a href="http://www.virtutech.com/academia">Simics</a>, here is a lesson I learnt this week when doing some performance analysis based on various hardware speeds.</p>
<p><span id="more-337"></span></p>
<p>First some background.</p>
<p>A key idea in the setup is to use the approach of <em>assuming some processing time </em>for the hardware accelerator, rather than creating detailed code and determining the actual processing time for a particular implementation. Given some assumed time, we can then see how it impacts program performance. This is a way of designing hardware where we look to how fast something needs to be to have a positive impact, rather than trying to make it as fast as possible. It also lets us analyze how performance in hardware is seen when using a complete OS stack and a real device driver rather than simple bare-metal software (which tends to show the performance in the best possible light). Essentially, it is loosely timed design-space exploration.</p>
<p>Initial tests of the driver used very short completion times, on the order of 1 microsecond. The read() call at this point simply waited for the hardware completion flag to become true, and then returned the results. That is not the kind of behavior that a driver should have, since if the hardware gets some kind of hiccup, we will be stuck looping  inside a kernel context. Instead, I implemented a blocking read variant that would put the calling process to sleep until a result arrives.</p>
<p class="MsoNormal">In order to test that my driver handled sleeping correctly, I raised the processing delay to the order of seconds&#8230; and promptly found a set of issues that forced several rewrites of the code. The most important were the need to switch to a software flag for completion rather than relying on the hardware flag, and the implementation of an interrupt handler to get a notification from the hardware.</p>
<p>Then, on Friday, I demonstrated the setup along with some new performance analysis tools to go with it to some students testing the setup. And the test program suddenly stopped working, obviously hanging at the first call to read() without ever getting unblocked.</p>
<p>The reason was a classic race condition: the code in the <tt>write()</tt> device driver call that sent input data into the hardware device waited until after the writing was complete (and then some more) before clearing the operation complete flag. Here is the relevant piece of code:</p>
<pre>for(i=0;i&lt;words;i++) {
  write_register(SIMPLE_INPUT, kbuf[i]);
}
*f_pos = 0;
kfree(kbuf);
clear_completion_state();</pre>
<p class="MsoNormal">With a sufficiently short delay to completion, the completion interrupt fired, was handled, and set the completion flag before the <tt>write()</tt> function even got to <tt>clear_completion_state()</tt>. After this, the test program called <tt>read()</tt> to read the result, and was blocked as the completion flag was not set. The interrupt signaling completion from the hardware had already triggered and deposited its result in the software flag, which was then promptly overwritten inside <tt>write()</tt>. Thus, inside <tt>read()</tt>, the flag never became set, and the process waited forever.</p>
<p class="MsoNormal">The fix is obvious: just move the clearing of the flag to <em>before </em>the writing to the hardware begins.</p>
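<p>To make the ordering issue concrete, here is a minimal user-space sketch in C. It is not the actual driver code: the flag variable, <tt>completion_irq()</tt>, and <tt>run_write()</tt> are hypothetical names invented for this illustration, and only <tt>clear_completion_state()</tt> is taken from the driver excerpt above. The sketch models a single <tt>write()</tt> call and lets the completion interrupt arrive either before or after the call returns.</p>

```c
#include <stdbool.h>

/* Software completion flag, set by the (modeled) interrupt handler. */
static bool completion_flag;

static void clear_completion_state(void) { completion_flag = false; }
static void completion_irq(void)         { completion_flag = true;  }

/*
 * Model one write() call (hypothetical helper, for illustration only).
 *
 * clear_first: true  = fixed ordering, flag cleared before feeding the device
 *              false = buggy ordering, flag cleared after feeding the device
 * short_delay: true  = the device completes and interrupts inside write()
 *              false = the interrupt arrives after write() has returned
 *
 * Returns true if a subsequent read() would find the flag set.
 */
static bool run_write(bool clear_first, bool short_delay)
{
    completion_flag = true;          /* stale state from an earlier operation */

    if (clear_first)
        clear_completion_state();    /* fixed: clear before touching hardware */

    /* ... the input words would be written to the device here ... */

    if (short_delay)
        completion_irq();            /* fast hardware: interrupt fires now */

    if (!clear_first)
        clear_completion_state();    /* buggy: wipes out the interrupt result */

    if (!short_delay)
        completion_irq();            /* slow hardware: interrupt fires later */

    return completion_flag;
}
```

<p>In this model, <tt>run_write(false, true)</tt> (buggy ordering with a short delay) is the only combination that returns false, which corresponds to the <tt>read()</tt> call that never wakes up. With <tt>clear_first</tt> set to true, the flag survives for both the short and the long delay.</p>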
<p class="MsoNormal">To generalize from this brilliant example of concurrency carelessness, this is a really good accidental demonstration of the power of varying timing in a virtual platform to shake code and find timing-related bugs in a manner much more efficient than possible on physical hardware.</p>
<p class="MsoNormal">Had I described the exact (or even approximate) timing of a particular hardware implementation, this kind of bug would not have been found and the driver code would not have been as robust. An implementation relying on a very short completion time could check the hardware operation complete flag directly, but that broke down when the delay was long. The buggy implementation above worked fine with a long completion time, but broke down with a short one. The fixed implementation works across a span of times from 10 ns to 10 s or more, which is all you can ask for, I think.</p>
<p class="MsoNormal">A short fun Simics note on this: changing that timing parameter is a run-time change. It is possible to change it during a run, from the Simics command-line, using a simple one-line command:</p>
<pre class="MsoNormal" style="padding-left: 30px;"><span style="color: #0000ff;">simics&gt; </span>sd0-&gt;time_to_result = 10.0e-9</pre>
<p class="MsoNormal">It is really nice working with a system like that!</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/337/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Learning Linux Device Drivers on a Virtual PowerPC</title>
		<link>http://jakob.engbloms.se/archives/330?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/330#comments</comments>
		<pubDate>Sun, 02 Nov 2008 10:02:41 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[ESL]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[teaching]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[DML]]></category>
		<category><![CDATA[endianness]]></category>
		<category><![CDATA[freescale]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[operating systems]]></category>
		<category><![CDATA[power architecture]]></category>
		<category><![CDATA[Simics]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=330</guid>
		<description><![CDATA[There are times when working with virtual hardware and not real hardware feels very liberating and efficient (not to mention safe). Bringing up, modifying, and extending operating systems is one obvious such case. Recently, I have been preparing an open-source-based demonstration and education system based on embedded PowerPC machines, and teaching myself how to do [...]]]></description>
			<content:encoded><![CDATA[<p><img class="size-medium wp-image-329 alignleft" style="margin: 5px 10px;" title="penguin-variant" src="http://jakob.engbloms.se/wp-content/uploads/2008/11/penguin-variant.png" alt="" width="100" height="118" /></p>
<p>There are times when working with virtual hardware and not real hardware feels very liberating and efficient (not to mention safe). Bringing up, modifying, and extending operating systems is one obvious such case. Recently, I have been preparing an open-source-based demonstration and education system based on <a href="http://www.virtutech.com/solutions/virtual_platform/powerpc/freescale/mpc8641d.html">embedded PowerPC machines</a>, and teaching myself how to do Linux device drivers in the process. This really brought out the best in virtual platform use.</p>
<p><span id="more-330"></span></p>
<p>The final result of my efforts will be more public early next year, when the students I have put to work on my Linux-based setup come back and show me what they accomplished (or not). Until then, here are some small tidbits on how easy it is to work with kernel-level code in a virtual machine. Actually, if I had been working on real hardware, I am not that certain that I would have had anything but a bricked machine in front of me &#8212; to put it simply, flash reprogramming seems to hate me, and I have managed to fail or destroy a few embedded boards that have been unlucky enough to cross my path.</p>
<p>The virtual platform was really very helpful to diagnose all the mistakes I made while creating my driver and making it talk to my custom hardware.</p>
<p>First of all, it was dead easy to test a new version of the driver: start the simulation from a checkpoint of a booted and configured machine, load the driver into the target file-system using the Simicsfs backdoor (similar to the VMware hostfs solution), and then insmod it. This was automated in a script that typed the needed commands on the target command line with no manual intervention. Each iteration takes a few seconds, which is just as fast and convenient as testing a simple program directly on the host.</p>
<p>Diagnosing what went wrong was greatly facilitated by the simulator: did the driver access the device I had prepared for it? Were values read as expected? Obviously, there were a lot of such cases, since I am not the most expert device driver programmer (yet).</p>
<p>Here is one particularly interesting example: I empirically learnt that the Linux kernel &#8220;readl&#8221; function always reads data as little-endian, even on a big-endian machine. You have to use &#8220;readl_be&#8221; to get the big-endian data from a big-endian device attached to a big-endian machine. I guess the behavior makes sense for reuse of drivers across architectures, but it sure confused me when my driver was reading the right register but complaining about bad contents.</p>
<p>The simulator showed the problem very plainly:</p>
<ul>
<li>&#8220;value read is 0xabcd0101 (BE)&#8221;. Ok that looks right.</li>
<li>&#8220;register r3 contains 0x0101cdab&#8221;. Strange, looks like the wrong byte order. WHY, I screamed to myself.</li>
<li>Using reverse execution to step back one instruction showed that the load instruction used was a byte-swapping 32-bit access. Aha!</li>
<li>Go into Linux kernel headers (include/asm/io.h) to find that there were a bunch of other varieties available, and guess that readl_be() was the right solution.</li>
<li>Change device driver code, recompile, and retest. Now it worked.</li>
</ul>
<p>I would have assumed that the book I was using as my guide, the highly recommended <a href="http://lwn.net/Kernel/LDD3/">Linux Device Drivers, 3rd edition</a>, would have told me this. But it did not, as it is annoyingly tied to the horrible standard PC. It could really do with some extra chapters on drivers for PowerPC, ARM, and MIPS (to name some of the most important non-x86 architectures out there).</p>
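<p>The byte-order mix-up described above can be reproduced in a few lines of plain C, without any kernel code. This is only a sketch of the arithmetic: <tt>swap32()</tt>, <tt>load_be32()</tt>, <tt>load_le32()</tt>, and <tt>reg_bytes</tt> are hypothetical helpers written for this illustration, not kernel functions. The register bytes are laid out the way a big-endian device holding 0xabcd0101 would present them.</p>

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Interpret four bytes, in memory order, as a big-endian 32-bit value. */
static uint32_t load_be32(const uint8_t *b)
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Interpret the same four bytes as a little-endian 32-bit value. */
static uint32_t load_le32(const uint8_t *b)
{
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];
}

/* Register contents as presented by a big-endian device holding 0xabcd0101. */
static const uint8_t reg_bytes[4] = { 0xab, 0xcd, 0x01, 0x01 };
```

<p>Here <tt>load_be32(reg_bytes)</tt> yields 0xabcd0101, the value the device intended, while <tt>load_le32(reg_bytes)</tt> yields 0x0101cdab, the value that showed up in r3. Applying <tt>swap32()</tt> to the latter recovers the former.</p>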
<p>On the other side of the fence, I am using <a href="http://www.virtutech.com/products/simics-modelbuilder.html">Virtutech DML</a> to do the actual device, and that is working out very well. In my setup right now, I can change the device driver and the hardware it drives, recompile both, and then run an automated test script that starts from a checkpoint, inserts the hardware model in target memory, loads the device driver, and tests it in about five seconds. Very handy, and all completely automatic. The ability to load and insert hardware models on the fly during simulation is really very convenient here: without it, I would have to reboot the target Linux from scratch each time I wanted to add or remove things from the virtual platform hardware setup.</p>
<p>To sum things up, so far, I have learnt quite a lot about doing Linux device drivers and how to set up hardware in a Linux system, and I think it would have been much harder to learn and experiment like I have done had I been stuck with physical hardware (not to mention the plain impossibility of just inserting a new piece of hardware in a simple way into a physical system).</p>
<p>It really shows that quite often, virtual hardware is &#8220;even better than the real thing&#8221;.</p>
<p>For fun, here is a screenshot of a complete test run of loading the device driver:</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2008/11/hsi-course-complete-test-run-rebuilt-device-and-driver.png"><img class="aligncenter size-medium wp-image-335" title="hsi-course-complete-test-run-rebuilt-device-and-driver" src="http://jakob.engbloms.se/wp-content/uploads/2008/11/hsi-course-complete-test-run-rebuilt-device-and-driver-300x187.png" alt="" width="300" height="187" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/330/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Cadence on Virtual Prototypes instead of Host Execution</title>
		<link>http://jakob.engbloms.se/archives/308?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/308#comments</comments>
		<pubDate>Sun, 19 Oct 2008 21:40:37 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[ESL]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[virtualization]]></category>
		<category><![CDATA[blog commentary]]></category>
		<category><![CDATA[Cadence]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=308</guid>
		<description><![CDATA[Cadence technical blogger Jason Andrews wrote a short piece a couple of days ago on his perception that host-based execution is becoming unnecessary thanks to fast virtual platforms. In &#8220;Is Host-Code Execution History&#8221;, he tells the story of a technique from a long time ago where a target program was executed directly on the host, and [...]]]></description>
			<content:encoded><![CDATA[<p>Cadence technical blogger <a href="http://www.cadence.com/community/posts/jasona.aspx">Jason Andrews</a> wrote a short piece a couple of days ago on his perception that host-based execution is becoming unnecessary thanks to fast virtual platforms. In &#8220;<a href="http://www.cadence.com/Community/blogs/sd/archive/2008/10/17/is-host-code-execution-history.aspx">Is Host-Code Execution History</a>&#8221;, he tells the story of a technique from a long time ago where a target program was executed directly on the host, and memory accesses captured and passed to a Verilog simulator. The problem being solved was the lack of a simulator for the MIPS processor in use, and the solution was pretty fast and easy to use. Quite interesting, and well worth a read.</p>
<p>However, like all host-compiled execution (which I also like to call API-level simulation) it suffered from some problems, and virtual platforms today might offer the speed of host-compiled simulation without all the problems.</p>
<p><span id="more-308"></span></p>
<p>The problems are these:</p>
<blockquote><p><span id="anormal_12" class="Cadence_CS_BlogDetail_BlogText">Most companies that are using host-code execution today use &#8220;explicit access&#8221;.  This means they require all places in the code that access the hardware to call read() and write() functions so every hardware access goes through a common set of functions and then they use #ifdef to change the hardware accesses to call the simulator if they are doing verification with host-code execution. If they are running on the target system, then pointer dereferences are used. </span></p>
<p>&#8230;</p>
<p><span id="anormal_12" class="Cadence_CS_BlogDetail_BlogText">This is where implicit access came in. It provided a way to automatically trap pointer dereferences that were reading and writing to hardware locations and convert the load or store instruction into a simulated read or write. For reads it would put the result into the proper host CPU register and the user had no idea that a line of C code would magically turn into a bus transaction on a Verilog BFM</span></p></blockquote>
<p>Yes, that is a right pain, and I have seen lots of solutions for it, none of which have the elegant simplicity of a processor simulation. The &#8220;implicit access&#8221; system is basically trying to trap memory accesses without overtly changing the source code of a program. I guess the best way to do this is binary instrumentation, but it is still very hard to get to work right and robustly. A simulator is simply much simpler in principle here.</p>
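<p>For readers who have not seen the &#8220;explicit access&#8221; style, here is a minimal sketch of what it tends to look like in C. Everything in it is an assumption made for illustration: the <tt>HOST_SIM</tt> switch, the <tt>hw_read()</tt> and <tt>hw_write()</tt> wrappers, the <tt>DEV_CTRL</tt> address, and the small array standing in for the hook into a simulator. It is not taken from Jason&#8217;s post or from any particular tool.</p>

```c
#include <stdint.h>

/* Hypothetical build switch; normally passed by the build system (-DHOST_SIM). */
#define HOST_SIM 1

#ifdef HOST_SIM
/* Host build: a small array stands in for the hook that would forward
 * each access to a hardware simulator. */
#define SIM_WORDS 16u
static uint32_t sim_regs[SIM_WORDS];

static uint32_t hw_read(uintptr_t addr)
{
    return sim_regs[(addr / 4u) % SIM_WORDS];
}

static void hw_write(uintptr_t addr, uint32_t value)
{
    sim_regs[(addr / 4u) % SIM_WORDS] = value;
}
#else
/* Target build: plain pointer dereferences of memory-mapped registers. */
static uint32_t hw_read(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}

static void hw_write(uintptr_t addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}
#endif

/* Hypothetical register address, used identically in both builds. */
#define DEV_CTRL ((uintptr_t)0x1000u)
```

<p>Driver code then calls <tt>hw_write(DEV_CTRL, 42)</tt> and <tt>hw_read(DEV_CTRL)</tt> everywhere instead of dereferencing pointers. The cost is plain to see: every hardware access in the code base has to go through the wrappers, and the host and target builds are no longer the same code.</p>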
<p>Jason continues later on:</p>
<blockquote><p><span id="anormal_12" class="Cadence_CS_BlogDetail_BlogText">Given the hassle of host-code execution I would prefer to cross compile the software and run the target instruction set. Beyond the implicit or explicit access issue, this also eliminates issues with differences in data type sizes, data structure layout, byte order (endianess) and other differences between the host and target processor. </span></p></blockquote>
<p>That is absolutely true! Jason does not mention the additional fun of what happens when the target is running an OS that is happily fielding interrupts, scheduling software tasks, etc. Also, having to maintain a separate build target, and maybe a separate code variant, is very expensive process-wise. The expense that a good virtual platform incurs can be paid for pretty quickly once such reduced friction costs are factored in.</p>
<p>So I guess I pretty much agree with all that Jason is saying, and thank him for mentioning <a href="http://www.virtutech.com/products">Simics</a>. Thanks for the insights into what was done in the 1990s; it is always interesting to get pointers to old fundamental and interesting work.</p>
<p>About how the virtual platforms actually work inside: it is not that complicated in principle (but pretty hairy to get it quite right and fast in practice). You have to simplify the timing of the target processor, you have to convert from target processor binaries to host binary format using some kind of just-in-time compilation technique (also called dynamic binary translation or code morphing), and you have to provide some kind of direct access to target memory for the target processor simulation (like the DMI feature in <a href="http://systemc.org">SystemC TLM-2.0</a>, but usually the difficult bits are on the CPU side of that, not the memory side). The most interesting bit is how to build the surrounding system model to not slow the CPU model down, and for this I can recommend a couple of pieces of writing:</p>
<ul>
<li>My ESC 2008 general intro to the subject of virtual prototypes (<a href="http://www.engbloms.se/presentations/engblom-ESC2008-class410-simulation-slides.pdf">slides</a>, <a href="http://www.engbloms.se/publications/engblom-ESC2008-class410-simulation-paper.pdf">paper</a>)</li>
<li>Virtutech white paper on <a href="http://www.virtutech.com/whitepapers/modeling.html">system modeling </a></li>
</ul>
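<p>As a rough illustration of the &#8220;direct access to target memory&#8221; point above, here is a small C sketch of a processor model that reads RAM through a host array and only falls back to a slower device-model call for everything else. All names and addresses (<tt>RAM_BASE</tt>, <tt>RAM_SIZE</tt>, <tt>cpu_read8()</tt>, <tt>bus_read8()</tt>) are hypothetical, and the sketch makes no claim about how Simics or SystemC TLM-2.0 DMI are actually implemented.</p>

```c
#include <stdint.h>

/* Hypothetical target memory map: one RAM region, everything else is devices. */
#define RAM_BASE 0x10000000u
#define RAM_SIZE 4096u

static uint8_t  ram[RAM_SIZE];     /* host memory backing the target RAM */
static unsigned slow_path_calls;   /* counts accesses that left the fast path */

/* Slow path: stand-in for a full transaction into a device model. */
static uint8_t bus_read8(uint32_t addr)
{
    (void)addr;
    slow_path_calls++;
    return 0xff;                   /* dummy value from an unmapped device */
}

/* Read one byte as seen by the simulated processor. */
static uint8_t cpu_read8(uint32_t addr)
{
    /* Unsigned subtraction wraps for addr below RAM_BASE, so this single
     * comparison checks both ends of the RAM range. */
    if (addr - RAM_BASE < RAM_SIZE)
        return ram[addr - RAM_BASE];   /* fast path: direct host array access */

    return bus_read8(addr);            /* slow path: go through the bus model */
}
```

<p>The point of the fast path is that the overwhelming majority of target memory accesses hit RAM, so keeping them down to a range check and an array index is what lets the processor model run at speed.</p>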
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/308/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ChipDesignMag Article on Software in Hardware Design</title>
		<link>http://jakob.engbloms.se/archives/306?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/306#comments</comments>
		<pubDate>Fri, 17 Oct 2008 20:47:28 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[ESL]]></category>
		<category><![CDATA[articles]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[embedded systeme]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[ChipDesign Magazine]]></category>
		<category><![CDATA[hardware design]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=306</guid>
		<description><![CDATA[Chip Design Magazine published an article by me in their August/September 2008 issue, about Getting Software into the Hardware Design Loop. The article is about the technical and marketing aspects of how chip designers can get feedback from software and systems designers early in the hardware design process. The vehicle for this? Virtual platforms, obviously. [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.chipdesignmag.com/d">Chip Design Magazine</a> published an article by me in their August/September 2008 issue, about <a href="http://www.chipdesignmag.com/display.php?articleId=2720&amp;issueId=31">Getting Software into the Hardware Design Loop</a>. The article is about the technical and marketing aspects of how chip designers can get feedback from software and systems designers early in the hardware design process. The vehicle for this? Virtual platforms, obviously.</p>
<p><span id="more-306"></span></p>
<p>A key idea here is to start with a fast virtual platform (which means <a href="http://www.virtutech.com/getting_started/learn.html">Simics</a>-style, obviously, considering my background). Such a fast and low-detail virtual platform can be built very quickly and will run very quickly, which makes it possible to get it out to customers early and get feedback on functionality, programming interfaces, etc., and to get partners to port fundamental enabling software such as operating systems and network stacks to the target. The key here is really the speed of the simulator, as it directly impacts the productivity of the partners.</p>
<p>Once the target software is up to some extent on some version of the simulation, you can start to collect feedback. Later, at some point, more detailed models with lots of timing information will become available, and using the now working software, you can evaluate the detailed performance.</p>
<p>A current example of this kind of two-wave approach to virtual platforms is the <a href="http://www.virtutech.com/QorIQ/hybrid_tech_overview.html"></a><a href="http://www.virtutech.com/solutions/virtual_platform/powerpc/freescale/p4080.html">QorIQ P4080</a> Hybrid Simulator. While the cycle-accurate models were being developed, the fast models were used to get software in place.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/306/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
