<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Observations from Uppsala &#187; virtual things</title>
	<atom:link href="http://jakob.engbloms.se/archives/category/virtual/feed" rel="self" type="application/rss+xml" />
	<link>http://jakob.engbloms.se</link>
	<description>Computer Technology: Simulation, Virtualization, Virtual Platforms, Embedded, Multicore and Multiprocessing (by Jakob Engblom)</description>
	<lastBuildDate>Tue, 27 Jul 2010 19:57:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
<image>
    <title>Observations from Uppsala</title>
    <url>http://jakob.engbloms.se/favicon.png</url>
    <link>http://jakob.engbloms.se</link>
    <width>32</width>
    <height>32</height>
    <description>Observations from Uppsala - http://jakob.engbloms.se</description>
    </image>		<item>
		<title>Wind River Blog: Simics Analyzer</title>
		<link>http://jakob.engbloms.se/archives/1137?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1137#comments</comments>
		<pubDate>Wed, 26 May 2010 19:40:42 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[embedded software]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Wind River]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1137</guid>
		<description><![CDATA[I have a new blog post up at the Wind River blog network, about the new target analysis tools in Simics 4.4. It is a very fun piece of technology to play with, and you learn a lot just by poking around at existing software systems&#8230;]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png"><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="button-quicklink-blogs" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" /></a>I have a <a href="http://blogs.windriver.com/engblom/2010/05/analyzed.html">new blog post </a>up at the Wind River blog network, about the new target analysis tools in Simics 4.4. It is a very fun piece of technology to play with, and you learn a lot just by poking around at existing software systems&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1137/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Pipeline Performance Simulator Anno 1960</title>
		<link>http://jakob.engbloms.se/archives/1126?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1126#comments</comments>
		<pubDate>Mon, 03 May 2010 19:56:50 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[history of computing]]></category>
		<category><![CDATA[clock-cycle models]]></category>
		<category><![CDATA[cycle accuracy]]></category>
		<category><![CDATA[Frederick Brooks]]></category>
		<category><![CDATA[Harwood Kolsky]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[IBM 7030]]></category>
		<category><![CDATA[ISCA]]></category>
		<category><![CDATA[pipeline]]></category>
		<category><![CDATA[Tensilica]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1126</guid>
		<description><![CDATA[I have just found what almost has to be the first cycle-accurate computer simulator in history. According to the article &#8220;Stretch-ing is Great Exercise &#8212; It Gets You in Shape to Win&#8221; by Frederick Brooks (the man behind the Mythical Man-Month) in the January-March 2010 issue of IEEE Annals of the History of Computing, IBM [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/05/4506VV3073.jpg"><img class="alignleft size-full wp-image-1128" style="margin: 5px 10px;" title="IBM Stretch panel" src="http://jakob.engbloms.se/wp-content/uploads/2010/05/4506VV3073.jpg" alt="" width="83" height="79" /></a>I have just found what almost has to be the first cycle-accurate computer simulator in history. According to the article &#8220;<a href="http://dx.doi.org/10.1109/MAHC.2010.26">Stretch-ing is Great Exercise &#8212; It Gets You in Shape to Win</a>&#8221; by Frederick Brooks (the man behind <a href="http://en.wikipedia.org/wiki/Mythical_man_month">the Mythical Man-Month</a>) in the January-March 2010 issue of IEEE Annals of the History of Computing, IBM created a simulator of the pipeline for the <a href="http://en.wikipedia.org/wiki/IBM_Stretch">IBM 7030 &#8220;Stretch&#8221; computer </a>developed from 1956 to 1961 (<a href="http://www-03.ibm.com/ibm/history/exhibits/vintage/vintage_4506VV3073.html">photo from IBM.com</a>).</p>
<p><span id="more-1126"></span></p>
<p>For those unfamiliar with the Stretch machine, it was a supercomputer developed by IBM which introduced many of the performance techniques and basic computer technologies that we all use today (most of them handed down to us via the IBM System/360). For example, it was the first to use 8-bit bytes and 64-bit floating point. It also introduced memory protection, memory interleaving, and instruction prefetching.</p>
<p>More relevant for my blog is the fact that the Stretch used the world&#8217;s first pipelined main processor, complete with interlocks to maintain program-order semantics. When developing this pipeline, Frederick Brooks claims that IBM developed a program to simulate the pipeline. This simulator was used to test the performance of the pipeline design on various test programs (this was before they were called benchmarks), and tune the design accordingly. The simulator was created by <a href="http://archive.computerhistory.org/resources/text/FindingAids/102658131.Kolsky.pdf">Harwood Kolsky</a>. There is no firm date for the pipeline simulator, but based on the development time of the Stretch, it can be dated somewhere around 1960.</p>
<p>Thus, the simulation-driven approach to computer architecture is about 50 years old by now. Should have gone to ISCA and used this as an excuse for a party I guess&#8230;</p>
<p>It is also interesting to note that the Stretch computer acquired a co-processor in 1962, to do cryptology work. This machine was the one-off <a href="http://en.wikipedia.org/wiki/IBM_7950">IBM 7950 &#8220;Harvest&#8221; </a>and was tailored for the needs of the NSA in the US. It was a seriously special-purpose hardware unit adding a few instructions to the Stretch machine, and beating any other machine at the time by a factor of about 50 to 200 on the particular NSA workloads. That sounds like the kind of performance claim made today for Tensilica and other application-customized processors, only 50 years earlier.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1126/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>First Blog at Wind River!</title>
		<link>http://jakob.engbloms.se/archives/1121?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1121#comments</comments>
		<pubDate>Thu, 29 Apr 2010 19:14:03 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[blogging]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[Wind River]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1121</guid>
		<description><![CDATA[One of the many nice effects of the Wind River acquisition of Simics is that I will be blogging as part of the Wind River Blog network. My first post there is up now, and it is a short (at least compared to a textbook, I admit it looks terribly long for a blog post) [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png"><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="button-quicklink-blogs" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" /></a>One of the many nice effects of the Wind River acquisition of Simics is that I will be blogging as part of the Wind River Blog network. <a href="http://blogs.windriver.com/engblom/2010/04/what_is_simics_really.html">My first post there is up now</a>, and it is a short (at least compared to a textbook, I admit it looks terribly long for a blog post) overview of how Simics works inside.</p>
<p>I think it is important for users of technologically advanced tools to know a bit of how they work. A classic example of this is compilers: I taught an ESC class on them almost a decade ago, which is my most <a href="http://jakob.engbloms.se/archives/750">popular piece of writing to date</a>&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1121/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>FFast: Good Idea, Too Bad About the Implementation</title>
		<link>http://jakob.engbloms.se/archives/1114?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1114#comments</comments>
		<pubDate>Sun, 11 Apr 2010 19:23:29 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[.net]]></category>
		<category><![CDATA[Antoine Trouvé]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[CLR]]></category>
		<category><![CDATA[FFast]]></category>
		<category><![CDATA[Kazuaki Murakami]]></category>
		<category><![CDATA[RAPIDO]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1114</guid>
		<description><![CDATA[I just read a short paper by Antoine Trouvé and Kazuaki Murakami from the RAPIDO 2010 workshop on &#8220;rapid simulation and performance evaluation&#8221;. The paper is &#8220;FFast: Efficient Application of Compiled Simulation Techniques To A Fast ISS Over a Virtual Machine&#8221;. It explores the interesting idea of how an existing virtual machine infrastructure can be [...]]]></description>
			<content:encoded><![CDATA[<p>I just read a short paper by Antoine Trouvé and Kazuaki Murakami from the <a href="http://www2.lifl.fr/rapido/Rapido/Program.html">RAPIDO 2010</a> workshop on &#8220;rapid simulation and performance evaluation&#8221;. The paper is &#8220;FFast: Efficient Application of Compiled Simulation Techniques To A Fast ISS Over a Virtual Machine&#8221;. It explores the interesting idea of how an existing virtual machine infrastructure can be used to build a fast instruction-set simulator, and in the extension, a full system simulator.</p>
<p>To me, this idea is worth exploring, since using a mature VM like the .net CLR (used in this paper) or a JVM would offer a shortcut to get high-quality code generation for a JIT compiler. It could also offer other benefits, as these environments support many advanced configuration and management features. I have touched on this topic before, in the posts &#8220;<a href="http://jakob.engbloms.se/archives/1008">Dream ESL Language</a>&#8221; (VM as the basis for a simulator) and &#8220;<a href="http://jakob.engbloms.se/archives/264">The JVM as Universal Parallel Glue</a>&#8221; (that a common VM can  offer huge benefits for an ecosystem).</p>
<p><span id="more-1114"></span></p>
<p>In the paper, the authors show how they have built an ISS for MIPS which runs at 1 MIPS in basic interpretive mode, but at up to 225 MIPS in the most optimized mode. This is decent performance on a 2.6 GHz Core 2, but still an order of magnitude slower than the fastest commercial offerings available. However, I think these numbers are not particularly interesting or relevant.</p>
<p>First of all, they only check the performance on basic user-level programs with no I/O, since there is nothing but the CPU present and they thus cannot run an operating system. This makes the numbers essentially &#8220;peak&#8221; numbers, for small programs, which is not particularly realistic. Second, their implementation does not go straight to bytecodes, but rather to C# code. This is not how a high-performance solution would work, as it is obvious that they struggle to get performance even on these small benchmarks using that approach. Too much effort seems to be spent on gaming the C# runtime system and compiler, in my opinion.</p>
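<p>To make the difference between interpretive mode and compiled simulation concrete, here is a minimal sketch in Python. It is not the FFast design: the three-instruction target ISA, the <code>interpret</code> and <code>translate</code> functions, and the register-list layout are all hypothetical. Like the approach described above, the translator emits host-language source code rather than bytecode and leaves the rest to the host VM.</p>

```python
# Toy target ISA (hypothetical, not MIPS): each instruction is a tuple.
PROGRAM = [
    ("li", 1, 0),        # r1 = 0
    ("li", 2, 10),       # r2 = 10
    ("add", 1, 1, 2),    # r1 = r1 + r2
    ("addi", 1, 1, 5),   # r1 = r1 + 5
]


def interpret(program, regs):
    """Interpretive mode: decode and dispatch every instruction each time it runs."""
    for op, *args in program:
        if op == "li":
            regs[args[0]] = args[1]
        elif op == "add":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "addi":
            regs[args[0]] = regs[args[1]] + args[2]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return regs


def translate(program):
    """Compiled mode: emit host source for the whole block once, then hand it to the host VM."""
    lines = ["def block(regs):"]
    for op, *args in program:
        if op == "li":
            lines.append(f"    regs[{args[0]}] = {args[1]}")
        elif op == "add":
            lines.append(f"    regs[{args[0]}] = regs[{args[1]}] + regs[{args[2]}]")
        elif op == "addi":
            lines.append(f"    regs[{args[0]}] = regs[{args[1]}] + {args[2]}")
        else:
            raise ValueError(f"unknown opcode {op!r}")
    lines.append("    return regs")
    namespace = {}
    # The decode cost is paid here, once; later calls run straight-line host code.
    exec(compile("\n".join(lines), "translated_block", "exec"), namespace)
    return namespace["block"]


block = translate(PROGRAM)
interpreted = interpret(PROGRAM, [0, 0, 0, 0])
translated = block([0, 0, 0, 0])
```

<p>Both paths leave the same register state; the translated block simply skips the per-instruction decode on every later execution. Going through source text is the convenient route, and it is also where the host compiler and runtime start to dominate the cost.</p>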
<p>Thus, the paper does not really reveal anything useful in terms of &#8220;is building a JIT ISS using a VM a viable idea&#8221;? It probably tells us that it is not necessarily a broken idea, but there is a lot of work to bring the solution up to the level of native C-based solutions.</p>
<p>It is clear that the implementation effort in this case is lower, and that the porting cost to new hosts is also very low, compared to the native C-based approaches used in current industrial solutions. C# should also be more productive than using something like C++ for building a software system.</p>
<p>The most interesting aspect of the idea, and one which the authors do not explore at all, is using the power of the .net CLR to build a dynamic full-system simulator. Using the CLR, it should be trivial to build a solution where hardware models can be separately compiled and loaded dynamically at runtime. Using .net &#8220;properties&#8221;, it might be possible to support user inspection of a running system. Maybe the .net programming tools offer some really interesting possibilities for the debugging of full-system simulators. However, none of this is currently explored, which is a real shame. I guess I could hope that the authors read this short critique  and get some more ideas for future work, as I really think that virtual platforms could be built on top of virtual machines.  That idea is worth exploring in research.</p>
<p>On the nitpicking side, as always when reviewing academic papers in this blog, the authors seem to be unaware of some very relevant previous work. In particular, they should have mentioned Qemu and Simics. They could have used Qemu for MIPS as a point of comparison to compare speeds between their approach and a native C approach. As it is right now, the reference list looks like a fairly random walk around the DAC and DATE communities, but with little insight into the actual virtual platform or full-system simulation tools available today.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1114/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Describe is not the same as Design</title>
		<link>http://jakob.engbloms.se/archives/1083?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1083#comments</comments>
		<pubDate>Mon, 15 Feb 2010 20:56:41 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[DML]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1083</guid>
		<description><![CDATA[The discussion on my previous blog post about &#8220;the ideal ESL language&#8221; made me think some more about the purpose of a hardware modeling or description language. If you look closely, you realize that there are two quite different goals being pursued by the tools and languages discussed there. On one hand, we have the [...]]]></description>
			<content:encoded><![CDATA[<p>The discussion on my previous blog post about &#8220;<a href="http://jakob.engbloms.se/archives/1008">the ideal ESL language</a>&#8221; made me think some more about the purpose of a hardware modeling or description language. If you look closely, you realize that there are two quite different goals being pursued by the tools and languages discussed there.</p>
<p>On one hand, we have the task of supporting the design of new hardware bits, for the purpose of creating them. On the other hand, we have the task of describing a particular design for the purpose of simulating it. These two are not necessarily the same.</p>
<p><span id="more-1083"></span>To use an <a href="http://jakob.engbloms.se/archives/1035">analogy with building a house</a>, a design language helps the architect create the house (piece of hardware). Since the architect relies on craftsmen and experts (compilers) to do detailed design (how to put in windows, where to put light switches, etc.), the high-level description does not contain all the details of the house. However, if you are trying to simulate the house (piece of hardware) so that its inhabitants (software) don&#8217;t see the difference to the real thing, the details are sometimes what matters most. For example, the precise way to operate the stove in the house is very important for familiarity, but is a detail most likely left out of the architect&#8217;s initial drawings.</p>
<p>A design language can leave many things unspecified to be filled in by a compiler, but these things can be absolutely core to a description language. In particular, programming register maps tend to be created as a not-too-important side activity in hardware design. They do not really need to be visible in higher-level ESL languages, as they can obviously be filled in later by a tool or a human. But for a description language, they are absolutely core.</p>
<p>A description language can also leave out many parts of the hardware. If the software being used or written does not use certain modes or functions of a piece of hardware, those pieces can be ignored and implemented as dummies. That means that support for dummies is very important in description languages. But dummies make little sense in a design language, as you are unlikely to design a chip with lots of area spent on dummy functions that do nothing.</p>
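<p>As an illustration of why register maps and dummies are central to a description language, here is a minimal sketch in Python. It is not DML or any real modeling API: the <code>Register</code>, <code>DummyRegister</code>, and <code>Device</code> classes and the timer register layout are hypothetical.</p>

```python
class Register:
    """A modeled register: stores its value and may trigger behavior on write."""

    def __init__(self, name, reset=0, on_write=None):
        self.name = name
        self.value = reset
        self.on_write = on_write

    def read(self):
        return self.value

    def write(self, value):
        self.value = value
        if self.on_write is not None:
            self.on_write(value)


class DummyRegister(Register):
    """A register the software touches but whose function is not modeled.

    It accepts writes, reads back the last value written, and logs the access
    so the model author can see which unmodeled features are actually used.
    """

    def __init__(self, name, reset=0, log=None):
        super().__init__(name, reset)
        self.log = log if log is not None else []

    def write(self, value):
        self.log.append((self.name, value))
        self.value = value


class Device:
    """The programming register map: offset -> register."""

    def __init__(self, registers):
        self.bank = dict(registers)

    def read(self, offset):
        return self.bank[offset].read()

    def write(self, offset, value):
        self.bank[offset].write(value)


# Hypothetical timer: CTRL and COUNT are modeled, POWER_MODE is a dummy.
started = []
access_log = []
timer = Device({
    0x00: Register("CTRL", on_write=lambda v: started.append(v % 2 == 1)),
    0x04: Register("COUNT", reset=100),
    0x08: DummyRegister("POWER_MODE", log=access_log),
})
timer.write(0x00, 1)   # modeled: starts the timer
timer.write(0x08, 3)   # dummy: remembered and logged, nothing else happens
```

<p>The software sees a complete register map at the expected offsets, while only the registers it depends on carry real behavior.</p>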
<p>A description language can also ignore crucial aspects like power constraints and synthesis constraints. These are guidelines for a compilation step that has no bearing on the description of the hardware &#8212; the description language should describe what ended up happening, not the ifs, pleases, whats, and buts that guided how we got there.</p>
<p>For virtual platform creation, you seem to need a bit of both. I maintain that most of a VP is based on old hardware that exists, which calls for languages with strong description abilities. That&#8217;s the space that <a href="http://jakob.engbloms.se/archives/99">Simics DML </a>was designed for. For the small part of the hardware that is novel, it would be nice to have some way to convert from a design language to a virtual platform. Here, I don&#8217;t really see any usable current tools or languages &#8212; SystemC is really more a design language, but if you want a virtual platform model, you have to use it as a description language. There is no automagic way of getting to a fast abstract model from a design-oriented description. That&#8217;s why we need new, higher-level systems that can push out decent descriptions from a design.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1083/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The System, Not the Parts</title>
		<link>http://jakob.engbloms.se/archives/1035?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1035#comments</comments>
		<pubDate>Sat, 19 Dec 2009 19:38:22 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[ESL]]></category>
		<category><![CDATA[business issues]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Peter Day]]></category>
		<category><![CDATA[podcast commentary]]></category>
		<category><![CDATA[Russel Ackoff]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1035</guid>
		<description><![CDATA[I just listened to the November 16, 2009, issue of the BBC podcast called &#8220;Peter Day&#8217;s World of Business&#8221;. It is a rerun (in memoriam) of an interview with business professor Russell Ackoff, which was originally published in 2007. The main theme of the interview is the need to shift business thinking from small details [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/12/300x300.jpg"><img class="alignleft size-full wp-image-1036" style="margin: 10px;" title="Peter days world of business" src="http://jakob.engbloms.se/wp-content/uploads/2009/12/300x300.jpg" alt="Peter days world of business" width="100" height="100" /></a>I just listened to the November 16, 2009, issue of the BBC podcast called &#8220;<a href="http://www.bbc.co.uk/podcasts/series/worldbiz/">Peter Day&#8217;s World of Business</a>&#8221;. It is a rerun (in memoriam) of an interview with business professor Russell Ackoff, which was <a href="http://news.bbc.co.uk/2/hi/business/6338527.stm ">originally published in 2007</a>.</p>
<p><span id="more-1035"></span></p>
<p>The main theme of the interview is the need to shift business thinking from small details to entire systems. From operations research where you spend lots of time understanding some process or department in great detail, to a system-level thinking where you focus on what an entire enterprise is doing.</p>
<p>For me, this struck a chord in my system-level heart&#8230; in my world of computer systems and virtual platforms, system-level thinking is what it is so hard to get engineers to adopt. Far too much time is spent (in my opinion) understanding, modeling, and tweaking subsystems. Far too little effort is spent on understanding the whole, how things fit together in practice, taking software, hardware, and software system evolution over time into account. The analogy is not perfect, but there are more things that are alike than are not.</p>
<p>The most interesting analysis that Russell Ackoff fires off from his perspective is that of comparing companies and architecture. An architect knows the whole of a building, but does not entirely go into details on just how it is to be built. He/she trusts the carpenters, bricklayers, and other workers to know how best to solve their local problems. Basically, applying hierarchical abstraction to the task of constructing an actual building.</p>
<p>This got me thinking about why this is the case. I think it could be because building things (castles, cathedrals, houses, walls, pyramids, canals, &#8230;) must have been among the most complex tasks undertaken for a very long time in human history. Thanks to this long history, we have perfected the abstraction and division of labor in construction. Buildings are built in a certain way, by a certain set of crafts, since that method has been proven to work well for a very long time. So just like in the case of the design patterns craze in the late 1990s, architecture might have something to teach us about how to build hardware/software systems too.</p>
<p>Note that for some reason, I cannot find a link to the podcast on the BBC homepage. But if you subscribe in iTunes or similar, I think you will find it. Something is not as user-friendly as it could be.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1035/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Dream ESL Language</title>
		<link>http://jakob.engbloms.se/archives/1008?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1008#comments</comments>
		<pubDate>Fri, 27 Nov 2009 19:51:27 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[FDL]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1008</guid>
		<description><![CDATA[This post is a belated comment on the FDL 2009 conference that I attended some months ago. I have had some things in mind for a while, but some recent podcast listening has brought the issues to front again. What has been striking is the extent to which FDL was about languages only to a [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/08/fdllogosmall.jpg"><img class="alignleft size-full wp-image-881" style="margin: 5px 10px;" title="fdllogosmall" src="http://jakob.engbloms.se/wp-content/uploads/2009/08/fdllogosmall.jpg" alt="fdllogosmall" width="80" height="79" /></a> This post is a belated comment on the FDL 2009 conference that I attended some months ago. I have had some things in mind for a while, but some recent podcast listening has brought the issues to front again. What has been striking is the extent to which FDL was about <em>languages </em>only to a very small degree. Compared to programming-language conferences like PLDI, there was precious little innovation going on in input languages, and very little concern for the <em>programming </em>aspects of virtual platform design and hardware modeling.</p>
<p><span id="more-1008"></span>Walking to and from the conference from my hotel, I listened through a <a href="http://www.twit.tv/floss79">FLOSS Weekly interview </a>with David Heinemeier Hansson, the creator of Ruby on Rails. His approach to programming and languages was quite unlike that espoused at FDL. In his world, anything that is repeated in code should be put into the language or library. In Ruby, that is easier than in many other languages, as the language can be extended arbitrarily without recompiling the VM. His focus on programmer productivity and convenience is in stark contrast to the FDL discussions which mostly dealt with how to simulate things in a single language, SystemC. Quite boring from a programming language perspective.</p>
<p>Another podcast that triggered thoughts on programming and how to improve it using languages was <a href="http://itc.conversationsnetwork.org/shows/detail4291.html">Stackoverflow Episode 73. </a>In the listener questions section, the topic of language evolution came up. Joel and Jeff pointed out that C# is a glowing example of a language that quickly evolves and adds useful features, including things from the field of dynamic languages. Quite interesting. They made the crucial point that backwards compatibility in a language is not really needed, as long as you can link code compiled from the old and the new languages together. So, if C# 3.0 won&#8217;t compile all C# 2.0 code, it is no big deal, as you can still have the old C# 2.0 compiler around, and then link with the new C# 3.0 code.</p>
<p>The key is linkability between modules, not the standard of the input language. Here, Microsoft&#8217;s .net system is starting to make a very impressive showing, I think. C#, VB, F#, Python, Ruby &#8212; a ton of languages all share the same common language runtime and the basic libraries of .net. After hearing a talk by Tim Harris of Microsoft UK at <a href="http://www.it.uu.se/research/upmarc/MCC09/prog">MCC 2009</a>, I am even more impressed by what .net can do.</p>
<p>.net was also the topic of the <a href="http://www.twit.tv/floss82">FLOSS Weekly interview with the team behind IronPython</a>. IronPython is Python on top of the .net framework, and the interview went into a lot of interesting details on how that has played out. The short answer is: very impressive, very smart, and very much the way things should be.</p>
<p>Note that even if the perspective is that &#8220;ESL languages describe a single hardware chip configuration, which is fixed&#8221;, having a language which is more dynamic still helps. Remember that modeling is programming, and anything that makes programming more efficient is good. All you need to do is to have a &#8220;freeze&#8221; operation that says that &#8220;this particular set of things is my design&#8221;. But you might get there by interactively adding and removing things at a command-line interface.</p>
<p>Working in the OSCI CCI WG, I have come to realize just how useful reflection in languages like Python is (or as we implement it in Simics). When all you have is a static C++ compiled binary, you cannot easily do things like ask objects for their type and other metadata like documentation, since that information simply is not there. In Python, you can do such inspection, and also extend things at run-time, which is very useful. If you want to add configuration hooks to a class, Python makes it dead easy, while C++ makes it a major pain.</p>
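<p>A minimal sketch of the kind of run-time inspection and extension meant here, using only standard Python features. The <code>Uart</code> model and the <code>describe</code> and <code>add_config_hook</code> helpers are hypothetical names for illustration, not part of Simics or any OSCI API.</p>

```python
class Uart:
    """Hypothetical serial port model."""

    baud_rate = 115200

    def send(self, byte):
        """Queue one byte for transmission."""
        return byte


def describe(obj):
    """Ask an object for its type, documentation, and public attributes at run time."""
    cls = type(obj)
    return {
        "type": cls.__name__,
        "doc": cls.__doc__,
        "attributes": sorted(n for n in dir(obj) if not n.startswith("_")),
    }


def add_config_hook(cls, name, default):
    """Attach a new configuration attribute to an existing class without editing its source."""
    setattr(cls, name, default)


uart = Uart()
info = describe(uart)                     # inspection before the extension
add_config_hook(Uart, "parity", "none")   # extend the class at run time
info_after = describe(uart)               # the existing object sees the new attribute
```

<p>Note that the object created before the extension picks up the new attribute as well, with no recompilation or relinking step in between.</p>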
<h2>The Dream ESL Language</h2>
<p>Overall, what I get from all of this is that a sound design for an &#8220;ESL&#8221; language, had we started today, would be:</p>
<ul>
<li>Basic semantics given by a virtual machine, not an input language.</li>
<li>Opportunities for several different input languages of potentially very different styles to be used, all compiling and linking into the same VM. That would open up for real innovation.</li>
<li>Extensive reflection and introspection features.</li>
<li>Dynamic reconfiguration during run-time, optionally frozen if the goal is to actually describe some hardware design for synthesis. But such synthesis would be from VM code, not some input language.</li>
</ul>
<p>Essentially, taking the approach of providing a stable interoperability layer between languages in the form of a VM, and allowing languages to be anything anyone could care to invent.</p>
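<p>The &#8220;dynamic reconfiguration, optionally frozen&#8221; point can be sketched in a few lines of Python. This is only an illustration under assumed names: the <code>Configuration</code> class, its methods, and the component names are hypothetical and do not correspond to any existing ESL tool.</p>

```python
class Configuration:
    """A set of components that can be edited interactively until it is frozen."""

    def __init__(self):
        self._components = {}
        self._frozen = False

    def add(self, name, kind):
        self._check_mutable()
        self._components[name] = kind

    def remove(self, name):
        self._check_mutable()
        del self._components[name]

    def freeze(self):
        """Declare 'this particular set of things is my design' and return it as immutable data."""
        self._frozen = True
        return tuple(sorted(self._components.items()))

    def _check_mutable(self):
        if self._frozen:
            raise RuntimeError("configuration is frozen")


# Interactive exploration: add and remove things, then freeze the result.
config = Configuration()
config.add("cpu0", "core")
config.add("uart0", "uart")
config.add("uart1", "uart")
config.remove("uart1")
design = config.freeze()

try:
    config.add("dma0", "dma")
    rejected = False
except RuntimeError:
    rejected = True
```

<p>Everything before <code>freeze()</code> is ordinary dynamic programming; everything after it is a fixed description that a later step could consume.</p>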
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1008/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Finally, a Bug!</title>
		<link>http://jakob.engbloms.se/archives/975?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/975#comments</comments>
		<pubDate>Sun, 25 Oct 2009 20:41:20 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[embedded software]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Checkpointing]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[demo]]></category>
		<category><![CDATA[Linux kernel]]></category>
		<category><![CDATA[Simics]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=975</guid>
		<description><![CDATA[Part of my daily work at Virtutech is building demos. One particularly interesting and frustrating aspect of demo-building is getting good raw material. I might have an idea like &#8220;let&#8217;s show how we unravel a randomly occurring hard-to-reproduce bug using Simics&#8221;. This then turns into a hard hunt for a program with a suitable bug [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/10/butterfly.png"><img class="alignleft size-full wp-image-982" title="butterfly" src="http://jakob.engbloms.se/wp-content/uploads/2009/10/butterfly.png" alt="butterfly" width="90" height="91" /></a>Part of my daily work at Virtutech is building demos. One particularly interesting and frustrating aspect of demo-building is getting good raw material. I might have an idea like &#8220;let&#8217;s show how we unravel a randomly occurring hard-to-reproduce bug using <a href="http://www.virtutech.com/products/simics_hindsight.html">Simics</a>&#8221;. This then turns into a hard hunt for a program with a suitable bug in it&#8230; not for the Simics tooling to resolve the bug. For some reason, when I most need bugs, I have a hard time getting them into my code.</p>
<p>I guess it is Murphy&#8217;s law &#8212; if you really set out to want a bug to show up in your code,  your code will stubbornly be perfect and refuse to break. If you set out to build a perfect piece of software, it will never work&#8230;</p>
<p>So I was actually quite happy a few weeks ago when I started to get random freezes in a test program I wrote to show multicore scaling. It was the perfect bug! It broke some demos that I wanted to have working, but fixing the code to make the other demos work was a very instructive lesson in multicore debug that would make for a nice demo in its own right. In the end, it managed to nicely illustrate some common wisdom about multicore software. It was not a trivial problem, fortunately.</p>
<p><span id="more-975"></span>First, some notes about the program. It is a producer-consumer system using pthreads, with a single producer thread feeding a variable number of compute threads with data, over a shared queue structure (a simple one that uses a single lock to protect it, making it not very scalable for small data messages and lots of workers).</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/10/program-structure-2.png"><img class="aligncenter size-full wp-image-980" title="program structure 2" src="http://jakob.engbloms.se/wp-content/uploads/2009/10/program-structure-2.png" alt="program structure 2" width="411" height="237" /></a></p>
<p>The queue contains a circular buffer, managed using a standard set of full/empty/tail/head kinds of variables. There is also a flag &#8220;done&#8221; which is set once we are out of data, to tell the compute threads to shut down and terminate the program. As this program is used to demonstrate and test scaling, it is actually something that terminates. The main program spawns off all the threads, and then waits for all threads to finish before it terminates itself.</p>
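<p>As a rough illustration of that bookkeeping, here is a minimal single-threaded sketch of such a circular buffer. It is not the actual program: only the member names head, tail, full, empty and done come from the description above, while the <code>sketch_</code> names, the <code>int</code> payload, the capacity, and the convention that head is the oldest element and tail the next free slot are assumptions. Locking is left out on purpose.</p>

```c
#include <stddef.h>

#define SKETCH_QUEUE_SIZE 4   /* hypothetical capacity */

/* Hypothetical layout: only the member names head, tail, full,
   empty and done are taken from the post. */
typedef struct {
    int    buf[SKETCH_QUEUE_SIZE];
    size_t head;   /* assumed: index of the oldest element */
    size_t tail;   /* assumed: index of the next free slot */
    int    full;
    int    empty;
    int    done;   /* set once the producer is out of data */
} sketch_queue_t;

static void sketch_queue_init(sketch_queue_t *q) {
    q->head = 0;
    q->tail = 0;
    q->full = 0;
    q->empty = 1;
    q->done = 0;
}

/* Returns 1 on success, 0 if the queue is full. */
static int sketch_queue_put(sketch_queue_t *q, int value) {
    if (q->full) return 0;
    q->buf[q->tail] = value;
    q->tail = (q->tail + 1) % SKETCH_QUEUE_SIZE;
    q->empty = 0;
    q->full = (q->tail == q->head);   /* tail caught up with head */
    return 1;
}

/* Returns 1 on success, 0 if the queue is empty. */
static int sketch_queue_get(sketch_queue_t *q, int *value) {
    if (q->empty) return 0;
    *value = q->buf[q->head];
    q->head = (q->head + 1) % SKETCH_QUEUE_SIZE;
    q->full = 0;
    q->empty = (q->head == q->tail);  /* head caught up with tail */
    return 1;
}
```

<p>The explicit full and empty flags are what disambiguate the head == tail case, which otherwise looks the same for a completely full and a completely empty buffer.</p>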
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/10/program-structure.png"><img class="aligncenter size-full wp-image-981" title="program structure" src="http://jakob.engbloms.se/wp-content/uploads/2009/10/program-structure.png" alt="program structure" width="300" height="458" /></a></p>
<p>This program and the queue subsystem had worked perfectly for a long time for me, running on an MPC8641 machine with a Linux 2.6.23 kernel, with 1 to 8 cores and 1 to 16 threads. Regardless of settings like thread counts, data sizes, number of packets to compute, it always ran smoothly and terminated.</p>
<p>However, the other week, I moved the program, the exact same binary even, over to a new software stack built on a Linux 2.6.27 kernel, still on the same MPC8641 machine. Suddenly, I started to see occasional freezes where the program would never terminate. I added some more diagnostic printouts to the program, and saw that the main program would simply freeze waiting for the other threads to terminate and report in. The freezes had no real relationship to input variables. Maybe they were a bit more common with short packets, but no real pattern emerged. They also happened randomly: running the program with the same parameters a few times in a row would sometimes result in a freeze. Using control-C to quit it and restart would keep the new instance of the program running well. Doing some other demo work, I found the same effect on a P4080 machine with 8 cores and a 2.6.30 Linux kernel.</p>
<p>This is a common pattern for parallelism bugs: they only manifest themselves as actual visible crashes or freezes or bad computation results once something in the software stack has changed, even though the fundamental issues have been there all the time. In this case, I think it was the Linux scheduler, but it is really hard to tell. Just because a program runs fine today does not mean it will run fine tomorrow.</p>
<p>After deciding to finally sit down and turn this lemon into lemonade, I had to reproduce the error. Thankfully, that is easy when you have a simulator. The first few times I had to run the target program 20 times or so before hitting the issue, but with some parameter and timing variations I managed to create a script that would open a <a href="http://jakob.engbloms.se/archives/714">checkpoint</a>, and run the program a few times under script control, triggering the bug on the fourth run (every time, thanks to determinism).</p>
<p>To diagnose the problem I wrote some Simics script code that I actually felt was fairly cool. I guessed that the problem had something to do with the queue and its handling of &#8220;done&#8221;, since that is what told the threads to terminate.</p>
<p>The first problem was that the queue was not a global variable. Instead, it was dynamically allocated on the heap by a function, and a pointer passed around, but never stored in a global variable (a good computer science graduate never uses a global variable other than as a last resort). In the end, my script set a breakpoint on the line in the setup function that came after the allocation. With the program stopped at that point, I could read the local variable pointing to the queue, and find and store the addresses of all the interesting members of the structure.</p>
<p>The code looked like this (Simics CLI), for the record:</p>
<pre> $mbp = ($ctx.break ($st.pos (rule30_threaded.c:222)))
 $cpu = (wait-for-breakpoint $mbp)
 $pq_addr  = ($cpu.sym "pq")
 $pq_tail  = ($cpu.sym "&amp;(pq-&gt;tail)")
 $pq_empty = ($cpu.sym "&amp;(pq-&gt;empty)")
 $pq_full  = ($cpu.sym "&amp;(pq-&gt;full)")
 $pq_head  = ($cpu.sym "&amp;(pq-&gt;head)")
 $pq_done  = ($cpu.sym "&amp;(pq-&gt;done)")</pre>
<p>Next, I set breakpoints on all writes to empty, full, and done. This was the most expedient route to catch actual puts and gets to the queue. Breakpoints on the queue_put() and queue_get() functions are not really showing the true flow, as these functions start by contending for the lock. Looking at writes to the actual queue members gave me the point where the tasks had grabbed the lock.</p>
<p>The script caught all writes to done, full, and empty, and on each write it dumped the state of the queue, including computing the number of elements in the circular buffer (without having to run any code on the target). To get an idea of who was active, it also used OS awareness to find the currently executing thread ID, and scripted debugging to convert the current program counter into a position in the program source code (actually, the important issue was the name of the function we were executing in).</p>
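<p>The element count itself is just arithmetic on the values read from target memory. The script did this in Simics CLI; the same computation is sketched in C below. The function name <code>ring_count</code>, the <code>capacity</code> parameter, and the convention that head is the oldest element and tail the next free slot are assumptions made for the illustration.</p>

```c
#include <stddef.h>

/* Number of elements in a circular buffer, computed from values read
   out of the queue structure. Assumed convention: head is the oldest
   element, tail is the next free slot, and full disambiguates the
   head == tail case. */
static size_t ring_count(size_t head, size_t tail, int full, size_t capacity) {
    if (full) return capacity;
    /* Adding capacity before subtracting keeps the unsigned
       arithmetic from wrapping when tail is behind head. */
    return (tail + capacity - head) % capacity;
}
```

<p>With head == tail and full clear, the result is zero, which matches the empty flag being set.</p>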
<p>This trace of activity showed quite an interesting pair of patterns. When the program ran well, the queue was mostly full, and it looked like the producer task always got some kind of priority to fill it before consumers could get in and drain it. When the program froze, the queue was seldom more than a few elements deep. This was the same program, on the same kernel, just run a few milliseconds later.</p>
<p>Clearly, the Linux kernel can exhibit quite variable behavior even for a program this simple. I guess that&#8217;s why this is called &#8220;soft real time&#8221;&#8230; Another parallelism lesson here: the scheduler is very important, and a smart adaptive scheduler can wreak havoc with software that was accidentally tuned for a different scheduler.</p>
<p>In the end, the crucial hint was that whenever the program froze, the &#8220;done&#8221; flag was set with a queue that was empty or contained just a few elements. I was sure that I had handled this case in my code, checking specifically for that and making sure to wake up the other threads with a signal that &#8220;the queue is not empty any more, please come check for more work&#8221;&#8230; but looking closely at the code, it turned out the code only woke up a single thread. Thus, the freeze resulted from the producer setting &#8220;done&#8221; with an empty queue, waking up a single compute thread, and then having the other threads wait forever for more data to be put into the queue. The fix was easy: use a broadcast signal rather than a single signal.</p>
<p>In retrospect, it seems really strange that this ever worked reliably&#8230; I almost suspect the old Linux kernel of having a flawed pthreads implementation where signals always wake up all waiting threads, and not just a single one like the documentation says. But that will have to wait for another day to be investigated.</p>
<p>Here is the code, for reference:</p>
<pre>void rule30_packet_queue_signal_done(rule30_packet_queue_t *q) {
 //
 // Grab lock, set the done signal atomically
 //
 pthread_mutex_lock (&amp;(q-&gt;mutex));
 q-&gt;done = 1;
 pthread_mutex_unlock (&amp;(q-&gt;mutex));
 // Signal any threads waiting for data to wake up
 // and discover that we are indeed done
 //
 // This is the bug:
 // - It only wakes up one thread...
 pthread_cond_signal (&amp;(q-&gt;notEmpty));
 // To be correct:
 // pthread_cond_broadcast (&amp;(q-&gt;notEmpty));
}</pre>
<p><em>Updated analysis:</em></p>
<p>My initial analysis was that when things worked, the &#8220;done&#8221; flag was set with enough data left in the queue that all threads had a chance to pull in data and come in and see the done flag being set.</p>
<p>However, today I went back and wrote a deeper analysis script that also checked for reads from the done flag (turning this check on only after the write to &#8216;done&#8217; to reduce the noise). I expected there to be a single reader when the freeze happened&#8230; but that was not the case. In my current test case, three out of five threads actually got in to read the done flag and terminate.  The crucial code for the compute threads looks like this:</p>
<pre> // Grab mutex,
 //   Check if the queue is empty, if so wait for someone
 //   to push something onto the queue, or signal done.
 //   both of which are done by signaling the notEmpty condition variable
 pthread_mutex_lock (&amp;(queue-&gt;mutex));
 while ((queue-&gt;empty) &amp;&amp; !(queue-&gt;done)) {
   pthread_cond_wait (&amp;(queue-&gt;notEmpty), &amp;(queue-&gt;mutex));
 }</pre>
<p>To freeze, a thread actually has to be blocked in the conditional wait here when &#8220;done&#8221; is signaled. There are plenty of other places threads can be as the program is finishing. For example, they can be waiting to grab the initial mutex lock, or actually doing compute work. That explains why some threads actually still terminate even with the buggy version. It certainly also illustrates just how chaotic concurrent programs can be. More so than you can ever imagine, really.</p>
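<p>To make the fix concrete, here is a small self-contained sketch of the shutdown pattern with the broadcast in place. It is not the rule30 program: the <code>demo_</code> names are hypothetical, and the circular buffer is replaced by a plain counter of pending items so that only the condition-variable handling remains. The wait loop has the same shape as the one shown above.</p>

```c
#include <pthread.h>
#include <stddef.h>

#define DEMO_MAX_WORKERS 16

/* Hypothetical, much simplified stand-in for the queue in the post:
   a counter of pending items instead of a circular buffer. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  notEmpty;
    int pending;    /* items waiting to be consumed     */
    int done;       /* producer has no more data        */
    int consumed;   /* total items taken by all workers */
} demo_queue_t;

static void *demo_worker(void *arg) {
    demo_queue_t *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->mutex);
        /* Wait until there is work or the producer has signaled done. */
        while (q->pending == 0 && !q->done) {
            pthread_cond_wait(&q->notEmpty, &q->mutex);
        }
        if (q->pending == 0) {          /* empty and done: shut down */
            pthread_mutex_unlock(&q->mutex);
            return NULL;
        }
        q->pending--;
        q->consumed++;
        pthread_mutex_unlock(&q->mutex);
    }
}

/* The caller acts as the producer for n_workers consumer threads.
   Returns the number of items consumed once every worker has exited,
   or -1 if n_workers is out of range. */
static int demo_run(int n_workers, int n_items) {
    demo_queue_t q;
    pthread_t tid[DEMO_MAX_WORKERS];
    int i;

    if (n_workers < 1 || n_workers > DEMO_MAX_WORKERS) return -1;
    pthread_mutex_init(&q.mutex, NULL);
    pthread_cond_init(&q.notEmpty, NULL);
    q.pending = 0;
    q.done = 0;
    q.consumed = 0;

    for (i = 0; i < n_workers; i++)
        pthread_create(&tid[i], NULL, demo_worker, &q);

    for (i = 0; i < n_items; i++) {
        pthread_mutex_lock(&q.mutex);
        q.pending++;
        pthread_mutex_unlock(&q.mutex);
        /* One new item: waking a single waiter is enough. */
        pthread_cond_signal(&q.notEmpty);
    }

    pthread_mutex_lock(&q.mutex);
    q.done = 1;
    pthread_mutex_unlock(&q.mutex);
    /* Every waiter has to wake up and see done, so broadcast. */
    pthread_cond_broadcast(&q.notEmpty);

    /* With a single signal here instead, this join could hang. */
    for (i = 0; i < n_workers; i++)
        pthread_join(tid[i], NULL);

    pthread_cond_destroy(&q.notEmpty);
    pthread_mutex_destroy(&q.mutex);
    return q.consumed;
}
```

<p>Calling something like <code>demo_run(5, 1000)</code> should return 1000 with all five workers joined. The split between signal for &#8220;one more item&#8221; and broadcast for &#8220;we are done&#8221; is the point: the first event concerns one consumer, the second concerns all of them.</p>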
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/975/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The S4D Debug Conference</title>
		<link>http://jakob.engbloms.se/archives/942?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/942#comments</comments>
		<pubDate>Sun, 27 Sep 2009 19:38:27 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[appearances]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[embedded]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[FDL]]></category>
		<category><![CDATA[gdb]]></category>
		<category><![CDATA[Hardware debug support]]></category>
		<category><![CDATA[p4080]]></category>
		<category><![CDATA[S4D]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=942</guid>
		<description><![CDATA[An unplanned and unexpected bonus with my trip to the FDL 2009 conference was the co-located S4D conference. S4D means System, Software, SoC and Silicon Debug, and is a conference that has grown out of some recent workshops on the topic of debugging, as seen from the perspective of hardware designers (mostly). S4D was part [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-941" title="S4D" src="http://jakob.engbloms.se/wp-content/uploads/2009/09/S4D1.jpg" alt="S4D" width="143" height="62" />An unplanned and unexpected bonus with my trip to the FDL 2009 conference was the co-located <a href="http://www.ecsi-association.org/ecsi/s4d/s4d09/mainpage.asp">S4D conference</a>. S4D means <em>System, Software, SoC and Silicon Debug</em>, and is a conference that has grown out of some recent workshops on the topic of debugging, as seen from the perspective of hardware designers (mostly). S4D was part of the same package as FDL and DASIP; entrance to one conference got you into the other two as well. As I did not know about S4D until quite late in the process, this was a great opportunity for me to look at what they were doing.</p>
<p><span id="more-942"></span></p>
<p>It was sufficiently interesting that I spent all of Thursday in S4D rather than in  FDL. It was really the first time that I have seen so many people working with practical embedded systems debug in the same room. Debug tends to be a topic at embedded systems conferences of various kinds, but then mostly from a fairly superficial technical perspective: assuming fairly simple software tools. Here,  there were presentations on how current hardware debug is being extended to incorporate powerful trace and debug and synchronous stop facilities.</p>
<p>It was very interesting to see Infineon, ST, and ARM present their work in on-chip debug. Users at ST, Nokia and Continental presented their view of debug requirements, uses, and current home-grown tools. There were presentations from EDA vendors showing off debuggers for hardware designs and some virtual platforms tools for software debug. Freescale presented how their HyperTRK debug agent works with their P4080 hypervisor, covering the software-instrumentation approach. Debug tends to be a field neglected by academia, but there were some academic papers presented as well. <a href="http://sourceware.org/gdb/wiki/GDB_7.0_Release">gdb7</a>&#8217;s multi-threaded debug abilities were mentioned. Pretty much the only topic missing in action was reverse execution.</p>
<p>This mixed audience gave rise to quite a few interesting discussions during the day. It was simple fun, as far as I am concerned.</p>
<p>The following were the main themes addressed and discussed:</p>
<ul>
<li>How to make customers of silicon chips appreciate the on-chip debug and not just consider it an unnecessary cost that could be avoided if only their software engineers did not make any mistakes. Answer: sell it as a performance optimization tool instead.</li>
<li>Multicore debug, including hardware-supported tracing and synchronized stop of multiple cores on a single SoC.</li>
<li>Given that we have massive traces from hardware and software debug and trace facilities, how can we actually find errors? Processing of trace information to detect anomalies is going to be an important issue in the future.</li>
<li>Performance bugs are the next frontier, after current concerns with functionality bugs.</li>
</ul>
<p>If I were to take a critical look at the conference and its scope, there were some things that were not covered.</p>
<ul>
<li>System-level debug, outside the scope of a single SoC, was not in any talk.</li>
<li>Almost all the speakers and attendees came from the world of consumer electronics and automotive systems. It would have been nice to have some input from the long-time parallel world of servers and operating systems, such as Microsoft&#8217;s debugger teams.  In a sense, this is the inverse of my complaint about the <a href="http://jakob.engbloms.se/archives/905">SiCS Multicore Day 2009</a>.</li>
<li>Compiler people involved in creating debug information were also missing, along with their view on how to deal with parallel programs.</li>
<li>Security vs debuggability, a <a href="http://jakob.engbloms.se/archives/799">favorite topic</a> of <a href="http://www.strombergson.com/kryptoblog/">Joachim Strömbergsson</a>. It would have been fun if Joachim had been there. I asked Rolf Kühnis from Nokia about <a href="http://www.mipi.org/">security in MIPI</a>, and he said that it simply was not in scope for MIPI: each manufacturer deals with it in their own way.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/942/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>FDL Impressions</title>
		<link>http://jakob.engbloms.se/archives/936?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/936#comments</comments>
		<pubDate>Thu, 24 Sep 2009 07:24:42 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[appearances]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Checkpointing]]></category>
		<category><![CDATA[FDL]]></category>
		<category><![CDATA[Peter Flake]]></category>
		<category><![CDATA[SystemC]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=936</guid>
		<description><![CDATA[This is the end of the second day of FDL 2009, and it is proving to be quite an interesting experience. The location is very bad, apart from the weather (coming from a Swedish Fall where temperatures are dropping towards 10 C, to a sunny 27 C is quite nice). But Sophia Antipolis is just a [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/08/fdllogosmall.jpg"><img class="alignleft size-full wp-image-881" style="margin: 5px 10px;" title="fdllogosmall" src="http://jakob.engbloms.se/wp-content/uploads/2009/08/fdllogosmall.jpg" alt="fdllogosmall" width="80" height="79" /></a>This is the end of the second day of <a href="http://www.ecsi-association.org/ecsi/fdl/fdl09/mainpage.asp">FDL 2009</a>, and it is proving to be quite an interesting experience. The location is very bad, apart from the weather (coming from a Swedish Fall where temperatures are dropping towards 10 C, to a sunny 27 C is quite nice). But Sophia Antipolis is just a tech park with some hotels, and you cannot get anywhere interesting or civilized without a car. No shops, no restaurants except for hotels, and no sidewalks in parts.</p>
<p>But the conference is good enough to be worth the bodily discomforts. And I did find a nice Parcours Sportif for the morning run, as well as a nice breakfast buffet at the Mercure Hotel.</p>
<p><span id="more-936"></span>So what were the highlights and themes of FDL?</p>
<ul>
<li>SystemC is literally everywhere; it is really the only simulation kernel that researchers are using. Often not so much for hardware simulation as for general simulation of timed concurrent processes. Not exactly what it was designed for&#8230;</li>
<li>There is a lot of work on bridging abstraction levels and using multiple levels of timing detail for different purposes. That is a nice change from a tradition of &#8220;everything has to be cycle accurate&#8221; that tended to come out of hardware design in previous years.</li>
<li>Architecture exploration is big, as always.</li>
<li>Validity of virtual platforms and models keeps coming up; some people are really too concerned about precise agreement with hardware. In practice, it does not matter that much if it is only 95% correct and 90% complete, as the software will work well enough anyway for the platform to be useful&#8230; but that is a hard message for hardware people to accept.</li>
<li>ST-Ericsson&#8217;s ex-NXP local office gave a couple of interesting presentations on how they were using SystemC. One of the groups had an interesting confusion between &#8220;SystemC&#8221; and &#8220;Virtual Platforms&#8221;. They could not quite keep the language and its application apart, which is indicative of the language-centricity of hardware designers in general. They did not even equate it with their tool, which would have been logical (they are using CoWare).</li>
<li>Peter Flake made some really good points and asked good questions in almost every presentation session. I definitely respect his deep understanding.</li>
</ul>
<p>I presented my talk on SystemC and Checkpointing, and it was quite interesting to hear the questions. The two main themes of my presentation were the explicit conversion from internal state to the external state held in a checkpoint, and the necessity to not use threads to enable decent checkpointing. The threading discussion continued for quite a while&#8230; and led to some interesting observations.</p>
<p>It seems that most people can accept the idea of abandoning threads in SystemC for hardware modeling&#8211; but not for software modeling <img src='http://jakob.engbloms.se/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> . Essentially, considering a hardware unit as an event-driven state machine is fairly natural and easy to understand. But when people try to model software behavior (directly in a SystemC model, not using an ISS to run the real code), they tend to think that threads are more natural and easy. However, for a typical software development use-case for a virtual platform you will run software on an ISS. I think we might have a useful generally acceptable design point for modeling coming up here, with hardware modeled as event-driven blocks that can be checkpointed and controlled by the simulator.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/936/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Freescale P4080, in Physical Form</title>
		<link>http://jakob.engbloms.se/archives/933?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/933#comments</comments>
		<pubDate>Thu, 17 Sep 2009 10:16:37 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[appearances]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[embedded systeme]]></category>
		<category><![CDATA[multicore computer architecture]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[DWF]]></category>
		<category><![CDATA[freescale]]></category>
		<category><![CDATA[heterogeneous]]></category>
		<category><![CDATA[homogeneous]]></category>
		<category><![CDATA[Jonas Svennebring]]></category>
		<category><![CDATA[MPC5606]]></category>
		<category><![CDATA[p4080]]></category>
		<category><![CDATA[Simics]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=933</guid>
		<description><![CDATA[Last Tuesday, I attended the Freescale Design With Freescale (DWF) one-day technology event in Kista, Stockholm. This is a small-scale version of the big Freescale Technology Forum, and featured four tracks of talks running from the morning into the afternoon. All very technical, aimed at design engineers. There were several topic areas, such as automotive, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/08/freescale-logo-icon.png"><img class="alignleft size-full wp-image-878" style="margin: 5px 10px;" title="freescale-logo-icon" src="http://jakob.engbloms.se/wp-content/uploads/2009/08/freescale-logo-icon.png" alt="freescale-logo-icon" width="80" height="80" /></a>Last Tuesday, I attended the Freescale Design With Freescale (DWF) one-day technology event in Kista, Stockholm. This is a small-scale version of the big Freescale Technology Forum, and featured four tracks of talks running from the morning into the afternoon. All very technical, aimed at design engineers.</p>
<p><span id="more-933"></span>There were several topic areas, such as automotive, consumer, and networking. Networking was mostly focused on the issues of multicore hardware and software.</p>
<p>Of particular interest to me was to see a <a href="http://www.freescale.com/webapp/sps/site/overview.jsp?nodeId=0162468rH3bTdG25E4">Freescale QorIQ P4080</a> 8-core networking/control-plane processor live for the first time. This chip was <a href="http://jakob.engbloms.se/archives/137">announced in the Summer of 2008</a>, with a full ecosystem of software support thanks to <a href="http://www.virtutech.com/qoriq">Virtutech Simics</a>. Now that the silicon is here, software is indeed running on it thanks to the long head start that development got with the virtual platform. Note that several demos at the event used the Simics simulator to show the software support for the P4080, as there was only a single chip to go around.</p>
<p>I would have loved to have a meaningful picture of the first P4080 in Europe, but  a chip is not really very photogenic &#8211; the P4080 processor was in an open computer case, but covered with a 10 cm-high heat sink which made it fairly hard to actually see. That&#8217;s the challenge with infrastructure things: they are not designed to be seen&#8230; just to do their job well. If you have a new consumer electronics processor, you can at least drive a screen quickly or something. But watching 28 Gbps of Ethernet traffic is not as easy <img src='http://jakob.engbloms.se/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Jonas Svennebring of Freescale gave a good talk about how the process of bringup on the P4080 had worked out. It was a total validation of the methodology of using virtual platforms, at different levels of abstraction, and slipping in a bit of hardware emulation as well.</p>
<p>Freescale started software development on the functional fast model, and when clock-cycle-level detailed models of subsystems became available, they started using them as well for performance validation for small pieces of code. Any discrepancies in behavior between the two models were then used to correct the models and documentation. Finally, as the RTL for the silicon began to become available, they used a few emulation setups to run parts of the actual RTL (the emulator could only handle a subset of the entire chip), and validate the performance numbers in the detailed model and the behavior of both models. In the end, when the first silicon became available, Linux was up in a very short time (I cannot give the exact number, but it was a matter of days rather than weeks).</p>
<p>This is the typical iterative process that all chip designers are implementing today: using virtual platforms you can get a head start on development of software, and then as more details become available, you tune models and update designs, models, and software, iterating towards a hardware/software combination that just works once the silicon realization of the hardware comes around.</p>
<p>So that was all cool.</p>
<p>Jonas also showed a die photo of the QorIQ, and that confirmed my opinion from the <a href="http://jakob.engbloms.se/archives/905">SiCS Multicore Day</a>: embedded multicore is not just about processor cores and cache, it is very much about accelerators to help offload repetitive work from the processing cores. More than half the chip was such acceleration logic! To me, this is a clear confirmation that heterogeneity is the future of hardware design, and a useful way to spend hundreds of millions of transistors to boost SoC performance.</p>
<p>The same was true for most other Freescale hardware showcased at the event. For example, there was the <a href="http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=MPC560xS">MPC5606S dashboard processor</a>, running an LCD display with lots of dynamic graphics with 0.2% CPU load on a 60 MHz e200 Power Architecture processor. All the work was done by its display driver and accelerator. It is hard to argue with that kind of efficiency. That chip did not need a heatsink, either. It was just mounted on the back of an example board with no need for any external logic chips. Apparently, it could also have moved some physical gauges and blinked LEDs, but that demo was considered too distracting for this particular setting.</p>
<p>I also gave a talk at the DWF, about debugging software on multicore using virtual platforms. That was fun, as always. I need to get out on the road more and talk at conferences, I think <img src='http://jakob.engbloms.se/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/933/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SiCS Multicore Day 2009</title>
		<link>http://jakob.engbloms.se/archives/905?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/905#comments</comments>
		<pubDate>Mon, 07 Sep 2009 19:26:27 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[appearances]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[multicore computer architecture]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[Anders Landin]]></category>
		<category><![CDATA[CPP]]></category>
		<category><![CDATA[Ericsson]]></category>
		<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Hazim Shafi]]></category>
		<category><![CDATA[heterogeneous]]></category>
		<category><![CDATA[homogeneous]]></category>
		<category><![CDATA[MCC]]></category>
		<category><![CDATA[Richard Kaufmann]]></category>
		<category><![CDATA[SiCS Multicore days]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[Visual Studio 2010]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=905</guid>
		<description><![CDATA[Last Friday, I attended this year&#8217;s edition of the SiCS Multicore Day. It was smaller in scale than last year, being only a single day rather than two days. The program was very high quality nevertheless, with keynote talks from Hazim Shafi of Microsoft, Richard Kaufmann of HP, and Anders Landin of Sun. Additionally, there was a [...]]]></description>
			<content:encoded><![CDATA[<p>Last Friday, I attended this year&#8217;s edition of the <a href="http://www.sics.se/node/4360">SiCS Multicore Day</a>. It was smaller in scale than <a href="http://jakob.engbloms.se/archives/283">last year</a>, being only a single day rather than two days. The program was very high quality nevertheless, with keynote talks from <a href="http://blogs.msdn.com/hshafi/">Hazim Shafi </a>of Microsoft, Richard Kaufmann of HP, and Anders Landin of Sun. Additionally, there was a mid-day three-track session with research and industry talks from the Swedish multicore community.<span id="more-905"></span></p>
<p>I think that for next year, the organizers need to find keynote speakers that are not from the general computing multicore world. The Microsoft talk this year was a step in that direction, as it came from multicore programming rather than multicore hardware. Richard and Anders gave very interesting and good talks, no doubt about it. But it would have been nice to have someone from ARM or Freescale or Tensilica or TI or ST or Ericsson or Cisco talking about the kinds of multicore embedded hardware that are being developed and used today. For example, the &#8220;next new thing&#8221; touted by the keynotes this year was GPGPU. Interesting for HPC and desktops, certainly. But pretty irrelevant for most of the people that I know. GPUs are huge, expensive, and power hungry.</p>
<p>GPGPU was one part of the theme this year. It is definitely catching on as <em>the </em>way to do number crunching in the desktop, server, and HPC world. It is not the universal panacea for any kind of parallelism, however, as Hazim and I noted in the panel discussion that ended the day. There are applications (such as <a href="http://www.virtutech.com/whitepapers/accelerator.html">parallel Simics</a>&#8230;) that scale well on general-purpose cores, but that will never ever work on GPUs. In general, the class of problems that work on GPUs is pretty limited to massive data-parallel problems like image and video manipulation.</p>
<p>In the eternal homogeneous vs heterogeneous debate (follow <a href="http://jakob.engbloms.se/archives/tag/homogeneous">the tags </a>in my blog for more posts on this topic), GPGPU was grudgingly accepted as a good candidate for something that will not be homogenized with the main processors. Additionally, Richard Kaufmann gave some hints that Intel or AMD are coming out with new chips with more accelerators on board&#8230; I guess it will be security, as is already done by Sun and <a href="http://jakob.engbloms.se/archives/80">IBM</a>. When I brought up the topic of more accelerators like pattern matching, compression, and the other things we see in chips from Freescale, Cavium, and others, the response was very &#8220;can only be economical for very high volume applications&#8221;.</p>
<p>It is striking how the GPGPU idea is bringing the classic telecommunications DSP-data plane/CPU-control plane division into the desktop and server space. Without any recognition being paid or any experience being reused from the 40 years that that has been done in telecoms and consumer electronics&#8230; as Jack Ganssle often says, us embedded folks get no respect.</p>
<p>In terms of programming, this year was all about general programming languages. Hazim from Microsoft talked about (and demoed) the quite pervasive addition of parallelism to both native C/C++ and managed .net code in Visual Studio 2010. Microsoft is dead serious about parallel programming, and is bringing out a whole set of different libraries and support structures to allow <a href="http://blogs.msdn.com/pfxteam/archive/2009/08/12/9867246.aspx">easier expression of parallel code</a>. In the &#8220;LINQ&#8221; data query language subset of C#, you could add some easy modifiers to &#8220;foreach&#8221; statements to make them parallel, for example. Having a language that is your own and which you can extend at will certainly pays off in terms of innovation here. C++ moves far slower than C#, that is becoming clearer and clearer. C# and its cousins in the .net system seem to be sneaking in lots of powerful language design ideas from places like Python, and also results from Microsoft&#8217;s powerful group of language researchers.</p>
<p>When I tried to bring up the idea of using domain-specific languages to program parallel applications, Hazim had the wonderful comment that &#8220;that might be applicable in certain domains&#8230;&#8221; &#8212; yes, that is the idea. By being narrow in terms of target domains, you gain expressive power and semantic insight that helps move programming from &#8220;how&#8221; towards &#8220;what&#8221;. But it sounds like domain-specific is a foul word inside of Microsoft &#8212; when the audience asked whether LINQ was not exactly a domain-specific language for data access, Hazim was at pains to point out that it is Turing-complete and that someone had managed to write a raytracer using it&#8230; interesting. This feels more political than market-based. I guess Micro</p>
<p>Richard Kaufmann had some interesting notes on throughput vs TTC (time-to-completion) jobs in servers. In the &#8220;cloud computing&#8221; era, throughput is much easier to scale: just add more servers. Classic HPC is more oriented towards TTC, as you do want your results within a reasonable time. Quite often, you can turn most work into a throughput-oriented style by simply running lots of jobs in parallel rather than pushing through a series of jobs sequentially. Note however that we have the entire field of real-time control, real-time communications, etc., that do not work like this. But that is not the market that HP is building servers for, or that Intel and AMD are servicing.</p>
<p>Outside the keynotes, Per Holmberg of Ericsson gave an interesting presentation on the adoption of multicore in the control plane of the <a href="http://www.ericsson.com/ericsson/corpinfo/publications/review/2002_02/161.shtml">Ericsson CPP </a>platform. The core of his talk was the observation that in these kinds of systems, multicore is not such a big revolution.</p>
<p>They have been distributed since the beginning. Thus, scaling by adding more processors (with local memories) is easy and multicore is only a packaging change from that. Also, most performance-intense operations are already offloaded onto DSP groups, network processors, ASICs, or FPGAs. There is not much parallelism left for the control plane to exploit. Essentially, only functions that unexpectedly become performance bottlenecks due to changes in traffic patterns are likely candidates for parallelization. Interesting point, and might be <a href="http://jakob.engbloms.se/archives/703">why the EETimes noted that multicore is slow to catch on in communications </a>(the article is a bit flawed).</p>
<p>Patrik Nyblom from Ericsson held a talk about how the <a href="http://www.erlang.org">Erlang </a>runtime engine was parallelized. From a practical perspective, the most interesting aspect was that this made applications parallel without changing a single line of code in the applications. Of course, applications had to be threaded to start with, but that is the most natural way in Erlang. He mentioned systems containing up to a quarter of a million threads &#8212; hard to do that in anything except Erlang.</p>
<p>He described how they had evolved from a simple implementation that worked well on synthetic benchmarks to a truly industrial-strength implementation. The difference was quite radical, as real codes feature more complex communications patterns, and make heavy use of device drivers and network stacks. This process forced the use of more and finer locks, and rethinking the balance between shared and separate heaps for threads.</p>
<p>They also had the opportunity to test their solution on a Tilera 64-core machine. This mercilessly exposed any scalability limitations in their system, and proved the conventional wisdom that going beyond 10+ cores is quite different from scaling from 1 to 8&#8230; The two key lessons they learned were that <em>no shared lock goes unpunished, </em>and <em>data has to be distributed as well as code.</em> Very interesting to hear this story from real software developers solving real problems.</p>
<p>The next multicore event taking place around here is the Second <a href="http://www.it.uu.se/research/upmarc/MCC09">Swedish Workshop on Multicore Computing </a>(MCC 2009), in Uppsala, November 26-27.</p>
<p>Update: note that the presentations from the event are available via <a href="http://www.multicore.se/">http://www.multicore.se/</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/905/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Another Layer of Virtual Indirection</title>
		<link>http://jakob.engbloms.se/archives/893?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/893#comments</comments>
		<pubDate>Sun, 23 Aug 2009 19:41:06 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[ethernet]]></category>
		<category><![CDATA[indirection]]></category>
		<category><![CDATA[networking]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=893</guid>
		<description><![CDATA[After a long break, this is another blog post in the series of &#8220;how to do modeling for virtual platforms&#8221;. The previous installments dealt with checkpointing and determinism. This post is about the use of indirection in a model to increase its flexibility and ease of use, at the cost of a bit more work [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-486" title="gears-modeling" src="http://jakob.engbloms.se/wp-content/uploads/2008/12/gears-modeling.png" alt="gears-modeling" width="62" height="65" />After a long break, this is another blog post in the series of &#8220;how to do modeling for virtual platforms&#8221;. The previous installments dealt with <a href="http://jakob.engbloms.se/archives/714">checkpointing </a>and <a href="http://jakob.engbloms.se/archives/734">determinism</a>.</p>
<p>This post is about the use of <strong>indirection </strong>in a model to increase its flexibility and ease of use, at the cost of a bit more work for the first model to be created. In particular, indirection in the sense of having explicit objects in a simulation to represent things like networks and cables connecting virtual machines.</p>
<p><span id="more-893"></span>There is a well-known saying (by <a href="http://en.wikipedia.org/wiki/David_Wheeler_%28computer_scientist%29">David Wheeler</a>) that &#8220;any problem in computer science can be solved with another layer of indirection&#8221;. Among computer architects, this is often used with the addition &#8220;&#8230;or a cache&#8221;. I think this is true, most of the time. The number of times that adding some indirection to an architecture for a program has simplified it &#8212; or made it feasible at all &#8212; are too many to count. It is at the very core of object-oriented programming, and the number of times you end up passing around function pointers is innumerable.</p>
<p>In the world of virtual platforms, there is one particular area where I see a pretty useful layer of indirection missing. Networks. Many virtual platform solutions offer various ways of connecting a virtual platform to a physical world, for interfaces like USB, Ethernet, or serial. Most virtual platforms achieve this by making the virtual hardware directly connect to the outside world.</p>
<p>Here is an illustration for Ethernet, where I have included a PHY in the picture. Quite often, you don&#8217;t even get that, just an Ethernet device that includes its PHY and connects out to a physical network. That&#8217;s what Qemu tends to do, for example.</p>
<p><img class="aligncenter size-full wp-image-895" title="No indirection" src="http://jakob.engbloms.se/wp-content/uploads/2009/08/No-indirection.png" alt="No indirection" width="248" height="311" />This approach is the simplest when your thinking is that you will model a single device, and simulate one virtual machine at a time, connecting to the physical network to receive stimulus. For USB, this means the useful feature of connecting a camera or USB disk on your PC to the virtual machine. And as a bonus, you can connect multiple machines together using some form of cross-connection on the PC (such as a TAP network interface).</p>
<p>However, there is a much better structure that is employed in some simulators. It is based on making each network an explicit object in the simulation, and having all virtual devices talk to the virtual network. Connections to the physical world are then handled by the virtual network, or, even better, by another device attached to the same virtual network. What you also get is the ability to connect multiple virtual devices to each other over the virtual network, and to easily write simulation modules that inspect or do fault-injection on the network traffic.</p>
<p>The picture below illustrates the idea for Ethernet:</p>
<p><img class="aligncenter size-full wp-image-896" title="indirection" src="http://jakob.engbloms.se/wp-content/uploads/2009/08/indirection.png" alt="indirection" width="369" height="356" />The cost of this architecture is that you have to create the virtual network object, and invent the interface between devices and the network. This increases the cost for the first network device you create, and if all  you are tasked with is that single device, I can see why some simulation designers took the direct route. However, if you think about the task of creating tens of devices connecting to the same type of network, the &#8220;cost&#8221; of creating a virtual network is actually negative. Using an indirect approach like this makes creating each device simpler, and each device immediately gets the benefit of all the services that have been added to the virtual network. As long as a device can connect to the virtual network, it can connect to the physical network without any extra coding or cost.</p>
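<p>To make the structure concrete, here is a minimal Python sketch of the idea. It is not taken from any particular simulator: the names <code>VirtualEthernetLink</code>, <code>VirtualNic</code> and <code>HostBridge</code> are made up for illustration, and the &#8220;bridge&#8221; only records frames instead of touching a real network. The point is that devices only know about the link object, and that inspection and fault injection are added to the link once rather than to each device.</p>

```python
class VirtualEthernetLink:
    """Explicit network object: endpoints talk to the link, never to each other."""

    def __init__(self):
        self.endpoints = []
        self.observers = []

    def attach(self, endpoint):
        self.endpoints.append(endpoint)

    def add_observer(self, callback):
        # An observer may return the frame (possibly modified) or None to drop it.
        self.observers.append(callback)

    def send(self, sender, frame):
        for observer in self.observers:
            frame = observer(sender, frame)
            if frame is None:
                return  # frame dropped, e.g. by a fault injector
        for endpoint in self.endpoints:
            if endpoint is not sender:
                endpoint.receive(frame)


class VirtualNic:
    """Hypothetical device model: it only knows how to talk to a link."""

    def __init__(self, name, link):
        self.name = name
        self.link = link
        self.received = []
        link.attach(self)

    def transmit(self, frame):
        self.link.send(self, frame)

    def receive(self, frame):
        self.received.append(frame)


class HostBridge:
    """Stand-in for a real-network connection: just another endpoint on the link."""

    def __init__(self, name, link):
        self.name = name
        self.outbound = []
        link.attach(self)

    def receive(self, frame):
        self.outbound.append(frame)  # a real bridge would forward to the host here


link = VirtualEthernetLink()
nic_a = VirtualNic("a", link)
nic_b = VirtualNic("b", link)
bridge = HostBridge("host", link)

trace = []

def tracer(sender, frame):
    trace.append((sender.name, frame))  # inspection: log all traffic
    return frame

def drop_marked(sender, frame):
    # fault injection: silently lose frames that start with b"DROP"
    return None if frame.startswith(b"DROP") else frame

link.add_observer(tracer)
link.add_observer(drop_marked)

nic_a.transmit(b"hello")
nic_b.transmit(b"DROP me")
```

<p>Neither device model contains any code for tracing, fault injection, or reaching the outside world, yet both get all three simply by being attached to the link.</p>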
<p>Encapsulating entire networks with multiple virtual machines within a single simulation session <a href="http://www.virtutech.com/whitepapers/networking.html">is also very beneficial for control, inspection, and determinism. </a>Relying on a physical connection between virtual machines makes all packets pass the unreliable and random real world on their way between machines, destroying any determinism or control you might have hoped to achieve.</p>
<p>In the world of SystemC simulation, an indirect approach like this is also a way to overcome some silly language limitations. Unbelievable as it might sound to the uninitiated, in SystemC you set up a simulation once into a single static setup (in something called the elaboration phase), and then that is what you simulate. There is no option to set up connections between modules or even add new modules to the simulation after the initial setup. Here, you can use a layer of indirection as a work-around. At the start of simulation, connect all devices that might at some point in time be connected to a particular network to that network. During simulation, configure and reconfigure the network module to only allow traffic from and to certain modules, essentially creating a useful illusion that they are connected and disconnected from the network.</p>
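<p>The work-around can be sketched in a few lines. This is plain Python rather than SystemC, and the names <code>SwitchableNetwork</code>, <code>plug</code> and <code>unplug</code> are invented for this illustration. The wiring is fixed when the network is constructed (standing in for elaboration), and only a set of &#8220;plugged in&#8221; endpoints changes while the simulation runs.</p>

```python
class Endpoint:
    """Hypothetical device port that simply collects delivered frames."""

    def __init__(self, name):
        self.name = name
        self.inbox = []


class SwitchableNetwork:
    def __init__(self, endpoints):
        # Static wiring: every endpoint that might ever be connected is bound here,
        # once, and the tuple never changes afterwards.
        self._endpoints = tuple(endpoints)
        self._plugged_in = set()

    def plug(self, endpoint):
        if endpoint not in self._endpoints:
            raise ValueError("endpoint was not wired up at construction time")
        self._plugged_in.add(endpoint)

    def unplug(self, endpoint):
        self._plugged_in.discard(endpoint)

    def send(self, sender, frame):
        """Deliver frame to all plugged-in endpoints; return the delivery count."""
        if sender not in self._plugged_in:
            return 0  # an unplugged cable carries no traffic
        delivered = 0
        for endpoint in self._endpoints:
            if endpoint is not sender and endpoint in self._plugged_in:
                endpoint.inbox.append(frame)
                delivered += 1
        return delivered


a, b, c = Endpoint("a"), Endpoint("b"), Endpoint("c")
net = SwitchableNetwork([a, b, c])

net.plug(a)
net.plug(b)
first = net.send(a, "ping")   # only b is plugged in besides the sender

net.unplug(b)
net.plug(c)
second = net.send(a, "pong")  # now only c sees traffic
```

<p>From the point of view of the software running on the simulated machines, <code>b</code> was unplugged and <code>c</code> was plugged in, even though the set of wired-up endpoints never changed.</p>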
<p>I hope I have convinced you: if you ever build a virtual platform, make sure to make all connections indirect.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/893/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Toast to Abstraction Layers</title>
		<link>http://jakob.engbloms.se/archives/888?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/888#comments</comments>
		<pubDate>Thu, 13 Aug 2009 19:41:47 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[gadgets]]></category>
		<category><![CDATA[general research]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[abstraction]]></category>
		<category><![CDATA[abstraction levels]]></category>
		<category><![CDATA[DAC 2009]]></category>
		<category><![CDATA[information hiding]]></category>
		<category><![CDATA[TheToasterProject]]></category>
		<category><![CDATA[Thomas Thwaites]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=888</guid>
		<description><![CDATA[I just found &#8220;The Toaster Project&#8221;, a Royal College of Art project where Thomas Thwaites built a simple toaster from scratch. Really from scratch, going all the way back to iron ore and raw petroleum. In the process, he had to smelt ore, create plastic from petroleum, etc. It is a very interesting observation about [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-890" style="margin: 10px 5px;" title="toaster" src="http://jakob.engbloms.se/wp-content/uploads/2009/08/toaster.png" alt="toaster" width="81" height="87" />I just found &#8220;<a href="http://www.thomasthwaites.com/thomas/toaster/page2.htm">The Toaster Project</a>&#8221;, a Royal College of Art project where <a href="http://www.thomasthwaites.com/">Thomas Thwaites </a>built a simple toaster from scratch. Really from scratch, going all the way back to iron ore and raw petroleum. In the process, he had to smelt ore, create plastic from petroleum, etc. It is a very interesting observation about the immense industrial complexity behind the very simple everyday items of our lives. I also think it has something to tell us computer scientists about abstraction.</p>
<p><span id="more-888"></span>What Thomas is showing is just how efficiently today&#8217;s economy manages to hide complexity from consumers (users). That toaster is just there on the shelf at a very low cost. If you take it apart, you will note that it is made from plastic which has been moulded to shape, and various bits and pieces of steel and copper wires. At that level, you feel that you could almost build it yourself. However, that is just the tip of the iceberg. What the toaster project reveals is the next level of abstraction and information hiding going on: that copper wire contains an enormously complex process in its making. From ore extraction, energy production to fuel the process, copper foundries, factories converting raw copper into wires, and a huge logistical machine to move things around.</p>
<p>In essence, we have a very nice example of information hiding and abstraction. As a user of the toaster, I do not need to understand how it works, and I do definitely not have any idea of the huge chain of suppliers leading up to its presence on my breakfast table.</p>
<p>That&#8217;s where we are going with computers, but it is going to take time. Today, most users are fairly well shielded from how computers really work. Until they break down, at least. As programmers, we are less lucky. In practice, most good programmers end up understanding at least the basics of assembly language and the memory hierarchy of the machine.</p>
<p>What is hidden today is mostly the innards of the silicon. I have no real idea of how a processor works at the level of transistors and electrons. I don&#8217;t have to care about that, while any computer user fifty years ago probably had a decent understanding of the electronics. If nothing else, that was how you investigated hardware faults and actually built computers in a factory. Before integrated circuits, the electronic bits were much more exposed.</p>
<p>I think the current trend towards virtualization in the IT space and virtual platforms in the system design space is showing that the abstraction stack we are using in computing is getting deeper and more opaque. It takes some getting used to, but in the end, we have to realize that most computer programmers will be like the toaster user. All they want is a virtual toaster that toasts virtual bread in a way that lets them do their job. That is: write software that really does not care that much about the particulars of the hardware it is running on.</p>
<p>For the designer of a toaster (or even worse, the manufacturer of copper wire or the oil producer for the raw materials for the plastics), this takes some getting used to. We have to accept that in many cases, a simple abstraction is sufficient to help programmers get moving. There is no need for perfect timing accuracy or all the details of bus transactions. As long as what comes out is sufficiently similar to toast (a virtual toaster spitting out candy would be a bad abstraction), most users are happy.</p>
<p>Brian Bailey touched on this in a blog post following DAC, called &#8220;<a href="http://www.chipdesignmag.com/bailey/2009/07/30/accuracy-does-not-imply-accuracy/">Accuracy does not imply accuracy</a>&#8221;. Same idea as the toaster: you have to accept less detail, more abstraction, to get somewhere useful. Not everyone needs to go back to basics&#8230; and doing so tends to be counterproductive in the end.</p>
<p>It is late now, but I think I will have toast and jam for breakfast tomorrow. Writing this got me hungry.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/888/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Can we Rely on C?</title>
		<link>http://jakob.engbloms.se/archives/885?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/885#comments</comments>
		<pubDate>Mon, 10 Aug 2009 07:49:04 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[Michael Barr]]></category>
		<category><![CDATA[rant]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=885</guid>
		<description><![CDATA[I have written several times on this blog about the odd propensity of the &#8220;EDA&#8221; business to consider the C and C++ languages &#8220;high level&#8221; languages. They are what I use almost daily for most of the demo-order programming I do, but I still don&#8217;t consider them very high-level. High-level for me is scripting (Python, [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-166" style="margin: 5px 10px;" title="whyc" src="http://jakob.engbloms.se/wp-content/uploads/2008/07/whyc.png" alt="whyc" width="100" height="106" />I have written several times on this blog about the odd propensity of the &#8220;EDA&#8221; business to consider the C and C++ languages &#8220;high level&#8221; languages. They are what I use almost daily for most of the demo-order programming I do, but I still don&#8217;t consider them very high-level. High-level for me is scripting (Python, Lua, &#8230;) or domain-specific languages (DML, Lex, Yacc, MatLab, &#8230;) or model-driven development (UML, LabView, Simulink, &#8230;) or languages which at least provide sensible and reasonably safe semantics (Erlang, Java, &#8230;).</p>
<p>However, in fact, most of the embedded industry and the &#8220;virtual platform&#8221; industry rely on C and C++ to get our daily jobs done. Question is, how much longer can we expect to do that? An interesting post at Embedded.com by Michael Barr brought back my argument that modeling needs to move up in levels of abstraction just like mainstream programming.</p>
<p><span id="more-885"></span></p>
<p>Michael Barr wrote the column &#8220;<a href="http://www.eetimes.eu/semi/218900394">Real Programmers Program in C</a>&#8221;, where he points out that knowledge of C is declining among computer science graduates. It is simply not efficient enough for simple mainstream work like creating web services and custom IT applications.</p>
<p style="padding-left: 30px;">Clever though he is, the young man admitted he wasn&#8217;t making that quote up on the spot. That &#8220;real men program in C&#8221; is part of a lingo he and his fellow computer science students developed while categorizing the usefulness of the various programming languages available to them. Exploring a bit, I learned the quiche-like phrase assigns both a high difficulty factor to the C language and a certain age group to C programmers. Put simply, C was too hard for programmers of their generation to bother mastering.</p>
<p>Obviously, if you take this argument to the extreme, you end up with the Monty Python sketch where a bunch of old men are trying to trump each other with the tough childhoods they had. In the end, they claim to have eaten just a handful of cold gravel for breakfast, walked 50 km to school, and cleaned the road each day&#8230; and kids these days, they just don&#8217;t understand&#8230;</p>
<p>But apart from the fact that kids in the western world today are very lazy and can&#8217;t stomach running 15km to school each day and therefore lack the toughness to match Kenyans in marathons there is a real issue here.</p>
<p style="padding-left: 30px;">The bottom line is that embedded programmers aren&#8217;t going to stop using C anytime soon. There are several reasons for this. First, C compilers are available for the vast majority of 8-, 16-, and 32-bit CPUs. Second, C offers just the right mix of low-level and high-level language features for programming at the processor and driver level. Until the use of C starts to turn down in future such surveys, C programming skills will remain important.</p>
<p>The issue is that universities are moving up in the efficiency scale of languages, teaching students good things rather than hard things. Not all universities do (and I am trying my best to lobby for keeping assembly language and device driver programming in the core computer science curriculum whenever I can), but it is clear that the market for &#8220;general IT stuff&#8221; is so much bigger that it will attract more students to &#8220;easy&#8221; languages like Ruby and VisualBasic.</p>
<p>So we need to move both embedded programming and virtual platform technology much more in this direction to maintain  a steady influx of smart people into the field. High-level synthesis of hardware and virtual platform models from a VisualBasic form? Sounds like a stretch&#8230;</p>
<p>We also need to jump into the education system and create the courses and motivate professors to teach lower-level languages. Not all are that familiar with actual practices in industry, unfortunately.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/885/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Checkpointing in SystemC @ FDL</title>
		<link>http://jakob.engbloms.se/archives/880?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/880#comments</comments>
		<pubDate>Sat, 08 Aug 2009 19:48:26 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[ESL]]></category>
		<category><![CDATA[appearances]]></category>
		<category><![CDATA[articles]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Checkpointing]]></category>
		<category><![CDATA[FDL]]></category>
		<category><![CDATA[GreenSocs]]></category>
		<category><![CDATA[Marius Monton]]></category>
		<category><![CDATA[Mark Burton]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[SystemC]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=880</guid>
		<description><![CDATA[Along with Marius Monton and Mark Burton of GreenSocs, I will be presenting a paper on checkpointing and SystemC at the FDL, Forum on Specification and Design Languages, in late September 2009. The paper will explain how we did Simics-style checkpointing in SystemC, using the GreenSocs GreenConfig mechanisms to obtain an approximation for the Simics [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-881" style="margin: 5px;" title="fdllogosmall" src="http://jakob.engbloms.se/wp-content/uploads/2009/08/fdllogosmall.jpg" alt="fdllogosmall" width="80" height="79" />Along with Marius Monton and Mark Burton of <a href="http://www.greensocs.com">GreenSocs</a>, I will be presenting a paper on <a href="http://jakob.engbloms.se/archives/714">checkpointing </a>and <a href="http://www.systemc.org">SystemC </a>at the FDL, <a href="http://www.ecsi-association.org/ecsi/fdl/fdl09/mainpage.asp?fn=advance">Forum on Specification and Design Languages</a>, in late September 2009.</p>
<p>The paper will explain how we did <a href="http://www.virtutech.com/whitepapers/simics_checkpointing.html">Simics-style checkpointing </a>in SystemC, using the GreenSocs GreenConfig mechanisms to obtain an approximation for the Simics attribute system.</p>
<p><span id="more-880"></span>It is an approach that does not have the limitations of the &#8220;save the entire simulation process&#8221; method employed by Cadence (and I think also CoWare) in their <a href="http://jakob.engbloms.se/archives/817">SystemC checkpointing solution</a>. It does require you to mark all relevant state in your models, but the benefit from doing so is that regardless of how you change the code of a model, you can still use the same old checkpoints. It is also portable across hosts. We did have to do some patching to the OSCI SystemC kernel to draw out and reset all relevant state from the kernel. The OSCI kernel does not provide sufficient interfaces to checkpoint its state in its vanilla form.</p>
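<p>The principle of marking state explicitly can be illustrated with a small Python sketch. This is not the GreenConfig or Simics API. The class <code>TimerModel</code>, the <code>CHECKPOINT_ATTRIBUTES</code> tuple and the two helper functions are hypothetical names, and JSON is just a convenient stand-in for a portable checkpoint format.</p>

```python
import json


class TimerModel:
    # The model author lists the state that defines the model's behavior.
    CHECKPOINT_ATTRIBUTES = ("counter", "reload_value", "enabled")

    def __init__(self):
        self.counter = 0
        self.reload_value = 100
        self.enabled = False
        self._scratch = []  # internal helper state, deliberately not checkpointed

    def tick(self):
        if not self.enabled:
            return
        if self.counter == 0:
            self.counter = self.reload_value
        else:
            self.counter -= 1


def save_checkpoint(models):
    """Serialize only the declared attributes of each named model."""
    state = {
        name: {attr: getattr(model, attr) for attr in model.CHECKPOINT_ATTRIBUTES}
        for name, model in models.items()
    }
    return json.dumps(state, sort_keys=True)


def restore_checkpoint(text, models):
    """Write saved attribute values into freshly constructed models."""
    for name, attrs in json.loads(text).items():
        for attr, value in attrs.items():
            setattr(models[name], attr, value)


timer = TimerModel()
timer.enabled = True
for _ in range(3):
    timer.tick()

checkpoint = save_checkpoint({"timer0": timer})

# Later, possibly on another host or with reworked model code:
restored = TimerModel()
restore_checkpoint(checkpoint, {"timer0": restored})

timer.tick()
restored.tick()
```

<p>Because the checkpoint contains named values rather than a memory image of the process, it stays usable when the implementation of <code>tick</code> changes, as long as the declared attributes keep their meaning.</p>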
<p>The conference takes place on September 22 to 24, in Sophia Antipolis in France. Now all I have to do is figure out how to get there in the most convenient way. I expect this to be as much fun as the other EDA conferences I have been to recently (I seem to only go to such events nowadays, nothing left on the old embedded circuit for me it seems).</p>
<p>By the way, the FDL logo is really pretty. I think all long-running events should spend the time to create a recognizable logo. My old real-time conferences used to just have plain text and the <a href="http://www.ieee.org">IEEE </a>and <a href="http://www.acm.org">ACM </a>logos.</p>
<p><img class="aligncenter size-full wp-image-882" title="fdl_logo_new" src="http://jakob.engbloms.se/wp-content/uploads/2009/08/fdl_logo_new.jpg" alt="fdl_logo_new" width="435" height="159" /></p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/880/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Downloadable Book about Embedded Multicore</title>
		<link>http://jakob.engbloms.se/archives/877?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/877#comments</comments>
		<pubDate>Sat, 08 Aug 2009 19:27:08 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[books]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[embedded systeme]]></category>
		<category><![CDATA[multicore computer architecture]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[freescale]]></category>
		<category><![CDATA[John Logan]]></category>
		<category><![CDATA[Jonas Svennebring]]></category>
		<category><![CDATA[Patrik Strömblad]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=877</guid>
		<description><![CDATA[Freescale has now released the collected, updated, and restyled book version of the article series on embedded multicore that I wrote last year together with Patrik Strömblad of Enea, and Jonas Svennebring, and John Logan of Freescale. The book covers the basics of multicore software and hardware, as well as operating systems issues and virtual [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.freescale.com"><img class="alignleft size-full wp-image-878" style="margin-left: 5px; margin-right: 5px;" title="freescale-logo-icon" src="http://jakob.engbloms.se/wp-content/uploads/2009/08/freescale-logo-icon.png" alt="freescale-logo-icon" width="80" height="80" /></a>Freescale has now released the collected, updated, and restyled <a href="http://www.freescale.com/files/32bit/doc/ref_manual/EMBMCRM.pdf">book version </a>of the article series on embedded multicore that I <a href="http://jakob.engbloms.se/archives/423">wrote last year </a>together with Patrik Strömblad of <a href="http://www.enea.com">Enea</a>, and Jonas Svennebring, and John Logan of <a href="http://www.freescale.com">Freescale</a>. The book covers the basics of multicore software and hardware, as well as operating systems issues and virtual platforms. Obviously, the virtual platform part was my contribution.</p>
<p><span id="more-877"></span></p>
<p>It is one of the more comprehensive introductions to how to think about and use multicore architectures in the high-end embedded space. It is free to download and print, but if you want a printed copy, one can be ordered at a price of (I am told) 15 USD (did not try it myself).</p>
<p>The PDF is at <a href="http://www.freescale.com/files/32bit/doc/ref_manual/EMBMCRM.pdf">http://www.freescale.com/files/32bit/doc/ref_manual/EMBMCRM.pdf </a>.</p>
<p>It will also be linked from the &#8220;Documentation&#8221; section for most Freescale multicore chips&#8217; information pages.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/877/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The TLM DAC</title>
		<link>http://jakob.engbloms.se/archives/865?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/865#comments</comments>
		<pubDate>Thu, 30 Jul 2009 22:47:23 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[DAC]]></category>
		<category><![CDATA[GreenSocs]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[SystemC]]></category>
		<category><![CDATA[tlm]]></category>
		<category><![CDATA[TLM-2.0]]></category>
		<category><![CDATA[Virtutech]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=865</guid>
		<description><![CDATA[The past few days here at DAC, a big theme has been transaction level modeling (TLM). TLM is often considered to be SystemC TLM-2.0. Most of the statements from the EDA companies are to the effect that SystemC TLM-2.0 solves the problem of combining models from different sources. Scratching the surface of this happy picture, [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-824" style="margin: 5px;" title="46daclogo" src="http://jakob.engbloms.se/wp-content/uploads/2009/07/46daclogo.gif" alt="46daclogo" width="81" height="73" />The past few days here at <a href="http://www.dac.com/46th/index.aspx">DAC</a>, a big theme has been transaction level modeling (TLM).</p>
<p>TLM is often considered to be <a href="http://www.systemc.org/apps/group_public/workgroup.php?wg_abbrev=tlmwg">SystemC TLM-2.0</a>. Most of the statements from the EDA companies are to the effect that SystemC TLM-2.0 solves the problem of combining models from different sources. Scratching the surface of this happy picture, it is clear that it is not that simple&#8230;</p>
<p><span id="more-865"></span>The issue is that even if all agree on using the TLM-2.0 standard and its default standard generic memory-mapped bus protocol and payload for the memory-map part of their device models, there are other interfaces which are not standard at this point in time.</p>
<p>For example, there is no standard way to model interrupts between devices. So any time you have interrupts in a system (which tends to be always), you need to write custom wrappers between modules to convert between different ways of modeling interrupts. Even worse, the usual way to do it is to use SystemC signals, which are definitely not TLM abstractions. They take a detour through the SystemC kernel, which is quite costly.</p>
<p>The defining property (from a simulation execution perspective) of TLM is that your simulation modules talk directly to each other through <em>direct function calls</em>, rather than passing over whatever simulation kernel you happen to be using. Essentially, TLM tends to convert simulators into being much more like &#8220;regular programs&#8221;, with fewer references to the simulation kernel and its event and time handling. In my world, unless you are doing direct function calls, you are not doing TLM.</p>
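<p>To make the contrast concrete, here is a minimal sketch in plain Python (not SystemC). Every name in it (<code>Kernel</code>, <code>InterruptController</code>, <code>SignalDevice</code>, <code>TlmDevice</code>) is made up for illustration and does not correspond to any real SystemC, TLM-2.0, or Simics API. The first device raises its interrupt by posting an event to a toy kernel, while the second simply calls the interrupt controller directly.</p>

```python
from collections import deque


class Kernel:
    """Toy event-queue kernel (hypothetical stand-in for a simulation kernel)."""

    def __init__(self):
        self.queue = deque()
        self.dispatched = 0

    def post(self, callback, *args):
        # Signal style: the write is queued and delivered later by the kernel.
        self.queue.append((callback, args))

    def run(self):
        # Deliver every queued event in order, counting the dispatches.
        while self.queue:
            callback, args = self.queue.popleft()
            self.dispatched += 1
            callback(*args)


class InterruptController:
    """Records which interrupt lines have been raised."""

    def __init__(self):
        self.pending = set()

    def raise_irq(self, line):
        self.pending.add(line)


class SignalDevice:
    """Raises its interrupt through the kernel, like a signal write."""

    def __init__(self, kernel, controller, line):
        self.kernel = kernel
        self.controller = controller
        self.line = line

    def fire(self):
        self.kernel.post(self.controller.raise_irq, self.line)


class TlmDevice:
    """Raises its interrupt with a direct function call, TLM style."""

    def __init__(self, controller, line):
        self.controller = controller
        self.line = line

    def fire(self):
        self.controller.raise_irq(self.line)


kernel = Kernel()
pic = InterruptController()

SignalDevice(kernel, pic, 3).fire()
pending_before_run = set(pic.pending)  # snapshot: nothing delivered yet
kernel.run()                           # the kernel delivers the queued write

TlmDevice(pic, 5).fire()               # delivered immediately, no kernel involved
```

<p>The signal-style interrupt only shows up at the controller once the kernel has worked through its queue, while the direct call is visible as soon as <code>fire()</code> returns. A real simulation kernel does far more work per event than this toy one.</p>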
<p>Note that this state of things in the SystemC world is likely to change for the better over time. <a href="http://www.greensocs.com">GreenSocs</a> announced at DAC that they are working with <a href="http://www.virtutech.com">Virtutech</a> and an unnamed other partner to create a set of TLM interfaces for other interconnects, such as signals (interrupts under another name), serial, and Ethernet.</p>
<p>But apart from all the technicalities of SystemC TLM-2.0 and how it works, the big question is just what to use TLM for, and how. Here, everyone seems to try to turn TLM into their own use cases. The most obvious application is doing fast virtual platforms, but you also have TLM use as the basis for hardware synthesis, validation, golden reference models, architectural exploration, and pretty much all other EDA design tasks.</p>
<p>Even so, the most important message for me is that the EDA industry is actually starting to get interested in TLM. It is no longer a quaint odd thing done by some peripheral start-up companies, but rather a mainstream technology that everyone has to pay attention to.</p>
<p>Finally, I want to point out that TLM is not just SystemC. TLM is a general idea that has been in <a href="http://jakob.engbloms.se/archives/130">active use since the late 1960s</a>. It is the obvious way to model a computer, if all you are concerned about is how it looks to the software. Another current example is the <a href="http://www.virtutech.com/whitepapers/simics-tlm.html">Simics style of TLM</a> (<a href="http://www.virtutech.com/whitepapers/modeling.html">and here</a>), which is similar to but different in details from the SystemC implementation.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/865/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Driving an Old Canon Scanner using a VM</title>
		<link>http://jakob.engbloms.se/archives/842?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/842#comments</comments>
		<pubDate>Wed, 15 Jul 2009 18:43:50 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[desktop software]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[Canon]]></category>
		<category><![CDATA[LIDE30]]></category>
		<category><![CDATA[scanner]]></category>
		<category><![CDATA[USB]]></category>
		<category><![CDATA[virtualization]]></category>
		<category><![CDATA[Vista]]></category>
		<category><![CDATA[VMWare]]></category>
		<category><![CDATA[Windows]]></category>
		<category><![CDATA[XP]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=842</guid>
		<description><![CDATA[I have an old Canon LIDE 30 scanner that I purchased sometime late in 2003. At that time, it was connected to a PC running Windows XP, and drivers worked just fine. However, after I got my new computer in early 2009, with Vista 64, there are no more drivers available. There is a funny [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-843" style="margin-left: 5px; margin-right: 5px;" title="lide30" src="http://jakob.engbloms.se/wp-content/uploads/2009/07/lide30.gif" alt="lide30" width="100" height="67" />I have an old <a href="http://www.canon-europe.com/For_Home/Product_Finder/Scanners/Flatbed/LIDE30/index.asp">Canon LIDE 30 </a>scanner that I purchased sometime late in 2003. At that time, it was connected to a PC running Windows XP, and drivers worked just fine. However, after I got my new computer in early 2009, with Vista 64, there are no more drivers available. There is a funny way around this though, using a virtual machine.</p>
<p><span id="more-842"></span>What I ended up doing to keep using my scanner (whose hardware is still very much intact and solid) is fairly obvious: I installed my old Windows XP license on a VMWare virtual machine (I had the good luck to have a full license with physical media), and then installed the Canon LIDE30 driver on that virtualized XP.</p>
<p>VMWare Player is sufficient to let me attach the physical scanner to the virtual machine&#8217;s USB interface, and drive it without the host Vista 64 machine being any the wiser. To get the scanned pictures out, I have to resort to drag-and-drop, as I have failed to get shared folders to work with Player for some unknown reason.</p>
<p>The end result can be pretty complex&#8230; To send some emails from my work computer including scans with this scanner, I had to:</p>
<ul>
<li>Scan on the virtual XP machine</li>
<li>Drag-and-drop to the Pictures folder on my Vista 64 machine</li>
<li>Use file-sharing in Windows to move to my work laptop</li>
<li>Attach in Outlook</li>
</ul>
<p>Workable. It is also a pretty good demo of the power afforded by modern consumer operating systems. Imagine trying to do that in 1995&#8230; would not have been quite as fun.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/842/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>DAC 2009 Panel and Paper</title>
		<link>http://jakob.engbloms.se/archives/823?&amp;owa_from=feed&amp;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/823#comments</comments>
		<pubDate>Wed, 01 Jul 2009 12:38:58 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[appearances]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Cadence]]></category>
		<category><![CDATA[DAC]]></category>
		<category><![CDATA[hardware-software interface]]></category>
		<category><![CDATA[Jason Andrews]]></category>
		<category><![CDATA[Ross Dickson]]></category>
		<category><![CDATA[Wild West panel]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=823</guid>
		<description><![CDATA[The 46th Design Automation Conference (DAC) is coming up in San Francisco in the US, in the last week of July. For me, this will be the first time I ever go to DAC. I have been to a couple of Design, Automation and Test in Europe (DATE) conferences before, but DAC is supposedly even bigger as an [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-824" style="margin: 5px;" title="46daclogo" src="http://jakob.engbloms.se/wp-content/uploads/2009/07/46daclogo.gif" alt="46daclogo" width="81" height="73" />The <a href="http://www.dac.com/46th/index.aspx">46th Design Automation Conference (DAC)</a> is coming up in San Francisco in the US, in the last week of July. For me, this will be the first time I ever go to DAC. I have been to a couple of <a href="http://www.date-conference.com/">Design, Automation and Test in Europe (DATE)</a> conferences before, but DAC is supposedly even bigger as an event for the EDA and related communities. I have the honor of being on a panel this year, as well as co-authoring a paper on software validation.</p>
<p><span id="more-823"></span>The panel is called &#8220;<a href="http://www.dac.com/events/eventdetails.aspx?id=95-49">The Wild West: Conquest of Complex Hardware-Dependent Software Design</a>&#8221;, and takes place on Thursday, July 30, at 16.30, in room 131. We will be discussing hardware/software integration, multicore software, and other topics that I like. We will have a good mix of tool providers and tool users.</p>
<p>The paper is called &#8220;Design Flow for Embedded System Device Driver Development and Verification&#8221;, and is co-authored by me, Jason Andrews of Cadence, and my colleague Ross Dickson. It is presented in the user track session called &#8220;<span id="ctl00_Center_Content_Placeholder__lblEventTitle" class="sestitle"><a href="http://www.dac.com/events/eventdetails.aspx?id=95-3-U">Verification: A Front-End Perspective</a>&#8221;, on Tuesday, at 16.30. It deals with how you can use directed random testing to verify software drivers for custom hardware, using a virtual platform.<br />
</span></p>
<p>I will be at DAC all week; it sounds like a great, fun event!</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/823/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
