<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Observations from Uppsala &#187; multicore debug</title>
	<atom:link href="http://jakob.engbloms.se/archives/category/parallel-computing/multicore-debug/feed" rel="self" type="application/rss+xml" />
	<link>http://jakob.engbloms.se</link>
	<description>Computer Technology: Simulation, Virtualization, Virtual Platforms, Embedded, Multicore and Multiprocessing (by Jakob Engblom)</description>
	<lastBuildDate>Sun, 29 Jan 2012 19:45:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<image>
    <title>Observations from Uppsala</title>
    <url>http://jakob.engbloms.se/favicon.png</url>
    <link>http://jakob.engbloms.se</link>
    <width>32</width>
    <height>32</height>
    <description>Observations from Uppsala - http://jakob.engbloms.se</description>
    </image>		<item>
		<title>Photoshop Scalability and &#8220;-10% overhead&#8221;</title>
		<link>http://jakob.engbloms.se/archives/1311?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1311#comments</comments>
		<pubDate>Mon, 01 Nov 2010 11:45:51 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Cary Millsap]]></category>
		<category><![CDATA[Clem Cole]]></category>
		<category><![CDATA[Communications of the ACM]]></category>
		<category><![CDATA[GPGPU]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[performance optimization]]></category>
		<category><![CDATA[Photoshop]]></category>
		<category><![CDATA[Russell Williams]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1311</guid>
		<description><![CDATA[I just finished reading the October 2010 issue of Communications of the ACM. It contained some very good articles on performance and parallel computing. In particular, I found the ACM Case Study on the parallelism of Photoshop a fascinating read. There was also the second part of Cary Millsap&#8217;s articles about &#8220;Thinking Clearly about Performance&#8221;. [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/10/cacm-10-20101.jpg"><img class="alignleft size-full wp-image-1313" style="margin: 10px 5px;" title="cacm 10 2010" src="http://jakob.engbloms.se/wp-content/uploads/2010/10/cacm-10-20101.jpg" alt="" width="62" height="80" /></a>I just finished reading the <a href="http://cacm.acm.org/magazines/2010/10">October 2010 </a>issue of <a href="http://cacm.acm.org/">Communications of the ACM</a>. It contained some very good articles on performance and parallel computing. In particular, I found the ACM Case Study on the parallelism of Photoshop a fascinating read. There was also the second part of Cary Millsap&#8217;s articles about &#8220;Thinking Clearly about Performance&#8221;.</p>
<p><span id="more-1311"></span>Cary&#8217;s articles deal mostly with database tuning in the Oracle ecosystem, but most of his observations apply to any kind of programming with a performance requirement. It is worth a read. It was good to see him dissect performance, including obvious &#8211; but not really obvious &#8211; concepts like the difference in usefulness between average and worst-case response times from a user perspective.  In essence, you need to watch the spread of response times, and try to keep the worst times from getting too bad, rather than just look at an average that might conceal extremes that frustrate users.</p>
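<p>To make the point about averages concrete, here is a minimal sketch of my own, not taken from the article. The two latency lists are invented numbers, and <code>summarize</code> is a hypothetical helper. Both workloads have the same mean, but only one of them has a worst case that would frustrate a user.</p>

```python
import statistics


def summarize(latencies_ms):
    """Return the mean, the 95th percentile, and the maximum of response times."""
    ordered = sorted(latencies_ms)
    # Nearest-rank percentile, computed with integer arithmetic only.
    rank = max(1, (95 * len(ordered) + 99) // 100)
    return {
        "mean": statistics.mean(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }


# Invented sample data: twenty requests each, identical averages.
steady = [100] * 20
spiky = [50] * 19 + [1050]

print(summarize(steady))
print(summarize(spiky))
```

<p>Note that with only twenty samples even the 95th percentile misses the single slow request, so the maximum (or a higher percentile over more samples) is what exposes the outlier.</p>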
<p>Cary also made the comment noted in the title of this post. In his opinion, the performance instrumentation built into Oracle has an overhead of -10% &#8211; or even -20% or -30%, since it enables optimizations that would otherwise have been impossible to do. This is something worth noting in general &#8211; overhead that looks bad when considered as a local cost might be a net benefit in the grand scale of things, by enabling measurements and insight that let a program run much faster.</p>
<p>The ACM case study on Photoshop can be found online as a <a href="http://queue.acm.org/detail.cfm?id=1858330">resource at the ACM Queue</a>, with what seems to be mostly the same content. It was written by Clem Cole, at Intel, who interviews Russell Williams of the Photoshop team. It is very instructive to see how the Photoshop team has built an application that works well with 2 to 4 and maybe 8 cores, but that really needs to reconsider parts of its architecture to scale beyond 8.</p>
<p>Clem from Intel pushes Russell by bringing up various examples of next-generation architectures, in particular the fact that clusters-on-a-chip and NUMA memories look inevitable. The Photoshop people seem to take a wait-and-see approach to this: they first want to see some architecture have real traction in the market before they commit and rearchitect their software to make use of it.</p>
<p>The problems of debugging parallel software are also brought up. There used to be a simple bug in the asynchronous I/O system in Photoshop that took ten years to uncover!  Essentially, the programmers had not considered atomicity properly in the presence of multiple threads. With that kind of example, it is not surprising that the Photoshop programmers are very careful when planning and performing parallelizations.</p>
<p>The target domain of Photoshop is to some extent naturally parallel, but not as much as I would have thought. Since a user might operate on any part of an image, large or small, and maybe start and then abort an operation, it is not just a matter of splitting an image evenly across threads or cores. There is a significant amount of variation in just how parallel things can be in Photoshop.</p>
<p>Photoshop has had an easy-to-use parallelization system in place since around 1994, which lets programmers write simple serial computational kernels that are automatically applied to parts of an image in parallel. The Photoshop program itself takes care of the synchronization between kernels, so the kernels can be simple and robust, without any parallel code inside. This is a <a href="http://jakob.engbloms.se/archives/209 ">pattern that has been seen before</a>, and one that makes a lot of sense &#8211; if it can be applied successfully. Apparently, it is not necessarily the easiest thing to scale beyond four cores.</p>
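<p>The general shape of that pattern can be sketched as follows. This is my own illustration and not Photoshop code: <code>invert_kernel</code>, <code>split_rows</code>, and <code>apply_parallel</code> are hypothetical names, the image is a plain list of rows, and the tiling into horizontal strips is an assumption. The kernel is purely serial, and all threading lives in the framework function.</p>

```python
from concurrent.futures import ThreadPoolExecutor


def invert_kernel(tile):
    """Serial kernel: invert 8-bit pixel values in one tile. No threading inside."""
    return [[255 - px for px in row] for row in tile]


def split_rows(image, tile_height):
    """Cut the image into horizontal strips of at most tile_height rows."""
    return [image[i:i + tile_height] for i in range(0, len(image), tile_height)]


def apply_parallel(kernel, image, tile_height=2, workers=4):
    """Framework side: hand tiles to a worker pool and reassemble them in order."""
    tiles = split_rows(image, tile_height)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(kernel, tiles))  # map() preserves tile order
    return [row for tile in results for row in tile]


image = [[0, 64, 128, 255] for _ in range(5)]
inverted = apply_parallel(invert_kernel, image)
print(inverted[0])
```

<p>In CPython, threads running pure Python code do not execute in parallel, so this shows the structure of the pattern rather than a speedup. It also hints at why scaling is hard: every tile streams through the same memory system no matter how many workers are used.</p>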
<p>The main performance limitation for Photoshop remains memory bandwidth, rather than raw compute performance. This also limits the need to aggressively scale to higher levels of parallelism: as long as multiple threads do not give more bandwidth, it has proven hard to use more than two or three threads on any multicore processor, as that is sufficient to saturate the memory system. Apparently, this is different on the Nehalem (Core i7/i5/i3) generation of Intel multicore processors, where each core has a dedicated non-stealable slice of the memory bandwidth.</p>
<p>For the near future, it seems that the big step for Photoshop is going the route of using GPUs for acceleration, rather than 10+ core main processors.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1311/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>S4D 2010</title>
		<link>http://jakob.engbloms.se/archives/1251?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1251#comments</comments>
		<pubDate>Wed, 15 Sep 2010 08:02:42 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[appearances]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[ARM]]></category>
		<category><![CDATA[Debug]]></category>
		<category><![CDATA[ESCUG]]></category>
		<category><![CDATA[FDL]]></category>
		<category><![CDATA[Infineon]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[John Aynsley]]></category>
		<category><![CDATA[Pat Brouillette]]></category>
		<category><![CDATA[S4D]]></category>
		<category><![CDATA[Simon Davidmann]]></category>
		<category><![CDATA[Southampton]]></category>
		<category><![CDATA[ST]]></category>
		<category><![CDATA[SystemC]]></category>
		<category><![CDATA[Thorsten Grötker]]></category>
		<category><![CDATA[TrustZone]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1251</guid>
		<description><![CDATA[Looks like S4D (and the co-located FDL) is becoming my most regular conference. S4D is a very interactive event. With some 20 to 30 people in the room, many of them also presenting papers at the conference, it turns into a workshop at its best. There was plenty of discussion going on during sessions and [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/09/S4D1.jpg"><img class="alignleft size-full wp-image-941" title="S4D" src="http://jakob.engbloms.se/wp-content/uploads/2009/09/S4D1.jpg" alt="" width="143" height="62" /></a>Looks like S4D (and the co-located FDL) is becoming my most regular conference. S4D is a very interactive event. With some 20 to 30 people in the room, many of them also presenting papers at the conference, it turns into a workshop at its best. There was plenty of discussion going on during sessions and the breaks, and I think we all got new insights and ideas.</p>
<p><span id="more-1251"></span></p>
<h2><a href="../wp-content/uploads/2010/09/P1140077.jpg"><img class="aligncenter size-full wp-image-1276" title="P1140077" src="../wp-content/uploads/2010/09/P1140077.jpg" alt="" width="400" height="258" /></a></h2>
<h2>S4D Talks, Themes, and Topics</h2>
<p>More is available in &#8220;<a href="http://jakob.engbloms.se/archives/1280">S4D part 2</a>&#8221;.</p>
<h3>Tracing and Instrumentation</h3>
<p>The papers presented covered a wide variety of topics from a variety of angles. Still, everybody felt that two topics kept coming back in various forms in a majority of the papers and discussions: <em>tracing</em> and <em>instrumentation</em>.</p>
<p>Code instrumentation is not a dirty word anymore. The traditional judgment that inserting probes into your software is plain bad does not apply anymore, at least not in the minds of the people at S4D. Instrumentation was applied to drivers, OS kernels, and regular user-level software. I think the key insight is that there is clear value in having the developers that write a piece of software also mark points of interest in the code. When analyzing a trace of an execution, that means that the information in the trace becomes meaningful to the software developers, as it is on the right level of abstraction. Instrumentation naturally produces traces, which can be fed out using  shared memory, networks, special-purpose hardware, and more.</p>
<p>One of the instrumentation trace solutions presented (the SVEN system from Intel Digital Home, presented by Pat Brouillette) actually leaves the instrumentation in place in the shipping customer systems. In this way, you cannot really claim that instrumentation is intrusive &#8211; it is just part of the software, always. Customers can even activate the tracing in deployed systems, and ship the traces back to the developers for analysis of bugs found in the field. It is another approach to <a href="http://jakob.engbloms.se/archives/1231">record and replay</a> that touches on my paper on transporting bugs with checkpoints.</p>
<p>The increased interest in instrumentation probably has something to do with the nature of the systems that are being addressed. For systems using shared memory multicore hardware and general-purpose operating systems, the cost of instrumentation is easier to take than for very small constrained embedded systems. Essentially, as systems get more complex, instrumentation becomes more tractable.</p>
<p>Having instrumentation interact with hardware trace and debug functions is a neat way to build a system that is more powerful than a hardware or software system would be on its own. Especially for software stacks involving hypervisors and multiple complex operating systems, that is likely necessary.</p>
<p>Once we have a trace, just <a href="../archives/942">like last year</a>, we need tools for analyzing the tons of data you get from tracing a modern system. ST talked about a tracing system that generated hundreds of gigabytes of data.</p>
<p>One trace aspect that kept coming up was the need for <em>time stamps </em>on trace data. To reconcile multiple traces and understand how different concurrent units talk to each other, a global time stamping mechanism is crucial. There seems to be work on hardware to support this.</p>
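<p>As a small illustration of why a shared time base matters, here is a sketch of my own, not something presented at S4D. The event tuples and the <code>merge_traces</code> helper are hypothetical. It assumes each unit produces a trace already sorted by a global timestamp, in which case reconciling the traces is a simple ordered merge. Without a common clock, the interleaving below could not be reconstructed reliably.</p>

```python
import heapq


def merge_traces(*traces):
    """Merge per-unit traces, each sorted by timestamp, into one global timeline."""
    return list(heapq.merge(*traces, key=lambda event: event[0]))


# Invented events: (global timestamp, source unit, message)
core0 = [(10, "core0", "send msg A"), (40, "core0", "got reply")]
core1 = [(25, "core1", "recv msg A"), (35, "core1", "send reply")]

timeline = merge_traces(core0, core1)
for stamp, unit, message in timeline:
    print(stamp, unit, message)
```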
<h3>Security, Secrecy, and Debug</h3>
<p>I moderated a panel on hardware support for debug, and posed the question on how to balance security and the need to debug. This generated a number of interesting answers from the panel and the audience.</p>
<p>The conflict between debuggability and secrecy is there. Even from the same customer you first get &#8220;you have to make the internal state of the controller inaccessible and hidden to avoid customers modifying their engines&#8221;&#8230; and then when a problem appears in the field, they ask for a way to analyze and trace that very same system. Hard to support both requirements in a reasonable way.</p>
<p>A sophisticated solution to debug security from companies like ARM, Infineon, and ST is debug that can be enabled using key exchange. The chips are built with a &#8220;locked door&#8221; in place, but the keys to the door are kept well-guarded. In this way the same chip can be used in development and in the field.</p>
<p>To support debug of systems involving secure modes like ARM TrustZone, ARM has defined several levels of access in their CoreSight hardware modules. This makes it possible for a debugger to be restricted to just debugging user-level code, just OS and user-level code, or all of the software stack. To me, this sounds like it could allow mobile phone manufacturers to &#8220;securely&#8221; let their application developers use hardware-based debug, without compromising operating systems or secure boot modes.</p>
<p>The classic technique of using fuses to turn off functions is also relevant, at least for systems with moderate levels of security. This can certainly be overcome using special tools to peel off the top of chips and reconnect the fuses, but the panel seemed to think that that level of attack was in general not worth protecting against. However, the audience pointed out that  this was actually being done to automotive engine controllers and there are people making a good living from such antics.</p>
<h3>ESCUG Meeting</h3>
<p>The ESCUG meeting was a mix of fairly slick commercial presentations from OVP/Imperas chief Simon Davidmann and SystemC guru John Aynsley, and research presentations of varying quality.</p>
<p>One thing that struck me was that the academics spent significant time in all presentations on how their approaches were compatible with the existing SystemC structure, where they host their open-source efforts, etc. I guess that is good in that they show a certain concern for reality &#8211; but it is also a bit sad that they did not get time to actually talk that much about the core ideas they were bringing forward. I am personally much more interested in new ideas than infrastructure and project management. It does not bode well for European research if this is what people are forced to produce, in lieu of real innovation.</p>
<h3>Thorsten Grötker&#8217;s Keynote</h3>
<p>On Wednesday morning, Thorsten from Synopsys did a look back over the history of SystemC, free from product pitching. He only mentioned Synopsys in his introduction, where the high-level message was that the embedded software is really the key problem for industry today. I cannot disagree with that.</p>
<p>During the SystemC parts of his talk he did say a few things that I did not quite agree with&#8230; in particular that TLM was unknown prior to 1999. It was not called that, but it certainly existed in the field of full-system simulation. The main problem is that Thorsten only sees the EDA history of modeling, not the computer architecture and software-driven work that did simulations as far back as 1950 (the famous Gill paper), and fast simulation since at <a href="http://jakob.engbloms.se/archives/130">least 1967</a>.</p>
<p>He also claims that with SystemC you have a single language for both detailed and TLM models. That is true&#8230; but you still need multiple models, one at each level of abstraction. So yes, one language, multiple models. However, that gluability really comes with a performance and complexity cost. It makes it too easy to slip into bad modeling even in TLM.</p>
<p>An interesting theme that Thorsten picked up from John&#8217;s talk at ESCUG is the use of SystemC to model software and RTOS, using the upcoming process control extensions. If you stretch that into the area of software synthesis, it means that SystemC is going to collide with the field of model-driven software development. Will you use SystemC, coming from the hardware world, or UML/MATLAB/Domain-specific languages coming from the software world? Thorsten makes the interesting point that in order to integrate with that world, SystemC will require some concepts from that world (like pins and clocks enable interaction with RTL). I am not sure that is necessarily true; I think you can just as well create point adaptors to the same effect.</p>
<h2>Getting to Southampton</h2>
<p>The <a href="http://www.soton.ac.uk/">University of Southampton </a>hosted the event, and it took place in the university lecture halls. That means that we got free, very fast WiFi (unlike any commercial conference venue I have ever seen). The university campus was full of services (unlike the desolate place that last year&#8217;s FDL/S4D chose). Housing in the <a href="http://www.soton.ac.uk/accommodation/halls/gleneyre/index.html">Glen Eyre residential halls </a>was a bit spartan but functional. Felt like being back in my days as a student living in student housing.</p>
<p>The instructions from the conference about how to get there were a bit confusing and incomplete. In practice, it is very easy to get to Southampton from both Gatwick (direct train) and Heathrow (NationalExpress bus 203). At Heathrow, I had a bit of luck with the bus to Southampton. The instructions from the NationalExpress website had me believe that I had to get from Terminal 5, where we landed, to the central bus station and then catch the bus at 15.00. As we landed 40 minutes late (14.40), this looked very hopeless&#8230; until I found the NationalExpress counter in the arrivals hall at Terminal 5 and they told me the bus would leave at 15.30. Nice, no stress. The bus to Southampton even had free WiFi on board!</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/09/P1140062.jpg"></a><a href="http://jakob.engbloms.se/wp-content/uploads/2010/09/P1140062-1.jpg"><img class="aligncenter size-full wp-image-1275" title="P1140062-1" src="http://jakob.engbloms.se/wp-content/uploads/2010/09/P1140062-1.jpg" alt="" width="400" height="246" /></a></p>
<p>Once in Southampton, you then had to take the bus U1A out to the university campus, and finding a bus stop for that was the most difficult part of the journey, actually. Some of the buses from Heathrow stop at Southampton university.</p>
<p>See also &#8220;<a href="http://jakob.engbloms.se/archives/1280">S4D Part 2</a>&#8221; for a few more tidbits from S4D.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1251/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Wind River Blog: True Concurrency is Different</title>
		<link>http://jakob.engbloms.se/archives/1151?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1151#comments</comments>
		<pubDate>Fri, 18 Jun 2010 20:24:04 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[multicore computer architecture]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[Wind River Blog]]></category>
		<category><![CDATA[Simics]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1151</guid>
		<description><![CDATA[I have another blog up at Wind River. This one is about multicore bugs that cannot happen on multithreaded systems, and is called True Concurrency is Truly Different (Again). It bounces from a recent interesting Windows security flaw into how Simics works with multicore systems.]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png"><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="button-quicklink-blogs" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" /></a>I have another blog up at Wind River. This one is about multicore bugs that cannot happen on multithreaded systems, and is called <a href="http://blogs.windriver.com/engblom/2010/06/true-concurrency-is-truly-different-again.html#more">True Concurrency is Truly Different (Again). </a>It bounces from a recent interesting Windows security flaw into how Simics works with multicore systems.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1151/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Wind River Blog: Simics Analyzer</title>
		<link>http://jakob.engbloms.se/archives/1137?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1137#comments</comments>
		<pubDate>Wed, 26 May 2010 19:40:42 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[embedded software]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Wind River Blog]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1137</guid>
		<description><![CDATA[I have a new blog post up at the Wind River blog network, about the new target analysis tools in Simics 4.4. It is a very fun piece of technology to play with, and you learn a lot just by poking around at existing software systems&#8230;]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png"><img class="alignleft size-full wp-image-1122" style="margin: 5px 10px;" title="button-quicklink-blogs" src="http://jakob.engbloms.se/wp-content/uploads/2010/04/button-quicklink-blogs.png" alt="" width="46" height="46" /></a>I have a <a href="http://blogs.windriver.com/engblom/2010/05/analyzed.html">new blog post </a>up at the Wind River blog network, about the new target analysis tools in Simics 4.4. It is a very fun piece of technology to play with, and you learn a lot just by poking around at existing software systems&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1137/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MCC 2009 Presentations Online</title>
		<link>http://jakob.engbloms.se/archives/1023?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1023#comments</comments>
		<pubDate>Thu, 03 Dec 2009 08:29:35 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[appearances]]></category>
		<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[Andras Vajda]]></category>
		<category><![CDATA[Domain-specific languages]]></category>
		<category><![CDATA[Ericsson]]></category>
		<category><![CDATA[heterogeneous]]></category>
		<category><![CDATA[homogeneous]]></category>
		<category><![CDATA[keynote]]></category>
		<category><![CDATA[LTE]]></category>
		<category><![CDATA[MCC]]></category>
		<category><![CDATA[UpMarc]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1023</guid>
		<description><![CDATA[The presentations from the 2009 Swedish Workshop on Multicore Computing (MCC 2009) are now online at the program page for the workshop. Let me add some comments on the workshop per se. This was the first multicore event that I have been to where we did not have a keynote speaker or technical paper from [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1016" style="margin-top: 5px; margin-bottom: 5px;" title="UPMARC_700x150" src="http://jakob.engbloms.se/wp-content/uploads/2009/11/UPMARC_700x150.gif" alt="UPMARC_700x150" width="122" height="45" />The presentations from the 2009 Swedish Workshop on Multicore Computing (MCC 2009) are now online at the <a href="http://www.it.uu.se/research/upmarc/MCC09/prog">program page for the workshop</a>. Let me add some comments on the workshop per se.</p>
<p><span id="more-1023"></span>This was the first multicore event that I have been to where we did not have a keynote speaker or technical paper from a hardware company. So there was really nothing here directly about how to build multicore chips. Rather, the workshop tended to be about how to program, use, measure performance on, verify software for, and generally work with multicore chips. From the perspective of software people, rather than hardware designers.</p>
<p>Obviously, hardware aspects enter into such talks, but it is the perspective of a user, not a designer. For example, a hardware designer could explain how an atomic compare-and-swap is optimized in a multicore device. But here, we saw measurements on the actual operation latencies observed on real machines using such operations. Quite refreshing, and closer to my personal interests.</p>
<p>The keynote by <a href="http://a-vajda.eu/blog/">Andras Vajda</a> of Ericsson was quite interesting. The slides are not online, but here are the main points that I picked up and that I might not have considered before:</p>
<ul>
<li>Software development costs can mean that the cheapest, fastest, most efficient hardware is not necessarily the most economic. Too hard to code for means the software development time and effort removes the advantage. Obvious, but worth reiterating. Software is king.</li>
<li>The workload on a cellular basestation can sometimes be highly linear and single-threaded. For example, serving a single terminal with a very high bandwidth LTE connection. And suddenly shift to a massively parallel workload as a crowd of a thousand all suddenly appear and start doing data downloads. And then go back to serial again. This means that the age-old argument that signal processing is naturally &#8220;<a href="http://www.edn.com/blog/980000298/post/50023005.html">conveniently concurrent</a>&#8221; (<a href="http://www.scdsource.com/article.php?id=87">and here</a>) is not always true. Nice point!</li>
<li>Thus, we need adaptable architectures that can trade serial and parallel performance over time, and rebalance quite quickly. In the same chip.</li>
<li>He is a firm believer that homogeneous systems will win out in the end; I still hold on to a belief in accelerators and offload engines and DSPs. This is partially because of an admitted focus on servers and services processors, and not on the baseband and signalling side. Makes sense.</li>
<li>Domain-specific languages (DSL) are the future of efficient programming. Agree.</li>
</ul>
<p>On the topic of DSLs, there was a question about the cost to support them. To me, that is a non-issue. In the organizations where I have worked, it seems that maintaining a useful DSL requires at most one engineer. Developing one takes a few good computer scientists for a fairly limited time. In any case, they tend to appear organically when good programmers <a href="http://jakob.engbloms.se/archives/747">generalize repeated tasks</a>.</p>
<p>I gave a keynote about how multicore has impacted virtual platforms (in particular, <a href="http://www.virtutech.com/products/simics">Virtutech Simics</a>) with the following main points:</p>
<ul>
<li>Multicore targets increase the performance pressure on a virtual platform, as more processors will have to be simulated.</li>
<li>Multicore hosts mean that the sequential performance of the host is going down compared to the aggregate parallel performance demands from the targets.</li>
<li>To handle large target systems, the virtual platform itself has to run multithreaded on a multicore host. Getting this in place is a major, interesting, and sometimes painful process.</li>
<li>Once you have a parallel virtual platform, multicore hosts provide a very nice boost in scalability and in the system sizes that are manageable. A single multithreaded virtual platform process is also a bit easier to manage from a user perspective.</li>
<li>All features in the virtual platform have to be multicore and multimachine-aware&#8230; meaning that they often get a bit harder to use initially, as there is no &#8220;default processor&#8221; you can fall back to for debugging setups etc. Everything has to be explicitly targeted.</li>
<li>Multicore targets have proven to be a great sales driver for virtual platforms, as debugging software on a physical multicore, multichip, multiboard system is just too painful.</li>
</ul>
<p>Overall, this was a fun event, and I am looking forward to next year at Chalmers!</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1023/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Freescale P4080, in Physical Form</title>
		<link>http://jakob.engbloms.se/archives/933?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/933#comments</comments>
		<pubDate>Thu, 17 Sep 2009 10:16:37 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[appearances]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[embedded systeme]]></category>
		<category><![CDATA[multicore computer architecture]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[DWF]]></category>
		<category><![CDATA[freescale]]></category>
		<category><![CDATA[heterogeneous]]></category>
		<category><![CDATA[homogeneous]]></category>
		<category><![CDATA[Jonas Svennebring]]></category>
		<category><![CDATA[MPC5606]]></category>
		<category><![CDATA[p4080]]></category>
		<category><![CDATA[Simics]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=933</guid>
		<description><![CDATA[Past Tuesday, I attended the Freescale Design With Freescale (DWF) one-day technology event in Kista, Stockholm. This is a small-scale version of the big Freescale Technology Forum, and featured four tracks of talks running from the morning into the afternoon. All very technical, aimed at design engineers. There were several topic areas, such as automotive, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/08/freescale-logo-icon.png"><img class="alignleft size-full wp-image-878" style="margin: 5px 10px;" title="freescale-logo-icon" src="http://jakob.engbloms.se/wp-content/uploads/2009/08/freescale-logo-icon.png" alt="freescale-logo-icon" width="80" height="80" /></a>Past Tuesday, I attended the Freescale Design With Freescale (DWF) one-day technology event in Kista, Stockholm. This is a small-scale version of the big Freescale Technology Forum, and featured four tracks of talks running from the morning into the afternoon. All very technical, aimed at design engineers.</p>
<p><span id="more-933"></span>There were several topic areas, such as automotive, consumer, and networking. Networking was mostly focused on the issues of multicore hardware and software.</p>
<p>Of particular interest to me was to see a <a href="http://www.freescale.com/webapp/sps/site/overview.jsp?nodeId=0162468rH3bTdG25E4">Freescale QorIQ P4080</a> 8-core networking/control-plane processor live for the first time. This chip was <a href="http://jakob.engbloms.se/archives/137">announced in the Summer of 2008</a>, with a full ecosystem of software support thanks to <a href="http://www.virtutech.com/qoriq">Virtutech Simics</a>. Now that the silicon is here, software is indeed running on it thanks to the long head start that development got with the virtual platform. Note that several demos at the event used the Simics simulator to show the software support for the P4080, as there was only a single chip to go around.</p>
<p>I would have loved to have a meaningful picture of the first P4080 in Europe, but a chip is not really very photogenic &#8211; the P4080 processor was in an open computer case, but covered with a 10 cm-high heat sink which made it fairly hard to actually see. That&#8217;s the challenge with infrastructure things: they are not designed to be seen&#8230; just to do their job well. If you have a new consumer electronics processor, you can at least drive a screen quickly or something. But watching 28 Gbps of Ethernet traffic is not as easy <img src='http://jakob.engbloms.se/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Jonas Svennebring of Freescale gave a good talk about how the process of bringup on the P4080 had worked out. It was a total validation of the methodology of using virtual platforms, at different levels of abstraction, and slipping in a bit of hardware emulation as well.</p>
<p>Freescale started software development on the functional fast model, and when clock-cycle-level detailed models of subsystems became available, they started using them as well for performance validation of small pieces of code. Any discrepancies in behavior between the two models were then used to correct the models and documentation. Finally, as the RTL for the silicon became available, they used a few emulation setups to run parts of the actual RTL (the emulator could only handle a subset of the entire chip), and validate the performance numbers in the detailed model and the behavior of both models. In the end, when the first silicon became available, Linux was up in a very short time (I cannot give the exact number, but it was a matter of days rather than weeks).</p>
<p>This is the typical iterative process that all chip designers are implementing today: using virtual platforms you can get a head start on development of software, and then as more details become available, you tune models and update designs, models, and software, iterating towards a hardware/software combination that just works once the silicon realization of the hardware comes around.</p>
<p>So that was all cool.</p>
<p>Jonas also showed a die photo of the QorIQ, and that confirmed my opinion from the <a href="http://jakob.engbloms.se/archives/905">SiCS Multicore Day</a>: embedded multicore is not just about processor cores and cache, it is very much about accelerators to help offload repetitive work from the processing cores. More than half the chip was such acceleration logic! To me, this is a clear confirmation that heterogeneity is the future of hardware design, and a useful way to spend hundreds of millions of transistors to boost SoC performance.</p>
<p>The same was true for most other Freescale hardware showcased at the event. For example, there was the <a href="http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=MPC560xS">MPC5606S dashboard processor</a>, running an LCD display with lots of dynamic graphics with 0.2% CPU load on a 60 MHz e200 Power Architecture processor. All the work was done by its display driver and accelerator. It is hard to argue with that kind of efficiency. That chip did not need a heatsink, either. It was just mounted on the back of an example board with no need for any external logic chips. Apparently, it could also have moved some physical gauges and blinked LEDs, but that demo was considered too distracting for this particular setting.</p>
<p>I also gave a talk at the DWF, about debugging software on multicore using virtual platforms. That was fun, as always. Need to get out more on the road and talk at conferences, I think <img src='http://jakob.engbloms.se/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/933/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SiCS Multicore Day 2009</title>
		<link>http://jakob.engbloms.se/archives/905?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/905#comments</comments>
		<pubDate>Mon, 07 Sep 2009 19:26:27 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[appearances]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[multicore computer architecture]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[virtual machines]]></category>
		<category><![CDATA[Anders Landin]]></category>
		<category><![CDATA[CPP]]></category>
		<category><![CDATA[Ericsson]]></category>
		<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Hazim Shafi]]></category>
		<category><![CDATA[heterogeneous]]></category>
		<category><![CDATA[homogeneous]]></category>
		<category><![CDATA[MCC]]></category>
		<category><![CDATA[Richard Kaufmann]]></category>
		<category><![CDATA[SiCS Multicore days]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[Visual Studio 2010]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=905</guid>
		<description><![CDATA[Last Friday, I attended this year&#8217;s edition of the SiCS Multicore Day. It was smaller in scale than last year, being only a single day rather than two days. The program was very high quality nevertheless, with keynote talks from Hazim Shafi of Microsoft, Richard Kaufmann of HP, and Anders Landin of Sun. Additionally, there was a [...]]]></description>
			<content:encoded><![CDATA[<p>Last Friday, I attended this year&#8217;s edition of the <a href="http://www.sics.se/node/4360">SiCS Multicore Day</a>. It was smaller in scale than <a href="http://jakob.engbloms.se/archives/283">last year</a>, being only a single day rather than two days. The program was very high quality nevertheless, with keynote talks from <a href="http://blogs.msdn.com/hshafi/">Hazim Shafi</a> of Microsoft, Richard Kaufmann of HP, and Anders Landin of Sun. Additionally, there was a mid-day three-track session with research and industry talks from the Swedish multicore community.<span id="more-905"></span></p>
<p>I think that for next year, the organizers need to find keynote speakers that are not from the general computing multicore world. The Microsoft talk this year was a step in that direction, as it came from multicore programming rather than multicore hardware. Richard and Anders gave very interesting and good talks, no doubt about it. But it would have been nice to have someone from ARM or Freescale or Tensilica or TI or ST or Ericsson or Cisco talking about the kinds of multicore embedded hardware that are being developed and used today. For example, the &#8220;next new thing&#8221; touted by the keynotes this year was GPGPU. Interesting for HPC and desktops, certainly. But pretty irrelevant for most of the people that I know. GPUs are huge, expensive, and power hungry.</p>
<p>GPGPU was one part of the theme this year. It is definitely catching on as <em>the </em>way to do number crunching in the desktop, server, and HPC world. It is not the universal panacea for any kind of parallelism, however, as Hazim and I noted in the panel discussion that ended the day. There are applications (such as <a href="http://www.virtutech.com/whitepapers/accelerator.html">parallel Simics</a>&#8230;) that scale well on general-purpose cores, but that will never ever work on GPUs. In general, the class of problems that work on GPUs is pretty limited to massive data-parallel problems like image and video manipulation.</p>
<p>In the eternal homogeneous vs heterogeneous debate (follow <a href="http://jakob.engbloms.se/archives/tag/homogeneous">the tags</a> in my blog for more posts on this topic), GPGPU was grudgingly accepted as a good candidate for something that will not be homogenized with the main processors. Additionally, Richard Kaufmann gave some hints that Intel or AMD are coming out with new chips with more accelerators on board&#8230; I guess it will be security, as is already done by Sun and <a href="http://jakob.engbloms.se/archives/80">IBM</a>. When I brought up the topic of more accelerators like pattern matching, compression, and the other things we see in chips from Freescale, Cavium, and others, the response was very &#8220;can only be economical for very high volume applications&#8221;.</p>
<p>It is striking how the GPGPU idea is bringing the classic telecommunications DSP-data plane/CPU-control plane division into the desktop and server space. Without any recognition being paid or any experience being reused from the 40 years that that has been done in telecoms and consumer electronics&#8230; as Jack Ganssle often says, us embedded folks get no respect.</p>
<p>In terms of programming, this year was all about general programming languages. Hazim from Microsoft talked about (and demoed) the quite pervasive addition of parallelism to both native C/C++ and managed .NET code in Visual Studio 2010. Microsoft is dead serious about parallel programming, and is bringing out a whole set of different libraries and support structures to allow <a href="http://blogs.msdn.com/pfxteam/archive/2009/08/12/9867246.aspx">easier expression of parallel code</a>. In the &#8220;LINQ&#8221; data query language subset of C#, you could add some easy modifiers to &#8220;foreach&#8221; statements to make them parallel, for example. Having a language that is your own and which you can extend at will certainly pays off in terms of innovation here. C++ moves far slower than C#; that is becoming clearer and clearer. C# and its cousins in the .NET system seem to be sneaking in lots of powerful language design ideas from places like Python, and also results from Microsoft&#8217;s powerful group of language researchers.</p>
<p>When I tried to bring up the idea of using domain-specific languages to program parallel applications, Hazim had the wonderful comment that &#8220;that might be applicable in certain domains&#8230;&#8221; &#8212; yes, that is the idea. By being narrow in terms of target domains, you gain expressive power and semantic insight that helps move programming from &#8220;how&#8221; towards &#8220;what&#8221;. But it sounds like domain-specific is a foul word inside of Microsoft &#8212; when the audience asked whether LINQ was not exactly a domain-specific language for data access, Hazim was at pains to point out that it is Turing-complete and that someone had managed to write a ray tracer using it&#8230; interesting. This feels more political than market-based. I guess Micro</p>
<p>Richard Kaufmann had some interesting notes on throughput vs TTC (time-to-completion) jobs in servers. In the &#8220;cloud computing&#8221; era, throughput is much easier to scale: just add more servers. Classic HPC is more oriented towards TTC, as you do want your results within a reasonable time. Quite often, you can turn most work into a throughput-oriented style by simply running lots of jobs in parallel rather than pushing through a series of jobs sequentially. Note however that we have the entire field of real-time control, real-time communications, etc., that does not work like this. But that is not the market that HP is building servers for, or that Intel and AMD are servicing.</p>
<p>Outside the keynotes, Per Holmberg of Ericsson gave an interesting presentation on the adoption of multicore in the control plane of the <a href="http://www.ericsson.com/ericsson/corpinfo/publications/review/2002_02/161.shtml">Ericsson CPP</a> platform. The core of his talk was the observation that in these kinds of systems, multicore is not such a big revolution.</p>
<p>These systems have been distributed since the beginning. Thus, scaling by adding more processors (with local memories) is easy and multicore is only a packaging change from that. Also, most performance-intense operations are already offloaded onto DSP groups, network processors, ASICs, or FPGAs. There is not much parallelism left for the control plane to exploit. Essentially, only functions that unexpectedly become performance bottlenecks due to changes in traffic patterns are likely candidates for parallelization. Interesting point, and might be <a href="http://jakob.engbloms.se/archives/703">why the EETimes noted that multicore is slow to catch on in communications</a> (the article is a bit flawed).</p>
<p>Patrik Nyblom from Ericsson held a talk about how the <a href="http://www.erlang.org">Erlang</a> runtime engine was parallelized. From a practical perspective, the most interesting aspect was that this made applications parallel without changing a single line of code in the applications. Of course, applications had to be threaded to start with, but that is the most natural way in Erlang. He mentioned systems containing up to a quarter of a million threads &#8212; hard to do that in anything except Erlang.</p>
<p>He described how they had evolved from a simple implementation that worked well on synthetic benchmarks to a truly industrial-strength implementation. The difference was quite radical, as real codes feature more complex communications patterns, and make heavy use of device drivers and network stacks. This process forced the use of more and finer locks, and rethinking the balance between shared and separate heaps for threads.</p>
<p>They also had the opportunity to test their solution on a Tilera 64-core machine. This mercilessly exposed any scalability limitations in their system, and proved the conventional wisdom that going beyond 10+ cores is quite different from scaling from 1 to 8&#8230; The two key lessons they learned were that <em>no shared lock goes unpunished, </em>and <em>data has to be distributed as well as code.</em> Very interesting to hear this story from real software developers solving real problems.</p>
<p>The next multicore event taking place around here is the Second <a href="http://www.it.uu.se/research/upmarc/MCC09">Swedish Workshop on Multicore Computing</a> (MCC 2009), in Uppsala, November 26-27.</p>
<p>Update: note that the presentations from the event are available via <a href="http://www.multicore.se/">http://www.multicore.se/</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/905/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Downloadable Book about Embedded Multicore</title>
		<link>http://jakob.engbloms.se/archives/877?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/877#comments</comments>
		<pubDate>Sat, 08 Aug 2009 19:27:08 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[books]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[embedded systeme]]></category>
		<category><![CDATA[multicore computer architecture]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[freescale]]></category>
		<category><![CDATA[John Logan]]></category>
		<category><![CDATA[Jonas Svennebring]]></category>
		<category><![CDATA[Patrik Strömblad]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=877</guid>
		<description><![CDATA[Freescale has now released the collected, updated, and restyled book version of the article series on embedded multicore that I wrote last year together with Patrik Strömblad of Enea, and Jonas Svennebring and John Logan of Freescale. The book covers the basics of multicore software and hardware, as well as operating systems issues and virtual [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.freescale.com"><img class="alignleft size-full wp-image-878" style="margin-left: 5px; margin-right: 5px;" title="freescale-logo-icon" src="http://jakob.engbloms.se/wp-content/uploads/2009/08/freescale-logo-icon.png" alt="freescale-logo-icon" width="80" height="80" /></a>Freescale has now released the collected, updated, and restyled <a href="http://www.freescale.com/files/32bit/doc/ref_manual/EMBMCRM.pdf">book version</a> of the article series on embedded multicore that I <a href="http://jakob.engbloms.se/archives/423">wrote last year</a> together with Patrik Strömblad of <a href="http://www.enea.com">Enea</a>, and Jonas Svennebring and John Logan of <a href="http://www.freescale.com">Freescale</a>. The book covers the basics of multicore software and hardware, as well as operating systems issues and virtual platforms. Obviously, the virtual platform part was my contribution.</p>
<p><span id="more-877"></span></p>
<p>It is one of the more comprehensive introductions to how to think about and use multicore architectures in the high-end embedded space. It is free to download and print, but if you want a printed copy, one can be ordered for (I am told) 15 USD (I did not try it myself).</p>
<p>The PDF is at <a href="http://www.freescale.com/files/32bit/doc/ref_manual/EMBMCRM.pdf">http://www.freescale.com/files/32bit/doc/ref_manual/EMBMCRM.pdf</a>.</p>
<p>It will also be linked from the &#8220;Documentation&#8221; section for most Freescale multicore chips&#8217; information pages.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/877/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Article in ECNmag about Multicore and Virtual Platforms</title>
		<link>http://jakob.engbloms.se/archives/807?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/807#comments</comments>
		<pubDate>Tue, 09 Jun 2009 06:46:49 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[ECNmag]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=807</guid>
		<description><![CDATA[I have a short article on multicore systems development and virtual platforms in the May 2009 issue of ECN magazine, over at www.ecnmag.com.]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-808" style="margin: 5px;" title="ecn_logos" src="http://jakob.engbloms.se/wp-content/uploads/2009/06/ecn_logos.gif" alt="ecn_logos" width="84" height="52" />I have a short article on <a href="http://www.ecnmag.com/article-cover-story-Virtual-Platforms-051509.aspx">multicore systems development and virtual platforms</a> in the May 2009 issue of ECN magazine, over at <a href="http://www.ecnmag.com">www.ecnmag.com</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/807/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Simulation Determinism: Necessary or Evil?</title>
		<link>http://jakob.engbloms.se/archives/734?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/734#comments</comments>
		<pubDate>Sun, 19 Apr 2009 20:36:02 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[determinism]]></category>
		<category><![CDATA[multicore]]></category>
		<category><![CDATA[repeatability]]></category>
		<category><![CDATA[reverse execution]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[VMWare]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=734</guid>
		<description><![CDATA[In my series (well, I have one previous post about checkpointing) about misunderstood simulation technology items, the turn has come to the most difficult of all, it seems: determinism. Determinism is often misunderstood as meaning &#8220;unchanging&#8221; or &#8220;constant&#8221; behavior of the simulation. People tend to assume that a deterministic simulation will not reveal errors due [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-735" style="margin-left: 10px; margin-right: 10px;" title="gears" src="http://jakob.engbloms.se/wp-content/uploads/2009/04/gears.png" alt="gears" width="56" height="57" />In my series (well, I have one previous post about <a href="http://jakob.engbloms.se/archives/714"><em>checkpointing</em></a>) about misunderstood simulation technology items, the turn has come to the most difficult of all, it seems: <em>determinism.</em> Determinism is often misunderstood as meaning &#8220;unchanging&#8221; or &#8220;constant&#8221; behavior of the simulation. People tend to assume that a deterministic simulation will not reveal errors due to nondeterministic behavior or races in the modeled system, which is a complete misunderstanding. Determinism is a necessary feature of any simulation system that wants to be really helpful to its users, not an evil that hides errors.</p>
<p><span id="more-734"></span></p>
<h2>What?</h2>
<p>Determinism really means this:</p>
<ul>
<li>Given a certain initial state</li>
<li>And a certain sequence of external inputs</li>
<li>The end result and state of the simulation will always be the same</li>
</ul>
<p>The key to note is that you need to require both the starting state and the sequence of external inputs to be the same in order to get the same result. If either of these changes, you may well get a different result. Implementing a deterministic simulator requires all internal events and activities in the simulator to be performed in the same order and at the same time in each simulation run. It means that the host computer environment state cannot be allowed to affect the simulator execution, and that in turn means that all sorting of internal events has to be done in a defined order in all instances.</p>
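<p>As a minimal sketch of what a defined order for internal events can look like, consider a toy event queue where events posted to the same virtual time are broken by posting order rather than by anything in the host state. The names <code>EventQueue</code>, <code>post</code>, and <code>simulate</code> are made up for this illustration and do not come from any particular simulator.</p>

```python
import heapq
from itertools import count


class EventQueue:
    """Events ordered by (virtual time, posting order); no host state is consulted."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # monotonically increasing tie-breaker

    def post(self, time, name):
        heapq.heappush(self._heap, (time, next(self._seq), name))

    def run(self):
        trace = []
        while self._heap:
            time, _, name = heapq.heappop(self._heap)
            trace.append((time, name))
        return trace


def simulate(inputs):
    """Same initial state (an empty queue) plus the same inputs gives the same trace."""
    queue = EventQueue()
    for time, name in inputs:
        queue.post(time, name)
    return queue.run()


inputs = [(5, "timer"), (2, "uart_rx"), (5, "dma_done"), (2, "irq")]
first_run = simulate(inputs)
second_run = simulate(inputs)
```

<p>Both runs produce the identical trace, and changing either the inputs or the initial contents of the queue is the only way to get a different one.</p>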
<p>I have a story about how hard that can be in practice. I once talked to some compiler developers who had the issue that when recompiling the same program with the same set of compiler options, the results might come out different, even on the same machine. The problem was that each run of the compiler was done in a different overall system state, and this might affect how the OS memory allocation functions allocated items in memory. It turned out that in some cases, the precise values of the <em>pointers </em>to the items in a complex data structure were used by standard libraries to handle iteration over nodes in the data structures. Thus, a different memory allocation pattern gave a different iteration order and a different traversal order of nodes, and in the end an almost arbitrarily different result. The correct solution they had to implement was to use a defined lexical ordering to traverse and iterate, not anything dependent on the state of the host machine. It is no different in a simulator: define the order of <em>everything</em>, in order to be deterministic.</p>
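<p>The same trap is easy to reproduce in a few lines. The sketch below is only an analogy to the compiler story, not the code those developers had: in CPython, a set of plain objects is iterated in an order that derives from object addresses, so the fix is to traverse by a stable key instead. <code>Node</code> and <code>traversal_order</code> are hypothetical names.</p>

```python
class Node:
    def __init__(self, name):
        self.name = name


def traversal_order(nodes):
    """Visit nodes in lexical order of their names, never in set/hash order."""
    return [node.name for node in sorted(nodes, key=lambda node: node.name)]


# In CPython the default hash of an object derives from its address, so the
# iteration order of this set can differ between runs. Sorting on a stable
# key removes that dependence on the host state.
nodes = {Node("mul"), Node("add"), Node("load"), Node("store")}
order = traversal_order(nodes)
```

<p>Whatever the allocator does, <code>order</code> depends only on the node names.</p>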
<h2>Why?</h2>
<p>The crucial benefit that determinism brings to a simulation in general and a virtual platform in particular is <em>repeatable debugging</em>. With determinism and an appropriate recording mechanism (and most practically <a href="http://jakob.engbloms.se/archives/714">checkpointing</a>) you can rely on being able to repeat a run resulting in a bug any number of times with the precise same sequence of events in the simulation. In particular, events visible to and relevant for the software running on the virtual platform occur in the same sequence, at the same times, and at the same points relative to the instructions executed. Especially for multicore and parallel computing systems this is incredibly powerful, and something that just cannot be achieved on physical hardware due to its inherent randomness and chaotic behavior (see my 2006 and 2007 ESC Silicon Valley talks for more on this, at my <a href="http://www.engbloms.se/jakob_publications.html">publications </a>and <a href="http://www.engbloms.se/jakob_presentations.html">presentations </a>pages).</p>
<p>If you assume stability of the simulation infrastructure and the simulation platform, determinism also makes debugging the simulation itself easier. Often, a bug in a simulation model is repeatable, and with determinism, it is easy to repeat the same external stimulus sequence to the module and debug it repeatably.</p>
<p>Determinism also makes it easy to detect change in the behavior of a simulation: if the same simulation setup results in a different result or final simulation state, you know something in the setup (model, model parameters, or software) changed. There is no randomness that causes changes without some fundamental parameter being changed. Such boring reliable behavior is generally exactly what you want when testing and debugging large, complex systems.</p>
<p>Obviously, once determinism becomes a requirement, missing determinism in a model is a bug in itself &#8212; and finding such bugs can certainly be interesting exercises.</p>
<h2>Why Not?</h2>
<p>Just like for checkpointing, one reason not to do determinism is that it is hard, as discussed above.</p>
<p>The most common reason that people claim to want to avoid determinism is that they want to explore alternatives within their simulation. Basically, there is a need for <em>variability </em>that would seem to be at odds with determinism. The typical argument is that &#8220;if my simulation model contains a non-deterministic choice, I want the simulation to expose that and not just make the same decision every time&#8221;. This is where determinism tends to be considered <em>evil</em>. However, this argument is not correct.</p>
<p>If we take the case that at some point P in a simulation run there are two different events <em>E</em> and <em>F</em> that can fire (since they are both posted to the same point in virtual time), a deterministic simulator will always select one and the same. This is necessary to reap the system-level benefits discussed above. However, nothing prevents us from programming a change from this behavior into our system explicitly, <em>introducing controlled and repeatable variation. </em>In such a setup, we will have a random decision being made in each simulation run, but one where the outcome in any particular run can be repeated by setting the same random seed parameter.</p>
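<p>A rough sketch of such controlled variation, assuming a hypothetical <code>run_order</code> function and a made-up <code>seed</code> parameter: events at the same virtual time are shuffled by a private random generator, so each seed names one repeatable ordering.</p>

```python
import random
from itertools import groupby


def run_order(events, seed):
    """Order events by virtual time; break ties with a seeded shuffle."""
    rng = random.Random(seed)  # private generator, independent of global state
    ordered = []
    by_time = sorted(events, key=lambda event: event[0])  # stable sort
    for _, group in groupby(by_time, key=lambda event: event[0]):
        simultaneous = list(group)
        rng.shuffle(simultaneous)  # the only source of variation
        ordered.extend(simultaneous)
    return ordered


events = [(10, "E"), (10, "F"), (20, "G")]
# Collect the distinct orderings produced by a range of seeds.
orders_seen = {tuple(run_order(events, seed)) for seed in range(20)}
```

<p>Sweeping the seed explores both the E-first and the F-first orderings, while rerunning with any one seed reproduces exactly the run that exposed a problem.</p>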
<p>This brings the best of both worlds: variation to expose issues where there is potential non-determinism or lack of synchronization in the model, and perfect repeatability of the issues this poses in terms of target software and simulation system behavior. In general, the reason two events are ready at the same time can be considered to be a lack of synchronization in the model, and such a randomizer of behavior will expose that at several different levels. But uncontrolled randomness is not the answer.</p>
<p>Another common misconception is that at a higher level, determinism in a virtual platform means that target software will always run in the same way. That is not true, and misses the importance of state in the deterministic behavior equation. If the initial state when a program starts is different, a different execution will result. If software is run on top of any non-trivial operating system, there is plenty of such variation. In one of our simplest Simics demos, we show this by running an intentionally buggy race-condition-ridden program. Each time it is run, it hits a different number of race conditions. But thanks to determinism (best demoed using reverse execution), we can repeat each run perfectly.</p>
<p>Thus, determinism is not equal to constant behavior or lack of variation.</p>
<h2>The reverse argument</h2>
<p>Finally, determinism is the simplest way to implement reverse execution: if you have recording, determinism, and checkpointing, you can easily virtually reverse the execution by going back to a checkpoint and replaying the execution from that point. If you stop one instruction before the current instruction, you have in essence stepped backwards one step in time. This is how both VMWare and Simics implement reverse execution and debugging. And it could not happen without determinism.</p>
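<p>The checkpoint-and-replay idea can be sketched on a toy machine. This is an illustration under simplifying assumptions (a single checkpoint at step 0 and no external inputs, so no recording is needed), not how VMWare or Simics are actually built; <code>step</code> and <code>ReversibleRun</code> are hypothetical names.</p>

```python
import copy


def step(state):
    """One deterministic step of a toy machine: advance a counter and an accumulator."""
    state["pc"] += 1
    state["acc"] = (state["acc"] * 31 + state["pc"]) % 1000


class ReversibleRun:
    def __init__(self, initial_state):
        self.checkpoint = copy.deepcopy(initial_state)  # taken at step 0
        self.state = copy.deepcopy(initial_state)
        self.steps_done = 0

    def forward(self, n=1):
        for _ in range(n):
            step(self.state)
            self.steps_done += 1

    def reverse(self, n=1):
        """Go back n steps by restoring the checkpoint and replaying forward."""
        target = max(self.steps_done - n, 0)
        self.state = copy.deepcopy(self.checkpoint)
        self.steps_done = 0
        self.forward(target)


run = ReversibleRun({"pc": 0, "acc": 1})
run.forward(4)
state_after_4 = copy.deepcopy(run.state)
run.forward(1)
run.reverse(1)  # lands exactly on the state seen after step 4
```

<p>Because <code>step</code> is deterministic, the replay reaches the same state as the original run did, which is what makes the backward step meaningful.</p>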
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/734/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>IBM z10 Heavy-Duty Virtual Platform</title>
		<link>http://jakob.engbloms.se/archives/639?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/639#comments</comments>
		<pubDate>Sun, 15 Feb 2009 17:17:29 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[review]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[CECsim]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[z10]]></category>
		<category><![CDATA[zSeries]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=639</guid>
		<description><![CDATA[Unknown to most, IBM has one of the world&#8217;s longest records of using virtual platforms for software and firmware development and verification. This project has been ongoing since at least the days of the zSeries 900 machines, through z990, z9, and now z10. An excellent article on this virtual platform and its uses is found [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-640" style="margin: 5px;" title="ibm_z10" src="http://jakob.engbloms.se/wp-content/uploads/2009/02/ibm_z10.png" alt="ibm_z10" width="118" height="118" />Unknown to most, IBM has one of the world&#8217;s longest records of using virtual platforms for software and firmware development and verification. This project has been ongoing since at least the days of the zSeries 900 machines, through z990, z9, and now z10. An excellent article on this virtual platform and its uses is found in the <a href="http://www.research.ibm.com/journal/rd53-1.html">IBM Journal of Research and Development</a>, number 1, 2009. It is called <a href="http://www.research.ibm.com/journal/rd/531/koerner.pdf">&#8220;IBM System z10 Firmware Simulation&#8221;, by Körner et al</a>.</p>
<p><span id="more-639"></span>The z10 is the latest generation of the classic IBM mainframe family that started with S/360 back in the 1960s. The simulation for just running the firmware of these beasts makes most other virtual platforms, which focus on single SoCs for consumer or digital devices, look positively puny. It also shows that virtual platforms as a technology can scale all the way from single-core bare-metal simple machines that are useful for developing initial software for simple embedded systems up to servers and racks containing hundreds of processing units and very diverse hardware.</p>
<p>The terminology used is unusual, compared to the EDA/ESL and computer architecture research worlds. But it is good. The key concept is a &#8220;VPO&#8221;, Virtual Power On. For a computer of this class, doing Power On is a major event, and calling it a &#8220;boot&#8221; does not really cover its full complexity, involving many different layers of software running on the same and different computers. The VPO was targeted at four months prior to hardware tape-out &#8212; and this means that at that point in time the virtual system would be complete and the firmware complete enough to do a power on.</p>
<p>The simulation system used for the z10 mixes IBM&#8217;s in-house <a href="http://researchweb.watson.ibm.com/journal/rd/464/vonbuttlar.html">CECsim </a>with <a href="http://www.virtutech.com/solutions/virtual_platform/power">Virtutech Simics</a>. CECsim executes the code for the <a href="http://jakob.engbloms.se/archives/80">central zSeries processors</a>, while Simics simulates the FSP-1 &#8220;flexible support processor&#8221; based on the Power Architecture. In previous generations of simulation, the FSP code had been host-compiled and run on an x86 workstation instead of running the actual Power Architecture binaries. Running the real binaries brought additional verification value to the software, finding 3 times more bugs than in the previous host-based simulation:</p>
<blockquote><p>Because the Simics environment now enables us to execute all FSP code in simulation, a far greater amount of code is simulated. Correspondingly, the number of defects found in simulation also increased, by more than 33(Table 2).</p></blockquote>
<p>The article also describes how hardware-accelerated simulation of the actual VHDL of complex new IO chips was used to validate the bits-and-cycles-level interfacing between code and the logic, as well as to validate the logic design itself.</p>
<p>Overall, the article is one of the best presentations of comprehensive use of various types of simulation tools and techniques to remove firmware defects as early as possible in the system development project.</p>
<p>For more on the history of this, I refer to a previous blog post here, &#8220;<a href="http://jakob.engbloms.se/archives/130">The 1970 rules strikes again</a>&#8220;, where I described some late 1960&#8242;s mainframe simulation technology and its uses. Also, browse the back issues of the IBM JRD archives; there are lots of nuggets to be found there!</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/639/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>&#8220;Multicore Debug&#8221; Made Top Ten Embedded.com for 2008</title>
		<link>http://jakob.engbloms.se/archives/492?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/492#comments</comments>
		<pubDate>Thu, 01 Jan 2009 20:10:33 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[Embedded.com]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=492</guid>
		<description><![CDATA[Embedded.com just listed the ten most visited articles on their website during 2008, and my contribution on debugging multiprocessor code was number ten. If you want some more meat around multiprocessor debug, please peruse the various papers and presentations found on my personal website.]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-493" style="margin: 5px 10px;" title="embeddedcom-logo" src="http://jakob.engbloms.se/wp-content/uploads/2009/01/embeddedcom-logo.gif" alt="embeddedcom-logo" width="117" height="59" />Embedded.com just <a href="http://www.embedded.com/design/212700107">listed the ten most visited articles on their website during 2008</a>, and my contribution on <a href="http://www.embedded.com/209101250">debugging multiprocessor code </a>was number ten. If you want some more meat around multiprocessor debug, please peruse the various <a href="http://www.engbloms.se/jakob_publications.html">papers </a>and <a href="http://www.engbloms.se/jakob_presentations.html">presentations </a>found on <a href="http://www.engbloms.se/jakob.html">my personal website</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/492/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Article in Elektronik i Norden: Virtual Platforms</title>
		<link>http://jakob.engbloms.se/archives/423?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/423#comments</comments>
		<pubDate>Sat, 06 Dec 2008 19:48:23 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[elektronik i norden]]></category>
		<category><![CDATA[freescale]]></category>
		<category><![CDATA[p4080]]></category>
		<category><![CDATA[qoriq]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=423</guid>
		<description><![CDATA[I have an article appearing in the latest issue of Elektronik i Norden, about using virtual platforms for multicore computer systems. It is framed in the context of the Freescale multicore push, in particular the QorIQ P4080, and addresses the common issues of debug, execution speed, and the need to zoom in on details every [...]]]></description>
			<content:encoded><![CDATA[<p>I have an article appearing in the latest issue of <a href="http://www.elinor.se/index.php/Om-Oss.html">Elektronik i Norden</a>, about using <a href="http://www.webbkampanj.com/ein/0818/?page=41">virtual platforms for multicore computer systems</a>. It is framed in the context of the <a href="http://www.freescale.com">Freescale </a>multicore push, in particular the <a href="http://www.freescale.com/webapp/sps/site/prod_summary.jsp?fastpreview=1&amp;code=P4080">QorIQ P4080</a>, and addresses the common issues of debug, execution speed, and the need to zoom in on details every once in a while.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/423/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Swedish Workshop on Multicore 2008: Nov 27-28: CFP!</title>
		<link>http://jakob.engbloms.se/archives/234?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/234#comments</comments>
		<pubDate>Fri, 22 Aug 2008 08:00:19 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[conferences]]></category>
		<category><![CDATA[multicore computer architecture]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[MCC]]></category>
		<category><![CDATA[Swedish Workshop on Multicore Computing]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=234</guid>
		<description><![CDATA[The first Swedish Workshop on Multicore Computing (MCC) will take place in Ronneby on November 27 and 28, 2008. The call for papers is now out, and it is open until September 26. If you have something cool to present or publish about multicore computing, and happen to be here in Sweden, please do submit [...]]]></description>
			<content:encoded><![CDATA[<p><img class="size-full wp-image-125 alignleft" style="margin: 5px 10px;" title="coreshrink1" src="http://jakob.engbloms.se/wp-content/uploads/2008/05/coreshrink1.png" alt="Shrinking cores" width="100" height="100" /></p>
<p>The first <a href="http://www.bth.se/mcc08">Swedish Workshop on Multicore Computing (MCC) </a>will take place in Ronneby on November 27 and 28, 2008. The call for papers is now out, and it is open until September 26. If you have something cool to present or publish about multicore computing, and happen to be here in Sweden, please do submit an abstract!</p>
<p>Disclosure: I am on the program committee for this event.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/234/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>EETimes Article on Multicore Debug</title>
		<link>http://jakob.engbloms.se/archives/154?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/154#comments</comments>
		<pubDate>Sun, 20 Jul 2008 08:35:05 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[EETimes]]></category>
		<category><![CDATA[embedded]]></category>
		<category><![CDATA[multicore]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=154</guid>
		<description><![CDATA[I have another short technical piece published about Multicore Debug at the EETimes (and their network of related publications, like Embedded.com). Pretty short piece, and they cut out some bits to make it fit their format. Nothing new to fans of virtual platforms for software development, basically we can use virtual platforms to reintroduce control [...]]]></description>
			<content:encoded><![CDATA[<p><img class="size-medium wp-image-155 alignleft" style="margin: 10px;" title="eetimes logo" src="http://jakob.engbloms.se/wp-content/uploads/2008/07/eetimes.png" alt="" width="127" height="56" />I have another short technical piece published about <a href="http://www.eetimes.com/news/design/showArticle.jhtml?articleID=209100262">Multicore Debug at the EETimes </a>(and their network of related publications, like <a href="http://www.embedded.com/design/209101250">Embedded.com</a>). It is a pretty short piece, and they cut out some bits to make it fit their format. There is nothing new in it for fans of virtual platforms for software development: basically, we can use virtual platforms to reintroduce control over parallel and, for all practical purposes, chaotic hardware/software systems.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/154/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Power Architecture Conference slides online</title>
		<link>http://jakob.engbloms.se/archives/148?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/148#comments</comments>
		<pubDate>Thu, 10 Jul 2008 20:38:07 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[appearances]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[power architecture]]></category>
		<category><![CDATA[Power Architecture Conference]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=148</guid>
		<description><![CDATA[The slides from the Power Architecture Conference in München and Paris are now online (and have been for a few weeks) at the Power.org site for the event. There are some interesting things there about Power Architecture in particular, and virtual platforms were almost a main theme of the show.]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-104" style="float: left; margin-left: 10px; margin-right: 10px;" title="powerlogo" src="http://jakob.engbloms.se/wp-content/uploads/2008/04/powerlogo.jpg" alt="Power.org Logo" width="79" height="100" />The slides from the Power Architecture Conference in München and Paris are now online (and have been for a few weeks) at the <a href="http://www.power.org/events/powercon/munich/">Power.org site for the event</a>. There are some interesting things there about Power Architecture in particular, and virtual platforms were almost a main theme of the show.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/148/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Power Architecture Conference München 2008</title>
		<link>http://jakob.engbloms.se/archives/128?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/128#comments</comments>
		<pubDate>Fri, 23 May 2008 17:07:16 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[appearances]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[power architecture]]></category>
		<category><![CDATA[Power Architecture Conference]]></category>
		<category><![CDATA[Simics]]></category>
		<category><![CDATA[Simics Accelerator]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=128</guid>
		<description><![CDATA[On Tuesday next week, I will be presenting at the Power Architecture Conference (PAC) in München, Germany. The topics will be multicore debug using virtual hardware, and the new Simics Accelerator technology. Simics Accelerator in particular is pretty interesting technology. It is a simple idea, using multiple host cores to run a virtual platform, with fairly [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft alignnone size-medium wp-image-104" style="float: left; margin-left: 10px; margin-right: 10px; margin-top: 0px; margin-bottom: 0px;" title="powerlogo" src="http://jakob.engbloms.se/wp-content/uploads/2008/04/powerlogo.jpg" alt="Power.org Logo" width="79" height="100" />On Tuesday next week, I will be presenting at the <a href="http://www.power.org/events/powercon/munich/">Power Architecture Conference</a> (PAC) in München, Germany. The topics will be multicore debug using virtual hardware, and the new Simics Accelerator technology. <a href="http://www.virtutech.com/products/simics_accelerator.html">Simics Accelerator</a> in particular is pretty interesting technology.</p>
<p>It is a simple idea, using multiple host cores to run a virtual platform, with fairly amazing results. Now, using a single computer, we can run simulations that were the realm of pure fantasy just a few years ago. We also got a nice new little box to demonstrate it with, an eight-core Dell with 16 GB of RAM. With 64-bit Linux, this thing makes my Core 2 Duo laptop with 32-bit Vista look like yesteryear&#8217;s snail&#8230; and it creates that giggling feeling that a really impressive new toy brings up in even the most grown-up boys. Booting a 16-machine network of PowerPC boards was so fast it was not demo-worthy. I think we have to up the ante to some 100 target machines to make it interesting, and I have no doubt that a combination of multithreading and idle-loop optimization will keep even that setup usefully interactive from the target command lines. There are many other wild things we could try on that demo box, once it gets back from the Power Architecture Conferences tour.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/128/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Virtual Platform by Virtualization Extensions &#8212; 1969</title>
		<link>http://jakob.engbloms.se/archives/121?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/121#comments</comments>
		<pubDate>Sun, 11 May 2008 18:53:11 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[history of computing]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[virtualization]]></category>
		<category><![CDATA[1969]]></category>
		<category><![CDATA[conference paper]]></category>
		<category><![CDATA[HITAC-8400]]></category>
		<category><![CDATA[Hitachi]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[SOSP]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=121</guid>
		<description><![CDATA[On a trip down virtualization history, I found a real gem: a 1969 paper called A program simulator by partial interpretation, by Kazuhiro Fuchi, Hozumi Tanaka, Yuriko Manago, and Toshitsugu Yuba of the Japanese Government Electrotechnical Laboratory. It was published at the second Symposium on Operating Systems Principles (SOSP) in 1969. It describes a [...]]]></description>
			<content:encoded><![CDATA[<p>On a trip down virtualization history, I found a real gem: a 1969 paper called <a href="http://portal.acm.org/citation.cfm?id=961053.961092&amp;coll=ACM&amp;dl=ACM&amp;CFID=67556471&amp;CFTOKEN=25257537"><strong>A program simulator by partial interpretation</strong>,</a> by Kazuhiro Fuchi, Hozumi Tanaka, Yuriko Manago, and Toshitsugu Yuba of the Japanese Government Electrotechnical Laboratory. It was published at the <span class="mediumb-text">second Symposium on Operating Systems Principles</span> (SOSP) in 1969. It describes a system where regular target instructions are executed directly on the hardware, and any privileged instructions are trapped and simulated. This is very similar to how VMware does it for x86, or any other modern virtualization solution.</p>
<p><span id="more-121"></span></p>
<p>The interesting bit is really the uses that this system is put to:</p>
<blockquote><p>In promoting the ETSS project a program simulator based on an idea of partial interpretation has been constructed, and its principle and design are described in the paper. This new approach has been introduced to provide the simulator with such features as high speed and high accuracy in simulation and simplification in implementation. The essence of the idea of partial interpretation is using direct execution of instructions by hardware and simulation of them by an interpreter in combination, wherewith the hardware interrupt mechanism intermediates the two phases of the whole simulation. An interruption takes place when executing a &#8220;privileged&#8221; instruction, which triggers the simulation of the instruction. The other type of instructions are normally rendered to direct execution by hardware. The simulation method for devices operating in parallel is also described with respect to the timing control and scheduling. <strong>A program simulator of this type provides a powerful tool for debugging &#8220;supervisor &#8221; programs and opens a new approach to system expansion</strong>.</p></blockquote>
<p>Note that last part. This is essentially a virtual machine used for operating-system debug. So far, this is the earliest mention of the idea that I have found. There are similar ideas in a classic 1972 IBM paper. If anyone has seen anything older, please comment and tell me!</p>
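<p>To make the split between direct execution and simulation concrete, here is a minimal Python sketch of the idea in the abstract. It is a toy model under stated assumptions, not the ETSS design: the three-instruction ISA, the <code>PRIVILEGED</code> set, and all function names are invented for illustration, and a Python exception stands in for the hardware interrupt.</p>

```python
# Toy model of "partial interpretation": unprivileged instructions run
# directly, privileged ones trap into a software simulator.
# The ISA and every name here are hypothetical, made up for illustration.

class PrivilegedTrap(Exception):
    """Stands in for the hardware interrupt raised by a privileged instruction."""


PRIVILEGED = {"out"}  # assumed set of privileged opcodes in this toy ISA


def execute_directly(instr, regs):
    """Stand-in for direct execution on the host hardware."""
    op, *args = instr
    if op in PRIVILEGED:
        raise PrivilegedTrap(instr)
    if op == "li":      # li rd, value: load an immediate
        regs[args[0]] = args[1]
    elif op == "add":   # add rd, rs1, rs2
        regs[args[0]] = regs[args[1]] + regs[args[2]]
    else:
        raise ValueError(f"unknown opcode {op!r}")


def simulate_privileged(instr, regs, device_log):
    """Interpreter for trapped instructions; it models a device in software."""
    op, *args = instr
    if op == "out":     # out port, rs: record the write in the simulated device
        device_log.append((args[0], regs[args[1]]))


def run(program):
    """Run a program, counting how often execution traps into the simulator."""
    regs, device_log, traps = {}, [], 0
    for instr in program:
        try:
            execute_directly(instr, regs)
        except PrivilegedTrap:
            traps += 1
            simulate_privileged(instr, regs, device_log)
    return regs, device_log, traps


program = [("li", "r1", 2), ("li", "r2", 40),
           ("add", "r3", "r1", "r2"), ("out", 7, "r3")]
regs, device_log, traps = run(program)
print(regs["r3"], device_log, traps)
```

<p>Only the single <code>out</code> instruction takes the slow simulated path; everything else runs at full speed. That is where the combination of speed and accuracy claimed in the abstract comes from, and because the device is simulated, a debugger can inspect every access the supervisor code makes to it.</p>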
<p>It is also fun reading these old papers&#8230; they are usually scanned from a paper copy, and therefore really show how papers looked and felt forty years ago. Long before desktop publishing, or even TeX.</p>
<p><img class="aligncenter size-full wp-image-122" title="fuchi-1969" src="http://jakob.engbloms.se/wp-content/uploads/2008/05/fuchi-1969.png" alt="Abstract of Fuchi 1969 paper" /></p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/121/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SICS Multicore Day August 31</title>
		<link>http://jakob.engbloms.se/archives/17?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/17#comments</comments>
		<pubDate>Sun, 02 Sep 2007 20:13:50 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[appearances]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[embedded systeme]]></category>
		<category><![CDATA[multicore computer architecture]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[parallel computing]]></category>
		<category><![CDATA[uncategorized]]></category>
		<category><![CDATA[AMD]]></category>
		<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Hardware debug support]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[Joe Armstrong]]></category>
		<category><![CDATA[Niagara]]></category>
		<category><![CDATA[QuviQ]]></category>
		<category><![CDATA[SiCS Multicore days]]></category>
		<category><![CDATA[Sun]]></category>
		<category><![CDATA[transactional memory]]></category>
		<category><![CDATA[UltraSPARC]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/archives/17</guid>
		<description><![CDATA[The SICS Multicore Day August 31 was a really great event! We had some fantastic speakers presenting the latest industry research view on multicores and how to program them. Marc Tremblay did the first presentation in Europe of Sun&#8217;s upcoming Rock processor. Tim Mattson from Intel tried hard to provoke the crowd, and Vijay Saraswat [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://www.sics.se/node/1854">SICS Multicore Day August 31</a> was a really great event! We had some fantastic speakers presenting the latest industry research view on multicores and how to program them. Marc Tremblay did the first presentation in Europe of Sun&#8217;s upcoming Rock processor. Tim Mattson from Intel tried hard to provoke the crowd, and Vijay Saraswat of IBM presented their X10 language. Erik Hagersten from Uppsala University provided a short scene-setting talk about how multicore is becoming the norm.</p>
<p><span id="more-17"></span><br />
The Rock is a very interesting piece of work. It tries to be both a throughput-oriented design like the Niagara/UltraSPARC T machines and a single-thread high-performance design, although on balance it is skewed more towards throughput computing. What is very cool is how they use additional threads to help boost the performance of a main thread using &#8220;scout threads&#8221; (a concept I saw presented back at ISCA 2004). This makes it possible to use threads to either boost single-thread performance OR do throughput, creating a more flexible design than is usually the case. It is also set to be the first commercial implementation of <a href="http://research.sun.com/spotlight/2007/2007-08-13_transactional_memory.html">transactional memory</a>. It is also 16-way, and due next year.</p>
<p>So far, Rock seems like a very successful and very visionary project that is trying in yet another way to gain momentum by pure hardware innovation. Just like with the UltraSPARC T line, Sun is trying to out-invent IBM and Intel/AMD, who seem to be mostly progressing by just piling on more of the same old features. I really hope this play goes well: if we were down to just IBM/PPC &amp; System z and Intel-AMD/x86-64 on the server and desktop side, the world would just be too boring.</p>
<p>The Intel and IBM talks on programming were both grounded in the idea that to make people accept a new programming language/API, it has to be an evolution of what the programmers already know, which pretty much ties us down to C/C++/Java/C# with extensions and modified semantics.</p>
<p>X10 is basically Java with some nicely considered features to support local and global memories and programs that can scale to BlueGene-style massively clustered machines. Tim basically told everyone to stop inventing new languages and focus on improving existing frameworks like MPI and OpenMP in collaboration with industry. Tim is a great presenter with a very funny style, and he tried hard to get the audience to react. In this crowd, most people agreed, except the Erlang people, who feel that they do have a better solution to multithreading and multicore than any patched-up language in the C-Java family. I must agree with them, and I do feel that <a href="http://www.erlang.org/">Erlang</a> today is mature enough to serve that purpose.</p>
<p>The panel session at the end was very entertaining, with some people (including myself and <a href="http://armstrongonsoftware.blogspot.com/">Joe Armstrong</a>) putting tough questions to the keynote speakers (and Ulf Wiger of Ericsson). It was a rare chance to engage directly with some industry heavyweights who otherwise tend to sit on the other side of the Atlantic.</p>
<p>I think the prize for coolest tech of the day goes to <a href="http://www.quviq.com/">QuviQ</a>, a spin-off from <a href="http://www.chalmers.se/">Chalmers</a> doing automated testing tools that really work well for parallel and distributed systems. Their method of minimizing the trace of a failed test case is really interesting, and finds things that no human tester would ever find.</p>
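<p>For readers who have not seen trace minimization before, the general idea can be sketched in a few lines of Python. This is a generic greedy shrinker written for illustration, not QuviQ&#8217;s actual algorithm; the <code>fails</code> oracle and the file-handle bug it detects are made-up assumptions.</p>

```python
# Greedy trace minimization: drop one step at a time for as long as the
# failure still reproduces. A generic sketch, not QuviQ's algorithm.

def minimize_trace(trace, fails):
    """Return a failing sub-trace from which no single step can be removed."""
    assert fails(trace), "the starting trace must reproduce the failure"
    current = list(trace)
    shrunk = True
    while shrunk:
        shrunk = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if fails(candidate):
                current = candidate
                shrunk = True
                break  # restart the scan on the shorter trace
    return current


def fails(trace):
    """Hypothetical oracle: the bug is a 'read' issued while the handle is closed."""
    is_open = True
    for step in trace:
        if step == "open":
            is_open = True
        elif step == "close":
            is_open = False
        elif step == "read" and not is_open:
            return True
    return False


trace = ["open", "write", "read", "close", "write", "read", "open", "read"]
print(minimize_trace(trace, fails))  # prints ['close', 'read']
```

<p>The eight-step trace shrinks to the two steps that matter. For parallel and distributed systems the hard part is making <code>fails</code> deterministic enough to re-run, which this sketch simply assumes.</p>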
<p>I also presented a talk on &#8220;Debugging Multicore Software using Virtual Hardware&#8221; in the breakout sessions. I guess our Tools track was the least visited of the three tracks, but the audience asked some good questions, and there were some good discussions afterwards.</p>
<p>However, to summarize the day, I am a bit disappointed that more is not being done on the hardware side to help people debug their multicore and multiprocessor parallel programs. Transactional memory is all nice and dandy and can help simplify low-level locking primitives for threaded programs. But I would like to see much more in terms of smart tracing, hardware breakpoints and triggers, massive synchronized stops, and similar features, as well as instructions and features that make parallel expressions simpler. Here, the embedded folks doing things like <a href="http://www.arm.com/products/solutions/CoreSight.html">ARM CoreSight</a> seem to have been much more successful than the server-class designers at Sun, Intel, and IBM. But even ARM does not spend more than 10-15% of the chip area on debug support.</p>
<p>I think it would be interesting to see what would happen if you could spend 25-30% of the chip on some seriously powerful debug features: full support for remote control of all cores at the same time, lots of bandwidth for debug data and commands, fat traces of all traffic on and off the chip, and performance and event counters everywhere. That would likely make the peak performance of the chip lower than that of a competing chip not spending as much space on debug support, but it would make achieving a high utilization much easier, and that might actually make the debug-intense chip more economical. It would be interesting to try, but I guess nobody would dare to buy such a design.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/17/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

