<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Observations from Uppsala &#187; blog commentary</title>
	<atom:link href="http://jakob.engbloms.se/archives/tag/blog-commentary/feed" rel="self" type="application/rss+xml" />
	<link>http://jakob.engbloms.se</link>
	<description>Computer Technology: Simulation, Virtualization, Virtual Platforms, Embedded, Multicore and Multiprocessing (by Jakob Engblom)</description>
	<lastBuildDate>Sun, 29 Jan 2012 19:45:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<image>
    <title>Observations from Uppsala</title>
    <url>http://jakob.engbloms.se/favicon.png</url>
    <link>http://jakob.engbloms.se</link>
    <width>32</width>
    <height>32</height>
    <description>Observations from Uppsala - http://jakob.engbloms.se</description>
    </image>
		<item>
		<title>Grant Martin on the &#8220;Verification is 70% of the Effort&#8221; Claim</title>
		<link>http://jakob.engbloms.se/archives/361?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/361#comments</comments>
		<pubDate>Sun, 30 Nov 2008 08:16:36 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[blog commentary]]></category>
		<category><![CDATA[Grant Martin]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=361</guid>
		<description><![CDATA[Over at Taken for Granted, Grant Martin just did a very good write-up on the &#8220;accepted fact&#8221; that verification is seventy percent of a chip design effort. It is not exactly easy to prove this point, but is it really just an urban myth that has gained credibility by being repeated over and over again? [...]]]></description>
			<content:encoded><![CDATA[<p>Over at Taken for Granted, Grant Martin just did a very good write-up on <a href="http://www.chipdesignmag.com/martins/2008/11/27/the-myths-of-eda-the-70-rule/">the &#8220;accepted fact&#8221; that verification is seventy percent of a chip design effort</a>. It is not exactly easy to prove this point, but is it really just an urban myth that has gained credibility by being repeated over and over again?</p>
<p>Go over there to see what he has to say.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/361/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cadence-Ran vs Synopsys-Frank over Low-Power and Virtual Things</title>
		<link>http://jakob.engbloms.se/archives/344?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/344#comments</comments>
		<pubDate>Sat, 15 Nov 2008 22:32:11 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[EDA]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[blog commentary]]></category>
		<category><![CDATA[Cadence]]></category>
		<category><![CDATA[Frank Schirrmeister]]></category>
		<category><![CDATA[power analysis]]></category>
		<category><![CDATA[Ran Avinun]]></category>
		<category><![CDATA[simulation]]></category>
		<category><![CDATA[Synopsys]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=344</guid>
		<description><![CDATA[Over the past few weeks there was an interesting exchange of blog posts, opinions, and ideas between Frank Schirrmeister of Synopsys and Ran Avinun of Cadence. It is about virtual platforms vs hardware emulation, and how to do low-power design &#8220;properly&#8221;. Quite an interesting exchange, and I think that Frank is a bit more right [...]]]></description>
			<content:encoded><![CDATA[<p>Over the past few weeks there was an interesting exchange of blog posts, opinions, and ideas between Frank Schirrmeister of Synopsys and Ran Avinun of Cadence. It is about virtual platforms vs hardware emulation, and how to do low-power design &#8220;properly&#8221;. Quite an interesting exchange, and I think that Frank is a bit more right in his thinking about virtual platforms and how to use them. Read on for some comments on the exchange.</p>
<p><span id="more-344"></span><br />
The following appears to be the sequence of events:</p>
<ul>
<li>Cadence press release, in September, about their &#8220;<a href="http://www.design-reuse.com/news/19019/power-analysis-pre-rtl-exploration.html">Incisive Palladium Dynamic Power Analysis and Cadence InCyte Chip Estimator</a>&#8221;, quoting Ran:
<blockquote><p><em>Cadence Incisive Palladium Dynamic Power Analysis enables SoC designers, architects and validation engineers to quickly estimate the power consumption of their system during the design phase, analyzing the effects of running various real software stacks and other real-world stimuli. The new offerings also include the Cadence InCyte Chip Estimator, which can now provide what-if power analysis through exploration of different low-power techniques. The InCyte Chip Estimator also generates automatically the Si2 Common Power Format (CPF), which helps drive architectural power specification and intent into implementation and verification.</em></p></blockquote>
</li>
<li>Frank Schirrmeister blogged &#8220;<a href="http://www.synopsysoc.org/viewfromtop/?p=50">On Chameleons, Low Power and the Marketing Power of Copy Editing</a>&#8221;, basically saying that what Cadence was selling was bound to the RTL level and thus arrived with estimates pretty late in the design process, after the most important architecture decisions had been made. Instead, he proposed a flow using <strong>virtual prototypes</strong> that contain a sequence of successively better estimates, from the usual initial spreadsheet to estimates actually derived from RTL later in the process (or for IP blocks that already exist). Synopsys is not alone in this; <a href="http://www.neosera.com">Neosera</a> and <a href="http://www.scdsource.com/article.php?id=82">ChipVision</a> are after similar ideas. I think this approach makes excellent sense, following the idea that getting some kind of approximate feedback from a complete system early in the process is better than getting lots of details from a small part of a system late in the process.</li>
<li>Ran Avinun then blogged a reply to Frank, at &#8220;<a href="http://www.cadence.com/Community/blogs/sd/archive/2008/10/30/the-power-of-cadence-system-power-flow-vs-viewing-from-the-top.aspx">The Power of Cadence System Power Flow vs. Viewing from the Top</a>&#8221;. His contention there is that virtual prototypes have their uses, but that real designers will be using hardware accelerators, as that provides the key accuracy needed to do real power work. He also sees the time needed to create a virtual platform as a big obstacle, and cites a number of cases where running the actual semi-final RTL with power simulation was key to project success.</li>
<li>Frank then replied to the reply, at &#8220;<a href="http://www.synopsysoc.org/viewfromtop/?p=53">Hammers, Nails and the Spirits That I Called …</a>&#8221;, where he points out that Ran has some misconceptions about virtual platforms and admits that the Cadence flow works well, but argues that it misses the point of early power estimation: getting estimates before the design is too frozen to be changed much. There is a pretty but hard-to-read diagram in the post, from <a href="http://www.design-reuse.com/articles/12728/towards-activity-based-system-level-power-estimation.html">a 2005 article he wrote while at ChipVision in Germany</a>, pointing out the need to evaluate designs with actual test data from the real world.</li>
</ul>
<p>What do I make of all of this?</p>
<h2>Ran&#8217;s Points</h2>
<p>I must admit that I think the Palladium hardware simulation accelerator boxes are very cool pieces of hardware. At least, they used to be based on custom logic systems that use several cycles of fixed-size hardware to simulate multiples of the hardware&#8217;s base emulation capacity (so a 10M-capacity system can use 10 cycles per target cycle to simulate 100M, for example). However, I do agree with Frank that these are dependent on having actual RTL in place to be of much use.</p>
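<p>The time-multiplexing arithmetic in the parenthesis above can be sketched in a few lines of Python. This is only a back-of-the-envelope model: the function names and the 1 MHz host clock are assumptions made for illustration, not Palladium specifications, and the proportional slowdown is simply what plain folding would imply, not a measured figure.</p>

```python
def host_cycles_per_target_cycle(design_size, base_capacity):
    """Passes through the fixed-size hardware needed for one target cycle.

    Both sizes use the same (unspecified) capacity unit as the text.
    Rounds up, since a partly filled pass still costs a full host cycle.
    """
    return -(-design_size // base_capacity)  # integer ceiling division


def effective_target_hz(host_hz, design_size, base_capacity):
    """Target clock rate if every target cycle costs that many host cycles."""
    return host_hz / host_cycles_per_target_cycle(design_size, base_capacity)


# The example from the text: a 10M-capacity box simulating a 100M design.
folding = host_cycles_per_target_cycle(100_000_000, 10_000_000)

# Hypothetical 1 MHz host clock, chosen only to make the slowdown visible.
target_hz = effective_target_hz(1_000_000, 100_000_000, 10_000_000)
```

<p>With the numbers from the text, <code>folding</code> comes out as 10, and under the assumed 1 MHz host clock the target would advance at one tenth of that rate.</p>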
<p>Another issue with hardware emulators is their overall availability: compared to the number of PCs available in an organization, they are going to be very limited. As discussed in many different forums, a key advantage of a pure virtual platform is that it can turn any programmer&#8217;s PC into a target system running the real target software, without having to book time on a limited set of physical target machines. Hardware accelerators are exactly such limited-supply machines. So a virtual platform is much more available to people within, and especially outside, a design organization. Also, unless you are happy to release RTL for your design to people outside your organization, hardware acceleration is going to do little to help your end users get the most out of your design, pre-silicon.</p>
<p>My final gripe with hardware emulators is their limited scope. They tend to max out at the borders of a single chip, or less. A virtual platform, on the other hand, has much more room to scale, to include multiple chips, <a href="http://www.virtutech.com/products/simics_accelerator.html">multiple boards</a>, or even <a href="http://www.compactpci-systems.com/articles/id/?3537">complete racks</a> and networks of networks. You cannot really do that in any hardware simulation, as it involves too many billions of gates running too many billions of instructions. The general rule of simulation still applies with hardware acceleration: <a href="http://www.engbloms.se/publications/engblom-ESC2008-class410-simulation-paper.pdf">you need to increase the level of abstraction to handle larger systems</a>.</p>
<p>As to the problem faced by Ran&#8217;s customers, having RTL but no virtual platform: what were they thinking of? Seriously, if you want to do design today, a virtual platform should be your starting point, not an afterthought. Time and again, we see examples where using virtual platforms <a href="http://www.chipdesignmag.com/display.php?articleId=2720&amp;issueId=31">gets chips to customers ahead of time and provides the ability to test ideas before committing to final RTL</a>. It seems that Ran agrees with this need, but his means are different:</p>
<blockquote><p><em>&#8220;As was stated above, big reason our customers use RTL emulation platforms is for accuracy, and while virtual platforms can offer certain performance, eventually the need to accuracy becomes critical and can not be overlooked, even for initial performance and power estimation analysis. Frank seems to forget in his statement above that the average bring-up time of new virtual platforms takes 6-12 months while the average bring-up time of many emulated designs takes days.&#8221;</em></p></blockquote>
<p>The time to create a virtual platform is actually pretty short, if you do it at a sufficiently abstract level of detail and don&#8217;t worry too much about cycle accuracy. Also, bringing up an emulation depends on having a detailed RTL-level description to start with&#8230; which is not necessarily the case. I must say that the cited six to twelve months for a VP (for a single SoC as discussed here) sounds reasonable to me, if you are building a cycle-level model that tries to emulate the final timing (<a href="http://jakob.engbloms.se/archives/153">which might not be really feasible at all</a>). If you work at a higher level of abstraction like loosely-timed TLM, that time shrinks by a factor of ten or so. I agree that in the end, accuracy is critical, but before you get there, the approximations used by the VP will have gotten you pretty far in terms of software development and architecture testing.</p>
<p>Ran is also afraid of the lack of accuracy:</p>
<blockquote><p><em><span id="anormal_12" class="Cadence_CS_BlogDetail_BlogText"> Now, even if you build this platform successfully 9-12 months in advance, how do you know that your virtual platform representing your real design? How do you connect it to your verification and implementation environment and realistic power information? Frank seems to overlook these things. Looking at the analogy of the story described at the blog above, using a system-level platform that is not targeting the actual hardware for performance analysis and power trade-offs guarantees that the Chamelon will become a snake and you will get bitten. </span></em></p></blockquote>
<p>As with all simulations, virtual platforms need to be used with care and understanding. It might also once again be a matter of system scale: for RTL simulation, you are looking inside a single chip, and the detailed design to save power there. With a VP, you might be looking at whether a particular OS kernel even tries to turn off unused hardware at all&#8230; and that might be just as important in the end as being accurate in how functional units turn on and off inside an accelerator.</p>
<p>In today&#8217;s software-driven systems that mostly consist of existing off-the-shelf hardware, not any particular SoC that is being designed right now, the large-scale behavior and smarts of the software in a setting containing lots of chips and functions is far more important than optimizations inside a chip.</p>
<h2>Frank&#8217;s Points</h2>
<p>Since Frank is a virtual platform supporter just like me, I instinctively agree with his points about VPs being pretty fast to develop and available long in advance of actual silicon. I like the way he deals with power in the ARM DevCon presentation cited (do have a look at it), but still there are some lingering doubts and issues&#8230;</p>
<p>What I have a hard time understanding is just how detailed the virtual platforms need to be. The use of SystemC TLM-2.0 LT is sensible for speed, but it seems from the DevCon presentation that the main emphasis is on AT-level (and therefore pretty slow) timing-accurate simulations that look at power cycle by cycle in the target. If that is the case, I think we could almost just as well go get ourselves a hardware accelerator, as cycle-level models (even if transaction-driven) tend to be pretty slow.</p>
<p>However, Frank also says something that I cannot but agree with: you should not always run around with a hammer and look at everything like a nail. Any reasonable chip design process needs both virtual platforms and hardware accelerators, since one cannot really replace the other:</p>
<blockquote><p><em>When discussing this matter with a friend, he pointed out rightfully so that both Ran’s and my post suffer from “Hammer and Nail-itis”. In fact, he pointed out, the combination of Cadence’s estimators (InCyte), C based synthesis, Palladium, and Synopsys virtual would be pretty powerful! It’s a good thing then that we acquired Synplicity which brought us Synplify high-level synthesis and Confirma FPGA Prototyping to Synopsys, and of course, that we have existing interfaces between our Virtual Platforms and Eve’s solutions. </em></p></blockquote>
<h2>Conclusion</h2>
<p>To me, the lesson from this discussion is clear: a virtual platform should be the starting point of a new design, but once you get down to RTL, hardware acceleration is really pretty useful. It is not an either-or issue; rather, I expect system and chip designers to use both tools. The only question is which should come first, and I think that is naturally the simulation in the form of a virtual platform. That also allows the chip to be set into a system context, which is otherwise pretty hard before silicon arrives, and is something that large system integrators are screaming for.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/344/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cadence on Virtual Prototypes instead of Host Execution</title>
		<link>http://jakob.engbloms.se/archives/308?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/308#comments</comments>
		<pubDate>Sun, 19 Oct 2008 21:40:37 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[virtualization]]></category>
		<category><![CDATA[blog commentary]]></category>
		<category><![CDATA[Cadence]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=308</guid>
		<description><![CDATA[Cadence technical blogger Jason Andrews wrote a short piece a couple of days ago on his perception that host-based execution is becoming unnecessary thanks to fast virtual platforms. In &#8220;Is Host-Code Execution History&#8221;, he tells the story of a technique from a long time ago where a target program was executed directly on the host, and [...]]]></description>
			<content:encoded><![CDATA[<p>Cadence technical blogger <a href="http://www.cadence.com/community/posts/jasona.aspx">Jason Andrews</a> wrote a short piece a couple of days ago on his perception that host-based execution is becoming unnecessary thanks to fast virtual platforms. In &#8220;<a href="http://www.cadence.com/Community/blogs/sd/archive/2008/10/17/is-host-code-execution-history.aspx">Is Host-Code Execution History</a>&#8221;, he tells the story of a technique from a long time ago where a target program was executed directly on the host, and memory accesses captured and passed to a Verilog simulator. The problem being solved was the lack of a simulator for the MIPS processor in use, and the solution was pretty fast and easy to use. Quite interesting, and well worth a read.</p>
<p>However, like all host-compiled execution (which I also like to call API-level simulation), it suffered from some problems, and virtual platforms today might offer the speed of host-compiled simulation without all the problems.</p>
<p><span id="more-308"></span></p>
<p>The problems are these:</p>
<blockquote><p><span id="anormal_12" class="Cadence_CS_BlogDetail_BlogText">Most companies that are using host-code execution today use &#8220;explicit access&#8221;.  This means they require all places in the code that access the hardware to call read() and write() functions so every hardware access goes through a common set of functions and then they use #ifdef to change the hardware accesses to call the simulator if they are doing verification with host-code execution. If they are running on the target system, then pointer dereferences are used. </span></p>
<p>&#8230;</p>
<p><span id="anormal_12" class="Cadence_CS_BlogDetail_BlogText">This is where implicit access came in. It provided a way to automatically trap pointer dereferences that were reading and writing to hardware locations and convert the load or store instruction into a simulated read or write. For reads it would put the result into the proper host CPU register and the user had no idea that a line of C code would magically turn into a bus transaction on a Verilog BFM</span></p></blockquote>
<p>Yes, that is a right pain, and I have seen lots of solutions for it, none of which have the elegant simplicity of a processor simulation. The &#8220;implicit access&#8221; system is basically trying to trap memory accesses without overtly changing the source code of a program. I guess the best way to do this is binary instrumentation, but it is still very hard to get to work right and robustly. A simulator is simply much simpler in principle here.</p>
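<p>To make the &#8220;explicit access&#8221; style from the quote concrete, here is a minimal sketch. It is written in Python purely for illustration: the real technique lives in C, with <code>#ifdef</code> selecting the variant at build time and pointer dereferences on the target. Every name in it (<code>HOST_CODE_EXECUTION</code>, <code>BusSimulator</code>, <code>hw_read</code>, <code>hw_write</code>, the UART register) is made up for the example and is not taken from Jason&#8217;s post or any Cadence tool.</p>

```python
# Hypothetical build-time switch; stands in for the C #ifdef in the quote.
HOST_CODE_EXECUTION = True


class BusSimulator:
    """Stand-in for the hardware simulator: records each bus transaction."""

    def __init__(self):
        self.registers = {}
        self.log = []

    def read(self, addr):
        self.log.append(("read", addr))
        return self.registers.get(addr, 0)

    def write(self, addr, value):
        self.log.append(("write", addr, value))
        self.registers[addr] = value


simulator = BusSimulator()
target_memory = {}  # stand-in for plain pointer dereferences on the target


def hw_read(addr):
    """The single read function that all hardware-touching code must call."""
    if HOST_CODE_EXECUTION:
        return simulator.read(addr)
    return target_memory.get(addr, 0)


def hw_write(addr, value):
    """The single write function that all hardware-touching code must call."""
    if HOST_CODE_EXECUTION:
        simulator.write(addr, value)
    else:
        target_memory[addr] = value


UART_CTRL = 0x1000  # hypothetical register address
UART_ENABLE = 0x1   # hypothetical enable bit


def enable_uart():
    """Driver code: a read-modify-write that never touches memory directly."""
    hw_write(UART_CTRL, hw_read(UART_CTRL) | UART_ENABLE)
```

<p>Calling <code>enable_uart()</code> with the switch on produces one read and one write transaction in the simulator log. The pain point is visible even at this size: the scheme only works if every single hardware access in the code base is routed through these two functions.</p>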
<p>Jason continues later on:</p>
<blockquote><p><span id="anormal_12" class="Cadence_CS_BlogDetail_BlogText">Given the hassle of host-code execution I would prefer to cross compile the software and run the target instruction set. Beyond the implicit or explicit access issue, this also eliminates issues with differences in data type sizes, data structure layout, byte order (endianess) and other differences between the host and target processor. </span></p></blockquote>
<p>That is absolutely true! Jason does not mention the additional fun of what happens when the target is running an OS that is happily fielding interrupts, scheduling software tasks, etc. Also, having to maintain a separate build target, and maybe a code variant, is very expensive process-wise. The expense that a good virtual platform incurs can be paid for pretty quickly once such reduced friction costs are factored in.</p>
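<p>The byte order issue Jason mentions is easy to demonstrate. The sketch below assumes, purely for illustration, a big-endian target and a little-endian host; the helper names are made up. The same four bytes in memory turn into two different numbers depending on who reads them.</p>

```python
def store_u32(value, byteorder):
    """Lay out a 32-bit unsigned value in memory using the given byte order."""
    return value.to_bytes(4, byteorder)


def load_u32(raw, byteorder):
    """Interpret four bytes of memory as a 32-bit unsigned value."""
    return int.from_bytes(raw, byteorder)


# Data laid out the way the (assumed big-endian) target would store it.
raw = store_u32(0x11223344, "big")

as_target = load_u32(raw, "big")   # what target code sees
as_host = load_u32(raw, "little")  # what naive host-compiled code sees
```

<p>Here <code>as_target</code> is 0x11223344 while <code>as_host</code> is 0x44332211. Running the cross-compiled target binary on a processor simulator sidesteps this whole class of problems, since the code always sees the target&#8217;s own byte order.</p>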
<p>So I guess I pretty much agree with all that Jason is saying, and thank him for mentioning <a href="http://www.virtutech.com/products">Simics</a>. Thanks also for the insights into what was done in the 1990s; it is always interesting to get pointers to old fundamental and interesting work.</p>
<p>About how the virtual platforms actually work inside: it is not that complicated in principle (but pretty hairy to get it quite right and fast in practice). You have to simplify the timing of the target processor, you have to convert from target processor binaries to host binary format using some kind of just-in-time compilation technique (also called dynamic binary translation or code morphing), and you have to provide some kind of direct access to target memory for the target processor simulation (like the DMI feature in <a href="http://systemc.org">SystemC TLM-2.0</a>, but usually the difficult bits are on the CPU side of that, not the memory side). The most interesting bit is how to build the surrounding system model to not slow the CPU model down, and for this I can recommend a couple of pieces of writing:</p>
<ul>
<li>My ESC 2008 general intro to the subject of virtual prototypes (<a href="http://www.engbloms.se/presentations/engblom-ESC2008-class410-simulation-slides.pdf">slides</a>, <a href="http://www.engbloms.se/publications/engblom-ESC2008-class410-simulation-paper.pdf">paper</a>)</li>
<li>Virtutech white paper on <a href="http://www.virtutech.com/whitepapers/modeling.html">system modeling </a></li>
</ul>
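<p>As a rough illustration of the direct memory access idea mentioned above, here is a toy bus model in Python. It is not how Simics or SystemC TLM-2.0 DMI is implemented; the classes, the one-address-per-device mapping, and the counter are all assumptions made to keep the sketch short. The point is only the split between a fast path that touches RAM directly and a slow path that calls into a device model.</p>

```python
class Uart:
    """Hypothetical device model: collects bytes written to its data register."""

    def __init__(self):
        self.output = bytearray()

    def read(self, offset):
        return 0

    def write(self, offset, value):
        self.output.append(value)


class Bus:
    """Toy memory map: RAM at address 0 upwards, devices at single addresses."""

    def __init__(self, ram_size):
        self.ram = bytearray(ram_size)
        self.devices = {}       # address -> device model
        self.slow_accesses = 0  # how often the slow path was taken

    def map_device(self, addr, device):
        self.devices[addr] = device

    def read8(self, addr):
        if addr in range(len(self.ram)):
            return self.ram[addr]  # fast path: direct access to target RAM
        self.slow_accesses += 1
        return self.devices[addr].read(0)  # slow path: call the device model

    def write8(self, addr, value):
        if addr in range(len(self.ram)):
            self.ram[addr] = value  # fast path: direct access to target RAM
            return
        self.slow_accesses += 1
        self.devices[addr].write(0, value)  # slow path: call the device model
```

<p>A CPU model built on something like this spends nearly all of its memory traffic on the fast path, so the surrounding device models are only invoked when the software actually talks to hardware.</p>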
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/308/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Simon Kågström, PhD</title>
		<link>http://jakob.engbloms.se/archives/119?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/119#comments</comments>
		<pubDate>Sat, 10 May 2008 07:08:19 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[appearances]]></category>
		<category><![CDATA[blog commentary]]></category>
		<category><![CDATA[books]]></category>
		<category><![CDATA[multicore]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[software tools]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=119</guid>
		<description><![CDATA[Yesterday, I had the honor of being the opponent at the PhD defense of Simon Kågström at Blekinge Tekniska Högskola (BTH, Blekinge Institute of Technology in English). His PhD thesis deals mainly with the multiprocessor port of an industrial in-house operating system, and a secondary theme was the design of the Cibyl C-programs-to-JVM translator. All [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-thumbnail wp-image-120" style="float: left;" title="bthsmall" src="http://jakob.engbloms.se/wp-content/uploads/2008/05/bthsmall-150x148.png" alt="BTH logo" width="150" height="148" />Yesterday, I had the honor of being the opponent at the PhD defense of <a href="http://www.ipd.bth.se/ska/">Simon Kågström</a> at <a href="http://www.bth.se">Blekinge Tekniska Högskola</a> (BTH, Blekinge Institute of Technology in English). His <a href="http://www.ipd.bth.se/ska/phd.html">PhD thesis</a> deals mainly with the multiprocessor port of an industrial in-house operating system, and a secondary theme was the design of the <a href="http://code.google.com/p/cibyl/">Cibyl</a> C-programs-to-JVM translator. All of his papers are very well-written and a joy to read, and the engineering work behind them is very solid.</p>
<p>The most important data in the PhD thesis is really just how much work it is to do an SMP port of an OS kernel, and how hard it is to get performance up to good levels even with several years of work. It really emphasizes the point that hard work, perseverance, and just lots of calendar time are what it takes to create a good SMP OS. That&#8217;s why Solaris and AIX are still years ahead of Linux in this respect: you just need to hit the snags, fix them, retest, and hit the next snag. It takes time to polish, basically.</p>
<p>So, if you have any interest in multiprocessor operating systems, Simon&#8217;s work is well worth a read. Also check out his blog at <a href="http://simonkagstrom.livejournal.com/">http://simonkagstrom.livejournal.com/</a>. And by the way, he did pass.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/119/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Grant Martin on Manycore Multicore MPSoC AMP SMP Multi-X&#8230;</title>
		<link>http://jakob.engbloms.se/archives/114?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/114#comments</comments>
		<pubDate>Sat, 03 May 2008 19:23:45 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[blog commentary]]></category>
		<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[embedded]]></category>
		<category><![CDATA[multicore]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=114</guid>
		<description><![CDATA[Grant Martin is a nice fellow from Tensilica who has a blog at ChipDesignMag. In a recent post, he raises the question of nomenclature and taxonomy for multicore processor designs: &#8230;the discussion, and the need to constantly define our terms (and redefine them, and discuss them when people disagree) makes me wish that the world [...]]]></description>
			<content:encoded><![CDATA[<p>Grant Martin is a nice fellow from Tensilica who has a <a href="http://www.chipdesignmag.com/martins/">blog at ChipDesignMag</a>. In a <a href="http://www.chipdesignmag.com/martins/?p=5">recent post</a>, he raises the question of nomenclature and taxonomy for multicore processor designs:</p>
<blockquote><p>&#8230;the discussion, and the need to constantly define our terms (and redefine them, and discuss them when people disagree) makes me wish that the world of electronics, system and software design had some agreement on what the right terms are and what they mean&#8230;</p></blockquote>
<p>I think this is a good idea, but we need to keep the core count out of it&#8230;</p>
<p><span id="more-114"></span></p>
<p>The reason for the confusion of terms and the strong will to create new terms all the time is really that people feel that there is a real difference between a dual-core x86 processor used in a laptop and a highly integrated 100-core-or-more embedded design for traffic processing in a large switch. For that reason, they want a term that sets them apart from the mainstream desktop/server space with its few large cores.</p>
<p>But the number of cores is probably the least useful parameter to use as a differentiator. If 4 cores is multicore and 32 cores manycore today, in a few years time the decrease in feature width will have moved 32 cores into multi and 128 cores into many&#8230; etc. So that is really something is bound to change over time.</p>
<p>I think that rather we need to look at other aspects of a chip design, in particular those that are not just straight multiplication of features. Those aspects that really matter to the kinds of programs the chip takes nicely to, and that architects have to think hard about.</p>
<p>Programming models are not the right answer to this. As Grant says, programming models need to be put in a taxonomy of their own:</p>
<blockquote><p>A kind of taxonomy of multicore related terms, together with a taxonomy of programming models (SMP, AMP, etc.) that everyone could be referred to when these discussions are held and that everyone could begin to build a consensus around would be of great value to all.</p></blockquote>
<p>If nothing else, we all know that any programming model can be put onto pretty much any piece of silicon, given a sufficiently thick layer of middleware. It might not be the most efficient way to program any particular hardware in terms of hardware resources used, but someone is going to do it anyway.</p>
<p>So what is left in the chip taxonomy?</p>
<p>I think we need to look at things like where memories are located (global, local to each core, shared by a small group), number of levels of memories, whether they are caches or program-controlled. How interrupts and IO are routed is another interesting aspect. Can any core do anything, or do we have master nodes that can do more things? Are all cores equal in terms of performance and computational ability, or do they differ?</p>
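<p>To make this a bit more concrete, here is a minimal sketch of what such a core-count-free chip description could look like as a data structure. All names here (<code>MemoryScope</code>, <code>MemoryLevel</code>, <code>ChipDescription</code>, <code>signature</code>) and the example chip are hypothetical, made up for illustration; this is not a proposed standard, just one way of writing the questions above down.</p>
<pre>
```python
from dataclasses import dataclass
from enum import Enum


class MemoryScope(Enum):
    GLOBAL = "global"        # one memory shared by all cores
    PER_CORE = "per-core"    # private to a single core
    CLUSTER = "cluster"      # shared by a small group of cores


@dataclass(frozen=True)
class MemoryLevel:
    scope: MemoryScope
    is_cache: bool           # False means program-controlled (scratchpad)


@dataclass(frozen=True)
class ChipDescription:
    name: str
    memory_levels: tuple     # ordered from closest to the core outwards
    any_core_handles_interrupts: bool
    homogeneous_cores: bool

    def signature(self):
        """Return a classification string that ignores the core count."""
        mem = "/".join(
            f"{lvl.scope.value}-{'cache' if lvl.is_cache else 'scratchpad'}"
            for lvl in self.memory_levels
        )
        irq = "any-core-irq" if self.any_core_handles_interrupts else "master-irq"
        cores = "homogeneous" if self.homogeneous_cores else "heterogeneous"
        return f"{mem}; {irq}; {cores}"


# Hypothetical example: a laptop-style part with private and shared caches.
laptop = ChipDescription(
    name="hypothetical dual-core laptop CPU",
    memory_levels=(
        MemoryLevel(MemoryScope.PER_CORE, True),
        MemoryLevel(MemoryScope.GLOBAL, True),
    ),
    any_core_handles_interrupts=True,
    homogeneous_cores=True,
)
print(laptop.signature())
```
</pre>
<p>Note that the number of cores does not appear anywhere in the signature, so the classification stays the same when a later generation of the same design doubles the core count.</p>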
<p>As Grant says, a great subject for academia to dig into.</p>
<p>The comments by Markus Levy at the end of the post, about some secret activities at the Multicore Association, make me agree with Grant: please get the ideas and drafts out into the open, and make sure to get the widest input possible!</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/114/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Sun buys Montalvo</title>
		<link>http://jakob.engbloms.se/archives/113?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/113#comments</comments>
		<pubDate>Mon, 28 Apr 2008 10:14:57 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[blog commentary]]></category>
		<category><![CDATA[business]]></category>
		<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[multicore]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=113</guid>
		<description><![CDATA[Sun just bought Montalvo whose hardware I blogged about some while ago. And just like the Apple acquisition of PA Semi, the question of &#8220;why&#8221; appears. Some analysts blame the simple fact that both Montalvo and PA Semi simply needed to be acquired, since their venture capitalists did not want to put in the next [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-thumbnail wp-image-78" style="float: left; margin: 10px;" src="http://jakob.engbloms.se/wp-content/uploads/2008/02/montalvo-fg.gif" alt="" width="150" height="94" /></p>
<p>Sun just bought Montalvo whose hardware I blogged about some while ago. And just like the Apple acquisition of PA Semi, the question of &#8220;why&#8221; appears. Some analysts blame the simple fact that both Montalvo and PA Semi simply needed to be acquired, since <a href="http://venturebeat.com/2008/03/20/montalvo-seeking-a-hoard-of-cash/">their venture capitalists did not want to put in the next 100 million USD needed to go to silicon (Montalvo)</a> or really expand on the opportunity already at hand (PA Semi). Here is my crazy guess.</p>
<p><span id="more-113"></span></p>
<p>Look at the following:</p>
<ul>
<li>Sun has seen great success with the UltraSparc T line of processors, which are basically &#8220;lots of simple cores on a single chip for thread-parallel applications&#8221;.</li>
<li>Sun is investing in Solaris for x86 and has great success with its x86-based servers (based on AMD processors).</li>
<li>Montalvo is building something quite similar to &#8220;lots of simple cores on a single chip&#8221; for x86. Which should run Solaris-x86 and most other x86 operating systems.</li>
<li>Sun has been buying companies and key components for a while now (AMD processors, Fujitsu processors, the company Afara that created the UltraSparc T line).</li>
</ul>
<p>So my guess is&#8230; based purely on technological similarities, with no indirect approaches or conspiracy theories. It assumes that Sun does want to make use of Montalvo&#8217;s tech as it currently stands:</p>
<ul>
<li>Sun buys Montalvo to build x86-based UltraSparc T-style machines for throughput computing. Nice complement to the current high-single-thread-performance AMD-based x86 machines.</li>
</ul>
<p>Note 1: The indirect approach theory here is that Sun wants to use <a href="http://www.silobreaker.com/DocumentReader.aspx?Item=5_842234139">Montalvo to put cost pressure on AMD,</a> just like there is <a href="http://valleywag.com/382944/steve-jobs-buys-pa-semi-for-a-chip-++-a-bargaining-chip">speculation that Apple is going to use PA Semi to put cost pressure on Intel.</a></p>
<p>Note 2: This would also put Sun into direct chip competition with Intel Atom-based designs&#8230; which might be slightly less clever. Never mind, do it anyway <img src='http://jakob.engbloms.se/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Note 3: I have no inside information on any of this; it is purely speculation based on public tech facts.</p>
<p>Note 4: Similar ideas are bandied about in a <a href="http://venturebeat.com/2008/04/03/sun-microsystems-could-use-montalvo-as-a-strategic-lever-against-intel/">rumor comment at VentureBeat from early April 2008:</a></p>
<blockquote><p>But Sun could also choose to avoid a fight with Intel, using the patents to protect itself and to employ the techniques for power savings in its own future SPARC microprocessor offerings. Sun’s most ambitious processors already employ many equal-sized cores on a single chip; the asymmetric architecture of Montalvo’s chips might add interesting capabilities to Sun’s SPARC line-up. In any case, Sun could be picking up the assets at a fire sale price and using them for strategic leverage.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/113/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Linux KVM for IBM Mainframes</title>
		<link>http://jakob.engbloms.se/archives/101?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/101#comments</comments>
		<pubDate>Thu, 10 Apr 2008 12:17:12 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[uncategorized]]></category>
		<category><![CDATA[blog commentary]]></category>
		<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[history]]></category>
		<category><![CDATA[virtualization]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/archives/101</guid>
		<description><![CDATA[There was an interesting little note at the CodeMonkey blog&#8230; basically, the Linux kvm kernel hardware virtualization support system now works on IBM z series mainframes. Using the z architecture virtualization support in hardware.  Nice to see some attention being put on non-x86 architectures. And a nice historical note that current x86 virtualization extensions were [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www-03.ibm.com/systems/z/hardware/i/z10_110x110.jpg" align="left" height="110" hspace="10" vspace="10" width="110" />There was an interesting little note at the <a href="http://blog.codemonkey.ws/2008/04/kvm-for-mainframe.html">CodeMonkey blog</a>&#8230; basically, the Linux KVM kernel hardware virtualization support system now works on IBM z series mainframes, using the z architecture virtualization support in hardware.  Nice to see some attention being put on non-x86 architectures. And a nice historical note that current x86 virtualization extensions were indeed inspired by the s/370 architecture from the mid-1970s. Cool.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/101/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Off-Topic: Studying Malware Analysis at HUT.fi</title>
		<link>http://jakob.engbloms.se/archives/71?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/71#comments</comments>
		<pubDate>Sun, 03 Feb 2008 20:18:45 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[blog commentary]]></category>
		<category><![CDATA[off-topic]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/archives/71</guid>
		<description><![CDATA[The F-Secure weblog is one of my regular reads, and today they presented one of the coolest industry-academia items for a long time: F-Secure are teaching an entire course at the Helsinki University of Technology, called &#8220;Malware Analysis and Antivirus Technologies&#8221;. Kudos to F-Secure for the time and money that must have gone into doing [...]]]></description>
			<content:encoded><![CDATA[<p> The F-Secure weblog is one of my regular reads, and today they presented one of the coolest industry-academia items for a long time: F-Secure are teaching an entire course at the <a href="http://www.f-secure.com/weblog/archives/00001370.html">Helsinki University of Technology, called &#8220;Malware Analysis and Antivirus Technologies&#8221;. </a>Kudos to F-Secure for the time and money that must have gone into doing that!</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/71/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Blog tip: The Wonderful World of Early Computing</title>
		<link>http://jakob.engbloms.se/archives/70?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/70#comments</comments>
		<pubDate>Fri, 25 Jan 2008 19:07:58 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[blog commentary]]></category>
		<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[history]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/archives/70</guid>
		<description><![CDATA[There is a nice blog post over at Neatorama with many pictures of early computers. The material is nothing new to someone familiar with computing history, but the pictures collected are very nice indeed.]]></description>
			<content:encoded><![CDATA[<p>There is a nice blog post over at <a href="http://www.neatorama.com/2008/01/25/the-wonderful-world-of-early-computing/">Neatorama</a> with many pictures of early computers. The material is nothing new to someone familiar with computing history, but the pictures collected are very nice indeed.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/70/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Brilliant Virtualization Comic</title>
		<link>http://jakob.engbloms.se/archives/68?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/68#comments</comments>
		<pubDate>Fri, 18 Jan 2008 20:32:36 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[blog commentary]]></category>
		<category><![CDATA[off-topic]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/archives/68</guid>
		<description><![CDATA[I&#8217;ve never seen the comics at xkcd.com before, but they are really quite brilliant nerdy comics. Liking virtualization and simulation, I found number 350 at http://xkcd.com/350/ especially fun. And note that that is what some serious researchers are doing, using virtual machines as active honey pots (&#8220;honey monkeys&#8221;) to go out and contract infections by [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://imgs.xkcd.com/static/xkcdLogo.png" align="left" height="83" hspace="10" width="185" />I&#8217;ve never seen the comics at <a href="http://xkcd.com">xkcd.com</a> before, but they are really quite brilliant nerdy comics.  Liking virtualization and simulation, I found number 350 at <a href="http://xkcd.com/350/">http://xkcd.com/350/</a> especially fun. And note that that is what some serious researchers are doing, using virtual machines as active honey pots (&#8220;<a href="http://research.microsoft.com/HoneyMonkey/">honey monkeys</a>&#8221;) to go out and contract infections by actively searching the web with machines in various stages of patching.</p>
<p><span id="more-68"></span><img src="http://imgs.xkcd.com/comics/network.png" height="414" width="740" /></p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/68/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Multithreading Game AI</title>
		<link>http://jakob.engbloms.se/archives/64?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/64#comments</comments>
		<pubDate>Tue, 01 Jan 2008 13:01:23 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[blog commentary]]></category>
		<category><![CDATA[games]]></category>
		<category><![CDATA[multicore]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/archives/64</guid>
		<description><![CDATA[Over at an online publication called AI Game Dev, there is an elucidating post on how to do multithreading of game AI code (posted in June 2007). Basically, the conclusion is that most of the CPU time in an AI system is spent doing collision detection, path finding, and animation. This focus of time in [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://files.aigamedev.com/MASCOT.jpg" align="left" height="120" hspace="10" width="96" />Over at an online publication called AI Game Dev, there is an <a href="http://aigamedev.com/questions/multi-threading-strategies">elucidating post on how to do multithreading of game AI code</a> (posted in June 2007). Basically, the conclusion is that most of the CPU time in an AI system is spent doing collision detection, path finding, and animation. This focus of time in a few domain-given hot spots turns the problem of parallelizing the AI into one of parallelizing some core supporting algorithms, rather than trying to parallelize the actual decision making itself. The key to achieving this is to make the decision-making part able to work asynchronously with the other algorithms, which is not trivial but still much easier than threading the decision making itself. The threading of the most time-consuming parts turns into classic algorithm parallelization, which is more familiar and easier to do than threading general-purpose large code bases.  A good read, basically, that taught me some more about parallelization in the games world.</p>
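<p>As a rough illustration of the asynchronous idea, here is a small sketch in Python. It is my own toy, not code from the AI Game Dev post: the decision step hands path finding to a worker pool and never blocks waiting for the answer. The names <code>Agent</code>, <code>decide</code>, <code>find_path</code>, and <code>run</code> are hypothetical, and the &#8220;path finder&#8221; is just a straight-line walk on a grid standing in for a real, expensive algorithm.</p>
<pre>
```python
import time
from concurrent.futures import ThreadPoolExecutor


def step_towards(current, target):
    """Move one grid cell towards target along one axis."""
    if current == target:
        return current
    return current + (1 if target == max(current, target) else -1)


def find_path(start, goal):
    """Stand-in for an expensive path finder: a straight-line grid walk."""
    path = [start]
    x, y = start
    while (x, y) != goal:
        x, y = step_towards(x, goal[0]), step_towards(y, goal[1])
        path.append((x, y))
    return path


class Agent:
    def __init__(self, position):
        self.position = position
        self.pending_path = None   # Future for a path request in flight
        self.path = []

    def decide(self, pool, goal):
        """One decision step; it never blocks on the path finder."""
        if self.pending_path is not None:
            if not self.pending_path.done():
                return "waiting"   # free to do other work this tick
            self.path = self.pending_path.result()[1:]   # drop the start cell
            self.pending_path = None
        if self.path:
            self.position = self.path.pop(0)
            return "moving"
        if self.position != goal:
            self.pending_path = pool.submit(find_path, self.position, goal)
            return "requested-path"
        return "arrived"


def run(agent, goal):
    """Tick the agent until it arrives; the pool does the heavy lifting."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        while agent.decide(pool, goal) != "arrived":
            time.sleep(0.001)      # stand-in for the rest of the frame
    return agent.position
```
</pre>
<p>Calling <code>run(Agent((0, 0)), (3, 2))</code> walks the agent to the goal. The point of the structure is that only <code>find_path</code> needs to be parallelized as an algorithm, while the decision logic stays single-threaded and simply polls for results.</p>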
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/64/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mark Nelson&#8217;s Multicore Non-Panic and Embedded Systems</title>
		<link>http://jakob.engbloms.se/archives/59?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/59#comments</comments>
		<pubDate>Fri, 07 Dec 2007 20:44:46 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[blog commentary]]></category>
		<category><![CDATA[embedded]]></category>
		<category><![CDATA[multicore]]></category>
		<category><![CDATA[software tools]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/archives/59</guid>
		<description><![CDATA[Via thinkingparallel.com I just found an interesting article from last summer, about the actual non-imminence of the end of the computing world as we know it due to multicore. Written by Mark Nelson, the article makes some relevant and mostly correct claims, as long as we keep to the desktop land that he knows best. [...]]]></description>
			<content:encoded><![CDATA[<p>Via <a href="http://thinkingparallel.com">thinkingparallel.com</a> I just found an interesting article from last summer, about the actual non-imminence of the end of the computing world as we know it due to multicore.  <a href="http://marknelson.us/2007/07/30/multicore-panic/">Written by Mark Nelson, the article makes some relevant and mostly correct claims</a>, as long as we keep to the desktop land that he knows best.  So here is a look at these claims in the context of embedded systems.<br />
<span id="more-59"></span><br />
1. On the desktop, current few-way multicore solutions do seem to give immediate benefit thanks to the vast number of threads executing as background tasks and similar in a modern Windows installation.</p>
<p>2. A few more cores will be gobbled up by eye-candy work as Linux, Windows, and OS X keep fighting over which OS looks the best. And this means lots of easily parallelized threads.</p>
<p>3. Long-term, things look bleaker. How do we make use of a 32-way or even 128-way general-purpose machine?</p>
<p>In the embedded systems that I know and love, claim 1 certainly holds up in many cases.  Control-plane applications in core network and telecom systems do feature piles of threads today, and can quite easily be scaled out onto a few cores using SMP.  ARM has been arguing that the same holds for most of the mobile phone workloads that today run on single ARM cores: using an ARM multicore will work fine up to four cores, since there are ample threads to go around inside a modern phone.  All you need is for the OS to be SMP-capable, and that finally seems to be happening, with the last big RTOSes announcing SMP versions this fall.</p>
<p>Note that there is a different way of using initial multicores in the embedded world, by consolidating what used to be several processors onto a single chip.  Basically, using a dual-core processor as a natural replacement for two single-core processors, pretty much running the same workload. This scenario uses two (or more) different operating systems in AMP mode (see <a href="http://jakob.engbloms.se/archives/22">http://jakob.engbloms.se/archives/22</a>).  In this way, it is likely that quite a few systems can take advantage of 2-, 3-, 4-, and maybe even 8-way systems without much work.</p>
<p>Claim 2 makes no sense in the embedded field.  At least not in the sense that &#8220;your platform software provider will add end-user benefits that eat up more CPU and that does not require you to update your own code&#8221;.  Maybe you could claim this for mobile phones, but mainstream mobile phone OSes like Symbian have not exactly been aggressive on this front.  People don&#8217;t seem to be looking for eye candy of that kind in phones &#8212; currently at least (update: see comment on this, the iPhone could be changing this tenet).</p>
<p>Claim 3 is applicable.  At least for control-oriented applications that run on general-purpose shared-memory machines.  Unless you count on a continuation of the consolidation trend: imagine a system where you combine more and more boards from a current rack onto a single chip, or add &#8220;more boards&#8221; by adding in more AMP operating system instances.  It makes sense, since in many cases the actual applications feature ample parallelism that today is exploited by using multiple boards or discrete processing units working in close cooperation to handle the volumes of work present.</p>
<p>For media and radio interface applications, it is really easy to use &#8220;any&#8221; amount of parallelism.  But that is more similar to the GPUs used in current PCs than to the main processor(s) being discussed here.</p>
<p>Long-term, PC/desktop/server computing and embedded computing do share the challenge of using many cores effectively.  But the advantage of embedded computing is that most application domains are effectively parallel by nature, and &#8220;all&#8221; you have to do is find a way to move that parallelism onto a single chip.</p>
<p>His final statement is that:</p>
<blockquote><p> Our industry press thrives on a good crisis. The switch to multicore processors has presented the brain trust with the opportunity to drum up a convincing one, and they haven’t let us down. Just try to take it with a grain of salt. The crises we’ve had in the past have mostly been resolved with boring, step-wise evolution, and this one will be no different. Maybe 15 or 20 years from now we’ll be writing code in some new transaction based language that spreads a program effortlessly across hundreds of cores. Or, more likely, we’ll still be writing code in C++, Java, and .Net, and we’ll have clever tools that accomplish the same result.</p></blockquote>
<p>I think he is right about this, and that the end result will be a set of fairly ugly domain-specific frameworks that make parallel programming reasonably easy. Just like GUI coding frameworks popped up when GUIs were new, relieving you of the tediousness of writing all the plumbing code. But it took a few years to nail down what was to go into a framework and their&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/59/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>FTF Paris: Debug connections threat to secure network devices</title>
		<link>http://jakob.engbloms.se/archives/38?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/38#comments</comments>
		<pubDate>Thu, 11 Oct 2007 12:18:34 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[blog commentary]]></category>
		<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[embedded]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[software tools]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/archives/38</guid>
		<description><![CDATA[In a report from FTF Paris 2007, Info World makes some interesting comments on security and locking-down of mobile devices. Info World &#187; Blog Archive &#187; &#8216;Flat IP&#8217; mobile networks face new security challenges: Freescale demonstrated a hardware reference platform with a number of security features for future mobile devices, its i.MX31 and i.MX31L multimedia [...]]]></description>
			<content:encoded><![CDATA[<p>In a report from FTF Paris 2007, Info World makes some interesting comments on security and locking-down of mobile devices. <a href="http://infoworld.bareinfo.com/archives/778">Info World &#187; Blog Archive &#187; &#8216;Flat IP&#8217; mobile networks face new security challenges:</a></p>
<blockquote><p><span id="more-38"></span><em>Freescale demonstrated a hardware reference platform with a number of security features for future mobile devices, its i.MX31 and i.MX31L multimedia applications processors. Based on the Arm 11 core designed by Arm Holdings, the chips have a run-time integrity checker that verifies the digital signature of code before executing it. This can help stop malware sneaking onto the device &#8212; although it could also be used to lock down a mobile device and prevent the installation of third-party applications, much as Apple has attempted to do with its iPhone.</em></p></blockquote>
<blockquote><p><em>Prototypes are often designed with additional standard circuitry to make it easier to observe their behavior under test. Probes applied to that circuitry, known as a JTAG interface, can even be used to issue debugging instructions to the microprocessor. The connections for the prototype&#8217;s JTAG interface often survive &#8212; in different positions on the circuit board &#8212; right through to final production. Identifying where these points were located on Apple&#8217;s iPhone became one of the goals of those trying to unlock the devices as access to it might have allowed them to debug Apple&#8217;s security code.</em></p></blockquote>
<p>This is the same concern that was expressed in <a href="http://www.strombergson.com/kryptoblog/">Strombergson&#8217;s</a> comments on <a href="http://jakob.engbloms.se/archives/17">my post on hardware support for parallel programming</a>: remnants of the debug support used during development can be exploited after deployment to hack into the device.</p>
<p>And that is pretty hard to get around if you assume you want to do debugging on the device, which I guess is needed for almost all devices, if nothing else in order to analyze performance on actual hardware. Otherwise, I do believe that virtual prototype platforms and simulators like Simics are a key technology for developing safe applications safely &#8212; using a virtual system for debug, you have no need for debug backdoors on the final hardware. That backdoor is only there in the virtual hardware, not in the physical manifestation of the hardware. Of course, it is then key that the virtual debugger cannot be used by bad guys to break into the software. I think that can be worked around by making it impossible to get a complete system image off of a target system, which brings us back to physical security.</p>
<p>I think we need some kind of thinking akin to the key tenet of crypto theory (Kerckhoffs&#8217;s principle): information should be safe even if all algorithms and mechanisms involved in encrypting it are known. The secrecy of the key is all that is needed. In the same vein, we need to ensure that a piece of software running on a particular piece of hardware is protected against access and intrusion even if the attacker gets access to the source code of the program and a simulator for the hardware, or even debug hardware. There has to be some kind of &#8220;key&#8221; mechanism that can be used to ensure this.</p>
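<p>As a rough illustration of the tenet that only the key needs to be secret, here is a minimal Python sketch. It uses an HMAC from the standard library as a stand-in for a real code-signing scheme; it is not how the i.MX31 integrity checker works, and the names <code>sign_image</code>, <code>verify_image</code>, <code>device_key</code> and <code>firmware</code> are made up for the example. The algorithm (HMAC-SHA256) is completely public, and an attacker who knows it still cannot produce a valid tag without the key.</p>

```python
import hashlib
import hmac
import secrets


def sign_image(key, image):
    """Compute an authentication tag over a code image with a secret key."""
    return hmac.new(key, image, hashlib.sha256).digest()


def verify_image(key, image, tag):
    """Return True only if the tag matches the image under this key."""
    expected = sign_image(key, image)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, tag)


# Hypothetical device key and firmware image, for illustration only.
device_key = secrets.token_bytes(32)
firmware = b"example firmware image"
tag = sign_image(device_key, firmware)

print(verify_image(device_key, firmware, tag))                # genuine image
print(verify_image(device_key, firmware + b"!", tag))         # tampered image
print(verify_image(secrets.token_bytes(32), firmware, tag))   # wrong key
```

<p>In this sketch everything except <code>device_key</code> could be published, which is the property I am asking for above. Where that key lives on a real device, and how it is kept away from a debug port, is exactly the hard part that the sketch does not address.</p>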
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/38/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Power.Org Dev Con: C Domination a Problem for Multicore</title>
		<link>http://jakob.engbloms.se/archives/35?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/35#comments</comments>
		<pubDate>Sun, 30 Sep 2007 08:34:01 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[blog commentary]]></category>
		<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[multicore]]></category>
		<category><![CDATA[software tools]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/archives/35</guid>
		<description><![CDATA[I just read an EETimes report from a panel at the Power.org Developers Conference (actually, it is more accurately called the Power Architecture Developers Conference, or PADC), about programming multicore processors for the embedded market. Note that I was not there in person, so I can only take the few quotes in the article and [...]]]></description>
			<content:encoded><![CDATA[<p>I just read an <a href="http://www.eetimes.com/news/latest/showArticle.jhtml;?articleID=202102427">EETimes report</a> from a panel at the <a href="http://www.power.org/devcon/07">Power.org Developers Conference</a> (actually, it is more accurately called the Power Architecture Developers Conference, or PADC), about programming multicore processors for the embedded market. Note that I was not there in person, so I can only take the few quotes in the article and comment on them. The main conclusions are that:</p>
<ul>
<li>C/C++ is going to be the dominant language for embedded for the near future. Nothing really surprising about that.</li>
<li>C/C++ being dominant means that parallelism in multicore processors, especially shared-memory systems, will be harder to exploit. That is certainly true.</li>
<li>Tool vendors have no good idea about what to do next.</li>
<li>You cannot expect to get traction with a new language.</li>
</ul>
<p>In a sense, the panel is blaming the market for not having the good sense to adopt new tools to tackle multicore.</p>
<p>I don&#8217;t think things have to be that bleak.</p>
<p><span id="more-35"></span></p>
<p>I do believe that there are some hopeful points that indicate that engineers and vendors are beginning to move towards ways to exploit multicore:</p>
<p>1. Shared-memory APIs like OpenMP and Pthreads are becoming more widely available, even if that does not really help much, considering that shared-memory programming itself is pretty broken.</p>
<p>2. I do not quite agree with the quote by Erik Heikkila that embedded is a trailing-end adopter of new programming languages:</p>
<blockquote>
<blockquote><p><em>&#8220;The inability of C/C++ code to parallelize coupled with its ubiquity throughout the embedded market is a major issue for multi-core going forward,&#8221; Heikkila wrote in a follow up email to EE Times. &#8220;Any alternative parallel programming languages certainly won&#8217;t materialize in the embedded market, but instead will more likely gain momentum in a more mainstream computing market before making its way into embedded applications,&#8221; he added.</em></p></blockquote>
</blockquote>
<p>Au contraire, the embedded market has many times adopted good solutions that are very different from mainstream software practice.</p>
<ul>
<li>High-level model-driven tools like Matlab/Simulink and NI LabView are starting to generate parallel code from a higher-level abstraction. This is one very important way forward.</li>
<li>The use of UML, SDL, and similar modeling/programming languages in the telecom sector should open the way for better code generators that go straight to parallel code.</li>
<li>Erlang is a good example of a good language for parallel systems that started in embedded and is now going mainstream.</li>
</ul>
<p>So thanks to the domain-specific nature of most embedded work, tools can be created that support the creation of parallel implementations without forcing a programmer to care about shared memory and C. And the tools already exist.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/35/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Comment on Joel Spolsky and Programming to &#8220;Moore&#8217;s Law&#8221;</title>
		<link>http://jakob.engbloms.se/archives/32?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/32#comments</comments>
		<pubDate>Sat, 29 Sep 2007 13:49:41 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[blog commentary]]></category>
		<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[multicore]]></category>
		<category><![CDATA[software tools]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/archives/32</guid>
		<description><![CDATA[Joel Spolsky is always worth a read, and in his post Strategy Letter VI he has a lot of smart things to say about how to consider programming. His basic message is that if you optimize your code too much to work well and fit in the memory of a current machine, by the time [...]]]></description>
			<content:encoded><![CDATA[<p>Joel Spolsky is always worth a read, and in his post <a href="http://www.joelonsoftware.com/items/2007/09/18.html">Strategy Letter VI</a> he has a lot of smart things to say about how to consider programming. His basic message is that if you optimize your code too much to work well and fit in the memory of a current machine, by the time that you are done, you find yourself run over by competitors that just assumed machines would be faster and used the same programming time to implement cooler products.</p>
<p>I just have to take issue with this.</p>
<p><span id="more-32"></span><br />
The assumption behind this is that CPU power and memory sizes just keep increasing as time goes on, and that you can take this for granted. But that is exactly what is not happening right now, if you count single-thread performance. You have to be parallel to get this benefit. For me, this means that I would be a bit more careful, unless I was certain that my system and my programmers were really clear on how to make the application in question scale out well as cores multiply but single-thread performance stays the same.</p>
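<p>To put rough numbers on what scaling out with more cores can buy, here is a small Python sketch of Amdahl&#8217;s law, the standard upper bound on speedup when only part of a program can run in parallel. Neither the law nor the figures come from Joel&#8217;s post; the function name <code>amdahl_speedup</code> and the 90% parallel fraction are assumptions picked purely for illustration.</p>

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup from Amdahl's law.

    parallel_fraction: share of the single-core run time that can be
    spread over several cores (between 0 and 1).
    cores: number of cores, with single-thread speed held constant.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical application where 90% of the work can be parallelized.
for cores in (1, 2, 4, 8, 64):
    print(cores, round(amdahl_speedup(0.9, cores), 2))
```

<p>With a 90% parallel fraction the speedup can never exceed 10, no matter how many cores are added, because the serial 10% runs no faster than before. An application that is not written to be parallel at all gets nothing from the extra cores.</p>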
<p>To be fair, Joel also points out that a big breakthrough for him would be a better way to do JavaScript that increases programmer efficiency and avoids cross-platform difficulties (use some other language frontend and generate JavaScript optimized for a particular browser platform). That kind of software innovation sounds a whole lot more plausible to me as a bet than just continuing to write bloated, inefficient JavaScript.</p>
<p>But on the idea that faster hardware will let horribly large JavaScript programs run better: I have a hard time seeing that if more compute power is delivered by parallel machines rather than by faster single-thread performance. Parallel JavaScript seems like a really bad idea. Threaded programs running inside my browser? Written in a horrible language like JavaScript? I can just see it crashing and crashing and crashing all the time. No thanks.</p>
<p>Another problem with Joel&#8217;s idea is that sometimes the hardware platform you target exists now and is fixed. If you code for a particular mobile phone that is on the market currently, you cannot expect it to magically improve its speed over time. It is here and you have to live with that. Same thing for certain embedded applications where hardware remains stable for a long time due to certification and safety concerns. Not everyone can enjoy continuous hardware upgrades&#8230;</p>
<p>Finally, sandboxing is GOOD. Not bad as Joel says. If there is one thing that it makes sense to &#8220;waste&#8221; processor cycles and memory on, it is security. Isolating software from other software and keeping code coming off of the Internet tightly locked up in a little sandbox is a good thing. If we did things more like that, we would have far fewer security problems.</p>
<p>I agree with Steve Gibson of grc.com and Security Now &#8212; JavaScript is a very bad idea from a security perspective. But it does make for nice dancing applications on the web. Too bad we cannot both have our cake and eat it, too.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/32/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

