<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Observations from Uppsala &#187; memory bandwidth</title>
	<atom:link href="http://jakob.engbloms.se/archives/tag/memory-bandwidth/feed" rel="self" type="application/rss+xml" />
	<link>http://jakob.engbloms.se</link>
	<description>Computer Technology: Simulation, Virtualization, Virtual Platforms, Embedded, Multicore and Multiprocessing (by Jakob Engblom)</description>
	<lastBuildDate>Sun, 29 Jan 2012 19:45:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<image>
    <title>Observations from Uppsala</title>
    <url>http://jakob.engbloms.se/favicon.png</url>
    <link>http://jakob.engbloms.se</link>
    <width>32</width>
    <height>32</height>
    <description>Observations from Uppsala - http://jakob.engbloms.se</description>
    </image>		<item>
		<title>&#8220;Nulticore Effect&#8221;</title>
		<link>http://jakob.engbloms.se/archives/447?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/447#comments</comments>
		<pubDate>Tue, 09 Dec 2008 19:50:08 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[embedded systems]]></category>
		<category><![CDATA[multicore computer architecture]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[Embarrassingly Parallel]]></category>
		<category><![CDATA[IEEE Spectrum]]></category>
		<category><![CDATA[Jack Ganssle]]></category>
		<category><![CDATA[manycore]]></category>
		<category><![CDATA[memory bandwidth]]></category>
		<category><![CDATA[Sandia Labs]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=447</guid>
		<description><![CDATA[Jack Ganssle wrote a column about the failure of multicore to scale, based on an article in IEEE Spectrum. He makes the following claim: Now a study in IEEE Spectrum shows that even for the classic embarrassingly parallel problems like weather simulations multicore offers little benefit. The curve in that article is priceless. As the [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-270" title="onoff" src="http://jakob.engbloms.se/wp-content/uploads/2008/09/onoff.png" alt="" width="72" height="70" />Jack Ganssle <a href="http://www.embedded.com/columns/breakpoint/212300032">wrote a column about the failure of multicore to scale</a>, based on an <a href="http://www.spectrum.ieee.org/nov08/6912">article in IEEE Spectrum</a>. He makes the following claim:</p>
<blockquote><p>Now a <a style="font-weight: bold;" href="http://www.spectrum.ieee.org/nov08/6912">study in IEEE Spectrum</a> shows that even for the classic embarrassingly parallel problems like weather simulations multicore offers little benefit. The curve in that article is priceless. As the number of cores grow from two to 64 performance plummets by a factor of five. Additional processors nullify each other.</p>
<p>Call it the <span style="font-weight: bold;">Nulticore Effect.</span></p>
<p><span id="more-447"></span></p></blockquote>
<p>I think that Jack misunderstood some of the article. What it really says, as far as I can tell, is that certain types of applications will have problems with the lower external memory bandwidth per core afforded by a 16-way or 32-way multicore based on traditional processor architectures.</p>
<p>As I read it, regular classic &#8220;embarrassingly parallel&#8221; (or as Grant Martin would say, &#8220;proudly parallel&#8221;) problems can be handled by managing data location and computation location carefully to colocate data and code, which lends itself to on-chip caching and probably also local-memory architectures.</p>
<p>Other problems that are less regular, however, are going to run into the memory bandwidth wall:</p>
<blockquote><p>But an increasing number of important science and engineering problems&#8212;not to mention national security problems&#8212;are of a different sort. These fall under the general category of informatics and include calculating what happens to a transportation network during a natural disaster and searching for patterns that predict terrorist attacks or nuclear proliferation failures. These operations often require sifting through enormous databases of information.</p></blockquote>
<p>So while I think the Sandia people have a very good point to make, it is not the end of the usefulness of multicore. The problem applies only to bandwidth-intensive irregular algorithms, while many systems today make good use of hundreds of cores without a problem. Also, the research described in IEEE Spectrum proposes a solution in the form of stacked memory &#8212; so what we really have is a bit of PR for a particular kind of architecture&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/447/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SiCS Multicore Days: The Debate Points</title>
		<link>http://jakob.engbloms.se/archives/283?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/283#comments</comments>
		<pubDate>Fri, 19 Sep 2008 20:14:24 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[conferences]]></category>
		<category><![CDATA[multicore computer architecture]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[heterogeneous]]></category>
		<category><![CDATA[homogeneous]]></category>
		<category><![CDATA[memory bandwidth]]></category>
		<category><![CDATA[multicore]]></category>
		<category><![CDATA[panel discussion]]></category>
		<category><![CDATA[SiCS Multicore days]]></category>
		<category><![CDATA[software tools]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=283</guid>
		<description><![CDATA[It is a week ago now, and sometimes it is good to let impressions sink in and get processed a bit before writing about an event like the SiCS Multicore Days. Overall, the event was serious fun, and I found the speakers very insightful and the panel discussion and audience questions added even more information. [...]]]></description>
			<content:encoded><![CDATA[<p>It is a week ago now, and sometimes it is good to let impressions sink in and get processed a bit before writing about an event like the SiCS Multicore Days. Overall, the event was serious fun, and I found the speakers very insightful and the panel discussion and audience questions added even more information.</p>
<p><span id="more-283"></span></p>
<p>What was quite striking this year was the greater difference of opinion between the speakers. I guess that in 2007, most of the discussion was on the level of &#8220;ouch, here comes multicore and what are we going to do about it&#8221;. This year, we got a bit deeper: with one more year of experience and massive research work, the collective world of multicore has made some progress and gained insights. And that&#8217;s when the differences start to show up; the fact that we have differences of opinion tells us that we are starting to dig into details and turning up different answers due to different viewpoints and user experiences.</p>
<p>So where were the differences this time?</p>
<ul>
<li>Heterogeneous vs homogeneous cores (on a single chip). Kunle Olukotun clearly supported the heterogeneous style (which is what you get with Sun&#8217;s Niagara, for which he designed the basis). Erik Hagersten was more interested in the difference between thin and fat cores of the same basic ISA, and Anant Agarwal was strongly in favor of completely homogeneous systems (which is what they build at Tilera). In my biased view, the pure energy-efficiency argument for heterogeneous designs is always going to prevail. See some of my previous blog posts on this topic for background:
<ul>
<li><a href="http://jakob.engbloms.se/archives/222">DNS Hardware Acceleration</a>.</li>
<li><a href="http://jakob.engbloms.se/archives/157">Interview with Kunle Olukotun at the Register</a>.</li>
<li><a href="http://jakob.engbloms.se/archives/44">Homogeneous vs heterogeneous</a>.</li>
<li><a href="http://jakob.engbloms.se/archives/90">Homogeneous vs heterogeneous, continued</a>.</li>
<li><a href="http://jakob.engbloms.se/archives/80">IBM Z6 accelerators</a>.</li>
<li><a href="http://jakob.engbloms.se/archives/77">Montalvo and heterogeneous x86</a>.</li>
</ul>
</li>
<li>Domain-specific vs general-purpose programming languages. The same sides here, with Kunle advocating domain-specific languages, and Anant and David Padua more in the general-purpose camp. I like domain-specific better; it seems to fit better with what I see people actually doing today to increase programming productivity overall.</li>
<li>Memory bottleneck or not? The most interesting discussion came when memory bandwidth and cache sizes were discussed. One quite common school of thought over the past few years teaches that caches per core will shrink, and that the bandwidth to get data into and out of a chip is going to be a severe restriction on what can be done. Not all in the panel agreed with this; there was the idea (mostly from Kunle) that in some way the massive bandwidths and low latencies achievable within a chip (compared to between chips in a classic discrete-processor multiprocessor) could make this less of a problem. Personally, I think this is going to be some kind of problem, but maybe not as severe as feared, since passing data around faster might reduce the need to store it temporarily. Despite the need for more bandwidth, nobody really agreed with Erik&#8217;s thought that maybe it makes sense to build chips that do not max out on the number of cores they contain, but rather try to balance core count with achievable IO bandwidth. That idea has some merit.</li>
<li>Core counts. Moore&#8217;s law tells us there are going to be thousands of cores on a chip fairly soon&#8230; but if we do not manage to make good use of them, maybe the growth in core counts will slow soon. Putting four or six or eight cores into a general-purpose system makes sense today, but more than that might turn out to be a waste for the vast majority of users, who do not have problems to solve and programs to run that can make use of more than that. In the same sense, maybe it is better to have slightly fewer, more powerful cores than a maximum number of minimalistic cores, considering the state of software available today. So it sounds like a fairly divergent future here.</li>
<li>Shared memory or local memories? Most of the panel seemed to be in the camp proposing that shared memory is too convenient not to have, even when it really is bad for you. There were several bad jokes comparing shared memory to alcohol, with the moderator of the panel suggesting that a good way to avoid the hangover of shared memory is to stay drunk&#8230; whatever that means in practice.</li>
</ul>
<p>Some things were generally agreed upon, though.</p>
<ul>
<li>Programming is an issue, shared-memory or local-memory or whatever. The ideas for the solution varied, however, as discussed above.</li>
<li>Cores will be plentiful, and operating systems focused on sharing time on a single very valuable core are an idea of the past. The keyword for the future is spatial sharing and reducing the overhead of management (I have some previous blog posts on this topic, especially on the <a href="http://jakob.engbloms.se/archives/58">subject of IMA</a> and <a href="http://jakob.engbloms.se/archives/123">real-time control when cores are free</a>).</li>
<li>Virtualization and isolating partitions of a multicore chip from each other are necessary mechanisms. Running multiple different operating systems on a single chip will be quite normal, probably under the control of some global hypervisor.</li>
</ul>
<p>Any comments on this from my small audience? I think the topics under discussion are quite fascinating and the kind of issues on which the success of major chip design projects will be decided. A good architecture with a good programming model has a great chance of success (as long as it looks like a continuation of something existing <img src='http://jakob.engbloms.se/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> ).</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/283/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>

