<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Observations from Uppsala &#187; kunle olukotun</title>
	<atom:link href="http://jakob.engbloms.se/archives/tag/kunle-olukotun/feed" rel="self" type="application/rss+xml" />
	<link>http://jakob.engbloms.se</link>
	<description>Computer Technology: Simulation, Virtualization, Virtual Platforms, Embedded, Multicore and Multiprocessing (by Jakob Engblom)</description>
	<lastBuildDate>Sun, 29 Jan 2012 19:45:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<image>
    <title>Observations from Uppsala</title>
    <url>http://jakob.engbloms.se/favicon.png</url>
    <link>http://jakob.engbloms.se</link>
    <width>32</width>
    <height>32</height>
    <description>Observations from Uppsala - http://jakob.engbloms.se</description>
    </image>		<item>
		<title>The JVM as Universal Parallel Glue?</title>
		<link>http://jakob.engbloms.se/archives/264?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/264#comments</comments>
		<pubDate>Fri, 12 Sep 2008 20:45:08 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[conferences]]></category>
		<category><![CDATA[multicore computer architecture]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Domain-specific languages]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[jvm]]></category>
		<category><![CDATA[kunle olukotun]]></category>
		<category><![CDATA[multicore]]></category>
		<category><![CDATA[SiCS Multicore days]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=264</guid>
		<description><![CDATA[The two days of the SiCS Multicore Days are now over, and it was a really fun event this year too. I will be writing a few things inspired by the event, and here is the first. Kunle Olukotun&#8217;s presentation on the work of the Stanford Pervasive Parallelism lab included a diagram where they showed [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-265" style="margin: 5px 10px;" title="javalogo" src="http://jakob.engbloms.se/wp-content/uploads/2008/09/javalogo.png" alt="" width="40" height="74" /><em>The two days of the SiCS Multicore Days are now over, and it was a really fun event this year too. I will be writing a few things inspired by the event, and here is the first.</em></p>
<p><a href="http://ogun.stanford.edu/~kunle/">Kunle Olukotun</a>&#8217;s presentation on the work of the <a href="http://ppl.stanford.edu/">Stanford Pervasive Parallelism Lab</a> included a diagram showing a range of domain-specific languages (DSLs) being compiled to a universal implementation language. That language is currently Scala, and in the end all applications end up being compiled into Scala byte codes, which are then optimized, dynamically reoptimized, and executed on a particular hardware system based on the properties of that system. Fundamentally, the problem of creating and compiling a DSL, and combining program segments written in different DSLs, is solved by interposing a layer of indirection.</p>
<p>But this idea got me thinking about what the best such intermediary might be for large-scale general deployment.</p>
<p><span id="more-264"></span></p>
<p>And my conclusion is that the Java Virtual Machine might be the best candidate. Not the JVM as it is today, though. Here is my idea:</p>
<ul>
<li>The Sun Java JDK and its optimized HotSpot VM are now open source, thanks to <a href="http://www.openjdk.org/">OpenJDK</a>. This opens the door to new innovation based on solid technology.</li>
<li>HotSpot is a pretty good VM, and other languages are therefore starting to use it as a backend. For example, Python can be compiled to the JVM, as can <a href="http://en.wikibooks.org/wiki/Ada_Programming/Platform/VM/Java">Ada</a>, and I expect many other language environments to follow suit. The reason is that developing and optimizing a VM is hard work, and if a good one already exists, targeting it is easier than building your own.</li>
<li>I think that long-term, this might well replace C as the universal language that you target when you do special-purpose code generators from custom languages&#8230; which are really DSLs.</li>
<li>Thanks to this foreseen ubiquity of the OpenJDK JVM as a universal byte-code execution machine, it will provide a single point of leverage across a large range of applications in a multitude of programming languages.</li>
<li>As demonstrated by the work of the PPL and the approach taken by RapidMind, using an abstract byte code for software delivery makes a lot of sense in a heterogeneous and networked environment. It also provides a good infrastructure for analysis and optimization. It is simply very sensible.</li>
</ul>
<p>However, the JVM as it stands today is not really suitable for this. It will need some extensions, which I am not the man to invent. With an open-source common JVM, such innovation will be easier to do (thanks to Sun for opening up Java!). Some examples of what is needed:</p>
<ul>
<li>Support for dynamic languages like Ruby and Python, without the same dependence on Java-style static typing and Java types. Those work well for Java, but less so for other languages. It would also be nice to have real built-in lists, and not just a library container.</li>
<li>Support for threads. Not OS threads, but the very light-weight threads typically used in environments like <a href="http://www.erlang.org/">Erlang</a> and <a href="http://www.mozart-oz.org/">Oz</a>. Or even lighter, like the serial units of computation in the kernels of RapidMind, CUDA, and similar GPGPU efforts.</li>
<li>Support for SIMD operations, to express data-level parallelism which is often pretty easy to find on a source-code level.</li>
<li>Support for data blocking, locality, tiling of some kind, to control data locality. Maybe this already exists in X10 (which I heard about at last year&#8217;s <a href="http://jakob.engbloms.se/archives/17">Multicore Day</a>).</li>
<li>Support for communication using messages. I assume that the best model for expressing the threads is local data, share-nothing, and message passing, with a special case for sharing large data blocks.</li>
<li>Some kind of data sharing mechanism that is more structured and understandable for a runtime system than pure locks &amp; shared data.</li>
<li>And a system that takes such an advanced byte code and makes it run well on any particular machine, be it a Tilera Tile, an 8-core P4080, a 128000-core BlueGene, a GPGPU, or a plain middle-of-the-road UltraSparc T2.</li>
</ul>
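<p>To make the message-passing point concrete, here is a minimal sketch of share-nothing communication on the JVM as it stands today. It is only an illustration under stated assumptions: the class <code>ShareNothingSketch</code> and the method <code>sumViaMessages</code> are hypothetical names, and the worker is an ordinary OS-backed <code>java.lang.Thread</code> talking through <code>java.util.concurrent</code> queues, which is exactly the heavyweight mechanism the wished-for extensions would replace.</p>

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ShareNothingSketch {
    // Sentinel message that tells the worker to stop (an assumption of this sketch).
    private static final long STOP = -1L;

    // Hypothetical helper: sums 1..n in a worker that shares no mutable state
    // with the caller. All communication goes through the two queues.
    static long sumViaMessages(int n) {
        BlockingQueue<Long> inbox = new ArrayBlockingQueue<>(16);
        BlockingQueue<Long> reply = new ArrayBlockingQueue<>(1);

        Thread worker = new Thread(() -> {
            long total = 0; // local to the worker, never shared
            try {
                while (true) {
                    long msg = inbox.take();
                    if (msg == STOP) {
                        break;
                    }
                    total += msg;
                }
                reply.put(total); // the result also travels as a message
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        try {
            for (long i = 1; i <= n; i++) {
                inbox.put(i); // blocks when the bounded inbox is full
            }
            inbox.put(STOP);
            long result = reply.take();
            worker.join();
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sumViaMessages(100)); // prints 5050
    }
}
```

<p>The worker keeps its running total in a local variable, and the only things crossing the thread boundary are the messages themselves. A runtime that understood this pattern natively could presumably schedule such workers far more cheaply than OS threads.</p>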
<p>So there is some work to be done. But I really think this idea has some merit&#8230; if only I had research funding and some good students. Or a crazy VC. <img src='http://jakob.engbloms.se/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/264/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Kunle Olukotun Interview: Heterogeneity, Domain-Specific Programming</title>
		<link>http://jakob.engbloms.se/archives/157?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/157#comments</comments>
		<pubDate>Sun, 20 Jul 2008 20:44:49 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[multicore computer architecture]]></category>
		<category><![CDATA[CUDA]]></category>
		<category><![CDATA[Domain-specific languages]]></category>
		<category><![CDATA[DSP]]></category>
		<category><![CDATA[heterogeneous]]></category>
		<category><![CDATA[kunle olukotun]]></category>
		<category><![CDATA[Motorola]]></category>
		<category><![CDATA[Niagara]]></category>
		<category><![CDATA[QUICC]]></category>
		<category><![CDATA[Stanford Pervasive Parallelism Laboratory]]></category>
		<category><![CDATA[Sun]]></category>
		<category><![CDATA[TI]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=157</guid>
		<description><![CDATA[The Radio Register has a nice interview with Kunle Olukotun, the man best known for the Afara/Sun Niagara/UltraSparc T1-2-etc. design. It is a long interview, lasting well over an hour, but it is worth a listen. A particular high point is the story of how Kunle worked on parallel processors in the mid-1990s when everyone [...]]]></description>
			<content:encoded><![CDATA[<p><img class="size-full wp-image-115 alignleft" style="margin: 10px;" src="http://jakob.engbloms.se/wp-content/uploads/2008/05/rtfm_logo.png" alt="TheRegister Radio Logo" width="48" height="48" />The <a href="http://www.theregister.co.uk/2008/07/18/scc18_kunle_olukotun_ppl/">Radio Register has a nice interview</a> with <a href="http://ogun.stanford.edu/~kunle/">Kunle Olukotun</a>, the man best known for the Afara/Sun Niagara/UltraSparc T1-2-etc. design. It is a long interview, lasting well over an hour, but it is worth a listen. A particular high point is the story of how Kunle worked on parallel processors in the mid-1990s when everyone else was still chasing single-thread performance. He really was a very early proponent of multicore, and saw it coming a bit before most other (general-purpose) computer architects did. Currently, he is working on how to program multiprocessors, at the <a href="http://ppl.stanford.edu/">Stanford Pervasive Parallelism Laboratory (PPL)</a>. In the interview, I see several themes that I have blogged about before being reinforced&#8230;</p>
<p><span id="more-157"></span></p>
<p>The themes are:</p>
<ul>
<li>The way forward is to provide programming environments that express algorithms in a way that is clear in terms of intent but that does not constrain the implementation too much. The environment should hide parallelism from the user, and make it a property of the compiler and implementation, not the program. But to make this work, you need higher-level expressivity. For example, Kunle says that you want to say &#8220;do a matrix multiply&#8221; rather than &#8220;here is a pile of loops and expressions implementing a matrix multiply&#8221;. Very sensible.</li>
<li>The proper way to do parallel programming is to be domain-specific. Create domain-specific languages that map to how domain experts think about a problem, and then have a compiler and runtime take care of how to implement it for a particular machine. Rather than providing a low-level language that lets you express parallelism explicitly, like CUDA, Brook, threading libraries, etc., you should be using MATLAB if you are doing science (which is something that National Instruments has been doing with its LabVIEW tools).</li>
<li>Future hardware will be heterogeneous, with a mix of control-oriented simple cores (Niagara-style), data-processing-oriented cores (DSPs, security accelerators, etc.), and the occasional heavy-weight ILP-oriented core (Intel Core 2-style) for the occasional program that just needs maximal single-thread performance.</li>
<li>Operating systems today were created in an era when processors were precious resources that had to be efficiently shared. This is no longer really the case; it makes sense to dedicate single cores to single tasks for extended periods of time, as this maximizes efficiency. There are cores to spare, so no problem on that score. Nice to hear this from a general-purpose proponent, and not just from embedded people.</li>
</ul>
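<p>As a rough illustration of the first theme, the Java sketch below separates intent from implementation. The names <code>MatMulIntent</code> and <code>multiply</code> are made up for this example, and using a parallel stream over rows is just one possible implementation choice, not something taken from the interview.</p>

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class MatMulIntent {
    // The caller only states intent: "multiply a by b".
    // How the rows are scheduled is hidden behind this method.
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        int k = b.length;
        int m = b[0].length;
        double[][] c = new double[n][m];
        // Each output row is independent of the others, so this
        // implementation is free to compute the rows in parallel.
        IntStream.range(0, n).parallel().forEach(i -> {
            for (int j = 0; j < m; j++) {
                double sum = 0.0;
                for (int p = 0; p < k; p++) {
                    sum += a[i][p] * b[p][j];
                }
                c[i][j] = sum;
            }
        });
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        // prints [[19.0, 22.0], [43.0, 50.0]]
        System.out.println(Arrays.deepToString(multiply(a, b)));
    }
}
```

<p>The caller writes <code>multiply(a, b)</code> and never mentions threads. Swapping the parallel stream for a sequential loop, or for some other backend, would not change the calling code, which is the kind of freedom the first theme asks for.</p>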
<p>I fully agree with all of these, and I find the use of domain-specific languages and frameworks especially important. Software people in every era have become more efficient by designing better languages, more suited to a particular task than a very general language. A language is just a tool, after all, not a religion (even if some people seem to view it that way), and should be changed depending on the task at hand.</p>
<p>Final note for us embedded folks: when Kunle talks about how he realized around 1995 that lots of simple processors on a chip were becoming feasible thanks to Moore&#8217;s law, the first real multicore chips were already shipping. Motorola had the QUICC 68000+CPM heterogeneous network processors on the market by then, and Texas Instruments had the C80 four-simple-DSPs-plus-a-simple-RISC multicore chip for digital video also out. But the <a href="http://en.wikipedia.org/wiki/International_Symposium_on_Computer_Architecture">ISCA</a> crowd really did not notice this development at all at the time.</p>
<p>Obscure note two: Virtutech Simics was actually used by Afara to help in the design work of the Niagara, since Simics had very good support for 64-bit SPARC architectures thanks to the early work done with Sun.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/157/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

