<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Observations from Uppsala &#187; ARM</title>
	<atom:link href="http://jakob.engbloms.se/archives/tag/arm/feed" rel="self" type="application/rss+xml" />
	<link>http://jakob.engbloms.se</link>
	<description>Computer Technology: Simulation, Virtualization, Virtual Platforms, Embedded, Multicore and Multiprocessing (by Jakob Engblom)</description>
	<lastBuildDate>Sun, 29 Jan 2012 19:45:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<image>
    <title>Observations from Uppsala</title>
    <url>http://jakob.engbloms.se/favicon.png</url>
    <link>http://jakob.engbloms.se</link>
    <width>32</width>
    <height>32</height>
    <description>Observations from Uppsala - http://jakob.engbloms.se</description>
    </image>		<item>
		<title>Steve Furber: Emulated BBC Micro on Archimedes on PC</title>
		<link>http://jakob.engbloms.se/archives/1476?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1476#comments</comments>
		<pubDate>Sat, 13 Aug 2011 09:35:43 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[history of computing]]></category>
		<category><![CDATA[ACORN Archimedes]]></category>
		<category><![CDATA[ARM]]></category>
		<category><![CDATA[BBC Micro]]></category>
		<category><![CDATA[Communications of the ACM]]></category>
		<category><![CDATA[Steve Furber]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1476</guid>
		<description><![CDATA[I just read an interview with Steve Furber, the original ARM designer, in the May 2011 issue of the Communications of the ACM. It is a good read about the early days of the home computing revolution in the UK. He not only designed the ARM processor, but also the BBC Micro and some other [...]]]></description>
			<content:encoded><![CDATA[<p>I just read an <a href="http://cacm.acm.org/magazines/2011/5/107684-an-interview-with-steve-furber/fulltext">interview with Steve Furber</a>, the original ARM designer, in the May 2011 issue of the Communications of the ACM. It is a good read about the early days of the home computing revolution in the UK. He not only designed the ARM processor, but also the <a href="http://en.wikipedia.org/wiki/BBC_Micro">BBC Micro </a>and some other early machines.</p>
<p><span id="more-1476"></span>The title of this blog post is based on one fun little tidbit in the article. Steve actually keeps a functioning BBC Micro system around in the shape of an emulator.  But the emulator is not running on his current PC directly, but on top an emulated Acorn Archimedes machine.  A machine from 1981 emulated on top of a machine from 1987 that is in turn emulated on top of a machine from this year.  A real-life example of how nested emulation can help save our digital heritage (an interesting topic I <a href="http://jakob.engbloms.se/archives/180">blogged about before</a>).</p>
<p>He also has a physical BBC Micro around, and it is impressive just how high-quality construction could be back then. I still believe the old IBM keyboards are the best ever made, while the cheap laptop I am writing this on probably costs less to manufacture than that keyboard did&#8230; obviously at the expense of keyboard feeling.  Grumble grumble kurmudgeon me, right? Steve loves the way the BBC Micro keyboard was designed, and claims that it could hold up to 10 years of abuse by children. In contrast, my old ZX Spectrum had some 4 keyboard membranes worn out in the six years or so I used it. But that is what cheap gives you.</p>
<p>He also touches on the topic of just how inaccessible computing is getting in terms of getting to grips with the machine and the hardware level. Back in the 1980s, you could pretty easily build things and attach them to your home computer.  Slow buses and low-level access and no stupid operating system getting in your way facilitated that.  He proposes people using a PIC development kit, but maybe an Arduino is a better choice.  I like LEGO Mindstorms, but that is indeed a very high-level view of hardware and robotics.  Not at all like the direct programming experience I had with <a href="http://jakob.engbloms.se/archives/758">my first ZX Spectrum back in 1984</a>.</p>
<p>It is also funny to read how 1983-era 16-bit processors were not keeping up with memory &#8211; too complex instructions made the machine use memory too rarely. How the world had changed!</p>
<p>Recommended reading!</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1476"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1476" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1476" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1476/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Memory Models: x86 is TSO, TSO is Good</title>
		<link>http://jakob.engbloms.se/archives/1435?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1435#comments</comments>
		<pubDate>Wed, 22 Jun 2011 15:16:35 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[multicore computer architecture]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[parallel computing]]></category>
		<category><![CDATA[ARM]]></category>
		<category><![CDATA[Doug Lea]]></category>
		<category><![CDATA[Francesco Zappa Nardelli]]></category>
		<category><![CDATA[memory consistency]]></category>
		<category><![CDATA[power architecture]]></category>
		<category><![CDATA[SPARC]]></category>
		<category><![CDATA[UpMarc]]></category>
		<category><![CDATA[x86]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1435</guid>
		<description><![CDATA[By chance, I got to attend a day at the UPMARC Summer School with a very enjoyable talk by Francesco Zappa Nardelli from INRIA. He described his work (along with others) on understanding and modeling multiprocessor memory models. It is a very complex subject, but he managed to explain it very well. He showed a [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/11/UPMARC_700x150.gif"><img class="size-full wp-image-1016 alignleft" title="UPMARC_700x150" src="http://jakob.engbloms.se/wp-content/uploads/2009/11/UPMARC_700x150.gif" alt="" width="122" height="45" /></a>By chance, I got to attend a day at the <a href="http://www.it.uu.se/research/upmarc/events/SS2011/Programme.html">UPMARC Summer School</a> with a very enjoyable talk by <a href="http://moscova.inria.fr/~zappa/">Francesco Zappa Nardelli </a>from INRIA. He described his work (along with others) on <a href="http://www.cl.cam.ac.uk/~pes20/weakmemory/">understanding and modeling multiprocessor memory models</a>. It is a very complex subject, but he managed to explain it very well.</p>
<p><span id="more-1435"></span>He showed a very interesting discussion from a few years ago on the x86 memory model and the implementation of spinlocks in the Linux kernel. Various experts went back and forth over whether the final MOV that sets a lock variable to 1 needed to be prefixed by LOCK or not. The discussion ended when Linus Torvalds said &#8220;I know that it is needed&#8221;. Only to see an Intel architect finally intervene and say &#8220;you know, really, it isn&#8217;t needed&#8221;. This was followed by a series of releases of Intel manuals documenting the x86 memory model, with increasing precision in each release. Intel also actually changed the published rules along the road, withdrawing some optimizations as they realized that they would break existing software.</p>
<p>Note that such a description of a memory model must both describe existing hardware, and serve as the guideline for future hardware. Therefore, there are optimizations that are not implemented today but which are possible given the rules. Such optimization opportunities can be removed from the rulebook as long as they have never been part of shipping hardware, so it is not as crazy as it might sound.</p>
<p>Anyway, the point that Francesco made was both to tell an interesting story from history, and making the point that describing and understanding memory models is hard. I certainly agree with that. I recall an ISCA many years ago when some computer architecture professors all agreed that very few people really understand consistency and weak memory models.</p>
<p>To make life easier for programmers, Francesco and Peter Sewell (in Cambridge) has defined their own set of rules for x86 memory consistency. This is not an architecture spec, but a rule set for regular programmers. It is found at <a href="http://www.cl.cam.ac.uk/~pes20/weakmemory/">http://www.cl.cam.ac.uk/~pes20/weakmemory/</a>. Essentially, the conclusion is that x86 in practice implements the old SPARC TSO memory model.</p>
<p>They have also attempted to formalize the Power Architecture memory model. Both the actual memory model and their model of it can only be described as very complex. The programmer&#8217;s model is expressed in terms of store queues, speculative instruction execution, and commits of instructions. Not something you easily keep in your head. It is interesting to note that ARM MPCore essentially copied the Power Architecture.</p>
<p>He showed an interactive simulation of the Power memory model, and the way that you need to think about it in terms of propagating information between threads and committing them. It is possible to propagate values and then another propagation overrides a value before the thread commits&#8230; Fun. Or a headache.</p>
<p>The big take-away from the talk for me is that it confirms the observation made may times before that <a href="http://en.wikipedia.org/wiki/Memory_ordering">SPARC TSO </a>seems to be the optimal memory model. It is sufficiently understandable that programmers can write correct code without having barriers everywhere. It is sufficiently weak that you can build fast hardware implementation that can scale to big machines.</p>
<p>Maybe TSO does not theoretically scale in the same insane way as Power or Alpha does/did. But the cost of that theoretical scalability is that programmers might have to litter their code with sync operations just to get it to run correctly. With too many sync operations, the code will run very slowly negating any advantage on the hardware level. Note that sync operations can be very expensive. <a href="http://g.oswego.edu/">Doug Lea</a>, in the audience, pointed out that a sync can cost up to 300 cycles on a POWER5.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1435"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1435" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1435" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1435/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>S4D 2010</title>
		<link>http://jakob.engbloms.se/archives/1251?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1251#comments</comments>
		<pubDate>Wed, 15 Sep 2010 08:02:42 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[appearances]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[multicore debug]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[ARM]]></category>
		<category><![CDATA[Debug]]></category>
		<category><![CDATA[ESCUG]]></category>
		<category><![CDATA[FDL]]></category>
		<category><![CDATA[Infineon]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[John Aynsley]]></category>
		<category><![CDATA[Pat Brouillette]]></category>
		<category><![CDATA[S4D]]></category>
		<category><![CDATA[Simon Davidmann]]></category>
		<category><![CDATA[Southampton]]></category>
		<category><![CDATA[ST]]></category>
		<category><![CDATA[SystemC]]></category>
		<category><![CDATA[Thorsten Grötker]]></category>
		<category><![CDATA[TrustZone]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1251</guid>
		<description><![CDATA[Looks like S4D (and the co-located FDL) is becoming my most regular conference. S4D is a very interactive event. With some 20 to 30 people in the room, many of them also presenting papers at the conference, it turns into a workshop at its best. There were plenty of discussion going on during sessions and [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/09/S4D1.jpg"><img class="alignleft size-full wp-image-941" title="S4D" src="http://jakob.engbloms.se/wp-content/uploads/2009/09/S4D1.jpg" alt="" width="143" height="62" /></a>Looks like S4D (and the co-located FDL) is becoming my most regular conference. S4D is a very interactive event. With some 20 to 30 people in the room, many of them also presenting papers at the conference, it turns into a workshop at its best. There were plenty of discussion going on during sessions and the breaks, and I think we all got new insights and ideas.</p>
<p><span id="more-1251"></span></p>
<h2><a href="../wp-content/uploads/2010/09/P1140077.jpg"><img class="aligncenter size-full wp-image-1276" title="P1140077" src="../wp-content/uploads/2010/09/P1140077.jpg" alt="" width="400" height="258" /></a></h2>
<h2>S4D Talks, Themes, and Topics</h2>
<p>More is available in &#8220;<a href="http://jakob.engbloms.se/archives/1280">S4D part 2</a>&#8220;.</p>
<h3>Tracing and Instrumentation</h3>
<p>The papers presented covered a wide variety of topics from a variety of angles. Still, everybody felt that two topics kept coming back in various forms in a majority of the papers and discussions: <em>tracing</em> and <em>instrumentation</em>.</p>
<p>Code instrumentation is not a dirty word anymore. The traditional judgment that inserting probes into your software is plain bad does not apply anymore, at least not in the minds of the people at S4D. Instrumentation was applied to drivers, OS kernels, and regular user-level software. I think the key insight is that there is clear value in having the developers that write a piece of software also mark points of interest in the code. When analyzing a trace of an execution, that means that the information in the trace becomes meaningful to the software developers, as it is on the right level of abstraction. Instrumentation naturally produces traces, which can be fed out using  shared memory, networks, special-purpose hardware, and more.</p>
<p>One of the instrumentation trace solutions presented (the SVEN system from Intel Digital Home presented by Pat Brouillette), actually leaves the instrumentation in place in the shipping customer systems. In this way, you cannot really claim that instrumentation is intrusive &#8211; it is just part of the software, always. Customers can even activate the tracing in deployed systems, and ship the traces back to the developers for analysis of bugs found in the field. It is another approach to <a href="http://jakob.engbloms.se/archives/1231">record and replay</a> that touches on my paper on transporting bugs with checkpoints.</p>
<p>The increased interest in instrumentation probably has something to do with the nature of the systems that are being addressed. For systems using shared memory multicore hardware and general-purpose operating systems, the cost of instrumentation is easier to take than for very small constrained embedded systems. Essentially, as systems get more complex, instrumentation becomes more tractable.</p>
<p>Instrumentation can interact with hardware trace and debug functions is a neat way to build a system which is more powerful than a hardware or software system would be on its own. Especially for software stacks involving hypervisors and multiple complex operating systems, that is likely necessary.</p>
<p>Once we have a trace, just <a href="../archives/942">like last year</a>, we need to have tools for analyzing the tons of data you get from tracing a modern system. ST talked about a tracing system that generated 100s of gigabytes of data.</p>
<p>One trace aspect that kept coming up was the need for <em>time stamps </em>on trace data. To reconcile multiple traces and understand how different concurrent units talk to each other, a global time stamping mechanism is crucial. There seems to be work on hardware to support this.</p>
<h3>Security, Secrecy, and Debug</h3>
<p>I moderated a panel on hardware support for debug, and posed the question on how to balance security and the need to debug. This generated a number of interesting answers from the panel and the audience.</p>
<p>The conflict between debuggability and secrecy is there. Even from the same customer you first get &#8220;you have to make the internal state of the controller inaccessible and hidden to avoid customers modifying their engines&#8221;&#8230; and then when a problem appears in the field, they ask for a way to analyze and trace that very same system. Hard to support both requirements in a reasonable way.</p>
<p>A sophisticated solution to debug security from companies like ARM, Infineon, and ST is debug that can be enabled using key exchange. The chips are built with a &#8220;locked door&#8221; in place, but the keys to the door are kept well-guarded. In this way the same chip can be used in development and in the field.</p>
<p>To support debug of systems involving secure modes like ARM TrustZone, ARM has defined several levels of access in their CoreSight hardware modules. This makes it possible for a debugger to be restricted to just debugging user-level code, just OS and user-level code, or all of the software stack. To me, this sounds like it could allow mobile phone manufacturers to &#8220;securely&#8221; let their application developers use hardware-based debug, without compromising operating systems or secure boot modes.</p>
<p>The classic technique of using fuses to turn off functions is also relevant, at least for systems with moderate levels of security. This can certainly be overcome using special tools to peel off the top of chips and reconnect the fuses, but the panel seemed to think that that level of attack was in general not worth protecting against. However, the audience pointed out that  this was actually being done to automotive engine controllers and there are people making a good living from such antics.</p>
<h3>ESCUG Meeting</h3>
<p>The ESCUG meeting was a mix of fairly slick commercial presentations from OVP/IMperas chief Simon Davidmann and SystemC guru John Aynsley, and research presentations of varying quality.</p>
<p>One thing that struck me was that the academics spent a significant time in all presentations about how their approaches were compatible with the existing SystemC structure, where they host their open-source efforts, etc. I guess that is good in that they show a certain concern for reality &#8211; but it is also a bit sad that they did not get time to actually talk that much about the core ideas they were bringing forward. I am personally much more interested in new ideas than infrastructure and project management. It does not bode well for European research if this is what people are forced to produce, in lieu of real innovation.</p>
<h3>Thorsten Grötker&#8217;s Keynote</h3>
<p>On Wednesday morning, Thorsten from Synopsys did a look back over the history of SystemC, free from product pitching. He only mentioned Synopsys in his introduction, where the high-level message was that the embedded software is really the key problem for industry today. I cannot disagree with that.</p>
<p>During the SystemC parts of his talk he did say a few things that I did not quite agree with&#8230; in particular that TLM was unknown prior to 1999. It was not called that, but it certainly existed in the field of full-system simulation. The main problem is that Thorsten only sees the EDA history of modeling, not the computer architecture and software-driven work that did simulations as far back as 1950 (the famous Gill paper), and fast simulation since at <a href="http://jakob.engbloms.se/archives/130">least 1967</a>.</p>
<p>He also claims that with SystemC you have a single language for both detailed and TLM models. That is true&#8230; but you still need multiple models, one at each level of abstraction. So yes, one language, multiple models. However, that gluability really comes with a performance and complexity cost. It makes it too easy to slip into bad modeling even in TLM.</p>
<p>An interesting theme that Thorsten picked up from John&#8217;s talk at ESCUG is the use of SystemC to model software and RTOS, using the upcoming process control extensions. If you stretch that into the area of software synthesis, it means that SystemC is going to collide with the field of model-driven software development. Will you use SystemC, coming from the hardware world, or UML/MATLAB/Domain-specific languages coming from the software world?  Thorsten makes the interesting point that in order to integrate with that world, SystemC will require some concepts from that world (like pins and clocks enable interaction with RTL). I am not sure that is true, necessarily, I think you can just as well create point adaptors to the same effect.</p>
<h2>Getting to Southampton</h2>
<p>The <a href="http://www.soton.ac.uk/">University of Southampton </a>hosted the event, and it took place in the university lecture halls.  That means that we got free very fast WiFi (unlike any commercial conference venue I have ever seen).  The university campus was full of services (unlike the desolate place that last year&#8217;s FDL/S4D choose).  Housing in the <a href="http://www.soton.ac.uk/accommodation/halls/gleneyre/index.html">Glen Eyre residential halls </a>was a bit spartan but functional. Felt like being back in my days as a student living in student housing.</p>
<p>The instructions from the conference about how to get to the conference was a bit confusing and incomplete. In practice, it is very easy to get to Southampton from both Gatwick (direct train) and Heathrow (NationalExpess bus 203).  At Heathrow, I had a bit of luck with the bus to Southampton. The instructions from the NationalExpress website had me believe that I had to get from Terminal 5 where we landed to the central bus station and then catch the bus at 15.00. As we landed 40 minutes late (14.40), this looked very hopeless&#8230; until I found the NationalExpress counter in the arrivals hall at Terminal 5 and they told me the bus would leave at 15.30. Nice, no stress. The bus to Southampton even had free Wifi on board!</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/09/P1140062.jpg"></a><a href="http://jakob.engbloms.se/wp-content/uploads/2010/09/P1140062-1.jpg"><img class="aligncenter size-full wp-image-1275" title="P1140062-1" src="http://jakob.engbloms.se/wp-content/uploads/2010/09/P1140062-1.jpg" alt="" width="400" height="246" /></a></p>
<p>Once in Southampton, you then had to take the bus U1A out to the university campus, and finding a bus stop for that was the most difficult part of the journey, actually. Some of the buses from Heathrow stop at Southampton university.</p>
<p>See also &#8220;<a href="http://jakob.engbloms.se/archives/1280">S4D Part 2</a>&#8221; for a few more tidbits from S4D.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1251"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1251" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1251" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1251/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Microsoft + ARM = ARM64?</title>
		<link>http://jakob.engbloms.se/archives/1204?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1204#comments</comments>
		<pubDate>Tue, 27 Jul 2010 19:57:05 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[business issues]]></category>
		<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[apple]]></category>
		<category><![CDATA[ARM]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Windows]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1204</guid>
		<description><![CDATA[The recent news that Microsoft has taken out an ARM architectural license has caused a lot of speculation about just what this might mean. There are several quite well reasoned ideas around the web, and I have one idea of my own: sixty-four bits. Here is a list of some of the ideas I have [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/07/windows-phone-logo.png"><img class="alignleft size-full wp-image-1205" title="windows phone logo" src="http://jakob.engbloms.se/wp-content/uploads/2010/07/windows-phone-logo.png" alt="" width="66" height="58" /></a>The recent news that Microsoft has taken out an ARM architectural license has caused a lot of speculation about just what this might mean. There are several quite well reasoned ideas around the web, and I have one idea of my own: sixty-four bits.</p>
<p><span id="more-1204"></span></p>
<p>Here is a list of some of the ideas I have seen floating around:</p>
<ul>
<li>They are going to put <a href="http://armdevices.net/2010/07/23/the-secrets-behind-microsofts-new-arm-license/">Windows 7 on ARM </a>to <a href="http://gigaom.com/2010/07/23/4-actions-microsoft-can-take-with-an-arm-license/">defend against Linux in the sub-netbook segment</a>. Certainly a useful move &#8211; but do you need an architecture license for that? It would not make much sense to put Windows on ARM in a way that requires you to have a Microsoft-designed processor core to run it.</li>
<li>It is a case of Apple &#8220;A4&#8243; envy &#8211; if Apple can build their own custom ARM chip for a consumer devices in the tablet space, <a href="http://gigaom.com/2010/07/23/4-actions-microsoft-can-take-with-an-arm-license/">so can Microsoft</a>. I think this makes a lot of sense. But once again, what use is a license to design your own ARM core and ARM instructions? <a href="http://unplugged.rcrwireless.com/index.php/20100723/news/2226/why-microsoft-acquired-an-arm-license/">Apple does not seem to have created a custom processor core</a>, instead using a standard ARM Cortex-A8 core in the A4 chip as far as I understand.</li>
<li><a href="http://arstechnica.com/microsoft/news/2010/07/microsoft-should-cut-out-the-middle-men-and-build-its-own-phones.ars/2">Handheld gaming, building their own phone</a>. It is essentially the A4 envy argument, and I cannot see how an architecture license will help.</li>
<li><a href="http://www.computerweekly.com/Articles/2010/07/23/242079/Microsoft-licenses-ARM-tech-in-bid-to-own-39internet-of.htm">Deeply embedded devices</a>. Microsoft wants to ride the ARM wave into things like power meters and other truly embedded systems. Once again, that might certainly be a good idea &#8211; but why an architecture license?</li>
<li>As a defensive move in the Microsoft-Intel relationship. If Intel is toying with Linux, Microsoft can be toying with ARM. In that context, a few million dollars to acquire a license to do nothing much with it might make sense as negotiation leverage.</li>
</ul>
<p>The best idea I think is the <a href="http://gigaom.com/2010/07/23/4-actions-microsoft-can-take-with-an-arm-license/">server angle</a>. ARM is making inroads into servers where single-thread processor speed is not that important. The power consumption advantage over x86 is significant, as long as you do not need utter speed or very many cores sharing memory.</p>
<p>Power-efficient servers is an area where Microsoft can certainly see the potential for truly revolutionary changes in the IT field. There is no reason why x86 would be the best architecture there, and the fact that x86 is controlled by Intel and AMD makes it very hard for Microsoft to really take part in hardware-software innovation. Essentially, Intel and AMD design the processors, and the software has to adapt. This model is not optimal if you want to really see what you can do with the hardware-software boundary.</p>
<p>Thus, if I think Microsoft might do with the ARM license is to start tinker with the OS interface of the hardware. The beneficiary would be both Microsoft and ARM, since Microsoft innovations for ARM might well become standard in ARM land. For example, an OS might benefit tremendously from small changes in the processor in areas like:</p>
<ul>
<li>Memory management &#8211; a hardware scheme tailored to what Microsoft sees being done in their operating systems and their third-party server software could certainly be very beneficial to efficiency.</li>
<li>Multicore and multithreading &#8211; ARM has some special support in their MPCore designs for communicating between cores in a multicore system. Such support could be extended to help the OS manage both threads and processors across multicore designs. Also, ARM might want some help from Microsoft to design a scalable OS interface that works for tens or hundreds of tightly-coupled cores (rather than the current limit of 4 in ARM Cortex-A9 MP).</li>
<li>More bits &#8211; the biggest architectural problem I see with ARM in servers is that ARM is currently a 32-bit architecture. 32 bits are not enough for even medium servers today. Having Microsoft help ARM design a 64-bit version of the ARM architecture sounds far-fetched, but it is certainly something where a license to change the ARM instruction set would help.</li>
<li>ARM instruction set additions to help build an emulator for x86 instructions on ARM, to allow current  x86-Windows applications to run on an ARM-based device. This is an idea which has been tried in the past and which has never been very successful in the market. ARM currently has a set of instructions which help accelerate virtual machines such as .net and JVM, and I think that is as far as this idea has been proven useful.</li>
</ul>
<p>In summary, the only reasonable use I can see that Microsoft would make of a license to build custom ARM processor cores and tweak instruction sets would be to build better servers, by improving multicore communications mechanisms, upping ARM to 64 bits, tweaking the MMU design, and possibly creating some kind of x86 emulation support.</p>
<p>That said, I think the most likely result is a custom ARM-based chip to power a phone or tablet or other consumer electronics device, running a Microsoft software stack (derived from Win CE/Windows Phone) on Microsoft hardware, all sold as a Microsoft product. Essentially, doing the <a href="http://arstechnica.com/microsoft/news/2010/07/microsoft-should-cut-out-the-middle-men-and-build-its-own-phones.ars/2">Apple all-integrated solution.</a></p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1204"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1204" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1204" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1204/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Concurrency in Lego Mindstorms NXT</title>
		<link>http://jakob.engbloms.se/archives/1058?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1058#comments</comments>
		<pubDate>Fri, 08 Jan 2010 21:19:54 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[parallel computing]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[ARM]]></category>
		<category><![CDATA[AVR]]></category>
		<category><![CDATA[Domain-specific languages]]></category>
		<category><![CDATA[LabView]]></category>
		<category><![CDATA[lego]]></category>
		<category><![CDATA[Mindstorms]]></category>
		<category><![CDATA[NXT]]></category>
		<category><![CDATA[parallelized software]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1058</guid>
		<description><![CDATA[For my parental leave, I have just bought myself a Lego Mindstorm NXT 2.0 kit. It is not much fun for our youngest, who mostly gets a bit scared by a piece of Lego driving around making noises, but I hope to be able to use it to teach my older child (almost five) to [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1057" style="margin: 5px 10px;" title="lego mindstorms nxt2" src="http://jakob.engbloms.se/wp-content/uploads/2010/01/lego-mindstorms-nxt2.png" alt="lego mindstorms nxt2" width="146" height="126" /></p>
<p>For my parental leave, I have just bought myself a <a href="http://www.mindstorms.com">Lego Mindstorm NXT 2.0 kit</a>. It is not much fun for our youngest, who mostly gets a bit scared by a piece of Lego driving around making noises, but I hope to be able to use it to teach my older child (almost five) to program. Let&#8217;s see how that turns out. It looks hard to make the NXT environment provide the kind of <a href="http://www.boardgamegeek.com/boardgame/18/roborally">Roborally</a>-style programming blocks that I had hoped to create, as I cannot for some reason get a sufficiently custom icon onto custom blocks.</p>
<p>It also presented me with an opportunity to try some domain-specific high-level graphical programming. The programming environment provided for the NXT series of Mindstorms kits is based on LabView from National Instruments, and it really does seem to work. It even features parallel tasks, which I tried to use&#8230;</p>
<p><span id="more-1058"></span>It turned out that it is not that easy to get it right. I was trying to have a few different tasks look at different sensors and steer the robot in different directions based on the sensor readings. However, this quickly turned into a literal deadlock: the robot just sat there, doing nothing, after the first time that my ultrasound sensor task tried to steer it. Failure.</p>
<p>The manual is quite silent on the tasking semantics, and there are no (or I have not yet found) shared variables, locks, or message-passing mechanisms to synchronize the tasks.</p>
<p>Looking for an answer on the web, I came across a nice tutorial on tasking:</p>
<ul>
<li><a href="http://www.ortop.org/NXT_Tutorial/tasks.html">http://www.ortop.org/NXT_Tutorial/tasks.html</a></li>
</ul>
<p>It is a video, and it ends with the note that the only reliable way to multitask in the NXT environment is to NOT try to control the same thing from multiple tasks. If I want to listen to multiple sensors, the only way is to create a loop with switching on inputs. I.e., manual polling. So much for using the natural language of tasks to handle naturally concurrent activities. At some point, I might figure out if I can construct it from tasks and data value transfers, but for now, I will just go with the flow and write (or rather draw) some polling loops.</p>
<p>Here is an example from the manual. Note that the top &#8220;beam&#8221; controls motor A, and the bottom controls motors B and C:</p>
<p><img class="aligncenter size-full wp-image-1059" title="mindstorms manual excerpt" src="http://jakob.engbloms.se/wp-content/uploads/2010/01/mindstorms-manual-excerpt.png" alt="mindstorms manual excerpt" width="691" height="448" /></p>
<p>That last part about data wires is intriguing&#8230; but I will need to find some more reference materials.</p>
<p>Anyway, the way to express parallel tasks here is really quite neat. At least as long as they are dealing with completely separate aspects of the control of the robot.</p>
<p>If this had been a more hard-core environment, it would have been fun to put different tasks on the ARM7 and AVR processors that are inside the NXT 2.0 brick. Yes, it is a true multiprocessor, if very limited in capabilities. At <a href="http://mindstorms.lego.com/en-us/whatisnxt/default.aspx">http://mindstorms.lego.com/en-us/whatisnxt/default.aspx </a>you have the specs. ARM7, 256 kB of FLASH, 64 kB of RAM, and the AVR has 512 bytes of RAM and 4 kB of FLASH. A nice real embedded machine!</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/1058"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/1058" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/1058" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1058/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Is Cycle Accuracy a bad Idea?</title>
		<link>http://jakob.engbloms.se/archives/153?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/153#comments</comments>
		<pubDate>Fri, 11 Jul 2008 20:45:02 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[AMD]]></category>
		<category><![CDATA[ARM]]></category>
		<category><![CDATA[Axys]]></category>
		<category><![CDATA[Carbon Technology]]></category>
		<category><![CDATA[clock-cycle models]]></category>
		<category><![CDATA[CoWare]]></category>
		<category><![CDATA[cycle accuracy]]></category>
		<category><![CDATA[DEC]]></category>
		<category><![CDATA[Grant Martin]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Infineon]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[Modeling]]></category>
		<category><![CDATA[rtl]]></category>
		<category><![CDATA[scdsource]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=153</guid>
		<description><![CDATA[In a funny coincidence, I published an article at SCDSource.com about the need for cycle-accurate models for virtual platforms on the same day that ARM announced that they were selling their cycle-accurate simulators and associated tool chain to Carbon Technology. That makes one wonder where cycle-accuracy is going, or whether it is a valid idea [...]]]></description>
			<content:encoded><![CDATA[<p>In a funny coincidence, I published an article at SCDSource.com about the need for cycle-accurate models for virtual platforms on the same day that ARM announced that they were selling their cycle-accurate simulators and associated tool chain to Carbon Technology. That makes one wonder where cycle-accuracy is going, or whether it is a valid idea at all&#8230; is ARM right or am I right, or are we both right since we are talking about different things?</p>
<p>Let&#8217;s look at this in more detail.</p>
<p><span id="more-153"></span></p>
<h2>Definitions</h2>
<p>A clock-cycle (CC) model in this discussion is something that attempts to provide a cycle-by-cycle depiction of the behavior of a computer system. Usually, such models are driven by a cycle-by-cycle clock, as that is the easiest way to write and structure them.<br />
A cycle-accurate (CA) model is a CC model where the depiction is &#8220;the same&#8221; as what would happen in the real system provided they both started from the same state. </p>
<h2>What is ARM Doing?</h2>
<p>ARM seems to be passing on the tools and technologies they acquired when they bought Axys back in 2004. These tools are CC-oriented, and are aimed at hardware architects (and some really-low-level software work). They make it possible to evolve a target design cycle by cycle in the simulator to get a very accurate picture of the target behavior. I think this fits very well for Carbon, as they generate cycle-driven very accurate models by essentially compiling the actual RTL implementation of a piece of logic, processor, or device into something a bit faster than plain HDL simulation. Carbon models are a natural fit for the Axys tools.</p>
<p>Basically, it sounds as if ARM decided that manually creating CC level CA models for their latest processors for use in the Axys tools (SoC Designer) was too much work and too hard to validate. Thus, they pass the whole thing on to Carbon and seem to expect Carbon to generate CA models for use with SoC designer straight from the actual ARM implementation RTL. Carbon will have the old CC/CA models written by Axys (and later ARM), and then generate new models for new generations of ARM chips like the Cortex A9. I quote:</p>
<blockquote><p>&#8220;The model generation flow will be optimized and validated using the RTL code, ensuring speed and accuracy. The processor models will also leverage the Carbon model application programming interface (API) to offer a direct connection to the ARM RealView(R) Debugger. Carbon-generated models of ARM IP will offer our customers the fastest, most-accurate path for firmware development and architectural exploration.&#8221; (<a href="http://www.carbondesignsystems.com/Press/20080707%20Carbon%20Press%20Release.pdf">press release</a>)</p></blockquote>
<p>And:</p>
<blockquote><p>ARM made this decision, Cornish said, because it&#8217;s become increasingly difficult and time-consuming to develop cycle-accurate models. &#8220;We recognized it would make more sense to work with a specialist like Carbon that has technology for generating models directly from RTL,&#8221; he said. (<a href="http://www.scdsource.com/article.php?id=264">SCDSource News Piece on the deal</a>)</p></blockquote>
<h2>Feasibility of Construction</h2>
<p>The core argument here is really how easy or feasible it is to build CA models of a processor core (or any other really complex piece of logic). There are several interesting views to consider.</p>
<ol>
<li>The ARM statement is basically saying that building CA models of a processor core is very hard. It is hard to get right, hard to validate, and hard to maintain. So why even try? Better to generate it from the RTL and let experts at doing that do the work.</li>
<li>In my PhD thesis from 2002, I concluded that building an accurate model of a processor from public information and reverse engineering is very very difficult, and cited a number of computer architecture and real-time systems attempts to build models that all turned out to have accuracy issues. I did not know much about EDA then &#8212; and ESL did not really exist. But I think that still holds water: constructing a model of a processor is hard.</li>
<li>In the SCDSource article, I make the statement that &#8220;Building cycle-accurate (CA) models is very difficult, as you need to understand and describe the implementation details of complicated hardware units. &#8230; It is quite easy to end up with something that is essentially an alternative implementation to the actual chip RTL. It is especially difficult for third parties, as it requires access to the device and processor core designers to explain the design.&#8221; Which is essentially saying that you need to get inside the processor design group to get the information.</li>
<li>The common knowledge that all great processor design teams, from the DEC Alpha to Intel x86s to AMD Opterons to IBM Power to Freescale Power to Infineon TriCore to Sun Niagara use internal cycle-detailed simulators as their main design tools to prototype and decide how to design pipelines, memory systems, and system platforms. In this case, the simulator comes before the processor, not the other way around.</li>
<li>Tensilica has, as Grant Martin points out in comments at <a href="http://www.scdsource.com/article.php?id=266">SCDSource</a>, tools that generate both the processor and an accurate model at the same time from the same information base.</li>
<li>CoWare&#8217;s LisaTek tools for describing and generating application-specific processors also claim to generate accurate models from the LISA source files in a way similar to Tensilica but based on a user describing a completely custom design in a third-party tool. In the case of Tensilica, the tool and the design come from the same company.</li>
</ol>
<p>So where does this leave us? It makes it clear that in order to build a good cycle-accurate model you need access to internal information and the processor design/processor design team. The CA model can be built either:</p>
<ol>
<li>By synthesizing from the RTL, Carbon-style.</li>
<li>By synthesizing from some more abstract design description, Tensilica or LisaTek-style.</li>
<li>By the design team as part of the design process.</li>
<li>By some poor guy working after the fact from specs and test cases.</li>
</ol>
<p>I think the ARM-Carbon deal (and all practical experience as well) invalidates the fourth variant. Essentially, that is what Axys had to do: build models after the fact, separate from the CPU design flow. This is a property of how ARM design processors and the fact that Axys began life outside of ARM (my guess, nota bene). It is what computer architecture researchers often want to do but fall down on over and over again. In fact, a common question from computer architecture newbies is if Virtutech Simics has correct models of processors like the Intel Pentium4 or Core 2 available to use as starting points in research. It would be nice, but sorry, we do not.</p>
<p>But the other three variants do make sense, and will all result in some kind of decent model. Which one you end up doing depends on the style of your design and quite likely the complexity of the processor and system design. In the end, any truly revolutionary design (think Sun Rock, for example) will need to write a custom simulator as tools will not have the concepts in them to model all ideas. It seems that simple &#8220;standard&#8221; designs that fit in the categories of &#8220;custom RISC&#8221; or &#8220;custom DSP&#8221; and that do not break new ground in computer architecture can probably be designed using tools that allow processor and simulator generation. I think that most heavy-duty general-purpose processor cores will have to do either the design-model or RTL-generation path, while more accelerator-style cores can use the tools approach.</p>
<p>As a final note, there could really be two different problems being addressed here regarding &#8220;cycle accuracy&#8221;, and that this might contribute to different levels of feasibility:</p>
<ul>
<li>Using the simulator to validate and optimize software performance can tolerate some errors in details as long as errors do not accumulate (see for example the &#8220;<a href="http://moss.csc.ncsu.edu/~mueller/wcet06/accepted/5.html">timing anomalies</a>&#8221; or &#8220;<a href="http://www-emsoft02.imag.fr/Programme/Engblom.pdf">unbounded long timing effects</a>&#8221; found in WCET research). It is about understanding the software behavior versus the processor design (or complex accelerator design versus input data), in small focused spots of execution.</li>
<li>Using the simulator to validate a chip design including buses and other devices that can be bus masters. This ought to require a higher level of accuracy, as the penalty for errors would potentially seem greater. And this is also where ARM&#8217;s SoC designer fit in, rather than as a tool to understand the software behavior. The scope here is larger and there is usually no idea of zooming in on detail at particular points in time.</li>
</ul>
<p>So where does this land us?</p>
<p>I guess that CC/CA models can be built if you have a nice inside track to the design team, and that the only sensible way to use them is as a zoom device for the places in your code where you absolutely need the details. Most of the time (say 90-95-99%) software does not need CC models, but rather something that is functionally accurate and that runs really really fast so that all software can at least be executed. That is something a CC model will never be able to do, at least not for systems using non-trivial operating systems requiring a few billion instructions just to boot&#8230;</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/153"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/153" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/153" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/153/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

