<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Observations from Uppsala &#187; operating systems</title>
	<atom:link href="http://jakob.engbloms.se/archives/tag/operating-systems/feed" rel="self" type="application/rss+xml" />
	<link>http://jakob.engbloms.se</link>
	<description>Computer Technology: Simulation, Virtualization, Virtual Platforms, Embedded, Multicore and Multiprocessing (by Jakob Engblom)</description>
	<lastBuildDate>Sun, 29 Jan 2012 19:45:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<image>
    <title>Observations from Uppsala</title>
    <url>http://jakob.engbloms.se/favicon.png</url>
    <link>http://jakob.engbloms.se</link>
    <width>32</width>
    <height>32</height>
    <description>Observations from Uppsala - http://jakob.engbloms.se</description>
    </image>		<item>
		<title>Eclipse Linux Kernel Indexing Works</title>
		<link>http://jakob.engbloms.se/archives/338?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/338#comments</comments>
		<pubDate>Sun, 01 Feb 2009 17:10:18 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[desktop software]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[uncategorized]]></category>
		<category><![CDATA[eclipse]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[Linux kernel]]></category>
		<category><![CDATA[operating systems]]></category>
		<category><![CDATA[Simon Kågström]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=338</guid>
		<description><![CDATA[Edited on 2009-Feb-01 to include the link to the illustrated guide that really helps you get there faster. Thanks Simon! Also promoted to the front page; the original post went up on 2008-Nov-09. Thanks to Simon Kågström's post (and the even better second-generation version with screenshots) about using Eclipse for the Linux kernel, I have a much [...]]]></description>
			<content:encoded><![CDATA[<p><img class="size-medium wp-image-339 alignleft" style="margin: 5px 10px;" title="eclipse_wide_logo" src="http://jakob.engbloms.se/wp-content/uploads/2008/11/eclipse_wide_logo.jpg" alt="" width="131" height="68" /> <img class="size-medium wp-image-329 alignright" style="margin-left: 10px; margin-right: 10px;" title="penguin-variant" src="http://jakob.engbloms.se/wp-content/uploads/2008/11/penguin-variant.png" alt="" width="100" height="118" /> <em>Edited on 2009-Feb-01 to include the link to the illustrated guide that really helps you get there faster. Thanks Simon! Also promoted to the front page; the original post went up on 2008-Nov-09.</em></p>
<p>Thanks to <a href="http://simonkagstrom.livejournal.com/31079.html?view=19559#t19559">Simon Kågström's post</a> (and the even better <a href="http://simonkagstrom.livejournal.com/33093.html">second-generation version with screenshots</a>) about using <a href="http://www.eclipse.org">Eclipse</a> for the Linux kernel, I now have a much nicer work environment for my ongoing work learning Linux device drivers on PowerPC, and it has helped me work my way through several hard-to-figure-out system calls.<span id="more-338"></span> Here is a screenshot that I found pretty cool&#8230; the tool has found the definition and comments for the IRQ registration function:</p>
<p style="text-align: center;"><a href="http://jakob.engbloms.se/wp-content/uploads/2008/11/2008-11-09-21-51-08.png"><img class="size-medium wp-image-340 aligncenter" title="2008-11-09-21-51-08" src="http://jakob.engbloms.se/wp-content/uploads/2008/11/2008-11-09-21-51-08-300x187.png" alt="" width="300" height="187" /></a></p>
<p style="text-align: left;">2009-Feb-01:</p>
<p style="text-align: left;">I had to rebuild my index from scratch this past weekend, and as a result I have a word of warning: you have to create a &#8220;C project&#8221; in Eclipse. If you accidentally create a plain &#8220;Project&#8221;, you get a .project file but no .cproject file, and the autoconf-to-eclipse script will not work.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/338/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Hardware-Software Race Condition in Interrupt Controller</title>
		<link>http://jakob.engbloms.se/archives/588?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/588#comments</comments>
		<pubDate>Sat, 17 Jan 2009 21:16:14 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[multicore computer architecture]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[interrupt controller]]></category>
		<category><![CDATA[learning by doing]]></category>
		<category><![CDATA[OpenPIC]]></category>
		<category><![CDATA[operating systems]]></category>
		<category><![CDATA[race condition]]></category>
		<category><![CDATA[teaching setup]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=588</guid>
		<description><![CDATA[The best way to learn something is to try, fail, and then try again. That is how I just learned the basics of multiprocessor interrupt management. For an educational setup, I have been building, from scratch, a virtual platform for purely virtual hardware. This setup contains a large number of processors with local memory, and then a [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-589" style="margin: 5px 10px;" title="racecondition" src="http://jakob.engbloms.se/wp-content/uploads/2008/01/racecondition.png" alt="racecondition" width="99" height="78" />The best way to learn something is to try, fail, and then try again. That is how I just learned the basics of multiprocessor interrupt management. For an educational setup, I have been building, from scratch, a virtual platform for purely virtual hardware. This setup contains a large number of processors with local memory, a global shared memory, and a means for the processors to interrupt each other, either to signal that a message is waiting or to synchronize in general. Getting this right turned out to be harder than expected.</p>
<p><span id="more-588"></span></p>
<p>I started out with a simple model where each processor had an interrupt location mapped in global memory, and writing to this location would interrupt the processor. As a bonus, the written value was passed on to the receiving processor. The interrupted processor would then acknowledge the interrupt to its local interrupt controller by writing to a local address. It worked like a charm in simple tests.</p>
<p>It broke completely when I started sending messages from multiple nodes to the same node: if an interrupt from node B reached node A while A was busy processing an interrupt from C, the interrupt from B was simply ignored. There was no queuing, no fairness, no arbitration. Software could not solve this, since creating a lock around a processor's global interrupt location would itself require some kind of global signaling mechanism, which is exactly what this interrupt system was supposed to provide.</p>
<p>I must have suspected that something was not quite right, as I had equipped the interrupt controller with a counter tracking interrupts raised versus interrupts cleared. Its value kept growing, showing an accumulating number of interrupt attempts that had gone unnoticed.</p>
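<p>To make the failure concrete, here is a minimal single-threaded C sketch of that first design, assuming one interrupt slot per target processor in which a write arriving while an interrupt is still pending is silently dropped. The type and function names (<tt>intr_slot</tt>, <tt>send_intr</tt>, <tt>ack_intr</tt>) are hypothetical and not taken from the actual virtual platform model; the <tt>raised</tt> and <tt>cleared</tt> fields play the role of the diagnostic counter.</p>

```c
#include <stdbool.h>

/* Hypothetical model of the first design: one interrupt slot per target CPU,
   mapped in global memory. Not the actual virtual platform code. */
typedef struct {
    bool pending;      /* an interrupt is raised and not yet acknowledged */
    unsigned message;  /* value written by the sender */
    unsigned raised;   /* number of interrupt attempts written to the slot */
    unsigned cleared;  /* number of interrupts acknowledged by the target */
} intr_slot;

/* A sender writes the target's interrupt location. If an interrupt is
   already pending, the new one is silently dropped: no queue, no arbitration. */
void send_intr(intr_slot *s, unsigned message) {
    s->raised++;
    if (!s->pending) {
        s->pending = true;
        s->message = message;
    }
}

/* The target reads the message and acknowledges the interrupt locally.
   Returns false if nothing was pending. */
bool ack_intr(intr_slot *s, unsigned *message) {
    if (!s->pending)
        return false;
    *message = s->message;
    s->pending = false;
    s->cleared++;
    return true;
}
```

<p>If C interrupts A, and B then interrupts A before A has acknowledged, <tt>ack_intr</tt> hands A only C&#8217;s message and <tt>raised - cleared</tt> is left at 1: the kind of steadily growing gap the counter exposed.</p>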
<p>One obvious solution that did not work either was to provide a way to check that an interrupt had been sent successfully. Since a processor's interrupt send register was placed in shared global memory, a processor that wrote the send register and then read the status register had no way to guarantee that the status it read actually concerned the interrupt it had tried to send. It could very well read the status resulting from some other processor&#8217;s interrupt attempt. Basically, it would be doing unprotected access to a shared mutable area, which is well known to be a bad idea.</p>
<p>Another solution would be an atomic load-and-store operation that stores a value in a register and returns a value to the processor in the same operation. However, I have never seen this supported for device space, even though atomic operations of this type are available on most machines for regular memory.</p>
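<p>For ordinary memory, the idea can be sketched with C11 atomics. This is an illustration under stated assumptions, not a device-register design: the mailbox lives in regular memory, the value 0 is assumed to mean &#8220;empty&#8221;, and the names are hypothetical. It uses compare-and-swap rather than a plain swap, since a plain swap would overwrite, and thus lose, a message that is already pending.</p>

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical mailbox in ordinary memory: 0 means "empty", any other value
   is a pending message. Assumes senders never need to send the value 0. */
typedef struct {
    atomic_uint slot;
} mailbox;

/* Try to post a message. The compare-and-swap stores the message only if the
   slot is empty and, in the same atomic step, tells the sender whether it won.
   A plain swap would instead overwrite a message that is already pending. */
bool try_post(mailbox *mb, unsigned message) {
    unsigned expected = 0;
    return atomic_compare_exchange_strong(&mb->slot, &expected, message);
}

/* Receiver side: take the pending message (0 if none) and empty the slot. */
unsigned take(mailbox *mb) {
    return atomic_exchange(&mb->slot, 0u);
}
```

<p>The sender learns from the return value of <tt>try_post</tt> whether its own message got in, which is exactly the information the write-then-read-status approach could not provide reliably.</p>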
<p>So it was back to the drawing board. It is clear that in order to do interrupts in a multiprocessor, it must be possible for any processor to interrupt any other processor without the message getting lost due to simultaneous actions in other processors. How to solve this?</p>
<p>And why did I not just copy an existing design or read a book that tells me how to do this? The problem is that I have not managed to find any good, readable text on this subject: how does a multiprocessor (shared-memory or local-memory, it does not really matter) actually handle interrupts and coordinate the code running locally on each individual processor with the code running on other processors, at the lowest level? A description of the hardware-software interaction design needed to make this work must exist somewhere, but I have not found it, and I suspect that in many cases it is simply passed down as lore from one generation of system designers to the next. If someone knows a good text on this subject, please point it out to me!</p>
<p>My first design was to use N x N registers for an N-processor machine. Essentially, each processor would have a bank of registers with one register per potential sender. Thus, if processors A and B decide to interrupt C simultaneously, they write into two different locations, and C can scan its register array to tell that both A and B are calling. However, this eats address space pretty quickly, since it requires 2N<sup>2</sup> registers in total:</p>
<ul>
<li>For each processor, N local registers from which it reads the messages sent to it.</li>
<li>For each processor, N registers to write messages into, one per target. These can be either a local set for each processor or placed in global memory.</li>
</ul>
<p>In essence, this is the design of the OpenPIC interrupt controller that is common in the PowerPC world. It encodes processors as bits rather than full registers, but it still gives each processor a local set of data where it can set bits to interrupt any other processor.</p>
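<p>A sketch of the N x N scheme with each register shrunk to a single bit, as a software model using C11 atomics: each target has one 32-bit pending word in which bit <i>i</i> means &#8220;processor <i>i</i> is calling&#8221;. The atomic OR stands in for what the interrupt controller hardware would do; the layout and names are hypothetical, and this is not the OpenPIC register map.</p>

```c
#include <stdatomic.h>
#include <stdint.h>

#define MAX_CPUS 32  /* one bit per potential sender in a 32-bit word */

/* Hypothetical per-target pending register: bit i set = CPU i wants attention. */
typedef struct {
    _Atomic uint32_t pending;
} ipi_target;

/* Sender side (sender must be below MAX_CPUS): each sender owns a distinct
   bit, so two CPUs calling the same target at once cannot erase each other. */
void ipi_send(ipi_target *t, unsigned sender) {
    atomic_fetch_or(&t->pending, (uint32_t)1u << sender);
}

/* Target side: atomically take all pending bits, then list each sender.
   Writes the sender numbers into out[] and returns how many there were. */
unsigned ipi_collect(ipi_target *t, unsigned out[MAX_CPUS]) {
    uint32_t bits = atomic_exchange(&t->pending, 0u);
    unsigned n = 0;
    for (unsigned cpu = 0; cpu < MAX_CPUS; cpu++) {
        if (bits & ((uint32_t)1u << cpu))
            out[n++] = cpu;
    }
    return n;
}
```

<p>Because A and B set different bits, C collects both requests in one go, regardless of how their writes interleave. The cost is the one noted above: the pending state grows with the number of processors.</p>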
<p>A colleague of mine pointed out that SPARC systems do things a bit differently. There, you have a single register into which you write the number of the receiving processor, and a status flag that tells you whether the send succeeded. The sending software is thus responsible for retrying if the remote side is busy. This scales nicely to quite large systems, since there is no need to represent or manage interrupt registers many hundreds of bits wide, the vast majority of which would go unused at any particular point in time. What you lose is the ability of a single processor to send arbitrary multicast interrupts, which I don&#8217;t think is commonly needed (though it might well be; this is a bit of a dark art).</p>
<p>Since both of these registers are located in memory local to each processor, there is no need to worry about races between different processors interrupting the same target processor simultaneously. The hardware interrupt bus arbitrates so that only one sender wins, and only the software on that processor sees a successful status flag and continues. The others spin, or do more sophisticated waits if needed.</p>
<p>In the end, the code for sending an interrupt that I used was this:</p>
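<pre>/* Registers local to the sending CPU; volatile so every access reaches the device */
extern volatile unsigned int *my_intr_dest, *my_intr_send_data, *my_intr_send_status;

void interrupt_cpu(int cpu_num, int message) {
  *my_intr_dest = cpu_num;             /* select the target processor */
  *my_intr_send_data = message;        /* writing the data attempts the send */
  while (*my_intr_send_status == 0) {  /* 0: not delivered, target busy */
    *my_intr_send_data = message;      /* retry until the send succeeds */
  }
}</pre>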
<p>Note that I still send a 32-bit message, mostly because that is handy in educational and demo setups that are not strictly limited by what current hardware does. In this design, writing to the message register is what triggers the interrupt (or rather, an attempt to send an interrupt) on the other processor. The hardware (or in my case, the virtual hardware model) does the rest, in a way that is guaranteed to eventually deliver all interrupts safely to their destinations. It does this without any complex buffering in the hardware itself; buffering is best handled in software, which has an easier time managing state. This also lets the software use other strategies, such as treating a busy target as a signal to try some other, less busy processor.</p>
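<p>The status-flag protocol and the &#8220;try some other processor&#8221; strategy can be sketched in C as a single-machine software model under assumptions: <tt>atomic_flag</tt> stands in for the hardware arbitration that lets exactly one of several simultaneous senders win, each target holds one interrupt at a time, and all names (<tt>cpu_port</tt>, <tt>try_interrupt</tt>, <tt>interrupt_any</tt>) are hypothetical.</p>

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a target CPU's interrupt input: it holds one
   interrupt at a time. atomic_flag stands in for the hardware arbitration
   that lets exactly one of several simultaneous senders win. */
typedef struct {
    atomic_flag busy;
    unsigned message;
} cpu_port;

/* One send attempt; returns the status flag the sender would read back:
   true if the target accepted the interrupt, false if it was busy. */
bool try_interrupt(cpu_port *target, unsigned message) {
    if (atomic_flag_test_and_set(&target->busy))
        return false;               /* another interrupt is still pending */
    target->message = message;
    return true;
}

/* Target has finished handling its interrupt: it can accept another one. */
void interrupt_done(cpu_port *target) {
    atomic_flag_clear(&target->busy);
}

/* Software policy on top of the status flag: rather than spinning on one busy
   target, try each candidate in turn. Returns the index of the CPU that took
   the work, or -1 if all were busy (the caller may retry later). */
int interrupt_any(cpu_port *cpus[], int n, unsigned message) {
    for (int i = 0; i < n; i++) {
        if (try_interrupt(cpus[i], message))
            return i;
    }
    return -1;
}
```

<p>A caller that must reach one specific processor simply loops on <tt>try_interrupt</tt>, as <tt>interrupt_cpu()</tt> does above; a caller that only needs <em>some</em> processor to take the work can use <tt>interrupt_any</tt> and retry later if it returns -1.</p>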
<p>Anyway, it was an interesting experience to try this and see how hardware devices and software interact in a concurrent machine to create races. Not just the software but also the hardware must be designed correctly to prevent races, and races caused by hardware can at times be impossible to work around in software.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/588/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Shaking a Linux Device Driver on a Virtual Platform</title>
		<link>http://jakob.engbloms.se/archives/337?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/337#comments</comments>
		<pubDate>Sun, 09 Nov 2008 22:23:13 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[embedded software]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[teaching]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[device driver]]></category>
		<category><![CDATA[interrupt]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[operating systems]]></category>
		<category><![CDATA[power architecture]]></category>
		<category><![CDATA[race condition]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=337</guid>
		<description><![CDATA[To continue from last week&#8217;s post about my Linux device driver and hardware teaching setup in Simics, here is a lesson I learnt this week when doing some performance analysis based on various hardware speeds. First some background. A key idea in the setup is to use the approach of assuming some processing time for [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-medium wp-image-329" style="margin: 5px 10px;" title="penguin-variant" src="http://jakob.engbloms.se/wp-content/uploads/2008/11/penguin-variant.png" alt="" width="100" height="118" />To continue from <a href="http://jakob.engbloms.se/archives/330">last week&#8217;s post </a>about my Linux device driver and hardware teaching setup in <a href="http://www.virtutech.com/academia">Simics</a>, here is a lesson I learnt this week when doing some performance analysis based on various hardware speeds.</p>
<p><span id="more-337"></span></p>
<p>First some background.</p>
<p>A key idea in the setup is to <em>assume some processing time</em> for the hardware accelerator, rather than creating detailed code and determining the actual processing time of a particular implementation. Given an assumed time, we can then see how it impacts program performance. This is a way of designing hardware where we look at how fast something needs to be to have a positive impact, rather than trying to make it as fast as possible. It also lets us analyze how hardware performance looks when software uses a complete OS stack and a real device driver, rather than simple bare-metal code (which tends to show performance in the best possible light). Essentially, it is loosely timed design-space exploration.</p>
<p>Initial tests of the driver used very short completion times, on the order of 1 microsecond. At this point, the read() call simply waited for the hardware completion flag to become true and then returned the results. That is not how a driver should behave, since if the hardware has some kind of hiccup, we will be stuck looping inside a kernel context. Instead, I implemented a blocking read variant that puts the calling process to sleep until a result arrives.</p>
<p class="MsoNormal">To test that my driver handled sleeping correctly, I increased the processing delay to the order of seconds&#8230; and promptly found a set of issues that forced several rewrites of the code. The most important changes were switching to a software completion flag instead of relying on the hardware flag, and adding an interrupt handler to get a notification from the hardware.</p>
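<p>Kernel code cannot run on its own, so here is a hedged user-space analogue of the design described above, using POSIX threads: a thread plays the hardware plus its interrupt handler, which sets a software completion flag under a mutex and signals a condition variable, while the read side sleeps on the condition variable instead of polling a hardware register. In a real driver, a kernel wait queue (for example <tt>wait_event_interruptible()</tt> paired with <tt>wake_up_interruptible()</tt>) would play this role. All names, the doubling &#8220;computation&#8221;, and the delays are made up for illustration.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* User-space analogue of the driver's completion handling (hypothetical names).
   A mutex plus condition variable stand in for a kernel wait queue. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cv = PTHREAD_COND_INITIALIZER;
static bool completed = false;       /* software completion flag */
static unsigned result;

/* "Interrupt handler": store the result, set the software flag, wake readers. */
static void completion_irq(unsigned value) {
    pthread_mutex_lock(&lock);
    result = value;
    completed = true;
    pthread_cond_broadcast(&done_cv);
    pthread_mutex_unlock(&lock);
}

/* Blocking read: sleep until the flag is set instead of spinning on hardware. */
unsigned blocking_read(void) {
    pthread_mutex_lock(&lock);
    while (!completed)               /* loop also handles spurious wakeups */
        pthread_cond_wait(&done_cv, &lock);
    unsigned v = result;
    pthread_mutex_unlock(&lock);
    return v;
}

struct job { unsigned input; long delay_ns; };

/* Simulated accelerator: "processes" for delay_ns, then raises its interrupt.
   The doubling is a stand-in for whatever the real hardware computes. */
static void *device_thread(void *arg) {
    const struct job *j = arg;
    struct timespec d = { j->delay_ns / 1000000000L, j->delay_ns % 1000000000L };
    nanosleep(&d, NULL);
    completion_irq(j->input * 2);
    return NULL;
}

/* write() + read() in one call: start the device, then block for the result. */
unsigned run_operation(unsigned input, long delay_ns) {
    struct job j = { input, delay_ns };
    pthread_t dev;
    pthread_mutex_lock(&lock);
    completed = false;               /* reset the flag before the device starts */
    pthread_mutex_unlock(&lock);
    pthread_create(&dev, NULL, device_thread, &j);
    unsigned v = blocking_read();
    pthread_join(dev, NULL);
    return v;
}
```

<p>Because the delay is a parameter, the same test can be run with a processing time of zero or of many seconds, which is the kind of timing variation that shook out the bugs here.</p>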
<p>Then, on Friday, I demonstrated the setup, along with some new performance analysis tools, to some students who were testing it. The test program suddenly stopped working: it hung at the first call to read() and was never woken up.</p>
<p>The reason was a classic race condition: the code in the <tt>write()</tt> device driver call, which sends input data to the hardware device, waited until after the writing was complete (and then some more) before clearing the operation-complete flag. Here is the relevant piece of code:</p>
<pre>for(i=0;i&lt;words;i++) {
  write_register(SIMPLE_INPUT, kbuf[i]);
}
*f_pos = 0;
kfree(kbuf);
clear_completion_state();  /* too late: the completion IRQ may already have fired */</pre>
<p class="MsoNormal">With a sufficiently short delay to completion, the completion interrupt fired, was handled, and set the completion flag before the <span class="codeinline">write()</span> function even got to <span class="codeinline">clear_completion_state()</span>. The test program then called <span class="codeinline">read()</span> to get the result, and blocked because the completion flag was not set. The completion interrupt from the hardware had already triggered and its result had been deposited in the software flag, which was then promptly overwritten inside <span class="codeinline">write()</span>. Thus, inside <span class="codeinline">read()</span>, the flag never became set, and the process waited forever.</p>
<p class="MsoNormal">The fix is obvious: just move the clearing of the flag to <em>before </em>the writing to the hardware begins.</p>
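<p>The race is easy to reproduce deterministically in a small C model. In this sketch, which is not the actual driver, the simulated device completes instantly: its completion &#8220;interrupt&#8221; runs synchronously inside <tt>write_register()</tt> when the last input word arrives, which is the extreme end of a very short completion time. The helper names echo the driver code above but are simplified, hypothetical stand-ins. A real driver shares the flag with an interrupt handler running concurrently, so it also needs proper locking or ordering; this single-threaded model only shows why the position of the clear matters.</p>

```c
#include <stdbool.h>
#include <stddef.h>

/* Deterministic model of the bug (all names hypothetical). The simulated
   device completes instantly: its completion "interrupt" runs inside
   write_register() as soon as the last input word arrives. */
static bool completion_flag;         /* software flag set by the IRQ handler */
static size_t words_expected, words_seen;

static void completion_irq_handler(void) { completion_flag = true; }
static void clear_completion_state(void) { completion_flag = false; }

static void write_register(unsigned value) {
    (void)value;                     /* the data itself does not matter here */
    if (++words_seen == words_expected)
        completion_irq_handler();    /* device done: raise the interrupt */
}

/* Buggy order: clear the flag after feeding the device. */
void driver_write_buggy(const unsigned *kbuf, size_t words) {
    words_expected = words;
    words_seen = 0;
    for (size_t i = 0; i < words; i++)
        write_register(kbuf[i]);
    clear_completion_state();        /* wipes out a completion that already happened */
}

/* Fixed order: clear the flag before the hardware can possibly complete. */
void driver_write_fixed(const unsigned *kbuf, size_t words) {
    words_expected = words;
    words_seen = 0;
    clear_completion_state();
    for (size_t i = 0; i < words; i++)
        write_register(kbuf[i]);
}

/* What read() waits for: false here means a real read() would sleep forever. */
bool result_ready(void) { return completion_flag; }
```

<p>With the buggy ordering, <tt>result_ready()</tt> returns false even though the device has finished, which is exactly the situation where a real <tt>read()</tt> sleeps forever; with the clear moved first, the completion survives.</p>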
<p class="MsoNormal">Generalizing from this fine example of concurrency carelessness: it is a really good, if accidental, demonstration of how varying the timing in a virtual platform can shake the code and find timing-related bugs much more efficiently than is possible on physical hardware.</p>
<p class="MsoNormal">Had I modeled the exact (or even approximate) timing of a particular hardware implementation, this kind of bug would not have been found, and the driver code would not have been as robust. An implementation relying on a very short completion time could check the hardware operation-complete flag directly, but that broke down when the delay was long. The buggy implementation above worked fine with a long completion time, but broke down with a short one. The fixed implementation works across completion times from 10 ns to 10 s or more, which I think is all you can ask for.</p>
<p class="MsoNormal">A short, fun Simics note on this: the timing parameter can be changed at run time, in the middle of a run, from the Simics command line with a simple one-line command:</p>
<pre class="MsoNormal" style="padding-left: 30px;"><span style="color: #0000ff;">simics&gt; </span>sd0-&gt;time_to_result = 10.0e-9</pre>
<p class="MsoNormal">It is really nice working with a system like that!</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/337/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Learning Linux Device Drivers on a Virtual PowerPC</title>
		<link>http://jakob.engbloms.se/archives/330?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/330#comments</comments>
		<pubDate>Sun, 02 Nov 2008 10:02:41 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[embedded software]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[teaching]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[DML]]></category>
		<category><![CDATA[endianness]]></category>
		<category><![CDATA[freescale]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[operating systems]]></category>
		<category><![CDATA[power architecture]]></category>
		<category><![CDATA[Simics]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=330</guid>
		<description><![CDATA[There are times when working with virtual hardware and not real hardware feels very liberating and efficient (not to mention safe). Bringing up, modifying, and extending operating systems is one obvious such case. Recently, I have been preparing open-source-based demonstration and education systems based on embedded PowerPC machines, and teaching myself how to do [...]]]></description>
			<content:encoded><![CDATA[<p><img class="size-medium wp-image-329 alignleft" style="margin: 5px 10px;" title="penguin-variant" src="http://jakob.engbloms.se/wp-content/uploads/2008/11/penguin-variant.png" alt="" width="100" height="118" /></p>
<p>There are times when working with virtual hardware rather than real hardware feels very liberating and efficient (not to mention safe). Bringing up, modifying, and extending operating systems is one obvious such case. Recently, I have been preparing open-source-based demonstration and education systems based on <a href="http://www.virtutech.com/solutions/virtual_platform/powerpc/freescale/mpc8641d.html">embedded PowerPC machines</a>, and teaching myself how to write Linux device drivers in the process. This really showed virtual platforms at their best.</p>
<p><span id="more-330"></span></p>
<p>The final result of my efforts will be made more public early next year, when the students I have put to work on my Linux-based setup come back and show me what they accomplished (or not). Until then, here are some small tidbits on how easy it is to work with kernel-level code in a virtual machine. Actually, had I been working on real hardware, I am not sure I would have anything but a bricked machine in front of me. To put it simply, flash reprogramming seems to hate me, and I have managed to break or destroy a few embedded boards that were unlucky enough to cross my path.</p>
<p>The virtual platform was really very helpful to diagnose all the mistakes I made while creating my driver and making it talk to my custom hardware.</p>
<p>First of all, it was dead easy to test a new version of the driver: start the simulation from a checkpoint of a booted and configured machine, load the driver into the target file system using the Simicsfs backdoor (similar to VMware&#8217;s shared folders), and then insmod it. This was automated in a script that typed the needed commands on the target command line with no manual intervention. Each iteration takes a few seconds, which is just as fast and convenient as testing a simple program directly on the host.</p>
<p>Diagnosing what went wrong was greatly facilitated by the simulator: did the driver access the device I had prepared for it? Were values read as expected? Obviously, there were many such questions to answer, as I am not the most expert device driver programmer (yet).</p>
<p>Here is one particularly interesting example: I learnt empirically that the Linux kernel &#8220;readl&#8221; function always reads data as little-endian, even on a big-endian machine. You have to use &#8220;readl_be&#8221; to get big-endian data from a big-endian device attached to a big-endian machine. I guess the behavior makes sense for reusing drivers across architectures, but it sure confused me when my driver was reading the right register but complaining about bad contents.</p>
<p>The simulator showed the problem very plainly:</p>
<ul>
<li>&#8220;value read is 0xabcd0101 (BE)&#8221;. OK, that looks right.</li>
<li>&#8220;register r3 contains 0x0101cdab&#8221;. Strange, that looks like the wrong byte order. &#8220;Why?&#8221; I screamed to myself.</li>
<li>Using reverse execution to step back one instruction showed that the load instruction used was a byte-swapping 32-bit access. Aha!</li>
<li>Dig into the Linux kernel headers (include/asm/io.h), find that a bunch of other variants were available, and guess that readl_be() was the right solution.</li>
<li>Change the device driver code, recompile, and retest. Now it worked.</li>
</ul>
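<p>The byte-order confusion can be reproduced in portable C. The helper names below are hypothetical stand-ins, not kernel APIs: <tt>read_le32</tt> interprets four bytes the way <tt>readl()</tt> does (little-endian), and <tt>read_be32</tt> the way <tt>readl_be()</tt> does. Fed the bytes of the device register from the trace above, they give exactly the two values seen in the simulator.</p>

```c
#include <stdint.h>

/* Little-endian interpretation of four bytes, as readl() does. */
uint32_t read_le32(const uint8_t b[4]) {
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Big-endian interpretation of the same bytes, as readl_be() does. */
uint32_t read_be32(const uint8_t b[4]) {
    return (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 |
           (uint32_t)b[2] << 8 | (uint32_t)b[3];
}

/* The byte-swapping load seen in the simulator is equivalent to this. */
uint32_t swap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}
```

<p>Both functions read the same bytes; only the interpretation differs, which is why the right register could appear to hold the &#8220;wrong&#8221; value.</p>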
<p>I would have assumed that the book I was using as my guide, the highly recommended <a href="http://lwn.net/Kernel/LDD3/">Linux Device Drivers, 3rd edition</a>, would have told me this. But it did not, as it is annoyingly tied to the horrible standard PC. It could really do with some extra chapters on drivers for PowerPC, ARM, and MIPS (to name some of the most important non-x86 architectures out there).</p>
<p>On the other side of the fence, I am using <a href="http://www.virtutech.com/products/simics-modelbuilder.html">Virtutech DML</a> to model the actual device, and that is working out very well. In my current setup, I can change the device driver and the hardware it drives, recompile both, and then run an automated test script that starts from a checkpoint, inserts the hardware model into the target memory space, loads the device driver, and tests it, all in about five seconds. Very handy, and completely automatic. The ability to load and insert hardware models on the fly during simulation is really convenient here; otherwise, I would have to reboot the target Linux from scratch each time I wanted to add or remove something from the virtual platform hardware setup.</p>
<p>To sum things up: so far, I have learnt quite a lot about writing Linux device drivers and how to set up hardware in a Linux system, and I think it would have been much harder to learn and experiment the way I have had I been stuck with physical hardware (not to mention the plain impossibility of simply inserting a new piece of hardware into a physical system).</p>
<p>It really shows that quite often, virtual hardware is &#8220;even better than the real thing&#8221;.</p>
<p>For fun, here is a screenshot of a complete test run of loading the device driver:</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2008/11/hsi-course-complete-test-run-rebuilt-device-and-driver.png"><img class="aligncenter size-medium wp-image-335" title="hsi-course-complete-test-run-rebuilt-device-and-driver" src="http://jakob.engbloms.se/wp-content/uploads/2008/11/hsi-course-complete-test-run-rebuilt-device-and-driver-300x187.png" alt="" width="300" height="187" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/330/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>What is Efficiency when Cores are Free?</title>
		<link>http://jakob.engbloms.se/archives/269?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/269#comments</comments>
		<pubDate>Sat, 13 Sep 2008 16:48:19 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[conferences]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[embedded systeme]]></category>
		<category><![CDATA[multicore computer architecture]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[virtualization]]></category>
		<category><![CDATA[efficiency]]></category>
		<category><![CDATA[manycore]]></category>
		<category><![CDATA[operating systems]]></category>
		<category><![CDATA[SiCS Multicore days]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=269</guid>
		<description><![CDATA[More from the SiCS multicore days 2008. There were some interesting comments on how to define efficiency in a world of plentiful cores. The theme from my previous blog post called &#8220;Real-Time Control when Cores Become Free&#8221; came up several times during the talks, panels, and discussions. It seems that this year, everybody agreed that [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-270" style="margin-left: 5px; margin-right: 5px;" title="onoff" src="http://jakob.engbloms.se/wp-content/uploads/2008/09/onoff.png" alt="" width="72" height="70" />More from the SiCS multicore days 2008.</p>
<p>There were some interesting comments on how to define efficiency in a world of plentiful cores. The theme from my previous blog post, &#8220;<a href="http://jakob.engbloms.se/archives/123">Real-Time Control when Cores Become Free</a>&#8221;, came up several times during the talks, panels, and discussions. This year, everybody seemed to agree that we are heading toward 100s or 1000s of &#8220;self-respecting&#8221; cores on a single chip, and that with such core counts it is not too important to keep every core busy at all times at any cost. As I stated earlier, cores and instructions are now free, while other aspects are the limiting factors, which turns the classic optimization imperatives of computing on their head. Operating systems will become more about space-sharing than time-sharing, and it might make sense to dedicate processing cores to the sole job of impersonating peripheral units or doing polling work. Operating systems can also be simplified when the job of time-sharing is taken away, even if communication and resource management might well bring in some interesting new issues.</p>
<p>So, what is efficiency in this kind of environment?</p>
<p><span id="more-269"></span></p>
<p>It was clear from both the panel discussion and the discussions over lunch that it is worth giving up an absolute 100% load on all cores in exchange for programmer productivity and predictability. Just as making 100% use of main memory is not usually a design goal today, making 100% use of all processor cores will not be a reasonable goal tomorrow. Some resources are so plentiful that it makes sense not to push their usage to the limit.</p>
<p>With 100s of cores, it is quite likely that even for the most performance-demanding workloads, such as LTE decoding, it is not worth the herculean effort to keep all cores running at full speed all the time. Getting 80% to 90% of the cores working on a workload is probably a good tradeoff.</p>
<p>Another tradeoff you can make is to increase determinism and debuggability by assigning tasks and schedules in a more static and predictable way. Instead of trying to balance loads across the cores, tasks could be assigned in a static or semi-static manner, so that an execution of the system has a reasonable chance of being repeated. That should not be too hard if all cores run a static cyclic scheduler, for example, or even just a single task each. Dynamic scheduling might well be a global suboptimization in a world with plenty of cores, as it makes things more complex for a fairly small increase in actual efficiency. You could also imagine putting debug agents and code on certain cores just to get better insight into what the system is doing. This is a bit like what <a href="http://jakob.engbloms.se/archives/17">I blogged about after last year&#8217;s Multicore Day</a>, when I asked designers to put more silicon into debug functionality. Maybe in a device with 100s of cores, we will allocate cores to debugging as well (though I do not think we can do without dedicated debug circuitry, as that is needed to effect things like stopping cores quickly).</p>
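<p>To make the static option a bit more concrete, here is a minimal Python sketch. It assumes that the tasks are known up front and are simply dealt out round-robin to cores at configuration time; the task names and the <code>static_assignment</code> and <code>cyclic_trace</code> helpers are hypothetical, not taken from any real scheduler. Each core then runs its fixed list in the same order every major frame, so the trace comes out identical from run to run, and surplus cores just stay idle.</p>

```python
def static_assignment(tasks: list[str], n_cores: int) -> dict[int, list[str]]:
    """Deal tasks out to cores round-robin, once, at configuration time.

    The mapping never changes at run time (no load balancing), so every run
    places each task on the same core. Cores beyond the task count get an
    empty list and could be powered off.
    """
    return {core: tasks[core::n_cores] for core in range(n_cores)}


def cyclic_trace(plan: dict[int, list[str]], major_frames: int) -> list[tuple[int, int, str]]:
    """List what each core executes per major frame, core by core.

    On real hardware the cores run in parallel; this only flattens their
    fixed cyclic tables into (frame, core, task) tuples. Because the tables
    are static, the trace is the same every time it is generated.
    """
    trace = []
    for frame in range(major_frames):
        for core in sorted(plan):
            for task in plan[core]:
                trace.append((frame, core, task))
    return trace


# Hypothetical example: five tasks on three cores.
plan = static_assignment(["rx", "decode", "encode", "tx", "log"], n_cores=3)
print(plan)  # {0: ['rx', 'tx'], 1: ['decode', 'log'], 2: ['encode']}
print(cyclic_trace(plan, major_frames=1)[:3])  # [(0, 0, 'rx'), (0, 0, 'tx'), (0, 1, 'decode')]
```

Round-robin is just the simplest static policy; any offline mapping (by hand, or from a scheduling tool) gives the same repeatability as long as it is fixed before the system starts.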
<p>When I heard this, my gut reaction was &#8220;hey, that is not particularly green&#8221;: any kind of waste of resources is anathema to the ecologically friendly society we need to build over the next 10-20 years. But then someone pointed out that a key part of the efficiency equation is that you turn off the unused cores and accelerators so that they do not consume any power. And since the core count keeps increasing for basically the same use of resources (manufacturing a chip costs about the same amount of energy and materials per chip, but with finer geometries you can pack twice as many cores into it), it should be fine. It should also be noted that multicore computing by itself allows for more efficient processing units, for a variety of reasons.</p>
<p>Robustness also tends to increase if you have some slack in your system. For example, most hard real-time systems insist on staying below 80% load or so (on a single CPU) even in the worst tested cases, to leave some margin for the inevitable unexpected situations. For a device with 100s of cores, you might also want to keep some cores in reserve in case hardware faults crop up in certain parts of the chip. You can then shift loads to other cores (which obviously requires a pretty resilient interconnect to make any sense).</p>
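<p>The slack argument can be turned into back-of-the-envelope arithmetic. This Python sketch rests on assumed numbers: worst-case work is counted in percent of one core (600 means six fully loaded cores), the 80% cap is the rule of thumb above, and two spare cores are held back. The hypothetical <code>fail_over</code> helper moves a failed core&#8217;s tasks onto a spare, which of course presumes the interconnect lets the spare reach everything those tasks need.</p>

```python
def cores_needed(total_load_pct: int, limit_pct: int = 80, spares: int = 2) -> int:
    """Cores to provision so that no core exceeds limit_pct of its capacity
    under worst-case load, plus spare cores reserved for hardware faults.

    total_load_pct is the worst-case work in percent of one core, so 600
    means six fully loaded cores. Integer math avoids float rounding.
    """
    busy = -(-total_load_pct // limit_pct)  # ceiling division
    return busy + spares


def fail_over(plan: dict[int, list[str]], failed: int,
              spares: list[int]) -> tuple[dict[int, list[str]], list[int]]:
    """Move every task of a failed core onto the first unused spare core.

    Returns the new core-to-tasks mapping and the spares still available.
    """
    if not spares:
        raise RuntimeError("no spare cores left")
    target, remaining = spares[0], spares[1:]
    new_plan = {core: list(tasks) for core, tasks in plan.items() if core != failed}
    new_plan[target] = list(plan[failed])
    return new_plan, remaining


# Six cores' worth of worst-case work: ceil(600 / 80) = 8 busy cores + 2 spares.
print(cores_needed(600))  # 10

plan, spares = fail_over({0: ["rx"], 1: ["decode"]}, failed=1, spares=[8, 9])
print(plan, spares)  # {0: ['rx'], 8: ['decode']} [9]
```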
<p>This brings me to my final thought on this way of building computing systems: in some ways, we get closer to the habits of physical engineering when cores are free. We do not build bridges with the minimum amount of concrete and steel needed to handle the load we expect. Instead, there is a safety factor of three or five or so, to make sure that even in the most unexpected circumstances, the bridge will still stand. In a similar way, we might be able to use lots of free cores to engineer software systems with far more resilience than today&#8217;s systems, which keep trying to make maximum use of clock cycles and instruction throughput. I do not quite know what that kind of system would look like, but the analogy is very interesting.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/269/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lego Racers Boardgame &#8212; and why Old is Better in Software (mostly)</title>
		<link>http://jakob.engbloms.se/archives/256?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/256#comments</comments>
		<pubDate>Mon, 08 Sep 2008 07:30:05 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[desktop software]]></category>
		<category><![CDATA[multicore software]]></category>
		<category><![CDATA[parallel computing]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[boardgames]]></category>
		<category><![CDATA[lego]]></category>
		<category><![CDATA[maturity]]></category>
		<category><![CDATA[operating systems]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=256</guid>
		<description><![CDATA[This might appear to be a stretched analogy, but it struck me as obvious when I tried playing the Lego Racers boardgame with my 3-year-old this weekend. The game is ranked pretty low on Boardgamegeek, and deservedly so. The promise and premise are great: use Lego cars to race around a track and pick [...]]]></description>
			<content:encoded><![CDATA[<p>This might appear to be a stretched analogy, but it struck me as obvious when I tried playing the <a href="http://www.boardgamegeek.com/game/10043">Lego Racers boardgame</a> with my 3-year-old this weekend. The game is ranked pretty low on Boardgamegeek, and deservedly so. The promise and premise are great: use Lego cars to race around a track and pick up new pieces to modify the powers of your car&#8230; sounds like great fun, right? But it is not, and that&#8217;s where my analogy with the age of software comes in.</p>
<p><span id="more-256"></span></p>
<p>Lego Racers is a very buggy game. It takes almost no playing to reach a situation that is not covered by the rules, which would seem to indicate that play testing was not part of the design process. The designers seem to have made the same mistake as many programmers: exploring the obvious, primary path of execution without thinking about what could go differently or go wrong. For something as simple as this game, that is simply sloppy. For something as complex as, say, an operating system or a telephone switch, it is more understandable.</p>
<p>This is where the age of software comes in: the more a particular piece of software has been used, the more cases will have been explored, and the more errors, mistakes, and simple design holes will have been fixed. Steve Gibson of <a href="http://www.grc.com">grc.com</a> often says on Leo Laporte&#8217;s <a href="http://www.twit.tv/sn">Security Now</a> podcast that a completely new piece of software cannot be said to be secure, since there is no evidence to support that. It might well have been developed, in principle, with lots of security in mind. But until it has been proven in the real world against real adversaries, there is no support for calling it secure.</p>
<p>It is also the case that no amount of internal testing can provide full coverage of all the cases that will appear in actual use at real customer and user sites. It is a matter of volume, but also a matter of sheer inspiration and creativity. Someone with a real problem to solve will use the tools they have in any way they can imagine&#8230; and your own developers cannot be expected to have the same imagination as a user population many times their own size. That&#8217;s why beta testing, customer early access, and iterative development are so important: only then will all the possible ways of using something be explored. It often turns out that users believe your software can do something you never quite thought it would, and you sometimes have to add specific limitations to the documentation saying &#8220;sorry, you cannot do that (for some not initially obvious but deep technical reason)&#8221;.</p>
<p>It also puts an interesting perspective on new, creative software. Any new software entering a market will not support all possible users and all possible use cases. If there is an established older piece of software in the same domain, the new software will tend to solve fewer problems and handle fewer odd boundary conditions. The new software will typically be designed to solve some part of the problem better (or cheaper) than existing software (otherwise, its existence is hard to motivate), but initially it will not have the breadth and depth of coverage that a decade-old package has, simply because it has not yet been exposed to users and their creativity and requirements for a long time.</p>
<p>That sounds like an awfully academic argument, so here is a concrete example: the Linux operating system is now catching up with old heavyweights like Solaris and AIX in terms of scalability, robustness, and features. Solaris still seems to scale better to really large numbers of cores and processes, but compared to where Linux was in the pre-2.6 kernel versions, the situation is vastly improved. Doing multiprocessing like that well simply seems to take calendar time; more users do not help. You need the grind of transitioning through a few generations of hardware of different types, with different trends being judged important along the way. Similarly, the real-time operating systems that are now becoming SMP-aware will not scale as well as Linux does, at least not when judged on flat shared-memory designs. They do have their areas of merit, but they will not accumulate the same kind of shared-memory experience until a few years have passed. On the other hand, in the domains where predictability, control, and performance count, they are far superior to general-purpose operating systems like Linux and Solaris, since they have accumulated far more experience there. The same pattern is evident in military history: time and again, &#8220;seasoned troops&#8221; have shown a quality that no amount of quantity or training of fresh troops can match. Troops and equipment have to prove themselves in real battles against real enemies before their true quality can be assessed and their full potential realized.</p>
<p>To sum up, it seems to me that while cool new software is exciting to write and exciting to use, in heavy-duty real-world use you want seasoned, well-aged software that has been proven and tested in a wide variety of real-world cases over a significant period of time. Nothing beats experience, at least as long as the software system is maintained well enough that it stays open to future evolution. Old software does sometimes get worse with age&#8230; but there are many examples that, like fine wines, just tend to get better with time.</p>
<p>As a final aside, we also own some other Lego-branded children&#8217;s games, and while they are not the most complex games in existence, they are at least consistent and work without problems. So the Lego brand itself need not be avoided.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/256/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The 1970 rule strikes again: Virtual Platform Principles in 1967</title>
		<link>http://jakob.engbloms.se/archives/130?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/130#comments</comments>
		<pubDate>Fri, 30 May 2008 20:37:31 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[history of computing]]></category>
		<category><![CDATA[multicore computer architecture]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[virtualization]]></category>
		<category><![CDATA[1969]]></category>
		<category><![CDATA[HITAC-8400]]></category>
		<category><![CDATA[Hitachi]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[operating systems]]></category>
		<category><![CDATA[race condition]]></category>
		<category><![CDATA[Temporal decoupling]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=130</guid>
		<description><![CDATA[Being a bit of a computer history buff, I am often struck by how most key concepts and ideas in computer science and computer architecture were all invented in some form or other before 1970, commonly by IBM. This goes for caches, virtual memory, pipelining, out-of-order execution, virtual machines, operating systems, multitasking, byte-code [...]]]></description>
			<content:encoded><![CDATA[<p>Being a bit of a computer history buff, I am often struck by how most key concepts and ideas in computer science and computer architecture were all invented in some form or other before 1970, commonly by IBM. This goes for caches, virtual memory, pipelining, out-of-order execution, virtual machines, operating systems, multitasking, byte-code machines, etc. Even so, I have found an extraordinary example of this that actually surprised me with its range of modern techniques. This is a follow-up to a previous post, written after having actually digested <a href="http://jakob.engbloms.se/archives/121">the paper I talked about earlier</a>.</p>
<p><span id="more-130"></span></p>
<p>The paper in question was published in 1969 and is titled &#8220;<a href="http://portal.acm.org/citation.cfm?id=961053.961092&amp;coll=ACM&amp;dl=ACM&amp;CFID=67556471&amp;CFTOKEN=25257537">A program simulator by partial interpretation</a>&#8221;. In the previous post, I took note of its use of direct execution of software plus trapping of privileged instructions, but those were not really the most interesting bits in there.</p>
<p>They lay out, in quite simple terms, most of the key ideas behind today&#8217;s fast virtual platforms. Here are the best parts:</p>
<ul>
<li>They note that simulation of a computer is often used to overcome debugging difficulties, in particular repeating failed runs and tracing all that is going on in the target machine.</li>
<li>They are hunting down race conditions using the simulator.</li>
<li>They use recorded input and output to drive a deterministic simulation even of workloads involving communication with the external world.</li>
<li>They simulate multiple processors on top of a single physical processor by giving each processor a certain time slice to do its work before switching to the next processor. This is known today as temporal decoupling or quantized simulation, and it is key to the high speed of solutions such as Simics. They note the same tradeoff that we see today, 40 years later: shorter slices depict the parallelism more accurately, but also cost performance.</li>
<li>The temporally decoupled simulation also includes timers and similar non-CPU hardware, just as we do it today for virtual platforms.</li>
<li>In the temporally decoupled simulation, they optimize the simulation of the IDL (Idle) instruction: when it is encountered, they skip immediately to the end of the time slice. This is what we today call idle-loop optimization or hypersimulation, and it is absolutely key to achieving scalable simulation of large multiprocessor and multi-machine setups (since most parts of a system are not usually maximally loaded).</li>
<li>They are debugging operating systems on the simulator, not just user-level code.</li>
</ul>
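<p>The temporal decoupling and idle-skipping ideas from the list above fit in a few lines of code. This is a toy Python model, not the HITAC-8400 simulator and not Simics: each simulated CPU interprets a made-up instruction stream for a whole time slice (quantum) before the next CPU gets its turn, an <code>IDL</code> instruction jumps straight to the end of the slice, and a timer device is brought up to date once per quantum. The class and function names are assumptions for illustration.</p>

```python
from dataclasses import dataclass


@dataclass
class SimCPU:
    """A simulated processor with a toy instruction stream.

    "IDL" models the idle instruction; anything else counts as one cycle
    of real work. The stream wraps around when it reaches the end.
    """
    name: str
    program: list[str]
    pc: int = 0
    local_time: int = 0   # virtual cycles this CPU has consumed
    executed: int = 0     # instructions actually interpreted (host cost)

    def run_slice(self, quantum: int) -> None:
        end = self.local_time + quantum
        while self.local_time < end:
            op = self.program[self.pc % len(self.program)]
            self.pc += 1
            if op == "IDL":
                # Idle-loop optimization (hypersimulation): nothing happens
                # until the next event, so skip to the end of the slice.
                self.local_time = end
            else:
                self.executed += 1
                self.local_time += 1


@dataclass
class Timer:
    """A non-CPU device, synchronized with virtual time once per quantum."""
    period: int
    ticks: int = 0

    def advance_to(self, now: int) -> None:
        self.ticks = now // self.period


def simulate(cpus: list[SimCPU], timer: Timer, quantum: int, total_cycles: int) -> int:
    """Temporally decoupled simulation of several CPUs on one host thread.

    Each CPU runs a full quantum in turn. Returns the number of rounds, i.e.
    how many times the simulator cycled through all CPUs.
    """
    now, rounds = 0, 0
    while now < total_cycles:
        for cpu in cpus:
            cpu.run_slice(quantum)
        now += quantum
        rounds += 1
        timer.advance_to(now)
    return rounds


busy = SimCPU("busy", ["ADD", "SUB"])
idle = SimCPU("idle", ["IDL"])
timer = Timer(period=100)
rounds = simulate([busy, idle], timer, quantum=50, total_cycles=1000)
print(rounds, busy.executed, idle.executed, timer.ticks)  # 20 1000 0 10
```

Rerunning with <code>quantum=10</code> gives the same counts for these two independent CPUs but 100 rounds instead of 20: a shorter slice would let interacting CPUs see each other&#8217;s effects sooner, at the price of more switching. The idle CPU costs only one interpreted step per slice, however long it sits idle.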
<p>The computer in question is a Japanese System/360-compatible machine called the <a href="http://www.ipsj.or.jp/katsudou/museum/computer/0610_e.html">HITAC-8400</a>. The work was reported in 1969, but actually carried out in 1967.</p>
<p>There are some differences in scale and kind compared to today&#8217;s virtual platforms, but none that detract from the underlying principles. The 1967 system is host-on-host, so it is not the kind of cross-environment that is most common in today&#8217;s virtual platforms (Power Arch on x86, ARM on x86, etc.). The IO system is much easier to simulate since it is part of the instruction set of the processor rather than being a set of complex memory-mapped peripherals.</p>
<p>So the 1970 rule strikes again. Not the IBM rule this time, though: this was all done by Hitachi. There are traces of similar work at IBM in other papers, but I have not been able to locate actual copies of any such publication.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/130/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Off-Topic: Vista Refuses Aero for Java</title>
		<link>http://jakob.engbloms.se/archives/43?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/43#comments</comments>
		<pubDate>Tue, 23 Oct 2007 20:16:20 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[desktop software]]></category>
		<category><![CDATA[Aero]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[operating systems]]></category>
		<category><![CDATA[Vista]]></category>
		<category><![CDATA[Windows]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/archives/43</guid>
		<description><![CDATA[Vista just gave me an interesting error message: running a serious Java program required it to turn off the Aero interface. Interesting. What can Java have done to deserve that?]]></description>
			<content:encoded><![CDATA[<p><a title="Vista Aero does not work with Java" href="http://jakob.engbloms.se/wp-content/uploads/2007/10/vista-aero-cannot-work-with-java.png"><img title="Vista Aero does not work with Java" src="http://jakob.engbloms.se/wp-content/uploads/2007/10/vista-aero-cannot-work-with-java.thumbnail.png" alt="Vista Aero does not work with Java" hspace="20" align="left" /></a>Vista just gave me an interesting error message: running a serious Java program required it to turn off the Aero interface. Interesting. What can Java have done to deserve that?</p>
<p><span id="more-43"></span> <a title="Vista Aero does not work with Java" href="http://jakob.engbloms.se/wp-content/uploads/2007/10/vista-aero-cannot-work-with-java.png"><img src="http://jakob.engbloms.se/wp-content/uploads/2007/10/vista-aero-cannot-work-with-java.png" alt="Vista Aero does not work with Java" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/43/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>AMP vs Virtualization</title>
		<link>http://jakob.engbloms.se/archives/22?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/22#comments</comments>
		<pubDate>Thu, 13 Sep 2007 20:26:29 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[embedded software]]></category>
		<category><![CDATA[embedded systeme]]></category>
		<category><![CDATA[uncategorized]]></category>
		<category><![CDATA[virtualization]]></category>
		<category><![CDATA[AMP]]></category>
		<category><![CDATA[operating systems]]></category>
		<category><![CDATA[SMP]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/archives/22</guid>
		<description><![CDATA[It just dawned on me recently (and it sure must have been obvious to those working with Asymmetric Multiprocessing (AMP) systems) that in an AMP setup, the operating systems involved actually know about each other and have to account for the fact that they are sharing a single processor chip with other operating [...]]]></description>
			<content:encoded><![CDATA[<p>It just dawned on me recently (and it sure must have been obvious to those working with Asymmetric Multiprocessing (AMP) systems) that in an AMP setup, the operating systems involved actually know about each other and have to account for the fact that they are sharing a single processor chip with other operating systems. So you cannot just take two single-core operating system images from an existing multiprocessor solution (with local memory per processor), put them on a single chip, and expect things to just work. You need to prepare the boot process and find a way to nicely share the common I/O devices, timers, accelerator engines, and other resources on the chip. This is materially different from a virtualized setup.</p>
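<p>One way to picture the preparation an AMP setup needs is as a static partition of the chip that has to be checked before boot. This Python sketch assumes a made-up configuration format in which each OS image lists the cores, the physical memory range, and the devices it expects to own; it is not the configuration format of any real AMP product. Two single-core images taken unchanged from a multi-board system may well both expect the same timer or UART, which is exactly the kind of conflict it reports.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OsImage:
    """Hypothetical description of one OS image in an AMP configuration."""
    name: str
    cores: frozenset[int]
    mem: tuple[int, int]       # physical address range [start, end)
    devices: frozenset[str]


def amp_conflicts(images: list[OsImage]) -> list[str]:
    """Return every resource claimed by more than one OS image.

    An empty list means the static partition is consistent; anything else
    must be resolved (for example by giving one OS ownership of a shared
    device) before the images can share the chip.
    """
    problems = []
    for i, a in enumerate(images):
        for b in images[i + 1:]:
            pair = f"{a.name}/{b.name}"
            for core in sorted(a.cores & b.cores):
                problems.append(f"{pair}: both use core {core}")
            if a.mem[0] < b.mem[1] and b.mem[0] < a.mem[1]:
                problems.append(f"{pair}: overlapping memory")
            for dev in sorted(a.devices & b.devices):
                problems.append(f"{pair}: both claim {dev}")
    return problems


linux = OsImage("linux", frozenset({0, 1}), (0x0000_0000, 0x4000_0000),
                frozenset({"uart0", "eth0", "timer0"}))
rtos = OsImage("rtos", frozenset({2}), (0x4000_0000, 0x5000_0000),
               frozenset({"uart1", "timer0"}))
print(amp_conflicts([linux, rtos]))  # ['linux/rtos: both claim timer0']
```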
<p><span id="more-22"></span><br />
In a virtualization-based setup, a single hypervisor program controls several single-processor operating systems running on the machine. The hypervisor also takes care of allocating shared resources to the operating systems, sometimes by sharing a single physical resource between them, sometimes by letting only one operating system access a certain resource. So in this case, you can actually reuse existing OS images on a new multicore chip and transparently transform an existing system.</p>
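<p>For contrast, here is an equally hypothetical Python sketch of the hypervisor side. The hypervisor owns every physical device: devices it can multiplex give each guest its own virtual instance, while the others are assigned to exactly one guest. Because both guests can ask for <code>timer0</code> and each get one, unmodified images do not need to know about each other. The class and its naming scheme are assumptions for illustration, not how Xen or any other real hypervisor is organized.</p>

```python
class Hypervisor:
    """Toy model of a hypervisor that owns all physical devices."""

    def __init__(self, exclusive: set[str], shared: set[str]) -> None:
        self.exclusive = set(exclusive)   # handed to one guest only
        self.shared = set(shared)         # multiplexed between guests
        self.owner: dict[str, str] = {}

    def attach(self, guest: str, device: str) -> str:
        """Give a guest access to a device and return what the guest sees."""
        if device in self.shared:
            # Each guest gets its own virtual instance of the device.
            return f"virtual {device} for {guest}"
        if device in self.exclusive:
            current = self.owner.setdefault(device, guest)
            if current != guest:
                raise PermissionError(f"{device} is assigned to {current}")
            return device  # direct access for the owning guest
        raise KeyError(f"unknown device {device}")


hv = Hypervisor(exclusive={"eth0"}, shared={"timer0", "uart0"})
print(hv.attach("os_a", "timer0"))  # virtual timer0 for os_a
print(hv.attach("os_b", "timer0"))  # virtual timer0 for os_b
print(hv.attach("os_a", "eth0"))    # eth0
```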
<p>Too bad there is still no embedded processor with strong support for heavy-duty virtualization like this.</p>
<p>On the other hand, it might be a passing need. Moving old applications to new hardware always involves some rewriting and retouching, and if that means changing the OS setup a bit to handle an AMP case nicely, it is probably not too expensive (compared to redoing the applications on top of the OS to be truly SMP). For virtualization, this means that you can use a Xen-style paravirtual approach, where the OS is modified to run on top of a simple hypervisor.</p>
<p>Running and booting an unmodified binary install of an OS is likely more of a server/desktop problem than one for embedded applications. We are going to see hardware support for virtualization that makes light-weight approaches even more efficient, and that also tackles the security issue of rogue code getting into an OS image. Hardware support is needed to contain an OS that has been taken over by bad guys; no amount of cooperation between OSes in an AMP setting is going to prevent that.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/22/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fastscale minimal virtual machines &#8212; beautiful simple idea</title>
		<link>http://jakob.engbloms.se/archives/16?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/16#comments</comments>
		<pubDate>Tue, 28 Aug 2007 19:01:26 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[business issues]]></category>
		<category><![CDATA[virtualization]]></category>
		<category><![CDATA[Fastscale]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[operating systems]]></category>
		<category><![CDATA[Windows]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/archives/16</guid>
		<description><![CDATA[A company called Fastscale Technologies has a product that is simple in concept and yet very powerful. Instead of using complete installs of heavy operating systems like Linux or Windows to run applications on virtual machines, they offer tools that provide minimal operating system configurations that are tailored to the needs of a particular application. [...]]]></description>
			<content:encoded><![CDATA[<p>A company called <a href="http://www.fastscale.com/">Fastscale Technologies</a> has a product that is simple in concept and yet very powerful. Instead of using complete installs of heavy operating systems like Linux or Windows to run applications on virtual machines, they offer tools that provide minimal operating system configurations that are tailored to the needs of a particular application. Since only that application is going to be run on the virtual machine, this is sufficient. <a href="http://www.theregister.co.uk/2007/08/27/fastscale_vmware_virtual_manager/">According to press reports, </a>this means that you can run several times more virtual machines on a given host, compared to default OS installs. And boot an order of a magnitude faster.</p>
<p><span id="more-16"></span>The basic premise definitely makes sense. We have seen this at <a href="http://www.virtutech.com">Virtutech</a> when working with stripped-down Linux images on various embedded boards. Turning individual services on and off often has a significant impact on both the execution speed and the memory consumption of a particular target. The gut reaction when setting up a new target is to think about what can be stripped out, rather than to put in everything that could be useful, since the latter results in a bloated image that consumes resources for little added value.</p>
<p>The difference is even more obvious for Windows machines: the same virtual machine runs at very different speeds depending on whether it runs NT, 2000, XP, or Vista. With Vista and XP in particular, a lot goes on in the background, eating up memory and processor time. Simple tuning tricks like turning off GUI animations or background indexing can have a very large impact on speed. The same goes for graphical desktop Linux distributions, where turning off eye candy gives significant speed increases.</p>
<p>So I believe that Fastscale can deliver what they promise; it is a nice, simple idea. But as always with a simple idea that makes for a powerful product, there is more to it than the idea itself: there is a somewhat tricky piece of execution.</p>
<p>The trick that Fastscale brings is to automate the creation of a minimal but sufficient substrate for a particular application, given the application, the target OS, and the server hardware. It sounds like plain dependency checking once you have data on what needs what, but working out the actual dependency chains in any particular OS is hard work. Doing it manually is pretty painful, and the Linux kernel configurator is not always up to the task.</p>
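<p>To make the "what needs what" step concrete, here is a minimal sketch of the dependency-checking part only, in Python. It assumes the hard part (a complete, correct map from each component to the components it directly needs) already exists; the component names are hypothetical placeholders, and this is not a description of how Fastscale's tools actually work. Everything reachable from the application is kept, and everything else that is installed becomes a candidate for removal.</p>

```python
from collections import deque


def required_closure(roots, deps):
    """Return every component transitively needed by `roots`.

    `deps` maps a component to the components it directly needs.
    Breadth-first traversal; the `needed` set also guards against cycles.
    """
    needed = set()
    queue = deque(roots)
    while queue:
        comp = queue.popleft()
        if comp in needed:
            continue
        needed.add(comp)
        queue.extend(deps.get(comp, ()))
    return needed


def removable(installed, roots, deps):
    """Installed components that nothing in the closure of `roots` needs."""
    return sorted(set(installed) - required_closure(roots, deps))


# Hypothetical dependency data, for illustration only.
DEPS = {
    "webapp": ["python", "libssl"],
    "python": ["libc"],
    "libssl": ["libc"],
    "cups": ["libc"],
    "x11": ["libc"],
}
INSTALLED = ["webapp", "python", "libssl", "libc", "cups", "x11", "bluetooth"]

if __name__ == "__main__":
    print(removable(INSTALLED, ["webapp"], DEPS))  # ['bluetooth', 'cups', 'x11']
```

<p>The traversal itself is trivial; the value lies in the dependency data. A missing edge (a configuration file read at runtime, a dynamically loaded library, a service started on demand) would make a component look removable when it is not, which is why gathering that data is the real work.</p>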
<p>It is also a nice example of open source enabling new innovation. Doing this kind of configuration for Windows is much harder in practice, if only because you cannot recompile the kernel yourself.</p>
<p>Also, Linux in particular has rather good support for removing non-essential components, I believe partly thanks to its extensive use in embedded software. Embedded people have a tradition of application-specific configurations, and this is a nice case where that results in a better desktop/server product in the end. Who said embedded requirements were just for a niche market?</p>
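<p>The flip side of stripping things out is making sure that the trimmed, application-specific configuration still hangs together. As a rough illustration, the Python sketch below checks that every enabled option also has its direct dependencies enabled. The option names and dependency map are hypothetical and heavily simplified; real Linux Kconfig also has <code>select</code>, tristate (module) values, and boolean dependency expressions, none of which are modeled here.</p>

```python
def unmet_dependencies(enabled, depends_on):
    """Map each enabled option to the sorted list of its direct dependencies
    that are not enabled. An empty result means the configuration is
    consistent under this simplified "all of these must be on" model."""
    problems = {}
    for option in sorted(enabled):
        missing = sorted(d for d in depends_on.get(option, ()) if d not in enabled)
        if missing:
            problems[option] = missing
    return problems


# Hypothetical, simplified dependency map for illustration only.
DEPENDS_ON = {
    "INET": ["NET"],
    "NFS_FS": ["NET", "INET"],
    "USB_STORAGE": ["USB", "SCSI"],
}

if __name__ == "__main__":
    trimmed = {"NET", "INET", "NFS_FS", "USB_STORAGE"}
    print(unmet_dependencies(trimmed, DEPENDS_ON))  # {'USB_STORAGE': ['SCSI', 'USB']}
```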
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/16/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

