<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Observations from Uppsala &#187; interrupt</title>
	<atom:link href="http://jakob.engbloms.se/archives/tag/interrupt/feed" rel="self" type="application/rss+xml" />
	<link>http://jakob.engbloms.se</link>
	<description>Computer Technology: Simulation, Virtualization, Virtual Platforms, Embedded, Multicore and Multiprocessing (by Jakob Engblom)</description>
	<lastBuildDate>Sun, 29 Jan 2012 19:45:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<image>
    <title>Observations from Uppsala</title>
    <url>http://jakob.engbloms.se/favicon.png</url>
    <link>http://jakob.engbloms.se</link>
    <width>32</width>
    <height>32</height>
    <description>Observations from Uppsala - http://jakob.engbloms.se</description>
    </image>		<item>
		<title>Interrupts and Temporal Decoupling</title>
		<link>http://jakob.engbloms.se/archives/1384?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1384#comments</comments>
		<pubDate>Sun, 27 Feb 2011 21:09:17 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[books]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Grant Martin]]></category>
		<category><![CDATA[interrupt]]></category>
		<category><![CDATA[Temporal decoupling]]></category>
		<category><![CDATA[Tensilica]]></category>
		<category><![CDATA[virtual]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1384</guid>
		<description><![CDATA[I am just finishing off reading the chapters of the Processor and System-on-Chip Simulation book (where I was part of contributing a chapter), and just read through the chapter about the Tensilica instruction-set simulator (ISS) solutions written by Grant Martin, Nenad Nedeljkovic and David Heine. They have a slightly different architecture from most other ISS [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2009/04/gears1.png"><img class="alignleft size-full wp-image-737" style="margin: 5px;" title="gears1" src="http://jakob.engbloms.se/wp-content/uploads/2009/04/gears1.png" alt="" width="56" height="57" /></a>I am just finishing off reading the chapters of the <a href="http://www.springer.com/engineering/circuits+%26+systems/book/978-1-4419-6174-7" target="_self"><em>Processor and System-on-Chip Simulation </em></a>book (where <a href="http://blogs.windriver.com/engblom/2011/01/processor-and-soc-simulation-book.html">I was one of the contributors to a chapter</a>), and just read through the chapter about the <a href="http://www.tensilica.com">Tensilica </a>instruction-set simulator (ISS) solutions written by <a href="http://www.chipdesignmag.com/martins/">Grant Martin</a>, Nenad Nedeljkovic and David Heine. They have a slightly different architecture from most other ISS solutions, since they have an inherently variable target in the configurable and extensible Tensilica cores. However, the more interesting part of the chapter was the discussion on system modeling beyond the core, in particular how they deal with interrupts to the core in the context of a <a href="http://jakob.engbloms.se/?s=temporal+decoupling">temporally decoupled </a>simulation.</p>
<p><span id="more-1384"></span>This is a small detail, but one where I have always had a feeling that some fundamental assumption was missing in my discussions with various people from the hardware design community. It always seemed that hardware designers assumed a different basic design &#8211; and Grant Martin explained very well just what that was: they only check for interrupts at the beginning of a time slice. This makes interrupts less precise relative to the code being executed, but it also makes the core interpreter fairly simple, since all it has to do is churn through instructions.</p>
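<p>To make the cost in precision concrete, here is a minimal Python sketch of slice-boundary delivery. It is only an illustration under my own assumptions, not the Tensilica implementation: the function name <tt>delivery_time</tt> is hypothetical, time is counted in whole cycles, slices start at time 0, and an interrupt raised exactly on a boundary is assumed to be seen at that boundary.</p>

```python
def delivery_time(raised_at, slice_length):
    """Time at which an interrupt raised at `raised_at` is seen by the core.

    Assumption: the core only looks for interrupts at slice boundaries
    (0, slice_length, 2 * slice_length, ...).
    """
    slices_elapsed = -(-raised_at // slice_length)   # ceiling division
    return slices_elapsed * slice_length


for raised_at in (0, 37, 100, 101):
    seen_at = delivery_time(raised_at, 100)
    print(raised_at, seen_at, seen_at - raised_at)   # raised, seen, latency
```

<p>With a slice of 100 cycles, an interrupt raised at cycle 37 is not seen until cycle 100, so the software observes it 63 cycles late.</p>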
<p>There is another solution, which is employed in Simics, where the processor can take an interrupt at any point in a time quantum. To do this, the processor needs to be aware of what is going to happen. The essence of the solution is to have devices call the processor and tell it that they intend to interrupt it at some point T in time. The processor simulator then makes sure to stop and give the device model a chance to act at that exact point in time. This solution is easily generalized to cover all time callbacks needed to drive device work. A significant part of the responsibility for running the event-driven simulation is thus moved into the processor core.</p>
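<p>The idea of devices posting events to the processor can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Simics API: the names <tt>Processor</tt>, <tt>post_event</tt>, <tt>run_quantum</tt> and <tt>timer_interrupt</tt> are hypothetical, time is counted in whole cycles, and one instruction takes one cycle.</p>

```python
import heapq


class Processor:
    """Toy processor model that owns the event queue (hypothetical, not Simics)."""

    def __init__(self):
        self.now = 0       # current time in cycles
        self.events = []   # min-heap of (time, sequence number, callback)
        self.seq = 0       # tie-breaker so equal-time events keep posting order
        self.log = []      # (time, what happened), for inspection

    def post_event(self, when, callback):
        """Called by a device model: 'call me back at time `when`'."""
        heapq.heappush(self.events, (when, self.seq, callback))
        self.seq += 1

    def run_quantum(self, length):
        """Run one time quantum, stopping exactly at each posted event."""
        end = self.now + length
        while True:
            # Deliver every event that is due at (or before) the current time.
            while self.events and self.events[0][0] <= self.now:
                _, _, callback = heapq.heappop(self.events)
                callback(self)
            if self.now == end:
                break
            # Run instructions up to the next event or the end of the quantum.
            stop = end
            if self.events:
                stop = min(end, self.events[0][0])
            self.log.append((self.now, f"execute {stop - self.now} instructions"))
            self.now = stop


def timer_interrupt(cpu):
    """Device callback: the point where the interrupt would be raised."""
    cpu.log.append((cpu.now, "timer interrupt taken"))


cpu = Processor()
cpu.post_event(37, timer_interrupt)   # device announces an interrupt at T = 37
cpu.run_quantum(100)
for entry in cpu.log:
    print(entry)
```

<p>With the interrupt announced for T = 37 and a quantum of 100 cycles, the log shows 37 instructions executed, the interrupt taken at time 37, and the remaining 63 instructions executed after it. The inner interpreter loop stays simple, since it is only ever asked to run a known number of instructions.</p>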
<p>Making the event queue visible to the processor also gives the processor a chance to hypersimulate, or skip idle  time. Since it knows the next point in time that something will happen  (either the end of a time quantum or an event posted by a device), it  can very easily, safely, and <a href="http://blogs.windriver.com/engblom/2010/09/deterministic-but-unpredictable.html">repeatably </a>jump forward in time without any  impact on simulation semantics.</p>
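<p>A rough sketch of what skipping idle time amounts to, again in Python and again under my own assumptions: <tt>next_stop</tt> and <tt>run_idle</tt> are hypothetical names, times are whole cycles, and events due at the current time are assumed to have been handled already.</p>

```python
def next_stop(earliest, quantum_end, event_times):
    """Earliest point at which anything can happen: an event or the quantum end."""
    pending = [t for t in event_times if t >= earliest]
    return min(pending + [quantum_end])


def run_idle(now, quantum_end, event_times):
    """Advance an idle processor; return (new time, list of jumps taken)."""
    jumps = []
    while now < quantum_end:
        # Only events strictly after `now` are still of interest.
        stop = next_stop(now + 1, quantum_end, event_times)
        jumps.append((now, stop))   # one jump replaces (stop - now) idle cycles
        now = stop
    return now, jumps


end_time, jumps = run_idle(0, 1000, [250, 900, 4000])
print(end_time, jumps)
```

<p>Here an idle quantum of 1000 cycles with events at 250 and 900 is covered in three jumps instead of 1000 single-cycle steps, and every event is still reached at exactly its posted time.</p>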
<p>When dealing with multiple processors, this means that each processor will have precise interrupts from the devices that are close to it. Timers and IO interrupts tend to work closely with a certain processor for a prolonged period of time. Interrupts between processors can sometimes be delayed by up to a time quantum, but that is no worse than the solution of checking all interrupts at time-quantum boundaries.</p>
<p>QEMU uses a solution which is a mix of the two. <a href="http://www.usenix.org/event/usenix05/tech/freenix/full_papers/bellard/bellard.pdf">According to the 2005 Usenix paper</a>, devices do call into the processor to announce an interrupt, but this is handled by &#8220;soon&#8221; returning to the processor main loop. Processors are not responsible for keeping track of interrupts, which makes the point at which an interrupt happens very imprecise and not very repeatable.</p>
<p>Thus, we can see that there are a few different ways to implement interrupts in virtual platforms. Each approach comes from a different tradition and features different trade-offs.</p>
<p>I was a bit surprised by the comment in the Tensilica chapter that only correctly synchronized programs will work on a temporally decoupled simulation. In my experience, temporal decoupling is transparent to software functionality &#8211; all software runs. The perceived timing of operations can be different, and some tightly-coupled code might behave in suboptimal ways, but it certainly runs and works. It also lets you <a href="http://blogs.windriver.com/engblom/2010/06/true-concurrency-is-truly-different-again.html">observe parallel code errors</a>.</p>
<p>Temporal decoupling is necessary in any fast platform, and its effects on semantics are really minor. With the simple tweak of having a processor know when interrupts might happen, it will also not affect the device-processor interface very much, maintaining very tight synchronization between processors and their controlled hardware.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1384/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Shaking a Linux Device Driver on a Virtual Platform</title>
		<link>http://jakob.engbloms.se/archives/337?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/337#comments</comments>
		<pubDate>Sun, 09 Nov 2008 22:23:13 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[embedded software]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[teaching]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[device driver]]></category>
		<category><![CDATA[interrupt]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[operating systems]]></category>
		<category><![CDATA[power architecture]]></category>
		<category><![CDATA[race condition]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=337</guid>
		<description><![CDATA[To continue from last week&#8217;s post about my Linux device driver and hardware teaching setup in Simics, here is a lesson I learnt this week when doing some performance analysis based on various hardware speeds. First some background. A key idea in the setup is to use the approach of assuming some processing time for [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-medium wp-image-329" style="margin: 5px 10px;" title="penguin-variant" src="http://jakob.engbloms.se/wp-content/uploads/2008/11/penguin-variant.png" alt="" width="100" height="118" />To continue from <a href="http://jakob.engbloms.se/archives/330">last week&#8217;s post </a>about my Linux device driver and hardware teaching setup in <a href="http://www.virtutech.com/academia">Simics</a>, here is a lesson I learnt this week when doing some performance analysis based on various hardware speeds.</p>
<p><span id="more-337"></span></p>
<p>First some background.</p>
<p>A key idea in the setup is to use the approach of <em>assuming some processing time </em>for the hardware accelerator, rather than creating detailed code and determining the actual processing time for a particular implementation. Given some assumed time, we can then see how it impacts program performance. This is a way of designing hardware where we look at how fast something needs to be to have a positive impact, rather than trying to make it as fast as possible. It also lets us analyze how hardware performance is perceived when using a complete OS stack and a real device driver rather than simple bare-metal software (which tends to show the performance in the best possible light). Essentially, it is loosely timed design-space exploration.</p>
<p>Initial tests of the driver used very short completion times, on the order of 1 microsecond. The read() call at this point simply waited for the hardware completion flag to become true, and then returned the results. That is not the kind of behavior that a driver should have, since if the hardware gets some kind of hiccup, we will be stuck looping  inside a kernel context. Instead, I implemented a blocking read variant that would put the calling process to sleep until a result arrives.</p>
<p class="MsoNormal">In order to test that my driver handled sleeping correctly, I changed the processing delay to the order of seconds&#8230; and promptly found a set of issues that forced several rewrites of the code. The most important were the need to switch to a software flag for completion rather than relying on the hardware flag, and the implementation of an interrupt handler to get a notification from the hardware.</p>
<p>Then, on Friday, I demonstrated the setup along with some new performance analysis tools to go with it to some students testing the setup. And the test program suddenly stopped working, obviously hanging at the first call to read() without ever getting unblocked.</p>
<p>The reason was a classic race condition: the code in the <tt>write()</tt> device driver call that sent input data into the hardware device waited until after the writing was complete (and then some more) before clearing the operation complete flag. Here is the relevant piece of code:</p>
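<pre>/* Send the input data to the hardware, one word at a time */
for (i = 0; i &lt; words; i++) {
  write_register(SIMPLE_INPUT, kbuf[i]);
}
*f_pos = 0;
kfree(kbuf);
/* Race: the completion interrupt may already have fired by now */
clear_completion_state();</pre>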
<p class="MsoNormal">With a sufficiently short delay to completion, the completion interrupt fired, was handled, and set the completion flag before the <tt>write()</tt> function even got to <tt>clear_completion_state()</tt>. After this, the test program called <tt>read()</tt> to read the result, and was blocked because the completion flag was not set. The interrupt signaling completion from the hardware had already triggered and deposited its result in the software flag, which had then been promptly overwritten inside <tt>write()</tt>. Thus, inside <tt>read()</tt>, the flag never became set, and the process waited forever.</p>
<p class="MsoNormal">The fix is obvious: just move the clearing of the flag to <em>before </em>the writing to the hardware begins.</p>
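<p class="MsoNormal">The effect of the ordering can be replayed outside the kernel with a small Python model. It is a deliberately simplified sketch, not the driver itself: <tt>run_driver</tt> and its parameters are hypothetical, time is in abstract ticks starting when the last input word is written, and ties between the interrupt and the clearing of the flag are not modeled.</p>

```python
def run_driver(clear_before_write, hw_delay, write_duration):
    """Replay write() followed by read(); return True if read() blocks forever.

    The hardware raises its completion interrupt `hw_delay` ticks after the
    last input word is written. `write_duration` is how long write() keeps
    running after that last register write, before it returns.
    """
    events = [(hw_delay, "irq")]
    if clear_before_write:
        events.append((-1, "clear"))              # fixed: clear before writing input
    else:
        events.append((write_duration, "clear"))  # buggy: clear at the end of write()

    completed = False
    for _, what in sorted(events):
        if what == "irq":
            completed = True    # interrupt handler sets the software flag
        else:
            completed = False   # clear_completion_state()
    # read() sleeps until the flag is set; if it ends up cleared, nothing wakes it.
    return not completed


for hw_delay in (1, 1000):
    for clear_before_write in (False, True):
        hangs = run_driver(clear_before_write, hw_delay, write_duration=5)
        print(hw_delay, clear_before_write, hangs)
```

<p class="MsoNormal">In this model only the combination of the late clear and the short hardware delay hangs; clearing the flag first works for both delays.</p>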
<p class="MsoNormal">To generalize from this brilliant example of concurrency carelessness, this is a really good accidental demonstration of the power of varying timing in a virtual platform to shake code and find timing-related bugs in a manner much more efficient than is possible on physical hardware.</p>
<p class="MsoNormal">Had I described the exact (or even approximate) timing of a particular hardware implementation, this kind of bug would not have been found and the driver code would not have been as robust. An implementation relying on a very short completion time could check the hardware operation complete flag directly, but that broke down when the delay was long. The buggy implementation above worked fine with a long completion time, but broke down with a short one. The fixed implementation works across a span of times from 10 ns to 10 s or more, which is all you can ask for, I think.</p>
<p class="MsoNormal">A short fun Simics note on this: changing that timing parameter is a run-time change. It is possible to change it during a run, from the Simics command-line, using a simple one-line command:</p>
<pre class="MsoNormal" style="padding-left: 30px;"><span style="color: #0000ff;">simics&gt; </span>sd0-&gt;time_to_result = 10.0e-9</pre>
<p class="MsoNormal">It is really nice working with a system like that!</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/337/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

