<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Observations from Uppsala &#187; device driver</title>
	<atom:link href="http://jakob.engbloms.se/archives/tag/device-driver/feed" rel="self" type="application/rss+xml" />
	<link>http://jakob.engbloms.se</link>
	<description>Computer Technology: Simulation, Virtualization, Virtual Platforms, Embedded, Multicore and Multiprocessing (by Jakob Engblom)</description>
	<lastBuildDate>Sun, 29 Jan 2012 19:45:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<image>
    <title>Observations from Uppsala</title>
    <url>http://jakob.engbloms.se/favicon.png</url>
    <link>http://jakob.engbloms.se</link>
    <width>32</width>
    <height>32</height>
    <description>Observations from Uppsala - http://jakob.engbloms.se</description>
    </image>		<item>
		<title>Neat Register Design to Avoid Races</title>
		<link>http://jakob.engbloms.se/archives/1070?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1070#comments</comments>
		<pubDate>Thu, 28 Jan 2010 18:59:53 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[embedded software]]></category>
		<category><![CDATA[embedded systeme]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[64-bit computing]]></category>
		<category><![CDATA[device driver]]></category>
		<category><![CDATA[Gary Stringham]]></category>
		<category><![CDATA[high-level synthesis]]></category>
		<category><![CDATA[programming register]]></category>
		<category><![CDATA[race condition]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1070</guid>
		<description><![CDATA[In his most recent Embedded Bridge Newsletter, Gary Stringham describes several solutions to a common read-modify-write race-condition hazard on device registers accessed by multiple software units in parallel. Some of the solutions are really neat! I have seen the &#8220;write 1 clears&#8221; solution before in real hardware, but I was not aware of the other [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-589" style="margin: 5px 10px;" title="racecondition" src="http://jakob.engbloms.se/wp-content/uploads/2008/01/racecondition.png" alt="racecondition" width="99" height="78" />In his most recent <a href="http://garystringham.com/newsletter.shtml?nid=039">Embedded Bridge Newsletter</a>, Gary Stringham describes several solutions to a common read-modify-write race-condition hazard on device registers that are accessed by multiple software units in parallel. Some of the solutions are really neat!</p>
<p>I have seen the &#8220;write 1 clears&#8221; solution before in real hardware, but I was not aware of the other two variants. The idea of having a &#8220;write mask&#8221; in one half of a 32-bit word is really clever.</p>
<p>However, this got me thinking about what the fundamental issue here really is.</p>
<p><span id="more-1070"></span></p>
<p>As I see it, the root cause is that the processor cannot atomically address small enough units. The <a href="http://garystringham.com/newsletter.shtml?nid=037">read-modify-write that started the discussion in Embedded Bridge #37</a> was needed to get the current state of a configuration register, change a setting that occupied only a few bits of it, and write the result back to the register. That is how most configuration registers I have seen in practice work.</p>
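<p>The hazard is easiest to see when the two read-modify-write sequences are spelled out. The C sketch below replays, deterministically, one unlucky interleaving of two software units that each want to change their own field of a shared configuration register. The register layout and field masks are hypothetical.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit configuration register shared by two software units.
 * Unit A owns bits 3..0 and unit B owns bits 7..4. */
#define FIELD_A_MASK 0x0000000Fu
#define FIELD_B_MASK 0x000000F0u

static volatile uint32_t config_reg;

/* Replays one unlucky interleaving of two read-modify-write sequences:
 * A reads, B reads, A writes back, B writes back. B's copy of the register
 * is stale by the time it is written, so A's update is silently lost. */
static uint32_t replay_lost_update(uint32_t initial, uint32_t a_val, uint32_t b_val)
{
    assert((a_val & ~FIELD_A_MASK) == 0 && (b_val & ~FIELD_B_MASK) == 0);
    config_reg = initial;

    uint32_t copy_a = config_reg;              /* A: read   */
    uint32_t copy_b = config_reg;              /* B: read   */

    copy_a = (copy_a & ~FIELD_A_MASK) | a_val; /* A: modify */
    config_reg = copy_a;                       /* A: write  */

    copy_b = (copy_b & ~FIELD_B_MASK) | b_val; /* B: modify (stale copy) */
    config_reg = copy_b;                       /* B: write, A's field reverts */

    return config_reg;
}
```

<p>In this sketch, the register starts at zero, unit A writes 5 into its field, and unit B writes 3 into its own field. <tt>replay_lost_update(0x00, 0x5, 0x30)</tt> then returns 0x30 instead of 0x35. Both units believe they succeeded, but A's setting is gone.</p>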
<p>But if each setting could be given its own register, the problem would go away. Each operation would target a unique address, achieving the same effect as the proposed bit-mask or write-1 solutions. The core problem is that hardware tends to pack several settings into one register, since it has been considered too expensive to spend a whole 32-bit register on a setting whose range might be as small as [0,1]. Presumably, this is because register address space is scarce: you cannot let 1000 settings make each simple device use up 1000 words of physical address space.</p>
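<p>For comparison, here is a minimal C sketch of how a device could interpret writes under the two lock-free conventions mentioned above. It assumes a hypothetical register where bits 31..16 of each written value act as a per-bit write enable for the data in bits 15..0. It also assumes a hypothetical status register with write-1-to-clear semantics. The function names are made up for illustration and do not come from the newsletter.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical masked-write register: bits 31..16 of the written value
 * select which of bits 15..0 are updated; unselected bits keep their value. */
static uint16_t apply_masked_write(uint16_t current, uint32_t written)
{
    uint16_t mask = (uint16_t)(written >> 16);
    uint16_t data = (uint16_t)(written & 0xFFFFu);
    return (uint16_t)((current & ~mask) | (data & mask));
}

/* Hypothetical write-1-to-clear status register: each 1 bit in the written
 * value clears that status bit; each 0 bit leaves it unchanged. */
static uint32_t apply_w1c_write(uint32_t current, uint32_t written)
{
    return current & ~written;
}

/* Driver-side helper: build a single store that updates only the bits in
 * field_mask, with no prior read of the register. */
static uint32_t make_masked_write(uint16_t field_mask, uint16_t field)
{
    assert((field & ~field_mask) == 0); /* value must fit inside the mask */
    return ((uint32_t)field_mask << 16) | field;
}
```

<p>Because each store states exactly which bits it owns, two software units can update different fields of the same register with plain stores. Neither unit reads the register first, so there is never a stale copy to write back.</p>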
<p>But is that really an issue, if we look forward?</p>
<p>It seems to me that, as 64-bit instruction sets and addressing systems penetrate further down into embedded systems, a simple solution would be to throw address space at the problem. I don&#8217;t think it is uneconomical to allocate huge chunks of address space to each device, giving each setting its own register, when you have 64-bit virtual addresses to work with. There is no way you can fill up that space with physical memory (I guess that will come back to haunt me some day)&#8230; even the highest-end machines today only use something like 40 bits to address physical memory.</p>
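<p>Here is a back-of-the-envelope check of the idea, written in C so the arithmetic is explicit. The sizes are illustrative assumptions: one 4-byte register per setting, a 1 MiB address window per device, and a 40-bit physical memory. <tt>setting_addr()</tt> describes a hypothetical layout, not any real device.</p>

```c
#include <assert.h>
#include <stdint.h>

#define REG_BYTES     4u           /* assumed: one 32-bit register per setting   */
#define DEVICE_WINDOW (1ull << 20) /* assumed: 1 MiB of address space per device */

/* Hypothetical layout: setting n of device d gets its own register, so a write
 * to one setting can never disturb another and no read-modify-write is needed. */
static uint64_t setting_addr(uint64_t io_base, unsigned device, unsigned setting)
{
    assert((uint64_t)setting * REG_BYTES + REG_BYTES <= DEVICE_WINDOW);
    return io_base + (uint64_t)device * DEVICE_WINDOW + (uint64_t)setting * REG_BYTES;
}

/* Number of device windows that fit in the 64-bit space left above a
 * physical memory addressed with phys_bits bits. */
static uint64_t device_windows_above(unsigned phys_bits)
{
    assert(phys_bits < 64);
    uint64_t unused = UINT64_MAX - ((1ull << phys_bits) - 1);
    return unused / DEVICE_WINDOW;
}
```

<p>Under these assumptions, a device with 1000 settings needs only 4000 bytes of registers. The 64-bit space above a 40-bit memory still has room for 2^44 - 2^20 (about 17.6 trillion) such 1 MiB windows.</p>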
<p>The software would be simpler and more robust, at virtually no cost.</p>
<p>Another solution that I have also seen starting to appear is to dispense with register settings altogether and instead define a command API that the processor &#8220;calls&#8221; by putting command packets into some memory area. This does require quite a bit of silicon for a decoder, but it allows a much higher level of interaction with devices. As hardware devices get defined in successively higher-level languages (C, C++, UML, MatLab, &#8230;), and <a href="http://jakob.engbloms.se/archives/871">their programming interfaces and associated drivers get autogenerated</a>, this solution makes eminent sense.</p>
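<p>A command interface of this kind could look roughly like the sketch below. The driver fills fixed-size packets in a ring in shared memory, and the device consumes them in order. The packet layout, opcodes, and ring size are invented for illustration, and the device side is modeled in software. Real hardware would also need memory barriers and a doorbell register; these are only marked in comments.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical command packet: an opcode plus two arguments. */
struct cmd_packet {
    uint16_t opcode;
    uint16_t flags;
    uint32_t arg0;
    uint32_t arg1;
};

enum { CMD_SET_MODE = 1, CMD_START = 2 }; /* hypothetical opcodes */

#define RING_SIZE 8u /* must be a power of two */

/* Shared command ring: the driver advances head, the device advances tail. */
struct cmd_ring {
    struct cmd_packet slots[RING_SIZE];
    uint32_t head; /* next slot the driver will fill    */
    uint32_t tail; /* next slot the device will consume */
};

/* Driver side: enqueue one command; returns false if the ring is full. */
static bool cmd_submit(struct cmd_ring *r, uint16_t opcode, uint32_t a0, uint32_t a1)
{
    assert((RING_SIZE & (RING_SIZE - 1)) == 0);
    if (r->head - r->tail == RING_SIZE)
        return false;
    struct cmd_packet *p = &r->slots[r->head % RING_SIZE];
    p->opcode = opcode;
    p->flags = 0;
    p->arg0 = a0;
    p->arg1 = a1;
    /* On real hardware: write barrier here, then ring a doorbell register. */
    r->head++;
    return true;
}

/* Device side (modeled in software): take the oldest pending command. */
static bool cmd_fetch(struct cmd_ring *r, struct cmd_packet *out)
{
    if (r->head == r->tail)
        return false;
    *out = r->slots[r->tail % RING_SIZE];
    r->tail++;
    return true;
}
```

<p>In this scheme, the driver never performs a read-modify-write on a device register. Each command is a self-contained message, which also sidesteps the race discussed above.</p>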
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1070/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Conquering Software with Software High-Level Synthesis</title>
		<link>http://jakob.engbloms.se/archives/871?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/871#comments</comments>
		<pubDate>Fri, 31 Jul 2009 22:12:50 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[conferences]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[embedded software]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[DAC]]></category>
		<category><![CDATA[device driver]]></category>
		<category><![CDATA[high-level synthesis]]></category>
		<category><![CDATA[Kees Vissers]]></category>
		<category><![CDATA[Xilinx]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=871</guid>
		<description><![CDATA[This post is a follow-up to the DAC panel discussion we had yesterday on how to conquer hardware-dependent software development. Most of the panel turned into a very useful dialogue on virtual platforms and how they are created, not really discussing how to actually use them for easing low-level software development. We did get to [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-824" style="margin: 5px;" title="46daclogo" src="http://jakob.engbloms.se/wp-content/uploads/2009/07/46daclogo.gif" alt="46daclogo" width="81" height="73" />This post is a follow-up to the DAC panel discussion we had yesterday on how to conquer hardware-dependent software development. Most of the panel turned into a very useful dialogue on virtual platforms and how they are created, rather than on how to actually use them to ease low-level software development. We did get to software eventually, though, and had another good dialogue with the audience. Thanks to the tough DAC participants who held out to the end of the last panel of the last day!</p>
<p>As is often the case, after the panel had ended, I realized several good and important points that I never got around to making&#8230; and one of those struck me as worthy of a blog post in its own right. It is the issue of how high-level synthesis can help software design.</p>
<p><span id="more-871"></span>At the end of the panel, the last comment from Kees Vissers of Xilinx pointed out that high-level synthesis is a very powerful way to build hardware. I think his point was that hardware and software are not that different&#8230; but the remark also got me thinking. If high-level synthesis makes hardware creation easier, can&#8217;t it also be applied to software creation? In particular, if you have an HLS description of a piece of hardware, can&#8217;t you also generate its driver software?</p>
<p>I think that makes eminent sense, since one of the hard parts of doing device drivers is just getting the use of the programming registers of a device right. The programming register interface is a really strange thing if you think about it. It is not native to either software or hardware, really.</p>
<p>In hardware, you communicate between devices using FIFOs, signals, or packet-based mechanisms, which do not in general look like programming register writes. You move data by sending a stream of data directly, not by addressing registers word by word. Similarly, on the software side, software units call each other using functions (or higher-level OS abstractions like signals or network packets). They do not put data into addressed registers&#8230;</p>
<p>Today, high-level synthesis as practiced in industry involves describing the function of a device in pretty abstract terms, so that the compiler can make smart decisions on the implementation details. It also makes it easier to try different alternatives in the implementation, trading size, speed, and power consumption against each other. Different types of concurrency and pipelining can be explored.</p>
<p>However, once we get to the hardware-software interface, we get rudely dropped into a world of manual detailing of an interface with no tool support to explore it or validate it. Why should that really be the case? I think that the hardware-software interface requires just as much care as the internals of the device. After all, it is the external face of the device, and if that is too hard to use, users will not get the full benefit of the device. Here are some previous posts on the nature of interfaces and why they matter: <a href="http://www.garystringham.com/newsletter.shtml">1</a>, <a href="http://jakob.engbloms.se/archives/799">2</a>, <a href="http://jakob.engbloms.se/archives/770">3</a>, <a href="http://jakob.engbloms.se/archives/709">4</a>.</p>
<p>So I would propose a different take on this, where you apply synthesis at a higher level and generate the hardware internals, the programming register interface, and the software driver from the same source. The interface you design for a device would be a set of function calls expressed in software terms, and thus relatively easy to use from software. Let&#8217;s call this SHLS, Software High-Level Synthesis. Or maybe Software-Level Synthesis, SLS.</p>
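<p>The sketch below is a very small stand-in for this idea. It uses plain C preprocessor tricks and is nothing like real high-level synthesis. It keeps a single description of a hypothetical accelerator's settings and derives both the register offsets and function-style driver calls from it, so the two cannot drift apart. The device and its fields are made up for illustration, as is the assumption that <tt>base</tt> points at its mapped register block.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Single source of truth: setting name, register byte offset, width in bits. */
#define ACCEL_SETTINGS(X)  \
    X(mode,      0x00, 2)  \
    X(threshold, 0x04, 12) \
    X(enable,    0x08, 1)

/* Generated register offsets: ACCEL_REG_mode, ACCEL_REG_threshold, ... */
#define X_OFFSET(name, off, bits) ACCEL_REG_##name = (off),
enum { ACCEL_SETTINGS(X_OFFSET) };
#undef X_OFFSET

/* Generated driver API: accel_set_mode(), accel_set_threshold(), ...
 * Each setter range-checks its value using the same width information. */
#define X_SETTER(name, off, bits)                                     \
    static void accel_set_##name(volatile uint32_t *base, uint32_t v) \
    {                                                                 \
        assert(v < (1u << (bits)));                                   \
        base[(off) / 4] = v;                                          \
    }
ACCEL_SETTINGS(X_SETTER)
#undef X_SETTER
```

<p>With this approach, adding a setting means adding one line to <tt>ACCEL_SETTINGS</tt>. The offset constant and the range-checked setter then follow automatically.</p>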
<p>I am well aware that most operating systems do not provide an interface to device drivers consisting of function calls&#8230; but rather rely on pretty non-semantic methods like read/write/ioctl. However, that is easy to overcome by generating a user-level interface library in addition to the raw device driver. Obviously, a tool like this would need some adaptation for each operating system targeted, but that feels like something a template system could handle fairly easily. Compared to the complexity of synthesizing hardware, this feels pretty basic.</p>
<p>If you hide the software-hardware interface inside a black box like this, it can also be implemented in quite different and interesting ways. For example, you could imagine using a small memory-mapped buffer where you enter commands and data and then ask the hardware to &#8220;parse&#8221; it, rather than using discrete registers with immediate effects. Or you could optimize the coding of the relevant hardware operations into a bit-compacted representation that no human would like, but that is no problem for the machine.</p>
<p>Another option is obviously to just stop at the hardware-software interface, but let the tool help the hardware designer build the programming register interface and explore various options for it. Not having to invent a register layout from scratch should make that job much easier.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/871/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>I Want One&#8230; Trillion Instructions&#8230;</title>
		<link>http://jakob.engbloms.se/archives/709?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/709#comments</comments>
		<pubDate>Sat, 28 Mar 2009 21:10:31 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[abstraction levels]]></category>
		<category><![CDATA[device driver]]></category>
		<category><![CDATA[Dr. Evil]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[mpc8641d]]></category>
		<category><![CDATA[Simics]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=709</guid>
		<description><![CDATA[There is an eternal debate going on in virtual platform land over what the right kind of abstraction is for each job. Depending on background, people favor different levels. For those with a hardware background, more details tend to be the comfort zone, while for those with a software background like myself, we are quite [...]]]></description>
			<content:encoded><![CDATA[<p>There is an eternal debate going on in virtual platform land over what the right level of abstraction is for each job. Depending on their background, people favor different levels. Those with a hardware background tend to find their comfort zone in more detail, while those of us with a software background are quite comfortable with less. I <a href="http://www.virtutech.com/whitepapers/wp-system_arch_spec.html">recently did some experiments using quite low levels of hardware modeling detail for early architecture exploration and system specification</a>.</p>
<p><span id="more-709"></span></p>
<p>It all comes down to a simple classic tradeoff that I usually illustrate like this (using more neutral ground than computer systems; and with credit to Peter Magnusson who had this slide already in place when I joined Virtutech back in 2002):</p>
<p><img class="aligncenter size-full wp-image-711" title="simulation-rule" src="http://jakob.engbloms.se/wp-content/uploads/2009/03/simulation-rule.png" alt="simulation-rule" width="457" height="341" /></p>
<p>What this is telling you is simple:</p>
<ul>
<li>You simulate something very large using large units, i.e., low level of detail; or</li>
<li>You simulate something quite small using small units, i.e., high level of detail.</li>
</ul>
<p>I wanted to test the idea that by using less detail, you can run larger test cases and therefore obtain better coverage of the overall landscape than by diving in and counting cycles in some small part of it. In the end, this made me cross the trillion-instruction line: each experiment took a few hundred billion target instructions to complete, so repeating and tweaking during the development work definitely added up to more than a trillion instructions.</p>
<p>And this is where I put my little finger to the corner of my mouth and say:</p>
<p style="text-align: center;"><img class="size-full wp-image-710 aligncenter" style="margin-top: 10px; margin-bottom: 10px;" title="drevil_million_dollars" src="http://jakob.engbloms.se/wp-content/uploads/2009/03/drevil_million_dollars.jpg" alt="drevil_million_dollars" width="300" height="318" /></p>
<p>&#8216;I want one trillion instructions&#8217;</p>
<p>So what did I get from these trillion instructions?</p>
<p>An interesting study in how operating system overhead can have a big impact on the profitability of hardware accelerators. By running hundreds of test cases with different assigned computation latencies for a hardware accelerator, as well as different driver models for my hardware (all running under Linux on my favorite MPC8641D), I got a key diagram:</p>
<p style="text-align: left;"><img class="aligncenter size-full wp-image-712" style="margin-top: 10px; margin-bottom: 10px;" title="hwsw" src="http://jakob.engbloms.se/wp-content/uploads/2009/03/hwsw.png" alt="hwsw" width="872" height="507" /><a href="http://www.virtutech.com/whitepapers/wp-system_arch_spec.html">Read the paper </a>for all the details, but the key thing to note is that with a poor driver architecture, making the hardware 100 times faster resulted in zero gain in system performance. Had this experiment been performed on a bare-bones platform without a full operating system in place, I am fairly certain that the faster hardware would have been considered much more worthwhile.</p>
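<p>The effect is essentially Amdahl's law. Suppose each accelerator operation costs a fixed software overhead (system calls, driver work, scheduling) plus the hardware latency. Speeding up only the hardware part then stops paying off once the overhead dominates. The small C model below uses made-up numbers, not the measurements from the paper, and only shows the shape of the effect.</p>

```c
#include <assert.h>

/* Time per accelerated operation = fixed software overhead + hardware latency.
 * Returns the system-level speedup when the hardware alone becomes hw_speedup
 * times faster. All times are in the same (arbitrary) unit. */
static double system_speedup(double sw_overhead, double hw_latency, double hw_speedup)
{
    assert(hw_speedup > 0.0);
    double before = sw_overhead + hw_latency;
    double after  = sw_overhead + hw_latency / hw_speedup;
    return before / after;
}
```

<p>Take an assumed 50 microseconds of driver and OS overhead around a 1 microsecond operation. In that case, a 100x faster accelerator yields only about a 2% system speedup. With 0.5 microseconds of overhead, closer to bare metal, the same hardware improvement yields almost 3x.</p>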
<p style="text-align: left;">In the end, I resorted to a driver variant where user-level code directly accessed the device programming interface via an mmap()-mapped memory region. It was not pretty, since this was essentially bare-metal programming wrapped inside a big cosy Linux package, but it sure was efficient compared to doing a kernel/user mode switch for each hardware operation. But even here, it turned out that making the hardware very, very fast as opposed to just very fast had no benefit. To me, this proves that the software has to be fully taken into account in order to properly evaluate an idea for a hardware design.</p>
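<p>For readers who have not seen the pattern, here is a minimal sketch of user-level register access through <tt>mmap()</tt>. It assumes a hypothetical device node that lets user space map the register block; on Linux, a UIO device is one common way to provide that. The register offset is invented. Passing NULL maps anonymous memory as a stand-in, so the sketch can run without any hardware.</p>

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGS_SIZE 4096u /* assumed size of the device's register block   */
#define REG_INPUT 0u    /* hypothetical word offset of an input register */

/* Map the register block into this process. A NULL path maps anonymous
 * memory instead, so the sketch can run without the hypothetical device.
 * Returns NULL on failure. */
static volatile uint32_t *map_regs(const char *path)
{
    int fd = -1;
    int flags = MAP_SHARED;

    if (path != NULL) {
        fd = open(path, O_RDWR | O_SYNC);
        if (fd < 0)
            return NULL;
    } else {
        flags |= MAP_ANONYMOUS;
    }
    void *p = mmap(NULL, REGS_SIZE, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (fd >= 0)
        close(fd); /* the mapping remains valid after close() */
    return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}

/* Once mapped, each device access is an ordinary load or store:
 * no system call and no kernel/user mode switch per operation. */
static void reg_write(volatile uint32_t *regs, unsigned word, uint32_t v)
{
    assert(word < REGS_SIZE / 4);
    regs[word] = v;
}

static uint32_t reg_read(volatile uint32_t *regs, unsigned word)
{
    assert(word < REGS_SIZE / 4);
    return regs[word];
}
```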
<p style="text-align: left;">You could say that the poor results for acceleration here were due to my inept Linux driver programming skills, but that just underscores the key result: you have to take the software into account. If the conclusion is that a better Linux device driver programmer is needed, you have still decided that the key system bottleneck is not just the speed of the hardware, but how it is used. And that is exactly what system design needs to be about.</p>
<p style="text-align: left;">As an aside, playing around with a complete system like this, and automatically running large volumes of tests with varying parameters, was a really interesting experience. I must admit that getting to these trillions of instructions required a few hours of simulation time, but nothing that could not be solved by leaving a computer running over lunch or a long meeting. The machine was modeled using standard Simics &#8220;software timing&#8221;, i.e., without any particular cache, pipeline, or bus details, and it seems that this is usually all you need. Had I increased the level of detail and slowed things down by a factor of ten or a hundred, I would never have covered such a large set of test cases or been able to evaluate as many variants of drivers and hardware speeds.</p>
<h2 style="text-align: left;">IBM did it before me</h2>
<p style="text-align: left;">Finally, I found it interesting that IBM reported an analogous experience a few years ago, from building a complete software stack to test what looked like a very good hardware idea. It is described in &#8220;<a href="http://researchweb.watson.ibm.com/journal/rd/502/peterson.html">Application of full-system simulation in exploratory system design and development</a>&#8221;, by Peterson et al., in the IBM Journal of Research and Development. Look at the section about the &#8220;MIP Morphing&#8221; feature, which is essentially cache locking. They do use a fairly detailed simulator for the final evaluation of performance, but the key message is that by running a full software stack, they realized that just managing the feature in a realistic software environment was too hard to make it worthwhile:</p>
<blockquote>
<p style="text-align: left;">Initially, the MIP morphing feature was well received by internal development and HPCS customers alike. The team was aware of the need to both manage this hardware feature at the OS level and provide portable abstractions to the programmer to exploit this feature in a productive way. &#8230;</p>
</blockquote>
<p style="text-align: left;">And then:</p>
<blockquote>
<p style="text-align: left;">The implementation effort was facilitated by Mambo, allowing the OS team to prototype the MIP morph idea in a controlled development environment. Taking the prototyping effort to this level of realism uncovered many complexities in supporting the MIP morph in a virtualized manner. ..</p>
</blockquote>
<p style="text-align: left;">And finally:</p>
<blockquote>
<p style="text-align: left;">By prototyping the software support that was <em>needed at the OS level and exposing the usage issues at the application programmer&#8217;s level</em>, the magnitude of the problem was exposed at its fullest. Further, the improvement in performance did not show a sufficient payback for the immense effort that would be required at the software level to support the idea, and as a result it was dropped from further consideration.</p>
</blockquote>
<p style="text-align: left;">It seems that whatever you do, IBM did it first&#8230; and their experience validates both full-system simulation and the view that software is king today.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/709/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Shaking a Linux Device Driver on a Virtual Platform</title>
		<link>http://jakob.engbloms.se/archives/337?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/337#comments</comments>
		<pubDate>Sun, 09 Nov 2008 22:23:13 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[embedded software]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[teaching]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[device driver]]></category>
		<category><![CDATA[interrupt]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[operating systems]]></category>
		<category><![CDATA[power architecture]]></category>
		<category><![CDATA[race condition]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=337</guid>
		<description><![CDATA[To continue from last week&#8217;s post about my Linux device driver and hardware teaching setup in Simics, here is a lesson I learnt this week when doing some performance analysis based on various hardware speeds. First some background. A key idea in the setup is to use the approach of assuming some processing time for [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-medium wp-image-329" style="margin: 5px 10px;" title="penguin-variant" src="http://jakob.engbloms.se/wp-content/uploads/2008/11/penguin-variant.png" alt="" width="100" height="118" />To continue from <a href="http://jakob.engbloms.se/archives/330">last week&#8217;s post </a>about my Linux device driver and hardware teaching setup in <a href="http://www.virtutech.com/academia">Simics</a>, here is a lesson I learnt this week when doing some performance analysis based on various hardware speeds.</p>
<p><span id="more-337"></span></p>
<p>First some background.</p>
<p>A key idea in the setup is to <em>assume some processing time </em>for the hardware accelerator, rather than creating detailed code and determining the actual processing time for a particular implementation. Given some assumed time, we can then see how it impacts program performance. This is a way of designing hardware where we look at how fast something needs to be to have a positive impact, rather than trying to make it as fast as possible. It also lets us analyze how hardware performance shows up when using a complete OS stack and a real device driver, rather than simple bare-metal software (which tends to show the performance in the best possible light). Essentially, it is loosely timed design-space exploration.</p>
<p>Initial tests of the driver used very short completion times, on the order of 1 microsecond. At this point, the read() call simply polled until the hardware completion flag became true, and then returned the results. That is not how a driver should behave: if the hardware gets some kind of hiccup, we will be stuck looping inside a kernel context. Instead, I implemented a blocking read variant that puts the calling process to sleep until a result arrives.</p>
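<p>The blocking variant can be illustrated with a user-space analogue. A POSIX condition variable plays the role of the kernel wait queue, and a second thread plays the completion interrupt. This sketches the pattern only: the actual driver would use kernel primitives such as wait queues, and the function names here are hypothetical.</p>

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Software completion flag and result, protected by a mutex. */
static pthread_mutex_t op_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cv = PTHREAD_COND_INITIALIZER;
static bool op_done;
static uint32_t op_result;

/* Stands in for the interrupt handler: store the result, set the flag,
 * and wake any reader sleeping in blocking_read(). */
static void completion_irq(uint32_t result)
{
    pthread_mutex_lock(&op_lock);
    op_result = result;
    op_done = true;
    pthread_cond_broadcast(&done_cv);
    pthread_mutex_unlock(&op_lock);
}

/* Blocking read: sleep until the completion flag is set instead of spinning
 * on a hardware status bit. The loop re-checks the flag after every wakeup. */
static uint32_t blocking_read(void)
{
    pthread_mutex_lock(&op_lock);
    while (!op_done)
        pthread_cond_wait(&done_cv, &op_lock);
    assert(op_done);
    uint32_t r = op_result;
    pthread_mutex_unlock(&op_lock);
    return r;
}

/* Thread body standing in for the hardware finishing at some later time. */
static void *fake_hardware(void *arg)
{
    completion_irq((uint32_t)(uintptr_t)arg);
    return NULL;
}
```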
<p class="MsoNormal">To test that my driver slept correctly, I increased the processing delay to the order of seconds&#8230; and promptly found a set of issues that forced several rewrites of the code. The most important were the need to switch to a software flag for completion rather than relying on the hardware flag, and the need for an interrupt handler to get a notification from the hardware.</p>
<p>Then, on Friday, I demonstrated the setup, along with some new performance analysis tools to go with it, to some students who were testing it. And the test program suddenly stopped working, hanging at the first call to read() without ever getting unblocked.</p>
<p>The reason was a classic race condition: the code in the <tt>write()</tt> device driver call that sent input data into the hardware device waited until after the writing was complete (and then some more) before clearing the operation complete flag. Here is the relevant piece of code:</p>
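<pre>/* Send the input data to the device, one word at a time */
for(i=0;i&lt;words;i++) {
  write_register(SIMPLE_INPUT, kbuf[i]);
}
*f_pos = 0;
kfree(kbuf);
/* BUG: too late; the completion interrupt may already have set the flag */
clear_completion_state();</pre>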
<p class="MsoNormal">With a sufficiently short delay to completion, the completion interrupt fired, was handled, and set the completion flag before the <tt>write()</tt> function even got to <tt>clear_completion_state()</tt>. After this, the test program called <tt>read()</tt> to read the result, and blocked because the completion flag was no longer set. The completion interrupt from the hardware had already triggered and its result had been deposited in the software flag, which was then promptly cleared inside <tt>write()</tt>. Thus, inside <tt>read()</tt>, the flag never became set, and the process waited forever.</p>
<p class="MsoNormal">The fix is obvious: just move the clearing of the flag to <em>before </em>the writing to the hardware begins.</p>
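<p>The difference between the two orderings can be reproduced deterministically in a small user-space model. Here a hypothetical <tt>write_register()</tt> stub fires the completion "interrupt" as soon as the last input word is written, which mimics a device with a very short time to result. The helper names mirror the snippet above, but their bodies are assumptions made for this sketch. In a real kernel driver, the flag shared with the interrupt handler would also need proper synchronization (for example, a spinlock), which this single-threaded model leaves out.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIMPLE_INPUT 0u       /* hypothetical input register */

static bool completion_flag;  /* software completion flag, set by the "IRQ" */
static size_t words_expected; /* input words in the current operation */
static size_t words_seen;

static void clear_completion_state(void)
{
    completion_flag = false;
}

/* Stub device with a near-zero time to result: the completion interrupt
 * fires, and its handler sets the flag, as soon as the last word arrives. */
static void write_register(unsigned reg, uint32_t value)
{
    assert(reg == SIMPLE_INPUT);
    (void)value;
    if (++words_seen == words_expected)
        completion_flag = true;
}

static void start_op(size_t words)
{
    words_expected = words;
    words_seen = 0;
}

/* Order used in the buggy driver: clearing after the writes can wipe out
 * a completion that has already been signalled. */
static bool write_op_buggy(const uint32_t *buf, size_t words)
{
    start_op(words);
    for (size_t i = 0; i < words; i++)
        write_register(SIMPLE_INPUT, buf[i]);
    clear_completion_state();
    return completion_flag; /* what a later read() would find */
}

/* Fixed order: clear the flag before the device can possibly complete. */
static bool write_op_fixed(const uint32_t *buf, size_t words)
{
    start_op(words);
    clear_completion_state();
    for (size_t i = 0; i < words; i++)
        write_register(SIMPLE_INPUT, buf[i]);
    return completion_flag;
}
```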
<p class="MsoNormal">Generalizing from this brilliant example of concurrency carelessness, this was an accidental but very good demonstration of the power of varying timing in a virtual platform. It shakes code and finds timing-related bugs far more efficiently than is possible on physical hardware.</p>
<p class="MsoNormal">Had I described the exact (or even approximate) timing of a particular hardware implementation, this kind of bug would not have been found and the driver code would not have been as robust. An implementation relying on a very short completion time could check the hardware operation complete flag directly, but that broke down when the delay was long. The buggy implementation above worked fine with a long completion time, but broke down with a short one. The fixed implementation works across a span of times from 10 ns to 10 s or more, which I think is all you can ask for.</p>
<p class="MsoNormal">A short, fun Simics note on this: the timing parameter can be changed at run time. You can change it in the middle of a run, from the Simics command line, using a simple one-line command:</p>
<pre class="MsoNormal" style="padding-left: 30px;"><span style="color: #0000ff;">simics&gt; </span>sd0-&gt;time_to_result = 10.0e-9</pre>
<p class="MsoNormal">It is really nice working with a system like that!</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/337/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

