<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Observations from Uppsala &#187; hardware-software interface</title>
	<atom:link href="http://jakob.engbloms.se/archives/tag/hardware-software-interface/feed" rel="self" type="application/rss+xml" />
	<link>http://jakob.engbloms.se</link>
	<description>Computer Technology: Simulation, Virtualization, Virtual Platforms, Embedded, Multicore and Multiprocessing (by Jakob Engblom)</description>
	<lastBuildDate>Sun, 29 Jan 2012 19:45:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<image>
    <title>Observations from Uppsala</title>
    <url>http://jakob.engbloms.se/favicon.png</url>
    <link>http://jakob.engbloms.se</link>
    <width>32</width>
    <height>32</height>
    <description>Observations from Uppsala - http://jakob.engbloms.se</description>
    </image>		<item>
		<title>DAC 2009 Panel and Paper</title>
		<link>http://jakob.engbloms.se/archives/823?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/823#comments</comments>
		<pubDate>Wed, 01 Jul 2009 12:38:58 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[appearances]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Cadence]]></category>
		<category><![CDATA[DAC]]></category>
		<category><![CDATA[hardware-software interface]]></category>
		<category><![CDATA[Jason Andrews]]></category>
		<category><![CDATA[Ross Dickson]]></category>
		<category><![CDATA[Wild West panel]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=823</guid>
		<description><![CDATA[The 46th Design Automation Conference (DAC) is coming up in San Francisco in the US, last week of July. For me, this will be the first time I ever go to DAC. I have been to a couple of Design Automation and Test Europe  (DATE) conferences before, but DAC is supposedly even bigger as an [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-824" style="margin: 5px;" title="46daclogo" src="http://jakob.engbloms.se/wp-content/uploads/2009/07/46daclogo.gif" alt="46daclogo" width="81" height="73" />The <a href="http://www.dac.com/46th/index.aspx">46th Design Automation Conference (DAC) </a>is coming up in San Francisco in the US, last week of July. For me, this will be the first time I ever go to DAC. I have been to a couple of <a href="http://www.date-conference.com/">Design Automation and Test Europe  (DATE) </a>conferences before, but DAC is supposedly even bigger as an event for the EDA and related communities. I have the honor to be on a panel this year, as well as co-authoring a paper on software validation.</p>
<p><span id="more-823"></span>The panel is called &#8220;<a href="http://www.dac.com/events/eventdetails.aspx?id=95-49">The Wild West: Conquest of Complex Hardware-Dependent Software Design</a>&#8220;, and takes place on Thursday, July 30, at 16.30, in room 131. We will be discussing hardware/software integration, multicore software, and other topics that I like. We will have a good mix of tool providers and tool users.</p>
<p>The paper is called &#8220;Design Flow for Embedded System Device Driver Development and Verification&#8221;, and is co-authored by me, Jason Andrews of Cadence, and my colleague Ross Dickson. It is presented in the user track session called &#8220;<span id="ctl00_Center_Content_Placeholder__lblEventTitle" class="sestitle"><a href="http://www.dac.com/events/eventdetails.aspx?id=95-3-U">Verification: A Front-End Perspective</a>&#8220;, on Tuesday, at 16.30. It deals with how you can use directed random testing to verify software drivers for custom hardware, using a virtual platform.<br />
</span></p>
<p>I will at the DAC all week, sounds like a great fun event!<span class="sestitle"><br />
</span></p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/823"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/823" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/823" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/823/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Hardware-Software Interface is where the Action Is</title>
		<link>http://jakob.engbloms.se/archives/799?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/799#comments</comments>
		<pubDate>Sun, 07 Jun 2009 19:52:47 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[embedded systeme]]></category>
		<category><![CDATA[Brian Cantrill]]></category>
		<category><![CDATA[Gary Stringham]]></category>
		<category><![CDATA[hardware design]]></category>
		<category><![CDATA[hardware-software interface]]></category>
		<category><![CDATA[Keith Adams]]></category>
		<category><![CDATA[Steve Gibson]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=799</guid>
		<description><![CDATA[When I started out doing computer science &#8220;for real&#8221; way back, the emphasis and a lot of the fun was in the basics of algorithms, optimizing code, getting complex trees and sorts and hashes right an efficient. It was very much about computing defined as processor and memory (with maybe a bit of disk or [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-800" title="pn4_quad-gigaswift-utp-adapter" src="http://jakob.engbloms.se/wp-content/uploads/2009/06/pn4_quad-gigaswift-utp-adapter.gif" alt="pn4_quad-gigaswift-utp-adapter" width="100" height="73" />When I started out doing computer science &#8220;for real&#8221; way back, the emphasis and a lot of the fun was in the basics of algorithms, optimizing code, getting complex trees and sorts and hashes right an efficient. It was very much about computing defined as processor and memory (with maybe a bit of disk or printing or user interface accessed at a very high level, and providing the data for the interesting stuff). However, as time has gone on, I have come to feel that this is almost too clean, too easy to abstract&#8230; and gone back to where I started in my first home computer, programming close to the metal.</p>
<p><span id="more-799"></span>As I dig deeper into operating systems and the hardware-software interface layer (mostly with the help of virtual platforms), I have come to appreciate just how hard and interesting that part of the computing stack is. I guess it is partially because that is the level where most of the nice thick layers of middleware and API software we use these days (and which to be frank I find fairly boring) just break down and have to start dealing with the real world. For some reasons, web servers and their programming feels barren and boring compared to dealing with interrupts, memory maps, and bit twiddling.</p>
<p>Several things I have read and heard about recently touch on this subject in various ways. All of them point to the fact that hardware-software interface design is important, and that there is a lot of right and wrong ways of doing it&#8230; which are rarely taught in universities and rarely approached in computing literature.</p>
<p>First, <a href="http://blogs.sun.com/bmc/entry/concurrency_s_shysters">Brian Cantrill of Sun wrote a blog post blasting transactional memory </a>in November of 2008, which I recently reread and got a bit of a epiphany from in this paragraph:</p>
<blockquote><p>&#8230; Even if one assumes that writing a transaction is conceptually easier than acquiring a lock, and even if one further assumes that transaction-based pathologies like livelock are easier on the brain than lock-based pathologies like deadlock, there remains a fatal flaw with transactional memory: much system software can never be in a transaction <strong>because it does not merely operate on memory</strong>. That is, system software frequently takes action outside of its own memory, requesting services from software or hardware operating on a disjoint memory (the operating system kernel, an I/O device, a hypervisor, firmware, another process &#8212; or any of these on a remote machine). In much system software, the in-memory state that corresponds to these services is protected by a lock &#8212; and the manipulation of such state will never be representable in a transaction. So for me at least, transactional memory is an unacceptable solution to a non-problem.</p></blockquote>
<p>In the same style, <a href="http://x86vmm.blogspot.com/2008/11/cantrill-and-bonwick-get-all-concurrent.html">Keith Adams at VMWare </a>picked up on the above and applied it to the microkernel idea:</p>
<blockquote><p>It&#8217;s interesting to me that, as with microkernels, one of the principle reasons TM will fail is the messy, messy reality of peripheral devices. One of the claims made by microkernel proponents is that, since microkernel drivers are &#8220;just user-level processes&#8221;, they&#8217;ll survive driver failures. And this is almost true, for some definition of &#8220;survive.&#8221; Suppose you&#8217;re a microkernel, and you restart a failed user-level driver; the new driver instance has no way of knowing what state the borked-out driver left the actual, physical hardware in. Sometimes, a blind reset procedure can safely be carried out, but sometimes it can&#8217;t. Also, the devices being driven are DMA masters, so they might very well have done something horrible to the kernel even though the buggy driver was &#8220;just a user-level app.&#8221; And if there were I/Os in flight at failure time, have they happened, or not? Remember, they might not be idempotent&#8230; I&#8217;m not saying that some best-effort way of dealing with many of these problems is impossible, just that it&#8217;s unclear that moving the driver into userspace has helped the situation at all.</p></blockquote>
<p>So what this shows is that the hardware-software interface is where the really hard and interesting problems start to pop up. I am big fan of abstraction and layers of indirection as programming methodologies, I am not a <a href="http://www.grc.com">Steve Gibson</a> who feels that programs are best written in assembly&#8230; but the abstractions do have to allow for the truth that is underneath the system. Bad abstractions or too simple abstractions make things more complex, rather than less.</p>
<p>Moving on from the software side of things to the hardware design side,<a href="http://www.garystringham.com/newsletter.shtml">Gary Stringham is running a nice series of tips for hardware design</a>. Here, there are lots of interesting issues to confront as well to make hardware easy or worthwhile to use. He recently ran a link to a <a href="http://www.microsoft.com/whdc/resources/MVP/xtremeMVP_hw.mspx">2004 Microsoft article on how hardware should be designed</a>, based on the experience of the Windows driver team at Microsoft.</p>
<blockquote><p>If every hardware engineer just understood that write-only registers make debugging almost impossible, our job would be a lot easier. Many products are designed with registers that can be written, but not read. This makes the hardware design easier, but it means there is no way to snapshot the current state of the hardware, or do a debug dump of the registers, or do read-modify-write operations. Now that virtually all hardware design is done in Verilog or VHDL, it takes only a tiny bit of additional effort to make the registers readable.</p>
<p>Another typical hardware trick is registers that automatically clear themselves when written. Although this is sometimes useful, it also makes debugging difficult when overused.</p></blockquote>
<p>I guess it is kind of sad that even five years later, this same issues do seem to crop up in new products and merit volumes of venom from driver developers&#8230; On the other hand, some companies do seem to be getting it. To me, the Freescale designs of recent years do seem to be fairly easy to configure and debug, and not feature write-only bits in any large number.</p>
<p>The article about <a href="http://jakob.engbloms.se/archives/770">hardware acceleration for TCP/IP by Mike Odell </a>that I discussed in a previous blog post is also relevant: when do the complexity of hardware interfacing negate any performance benefit from an accelerator?</p>
<p>(<em>for some reason, the initial posting of this post had an incomplete last paragraph, something weird in WordPress updates happened</em>)</p>
<p>To sum up, I think the interaction of hardware and software in the context of full opreating systems and device driver stacks is a really interesting topic that seems to have not gotten very much academic coverage. I hope to be able to help remedy some of this, once I get the Simics setup used in my <a href="http://jakob.engbloms.se/archives/709">experiments with hardware accelerators </a>packaged and available for academia. Full-system virtual platforms make for a very good experimental system, especially those where you use some third-party or standard operating system rather than just your own controlled code.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/799"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/799" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/799" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/799/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>When does Hardware Acceleration make Sense in Networking?</title>
		<link>http://jakob.engbloms.se/archives/770?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/770#comments</comments>
		<pubDate>Sat, 16 May 2009 06:45:47 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[history of computing]]></category>
		<category><![CDATA[multicore computer architecture]]></category>
		<category><![CDATA[review]]></category>
		<category><![CDATA[accelerators]]></category>
		<category><![CDATA[ethernet]]></category>
		<category><![CDATA[hardware-software interface]]></category>
		<category><![CDATA[Mike Odell]]></category>
		<category><![CDATA[networking]]></category>
		<category><![CDATA[tcp]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=770</guid>
		<description><![CDATA[Yes, when does hardware acceleration make sense in networking? Hardware acceleration in the common sense of &#8220;TCP offload&#8221;. This question was answered by a very nicely reasoned &#8220;no&#8221; in an article by Mike Odell in ACM Queue called &#8220;Network Front-End Processors, Yet Again&#8220;. The article is highly recommended for its long historical look at network [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-771" style="margin: 0px 15px;" title="q_stamp" src="http://jakob.engbloms.se/wp-content/uploads/2009/05/q_stamp.gif" alt="q_stamp" width="38" height="65" />Yes, when does hardware acceleration make sense in networking? Hardware acceleration in the common sense of &#8220;TCP offload&#8221;. This question was answered by a very nicely reasoned &#8220;no&#8221; in an article by Mike Odell in <a href="http://queue.acm.org/">ACM Queue </a>called &#8220;<a href="http://queue.acm.org/detail.cfm?id=1530828">Network Front-End Processors, Yet Again</a>&#8220;.</p>
<p><span id="more-770"></span></p>
<p>The article is highly recommended for its long historical look at network processing and network processing offload. As the balance  between speeds of networks, processors, memory, and interconnects between network cards and the rest of the system has changed over the years, it is an idea that occasionally (four or five times since the 1970s) has made sense. However, in the end, Mike thinks that it usually does not, and for a machine with multiple cores and a modern fast interconnect, it is hard to see how a hardware accelerator can actually help speed things up much when the coordination between the hardware and the software is accounted for. Even if there would appear to be a big bottleneck somewhere today, we can be sure that it wil be removed in the next generation of hardware, rendering the market window for an accelerator quite short.</p>
<p>I read this article as another great motivation for the need to carefully consider the functional design of the hardware-software interface for acceleration devices. For simple data-pumping or media-processing units, this looks easy. For something as complex as TCP/IP processing, it is not. I think the key is that for TCP, we have something that is much more like control-plane processing than data-plane processing, and that is harder to efficiently integrate between hardware and software. Also, there is not really that much work left to offload once data copies have been architected in the right way (and I read Mike&#8217;s article to say that we now know how to do this in a sufficently few-copies way that software is close to optimal in architecture).</p>
<p>From a market perspective, it would also indicate that the acceleration circuits that are in common use today are by definition those that make sense. Having hardware-accelerated graphics and video decoders does seem to help build more efficient and attractive computer systems, as do cryptography accelerators. With this view, it will be interesting to see which of all the accelerators found in modern networking SoCs like those from Freescale and Cavium will survive the test of time. I am willing to put a small bet that pattern-matching engines for traffic inspection is one of them. Apart from that, hard to say.</p>
<p>So go read that article before you start designing your next brilliant accelerator for a common expensive operation.</p>
<p>It also reminds me of a <a href="http://www.virtutech.com/whitepapers/wp-system_arch_spec.html">whitepaper I wrote early this year </a>on how to evaluate performance of a hardware accelerator in the context of a full system with a full software stack, considering the details of the hardware-software interface.</p>
<div class="simple_likebuttons_container_small">
      <div class="simple_likebuttons_googleplus">
        <g:plusone size="medium" count="false" href="http://jakob.engbloms.se/archives/770"></g:plusone>
      </div>
    
      <div class="simple_likebuttons_twitter simple_likebuttons_twitter_s">
        <a href="https://twitter.com/share" class="twitter-share-button" data-count="none" data-url="http://jakob.engbloms.se/archives/770" data-lang="en">Tweet</a>
      </div>
    
      <div class="simple_likebuttons_facebook">
        <div id="fb-root"></div>
        <script>(function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) {return;}
          js = d.createElement(s); js.id = id;
          js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";
          fjs.parentNode.insertBefore(js, fjs);
        }(document, "script", "facebook-jssdk"));</script>
        <div class="fb-like" data-href="http://jakob.engbloms.se/archives/770" data-send="false" data-layout="button_count" data-show-faces="false" data-width="90"></div>
      </div>
    </div>]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/770/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

