<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Observations from Uppsala &#187; hardware modeling</title>
	<atom:link href="http://jakob.engbloms.se/archives/tag/hardware-modeling/feed" rel="self" type="application/rss+xml" />
	<link>http://jakob.engbloms.se</link>
	<description>Computer Technology: Simulation, Virtualization, Virtual Platforms, Embedded, Multicore and Multiprocessing (by Jakob Engblom)</description>
	<lastBuildDate>Sun, 29 Jan 2012 19:45:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<image>
    <title>Observations from Uppsala</title>
    <url>http://jakob.engbloms.se/favicon.png</url>
    <link>http://jakob.engbloms.se</link>
    <width>32</width>
    <height>32</height>
    <description>Observations from Uppsala - http://jakob.engbloms.se</description>
    </image>		<item>
		<title>EETimes: James Aldis on Performance Modeling</title>
		<link>http://jakob.engbloms.se/archives/1387?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1387#comments</comments>
		<pubDate>Thu, 03 Mar 2011 20:13:03 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[hardware design]]></category>
		<category><![CDATA[hardware modeling]]></category>
		<category><![CDATA[James Aldis]]></category>
		<category><![CDATA[OMAP]]></category>
		<category><![CDATA[performance optimization]]></category>
		<category><![CDATA[TI]]></category>
		<category><![CDATA[Virtio]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1387</guid>
		<description><![CDATA[James Aldis of TI has published an article in the EEtimes about how Texas Instruments uses SystemC in the modeling of their OMAP2 platform. SystemC is used for early architecture modeling and performance analysis, but not really for a virtual platform that can actually run software. The article offers a good insight into the virtual [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2011/03/TI-logo.png"><img class="alignleft size-full wp-image-1388" style="margin: 5px 10px;" title="TI logo" src="http://jakob.engbloms.se/wp-content/uploads/2011/03/TI-logo.png" alt="" width="80" height="76" /></a>James Aldis of TI has published an article in the <a href="http://www.eetimes.com">EEtimes</a> about how <a href="http://www.eetimes.com/General/DisplayPrintViewContent?contentItemId=4212778">Texas Instruments uses SystemC in the modeling of their OMAP2 platform</a>. SystemC is used for early architecture modeling and performance analysis, but not really for a virtual platform that can actually run software. The article offers a good insight into the virtual platform use of hardware designers, which is significantly different from the virtual platform use of software designers.<br />
<span id="more-1387"></span>For a software person like myself, this article offers a well-written  insight into the world of hardware design and bus optimization for SoCs.</p>
<p>TI deploys two totally different platforms for hardware and software development, which makes perfect sense. The goals are so different between a high-speed software development platform and a performance-accurate hardware design platform that trying to force them together would likely just create a compromise that is bad for everybody.</p>
<p>Additionally, FPGAs are used to create timing-dependent low-level code, where you need both timing accuracy and decent speed.  It is worth noting that the performance model is mostly &#8220;dataless&#8221; &#8211; it models the timing of actions and their dependencies, but not their values and computations.</p>
<blockquote><p>The different models serve different purposes, require different levels of effort to use, and become available at different times during the project. The SystemC performance model is always available first and is always the simplest to create and use. The virtual platform is the next to become available. It is used for software development and has very little timing accuracy.  TI uses Virtio technology to create this model rather than SystemC.</p></blockquote>
<p>Given the number of ultimately failed attempts I have seen at making timing and function available in the same model but as orthogonal concerns, this observation in the article is very insightful:</p>
<blockquote><p>It would appear the choice of two different technologies for the virtual platform and the performance model is inefficient, wasting potential code reuse. However, the two have completely different (almost fully orthogonal) requirements, and at module level almost no code reuse is possible.</p></blockquote>
<p>Maybe this is an impossible dream in the general case.</p>
<p>One somewhat surprising statement in the article is that there is no real software available to use in the SoC design phase. Often, virtual platforms are sold as being able to use &#8220;the real software&#8221; when designing hardware. But in the case of TI, the software is mostly written by their customers, with little available for TI to use. Thus, they are forced to design their own test cases to drive the hardware design process.</p>
<blockquote><p>The requirements on the simulation technology are first and foremost ease in creating test cases and models and credibility of results. The emphasis on test-case creation is a consequence of the complexity of the devices and of the way in which an SoC platform such as OMAP-2 is used: because the whole motivation is to be able to move from marketing requirements to RTL freeze and tape-out in a very short time; and because in many cases large parts of the software will be written by the end customer and not by the SoC provider (Texas Instruments, in this article), the performance-area-power tradeoff of a proposed new SoC must be achieved without the aid of &#8220;the software.&#8221;</p></blockquote>
<p>The platform they built is all based on clock-cycle-level interfaces (CC), which is very natural when the primary use case is hardware design.</p>
<p>The primary component optimized in the TI design process is the on-chip interconnect structure, called the &#8220;NoC&#8221; in the article. Each SoC variant is built from a set of (usually already existing) devices and processor cores. The main work of the integration is designing an appropriate NoC for the SoC. The NoC design is crucial to the actual performance level the final SoC product will have.</p>
<p>By playing with the topology, the level of concurrency, and the level of pipelining in the NoC, it&#8217;s possible to create SoCs from the same basic modules with quite different capabilities.</p>
<p>The only real instruction-set simulators used are CC-level models of DSPs, used for software optimization taking bus contention into account. No models of the ARM control cores are used. Mostly, processors are represented by stochastic or trace-driven traffic generators that put transactions on buses but do not actually run any real code.</p>
<p>The stochastic processor models are very powerful and provide traffic that is very similar to a real processor.  A very elegant property of such models is that it is very easy to change the parameters of the model to model quite different software/processor scenarios. Compared to writing real test programs for a full ISS, this is much faster and allows for the exploration of more alternatives.</p>
<p>The stochastic models are used alongside function-graph breakdowns of software, essentially models that say that an application does A, then B, then C, and that maybe D can happen in parallel. This model of an application is connected to the hardware simulation and can control when things happen and what goes on in parallel. It amounts to a simple model of what an RTOS would do, to some extent.</p>
<p>Configurability is a key theme throughout the OMAP architecture exploration platform. SystemC being what it is, it is limited to configuration at start-up time, but that is perfectly sensible for an architecture exploration use case where you want to set up a platform and test its performance. Dynamic reconfiguration during a run is not that important. TI has spent a great deal of effort in making the system easy to configure using parameter files.</p>
<p>The article goes into many more fascinating details on the models used.  I can only say one thing: read it, if you have any interest in these kinds of issues.</p>
<p>Good work, James!</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1387/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Modeling Endianness</title>
		<link>http://jakob.engbloms.se/archives/1336?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/1336#comments</comments>
		<pubDate>Sun, 26 Dec 2010 15:58:19 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[big-endian]]></category>
		<category><![CDATA[endianess]]></category>
		<category><![CDATA[hardware modeling]]></category>
		<category><![CDATA[little-endian]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=1336</guid>
		<description><![CDATA[Endianness is a topic in computer architecture that can give anyone a headache trying to understand exactly what is happening and why. In the field of computer simulation, it is a pervasive problem that takes some thinking to solve in an efficient, composable, and portable way. This blog post describes how I am used to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/12/egg.png"><img class="alignleft size-full wp-image-1337" style="margin: 5px 10px;" title="egg" src="http://jakob.engbloms.se/wp-content/uploads/2010/12/egg.png" alt="" width="74" height="66" /></a><a href="http://en.wikipedia.org/wiki/Endianness">Endianness </a>is a topic in computer architecture that can give anyone a headache trying to understand exactly what is happening and why. In the field of computer simulation, it is a pervasive problem that takes some thinking to solve in an efficient, composable, and portable way.</p>
<p>This blog post describes how I am used to working with endianness in virtual platforms, and why this approach makes sense to me. There are other ways of dealing with endianness, with different trade-offs and overriding goals.</p>
<h2><span id="more-1336"></span>Fundamentals</h2>
<p>What is endianness? In my way of looking at it, it is the arbitrary solution to the problem you get when a large unit of information (say, a 32-bit word) needs to be stored as a set of smaller units (say, 8-bit bytes). When this happens, you need to split the large unit into smaller units, and decide on how to order the smaller units. There is no objectively better or worse way to do this &#8211; as long as the result is unambiguous and based on positional numerics (i.e., no roman numerals, please), it is hard to claim that one order is better than another.</p>
<p>We use &#8220;endianness&#8221; all the time without really thinking about it, when we write regular decimal numbers. In our <a href="http://en.wikipedia.org/wiki/Hindu_numerals">standard </a>base-10 decimal writing system, any value &gt;9 has to be written down using multiple digits. The order we use is a big endian representation: the most significant numbers come first in our reading order (hundreds before tens before single digits, etc.).</p>
<p>In computer architecture, we have four schools of endianness:</p>
<ul>
<li>No endian, where we never break things down to bytes but always operate on equal-size words (not very common in practice, but certain machines like the Microchip PIC have instruction ROMs as wide as the instructions, and no way to address components of the instructions)</li>
<li>Big endian, BE, where the most significant bytes are put first in order of ascending addresses. I.e., the &#8220;big end&#8221; comes first.</li>
<li>Little endian, LE, where the least significant bytes are put first</li>
<li>&#8220;Middle endian&#8221;, where the ordering differs for different sizes of data (<a href="http://en.wikipedia.org/wiki/Endianness">Wikipedia </a>mentions this, but I have never seen an example). I have heard stories about chips that also used different endianness to store data by different instructions (by misdesign, I am not referring to the Power Architecture load/store byte-reversed instructions).</li>
</ul>
<p>BE is the traditional choice of IBM and the major early RISC chips, with Power Architecture, MIPS, SPARC, and the zSeries as the most important representatives. LE is the choice of x86, and more recently ARM. MIPS also seems to be gravitating towards LE, probably as a way to make x86 software slightly easier to port. Note that even though some processor cores are described as endianness-neutral, that really means that they can run as either LE or BE. In practice, particular chip designs incorporating such cores tend to lean heavily towards one endianness, since devices are designed for a particular endianness.</p>
<h2>The Software View</h2>
<p>For me, the most important view of endianness is how the software sees it. When a program is running on any current architecture, it logically sees memory as an array of bytes. Inside the memory chips, we have a very different physical layout, usually with words much wider than a byte, as well as an addressing scheme that is not one-dimensional. The interconnect (&#8220;bus&#8221;) moving data from a processor to memory and back is a complex system containing caches, buses of different widths (usually 64 bits or more), memory controllers, cache controllers, bus bridges, and other devices. All of this is usually completely invisible to software, as illustrated below:</p>
<p style="text-align: center;"><img class="aligncenter" style="margin-top: 5px; margin-bottom: 5px;" title="endianness 1" src="http://jakob.engbloms.se/wp-content/uploads/2010/12/endianness-1.png" alt="" width="504" height="389" /></p>
<p style="text-align: left;">Basically, the bus system is invisible. The important endianness property as far as software is concerned is the order in which bytes are put into memory, and memory is considered as an array of bytes (since a byte is the smallest unit of addressing). If you look at the memory of a computer system using a debugger, this is the view you will get &#8211; both for on-target and off-target debuggers like ICE units and JTAG debuggers. Each memory access (store or load) will logically pass a small array of bytes into some position in the very large array that is memory.</p>
<h2 style="text-align: left;">The Modeling View</h2>
<p style="text-align: left;">Modeling endianness is not optional when building a virtual platform. The software will at some point assume a certain relationship between word layouts and byte addresses in memory (such as overlaying a byte array on an integer in a C union), or when interpreting network packets (which are defined to use BE byte order, and therefore network code has to convert values to native endian to process them).</p>
<p style="text-align: left;">If you start from the software view of endianness and memory, the obvious simulation model for memory operations is to maintain the array of bytes view of memory matching the physical target.</p>
<ul>
<li>Each memory access from a simulated processor gets turned into a transaction in the simulator.</li>
<li>The transaction has variable size, matching the size of the memory access operation issued by the processor.</li>
<li>The transaction contains a sequence of bytes, in the same order as they would end up in target memory on a physical machine. I.e., the order reflects the endianness of the processor.</li>
<li>The transaction has a starting address (byte-based) matching the memory access the processor issues.</li>
<li>The contents of the memory model in the simulation is an array of bytes, and its content matches what you would find on the physical target &#8211; the logical software view of the target.</li>
<li>The bus system connecting the processor to the memory is basically considered as a black box that just moves the transaction to memory.</li>
</ul>
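<p>The scheme in the list above can be sketched in a few lines of C. This is only a minimal illustration under stated assumptions, not code from any particular simulator: the names transaction_t, memory_t, memory_write and memory_read, as well as the limits MAX_ACCESS and MEM_SIZE, are hypothetical and chosen just for this example.</p>
<pre>#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

#define MAX_ACCESS 8     /* largest access size modeled here (assumption) */
#define MEM_SIZE   4096  /* size of the toy memory in bytes (assumption) */

/* Hypothetical transaction: the bytes are kept in target memory order. */
typedef struct {
    uint64_t addr;              /* byte-based start address */
    size_t   size;              /* size of the access in bytes */
    uint8_t  data[MAX_ACCESS];  /* data[0] belongs at addr, data[1] at addr + 1, ... */
} transaction_t;

/* Hypothetical memory model: a plain array of bytes. */
typedef struct {
    uint8_t bytes[MEM_SIZE];
} memory_t;

/* Returns nonzero if the access fits inside the memory. */
static int in_range(const transaction_t *t)
{
    return t-&gt;size &lt;= MAX_ACCESS &amp;&amp; t-&gt;addr &lt;= MEM_SIZE - t-&gt;size;
}

/* The bus is a black box: a write just copies the bytes into place. */
static int memory_write(memory_t *mem, const transaction_t *t)
{
    if (!in_range(t))
        return -1;
    memcpy(&amp;mem-&gt;bytes[t-&gt;addr], t-&gt;data, t-&gt;size);
    return 0;
}

/* A read fills the transaction with bytes in the order they sit in memory. */
static int memory_read(const memory_t *mem, transaction_t *t)
{
    if (!in_range(t))
        return -1;
    memcpy(t-&gt;data, &amp;mem-&gt;bytes[t-&gt;addr], t-&gt;size);
    return 0;
}</pre>
<p>Note that nothing in this sketch knows about endianness. The memory only moves bytes; the ordering decision is left entirely to whoever builds the transaction.</p>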
<p>The above is very easy to implement, and actually a very convenient implementation for someone used to the software view of hardware. The only thing that remains to be considered is how a processor simulator is implemented in practice.</p>
<p>In a typical processor simulator, you represent the target system registers using words of the same size as the target processor uses. I.e., for a 32-bit processor, you use 32-bit words on the host to represent the contents of a register. As the processor model is running, the contents of the register might have to be stored in data structures internal to the processor (such as an array of words representing the register file). Naturally, such data structures are kept in host endianness since they are just plain compiled C code. As the processor model runs, arithmetic is carried out using host endianness.</p>
<p>Actually, usually no endianness is involved as the values are considered as words. Remember that a word does not have endianness until it is broken down into bytes and someone actually looks at the bytes. In particular, an operation like</p>
<pre>uint8  a;
uint32 b;
a = (b &amp; 0xff);</pre>
<p>will pick up the 8 lowest bits of a word on any processor. The code is logically working inside of registers and is perfectly portable. However, the result of</p>
<pre>uint32 *c;
*c = b;
a = *((uint8 *)c);</pre>
<p>will pick up the first (at the lowest address) byte stored in memory when b was written &#8211; which is the same as the above on an LE processor, but different on a BE processor. The crucial observation here is that the latter variant contains an explicit store of a word, and an explicit load of a byte. Thus, endianness enters as we store the word (the byte load has no endianness, as it is loading the smallest unit of addressability).</p>
<p>What this means is that a processor simulator will have to do an explicit ordering of bytes as it is writing out values to memory. The simulator will need to take a word it has represented in &#8220;host order&#8221; (as it is within the simulator itself) and convert it to the byte order of the target processor. If the two match, such as simulating a little-endian ARM target on a (always little-endian) x86 host, nothing needs to be done. If they do not match, such as simulating a big-endian PPC target on an x86 host, the bytes have to be swapped before being sent to simulated memory.</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/12/endianness-2.png"><img class="aligncenter size-full wp-image-1340" title="endianness 2" src="http://jakob.engbloms.se/wp-content/uploads/2010/12/endianness-2.png" alt="" width="422" height="368" /></a>When the processor does a load, it similarly has to swap the bytes being read from memory (if using different target and host endianness).</p>
<p>As soon as we leave the processor simulator, the order of bytes in transactions and simulated memory has to be defined and managed in a host-independent way. This is crucial to enable snapshots of memory to be <a href="http://blogs.windriver.com/engblom/2010/08/transporting-bugs-with-checkpoints.html#more">shared across hosts, time, and space</a>, and simply to allow the simulation to work correctly. The semantics of the simulation must be defined by the simulator, not by the nature of the host.</p>
<p>Note that as an optimization, quite often we do not create an explicit transaction, but rather use the optimization of letting the processor simulator write directly to the representation of the target memory in the memory simulator. In this design, the target memory representation is just an array of bytes mirroring the contents that the processor would see on a physical target.</p>
<p>Let&#8217;s go through this with a simple example. We assume we are on an x86 host. Our processor simulator contains a 32-bit register with the value 0x01020304. This value is endianless until we have to send it to simulated memory; until then, it is just a value of 32 bits. We write it to target memory at address 0x100.</p>
<p>On a simulated LE target, the memory write will result in a transaction containing the byte sequence (0x04, 0x03, 0x02, 0x01) &#8211; lowest byte comes first. The memory model will store this with 0x04 at address 0x100, 0x03 at 0x101, etc. The processor model can achieve this effect by simply doing a host-native word store to the memory array.</p>
<p>On a simulated BE target, the memory write will result in a transaction containing (0x01, 0x02, 0x03, 0x04). In memory, 0x01 will be stored at address 0x100, 0x02 at 0x101, etc. To store this word correctly, the processor model will have to do a byte swap operation on the word before writing it out to memory. Such a byte swap operation might seem expensive, but the evidence does not indicate that it matters. All the fastest instruction-set simulators use this method internally as far as I know (Wind River Simics, Imperas OVP, Qemu, IBM Mambo), which to me indicates that the design works well on a simulation system level.</p>
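<p>The worked example can be reproduced with a small C sketch. It is an illustration only: the names endian_t, store32 and load32 are hypothetical, and the sketch builds the bytes with shifts so that it gives the same result on any host, instead of the host-native word store plus byte swap that the simulators mentioned above are described as using.</p>
<pre>#include &lt;stdint.h&gt;

/* Hypothetical tag for the endianness of the simulated processor. */
typedef enum { TARGET_LE, TARGET_BE } endian_t;

/* Store a 32-bit register value into byte-array memory in target order. */
static void store32(uint8_t *mem, uint32_t addr, uint32_t value, endian_t e)
{
    for (int i = 0; i &lt; 4; i++) {
        /* LE puts the least significant byte first, BE the most significant. */
        int shift = (e == TARGET_LE) ? 8 * i : 8 * (3 - i);
        mem[addr + i] = (uint8_t)(value &gt;&gt; shift);
    }
}

/* Load a 32-bit value back, assembling the bytes in target order. */
static uint32_t load32(const uint8_t *mem, uint32_t addr, endian_t e)
{
    uint32_t value = 0;
    for (int i = 0; i &lt; 4; i++) {
        int shift = (e == TARGET_LE) ? 8 * i : 8 * (3 - i);
        value |= (uint32_t)mem[addr + i] &lt;&lt; shift;
    }
    return value;
}</pre>
<p>Storing 0x01020304 at address 0x100 with TARGET_LE leaves 0x04 at 0x100 and 0x01 at 0x103, while TARGET_BE leaves 0x01 at 0x100 and 0x04 at 0x103. A load with the same endianness as the store returns the original value in both cases.</p>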
<h2>Device Models</h2>
<p>Device models are the main part of a functional simulator for a computer system. They also have endianness, as they expose memory-mapped interfaces to software. To deal with devices in a consistent manner, they will interpret inbound memory transactions using their local register endianness. This makes it simple and reliable to simulate systems where the processor and the devices have different endianness.</p>
<p>Systems with mixed device endianness are very common, mostly thanks to PCI. PCI is defined to use little-endian byte ordering in all memory accesses, as it originated in the x86 world. PCI is still being used in almost all computer systems, and thus LE PCI devices are being connected to BE processors.</p>
<p>Internally, a device model will also use words to represent data. When data is written to a device, it will interpret the bytes in the write transaction using its local order. When data is read from a device, it will fill in the data in the read transaction using its local order. This makes device drivers that byte-swap incoming data from an LE PCI device on a BE processor work just like they do on physical hardware.</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/12/endianness-3.png"><img class="aligncenter size-full wp-image-1341" title="endianness 3" src="http://jakob.engbloms.se/wp-content/uploads/2010/12/endianness-3.png" alt="" width="473" height="414" /></a>This makes endianness a local property of the device. The same device model can be used without change in both an LE and a BE target system. This mirrors reality: PCI devices are used in all kinds of systems, and the devices do not change, and neither do the models have to.</p>
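<p>A device model following this rule might look like the C sketch below. The device, its single 32-bit control register, and the names le_device_t, le_device_write and le_device_read are all hypothetical; the only point is that the model interprets transaction bytes in its own little-endian order, regardless of which processor sent them.</p>
<pre>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

/* Hypothetical device with one 32-bit register, defined as little-endian. */
typedef struct {
    uint32_t ctrl;  /* kept as a plain word inside the model */
} le_device_t;

/* Interpret the bytes of a write transaction in the device's own LE order. */
static void le_device_write(le_device_t *dev, const uint8_t *data, size_t size)
{
    if (size != 4)
        return;  /* this sketch only handles full-register accesses */
    dev-&gt;ctrl = (uint32_t)data[0]
              | (uint32_t)data[1] &lt;&lt; 8
              | (uint32_t)data[2] &lt;&lt; 16
              | (uint32_t)data[3] &lt;&lt; 24;
}

/* Fill in a read transaction using the device's own LE order. */
static void le_device_read(const le_device_t *dev, uint8_t *data, size_t size)
{
    if (size != 4)
        return;
    for (size_t i = 0; i &lt; 4; i++)
        data[i] = (uint8_t)(dev-&gt;ctrl &gt;&gt; (8 * i));
}</pre>
<p>If a BE processor stores the word 0x01020304, the transaction carries the bytes (0x01, 0x02, 0x03, 0x04), and this device will see the register value 0x04030201. That is the same mismatch a driver has to correct with a byte swap on physical hardware.</p>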
<p>In some systems, the designers try to hide the RISC-processor-to-PCI endianness mismatch by making the hardware swap bytes around as they move from the memory bus into the PCI subsystem. If this is the case in a target system, the simplest simulation method is to insert a byte-swapping intermediary on the path from the processor to the devices. This will do an extra byte swap on all transactions passing by, and things will work correctly (note that this byte swap has to be defined to work on a certain word length, and if transactions are bigger than this length, you will also have to order the words).</p>
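<p>Such an intermediary could be sketched as follows. The function name bridge_swap and its word_len parameter are hypothetical. The sketch reverses the bytes inside each word and leaves the words themselves in their original positions; whether the words also need reordering depends on the hardware being modeled and is not covered here.</p>
<pre>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

/* Reverse the bytes within each word of word_len bytes, in place.
   Returns -1 if the transaction is not a whole number of words. */
static int bridge_swap(uint8_t *data, size_t size, size_t word_len)
{
    if (word_len == 0 || size % word_len != 0)
        return -1;
    for (size_t w = 0; w &lt; size; w += word_len) {
        for (size_t i = 0; i &lt; word_len / 2; i++) {
            uint8_t tmp = data[w + i];
            data[w + i] = data[w + word_len - 1 - i];
            data[w + word_len - 1 - i] = tmp;
        }
    }
    return 0;
}</pre>
<p>With a word length of 4, an 8-byte transaction (1, 2, 3, 4, 5, 6, 7, 8) comes out as (4, 3, 2, 1, 8, 7, 6, 5). Applying the swap twice restores the original bytes.</p>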
<p>Note that as long as all units involved on the path from a device to a processor use the same word length, you can replace all the byte swapping operations with a simple flag. This flag will indicate if a transaction has been swapped or not. For example, consider a BE processor talking to a BE device on an LE host. The BE processor will flag the transaction as &#8220;wrong-endian&#8221; as it sends it out but actually store the bytes in LE order in the transaction. The BE device will check the flag and realize that it is wrong-endian too. And since two wrongs make a right, it does not have to swap the bytes either but can copy the transaction contents directly into its internal registers.</p>
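<p>One possible way to express the flag idea in C is shown below. It is a sketch under the assumption of a fixed 32-bit word length, and all names (flagged_txn_t, host_is_le, bswap32, txn_from_word, txn_to_word) are hypothetical. The sender always copies its word in host order and only records whether that differs from its own target order; the receiver swaps only when its own situation differs from the flag.</p>
<pre>#include &lt;stdbool.h&gt;
#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

/* Hypothetical transaction that carries a word in host order plus a flag. */
typedef struct {
    uint8_t data[4];       /* word copied in host byte order */
    bool    wrong_endian;  /* true if host order differs from the sender's order */
} flagged_txn_t;

/* Detect host endianness by looking at the first byte of a known value. */
static bool host_is_le(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&amp;first, &amp;probe, 1);
    return first == 1;
}

static uint32_t bswap32(uint32_t v)
{
    return (v &gt;&gt; 24) | ((v &gt;&gt; 8) &amp; 0x0000ff00u)
         | ((v &lt;&lt; 8) &amp; 0x00ff0000u) | (v &lt;&lt; 24);
}

/* Sender side: no swap, just copy the word and set the flag. */
static void txn_from_word(flagged_txn_t *t, uint32_t word, bool sender_is_le)
{
    memcpy(t-&gt;data, &amp;word, 4);
    t-&gt;wrong_endian = (sender_is_le != host_is_le());
}

/* Receiver side: swap only if sender and receiver disagree. */
static uint32_t txn_to_word(const flagged_txn_t *t, bool receiver_is_le)
{
    uint32_t word;
    bool receiver_wrong = (receiver_is_le != host_is_le());
    memcpy(&amp;word, t-&gt;data, 4);
    if (t-&gt;wrong_endian != receiver_wrong)
        word = bswap32(word);
    return word;
}</pre>
<p>A BE processor sending 0x01020304 to a BE device results in no swap at all, on any host. The same transaction arriving at an LE device yields 0x04030201, matching what the byte-based scheme would produce.</p>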
<h2>Dealing with Data</h2>
<p>There are other things you want to do with a memory image in a virtual platform apart from reading and writing it from a processor. One particular task is to move data into and out of the memory model in order to load code and data, as well as to save the state of the system. The representation of a memory as an array of bytes works very well for this approach, since it corresponds naturally to how software files are created on the host. Since most software files are intended to be loaded by the target into target memory, they are prepared in target byte order. Another advantage of using a byte-based memory representation is that file formats like ELF can be loaded straight into virtual memory without having to convert addresses.</p>
<p><a href="http://jakob.engbloms.se/wp-content/uploads/2010/12/endianness-5.png"><img class="aligncenter size-full wp-image-1344" title="endianness 5" src="http://jakob.engbloms.se/wp-content/uploads/2010/12/endianness-5.png" alt="" width="495" height="395" /></a>The representation is also host-independent, which facilitates moving memory images from one host to another, a key part of <a href="http://jakob.engbloms.se/archives/1235">using virtual platforms as a communications mechanism</a>. Another benefit of viewing memory as an array of bytes as accessed from a processor is that debuggers can look at memory in the same way as they would when running on the same host.</p>
<h2>Summary</h2>
<p>This long post (WordPress tells me it is more than 2500 words) really only starts to scratch the surface of this fascinating topic. It has described one approach to endianness modeling, and some of the subtleties involved. There are many more subtleties that we could go into.</p>
<h2>Footnote: SystemC TLM-2.0</h2>
<p>There are other ways to model endianness. In particular, the approach described here is not used in the SystemC TLM-2.0 standard. In TLM-2.0, all data is stored in a transaction in <em>host</em> order, not target order. To model the target endianness, you instead change a descriptor array that tells the simulator about how to interpret the bytes when viewed from the target.</p>
<p>As I see it, this means that TLM-2.0 is better suited for modeling the ins and outs of a bus system, including discovering how data ends up at a target from the actions of the various components of the bus system. It models byte lanes and the width of buses, and uses host byte order for all transfers of data. In contrast, the approach described in this blog post works by modeling the documented (or intended) effect of the hardware at the software level.</p>
<p>Overall, I would say that TLM-2.0 is slightly more geared towards the &#8220;<a href="http://jakob.engbloms.se/archives/1083">design&#8221; use of modeling, rather than &#8220;describe</a>&#8221;. By modeling bus widths, actual byte lanes, and other concepts, the simulator will discover the shape and endianness of data as it arrives at a target memory or device.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/1336/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Software, Hardware, and Development Methods</title>
		<link>http://jakob.engbloms.se/archives/242?&#038;owa_medium=feed&#038;owa_sid=</link>
		<comments>http://jakob.engbloms.se/archives/242#comments</comments>
		<pubDate>Mon, 25 Aug 2008 20:51:12 +0000</pubDate>
		<dc:creator>Jakob</dc:creator>
				<category><![CDATA[computer simulation technology]]></category>
		<category><![CDATA[ESL]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[virtual platforms]]></category>
		<category><![CDATA[Frank Schirrmeister]]></category>
		<category><![CDATA[Glenn Perry]]></category>
		<category><![CDATA[hardware modeling]]></category>
		<category><![CDATA[Hypercard]]></category>
		<category><![CDATA[object-oriented programming]]></category>
		<category><![CDATA[OOP]]></category>
		<category><![CDATA[Pascal]]></category>
		<category><![CDATA[scripting]]></category>

		<guid isPermaLink="false">http://jakob.engbloms.se/?p=242</guid>
		<description><![CDATA[I just read an opinion-provoking piece &#8220;Software developer attitudes: just get on with it&#8221; by Frank Schirrmeister, as well as the article &#8220;Life imitating art: Hardware development imitating software development&#8221; by Glenn Perry that he linked to. Both these articles touch on the long-standing question of who does development the &#8220;best&#8221; in computing. I have [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-654" style="margin: 10px;" title="opinion" src="http://jakob.engbloms.se/wp-content/uploads/2009/02/opinion.png" alt="opinion" width="91" height="69" />I just read an opinion-provoking piece <a href="http://www.synopsysoc.org/viewfromtop/?p=46">&#8220;Software developer attitudes: just get on with it&#8221; by Frank Schirrmeister</a>, as well as the article <a href="http://www.soccentral.com/results.asp?EntryID=26246#PrintPage">&#8220;Life imitating art: Hardware development imitating software development&#8221; by Glenn Perry</a> that he linked to. Both these articles touch on the long-standing question of who does development the &#8220;best&#8221; in computing. I have heard these arguments many times: software developers think that there is something mythical about hardware development that makes things work so much better with far fewer bugs, while hardware people look enviously at the speed of development and the fanciful fireworks of coding that software engineers can do. It could be a case of the grass always looking greener on the other side&#8230; but there are some concrete things that are relevant here.</p>
<p><span id="more-242"></span></p>
<h2>Perry: OO and onwards!</h2>
<p>As Glenn Perry notes, hardware languages have only just now discovered object-oriented programming, and he sees a gap of five to ten years from software development practice to hardware development practice. I think dating mainstream OO to 1998 is a bit late&#8230; I remember doing my first OO things in Object Pascal and HyperCard around 1990, and it was standard fare at that point in time. So with this perspective, maybe there is some hope that hardware development will one day take up current practices like <a href="http://jakob.engbloms.se/wp-admin/post.php?action=edit&amp;post=165">dynamic typing, explicitly threaded languages</a>, and <a href="http://jakob.engbloms.se/wp-admin/post.php?action=edit&amp;post=186">scripting-style</a> development (as I have argued before on this blog).</p>
<p>It also looks to me as if OO is mostly applied to verification systems and test benches rather than to the actual design. Which is not too surprising: in the end, hardware design is about creating a fixed hardware layout, and polymorphic objects and pointer chains map really poorly to transistors. Considering the incredible difficulties of static optimization of C++ code, with fun things like escape analysis and very conservative assumptions, I have a hard time seeing synthesis from arbitrary OO code any time soon.</p>
<p>But the main use seems to be in test benches, and in that world there is no reason to stick to antiquated concepts like static typing and old-fashioned OO&#8230; it would seem quite feasible to move to an asynchronous message-passing parallel model with dynamic types and special support for validating hardware behavior. I think a limiting factor here is a user base that has little computer science schooling and little exposure to non-procedural languages.</p>
<h2>Schirrmeister: The Sword of Damocles</h2>
<p>Over to Schirrmeister. His assertion is really that the grass looks greener on the other side, but that that grass is really quite poisonous for those living on your own side:</p>
<blockquote><p>While I agree with Glenn that the technology in software engineering may be more advanced, for example around languages, I am certain that the required methodologies are fundamentally incompatible. The reason? In hardware engineering the project team always has the “<a href="http://en.wikipedia.org/wiki/Sword_of_Damocles_%28disambiguation%29">Sword of Damocles</a>” hanging over their head. Mess up the tape out and you will cost the company several millions in NRE (Non recurring engineering, or, “N”ever “R”eturn “E”ver). In addition you have to consider the lost product revenue because of the several months the project is now delaying production.</p>
<p>In contrast, in software engineering there is always service pack 2. The requirement to get everything right is not as deadly as it is in hardware engineering. And as a result of all that Skip Hovsmith is perfectly right &#8211; “Just get on with it” is unfortunately often the approach taken in software engineering. It looks to like things have to get a lot worse before this approach changes.</p></blockquote>
<p>I think this makes some sense&#8230; but it is really a bit simplistic.</p>
<h2>My Synthesis: It is a sliding scale&#8230;</h2>
<p>I think that the argument that Frank puts forward has some merit. But it is not true that there is always a next patch in the software world. In my experience, we really have a sliding scale of software criticality and ease of patching.</p>
<p>At one extreme, we have hosted web applications where the code lives on the provider&#8217;s servers, and can be and is updated all the time. Here, users usually do not even see versions; they just see problems being fixed. Wonderful quick turnaround, and also very easy to just send something out for an eternal beta à la Google&#8230; Development can be very productive as it is very easy to deploy new versions, and customers are quite tolerant of issues as long as you can show that you can fix things rapidly. Often, the code is really throw-away and not intended to live for more than a few months anyway, so you can live with glitches and bugs that are not economically meaningful to patch.</p>
<p>Somewhere in the middle there is PC software that you need to update over the Internet (or from diskettes in the good old days). Here, patching is relatively cheap and easy, and most users happily update their software once a week or once a month, basically as often as you can release it. Some enterprise customers tend to be slow to adopt the latest patches as they want to check that things do not break before deployment.</p>
<p>Then you have embedded software, which is usually much harder to patch in the field. I have only updated my mobile phone a few times in two years. And if you go down to printers, routers, and similar devices, they only very rarely get updated. Here, you do need to be quite careful about testing since the cost of fixing goes up.</p>
<p>The most extreme software is safety-critical life-determining items like radiation machines, car brakes, aircraft engine controls, missile guidance, and similar. The cost of developing such software is on par with hardware, since there is no second chance if a bug manifests itself. Patching is about as hard as replacing faulty chips. Methodologies have to be fairly heavyweight, just like in hardware design.</p>
<p>So software can be just as precisely engineered and costly and slow to develop as hardware.</p>
<p>The real question is how to reduce the drag induced by the tough requirements of hardware development, so that at least some parts of the development can be fast and furious and fun. And on par with modern software development.</p>
<h2>Virtual platforms are where things meet</h2>
<p>In my mind, one of the places for fast and fun development is virtual platforms. A virtual platform is initially a vehicle for exploration, experimentation, and iteration on what a design should be. That is an ideal place for software-like attitudes and modern software development. &#8220;Code quickly, test with software, iterate often&#8221; is exactly what you want to do up-front in a hardware or system design project. That the code is incomplete and not sufficient for synthesis does not matter: you want to get to the essential questions as quickly as possible. Or provide a virtual platform for software developers as soon as possible.</p>
<p>The VP itself is not a hardware design; it is really a software program and can be and should be developed as such. <strong>Modeling is programming</strong>, not hardware design. I do not see the initial VP or the software-development VP as being something that is (necessarily) to be converted into hardware. They are really design specifications and executable data sheets, which at some point are used to create a more detailed design that can actually be turned into hardware.</p>
<p>A model of a piece of hardware is usually something that a <a href="http://www.scdsource.com/article.php?id=166">single programmer with good tools</a> can put together in days, as long as it is <a href="http://www.virtutech.com/whitepapers/modeling.html">kept at a software-timed transaction level</a>. There is no need for heavy processes for these kinds of models, and they offer a chance for hardware designers to have some fun and go off into software land and quickly program things in a more relaxed and richer programming environment. Doing the initial work on a virtual platform is really like web development, where a new version can be generated very often to see what the users think of it. There is no hardware cost or fab cost to dampen enthusiasm&#8230;</p>
<p>Virtual platforms are uniquely interesting in that respect: they are really where software and hardware meet, and can be considered to belong on either side. In my mind, they should be considered to be software, as that is what makes it possible to develop them with the required speed.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakob.engbloms.se/archives/242/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

