I presented a tutorial about the “verification of virtual platform models” at DVCon Europe last week. The tutorial was prepared by me and Ola Dahl at Ericsson, but Ola unfortunately could not attend and present his part – so I had to learn his slides and style and do my best to be an Ola stand-in (tall order, we really missed you there Ola!). The title maybe did not entirely describe the contents – it was more of a discussion about how to think about correctness, and in particular about specifications vs implementations. The best part was the animated discussion that we got going in the room, including some new insights from the audience that really added to the presented content.
Updated: Included an important point on software correctness that I forgot in the first publication.
Back in 2016, the European Space Agency (ESA) lost the Schiaparelli Mars lander during its descent to the surface of Mars. From a software engineering and testing perspective, the story of why the landing failed (see for example the ESA final analysis, Space News, or the BBC) is instructive. It comes down to how software is written and tested to deal with unexpected inputs in unexpected circumstances. I published a blog post about this right after the event, before the final analysis was available. Thankfully, that post has since been retired from its original location, as it was a bit too full of speculation that turned out to be incorrect… So here is a mostly rewritten version of the post, quoting the final analysis and adding new insights.
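To make the point about unexpected inputs a bit more concrete, here is a minimal Python sketch of the kind of plausibility check a descent program could apply to a derived value. This is a hypothetical illustration with made-up names and thresholds, not the actual GNC code:

```python
# Hypothetical illustration, not flight software: sanity-check a derived
# altitude instead of acting on it blindly. Schiaparelli's control logic
# famously computed a negative altitude and carried on as if it were valid.

LAST_GOOD_ALTITUDE_M = 3500.0  # made-up fallback value for the example

def plausible(altitude_m: float, phase_ceiling_m: float = 11_000.0) -> bool:
    # During parachute descent the lander cannot be below the surface,
    # nor far above where the descent phase started.
    return 0.0 <= altitude_m <= phase_ceiling_m

def filtered_altitude(estimate_m: float) -> float:
    # An implausible estimate signals upstream trouble (for example a
    # saturated IMU), so fall back to the last trusted value instead of
    # feeding garbage to the thruster-control logic.
    return estimate_m if plausible(estimate_m) else LAST_GOOD_ALTITUDE_M

print(filtered_altitude(-2000.0))  # -> 3500.0, not a premature "landed"
```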
I have recently gotten back to developing training labs for the Simics simulator (and related technologies). While developing a new accelerator model using as many of the latest frameworks and APIs as possible, it was basically guaranteed that I would hit some bugs and unexpected behaviors. That is a natural part of, and benefit of, creating training materials in the first place. It also provides a good illustration of two fundamentally different ways to look at software development. One is to play it safe and get things done in known ways; the other is to charge ahead, try the unknown, and see what happens. Damn the torpedoes, bugs are a benefit. No bug reports, no glory. In this post, I will share some recent examples of just coding ahead and breaking things.
Recently I stumbled on a nice piece called “Always Measure One Level Deeper” by John Ousterhout, from Communications of the ACM, July 2018 (https://cacm.acm.org/magazines/2018/7/229031-always-measure-one-level-deeper/fulltext). The article is about performance analysis, and how important it is to not just look at the top-level numbers and easy-to-see aspects of a system, but to also go (at least) one level deeper and measure the components and subsystems that determine the overall system performance.
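As a minimal sketch of the idea in Python: measure the top-level operation, but also each stage inside it. The stage functions and their sleep-based costs are stand-ins invented for the example:

```python
import time

def timed(label, fn, *args):
    # Time one component and report it, so the top-level number can be
    # explained rather than just observed.
    start = time.perf_counter()
    result = fn(*args)
    print(f"{label}: {(time.perf_counter() - start) * 1000:.1f} ms")
    return result

# Stand-ins for real subsystems; the sleeps model their cost.
def parse(payload): time.sleep(0.005); return payload
def query(data):    time.sleep(0.100); return [data]
def render(rows):   time.sleep(0.015); return str(rows)

def handle_request(payload):
    # Measuring each stage one level deeper tells you *why* the
    # top-level request time is what it is (here: the query dominates).
    data = timed("  parse", parse, payload)
    rows = timed("  query", query, data)
    return timed("  render", render, rows)

timed("total request", handle_request, "example")
```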
I just found a story about Undo that was rather interesting from a strategic perspective: “Patient capital from CIC gives ‘time travelling’ company Undo space to pivot”, from Business Weekly in the UK. The article describes a change from selling to individual developers towards selling to enterprises. This is an important business change, but I think it also marks a shift in technology thinking: from single-session debugging to record-replay.
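To make that distinction concrete, here is a toy Python sketch of the record-replay idea: capture every nondeterministic input once, then feed back the identical sequence to reproduce a failing run. Real record-replay debuggers capture far more (system calls, signals, thread scheduling); this only shows the concept:

```python
import json, random

class Recorder:
    """Toy record-replay: log every nondeterministic input during a
    'record' run, then return the identical sequence in a 'replay' run
    so a failing execution can be reproduced exactly."""
    def __init__(self, mode, log_path="session.json"):
        self.mode, self.log_path = mode, log_path
        if mode == "record":
            self.log = []
        else:
            with open(log_path) as f:
                self.log = json.load(f)

    def capture(self, source, produce):
        if self.mode == "record":
            value = produce()             # really ask the outside world
            self.log.append([source, value])
        else:
            src, value = self.log.pop(0)  # replay: same value, same order
            assert src == source
        return value

    def save(self):
        if self.mode == "record":
            with open(self.log_path, "w") as f:
                json.dump(self.log, f)

rec = Recorder("record")
print(rec.capture("random", lambda: random.random()))
rec.save()

rep = Recorder("replay")
print(rep.capture("random", lambda: random.random()))  # same number again
```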
I work with virtual platforms and software simulation technology, and for us most simulation is done on standard servers, PCs, or laptops. Sometimes we connect up an FPGA prototype or emulator box to run some RTL, or maybe a real-world PCIe device, but most of the time a simulator is just another general-purpose computer with no special distinguishing properties. When connecting to the real world, it is through simple standard things like Ethernet, serial ports, or USB.
There are other types of simulators in the world, however – still based on computers running software, but running it somehow closer to the real world, and with actual physical connections to real hardware beyond basic Ethernet and USB. I saw a couple of nice examples of this at Embedded World back in February, where full-height racks were basically “simulators”.
I have just published a piece about the Intel Excite project on my Software Evangelist blog at the Intel Developer Zone. The Excite project uses a combination of symbolic execution, fuzzing, and concrete testing to find vulnerabilities in UEFI code, in particular in SMM. By combining symbolic and concrete techniques with fuzzing, Excite achieves better performance and better results than any one technique alone.
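As a sketch of just the fuzzing leg of such a combination, here is a minimal mutation fuzzer in Python; the toy handler and its single-byte bug are invented for the example:

```python
import random

def mutate(seed: bytes) -> bytes:
    # Simplest possible mutation: XOR one random byte of a seed input.
    data = bytearray(seed)
    data[random.randrange(len(data))] ^= random.randrange(1, 256)
    return bytes(data)

def fuzz(target, seeds, rounds=10_000):
    """Minimal mutation fuzzer: run the target on mutated concrete
    inputs and collect any that make it crash."""
    crashes = []
    for _ in range(rounds):
        candidate = mutate(random.choice(seeds))
        try:
            target(candidate)
        except Exception as exc:
            crashes.append((candidate, exc))
    return crashes

# Toy stand-in for code under test, with a bug on one byte value.
def handler(buf: bytes):
    if 0x5A in buf:
        raise RuntimeError("state corruption")

print(f"found {len(fuzz(handler, [bytes(8)]))} crashing inputs")
```

Random mutation finds this shallow bug quickly, but it would stall on something like a four-byte magic-number comparison; solving for exactly such inputs is where the symbolic-execution side of a combined approach pays off.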
This blog post was originally published at Intel. After it was retired from the Intel blog system, I reposted the full contents here in order to preserve the information for my own sake – and in case anyone still has a pointer here.
How Simulation Started a Billion-Dollar Company
In the early 1990s, “PC graphics” was almost an oxymoron. If you wanted to do real graphics, you bought a “real machine”, most likely a Silicon Graphics workstation. At the PC price-point, fast hardware-accelerated 3D graphics wasn’t doable… until it suddenly was, thanks to Moore’s law. 3dfx was the first company to create fast 3D graphics for PC gamers. To get off the ground and get funded, 3dfx had to prove that their ideas were workable – and that proof came in the shape of a simulator. They used the simulator to demo their ideas, try out different design points, develop software pre-silicon, and validate the silicon once it arrived.
Simics and other simulation solutions are a great way to add more variation to your software testing. I have just documented a nice case of this on my blog at the Intel Developer Zone (IDZ), where the Simics team found a bug in how Xen deals with MPX instructions when using VT-x. The bug was found thanks to running on Simics, where scenarios not available in current hardware are easy to set up.
My first blog post as a software evangelist at Intel was published last week. In it, I tell the story of how our development teams used Simics to test software behavior (UEFI, in particular) when a server is configured with several terabytes of RAM. Without having said server in physical form – just as a simulation. And running that simulation on a small host with just 256 GB of RAM; that is, the host RAM is just a small fraction of the target RAM. That's the kind of thing you can do with Simics – the framework has a lot of smarts in it.
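One of those smarts is that target memory can be allocated lazily, page by page, as the target actually touches it. Here is a toy Python sketch of that general technique (my illustration, not Simics' actual implementation):

```python
class SparseMemory:
    """Toy model of lazy, page-granular target memory: host memory is
    only spent on pages the target software actually writes, and
    untouched pages read as zero."""
    PAGE = 4096

    def __init__(self, size):
        self.size = size
        self.pages = {}  # page number -> bytearray, created on first write

    def write(self, addr, data):
        for i, byte in enumerate(data):
            a = addr + i
            page = self.pages.setdefault(a // self.PAGE,
                                         bytearray(self.PAGE))
            page[a % self.PAGE] = byte

    def read(self, addr, length):
        zero = b"\0" * self.PAGE
        return bytes(self.pages.get((addr + i) // self.PAGE, zero)
                     [(addr + i) % self.PAGE]
                     for i in range(length))

# Six terabytes of target RAM; the host only pays for touched pages.
mem = SparseMemory(6 * 2**40)
mem.write(5 * 2**40, b"hello")
print(mem.read(5 * 2**40, 5))  # -> b'hello'
```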
It was rather interesting to realize that just the OS page tables for this kind of system occupy gigabytes of RAM… but that underscores just how gigantic six terabytes of memory really is.
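A quick back-of-the-envelope check, assuming the OS maps everything with x86-64 4 KiB pages and 8-byte page-table entries (the upper levels of the four-level tables add less than a percent on top):

```python
target_ram = 6 * 2**40            # 6 TiB of target memory
pages = target_ram // 4096        # ~1.6 billion 4 KiB pages
leaf_pte_bytes = pages * 8        # one 8-byte leaf entry per mapped page
print(leaf_pte_bytes / 2**30)     # -> 12.0 GiB of last-level page tables
```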
How important is the documentation (manual, user guide, instruction booklet) for the actual quality and perceived quality of a product? Does it materially affect the user? I was recently confronted by this question in a very direct way. It turned out that the manual for our new car was not quite what you would expect…
This is just the first page, and as you can see if you know Swedish or German or both, it is a strange interleaving of sentences in the two languages.