In the early 1990s, “PC graphics” was almost an oxymoron. If you wanted to do real graphics, you bought a “real machine”, most likely a Silicon Graphics workstation. At the PC price-point, fast hardware-accelerated 3D graphics wasn’t doable… until it suddenly was, thanks to Moore’s law. 3dfx was the first company to create fast 3D graphics for PC gamers. To get off the ground and get funded, 3dfx had to prove that their ideas were workable – and that proof came in the shape of a simulator. They used the simulator to demo their ideas, try out different design points, develop software pre-silicon, and validate the silicon once it arrived. Read the full story on my Intel blog, “How Simulation Started a Billion-Dollar Company”, found at the Intel Developer Zone blogs.
A new entry just showed up in the world of reverse debugging – Simulics, from German company Simulics. It does seem like the company and the tool share the same name. Simulics is a rather rare breed, the full-system-simulation-based reverse debugger. We have actually only seen a few of these in history, with Simics being the primary example. Most reverse debuggers apply to user-level code and use various forms of OS call intercepts to create a reproducible run. Since the Simulics company clearly comes from the deeply embedded systems field, it makes sense to take the full-system approach since that makes it possible to debug code such as interrupt handlers.
I have also updated my history of commercial reverse debuggers to include Simulics.
Intel CoFluent Technology is a simulation and modeling tool that can be used for a wide variety of different systems and different levels of scale – from the micro-architecture of a hardware accelerator, all the way up to clustered networked big data systems. On the Intel Evangelist blog on the Intel Developer Zone, I have a write-up on how CoFluent is being used to model just that: Big Data systems. I found the topic rather fascinating, how you can actually make good predictions for systems at that scale – without delving into details. At some point, I guess systems become big enough that you can start to make accurate predictions thanks to how things kind of smooth out when they become large enough.
Simics and other simulation solutions are a great way to add more variation to your software testing. I have just documented a nice case of this on my blog at the Intel Developer Zone (IDZ), where the Simics team found a bug in how Xen deals with MPX instructions when using VT-x. The bug was found thanks to running on Simics, where scenarios not available in current hardware are easy to set up.
UndoDB is an old player in the reverse debugging market, and has kept at it for ten years. Last year, they released the Live Recorder record-replay function. Most recently, they have shown an integration between the recorder function and Jenkins, where the idea is that you record failing runs in your CI system and replay them on the developer’s machine. A demo video is found on YouTube, see https://www.youtube.com/watch?v=ap8552P5vss.
Last year (2015), a paper called “Don’t Panic: Reverse Debugging of Kernel Drivers” was presented at the ESEC/FSE (European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering) conference. The paper was written by Pavel Dovgalyuk, Denis Dmitriev, and Vladimir Makarov from the Russian Academy of Sciences. It describes a rather interesting approach to Linux kernel device driver debugging, using a deterministic variant of QEMU along with record/replay of hardware interactions. I think this is the first published instance of using reverse debugging in a simulator together with real hardware.
My first blog post as a software evangelist at Intel was published last week. In it, I tell the story of how our development teams used Simics to test the software behavior (UEFI, in particular) when a server is configured with several terabytes of RAM. Without having said server in physical form – just as a simulation. And running that simulation on a small host with just 256 GB of RAM. I.e., the host RAM is just a small fraction of the target. That’s the kind of thing that you can do with Simics – the framework has a lot of smarts in it.
It was rather interesting to realize that just the OS page tables for this kind of system occupy gigabytes of RAM… but that only underscores just how gigantic six terabytes of memory really is.
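A rough back-of-the-envelope calculation shows where those gigabytes come from. The sketch below assumes x86-64 style paging with 4 KiB pages, 8-byte page table entries, and 512 entries per table, and that all six terabytes are mapped exactly once. None of these parameters come from the original post, and a real OS may use large pages or map memory differently, so treat the result as an illustration rather than a measurement.

```python
# Back-of-the-envelope page table size for a machine with 6 TiB of RAM.
# Assumptions (not from the original post): 4 KiB pages, 8-byte entries,
# 512 entries per table (x86-64 style paging), all RAM mapped exactly once.

TIB = 2**40
GIB = 2**30

PAGE_SIZE = 4096
ENTRY_SIZE = 8
ENTRIES_PER_TABLE = 512
TABLE_SIZE = ENTRY_SIZE * ENTRIES_PER_TABLE  # one table fills one 4 KiB page


def page_table_bytes(ram_bytes, levels=4):
    """Return the total bytes of page tables needed to map ram_bytes once."""
    total = 0
    entries = ram_bytes // PAGE_SIZE  # one leaf entry per 4 KiB page
    for _ in range(levels):
        # Round up to a whole number of tables at this level.
        tables = -(-entries // ENTRIES_PER_TABLE)
        total += tables * TABLE_SIZE
        entries = tables  # the next level up needs one entry per table
    return total


size = page_table_bytes(6 * TIB)
print(f"{size / GIB:.2f} GiB of page tables")
```

Under these assumptions the leaf level alone comes to 12 GiB, and the upper levels add only a few tens of megabytes on top of that.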
How important is the documentation (manual, user guide, instruction booklet) for the actual quality and perceived quality of a product? Does it materially affect the user? I was recently confronted by this question in a very direct way. It turned out that the manual for our new car was not quite what you would expect…
This is just the first page, and as you can see if you know Swedish or German or both, it is a strange interleaving of sentences in the two languages.
A comment on my old blog post about the history of reverse execution gave me a pointer to a fairly early example of replay debugging. The comment pointed at a 2002 blog post which in turn pointed at a 1999 LWN.net text which almost in passing describes a seemingly working record-replay debugger from 1995. The author was a Michael Elizabeth Chastain, of whom I have not managed to find any later traces.
I love bug and debug stories in general. Bugs are a fun and interesting part of software engineering, programming, and systems development. Stories that involve running Simics on Simics to find bugs are a particular category that is fascinating, as it shows how to apply serious software technology to solve problems related to said serious software technology. On the Intel Software and Services blog, I just posted a story about just that: debugging a Linux kernel bug provoked by Simics, by running Simics on a small network of machines inside of Simics. See https://blogs.intel.com/evangelists/2016/05/30/finding-kernel-1-2-3-bug-running-wind-river-simics-simics/ for the full story.
A new record, replay, and reverse debugger has appeared, and I just had to take a look at what they do and how they do it. “rr” has been developed by the Firefox developers at Mozilla Corporation, initially for the purpose of debugging Firefox itself. Approaching a debugger from the angle of attacking a particular program does let you get things going quickly, but the resulting tool is clearly generally useful, at least for Linux user-land programs on x86. Since I have tried to keep up with the developments in this field, a write-up seems to be called for.
A long time ago, when I was a PhD student at Uppsala University, I supervised a few Master’s students at the company CC-Systems, in some topics related to the simulation of real-time distributed computer systems for the purpose of software testing. One of the students, Magnus Nilsson, worked on a concept called “Time-Accurate Simulation”, where we annotated the source code of a program with the time it would take to execute (roughly) on its eventual hardware platform. It was a workable idea at the time that we used for the simulation of distributed CAN systems. So, I was surprised and intrigued when I saw the same idea pop up in a paper written last year – only taken to the next level (or two) and used for detailed hardware design!
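To make the annotation idea concrete, here is a minimal sketch of what source-level time annotation can look like. It is not the code from the original project: the `SimClock` class, the `checksum` function, the 16 MHz clock rate, and all cycle counts are hypothetical and chosen only for illustration.

```python
# Sketch of source-level time annotation: each block of application code
# reports its estimated cost on the target, and a virtual clock adds it up.
# All names and cycle counts are hypothetical, for illustration only.


class SimClock:
    """Virtual target time, advanced only by annotations in the code."""

    def __init__(self, hz):
        self.hz = hz
        self.cycles = 0

    def consume(self, cycles):
        self.cycles += cycles

    @property
    def seconds(self):
        return self.cycles / self.hz


def checksum(data, clock):
    clock.consume(20)       # annotation: function entry and setup
    total = 0
    for byte in data:
        total = (total + byte) & 0xFF
        clock.consume(5)    # annotation: one loop iteration
    clock.consume(10)       # annotation: function return
    return total


clock = SimClock(hz=16_000_000)  # assumed 16 MHz target CPU
result = checksum(bytes(range(100)), clock)
print(result, clock.cycles, clock.seconds)
```

The code runs at full speed on the host, while the virtual clock tracks how long the same work would roughly take on the target, which is what lets a simulation of several communicating nodes keep their relative timing plausible.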
Intel is a big Simics user, but most of the time, Intel’s internal use of Simics stays internal. However, we recently had the chance to interview Karthik Kumar and Thomas Willhalm of Intel about how they used Simics to interact with external companies and improve Intel hardware designs. The interview is found on the Wind River blog network.
It is also my last blog post written at Wind River; I have been working at Intel since January 18. I am working on ways to keep publishing texts about Simics and simulation, but the details are not yet clear.
I just posted a short blog post on the Wind River blog, introducing a video demo of the Web API to Wind River Helix Lab Cloud. In the post and video, I show how the Lab Cloud Web API works. For someone familiar with REST-style APIs, this is probably baby-level, but for me and probably most of our user base, it is something new and a rather interesting style for an API. Thus, doing a video that shows the first few steps of authentication and getting things going seems like a good idea.
I have read some recent IBM articles about the POWER8 processor and its hardware debug and trace facilities. They are very impressive, and quite interesting to compare to what is usually found in the embedded world. Instead of being designed to help with software debug, it seems the hardware mechanisms in the POWER8 are rather focused on silicon bringup and performance analysis and verification in IBM’s own labs, as well as on supporting virtual machines and JIT-based systems!
I have a long-standing interest in debugging in general and reverse debugging in particular and the related idea of record-replay debug (see a series of blog posts I did a few years ago on the topic: history 1, history 2, history 3, S4D report, updates, Simics reverse execution, and then Lab Cloud record/replay). Recently, I found out that Undo Software, one of the pioneers in the field, had released a product called “Live Recorder”. So I went to check it out by reading their materials and comparing it to what we have seen before.
The recent news that a hacked version of Apple Xcode has been used to insert bad code into quite a few programs for Apple iOS was both a bad surprise and an example of something that has been hypothesized for a very long time. For the news, I recommend the coverage on Ars Technica of the XcodeGhost issue. It is very interesting to see this actually being pulled off for real – I recall seeing this discussed as a scenario back in the 1990s, going back to Ken Thompson’s 1983 ACM Turing Award lecture.
I just added a new blog post on the Wind River blog, about how you do fault injection with Simics. This blog post covers the new fault injection framework we added in Simics 5, and the interesting things you can do when you add record and replay capabilities to spontaneous interactive work with Simics. There is also a YouTube demo video of the system in action.
I have read a few news items and blog posts recently about how various types of software running on top of virtual machines and emulators have managed to either break the emulators or at least detect their presence and self-destruct. This is a fascinating topic, as it touches on the deep principles of computing: just because a piece of software can be Turing-equivalent to a piece of hardware does not mean that software that goes looking for the differences won’t find any or won’t be able to behave differently on a simulator and on the real thing.
The Security Now Podcast number 497 dealt with the topic of Vehicle Hacking. It was fairly interesting, if a bit too light on the really interesting thing, which is what actually went on in the vehicle hack that was apparently demonstrated on US national television at some point earlier this year (I guess this CBS News transcript fits the description). It was still good to hear the guys from the Galois consulting firm (Lee Pike and Pat Hickey) talking about what they did. It was sobering to realize just how little even a smart guy like Steve Gibson really knows about embedded systems and the reality of their programming. Embedded software really is pretty invisible in both a good way and a bad way.
I have been thinking about the role and prestige of testing for the past several years. Many things I have read and things companies have done indicate that “testing” is something that is considered a bit passé and old-school. Testers are dead weight that get in the way of releases, and they are unproductive barnacles that slow development down. Testers can all be replaced by automatic testing put in place by brilliant developers. The creative developer types are the guys with the status anyway. I might be exaggerating, but there is an issue here. I think we need to acknowledge that testers are a critical part of the software quality puzzle, and that testing is not just something developers can do with one hand tied behind their back.
In a dusty bookshelf at work I found an ancient tome of wisdom, long abandoned by its previous owner. I was pointed to it by a fellow explorer of the dark arts of computer system design as something that you really should read. The book was “Fortress Rochester”, written by Frank Soltis, and published in 2001.
Last year, I concluded a programming project at work that clearly demonstrated that real programming tasks tend to involve multiple languages. I once made a remark to a journalist that there is a zoo of languages inside all real products, and my little project provided a very clear example of this. The project, as discussed previously, was to build an automated integration between a simple Simics target system and the Simulink processor-in-the-loop code testing system. In the course of this project, I used six or seven languages (depending on how you count), three C compilers, and three tools. Eight different compilers were involved in total.
I am going to be speaking at the 2015 Embedded World Conference in Nürnberg, Germany. My talk is about Continuous Integration for embedded systems, and in particular how to enable it using simulation technology such as Simics.
My talk is at 16.00 to 16.30, in session 03/II, Software Quality I – Design & Verification Methods.
There is a new post at my Wind River blog, about how you can use Simics to enable the automatic testing of pretty much any computer system (as long as we can put it inside a simulator). This is a natural follow-up to the earlier post about continuous integration with Simics and Simics-Simulink integrations – automated test runs are a necessary part of all modern software development.
I just found and read an old text in the computer systems field, “Why Do Computers Fail and What Can Be Done About It?”, written by Jim Gray at Tandem Computers in 1985. It is a really nice overview of the issues that Tandem had encountered in their customer base, back in the early 1980s. The report is really a classic in the computer systems field, but I did not read it until now. Tandem was an early manufacturer of explicitly fault tolerant and highly reliable and available computers. In this technical report Jim Gray describes the basic principles of fault tolerance, and what kinds of faults happen in the field and need to be tolerated.
At the ISCA 2014 conference (the biggest event in computer architecture research), a group of researchers from Microsoft Research presented a paper on their Catapult system. The full title of the paper is “A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services”, and it is about using FPGAs to accelerate search engine queries at datacenter scale. It has 23 authors, which is probably the most I have ever seen on an interesting paper. There are many things to be learnt from and discussed about this paper, and here are my thoughts on it.
The Mill is a new general-purpose high-performance processor design from Out-of-the-Box Computing (http://ootbcomp.com/). They claim to beat typical high-end out-of-order (OOO) designs like the Intel Haswell generation by crazy factors, such as being 2.3x faster while using 2.3x less power compared to a Haswell. All the while costing less. Ignoring the cost aspect, the power and performance numbers are truly impressive – especially for general code. How can they do something so much better than what we have today? For general code? That requires some serious innovation. With that perspective, I ask myself where the Mill is really significantly different from what we have seen before.
I recently made my first acquaintance with Windows 8, having bought a new Sony ultrabook for the family. Including a touch screen. The combination of the touch-based interface and the phone-like look of Windows 8 even on a PC has led me to think about the (unconscious) expectations that I have come to have on how systems behave and how services are accessed, from how smart phones and tablets have come to work in the past few years. In particular, where are web-based services going?
Apple just released their new iPhone 5s, where the biggest news is really the 64-bit processor core inside the new A7 SoC. Sixty-four bits in a phone is a first, and it immediately raises the old question of just what 64 bits gives you. We saw this when AMD launched the Opteron and 64-bit x86 PC computing back in the early 2000s, and in a less public market the same question was asked as 64-bit MIPS took huge chunks out of the networking processor market in the mid-2000s. It was never questioned in servers, however.