I have written before about the debug advice to “Quit thinking and look.” It means that you should not form conclusions prematurely. Stop and look at what is going on instead of guessing and cooking up theoretical scenarios. Sound advice that I completely failed to follow in the case that I just chronicled on my Intel Blog: https://software.intel.com/en-us/blogs/2020/03/18/quit-thinking-and-look-chasing-simics-performanceContinue reading “Intel Blog Post: “Quit Thinking and Look” – Mea Culpa Chasing a Performance Bug”
Recently I stumbled on a nice piece called “Always Measure One Level Deeper” by John Ousterhout, from Communications of the ACM, July 2018. https://cacm.acm.org/magazines/2018/7/229031-always-measure-one-level-deeper/fulltext. The article is about performance analysis, and how important it is to not just look at the top-level numbers and easy-to-see aspects of a system, but to also go (at least) one level deeper to measure the components and subsystems that affect the overall system performance.
I just found a story about Undo software that was rather interesting from a strategic perspective. “Patient capital from CIC gives ‘time travelling’ company Undo space to pivot“, from the BusinessWeekly in the UK. The article describes a change from selling to individual developers, towards selling to enterprises. This is an important business change, but it also marks I think a technology thinking shift: from single-session debug to record-replay.
I work with virtual platforms and software simulation technology, and for us most simulation is done on standard servers, PCs, or latptops. Sometimes we connect up an FPGA prototype or emulator box to run some RTL, or maybe a real-world PCIe device, but most of the time a simulator is just another general-purpose computer with no special distinguishing properties. When connecting to the real world, it is simple standard things like Ethernet, serial ports, or USB.
There are other types of simulators in the world however – still based on computers running software, but running it somehow closer to the real world, and with actual physical connections to real hardware beyond basic Ethernet and USB. I saw a couple of nice examples of this at the Embedded World back in February, where full-height racks were basically “simulators”.
I have just published a piece about the Intel Excite project on my Software Evangelist blog at the Intel Developer Zone. The Excite project is using a combination of of symbolic execution, fuzzing, and concrete testing to find vulnerabilities in UEFI code, in particular in SMM. By combining symbolic and concrete techniques plus fuzzing, Excite achieves better performance and effect than using either technique alone.
In the early 1990s, “PC graphics” was almost an oxymoron. If you wanted to do real graphics, you bought a “real machine”, most likely a Silicon Graphics workstation. At the PC price-point, fast hardware-accelerated 3D graphics wasn’t doable… until it suddenly was, thanks to Moore’s law. 3dfx was the first company to create fast 3D graphics for PC gamers. To get off the ground and get funded, 3dfx had to prove that their ideas were workable – and that proof came in the shape of a simulator. They used the simulator to demo their ideas, try out different design points, develop software pre-silicon, and validate the silicon once it arrived. Read the full story on my Intel blog, “How Simulation Started a Billion-Dollar Company”, found at the Intel Developer Zone blogs.
Doing continuous integration and continuous delivery for embedded systems is not necessarily all that easy. You need to get tools in place to support automatic testing, and free yourself from unneeded hardware dependencies. Based on an inspiring talk by Mike Long from Norway, I have a piece on how simulation helps with embedded CI and CD on my Software Evangelist blog on the Intel Developer Zone.
Simics and other simulation solutions are a great way to add more variation to your software testing. I have just documented a nice case of this on my blog at the Intel Developer Zone (IDZ), where the Simics team found a bug in how Xen deals with MPX instructions when using VT-x. Thanks to running on Simics, where scenarios not available in current hardware are easy to set up.
UndoDB is an old player in the reverse debugging market, and have kept at it for ten years. Last year, they released the Live Recorder record-replay function. Most recently, they have showed an integration between the recorder function and Jenkins, where the idea is that you record failing runs in your CI system and replay them on the developer’s machine. Demo video is found on Youtube, see https://www.youtube.com/watch?v=ap8552P5vss.
My first blog post as a software evangelist at Intel was published last week. In it, I tell the story of how our development teams used Simics to test the software behavior (UEFI, in particular) when a server is configured with several terabytes of RAM. Without having said server in physical form – just as a simulation. And running that simulation on a small host with just 256 GB of RAM. I.e., the host RAM is just a small fraction of the target. That’s the kind of things that you can do with Simics – the framework has a lot of smarts in it.
It was rather interesting to realize that just the OS page tables for this kind of system occupies gigabytes of RAM… but that just underscores just how gigantic six terabytes of memory really is.
How important is the documentation (manual, user guide, instruction booklet) for the actual quality and perceived quality of a product? Does it materially affect the user? I was recently confronted by this question is a very direct way. It turned out that the manual for our new car was not quite what you would expect…
This is just the first page, and as you can see if you know Swedish or German or both, it is a strange interleaving of sentences in the two languages.
A long time ago, when I was a PhD student at Uppsala University, I supervised a few Master’s students at the company CC-Systems, in some topics related to the simulation of real-time distributed computer systems for the purpose of software testing. One of the students, Magnus Nilsson, worked on a concept called “Time-Accurate Simulation”, where we annotated the source code of a program with the time it would take to execute (roughly) on the its eventual hardware platform. It was a workable idea at the time that we used for the simulation of distributed CAN systems. So, I was surprised and intrigued when I saw the same idea pop up in a paper written last year – only taken to the next level (or two) and used for detailed hardware design!
Continue reading “Time-Accurate Simulation Revisited – 15 years later”
Thanks to the good folks at Vector Software, I was pointed to a conference recording on Youtube, from the Google Test Automation Conference (GTAC) 2015 (Youtube video). The recording covers quite a few talks, but at around 4 hours 38 minutes, Brian Gogan describes the testing used for the Chromecast product. This offers a very cool insight into how networked consumer systems are being tested at Google. Brian labels the Chromecast as an “Internet of Things” device*, and pitches his talk as being about IoT testing. While I might disagree about his definition of IoT, he is definitely right that the techniques presented are applicable to IoT systems, or at least individual devices.
I have read a few news items and blog posts recently about how various types of software running on top of virtual machines and emulators have managed to either break the emulators or at least detect their presence and self-destruct. This is a fascinating topic, as it touches on the deep principles of computing: just because a piece of software can be Turing-equivalent to a piece of hardware does not mean that software that goes looking for the differences won’t find any or won’t be able to behave differently on a simulator and on the real thing.
There is a new post at my Wind River blog, about how Simics is used to simulate large wireless networks for IoT (Internet-of-Things) applications.
It is funny for me to be back at the IoT game. A decade ago (time flies, doesn’t it?), at Virtutech, I and Johan Runeson took part in an EU research project on exactly this topic. Unfortunately, we had to back out of that project due to economic circumstances and failing management commitment, but we still learnt a few things that were relevant now that we are back in the IoT game. In particular, how to simulate wireless networks in a reasonable way in a transaction-level simulator. Thus, payback for the investment took 10 years to arrive, but it did arrive. To me, that underscores the need to be a bit speculative, take some risk, and try to explore the future.
There is a new post at my Wind River blog, about how you can use Simics to enable the automatic testing of pretty much any computer system (as long as we can put it inside a simulator). This is a natural follow-up to the earlier post about continuous integration with Simics and Simics-Simulink integrations — automated test runs is a mandatory and necessary part of all modern software development.
The September 2013 issue of the Intel Technology Journal (which actually arrived in December) is all about Simics. Daniel Aarno of Intel and I served as the content architects for the issue, which meant that we managed to contributed articles from various sources, and wrote an introductory article about Simics and its usage in general. It has taken a while to get this journal issue out, and now that it is done it feels just great! I am very happy about the quality of all the ten contributed articles, and reading the final versions of them actually taught me some new things you could do with Simics! I already wrote about the issue in a Wind River blog post, so in this my personal blog I want to be a little bit more, well, personal.
There is a new post at my Wind River blog, about how a team of researchers at the University of Nebraska at Lincoln is using Simics to force rare bugs to manifest themselves as errors. They used Simics to control a target system to force it into rare situations much more likely to trigger latent bugs, requiring far fewer test runs compared to just randomly rerunning tests again and again and hoping to see a bug.
There is a new post at my Wind River blog, about how you actually do fault injection in Simics. This particular post is pretty detailed, showing the actual architecture of a fault injector in Simics, not just “yes you can do it”. It includes actual diagrams of system components and how you can insert fault injection into an existing system, so it is a bit more technical than most my Wind River blog posts that tend to be more conceptual.
Fault Injection is a topic that has fascinated me for a long time. Not just the area of software-to-software fault injection, but more so how you inject faults into hardware using hardware (and how to conveniently approximate this using a simulator). I just stumbled on a short interesting note about such hardware-actuated fault injection in a Fujitsu article.
There is a new post at my Wind River blog, an interview with Dan Poirot at RTI who is using Simics to model and test heterogeneous, distributed, networked systems.
There is a new post at my Wind River blog, about how you can use a virtual platform to complete work faster. Not by making the virtual platform execution of target code faster, but by optimizing the way you work and taking advantage of the features of a virtual platform.
There seems to be no shortage of bugs that “should have been obvious” and subject to the “how can you not check that your own products work together” phenomenon. Just the other day, I stumbled on another one. This time, it was the Microsoft set of applications and operating systems that do not quite work together the way you would expect them to.
We recently got ourselves an iPod Touch, to entertain our oldest child on long trips. It is a brilliant device in many ways, I can understand why people love their iPhones (even though I am very happy with the very different style of the Blackberry phone that I was given by my employer). However, I have found one weird behavior in the music player that leaves me wondering how it got through into the shipping product.
Continuing on the thread from my previous post about the testing of products that fail to find problems that become obvious to (some) users after a very short time, I just read an article (in Swedish) about how the famed Tesla roadster cars behaved when they were confronted with Scandinavian winters.
I recently started using a new mobile phone, a Blackberry Bold 9700. I am a bit ambivalent on some of its design features, but it is certainly a very different device from the much more friendly SonyEricsson I had before. Like anybody would do, I have been playing around with it to see what it can do and what not (notable things not working: the “AppWorld” application store is not available in Sweden, YouTube videos do not play in any way that I can figure out).
And almost inevitably, as you play around with a complex modern piece of software (which is what most of the phone is, after all), you find some obvious things which are just plain broken. You wonder, “why didn’t they think of this”, and “how could this ever escape testing?” My current best example is that the built-in web browser does not render the pages from Blackberry’s own support knowledgebase.