Intel Blog Post: Fault Injection in the Early Days of Simics

Injecting faults into systems and subjecting them to extreme situations at or beyond their nominal operating conditions is an important part of making sure they keep working even when things go bad.  It was realized very early in the history of Simics (and the same observation had been made by other virtual platform and simulator providers) that using a virtual platform makes it much easier to provide cheap, reliable, and repeatable fault injection for software testing. In an Intel Developer Zone (IDZ) blog post, I describe some early cases of fault injection with Simics.

Intel Blog: The Right Mindset and Toolset for Testing

I have a two-part series (one, two) on testing posted on my Software Evangelist blog on the Intel Developer Zone. This is a long piece where I return to the interesting question of how you test things, and to the fact that testing is not the same as development. I call the posts Mindset and Toolset.

Wind River Blog: Fault Injection using Simics – with Video

I just added a new blog post on the Wind River blog, about how you do fault injection with Simics. This blog post covers the new fault injection framework we added in Simics 5, and the interesting things you can do when you add record and replay capabilities to spontaneous interactive work with Simics. There is also a YouTube demo video of the system in action.

When is Redundancy Cheaper?

I find the subject of fault tolerance and resiliency in computers quite interesting. It is also very interesting to look into what kinds of faults actually do happen in the real world, and what impact they have. I recently found a couple of good sources on this. First of all, a paper from Supercomputing 2012 by Fiala et al., called “Detection and Correction of Silent Data Corruption for Large-Scale High-Performance Computing” (ACM Digital Library). One of its references was to a 2011 talk by Al Geist, “What is the Monster in the Closet”, which provided some more data on how common faults are.

Wind River Blog: An Interview with Andreas Buchwieser about Safety Standards and Simics

There is a new post at my Wind River blog, an interview with Andreas Buchwieser from the Wind River office in München. It discusses how Simics can be applied to the field of safety-critical systems, including helping test the software to get it certified. Really interesting, and in particular it is worth noting that qualifying tools in the IEC 61508 and ISO 26262 context is much easier than in DO-178B/C. The industrial family of safety standards has been created to allow tools to help validate an application without placing incredibly high demands on the development of those tools.

Wind River Blog: Fault Injection with Simics

There is a new post at my Wind River blog, about how you actually do fault injection in Simics. This particular post is pretty detailed, showing the actual architecture of a fault injector in Simics, not just “yes you can do it”. It includes actual diagrams of system components and how you can insert fault injection into an existing system, so it is a bit more technical than most of my Wind River blog posts, which tend to be more conceptual.

Fujitsu Server Fault Injection Robot

Fault injection is a topic that has fascinated me for a long time. Not just the area of software-to-software fault injection, but more so how you inject faults into hardware using hardware (and how to conveniently approximate this using a simulator). I just stumbled on a short, interesting note about such hardware-actuated fault injection in a Fujitsu article.

Pulling the Virtual Ethernet Plug

I just read the panel interview at the start of the latest issue (Number 4, 2008) of ACM Queue. Here, you have Bryan Cantrill of Sun (the man behind DTrace) bemoaning the difficulty of testing faults. In particular:

Part of the reason I’m interested in virtualization is as a development methodology. It has not delivered on this, but one of the things that I ask is can I use virtualization to automate someone pulling the Ethernet cable out of the jack? I can get a lot closer to simulating it if you let me create a toy virtual machine than I can running on the live machine.

Well, this already exists. It is a common feature of any virtual platform that is not a datacenter-oriented runtime engine like VMware, Xen, LPAR, and their ilk. Doing fault injection is a primary use case for virtual platforms, especially for larger servers and systems featuring redundancy and fault tolerance.
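
To make the idea concrete, here is a minimal toy sketch in Python of what "pulling the virtual Ethernet plug" amounts to in a simulated setting. It is not Simics code and does not use any real virtual platform API; the names VirtualLink and run_scenario are hypothetical and exist only for this illustration. The point is simply that the fault is injected at an exact virtual time, so the same run can be repeated as often as needed.

```python
# Toy model only: none of these names come from Simics or any real virtual platform.
from dataclasses import dataclass, field


@dataclass
class VirtualLink:
    """A simulated Ethernet cable between a sender and a receiver."""
    plugged: bool = True
    delivered: list = field(default_factory=list)

    def send(self, frame: str) -> bool:
        # A frame only arrives while the cable is plugged in.
        if self.plugged:
            self.delivered.append(frame)
        return self.plugged


def run_scenario(unplug_at: int, replug_at: int, steps: int) -> list:
    """Send one frame per virtual time step, unplugging the cable for a window."""
    link = VirtualLink()
    lost = []
    for t in range(steps):
        if t == unplug_at:
            link.plugged = False  # inject the fault at an exact virtual time
        if t == replug_at:
            link.plugged = True   # remove the fault again
        if not link.send(f"frame-{t}"):
            lost.append(t)
    return lost


lost_frames = run_scenario(unplug_at=3, replug_at=6, steps=10)
print(lost_frames)  # prints [3, 4, 5]
```

Running the scenario twice with the same arguments loses exactly the same frames, which is the repeatability that is hard to get by physically pulling a cable out of a live machine.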