• About Jakob Engblom and this blog
Observations from Uppsala Computer Simulation, Virtual Platforms, Embedded Programming, Multicore and More (by Jakob Engblom)

Fujitsu Server Fault Injection Robot

2011 December 11 22:53 / Leave a Comment / Jakob

Fault Injection is a topic that has fascinated me for a long time. Not just the area of software-to-software fault injection, but more so how you inject faults into hardware using hardware (and how to conveniently approximate this using a simulator). I just stumbled on a short interesting note about such hardware-actuated fault injection in a Fujitsu article.

The Fujitsu Scientific and Technical Journal is the Fujitsu equivalent of IBM’s Journal of Research and Development. Thankfully, the FSTJ is still free while IBM erected a paywall around their articles. Number 2 of 2011 had the theme of servers, and there is an article about Quality Assurance for  Stable Server Operation by Masafumi Matsuo, Yuji Uchiyama, and Yuichi Kurita.

The article describes the process of ensuring that the final servers that are shipped to customers (from what seems to be the Sparc-based line of Fujitsu computers, even though it might actually apply equally to their mainframes and x86-based servers) are as stable as possible. Apart from designing things right, this also requires testing the fault handling and recovery operations.

I found it noteworthy that they do a lot of configuration testing, where various hardware and software configurations are played off against each other. In this way, corner cases are explored and coverage of the actual configurations that customers will be using becomes more likely (it is always dangerous to only test on one or a few configurations). They push memory system and processor loads to very high levels to ensure continued operation even in extreme cases, and also try to push the actual chips to make sure they will operate reliably in a wide range of environmental conditions. Indeed, a large focus is placed on pure physical reliability, as that is the basis for system reliability.

The best part, however, is on page four of the article, where they show the physical fault injection robot that is applied to the chips mounted on boards. This robot  shorts out individual pins on chips, clamping them to zero volts. It goes over all pins, and the test system checks what happens to the system in each case. Some kind of exhaustive testing going on here.

Neat. I have heard other stories of physical fault injection, including complex mechanisms like passing computer boards through irradiation chambers all the way to brutally simple tasks like putting an axe into a board to break it or pulling boards out of racks to simulate a sudden catastrophic failure. I would like to see more of just how these things are done in the real world. I suspect there are quite a few interesting robotics setups out there that do fault injection.

In any case, the article offered an interesting glimpse of many of the techniques used to make computer systems robust and reliable. Recommended.

It ends by noting that deeply consolidated SoC designs and aggressive dynamic power management are challenging from a testing and observation perspective, as well as creating more single points of failure. If a single system on chip fails, that system is all gone…

 

 

 

 

 

Tweet
Posted in: computer architecture, testing / Tagged: fault injection, fujitsu, Masafumi Matsuo, server, Yuichi Kurita, Yuji Uchiyama

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Post Navigation

← Previous Post
Next Post →

Recent Posts

  • Military Science Fiction – The Books Blur Together
  • Wind River Blog: Starting & Configuring Simics
  • Wind River Blog:
  • Nudge Theory and Graphical User Interfaces
  • Wind River Blog: Collaborating with Recording Checkpoints
  • Wind River Blog: Simics 4.8 is Here
  • A Few Electrons too Many
  • Wind River Blog: Visuality NQ CIFS Server on Simics
  • Everything in the Cloud?
  • Wind River Blog: TCF and Simics
  • Off-Topic: Moving Bad Piggies Save Games
  • Two Cores, Four Cores, Eight Cores – Mobile Variety
  • Bliss: Failing to Pivot for Ideology
  • Wind River Blog and Movie: Demo of Simics Debugging
  • Simulation vs Reality in Schlock Mercenary

Categories

  • appearances (30)
  • articles (21)
  • blogging (10)
  • books (7)
  • business issues (31)
  • computer architecture (35)
  • conferences (34)
  • EDA (50)
    • ESL (35)
  • embedded (78)
    • embedded software (57)
    • embedded systeme (50)
  • general research (6)
  • history (32)
    • general history (7)
    • history of computing (26)
  • off-topic (94)
    • biking (5)
    • board games (1)
    • computer games (3)
    • desktop software (35)
    • food and drink (1)
    • funny (12)
    • gadgets (24)
    • Politics (3)
    • popular culture (5)
    • trains (5)
    • transportation (10)
    • travel (10)
    • websites (3)
  • parallel computing (92)
    • multicore computer architecture (51)
    • multicore debug (22)
    • multicore software (65)
  • programming (109)
  • review (8)
  • security (19)
  • teaching (7)
  • testing (9)
  • uncategorized (12)
  • virtual things (131)
    • computer simulation technology (68)
    • virtual machines (18)
    • virtual platforms (99)
    • virtualization (14)
  • Wind River Blog (43)

Tags

ARM blog commentary Cadence Checkpointing clock-cycle models Communications of the ACM computer architecture conference cycle accuracy debugging Domain-specific languages eclipse embedded freescale G900 heterogeneous homogeneous IBM Intel iPod lego linux mobile phones multicore off-topic office 2007 operating systems p4080 podcast commentary power architecture rant research reverse debugging reverse execution S4D SiCS Multicore days Simics simulation software tools Sun SystemC video virtualization Vista Windows

1

  • F-Secure Blog

Blogs and news

  • Andras Vajda's blog (on multicore)
  • Embedded in Academia (John Regehr)
  • Grant Martin
  • Jack Ganssle
  • My Wind River Blog
  • Security Now podcast
  • Secworks (Joachim Strömbergson)
  • Simon Kågström
  • Synopsys View from the Top
  • Worse Than Failure

Archives

  • June 2013 (3)
  • May 2013 (4)
  • April 2013 (1)
  • March 2013 (4)
  • February 2013 (1)
  • January 2013 (3)
  • December 2012 (2)
  • November 2012 (2)
  • October 2012 (1)
  • September 2012 (6)
  • August 2012 (4)
  • July 2012 (4)
  • June 2012 (3)
  • May 2012 (4)
  • April 2012 (2)
  • March 2012 (3)
  • February 2012 (1)
  • January 2012 (6)
  • December 2011 (2)
  • November 2011 (3)
  • October 2011 (4)
  • September 2011 (5)
  • August 2011 (4)
  • July 2011 (3)
  • June 2011 (4)
  • May 2011 (7)
  • April 2011 (1)
  • March 2011 (3)
  • February 2011 (5)
  • January 2011 (1)
  • December 2010 (4)
  • November 2010 (3)
  • October 2010 (5)
  • September 2010 (5)
  • August 2010 (5)
  • July 2010 (6)
  • June 2010 (5)
  • May 2010 (3)
  • April 2010 (4)
  • March 2010 (3)
  • February 2010 (4)
  • January 2010 (7)
  • December 2009 (6)
  • November 2009 (6)
  • October 2009 (7)
  • September 2009 (6)
  • August 2009 (7)
  • July 2009 (11)
  • June 2009 (5)
  • May 2009 (10)
  • April 2009 (7)
  • March 2009 (8)
  • February 2009 (9)
  • January 2009 (12)
  • December 2008 (8)
  • November 2008 (9)
  • October 2008 (9)
  • September 2008 (10)
  • August 2008 (13)
  • July 2008 (12)
  • June 2008 (8)
  • May 2008 (9)
  • April 2008 (10)
  • March 2008 (7)
  • February 2008 (8)
  • January 2008 (5)
  • December 2007 (5)
  • November 2007 (7)
  • October 2007 (7)
  • September 2007 (12)
  • August 2007 (9)
  • July 2007 (2)
© Copyright 2013 - Observations from Uppsala
Infinity Theme by DesignCoral / WordPress