Back in 2016, the European Space Agency (ESA) lost the Schiaparelli Mars lander during its descent to the surface of Mars. From a software engineering and testing perspective, the story of why the landing failed (see for example the ESA final analysis, Space News, or the BBC) is instructive. It comes down to how software is written and tested to deal with unexpected inputs in unexpected circumstances. I published a blog post about this right after the event and before the final analysis was available. Thankfully, that post has since been retired from its original location; it was a bit too full of speculation that turned out to be incorrect. So here is a mostly rewritten version of the post, quoting the final analysis and with new insights.
What went wrong?
According to the ESA analysis, the lander was lost because the control system believed it had already landed on the planet while it was still descending. Acting on that belief, it jettisoned the parachute and performed a retro-rocket burn that was far too short. The erroneous decision was caused by a series of “saturated” readings from a rotation sensor that the software misinterpreted. The software had not been designed for or tested with those values. Initially, it was not clear whether there was a fault in the sensor hardware, or whether the physical world had simply gone outside the expected bounds.
Quoting from ESA:
The inquiry into the crash-landing of the ExoMars Schiaparelli module has concluded that conflicting information in the onboard computer caused the descent sequence to end prematurely.
Around three minutes after atmospheric entry the parachute deployed, but the module experienced unexpected high rotation rates. This resulted in a brief ‘saturation’ – where the expected measurement range is exceeded – of the Inertial Measurement Unit, which measures the lander’s rotation rate.
Source: https://www.esa.int/Science_Exploration/Human_and_Robotic_Exploration/Exploration/ExoMars/Schiaparelli_landing_investigation_completed
The saturation resulted in a large attitude estimation error by the guidance, navigation and control system software. The incorrect attitude estimate, when combined with the later radar measurements, resulted in the computer calculating that it was below ground level.
This mistaken altitude estimate made the computer perform a series of incorrect actions, basically behaving as if it was already on the ground or at least very close to it:
This resulted in the early release of the parachute and back-shell, a brief firing of the thrusters for only 3 sec instead of 30 sec, and the activation of the on-ground system as if Schiaparelli had landed. The surface science package returned one housekeeping data packet before the signal was lost.
In reality, the module was in free-fall from an altitude of about 3.7 km, resulting in an estimated impact speed of 540 km/h.
Source: https://www.esa.int/Science_Exploration/Human_and_Robotic_Exploration/Exploration/ExoMars/Schiaparelli_landing_investigation_completed
In short, “some inputs went out of the range that we had tried in testing – and the software failed”. Any software engineer would recognize the phenomenon. If you have not tested a scenario, you almost expect software to fail when encountering it. The only reason we are talking about this particular case is that it is much more visible when a space probe crashes on a distant planet than when some random piece of software fails.
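The chain from a saturated rotation reading to an altitude "below ground level" can be illustrated with a toy single-axis model. This is only a sketch under invented assumptions, not a reconstruction of the flight software: the saturation limit `GYRO_LIMIT_DPS`, the rate profile, and the idea of projecting a radar slant range with `cos(tilt)` are all hypothetical simplifications chosen to show how a clipped input can corrupt a downstream estimate.

```python
import math

GYRO_LIMIT_DPS = 150.0  # hypothetical saturation limit in degrees/second


def read_gyro(true_rate_dps: float) -> tuple[float, bool]:
    """Return (reported rate, saturated flag) for a sensor that clips at its limit."""
    clipped = max(-GYRO_LIMIT_DPS, min(GYRO_LIMIT_DPS, true_rate_dps))
    return clipped, abs(true_rate_dps) >= GYRO_LIMIT_DPS


def integrate_tilt(true_rates_dps: list[float], dt: float) -> tuple[float, float, int]:
    """Integrate true and reported rates.

    Returns (true tilt, estimated tilt, number of saturated samples), tilt in degrees.
    """
    true_tilt = 0.0
    est_tilt = 0.0
    saturated = 0
    for rate in true_rates_dps:
        reported, flag = read_gyro(rate)
        true_tilt += rate * dt
        est_tilt += reported * dt  # the estimator only ever sees the clipped value
        saturated += flag
    return true_tilt, est_tilt, saturated


def altitude_from_radar(slant_range_m: float, tilt_deg: float) -> float:
    """Project a radar slant range onto the vertical using the estimated tilt."""
    return slant_range_m * math.cos(math.radians(tilt_deg))


# Invented profile: a short violent swing one way, then a slow swing back.
# The true tilt returns to zero, but the clipped swing is under-counted.
rates = [400.0] * 50 + [-100.0] * 200
true_tilt, est_tilt, n_saturated = integrate_tilt(rates, dt=0.01)
altitude = altitude_from_radar(3700.0, est_tilt)
print(f"true tilt {true_tilt:.1f} deg, estimated tilt {est_tilt:.1f} deg")
print(f"{n_saturated} saturated samples, computed altitude {altitude:.0f} m")
```

In this toy run the estimated tilt ends up more than 90 degrees away from the truth, so the cosine turns negative and the computed altitude drops below zero even though the radar range itself is perfectly good. A test set that never drives the rate past the limit would not show the problem.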
Dealing with truly unexpected inputs in a sane way is not easy for non-interactive systems. The space probe cannot pop up an error dialog and ask the user for help. Here is another totally unrelated example, which I spotted on my way to work. It is an information monitor at the railway station asking the user to help it find its hard drive and press some Function keys to get into BIOS settings. Obviously totally useless information, and error handling could be improved:
Parallels to security testing
The testing that is needed for embedded systems and in particular for control systems seems similar to what you do in software security (or the other way around, considering that embedded got here first). Finding unexpected and improperly handled “nasty” inputs is a key part of the game. In a network-connected computer system, nasty inputs allow for intrusion and exploits. In a space probe, they cause literal crashes.
The question is how you expand the set of inputs you test with, both within the boundaries you imagined when creating the code and outside the set of “possible values”. It is surprising how often “that cannot happen” turns into “how did that ever work” in programming. Unlike with mechanical systems, you cannot rely on phenomena like inertia or gradual degradation to keep moment-to-moment changes within some range of possibilities. Software is fundamentally brittle, and a very small change in a value can cause a program to spin way out of control.
Simulation as the Test Tool
The way to test this is to inject arbitrary values and subject the system to stresses that would not show up in casual, ad-hoc testing. This applies to both physical and computer systems, and the space industry has a long history of sophisticated testing techniques. They know that the situation on Earth, inside an atmosphere and under gravity, is totally different from what you encounter out in the deep dark. Fundamentally, you have to apply simulation. Not just in software, but also in hardware.
Here is a photo from ESA showing how they test the actual physical lander in a simulated environment:
Image Copyright ESA. Image source: http://www.esa.int/spaceinimages/Images/2013/11/ExoMars_EDM
These kinds of physical test rigs have their virtual counterparts that subject the mission software to simulated scenarios and simulated stresses.
ESA EDL E2E Simulator
ESA used a simulator/digital twin setup for the lander, known as the Entry Descent and Landing simulator (EDL E2E simulator). The full investigation report concludes that this simulator did not capture the full range of physical behaviors that occurred in reality on Mars, despite including a large, sophisticated model of the vehicle dynamics. After the fact, using the telemetry from the lander, the investigators could reproduce in the simulator the behavior observed on Mars. That is a good simulator design!
When testing the mission software, the simulator was run using Monte Carlo techniques in order to provide a range of scenarios. Using randomness like this is key to really good software testing, I think. In security research, fuzzing techniques are used to find the cracks in software. In hardware validation, directed random testing is the dominant methodology, since trying to get good coverage with manually created test cases just does not work.
If you are testing anything non-trivial, going for random-driven generative testing is the way to go. Engineers should use their imagination for some set of tests, models of likely scenarios for another bunch, and then create a random scenario generator/fuzzer to find the things that nobody could actually imagine.
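As a minimal sketch of such a random scenario generator, consider the following. Everything in it is hypothetical: `classify_rate` is an invented piece of code under test with a hidden assumption about its input range, and `random_rate` mixes "expected" values with deliberately wide and special values. It uses only the Python standard library; a real project would more likely reach for a dedicated fuzzing or property-based testing tool.

```python
import math
import random

BUCKETS = ["calm", "moderate", "high", "extreme"]


def classify_rate(rate_dps: float) -> str:
    """Toy code under test: silently assumes |rate| < 200 deg/s."""
    return BUCKETS[int(abs(rate_dps) // 50)]


def random_rate(rng: random.Random) -> float:
    """Draw from the imagined range, a much wider range, or a list of special values."""
    kind = rng.choice(["expected", "wide", "special"])
    if kind == "expected":
        return rng.uniform(-150.0, 150.0)
    if kind == "wide":
        return rng.uniform(-10_000.0, 10_000.0)
    return rng.choice([0.0, -0.0, math.inf, -math.inf, math.nan, 1e308])


def fuzz(fn, generator, runs: int = 1000, seed: int = 0) -> list[tuple[float, str]]:
    """Call fn on generated inputs and collect (input, exception name) for each failure."""
    rng = random.Random(seed)  # fixed seed so every failure can be replayed
    failures = []
    for _ in range(runs):
        value = generator(rng)
        try:
            fn(value)
        except Exception as exc:
            failures.append((value, type(exc).__name__))
    return failures


# The hand-written tests an engineer might imagine all pass...
assert classify_rate(0.0) == "calm"
assert classify_rate(120.0) == "high"
assert classify_rate(-199.0) == "extreme"

# ...while the random generator finds inputs nobody wrote a test for.
failures = fuzz(classify_rate, random_rate)
print(f"{len(failures)} failing inputs, kinds: {sorted({name for _, name in failures})}")
```

The fixed seed is a deliberate design choice: a randomly found failure is only useful if it can be reproduced and turned into a regression test.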
The diagram below is repeated from an Intel blog post I published in 2017, and attempts to capture the types of test that we need to consider:
The point is that there are always going to be unexpected invalid and unexpected unusual states, and our code has to be robust against such things occurring. We also need testing to go beyond what we can imagine, which is where randomness or other test generation techniques are likely crucial.
Note that a non-technical way to expand test coverage is to bring in different perspectives from outsiders, which will tend to push the yellow “imaginable bad things” area out and reduce the red “unimaginable bad things” area.
Missing the crucial input
The ESA investigation concludes that the source of the error was actual physical oscillations that happened on Mars. All sensors worked correctly, and the measurement values that caused the software to fail were correct. However, the physical scenario causing the issue was not captured by the physics models employed in the EDL simulator:
From SIB analysis it can be concluded that oscillation of parachute forces due to parachute area oscillations can explain alone the unexpected high rates just after parachute inflation. The phenomena of parachute area oscillations was not considered in the multi body model used in the E2E simulator.
Based on the analyses performed, the SIB consider that rates saturating the IMU were to be expected for the EDM system and parachute deployment conditions and configuration.
Source: http://exploration.esa.int/science-e/www/object/doc.cfm?fobjectid=59175
That is very honest and upstanding and offers a lesson to other engineers: if something goes wrong, go to the bottom of it to learn for the future. Write it up and remember. Don’t try to hide your failures. In this case, the simulator simply did not capture an aspect of the physical reality that turned out to be crucial. It is one small mistake in a gigantic project with a very ambitious simulator design, implementation, and validation.
Building trust in the model
The ESA report notes that the correctness of the EDL simulator is critical for valid testing. You cannot just throw something together and expect it to work as a representative for the real world. Instead, each simulator module must be painstakingly designed, analyzed, and verified on its own in order to have confidence in the results.
The EDL E2E simulator, is the unique tool used to verify the capability of the EDM to fulfil the key Mission and System requirements in terms of Entry, Descent and Landing performance, before it actually happens on Mars. It consists of a high number of models, which each must be validated with test and analysis, in order to make the E2E simulation valid. The EDL E2E simulator is used to perform Monte Carlo simulations taking into account a certain spread of uncertainties and variations in the driving parameters of each sub-model. The results of these simulations are used to define the design margins. This approach is State of the Art, also used by JPL/NASA. One mistake in any of the sub-models can ruin everything so each sub-model must be thoroughly validated
Source: http://exploration.esa.int/science-e/www/object/doc.cfm?fobjectid=59175
The last sentence is so very true, and so very painful when you know how hard it is to build software without mistakes. Building a simulator is doubly difficult, as you have to build a correct implementation of a correct specification or understanding of the world. An error in either the implementation OR the specification/model design can lead to incorrect results.
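The Monte Carlo approach described in the report, and the way a missing effect in one sub-model skews the resulting margins, can be sketched as follows. The model is entirely invented for illustration: the parameter names, distributions, and the `SENSOR_LIMIT_DPS` value are assumptions and have nothing to do with the real EDL E2E simulator. The same random draws are fed to a model with and without a hypothetical oscillation term, so the only difference between the two runs is the missing physics.

```python
import random

SENSOR_LIMIT_DPS = 150.0  # hypothetical measurement range of the rate sensor


def sample_parameters(rng: random.Random) -> dict[str, float]:
    """Draw one scenario; all distributions here are invented for illustration."""
    return {
        "deploy_rate_dps": rng.gauss(60.0, 15.0),
        "wind_gain": rng.uniform(0.8, 1.4),
        "oscillation_dps": rng.uniform(20.0, 120.0),
    }


def peak_rate(params: dict[str, float], include_oscillation: bool) -> float:
    """Toy sub-model for the peak rotation rate after parachute inflation."""
    rate = abs(params["deploy_rate_dps"]) * params["wind_gain"]
    if include_oscillation:
        rate += params["oscillation_dps"]  # the effect the simpler model leaves out
    return rate


def percentile(values: list[float], q: float) -> float:
    """Simple empirical percentile: the value at fraction q of the sorted samples."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


def run_monte_carlo(include_oscillation: bool, runs: int = 10_000, seed: int = 1):
    """Return (99th percentile peak rate, fraction of runs above the sensor limit)."""
    rng = random.Random(seed)  # same seed means both models see identical scenarios
    peaks = [peak_rate(sample_parameters(rng), include_oscillation) for _ in range(runs)]
    exceed = sum(p > SENSOR_LIMIT_DPS for p in peaks) / runs
    return percentile(peaks, 0.99), exceed


p99_simple, frac_simple = run_monte_carlo(include_oscillation=False)
p99_full, frac_full = run_monte_carlo(include_oscillation=True)
print(f"without oscillation: p99 = {p99_simple:.0f} deg/s, {frac_simple:.1%} above limit")
print(f"with oscillation:    p99 = {p99_full:.0f} deg/s, {frac_full:.1%} above limit")
```

However many runs are made, the simpler model can only report on the scenarios it is able to represent. More samples tighten the statistics of the model; they do not add the physics that was left out.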
Doing things differently?
There are two things in this case that look like they could have been done differently.
One is obviously to have considered more possibilities in the model, which is basically what the ESA report says. The physics models did not capture all physical effects. I guess this goes to the “bring in more perspectives” design method, as I cannot see how you could randomize your way to including more aspects of reality into a model.
The other would be to test the mission software without a physics model at all, subjecting it to inputs generated using some different method. This sounds easy to do, but could be hard to do right. It is one thing to unit test classic computational or functional code like most software developers do. Send in inputs, see if the code manages to handle them, done, at least as long as there is not too much of a build-up of state.
A control system is a feedback system where inputs cause outputs, and the system expects the outputs to have an effect on future inputs. Sending in totally bizarre input values might just be meaningless unless the test system also tries to do something with the outputs of the system. It might work to test the robustness of the code, but it might be meaningless when considering the control laws the code implements, providing little actual test value.
That said, testing code with the physics model but overriding certain inputs to simulate hardware or software failures in the input path does make sense. Whether that would have been possible in the case of the Schiaparelli lander, I have no insight.
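A minimal sketch of that last idea, keeping a physics model in the loop while overriding selected inputs, could look like this. It is a toy one-dimensional descent with invented numbers (starting altitude, braking speed, fault step), and both controllers are hypothetical; the point is only to show the test structure, where the controller output feeds back into the physics and a fault is injected on the input path.

```python
from typing import Callable, Optional

MARS_GRAVITY = 3.71  # m/s^2


class NaiveController:
    """Declares touchdown (and stops braking) on the first non-positive altitude reading."""

    def __init__(self) -> None:
        self.landed = False

    def update(self, measured_altitude_m: float) -> bool:
        if measured_altitude_m <= 0.0:
            self.landed = True  # latches: never re-checked
        return not self.landed  # True means "keep braking"


class GuardedController(NaiveController):
    """Requires several consecutive non-positive readings before declaring touchdown."""

    def __init__(self, required: int = 5) -> None:
        super().__init__()
        self.required = required
        self.count = 0

    def update(self, measured_altitude_m: float) -> bool:
        self.count = self.count + 1 if measured_altitude_m <= 0.0 else 0
        if self.count >= self.required:
            self.landed = True
        return not self.landed


def simulate(controller, override: Optional[Callable[[int, float], float]] = None,
             dt: float = 0.1) -> float:
    """Run the toy descent and return the impact speed in m/s."""
    altitude, speed = 1000.0, 5.0  # invented start state; speed is downward
    step = 0
    while altitude > 0.0:
        measured = altitude
        if override is not None:
            measured = override(step, measured)  # fault injection on the input path
        braking = controller.update(measured)
        if not braking:
            speed += MARS_GRAVITY * dt  # controller output feeds back into the physics
        altitude -= speed * dt
        step += 1
    return speed


def glitch(step: int, measured: float) -> float:
    """Replace one single reading with a 'below ground' value."""
    return -1.0 if step == 10 else measured


nominal = simulate(NaiveController())
naive_faulted = simulate(NaiveController(), override=glitch)
guarded_faulted = simulate(GuardedController(), override=glitch)
print(f"nominal {nominal:.1f} m/s, naive with fault {naive_faulted:.1f} m/s, "
      f"guarded with fault {guarded_faulted:.1f} m/s")
```

Because the physics stays in the loop, the consequence of the bad decision shows up as an impact speed rather than just an odd return value, which is what makes this kind of test more meaningful for control code than feeding in isolated bizarre inputs.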