Embedded Systems November 2000 Vol13_12


Jack G. Ganssle

Crash and Burn

Analyzing past failures is one of the best ways to prevent them from happening again. Here, Jack distills some lessons from high-profile disasters.

We're not terribly good at learning from our successes; smug with the satisfaction of a job well done, most of us proceed immediately to the next task at hand. It's a shame that we can't look at a successful project and then dig deeply into what happened, how, and when, to suck the educational content of the project dry.

Ah, but failures are indeed a different story. High-profile disasters inevitably produce an investigation, calls for Congress to "do something," and, in the best of circumstances, a change in the way things are built so the accident is not repeated.

Isn't it astonishing that airplane travel is so reliable? That we can zip around the sky at 600 knots, seven miles up, in an ineffably complex device created by flawed people? Perhaps aviation's impressive safety record is a by-product of the way the industry manages failures. Every crash is investigated; each yields new training requirements, new design mods, or other system changes to eliminate or reduce the probability of such a disaster striking again. Though crashes are rare, they do occur, so airliners carry expensive flight data recorders whose sole purpose is to produce post-accident clues for the safety board.

What a shame that we firmware folks don't have a similar attitude. Mostly we're astonished when our systems break or a bug surfaces. I hope that in the future we learn to write code proactively, expecting bugs and problems but finding or trapping them early, and leaving a trail of clues as to what went wrong.

I believe we should examine disasters, our own and others', because so many embedded systems crash in similar ways. I collect embedded disaster stories not from morbid fascination but because I think they offer universal lessons. Here are a few that are instructive.

NEAR

On December 20, 1998, the Near Earth Asteroid Rendezvous spacecraft, after three years en route to 433 Eros, executed a main engine burn intended to place the vehicle in orbit about the asteroid. The planned 15-minute burn aborted almost immediately; firmware put the spacecraft into a safe mode, as planned in case of such a contingency. But then NEAR unexpectedly went silent. Twenty-seven hours later communications resumed, but ground controllers found that two-thirds of the mission's fuel had been dumped.

Controllers spent a few days analyzing data to understand what happened, then initiated a series of burns that will ultimately lead to NEAR's successful rendezvous with the asteroid. But two-thirds of the spacecraft's fuel had been dumped, using all of the mission's reserves. Enough fuel was left, barely, to complete the original goals of the mission. But reduced fuel means things happen more slowly, so NEAR's rendezvous with 433 Eros would happen 13 months later than planned.

Like so many system failures, a series of events, each not terribly critical, led to the fuel dump. Immediately after the engine fired up for the planned 15-minute burn, accelerometers detected a lateral acceleration that exceeded a limit programmed into the firmware. This momentary, under-one-second transient was not out of bounds for the mechanical configuration of the spacecraft. But the propulsion unit is cantilevered from the base of the spacecraft, creating a bending response that, according to the report, "was not appreciated."* Quoting further, "In retrospect, the correct thing for the G&C software to have done would have been to ignore (blank out) the accelerometer readings during the brief transient period."
In other words, though the transient wasn't anticipated, the software was too aggressive in qualifying accelerometer inputs. With the software figuring lateral movement exceeded a pre-programmed limit, it shut the motor down and put the spacecraft into a safe mode. The firmware used thrusters to rotate NEAR to an earth-safe attitude. Code then ran a script designed to change over from thrusters to reaction wheels

Embedded Systems Programming, November 2000
