Embedded Systems November 2000 Vol13_12


BREAK POINTS

...specification."[3] Clearly, a decision that doomed the project to failure. Here's a case where the firmware was, in fact, perfect, if perfection is measured by how well the code meets the spec. Again, "The supplier of the SRI was only following the specification given it."

As with NEAR, Ariane's crash resulted from a series of coupled events rather than any single problem. The exception was largely a result of poor specification. But designers did realize that some variables might go out of range; in fact they specifically wrote code to monitor four of the seven critical variables. Why were three left exposed? An assumption was made that physical limits made it impossible for these three to overflow (an assumption that proved expensively faulty). Further, a target of 80% processor loading meant that checking all calculations would be prohibitively expensive.

But the exception itself didn't cause Ariane's crash. When both SRIs failed, they did so gracefully, and even returned diagnostic data to the vehicle's main computer that indicated the flight data was invalid. But the main computer ignored the diagnostic bit, assumed the data was valid, and used this incorrect information to guide the vehicle. As a result of trying to use bad data, the computer commanded the engine nozzles to hard-over deflection, resulting in the tumbling and destruction of the rocket.

To complicate the picture further, the floating-point operation that overflowed was not even a calculation required for normal flight operations. It was left-over code, a relic of the firmware's Ariane 4 heritage, code that had meaning only before lift-off.
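The two failures described above, an unguarded floating-point conversion and a consumer that ignored the "bad data" bit, can be illustrated with a small C sketch. This is not the flight code (which was written in Ada); the type `sample_t`, the functions `to_int16_checked` and `select_input`, and the 16-bit target width are assumptions made for illustration. Holding the last good value is likewise only one possible fallback, chosen here to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample passed from a sensor unit to the main computer. */
typedef struct {
    int16_t value;  /* converted reading; meaningful only when valid is true */
    bool    valid;  /* false plays the role of the "ignore this data" bit */
} sample_t;

/* Convert a floating-point reading to 16 bits, flagging the result as
 * invalid instead of trapping when the value cannot be represented. */
sample_t to_int16_checked(double reading)
{
    sample_t out = { 0, false };

    /* NaN fails both comparisons, so it is rejected along with
     * out-of-range values. */
    if (reading >= (double)INT16_MIN && reading <= (double)INT16_MAX) {
        out.value = (int16_t)reading;  /* in range, so the cast is well defined */
        out.valid = true;
    }
    return out;
}

/* Consumer side: use the new sample only if the sender marked it valid;
 * otherwise fall back to the last known good value (an assumed policy). */
int16_t select_input(const sample_t *sample, int16_t last_good)
{
    assert(sample != NULL);
    return sample->valid ? sample->value : last_good;
}
```

The range check costs two comparisons per conversion, which is the kind of overhead the 80% loading target discouraged. The check on the consumer side is what was missing in the main computer: the producer reported the fault correctly, and the report was not acted on.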
The review board also noted that, though testing of the SRI is hard, it's quite possible and (gasp!) maybe even a good idea: "Had such a test been performed by the supplier or as part of the acceptance test the failure mechanism would have been exposed."

To summarize: poorly tested code that should not have been running caused a floating-point conversion error because the spec didn't call for an understanding of real flight dynamics. In an effort to keep processor loading low, the variables involved weren't monitored, though others were. Two redundant SRIs running the same code performed identically and shut down. The main computer ignored the SRI "bad data" bit and tried to fly using corrupt information.

Another interesting tidbit from the report: "... the view had been taken that software should be considered correct until it is shown to be at fault." This is the rationale behind using identical code on redundant SRIs. It does beg the question of why sufficient testing to isolate those potential software faults was not performed.

Conclusion

My embedded disaster collection grows daily. I expect, as embedded systems become ever more pervasive, that no end is in sight to the firmware crisis we'll all experience.

Several common threads run through many of these stories. The first is that of error handling. Look at Ariane: when the software failed, it properly set a diagnostic bit that meant "ignore this data." Yet the main CPU blithely carried on, ignoring the error bit instead.

Inadequate testing, too, appears repeatedly as a theme in disasters. The NEAR team had simulators and prototypes, but these test platforms worked poorly. Their fidelity was suspect, leaving the engineers to wonder, when problems surfaced, if the element at fault was the simulator or the code. Ariane, too, had poor simulators and thus only partially tested software. On Clementine it appears that some code was not tested at all.

Interprocessor communications is a constant source of trouble.
Though I'm a great believer in using multiple CPUs to reduce software complexity and workload, problems result when too much communication is required. NEAR's computers ran into race conditions. Ariane's error bit was disregarded.

Those who don't learn from the past are sentenced to repeat it.

Jack G. Ganssle is a lecturer and consultant on embedded development issues. He conducts seminars on embedded systems and helps companies with their embedded challenges. He founded two companies specializing in embedded systems. Contact him at jack@ganssle.com.

References

1. "The NEAR Rendezvous Burn Anomaly of December 1998." Available at near.jhuapl.edu/anom/index.html.
2. "Lessons Learned from the Clementine Mission," NASA/CR report 97-207442.
3. "Ariane 5, Flight 501 Failure," Report by the Inquiry Board. Available at rk.! richcontent!Reports / Failure_reportsl Ariane501. htm.

EMBEDDED SYSTEMS PROGRAMMING (ISSN 1040-3272) is published monthly, with an additional issue published in September, by CMP Media, 525 Market St., Ste. 500, San Francisco, CA 94105, (415) 905-2200. Please direct advertising and editorial inquiries to this address. SUBSCRIPTION RATE for the United States is $55 for 13 issues. Canadian/Mexican orders must be accompanied by payment in U.S. funds with additional postage of $6 per year. All other foreign subscriptions must be prepaid in U.S. funds with additional postage of $15 per year for surface mail and $40 per year for airmail. POSTMASTER: All subscription orders, inquiries, and address changes should be sent to EMBEDDED SYSTEMS PROGRAMMING, P.O. Box 3404, Northbrook, IL 60065-9468. For customer service, telephone toll-free (877) 676-9745. Please allow four to six weeks for change of address to take effect. Periodicals postage is paid at San Francisco, CA and additional mailing offices. Ride-along enclosed in versions 4, 5, 7, and 8. EMBEDDED SYSTEMS PROGRAMMING is a registered trademark owned by the parent company, CMP Media.
All material published in EMBEDDED SYSTEMS PROGRAMMING is copyright © 1999 by CMP Media. All right
