Embedded Systems November 2000 Vol13_12

• It is possible to create an operational profile to test against and assess reliability. Again, there is a great deal that can be predicted, but one can never completely and accurately predict the actual operational profile before the fact.

Even if we are wary of these dangerous assumptions, we still have to recognize the limitations inherent in testing as a means of bringing quality to a system. First of all, testing cannot prove correctness. In other words, testing can show the existence of a defect, but not the absence of faults. The only way to prove correctness via testing would be to hit all possible states, which, as we've stated previously, is fundamentally intractable.

Second, one can't always make confident predictions about the reliability of software based upon testing. To do so would require accurate statistical models based upon actual operating conditions. The challenge is that such conditions are seldom known with confidence until after a system is installed! Even if a previous system is in place, enough things may change between the two systems to render old data less than valuable.

Third, even if you can test a product against its specification, that does not necessarily speak to the trustworthiness of the software. Trustworthiness has everything to do with the level of trust we place in a system (of course). Testing can give us some idea concerning its relative reliability, but may still leave us wary with respect to the safety-critical areas of the system.

For as many disclaimers as we've just presented, there still is a role for testing. At the very least, every boat should be put in water to see if it floats. No matter how much else you do correctly (such as inspections, reviews, formal methods, and so on), there is still a need to run the software through its paces. Ideally, this testing exercises the software in a manner that closely resembles its actual operating environment and conditions.
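The idea of exercising software against an operational profile can be sketched briefly. This is a minimal illustration, not a prescribed method: the profile weights, the operation names, and the `process_command()` stand-in are all hypothetical; in a real project the weights would come from field data or careful usage estimates.

```python
import random

# Hypothetical operational profile: relative frequency of each
# operation as the system is expected to see it in the field.
OPERATIONAL_PROFILE = {
    "read_sensor": 0.70,  # dominant operation in normal use
    "log_event":   0.25,
    "self_test":   0.05,  # rare, but safety-relevant
}

def process_command(cmd):
    """Stand-in for the system under test (hypothetical)."""
    return cmd in OPERATIONAL_PROFILE  # placeholder pass/fail check

def run_profile_tests(n, seed=42):
    """Draw n test cases with frequencies matching the profile,
    run each through the system, and count failures."""
    rng = random.Random(seed)
    ops = list(OPERATIONAL_PROFILE)
    weights = [OPERATIONAL_PROFILE[o] for o in ops]
    failures = 0
    for _ in range(n):
        cmd = rng.choices(ops, weights=weights)[0]
        if not process_command(cmd):
            failures += 1
    return failures

failures = run_profile_tests(1000)
print(f"failures in 1000 profile-weighted runs: {failures}")
```

The point of weighting the draws is exactly the caveat above: the resulting reliability estimate is only as good as the profile, and the actual profile is rarely known accurately before deployment.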
It should also focus strongly on the specific areas identified as potential hazards.

Effective testing in safety-critical software should involve independent validation. Where safety is concerned, there should be no risk of the conflict of interest inherent in a development engineer testing his own code. Even when such an engineer is sincere, blind spots can and do exist. The same blind spot responsible for the creation of a fault will likely lead the engineer to not find that fault through the design of tests.

Equally important, when independent validation engineers create a test suite, it should not be made available to development engineers. The rationale is the same that guides the GRE people to not give you the actual test you're going to take as a study guide for taking it! Situations may arise in which developers must be involved in assessing the quality of the software they designed or implemented. In such cases, the test design should be written independently, development should not have access to the design until after implementation, and the results should be independently reviewed.

Reviews and inspections. One of the most effective methods for eliminating problems early in the product life cycle is to employ reviews and inspections of nearly every artifact created during development of the product. That applies to requirements documents, high-level and low-level design documents, product documentation, and so on. In the case of safety-critical software, these reviews and inspections should be particularly focused around areas of potential hazard.

Case study: Titanic

The Titanic provides an interesting case study because it's particularly well known and the root causes are reasonably well understood. It is particularly interesting to look at the three principles of risk analysis and apply them to the famous ocean liner.
Clearly, sailing across the Atlantic Ocean carries certain risks, most notably icebergs. Ships had previously sunk after striking one. The designers of the Titanic had analyzed the hazards, and had discovered that no ship had ever experienced a rupture of more than four chambers. So they built their luxury liner with the capacity to survive up to four compartment ruptures. Marketing dubbed the ship unsinkable, they under-equipped it with lifeboats, the crew got careless, and they struck an iceberg, rupturing five compartments. The rest, as they say, is history.

A number of reasonable solutions to this tragedy could have been applied at any of the levels we described earlier.

Principle 1: avoid or remove the hazard. They could have chosen not to sail across the Atlantic, though that was not a particularly viable option since people needed to get across the ocean. People were aware of the risk of crossing the ocean and found it acceptable. The crew could have chosen to take a more southerly route, which presumably was longer. This would have added time and cost, perhaps prohibitive given the business model under which they were operating at the time. Any of these actions would have mitigated the risk of sailing in the first place, but for the sake of argument we'll assume that crossing the Atlantic needed to happen and that the route was economically necessary. They still could have mitigated risk by following the next principle.

Principle 2: if the hazard is unavoidable, minimize the risk of accident. They could have set course just slightly farther south, avoided the higher-risk areas of icebergs, and not added considerably to the travel time for the voyage. They could have put in place more accurate and timely reporting mechanisms for spotting icebergs. They could have been more diligent in the radio shack and in the captain's
