EETimes

Embedded Systems November 2000 Vol13_12

Issue link: http://dc.ee.ubm-us.com/i/71849

Page 115 of 189

will be no false watchdog resets. On a medical ventilator, 10 seconds would have been far too long to leave the patient unassisted, but if the device can recover within a second then the failure will have minimal impact, so a choice of a 500ms timeout might be appropriate. When making such calculations be sure to include the time taken for the device to start up as well as the timeout time of the watchdog itself.

One real-life example is the Space Shuttle's main engine controller.[1] The watchdog timeout is set at 18ms, which is shorter than one major control cycle. The response to the watchdog biting is to switch over to the backup computer. This mechanism allows control to pass from a failed computer to the backup before the engine has time to perform any irreversible actions.

While on the subject of timeouts, it is worth pointing out that some watchdog circuits allow the very first timeout to be considerably longer than the timeout used for the rest of the periodic checks. This allows the processor time to initialize, without having to worry about the watchdog biting.

While the watchdog can often respond fast enough to halt mechanical systems, it offers little protection for damage that can be done by software alone. Consider an area of non-volatile RAM which may be overwritten with rubbish data if some loop goes out of control. It is likely that such an overwrite would occur far faster than a watchdog could detect the fault. For those situations you need some other protection such as a checksum. The watchdog is really just one layer of protection, and should form part of a comprehensive safety net.

Multiplying the interval

If you are not building the watchdog hardware yourself, then you may have little say in determining the longest interval available. On some microcontrollers the built-in watchdog has a maximum timeout on the order of a few hundred milliseconds. If you decide that you want more time, you need to multiply that in software.

Say the hardware provides a 100ms timeout, but your policy says that you only want to check the system for sanity every 300ms. You will have to kick the dog at an interval shorter than 100ms, but only do the sanity check every third time the kick function is called.

This approach may not be suitable for a single loop design if the main loop could take longer than 100ms to execute. One possibility is to move the sanity check out to an interrupt. The interrupt would be called every 100ms, and would then kick the dog. On every third interrupt the interrupt function would check a flag that indicates that the main loop is still spinning. This flag is set at the end of the main loop, and cleared by the interrupt as soon as it has read it.

[Figure: Main Loop of Code. If sanity checks OK, kick the dog; else record failure.]

If you take the approach of kicking the watchdog from an interrupt, it is vital to have a check on the main loop, such as the one described in the previous paragraph. Otherwise it is possible to get into a situation where the main loop has hung, but the interrupt continues to kick the dog, and the watchdog never gets a chance to reset the system.

Self-test

Assume that the watchdog hardware fails in such a way that it never bites. How would you ever know? When the system works, such a fault is not apparent. The fault would only be discovered when some failure that normally leads to a reset instead leads to a hung system. If such a failure was acceptable, you would never have bothered with the watchdog in the first place.

If you think watchdog failure is a rare thing, think again. Many systems contain a means to disable the watchdog, like a jumper that connects the watchdog output to the reset line. This is necessary for some test modes, and for debugging with any tool that can halt the program. If the jumper falls out, or a service engineer who removed the jumper for a test forgets

114 NOVEMBER 2000 Embedded Systems Programming
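
The interrupt-driven scheme from the section on multiplying the interval can be sketched in C as follows. This is only an illustration under stated assumptions, not the article's own code: the names `watchdog_tick_isr`, `main_loop_checkpoint`, `hw_kick_watchdog`, and `record_failure` are hypothetical, the hardware kick is replaced by a counter so the logic can be exercised on a desktop, and the timer interrupt is assumed to fire often enough to beat the hardware timeout. Every tick kicks the dog, but on every third tick the handler first reads and clears the main-loop flag. If the flag was not set, it records the failure and stops kicking so that the hardware watchdog can bite.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 100 ms hardware timeout, 300 ms sanity policy,
   so the main-loop flag is checked on every third tick. */
#define TICKS_PER_SANITY_CHECK 3u

/* Set at the end of the main loop, cleared by the tick handler. */
static volatile bool main_loop_alive = false;

/* Counters standing in for real hardware and a real fault log. */
static volatile uint32_t kick_count = 0;
static volatile uint32_t failure_count = 0;

static bool watchdog_starved = false; /* once true, never kick again */
static uint8_t tick_count = 0;

/* Hypothetical stand-in for the device-specific kick sequence. */
static void hw_kick_watchdog(void) { kick_count++; }

/* Hypothetical stand-in for logging the fault before the reset. */
static void record_failure(void) { failure_count++; }

/* Call once at the end of every pass through the main loop. */
void main_loop_checkpoint(void) { main_loop_alive = true; }

/* Call from the periodic timer interrupt. */
void watchdog_tick_isr(void)
{
    if (watchdog_starved) {
        return; /* leave the dog unkicked so the hardware resets us */
    }

    if (++tick_count >= TICKS_PER_SANITY_CHECK) {
        tick_count = 0;

        bool alive = main_loop_alive;
        main_loop_alive = false; /* clear the flag as soon as it is read */

        if (!alive) {
            record_failure();
            watchdog_starved = true;
            return;
        }
    }

    hw_kick_watchdog();
}
```

In a real system `main_loop_checkpoint()` would be the last statement of the main loop and `watchdog_tick_isr()` would be hooked to a hardware timer. Stopping the kicks after a failed check is what prevents the situation where the main loop has hung but the interrupt keeps the watchdog quiet.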
