EETimes

Embedded Systems November 2000 Vol13_12

Issue link: http://dc.ee.ubm-us.com/i/71849

Contents of this Issue

Navigation

Page 139 of 189

All non-faulty sites will correctly relay the value they received from the coordinator. Faulty sites, on the other hand, are free to do as they choose, including sending the wrong value or not sending a value at all. • The communication medium is reliable (that is, no possibility of failures such as loss, duplication, or reordering of messages) • The absence of a me sage can be detected • The receiver of a message knows unequivocally the identity of the sender (that is, the sender's identi- ty cannot be forged) • It is possible to positively determine that a message was not ent by a particular site (as opposed to being lost in the medium) Clearly, considering our previous discussions on the difficulty of achiev- ing these conditions in distributed sys- tem , these are rather significant con- straints on the applicability of the algorithm. They are introduced to simplify the problem. Variants of the algorithm have been developed that relax some of these conditions. In the Lamport-Shostak-Pease algo- rithm, one of the sites initiates the agreement process. We refer to this site as the coordinator while the other sites are called followers. The algo- rithm generally proceeds as follows: • The coordinator sends its value to each of the followers • Each follower relays the value that it receives from the coordinator to all other followers (but not the coordinator). If no value wa received from the coordinator, a shared default value used instead. In effect, the follower acts as a coordinator for the reduced set of ites that excludes the coordina- tor of the previous round. Note that this "relaying" step is per- formed in parallel by each of the followers. This means that there can be multiple rounds of relaying, until each site finds itself in a situa- tion where it has no followers left • Each follower chooses the majority value, or, if no values are received at all, it chooses the shared default value. (In case of ties, each site invokes the ame method for breaking the tie, thereby guaran- teeing agreement) All non-faulty sites will correctly relay the value they received from the coordinator. Faulty site , on the otl1er hand, are free to do as they choose, including sending the wrong value or not sending a value at all. Note tl1at, if tl1ere is no majority, the algorithm cannot work. In fact, it can be shown that, in the case of n faulty sites, the algorithm only works if there are at least (3n+l) sites. This means that the algorithm can only work if there are at least four sites. (A variant of the algorithm in which faulty followers are constrained to cor- rectly relay the coordinator's value only requires [2n+ 1] sites.) This algorithm works even if the coordinator is faulty, becau e it guar- antees that all non-faulty sites will agree on the same value. Further details, including a proof of correct- ness of the algorithm, can be found in Lamport, Shostak, and Pease. Distributed mutual exclusion The basic principle here is that a site wanting to enter a critical section must inform all the other sites and then wait until it has been given approval by all other sites. A request for entering a crit- ical section is broadcast reliably to all tl1e other sites. The request contains the identity of the requesting site as well as a timestamp of tl1e request.6 The time- stamp is required to ensure faimess. vVhen a site S receives a request from site R to enter a critical ection, it will either give its permission immedi- ately or defer the reply until later. The algorithm that it follows is: 138 NOVEMBER 2000 Embedded Systems Programming • If Sis already in a critical section, it delays sending its permission • If S i not waiting to enter a critical section, granting penni sion • If S is itself waiting to enter a criti- cal section, it compares the time- stamp of its own request with the timestamp of R's request and, if tl1e latter is earlier, it replies immedi- ately with its permi ion. However, if S's timestamp is earli er, then it postpones its pem1ission • Postponed permissions are granted as soon as S exits its critical section • A site does not enter its critical sec- tion until all other sites have given it permission Distributed election A common problem that occurs in cer- tain kinds of distributed systems is the distributed election jJroblem. In this case, it is necessary to elect a leader among a set of peers. The peers at-e symmetric in the sense that they are all equally capable of being leaders, but only one is required to perform the functions of the leader. The advantage of this scheme is that if a leader fails , the remaining sites can simply undertake another election and select a new leader. This avoids a single point of failure and increases the potential sys- tem availability. A widely used election algorithm is called the Bully algorithm [3]: a site entering the election procedure first sends its bid to all other sites. The bid is typically based on some quality mat is unique to each site, such as a site identifier, so that there are no dupli- cate bids. If a leader already exists, it responds to me new site and me new site simply enters the monitoring state. In mis State, the site monitors me operational status of the elected leader (for example, mrough a peri- odic poll). If no leader exists, the candidate site waits to receive me bids of all other candidates mat it receives in response to its own bid. Once these are all in, each site evaluates the bids. it replies immediately

Articles in this issue

Archives of this issue

view archives of EETimes - Embedded Systems November 2000 Vol13_12