# 9.6 Lodha and Kshemkalyani’s fair mutual exclusion algorithm


Distributed mutual exclusion algorithms

This algorithm is based on the following interesting observation: when a site is waiting to execute the CS, it need not
receive REPLY messages from every other site. To enter the CS, a site only
needs to receive a REPLY message from the site whose request just precedes
its request in priority. For example, suppose sites $S_{i_1}, S_{i_2}, \ldots, S_{i_n}$ have pending requests
for the CS, the request of $S_{i_1}$ has the highest priority and that of $S_{i_n}$ the
lowest, and priority decreases from $S_{i_1}$ to $S_{i_n}$. Then a site
$S_{i_k}$, $1 < k \le n$, only needs a REPLY message from site $S_{i_{k-1}}$ to enter the CS.
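The observation can be illustrated with a small Python sketch. It assumes the pending requests are already listed from highest to lowest priority; the function name `required_replies` and the site labels are hypothetical.

```python
def required_replies(pending_in_priority_order):
    """Map each waiting site to the one site whose REPLY it still needs.

    pending_in_priority_order lists sites with pending requests, from the
    highest-priority request to the lowest. The first site needs no REPLY
    from any other pending site, so it does not appear in the result.
    """
    needs = {}
    for prev, site in zip(pending_in_priority_order, pending_in_priority_order[1:]):
        needs[site] = prev  # S_{i_k} waits only for S_{i_(k-1)}
    return needs


print(required_replies(["S3", "S1", "S4"]))  # {'S1': 'S3', 'S4': 'S1'}
```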

9.6.1 System model
Each request is assigned a priority ReqID and requests for CS access are
granted in the order of decreasing priority. We will defer the details of
what ReqID is composed of to later sections. The underlying communication
network is assumed to be error free.
Definition 9.1 Ri and Rj are concurrent iff Pi’s REQUEST message is
received by Pj after Pj has made its request and Pj’s REQUEST message is
received by Pi after Pi has made its request.
Definition 9.2 Given Ri, we define the concurrency set of Ri as follows:
$\mathit{CSet}_i = \{R_j \mid R_i \text{ is concurrent with } R_j\} \cup \{R_i\}$.
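Definitions 9.1 and 9.2 can be checked mechanically if one imagines an external observer who records when each process made its request and when each REQUEST was received. The sketch below assumes such a global view; the dictionaries `request_time` and `recv_time` hold hypothetical illustration data, loosely modelled on the example of Figures 9.11 to 9.13, where P1's REQUEST to P3 is delayed.

```python
def concurrent(i, j, request_time, recv_time):
    """Definition 9.1: R_i and R_j are concurrent iff each process receives the
    other's REQUEST only after it has made its own request.

    request_time[p]   -- time at which P_p made its request (global view, assumed)
    recv_time[(p, q)] -- time at which P_q received P_p's REQUEST
    """
    return recv_time[(i, j)] > request_time[j] and recv_time[(j, i)] > request_time[i]


def concurrency_set(i, processes, request_time, recv_time):
    """Definition 9.2: CSet_i = {R_j | R_i is concurrent with R_j} U {R_i}.
    Requests are identified by the index of the process that made them."""
    return {i} | {j for j in processes
                  if j != i and concurrent(i, j, request_time, recv_time)}


# Hypothetical timings: P3 requests at t=5, and P1's REQUEST reaches P3 late (t=6).
request_time = {1: 0, 2: 1, 3: 5}
recv_time = {(1, 2): 3, (2, 1): 3,   # P1 and P2 request concurrently
             (1, 3): 6, (3, 1): 7,   # P1's delayed REQUEST makes R1, R3 concurrent
             (2, 3): 2, (3, 2): 7}   # P3 had not yet requested when R2 arrived

for p in (1, 2, 3):
    print(p, sorted(concurrency_set(p, [1, 2, 3], request_time, recv_time)))
```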

9.6.2 Description of the algorithm
Algorithm 9.4 uses three types of messages (REQUEST, REPLY, and
FLUSH) and obtains savings on the number of messages exchanged per
CS access by assigning multiple purposes to each. For the purpose of
blocking a mutual exclusion request, every site Si has a data structure
called local_request_queue (denoted as LRQi), which contains all concurrent
requests made with respect to Si’s request, ordered by their priority.
All requests are totally ordered by their priorities and the priority is determined by the timestamp of the request. Hence, when a process receives a
REQUEST message from some other process, it can immediately determine
if it is allowed to access the CS before the requesting process or after it.
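A ReqID can be modelled as the pair (sequence number, process id), which is how step (2c) of Algorithm 9.4 builds it. The sketch assumes that a lexicographically smaller pair means a higher priority (an earlier timestamp first, ties broken by the smaller process id), so Python's built-in tuple comparison gives the total order, and `bisect.insort` keeps a local request queue sorted in decreasing priority. The helper names are hypothetical.

```python
import bisect


def has_higher_priority(r_a, r_b):
    """True if ReqID r_a = (seq, pid) has higher priority than r_b (assumed order)."""
    return r_a < r_b  # tuples compare lexicographically: seq first, then pid


def insert_sorted(lrq, req_id):
    """Insert req_id into lrq, keeping the queue in decreasing priority."""
    bisect.insort(lrq, req_id)
    return lrq


lrq = []
for r in [(2, 3), (1, 2), (2, 1), (1, 5)]:
    insert_sorted(lrq, r)
print(lrq)                                  # [(1, 2), (1, 5), (2, 1), (2, 3)]
print(has_higher_priority((1, 5), (2, 1)))  # True
```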
In this algorithm, messages play multiple roles; we discuss these roles
first.
Multiple uses of a REPLY message
1. A REPLY message acts as a reply from a process that is not
requesting.
2. A REPLY message acts as a collective reply from processes that have
higher priority requests.
A REPLY(Rj) from a process Pj indicates that Rj is the request made by Pj for
which it has executed the CS. It also indicates that all the requests with priority
≥ priority of Rj have finished executing the CS and are no longer in contention.

Thus, in such situations, a REPLY message is a logical reply and denotes
a collective reply from all processes that had made higher priority requests.
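Because a REPLY is a collective reply, the receiver can drop from its local queue every request whose priority is at least that of the ReqID carried in the message; steps (4b) and (6b) of Algorithm 9.4 do exactly this for REPLY and FLUSH. A minimal sketch, assuming ReqIDs are (sequence number, process id) pairs in which a smaller pair has higher priority; `apply_collective_reply` is a hypothetical name.

```python
def apply_collective_reply(lrq, r_j):
    """Return lrq without the requests covered by REPLY(r_j) or FLUSH(r_j).

    ReqIDs are (seq, pid) pairs and a smaller pair has higher priority
    (assumed), so "priority >= priority of r_j" means "req <= r_j".
    """
    return [req for req in lrq if req > r_j]


lrq = [(1, 1), (1, 2), (2, 3)]              # P_i's own request is (2, 3)
print(apply_collective_reply(lrq, (1, 2)))  # [(2, 3)]
```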

Uses of a FLUSH message
Similar to a REPLY message, a FLUSH message is a logical reply and denotes
a collective reply from all processes that had made higher priority requests.
After a process has exited the CS, it sends a FLUSH message to a process
requesting with the next highest priority, which is determined by looking up
the process’s local request queue. When a process Pi finishes executing the
CS, it may find a process Pj in one of the following states:
1. Rj is in the local queue of Pi and located in some position after Ri, which
implies that Rj is concurrent with Ri.
2. Pj had replied to Ri and Pj is now requesting with a lower priority. (Note
that in this case Ri and Rj are not concurrent.)
3. Pj’s earlier request had higher priority than Pi’s (implying that Pj had
finished executing the CS), and Pj is now requesting with a lower priority. (Note
that in this case Ri and Rj are not concurrent.)
A process Pi, after executing the CS, sends a FLUSH message to the process
identified in state 1 above that has the next highest priority, whereas it
sends REPLY messages to the processes identified in states 2 and 3, as their
requests are not concurrent with Ri (the requests of processes in states 2 and 3
were deferred by Pi until it exits the CS). It is then up to the process receiving
the FLUSH message and the processes receiving REPLY messages in states
2 and 3 to determine who is allowed to enter the CS next.
Consider a scenario where we have a set of requests R3, R0, R2, R4, R1
ordered in decreasing priority, where R0, R2, R4 are concurrent with one
another. Then P0 maintains a local queue of [R0, R2, R4] and, when it exits
the CS, it sends a FLUSH (only) to P2.
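The exit rule can be sketched as follows, assuming the local queue holds only the requests concurrent with Pi's own (state 1) in decreasing priority, and that the processes in states 2 and 3 are recorded in a separate deferred list. For brevity the queue entries are process ids rather than ReqIDs; the function name is hypothetical. The printed call reproduces the scenario above, in which P0 sends a FLUSH only to P2.

```python
def messages_on_exit(my_id, lrq, deferred):
    """Messages sent by P_i when it leaves the CS (cf. steps 5a and 5b).

    my_id    -- P_i's own entry in lrq
    lrq      -- concurrent requests, highest priority first (state 1)
    deferred -- processes whose non-concurrent requests were deferred (states 2, 3)
    Returns a list of (message_type, destination) pairs.
    """
    out = []
    pos = lrq.index(my_id)
    if pos + 1 < len(lrq):
        out.append(("FLUSH", lrq[pos + 1]))      # only the next concurrent request
    out.extend(("REPLY", p) for p in deferred)   # ordinary replies for states 2, 3
    return out


print(messages_on_exit(0, [0, 2, 4], deferred=[]))  # [('FLUSH', 2)]
```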
Multiple uses of a REQUEST message
Considering two processes Pi and Pj , there can be two cases:
Case 1 Pi and Pj are not concurrently requesting. In this case, the process
which requests first will get a REPLY message from the other process.
Case 2 Pi and Pj are concurrently requesting. In this case, there can be
two subcases:
1. Pi is requesting with a higher priority than Pj. In this case, Pj’s
REQUEST message serves as an implicit REPLY message to Pi’s
request. Also, Pj should wait for a REPLY/FLUSH message from some
process to enter the CS.
2. Pi is requesting with a lower priority than Pj. In this case, Pi’s
REQUEST message serves as an implicit REPLY message to Pj’s
request. Also, Pi should wait for a REPLY/FLUSH message from some
process to enter the CS.


(1) Initial local state for process Pi:
• int My_Sequence_Number_i = 0
• array of boolean RV_i[j] = 0, ∀j ∈ {1…N}
• queue of ReqID LRQ_i is NULL
• int Highest_Sequence_Number_Seen_i = 0
(2) InvMutEx: Process Pi executes the following to invoke mutual exclusion:
(2a) My_Sequence_Number_i = Highest_Sequence_Number_Seen_i + 1.
(2b) LRQ_i = NULL.
(2c) Make REQUEST(Ri) message, where Ri = (My_Sequence_Number_i, i).
(2d) Insert this REQUEST in LRQ_i in sorted order.
(2e) Send this REQUEST message to all other processes.
(2f) RV_i[k] = 0, ∀k ∈ {1…N} − {i}. RV_i[i] = 1.
(3) RcvReq: Process Pi receives REQUEST(Rj), where Rj = (SN, j), from
process Pj:
(3a) Highest_Sequence_Number_Seen_i = max(Highest_Sequence_Number_Seen_i, SN).
(3b) If Pi is requesting:
(3bi) If RV_i[j] = 0, then insert this request in LRQ_i (in sorted order) and
mark RV_i[j] = 1. If (CheckExecuteCS), then execute CS.
(3bii) If RV_i[j] = 1, then defer the processing of this request, which will
be processed after Pi executes CS.
(3c) If Pi is not requesting, then send a REPLY(Ri) message to Pj. Ri
denotes the ReqID of the last request made by Pi that was satisfied.
(4) RcvReply: Process Pi receives REPLY(Rj) message from process Pj. Rj
denotes the ReqID of the last request made by Pj that was satisfied:
(4a) RV_i[j] = 1.
(4b) Remove all requests from LRQ_i that have a priority ≥ the priority
of Rj.
(4c) If (CheckExecuteCS), then execute CS.
(5) FinCS: Process Pi finishes executing CS:
(5a) Send FLUSH(Ri) message to the next candidate in LRQ_i. Ri denotes
the ReqID that was satisfied.
(5b) Send REPLY(Ri) to the deferred requests. Ri is the ReqID
corresponding to which Pi just executed the CS.
(6) RcvFlush: Process Pi receives a FLUSH(Rj) message from a process Pj:
(6a) RV_i[j] = 1.
(6b) Remove all requests in LRQ_i that have the priority ≥ the priority of Rj.
(6c) If (CheckExecuteCS), then execute CS.
(7) CheckExecuteCS: If (RV_i[k] = 1, ∀k ∈ {1…N}) and Pi’s request is at
the head of LRQ_i, then return true, else return false.
Algorithm 9.4 Lodha and Kshemkalyani’s fair mutual exclusion algorithm [13].
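The steps of Algorithm 9.4 can be sketched in Python for a single process. This is an illustrative model under stated assumptions, not the authors' implementation: messages are collected in an outbox instead of being sent, "execute CS" is reduced to setting a flag, ReqIDs are (sequence number, process id) pairs in which a smaller pair has higher priority, and the class and method names are hypothetical.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class LKProcess:
    """Single-process view of Algorithm 9.4, sketched without a real network.

    Outgoing messages are appended to `outbox` as (type, destination, req_id);
    the caller is responsible for delivering them. ReqIDs are (seq, pid) pairs,
    and a smaller pair is assumed to have higher priority.
    """
    pid: int
    n: int                                  # processes are numbered 1..n
    my_req: tuple = None                    # current or last satisfied ReqID
    requesting: bool = False
    in_cs: bool = False
    highest_seen: int = 0                   # Highest_Sequence_Number_Seen
    lrq: list = field(default_factory=list)
    rv: dict = field(default_factory=dict)
    deferred: list = field(default_factory=list)
    outbox: list = field(default_factory=list)

    def invoke(self):                                   # step (2)
        self.my_req = (self.highest_seen + 1, self.pid)
        self.requesting = True
        self.lrq = [self.my_req]
        self.rv = {k: k == self.pid for k in range(1, self.n + 1)}
        for k in range(1, self.n + 1):
            if k != self.pid:
                self.outbox.append(("REQUEST", k, self.my_req))
        self._check_execute_cs()

    def on_request(self, r_j):                          # step (3)
        sn, j = r_j
        self.highest_seen = max(self.highest_seen, sn)
        if not self.requesting:                         # (3c)
            # (0, pid) stands for "no request yet"; it prunes nothing at P_j.
            self.outbox.append(("REPLY", j, self.my_req or (0, self.pid)))
        elif not self.rv[j]:                            # (3bi) concurrent request
            bisect.insort(self.lrq, r_j)
            self.rv[j] = True
            self._check_execute_cs()
        else:                                           # (3bii) answer after the CS
            self.deferred.append(j)

    def on_reply_or_flush(self, r_j, j):                # steps (4) and (6)
        self.rv[j] = True
        self.lrq = [r for r in self.lrq if r > r_j]     # drop priority >= R_j
        self._check_execute_cs()

    def _check_execute_cs(self):                        # step (7)
        if (self.requesting and not self.in_cs and all(self.rv.values())
                and self.lrq and self.lrq[0] == self.my_req):
            self.in_cs = True                           # "execute CS"

    def finish(self):                                   # step (5)
        self.in_cs = False
        self.requesting = False
        later = [r for r in self.lrq if r > self.my_req]
        if later:                                       # (5a) next candidate
            self.outbox.append(("FLUSH", later[0][1], self.my_req))
        for j in self.deferred:                         # (5b)
            self.outbox.append(("REPLY", j, self.my_req))
        self.deferred = []


# Two processes request concurrently; P1's ReqID (1, 1) outranks P2's (1, 2).
p1, p2 = LKProcess(1, 2), LKProcess(2, 2)
p1.invoke()
p2.invoke()
p2.on_request(p1.my_req)           # P1's REQUEST reaches P2
p1.on_request(p2.my_req)           # P2's REQUEST reaches P1
print(p1.in_cs, p2.in_cs)          # True False
p1.finish()                        # P1 sends FLUSH((1, 1)) to P2
p2.on_reply_or_flush(p1.my_req, 1)
print(p2.in_cs)                    # True
```

In the short run at the end, each REQUEST acts as an implicit REPLY, P1 enters first, and its FLUSH lets P2 enter, for three messages in total.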

Examples
• Figure 9.11 Processes P1 and P2 are concurrent and they send out
REQUESTs to all other processes. The REQUEST sent by P1 to P3 is
delayed and hence is not shown until Figure 9.13.
• Figure 9.12 When P3 receives the REQUEST from P2 , it sends REPLY
to P2 .
• Figure 9.13 The delayed REQUEST of P1 arrives at P3 and at the same
time, P3 sends out its REQUEST for CS, which makes it concurrent with
the request of P1 .
• Figure 9.14 P1 exits the CS and sends out a FLUSH message to P2 .
• Figure 9.15 Since the requests of P2 and P3 are not concurrent, P2 sends
a FLUSH message to P3 . P3 removes (1,1) from its local queue and enters
the CS.
The data structures LRQ and RV are updated in each step as discussed
previously.

9.6.3 Safety, fairness and liveness
Proofs for safety, fairness and liveness are quite involved and interested
readers are referred to the original paper for detailed proofs.

9.6.4 Message complexity
To execute the CS, a process Pi sends $N - 1$ REQUEST messages. It receives
$N - |\mathit{CSet}_i|$ REPLY messages. There are two cases to consider:
1. $|\mathit{CSet}_i| \ge 2$. There are two subcases here:
(a) There is at least one request in $\mathit{CSet}_i$ whose priority is smaller than
that of Ri. So Pi will send one FLUSH message. In this case the total
number of messages for CS access is $2N - |\mathit{CSet}_i|$. When all the
requests are concurrent, this reduces to N messages.
(b) There is no request in $\mathit{CSet}_i$ whose priority is less than the priority of
Ri. Pi will not send a FLUSH message. In this case, the total number
of messages for CS access is $2N - 1 - |\mathit{CSet}_i|$. When all the requests
are concurrent, this reduces to $N - 1$ messages.

2. $|\mathit{CSet}_i| = 1$. This is the worst case, implying that all requests are satisfied
serially. Pi will not send a FLUSH message. In this case, the total number
of messages for CS access is $2(N - 1)$ messages.
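The case analysis can be condensed into one small function; the parameter names are hypothetical, and the three terms are the REQUEST, REPLY and FLUSH counts given above.

```python
def messages_per_cs(n, cset_size, lower_priority_in_cset):
    """Messages exchanged for one CS access by P_i.

    n                       -- number of processes N
    cset_size               -- |CSet_i|, which counts R_i itself (so >= 1)
    lower_priority_in_cset  -- whether CSet_i holds a request of lower
                               priority than R_i (then P_i sends one FLUSH)
    """
    requests = n - 1                       # REQUEST to every other process
    replies = n - cset_size                # REPLY from non-concurrent processes
    flush = 1 if cset_size >= 2 and lower_priority_in_cset else 0
    return requests + replies + flush


print(messages_per_cs(10, 10, True))   # 10 (all concurrent, case 1a: N)
print(messages_per_cs(10, 10, False))  # 9  (all concurrent, case 1b: N - 1)
print(messages_per_cs(10, 1, False))   # 18 (case 2: 2(N - 1))
```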

Figure 9.11 Processes P1 and P2 send out REQUESTs.
Figure 9.12 P3 sends a REPLY message to P2 only.
Figure 9.13 P3 sends out a REQUEST message.
Figure 9.14 P1 exits the CS and sends a FLUSH message to P2.
Figure 9.15 P3 enters the CS.
[Space-time diagrams for processes P1, P2 and P3; legend: REQUEST from P1, REQUEST from P2, REQUEST from P3, FLUSH message.]

9.7 Quorum-based mutual exclusion algorithms
Quorum-based mutual exclusion algorithms represented a departure from the
trend in the following two ways:
1. A site does not request permission from all other sites, but only from a
subset of the sites. This is a radically different approach as compared to
the Lamport and Ricart–Agrawala algorithms, where all sites participate
in conflict resolution of all other sites. In quorum-based mutual exclusion
algorithms, the request sets of sites are chosen such that
$\forall i\, \forall j : 1 \le i, j \le N :: R_i \cap R_j \neq \emptyset$. Consequently, every pair of sites has a site which
mediates conflicts between that pair.
2. In quorum-based mutual exclusion algorithms, a site can send out only one
REPLY message at any time. A site can send a REPLY message only after
it has received a RELEASE message for the previous REPLY message.
Therefore, a site Si locks all the sites in Ri in exclusive mode before
executing its CS.
Quorum-based mutual exclusion algorithms significantly reduce the message
complexity of invoking mutual exclusion by having sites ask permission from
only a subset of sites.
Since these algorithms are based on the notion of “Coteries” and “Quorums,” we first describe the idea of coteries and quorums. A coterie C is
defined as a set of sets, where each set g ∈ C is called a quorum. The following
properties hold for quorums in a coterie:
• Intersection property For every quorum g, h ∈ C, g ∩ h ≠ ∅.
For example, sets {1,2,3}, {2,5,7}, and {5,7,9} cannot be quorums in
a coterie because the first and third sets do not have a common
element.
• Minimality property There should be no quorums g, h in coterie C
such that g ⊇ h. For example, sets {1,2,3} and {1,3} cannot be quorums in
a coterie because the first set is a superset of the second.
Coteries and quorums can be used to develop algorithms to ensure mutual
exclusion in a distributed environment. A simple protocol works as follows:
let “a” be a site in quorum “A.” If “a” wants to invoke mutual exclusion,
it requests permission from all sites in its quorum “A.” Every site does the
same to invoke mutual exclusion. Due to the Intersection property, quorum
“A” contains at least one site that is common to the quorum of every other
site. These common sites send permission to only one site at any time. Thus,
mutual exclusion is guaranteed.
Note that the Minimality property ensures efficiency rather than correctness.
In the simplest form, quorums are formed as sets that contain a majority of
sites. There exists a variety of quorums and a variety of ways to construct
quorums. For example, Maekawa [14] used the theory of projective planes
to develop quorums of size √N.
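The two coterie properties, and the majority construction mentioned above, are easy to check by brute force. The sketch below uses plain Python sets; `is_coterie` and `majority_coterie` are hypothetical helper names, and the first two examples are the ones from the text.

```python
from itertools import combinations


def is_coterie(quorums):
    """Check the Intersection and Minimality properties of a set of quorums."""
    qs = [frozenset(q) for q in quorums]
    pairs = list(combinations(qs, 2))
    intersection = all(g & h for g, h in pairs)               # g ∩ h ≠ ∅
    minimality = not any(g <= h or h <= g for g, h in pairs)  # no g ⊇ h
    return intersection and minimality


def majority_coterie(sites):
    """All subsets containing a strict majority of the sites."""
    size = len(sites) // 2 + 1
    return [set(c) for c in combinations(sites, size)]


print(is_coterie([{1, 2, 3}, {2, 5, 7}, {5, 7, 9}]))  # False: {1,2,3} ∩ {5,7,9} = ∅
print(is_coterie([{1, 2, 3}, {1, 3}]))                # False: {1,2,3} ⊇ {1,3}
print(is_coterie(majority_coterie([1, 2, 3, 4, 5])))  # True
```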

9.8 Maekawa’s algorithm
Maekawa’s algorithm [14] was the first quorum-based mutual exclusion algorithm. The request sets for sites (i.e., quorums) in Maekawa’s algorithm are
constructed to satisfy the following conditions:
M1 $(\forall i\, \forall j : i \neq j,\ 1 \le i, j \le N :: R_i \cap R_j \neq \emptyset)$.
M2 $(\forall i : 1 \le i \le N :: S_i \in R_i)$.
M3 $(\forall i : 1 \le i \le N :: |R_i| = K)$.
M4 Any site $S_j$ is contained in $K$ number of $R_i$s, $1 \le i, j \le N$.
Maekawa used the theory of projective planes and showed that
$N = K(K - 1) + 1$. This relation gives $|R_i| \approx \sqrt{N}$.
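For K = 3 the relation gives N = 7, and the seven lines of the smallest projective plane (the Fano plane) form suitable request sets. The sketch below builds them as cyclic shifts of {0, 1, 3} modulo 7, a standard construction of that plane chosen here for illustration, with sites numbered 0 to 6, and checks M1 to M4 by brute force. The function names are hypothetical.

```python
from itertools import combinations


def cyclic_request_sets(n, base):
    """R_i = {(i + d) mod n : d in base} for sites numbered 0..n-1."""
    return [frozenset((i + d) % n for d in base) for i in range(n)]


def check_maekawa(request_sets, k):
    """Return the truth values of conditions M1, M2, M3 and M4."""
    n = len(request_sets)
    m1 = all(a & b for a, b in combinations(request_sets, 2))  # pairwise overlap
    m2 = all(i in r for i, r in enumerate(request_sets))       # S_i ∈ R_i
    m3 = all(len(r) == k for r in request_sets)                # |R_i| = K
    m4 = all(sum(j in r for r in request_sets) == k for j in range(n))
    return m1, m2, m3, m4


K = 3
N = K * (K - 1) + 1                        # 7
sets = cyclic_request_sets(N, (0, 1, 3))
print(check_maekawa(sets, K))              # (True, True, True, True)
```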
Since there is at least one common site between the request sets of any two
sites (condition M1), every pair of sites has a common site which mediates
conflicts between the pair. A site can have only one outstanding REPLY
message at any time; that is, it grants permission to an incoming request if it
has not granted permission to some other site. Therefore, mutual exclusion is
guaranteed. This algorithm requires delivery of messages to be in the order
they are sent between every pair of sites.
Conditions M1 and M2 are necessary for correctness; whereas conditions
M3 and M4 provide other desirable features to the algorithm. Condition M3
states that the size of the request sets of all sites must be equal, which
implies that all sites should have to do an equal amount of work to invoke
mutual exclusion. Condition M4 enforces that exactly the same number of
sites should request permission from any site, which implies that all sites
have “equal responsibility” in granting permission to other sites.
In Maekawa’s algorithm, a site Si executes the steps shown in Algorithm 9.5
to execute the CS.

Requesting the critical section:
(a) A site Si requests access to the CS by sending REQUEST(i) messages
to all sites in its request set Ri .
(b) When a site Sj receives the REQUEST(i) message, it sends a REPLY(j)
message to Si provided it hasn’t sent a REPLY message to a site since
its receipt of the last RELEASE message. Otherwise, it queues up the
REQUEST(i) for later consideration.
Executing the critical section:
(c) Site Si executes the CS only after it has received a REPLY message from
every site in Ri .
Releasing the critical section:
(d) After the execution of the CS is over, site Si sends a RELEASE(i)
message to every site in Ri.
(e) When a site Sj receives a RELEASE(i) message from site Si , it sends
a REPLY message to the next site waiting in the queue and deletes that
entry from the queue. If the queue is empty, then the site updates its state
to reflect that it has not sent out any REPLY message since the receipt
of the last RELEASE message.
Algorithm 9.5 Maekawa’s algorithm.
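The permission-granting side of Algorithm 9.5 (steps (b) and (e)) amounts to one lock with a FIFO queue at every site. A minimal sketch without the deadlock handling discussed later; sending a REPLY is modelled by returning the id of the site that should receive it, and the class and method names are hypothetical.

```python
from collections import deque


class MaekawaArbiter:
    """REQUEST/RELEASE handling at one site S_j of Algorithm 9.5."""

    def __init__(self):
        self.granted_to = None   # site holding S_j's single outstanding REPLY
        self.queue = deque()     # REQUESTs queued for later consideration

    def on_request(self, i):
        """Step (b): grant at once if no REPLY is outstanding, else queue."""
        if self.granted_to is None:
            self.granted_to = i
            return i             # send REPLY(j) to S_i
        self.queue.append(i)
        return None

    def on_release(self, i):
        """Step (e): pass the REPLY to the next queued site, if any.
        Assumes RELEASE(i) comes from the current holder."""
        if self.queue:
            self.granted_to = self.queue.popleft()
            return self.granted_to
        self.granted_to = None   # no REPLY outstanding since this RELEASE
        return None


arbiter = MaekawaArbiter()
print(arbiter.on_request(1), arbiter.on_request(2), arbiter.on_release(1))  # 1 None 2
```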

Correctness
Theorem 9.3 Maekawa’s algorithm achieves mutual exclusion.
Proof Proof is by contradiction. Suppose two sites Si and Sj are concurrently
executing the CS. This means site Si received a REPLY message from all
sites in Ri and concurrently site Sj was able to receive a REPLY message
from all sites in Rj. If Sk ∈ Ri ∩ Rj (such a site exists by condition M1), then site Sk must have sent REPLY
messages to both Si and Sj concurrently, which is a contradiction.

Performance

Note that the size of a request set is √N. Therefore, an execution of the
CS requires √N REQUEST, √N REPLY, and √N RELEASE messages,
resulting in 3√N messages per CS execution. Synchronization delay in this
algorithm is 2T . This is because after a site Si exits the CS, it first releases all
the sites in Ri and then one of those sites sends a REPLY message to the next
site that executes the CS. Thus, two sequential message transfers are required
between two successive CS executions. As discussed next, Maekawa’s algorithm is deadlock-prone. Measures to handle deadlocks require additional
messages.

9.8.1 Problem of deadlocks
Maekawa’s algorithm can deadlock because a site is exclusively locked by
other sites and requests are not prioritized by their timestamps [14,22]. Thus,
a site may send a REPLY message to a site and later force a higher priority
request from another site to wait.
Without loss of generality, assume three sites Si , Sj , and Sk simultaneously
invoke mutual exclusion. Suppose Ri ∩ Rj = {Sij}, Rj ∩ Rk = {Sjk}, and
Rk ∩ Ri = {Ski}. Since sites do not send REQUEST messages to the sites
in their request sets in any particular order and message delays are arbitrary,
the following scenario is possible: Sij has been locked by Si (forcing Sj to
wait at Sij ), Sjk has been locked by Sj (forcing Sk to wait at Sjk ), and Ski has
been locked by Sk (forcing Si to wait at Ski ). This state represents a deadlock
involving sites Si , Sj , and Sk .
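The deadlock above is a cycle in the wait-for relation among the requesting sites. A small sketch that rebuilds that relation from the scenario and finds the cycle; it simplifies by letting each blocked site wait at exactly one arbiter, and all names are hypothetical.

```python
def find_cycle(wait_for):
    """Return the sites on a cycle of the map {site: site it waits for}, or None."""
    for start in wait_for:
        path = []
        node = start
        while node in wait_for and node not in path:
            path.append(node)
            node = wait_for[node]
        if node in path:
            return path[path.index(node):]
    return None


# Scenario from the text: each arbiter's lock holder, and where each site is blocked.
lock_holder = {"Sij": "Si", "Sjk": "Sj", "Ski": "Sk"}
blocked_at = {"Sj": "Sij", "Sk": "Sjk", "Si": "Ski"}
wait_for = {site: lock_holder[arb] for site, arb in blocked_at.items()}
print(wait_for)              # {'Sj': 'Si', 'Sk': 'Sj', 'Si': 'Sk'}
print(find_cycle(wait_for))  # ['Sj', 'Si', 'Sk']
```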

Maekawa’s algorithm handles deadlocks by requiring a site to yield a lock if
the timestamp of its request is larger than the timestamp of some other request
waiting for the same lock (unless the former has succeeded in acquiring locks
on all the needed sites) [14, 22]. A site suspects a deadlock (and initiates
message exchanges to resolve it) whenever a higher priority request arrives
and waits at a site because the site has sent a REPLY message to a lower
priority request.
Deadlock handling requires the following three types of messages:
FAILED A FAILED message from site Si to site Sj indicates that Si
cannot grant Sj’s request because it has currently granted permission to
a site with a higher priority request.
INQUIRE An INQUIRE message from Si to Sj indicates that Si would
like to find out from Sj if it has succeeded in locking all the sites in its
request set.
YIELD A YIELD message from site Si to Sj indicates that Si is returning
the permission to Sj (to yield to a higher priority request at Sj ).

Details of how Maekawa’s algorithm handles deadlocks are as follows:
• When a REQUEST(ts, i) from site Si blocks at site Sj because Sj has
currently granted permission to site Sk, then Sj sends a FAILED(j) message
to Si if Si’s request has lower priority. Otherwise, Sj sends an INQUIRE(j)
message to site Sk.
• In response to an INQUIRE(j) message from site Sj, site Sk sends a
YIELD(k) message to Sj provided Sk has received a FAILED message
from a site in its request set, or if it sent a YIELD to any of these sites
but has not received a new REPLY from it.
• In response to a YIELD(k) message from site Sk , site Sj assumes as if it
has been released by Sk , places the request of Sk at appropriate location in
the request queue, and sends a REPLY (j) to the top request’s site in the
queue.
Thus, Maekawa-type algorithms require extra messages to handle deadlocks and may exchange these messages even though there is no deadlock.
The maximum number of messages required per CS execution in this case
is 5√N.
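The arbiter's side of these rules can be sketched as follows. It is a simplification under stated assumptions: requests are (timestamp, site) pairs in which a smaller pair has higher priority, the RELEASE path and the requesting site's decision to YIELD are omitted, and an INQUIRE is sent on every higher-priority arrival (a fuller version would avoid repeating it). The class name is hypothetical.

```python
import heapq


class DeadlockAwareArbiter:
    """Arbiter S_j's reaction to REQUEST and YIELD (FAILED/INQUIRE rules)."""

    def __init__(self):
        self.granted = None   # (ts, site) currently holding S_j's REPLY
        self.waiting = []     # min-heap of blocked requests, best priority first

    def on_request(self, req):
        site = req[1]
        if self.granted is None:
            self.granted = req
            return [("REPLY", site)]
        heapq.heappush(self.waiting, req)
        if req > self.granted:                    # lower priority than holder
            return [("FAILED", site)]
        return [("INQUIRE", self.granted[1])]     # ask holder S_k to yield

    def on_yield(self):
        # Behave as if released: requeue the holder, grant the top request.
        heapq.heappush(self.waiting, self.granted)
        self.granted = heapq.heappop(self.waiting)
        return [("REPLY", self.granted[1])]


arb = DeadlockAwareArbiter()
print(arb.on_request((5, "Sk")))   # [('REPLY', 'Sk')]
print(arb.on_request((7, "Sx")))   # [('FAILED', 'Sx')]
print(arb.on_request((3, "Si")))   # [('INQUIRE', 'Sk')]
print(arb.on_yield())              # [('REPLY', 'Si')]
```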

9.9 Agarwal–El Abbadi quorum-based algorithm
Agarwal and El Abbadi [1] developed a simple and efficient mutual exclusion
algorithm by introducing tree quorums. They gave a novel algorithm for
constructing tree-structured quorums that exploits the hierarchical
structure of a network. The mutual exclusion algorithm is independent of
the underlying topology of the network and there is no need for a multicast
facility in the network. However, such a facility can improve the performance
of the algorithm. The mutual exclusion algorithm assumes that sites in the
distributed system can be organized into a structure such as a tree, grid, binary
tree, etc. and that there exists a routing mechanism to exchange messages between
different sites in the system.
The Agarwal–El Abbadi quorum-based algorithm, however, constructs quorums from trees. Such quorums are called “tree-structured quorums.” The
following sections describe an algorithm for constructing tree-structured quorums and present an analysis of the algorithm and a protocol for mutual
exclusion in distributed systems using tree-structured quorums.

9.9.1 Constructing a tree-structured quorum
All the sites in the system are logically organized into a complete binary tree.
To build such a tree, any site could be chosen as the root, any other two sites
may be chosen as its children, and so on. For a complete binary tree with
level “k,” we have $2^{k+1} - 1$ sites with its root at level k and leaves at level 0.
The number of sites in a path from the root to a leaf is equal to the level of
the tree plus one, k + 1, which is O(log n). There will be $2^k$ leaves in the tree.
A path in a binary tree is the sequence $a_1, a_2, \ldots, a_i, a_{i+1}, \ldots, a_k$ such that
$a_i$ is the parent of $a_{i+1}$.
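The counts above can be checked with heap-style numbering, a common convention assumed here for illustration (root = 1, children of node x are 2x and 2x + 1). The function names are hypothetical.

```python
def tree_stats(k):
    """(number of sites, number of leaves) for a complete binary tree whose
    root is at level k and whose leaves are at level 0."""
    n_sites = 2 ** (k + 1) - 1
    n_leaves = n_sites - (2 ** k - 1)     # leaves are nodes 2**k .. n_sites
    return n_sites, n_leaves


def root_to_leaf_path(node):
    """Sites from the root (node 1) down to `node`, in heap numbering."""
    path = []
    while node >= 1:
        path.append(node)
        node //= 2                        # parent of x is x // 2
    return path[::-1]


print(tree_stats(2))           # (7, 4)
print(root_to_leaf_path(5))    # [1, 2, 5]: k + 1 = 3 sites for k = 2
```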
The algorithm for constructing structured quorums from the tree is given
in Algorithm 9.6. For the purpose of presentation, we assume that the tree is
complete, however, the algorithm works for any arbitrary binary tree.

FUNCTION GetQuorum(Tree: NetworkHierarchy): QuorumSet;
VAR left, right: QuorumSet;
BEGIN
    IF Empty(Tree) THEN
        RETURN ({});
    ELSE IF GrantsPermission(Tree↑.Node) THEN
        RETURN ({Tree↑.Node} ∪ GetQuorum(Tree↑.LeftChild));
        OR
        RETURN ({Tree↑.Node} ∪ GetQuorum(Tree↑.RightChild));
    ELSE
        left ← GetQuorum(Tree↑.LeftChild);
        right ← GetQuorum(Tree↑.RightChild);
        IF (left = ∅ ∨ right = ∅) THEN
            (* Unsuccessful in establishing a quorum *)
            EXIT(-1);
        ELSE
            RETURN (left ∪ right);
        END; (* IF *)
    END; (* IF *)
END; (* IF *)
END GetQuorum
Algorithm 9.6 Algorithm for constructing a tree-structured quorum [1].
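Algorithm 9.6 translates fairly directly into Python. In this sketch the tree is complete and heap-numbered (root 1, children 2x and 2x + 1), `grants_permission` plays the role of GrantsPermission, EXIT(-1) becomes an exception, and the nondeterministic OR is resolved by trying the left child first and falling back to the right child if that attempt fails. The fallback is a choice made for this sketch, not something stated in the algorithm; all names are hypothetical.

```python
class QuorumUnavailable(Exception):
    """Raised where Algorithm 9.6 executes EXIT(-1)."""


def get_quorum(node, n_sites, grants_permission):
    """Tree quorum for the subtree rooted at heap-numbered `node`."""
    if node > n_sites:                          # Empty(Tree)
        return set()
    left, right = 2 * node, 2 * node + 1
    if grants_permission(node):
        try:                                    # OR: try the left path first
            return {node} | get_quorum(left, n_sites, grants_permission)
        except QuorumUnavailable:               # ... then the right path
            return {node} | get_quorum(right, n_sites, grants_permission)
    # The node has failed: replace it by quorums of BOTH subtrees.
    lq = get_quorum(left, n_sites, grants_permission)
    rq = get_quorum(right, n_sites, grants_permission)
    if not lq or not rq:
        raise QuorumUnavailable(node)
    return lq | rq


print(get_quorum(1, 7, lambda x: True))     # {1, 2, 4}: a root-to-leaf path
print(get_quorum(1, 7, lambda x: x != 1))   # {2, 3, 4, 6}: failed root replaced
print(get_quorum(1, 7, lambda x: x != 4))   # {1, 2, 5}: falls back to right child
```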

The algorithm for constructing tree-structured quorums uses two functions
called GetQuorum(Tree) and GrantsPermission(site) and assumes that there
is a well-defined root for the tree. GetQuorum is a recursive function that
takes a tree node “x” as the parameter and calls GetQuorum for its child node
provided that the GrantsPermission(x) is true. The GrantsPermission(x) is
true only when the node “x” agrees to be in the quorum. If the node “x” is
down due to a failure, then it may not agree to be in the quorum and the
value of GrantsPermission(x) will be false. The algorithm tries to construct
quorums in a way that each quorum represents any path from the root to
a leaf, i.e., in this case the (no failures) quorum is any set $\{a_1, a_2, \ldots, a_i,
a_{i+1}, \ldots, a_k\}$, where $a_1$ is the root and $a_k$ is a leaf, and for all $i < k$, $a_i$ is the
parent of $a_{i+1}$. If it fails to find such a path (say, because node “x” has failed),
the control goes to the ELSE block which specifies that the failed node “x”