10.9 Kshemkalyani–Singhal algorithm for the P-out-of-Q model


Data structures: a node i has the following local variables:

wait_i : boolean (:= false);   /* records the current status */
t_i : integer (:= 0);          /* denotes the current time */
t_block_i : real;              /* denotes the local time when i blocked last */
in_i : set of nodes whose requests are outstanding at node i;
out_i : set of nodes on which node i is waiting;
p_i : integer (:= 0);          /* the number of replies required for unblocking */
w_i : real (:= 1.0);           /* keeps weight to detect the termination of the algorithm */
Computation events:

REQUEST_SEND(i):
/* Executed by node i when it blocks on a p_i-out-of-q_i request. */
    for every node j on which i is blocked do
        out_i ← out_i ∪ {j};
        send REQUEST(i) to j;
    set p_i to the number of replies needed;
    t_block_i ← t_i;
    wait_i ← true.

REQUEST_RECEIVE(j):
/* Executed by node i when it receives a request made by j. */
    in_i ← in_i ∪ {j}.

REPLY_SEND(j):
/* Executed by node i when it replies to a request by j. */
    in_i ← in_i − {j};
    send REPLY(i) to j.

REPLY_RECEIVE(j):
/* Executed by node i when it receives a reply from j to its request. */
    if valid reply for the current request
    then begin
        out_i ← out_i − {j};
        p_i ← p_i − 1;
        p_i = 0 →
            {wait_i ← false;
             ∀k ∈ out_i, send CANCEL(i) to k;
             out_i ← ∅.}
    end

CANCEL_RECEIVE(j):
/* Executed by node i when it receives a cancel from j. */
    if j ∈ in_i then in_i ← in_i − {j}.
Algorithm 10.3 Kshemkalyani–Singhal algorithm for the P-out-of-Q model.


A sweep of a WFG is a traversal of the WFG in which all messages are sent in the
direction of the WFG edges (outward sweep) or all messages are sent against
the direction of the WFG edges (inward sweep). In the outward sweep, the
algorithm records a snapshot of a distributed WFG. In the inward sweep, the
recorded distributed WFG is reduced to determine if the initiator is deadlocked. Both the outward and the inward sweeps are executed concurrently
in the algorithm. Complications are introduced because the two sweeps can
overlap in time at a process, i.e., the reduction of the WFG at a process can
begin before the WFG at that process has been completely recorded. The
algorithm deals with these complications.

System model
The system has n nodes, and every pair of nodes is connected by a logical
channel. An event in a computation can be an internal event, a message send
event, or a message receive event. Events are assigned timestamps using
Lamport’s clocks [29].
The computation messages can be either REQUEST, REPLY, or CANCEL messages. To execute a p_i-out-of-q_i request, an active node i sends q_i REQUESTs to q_i other nodes and remains blocked until it receives a sufficient number of REPLY messages. When node i blocks on node j, node j becomes a successor of node i and node i becomes a predecessor of node j in the WFG. A REPLY message denotes the granting of a request. A node i unblocks when p_i out of its q_i requests have been granted. When a node unblocks, it sends CANCEL messages to withdraw the remaining q_i − p_i requests it had sent.
Sending and receiving of REQUEST, REPLY, and CANCEL messages are
computation events. The sending and receiving of deadlock detection algorithm messages are algorithmic or control events.
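To make these computation events concrete, here is a minimal Python sketch of a node's local state together with the REQUEST_SEND and REPLY_RECEIVE events of Algorithm 10.3. The Node class and the send(message, destination) callback are our illustrative assumptions, and Lamport clock maintenance is elided.

    # Sketch of Algorithm 10.3's computation events for one node.  The Node
    # class and the `send(message, dest)` callback are illustrative, not part
    # of the original pseudocode; Lamport clock updates are elided.
    class Node:
        def __init__(self, ident):
            self.ident = ident
            self.wait = False      # wait_i: is the node blocked?
            self.t = 0             # t_i: current (Lamport) time
            self.t_block = 0       # t_block_i: local time of the last block
            self.in_set = set()    # in_i: nodes whose requests are outstanding here
            self.out = set()       # out_i: nodes this node is waiting on
            self.p = 0             # p_i: replies still needed for unblocking

        def request_send(self, targets, p, send):
            """Block on a p-out-of-len(targets) request (REQUEST_SEND)."""
            for j in targets:
                self.out.add(j)
                send(('REQUEST', self.ident), j)
            self.p = p
            self.t_block = self.t
            self.wait = True

        def reply_receive(self, j, send):
            """Process a REPLY from j (REPLY_RECEIVE)."""
            if not (self.wait and j in self.out):
                return                      # not a valid reply for the current request
            self.out.discard(j)
            self.p -= 1
            if self.p == 0:                 # unblocked: withdraw remaining requests
                self.wait = False
                for k in self.out:
                    send(('CANCEL', self.ident), k)
                self.out.clear()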

10.9.1 Informal description of the algorithm
When a node init blocks on a P-out-of-Q request, it initiates the deadlock detection algorithm. The algorithm records the part of the WFG that
is reachable from init (henceforth called init’s WFG) in a distributed
snapshot [4]; the distributed snapshot includes only those dependency edges
and nodes that form init’s WFG.
The distributed WFG is recorded using FLOOD messages in the outward
sweep and the recorded WFG is examined for deadlocks using ECHO messages in the inward sweep. To detect a deadlock, the initiator init records its
local state and sends FLOOD messages along all of its outward dependencies.
When node i receives the first FLOOD message along an existing inward
dependency, it records its local state. If node i is blocked at this time, it sends
out FLOOD messages along all of its outward dependencies to continue the recording of the WFG in the outward sweep. If node i is active at this time
(i.e., it does not have any outward dependencies and is a leaf node in the
WFG), then it initiates reduction of the WFG by returning an ECHO message along the incoming dependency even before the states of all incoming
dependencies have been recorded in the WFG snapshot at the leaf node.
ECHO messages perform reduction of the recorded WFG by simulating
the granting of requests in the inward sweep. A node i in the WFG is reduced
if it receives ECHOs along pi out of its qi outgoing edges indicating that pi
of its requests can be granted. An edge is reduced if an ECHO is received on
the edge indicating that the request it represents can be granted. After a local
snapshot has been recorded at node i, any transition made by i from idle to
active state is captured in the process of reduction. The nodes that can be
reduced do not form a deadlock whereas the nodes that cannot be reduced are
deadlocked. The order in which reduction of the nodes and edges of the WFG
is performed does not alter the final result. Node init detects the deadlock if
it is not reduced when the deadlock detection algorithm terminates.
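Because the order of reduction does not alter the final result, the net effect of the inward sweep on a fully recorded, static WFG can be captured by a simple worklist computation. The following Python sketch is our own illustration of that reduction rule, not the algorithm of [26]; it assumes the WFG has already been recorded and does not change.

    def reduce_wfg(out_edges, p):
        """Reduce a recorded static WFG (illustration, not from [26]).
        out_edges[v] lists the nodes v waits on; p[v] is the number of grants
        v still needs (0 means v is active).  Returns the set of nodes that
        cannot be reduced, i.e., the deadlocked ones."""
        need = dict(p)
        in_edges = {v: set() for v in out_edges}
        for u, succs in out_edges.items():
            for v in succs:
                in_edges[v].add(u)
        worklist = [v for v in out_edges if need[v] == 0]   # active nodes grant first
        while worklist:
            v = worklist.pop()
            for u in in_edges[v]:       # an ECHO from v simulates granting u's request
                need[u] -= 1
                if need[u] == 0:        # u is reduced; it can now grant its own waiters
                    worklist.append(u)
        return {v for v in out_edges if need[v] > 0}

    # A 1-out-of-1 wait cycle stays deadlocked; waiting on an active node does not:
    assert reduce_wfg({'X': ['Y'], 'Y': ['X']}, {'X': 1, 'Y': 1}) == {'X', 'Y'}
    assert reduce_wfg({'X': ['Y'], 'Y': []}, {'X': 1, 'Y': 0}) == set()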
In general, WFG reduction can begin at a non-leaf node before recording of
the WFG has been completed at that node; this happens when an ECHO message arrives and begins reduction at a non-leaf node before all the FLOODs
have arrived at it and recorded the complete local WFG at that node. Thus, the
activities of recording and reducing the WFG snapshot are done concurrently
in a single phase. Unlike the algorithm in [45], no serialization is imposed
between the two activities. Since a reduction is done on an incompletely
recorded WFG at nodes, the local snapshot at each node has to be carefully
manipulated so as to give the effect that WFG reduction is initiated after
WFG recording has been completed.
When multiple nodes block concurrently, they may each initiate the deadlock detection algorithm concurrently. Each invocation of the deadlock detection algorithm is treated independently and is identified by the initiator’s
identity and initiator’s timestamp when it blocked. Every node maintains a
local snapshot for the latest deadlock detection algorithm initiated by every
other node. We will describe only a single instance of the deadlock detection
algorithm.

The problem of termination detection
The algorithm requires a termination detection technique so that the initiator
can determine that it will not receive any more ECHO messages. The
algorithm uses a termination detection technique based on weights [20] in
conjunction with SHORT messages to detect the termination of the algorithm.
A weight of 1.0 at the initiator node, when the algorithm is initiated, is
distributed among all FLOOD messages sent out by the initiator. When
the first FLOOD is received at a non-leaf node, the weight of the received
FLOOD is distributed among the FLOODs sent out along outward edges at
that node to expand the WFG further. Since any subsequent FLOOD arriving at a non-leaf node does not expand the WFG further, its weight is returned to
the initiator in a SHORT message. When a FLOOD is received at a leaf node,
its weight is piggybacked to the ECHO sent by the leaf node to reduce the
WFG. When an ECHO arriving at a node unblocks the node, the weight of the
ECHO is distributed among the ECHOs that are sent by that node along the
incoming edges in its WFG snapshot. When an ECHO arriving at a node does
not unblock the node, its weight is sent directly to the initiator in a SHORT
message.
Note that the following invariant holds in an execution of the algorithm: the sum of the weights in FLOOD, ECHO, and SHORT messages
plus the weight at the initiator (received in SHORT and ECHO messages)
is always 1.0. The algorithm terminates when the weight at the initiator
becomes 1.0, signifying that all WFG recording and reduction activity has
completed.
FLOOD, ECHO, and SHORT messages carry weights for termination detection. Variable w, a real number in the range (0, 1], denotes the weight in a message.
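As a toy illustration of this weight accounting (our own sketch; exact rationals sidestep the floating-point round-off that a real implementation would have to tolerate):

    from fractions import Fraction

    def split(w, n):
        """Divide a message's weight w evenly among n outgoing messages."""
        return [w / n] * n

    w_init = Fraction(1)              # weight held by the initiator at the start
    floods = split(w_init, 2)         # initiator FLOODs two successors
    echoes = split(floods[0], 3)      # a reduced node ECHOs along 3 in-edges, say
    short = floods[1]                 # a redundant FLOOD's weight returns in a SHORT
    # Invariant: weight in transit plus weight back at the initiator is always 1.
    assert sum(echoes) + short == 1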

10.9.2 The algorithm
A node i stores the local snapshot for snapshots initiated by other nodes in a
data structure LSi (local snapshot), which is an array of records:
LS_i : array [1..n] of record;
A record has several fields to record snapshot related information and
is defined in Algorithm 10.4 for an initiator init. The deadlock detection
algorithm is defined by the following procedures: SNAPSHOT-INITIATE,
FLOOD-RECEIVE, ECHO-RECEIVE, and SHORT-RECEIVE. They are executed atomically.
Example We now illustrate the operation of the algorithm with the help
of an example [26]. Figure 10.3 shows initiation of deadlock detection by
node A and Figure 10.4 shows the state after node D is reduced. The notation
x/y beside a node in the figures indicates that the node is blocked and needs
replies to x out of the y outstanding requests to unblock.
In Figure 10.3, node A sends out FLOOD messages to nodes B and C.
When node C receives FLOOD from node A, it sends FLOODs to nodes D, E,
and F. If the node happens to be active when it receives a FLOOD message,
it initiates reduction of the incoming wait-for edge by returning an ECHO
message on it. For example, in Figure 10.3, node H returns an ECHO to node
D in response to a FLOOD from it. Note that a node can initiate reduction
(by sending back an ECHO in response to a FLOOD along an incoming
wait-for edge) even before the states of all other incoming wait-for edges
have been recorded in the WFG snapshot at that node.


LS_i[init].out : set of integers (:= ∅); /* nodes on which i is waiting in the snapshot */
LS_i[init].in  : set of integers (:= ∅); /* nodes waiting on i in the snapshot */
LS_i[init].t   : integer (:= 0);         /* time when init initiated snapshot */
LS_i[init].s   : boolean (:= false);     /* local blocked state as seen by snapshot */
LS_i[init].p   : integer;                /* value of p_i as seen in snapshot */

SNAPSHOT_INITIATE
/* Executed by node i to detect whether it is deadlocked. */
    init ← i;
    w_i ← 0;
    LS_i[init].t ← t_i;
    LS_i[init].out ← out_i;
    LS_i[init].s ← true;
    LS_i[init].in ← ∅;
    LS_i[init].p ← p_i;
    send FLOOD(i, i, t_i, 1/|out_i|) to each j in out_i.
    /* 1/|out_i| is the fraction of weight sent in each FLOOD message. */

FLOOD_RECEIVE(j, init, t_init, w)
/* Executed by node i on receiving a FLOOD message from j. */
[
    LS_i[init].t < t_init ∧ j ∈ in_i →      /* Valid FLOOD for a new snapshot. */
        LS_i[init].out ← out_i;
        LS_i[init].in ← {j};
        LS_i[init].t ← t_init;
        LS_i[init].s ← wait_i;
        wait_i = true →                      /* Node is blocked. */
            LS_i[init].p ← p_i;
            send FLOOD(i, init, t_init, w/|out_i|) to each k ∈ out_i;
        wait_i = false →                     /* Node is active. */
            LS_i[init].p ← 0;
            send ECHO(i, init, t_init, w) to j;
            LS_i[init].in ← LS_i[init].in − {j}.
    LS_i[init].t < t_init ∧ j ∉ in_i →      /* Invalid FLOOD for a new snapshot. */
        send ECHO(i, init, t_init, w) to j.
    LS_i[init].t = t_init ∧ j ∉ in_i →      /* Invalid FLOOD for current snapshot. */
        send ECHO(i, init, t_init, w) to j.
    LS_i[init].t = t_init ∧ j ∈ in_i →      /* Valid FLOOD for current snapshot. */
        LS_i[init].s = false →
            send ECHO(i, init, t_init, w) to j;
        LS_i[init].s = true →
            LS_i[init].in ← LS_i[init].in ∪ {j};
            send SHORT(init, t_init, w) to init.
    LS_i[init].t > t_init → discard the FLOOD message. /* Out-dated FLOOD. */
]
ECHO_RECEIVE(j, init, t_init, w)
/* Executed by node i on receiving an ECHO from j. */
[
    LS_i[init].t > t_init → discard the ECHO message. /* ECHO for out-dated snapshot. */
    LS_i[init].t < t_init → cannot happen.            /* ECHO for unseen snapshot. */
    LS_i[init].t = t_init →                            /* ECHO for current snapshot. */
        LS_i[init].out ← LS_i[init].out − {j};
        LS_i[init].s = false → send SHORT(init, t_init, w) to init.
        LS_i[init].s = true →
            LS_i[init].p ← LS_i[init].p − 1;
            LS_i[init].p = 0 →                         /* getting reduced */
                LS_i[init].s ← false;
                init = i → declare not deadlocked; exit.
                send ECHO(i, init, t_init, w/|LS_i[init].in|) to all k ∈ LS_i[init].in;
            LS_i[init].p ≠ 0 →
                send SHORT(init, t_init, w) to init.
]
SHORT_RECEIVE(init, t_init, w)
/* Executed by node i (which is always init) on receiving a SHORT. */
[
    t_init < t_block_i → discard the message.  /* SHORT for out-dated snapshot. */
    t_init > t_block_i → not possible.         /* SHORT for uninitiated snapshot. */
    t_init = t_block_i ∧ LS_i[init].s = false → discard. /* init is active. */
    t_init = t_block_i ∧ LS_i[init].s = true → /* SHORT for currently initiated snapshot. */
        w_i ← w_i + w;
        w_i = 1 → declare a deadlock.
]
Algorithm 10.4 Deadlock detection algorithm [26].
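To see how the procedures of Algorithm 10.4 cooperate, the following Python sketch simulates a single detection instance on a static WFG, in which no node blocks or unblocks during detection; the timestamp checks and invalid-FLOOD cases of the algorithm therefore never fire and are omitted. The function name, the message queue, and the dictionary-based snapshots are our own simplifications, not the original algorithm's control structure.

    from collections import deque
    from fractions import Fraction

    def detect(out_edges, p_needed, init):
        """Simulate one detection instance on a static WFG (our sketch).
        out_edges[v]: nodes v waits on; p_needed[v]: grants v needs (0 = active).
        Returns True iff init is declared deadlocked."""
        snap = {init: {'out': set(out_edges[init]), 'in': set(),
                       'p': p_needed[init], 's': True}}
        w_at_init = Fraction(0)                   # weight returned to the initiator
        msgs = deque(('FLOOD', init, j, Fraction(1, len(out_edges[init])))
                     for j in out_edges[init])
        while msgs:
            kind, frm, to, w = msgs.popleft()
            if kind == 'SHORT':
                w_at_init += w
            elif kind == 'FLOOD':
                if to not in snap:                # first FLOOD: record local state
                    blocked = p_needed[to] > 0
                    snap[to] = {'out': set(out_edges[to]), 'in': {frm},
                                'p': p_needed[to], 's': blocked}
                    if blocked:                   # continue the outward sweep
                        for k in out_edges[to]:
                            msgs.append(('FLOOD', to, k, w / len(out_edges[to])))
                    else:                         # leaf: start the inward sweep
                        snap[to]['in'].discard(frm)
                        msgs.append(('ECHO', to, frm, w))
                elif snap[to]['s']:               # duplicate FLOOD at a blocked node
                    snap[to]['in'].add(frm)
                    msgs.append(('SHORT', to, init, w))
                else:                             # FLOOD at an already-reduced node
                    msgs.append(('ECHO', to, frm, w))
            elif kind == 'ECHO':
                s = snap[to]
                s['out'].discard(frm)
                if not s['s']:                    # ECHO at an already-reduced node
                    msgs.append(('SHORT', to, init, w))
                else:
                    s['p'] -= 1
                    if s['p'] == 0:               # node is reduced ("granted")
                        s['s'] = False
                        if to == init:
                            return False          # initiator reduced: no deadlock
                        for k in s['in']:         # propagate ECHOs along in-edges
                            msgs.append(('ECHO', to, k, w / len(s['in'])))
                    else:
                        msgs.append(('SHORT', to, init, w))
        assert w_at_init == 1                     # all weight came home: terminated
        return True                               # initiator never reduced: deadlock

    # A 1-out-of-1 wait cycle is a deadlock; adding an active helper H that can
    # grant B's 1-out-of-2 request breaks it:
    assert detect({'A': ['B'], 'B': ['C'], 'C': ['A']},
                  {'A': 1, 'B': 1, 'C': 1}, 'A') is True
    assert detect({'A': ['B'], 'B': ['C', 'H'], 'C': ['A'], 'H': []},
                  {'A': 1, 'B': 1, 'C': 1, 'H': 0}, 'A') is False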


[Figure 10.3 An example run of the algorithm – initiation of deadlock detection by node A [26]. The figure shows a WFG over nodes A–I with edges marked REQUEST, FLOOD, REPLY, and ECHO; the label x/y beside a node indicates that the node is blocked and needs replies to x of its y outstanding requests.]

[Figure 10.4 An example run of the algorithm – the state after node D is reduced [26]. Same notation as Figure 10.3.]


For example, node F in Figure 10.3 starts reduction after receiving a FLOOD from C even before
it has received FLOODs from D and E.
Note that when a node receives a FLOOD, it need not have an incoming
wait-for edge from the node that sent the FLOOD because it may have already
sent back a REPLY to the node. In this case, the node returns an ECHO in
response to the FLOOD. For example, in Figure 10.3, when node I receives
a FLOOD from node D, it returns an ECHO to node D.
ECHO messages perform reduction of the nodes and edges in the WFG by simulating the granting of requests in the inward sweep. A node that is waiting on a p_i-out-of-q_i request gets reduced after it has received p_i ECHOs. When a node is reduced, it sends ECHOs along all the incoming wait-for edges incident on it in the WFG snapshot to continue the progress of the inward sweep.
In general, WFG reduction can begin at a non-leaf node before recording of the WFG has been completed at that node. This happens when
ECHOs arrive and begin reduction at a non-leaf node before FLOODs have
arrived along all incoming wait-for edges and recorded the complete local
WFG at that node. For example, node D in Figure 10.3 starts reduction
(by sending an ECHO to node C) after it receives ECHOs from H and
G, even before FLOOD from B has arrived at D. When a FLOOD on
an incoming wait-for edge arrives at a node which is already reduced, the
node simply returns an ECHO along that wait-for edge. For example, in
Figure 10.4, when a FLOOD from node B arrives at node D, node D returns an
ECHO to B.
In Figure 10.3, node C receives a FLOOD from node A followed by a
FLOOD from node B. When node C receives a FLOOD from B, it sends a
SHORT to the initiator node A. When a FLOOD is received at a leaf node,
its weight is returned in the ECHO message sent by the leaf node to the
sender of the FLOOD. Note that an ECHO is like a reply in the simulated
unblocking of processes. When an ECHO arriving at a node does not reduce
the node, its weight is sent directly to the initiator through a SHORT message.
For example, in Figure 10.3, when node D receives an ECHO from node H, it
sends a SHORT to the initiator node A. When an ECHO that arrives at a node
reduces that node, the weight of the ECHO is distributed among the ECHOs
that are sent by that node along the incoming edges in its WFG snapshot.
For example, in Figure 10.4, at the time node C gets reduced (after receiving
ECHOs from nodes D and F), it sends ECHOs to nodes A and B. (When node
A receives an ECHO from node C, it is reduced and it declares no deadlock.)
When an ECHO arrives at a reduced node, its weight is sent directly to the
initiator through a SHORT message. For example, in Figure 10.4, when an
ECHO from node E arrives at node C after node C has been reduced (by
receiving ECHOs from nodes D and F), node C sends a SHORT to initiator
node A.


Correctness
Proving the correctness of the algorithm involves showing that it satisfies the
following conditions:
1. The execution of the algorithm terminates.
2. The entire WFG reachable from the initiator is recorded in a consistent
distributed snapshot in the outward sweep.
3. In the inward sweep, ECHO messages correctly reduce the recorded snapshot of the WFG.
The algorithm is initiated within a timeout period after a node blocks on a
P-out-of-Q request. On the termination of the algorithm, exactly those nodes that are not reduced are deadlocked. For a correctness proof of the algorithm,
the readers are referred to the original source [26].

Complexity analysis
The message complexity of the algorithm has been analyzed in [26]. The
algorithm has a message complexity of 4e − 2n + 2l and a time complexity of 2d hops (time complexity here denotes the delay in detecting a deadlock after its detection has been initiated), where e is the number of edges, n the number of nodes, l the number of leaf nodes, and d the diameter of the WFG. This is better than two-phase algorithms for detecting generalized deadlocks and gives the best time complexity that can be achieved by an algorithm that reduces a distributed WFG to detect generalized deadlocks in distributed systems.
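As a quick hypothetical check of these bounds, consider a WFG with e = 20 edges, n = 10 nodes, l = 4 leaf nodes, and diameter d = 3: detection would cost 4 × 20 − 2 × 10 + 2 × 4 = 68 messages and complete within 2 × 3 = 6 hops.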

10.10 Chapter summary
Out of the three approaches to handle deadlocks, deadlock detection is the
most promising in distributed systems. Detection of deadlocks requires performing two tasks: first, maintaining the WFG or constructing it whenever needed; second, searching the WFG for a deadlock condition (cycles or knots).
In distributed deadlock-detection algorithms, every site maintains a portion
of the global state graph and every site participates in the detection of a
global cycle or knot. Due to lack of globally shared memory, design of
distributed deadlock-detection algorithms is difficult because sites may report
the existence of a global cycle after seeing its segments at different instants
(though all the segments never existed simultaneously).
Distributed deadlock detection algorithms can be divided into four classes:
path-pushing, edge-chasing, diffusion computation, and global state detection.
In path-pushing algorithms, wait-for dependency information of the global
WFG is disseminated in the form of paths (i.e., a sequence of wait-for dependency edges). In edge-chasing algorithms, special messages called probes are circulated along the edges of the WFG to detect a cycle. When a blocked
process receives a probe, it propagates the probe along its outgoing edges in
the WFG. A process declares a deadlock when it receives a probe initiated
by it. Diffusion computation type algorithms make use of echo algorithms to
detect deadlocks. Deadlock detection messages are successively propagated
(i.e, “diffused” through) through the edges of the WFG. Global state detectionbased algorithms detect deadlocks by taking a snapshot of the system and by
examining it for the condition of a deadlock.
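As a rough illustration of the edge-chasing idea summarized above, the following Python sketch (our own, for a static AND-model WFG in which every process is blocked) propagates a probe depth-first and declares a deadlock when the probe returns to its initiator:

    def probe_finds_cycle(succ, initiator):
        """succ[v] lists the processes v waits on (our sketch, static WFG).
        A probe started by `initiator` is forwarded by every blocked process
        along its outgoing wait-for edges; a deadlock is declared if the
        probe comes back to its initiator."""
        stack, visited = [initiator], set()
        while stack:
            v = stack.pop()
            for k in succ.get(v, []):
                if k == initiator:        # probe returned to its initiator
                    return True
                if k not in visited:      # blocked process forwards the probe
                    visited.add(k)
                    stack.append(k)
        return False

    assert probe_finds_cycle({'P1': ['P2'], 'P2': ['P3'], 'P3': ['P1']}, 'P1')
    assert not probe_finds_cycle({'P1': ['P2'], 'P2': []}, 'P1')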

10.11 Exercises
Exercise 10.1 Consider the following simple approach to handle deadlocks in distributed systems by using “time-outs”: a process that has waited for a specified period
for a resource declares that it is deadlocked and aborts to resolve the deadlock. What
are the shortcomings of using this method?
Exercise 10.2 Suppose all the processes in the system are assigned priorities which
can be used to totally order the processes. Modify Chandy et al.’s algorithm for the
AND model so that when a process detects a deadlock, it also knows the lowest
priority deadlocked process.
Exercise 10.3 Show that, in the AND model, false deadlocks can occur due to deadlock resolution in distributed systems [43]. Can something be done about it, or are they bound to happen?
Exercise 10.4 Show that in the Kshemkalyani–Singhal algorithm for the P-out-of-Q
model, if the weight at the initiator process becomes 1.0, then the initiator is involved
in a deadlock.

10.12 Notes on references
Two survey articles on distributed deadlock detection can be found in papers by
Knapp [22] and Singhal [43]. The literature is full of distributed deadlock detection
algorithms. Path-pushing distributed deadlock detection algorithms can be found in
papers by Gligor and Shattuck [11], Menasce and Muntz [33], Ho and Ramamoorthy [18], and Obermarck [38]. Other edge-chasing distributed deadlock detection
algorithms can be found in papers by Choudary et al. [7], and Kshemkalyani and
Singhal [27]. Herman and Chandy [16] discuss detection of deadlocks in the AND/OR
model. In [24], Kshemkalyani and Singhal give an optimal algorithm to detect distributed deadlocks under the generalized request model. Other algorithms to detect
generalized deadlocks include Bracha and Toueg [2] and Wang et al. [45].
In [25], Kshemkalyani and Singhal give a characterization of distributed deadlocks.
A rigorous correctness proof of a distributed deadlock detection algorithm is given in
Kshemkalyani and Singhal [27]. Brezezinski et al. [3] discuss deadlock models under very generalized blocking conditions. Two knot detection algorithms in distributed systems are given in Chandy and Misra [34] and Manivannan and Singhal [31].
Gray et al. [12] present a simple analysis of the probability of deadlocks in database
systems. Lee and Kim [30] present a performance analysis of distributed deadlock
detection algorithms. Other algorithms for deadlock detection in distributed systems
can be found in [1, 8–10, 13–15, 17, 21, 23, 28, 32, 36, 37, 39, 40, 41, 44]. Wu et al. [46]
present an algorithm to avoid distributed deadlock in the AND model.

References
[1] B. Awerbuch and S. Micali, Dynamic deadlock resolution protocols, in Proceedings of the Foundations of Computer Science, Toronto, Canada, 1986, 196–207.
[2] G. Bracha and S. Toueg, Distributed deadlock detection, Distributed Computing,
2(3), 1987, 127–138.
[3] J. Brezezinski, J. M. Helary, M. Raynal, and M. Singhal, Deadlock models and
generalized algorithm for distributed deadlock detection, Journal of Parallel and
Distributed Computing, 31(2), 1995, 112–125.
[4] K. M. Chandy and L. Lamport, Distributed snapshots: determining global states
of distributed systems, ACM Transactions on Computer Systems, 3(1), 1985, 63–75.
[5] K. M. Chandy and J. Misra, A distributed algorithm for detecting resource deadlocks in distributed systems, Proceedings of the ACM Symposium on Principles
of Distributed Computing, Ottawa, Canada, August 1982, 157–164.
[6] K. M. Chandy, J. Misra, and L. M. Haas, Distributed deadlock detection, ACM
Transactions on Computer Systems, 1(2), 1983, 144–156.
[7] A. Choudhary, W. Kohler, J. Stankovic, and D. Towsley, A modified priority
based probe algorithm for distributed deadlock detection and resolution, IEEE
Transactions on Software Engineering, 15(1) 1989, 10–17.
[8] J. R. G. de Mendivil, F. Farina, J. Garitagoitia, C. F. Alastruey, and J. M.
Bernabeu-Auban, A distributed deadlock resolution algorithm for the AND
model, IEEE Transactions on Parallel and Distributed Systems, 10(5), 1999,
433–447.
[9] A. K. Elmagarmid, N. Soundararajan, and M. T. Liu, A distributed deadlock
detection and resolution algorithm and its correctness proof, IEEE Transactions
on Software Engineering, 14(10), 1988, 1443–1452.
[10] M. Flatebo and A. K. Datta, Self-stabilizing deadlock detection algorithms, Proceedings of the 1992 ACM Annual Conference on Communications, Kansas City,
Missouri, March 1992, 117–122.
[11] V. Gligor and S. Shattuck, On deadlock detection in distributed databases, IEEE
Transactions on Software Engineering, SE-6(5), 1980, 435–440.
[12] J. N. Gray, P. Homan, H. F. Korth, and R. L. Obermarck, A straw man analysis
of the probability of waiting and deadlock in a database system, Technical Report
RJ 3066, IBM Research Laboratory, San Jose, CA, 1981.
[13] L. M. Haas, Two approaches to deadlock detection in distributed systems. Ph.D.
dissertation, Department of Computer Sciences, University of Texas, Austin, TX,
1981.
[14] L. M. Haas and C. Mohan, A distributed deadlock detection algorithm for a
resource-based system, Research Report RJ 3765, IBM Research Laboratory,
San Jose, CA, 1983.