
# 2 Ω? Is the Weakest Oracle for SSLE over Rings


On the Power of Ω? for SSLE in Population Protocols


Algorithm 2. Protocol RingDetector (initiator x, responder y)

(if the master is the responder, it creates a white probe)
if master_y = 1 then probe_y ← 0
(raise flags if needed)
[…]
    flag_x ← flag_y ← 1
(move probe from y to x)
if probe_y ≠ ⊥ then
    (the probe becomes black when meeting a token)
    if token_x ≠ ⊥ then probe_x ← 1
    (otherwise, keeps the same color or merges)
    else if probe_x ∈ {⊥, 0} then probe_x ← probe_y
    probe_y ← ⊥
end
(if the master receives a white probe, it […])
if master_x = 1 and probe_x = 0 then
    token_x ← flag_x
(move token from x to y)
if token_x ≠ ⊥ then
    (the token becomes black when meeting a flag)
    if flag_y = 1 then token_y ← 1
    (otherwise, keeps the same color or merges)
    else if token_y ∈ {⊥, 0} then token_y ← token_x
    (the flag is cleared)
    flag_y ← 0
    token_x ← ⊥
end
(if the master receives a token, it changes its output and whitens the token)
if master_y = 1 and token_y ≠ ⊥ then
    out_y ← token_y
    token_y ← 0
(a non-master responder copies the output of the initiator)
if master_y = 0 then out_y ← out_x
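As an illustration, the transition rules of Algorithm 2 can be transcribed into a short Python sketch. It is not the authors' code: the names Agent and interact are introduced here, ⊥ is modelled as None, and the guard for raising flags (the initiator or the responder currently holds a leader) is an assumption inferred from the proof of Lemma 4, because the guard cannot be read from the listing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Agent:
    master: int = 0               # input: 1 only at the master
    leader: int = 0               # input: 1 if the agent currently holds a leader
    probe: Optional[int] = None   # None = no probe, 0 = white, 1 = black
    token: Optional[int] = None   # None = no token, 0 = white, 1 = black
    flag: int = 0
    out: int = 0


def interact(x: Agent, y: Agent) -> None:
    """One transition with initiator x and responder y (hypothetical transcription)."""
    # A master acting as responder creates a white probe.
    if y.master == 1:
        y.probe = 0
    # ASSUMPTION: flags are raised when either participant holds a leader.
    if x.leader == 1 or y.leader == 1:
        x.flag = y.flag = 1
    # Move the probe from y to x; it turns black when it meets a token.
    if y.probe is not None:
        if x.token is not None:
            x.probe = 1
        elif x.probe in (None, 0):
            x.probe = y.probe
        y.probe = None
    # A master holding a white probe creates a token colored by its own flag.
    if x.master == 1 and x.probe == 0:
        x.token = x.flag
    # Move the token from x to y; it turns black when it meets a flag.
    if x.token is not None:
        if y.flag == 1:
            y.token = 1
        elif y.token in (None, 0):
            y.token = x.token
        y.flag = 0
        x.token = None
    # A master receiving a token outputs its color and whitens it.
    if y.master == 1 and y.token is not None:
        y.out = y.token
        y.token = 0
    # A non-master responder copies the output of the initiator.
    if y.master == 0:
        y.out = x.out
```

For example, calling interact on an initiator that holds a white token and a responder whose flag is raised leaves the responder with a black token and a cleared flag, which is the mechanism used in case (c) of Lemma 4.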

the variable leader (resp. master) in the input assignment α at x. The following

lemma states that, eventually, a unique token circulates in the ring.

Lemma 3. In any configuration C ∈ IRCE, there is exactly one token (white or black) in C, i.e., there exists a unique agent x such that C(x).token ≠ ⊥.

Proof (Sketch). If there are no tokens, some probe sent by the master returns to the master with the color white (recall that probes and tokens move in opposite directions). This causes the master to fire a token. Two colliding tokens merge into one, which implies that from then on there is always at least one token. In particular, all the probes sent by the master then return to the master with the color black; thus no more tokens are created. Moreover, thanks to global fairness, if there are several tokens, they eventually all merge into a unique token.

This unique circulating token (from the lemma above) allows us to divide the execution into rounds. We define a round to be a segment of SE that begins with the token loaded at the master and ends right before the token returns to the master. The following lemma describes the output of the master at the end of each round.

J. Beauquier et al.

Lemma 4. Consider a round R in SE. We denote by (C_0, α_0) ... (C_r, α_r) the corresponding sequence of configurations and input assignments.

Case (a) If there are no leaders during R, i.e., for every 0 ≤ i ≤ r and every agent x, we have α_i(x).leader = 0, then after the last action of the round, all the agents have their flags cleared (set to zero).

Case (b) If there are no leaders during R, and if all the agents have their flags cleared at the beginning of the round, then after the last action of the round, the master outputs 0 and all the agents have their flags cleared.

Case (c) If there is at least one leader in each assignment during R, i.e., for every 0 ≤ i ≤ r, there is some agent x_i such that α_i(x_i).leader = 1, then after the last action of the round, the master outputs 1.

Proof (Sketch). We prove only case (c) here; full proof details are presented in [6]. Assume that there is a leader in each input assignment. Let μ be an agent that holds a leader in assignment α_0, i.e., α_0(μ).leader = 1. During the round, there must be some i such that μ = v_i is the responder and the initiator u_i holds the token. If μ holds a leader in assignment α_i, then after the transition, the token must have turned black. If μ does not hold a leader in assignment α_i, then, since μ did hold a leader in assignment α_0, there must be some j < i such that α_j(μ).leader = 1 and α_{j+1}(μ).leader = 0. Now, since the input trace is compatible with the schedule, μ must be the initiator u_j or the responder v_j in the transition (C_j, α_j) → C_{j+1}. Hence, μ must raise its flag, i.e., we have C_{j+1}(μ).flag = 1 (with j + 1 ≤ i). Recall that there is a unique token, so the flag cannot be cleared during the remaining actions until i. Hence, at i, the token turns black when it moves from the initiator u_i to the responder v_i = μ. In all cases, the master receives a black token at the end of the round, and thus outputs 1.

Theorem 3. The protocol RingDetector is a self-stabilizing implementation of

Ω? using ELE (i.e., ELE Ω?) over oriented rings. Moreover, Ω? rings ELE

(by [18]), and thus Ω? is the weakest oracle for solving ELE over rings.

Proof (Sketch). Full proof details are presented in [6]. We divide the execution into rounds as defined above. If no leader is ever present, then Lemma 4 ensures that after a finite number of rounds, the master permanently outputs 0. If there is a leader in each input assignment, then Lemma 4 ensures that after a finite number of rounds, the master permanently outputs 1. In both cases, the propagation of the master's output ensures that the output trace of the protocol satisfies the conditions of the oracle Ω? (see Sect. 3.2 for its definition).

References

1. Angluin, D., Aspnes, J., Diamadi, Z., Fischer, M.J., Peralta, R.: Computation in

networks of passively mobile finite-state sensors. Distrib. Comput. 18(4), 235–253

(2006)

2. Angluin, D., Aspnes, J., Eisenstat, D.: Fast computation by population protocols

with a leader. Distrib. Comput. 21(3), 183–199 (2008)

3. Angluin, D., Aspnes, J., Eisenstat, D., Ruppert, E.: The computational power of

population protocols. Distrib. Comput. 20(4), 279–304 (2007)

4. Angluin, D., Aspnes, J., Fischer, M.J., Jiang, H.: Self-stabilizing population protocols. ACM Trans. Auton. Adapt. Syst. 3(4), 13 (2008)

5. Beauquier, J., Blanchard, P., Burman, J.: Self-stabilizing leader election in population protocols over arbitrary communication graphs. In: Baldoni, R., Nisse, N., Steen, M. (eds.) OPODIS 2013. LNCS, vol. 8304, pp. 38–52. Springer, Heidelberg (2013). doi:10.1007/978-3-319-03850-6_4

6. Beauquier, J., Blanchard, P., Burman, J., Denysyuk, O.: On the power of oracle

omega? for self-stabilizing leader election in population protocols. Technical report,

INRIA (2016). http://hal.archives-ouvertes.fr/hal-00839759

7. Beauquier, J., Blanchard, P., Burman, J., Kutten, S.: The weakest oracle for symmetric consensus in population protocols. In: Bose, P., Gąsieniec, L.A., Römer, K., Wattenhofer, R. (eds.) ALGOSENSORS 2015. LNCS, vol. 9536, pp. 41–56. Springer, Heidelberg (2015). doi:10.1007/978-3-319-28472-9_4

8. Beauquier, J., Burman, J.: Self-stabilizing synchronization in mobile sensor networks with covering. In: Rajaraman, R., Moscibroda, T., Dunkels, A., Scaglione, A. (eds.) DCOSS 2010. LNCS, vol. 6131, pp. 362–378. Springer, Heidelberg (2010). doi:10.1007/978-3-642-13651-1_26

9. Beauquier, J., Burman, J., Clement, J., Kutten, S.: On utilizing speed in networks

of mobile agents. In: PODC, pp. 305–314. ACM (2010)

10. Bonnet, F., Raynal, M.: Anonymous asynchronous systems: the case of failure

detectors. In: DISC, pp. 206–220 (2010)

11. Cai, S., Izumi, T., Wada, K.: How to prove impossibility under global fairness: on

space complexity of self-stabilizing leader election on a population protocol model.

Theory Comput. Syst. 50(3), 433–445 (2012)

12. Canepa, D., Potop-Butucaru, M.G.: Self-stabilizing tiny interaction protocols. In:

WRAS, pp. 10:1–10:6 (2010)

13. Chandra, T.D., Hadzilacos, V., Toueg, S.: The weakest failure detector for solving

consensus. J. ACM 43(4), 685–722 (1996)

14. Chandra, T.D., Toueg, S.: Unreliable failure detectors for reliable distributed systems. J. ACM 43(2), 225–267 (1996)

15. Charron-Bost, B., Hutle, M., Widder, J.: In search of lost time. Inf. Process. Lett.

110(21), 928–933 (2010)

16. Cornejo, A., Lynch, N.A., Sastry, S.: Asynchronous failure detectors. In: PODC,

pp. 243–252 (2012)

17. Dijkstra, E.W.: Self-stabilizing systems in spite of distributed control. Commun.

ACM 17(11), 643–644 (1974)

18. Fischer, M., Jiang, H.: Self-stabilizing leader election in networks of finite-state anonymous agents. In: Shvartsman, A.A. (ed.) OPODIS 2006. LNCS, vol. 4305, pp. 395–409. Springer, Heidelberg (2006). doi:10.1007/11945529_28

19. Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of consensus with one faulty process. J. ACM 32(2), 374–382 (1985)

20. Michail, O., Chatzigiannakis, I., Spirakis, P.G.: Mediated population protocols.

Theor. Comput. Sci. 412(22), 2434–2450 (2011)

21. Mizoguchi, R., Ono, H., Kijima, S., Yamashita, M.: On space complexity of self-stabilizing leader election in mediated population protocol. Distrib. Comput. 25(6), 451–460 (2012)

Self-stabilizing Byzantine-Tolerant Distributed Replicated State Machine

Alexander Binun (1), Thierry Coupaye (2), Shlomi Dolev (1), Mohammed Kassi-Lahlou (2), Marc Lacoste (2), Alex Palesandro (2), Reuven Yagel (1,3), and Leonid Yankulin (4)

(1) Department of Computer Science, Ben-Gurion University of the Negev, Beersheba, Israel, {binun,dolev}@cs.bgu.ac.il
(2) Orange Labs, Lannion, France
(3) Azrieli - Jerusalem College of Engineering, Jerusalem, Israel
(4) Open University of Israel, Ra'anana, Israel

Abstract. The replicated state machine is a fundamental concept used for obtaining fault-tolerant distributed computation. Legacy distributed computational architectures (such as Hadoop or Zookeeper) are designed to tolerate crashes of individual machines. Later, Byzantine fault-tolerant Paxos as well as self-stabilizing Paxos were introduced. Here we present, for the first time, the self-stabilizing Byzantine fault-tolerant version of a distributed replicated machine. It can cope with any adversarial takeover of fewer than one third of the participating replicas. It also ensures automatic recovery following any transient violation of the system state, in particular after periods in which more than one third of the participants are Byzantine. A prototype of a self-stabilizing Byzantine-tolerant replicated Hadoop master node has been implemented. Experiments show that fully distributed recovery of cloud infrastructures against Byzantine faults can be made practical when relying on self-stabilization in local nodes. Thus, automated cloud protection against a wide variety of faults and attacks is possible.

© Springer International Publishing AG 2016
B. Bonakdarpour and F. Petit (Eds.): SSS 2016, LNCS 10083, pp. 36–53, 2016.
DOI: 10.1007/978-3-319-49259-9_4

1 Introduction

Computing and communication systems are in transition to become a commodity just like electricity, where clients are served by companies that supply state-of-the-art computing services. Availability and security are the most important aspects of such modern computing systems, i.e., systems that should always be up and operating while protecting the clients' (privacy and) security. Self-stabilization [13] fosters availability and security, capturing the ability of systems to recover automatically following a temporary violation of the assumptions made for the system to work properly. Self-stabilization is a property that every ongoing system should have, as self-stabilizing systems automatically recover from unanticipated states, for example, states that have been reached due to insufficient error detection in messages, changes of bit values in memory [17], or any temporary violation of the assumptions made for the system to operate correctly. The approach is comprehensive, rather than one addressing specific fault scenarios (thus risking missing a scenario that will appear later). In the self-stabilization paradigm, the designer considers every arbitrary system state (not necessarily desired or even consistent). The designer must then prove that, from every arbitrary configuration, the system execution converges to exhibit the desired behavior.

Thus, self-stabilizing systems do not rely on the consistency of an initial configuration and on the application of correct steps thereafter. In contrast, self-stabilizing systems assume that consistency can be broken initially or along the execution, and that the system will need to recover automatically thereafter. The designers assume an arbitrary configuration and prove convergence not because they would like the system to be started in an arbitrary configuration, but because consistency preservation that relies on the safety of the initial configuration and on inductive arguments based on applying only predefined allowed steps is usually broken. The system must eventually and automatically regain consistency, even if it is lost in the middle of the recovery process. This exhibits safer behavior than that of non-stabilizing systems, namely, initially safe and eventually safe [12].

Security and privacy are concerns that should be integrated into self-stabilizing systems to mitigate cyber attacks, e.g., [35].

The focus of the paper is a self-stabilizing Byzantine fault-tolerant replicated service that responds to requests from clients. Such services are state-based and will therefore be referred to as replicated state machines.

State of the art. Self-stabilizing systems should be constructed over self-stabilizing hardware; otherwise the stabilization of the system may be blocked by, e.g., the fact that the microprocessor is in a halt state [17]. Fortunately, self-stabilizing components can be built by composition. Once the underlying hardware stabilizes, a self-stabilizing operating system [28] and a self-stabilizing hypervisor (that copes with attacks of malicious agents) [2,29] yield the self-stabilizing infrastructure.

Despite recent significant advances in applying self-stabilization techniques and concepts to system design (e.g., [7]), industrial prototypes and products incorporating self-stabilization still do not exist. Recently proposed techniques include self-stabilizing hardware and boot programs [18], stabilization-preserving compilers [19,34], and high-level approaches facilitating development, such as the application recovery tool [8], the recovery-oriented computing paradigm [5], and the automatic creation of (stabilizing) programs [6]. A self-stabilizing distributed file system has been implemented [20].

An important aspect that needs to be integrated into self-stabilizing systems is security. One approach to addressing security attacks is to assume that some computing devices (usually a minority) are controlled by an adversary that has taken over their actions. The term Byzantine, or malicious (device, processor, process, etc.), is used for such components. The design of self-stabilizing systems that cope with Byzantine processes has been investigated in, e.g., [3,21,23–27,29]. In particular, Ostrovsky and Yung [36] introduced the notion of a mobile […] still, at any given time the number of Byzantine components is restricted by a certain threshold.

Repeated consensus for implementing replicated state machines is one of the core research fields in distributed computing and distributed systems, as it has proven to be a great abstraction for practical implementations of distributed systems. The capability to tolerate crashes in distributed systems allows us to utilize the redundancy in the system; otherwise, a single crash can block the entire system functionality. Thus, asynchronous consensus, which is always safe and in practice terminates (by the use of failure detector heuristics), has been investigated for decades and serves the industry in implementing robust, highly available replicated state machines based, for example, on Paxos [4,38]. Unfortunately, systems are still not robust enough, and companies suffer service breaks due to unavoidable faults, sometimes called transient faults. For example, single event upsets cause bits to be changed in memory and may drive the system to an unpredicted state, possibly changing the consensus version counter to its maximal value at once. Thus, there is a great need for automatic recovery from any arbitrary initial state. Such automatic recovery can be achieved by designing the system to be self-stabilizing. Once transient faults cease and the system is left in an arbitrary state, the system converges to the desired behavior through the regular execution, by the distributed system components, of their (hardwired) programs.

Another important facet of fault tolerance, beyond crashes and cooperative automatic recovery (where all components execute their code to ensure convergence), is the never-stopping (not only passive) malicious behavior of Byzantine components. Byzantine behavior can be attributed to unpredictable never-stopping faults or to a compromised component controlled by an adversary. Obviously, a distributed system designed to cope with Byzantine components controlled by the most powerful malicious adversary will withstand any behavior of these components, including unsophisticated crashes. Thus, when a system tolerates Byzantine faults, there is no risk of neglecting a fault that can happen in practice. Clearly, only a portion of the system can be constantly Byzantine; otherwise any system behavior is possible. Thus, we are interested in a self-stabilizing replicated state machine that withstands a portion of the components being Byzantine (including those that crashed). In a system that is designed to cope only with Byzantine faults, there is a risk of losing consistency when the non-Byzantine portions experience short-lived transient faults (say, due to an electricity spike). The consistency of self-stabilizing systems is not built on the consistency of the initial state and induction over the execution of the (long sequence of) allowed steps. Over the (possibly years of) execution, a self-stabilizing system rebuilds consistency by executing seamless consistency maintenance steps, which are a (fortunate) side effect of the steps executed during regular execution. This occurs while the system exhibits the desired behavior, both when no transient fault occurs and after convergence from an illegal state.


We present the ﬁrst self-stabilizing Byzantine replicated state machine for

semi-synchronous settings [11,27].

For any system, especially a distributed one, anti-attack protection measures have been viewed as an endless series of steps in which each new counter-measure is introduced to mitigate an upcoming attack or failure, until the next unexpected event occurs. This is particularly the case for the dependability of cloud systems, which are becoming the infrastructure used by many mission-critical software systems. Security and safety concerns impact core cloud features. For example, the expectations from resource sharing, elasticity, or virtualization can grow deeper and broader than an initially unsuspecting researcher or engineer anticipates. What if threats and failures were evolving faster than defense mechanisms? What if stacking so many counter-measure mechanisms was simply not fast enough? We take the approach of admitting that faults or attacks will occur no matter what. Emphasis should instead be placed on a combination of attack mitigation and graceful automatic recovery. These measures should follow overwhelming (possibly zero-day) attacks that drive the system into an inconsistent state.

The combination of self-stabilization with Byzantine fault tolerance should

become the standard property for cloud infrastructures, especially for the cloud

management replicated state machine (e.g., Chubby, Zookeeper, Hadoop). Such

a robust design is enabled by the algorithms and architecture presented here.

How to build such clouds? We build on previous research addressing the single-host layer, in the hypervisor [2]. The stabilization of a single machine in the distributed system is based on the hardware watchdog (see [18] and the references therein), which bootstraps larger and larger tiers of monitors and consistency establishment. Thus, when a single participant loses its consistency, the participant automatically regains it. Still, one would like to avoid using the self-stabilization property as the sole means of mitigating attacks. This is because self-stabilization requires a (sometimes expensive) recovery period in which the system is not operating as it should. Moreover, allowing attackers to drive the machine into an arbitrary inconsistent state may imply the loss of the memorized state and a transition to a consistent state (even a default or initial consistent state) that may not be correlated with the operations done in the past. Thus, we advocate the use of any known mitigation tool to obtain a super-secure-stabilization design (as in [2]), where only zero-day or overwhelming attacks, as well as transient faults, imply the need for the self-stabilization convergence fall-back. This is similar to the case in which super-stabilizing Byzantine fault-tolerant algorithms do not ignore dynamic changes of the communication graph but try to address them to avoid stabilization convergence.

Once participants are designed to be super-secure-stabilizing, we turn to the core coordination entity of data-centers and clouds, which is based on implementing a replicated state machine. Replicated state machines exploit the distributed architecture to overcome the possibility of a single point of failure in the mission-critical entity of the system. Variants of Paxos [32,33] are used in practice to implement the replicated state machine. We present for the first time a self-stabilizing Byzantine-tolerant replicated state machine. The design allows fewer than one third of the participants to be malicious (Byzantine). Thus, as long as, at any given time, fewer than one third of the participants are in an inconsistent state or are recovering, the system continues to operate as required.

Elaboration on the self-stabilizing techniques. The local state of each node or host is periodically monitored and updated by the stabilization manager instance running at the host, partly at the BIOS level. It enforces self-stabilization on the local hypervisor. In [2] we presented the architecture for a self-stabilizing hypervisor that is able to recover in the presence of Byzantine faults regardless of the state it is currently in.

Active monitoring. We assume that consistency may be lost due to overwhelming zero-day attacks or transient faults, as in [40]. Thus, the state of each machine in the system should always be refreshed to reflect reality. For example, assume the replicated state machine managing the data-center maintains the list of the active (slave) machines and the jobs they were assigned. It is possible that, after an overwhelming attack or transient fault, the replicated state machine converges to an internally consistent state whose data on the slaves and their assigned jobs is totally different from the actual situation of the slaves. Thus, replicated state machines are always suspicious of their records; they actively check the actual situation with the slaves and refresh their memory accordingly. In other words, the state held by each participant is periodically verified by repeatedly querying and examining the source of the data, and the data is refreshed to gain global consistency by distributed independent updates. In worst-case scenarios in which update synchronization races are possible, active monitoring may imply the need to use global distributed snapshots and the assignment of a consistent distributed state.

Super-Secure-Stabilization. The obvious advantage of the self-stabilization approach is that the details of an attack (e.g., who attacks, the attack surface) are completely ignored. Self-stabilization ensures that a system always converges to a consistent state from an inconsistent one, no matter how the latter was reached or what it looks like. Thus, self-stabilizing systems do not risk ignoring a scenario or failing to cope with a yet unknown scenario, as they address all scenarios in a holistic fashion. The convergence procedure, however, might be very inefficient and sometimes disastrous: for example, a system moves towards an entire reboot, discarding very sensitive data that was accumulated so far (and could still be saved). If we knew some attack details, we could offer a more efficient rescue plan and save our data. To put it another way, as a first step a system tries to mitigate attacks and perform local state modifications, fixing the damage caused by a known kind of attack. The “local repair” approach is the cornerstone of the super-stabilization approach, as in [22]. This approach requires that some weak minimal security requirement (expressed by the so-called “safe passage predicate”) still be met during the repair process. In our case the passage predicate incorporates security requirements, so we developed the super-secure-stabilization approach. It employs known attack mitigation tools (e.g., anti-virus) to avoid the need for the self-stabilization fallback. If a passage predicate is not satisfied (within a certain number of computation steps), we resort to the classic self-stabilization approach.
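The control flow described above, mitigation first and self-stabilization only as a fallback, can be sketched as follows. This is a minimal illustration rather than the prototype's code: the function super_secure_recover, its parameters, and the dictionary-based state are all hypothetical names chosen for the example.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Tuple

State = Dict[str, object]


def super_secure_recover(
    state: State,
    mitigations: Iterable[Callable[[State], State]],
    passage_predicate: Callable[[State], bool],
    self_stabilize: Callable[[State], State],
    max_steps: int,
) -> Tuple[State, str]:
    """Try known mitigation tools first; fall back to full self-stabilization."""
    if passage_predicate(state):
        return state, "ok"
    # Apply at most max_steps local repairs, checking the predicate after each.
    for mitigate in islice(mitigations, max_steps):
        state = mitigate(state)
        if passage_predicate(state):
            return state, "mitigated"
    # The predicate was not satisfied within the step budget:
    # resort to classic self-stabilization (here, a reset to a default state).
    return self_stabilize(state), "self-stabilized"
```

In this sketch a successful local repair keeps the accumulated data (the "mitigated" outcome), whereas the fallback may discard it, which mirrors the trade-off discussed in the paragraph above.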

Agreement in the presence of Byzantine behavior. Individual nodes might be conquered by malware and then deliberately obstruct the distributed system functionality. Such nodes exhibit Byzantine behavior, at least for a while, before recovering. Loopholes in the cloud software may ease the process of conquering nodes. For example, an OpenStack controller node may be taken over by malware and exhibit Byzantine behavior, obstructing the entire Hadoop master node behavior; the latter becomes a single point of failure. To address the problem, we replicate every service that runs at the controller node. Then we deploy a Byzantine fault-tolerant self-stabilizing replicated state machine algorithm to ensure that the service state is consistent. The system state is periodically agreed upon and updated according to the (agreed upon) inputs that arrived following the previous clock pulse. The clock pulse is generated by a self-stabilizing and Byzantine fault-tolerant clock synchronization algorithm. In our implementation we used the first such algorithm, suggested in [27] (see footnote 1).

The rest of the paper is organized as follows. Section 2 briefly discusses related work. Section 3 contains the system settings, Sect. 4 details the BFT Consensus algorithm and a sketch of its proof, Sect. 5 discusses various aspects of the prototype, and Sect. 6 concludes.

2 Related Work

The basic Byzantine consensus protocol was presented in [38], where it was shown that in a synchronous system with at most f Byzantine processes, 3f + 1 processes are needed to solve the Byzantine consensus problem. The celebrated classic example of practical Byzantine agreement appears in [11]. Recently, BFT-SMaRT [10] uses authenticated Byzantine agreement (through signatures based on private/public keys) to prevent a Byzantine participant from claiming that a vote is the vote of another participant, as a participant cannot sign a vote on behalf of another participant. We note that any replica might become Byzantine at any time. In practice we assume that the set of Byzantine replicas is fixed between two consecutive pulses generated by the self-stabilizing BFT clock synchronization algorithm [27]. The set of Byzantine replicas can be seamlessly changed if the new non-Byzantine replicas participate in a consensus where they are updated with the previous agreed-upon state before non-Byzantine replicas become Byzantine. In [10] the replica that becomes Byzantine holds a private key that may be used for future malicious purposes or even be passed to an outside adversary, nullifying the benefit of the encryption.

Footnote 1. The number of participants in the replicated state machine is typically small, n = 3f + 1 = 4, allowing a quarter of the system to exhibit Byzantine behavior, as the benefit of a larger system is bounded by tolerating one third of the participants being Byzantine (when approaching an infinite number of participants). Moreover, the algorithm has already proved itself in real practical systems. See [30], where it is stated that: “I used the entire paper a few years ago to design some middleware”, “The article is 19 pages long, very readable, and, as mentioned above, was used to create real software”. Subsequently, solutions that are more complicated to implement can be found in [16] and the references therein.
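The arithmetic behind the n = 3f + 1 requirement, and behind the remark that a four-replica system tolerates a quarter of its members while larger systems approach one third, can be checked with a few lines of Python. The helper names below are illustrative only.

```python
def max_byzantine(n: int) -> int:
    """Largest f such that n >= 3f + 1 (equivalently, n > 3f)."""
    return (n - 1) // 3


def tolerated_fraction(f: int) -> float:
    """Fraction f / n of Byzantine replicas in a minimal system with n = 3f + 1."""
    return f / (3 * f + 1)


# n = 4 replicas tolerate one Byzantine replica, i.e. a quarter of the system.
print(max_byzantine(4), tolerated_fraction(1))
# The tolerated fraction grows with f but always stays below one third.
print([round(tolerated_fraction(f), 4) for f in (1, 2, 10, 1000)])
```

Adding replicas beyond n = 4 therefore raises the tolerated fraction only from 1/4 towards 1/3, which is the bounded benefit the footnote refers to.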

In our solution every clock pulse triggers a totally new instance of Byzantine agreement. We use Exponential Information Gathering (EIG) [31] to obtain Byzantine agreement.
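To make the EIG idea concrete, the following single-process simulation sketches the textbook scheme: f + 1 rounds of relaying values along labels of distinct process ids, followed by a bottom-up strict-majority resolution of each process's tree. It is an assumption-laden illustration, not the implementation used in the prototype; the function names, the default value 0, and the lie callback that models Byzantine senders are all introduced here.

```python
from collections import Counter
from itertools import permutations

DEFAULT = 0  # value adopted when a tree node has no strict majority


def gather(inputs, byzantine, f, lie):
    """Run f + 1 relay rounds; return the EIG tree recorded by every process."""
    n = len(inputs)
    trees = [{(): inputs[p]} for p in range(n)]
    for rnd in range(1, f + 2):
        received = [dict() for _ in range(n)]
        for label in permutations(range(n), rnd - 1):
            for sender in range(n):
                if sender in label:
                    continue  # a label never repeats a process id
                for dest in range(n):
                    if sender in byzantine:
                        value = lie(sender, dest, label)  # arbitrary, per destination
                    else:
                        value = trees[sender][label]      # honest relay
                    received[dest][label + (sender,)] = value
        for p in range(n):
            trees[p].update(received[p])
    return trees


def resolve(tree, label, n, depth):
    """Leaf: recorded value. Inner node: strict majority of resolved children."""
    if len(label) == depth:
        return tree[label]
    votes = Counter(resolve(tree, label + (j,), n, depth)
                    for j in range(n) if j not in label)
    value, count = votes.most_common(1)[0]
    return value if 2 * count > sum(votes.values()) else DEFAULT


def eig_agreement(inputs, byzantine, f, lie):
    """Decisions of the non-Byzantine processes."""
    n = len(inputs)
    trees = gather(inputs, byzantine, f, lie)
    return {p: resolve(trees[p], (), n, f + 1)
            for p in range(n) if p not in byzantine}
```

With four processes, one of which (process 3) sends a different bit to each destination, eig_agreement([1, 1, 1, 0], {3}, 1, lambda s, d, label: d % 2) lets the three honest processes decide their common input 1. The number of tree nodes grows exponentially with f, which is acceptable for the small replica counts considered in this paper.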

Paxos [32,33] is commonly used in replicated state machine infrastructures.

One legacy implementation is Zookeeper [37] which tolerates crashes of participating machines. Recently, Byzantine tolerant Paxos as well as self-stabilizing

Paxos and replicated state machine implementations were presented [4,15].

While implementing our algorithms we used OpenStack [45], which is an open source platform for cloud computing. We also used Hadoop [44], which is a platform for carrying out distributed computations. Some of our implementations used the open source container platform Docker [42], which is capable of simulating the cloud settings through its lightweight virtual machines.

3 System Settings

We use the standard settings of a semi-synchronous distributed system [27] (Sects. 2 and 4). The state of a processor (node or machine) is the content of the CPU memory, including the program counter. Processors communicate by exchanging messages. A configuration c is a vector of the states of the n processors in the system and a FIFO queue of messages between any two (connected) processors. The queue consists of an ordered set of messages that were sent from one processor to its neighbor and not yet received; the order reflects the time at which the messages were sent. An execution is an alternating sequence of configurations and events c_0, e_1, c_1, e_2, .... In a semi-synchronous execution, events happen in real time, taking one configuration to the next by the execution of input/output operations (including inputs from the independent local physical clock) and a local state transition (local computation). There are two types of events; one is a tick from some processor's physical clock, which happens every 1 to 1 + […] time units, as assumed in [27]. In the state transition, the processor may or may not receive a message. There are globally known lower and upper bounds on the delivery time of a message: a message sent at a given time will be received within the time range between the lower bound and the upper bound of message delivery. There is a set of f Byzantine processors, where n > 3f. We allow Byzantine processors to spontaneously send any message at any given time. A state transition of a non-Byzantine processor obeys its transition function.

We assume that a typical cloud system is composed of a set of processors connected by an eventually reliable network. Each processor runs a stack consisting

of a BIOS and of an operating system (that possibly includes a virtualization

layer and user mode programs, as in the local case [2]). Each node contains the

stabilization manager that ensures its own correctness.


Malicious and transient faults can lead to corruption of all node memory, including code and data, except for the read-only memory chips. An example is a faulty or compromised machine that tries to attack the rest of the system or to affect proper resource allocation, e.g., by sending malicious allocation request messages.

4 Self-Stabilizing Byzantine Tolerant Replicated State Machine

In our settings a client process sends requests to a master process, which runs the replicated state machine algorithm. The replicated state machine, implemented in a distributed fashion, acts as a (coordinated) single master entity that assigns the task of processing requests to slaves in the data-center. The agreed state is sent to the slave processes, which conduct certain computations accordingly. Slaves report the result to the replicated state machine, which in turn sends the outcome to the requesting client (after updating the replicated state machine, which contains a map of the tasks assigned to slaves). See more details in Sect. 5 and the figures within, detailing the Hadoop prototype. The BFT Consensus algorithm is operated by replicated master processes on n = 3f + 1 replicas r_1, r_2, ..., r_n, where f is the maximum number of Byzantine replicas. Each replica maintains its own copy of the state machine and the history of messages that were received from client and slave processes. We say that replicas are in consensus when:

– All non-Byzantine replicas agree on the same state S of the replicated state

machine, and

– Histories of all non-Byzantine replicas contain the messages recently sent by

each active client/slave where “active client” is the one that runs an inﬁnite

message sending loop. The history may also include arbitrary messages of

inactive clients/slaves.
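The two conditions above can be written as a predicate evaluated by an external observer that knows which replicas are Byzantine (no single replica can evaluate it locally). This is only a sketch of the definition: the function name, the dictionary-based representation of replica states and histories, and the set recent of messages recently sent by active clients and slaves are assumptions made for the example.

```python
from typing import Dict, Hashable, Set


def in_consensus(
    states: Dict[str, Hashable],
    histories: Dict[str, Set[str]],
    byzantine: Set[str],
    recent: Set[str],
) -> bool:
    """Check both consensus conditions over the non-Byzantine replicas."""
    honest = [r for r in states if r not in byzantine]
    # Condition 1: all non-Byzantine replicas hold the same state S.
    same_state = len({states[r] for r in honest}) <= 1
    # Condition 2: every non-Byzantine history contains the recent messages of
    # the active clients/slaves; extra (arbitrary) messages are allowed.
    has_recent = all(recent <= histories[r] for r in honest)
    return same_state and has_recent
```

Note that the history condition is a superset test, which reflects the remark that a history may also contain arbitrary messages of inactive clients or slaves.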

Replicas use a self-stabilizing BFT clock [27] to synchronously start executing the consensus algorithm. Upon receiving a pulse from the BFT clock, each replica begins exchanging its (recent) message history with all other replicas. Then it collects the histories sent by other replicas until all histories have been received or a timeout occurs. This exchange may yield a subset of the histories, because some of the sent histories may be lost, may not arrive within the predefined time, or may not be sent at all due to Byzantine behavior. Eventually each replica has a set of messages that includes the messages from its own history and the messages it received from other replicas. Next we use Exponential Information Gathering (EIG), which implements a Leader-Less Byzantine Fault Tolerant algorithm (LLBFT) (see a similar use of EIG in the implementation in [9]), over the state S of the replicated state machine. Then another n instances of the LLBFT algorithm are invoked, one instance for the history of each of the n participants.

These n LLBFT invocations are used to obtain an identical vector of histories,

one for each replica, held by all non-Byzantine participants. To put it another
