Chapter 9. Junos High Availability on MX Routers


This chapter starts with a brief overview of the HA features available on all MX routers;

subsequent sections will detail the operation, configuration, and usage of these features.

Graceful Routing Engine Switchover

Graceful Routing Engine Switchover (GRES) is the foundation upon which most

other Junos HA features are stacked. The feature is only supported on platforms

that have support for dual REs. Here, the term graceful refers to the ability to

support a change in RE mastership without forcing a reboot of the PFE components. Without GRES, the new master RE reboots the various PFE components to

ensure it has consistent PFE state. A PFE reboot forces disruptions to the data

plane, a hit that can last several minutes while the component reboots and is then

repopulated with current routing state.

Graceful Restart

Graceful Restart (GR) is a term used to describe protocol enhancements designed

to allow continued dataplane forwarding in the face of a routing fault. GR requires

a stable network topology (for reasons described in the following), requires protocol modifications, and expects neighboring routers to be aware of the restart and

to assist the restarting router back into full operation. As a result, GR is not transparent and is losing favor to Nonstop Routing. Despite these shortcomings, it’s

common to see GR on platforms with a single RE as GR is the only HA feature that

does not rely on GRES.

Nonstop Routing

Nonstop Routing (NSR) is the preferred method for providing hitless RE switchover. NSR is a completely internal solution, which means it requires no protocol

extensions or interactions from neighboring nodes. When all goes to the NSR plan,

RE mastership switches with no dataplane hit or externally visible protocol reset;

to the rest of the network, everything just keeps working as before the failover.

Because GR requires external assistance and protocol modifications, whereas NSR

does not, the two solutions are somewhat diametrically opposed. This means you

must choose either GR or NSR as you cannot configure full implementations of

both simultaneously. It’s no surprise when one considers that GR announces the

control plane reset and asks its peers for help in ramping back up, while NSR seeks

to hide such events from the rest of the world, that it simply makes no sense to try

and do both at the same time!

Nonstop Bridging

Nonstop Bridging (NSB) adds hitless failover to Layer 2 functions such as MAC

learning and to Layer 2 control protocols like spanning tree and LLDP. Currently,

NSB is available on MX and supported EX platforms.

In-Service Software Upgrades

In-Service Software Upgrades (ISSU) is the capstone of Junos HA. The feature is

based on NSR and GRES, and is designed to allow the user to perform software

upgrades that are virtually hitless to the dataplane while being completely transparent in the control plane. Unlike an NSR switchover, a small dataplane hit (less than five seconds) is expected during an ISSU as new software is loaded into the PFE components during the process.

Graceful Routing Engine Switchover

As noted previously, GRES is a feature that permits Juniper routers with dual REs to

perform a switch in RE mastership without forcing a PFE reset. This permits uninterrupted dataplane forwarding, but unless combined with GR or NSR, does not in itself

preserve control plane or forwarding state.

The foundation of GRES is kernel synchronization between the master and backup

routing engines using Inter-Process Calls (IPC). Any updates to kernel state that occur

on the master RE, for example to reflect a changed interface state or the installation of

a new next-hop, are replicated to the backup RE as soon as they occur and before

pushing the updates down into other parts of the system, for example, to the FPCs. If

the kernel on the master RE stops operating, the RE experiences a hardware failure, a configured process is determined to be thrashing, or the administrator initiates a manual

switchover, mastership switches to the backup RE.

Performing a switchover before the system has synchronized leads to an all-bets-off

situation. Those PFE components that are synchronized are not reset, while the rest of

the components are. Junos enforces a GRES holddown timer that prevents rapid back-to-back switchovers, which seem to be all the rage in laboratory testing. The 240-second (4-minute) timer between manually triggered GRES events is usually long

enough to allow for complete synchronization, and therefore helps to ensure a successful GRES event. The holddown timer is not enforced for automatic GRES triggers

such as a hardware failure on the current master RE. If you see the following, it means

you need to cool your GRES jets for a bit to allow things to stabilize after an initial

reboot or after a recent mastership change:


jnpr@R1-RE0>request chassis routing-engine master acquire no-confirm

Command aborted. Not ready for mastership switch, try after 234 secs.
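The holddown behavior amounts to a deadline check against the last mastership change. The following is an illustrative sketch, not Junos code: the 240-second constant and the message wording follow the output above, while the class and method names are hypothetical.

```python
import time

GRES_HOLDDOWN = 240  # seconds between manually triggered switchovers

class GresGuard:
    """Illustrative model of the GRES holddown on manual switchovers."""

    def __init__(self):
        self.last_switchover = None  # timestamp of the last mastership change

    def request_switchover(self, now=None):
        now = time.monotonic() if now is None else now
        if self.last_switchover is not None:
            elapsed = now - self.last_switchover
            if elapsed < GRES_HOLDDOWN:
                remaining = int(GRES_HOLDDOWN - elapsed)
                return ("Command aborted. Not ready for mastership switch, "
                        "try after %d secs." % remaining)
        self.last_switchover = now  # aborted attempts do not reset the timer
        return "mastership switch initiated"

guard = GresGuard()
print(guard.request_switchover(now=0.0))    # first switchover proceeds
print(guard.request_switchover(now=6.0))    # aborted: 234 s of holddown remain
print(guard.request_switchover(now=250.0))  # holddown expired, proceeds
```

Note that an aborted request leaves the timer untouched, matching the CLI behavior where repeated early attempts keep reporting a shrinking remaining time.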

The GRES Process

The GRES feature has three main components: synchronization, switchover, and recovery.


GRES begins with synchronization between the master and backup RE. By default after

a reboot, the RE in slot 0 becomes the master. You can alter this behavior, or disable a

given RE if you feel it’s suffering from a hardware malfunction or software corruption,

at the [edit chassis redundancy] hierarchy:




jnpr@R1-RE1# set chassis redundancy routing-engine 1 ?
Possible completions:
  backup               Backup Routing Engine
  disabled             Routing Engine disabled
  master               Master Routing Engine

jnpr@R1-RE1# set chassis redundancy routing-engine 1

When the BU RE boots, it starts the kernel synchronization daemon called ksyncd. The

ksyncd process registers as a peer with the master kernel and uses IPC messages to carry

routing table socket (rtsock) messages that represent current master kernel state. Synchronization is considered complete when the BU RE has matching kernel state. Once

synchronized, ongoing state changes in the master kernel are first propagated to the

backup kernel before being sent to other system components. This process helps ensure

tight coupling between the master and BU kernels as consistent kernel state is critical

to the success of a GRES. Figure 9-1 shows the kernel synchronization process between

a master and BU RE.

Figure 9-1. GRES and Kernel Synchronization.

The steps in Figure 9-1 show the major GRES processing steps after a reboot.

1. The master RE starts. As noted previously, by default this is RE0 but can be altered

via configuration.

2. The various routing platform processes, such as the chassis process (chassisd), start.

3. The Packet Forwarding Engine starts and connects to the master RE.

4. All state information is updated in the system.



5. The backup RE starts.

6. The system determines whether graceful RE switchover has been enabled.

7. The kernel synchronization process (ksyncd) synchronizes the backup RE with the

master RE.

8. After ksyncd completes the synchronization, all state information and the forwarding table are updated.
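The ordering that matters in the steps above — a kernel state change reaches the backup kernel before it is pushed to the PFE — can be sketched in a few lines. This is a conceptual model only; all class and method names are hypothetical.

```python
# Sketch of GRES replication ordering: every master kernel state change is
# copied to the backup RE *before* it is programmed into the PFE components,
# so the PFE can never hold state the backup kernel has not yet seen.

class BackupKernel:
    def __init__(self):
        self.state = {}
    def replicate(self, key, value):
        self.state[key] = value

class Pfe:
    def __init__(self):
        self.fib = {}
    def program(self, key, value):
        self.fib[key] = value

class MasterKernel:
    def __init__(self, backup, pfes):
        self.backup = backup  # backup RE kernel (ksyncd peer)
        self.pfes = pfes      # PFE components (e.g., FPCs)
        self.state = {}

    def update(self, key, value):
        self.state[key] = value
        self.backup.replicate(key, value)  # step 1: sync the backup kernel
        for pfe in self.pfes:              # step 2: only then program the PFEs
            pfe.program(key, value)

backup, fpc0 = BackupKernel(), Pfe()
master = MasterKernel(backup, [fpc0])
master.update("nh-10.0.0.1", "xe-0/0/0")
# After any update, the backup already holds everything the PFE holds,
# so a switchover can proceed without resetting the PFE.
assert backup.state == fpc0.fib
```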

Routing Engine Switchover

RE switchover can occur for a variety of reasons. These include the following:

• By having the chassisd process monitor for loss of keepalive messages from the master

RE for 2 seconds (4 seconds on the now long-in-the-tooth M20 routers). The keepalive process functions to ensure that a kernel crash or RE hardware failure on the

current master is rapidly detected without need for manual intervention.

• Rebooting the current master.

• By having the chassisd process on the BU RE monitor chassis FPGA mastership state and becoming master whenever the chassis FPGA indicates there is no current master.


• By detecting a failed hard disk or a thrashing software process, when so configured,

as neither is a default switchover trigger.

• When instructed to perform a mastership change by the operator issuing a request

chassis routing-engine switchover command. This command is the preferred

way to force a mastership change during planned maintenance windows, and for

GRES feature testing in general.
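The first trigger in the list above — two seconds without a keepalive from the master — is a simple deadline check on the backup RE. A minimal sketch follows; only the 2-second threshold comes from the text, everything else is a hypothetical model.

```python
KEEPALIVE_TIMEOUT = 2.0  # seconds of master silence before automatic failover

class KeepaliveMonitor:
    """Backup-RE view of master keepalives, as chassisd might track them."""

    def __init__(self, timeout=KEEPALIVE_TIMEOUT):
        self.timeout = timeout
        self.last_seen = None

    def keepalive_received(self, now):
        self.last_seen = now

    def should_take_mastership(self, now):
        # No keepalive yet means the master has not started talking; this
        # sketch does not seize mastership based on initial silence alone.
        if self.last_seen is None:
            return False
        return (now - self.last_seen) >= self.timeout

mon = KeepaliveMonitor()
mon.keepalive_received(now=10.0)
print(mon.should_take_mastership(now=11.0))  # False: within the 2 s window
print(mon.should_take_mastership(now=12.5))  # True: master silent for 2.5 s
```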

See the section on NSR for details on other methods that can be used to

induce a GRES event when testing HA features.

Upon seizing mastership, the new master’s chassisd does not restart FPCs. During the

switchover, protocol peers may detect a lack of protocol hello/keepalive messages, but

this window is normally too short to force a protocol reset; for example, BGP needs to

miss three keepalives before its hold-time expires. In addition, the trend in Junos is to

move protocol-based peer messages, such as OSPF’s hello packets, into the PFE via the

ppmd daemon, where they are generated independently of the RE. PFE-based message

generation not only improves scaling and accommodates lower hello times, but also

ensures that protocol hello messages continue to be sent through a successful GRES

event. However, despite the lack of PFE reset, protocol sessions may still be reset depending on whether GR or NSR is also configured in addition to basic GRES. Stating

this again, GRES alone cannot prevent session reset, but it does provide the infrastructure needed to allow GR and NSR control plane protection, as described in later sections.
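The BGP arithmetic mentioned above is worth making concrete: a peer resets its hold timer on every keepalive it receives, and with the usual keepalive interval of one third of the hold time, three keepalives in a row must be missed before the session drops. A small sketch of that logic, with illustrative values:

```python
def bgp_session_survives(hold_time, gap):
    """A BGP peer resets its hold timer on every keepalive it receives, so a
    transmit gap only drops the session once it reaches the hold time. With
    the usual keepalive interval of hold_time / 3, that means three missed
    keepalives in a row before the peer tears the session down."""
    return gap < hold_time

# Typical defaults: 90 s hold time (30 s keepalives, three misses to drop).
print(bgp_session_survives(90, 0.5))  # sub-second GRES gap: session survives
print(bgp_session_survives(90, 95))   # prolonged outage: hold timer expires
```

This is why the brief keepalive gap during a successful GRES is normally invisible to BGP peers, and why ppmd-generated hellos shrink that gap further for protocols with much shorter timers.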

After the switchover, the new master uses the BSD init process to start/restart daemons

that wish to run only when the RE is a master and the PFEs reestablish their connections

with chassisd. The chassisd process then relearns and validates PFE state as needed

by querying its peers in the PFE.

Figure 9-2 shows the result of the switchover process.

Figure 9-2. The Routing Engine Switchover Process.

The numbers in Figure 9-2 call out the primary sequence of events that occur as part

of a GRES-based RE switchover:

1. Loss of keepalives (or other stimulus) causes chassisd to gracefully switch control

to the current backup RE.

2. The Packet Forwarding Engine components reconnect to the new master RE.

3. Routing platform processes that are not part of graceful RE switchover, and which

only run on a master RE, such as the routing protocol process (rpd) when NSR is

not in effect, restart.

4. Any in-flight kernel state information from the point of the switchover is replayed,

and the system state is once again made consistent. Packet forwarding and FIB

state is not altered, and traffic continues to flow as it was on the old master before

the switchover occurred.



5. When enabled, Graceful Restart (GR) protocol extensions collect and restore routing information from neighboring peer helper routers. The role of helper routers

in the GR process is covered in a later section.

What Can I Expect after a GRES?

Table 9-1 details the expected outcome for a RE switchover as a function of what mix

of HA features are, or are not, configured at the time.

Table 9-1. Expected GRES Results.

Dual RE, no HA
    PFE reboot and the control plane reconverges on the new master. All physical
    interfaces are taken offline, Packet Forwarding Engines restart, the standby
    routing engine restarts the routing protocol process (rpd), and all hardware
    and interfaces are discovered by the new master RE. The switchover takes
    several minutes and all of the router’s adjacencies are aware of the physical
    (interface alarms) and routing (topology) change.

GRES only
    During the switchover, interface and kernel information is preserved. The
    switchover is faster because the Packet Forwarding Engines are not restarted.
    The new master RE restarts the routing protocol process (rpd). All hardware
    and interfaces are acquired by a process that is similar to a warm restart.
    All adjacencies are aware of the router’s change in state due to the control
    plane reset.

GRES plus GR
    Traffic is not interrupted during the switchover. Interface and kernel
    information is preserved. Graceful restart protocol extensions quickly
    collect and restore routing information from the neighboring routers.
    Neighbors are required to support graceful restart, and a wait interval is
    required. The routing protocol process (rpd) restarts. For certain
    protocols, a significant change in the network can cause graceful restart
    to stop.

GRES plus NSR
    Traffic is not interrupted during the switchover. Interface, kernel, and
    routing protocol information is preserved for NSR/NSB supported protocols
    and options. Unsupported protocols must be refreshed using the normal
    recovery mechanisms inherent in each protocol.

The table shows that NSR, with its zero packet loss and lack of any external control

plane flap (for supported protocols), represents the best case. In the worst case, when

no HA features are enabled, you can expect a full MPC (PFE) reset, and several minutes

of outage (typically ranging from 4 to 15 minutes as a function of system scale), as the

control plane converges and forwarding state is again pushed down into the PFE after

a RE mastership change. The projected outcomes assume that the system, and the

related protocols, have all converged and completed any synchronization, as needed

for NSR, before a switchover occurs.

The outcome of a switchover that occurs while synchronization is still underway is

unpredictable, but will generally result in dataplane and possible control plane resets,

making it critical that the operator know when it’s safe to perform a switchover. Knowing when it’s “safe to switch” is a topic that’s explored in detail later in this chapter.



Though not strictly required, running the same Junos version on both

REs is a good way to improve the odds of a successful GRES.

Configure GRES

GRES is very straightforward to configure and requires only a set chassis redundancy

graceful-switchover statement to place it into effect. Though not required, it’s recommended that you use commit synchronize whenever GRES is in effect to ensure consistency between the master and backup REs and avoid inconsistent operation after a RE switchover.

When you enable GRES, the system automatically sets the chassis redundancy keepalive-time to 2 seconds, which is the lowest supported interval; attempting to modify

the keepalive value when GRES is in effect results in a commit fail, as shown.

jnpr@R1-RE1# show chassis
redundancy {
    ## Warning: Graceful switchover configured, cannot change the default keepalive
    keepalive-time 25;
    graceful-switchover;
}


When GRES is disabled, you can set the keepalive timer to the range of 2 to 10,000

seconds, with 300 seconds being the non-GRES default. When GRES is disabled, you

can also specify whether a failover should occur when the keepalive interval times out

with the set chassis redundancy failover on-loss-of-keepalives statement.

However, simply enabling GRES results in the fast two-second keepalive along with automatic failover. With the minimal GRES configuration shown, you can expect automatic

failover when a hardware or kernel fault occurs on the master RE resulting in a lack of

keepalives for two seconds:


jnpr@R1-RE1# show chassis
redundancy {
    graceful-switchover;
}



Note how the banner changes to reflect master or backup status once GRES is committed:


jnpr@R1-RE1# commit

commit complete





And, once in effect, expect complaints when you don’t use commit synchronize. Again,

it’s not mandatory with GRES, but it’s recommended as a best practice:


jnpr@R1-RE1# commit

warning: graceful-switchover is enabled, commit synchronize should be used

commit complete


jnpr@R1-RE1# commit synchronize


configuration check succeeds


commit complete


commit complete

You can avoid this nag by setting commit synchronize as a default, which is a feature

that is mandatory for NSR:

jnpr@R1-RE1# set system commit synchronize


jnpr@R1-RE1# commit


configuration check succeeds


commit complete


commit complete

GRES itself does not mandate synchronized configurations. There can

be specific reasons as to why you want to have different configurations

between the two REs. It should be obvious that pronounced differences

can impact the relative success of a GRES event, so if you have no

specific need for a different configuration it’s best practice to use commit

synchronize to ensure the current active configuration is mirrored to the

BU RE, thus avoiding surprises at some future switchover, perhaps long

after the configuration was modified but not synchronized.

GRES Options

The GRES feature has a few configuration options that add additional failover triggers.

This section examines the various mechanisms that can trigger a GRES.

Disk Fail. You can configure whether a switchover should occur upon detection of a disk

failure using the on-disk-failure statement at the [edit chassis redundancy failover] hierarchy:

jnpr@R1-RE1# set chassis redundancy failover ?
Possible completions:
+ apply-groups          Groups from which to inherit configuration data
+ apply-groups-except   Don't inherit configuration data from these groups
  on-disk-failure       Failover on disk failure
  on-loss-of-keepalives  Failover on loss of keepalive

The RE has its own configuration for actions to be taken upon a hard disk failure. These include reboot or halt. You should not try to configure both actions for the same hard disk failure. When GRES is in effect, you should use set chassis redundancy failover on-disk-failure. Otherwise, use the set chassis routing-engine on-disk-failure disk-failure-action [reset | halt] statement when GRES is off. Note that

having the RE with the disk problem perform a shutdown will trigger a

GRES (if configured), given that keepalives will stop, but this method

adds delay over the more direct approach of using the set chassis

redundancy failover on-disk-failure statement.

Storage Media Failures

The failure of storage media is handled differently based on whether the primary or alternate media fails, and whether the failure occurs on the master or backup RE:

• If the primary media on the master RE fails, the master reboots, and the backup

assumes mastership. The level of service interruption is dependent on which HA

features (GRES, NSR, GR, etc.) are enabled. The old master will attempt to restart

from the alternate media, and if successful, will come back online as the backup

RE. It will not become the master unless the new master fails or a manual switch

is requested by the operator.

• If the alternate media on the master RE fails, the master will remain online and continue to operate as master unless the set chassis redundancy failover on-disk-failure option is applied to the configuration. If this option is configured, the

backup will assume mastership, and the old master will reboot. As before, the level

of service interruption is dependent on which HA features are enabled. If the old

master reboots successfully, it will come back online as the backup RE and will

not become the master unless the new master fails or a manual switch is requested

by the operator.

• If any media on the backup RE fails, the backup RE will reboot. If it boots successfully, it will remain the backup RE and will not become the master unless the

master fails or a manual switch is requested by the operator.
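The three media-failure rules above reduce to a small decision function. The following sketch is illustrative only; the outcome strings are hypothetical labels, but the branching follows the bullets.

```python
def media_failure_action(re_role, media, on_disk_failure_configured):
    """Outcome of a storage media failure per the rules above.
    re_role: 'master' or 'backup'; media: 'primary' or 'alternate'."""
    if re_role == "backup":
        # Any media failure on the backup: it reboots and stays backup.
        return "backup reboots, remains backup"
    if media == "primary":
        # Primary media on the master: unconditional switchover.
        return "master reboots from alternate media, backup takes mastership"
    if on_disk_failure_configured:
        # Alternate media on the master, with failover on-disk-failure set.
        return "master reboots from alternate media, backup takes mastership"
    # Alternate media on the master, no on-disk-failure trigger configured.
    return "master stays online and keeps mastership"

print(media_failure_action("master", "primary", False))
print(media_failure_action("master", "alternate", False))
print(media_failure_action("backup", "primary", True))
```

In every switchover case, the rebooted RE comes back (if it can) as the backup and stays there until the new master fails or an operator requests a manual switch.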

Process Failure Induced Switchovers. You can also configure whether a switchover should occur upon detection of thrashing software processes at the [edit system processes]

hierarchy. This configuration triggers a GRES if the related process, rpd in this case, is

found to be thrashing, which is to say the daemon has started and stopped several times

over a short interval (two or more times in approximately five seconds):

jnpr@R1-RE1# show system processes

routing failover other-routing-engine;

The effect of this setting is demonstrated by restarting the routing daemon a few times:




jnpr@R1-RE0# run restart routing immediately

error: Routing protocols process is not running

Routing protocols process started, pid 2236


jnpr@R1-RE0# run restart routing immediately


jnpr@R1-RE0# run restart routing immediately

error: Routing protocols process is not running

On the last restart attempt, an error is returned indicating that the rpd process is no longer running; it was not restarted because it was determined to be thrashing. Also, note that after the previous process restart the local master has switched to the BU role:


jnpr@R1-RE0# run restart routing immediately

error: Routing protocols process is not running
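The thrashing heuristic demonstrated above — a daemon restarting two or more times within roughly five seconds — can be sketched as a sliding-window counter. The threshold values come from the text; the class itself is a hypothetical illustration, not the actual Junos implementation.

```python
from collections import deque

class ThrashDetector:
    """Flags a process as thrashing when it restarts `max_restarts` or more
    times within `window` seconds, as described for rpd above."""

    def __init__(self, max_restarts=2, window=5.0):
        self.max_restarts = max_restarts
        self.window = window
        self.restarts = deque()  # timestamps of recent restarts

    def record_restart(self, now):
        """Record a restart; return True if the process is now thrashing."""
        self.restarts.append(now)
        # Drop restarts that fell out of the sliding window.
        while self.restarts and now - self.restarts[0] > self.window:
            self.restarts.popleft()
        return len(self.restarts) >= self.max_restarts

det = ThrashDetector()
print(det.record_restart(now=0.0))  # False: first restart in the window
print(det.record_restart(now=3.0))  # True: second restart within 5 s
# With `routing failover other-routing-engine` configured, hitting the
# threshold leaves rpd stopped and triggers a GRES to the other RE.
```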

Verify GRES Operation

Once you configure GRES, you want to make sure that after a commit synchronize both REs reflect either a master or BU status. In this section, the following GRES baseline is used:



jnpr@R1-RE0# show chassis
redundancy {
    graceful-switchover;
}



Things start with confirmation of a master and BU prompt on the two routing engines.

There should never be two masters or two slaves. The prompt is confirmed to change

for RE1 at R1, which is now in a backup role.



Next, you confirm that the BU RE is running the kernel synchronization daemon:



jnpr@R1-RE1> show system processes | match ksyncd
 5022  ??  S      0:00.15 /usr/sbin/ksyncd -N
 5034  ??  S      0:00.19 /usr/sbin/clksyncd -N

The output also shows the clksyncd daemon, responsible for precision time synchronization over Ethernet to support Synchronous Ethernet and other mobile backhaul

technologies. With all looking good, the final indication that GRES is operational comes

from a show system switchover command. This command is only valid on the BU RE,

as it is the one doing all the synchronizing from the master:


jnpr@R1-RE1>show system switchover



Graceful switchover: On

Configuration database: Ready

Kernel database: Ready

Peer state: Steady State

The output confirms that graceful switchover is on, that the configuration and kernel

databases are currently synchronized, and that IPC connection to the master RE kernel

is stable. This output indicates the system is ready to perform a GRES. You can get the

master RE’s view of the synchronization process with the show database-replication command:



jnpr@R1-RE0# run show database-replication ?
Possible completions:
  statistics           Show database replication statistics
  summary              Show database replication summary

jnpr@R1-RE0# run show database-replication summary
Graceful Restart
Message Queue

jnpr@R1-RE0# run show database-replication statistics
Dropped connections
Max buffer count
Message received:
    Size (bytes)
Message sent:
    Size (bytes)
Message queue:
    Queue full
    Max queue size

Use the CLI’s restart kernel-replication command to restart the ksyncd daemon on

the current BU RE if it displays an error or is failing to complete synchronization in a

reasonable period of time, which can vary according to scale but should not exceed 10

minutes. If the condition persists, you should confirm matched software versions on

both REs, which is always a good idea when using GRES anyway.
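For lab automation, the readiness check described earlier (show system switchover on the BU RE) can be scripted against the command output. A minimal parser sketch follows; the field names are taken from the sample output shown earlier, and nothing beyond that format is guaranteed.

```python
def gres_ready(switchover_output):
    """Parse `show system switchover` text and report whether the backup RE
    is fully synchronized and safe for a GRES."""
    fields = {}
    for line in switchover_output.strip().splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return (fields.get("Graceful switchover") == "On"
            and fields.get("Configuration database") == "Ready"
            and fields.get("Kernel database") == "Ready"
            and fields.get("Peer state") == "Steady State")

sample = """\
Graceful switchover: On
Configuration database: Ready
Kernel database: Ready
Peer state: Steady State
"""
print(gres_ready(sample))  # True: safe to request a switchover
```

Gating any request chassis routing-engine master command on a check like this avoids the all-bets-off outcome of switching over before synchronization completes.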


