14 BGP4 (border gateway protocol version 4)



Routing tables and protocols

Figure 6.24 BGP common message header and open message format.

Figure 6.25 BGP update message format and path attributes.

There are four types of messages defined by BGP: open, update, notification and keepalive
messages. The formats of these message types are illustrated in Figures 6.24, 6.25
and 6.26. Their functions are described next.

BGP common message header
All BGP message types are preceded by the BGP common message header as illustrated in
Figure 6.24a. The fields are coded as follows:



Figure 6.26 BGP notification message format.

Table 6.10 BGP message types

BGP message type value    BGP message type
1                         Open message
2                         Update message
3                         Notification message
4                         Keepalive message

• the marker field is either set with all bits of value ‘1’ or alternatively contains authentication information;
• the length field indicates the length of the BGP-packet in octets (including header). Allowed
values are 19–4096; and
• the type field value indicates the type of BGP message and therefore the format of the
remainder of the message following the BGP common header. The permitted type values
are listed in Table 6.10.
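
The header coding above can be sketched in a few lines of Python. This is only an illustrative decoder, not a reference implementation: the function name `parse_bgp_header` is hypothetical, and the field widths (16-octet marker, 2-octet length, 1-octet type) are taken from the BGP-4 specification rather than from the text, although they agree with the minimum length of 19 octets quoted above.

```python
import struct

BGP_HEADER_LEN = 19  # assumed widths: 16-octet marker + 2-octet length + 1-octet type
BGP_MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def parse_bgp_header(data: bytes) -> dict:
    """Decode the BGP common header at the start of `data` (hypothetical helper)."""
    if len(data) < BGP_HEADER_LEN:
        raise ValueError("need at least 19 octets for a BGP header")
    marker, length, msg_type = struct.unpack("!16sHB", data[:BGP_HEADER_LEN])
    if not 19 <= length <= 4096:
        raise ValueError(f"bad message length: {length}")
    if msg_type not in BGP_MESSAGE_TYPES:
        raise ValueError(f"bad message type: {msg_type}")
    return {
        "marker_all_ones": marker == b"\xff" * 16,
        "length": length,
        "type": BGP_MESSAGE_TYPES[msg_type],
    }


# A keepalive message is just the header: all-ones marker, length 19, type 4.
keepalive = b"\xff" * 16 + struct.pack("!HB", 19, 4)
header = parse_bgp_header(keepalive)
```

The two rejected cases correspond to the 'bad message length' and 'bad message type' errors that a BGP speaker reports in a notification message.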

BGP open message
A BGP open message is used to negotiate the mutual configuration of BGP peer routers after
establishment of the TCP connection. On acceptance of the BGP open message (opening the
connection and establishing the BGP peer relationship), the router sends a keepalive message.
The meaning and coding of the various fields of the BGP open message (Figure 6.24b) are
as follows:
• the version field identifies the version number of BGP in use (the current version is
version 4);
• my autonomous system is the AS number of the network of which the transmitting BGP
speaker is a part. (AS numbers are allocated by IANA (Internet assigned numbers authority)
and may be looked up on www.iana.org);
• the hold time is the time period for which a BGP peer connection will be held (i.e.
considered active) even if no BGP message is received within this period. Should the hold
time expire, the remote BGP speaker will be considered to have failed or otherwise been
removed from service. This is the method used for ageing BGP learnt routing information
(we discussed ageing earlier in the chapter);
• the BGP identifier is a unique identifier for the BGP-speaking router. The same value is
used in all BGP peer relationships of a given router; and
• the parameter fields provide the potential for optional services to be added to the BGP
open message. Parameter type 1 corresponds to the use of authentication information.
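
As a rough illustration of the open message fields listed above, the following Python sketch decodes the part of the message that follows the common header. The function name `parse_bgp_open` is hypothetical, and the field widths (1, 2, 2, 4 and 1 octets, followed by type/length/value parameters) are assumed from the BGP-4 specification, since Figure 6.24b is not reproduced here. The AS number and addresses in the example are values reserved for private use and documentation.

```python
import ipaddress
import struct


def parse_bgp_open(body: bytes) -> dict:
    """Decode an open message body, i.e. the octets after the common header."""
    if len(body) < 10:
        raise ValueError("open message body is at least 10 octets long")
    version, my_as, hold_time, bgp_id, opt_len = struct.unpack("!BHH4sB", body[:10])
    end = 10 + opt_len
    if end > len(body):
        raise ValueError("optional parameters run past the end of the message")
    params, offset = [], 10
    while offset + 2 <= end:
        # Each optional parameter is <type, length, value>.
        p_type, p_len = body[offset], body[offset + 1]
        params.append((p_type, body[offset + 2 : offset + 2 + p_len]))
        offset += 2 + p_len
    return {
        "version": version,
        "my_autonomous_system": my_as,
        "hold_time": hold_time,
        "bgp_identifier": str(ipaddress.IPv4Address(bgp_id)),
        "parameters": params,
    }


# Example: BGP version 4, AS 65001, hold time 180 s, identifier 192.0.2.1, no parameters.
open_body = struct.pack(
    "!BHH4sB", 4, 65001, 180, ipaddress.IPv4Address("192.0.2.1").packed, 0
)
open_fields = parse_bgp_open(open_body)
```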



BGP update message
Once a BGP peer relationship has been opened, the BGP peers exchange their complete routing
tables by means of BGP update messages. Subsequent BGP update messages are then sent
to advise of changes in the topology or an increase in the number of reachable destinations.
Unlike many other routing protocols, BGP does not regularly repeat each update. Instead,
keepalive messages are sent to prevent routing information from ageing and being deleted.
The format of a BGP update message is shown in Figure 6.25a. Each individual update
describes the path to reach a given remote autonomous system (AS), the nature and attributes
of the path (i.e. which services it is capable of carrying, limitations, etc.) and a list of IP address
ranges which can be reached in the target AS. The meaning and codings of the various fields
in a BGP update message are as follows:
• the withdrawn routes field comprises a list of IP address ranges which may no longer be
reached in the AS;
• the unfeasible routes length field merely indicates the number of octets making up the
withdrawn routes field;
• the network layer reachability information (NLRI) is a list of the IP address ranges which
can be reached via the BGP router transmitting the update by means of the path described in
the path attributes field. Taken together with the path attributes field, the NLRI represents
one or more entries in the router's routing table; and
• the path attributes field describes the route to a particular destination (or group of destinations as defined in the NLRI field) as it appears in the router's routing table. Normally
the entire route (a list of transit autonomous systems, ending with the destination AS) will
be listed, but it could be that only the next hop is listed (depending upon the coding of
the path attributes fields — Figure 6.25b).
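
The way these fields sit one after another in the update message can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the order and widths of the fields (2-octet unfeasible routes length, withdrawn routes, 2-octet total path attribute length, path attributes, NLRI) and the coding of each address range as a prefix length in bits followed by just enough octets to hold the prefix are taken from the BGP-4 specification, and only IPv4 is handled.

```python
import ipaddress
import struct


def read_prefixes(data: bytes) -> list:
    """Decode a run of <length in bits, prefix> pairs into CIDR strings."""
    prefixes, offset = [], 0
    while offset < len(data):
        bits = data[offset]
        if bits > 32:
            raise ValueError("IPv4 prefix length cannot exceed 32 bits")
        n_octets = (bits + 7) // 8  # only the significant octets are sent
        raw = data[offset + 1 : offset + 1 + n_octets].ljust(4, b"\x00")
        prefixes.append(f"{ipaddress.IPv4Address(raw)}/{bits}")
        offset += 1 + n_octets
    return prefixes


def parse_bgp_update(body: bytes) -> dict:
    """Split an update message body (after the common header) into its fields."""
    (withdrawn_len,) = struct.unpack_from("!H", body, 0)
    withdrawn = body[2 : 2 + withdrawn_len]
    (attrs_len,) = struct.unpack_from("!H", body, 2 + withdrawn_len)
    attrs_start = 4 + withdrawn_len
    return {
        "withdrawn_routes": read_prefixes(withdrawn),
        "path_attributes": body[attrs_start : attrs_start + attrs_len],
        "nlri": read_prefixes(body[attrs_start + attrs_len :]),
    }


# Example: withdraw 10.1.0.0/16 and advertise 192.0.2.0/24 with one ORIGIN attribute.
example_attrs = b"\x40\x01\x01\x00"
example_update = (
    struct.pack("!H", 3) + b"\x10\x0a\x01"
    + struct.pack("!H", len(example_attrs)) + example_attrs
    + b"\x18\xc0\x00\x02"
)
update_fields = parse_bgp_update(example_update)
```

The NLRI has no length field of its own: it is simply whatever remains of the message after the path attributes.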
The BGP path attributes field is formatted as illustrated in Figure 6.25b and coded according
to Tables 6.11 and 6.12. Attribute flags (O, T, P and EL) are set according to Table 6.11.
Well-known attributes (flag bit O = ‘0’) must be recognised by every BGP speaker. Certain
well-known attributes are mandatory and must always be included. These include the ORIGIN,
NEXT_HOP and AS_PATH fields as set out in Table 6.12. Optional attributes (bit O = ‘1’) need
not be recognised by the BGP speaker, though the related information should be forwarded to
other BGP peers if appropriate. Optional attributes are further divided into transitive and non-transitive types (flag bit T). Information received about transitive type attributes is forwarded
to other peers, but non-transitive type attributes should not be forwarded.
Table 6.11 BGP attribute flags in path attributes field

Flag abbreviation   Flag name             Bit value set to ‘0’                            Bit value set to ‘1’
O                   Optional bit          Well-known attribute                            Optional attribute
T                   Transitive bit        Non-transitive attribute                        Transitive attribute
P                   Partial bit
EL                  Extended length bit   1 octet attribute length field (Figure 6.25b)   2 octet attribute length field (Figure 6.25b)

Table 6.12 BGP path attributes

Path attribute (type code) and possible attribute values

ORIGIN (type code 1)
    0 = IGP (interior: destination is in local AS)
    1 = EGP (exterior: path learned by means of EGP)
    2 = path or destination learned otherwise (e.g. by route redistribution)

AS_PATH (type code 2)
    Path segment type value as follows:
    1 = AS_SET: unordered list of transit ASs
    2 = AS_SEQUENCE: ordered list of transit ASs
    3 = AS_CONFED_SET: unordered list of transit ASs within the local BGP confederation
    4 = AS_CONFED_SEQUENCE: ordered list of transit ASs within the local BGP confederation

NEXT_HOP (type code 3)
    IP address of the border router which forms the next hop of the path.

MULTI_EXIT_DISC (type code 4)
    This value serves to distinguish between multiple possible boundary router connections
    to a neighbouring AS. The route with the lowest MULTI_EXIT_DISC will be preferred.

LOCAL_PREF (type code 5)
    This field is used within a network (i.e. in IBGP) to inform another external BGP router
    in the same AS of the administratively preferred route to a given destination.

ATOMIC_AGGREGATE (type code 6)
    This field indicates to other BGP speakers that several overlapping routes have been
    combined into a single summary route.

AGGREGATOR (type code 7)
    The AS number and IP address of a BGP speaker which conducted aggregation of this route.

COMMUNITIES (type code 8)
    The BGP COMMUNITIES attribute allows the speakers of multiple routes to be combined
    into a single community. This simplifies the creation and administration of BGP routing
    policy. Destination communities may then be classified into types like NO_EXPORT (route
    may not be made known to external ASs) or NO_ADVERTISE (route may not be advertised
    to other BGP peers). The BGP communities attribute is defined by RFC

ORIGINATOR_ID (type code 9)
    The ORIGINATOR_ID attribute is related to BGP route reflection (defined by RFC 1966).

CLUSTER_LIST (type code 10)
    The CLUSTER_LIST attribute is related to BGP route reflection (defined by RFC 1966).

The partial (P) bit indicates whether the information in an optional transitive attribute is
complete (‘0’) or partial (‘1’). The extended length (EL) bit indicates whether the attribute length
field (Figure 6.25b) is 1 octet or 2 octets long.
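
The effect of the four attribute flags, and in particular of the EL bit on the width of the length field, can be illustrated with a short Python sketch. It is only a sketch under assumptions: the function name is hypothetical, and the bit positions (O, T, P and EL as the four high-order bits of the flags octet, in that order) and the flags/type code/length/value layout follow the BGP-4 specification, since Figure 6.25b is not reproduced here.

```python
import struct

# Assumed bit positions within the flags octet (high-order bit first).
FLAG_OPTIONAL = 0x80    # O
FLAG_TRANSITIVE = 0x40  # T
FLAG_PARTIAL = 0x20     # P
FLAG_EXT_LEN = 0x10     # EL


def parse_path_attributes(data: bytes) -> list:
    """Split a path attributes field into individual attributes."""
    attrs, offset = [], 0
    while offset < len(data):
        flags, type_code = data[offset], data[offset + 1]
        if flags & FLAG_EXT_LEN:
            # EL = 1: the attribute length field is 2 octets long.
            (length,) = struct.unpack_from("!H", data, offset + 2)
            offset += 4
        else:
            # EL = 0: the attribute length field is 1 octet long.
            length = data[offset + 2]
            offset += 3
        attrs.append({
            "optional": bool(flags & FLAG_OPTIONAL),
            "transitive": bool(flags & FLAG_TRANSITIVE),
            "partial": bool(flags & FLAG_PARTIAL),
            "extended_length": bool(flags & FLAG_EXT_LEN),
            "type_code": type_code,
            "value": data[offset : offset + length],
        })
        offset += length
    return attrs


example_attributes = (
    b"\x40\x01\x01\x00"                      # ORIGIN = 0 (well-known, transitive)
    + b"\x50\x02\x00\x04\x02\x01\xfd\xe9"    # AS_PATH with EL set: AS_SEQUENCE of AS 65001
    + b"\x80\x04\x04" + struct.pack("!I", 50)  # MULTI_EXIT_DISC = 50 (optional)
)
parsed_attributes = parse_path_attributes(example_attributes)
```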
Table 6.12 lists the path attributes used by BGP to describe routes. The ORIGIN,
NEXT_HOP and AS_PATH fields are mandatory fields. This information must always be
provided. Together with the NLRI (network layer reachability information) field in the BGP



update message, it enables complete routing table information to be advertised to BGP peers
for their calculation of the best route. The best route calculation is then undertaken according
to the local BGP policy and the selection of the shortest available route (least number of AS
transit hops) to the destination. BGP does not support load balancing across shortest routes of
equal ‘length’.

BGP notification message
BGP notification messages are used to notify protocol errors which may occur during a BGP
connection. Should such an error occur, then the connection is cleared immediately after
sending the notification message. The format of BGP notification messages is as illustrated in
Figure 6.26. Message meanings and field coding are according to Table 6.13.
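
A notification message can be decoded with a simple lookup, as the following sketch suggests. The function name and dictionaries are hypothetical, the layout (one octet of error code, one octet of error subcode, then optional data) is assumed from the BGP-4 specification, and the numbering of codes and subcodes follows that specification. Only the subcodes of the first two error categories are included, to keep the example short.

```python
ERROR_CODES = {
    1: "Message header error",
    2: "Open message error",
    3: "Update message error",
    4: "Hold timer expired",
    5: "Finite state machine error",
}

# (error code, error subcode) -> meaning; a partial list for illustration only.
ERROR_SUBCODES = {
    (1, 1): "Connection not synchronised",
    (1, 2): "Bad message length",
    (1, 3): "Bad message type",
    (2, 1): "Unsupported version number",
    (2, 2): "Bad peer AS",
    (2, 3): "Bad BGP identifier",
    (2, 4): "Unsupported optional parameter",
    (2, 5): "Authentication failure",
    (2, 6): "Unacceptable hold time",
}


def parse_bgp_notification(body: bytes) -> dict:
    """Decode a notification message body (the octets after the common header)."""
    if len(body) < 2:
        raise ValueError("notification needs an error code and an error subcode")
    code, subcode = body[0], body[1]
    return {
        "error": ERROR_CODES.get(code, f"unknown error code {code}"),
        "detail": ERROR_SUBCODES.get((code, subcode)),
        "data": body[2:],
    }


# Example: the peer rejected the hold time proposed in our open message.
notification = parse_bgp_notification(bytes([2, 6]))
```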

BGP keepalive message
BGP keepalive messages have a similar function to the hello packets of OSPF. The keepalive
message informs the peer BGP router, that despite not having sent a routing update message,
the router is still ‘alive’ and fully operational — and that all previously advertised reachable
destinations are still available. This prevents routing information from being aged and therefore
deleted and removed from routing table calculations. The keepalive message consists merely
of the BGP common header (Figure 6.24a, with the type field set to value ‘4’).

Table 6.13 BGP notification messages: meaning and coding of error code and error subcode

Error code   Error message category       Error subcode   Error message
1            Message header error         1               Connection not synchronised
                                          2               Bad message length
                                          3               Bad message type
2            Open message error           1               Unsupported version number
                                          2               Bad peer AS
                                          3               Bad BGP identifier
                                          4               Unsupported optional parameter
                                          5               Authentication failure
                                          6               Unacceptable hold time
3            Update message error         1               Malformed attribute list
                                          2               Unrecognised well-known attribute
                                          3               Missing well-known attribute
                                          4               Attribute flags error
                                          5               Attribute length error
                                          6               Invalid ORIGIN attribute
                                          7               AS routing loop
                                          8               Invalid NEXT_HOP attribute
                                          9               Optional attribute error
                                          10              Invalid network field
                                          11              Malformed AS_PATH
4            Hold timer expired
5            Finite state machine error

BGP route decision-making and BGP policy
Route calculation and decision-making under BGP are according to the shortest route. The
shortest route is that with the smallest number of autonomous system (AS) transit hops. No load
sharing is possible across alternative paths of a similar shortest length. In the case of multiple
paths being found to have the same ‘length’, the path with the lowest MULTI_EXIT_DISC
value (see Table 6.12) will be chosen.
The network administrator can influence the choice of routes by BGP by setting up a BGP
policy. The BGP policy affects which routing information received from BGP peer partners
is considered in the routing table calculation and also affects which routes are advertised.
By ignoring routing information from certain sources (this is done by incoming filtering),
the BGP router can be forced to use alternative routes. And by not advertising certain routes,
the use of these routes by other parties can be restricted. Two forms of non-advertisement may
be defined:
• NO_EXPORT will prevent routing updates from being advertised to external BGP peers
(outside the BGP confederation);
• NO_ADVERTISE will prevent routing updates from being advertised to any BGP peers
(this information will only be shared with interior gateway protocols (IGPs)).
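
The decision rule described in this section (fewest AS transit hops, with ties broken by the lowest MULTI_EXIT_DISC) and the effect of incoming filtering can be sketched as follows. The `Route` class, the `best_route` function and the AS numbers are hypothetical and serve only to illustrate the two rules named in the text; real BGP implementations typically apply further criteria which are not modelled here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Route:
    prefix: str          # destination IP address range
    as_path: tuple       # transit ASs, ending with the destination AS
    med: int = 0         # MULTI_EXIT_DISC value
    learned_from: int = 0  # AS number of the peer which advertised the route


def best_route(candidates: Iterable[Route],
               ignored_peers: frozenset = frozenset()) -> Optional[Route]:
    """Pick the shortest AS path; break ties with the lowest MULTI_EXIT_DISC."""
    # Incoming filtering: routes from ignored peers are never considered.
    usable = [r for r in candidates if r.learned_from not in ignored_peers]
    if not usable:
        return None
    return min(usable, key=lambda r: (len(r.as_path), r.med))


routes = [
    Route("203.0.113.0/24", (65010, 65020, 65030), med=0, learned_from=65010),
    Route("203.0.113.0/24", (65040, 65030), med=20, learned_from=65040),
    Route("203.0.113.0/24", (65050, 65030), med=10, learned_from=65050),
]
chosen = best_route(routes)
chosen_with_policy = best_route(routes, ignored_peers=frozenset({65050}))
```

Without a policy the route via AS 65050 is chosen (two hops, lower MULTI_EXIT_DISC). Once updates from AS 65050 are filtered out, the router is forced onto the alternative two-hop route via AS 65040.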

BGP communities
A BGP community allows a single BGP policy to be applied to a whole group of different
destinations. This makes for easier creation and maintenance of the policy. Each BGP community is classified according to whether routing updates may be advertised or not (i.e. as
subject to NO_EXPORT, NO_ADVERTISE or Local AS only policies as explained above).
An individual destination may belong to multiple BGP communities.
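
How a community restricts advertisement can be shown with a small Python sketch. The function name `may_advertise` is hypothetical. The two 32-bit values used for NO_EXPORT and NO_ADVERTISE are the well-known community values of the BGP communities specification, which are not quoted in the text, and the example community `0xFDE90064` is an arbitrary illustrative value.

```python
# Well-known community values (assumed from the BGP communities specification).
NO_EXPORT = 0xFFFFFF01
NO_ADVERTISE = 0xFFFFFF02


def may_advertise(communities: set, peer_is_external: bool) -> bool:
    """Decide whether a route carrying `communities` may be sent to a peer."""
    if NO_ADVERTISE in communities:
        return False  # not advertised to any BGP peer
    if NO_EXPORT in communities and peer_is_external:
        return False  # kept inside the local AS or confederation
    return True


# A destination may belong to several communities at once.
route_communities = {0xFDE90064, NO_EXPORT}
to_internal_peer = may_advertise(route_communities, peer_is_external=False)
to_external_peer = may_advertise(route_communities, peer_is_external=True)
```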

Management of BGP: route maps, route redistribution and route aggregation
A route map sets the conditions for route redistribution between different routing protocols.
The route map consists of set and match commands which create a defined condition, according
to which, routing information learned by one routing protocol (e.g. BGP or an IGP) will be
transferred to the other. Unmatched routing information is not transferred.
Obviously some level of route redistribution is required in order that externally reachable
destinations can be advertised to relevant internal routers and internal reachable destinations
similarly notified to exterior routers. For this, the link cost or distance metrics of different
interior and exterior gateway protocols need to be converted. To avoid the problems created
by this conversion, BGP route redistribution is discouraged as far as possible. In particular it
is recommended that BGP information relevant to a transit AS (autonomous system) not be
redistributed to an IGP, but instead be transferred from one BGP router to another by means
of IBGP (interior border gateway protocol).
Route aggregation is the term used to describe grouping a number of IP address ranges
together to share a single outgoing route. This is typically achieved by the use of a static route.
The fact that such manual intervention might still be considered by network administrators
reflects the desire to apply administrative ‘preferences’ to external routes rather than rely
entirely upon destination reachability information automatically received using BGP. On its
own, automatic route calculation by BGP is unable to take into account factors in route choice
such as the financial cost or reliability of a given third-party AS (autonomous system) network.

BGP confederation
A BGP confederation presents a group of autonomous systems (AS) externally as if they were
a single AS. This has the benefit of reducing the mesh of BGP peer relationships which might
otherwise have to exist and enables a tighter control of routing policies.



Route reflection
When BGP route reflection is used, an autonomous system (AS) is divided into multiple areas
called clusters, each cluster being assigned a CLUSTER ID (Table 6.12). The route reflector
clients (i.e. routers within the cluster) are only permitted to establish IBGP sessions with
the route reflector (RR) of the cluster. In this way, the requirement for full-meshing between
IBGP routers within an AS is reduced to only requiring a full mesh between the route reflectors
(RR). Like BGP confederation, this reduces the number of BGP connections which must be
established, so leading to more easily manageable networks.

Route flap dampening
Route flapping is the term applied to unstable routes, in particular when a given destination or
path to that destination is seen to oscillate between an ‘available’ and a ‘down’ state. Route
flapping is undesirable because it can lead to instability of the network as a whole, and so
BGP attempts to eliminate it using a technique called route flap dampening. In simple terms,
BGP ‘penalises’ flapping routes by making them appear artificially ‘longer’ during the routing
table calculation.

6.15 Problems associated with routing in source and destination
local networks
So far in this chapter, we have learned how routers employ routing protocols between themselves to exchange routing information and so build routing tables for the forwarding of IP
packets. Such mechanisms ensure that an IP packet can be delivered from a router in the
source network to the router in the appropriate destination network. The destination router
knows which networks are connected to it, using which type of physical interfaces (e.g. ethernet, serial line etc.). In addition, the router is aware which range of IP addresses are associated
with each directly connected interface (because this information is configured manually into
the router when the interface is set up). But our routing table is not quite complete! What the
router may not yet know, is the layer 2 address (sometimes called hardware address) which is
associated with a given IP address. Thus it could receive an IP packet destined for one of the
IP-ranges which it recognises as being ‘directly connected’ but be unable to deliver the packet
to the correct end-device because it does not know the LAN address (MAC address, IEEE
address or hardware address) which corresponds with the destination IP address. To overcome
the problem, the address resolution protocol (ARP) was devised.

Resolving a destination IP address for the associated hardware address
When the router in a given destination LAN receives an IP packet for an IP address which it can
identify as being ‘directly connected’ by means of a given shared medium interface (typically
a LAN) but does not already know the hardware address (i.e. layer 2 address, MAC address
or IEEE-address) of the destination device, then it broadcasts an ARP request to all stations
in the LAN (using the LAN or other layer 2 datalink protocol). The ARP request indicates the
IP address for which a hardware address is being sought (the target IP address). The station
which recognises its IP address (which in this case will have been manually configured in
its ‘network settings’) replies to the ARP request with an ARP response. The ARP response
packet replies with the target IP address and the related hardware address (Figure 6.27). This
enables the router to build yet another entry in its routing table: the hardware address of a
directly reachable destination IP address.

Figure 6.27 Identifying the hardware address associated with a target destination IP address using ARP.

How a device wishing to communicate gets a source IP address and discovers
the nearest router
Using routing protocols and ARP (address resolution protocol), routers are able automatically
to determine the best route to any reachable IP address. Nonetheless, communication using
the IP protocol cannot commence until the sending device also knows its IP address and the
default gateway address of the first router. The source IP address must be included in all sent
packets, which in the first instance need to be forwarded by the host to the first router in
the connection. This presents two problems: determining the IP address of a source host, and
knowing which is the first router (i.e. gateway) to which this host should forward IP packets.
There are two basic ways in which to assign IP addresses to hosts or other end-user devices:
either by manual configuration or by automatic assignment. Manual configuration is normally
used in the case of permanently assigned addresses, while temporarily assigned addresses
are normally automatically configured. There are various different protocols available for the
automatic assignment of IP addresses by a router to a source host. The best-known of these are:
• DHCP (dynamic host configuration protocol);
• BOOTP (bootstrap protocol);
• RARP (reverse address resolution protocol); and
• ES-IS (end system-intermediate system).
The job of ‘finding’ the first router can similarly either be manually performed (by configuring a so-called default gateway IP address) or automatically (when the process is called
router discovery).



Nowadays, DHCP (as defined in RFC 2131) has become the normal and most widely
used method of dynamically and automatically assigning IP addresses to source stations and
hosts for temporary periods and of router discovery. DHCP itself is really just a further
development and refinement of the bootstrap protocol (BOOTP). Before DHCP, BOOTP was
originally intended to assign an IP address to a host being newly booted (i.e. switched on) and
to deliver a boot file from the BOOTP server (typically a UNIX server in the past). The boot
file includes the default gateway address and other network settings. One of the disadvantages
of BOOTP is the continuing need for manual updating of the boot file. DHCP extends the
capabilities of BOOTP by arranging for automatic generation of configuration settings and the
control of a much wider range of IP-suite protocol parameters.
Once the IP address of the gateway (i.e. first router) is known, ARP (address resolution
protocol) may need to be used to derive the hardware address (i.e. the MAC address) of the
router. Following this, IP packets generated by the host can be forwarded to the first (default
gateway) router, from where they can be successfully forwarded to their destination.
Each host (e.g. PC in a LAN) is typically configured nowadays as a DHCP client. The
‘network settings’ of the PC are configured to receive an ‘automatic IP address assignment’
and to discover the ‘gateway IP address’. Meanwhile, the access router in the LAN acts as
the DHCP server. Dynamic allocation of IP addresses in this way makes for much easier
administration of IP addresses. It is much easier to move or sub-divide subnetwork ranges
of IP addresses to different parts of a network — without having manually to reconfigure the
IP addresses in each individual host. In addition, the temporary assignment of IP addresses
alleviates to some extent a worldwide shortage of IPv4 addresses.
Reverse ARP (RARP) is an alternative to BOOTP or DHCP. As the name suggests, it is
an adaptation of ARP (address resolution protocol) to allow the process to work ‘in reverse’.
Rather than wishing to resolve a target IP address for the corresponding hardware address (as
ARP does), hosts using RARP already know their own hardware address but need to request
the assignment of an IP address.

Router discovery
The ICMP router discovery protocol (IRDP) in IPv4 or neighbour discovery in IPv6
(RFC 2461) provides a means for hosts automatically to discover routers within their local
network. By making use of such automatic discovery, hosts need no longer be manually
configured with a static default gateway routing table entry. But while IRDP allows for the
automatic discovery of neighbouring routers, it does not allow the host to determine which
of possible multiple routers represents the best route to a given destination. For this purpose
there are two alternatives available to the network administrator:
• implement a static default gateway route after all; or
• configure the host silently to listen to routing information broadcasts of the neighbouring
routers using interior gateway protocols (IGPs) such as RIP (routing information protocol).
(We discussed earlier in the chapter how RIP-2 specifically includes a multicast address
for this purpose.)

ES-IS (end system-intermediate system) protocol
ES-IS is another protocol designed for router detection and address resolution. It was developed
principally as the protocol to be used in conjunction with the OSI (open systems interconnection) connectionless network service (CLNS).



ARP (address resolution protocol), RARP (reverse ARP) and inARP (inverse
ARP) message format
Much of the functioning of the address resolution protocol (ARP) can be understood merely
by inspecting the message format (Figure 6.28) and understanding the field codings:
• the hardware type is the type of network interface for which a hardware address is sought
(Table 6.14);
• the protocol type identifies the protocol whose address is being resolved (value 08-00 for the
Internet protocol, IP). The ARP message itself is carried in an ethernet LAN frame with the
type value 08-06;
• the hardware length field indicates the length of the hardware address in octets;
• the protocol length field indicates the length of the protocol address in octets (4 for IPv4);
• the operation field indicates the ARP message type (Table 6.15); and
• the remaining fields are the IP address and hardware address fields of the sender and target.

ARP (address resolution protocol) is used to discover the hardware address (i.e. layer 2 address,
MAC-address or IEEE address) associated with a given target IP address. It is defined by RFC
0826. Messages are broadcast in the format of Figure 6.28 within the IEEE 802.2 frame (LAN
or other layer 2 datalink frame).
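
The ARP message fields can be made concrete with a short Python sketch which builds and decodes an ARP request for an ethernet LAN. The function names are hypothetical. The field widths and the values used (hardware type 1 for ethernet, protocol type 08-00 for IP, hardware length 6, protocol length 4, operation 1 for a request) are assumed from RFC 826 and the IANA registry rather than read from Figure 6.28, and the addresses are documentation examples.

```python
import ipaddress
import struct

ARP_FORMAT = "!HHBBH6s4s6s4s"  # 28 octets for ethernet hardware and IPv4 addresses


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Build an ARP request asking for the hardware address of `target_ip`."""
    return struct.pack(
        ARP_FORMAT,
        1,        # hardware type: ethernet
        0x0800,   # protocol type: IP
        6,        # hardware address length in octets
        4,        # protocol address length in octets
        1,        # operation: ARP request
        sender_mac,
        ipaddress.IPv4Address(sender_ip).packed,
        b"\x00" * 6,  # target hardware address is not yet known
        ipaddress.IPv4Address(target_ip).packed,
    )


def parse_arp(packet: bytes) -> dict:
    """Decode an ethernet/IPv4 ARP message built with ARP_FORMAT."""
    htype, ptype, hlen, plen, oper, sha, spa, tha, tpa = struct.unpack(
        ARP_FORMAT, packet[:28]
    )
    return {
        "hardware_type": htype,
        "protocol_type": ptype,
        "operation": oper,
        "sender_mac": sha.hex(":"),
        "sender_ip": str(ipaddress.IPv4Address(spa)),
        "target_mac": tha.hex(":"),
        "target_ip": str(ipaddress.IPv4Address(tpa)),
    }


arp_request = build_arp_request(bytes.fromhex("020000000001"), "192.0.2.1", "192.0.2.20")
arp_fields = parse_arp(arp_request)
```

The station owning the target IP address would answer with the same format, the operation set to 2 (ARP response) and its own hardware address filled in as the sender hardware address.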

Figure 6.28 ARP (address resolution protocol) message format.
Table 6.14 ARP (address resolution protocol), RARP (reverse ARP), inverse ARP (inARP),
BOOTP (bootstrap protocol) and DHCP (dynamic host configuration protocol): hardware type
field coding

Network hardware type                 Network hardware type code
Ethernet (pre-IEEE 802)               1
IEEE 802-networks                     6
Frame Relay                           15
ATM (asynchronous transfer mode)      16


Table 6.15 ARP (address resolution protocol), RARP (reverse ARP) and inverse ARP (inARP):
operation field coding

Message type                     Operation field code value
ARP request                      1
ARP response                     2
RARP request                     3
RARP response                    4
Inverse ARP (inARP) request      8
Inverse ARP (inARP) response     9

RARP (reverse address resolution protocol) is an adaptation of ARP used for triggering the
assignment of an IP address to a given source host which already knows its own hardware
address. The message format is the same as ARP (Figure 6.28 and Tables 6.14 and 6.15). Like
ARP request messages, RARP request messages are also broadcast to all stations on the LAN,
but using the LLC (logical link control) protocol type 80-35. RARP is defined in RFC 0903.
A further variation of ARP is the inverse address resolution protocol (inARP). Somewhat
like RARP, inARP is used to request a protocol address corresponding to a given hardware
address. Specifically, inARP is designed to be used in conjunction with frame relay stations, to
discover the IP addresses of remote stations for which only the frame relay connection address
(the DLCI — data link connection identifier) is known by the local router. This is important
for the discovery of neighbouring routers and in the election of designated routers and back-up
designated routers as we discussed earlier in the chapter. inARP is defined in RFC 2390.

BOOTP (bootstrap protocol) and DHCP (dynamic host configuration protocol)
The bootstrap protocol (BOOTP) is a protocol by which a client (i.e. a host in an ethernet or
other LAN) may obtain an IP address, the IP address of a server and the name of a bootfile. It
typically ran on a UNIX server. DHCP (dynamic host configuration protocol) is an extension
of BOOTP to allow dynamic configuration of the entire IP-suite protocol software of a host
(DHCP client) by a DHCP server. Typically DHCP servers are routers. BOOTP is defined
in RFC 0951. DHCP is defined in RFC 2131. The interoperation of BOOTP and DHCP is
defined in RFC 1534. BOOTP and DHCP share the same packet format (Figure 6.29).
The fields of the BOOTP/DHCP message are coded as follows:
• the message type is either a BOOTP request (value ‘1’) or a BOOTP reply (value ‘2’);
• the hardware type is usually an ‘IEEE 802.2’-based network (value ‘6’);
• the hops field records the number of hops across which the packet has been forwarded in
an attempt to find a BOOTP or DHCP server. A maximum of 16 hops are allowed before
the message must be discarded;
• the transaction ID (XID) contains any random value to link BOOTP/DHCP requests and
replies by a common identifier. This enables recognition of the reply;
• the seconds field records the elapsed time in seconds since the client sent the request;
• the first bit (i.e. most significant bit) of the flags field is the broadcast flag. When set, this
indicates that the reply should be sent to the network broadcast address rather than the
client unicast address. All other bits in the flag field should be set to value ‘0’;