9 The ISO management model: FCAPS, TMN, Q3 and CMIP/CMISE




Managing the network

Figure 9.16 ITU-T telecommunications management network (TMN) model for network management
architecture (ITU-T/M.3010) [reproduced courtesy of ITU].

Configuration management is the maintenance of network topology information, routing
tables, numbering, addressing and other network documentation and the coordination of network configuration changes.
Accounting management is the collection and processing of network usage information as
necessary for the correct billing and invoicing of customers and the settlement with other
operators for the use of their interconnected networks to deliver packets.
Performance management is the job of managing the traffic flows in a live network: monitoring the traffic flows on individual links against critical threshold values, and extending the capacity of links or changing the topology by adding new links where necessary. It is the task of ensuring that performance meets the design objectives (e.g. those set out in a service level agreement).
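The threshold monitoring described above can be illustrated with a short sketch. It assumes, hypothetically, that the management system polls a 32-bit octet counter on each link (in the style of an SNMP Counter32 interface counter) at a fixed interval; the function names, link names and the 80% threshold are illustrative choices, not part of any particular product.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit counter wraps back to zero after 2**32 - 1


def octet_delta(previous: int, current: int) -> int:
    """Octets counted between two polls, allowing for at most one counter wrap."""
    if current >= previous:
        return current - previous
    return COUNTER32_MODULUS - previous + current


def utilisation(previous: int, current: int, interval_s: float, link_bps: float) -> float:
    """Average utilisation of a link (0.0 to 1.0) over the polling interval."""
    bits = octet_delta(previous, current) * 8
    return bits / (interval_s * link_bps)


def links_over_threshold(readings, threshold=0.8):
    """Names of links whose utilisation exceeds the threshold.

    readings: iterable of (link_name, previous, current, interval_s, link_bps)
    """
    return [name for name, prev, cur, interval, bps in readings
            if utilisation(prev, cur, interval, bps) > threshold]
```

For example, a 1 Mbit/s trunk whose counter advanced by 7,200,000 octets in 60 seconds was 96% utilised and would be flagged, while a link whose counter wrapped past zero during the interval is still measured correctly because `octet_delta` allows for one wrap.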
Security management functions include:
• identification, validation and authorisation of network users and network management
operators;
• security of data;
• confirmation of requested network management actions, particularly commands creating
or likely to create major network or service upheaval;
• logging of network management actions (for recovery purposes when necessary, and also
for fault auditing); and
• maintenance of data consistency using strict change management.
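A minimal sketch of three of the security functions listed above (authorisation of operators, confirmation of disruptive commands and logging of management actions) might look like the following. The class, operator names and command strings are hypothetical, and a real system would also need to authenticate users and protect the log itself.

```python
from datetime import datetime, timezone


class ManagementConsole:
    """Toy front end that authorises, confirms and logs management commands."""

    def __init__(self, authorised_operators, disruptive_commands):
        self.authorised = set(authorised_operators)
        self.disruptive = set(disruptive_commands)  # commands likely to cause major upheaval
        self.audit_log = []  # (UTC timestamp, operator, command, outcome)

    def execute(self, operator, command, confirmed=False):
        if operator not in self.authorised:
            outcome = "rejected: operator not authorised"
        elif command in self.disruptive and not confirmed:
            outcome = "held: confirmation required"
        else:
            outcome = "executed"  # a real system would now act on the network
        # Every attempt is logged, successful or not, for recovery and fault auditing.
        self.audit_log.append((datetime.now(timezone.utc), operator, command, outcome))
        return outcome
```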

TMN management model
The telecommunications management network (TMN) is the management infrastructure developed by ITU-T for full-scale management of complex networks, in particular carrier networks. While it is not directly relevant to IP-based networks, many network operators may in practice encounter its terminology and interfaces. For this reason, we present it here. Figure 9.16 illustrates the architecture of management system components in a TMN.
The most important components of the architecture are as follows:
• the operations system (OS): a combination of hardware and software, what in common language is called a ‘network management system’ or ‘network management server’;
• the data communications network (DCN): generally conceived as a dedicated data communications network for collecting and distributing network management information (this function could also be provided by the data network being managed);
• the network element (NE) being managed;
• the Q3-interface: the interface over which CMIP (common management information protocol) is intended to be used. CMIP is a protocol similar to SNMP, but more powerful and more complex;
• the mediation device (MD): the equivalent of an SNMP proxy agent. It converts management messages from the standard Q3-format to non-standard ‘proprietary’ formats; and
• the workstation (WS): the computing device (e.g. PC or workstation) used by human network managers to access the network management system (i.e. the OS).
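The mediation device's job of converting standard messages into a proprietary format can be sketched as a simple translation table. Everything here is hypothetical: the request dictionary only loosely imitates a Q3-style request, the attribute names are merely in the style of OSI state attributes, and the vendor command syntax is invented.

```python
# Hypothetical mapping from (operation, standardised attribute) to one vendor's
# proprietary command syntax. Both sides are invented for illustration.
PROPRIETARY_COMMANDS = {
    ("get", "operationalState"): "SHOW PORT {port} STATE",
    ("set", "administrativeState"): "SET PORT {port} ADMIN {value}",
}


def mediate(request: dict) -> str:
    """Translate a standard-style management request into a proprietary command string."""
    key = (request["operation"], request["attribute"])
    template = PROPRIETARY_COMMANDS.get(key)
    if template is None:
        raise ValueError(f"mediation device has no mapping for {key}")
    # str.format ignores keyword arguments the template does not use.
    return template.format(port=request["object"], value=request.get("value", ""))
```

A real mediation device would also translate the device's replies back into the standard format; only the outbound direction is shown here.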
Figure 9.17 illustrates the typical split of TMN functions between a network manager (typically
server hardware and software) and the agent (network management functions residing in the
network element being managed). It also shows the interfaces (F, Q3, Qx, X) intended to be
standardised as part of TMN.
Figure 9.18 shows the five layers of functionality defined by the OSI (open systems interconnection) management model for a TMN. The layers are intended to help in the clear and rational definition of the boundaries between TMN operations systems. By simplifying the data and communication relationships between software applications and management systems, they simplify the definition, design and realisation of those applications and systems.
At the lowest functional layer of Figure 9.18 (network element layer, NEL) are the network elements themselves. These are the active components making up the networks. Above them, in the second layer of the hierarchy, is the element management layer (EML) containing element managers. Element managers are devices which provide for network management and control of single network components or subnetworks (e.g. a local management terminal or a proprietary network management system).

Figure 9.17 Typical division of functionality between a TMN network manager and its agents.

Figure 9.18 The functional layers of TMN.
At the third layer, the network management layer (NML), are the network managers, which control all the subnetworks of a given network type (e.g. ISDN or ATM).
The service management layer (SML) contains service managers which monitor and control all aspects of a given service on a network-independent basis. Thus, for example, it is possible in theory to conceive of a frame relay service provided over three different network types: a packet-switched network, ISDN and ATM. The service manager ensures that each customer's connection line is installed on the appropriate network(s) and monitors the service delivered against the contracted frame relay service level agreement.
The business management layer (BML) contains functionality necessary for the management
of a network operator’s company or organisation as a whole. Thus purchasing, bookkeeping
and executive reporting and information systems are all resident in this layer.

The Q3-interface and the common management information protocol (CMIP)
Crucial for successful communication between manager and agent over the Q3-interface of the TMN (telecommunications management network) architecture are:
• the definition of a standard protocol (set of rules) for communication (this is the common management information protocol, CMIP); and
• the definition of standardised network status information and control messages for particular standardised types of network components (the managed objects and management information base, MIB).


CMIP (common management information protocol) delivers the common management information service (CMIS) to the operations system (OS) of Figure 9.16. CMIS is the service allowing a CMISE (common management information service entity, i.e. a software function) in the manager (the ‘OS’ of Figure 9.16) to exchange status information and network control commands with a CMISE in each of the various agents. CMIP itself is an OSI layer 7 protocol (rather like SNMP), which sets out the rules by which the information and commands (CMISE services) may be conveyed from manager to agent or vice versa. There are only a small number of these basic CMISE services. They are listed in Table 9.6. Note the similarity with the basic protocol messages (PDUs, protocol data units) of SNMP (simple network management protocol).
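To make the basic CMISE services more concrete, the following sketch models a toy agent holding managed objects and answering M-CREATE, M-DELETE, M-GET (with a filter), M-SET and M-ACTION style requests, with M-EVENT-REPORT reduced to a list of recorded events. It is a hypothetical in-memory illustration, not CMIP itself: there is no encoding, no transport and no M-CANCEL-GET, and the class, attribute and action names are invented.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedObject:
    name: str          # instance name, e.g. "port-1" (hypothetical naming scheme)
    object_class: str  # e.g. "port"
    attributes: dict = field(default_factory=dict)


class ToyAgent:
    """Greatly simplified agent answering CMISE-style requests on its managed objects."""

    def __init__(self):
        self.objects = {}
        self.events = []  # stands in for M-EVENT-REPORT notifications to the manager

    def m_create(self, name, object_class, **attributes):
        self.objects[name] = ManagedObject(name, object_class, dict(attributes))
        self.events.append(("created", name))

    def m_delete(self, name):
        del self.objects[name]
        self.events.append(("deleted", name))

    def m_get(self, object_class, attribute, filter_fn=lambda attrs: True):
        """Return {name: value}; filter_fn limits the scope of the reply."""
        return {obj.name: obj.attributes.get(attribute)
                for obj in self.objects.values()
                if obj.object_class == object_class and filter_fn(obj.attributes)}

    def m_set(self, name, attribute, value):
        self.objects[name].attributes[attribute] = value

    def m_action(self, name, action, condition=lambda attrs: True):
        """Perform an action only if the stated condition holds; return whether it ran."""
        if action != "resetCounters":  # the only (hypothetical) action this toy supports
            raise ValueError(f"unsupported action {action!r}")
        obj = self.objects[name]
        if not condition(obj.attributes):
            return False
        obj.attributes["errorCount"] = 0
        return True
```

As with SNMP, the manager sees the network element only through the attributes and operations of its managed objects; M-GET and M-SET correspond roughly to SNMP's GetRequest and SetRequest, and M-EVENT-REPORT to an SNMP trap.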

9.10 Tools for network management
Over the years, a range of different software-based tools has emerged for network-managing
large and complex data networks. The market demand for good tools is strong, and a number
of the tools have become ‘household names’ in the telecommunications industry, but there
is still no single tool which is up to the ‘whole job’. Instead, de facto standard solutions for
specific tasks have appeared (i.e. specialised solutions for fault management, order-processing
and configuration, network service performance management, etc.).
Tools are beginning to mature, but nonetheless continue to be developed. Specialist network
management software manufacturers are typically trying to expand their areas of ‘specialism’
into neighbouring areas — so that they may take market share from competing products. Meanwhile, many manufacturers of network components (so-called network elements) do not invest
much effort in their ‘proprietary’ element manager (EM) network management software. Their
belief (rightly or wrongly) is often that an expensive network management solution may count
against the choice of their networking products.
Some network management tool manufacturers would like you to believe that they can offer a ‘complete’ or ‘umbrella’ solution for everything: order processing, provisioning, configuration, performance management, billing, fault management and SLA (service level agreement) management. But even the most comprehensive solutions are rarely more than a ‘patchwork’ of different ‘specialised function’ systems (e.g. for fault management, configuration management, etc.) which have been loosely tied together.

Table 9.7 The basic CMISE (common management information service entity) services [CMIP protocol commands]

Service name     Function
M-ACTION         This service requests an action to be performed on a managed object, defining any conditional statements which must be fulfilled before committing to the action.
M-CANCEL-GET     This service is used to request cancellation of an outstanding, previously requested M-GET command.
M-CREATE         This service is used to create a new instance of a managed object (e.g. a new port now being installed).
M-DELETE         This service is used to delete an instance of a managed object (e.g. a port being taken out of service).
M-EVENT-REPORT   This service reports an event, including managed object type, event type, event time, event information, event reply and any errors.
M-GET            This service requests status information (attribute values) of a given managed object. Certain filters may be set to limit the scope of the reply.
M-SET            This service requests the modification of a managed object attribute value (i.e. is a command for parameter reconfiguration).
Perhaps the best-known and most widely used system is Hewlett Packard’s OpenView.
OpenView started life as a popular SNMP manager for collecting SNMP alarms and monitoring
the current network performance status. It is still one of the most popular network management
products for this function. However, over time, Hewlett Packard have started to offer a range
of complementary software for other network management functions under the same marketing
name, ‘OpenView’. Some of these products were originally developed by partner or start-up companies which were subsequently acquired by Hewlett Packard, and have since been adapted for ‘integration’ into the original OpenView software.
A product with similar functionality to Hewlett Packard’s OpenView is IBM’s NetView. NetView is popular among large enterprise organisations with established large networks of IBM mainframe computers, since it allows effective management of both the network and the computer software applications running across it, enabling human system operators to diagnose quickly the root cause of any problems which might arise.

Configuration management
Historically, the systems available for end-to-end configuration management of complex
networks have not been sophisticated enough to cope with the complexity of the task. Configuration management requires that a single consistent database be maintained about the complete
topology and configuration of the network. For many years, it was impossible to even create
such a database, let alone maintain it, because of the lack of standardised MIB (management information base) definitions of all the possible managed objects which go to make up
a network.
While nowadays many MIBs and managed objects have been defined, there are endless
possibilities for how different objects may be related to one another. Thus, for example, an
end-to-end data connection between two inter-communicating computers might traverse a large
number of different types of connections represented by different types of managed objects
(Figure 9.19). To understand the entire end-to-end connection, these objects need to be appropriately related to one another. But how do you easily relate objects of a different nature to one another? Let us consider Figure 9.19 in detail.
The ‘connection’ between the Internet user’s PC and the server in Figure 9.19 traverses a
number of different networks and configuration types. There is a dial-up connection across the
public switched telephone network (PSTN), followed by an IP ‘connection’ across a router
network and an Ethernet LAN ‘connection’ at the far end. But to complicate things, the two
routers forming the IP part of the network are actually interconnected by means of a frame
relay network. Next, let us consider the effect of a failure in the frame relay network at the
centre of this connection. Ideally, as the network manager responsible for the connection from
the PC to the server I would like to be immediately informed of the cause of the failure. But
how can I relate the frame relay network line failure to the knock-on service problems it will
cause? The answer is — with great difficulty.
A pragmatic approach to working out the list of customer connections affected by a particular frame relay network connection failure in Figure 9.19 might appear to be to make a list of customer names or network addresses and associate this list with the frame relay line ‘object’ in the network configuration database. In a few cases, such an approach (coupled with a large amount of effort to maintain the database) might work. But in most cases it is impractical. Each time the PC user in Figure 9.19 dials in to the network, he or she may arrive at a different port on the router and be allocated a different IP address by DHCP (dynamic host configuration protocol). In addition, the IP path to the destination router may change according to traffic conditions or due to changes in the topology of the router network. In short, the connection may take a different path through the frame relay network each time, so the knock-on effect of the failure of any individual frame relay trunk is difficult to predict.

Figure 9.19 Example of an end-to-end ‘connection’ traversing different types of networks.
The network manager of a network like that illustrated in Figure 9.19 faces three main
problems when either trying to configure new ‘connections’ ordered by customers or when
trying to trace the cause of faults. These are:
• the different components of the network (corresponding to the different managed objects in the MIB of the relevant network element) can be combined in many different ways to create an overall end-to-end connection (there is no simple set of ‘hard-and-fast’ rules about how to combine the different components);
• the relationship of different components to one another may be on a 1:1 basis, or alternatively on a 1:n, n:1 or n:n basis (this makes it very difficult to conceive a simple network configuration database structure capable of recording the correct one of many possible network topology permutations); and, worst of all,
• the network topology is changing all the time. It is affected by the provision of new customer lines and trunks, new nodes, current traffic loading and current network failures.
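One way of recording the 1:n, n:1 and n:n relationships described above is a directed dependency graph, in which each managed object lists the objects it relies on. The impact of a failure is then every object that can be reached by following the dependencies in reverse. The sketch below assumes, hypothetically, that the graph is an up-to-date snapshot of Figure 9.19 with invented object names; keeping such a snapshot current as the topology changes is exactly the hard part described in the text.

```python
from collections import defaultdict, deque


def build_dependents(depends_on):
    """Invert 'X depends on Y' relationships into Y -> set of objects that depend on Y."""
    dependents = defaultdict(set)
    for obj, supporting in depends_on.items():
        for s in supporting:
            dependents[s].add(obj)
    return dependents


def affected_by(failed, depends_on):
    """Breadth-first search for every object directly or indirectly depending on 'failed'."""
    dependents = build_dependents(depends_on)
    affected, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for obj in dependents.get(current, ()):
            if obj not in affected:
                affected.add(obj)
                queue.append(obj)
    return affected
```

If the PC-to-server connection relies on an IP link that runs over frame relay PVC 17, which in turn uses trunk 3, then `affected_by("fr:trunk-3", ...)` returns the PVC, the IP link and the end-to-end connection, but not a customer whose PVC runs over a different trunk.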
The complexity of creating a configuration management system is indeed awesome. But the potential reward of significantly higher levels of network service and quality has spurred many network operators and network management software developers to attempt a solution.
Much money has been spent and some interesting approaches have been developed but much
work still has to be done.
Perhaps one of the most effective tools for end-to-end management of ‘connections’ across a network is the NetProvision tool from Syndesis. This aims to provide a solution for end-to-end network connection ‘service creation and activation’.


There is a wide range of ‘provisioning support’ and ‘service activation’ software tools
available, but many of these are oriented to the realisation of a ‘simple’ customer network
access line rather than for the provision of the end-to-end connection across an entire network.
Typically these systems are designed to:
• check that a network port is available at the relevant first network node to connect the
customer premises (order and schedule new equipment if necessary);
• check that a line is available within the local cabling network to connect the customer
premises to the port;
• allocate an available network address (as appropriate);
• schedule the installation manpower to undertake the task; and
• confirm the installation date to the customer.
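The provisioning checks listed above can be sketched as a single function working through an inventory. The inventory structure, identifiers and error messages are hypothetical; scheduling of installation staff and confirmation to the customer are reduced to recording an agreed date.

```python
from dataclasses import dataclass


@dataclass
class Inventory:
    free_ports: dict      # network node -> list of free port identifiers
    free_lines: dict      # customer premises -> list of free local cable pairs
    free_addresses: list  # pool of unallocated network addresses


def provision_access_line(inv, node, premises, install_date):
    """Check availability, then allocate port, line and address for a new access line."""
    # All checks come first, so a failed order leaves nothing half-allocated.
    if not inv.free_ports.get(node):
        raise LookupError(f"no free port at {node}: order new equipment")
    if not inv.free_lines.get(premises):
        raise LookupError(f"no free local line to {premises}")
    port = inv.free_ports[node].pop(0)
    line = inv.free_lines[premises].pop(0)
    address = inv.free_addresses.pop(0) if inv.free_addresses else None
    return {"port": port, "line": line, "address": address, "install_date": install_date}
```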
Some ‘provisioning’ systems are linked to ‘customer service’ databases, and are intended to
track the quality achieved on customer lines while in operation. In this case, the quality of the
line is assumed to be adequate except during times when the ‘fault’ or ‘trouble ticket’ database
system has received a ‘live’ fault report for the connection. Such an approach assumes that
the customer will complain when his service is not working and thus that the customer is the
main means of quality monitoring.
The number of tools available for assisting the planning and configuration of networks
increases daily, but the task remains heavily dependent on skilled network engineers and
technicians; their knowledge of how to combine different types of components into reliable
network services and their experience of diagnosing and tracing faults encountered with such
networks. The ultimate ‘umbrella network manager’ (the ‘glue’ between the different network
management tools) is still a human engineer!
Below, we review some of the common tools used for configuration of Internet network components:
• Cisco IOS (Internetworking Operating System) is the operating system of Cisco routers and other Cisco devices, configured largely through a text-based command language. It is used for configuring network services and networked applications in a standard manner, and is intended to provide a unified and homogeneous way of configuring devices and of controlling complex and distributed network information;
• CiscoWorks is a tool designed for optimising network traffic and managing router access lists;
• Cisco ConfigMaker is a software tool intended for the configuration of small router networks; and
• Juniper’s JunOS (Juniper Operating System) is the Juniper equivalent of Cisco’s IOS and is used to configure Juniper’s routers.
An approach used in some SNMP-based ‘umbrella’ network manager systems (and possible
with HP OpenView, among other systems) is to enable the configuration of all the different
network device types from a ‘single workstation’. For some network operators, such a ‘single workstation’ approach has been important, because of the impracticability and costs of
multiple video screens and keyboards for each network management operator. Nowadays it
is becoming increasingly common to find that such a ‘single workstation’ approach is based
on the standard use of SNMP to monitor and configure all the network components directly.
In the past, however, it was not uncommon for the configuration software packages of the different network components simply to be ‘hidden’ behind a shared graphical user interface. Thus the ‘click’ to configure one component of the network (shown on a common topology diagram) would launch a different configuration program from the ‘button’ for configuring a different kind of device.

Fault management tools
Faults in an IP-based data network are usually discovered as the result of either:
• the receipt of an SNMP alarm or event message (an SNMP trap); or
• the reported complaint of a customer or end-user to the help desk.
It is usual for both types of ‘fault report’ to be recorded by the issue of a trouble ticket by a fault management system. Probably the best-known fault management and trouble ticket system is the Remedy system (nowadays marketed by Peregrine Systems).
Faults reported by humans are entered into the trouble ticket system by hand either by
helpdesk or customer service representatives. Sometimes this software is also integrated into
call centre system software (in order that customer details can be automatically filled out by
derivation from the calling telephone number). A trouble ticket number is issued and the ticket
remains ‘open’ until a technician has diagnosed the cause of the fault, rectified the problem
and ‘handed over’ the use of the network back to the customer or end-user. At this point, the
trouble ticket is closed with an explanatory report classifying the problem and its resolution.
The trouble ticket system provides valuable statistics for analysing the quality of network
service achieved, and can thus be used to manage service level agreements (SLAs) made with
end-users.
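The lifecycle of a trouble ticket described above (issued, open while the fault is diagnosed and fixed, then closed with a classification of the problem) can be sketched as a small record type. The field names, numbering scheme and problem classes are hypothetical rather than those of any particular product such as Remedy.

```python
import itertools
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

_next_number = itertools.count(1)  # hypothetical ticket numbering scheme


@dataclass
class TroubleTicket:
    description: str
    reported_by: str  # e.g. "helpdesk" or "snmp-trap"
    number: int = field(default_factory=lambda: next(_next_number))
    status: str = "open"
    problem_class: Optional[str] = None
    resolution: Optional[str] = None

    def close(self, problem_class: str, resolution: str) -> None:
        """Close the ticket once the fault is fixed and service handed back."""
        if self.status != "open":
            raise ValueError(f"ticket {self.number} is already closed")
        self.status = "closed"
        self.problem_class = problem_class
        self.resolution = resolution


def closed_tickets_by_class(tickets):
    """Simple service-quality statistic: number of closed tickets in each problem class."""
    return Counter(t.problem_class for t in tickets if t.status == "closed")
```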
By integrating an SNMP-based ‘umbrella’ network management system (such as Hewlett
Packard’s OpenView) into a trouble ticket system, certain ‘critical’ SNMP trap messages
(network alarms) can be made to generate a trouble ticket automatically. Such automatic
generation of trouble tickets is standard practice in large scale networks. It can be a handy
way of ‘calling out’ a technician using the ‘dispatch’ and ‘scheduling’ functions of the trouble
ticket system. In addition, the automatic generation of trouble tickets leads to a much more
precise measurement of the achieved standard of service quality and availability.
For the monitoring and diagnosis of network faults, SNMP-based ‘umbrella’ network
management systems are generally used. These typically filter and correlate the plethora
of information received by means of SNMP (SNMP traps, alarms, events as well as other
information) in an attempt to locate the ‘root cause’ of a problem. One of the greatest problems is sifting through the deluge of information which a single network failure can lead to.
Thus, for example, a frame relay connection failure in the network of Figure 9.19 will lead to
a whole range of different alarm, event and other SNMP messages being reported. The two
routers at either end of the frame relay connection will report the loss of the trunk on a given
port. The PC meanwhile will lose its end-to-end connection with the server. It might notice
immediately, or may have to conclude as the result of a timeout (the server having failed to
respond) that this connection has been severed. The PC will report that the ‘connection to the server has been lost’. By filtering and alarm correlation, the ‘umbrella’ network management system is programmed to conclude that the ‘connection to the server has been lost’ because of the ‘link failure on router port X’. In consequence it prioritises the link failure fault for the immediate attention of the human network manager.
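The filtering and correlation step can be sketched with the same kind of dependency information used for configuration management: an alarm is treated as a mere symptom if something it depends on (directly or indirectly) is also in alarm. This is a hypothetical, much-simplified rule with invented object names; commercial correlation engines use far richer rules.

```python
def upstream(obj, depends_on):
    """Everything 'obj' depends on, directly or indirectly (iterative depth-first search)."""
    seen, stack = set(), [obj]
    while stack:
        for supporting in depends_on.get(stack.pop(), ()):
            if supporting not in seen:
                seen.add(supporting)
                stack.append(supporting)
    return seen


def root_causes(alarmed_objects, depends_on):
    """Alarms not explained by an alarm on something they depend on."""
    alarmed = set(alarmed_objects)
    return {a for a in alarmed if not (upstream(a, depends_on) & alarmed)}
```

If the PC's connection to the server depends on router 1 port X and router 2 port Y, and both ports depend on the same frame relay trunk, the PC's ‘connection lost’ alarm is suppressed as a symptom, leaving the two router port alarms (both ends of the failed trunk) as candidate root causes.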
The correlation and filtering of alarms usually rely on human experience and judgement, though some ‘expert’ software in fault diagnosis systems is able over time to ‘learn from experience’ (by working out which root cause and resolution human technicians most frequently determined on previous occasions). A good dose of human input is required in the job of tracing, diagnosing and correcting network faults!
Two of the most popular network management software tools for monitoring and managing network faults are Hewlett Packard’s OpenView and Micromuse’s Netcool products. As
a simple tool for small networks, CiscoWorks is also widely used for troubleshooting and
network optimisation.

Localisation of network faults
The localisation of faults within a network is often carried out by checking sections of the
end-to-end path in turn, using loopbacks. Working from one end (say, the PC in Figure 9.19),
the technician attempts to locate the point in the end-to-end connection where continuity has
been lost. First he or she checks that the PC is getting a response from the modem and can
‘talk’ in both directions with the modem. This is done by setting a loopback condition at the
modem and then sending a test signal. Provided the test signal is returned by the modem in
the loopback condition, then the technician concludes that the PC, the modem and the line
between them are all working ‘OK’. Next, the loopback at the modem is removed and replaced
with a loopback at the first router. If the line between the modem and the router is faulty,
then the new test signal from the PC will not be returned. If, on the other hand, the signal
is returned, then the line from the PC as far as the router is assumed to be ‘OK’. Steadily
the technician checks each progressive link in the connection until the faulty link is found.
More detailed checks can then go into determining the precise cause of the fault and the most
appropriate remedy.
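The step-by-step procedure above amounts to moving the loopback progressively further from the test set until the test signal stops returning. In the sketch below, `loopback_test` is a hypothetical stand-in for setting a loopback at a given point and sending a test signal; the point names follow Figure 9.19 but are invented.

```python
def locate_fault(loopback_points, loopback_test):
    """Move the loopback progressively further from the test set.

    loopback_points: points where a loopback can be set, nearest first.
    loopback_test:   callable(point) -> True if the test signal is returned.
    Returns (last_good_point, first_failing_point). last_good_point is None if the
    very first section fails; first_failing_point is None if every test passed.
    """
    last_good = None
    for point in loopback_points:
        if not loopback_test(point):
            return last_good, point
        last_good = point
    return last_good, None
```

With loopback points `["modem", "router-1", "router-2", "lan-switch"]` and a fault between the two routers, the function returns `("router-1", "router-2")`: the fault lies in the section between those two points, where more detailed checks can begin.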
There are a number of different ways in which loopbacks (or equivalent tests) can be
conducted. Since I have myself discovered technicians struggling to interpret the results of
different types of loopback tests, I think it may be worth explaining how some of the different
types work. Figure 9.20 illustrates a number of different loopback-type tests.
The PING (packet Internet groper) test is specific to IP networks. The other tests shown
in Figure 9.20 are commonly used standard loopback tests used in telecommunication line
transmission testing. The tests function in very different manners. Understanding how they
work is critical to understanding the results of the test!
Figure 9.20a shows the configuration of a connection in ‘normal operation’. The connection
commences at node A and traverses node B. Since the communication is duplex there is a
separate path used for transmit and receive directions of transmission.

Figure 9.20 Different types of network loopback tests.


Figures 9.20b and 9.20c both show physical path loopbacks. Telecommunications line transmission equipment in particular (including line terminating devices such as CSUs, DSUs, NTEs and NTs; see Chapter 3) typically allows such loopbacks to be applied by means of remote network management commands to the device (device B in both figures). Such loopbacks return the physical layer (layer 1) protocol signal unchanged to the sender (device A). A tone-generating device or a BERT (bit error ratio tester) is the correct type of test device to send signals to such a loopback. An IP (Internet protocol) packet, on the other hand, is an inappropriate test signal in the case of a physical loopback. The problem is that device A has to send a packet with the same source and destination address (its own IP address). IP devices such as routers usually discard such ‘unallowed’ packets. You might therefore conclude (incorrectly) that the path from node A to node B was broken, when in fact the packet failed to return only because it was (rightly) discarded. Use care when using this type of loopback! During the period of the loopback, the transmission path from node A to the remote destination is cut.
Figures 9.20d, 9.20e and 9.20f are all special types of loopback tests developed for testing the transmission continuity of different types of data network and layered protocols.
Figure 9.20d illustrates the widely used PING (packet Internet groper) test, which checks the continuity of IP (Internet protocol) paths between routers, hosts and other IP-network components. An IP packet containing an ICMP (Internet control message protocol) echo request message is sent to the IP address of a particular node in the network (in the case of Figure 9.20d the addressed node is node B). When node B receives the packet, it returns an echo reply message carrying back the contents of the original request. Since the sender normally includes a timestamp in the request, PING messages are a useful way not only of checking continuity, but also of determining the round-trip delay along the route (calculated by node A from the timestamp returned in the reply). By means of the IP record-route option, the packet can also be made to record the intermediate nodes it traverses (up to a limit of nine addresses). So, in addition, PING can be used to determine the path of IP ‘connections’ through a router network. PING is a simple but very valuable means of localising problems in IP-based networks!
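The echo request and round-trip delay calculation can be sketched as follows. This only builds the ICMP echo request (type 8, code 0, with the standard Internet checksum) and computes the delay from the echoed timestamp; actually sending it needs a raw socket and usually administrator privileges. Putting the send time into the payload as an 8-byte floating-point number is an assumption of this sketch; ping implementations differ in their payload format.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type 8 = echo request; the reply is type 0


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold any carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, send_time: float) -> bytes:
    """ICMP echo request whose payload carries the sender's timestamp."""
    payload = struct.pack("!d", send_time)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload


def round_trip_time(reply_payload: bytes, receive_time: float) -> float:
    """Delay computed by the sender from the timestamp echoed back in the reply."""
    (send_time,) = struct.unpack("!d", reply_payload[:8])
    return receive_time - send_time
```

Because the echo reply returns the payload unchanged, node A can compute the delay from its own clock alone, without needing a synchronised clock at node B. Recomputing the checksum over a correctly formed packet yields zero, which is how a receiver verifies it.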
Figure 9.20e illustrates a type of loopback available in some types of frame relay network.
The loopback comprises not only a physical layer loopback to cause received frames to be
returned, but in addition, reverses the source and destination addresses in the frame. This
allows node A to receive the packets it itself sent. Without this reversal of source and
destination addresses, frames would be discarded as invalid. (As we discussed above, node A
will usually discard as ‘invalid’ frames it receives in which it appears to be the source.)
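The difference between the physical loopback of Figures 9.20b and 9.20c and the address-reversing loopback of Figure 9.20e can be sketched as follows. Frames are modelled simply with source and destination labels, following the description above (a real frame relay frame carries a DLCI rather than separate source and destination addresses), and the acceptance rule is a simplification of the discard behaviour described in the text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    source: str
    destination: str
    payload: bytes


def physical_loopback(frame: Frame) -> Frame:
    """Layer 1 loopback: the frame comes back exactly as it was sent."""
    return frame


def address_swapping_loopback(frame: Frame) -> Frame:
    """Loopback that also swaps source and destination, as in Figure 9.20e."""
    return replace(frame, source=frame.destination, destination=frame.source)


def accepted_by(node: str, frame: Frame) -> bool:
    """A node accepts frames addressed to it, but discards frames claiming it as source."""
    return frame.destination == node and frame.source != node
```

A frame from A to B returned by a physical loopback is still addressed to B, and one that A addresses to itself arrives claiming A as its source; both are discarded. Only the address-swapping loopback delivers a frame that A accepts.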
Figure 9.20f illustrates a type of loopback available in ATM (asynchronous transfer mode) networks. In this case, node B is able to return (i.e. loop back) special PL-OAM (ATM physical layer operations and maintenance) cells while simultaneously allowing the ‘normal’ operation of the end-to-end connection to continue undisturbed. In many ways the operation of the PL-OAM cell is analogous to the PING procedure used in IP networks (Figure 9.20d).

Performance management tools
For everyday performance monitoring of large networks, the same network management systems as are used for receiving SNMP traps can be used: Hewlett Packard’s OpenView, Micromuse Netcool, CiscoWorks, Cisco Netsys, etc.
Major network operators typically expect performance management tools to provide the ability to:
• view network traffic demand history and trends;
• analyse network traffic statistics in a variety of ways;
• determine quickly areas of the network in congestion (where new nodes, or new or upgraded trunk circuits, need to be added); and
• report and manage the network quality achieved and compare this with the service level agreements (SLAs) for network performance and availability contracted with individual end users and customers.
Specialised analysis tools are used widely to analyse network performance problems. Typically
such problems are reported by users rather vaguely as ‘slow application response times’. They
do not necessarily result from specific network failures and SNMP alarms, and tend to exhibit
symptoms of ‘malaise’ rather than of identifiable ‘illness’. They can be hard (but important)
to trace, and in consequence a number of different manufacturers offer tools, variously called
probes, sniffers and such like. Such devices typically aim to help the network operator identify
and measure:
• overall statistics of usage;
• network usage of Top Talkers (i.e. the main sources of network traffic);
• average transaction delay;
• link utilisation, including peak packet rates and packet sizes; and
• software application activity.
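Two of the measurements listed above, the Top Talkers and the average transaction delay, can be sketched from hypothetical flow and transaction records. The record formats are assumptions for illustration; real probes derive such figures from captured packets or exported flow data.

```python
from collections import Counter
from statistics import mean


def top_talkers(flow_records, n=3):
    """Rank traffic sources by bytes sent.

    flow_records: iterable of (source, destination, byte_count) tuples.
    """
    totals = Counter()
    for source, _destination, byte_count in flow_records:
        totals[source] += byte_count
    return totals.most_common(n)


def average_transaction_delay(transactions):
    """Mean delay over (request_time, response_time) pairs, in seconds."""
    return mean(response - request for request, response in transactions)
```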
We shall return to the subject of network performance optimisation in detail in Chapter 14.

Accounting tools
In comparison to the range of accounting and billing tools available for charging of telephone
network usage, the range of tools available for the accounting and billing of IP data network
usage is rather sparse, and somewhat primitive. This mostly reflects the Internet ‘culture’ of
the network being a ‘good thing’ — something for which users should pool and share their
resources without charging one another: ‘you can use my network if I can use yours.’
Internet service providers (ISPs) still apply charges largely based on the telephone network
usage — a monthly subscription service (to cover the Internet network access) and ‘per-minute’
charges (to cover the telephone network costs used for dial-in) — rather than based on the
volume or the value of the data transmitted. But the situation is bound to change, as the major
Internet service providers and backbone network providers look for more ways of generating
revenue from their networks. We can expect to see the introduction of new network tariffing
models as new types of services (for example, voice-over-IP, VoIP) and different grades of
service are offered by the providers.
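The difference between the telephone-style tariff described above and a volume-based tariff can be sketched with two simple charge functions. All amounts are in hypothetical currency units (e.g. cents), and the rates and included allowance are invented.

```python
def time_based_charge(monthly_fee: int, minutes_online: int, rate_per_minute: int) -> int:
    """Subscription plus per-minute dial-in charge (the telephone-style model)."""
    return monthly_fee + minutes_online * rate_per_minute


def volume_based_charge(monthly_fee: int, megabytes: int, rate_per_megabyte: int,
                        included_megabytes: int = 0) -> int:
    """Subscription plus a charge for traffic above an included allowance."""
    billable = max(0, megabytes - included_megabytes)
    return monthly_fee + billable * rate_per_megabyte
```

For example, with a monthly fee of 1500 units, a user online for 600 minutes at 2 units per minute pays 2700 units under the time-based model, however much data is transferred; under a volume-based tariff with 100 Mbyte included and 1 unit per extra Mbyte, a user transferring 300 Mbyte pays 1700 units.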
An IP network operator has a number of questions to consider in deciding how to tariff his
services and how to collect and process the necessary accounting records for billing (if billing
is to be network usage-based):
• Which usage should be billed? For example: connected minutes, number of transported bytes or megabytes, the number of simultaneous connections established, or the bit rate at which data is carried;
• Can I handle the volumes of accounting data records which my chosen method of usage charging will generate? (e.g. if you tried to count individual bits, the counter would rapidly
increment, and there might be a danger of ‘over-running’ the counter. Alternatively, the