2.2. Traditional approaches

2.2.1. SNMP

One of the most widely used approaches to network monitoring and management is the Simple Network Management Protocol (SNMP). This protocol was originally specified in 1988 in RFC 1067. Since then it has undergone many changes and is currently in version three of the protocol (as defined by RFCs 3411 to 3418). SNMP has two distinct parts: a format for describing data about computers, and a protocol for transmitting that data over a network.

The format of SNMP data takes the form of a tree structure, with each node in the tree being identified by a numerical Object IDentifier (OID). For example, .1.3.6.1.2.1.1.1.0 contains a string that describes the system being queried. The idea of OIDs significantly predates the introduction of SNMP, and was standardised by the ISO. The OIDs used by SNMP are a subset of this ISO standard, which means that all SNMP OIDs are prefixed by .1.3.6.1 (iso.org.dod.internet).
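
The tree structure can be illustrated with a minimal sketch in which an OID is held as a tuple of integers, so that membership of a subtree becomes a prefix comparison. The helper names parse_oid and is_under are illustrative only and do not belong to any SNMP library.

```python
def parse_oid(text):
    """Convert a dotted OID string such as '.1.3.6.1.2.1.1.1.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip(".").split("."))


def is_under(oid, subtree):
    """Return True if oid lies within the subtree rooted at the given prefix."""
    return oid[:len(subtree)] == subtree


MIB_2 = parse_oid(".1.3.6.1.2.1")             # the mib-2 subtree
SYS_DESCR = parse_oid(".1.3.6.1.2.1.1.1.0")   # the example OID from the text

print(is_under(SYS_DESCR, MIB_2))
```

Holding OIDs as tuples has the further convenience that Python's ordinary tuple ordering matches the ordering of nodes in the tree.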

Like Internet Protocol addresses, these OIDs are difficult to remember. For this reason, a method of assigning more human-readable names to these OIDs has been defined. This method uses a Management Information Base (MIB) to convert the numeric OIDs into strings, much as the DNS maps IP addresses to domain names. Using this system, the example OID given above would become .iso.org.dod.internet.mgmt.mib-2.system.sysDescr.0. To a large extent, the information in these MIBs is standardised, but not always in a useful way, as shall be seen in Section 2.3. The MIB used by SNMP is known as MIB-II and is defined in RFC 1213.
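
The translation from numbers to names can be sketched as a simple table lookup. The table below is a hand-written fragment covering only the example OID; it stands in for a compiled MIB, and the function name to_name is hypothetical.

```python
# Hand-written stand-in for a compiled MIB: each known prefix of the
# tree maps to the name of its final arc.
MIB_FRAGMENT = {
    (1,): "iso",
    (1, 3): "org",
    (1, 3, 6): "dod",
    (1, 3, 6, 1): "internet",
    (1, 3, 6, 1, 2): "mgmt",
    (1, 3, 6, 1, 2, 1): "mib-2",
    (1, 3, 6, 1, 2, 1, 1): "system",
    (1, 3, 6, 1, 2, 1, 1, 1): "sysDescr",
}


def to_name(oid_text):
    """Replace each numeric arc with its name where the fragment knows one."""
    arcs = tuple(int(part) for part in oid_text.strip(".").split("."))
    labels = []
    for depth in range(1, len(arcs) + 1):
        # Fall back to the bare number when no name is known for this arc
        labels.append(MIB_FRAGMENT.get(arcs[:depth], str(arcs[depth - 1])))
    return "." + ".".join(labels)


print(to_name(".1.3.6.1.2.1.1.1.0"))
# prints .iso.org.dod.internet.mgmt.mib-2.system.sysDescr.0
```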

The SNMP network protocol itself is fairly simple; it provides several operations to allow a monitoring station to read data from, and write data to, a remote host. Operations to read data include get-request (to retrieve a specific OID), get-next-request (to retrieve the next OID in sequence), and get-bulk-request (to retrieve a block of related OIDs). Data can be written using a set-request. In addition, SNMP defines a way for the remote host to notify a monitoring station of an extraordinary event that has occurred (OSI fault management). These events are sent as a trap, and are a one-way communication between the two hosts.
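
The semantics of the read and write operations can be sketched with an in-memory stand-in for an agent. This is a toy model under stated assumptions: no network traffic or message encoding is involved, the class name ToyAgent and the stored values are invented for illustration, and the non-repeaters field of a real get-bulk-request is ignored.

```python
import bisect

SYSTEM = (1, 3, 6, 1, 2, 1, 1)   # the mib-2.system subtree


class ToyAgent:
    """In-memory stand-in for an SNMP agent."""

    def __init__(self, values):
        self.values = dict(values)         # OID tuple -> value
        self.oids = sorted(self.values)    # tuples sort in tree order

    def get(self, oid):
        """get-request: return the value at exactly this OID, or None."""
        return self.values.get(oid)

    def get_next(self, oid):
        """get-next-request: return the first (oid, value) after the given OID."""
        index = bisect.bisect_right(self.oids, oid)
        if index == len(self.oids):
            return None                    # walked off the end of the tree
        following = self.oids[index]
        return following, self.values[following]

    def get_bulk(self, oid, max_repetitions):
        """get-bulk-request: repeat get-next up to max_repetitions times."""
        rows = []
        for _ in range(max_repetitions):
            row = self.get_next(oid)
            if row is None:
                break
            rows.append(row)
            oid = row[0]
        return rows

    def set(self, oid, value):
        """set-request: write a value, adding the OID if it is new."""
        if oid not in self.values:
            bisect.insort(self.oids, oid)
        self.values[oid] = value


agent = ToyAgent({
    SYSTEM + (1, 0): "Example router",   # sysDescr.0 (made-up value)
    SYSTEM + (3, 0): 12345,              # sysUpTime.0 (made-up value)
    SYSTEM + (5, 0): "gw1",              # sysName.0 (made-up value)
})

print(agent.get(SYSTEM + (1, 0)))
print(agent.get_next(SYSTEM))            # first object below the system group
print(agent.get_bulk(SYSTEM, 2))         # the first two objects in one request
```

Because get-next-request accepts any OID, including one that does not itself hold a value, a monitoring station can walk an entire subtree without knowing in advance which objects the agent implements.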

2.2.2. RMON

RMON is a powerful tool for the remote monitoring of Ethernet networks. At its heart is a standard SNMP MIB, defined by RFC 2819. RMON agents on remote network devices gather information about the system they are monitoring and make it available over the network using the RMON MIB. This information can then be collected and processed by remote management stations.

The idea behind RMON is to provide a standard method for monitoring the basic operation of Ethernet networks, and to provide interoperability between SNMP management stations and remote devices. RMON agents on these remote devices provide a powerful alarm and event notification mechanism for alerting the management station of changes in network behaviour. These alerts are sent using the SNMP trap method.
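
One way such an alarm mechanism can work is sketched below. It assumes a pair of rising and falling thresholds with hysteresis, in the spirit of the RMON alarm group; the function name, the sample values and the simplified start-up behaviour are assumptions made for illustration rather than a rendering of the RFC.

```python
def alarm_events(samples, rising, falling):
    """Return (index, kind) events for samples crossing the two thresholds.

    After a rising event, no further rising event is produced until the
    value has dropped to the falling threshold, and vice versa.
    """
    events = []
    armed_rising = True
    armed_falling = True
    for index, value in enumerate(samples):
        if value >= rising and armed_rising:
            events.append((index, "rising"))
            armed_rising, armed_falling = False, True
        elif value <= falling and armed_falling:
            events.append((index, "falling"))
            armed_falling, armed_rising = False, True
    return events


# Hypothetical utilisation samples (per cent) for one segment
utilisation = [50, 85, 90, 70, 95, 20, 15, 88]
print(alarm_events(utilisation, rising=80, falling=30))
# prints [(1, 'rising'), (5, 'falling'), (7, 'rising')]
```

The hysteresis keeps a value that hovers around a single threshold from flooding the management station with traps.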

RMON agents have the ability to automatically collect and store historical information in order to provide trend data on such basic statistics as utilisation, collisions, et cetera. Network management tools can retrieve these histories and analyse them in order to understand network usage patterns (OSI performance management).

In total, RMON specifies ten services. Nine of these RMON groups (as they are known) apply to Ethernet and one is specific to token ring networks. Not all RMON agents will implement all the RMON groups, since some of them have extensive processing or memory overheads associated with them. These RMON groups include: general statistics about the network, long term histories, alarm events, host-specific statistics, packet and error counts for each conversation, packet filtering, packet capture, and event logs.

RMON agents have the ability to collect network traffic at the Media Access Control (MAC) level based on a defined packet filter. These packets can be retrieved from the agent and processed by a protocol analyser, providing a way to remotely monitor the traffic on a network segment in much the same way as tcpdump(1) does on the local segment.

2.2.3. Active probes

Both RMON and tools using SNMP fall under the heading of active monitoring; that is, they actively attempt to retrieve information from remote hosts. Dedicated management protocols, however, are not the only form of active network monitoring. Useful information about a network can often be gained from querying remote hosts using normal communication protocols. The reachability of a particular device can be tested using tools like ping(8), for example.

It is often informative to test a particular network service by using that service directly. To check that a web server is functioning correctly, for example, one could connect to the http port (port 80) of the host that runs the web server and use the HyperText Transfer Protocol (HTTP) to request a web page from that server. If the response that the server gives is correct, the web server may be presumed to be functioning normally.

These sorts of active probes often provide the most accurate information about the state of network services, since they directly test the service in the way it gets used.
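
The web server check described above can be sketched using only the Python standard library. The function names, the default timeout and the rule for judging a response (status 200 and a body containing some expected text) are assumptions made for illustration.

```python
import http.client


def looks_healthy(status, body, expected_text):
    """Judge a response: a 200 status whose body contains the expected text."""
    return status == 200 and expected_text in body


def probe_http(host, port=80, path="/", expected_text="", timeout=5.0):
    """Request a page over HTTP and report whether the response looks correct."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read().decode("utf-8", errors="replace")
        return looks_healthy(response.status, body, expected_text)
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts and malformed replies all count as failures
        return False
    finally:
        conn.close()
```

A call such as probe_http("www.example.org", expected_text="<html") would then return True only if the (hypothetical) server answered with a page containing that text.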

2.2.4. Passive monitoring

Many network monitoring tools are designed to passively watch network traffic on a particular subnet or passing through a particular gateway. By examining all the packets as they go past, one can often learn a lot about the way a network is running. For example, arpwatch(8) looks for Ethernet Address Resolution Protocol (ARP) "who-has" messages on the network and compares the responses to those it has stored in a database. In this way, arpwatch(8) can tell when a new device is plugged into the network or when a device changes its IP address or MAC address. This information is often useful in controlling who uses the network (OSI security management) or for troubleshooting faults such as IP address conflicts (OSI fault management).
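
The comparison against a stored database can be sketched as follows. This is not arpwatch's implementation: packet capture is omitted entirely, the database is a plain dictionary, and the function name and the report labels are invented for illustration.

```python
def observe(database, ip, mac):
    """Compare one observed IP/MAC pairing with the database, then record it.

    Returns 'new device', 'ip changed', 'mac changed', or None when the
    pairing is already known.
    """
    previous_mac = database.get(ip)
    mac_seen_elsewhere = any(
        known_mac == mac for known_ip, known_mac in database.items() if known_ip != ip
    )
    database[ip] = mac
    if previous_mac is None:
        # An unknown IP: either a brand new device or a known one that moved
        return "ip changed" if mac_seen_elsewhere else "new device"
    if previous_mac != mac:
        return "mac changed"
    return None


# Addresses are taken from ranges reserved for documentation
db = {}
print(observe(db, "192.0.2.10", "00:00:5e:00:53:01"))
print(observe(db, "192.0.2.10", "00:00:5e:00:53:01"))
print(observe(db, "192.0.2.10", "00:00:5e:00:53:02"))
print(observe(db, "192.0.2.20", "00:00:5e:00:53:02"))
```

A "mac changed" report for an address that is still in use by its original owner is one symptom of the IP address conflicts mentioned above.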

Passive network monitoring is often the simplest form of monitoring to implement, since it does not require any cooperation from the monitored hosts. Since it looks directly at the traffic passing over the network, this form of monitoring has a huge potential to provide useful information about the state of the network, both past (through the use of logging) and present.

2.2.5. Proprietary protocols

Many vendors of network infrastructure define their own management protocols. In general, these protocols are specific to that manufacturer's products and are often used as a source of competitive advantage.

A good example of this is the Cisco Discovery Protocol (CDP), which is used by Cisco routers and switches to discover their neighbours. Each CDP-compatible device listens on a well-known multicast address, and periodically announces itself on that address. The table of known neighbours can be retrieved from a CDP-compliant device by any network management product using SNMP [Cisco, 2001]. This capability is used by CiscoWorks (Cisco's commercial network management tool) to manage clusters of switches.

2.2.6. Higher level products

So far, fairly low-level network monitoring applications and protocols have been examined. Many applications build on these protocols to provide a higher-level view of the network. Most often, these programs attempt to represent various aspects of the network in a graphical format.

Perhaps the most popular open-source network monitoring package is Tobias Oetiker's MRTG, the Multi-Router Traffic Grapher. MRTG uses SNMP to gather information about various network devices and displays this information graphically on a web page. Historical information is stored in a Round Robin Database (RRD) and this data is used to provide trend information in the graphs [Oetiker, 1998].
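
The round robin idea can be sketched as a fixed-size store in which, once every slot is full, each new sample overwrites the oldest. This illustrates the principle only: the class name and sample values are invented, and features of a real RRD such as several archives at different resolutions are not modelled.

```python
from collections import deque


class RoundRobinStore:
    """Fixed-size history of (timestamp, value) samples."""

    def __init__(self, slots):
        # A deque with maxlen silently discards the oldest entry when full
        self.samples = deque(maxlen=slots)

    def update(self, timestamp, value):
        """Record one sample, overwriting the oldest if the store is full."""
        self.samples.append((timestamp, value))

    def average(self):
        """Mean of the stored values, or None if nothing has been recorded."""
        if not self.samples:
            return None
        return sum(value for _, value in self.samples) / len(self.samples)


store = RoundRobinStore(slots=3)
for timestamp, value in [(0, 10), (300, 20), (600, 30), (900, 40), (1200, 50)]:
    store.update(timestamp, value)

print(list(store.samples))   # only the three most recent samples remain
print(store.average())
```

The storage requirement is therefore fixed when the store is created and does not grow with the length of time for which a device has been monitored.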

MRTG is an example of the most common visual method for representing network information: a graph of a variable against time. The variable can be anything from router load to the amount of traffic on an interface. Many commercial packages also use this approach as their primary method for displaying the information that they have gathered. A good example of this is the Optivity package from Nortel Networks [Nortel, 2002]. This form of monitoring is the performance management mentioned in the OSI model.

Big Brother is a web-based network monitoring tool that is free for non-commercial use [BigBrother, 2002]. It actively probes hosts in an attempt to determine whether everything is running correctly. In its most basic form, Big Brother simply checks the reachability of devices. It understands a limited subset of network protocols and can be configured to check the status of services such as a web server. Configuration is done through text-based configuration files, with no facilities for automatically detecting the presence of new hosts. It has limited abilities to provide real-time notification through the use of pagers. Big Brother keeps a historical record of status changes, which can be viewed through its web interface. Big Brother fulfils the fault management, and to a lesser extent the performance management, role of the OSI model.

Big Brother's display is an example of a common method for representing real-time system status. It provides the network manager with an up-to-date, real-time view of the status of important network services. Big Brother uses the colour of the background to indicate the state of the most critical problem on the network, thus allowing the network manager to tell, at a glance across a room, if the system is functioning correctly. This is an important feature in today's fast-paced world.
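
Choosing a background colour from the most critical problem amounts to taking a maximum over a severity ordering. The sketch below assumes a simplified three-level scheme (green, yellow, red) and invented service names; it is not taken from Big Brother itself.

```python
# Assumed severity ordering: a higher number means a more critical state
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def page_colour(statuses):
    """Return the colour of the most severe status, or 'green' if there are none."""
    return max(statuses.values(), key=SEVERITY.__getitem__, default="green")


# Hypothetical per-service results gathered by earlier probes
statuses = {
    "www http": "green",
    "mail smtp": "yellow",
    "gw1 ping": "green",
}
print(page_colour(statuses))   # prints yellow
```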

There are certainly many other high-level approaches to the problem of monitoring networks, but these three examples illustrate two of the more widely used methods for representing network information: graphs of a variable against time, and colour-coded displays of real-time status. These approaches are employed later in this work.