To become familiar with current network management techniques and to get a feel for the status quo, a number of simple problems were investigated. The limitations of current solutions were explored, and new solutions were developed in an attempt to work around those limitations. This work represents the first step towards a more complete solution for monitoring networks.
As computer networks become more widely used, situations often develop where those people tasked with the day-to-day running of the network are not sufficiently skilled to correctly diagnose problems when they occur. Often these networks are remotely managed in such a way that the people on-site rarely need to intervene.
Unfortunately, when intervention is necessary, it is usually because the network is inaccessible to those tasked with managing it remotely. These remote managers need to be able to make an accurate diagnosis of a problem they cannot see, relying solely on information that is provided to them by the site. However, it is often difficult and time-consuming to get an accurate problem report from someone who does not understand why the problem is occurring.
To make the diagnosis of remote network faults easier, a number of simple tools can be provided. These tools should attempt to determine the cause of the problem as accurately as possible (Chapter 7 discusses this in detail) and provide a list of possible causes and solutions. In cases where solutions are beyond the capabilities of those people who are on-site, the tools should provide sufficient information for diagnosis to be made, for example, telephonically.
A fault management system meeting these criteria was developed for a local high school. The system consists of a single web page served from the school's local server. The page sequentially tests the reachability of every host and router on which the school depends for its upstream Internet connectivity.
Results are presented in a large table in which each row corresponds to a test. Those systems that are functioning correctly are indicated by large blocks of green, while those that fail the reachability test are indicated by large blocks of red. The tests are ordered in such a way that the first red block on the page is the most likely cause of the error.
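The ordering idea can be illustrated with a minimal sketch in Python. This is not the code used at the school: the `Check` record, the function names, and the use of a TCP connection attempt as the reachability probe are all assumptions made for illustration (the original page may well have used a different probe, such as ICMP echo). The sketch runs the checks in their configured order and reports the first one that fails, which corresponds to the first red block on the page.

```python
import socket
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Check:
    """One row of the results table (hypothetical structure)."""
    name: str
    host: str
    port: int


def tcp_reachable(check: Check, timeout: float = 2.0) -> bool:
    """Assumed probe: the host counts as reachable if a TCP connection opens."""
    try:
        with socket.create_connection((check.host, check.port), timeout=timeout):
            return True
    except OSError:
        return False


def run_checks(
    checks: List[Check],
    probe: Callable[[Check], bool] = tcp_reachable,
) -> List[Tuple[Check, bool]]:
    """Run every check sequentially, preserving the configured order."""
    return [(check, probe(check)) for check in checks]


def first_failure(results: List[Tuple[Check, bool]]) -> Optional[Check]:
    """Return the earliest failing check, or None if everything passed."""
    for check, ok in results:
        if not ok:
            return check
    return None
```

Because the checks are listed from the nearest dependency outwards, a failure early in the list also explains any failures that follow it, which is why only the first failure needs to be singled out. Passing the probe as a parameter keeps the ordering logic separate from the way reachability is actually measured.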
Beside each block is the name of the host or router that is being tested, a description of the function of that host in the provisioning of connectivity, and a detailed description of the action to be taken if the test fails. Where a test is made on a host that the school has no direct control over, for example, an upstream proxy server, information is given on how to go about reporting the fault.
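As a rough illustration of this layout, the sketch below renders one table row per test, with a coloured status cell followed by the host name, its role, and the action to take on failure. The `ResultRow` fields, the function names, and the exact HTML are hypothetical and are not taken from the original page.

```python
from dataclasses import dataclass
from html import escape
from typing import List


@dataclass
class ResultRow:
    """Data shown for one test (hypothetical field names)."""
    name: str      # host or router being tested
    role: str      # what the host does in providing connectivity
    action: str    # what to do, or whom to contact, if the test fails
    passed: bool


def render_row(row: ResultRow) -> str:
    """Render one test as a table row with a green or red status block."""
    colour = "green" if row.passed else "red"
    status = "OK" if row.passed else "FAILED"
    return (
        "<tr>"
        f'<td style="background:{colour}; width:6em">{status}</td>'
        f"<td>{escape(row.name)}</td>"
        f"<td>{escape(row.role)}</td>"
        f"<td>{escape(row.action)}</td>"
        "</tr>"
    )


def render_table(rows: List[ResultRow]) -> str:
    """Render all rows in the order given, one row per test."""
    body = "\n".join(render_row(row) for row in rows)
    return f"<table>\n{body}\n</table>"
```

Including a text label alongside the colour is a design choice of this sketch: it lets the result be read aloud unambiguously over the telephone and does not rely on the reader distinguishing red from green.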
The information in this web application is presented in such a way that it is easy for someone to read it out over a telephone, enabling the remote administrator, in effect, to perform the same tests that would normally be required to diagnose the fault. Thus, this tool allows the layman to diagnose and rectify simple faults on a network, and to provide a remote network administrator with sufficient information to resolve those problems that are beyond the capabilities of the people who are on-site.