Chapter 4. Determining Network Topology

Table of Contents
4.1. Traceroute approach
4.2. SNMP approach
4.3. Summary

One problem that commonly arises when networks are allowed to develop in an ad hoc manner is that no network topology maps are produced, or that those which exist are not kept up to date. The lack of accurate topology maps makes pinpointing network faults more difficult, since it is hard to understand how the various parts of the network relate to one another.

Many network monitoring applications attempt to determine the layout of a network automatically. They often do so quite successfully in a homogeneous network environment, but fail in a heterogeneous one. This is because these products often rely on proprietary protocols such as the Cisco Discovery Protocol, which was discussed in Section 2.2.5.

What is needed is a method to discover the topology of the local area network, irrespective of the layout or vendor of the various network devices. This is in essence what the various Internet mapping projects have attempted to achieve, only on a much larger scale.

4.1. Traceroute approach

With this in mind, the approach taken by one of these projects was investigated. This project used traceroute(8)-style probes to determine the path between any two hosts on the Internet [Cheswick, 1999]. The technique works much like medical tomography: the system builds up a picture of the network by looking in from its edge. The idea works very well at the large scale that the Internet mapping projects examine but, as far as the author can tell, had never been applied to networks of local area network scale.

This approach looked very promising, and an application was written to record the network path to each of the 65,536 addresses in Rhodes' class B network. The network path consists of a number of "hops", or devices that traffic must pass through in order to reach its destination. In any path, the last hop is the destination itself. Since this application was trying to extract routing information (the hops between the source and destination of a packet) rather than the location of each host on the network, the last hop of every trace was discarded. This left a record of only the intermediate hops between each host and the machine on which the application was running.
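
The recording step can be sketched in Python. This is a minimal sketch, not the application that was actually used: it assumes traceroute output in the format of Figure 4-2 (hostnames followed by bracketed addresses, so not the output of traceroute -n), and the helper names all_targets, parse_hops and intermediate_hops are hypothetical. Invoking traceroute itself, for example through the subprocess module, is left out.

```python
import ipaddress
import re

# A hop line in the format shown in Figure 4-2, e.g.
#  " 1  vlan120.nortel.ru.ac.za (146.231.120.1)  0.983 ms  0.969 ms  0.874 ms"
HOP_LINE = re.compile(r"^\s*\d+\s+(\S+)\s+\((\d{1,3}(?:\.\d{1,3}){3})\)")


def all_targets(cidr: str):
    """Yield every address in a network, e.g. the 65,536 addresses of a /16."""
    for addr in ipaddress.ip_network(cidr):
        yield str(addr)


def parse_hops(output: str) -> list[tuple[str, str]]:
    """Extract (hostname, address) pairs from one run of traceroute.

    Lines that do not match, such as the "traceroute to ..." header or
    unanswered "* * *" probes, are simply skipped in this sketch.
    """
    hops = []
    for line in output.splitlines():
        match = HOP_LINE.match(line)
        if match:
            hops.append((match.group(1), match.group(2)))
    return hops


def intermediate_hops(output: str) -> list[tuple[str, str]]:
    """Discard the final hop, which is assumed to be the destination itself."""
    return parse_hops(output)[:-1]
```

Dropping the last element assumes that the trace actually reached its destination; a trace that ended in unanswered probes would need separate handling.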

The application was run mid-week during the middle of the morning in the hope of capturing data on the majority of the computers on Rhodes' network (at this stage, there were no statistics on usage patterns). The information that was obtained from a run of the application was normalised to remove duplication before being written as a data file for a graphing utility.
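
One way to normalise such results is to treat every trace as a chain of undirected edges between consecutive devices and to collect those edges in a set, so that an edge seen in thousands of traces is stored only once. The sketch below assumes that each trace is already a list of intermediate hop names; the name used for the tracing machine and the function names are placeholders, not taken from the original application.

```python
def path_edges(source: str, hops: list[str]) -> set[tuple[str, str]]:
    """Turn one trace into undirected edges between consecutive devices."""
    nodes = [source] + hops
    edges = set()
    for a, b in zip(nodes, nodes[1:]):
        if a != b:  # ignore a device that answered twice in a row
            # Sorting the pair makes (a, b) and (b, a) the same edge.
            edges.add(tuple(sorted((a, b))))
    return edges


def normalise(source: str, traces: list[list[str]]) -> list[tuple[str, str]]:
    """Merge every trace into one duplicate-free, sorted edge list."""
    edges: set[tuple[str, str]] = set()
    for hops in traces:
        edges |= path_edges(source, hops)
    return sorted(edges)
```

Sorting the final list is not required for correctness, but it makes the data file stable from one run to the next, which simplifies comparing runs.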

The graphing application chosen to visualise the results of this test was AT&T Research Labs' GraphViz. This program is a sophisticated tool for laying out and drawing both directed and undirected graphs. Its greatest feature is its ability to make very readable graphs from fairly complex data sets [Fowler, 2000]. Graphs are described in a language called "dot" and are then rendered by the application into a number of different vector and bitmap image formats.
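
A minimal sketch of emitting such a file from a list of edges follows. It assumes edges are given as pairs of device names and writes an undirected graph (the dot keyword graph, with edges joined by --); the function names are hypothetical. Node names are quoted because dot only accepts unquoted identifiers made of letters, digits and underscores, and names such as vlan120.nortel contain dots.

```python
def quote(name: str) -> str:
    """Quote a node name for dot, escaping any embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(edges: list[tuple[str, str]], graph_name: str = "network") -> str:
    """Render undirected edges as a dot 'graph' with '--' edges."""
    lines = [f"graph {graph_name} {{"]
    for a, b in edges:
        lines.append(f"    {quote(a)} -- {quote(b)};")
    lines.append("}")
    return "\n".join(lines)
```

Saved to a file (the name network.dot here is only an example), the output can be rendered with a command such as `dot -Tpng network.dot -o network.png`.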

Figure 4-1 shows an example graph created by this system. This graph shows the layout of Rhodes' network at layer three of the OSI reference model.

Figure 4-1. Topological map of Rhodes University's network

Some assumptions about the way networks behave had to be made in order to draw a graph of the results. If a trace to a host contained no intermediate hops, the destination machine was assumed to be on the same subnet as the machine running the trace application, and therefore to share its default gateway. While it is possible for two machines on the same subnet to use different gateways, this is not common practice.
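
The same-subnet assumption can be expressed as a small rule applied to each trace before graphing. In this sketch, which is one possible encoding rather than the original code, a trace with no intermediate hops is recorded as passing through the tracing machine's default gateway; the function names are hypothetical.

```python
def effective_hops(intermediate: list[str], default_gateway: str) -> list[str]:
    """Apply the same-subnet assumption to one trace.

    No intermediate hops is taken to mean the destination shares our subnet,
    and therefore our default gateway, so the gateway is recorded as its hop.
    Any other trace is returned unchanged.
    """
    if not intermediate:
        return [default_gateway]
    return intermediate


def apply_assumption(traces: dict[str, list[str]], default_gateway: str) -> dict[str, list[str]]:
    """Apply effective_hops to every traced address."""
    return {target: effective_hops(hops, default_gateway) for target, hops in traces.items()}
```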

It was also assumed that the forward and return routes of all packets are the same. This assumption allows one to derive the route between any two hosts from the routes between a third machine and each of those hosts, by discarding the portion of the two routes that they have in common. While the assumption generally holds on a local area network, it often does not on the Internet. However, without it, a network could not be mapped in this manner, and for this reason it is made by all Internet mapping projects [Cheswick, 1999].
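
The derivation described above amounts to stripping the common prefix of two routes and joining what remains at the branch point. A minimal sketch, assuming each route is a list of device names running from the observing machine's first hop to the destination inclusive, and that routing is symmetric; the function name is hypothetical.

```python
def derive_route(observer: str, route_to_a: list[str], route_to_b: list[str]) -> list[str]:
    """Estimate the route from host A to host B from two routes seen by a third machine.

    Relies on the symmetric-routing assumption: the path from A back towards
    the observer is taken to be the reverse of the observer's path to A.
    """
    a = [observer] + route_to_a
    b = [observer] + route_to_b
    # Length of the shared prefix; at least 1, since both start at the observer.
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    # a[i - 1] is the last device common to both routes (the branch point).
    return list(reversed(a[i:])) + [a[i - 1]] + b[i:]
```

For example, with hypothetical names, routes ["gw", "r1", "hostA"] and ["gw", "r2", "hostB"] seen from an observer give ["hostA", "r1", "gw", "r2", "hostB"]: the shared hop to gw is discarded except for gw itself, where the two routes diverge.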

The results of this attempt are shown in Figure 4-1. The graph highlights two interesting features of Rhodes' network: firstly, it is much smaller and flatter than was originally expected, and secondly, there are two loops formed around nortel8600a.switch.
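
Loops such as these can also be found mechanically. A minimal sketch using a union-find structure: while the undirected edges are added one by one, any edge whose two ends are already connected closes a cycle, and the number of such edges equals the number of independent loops in the graph. The function name is hypothetical, and the edge list in the usage note is reconstructed only from the two loops named in the text, not from the data behind Figure 4-1.

```python
def cycle_closing_edges(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the edges that close a loop in an undirected graph."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        # Register unseen nodes, then walk to the set representative,
        # halving the path as we go to keep later lookups short.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    closing = []
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            closing.append((a, b))  # a and b were already connected
        else:
            parent[root_a] = root_b  # merge the two components
    return closing
```

Given the two triangles nortel8600a.switch, nortel8600b.switch, vlan120.nortel and nortel8600a.switch, vlan1.cisco, rucs03-e1.cisco as six edges, the function reports two cycle-closing edges, one per loop.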

Rhodes' network is significantly more complicated than Figure 4-1 shows. This complexity, however, is at layer two (data link) of the Open Systems Interconnection (OSI) seven-layer model [Briscoe, 2000]. Tracing a route in the way this application does only detects devices that decrement the IP time-to-live (TTL) field, that is, devices at layer three (network) of the OSI model. There are very few layer three devices at Rhodes, which explains the relative simplicity of the graph that was produced.
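
Why layer two devices are invisible to this technique can be shown with a toy simulation. This is a simplified model rather than real packet handling: each device on the path is tagged as layer two or layer three, only layer three devices decrement the TTL, the device at which the TTL reaches zero is the one that replies (as a router does with an ICMP Time Exceeded message), and the destination replies once a probe reaches it. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    layer: int  # 2 for a switch or bridge, 3 for a router or layer three switch


def probe(path: list[Device], ttl: int) -> str:
    """Return the name of the device that answers a probe sent with this TTL.

    `path` lists the devices after the sending host, ending with the destination.
    """
    for i, device in enumerate(path):
        if i == len(path) - 1:
            return device.name  # the destination itself replies
        if device.layer == 3:
            ttl -= 1  # only layer three devices decrement the TTL
            if ttl == 0:
                return device.name  # stands in for ICMP Time Exceeded
        # layer two devices forward the frame without touching the TTL
    raise ValueError("path must contain at least the destination")


def traceroute(path: list[Device], max_ttl: int = 64) -> list[str]:
    """Send probes with TTL 1, 2, 3, ... until the destination answers."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder = probe(path, ttl)
        hops.append(responder)
        if responder == path[-1].name:
            break
    return hops
```

For a hypothetical path of edge-switch (layer two), router-a, core-switch (layer two), router-b and host, traceroute returns ["router-a", "router-b", "host"]: five devices carry the traffic, but the two switches never appear.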

The two loops are more interesting. The loop between nortel8600a.switch, nortel8600b.switch, and vlan120.nortel arises from the assumption that hosts with no intermediate hops are on the same subnet. This assumption holds except where the IP address being traced is a different interface on the default gateway itself. Figure 4-2 illustrates this for a machine called omniscient, whose default gateway is vlan120.nortel (shown in line 2). To be the default gateway, vlan120.nortel must be on the same subnet as omniscient, so there is only one hop from omniscient to vlan120.nortel, as shown in lines 3–5.

While both omniscient and vlan120.nortel are on the 146.231.120.0/21 subnet, nortel8600b.switch is not; it is on the 146.231.128.0/21 subnet. There is, however, only one hop between omniscient and nortel8600b.switch (shown in lines 6–8). This violates the aforementioned assumption, and occurs because nortel8600b.switch is an interface on the same layer three switch as vlan120.nortel. The switch routes packets between its own interfaces without adding a visible hop, which produces the apparent loop.

Figure 4-2. Trace showing the default gateway assumption

    1  guy@omniscient:~% netstat -r | grep default
    2  default            vlan120.nortel    UGSc       12 100257463   fxp0
    3  guy@omniscient:~% traceroute vlan120.nortel
    4  traceroute to vlan120.nortel (146.231.120.1), 64 hops max, 40 byte packets
    5   1  vlan120.nortel.ru.ac.za (146.231.120.1)  0.983 ms  0.969 ms  0.874 ms
    6  guy@omniscient:~% traceroute nortel8600b.switch
    7  traceroute to nortel8600b.switch (146.231.128.210), 64 hops max, 40 byte packets
    8   1  nortel8600b.switch.ru.ac.za (146.231.128.210)  1.080 ms  0.926 ms  0.877 ms
    9  guy@omniscient:~%
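
A trace like the one in lines 6–8 can be flagged automatically by comparing the destination address with the tracing machine's own subnet. A minimal sketch using Python's ipaddress module, with the subnet and addresses taken from Figure 4-2; the function name is hypothetical, and treating every such case as another interface of the gateway is a heuristic rather than a certainty.

```python
import ipaddress


def looks_like_gateway_interface(local_subnet: str, hop_count: int, dest_ip: str) -> bool:
    """Flag a trace that breaks the same-subnet assumption.

    A destination reached in a single hop is expected to be on our own subnet.
    If it answers in one hop but lies outside that subnet, the likely
    explanation is that the address is another interface of our default
    gateway, as with nortel8600b.switch in Figure 4-2.
    """
    net = ipaddress.ip_network(local_subnet)
    return hop_count == 1 and ipaddress.ip_address(dest_ip) not in net
```

With omniscient's subnet 146.231.120.0/21, the one-hop trace to vlan120.nortel (146.231.120.1) is not flagged, while the one-hop trace to nortel8600b.switch (146.231.128.210) is.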

The second loop between nortel8600a.switch, vlan1.cisco, and rucs03-e1.cisco is caused by the assumption that the forward and return routes are the same. Figure 4-3 shows how this assumption breaks down.

Lines 2–5 of this figure show that vlan1.cisco knows that it should use its default route (0.0.0.0/0) to forward all traffic destined for the Internet to rucs03-e1.cisco (146.231.128.206). In the same way, nortel8600a.switch also forwards all outgoing traffic bound for the Internet to rucs03-e1.cisco, as is shown by line 13.

rucs03-e1.cisco, on the other hand, forwards all incoming traffic from the Internet to nortel8600a.switch (146.231.128.205), irrespective of its final destination within the University's network (146.231.0.0/16). This is shown by lines 16–19 of Figure 4-3. nortel8600a.switch is then responsible for forwarding that traffic on to the correct device, including vlan1.cisco.

Figure 4-3. Routing table entries showing the return route assumption

    ##  vlan1.cisco.ru.ac.za has address 146.231.135.22
    1   vlan1.cisco>show ip route 0.0.0.0
    2   Routing entry for 0.0.0.0/0, supernet
    3     Known via "static", distance 1, metric 0, candidate default path
    4     Routing Descriptor Blocks:
    5     * 146.231.128.206
    6         Route metric is 0, traffic share count is 1
   
    ##  nortel8600a.switch.ru.ac.za has address 146.231.128.205
    7   NORTEL8600A:5# show ip route info 0.0.0.0
    8   =================================================================
    9                            Ip Route
    10  =================================================================
    11      DST     MASK              NEXT COST VLAN  PORT  PROTO  AGE
    12  -----------------------------------------------------------------
    13  0.0.0.0  0.0.0.0  146.231.128.206     1    1   -/- STATIC    0
    14  1 out of 28 Total Num of routes displayed.
   
    ##  rucs03-e1.cisco.ru.ac.za has address 146.231.128.206
    15  rucs03-e1.cisco>show ip route 146.231.0.0
    16    Routing entry for 146.231.0.0/16
    17    Known via "static", distance 1, metric 0
    18    Routing Descriptor Blocks:
    19    * 146.231.128.205
    20       Route metric is 0, traffic share count is 1
   

In other words, traffic from a device connected to vlan1.cisco to the Internet takes the most direct route available, whereas traffic from the Internet to the same device passes through an extra hop, namely nortel8600a.switch. The loop shown in Figure 4-1 therefore does exist, but in practice it is unidirectional rather than bidirectional as the graph suggests. The routing configuration that caused this anomaly was a temporary optimisation while systems were being migrated from vlan1.cisco to nortel8600a.switch. vlan1.cisco has subsequently been decommissioned and this loop has been removed.
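
The asymmetry can be reproduced with a small longest-prefix-match model of the static routes in Figure 4-3. This is a sketch under stated assumptions: the default routes on vlan1.cisco and nortel8600a.switch and the 146.231.0.0/16 route on rucs03-e1.cisco come from the figure, but the host subnet behind vlan1.cisco (146.231.99.0/24), the route to it on nortel8600a.switch, the upstream Internet router, and the external destination 192.0.2.1 (a documentation address) are all hypothetical.

```python
import ipaddress

# Router addresses taken from Figure 4-3, plus one HYPOTHETICAL upstream router.
NAMES = {
    "146.231.128.206": "rucs03-e1.cisco",
    "146.231.128.205": "nortel8600a.switch",
    "146.231.135.22": "vlan1.cisco",
    "198.51.100.1": "upstream",  # HYPOTHETICAL Internet provider router
}

# Static routes as (prefix, next hop); None means "directly connected".
# Entries marked HYPOTHETICAL are not shown in Figure 4-3.
TABLES = {
    "vlan1.cisco": [
        ("0.0.0.0/0", "146.231.128.206"),
        ("146.231.99.0/24", None),  # HYPOTHETICAL host subnet
    ],
    "nortel8600a.switch": [
        ("0.0.0.0/0", "146.231.128.206"),
        ("146.231.99.0/24", "146.231.135.22"),  # HYPOTHETICAL
    ],
    "rucs03-e1.cisco": [
        ("146.231.0.0/16", "146.231.128.205"),
        ("0.0.0.0/0", "198.51.100.1"),  # HYPOTHETICAL
    ],
}


def lookup(router: str, dest: str):
    """Longest-prefix match of dest in the router's table."""
    ip = ipaddress.ip_address(dest)
    best = None
    for prefix, next_hop in TABLES[router]:
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    if best is None:
        raise LookupError(f"{router} has no route to {dest}")
    return best[1]


def route(start: str, dest: str, max_hops: int = 16) -> list[str]:
    """Follow next hops from start until delivery or until the packet leaves the model."""
    path = [start]
    node = start
    for _ in range(max_hops):
        next_hop = lookup(node, dest)
        if next_hop is None:
            return path  # destination is on a directly connected subnet
        node = NAMES[next_hop]
        path.append(node)
        if node not in TABLES:
            return path  # handed off outside the modelled routers
    raise RuntimeError("routing loop or path too long")
```

In this model, route("vlan1.cisco", "192.0.2.1") yields ["vlan1.cisco", "rucs03-e1.cisco", "upstream"], while route("rucs03-e1.cisco", "146.231.99.10") yields ["rucs03-e1.cisco", "nortel8600a.switch", "vlan1.cisco"], showing the extra inbound hop through nortel8600a.switch.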