Monday, October 1, 2007

OSI model and 'Network troubleshooting and diagnostics'

One commonly used framework that helps structure your response to a network problem is the Open Systems Interconnection (OSI) model from the International Organization for Standardization (ISO). If you've worked with networking devices for any length of time, you are likely already familiar with OSI. It's the framework that encapsulates much of modern networking, and most network protocols live somewhere within its seven layers. Where you may not have used it before is as a troubleshooting guide for triaging an unknown problem on the network.



Figure 1: The OSI model is an excellent mental framework to assist the troubleshooter with identifying network problems.

Without going into too much detail on the history and use of the model, let's take a look at how you can extend the OSI model into a framework for problem isolation. Figure 1 shows the seven layers in the OSI model and some issues that typically occur at each layer. Let's discuss each of the layers in turn, from the bottom up:

At the Physical layer, problems typically involve some break in the physical connectivity that makes up the network. Broken network connections, cabling and connector issues, and hardware problems that inhibit the movement of signals from device to device typically indicate a problem at this layer.

At the Data Link layer, we move away from purely electrical problems and into the configuration of the interface itself. Data Link problems often involve Address Resolution Protocol (ARP) failures in mapping IP addresses to Media Access Control (MAC) addresses. Other common causes are speed and duplex mismatches between network devices or excessive hardware errors on the interface. An incorrectly configured interface within the device operating system (OS) or interference on wireless connections can also cause problems at the Data Link layer.
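
As a minimal sketch of how a speed and duplex mismatch could be spotted, the following compares the settings reported by the two ends of a link. The dictionary keys (`speed_mbps`, `duplex`) and the sample values are assumptions made for illustration; how you actually collect them depends on your devices.

```python
def link_mismatches(local, remote):
    """Return the names of link settings that differ between two interfaces.

    `local` and `remote` are hypothetical dicts such as
    {"speed_mbps": 1000, "duplex": "full"}, gathered however your
    devices expose them.
    """
    compared_keys = ("speed_mbps", "duplex")
    return [key for key in compared_keys if local.get(key) != remote.get(key)]


# Illustrative values only: a switch port forced to 100/half facing
# a server NIC that is running at 1000/full.
switch_port = {"speed_mbps": 100, "duplex": "half"}
server_nic = {"speed_mbps": 1000, "duplex": "full"}
print(link_mismatches(switch_port, server_nic))
```

An empty list means the two ends agree on the compared settings, so the search can move on to interface error counters or ARP.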

At the Network layer, we begin experiencing problems with network traversal. Network layer problems typically occur when network packets cannot make their way from source to destination. This may have something to do with incorrect IP addressing or duplicate IP addresses on the network. Failures in routing data or ICMP packets across the network, as well as protocol errors, can also cause problems here. In extreme cases, an external attack can also spike error levels on network devices and cause problems identified at the Network layer.
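
Two of the addressing problems mentioned above, duplicate IP addresses and hosts configured for the wrong subnet, can be checked offline against an address inventory. The sketch below uses Python's standard `ipaddress` module; the inventory format (a list of hostname and address pairs) and the function names are assumptions for illustration.

```python
import ipaddress
from collections import Counter


def find_duplicate_ips(assignments):
    """Return the IP addresses that are claimed by more than one host.

    `assignments` is a hypothetical inventory: a list of (hostname, ip) pairs.
    """
    counts = Counter(ipaddress.ip_address(ip) for _, ip in assignments)
    return sorted(str(ip) for ip, seen in counts.items() if seen > 1)


def on_same_subnet(ip_a, ip_b, prefix_len):
    """True if both addresses fall in the same network for the given prefix length."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{prefix_len}").network
    return net_a == net_b
```

Two hosts that should talk directly but return `False` from `on_same_subnet` point to an addressing or mask error rather than a routing fault.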

At the Transport layer, we isolate problems that typically occur with TCP segments or UDP datagrams on IP networks. These may have to do with excessive retransmissions or packet fragmentation. Either of these problems can cause network performance to suffer or throughput to drop off completely. Problems at this layer can be difficult to track down because, unlike the lower layers, they often don't involve a complete loss of connectivity. Additionally, Transport layer problems often involve the blocking of traffic at the level of an individual TCP or UDP port. If you've ever been able to ping a server but could not connect to it on a known port, this may be a Transport layer problem.
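
The "can ping but cannot connect" case can be tested directly by attempting a TCP connection to the port in question. This is a minimal sketch using the standard `socket` module; the function name and the default timeout are choices made here, not part of any particular tool.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Try to complete a TCP handshake with host:port.

    Returns True if the connection succeeds. False covers several cases
    (port closed, traffic filtered, host unreachable, timeout), so treat
    it as a prompt for further checks rather than a diagnosis.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If a host answers ping but `tcp_port_open(host, port)` returns `False` for a service you expect to be listening, a blocked or closed port is a reasonable next suspect.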

The Session, Presentation, and Application layers are often lumped together because more recent interpretations of the OSI model tend to blur the lines between these three layers. Troubleshooting at these three layers deals with problems in the applications that rely on the network.

These problems could involve DNS, NetBIOS, or other name resolution, application issues on the OSs where the applications reside, or high-level protocol failures or misconfigurations. Examples of these high-level protocols are HTTP, SMTP, FTP, and other protocols that typically "use the network" rather than "run the network." Additionally, specialised external attacks such as "man-in-the-middle" attacks can occur at these levels.
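
Name resolution is often the quickest upper-layer item to rule out. The following sketch asks the local resolver for the addresses behind a name using the standard `socket.getaddrinfo` call; the function name is an assumption, and the result reflects whatever resolver configuration the machine running it has.

```python
import socket


def resolve_name(name):
    """Return the sorted, de-duplicated addresses the local resolver gives for `name`.

    An empty list means resolution failed, which suggests a DNS or
    hosts-file problem rather than a fault in the application protocol.
    """
    try:
        results = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # the address is the first element of sockaddr.
    return sorted({entry[4][0] for entry in results})
```

If the name resolves but the application still fails, the resolver is probably not the culprit and attention can shift to the protocol or the application's own configuration.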

Network problems can and do occur at any level in the model. Because the model is so widely understood by network administrators, it is a good yardstick for communicating those problems between triaging administrators. If you've ever worked with another administrator who uses language like, "This looks like a Layer 4 problem," you can immediately understand the general area (the Transport layer) in which the problem may be occurring.
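
For readers who want the layer numbers at their fingertips, here is a small lookup sketch. The issue summaries are condensed from the descriptions in this article and are not an official list; the three upper layers share one summary because the article treats them together.

```python
UPPER_LAYER_ISSUES = "name resolution, application faults, high-level protocol failures"

# Layer number -> (layer name, typical issues as summarised in this article)
OSI_LAYERS = {
    1: ("Physical", "broken connections, cabling and connectors, hardware faults"),
    2: ("Data Link", "ARP problems, speed/duplex mismatch, interface errors"),
    3: ("Network", "incorrect or duplicate IP addresses, routing, ICMP"),
    4: ("Transport", "TCP/UDP retransmissions, fragmentation, blocked ports"),
    5: ("Session", UPPER_LAYER_ISSUES),
    6: ("Presentation", UPPER_LAYER_ISSUES),
    7: ("Application", UPPER_LAYER_ISSUES),
}


def describe_layer(number):
    """Turn a layer number into a short phrase for use in a ticket or chat message."""
    if number not in OSI_LAYERS:
        raise ValueError(f"OSI layers are numbered 1 to 7, got {number}")
    name, issues = OSI_LAYERS[number]
    return f"Layer {number} ({name}): {issues}"


print(describe_layer(4))
```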

You'll often hear seasoned network administrators refer to problems by their layer number. For example, "that's at layer 3" usually means an IP connectivity problem, and "layer 4" can mean that the problem is due to a closed or blocked network port. Network administrators jokingly refer to problems that lie with a system rather than with their network as being "at layer 7."

Let's talk about three different ways in which you can progress through this model during a typical problem isolation activity.

Three different approaches

Network administrators who use OSI as a troubleshooting framework typically navigate the model in one of three ways: Bottom-Up, Top-Down, and Divide-and-Conquer. Depending on how the problem manifests and their experience level, they may choose one method over another for that particular problem. Each of these approaches has its utility based on the type of problem that is occurring. Let's look at each.

Bottom-Up

The Bottom-Up approach simply means that administrators start at the bottom of the OSI model and work their way up through the layers, ruling out potential root causes as they go. An administrator using the Bottom-Up approach will typically start with Physical layer issues and determine whether a break in network connectivity has occurred, then work up through network interface configurations and error rates, continue through IP and TCP/UDP issues such as routing, fragmentation, and blocked ports, and only then look at the individual applications experiencing the problem.

This approach works best in situations in which the network is fully down or experiencing numerous low-level errors. It also works well when the problem is particularly complex. In complex problems, the faulting application often does not provide enough debugging data to give the administrator insight into the problem. Thus, a network-focused approach works best.
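
The Bottom-Up walk can be sketched as an ordered list of checks that stops at the first failure. Everything here is illustrative: the function name, the tuple layout, and the placeholder checks are assumptions, and real checks would query cabling status, interface counters, routing, and so on.

```python
def run_bottom_up(checks):
    """Run checks from the lowest OSI layer upward and stop at the first failure.

    `checks` is a list of (layer_number, description, check) tuples, where
    `check` is a zero-argument callable that returns True when that item
    is healthy. Returns the (layer_number, description) of the first
    failing check, or None if every check passes.
    """
    for layer, description, check in sorted(checks, key=lambda item: item[0]):
        if not check():
            return layer, description
    return None


# Placeholder checks standing in for real probes.
example_checks = [
    (4, "service port reachable", lambda: False),
    (1, "link light is on", lambda: True),
    (3, "default gateway answers ping", lambda: True),
    (2, "no interface errors", lambda: True),
]
print(run_bottom_up(example_checks))  # (4, 'service port reachable')
```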

Top-Down

The Top-Down approach is the reverse of the Bottom-Up approach: the administrator starts at the top of the OSI model, looking at the faulted application and attempting to track down why that application is faulted. This approach works best when the network itself is believed to be functioning correctly and a new application is being introduced or an existing one is being reconfigured or repurposed. The administrator can start by ensuring the application is properly configured, then work downward to ensure that full IP connectivity exists and the appropriate ports are open for the application to function. Once all upper-level issues are resolved, a back-check on the network can be done to validate its proper functionality.
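
A Top-Down pass can be sketched the same way, starting from the application and recording each step so the trail can be shared with another administrator. As before, the function name, tuple layout, and placeholder checks are assumptions for illustration only.

```python
def run_top_down(checks):
    """Run checks from the highest OSI layer downward, recording each result.

    `checks` is a list of (layer_number, description, check) tuples, where
    `check` is a zero-argument callable returning True when healthy.
    Returns the trail of (layer_number, description, passed) entries,
    ending at the first failure or after the last check.
    """
    trail = []
    for layer, description, check in sorted(
        checks, key=lambda item: item[0], reverse=True
    ):
        passed = check()
        trail.append((layer, description, passed))
        if not passed:
            break
    return trail


# Placeholder checks for a newly deployed application.
new_app_checks = [
    (3, "server has IP connectivity", lambda: True),
    (7, "application configuration is valid", lambda: True),
    (4, "application port is open", lambda: False),
]
for entry in run_top_down(new_app_checks):
    print(entry)
```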

Divide-and-Conquer

The Divide-and-Conquer approach is a fancy name for the "gut feeling" approach. It is typically used by seasoned administrators who have a good internal understanding of the network and the problems it can face. The Divide-and-Conquer approach involves an innate feeling for where the problem may lie, starting with that layer of the OSI model, and working outward from that location. This approach can also be used for trivial issues that the administrator has seen before.

However, this approach has the drawback of often not being rigorous enough to properly diagnose a difficult problem. If the problem is complex in nature, the Divide-and-Conquer approach may not be structured enough to track down the issue.
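
One way to give the "start where you suspect and work outward" idea a little structure is to compute the order in which layers would be visited. This is only a sketch: the function name is made up, and visiting the lower neighbour before the upper one at each step is an assumption chosen here, not something the approach prescribes.

```python
def divide_and_conquer_order(start_layer, lowest=1, highest=7):
    """Return OSI layer numbers ordered outward from the suspected layer.

    The suspected layer comes first, then its neighbours at increasing
    distance. At each distance the lower layer is listed before the
    upper one (an arbitrary choice for this sketch).
    """
    if not lowest <= start_layer <= highest:
        raise ValueError(f"start_layer must be between {lowest} and {highest}")
    order = [start_layer]
    for distance in range(1, highest - lowest + 1):
        for candidate in (start_layer - distance, start_layer + distance):
            if lowest <= candidate <= highest:
                order.append(candidate)
    return order


print(divide_and_conquer_order(3))  # [3, 2, 4, 1, 5, 6, 7]
```

Starting at layer 1 or layer 7 reproduces the Bottom-Up and Top-Down orders respectively, which shows how the three approaches relate.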

Figure 4.: Depending on the type of problem, a Bottom-Up, Top-Down, or Divide-and-Conquer approach may be best for isolating the root cause of the problem.


No matter which approach you use, until you begin to develop that "gut instinct" for your network and its unique characteristics, you should consider a structured method for your troubleshooting technique. Although utilising a structured method can increase the time needed to resolve the problem, it is more likely to track down the root cause without skipping the key items that, when missed, lead to "band-aid" fixes.
