Troubleshooting Spanning Tree
STP problems

STP problems are most often evidenced by the existence of a bridge loop. Troubleshooting STP involves the identification and prevention of such loops.

The primary function of the spanning-tree algorithm (STA) is to remove loops created by redundant links in bridged networks. STP operates at Layer 2 of the OSI model, exchanging BPDUs between bridges and selecting the ports that will forward or block traffic. If BPDUs are not being sent or received over a link between switches, the protocol may fail to prevent loops. Troubleshooting the resulting problems can be difficult in a complex network.

Any condition that prevents BPDUs from being sent or received can result in a bridge loop.

Here is an explanation of how those conditions may occur.

Duplex Mismatch
Duplex mismatch on a point-to-point link is a common configuration error and can have specific implications for STP. The results of the mismatch vary somewhat by platform.

There are two common mismatch scenarios, each with its own resulting STP problems:

  • Switch configured for full duplex connected to a host in autonegotiation mode – The rule of autonegotiation is that upon negotiation failure, a port is required to assume half-duplex operation. The result is either no connectivity or inconsistent connectivity between the two devices, because one side of the connection defaults to half duplex while the other side is set to full duplex. In many cases this condition allows traffic to flow at low data rates, but as the traffic level on the link increases, the half-duplex side is overwhelmed, causing data and link integrity errors. As the error rate goes up, BPDUs may not make it across the link.
  • Switch configured for half duplex on a link, the peer switch is configured for full duplex – In the example, the duplex mismatch on the link between bridge A and bridge B can lead to a bridge loop. Because B is configured for full duplex, it does not perform carrier sense when accessing the link. B therefore starts sending frames even if A is already using the link. This is a problem for A, which detects a collision and runs the backoff algorithm before attempting another transmission of its frame. The result is that frames sent by A, including BPDUs, may be deferred or collide and eventually be dropped. Because it does not receive BPDUs from A, bridge B may lose its connection to the root. This causes B to unblock its port to bridge C, thereby creating the loop.
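
The two scenarios above can be sketched as a small model of how each end of a link arrives at its operating duplex. This is only an illustration under stated assumptions: the names `Duplex`, `effective_duplex`, and `has_mismatch` are hypothetical, and the model assumes that two autonegotiating ports both support full duplex.

```python
from enum import Enum


class Duplex(Enum):
    AUTO = "auto"
    HALF = "half"
    FULL = "full"


def effective_duplex(local: Duplex, peer: Duplex) -> Duplex:
    """Return the duplex mode the local port ends up operating in."""
    if local is not Duplex.AUTO:
        # A forced setting is used exactly as configured.
        return local
    if peer is Duplex.AUTO:
        # Both sides negotiate (assumption: both support full duplex).
        return Duplex.FULL
    # Negotiation fails against a forced peer, so fall back to half duplex.
    return Duplex.HALF


def has_mismatch(a: Duplex, b: Duplex) -> bool:
    """True when the two ends of the link operate in different duplex modes."""
    return effective_duplex(a, b) is not effective_duplex(b, a)


# Scenario 1: forced full duplex against an autonegotiating host.
print(has_mismatch(Duplex.FULL, Duplex.AUTO))
# Scenario 2: forced half duplex against forced full duplex.
print(has_mismatch(Duplex.HALF, Duplex.FULL))
```

Both calls report a mismatch, whereas two autonegotiating ports, or a forced half-duplex port facing an autonegotiating one, end up in agreement.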

To mitigate transmission type mismatches, the best practice is to establish a standard within the organization regarding how each interface is configured prior to attaching it to the network. It is not always possible to disable autonegotiation on all attached client devices, but where possible, network infrastructure devices and servers should have matching transmission type settings and no switch ports should be set to autonegotiate. This will make it easier to troubleshoot these types of issues.

Unidirectional Link Failure
A unidirectional link is one that stays up while providing only one-way communication. Unidirectional links cause specific STP problems. In the example, the link between bridge A and bridge B is unidirectional: it drops traffic from A to B while still passing traffic from B to A. Suppose the port on bridge B was blocking. A port blocks only if it receives superior BPDUs, that is, BPDUs from a bridge with a better (numerically lower) BID. In this case, all the BPDUs coming from bridge A are lost, so bridge B never sees the superior BPDU. B unblocks the port and eventually forwards traffic, potentially creating a loop when other switches are part of the topology. If the unidirectional failure exists at startup, STP will not converge correctly.
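
The blocking decision described above can be sketched as a comparison of BPDU contents, where the numerically lower value wins. This is a deliberately simplified model: it looks only at a non-root port on one segment, and the names `Bpdu` and `port_should_block`, along with the sample priorities, costs, and addresses, are assumptions made for illustration.

```python
from typing import NamedTuple, Optional, Tuple

BridgeId = Tuple[int, int]  # (priority, MAC address as an integer)


class Bpdu(NamedTuple):
    # Field order matters: tuples compare field by field, lower is superior.
    root_id: BridgeId
    root_cost: int
    sender_id: BridgeId
    port_id: int


def port_should_block(own: Bpdu, received: Optional[Bpdu]) -> bool:
    """A port keeps blocking only while it hears a superior BPDU."""
    return received is not None and received < own


root = (4096, 0x000000000001)   # hypothetical root bridge
bridge_a = (32768, 0x00000000000A)
bridge_b = (32768, 0x00000000000B)

from_a = Bpdu(root, 4, bridge_a, 1)   # what A advertises on the segment
b_own = Bpdu(root, 4, bridge_b, 1)    # what B would advertise itself

print(port_should_block(b_own, from_a))  # healthy link: B hears A
print(port_should_block(b_own, None))    # unidirectional link: A's BPDUs are lost
```

With the healthy link, B hears the superior BPDU from A and keeps its port blocking. When A's BPDUs never arrive, nothing tells B to block, which is the failure mode described above.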

Frame Corruption
Frame corruption can result from duplex mismatch, bad cables, or incorrect cable length, and can lead to an STP failure. If a link is receiving a high number of frame errors, BPDUs can be lost. This may cause a port in the blocking state to transition to forwarding. In 802.1D with default timers, a blocking port that does not see any BPDUs for 50 seconds (a max_age of 20 seconds plus two forward delay intervals of 15 seconds each) transitions to the forwarding state. A single BPDU successfully received during that time is enough to keep the port blocking and break the loop. This problem is most likely when STP timing parameters, such as the max_age value, have been set too low.
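
The 50-second figure follows from the 802.1D timers and can be reproduced with a short calculation. The function name below is hypothetical; the defaults of 20 seconds for max_age and 15 seconds for forward delay are the standard 802.1D values.

```python
def blocking_to_forwarding_seconds(max_age: int = 20, forward_delay: int = 15) -> int:
    """Time for a blocking port that hears no BPDUs to start forwarding.

    The stored BPDU first ages out (max_age), then the port spends one
    forward_delay interval in listening and one in learning.
    """
    return max_age + 2 * forward_delay


# Default 802.1D timers.
print(blocking_to_forwarding_seconds())
# Aggressively lowered timers (illustrative values) shrink the safety margin.
print(blocking_to_forwarding_seconds(max_age=6, forward_delay=4))
```

With lowered timers, far fewer consecutive BPDUs need to be lost before the port starts forwarding, which is why tuning these values down makes a corrupted link more dangerous.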

Resource Errors
STP is implemented in software. This means that if the CPU of the bridge is overutilized, the switch may lack the resources to send or receive BPDUs in a timely manner. A lack of BPDUs can cause ports to transition from blocking to forwarding when they should not, which can result in loops forming in the network.

The STA, however, is not processor-intensive and has priority over other processes. Therefore, a CPU utilization problem is unlikely on current Catalyst switch platforms.

PortFast Configuration Error
PortFast is a feature intended for configuration on a port connected to a single host. When the link comes up on such a port, the first stages of the STA are skipped and the port transitions directly to the forwarding state. If a switch is inadvertently attached to a PortFast port, a loop may occur or this rogue switch may be elected as the STP root bridge. Furthermore, if a hub is attached to a PortFast port with redundant connections to the switch, a loop will be introduced that goes unchecked by STP.

In the example, A is a bridge with port P1 forwarding and port P2 configured for PortFast. B is a hub. As soon as the second cable is plugged into A, port P2 goes to the forwarding state and creates a loop between P1 and P2, because both ports are now forwarding. As soon as P1 or P2 receives a BPDU, one of these two ports will transition to a blocking state. However, the traffic generated by this kind of loop may be so heavy that the bridge has trouble successfully sending the BPDU that would stop the loop. Implementing BPDU guard prevents this problem.
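
The effect of BPDU guard on a PortFast port can be sketched as a small state model. This is a toy illustration, not switch software: the class `EdgePort`, its attribute names, and the state strings are hypothetical. The one behavior taken as given is that a PortFast port with BPDU guard enabled is shut down (error-disabled) when it receives a BPDU.

```python
class EdgePort:
    """Toy model of a switch port with optional PortFast and BPDU guard."""

    def __init__(self, portfast: bool = False, bpdu_guard: bool = False):
        self.portfast = portfast
        self.bpdu_guard = bpdu_guard
        self.portfast_operational = portfast
        self.state = "down"

    def link_up(self) -> None:
        # PortFast skips the first STA stages and forwards immediately.
        self.state = "forwarding" if self.portfast else "listening"

    def receive_bpdu(self) -> None:
        if self.state in ("down", "err-disabled"):
            return
        if self.portfast and self.bpdu_guard:
            # BPDU guard: a BPDU on an edge port shuts the port down.
            self.state = "err-disabled"
        elif self.portfast:
            # Without the guard the port stays up and rejoins normal STP;
            # the resulting role decision is not modeled here.
            self.portfast_operational = False


guarded = EdgePort(portfast=True, bpdu_guard=True)
guarded.link_up()
guarded.receive_bpdu()
print(guarded.state)
```

In the hub scenario above, the guarded port stops forwarding as soon as the first BPDU arrives, instead of relying on STP to resolve the loop while the CPU is busy with looped traffic.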

EtherChannel Issues
The challenges for EtherChannel can be divided into two main areas: troubleshooting during the configuration phase and troubleshooting during the execution phase. Configuration errors usually occur because of mismatched parameters on the ports involved (different speeds, different duplex, different spanning-tree port values, mismatched native VLAN settings, etc.). Errors can also arise during configuration if the channel is set to on mode on one side and too much time passes before the channel is configured on the other side. This causes a temporary spanning-tree loop, which generates an error and shuts down the port.

Depending on the platform and operating system version being configured for EtherChannel, the ports on one side of the link may remain in a disabled state even after the configuration issue has been resolved. Be sure to verify that both sides of the link are operational after changing any port parameters on an EtherChannel link.