STP problems are most often evidenced by the existence of a bridge
loop. Troubleshooting STP involves the identification and prevention of such
loops.
The primary function of the spanning-tree algorithm (STA) is to
remove loops created by redundant links in bridged networks. STP operates
at Layer 2 of the OSI model, exchanging BPDUs between bridges and selecting the
ports that will eventually forward or block traffic. If BPDUs are not being
sent or received over a link between switches, the protocol may fail to
prevent loops. Troubleshooting the resulting problems can be
difficult in a complex network.

Any condition that prevents BPDUs from being sent or received can result in a
bridge loop. The following sections explain how those conditions may occur.
Duplex Mismatch
Duplex mismatch on a point-to-point
link is a common configuration error and can have specific implications for
STP. The results of the mismatch vary somewhat by platform.
There are two common mismatch scenarios involving switches, each with its
resulting STP problems:
- Switch configured for full duplex connected to a host in autonegotiation
mode – The rule of autonegotiation is that upon negotiation failure, a port
is required to assume half-duplex operation. One side of the connection
therefore defaults to half duplex while the other side is set to full duplex,
which results in either no connectivity or inconsistent connectivity between
the two devices. In many cases traffic still flows at low data rates, but as
the traffic level on the link increases, the half-duplex side is overwhelmed,
causing data and link integrity errors. As the error rate rises, BPDUs may not
successfully cross the link.
- Switch configured for half duplex on a link whose peer switch is
configured for full duplex – In the example, the
duplex mismatch on the link between bridge A and bridge B can lead to a bridge
loop. Because B is configured for full duplex, it does not perform carrier
sense when accessing the link, so B starts sending frames even if A is
already using the link. This is a problem for A, which detects a collision and
runs the backoff algorithm before attempting to retransmit its frame.
As a result, frames sent by A, including BPDUs, may be deferred or
collide and eventually be dropped. Because it no longer receives BPDUs from A,
bridge B may lose its connection to the root. B then unblocks its
port to bridge C, thereby creating the loop.
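The two scenarios above come down to comparing the duplex mode each end actually operates in. The following is a minimal sketch of that comparison, not a vendor tool: `PortConfig`, `effective_duplex`, and `has_duplex_mismatch` are hypothetical names, and the model assumes that a hard-coded port does not take part in autonegotiation and that two autonegotiating ports both support full duplex.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortConfig:
    """Administrative duplex setting of one end of a link (hypothetical model)."""
    name: str
    duplex: str  # "full", "half", or "auto"


def effective_duplex(local: PortConfig, peer: PortConfig) -> str:
    """Return the duplex mode the local port ends up operating in."""
    if local.duplex != "auto":
        # A hard-coded port simply uses its configured mode.
        return local.duplex
    if peer.duplex == "auto":
        # Assumption: both ends negotiate and both support full duplex.
        return "full"
    # Negotiation fails against a hard-coded peer, so fall back to half duplex.
    return "half"


def has_duplex_mismatch(a: PortConfig, b: PortConfig) -> bool:
    """True when the two ends of the link operate in different duplex modes."""
    return effective_duplex(a, b) != effective_duplex(b, a)


switch = PortConfig("switch-port", "full")
host = PortConfig("host-nic", "auto")
print(has_duplex_mismatch(switch, host))
```

In this model the host falls back to half duplex while the switch stays at full duplex, so the first scenario is reported as a mismatch; two hard-coded ports set to "half" and "full" are flagged in the same way.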
To mitigate transmission type mismatches, the best practice is to
establish a standard within the organization regarding how each interface is
configured prior to attaching it to the network. It is not always possible to
disable autonegotiation on all attached client devices, but where possible,
network infrastructure devices and servers should have matching transmission
type settings and no switch ports should be set to autonegotiate. This will
make it easier to troubleshoot these types of issues.
Unidirectional Link Failure
A unidirectional link is one that stays up while providing
only one-way communication. Unidirectional links cause specific STP problems.
In the example, the link
between bridge A and bridge B is unidirectional: it drops traffic from A to B
while still transmitting traffic from B to A. Suppose the port on bridge B
should be blocking. A port blocks only if it receives BPDUs from a bridge with a
better (numerically lower) BID. In this case, all the BPDUs coming from bridge A
are lost, so bridge B never sees the BPDU with the better BID. B unblocks the
port and eventually forwards traffic, potentially creating a loop when other
switches are part of the topology. If the unidirectional failure exists at
startup, STP will not converge correctly.
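The effect of the lost BPDUs can be illustrated with a deliberately simplified sketch. It assumes the usual STP ordering, in which the numerically lower bridge ID is the better one, and it compares only bridge IDs, whereas a real implementation also compares root ID, path cost, and port ID. `BridgeId` and `can_stay_blocking` are hypothetical names, and the bridge IDs are made-up values.

```python
from typing import Iterable, Tuple

# (priority, MAC address). Tuples compare element by element, so a lower
# priority wins first and the MAC address breaks ties. Assumption: MAC
# addresses are written in one consistent lowercase format.
BridgeId = Tuple[int, str]


def can_stay_blocking(own: BridgeId, received: Iterable[BridgeId]) -> bool:
    """A port may keep blocking only while it hears a better bridge ID."""
    return any(bid < own for bid in received)


bridge_a: BridgeId = (4096, "00:00:0c:aa:aa:aa")   # illustrative values
bridge_b: BridgeId = (32768, "00:00:0c:bb:bb:bb")

healthy_link = can_stay_blocking(bridge_b, [bridge_a])   # A's BPDUs arrive
unidirectional_link = can_stay_blocking(bridge_b, [])    # A's BPDUs are lost
print(healthy_link, unidirectional_link)
```

With the BPDUs from A missing, B has no reason to keep the port blocking, which is the condition that lets the loop form.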
Frame Corruption
Frame corruption can
result from duplex mismatch, bad cables, or incorrect cable length, and can
lead to an STP failure. If a link is receiving a high number of frame errors,
BPDUs can be lost. This may cause a port in blocking state to transition to
forwarding. In 802.1D with default timers, a blocking port that does not see
any BPDUs for 50 seconds transitions to the forwarding state. A single BPDU
received successfully during that period is enough to keep the port blocking
and prevent the loop. The problem is therefore most likely when STP
timing parameters, such as the max_age value, have been set too
low.
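The 50-second figure follows from the default 802.1D timers: the stored BPDU information must first age out (max age, 20 seconds), and the port then spends one forward delay (15 seconds) each in the listening and learning states. The sketch below shows the arithmetic; the function name is hypothetical, and the reduced values in the second call are only an illustration of aggressive tuning.

```python
def blocking_to_forwarding_seconds(max_age: int = 20, forward_delay: int = 15) -> int:
    """Time for a blocking port that hears no BPDUs to reach forwarding.

    max_age:       seconds before the last received BPDU information expires
    forward_delay: seconds spent in each of the listening and learning states
    """
    return max_age + 2 * forward_delay


default_delay = blocking_to_forwarding_seconds()
tuned_delay = blocking_to_forwarding_seconds(max_age=6, forward_delay=4)
print(default_delay, tuned_delay)
```

With a 2-second hello time, the default max age spans about ten consecutive BPDUs, while a max age of 6 seconds spans only three, so far fewer lost frames are needed before a blocking port starts moving toward forwarding.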
Resource Errors
STP is implemented in software. This
means that if the CPU of the bridge is overutilized, the switch may lack the
resources to send or receive BPDUs in a timely manner. A lack of BPDUs can
cause ports to transition from blocking to forwarding when they should not,
which can result in loops forming in the network.
The STA, however, is not processor-intensive and has priority over other
processes. Therefore, a CPU utilization problem is unlikely on current
Catalyst switch platforms.
PortFast Configuration Error
PortFast is a feature
that is intended for configuration on a port connected to a single host. When
the link comes up on such a port, the first stages of the STA are skipped and
the port directly transitions to the forwarding state. If a switch is
inadvertently attached to a PortFast port, a loop may occur or this rogue
switch may be elected as the STP root bridge. Furthermore, if a hub is attached
to a PortFast port with redundant connections to the switch then a loop will be
introduced that will go unchecked by STP.
In the example, A is a bridge
with port P1 forwarding and port P2 configured for PortFast. B is a hub. As
soon as the second cable is plugged into A, port P2 goes to the forwarding
state and creates a loop between P1 and P2, given that both ports are in the
forwarding state. As soon as P1 or P2 receives a BPDU, one of these two ports
will transition to a blocking state. The traffic generated by this kind of loop
may occur at such a high rate that the bridge may have trouble successfully
sending the BPDU to stop the loop. Implementing BPDU guard will prevent this
problem.
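The interaction between PortFast and BPDU guard can be sketched as a small state model. This is an illustration under stated assumptions rather than switch firmware: the `EdgePort` class and its method names are hypothetical, and the only behavior modeled is that a PortFast port skips straight to forwarding and that BPDU guard places the port in an error-disabled state when a BPDU arrives.

```python
from dataclasses import dataclass


@dataclass
class EdgePort:
    """Hypothetical model of an access port on a switch."""
    name: str
    portfast: bool = False
    bpdu_guard: bool = False
    state: str = "down"

    def link_up(self) -> None:
        # PortFast skips the listening and learning stages.
        self.state = "forwarding" if self.portfast else "listening"

    def receive_bpdu(self) -> None:
        if self.portfast and self.bpdu_guard:
            # A BPDU means a bridge is attached where only a host is expected.
            self.state = "err-disabled"
        # Otherwise normal STP processing applies, which is not modeled here.


p2 = EdgePort("P2", portfast=True, bpdu_guard=True)
p2.link_up()
state_after_link_up = p2.state
p2.receive_bpdu()
print(state_after_link_up, p2.state)
```

The point of the design is that the port is taken out of service on the first BPDU it receives, instead of relying on STP to win a race against loop traffic.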
EtherChannel Issues
The challenges for EtherChannel
can be divided into two main areas: troubleshooting during the configuration
phase, and troubleshooting during the execution phase. Configuration errors
usually occur because of mismatched parameters on the ports involved (different
speeds, different duplex, different spanning tree port values, mismatched
native VLAN settings, etc.). Errors can also arise during
configuration if the channel is set to on at one side and too much time passes
before the channel is configured on the other side. This causes temporary
spanning tree loops, which generate an error and shut down the port.
Depending on the operating system version and the platform being configured
for EtherChannel, the ports on one side of the link may remain in a disabled
state even after the configuration issue has been resolved. Be sure to verify
that both sides of the link are operational after changing any port parameters
on an EtherChannel link.
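A consistency check of the member ports before bundling can catch the parameter mismatches listed above. The following is a minimal sketch that assumes the port settings have already been collected into dictionaries; the key names, the `mismatched_parameters` function, and the sample interface values are all hypothetical.

```python
from typing import Dict, List, Set

# Parameters that should be identical on every member port (illustrative set).
CHECKED = ("speed", "duplex", "native_vlan", "stp_cost")


def mismatched_parameters(ports: List[Dict[str, object]]) -> Dict[str, Set[object]]:
    """Return each checked parameter that has more than one distinct value."""
    mismatches: Dict[str, Set[object]] = {}
    for key in CHECKED:
        values = {port.get(key) for port in ports}
        if len(values) > 1:
            mismatches[key] = values
    return mismatches


bundle = [
    {"name": "Gi0/1", "speed": 1000, "duplex": "full", "native_vlan": 1, "stp_cost": 4},
    {"name": "Gi0/2", "speed": 1000, "duplex": "half", "native_vlan": 1, "stp_cost": 4},
]
problems = mismatched_parameters(bundle)
print(problems)
```

An empty result means the checked parameters agree across the member ports; it does not confirm that the channel has come up on both sides, which still needs to be verified on the devices.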