Describing High Availability in Multilayer Switching
Benefits and drawbacks of device-level fault tolerance

One approach to building highly available networks is to replicate all devices to create a fault-tolerant network. To achieve high end-to-end availability, each key network infrastructure device exists in duplicate. Fault tolerance through device replication offers these benefits:

  • Minimizes time periods during which the system is non-responsive to requests (for example, while the system is being reconfigured because of a component failure or recovery)
  • Eliminates all single points of failure that would cause the system to stop
  • Provides disaster protection by allowing the major system components to be separated geographically
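
The availability gain from duplicating a device can be estimated with a small calculation. The sketch below is illustrative only: it assumes the two copies fail independently of each other, and the 99.9% single-device figure is a hypothetical input, not a value from this text. The function names are likewise chosen only for this example.

```python
# Rough availability estimate for a device that exists in duplicate.
# Assumption: the copies fail independently, so the pair is down only
# when every copy is down at the same time.

MINUTES_PER_YEAR = 365 * 24 * 60


def parallel_availability(single: float, copies: int = 2) -> float:
    """Return the availability of `copies` redundant devices in parallel."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - single) ** copies


def downtime_minutes_per_year(availability: float) -> float:
    """Convert an availability fraction into expected minutes of downtime per year."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    single = 0.999  # hypothetical availability of one device
    pair = parallel_availability(single, copies=2)
    print(f"single device:   {downtime_minutes_per_year(single):.1f} min/year")
    print(f"duplicated pair: {downtime_minutes_per_year(pair):.2f} min/year")
```

Under the independence assumption, the duplicated pair is unavailable only for the fraction of time in which both copies are down. Real devices that share power, software, or configuration will not reach this figure.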

Trying to achieve high network availability solely through device-level fault tolerance has several drawbacks:

  • Massive redundancy within each device adds significantly to its cost. It also reduces the physical capacity of each device by consuming slots that could otherwise house network interfaces or provide useful network services.
  • Redundant subsystems within devices are often maintained in hot-standby mode. In this mode, they cannot contribute additional performance because they are fully activated only when the primary component fails.
  • Focusing on device-level hardware reliability may result in a number of other failure mechanisms being overlooked. Network elements are not standalone devices; they are components of a network system whose internal operations and system-level interactions are governed by software and configuration parameters.
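
The last drawback can be made concrete with a simple model. The sketch below is a hedged illustration, not a method from this text: it assumes hardware copies fail independently, and that software and configuration faults are shared by all copies, so they act as a single component in series with the redundant hardware. All availability figures and names are hypothetical.

```python
# Why hardware redundancy alone has diminishing returns.
# Assumptions (illustrative only):
#   - hardware copies fail independently (parallel model)
#   - software/configuration faults hit every copy at once (series model)


def system_availability(hw_single: float, hw_copies: int, shared: float) -> float:
    """Combine redundant hardware with a shared software/configuration factor."""
    hardware = 1.0 - (1.0 - hw_single) ** hw_copies  # parallel hardware copies
    return hardware * shared                         # shared factor in series


if __name__ == "__main__":
    hw_single = 0.999   # hypothetical availability of one hardware unit
    shared = 0.9995     # hypothetical software/configuration availability
    for copies in (1, 2, 3):
        total = system_availability(hw_single, copies, shared)
        print(f"{copies} hardware copies -> system availability {total:.6f}")
```

In this model the system can never be more available than the shared software and configuration factor, no matter how many hardware copies are added. That is the sense in which focusing only on device-level hardware reliability can overlook other failure mechanisms.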