One approach to building highly available networks is to make the network
fault tolerant by replicating its devices. To achieve high end-to-end
availability, each key network infrastructure device is deployed in duplicate.
Fault tolerance through device replication offers these benefits:
- Minimizes the periods during which the system does not respond to
requests (for example, while the system is being reconfigured because of a
component failure or recovery)
- Eliminates all single points of failure that would cause the system to
stop
- Provides disaster protection by allowing the major system components to be
separated geographically
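
The availability gain from duplicating each device can be estimated with
simple series/parallel arithmetic. The sketch below is illustrative only: it
assumes device failures are independent, that either copy of a duplicated
device can carry the full load, and it uses a hypothetical figure of 99.9%
availability per device. The function names are not from any library.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60


def parallel_availability(a: float, copies: int = 2) -> float:
    """Availability of `copies` redundant devices when any one suffices.

    Assumes the copies fail independently of one another.
    """
    return 1.0 - (1.0 - a) ** copies


def series_availability(availabilities: list[float]) -> float:
    """Availability of a path that needs every element to be up."""
    return math.prod(availabilities)


def downtime_minutes_per_year(a: float) -> float:
    """Expected yearly downtime implied by availability `a`."""
    return (1.0 - a) * MINUTES_PER_YEAR


# Hypothetical end-to-end path crossing three devices, each 99.9% available.
device = 0.999
single_path = series_availability([device] * 3)
duplicated_path = series_availability([parallel_availability(device)] * 3)

print(f"single devices:     {downtime_minutes_per_year(single_path):.1f} min/year")
print(f"duplicated devices: {downtime_minutes_per_year(duplicated_path):.1f} min/year")
```

Under these assumptions, the path of single devices is down for roughly 1,575
minutes per year, while the path of duplicated devices is down for roughly 1.6
minutes per year. Real networks rarely meet the independence assumption, so
the second figure is best read as an upper bound on the benefit.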
Trying to achieve high network availability solely through device-level
fault tolerance has several drawbacks:
- Massive redundancy within each device adds significantly to its cost. It
also reduces the physical capacity of each device by consuming slots that
could otherwise house network interfaces or provide useful network
services.
- Redundant subsystems within devices are often maintained in hot-standby
mode. In this mode, the redundant subsystems cannot contribute additional
performance because they are fully activated only when the primary
component fails.
- Focusing on device-level hardware reliability can cause other failure
mechanisms to be overlooked. Network elements are not standalone devices;
they are components of a network system whose internal operations and
system-level interactions are governed by software and configuration
parameters.
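
The last drawback can be made concrete with a small calculation. The sketch
below is a simplified model, not a measurement: it assumes a redundant
hardware pair whose two units fail independently, plus a single shared
software or configuration fault that takes down both units at once. The
function name and both availability figures are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def pair_availability(hw: float, shared_sw: float) -> float:
    """Availability of a redundant pair with a common software dependency.

    hw:        availability of each hardware unit (assumed independent)
    shared_sw: availability of software/configuration common to both units,
               so that a fault in it disables the whole pair
    """
    hw_pair = 1.0 - (1.0 - hw) ** 2  # either hardware unit suffices
    return hw_pair * shared_sw       # but both depend on the shared software


hardware_only = pair_availability(hw=0.999, shared_sw=1.0)
with_software = pair_availability(hw=0.999, shared_sw=0.9995)

for label, a in [("hardware only", hardware_only),
                 ("with shared software", with_software)]:
    print(f"{label}: {(1.0 - a) * MINUTES_PER_YEAR:.1f} min/year of downtime")
```

With these illustrative numbers, the hardware pair alone accounts for well
under one minute of downtime per year, while the shared software dependency
raises the total to roughly 263 minutes. In this model, duplicating hardware
cannot push availability above that of the software and configuration the
duplicated units have in common.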