URL:
|
https://conferences.sigcomm.org/sigcomm/2002/papers/bgpmisconfig.pdf
|
Entry Date:
|
2002-5-30
|
Abstract:
|
It is well-known that simple, accidental BGP configuration errors
can disrupt Internet connectivity. Yet little is known about the frequency
of misconfiguration or its causes, except for the few spectacular
incidents of widespread outages. In this paper, we present
the first quantitative study of BGP misconfiguration. Over a three
week period, we analyzed routing table advertisements from 23
vantage points across the Internet backbone to detect incidents of
misconfiguration. For each incident we polled the ISP operators
involved to verify whether it was a misconfiguration, and to learn
the cause of the incident. We also actively probed the Internet to
determine the impact of misconfiguration on connectivity.
Surprisingly, we find that configuration errors are pervasive, with
200-1200 prefixes (0.2-1.0% of the BGP table size) suffering from
misconfiguration each day. Close to 3 in 4 of all new prefix advertisements
were results of misconfiguration. Fortunately, the connectivity
seen by end users is surprisingly robust to misconfigurations.
While misconfigurations can substantially increase the update
load on routers, only one in twenty five affects connectivity.
While the causes of misconfiguration are diverse, we argue that
most could be prevented through better router design.
|
Results:
|
-
Presents the first systematic study of BGP configuration
errors that propagate across the backbone of the Internet.
Focuses on two kinds of globally visible misconfigurations:
-
Origin misconfiguration: the accidental insertion of routes into the global
BGP tables.
The following kinds of origin misconfiguration are detected:
-
Self-deaggregation: an origin deaggregates one of its
prefixes
-
Related origin: an existing prefix (or subset) is
advertised by a new but related origin (one of the origins appears
in the AS path of the other)
-
Foreign origin: an existing prefix (or subset) is
advertised by a new and unrelated origin
-
Export misconfiguration: the accidental propagation of routes that
should
have been filtered. These are detected using Gao's algorithm for
discovering peering relationships (see References). AS paths with
short-lived subpaths that violate the valley-free condition or contain
multiple peering edges are detected as probable misconfigurations.
-
Presents heuristics to find misconfigurations in the stream of BGP updates
obtained from sources such as RouteViews.
-
Finds that 200-1200 prefixes (0.2-1% of the global table size) suffer from
misconfiguration each day, and that about 3 in 4 new route announcements per
day are the result of misconfiguration.
These results are likely a (significant) underestimate of the actual level of
misconfiguration, since only the following misconfigurations are considered:
-
lasting less than a day
-
of certain types
-
observable through RouteViews
-
Also analyzes the impact of misconfigurations on Internet connectivity by
actively probing paths that are suspected to be faulty. Connectivity is robust
to most misconfigurations: it is affected by only 4% of the misconfigured
announcements, or 13% of the misconfiguration incidents. However, routing
load due to misconfigurations was more than 10% of the total update load for
2% of the time. On at least one occasion it exceeded 60% of the total update
load (with 15 minute averaging).
-
Uses an email survey among the operators involved in incidents to validate
the results and to compile a list of causes of misconfiguration. The causes
are diverse, and not limited to human slips. The most serious causes of
misconfiguration (in terms of the number of prefixes affected) are:
-
Configuration features such as redistribution.
-
Initialisation bugs. While a router is rebooting or its filters are being
updated, it may leak more specific prefixes until the filters take
effect. One of the reasons behind this appears to be a bug in the software
of a major router vendor.
-
Reliance on upstream filtering.
-
Argues for changes in router and protocol design that would eliminate or
reduce the likelihood of the observed errors, or minimise their impact:
-
high-level policy specification as part and parcel of routers
-
automated verification of configuration
-
transactional semantics for configuration commands
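
The three origin misconfiguration categories listed above (self-deaggregation, related origin, foreign origin) can be illustrated with a small sketch. This is not the paper's implementation: the function name `classify_origin_change`, the shape of the `existing` table, and the AS numbers and prefixes in the usage example are all assumptions made for illustration. It only classifies a new announcement against previously seen routes and does not model the short-lived criterion or the operator survey.

```python
from ipaddress import ip_network


def classify_origin_change(prefix, as_path, existing):
    """Classify a new announcement against previously seen routes.

    prefix   -- announced prefix, e.g. "10.1.0.0/16" (hypothetical input format)
    as_path  -- tuple of AS numbers; the last element is the origin AS
    existing -- dict mapping a known prefix string to a list of AS paths
                previously seen for it (assumed structure)

    Returns "self-deaggregation", "related-origin", "foreign-origin",
    or None when the announcement fits none of the three categories.
    """
    new_net = ip_network(prefix)
    new_origin = as_path[-1]

    # Collect every known route whose prefix equals or covers the new one.
    covering = []
    for old_prefix, old_paths in existing.items():
        old_net = ip_network(old_prefix)
        if old_net.version == new_net.version and new_net.subnet_of(old_net):
            covering.extend((old_net, path) for path in old_paths)

    if not covering:
        return None  # previously unannounced address space

    same_origin_nets = [net for net, path in covering if path[-1] == new_origin]
    if new_net in same_origin_nets:
        return None  # same prefix, same origin: nothing new
    if same_origin_nets:
        # The origin already announces a less specific prefix.
        return "self-deaggregation"

    # New origin: related if either origin appears in the other's AS path.
    related = any(
        new_origin in path or path[-1] in as_path for _, path in covering
    )
    return "related-origin" if related else "foreign-origin"


# Usage with made-up AS numbers and prefixes.
table = {"10.0.0.0/8": [(701, 3356, 64500)]}
print(classify_origin_change("10.1.0.0/16", (1239, 64500), table))
print(classify_origin_change("10.1.0.0/16", (1239, 3356), table))
print(classify_origin_change("10.0.0.0/8", (1239, 64501), table))
```

The ordering of the checks is a design choice of this sketch: an announcement is tested for self-deaggregation before the related/foreign distinction, so a more specific prefix from an AS that already originates the covering prefix is never reported as a new origin.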
|
Datasets:
|
-
Analyzes BGP updates from RouteViews using 23
different vantage points in 19 different ASes over a period
of 21 days (from 26 Dec 2001 to 15 Jan 2002). Only changes that last less
than a day are considered.
-
Email survey among operators involved in incidents to verify and determine
the cause of incidents. Email addresses were obtained from Internet routing
registries. A large portion of the addresses proved invalid for various
reasons.
-
Active probing of the Internet to determine the impact of misconfigurations
on connectivity. Public traceroute
servers were used. Normally responsive IP addresses (obtained
from Skitter) were probed during apparent misconfiguration.
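
The restriction to changes lasting less than a day can be sketched as a filter over a time-ordered event stream. The event tuple layout, the `"A"`/`"W"` markers, and the function name `short_lived_routes` are assumptions for illustration; real RouteViews data would first need to be parsed into this form, and the paper's own heuristics may differ in detail.

```python
from datetime import datetime, timedelta


def short_lived_routes(events, max_lifetime=timedelta(days=1)):
    """Return routes that were announced and then withdrawn within max_lifetime.

    events -- iterable of (timestamp, kind, key), sorted by timestamp, where
              kind is "A" (announce) or "W" (withdraw) and key identifies a
              route, e.g. (prefix, origin_as). This layout is hypothetical.
    """
    first_seen = {}
    result = []
    for ts, kind, key in events:
        if kind == "A":
            # Remember only the first announcement of a live route.
            first_seen.setdefault(key, ts)
        elif kind == "W" and key in first_seen:
            began = first_seen.pop(key)
            if ts - began < max_lifetime:
                result.append((key, began, ts))
    return result


# Usage with made-up routes and timestamps.
events = [
    (datetime(2001, 12, 27, 8, 0), "A", ("10.1.0.0/16", 64500)),
    (datetime(2001, 12, 27, 10, 0), "W", ("10.1.0.0/16", 64500)),
    (datetime(2001, 12, 28, 0, 0), "A", ("10.2.0.0/16", 64501)),
    (datetime(2001, 12, 31, 0, 0), "W", ("10.2.0.0/16", 64501)),
]
for key, began, ended in short_lived_routes(events):
    print(key, ended - began)
```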
|
References:
|
- Complements:
- J. Cowie, A. Ogielski, B. Premore, and Y. Yuan. Global
Routing Instabilities during Code Red II and Nimda Worm
Propagation.
http://www.renesys.com/projects/bgp_instability.
- T. Griffin and G. T. Wilfong. An Analysis of BGP Convergence
Properties. In ACM SIGCOMM, pages 277-288, Aug. 1999.
- C. Labovitz, A. Ahuja, A. Bose, and F. Jahanian. Delayed Internet
Routing Convergence. In ACM SIGCOMM, Sep. 2000.
- C. Labovitz, G. R. Malan, and F. Jahanian. Origins of Internet Routing
Instability. In IEEE INFOCOM, June 1999.
- K. Varadhan, R. Govindan, and D. Estrin. Persistent Route
Oscillations in Inter-Domain Routing. Computer Networks, 32(1),
1999.
- Expands on:
- C. Labovitz, A. Ahuja, and F. Jahanian. Experimental Study of Internet
Stability and Wide-Area Network Failures. In Fault-Tolerant Computing
Symposium (FTCS), June 1999.
- Explains / categorises:
- Uses algorithms of:
- L. Gao. On Inferring Autonomous System Relationships in the Internet.
In IEEE Global Internet Symposium, Nov. 2000.
|