The contents of this legacy page are no longer maintained or supported, and are made available only for historical purposes.

Bibliography Details

R. Mahajan, D. Wetherall, and T. Anderson, "Understanding BGP Misconfiguration", in ACM SIGCOMM, 2002.

Understanding BGP Misconfiguration
Authors: R. Mahajan, D. Wetherall, T. Anderson
Published: ACM SIGCOMM, 2002
URL: https://conferences.sigcomm.org/sigcomm/2002/papers/bgpmisconfig.pdf
Entry Date: 2002-05-30
Abstract: It is well-known that simple, accidental BGP configuration errors can disrupt Internet connectivity. Yet little is known about the frequency of misconfiguration or its causes, except for the few spectacular incidents of widespread outages. In this paper, we present the first quantitative study of BGP misconfiguration. Over a three week period, we analyzed routing table advertisements from 23 vantage points across the Internet backbone to detect incidents of misconfiguration. For each incident we polled the ISP operators involved to verify whether it was a misconfiguration, and to learn the cause of the incident. We also actively probed the Internet to determine the impact of misconfiguration on connectivity. Surprisingly, we find that configuration errors are pervasive, with 200-1200 prefixes (0.2-1.0% of the BGP table size) suffering from misconfiguration each day. Close to 3 in 4 of all new prefix advertisements were results of misconfiguration. Fortunately, the connectivity seen by end users is surprisingly robust to misconfigurations. While misconfigurations can substantially increase the update load on routers, only one in twenty five affects connectivity. While the causes of misconfiguration are diverse, we argue that most could be prevented through better router design.
Results:
  • Presents the first systematic study of BGP configuration errors that propagate across the backbone of the Internet. Focuses on two kinds of globally visible misconfigurations:
    • Origin misconfiguration: the accidental insertion of routes into the global BGP tables. The following kinds of origin misconfiguration are detected:
      • Self-deaggregation: an origin deaggregates one of its prefixes
      • Related origin: an existing prefix (or subset) is advertised by a new but related origin (one of the origins appears in the AS path of the other)
      • Foreign origin: an existing prefix (or subset) is advertised by a new and unrelated origin
    • Export misconfiguration: the accidental propagation of routes that should have been filtered. These are detected using Gao's algorithm for inferring the business relationships between ASes (see References): AS paths with short-lived subpaths that violate the valley-free condition or contain multiple peer-to-peer edges are flagged as probable misconfigurations.
  • Presents heuristics to find misconfigurations in the stream of BGP updates obtained from sources such as RouteViews.
  • Finds that 200-1200 prefixes (0.2-1% of the global table size) suffer from misconfiguration each day, and that about 3 in 4 new prefix advertisements per day result from misconfiguration.
    These results are likely a (significant) underestimate of the actual level of misconfiguration, since only misconfigurations that are:
    • short-lived (lasting less than a day),
    • of the origin or export types described above, and
    • observable through RouteViews
    are considered.
  • Also analyzes the impact of misconfigurations on Internet connectivity by actively probing paths suspected to be faulty. Connectivity is robust to most misconfigurations: it is affected in only 4% of the misconfigured announcements, or 13% of the misconfiguration incidents. However, the routing load due to misconfigurations exceeded 10% of the total update load 2% of the time, and on at least one occasion it exceeded 60% of the total update load (using 15-minute averages).
  • Uses an email survey of the operators involved in incidents to validate the results and compile a list of causes of misconfiguration. The causes are diverse and not limited to human slips. The most serious causes of misconfiguration, in terms of the number of prefixes affected, are:
    • Configuration features such as redistribution.
    • Initialisation bugs. While a router is rebooting or its filters are being updated, it may leak more-specific prefixes before the filters take effect. One reason for this appears to be a bug in the software of a major router vendor.
    • Reliance on upstream filtering.
  • Argues for changes in router and protocol design that would eliminate or reduce the likelihood of the observed errors, or minimise their impact:
    • high-level policy specification as part and parcel of routers
    • automated verification of configuration
    • transactional semantics for configuration commands
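The three origin-misconfiguration classes listed above can be illustrated with a minimal Python sketch. It is not the authors' code: the `Route` record, the `classify_new_route` function, and the in-memory list standing in for the routing table are hypothetical, and the short-lived (less than a day) filter described in the text is omitted. It assumes the origin AS is the last AS in the path and applies the definition of "related" given above: one origin appears in the other route's AS path.

```python
# Hypothetical sketch of origin-misconfiguration classification, not the paper's code.
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical route record: a prefix plus its AS path (origin AS last)."""
    prefix: ipaddress.IPv4Network
    as_path: tuple[int, ...]

    @property
    def origin(self) -> int:
        return self.as_path[-1]


def classify_new_route(new: Route, table: list[Route]) -> str | None:
    """Label a newly seen route against the existing table, or return None."""
    # Existing routes for the same prefix or a less-specific prefix covering it.
    covering = [r for r in table if new.prefix.subnet_of(r.prefix)]
    if not covering:
        return None  # new address space: none of the three classes applies

    same_origin = [r for r in covering if r.origin == new.origin]
    if same_origin:
        # An exact match from the same origin is a route we already know;
        # a strictly more-specific prefix from that origin is self-deaggregation.
        if any(r.prefix == new.prefix for r in same_origin):
            return None
        return "self-deaggregation"

    # A different origin now claims existing address space.
    for r in covering:
        # Related if either origin appears in the other route's AS path.
        if new.origin in r.as_path or r.origin in new.as_path:
            return "related origin"
    return "foreign origin"


net = ipaddress.ip_network
table = [Route(net("10.0.0.0/16"), (64500, 64501))]
print(classify_new_route(Route(net("10.0.1.0/24"), (64500, 64501)), table))  # self-deaggregation
print(classify_new_route(Route(net("10.0.0.0/16"), (64502, 64500)), table))  # related origin
print(classify_new_route(Route(net("10.0.0.0/16"), (64502, 64503)), table))  # foreign origin
```

Matching on covering prefixes via `subnet_of` captures the "existing prefix (or subset)" wording, since a prefix counts as a subnet of itself.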
Datasets:
  • Analyzes BGP updates from RouteViews, using 23 vantage points in 19 ASes, over a period of 21 days (26 Dec 2001 to 15 Jan 2002). Only changes lasting less than a day are considered.
  • Email survey among operators involved in incidents to verify and determine the cause of incidents. Email addresses were obtained from Internet routing registries. A large portion of the addresses proved invalid for various reasons.
  • Active probing of the Internet, using public traceroute servers, to determine the impact of misconfigurations on connectivity. Normally responsive IP addresses (obtained from Skitter) were probed during apparent misconfigurations.
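The one-day lifetime filter mentioned in the datasets description can be sketched as below. The input format is an assumption: the raw update stream from one vantage point is taken to be already reduced to time-sorted "up"/"down" events per (prefix, origin) pair, and `short_lived_candidates` is a hypothetical helper rather than the paper's implementation. Routes still present at the end of the window are not reported, because their lifetime is unknown.

```python
# Hypothetical sketch of the short-lived route filter, not the paper's code.
from datetime import datetime, timedelta


def short_lived_candidates(events, max_lifetime=timedelta(days=1), preexisting=frozenset()):
    """Return (prefix, origin, start, end) for new pairs that vanish within max_lifetime.

    `events` is a time-sorted iterable of (time, prefix, origin, action) tuples, where
    action is "up" when a (prefix, origin) pair appears and "down" when it disappears.
    """
    first_seen = {}
    candidates = []
    for t, prefix, origin, action in events:
        key = (prefix, origin)
        if key in preexisting:
            continue  # only routes that are new within the window are of interest
        if action == "up":
            first_seen.setdefault(key, t)  # keep the earliest appearance
        elif action == "down" and key in first_seen:
            start = first_seen.pop(key)
            if t - start < max_lifetime:
                candidates.append((prefix, origin, start, t))
    # Pairs left in first_seen are still announced at the end of the window;
    # their lifetime is unknown, so they are not reported.
    return candidates


t0 = datetime(2001, 12, 26)
events = [
    (t0, "10.0.1.0/24", 64501, "up"),
    (t0 + timedelta(hours=1), "10.0.2.0/24", 64502, "up"),
    (t0 + timedelta(hours=3), "10.0.1.0/24", 64501, "down"),
    (t0 + timedelta(days=2), "10.0.2.0/24", 64502, "down"),
]
for prefix, origin, start, end in short_lived_candidates(events):
    print(prefix, origin, end - start)  # 10.0.1.0/24 64501 3:00:00
```

The surviving candidates would then be passed to classification steps such as the origin and export checks in the results above.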
References:
  • Complements:
    • J. Cowie, A. Ogielski, B. Premore, and Y. Yuan. Global Routing Instabilities during Code Red II and Nimda Worm Propagation. http://www.renesys.com/projects/bgp_instability.
    • T. Griffin and G. T. Wilfong. An Analysis of BGP Convergence Properties. In ACM SIGCOMM, pages 277-288, Aug. 1999.
    • C. Labovitz, A. Ahuja, A. Bose, and F. Jahanian. Delayed Internet Routing Convergence. In ACM SIGCOMM, Sep. 2000.
    • C. Labovitz, G. R. Malan, and F. Jahanian. Origins of Internet Routing Instability. In IEEE INFOCOM, June 1999.
    • K. Varadhan, R. Govindan, and D. Estrin. Persistent Route Oscillations in Inter-Domain Routing. Computer Networks, 32(1), 1999.
  • Expands on:
    • C. Labovitz, A. Ahuja, and F. Jahanian. Experimental Study of Internet Stability and Wide-Area Network Failures. In Fault-Tolerant Computing Symposium (FTCS), June 1999.
  • Explains / categorises:
  • Uses algorithms of:
    • L. Gao. On Inferring Autonomous System Relationships in the Internet. In IEEE Global Internet Symposium, Nov. 2000.