Bibliography Details

C. Labovitz, A. Ahuja, and F. Jahanian, "Experimental Study of Internet Stability and Wide-Area Backbone Failures," Tech. Rep. CSE-TR-382-98, University of Michigan, 1998.

Experimental Study of Internet Stability and Wide-Area Backbone Failures

Authors:
C. Labovitz, A. Ahuja, F. Jahanian

Published:
University of Michigan, 1998

URL:
http://www.eecs.umich.edu/techreports/cse/1998/CSE-TR-382-98.pdf
http://citeseer.nj.nec.com/labovitz98experimental.html

Entry Date:
2003-05-15

Abstract:
In this paper, we describe an experimental study of Internet stability
and the origins of failure in Internet protocol backbones. The
stability of end-to-end Internet paths depends both on the
underlying telecommunication switching system and on the higher-level
software and hardware components specific to the Internet's
packet-switched forwarding and routing architecture. Although a number
of earlier studies have examined failures in the public
telecommunication system, little attention has been given to the
characterization of Internet stability. Our paper analyzes Internet
failures from three different perspectives. We first examine several
recent major Internet failures and their probable origins. These
empirical observations illustrate the complexity of the Internet and
show that unlike commercial transaction systems, the interactions of
the underlying components of the Internet are poorly understood. Next,
our examination focuses on the stability of paths between Internet
Service Providers. Our analysis is based on the experimental
instrumentation of key portions of the Internet
infrastructure. Specifically, we logged all of the routing control
traffic at five of the largest U.S. Internet exchange points over a
three-year period. This study of network reachability information
found unexpectedly high levels of path fluctuation and an aggregate
low mean time between failures for individual Internet paths. These
results point to a high level of instability in the global Internet
backbone. While our study of the Internet backbone identifies major
trends in the level of path instability between different service
providers, these results do not characterize failures inside the
network of a service provider. The final portion of our paper focuses on
a case study of the network failures observed in a large regional
Internet backbone. This examination of the internal stability of a
network includes twelve months of operational failure logs and a
review of the internal routing communication data collected between
regional backbone routers. We characterize the type and frequency of
failures in twenty categories, and describe the failure properties of
the regional backbone as a whole.

Datasets:
- for inter-provider faults:
  - 10 months (Jan 97 to Nov 98) of BGP updates from three providers
  - 3 years of BGP updates at 5 U.S. exchange points: AADS, Mae-East,
    Mae-West, PacBell, and Sprint
- for faults within a backbone:
  - studies MichNet, a medium-sized regional network connecting
    educational and commercial customers in 132 cities at speeds
    up to OC3; the network connects 33 backbone routers to several
    hundred customer routers
  - 1 year (Nov 97 to Nov 98) of data from an automated system that
    pings all router interfaces
  - entries in the NOC's trouble-ticket system
  - 6 months (Mar 97 to Nov 98) of OSPF messages

Results:
Quoting and paraphrasing from the paper:
- The Internet backbone infrastructure exhibits significantly less
  availability and a lower mean-time to failure than the Public Switched
  Telephone Network (PSTN).
- The majority of Internet backbone paths exhibit a mean-time to failure
  of 25 days or less, and a mean-time to repair of twenty minutes or less.
  Internet backbones are rerouted (either due to failure or policy changes)
  on average once every three days or less.
- Routing instability inside an autonomous network does not exhibit the
  same daily and weekly cyclic trends as previously reported for routing
  between inter-provider backbones, suggesting that most inter-provider
  path failures stem from congestion collapse.
- A small fraction of network paths in the Internet contribute
  disproportionately to the number of long-term outages and backbone
  unavailability.
- The majority of intra-domain outages stem from maintenance, power
  outages, and PSTN failures.
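The MTTF and MTTR figures above can be related to availability with the standard steady-state formula A = MTTF / (MTTF + MTTR). The Python sketch below treats a backbone path as a simple two-state (up/down) component, which is an assumption of this example rather than the paper's model, and plugs in the quoted 25-day and 20-minute figures as point values. Because both figures are stated as upper bounds, the result only illustrates the arithmetic; it is not a bound on path availability. The function names are hypothetical.

```python
# Steady-state availability of a repairable component, assuming a simple
# two-state (up/down) model: A = MTTF / (MTTF + MTTR).

MINUTES_PER_DAY = 24 * 60


def availability(mttf_minutes: float, mttr_minutes: float) -> float:
    """Expected fraction of time the path is usable."""
    if mttf_minutes <= 0 or mttr_minutes < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_minutes / (mttf_minutes + mttr_minutes)


def downtime_minutes_per_year(a: float) -> float:
    """Expected unavailable minutes per 365-day year for availability a."""
    return (1.0 - a) * 365 * MINUTES_PER_DAY


if __name__ == "__main__":
    # Point values taken from the quoted results: MTTF 25 days, MTTR 20 minutes.
    a = availability(25 * MINUTES_PER_DAY, 20)
    print(f"availability ~ {a:.5f}")  # availability ~ 0.99944
    print(f"expected downtime ~ {downtime_minutes_per_year(a):.0f} min/year")  # ~ 292 min/year
```

With these inputs the formula gives roughly 99.94% availability, or a little under five hours of expected downtime per year for a single path.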

Notes:
- Inter-domain BGP updates are classified into the following categories:
  Route Repair, Route Fail-Over, Policy Fluctuation, and Pathological
  Routing. Previous work by the authors used a lower-level classification
  (e.g., WWDup, AADiff). The paper does not analyze updates in the
  Policy Fluctuation and Pathological Routing categories, as these
  are outside the scope of the study.
- The study only considers prefixes "present in each ISP's routing
  table for more than an aggregate 60 percent (170 days)" of the
  study period. This removed the 20% of routes that were short-lived,
  so the resulting count of network failures is a lower estimate.
- The authors "applied a fifteen minute filter window to all BGP
  route transitions;" specifically, multiple failures occurring during
  the window are counted as a single failure. This is meant to reduce
  the bias from high-frequency pathological behavior and the effects of
  BGP convergence.
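The two preprocessing steps described in the notes (the 60 percent presence filter and the fifteen-minute filter window) can be sketched in Python as follows. This is a hypothetical illustration, not the authors' code: the input format (per-prefix presence time in seconds and a list of already-extracted `(prefix, timestamp)` failure events), the function names, and the choice to anchor each window at the first failure of a burst are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical parameters mirroring the values quoted in the notes.
PRESENCE_THRESHOLD = 0.60          # fraction of the study period a prefix must be present
FILTER_WINDOW_SECONDS = 15 * 60    # fifteen-minute filter window


def stable_prefixes(presence_seconds: Dict[str, float],
                    study_seconds: float,
                    threshold: float = PRESENCE_THRESHOLD) -> Set[str]:
    """Keep prefixes present for more than `threshold` of the study period in aggregate."""
    return {p for p, secs in presence_seconds.items()
            if secs / study_seconds > threshold}


def count_failures(failures: Iterable[Tuple[str, float]],
                   keep: Set[str],
                   window: float = FILTER_WINDOW_SECONDS) -> Dict[str, int]:
    """Count failures per kept prefix, collapsing any failure that falls within
    `window` seconds of the last counted failure into that one event.
    Assumption: the window is anchored at the first failure of each burst."""
    times: Dict[str, List[float]] = defaultdict(list)
    for prefix, t in failures:
        if prefix in keep:
            times[prefix].append(t)

    counts: Dict[str, int] = {}
    for prefix, ts in times.items():
        ts.sort()
        n = 0
        window_start: Optional[float] = None
        for t in ts:
            # Count a new failure only once the previous window has elapsed.
            if window_start is None or t - window_start >= window:
                n += 1
                window_start = t
        counts[prefix] = n
    return counts


if __name__ == "__main__":
    day = 86400
    # Hypothetical 100-day study using RFC 5737 documentation prefixes.
    presence = {"198.51.100.0/24": 80 * day, "192.0.2.0/24": 10 * day}
    keep = stable_prefixes(presence, 100 * day)
    events = [("198.51.100.0/24", 0), ("198.51.100.0/24", 120),
              ("198.51.100.0/24", 600), ("198.51.100.0/24", 3600),
              ("192.0.2.0/24", 50)]
    print(count_failures(events, keep))  # {'198.51.100.0/24': 2}
```

Anchoring the window at the first counted failure means a route that keeps flapping for hours is still counted roughly once every fifteen minutes; a sliding window that restarts on every event would instead collapse the whole burst into a single failure. The quoted description does not specify which variant the authors used.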