Bibliography Details

N. Feamster, D. G. Andersen, H. Balakrishnan, and M. F. Kaashoek, "Measuring the Effects of Internet Path Faults on Reactive Routing", in ACM SIGMETRICS, Jun 2003.

Measuring the Effects of Internet Path Faults on Reactive Routing
Authors:	N. Feamster, D. G. Andersen, H. Balakrishnan, M. F. Kaashoek
Published:	ACM SIGMETRICS, 2003
URL:	http://nms.lcs.mit.edu/~feamster/papers/failures-sigm2003.ps.gz http://nms.lcs.mit.edu/~feamster/papers/failures-sigm2003.pdf http://portal.acm.org/citation.cfm?id=781027.781043
Entry Date:	2003-05-14
Abstract:	Empirical evidence suggests that reactive routing systems improve resilience to Internet path failures. They detect and route around faulty paths based on measurements of path performance. This paper seeks to understand why and under what circumstances these techniques are effective. To do so, this paper correlates end-to-end active probing experiments, loss-triggered traceroutes of Internet paths, and BGP routing messages. These correlations shed light on three questions about Internet path failures: (1) Where do failures appear? (2) How long do they last? (3) How do they correlate with BGP routing instability? Data collected over 13 months from an Internet testbed of 31 topologically diverse hosts suggests that most path failures last less than fifteen minutes. Failures that appear in the network core correlate better with BGP instability than failures that appear close to end hosts. On average, most failures precede BGP messages by about four minutes, but there is often increased BGP traffic both before and after failures. Our findings suggest that reactive routing is most effective between hosts that have multiple connections to the Internet. The data set also suggests that passive observations of BGP routing messages could be used to predict about 20% of impending failures, allowing re-routing systems to react more quickly to failures.
Datasets:	measurements between Feb 2002 and Mar 2003 on 31 NTP-synchronized nodes in the RON testbed; 390 million active probes between randomly paired hosts (probes in both directions); 18,000 loss-triggered traceroutes between Jun 26, 2002 and Mar 12, 2003; BGP messages at 8 hosts using Zebra; employs alias resolution techniques developed for Rocketfuel
Results:	Quoting from the paper: While a few paths are much more failure-prone than others, failures appear spread out over many different links, not just a few "bad" links. Failures appear more often inside AS's than on links between them. 90% of failures last less than 15 minutes, and 70% of failures last less than 5 minutes. BGP messages coincide with only half of the failures that reactive routing could potentially avoid, suggesting that these were failures that not even a "perfect" BGP could avoid. Reactive routing is potentially more effective at correcting failures for hosts with multiple Internet connections. BGP traffic is a good indicator that a failure has recently occurred or is about to occur. When BGP messages and failures coincide, BGP messages most often follow failures by 4 minutes.