G. Iannaccone, C. Chuah, R. Mortier, S. Bhattacharyya, and C. Diot, "Analysis of link failures in an IP backbone", in ACM SIGCOMM Internet Measurement Workshop, Nov 2002.
|Analysis of link failures in an IP backbone|
|Published:||ACM SIGCOMM Internet Measurement Workshop, 2002|
|Abstract:||Today's IP backbones are provisioned to provide excellent performance in terms of loss, delay and availability. However, performance degradation and service disruption are likely in the case of failure, such as fiber cuts, router crashes, etc. In this paper, we investigate the occurrence of failures in Sprint's IP backbone and their potential impact on emerging services such as Voice-over-IP (VoIP). We first examine the frequency and duration of failure events derived from IS-IS routing updates collected from three different points in the Sprint IP backbone. We observe that link failures occur as part of everyday operation, and the majority of them are short-lived (less than 10 minutes). We also discuss various statistics, such as the distribution of inter-failure time and the distribution of link failure durations, which are essential for constructing a realistic link failure model. Next, we present an analysis of routing and service reconvergence time during a controlled link failure scenario in our backbone. Our results indicate that disruption to packet forwarding after link failures depends not only on routing protocol dynamics, but also on the design of routers' architectures and control planes. Thus, our results offer insights into two basic components for defining network-wide availability, which we consider a more appropriate metric for service-level agreements to support emerging applications.|