<?xml version="1.0" standalone="no"?>
                    <!DOCTYPE div SYSTEM "/www/backend/www-xml-443/dtd/caidaML.dtd">
                    <!-- do NOT ERASE the DOCTYPE declaration! --><div>


<tr bgcolor="#f4f4f4">
  <td>
<font face="helvetica,arial" size="2">
<b>URL:</b>
</font>
</td>
  <td>
<font face="helvetica,arial" size="2">
<a href="http://www.icir.org/vern/imw-2002/imw2002-papers/202.pdf">http://www.icir.org/vern/imw-2002/imw2002-papers/202.pdf</a><br/>
<a href="http://www.icir.org/vern/imw-2002/slides/202-slides.pdf">http://www.icir.org/vern/imw-2002/slides/202-slides.pdf</a><br/>
<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.2392">http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.2392</a>
</font>
  </td>
</tr>


<tr bgcolor="#e9e9e9">
  <td>
<font face="helvetica,arial" size="2">
<b>Entry Date:</b>
</font>
</td>
  <td>
<font face="helvetica,arial" size="2">
2003-05-14


</font>
  </td>
</tr>


<tr bgcolor="#f4f4f4">
  <td>
<font face="helvetica,arial" size="2">
<b>Abstract:</b>
</font>
</td>
  <td>
<font face="helvetica,arial" size="2">
Today's IP backbones are provisioned to provide
excellent performance in terms of loss, delay and availability.
However, performance degradation and service disruption
are likely in the case of failure, such as fiber cuts,
router crashes, etc. In this paper, we investigate the occurrence
of failures in Sprint's IP backbone and their potential
impact on emerging services such as Voice-over-IP
(VoIP). We first examine the frequency and duration of failure
events derived from IS-IS routing updates collected from
three different points in the Sprint IP backbone. We observe
that link failures occur as part of everyday operation,
and the majority of them are short-lived (less than 10 minutes).
We also discuss various statistics such as the distribution
of inter-failure time, distribution of link failure durations,
etc., which are essential for constructing a realistic
link failure model. Next, we present an analysis of routing
and service reconvergence time during a controlled link failure
scenario in our backbone. Our results indicate that disruption
to packet forwarding after link failures depends not
only on routing protocol dynamics, but also on the design
of routers' architectures and control planes. Thus our results
offer insights into two basic components for defining
network-wide availability, which we consider a more appropriate
metric for service-level agreements to support emerging
applications.


</font>
  </td>
</tr>


<tr bgcolor="#e9e9e9">
  <td>
<font face="helvetica,arial" size="2">
<b>Datasets:</b>
</font>
</td>
  <td>
<font face="helvetica,arial" size="2">
<ul>
<li>Discusses only failure events that affect links connecting different
    POPs (Points of Presence).  Intra-POP failures are not covered.</li>
<li>Disregards link failures that are not fixed in 24 hours (under the
    assumption that these represent a permanent removal of links).</li>
<li>for link failures: IS-IS updates collected from December 2001 to April 2002</li>
<li>for IS-IS convergence time: two-way packet probes and traceroutes
    between a host on the U.S. East Coast and a host on the West Coast; two
    backbone links were intentionally brought down</li>
</ul>


</font>
  </td>
</tr>


<tr bgcolor="#f4f4f4">
  <td>
<font face="helvetica,arial" size="2">
<b>Results:</b>
</font>
</td>
  <td>
<font face="helvetica,arial" size="2">
<ul>
<li>only 10% of failures last longer than 20 minutes</li>
<li>50% of failures last less than 1 minute</li>
<li>47% of all failure events occur between 10 PM and 6 AM EST, a time
    period including most planned maintenance (at Sprint)</li>
<li>links differ widely in number of failures and in mean time between
    failures; a small number of links are highly failure prone</li>
<li>using Cisco default values for IS-IS parameters: IS-IS convergence
    time after a failure is less than 18 seconds</li>
<li>tuning IS-IS parameters: IS-IS convergence time can be reduced to
    2-3 seconds</li>
</ul>




</font>
  </td>
</tr>
</div>

