K. CLAFFY,
TRACIE E. MONK & DANIEL McROBB
No aphorism is more frequently repeated...than that we must ask
Nature a few questions, or ideally, one question at a time. The writer is convinced
that this view is wholly mistaken. Nature, he suggests, will best respond to
a logically and artfully thought out questionnaire; indeed if we ask her a single
question, she will often refuse to answer until some other topic has been discussed.
-- Sir Ronald A. Fisher, Perspectives in Biology and Medicine, 1973.
The infrastructure of the Internet can be considered the cyber equivalent
of an ecosystem. At its heart is a mesh of interconnected backbone networks.
This core is rapidly evolving and provides the underpinnings that will be vital
for future national and international communications.
The last-mile connections from the Internet to homes and businesses
are supplied by thousands of small and medium-sized Internet Service Providers
(ISPs), which are in turn interconnected by 'arteries' maintained by transit
(backbone) providers. The global infrastructure of the Internet thus consists of
a complex array of telecommunications carriers and providers, an infrastructure
that is very difficult to analyse diagnostically except within the borders of any
individual network. Nonetheless, it is critical for the evolution of the Internet
that insights into its overall health and scalability are obtained [1-3].
New connections among core Internet backbones are made hourly, ranging in
capacity from T1 copper circuits (1.544 megabits per second) to OC48 fibre-optic
pipes (2.488 gigabits per second). This physical structure supports a myriad
of new technologies and products, including live (or 'streaming') audio and
video, distance education, entertainment, telephony and video-conferencing,
as well as numerous new and often still evolving communications protocols. With
no central authority to regulate or check quality, and no feedback
structure to throttle unfriendly practices or products, the Internet continues
its unbounded growth.
Attempts to adequately track and monitor the Internet's evolution were greatly
diminished in early 1995 when the National Science Foundation (NSF) relinquished
its stewardship role over the Internet [4]. The resulting
transition into a competitive industry for Internet services left no framework
for the cross-ISP communications which are needed for engineering or debugging
of network performance problems and security incidents. Nor did competitive
providers, all operating at fairly low profit margins, consider it to be in
their best interests to build such a framework at that time.
As a result, today's Internet industry lacks any ability to evaluate trends,
identify performance problems beyond the boundary of a single ISP, or prepare
systemically for the growing expectations of its users. Maps depicting the structure
and topology of this amorphous global entity are non-existent.
Mapping the Internet ecosystem
To gain insights into Internet traffic and workloads, the Cooperative Association
for Internet Data Analysis (CAIDA) is developing and deploying tools to collect,
analyse and visualize data on connectivity and performance across a large proportion
of the Internet. These tools and analyses will provide windows on the infrastructure
for network operators, designers and researchers, using a process somewhat analogous
to medical CAT scanning, which we thus refer to as computerized tomography.
Our principal tomography scanning tool, skitter, dynamically discovers
and depicts global Internet topology and measures the performance of specific
paths through the Internet. In essence, skitter sends out packets of data
from a source to many destinations through the Internet. The information gained
about the paths that these packets take can then be used in four ways:
- to acquire infrastructure-wide (global) connectivity information (what's
connected to what?). We are currently using skitter to gather data
for more than 29,000 destination hosts from six source monitors spread throughout
the United States, with additional monitors planned for the U.S., Europe,
and Asia this year.
- to collect round trip time (RTT) and path data (how does a packet get from
A to B, and how long does it take?). skitter measures the Internet path
to a destination by sending a sequence of ICMP data packets to the destination
host, setting a successively larger 'time to live' (TTL) value in each packet,
similar to the traceroute utility* (a minimal sketch of this technique
appears after this list). Each intermediate hop along the path between source
and destination decrements the TTL value of any packet passing through it,
discards any packet whose TTL reaches zero, and notifies the source host of
that packet's fate. Eventually we will collect such data from up to 60,000
destinations.
- to analyze the frequency and pattern of routing changes (when and how often
are alternative paths used for the same journey?). Low-frequency persistent
changes are detected by the analysis of variations in RTT measurements for
specific paths.
- to visualize network-wide connectivity (what does the Internet look like?).
This is the primary goal of skitter. Probing paths from multiple sources
to a large set of destinations throughout the current IPv4 address space allows
us to produce both topological and geographical representations of a significant
fraction of Internet connectivity.
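For readers who want to experiment with the TTL technique described in the second
item above, the following minimal Python sketch traces a path in the style of the
original traceroute. Note the assumptions: it uses UDP probes (as classic
traceroute does) rather than skitter's ICMP probes, the target host name is merely
an example, and the raw receive socket requires root privileges.

    import socket
    import time

    def traceroute(dest_name, max_hops=30, port=33434, timeout=2.0):
        """Trace the forward path to dest_name by sending UDP probes with
        increasing TTL values. Each router that decrements a probe's TTL
        to zero discards it and returns an ICMP 'time exceeded' message,
        revealing its address; the destination itself answers with ICMP
        'port unreachable'. Requires root for the raw receive socket."""
        dest_addr = socket.gethostbyname(dest_name)
        for ttl in range(1, max_hops + 1):
            recv_sock = socket.socket(socket.AF_INET, socket.SOCK_RAW,
                                      socket.getprotobyname('icmp'))
            send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            send_sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
            recv_sock.settimeout(timeout)
            start = time.monotonic()
            send_sock.sendto(b'', (dest_addr, port))
            try:
                _, addr = recv_sock.recvfrom(512)  # ICMP reply from some hop
                rtt_ms = (time.monotonic() - start) * 1000.0
                print(f'{ttl:2d}  {addr[0]:15s}  {rtt_ms:6.1f} ms')
                if addr[0] == dest_addr:           # reached the destination
                    break
            except socket.timeout:
                print(f'{ttl:2d}  *')              # hop did not answer in time
            finally:
                send_sock.close()
                recv_sock.close()

    if __name__ == '__main__':
        traceroute('www.caida.org')                # example target only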
It is essential that skitter imposes only a minimal load on the
infrastructure (that is, the hosts and routers along the way) as it takes its
measurements. skitter packets are thus very small, 52 bytes in length, and
each destination host is typically probed only at approximately hourly intervals.
But how do we anticipate that these skitter data sets will be used?
Network connectivity
By analysing data from tens of thousands of path measurements, we can identify
particularly critical paths. Visualization of these data also highlights
the pivotal roles that specific backbones, traffic exchange points, and even
individual routers (the devices that direct and carry packets through segments
of the network) play in transmitting Internet traffic. A preliminary two-dimensional
visualization of skitter data, depicting a macroscopic snapshot of Internet
connectivity, is shown here (Fig. 1), with selected backbone ISPs
coloured separately. The graph reflects 23,000 end destinations, reached through many
more intermediate routers. While visually interesting, the volume of data represented
in the figure greatly reduces its utility to operators and users alike.
However, by grouping links according to the backbone ISPs to which they belong,
we highlight relationships among various backbone networks. Backbone providers
are identified in the global routing system by one or a few specific Autonomous
System Numbers (AS numbers). Fig. 2 shows the interconnections
among a few ASes on December 3, 1998, as seen from a source monitor in San Diego,
California. For the purposes of this article, an Autonomous System (AS) can be thought
of as a single Internet service provider.
Figure 2: Interconnection relationships among key Autonomous System networks on
December 3, 1998, as viewed from a skitter host.
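The link-grouping step described above can be sketched in a few lines. The sketch
below is illustrative rather than CAIDA's actual implementation: it assumes
router-level paths are available as lists of IP addresses, along with a hypothetical
ip_to_as lookup (in practice derived from BGP routing tables), and collapses them
into AS-level links.

    from collections import Counter

    def as_links(paths, ip_to_as):
        """Collapse router-level paths (lists of IP addresses) into AS-level
        links, counting how often each AS adjacency appears across paths.
        ip_to_as maps an IP address to its Autonomous System number."""
        links = Counter()
        for path in paths:
            as_path = []
            for ip in path:
                asn = ip_to_as.get(ip)            # None for unmapped hops
                if asn is not None and (not as_path or as_path[-1] != asn):
                    as_path.append(asn)           # drop consecutive repeats
            for a, b in zip(as_path, as_path[1:]):
                links[(a, b)] += 1
        return links

    # Toy example: two router-level paths sharing one AS-level link.
    paths = [['10.0.0.1', '10.0.1.1', '10.1.0.1'],
             ['10.0.0.1', '10.1.0.2', '10.1.0.1']]
    ip_to_as = {'10.0.0.1': 1740, '10.0.1.1': 1740,
                '10.1.0.1': 3561, '10.1.0.2': 3561}
    print(as_links(paths, ip_to_as))   # Counter({(1740, 3561): 2})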
Note that the choice of source monitor will skew the relative prevalence of a
given AS in the graph. For example, if a source is a customer of a given ISP,
that ISP will appear in most paths from that source, as reflected in CERFnet's
strong representation in the data from the San Diego source. The same holds for
ASes topologically 'nearby' this primary service provider.
Even such aggregated, two-dimensional representations of these data prove limited.
We are currently exploring possible methods for three-dimensional visualization
of topology and RTT performance across segments of the infrastructure, such
as Munzner's hyperbolic space techniques [5].
The critical role that certain ASes play in forwarding traffic across the
Internet is illustrated in Table 1. Using samples from monitors covering 20,588
end destinations, we determined the frequency with which an individual AS
appeared in a path and the relative depth of those appearances, both in terms
of the number of ASes and the number of hops from the source monitor. In this
example, CERFnet/AT&T, Cable & Wireless (which purchased InternetMCI's backbone
in 1998), Sprint, and UUNET (part of MCI WorldCom) play major roles in
transporting packets across the infrastructure.
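A tally of the kind underlying Table 1 can be sketched as follows. The input
format, one AS-level path per measured destination, and all names here are
assumptions for illustration; depth is counted in AS hops from the source monitor.

    from collections import defaultdict
    from statistics import mean

    def as_prevalence(as_paths):
        """For each AS, count the measured paths in which it appears and
        the mean depth (in AS hops from the source monitor) at which it
        first appears in those paths."""
        count = defaultdict(int)
        depths = defaultdict(list)
        for path in as_paths:
            seen = set()
            for depth, asn in enumerate(path, start=1):
                if asn not in seen:
                    seen.add(asn)
                    count[asn] += 1
                    depths[asn].append(depth)
        # Most frequently traversed ASes first.
        return sorted(((asn, count[asn], mean(depths[asn])) for asn in count),
                      key=lambda row: -row[1])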
Routing
The robustness and reliability of the Internet are highly dependent on efficient,
stable connections and routing among provider networks. An analysis of real
world trends in routing behaviour across the Internet will have direct implications
for the next generation of networking hardware, software and operational policies.
Observations of macroscopic traffic patterns provide insights into:
- effects of outages on surrounding ISPs.
- effects of topology changes on Internet performance.
- unintended consequences of new routing policies.
- potential areas for improving an individual network's ability to respond
to congestion and topology changes.
CAIDA can currently analyse skitter path data over time. This allows
the comparison of actual routing behaviour with routing policies (border gateway
protocol (BGP) route announcements), and the identification of optimal routes
given performance results.
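As a sketch of this over-time analysis, the snippet below scans a time-ordered
series of measured paths to a single destination and reports each point at which
the forward path changed; rapid alternation between two paths in its output would
correspond to the route 'flapping' discussed below. The input convention is an
assumption for illustration.

    def route_changes(observations):
        """observations: time-ordered (timestamp, path) pairs for one
        destination, where path is a tuple of hop addresses. Returns the
        points at which the measured forward path differed from the
        previous measurement."""
        changes = []
        prev = None
        for ts, path in observations:
            if prev is not None and path != prev:
                changes.append((ts, prev, path))
            prev = path
        return changes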
Performance (network and hardware)
To improve the accuracy of its round-trip time measurements, CAIDA added a
kernel module to the operating system (FreeBSD, a Unix variant) on which skitter
runs. This module time-stamps the data collected by skitter. These timestamps
may be insufficient for one-way measurements or for detailed correlation across
skitter platforms, but they do provide an indication of variations in
performance across the Internet infrastructure. By comparing data from various
sources, analysts can identify points of congestion and performance degradation,
or areas for improvement in the global infrastructure.
To aid performance testing of Internet hardware and the evaluation of performance
across specific paths, a skitter module called skping can be used
at high resolution. Initial measurements of operational routers (Fig. 3)
have identified statistically significant problems on certain routers
that use network route cache technology: a number of RTTs exceed
250 ms. Data from routers running more recent (non-caching) software (for example,
Fig. 4) do not reflect these performance problems.
Figure 3: skping can be used to identify anomalies associated with specific
Internet hardware, such as these periodic performance problems associated with
prefix caching on routers.
skping can also evaluate the performance of specific paths and the frequency
of route changes over time.
Time-series analyses of the number of packets that are lost in transit (shown
as a cumulative percentage of approximately 10% in Fig. 5)
highlight instabilities caused by sustained routing queues. The last
1,600 points of this dataset, shown in a histogram (Fig. 6),
depict a relatively symmetrical distribution of RTTs, indicating that the cause
of the losses is likely to be congestion or global synchronization at a specific
link.
Figure 5: Time-series analyses of traffic between a skitter host and
www.freebsd.org suggest congestion or global synchronization at a specific link.
Figure 6: A histogram of the skping data reveals a median round trip time of
roughly 275 ms for the last 1,600 data points.
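Statistics such as those quoted above (a cumulative loss near 10%, a median RTT
of roughly 275 ms) can be recovered from raw probe records along the following
lines. The input convention, with None marking a probe that never returned, is
an assumption for illustration.

    from statistics import median, quantiles

    def summarize_probes(samples):
        """Summarize a time series of RTT samples in ms, where None marks
        a lost probe: loss rate, plus the median and spread of the RTTs
        that did come back."""
        lost = sum(1 for s in samples if s is None)
        rtts = [s for s in samples if s is not None]
        q1, _, q3 = quantiles(rtts, n=4)   # quartiles of the distribution
        med = median(rtts)
        return {
            'loss_pct': 100.0 * lost / len(samples),
            'median_ms': med,
            # Upper and lower spread roughly equal (a symmetric
            # distribution) is the signature discussed above for
            # sustained queueing at a single link.
            'upper_spread_ms': q3 - med,
            'lower_spread_ms': med - q1,
        }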
Running another skitter module, sktrace, along this path provides
evidence of congestion between the eleventh and twelfth hops. The minimum RTT
values for these hops are similar, but the median for hop 12 is much higher than
that for hop 11 (Fig. 7). The distribution at hop 12 is
also wider and more symmetrical than at hop 11 (Fig. 8), and
correlates strongly with the distribution for the final destination.
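The hop 11 versus hop 12 inference generalizes to a simple per-hop test: queueing
delay raises the median RTT at a hop without raising its minimum. The sketch
below applies that test to consecutive hops; the input layout and the 50-ms
threshold are illustrative choices, not part of sktrace.

    from statistics import median

    def congested_links(per_hop_rtts, jump_ms=50.0):
        """per_hop_rtts: {hop_number: [RTT samples in ms]}. Flag links
        where the median RTT grows much faster than the minimum RTT
        between consecutive hops, the signature of queueing delay."""
        flagged = []
        hops = sorted(per_hop_rtts)
        for prev, cur in zip(hops, hops[1:]):
            d_min = min(per_hop_rtts[cur]) - min(per_hop_rtts[prev])
            d_med = median(per_hop_rtts[cur]) - median(per_hop_rtts[prev])
            if d_med - d_min > jump_ms:
                flagged.append((prev, cur, round(d_med - d_min, 1)))
        return flagged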
This kind of analysis is only one of a number of possible uses to which cross-network
data such as those produced by skitter can be put. For example, we have also
investigated the effect of load balancing across multiple paths (Fig. 9),
and identified examples of route instability, known as 'flapping' (Fig. 10).
Examination of data from these and other sites over time shows that, instead
of fairly consistent performance, a significant fraction of the data sent over
the Internet takes a very long time to reach its destination, if it ever arrives
at all. This characteristic produces a tendency towards heavy-tailed distributions
of round-trip times on the global Internet.
Figure 9: skping results showing minimal performance variation among paths,
suggesting the possible presence of load balancing among routers.
Figure 10: Route instability, as shown by this graph, led to variations in round
trip performance ranging from 98 to 163 ms between the same two hosts.
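One crude way to quantify that heavy-tail observation is to measure how much
probability mass sits far beyond the median round-trip time; for a light-tailed
distribution this fraction is negligible. The threshold factor below is arbitrary
and purely illustrative.

    from statistics import median

    def tail_mass(rtts, factor=3.0):
        """Fraction of RTT samples exceeding factor x the median RTT.
        For a light-tailed (e.g. normal) distribution this is
        negligible; heavy-tailed wide-area RTTs leave a visible
        fraction beyond the threshold."""
        threshold = factor * median(rtts)
        return sum(1 for r in rtts if r > threshold) / len(rtts)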
Continual vigilance
Without adequate monitoring and analysis of the behaviour of traffic across
the Internet, the ability of this promising infrastructure to continue growing
and achieving its full potential may be compromised. Efforts to deploy tomography
tools by CAIDA and others are only preliminary steps toward maintaining and
evolving the capability to watch over the Internet as it moves into its second
decade.
References
1. Claffy, K., Miller, G. & Thompson, K. The nature of the beast: recent traffic
measurements from an Internet backbone. in Proceedings of INET'98 (ISOC,
Washington DC, 1998).
2. Claffy, K. & Monk, T. What's next for internet data analysis? IEEE Special
Issue on Communications in the 21st Century 85, 1563-1571 (1997).
3. Monk, T. & Claffy, K. Cooperation in internet data acquisition and analysis.
in Coordinating the Internet (Kahin, B. & Keller, J., eds) 438-465 (MIT Press,
Cambridge, Mass., 1997).
4. Braun, H.-W. & Claffy, K. Post-NSFNET statistics collection. in White Papers
for the Unpredictable Certainty: Information Infrastructure through 2000, 85-96
(Computer Science and Telecommunications Board, National Research Council,
National Academy of Sciences, Washington DC, 1997).
5. Munzner, T. Exploring large graphs in 3D hyperbolic space. IEEE Computer
Graphics and Applications 18, 18-23 (1998).
Acknowledgements.
Many thanks to Bill Cheswick and Hal Burch (Lucent/Bell
Laboratories) for providing the graph layout code for Figure 1.
K. Claffy, Tracie E. Monk
and Daniel McRobb
are members of CAIDA, a collaborative organization supporting cooperative efforts
among the commercial, government and research communities aimed at promoting a
scalable, robust Internet infrastructure. CAIDA is based at the University of
California's San Diego Supercomputer Center (SDSC). Support for these efforts
is provided by CAIDA members and by the Defense Advanced Research Projects Agency
(DARPA), through its Next Generation Internet program, and by the National Science
Foundation (NSF). More information is available at https://www.caida.org.
*Other connectivity assessment tools, such as the original traceroute
from Van Jacobson, also use this technique for determining Internet paths,
though skitter uses ICMP rather than the UDP packets of traceroute
(that is, packets using a different protocol). For more details, see the design
documents on skitter's home page: https://www.caida.org/catalog/software/skitter/