CAIDA: Visualizing the Internet
kc claffy, kc@caida.org
Internet usage is increasing as access to the Net grows critical for engineering, research, and all sorts of collaborative activities. It is difficult to imagine how the dynamically changing topological infrastructure of the Internet looks at any particular moment. As yet, we understand little about the impact that changes in traffic, topology, protocols, and business practices have on this new virtual frontier.
The Cooperative Association for Internet Data Analysis (CAIDA) is an independent research group based at the San Diego Supercomputer Center at the University of California, San Diego (UCSD/SDSC). CAIDA's mission is to provide a neutral framework for promoting greater cooperation in developing and deploying Internet measurement, analysis, and visualization tools that will support engineering and maintaining a robust, scalable global Internet infrastructure. To that end, CAIDA both leads and participates in community efforts to establish sound Internet traffic metrics, and it develops prototype tools and publicizes others' tools that measure them effectively. Our activities are supported by agencies such as the U.S. National Science Foundation (NSF) and the Defense Advanced Research Projects Agency (DARPA), as well as by Internet service providers (ISPs) and equipment vendors participating as CAIDA members.
CAIDA's current activities include collecting, archiving, analyzing, and creating visualizations of massive data sets for Internet topology, workload, performance, and routing. We pursue insights into both normal and anomalous Internet behaviors. In particular, we recognize the need to correlate data from among these categories, and across many geographically and topologically diverse sources. CAIDA researchers are investigating techniques for bandwidth estimation, workload characterization, and long-term trend identification. Such information and analysis methods are useful for traffic engineering, capacity planning, and security breach detection.
Beyond the Myths
Studying real data at a macroscopic level yields compelling insights
for refuting myths about Internet traffic, performance, and growth. Proposals
for new protocols or techniques often make assumptions about traffic characteristics
that are simply not empirically validated. Internet data myths abound, for
example, regarding traffic fragmentation, encryption, favoritism, and
path symmetry. Architectural designs are often justified on similarly
unverified assumptions about distributions of packet, flow, and address
prefix lengths. Furthermore, policy questions are often argued based on
uncorroborated views on address space utilization and consumption, not
to mention routing protocol behavior and performance.
Even when researchers base their hypotheses on data obtained on local
campuses, their results often do not hold up against more complete
or representative data sets. Web caching, content distribution networking,
multicast, and even traffic prioritization on a best-effort architecture
are examples of technologies whose system performance in the field is
prohibitively difficult to evaluate. ISPs have little way to systematically
assess the extent to which such technologies benefit either user-perceived
service quality or their financial bottom line.
CAIDA provides information about (and access to) network measurement tools,
and makes available the results of our research and analysis. Through
our activities, we seek to promote a better understanding of Internet
traffic behavior in order to influence and catalyze the development of
better protocols and services.
CAIDA Analysis
Through the CAIDA Web site, researchers and developers can find analysis projects organized by relevance to
- topology, workload characterization, performance measurement, routing, or multicast (https://www.caida.org/analysis/);
- bandwidth estimation (https://www.caida.org/bandwidth/);
- reverse traceroute and looking glass servers (https://www.caida.org/research/routing/reversetrace/); and
- measurement infrastructures (https://www.caida.org/measinfra/).
Internet Atlas Project
In 1999, CAIDA started the three-year Internet Atlas project, an effort to explore and evaluate analysis methods and techniques used to create a variety of Internet visualizations. Under this project, CAIDA is developing methodologies, software, and protocols for mapping the Internet, focusing on Internet topology, performance, workload, and routing data. The project also includes an assessment of the state of the art in this nascent field. See https://www.caida.org/projects/internetatlas/ for an early showcase of example visualizations with assessments of their strengths and weaknesses.
CAIDA Tools
The Tools page includes links to CAIDA's
measurement tools and software as well as a taxonomy of research and visualization
tools available elsewhere.
One of the CAIDA visualization tools is Walrus, which uses non-Euclidean
hyperbolic space to provide a focus+context view of large (million-node)
directed graphs. The Walrus data view resembles a continuous fish-eye
distortion in three dimensions, allowing the user to examine the fine
details of a small area while maintaining a view of the whole graph as
a frame of reference. The user examines the whole graph by interactively
moving the focus. (For some impressive visualizations done with Walrus,
see https://www.caida.org/catalog/software/walrus/.)
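To illustrate the focus+context idea, the short Python sketch below applies a simple two-dimensional fish-eye-style compression around a movable focus point. It is only an analogy for what Walrus does in three-dimensional hyperbolic space, not Walrus code; the function name and layout data are hypothetical.

    import math

    def fisheye(nodes, focus, scale=1.0):
        """Radially compress node positions around a movable focus point.

        Points near the focus keep roughly their true spacing, while distant
        points are squeezed toward a bounding circle of radius `scale`, so
        detail near the focus is magnified without losing the whole graph.
        """
        fx, fy = focus
        projected = {}
        for name, (x, y) in nodes.items():
            dx, dy = x - fx, y - fy
            r = math.hypot(dx, dy)
            if r == 0.0:
                projected[name] = (fx, fy)
                continue
            # tanh grows almost linearly near zero and saturates at `scale`.
            r_new = scale * math.tanh(r / scale)
            projected[name] = (fx + dx * r_new / r, fy + dy * r_new / r)
        return projected

    # Hypothetical three-node layout, viewed with the focus on node "a".
    layout = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (10.0, 0.0)}
    print(fisheye(layout, focus=layout["a"]))

Re-running the projection as the user drags the focus is what gives the continuous fish-eye effect described above.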
Another set of tools, the skitter family, is used to actively probe the
Internet to analyze topology and performance. Skitter measures forward
IP paths and round-trip time. It can also be used to track persistent
routing changes and visualize network connectivity. Some of the early
Walrus visualizations were generated from skitter topology graph data
sets.
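For a sense of how this kind of active probing works, the sketch below records the forward path and round-trip times to a destination by sending UDP datagrams with increasing TTL values and reading the ICMP responses from a raw socket (which requires administrative privileges). It is a simplified traceroute-style illustration, not skitter itself; skitter uses ICMP echo probes and its own storage format, and the function name and defaults here are hypothetical.

    import socket
    import time

    def probe_path(dest, max_ttl=30, port=33434, timeout=2.0):
        """Record (hop count, responding IP, RTT in ms) along the forward path."""
        dest_ip = socket.gethostbyname(dest)
        hops = []
        for ttl in range(1, max_ttl + 1):
            # Raw ICMP socket to catch Time Exceeded / Port Unreachable replies.
            recv = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
            recv.settimeout(timeout)
            send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            send.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
            start = time.time()
            send.sendto(b"", (dest_ip, port))
            try:
                # A real tool would verify the quoted probe inside the ICMP
                # payload; this sketch accepts the first ICMP packet to arrive.
                _, (addr, _) = recv.recvfrom(512)
                rtt_ms = (time.time() - start) * 1000.0
                hops.append((ttl, addr, rtt_ms))
                if addr == dest_ip:
                    break
            except socket.timeout:
                hops.append((ttl, None, None))  # no answer within the timeout
            finally:
                send.close()
                recv.close()
        return hops

    if __name__ == "__main__":
        for hop in probe_path("example.net"):  # hypothetical destination
            print(hop)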
A dozen other CAIDA tools are worth investigating, including
- cflowd and FlowScan, for processing router NetFlow data;
- CoralReef, for analyzing TCP/IP flows on optical network media (OC3-OC48);
- NeTraMet, an open-source (GPL) implementation of the RTFM architecture for network traffic flow measurement, developed and supported by Nevil Brownlee at the University of Auckland;
- arts++, a binary file format specification for storing network data;
- NetGeo, for mapping IP addresses, domain names, and AS numbers to geographical locations;
- RRDtool, for storing and displaying time-series data such as network bandwidth or server load averages (see the sketch after this list);
- GeoPlot, for creating geographical images of data sets;
- GTrace, a graphical front-end to traceroute that geographically depicts IP path information between source and destination hosts;
- Mapnet, for macroscopic Internet visualization and measurement; and
- Otter, for visualizing arbitrary network data that can be expressed as a set of nodes, links, or paths.
See https://catalog.caida.org/software for details
on these and other measurement tools.
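As a small, concrete illustration of one of these ideas, the Python sketch below mimics the round-robin storage at the heart of RRDtool: raw samples are consolidated (here, averaged) into a fixed number of slots whose oldest entries are eventually overwritten, so storage never grows no matter how long a measurement runs. It is a conceptual sketch only, not RRDtool's file format or API, and the class name and sample values are made up.

    from collections import deque

    class RoundRobinArchive:
        """Fixed-size time-series archive in the spirit of RRDtool."""

        def __init__(self, slots=8, samples_per_slot=4):
            self.slots = deque(maxlen=slots)  # consolidated values; oldest dropped first
            self.samples_per_slot = samples_per_slot
            self.pending = []                 # raw samples awaiting consolidation

        def update(self, value):
            self.pending.append(value)
            if len(self.pending) == self.samples_per_slot:
                # Consolidate by averaging, as an RRA:AVERAGE archive would.
                self.slots.append(sum(self.pending) / len(self.pending))
                self.pending.clear()

    # Feed in hypothetical interface byte counts sampled at a fixed interval.
    rra = RoundRobinArchive()
    for octets in [1200, 1800, 900, 1500, 2100, 2400, 1700, 1600]:
        rra.update(octets)
    print(list(rra.slots))  # two consolidated averages: [1350.0, 1950.0]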
Real-World Data
One of CAIDA's skitter project goals is
to develop techniques for illustrating relationships and depicting critical
components of the Internet infrastructure. Figure
1, for example, shows the results of a recent visualization
of global Internet topology at a macroscopic level. Our researchers used
skitter to discover topology paths by sending small ICMP (Internet
Control Message Protocol) probe packets across a wide spectrum of the Internet
address space. We then collected and analyzed the path and performance
information received in response to the probes. For this visualization,
we converted the IP path information to an ISP-level granularity by mapping
each responding IP address to the autonomous system (AS) responsible for
routing it. The result is a topology of interconnections among autonomous
systems, each of which approximately maps to an Internet Service Provider
(ISP).
The image in Figure 1 represents
a snapshot of global Internet connectivity during two weeks of October
2000. The graph reflects 685,045 IP addresses and 1,453,349 IP arcs of
skitter data from 16 monitors probing approximately 400,000 destinations
spread across more than 50 percent of globally routable Internet address
prefixes. Each responding IP address was mapped to its originating AS
using the University of Oregon's RouteViews collection of Border Gateway
Protocol (BGP) routing tables. The abstracted graph consists of 7,839
AS nodes and 34,434 observed peering sessions. We could not determine
geographical location for a few (79) AS nodes, so this plot includes
7,760 autonomous systems and 34,293 peering sessions.
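Conceptually, this mapping is a longest-prefix match of each responding address against a table of routed prefixes and their origin ASes derived from the BGP data. The sketch below shows the idea using Python's standard ipaddress module and a toy table; the prefixes and AS numbers are made up, and a production tool would use a radix trie rather than the linear scan shown here.

    import ipaddress

    def origin_as(ip, rib):
        """Return the origin ASN for `ip` by longest-prefix match against `rib`,
        a mapping of announced prefix -> origin AS number."""
        addr = ipaddress.ip_address(ip)
        best = None
        for prefix, asn in rib.items():
            net = ipaddress.ip_network(prefix)
            if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, asn)
        return best[1] if best else None

    # Toy table standing in for a snapshot of BGP routing tables.
    rib = {"192.0.2.0/24": 64500, "198.51.100.0/22": 64501, "198.51.100.0/24": 64502}
    print(origin_as("198.51.100.7", rib))  # prints 64502 (the /24 wins over the /22)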
Plotting in polar coordinates--where the angular position of each AS corresponds
to the geographical longitude of its administrative headquarters--allows
visualization of AS location by continent and country. Text and colored
bands at the perimeter of the circle label longitude (in degrees) as well
as the locations of continents and large cities. Each AS node's radial
position reflects its outdegree: the number of next-hop autonomous systems
to which it sent traffic. Outdegree is one measure of the richness of
an autonomous system's connectivity. In this graph, AS nodes with the
highest outdegree are yellow or orange, while those with lower outdegree
are blue.
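The layout itself is easy to reproduce in outline. The sketch below plots a handful of hypothetical AS records in polar coordinates with matplotlib, placing each AS at the longitude of its headquarters and using one plausible radial mapping in which richly connected ASes land near the center; the actual formula and color scheme behind Figure 1 may differ.

    import math
    import matplotlib.pyplot as plt

    # Hypothetical records: (AS name, headquarters longitude in degrees, outdegree).
    ases = [("AS-A", -122.0, 900), ("AS-B", -74.0, 350),
            ("AS-C", 0.1, 40), ("AS-D", 139.7, 12)]
    max_deg = max(deg for _, _, deg in ases)

    theta = [math.radians(lon) for _, lon, _ in ases]
    # Radial position shrinks with outdegree, so well-connected ASes sit near the core.
    radius = [1.0 - math.log(deg + 1) / math.log(max_deg + 1) for _, _, deg in ases]

    ax = plt.subplot(projection="polar")
    ax.scatter(theta, radius)
    for (name, _, _), t, r in zip(ases, theta, radius):
        ax.annotate(name, (t, r))
    plt.show()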
Graphing peering richness together with geographic location reveals
the highly "core-centric" nature of North American AS nodes. All except
one of the top 15 are based in the United States, and the one exception
is based in Canada. While ISPs in Europe and Asia have many peering relationships
with ISPs in the United States, there are few direct links between ISPs
in Asia and Europe. There are both technical (cabling and router placement
and management, for example) and policy (business and cost models, geo-political
considerations, and so on) factors that contribute to the peering arrangements
represented in this graph.
Conclusion
Several researchers are currently collecting and analyzing
BGP data (tables, updates, and so on) and large-scale IP topology data;
indeed, this has become a vital component of Internet routing research,
although the data collection architectures do have limitations, mostly
in terms of scope and functionality. Operational issues make acquiring
complete data sets of other types (such as traffic matrices and performance)
much more difficult. We can only draw imprecise inferences from the data
collected thus far with available traffic analysis tools and techniques.
Methods for establishing more complete data sets in these areas would
improve the integrity of work in the field. Bandwidth estimation techniques
present other challenges for Internet measurement, but to the extent
viable through the skitter infrastructure, CAIDA hopes to offer a
platform for calibrating them.
It remains unclear how accurately typical core routing tables reflect
the actual paths traffic will take through the infrastructure. Yet, the
community must pursue research in areas such as correlating and analyzing
massive routing, topology, and performance data sets to provide timely
insight into both normal and anomalous Internet behavior. Identifying
and tracking critical or vulnerable pieces of the infrastructure will
also be an important area of exploration, especially as needs change for
MPLS/traffic engineering (TE) infrastructure and for the emerging optical-switched
core.
In the end, all Internet users will benefit from improved methods for monitoring
(and thus, for managing) large subsets of the Internet infrastructure. Research
and development in this area will result in a more stable Internet and a reduction
in user frustration caused by unexpected network congestion. A better understanding
of routing stability will reduce the effect of sudden routing changes, and yield
better performance for emerging applications such as multimedia conferencing,
streaming media, collaborative visualization and authoring tools, highly interactive
gaming, and VoIP. A better understanding of topology will allow improvements
in Internet infrastructure, again resulting in better performance for developers
and end users.
kc claffy is principal investigator for CAIDA and a resident research scientist at the San Diego Supercomputer Center, University of California, San Diego. Her research interests include Internet workload and performance data collection, analysis, and visualization, particularly with respect to collaboration and cooperation with commercial ISPs and the sharing of resources for analysis. kc received a PhD in computer science from UCSD in 1994.