CAIDA: Visualizing the Internet
kc claffy, kc@caida.org
Internet usage is increasing as access to the Net grows critical for engineering, research, and all sorts of collaborative activities. It is difficult to imagine how the dynamically changing topological infrastructure of the Internet looks at any particular moment. As yet, we understand little about the impact that changes in traffic, topology, protocols, and business practices have on this new virtual frontier.
The Cooperative Association for Internet Data Analysis (CAIDA) is an independent research group based at the San Diego Supercomputer Center at the University of California, San Diego (UCSD/SDSC). CAIDA's mission is to provide a neutral framework for promoting greater cooperation in developing and deploying Internet measurement, analysis, and visualization tools that will support engineering and maintaining a robust, scalable global Internet infrastructure. To that end, CAIDA both leads and participates in community efforts to establish sound Internet traffic metrics, and it develops prototype tools and publicizes others' tools that measure them effectively. Our activities are supported by agencies such as the U.S. National Science Foundation (NSF) and the Defense Advanced Research Projects Agency (DARPA), as well as by Internet service providers (ISPs) and equipment vendors participating as CAIDA members.
CAIDA's current activities include collecting, archiving, analyzing, and creating visualizations of massive data sets for Internet topology, workload, performance, and routing. We pursue insights into both normal and anomalous Internet behaviors. In particular, we recognize the need to correlate data from among these categories, and across many geographically and topologically diverse sources. CAIDA researchers are investigating techniques for bandwidth estimation, workload characterization, and long-term trend identification. Such information and analysis methods are useful for traffic engineering, capacity planning, and security breach detection.
Beyond the Myths
Studying real data at a macroscopic level yields compelling insights
for refuting myths about Internet traffic, performance, and growth. Proposals
for new protocols or techniques often make assumptions about traffic characteristics
that are simply not empirically validated. Internet data myths abound, for
example, regarding traffic fragmentation, encryption, favoritism, and
path symmetry. Architectural designs are often justified on similarly
unverified assumptions about distributions of packet, flow, and address
prefix lengths. Furthermore, policy questions are often argued based on
uncorroborated views on address space utilization and consumption, not
to mention routing protocol behavior and performance.
Even when researchers base their hypotheses on data obtained on local
campuses, their results often do not hold up against more complete
or representative data sets. Web caching, content distribution networking,
multicast, and even traffic prioritization on a best-effort architecture
are examples of technologies whose system performance in the field is
prohibitively difficult to evaluate. ISPs have little way to systematically
assess the extent to which such technologies benefit either user-perceived
service quality or their financial bottom line.
CAIDA provides information about (and access to) network measurement tools,
and makes available the results of our research and analysis. Through
our activities, we seek to promote a better understanding of Internet
traffic behavior in order to influence and catalyze the development of
better protocols and services.
CAIDA Analysis
Through the CAIDA Web site, researchers and developers can find analysis projects organized by relevance to
- topology, workload characterization, performance measurement, routing, or multicast (https://www.caida.org/analysis/);
- bandwidth estimation (https://www.caida.org/bandwidth/);
- reverse traceroute and looking glass servers (https://www.caida.org/research/routing/reversetrace/); and
- measurement infrastructures (https://www.caida.org/measinfra/).
Internet Atlas Project
In 1999, CAIDA started the three-year Internet Atlas project, an effort to explore and evaluate analysis methods and techniques used to create a variety of Internet visualizations. Under this project, CAIDA is developing methodologies, software, and protocols for mapping the Internet, focusing on Internet topology, performance, workload, and routing data. The project also includes an assessment of the state of the art in this nascent field. See https://www.caida.org/projects/internetatlas/ for an early showcase of example visualizations with assessments of their strengths and weaknesses.
CAIDA Tools
The Tools page includes links to CAIDA's
measurement tools and software as well as a taxonomy of research and visualization
tools available elsewhere.
One of the CAIDA visualization tools is Walrus, which uses non-Euclidean
hyperbolic space to provide a focus+context view of large (million-node)
directed graphs. The Walrus data view resembles a continuous fish-eye
distortion in three dimensions, allowing the user to examine the fine
details of a small area while maintaining a view of the whole graph as
a frame of reference. The user examines the whole graph by interactively
moving the focus. (For some impressive visualizations done with Walrus,
see https://www.caida.org/catalog/software/walrus/.)
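To illustrate the focus+context idea, the short Python sketch below applies a simple two-dimensional fish-eye-style compression around a movable focus point. It is only an analogy for what Walrus does in three-dimensional hyperbolic space, not Walrus code; the function name and layout data are hypothetical.

    import math

    def fisheye(nodes, focus, scale=1.0):
        """Radially compress node positions around a movable focus point.

        Points near the focus keep roughly their true spacing, while distant
        points are squeezed toward a bounding circle of radius `scale`, so
        detail near the focus is magnified without losing the whole graph.
        """
        fx, fy = focus
        projected = {}
        for name, (x, y) in nodes.items():
            dx, dy = x - fx, y - fy
            r = math.hypot(dx, dy)
            if r == 0.0:
                projected[name] = (fx, fy)
                continue
            # tanh grows almost linearly near zero and saturates at `scale`.
            r_new = scale * math.tanh(r / scale)
            projected[name] = (fx + dx * r_new / r, fy + dy * r_new / r)
        return projected

    # Hypothetical three-node layout, viewed with the focus on node "a".
    layout = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (10.0, 0.0)}
    print(fisheye(layout, focus=layout["a"]))

Re-running the projection as the user drags the focus is what gives the continuous fish-eye effect described above.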
Another set of tools, the skitter family, is used to actively probe the
Internet to analyze topology and performance. Skitter measures forward
IP paths and round-trip time. It can also be used to track persistent
routing changes and visualize network connectivity. Some of the early
Walrus visualizations were generated from skitter topology graph data
sets.
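For a sense of how this kind of active probing works, the sketch below records the forward path and round-trip times to a destination by sending UDP datagrams with increasing TTL values and reading the ICMP responses from a raw socket (which requires administrative privileges). It is a simplified traceroute-style illustration, not skitter itself; skitter uses ICMP echo probes and its own storage format, and the function name and defaults here are hypothetical.

    import socket
    import time

    def probe_path(dest, max_ttl=30, port=33434, timeout=2.0):
        """Record (hop count, responding IP, RTT in ms) along the forward path."""
        dest_ip = socket.gethostbyname(dest)
        hops = []
        for ttl in range(1, max_ttl + 1):
            # Raw ICMP socket to catch Time Exceeded / Port Unreachable replies.
            recv = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
            recv.settimeout(timeout)
            send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            send.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
            start = time.time()
            send.sendto(b"", (dest_ip, port))
            try:
                # A real tool would verify the quoted probe inside the ICMP
                # payload; this sketch accepts the first ICMP packet to arrive.
                _, (addr, _) = recv.recvfrom(512)
                rtt_ms = (time.time() - start) * 1000.0
                hops.append((ttl, addr, rtt_ms))
                if addr == dest_ip:
                    break
            except socket.timeout:
                hops.append((ttl, None, None))  # no answer within the timeout
            finally:
                send.close()
                recv.close()
        return hops

    if __name__ == "__main__":
        for hop in probe_path("example.net"):  # hypothetical destination
            print(hop)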
A dozen other CAIDA tools are worth investigating, including
- cflowd and FlowScan, for processing router NetFlow data;
- CoralReef, for analyzing TCP/IP flows on optical network media (OC3-OC48);
- NeTraMet, an open-source (GPL) implementation of the RTFM architecture for network traffic flow measurement, developed and supported by Nevil Brownlee at the University of Auckland;
- arts++, a binary file format specification for storing network data;
- NetGeo, for mapping IP addresses, domain names, and AS numbers to geographical locations;
- RRDtool, for storing and displaying time-series data such as network bandwidth or server load averages (see the sketch after this list);
- GeoPlot, for creating geographical images of data sets;
- GTrace, a graphical front-end to traceroute that geographically depicts IP path information between source and destination hosts;
- Mapnet, for macroscopic Internet visualization and measurement; and
- Otter, for visualizing arbitrary network data that can be expressed as a set of nodes, links, or paths.
See https://catalog.caida.org/software for details
on these and other measurement tools.
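As a small, concrete illustration of one of these ideas, the Python sketch below mimics the round-robin storage at the heart of RRDtool: raw samples are consolidated (here, averaged) into a fixed number of slots whose oldest entries are eventually overwritten, so storage never grows no matter how long a measurement runs. It is a conceptual sketch only, not RRDtool's file format or API, and the class name and sample values are made up.

    from collections import deque

    class RoundRobinArchive:
        """Fixed-size time-series archive in the spirit of RRDtool."""

        def __init__(self, slots=8, samples_per_slot=4):
            self.slots = deque(maxlen=slots)  # consolidated values; oldest dropped first
            self.samples_per_slot = samples_per_slot
            self.pending = []                 # raw samples awaiting consolidation

        def update(self, value):
            self.pending.append(value)
            if len(self.pending) == self.samples_per_slot:
                # Consolidate by averaging, as an RRA:AVERAGE archive would.
                self.slots.append(sum(self.pending) / len(self.pending))
                self.pending.clear()

    # Feed in hypothetical interface byte counts sampled at a fixed interval.
    rra = RoundRobinArchive()
    for octets in [1200, 1800, 900, 1500, 2100, 2400, 1700, 1600]:
        rra.update(octets)
    print(list(rra.slots))  # two consolidated averages: [1350.0, 1950.0]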
Real-World Data
One of CAIDA's skitter project goals is
to develop techniques for illustrating relationships and depicting critical
components of the Internet infrastructure. Figure
1, for example, shows the results of a recent visualization
of global Internet topology at a macroscopic level. Our researchers used
skitter to discover topology paths by sending small ICMP (Internet
Control Message Protocol) probe packets across a wide spectrum of the Internet
address space. We then collected and analyzed the path and performance
information received in response to the probes. For this visualization,
we converted the IP path information to an ISP-level granularity by mapping
each responding IP address to the autonomous system (AS) responsible for
routing it. The result is a topology of interconnections among autonomous
systems, each of which approximately maps to an Internet Service Provider
(ISP).
The image in Figure 1 represents
a snapshot of global Internet connectivity during two weeks of October
2000. The graph reflects 685,045 IP addresses and 1,453,349 IP arcs of
skitter data from 16 monitors probing approximately 400,000 destinations
spread across more than 50 percent of globally routable Internet address
prefixes. Each responding IP address was mapped to its originating AS
using the University of Oregon's RouteViews collection of Border Gateway
Protocol (BGP) routing tables. The abstracted graph consists of 7,839
AS nodes and 34,434 observed peering sessions. We could not determine
geographical location for a few (79) AS nodes, so this plot includes
7,760 autonomous systems and 34,293 peering sessions.
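Conceptually, this mapping is a longest-prefix match of each responding address against a table of routed prefixes and their origin ASes derived from the BGP data. The sketch below shows the idea using Python's standard ipaddress module and a toy table; the prefixes and AS numbers are made up, and a production tool would use a radix trie rather than the linear scan shown here.

    import ipaddress

    def origin_as(ip, rib):
        """Return the origin ASN for `ip` by longest-prefix match against `rib`,
        a mapping of announced prefix -> origin AS number."""
        addr = ipaddress.ip_address(ip)
        best = None
        for prefix, asn in rib.items():
            net = ipaddress.ip_network(prefix)
            if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, asn)
        return best[1] if best else None

    # Toy table standing in for a snapshot of BGP routing tables.
    rib = {"192.0.2.0/24": 64500, "198.51.100.0/22": 64501, "198.51.100.0/24": 64502}
    print(origin_as("198.51.100.7", rib))  # prints 64502 (the /24 wins over the /22)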
Plotting in polar coordinates--where the angular position of each AS corresponds
to the geographical longitude of its administrative headquarters--allows
visualization of AS location by continent and country. Text and colored
bands at the perimeter of the circle label longitude (in degrees) as well
as the locations of continents and large cities. Each AS node's radial
position reflects its outdegree: the number of next-hop autonomous systems
to which it sent traffic. Outdegree is one measure of the richness of
an autonomous system's connectivity. In this graph, AS nodes with the
highest outdegree are yellow or orange, while those with lower outdegree
are blue.
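The layout itself is easy to reproduce in outline. The sketch below plots a handful of hypothetical AS records in polar coordinates with matplotlib, placing each AS at the longitude of its headquarters and using one plausible radial mapping in which richly connected ASes land near the center; the actual formula and color scheme behind Figure 1 may differ.

    import math
    import matplotlib.pyplot as plt

    # Hypothetical records: (AS name, headquarters longitude in degrees, outdegree).
    ases = [("AS-A", -122.0, 900), ("AS-B", -74.0, 350),
            ("AS-C", 0.1, 40), ("AS-D", 139.7, 12)]
    max_deg = max(deg for _, _, deg in ases)

    theta = [math.radians(lon) for _, lon, _ in ases]
    # Radial position shrinks with outdegree, so well-connected ASes sit near the core.
    radius = [1.0 - math.log(deg + 1) / math.log(max_deg + 1) for _, _, deg in ases]

    ax = plt.subplot(projection="polar")
    ax.scatter(theta, radius)
    for (name, _, _), t, r in zip(ases, theta, radius):
        ax.annotate(name, (t, r))
    plt.show()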
Graphing peering richness together with geographic location reveals
the highly "core-centric" nature of North American AS nodes. All except
one of the top 15 are based in the United States, and the one exception
is based in Canada. While ISPs in Europe and Asia have many peering relationships
with ISPs in the United States, there are few direct links between ISPs
in Asia and Europe. There are both technical (cabling and router placement
and management, for example) and policy (business and cost models, geo-political
considerations, and so on) factors that contribute to the peering arrangements
represented in this graph.
Conclusion
Several researchers are currently collecting and analyzing
BGP data (tables, updates, and so on) and large-scale IP topology data;
indeed, this has become a vital component of Internet routing research,
although the data collection architectures do have limitations, mostly
in terms of scope and functionality. Operational issues make acquiring
complete data sets of other types (such as traffic matrices and performance)
much more difficult. We can only draw imprecise inferences from the data
collected thus far with available traffic analysis tools and techniques.
Methods for establishing more complete data sets in these areas would
improve the integrity of work in the field. Bandwidth estimation techniques
present other challenges for Internet measurement, but to the extent
viable through the skitter infrastructure, CAIDA hopes to offer a
platform for calibrating them.
It remains unclear how accurately typical core routing tables reflect
the actual paths traffic will take through the infrastructure. Yet, the
community must pursue research in areas such as correlating and analyzing
massive routing, topology, and performance data sets to provide timely
insight into both normal and anomalous Internet behavior. Identifying
and tracking critical or vulnerable pieces of the infrastructure will
also be an important area of exploration, especially as needs change for
MPLS/traffic engineering (TE) infrastructure and for the emerging optical-switched
core.
In the end, all Internet users will benefit from improved methods for monitoring
(and thus, for managing) large subsets of the Internet infrastructure. Research
and development in this area will result in a more stable Internet and a reduction
in user frustration caused by unexpected network congestion. A better understanding
of routing stability will reduce the effect of sudden routing changes, and yield
better performance for emerging applications such as multimedia conferencing,
streaming media, collaborative visualization and authoring tools, highly interactive
gaming, and VoIP. A better understanding of topology will allow improvements
in Internet infrastructure, again resulting in better performance for developers
and end users.
kc claffy is principal investigator for CAIDA and a resident research scientist at the San Diego Supercomputer Center, University of California, San Diego. Her research interests include Internet workload and performance data collection, analysis, and visualization, particularly with respect to collaboration and cooperation with commercial ISPs and the sharing of resources for analysis. kc received a PhD in computer science from UCSD in 1994.