Visualizing the Internet: A Co-operative Endeavor
Introduction to CAIDA
Internet usage is increasing as access to the Net grows critical for engineering, research, and all sorts of collaborative activities. It is difficult to imagine how the dynamically changing topology or workload of the Internet infrastructure looks at any particular moment. As yet, we understand little about the impact that changes in traffic, topology, protocols, and business practices have on this new virtual frontier.
The Cooperative Association for Internet Data Analysis (CAIDA) leads and participates in community efforts to establish respectable Internet traffic metrics, develops prototype tools that measure them effectively, and promotes others' tools that do the same. Our activities are supported by agencies such as the U.S. National Science Foundation (NSF) and the Defense Advanced Research Projects Agency (DARPA), as well as by Internet service providers (ISPs) and equipment vendors participating as CAIDA members.
CAIDA's current activities include collecting, archiving, analyzing, and creating visualizations of massive data sets for Internet topology, workload, performance, and routing. We pursue insights into both normal and anomalous Internet behaviors. In particular, we recognize the need to correlate data from among these categories, and across many geographically and topologically diverse sources. CAIDA researchers are investigating techniques for bandwidth estimation, workload characterization, and long-term trend identification. Such information and analysis methods are useful for traffic engineering, capacity planning, and security breach detection.
Beyond the Myths
Studying real data at a macroscopic level yields compelling insights for refuting myths about Internet traffic, performance, and growth. Proposals for new protocols or techniques often make assumptions about traffic characteristics that have simply not been empirically validated. Myths about Internet data abound, for example regarding traffic fragmentation, encryption, favoritism, and path symmetry. Architectural designs are often justified by similarly unverified assumptions about distributions of packet, flow, and address prefix lengths. Furthermore, policy questions are often argued based on uncorroborated views of address space utilization and consumption, not to mention routing protocol behavior and performance. Even when researchers base their hypotheses on data obtained at local campuses, their results typically lose integrity in the face of more complete or representative data sets. Web caching, content distribution networking, multicast, and even traffic prioritization on a best-effort architecture are examples of technologies whose system performance in the field is prohibitively difficult to evaluate. ISPs have little way to systematically assess the extent to which such technologies benefit either user-perceived service quality or their financial bottom line.
CAIDA provides information about, and access to, network measurement tools, and makes available the results of our research and analysis. Through our activities, we seek to promote a better understanding of Internet traffic behavior in order to influence and catalyze the development of better protocols and services.
CAIDA Analysis Projects
For researchers and developers, the CAIDA Web site organizes analysis projects by relevance to:
Topology, mapping network infrastructure at a variety of layers, including IP-level probing from multiple sources to a large set of destinations throughout the current IPv4 address space.
Workload characterization, collecting IP packet header information from a point within a network to determine the nature and growth of application types, flow matrices, and packet and flow inter- and intra-arrival rates across a given network link.
Routing, including analysis of BGP unicast and multicast routing tables, which reflect the transit relationships between individual Autonomous Systems (ASes) at any given point in time.
Performance, including tools and techniques for estimating available link and path bandwidth.
Measurement tools: a repository of CAIDA's tools, and a much larger taxonomy of other tools.
Measurement Infrastructures
Reverse traceroute and looking glass servers
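As an illustration of the workload characterization item above, the following is a minimal sketch of how packet header records from one observation point might be grouped into flows and how flow inter-arrival times might be derived. It is not CAIDA's tooling: the `PacketHeader` record layout, the function names, and the choice of the 5-tuple as the flow key are assumptions made for this example, and the sketch ignores flow timeouts for simplicity.

```python
from collections import defaultdict
from typing import NamedTuple


class PacketHeader(NamedTuple):
    """Hypothetical header record; the field layout is an assumption."""
    timestamp: float  # seconds since start of capture
    src: str
    dst: str
    sport: int
    dport: int
    proto: int        # IP protocol number (6 = TCP, 17 = UDP)
    length: int       # packet length in bytes


def summarize_flows(packets):
    """Group packets by 5-tuple and count packets and bytes per flow."""
    flows = defaultdict(
        lambda: {"packets": 0, "bytes": 0, "first": None, "last": None}
    )
    for p in sorted(packets, key=lambda pkt: pkt.timestamp):
        key = (p.src, p.dst, p.sport, p.dport, p.proto)
        flow = flows[key]
        flow["packets"] += 1
        flow["bytes"] += p.length
        if flow["first"] is None:
            flow["first"] = p.timestamp
        flow["last"] = p.timestamp
    return dict(flows)


def flow_interarrival_times(flows):
    """Return the gaps between consecutive flow start times."""
    starts = sorted(f["first"] for f in flows.values())
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


# Addresses below come from the ranges reserved for documentation.
sample = [
    PacketHeader(0.0, "192.0.2.1", "198.51.100.1", 1234, 80, 6, 100),
    PacketHeader(0.5, "192.0.2.1", "198.51.100.1", 1234, 80, 6, 200),
    PacketHeader(2.0, "192.0.2.2", "198.51.100.1", 5555, 53, 17, 60),
]
sample_flows = summarize_flows(sample)
sample_gaps = flow_interarrival_times(sample_flows)
```

A production analysis would also expire idle flows and map ports to application types; both steps are left out here to keep the sketch short.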
Internet Atlas Project. In 1999, CAIDA started the three-year Internet Atlas project, an effort to explore and evaluate analysis methods and techniques used to create a variety of Internet visualizations. Under this project CAIDA is developing methodologies, software, and protocols for mapping the Internet, focusing on Internet topology, performance, workload, and routing data. The project also includes an assessment of the state of the art in this nascent sector, along with an early showcase of example visualizations and assessments of their strengths and weaknesses.
Skitter. The skitter tool family is used to actively probe the Internet to analyze topology and performance. Skitter measures forward IP paths and round-trip time. It can also be used to track persistent routing changes and visualize network connectivity. More details on these and other measurement tools are available in CAIDA's measurement tools repository.
One of CAIDA's skitter project goals is to develop techniques for illustrating relationships and depicting critical components of the Internet infrastructure. For example, one recent visualization depicts global Internet topology at a macroscopic level.
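To make the idea of tracking persistent routing changes concrete, here is a minimal sketch that scans a time-ordered series of forward IP paths to a single destination and reports only those changes that last for several consecutive measurements. This is not skitter's algorithm; the function name, the `min_run` threshold, and the representation of a path as a sequence of hop addresses are all assumptions made for illustration.

```python
def persistent_path_changes(paths, min_run=3):
    """Report path changes that persist for at least min_run measurements.

    paths is a time-ordered list of forward paths to one destination,
    each given as a sequence of hop addresses (an assumed format).
    Returns a list of (index, old_path, new_path) tuples, where index is
    the position of the first measurement showing the new path.
    """
    changes = []
    if not paths:
        return changes
    stable = tuple(paths[0])   # path currently considered in effect
    candidate = None           # differing path seen in recent measurements
    run = 0                    # consecutive sightings of the candidate
    start = None               # index where the candidate first appeared
    for i, path in enumerate(paths[1:], start=1):
        path = tuple(path)
        if path == stable:
            candidate, run = None, 0       # transient deviation ended
        elif path == candidate:
            run += 1
        else:
            candidate, run, start = path, 1, i
        if candidate is not None and run >= min_run:
            changes.append((start, stable, candidate))
            stable, candidate, run = candidate, None, 0
    return changes


# Hop addresses below come from the ranges reserved for documentation.
path_a = ("192.0.2.1", "198.51.100.1", "203.0.113.9")
path_b = ("192.0.2.1", "198.51.100.7", "203.0.113.9")
path_c = ("192.0.2.1", "198.51.100.8", "203.0.113.9")
series = [path_a, path_a, path_b, path_a, path_c, path_c, path_c, path_c]
detected = persistent_path_changes(series, min_run=3)
```

In this series the single appearance of `path_b` is treated as transient, while the switch to `path_c` at index 4 is reported because it persists for three measurements.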
Community Outreach
CAIDA also leads several community outreach efforts. One is the Internet Engineering Curriculum Repository, a Web-based "living repository" of Internet engineering teaching materials initiated in 1998. Another, the Internet Statistics and Metrics Analysis workshop series, facilitates discussion among academia, equipment vendors, and service providers, communities that share an interest in, and an incentive to understand, one another's interests and concerns regarding Internet statistics and analysis.
CAIDA Directions and Challenges
Several researchers are currently collecting and analyzing BGP data (tables, updates, and so on) and large-scale IP topology data; indeed, this has become a vital component of Internet routing research, although the data collection architectures do have limitations, mostly in terms of scope and functionality. Operational issues make acquiring complete data sets of other types (such as traffic matrices and performance) much more difficult. We can only draw imprecise inferences from the data collected thus far with available traffic analysis tools and techniques.
Methods for establishing more complete data sets in these areas would improve the integrity of work in the field. For example, bandwidth estimation techniques present several problematic issues in measuring the Internet, but to the extent viable through the skitter infrastructure, CAIDA hopes to be able to offer a measurement infrastructure for calibration and correlation of performance and topology.
Another promising source of architectural insight is assessing how accurately typical core routing tables reflect the actual paths that traffic takes through the infrastructure. We have seen significant incongruities between the two but have not yet classified the phenomena from which such incongruities derive.
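One simple way to quantify such incongruities, sketched here under stated assumptions, is to compare the AS path recorded in a routing table with an AS-level path derived from a measured forward path. The function names and the result layout are hypothetical, the mapping from measured hops to AS numbers is assumed to have been done already, and the comparison is deliberately naive: it only collapses repeated AS numbers and locates the first point of divergence.

```python
def collapse_repeats(as_path):
    """Drop consecutive duplicate AS numbers (for example, path prepending)."""
    collapsed = []
    for asn in as_path:
        if not collapsed or collapsed[-1] != asn:
            collapsed.append(asn)
    return collapsed


def compare_as_paths(table_path, observed_path):
    """Compare a routing-table AS path with an observed AS-level path.

    Both arguments are sequences of AS numbers ordered from the
    measurement source toward the destination (an assumed convention).
    """
    table = collapse_repeats(table_path)
    observed = collapse_repeats(observed_path)
    shared = 0
    for a, b in zip(table, observed):
        if a != b:
            break
        shared += 1
    match = table == observed
    return {
        "match": match,
        # Position of the first differing AS, or None when the paths agree.
        "divergence_index": None if match else shared,
        "table_only": sorted(set(table) - set(observed)),
        "observed_only": sorted(set(observed) - set(table)),
    }


# AS numbers below come from the range reserved for documentation.
report = compare_as_paths(
    [64496, 64497, 64497, 64499],
    [64496, 64498, 64499],
)
```

Classifying why two paths diverge would need more context than this comparison provides; the sketch only flags where they differ.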
Identifying and tracking critical or vulnerable pieces of the infrastructure will also be an important area of exploration, especially as needs change for MPLS/traffic engineering (TE) infrastructure and for the emerging optical-switched core. Prerequisite to significant breakthroughs in this field are new methods for correlating and analyzing massive routing, topology, and performance data sets to provide timely insight into both normal and anomalous Internet behavior.
Conclusion
A better understanding of routing stability and topology dynamics will reduce the effect of sudden routing changes and yield better performance for emerging applications such as multimedia conferencing, streaming media, collaborative visualization and authoring tools, highly interactive gaming, and voice over IP. A better understanding of workload will allow improvements in Internet infrastructure, again resulting in better performance for developers and end users.
In the end, all Internet users will benefit from improved methods for monitoring (and thus, for managing) large subsets of the Internet infrastructure. Research and development in this area will result in a more stable Internet and a reduction in user frustration caused by unexpected network resource limitations.