Measuring the Internet
kc claffy
UC San Diego Supercomputer Center
The so-called science of poll-taking is not a science at all but a mere necromancy. People are unpredictable by nature, and although you can take a nation's pulse, you can't be sure that the nation hasn't just run up a flight of stairs.
--E.B. White, The New Yorker, Nov. 1948
Internet traffic behavior has been resistant to modeling. The reasons derive from the Internet's evolution as a composition of independently developed and deployed (and by no means synergistic) protocols, technologies, and core applications. Moreover, this evolution, though "punctuated" by new technologies, has experienced no equilibrium thus far.
The state of the art, or lack thereof, in high-speed measurement is neither surprising nor profound. It is a natural consequence of the economic imperatives in the current industry, where empirically grounded research in wide-area Internet modeling has been an obvious casualty. Specifically, the engineering know-how required to develop advanced measurement technologies, whether in software or hardware, is essentially the same skill set required to develop advanced routing and switching capabilities. Since the latter draw far greater interest, and profit, from the marketplace, that is where the industry allocates its engineering talent.
A common complaint about traffic measurement studies is that they do not sustain relevance in an environment where traffic, technology, and topology change faster than we can measure them. Moreover, the proliferation of media and protocols makes the acquisition of traffic data almost prohibitively complicated and costly. And finally, the time required to analyze and validate data means that most research efforts are obsolete by the time findings are published.
Thus, far from having an analytic handle on the Internet, we lack in most cases the ability even to measure traffic at a granularity that would enable infrastructure-level research. As a result, while the core of the Internet continues its rapid evolution, measurement and modeling of it progress at a leisurely pace.
Current Measurement Tools
Both active and passive measurement of the Internet do occur, as do analyses of the routing and IP addressing system.
In the active measurement arena, commercial and research ventures are emerging. Benchmarking end-to-end performance of commercial service providers (for example, transit, access, content hosting, and caching) across clearly specified paths or segments of the Internet is now a competitive market. Often spawned by end users interested in verifying performance of their Internet service, these measurements typically involve an end host sending active probe traffic out into the network and recording the delay until packets return to their source. Unfortunately, such traffic measurements inherently involve a large number of parameters that are difficult, if not impossible, to model independently; the complexity renders elusive any comparability or useful normalization of gathered data.
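To make the basic mechanism concrete, the following sketch times TCP connection establishment as a crude stand-in for a round-trip delay probe. It is an illustration only, not any vendor's methodology: the target host, sample count, and probe interval are placeholders, and a production benchmarking tool would control many more of the parameters discussed above (probe scheduling, packet sizes, path selection).

# Minimal active-probe sketch: estimate round-trip delay to a host by
# timing TCP connection establishment (SYN -> SYN/ACK), which requires
# one round trip. The target host/port below are placeholders.
import socket
import statistics
import time

def connect_rtt(host: str, port: int = 80, timeout: float = 2.0) -> float:
    """Return the time (in seconds) to complete a TCP handshake with host:port."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; handshake time is our delay sample
    return time.perf_counter() - start

def probe(host: str, samples: int = 5, interval: float = 1.0) -> dict:
    """Collect several delay samples and summarize them."""
    rtts = []
    for _ in range(samples):
        try:
            rtts.append(connect_rtt(host))
        except OSError:
            pass  # treat as loss; a fuller tool would record it explicitly
        time.sleep(interval)
    return {
        "sent": samples,
        "received": len(rtts),
        "min_ms": min(rtts) * 1000 if rtts else None,
        "median_ms": statistics.median(rtts) * 1000 if rtts else None,
        "max_ms": max(rtts) * 1000 if rtts else None,
    }

if __name__ == "__main__":
    print(probe("www.example.com"))  # placeholder target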
Research groups do deploy technology and infrastructure to measure and evaluate performance of selected Internet paths, but such efforts are slow and rarely meet the needs of the user, research, or ISP communities. Proliferation of uncoordinated active measurement initiatives has also led to counterproductive actions, such as ISPs turning off Internet Control Message Protocol (ICMP) traffic at select routers to limit the visibility (and vulnerability) of their infrastructure.
Passive infrastructure-wide measurements and characterization of traffic remain the purview of researchers and some large transit providers. The National Laboratory for Applied Network Research (NLANR) makes available traces from various High Performance Connection (HPC) university sites for analysis by the community. Analyses using these somewhat limited windows onto the Net suggest the emergence of new applications, such as streaming media (audio and video) and games, as well as trends in the distribution and sequencing of packet sizes. Some workload trends have clear, if not ominous, implications for ISPs--for example, the proportion of non-congestion-controlled traffic (as derived from packet traces) directly affects infrastructural stability. The distribution of TCP flow lengths, while nontrivial, also indicates the degree to which traffic is likely to respond to network signals.
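The flavor of such workload characterization can be suggested with a simplified sketch. The code below assumes a hypothetical plain-text export of packet headers (one line per packet) rather than any particular trace format, and the protocol split and flow-length buckets are illustrative, not the categories NLANR or any ISP actually uses.

# Passive-analysis sketch: given a plain-text export of packet headers
# (one packet per line: "timestamp proto src sport dst dport bytes"),
# estimate the share of traffic that is congestion controlled (TCP)
# and a coarse distribution of TCP flow lengths. The input format is a
# hypothetical stand-in for whatever a trace tool actually produces.
from collections import Counter, defaultdict

def analyze(lines):
    bytes_by_proto = Counter()
    flow_packets = defaultdict(int)   # packets per 5-tuple TCP flow

    for line in lines:
        ts, proto, src, sport, dst, dport, size = line.split()
        size = int(size)
        bytes_by_proto[proto] += size
        if proto == "tcp":
            flow_packets[(src, sport, dst, dport)] += 1

    total = sum(bytes_by_proto.values()) or 1
    tcp_share = bytes_by_proto["tcp"] / total

    # Flow-length distribution: how many flows are 1 packet, 2-10, >10?
    lengths = Counter()
    for n in flow_packets.values():
        bucket = "1" if n == 1 else ("2-10" if n <= 10 else ">10")
        lengths[bucket] += 1
    return tcp_share, dict(lengths)

if __name__ == "__main__":
    sample = [
        "0.00 tcp 10.0.0.1 1025 10.0.0.2 80 1500",
        "0.01 tcp 10.0.0.1 1025 10.0.0.2 80 1500",
        "0.02 udp 10.0.0.3 5004 10.0.0.4 5004 576",
    ]
    share, dist = analyze(sample)
    print(f"TCP byte share: {share:.2f}, flow-length buckets: {dist}")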
Analysis of routing data based on information obtained from Border Gateway Protocol (BGP) version 4 route tables can indicate richness and trends in ISP peering relationships. Mapping packet trace data to autonomous system paths provides insight into the actual paths, or sequences of networks, that traffic traverses through the Internet infrastructure at a particular time. While these analyses indicate traffic behavior and relationships among ISPs, they are neither exhaustive nor generalizable across providers.
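At the heart of such mapping is longest-prefix matching of addresses observed in traces against prefixes announced in BGP. The sketch below illustrates only that step, under simplifying assumptions: the table entries and AS numbers are invented (documentation prefixes and private-use AS numbers), and a real analysis would read them from a route dump rather than hard-code them.

# Routing-analysis sketch: map an IP address to the origin AS of its
# longest matching prefix, given (prefix, AS-path) pairs extracted from
# a BGP table dump. The table entries here are made-up examples.
import ipaddress

def best_match(table, addr):
    """Return the (network, as_path) pair whose prefix most specifically covers addr."""
    ip = ipaddress.ip_address(addr)
    best = None
    for prefix, as_path in table:
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, as_path)
    return best

if __name__ == "__main__":
    table = [
        ("192.0.2.0/24", [64500, 64501]),       # illustrative prefixes and
        ("198.51.100.0/24", [64500, 64502]),    # private-use AS numbers
        ("198.51.100.128/25", [64500, 64503]),
    ]
    match = best_match(table, "198.51.100.200")
    if match:
        net, as_path = match
        print(f"{net} originated by AS{as_path[-1]} via path {as_path}")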
Analysis and visualization of Internet Protocol (IP) version 4 address space indicates how current Internet address space is allocated (to institutions and ISPs) and the degree to which allocated space is advertised and routed across the Internet infrastructure. Such depictions can inform analysis of public policy (such as equity) issues, as well as the evaluation of engineering and operational aspects of the commercial Internet.
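One simple depiction of this kind is the fraction of an allocated block that actually appears in the global routing table. The sketch below computes it under the assumption that the advertised prefixes do not overlap; the allocation and prefixes shown are illustrative only.

# Address-space sketch: given an allocated block and the prefixes from
# that block seen in a routing table, compute what fraction of the
# allocation is actually advertised. Assumes the advertised prefixes
# do not overlap; blocks below are illustrative.
import ipaddress

def advertised_fraction(allocation, advertised_prefixes):
    alloc = ipaddress.ip_network(allocation)
    covered = 0
    for p in advertised_prefixes:
        net = ipaddress.ip_network(p)
        if net.subnet_of(alloc):
            covered += net.num_addresses
    return covered / alloc.num_addresses

if __name__ == "__main__":
    frac = advertised_fraction(
        "203.0.113.0/24",                       # the allocation
        ["203.0.113.0/25", "203.0.113.192/26"]  # what shows up in BGP
    )
    print(f"{frac:.0%} of the allocation is advertised")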
Analysis of the emerging Internet infrastructure should also consider emerging protocols and technologies upon which new services may be based. While services on today's Internet may be relatively homogeneous, differentiation among provider services will accelerate as networks come to terms with the technical, measurement, and billing requirements for enhanced qualities of service. Such segmentation will ensure that traffic associated with guaranteed or value-added service levels provided by one network will not be directly comparable with that provided by another network.
Challenges for the New Decade
As we enter the new decade, organizations engaged in analyzing macroscopic, infrastructure-wide traffic behavior must focus on
- active measurements that are less invasive and do not provoke providers to defensive behavior;
- passive acquisition of performance data (latency, loss, jitter) that can reduce the perceived need to actively probe infrastructures;
- aggregating, mining, and visualizing the massive data sets in ways that are useful to multiple users;
- mapping IP addresses to more useful analysis entities: autonomous systems (BGP routing granularity), countries, equipment (multiple IP addresses map to the same router, but no mechanism exists to derive the mapping), and geographic location information (latitude/longitude coordinates);
- problems of hardware speed and memory/bus limitations, emerging media (Gigabit Ethernet, DWDM), IP security, and the reluctance of ISPs to use and/or share measurement results.
Progress requires both top-down and bottom-up momentum: users, researchers, and application developers must scope out the measurements essential to understanding Internet behavior and growth; ISPs need to deploy and evaluate measurement technology for their own network design, operation, and cost recovery. This work should be accompanied by more thoughtful infrastructure-relevant analysis of existing data. In particular, we need better correlation among data sources and types and greater feedback into the design of future data acquisition techniques as well as Internet technologies themselves.
Unlike many other fields of engineering, Internet data analysis is no longer justifiable as an isolated activity. The ecosystem under study has grown too large and is under the auspices of too many independent, uncoordinated entities. Nonetheless, as the system continues to evolve rapidly, the depth and breadth of our understanding of it should follow in close pursuit.
kc claffy is principal investigator for the distributed Cooperative Association for Internet Data Analysis (CAIDA) and resident research scientist at the University of California, San Diego, Supercomputer Center. Her research interests include Internet workload/performance data collection, analysis, and visualization, particularly with respect to commercial ISP collaboration/cooperation and sharing of analysis resources. kc received a PhD in computer science from UCSD in 1995.
Originally published in IEEE Internet Computing Online, vol. 4, no. 1, January 2000. All rights reserved by IEEE.