(NSF 98-120) Project Summary

This page contains the Project Summary for the CAIDA proposal entitled: "Correlating Heterogeneous Measurement Data to Achieve System-Level Analysis of Internet Traffic Trends."

Project Summary
During the late 1940s at Princeton, Albert Einstein's graduate assistant reviewed a draft final exam and was shocked to discover that the questions were identical to the ones Einstein had used the previous year. He said to Einstein, "Professor, there are groups on this campus that maintain copies of all exams. If you use the same questions as last year, you will undoubtedly give those who saw last year's exam an advantage." After a few moments, Einstein responded, "Yes, my friend, it is true that the questions are the same this year, but the answers are different."

NSF has invested heavily in high-performance Internet infrastructure and development of distributed applications, resulting in burgeoning demand for additional capacity and services. We propose a project that takes advantage of existing traffic measurement instrumentation and enhances the availability and utility of existing and planned distributed heterogeneous network measurement data repositories.

In today's "cooperative Internet anarchy", competitive providers, struggling to meet skyrocketing needs, do not significantly invest in gathering or analyzing workload data on their networks. Rather, Internet service providers (ISPs) match rising demand by increasing network capacity as fast as possible; today's core backbone links are OC48 and will be OC192c by 2002. This "traditional" approach of per-link excess capacity is typically based on brute-force over-engineering (e.g., upgrading once a link reaches a certain utilization), rather than on identifying or understanding the parameters that describe how network capacity is actually utilized. Individual ISPs also suffer from the fact that their visibility of traffic trends is usually limited to their local domain. In addition, there is as yet no instrumentation available for gathering fine-grained workload information from any link above OC12 bandwidth; even at or below that rate, few links are instrumented to do so, and most of these are located at lightly used research sites. Larger providers have little incentive to invest in measurement instrumentation, much less to risk political damage by making any resulting data public. Exacerbating the situation is the lack of rigorous analysis tools to support wide-area Internet data collection, and the absence of baseline data against which to compare any independent results. The resulting lack of identified parameters for characterizing and managing network growth in a cost-effective manner shows little sign of changing without a substantial shift in attention to this task.

One detrimental side effect is that myths about Internet growth and performance abound, and plans for provisioning are often based on locally obtained data generalized to mythical proportions. One of the most important contributions of our proposed research is to provide the ability to base predictions of Internet traffic, performance, and growth on real data rather than obsolete assumptions[1]. The community could make better use of its collective intellectual resources if it could validate ideas against a larger variety of empirical data sets before investing research and development resources in further studies.

Our proposal takes advantage of and integrates existing NSF-sponsored technologies and tools to 1) more strategically instrument the Internet to capture real data of interest to both traffic engineers and Internet modelers, 2) create distributed repositories of experimentally derived traffic trend parameters while enabling access to heterogeneous network measurements, and 3) develop meaningful and timely analysis tools and reports. The research and tools proposed under this effort can lead to an empirically based understanding of the evolving Internet infrastructure, yielding results that benefit all who depend on this increasingly critical global resource. The proposed project will also assist in the development of much-needed tools for navigation, analysis, and correlated visualization of massive network data sets. This work is critical to advancing both research and operational efforts regarding the evolving commercial Internet, and it is directly relevant to public policy and regulatory questions concerning the organization and administration of Internet infrastructure.