This page contains the Project Summary for the CAIDA proposal entitled:
"Correlating Heterogeneous Measurement Data to Achieve System-Level Analysis of Internet Traffic Trends."
Project Summary
During the late 1940s
at Princeton, Albert Einstein's graduate assistant reviewed a draft
final exam and was shocked to discover that the questions were
identical to the ones Einstein had used the previous year. He said
to Einstein, "Professor, there are groups on this campus that
maintain copies of all exams. If you use the same questions as last
year, you will undoubtedly give those who saw last year's exam an
advantage." After a few moments, Einstein responded, "Yes, my
friend, it is true that the questions are the same this year, but
the answers are different."
NSF has invested heavily in
high-performance Internet infrastructure and development of
distributed applications, resulting in burgeoning demand for
additional capacity and services. We propose a project that
takes advantage of existing traffic measurement instrumentation
and enhances the availability and utility of existing and planned
distributed heterogeneous network measurement data
repositories.
In today's "cooperative
Internet anarchy", competitive providers, struggling to meet
skyrocketing needs, do not significantly invest in gathering or
analyzing workload data on their networks. Rather, Internet service
providers (ISPs) match rising demand by increasing network capacity
as fast as possible; today's core backbone links are OC48 and will
be OC192c by 2002. This "traditional" approach of per-link excess
capacity is typically based on brute-force over-engineering (e.g.,
upgrade after you reach a certain link utilization), rather than
identification or understanding of parameters describing how
network capacity is actually utilized. Individual ISPs also suffer
from the fact that visibility of traffic trends is usually limited
to their local domain. In addition, there is as yet no
instrumentation available for gathering fine-grained workload
information from any link above OC12 bandwidth; even at OC12 and
below, few links are instrumented to do so, and most of these are
located at lightly used research sites. Larger providers have little incentive to
invest in measurement instrumentation, much less to risk political
damage by making any resulting data public. Exacerbating the
situation is the lack of rigorous analysis tools to support
wide-area Internet data collection, and the absence of baseline
data against which to compare any independent results. The lack of
identified parameters for characterizing and managing network
growth in a cost-effective manner is a situation that shows little
sign of changing without a substantial shift in attention to this
task.
One detrimental side effect is
that myths about Internet growth and performance abound, and plans
for provisioning are often made based on locally attained data
generalized to mythical proportions. One of the most important
contributions of our proposed research is to provide the ability to
base predictions of Internet traffic, performance, and growth on
real data rather than obsolete assumptions [1]. The community could make better use of its
collective intellectual resources if it could validate ideas
against a larger variety of empirical data sets before investing
research and development resources in further
studies.
Our proposal takes advantage of
and integrates existing NSF-sponsored technologies and tools to 1)
more strategically instrument the Internet to capture real data of
interest to both traffic engineers and Internet modelers, 2) create
distributed repositories of experimentally derived traffic trend
parameters while enabling access to heterogeneous network
measurements, and 3) develop meaningful and timely analysis tools
and reports. The research and tools proposed under this effort can
lead to an empirically based understanding of the evolving Internet
infrastructure, yielding results that benefit all who depend on
this increasingly critical global resource. The proposed project
will also assist in the development of much-needed tools for
navigation, analysis, and correlated visualization of massive
network data sets. This work is critical to advancing both research
and operational efforts regarding the evolving commercial Internet,
and has obvious relevance to public policy and regulatory questions
concerning the organization and administration of Internet
infrastructure.