1 The Challenge: Characterizing Internet
Traffic Trends
The transition
of Internet infrastructure from NSF stewardship to a competitive
service industry left this incredible resource with no framework
for system-level analysis of wide-area, cross-domain Internet
traffic behavior. Nonetheless, development of applications using
data and computing resources distributed throughout the Internet is
underway. Development occurs, at incalculable risk, in spite of a
lack of Internet traffic models based on real
data.
Competitive
Internet providers, struggling to meet burgeoning demands of
customers for additional services, do not significantly invest in
gathering or analyzing workload data on their networks. Instead,
Internet service providers match rising demand by increasing
network capacity as fast as possible; today's core backbone links
are OC48 and will be OC192c by 2002. This `traditional' approach is
primarily based on brute force over-engineering. For example, ISPs
simply upgrade after reaching a certain link utilization level,
rather than examining parameters of how network capacity is
actually utilized, or determining if link use is
efficient.
The lack of
specific traffic flow parameters or any realistic model of Internet
traffic is a situation that shows little sign of changing without a
substantial shift in attention toward the task. There is as yet no
instrumentation available for gathering fine-grained workload
information from anything above OC12 bandwidth links.1 The few high-speed
links that are monitored are typically found at lightly utilized
research sites. Larger providers have little incentive to invest in
such instrumentation, much less risk political damage by making any
resulting data public. The lack of rigorous analysis tools to
support wide-area Internet data collection, and the absence of
baseline data against which to compare any independent results
serve to further dissuade efforts to collect
data.
As a result,
evaluation of macroscopic workload trends, and systematic
preparation for the growing expectations of Internet users, is not
possible today. Lack of historic or current data providing a
cross-domain characterization of traffic on the wide-area Internet
prevents accurate projection of the network's evolution. Existing
projections either have no empirical basis or are based upon small
data sets from few locations with no justification for claiming to
be representative of larger scale infrastructure. Without
cross-domain analysis, we cannot determine the extent to which
local phenomena (e.g., caching, routing flaps, flash events,
denial-of-service incidents) correlate to global Internet
behavior.
Overcoming
Myths and Obsolete Assumptions
Globally
relevant measurement requires: 1) research into methods for
classifying, archiving, and retrieving data from massive,
distributed datasets, 2) improvements in both measurement and
traffic characterization methods, and 3) analysis software capable
of correlating and visualizing data in time to be useful for
traffic engineering purposes. The CAIDA team comprises
experts in network measurement, systems engineering, and data
analysis.
The P.I. for the
proposed effort has published a number of studies involving the
collection and analysis of massive datasets monitoring heavily used
research and commercial Internet links. Team members have years of
experience engineering systems for Internet measurement, and
analyzing data from both active and passive measurement
infrastructures.
1.1 Our
Mission
The research
team's goals:
- Utilize
multiple deployed and tested NSF-funded networking
technologies.
- Establish a
network measurement meta-data repository to facilitate access to
results as well as raw data by both the Internet research community
and application developers. Support this with an annotation system
applicable to other researchers' demonstrably relevant Internet
data sets.
- Enable testing
of network traffic analysis methodologies to determine which
parameters and attributes are vital to network management. Create a
language for labeling and annotating data
sets.
1.1.1 Relevance to Present State of
Knowledge and the Future of the Internet
There are
several currently deployed Internet measurement infrastructures
having various intents and scope of analysis.2 Several sources of Internet data
exist, each focused on specific aspects of workload, topology,
performance, and routing, but subject to significant limitations.
For example, NLANR/Moat, through its Passive Measurement Analysis
(PMA) program[2], has
been archiving packet header trace samples (under 2 minutes each,
several times a day) from OC3mon and OC12mon devices located at
NSF-sponsored High Performance Computing institutions, typically
college campuses attached to the vBNS or Abilene backbones. One
strength of this data is the large scale of its deployment.
Monitors are located at more than 20 HPC campus measurement points,
supporting several research and analysis projects. However, there
are limitations for projects wishing to utilize this data. The data
mining tasks involved are formidable: the topological situation of
each campus measurement device is not standard, and sometimes not
well-documented. Trace formats have changed over time, and thus
conversion utilities are necessary to analyze long-term trends.
Users have also expressed interest in longer traces (e.g., greater
than five minutes) for some time. Though it is technically possible
to capture both directions of traffic flow on a link for hours,
bidirectional flow analysis is difficult without clock drift
compensation across the two data collection interfaces[3]. Furthermore, PMA archiving policy
requires IP address sanitization, which precludes the ability to
answer any question involving geography or actual topology. While
there are legitimate privacy concerns, CAIDA has been able to
successfully navigate them while carrying out research involving
geographic and topological information using unsanitized data. In
addition to needing geographic data, traffic profiling would
benefit from correlating the results of many disparate sources of
data. Strategic use of distributed data sets could enhance the
ability to detect and model network anomalies and trends, improving
the ability to predict the effects of external hardware, software,
security, and news events.
Current papers
that propose new techniques and protocols often make assumptions
about traffic characteristics that are simply not validated by real
data. The proposed meta-data repositories will allow researchers to
investigate hypotheses about the level of fragmented traffic,
encrypted traffic, traffic favoritism, path symmetry, address space
utilization and consumption, directional balance of traffic volume,
routing protocol behavior and policy, distribution statistics of
path lengths, flow sizes, packet sizes, prefix lengths, and routing
announcements. In cases where analysis is based on locally
generated academic data sets, attempts to generalize typically lose
integrity when applied to additional real-world data sets. The
community could make better use of its collective intellectual
resources if it could test hypotheses against a larger variety of
empirical data sets before investing research and development time
and energy into specific studies.
1.1.2 Impact, Innovations, and
Longer-term Goals
The proposed
meta-data repositories will seed a new generation of Internet
research, putting the community in a position to significantly
accelerate the pace of progress of measurement-based network
research. The project yields the opportunity to place realistic
network measurement data within easy reach of the community of
researchers and application developers most likely to benefit from
it. Further, the proposed project solves some problems with current
measurement projects that limit their utility, increasing return on
NSF investment in those projects.
In the next
decade, the need to access and manage massive heterogeneous
tracefile datasets will increase dramatically. An annotation and
storage system suitable for distributed repositories of
demonstrably relevant Internet data sets, in conjunction with a
language for describing traffic phenomena along a variety of
dimensions, will support the field of network research for the
foreseeable future. The network measurement meta-data repository
will allow for cross-domain characterization of wide-area Internet
traffic, and evaluation of macroscopic trends in workload,
performance, and routing behavior. Researchers will have the
opportunity to correlate data across time, space (trace location),
and data features. Such studies will yield predictive models that
can then be applied back to the measurement tools to improve their
operational utility. The ability to coordinate data collection from
multiple sites in the community will also provide a facility to
track distributed security attacks more effectively[4], and to assess potential
consequences of introducing new or emerging protocols and
technology into current networks. It will also facilitate the study
of hybrid techniques to support technologies that have been
resistant to solution, e.g., the use of real-time passive
measurements in correlation with real-time active measurements to
support realistic and enforceable Service Level Agreements (SLAs)
[5], or bandwidth
estimation techniques. (See related proposal of PIs.[6])
While the
proposed work collects data needed to reach long-term research
goals, its most immediate and yet presumably lasting effect will be
the ability to base proposed new techniques and protocols on
empirical traffic data rather than assumptions - before investing
research and development time and energy on them. In short, the
proposed effort places us in a safer position to project as well as
control characteristics of the network's
evolution.
As described in
Section 1.1.1, there is no dearth of
Internet measurement data. On the contrary, additional data will
not help without a rational architecture for 1) collecting
measurement data, 2) storing, processing, indexing, and searching
that data, and 3) making data accessible to a wide variety of
users. Developing such a system will provide network researchers,
application developers, and traffic engineers with a fundamentally
new vantage point, from which they can accurately refute or confirm
crucial assumptions about Internet traffic, behavior, and
development.
This proposal
is motivated by the recognition, shared by many in both the
research and operational communities, that understanding Internet
behavior and trends requires a carefully designed collection of
data. Most research efforts that need Internet data for
experimentation and validation (e.g., packet traces, flow export
records, macroscopic topology, performance, or routing information)
typically require large data sets, easily several gigabytes for a
single data file. The NLANR/Moat project [2] alone has almost a terabyte of
archived (un-annotated, un-indexed) data. The proposed work
described below will facilitate access, archiving, and long-term
storage of such data sets.
2 Specific
Goals
2.1 Goal 1: Deploy Strategic Internet
Measurement Instrumentation
Frequent passive
header capture from a statistically significant number of monitors
with any reasonable amount of traffic is unsustainable. We must
design a strategic approach in terms of trace schedules and
duration, post-processing, analysis, visualization and archival to
minimize system management requirements. We also need to provide
maximally representative data sets.3
NLANR/Moat's PMA
measurements provide high-precision brief traces of HPC site packet
headers, with addresses anonymized to protect user privacy. The DAG
project network measurement cards support high-precision
timestamping and clock synchronization, which allows collection of
longer bidirectional traces[3].
We propose to complement the current PMA measurement program with
strategically located commercial measurement sites, and to
corroborate passive data with other types of measurements (e.g.,
active probing, routing tables). We will gather longer
(multiple-hour) traces on several high bandwidth commodity backbone
links utilizing card-to-card synchronization to prevent clock drift
and allow for bidirectional flow-based analysis of Internet
traffic. We will support various levels of aggregation of these
traces. Because certain studies might require different parts of
the packet headers, we will provide some application-specific
traces to support analysis of e.g., streaming media protocol
behavior and performance.
CAIDA continues
to support CoralReef, a publicly available comprehensive software
suite developed to collect and analyze data from passive Internet
traffic monitors, in real-time or from trace files. CoralReef is a
package of libraries, device drivers, classes, and applications
written in, and for use with, several programming languages[7]. Its architecture
makes it a powerful, extensible, efficient, and convenient package
for passive data collection and traffic characterization, enabling
the addition of tools for correlating with other types of data.
CoralReef includes modules for the storage and manipulation of
frequently collected data including: source and destination hosts,
IP protocols, ports, and amounts of traffic in bytes, packets and
flows. CoralReef's demonstrated passive monitoring and analysis
capabilities represent a key strength for achieving the proposed
goals. CAIDA key technical personnel and collaborators also support
other passive measurement tools4, including NeTraMet[8], cflowd[9], and FlowScan[10]. These tools support
continuous monitoring and archiving of data and can be used for
calibration of finer-grained measurements, or benchmarking against
commercial statistics collection functionality (e.g., NetFlow[11]).
Network
measurement improvements are necessary to apply the meta-data
repository to current and emerging research problems, examples of
which appear in section 2.3. For example, we expect to add software
modules that can trigger more complete packet capture upon
detection of DoS activity.
2.2 Goal 2: Facilitate Community Access
to Data Repositories
Rather than trying
to support data set storage and access from a central location, we
will design an annotation system for the repository in which
meta-data for data sets is archived and served from many other
sites. We will support a large storage infrastructure at SDSC, but
the meta-data repository will multiply its value by accommodating
raw data sets from other sites.
We will develop
common formats, terminology, and a formal language to allow
multiple annotations to a given data set based on independent
analyses. External researchers making use of the meta-data
repository can then query for specific signatures in data sets, and
register their own annotations based on results of their own
analyses.
The
"Internet Meta-Data Repository" architecture will be
sufficiently flexible to support a variety of data sets and include
capabilities to add user-defined annotations. Measurement data from
tools5 such
as skitter[12],
Mantra[13],
CoralReef[7],
cflowd[9], FlowScan[10], and NeTraMet[8], as well as
web cache logs, Route-Views[14], and MRTd[15] will initially seed the meta-data repository.
Raw data or meta-data may then be catalogued for distribution from
the sites providing the data.
Participants
can submit data sets with extensive annotations gathered as the set
was collected and processed, including user-perceived performance
data or exogenous events occurring during the trace (e.g., users
observed that the network was `slow' at time t1, or the
campus web cache was disabled at time t2 during this data
set). More importantly, subsequent research on the same traces
could yield additional knowledge that could subsequently be
annotated, rather than only archived in the prose of
less-accessible journal or workshop
proceedings.
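As a rough illustration of the kind of annotation described above, the following Python sketch models annotations as records attached to a trace and looks up those that cover a given moment. The class name, field names, and sample values are hypothetical; they are not part of any format proposed here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Annotation:
    """One hypothetical annotation on a trace (all fields are assumptions)."""
    trace_id: str   # identifier of the annotated data set
    start: float    # start of the annotated interval, seconds into the trace
    end: float      # end of the annotated interval, seconds into the trace
    author: str     # who registered the annotation
    note: str       # free-text observation


def covering(annotations: List[Annotation], trace_id: str, t: float) -> List[Annotation]:
    """Return the annotations on `trace_id` whose interval contains time `t`."""
    return [a for a in annotations
            if a.trace_id == trace_id and a.start <= t <= a.end]


# Illustrative values only: two observers annotate the same trace independently.
notes = [
    Annotation("trace-A", 120.0, 300.0, "user", "network perceived as slow"),
    Annotation("trace-A", 250.0, 900.0, "operator", "campus web cache disabled"),
    Annotation("trace-B", 0.0, 60.0, "analyst", "unrelated trace"),
]

for a in covering(notes, "trace-A", 275.0):
    print(a.author, "-", a.note)
```

Keeping each annotation as a separate record, rather than editing the trace itself, is one way to let independent analyses accumulate on the same data set.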
Maximizing the
quality of data in the repository will be difficult without
ensuring the availability and dissemination of measurement tools
with compelling operational relevance, as described in section 2.1.
However, one aspect of this project will be to ensure that
measurement tools are sufficiently responsive to user needs. While
we will support existing tools, our core focus concerns specifying
and developing the back end information management system necessary
to support a wide variety of tools, traces, and analysis needs for
the community. As such, this project provides an opportunity to
bring several of the independent measurement projects in the
community together.
2.3 Goal 3: Apply Repository Data to
Current Research Problems
The proposed
Internet meta-data repository, carefully architected and annotated,
will significantly advance the possibilities for Internet data
analysis and modeling. Without such a repository, Internet research
will continue to be handicapped by lack of baseline data calibrated
against real traffic behavior. For example, after identifying a
given `killer application' protocol, a researcher using the
proposed resources could determine when its usage began, and
annotate related traces accordingly. The comprehensive nature of
the data and the ability to tie different data sets together, will
enable us to explore macroscopic questions regarding Internet
robustness and efficiency that we cannot answer from single
viewpoints.
For example,
sets of potential Internet research questions are presented below,
roughly organized by analysis category. Many of these questions
have political and regulatory relevance (e.g., trade balance of
traffic, traffic favoritism, traffic locality). Answers to these
questions require the wide variety in type, scale, and global
context of data that the proposed meta-data repository will
afford:
2.3.1 Workload Trend
Research
- For what
applications or traffic categories is usage growing most
quickly?
- How rapidly
are new protocols such as SCTP and RTP being deployed on the
Internet? (Such protocols provide alternatives to TCP and UDP for
newly emerging services.)
- How much
growth is there in tunneling technologies (e.g., encapsulation for
IPv6, IPsec, MPLS) and how does this growth impact levels of packet
fragmentation?
- To what degree
is traffic growth due to more users and to what degree is it due to
more traffic/user? (for various definitions of `user', e.g., hosts,
prefixes, sites, ASes, and aggregating effects such as web proxies,
IP masquerading at firewalls, and de-aggregating effects such as IP
assignment on dialup modems).
- How can we
classify traffic categories at a semantically higher level, such as
behavioral characteristics, without relying only upon inconclusive
or even possibly misleading header fields such as TCP/UDP ports? In
particular, what traffic classifications are useful for engineering
purposes (beyond rudimentary `bulk transfer' vs `interactive') and
what characteristics are best used for the classification (e.g.,
inter-arrival time distribution, directional symmetry, packet sizes
and directional sequence patterns, initial ports, distribution of
destinations per source, address signatures, matrix of host pairs
making many connections)?
- How do
different models of flows compare (e.g., SYN/FIN vs timeout-based
definitions) for a given trace in terms of statistics such as flow
size distributions[10]?
- Is traffic
locality changing with growth, e.g., what percent of traffic stays
within a campus, region, or country?
- How much
global distributed denial-of-service activity is occurring (using
information coordinated from multiple sites)? Note that two of the
PI/KTPs have recently published a paper quantitatively assessing
the degree of global DDoS activity on the Internet based on CAIDA
measurements [4]. The
methodology described in that paper was recently applied to track
hosts infected with several variations of the Code Red
worm.
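To make the question about flow models concrete, here is a minimal Python sketch that applies a timeout-based definition and a SYN/FIN-based definition to the same packet list and returns the resulting flow sizes. The `Packet` record, the 64-second default timeout, and the treatment of flows as unidirectional are assumptions chosen for illustration; they do not describe any particular tool.

```python
from collections import namedtuple

# Hypothetical packet record: timestamp (s), addresses, ports, TCP flags, bytes.
Packet = namedtuple("Packet", "ts src dst sport dport flags size")


def _key(p):
    """Unidirectional flow key (an assumption; bidirectional keys are also common)."""
    return (p.src, p.dst, p.sport, p.dport)


def flows_by_timeout(packets, timeout=64.0):
    """Flow sizes when a flow ends after `timeout` seconds of inactivity."""
    last_seen, current, sizes = {}, {}, []
    for p in sorted(packets, key=lambda p: p.ts):
        k = _key(p)
        if k in last_seen and p.ts - last_seen[k] > timeout:
            sizes.append(current.pop(k))      # idle gap closes the old flow
        current[k] = current.get(k, 0) + p.size
        last_seen[k] = p.ts
    sizes.extend(current.values())            # flows still open at end of trace
    return sorted(sizes)


def flows_by_syn_fin(packets):
    """Flow sizes when a flow runs from a SYN to the next FIN or RST."""
    current, sizes = {}, []
    for p in sorted(packets, key=lambda p: p.ts):
        k = _key(p)
        if "S" in p.flags and k not in current:
            current[k] = 0                    # SYN opens a flow
        if k in current:
            current[k] += p.size
            if "F" in p.flags or "R" in p.flags:
                sizes.append(current.pop(k))  # FIN or RST closes it
    sizes.extend(current.values())            # flows never closed in the trace
    return sorted(sizes)


# Illustrative trace: one connection with a long idle period in the middle.
trace = [
    Packet(0.0, "a", "b", 1025, 80, "S", 40),
    Packet(1.0, "a", "b", 1025, 80, "A", 1000),
    Packet(200.0, "a", "b", 1025, 80, "A", 500),
    Packet(201.0, "a", "b", 1025, 80, "F", 40),
]
print(flows_by_timeout(trace))   # the idle gap splits the connection in two
print(flows_by_syn_fin(trace))   # the connection counts as a single flow
```

On this toy trace the two definitions disagree on both the number of flows and their sizes, which is the kind of difference a comparison across real traces would need to quantify.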
2.3.2 Performance Assessment
Research
- Using patterns
of acknowledgments, sequence numbers, and advertised window sizes,
how much bandwidth is wasted on retransmissions? (e.g., a
congestion indicator might be defined based on ACK retransmissions
within 12-20 secs compared to the number of outstanding unACKed
packets).
- How elastic
(responsive to congestion conditions) are flows at various levels
of granularity (host, net, autonomous system,
city)?
- How common are
high-bandwidth flows in the Internet that are not using end-to-end
congestion control?
- Are TCP flows
really using their entire bandwidth-delay product? How much buffer
space should routers allocate for TCP
flows?
- What is the
macroscopic effect of flash events on Internet traffic behavior,
e.g., unsuccessful presidential election or transition to gTLD
server infrastructure?
- What are
performance effects of violations of the traditional end-to-end
model, e.g., transparent caching, global load balancing,
CDNs?
- How does the
DNS system perform, e.g., has the gTLD mesh improved the
macroscopic performance for users[8]? Brownlee and Nemeth have done passive
and active analysis of the root name server and gTLD mesh over the
last year, and are developing robust methods for correlating the
two techniques.
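The retransmission question in the first item above can be approximated from packet headers alone. The Python sketch below treats a payload-carrying segment as a retransmission when its flow has already sent the same starting sequence number, and reports retransmitted bytes as a fraction of all payload bytes. This is a deliberately simplified heuristic under stated assumptions: the `(flow, seq, length)` tuple layout is hypothetical, and partial overlaps and sequence number wraparound are ignored.

```python
def retransmission_fraction(segments):
    """Fraction of payload bytes that repeat an already-seen sequence number.

    `segments` is an iterable of (flow_id, seq, payload_length) tuples in
    capture order. Pure ACKs (length 0) are skipped because they carry no data.
    """
    seen = set()
    total = retransmitted = 0
    for flow, seq, length in segments:
        if length == 0:
            continue
        total += length
        if (flow, seq) in seen:
            retransmitted += length   # same starting sequence number seen before
        else:
            seen.add((flow, seq))
    return retransmitted / total if total else 0.0


# Illustrative values only: flow "a" resends one 100-byte segment.
sample = [
    ("a", 1, 100),
    ("a", 101, 100),
    ("a", 101, 100),   # retransmission
    ("a", 201, 0),     # pure ACK, ignored
    ("b", 1, 50),
]
print(round(retransmission_fraction(sample), 3))
```

A production analysis would also need to separate genuine retransmissions from packets observed twice because of routing or monitor placement.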
2.3.3 Topology Correlations to
Workload, Performance, and Routing
- To what
countries is the US a net exporter of IP packets? Are these numbers
growing or shrinking?6
- How can we
identify (and monitor long-term performance of) critical routers
and sites that play a significant (and thus perhaps vulnerable)
role in the infrastructure?
- How can we
develop a calculus for describing and drawing the difference
between two given `snapshots' of network
topology?
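The net-exporter question above reduces to a per-country balance once each packet or flow record has been mapped to source and destination countries. The Python sketch below assumes that mapping has already been done elsewhere (it requires unsanitized addresses and a geolocation step that is not shown); the record layout and country codes are illustrative assumptions.

```python
from collections import defaultdict


def net_export(records, home="US"):
    """Per-country packet balance relative to `home`.

    `records` is an iterable of (src_country, dst_country, packets) tuples.
    A positive value means `home` sent more packets to that country than it
    received from it.
    """
    balance = defaultdict(int)
    for src, dst, packets in records:
        if src == home and dst != home:
            balance[dst] += packets      # exported from home
        elif dst == home and src != home:
            balance[src] -= packets      # imported into home
    return dict(balance)


# Illustrative values only.
sample = [
    ("US", "NZ", 500),
    ("NZ", "US", 200),
    ("DE", "US", 300),
    ("US", "US", 999),   # domestic traffic does not affect the balance
    ("DE", "NZ", 50),    # third-party traffic is ignored
]
print(net_export(sample))
```

Computing the same balance over successive collection periods would show whether the numbers are growing or shrinking.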
2.3.4 Routing and Addressing
Research
- What are
long-term trends in per-prefix routing table growth? Is there an
uneven distribution of traffic exchanged with few
sites?
- How is IPv4
address space being announced versus actually used over time? [16]
- What gives
rise to the discrepancies seen between actual traffic behavior
(forward paths) and routing policies articulated via
BGP?
- What are the
macroscopic effects of different multicast architectures, e.g.,
traditional versus `single-source' multicast
(SSM)?
- What are
long-term trends in a) per-prefix routing table growth [17]? b) prevalence of packet
fragmentation [18]? c) number of globally reachable hosts?
d) IP path hop count distribution [12]? e) AS path length distribution [19]? f) traffic flow by prefix
length distributions [20]?
2.4 Key
Collaborators
CAIDA has
developed many collaborative relationships with Internet
researchers. Opportunities for sharing data and methods provide
benefit to the community at large. For
example:
- UO Route-Views: CAIDA is
collaborating with the University of Oregon's Advanced Network
Technology Center's Route-Views[14] project, which provides archives of a union of
several dozen unpruned backbone tables[21].
- MRA: Team members
are well acquainted with the principals involved in another NSF
project for "Multiresolution Analysis for the Global Internet" that
is actively pursuing analysis algorithms and methodologies using
both real and simulated network data. The research proposed here
complements and offers technologies and tools which help the MRA
effort to accomplish its goals. (A letter of support from an MRA
principal investigator is attached.)
- University of Wisconsin: Collaborator
David Plonka developed FlowScan - a network traffic flow
visualization and reporting utility. He uses it to monitor and
graph flow information from Cisco and Riverstone routers at the
University of Wisconsin in near real-time. The FlowScan utility has
proven to be very useful for facilitating traffic engineering, and
offers much needed support for managing time-series
data.
- Waikato University: The
WAND/WITS Project has developed a network hardware interface and
software drivers under subcontract to CAIDA. These DAG[3] Project interfaces support traffic
monitoring of up to OC48 speeds. WAND/WITS also publishes trace
data for network researchers.
- ISPs and Vendors: To be of
lasting effect for the continued evolution of the Internet, we
recognize that the measurement and meta-data repository
infrastructure will require support beyond the duration of this
project. CAIDA has a proven record of effective technology transfer
to industry and also of engaging industry in cost-sharing for
measurement and analysis activities of direct relevance to their
activities. We hope to take advantage of this experience in
creating a lasting architecture that will help support Internet
infrastructure research and development for at least the next
decade.
2.5 Curriculum
outreach
As part of this
project, CAIDA plans to develop network analysis curriculum
materials for undergraduate and graduate use as part of CAIDA's
NSF-supported Internet Engineering Curriculum (IEC) repository and
Internet Teaching Labs7. CAIDA will also sponsor tutorials
and workshops on how to use both the measurement tools themselves
as well as the data repository. For several years, the IEC project
held curriculum training workshops for professors of Internet
classes. Documentation from the proposed tasks would be ideal for
incorporation into future workshops, as well as the curriculum
repository itself. Curriculum modules for undergraduate and
graduate education will be patterned after the successful Traffic
Analysis module8 and suggested Projects for Networking
Classes page9.
3 Work Plan: Task Goals and
Objectives
Senior personnel
assigned to each task are listed, with the task lead indicated in
bold print. Top-level goals are given by year, along with more
specific objectives.
3.1 Task 1: Improve Internet
Measurement Instrumentation
Researchers:
Brownlee, Claffy, Moore, Voelker,
GSR(1)
Year One:
- Strategic
Deployment of High-Speed Passive Monitors
- Deploy 4-5
passive monitors at strategic high-speed commercial global Internet
locations.
- Identify
optimal data collection strategies.
- Coordinate
movement of traffic measurement data.
Objectives for Year
One include acquisition of tracefiles from strategically deployed
passive monitors.
Year Two:
- Develop
Meaningful, Maintainable Passive Traffic
Measurement.
- Based on
community feedback (See Task 3, Year One), run specialized targeted
trace collection.
- Correlate
active probes with passive measurement (header capture) techniques
for developing new Service Level Agreement
models.
Objectives for Year
Two include a monitor configuration and management application, as
well as documentation for traffic engineers wishing to run passive
monitoring applications similar to NetFlow[11], cflowd[9], FlowScan[10], CoralReef[7], and NeTraMet[8].
Year Three:
- Data
Collection and Annotation Refinement
- Coordinate
with results to date from Task 2 to automatically post-process,
filter, aggregate, annotate, or index collected tracefiles in order
to more efficiently facilitate their inclusion into published
traffic meta-data repositories.
Deliverables for
Year Three focus on automating maintenance and data collection on
deployed monitors.
3.2 Task 2: Develop Distributed
Meta-Data Repositories
Researchers:
Brownlee, Claffy, GSR(1)
Year One:
- Data modeling
activities.
- Standardize
traffic attributes and schemas (consider inclusion of XML DTD or
XML Schema definitions) that correspond to the various low-level
Internet traffic monitor data.
- Create logical
representations for a hierarchy of trace information from low-level
trace data fields through user-defined hierarchies of annotations,
to even higher event-level grouping relationships and
concepts
- Define
Internet data collection management strategies for handling
distributed repositories and archives.
Objectives for
Year One include a "Community Internet Data Annotation Language" that
will be drafted and submitted for review to the research
community.
Year Two:
- Evaluate
automatic methods for annotation and
meta-indexing.
- Evaluate trace
data annotation and meta-indexing
strategies.
- Standardize
the APIs and query interfaces to the traffic data collections. (For
example, consider using XPath or XQuery mechanisms to specify
analysis and filter methods.)
- Evaluate
requirements for interfacing with collection-based persistent
archive software.
Objectives for
Year Two include: Develop specifications for manipulating and
querying Internet data using a new yet "community-based"
approach.
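As one possible reading of the XML-based schema and query ideas mentioned for Years One and Two, the Python sketch below stores trace meta-data as XML and queries it with the limited XPath subset supported by the standard library's `xml.etree.ElementTree`. The element names, attribute names, and sample catalogue are hypothetical and do not represent a schema defined by this project.

```python
import xml.etree.ElementTree as ET

# Hypothetical meta-data catalogue; every name and value is an assumption.
CATALOGUE = """
<catalogue>
  <trace id="t1" link="OC12" site="campus-a">
    <annotation label="dos-suspected" start="1200" end="1260"/>
  </trace>
  <trace id="t2" link="OC3" site="campus-b">
    <annotation label="web-cache-disabled" start="0" end="600"/>
  </trace>
  <trace id="t3" link="OC12" site="campus-c"/>
</catalogue>
""".strip()

root = ET.fromstring(CATALOGUE)


def traces_on_link(root, link):
    """Ids of traces whose `link` attribute equals `link`."""
    return [t.get("id") for t in root.findall("trace[@link='%s']" % link)]


def traces_with_label(root, label):
    """Ids of traces carrying at least one annotation with the given label."""
    query = "annotation[@label='%s']" % label
    return [t.get("id") for t in root.findall("trace")
            if t.find(query) is not None]


print(traces_on_link(root, "OC12"))
print(traces_with_label(root, "dos-suspected"))
```

Because only meta-data is queried, the raw traces could remain at the sites that collected them, consistent with the distributed design described in section 2.2.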
Year Three:
- Data
collection for prototypes.
- Publish
Internet traffic data at distributed
sites.
- Define
specifications for data ingestion, storage, and interaction modules
using standards-based interfaces.
Objectives for
Year Three include: Distributed data collection and
publication.
3.3 Task 3: Apply Repository Data to
Current Internet Research Problems
Researchers:
Brownlee, Claffy, Moore, Voelker,
GSR(2)
Year One:
- Choose
research questions in response to concerns of both ISP operators
and high-speed application developers.
- Participate in
a separately funded Internet Statistics and Metrics Analysis (ISMA)
or other relevant conference for academic and commercial
researchers. Survey their research goals and
concerns.
- Design data
collection experiments to address selected research
questions.
- Define and
classify parameters and attributes for link capacity
measurements.
- Define
annotation language.
- Analyze and
visualize results. Publish results in technical journals and on
web-site.
Objectives: Papers,
Public web pages.
Years Two and Three:
- Continue
analysis and visualization activities taking into account currently
relevant research issues and questions.
4 Previous
Results
- "CAIDA:
Cooperative Association for Internet Data Analysis". NCR-9711092.
$3,143,580. Oct 1997 - Jul 2001. (Brownlee, Claffy, Moore, Murray)
This collaborative undertaking brings together organizations in the
commercial, government, and research sectors. CAIDA provides a
neutral framework to support cooperative technical endeavors, and
encourages the creation and dissemination of Internet traffic
metrics and measurement methodologies.10 Results of this collaborative
research and analytic environment can be seen on published web
pages.11
CAIDA also develops advanced Internet measurement and visualization
tools.12
- "IEC:
Internet Engineering Curriculum Repository". ANI-97-06181.
$590,555. Aug 1997 - Sep 2001. (Claffy) This CAIDA project helps
educators and others interested in Internet technology to keep up
with developments in the field. A repository of collected teaching
materials is published on the web.13.
- "Internet
Atlas". ANI-99-96248 $304,816. Jan 1999 - Dec 2001. (Claffy,
Murray) This effort involves developing techniques and tools for
mapping the Internet, focusing on Internet topology, performance,
workload, and routing data. A gallery that presents and evaluates
state-of-the-art techniques and tools in this nascent sector is
published on the web.14.
5 Management
Plan
All research
participants have significant experience mentoring students. CAIDA
staff and previous students have produced software prototypes and
production code. In addition:
- Administration: Administration
of the project will be provided by
CAIDA/SDSC.
- Communication: Communication
will be facilitated by monthly project status meetings.
Additionally, personnel involved with each specific task, including
post-docs and students, will set a meeting schedule as appropriate
for reviewing progress and investigating results of research. CAIDA
also has a well-regarded history of remote online collaboration
using text-based virtual environments, including one in support of
inter-ISP coordination[22].
- Workshops: Yearly
workshops, funded by other grants, will bring together members of
the academic and commercial research community to guide the
evolution of yearly goals for analysis and instrumentation. The
workshop will be attended by all proposal participants, including
junior personnel. We plan to invite CAIDA industrial sponsors and
members to the workshops. Additionally, we expect to have smaller
joint workshops with other network research organizations (e.g.,
with the MRA project collaborators). Historically, CAIDA has
sponsored a series of workshops on Internet Statistics and Metrics
Analysis (ISMA), where both academics and industrial researchers
can meet with traffic engineering and operations personnel to
discuss issues and solution strategies.15
- Dissemination of Results: In addition to
publication of research results through scientific journals, we
will make results of this project available in several other ways.
Data, tools and specifications developed during the course of this
project will be made available via the CAIDA
website.
6 Conclusion: Significance of Proposed
Effort
As it grows, the Internet is becoming more fragile in many ways.
The complexity in managing or repairing damage to the system
can only be navigated with sustained understanding
of the evolving commercial Internet infrastructure.
The research and tools proposed under this effort
lead to such insights. In particular, richer access to
data will facilitate development of tools for navigation,
analysis, and correlated visualization of
massive network data sets and path specific performance and
routing data that are critical to advancing both research
and operational efforts.
We also expect to be able to offer suggestions to ISPs and
routing vendors with respect to what instrumentation within the
router would facilitate diagnosing and fixing problems in [closer
to] real-time. Finally, this research has obvious relevance to
public policy and regulatory questions regarding concentration of
administration of Internet infrastructure.