CAIDA Program Plan - 2001
The Cooperative Association for Internet Data Analysis (CAIDA) is entering
its third year of operation and its final year of seed funding from the National
Science Foundation (NSF). CAIDA remains devoted to promoting greater cooperation
in the engineering and maintenance of a robust, scalable global Internet infrastructure.
Areas where CAIDA believes it can make important contributions to community
advancements include: the development of traffic measurement and analysis tools
and techniques, network visualization, and Internet engineering related education
and outreach.
The sections below provide a broad framework for CAIDA's programmatic priorities
during FY2001, including both federally-supported and member-supported efforts.
The sections are divided into the following categories:
1. Measurement Tool Development
1.A. Hardware
1.B. Software
2. Meta-Repository
3. Traffic Analysis
3.A. Tools
3.B. Analysis
4. Measurement Infrastructure
5. Visualization
6. Education and Outreach
6.A. IEC
6.B. ITL
6.C. ISMA
6.D. Community Resources
Fundamental to CAIDA's research endeavors is the development
and implementation of solutions for passively acquiring
traffic data (e.g., packet headers) using stand-alone
monitors. Currently, UNIX hardware implementations are
available at OC3 and OC12 ATM speeds. Drivers and most of the
hardware schematics are publicly available and 3rd party
systems integrators are building monitors using the Applied
Telecom and Fore cards. Waikato University's Dag OC3/12
ATM/POS cards are available to select research entities.
Goals:
1.A.1. Collaborate with the University of Waikato, NZ to develop and
test a Dag-4 OC48 ATM/POS / CoralReef monitor.
1.A.2. Collaborate with the University of Waikato, NZ to develop and
test a Dag-4 Gigabit Ethernet/CoralReef monitor.
1.A.3. Deploy above monitors and collect data into meta-repository.
1.A.4. Research (FPGA design) and begin movement of measurement analysis
features into firmware.
skitter is a traceroute-like tool used in infrastructure-wide measurements and
analyses (routing, performance, connectivity) supporting projects like CAIDA's
NGI initiative for DARPA and research by several collaborating institutions.
Goals:
1.B.1. Continue enhancement of skitter modules to reflect
Member and research priorities. Use the Caimis
version of skitter where appropriate (http://www.caimis.com/).
1.B.2. Analysis software (Cflowd, NeTraMet, FlowScan): see below.
Project Leads: David Moore
Funding Sources: Membership and DARPA
Collaborators / Contributors: TBD
We will design an annotation system repository in which meta-data for data
sets is archived and served from many other sites. We will develop common formats,
terminology, and a formal language to allow multiple annotations to a given
data set based on independent analyses. Others making use of the repository
can then query for specific signatures in data sets, and register their own
annotations based on results of their analyses.
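As a rough illustration of the annotation idea described above, the following minimal sketch keeps several independent annotations per data set and lets a user query by signature. It is an in-memory stand-in only; the class names, field names, and signature labels are hypothetical and do not reflect any format CAIDA has defined.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """One analyst's claim about a data set (all field names are hypothetical)."""
    dataset_id: str   # identifier of the annotated trace or table
    author: str       # who performed the analysis
    signature: str    # short label for the observed feature
    details: dict = field(default_factory=dict)  # free-form supporting data


class AnnotationRegistry:
    """In-memory stand-in for the proposed distributed meta-repository."""

    def __init__(self):
        self._by_dataset = {}

    def register(self, annotation):
        # Multiple independent annotations per data set are allowed.
        self._by_dataset.setdefault(annotation.dataset_id, []).append(annotation)

    def annotations_for(self, dataset_id):
        # Return a copy so callers cannot modify the registry by accident.
        return list(self._by_dataset.get(dataset_id, []))

    def find_datasets(self, signature):
        # Return ids of data sets carrying at least one matching annotation.
        return sorted(
            ds for ds, notes in self._by_dataset.items()
            if any(n.signature == signature for n in notes)
        )


registry = AnnotationRegistry()
registry.register(Annotation("trace-a", "alice", "syn-flood"))
registry.register(Annotation("trace-a", "bob", "route-flap"))
registry.register(Annotation("trace-b", "bob", "syn-flood"))
matching = registry.find_datasets("syn-flood")
```

A real repository would be distributed and would need the common formats and formal language mentioned above; the sketch only shows the register-and-query pattern.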
With the proposed meta-repository, researchers will have the opportunity to
correlate data across time, space (trace location), and data features. The comprehensive
nature of the data, and the ability to tie different sets together, will enable
exploration of macroscopic questions regarding Internet robustness and efficiency
that we cannot answer from single viewpoints, e.g., the potential consequences
of introducing new or emerging protocols and technology into current networks.
Participating institutions could compare workload characteristics from traffic
at their own sites with others.
Goals:
2.A. Design and establish an annotation and storage system to support
a distributed repository of demonstrably relevant Internet data sets.
2.B. Demonstrate and evolve the effectiveness of this repository for
answering many of our own research questions.
2.C. Provide training and outreach to both the operational and research
communities in how to effectively use and contribute to this repository.
Project Lead: kc claffy
Funding Sources: NSF Proposal submitted
Collaborators / Contributors: TBD
Goals:
3.A.1. Cflowd
is a flow analysis tool used for analyzing Cisco's NetFlow routing statistics
for trend analysis and capacity planning purposes. Further development of
this open-source (GPLed) tool will be completed by Caimis,
although we will house the code and support storage of data in this
tool's format. We will also collaborate with Cisco engineers on
early field trials (EFT) of new NetFlow versions.
3.A.2. arts++
is a system for storing (binary file format) and manipulating network data.
This open-source (LGPL) tool has been licensed to Caimis for further development.
3.A.3. CoralReef
is a comprehensive software suite developed by CAIDA to analyze data collected
by passive Internet traffic monitors using Fore, Applied Telecom, or Dag cards.
3.A.3.a. Enhance the reporting formats and graphics capabilities
of CoralReef analysis software
3.A.3.b. Develop a CoralReef application that outputs NetFlow flow-export
format.
3.A.4. Tobi Oetiker, who joined CAIDA for a sabbatical in 1999, completed
and released RRDtool
(Round-Robin Database tool, a system to store and display time-series data)
and continues to maintain it.
3.A.5. NeTraMet
is an open-source (GPL) implementation of the RTFM architecture for Network
Traffic Flow Measurement, developed and supported by Nevil Brownlee at the
University of Auckland. Nevil also developed a version of NeTraMet that uses
the CoralReef library to read packet headers. This 'CoralReef NeTraMet meter'
can work with any CoralReef data source; it has been tested on both CAIDA
and NLANR trace files, and on Dag and Apptel ATM interface cards.
3.A.6. FlowScan
analyzes and reports on IP flow data exported by routers. Consisting of Perl
scripts and modules, FlowScan binds together (1) a flow collection engine
(a patched version of cflowd), (2) a high performance database (Round Robin
Database - RRD), and (3) a visualization tool (RRDtool).
FlowScan produces graph images that provide a continuous, near real-time view
of the network border traffic.
Project Leads: David Moore (CoralReef), Nevil Brownlee (NeTraMet), David
Plonka (FlowScan)
Funding Sources: Membership, NSF, DARPA
Collaborators / Contributors: TBD
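The round-robin storage idea behind the time-series tools listed above can be sketched as a fixed-size circular buffer in which the newest sample overwrites the oldest. This is only an illustration of the concept, not RRDtool's actual file format or interface; the class and method names are hypothetical, and features such as consolidation of samples are omitted.

```python
class RoundRobinArchive:
    """Fixed-size circular store: the newest sample overwrites the oldest."""

    def __init__(self, slots):
        if slots <= 0:
            raise ValueError("slots must be positive")
        self._values = [None] * slots
        self._next = 0      # index where the next sample will be written
        self._count = 0     # number of valid samples, capped at `slots`

    def update(self, value):
        self._values[self._next] = value
        self._next = (self._next + 1) % len(self._values)
        self._count = min(self._count + 1, len(self._values))

    def fetch(self):
        """Return stored samples from oldest to newest."""
        size = len(self._values)
        start = (self._next - self._count) % size
        return [self._values[(start + i) % size] for i in range(self._count)]


archive = RoundRobinArchive(slots=3)
for sample in [1, 2, 3, 4, 5]:
    archive.update(sample)
recent = archive.fetch()  # only the three most recent samples remain
```

Because the storage never grows, such an archive can record a continuous measurement stream indefinitely at a fixed cost, at the price of discarding old detail.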
Analysis of Internet traffic (active and passive) and
routing data can provide insights into the nature and
evolution of Internet traffic and network topologies, assist
engineers to better architect and manage their networks, and
enable researchers to better design and implement emerging
Internet protocols and technology.
Goals:
3.B.1. Research and develop technical papers on the following topics:
- techniques for evaluating architectural optimality for placement of DNS
root servers, mirrors, and other infrastructure
- correlation of active (skitter), passive and/or routing data across time,
sources, and data types.
- analysis of techniques and uses for visualizations (2D, 3D and hyperbolic
layouts) of network-related data (e.g., active, passive and routing data).
- update and expand workload characterization concepts covered in the 1998
"Nature of the Beast" article.
3.B.2. Expand the collaborations with other research institutions
who are using skitter datasets in their research. Make skitter data sets available
to research community.
3.B.3. Facilitate meritorious research being conducted by 3rd parties,
either by offering them internships or sabbaticals at CAIDA or collaborating
with them directly.
3.B.4. CAIDA's own research tasks will involve using the meta-repository
described in Section 2. This will include answering the following questions, whose
answers will serve as milestones during the maturation of the meta-repository.
Our research reports will serve as status reports and advertisements of the
existence and effectiveness of contributing to and using the repository.
- How can we classify traffic categories at a semantically higher level,
such as behavioral characteristics, without relying only upon inconclusive
or even possibly misleading header fields such as TCP/UDP ports? In particular,
what traffic classifications are useful for engineering purposes (beyond
rudimentary `bulk transfer' vs `interactive') and what characteristics are
best used for the classification (e.g., inter-arrival time distribution,
directional symmetry, packet sizes and directional sequence patterns, address
signatures, matrix of host pairs making lots of connections)?
- For what applications or traffic categories described above is usage growing
most quickly?
- How elastic (responsive to congestion conditions) are flows at various
levels of granularity (host, net, autonomous system, city)?
- What gives rise to the discrepancies seen between actual traffic behavior
(forward paths) and routing policies articulated via BGP?
- How do different models of flows compare (e.g., SYN/FIN vs timeout-based
definitions) for a given trace in terms of metrics such as flow size
distributions? [Plonka00b]
- What effects do violations of the traditional end-to-end model, e.g.,
transparent caching, global load balancing, have on performance?
- What are the macroscopic effects of different multicast architectures,
e.g., traditional versus `single-source' multicast (SSM)?
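To make the flow-model question above concrete, the sketch below applies a timeout-based definition and a SYN/FIN-based definition to the same packet list and returns the resulting flow sizes in packets. The packet tuple layout, the function names, and the toy trace are assumptions made for illustration; real traces would also need direction handling, RST processing, and non-TCP traffic.

```python
def flows_by_timeout(packets, timeout):
    """Split each key's packets into flows wherever the idle gap exceeds `timeout`."""
    last_seen = {}   # key -> timestamp of the previous packet
    current = {}     # key -> packet count of the flow in progress
    sizes = []
    for ts, key, _flags in sorted(packets, key=lambda p: p[0]):
        if key in current and ts - last_seen[key] > timeout:
            sizes.append(current.pop(key))
        current[key] = current.get(key, 0) + 1
        last_seen[key] = ts
    sizes.extend(current.values())  # flows still active at end of trace
    return sorted(sizes)


def flows_by_syn_fin(packets):
    """Open a flow at SYN and close it at FIN; packets outside an open flow are ignored."""
    current = {}
    sizes = []
    for _ts, key, flags in sorted(packets, key=lambda p: p[0]):
        if "SYN" in flags and key not in current:
            current[key] = 0
        if key in current:
            current[key] += 1
            if "FIN" in flags:
                sizes.append(current.pop(key))
    sizes.extend(current.values())  # flows still open at end of trace
    return sorted(sizes)


# Toy trace: (timestamp in seconds, flow key, set of TCP flags).
trace = [
    (0.0, "a", {"SYN"}), (1.0, "a", set()), (2.0, "a", {"FIN"}),
    (3.0, "a", {"SYN"}), (4.0, "a", {"FIN"}),
    (100.0, "a", {"SYN"}), (101.0, "a", {"FIN"}),
]
timeout_sizes = flows_by_timeout(trace, timeout=60.0)
syn_fin_sizes = flows_by_syn_fin(trace)
```

On this toy trace the timeout definition merges the first two back-to-back connections into one flow, while the SYN/FIN definition keeps them separate, so the two flow size distributions differ even though the packets are identical.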
We will also pursue research tasks in longer-term trend tracking to answer
the following questions:
- How can we identify (and monitor long-term performance of) critical routers
and sites that play a significant (and thus perhaps vulnerable) role in
the infrastructure?
- Is traffic locality changing with growth, e.g., what percent of traffic
stays within a campus, region, or country?
- How is IPv4 address space being announced versus actually used over time?
[McCreary98]
- How does the DNS system perform, e.g., has the gTLD mesh improved the
macroscopic performance for users [Brownlee01]?
- To what degree is long-term traffic growth due to more users and to what
degree is it due to more traffic/user?
- How much growth is there in tunneling technologies (e.g., encapsulation
for IPv6, IPsec) and fragmentation?
- What is the macroscopic effect of flash events on Internet traffic behavior,
e.g., an unsuccessful presidential election or the transition to gTLD server
infrastructure?
Project Lead: David Moore, k claffy, Brad Huffaker
Funding Sources: DARPA, NSF
Collaborators / Contributors: TBD
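One way to frame the question above about growth from more users versus more traffic per user is a logarithmic decomposition: since total traffic equals users times traffic per user, the log of the traffic growth ratio is the sum of the log growth ratios of the two factors. The sketch below is a minimal illustration under that assumption; the function name and the input figures are hypothetical and are not measurements.

```python
import math


def growth_shares(users_start, users_end, traffic_start, traffic_end):
    """Split log traffic growth into a user-count share and a per-user share."""
    total = math.log(traffic_end / traffic_start)
    if total == 0:
        raise ValueError("traffic did not change; shares are undefined")
    from_users = math.log(users_end / users_start)
    per_user_start = traffic_start / users_start
    per_user_end = traffic_end / users_end
    from_per_user = math.log(per_user_end / per_user_start)
    # The two shares sum to 1 because log growth is additive across factors.
    return from_users / total, from_per_user / total


# Illustrative numbers only: users double while total traffic quadruples.
user_share, per_user_share = growth_shares(100, 200, 1000.0, 4000.0)
```

In this made-up case half of the growth is attributed to new users and half to heavier use per user. Other decompositions are possible; the logarithmic form is chosen here only because the shares add up exactly.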
CAIDA's forays into operational analyses have historically focused on collaborations
with individual ISP engineers and hardware vendors aimed at providing insights
concerning the workload characteristics of specific links. Future operational
analyses will include significant expansion of data sources and development of
new techniques for analysis of passively and actively collected data.
Goals:
4.1. Maintain a webpage and daily summary graphs of the performance
trends associated with each of the 13 DNS root servers.
4.2. Test NeTraMet, ported to Coral, at one or more trial sites.
4.3. Monitor traffic traversing links at UCSD/SDSC and AIX and generate
automatic html summaries.
4.4. Develop basic filtering and data collection mechanisms using CoralReef
at OC12 and OC48 speeds.
Project Lead: David Moore, Brad Huffaker, Nevil Brownlee
Funding Sources: Membership, NSF, DARPA
Collaborators / Contributors: TBD
Fundamental to the analyses and research described above is the availability
of data from active and passive traffic monitors.
Goals:
4.5. Maintain 25 distributed skitter monitors, various passive
monitors, and related network testing equipment.
4.6. Develop and maintain skitter destination target lists and related
databases (for global monitors and each root).
4.7. Deploy additional active measurement hosts in support of the DNS
root server monitoring activities.
4.8. Deploy passive monitors at 2-5 commercial facilities under appropriate
NDAs.
Project Lead: kc claffy
Funding Sources: Membership, DARPA, NSF
Collaborators / Contributors: TBD
Network visualization is in its early stages of development, and CAIDA continues
to contribute and develop several prototypes for the field. A core technology
necessary for most of the objectives in this project will be mechanisms to manipulate,
analyze, and navigate large BGP routing tables, and compare them not only to one
another but also to topology maps derived from active probe measurements. We expect
exploratory data analysis tools to allow more efficient analysis of
large Internet data sets, not limited to topology graphs but also incorporating
performance and workload data into link and node semantics.
Modeling the Internet Core
Goals:
5.1. Develop and implement techniques for visualizing massive volumes
of distributed, time-series path, performance, routing, and flow data.
5.2. Maintain the "Internet Atlas" website that will include examples and
resource links for geographic, semi-geographic, and topology-based
network-related visualizations; maintain alternative datasets for
visualization and experimentation by researchers and others.
5.3. We will continue development of a hyperbolic viewer (http://www.caida.org/tools/visualization/walrus/
- sample images available) for visualizing very large directed graphs, where
large is a few hundred thousand nodes to a million nodes, and about as many
links. It is applicable mainly to graphs that are tree-like in having a meaningful
spanning tree and in including a relatively small number of non-tree links.
It tackles the difficulties of visualizing large graphs in several ways.
5.4. Continue development of Otter, GeoPlot, and
related visualization code to enhance their relevance as tools for Internet
engineers and network architects. CAIDA has made modifications to its visualization
tool Otter (see http://www.caida.org/tools/visualization/otter/)
in order to accommodate the special semantics of routing tables, which opens
the tool up to fundamentally different and acutely relevant classes of Internet
visualization.
5.5. Continue to update the backbone database/visualization tool,
Mapnet, and integrate skitter.
Project Leads: kc claffy, Brad Huffaker and David Moore
Funding Sources: NSF, Membership
Collaborators / Contributors: Sun Microsystems, NSI, ARIN, APNIC
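Goal 5.3 above describes graphs that are tree-like, meaning they have a spanning tree plus a relatively small number of non-tree links. The sketch below shows one simple way to check that property: build a breadth-first spanning tree and count the links left over. It is not the viewer's own algorithm; the function name is hypothetical, links are treated as undirected for simplicity, and a breadth-first tree is only one of many possible spanning trees.

```python
from collections import deque


def spanning_tree_stats(links, root):
    """Return (tree_links, non_tree_links) for the component reachable from root.

    `links` is an iterable of (u, v) pairs treated as undirected.
    """
    neighbors = {}
    unique = set()
    for u, v in links:
        if u == v:
            continue  # ignore self-loops
        unique.add(frozenset((u, v)))
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # Breadth-first search records one parent per reached node.
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)

    tree_links = len(parent) - 1
    reachable_links = sum(
        1 for link in unique if all(n in parent for n in link)
    )
    return tree_links, reachable_links - tree_links


example_links = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
tree_count, extra_count = spanning_tree_stats(example_links, "a")
```

A low ratio of non-tree links to tree links suggests the graph is a reasonable candidate for a tree-based layout.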
The IEC was initiated in early 1998 with the goal of helping educators and others
interested in Internet technology keep up with developments in the field. The
focus is on developing and maintaining a repository of teaching materials to support
new University courses. Workshops such as the one held in August 1999 help to
facilitate Internet-related faculty's use of the repository.
Goals:
6.A.1. Expand and update the curriculum and related materials
available in the IEC repository, including addition of ITL-specific lab and
curriculum materials.
6.A.2. Publish a Traffic Analysis CD for use in Internet engineering
instruction. The CD will include: training materials, animations, analysis software,
lab tutorials, and traffic traces. Approximately 1,000 CDs will be distributed
to university professors and industry professionals in the U.S. and abroad.
6.A.3. Develop and make public a searchable database of Internet
engineering related faculty, institutions, and curriculum.
6.A.4. Develop a plan for IEC continuation (or transition to a 3rd party)
following conclusion of NSF support in 2001. Present the plan to the IEC Advisory
Committee for their consideration.
Project Leads: Evi Nemeth and Theresa Ott
Funding Sources: NSF (IEC grant) and Member (Cisco)
Collaborators / Contributors: Participating Universities
Few of the nation's Universities have courses in networking technology and even
fewer have facilities for hands-on exposure and training of students on current
Internet hardware and software. With donations of equipment from vendors, notably
Cisco, and with financial support from NSF, CAIDA is facilitating the establishment
of Internet teaching laboratories at approximately 20 Universities. Efforts are
currently focused on evaluation and selection of meritorious proposals; later
efforts will emphasize implementation of the ITL facilities and encouraging cooperation
and collaboration among the participants.
Goals:
6.B.1. Establish a prototype ITL lab at the University of
California, San Diego (UCSD) -- including development of appropriate lab and
tutorial materials.
6.B.2. Work with vendor and University PR officials to ensure the appropriate
press coverage of the award and inauguration of the ITL facilities.
6.B.3. Maintain an infrastructure and curriculum materials supporting
these laboratories.
6.B.4. Develop and implement plans for: (1) evaluation of ITL Phase 1;
(2) sustainability of the ITL collaboration following NSF funding support; and
as appropriate, (3) develop and implement a transition plan.
6.B.5. Hold a 3-4 day workshop for ITL faculty at the University of Virginia
in June 2001; topics include routing and traffic analysis.
6.B.6. Collaborate on online (distance education) seminar series on Introduction
to Internet engineering.
Project Leads: Evi Nemeth, Theresa Ott (CAIDA/IEC); Jorg Liebeherr (University
of Virginia)
Collaborators / Equipment Contributors: Cisco, Cable & Wireless, MCI Worldcom
As an outreach vehicle to the research and commercial communities, CAIDA periodically
holds invitational Internet statistics and metrics analysis workshops. ISMA workshops
are held to discuss the current and future state of Internet measurement and analysis.
The intent of the workshops is to facilitate discussion among communities of
academia, equipment vendors, and service providers, who share an interest in and
incentive to understand one another's interests and concerns with Internet statistics
and analysis.
Goals:
6.C.1. Hold 1-2 ISMA workshops in (March, September) 2001 in support
of the network modeling and routing research communities.
6.C.2. Submit papers and collaborate, specifically with the program
committee, on the PAM-2001 Passive & Active Measurement Workshop scheduled
for Amsterdam, April 23-24, 2001.
Project Leads: k claffy
Funding Sources: NSF, Membership
Collaborators / Contributors: Waikato University, RIPE
6.D.
Collaboration with R&E and commercial community
Products of proposed effort
- the Internet data meta-repository
- tools and techniques for analyzing data in the repository
- a community of participants actively contributing, using and collaborating
on projects to analyze the data
- training and curriculum modules for undergraduate and graduate education
We will work with the network modeling and simulation (NMS) community to identify
what formats of datasets are most useful to them. Each format must focus on
the potential use of the data, e.g., a study focusing on emerging protocols
may require different data format from one designed to profile the effects of
network congestion, outages, or route flapping. Datasets will include:
- Inter-domain routing
- Topology and Performance
- Traffic Workload
- Multicast Traffic Behavior
In 1998, Cisco Systems funded CAIDA to develop a taxonomy of
Internet measurement and analysis tools. CAIDA maintains
and updates this taxonomy weekly; it covers public
and proprietary measurement tools, initiatives, and
infrastructures, weather-report related services, and
network visualization tools. Expansion of the scope and
quality of the materials available at this site will
continue through FY2001.
Goals:
6.D.2.A. Maintain the Tool Taxonomy web site
Project Lead: Margaret Murray
Funding Sources: Cisco and NSF
Collaborators / Contributors: Various organizations contributing information
In 1999, CAIDA established a metrics working group (WG) to document network
measurements and develop specifications for a set of useful 'standard' metrics.
Goals:
6.D.3.A. Publish Metrics FAQ in final form on CAIDA web site, announce on
NANOG list.
6.D.3.B. Produce best current practice document listing which metrics
providers/enterprise network managers should measure, and
give details of how to measure them effectively.
6.D.3.C. Encourage CAIDA WG members to share information about their
current work on new metrics.
Project Lead: Nevil Brownlee
Funding Sources: Cisco and NSF
Collaborators / Contributors: Various organizations
contributing information
Encourage the development of traffic measurement standards,
within the IETF or by industry groups. Topics of particular
interest include:
- A. A standard definition of traffic flows, suitable for use by
network equipment vendors and by vendors of systems which
use flow data, e.g., service provider billing systems.
- B. A standard protocol for the interchange of traffic flow
data. This should be simple enough to be implemented in
high-speed hardware, so as to support the development of
distributed packet header data collection systems.