CAIDA Program Plan - 2001

The Cooperative Association for Internet Data Analysis (CAIDA) is entering its third year of operation and its final year of seed funding from the National Science Foundation (NSF). CAIDA remains devoted to promoting greater cooperation in the engineering and maintenance of a robust, scalable global Internet infrastructure. Areas where CAIDA believes it can make important contributions to community advancements include: the development of traffic measurement and analysis tools and techniques, network visualization, and Internet engineering related education and outreach.

The sections below provide a broad framework for CAIDA's programmatic priorities during FY2001, including both federally-supported and member-supported efforts. The sections are divided into the following categories:

1. Measurement Tool Development

1.A. Hardware
1.B. Software

2. Meta-Repository

3. Traffic Analysis

3.A. Tools
3.B. Analysis

4. Measurement Infrastructure

5. Visualization

6. Education and Outreach

6.A. IEC
6.B. ITL
6.C. ISMA Workshops
6.D. Community Resources

1. Measurement Tool Development

1.A. Hardware - OC48 and Gig-Ether (finishing DARPA NGI SOW)

Fundamental to CAIDA's research endeavors is the development and implementation of solutions for passively acquiring traffic data (e.g., packet headers) using stand-alone monitors. Currently, UNIX hardware implementations are available at OC3 and OC12 ATM speeds. Drivers and most of the hardware schematics are publicly available, and 3rd party systems integrators are building monitors using the Applied Telecom and Fore cards. Waikato University's Dag OC3/12 ATM/POS cards are available to select research entities.


1.A.1. Collaborate with the University of Waikato, NZ to develop and test a Dag-4 OC48 ATM/POS / CoralReef monitor.
1.A.2. Collaborate with the University of Waikato, NZ to develop and test a Dag-4 Gigabit Ethernet/CoralReef monitor.
1.A.3. Deploy above monitors and collect data into meta-repository.
1.A.4. Research (FPGA design) and begin movement of measurement analysis features into firmware.

1.B. Software - Skitter

skitter is a traceroute-like tool used in infrastructure-wide measurements and analyses (routing, performance, connectivity), supporting projects like CAIDA's NGI initiative for DARPA and research by several collaborating institutions.

1.B.1. Continue enhancement of skitter modules to reflect Member and research priorities. Use Caimis version of skitter where appropriate.
1.B.2. Analysis software (Cflowd, NeTraMet, FlowScan) see below.

Project Leads: David Moore
Funding Sources: Membership and DARPA
Collaborators / Contributors: TBD

2. Meta-Repository

We will design an annotation system repository in which meta-data for data sets is archived and served from many other sites. We will develop common formats, terminology, and a formal language to allow multiple annotations to a given data set based on independent analyses. Others making use of the repository can then query for specific signatures in data sets, and register their own annotations based on results of their analyses.

With the proposed meta-repository, researchers will have the opportunity to correlate data across time, space (trace location), and data features. The comprehensive nature of the data, and the ability to tie different sets together, will enable exploration of macroscopic questions regarding Internet robustness and efficiency that we cannot answer from single viewpoints, e.g., the potential consequences of introducing new or emerging protocols and technology into current networks. Participating institutions could compare workload characteristics from traffic at their own sites with others.
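
As a rough illustration of the register-and-query cycle described above, the following minimal sketch keeps annotations in memory and answers overlap queries by feature and time interval. It is not a proposed design: the class names, field names, and feature labels are all hypothetical, and a real repository would be distributed and use the common formats and formal language still to be developed.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """One analyst's note about a data set (all field names are hypothetical)."""
    dataset_id: str   # which data set the note refers to
    author: str       # who registered the annotation
    feature: str      # signature found in the data, e.g. "syn-flood"
    start: int        # start of the annotated interval (Unix seconds)
    end: int          # end of the annotated interval (Unix seconds)


@dataclass
class MetaRepository:
    """Holds meta-data only; the data sets themselves stay where they are."""
    annotations: list = field(default_factory=list)

    def register(self, annotation):
        """Add an annotation; several may refer to the same data set."""
        self.annotations.append(annotation)

    def query(self, feature, start, end):
        """Return annotations for `feature` whose interval overlaps [start, end]."""
        return [a for a in self.annotations
                if a.feature == feature and a.start <= end and a.end >= start]


repo = MetaRepository()
repo.register(Annotation("trace-A", "site1", "syn-flood", 100, 200))
repo.register(Annotation("trace-B", "site2", "syn-flood", 150, 300))
repo.register(Annotation("trace-A", "site2", "route-flap", 120, 130))

# Which data sets show the same signature during an overlapping period?
hits = repo.query("syn-flood", 180, 250)
matched = sorted(a.dataset_id for a in hits)
```

Because each annotation carries its own author and interval, independent analyses of the same trace can coexist, which is the property that correlation across time, location, and data features depends on.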


2.A. Design and establish an annotation and storage system to support a distributed repository of demonstrably relevant Internet data sets.
2.B. Demonstrate and evolve the effectiveness of this repository for answering many of our own research questions.
2.C. Provide training and outreach to both the operational and research communities in how to effectively use and contribute to this repository.

Project Lead: kc claffy
Funding Sources: NSF Proposal submitted
Collaborators / Contributors: TBD

3. Traffic Analysis

3.A. Tools


3.A.1. Cflowd is a flow analysis tool used for analyzing Cisco's NetFlow routing statistics for trend analysis and capacity planning purposes. Further development of this open-source (GPLed) tool will be completed by Caimis, although we will house the code and support storage of data in this tool's format. We will also collaborate with Cisco engineers to EFT new NetFlow versions.
3.A.2. arts++ is a system for storing (binary file format) and manipulating network data. This (open-source, L-GPL) tool has been licensed to Caimis for further development.
3.A.3. CoralReef is a comprehensive software suite developed by CAIDA to analyze data collected by passive Internet traffic monitors using Fore, Applied Telecom, or Dag cards.

3.A.3.a. Enhance the reporting formats and graphics capabilities of CoralReef analysis software
3.A.3.b. Develop a CoralReef application that outputs NetFlow flow-export format.

3.A.4. Tobi Oetiker, who joined CAIDA for a sabbatical in 1999, completed and released RRDtool (Round-Robin Database tool, a system to store and display time-series data) and continues to maintain it.
3.A.5. NeTraMet is an open-source (GPL) implementation of the RTFM architecture for Network Traffic Flow Measurement, developed and supported by Nevil Brownlee at the University of Auckland. Nevil also developed a version of NeTraMet that uses the CoralReef library to read packet headers. This 'CoralReef NeTraMet meter' can work with any CoralReef data source; it has been tested on both CAIDA and NLANR trace files, and on Dag and Apptel ATM interface cards.
3.A.6. FlowScan analyzes and reports on IP flow data exported by routers. Consisting of Perl scripts and modules, FlowScan binds together (1) a flow collection engine (a patched version of cflowd), (2) a high performance database (Round Robin Database - RRD), and (3) a visualization tool (RRDtool). FlowScan produces graph images that provide a continuous, near real-time view of the network border traffic.
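
The round-robin storage idea behind RRDtool and FlowScan can be illustrated with a minimal sketch: a fixed number of slots, where each new sample overwrites the oldest once the archive is full, so storage never grows. This is only an illustration of the concept; it is not RRDtool's file format or API, and the class and method names are hypothetical.

```python
class RoundRobinArchive:
    """Fixed-size store: once full, each new sample overwrites the oldest one."""

    def __init__(self, slots):
        self.slots = slots
        self.values = [None] * slots
        self.count = 0  # total number of samples ever written

    def update(self, value):
        """Write a sample into the next slot, wrapping around at the end."""
        self.values[self.count % self.slots] = value
        self.count += 1

    def fetch(self):
        """Return the retained samples, oldest first."""
        if self.count <= self.slots:
            return self.values[:self.count]
        head = self.count % self.slots  # index of the oldest retained sample
        return self.values[head:] + self.values[:head]


rra = RoundRobinArchive(slots=4)
for sample in [10, 20, 30, 40, 50, 60]:
    rra.update(sample)
recent = rra.fetch()  # only the four most recent samples remain
```

The fixed footprint is what makes this kind of store practical for continuous, near real-time graphs of long-running measurements.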

Project Leads: David Moore (CoralReef), Nevil Brownlee (NeTraMet), David Plonka (FlowScan)
Funding Sources: Membership, NSF, DARPA
Collaborators / Contributors: TBD

3.B. Analysis

Analysis of Internet traffic (active and passive) and routing data can provide insights into the nature and evolution of Internet traffic and network topologies, assist engineers to better architect and manage their networks, and enable researchers to better design and implement emerging Internet protocols and technology.


3.B.1. Research and develop technical papers on the following topics:

  • techniques for evaluating architectural optimality for placement of DNS root servers, mirrors, and other infrastructure
  • correlation of active (skitter), passive and/or routing data across time, sources, and data types.
  • analysis of techniques and uses for visualizations (2D, 3D and hyperbolic layouts) of network-related data (e.g., active, passive and routing data).
  • update and expand workload characterization concepts covered in 1998 Nature of the Beast article.

3.B.2. Expand the collaborations with other research institutions that are using skitter datasets in their research. Make skitter data sets available to the research community.
3.B.3. Facilitate meritorious research being conducted by 3rd parties, either by offering them internships or sabbaticals at CAIDA or collaborating with them directly.
3.B.4. CAIDA's own research tasks will involve using the meta-repository described in Section 2. This will include answering the following questions, whose answers will serve as milestones during the maturation of the meta-repository. Our research reports will serve as status reports and advertisements of the existence and effectiveness of contributing to and using the repository.

  1. How can we classify traffic categories at a semantically higher level, such as behavioral characteristics, without relying only upon inconclusive or even possibly misleading header fields such as TCP/UDP ports? In particular, what traffic classifications are useful for engineering purposes (beyond rudimentary `bulk transfer' vs `interactive') and what characteristics are best used for the classification (e.g., inter-arrival time distribution, directional symmetry, packet sizes and directional sequence patterns, address signatures, matrix of host pairs making lots of connections)?
  2. For what applications or traffic categories described above is usage growing most quickly?
  3. How elastic (responsive to congestion conditions) are flows at various levels of granularity (host, net, autonomous system, city)?
  4. What gives rise to the discrepancies seen between actual traffic behavior (forward paths) and routing policies articulated via BGP?
  5. How do different models of flows compare (e.g., SYN/FIN vs timeout-based definitions) for a given trace in terms of metrics such as flow size distributions? [Plonka00b]
  6. What effects do violations of the traditional end-to-end model, e.g., transparent caching, global load balancing, have on performance?
  7. What are the macroscopic effects of different multicast architectures, e.g., traditional versus `single-source' multicast (SSM)?
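
To make the flow-model question above concrete, the sketch below computes flow sizes for one toy trace under a timeout-based definition and under a SYN/FIN definition. It is a minimal illustration, not the method of any CAIDA tool: the packet record, the flag encoding, and the timeout values are assumptions chosen for the example.

```python
from collections import namedtuple

# Hypothetical packet record: time in seconds, a flow key (e.g. the 5-tuple),
# TCP flags as a string ("S" = SYN, "F" = FIN, "" = neither), size in bytes.
Packet = namedtuple("Packet", "time key flags size")


def flows_by_timeout(packets, timeout):
    """A flow ends when its key has been idle for more than `timeout` seconds."""
    sizes, open_flows = [], {}   # open_flows: key -> [last_seen, byte_count]
    for p in sorted(packets, key=lambda p: p.time):
        state = open_flows.get(p.key)
        if state is not None and p.time - state[0] > timeout:
            sizes.append(state[1])   # idle too long: close the old flow
            state = None
        if state is None:
            state = open_flows[p.key] = [p.time, 0]
        state[0] = p.time
        state[1] += p.size
    sizes.extend(s[1] for s in open_flows.values())  # flush flows still open
    return sorted(sizes)


def flows_by_syn_fin(packets):
    """A flow starts at a SYN and ends at the next FIN on the same key.
    Packets outside a SYN..FIN span, and flows without a FIN, are not counted."""
    sizes, open_flows = [], {}   # open_flows: key -> byte_count
    for p in sorted(packets, key=lambda p: p.time):
        if "S" in p.flags:
            open_flows[p.key] = 0
        if p.key in open_flows:
            open_flows[p.key] += p.size
            if "F" in p.flags:
                sizes.append(open_flows.pop(p.key))
    return sorted(sizes)


trace = [
    Packet(0, "a", "S", 40), Packet(1, "a", "", 1000), Packet(2, "a", "F", 40),
    Packet(100, "a", "S", 40), Packet(101, "a", "F", 40),
    Packet(102, "b", "", 500),   # mid-stream traffic: no SYN observed
]
short_timeout = flows_by_timeout(trace, 64)    # splits key "a" into two flows
long_timeout = flows_by_timeout(trace, 200)    # merges key "a" into one flow
syn_fin = flows_by_syn_fin(trace)              # ignores the SYN-less key "b"
```

Even on this six-packet trace the three results differ in both the number of flows and their sizes, which is why the choice of flow model matters when comparing flow size distributions.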

We will also pursue research tasks in longer-term trend tracking to answer the following questions:

  1. How can we identify (and monitor long-term performance of) critical routers and sites that play a significant (and thus perhaps vulnerable) role in the infrastructure?
  2. Is traffic locality changing with growth, e.g., what percent of traffic stays within a campus, region, or country?
  3. How is IPv4 address space being announced versus actually used over time? [McCreary98]
  4. How does the DNS system perform, e.g., has the gTLD mesh improved the macroscopic performance for users [Brownlee01]?
  5. To what degree is long-term traffic growth due to more users and to what degree is it due to more traffic/user?
  6. How much growth is there in tunneling technologies (e.g., encapsulation for IPv6, IPsec) and fragmentation?
  7. What is the macroscopic effect of flash events on Internet traffic behavior, e.g., unsuccessful presidential election or transition to gTLD server infrastructure?

Project Leads: David Moore, kc claffy, Brad Huffaker
Funding Sources: DARPA, NSF
Collaborators / Contributors: TBD

4. Measurement Infrastructure

CAIDA's forays into operational analyses have historically focused on collaborations with individual ISP engineers and hardware vendors aimed at providing insights concerning the workload characteristics of specific links. Future operational analyses will include significant expansion of data sources and development of new techniques for analysis of passively and actively collected data.


4.1. Maintain a webpage and daily summary graphs of the performance trends associated with each of the 13 DNS root servers.
4.2. Test NeTraMet, ported to Coral, at one or more trial sites.
4.3. Monitor traffic traversing links at UCSD/SDSC and AIX and generate automatic html summaries.
4.4. Develop basic filtering and data collection mechanisms using CoralReef at OC12 and OC48 speeds.

Project Lead: David Moore, Brad Huffaker, Nevil Brownlee
Funding Sources: Membership, NSF, DARPA
Collaborators / Contributors: TBD

Fundamental to the analyses and research described above is the availability of data from active and passive traffic monitors.


4.5. Maintain 25 distributed skitter monitors, various passive monitors, and related network testing equipment.
4.6. Develop and maintain skitter destination target lists and related databases (for global monitors and each root).
4.7. Deploy additional active measurement hosts in support of the DNS root server monitoring activities.
4.8. Deploy passive monitors at 2-5 commercial facilities under appropriate NDAs.

Project Lead: kc claffy
Funding Sources: Membership, DARPA, NSF
Collaborators / Contributors: TBD

5. Visualization

Network visualizations are in their early stages of development, with CAIDA contributing several prototypes to the field and continuing their development. A core technology necessary for most of the objectives in this project will be mechanisms to manipulate, analyze, and navigate large BGP routing tables, and compare them not only to one another but to topology maps as derived from active probe measurements. We expect exploratory data analysis tools to allow more efficient analysis of large Internet data sets, not limited to topology graphs but also incorporating performance and workload data into link and node semantics (Modeling the Internet Core).


5.1. Develop and implement techniques for visualizing massive volumes of distributed, time-series path, performance, routing, and flow data.
5.2. Maintain the "Internet Atlas" website that will include examples and resource links for geographic, semi-geographic, and topology-based network-related visualizations; maintain alternative datasets for visualization and experimentation by researchers and others.
5.3. We will continue development of a hyperbolic viewer (sample images available) for visualizing very large directed graphs, where large is a few hundred thousand nodes to a million nodes, and about as many links. It is applicable mainly to graphs that are tree-like in having a meaningful spanning tree and in including a relatively small number of non-tree links. It tackles the difficulties of visualizing large graphs in several ways.
5.4. Continue development of the Otter, GeoPlot, and related visualization code to enhance their relevance as tools for Internet engineers and network architects. CAIDA has made modifications to its visualization tool Otter in order to accommodate the special semantics of routing tables, which open the tool up to fundamentally different and acutely relevant classes of Internet visualization.
5.5. Continue to update the backbone database/visualization tool, Mapnet and integrate skitter.
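
The notion of a "tree-like" graph in 5.3 can be illustrated by splitting a directed graph's links into tree links and non-tree links. The sketch below does this with a breadth-first search from a chosen root. It is only one possible way to pick a spanning tree and is not taken from the hyperbolic viewer itself; the function name and the sample graph are hypothetical.

```python
from collections import deque


def split_tree_links(links, root):
    """Breadth-first search from `root`: the link that first reaches a node
    becomes a tree link; every other link (including any link that is not
    reachable from the root) is reported as a non-tree link."""
    children = {}
    for src, dst in links:
        children.setdefault(src, []).append(dst)
    seen, tree, queue = {root}, set(), deque([root])
    while queue:
        node = queue.popleft()
        for nxt in children.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                tree.add((node, nxt))
                queue.append(nxt)
    non_tree = [link for link in links if link not in tree]
    return sorted(tree), non_tree


links = [("r", "a"), ("r", "b"), ("a", "c"), ("b", "c"), ("c", "r")]
tree, non_tree = split_tree_links(links, "r")
```

A graph suits this style of viewer when the non-tree list is short relative to the tree, since the layout can then follow the tree and draw the remaining links on demand.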

Project Leads: kc claffy, Brad Huffaker and David Moore
Funding Sources: NSF, Membership
Collaborators / Contributors: Sun Microsystems, NSI, ARIN, APNIC

6. Education and Outreach

6.A. Internet Engineering Curriculum Repository (IEC)

The IEC was initiated in early 1998 with the goal of helping educators and others interested in Internet technology keep up with developments in the field. The focus is on developing and maintaining a repository of teaching materials to support new University courses. Workshops such as the one held in August 1999 help to facilitate Internet-related faculty's use of the repository.


6.A.1. Expand and update the curriculum and related materials available in the IEC repository, including addition of ITL-specific lab and curriculum materials.
6.A.2. Publish a Traffic Analysis CD for use in Internet engineering instruction. The CD will include: training materials, animations, analysis software, lab tutorials, and traffic traces. Approximately 1,000 CDs will be distributed to university professors and industry professionals in the U.S. and abroad.
6.A.3. Develop and make public a searchable database of Internet engineering related faculty, institutions, and curriculum.
6.A.4. Develop a plan for IEC continuation (or transition to a 3rd party) following conclusion of NSF support in 2001. Present the plan to the IEC Advisory Committee for their consideration.
Project Leads: Evi Nemeth and Theresa Ott
Funding Sources: NSF (IEC grant) and Member (Cisco)
Collaborators / Contributors: Participating Universities

6.B. Internet Teaching Laboratories (ITL)

Few of the nation's Universities have courses in networking technology and even fewer have facilities for hands-on exposure and training of students on current Internet hardware and software. With donations of equipment from vendors, notably Cisco, and with financial support from NSF, CAIDA is facilitating the establishment of Internet teaching laboratories at approximately 20 Universities. Efforts are currently focused on evaluation and selection of meritorious proposals; later efforts will emphasize implementation of the ITL facilities and encouraging cooperation and collaboration among the participants.


6.B.1. Establish a prototype ITL lab at the University of California, San Diego (UCSD) -- including development of appropriate lab and tutorial materials.
6.B.2. Work with vendor and University PR officials to ensure the appropriate press coverage of the award and inauguration of the ITL facilities.
6.B.3. Maintain an infrastructure and curriculum materials supporting these laboratories.
6.B.4. Develop and implement plans for: (1) evaluation of ITL Phase 1; (2) sustainability of the ITL collaboration following NSF funding support; and as appropriate, (3) develop and implement a transition plan.
6.B.5. Hold a 3/4-day workshop for ITL faculty at the University of Virginia in June 2001; topics include routing and traffic analysis.
6.B.6. Collaborate on online (distance education) seminar series on Introduction to Internet engineering.
Project Leads: Evi Nemeth, Theresa Ott (CAIDA/IEC); Jorg Liebeherr (University of Virginia)
Collaborators / Equipment Contributors: Cisco, Cable & Wireless, MCI Worldcom

6.C. Internet Statistics and Metrics Analysis (ISMA) Workshops

As an outreach vehicle to the research and commercial communities, CAIDA periodically holds invitational Internet statistics and metrics analysis workshops. ISMA workshops are held to discuss the current and future state of Internet measurement and analysis. The intent of the workshops is to facilitate discussion among communities of academia, equipment vendors, and service providers, who share an interest in and incentive to understand one another's interests and concerns with Internet statistics and analysis.


6.C.1. Hold 1-2 ISMA workshops in 2001 (March, September) in support of the network modeling and routing research communities.
6.C.2. Submit papers and collaborate, specifically with the program committee, on the PAM-2001 Passive & Active Measurement Workshop scheduled for Amsterdam, April 23-24, 2001.

Project Lead: kc claffy
Funding Sources: NSF, Membership
Collaborators / Contributors: Waikato University, RIPE

6.D. Community Resources

6.D.1. Repository

Collaboration with R&E and commercial community

Products of proposed effort

  1. the Internet data meta-repository
  2. tools and techniques for analyzing data in the repository
  3. a community of participants actively contributing, using and collaborating on projects to analyze the data
  4. training and curriculum modules for undergraduate and graduate education

We will work with the network modeling and simulation (NMS) community to identify what formats of datasets are most useful to them. Each format must focus on the potential use of the data, e.g., a study focusing on emerging protocols may require different data format from one designed to profile the effects of network congestion, outages, or route flapping. Datasets will include:

  1. Inter-domain routing
  2. Topology and Performance
  3. Traffic Workload
  4. Multicast Traffic Behavior

6.D.2. Tool Taxonomy

In 1998, Cisco Systems funded CAIDA to develop a taxonomy of Internet measurement and analysis tools. CAIDA maintains and updates this taxonomy weekly; it covers public and proprietary measurement tools, initiatives, and infrastructures, weather-report related services, and network visualization tools. Expansion of the scope and quality of the materials available at this site will continue through FY2001.


6.D.2.A. Maintain the Tool Taxonomy web site
Project Lead: Margaret Murray
Funding Sources: Cisco and NSF
Collaborators / Contributors: Various organizations contributing information

6.D.3. Metrics WG

In 1999, CAIDA established a metrics WG to document network measurements and develop specifications for a set of useful 'standard' metrics.


6.D.3.A. Publish Metrics FAQ in final form on CAIDA web site, announce on NANOG list.
6.D.3.B. Produce best current practice document listing which metrics providers/enterprise network managers should measure, and give details of how to measure them effectively.
6.D.3.C. Encourage CAIDA WG members to share information about their current work on new metrics.
Project Lead: Nevil Brownlee
Funding Sources: Cisco and NSF
Collaborators / Contributors: Various organizations contributing information

6.D.4. Standards development activities

Encourage the development of traffic measurement standards, within the IETF or by industry groups. Topics of particular interest include:
  1. A standard definition of traffic flows, suitable for use by network equipment vendors and by vendors of systems which use flow data, e.g., service provider billing systems.
  2. A standard protocol for the interchange of traffic flow data. This should be simple enough to be implemented in high-speed hardware, so as to support the development of distributed packet header data collection systems.