CAIDA 2003-2005 Program Plan

A summary of research goals and plans for September 2003 through September 2005.

For further information contact k claffy, kc@caida.org

Executive Summary: The Cooperative Association for Internet Data Analysis (CAIDA) is an independent analysis and research group based at the University of California's San Diego Supercomputer Center, seeking to foster collaboration among the commercial, government, and research sectors of the Internet industry. Aimed at promoting greater cooperation in the engineering and maintenance of a robust, scalable global Internet infrastructure, CAIDA provides a neutral framework to support cooperative technical endeavors in measurement, analysis, and tool development.

Mission Statement: CAIDA investigates both practical and theoretical aspects of the Internet, with particular focus on topics that:

  • are macroscopic in nature and provide enhanced insight into the function of Internet infrastructure worldwide
  • provide free access to traffic analysis and visualization tools to facilitate network measurement and management

Research Program Areas: CAIDA is actively engaged in the following six program areas:

program area - goal
I. routing, addressing and topology - Develop a calculus to describe and model the structure and dynamics of global Internet topology.
II. workload characterization - Analyze pertinent (and to the extent possible, 'typical') features of Internet usage, including by protocol, application, and location dynamics.
III. network security - Monitor unsolicited Internet traffic and distill malicious activity, including DoS attacks, worms, host and port scanning, and novel attacks.
IV. DNS - Develop and evolve tools for improving DNS measurement and analysis, focused on implications for future DNS functionality (IPv4 & IPv6).
V. performance - Develop methodologies and tools for measuring Internet performance characteristics, in particular estimation of path capacity and available bandwidth.
VI. trends - Correlate heterogeneous network measurement data to identify, describe, and analyze Internet traffic trends.

In each of the program areas, CAIDA

  • collects Internet measurement data sets and makes them available to other researchers
  • develops software tools
  • performs research and analysis
  • provides multiple outreach and educational resources

CAIDA actively collaborates with other researchers by releasing tools and data sets. This document describes in more detail CAIDA's activities regarding research and analysis, tool development and data availability. Outreach and educational activities are described at: https://www.caida.org/publications/ and https://www.caida.org/workshops/.

Allocation of Effort:

  1. Routing, Addressing and Topology

    1. Research and Analysis
      CAIDA investigates Internet topology growth and routing system characteristics in support of future growth of the Internet. Specific projects include:
      1. Routing and peering analysis - "Routing and Peering Analysis for Enhancing Internet Performance and Security" is funded by NCS via NSF ANIR.
        Main funded tasks are:
        - investigate patterns of IP address space usage;
        - develop methodologies to identify 'core' Internet nodes, prefixes, ASes, geographic regions;
        - track growth, refinement and churn and categorize contributors to BGP dynamics;
        - assess incongruity between actual and announced paths and develop taxonomy of incongruity types.
        Additional funds are needed to:
        - evaluate effects of incongruities on Internet performance and stability;
        - create an apparatus for parameterizing and modeling of peering policies;
        - assist vendors and U.S. government agencies with prioritized emergency notification processes

      2. Routing atoms - "Next Generation Routing (Atoms)" is funded by NLnet and RIPE. CAIDA is researching and implementing modifications to BGP routing that aggregate prefixes into equivalence classes (policy atoms) based on common AS path from a given topological location. The motivation for this project is the recognized concern regarding the increased instability imposed by inherent additional computation and communication costs as the global BGP table size increases.
        Main funded tasks are:
        - write software to compute policy atoms based on data from the U. of Oregon Route Views project;
        - investigate properties and dynamics of atoms with respect to attributes other than AS path;
        - compare models of atoms derived from CAIDA topology measurements with those from Route Views BGP tables;
        - develop atomized BGP routing software and simulate and test it in a confined deployment scenario;
        - make tested and optimized (zebra-based) implementations publicly available.
        Additional funds are needed to:
        - analyze atoms under the assumption that providers themselves aggregate prefixes originated by customers into atoms.
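
The grouping step behind policy atoms can be sketched as follows. This is a minimal illustration and not CAIDA's software: the input format (a dict mapping each vantage point to the AS path it reports per prefix), the function name compute_atoms, and the sample prefixes are all assumptions made for the example. Prefixes that share the same AS path at every vantage point fall into one atom.

```python
from collections import defaultdict


def compute_atoms(tables):
    """Group prefixes into atoms.

    tables: dict mapping peer name -> dict mapping prefix -> AS path (tuple).
    This input format is hypothetical; real BGP table dumps need parsing first.
    Returns a sorted list of atoms, each a sorted list of prefixes.
    """
    peers = sorted(tables)
    prefixes = set()
    for routes in tables.values():
        prefixes.update(routes)

    atoms = defaultdict(list)
    for prefix in sorted(prefixes):
        # The signature is the AS path seen from each peer (None if no route).
        signature = tuple(tables[peer].get(prefix) for peer in peers)
        atoms[signature].append(prefix)
    return sorted(atoms.values())


# Illustrative data only: two vantage points, three prefixes.
example = {
    "peer_a": {
        "10.0.0.0/24": (1, 2, 3),
        "10.0.1.0/24": (1, 2, 3),
        "10.0.2.0/24": (1, 4),
    },
    "peer_b": {
        "10.0.0.0/24": (5, 3),
        "10.0.1.0/24": (5, 3),
        "10.0.2.0/24": (5, 4),
    },
}
atoms = compute_atoms(example)
```

In this sample the first two prefixes share paths at both peers and form one atom, while the third forms its own.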

      3. AS connectivity and ranking - "Connectivity Ranking of Autonomous Systems" is funded by Cisco. CAIDA has developed an algorithm to rank ASes by their outdegree in the Internet graph constructed from our macroscopic topology measurements. This ranking reveals critical subsets of peering relationships as well as the extent of connectivity coverage and relative market share among different providers.
        Main funded tasks are:
        - evaluate inter-AS connectivity based on prefix-level granularity as well as AS granularity;
        - develop a set of utilities processing BGP data (IOS and zebra formats) for heuristic analysis of peering, transit, and customer relationships.
        Additional funds are needed to:
        - develop a visualization tool for navigating BGP tables capable of visualizing at least half a million nodes with a graceful drill-down.
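
Ranking by outdegree can be illustrated with a short sketch. It assumes the topology is already available as a list of directed (source AS, destination AS) links; the function name and the AS numbers (taken from the range reserved for documentation) are illustrative, and the actual CAIDA ranking algorithm may differ in how the graph is built and how ties are handled.

```python
from collections import defaultdict


def rank_by_outdegree(links):
    """Rank ASes by the number of distinct ASes they have outgoing links to.

    links: iterable of (src_as, dst_as) pairs; duplicates and self-links are ignored.
    Returns a list of (as_number, outdegree), highest outdegree first,
    with ties broken by AS number.
    """
    neighbors = defaultdict(set)
    for src, dst in links:
        if src == dst:
            continue
        neighbors[src].add(dst)
        neighbors.setdefault(dst, set())  # make sure sink-only ASes are listed
    ranked = sorted(neighbors, key=lambda asn: (-len(neighbors[asn]), asn))
    return [(asn, len(neighbors[asn])) for asn in ranked]


# Illustrative links between documentation-reserved AS numbers.
sample_links = [
    (64496, 64497),
    (64496, 64498),
    (64496, 64499),
    (64497, 64498),
    (64498, 64496),
    (64496, 64497),  # duplicate observation of the same link
]
ranking = rank_by_outdegree(sample_links)
```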

      4. IPv4 and IPv6 macroscopic topology - is funded by WIDE (Japan) gift support. CAIDA has been conducting active measurements of global IPv4 IP level connectivity since 1998. We are now in the process of extending these measurements to track IPv6 connectivity as well.
        Main funded tasks are:
        - design and test IPv6 monitoring tool scamper;
        - investigate IPv6 tunnel links between IPv4 routers;
        - compare current structure and characteristics of IPv4 and IPv6 topologies.
        Additional funds will be needed to:
        - develop and deploy IPv6 monitoring infrastructure to support continuous collection, storage, and analysis of IPv6 routing and topology data (monitors around the world, hardware and software resources for continuous data collection and downloading, sysadmin support).

      5. Geolocation of IP resources. Accurately identifying the geographic location of network objects is critical to projects in all six of CAIDA's focus areas. This project is currently not funded. As of September 2003 we are exploring a partnership with Digital Envoy to provide IP geolocation services to CAIDA for internal use. If this partnership succeeds, we will offer suggestions for strategies and techniques for geolocation of IP resources (including parsing registry databases, automated name recognition in ISPs' host naming patterns, and using RTTs for triangulation) and heuristics for integrating available techniques.
        CAIDA will continue to try to support publicly accessible technology for the mapping of AS numbers to geographic locations according to the main regional Internet registries (RIRs): ARIN, APNIC, LACNIC, RIPE.

      The table below summarizes the status of the listed projects.

      project: Routing and peering analysis
        project and status URLs: www.caida.org/funding/ncs2002/, www.caida.org/research/routing/
        current funding status: early Y2 funding; proposal Y3 submitted.
      project: Routing atoms
        project and status URLs: www.caida.org/funding/atomized_routing/
        current funding status: end Y1 funding; no follow-on.
      project: AS ranking
        project and status URLs: https://asrank.caida.org/
        current funding status: early Y2 funding.
      project: IPv4 and IPv6 macroscopic topology
        project and status URLs: www.caida.org/projects/macroscopic, sk-summary.caida.org/cgi-bin/main.pl
        current funding status: Q3 of Y2 funding; WIDE funds pending.
      project: Geolocation of IP resources
        project and status URLs: www.caida.org/catalog/software/netgeo/
        current funding status: evaluating commercial partner options.


    2. Supporting Tool Development
      Building and maintaining software tools to measure and analyze Internet topology is an important part of CAIDA activities. Existing tools are:
      • skitter - actively probes IPv4 connectivity
        - update in 2003 to include intermediate hop RTTs & optimize storage
      • rocketfuel - actively probes IPv4 connectivity (AS-specific). CAIDA will assume support for U. Washington's rocketfuel at the tool authors' request. (undecided)
      • scamper - actively probes IPv6 connectivity
        - continue development in 2004 and 2005
      • iffinder - identifies interfaces belonging to the same router
      • arts++ - a C++ class library used by CAIDA software packages
        - needs update or replacement
      • NetGeo - a database and collection of Perl scripts to map IP addresses and AS numbers to geographic locations
        - needs maintenance and update (Project I-v) or commercial alternative
      • Walrus - interactively visualizes large directed graphs in 3D space
      • Otter - visualizes arbitrary network data expressed as a set of nodes, links or paths
      • GeoPlot - creates a geographical image of an arbitrary network data set
      • plot-latlong - simple tool for plotting lat/long points on geographic maps
      • LibSea - a Java library for representing large directed graphs
      • PlotPaths - displays forward and reverse network paths from a single source to one or more destinations

      The project "Macroscopic Internet Data Measurement and Analysis" funded by NCS via NSF-NPACI-CISE (www.caida.org/funding/ncs2002/) supports the following development of Internet topology tools:
      - maintain Walrus 3D hyperbolic viewer and apply it to specific tasks, e.g., Internet worm/virus spread;
      - develop and implement navigational techniques and visual representations for routing table and peering relationships.
      This project is in Q2Y1 funding.

      CAIDA needs more funds to maintain and update existing tools and to continue developing new, better tools for the Internet research community.


    3. Data to Community
      At the moment CAIDA makes available to other researchers the following data sets relevant for routing and topology studies:


  2. Workload Characterization

    1. Research and Analysis
      CAIDA measures and analyzes traffic on production Internet links to better understand that traffic. The projects listed below are funded by NCS via NSF-NPACI CISE (www.caida.org/funding/ncs2002/) and are currently in the first half of year 1 funding.
      1. Flow estimation and taxonomy by size/speed/duration.
        Main funded tasks are:
        - develop self-tuning measurement algorithms that are robust in the face of anomalous traffic patterns, e.g., port scans, DOS attacks;
        - develop a measurement system that concisely summarizes traffic on a link;
        - test real-time algorithms and software using traffic at UCSD/SDSC, e.g., SDNAP, to identify common network applications or groups of applications.

      2. Analysis of peer-to-peer traffic. CAIDA develops methods to identify peer-to-peer (p2p) traffic that no longer uses fixed port numbers.
        Main funded tasks are:
        - develop algorithms for searching p2p command strings that are simple and fast enough for real-time implementation, e.g., in NeTraMet;
        - set up a long-term p2p monitor on a backbone Internet link;
        - create a dynamic web page showing traffic levels of various p2p applications.
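
A command-string search of this kind can be sketched as a prefix match on the first bytes of a flow's payload. The example below is only an illustration, not the NeTraMet implementation: the function name and signature table are assumptions, and the two byte strings shown are the well-known opening bytes of the BitTorrent handshake and the Gnutella connect request. A production classifier would need a broader, maintained signature set.

```python
# Opening bytes of two p2p protocols; a deliberately small, illustrative table.
SIGNATURES = {
    "bittorrent": b"\x13BitTorrent protocol",
    "gnutella": b"GNUTELLA CONNECT/",
}


def classify_payload(payload: bytes):
    """Return the name of the p2p application whose signature starts the payload.

    Returns None when no signature matches. Only a prefix comparison is done,
    which keeps the per-packet cost low enough for real-time use.
    """
    for app, signature in SIGNATURES.items():
        if payload.startswith(signature):
            return app
    return None


bt = classify_payload(b"\x13BitTorrent protocol" + b"\x00" * 8)
gn = classify_payload(b"GNUTELLA CONNECT/0.6\r\n")
other = classify_payload(b"GET / HTTP/1.1\r\n")
```

Matching on payload rather than port number is what lets this approach follow p2p traffic that no longer uses fixed ports.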

      3. Modeling TCP dynamics. CAIDA researchers study features of TCP flows that can be reliably estimated from packet header trace data.
        Main funded tasks are:
        - compare various algorithms for determining the round trip time (RTT) of TCP flow packets in captured traffic samples;
        - identify and analyze the behavior of long TCP flows, i.e., those responsive to TCP's feedback and congestion control algorithms;
        - implement a new NeTraMet attribute indicating the status of TCP control mechanisms for a given flow.
        Additional funds are needed to:
        - construct a measurement-based TCP traffic model.
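
One simple way to estimate RTT from packet headers alone is the three-way handshake method: at a monitor on the path, the time between the client's SYN and the client's first ACK covers one full round trip. The sketch below illustrates that idea under strong simplifications (a single flow, no retransmissions, no sequence number checks). The Packet record and function name are hypothetical, and this is only one of the candidate algorithms such a comparison might include.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Minimal header record (hypothetical format for this sketch)."""
    ts: float   # capture timestamp in seconds
    src: str
    dst: str
    syn: bool
    ack: bool


def handshake_rtt(packets):
    """Estimate RTT as (time of client's ACK) - (time of client's SYN).

    Returns None if no complete SYN, SYN-ACK, ACK sequence is found.
    """
    t_syn = client = server = None
    seen_synack = False
    for p in packets:
        if p.syn and not p.ack and t_syn is None:
            t_syn, client, server = p.ts, p.src, p.dst
        elif p.syn and p.ack and t_syn is not None:
            if p.src == server and p.dst == client:
                seen_synack = True
        elif p.ack and not p.syn and seen_synack:
            if p.src == client and p.dst == server:
                return p.ts - t_syn
    return None


trace = [
    Packet(0.000, "client", "server", syn=True, ack=False),
    Packet(0.030, "server", "client", syn=True, ack=True),
    Packet(0.050, "client", "server", syn=False, ack=True),
]
rtt = handshake_rtt(trace)
```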

      4. Compare workload characteristics between IPv4 and IPv6, and correlate with topology data where applicable. CAIDA will devise methodologies for joint analysis of data collected from its macroscopic topology monitors and from passive traffic monitors.
        Main funded tasks are:
        - compare characteristics of IPv4 and IPv6 workload (e.g., flow lengths, ports, protocols);
        - correlate workload and topology characteristics, e.g., determine whether patterns differ between access and core links;
        - establish real-time online tracking of workload characteristics at representative locations.
        Additional funds are needed to:
        - track propagation of active probes continuously sent by topology monitors through collected traffic samples;
        - monitor IPv4 and IPv6 workload and performance over several years.


    2. Supporting Tool Development
      CAIDA develops device drivers and applications enabling network data collection and workload characterization, in real time or from trace files. Existing tools are:
      • CoralReef - a comprehensive software suite to collect and analyze data from passive Internet traffic monitors
      • NeTraMet - an implementation of the RTFM architecture for Network Traffic Flow Measurement
      • flowplot - a new visualization module for the output of aguri, CoralReef, and new flow estimation tools

      Current development and maintenance of workload characterization tools are supported by DARPA NMS and WIDE. The DARPA project is in the 3rd (last) year of funding. We will also use Endace's gift of DAG traffic monitoring cards to deploy monitors at strategic monitoring points in the backbone networks.
      Main funded tasks are:
      - make CoralReef and NeTraMet work with all current models of Endace DAG cards;
      - establish GPS synchronization of DAG cards in SDSC computer room;
      - re-establish long-term CAIDA monitoring systems, as SDSC/UCSD network changes from OC12 to Gigabit Ethernet links.


    3. Data to Community
      At the moment CAIDA makes available to other researchers the following data sets relevant for workload characterization:
      • NeTraMet anonymized traces
        Archived flow data files used for DNS performance summaries (see Research Area IV) can be requested from nevil@caida.org.
      • backbone traces
        CAIDA has a growing collection of traces from OC48 backbone links. Visitors to CAIDA may use those data while at SDSC (AUPs apply.) Researchers requesting access may discuss visit schedule options by contacting kc@caida.org.


  3. Network Security

    1. Research and Analysis
      CAIDA researchers pioneered the application of the backscatter technique to study denial-of-service (DoS) attacks worldwide. We developed a network telescope to study DoS attacks, Internet worm spread, and host and port scan characteristics. We are currently developing real-time, publicly available reports that quantify network security threats worldwide. Research tasks for this project are partially funded by a Cisco URP grant and an NSF Trusted Computing grant until September 2006. An additional Cisco URP grant and a proposal to support operational tasks are pending.
      Main funded tasks are:
      - identify the scope and characteristics of distributed denial-of-service attacks and Internet worms
      - develop a tool to monitor and generate graphical representations of malicious traffic
      - classify victims of wide-area network security events
      - develop real-time, adaptive denial-of-service-attack definitions
      - quantify the damage experienced by DoS and worm attack victims
      - assess the efficacy of nascent efforts at distributed attack mitigation
      - identify trends in attack types over time
      - investigate the ways that telescope size and position influence results
      Additional funds will be needed to:
      - support long-term network telescope operation (disk space, network infrastructure)
      - develop a honeynet to validate telescope results
      - develop long-term patching/vulnerability profile studies
      - expand the network telescope to cover additional locations
      - publicly release recent worm, denial-of-service attack, and host scan datasets
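
The influence of telescope size on results can be illustrated with the usual backscatter extrapolation. The sketch below assumes that attackers spoof source addresses uniformly at random across the IPv4 space, so that a telescope covering a given prefix sees a proportional share of the victim's responses. The function name and the /8 example size are assumptions for illustration; this plan does not state the telescope's actual size.

```python
def estimate_attack_rate(observed_pps: float, telescope_prefix_len: int) -> float:
    """Extrapolate a victim's total response rate from backscatter seen at a telescope.

    observed_pps: backscatter packets per second seen in the monitored prefix.
    telescope_prefix_len: prefix length of the monitored block (e.g. 8 for a /8).
    Assumes uniformly random spoofed source addresses; with non-uniform
    spoofing the result is only a rough indication.
    """
    if not 0 <= telescope_prefix_len <= 32:
        raise ValueError("prefix length must be between 0 and 32")
    monitored_addresses = 2 ** (32 - telescope_prefix_len)
    return observed_pps * (2 ** 32) / monitored_addresses


# A /8 covers 1/256 of the IPv4 space, so observed rates are scaled by 256.
rate_slash8 = estimate_attack_rate(10.0, 8)
rate_slash16 = estimate_attack_rate(10.0, 16)
```

The scaling factor grows quickly as the telescope shrinks, which is one reason smaller telescopes yield noisier estimates for the same attack.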



    2. Supporting Tool Development
      • CAIDA has developed software based on the CoralReef API to capture and analyze denial-of-service attacks and Internet worms. A modified version of the CoralReef Report Generator helps to display real-time security reports.
      • Countries.pm - a Perl module that provides country code, country name, and continent location information.
      • crl_attack_flow - specialized high-speed security event monitoring and classification software. Specialized event-based, rather than flow-based, attack monitoring and reporting software is also under development.
      • plot_country_intervals - Perl software that uses the fly and gifsicle open-source programs to generate animations of denial-of-service attacks, Internet worms, and host/port scans worldwide.
      • The "Macroscopic Internet Data Measurement and Analysis" project, funded by NCS via NSF-NPACI-CISE (www.caida.org/funding/ncs2002/), supports the application of the Walrus 3D hyperbolic viewer for tasks such as visualization of worm/virus spread throughout the Internet. This project is in the first half of year 1 funding.

    3. Data to Community
      Three weeks of backscatter data are available to academic researchers, US government funded researchers, US agencies, and CAIDA members via www.caida.org/data/passive/backscatter_request. These are some of the most comprehensive publicly available datasets of distributed denial-of-service attacks around the world. We will continue to support community data needs as resources permit.


  4. Domain Name System (DNS)

    1. Research and Analysis
      The Internet depends on reliable DNS service for correct, robust operation. Relentlessly increasing Internet growth and the rise of IPv6 will further load the DNS infrastructure. The U.S. research agenda must pursue better understanding of this critical element of the global Internet. CAIDA has been actively involved in DNS data analysis since 1999. Currently, our specific projects in this program area are:
      1. Characterization of DNS workload and performance. This project is supported by WIDE (Japan) gift funds and is currently in the 2nd half of year 2 funding. Additional WIDE gift funds are pending.
        Main funded tasks are:
        - maintain NeTraMet meter in Tokyo; make measurements web-accessible;
        - repeat analysis of private (RFC1918) update traffic to determine whether there has been any improvement in vendor software to alleviate spurious updates;
        - investigate the impact of anycast on DNS root service.

      2. DNS Modeling - "Network Modeling and Simulation" funded by DARPA NMS is currently in the 1st half of year 3 funding. CAIDA uses the simulation lab at the Measurement Factory for conducting laboratory experiments simulating DNS behavior under controlled conditions. Results are at www.caida.org/projects/dns/.
        Main funded tasks are:
        - refine laboratory simulations of large-scale DNS behavior, e.g., use more realistic TTLs, add some percentage of lame delegations, try another set of names to query, address the 'replaying trace too fast' problem, etc.;
        - develop statistical techniques to categorize the state of TLD nameserver operation and monitor configuration changes as they occur;
        - collect and analyze BIND log files.
        Additional funds will be needed to:
        - define parameters of realistic DNS scenarios for use in network models;
        - investigate scalability of the proposed parameters from laboratory environment to the global Internet.

      3. DNS Security - A DNS-OARC proposal to incorporate research and analysis capabilities into trusted operational centers has been submitted to NCS. If funded, CAIDA will conduct DNS performance and vulnerability analyses and produce recommendations on hardening DNS for both IPv4 and IPv6 operations.
        Main proposed tasks are:
        - build tools for automatic analysis of BIND log files;
        - develop statistical techniques to categorize the state of TLD nameserver operation and recognize operating state changes as they occur;
        - investigate implications and effects of anycast on root server operation;
        - maintain continual communication with NCS and DHS on DNS performance as viewed through our measurements.
        Additional funds will be needed to:
        - build and test tools enhancing server security;
        - evaluate quality of other macroscopic DNS measurements procured by the federal government for use in cybersecurity monitoring;
        - archive long-term performance data on root and gTLD DNS use;
        - maintain existing measurement tools as protocols and formats evolve.

    2. Supporting Tool Development
      CAIDA documents, packages, and distributes passive traffic monitoring tools (CoralReef and NeTraMet - II-B above) and methods for their use to monitor the DNS infrastructure. Other DNS related tools supported by CAIDA are:
      • dnsstat - collects accurate statistics of DNS queries on a specific nameserver (or client)
      • dnsstop - displays various tables of DNS traffic on your network

      Further development of tools to monitor DNS behavior is supported by DARPA NMS and is in the first half of year 3 funding.
      Main funded tasks are:
      - add DNS TLD attribute to NeTraMet to simplify collection of country-code TLD (ccTLD) data;
      - devise method of monitoring ccTLD performance and add reports to public web pages.


    3. Data to Community
      Two of the three production NeTraMet meters, at Auckland (New Zealand) and Boulder (Colorado, USA), continuously collect data for plotting daily root and gTLD server performance. The third meter (at UCSD) is temporarily unusable. It will be revived after the UCSD/SDSC network upgrade. The following DNS relevant data are available:
      • DNS performance summaries - via www.caida.org/cgi-bin/dns_perf/main.pl.
      • root nameserver workload data sets (logs)
        Visitors to CAIDA may use those data while at SDSC (AUPs apply.) Researchers requesting access may discuss visit schedule options by contacting kc@caida.org.


  5. Performance

    1. Research and Analysis
      CAIDA measures Internet performance and develops methodologies for its improvement. Our tool for plotting RTTs and packet loss to all IP hops along a specified forward IP path (beluga) remains unfunded. Our main funded activity in the performance area this year is the DOE-funded project "Bandwidth Estimation Research" (www.caida.org/projects/bwest/), currently in early year 3 funding. We have surveyed existing bandwidth estimation tools and algorithms and given results to tool developers. We have set up a test lab environment, developed testing procedures, and evaluated tools on 100 Mb/s links. We have started testing available tools on GigEther-speed paths and will continue to do so.
      Remaining funded tasks are:
      - evaluate performance of bandwidth estimation tools on 3- and 4-hop GigEther paths;
      - develop a GUI for scheduling and visualizing bandwidth measurements on end-to-end paths;
      - test implementation of bandwidth measurement methodology (capacity and available bandwidth) in high-performance academic and research networks;
      - develop application-layer techniques to help TCP to achieve its maximum feasible bandwidth on a path (SOBAS).
      Additional funds will be needed to:
      - develop middleware to support bandwidth estimation in collaboration with SciDAC Grid Portals researchers;
      - integrate bandwidth estimation technologies into DOE network infrastructures.

    2. Supporting Tool Development
      Available BWEST tools are:
      • pathload - estimates end-to-end available bandwidth
      • pathrate - estimates end-to-end bandwidth capacity
      • beluga - plots RTTs and packet loss to all IP hops along a specified forward IP path

      pathload and pathrate tools are maintained by C. Dovrolis at Georgia Tech.
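
The idea behind capacity estimation can be illustrated with the classic packet-pair relation: two back-to-back packets of size L leave the narrowest link separated by a dispersion of roughly L divided by the link capacity. The sketch below applies that relation to a list of measured dispersions and takes the median to dampen cross-traffic noise. It is a simplified illustration with an assumed function name and sample values, not the algorithm used by pathrate, which applies more elaborate statistical analysis.

```python
from statistics import median


def capacity_estimate_bps(packet_size_bytes: int, dispersions_s) -> float:
    """Estimate path capacity in bits per second from packet-pair dispersions.

    packet_size_bytes: size of each probe packet.
    dispersions_s: measured time gaps (seconds) between the two packets of
    each pair at the receiver. Non-positive gaps are discarded.
    """
    valid = [d for d in dispersions_s if d > 0]
    if not valid:
        raise ValueError("no usable dispersion samples")
    return packet_size_bytes * 8 / median(valid)


# Illustrative samples: 1500-byte probes, most pairs spaced 120 microseconds apart.
estimate = capacity_estimate_bps(1500, [0.00012, 0.00012, 0.0003])
```

With these sample values the median dispersion is 120 microseconds, which corresponds to roughly 100 Mb/s.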

      The project "Macroscopic Internet Data Measurement and Analysis" funded by NCS via NSF-NPACI-CISE (www.caida.org/funding/ncs2002/) supports integration of visualization tools with performance data in real time. This project is in the first half of year 1 funding.


    3. Data and Infrastructure to Community
      • Survey of tools and methodologies (accepted by IEEE Network)
      • Tool evaluation results given to tool developers
      • BWEST testbed (at SDSC) available to qualified researchers


  6. Trends

    1. Research and Analysis
      The research community suffers from a lack of coherent longitudinal datasets for cross-domain analysis of traffic on the wide-area Internet. CAIDA proposed to create a database (Internet Measurement Data Catalog, or IMDC) that will index distributed repositories and archives of Internet data and tools. Potential benefits from unifying individual heterogeneous data sets and making them available to all interested researchers are enormous, and we envision the IMDC database as supporting the field of network research for the foreseeable future. This research will also provide measurement-based input to answer public policy and regulatory questions regarding administration, stability and security of Internet infrastructure. The project "Correlating heterogeneous measurement data to achieve system-level analysis of Internet traffic trends" funded by NSF (www.caida.org/projects/trends/) is currently at the end of year 1 funding. Results and progress are at www.caida.org/projects/trends/imdc/.
      Main funded tasks are:
      - design a universal annotation system (and database support) suitable for describing heterogeneous Internet data sets;
      - make recommendations for meaningful, maintainable long-term passive traffic samples collection;
      - apply IMDC to Internet research problems.
      Additional funds will be needed to:
      - deploy high-speed passive monitors at strategic locations;
      - continue analysis and visualization activities to answer current overarching Internet research issues and questions.


    2. Supporting Tool Development
      Available NSF funding supports creating the IMDC database and populating it (initially) with CAIDA data sets.

      Additional funding will be needed to maintain, improve and enlarge the database in the future.


    3. Data and Infrastructure to Community
      The use of the IMDC database will be open to the research community and highly encouraged. (AUPs will apply.)

Personnel

CAIDA currently employs 15 researchers and support staff based at SDSC; 3 remotely based staff/consultants; 3 undergraduate student workers; and 4 graduate student researchers.

Sponsors

CAIDA garnered significant corporate support through its Membership program during the Internet bubble, and lost several members when that bubble burst. Currently, the following organizations have made designated gifts to support CAIDA activities:
  • Cisco Systems -- the worldwide leader in networking for the Internet.
  • WIDE -- a consortium of Japanese research organizations and companies working to establish a Widely Integrated Distributed Environment.
  • Endace -- the only company in the world that specializes in building high performance PCI cards for remote network monitoring and surveillance applications. The range of their products covers almost every physical layer at every network speed up to OC192 and 10GigE.

Designated gifts to CAIDA enable us to maximize use of research dollars. CAIDA could not survive without the generosity of its sponsors.
