Mission Statement: The Cooperative Association for Internet Data Analysis (CAIDA) is an independent analysis and research group based at the University of California's San Diego Supercomputer Center. CAIDA investigates both practical and theoretical aspects of the Internet, with particular focus on topics that:
- are macroscopic in nature and provide enhanced insight into the function of Internet infrastructure worldwide,
- improve the integrity of the field of Internet science,
- improve the integrity of operational Internet measurement and management,
- inform science, technology, and communications public policies.
CAIDA is actively engaged in the following three main program areas:
|I||Research and Analysis||Analyze and model pertinent features and trends of current Internet usage, develop novel approaches to enable future Internet growth|
|II||Infrastructure for Data Procurement||Create state-of-the-art infrastructure for data measurement and sharing with the research community|
|III||Measurements and Data||Conduct active and passive measurements for comprehensive characterization of the Internet, provide best available data to the research community|
This program plan outlines CAIDA's plans for 2007-2010, in the areas of research, infrastructure, and data collection to support the research community. For over ten years CAIDA has provided a neutral framework to support cooperative technical endeavors in Internet measurement, analysis, and tool development based on the best available empirical data. Our current research projects, primarily funded by the U.S. National Science Foundation (NSF), include several measurement-based studies of the Internet's core infrastructure, with focus on the health and integrity of the global Internet's topology, routing, addressing, and naming system. Our infrastructure activities, funded by NSF and DHS as well as other government and industry sources, include building a catalog of Internet measurement data sets, contributing to the (DHS-funded) PREDICT repository of datasets to support the (U.S.-based) network research community, and developing and deploying active and passive measurement infrastructure that cost-effectively supports the global Internet research community. We also will continue to lead and participate in tool development to support measurement, analysis, indexing, and dissemination of data from operational global Internet infrastructure. Finally, we will continue our outreach activities, including our web sites, peer-reviewed papers, technical reports, presentations, blogging, animations, and workshops.
I. Research and Analysis
CAIDA is involved in a variety of research activities spanning many domains related to network science and engineering. We seek to discover models that characterize the Internet as an evolving complex system and are capable of predicting salient aspects of future evolution. We also aim to reliably measure and analyze the current state of the Internet in order to validate our theoretical research against empirical data. Specific projects are listed below, with funding status summarized at the end of each section.
Routing
Motivation: There is a growing consensus among experts that the routing system is approaching a critical architectural breaking point which any significant deployment of IPv6 will only exacerbate.
Our routing research effort is aimed at developing truly scalable routing algorithms. Through mid-2007, an NSF NeTS NR grant supports our work in this area, and we will complete the main goal of this grant in 2007: thorough analysis of the performance (space, stretch, and communication complexity) of a new class of routing schemes that promise extraordinary scalability improvements over current Internet routing.
The next steps of our research agenda build on what we have confirmed in the last three years: the two main problems with the scalability of the current routing system are the routing table size and, considered even more problematic, the computational overhead of processing updates caused by topology changes. Thus, we will next pursue a rigorous investigation of methods for scalable routing without updates on realistic Internet-like topologies. A 3-year NSF NeTS FIND grant awarded in 2007 will support CAIDA in developing and validating our new model for Greedy Routing on Hidden Metrics (GROHModel), with the following milestones:
- demonstrate the existence of Hidden Metric Spaces (HMS);
- build methodologies to explicitly re-construct the HMS for the observable Internet topology and, more generally, for any given complex network;
- address challenges associated with using GROHModel-based routing in practice.
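To make the idea of greedy routing without topology updates concrete, the sketch below forwards a packet, hop by hop, to whichever neighbor is closest to the destination in an underlying metric space. It is only an illustration under stated assumptions: the ring-shaped space, the node coordinates, the example topology, and the function names are all hypothetical and are not taken from the GROHModel work itself.

```python
def ring_distance(a, b):
    """Distance between two positions (in degrees) on a ring-shaped space."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def greedy_route(graph, coords, src, dst):
    """Greedily forward toward dst; return the path, or None if forwarding gets stuck."""
    path = [src]
    current = src
    while current != dst:
        here = ring_distance(coords[current], coords[dst])
        # A node consults only its own neighbors' coordinates: no routing table.
        best = min(graph[current], key=lambda n: ring_distance(coords[n], coords[dst]))
        if ring_distance(coords[best], coords[dst]) >= here:
            return None  # local minimum: no neighbor is closer to the destination
        current = best
        path.append(current)
    return path


# Hypothetical topology: a six-node ring, one chord (A-D), and a stub node G.
COORDS = {"A": 0, "B": 50, "C": 120, "D": 170, "E": 250, "F": 300, "G": 190}
GRAPH = {
    "A": ["B", "F", "D", "G"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E", "A"],
    "E": ["D", "F"],
    "F": ["E", "A"],
    "G": ["A"],
}

print(greedy_route(GRAPH, COORDS, "A", "E"))  # ['A', 'F', 'E']
print(greedy_route(GRAPH, COORDS, "C", "G"))  # None
```

The second call fails because node G sits close to D in the assumed metric space but is attached only to A in the graph. Greedy forwarding succeeds only to the extent that the topology is congruent with the underlying metric, which is why reconstructing a suitable hidden space for the observed topology appears as a milestone above.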
Topology
Motivation: Our dependence on the Internet in so many dimensions of our lives has rapidly grown much stronger than our comprehension of its underlying structure, performance limits, dynamics, and evolution. Further, the Internet's heritage as a cooperative network for government-funded researchers leaves it with fundamental vulnerabilities that are incongruent with its role as a global communications substrate, and ironically leaves it perpetually challenging to research and analyze, for technical as well as policy and economic reasons.
CAIDA is now in a position to integrate six strategic measurement and analysis capabilities to improve our understanding of Internet topology structure and behavior: a new architecture to support topology measurement (Ark); application of IP alias resolution techniques; conversion of IP/router to AS-level topology graphs; AS taxonomy and relationship inference; geolocation of IP resources; and graph visualization. The result will be the capability to regularly provide richly annotated topology maps of observable Internet infrastructure, as well as a powerful measurement platform capable of performing other types of Internet infrastructure assessments as needed.
Currently funded tasks include:
- analyzing skitter data to build router- and AS- level Internet graphs based on the best available data;
- extending our dK-series methodology for graph topology analysis to incorporate link and node annotations;
- constructing annotated Internet graphs;
- developing re-scaling methods for graph generators based on dK-series approach;
- further developing the new architecture to support topology measurement (Ark);
- using and refining the best available IP alias resolution and AS relationship inference techniques to extract accurate topology graphs;
- improving Internet geolocation data and integrating it into topology graphs;
- improving topology graph visualization techniques.
Additional funds are requested to:
- design and develop an Internet topology database with public access web-based query for validation;
- extend Ark to include IPv6 topology discovery;
- improve alias resolution techniques and automate regular collection of data.
Internet evolution model
Motivation: We seek to fill a glaring void in the field of Internet research by finding an analytically tractable model of Internet topology evolution that faithfully captures system dynamics while utilizing a minimum number of measurable external parameters.
Through 2007, this research will receive support from an NSF NR grant; the funded tasks include:
- build an economy-based model for Internet topology evolution;
- find the parameter values that match observed Internet topology data.
In collaboration with George Riley at Georgia Tech School of Electrical and Computer Engineering, additional funds are requested to:
- develop a topology generator based on the developed evolution model;
- simulate various future development scenarios within the model;
- study self-similarity of emerging Internet topologies at multiple scales (local, regional, continental, worldwide);
- track the evolution of Internet topology through transitional states to predict its asymptotically converging behavior;
- evaluate a wider class of formal models that attempt to not only faithfully reproduce observed data, but also capture fundamental laws of network evolution.
Security
Motivation: We seek to develop meaningful and up-to-date quantitative characterizations of malicious activities utilizing or impacting the Internet. Achieving fundamental insights into the nature of malicious software will point us toward the best directions for mitigating its effects.
CAIDA researchers pioneered the application of the backscatter technique to study denial-of-service (DoS) attacks worldwide. We developed a network telescope to study Denial-of-Service attacks, Internet worm spread, and host and port scan characteristics.
CAIDA researchers are heavily involved with the Collaborative Center for Internet Epidemiology and Defenses (CCIED, "seaside"). CCIED addresses the critical challenges posed by large-scale Internet-based pathogens, such as worms and viruses. CCIED efforts focus on analyzing the behavior and limitations of Internet pathogens, developing early-warning and forensic capabilities, and developing technologies that can automatically defend against new outbreaks in real-time.
The main funded tasks include:
- track the spread of new Internet worms, viruses, and other malicious activities;
- classify victims of wide-area-network security events;
- investigate the ways that telescope size and position influence results;
- analyze the potential for a given computer to be vulnerable to, or highly likely to spread, a given piece of malicious software;
- investigate methods of confining the spread of malicious software to realistic online test environments (honeyfarms);
- monitor the spread of spam and investigate the longevity of sites serving phishing or black market items;
- find ways to identify novel or anomalous content in a large volume of typical traffic;
- analyze the market value of compromised computers, routers, and other resources.
Additional funds will be needed to:
- track and report on the spread, activities, and economics of bot networks (botnets);
- conduct long-term patching/vulnerability profile studies;
- expand the network telescope to cover additional locations.
Anonymization and Privacy
Motivation: Researchers studying the Internet face a significant challenge in looking for traffic traces: the fundamental conflict between end-user privacy and the research utility of data. When data is heavily anonymized, important attributes of data that reveal the structure and function of networks are obscured. If data is not heavily anonymized, details about end users, including geographic and network location, organization, names, passwords, and other personal information could be subject to unauthorized access.
Current state-of-the-art tools, including the CryptoPAN library for prefix-preserving anonymization of IP addresses, Bro, and tcpmkpub, allow anonymization of traces while preserving as much network-relevant data as possible. These anonymization methods effectively protect user privacy when used on a single trace from a single location, but they may leak information when used in many locations over time. In particular, the privacy provided by these methods depends on the use of cryptographic keys, and the risks of reusing these keys over time remain unknown.
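The prefix-preserving property can be illustrated with a short sketch: two addresses that share their first k bits map to anonymized addresses that also share exactly their first k bits. The code below is not the CryptoPAN library or its API. It is a minimal stand-in that, as an assumption made for brevity, uses HMAC-SHA256 from the Python standard library as the keyed function; the function names and the example key are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Flip each address bit according to a keyed function of the bits before it."""
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = []
    for i in range(32):
        # Assumption: HMAC-SHA256 stands in for the cipher a real tool would use.
        digest = hmac.new(key, bits[:i].encode(), hashlib.sha256).digest()
        flip = digest[0] & 1
        out.append(str(int(bits[i]) ^ flip))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    x = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - x.bit_length()


key = b"example-key"  # hypothetical key, for illustration only
a, b = "192.0.2.17", "192.0.2.200"
print(shared_prefix_len(a, b))  # 24
print(shared_prefix_len(anonymize_ipv4(a, key), anonymize_ipv4(b, key)))  # 24
```

Because the mapping is deterministic for a given key, traces anonymized with the same key at different sites or times remain comparable with one another. That same consistency is what makes key handling and key reuse a central question for distributed, longitudinal measurements.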
We seek to increase the volume of data available to the network and security research and development communities, facilitate access to novel datasets, and work toward real-time situational awareness of current Internet usage and threats. We hope to help develop best practices that data providers can follow to anonymize data from both single-site, single-measurement collections and distributed, longitudinal collections using prefix-preserving anonymization techniques.
CAIDA seeks funds for the following tasks:
- develop best practices for using current single-site, single-measurement anonymization techniques such as CryptoPAN prefix-preserving anonymization and tcpmkpub with distributed, longitudinal measurements;
- distribute the resulting anonymized measurements;
- publish the best practices;
- incorporate support for NetFlow and cFlow data, anonymized in accordance with the best practices, into the CoralReef Software Suite's realtime report generation tool.
Traffic characterization and classification
Motivation: Classifying Internet traffic into applications is important to defining service quality patterns, optimizing hardware and software design, traffic engineering, and perhaps of most recent importance, security systems. Despite a plethora of research devoted to traffic classification and a variety of proposed traffic classification methods, the research community still does not have satisfying answers to the most fundamental questions, including how to best identify traffic as a specific application. Rigorous comparison of various classification methods is challenging for three reasons. First, there is no publicly available payload trace set, so every method is evaluated using a different set of locally collected payload traces. Second, existing approaches use different techniques that track different features, tune different parameters and use different definitions and categorization of applications. Third, authors generally do not make their tools or data available once they publish their results.
To address these challenges, we have conducted a comprehensive and coherent evaluation of three traffic classification approaches (port-based, behavior-based, and statistical) and are undertaking a comparison of the best available tools in each category, using traces that are available to others so that they can reproduce the results.
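As a point of reference for the simplest of the three approaches, the sketch below labels flows purely by well-known port number and totals bytes per application. The port table is a small illustrative subset, and the flow tuples and function names are assumptions made for this example; it does not represent any of the tools evaluated.

```python
# Small illustrative subset of well-known server ports.
WELL_KNOWN_PORTS = {22: "ssh", 25: "smtp", 53: "dns", 80: "http", 443: "https"}


def classify_by_port(src_port, dst_port):
    """Label a flow by the first of its ports found in the well-known table."""
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return "unknown"


def application_mix(flows):
    """Sum bytes per application label; flows are (src_port, dst_port, bytes)."""
    totals = {}
    for src_port, dst_port, byte_count in flows:
        label = classify_by_port(src_port, dst_port)
        totals[label] = totals.get(label, 0) + byte_count
    return totals


# Hypothetical flow records: a web request, a DNS reply, and a transfer
# between two ports that are not in the table.
flows = [(51000, 80, 1200), (53, 40000, 300), (6881, 51413, 9000)]
print(application_mix(flows))  # {'http': 1200, 'dns': 300, 'unknown': 9000}
```

In this made-up sample most of the bytes end up labeled "unknown", which illustrates why applications that avoid fixed ports push classification toward behavior-based and statistical methods.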
Funding is needed to support the following tasks:
- gather additional traces to support traffic classification research;
- evaluate existing methods of traffic classification and propose more efficient approaches;
- develop and refine methods to identify peer-to-peer (p2p) traffic that does not use fixed port numbers;
- develop self-tuning traffic measurement algorithms that are robust in the face of anomalous traffic patterns, e.g., port scans and DoS attacks.
Domain Name System (DNS) monitoring and analysis
Motivation: DNS is a critical infrastructure service whose efficiency and robustness are crucial for the operation of the global Internet. CAIDA pursues long-term research and analysis of DNS workload, performance, and integrity in the face of relentless growth of DNS server and client populations.
In 2007-2008, CAIDA's DNS activities receive support from an NSF ITR grant. We will complete the following funded tasks by mid-2008:
- collect raw data sets on DNS workload and performance and make them available to other researchers;
- using measurement and simulation, investigate implications and effects of anycast on DNS root server operation;
- using data collected in 2006 and 2007, complete the influence map of DNS Root Anycast Servers;
- initial analysis of the 2007 DITL-DNS data;
- regular surveys of characteristics of DNS servers deployed in the wide area Internet, e.g., server software and vulnerabilities;
- archive long-term performance data on root/gTLD RTT Dataset;
- develop tools for high performance monitoring of busy DNS servers including: dsc, dnstop, dnsdump, and dnscap.
Additional funds are needed to:
- continue long-term data collection and analysis on the root name server system;
- integrate passive DNS collectors from a variety of projects and build web interface to access salient statistics;
- automate DNS influence map creation;
- extend survey of DNS software to include EDNS0 and IPv6 support, as well as broader range of DNS servers in the IPv4 and IPv6 topology;
- integrate DNS knowledge into our topology data sets.
Analysis of Internet Identifier consumption and characteristics
Motivation: The impending exhaustion of the IPv4 address space and inevitable transition toward IPv6 require urgent attention to issues in not only the technical but also economic and socio-political domains. To support informed public policy discussion and development, CAIDA provides objective analysis of relevant issues such as concentration of address ownership, and IPv6 adoption and usage.
Through 2007, this work is supported by a donation from ARIN. The remaining funded tasks are:
- develop methodologies to identify public-facing IPv6 servers and determine their prevalence in the ARIN region;
- execute a preliminary probe to determine the prevalence of IPv6-responding servers (http, dns) in the ARIN region;
- compare IPv4 and IPv6 performance for a sample set of responding servers.
Additional funds are required to:
- compare structure and characteristics of observed IPv4 and IPv6 topology;
- correlate performance and topology characteristics where applicable;
- build a system for ongoing topology and performance monitoring of IPv6-responding servers;
- host an interdisciplinary scenario planning workshop to discuss IPv4 and IPv6 futures.
The table below summarizes the status of the projects listed in Section I.
|Routing||Toward mathematically rigorous next-generation routing protocols for realistic network topologies||NSF NeTS||Oct 04 - Sep 07||active|
|Routing||Greedy Routing on Hidden Metric Spaces as a Foundation of Scalable Routing Architectures Without Topology Updates||NSF FIND||Oct 07 - Sep 10||active|
|Topology||Toward Mathematically Rigorous Next-generation Routing Protocols for Realistic Network Topologies||NSF NeTS||Oct 04 - Sep 07||active|
|Topology||Generating Realistic Network Traffic and Topologies||NSF-NBD||Oct 06 - Sep 09||active|
|Topology||Generating Realistic Network Traffic and Topologies||CNS||Feb 07 - Jan 09||active|
|Topology||Cybersecurity: Leveraging the Science and Technology of Internet Mapping for Homeland Security||DHS S&T||Aug 08 - Sep 10||active|
|Evolution||Toward Mathematically Rigorous Next-generation Routing Protocols for Realistic Network Topologies||NSF NeTS||Oct 04 - Sep 07||active|
|Security||Cybertrust Center for Internet Epidemiology and Defenses||NSF||Oct 04 - Sep 09||active|
|Traffic classification||Correlating Heterogeneous Measurement Data to Achieve System-level Analysis of Internet Traffic Trends||NSF||Oct 04 - Sep 07||ended|
|DNS||Improving the Integrity of DNS Monitoring and Protection||NSF ITR||Sep 04 - Aug 07||active|
|IPv4 and IPv6 address space||ARIN||Oct 04 - Sep 07||active|
II. Infrastructure for Data Procurement
The research community suffers from a lack of coherent data sets for cross-domain analysis of properties, performance, and dynamics of the wide-area Internet. Since its foundation in 1998, CAIDA has pioneered the collection of commodity Internet measurements and the sharing of the collected data with researchers. We plan to continue our investment in data procurement to improve the integrity of Internet science, while navigating the associated technological, legal, and ethical challenges. Current CAIDA infrastructure projects are listed below.
DatCat - Internet Measurement Data Catalog
Motivation: The Internet Measurement Data Catalog (DatCat) indexes Internet measurement data, allowing researchers, faculty, and students to find, annotate, cite, and share data. The goals of the catalog are:
- to facilitate searching for and sharing of data among researchers
- to enhance documentation of datasets via a public annotation system
- to advance network science by promoting reproducible research
In June 2006, after we had seeded DatCat with CAIDA data sets, we opened the catalog for public browsing. By July 2007, DatCat contained more than 82 thousand data items for a total of more than 9 TB of data from 19 organizations. We strongly encourage Internet researchers to use DatCat to (1) find real data to help validate their research and (2) share their data with others.
Additional funds are required to:
- continue ongoing indexing of newly collected CAIDA data;
- assist outside researchers with cataloging their data;
- design and implement versatile DatCat submission tools;
- develop tools to automatically analyze datasets and generate annotations for data in DatCat;
- open the catalog for public contribution;
- refine and customize the DatCat browse and search interfaces;
- develop a sophisticated catalog output API to allow external tools to manipulate search results;
- expand DatCat capabilities to enable cataloging tools in addition to data;
- develop techniques to analyze DatCat meta-data to answer overarching Internet research questions;
- promote widespread use of the DatCat to the research community;
- hold workshops helping researchers to index their data in DatCat;
- hold workshops demonstrating the use of DatCat and Internet data in general in undergraduate- and graduate-level classes;
- maintain, improve, and grow the database in the future.
Archipelago (Ark) - Community Measurement Platform
Motivation: Several years ago the network research community acknowledged the need for a community-oriented measurement platform that achieves greater scalability and flexibility than existing infrastructures, encourages collaboration among traditionally independent groups, and allows previously unattainable, groundbreaking research on cyberinfrastructure itself. Several challenges face the development of such a platform: securing sustainable infrastructure funding, navigating legal and data ownership issues, and protecting the network from experiments that might cause harm. In 2006, armed with cautious optimism and joint funding from NSF and NSA, CAIDA started development of Archipelago (Ark) as a major step toward a community-oriented measurement platform.
Ark is a distributed experimental system representing the post-skitter generation of CAIDA active measurement infrastructure. We designed Ark to serve as a security-hardened platform where trusted collaborators can run vetted experiments while the general public can also participate in more restricted measurements and viewing of results. In 2007, in collaboration with WAND, a research group at the University of Waikato Computer Science Department, we are testing the scamper tool on Ark monitors to capture macroscopic topology and performance data across a large cross-section of IPv4 and IPv6 space. We will complete the following funded tasks by mid-2008:
- develop software for communication between and management of remote Ark monitors;
- develop software for dealing with scheduling, resource utilization, and load balancing requirements of core measurement tasks;
- resolve infrastructure security issues;
- test solutions for data transfers, handling, and storage;
- upgrade existing skitter monitors to work with the new platform.
Additional funds are needed to:
- deploy several dozen new monitors worldwide, including in countries that never had a monitor before;
- provide a high-level API scripting language for writing measurement programs;
- test other applications on Ark, e.g., bandwidth estimation, DNS surveys, comprehensive IPv4 and IPv6 topology measurements, and security tools.
Passive monitors
Motivation: Access to real data collected at the Internet backbone links is vital to large-scale network empirical analysis, modeling, security, policy, and architecture development. Unfortunately, the expense of the monitoring equipment (which must be upgraded every few years to keep up with changes in the underlying infrastructure), the difficulty of coordinating deployment with remote volunteers, and security, privacy, and legal concerns limit the available data. Current traces from high-speed Internet links have significant restrictions on their use and are not widely available. CAIDA is in the process of deploying four new OC192 monitors to provide current data from backbone and peering point links to the research community.
As of 2007, CAIDA's work on strategic deployment of high-speed passive monitors receives funding from an NSF CRI grant, although NSF was able to provide only 25% of the requested budget. We are expending effort on the following proposed tasks for 2007-2010, though additional funding is needed to complete them:
- set up long-term high-speed traffic monitors on backbone Internet links;
- set up traffic monitors on public, research, and municipal networks;
- integrate visualization tools with performance data collection to achieve real time visualization;
- expand coordinated, real-time traffic monitors that provide dynamic web pages showing real-time traffic reports.
Cooperative Measurement and Modeling of Open Networked Systems (COMMONS)
Motivation: We recognize the need to explore novel approaches in order to address and avert three crises facing information infrastructure in the United States: the worsening digital divide, the lack of scientific integrity in network research, and the inability to empirically inform policy decisions at a critical juncture in telecommunications history.
The Cooperative Measurement and Modeling of Open Networked Systems (COMMONS) initiative proposes the following trade: the academic community offers bandwidth and technology resources to emerging community and municipal networks in exchange for access to privacy-respecting measurements to support the research community. Participating networks will allow researchers access to historical and current operational data -- with appropriate anonymization and other privacy-respecting guards -- and will agree to permit and/or participate in openly reviewed experiments required to test new technologies. Funding is needed to support the following tasks:
- develop tools to facilitate privacy-respecting measurement on emerging community and municipal wireless infrastructure;
- test these tools on supporting community networks;
- index gathered data into DatCat;
- build a geographic map of broadband Internet deployment, including penetration, peering, pricing, and performance;
- calibrate the map against our own long-term measurements of topological structure and macroscopic performance, and other related measurements;
- organize workshops to disseminate the data and research results to operators and network researchers.
Protected Repository for the Defense of Infrastructure Against Cyber Threats (PREDICT)
Motivation: Current data on Internet security threats and baseline Internet traffic are required for the development of hardware and software that protects against and mitigates the effects of current malicious software. Yet such data is inherently sensitive in terms of privacy, security, proprietary information, and legal risk. Thus there are few datasets available for the development and testing of defensive technologies.
The Department of Homeland Security (DHS) has established the Protected Repository for the Defense of Infrastructure Against Cyber Threats (PREDICT) to provide vetted researchers with current network operational data in a secure and controlled manner that respects the privacy, legal, and economic concerns of Internet users and network operators.
CAIDA has been involved with the development of the PREDICT program since its inception; CAIDA personnel have served in an advisory capacity on all committees developing and implementing PREDICT processes and procedures. CAIDA participates in the PREDICT program as a Data Provider via the collection of routing data, peering point passive traces, and data from the UCSD Network Telescope. CAIDA is also a Data Host, serving that data to researchers who have been vetted and approved through the PREDICT program. By virtue of these Data Host and Data Provider roles, a CAIDA representative serves on the PREDICT Application Review and Publication Review Boards. The main funded tasks include:
- collect, document, anonymize, and distribute routing, peering point, and UCSD Network Telescope data;
- continue to advise on technical, legal, and practical aspects of PREDICT policies and procedures.
Day in the Life of the Internet (DITL)
Motivation: The U.S. National Academy of Sciences challenged the research community to develop the means to capture a day in the life of the Internet. Such a comprehensive measurement would help researchers uncover trends, validate simulations of new architectures and protocols, and constitute a general baseline for evaluating the implications of introducing new ideas into the network.
In January 2006 and January 2007 CAIDA partnered with ISC to coordinate a 48-hour simultaneous measurement event on dozens of root server anycast nodes. In 2007 participants of the experiment also collected packet traces at a few backbone links in various locations. To the best of our knowledge, this event delivered the largest scale simultaneous collection of data from a core component of the global Internet infrastructure ever made available to academic researchers. Based on the success of this preliminary DITL experiment, we have requested funding to support the following tasks:
- conduct several annual DITL data collection events;
- further refine measurement supporting software technologies;
- develop strategic tools for measurement, analysis, visualization, and indexing of resulting data sets.
Additional funds are needed to support creative outreach for disseminating DITL data to the academic community:
- develop educational materials based on DITL data;
- organize series of hands-on workshops teaching examples of methodologically sound DITL data analysis to faculty and students;
- publish case studies derived from DITL data.
Geolocation of Internet resources
Motivation: Accurately identifying the geographic location of network objects is critical to projects in all of CAIDA's focus areas.
This program area currently receives no specific support. However, CAIDA will try to continue its support of publicly accessible technology for the mapping of AS numbers to geographic locations according to the main regional Internet registries (RIRs): ARIN, APNIC, LACNIC, and RIPE. Our strategies and techniques for determining geolocation of Internet resources include parsing registry databases and automated name recognition in ISPs' host naming patterns. We also intend to develop heuristics for integrating available techniques.
The table below summarizes the status of the projects listed in Section II.
|DatCat||Improvement of Contribution / Curation Tools for DatCat||NSF CRI||in development|
|Archipelago||Community-Oriented Network Measurement Infrastructure||NSF CRI||Oct 06 - Sep 09||active|
|Passive monitors||Community-Oriented Network Measurement Infrastructure||NSF CRI||Oct 06 - Sep 09||active|
|COMMONS||Strategic Measurement and Modeling Technologies for Community Cyberinfrastructure||NSF STCI||submitted|
|PREDICT||Network Traffic Data Repository to Develop Secure IT Infrastructure||DHS||active|
|Day in the life of the Internet||Improving the Integrity of DNS Monitoring and Protection||NSF ITR||Sep 04 - Aug 07||active|
|Day in the life of the Internet||Strategic Measurement and Modeling Technologies for Community Cyberinfrastructure||NSF STCI||submitted|
|Geolocation of Internet resources||not funded|
III. Measurements and Data
Collection of data for scientific analysis of network function is one of CAIDA's core objectives. We consider measurement tasks to be a valuable component of all our projects and are constantly seeking better technical and methodological solutions to the challenges of Internet measurements.
Ongoing Data Collections
As of 2007, we are funded to continue collecting the following data which we provide to the research community:
- Raw Topology Traces - currently we use the skitter tool to probe the IPv4 address space and the scamper tool to probe the IPv6 address space. In the near future we plan to upgrade our measurement infrastructure to the Archipelago active measurement system.
- AS adjacencies - we filter and aggregate skitter data to compute the adjacency matrix of the Internet AS-level graph on a daily basis.
- AS relationships - we derive AS graph links from RouteViews BGP table snapshots taken at 8-hour intervals over a 5-day period and annotate the links as customer-provider, peer-to-peer, or sibling-to-sibling.
- AS ranking - this interactive CGI script computes degree-based and AS-relationship-based ranking of ASes.
- DNS root/gTLD RTT data - NeTraMet traffic monitors continuously collect round-trip times to DNS root and gTLD servers and aggregate them in 5-minute intervals.
- Denial-of-Service Backscatter Data - The dataset consists of quarterly week-long collections of responses to spoofed traffic sent by denial-of-service attack victims and received by the UCSD Network Telescope. The Backscatter-TOCS, Backscatter-2004-2005, Backscatter-2006, and Backscatter-2007 datasets provide six years of denial-of-service backscatter data to Internet researchers.
- Witty Internet Worm - the first five days of the spread of the Witty Internet worm, as monitored by the UCSD Network Telescope between Fri Mar 19 20:01:40 PST 2004 and Wed Mar 24 23:01:40 PST 2004.
- Code-Red Worms - the first twenty-one days of the spread of the Code-Red version 2 and CodeRedII Internet worms, as monitored by the UCSD Network Telescope between July 19-20 and August 1-20, 2001.
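As an illustration of the AS adjacencies and degree-based AS ranking items above, the following is a minimal sketch, not CAIDA's actual pipeline. It assumes the topology traces have already been mapped to AS-level paths; the function names `as_adjacencies` and `rank_by_degree` and the sample paths are hypothetical, and the AS numbers come from the range reserved for documentation.

```python
from collections import defaultdict


def as_adjacencies(as_paths):
    """Return a dict mapping each AS to the set of its neighbor ASes."""
    neighbors = defaultdict(set)
    for path in as_paths:
        # Collapse consecutive repeats: several hops inside one AS are not a link.
        collapsed = [asn for i, asn in enumerate(path)
                     if i == 0 or asn != path[i - 1]]
        for a, b in zip(collapsed, collapsed[1:]):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


def rank_by_degree(neighbors):
    """Return (asn, degree) pairs sorted by descending degree, then by ASN."""
    return sorted(((asn, len(peers)) for asn, peers in neighbors.items()),
                  key=lambda item: (-item[1], item[0]))


# Hypothetical AS-level paths (illustrative only, not measured data).
paths = [
    [64500, 64501, 64501, 64502],
    [64500, 64501, 64503],
    [64504, 64502],
]
adj = as_adjacencies(paths)
ranking = rank_by_degree(adj)
for asn, degree in ranking:
    print(asn, degree)
```

The neighbor sets are a sparse form of the adjacency matrix: an entry is present exactly where the matrix would hold a 1. A relationship-based ranking would additionally need the customer-provider annotations, which this sketch does not model.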
We also continue to provide previously collected data:
- Passive OC48 Peering Point traces -- data collected from two peering points for major US backbone Internet Service Providers (ISPs).
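The 5-minute aggregation mentioned for the DNS root/gTLD RTT data can be sketched as follows. This is only an illustration under stated assumptions: the input is taken to be (timestamp in seconds, server label, RTT in milliseconds) tuples, the summary statistics (count, minimum, median, maximum) are our choice rather than what NeTraMet records, and the server labels and values are made up.

```python
from collections import defaultdict
from statistics import median

BIN_SECONDS = 300  # 5-minute intervals


def aggregate_rtts(samples, bin_seconds=BIN_SECONDS):
    """Group (timestamp, server, rtt_ms) samples by server and time bin.

    Returns {(server, bin_start): (count, min, median, max)}.
    """
    bins = defaultdict(list)
    for ts, server, rtt in samples:
        # Round the timestamp down to the start of its bin.
        bin_start = int(ts) - int(ts) % bin_seconds
        bins[(server, bin_start)].append(rtt)
    return {key: (len(v), min(v), median(v), max(v))
            for key, v in bins.items()}


# Hypothetical samples (illustrative only, not measured data).
samples = [
    (0, "root-a", 20.0),
    (120, "root-a", 30.0),
    (299, "root-a", 40.0),
    (300, "root-a", 50.0),
    (10, "gtld-b", 15.0),
]
summary = aggregate_rtts(samples)
for key in sorted(summary):
    print(key, summary[key])
```

Keying each bin by its start time keeps the output easy to join with other time series that use the same interval.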
Proposed Data Collections
We are constantly looking for support to conduct novel, previously unattainable, measurements of Internet characteristics. We require additional funds to:
- collect passive traces on strategic backbone links;
- establish and continue periodic DITL collection events;
- conduct measurements of IPv4 and IPv6 topology and performance over several years;
- improve DNS mapping for topology data;
- measure broadband penetration, pricing, peering, and performance in the US;
- improve capabilities for identifying the geographic location of IP addresses;
- automate timely lookup and storage of DNS hostnames and geographic locations of IP addresses seen in Internet measurements;
- develop additional datasets for Internet viruses and worms;
- collect data on queries to spam blacklist servers; and
- anonymize, process, and distribute raw traces from the UCSD Network Telescope.
Supporting Tool Development
Building and maintaining software tools to measure, analyze, and model various Internet characteristics is an important part of CAIDA activities. CAIDA needs more funds to maintain and update existing tools and to continue developing new, better tools for the Internet research community.
Existing tools include:
- workload characterization
- CoralReef - a comprehensive software suite to collect, analyze, visualize, and display trace and flow data from passive Internet traffic monitors
We would like to extend CoralReef with:
- increased support for multi-site, coordinated, privacy-protecting anonymization techniques;
- improved IPv6 support; and
- integrated techniques for traffic classification.
- NeTraMet - an implementation of the RTFM architecture for Network Traffic Flow Measurement. We are conducting this work in collaboration with The University of Auckland.
- topology measurement
- Archipelago (Ark) - CAIDA's next-generation active measurement infrastructure. Ark represents an evolution of the skitter infrastructure that has served the network research community for more than 8 years.
We would like to extend the Ark infrastructure:
- to deploy additional measurement nodes in topologically interesting locations;
- to allow collaborators to run their vetted measurement tasks;
- to allow the general public to run highly restricted measurements; and
- through further development of its high-level API and scripting language.
- scamper - a program for conducting Internet measurement tasks to large numbers of IPv4 and IPv6 addresses, in parallel, to fill a specified packets-per-second rate. In collaboration with WAND, a research group at the University of Waikato Computer Science Department, we would like to extend this software to enable:
- comparison of the IPv4 and IPv6 performance for a given server,
- comparison of data collected with five traceroute methods (traditional UDP traceroute, ICMP traceroute, Paris UDP, Paris ICMP, and TCP) according to metrics such as destinations reached, complete IP paths, and AS links inferred for each,
- BGP guided doubletree support,
- non-blocking name resolution for each discovered IP address, and
- improved alias resolution techniques.
- iffinder - identifies interfaces belonging to the same router. We would like to extend this software to enable the integration of other known techniques for alias resolution.
- The warts file format - a C++ class library used by the scamper tool mentioned above, which extends the previous arts++ file format. We would like to extend this software with complete documentation and additional tools for reading, writing, and processing files using this format.
- geolocation of IP addresses
- NetGeo - a database and collection of Perl scripts to map IP addresses and AS numbers to geographic locations (not maintained)
- Countries.pm - a CoralReef module allowing conversion between country code abbreviations, country names, and continents
- Walrus - interactively visualizes large directed graphs in 3D space
- Otter - visualizes arbitrary network data expressed as a set of nodes, links or paths
- GeoPlot - creates a geographical image of an arbitrary network data set
- plot-latlong - simple tool for plotting lat/long points on geographic maps
- LibSea - a Java library for representing large directed graphs
- PlotPaths - displays forward and reverse network paths from a single source to one or more destinations
- beluga - plots RTTs and packet loss to all IP hops along a specified forward IP path
- cuttlefish - produces animated GIFs that elucidate diurnal and geographical patterns of displayed data.
- DNS research
Staff and Support
CAIDA currently employs 15 researchers and support staff based at SDSC; 3 remotely based staff/consultants; 3 undergraduate student workers; and 4 graduate student researchers.
CAIDA garnered significant corporate support through its Membership program during the Internet bubble and lost several members when that bubble burst. Currently, the following organizations have made designated gifts to support CAIDA activities:
- Cisco Systems -- the worldwide leader in networking for the Internet.
- WIDE -- a consortium of Japanese research organizations and companies working to establish a Widely Integrated Distributed Environment.
- ARIN -- American Registry for Internet Numbers.
- Endace -- the only company in the world that specializes in building high performance PCI cards for remote network monitoring and surveillance applications. The range of their products covers almost every physical layer at every network speed up to OC192 and 10GigE.
- Limelight Networks -- a leading provider of high-performance content delivery network services.
- Digital Envoy -- a provider of IP intelligence solutions for geolocation and improved customer interactions.
Designated gifts to CAIDA enable us to maximize use of research dollars. CAIDA could not survive without the generosity of its sponsors.
For further information, please send a message to