CAIDA 2010-2013 Program Plan

A summary of research goals and plans for the period from January 2010 through March 2013.


Mission Statement: The Cooperative Association for Internet Data Analysis (CAIDA) is an independent analysis and research group based at the University of California's San Diego Supercomputer Center. CAIDA investigates both practical and theoretical aspects of the Internet, with particular focus on:

  • collection, curation, analysis, visualization, and dissemination of sets of the best available Internet data,
  • providing macroscopic insight into the behavior of Internet infrastructure worldwide,
  • improving the integrity of the field of Internet science,
  • improving the integrity of operational Internet measurement and management,
  • informing science, technology, and communications public policies.

Program Areas

CAIDA is actively engaged in the following three main program areas:

Program Area | Goal
I. Research and Analysis | Analyze and model pertinent features and trends of current Internet usage, develop novel approaches to enable future Internet growth.
II. Measurements, Data Procurement, and Curation | Create state-of-the-art infrastructure for measurements, data procurement, and curation; conduct measurements for comprehensive characterization of the Internet.
III. Data and Tools | Provide best available datasets and analysis tools to the research community.

Executive Summary

This program plan outlines CAIDA's anticipated activities for 2010-2013 in the areas of research, infrastructure, and data collection to support the research community. For over twelve years, CAIDA has provided a neutral framework for cooperative research, tool, and infrastructure development to support Internet measurement and analysis based on the best available empirical data. Our current research projects, primarily funded by the U.S. National Science Foundation (NSF) and the U.S. Department of Homeland Security (DHS) Science and Technology Directorate, include both measurement-based and theoretical studies of the Internet's core infrastructure, with emphasis on the health and integrity of the global Internet. We are supporting DHS's PREDICT program with several projects: macroscopic traffic data collection and analysis; advancing data disclosure control techniques and methods; and developing ethical guidelines for conducting cybersecurity research. Our infrastructure activities include developing and deploying an active measurement platform that cost-effectively supports global Internet research and security vulnerability analysis. We also continue to lead and participate in tool development to support measurement, analysis, indexing, and dissemination of data from operational global Internet infrastructure. Finally, we will continue our outreach activities, including our web sites, peer-reviewed papers, workshops, blogging, presentations, and technical reports.


I. Research and Analysis

CAIDA is involved in a variety of research activities spanning many domains related to network science and engineering. We seek to discover models that characterize fundamental behavior of the Internet as an evolving complex system and are capable of predicting salient aspects of future evolution. We also aim to reliably measure and analyze the current state of the Internet in order to validate our theoretical research against empirical data. Specific projects are listed below, with funding status summarized at the end of each section.

  1. Routing


    Motivation: Routing information is the most basic and, perhaps, the most complicated function that networks perform. Conventional wisdom states that to find paths to destinations through the complex network maze, nodes must communicate and exchange information about the status of their connections to other nodes, since without some knowledge of changing network connectivity, it is not possible to successfully route information through the network.

    In the Internet, this required inter-node communication makes routing both expensive and fragile. The recent Internet Architecture Board report on routing and addressing (RFC4984) identifies convergence costs of deployed routing protocols as one of the most serious scaling issues with the existing Internet routing architecture, aggravated by explosive rates of routing table size growth. Worse yet, the required number of messages for routing state to converge on Internet-like topologies cannot scale better than linearly with network size for any routing algorithm.

    Our routing research efforts aim to develop truly scalable routing algorithms that do not require global knowledge of the network topology. In this project, funded by NSF's Network Science and Engineering (NetSE) program, we will apply our previously developed theoretical framework of hidden metric spaces (HMS) underlying real networks to investigate the efficiency of greedy forwarding mechanisms exploiting the structural properties of the Internet topology. Our research agenda builds on what we have already confirmed: HMSes do underlie real complex network topologies, including the Internet, and make such networks naturally navigable. Even more impressive, greedy routing paths in such topologies are almost always shortest, and successful with a high probability. Next, we will explore more sophisticated models of HMSes, find out how nodes in real networks compute and compare their HMS coordinates, and how network dynamics including link and node failures affect the efficiency of greedy forwarding.

    Currently funded tasks include:

    • show that negatively curved (hyperbolic) HMS are the most congruent with complex networks and explain their common structural properties;
    • prove that all greedy paths in hyperbolic spaces are shortest and successful;
    • verify that HMS underlying real networks are hyperbolic and measure their basic geometric properties;
    • construct embeddings of real networks into the identified hyperbolic spaces.
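
    As an illustration of the greedy forwarding idea described above, the following is a minimal sketch, not our production code or the models developed in this project. It assumes each node already has polar coordinates (r, theta) in a hyperbolic plane of curvature -1, and that a node forwards a packet to whichever neighbor is closest to the destination in that space. The toy topology, coordinates, and function names are hypothetical.

```python
import math


def hyperbolic_distance(a, b):
    """Distance between points a and b, each given as (r, theta) polar
    coordinates in the hyperbolic plane of curvature -1, computed with
    the hyperbolic law of cosines."""
    r1, t1 = a
    r2, t2 = b
    x = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(t1 - t2))
    # Guard against rounding that pushes x slightly below 1.
    return math.acosh(max(x, 1.0))


def greedy_route(adjacency, coords, source, target):
    """Forward greedily from source to target using only local knowledge:
    each node knows its neighbors' coordinates and the target's coordinates.
    Returns the list of visited nodes, or None if the packet reaches a node
    with no neighbor closer to the target (a local minimum)."""
    path = [source]
    current = source
    while current != target:
        neighbors = adjacency[current]
        if not neighbors:
            return None
        here = hyperbolic_distance(coords[current], coords[target])
        nxt = min(neighbors,
                  key=lambda n: hyperbolic_distance(coords[n], coords[target]))
        if hyperbolic_distance(coords[nxt], coords[target]) >= here:
            return None  # stuck: greedy forwarding fails here
        path.append(nxt)
        current = nxt
    return path


# Hypothetical toy network: one hub near the origin, four peripheral nodes.
coords = {
    "h": (0.5, 0.0),
    "a": (3.0, 0.2),
    "b": (3.0, 3.0),
    "c": (3.0, 1.5),
    "d": (3.0, 2.9),
}
adjacency = {
    "a": ["h"],
    "h": ["a", "b", "c"],
    "b": ["h"],
    "c": ["h", "d"],
    "d": ["c"],
}

print(greedy_route(adjacency, coords, "a", "b"))
print(greedy_route(adjacency, coords, "a", "d"))
```

    No node in this sketch holds a routing table or a view of the global topology. The second call shows how a greedy path can get stuck in a local minimum on a badly embedded toy graph, which is why the quality of the embedding matters.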

  2. Topology


    Motivation: We now critically depend on the Internet for our professional, personal, and political lives - and yet our comprehension of its underlying structure and dynamics remains disturbingly insufficient. Fundamental characteristics of the Internet topology are perpetually challenging to measure, research, and analyze, for technical as well as policy and economic reasons.

    With previous NSF and DHS funding, we designed, implemented, deployed, and now operate Ark - a secure platform capable of performing various types of Internet infrastructure measurements and assessments. To improve our understanding of Internet topology structure and behavior, we will integrate state-of-the-art measurement data provided by Ark and our novel analysis methodologies including IP alias resolution techniques, conversion of IP/router to AS-level topology graphs, AS taxonomy and relationship inference, geolocation of IP resources, and various informative graph visualizations.

    Currently funded tasks include:

    • collect annotated macroscopic IPv4 topology data;
    • improve and apply best available IP alias resolution techniques to build router-level graphs of the Internet;
    • improve and apply best available AS relationship inference techniques to create annotated AS-level graphs of the Internet;
    • conduct comparison of topology datasets obtained by different measurement methods and at different levels of granularity;
    • develop software to merge router-level and AS-level graphs into annotated dual Internet topologies;
    • analyze existing methodologies for geolocation of Internet objects and augment our topology graphs with geographic annotations;
    • develop a generator to produce annotated dual Internet topologies for simulations;
    • improve the accuracy of our AS ranking service;
    • develop software and interactive visualizations enabling validation of our ranking inferences;
    • conduct a pilot study on integrating Ark topology measurements and real-time BGPMON routing data;
    • expand collection of IPv6 topology data;

    Additional funds are needed to:

    • develop efficient IPv6 topology probing algorithms;
    • create packages of data about connectivity, routing, and latency gathered from a large cross-section of the global Internet, intended as an educational aid;
    • cross-validate sources of topology and performance data from other projects, including the FCC's recently announced broadband measurement program.

  3. Internet Economics


    Motivation: The high-level objective of this research is to create a scientific basis for modeling Internet interdomain interconnection and dynamics. Specifically, we aim to understand the structure and dynamics of the Internet ecosystem from an economic perspective, capturing relevant interactions between network business relations, internetwork topology, routing policies, and resulting interdomain traffic flow. Existing models fail to capture the complexity of network interactions, or they are not parameterized with real-world data, leaving both fields of network architecture and communications policy research mostly groping in the dark while facing critical related transitions.

    We propose to create powerful, empirically parameterized computational models, and enable broader validation than previously possible. We will use measurements of interdomain traffic, topology dynamics, routing policies and peering practices as input to our detailed model of AS interconnection, and compute the equilibrium -- a state where no network has the incentive to change its connectivity. The key difference from previous modeling efforts is that we parameterize each component of our model using real-world measurements. To validate our model, we will verify that it can reproduce known macroscopic properties of the Internet AS topology as well as known trends in Internet evolution, based on publicly available financial and topological data. We will then use our model to study various interconnection practices, the stability and dynamics of interdomain links, and economic properties of the resulting equilibrium.

    Currently funded tasks include:

    • measure interdomain traffic characteristics directly from different vantage points on the Internet;
    • measure structural characteristics of the Internet's interdomain topology, its evolution over time, and the economic implications of these properties;
    • measure interdomain routing policies used by networks and the economic incentives behind those policies;
    • measure policies used by different network types as inferred from publicly available information such as peeringDB and Internet Exchange Points (IXPs);
    • develop a simulator based on the developed evolution model and parameterize it using results from the aforementioned measurement studies;
    • validate our model of AS interconnection and dynamics by reproducing known macroscopic properties of the Internet AS topology;
    • use historical, publicly available financial and topological data to verify that our model can reproduce known trends in the evolution of the Internet;
    • use our model to answer various what-if scenarios about the evolution of the Internet, such as the effect of changing traffic patterns, changing price/cost structures, increasing popularity of Internet Exchange Points (IXPs) and the increased use of paid-peering.

  4. Data Analysis and Sharing Methods to Support Security Research


    Motivation: The UCSD network telescope is a portion of routed IP address space on which little or no legitimate traffic exists ("darkspace"). Observing the unsolicited Internet traffic reaching such unoccupied address space, traffic which has increased dramatically over the last year, allows visibility into a wide range of security-related events, including misconfiguration, scanning by hackers looking for vulnerable targets, backscatter from random source denial-of-service attacks, and automated spread of malicious software such as Internet worms or viruses. Learning more about the nature and characteristics of the still growing unsolicited traffic across the Internet, and how it compares across different segments of address space, will allow researchers to develop efficient mitigation strategies.

    We aim to design and implement novel methods, protocols and tools to enable researchers to more effectively study current and emerging patterns in spurious and malicious traffic reaching observable darkspace.

    The funded tasks include:

    • maintenance of the UCSD network telescope passive data collection system;
    • collection, documentation, storage, and distribution of requested telescope data to vetted researchers;
    • build infrastructure to retain an approximately 30 day window of telescope data;
    • create front-end software for the telescope's ongoing data collection to inform potential data users of interesting security events;
    • automate the heatmap visualization technique to display penetration of a given vulnerability into the IPv4 address space;
    • develop a scheme to classify the observed traffic into currently-known source types (e.g. DoS attacks, port or host scans, vulnerability scans of particular ports, etc.);
    • define sustainable policy for enabling real-time sharing of telescope data;
    • design and deploy a process for allowing multiple vetted researchers to run their analysis programs on the data within approximately an hour of its collection ("moving-code-to-the-data" type of approach);
    • enable tests for specific events/malware;
    • identify a set of attributes for hosts sending unsolicited traffic simple enough to allow real-time computation.

  5. Domain Name System (DNS) monitoring and analysis


    Motivation: The Domain Name System (DNS) is a crucial component of today's Internet. The top layer of the DNS hierarchy (the root name servers) is facing dramatic changes: cryptographically signing the root zone with DNSSEC, deploying Internationalized Top-Level Domain (TLD) Names (IDNs), and adding other new global Top-Level Domains (TLDs). ICANN has stated plans to deploy all of these changes in the next year or two, and there is growing interest in measurement, testing, and provisioning for foreseen (or unforeseen) complications. With past NSF funding, CAIDA has been collecting relevant DNS data and conducting research and analysis of DNS workload, stability, and performance. The resulting statistics serve as a baseline for the impending transition to DNSSEC.

    We seek to apply lessons learned from our global trace collection experiments, including improvements to future measurements that will help answer critical questions in the evolving DNS landscape.

    Additional funds are needed to:

    • collect data sets on DNS workload and performance and make them available to other researchers;
    • investigate, using measurements and simulations, the implications and effects of new IDN ccTLDs, DNSSEC, and IPv6 on DNS root server operation.


The table below summarizes the funding status of the projects listed in Section I.

Project | Proposal title | Agency/Program | Period | Status
Routing | Greedy Routing on Hidden Metric Spaces as a Foundation of Scalable Routing Architectures Without Topology Updates | NSF FIND | Oct 2007 - Sep 2010 | active
 | Discovering Hyperbolic Metric Spaces Hidden beneath the Internet and Other Complex Networks | NSF NetSE | Mar 2010 - Feb 2013 | active
Topology | Cybersecurity: Leveraging the Science and Technology of Internet Mapping for Homeland Security | DHS S&T | Aug 2008 - Mar 2011 | active
 | CRI-ADDO-EN: Internet Laboratory for Empirical Network Science (iLENS) | NSF CRI | Mar 2010 - Feb 2013 | active
Internet Economics | NetSE: Small: Collaborative Research: The economics of transit and peering interconnections in the Internet | NSF NetSE | Aug 2010 - July 2013 | active
Data Analysis and Sharing Methods to Support Security Research | Supporting Research and Development of Security Technologies through Network and Security Data Collection | DHS PREDICT | Aug 2007 - Jul 2012 | active
Domain Name System (DNS) Monitoring and Analysis | | | | not funded

II. Measurements, Data Procurement, and Curation

The research community continues to suffer from a lack of coherent data sets for cross-domain analysis of properties, performance, and dynamics of the wide-area Internet. Since its foundation in 1998, CAIDA has pioneered commodity Internet measurements and made the collected data available to researchers. We plan to continue our investment in data procurement to improve the integrity of Internet science, while navigating the associated technology, legal, and ethical challenges. Current CAIDA infrastructure projects are listed in this section of the Program Plan.

  1. Archipelago (Ark) - Community Measurement Platform


    Motivation: The network research community has long acknowledged the need for a community-oriented measurement platform that would achieve greater scalability and flexibility than existing infrastructures, encourage collaboration among traditionally independent groups and allow previously unattainable groundbreaking research on cyberinfrastructure itself. Now, in its third year of production, the Archipelago (Ark) measurement platform has made measurable progress toward these goals.

    Funded by NSF and DHS, Archipelago (Ark) is CAIDA's newest active measurement infrastructure. It consists of several dozen standard PCs deployed around the world, running software that allows them to operate as a coordinated secure platform capable of performing various types of Internet measurements. We continue to upgrade and extend Ark in geographic scope as well as function. The ultimate goal is to provide academic researchers with an unprecedented laboratory in which to quickly design, implement, and easily coordinate the execution of experiments across a widely distributed set of dedicated monitors.

    CAIDA researchers currently employ Ark to capture macroscopic topology and performance data across a large cross-section of IPv4 and IPv6 address space, thus gathering the largest set of Internet topology data available to the research community. Our probing tool for this project is scamper, developed by Matthew Luckie from the WAND research group at the University of Waikato Computer Science Department. We will continue our collaboration with WAND on improving the accuracy and integrity of topology measurement, inference, annotation, and other analysis.

    We will also continue our collaboration with the MIT ANA Spoofer project, which uses Ark infrastructure to assess macroscopic trends in IPv4 and IPv6 source address filtering, e.g., of private or bogon addresses, which should not be exiting appropriately configured networks. Ark allows the Spoofer project to more comprehensively identify networks that malicious users can compromise and misuse for purposes of obfuscating their behavior and circumventing existing security barriers.

    Funded tasks include:

    • management and maintenance of existing remote Ark monitors;
    • deploy 1 or 2 monitors per month, preferably in countries that do not have a monitor yet and in other topologically interesting locations;
    • curate, archive, and distribute collected data;
    • consolidate, refine, and generalize our tool set for controlling active measurement workflows;
    • develop a high-level API scripting language for writing measurement programs;
    • introduce new experiments on Ark and develop new partnerships with Internet researchers worldwide.

    Additional funds are needed to:

    • deploy more Ark monitors with IPv6 probing capabilities.

  2. Bulk DNS lookup service


    Motivation: DNS annotations are valuable for many passive and active data analyses, as they often indicate whether a given IP address belongs to a router, home box, or web server, and reveal organizational affiliation and geographical location information. Since host names may change, it is important to obtain DNS meta-information as close in time as possible to the primary measurements.

    Our previously developed bulk DNS lookup service executes DNS lookups in parallel with our macroscopic topology measurements, and archives them with timestamps to allow researchers to easily download corresponding topology and DNS data.
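
    The idea of parallel, timestamped lookups can be sketched as follows. This is an illustrative sketch only and does not describe the actual implementation of our bulk DNS service: the function names, the record layout, and the use of a thread pool are all assumptions made for the example. The resolver is passed in as a parameter so that the demonstration below runs with a stub instead of querying live DNS.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None


def bulk_lookup(ips, resolver=reverse_lookup, workers=32):
    """Resolve many addresses concurrently. Each record carries the time at
    which the answer was obtained, so it can later be matched against
    topology measurements taken around the same time."""
    def lookup_one(ip):
        hostname = resolver(ip)
        return {"ip": ip, "hostname": hostname, "timestamp": time.time()}

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input addresses.
        return list(pool.map(lookup_one, ips))


# Demonstration with a stub resolver and documentation-only addresses
# (hypothetical hostnames; no network access needed).
stub_names = {"192.0.2.1": "router1.example.net"}
records = bulk_lookup(["192.0.2.1", "192.0.2.2"], resolver=stub_names.get)
for record in records:
    print(record["ip"], record["hostname"], record["timestamp"])
```

    Calling bulk_lookup(ips) without the resolver argument would issue real reverse lookups through the system resolver.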

    Funded tasks include:

    • create an API for external researchers to use our bulk DNS service;
    • upgrade the system performance to handle increased data sizes;
    • develop software to automatically annotate topology data with hostnames.

    Additional funds are needed to:

    • enable lookups of IPv6 addresses in our bulk DNS service.

  3. Passive Monitors


    Motivation: Large-scale network empirical analysis, modeling, security, policy, and architecture development all require access to real data collected on Internet backbone links. Unfortunately, the expense of the monitoring equipment (which must be upgraded every few years to keep up with changes in the underlying infrastructure), exacerbated by the general difficulty of coordinating deployment with remote volunteers, seriously impedes our efforts to deploy additional passive monitors on backbone Internet links or on public, research, and municipal networks. Moreover, security, privacy, and legal concerns severely limit our options for making already collected data available to researchers. In collaboration with the DHS PREDICT project (see below), we continue to look for solutions that would enable representative passive measurements of Internet traffic and sharing of the resulting data. In the meantime we focus on tool development to support operators and researchers with current and emerging passive measurement needs.

    Currently, in collaboration with hosting sites, CAIDA helps support data collection components on monitors in Internet exchange points in San Jose, CA and Chicago, IL, where we regularly capture packet traces from backbone and peering point links. These data have significant policy restrictions on their use.

    Funded tasks include:

    • add IPv6, DNSSEC, anonymization, aggregation, and netflow capabilities to the Coralreef traffic analysis software suite;
    • expand and improve dynamic web pages showing real-time traffic reports;
    • update the Coralreef report generator to include statistics of interest for the NSF International Research Network Connections (IRNC) community;
    • help IRNC and other research community members install traffic monitors in their networks to support researcher access to data.

  4. Day in the Life of the Internet (DITL)


    Motivation: A significant increase in the quantity, quality, and accessibility of empirical data supporting Internet research would help to answer many open research questions regarding Internet workload, topology, routing, performance, and economics. Comprehensive measurements would enable researchers to uncover trends, validate simulations of new architectures and protocols, and develop a baseline against which to evaluate new network protocols or other architectural innovations.

    Over the past four years, CAIDA has worked with DNS-OARC and ISC to coordinate annual Day-in-the-Life of the Internet (DITL) collection events. Our goal remains the establishment of periodic, synchronized, widely distributed measurements delivering heterogeneous and diverse data sets, and the provision of supporting tools, analyses, visualizations, and meta-data indexes.

    CAIDA seeks funding to support the following tasks:

    • conduct annual DITL data collection events;
    • further refine measurement supporting software technologies;
    • develop strategic tools for analysis, visualization, and indexing of resulting data sets;
    • develop educational materials based on DITL data;
    • organize a series of hands-on workshops teaching examples of methodologically sound DITL data analysis to faculty and students;
    • publish case studies derived from DITL data.

  5. Protected Repository for the Defense of Infrastructure Against Cyber Threats (PREDICT)


    Motivation: Current data on Internet security threats and baseline Internet traffic are required for the development of hardware and software that would protect against and mitigate the effects of these threats. Yet obtaining such data is not only a technically challenging task, but also fraught with privacy, security, proprietary, and legal risks. Thus there are few datasets available for the development and testing of defensive technologies, stunting the progress of security research. The Department of Homeland Security (DHS) has established the Protected REpository for the Defense of Infrastructure against Cyber Threats (PREDICT) to provide vetted researchers with current network operational data in a secure and controlled manner that respects the privacy, legal, and ethical concerns of Internet users and network operators. In collaboration with DHS and network researchers across the country, CAIDA personnel have helped develop and implement PREDICT processes and procedures. CAIDA also serves as a PREDICT Data Provider and Host for a subset of CAIDA data approved by PREDICT, including Internet topology data, older backbone passive traces (packet headers), and traffic data from the UCSD Network Telescope.

    The main funded tasks include:

    • collection, documentation, and distribution of IPv4 topology and derived AS topology datasets;
    • collection, documentation, anonymization, and distribution of traces from a large-ISP backbone link;
    • collection, documentation, and distribution of UCSD Network Telescope data;
    • participation in the PREDICT Application Review and Publication Review Boards;
    • providing feedback and advice on technical, legal, and practical aspects of developing PREDICT policies and procedures.

  6. Privacy-Sensitive Data Sharing


    Motivation: Concerns regarding end-user privacy and potential risks stemming from unauthorized or unintended data disclosure present daunting challenges to researchers looking for access to real world Internet data. Adopting efficient, appropriate, and flexible disclosure control techniques is a promising first step toward enabling the kind of meaningful data access necessary for development and validation of scientific models.

    We have developed a Privacy-sensitive Internet Data-Sharing framework that integrates privacy-enhancing technology with a consistent policy approach applying proven and standard privacy principles and obligations of data seekers and data providers. The framework allows evaluation of data-sharing techniques along two primary criteria: (1) how they address privacy risks; and, (2) how they achieve utility objectives. We will continue to develop, refine, and popularize this Framework seeking to increase the volume of data available to the network research, security, and development communities, facilitate access to novel datasets, and work toward real-time situational awareness on current Internet usage and threats.

    The main funded tasks include:

    • investigate possible approaches to real-time data sharing balancing privacy and utility;
    • educate policy community about Internet research, its advances and unsolved problems, and related data needs;
    • educate researchers about the current legal obstacles to data collection and sharing, and how to effectively apply privacy protection mechanisms in their research;
    • help design workshops and contribute to a documented set of guidelines for considering and addressing ethical issues in network and security research.

  7. DatCat - Internet Measurement Data Catalog


    Motivation: DatCat, the Internet Measurement Data Catalog, indexes Internet measurement data, allowing researchers, faculty, and students to find, annotate, cite, and share data. The goals of the catalog are:

    • to facilitate searching for and sharing of data among researchers;
    • to enhance documentation of datasets via a public annotation system;
    • to advance network science by promoting reproducible research.

    Current development of the DatCat repository will be driven by lessons we have learned over the past several years: fine-grained metadata is unwieldy, a proprietary database backend limits sustainability, and penetration would benefit from momentum in a narrow community. We strongly encourage Internet researchers to use DatCat to (1) find real data to help validate their research; and (2) share their data with others.

    The funded tasks include:

    • continue ongoing indexing of newly collected CAIDA data;
    • design and implement versatile DatCat submission tools;
    • expand DatCat capabilities to enable cataloging tools in addition to data;
      • streamline the user experience by creating standalone publications and collections objects;
      • implement an entirely web-based submission interface;
      • add support for more flexible granularity in annotations to allow annotating entire collections or subsets of files;
    • migrate to an open source solution for the backend database;
    • expand the DatCat community to Cybersecurity and other research fields;
    • hold workshops helping researchers to index their data in DatCat.


The table below summarizes the funding status of the projects listed in Section II.

Project | Proposal title | Agency/Program | Period | Amount/Status
Archipelago | Cybersecurity: Leveraging the Science and Technology of Internet Mapping for Homeland Security | DHS S&T | Aug 2008 - Mar 2011 | active
 | CRI-ADDO-EN: Internet Laboratory for Empirical Network Science (iLENS) | NSF CRI | Mar 2010 - Feb 2013 | active
DNS lookup service | CRI-ADDO-EN: Internet Laboratory for Empirical Network Science (iLENS) | NSF CRI | Mar 2010 - Feb 2013 | active
Passive monitors | Supporting Research and Development of Security Technologies through Network and Security Data Collection | DHS PREDICT | Aug 2007 - Jul 2012 | active
Day in the life of the Internet | | | | not funded
PREDICT | Supporting Research and Development of Security Technologies through Network and Security Data Collection | DHS PREDICT | Aug 2007 - Jul 2012 | active
Privacy Sensitive Data Sharing | Supporting Research and Development of Security Technologies through Network and Security Data Collection | DHS PREDICT | Aug 2007 - Jul 2012 | active
 | IRNC-SP: Sustainable data-handling and analysis methodologies for the IRNC networks | NSF IRNC | Mar 2010 - Feb 2013 | active

III. Data and Tools

Collecting data for scientific analysis of network function and developing measurement and analysis tools are among CAIDA's core objectives. We are constantly seeking better technical and methodological solutions to the challenges of Internet measurements.

  1. Ongoing Data Collections


    Currently, we are funded to continue collecting the following data which we provide to the research community:

    • Macroscopic Internet Topology Data Kit (ITDK) - the latest ITDK release (Jan 2010), currently consisting of two router-level topologies that differ in the accuracy and completeness of the alias resolution methods used to create them.
    • Raw Topology Traces - our Macroscopic Topology Project runs the team-probing experiment on our Archipelago infrastructure using the scamper tool to probe IPv4 and IPv6 address space.
    • AS adjacencies - we filter and aggregate Ark data to compute the adjacency matrix of the Internet AS-level graph on a daily basis.
    • AS relationships - we derive AS graph links from RouteViews BGP table snapshots taken at 8-hour intervals over a 5-day period and annotate the links as customer-provider, peer-to-peer, or sibling-to-sibling.
    • AS ranking - this interactive CGI script computes degree-based and AS-relationship-based ranking of ASes.
    • DNS root/gTLD RTT data - NeTraMet traffic monitors continuously collect round trip times to DNS root and gTLD servers and aggregate them by 5 min intervals.
    • Passive OC192 Peering Point traces - one-hour traces are collected quarterly from two peering points for major US backbone Internet Service Providers (ISPs).
    • Network Telescope traces - we maintain a collection of traces from the Network Telescope, covering a time period from two months ago up to the current time.
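
To illustrate the AS adjacency and degree-based AS ranking items above, the following is a minimal Python sketch, not CAIDA's production pipeline. It assumes the traceroute paths have already been mapped to AS numbers; the function names `as_adjacencies` and `rank_by_degree` and the sample paths are hypothetical, and the AS numbers come from the range reserved for documentation.

```python
from collections import defaultdict


def as_adjacencies(as_paths):
    """Build an undirected AS adjacency map from AS-level paths."""
    neighbors = defaultdict(set)
    for path in as_paths:
        # Collapse consecutive repeats: several hops inside one AS count once.
        collapsed = [asn for i, asn in enumerate(path)
                     if i == 0 or asn != path[i - 1]]
        # Every pair of consecutive ASes on the path is one adjacency.
        for a, b in zip(collapsed, collapsed[1:]):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def rank_by_degree(neighbors):
    """Return (asn, degree) pairs, highest degree first, ties by AS number."""
    return sorted(((asn, len(peers)) for asn, peers in neighbors.items()),
                  key=lambda item: (-item[1], item[0]))


# Hypothetical AS-level paths (documentation AS numbers only).
paths = [
    [64496, 64496, 64497, 64498],
    [64499, 64497, 64498],
    [64496, 64497, 64500],
]
adjacency = as_adjacencies(paths)
ranking = rank_by_degree(adjacency)
print(ranking[0])  # the AS with the most observed neighbors
```

Storing neighbors as sets keeps the adjacency unweighted, so a link observed on many paths counts once. A relationship-based ranking would additionally need the customer-provider and peer-to-peer annotations described above, which this sketch does not attempt.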

  2. Unique Data Sets


    We also continue to offer a few data sets that were either created during a one-time event or are past collections that we no longer update.

    • Macroscopic Internet Topology Data Kits (ITDK) - created in 2002 and 2003.
    • "PAM AS Router-Assignment" Dataset - supplemental data for the publication "Toward Topology Dualism: Improving the Accuracy of AS Annotations for Routers" by B. Huffaker, A. Dhamdhere, M. Fomenkov, and kc claffy (in: PAM 2010 Proc.).
    • The computed statistics for AS-level Internet graphs derived from BGP, skitter traceroutes, and WHOIS data - supplemental data for the paper "The Internet AS-Level Topology: Three Data Sources and One Definitive Metric" by P. Mahadevan, A. Vahdat, D. Krioukov, M. Fomenkov, B. Huffaker, kc claffy, and X. Dimitropoulos.
    • "DDoS Attack 2007" Dataset - approximately one hour of anonymized traffic traces from a DDoS attack on August 4, 2007 (20:50:08 UTC to 21:56:16 UTC).
    • "Two Days in November 2008" Dataset - two days (2008-11-12 and 2008-11-19) of "typical" background radiation as observed by the UCSD Network Telescope.
    • "Three Days Of Conficker" Dataset - three days (2008-11-21, 2008-12-21 and 2009-01-21) of UCSD Network Telescope traffic related to the Conficker outbreak that started on 2008-11-21.
    • Passive OC48 Peering Point traces - data collected in 2002 and 2003 from two peering points for major US backbone Internet Service Providers (ISPs).
    • Witty Internet Worm - the first five days of the spread of the Witty Internet worm, as monitored by the UCSD Network Telescope between Fri Mar 19 20:01:40 PST 2004 and Wed Mar 24 23:01:40 PST 2004.
    • Code-Red Worms - the first twenty-one days of the spread of the Code-Red version 2 and CodeRedII Internet worms, as monitored by the UCSD Network Telescope during July 19-20 and August 1-20, 2001.
    • Denial-of-Service Backscatter dataset - quarterly week-long collections of responses to spoofed traffic sent by denial-of-service attack victims and received by the UCSD Network Telescope. The Backscatter-TOCS, Backscatter-2004-2005, Backscatter-2006, and Backscatter-2007 datasets provide six years of denial-of-service backscatter data to Internet researchers.

  3. Proposed Data Collections


    We seek to conduct novel, previously unattainable measurements of Internet characteristics.

    Additional funds are needed to:

    • continue periodic DITL collection events;
    • expand measurements of IPv6 topology and performance;
    • enable DNS mapping for IPv6 topology data;
    • improve capabilities for identifying the geographic location of IP addresses;
    • develop software to create, archive, and distribute to researchers dual router-AS level graphs of the Internet annotated with DNS hostnames and geographic locations;
    • enable real-time access to filtered traces from the UCSD Network Telescope.

  4. Supporting Tool Development


    Our research requires building and maintaining software tools to measure, analyze, and model various Internet characteristics. We plan the following tool improvements and new developments:

    • AS-relationship visualization tool - we received NSF funding to develop and implement a new visualization tool that will present an understandable view of intra- and inter-domain AS topology. This interactive tool will not only reveal the connectivity of individual ASes, but will also create an incentive and an easy method for network operators to validate and correct our measurement results and derived inferences.
    • CoralReef - a comprehensive software suite to collect, analyze, visualize, and display trace and flow data from passive Internet traffic monitors. We are funded by NSF to extend CoralReef with:
      • parsing NetFlow output from routers;
      • increasing support for privacy-protecting anonymization techniques;
      • improving IPv6 capabilities;
      • adding DNSSEC support.
      We need additional support to update our traffic reporting so that it can be configured for more recent applications and can effectively display traffic in one or both directions.
    • topostats - a package of programs that calculate various statistics on network topologies (that is, on graphs). With DHS funding, we will continue to test and refine the currently released beta version of this software and prepare the necessary documentation and usage examples.
    • scamper - developed by Matthew Luckie of the WAND group at the University of Waikato - a program for conducting Internet measurement tasks to large numbers of IPv4 and IPv6 addresses, in parallel, at a specified packets-per-second rate. In collaboration with the WAND research group, we would like to extend this software to enable:
      • comparison of the IPv4 and IPv6 performance for a given server;
      • BGP-guided Doubletree support;
      • non-blocking name resolution for each discovered IP address.
    • mper - a measurement engine based on scamper for parallel low-level dynamic measurements. Supported by NSF and DHS, we will develop this tool as a supplement to scamper to:
      • enable precise control over probe spacing and timing;
      • react swiftly and dynamically to probing responses;
      • match probes and responses with a high accuracy.
    • MAARS - Multi-Approach Alias Resolution System for IP-to-router alias resolution measurements on Ark and related data analysis. Using funds from NSF and DHS, we will integrate our previously developed alias resolution tools iffinder, kapar, and MIDAR to achieve a comprehensive systemic approach to alias resolution producing state-of-the-art router-level graphs of the Internet.
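
As an illustration of the kind of graph statistics a package such as topostats reports, the following minimal Python sketch computes the average degree and the mean local clustering coefficient of a small undirected graph. It is not the topostats code or its interface; the function names and the sample graph are hypothetical, and the graph is assumed to be stored as a symmetric map from each node to its set of neighbors.

```python
from itertools import combinations


def average_degree(graph):
    """Mean number of neighbors per node."""
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)


def local_clustering(graph, node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # no neighbor pairs exist, so define the value as zero
    links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))


def mean_clustering(graph):
    """Average of the local clustering coefficient over all nodes."""
    return sum(local_clustering(graph, n) for n in graph) / len(graph)


# Hypothetical four-node graph: a triangle (a, b, c) plus a leaf d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(average_degree(graph), mean_clustering(graph))
```

Treating nodes with fewer than two neighbors as having zero clustering is one common convention; other conventions exclude such nodes from the average.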


Staff and Support

Staff

As of 2010, CAIDA employs 11 researchers and support staff based at SDSC, 2 remotely based staff, and 2 postdoctoral appointees. We regularly support undergraduate student workers via the NSF-funded REU program. We also offer summer and longer-term internships to graduate students and young scientists.

Support

The following organizations have made designated gifts or provided in-kind licensing or services in support of CAIDA activities:

  • Cisco Systems -- the worldwide leader in networking for the Internet.
  • Limelight Networks -- a leading provider of high-performance content delivery network services.
  • Digital Envoy -- a provider of IP intelligence solutions for geo-location and improved customer interactions.

Designated gifts to CAIDA enable us to maximize use of research dollars. Additionally, CAIDA could not survive without the generosity of its affiliates, members, and sponsors.

For further information, please send a message to
