The contents of this legacy page are no longer maintained nor supported, and are made available only for historical purposes.

CAIDA's Annual Report for 2009

A report on CAIDA research initiatives, project progress and results, data sets, tool development, publications, presentations, workshops, web site statistics, funding sources, and operating expenses for 2009.

Mission Statement: CAIDA investigates practical and theoretical aspects of the Internet, focusing on activities that:

  • provide insight into the macroscopic function of Internet infrastructure, behavior, usage, and evolution,
  • foster a collaborative environment in which data can be acquired, analyzed, and (as appropriate) shared,
  • improve the integrity of the field of Internet science,
  • inform science, technology, and communications public policies.

Executive Summary

This annual report covers CAIDA's activities in 2009, summarizing highlights from our research, infrastructure, data-sharing and outreach activities. Our current research projects span topology, routing, traffic, economics, and policy. Our infrastructure activities support several measurement-based studies of the Internet's core infrastructure, with focus on the health and integrity of the global Internet's topology, routing, addressing, and naming systems.

We made significant advances (again...) in Internet topology research, supported by the expanding Ark measurement infrastructure and growing interest in understanding more about the Internet's robustness, security, and scalability. We continue to share the largest Internet topology data sets (IPv4 and IPv6) available to academic researchers, and we share many aggregated annotated derivative data sets publicly, including rankings of ISPs annotated with (our estimated) business relationships between autonomous networks. Our topology measurement platform supports IPv6, and ten of our hosting sites provide IPv6 connectivity. We have developed substantial additional software to better support distributed measurement experiments. Specific to our IPv4 topology mapping project, we have taken on the task of optimizing and improving on existing techniques for IP address alias resolution for large Internet graphs, and are planning to package up and release an implementation of our algorithms next year. In 2009 we expanded the capability of other researchers to use the Ark infrastructure for independent experiments, including an extensive Internet-wide test of network filtering hygiene.

On the theoretical side of topology research, we finally published our topology modeling framework that treats annotations as an extended correlation profile of a network, which supports rescaling topologies while retaining the same (measured) annotation profile. We also advanced our exploration of geometric structure underlying Internet-like topologies as observed in our and other measurements. Specifically, hyperbolic geometry captures an important property of complex networks: exponential expansion in space. We explored even deeper connections between network topological structure (e.g., degree distribution, clustering) and physical phenomena such as curvature and temperature.

These discoveries about topology drive our routing research agenda, a long-term objective of which is to enable dramatically more scalable global Internet routing. We explored the ramifications of the discoveries we made last year regarding efficient routing on graph topologies statistically similar to those of the Internet. Based on the evidence, e.g., clustering, observable on the Internet and other complex networks, we found that underlying hyperbolic hidden metric spaces provide a natural explanation for why so many of these complex networks found in nature can achieve such phenomenally efficient (greedy) routing without distributing global topology knowledge. Since the distribution of global knowledge about network structure is perhaps the most critically limiting requirement of the current Internet interdomain routing system, we are still investigating theoretical details of a potentially radical solution to Internet routing scalability, which takes advantage of what nature knows that we do not (yet).

We undertook several traffic analysis activities, including creating a structured taxonomy of Internet traffic classification papers and their data sets, and analyzing the "Day in the Life of the Internet" 2009 data set, consisting of 24 hours of detailed DNS packet data collected at many participating root servers as well as other high-profile DNS servers. We have reduced our traffic analysis activities in favor of pursuing progress in the policy space through participation in DHS's PREDICT project (Protected Repository for the Defense of Infrastructure Against Cyber Threats). As part of this project, we have proposed a more flexible privacy-sensitive data-sharing framework and an experiment to test it on the UCSD network telescope instrumentation next year.

We are growing the scope of our economics and policy research. We responded to several requests from Internet governance as well as U.S. government agencies for comments and guidance on policy matters. We launched a workshop series in Internet economics, to try to begin framing a research agenda for the emerging but stunted field of Internet infrastructure economics. On the theoretical side, we published an analytically tractable model of Internet evolution at the level of Autonomous Systems (ASes), which builds on the preferential attachment (PA) model but captures fundamental differences between transit and non-transit networks. This multi-class PA model predicts a definitive set of statistics characterizing the AS topology structure, closing the "measure-model-validate-predict" loop, and providing further evidence that preferential attachment is the main driving force behind Internet evolution.

Finally, we engaged in a variety of tool development, data-sharing, and outreach activities, including web sites, peer-reviewed papers, technical reports, presentations, blogging, animations, and workshops. Details of our activities are below. CAIDA's program plan for 2010-2013 is available at https://www.caida.org/about/progplan/progplan2010/. Please do not hesitate to send comments or questions to info at caida dot org.


Research Projects


Topology

Macroscopic Topology Measurements, Analysis, and Modeling

Goals

CAIDA's topology research agenda includes three strategic areas: 1) macroscopic topology measurement; 2) analysis of the observable AS-level and router-level hierarchy; 3) topology modeling in support of routing research.

Activities

  1. Macroscopic Topology Measurements:
    1. We continued large-scale macroscopic topology measurements using Archipelago (Ark), our state-of-the-art global measurement platform. We completed the second full calendar year of the IPv4 Routed /24 Topology Dataset. By the end of 2009, we increased the number of vantage points to 40 Ark monitors deployed in 22 countries.
    2. We added more monitors with native IPv6 connectivity to the Ark infrastructure. As of the end of 2009, Ark had 10 monitors collecting the IPv6 Topology Dataset for researchers to get a view of the emerging IPv6 global topology.
    3. We continued to collect automated DNS reverse lookups for IP addresses discovered by the Ark probes and annotated the IPv4 topology data with corresponding DNS names.
  2. Analysis of the Observable Topology:
    • We improved our measurement techniques and analysis methodologies for alias resolution inferences. We use the Ark platform to run three tools: kapar, iffinder, and MIDAR. We then combine their outcomes to map IP addresses to routers as accurately and completely as feasible. Using publicly available data from many networks and ground-truth data provided to us by a large ISP, we tested the efficiency and veracity of various combinations of alias resolution methods. We submitted our preliminary results to ACM SIGCOMM Computer Communication Review (CCR), where they appeared ("Internet-Scale IP Alias Resolution Techniques") in the January 2010 issue.
    • We continued to produce the AS-level topologies annotated with business relationships between ASes dataset on a bi-weekly basis. We use our published algorithms to infer these relationships, recognizing their directional nature, and annotate each link in an AS topology as a customer-provider or a peer-to-peer (settlement-free interconnection) relationship.
    • We created a new version of our popular AS Core Graph visualizations for both IPv4 and IPv6 address space using January 2009 data collected by Ark monitors.
  3. Topology Modeling:
    1. We introduced a network topology modeling framework that treats annotations as an extended correlation profile of a network. The framework includes an algorithm to rescale and construct networks of varying size that still reproduce the original measured annotation profile. These results are published in a paper "Graph Annotations in Modeling Complex Network Topologies" in ACM Transactions on Modeling and Computer Simulation (TOMACS).
    2. We developed an analytically tractable model of Internet evolution at the level of Autonomous Systems (ASes) -- the multi-class preferential attachment (MPA) model. All of the model parameters are measurable from available Internet topology data. Given the estimated values of these parameters, our analytic results predict a definitive set of statistics characterizing the AS topology structure that is not part of the model formulation. The MPA model thus closes the "measure-model-validate-predict" loop, and provides further evidence that preferential attachment is the main driving force behind Internet evolution. The results were published in "Evolution of the Internet AS-Level Ecosystem", presented at the First International Conference on Complex Sciences: Theory and Applications (Complex'2009).
    3. We established a connection between observed scale-free topologies and hidden hyperbolic geometries of complex networks. Space expands exponentially in hyperbolic geometry, and scale-free topologies emerge as a consequence of this exponential expansion. Fermi-Dirac statistics connects observed topology to hidden geometry: observed edges are fermions, hidden distances are their energies; the curvature of the hidden space affects the heterogeneity of the degree distribution, while clustering is a function of temperature. Understanding the connection between topology and geometry of complex networks contributes to studying the efficiency of their functions, and may find practical applications in many disciplines, ranging from Internet routing to brain, cell signaling, or protein folding research. We published the paper "Curvature and Temperature of Complex Networks" in Physical Review E.
    4. We showed that the global structure of some real networks is statistically determined by the distributions of local motifs (small building blocks of complex networks) of size at most 3, once we augment motifs to include node degree information. We applied our analysis to various complex networks: a social web of trust, protein interactions, scientific collaborations, air transportation, the Internet, and a power grid. In all cases except the power grid, random networks that maintain the degree-enriched connectivity profiles for node triples in the original network reproduce all its local and global properties. Therefore, for such networks, topology generators reproduce essential local and global network properties as soon as they reproduce 3-node connectivity statistics. Our results are published on our web site ("How Small Are Building Blocks of Complex Networks") and on arXiv.
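The Fermi-Dirac connection between hidden distance and observed edges described in item 3 can be illustrated with a minimal sketch. The function name, the parameter names, and the exact form of the exponent (here `(x - R) / (2T)`) are assumptions made for illustration; the published paper should be consulted for the precise normalization. The sketch only shows the qualitative behavior: a node pair at hidden distance `x` is connected with a probability that drops from near 1 to near 0 around a threshold `R`, and the temperature `T` controls how sharp that drop is.

```python
import math

def connection_probability(x, R, T):
    """Fermi-Dirac-style probability that two nodes at hidden distance x are linked.

    R plays the role of a chemical potential (the distance at which the
    probability is exactly 1/2) and T the role of temperature. The factor
    2*T in the exponent is an assumed convention for this illustration.
    """
    z = (x - R) / (2.0 * T)
    if z > 700.0:  # avoid OverflowError in math.exp for very distant pairs
        return 0.0
    return 1.0 / (math.exp(z) + 1.0)

# Low temperature: almost a step function, so only pairs closer than R connect,
# which produces strong clustering. High temperature: distant pairs connect
# often, which washes clustering out.
for T in (0.05, 0.5, 5.0):
    print(T, [round(connection_probability(x, R=5.0, T=T), 3) for x in (4.0, 5.0, 6.0)])
```

In this picture each potential edge behaves like a fermion state whose energy is the hidden distance between its endpoints, which is why lowering `T` concentrates edges among nearby nodes.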
  • Major Milestones

    Funding Sources

    Our topology research received support from:


    Routing

    Toward Mathematically Rigorous Next-Generation Routing Protocols for Realistic Network Topologies

    Goals

    The primary objective of CAIDA's research in Internet routing is to develop and evaluate solutions to the impending routing scalability problems. Our relevant activities focused on two related sub-topics: greedy routing based on hidden metric spaces underlying real networks; and the relationship between routing efficiency and the structure of the network topology. While motivated by Internet routing, our work in this area has profound implications for network science in other disciplines (physics, biology, chemistry, social sciences).

    Activities

    1. We studied the process of routing information through networks as a universal phenomenon existing in both natural and man-made complex systems. In many complex networks found in nature, nodes communicate efficiently even without full knowledge of global network connectivity. We demonstrated that the peculiar structural characteristics of observable complex networks are consistent with maximizing communication efficiency when using greedy routing approaches without global knowledge. We also described a general mechanism that explains this connection between network structure and function, in "Navigability of complex networks", published in Nature Physics and given significant press coverage.
    2. Random scale-free networks are ultrasmall worlds, since the average length of the shortest paths in networks of size N scales as ln ln N. We showed that these ultrasmall worlds can be navigated in ultrashort time. Greedy routing on scale-free networks embedded in metric spaces uses only local information yet finds asymptotically the shortest paths, direct computation of which requires global topology knowledge. Our findings imply that the peculiar structure of complex networks ensures that the lack of global topological awareness has asymptotically no impact on the length of communication paths. These results have important consequences for communication systems such as the Internet, where maintaining knowledge of current topology is a major scalability bottleneck. We published "Navigating Ultrasmall Worlds in Ultrashort Time" in Physical Review Letters. This paper received favorable press coverage in Nature, NewScientist, and PhysOrg.
    3. We showed that complex (scale-free) network topologies naturally emerge from hyperbolic metric spaces. The negatively curved hyperbolic spaces also ensure extremely efficient greedy forwarding on these topologies, achieving almost 100% reachability and optimal (i.e., shortest) path lengths, even under dynamic network conditions. Our findings suggest that forwarding information through complex networks like the Internet may be possible without the current overhead of routing protocols, and may also find practical applications in overlay networks for tasks such as application-level routing, information sharing, and data distribution. These results are published in "Greedy Forwarding in Scale-Free Networks Embedded in Hyperbolic Metric Spaces" in ACM SIGMETRICS Performance Evaluation Review.

    Major Milestones

    Student Involvement

    Undergraduate student Connie Liu developed informative visualizations representing hyperbolic spaces and other routing research results.

    Funding Sources

    Our routing research received support from:


    Traffic Analysis

    Internet traffic measurement, classification, and analysis

    Goals

    CAIDA has a long history of acquiring and curating passive traces for traffic monitoring, classification, and workload characterization. In 2009 we continued to host visiting researchers who, in collaboration with CAIDA researchers, analyzed properties of available traces.

    Activities

    1. With the help of visiting scholar Mia Zhang, we created a structured taxonomy of Internet traffic classification papers and their data sets.
    2. Visiting scholars Maurizio Dusi and Wolfgang John developed a flow-based symmetry estimation tool to evaluate routing asymmetry in Internet traffic.
    3. Maurizio Dusi developed and tested his new tool, gt, which gathers and indexes ground truth information about passively collected network traffic. A paper describing the tool, "GT: picking up the truth from the ground for Internet traffic" was published in ACM SIGCOMM Computer Communication Review (CCR).
    4. Working with traffic traces from backbone links in the US and in Sweden collected over the period 2002-2009, visiting scholars Wolfgang John and Mia Zhang analyzed UDP traffic in the Internet. They found that most UDP flows use random high ports and carry few packets with little content, consistent with UDP's role in signaling protocols for increasingly popular P2P applications.

    Major Milestones

    Student Participants

    CAIDA hosted the following visiting graduate student scholars:

    • Maurizio Dusi from the Universita di Brescia, Italy
    • Mia Zhang from Beijing Jiaotong University, China
    • Wolfgang John from Chalmers University of Technology, Sweden

    Funding Sources

    Our traffic research received indirect support from the following institutions through their generous sponsorship of capable graduate students to visit our lab to collaborate on research.


    DNS

    Improving the Integrity of Domain Name System (DNS) Monitoring and Protection

    Goals

    CAIDA researchers conduct DNS measurements and develop tools, models, and analysis methodologies for use by DNS operators and researchers.

    Activities

    NSF funding supporting CAIDA DNS research ended in August 2009. However, we continued to collect and analyze data from the DNS root nameservers, extending the series of annual Day-in-the-life-of-the-Internet (DITL) experiments.

    1. In collaboration with ISC and OARC, we held the fourth large-scale data collection event on March 30 - April 1, 2009 (DITL 2009). We captured tcpdump traces at nearly all anycast instances of the A, C, E, F, H, K, L, and M root servers as well as numerous AS112, gTLD and ccTLD domain servers. The 2009 collection spans three full days of continuous capture. This unique dataset again represents the most comprehensive measurements of the root servers to date, and provides researchers with unprecedented insight into root server workload characteristics and performance. OARC published a summary of the collection event. These data are available to the research community via the DNS-OARC. Academic researchers can participate in the DNS-OARC for free.
    2. We also capture tcpdump traces of these DNS queries for other potential annotations and for analysis of EDNS0, DNSSEC, and other emerging protocols.

    Major Milestones

    • DNS Research Update
      • Participation in DITL 2009 large-scale simultaneous DNS root data collection event.
      • kc claffy prepared and distributed CAIDA's DNS Research Updates to the attendees of the DNS RSSAC meeting in March 2009.

    Student Involvement

    CAIDA hosted the following visiting graduate student scholars:

    • Mia Zhang, a PhD candidate from Beijing Jiaotong University, China; and
    • Wolfgang John, a PhD candidate from Chalmers University of Technology, Sweden.

    Funding Sources

    Our DNS research received support from NSF grant (SCI-0427144) DNS-ITR: "Improving the Integrity of Domain Name System (DNS) Monitoring and Protection" (though it ended early this year).

    Our traffic research received indirect support from the following institutions through their generous sponsorship of capable graduate students to visit our lab to collaborate on research.


    Data Sharing for Security

    Goals

    CAIDA recognizes the UCSD Network Telescope, a passive data collection system focused on a globally routed /8 network that carries almost no legitimate traffic, as a unique resource whose data may provide insights for network security researchers. Because we can easily separate the legitimate traffic from the incoming packets, the network telescope provides us with a monitoring point for anomalous traffic that represents almost 1/256th of all IPv4 destination addresses on the Internet.
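The 1/256 figure follows directly from the prefix length: a /8 holds 2^24 of the 2^32 IPv4 addresses. A minimal check is shown below; the prefix `10.0.0.0/8` is only a stand-in chosen for illustration (the telescope's actual prefix is not given here), and the helper name is hypothetical.

```python
import ipaddress
from fractions import Fraction

def ipv4_fraction(prefix):
    """Return the share of the whole IPv4 address space covered by a prefix."""
    net = ipaddress.ip_network(prefix)
    return Fraction(net.num_addresses, 2 ** 32)

# Placeholder /8, used only to show the arithmetic.
print(ipv4_fraction("10.0.0.0/8"))
```

A host that picks destination addresses uniformly at random would therefore send roughly one packet in 256 to a fully monitored /8, which is what makes such a block useful for estimating the global rate of scanning and backscatter.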

    Because a network telescope (also known as a blackhole, an Internet sink, or a darknet) does not contain any real computers, the monitor does not capture legitimate traffic, but rather communications that result from a wide range of events, including misconfiguration (e.g., a human being mistyping an IP address), malicious scanning of address space by hackers looking for vulnerable targets, backscatter from random-source denial-of-service attacks, and the automated spread of malicious software (worms).

    To deliver such data to the research community requires technology to accomplish the data capture and further requires policy infrastructure to protect the rights and avoid risk to stakeholders. CAIDA spent much effort in 2009 on building the policy infrastructure and data sharing framework required to enable the sharing of the data we capture with the network security researcher community.

    Activities

    1. UCSD Network Telescope
      • In line with our mission to foster a collaborative environment for data acquisition and sharing, we made the "Two days in November 2008" dataset from our network telescope available to researchers.
      • Reports in late November 2008 of a worm outbreak prompted us to look for evidence in our telescope data. We published a web report, "Conficker/Conflicker/Downadup worm as seen from the UCSD Network Telescope", that includes background information on the worm, a description of the scanning behavior we observed, heuristics for determining which packets were likely associated with the Conficker worm, some animations that show the growth of TCP/445 scanning globally, and some correlations we observed with other data sources.
      • We developed and refined measurement and analysis tools for one-way unsolicited traffic monitoring.
    2. Data Sharing Policy Development

    Major Milestones

    • UCSD Network Telescope
      • We released the Two days in November 2008 dataset sourced from our network telescope.
    • Data Sharing Policy Development

    Funding Sources

    Our research in data sharing for security received support from DHS contract (NBCHC 070133) "Supporting Research and Development of Security Technologies through Network and Security Data Collection".


    Economics, Ownership, and Trust

    Goals

    Our dependence on the Internet for our professional, personal, and political lives has rapidly grown much stronger than our comprehension of its underlying structure, performance limits, dynamics, and evolution. In light of recent milestones in regulatory policy, our understanding of the underlying economic forces and dynamics of the Internet is of increasing relevance.

    To follow up on our several years of work studying IPv4 exhaustion and IPv6 deployment (or lack thereof) in response to RIR needs, in 2009 we offered a draft recommendation for an IPv4 exhaustion research agenda (none of which, so far as we know, has been pursued). We also responded to requests from government agencies and policymaking bodies (including the FCC, DHS, and FTC) for comments and positions to inform policy with the best available empirical data. As society recognizes the need for an equitable way to pay for this new communications infrastructure, policymakers will need metrics to more effectively describe, and policies for more transparently reporting on, infrastructure penetration, performance, peering, and prices for bit transmission services.

    Activities

    1. In March, PI kc claffy presented "Broadband Conditions" at the NTIA Broadband Technology Opportunities Program meeting. The video is available at the meeting URL under March 23, 2009 "Show Session 1: Roundtable on Nondiscrimination and Interconnection Obligations", and a text transcript of that session is available on the NTIA website. kc claffy followed up with a posting of the "Top ten ($7.2B) broadband stimulus: ideal conditions".
    2. kc claffy presented "Ten Things the FCC Should Know about the Internet" to the Federal Communications Commission in Washington D.C. on May 29, 2009.
    3. Early in the year, we put forth a proposal for an ICANN/RIR scenario planning exercise to conduct a more structured conversation according to the established discipline of scenario planning. While this never happened, later in the year, on September 23, 2009, CAIDA, in collaboration with Georgia Tech, hosted the 1st Workshop on Internet Economics via web videoconference. The event made use of the electronic conference hosting facilities supported by the California Institute of Technology (Caltech) EVO Collaboration Network. The goal of this workshop was to bring together researchers, commercial Internet facilities and service providers, technologists, theorists, policy makers, RIR stakeholders, and pundits of Internet economics to try to frame a concrete and useful research agenda for the emerging but stunted field of Internet infrastructure economics. We published the final report in ACM SIGCOMM Computer Communication Review (CCR), April 2010, vol. 40, no. 2, pp. 55-59.
    4. In October, Dmitri Krioukov presented "Evolution of the Internet Ecosystem" at the Southern California Symposium on Network Economics and Game Theory (SoCal NEGT). The paper was also presented at The First International Conference on Complex Sciences: Theory and Applications (Complex'2009), and published in the European Physical Journal B, vol. 74, no. 2, March 2010, pp. 271-278.

    Major Milestones

    Funding Sources

    Our economics research received support from:


    Infrastructure Projects


    Archipelago (Ark): A Coordination-Oriented Measurement Infrastructure

    Goals

    On September 12, 2007, our next generation active measurement infrastructure, Archipelago (Ark) began collecting its first production data as part of the IPv4 Routed /24 Topology Dataset, using the scamper probing tool. Ark provides the hardware and software infrastructure for the Macroscopic Topology Project and replaces the previous skitter-based infrastructure. Ark achieves greater scalability and flexibility than the previous measurement infrastructure and provides steps toward a community-oriented network measurement infrastructure intended to support vetted measurement tasks on a dedicated distributed platform.

    Ark's unique design considers coordination the fundamental activity of a measurement infrastructure. Coordination allows the many pieces of the infrastructure to work together efficiently toward a common goal and is necessary to enable collaborative use of the infrastructure by multiple researchers. Archipelago utilizes Marinda, a coordination facility inspired by David Gelernter's tuple-space based Linda coordination language. Archipelago extends Gelernter's tuple space model with features needed to support a globally distributed measurement infrastructure that hosts heterogeneous measurements by a community of researchers.
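For readers unfamiliar with the tuple-space model, the sketch below shows the core Linda idea in a single process: producers write tuples into a shared space and consumers retrieve them by pattern, so the two sides never need to know about each other. This is not Marinda's API; the class, the method names (borrowed from Linda's `out`, `rdp`, and `inp` operations), the wildcard convention (`None`), and the example tuples are assumptions made for illustration only.

```python
class TupleSpace:
    """A minimal, single-process, non-blocking Linda-style tuple space."""

    def __init__(self):
        self._tuples = []

    def out(self, tup):
        """Write a tuple into the space."""
        self._tuples.append(tuple(tup))

    @staticmethod
    def _matches(pattern, tup):
        # None in a pattern position acts as a wildcard.
        return len(pattern) == len(tup) and all(
            p is None or p == v for p, v in zip(pattern, tup))

    def rdp(self, pattern):
        """Return the first matching tuple without removing it, or None."""
        for tup in self._tuples:
            if self._matches(pattern, tup):
                return tup
        return None

    def inp(self, pattern):
        """Remove and return the first matching tuple, or None."""
        for i, tup in enumerate(self._tuples):
            if self._matches(pattern, tup):
                return self._tuples.pop(i)
        return None

# Hypothetical usage: a coordinator posts work, any monitor may claim it.
space = TupleSpace()
space.out(("task", "traceroute", "192.0.2.1"))
space.out(("task", "ping", "198.51.100.7"))
claimed = space.inp(("task", "ping", None))
print(claimed)
```

A distributed implementation would add blocking reads, network transport, and (as described in the activities below) persistence, but the decoupling of producers from consumers is already visible in this toy version.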

    Activities

    1. The Archipelago (Ark) Project expanded its infrastructure scope in 2009, from 30 monitors in 21 countries at the end of 2008 to 41 monitors in 25 countries at the end of 2009. We also implemented IPv6 measurements on 10 Ark boxes, and prototyped a systemwide topology-measurement-on-demand service.
    2. We improved our infrastructure for meta-data annotations of Autonomous Systems and IP addresses, augmenting it with DNS data.
    3. Building on our study of existing state-of-the-art IP address alias resolution technology, we did research, development, and evaluation of probing and inference algorithms to resolve independent IP addresses into the same physical device (router). We are planning to publish the results of this work in 2010.
    4. Our team-probing application uses scamper as its primary active topology measurement tool. Developed by Matthew Luckie, scamper supports IPv4 and IPv6; TCP, UDP, and ICMP traceroutes; ping; path MTU discovery; fine-grained multiplexing of destination lists; programmatic control via socket; and warts format files, which carry more information than arts++ files, including cycle start and end markers and measurement metadata (e.g., probing parameters). We contributed patches to scamper, and several software tools to make it easier to write measurement tools and servers: ScamperDataFeed, ScamperIO. We also implemented a derivative tool based on scamper to enable lighter weight measurements that can still benefit from some of scamper's functionality.
    5. We implemented persistence in the Marinda tuple space, allowing us to transparently checkpoint and restart the global server without disrupting ongoing experiments. We wrote extensive Marinda installation and programming guides and shared the software with collaborators for evaluation and feedback before we release it more broadly.
    6. In collaboration with Rob Beverly of the Naval Postgraduate School, we developed software support to enhance the spoofer project, which used Ark to globally expand its measurement of source address validation and filtering. Using Ark's distributed infrastructure and approximately 12,000 active measurement clients, our measurements revealed little improvement over four years of measurement. 80% of the source address filters we observed were implemented a single IP hop from sources, with over 95% of blocked packets observably filtered within the source's autonomous system. Our results were published and presented at IMC2009 in "Understanding the Efficacy of Deployed Internet Source Address Validation Filtering".
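Item 3 above describes resolving independent IP addresses into the same physical router. One generic way to turn pairwise alias inferences, possibly gathered from several tools, into router-level sets is a union-find (disjoint-set) merge, sketched below. This is an assumption about how such a combining step could look, not a description of the algorithms in kapar, iffinder, or MIDAR; the function name and the example addresses (taken from documentation-reserved ranges) are invented.

```python
def merge_alias_pairs(pairs):
    """Group addresses into inferred routers from pairwise alias inferences.

    pairs: iterable of (addr_a, addr_b) tuples, each claiming that the two
    interface addresses belong to the same router.
    Returns a sorted list of sorted address lists, one per inferred router.
    """
    parent = {}

    def find(addr):
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    routers = {}
    for addr in list(parent):
        routers.setdefault(find(addr), set()).add(addr)
    return sorted(sorted(group) for group in routers.values())

# Hypothetical inferences: the first two pairs share an address, so they
# collapse into one three-interface router.
pairs = [("192.0.2.1", "192.0.2.5"),
         ("192.0.2.5", "198.51.100.7"),
         ("203.0.113.9", "203.0.113.10")]
print(merge_alias_pairs(pairs))
```

Because the merge is transitive, a single false-positive pair can fuse two real routers into one, which is one reason validation against ground truth matters when combining the output of multiple techniques.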

    Major Milestones


    Tools

    CAIDA's mission includes providing access to tools for Internet data collection, analysis and visualization to facilitate network measurement and management. However, CAIDA does not receive specific funding for support and maintenance of the tools we develop. Please check our home page for a complete listing and taxonomy of CAIDA tools.

    2009 Tool Development

    CoralReef

    The CoralReef software suite, developed by CAIDA, provides a comprehensive software solution for data collection and analysis from passive Internet traffic monitors, in real time or from trace files. Real-time monitoring support includes system network interfaces (via libpcap) and FreeBSD drivers for a number of network capture cards, including the popular Endace DAG (10GE/OC192, POS and ATM) cards. The package also includes programming APIs for C and Perl, and applications for capture, analysis, and web report generation. This package is maintained by CAIDA developers with the support and collaboration of the Internet measurement community.

    We released CoralReef version 3.8.6 in June of 2009.

    CAIDA Tools Download Report

    The table below displays all the CAIDA developed tools distributed via our home page at https://catalog.caida.org/software and the number of downloads of each version during 2009.

    • Currently Supported Tools

      Tool | Description | Downloads
      Autofocus | Tool for generating Internet traffic reports and time-series graphs. | 324
      CoralReef | A software suite to collect and analyze data from passive Internet traffic monitors. | 818
      Cuttlefish | Produces animated graphs showing diurnal and geographical patterns. | 150
      dsc | A system for collecting and exploring statistics from DNS servers. | 1,570
      dnsstat | An application that collects DNS queries on UDP port 53 to report statistics. | 199
      dnstop | A libpcap application that displays tables of DNS traffic. | 8,944
      iffinder | One of several tools to perform alias resolution, to discover IP interfaces belonging to the same router. | 288
      sk_analysis_dump | A tool for analysis of traceroute-like topology data. | 164
      Walrus | A tool for interactively visualizing large directed graphs in 3D space. | 3,410
      libsea | A file format and a Java library for representing large directed graphs. | 353
      Chart::Graph | A Perl module that provides a programmatic interface to several popular graphing packages. Note: Chart::Graph is also available on CPAN.org. The numbers here reflect only downloads directly from caida.org, as download statistics from CPAN are not available. | 91
      plot-latlong | A tool for plotting points on geographic maps. | 248
    • Past Tools (Unsupported)

      Tool Description Downloads
      arts++ A binary file format for storing network data. 1,647
      cflowd A NetFlow analysis tool. 311
      Mapnet A tool for visualizing the infrastructure of multiple backbone providers simultaneously. 12,215
      GeoPlot A lightweight Java applet that creates a geographical image of a data set. 552
      GTrace A graphical front-end to traceroute. 579
      otter A tool used for visualizing arbitrary network data that can be expressed as a set of nodes, links or paths. 374
      plotpaths An application that displays forward and reverse network path data. 114
      plankton A tool for visualizing NLANR's Web Cache Hierarchy 35

    Data

    In 2009, CAIDA captured and curated data from three primary sources of network data:

    • macroscopic topology data with the Archipelago infrastructure,
    • passive traffic traces at tier1 OC192 Internet Backbone links,
    • passive traffic traces from the UCSD Network Telescope.
    We derived several datasets from this data that we make publicly available to researchers. These include our AS Rank, AS adjacencies, and Router adjacencies datasets. In addition we released a Telescope Internet "background radiation" dataset and a Telescope Conficker dataset. Some datasets are made publicly available by CAIDA without restrictions to the user, while access to other datasets is restricted to academic researchers and CAIDA members, with data access subject to Acceptable Use Policies (AUP) designed to protect the privacy of monitored communications, to ensure security of network infrastructure, and to comply with the terms of our agreements with data providers.

    Major Milestones

    • We released two datasets based on "raw" traces from the UCSD Telescope (Two days in November 2008 and Three days of Conficker).
    • We initiated a real-time collection of traces from the Network Telescope. This "live" dataset currently covers a rolling window of the most recent two months.
    • The passive traffic traces from the equinix-chicago and equinix-sanjose monitors connected to tier1 ISP backbone links at Equinix facilities in Chicago, IL, and San Jose, CA, for 2009 are made available in the CAIDA Anonymized 2009 Internet Traces dataset.

    Data Collected in 2009

    Data Type First date Last date Total size [1]
    Macroscopic Topology Measurements, IPv4 (Archipelago) 2009-01-01 2009-12-31 388 GB (1.2 TB)
    Macroscopic Topology Measurements, IPv6 (Archipelago) 2009-01-01 2009-12-31 222 MB (784 MB)
    Internet backbone Traces 2009-01-15 2009-12-17 4.8 TB (8.8 TB)
    Network Telescope Datasets 2009-01-01 2009-12-31 96 GB (233 MB)
    "Live" Network Telescope Data 2009-01-01 2009-12-31 3.2 TB (6.2 TB) [2]
    DNS Names for IPv4 Routed /24 Topology Dataset 2009-01-01 2009-12-31 4.7 GB (17 GB) [2]
    DNS root/gTLD RTT Dataset 2009-01-01 2009-12-31 1.2 GB

    [1] The total size represents actual disk space. If data are stored in compressed form, the uncompressed size is given in parentheses.
    [2] The size of these datasets varies over time as we store and serve a rotating window of the last 30 days only.

    Data Distributed in 2009

    We process raw data into specialized datasets to increase its utility to researchers and to satisfy security and privacy concerns. In 2009, this resulted in the following datasets:

    • Publicly Available Data

      These datasets require that users agree to an Acceptable Use Policy, but are otherwise freely available.

    Dataset Unique visitors (IPs) Data Downloaded
    AS Rank 3419 4.7 GB
    AS Links (AS Adjacencies) 361 2.40 GB
    AS Relationships 1336 18.2 GB
    Router Adjacencies 287 657.5 MB
    Witty Worm Dataset 94 231 MB
    AS Taxonomy 197 104.3 MB *
    Code-Red Worms Dataset 72 3.8 GB
    We count the volume of data downloaded per unique user per unique file, so even if a user downloads a file 100 times, we only count that file once for that user. This methodology significantly undercounts the total volume of data served through our dataservers in 2009, but is necessary because of limitations in dataserver logging combined with aberrant user behavior.
    * AS Taxonomy dataset is included in a mirror of the GA Tech main AS Taxonomy site, and thus does not represent all access to this data.
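
    The per-user, per-file counting rule described above can be illustrated with a minimal sketch. The log record layout (`Download`), the function names, and the sample entries are hypothetical and chosen only for illustration; they do not describe CAIDA's actual dataserver logs or tooling.

```python
from typing import Iterable, NamedTuple


class Download(NamedTuple):
    """One log record (hypothetical layout, for illustration only)."""
    user: str        # visitor IP address or account username
    path: str        # file that was requested
    size_bytes: int  # size of the file served


def deduplicated_volume(log: Iterable[Download]) -> int:
    """Sum file sizes, counting each (user, file) pair at most once."""
    seen = set()
    total = 0
    for entry in log:
        key = (entry.user, entry.path)
        if key in seen:
            continue  # repeat download of the same file by the same user
        seen.add(key)
        total += entry.size_bytes
    return total


def unique_visitors(log: Iterable[Download]) -> int:
    """Count distinct users appearing in the log."""
    return len({entry.user for entry in log})


# Made-up sample: one user fetches the same file twice.
sample_log = [
    Download("192.0.2.1", "as-rank.txt", 1000),
    Download("192.0.2.1", "as-rank.txt", 1000),   # repeat, not counted
    Download("192.0.2.1", "as-links.txt", 500),
    Download("192.0.2.7", "as-rank.txt", 1000),
]

raw_volume = sum(entry.size_bytes for entry in sample_log)
counted_volume = deduplicated_volume(sample_log)
```

    In this made-up sample the raw volume served is larger than the counted volume, which is the undercounting effect the note describes.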
    • Restricted Access Data

      These datasets require that users:

      • be academic or government researchers, or join CAIDA;
      • request an account and provide a brief description of their intended use of the data; and
      • agree to an Acceptable Use Policy.
    Dataset Unique visitors (usernames) Data Downloaded *
    Anonymized Internet Backbone Traces 129 4.5 TB
    Backscatter Datasets 39 567 GB
    Raw Topology Traces from Archipelago infrastructure 35 1.0 TB
    Raw Topology Traces (skitter) 21 1.2 TB
    Witty Worm Dataset 11 71 GB
    DNS Names for IPv4 Routed /24 Topology Dataset 28 19.4 GB
    2003 Internet Topology Data Kit 19 2.9 GB
    DNS Root/gTLD server RTT Dataset 1 2.7 MB
    * We count the volume of data downloaded per unique user per unique file, so even if a user downloads a file 100 times, we only count that file once for that user. This methodology significantly undercounts the total volume of data served through our dataservers in 2009, but is necessary because of limitations in dataserver logging combined with aberrant user behavior.
    • Restricted Access Data Requests

      The following table shows some statistics about data requests for CAIDA datasets: the number of requests received, the number of users whose request was granted, and the number of users that actually downloaded data.

      We received about 4% more requests in 2009 than in 2008, and approved 16% more requests for access to restricted datasets. Almost 80% of the users who are granted access actually access our webservers to download data.

    Dataset Number of requests received Number of users granted access Number of users that accessed data
    Anonymized Backbone and Peering Link Traces 242 181 151
    Active Topology Trace Datasets 136 90 63
    Backscatter-2008 Dataset 101 62 45
    Witty Worm Dataset 28 18 14
    DNS Root/gTLD server RTT Dataset 7 3 3
    Totals 514 354 276
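
    As a rough check of the "almost 80%" figure, the short sketch below recomputes the column totals and the share of approved users who downloaded data, using only the numbers in the table above. The variable and function names are illustrative assumptions.

```python
# Rows copied from the request table: (requests received, users granted, users that accessed data)
requests_table = {
    "Anonymized Backbone and Peering Link Traces": (242, 181, 151),
    "Active Topology Trace Datasets": (136, 90, 63),
    "Backscatter-2008 Dataset": (101, 62, 45),
    "Witty Worm Dataset": (28, 18, 14),
    "DNS Root/gTLD server RTT Dataset": (7, 3, 3),
}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Column-wise sums across all datasets.
received, granted, accessed = (sum(col) for col in zip(*requests_table.values()))

approval_rate = percent(granted, received)  # share of requests that were granted
access_rate = percent(accessed, granted)    # share of approved users who downloaded data

print(f"received={received} granted={granted} accessed={accessed}")
print(f"approval rate: {approval_rate}%  access rate: {access_rate}%")
```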

    Workshops

    As part of our mission to investigate both practical and theoretical aspects of the Internet, CAIDA staff actively attend, contribute to, and host workshops relevant to research and better understanding of Internet infrastructure, trends, topology, routing, and security. Our web site has a complete listing of past and upcoming CAIDA Workshops.

    ISMA 2009 AIMS - 1st Workshop on Active Internet Measurements

    On February 12-13, 2009, CAIDA hosted the 1st Active Internet Measurements (AIMS) Workshop supporting science and policy in La Jolla, CA. This workshop sought to define priority directions of various active measurement infrastructures especially aimed at macroscopic security, stability, and performance measurements.

    2nd CAIDA/WIDE/CASFI Workshop

    The 2nd CAIDA/WIDE/CASFI Workshop was held on April 4-5, 2009 in Seoul, South Korea. This workshop continued a tradition of workshops supporting a three-way collaboration between researchers from CAIDA (USA), WIDE (Japan) and CASFI (South Korea). The main areas of the workshop were Internet measurement projects, analysis of data to reveal current Internet trends, and DNS research. The workshop also covered miscellaneous research and technical topics of mutual interest to CAIDA, WIDE and CASFI participants.

    1st Workshop on Internet Economics (WIE'09)

    The 1st Workshop on Internet Economics (WIE'09) hosted by CAIDA and Georgia Tech was held on September 23, 2009 by web videoconference. The goal of this workshop was to bring together researchers, commercial Internet facilities and service providers, technologists, theorists, policy makers, RIR stakeholders, and pundits of Internet economics to try to frame a concrete and useful research agenda for the emerging but stunted field of Internet infrastructure economics.


    Publications

    The following table contains the papers published by CAIDA for the calendar year of 2009. Please refer to Papers by CAIDA on our web site for a comprehensive listing of publications.

    Year Month Author(s) Title Publication
    2009 Nov
    1. Beverly, Robert
    2. Berger, Arthur
    3. Hyun, Young
    4. claffy, kc
    Understanding the Efficacy of Deployed Internet Source Address Validation Filtering ACM Internet Measurement Conference (IMC)
    2009 Oct
    1. claffy, kc
    2. Fomenkov, Marina
    3. Katz-Bassett, Ethan
    4. Beverly, Robert
    5. Cox, Beverly
    6. Luckie, Matthew
    The Workshop on Active Internet Measurements (AIMS) Report ACM SIGCOMM Computer Communication Review (CCR)
    2009 Oct
    1. Dimitropoulos, Xenofontas
    2. Krioukov, Dmitri
    3. Vahdat, Amin
    4. Riley, George
    Graph Annotations in Modeling Complex Network Topologies ACM Transactions on Modeling and Computer Simulation (TOMACS)
    2009 Oct
    1. Gringoli, Francesco
    2. Salgarelli, Luca
    3. Dusi, Maurizio
    4. Cascarano, Niccolo
    5. Risso, Fulvio
    6. claffy, kc
    GT: picking up the truth from the ground for Internet traffic ACM SIGCOMM Computer Communication Review (CCR)
    2009 Oct
    1. Kenneally, Erin
    2. claffy, kc
    An Internet Data Sharing Framework For Balancing Privacy and Utility Engaging Data: First International Forum on the Application and Management of Personal Electronic Information
    2009 Sep
    1. Papadopoulos, Fragkiskos
    2. Krioukov, Dmitri
    3. Boguñá, Marián
    4. Vahdat, Amin
    Greedy Forwarding in Scale-Free Networks Embedded in Hyperbolic Metric Spaces ACM SIGMETRICS Performance Evaluation Review
    2009 Sep
    1. Krioukov, Dmitri
    2. Papadopoulos, Fragkiskos
    3. Vahdat, Amin
    4. Boguñá, Marián
    On Curvature and Temperature of Complex Networks Physical Review E
    2009 Sep
    1. Jamakovic, Almerima
    2. Mahadevan, Priya
    3. Vahdat, Amin
    4. Boguñá, Marián
    5. Krioukov, Dmitri
    How Small Are Building Blocks of Complex Networks arXiv physics.soc-ph/0908.1143
    2009 Jul
    1. Papadopoulos, Fragkiskos
    2. Krioukov, Dmitri
    3. Boguñá, Marián
    4. Vahdat, Amin
    Greedy Forwarding in Dynamic Scale-Free Networks Embedded in Hyperbolic Metric Spaces: Technical Report Cooperative Association for Internet Data Analysis (CAIDA)
    2009 Mar
    1. claffy, kc
    2. Hyun, Young
    3. Keys, Ken
    4. Fomenkov, Marina
    5. Krioukov, Dmitri
    Internet Mapping: from Art to Science IEEE DHS Cybersecurity Applications and Technologies Conference for Homeland Security (CATCH)
    2009 Feb
    1. Boguñá, Marián
    2. Krioukov, Dmitri
    Navigating ultrasmall worlds in ultrashort time Physical Review Letters
    2009 Feb
    1. Shakkottai, Srinivas
    2. Fomenkov, Marina
    3. Koga, Ryan
    4. Krioukov, Dmitri
    5. claffy, kc
    Evolution of the Internet AS-Level Ecosystem International Conference on Complex Sciences (Complex)
    2009 Jan
    1. Boguñá, Marián
    2. Krioukov, Dmitri
    3. claffy, kc
    Navigability of Complex Networks Nature Physics

    Presentations

    The following table contains the presentations and invited talks published by CAIDA for the calendar year of 2009. Please refer to Presentations by CAIDA on our web site for a comprehensive listing.

    Year Month Presenter(s) Title Venue
    2009 Dec
    1. claffy, kc
    Historical and Architectural Context for Traffic Management Needs Today FCC Technical Advisory Process Workshop
    2009 Nov
    1. claffy, kc
    Archipelago Measurement Infrastructure Africa-Asia Forum Workshop
    2009 Oct
    1. Krioukov, Dmitri
    Hyperbolic geometry of complex networks USC Center for Applied Mathematical Sciences Colloquia
    2009 Oct
    1. Kenneally, Erin
    An Internet Data Sharing Framework For Balancing Privacy and Utility Engaging Data: First International Forum on the Application and Management of Personal Electronic Information
    2009 Oct
    1. Krioukov, Dmitri
    Evolution of the Internet Ecosystem Southern California Symposium on Network Economics and Game Theory (SoCal NEGT)
    2009 Sep
    1. claffy, kc
    Leveraging the Science and Technology of Internet Mapping for Homeland Security DHS Cybersecurity PI Meeting
    2009 Aug
    1. claffy, kc
    CAIDA participation in PREDICT DHS PREDICT PI Meeting
    2009 Jun
    1. Papadopoulos, Fragkiskos
    Greedy Forwarding in Scale-Free Networks Embedded in Hyperbolic Metric Spaces ACM SIGMETRICS MAMA
    2009 Jun
    1. Krioukov, Dmitri
    dK-series and hidden hyperbolic metric spaces Telefonica Research
    2009 Jun
    1. Kenneally, Erin
    Belmont Report Overview DHS Directorate for Science and Technology Workshop
    2009 May
    1. Kenneally, Erin
    Legal Issues in Network Research: Determining Content in Web-Browsing Communications DHS Directorate for Science and Technology Workshop
    2009 May
    1. claffy, kc
    Ten Things the FCC Should Know about the Internet Federal Communications Commission (FCC)
    2009 Apr
    1. Huffaker, Bradley
    CAIDA's Topology Updates and Analysis WIDE-CASFI
    2009 Apr
    1. Fomenkov, Marina
    CAIDA Report WIDE-CASFI
    2009 Apr
    1. Huffaker, Bradley
    DatCat: Lessons Learned WIDE-CASFI
    2009 Apr
    1. Brownlee, Nevil
    netmap: the mini-Ark project WIDE-CASFI
    2009 Apr
    1. Zhang, Min
    State of the Art in Traffic Classification WIDE-CASFI
    2009 Apr
    1. Zhang, Min
    A Measurement-Based Study of Xunlei WIDE-CASFI
    2009 Apr
    1. Krioukov, Dmitri
    Hidden Metric Spaces and Navigability of Complex Networks NeTS FIND
    2009 Mar
    1. claffy, kc
    DNS Research Update from CAIDA DNS RSSAC Meeting
    2009 Mar
    1. claffy, kc
    Broadband Conditions NTIA Broadband Technology Opportunities Program
    2009 Mar
    1. Hyun, Young
    Internet Visualization with Walrus Simulation Interoperability Workshop
    2009 Feb
    1. Krioukov, Dmitri
    Evolution of the Internet AS-Level Ecosystem International Conference on Complex Sciences (Complex)
    2009 Feb
    1. Keys, Ken
    IP-to-Router Mapping Techniques ISMA Workshop on Active Internet Measurements (AIMS)
    2009 Feb
    1. Hyun, Young
    Archipelago Measurement Infrastructure: Updates and Analyses ISMA Workshop on Active Internet Measurements (AIMS)
    2009 Jan
    1. Aben, Emile
    Conficker ISOI

    Web Site Usage

    In 2009, CAIDA's web site continued to attract considerable attention from a broad, international audience. Part of the increased traffic in the first half of the year can be attributed to attention following our release of the "Conficker/Conflicker/Downadup worm as seen from the UCSD Network Telescope" web report.

    The table below presents the monthly history of traffic to www.caida.org for 2009. To show a more accurate representation of website traffic, these statistics do not include traffic from spiders, crawlers or other robots.



    Month Unique visitors Number of visits Pages Hits Bandwidth (GB)
    Jan 2009 38,521 64,600 203,654 1,155,049 40.14 GB
    Feb 2009 41,436 67,709 208,371 1,246,721 46.61 GB
    Mar 2009 42,996 70,676 229,058 1,338,538 64.77 GB
    Apr 2009 40,486 64,965 209,267 1,183,418 44.84 GB
    May 2009 33,777 57,081 201,441 1,013,896 37.83 GB
    Jun 2009 34,253 57,040 174,455 1,105,566 43.86 GB
    Jul 2009 31,552 54,348 178,023 954,572 38.93 GB
    Aug 2009 33,145 54,392 170,597 1,040,296 37.07 GB
    Sep 2009 36,824 62,205 200,598 1,034,684 42.82 GB
    Oct 2009 39,742 67,003 220,984 1,080,407 52.81 GB
    Nov 2009 35,262 58,451 254,290 1,007,644 46.47 GB
    Dec 2009 31,377 53,110 204,633 868,402 35.87 GB
    Total 439,371 731,580 2,455,371 13,029,193 532.02 GB


    Organizational Chart

    CAIDA would like to acknowledge the many people who put forth great effort towards making CAIDA a success in 2009. The image below shows the functional organization of CAIDA. Please check the home page for more complete information about CAIDA staff.

    [Image of CAIDA Functional Organization Chart]

    CAIDA Functional Organization Chart


    Funding Sources

    CAIDA thanks our 2009 sponsors, members, and collaborators.

    The charts below depict funds received by CAIDA during the 2009 calendar year.

    Funding Source Allocations Percentage of Total
    DHS 331,816 31%
    DOI 488,830 46%
    NSF 17,160 2%
    GIFT 200,000 19%
    CSE 27,716 3%
    Total 1,065,522 100%

    Figure 1. Allocations by funding source received during 2009.


    Operating Expenses

    The charts below depict CAIDA's Annual Expense Report for the 2009 calendar year.

    LABOR Salaries and benefits paid to staff and students
    IDC Indirect Costs paid to the University of California, San Diego including grant overhead (52-54%) and telephone, Internet, and other IT services.
    SUBCONTRACTS There were no subcontracts in 2009.
    TRAVEL Trips to conferences, PI meetings, operational meetings, and sites of remote monitor deployment.
    SUPPLIES & EXPENSES All office supplies and equipment (including computer hardware and software) costing less than $5000.
    EQUIPMENT Computer hardware or other equipment costing more than $5000.
    TRANSFERS Exchange of funds between groups for recharge for IT desktop support and Oracle database services.
    Expense Category Expenses Percentage of Total
    Labor 1,413,587 60%
    IDC 769,474 33%
    Subcontract 0 0%
    Travel 62,427 3%
    Supplies & Expenses 69,550 3%
    Equipment 27,586 1%
    Transfers 6,046 0%
    Total 2,348,670 100%

    Figure 2. 2009 Operating Expenses

    These numbers do not include salaries or expenses paid by the Computer Science & Engineering Department of the Jacobs School of Engineering at the University of California, San Diego.



    Program Area Expenses Percentage of Total
    DNS 407,447 17.3%
    Infrastructure 909,442 38.7%
    Routing 399,360 17.0%
    Policy 36,133 1.5%
    Topology 550,304 23.4%
    Outreach 45,985 2.0%
    Total 2,348,671 100.0%

    Figure 3. 2009 Expenses by Program Area
