The Cooperative Association for Internet Data Analysis
CAIDA's Annual Report for 2011
A report on CAIDA research initiatives, project progress and results, data sets, tool development, publications, presentations, workshops, web site statistics, funding sources, and operating expenses for 2011.

Mission Statement: CAIDA investigates practical and theoretical aspects of the Internet, focusing on activities that:

  • provide insight into the macroscopic function of Internet infrastructure, behavior, usage, and evolution,
  • foster a collaborative environment in which data can be acquired, analyzed, and (as appropriate) shared,
  • improve the integrity of the field of Internet science,
  • inform science, technology, and communications public policies.


Executive Summary

This annual report covers CAIDA's activities in 2011, summarizing highlights from our research, infrastructure, data-sharing and outreach activities. Our current research projects span topology, routing, traffic, economics, future Internet architectures, and policy. Our infrastructure activities continue to support measurement-based studies of the Internet's core infrastructure, with a focus on the health and integrity of the global Internet's topology, routing, addressing, and naming systems. We are also dedicating resources to support the infrastructure measurement and data sharing interests and needs of two U.S. federal agency programs: the National Science Foundation's International Research Network Connections (IRNC) program, and the Department of Homeland Security's Protected Repository for the Defense of Infrastructure Against Cyber Threats (PREDICT) data-sharing project.

We continue to expand our Internet active measurement platform Ark in scale and functionality, and use this platform to collect and share the largest Internet topology data sets (IPv4 and IPv6) available to academic researchers, and share many aggregated annotated derivative data sets publicly. Our topology measurement platform supports IPv6 -- by the end of 2011, 28 of our 57 Ark hosting sites provided IPv6 connectivity and topology measurements. We have dramatically improved existing techniques for IP address alias resolution for large Internet graphs; in late 2011 we submitted a paper describing and evaluating the performance of our algorithms, and we hope to see it published in 2012. (A preliminary technical report is available on our web site; see the Topology section of this report.) Using these new techniques, we collected, analyzed, processed and released two Internet Topology Data Kit (ITDK) Datasets, reflecting measurements taken in April and October 2011. Each 2011 ITDK includes: two related router-level topologies; router-to-AS assignments; geographic location of each router; and DNS lookups of all observed IP addresses. We are still working on improving and validating our AS relationship inference algorithm so that we can add additional annotations to future ITDKs.

On the theoretical side of topology research, we continued investigation of the geometric model we developed last year to study the structure and function of complex networks. This model assumes that hyperbolic geometry underlies many complex networks, which if true provides a natural explanation for the heterogeneous degree distributions and strong clustering that characterize so many complex networks, i.e., they are simple reflections of the negative curvature and metric property of the underlying hyperbolic geometry. We also showed that not only popularity but also similarity acts as a strong force in shaping complex network structure and dynamics. We developed a framework where new connections, instead of preferring popular nodes, optimize certain trade-offs between popularity and similarity. The optimization framework more accurately describes large-scale Internet evolution (new links) than previous models, e.g., preferential attachment. The mathematically inclined will appreciate our related recent investigation of random bipartite networks using a hidden variable formalism that facilitates study of the structure and function of complex networks, as well as inference of individual characteristics, attributes, and annotations of nodes in real bipartite networks. Particular applications of interest are network geometry and navigability.

We gained momentum on our economics and policy research agenda, focused primarily on explanatory and predictive modeling of the economics of transit and peering interconnections in the Internet. Two historical developments contribute to a persistent disconnect between economic models and actual operational practices on the Internet. First, the Internet became too complex - in traffic dynamics, topology, and economics - for currently available analytical tools to allow realistic modeling. Second, the data needed to parameterize more realistic models is simply not available. The problem is fundamental, and familiar: simple models are not valid, and complex models cannot be validated. We are making progress in both dimensions: creating more powerful, empirically parameterized computational tools, and enabling broader validation than previously possible. We also held the second interdisciplinary Workshop on Internet Economics (WIE) in December, connecting academic researchers, commercial Internet facilities and service providers, theorists, policy makers, and pundits of Internet economics to frame an Internet economics research agenda, and more specifically to improve the realism, utility, and predictive power of economic models of Internet topology and dynamics.

In the first months of 2011, Internet communications were disrupted in several North African countries in response to civilian protests and threats of civil war. We analyzed episodes of these disruptions in two countries: Egypt and Libya. Using both control plane and data plane data sets in combination allowed us to narrow down which forms of Internet access disruption were implemented in a given region over time. Among other insights, we detected what we believe were Libya's attempts to test firewall-based blocking before they executed more aggressive BGP-based disconnection. Our methodology could be used, and automated, to detect outages or similar macroscopically disruptive events in other geographic or topological regions.

We are applying our theoretical, empirical, and practical understandings of the Internet's evolution to engage in the NSF's exciting Future Internet Architecture (FIA) Research program. We are participating in the Named Data Networking project, a 12-university collaboration funded by FIA to explore a generalization of the Internet architecture that allows naming not just communication endpoints, i.e., the source and destination IP addresses, but also the data (content) itself. This approach shifts the focus from where -- addresses and hosts in today's Internet -- to what -- the content that users and applications care about. By naming data instead of locations, the new architecture transforms data into a first-class entity while addressing the known technical challenges of today's Internet: routing scalability, network security, content protection and privacy. In 2011 we investigated combinations of name-space structure and network topology that optimize the efficiency of NDN algorithms and participated in NDN testbed development and evaluation.

Finally, as always, we engaged in a variety of tool development, data-sharing, and outreach activities, including web sites, peer-reviewed papers, technical reports, presentations, blogging, animations, and (six) workshops. Details of our activities are below. CAIDA's program plan for 2010-2013 is available at http://www.caida.org/home/about/progplan/progplan2010/. Please do not hesitate to send comments or questions to info at caida dot org.


Research Projects


Topology

Macroscopic Topology Measurements, Analysis, and Modeling

Goals

CAIDA's long-term topology research agenda includes four strategic areas: 1) macroscopic topology measurement; 2) analysis of the observable AS-level and router-level hierarchy; 3) topology modeling; and 4) analysis of IPv4 and IPv6 address space allocation.

Activities

  1. Macroscopic Topology Measurements:
    1. We continued large-scale macroscopic topology measurements using Archipelago (Ark), our state-of-the-art global measurement platform. We completed the fourth full calendar year of the IPv4 Routed /24 Topology Dataset and the third full calendar year of the IPv6 Topology Dataset collection.
    2. We continued to collect automated DNS reverse lookups for IP addresses discovered by the Ark probes and annotated the IPv4 topology data with corresponding DNS names.
  2. Analysis of Observable Topology:
    1. We ran the alias resolution tools on the Ark platform and combined the outcomes to map IP addresses to routers as accurately and completely as feasible. Using publicly available data from many networks and ground truth data provided to us by a large ISP, we tested the efficiency and veracity of various combinations of alias resolution methods. We released a technical report, Internet-Scale IPv4 Alias Resolution with MIDAR: System Architecture, detailing the MIDAR system architecture, and submitted a version of this paper to IEEE/ACM Transactions on Networking for publication in 2012.
    2. Resulting from our improved measurement and analysis techniques, we collected, analyzed, processed and released two Internet Topology Data Kit (ITDK) Datasets, using traceroute data collected as part of the IPv4 Routed /24 Topology Dataset and alias resolution measurements conducted in April and October 2011. Each 2011 ITDK includes: two related router-level topologies; router-to-AS assignments; geographic location of each router; and DNS lookups of all observed IP addresses.
    3. We created new IPv4 and IPv6 AS Core Graph visualizations using August 2010 Ark data.
    4. In January 2011 we temporarily halted the bi-weekly production of our dataset of AS-level topologies annotated with business relationships between ASes, and began revising and improving our published algorithms for inferring these relationships. We plan to resume production of this popular dataset once we have completed and verified the new algorithms.
    5. Data collected using traceroute-based algorithms underpins research into the Internet's router-level topology, though it is possible to infer false links from this data. In Measured Impact of Crooked Traceroute, we examined the inaccuracies induced by such false inferences, both on macroscopic and ISP topology mapping. We observed that most per-flow load-balancing did not induce false links when macroscopic topology is inferred using classic traceroute. The effect of false links on ISP topology mapping is possibly much worse, because the degrees of a tier-1 ISP's routers derived from classic traceroute were inflated by a median factor of 2.9 as compared to those inferred with Paris traceroute.
    6. We continued our work measuring the evolution and dynamics of peering relationships. In Twelve Years in the Evolution of the Internet Ecosystem, we analyzed data and studied trends in the evolution of the Internet AS topology in the last 12 years. This work focused mainly on transit (customer-provider) links in the AS topology, as these are visible in data available from public repositories of BGP data.
    7. We published the technical report "Geocompare: a comparison of public and commercial geolocation databases" in May 2011. The report attempts a systematic quantitative comparison of currently available geolocation service providers. It describes our process for selecting distance thresholds for comparison, and our centroid-based algorithm for comparing database lat-long results against a majority of responses from the set of databases we evaluated. We presented the work at the Network Mapping and Measurement Conference (NMMC) in May 2011.
  3. Topology Modeling:
    1. We proved that graphs in a general class of self-similar networks have zero percolation threshold. The considered self-similar networks included random scale-free graphs with given expected node degrees and zero clustering, scale-free graphs with finite clustering and metric structure, growing scale-free networks, and many real networks. The proof and the derivation of the giant component size in Percolation in Self-Similar Networks did not require the assumption that networks were treelike. Our results rely only on the observation that self-similar networks possess a hierarchy of nested subgraphs whose average degree grows with their depth in the hierarchy. We conjecture that this property is pivotal for percolation in networks.
  4. Analysis of IPv4 and IPv6 address space allocation
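To make the degree-inflation comparison in item 2.5 more concrete, the following minimal sketch computes a median inflation factor from two link lists covering the same routers, one standing in for classic traceroute and one for Paris traceroute. The router names and links are invented for illustration and are not CAIDA data, and the function names are hypothetical. Only the idea of taking the median of per-router degree ratios follows the text above.

```python
from collections import defaultdict
from statistics import median


def degrees(links):
    """Return {router: degree} for an undirected link list, ignoring duplicate links."""
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {router: len(peers) for router, peers in neighbors.items()}


def median_inflation(classic_links, paris_links):
    """Median of classic/Paris degree ratios over routers seen in both graphs."""
    classic = degrees(classic_links)
    paris = degrees(paris_links)
    common = classic.keys() & paris.keys()
    return median(classic[r] / paris[r] for r in common)


# Hypothetical input: the classic-traceroute graph contains two extra links at r1.
paris_links = [("r1", "r2"), ("r2", "r3")]
classic_links = [("r1", "r2"), ("r2", "r3"), ("r1", "r3"), ("r1", "r4")]

factor = median_inflation(classic_links, paris_links)
print(factor)
```

Routers that appear in only one of the two graphs (r4 here) are left out of the ratio, which is one possible convention among several.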

Publications

Outreach

Ongoing data releases

We made publicly available the following topology datasets:

Student Involvement

Justin Cheng, UCSD undergraduate student, worked as an assistant Graphics Designer.

Funding Sources

Our topology research received support from:

Routing

Discovering Hyperbolic Metric Spaces Hidden Beneath the Internet and Other Complex Networks

Goals

The primary objective of CAIDA's research in Internet routing remains the development and evaluation of solutions to the impending routing scalability problems. Our relevant activities focused on two related sub-topics: greedy routing based on hidden metric spaces underlying real networks; and the relationship between routing efficiency and the structure of the network topology. While motivated by Internet routing, we spent the past year investigating the implications of this work for other disciplines, including physics, biology, chemistry, and economics.

Activities

  1. We showed that not only popularity but also similarity acts as a strong force in shaping complex network structure and dynamics. In Popularity versus Similarity in Growing Networks, we developed a framework where new connections, instead of preferring popular nodes, optimize certain trade-offs between popularity and similarity. The framework admits a geometric interpretation, in which preferential attachment emerges from local optimization processes. As opposed to preferential attachment, the optimization framework accurately describes large-scale Internet evolution, predicting new links in the Internet with remarkable precision. The developed framework can thus potentially be used to predict new links in evolving networks, and provides a different perspective on preferential attachment as an emergent phenomenon.
  2. We introduced and studied random bipartite networks with hidden variables. The hidden variable formalism developed in Hidden variables in bipartite networks has been a powerful tool in studying the structure and function of complex networks, and can also be useful in inferring individual characteristics, attributes, and annotations of nodes in real bipartite networks. Particular applications of interest are network geometry and navigability.
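The popularity-versus-similarity trade-off in item 1 can be illustrated with a small growth simulation. The sketch below is one simple way to express such a trade-off and rests on assumptions not stated in this report: birth order serves as a proxy for popularity (older nodes count as more popular), angular distance on a circle serves as similarity, and each new node links to the m existing nodes with the smallest product of the two. The function names and parameter values are hypothetical.

```python
import math
import random


def angular_distance(a, b):
    """Smallest angle between two points on a circle, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def grow_network(n, m, seed=0):
    """Grow a network of n nodes, numbered by birth order starting at 1.

    Each new node t links to the m existing nodes s that minimize
    s * angular_distance(s, t): a low s means old (popular), a small
    angle means similar.
    """
    rng = random.Random(seed)
    angles = {}
    edges = []
    for t in range(1, n + 1):
        angles[t] = rng.uniform(0, 2 * math.pi)
        older = list(range(1, t))
        older.sort(key=lambda s: s * angular_distance(angles[s], angles[t]))
        edges.extend((t, s) for s in older[:m])
    return angles, edges


angles, edges = grow_network(n=200, m=2)
print(len(edges))
```

Counting how often each node appears as a link target gives a quick view of how strongly old nodes are favored under these assumptions.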

Publications

Outreach

Student Involvement

CAIDA hosted Chiara Orsini, a graduate student from University of Pisa, Italy.

Funding Sources

Our routing research received support from:

Economics and Policy

Goals

The high-level objective of this research is to create a scientific basis for modeling Internet interdomain interconnection and dynamics. We aim to understand the structure and dynamics of the Internet ecosystem from an economic perspective, capturing relevant interactions between network business relations, internetwork topology, routing policies, and resulting interdomain traffic flow.

Activities

  1. We developed GENESIS, a computational model of interdomain network formation that captures strategy selection dynamics by autonomous networks. This model provides the underpinnings for our study of peering strategy selection by autonomous networks in the Internet. We submitted a paper for publication in IEEE Infocom 2012.
  2. We continued our work on measuring the statistical properties of the interdomain traffic matrix (ITM). Our study found that the ITM is sparse and that the traffic sent by an AS can be modeled using either a log-normal or a Pareto distribution, depending on whether the corresponding traffic experiences congestion. We found correlations between different ASes, mostly due to relatively few highly popular prefixes. We submitted a paper, Towards a Statistical Characterization of the Interdomain Traffic Matrix, for publication at the International Federation for Information Processing (IFIP) Networking Conference in 2012.
  3. We began drafting a worldwide IPv6 Network Operator Survey. In 2012, we plan to collect feedback on the survey, make revisions, and conduct the survey to parameterize our IPv6 modeling work.
  4. Amogh Dhamdhere posted an economics-related essay on the CAIDA blog, "Model for Internet Evolution Predicts Consolidation in Tier-1 Transit Market", in July 2011.
  5. We regularly responded to requests from government agencies and policymaking bodies for comments and positions that inform policy with the best available empirical data. kc claffy served on two ICANN advisory committees, RSSAC and SSAC, and continued on in her second year as a member of the FCC Technical Advisory Committee (TAC). She wrote blog commentaries about TAC meetings in March and in June, 2011.
  6. kc claffy published a blog commentary "network neutrality: the meme, its cost, its future", as follow-up to a panel on network neutrality hosted at the June 2011 cybersecurity meeting of the DHS/SRI Infosec Technology Transition Council.
  7. kc claffy contributed an article Underneath the Hood: Ownership vs. Stewardship of the Internet to the CircleID Internet Infrastructure blog, discussing ICANN's approval of the creation of the .XXX top level domain suffix.

Publications

Outreach

Student Involvement

Gylmar Moreno, UCSD undergraduate student, worked as an assistant Programmer Analyst.

Funding Sources

Our economics research received support from:

Security and Stability

Goals

We seek to develop new methods of analysis and aggregation of Internet measurement data from multiple available sources in order to shed light on various Internet security related events, including global connectivity disruptions due to political or catastrophic causes. Our methodology and findings can form the basis for automated early-warning detection systems for large-scale Internet outages.

Activities

  1. In the first months of 2011, Internet communications were disrupted in several North African countries in response to civilian protests and threats of civil war. In "Analysis of Country-wide Internet Outages Caused by Censorship", we analyzed episodes of these disruptions in two countries: Egypt and Libya. Using both control plane and data plane data sets in combination allowed us to narrow down which forms of Internet access disruption were implemented in a given region over time. Among other insights, we detected what we believe were Libya's attempts to test firewall-based blocking before they executed more aggressive BGP-based disconnection. Our methodology could be automated and used to detect outages or similar macroscopically disruptive events in other geographic or topological regions.

Publications

Outreach

Funding Sources

Our support for security and stability research comes from:

Future Internet Architecture

Named Data Networking (NDN)

Goals

The main goal of this collaborative project is research, development, and testbed deployment of a new Internet architecture that replaces IP with a network layer routing directly on content names.

Activities

The collaborating institutions include UC Los Angeles, Palo Alto Research Center (PARC), Colorado State University, University of Arizona, University of Illinois/Urbana-Champaign, UC Irvine, UC San Diego, University of Memphis, Washington University, and Yale University. The project is led by Lixia Zhang (UCLA) and Van Jacobson (PARC). CAIDA researchers participated in activities of the Evaluation and Measurement, Theory, and Routing/Forwarding teams.

  1. kc claffy posted a blog commentary, "my first Future Internet Architecture PI meeting" in January 2011.
  2. We deployed and maintained a local node on the national NDN testbed using the CCNx hub software.
  3. To test the applicability of the hyperbolic greedy routing methods to NDN, we conducted simulations forwarding packets on the new CCNx network. We extracted the Autonomous System (AS) graph of the testbed and mapped each AS number to its hyperbolic coordinates using the supplementary data from our 2010 paper Sustaining the Internet with Hyperbolic Mapping. We then evaluated the performance of modified greedy forwarding strategies using the metrics of the delivery success ratio and three types of stretch.
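The evaluation described in item 3 can be sketched in a few lines. The example below shows plain greedy forwarding over hyperbolic coordinates, not the modified strategies we evaluated, and it computes only one hop-count stretch rather than the three types mentioned above. The four-node graph, its coordinates, and all function names are hypothetical. The distance function is the standard hyperbolic law of cosines for curvature -1.

```python
import math
from collections import deque


def hyperbolic_distance(p, q):
    """Distance between points (r, theta) in the hyperbolic plane of curvature -1."""
    (r1, t1), (r2, t2) = p, q
    x = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(t1 - t2))
    return math.acosh(max(x, 1.0))  # clamp guards against rounding below 1


def greedy_path(graph, coords, src, dst):
    """Forward to the neighbor closest to dst; return None at a local minimum."""
    path = [src]
    current = src
    while current != dst:
        here = hyperbolic_distance(coords[current], coords[dst])
        best = min(graph[current], default=None,
                   key=lambda n: hyperbolic_distance(coords[n], coords[dst]))
        if best is None or hyperbolic_distance(coords[best], coords[dst]) >= here:
            return None  # no neighbor is closer: greedy forwarding fails
        path.append(best)
        current = best
    return path


def shortest_hops(graph, src, dst):
    """Hop count of a shortest path found by breadth-first search, or None."""
    seen = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return seen[node]
        for n in graph[node]:
            if n not in seen:
                seen[n] = seen[node] + 1
                queue.append(n)
    return None


# Hypothetical topology and coordinates, for illustration only.
graph = {"A": ["B", "C"], "B": ["A"], "C": ["A", "D"], "D": ["C"]}
coords = {"A": (0.5, 0.0), "B": (2.0, 0.3), "C": (2.0, 3.0), "D": (3.0, 3.2)}

path = greedy_path(graph, coords, "B", "D")
stretch = (len(path) - 1) / shortest_hops(graph, "B", "D")
print(path, stretch)
```

Repeating this over many source-destination pairs and counting the pairs for which `greedy_path` returns a path gives a delivery success ratio.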

Outreach

  • We contributed to the Named Data Networking (NDN) Project 2010 - 2011 Progress Summary.
  • In May, CAIDA researchers participated in the first NDN retreat at PARC, Palo Alto, CA.
  • CAIDA researchers participated in the Future Internet Architecture Program Meeting and contributed to discussions of the four projects funded by FIA and the security features inherent to each architecture design.

Funding Sources

This research received support from NSF grant (CNS-1039646) Named Data Networking.


Infrastructure Projects


Archipelago (Ark)

Goals

Archipelago (Ark) is CAIDA's active measurement infrastructure. It aims to enable large-scale Internet measurements, while reducing the effort needed to develop, deploy and conduct sophisticated experiments. Ark represents a step toward a community-oriented measurement infrastructure as it allows CAIDA collaborators to run their vetted measurement tasks on a security-hardened distributed platform.

Activities

  1. By the end of 2011, we increased the number of vantage points to 57 Ark monitors deployed in 29 countries.
  2. We continued to improve our measurement techniques and analysis methodologies for alias resolution inferences. In 2011, we released the following tools to the public: kapar, MIDAR, Motu, mper, and rb-mperio.
  3. We added more monitors with native IPv6 connectivity to the Ark infrastructure. As of the end of 2011, Ark had 28 monitors collecting the data on the emerging IPv6 global topology.
  4. We continued support for the spoofer experiment (a collaboration with R. Beverly, NPS).

Outreach

In 2011, CAIDA researchers published 9 papers and non-CAIDA researchers published 11 papers that used Ark data.

Funding Sources

Ark infrastructure receives support from:

UCSD Network Telescope

Goals

We develop and maintain a passive data collection system known as the Network Telescope, in order to study security-related events by monitoring and analyzing unsolicited traffic arriving at a globally routed underutilized /8 network.

Activities

  1. Since data storage is becoming considerably more expensive, we prioritized telescope data curation and meta-data preservation.
  2. We started improving our software infrastructure for processing, management, analysis, visualization and reporting on data collected with the UCSD Network Telescope.
  3. We developed iatmon (Inter-Arrival Time Monitor), a freely available measurement and analysis tool that allows one to separate one-way traffic into clearly defined subsets: 14 source types and 10 inter-arrival-time based groups. We used this tool to observe changes in one-way traffic at the UCSD Network Telescope over the first half of 2011. A paper One-way Traffic Monitoring with iatmon was submitted to PAM.
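The idea of grouping one-way traffic sources by inter-arrival time, as in item 3, can be illustrated as follows. This is not iatmon's algorithm: the report does not give iatmon's 14 source types or its 10 inter-arrival-time groups, so the three bins, their boundaries, and the function names below are invented for illustration. The addresses come from a range reserved for documentation.

```python
from collections import defaultdict
from statistics import median

# Hypothetical bin edges in seconds; iatmon's real group definitions differ.
BINS = [(1.0, "short"), (60.0, "medium"), (float("inf"), "long")]


def inter_arrival_times(packets):
    """Map each source to the gaps between its consecutive packets.

    packets: iterable of (timestamp_seconds, source_address) tuples.
    """
    times = defaultdict(list)
    for ts, src in packets:
        times[src].append(ts)
    gaps = {}
    for src, stamps in times.items():
        stamps.sort()
        gaps[src] = [b - a for a, b in zip(stamps, stamps[1:])]
    return gaps


def classify(gaps):
    """Label each source by the bin that contains its median inter-arrival time."""
    labels = {}
    for src, g in gaps.items():
        if not g:
            labels[src] = "single-packet"
            continue
        m = median(g)
        labels[src] = next(name for limit, name in BINS if m < limit)
    return labels


packets = [
    (0.0, "192.0.2.1"), (0.2, "192.0.2.1"), (0.5, "192.0.2.1"),
    (0.0, "192.0.2.2"), (120.0, "192.0.2.2"),
    (3.0, "192.0.2.3"),
]
labels = classify(inter_arrival_times(packets))
print(labels)
```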

Outreach

  • In March we organized and hosted a one-day Workshop on Network Telescopes to discuss the network and security research using network telescopes.
  • Dr. Tanja Zseby (Fraunhofer Institute for Open Communication Systems, Berlin, Germany) joined CAIDA in October as a Visiting Scholar for one year to work on darknet data analysis.

Student Involvement

Sarah Larsen, UCSD undergraduate student, worked as an assistant System Administrator.

Funding Sources

Our Network Telescope received support from:


Data Sharing for Security / PREDICT

The goal of the Department of Homeland Security project Protected Repository for the Defense of Infrastructure Against Cyber Threats (PREDICT) is to provide vetted researchers with current network operational data in a secure and controlled manner that respects the security, privacy, legal, and economic concerns of Internet users and network operators. CAIDA supports PREDICT goals as Data Provider and Data Host and also plays an advisory role in developing technical, legal, and practical aspects of PREDICT policies and procedures.

Goals

Activities

  1. We received six user requests via the PREDICT portal during 2011; all six requesters received access to our data.
  2. We completed the CAIDA Anonymized 2011 Internet Traces Dataset that contains traffic traces from our two monitors deployed on high-speed backbone links.
  3. We continued drafting a proposed framework document in the spirit of the Belmont Report that would address ethical principles and guidelines for the protection of human subjects in Information and Communications Technologies research.
  4. We attended the Workshop on Research Data Lifecycle Management and participated in discussions of best practices and funding models for selecting, storing, describing, preserving, and sharing the digital research data.
  5. On 28 December 2011, the Department of Homeland Security (DHS) posted Ethical Principles Guiding Information and Communication Technology Research: The Menlo Report and its Companion Report and announced the reports in the Federal Register. DHS also posted the Interaction of the Menlo Report and Revisions to the Common Rule-Comments in Response to the Advanced Notice of Proposed Rulemaking (ANPRM).

Publications

Outreach

Funding Sources

Support for this work comes from DHS contract, (DHS D07PC75579) "Supporting Research and Development of Security Technologies through Network and Security Data Collection".


DatCat: Internet Measurement Data Catalog

Goals

Originally funded by the NSF award (OCI-0137121) "Correlating Heterogeneous Measurement Data to Achieve System-Level Analysis of Internet Traffic Trends", CAIDA built the Internet Measurement Data Catalog (IMDC) to facilitate searching for and sharing of data and metadata among researchers. Since its launch in June 2006 at www.datcat.org the catalog has received contributions of metadata indexing nearly 19TB of data. Lack of funding and increased Oracle database licensing cost required that we disable the IMDC temporarily while we integrate lessons learned into our transition from this research prototype to the proposed increased operational capabilities.

Based on the lessons we learned during the development and operation of IMDC, we began to upgrade and modify the underlying DatCat service with three tasks: streamline the user experience by simplifying the metadata entry process; migrate from a proprietary database backend (Oracle) to a completely open source solution; and expand the community of the catalog users to a broader range of cybersecurity and other researchers. We completed the third task this year and plan to complete the first two tasks in 2012.

Activities

  1. We designed and developed a public forums interface integrated with the IMDC to hold discussion of data sharing issues and to answer frequently asked questions regarding the IMDC and the information it contains.

Student Involvement

Jesse Weinstein, UCSD undergraduate student, worked as an assistant Programmer Analyst.

Funding Sources

In 2011, our DatCat research received support from:


Sustainable data-handling and analysis methodologies for the IRNC networks

Goals

The NSF International Research Network Connections Program (IRNC) has funded five projects to provide network connections linking U.S. research networks with peer networks in other parts of the world. The goal of our IRNC Special Project is to support the IRNC community measurement efforts by fostering and leading discussion of how best to make IRNC data and statistics available, and by adapting CAIDA measurement technologies for IRNC community needs.

Activities

  1. We added Internet Protocol Version 6 (IPv6) capabilities to the CoralReef suite of network data collection and analysis tools for processing network traces and flows. We also added support for prefix-preserving IPv6 address anonymization, an option to apply IPv4 anonymization policy to IPv4 addresses embedded within IPv6 addresses (IPv4-mapped, SIIT, Teredo, 6to4, 6over4, ISATAP), an option to anonymize IP addresses in nested headers (e.g. IPIP, or the original IP header in an ICMP error message) as well as an option to leave multicast addresses intact. Our next step will be to extend the CoralReef Report Generator software to better visualize IPv6 traffic separately from IPv4 traffic.
  2. CAIDA held several conference calls with IRNC ProNet PI Julio Ibarra and his staff to discuss how he might instrument a hybrid network router that carries both OpenFlow and IP traffic. We discussed use of CAIDA's CoralReef suite of data collection, analysis, and reporting tools for reporting and visualization of the IP portion of the traffic.
  3. We made progress on extending our Archipelago measurement infrastructure to monitor IRNC sites.
    1. With an introduction by IRNC ProNet PI Julio Ibarra, we obtained contacts at the Academic Network for State of Sao Paulo (ANSP) and signed an Ark Memorandum of Cooperation (MoC) with them.
    2. With an introduction by IRNC ProNet PI David Lassner, we worked with Australia's Academic and Research Network (AARNet). AARNet accepted our MoC and donated hardware for a new Ark server in Perth, Australia.
    3. ProNet PI Steve Huter provided contacts with two network engineers in Gambia, where we will deploy an Ark monitor.
    4. IRNC Network Engineer John Hicks provided contacts with the University of Peradeniya in Sri Lanka where we are pursuing the deployment of another Ark monitor.
  4. We developed an IRNC Wiki page with the intention for it to serve as a collection point for IRNC related activities.
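The prefix-preserving property mentioned in item 1 means that two addresses sharing their first k bits still share exactly their first k bits after anonymization. The sketch below shows one generic way to obtain that property and is not the CoralReef implementation: each output bit is the input bit XORed with a keyed pseudorandom bit that depends only on the preceding input bits. HMAC-SHA256 as the pseudorandom function, the key value, and the function names are all assumptions made for illustration.

```python
import hashlib
import hmac
import ipaddress


def anonymize_bits(value, width, key):
    """Prefix-preserving permutation of a width-bit integer.

    Output bit i is input bit i XOR a keyed bit derived from the i
    higher-order input bits, so shared prefixes stay shared.
    """
    bits = format(value, "0{}b".format(width))
    out = 0
    for i in range(width):
        flip = hmac.new(key, bits[:i].encode(), hashlib.sha256).digest()[0] & 1
        out = (out << 1) | (int(bits[i]) ^ flip)
    return out


def anonymize(address, key):
    """Anonymize an IPv4 or IPv6 address string, keeping its address family."""
    ip = ipaddress.ip_address(address)
    # Separate keys per family so IPv4 and IPv6 mappings are unrelated.
    family_key = key + str(ip.version).encode()
    return str(type(ip)(anonymize_bits(int(ip), ip.max_prefixlen, family_key)))


def common_prefix_len(a, b):
    """Number of leading bits two addresses of the same family share."""
    ip_a, ip_b = ipaddress.ip_address(a), ipaddress.ip_address(b)
    return ip_a.max_prefixlen - (int(ip_a) ^ int(ip_b)).bit_length()


KEY = b"example-key"  # hypothetical key, for illustration only

a, b = "192.0.2.10", "192.0.2.200"
print(common_prefix_len(a, b))
print(common_prefix_len(anonymize(a, KEY), anonymize(b, KEY)))
```

The two printed prefix lengths are equal, which is the property the option is meant to guarantee. A production tool would also need the handling of embedded IPv4 addresses, nested headers, and multicast addresses described above, none of which this sketch attempts.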

Outreach

Funding Sources

This project is funded by NSF grant (OCI-0963073) "IRNC-SP: Sustainable data-handling and analysis methodologies for the IRNC networks".


Tools

CAIDA's mission includes providing access to tools for Internet data collection, analysis and visualization to facilitate network measurement and management. However, CAIDA does not receive specific funding for support and maintenance of the tools we develop. Please check our home page for a complete listing and taxonomy of CAIDA tools.

2011 Tool Development

MIDAR

MIDAR (Monotonic ID-Based Alias Resolution) is a tool developed by CAIDA that builds on recent work in alias resolution using IP-ID time stamps, scaling related techniques to large-scale Internet topologies (millions of nodes) with greater precision and sensitivity. It provides an extremely precise ID comparison test based on monotonicity rather than proximity. MIDAR integrates multiple probing methods, multiple vantage points, and a novel sliding-window probe scheduling algorithm to increase scalability to millions of IP addresses. Experiments show that MIDAR's approach is effective at minimizing the false positive rate sufficiently to achieve a high positive predictive value at Internet scale.
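The intuition behind a monotonicity-based comparison can be shown with a deliberately simplified check. This is not MIDAR's actual test, which is described in the technical report; it only illustrates the idea that IP ID samples from two aliases of one router, merged by time, should look like a single 16-bit counter that keeps advancing. The rate bound `max_rate`, the sample values, and the function names are assumptions made for this sketch.

```python
def forward_gap(a, b, modulus=65536):
    """How far a 16-bit counter must advance to go from value a to value b."""
    return (b - a) % modulus


def looks_shared(samples_a, samples_b, max_rate):
    """Simplified monotonicity check on IP ID samples from two addresses.

    samples_*: lists of (time_seconds, ip_id) tuples.
    max_rate: assumed upper bound on counter increments per second.
    Returns True if the time-merged series is consistent with one counter
    that never advances faster than max_rate, allowing for wraparound.
    """
    merged = sorted(samples_a + samples_b)
    for (t0, id0), (t1, id1) in zip(merged, merged[1:]):
        if forward_gap(id0, id1) > max_rate * (t1 - t0):
            return False
    return True


# Hypothetical samples: a and b interleave on one counter that wraps past 65535.
a = [(0.0, 65500), (2.0, 65530), (4.0, 20)]
b = [(1.0, 65515), (3.0, 5)]
# c comes from an unrelated counter.
c = [(1.0, 30000), (3.0, 30020)]

print(looks_shared(a, b, max_rate=50))
print(looks_shared(a, c, max_rate=50))
```

A proximity test would only ask whether ID values sampled close together in time are numerically close. The check above instead requires the whole merged sequence to advance consistently, which is harder for two independent counters to satisfy by chance.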

kapar

Inspired by the promising foundation presented in Mehmet Gunes' APAR, CAIDA wrote "kapar", a highly optimized implementation for production use on large-scale Internet topologies. In the process we fixed a few bugs and experimented with our own improvements to the algorithm.

mper

mper is a probing engine that clients can use to conduct network measurements using ICMP, UDP, and TCP probes.

rb-mperio

rb-mperio is a RubyGem for writing network measurement scripts in Ruby that use the mper probing engine. rb-mperio v0.3.0 was released on September 30, 2011.

Motu

Motu is a simple tool for dealiasing pairs of IPv4 addresses. Version 1.0.1 was released on October 5, 2011.

CAIDA Tools Download Report

The table below lists all CAIDA-developed, currently supported tools distributed via our home page at http://www.caida.org/tools/ and the number of downloads of each during 2011.

Tool | Description | Downloads
Autofocus | Internet traffic reports and time-series graphs. | 290
Chart::Graph | A Perl module that provides a programmatic interface to several popular graphing packages. Note: Chart::Graph is also available on CPAN.org. The numbers here reflect only downloads directly from caida.org, as download statistics from CPAN are not available. | 108
CoralReef | Measures and analyzes passive Internet traffic monitor data. | 482
Cuttlefish | Produces animated graphs showing diurnal and geographical patterns. | 95
dnsstat | DNS traffic measurement utility. | 169
iffinder | Discovers IP interfaces belonging to the same router. | 278
libsea | Scalable graph file format and graph library. | 212
kapar | Graph-based IP alias resolution. | 18
MIDAR | Identifies IPv4 addresses belonging to the same router (aliases) using shared monotonic IP ID counters. | 33
Motu | Dealiases pairs of IPv4 addresses. | 14
mper | Probing engine for conducting network measurements with ICMP, UDP, and TCP probes. | 50
otter | Visualizes arbitrary network data. | 238
plot-latlong | Plots points on geographic maps. | 276
plotpaths | Displays forward traceroute path data. | 53
rb-mperio | RubyGem for writing network measurement scripts in Ruby that use the mper probing engine. | 47
RouterToAsAssignment | Assigns each router from a router-level graph of the Internet to its Autonomous System (AS). | 321
sk_analysis_dump | A tool for analysis of traceroute-like topology data. | 58
topostats | Computes various statistics on network topologies. | 133
Walrus | Visualizes large graphs in three-dimensional space. | 2506

Data

Data Collected in 2011

In 2011, CAIDA captured the following raw data:

We curated and archived several datasets from these data. During the Day in the Life of the Internet (DITL 2011, April 13-15), we collected one-hour passive traces on high-speed Internet backbone links (distributed as part of the CAIDA Anonymized High-speed Internet Traces 2011) and retained the "live" data collected on the UCSD Network Telescope as well.

The table below lists the amount of data collected in our ongoing data collection operations.

Data Type | First date | Last date | Total size [1]
Macroscopic Topology Measurements, IPv4 (Archipelago) | 2011-01-01 | 2011-12-31 | 596.9 GiB (1.9 TiB)
Macroscopic Topology Measurements, IPv6 (Archipelago) | 2011-01-01 | 2011-12-31 | 1.9 GiB (6.6 GiB)
Internet backbone Traces | 2011-01-20 | 2011-12-15 | 3.1 TiB (6.8 TiB) [3]
"Live" Network Telescope Data | 2011-01-01 | 2011-12-31 | 29.9 TiB (59.7 TiB) [2,4]
DNS Names for IPv4 Routed /24 Topology Dataset | 2011-01-01 | 2011-12-31 | 7.8 GiB (29.5 GiB)
AS Links for IPv4 Routed /24 Topology Dataset | 2011-01-01 | 2011-12-31 | 155.8 MiB (636.5 MiB)
Macroscopic Internet Topology Data Kit (ITDK) | 2011-04-01 | 2011-11-03 | 361.5 MiB (1.9 GiB)
DNS root/gTLD RTT Dataset | 2011-03-16 | 2011-12-31 | 448.7 MiB
[1] The total size represents actual disk space. If data are stored in compressed form, the uncompressed size is given in brackets.
[2] The size of this data set varies over time as we store and serve a rotating window of the last 30 days only. The specified numbers are totals captured over the whole year.
[3] This includes traces on April 13 during DITL 2011, and traces on June 8, 2011 (IPv6 Day).
[4] This includes 279 GB of data collected during DITL 2011 and 95 GB on IPv6 Day.

Datasets Distributed in 2011

CAIDA makes some datasets publicly available without restrictions to the user, while access to other datasets is restricted to academic researchers, CAIDA members, and government contractors with data access subject to certain safeguards designed to protect the privacy of monitored communications, to ensure security of network infrastructure, and to comply with the terms of our agreements with data providers.

  • Publicly Available Data

    These datasets require that users agree to an Acceptable Use Policy, but are otherwise freely available.

Dataset | Unique visitors (IPs) | Data Downloaded
AS Rank | 23 | 4.4 MiB
AS Links (AS Adjacencies) | 644 | 22.5 GiB
AS Relationships | 862 | 5.7 GiB
Router Adjacencies | 267 | 626.2 MiB
AS Taxonomy | 156 | 81.4 MiB *
Witty Worm Dataset | 223 | 319.7 MiB
Code-Red Worms Dataset | 527 | 6.3 GiB
We count the volume of data downloaded per unique user per unique file, so if a user downloads a file multiple times, we only count that file once for that user. This significantly underestimates the total volume of data served through our dataservers.
* The AS Taxonomy dataset is served from a mirror of the main GA Tech AS Taxonomy site, so these figures do not represent all access to this data.
  • Restricted Access Data

    These datasets require that users:

    • be academic or government researchers, or join CAIDA;
    • request an account and provide a brief description of their intended use of the data; and
    • agree to an Acceptable Use Policy.
Dataset | Unique visitors (usernames) | Data Downloaded *
Anonymized Internet Backbone Traces | 187 | 22.4 TiB
Backscatter Datasets | 34 | 245.2 GiB
Raw Topology Traces from Archipelago infrastructure | 50 | 1.9 TiB
Raw Topology Traces (skitter) | 25 | 82.1 GiB
DNS Names for IPv4 Routed /24 Topology Dataset | 31 | 53.9 GiB
Macroscopic Internet Topology Data Kit | 70 | 43.5 GiB
Witty Worm Dataset | 15 | 190.4 GiB
DNS Root/gTLD server RTT Dataset | 7 | 12.9 MiB
DDoS Attack Dataset | 60 | 230.0 GiB
Telescope Datasets | 17 | 268.1 GiB
* We count the volume of data downloaded per unique user per unique file, so even if a user downloads a file 100 times, we only count that file once for that user. This methodology results in significantly under-counting the total volume of data served through our dataservers.
  • Restricted Access Data Requests

    The following table shows some statistics about data requests for CAIDA datasets: the number of requests received, the number of users whose request was granted, and the number of users that actually downloaded data.

    We received 33 more requests in 2011 than in 2010, and approved 46 more requests for access to restricted datasets. About 77.1% of the users who were granted access actually accessed our web servers to download data.

Dataset | Number of requests received | Number of users granted access | Number of users that accessed data
Anonymized Backbone and Peering Link Traces | 270 | 208 | 168
Active Topology Trace Datasets | 153 | 127 | 83
Backscatter Datasets | 51 | 34 | 28
Witty Worm Dataset | 15 | 11 | 10
DNS Root/gTLD server RTT Dataset | 10 | 8 | 6
DDoS Attack Dataset | 91 | 62 | 51
Telescope Datasets | 29 | 22 | 18
Totals | 619 | 472 | 364
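
The totals and the 77.1% access figure quoted above can be re-derived from the rows of this table. The short Python sketch below does that arithmetic; the variable and function names are illustrative, and the numbers are copied from the table rather than taken from any CAIDA system.

```python
# Rows copied from the request table: (dataset, requests, granted, accessed).
REQUESTS = [
    ("Anonymized Backbone and Peering Link Traces", 270, 208, 168),
    ("Active Topology Trace Datasets", 153, 127, 83),
    ("Backscatter Datasets", 51, 34, 28),
    ("Witty Worm Dataset", 15, 11, 10),
    ("DNS Root/gTLD server RTT Dataset", 10, 8, 6),
    ("DDoS Attack Dataset", 91, 62, 51),
    ("Telescope Datasets", 29, 22, 18),
]


def access_rate(granted: int, accessed: int) -> float:
    """Percentage of granted users who went on to download data."""
    return 100.0 * accessed / granted


total_requests = sum(row[1] for row in REQUESTS)
total_granted = sum(row[2] for row in REQUESTS)
total_accessed = sum(row[3] for row in REQUESTS)
overall_rate = access_rate(total_granted, total_accessed)

# Per-dataset access rates, followed by the overall figure.
for name, _, granted, accessed in REQUESTS:
    print(f"{name}: {access_rate(granted, accessed):.1f}%")
print(f"Overall: {total_accessed}/{total_granted} = {overall_rate:.1f}%")
```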

Workshops

As part of our mission to investigate both practical and theoretical aspects of the Internet, CAIDA staff actively attend, contribute to, and host workshops relevant to research and better understanding of Internet infrastructure, trends, topology, routing, and security. Our web site has a complete listing of past and upcoming CAIDA Workshops.

CAIDA/UCY Workshop on Network Geometry

From January 11-13, 2011, the University of Cyprus (UCY) hosted an interdisciplinary "Network Geometry" workshop jointly organized by CAIDA, UCSD and UCY. The agenda included short presentations by participants as well as extensive time for discussions and interactions.

ISMA - 3rd Workshop on Active Internet Measurements (AIMS-3)

On February 9-11, 2011, CAIDA hosted the 3rd workshop on Active Internet Measurements supporting science and policy. This workshop continues the series of Internet Statistics and Metrics Analysis (ISMA) workshops that are held to discuss the current and future state of Internet measurement and analysis.

Workshop on Network Telescopes

On March 22, 2011, CAIDA hosted a half-day workshop on network and security research using Network Telescopes. The agenda included short presentations by participants, discussions, and interactions. Some participants attended remotely via web videoconference.

Workshop on BGP and Traceroute data

As part of our efforts on the Internet Laboratory for Empirical Network Science (iLENS) Project, CAIDA hosted a workshop on August 22nd, 2011 to discuss scalable measurement and analysis of BGP and traceroute data.

Workshop on Internet Economics

On December 1-2, 2011, CAIDA and Georgia Tech hosted their second Workshop on Internet Economics. The two-day event brought together a mix of academia and industry: researchers, commercial Internet facilities and service providers, technologists, theorists, policy makers, RIR stakeholders, and pundits of Internet economics, with the aim of framing a concrete and useful research agenda for the emerging but stunted field of Internet infrastructure economics. The workshop included presentations by participants and in-depth discussions of AS peering policies and practices, of how to improve the realism and utility of Internet interdomain connectivity models for trend analysis, and of predictions of how the Internet ecosystem will look 5-15 years from now.

CAIDA-WIDE-CASFI Workshop

On December 5th, 2011, the 4th CAIDA-WIDE-CASFI Joint Measurement Workshop was held in Tokyo, Japan. This workshop continues a tradition of workshops supporting a three-way collaboration between researchers from CAIDA (USA), WIDE (Japan) and CASFI (South Korea). The workshop covered a range of research and technical topics of mutual interest to CAIDA, WIDE and CASFI participants and brought the groups together to share their latest research.

UCSD Complex Network Seminar - Different Angles on Network Complexity, Engineering, and Science (DANCES)

In October 2010, CAIDA began hosting the UCSD Complex Network Seminar: Different Angles on Network Complexity, Engineering, and Science (DANCES). The goal of this seminar series was to bring together junior and senior researchers studying networks, including UCSD graduate students and post-docs. The seminar fostered communication and collaboration among researchers from diverse disciplines that study networks from different perspectives (physics, biology, sociology, computer science, ECE, math, bioengineering, cognitive science, etc.), and provided young researchers a forum to practice their presentation and communication skills. The seminars continued in 2011 and drew attendees from a diversity of disciplines.

Publications

The following table contains the papers published by CAIDA for the calendar year of 2011. Please refer to Papers by CAIDA on our web site for a comprehensive listing of publications.

Year Month | Author(s) | Title | Publication
2011 Oct | claffy, kc | Underneath the hood: ownership vs. stewardship of the internet | ACM SIGCOMM Computer Communication Review (CCR)
2011 Sep | claffy, kc | "Network Neutrality": the meme, its cost, its future | ACM SIGCOMM Computer Communication Review (CCR)
2011 Sep | Dhamdhere, Amogh; Dovrolis, Constantine | Twelve Years in the Evolution of the Internet Ecosystem | IEEE/ACM Transactions on Networking
2011 Aug | Kitsak, Maksim; Krioukov, Dmitri | Hidden variables in bipartite networks | Physical Review E
2011 Jul | Fomenkov, Marina; claffy, kc | Internet measurement data management challenges | Workshop on Research Data Lifecycle Management
2011 Jul | claffy, kc | The 3rd Workshop on Active Internet Measurements (AIMS-3) Report | ACM SIGCOMM Computer Communication Review (CCR)
2011 Jul | claffy, kc | Tracking IPv6 Evolution: Data We Have and Data We Need | ACM SIGCOMM Computer Communication Review (CCR)
2011 Jun | Mérindol, Pascal; Donnet, Benoit; Pansiot, Jean-Jacques; Luckie, Matthew; Hyun, Young | MERLIN: MEasure the Router Level of the INternet | Conference on Next Generation Internet
2011 May | Huffaker, Bradley; Fomenkov, Marina; claffy, kc | Geocompare: a comparison of public and commercial geolocation databases - Technical Report | Cooperative Association for Internet Data Analysis (CAIDA)
2011 May | Keys, Ken; Hyun, Young; Luckie, Matthew; claffy, kc | Internet-Scale IPv4 Alias Resolution with MIDAR: System Architecture - Technical Report | Cooperative Association for Internet Data Analysis (CAIDA)
2011 Mar | Kenneally, Erin; Stavrou, Angelos; McHugh, John; Christin, Nicolas | Moving Forward, Building an Ethics Community (Panel Statements) | Workshop on Ethics in Computer Security Research (WECSR)
2011 Jan | Serrano, Mirian Ángeles; Krioukov, Dmitri; Boguñá, Marián | Percolation in Self-Similar Networks | Physical Review Letters
2011 Jan | Luckie, Matthew; Dhamdhere, Amogh; claffy, kc; Murrell, David | Measured Impact of Crooked Traceroute | ACM SIGCOMM Computer Communication Review (CCR)

Presentations

The following table lists the presentations and invited talks given by CAIDA staff during the 2011 calendar year. Please refer to Presentations by CAIDA on our web site for a comprehensive listing.

Year Month Presenter(s) Title Venue Topic(s)
2011
Dec Huffaker, B. CAIDA Update 2011 WIDE-CASFI
- data
- measurement methodology
- overview
- routing
- security
- topology
- trends
- visualization
2011
Dec Dhamdhere, A. A cost model for network traffic (with an application to paid-peering) Workshop on Internet Economics (WIE)
- economics
2011
Nov Krioukov, D. Popularity versus Similarity in Growing Networks University of Maryland
- network geometry
- routing
- topology
2011
Nov Dainotti, A. Analysis of Country-wide Internet Outages Caused by Censorship ACM Internet Measurement Conference (IMC)
- active data analysis
- internet outages
- passive data analysis
- policy
- routing
- security
2011
Nov claffy, k. Analysis of Country-wide Internet Outages Caused by Censorship Different Angles on Network Complexity, Engineering, and Science (DANCES)
- active data analysis
- data
- measurement methodology
- passive data analysis
- routing
- security
- topology
2011
Oct Krioukov, D. Popularity versus Similarity in Growing Networks Institute for Mathematics and its Applications (IMA)
- network geometry
- routing
- topology
2011
Oct Krioukov, D. Geometry of Large Networks (Computer Science Perspective) American Institute of Mathematics (AIM)
- network geometry
- routing
- topology
2011
Oct claffy, k. IRNC-SP: Sustainable data-handling and analysis methodologies for the IRNC networks NSF-IRNC
- active data analysis
- ipv6
- overview
- passive data analysis
- policy
2011
Oct claffy, k. IRNC-SP: Sustainable data-handling and analysis methodologies for the IRNC networks: Updates IRNC PI
- active data analysis
- ipv6
- passive data analysis
- policy
2011
Aug claffy, k. Tracking IPv6 evolution: Data We Have and Data We Need Chinese-American Networking Symposium (CANS)
- data
- ipv6
- measurement methodology
- policy
2011
Jul claffy, k. DHS PREDICT project: CAIDA update PREDICT PI
- data
- measurement methodology
- overview
- policy
- security
2011
Jun Krioukov, D. Percolation in self-similar networks International School and Conference on Network Science (NetSci)
- topology
2011
May Krioukov, D. Optimal routing in complex networks NDN PI
- network geometry
- routing
- topology
2011
May Kitsak, M. Do Bipartite Networks Have Metric Structure? Different Angles on Network Complexity, Engineering, and Science (DANCES)
- network geometry
2011
May Huffaker, B. Geolocation Comparison: CAIDA's Geolocation Database Comparison Network Mapping and Measurement Conference (NMMC)
- measurement methodology
- overview
2011
Apr Krioukov, D. Hyperbolic geometry of complex networks Bell Labs-NIST Workshop on Large-Scale Geometry of Networks
- network geometry
- routing
- topology
2011
Apr Huffaker, B. AS Core: Visualizing the Internet Different Angles on Network Complexity, Engineering, and Science (DANCES)
- data
- overview
- topology
- visualization
2011
Mar Krioukov, D. Percolation in self-similar networks Decision Making: Bridging Psychophysics and Neurophysiology
- topology
2011
Mar Kitsak, M. Identification of Influential Spreaders in Complex Networks Decision Making: Bridging Psychophysics and Neurophysiology
- network geometry
2011
Mar Kenneally, E. The Need for Community Standards for Ethical Behavior in E-Crime Research AntiPhishing Working Group E-Crime (APWG eCR) Sync-Up
- overview
- policy
- trends
2011
Mar claffy, k. AS Core: Visualizing the Internet UCSD CSE Perspectives in Computer Science
- data
- overview
- topology
- visualization
2011
Mar claffy, k. DHS PREDICT project: CAIDA update PREDICT PI
- data
- measurement methodology
- overview
- policy
- security
2011
Feb Hyun, Y. Archipelago Measurement Infrastructure Updates ISMA AIMS
- measurement methodology
- software/tools
- topology
2011
Feb Hyun, Y. Internet Topology Data Kit ISMA AIMS
- data
- software/tools
- topology
2011
Feb Dhamdhere, A. An Agent-based Model of Interdomain Interconnection in the Internet Different Angles on Network Complexity, Engineering, and Science (DANCES)
- economics
- routing
- topology
2011
Feb Dhamdhere, A. Measured Impact of Crooked Traceroute ISMA AIMS
- routing
- topology
2011
Feb claffy, k. IPv6: hither, thither, and yon ISMA AIMS
- ipv6
- policy
- topology


Web Site Usage

In 2011, CAIDA's web site continued to attract considerable attention from a broad, international audience. The wave of heavy traffic that occurred mid-year is attributed to increased downloads of the recently updated AS Core IPv4 and IPv6 graph, which was publicized at the various conferences and workshops that CAIDA staff attended.

The graph and table below present the monthly history of traffic to www.caida.org for 2011. To represent website traffic more accurately, these statistics exclude non-viewed traffic, such as traffic from spiders, crawlers or other robots.



[Figure: Web Usage Bar Graph]

Month | Unique visitors | Number of visits | Pages | Hits | Bandwidth
Jan 2011 | 33,133 | 56,462 | 171,962 | 893,932 | 45.15 GB
Feb 2011 | 33,574 | 54,227 | 185,918 | 844,551 | 49.11 GB
Mar 2011 | 31,451 | 53,835 | 150,413 | 799,771 | 40.46 GB
Apr 2011 | 32,061 | 52,687 | 154,328 | 791,861 | 38.57 GB
May 2011 | 32,351 | 54,849 | 153,959 | 729,150 | 39.48 GB
Jun 2011 | 28,704 | 50,236 | 154,292 | 698,430 | 81.40 GB
Jul 2011 | 26,937 | 48,059 | 164,147 | 648,543 | 37.84 GB
Aug 2011 | 25,420 | 45,281 | 139,964 | 587,850 | 46.25 GB
Sep 2011 | 25,736 | 45,590 | 134,314 | 587,518 | 32.38 GB
Oct 2011 | 28,251 | 52,674 | 162,092 | 676,460 | 32.02 GB
Nov 2011 | 28,763 | 52,291 | 171,833 | 808,939 | 33.79 GB
Dec 2011 | 26,131 | 48,488 | 145,888 | 694,610 | 28.29 GB
Total | 352,512 | 614,679 (1.74 visits/visitor) | 1,889,110 (3.07 pages/visit) | 8,761,615 (14.25 hits/visit) | 504.75 GB (861.05 KB/visit)
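
The totals and per-visit ratios in the last row follow directly from the monthly figures. The Python sketch below recomputes them as a consistency check. The variable names are illustrative, and the conversion of 1 GB to 1024 * 1024 KB is an assumption, chosen because it reproduces the reported per-visit bandwidth.

```python
# Monthly rows copied from the table above:
# (month, unique visitors, visits, pages, hits, bandwidth in GB).
MONTHS = [
    ("Jan", 33133, 56462, 171962, 893932, 45.15),
    ("Feb", 33574, 54227, 185918, 844551, 49.11),
    ("Mar", 31451, 53835, 150413, 799771, 40.46),
    ("Apr", 32061, 52687, 154328, 791861, 38.57),
    ("May", 32351, 54849, 153959, 729150, 39.48),
    ("Jun", 28704, 50236, 154292, 698430, 81.40),
    ("Jul", 26937, 48059, 164147, 648543, 37.84),
    ("Aug", 25420, 45281, 139964, 587850, 46.25),
    ("Sep", 25736, 45590, 134314, 587518, 32.38),
    ("Oct", 28251, 52674, 162092, 676460, 32.02),
    ("Nov", 28763, 52291, 171833, 808939, 33.79),
    ("Dec", 26131, 48488, 145888, 694610, 28.29),
]

REPORTED_BANDWIDTH_GB = 504.75  # yearly total as printed in the report

visitors = sum(row[1] for row in MONTHS)
visits = sum(row[2] for row in MONTHS)
pages = sum(row[3] for row in MONTHS)
hits = sum(row[4] for row in MONTHS)
# Summing the rounded monthly values can differ from the reported
# total in the last digit.
bandwidth_gb = sum(row[5] for row in MONTHS)

visits_per_visitor = visits / visitors
pages_per_visit = pages / visits
hits_per_visit = hits / visits
# Assumption: 1 GB = 1024 * 1024 KB.
kb_per_visit = REPORTED_BANDWIDTH_GB * 1024 * 1024 / visits

print(f"visitors={visitors} visits={visits} pages={pages} hits={hits}")
print(f"bandwidth from monthly rows: {bandwidth_gb:.2f} GB")
print(f"{visits_per_visitor:.2f} visits/visitor, {pages_per_visit:.2f} pages/visit")
print(f"{hits_per_visit:.2f} hits/visit, {kb_per_visit:.2f} KB/visit")
```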

Organizational Chart

CAIDA would like to acknowledge the many people who put forth great effort towards making CAIDA a success in 2011. The image below shows the functional organization of CAIDA. Please check the home page for more complete information about CAIDA staff.

[Image of CAIDA Functional Organization Chart]

CAIDA Functional Organization Chart


Funding Sources

CAIDA thanks our 2011 sponsors, members, and collaborators.

The charts below depict funds received by CAIDA during the 2011 calendar year.

Funding Source | Allocations | Percentage of Total
NSF | 1,386,375 | 46%
DOI | 1,412,727 | 47%
GIFT | 200,470 | 7%
Total | 2,999,572 | 100%
[Figure: Allocations by funding source]

Figure 1. Allocations by funding source received during 2011.


Operating Expenses

The charts below depict CAIDA's Annual Expense Report for the 2011 calendar year.

LABOR: Salaries and benefits paid to staff and students.
IDC: Indirect Costs paid to the University of California, San Diego, including grant overhead (54.5%).
SUPPLIES & EXPENSES: Computer supplies and equipment (including computer hardware and software costing less than $5000); telephone, Internet, and other IT services; and general office supplies.
TRAVEL: Trips to conferences, PI meetings, operational meetings, and sites of remote monitor deployment.
EQUIPMENT: Computer hardware or other equipment costing more than $5000.
TRANSFERS: Exchange of funds between groups for recharge for IT desktop support and Oracle database services.
Expense Category | Expenses | Percentage of Total
Labor | 1,511,752 | 59%
IDC | 878,669 | 34%
Supplies and Expenses | 100,964 | 4%
Travel | 54,515 | 2%
Equipment | 18,225 | 1%
Transfers | 12,030 | 0%
Total | 2,576,155 | 100%
[Figure: Operating Expenses]

Figure 2. 2011 Operating Expenses


Program Area | Expenses | Percentage of Total
Infrastructure | 1,222,166 | 47%
Topology | 777,072 | 30%
Routing | 414,187 | 16%
Policy | 92,225 | 4%
Outreach | 70,505 | 3%
Total | 2,576,155 | 100%
[Figure: Expenses by Program Area]

Figure 3. 2011 Expenses by Program Area

  Last Modified: Wed Nov-6-2013 17:18:51 PST
  Page URL: http://www.caida.org/home/about/annualreports/2011/index.xml