CAIDA's Annual Report for 2011

A report on CAIDA research initiatives, project progress and results, data sets, tool development, publications, presentations, workshops, web site statistics, funding sources, and operating expenses for 2011.

Mission Statement: CAIDA investigates practical and theoretical aspects of the Internet, focusing on activities that:

  • provide insight into the macroscopic function of Internet infrastructure, behavior, usage, and evolution,
  • foster a collaborative environment in which data can be acquired, analyzed, and (as appropriate) shared,
  • improve the integrity of the field of Internet science,
  • inform science, technology, and communications public policies.

Executive Summary

This annual report covers CAIDA's activities in 2011, summarizing highlights from our research, infrastructure, data-sharing and outreach activities. Our current research projects span topology, routing, traffic, economics, future Internet architectures, and policy. Our infrastructure activities continue to support measurement-based studies of the Internet's core infrastructure, with a focus on the health and integrity of the global Internet's topology, routing, addressing, and naming systems. We are also dedicating resources to support the infrastructure measurement and data sharing interests and needs of two U.S. federal agency programs: the National Science Foundation's International Research Network Connections (IRNC) program, and the Department of Homeland Security's Protected Repository for the Defense of Infrastructure Against Cyber Threats (PREDICT) data-sharing project.

We continue to expand our Internet active measurement platform, Ark, in scale and functionality. We use this platform to collect and share the largest Internet topology data sets (IPv4 and IPv6) available to academic researchers, and we share many aggregated, annotated derivative data sets publicly. Our topology measurement platform supports IPv6: by the end of 2011, 28 of our 57 Ark hosting sites provided IPv6 connectivity and topology measurements. We have dramatically improved existing techniques for IP address alias resolution for large Internet graphs; in late 2011 we submitted a paper describing and evaluating the performance of our algorithms, with publication anticipated in 2012. (A preliminary technical report is available on the web site now; see the Topology section of this report.) Using these new techniques, we collected, analyzed, processed and released two Internet Topology Data Kit (ITDK) Datasets, reflecting measurements taken in April and October 2011. Each 2011 ITDK includes two related router-level topologies; router-to-AS assignments; geographic location of each router; and DNS lookups of all observed IP addresses. We are still working on improving and validating our AS relationship inference algorithm so that we can add additional annotations to future ITDKs.

On the theoretical side of topology research, we continued investigation of the geometric model we developed last year to study the structure and function of complex networks. This model assumes that hyperbolic geometry underlies many complex networks, which if true provides a natural explanation for the heterogeneous degree distributions and strong clustering that characterize so many complex networks, i.e., they are simple reflections of the negative curvature and metric property of the underlying hyperbolic geometry. We also showed that not only popularity but also similarity acts as a strong force in shaping complex network structure and dynamics. We developed a framework where new connections, instead of preferring popular nodes, optimize certain trade-offs between popularity and similarity. The optimization framework more accurately describes large-scale Internet evolution (new links) than previous models, e.g., preferential attachment. The mathematically inclined will appreciate our related recent investigation of random bipartite networks using a hidden variable formalism that facilitates study of the structure and function of complex networks, as well as inference of individual characteristics, attributes, and annotations of nodes in real bipartite networks. Particular applications of interest are network geometry and navigability.

We gained momentum on our economics and policy research agenda, focused primarily on explanatory and predictive modeling of the economics of transit and peering interconnections in the Internet. Two historical developments contribute to a persistent disconnect between economic models and actual operational practices on the Internet. First, the Internet became too complex - in traffic dynamics, topology, and economics - for currently available analytical tools to allow realistic modeling. Second, the data needed to parameterize more realistic models is simply not available. The problem is fundamental, and familiar: simple models are not valid, and complex models cannot be validated. We are making progress in both dimensions: creating more powerful, empirically parameterized computational tools, and enabling broader validation than previously possible. We also held the second interdisciplinary Workshop on Internet Economics (WIE) in December, connecting academic researchers, commercial Internet facilities and service providers, theorists, policy makers, and pundits of Internet economics to frame an Internet economics research agenda, and more specifically to improve the realism, utility, and predictive power of economic models of Internet topology and dynamics.

In the first months of 2011, Internet communications were disrupted in several North African countries in response to civilian protests and threats of civil war. We analyzed episodes of these disruptions in two countries: Egypt and Libya. Using both control plane and data plane data sets in combination allowed us to narrow down which forms of Internet access disruption were implemented in a given region over time. Among other insights, we detected what we believe were Libya's attempts to test firewall-based blocking before they executed more aggressive BGP-based disconnection. Our methodology could be used, and automated, to detect outages or similar macroscopically disruptive events in other geographic or topological regions.

We are applying our theoretical, empirical, and practical understandings of the Internet's evolution to engage in the NSF's exciting Future Internet Architecture (FIA) Research program. We are participating in the Named Data Networking project, a 12-university collaboration funded by FIA to explore a generalization of the Internet architecture that allows naming not just communication endpoints, i.e., the source and destination IP addresses, but also the data (content) itself. This approach shifts the focus from where -- addresses and hosts in today's Internet -- to what -- the content that users and applications care about. By naming data instead of locations, the new architecture transforms data into a first-class entity while addressing the known technical challenges of today's Internet: routing scalability, network security, content protection and privacy. In 2011 we investigated combinations of name-space structure and network topology that optimize the efficiency of NDN algorithms and participated in NDN testbed development and evaluation.

Finally, as always, we engaged in a variety of tool development, data-sharing, and outreach activities, including web sites, peer-reviewed papers, technical reports, presentations, blogging, animations, and (six) workshops. Details of our activities are below. CAIDA's program plan for 2010-2013 is available at https://www.caida.org/about/progplan/progplan2010/. Please do not hesitate to send comments or questions to info at caida dot org.


Research Projects


Topology

Macroscopic Topology Measurements, Analysis, and Modeling

Goals

CAIDA's long-term topology research agenda includes four strategic areas: 1) macroscopic topology measurement; 2) analysis of the observable AS-level and router-level hierarchy; 3) topology modeling; and 4) analysis of IPv4 and IPv6 address space allocation.

Activities

  1. Macroscopic Topology Measurements:
    1. We continued large-scale macroscopic topology measurements using Archipelago (Ark), our state-of-the-art global measurement platform. We completed the fourth full calendar year of the IPv4 Routed /24 Topology Dataset and the third full calendar year of the IPv6 Topology Dataset collection.
    2. We continued to collect automated DNS reverse lookups for IP addresses discovered by the Ark probes and annotated the IPv4 topology data with corresponding DNS names.
  2. Analysis of Observable Topology:
    • We ran alias resolution tools on the Ark platform and combined their outcomes to map IP addresses to routers as accurately and completely as feasible. Using publicly available data from many networks and ground truth data provided to us by a large ISP, we tested the efficiency and veracity of various combinations of alias resolution methods. We released a technical report, Internet-Scale IPv4 Alias Resolution with MIDAR: System Architecture, and submitted a version of this paper to IEEE/ACM Transactions on Networking for publication in 2012.
    • As a result of our improved measurement and analysis techniques, we collected, analyzed, processed and released two Internet Topology Data Kit (ITDK) Datasets, using traceroute data collected as part of the IPv4 Routed /24 Topology Dataset and alias resolution measurements conducted in April and October 2011. Each 2011 ITDK includes: two related router-level topologies; router-to-AS assignments; geographic location of each router; and DNS lookups of all observed IP addresses.
    • We created new IPv4 and IPv6 AS Core Graph visualizations using August 2010 Ark data.
    • In January 2011 we temporarily halted the bi-weekly production of our dataset of AS-level topologies annotated with inter-AS business relationships, and began revising and improving our published algorithms for inferring these relationships. We plan to resume production of this popular dataset after completing the changes and verifying the new algorithms.
    • Data collected using traceroute-based algorithms underpins research into the Internet's router-level topology, though it is possible to infer false links from this data. In Measured Impact of Crooked Traceroute, we examined the inaccuracies induced by such false inferences, on both macroscopic and ISP topology mapping. We observed that most per-flow load-balancing did not induce false links when macroscopic topology is inferred using classic traceroute. The effect of false links on ISP topology mapping is possibly much worse, because the degrees of a tier-1 ISP's routers derived from classic traceroute were inflated by a median factor of 2.9 as compared to those inferred with Paris traceroute.
    • We continued our work measuring the evolution and dynamics of peering relationships. In Twelve Years in the Evolution of the Internet Ecosystem, we analyzed data and studied trends in the evolution of the Internet AS topology in the last 12 years. This work focused mainly on transit (customer-provider) links in the AS topology, as these are visible in data available from public repositories of BGP data.
    • We published the technical report "Geocompare: a comparison of public and commercial geolocation databases" in May 2011. The report attempts a systematic quantitative comparison of currently available geolocation service providers. It describes our process for selecting distance thresholds for comparison, and our centroid-based algorithm for comparing each database's latitude-longitude results against the majority of responses from the set of databases we evaluated. We presented the work at the Network Mapping and Measurement Conference (NMMC) in May 2011.
  3. Topology Modeling:
    1. We proved that graphs in a general class of self-similar networks have zero percolation threshold. The considered self-similar networks included random scale-free graphs with given expected node degrees and zero clustering, scale-free graphs with finite clustering and metric structure, growing scale-free networks, and many real networks. The proof and the derivation of the giant component size in Percolation in Self-Similar Networks did not require the assumption that networks were treelike. Our results rely only on the observation that self-similar networks possess a hierarchy of nested subgraphs whose average degree grows with their depth in the hierarchy. We conjecture that this property is pivotal for percolation in networks.
  4. Analysis of IPv4 and IPv6 address space allocation

    Publications

    Outreach

    Ongoing data releases

    We made publicly available the following topology datasets:

    Student Involvement

    Justin Cheng, UCSD undergraduate student, worked as an assistant Graphics Designer.

    Funding Sources

    Our topology research received support from:

    Routing

    Discovering Hyperbolic Metric Spaces Hidden Beneath the Internet and Other Complex Networks

    Goals

    The primary objective of CAIDA's research in Internet routing remains the development and evaluation of solutions to the impending routing scalability problems. Our relevant activities focused on two related sub-topics: greedy routing based on hidden metric spaces underlying real networks; and the relationship between routing efficiency and the structure of the network topology. While motivated by Internet routing, we spent the past year investigating the implications of this work for other disciplines, including physics, biology, chemistry, and economics.

    Activities

    1. We showed that not only popularity but also similarity acts as a strong force in shaping complex network structure and dynamics. In Popularity versus Similarity in Growing Networks, we developed a framework where new connections, instead of preferring popular nodes, optimize certain trade-offs between popularity and similarity. The framework admits a geometric interpretation, in which preferential attachment emerges from local optimization processes. As opposed to preferential attachment, the optimization framework accurately describes large-scale Internet evolution, predicting new links in the Internet with remarkable precision. The developed framework can thus potentially be used to predict new links in evolving networks, and provides a different perspective on preferential attachment as an emergent phenomenon.
    2. We introduced and studied random bipartite networks with hidden variables. The hidden variable formalism developed in Hidden variables in bipartite networks has been a powerful tool in studying the structure and function of complex networks, and can also be useful in inferring individual characteristics, attributes, and annotations of nodes in real bipartite networks. Particular applications of interest are network geometry and navigability.
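The popularity-versus-similarity trade-off described in item 1 can be illustrated with a minimal sketch. It is not the model as specified in the paper: it assumes, for illustration only, that each new node receives a random angle on a circle (similarity), that older nodes count as more popular, and that a new node t links to the m existing nodes s with the smallest product s × (angular distance between s and t). The function names and parameters are hypothetical.

```python
import math
import random


def angular_distance(a, b):
    """Smallest angle between two points on a circle, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def grow_network(n, m, seed=0):
    """Grow a network of n nodes, numbered 1..n in order of arrival.

    Assumption for illustration: node t links to the m older nodes s
    that minimize s * angular_distance(s, t). A small s means an old
    (popular) node; a small angle means a similar node.
    """
    rng = random.Random(seed)
    angles, edges = [], []
    for t in range(1, n + 1):
        theta_t = rng.uniform(0, 2 * math.pi)
        older = range(1, t)  # nodes that already exist when t arrives
        ranked = sorted(
            older,
            key=lambda s: s * angular_distance(angles[s - 1], theta_t),
        )
        edges.extend((s, t) for s in ranked[:m])
        angles.append(theta_t)
    return angles, edges


angles, edges = grow_network(n=50, m=2)
```

Under this rule a new node neither connects only to the oldest nodes nor only to its angular neighbors; it picks the nodes that balance both, which is the sense in which connections optimize a trade-off rather than simply prefer popular nodes.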

    Publications

    Outreach

    Student Involvement

    CAIDA hosted Chiara Orsini, a graduate student from University of Pisa, Italy.

    Funding Sources

    Our routing research received support from:

    Economics and Policy

    Goals

    The high-level objective of this research is to create a scientific basis for modeling Internet interdomain interconnection and dynamics. We aim to understand the structure and dynamics of the Internet ecosystem from an economic perspective, capturing relevant interactions between network business relations, internetwork topology, routing policies, and resulting interdomain traffic flow.

    Activities

    1. We developed GENESIS, a computational model of interdomain network formation that captures strategy selection dynamics by autonomous networks. This model provides the underpinnings for our study of peering strategy selection by autonomous networks in the Internet. We submitted a paper for publication in IEEE Infocom 2012.
    2. We continued our work on measuring the statistical properties of the interdomain traffic matrix (ITM). Our study revealed that the ITM is sparse and that the traffic sent by an AS can be modeled using either the log-normal or the Pareto distribution, depending on whether the corresponding traffic experiences congestion. We found correlations between different ASes, mostly due to relatively few highly popular prefixes. We submitted a paper, Towards a Statistical Characterization of the Interdomain Traffic Matrix, for publication at the International Federation for Information Processing (IFIP) Networking Conference in 2012.
    3. We began drafting a worldwide IPv6 Network Operator Survey. In 2012, we plan to collect feedback on the survey, make revisions, and conduct the survey to parameterize our IPv6 modeling work.
    4. Amogh Dhamdhere posted an economics-related essay on the CAIDA blog, "Model for Internet Evolution Predicts Consolidation in Tier-1 Transit Market", in July 2011.
    5. We regularly responded to requests from government agencies and policymaking bodies for comments and positions that inform policy with the best available empirical data. kc claffy served on two ICANN advisory committees, RSSAC and SSAC, and continued on in her second year as a member of the FCC Technical Advisory Committee (TAC). She wrote blog commentaries about TAC meetings in March and in June, 2011.
    6. kc claffy published a blog commentary, "network neutrality: the meme, its cost, its future", as a follow-up to a panel on network neutrality hosted at the June 2011 cybersecurity meeting of the DHS/SRI Infosec Technology Transition Council.
    7. kc claffy contributed an article Underneath the Hood: Ownership vs. Stewardship of the Internet to the CircleID Internet Infrastructure blog, discussing ICANN's approval of the creation of the .XXX top level domain suffix.

    Publications

    Outreach

    Student Involvement

    Gylmar Moreno, UCSD undergraduate student, worked as an assistant Programmer Analyst.

    Funding Sources

    Our economics research received support from:

    Security and Stability

    Goals

    We seek to develop new methods of analysis and aggregation of Internet measurement data from multiple available sources in order to shed light on various Internet security related events, including global connectivity disruptions due to political or catastrophic causes. Our methodology and findings can form the basis for automated early-warning detection systems for large-scale Internet outages.

    Activities

    1. In the first months of 2011, Internet communications were disrupted in several North African countries in response to civilian protests and threats of civil war. In "Analysis of Country-wide Internet Outages Caused by Censorship", we analyzed episodes of these disruptions in two countries: Egypt and Libya. Using both control plane and data plane data sets in combination allowed us to narrow down which forms of Internet access disruption were implemented in a given region over time. Among other insights, we detected what we believe were Libya's attempts to test firewall-based blocking before they executed more aggressive BGP-based disconnection. Our methodology could be automated and used to detect outages or similar macroscopically disruptive events in other geographic or topological regions.

    Publications

    Outreach

    Funding Sources

    Our support for security and stability research comes from:

    Future Internet Architecture

    Named Data Networking (NDN)

    Goals

    The main goal of this collaborative project is research, development, and testbed deployment of a new Internet architecture that replaces IP with a network layer routing directly on content names.

    Activities

    The list of collaborating institutions includes UC Los Angeles, Palo Alto Research Center (PARC), Colorado State University, University of Arizona, University of Illinois/Urbana-Champaign, UC Irvine, UC San Diego, University of Memphis, Washington University, and Yale University, and is led by Lixia Zhang (UCLA) and Van Jacobson (PARC). CAIDA researchers participated in activities of the Evaluation and Measurement, Theory, and Routing/Forwarding teams.

    1. kc claffy posted a blog commentary, "my first Future Internet Architecture PI meeting", in January 2011.
    2. We deployed and maintained a local node on the national NDN testbed using the CCNx hub software.
    3. To test the applicability of the hyperbolic greedy routing methods to NDN, we conducted simulations forwarding packets on the new CCNx network. We extracted the Autonomous System (AS) graph of the testbed and mapped each AS number to its hyperbolic coordinates using the supplementary data from our 2010 paper Sustaining the Internet with Hyperbolic Mapping. We then evaluated the performance of modified greedy forwarding strategies using the metrics of the delivery success ratio and three types of stretch.
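The evaluation in item 3 can be sketched as follows. This is a minimal illustration, not the simulation code used in the study: the four-node graph and its coordinates are made up, the function names are hypothetical, and only one kind of stretch (greedy hops divided by shortest-path hops) is shown, since the three types evaluated are not specified here. The distance formula is the standard hyperbolic law of cosines for polar coordinates (r, θ) at curvature −1.

```python
import math
from collections import deque


def hyperbolic_distance(p, q):
    """Distance between p=(r, theta) and q=(r, theta) in the hyperbolic plane."""
    (r1, t1), (r2, t2) = p, q
    x = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(t1 - t2))
    return math.acosh(max(x, 1.0))  # clamp guards against rounding below 1


def greedy_path(graph, coords, src, dst):
    """Forward to the neighbor closest to dst; return None at a local minimum."""
    path = [src]
    while path[-1] != dst:
        here = path[-1]
        if not graph[here]:
            return None
        nxt = min(graph[here],
                  key=lambda n: hyperbolic_distance(coords[n], coords[dst]))
        if (hyperbolic_distance(coords[nxt], coords[dst])
                >= hyperbolic_distance(coords[here], coords[dst])):
            return None  # no neighbor is closer to dst, so delivery fails
        path.append(nxt)
    return path


def shortest_hops(graph, src, dst):
    """Breadth-first search hop count, used as the baseline for stretch."""
    seen, queue = {src: 0}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return seen[node]
        for n in graph[node]:
            if n not in seen:
                seen[n] = seen[node] + 1
                queue.append(n)
    return None


def success_ratio(graph, coords):
    """Fraction of ordered node pairs for which greedy forwarding delivers."""
    pairs = [(s, d) for s in graph for d in graph if s != d]
    delivered = sum(greedy_path(graph, coords, s, d) is not None
                    for s, d in pairs)
    return delivered / len(pairs)


# Toy example (made-up data): A is a central hub, B, C, D sit near the edge.
graph = {"A": ["B", "C"], "B": ["A"], "C": ["A", "D"], "D": ["C"]}
coords = {"A": (0.5, 0.0), "B": (3.0, 0.3), "C": (3.0, 2.5), "D": (3.0, 2.8)}

path = greedy_path(graph, coords, "B", "D")
stretch = (len(path) - 1) / shortest_hops(graph, "B", "D")
```

Requiring each hop to be strictly closer to the destination guarantees that forwarding terminates, at the cost of dropping packets that reach a node with no closer neighbor; the success ratio measures how often that happens.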

    Outreach

    • We contributed to the Named Data Networking (NDN) Project 2010 - 2011 Progress Summary.
    • In May, CAIDA researchers participated in the first NDN retreat at PARC, Palo Alto, CA.
    • CAIDA researchers participated in the Future Internet Architecture Program Meeting and contributed to discussions of the four projects funded by FIA and the security features inherent to each architecture design.

    Funding Sources

    This research received support from NSF grant (CNS-1039646) Named Data Networking.


    Infrastructure Projects


    Archipelago (Ark)

    Goals

    Archipelago (Ark) is CAIDA's active measurement infrastructure. It aims to enable large-scale Internet measurements, while reducing the effort needed to develop, deploy and conduct sophisticated experiments. Ark represents a step toward a community-oriented measurement infrastructure as it allows CAIDA collaborators to run their vetted measurement tasks on a security-hardened distributed platform.

    Activities

    1. By the end of 2011, we increased the number of vantage points to 57 Ark monitors deployed in 29 countries.
    2. We continued to improve our measurement techniques and analysis methodologies for alias resolution inferences. In 2011, we released the following tools to the public: kapar, MIDAR, Motu, mper, and rb-mperio.
    3. We added more monitors with native IPv6 connectivity to the Ark infrastructure. As of the end of 2011, Ark had 28 monitors collecting the data on the emerging IPv6 global topology.
    4. We continued support for the Spoofer experiment (a collaboration with R. Beverly, NPS).

    Outreach

    In 2011, CAIDA researchers published 9 papers and non-CAIDA researchers published 11 papers that used Ark data.

    Funding Sources

    Ark infrastructure receives support from:

    UCSD Network Telescope

    Goals

    We develop and maintain a passive data collection system known as the Network Telescope, in order to study security related events by monitoring and analyzing unsolicited traffic arriving at a globally routed, underutilized /8 network.

    The UCSD Network Telescope is a passive data collection system focused on a globally routed /8 network that carries almost no legitimate traffic. The captured data, a unique resource, provides insights for network security researchers. The network telescope allows us to monitor unsolicited traffic, commonly referred to as Internet background radiation (IBR), destined to almost 1/256th of all IPv4 destination addresses on the Internet.

    Because a network telescope (also known as a blackhole, an Internet sink, or a darknet) does not contain any real computers, the monitor does not capture legitimate traffic, but rather communications that result from a wide range of events, including misconfiguration (e.g. a human being mis-typing an IP address), malicious scanning of address space by hackers looking for vulnerable targets, backscatter from random source denial-of-service attacks, and the automated spread of malicious software (worms).
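The 1/256th figure follows directly from the prefix length: a /8 fixes 8 of the 32 address bits and leaves 24 free. A short check, using a placeholder /8 rather than the telescope's actual prefix:

```python
import ipaddress

# Placeholder /8 for illustration only, not the telescope's real prefix.
telescope = ipaddress.ip_network("10.0.0.0/8")
ipv4_space = ipaddress.ip_network("0.0.0.0/0")

# 2**24 addresses out of 2**32, which is 1/256 of the IPv4 space.
fraction = telescope.num_addresses / ipv4_space.num_addresses
```

The report says "almost" 1/256th, so the full /8 is an upper bound on the monitored space.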

    Activities

    1. Since data storage is becoming considerably more expensive, we prioritized telescope data curation and meta-data preservation.
    2. We started improving our software infrastructure for processing, management, analysis, visualization and reporting on data collected with the UCSD Network Telescope.
    3. We developed iatmon (Inter-Arrival Time Monitor), a freely available measurement and analysis tool that allows one to separate one-way traffic into clearly defined subsets: 14 source types and 10 inter-arrival-time based groups. We used this tool to observe changes in one-way traffic at the UCSD Network Telescope over the first half of 2011. A paper One-way Traffic Monitoring with iatmon was submitted to PAM.

    Outreach

    • In March we organized and hosted a one-day Workshop on Network Telescopes to discuss the network and security research using network telescopes.
    • Dr. Tanja Zseby (Fraunhofer Institute for Open Communication Systems, Berlin, Germany) joined CAIDA in October as a Visiting Scholar for one year to work on darknet data analysis.

    Student Involvement

    Sarah Larsen, UCSD undergraduate student, worked as an assistant System Administrator.

    Funding Sources

    Our Network Telescope received support from:


    Data Sharing for Security / PREDICT

    Goals

    The goal of the Department of Homeland Security project Protected Repository for the Defense of Infrastructure Against Cyber Threats (PREDICT) is to provide vetted researchers with current network operational data in a secure and controlled manner that respects the security, privacy, legal, and economic concerns of Internet users and network operators. CAIDA supports PREDICT goals as Data Provider and Data Host and also plays an advisory role in developing technical, legal, and practical aspects of PREDICT policies and procedures.

    Activities

    1. We received six user requests via the PREDICT portal during 2011, and all six requesters received access to our data.
    2. We completed the CAIDA Anonymized 2011 Internet Traces Dataset that contains traffic traces from our two monitors deployed on high-speed backbone links.
    3. We continued drafting a proposed framework document in the spirit of the Belmont Report that would address ethical principles and guidelines for the protection of human subjects in Information and Communications Technologies research.
    4. We attended the Workshop on Research Data Lifecycle Management and participated in discussions of best practices and funding models for selecting, storing, describing, preserving, and sharing the digital research data.
    5. On 28 December 2011, the Department of Homeland Security (DHS) posted Ethical Principles Guiding Information and Communication Technology Research: The Menlo Report and its Companion Report and announced the reports in the Federal Register. DHS also posted the Interaction of the Menlo Report and Revisions to the Common Rule-Comments in Response to the Advanced Notice of Proposed Rulemaking (ANPRM).

    Publications

    Outreach

    • We participated in two PREDICT PI meetings and contributed to developing PREDICT policies, data sharing, and marketing efforts. The PI kc claffy made the following presentations:
    • In March 2011, Erin Kenneally moderated a panel at the Workshop on Ethics in Computer Security Research (WECSR 2011).
    • We co-organized the 4th CAIDA-WIDE-CASFI Joint Measurement Workshop (Tokyo, Japan). The Workshop covered miscellaneous research and technical topics of mutual interest for CAIDA (USA), WIDE (Japan), and CASFI (South Korea) researchers.

    Funding Sources

    Support for this work comes from DHS contract (DHS D07PC75579) "Supporting Research and Development of Security Technologies through Network and Security Data Collection".


    DatCat: Internet Measurement Data Catalog

    Goals

    Originally funded by the NSF award (OCI-0137121) "Correlating Heterogeneous Measurement Data to Achieve System-Level Analysis of Internet Traffic Trends", CAIDA built the Internet Measurement Data Catalog (IMDC) to facilitate searching for and sharing of data and metadata among researchers. Since its launch in June 2006, the catalog has received contributions of metadata indexing nearly 19TB of data. Lack of funding and increased Oracle database licensing costs required that we disable the IMDC temporarily while we apply lessons learned to the transition from this research prototype to the proposed increased operational capabilities.

    Based on the lessons we learned during the development and operation of IMDC, we began to upgrade and modify the underlying DatCat service with three tasks: streamline the user experience by simplifying the metadata entry process; migrate from a proprietary database backend (Oracle) to a completely open source solution; and expand the community of the catalog users to a broader range of cybersecurity and other researchers. We completed the third task this year and plan to complete the first two tasks in 2012.

    Activities

    1. We designed and developed a public forums interface integrated with the IMDC to hold discussion of data sharing issues and to answer frequently asked questions regarding the IMDC and the information it contains.

    Student Involvement

    Jesse Weinstein, UCSD undergraduate student, worked as an assistant Programmer Analyst.

    Funding Sources

    In 2011, our DatCat research received support from:


    Sustainable data-handling and analysis methodologies for the IRNC networks

    Goals

    The NSF International Research Network Connections (IRNC) program has funded five projects to provide network connections linking U.S. research networks with peer networks in other parts of the world. The goal of our IRNC Special Project is to support the IRNC community measurement efforts by fostering and leading discussion of how to best make IRNC data and statistics available, and by adapting CAIDA measurement technologies for IRNC community needs.

    Activities

    1. We added Internet Protocol Version 6 (IPv6) capabilities to the CoralReef suite of network data collection and analysis tools for processing network traces and flows. We also added support for prefix-preserving IPv6 address anonymization, an option to apply the IPv4 anonymization policy to IPv4 addresses embedded within IPv6 addresses (IPv4-mapped, SIIT, Teredo, 6to4, 6over4, ISATAP), an option to anonymize IP addresses in nested headers (e.g. IPIP, or the original IP header in an ICMP error message), and an option to leave multicast addresses intact. Our next step will be to extend the CoralReef Report Generator software to better visualize IPv6 traffic separately from IPv4 traffic.
    2. CAIDA held several conference calls with IRNC ProNet PI Julio Ibarra and his staff to discuss how he might instrument a hybrid network router that carries both OpenFlow and IP traffic. We discussed use of CAIDA's CoralReef suite of data collection, analysis, and reporting tools for reporting and visualization of the IP portion of the traffic.
    3. We made progress on extending our Archipelago measurement infrastructure to monitor IRNC sites.
      1. With an introduction by IRNC ProNet PI Julio Ibarra, we obtained contacts at the Academic Network for State of Sao Paulo (ANSP) and signed an Ark Memorandum of Cooperation (MoC) with them.
      2. With an introduction by IRNC ProNet PI David Lassner, we worked with Australia's Academic and Research Network (AARNet). AARNet accepted our MoC and donated hardware for a new Ark server in Perth, Australia.
      3. ProNet PI Steve Huter provided contacts with two network engineers in Gambia, where we will deploy an Ark monitor.
      4. IRNC Network Engineer John Hicks provided contacts with the University of Peradeniya in Sri Lanka where we are pursuing the deployment of another Ark monitor.
    4. We developed an IRNC Wiki page intended to serve as a collection point for IRNC-related activities.
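
    To illustrate what "IPv4 addresses embedded within IPv6 addresses" means in activity 1 above, the following minimal sketch extracts the embedded IPv4 address for some of the listed encodings, so that an IPv4 anonymization policy could then be applied to it. This is not CoralReef code: the function name embedded_ipv4 is hypothetical, the sketch relies only on Python's standard ipaddress module, the ISATAP check is a simplified test of the interface identifier, and SIIT and 6over4 are not handled.

```python
import ipaddress

# Hypothetical helper (not part of CoralReef): find an IPv4 address
# embedded in an IPv6 address for a few well-known encodings.
def embedded_ipv4(text):
    """Return (kind, IPv4Address) if an IPv4 address is embedded, else None."""
    addr = ipaddress.IPv6Address(text)
    if addr.ipv4_mapped is not None:          # ::ffff:a.b.c.d
        return ("ipv4-mapped", addr.ipv4_mapped)
    if addr.sixtofour is not None:            # 2002:AABB:CCDD::/48
        return ("6to4", addr.sixtofour)
    if addr.teredo is not None:               # 2001:0000::/32
        _server, client = addr.teredo
        return ("teredo-client", client)
    value = int(addr)
    # Simplified ISATAP test: interface identifier is 0000:5efe:a.b.c.d
    # or 0200:5efe:a.b.c.d (assumption: no further validation is needed).
    iid_prefix = (value >> 32) & 0xFFFFFFFF
    if iid_prefix in (0x00005EFE, 0x02005EFE):
        return ("isatap", ipaddress.IPv4Address(value & 0xFFFFFFFF))
    return None
```

    For example, embedded_ipv4("2002:c000:204::1") reports a 6to4 address carrying 192.0.2.4, while an ordinary address such as 2001:db8::1 yields None and would be handled by the IPv6 anonymization path alone.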

    Outreach

    Funding Sources

    This project is funded by NSF grant (OCI-0963073) "IRNC-SP: Sustainable data-handling and analysis methodologies for the IRNC networks".


    Tools

    CAIDA's mission includes providing access to tools for Internet data collection, analysis and visualization to facilitate network measurement and management. However, CAIDA does not receive specific funding for support and maintenance of the tools we develop. Please check our home page for a complete listing and taxonomy of CAIDA tools.

    2011 Tool Development

    MIDAR

    MIDAR (Monotonic ID-Based Alias Resolution) is a tool developed by CAIDA that builds on recent work in alias resolution using IP-ID time stamps, scaling related techniques to large Internet topologies (millions of nodes) with greater precision and sensitivity. MIDAR provides an extremely precise ID comparison test based on monotonicity rather than proximity. It integrates multiple probing methods, multiple vantage points, and a novel sliding-window probe scheduling algorithm to increase scalability to millions of IP addresses. Experiments show that MIDAR's approach keeps the false positive rate low enough to achieve a high positive predictive value at Internet scale.
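
    The idea of a monotonicity-based comparison can be sketched as follows. This is a deliberately simplified illustration, not MIDAR's actual test: the function name shares_counter, the (timestamp, ip_id) sample format, and the max_step threshold are all assumptions made for the example. Two addresses are treated as alias candidates if their IP ID samples, interleaved by time, never appear to step backwards in the 16-bit ID space.

```python
ID_SPACE = 1 << 16  # the IPv4 ID field is 16 bits wide and wraps around


def shares_counter(samples_a, samples_b, max_step=ID_SPACE // 2):
    """Simplified monotonicity check (illustrative only, not MIDAR itself).

    samples_a, samples_b: lists of (timestamp, ip_id) tuples, one list per
    probed address. Returns True if the time-ordered merge of both lists is
    consistent with a single counter that only moves forward.
    """
    merged = sorted(samples_a + samples_b)  # interleave by timestamp
    for (_, prev_id), (_, next_id) in zip(merged, merged[1:]):
        # Forward distance modulo 2**16, so a wrap from 65530 to 5 counts
        # as a small forward step rather than a large backward one.
        step = (next_id - prev_id) % ID_SPACE
        if step >= max_step:
            return False  # the merged sequence appears to go backwards
    return True
```

    Two interfaces that draw from one shared counter pass this check even across a wraparound, while interfaces with independent counters usually fail it after a few interleaved samples. A proximity test, by contrast, would only ask whether the IDs are numerically close.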

    kapar

    The "kapar" tool is inspired by the promising foundation presented in Mehmet Gunes' APAR. CAIDA wrote a highly optimized implementation for production use on large-scale Internet topologies, fixed a few bugs, and experimented with our own improvements to the algorithm.

    mper

    mper is a probing engine that clients can use to conduct network measurements using ICMP, UDP, and TCP probes.

    rb-mperio

    rb-mperio is a RubyGem for writing network measurement scripts in Ruby that use the mper probing engine. rb-mperio v0.3.0 was released on September 30, 2011.

    Motu

    Motu is a simple tool for dealiasing pairs of IPv4 addresses. Version 1.0.1 was released on October 5, 2011.

    CAIDA Tools Download Report

    The table below displays all CAIDA developed and currently supported tools distributed via our home page at https://catalog.caida.org/details/software and the number of downloads of each version during 2011.

    Tool Description Downloads
    Autofocus Internet traffic reports and time-series graphs. 290
    Chart::Graph A Perl module that provides a programmatic interface to several popular graphing packages. Note: Chart::Graph is also available on CPAN.org. The numbers here reflect only downloads directly from caida.org, as download statistics from CPAN are not available. 108
    CoralReef Measures and analyzes passive Internet traffic monitor data. 482
    Cuttlefish Produces animated graphs showing diurnal and geographical patterns. 95
    dnsstat DNS traffic measurement utility. 169
    iffinder Discovers IP interfaces belonging to the same router. 278
    libsea Scalable graph file format and graph library. 212
    kapar Graph-based IP alias resolution. 18
    MIDAR Identifies IPv4 addresses belonging to the same router (aliases) using shared monotonic IP ID counters. 33
    Motu Dealiases pairs of IPv4 addresses. 14
    mper Probing engine for conducting network measurements with ICMP, UDP, and TCP probes. 50
    otter Visualizes arbitrary network data. 238
    plot-latlong Plots points on geographic maps. 276
    plotpaths Displays forward traceroute path data. 53
    rb-mperio RubyGem for writing network measurement scripts in Ruby that use the mper probing engine. 47
    RouterToAsAssignment Assigns each router from a router-level graph of the Internet to its Autonomous System (AS). 321
    sk_analysis_dump A tool for analysis of traceroute-like topology data. 58
    topostats Computes various statistics on network topologies. 133
    Walrus Visualizes large graphs in three-dimensional space. 2506

    Data

    Data Collected in 2011

    In 2011, CAIDA captured the following raw data:

    We curated and archived several datasets from these data. During the Day In The Life of the Internet (DITL 2011, April 13-15), we collected one-hour passive traces on high-speed Internet backbone links (distributed as part of the CAIDA Anonymized High-speed Internet Traces 2011), and retained the "live" data collected on the UCSD Network Telescope as well.

    The table below lists the amount of data collected in our ongoing data collection operations.

    Data Type First date Last date Total size [1]
    Macroscopic Topology Measurements, IPv4 (Archipelago) 2011-01-01 2011-12-31 596.9 GiB (1.9 TiB)
    Macroscopic Topology Measurements, IPv6 (Archipelago) 2011-01-01 2011-12-31 1.9 GiB (6.6 GiB)
    Internet backbone Traces 2011-01-20 2011-12-15 3.1 TiB (6.8 TiB) [3]
    "Live" Network Telescope Data 2011-01-01 2011-12-31 29.9 TiB (59.7 TiB) [2][4]
    DNS Names for IPv4 Routed /24 Topology Dataset 2011-01-01 2011-12-31 7.8 GiB (29.5 GiB)
    AS Links for IPv4 Routed /24 Topology Dataset 2011-01-01 2011-12-31 155.8 MiB (636.5 MiB)
    Macroscopic Internet Topology Data Kit (ITDK) 2011-04-01 2011-11-03 361.5 MiB (1.9 GiB)
    DNS root/gTLD RTT Dataset 2011-03-16 2011-12-31 448.7 MiB
    [1] The total size represents actual disk space. If data are stored in compressed form, the uncompressed size is given in brackets.
    [2] The size of this data set varies over time as we store and serve a rotating window of the last 30 days only. The specified numbers are totals captured over the whole year.
    [3] This includes traces on April 13, 2011 (during DITL 2011) and traces on June 8, 2011 (IPv6 Day).
    [4] This includes 279 GB of data collected during DITL 2011 and 95 GB on IPv6 Day.

    Datasets Distributed in 2011

    CAIDA makes some datasets publicly available without restrictions to the user, while access to other datasets is restricted to academic researchers, CAIDA members, and government contractors with data access subject to certain safeguards designed to protect the privacy of monitored communications, to ensure security of network infrastructure, and to comply with the terms of our agreements with data providers.

    • Publicly Available Data

      These datasets require that users agree to an Acceptable Use Policy, but are otherwise freely available.

    Dataset Unique visitors (IPs) Data Downloaded
    AS Rank 23 4.4 MiB
    AS Links (AS Adjacencies) 644 22.5 GiB
    AS Relationships 862 5.7 GiB
    Router Adjacencies 267 626.2 MiB
    AS Taxonomy 156 81.4 MiB *
    Witty Worm Dataset 223 319.7 MiB
    Code-Red Worms Dataset 527 6.3 GiB
    We count the volume of data downloaded per unique user per unique file, so if a user downloads a file multiple times, we only count that file once for that user. This significantly underestimates the total volume of data served through our dataservers.
    * AS Taxonomy dataset is included in a mirror of the GA Tech main AS Taxonomy site, and thus does not represent all access to this data.
    • Restricted Access Data

      These datasets require that users:

      • be academic or government researchers, or join CAIDA;
      • request an account and provide a brief description of their intended use of the data; and
      • agree to an Acceptable Use Policy.
    Dataset Unique visitors (usernames) Data Downloaded *
    Anonymized Internet Backbone Traces 187 22.4 TiB
    Backscatter Datasets 34 245.2 GiB
    Raw Topology Traces (Archipelago) 50 1.9 TiB
    Raw Topology Traces (skitter) 25 82.1 GiB
    DNS Names for IPv4 Routed /24 Topology Dataset 31 53.9 GiB
    Macroscopic Internet Topology Data Kit 70 43.5 GiB
    Witty Worm Dataset 15 190.4 GiB
    DNS Root/gTLD server RTT Dataset 7 12.9 MiB
    DDoS Attack Dataset 60 230.0 GiB
    Telescope Datasets 17 268.1 GiB
    * We count the volume of data downloaded per unique user per unique file, so even if a user downloads a file 100 times, we only count that file once for that user. This methodology results in significantly under-counting the total volume of data served through our dataservers.
    • Restricted Access Data Requests

      The following table shows some statistics about data requests for CAIDA datasets: the number of requests received, the number of users whose request was granted, and the number of users that actually downloaded data.

      We received about 33 more requests in 2011 than in 2010, and approved 46 more requests for access to restricted datasets. About 77.1% of the users who were granted access (364 of 472) actually accessed our webservers to download data.

    Dataset Number of requests received Number of users granted access Number of users that accessed data
    Anonymized Backbone and Peering Link Traces 270 208 168
    Active Topology Trace Datasets 153 127 83
    Backscatter-2008 Dataset 51 34 28
    Witty Worm Dataset 15 11 10
    DNS Root/gTLD server RTT Dataset 10 8 6
    DDoS Attack Dataset 91 62 51
    Telescope Datasets 29 22 18
    Totals 619 472 364
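
    The counting rule used in the download tables above (each file counted once per user, however many times that user fetched it) can be sketched as follows. This is an illustration only: the log format of (user, filename, size_bytes) records and the function name unique_download_volume are assumptions, not a description of CAIDA's actual accounting scripts.

```python
def unique_download_volume(log):
    """Sum download sizes, counting each (user, file) pair only once.

    log: iterable of (user, filename, size_bytes) records (hypothetical
    format chosen for this example).
    """
    seen = {}
    for user, filename, size_bytes in log:
        # Repeated downloads of the same file by the same user overwrite
        # the same key, so they contribute to the total only once.
        seen[(user, filename)] = size_bytes
    return sum(seen.values())
```

    With this rule, a user who downloads the same 100-byte file twice contributes 100 bytes rather than 200, which is why the reported volumes understate the total traffic actually served.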

    Workshops

    As part of our mission to investigate both practical and theoretical aspects of the Internet, CAIDA staff actively attend, contribute to, and host workshops relevant to research and better understanding of Internet infrastructure, trends, topology, routing, and security. Our web site has a complete listing of past and upcoming CAIDA Workshops.

    CAIDA/UCY Workshop on Network Geometry

    From January 11-13, 2011, the University of Cyprus (UCY) hosted an interdisciplinary "Network Geometry" workshop jointly organized by CAIDA, UCSD and UCY. The agenda included short presentations by participants as well as extensive time for discussions and interactions.

    ISMA - 3rd Workshop on Active Internet Measurements (AIMS-3)

    On February 9-11, 2011, CAIDA hosted the 3rd workshop on Active Internet Measurements supporting science and policy. This workshop continues the series of Internet Statistics and Metrics Analysis (ISMA) workshops that are held to discuss the current and future state of Internet measurement and analysis.

    Workshop on Network Telescopes

    On March 22, 2011, CAIDA hosted a half-day workshop on network and security research using Network Telescopes. The agenda included short presentations by participants, discussions, and interactions. Some participants attended remotely via web videoconference.

    Workshop on BGP and Traceroute data

    As part of our efforts on the Internet Laboratory for Empirical Network Science (iLENS) Project, CAIDA hosted a workshop on August 22nd, 2011 to discuss scalable measurement and analysis of BGP and traceroute data.

    Workshop on Internet Economics

    On December 1-2, 2011, CAIDA and Georgia Tech hosted their second Workshop on Internet Economics. This two-day event brought together researchers, commercial Internet facilities and service providers, technologists, theorists, policy makers, RIR stakeholders, and pundits of Internet economics, from both academia and industry, to try to frame a concrete and useful research agenda for the emerging but stunted field of Internet infrastructure economics. The workshop included presentations by participants and in-depth discussions of AS peering policies and practices, of how to improve the realism and utility of Internet interdomain connectivity models for trend analysis, and of predictions of how the Internet ecosystem will look 5-15 years from now.

    CAIDA-WIDE-CASFI Workshop

    On December 5th, 2011, the 4th CAIDA-WIDE-CASFI Joint Measurement Workshop was held in Tokyo, Japan. This workshop continues a tradition of workshops supporting a three-way collaboration between researchers from CAIDA (USA), WIDE (Japan) and CASFI (South Korea). The Workshop covered miscellaneous research and technical topics of mutual interest for CAIDA, WIDE and CASFI participants and brought various groups together to share their latest research.

    UCSD Complex Network Seminar - Different Angles on Network Complexity, Engineering, and Science (DANCES)

    In October 2010, CAIDA began hosting the UCSD Complex Network Seminar: Different Angles on Network Complexity, Engineering, and Science (DANCES). The goal of this seminar series was to bring together junior and senior researchers studying networks, including UCSD graduate students and post-docs. The seminar fostered communication and collaboration among researchers from diverse disciplines that study networks from different perspectives (physics, biology, sociology, computer science, ECE, math, bioengineering, cognitive science, etc.), and provided young researchers a forum to practice their presentation and communication skills. The seminars continued in 2011 and drew attendees from a diversity of disciplines.

    Publications

    The following table contains the papers published by CAIDA for the calendar year of 2011. Please refer to Papers by CAIDA on our web site for a comprehensive listing of publications.

    Year Month Author(s) Title Publication
    2011 Nov
    1. Dainotti, Alberto
    2. Squarcella, Claudio
    3. Aben, Emile
    4. Claffy, Kimberly
    5. Chiesa, Marco
    6. Russo, Michele
    7. Pescapè, Antonio
    Analysis of Country-wide Internet Outages Caused by Censorship ACM Internet Measurement Conference (IMC)
    2011 Sep
    1. claffy, kc
    Underneath the hood: ownership vs. stewardship of the internet ACM SIGCOMM Computer Communication Review (CCR)
    2011 Sep
    1. claffy, kc
    "Network Neutrality": the meme, its cost, its future ACM SIGCOMM Computer Communication Review (CCR)
    2011 Sep
    1. Dhamdhere, Amogh
    2. Dovrolis, Constantine
    Twelve Years in the Evolution of the Internet Ecosystem IEEE/ACM Transactions on Networking
    2011 Aug
    1. Kitsak, Maksim
    2. Krioukov, Dmitri
    Hidden variables in bipartite networks Physical Review E
    2011 Jul
    1. claffy, kc
    Tracking IPv6 Evolution: Data We Have and Data We Need ACM SIGCOMM Computer Communication Review (CCR)
    2011 Jul
    1. claffy, kc
    The 3rd Workshop on Active Internet Measurements (AIMS-3) Report ACM SIGCOMM Computer Communication Review (CCR)
    2011 Jul
    1. Fomenkov, Marina
    2. claffy, kc
    Internet measurement data management challenges Workshop on Research Data Lifecycle Management
    2011 Jun
    1. Mérindol, Pascal
    2. Donnet, Benoit
    3. Pansiot, Jean-Jacques
    4. Luckie, Matthew
    5. Hyun, Young
    MERLIN: MEasure the Router Level of the INternet Conference on Next Generation Internet
    2011 May
    1. Huffaker, Bradley
    2. Fomenkov, Marina
    3. claffy, kc
    Geocompare: a comparison of public and commercial geolocation databases - Technical Report Cooperative Association for Internet Data Analysis (CAIDA)
    2011 May
    1. Keys, Ken
    2. Hyun, Young
    3. Luckie, Matthew
    4. claffy, kc
    Internet-Scale IPv4 Alias Resolution with MIDAR: System Architecture - Technical Report Cooperative Association for Internet Data Analysis (CAIDA)
    2011 Mar
    1. Kenneally, Erin
    2. Stavrou, Angelos
    3. McHugh, John
    4. Christin, Nicolas
    Moving Forward, Building an Ethics Community (Panel Statements) Workshop on Ethics in Computer Security Research (WECSR)
    2011 Jan
    1. Luckie, Matthew
    2. Dhamdhere, Amogh
    3. claffy, kc
    4. Murrell, David
    Measured Impact of Crooked Traceroute ACM SIGCOMM Computer Communication Review (CCR)
    2011 Jan
    1. Serrano, Mirian Ángeles
    2. Krioukov, Dmitri
    3. Boguñá, Marián
    Percolation in Self-Similar Networks Physical Review Letters

    Presentations

    The following table contains the presentations and invited talks given by CAIDA staff during the 2011 calendar year. Please refer to Presentations by CAIDA on our web site for a comprehensive listing.

    Year Month Presenter(s) Title Venue
    2011 Dec
    1. Dhamdhere, Amogh
    A cost model for network traffic (with an application to paid-peering) Workshop on Internet Economics (WIE)
    2011 Dec
    1. Huffaker, Bradley
    CAIDA Update 2011 WIDE-CASFI
    2011 Dec
    1. claffy, kc
    Analysis of Country-wide Internet Outages Caused by Censorship CAIDA-WIDE-CASFI Joint Measurement Workshop
    2011 Nov
    1. Krioukov, Dmitri
    Popularity versus Similarity in Growing Networks University of Maryland
    2011 Nov
    1. claffy, kc
    Analysis of Country-wide Internet Outages Caused by Censorship Different Angles on Network Complexity, Engineering, and Science (DANCES)
    2011 Nov
    1. Dainotti, Alberto
    Analysis of Country-wide Internet Outages Caused by Censorship ACM Internet Measurement Conference (IMC)
    2011 Oct
    1. claffy, kc
    IRNC-SP: Sustainable data-handling and analysis methodologies for the IRNC networks: Updates NSF International Research Network Connections (IRNC) PI Meeting
    2011 Oct
    1. Krioukov, Dmitri
    Geometry of Large Networks (Computer Science Perspective) American Institute of Mathematics (AIM)
    2011 Oct
    1. Krioukov, Dmitri
    Popularity versus Similarity in Growing Networks Institute for Mathematics and its Applications (IMA)
    2011 Oct
    1. claffy, kc
    IRNC-SP: Sustainable data-handling and analysis methodologies for the IRNC networks NSF International Research Network Connections (IRNC) Workshop
    2011 Aug
    1. claffy, kc
    Tracking IPv6 evolution: Data We Have and Data We Need Chinese American Networking Symposium (CANS)
    2011 Jul
    1. claffy, kc
    DHS PREDICT project: CAIDA update DHS PREDICT PI Meeting
    2011 Jun
    1. Krioukov, Dmitri
    Percolation in self-similar networks International School and Conference on Network Science (NetSci)
    2011 May
    1. Huffaker, Bradley
    Geolocation Comparison: CAIDA's Geolocation Database Comparison Network Mapping and Measurement Conference (NMMC)
    2011 May
    1. Krioukov, Dmitri
    Optimal routing in complex networks Named Data Network (NDN) PI Meeting
    2011 May
    1. Kitsak, Maksim
    Do Bipartite Networks Have Metric Structure? Different Angles on Network Complexity, Engineering, and Science (DANCES)
    2011 Apr
    1. Krioukov, Dmitri
    Hyperbolic geometry of complex networks Bell Labs-NIST Workshop on Large-Scale Geometry of Networks
    2011 Apr
    1. Huffaker, Bradley
    AS Core: Visualizing the Internet Different Angles on Network Complexity, Engineering, and Science (DANCES)
    2011 Mar
    1. claffy, kc
    DHS PREDICT project: CAIDA update DHS PREDICT PI Meeting
    2011 Mar
    1. claffy, kc
    AS Core: Visualizing the Internet UCSD CSE Perspectives in Computer Science
    2011 Mar
    1. Kitsak, Maksim
    Identification of Influential Spreaders in Complex Networks Decision Making: Bridging Psychophysics and Neurophysiology
    2011 Mar
    1. Krioukov, Dmitri
    Percolation in self-similar networks Decision Making: Bridging Psychophysics and Neurophysiology
    2011 Mar
    1. Kenneally, Erin
    The Need for Community Standards for Ethical Behavior in E-Crime Research AntiPhishing Working Group E-Crime (APWG eCR) Sync-Up
    2011 Feb
    1. claffy, kc
    IPv6: hither, thither, and yon ISMA Workshop on Active Internet Measurements (AIMS)
    2011 Feb
    1. Dhamdhere, Amogh
    Measured Impact of Crooked Traceroute ISMA Workshop on Active Internet Measurements (AIMS)
    2011 Feb
    1. Hyun, Young
    Internet Topology Data Kit ISMA Workshop on Active Internet Measurements (AIMS)
    2011 Feb
    1. Hyun, Young
    Archipelago Measurement Infrastructure Updates ISMA Workshop on Active Internet Measurements (AIMS)
    2011 Feb
    1. Dhamdhere, Amogh
    An Agent-based Model of Interdomain Interconnection in the Internet Different Angles on Network Complexity, Engineering, and Science (DANCES)

    Web Site Usage

    In 2011, CAIDA's web site continued to attract considerable attention from a broad, international audience. The wave of heavy traffic that occurred mid-year is attributed to increased downloads of the recently updated AS Core IPv4 and IPv6 graph, which was publicized at the various conferences and workshops that CAIDA staff attended.

    The graph and table below present the monthly history of traffic to www.caida.org for 2011. To represent website traffic more accurately, these statistics exclude non-viewed traffic, such as traffic from spiders, crawlers, or other robots.



    [Figure: Web Usage Bar Graph]
    Month     Unique visitors  Number of visits  Pages      Hits       Bandwidth
    Jan 2011  33,133           56,462            171,962    893,932    45.15 GB
    Feb 2011  33,574           54,227            185,918    844,551    49.11 GB
    Mar 2011  31,451           53,835            150,413    799,771    40.46 GB
    Apr 2011  32,061           52,687            154,328    791,861    38.57 GB
    May 2011  32,351           54,849            153,959    729,150    39.48 GB
    Jun 2011  28,704           50,236            154,292    698,430    81.40 GB
    Jul 2011  26,937           48,059            164,147    648,543    37.84 GB
    Aug 2011  25,420           45,281            139,964    587,850    46.25 GB
    Sep 2011  25,736           45,590            134,314    587,518    32.38 GB
    Oct 2011  28,251           52,674            162,092    676,460    32.02 GB
    Nov 2011  28,763           52,291            171,833    808,939    33.79 GB
    Dec 2011  26,131           48,488            145,888    694,610    28.29 GB
    Total     352,512          614,679           1,889,110  8,761,615  504.75 GB
    Averages: 1.74 visits/visitor, 3.07 pages/visit, 14.25 hits/visit, 861.05 KB/visit
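
    The per-visit averages in the Total row follow directly from the column totals. The short sketch below reproduces them; the constant and function names are hypothetical, and the bandwidth conversion assumes binary units (1 GB = 2**20 KB), an assumption chosen because it reproduces the published 861.05 figure.

```python
# Column totals copied from the Total row of the web usage table.
UNIQUE_VISITORS = 352_512
VISITS = 614_679
PAGES = 1_889_110
HITS = 8_761_615
BANDWIDTH_GB = 504.75

# Assumption: binary units, i.e. 1 GB = 2**20 KB.
KB_PER_GB = 2 ** 20


def per_visit_averages():
    """Recompute the averages reported alongside the yearly totals."""
    return {
        "visits_per_visitor": round(VISITS / UNIQUE_VISITORS, 2),
        "pages_per_visit": round(PAGES / VISITS, 2),
        "hits_per_visit": round(HITS / VISITS, 2),
        "kb_per_visit": round(BANDWIDTH_GB * KB_PER_GB / VISITS, 2),
    }
```

    Calling per_visit_averages() yields 1.74, 3.07, 14.25, and 861.05, matching the reported values.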

    Organizational Chart

    CAIDA would like to acknowledge the many people who put forth great effort towards making CAIDA a success in 2011. The image below shows the functional organization of CAIDA. Please check the home page for more complete information about CAIDA staff.

    [Image of CAIDA Functional Organization Chart]

    CAIDA Functional Organization Chart


    Funding Sources

    CAIDA thanks our 2011 sponsors, members, and collaborators.

    The charts below depict funds received by CAIDA during the 2011 calendar year.

    Funding Source Allocations Percentage of Total
    NSF 1,386,375 46%
    DOI 1,412,727 47%
    GIFT 200,470 7%
    Total 2,999,572 100%
    [Figure: Allocations by funding source]

    Figure 1. Allocations by funding source received during 2011.


    Operating Expenses

    The charts below depict CAIDA's Annual Expense Report for the 2011 calendar year.

    LABOR Salaries and benefits paid to staff and students
    IDC Indirect Costs paid to the University of California, San Diego including grant overhead (54.5%).
    SUPPLIES & EXPENSES Computer supplies and equipment (including computer hardware and software costing less than $5000); telephone, Internet, and other IT services, and general office supplies.
    TRAVEL Trips to conferences, PI meetings, operational meetings, and sites of remote monitor deployment.
    EQUIPMENT Computer hardware or other equipment costing more than $5000.
    TRANSFERS Exchange of funds between groups for recharge for IT desktop support and Oracle database services.
    Expense Category Expenses Percentage of Total
    Labor 1,511,752 59%
    IDC 878,669 34%
    Supplies and Expenses 100,964 4%
    Travel 54,515 2%
    Equipment 18,225 1%
    Transfers 12,030 0%
    Total 2,576,155 100%
    [Figure: Operating Expenses]

    Figure 2. 2011 Operating Expenses



    Program Area Expenses Percentage of Total
    Infrastructure 1,222,166 47%
    Topology 777,072 30%
    Routing 414,187 16%
    Policy 92,225 4%
    Outreach 70,505 3%
    Total 2,576,155 100%
    [Figure: Expenses by Program Area]

    Figure 3. 2011 Expenses by Program Area
