Mission Statement: CAIDA investigates practical and theoretical aspects of the Internet, focusing on activities that:
- provide insight into the macroscopic function of Internet infrastructure, behavior, usage, and evolution,
- foster a collaborative environment in which data can be acquired, analyzed, and (as appropriate) shared,
- improve the integrity of the field of Internet science,
- inform science, technology, and communications public policies.
- Executive Summary
- Research Projects
- Infrastructure Projects
- Web Site Usage
- Organizational Chart
- Funding Sources
- Operating Expenses
This annual report covers CAIDA's activities in 2011, summarizing highlights from our research, infrastructure, data-sharing and outreach activities. Our current research projects span topology, routing, traffic, economics, future Internet architectures, and policy. Our infrastructure activities continue to support measurement-based studies of the Internet's core infrastructure, with focus on the health and integrity of the global Internet's topology, routing, addressing, and naming systems. We are also dedicating resources to support the infrastructure measurement and data sharing interests and needs of two U.S. federal agency programs: the National Science Foundation's International Research Network Connections (IRNC) program, and the Department of Homeland Security's Protected Repository for the Defense of Infrastructure Against Cyber Threats (PREDICT) data-sharing project.
We continue to expand our Internet active measurement platform Ark in scale and functionality, and use this platform to collect and share the largest Internet topology data sets (IPv4 and IPv6) available to academic researchers, and share many aggregated annotated derivative data sets publicly. Our topology measurement platform supports IPv6 -- by the end of 2011, 28 of our 57 Ark hosting sites provided IPv6 connectivity and topology measurements. We have dramatically improved existing techniques for IP address alias resolution for large Internet graphs; in late 2011 we submitted a paper describing and evaluating the performance of our algorithms, with publication anticipated in 2012. (A preliminary technical report is available on the web site; see the Topology section of this report.) Using these new techniques, we collected, analyzed, processed and released two Internet Topology Data Kit (ITDK) Datasets, reflecting measurements taken in April and October 2011. Each 2011 ITDK includes two related router-level topologies; router-to-AS assignments; geographic location of each router; and DNS lookups of all observed IP addresses. We are still working on improving and validating our AS relationship inference algorithm so that we can add additional annotations to future ITDKs.
On the theoretical side of topology research, we continued investigation of the geometric model we developed last year to study the structure and function of complex networks. This model assumes that hyperbolic geometry underlies many complex networks, which if true provides a natural explanation for the heterogeneous degree distributions and strong clustering that characterize so many complex networks, i.e., they are simple reflections of the negative curvature and metric property of the underlying hyperbolic geometry. We also showed that not only popularity but also similarity acts as a strong force in shaping complex network structure and dynamics. We developed a framework where new connections, instead of preferring popular nodes, optimize certain trade-offs between popularity and similarity. The optimization framework more accurately describes large-scale Internet evolution (new links) than previous models, e.g., preferential attachment. The mathematically inclined will appreciate our related recent investigation of random bipartite networks using a hidden variable formalism that facilitates study of the structure and function of complex networks, as well as inference of individual characteristics, attributes, and annotations of nodes in real bipartite networks. Particular applications of interest are network geometry and navigability.
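The popularity-versus-similarity trade-off described above can be illustrated with a small growth simulation. This is a minimal sketch, not the model as published: it assumes that a node's birth order stands in for popularity (older nodes are more popular), that the angular distance between random angles stands in for similarity, and that each new node links to the `m` existing nodes minimizing the product of the two. The function names and parameters are hypothetical and do not come from this report.

```python
import math
import random


def angular_distance(a, b):
    """Shortest angular separation between two angles on the circle, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def grow_network(n, m, seed=0):
    """Grow a network of n nodes; each newcomer links to at most m older nodes.

    Node t is born at time t (1-based), so a smaller index means an older,
    more popular node. Each node draws a random angle; a smaller angular
    distance means greater similarity. Newcomer t links to the existing
    nodes s with the smallest value of s * angular_distance(s, t).
    (Assumed form of the trade-off, for illustration only.)
    """
    rng = random.Random(seed)
    angles = {}
    edges = []
    for t in range(1, n + 1):
        angles[t] = rng.uniform(0.0, 2 * math.pi)
        older = list(range(1, t))
        older.sort(key=lambda s: s * angular_distance(angles[s], angles[t]))
        for s in older[:m]:
            edges.append((s, t))
    return angles, edges
```

Counting how often each node appears in `edges` gives its degree. Old nodes tend to collect many links, but a newcomer can still prefer a younger node that is angularly close, which is the behavior that plain preferential attachment does not capture.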
We gained momentum on our economics and policy research agenda, focused primarily on explanatory and predictive modeling of the economics of transit and peering interconnections in the Internet. Two historical developments contribute to a persistent disconnect between economic models and actual operational practices on the Internet. First, the Internet became too complex - in traffic dynamics, topology, and economics - for currently available analytical tools to allow realistic modeling. Second, the data needed to parameterize more realistic models is simply not available. The problem is fundamental, and familiar: simple models are not valid, and complex models cannot be validated. We are making progress in both dimensions: creating more powerful, empirically parameterized computational tools, and enabling broader validation than previously possible. We also held the second interdisciplinary Workshop on Internet Economics (WIE) in December, connecting academic researchers, commercial Internet facilities and service providers, theorists, policy makers, and pundits of Internet economics to frame an Internet economics research agenda, and more specifically to improve the realism, utility, and predictive power of economic models of Internet topology and dynamics.
In the first months of 2011, Internet communications were disrupted in several North African countries in response to civilian protests and threats of civil war. We analyzed episodes of these disruptions in two countries: Egypt and Libya. Using both control plane and data plane data sets in combination allowed us to narrow down which forms of Internet access disruption were implemented in a given region over time. Among other insights, we detected what we believe were Libya's attempts to test firewall-based blocking before they executed more aggressive BGP-based disconnection. Our methodology could be used, and automated, to detect outages or similar macroscopically disruptive events in other geographic or topological regions.
We are applying our theoretical, empirical, and practical understandings of the Internet's evolution to engage in the NSF's exciting Future Internet Architecture (FIA) Research program. We are participating in the Named Data Networking project, a 12-university collaboration funded by FIA to explore a generalization of the Internet architecture that allows naming not just communication endpoints, i.e., the source and destination IP addresses, but also the data (content) itself. This approach shifts the focus from where -- addresses and hosts in today's Internet -- to what -- the content that users and applications care about. By naming data instead of locations, the new architecture transforms data into a first-class entity while addressing the known technical challenges of today's Internet: routing scalability, network security, content protection and privacy. In 2011 we investigated combinations of name-space structure and network topology that optimize the efficiency of NDN algorithms and participated in NDN testbed development and evaluation.
Finally, as always, we engaged in a variety of tool development, data-sharing, and outreach activities, including web sites, peer-reviewed papers, technical reports, presentations, blogging, animations, and (six) workshops. Details of our activities are below. CAIDA's program plan for 2010-2013 is available at http://www.caida.org/home/about/progplan/progplan2010/. Please do not hesitate to send comments or questions to info at caida dot org.
CAIDA's long-term topology research agenda includes four strategic areas: 1) macroscopic topology measurement; 2) analysis of the observable AS-level and router-level hierarchy; 3) topology modeling; and 4) analysis of IPv4 and IPv6 address space allocation.
Macroscopic Topology Measurements:
- We continued large-scale macroscopic topology measurements using Archipelago (Ark), our state-of-the-art global measurement platform. We completed the fourth full calendar year of the IPv4 Routed /24 Topology Dataset and the third full calendar year of the IPv6 Topology Dataset collection.
- We continued to collect automated DNS reverse lookups for IP addresses discovered by the Ark probes and annotated the IPv4 topology data with corresponding DNS names.
Analysis of Observable Topology:
- We ran alias resolution tools on the Ark platform and combined the outcomes to map IP addresses to routers as accurately and completely as feasible. Using publicly available data from many networks and ground truth data provided to us by a large ISP, we tested the efficiency and veracity of various combinations of alias resolution methods. We released a technical report, Internet-Scale IPv4 Alias Resolution with MIDAR: System Architecture, detailing the MIDAR system architecture, and submitted a version of this paper to IEEE/ACM Transactions on Networking for publication in 2012.
- Resulting from our improved measurement and analysis techniques, we collected, analyzed, processed and released two Internet Topology Data Kit (ITDK) Datasets, using traceroute data collected as part of the IPv4 Routed /24 Topology Dataset and alias resolution measurements conducted in April and October 2011. Each 2011 ITDK includes: two related router-level topologies; router-to-AS assignments; geographic location of each router; and DNS lookups of all observed IP addresses.
- We created new IPv4 and IPv6 AS Core Graph visualizations using August 2010 Ark data.
- In January 2011 we temporarily halted the bi-weekly production of our dataset of AS-level topologies annotated with business relationships between ASes, and began revising and improving our published algorithms for inferring these relationships. We plan to resume production of this popular dataset after completing the changes and verifying the new algorithms.
- Data collected using traceroute-based algorithms underpins research into the Internet's router-level topology, though it is possible to infer false links from this data. In Measured Impact of Crooked Traceroute, we examined the inaccuracies induced from such false inferences, both on macroscopic and ISP topology mapping. We observed that most per-flow load-balancing did not induce false links when macroscopic topology is inferred using classic traceroute. The effect of false links on ISP topology mapping is possibly much worse, because the degrees of a tier-1 ISP's routers derived from classic traceroute were inflated by a median factor of 2.9 as compared to those inferred with Paris traceroute.
- We continued our work measuring the evolution and dynamics of peering relationships. In Twelve Years in the Evolution of the Internet Ecosystem, we analyzed data and studied trends in the evolution of the Internet AS topology in the last 12 years. This work focused mainly on transit (customer-provider) links in the AS topology, as these are visible in data available from public repositories of BGP data.
- We published the technical report, "Geocompare: a comparison of public and commercial geolocation databases" in May 2011. The report attempts a systematic quantitative comparison of currently available geolocation service providers. The report describes our process for selecting distance thresholds for comparison, and our centroid-based algorithm for comparing database lat-long results against a majority of responses from the set of databases we evaluated. We presented the work at Network Mapping and Measurement Conference (NMMC) in May 2011.
- We proved that graphs in a general class of self-similar networks have zero percolation threshold. The considered self-similar networks included random scale-free graphs with given expected node degrees and zero clustering, scale-free graphs with finite clustering and metric structure, growing scale-free networks, and many real networks. The proof and the derivation of the giant component size in Percolation in Self-Similar Networks did not require the assumption that networks were treelike. Our results rely only on the observation that self-similar networks possess a hierarchy of nested subgraphs whose average degree grows with their depth in the hierarchy. We conjecture that this property is pivotal for percolation in networks.
Analysis of IPv4 and IPv6 address space allocation
- In "Tracking IPv6 Evolution: Data We Have and Data We Need", published in ACM SIGCOMM Computer Communication Review (CCR), vol. 41, no. 3, pp. 43-48, Jul 2011, we evaluate the types of measurement, data, and analysis needed to inform technical, business, and policy decisions. We survey available data that have allowed limited tracking of IPv6 deployment thus far, describe additional types of data that would support better tracking, and offer a perspective on the challenging future of IPv6 evolution.
- kc claffy wrote a blog commentary, "Data on current status of IPv6 deployment", in April 2011.
- kc claffy wrote a blog commentary, "CAIDA's IPv6 measurement and analysis activities", in April 2011.
- kc claffy wrote a blog commentary, "Exhausted IPv4 address architectures", in May 2011.
- CAIDA cooperated with RIPE NCC's measurements of World IPv6 Day on June 8, 2011. As a follow-up, kc claffy wrote a blog commentary, "CAIDA participation in IPv6 day", in June 2011.
- Measured Impact of Crooked Traceroute, ACM SIGCOMM CCR, v. 41, no. 1, pp. 14-21, January 2011
- Percolation in Self-Similar Networks, Phys. Rev. Lett., v. 106, no. 4, pp. 048701, January 2011
- Internet-Scale IPv4 Alias Resolution with MIDAR: System Architecture, CAIDA Technical Report, May 2011
- Geocompare: a comparison of public and commercial geolocation databases, CAIDA Technical Report, May 2011.
- Tracking IPv6 Evolution: Data We Have and Data We Need, ACM SIGCOMM Computer Communication Review (CCR), vol. 41, no. 3, pp. 43-48, Jul 2011.
- MERLIN: MEasure the Router Level of the INternet, Proceedings of the Conference on Next Generation Internet (NGI 2011), June 2011
- Twelve Years in the Evolution of the Internet Ecosystem, IEEE/ACM Transactions on Networking, v. 19, no. 5, pp. 1420-1433, September 2011
- In February we organized and hosted the 3rd Active Internet Measurements (AIMS-3) workshop and made the following presentations:
- kc claffy presented IPv6: hither, thither, and yon
- Amogh Dhamdhere presented Measured Impact of Crooked Traceroute
- Young Hyun presented Internet Topology Data Kit and Archipelago Measurement Infrastructure Updates
- In August we organized and hosted a one day workshop on BGP and Traceroute Data. The workshop report was submitted to ACM SIGCOMM CCR.
- kc claffy presented AS Core: Visualizing the Internet at the UCSD Perspectives in Computer Science course in March 2011. Bradley Huffaker also made this presentation at the UCSD Complex Network Seminar (DANCES) in April 2011.
- Dmitri Krioukov presented Percolation in self-similar networks at the Decision Making: Bridging Psychophysics and Neurophysiology conference in March 2011.
Ongoing data releases
We made publicly available the following topology datasets:
- The IPv4 Routed /24 Topology Dataset from Ark measurements
- The IPv6 Topology Dataset from Ark measurements
- The adjacency matrix of the observed Internet AS-level graph computed daily from Ark measurements
- The Routeviews Prefix to AS mappings Dataset (pfx2as) created on a daily basis starting from 2005-05-09.
- Daily files of the DNS reverse name lookups for the IPv4 core traceroute data.
- Internet Topology Data Kits for April (ITDK-2011-04) and October (ITDK-2011-10)
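A prefix-to-AS mapping such as the pfx2as dataset listed above is typically used to annotate an observed IP address with its origin AS through a longest-prefix match. The sketch below is illustrative only: it assumes each input line holds three tab-separated fields (prefix, prefix length, AS), and the helper names `load_pfx2as` and `lookup_as` are hypothetical. The sample rows use address blocks and AS numbers reserved for documentation, not real data.

```python
import ipaddress


def load_pfx2as(lines):
    """Parse 'prefix<TAB>length<TAB>as' lines into a dict keyed by network.

    The three-column layout is an assumption for this sketch. The AS field
    is kept as an uninterpreted string.
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        prefix, length, asn = line.split("\t")
        table[ipaddress.ip_network(f"{prefix}/{length}")] = asn
    return table


def lookup_as(table, address):
    """Return the AS field of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(address)
    # Try the most specific prefix first and shorten it one bit at a time.
    for length in range(addr.max_prefixlen, -1, -1):
        candidate = ipaddress.ip_network(f"{addr}/{length}", strict=False)
        if candidate in table:
            return table[candidate]
    return None


sample = [
    "192.0.2.0\t24\t64500",
    "192.0.0.0\t16\t64501",
]
table = load_pfx2as(sample)
```

With this sample table, `lookup_as(table, "192.0.2.7")` matches the more specific /24, while an address elsewhere in 192.0.0.0/16 falls back to the covering /16. The dictionary scan is simple rather than fast; a radix trie would be the usual choice for annotating full traceroute datasets.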
Student Involvement
Justin Cheng, UCSD undergraduate student, worked as an assistant Graphics Designer.
Funding Sources
Our topology research received support from:
- NSF grant (CNS-0958547) "Internet Laboratory for Empirical Network Science (iLENS)"
- NSF grant (OCI-0963073) "IRNC-SP: Sustainable data-handling and analysis methodologies for the IRNC networks"
- DHS Science and Technology Directorate contract (N66001-08-C-2029) "Cybersecurity: Leveraging the Science and Technology of Internet Mapping for Homeland Security"
- a University Research Program gift from Cisco Systems, Inc.
The primary objective of CAIDA's research in Internet routing remains the development and evaluation of solutions to the impending routing scalability problems. Our relevant activities focused on two related sub-topics: greedy routing based on hidden metric spaces underlying real networks; and the relationship between routing efficiency and the structure of the network topology. While this work is motivated by Internet routing, we spent the past year investigating its implications for other disciplines, including physics, biology, chemistry, and economics.
- We showed that not only popularity but also similarity acts as a strong force in shaping complex network structure and dynamics. In Popularity versus Similarity in Growing Networks, we developed a framework where new connections, instead of preferring popular nodes, optimize certain trade-offs between popularity and similarity. The framework admits a geometric interpretation, in which preferential attachment emerges from local optimization processes. As opposed to preferential attachment, the optimization framework accurately describes large-scale Internet evolution, predicting new links in the Internet with remarkable precision. The developed framework can thus potentially be used to predict new links in evolving networks, and provides a different perspective on preferential attachment as an emergent phenomenon.
- We introduced and studied random bipartite networks with hidden variables. The hidden variable formalism developed in Hidden variables in bipartite networks has been a powerful tool in studying the structure and function of complex networks, and can also be useful in inferring individual characteristics, attributes, and annotations of nodes in real bipartite networks. Particular applications of interest are network geometry and navigability.
- Popularity versus Similarity in Growing Networks, Technical Report, Arxiv, June 2011.
- Hidden variables in bipartite networks, Phys. Rev. E, v. 84, pp 026114, August 2011.
- CAIDA continued hosting the UCSD Complex Network Seminar Different Angles on Network Complexity, Engineering, and Science (DANCES). The seminar brought together researchers from diverse disciplines that study networks from different perspectives (physics, biology, sociology, computer science, ECE, math, bioengineering, cognitive science, etc.)
- CAIDA and the University of Cyprus (UCY) co-organized a 3-day interdisciplinary CAIDA/UCY Workshop on Network Geometry hosted at UCY in February 2011.
- Maksim Kitsak presented Identification of Influential Spreaders in Complex Networks at the Decision Making: Bridging Psychophysics and Neurophysiology conference in March 2011.
- Dmitri Krioukov presented Hyperbolic geometry of complex networks at the Bell Labs-NIST Workshop on Large-Scale Geometry of Networks in April 2011.
- Dmitri Krioukov presented Optimal routing in complex networks at the NSF Named Data Networking FIA PI meeting in May 2011.
- Maksim Kitsak presented Do Bipartite Networks Have Metric Structure? at the UCSD Complex Network Seminar (DANCES) in May 2011.
- Dmitri Krioukov presented Geometry of Large Networks (Computer Science Perspective) at the American Institute of Mathematics in Palo Alto, California, in October 2011.
- Dmitri Krioukov presented Popularity versus Similarity in Growing Networks at the Institute for Mathematics and its Applications in October 2011.
- Dmitri Krioukov presented Popularity versus Similarity in Growing Networks at an invited visit to University of Maryland in November 2011.
Student Involvement
CAIDA hosted Chiara Orsini, a graduate student from University of Pisa, Italy.
Funding Sources
Our routing research received support from:
- NSF grant (CNS-0964236) "NetSE: Discovering Hyperbolic Metric Spaces Hidden Beneath the Internet and Other Complex Networks"
- a University Research Program gift from Cisco Systems, Inc.
The high-level objective of this research is to create a scientific basis for modeling Internet interdomain interconnection and dynamics. We aim to understand the structure and dynamics of the Internet ecosystem from an economic perspective, capturing relevant interactions between network business relations, internetwork topology, routing policies, and resulting interdomain traffic flow.
- We developed GENESIS, a computational model of interdomain network formation that captures strategy selection dynamics by autonomous networks. This model provides the underpinnings for our study of peering strategy selection by autonomous networks in the Internet. We submitted a paper for publication in IEEE Infocom 2012.
- We continued our work on measuring the statistical properties of the interdomain traffic matrix (ITM). Our study revealed a sparse ITM and that we can model the traffic sent by an AS using either the log-normal or Pareto distribution, depending on whether the corresponding traffic experiences congestion. We found correlations between different ASes mostly due to relatively few highly popular prefixes. We submitted a paper Towards a Statistical Characterization of the Interdomain Traffic Matrix for publication at the International Federation for Information Processing (IFIP) Networking Conference in 2012.
- We began drafting a worldwide IPv6 Network Operator Survey. In 2012, we plan to collect feedback on the survey, make revisions, and conduct the survey to parameterize our IPv6 modeling work.
- Amogh Dhamdhere posted an economics-related essay on CAIDA blog, "Model for Internet Evolution Predicts Consolidation in Tier-1 Transit Market", in July 2011.
- We regularly responded to requests from government agencies and policymaking bodies for comments and positions that inform policy with the best available empirical data. kc claffy served on two ICANN advisory committees, RSSAC and SSAC, and continued on in her second year as a member of the FCC Technical Advisory Committee (TAC). She wrote blog commentaries about TAC meetings in March and in June, 2011.
- kc claffy published a blog commentary "network neutrality: the meme, its cost, its future", as follow-up to a panel on network neutrality hosted at the June 2011 cybersecurity meeting of the DHS/SRI Infosec Technology Transition Council.
- kc claffy contributed an article Underneath the Hood: Ownership vs. Stewardship of the Internet to the CircleID Internet Infrastructure blog, discussing ICANN's approval of the creation of the .XXX top level domain suffix.
- Network neutrality: the meme, its cost, its future, ACM SIGCOMM CCR, v. 41, no. 1, pp. 44-45, September 2011.
- Underneath the Hood: Ownership vs. Stewardship of the Internet, ACM SIGCOMM CCR, v. 41, no. 5, pp. 46-47, October 2011.
CAIDA and Georgia Tech co-organized the second interdisciplinary Workshop on Internet Economics (WIE) hosted at UCSD in December.
- Amogh Dhamdhere presented A cost model for network traffic (with an application to paid-peering) at the workshop.
- Amogh Dhamdhere presented An Agent-based Model of Interdomain Interconnection in the Internet at the UCSD Complex Network Seminar (DANCES) in February 2011.
Student Involvement
Gylmar Moreno, UCSD undergraduate student, worked as an assistant Programmer Analyst.
Funding Sources
Our economics research received support from:
- NSF grant (CNS-1017064) "NetSE-Econ: The economics of transit and peering interconnections in the Internet"
- a University Research Program gift from Cisco Systems, Inc.
We seek to develop new methods of analysis and aggregation of Internet measurement data from multiple available sources in order to shed light on various Internet security related events, including global connectivity disruptions due to political or catastrophic causes. Our methodology and findings can form the basis for automated early-warning detection systems for large-scale Internet outages.
- In the first months of 2011, Internet communications were disrupted in several North African countries in response to civilian protests and threats of civil war. In "Analysis of Country-wide Internet Outages Caused by Censorship", we analyzed episodes of these disruptions in two countries: Egypt and Libya. Using both control plane and data plane data sets in combination allowed us to narrow down which forms of Internet access disruption were implemented in a given region over time. Among other insights, we detected what we believe were Libya's attempts to test firewall-based blocking before they executed more aggressive BGP-based disconnection. Our methodology could be automated and used to detect outages or similar macroscopically disruptive events in other geographic or topological regions.
- Analysis of Country-wide Internet Outages Caused by Censorship, Proceedings of the Internet Measurement Conference (IMC), Berlin, Germany, November 2011
- kc claffy presented Analysis of Country-wide Internet Outages Caused by Censorship at the UCSD Complex Network Seminar (DANCES) in November 2011.
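Combining control-plane and data-plane signals, as described in the outage analysis above, can be sketched as a simple per-interval classifier. This is a hypothetical illustration, not the method of the paper: the inputs (share of a country's prefixes still visible in BGP, and a packet count from that country compared with a baseline), the thresholds, and the labels are all assumptions chosen for the example.

```python
def classify_interval(visible_prefix_fraction, packets, baseline_packets,
                      bgp_threshold=0.5, traffic_threshold=0.2):
    """Label one time interval for one country (illustrative thresholds).

    visible_prefix_fraction: share of the country's prefixes still announced
        in BGP (control-plane signal).
    packets: packets observed from the country in this interval
        (data-plane signal).
    baseline_packets: typical packet count for a comparable interval.
    """
    traffic_ratio = packets / baseline_packets if baseline_packets else 0.0
    if visible_prefix_fraction < bgp_threshold:
        # Routes withdrawn: consistent with BGP-based disconnection.
        return "bgp-withdrawal"
    if traffic_ratio < traffic_threshold:
        # Routes still present but traffic collapsed: consistent with filtering.
        return "packet-filtering-suspected"
    return "normal"
```

The point of using both signals is visible in the second branch: an interval in which routes remain announced but traffic drops sharply would look normal from BGP data alone.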
Funding Sources
Our support for security and stability research comes from:
The main goal of this collaborative project is research, development, and testbed deployment of a new Internet architecture that replaces IP with a network layer routing directly on content names.
The collaborating institutions include UC Los Angeles, Palo Alto Research Center (PARC), Colorado State University, University of Arizona, University of Illinois/Urbana-Champaign, UC Irvine, UC San Diego, University of Memphis, Washington University, and Yale University; the project is led by Lixia Zhang (UCLA) and Van Jacobson (PARC). CAIDA researchers participated in activities of the Evaluation and Measurement, Theory, and Routing/Forwarding teams.
- kc claffy posted a blog commentary, "my first Future Internet Architecture PI meeting" in January 2011.
- We deployed and maintained a local node on the national NDN testbed using the CCNx hub software.
- To test the applicability of the hyperbolic greedy routing methods to NDN, we conducted simulations forwarding packets on the new CCNx network. We extracted the Autonomous System (AS) graph of the testbed and mapped each AS number to its hyperbolic coordinates using the supplementary data from our 2010 paper Sustaining the Internet with Hyperbolic Mapping. We then evaluated the performance of modified greedy forwarding strategies using the metrics of the delivery success ratio and three types of stretch.
- We contributed to the Named Data Networking (NDN) Project 2010 - 2011 Progress Summary.
- In May, CAIDA researchers participated in the first NDN retreat at PARC, Palo Alto, CA.
- CAIDA researchers participated in the Future Internet Architecture Program Meeting and contributed to discussions of the four projects funded by FIA and the security features inherent to each architecture design.
This research received support from NSF grant (CNS-1039646) Named Data Networking.
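The hyperbolic greedy forwarding evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulation we ran: it uses plain greedy forwarding (each node hands the packet to the neighbor closest to the destination, and the packet is dropped at a local minimum) rather than the modified strategies, polar coordinates `(r, theta)` in the hyperbolic plane, and only one stretch measure (greedy hops divided by shortest-path hops). All function names are hypothetical.

```python
import math
from collections import deque


def hyperbolic_distance(p, q):
    """Distance between points given as (r, theta) in the hyperbolic plane."""
    (r1, t1), (r2, t2) = p, q
    x = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(t1 - t2))
    return math.acosh(max(1.0, x))  # guard against rounding just below 1


def greedy_path(graph, coords, src, dst):
    """Forward to the neighbor nearest dst; return None at a local minimum."""
    path, current = [src], src
    while current != dst:
        if not graph[current]:
            return None
        nxt = min(graph[current],
                  key=lambda v: hyperbolic_distance(coords[v], coords[dst]))
        if (hyperbolic_distance(coords[nxt], coords[dst])
                >= hyperbolic_distance(coords[current], coords[dst])):
            return None  # no neighbor is closer: the packet is dropped
        path.append(nxt)
        current = nxt
    return path


def shortest_hops(graph, src, dst):
    """Breadth-first search hop count, or None if dst is unreachable."""
    seen, queue = {src: 0}, deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return seen[u]
        for v in graph[u]:
            if v not in seen:
                seen[v] = seen[u] + 1
                queue.append(v)
    return None


def evaluate(graph, coords):
    """Return (delivery success ratio, mean hop stretch) over ordered pairs."""
    attempts = successes = 0
    stretches = []
    for s in graph:
        for d in graph:
            if s == d:
                continue
            attempts += 1
            path = greedy_path(graph, coords, s, d)
            if path is not None:
                successes += 1
                stretches.append((len(path) - 1) / shortest_hops(graph, s, d))
    ratio = successes / attempts if attempts else 0.0
    mean = sum(stretches) / len(stretches) if stretches else float("nan")
    return ratio, mean
```

Here `graph` maps each node to a list of its neighbors and `coords` maps each node to its `(r, theta)` pair. Because every forwarding step must strictly reduce the distance to the destination, the loop always terminates, either at the destination or at a node with no closer neighbor.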
Archipelago (Ark) is CAIDA's active measurement infrastructure. It aims to enable large-scale Internet measurements, while reducing the effort needed to develop, deploy and conduct sophisticated experiments. Ark represents a step toward a community-oriented measurement infrastructure as it allows CAIDA collaborators to run their vetted measurement tasks on a security-hardened distributed platform.
- By the end of 2011, we increased the number of vantage points to 57 Ark monitors deployed in 29 countries.
- We continued to improve our measurement techniques and analysis methodologies for alias resolution inferences. In 2011, we released the following tools to the public: kapar, MIDAR, Motu, mper, and rb-mperio.
- We added more monitors with native IPv6 connectivity to the Ark infrastructure. As of the end of 2011, Ark had 28 monitors collecting the data on the emerging IPv6 global topology.
- We continued support for the spoofer experiment (a collaboration with R. Beverly, NPS).
In 2011, CAIDA researchers published 9 papers and non-CAIDA researchers published 11 papers that used Ark data.
Funding Sources
Ark infrastructure receives support from:
- NSF grant (CNS-0958547) "Internet Laboratory for Empirical Network Science (iLENS)"
- NSF grant (OCI-0963073) "IRNC-SP: Sustainable data-handling and analysis methodologies for the IRNC networks"
- DHS Science and Technology Directorate contract (N66001-08-C-2029) "Cybersecurity: Leveraging the Science and Technology of Internet Mapping for Homeland Security"
We develop and maintain a passive data collection system known as the Network Telescope, in order to study security related events by monitoring and analyzing unsolicited traffic arriving to a globally routed underutilized /8 network.
- Since data storage is becoming considerably more expensive, we prioritized telescope data curation and meta-data preservation.
- We started improving our software infrastructure for processing, management, analysis, visualization and reporting on data collected with the UCSD Network Telescope.
- We developed iatmon (Inter-Arrival Time Monitor), a freely available measurement and analysis tool that allows one to separate one-way traffic into clearly defined subsets: 14 source types and 10 inter-arrival-time based groups. We used this tool to observe changes in one-way traffic at the UCSD Network Telescope over the first half of 2011. A paper One-way Traffic Monitoring with iatmon was submitted to PAM.
In March we organized and hosted a one-day Workshop on Network Telescopes to discuss network and security research using network telescopes.
- Nevil Brownlee presented Network Telescope Data Analysis: IBR Monitoring at the workshop.
- Dr. Tanja Zseby (Fraunhofer Institute for Open Communication Systems, Berlin, Germany) joined CAIDA in October as a Visiting Scholar for one year to work on darknet data analysis.
Student Involvement
Sarah Larsen, UCSD undergraduate student, worked as an assistant System Administrator.
Our Network Telescope received support from:
The goal of the Department of Homeland Security project Protected Repository for the Defense of Infrastructure Against Cyber Threats (PREDICT) is to provide vetted researchers with current network operational data in a secure and controlled manner that respects the security, privacy, legal, and economic concerns of Internet users and network operators. CAIDA supports PREDICT goals as Data Provider and Data Host and also plays an advisory role in developing technical, legal, and practical aspects of PREDICT policies and procedures.
- We received six user requests via the PREDICT portal during 2011; all six requesters received access to our data.
- We completed the CAIDA Anonymized 2011 Internet Traces Dataset that contains traffic traces from our two monitors deployed on high-speed backbone links.
- We continued drafting a proposed framework document in the spirit of the Belmont Report that would address ethical principles and guidelines for the protection of human subjects in Information and Communications Technologies research.
- We attended the Workshop on Research Data Lifecycle Management and participated in discussions of best practices and funding models for selecting, storing, describing, preserving, and sharing the digital research data.
- On 28 December 2011, the Department of Homeland Security (DHS) posted Ethical Principles Guiding Information and Communication Technology Research: The Menlo Report and its Companion Report and announced the reports in the Federal Register. DHS also posted the Interaction of the Menlo Report and Revisions to the Common Rule-Comments in Response to the Advanced Notice of Proposed Rulemaking (ANPRM).
- Moving Forward, Building an Ethics Community (panel statement), Proceedings of the Workshop on Ethics in Computer Security Research (WECSR 2011).
- Internet measurement data management challenges, Workshop on Research Data Lifecycle Management, Princeton, NJ, Jul 2011.
- We participated in two PREDICT PI meetings and contributed to developing PREDICT policies, data sharing, and marketing efforts. The PI kc claffy made the following presentations:
- In March 2011, Erin Kenneally moderated a panel at the Workshop on Ethics in Computer Security Research (WECSR 2011).
- We co-organized the 4th CAIDA-WIDE-CASFI Joint Measurement Workshop (Tokyo, Japan). The Workshop covered miscellaneous research and technical topics of mutual interest for CAIDA (USA), WIDE (Japan), and CASFI (South Korea) researchers.
Support for this work comes from DHS contract (DHS D07PC75579) "Supporting Research and Development of Security Technologies through Network and Security Data Collection".
Originally funded by the NSF award (OCI-0137121) "Correlating Heterogeneous Measurement Data to Achieve System-Level Analysis of Internet Traffic Trends", CAIDA built the Internet Measurement Data Catalog (IMDC) to facilitate searching for and sharing of data and metadata among researchers. Since its launch in June 2006 at www.datcat.org, the catalog has received contributions of metadata indexing nearly 19TB of data. Lack of funding and increased Oracle database licensing costs forced us to disable the IMDC temporarily while we integrate lessons learned into the transition from this research prototype to a service with the proposed increased operational capabilities.
Based on the lessons we learned during the development and operation of IMDC, we began to upgrade and modify the underlying DatCat service with three tasks: streamline the user experience by simplifying the metadata entry process; migrate from a proprietary database backend (Oracle) to a completely open source solution; and expand the community of the catalog users to a broader range of cybersecurity and other researchers. We completed the third task this year and plan to complete the first two tasks in 2012.
- We designed and developed a public forums interface integrated with the IMDC to host discussions of data sharing issues and to answer frequently asked questions regarding the IMDC and the information it contains.
Student Involvement
Jesse Weinstein, UCSD undergraduate student, worked as an assistant Programmer Analyst.
In 2011, our DatCat research received support from:
The NSF International Research Network Connections Program (IRNC) has funded five projects to provide network connections linking U.S. research networks with peer networks in other parts of the world. The goal of our IRNC Special Project is to support the IRNC community's measurement efforts by fostering and leading discussion of how best to make IRNC data and statistics available, and by adapting CAIDA measurement technologies to IRNC community needs.
- We added Internet Protocol Version 6 (IPv6) capabilities to the CoralReef suite of network data collection and analysis tools for processing network traces and flows. We also added support for prefix-preserving IPv6 address anonymization, along with options to apply the IPv4 anonymization policy to IPv4 addresses embedded within IPv6 addresses (IPv4-mapped, SIIT, Teredo, 6to4, 6over4, ISATAP), to anonymize IP addresses in nested headers (e.g., IPIP, or the original IP header in an ICMP error message), and to leave multicast addresses intact. Our next step will be to extend the CoralReef Report Generator software to better visualize IPv6 traffic separately from IPv4 traffic.
- CAIDA held several conference calls with IRNC ProNet PI Julio Ibarra and his staff to discuss how he might instrument a hybrid network router that carries both OpenFlow and IP traffic. We discussed using CAIDA's CoralReef suite of data collection, analysis, and reporting tools to report on and visualize the IP portion of the traffic.
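The prefix-preserving property mentioned in the anonymization item above can be illustrated with a minimal Python sketch. This is not CoralReef's implementation: it is a Crypto-PAn-style construction that uses HMAC-SHA256 as the pseudorandom function, and the names `anonymize` and `common_prefix_len`, as well as the key, are hypothetical. The idea is that whether bit i gets flipped depends only on the i bits before it, so two addresses that share a k-bit prefix before anonymization still share exactly a k-bit prefix afterwards.

```python
import hashlib
import hmac
import ipaddress


def anonymize(addr: str, key: bytes) -> str:
    """Prefix-preserving anonymization of an IPv4 or IPv6 address (sketch)."""
    ip = ipaddress.ip_address(addr)
    nbits = ip.max_prefixlen  # 32 for IPv4, 128 for IPv6
    value = int(ip)
    out = 0
    for i in range(nbits):
        # The flip decision for bit i depends only on the i bits before it,
        # so inputs that agree on a prefix receive identical flips there.
        prefix = value >> (nbits - i)
        msg = f"{nbits}/{i}/{prefix}".encode()
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (value >> (nbits - 1 - i)) & 1
        out = (out << 1) | (bit ^ flip)
    # Rebuild an address of the same family as the input.
    return str(type(ip)(out))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two addresses of the same family."""
    x, y = ipaddress.ip_address(a), ipaddress.ip_address(b)
    return x.max_prefixlen - (int(x) ^ int(y)).bit_length()
```

With any fixed key, `common_prefix_len(anonymize(a, key), anonymize(b, key))` equals `common_prefix_len(a, b)` for every pair of same-family addresses, which is the property that lets anonymized traces still support prefix-level analysis.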
We made progress on extending our Archipelago measurement infrastructure to monitor IRNC sites.
- With an introduction by IRNC ProNet PI Julio Ibarra, we obtained contacts at the Academic Network for State of Sao Paulo (ANSP) and signed an Ark Memorandum of Cooperation (MoC) with them.
- With an introduction by IRNC ProNet PI David Lassner, we worked with Australia's Academic and Research Network (AARNet). AARNet accepted our MoC and donated hardware for a new Ark server in Perth, Australia.
- ProNet PI Steve Huter provided contacts with two network engineers in Gambia, where we will deploy an Ark monitor.
- IRNC Network Engineer John Hicks provided contacts with the University of Peradeniya in Sri Lanka where we are pursuing the deployment of another Ark monitor.
- We developed an IRNC Wiki page to serve as a collection point for IRNC-related activities.
- We contributed slides focused on promoting our Ark measurement infrastructure to potential monitor hosting sites for presentations at the APRICOT-APAN 2011 conference and at the GLIF Technical Working Group Meeting.
- In October, we participated in the IRNC PI Meeting and in a concurrent NSF panel (Washington DC) that focused on how IRNC SP projects might leverage each other and how the IRNC program might contribute [data] to the research community.
This project is funded by NSF grant (OCI-0963073) "IRNC-SP: Sustainable data-handling and analysis methodologies for the IRNC networks".
CAIDA's mission includes providing access to tools for Internet data collection, analysis and visualization to facilitate network measurement and management. However, CAIDA does not receive specific funding for support and maintenance of the tools we develop. Please check our home page for a complete listing and taxonomy of CAIDA tools.
2011 Tool Development
MIDAR (Monotonic ID-Based Alias Resolution) is a tool developed by CAIDA that builds on recent work in alias resolution using IP-ID time stamps, scaling related techniques to large Internet topologies (millions of nodes) with greater precision and sensitivity. MIDAR provides an extremely precise ID comparison test based on monotonicity rather than proximity. It integrates multiple probing methods, multiple vantage points, and a novel sliding-window probe scheduling algorithm to increase scalability to millions of IP addresses. Experiments show that MIDAR's approach keeps the false positive rate low enough to achieve a high positive predictive value at Internet scale.
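The idea of comparing IP IDs by monotonicity rather than proximity can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not MIDAR's actual test or probe scheduling, which are more involved: the function name `looks_monotonic`, the `(timestamp, ip_id)` sample layout, and the `max_step` threshold are all hypothetical. If two addresses draw from one shared 16-bit counter, their samples merged in time order should only ever step forward (modulo wraparound).

```python
IPID_MODULUS = 1 << 16  # the IPv4 Identification field is 16 bits wide


def looks_monotonic(samples_a, samples_b, max_step=IPID_MODULUS // 2):
    """Return True if the two series are consistent with one shared counter.

    Each series is a list of (timestamp, ip_id) pairs. The threshold
    max_step is an assumed bound on how far a counter can advance
    between consecutive samples; larger steps are read as going backwards.
    """
    merged = sorted(list(samples_a) + list(samples_b))  # order by timestamp
    for (_, prev_id), (_, cur_id) in zip(merged, merged[1:]):
        step = (cur_id - prev_id) % IPID_MODULUS  # forward distance with wrap
        if step == 0 or step >= max_step:
            return False
    return True
```

Two interleaved series such as IDs 100, 130 and 115, 150 pass the check, while a series hovering near 100 interleaved with one near 40000 fails it, because the merged sequence would have to run backwards.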
Inspired by the promising foundation presented in Mehmet Gunes' APAR, CAIDA wrote kapar, a highly optimized implementation for production use on large-scale Internet topologies, fixing a few bugs and experimenting with our own improvements to the algorithm along the way.
mper is a probing engine that clients can use to conduct network measurements using ICMP, UDP, and TCP probes.
rb-mperio is a RubyGem for writing network measurement scripts in Ruby that use the mper probing engine. rb-mperio v0.3.0 was released on September 30, 2011.
Motu is a simple tool for dealiasing pairs of IPv4 addresses. Version 1.0.1 was released on October 5, 2011.
CAIDA Tools Download Report
The table below displays all CAIDA developed and currently supported tools distributed via our home page at http://www.caida.org/tools/ and the number of downloads of each version during 2011.
|Tool||Description||Downloads|
|Autofocus||Internet traffic reports and time-series graphs.||290|
|Chart::Graph||A Perl module that provides a programmatic interface to several popular graphing packages. Note: Chart::Graph is also available on CPAN.org. The numbers here reflect only downloads directly from caida.org, as download statistics from CPAN are not available.||108|
|CoralReef||Measures and analyzes passive Internet traffic monitor data.||482|
|Cuttlefish||Produces animated graphs showing diurnal and geographical patterns.||95|
|dnsstat||DNS traffic measurement utility.||169|
|iffinder||Discovers IP interfaces belonging to the same router.||278|
|libsea||Scalable graph file format and graph library.||212|
|kapar||Graph-based IP alias resolution.||18|
|MIDAR||Identifies IPv4 addresses belonging to the same router (aliases) using shared monotonic IP ID counters.||33|
|Motu||Dealiases pairs of IPv4 addresses.||14|
|mper||Probing engine for conducting network measurements with ICMP, UDP, and TCP probes.||50|
|otter||Visualizes arbitrary network data.||238|
|plot-latlong||Plots points on geographic maps.||276|
|plotpaths||Displays forward traceroute path data.||53|
|rb-mperio||RubyGem for writing network measurement scripts in Ruby that use the mper probing engine.||47|
|RouterToAsAssignment||Assigns each router from a router-level graph of the Internet to its Autonomous System (AS).||321|
|sk_analysis_dump||A tool for analysis of traceroute-like topology data.||58|
|topostats||Computes various statistics on network topologies.||133|
|Walrus||Visualizes large graphs in three-dimensional space.||2506|
Data Collected in 2011
In 2011, CAIDA captured the following raw data:
- traceroutes probing IPv4 and IPv6 address space collected by the Archipelago infrastructure
- passive traffic traces from the equinix-chicago and equinix-sanjose monitors connected to Tier1 ISP backbone links at Equinix facilities in Chicago, IL, and San Jose, CA
- passive traffic traces from the UCSD Network Telescope
- Anonymized High-speed Internet Traces 2011
- Anonymized Internet backbone trace dataset for IPv6 Day
- Macroscopic Internet Topology Data Kits (ITDKs): ITDK-2011-04 and ITDK-2011-10
- AS adjacencies (AS links)
- DNS Names for IPv4 Routed /24 Topology Dataset
The table below lists the amount of data collected in our ongoing data collection operations.
|Data Type||First date||Last date||Total size [1]|
|Macroscopic Topology Measurements, IPv4 (Archipelago)||2011-01-01||2011-12-31||596.9 GiB (1.9 TiB)|
|Macroscopic Topology Measurements, IPv6 (Archipelago)||2011-01-01||2011-12-31||1.9 GiB (6.6 GiB)|
|Internet backbone Traces||2011-01-20||2011-12-15||3.1 TiB (6.8 TiB) [3]|
|"Live" Network Telescope Data||2011-01-01||2011-12-31||29.9 TiB (59.7 TiB) [2][4]|
|DNS Names for IPv4 Routed /24 Topology Dataset||2011-01-01||2011-12-31||7.8 GiB (29.5 GiB)|
|AS Links for IPv4 Routed /24 Topology Dataset||2011-01-01||2011-12-31||155.8 MiB (636.5 MiB)|
|Macroscopic Internet Topology Data Kit (ITDK)||2011-04-01||2011-11-03||361.5 MiB (1.9 GiB)|
|DNS root/gTLD RTT Dataset||2011-03-16||2011-12-31||448.7 MiB|
[1] The total size represents actual disk space. If data are stored in compressed form, the uncompressed size is given in brackets.
[2] The size of this data set varies over time as we store and serve a rotating window of the last 30 days only. The specified numbers are totals captured over the whole year.
[3] This includes traces on April 13 during DITL 2011, and traces on 8 June 2011 (IPv6 Day).
[4] This includes 279 GB of data collected during DITL 2011 and 95 GB on IPv6 Day.
Datasets Distributed in 2011
CAIDA makes some datasets publicly available without restrictions to the user, while access to other datasets is restricted to academic researchers, CAIDA members, and government contractors with data access subject to certain safeguards designed to protect the privacy of monitored communications, to ensure security of network infrastructure, and to comply with the terms of our agreements with data providers.
Publicly Available Data
These datasets require that users agree to an Acceptable Use Policy, but are otherwise freely available.
|Dataset||Unique visitors (IPs)||Data Downloaded|
|AS Rank||23||4.4 MiB|
|AS Links (AS Adjacencies)||644||22.5 GiB|
|AS Relationships||862||5.7 GiB|
|Router Adjacencies||267||626.2 MiB|
|AS Taxonomy||156||81.4 MiB *|
|Witty Worm Dataset||223||319.7 MiB|
|Code-Red Worms Dataset||527||6.3 GiB|
We count the volume of data downloaded per unique user per unique file, so if a user downloads a file multiple times, we only count that file once for that user. This significantly underestimates the total volume of data served through our dataservers.
* AS Taxonomy dataset is included in a mirror of the GA Tech main AS Taxonomy site, and thus does not represent all access to this data.
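The counting rule used for these download volumes (each file counted once per user, however many times that user fetched it) can be sketched as follows. This is only an illustration of the rule, not CAIDA's log-processing code; the `(user, filename, size_bytes)` record layout and the function names are hypothetical.

```python
def unique_volume(records):
    """Sum file sizes, counting each (user, file) pair only once.

    records: iterable of (user, filename, size_bytes) tuples, a
    hypothetical layout for parsed download-log entries.
    """
    seen = {}
    for user, filename, size in records:
        seen[(user, filename)] = size  # repeat downloads overwrite, not add
    return sum(seen.values())


def raw_volume(records):
    """Sum every transfer, including repeated downloads."""
    return sum(size for _, _, size in records)


log = [
    ("198.51.100.7", "as-links.txt", 100),
    ("198.51.100.7", "as-links.txt", 100),  # repeat download, not recounted
    ("198.51.100.7", "as-rel.txt", 50),
    ("203.0.113.9", "as-links.txt", 100),
]
```

For the sample `log`, `unique_volume` gives 250 bytes while `raw_volume` gives 350, which shows why the reported figures understate the volume actually served.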
Restricted Access Data
These datasets require that users:
- be academic or government researchers, or join CAIDA;
- request an account and provide a brief description of their intended use of the data; and
- agree to an Acceptable Use Policy.
|Dataset||Unique visitors (usernames)||Data Downloaded *|
|Anonymized Internet Backbone Traces||187||22.4 TiB|
|Backscatter Datasets||34||245.2 GiB|
|Raw Topology Traces from Archipelago infrastructure||50||1.9 TiB|
|Raw Topology Traces (skitter)||25||82.1 GiB|
|DNS Names for IPv4 Routed /24 Topology Dataset||31||53.9 GiB|
|Macroscopic Internet Topology Data Kit||70||43.5 GiB|
|Witty Worm Dataset||15||190.4 GiB|
|DNS Root/gTLD server RTT Dataset||7||12.9 MiB|
|DDoS Attack Dataset||60||230.0 GiB|
|Telescope Datasets||17||268.1 GiB|
* We count the volume of data downloaded per unique user per unique file, so even if a user downloads a file 100 times, we only count that file once for that user. This methodology results in significantly under-counting the total volume of data served through our dataservers.
Restricted Access Data Requests
The following table shows some statistics about data requests for CAIDA datasets: the number of requests received, the number of users whose request was granted, and the number of users that actually downloaded data.
We received about 33 more requests in 2011 than in 2010, and approved 46 more requests for access to restricted datasets. About 77.1% of the users who were granted access (364 of 472) actually accessed our webservers to download data.
|Dataset||Number of requests received||Number of users granted access||Number of users that accessed data|
|Anonymized Backbone and Peering Link Traces||270||208||168|
|Active Topology Trace Datasets||153||127||83|
|Backscatter Datasets||51||34||28|
|Witty Worm Dataset||15||11||10|
|DNS Root/gTLD server RTT Dataset||10||8||6|
|DDoS Attack Dataset||91||62||51|
|Telescope Datasets||29||22||18|
|Totals||619||472||364|
As part of our mission to investigate both practical and theoretical aspects of the Internet, CAIDA staff actively attend, contribute to, and host workshops relevant to research and better understanding of Internet infrastructure, trends, topology, routing, and security. Our web site has a complete listing of past and upcoming CAIDA Workshops.
From January 11-13, 2011, the University of Cyprus (UCY) hosted an interdisciplinary "Network Geometry" workshop jointly organized by CAIDA, UCSD and UCY. The agenda included short presentations by participants as well as extensive time for discussions and interactions.
On February 9-11, 2011, CAIDA hosted the 3rd workshop on Active Internet Measurements supporting science and policy. This workshop continues the series of Internet Statistics and Metrics Analysis (ISMA) workshops that are held to discuss the current and future state of Internet measurement and analysis.
On March 22, 2011, CAIDA hosted a half-day workshop on network and security research using Network Telescopes. The agenda included short presentations by participants, discussions, and interactions. Some participants attended remotely via web videoconference.
As part of our efforts on the Internet Laboratory for Empirical Network Science (iLENS) Project, CAIDA hosted a workshop on August 22nd, 2011 to discuss scalable measurement and analysis of BGP and traceroute data.
On December 1-2, 2011, CAIDA and Georgia Tech hosted their second Workshop on Internet Economics. This two-day event brought together researchers, commercial Internet facilities and service providers, technologists, theorists, policy makers, RIR stakeholders, and pundits of Internet economics to try to frame a concrete and useful research agenda for the emerging but stunted field of Internet infrastructure economics. The workshop included presentations by participants and in-depth discussions of AS peering policies and practices, of how to improve the realism and utility of Internet interdomain connectivity models for trend analysis, and of predictions of how the Internet ecosystem will look 5-15 years from now.
On December 5th, 2011, the 4th CAIDA-WIDE-CASFI Joint Measurement Workshop was held in Tokyo, Japan. This workshop continues a tradition of workshops supporting a three-way collaboration between researchers from CAIDA (USA), WIDE (Japan) and CASFI (South Korea). The Workshop covered miscellaneous research and technical topics of mutual interest for CAIDA, WIDE and CASFI participants and brought various groups together to share their latest research.
UCSD Complex Network Seminar - Different Angles on Network Complexity, Engineering, and Science (DANCES)
In October 2010, CAIDA began hosting the UCSD Complex Network Seminar: Different Angles on Network Complexity, Engineering, and Science (DANCES). The goal of this seminar series was to bring together junior and senior researchers studying networks, including UCSD graduate students and post-docs. The seminar fostered communication and collaboration among researchers from diverse disciplines that study networks from different perspectives (physics, biology, sociology, computer science, ECE, math, bioengineering, cognitive science, etc.), and gave young researchers a forum to practice their presentation and communication skills. The seminars continued in 2011 and drew attendees from a diversity of disciplines.
The following table contains the papers published by CAIDA for the calendar year of 2011. Please refer to Papers by CAIDA on our web site for a comprehensive listing of publications.
||Analysis of Country-wide Internet Outages Caused by Censorship||Internet Measurement Conference (IMC)|
||Twelve Years in the Evolution of the Internet Ecosystem||IEEE/ACM Transactions on Networking|
||"Network Neutrality": the meme, its cost, its future||ACM SIGCOMM Computer Communication Review (CCR)|
||Underneath the hood: ownership vs. stewardship of the internet||ACM SIGCOMM Computer Communication Review (CCR)|
||Hidden variables in bipartite networks||Physical Review E|
||Tracking IPv6 Evolution: Data We Have and Data We Need||ACM SIGCOMM Computer Communication Review (CCR)|
||Internet measurement data management challenges||Workshop on Research Data Lifecycle Management|
||The 3rd Workshop on Active Internet Measurements (AIMS-3) Report||ACM SIGCOMM Computer Communication Review (CCR)|
||MERLIN: MEasure the Router Level of the INternet||Conference on Next Generation Internet|
||Geocompare: a comparison of public and commercial geolocation databases - Technical Report||Cooperative Association for Internet Data Analysis (CAIDA)|
||Internet-Scale IPv4 Alias Resolution with MIDAR: System Architecture - Technical Report||Cooperative Association for Internet Data Analysis (CAIDA)|
||Moving Forward, Building an Ethics Community (Panel Statements)||Workshop on Ethics in Computer Security Research (WECSR)|
||Measured Impact of Crooked Traceroute||ACM SIGCOMM Computer Communication Review (CCR)|
||Percolation in Self-Similar Networks||Physical Review Letters|
The following table contains the presentations and invited talks published by CAIDA for the calendar year of 2011. Please refer to Presentations by CAIDA on our web site for a comprehensive listing.
In 2011, CAIDA's web site continued to attract considerable attention from a broad, international audience. The wave of heavy traffic that occurred mid-year is attributed to increased downloads of the recently updated AS Core IPv4 and IPv6 graph, which was publicized at the various conferences and workshops that CAIDA staff attended.
The graph and table below present the monthly history of traffic to www.caida.org for 2011. To show a more accurate representation of website traffic, these statistics do not include non-viewed traffic including traffic from spiders, crawlers or other robots.
|Month||Unique visitors||Number of visits||Pages||Hits||Bandwidth|
|Jan 2011||33,133||56,462||171,962||893,932||45.15 GB|
|Feb 2011||33,574||54,227||185,918||844,551||49.11 GB|
|Mar 2011||31,451||53,835||150,413||799,771||40.46 GB|
|Apr 2011||32,061||52,687||154,328||791,861||38.57 GB|
|May 2011||32,351||54,849||153,959||729,150||39.48 GB|
|Jun 2011||28,704||50,236||154,292||698,430||81.40 GB|
|Jul 2011||26,937||48,059||164,147||648,543||37.84 GB|
|Aug 2011||25,420||45,281||139,964||587,850||46.25 GB|
|Sep 2011||25,736||45,590||134,314||587,518||32.38 GB|
|Oct 2011||28,251||52,674||162,092||676,460||32.02 GB|
|Nov 2011||28,763||52,291||171,833||808,939||33.79 GB|
|Dec 2011||26,131||48,488||145,888||694,610||28.29 GB|
CAIDA would like to acknowledge the many people who put forth great effort towards making CAIDA a success in 2011. The image below shows the functional organization of CAIDA. Please check the home page for more complete information about CAIDA staff.
CAIDA Functional Organization Chart
CAIDA thanks our 2011 sponsors, members, and collaborators.
The charts below depict funds received by CAIDA during the 2011 calendar year.
|Funding Source||Allocations||Percentage of Total|
Figure 1. Allocations by funding source received during 2011.
The charts below depict CAIDA's Annual Expense Report for the 2011 calendar year.
|LABOR||Salaries and benefits paid to staff and students|
|IDC||Indirect Costs paid to the University of California, San Diego including grant overhead (54.5%).|
|SUPPLIES & EXPENSES||Computer supplies and equipment (including computer hardware and software costing less than $5000); telephone, Internet, and other IT services; and general office supplies.|
|TRAVEL||Trips to conferences, PI meetings, operational meetings, and sites of remote monitor deployment.|
|EQUIPMENT||Computer hardware or other equipment costing more than $5000.|
|TRANSFERS||Exchange of funds between groups for recharge for IT desktop support and Oracle database services.|
|Program Area||Expenses||Percentage of Total|
|Supplies and Expenses||100,964||4%|
Figure 2. 2011 Operating Expenses
|Program Area||Expenses||Percentage of Total|
Figure 3. 2011 Expenses by Program Area