Mission Statement: CAIDA investigates practical and theoretical aspects of the Internet, focusing on activities that:
- provide insight into the macroscopic function of Internet infrastructure, behavior, usage, and evolution,
- foster a collaborative environment in which data can be acquired, analyzed, and (as appropriate) shared,
- improve the integrity of the field of Internet science,
- inform science, technology, and communications public policies.
- Executive Summary
- Research Areas
- Infrastructure Projects
- Web Site Usage
- Organizational Chart
- Funding Sources
- Operating Expenses
This annual report covers CAIDA's activities in 2012, summarizing highlights from our research, infrastructure, data-sharing and outreach activities. Our research projects span Internet topology, routing, traffic, economics, future Internet architectures, and policy. Our infrastructure activities continue to support measurement-based studies of the Internet's core infrastructure, with focus on the health and integrity of the global Internet's topology, routing, addressing, and naming systems. In 2012 we increased our participation in future Internet research in two dimensions: measuring and modeling IPv6 deployment; and an expanded role (in management) of the Named Data Networking project, one of the NSF-funded future Internet architecture projects headed into its third year. We also began a project to study large-scale Internet outages via correlation of a variety of disparate sources of data.
We continued to make advances in Internet topology research, supported by our expanding Ark measurement infrastructure. We collect and share the largest Internet topology data sets (IPv4 and IPv6) available to academic researchers, and we share many aggregated annotated derivative data sets publicly, including rankings of ISPs annotated with (our estimated) business relationships between autonomous networks. Our topology measurement platform supports IPv6 -- by the end of 2012, 28 of our 64 Ark hosting sites provided IPv6 connectivity and topology measurements. Using our new alias resolution measurement system, which integrates and improves on the best available technology for IP address alias resolution, we collected, analyzed, processed and released our fifth published Internet Topology Data Kit (ITDK), reflecting measurements taken in July 2012. The July 2012 ITDK includes two related router-level topologies; router-to-AS assignments; geographic location of each router; and DNS lookups of all observed IP addresses. After an extensive exercise with our validation data via AS Rank, we also spent many months this year overhauling our AS relationship inference algorithm so that we can add AS relationship annotations to future ITDKs.
On the theoretical side of topology research, we developed a new model in which new connections optimize certain trade-offs between popularity and similarity of nodes, instead of simply preferring popular nodes. This framework has a geometric interpretation in which popularity preference emerges from local optimization. In contrast to standard preferential attachment, our optimization framework accurately describes the large-scale evolution of technological (the Internet), social (trust relationships between people) and biological (Escherichia coli metabolic) networks, accurately predicting the probability of new links. We developed a related framework to support mapping a real network into a hyperbolic plane in a way congruent with this model of network growth. Perhaps our most exciting theoretical result was our discovery of structural similarity (power-law graph with strong clustering) between a causal network representing the large-scale structure of spacetime in our accelerating universe, and complex networks such as the Internet, social, or biological networks. We collaborated with supercomputing experts at SDSC to run HPC simulations that provided evidence that this structural similarity is due to asymptotic equivalence in large-scale growth dynamics of complex networks and spacetime in the universe.
In 2012 we continued applying our theoretical, empirical, and practical understandings of the Internet's evolution to the challenge of enabling dramatically more scalable global Internet routing. We continued our partnership in the Named Data Networking project, a 12-university collaboration funded by NSF's Future Internet Architecture (FIA) Research program to explore a generalization of the Internet architecture that allows naming not just communication endpoints, i.e., the source and destination IP addresses, but also data (content) itself. This approach shifts the focus from where -- addresses and hosts in today's Internet -- to what -- the content that users and applications care about. By naming data instead of locations, the new architecture transforms data into a first-class entity while addressing the known technical challenges of today's Internet: routing scalability, network security, content protection and privacy. In 2012 we investigated combinations of name-space structure and network topology that optimize the efficiency of NDN algorithms and participated in NDN testbed development and evaluation. The most challenging part of this routing research as it pertains to the Internet still lies ahead, and will require a broader community of engaged thinkers: application of these and other theoretical results to real-world Internet security, economic, and policy contexts.
A more immediate architectural need of the global Internet has inspired us to study the transition to IPv6. The two main lessons we can glean from the scant data available are: (i) architectural transitions - even those deemed minor but essential - are slow; (ii) the U.S. is behind other regions of the world in IPv6 deployment, and has not thus far invested in shedding quantitative light on this problem, despite making attempts to lightly nudge the market toward wider IPv6 adoption. With support from NSF, and in collaboration with the Naval Postgraduate School (Rob Beverly), we studied the deployment of IPv6 at the Autonomous System (AS) level using historical BGP data and recent active measurements, to compare IPv4 and IPv6 topology structure and adoption trends. While most core Internet transit providers have deployed IPv6, edge networks are lagging. IPv6 deployment is stronger in Europe and the Asia-Pacific region than in North America. The IPv6 topology is characterized by a single dominant player, Hurricane Electric, which appears in a large fraction of IPv6 AS paths, and is more dominant in IPv6 than the most dominant player in IPv4. Routing dynamics in the IPv6 topology are largely similar to those in IPv4, and churn in both networks grows at the same rate as the underlying topologies. We found that performance over IPv6 paths is comparable to that over IPv4 paths if the AS-level paths are the same, but can be much worse than IPv4 if the AS-level paths differ. To support a separate but related modeling effort, we developed and conducted a survey of network operators to gauge IPv6 deployment patterns and plans. Based on the results we hope to refine and re-issue a survey next year, to inform and parameterize a predictive model of possible IPv6 future trajectories.
We made significant progress on our Internet economics research, one goal of which is to create a scientific basis for modeling Internet interdomain interconnection and dynamics, capturing relevant interactions between network business relations, internetwork topology, routing policies, and resulting interdomain traffic flow. We developed and published a holistic cost model that can help operators evaluate the costs of various routing and peering decisions, among other network operation costs. Using traffic data from a large carrier network, our model revealed how network operators can significantly reduce the cost of carrying traffic in their networks by adjusting routing for a small fraction of total traffic. We also published a paper on our GENESIS simulator, which embodies a computational model of interdomain network formation that captures key factors influencing network formation dynamics: highly skewed traffic matrix, policy-based routing, geographic co-location constraints, and the costs of transit/peering agreements. This simulator enables us to study "what-if" questions, such as asking how open peering strategies affect networks in terms of topology, traffic flow, and financial health. We continued studying available interdomain traffic matrix (ITM) data, and discovered that we can model the traffic sent by an AS as either a log-normal or Pareto distribution, depending on whether the corresponding traffic experiences congestion. We found correlations between different ASes mostly due to relatively few highly popular prefixes. We also held a successful interdisciplinary Workshop on Internet Economics (WIE) in December 2012 (co-hosted with MIT's Dave Clark), focused on reaching consensus on definitions and data to support a regulatory framework for a converged communications infrastructure.
In early 2012, we undertook a new three-year research effort to study large-scale Internet outages, under an exciting new Transition to Practice area of NSF's Secure and Trustworthy Cyberspace research program. In this project we are applying our successful results in studying the Egypt and Libya censorship-induced outages (our IMC2011 paper) to the development, testing, and deployment of an operational capability to detect, monitor, and characterize future episodes of Internet connectivity disruptions. In early 2012, we published a study that used the UCSD darknet traffic data to analyze other outages caused by geophysical disasters -- the earthquakes in Christchurch and Tohoku in 2011 -- which won an ACM SIGCOMM CCR award for one of the best CCR papers of 2012.
We continued to dedicate resources to support the infrastructure measurement and data sharing interests and needs of two U.S. federal agency programs: the National Science Foundation's International Research Network Connections (IRNC) program, and the Department of Homeland Security's Protected Repository of Data on Internet CyberThreats (PREDICT) data-sharing project (http://www.predict.org). The PREDICT funding provides essential support for deployment and operations of our measurement infrastructure, and the collection, curation, and sharing of several unprecedented data sets available to researchers (http://www.caida.org/data/). We are responsive to researcher requests for additional/different Internet data sets, to the extent possible given our resources. We have found an increasing number of disciplines (physicists, sociologists, biologists) interested in our Internet measurement data sets and research results as they apply to other complex network structure, behavior, and evolution.
Finally, as always, we engaged in a variety of tool development, data-sharing, and outreach activities, including web sites, 16 peer-reviewed papers, 5 technical and workshop reports, 47 presentations, 13 blog entries, 7 animations, 6 workshops, and a seminar series. Details of our activities are below. CAIDA's program plan for 2010-2013 is available at http://www.caida.org/home/about/progplan/progplan2010/. We will be creating a new 3-year program plan in 2013. Please do not hesitate to send comments or questions to info at caida dot org.
CAIDA's long-term topology research agenda included two strategic areas: 1) macroscopic Internet topology measurements and analysis in both IPv4 and IPv6 address space (see the Exploring the evolution of IPv6 section below); and 2) topology modeling.
Our Internet topology research integrates strategic measurement and analysis capabilities. This integration has enabled us to provide comprehensive annotated Internet topology maps, as well as a platform capable of Internet infrastructure assessments.
- We continued large-scale macroscopic topology measurements using Archipelago (Ark), our state-of-the-art global measurement platform. We completed the fifth calendar year of the IPv4 Routed /24 Topology Dataset collection. We continued to collect automated DNS reverse lookups for IP addresses discovered by Ark probes and annotated the IPv4 topology data with corresponding DNS names. We also created new IPv4 AS Core Graph visualizations using April 2011 Ark data.
- Completing the work we started in 2011, we published "Internet-Scale IPv4 Alias Resolution with MIDAR". This paper documents our work on resolving observed interfaces into routers (alias resolution) based on similarities in IP ID time series produced by coordinated active probing of different IP addresses.
- Our improved measurement and analysis techniques culminated in collecting, analyzing, processing and releasing an Internet Topology Data Kit (ITDK) synthesizing the IPv4 Routed Topology Dataset and targeted alias resolution measurements conducted in July 2012. The July 2012 ITDK includes: two related router-level topologies; router-to-AS assignments; geographic location of each router; and DNS lookups of all observed IP addresses.
- We studied the impact of Multiprotocol Label Switching (MPLS), which has been widely deployed in the Internet for over a decade, on Internet topology measurements. Some MPLS configurations may lead to false router-level link inferences in maps derived from traceroute data. In Revealing MPLS tunnels obscured from traceroute, we introduced a measurement-based classification of MPLS tunnels, identifying tunnels where IP hops are revealed but not explicitly tagged as label switching routers, as well as tunnels that obscure the underlying path. In our data, we found that at least 30% of the paths we tested traverse an MPLS tunnel.
- In "Measuring the Evolution of Internet Peering Agreements", we explored the possibility of studying the full connectivity of a small set of ASes (usable monitors) that provide BGP feeds to Routeviews/RIPE collectors. We developed CMON, an algorithm to classify the links of the usable monitors as transit or settlement-free. We then classified the usable monitors as transit providers (large and small), content producers, content consumers and education/research networks. We highlighted key differences in the evolution of connectivity of the usable monitors, and measured transitions between different relationships for the same pair of ASes. We presented this work at the International Federation for Information Processing (IFIP) Networking Conference.
- We published a technical report Internet Topology Data Comparison describing the results of our systematic comparison of Internet topologies derived from different data sources and characterizing the Internet at three granularities relevant to both research and operations of network infrastructure: IP address (interface), router, and Autonomous System.
- We launched the new version of our interactive CAIDA AS Rank website. It represents CAIDA's ranking of ASes and organizations inferred from BGP routing data collected by the Route Views Project and RIPE NCC and from WHOIS databases maintained by the Regional and National Internet Registries. In September 2012 we also resumed regular production of the AS-level topologies annotated with business relationships between ASes dataset after completing revisions and improvements of our algorithms inferring these relationships.
- Revealing MPLS tunnels obscured from traceroute, ACM SIGCOMM Computer Communication Review (CCR), vol. 42, no. 2, pp. 87--93, Apr 2012.
- Internet-Scale IPv4 Alias Resolution with MIDAR, IEEE/ACM Transactions on Networking, vol. PP, no. 99, May 2012.
- Measuring the Evolution of Internet Peering Agreements, Technical Report. IFIP Networking 2012, May 2012.
- Internet Topology Data Comparison, Technical Report. May 2012.
- Border Gateway Protocol (BGP) and Traceroute Data Workshop Report, ACM SIGCOMM Computer Communication Review (CCR), vol. 42, no. 3, pp. 28--31, Jul 2012.
- The 4th Workshop on Active Internet Measurements (AIMS-4) Report, ACM SIGCOMM Computer Communication Review (CCR), vol. 42, no. 3, pp. 34--38, Jul 2012.
- We published a report from the Border Gateway Protocol (BGP) and Traceroute Data Workshop we conducted in August 2011.
- We organized and hosted the 4th Active Internet Measurements (AIMS-4) workshop. The workshop report is available.
- kc claffy wrote a blog commentary "Shutting the phone network off while you're running out of internet protocol numbers" in January 2012.
- CAIDA researchers made 10 presentations on topology measurements and analysis results at various venues.
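The alias resolution work mentioned above relies on similarities in IP ID time series collected from different addresses. The following is only a minimal sketch of that general idea, not MIDAR's actual test: it assumes that two interfaces on one router draw from a single shared 16-bit IP ID counter, so that interleaved samples from both addresses form one increasing sequence (modulo wraparound). The function name `shares_counter` and the threshold `max_step` are hypothetical and chosen for illustration.

```python
IP_ID_MODULUS = 65536  # the IPv4 Identification field is 16 bits wide


def shares_counter(samples_a, samples_b, max_step=20000):
    """Return True if two IP ID time series are consistent with one counter.

    Each series is a list of (timestamp, ip_id) pairs. The series are merged
    in time order; every consecutive pair must advance by a small forward
    step modulo 2**16. `max_step` is an illustrative threshold (assumption),
    not a value taken from the MIDAR paper.
    """
    merged = sorted(samples_a + samples_b)
    for (_, id0), (_, id1) in zip(merged, merged[1:]):
        step = (id1 - id0) % IP_ID_MODULUS  # forward distance with wraparound
        if step > max_step:
            return False  # a large jump suggests independent counters
    return True


if __name__ == "__main__":
    addr_a = [(0.0, 100), (2.0, 140), (4.0, 190)]
    addr_b = [(1.0, 120), (3.0, 160), (5.0, 210)]   # interleaves with addr_a
    addr_c = [(1.0, 40000), (3.0, 40050), (5.0, 40100)]  # unrelated counter
    print(shares_counter(addr_a, addr_b))
    print(shares_counter(addr_a, addr_c))
```

A production system would need far more care (counter velocity estimation, sampling rates, and elimination of false positives across millions of addresses); the sketch only shows why interleaved probing of candidate addresses is informative.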
Ongoing data releases
The following topology datasets were shared:
- The IPv4 Routed /24 Topology Dataset from Ark measurements
- The AS Relationships Dataset
- Daily files of the DNS reverse name lookups for the IPv4 core traceroute data.
- The adjacency matrix of the observed Internet AS-level graph computed daily from Ark measurements
- The Routeviews Prefix to AS mappings Dataset (pfx2as) created on a daily basis starting from 2005-05-09.
- The Macroscopic Internet Topology Data Kit for July (ITDK-2012-07)
- The Macroscopic Internet Topology Data Kit from historical Skitter data from 2002 and 2003.
Three UCSD undergraduate students assisted CAIDA personnel with various tasks for topology measurement and analysis via the Research Experience for Undergraduates (REU) program, including peering data extraction (J. Delaney), curation of traceroute data to remove IXP address space, poster design and layout of the AS Core Graph visualizations (Justin Cheng) and AS Rank website (Jonathan Yuan).
Funding Sources
Our topology measurement and analysis activities received support from:
- NSF grant (CNS-0958547) Internet Laboratory for Empirical Network Science (iLENS) and REU Supplement
- NSF grant (OCI-0963073) IRNC-SP: Sustainable data-handling and analysis methodologies for IRNC networks and REU Supplement
- DHS Science and Technology Directorate contract (N66001-08-C-2029) Cybersecurity: Leveraging the Science and Technology of Internet Mapping for Homeland Security
- DHS Science and Technology Directorate contract (N66001-12-C-0130) Cartographic Capabilities for Critical Cyberinfrastructure
- a University Research Program gift from Cisco Systems, Inc.
The goal of this research is to derive network models capable of explaining common structural characteristics of large real networks, such as the Internet, social networks, and many other complex networks. In particular, we seek to understand how these characteristics affect the various processes that run on top of these networks, such as routing, information sharing, data distribution, searching, and epidemics. Understanding the mechanisms that shape the structure and drive the evolution of real networks can also have important applications in designing more efficient recommender and collaborative filtering systems, and for predicting missing and future links - an important problem in many disciplines.
- The principle that "popularity is attractive" underlies preferential attachment, which is a common explanation for the emergence of scaling in growing networks. Yet in Popularity versus Similarity in Growing Networks we showed that popularity is just one dimension of attractiveness; another dimension is similarity. We developed a novel framework, Popularity x Similarity Optimization (PSO) model, in which new connections optimize certain trade-offs between popularity and similarity, instead of simply preferring popular nodes. The framework has a geometric interpretation in which popularity preference emerges from local optimization. In contrast to standard preferential attachment, our optimization framework accurately describes the large-scale evolution of technological (the Internet), social (trust relationships between people) and biological (Escherichia coli metabolic) networks, predicting the probability of new links with high precision. These results were published in Nature.
- We investigated the question of whether one can map a real network into the hyperbolic plane in a way congruent with the PSO model. We developed a systematic framework called HyperMap that accomplishes this task by replaying the network's geometric growth. In Replaying the Geometric Growth of Complex Networks and Application to the AS Internet, we applied HyperMap to the Autonomous Systems (AS) topology of the real Internet and showed that it was able to identify communities of ASes that belong to the same geographic region. Moreover, our framework was able to predict missing links with high precision. We presented these results at the Workshop on Mathematical Performance Modeling and Analysis (MAMA) and published them in ACM SIGMETRICS Performance Evaluation Review.
- In Network Cosmology, we showed that a causal network representing the large-scale structure of spacetime in our accelerating universe is a power-law graph with strong clustering, similar to many complex networks such as the Internet, social, or biological networks. We conducted simulations making use of the high performance computing resources available at the San Diego Supercomputer Center and demonstrated that this structural similarity is a consequence of the asymptotic equivalence between the large-scale growth dynamics of complex networks and causal networks. Our findings, published in Nature Scientific Reports, suggest that unexpectedly similar laws govern the dynamics of complex networks and spacetime in the universe.
- Popularity versus Similarity in Growing Networks, Nature, vol. 489, pp. 537--540, Sep 2012.
- Network Cosmology, Nature Scientific Reports, vol. 2, no. 793, Nov 2012.
- Replaying the Geometric Growth of Complex Networks and Application to the AS Internet, ACM SIGMETRICS Performance Evaluation Review, vol. 40, no. 3, pp. 104--106, Dec 2012.
- CAIDA continued hosting the UCSD Complex Network Seminar, Different Angles on Network Complexity, Engineering, and Science (DANCES). The seminar brought together researchers from diverse disciplines that study networks from different perspectives (physics, biology, sociology, computer science, ECE, math, bioengineering, cognitive science, etc.).
- PI D. Krioukov made 10 presentations on complex networks theory and models at various venues.
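The popularity-versus-similarity trade-off described above can be illustrated with a small growth simulation. This is a minimal sketch under stated assumptions, not the authors' code: it assumes that popularity is represented by a node's birth order (older nodes are more popular), that similarity is represented by angular distance on a circle, and that each new node links to the `m` existing nodes with the smallest product of birth order and angular distance. The names `grow_pso_network`, `angular_distance`, and the parameters `n`, `m`, and `seed` are hypothetical.

```python
import math
import random


def angular_distance(a, b):
    """Shortest angular separation between two angles on a circle."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def grow_pso_network(n, m, seed=0):
    """Grow a network in which each new node trades off popularity and similarity.

    Node t (t = 1..n) receives a random angle, then connects to the m existing
    nodes s < t that minimize s * angular_distance(s, t). A small s means an
    old (popular) node; a small angle means a similar node.
    """
    rng = random.Random(seed)
    angles = []  # angles[s - 1] is the angle of node s
    edges = []
    for t in range(1, n + 1):
        theta = rng.uniform(0.0, 2 * math.pi)
        ranked = sorted(
            range(1, t),
            key=lambda s: s * angular_distance(angles[s - 1], theta),
        )
        edges.extend((s, t) for s in ranked[:m])
        angles.append(theta)
    return angles, edges


if __name__ == "__main__":
    angles, edges = grow_pso_network(n=200, m=2)
    degree = {}
    for s, t in edges:
        degree[s] = degree.get(s, 0) + 1
        degree[t] = degree.get(t, 0) + 1
    print("max degree:", max(degree.values()))
```

In this sketch no node explicitly prefers high-degree targets; any skew in the resulting degrees comes from each new node minimizing its own product locally, which is the sense in which popularity preference emerges from optimization.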
Graduate student Chiara Orsini from the University of Pisa, Italy, worked at CAIDA for three months in 2012 analyzing building blocks of various network topologies. UCSD undergraduate students assisted CAIDA personnel with various tasks for network topology modeling via the Research Experience for Undergraduates (REU) program. In particular, Justin Cheng worked on illustrations and graphs for the Nature papers, and Jessica Ha helped coordinate the DANCES workshop series and other community outreach.
Funding Sources
Our complex network research received support from:
Our research on the future of the Internet is currently focused on two primary areas: 1) contributing to the NSF-funded Named Data Networking (NDN) network architecture project; and 2) studying the growing usage of the Internet Protocol version 6 (IPv6).
The main goal of this collaborative project is research, development, and testbed deployment of a new Internet architecture that replaces IP with a network layer routing directly on content names. By naming data instead of locations, this architecture aims to transition the Internet from its current reliance on "where" (addresses and hosts) to "what" (the content that users and applications care about).
- Co-PI k claffy led the Evaluation and Measurement team activities, while co-PI D. Krioukov participated in Theory and Routing/Forwarding team activities.
- We continued to maintain a local node on the national NDN testbed using the CCNX hub software.
- We host a desktop computer configured with NDN-based video and audio software (provided by UCLA Center for Research in Engineering, Media, and Performance) and participate in team experiments to test instrumented environments, participatory sensing, and media distribution via the NDN infrastructure.
- CAIDA researchers modeled the network growth on the NDN testbed. We assigned to the testbed gateways the hyperbolic coordinates of the ASes obtained in the hyperbolic mapping from our paper, Sustaining the Internet with Hyperbolic Mapping, and then simulated the network growth by connecting each node to a varying number of hyperbolically closest nodes. We measured the efficiency of greedy forwarding in the resulting networks and found it efficient and resilient with regard to node removals.
- The CAIDA team assumed overall management of the NDN project and coordination of activities among 10 participating institutions. We host and maintain the internal NDN project Wiki.
- CAIDA researchers attended the second NDN Project retreat held at Colorado State University. kc claffy blogged about this meeting in "The 2nd NDN Project Retreat".
- CAIDA hosted and participated in the third NDN Project retreat at the UC San Diego campus.
- CAIDA researchers participated in two NSF Future Internet Architecture Program Meetings and contributed to discussions of the four funded projects and the security features inherent to each of the architectures.
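The greedy forwarding experiment on hyperbolic coordinates mentioned above can be sketched as follows. This is an illustrative sketch, not the code used in the study: it assumes each node has polar coordinates (r, theta) in the hyperbolic plane of curvature -1, uses the hyperbolic law of cosines for distance, and forwards each packet to the neighbor closest to the destination, giving up when no neighbor is closer than the current node. The names `hyperbolic_distance`, `greedy_forward`, and `max_hops` are hypothetical.

```python
import math


def hyperbolic_distance(p, q):
    """Distance between polar points (r, theta) in the hyperbolic plane (K = -1)."""
    (r1, t1), (r2, t2) = p, q
    arg = (math.cosh(r1) * math.cosh(r2)
           - math.sinh(r1) * math.sinh(r2) * math.cos(t1 - t2))
    return math.acosh(max(arg, 1.0))  # clamp guards against rounding below 1


def greedy_forward(neighbors, coords, src, dst, max_hops=64):
    """Return the greedy path from src to dst, or None if forwarding gets stuck.

    `neighbors` maps a node to a list of adjacent nodes; `coords` maps a node
    to its (r, theta) coordinates.
    """
    path = [src]
    current = src
    while current != dst and len(path) <= max_hops:
        if not neighbors[current]:
            return None
        here = hyperbolic_distance(coords[current], coords[dst])
        nxt = min(neighbors[current],
                  key=lambda v: hyperbolic_distance(coords[v], coords[dst]))
        if hyperbolic_distance(coords[nxt], coords[dst]) >= here:
            return None  # local minimum: no neighbor is closer to dst
        path.append(nxt)
        current = nxt
    return path if current == dst else None


if __name__ == "__main__":
    coords = {"A": (5.0, 0.0), "B": (1.0, 0.1), "C": (5.0, 3.0), "D": (5.0, 1.5)}
    neighbors = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
    print(greedy_forward(neighbors, coords, "A", "C"))
```

Forwarding efficiency could then be estimated as the fraction of source-destination pairs for which `greedy_forward` returns a path, and resilience by repeating the measurement after removing nodes.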
This research was supported by the NSF grant (CNS-1039646) Named Data Networking.
CAIDA aims to measure the evolution of IPv6 in three dimensions: topology, traffic, and performance. Our goal is to uncover characteristics of current IPv6 deployment that can be used to infer how to advance IPv6 deployment, either via technical capability or policy development.
- We completed the fourth full calendar year of the IPv6 Topology Dataset collection and created new IPv6 AS Core Graph visualizations using April 2011 Ark data.
- We conducted the worldwide IPv6 Network Operator Survey to parameterize our future IPv6 modeling work.
- We studied the deployment of IPv6 at the Autonomous System (AS) level using historical BGP data and recent active measurements, and compared the properties of the IPv6 topology with those of the IPv4 topology. In "Measuring the Deployment of IPv6: Topology, Routing and Performance", we discuss observed trends in global IPv6 adoption. While most core Internet transit providers have deployed IPv6, edge networks are lagging. IPv6 deployment is stronger in Europe and the Asia-Pacific region, than in North America. The IPv6 topology is characterized by a single dominant player, Hurricane Electric, which appears in a large fraction of IPv6 AS paths, and is more dominant in IPv6 than the most dominant player in IPv4. Routing dynamics in the IPv6 topology are largely similar to those in IPv4, and churn in both networks grows at the same rate as the underlying topologies. We found that performance over IPv6 paths is comparable to that over IPv4 paths if the AS-level paths are the same, but can be much worse than IPv4 if the AS-level paths differ. We presented these results at the Internet Measurement Conference (IMC).
- We began to work on large-scale IPv6 alias resolution. We developed a method to overcome the challenges posed by the enormous address space and are currently in the process of validating it. As part of this activity, we began to explore a fingerprint-based technique for IPv6 alias resolution based on inducing fragmentation by routers. We demonstrated perfect alias resolution accuracy in a controlled environment, and on a small subset of the production IPv6 Internet for which we have ground truth. We plan to continue refining this technique to achieve large-scale Internet-wide IPv6 alias resolution.
- We worked on improving our IPv6 probing strategies. We implemented: 1) a set of scripts to perform parallel processing of IPv6 trace files; 2) computation of some simple statistics on those traces; 3) code to calculate various statistics on the execution times to study the possible gain due to parallelization when analyzing all IPv6 files; and 4) code to compute a histogram showing the number of responses per hop (which determines the number of resulting IP/AS links). We also developed an algorithm for finding the fully explored prefix bits, e.g., to determine which leading bits of subnets have been fully enumerated by a given set of target addresses.
- Measuring the Deployment of IPv6: Topology, Routing and Performance, Internet Measurement Conference, pp. 537--550, Nov 2012.
- We organized and hosted the 4th Active Internet Measurements (AIMS-4) workshop. The agenda included a session devoted to IPv6 research issues. The workshop report is available.
- CAIDA researchers made 5 presentations on IPv6 research at different venues.
- Matthew Luckie wrote a blog commentary, IPv6: What could be (but isn't yet) (June 2012).
Ongoing data releases
The following topology datasets are available:
- The IPv6 Topology Dataset from Ark measurements
Our IPv6 research received support from:
The high-level goal of this research is to create a scientific basis for modeling Internet interdomain interconnection and dynamics. We aim to understand the structure and dynamics of the Internet ecosystem from an economic perspective, capturing relevant interactions between network business relations, internetwork topology, routing policies, and resulting interdomain traffic flow.
- We developed a holistic cost model that operators can use to help evaluate the costs of various routing and peering decisions and for other network operations problems. In Towards a Cost Model for Network Traffic, we used real traffic data from a large carrier network and our model to show how network operators can significantly reduce the cost of carrying traffic in their networks by adjusting the routing for just a small fraction of total flows (and total traffic volume). These results are published in ACM SIGCOMM CCR.
- We developed GENESIS, a computational model of interdomain network formation that captures key factors influencing the network formation dynamics: highly skewed traffic matrix, policy-based routing, geographic co-location constraints, and the costs of transit/peering agreements. In GENESIS: An Agent-based Model of Interdomain Network Formation, Traffic Flow and Economics, we described this model and applied it to the "what if" question asking how the openness towards peering affects the resulting network in terms of topology, traffic flow and economics. We presented these results at INFOCOM.
- We measured the statistical properties of the interdomain traffic matrix (ITM). Our study Towards a Statistical Characterization of the Interdomain Traffic Matrix revealed a sparse ITM and that we can model the traffic sent by an AS as either the log-normal or Pareto distribution, depending on whether the corresponding traffic experiences congestion. We found correlations between different ASes mostly due to relatively few highly popular prefixes. We presented these results at the International Federation for Information Processing (IFIP) Networking Conference.
- We regularly responded to requests from government agencies and policymaking bodies for comments and positions that inform policy with the best available empirical data. kc claffy served on two ICANN advisory committees, RSSAC and SSAC, and continued on in her third year as a member of the FCC Technical Advisory Committee (TAC).
- Towards a Cost Model for Network Traffic, ACM SIGCOMM Computer Communication Review (CCR), vol. 42, no. 1, pp. 54--60, Jan 2012.
- GENESIS: An Agent-based Model of Interdomain Network Formation, Traffic Flow and Economics, IEEE Conference on Computer Communications (INFOCOM) 2012, pp. 1197--1205, Mar 2012.
- Towards a Statistical Characterization of the Interdomain Traffic Matrix, Networking 2012, pp. 111--123, May 2012.
- Workshop on Internet Economics (WIE2011) Report, ACM SIGCOMM Computer Communication Review (CCR), vol. 42, no. 2, pp. 110--114, Apr 2012.
- CAIDA and MIT co-organized the third interdisciplinary Workshop on Internet Economics (WIE): Definitions and Data hosted at UCSD. kc claffy blogged about it in Third Workshop on Internet Economics (WIE2012). The report is available.
- CAIDA researchers made 5 presentations about Internet economics and policies at various venues.
- kc claffy posted a blog commentary about the previous 2nd Workshop on Internet Economics WIE2011.
- kc claffy posted a short DNS related commentary on the CAIDA blog NASA's recent DNSSEC snafu and the checklist.
- Amogh Dhamdhere blogged about Twelve Years in the Evolution of the Internet Ecosystem.
Three UCSD undergraduate students contributed to CAIDA research in economics of the Internet via the Research Experience for Undergraduates (REU) program. Carlos Garibay collected and correlated economic data for ISPs, including income and revenue, number of end-user subscribers, and number of AS-level customers. Andre Gatorano worked on a program to crawl BitTorrent trackers and collect IP addresses of file-sharing clients, and mapped them to Autonomous Systems (ASes) to estimate the size of ASes in terms of the number of end users. Jonathan Yuan assisted the CAIDA webmaster with the preparation of web documents and infrastructure for the WIE workshop, as well as development on the AS Rank website illustrating the interconnections between Autonomous Systems (ASes) and organizations in the Internet.
Our economics research received support from:
- NSF grant (CNS-1017064) NetSE-Econ: The economics of transit and peering interconnections in the Internet and the REU Supplement
- a University Research Program gift from Cisco Systems, Inc.
Our goal is to develop new methods of analysis and aggregation of Internet measurement data from multiple available sources in order to shed light on various Internet security related events, including global connectivity disruptions due to political or catastrophic causes. Our methodology and findings can form the basis for automated early-warning detection systems for large-scale Internet outages.
- We demonstrated how unsolicited one-way Internet traffic (also called Internet background radiation -- IBR) can be used to analyze macroscopic Internet events that are unrelated to malware. In Extracting benefit from harm: using malware pollution to analyze the impact of political and geophysical events on the Internet, we examined two phenomena: country-level censorship of Internet communications and natural disasters (earthquakes). We introduced a new metric of local IBR activity based on the number of unique IP addresses per hour contributing to IBR. The advantage of this metric is that it is not affected by bursts of traffic from a few hosts. Although we have only scratched the surface, we are convinced that IBR traffic is an important ingredient for comprehensive monitoring, analysis, and possibly even detection of events unrelated to the IBR itself. When monitoring the impact of events such as natural disasters on network infrastructure, IBR reveals a view of events that is complementary to many existing measurement platforms based on BGP control-plane views or targeted active probing. These findings were published in ACM SIGCOMM CCR. J. Polterock posted a blog commentary Internet Censorship Revealed Through the Haze of Malware Pollution about this publication.
- Traffic classification technology has increased in relevance this decade, as it is now used in the definition and implementation of mechanisms for service differentiation, network design and engineering, security, accounting, advertising, and research. While traffic classification techniques are improving in accuracy and efficiency, the continued proliferation of different Internet application behaviors, in addition to growing incentives to disguise some applications to avoid filtering or blocking, are among the reasons that traffic classification remains one of many open problems in Internet research. In Issues and future directions in traffic classification, we reviewed recent achievements and discussed future directions in traffic classification, along with their trade-offs in applicability, reliability, and privacy. We outlined the persistently unsolved challenges in the field over the last decade, and suggested several strategies for tackling these challenges to promote progress in the science of Internet traffic classification. Our findings are published in IEEE Network.
- While analyzing unsolicited traffic reaching the UCSD Network Telescope, we serendipitously discovered a sophisticated botnet scanning event that covertly scanned the entire IPv4 space in about 12 days in February 2011. We carefully studied this event, including validating and cross-correlating our observations with other large data sets shared by other researchers. We discovered that the scan, conducted by the Sality botnet (one of the largest botnets ever identified by researchers), originated from approximately 3 million distinct IP addresses and employed a heavily coordinated and unusually covert scanning strategy to try to discover and compromise VoIP-related (SIP server) infrastructure. The revealed botnet behavior represents ominous advances in the evolution of modern malware: the use of more sophisticated stealth scanning strategies by millions of coordinated bots. We presented the measurement and analysis results at the Internet Measurement Conference (IMC).
- The fight against malware will benefit greatly from (and perhaps require) collaborative sharing of diverse large-scale security-related data sets. In Analysis of Internet-wide Probing using Darknets, we discuss both the technical and the data-sharing policy aspects of this challenge. This discussion was part of the BADGERS workshop.
- We developed visualizations of large-scale Internet events, such as a large region losing connectivity, or a stealth probe of the entire IPv4 address space, by applying a well-known technique in information visualization -- multiple coordinated views -- to Internet-specific data. We animated these coordinated views to study the temporal evolution of an event along different dimensions, including geographic spread, topological (address space) coverage, and traffic impact. This capability to simultaneously view multiple dimensions of events enables greater insight into their properties. These results were presented at the Workshop on Internet Visualization and submitted for publication in Computing.
- Extracting benefit from harm: using malware pollution to analyze the impact of political and geophysical events on the Internet, ACM SIGCOMM Computer Communication Review (CCR), vol. 42, no. 1, pp. 31--39, Jan 2012.
- Issues and future directions in traffic classification, IEEE Network, vol. 26, no. 1, pp. 35--40, Jan 2012.
- Analysis of Internet-wide Probing using Darknets, Building Analysis Datasets and Gathering Experience Returns for Security (BADGERS), Oct 2012.
- Analysis of a "/0" Stealth Scan from a Botnet, Internet Measurement Conference (IMC), Nov 2012.
- A Coordinated View of the Temporal Evolution of Large-scale Internet Events, Proceedings of the Workshop on Internet Visualization (WIV), Nov 2012. Accepted for publishing in Computing, Jan 2013.
- CAIDA researchers made more than 10 presentations on various aspects of this research at different venues.
- kc claffy and A. Dainotti attended the NSF SaTC PI meeting and presented a poster Detection and Analysis of Large-scale Internet Infrastructure Outages. A. Dainotti blogged about this meeting in CAIDA at the NSF Secure and Trustworthy Cyberspace (SaTC) Principal Investigators' Meeting.
- K. Benson attended the Student Workshop at ACM CoNEXT and presented a poster Gaining Insight into AS-level Outages through Analysis of Internet Background Radiation.
- Alistair King posted a blog entry Syria disappears from the Internet (December 2012).
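The IBR activity metric described in the highlights above (the number of unique IP addresses per hour contributing to IBR) can be illustrated with a short sketch. The Python below is a minimal illustration, not the analysis code used in the paper; the `(timestamp, source_ip)` record layout and the name `unique_sources_per_hour` are assumptions made for this example.

```python
from collections import defaultdict

def unique_sources_per_hour(packets):
    """Count distinct source IPs seen in each one-hour bin.

    `packets` is an iterable of (unix_timestamp, source_ip) pairs.
    This record layout is an assumption for illustration only.
    """
    sources = defaultdict(set)
    for ts, src in packets:
        hour_start = int(ts) - int(ts) % 3600  # start of the hour bin
        sources[hour_start].add(src)
    return {hour: len(ips) for hour, ips in sorted(sources.items())}

sample = [
    (0, "192.0.2.1"), (10, "192.0.2.1"), (20, "192.0.2.1"),  # burst from one host
    (30, "198.51.100.7"),
    (3600, "203.0.113.5"),
]
counts = unique_sources_per_hour(sample)
```

Because each source is counted once per hour regardless of how many packets it sends, the burst from the first host in the sample does not inflate the first hour's value, which is the property the metric was chosen for.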
Karen Benson, a UCSD graduate student, received training in analysis of large-scale Internet outage events in the course of her thesis work.
Funding Sources
Our support for security and stability research comes from:
- NSF award, (NSF CNS-1228994) Detection and analysis of large-scale Internet infrastructure outages
- NSF award, (NSF CNS-1059439) CRI-Telescope: A Real-time Lens into Dark Address Space of the Internet
- DHS contract, (DHS D07PC75579) Supporting Research and Development of Security Technologies through Network and Security Data Collection
Archipelago (Ark) is CAIDA's active measurement infrastructure. It aims to enable large-scale Internet measurements, while reducing the effort needed to develop, deploy and conduct sophisticated experiments. Ark represents a step toward a community-oriented measurement infrastructure as it allows CAIDA collaborators to run their vetted measurement tasks on a security-hardened distributed platform. The effort has three tasks: (1) adding new monitors in geographic and topological areas lacking coverage; (2) improving tools for processing topology data; (3) enhancing and developing software modules to support new experiments and validation. By lowering the cost to implement scientific Internet measurement experiments, Ark allows researchers to test and evaluate more ambitious, sophisticated and risky ideas. The resulting data enables a wide range of network modeling, simulation, analysis, and theoretical research activities, including historical Internet studies and evaluation of proposed future Internet architectures.
- By the end of 2012, we increased the number of vantage points to 64 Ark monitors deployed in 32 countries. We deployed/replaced 17 monitors in 2012, consisting of 11 1U servers we shipped out (10 to replace broken/obsolete servers), 4 1U servers provided by the sites, and 2 new Raspberry Pi monitors.
- We added more monitors with native IPv6 connectivity to the Ark infrastructure. As of the end of 2012, Ark had 28 monitors collecting the data on the emerging IPv6 global topology.
- In December 2012, we deployed the first Raspberry Pi-based Ark monitor in Limerick, Ireland. Although tiny, a Raspberry Pi offers a flexible Linux-powered programmable platform that will allow us to scale up the Ark infrastructure.
- We continued to improve our measurement techniques and analysis methodologies for alias resolution inferences. We released arkutil, a RubyGem containing various utility classes used by the Ark measurement infrastructure and the MIDAR alias resolution system.
- We continued support for the spoofer experiment (collaboration with Robert Beverly, NPS).
- CAIDA researchers published 4 papers and non-CAIDA researchers published 17 papers that used Ark data.
- We maintain a mailing list of researchers using Ark data and regularly email them with updates and important news about the data.
Jeffrey Syang, a UCSD undergraduate student, worked as a System Administrator Assistant on various Ark-related tasks as part of the Research Experience for Undergraduates (REU) program.
Funding Sources
Ark infrastructure receives support from:
- NSF grant (CNS-0958547) Internet Laboratory for Empirical Network Science (iLENS) and the REU Supplement
- NSF grant (OCI-0963073) IRNC-SP: Sustainable data-handling and analysis methodologies for the IRNC networks and the REU Supplement
- DHS Science and Technology Directorate contract (N66001-08-C-2029) Cybersecurity: Leveraging the Science and Technology of Internet Mapping for Homeland Security
We develop and maintain a passive data collection system known as the UCSD Network Telescope to study security-related events on the Internet by monitoring and analyzing unsolicited traffic arriving at a globally routed, underutilized /8 network. Network telescopes, which observe unsolicited Internet traffic sent to unassigned address space, are one of the few types of instrumentation that allow global visibility into a wide range of security-related events. In order to maximize the research utility of these data, we are working to enable near-real-time data access for vetted researchers. This innovative shift in network monitoring addresses several pervasive challenges in network traffic research: collection and storage, efficient curation, and privacy-protected sharing of large volumes of data.
- Making vast improvements in our software infrastructure for capture, processing, management, analysis, visualization and reporting on data collected with the UCSD Network Telescope, we developed and released Corsaro, a software suite for performing large-scale analysis of trace data. Although specifically designed for use with passive traces captured by darknets, this software can be used with any type of passive trace data.
- We released iatmon (Inter-Arrival Time Monitor), a freely available measurement and analysis tool that allows one to separate one-way traffic into clearly defined subsets: 14 source types and 10 inter-arrival-time based groups. In One-way Traffic Monitoring with iatmon we described how we used this tool to observe changes in one-way traffic at the UCSD Network Telescope over the first half of 2011. These findings were presented at PAM 2012.
- One-way Traffic Monitoring with iatmon, Proceedings of the Passive and Active Network Measurement Workshop (PAM), Mar 2012.
- DUST 2012 Workshop Report, ACM SIGCOMM Computer Communication Review (CCR), vol. 42, no. 5, pp. 49--53, Oct 2012.
- CAIDA organized and hosted the first international Workshop on Darkspace and UnSolicited Traffic Analysis (DUST 2012) providing a forum for discussion of the science, engineering, and policy challenges associated with darkspace and unsolicited traffic analysis. The report is available.
- Josh Polterock posted a blog commentary Targeted Serendipity: the Search for Storage describing how, with the help of NERSC facilities, we were able to preserve our historic darknet data (April 2012).
Tanja Zseby, a visiting scholar from Fraunhofer Institute for Open Communication Systems (FOKUS), Berlin, Germany, worked on darkspace traffic analysis and created educational data kits from the telescope data.
Three UCSD undergraduate students assisted CAIDA personnel with various tasks for the Network Telescope via the Research Experience for Undergraduates (REU) program. Sarah Larsen and Jeffrey Sang assisted the CAIDA System Administrator with various telescope infrastructure development tasks. Florence Yu assisted with community outreach and logistics for telescope infrastructure development tasks.
Our Network Telescope received support from:
- NSF grant (CNS-1059439) CRI-Telescope: A Real-time Lens into Dark Address Space of the Internet and the REU Supplement
- DHS contract (DHS D07PC75579) Supporting Research and Development of Security Technologies through Network and Security Data Collection
- DHS Science and Technology cooperative agreement (DHS FA8750-12-2-0326) Supporting Research and Development of Security Technologies through Network and Security Data Collection
The goal of the Department of Homeland Security project Protected Repository for the Defense of Infrastructure Against Cyber Threats (PREDICT) is to provide vetted researchers with current network operational data in a secure and controlled manner that respects the security, privacy, legal, and economic concerns of Internet users and network operators. CAIDA supports PREDICT goals as Data Provider and Data Host and also plays an advisory role in developing technical, legal, and practical aspects of PREDICT policies and procedures.
We collected, hosted, and provided current Internet Topology data to PREDICT:
- Internet Topology measured from Ark Platform (IPv4 Routed /24 Topology, IPv4 Routed /24 DNS Names, IPv6 Topology)
- Internet Topology Data Kits (ITDK)
We collected, hosted, and provided current Blackhole Address Space data to PREDICT:
- the UCSD near-real-time Network telescope Data
We hosted and provided legacy data sets to PREDICT:
- Internet Topology Measurements with skitter
- OC48 Peering Point IP Packet headers
- We completed and submitted to DHS the required deliverables: Project Management Plan, Hosting Infrastructure Description, monthly Financial Status reports, and the first quarterly Technical Report (for the 4th quarter of 2012).
- We received eight user requests via the PREDICT portal during 2012 and granted access to three of them.
- We completed the CAIDA Anonymized 2012 Internet Traces Dataset that contains traffic traces from our two monitors deployed on high-speed backbone links.
- We continued revisions of a government document proposing a framework for ethical guidelines in computer and information security research, based on the principles set forth in the 1979 Belmont Report, a seminal guide for ethical research in the biomedical and behavioral sciences. In our Menlo Report and its companion document, we described how Information and Communication Technology (ICT) research raises new challenges resulting from interactions between humans and communications technologies. We showed that a reinterpretation of ethical principles formulated in the Belmont Report (Respect for Persons, Beneficence, and Justice) and an additional principle, Respect for Law and Public Interest, can lay the groundwork for ethically defensible ICT research.
- We wrote and published a commentary, The Menlo Report, describing the primary challenges faced by the authors of the actual report, summarizing the report content, and suggesting the next steps we should take as a community. This commentary was published in IEEE Security & Privacy. Josh Polterock blogged about the Menlo report in The Menlo Report and its Companion bring ethical guidelines to ITC research.
- We had previously developed an ethical impact assessment (EIA) tool that provided a set of guiding questions to help researchers understand how to apply ethical principles and applications while conducting trusted and sustainable research on ICT. In A Refined Ethical Impact Assessment Tool and a Case Study of its Application, we discussed the various challenges encountered in applying this EIA, presented a new improved EIA framework representing our evolved understanding of the relevant ethical issues, and retrospectively applied the new EIA to an ethically challenging, original study in ICT research. These results were discussed at the Workshop on Ethics in Computer Security Research (WECSR 2012).
- A Refined Ethical Impact Assessment Tool and a Case Study of its Application, Workshop on Ethics in Computer Security Research (WECSR), Mar 2012.
- The Menlo Report, IEEE Security & Privacy, vol. 10, no. 2, pp. 71--75, Mar 2012.
- We participated in three PREDICT PI meetings (in January, May, and November) and contributed to developing PREDICT policies, data sharing, and marketing efforts.
- CAIDA researchers attended and made presentations at the Security at the Cyber Border: Exploring Cybersecurity for International Research Network Connections workshop.
- We co-organized and hosted the 5th CAIDA-WIDE-CASFI Joint Measurement Workshop. The two-day workshop covered miscellaneous research, technical, and data-sharing topics of mutual interest for CAIDA (USA), WIDE (Japan), and CASFI (South Korea) researchers.
Support for this work comes from:
- DHS Science and Technology cooperative agreement (DHS FA8750-12-2-0326) Supporting Research and Development of Security Technologies through Network and Security Data Collection
- DHS contract, (DHS D07PC75579) Supporting Research and Development of Security Technologies through Network and Security Data Collection
We continued development and refinement of an Internet Measurement Data Catalog (IMDC, or DatCat) -- an index of information (metadata) about data sets and their availability under various usage policies. The goal of this project is to confront a significant challenge in network science: reducing the cost of searching for data by organizing metadata about accessible Internet data sets into a single repository. In particular, the current software development aims to support a range of measurable benefits to cyberinfrastructure research: maximizing the re-use of existing Internet data; decreasing the time spent collecting redundant data; reducing the effort needed to start a new study; promoting validation and reproducibility of analyses and results; enabling longitudinal and cross-disciplinary studies of the Internet; and opening up new cross-domain areas of advanced networking research.
- We migrated the backend database from proprietary software (Oracle) to a completely open source solution.
- We updated and reorganized the internal table structure.
- We brought the new, streamlined DatCat catalog back online for public use.
- We simplified the web-based submission system.
- We implemented standalone publications/collections in DatCat.
- We continued to develop the public forums interface integrated with the IMDC to hold discussion of data sharing issues and to answer frequently asked questions regarding the IMDC and the information it contains.
Two REU-funded undergraduates assisted CAIDA personnel with DatCat research. Jesse Weinstein (UCSD) helped develop the web forum prototype and worked on converting the Oracle database to an open source database. Florence Yu assisted with coordination of the AIMS5 workshop where we presented DatCat.
Our DatCat activities are supported by an NSF grant (OCI-1127500) SDCI-DatCat: Metadata Management Software Tools to Support Cybersecurity Research and Development of Sustainable Cyberinfrastructure.
NSF International Research Network Connections Program (IRNC) has funded five ProNet (production network) projects to provide network connections linking U.S. research networks with peer networks in other parts of the world and five Special Projects that primarily address development, measurement, and monitoring of operational networks. The goal of our IRNC Special Project is to support the IRNC community measurement efforts by fostering and leading discussion of how to best make IRNC data and statistics available, and by adapting CAIDA measurement technologies for IRNC community needs.
We continued to extend our Archipelago measurement infrastructure to monitor IRNC sites.
- We deployed an Ark monitor at Qcell (Serrekunda, Gambia) using the contacts with two network engineers provided by the ProNet PI Steve Huter.
- We deployed an Ark monitor at Rede ANSP (Sao Paulo, Brazil) following a contact provided by the ProNet PI Julio Ibarra.
- We deployed an Ark monitor at AARnet (Perth, Australia) using the contacts provided by the ProNet PI David Lassner.
- We continued our work with IRNC ProNet PI Julio Ibarra at AMPATH to provide advice and assistance with the specification, purchase, configuration, and deployment of a passive monitor running CAIDA's CoralReef software suite to report on the 10GE link between AMPATH and Sao Paulo, Brazil (ANSP).
- We maintained an IRNC Wiki page serving as a collection point for IRNC related activities.
- CAIDA hosted a visit of several members of the Network Startup Resource Center (NSRC) including Dale Smith, Steve Huter, Phil Regnauld, and Hervey Allen. They asked for a one-page brochure they could use to help make a case to the other sites they visit for hosting an Ark monitor. In response to this request, we produced a brochure Why should my network host an Ark node?.
- CAIDA researchers attended and made presentations at the Security at the Cyber Border Workshop. PI kc claffy and J. Polterock provided feedback on the Workshop report, which describes the community's attempt to recognize and articulate technical and policy cybersecurity considerations related to international research network connections, as well as capture opportunities and challenges for those connections to foster cybersecurity research.
Two UCSD undergraduate students helped CAIDA personnel support the IRNC project. Sarah Larson assisted with system administration and maintenance, and Jessica Ha assisted with community outreach and logistics for our DUST workshop.
Support for CAIDA IRNC measurement activities comes from an NSF grant (OCI-0963073) IRNC-SP: Sustainable data-handling and analysis methodologies for the IRNC networks.
CAIDA's mission includes providing access to tools for Internet data collection, analysis and visualization to facilitate network measurement and management. However, CAIDA does not receive specific funding for support and maintenance of the tools we develop. Please check our home page for a complete listing and taxonomy of CAIDA tools.
2012 Tool Development
arkutil is a RubyGem containing various utility classes used by the Archipelago (Ark) measurement infrastructure and the MIDAR alias-resolution system. The latest version of arkutil (v0.12.1) was released on June 23, 2012.
Corsaro is a software suite for performing large-scale analysis of trace data. It was specifically designed to be used with passive traces captured by darknets, but the overall structure is generic enough to be used with any type of passive trace data. Corsaro v1.0.1 was released on October 19, 2012.
During the last decade, unsolicited one-way Internet traffic has been used to study malicious activity on the Internet. To make changes in composition of one-way traffic aggregates more detectable, we have developed iatmon (Inter-Arrival Time Monitor), a freely available measurement and analysis tool that allows one to separate one-way traffic into clearly-defined subsets. iatmon is a monitor that reads network trace data from a file, or a live interface, using the WAND group's libtrace library. It builds a hash table of source addresses for one-way traffic, and writes summary files describing the one-way sources.
In 2012, iatmon was publicly released. The paper One-way Traffic Monitoring with iatmon, which was presented at PAM 2012, introduces the iatmon tool in depth.
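To make the hash-table approach described above concrete, here is a minimal Python sketch of per-source bookkeeping for one-way traffic. It is not iatmon's implementation (the released tool is built on Ruby, C, and libtrace). The input layout, the function name `summarize_sources`, and the choice of median inter-arrival time as the summary statistic are all assumptions for illustration.

```python
from statistics import median

def summarize_sources(packets):
    """Build a per-source table and summarize inter-arrival times (IATs).

    `packets` is an iterable of (timestamp_seconds, source_ip) pairs in
    time order; this layout is hypothetical, chosen for the example.
    """
    table = {}  # source_ip -> packet count, last timestamp, list of IATs
    for ts, src in packets:
        entry = table.setdefault(src, {"count": 0, "last_seen": None, "iats": []})
        if entry["last_seen"] is not None:
            entry["iats"].append(ts - entry["last_seen"])
        entry["last_seen"] = ts
        entry["count"] += 1

    # One summary record per source, as a stand-in for a summary file line.
    return {
        src: {
            "packets": entry["count"],
            "median_iat": median(entry["iats"]) if entry["iats"] else None,
        }
        for src, entry in table.items()
    }

trace = [(0.0, "192.0.2.1"), (1.0, "198.51.100.7"), (3.0, "192.0.2.1"), (4.0, "192.0.2.1")]
summary = summarize_sources(trace)
```

A source with a single packet has no inter-arrival time, so its summary value is left as `None` here; how such sources are grouped in the real tool is described in the PAM 2012 paper rather than in this sketch.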
MIDAR (Monotonic ID-Based Alias Resolution) is a tool developed by CAIDA that builds on recent work in alias resolution using IP-ID time stamps, scaling related techniques to large-scale Internet topologies (millions of nodes) with greater precision and sensitivity. MIDAR provides an extremely precise ID comparison test based on monotonicity rather than proximity. It integrates multiple probing methods, multiple vantage points, and a novel sliding-window probe scheduling algorithm to increase scalability to millions of IP addresses. Experiments show that MIDAR's approach keeps the false positive rate low enough to achieve a high positive predictive value at Internet scale.
In 2012, we released the medium-scale and large-scale MIDAR alias resolution systems. These systems deliver capability to conduct alias resolution on medium size (<40,000 IP addresses) and Internet-scale (at least 2 million) sets of IP addresses with the MIDAR IP ID test, all MIDAR stages and probe methods, from either a single host or multiple hosts.
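The idea of an ID comparison test based on monotonicity can be illustrated with a simplified sketch. The Python below is not MIDAR's actual test. It assumes each address has been probed several times, yielding `(timestamp, ip_id)` samples, and that a shared counter advances by less than half of the 16-bit IP ID space between consecutive samples. The function names and the `max_step` threshold are hypothetical.

```python
ID_SPACE = 1 << 16  # the IPv4 Identification field is 16 bits wide

def is_monotonic(samples, max_step=ID_SPACE // 2):
    """Return True if, in time order, every IP ID moves forward by a step
    in (0, max_step) modulo 2**16, which tolerates counter wraparound."""
    ordered = sorted(samples)  # sort by timestamp
    for (_, prev_id), (_, cur_id) in zip(ordered, ordered[1:]):
        step = (cur_id - prev_id) % ID_SPACE
        if step == 0 or step >= max_step:
            return False
    return True

def may_share_counter(samples_a, samples_b):
    """Two addresses are alias candidates only if each series is monotonic
    on its own and the time-merged series is monotonic as well."""
    return (is_monotonic(samples_a)
            and is_monotonic(samples_b)
            and is_monotonic(samples_a + samples_b))

addr_a = [(0.0, 65530), (2.0, 4), (4.0, 20)]   # wraps past 65535
addr_b = [(1.0, 65535), (3.0, 10)]             # interleaves cleanly with addr_a
addr_c = [(1.0, 30000), (3.0, 30010)]          # independent counter

same_router = may_share_counter(addr_a, addr_b)
different_router = may_share_counter(addr_a, addr_c)
```

A proximity test would only ask whether IDs from two addresses are numerically close. The monotonicity requirement is stricter because a single out-of-order sample in the merged series rules the pair out.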
CAIDA Tools Download Report
The table below displays all CAIDA developed and currently supported tools distributed via our home page at http://www.caida.org/tools/ and the number of downloads of each version during 2012.
| Tool | Description | Downloads |
|---|---|---|
| arkutil | RubyGem containing utility classes used by the Archipelago measurement infrastructure and the MIDAR alias-resolution system. | 144 |
| Autofocus | Internet traffic reports and time-series graphs. | 362 |
| Chart::Graph | A Perl module that provides a programmatic interface to several popular graphing packages. Note: Chart::Graph is also available on CPAN.org. The numbers here reflect only downloads directly from caida.org, as download statistics from CPAN are not available. | 210 |
| CoralReef | Measures and analyzes passive Internet traffic monitor data. | 580 |
| Corsaro | Extensible framework for large-scale analysis of passive trace data. | 87 |
| Cuttlefish | Produces animated graphs showing diurnal and geographical patterns. | 136 |
| dnsstat | DNS traffic measurement utility. | 271 |
| iatmon | Ruby+C+libtrace analysis module that separates one-way traffic into clearly-defined subsets. | 92 |
| iffinder | Discovers IP interfaces belonging to the same router. | 362 |
| libsea | Scalable graph file format and graph library. | 250 |
| kapar | Graph-based IP alias resolution. | 138 |
| MIDAR | Identifies IPv4 addresses belonging to the same router (aliases) using shared monotonic IP ID counters. | 278 |
| Motu | Dealiases pairs of IPv4 addresses. | 81 |
| mper | Probing engine for conducting network measurements with ICMP, UDP, and TCP probes. | 294 |
| otter | Visualizes arbitrary network data. | 795 |
| plot-latlong | Plots points on geographic maps. | 202 |
| plotpaths | Displays forward traceroute path data. | 73 |
| rb-mperio | RubyGem for writing network measurement scripts in Ruby that use the mper probing engine. | 933 |
| RouterToAsAssignment | Assigns each router from a router-level graph of the Internet to its Autonomous System (AS). | 396 |
| sk_analysis_dump | A tool for analysis of traceroute-like topology data. | 82 |
| topostats | Computes various statistics on network topologies. | 230 |
| Walrus | Visualizes large graphs in three-dimensional space. | 1659 |
Data Collected in 2012
In 2012, CAIDA captured the following raw data:
- traceroutes probing IPv4 and IPv6 address space collected by the Archipelago infrastructure,
- passive traffic traces from the equinix-chicago and equinix-sanjose monitors connected to Tier1 ISP backbone links at Equinix facilities in Chicago, IL, and San Jose, CA,
- passive traffic traces from the UCSD Network Telescope
- Anonymized High-speed Internet Traces 2012,
- Anonymized Internet Traces on World IPv6 Day and World IPv6 Launch Day,
- Macroscopic Internet Topology Data Kit (ITDK): ITDK-2012-07,
- AS adjacencies (AS links) for IPv4 Routed /24 Topology Dataset
- DNS Names for IPv4 Routed /24 Topology Dataset,
- Telescope Sipscan Data Supplement
The table below lists the amount of data collected in our ongoing data collection operations.
| Data Type | First date | Last date | Total size [1] |
|---|---|---|---|
| Macroscopic Topology Measurements, IPv4 (Archipelago) | 2012-01-01 | 2012-12-31 | 668.6 GiB (2.1 TiB) |
| Macroscopic Topology Measurements, IPv6 (Archipelago) | 2012-01-01 | 2012-12-31 | 4.3 GiB (14.5 GiB) |
| Internet backbone Traces | 2012-01-20 | 2012-12-22 | 1.5 TiB (3.5 TiB) [3] |
| "Live" Network Telescope Data | 2012-01-01 | 2012-12-31 | 43.9 TiB (87.7 TiB) [2,4] |
| DNS Names for IPv4 Routed /24 Topology Dataset | 2012-01-01 | 2012-12-31 | 9.0 GiB (35 GiB) |
| AS Links for IPv4 Routed /24 Topology Dataset | 2012-01-01 | 2012-12-31 | 174.5 MiB (708.6 MiB) |
| Macroscopic Internet Topology Data Kit (ITDK) | 2012-07-07 | 2012-07-22 | 1.3 GiB (24.0 GiB) |
| DNS root/gTLD RTT Dataset | 2012-03-16 | 2012-12-31 | 10.1 MiB |

[1] The total size represents actual disk space. If data are stored in compressed form, the uncompressed size is given in brackets.
[2] The size of these datasets varies over time as we store and serve a rotating window of the last 30 days only. The specified numbers are totals captured over the whole year.
[3] This includes traces on April 18 during DITL 2012, and traces on 6 June 2012 (IPv6 Launch).
[4] This includes 126 GB of data collected during DITL 2012 and 126 GB on IPv6 Launch.
Datasets Distributed in 2012
CAIDA makes some datasets publicly available without restrictions to the user, while access to other datasets is restricted to academic researchers, CAIDA members, and government contractors with data access subject to certain safeguards designed to protect the privacy of monitored communications, to ensure security of network infrastructure, and to comply with the terms of our agreements with data providers.
Publicly Available Data
These datasets require that users agree to an Acceptable Use Policy, but are otherwise freely available.
| Dataset | Unique visitors (IPs) | Data Downloaded |
|---|---|---|
| AS Rank | 1568 | 852.5 MiB |
| AS Links (AS Adjacencies) | 556 | 15.6 GiB |
| AS Relationships | 469 | 1.9 GiB |
| Router Adjacencies | 212 | 460.9 MiB |
| AS Taxonomy | 145 | 66.3 MiB * |
| Witty Worm Dataset | 431 | 546.9 MiB |
| Code-Red Worms Dataset | 94 | 6.3 GiB |
| Telescope Sipscan Data Supplement | 25 | 4.2 GiB |

We count the volume of data downloaded per unique user per unique file, so if a user downloads a file multiple times, we only count that file once for that user. This significantly underestimates the total volume of data served through our dataservers.

* AS Taxonomy dataset is included in a mirror of the GA Tech main AS Taxonomy site, and thus does not represent all access to this data.
Restricted Access Data
These datasets require that users:
- be academic or government researchers, or join CAIDA;
- request an account and provide a brief description of their intended use of the data; and
- agree to an Acceptable Use Policy.
|Dataset||Unique visitors (usernames)||Data Downloaded *|
|Anonymized Internet Backbone Traces||237||17.3 TiB|
|Backscatter Datasets||24||1.2 TiB|
|(Raw Topology Traces from Archipelago infrastructure)||52||2.6 TiB|
|Raw Topology Traces (skitter)||19||916.6 GiB|
|DNS Names for IPv4 Routed /24 Topology Dataset||29||45.5 GiB|
|Macroscopic Internet Topology Data Kit||75||32.1 GiB|
|Witty Worm Dataset||15||190.4 GiB|
|DNS Root/gTLD server RTT Dataset||7||107.8 MiB|
|DDoS Attack Dataset||71||202.6 GiB|
|Telescope Datasets||134||444.2 GiB|
* We count the volume of data downloaded per unique user per unique file, so even if a user downloads a file 100 times, we only count that file once for that user. This methodology results in significantly undercounting the total volume of data served through our dataservers.
Restricted Access Data Requests
The following table shows some statistics about data requests for CAIDA datasets: the number of requests received, the number of users whose request was granted, and the number of users that actually downloaded data.
We received about 75 more requests in 2012 than in 2011, and approved 46 more requests for access to restricted datasets. Roughly 83.6% of the users who were granted access (438 of 524) went on to download data from our webservers.
|Dataset||Number of requests received||Number of users granted access||Number of users that accessed data|
|Anonymized Backbone and Peering Link Traces||353||267||229|
|Active Topology Trace Datasets||138||110||88|
|Backscatter Datasets||34||26||18|
|Witty Worm Dataset||18||14||11|
|DNS Root/gTLD server RTT Dataset||8||7||6|
|DDoS Attack Dataset||112||75||67|
|Telescope Datasets||40||25||19|
|Totals||703||524||438|
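As a cross-check, the totals row and the share of approved users who downloaded data can be re-derived from the per-dataset rows of the table above. The short Python sketch below does this; the variable and function names are illustrative only, and the numbers are copied directly from the table.

```python
# (requests received, users granted access, users that accessed data),
# copied from the table above.
REQUESTS = {
    "Anonymized Backbone and Peering Link Traces": (353, 267, 229),
    "Active Topology Trace Datasets": (138, 110, 88),
    "Backscatter Datasets": (34, 26, 18),
    "Witty Worm Dataset": (18, 14, 11),
    "DNS Root/gTLD server RTT Dataset": (8, 7, 6),
    "DDoS Attack Dataset": (112, 75, 67),
    "Telescope Datasets": (40, 25, 19),
}


def column_totals(rows):
    """Sum each of the three columns across all datasets."""
    return tuple(sum(col) for col in zip(*rows.values()))


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


received, granted, accessed = column_totals(REQUESTS)
approval_rate = percent(granted, received)   # share of requests approved
download_rate = percent(accessed, granted)   # share of approved users who downloaded

if __name__ == "__main__":
    print(received, granted, accessed)       # 703 524 438
    print(f"{download_rate:.2f}%")           # 83.59%
```

The same `percent` helper applied per row shows how the download rate varies by dataset, for example 229 of 267 approved users for the backbone traces.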
As part of our mission to investigate both practical and theoretical aspects of the Internet, CAIDA staff actively attend, contribute to, and host workshops relevant to research and better understanding of Internet infrastructure, trends, topology, routing, and security. Our web site has a complete listing of past and upcoming CAIDA Workshops.
On February 8-10, 2012, CAIDA hosted the 4th workshop on Active Internet Measurements (AIMS-4) supporting science and policy. This workshop continues the series of Internet Statistics and Metrics Analysis (ISMA) workshops that are held to discuss the current and future state of Internet measurement and analysis.
On May 14-15, 2012, CAIDA hosted the 1st International Workshop on Darkspace and UnSolicited Traffic Analysis (DUST 2012). The goal of the DUST workshop series is to bring together researchers, operators, and analysts interested in unsolicited traffic analysis, especially traffic destined to unassigned (dark) IP address space.
On August 1-2, 2012, the 5th CAIDA-WIDE-CASFI Joint Measurement Workshop was held in La Jolla, CA. This workshop continues a tradition of workshops supporting a three-way collaboration between researchers from CAIDA (USA), WIDE (Japan) and CASFI (South Korea). The main areas of the workshop are: Internet measurement projects, analysis of data to reveal current Internet trends, and data sharing across international boundaries. The workshop covered miscellaneous research and technical topics of mutual interest for CAIDA, WIDE and CASFI participants and brought various groups together to share their latest research.
On October 11-12, 2012, CAIDA hosted the 3rd NDN Project Retreat, which brought together participants from the collaborative Named Data Networking project to discuss the research, development, and testbed deployment of a new Internet architecture that replaces IP with a network layer that routes directly on content names.
ISC and CAIDA hosted the ISC/CAIDA Data Collaboration Workshop on October 22, 2012 in Baltimore, MD, co-located with the MAAWG 26th general meeting. The ISC/CAIDA Data Collaboration workshop is a venue for showcasing novel case studies of network and security data analysis and data sharing, discussing data synthesis techniques and technologies, and networking between data providers and recipients in research and operations. SIE data contributors can hear and discuss how their shared data is providing value, and attendees can learn how open-source SIE technology can be incorporated into collaborative research data collection and sharing efforts.
On December 12-13, 2012, CAIDA and Georgia Tech hosted their third interdisciplinary Workshop on Internet Economics (WIE). The goal of this workshop series is to provide a forum for researchers, commercial Internet facilities and service providers, technologists, economists, theorists, policy makers, and other stakeholders to empirically inform emerging regulatory and policy debates.
UCSD Complex Network Seminar - Different Angles on Network Complexity, Engineering, and Science (DANCES)
In October 2010, CAIDA began hosting the UCSD Complex Network Seminar: Different Angles on Network Complexity, Engineering, and Science (DANCES). The goal of this seminar series was to bring together junior and senior researchers studying networks, including UCSD graduate students and post-docs. The seminar fostered communication and collaboration among researchers from diverse disciplines that study networks from different perspectives (physics, biology, sociology, computer science, ECE, math, bioengineering, cognitive science, etc.), and provided young researchers a forum to practice their presentation and communication skills. The seminars continued in 2012 and drew attendees from a range of disciplines.
The following table contains the papers published by CAIDA for the calendar year of 2012. Please refer to Papers by CAIDA on our web site for a comprehensive listing of publications.
|Title||Venue|
|Replaying the Geometric Growth of Complex Networks and Application to the AS Internet||ACM SIGMETRICS Performance Evaluation Review|
|Analysis of a "/0" Stealth Scan from a Botnet||Internet Measurement Conference (IMC)|
|Network Cosmology||Nature Scientific Reports|
|Measuring the Deployment of IPv6: Topology, Routing and Performance||Internet Measurement Conference (IMC)|
|Analysis of Internet-wide Probing using Darknets||Building Analysis Datasets and Gathering Experience Returns for Security (BADGERS)|
|DUST 2012 Workshop Report||ACM SIGCOMM Computer Communication Review (CCR)|
|Popularity versus Similarity in Growing Networks||Nature|
|Named Data Networking (NDN) Project 2011 - 2012 Annual Report||Named Data Networking (NDN)|
|Peering Strategy Adoption by Transit Providers in the Internet: A Game Theoretic Approach||ACM SIGMETRICS Performance Evaluation Review|
|The Menlo Report: Ethical Principles Guiding Information and Communication Technology Research||U.S. Department of Homeland Security|
|The 4th Workshop on Active Internet Measurements (AIMS-4) Report||ACM SIGCOMM Computer Communication Review (CCR)|
|Border Gateway Protocol (BGP) and Traceroute Data Workshop Report||ACM SIGCOMM Computer Communication Review (CCR)|
|Measuring the Evolution of Internet Peering Agreements||IFIP Networking|
|Internet Topology Data Comparison||Cooperative Association for Internet Data Analysis (CAIDA)|
|Towards a Statistical Characterization of the Interdomain Traffic Matrix||IFIP Networking|
|Revealing MPLS tunnels obscured from traceroute||ACM SIGCOMM Computer Communication Review (CCR)|
|Workshop on Internet Economics (WIE2011) Report||ACM SIGCOMM Computer Communication Review (CCR)|
|One-way Traffic Monitoring with iatmon||Passive and Active Network Measurement Workshop (PAM)|
|The Menlo Report||IEEE Security & Privacy|
|Analysis of peering strategy adoption by transit providers in the Internet||NetEcon|
|GENESIS: An Agent-based Model of Interdomain Network Formation, Traffic Flow and Economics||IEEE Conference on Computer Communications (INFOCOM)|
|A Refined Ethical Impact Assessment Tool and a Case Study of its Application||Workshop on Ethics in Computer Security Research (WECSR)|
|Extracting benefit from harm: using malware pollution to analyze the impact of political and geophysical events on the Internet||ACM SIGCOMM Computer Communication Review (CCR)|
|Issues and future directions in traffic classification||IEEE Network|
|Towards a Cost Model for Network Traffic||ACM SIGCOMM Computer Communication Review (CCR)|
The following table contains the presentations and invited talks published by CAIDA for the calendar year of 2012. Please refer to Presentations by CAIDA on our web site for a comprehensive listing.
In 2012, CAIDA's web site continued to attract considerable attention from a broad, international audience.
The graph and table below present the monthly history of traffic to www.caida.org for 2012. To represent website traffic more accurately, these statistics exclude non-viewed traffic, such as traffic from spiders, crawlers, and other robots.
|Month||Unique visitors||Number of visits||Pages||Hits||Bandwidth|
|Jan 2012||28,626||52,604||193,506||767,633||38.33 GB|
|Feb 2012||30,785||56,297||186,165||867,278||47.81 GB|
|Mar 2012||31,257||59,346||211,540||935,852||42.68 GB|
|Apr 2012||40,818||68,010||220,749||902,318||37.92 GB|
|May 2012||28,216||52,621||216,732||815,110||46.20 GB|
|Jun 2012||24,943||47,176||165,475||615,695||31.27 GB|
|Jul 2012||23,545||47,244||235,320||651,363||33.22 GB|
|Aug 2012||23,651||46,284||166,655||575,877||34.01 GB|
|Sep 2012||26,427||48,695||176,948||719,467||30.67 GB|
|Oct 2012||37,544||63,471||222,794||950,639||47.41 GB|
|Nov 2012||31,207||54,595||209,570||800,392||36.95 GB|
|Dec 2012||27,742||50,117||175,742||633,596||29.29 GB|
CAIDA would like to acknowledge the many people who put forth great effort towards making CAIDA a success in 2012. The image below shows the functional organization of CAIDA. Please check the home page for more complete information about CAIDA staff.
CAIDA Functional Organization Chart
CAIDA thanks our 2012 sponsors, members, and collaborators.
The charts below depict funds received by CAIDA during the 2012 calendar year.
|Funding Source||Allocations||Percentage of Total|
|Gift & Members||250,760||6%|
Figure 1. Allocations by funding source received during 2012.
The charts below depict CAIDA's Annual Expense Report for the 2012 calendar year.
|LABOR||Salaries and benefits paid to staff and students|
|IDC||Indirect Costs paid to the University of California, San Diego including grant overhead (54.5%).|
|ENTERTAINMENT||Hosting official collaborators, visitors, and guests|
|SUPPLIES & EXPENSES||Computer supplies and equipment (including computer hardware and software costing less than $5000); telephone, Internet, and other IT services, and general office supplies.|
|WORKSHOP SUPPORT||Conference facilities, catering, guest travel support, and resources for workshops, conferences, PI meetings, and operational meetings.|
|CAIDA TRAVEL||CAIDA employee trips to conferences, PI meetings, operational meetings, and sites of remote monitor deployment.|
|EQUIPMENT||Computer hardware or other equipment costing more than $5000.|
|TRANSFERS||Exchange of funds between groups for recharge for IT desktop support and Oracle database services.|
|Program Area||Expenses||Percentage of Total|
|Supplies and Expenses||55,256||2%|
Figure 2. 2012 Operating Expenses
|Program Area||Expenses||Percentage of Total|
|CAIDA internal operations||4,399||0%|
Figure 3. 2012 Expenses by Program Area