CAIDA's Annual Report for 2012

A report on CAIDA research initiatives, project progress and results, data sets, tool development, publications, presentations, workshops, web site statistics, funding sources, and operating expenses for 2012.

Mission Statement: CAIDA investigates practical and theoretical aspects of the Internet, focusing on activities that:

  • provide insight into the macroscopic function of Internet infrastructure, behavior, usage, and evolution,
  • foster a collaborative environment in which data can be acquired, analyzed, and (as appropriate) shared,
  • improve the integrity of the field of Internet science,
  • inform science, technology, and communications public policies.

Executive Summary

This annual report covers CAIDA's activities in 2012, summarizing highlights from our research, infrastructure, data-sharing and outreach activities. Our research projects span Internet topology, routing, traffic, economics, future Internet architectures, and policy. Our infrastructure activities continue to support measurement-based studies of the Internet's core infrastructure, with focus on the health and integrity of the global Internet's topology, routing, addressing, and naming systems. In 2012 we increased our participation in future Internet research in two dimensions: measuring and modeling IPv6 deployment; and an expanded role (in management) of the Named Data Networking project, one of the NSF-funded future Internet architecture projects headed into its third year. We also began a project to study large-scale Internet outages via correlation of a variety of disparate sources of data.

We continued to make advances in Internet topology research, supported by our expanding Ark measurement infrastructure. We collect and share the largest Internet topology data sets (IPv4 and IPv6) available to academic researchers, and we share many aggregated annotated derivative data sets publicly, including rankings of ISPs annotated with (our estimated) business relationships between autonomous networks. Our topology measurement platform supports IPv6 -- by the end of 2012, 28 of our 64 Ark hosting sites provided IPv6 connectivity and topology measurements. Using our new alias resolution measurement system, which integrates and improves on the best available technology for IP address alias resolution, we collected, analyzed, processed and released our fifth published Internet Topology Data Kit (ITDK), reflecting measurements taken in July 2012. The July 2012 ITDK includes: two related router-level topologies; router-to-AS assignments; geographic location of each router; and DNS lookups of all observed IP addresses. After an extensive exercise with our validation data via AS Rank, we also spent many months this year overhauling our AS relationship inference algorithm so that we can add AS relationship annotations to future ITDKs.

On the theoretical side of topology research, we developed a new model in which new connections optimize certain trade-offs between popularity and similarity of nodes, instead of simply preferring popular nodes. This framework has a geometric interpretation in which popularity preference emerges from local optimization. In contrast to standard preferential attachment, our optimization framework accurately describes the large-scale evolution of technological (the Internet), social (trust relationships between people) and biological (Escherichia coli metabolic) networks, accurately predicting the probability of new links. We developed a related framework to support mapping a real network into a hyperbolic plane in a way congruent with this model of network growth. Perhaps our most exciting theoretical result was our discovery of structural similarity (power-law graph with strong clustering) between a causal network representing the large-scale structure of spacetime in our accelerating universe, and complex networks such as the Internet, social, or biological networks. We collaborated with supercomputing experts at SDSC to run HPC simulations that provided evidence that this structural similarity is due to asymptotic equivalence in large-scale growth dynamics of complex networks and spacetime in the universe.

In 2012 we continued applying our theoretical, empirical, and practical understandings of the Internet's evolution to the challenge of enabling dramatically more scalable global Internet routing. We continued our partnership in the Named Data Networking project, a 12-university collaboration funded by NSF's Future Internet Architecture (FIA) Research program to explore a generalization of the Internet architecture that allows naming not just communication endpoints, i.e., the source and destination IP addresses, but also data (content) itself. This approach shifts the focus from where -- addresses and hosts in today's Internet -- to what -- the content that users and applications care about. By naming data instead of locations, the new architecture transforms data into a first-class entity while addressing the known technical challenges of today's Internet: routing scalability, network security, content protection and privacy. In 2012 we investigated combinations of name-space structure and network topology that optimize the efficiency of NDN algorithms and participated in NDN testbed development and evaluation. The most challenging part of this routing research as it pertains to the Internet still lies ahead, and will require a broader community of engaged thinkers: application of these and other theoretical results to real-world Internet security, economic, and policy contexts.

A more immediate architectural need of the global Internet has inspired us to study the transition to IPv6. The two main lessons we can glean from the scant data available are: (i) architectural transitions - even those deemed minor but essential - are slow; (ii) the U.S. is behind other regions of the world in IPv6 deployment, and has not thus far invested in shedding quantitative light on this problem, despite making attempts to lightly nudge the market toward wider IPv6 adoption. With support from NSF, and in collaboration with the Naval Postgraduate School (Rob Beverly), we studied the deployment of IPv6 at the Autonomous System (AS) level using historical BGP data and recent active measurements, comparing IPv6 topology structure and adoption trends with those of IPv4. While most core Internet transit providers have deployed IPv6, edge networks are lagging. IPv6 deployment is stronger in Europe and the Asia-Pacific region than in North America. The IPv6 topology is characterized by a single dominant player, Hurricane Electric, which appears in a large fraction of IPv6 AS paths, and is more dominant in IPv6 than the most dominant player in IPv4. Routing dynamics in the IPv6 topology are largely similar to those in IPv4, and churn in both networks grows at the same rate as the underlying topologies. We found that performance over IPv6 paths is comparable to that over IPv4 paths if the AS-level paths are the same, but can be much worse than IPv4 if the AS-level paths differ. To support a separate but related modeling effort, we developed and conducted a survey of network operators to gauge IPv6 deployment patterns and plans. Based on the results we hope to refine and re-issue the survey next year, to inform and parameterize a predictive model of possible IPv6 future trajectories.

We made significant progress on our Internet economics research, one goal of which is to create a scientific basis for modeling Internet interdomain interconnection and dynamics, capturing relevant interactions between network business relations, internetwork topology, routing policies, and resulting interdomain traffic flow. We developed and published a holistic cost model that can help operators evaluate the costs of various routing and peering decisions, among other network operation costs. Using traffic data from a large carrier network, our model revealed how network operators can significantly reduce the cost of carrying traffic in their networks by adjusting routing for a small fraction of total traffic. We also published a paper on our GENESIS simulator, which embodies a computational model of interdomain network formation that captures key factors influencing network formation dynamics: highly skewed traffic matrix, policy-based routing, geographic co-location constraints, and the costs of transit/peering agreements. This simulator enables us to study "what-if" questions, such as asking how open peering strategies affect networks in terms of topology, traffic flow, and financial health. We continued studying available interdomain traffic matrix (ITM) data, and discovered that we can model the traffic sent by an AS as either a log-normal or Pareto distribution, depending on whether the corresponding traffic experiences congestion. We found correlations between different ASes mostly due to relatively few highly popular prefixes. We also held a successful interdisciplinary Workshop on Internet Economics (WIE) in December 2012 (co-hosted with MIT's Dave Clark), focused on reaching consensus on definitions and data to support a regulatory framework for a converged communications infrastructure.

In early 2012, we undertook a new three-year research effort to study large-scale Internet outages, under an exciting new Transition to Practice area of NSF's Secure and Trustworthy Cyberspace research program. In this project we are applying our successful results in studying the Egypt and Libya censorship-induced outages (our IMC2011 paper) to the development, testing, and deployment of an operational capability to detect, monitor, and characterize future episodes of Internet connectivity disruptions. In early 2012, we published a study that used the UCSD darknet traffic data to analyze other outages caused by geophysical disasters -- the earthquakes in Christchurch and Tohoku in 2011 -- which won an ACM SIGCOMM CCR award for one of the best CCR papers of 2012.

We continued to dedicate resources to support the infrastructure measurement and data sharing interests and needs of two U.S. federal agency programs: the National Science Foundation's International Research Network Connections (IRNC) program, and the Department of Homeland Security's Protected Repository of Data on Internet CyberThreats (PREDICT) data-sharing project (http://www.predict.org). The PREDICT funding provides essential support for deployment and operations of our measurement infrastructure, and the collection, curation, and sharing of several unprecedented data sets available to researchers (https://www.caida.org/data/). We are responsive to researcher requests for additional/different Internet data sets, to the extent possible given our resources. We have found that researchers from an increasing number of disciplines (physics, sociology, biology) are interested in our Internet measurement data sets and research results as they apply to the structure, behavior, and evolution of other complex networks.

Finally, as always, we engaged in a variety of tool development, data-sharing, and outreach activities, including web sites, 16 peer-reviewed papers, 5 technical and workshop reports, 47 presentations, 13 blog entries, 7 animations, 6 workshops, and a seminar series. Details of our activities are below. CAIDA's program plan for 2010-2013 is available at https://www.caida.org/about/progplan/progplan2010/. We will be creating a new 3-year program plan in 2013. Please do not hesitate to send comments or questions to info at caida dot org.


Research Areas


Topology of Internet and Complex Networks

CAIDA's long-term topology research agenda includes two strategic areas: 1) macroscopic Internet topology measurements and analysis in both IPv4 and IPv6 address space (see the "Exploring the evolution of IPv6" section below); and 2) topology modeling.

Macroscopic Internet Topology Measurements and Analysis

Goals

Our Internet topology research integrates strategic measurement and analysis capabilities, enabling us to provide comprehensive annotated Internet topology maps, as well as a platform capable of Internet infrastructure assessments.

Activities

  1. We continued large-scale macroscopic topology measurements using Archipelago (Ark), our state-of-the-art global measurement platform. We completed the fifth calendar year of the IPv4 Routed /24 Topology Dataset collection. We continued to collect automated DNS reverse lookups for IP addresses discovered by Ark probes and annotated the IPv4 topology data with corresponding DNS names. We also created new IPv4 AS Core Graph visualizations using April 2011 Ark data.
  2. Completing the work we started in 2011, we published "Internet-Scale IPv4 Alias Resolution with MIDAR". This paper documents our work on resolving observed interfaces into routers (alias resolution) based on similarities in IP ID time series produced by coordinated active probing of different IP addresses.
  3. Our improved measurement and analysis techniques culminated in collecting, analyzing, processing and releasing an Internet Topology Data Kit (ITDK) synthesizing the IPv4 Routed Topology Dataset and targeted alias resolution measurements conducted in July 2012. The July 2012 ITDK includes: two related router-level topologies; router-to-AS assignments; geographic location of each router; and DNS lookups of all observed IP addresses.
  4. We studied the impact on Internet topology measurements of Multiprotocol Label Switching (MPLS), which has been widely deployed in the Internet for over a decade. Some MPLS configurations may lead to false router-level link inferences in maps derived from traceroute data. In "Revealing MPLS tunnels obscured from traceroute", we introduced a measurement-based classification of MPLS tunnels, identifying tunnels where IP hops are revealed but not explicitly tagged as label switching routers, as well as tunnels that obscure the underlying path. In our data, we found that at least 30% of the paths we tested traverse an MPLS tunnel.
  5. In "Measuring the Evolution of Internet Peering Agreements", we explored the possibility of studying the full connectivity of a small set of ASes (usable monitors) that provide BGP feeds to Routeviews/RIPE collectors. We developed CMON, an algorithm to classify the links of the usable monitors as transit or settlement-free. We then classified the usable monitors as transit providers (large and small), content producers, content consumers and education/research networks. We highlighted key differences in the evolution of connectivity of the usable monitors, and measured transitions between different relationships for the same pair of ASes. We presented this work at the International Federation for Information Processing (IFIP) Networking Conference.
  6. We published a technical report, "Internet Topology Data Comparison", describing the results of our systematic comparison of Internet topologies derived from different data sources and characterizing the Internet at three granularities relevant to both research and operations of network infrastructure: IP address (interface), router, and Autonomous System.
  7. We launched the new version of our interactive CAIDA AS Rank website. It represents CAIDA's ranking of ASes and organizations inferred from BGP routing data collected by the Route Views Project and RIPE NCC and from WHOIS databases maintained by the Regional and National Internet Registries. In September 2012, after completing revisions and improvements of our algorithms for inferring business relationships between ASes, we also resumed regular production of our dataset of AS-level topologies annotated with these relationships.
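
The alias-resolution idea in item 2 can be illustrated with a minimal sketch. It assumes that two addresses on the same router draw their IP ID values from one shared, monotonically increasing 16-bit counter, so that interleaving their time-stamped samples should give a sequence that only moves forward (modulo wraparound). The function name, the sample format, and the half-range wraparound rule are assumptions made for illustration; this is not MIDAR's actual implementation.

```python
def shares_counter(samples_a, samples_b, modulus=65536):
    """Check whether two (timestamp, ip_id) series are consistent with one counter.

    Simplifying assumption: between consecutive samples the counter advances
    by at least 1 and by less than half of `modulus`, so a wraparound can be
    distinguished from a genuine decrease.
    """
    merged = sorted(samples_a + samples_b)  # interleave by timestamp
    for (_, prev_id), (_, next_id) in zip(merged, merged[1:]):
        step = (next_id - prev_id) % modulus
        if step == 0 or step >= modulus // 2:
            return False  # sequence went backwards or stalled
    return True


# Hypothetical samples: a and b interleave cleanly (including a wraparound),
# while c runs on an unrelated counter.
a = [(0, 65500), (2, 65530), (4, 20)]
b = [(1, 65510), (3, 5), (5, 40)]
c = [(1, 30000), (3, 30010), (5, 30020)]
print(shares_counter(a, b), shares_counter(a, c))
```

A test like this can only rule pairs out with confidence; two unrelated counters may pass by chance on a short series, which is why a practical system would need many samples and repeated rounds.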

Publications


Modeling of Complex Networks

Goals

The goal of this research is to derive network models capable of explaining common structural characteristics of large real networks, such as the Internet, social networks, and many other complex networks. In particular, we seek to understand how these characteristics affect the various processes that run on top of these networks, such as routing, information sharing, data distribution, searching, and epidemics. Understanding the mechanisms that shape the structure and drive the evolution of real networks can also have important applications in designing more efficient recommender and collaborative filtering systems, and for predicting missing and future links - an important problem in many disciplines.

Activities

  1. The principle that "popularity is attractive" underlies preferential attachment, which is a common explanation for the emergence of scaling in growing networks. Yet in "Popularity versus Similarity in Growing Networks" we showed that popularity is just one dimension of attractiveness; another dimension is similarity. We developed a novel framework, the Popularity x Similarity Optimization (PSO) model, in which new connections optimize certain trade-offs between popularity and similarity, instead of simply preferring popular nodes. The framework has a geometric interpretation in which popularity preference emerges from local optimization. In contrast to standard preferential attachment, our optimization framework accurately describes the large-scale evolution of technological (the Internet), social (trust relationships between people) and biological (Escherichia coli metabolic) networks, predicting the probability of new links with high precision. These results were published in Nature.
  2. We investigated the question of whether one can map a real network into the hyperbolic plane in a way congruent with the PSO model. We developed a systematic framework called HyperMap that accomplishes this task by replaying the network's geometric growth. In "Replaying the Geometric Growth of Complex Networks and Application to the AS Internet", we applied HyperMap to the Autonomous Systems (AS) topology of the real Internet and showed that it was able to identify communities of ASes that belong to the same geographic region. Moreover, our framework was able to predict missing links with high precision. We presented these results at the Workshop on Mathematical Performance Modeling and Analysis (MAMA) and published them in ACM SIGMETRICS Performance Evaluation Review.
  3. In "Network Cosmology", we showed that a causal network representing the large-scale structure of spacetime in our accelerating universe is a power-law graph with strong clustering, similar to many complex networks such as the Internet, social, or biological networks. We conducted simulations making use of the high performance computing resources available at the San Diego Supercomputer Center and demonstrated that this structural similarity is a consequence of the asymptotic equivalence between the large-scale growth dynamics of complex networks and causal networks. Our findings, published in Nature Scientific Reports, suggest that unexpectedly similar laws govern the dynamics of complex networks and spacetime in the universe.
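
The trade-off described in item 1 can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the published model in full: popularity is represented by birth order (older nodes are more popular), similarity by angular distance on a circle, and each new node links to the m existing nodes that minimize the product of the two. The function names and the parameters n, m, and seed are hypothetical.

```python
import math
import random


def angular_distance(a, b):
    """Shortest angle between two points on a circle, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def grow_pso(n, m, seed=0):
    """Grow a network of n nodes; each newcomer links to at most m older nodes."""
    rng = random.Random(seed)
    angles = {}   # node birth time -> angular (similarity) coordinate
    edges = []    # (newer node, older node) pairs
    for t in range(1, n + 1):
        angles[t] = rng.uniform(0, 2 * math.pi)
        # Rank older nodes by (birth time) x (angular distance): a small
        # product means an older (more popular) and/or more similar node.
        ranked = sorted(
            range(1, t),
            key=lambda s: s * angular_distance(angles[s], angles[t]),
        )
        edges.extend((t, s) for s in ranked[:m])
    return angles, edges


angles, edges = grow_pso(n=50, m=2)
print(len(edges))
```

Ranking by birth time alone would reduce this to a purely popularity-driven rule; the angular term is what lets a young but similar node win a link over an old but distant one.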

Publications

Outreach

  • CAIDA continued hosting the UCSD Complex Network Seminar, "Different Angles on Network Complexity, Engineering, and Science" (DANCES). The seminar brought together researchers from diverse disciplines that study networks from different perspectives (physics, biology, sociology, computer science, ECE, math, bioengineering, cognitive science, etc.).
  • PI D. Krioukov made 10 presentations on complex networks theory and models at various venues.

Student Involvement

Graduate student Chiara Orsini from the University of Pisa, Italy, worked at CAIDA for three months in 2012 analyzing building blocks of various network topologies. UCSD undergraduate students assisted CAIDA personnel with various tasks for network topology modeling via the Research Experience for Undergraduates (REU) program. In particular, Justin Cheng worked on illustrations and graphs for the Nature papers, and Jessica Ha helped coordinate the DANCES workshop series and other community outreach.

Funding Sources

Our complex network research received support from:

Future Internet Research

Our research on the future of the Internet is currently focused on two primary areas: 1) contributing to the NSF-funded Named Data Networking (NDN) network architecture project; and 2) studying the growing usage of the Internet Protocol version 6 (IPv6).

Named Data Networking (NDN)

Goals

The main goal of this collaborative project is research, development, and testbed deployment of a new Internet architecture that replaces IP with a network layer routing directly on content names. By naming data instead of locations, this architecture aims to transition the Internet from its current reliance on "where" (addresses and hosts) to "what" (the content that users and applications care about).

Activities

  1. Co-PI kc claffy led the Evaluation and Measurement team activities, while co-PI D. Krioukov participated in Theory and Routing/Forwarding team activities.
  2. We continued to maintain a local node on the national NDN testbed using the CCNX hub software.
  3. We host a desktop computer configured with NDN-based video and audio software (provided by UCLA Center for Research in Engineering, Media, and Performance) and participate in team experiments to test instrumented environments, participatory sensing, and media distribution via the NDN infrastructure.
  4. CAIDA researchers modeled the network growth on the NDN testbed. We assigned to the testbed gateways the hyperbolic coordinates of the ASes obtained by the hyperbolic mapping in our paper, "Sustaining the Internet with Hyperbolic Mapping", and then simulated the network growth by connecting each node to a varying number of hyperbolically closest nodes. We measured the efficiency of greedy forwarding in the resulting networks and found it efficient and resilient with regard to node removals.
  5. The CAIDA team assumed overall management of the NDN project and coordination of activities among 10 participating institutions. We host and maintain the internal NDN project Wiki.
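
The greedy forwarding experiment in item 4 can be sketched as follows. The distance function uses the standard hyperbolic law of cosines for points given in polar coordinates (r, theta); the forwarding rule, the function names, and the toy coordinates are assumptions made for illustration and do not reproduce the testbed simulation itself.

```python
import math


def hyperbolic_distance(p, q):
    """Distance between two points (r, theta) in the hyperbolic plane."""
    (r1, t1), (r2, t2) = p, q
    x = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(t1 - t2))
    return math.acosh(max(x, 1.0))  # clamp guards against rounding below 1


def greedy_path(coords, neighbors, src, dst):
    """Forward hop by hop to the neighbor closest to dst; None if stuck."""
    path, current = [src], src
    while current != dst:
        here = hyperbolic_distance(coords[current], coords[dst])
        best = min(
            neighbors[current],
            key=lambda n: hyperbolic_distance(coords[n], coords[dst]),
            default=None,
        )
        if best is None or hyperbolic_distance(coords[best], coords[dst]) >= here:
            return None  # local minimum: no neighbor is closer to dst
        path.append(best)
        current = best
    return path


# Hypothetical three-node topology: B sits near the center and acts as a hub.
coords = {"A": (5.0, 0.0), "B": (1.0, 0.5), "C": (5.0, 1.0)}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(greedy_path(coords, neighbors, "A", "C"))
```

Because each hop must strictly reduce the distance to the destination, the loop always terminates; the fraction of source-destination pairs for which it returns a path is one simple way to express forwarding efficiency.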

Outreach

  • CAIDA researchers attended the second NDN Project retreat held at Colorado State University. kc claffy blogged about this meeting in "The 2nd NDN Project Retreat".
  • CAIDA hosted and participated in the third NDN Project retreat at the UC San Diego campus.
  • CAIDA researchers participated in two NSF Future Internet Architecture Program Meetings and contributed to discussions of the four funded projects and the security features inherent to each of the architectures.

Funding Sources

This research was supported by NSF grant CNS-1039646, "Named Data Networking".


Exploring the evolution of IPv6: topology, performance, and traffic

Goals

CAIDA aims to measure the evolution of IPv6 in three dimensions: topology, traffic, and performance. Our goal is to uncover characteristics of current IPv6 deployment that can be used to infer how to advance IPv6 deployment, either via technical capability or policy development.

Activities

  1. We completed the fourth full calendar year of the IPv6 Topology Dataset collection and created new IPv6 AS Core Graph visualizations using April 2011 Ark data.
  2. We conducted the worldwide IPv6 Network Operator Survey to parameterize our future IPv6 modeling work.
  3. We studied the deployment of IPv6 at the Autonomous System (AS) level using historical BGP data and recent active measurements, and compared the properties of the IPv6 topology with those of the IPv4 topology. In "Measuring the Deployment of IPv6: Topology, Routing and Performance", we discuss observed trends in global IPv6 adoption. While most core Internet transit providers have deployed IPv6, edge networks are lagging. IPv6 deployment is stronger in Europe and the Asia-Pacific region than in North America. The IPv6 topology is characterized by a single dominant player, Hurricane Electric, which appears in a large fraction of IPv6 AS paths, and is more dominant in IPv6 than the most dominant player in IPv4. Routing dynamics in the IPv6 topology are largely similar to those in IPv4, and churn in both networks grows at the same rate as the underlying topologies. We found that performance over IPv6 paths is comparable to that over IPv4 paths if the AS-level paths are the same, but can be much worse than IPv4 if the AS-level paths differ. We presented these results at the Internet Measurement Conference (IMC).
  4. We began to work on large-scale IPv6 alias resolution. We developed a method to overcome the challenges posed by the enormous address space and are currently in the process of validating it. As part of this activity, we began to explore a fingerprint-based technique for IPv6 alias resolution based on inducing fragmentation by routers. We demonstrated perfect alias resolution accuracy in a controlled environment, and on a small subset of the production IPv6 Internet for which we have ground truth. We plan to continue refining this technique to achieve large-scale Internet-wide IPv6 alias resolution.
  5. We worked on improving our IPv6 probing strategies. We implemented: 1) a set of scripts to perform parallel processing of IPv6 trace files; 2) computation of some simple statistics on those traces; 3) code to calculate various statistics on the execution times to study the possible gain due to parallelization when analyzing all IPv6 files; and 4) code to compute a histogram showing the number of responses per hop (which determines the number of resulting IP/AS links). We also developed an algorithm for finding the fully explored prefix bits, i.e., for determining which leading bits of subnets have been fully enumerated by a given set of target addresses.
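
One possible reading of the "fully explored prefix bits" computation in item 5 is sketched below. The report does not describe the algorithm, so the interpretation is an assumption: given a covering prefix and a set of target addresses, find the largest k such that every one of the 2^k possible values of the k bits following the prefix appears in at least one target. The function name, the max_bits cap, and the example addresses (taken from the 2001:db8::/32 documentation range) are hypothetical.

```python
import ipaddress


def fully_explored_bits(prefix, targets, max_bits=16):
    """Count how many bits after `prefix` are exhaustively covered by `targets`."""
    net = ipaddress.ip_network(prefix)
    host_bits = net.max_prefixlen - net.prefixlen
    # Keep only targets that fall inside the prefix, as integers.
    inside = [
        int(addr)
        for addr in map(ipaddress.ip_address, targets)
        if addr in net
    ]
    explored = 0
    for k in range(1, min(max_bits, host_bits) + 1):
        shift = host_bits - k
        # Values taken by the k bits immediately after the prefix.
        seen = {(a >> shift) & ((1 << k) - 1) for a in inside}
        if len(seen) < (1 << k):
            break  # at least one k-bit combination was never probed
        explored = k
    return explored


targets = ["2001:db8::1", "2001:db8:4000::1", "2001:db8:8000::1", "2001:db8:c000::1"]
print(fully_explored_bits("2001:db8::/32", targets))
```

The max_bits cap keeps the set sizes bounded, since checking k bits requires up to 2^k distinct values to be present.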

Publications

Outreach

Ongoing data releases

The following topology datasets are available:

Funding Sources

Our IPv6 research received support from:


Economics and Policy

Goals

The high-level goal of this research is to create a scientific basis for modeling Internet interdomain interconnection and dynamics. We aim to understand the structure and dynamics of the Internet ecosystem from an economic perspective, capturing relevant interactions between network business relations, internetwork topology, routing policies, and resulting interdomain traffic flow.

Activities

  1. We developed a holistic cost model that operators can use to help evaluate the costs of various routing and peering decisions and for other network operations problems. In Towards a Cost Model for Network Traffic, we used real traffic data from a large carrier network and our model to show how network operators can significantly reduce the cost of carrying traffic in their networks by adjusting the routing for just a small fraction of total flows (and total traffic volume). These results are published in ACM SIGCOMM CCR.
  2. We developed GENESIS, a computational model of interdomain network formation that captures key factors influencing the network formation dynamics: highly skewed traffic matrix, policy-based routing, geographic co-location constraints, and the costs of transit/peering agreements. In GENESIS: An Agent-based Model of Interdomain Network Formation, Traffic Flow and Economics, we described this model and applied it to the "what if" question asking how the openness towards peering affects the resulting network in terms of topology, traffic flow and economics. We presented these results at INFOCOM.
  3. We measured the statistical properties of the interdomain traffic matrix (ITM). Our study, "Towards a Statistical Characterization of the Interdomain Traffic Matrix", revealed that the ITM is sparse and that we can model the traffic sent by an AS as either a log-normal or a Pareto distribution, depending on whether the corresponding traffic experiences congestion. We found correlations between different ASes mostly due to relatively few highly popular prefixes. We presented these results at the International Federation for Information Processing (IFIP) Networking Conference.
  4. We regularly responded to requests from government agencies and policymaking bodies for comments and positions that inform policy with the best available empirical data. kc claffy served on two ICANN advisory committees, RSSAC and SSAC, and continued on in her third year as a member of the FCC Technical Advisory Committee (TAC).

Publications

Outreach

Student Involvement

Three UCSD undergraduate students contributed to CAIDA research in economics of the Internet via the Research Experience for Undergraduates (REU) program. Carlos Garibay collected and correlated economic data for ISPs, including income and revenue, number of end-user subscribers, and number of AS-level customers. Andre Gatorano worked on a program to crawl BitTorrent trackers and collect IP addresses of file-sharing clients, and mapped them to Autonomous Systems (ASes) to estimate the size of ASes in terms of the number of end users. Jonathan Yuan assisted the CAIDA webmaster with the preparation of web documents and infrastructure for the WIE workshop, as well as development on the AS Rank website illustrating the interconnections between Autonomous Systems (ASes) and organizations in the Internet.

Funding Sources

Our economics research received support from:


Security and Stability

Goals

Our goal is to develop new methods of analysis and aggregation of Internet measurement data from multiple available sources in order to shed light on various Internet security-related events, including global connectivity disruptions due to political or catastrophic causes. Our methodology and findings can form the basis for automated early-warning detection systems for large-scale Internet outages.

Activities

  1. We demonstrated how unsolicited one-way Internet traffic (also called Internet background radiation -- IBR) can be used to analyze macroscopic Internet events that are unrelated to malware. In Extracting benefit from harm: using malware pollution to analyze the impact of political and geophysical events on the Internet, we examined two phenomena: country-level censorship of Internet communications and natural disasters (earthquakes). We introduced a new metric of local IBR activity based on the number of unique IP addresses per hour contributing to IBR. The advantage of this metric is that it is not affected by bursts of traffic from a few hosts. Although we have only scratched the surface, we are convinced that IBR traffic is an important ingredient for comprehensive monitoring, analysis, and possibly even detection of events unrelated to the IBR itself. When monitoring the impact of events such as natural disasters on network infrastructure, IBR reveals a view of events that is complementary to many existing measurement platforms based on BGP control-plane views or targeted active probing. These findings were published in ACM SIGCOMM CCR. J. Polterock posted a blog commentary Internet Censorship Revealed Through the Haze of Malware Pollution about this publication.
  2. Traffic classification technology has increased in relevance this decade, as it is now used in the definition and implementation of mechanisms for service differentiation, network design and engineering, security, accounting, advertising, and research. While traffic classification techniques are improving in accuracy and efficiency, the continued proliferation of different Internet application behaviors, in addition to growing incentives to disguise some applications to avoid filtering or blocking, are among the reasons that traffic classification remains one of many open problems in Internet research. In Issues and future directions in traffic classification, we reviewed recent achievements and discussed future directions in traffic classification, along with their trade-offs in applicability, reliability, and privacy. We outlined the persistently unsolved challenges in the field over the last decade, and suggested several strategies for tackling these challenges to promote progress in the science of Internet traffic classification. Our findings are published in IEEE Network.
  3. While analyzing unsolicited traffic reaching the UCSD Network Telescope, we serendipitously discovered a sophisticated botnet scanning event that covertly scanned the entire IPv4 space in about 12 days in February 2011. We carefully studied this event, including validating and cross-correlating our observations with other large data sets shared by other researchers. We discovered that the scan, conducted by the Sality botnet (one of the largest botnets ever identified by researchers), originated from approximately 3 million distinct IP addresses and employed a heavily coordinated and unusually covert scanning strategy to try to discover and compromise VoIP-related (SIP server) infrastructure. The revealed botnet behavior represents ominous advances in the evolution of modern malware: the use of more sophisticated stealth scanning strategies by millions of coordinated bots. We presented the measurement and analysis results at the Internet Measurement Conference (IMC).
  4. The fight against malware will benefit greatly from (and perhaps require) collaborative sharing of diverse large-scale security-related data sets. In Analysis of Internet-wide Probing using Darknets, we discuss both the technical and the data-sharing policy aspects of this challenge. This discussion was part of the BADGERS workshop.
  5. We developed visualizations of large-scale Internet events, such as a large region losing connectivity, or a stealth probe of the entire IPv4 address space, by applying a well-known technique in information visualization -- multiple coordinated views -- to Internet-specific data. We animated these coordinated views to study the temporal evolution of an event along different dimensions, including geographic spread, topological (address space) coverage, and traffic impact. This capability to simultaneously view multiple dimensions of events enables greater insight into their properties. These results were presented at the Workshop on Internet Visualization and submitted for publication in Computing.
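The unique-sources-per-hour metric described in item 1 above can be illustrated with a minimal sketch. This is not the code used in the study; the input format (pairs of Unix timestamp and source address) and the sample addresses are assumptions made for the example. It shows why the metric is insensitive to bursts: a host that sends a thousand packets in an hour still counts once.

```python
from collections import defaultdict
from datetime import datetime, timezone


def unique_sources_per_hour(packets):
    """Count distinct source IPs seen in each hour.

    `packets` is an iterable of (unix_timestamp, source_ip) pairs; this
    input format is an assumption made for the example.
    """
    sources_by_hour = defaultdict(set)
    for timestamp, source_ip in packets:
        hour = int(timestamp) // 3600 * 3600  # truncate to the start of the hour
        sources_by_hour[hour].add(source_ip)
    return {hour: len(sources) for hour, sources in sorted(sources_by_hour.items())}


# Hypothetical sample: one host sends a burst of 1000 packets in the first
# hour, two other hosts send one packet each, and one packet arrives later.
sample = [(i, "192.0.2.1") for i in range(1000)]
sample += [(10, "192.0.2.2"), (20, "198.51.100.7"), (3700, "192.0.2.1")]

counts = unique_sources_per_hour(sample)
for hour, n in counts.items():
    label = datetime.fromtimestamp(hour, tz=timezone.utc).strftime("%Y-%m-%d %H:00")
    print(label, n)
```

In the sample, the first hour contains 1002 packets but only three distinct sources, so the burst from a single host does not inflate the metric.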

Publications

Outreach

Student Involvement

Karyn Benson, a UCSD graduate student, received training in analysis of large-scale Internet outage events in the course of her thesis work.

Funding Sources

Our support for security and stability research comes from:

Infrastructure Projects


Archipelago (Ark)

Goals

Archipelago (Ark) is CAIDA's active measurement infrastructure. It aims to enable large-scale Internet measurements, while reducing the effort needed to develop, deploy and conduct sophisticated experiments. Ark represents a step toward a community-oriented measurement infrastructure as it allows CAIDA collaborators to run their vetted measurement tasks on a security-hardened distributed platform. The effort has three tasks: (1) adding new monitors in geographic and topological areas lacking coverage; (2) improving tools for processing topology data; (3) enhancing and developing software modules to support new experiments and validation. By lowering the cost to implement scientific Internet measurement experiments, Ark allows researchers to test and evaluate more ambitious, sophisticated and risky ideas. The resulting data enables a wide range of network modeling, simulation, analysis, and theoretical research activities, including historical Internet studies and evaluation of proposed future Internet architectures.

Activities

  1. By the end of 2012, we increased the number of vantage points to 64 Ark monitors deployed in 32 countries. We deployed/replaced 17 monitors in 2012, consisting of 11 1U servers we shipped out (10 to replace broken/obsolete servers), 4 1U servers provided by the sites, and 2 new Raspberry Pi monitors.
  2. We added more monitors with native IPv6 connectivity to the Ark infrastructure. As of the end of 2012, Ark had 28 monitors collecting the data on the emerging IPv6 global topology.
  3. In December 2012, we deployed the first Raspberry Pi-based Ark monitor in Limerick, Ireland. Although tiny, a Raspberry Pi offers a flexible Linux-powered programmable platform that will allow us to scale up the Ark infrastructure.
  4. We continued to improve our measurement techniques and analysis methodologies for alias resolution inferences. We released arkutil, a RubyGem containing various utility classes used by the Ark measurement infrastructure and the MIDAR alias resolution system.
  5. We continued support for the spoofer experiment (collaboration with Robert Beverly, NPS).

Outreach

  • CAIDA researchers published 4 papers and non-CAIDA researchers published 17 papers that used Ark data.
  • We maintain a mailing list of researchers using Ark data and regularly email them with updates and important news about the data.

Student Involvement

Jeffrey Syang, a UCSD undergraduate student, worked as a System Administrator Assistant on various Ark-related tasks as part of the Research Experience for Undergraduates (REU) program.

Funding Sources

Ark infrastructure receives support from:

UCSD Network Telescope

Goals

We develop and maintain a passive data collection system known as the UCSD Network Telescope to study security-related events on the Internet by monitoring and analyzing unsolicited traffic arriving at a globally routed, underutilized /8 network. Network telescopes, which observe unsolicited Internet traffic sent to unassigned address space, are one of the few types of instrumentation that allow global visibility into a wide range of security-related events. In order to maximize the research utility of these data, we are working to enable near-real-time data access to vetted researchers. This innovative shift in network monitoring addresses several pervasive challenges in network traffic research: collection and storage, efficient curation, and privacy-protected sharing of large volumes of data.

Activities

  1. Making vast improvements in our software infrastructure for capture, processing, management, analysis, visualization and reporting on data collected with the UCSD Network Telescope, we developed and released Corsaro, a software suite for performing large-scale analysis of trace data. Although specifically designed for use with passive traces captured by darknets, this software can be used with any type of passive trace data.
  2. We released iatmon (Inter-Arrival Time Monitor), a freely available measurement and analysis tool that allows one to separate one-way traffic into clearly defined subsets: 14 source types and 10 inter-arrival-time based groups. In One-way Traffic Monitoring with iatmon we described how we used this tool to observe changes in one-way traffic at the UCSD Network Telescope over the first half of 2011. These findings were presented at PAM 2012.

Publications

Outreach

Visiting Scholars

Tanja Zseby, a visiting scholar from Fraunhofer Institute for Open Communication Systems (FOKUS), Berlin, Germany, worked on darkspace traffic analysis and created educational data kits from the telescope data.

Student Involvement

Three UCSD undergraduate students assisted CAIDA personnel with various tasks for the network telescope via the Research Experience for Undergraduates (REU) program. Sarah Larsen and Jeffrey Sang assisted the CAIDA System Administrator with various telescope infrastructure development tasks. Florence Yu assisted with community outreach and logistics for telescope infrastructure development tasks.

Funding Sources

Our Network Telescope received support from:


Data Sharing for Security / PREDICT

Goals

The goal of the Department of Homeland Security project Protected Repository for the Defense of Infrastructure Against Cyber Threats (PREDICT) is to provide vetted researchers with current network operational data in a secure and controlled manner that respects the security, privacy, legal, and economic concerns of Internet users and network operators. CAIDA supports PREDICT goals as Data Provider and Data Host and also plays an advisory role in developing technical, legal, and practical aspects of PREDICT policies and procedures.

Activities

  1. We collected, hosted, and provided current Internet Topology data to PREDICT:
    • Internet Topology measured from Ark Platform (IPv4 Routed /24 Topology, IPv4 Routed /24 DNS Names, IPv6 Topology)
    • Internet Topology Data Kits (ITDK)
  2. We collected, hosted, and provided current Blackhole Address Space data to PREDICT:
    • the UCSD near-real-time Network telescope Data
  3. We hosted and provided legacy data sets to PREDICT:
    • Internet Topology Measurements with skitter
    • OC48 Peering Point IP Packet headers
  4. We completed and submitted to DHS the required deliverables: Project Management Plan, Hosting Infrastructure Description, monthly Financial Status reports, and the first quarterly Technical Report (for the 4th quarter of 2012).
  5. We received eight user requests via the PREDICT portal during 2012 and granted access to three of them.
  6. We completed the CAIDA Anonymized 2012 Internet Traces Dataset that contains traffic traces from our two monitors deployed on high-speed backbone links.
  7. We continued revisions of a government document proposing a framework for ethical guidelines in computer and information security research, based on the principles set forth in the 1979 Belmont Report, a seminal guide for ethical research in the biomedical and behavioral sciences. In our Menlo Report and its companion document, we described how Information and Communication Technology (ICT) research raises new challenges resulting from interactions between humans and communications technologies. We showed that a reinterpretation of ethical principles formulated in the Belmont Report (Respect for Persons, Beneficence, and Justice) and an additional principle, Respect for Law and Public Interest, can lay the groundwork for ethically defensible ICT research.
  8. We wrote and published a commentary, The Menlo Report, describing the primary challenges faced by the authors of the actual report, summarizing the report content, and suggesting the next steps we should take as a community. This commentary was published in IEEE Security & Privacy. Josh Polterock blogged about the Menlo Report in The Menlo Report and its Companion bring ethical guidelines to ITC research.
  9. We had previously developed an ethical impact assessment (EIA) tool that provided a set of guiding questions to help researchers understand how to apply ethical principles and applications while conducting trusted and sustainable research on ICT. In A Refined Ethical Impact Assessment Tool and a Case Study of its Application, we discussed the various challenges encountered in applying this EIA, presented a new improved EIA framework representing our evolved understanding of the relevant ethical issues, and retrospectively applied the new EIA to an ethically challenging, original study in ICT research. These results were discussed at the Workshop on Ethics in Computer Security Research (WECSR 2012).

Publications

Outreach

  • We participated in three PREDICT PI meetings (in January, May, and November) and contributed to developing PREDICT policies, data sharing, and marketing efforts.
  • CAIDA researchers attended and made presentations at the Security at the Cyber Border: Exploring Cybersecurity for International Research Network Connections workshop.
  • We co-organized and hosted the 5th CAIDA-WIDE-CASFI Joint Measurement Workshop. The two-day workshop covered miscellaneous research, technical, and data-sharing topics of mutual interest for CAIDA (USA), WIDE (Japan), and CASFI (South Korea) researchers.

Funding Sources

Support for this work comes from:


DatCat: Internet Measurement Data Catalog

Goals

We continued development and refinement of an Internet Measurement Data Catalog (IMDC, or DatCat) -- an index of information (metadata) about data sets and their availability under various usage policies. The goal of this project is to confront a significant challenge in network science: reducing the cost of searching for data by organizing metadata about accessible Internet data sets into a single repository. In particular, the current software development aims to support a range of measurable benefits to cyberinfrastructure research: maximizing the re-use of existing Internet data; decreasing the time spent collecting redundant data; reducing the effort needed to start a new study; promoting validation and reproducibility of analyses and results; enabling longitudinal and cross-disciplinary studies of the Internet; and opening up new cross-domain areas of advanced networking research.

Activities

  1. We migrated the backend database from proprietary software (Oracle) to a completely open source solution.
  2. We updated and reorganized the internal table structure.
  3. We brought the new, streamlined DatCat catalog back online for public use.
  4. We simplified the web-based submission system.
  5. We implemented standalone publications/collections in DatCat.
  6. We continued to develop the public forums interface integrated with the IMDC to hold discussion of data sharing issues and to answer frequently asked questions regarding the IMDC and the information it contains.

Student Involvement

Two REU-funded undergraduates assisted CAIDA personnel with DatCat research. Jesse Weinstein (UCSD) helped develop the web forum prototype and worked on converting the Oracle database to an open source database. Florence Yu assisted with coordination of the AIMS5 workshop where we presented DatCat.

Funding Sources

Our DatCat activities are supported by an NSF grant (OCI-1127500) SDCI-DatCat: Metadata Management Software Tools to Support Cybersecurity Research and Development of Sustainable Cyberinfrastructure.


Sustainable data-handling and analysis methodologies for the IRNC networks

Goals

The NSF International Research Network Connections Program (IRNC) has funded five ProNet (production network) projects to provide network connections linking U.S. research networks with peer networks in other parts of the world, and five Special Projects that primarily address development, measurement, and monitoring of operational networks. The goal of our IRNC Special Project is to support the IRNC community measurement efforts by fostering and leading discussion of how best to make IRNC data and statistics available, and by adapting CAIDA measurement technologies for IRNC community needs.

Activities

  1. We continued to extend our Archipelago measurement infrastructure to monitor IRNC sites.
    1. We deployed an Ark monitor at Qcell (Serrekunda, Gambia) using the contacts with two network engineers provided by the ProNet PI Steve Huter.
    2. We deployed an Ark monitor at Rede ANSP (Sao Paulo, Brazil) following a contact provided by the ProNet PI Julio Ibarra.
    3. We deployed an Ark monitor at AARnet (Perth, Australia) using the contacts provided by the ProNet PI David Lassner.
  2. We continued our work with IRNC ProNet PI Julio Ibarra at AMPATH to provide advice and assistance with the specification, purchase, configuration, and deployment of a passive monitor running CAIDA's CoralReef software suite to report on the 10GE link between AMPATH and Sao Paulo, Brazil (ANSP).
  3. We maintained an IRNC Wiki page serving as a collection point for IRNC related activities.

Outreach

  • CAIDA hosted a visit of several members of the Network Startup Resource Center (NSRC) including Dale Smith, Steve Huter, Phil Regnauld, and Hervey Allen. They asked for a one-page brochure they could use to help make a case to the other sites they visit for hosting an Ark monitor. In response to this request, we produced a brochure Why should my network host an Ark node?.
  • CAIDA researchers attended and made presentations at the Security at the Cyber Border Workshop. PI k claffy and J. Polterock provided feedback to the Workshop report that describes the community's attempt to recognize and articulate technical and policy cybersecurity considerations related to international research network connections, as well as capture opportunities and challenges for those connections to foster cybersecurity research.

Student Involvement

Two UCSD undergraduate students helped CAIDA personnel support the IRNC project. Sarah Larson assisted with system administration and maintenance, and Jessica Ha assisted with community outreach and logistics for our DUST workshop.

Funding Sources

Support for CAIDA IRNC measurement activities comes from an NSF grant (OCI-0963073) IRNC-SP: Sustainable data-handling and analysis methodologies for the IRNC networks.


Tools

CAIDA's mission includes providing access to tools for Internet data collection, analysis and visualization to facilitate network measurement and management. However, CAIDA does not receive specific funding for support and maintenance of the tools we develop. Please check our home page for a complete listing and taxonomy of CAIDA tools.

2012 Tool Development

arkutil

arkutil is a RubyGem containing various utility classes used by the Archipelago (Ark) measurement infrastructure and the MIDAR alias-resolution system. The latest version of arkutil (v0.12.1) was released on June 23, 2012.

Corsaro

Corsaro is a software suite for performing large-scale analysis of trace data. It was specifically designed to be used with passive traces captured by darknets, but the overall structure is generic enough to be used with any type of passive trace data. Corsaro v1.0.1 was released on October 19, 2012.

iatmon

During the last decade, unsolicited one-way Internet traffic has been used to study malicious activity on the Internet. To make changes in composition of one-way traffic aggregates more detectable, we have developed iatmon (Inter-Arrival Time Monitor), a freely available measurement and analysis tool that allows one to separate one-way traffic into clearly-defined subsets. iatmon is a monitor that reads network trace data from a file, or a live interface, using the WAND group's libtrace library. It builds a hash table of source addresses for one-way traffic, and writes summary files describing the one-way sources.
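As a rough illustration of the per-source bookkeeping described above, the following sketch builds a hash table keyed by source address and records inter-arrival times. It is not iatmon's implementation (iatmon is written in Ruby and C on top of libtrace); the input format, the function names, and the group thresholds are all hypothetical and chosen only to show the idea.

```python
from collections import defaultdict
from statistics import median


def summarize_sources(packets):
    """Build a per-source table of packet counts and inter-arrival times.

    `packets` is an iterable of (timestamp_seconds, source_ip) pairs in
    time order; the input format is an assumption for this example.
    """
    last_seen = {}
    gaps = defaultdict(list)
    counts = defaultdict(int)
    for timestamp, source in packets:
        counts[source] += 1
        if source in last_seen:
            gaps[source].append(timestamp - last_seen[source])
        last_seen[source] = timestamp
    return {
        source: {
            "packets": counts[source],
            "median_iat": median(gaps[source]) if gaps[source] else None,
        }
        for source in counts
    }


def iat_group(median_iat, short=1.0, long=60.0):
    """Place a source in a coarse group; the thresholds are hypothetical."""
    if median_iat is None:
        return "single-packet"
    if median_iat < short:
        return "short"
    if median_iat < long:
        return "medium"
    return "long"


# Hypothetical trace with three sources, labeled "a", "b", and "c".
trace = [(0.0, "a"), (0.25, "a"), (0.5, "a"), (5.0, "b"), (9.0, "c"), (125.0, "b")]
summary = summarize_sources(trace)
groups = {source: iat_group(info["median_iat"]) for source, info in summary.items()}
print(groups)
```

A real monitor would also need to decide which traffic is one-way and would use the tool's own source types and inter-arrival-time groups, which this sketch does not attempt to reproduce.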

In 2012, iatmon was publicly released. The paper One-way Traffic Monitoring with iatmon, which was presented at PAM 2012, introduces the iatmon tool in depth.

MIDAR

MIDAR (Monotonic ID-Based Alias Resolution) is a tool developed by CAIDA that builds on recent work in alias resolution using IP-ID time stamps, scaling related techniques to the size of large-scale Internet topologies (millions of nodes) with greater precision and sensitivity. MIDAR provides an extremely precise ID comparison test based on monotonicity rather than proximity. It integrates multiple probing methods, multiple vantage points, and a novel sliding-window probe scheduling algorithm to increase scalability to millions of IP addresses. Experiments show that MIDAR's approach is effective at minimizing the false positive rate sufficiently to achieve a high positive predictive value at Internet scale.

In 2012, we released the medium-scale and large-scale MIDAR alias resolution systems. These systems deliver capability to conduct alias resolution on medium size (<40,000 IP addresses) and Internet-scale (at least 2 million) sets of IP addresses with the MIDAR IP ID test, all MIDAR stages and probe methods, from either a single or multiple hosts.
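The idea of a comparison test "based on monotonicity rather than proximity" can be sketched as follows. This is a simplified illustration, not MIDAR's actual test: it assumes each address has been probed several times, that every response carries a 16-bit IP ID value, and that two interfaces sharing one counter should yield a time-ordered merged sequence that only moves forward (allowing for wraparound). The function names and the `max_step` bound are hypothetical.

```python
ID_SPACE = 1 << 16  # the IP ID header field is 16 bits wide


def forward_distance(earlier_id, later_id):
    """Steps needed to go from earlier_id to later_id on a wrapping 16-bit counter."""
    return (later_id - earlier_id) % ID_SPACE


def consistent_with_shared_counter(samples_a, samples_b, max_step=ID_SPACE // 2):
    """Return True if the merged samples look like one increasing counter.

    Each sample is a (timestamp, ip_id) pair. `max_step` is a hypothetical
    bound: a forward step larger than this is treated as a decrease.
    """
    merged = sorted(samples_a + samples_b)  # order all samples by time
    for (_, prev_id), (_, next_id) in zip(merged, merged[1:]):
        step = forward_distance(prev_id, next_id)
        if step == 0 or step > max_step:
            return False
    return True


# Hypothetical samples: a and b interleave cleanly across a counter wrap,
# while c jumps around relative to a.
a = [(0.0, 65530), (2.0, 4), (4.0, 20)]
b = [(1.0, 65534), (3.0, 11)]
c = [(1.0, 30000), (3.0, 100)]

print(consistent_with_shared_counter(a, b))
print(consistent_with_shared_counter(a, c))
```

A proximity test would ask only whether two ID values are close to each other; the monotonicity check instead asks whether the whole interleaved sequence could have come from a single counter, which is a stricter condition. A production system would also have to account for counter velocity and probe timing, which this sketch ignores.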

CAIDA Tools Download Report

The table below lists all CAIDA-developed and currently supported tools distributed via our home page at https://catalog.caida.org/software, along with the number of downloads of each tool during 2012.

Tool Description Downloads
arkutil RubyGem containing utility classes used by the Archipelago measurement infrastructure and the MIDAR alias-resolution system. 144
Autofocus Internet traffic reports and time-series graphs. 362
Chart::Graph A Perl module that provides a programmatic interface to several popular graphing packages. Note: Chart::Graph is also available on CPAN.org. The numbers here reflect only downloads directly from caida.org, as download statistics from CPAN are not available. 210
CoralReef Measures and analyzes passive Internet traffic monitor data. 580
Corsaro Extensible framework for large-scale analysis of passive trace data. 87
Cuttlefish Produces animated graphs showing diurnal and geographical patterns. 136
dnsstat DNS traffic measurement utility. 271
iatmon Ruby+C+libtrace analysis module that separates one-way traffic into clearly-defined subsets. 92
iffinder Discovers IP interfaces belonging to the same router. 362
libsea Scalable graph file format and graph library. 250
kapar Graph-based IP alias resolution. 138
MIDAR Identifies IPv4 addresses belonging to the same router (aliases) using shared monotonic IP ID counters. 278
Motu Dealiases pairs of IPv4 addresses. 81
mper Probing engine for conducting network measurements with ICMP, UDP, and TCP probes. 294
otter Visualizes arbitrary network data. 795
plot-latlong Plots points on geographic maps. 202
plotpaths Displays forward traceroute path data. 73
rb-mperio RubyGem for writing network measurement scripts in Ruby that use the mper probing engine. 933
RouterToAsAssignment Assigns each router from a router-level graph of the Internet to its Autonomous System (AS). 396
sk_analysis_dump A tool for analysis of traceroute-like topology data. 82
topostats Computes various statistics on network topologies. 230
Walrus Visualizes large graphs in three-dimensional space. 1659

Data

Data Collected in 2012

In 2012, CAIDA captured the following raw data:

We curated and archived several datasets from these data:
  • During the Day In The Life of the Internet (DITL 2012 on April 18) we collected a one-hour passive trace on high-speed Internet backbone links (distributed as part of the CAIDA Anonymized High-speed Internet Traces 2012), and retained the "live" data collected on the UCSD Network Telescope as well.
  • The table below lists the amount of data collected in our ongoing data collection operations.

    Data Type First date Last date Total size [1]
    Macroscopic Topology Measurements, IPv4 (Archipelago) 2012-01-01 2012-12-31 668.6 GiB (2.1 TiB)
    Macroscopic Topology Measurements, IPv6 (Archipelago) 2012-01-01 2012-12-31 4.3 GiB (14.5 GiB)
    Internet backbone Traces 2012-01-20 2012-12-22 1.5 TiB (3.5 TiB) [3]
    "Live" Network Telescope Data 2012-01-01 2012-12-31 43.9 TiB (87.7 TiB) [2][4]
    DNS Names for IPv4 Routed /24 Topology Dataset 2012-01-01 2012-12-31 9.0 GiB (35 GiB)
    AS Links for IPv4 Routed /24 Topology Dataset 2012-01-01 2012-12-31 174.5 MiB (708.6 MiB)
    Macroscopic Internet Topology Data Kit (ITDK) 2012-07-07 2012-07-22 1.3 GiB (24.0 GiB)
    DNS root/gTLD RTT Dataset 2012-03-16 2012-12-31 10.1 MiB

    [1] The total size represents actual disk space. If data are stored in compressed form, the uncompressed size is given in parentheses.
    [2] The size of these datasets varies over time as we store and serve a rotating window of the last 30 days only. The specified numbers are totals captured over the whole year.
    [3] This includes traces on April 18 during DITL 2012, and traces on 6 June 2012 (IPv6 Launch).
    [4] This includes 126 GB of data collected during DITL 2012 and 126 GB on IPv6 Launch.

    Datasets Distributed in 2012

    CAIDA makes some datasets publicly available without restrictions to the user, while access to other datasets is restricted to academic researchers, CAIDA members, and government contractors with data access subject to certain safeguards designed to protect the privacy of monitored communications, to ensure security of network infrastructure, and to comply with the terms of our agreements with data providers.

    • Publicly Available Data

      These datasets require that users agree to an Acceptable Use Policy, but are otherwise freely available.

    Dataset Unique visitors (IPs) Data Downloaded
    AS Rank 1568 852.5 MiB
    AS Links (AS Adjacencies) 556 15.6 GiB
    AS Relationships 469 1.9 GiB
    Router Adjacencies 212 460.9 MiB
    AS Taxonomy 145 66.3 MiB *
    Witty Worm Dataset 431 546.9 MiB
    Code-Red Worms Dataset 94 6.3 GiB
    Telescope Sipscan Data Supplement 25 4.2 GiB
    We count the volume of data downloaded per unique user per unique file, so if a user downloads a file multiple times, we only count that file once for that user. This significantly underestimates the total volume of data served through our dataservers.
    * AS Taxonomy dataset is included in a mirror of the GA Tech main AS Taxonomy site, and thus does not represent all access to this data.
    • Restricted Access Data

      These datasets require that users:

      • be academic or government researchers, or join CAIDA;
      • request an account and provide a brief description of their intended use of the data; and
      • agree to an Acceptable Use Policy.
    Dataset Unique visitors (usernames) Data Downloaded *
    Anonymized Internet Backbone Traces 237 17.3 TiB
    Backscatter Datasets 24 1.2 TiB
    Raw Topology Traces from Archipelago infrastructure 52 2.6 TiB
    Raw Topology Traces (skitter) 19 916.6 GiB
    DNS Names for IPv4 Routed /24 Topology Dataset 29 45.5 GiB
    Macroscopic Internet Topology Data Kit 75 32.1 GiB
    Witty Worm Dataset 15 190.4 GiB
    DNS Root/gTLD server RTT Dataset 7 107.8 MiB
    DDoS Attack Dataset 71 202.6 GiB
    Telescope Datasets 134 444.2 GiB
    * We count the volume of data downloaded per unique user per unique file, so even if a user downloads a file 100 times, we only count that file once for that user. This methodology results in significantly undercounting the total volume of data served through our dataservers.
    • Restricted Access Data Requests

      The following table shows some statistics about data requests for CAIDA datasets: the number of requests received, the number of users whose request was granted, and the number of users that actually downloaded data.

      We received about 75 more requests in 2012 than in 2011, and approved 46 more requests for access to restricted datasets. About 84% (438 of 524) of the users who were granted access actually accessed our web servers to download data.

    Dataset Number of requests received Number of users granted access Number of users that accessed data
    Anonymized Backbone and Peering Link Traces 353 267 229
    Active Topology Trace Datasets 138 110 88
    Backscatter-2008 Dataset 34 26 18
    Witty Worm Dataset 18 14 11
    DNS Root/gTLD server RTT Dataset 8 7 6
    DDoS Attack Dataset 112 75 67
    Telescope Datasets 40 25 19
    Totals 703 524 438
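The counting rule used for the download tables (each file counted once per user) and the access rate derived from the request table can be checked with a short sketch. The download log records below are hypothetical and only illustrate the rule; the request counts are copied from the table above.

```python
def deduplicated_volume(download_log):
    """Sum file sizes, counting each (user, file) pair only once.

    `download_log` is an iterable of (user, filename, size_bytes) records;
    this record layout is an assumption for the example.
    """
    seen = set()
    total = 0
    for user, filename, size_bytes in download_log:
        if (user, filename) not in seen:
            seen.add((user, filename))
            total += size_bytes
    return total


# Hypothetical log: one user fetches the same file three times.
log = [
    ("alice", "trace-a.pcap.gz", 100),
    ("alice", "trace-a.pcap.gz", 100),
    ("alice", "trace-a.pcap.gz", 100),
    ("bob", "trace-a.pcap.gz", 100),
    ("bob", "trace-b.pcap.gz", 50),
]
counted = deduplicated_volume(log)
served = sum(size for _, _, size in log)
print(counted, served)

# Rows from the request table above: (received, granted, accessed).
requests = {
    "Anonymized Backbone and Peering Link Traces": (353, 267, 229),
    "Active Topology Trace Datasets": (138, 110, 88),
    "Backscatter-2008 Dataset": (34, 26, 18),
    "Witty Worm Dataset": (18, 14, 11),
    "DNS Root/gTLD server RTT Dataset": (8, 7, 6),
    "DDoS Attack Dataset": (112, 75, 67),
    "Telescope Datasets": (40, 25, 19),
}
totals = tuple(sum(column) for column in zip(*requests.values()))
access_rate = 100 * totals[2] / totals[1]
print(totals, round(access_rate, 2))
```

In the hypothetical log the counted volume is lower than the volume actually served, which is the undercounting the table notes describe. The column sums reproduce the Totals row, and the ratio of users who accessed data to users granted access matches the percentage quoted above.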

    Workshops

    As part of our mission to investigate both practical and theoretical aspects of the Internet, CAIDA staff actively attend, contribute to, and host workshops relevant to research and better understanding of Internet infrastructure, trends, topology, routing, and security. Our web site has a complete listing of past and upcoming CAIDA Workshops.

    ISMA - 4th Workshop on Active Internet Measurements (AIMS-4)

    On February 8-10, 2012, CAIDA hosted the 4th workshop on Active Internet Measurements (AIMS-4) supporting science and policy. This workshop continues the series of Internet Statistics and Metrics Analysis (ISMA) workshops that are held to discuss the current and future state of Internet measurement and analysis.

    1st International Workshop on Darkspace and UnSolicited Traffic Analysis (DUST)

    On May 14-15, 2012, CAIDA hosted the 1st International Workshop on Darkspace and UnSolicited Traffic Analysis (DUST 2012). The goal of the DUST workshop series is to bring together researchers, operators, and analysts interested in unsolicited traffic analysis, especially traffic destined to unassigned (dark) IP address space.

    CAIDA-WIDE-CASFI Workshop

    On August 1-2, 2012, the 5th CAIDA-WIDE-CASFI Joint Measurement Workshop was held in La Jolla, CA. This workshop continues a tradition of workshops supporting a three-way collaboration between researchers from CAIDA (USA), WIDE (Japan) and CASFI (South Korea). The main areas of the workshop are: Internet measurement projects, analysis of data to reveal current Internet trends, and data sharing across international boundaries. The workshop covered miscellaneous research and technical topics of mutual interest for CAIDA, WIDE and CASFI participants and brought various groups together to share their latest research.

    NDN Project Retreat

    On October 11-12, 2012, CAIDA hosted the 3rd NDN Project Retreat, which brought together participants from the collaborative Named Data Networking project to discuss the research, development, and testbed deployment of a new Internet architecture that replaces IP with a network layer that routes directly on content names.

    CAIDA/ISC Data Collaboration Workshop

    ISC and CAIDA hosted the ISC/CAIDA Data Collaboration Workshop on Oct 22, 2012 in Baltimore, MD, co-located with the MAAWG 26th general meeting. The ISC/CAIDA Data Collaboration workshop is a venue for showcasing novel case studies of network and security data analysis and data sharing, discussing data synthesis techniques and technologies, and networking between data providers and recipients in research and operations. SIE data contributors can hear and discuss how their shared data is providing value, and attendees can learn how open-source SIE technology can be incorporated into collaborative research data collection and sharing efforts.

    Workshop on Internet Economics

    On December 12-13, 2012, CAIDA and Georgia Tech hosted their third interdisciplinary Workshop on Internet Economics (WIE). The goal of this workshop series is to provide a forum for researchers, commercial Internet facilities and service providers, technologists, economists, theorists, policy makers, and other stakeholders to empirically inform emerging regulatory and policy debates.

    UCSD Complex Network Seminar - Different Angles on Network Complexity, Engineering, and Science (DANCES)

    In October 2010, CAIDA began hosting the UCSD Complex Network Seminar: Different Angles on Network Complexity, Engineering, and Science (DANCES). The goal of this seminar series was to bring together junior and senior researchers studying networks, including UCSD graduate students and post-docs. The seminar fostered communication and collaboration among researchers from diverse disciplines that study networks from different perspectives (physics, biology, sociology, computer science, ECE, math, bioengineering, cognitive science, etc.), and provided young researchers a forum to practice their presentation and communication skills. The seminars continued in 2012 and drew attendees from a range of disciplines.

    Publications

    The following table contains the papers published by CAIDA for the calendar year of 2012. Please refer to Papers by CAIDA on our web site for a comprehensive listing of publications.

    Year Month Author(s) Title Publication
    2012 Dec
    1. Benson, Karyn
    2. Dainotti, Alberto
    3. claffy, kc
    4. Aben, Emile
    Gaining Insight into AS-level Outages through Analysis of Internet Background Radiation ACM SIGCOMM Conference on emerging Networking EXperiments and Technologies (CoNEXT) Student Workshop
    2012 Dec
    1. Papadopoulos, Fragkiskos
    2. Psomas, Constantinos
    3. Krioukov, Dmitri
    Replaying the Geometric Growth of Complex Networks and Application to the AS Internet ACM SIGMETRICS Performance Evaluation Review
    2012 Nov
    1. Krioukov, Dmitri
    2. Kitsak, Maksim
    3. Sinkovits, Robert
    4. Rideout, David
    5. Meyer, David
    6. Boguñá, Marián
    Network Cosmology Nature Scientific Reports
    2012 Nov
    1. Dhamdhere, Amogh
    2. Luckie, Matthew
    3. Huffaker, Bradley
    4. claffy, kc
    5. Elmokashfi, Ahmed
    6. Aben, Emile
    Measuring the Deployment of IPv6: Topology, Routing and Performance ACM Internet Measurement Conference (IMC)
    2012 Nov
    1. Dainotti, Alberto
    2. King, Alistair
    3. Claffy, Kimberly
    4. Papale, Ferdinando
    5. Pescapè, Antonio
    Analysis of a "/0" Stealth Scan from a Botnet ACM Internet Measurement Conference (IMC)
    2012 Oct
    1. Dainotti, Alberto
    2. King, Alistair
    3. Claffy, Kimberly
    Analysis of Internet-wide Probing using Darknets Building Analysis Datasets and Gathering Experience Returns for Security (BADGERS)
    2012 Oct
    1. Zseby, Tanja
    2. claffy, kc
    DUST 2012 Workshop Report ACM SIGCOMM Computer Communication Review (CCR)
    2012 Sep
    1. Papadopoulos, Fragkiskos
    2. Kitsak, Maksim
    3. Serrano, Mirian Ángeles
    4. Boguñá, Marián
    5. Krioukov, Dmitri
    Popularity versus Similarity in Growing Networks Nature
    2012 Sep
    1. Zhang, Lixia
    2. Estrin, Deborah
    3. Burke, Jeffrey
    4. Jacobson, Van
    5. Thornton, James
    6. Uzun, Ersin
    7. Zhang, Beichuan
    8. Tsudik, Gene
    9. claffy, kc
    10. Krioukov, Dmitri
    11. Massey, Dan
    12. Papadopoulos, Christos
    13. Ohm, Paul
    14. Abdelzaher, Tarek
    15. Shilton, Katie
    16. Wang, Lan
    17. Yeh, Edmund
    18. Crowley, Patrick
    Named Data Networking (NDN) Project 2011 - 2012 Annual Report Named Data Networking (NDN)
    2012 Sep
    1. Lodhi, Aemen
    2. Dhamdhere, Amogh
    3. Dovrolis, Constantine
    Peering Strategy Adoption by Transit Providers in the Internet: A Game Theoretic Approach ACM SIGMETRICS Performance Evaluation Review
    2012 Aug
    1. Dittrich, David
    2. Kenneally, Erin
    The Menlo Report: Ethical Principles Guiding Information and Communication Technology Research U.S. Department of Homeland Security
    2012 Jul
    1. claffy, kc
    The 4th Workshop on Active Internet Measurements (AIMS-4) Report ACM SIGCOMM Computer Communication Review (CCR)
    2012 Jul
    1. claffy, kc
    Border Gateway Protocol (BGP) and Traceroute Data Workshop Report ACM SIGCOMM Computer Communication Review (CCR)
    2012 May
    1. Dhamdhere, Amogh
    2. Cherukuru, Himalatha
    3. Dovrolis, Constantine
    4. claffy, kc
    Measuring the Evolution of Internet Peering Agreements IFIP Networking
    2012 May
    1. Mikians, Jakub
    2. Dhamdhere, Amogh
    3. Dovrolis, Constantine
    4. Barlet-Ros, Pere
    5. Solé-Pareta, Josep
    Towards a Statistical Characterization of the Interdomain Traffic Matrix IFIP Networking
    2012 May
    1. Huffaker, Bradley
    2. Fomenkov, Marina
    3. claffy, kc
    Internet Topology Data Comparison Cooperative Association for Internet Data Analysis (CAIDA)
    2012 Apr
    1. Donnet, Benoit
    2. Luckie, Matthew
    3. Mérindol, Pascal
    4. Pansiot, Jean-Jacques
    Revealing MPLS tunnels obscured from traceroute ACM SIGCOMM Computer Communication Review (CCR)
    2012 Apr
    1. claffy, kc
    Workshop on Internet Economics (WIE2011) Report ACM SIGCOMM Computer Communication Review (CCR)
    2012 Mar
    1. Lodhi, Aemen
    2. Dhamdhere, Amogh
    3. Dovrolis, Constantine
    Analysis of peering strategy adoption by transit providers in the Internet NetEcon
    2012 Mar
    1. Brownlee, Nevil
    One-way Traffic Monitoring with iatmon Passive and Active Network Measurement Workshop (PAM)
    2012 Mar
    1. Lodhi, Aemen
    2. Dhamdhere, Amogh
    3. Dovrolis, Constantine
    GENESIS: An Agent-based Model of Interdomain Network Formation, Traffic Flow and Economics IEEE Conference on Computer Communications (INFOCOM)
    2012 Mar
    1. Bailey, Michael
    2. Dittrich, David
    3. Kenneally, Erin
    4. Maughan, Douglas
    The Menlo Report IEEE Security & Privacy
    2012 Mar
    1. Bailey, Michael
    2. Kenneally, Erin
    3. Dittrich, David
    A Refined Ethical Impact Assessment Tool and a Case Study of its Application Workshop on Ethics in Computer Security Research (WECSR)
    2012 Jan
    1. Motiwala, Murtaza
    2. Dhamdhere, Amogh
    3. Feamster, Nick
    4. Lakhina, Anukool
    Towards a Cost Model for Network Traffic ACM SIGCOMM Computer Communication Review (CCR)
    2012 Jan
    1. Dainotti, Alberto
    2. Amman, Roman
    3. Aben, Emile
    4. Claffy, Kimberly
    Extracting benefit from harm: using malware pollution to analyze the impact of political and geophysical events on the Internet ACM SIGCOMM Computer Communication Review (CCR)
    2012 Jan
    1. Dainotti, Alberto
    2. Pescapè, Antonio
    3. Claffy, Kimberly
    Issues and future directions in traffic classification IEEE Network

    Presentations

    The following table contains the presentations and invited talks given by CAIDA during the 2012 calendar year. Please refer to Presentations by CAIDA on our web site for a comprehensive listing.

    Year Month Presenter(s) Title Venue
    2012 Dec
    1. Dainotti, Alberto
    Analysis of an Internet-wide Stealth Scan from a Botnet USENIX Large Installation System Administration (LISA)
    2012 Dec
    1. Kenneally, Erin
    Of Skunks and Canaries (and maybe rat holes) Workshop on Internet Economics (WIE)
    2012 Dec
    1. Luckie, Matthew
    CAIDA's AS-rank: measuring the influence of ASes on Internet Routing Workshop on Internet Economics (WIE)
    2012 Nov
    1. claffy, kc
    DHS PREDICT project: CAIDA update DHS PREDICT PI Meeting
    2012 Nov
    1. Dainotti, Alberto
    Analysis of a "/0" Stealth Scan from a Botnet ACM Internet Measurement Conference (IMC)
    2012 Nov
    1. Dhamdhere, Amogh
    The Structure and Evolution of the AS-level Internet Workshop on Internet Topology and Economics
    2012 Nov
    1. King, Alistair
    A Coordinated View of Large-Scale Internet Events Workshop on Internet Visualization (WIV)
    2012 Nov
    1. Krioukov, Dmitri
    Popularity versus Similarity in Growing Networks USC Information Sciences Institute (ISI)
    2012 Nov
    1. Krioukov, Dmitri
    Large graphs in physics: From statistical mechanics of networks to quantum cosmology University of Houston
    2012 Nov
    1. Luckie, Matthew
    Measuring the Deployment of IPv6: Topology, Routing, and Performance ACM Internet Measurement Conference (IMC)
    2012 Oct
    1. Huffaker, Bradley
    Cartographic Capabilities for Critical Cyberinfrastructure (C4) Cyber Security Division PI Meeting
    2012 Oct
    1. claffy, kc
    Leveraging the Science and Technology of Internet Mapping for Homeland Security Cyber Security Division PI Meeting
    2012 Oct
    1. Dainotti, Alberto
    Analysis of Internet-wide Probing using Darknets BADGERS
    2012 Oct
    1. Dhamdhere, Amogh
    Measuring the Adoption of IPv6 Chinese American Networking Symposium (CANS)
    2012 Oct
    1. Krioukov, Dmitri
    The Universal Laws of Structural Dynamics in Large Graphs DARPA Graph-theoretic Research in Algorithms and the Phenomenology of Social networks (GRAPHS)
    2012 Oct
    1. Krioukov, Dmitri
    Hyperbolic routing in NDN Named Data Network (NDN) Retreat
    2012 Oct
    1. Luckie, Matthew
    CAIDA's AS-rank: measuring the influence of ASes on Internet Routing North American Network Operators' Group (NANOG)
    2012 Aug
    1. Dainotti, Alberto
    Extracting Benefit from Harm: Using Malware Pollution to Analyze the Impact of Political and Geophysical Events on the Internet ACM SIGCOMM
    2012 Aug
    1. King, Alistair
    Corsaro CAIDA-WIDE-CASFI Joint Measurement Workshop
    2012 Aug
    1. Hyun, Young
    Internet Topology Data Kit Update CAIDA-WIDE-CASFI Joint Measurement Workshop
    2012 Jul
    1. Dhamdhere, Amogh
    Measuring and Modeling the Adoption of IPv6 Santa Fe Institute Business Network Topical Meeting on Measurement of Complex Networks
    2012 Jul
    1. Krioukov, Dmitri
    The Universal Laws of Structural Dynamics in Large Graphs DARPA Graph-theoretic Research in Algorithms and the Phenomenology of Social networks (GRAPHS)
    2012 Jul
    1. Dainotti, Alberto
    Analysis of Country-wide Internet Outages Caused by Censorship Internet Engineering Task Force (IETF)
    2012 Jun
    1. Huffaker, Bradley
    CAIDA Research Overview MIC-DHS Meeting for Cyber Security Cooperation
    2012 Jun
    1. Dhamdhere, Amogh
    Peering Strategy adoption by Transit Providers in the Internet: A Game Theoretic Approach Workshop on Pricing and Incentives in Networks (W-PIN)
    2012 May
    1. Brownlee, Nevil
    One way Traffic Monitoring with iatmon International Workshop on Darkspace and UnSolicited Traffic Analysis (DUST)
    2012 May
    1. Dainotti, Alberto
    SipScan: the world scanning itself International Workshop on Darkspace and UnSolicited Traffic Analysis (DUST)
    2012 May
    1. Zseby, Tanja
    Comparable Metrics for IP Darkspace Analysis International Workshop on Darkspace and UnSolicited Traffic Analysis (DUST)
    2012 May
    1. Dhamdhere, Amogh
    Analysis of peering strategy adoption by transit providers in the Internet Simula Research
    2012 May
    1. King, Alistair
    Corsaro International Workshop on Darkspace and UnSolicited Traffic Analysis (DUST)
    2012 May
    1. Dhamdhere, Amogh
    Measuring the Evolution of Internet Peering Agreements International Conferences on Networking
    2012 May
    1. Claffy, Kimberly
    DHS PREDICT project: CAIDA update DHS PREDICT PI Meeting
    2012 May
    1. Kenneally, Erin
    Illuminating the way for Trusted Darkspace Data Sharing International Workshop on Darkspace and UnSolicited Traffic Analysis (DUST)
    2012 Apr
    1. Dhamdhere, Amogh
    Analysis of peering strategy adoption by transit providers in the Internet UCSD Complex Networks Seminar (DANCES)
    2012 Mar
    1. claffy, kc
    Leveraging the Science and Technology of Internet Mapping for Homeland Security DHS Cybersecurity Science and Technology
    2012 Mar
    1. Krioukov, Dmitri
    Popularity versus Similarity in Growing Networks American Physical Society (APS)
    2012 Mar
    1. Krioukov, Dmitri
    Network Cosmology Pacific Coast Gravity Meeting (PCGM)
    2012 Feb
    1. claffy, kc
    Analysis of macroscopic Internet Outages UCSD CSE Perspectives in Computer Science
    2012 Feb
    1. claffy, kc
    Extracting Benefit from Harm: Using Malware Pollution to Analyze the Impact of Political and Geophysical Events on the Internet Asia Pacific Regional Internet Conference on Operational Technologies (APRICOT)
    2012 Feb
    1. Krioukov, Dmitri
    Hyperbolic geometry of large networks DARPA Mathematics Summit
    2012 Feb
    1. Hyun, Young
    Archipelago Measurement Infrastructure: On-Demand IPv4 and IPv6 Topology Measurements ISMA Workshop on Active Internet Measurements (AIMS)
    2012 Feb
    1. Polterock, Joshua
    CAIDA: A Data Sharing Case Study Security at the Cyber Border: Exploring Cybersecurity for International Research Network Connections
    2012 Feb
    1. Krioukov, Dmitri
    Popularity versus Similarity in Growing Networks Caltech Social Science Seminars
    2012 Feb
    1. Kenneally, Erin
    Legal Aikido: A Data-Sharing Framework to Advance Network & Security Research Security at the Cyber Border: Exploring Cybersecurity for International Research Network Connections
    2012 Feb
    1. Luckie, Matthew
    IPv6 deployment: trends and tidbits of 4,800 dual-stack ASes ISMA Workshop on Active Internet Measurements (AIMS)
    2012 Jan
    1. claffy, kc
    Extracting Benefit from Harm: Using Malware Pollution to Analyze the Impact of Political and Geophysical Events on the Internet New Zealand Network Operators' Group (NZNOG)
    2012 Jan
    1. Zseby, Tanja
    Entropy in IP Darkspace Data FloCon
    2012 Jan
    1. claffy, kc
    DHS PREDICT project: CAIDA update DHS PREDICT PI Meeting

    Web Site Usage

    In 2012, CAIDA's web site continued to attract considerable attention from a broad, international audience.

    The graph and table below present the monthly history of traffic to www.caida.org for 2012. To represent website traffic more accurately, these statistics exclude non-viewed traffic, such as requests from spiders, crawlers, and other robots.



    [Figure: Web Usage Bar Graph]

    Month     Unique visitors  Number of visits       Pages               Hits                Bandwidth
    Jan 2012  28,626           52,604                 193,506             767,633             38.33 GB
    Feb 2012  30,785           56,297                 186,165             867,278             47.81 GB
    Mar 2012  31,257           59,346                 211,540             935,852             42.68 GB
    Apr 2012  40,818           68,010                 220,749             902,318             37.92 GB
    May 2012  28,216           52,621                 216,732             815,110             46.20 GB
    Jun 2012  24,943           47,176                 165,475             615,695             31.27 GB
    Jul 2012  23,545           47,244                 235,320             651,363             33.22 GB
    Aug 2012  23,651           46,284                 166,655             575,877             34.01 GB
    Sep 2012  26,427           48,695                 176,948             719,467             30.67 GB
    Oct 2012  37,544           63,471                 222,794             950,639             47.41 GB
    Nov 2012  31,207           54,595                 209,570             800,392             36.95 GB
    Dec 2012  27,742           50,117                 175,742             633,596             29.29 GB
    Total     354,761          646,460                2,381,196           9,235,220           455.74 GB
                               (1.82 visits/visitor)  (3.68 pages/visit)  (14.28 hits/visit)  (739.22 KB/visit)
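The totals and per-visit ratios in the last rows of the table can be recomputed from the monthly rows. The following is a minimal sketch of that arithmetic, not the tool that produced the report. The names `MONTHLY`, `column_totals`, and `per_visit_ratios` are illustrative, and the conversion of 1 GB to 1024 × 1024 KB is an assumption. Results can differ from the published figures in the last decimal place, presumably because the report rounds or truncates.

```python
# Monthly figures transcribed from the web usage table:
# (unique visitors, visits, pages, hits, bandwidth in GB)
MONTHLY = {
    "Jan": (28626, 52604, 193506, 767633, 38.33),
    "Feb": (30785, 56297, 186165, 867278, 47.81),
    "Mar": (31257, 59346, 211540, 935852, 42.68),
    "Apr": (40818, 68010, 220749, 902318, 37.92),
    "May": (28216, 52621, 216732, 815110, 46.20),
    "Jun": (24943, 47176, 165475, 615695, 31.27),
    "Jul": (23545, 47244, 235320, 651363, 33.22),
    "Aug": (23651, 46284, 166655, 575877, 34.01),
    "Sep": (26427, 48695, 176948, 719467, 30.67),
    "Oct": (37544, 63471, 222794, 950639, 47.41),
    "Nov": (31207, 54595, 209570, 800392, 36.95),
    "Dec": (27742, 50117, 175742, 633596, 29.29),
}


def column_totals(monthly):
    """Sum each column across all months."""
    return [sum(column) for column in zip(*monthly.values())]


def per_visit_ratios(visitors, visits, pages, hits, gigabytes):
    """Derive the ratios shown in parentheses under the Total row."""
    return {
        "visits_per_visitor": visits / visitors,
        "pages_per_visit": pages / visits,
        "hits_per_visit": hits / visits,
        # Assumption: 1 GB = 1024 * 1024 KB.
        "kb_per_visit": gigabytes * 1024 * 1024 / visits,
    }


visitors, visits, pages, hits, gigabytes = column_totals(MONTHLY)
ratios = per_visit_ratios(visitors, visits, pages, hits, gigabytes)

print(f"Totals: {visitors:,} visitors, {visits:,} visits, "
      f"{pages:,} pages, {hits:,} hits, {gigabytes:.2f} GB")
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

Summing the monthly columns gives 646,460 visits and 2,381,196 pages, which agree with the published ratios of 1.82 visits per visitor and 3.68 pages per visit.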

    Organizational Chart

    CAIDA would like to acknowledge the many people who put forth great effort towards making CAIDA a success in 2012. The image below shows the functional organization of CAIDA. Please check the home page for more complete information about CAIDA staff.

    [Image of CAIDA Functional Organization Chart]

    CAIDA Functional Organization Chart


    Funding Sources

    CAIDA thanks our 2012 sponsors, members, and collaborators.

    The charts below depict funds received by CAIDA during the 2012 calendar year.

    Funding Source Allocations Percentage of Total
    NSF 2,139,446 52%
    DARPA 150,000 4%
    DHS 1,543,361 38%
    Gift & Members 250,760 6%
    Total 4,083,567 100%
    [Figure: Allocations by funding source]

    Figure 1. Allocations by funding source received during 2012.
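The percentage column in the funding table is each source's allocation divided by the total, rounded to the nearest whole percent. A short sketch of that calculation follows. The names `ALLOCATIONS` and `shares` are illustrative, and whole-percent rounding is an assumption that happens to match the published figures.

```python
# Allocations by funding source, transcribed from the funding table.
ALLOCATIONS = {
    "NSF": 2139446,
    "DARPA": 150000,
    "DHS": 1543361,
    "Gift & Members": 250760,
}

total = sum(ALLOCATIONS.values())

# Share of the total for each source, rounded to a whole percent (assumed).
shares = {
    source: round(100 * amount / total)
    for source, amount in ALLOCATIONS.items()
}

for source, amount in ALLOCATIONS.items():
    print(f"{source:<15} {amount:>10,} {shares[source]:>3}%")
print(f"{'Total':<15} {total:>10,}")
```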


    Operating Expenses

    The charts below depict CAIDA's Annual Expense Report for the 2012 calendar year.

    LABOR: Salaries and benefits paid to staff and students.
    IDC: Indirect Costs paid to the University of California, San Diego, including grant overhead (54.5%).
    ENTERTAINMENT: Hosting official collaborators, visitors, and guests.
    SUPPLIES & EXPENSES: Computer supplies and equipment (including computer hardware and software costing less than $5000); telephone, Internet, and other IT services; and general office supplies.
    WORKSHOP SUPPORT: Conference facilities, catering, guest travel support, and resources for workshops, conferences, PI meetings, and operational meetings.
    CAIDA TRAVEL: CAIDA employee trips to conferences, PI meetings, operational meetings, and sites of remote monitor deployment.
    EQUIPMENT: Computer hardware or other equipment costing more than $5000.
    TRANSFERS: Exchange of funds between groups for recharge for IT desktop support and Oracle database services.
    Expense Category Expenses Percentage of Total
    Labor 1,849,061 59%
    IDC 1,077,345 34%
    Entertainment 3,487 0%
    Supplies and Expenses 55,256 2%
    Workshop Support 28,594 2%
    CAIDA Travel 65,640 2%
    Equipment 60,784 2%
    Total 3,140,170 100%
    [Figure: Operating Expenses]

    Figure 2. 2012 Operating Expenses



    Program Area Expenses Percentage of Total
    Policy 98,834 3%
    Routing 319,618 10%
    Topology 1,035,309 33%
    Infrastructure 1,514,333 48%
    Security 75,643 2%
    Outreach 92,034 3%
    CAIDA internal operations 4,399 0%
    Total 3,140,170 100%
    [Figure: Expenses by Program Area]

    Figure 3. 2012 Expenses by Program Area
