The Cooperative Association for Internet Data Analysis
CAIDA's Cybersecurity Project

CAIDA proposed to apply a decade of experience in Internet topology measurement, analysis, modeling, and visualization to DHS' immediate cybersecurity needs to understand and protect essential U.S. information infrastructure.

Over the course of a three-year contract, we developed and implemented new measurement and data collection technologies and infrastructure to improve DHS' situational awareness and understanding of the structure, dynamics, and vulnerabilities of the physical and logical topologies of the global Internet.

In our approach we integrated the following six strategic measurement and analysis capabilities to improve DHS' situational awareness of Internet topology structure and behavior:

  • a new architecture to support Internet topology measurement;
  • application of IP alias resolution techniques (for deriving topologies at both router and service provider granularity from the IP path measurements);
  • conversion of IP/router to AS-level topology graphs;
  • AS taxonomy and relationship inference;
  • geolocation of IP resources;
  • and (interactive) visualization of large annotated graphs.

The objective of these applied research, development, and deployment efforts was to develop the capability to regularly provide richly annotated topology maps of observable Internet infrastructure, as well as a powerful measurement platform capable of performing other types of Internet infrastructure measurement experiments as needed. Our approach is unique in providing benefits in two areas: (1) improving critical national capabilities in understanding the structure and evolution of our communications infrastructure, based on innovative measurement, analysis, and inference techniques; and (2) filling a recognized need in network science by allowing researchers to conduct empirical cybersecurity research without having to build and maintain the requisite global measurement infrastructure.

The main results and achievements are:

1. Ark

We have created a powerful and versatile distributed measurement infrastructure, Archipelago (Ark). Archipelago makes use of measurement nodes located in various networks worldwide and connected via the Internet to a central server located at CAIDA. Co-funded by DHS and NSF, Ark has grown from 13 monitors in December 2007 to 54 monitors as of July 2011, deployed in 30 countries on 6 continents. Of these, 26 monitors are enabled to conduct both IPv4 and IPv6 measurements. Ark has pioneered new features and functionality of distributed measurement infrastructure, including flexible and efficient measurement and data collection methods. It has proven successful at providing both benefits described above: unprecedented intelligence regarding macroscopic Internet connectivity, and support for several third-party cybersecurity-related global Internet measurement experiments.

Ark is now continuously gathering the largest set of IPv4 and IPv6 topology data made available to academic researchers and government agencies. For IPv4 topology, Ark monitors continuously measure IP-level paths to a dynamically generated list of IP addresses covering all /24 prefixes (about 9.4 million as of May 2011) in routed IPv4 address space. Measurement parallelization allows us to cycle through probing each routed /24 prefix in about two days. Over the lifetime of Ark, we have collected more than 4 billion traceroutes (1.6 TB of data). IPv6-capable monitors conduct continuous probing of BGP-announced IPv6 prefixes (/48 or shorter, nearly 4,000 prefixes as of December 2010). Each Ark monitor probes a single random destination in each prefix; a full probing cycle takes 48 hours.
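The destination-selection step described above (one random target per routed /24) can be sketched as follows. This is a minimal illustration, not Ark's actual code: the function names and the input format (a list of routed prefixes as strings) are assumptions, and the real system also divides the resulting list among monitors so that probing proceeds in parallel.

```python
import ipaddress
import random


def slash24_blocks(routed_prefixes):
    """Yield each distinct /24 covered by the given routed IPv4 prefixes."""
    seen = set()
    for text in routed_prefixes:
        net = ipaddress.ip_network(text)
        if net.prefixlen > 24:
            # Prefixes longer than /24 are widened to their enclosing /24.
            net = net.supernet(new_prefix=24)
        for block in net.subnets(new_prefix=24):
            if block not in seen:
                seen.add(block)
                yield block


def random_destination(block, rng):
    """Pick one random address inside a /24 block."""
    return block[rng.randrange(block.num_addresses)]


def build_probe_list(routed_prefixes, seed=None):
    """Return one randomly chosen destination per routed /24 (hypothetical helper)."""
    rng = random.Random(seed)
    return [random_destination(b, rng) for b in slash24_blocks(routed_prefixes)]


if __name__ == "__main__":
    # Documentation and private prefixes stand in for a real routing table.
    for dest in build_probe_list(["192.0.2.0/24", "10.0.0.0/22"], seed=1):
        print(dest)
```

As a rough consistency check on the figures above, 9.4 million /24 prefixes probed over two days (172,800 seconds) works out to about 54 new traceroutes started per second across the whole platform.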

We also field-tested Ark several times in support of global measurement experiments that resulted in several conference and journal publications (listed below). These experiments included investigation of relative topological coverage of different forward path probing methods (IMC'08), evaluating the efficacy of deployed Internet source address validation filtering (IMC'09), measuring the impact of certain causes of missing hops in traceroute paths (CCR'11), and a comparison of public and commercial geolocation databases (NMMC'11). This range of scientific experiments has successfully demonstrated our vision of a metaphorical distributed measurement "operating system" to support empirical Internet science.

Based on our experience with Ark, CAIDA has also made recommendations for designing and operating the next generation of Internet topology measurement platforms.

Contributions

Category: Contribution
Infrastructure: Ark
Service
Software: Distributed measurement-platform "operating system"; components available now or to be released by end of contract:
  • Coordination software: Marinda, a distributed tuple space implementation used for coordinating measurements across monitors
  • Probing application software: mper, probing engine based on scamper; rb-mperio, Ruby library for writing Ruby measurement scripts that use mper as probing engine
  • Data management software: ark-collector: Ark service for automatic fault-tolerant downloading of measurement data from monitors; ScamperDataFeed/ScamperIO, Ruby classes for programmatically controlling scamper from a Ruby script;
  • System-management software: ArkUtil, a library of utilities including software to remotely manage software running on monitors;
  • Researcher-supporting software: Eva (Ruby library for writing efficient event-driven applications in Ruby);
  • Data analysis software: rb-asfinder, a Ruby library for fast IP-to-prefix/AS lookups using a Patricia trie (wraps our C++ Patricia trie implementation); rb-wartslib: Ruby library for reading and writing scamper warts files; rb-judy: Ruby wrapper for the Judy array library
  • Topology statistics analysis tool:
  • CATCH topo-on-demand demo: performing user-specified distributed traceroutes/pings from Ark platform on demand;
  • Prototype services to support clients interacting over the Marinda tuple space: asfinder, generic service using rb-asfinder to map IP addresses to prefixes and ASes; geoloc, generic service for geolocating IP addresses;
Data: Ongoing data collections
  1. IPv4 Routed /24 (topology probes to each /24, continuously)
  2. IPv4 Routed /24 DNS Names (DNS resolution for the above)
  3. IPv6 Topology (topology probes to each routed IPv6 prefix)
  4. Internet Topology Data Kit (ITDK) (curated IPv4 data)
  5. AS Links: IPv4 Routed /24 AS Links (AS adjacencies)
  6. AS Relationships (inferred AS relationships)
  7. AS Rank (inferred AS ranking by customer cone or degree)
Statistical information per monitor: The Ark statistics page includes statistics aggregated across all monitors (e.g., distributions of IP-hop path lengths, AS-hop path lengths, and RTTs) as well as distributions of data from each monitor. The aggregated data has notably been relevant to S-BGP conversations regarding the average AS path lengths actually taken by packets, which are not available via BGP data. Per-monitor statistics include:
  • median RTT per country and US state (geographic map)
  • AS hop dispersion graphs (by AS hop and IP hop)
  • IP hop dispersion graphs
  • distribution of path lengths (IP and AS)
  • RTT distribution (CCDF and quartiles vs hop distance)
  • RTT vs geographic distance
Published experiments using Ark (another 107 articles in Google Scholar cite or use data from Ark, as of 02 September 2011):
  1. "Traceroute Probe Method and Forward IP Path Inference", IMC'08, Matthew Luckie, Young Hyun, and Bradley Huffaker.
  2. "Understanding the efficacy of deployed internet source address validation filtering", IMC'09, Rob Beverly, Arthur Berger, Young Hyun, KC Claffy.
  3. "Measured impact of crooked traceroute", CCR, Jan 2011, Matthew Luckie, Amogh Dhamdhere, KC Claffy, David Murrell.
  4. "Geocompare: a comparison of public and commercial geolocation databases", Network Mapping and Measurement Conference, May 2011. Bradley Huffaker, Marina Fomenkov, and KC Claffy (see 5. "Geolocation" below)
  5. "Efficient Internet Topology Discovery Techniques", Master's thesis, U. Waikato, Alistair King.
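
To illustrate the kind of lookup that rb-asfinder provides and how it feeds the AS Links data set, here is a minimal Python sketch of longest-prefix matching followed by collapsing an IP path into AS adjacencies. It is not the rb-asfinder implementation: it uses a plain binary trie rather than a Patricia trie, and the class name, function names, and AS numbers are illustrative assumptions.

```python
import ipaddress


class PrefixTrie:
    """Uncompressed binary trie over IPv4 prefixes.

    A Patricia trie would additionally collapse single-child chains;
    the lookup result is the same.
    """

    def __init__(self):
        self.root = {}

    def insert(self, prefix, asn):
        net = ipaddress.ip_network(prefix)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            node = node.setdefault((bits >> (31 - i)) & 1, {})
        node["value"] = (net, asn)

    def lookup(self, address):
        """Return (prefix, asn) for the longest matching prefix, or None."""
        bits = int(ipaddress.ip_address(address))
        node = self.root
        best = node.get("value")
        for i in range(32):
            node = node.get((bits >> (31 - i)) & 1)
            if node is None:
                break
            best = node.get("value", best)
        return best


def as_links(ip_path, trie):
    """Collapse an IP-level path into the set of AS adjacencies it crosses."""
    links = set()
    prev = None
    for hop in ip_path:
        match = trie.lookup(hop)
        if match is None:
            prev = None  # unmapped hop: do not infer a link across the gap
            continue
        asn = match[1]
        if prev is not None and asn != prev:
            links.add((prev, asn))
        prev = asn
    return links


if __name__ == "__main__":
    trie = PrefixTrie()
    trie.insert("10.0.0.0/8", 64500)    # private-use AS numbers, for illustration
    trie.insert("10.1.0.0/16", 64501)
    trie.insert("192.0.2.0/24", 64502)
    print(as_links(["10.2.0.1", "10.1.0.1", "192.0.2.1"], trie))
```

Treating an unmapped hop as a break in the path is one conservative choice; the production AS Links processing may handle gaps differently.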

2. Mapping IP Addresses to Routers

We tested previously existing small-scale experimental methods for mapping IP addresses to routers, then redesigned and reimplemented them as more robust and scalable alias resolution techniques. Our newly developed tools, kapar and MIDAR, operate on Internet-scale topologies (millions of addresses), identifying addresses that belong to the same router with greater precision and sensitivity than was previously achievable.
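
As a rough illustration of what alias resolution involves, the sketch below shows a simplified IP ID shared-counter test together with the grouping of pairwise alias inferences into routers. It is not MIDAR or kapar code: MIDAR's actual test is considerably more careful about sampling and false positives, and the function names, the `max_step` threshold, and the sample format are assumptions made for this example.

```python
def shares_counter(samples_a, samples_b, max_step=20000):
    """Simplified shared-counter test on two IP ID time series.

    Each sample is a (timestamp, ip_id) tuple. If both addresses draw IP IDs
    from one counter, the time-ordered merge of the two series should only
    move forward (modulo 2**16) in modest steps. max_step is an assumed bound
    on how far the counter can advance between consecutive samples.
    """
    merged = sorted(samples_a + samples_b)  # order by timestamp
    for (_, prev_id), (_, next_id) in zip(merged, merged[1:]):
        if (next_id - prev_id) % 65536 > max_step:
            return False
    return True


def group_aliases(addresses, alias_pairs):
    """Merge pairwise alias inferences into routers using union-find."""
    parent = {a: a for a in addresses}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in alias_pairs:
        parent[find(a)] = find(b)

    routers = {}
    for a in addresses:
        routers.setdefault(find(a), set()).add(a)
    return list(routers.values())


if __name__ == "__main__":
    a = [(0.0, 65000), (2.0, 65400), (4.0, 200)]  # counter wraps past 65535
    b = [(1.0, 65200), (3.0, 65500)]              # interleaves cleanly with a
    c = [(1.0, 30000), (3.0, 30100)]              # unrelated counter
    print(shares_counter(a, b), shares_counter(a, c))
```

Running a pairwise test over millions of addresses is infeasible, which is why scalable systems first narrow the candidate pairs before testing them.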

Contributions

Category: Contribution
Software:
  • Multi-level Distributed Alias Resolution (MIDAR)
  • kapar (extended/upgraded version of APAR) (to be released by end of contract)
Publications
  1. "Internet-scale IP alias resolution techniques", Ken Keys, CCR, January 2010.
  2. "Toward Topology Dualism: Improving the Accuracy of AS Annotations for Routers", PAM'2010, Bradley Huffaker, Amogh Dhamdhere, Marina Fomenkov and KC Claffy.
  3. "MERLIN: MEasure the Router Level of the INternet", Conference on Next Generation Internet (NGI'2011), Pascal Mérindol, Benoit Donnet, Jean-Jacques Pansiot, Matthew Luckie and Young Hyun.
  4. "Internet-Scale IPv4 Alias Resolution with MIDAR: System Architecture", submitted to Transactions on Networking, June 2011. Ken Keys, Young Hyun, Matthew Luckie, KC Claffy.

3. Internet Topology Data Kit (ITDK)

In 2010, we applied our state-of-the-art measurement and analysis techniques to collect, analyze, process, and release three Internet Topology Data Kit (ITDK) data sets. Each ITDK starts with two weeks of traceroute data probing IPv4 addresses and includes: two versions of router-level topologies (derived using different combinations of alias resolution methods); router-to-AS assignments; geographic location of each router; and DNS lookups of all observed IP addresses. ITDKs are a work in progress: we continue to refine our data analysis, inference, and annotation methods, and will augment ITDKs with additional annotations.

Contributions

Category: Contribution
Data: Internet Topology Data Kit (ITDK) (three in April 2010, will be two in 2011); each ITDK has two router-level topologies: one optimized for accuracy, the other for completeness.

Data files: routers, links, router-to-AS mappings, DNS, AS relationships, geolocation.

4. AS Rank

Using routing data obtained from the Route Views Project and RIPE NCC, in conjunction with multiple data analysis methodologies, we have developed a procedure to rank Autonomous Systems (AS Rank) based on the economics of AS business relationships that we infer from the global routing tables. The position of each AS in the Internet hierarchy is determined as a function of the number of IP prefixes advertised by this AS, its customer ASes, their customer ASes, and so on. We implemented our AS ranking algorithm as an interactive web page that allows users to provide feedback correcting false relationship inferences. AS ranking is also a work in progress: we are improving our ranking algorithms, taking into account user suggestions, and expanding the range of data used for analysis.
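
The recursive definition above (an AS, its customers, their customers, and so on) amounts to a graph traversal over inferred provider-to-customer links. The following minimal sketch ranks ASes by the number of distinct prefixes in that customer cone. The data structures, function names, and AS numbers are illustrative assumptions; the deployed ranking also has to infer the relationships themselves, which this example takes as given.

```python
def customer_cone(asn, customers):
    """Return the AS itself plus every AS reachable via provider->customer links."""
    cone, stack = {asn}, [asn]
    while stack:
        for cust in customers.get(stack.pop(), ()):
            if cust not in cone:
                cone.add(cust)
                stack.append(cust)
    return cone


def rank_by_cone_prefixes(customers, prefixes):
    """Rank ASes by the number of distinct prefixes advertised within their cone.

    customers: {provider_asn: [customer_asn, ...]}  (inferred relationships)
    prefixes:  {asn: [prefix, ...]}                 (prefixes each AS originates)
    """
    ases = set(prefixes) | set(customers)
    ases |= {c for custs in customers.values() for c in custs}
    scores = {}
    for asn in ases:
        cone_prefixes = set()
        for member in customer_cone(asn, customers):
            cone_prefixes |= set(prefixes.get(member, ()))
        scores[asn] = len(cone_prefixes)
    # Largest cone first; ties broken by AS number for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    customers = {64500: [64501, 64502], 64501: [64503], 64502: [64503]}
    prefixes = {
        64500: ["10.0.0.0/16"],
        64501: ["10.1.0.0/16"],
        64502: ["10.2.0.0/16", "10.3.0.0/16"],
        64503: ["10.4.0.0/16"],
    }
    for asn, score in rank_by_cone_prefixes(customers, prefixes):
        print(asn, score)
```

Collecting prefixes into a set means a multihomed customer (AS 64503 here) is counted once in the cone of each provider rather than once per path.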

Contributions

Category: Contribution
Software-as-Service: interactive AS ranking
Data: resulting files from above

5. Geolocation

Since geolocation of IP resources is such an essential component of our mapping research, and of much other Internet research, we undertook a systematic quantitative comparison of currently available geolocation service providers. We added depth to previous contributions by analyzing inconsistencies across databases for different geographic regions and organization types. We compared results on both country and lat-long granularities using a methodology that compares each database against the majority vote across all databases with answers for a given IP address. To compare databases at a lat-long granularity we used an 80 km threshold for two lat-long coordinates to be in the same geographic region. We described our process for selecting this threshold, and our centroid-based algorithm for comparing database lat-long results against a majority of responses from the set of databases we evaluated. While not a foolproof methodology -- the databases could all be converging to the same wrong answers over time -- it assumes that database providers successfully work toward improving the accuracy of their databases over time. In the absence of substantial ground truth, our method offers a systematic way to study the geolocation databases to reveal insights, summarized at the end of the paper. We intend to re-run the comparison experiment using additional databases later in 2011.
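
The majority-vote comparison can be sketched roughly as follows. Only the 80 km threshold and the general idea (majority vote, centroid of the agreeing answers) come from the description above; the clustering rule, the function names, and the database labels are assumptions for illustration and should not be read as the algorithm used in the paper.

```python
import math
from collections import Counter

THRESHOLD_KM = 80.0  # same-region threshold stated in the text


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def country_majority(answers):
    """answers: {db_name: country_code}. Return the strict-majority country or None."""
    if not answers:
        return None
    country, votes = Counter(answers.values()).most_common(1)[0]
    return country if votes * 2 > len(answers) else None


def latlong_agreement(answers):
    """answers: {db_name: (lat, lon)}. Return {db_name: agrees} or None.

    Assumed rule: find the largest group of answers within THRESHOLD_KM of some
    single answer; if that group is a strict majority, compare every database
    against the group's centroid. The centroid is a plain average of
    coordinates, which ignores the antimeridian.
    """
    best = []
    for anchor in answers.values():
        cluster = [db for db, p in answers.items()
                   if haversine_km(anchor, p) <= THRESHOLD_KM]
        if len(cluster) > len(best):
            best = cluster
    if len(best) * 2 <= len(answers):
        return None
    lat = sum(answers[db][0] for db in best) / len(best)
    lon = sum(answers[db][1] for db in best) / len(best)
    return {db: haversine_km((lat, lon), p) <= THRESHOLD_KM
            for db, p in answers.items()}


if __name__ == "__main__":
    answers = {"db_a": (32.72, -117.16), "db_b": (32.80, -117.10), "db_c": (40.71, -74.01)}
    print(latlong_agreement(answers))
```

When no strict majority exists, the sketch returns `None` rather than picking a winner, which mirrors the caveat that the method only measures agreement and cannot establish ground truth.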

Contributions

Category: Contribution
Software: Analysis tools for doing the comparisons across databases (available on request)
Publication

6. AS router-level and AS-level graph

We developed an integrated visualization of the topological connectivity of a single AS at both the router level and the AS level that can be overlaid onto a world map, elucidating the geographic coverage of the AS. This visualization is the first step toward creating a comprehensive, operationally useful view of intra- and inter-AS connectivity that depicts not only the network topology (the number of neighbors, the types of connecting links, etc.) but also geographic and economic attributes such as ownership structure, regional presence, and financial indicators.

Contributions

Category: Contribution
Software-as-Service: interactive AS ranking, e.g., AS3356's router-level graph and AS-level graph

7. Outreach: Workshops

During the course of the contract, CAIDA hosted three annual workshops on Active Internet Measurements (AIMS), as part of our series of Internet Statistics and Metrics Analysis (ISMA) workshops. The AIMS workshops are intended to advance our understanding of the potential and limitations of active measurement research and infrastructure in the wide-area Internet, and to promote cooperative solutions and coordinated strategies to address future data needs of the network and security research communities. For three years, the workshop has fostered interdisciplinary conversation among researchers, operators, and government, focused on analysis of goals, means, and emerging issues in active Internet measurement projects. The first workshop emphasized discussion of existing hardware and software platforms for macroscopic measurement and mapping of Internet properties, in particular those related to cybersecurity. The second workshop included more performance evaluation and data-sharing approaches. In the third workshop (in February 2011) we expanded the workshop agenda to include active measurement topics of more recent interest: broadband performance; gauging IPv6 deployment; and measurement activities in international research networks.

Contributions

Category: Contribution
Workshops
Publications
  1. "The Workshop on Active Internet Measurements (AIMS) Report", ACM SIGCOMM Computer Communication Review (CCR), Oct. 2009.
  2. "AIMS-2 Workshop on Active Internet Measurements", ACM SIGCOMM Computer Communication Review (CCR), Oct. 2010.
  3. "AIMS-3 Workshop on Active Internet Measurements", ACM SIGCOMM Computer Communication Review (CCR), July 2011.


  Last Modified: Wed Oct-26-2011 9:55:56 PDT
  Page URL: http://www.caida.org/projects/cybersecurity/index.xml