CAIDA proposed to apply a decade of experience in Internet topology measurement, analysis, modeling, and visualization to DHS' immediate cybersecurity needs to understand and protect essential U.S. information infrastructure.
Over the course of a three-year contract, we have developed and implemented new measurement and data collection technologies and infrastructure to improve DHS' situational awareness and understanding of the structure, dynamics, and vulnerabilities of the physical and logical topologies of the global Internet.
In our approach, we integrated the following six strategic measurement and analysis capabilities to improve DHS' situational awareness of Internet topology structure and behavior:
- a new architecture to support Internet topology measurement;
- application of IP alias resolution techniques (for deriving topologies at both router and service provider granularity from IP path measurements);
- conversion of IP/router to AS-level topology graphs;
- AS taxonomy and relationship inference;
- geolocation of IP resources; and
- (interactive) visualization of large annotated graphs.
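Converting IP-level paths into AS-level paths, one of the capabilities listed above, can be pictured as mapping each traceroute hop to the origin AS of its longest matching BGP prefix and collapsing consecutive repeats. The Python sketch below is a minimal illustration under stated assumptions, not CAIDA's conversion pipeline. The prefix-to-AS table is hypothetical (documentation prefixes and private-use AS numbers), and real data also needs handling for multi-origin prefixes, IXP addresses, and third-party addresses, which this sketch ignores.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def build_table(prefix_to_asn: Dict[str, int]) -> List[Tuple[Network, int]]:
    """Parse prefixes and sort them longest first, so the first match is the longest match."""
    table = [(ipaddress.ip_network(p), asn) for p, asn in prefix_to_asn.items()]
    table.sort(key=lambda entry: entry[0].prefixlen, reverse=True)
    return table


def origin_asn(addr: str, table: List[Tuple[Network, int]]) -> Optional[int]:
    """Longest-prefix match of one address (a linear scan, adequate for a sketch)."""
    ip = ipaddress.ip_address(addr)
    for net, asn in table:
        if ip in net:  # an IPv4 address is never "in" an IPv6 network, and vice versa
            return asn
    return None


def ip_path_to_as_path(hops: List[Optional[str]], table: List[Tuple[Network, int]]) -> List[int]:
    """Map each responding hop to an origin AS and collapse consecutive repeats."""
    as_path: List[int] = []
    for hop in hops:
        if hop is None:  # unresponsive hop ('*' in traceroute output)
            continue
        asn = origin_asn(hop, table)
        if asn is None:  # address not covered by any prefix in the table
            continue
        if not as_path or as_path[-1] != asn:
            as_path.append(asn)
    return as_path


if __name__ == "__main__":
    table = build_table({
        "192.0.2.0/24": 64500,
        "198.51.100.0/24": 64501,
        "198.51.100.128/25": 64502,  # the more-specific wins for .128 to .255
        "203.0.113.0/24": 64503,
    })
    hops = ["192.0.2.1", "192.0.2.9", None, "198.51.100.5", "198.51.100.200", "203.0.113.7"]
    path = ip_path_to_as_path(hops, table)
    print(path, "AS hops:", len(path) - 1)  # [64500, 64501, 64502, 64503] AS hops: 3
```

The AS-hop length of a path, as opposed to the AS path length advertised in BGP, falls out of the same conversion as `len(path) - 1`.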
The objective of these applied research, development, and deployment efforts was to develop the capability to regularly provide richly annotated topology maps of observable Internet infrastructure, as well as a powerful measurement platform capable of performing other types of Internet infrastructure measurement experiments as needed. Our approach has uniquely provided benefits in two areas: (1) improving critical national capabilities for understanding the structure and evolution of our communications infrastructure, based on innovative measurement, analysis, and inference techniques; and (2) filling a recognized need in network science by allowing researchers to conduct empirical cybersecurity research without having to build and maintain the requisite global measurement infrastructure.
The main results and achievements are:
1. Ark
We have created Archipelago (Ark), a powerful and versatile distributed measurement infrastructure. Ark uses measurement nodes (monitors) located in various networks worldwide and connected via the Internet to a central server at CAIDA. With co-funding from DHS and NSF, we have grown Ark from 13 monitors in December 2007 to 54 monitors as of July 2011, deployed in 30 countries on 6 continents; 26 of these monitors can conduct both IPv4 and IPv6 measurements. Ark has pioneered new features and functionality for distributed measurement infrastructure, including flexible and efficient measurement and data collection methods. It has proven successful at providing both benefits described above: unprecedented intelligence regarding macroscopic Internet connectivity, and support for several third-party cybersecurity-related global Internet measurement experiments.
Ark is now continuously gathering the largest set of IPv4 and IPv6 topology data made available to academic researchers and government agencies. For IPv4 topology, Ark monitors continuously measure IP-level paths to a dynamically generated list of IP addresses covering all /24 prefixes (about 9.4 million as of May 2011) in routed IPv4 address space. Measurement parallelization allows us to cycle through probing each routed /24 prefix in about two days. Over the lifetime of Ark, we have collected more than 4 billion traceroutes (1.6 TB of data). IPv6-capable monitors conduct continuous probing of BGP-announced IPv6 prefixes (/48 or shorter, nearly 4,000 prefixes as of December 2010). Each Ark monitor probes a single random destination in each prefix; a full probing cycle takes 48 hours.
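The per-/24 target list and its division among monitors can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Ark's actual target-generation or scheduling code. The input prefixes, the monitor names, the round-robin split across a hypothetical probing team, treating a more-specific than /24 as marking its covering /24 as routed, and skipping the .0 and .255 addresses are all assumptions made for the example.

```python
import ipaddress
import random
from typing import Dict, Iterable, List, Optional


def routed_slash24s(prefixes: Iterable[str]) -> List[ipaddress.IPv4Network]:
    """Expand routed IPv4 prefixes into the distinct /24s they cover."""
    found = set()
    for p in prefixes:
        net = ipaddress.ip_network(p)
        if net.version != 4:
            continue  # IPv6 prefixes are probed separately
        if net.prefixlen > 24:
            # Assumption: a more-specific than /24 marks its covering /24 as routed.
            found.add(net.supernet(new_prefix=24))
        else:
            found.update(net.subnets(new_prefix=24))
    return sorted(found)


def assign_to_team(slash24s: List[ipaddress.IPv4Network], monitors: List[str],
                   seed: Optional[int] = None) -> Dict[str, List[ipaddress.IPv4Address]]:
    """Shuffle the /24s, pick one random destination in each, and deal them
    round-robin to the monitors of a hypothetical probing team."""
    rng = random.Random(seed)
    order = list(slash24s)
    rng.shuffle(order)
    assignment: Dict[str, List[ipaddress.IPv4Address]] = {m: [] for m in monitors}
    for i, net in enumerate(order):
        # Random host part in 1..254 (skipping .0 and .255 is an assumption).
        destination = net.network_address + rng.randint(1, 254)
        assignment[monitors[i % len(monitors)]].append(destination)
    return assignment


if __name__ == "__main__":
    targets = routed_slash24s(["192.0.2.0/24", "198.18.0.0/23", "203.0.113.128/25"])
    for monitor, dests in assign_to_team(targets, ["mon-a", "mon-b"], seed=1).items():
        print(monitor, [str(d) for d in dests])
```

Splitting the /24s across several monitors is what lets one full cycle finish faster than a single monitor could manage; the same idea applies to probing one random destination per announced IPv6 prefix.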
We also field-tested Ark in support of several global measurement experiments that resulted in conference and journal publications (listed below). These experiments included investigating the relative topological coverage of different forward-path probing methods (IMC'08), evaluating the efficacy of deployed Internet source address validation filtering (IMC'09), measuring the impact of certain causes of missing hops in traceroute paths (CCR'11), and comparing public and commercial geolocation databases (NMMC'11). This range of scientific experiments has demonstrated our vision of a metaphorical distributed measurement "operating system" to support empirical Internet science.
Based on our experience with Ark, CAIDA has also made recommendations for designing and operating the next generation of Internet topology measurement platforms.
Contributions
| Category | Contribution |
| --- | --- |
| Infrastructure | Ark |
| Service | |
| Software | Distributed measurement-platform "operating system"; components available now or to be released by end of contract |
| Data | Ongoing data collections |
| Statistical information per monitor | The Ark statistics page includes statistics aggregated across all monitors (e.g., distributions of IP-hop path lengths, AS-hop path lengths, and RTTs; the AS path lengths actually taken by packets, which are not available from BGP data, have notably been relevant to S-BGP discussions) as well as distributions of data from each monitor |
| Published experiments using Ark (Another 107 articles in Google Scholar cite or use data from Ark, as of 02 September 2011.) | |
2. Mapping IP Addresses to Routers
We tested previously existing small-scale experimental methods for mapping IP addresses to routers, then redesigned and reimplemented them as more robust and scalable alias resolution techniques. Our newly developed tools, kapar and MIDAR, work on Internet-scale topologies (millions of addresses), identifying addresses that belong to the same router with greater precision and sensitivity than was previously achievable.
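MIDAR (Monotonic ID-Based Alias Resolution) builds on the observation that many routers draw the 16-bit IPv4 IP-ID of every packet they send from one shared counter, so probe responses from two aliases of the same router, interleaved in time, form a single increasing sequence. MIDAR's own test is more elaborate (it estimates counter velocities and applies tighter bounds), so the Python sketch below is only a simplified illustration of the shared-counter idea, not MIDAR's implementation. The `max_velocity` and `slack` parameters and the sample data are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, int]  # (timestamp in seconds, 16-bit IP-ID seen in a response)
IPID_SPACE = 1 << 16        # the IPv4 IP-ID field is 16 bits, so counters wrap at 65536


def shared_counter_plausible(a: List[Sample], b: List[Sample],
                             max_velocity: float, slack: int = 1) -> bool:
    """Return True if samples from two addresses, merged in time order, are
    consistent with one monotonically increasing IP-ID counter."""
    merged = sorted(a + b)
    for (t0, id0), (t1, id1) in zip(merged, merged[1:]):
        delta = (id1 - id0) % IPID_SPACE  # forward distance, tolerating wraparound
        # A counter advancing at most max_velocity IDs per second cannot move
        # further than this; a small backward step shows up as a huge forward
        # delta and fails the check as well.
        if delta > max_velocity * (t1 - t0) + slack:
            return False
    return True


if __name__ == "__main__":
    alias_1 = [(0.0, 100), (2.0, 300), (4.0, 500)]
    alias_2 = [(1.0, 200), (3.0, 400), (5.0, 600)]
    other = [(1.0, 40000), (3.0, 40200), (5.0, 40400)]
    print(shared_counter_plausible(alias_1, alias_2, max_velocity=150))  # True
    print(shared_counter_plausible(alias_1, other, max_velocity=150))    # False
```

A check like this only applies to addresses whose IP-IDs come from a shared, monotonic counter; addresses with random, zero, or per-interface IP-IDs need other techniques, such as the analytical, topology-based inference that kapar performs.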
Contributions
| Category | Contribution |
| --- | --- |
| Software | |
| Publications | |
3. Internet Topology Data Kit (ITDK)
In 2010, we applied our state-of-the-art measurement and analysis techniques to collect, analyze, process, and release three Internet Topology Data Kit (ITDK) data sets. Each ITDK is built from two weeks of IPv4 traceroute data and includes: two versions of the router-level topology (derived using different combinations of alias resolution methods); router-to-AS assignments; the geographic location of each router; and DNS lookups of all observed IP addresses. ITDKs are a work in progress: we continue to refine our data analysis, inference, and annotation methods, and will augment ITDKs with additional annotations.
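One way to picture how alias resolution turns traceroute data into a router-level topology is to collapse each inferred alias set into a single router node and map every observed IP-level adjacency onto those nodes. The Python sketch below is a hypothetical illustration, not the ITDK build process. The `N1`, `N2`, ... node naming, the addresses, and the rule that an address with no known aliases becomes its own single-interface router are assumptions. Feeding in alias sets from different combinations of alias resolution methods yields different router-level graphs from the same traces.

```python
from typing import Dict, Iterable, Set, Tuple


def build_router_graph(alias_sets: Iterable[Set[str]], ip_links: Iterable[Tuple[str, str]]):
    """Collapse IP-level adjacencies into router-level links.

    alias_sets: groups of interface addresses inferred to belong to one router.
    ip_links:   consecutive address pairs observed in traceroute paths.
    Returns (address -> router ID, set of undirected router-level links).
    """
    router_of: Dict[str, str] = {}
    count = 0
    for aliases in alias_sets:
        count += 1
        for addr in aliases:
            router_of[addr] = f"N{count}"  # hypothetical node naming
    links: Set[Tuple[str, str]] = set()
    for a, b in ip_links:
        for addr in (a, b):
            if addr not in router_of:
                # No alias found: treat the address as a single-interface router.
                count += 1
                router_of[addr] = f"N{count}"
        ra, rb = router_of[a], router_of[b]
        if ra != rb:  # an adjacency between two aliases of one router is not a link
            links.add((min(ra, rb), max(ra, rb)))
    return router_of, links


if __name__ == "__main__":
    aliases = [{"192.0.2.1", "198.51.100.1"}, {"192.0.2.2", "203.0.113.9"}]
    observed = [("192.0.2.1", "192.0.2.2"), ("198.51.100.1", "203.0.113.9"),
                ("203.0.113.9", "203.0.113.50")]
    nodes, links = build_router_graph(aliases, observed)
    print(sorted(links))  # [('N1', 'N2'), ('N2', 'N3')]
```

Two different IP-level adjacencies between the same pair of routers become one router-level link, which is why better alias resolution shrinks and sharpens the resulting graph.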
Contributions
| Category | Contribution |
| --- | --- |
| Data | Internet Topology Data Kit (ITDK) (three in April 2010, will be two in 2011); each ITDK has two router-level topologies, one optimized for accuracy and the other for completeness; data files: routers, links, router-to-AS mappings, DNS, AS relationships, geolocation |
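The router-to-AS mappings listed above require choosing one AS for each router, whose interfaces may map to several origin ASes. The Python sketch below uses a simple majority vote over interface origin ASes as a hypothetical heuristic; CAIDA's actual assignment heuristics may differ, and leaving ties unassigned is likewise an assumption made for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional


def elect_asn(interface_asns: Iterable[Optional[int]]) -> Optional[int]:
    """Majority vote over the origin ASes of a router's interfaces; ties stay unassigned."""
    counts = Counter(asn for asn in interface_asns if asn is not None)
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: no clear winner
    return ranked[0][0]


def assign_routers(router_of: Dict[str, str], ip_asn: Dict[str, int]) -> Dict[str, Optional[int]]:
    """Group interface addresses by router and elect one AS per router."""
    by_router: Dict[str, List[Optional[int]]] = defaultdict(list)
    for addr, router in router_of.items():
        by_router[router].append(ip_asn.get(addr))  # None if the address is unmapped
    return {router: elect_asn(asns) for router, asns in by_router.items()}


if __name__ == "__main__":
    router_of = {"192.0.2.1": "N1", "192.0.2.2": "N1", "198.51.100.1": "N1", "203.0.113.1": "N2"}
    ip_asn = {"192.0.2.1": 64500, "192.0.2.2": 64500, "198.51.100.1": 64501, "203.0.113.1": 64502}
    print(assign_routers(router_of, ip_asn))  # {'N1': 64500, 'N2': 64502}
```

A plain vote is easy to mislead at AS borders: point-to-point links between two networks are often numbered from one neighbor's address space, so a border router can carry interfaces that map to another AS.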
4. AS Rank
Using routing data obtained from the Route Views Project and RIPE NCC, in conjunction with multiple data analysis methodologies, we have developed a procedure to rank Autonomous Systems (AS Rank) based on the economics of AS business relationships that we infer from the global routing tables. The position of each AS in the Internet hierarchy is determined as a function of the number of IP prefixes advertised by the AS, its customer ASes, their customer ASes, and so on. We implemented our AS ranking algorithm as an interactive web page that allows users to submit feedback correcting false relationship inferences. AS Rank is also a work in progress: we are improving our ranking algorithms by taking user suggestions into account and by expanding the range of data used for analysis.
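The recursive definition above (an AS, its customers, their customers, and so on) is a transitive closure over inferred provider-to-customer links, often called the AS's customer cone. The Python sketch below computes each cone with a breadth-first search and ranks ASes by the number of distinct prefixes announced inside it. It is a simplified illustration, not the AS Rank implementation; the relationship list and prefix sets are hypothetical, and the relationship inference itself is not shown.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def customer_cone(asn: int, customers: Dict[int, List[int]]) -> Set[int]:
    """The AS itself plus every AS reachable by following provider-to-customer links."""
    cone = {asn}
    queue = deque([asn])
    while queue:
        current = queue.popleft()
        for customer in customers.get(current, []):
            if customer not in cone:  # multi-homed customers are visited only once
                cone.add(customer)
                queue.append(customer)
    return cone


def rank_by_cone_prefixes(p2c_links: Iterable[Tuple[int, int]],
                          prefixes: Dict[int, Set[str]]) -> List[Tuple[int, int]]:
    """Rank ASes by the number of distinct prefixes announced within their customer cones."""
    customers: Dict[int, List[int]] = {}
    asns = set(prefixes)
    for provider, customer in p2c_links:
        customers.setdefault(provider, []).append(customer)
        asns.update((provider, customer))
    scores = {}
    for asn in asns:
        cone = customer_cone(asn, customers)
        # Set union, so a prefix reachable through two cone members is counted once.
        cone_prefixes = set().union(*(prefixes.get(a, set()) for a in cone))
        scores[asn] = len(cone_prefixes)
    # Largest cone first; ties broken by AS number for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    p2c = [(64500, 64501), (64500, 64502), (64501, 64503), (64502, 64503)]
    announced = {64500: {"192.0.2.0/24"}, 64501: {"198.51.100.0/24"},
                 64502: {"203.0.113.0/24"},
                 64503: {"198.51.100.128/25", "203.0.113.128/25"}}
    print(rank_by_cone_prefixes(p2c, announced))
    # [(64500, 5), (64501, 3), (64502, 3), (64503, 2)]
```

Here AS64503 is multi-homed to two providers, so its prefixes count toward both providers' cones but only once toward AS64500's.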
Contributions
| Category | Contribution |
| --- | --- |
| Software-as-a-Service | interactive AS ranking |
| Data | data files produced by the AS ranking above |
5. Geolocation
Since geolocation of IP resources is an essential component of our mapping research, and of much other Internet research, we undertook a systematic quantitative comparison of currently available geolocation service providers. We added depth to previous contributions by analyzing inconsistencies across databases for different geographic regions and organization types. We compared results at both country and lat-long granularities, using a methodology that compares each database against the majority vote of all databases that return an answer for a given IP address. At lat-long granularity, we considered two coordinates to be in the same geographic region if they lie within 80 km of each other. The paper describes our process for selecting this threshold and our centroid-based algorithm for comparing each database's lat-long results against the majority of responses from the set of databases we evaluated. This methodology is not foolproof (the databases could all be converging to the same wrong answers over time); it assumes that database providers successfully work toward improving the accuracy of their databases. In the absence of substantial ground truth, our method offers a systematic way to study geolocation databases and reveal insights, which are summarized at the end of the paper. We intend to re-run the comparison experiment using additional databases later in 2011.
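The majority-vote comparison above can be sketched for a single IP address as follows. The country check is a plain majority vote. For lat-long, this hypothetical reconstruction takes the largest group of answers lying within 80 km of some answer, requires it to be a majority, averages its coordinates into a centroid, and tests every database against that centroid using great-circle (haversine) distance. The paper's actual centroid-based algorithm may differ in its details, and the database names and coordinates are placeholders.

```python
import math
from collections import Counter
from typing import Dict, Optional, Tuple

LatLong = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0
SAME_REGION_KM = 80.0  # threshold stated in the text


def haversine_km(a: LatLong, b: LatLong) -> float:
    """Great-circle distance between two (lat, long) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def country_majority(answers: Dict[str, str]) -> Optional[str]:
    """Country reported by more than half of the databases, if any."""
    if not answers:
        return None
    country, votes = Counter(answers.values()).most_common(1)[0]
    return country if votes > len(answers) / 2 else None


def latlong_agreement(answers: Dict[str, LatLong],
                      threshold_km: float = SAME_REGION_KM) -> Dict[str, bool]:
    """For each database, whether its answer lies within threshold_km of the
    majority centroid. Returns {} when no majority group exists."""
    if not answers:
        return {}
    # Candidate groups: all answers within the threshold of each answer in turn.
    groups = [[db for db, p in answers.items() if haversine_km(seed, p) <= threshold_km]
              for seed in answers.values()]
    majority = max(groups, key=len)
    if len(majority) <= len(answers) / 2:
        return {}
    # Arithmetic mean of lat/long: adequate for points this close together,
    # but not near the poles or across the antimeridian.
    lat = sum(answers[db][0] for db in majority) / len(majority)
    lon = sum(answers[db][1] for db in majority) / len(majority)
    return {db: haversine_km((lat, lon), p) <= threshold_km for db, p in answers.items()}


if __name__ == "__main__":
    answers = {"db1": (37.77, -122.42), "db2": (37.80, -122.27),
               "db3": (37.34, -121.89), "db4": (40.71, -74.00)}
    print(latlong_agreement(answers))  # db1, db2, db3 agree; db4 does not
    print(country_majority({"db1": "US", "db2": "US", "db3": "US", "db4": "CA"}))  # US
```

Using the great-circle distance rather than raw degree differences keeps the 80 km threshold meaningful at every latitude, since a degree of longitude shrinks toward the poles.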
Contributions
| Category | Contribution |
| --- | --- |
| Software | Analysis tools for comparing results across databases (available on request) |
| Publication | |
6. AS router-level and AS-level graph
We developed an integrated visualization of the topological connectivity of a single AS at both the router level and the AS level, which can be overlaid onto a world map to show the geographic coverage of the AS. This visualization is a first step toward a comprehensive, operationally useful view of intra- and inter-AS connectivity that depicts not only the network topology (the number of neighbors, the types of connecting links, etc.) but also geographic and economic attributes such as ownership structure, regional presence, and financial indicators.
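Both views of a single AS can be derived from the same annotated router-level graph: links whose two routers belong to the AS form its router-level view, while links to routers in other ASes give its AS-level neighbors and the number of router-level links to each. The Python sketch below is a hypothetical illustration of that split, not the code behind the visualization. The router IDs and AS numbers are placeholders, and the geographic overlay, which would use each router's geolocation, is not shown.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def split_as_view(asn: int, router_links: Iterable[Tuple[str, str]],
                  router_as: Dict[str, Optional[int]]):
    """Separate one AS's router-level links from its AS-level adjacencies.

    Returns (intra-AS router links, {neighbor AS: number of router-level links to it}).
    Routers without an AS assignment are left out of the AS-level view.
    """
    internal: List[Tuple[str, str]] = []
    neighbors: Counter = Counter()
    for r1, r2 in router_links:
        a1, a2 = router_as.get(r1), router_as.get(r2)
        if a1 == asn and a2 == asn:
            internal.append((r1, r2))  # part of the AS's own router-level graph
        elif a1 == asn and a2 is not None:
            neighbors[a2] += 1  # inter-AS link seen from the first router
        elif a2 == asn and a1 is not None:
            neighbors[a1] += 1  # inter-AS link seen from the second router
    return internal, dict(neighbors)


if __name__ == "__main__":
    links = [("N1", "N2"), ("N1", "N3"), ("N2", "N4"), ("N2", "N5"), ("N3", "N4"), ("N5", "N6")]
    owners = {"N1": 64500, "N2": 64500, "N3": 64501, "N4": 64501, "N5": 64502}
    print(split_as_view(64500, links, owners))
    # ([('N1', 'N2')], {64501: 2, 64502: 1})
```

Counting router-level links per neighboring AS gives a simple indicator of how many interconnections each AS-level adjacency represents.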
Contributions
| Category | Contribution |
| --- | --- |
| Software-as-a-Service | interactive AS ranking, e.g., AS3356's router-level graph and AS-level graph |
7. Outreach: Workshops
During the course of the contract, CAIDA hosted three annual workshops on Active Internet Measurements (AIMS) as part of our series of Internet Statistics and Metrics Analysis (ISMA) workshops. The AIMS workshops aim to advance our understanding of the potential and limitations of active measurement research and infrastructure in the wide-area Internet, and to promote cooperative solutions and coordinated strategies to address the future data needs of the network and security research communities. For three years, the workshop has fostered interdisciplinary conversation among researchers, operators, and government, focused on analysis of goals, means, and emerging issues in active Internet measurement projects. The first workshop emphasized discussion of existing hardware and software platforms for macroscopic measurement and mapping of Internet properties, in particular those related to cybersecurity. The second workshop placed more emphasis on performance evaluation and data-sharing approaches. For the third workshop (February 2011), we expanded the agenda to include active measurement topics of more recent interest: broadband performance, gauging IPv6 deployment, and measurement activities in international research networks.
Contributions
| Category | Contribution |
| --- | --- |
| Workshops | |
| Publications | |