These recommendations derive from our previous work in response to DHS S&T BAA07-09, in which we integrated strategic measurement and analysis capabilities that enabled us to provide DHS S&T with annotated topology maps of observable Internet infrastructure, as well as a powerful measurement platform capable of performing various types of Internet infrastructure assessments. We recommend further research and development of new technologies for mapping router-level and AS-level topologies that will increase their completeness, increase their accuracy, and enrich the annotations provided. We recommend implementing such techniques on distributed network research measurement infrastructure, as well as creating a new on-demand measurement functionality to address a current gap in the U.S. government's visibility into critical cyberinfrastructure. Ultimately, we recommend integrating the developed technology into a common platform to deliver richer cybersecurity-relevant knowledge to DHS than existing data sources have thus far provided. The resulting technologies and data would improve our ability to identify critical Internet resources, enhance Internet monitoring and modeling capabilities, and support the development of secure routing protocols.
Topology maps are an important tool for those who wish to describe, analyze, and model the Internet's dynamic behavior and evolution . Several topological layers (or granularities) are relevant to understanding the Internet as critical infrastructure, including fiber, IP address, router, Point-of-Presence (PoP), and ISP (AS). Router-level and PoP-level topology maps can powerfully inform and calibrate assessments of Internet infrastructure vulnerabilities [1, 3, 4, 10]. ISP-level topologies, sometimes called interdomain routing topologies, are critical to a deeper understanding of the technical, economic, policy, and security needs of the largely unregulated peering ecosystem. Regardless of which topology layer one seeks to map, epistemological obstacles pervade state-of-the-art methodologies.
For example, most research into the Internet's router-level topology rests on data sets collected using traceroute-based algorithms. Traceroute shows the sequence of router interfaces on the path from a source to a destination, and executing traceroute from multiple sources to multiple destinations reveals many router interfaces and links, although it is possible to infer false links from this data . A critical step in creating accurate maps from traceroute data is mapping IP addresses to routers, a process known as alias resolution. A router by definition has at least two interfaces, and Internet core routers often have dozens. Alias resolution determines which interface IP addresses belong to the same router, which is required to convert the IP-level topology discovered by traceroute into a more useful router-level topology.
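The conversion from an IP-level to a router-level topology can be illustrated with a small sketch. It assumes alias resolution has already produced pairs of interfaces known to share a router (this sketch does not perform alias resolution itself), uses hypothetical RFC 5737 documentation addresses, and marks non-responding hops as `None`. Interfaces are grouped with a union-find structure, and each IP-level link is rewritten as a link between the routers its endpoints belong to.

```python
"""Sketch: project an IP-level traceroute topology onto routers using alias sets.

Assumptions (illustrative only): paths use RFC 5737 documentation addresses,
None marks a non-responding hop, and alias pairs come from a separate alias
resolution step that this sketch does not implement.
"""


class UnionFind:
    """Disjoint-set forest that groups interface addresses into routers."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def ip_links(paths):
    """Consecutive responding hops in a path become undirected IP-level links.

    Hops adjacent to a gap (None) are not joined, since bridging the gap
    could infer a false link.
    """
    links = set()
    for path in paths:
        for a, b in zip(path, path[1:]):
            if a is not None and b is not None and a != b:
                links.add(frozenset((a, b)))
    return links


def router_links(paths, alias_pairs):
    """Name each router by a representative interface and collapse IP links onto routers."""
    uf = UnionFind()
    for a, b in alias_pairs:
        uf.union(a, b)
    routers = {uf.find(ip) for path in paths for ip in path if ip is not None}
    links = set()
    for link in ip_links(paths):
        a, b = tuple(link)
        ra, rb = uf.find(a), uf.find(b)
        if ra != rb:  # a link between two aliases of one router is not a router link
            links.add(frozenset((ra, rb)))
    return routers, links


# Two hypothetical traces whose middle hops are different interfaces of one router.
paths = [
    ["192.0.2.1", "198.51.100.1", "203.0.113.9"],
    ["192.0.2.1", "198.51.100.5", None, "203.0.113.9"],
]
alias_pairs = [("198.51.100.1", "198.51.100.5")]
routers, links = router_links(paths, alias_pairs)
print(len(ip_links(paths)), "IP links ->", len(routers), "routers,", len(links), "router links")
```

Without the alias pair, the two middle interfaces would appear as two distinct routers, inflating both the node count and the apparent path diversity.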
CAIDA has been measuring, analyzing, modeling, and visualizing Internet topology since 1998, and our recommendations reflect this experience. As part of previous work (BAA07-09), we integrated state-of-the-art strategic capabilities to acquire preliminary situational awareness of Internet topology structure and behavior: topology measurement; second-generation IP alias resolution techniques and other heuristic methods to convert IP-level data into router-level and AS-level topologies; and limited annotation and visualization capability . We have also architected a data mining and analysis process for the collection, curation, correlation, and statistical processing of raw data on connectivity and routing gathered from a large cross-section of the global Internet, to derive a comprehensive Internet Topology Data Kit (ITDK). Raw data sources include forward IP paths collected from traceroute-like measurement systems, BGP and geolocation information from a variety of sources, and DNS hostname information gathered in parallel with topology probing. Intermediate processing involves IP address alias resolution, geolocation of routers, extraction of AS paths from BGP data, inference of AS relationships, assignment of ASes to individual routers, and construction of an AS-level topology on top of the router-level topology to produce a dual topology. A large suite of software tools supports the collection and analysis processes.
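The last two processing steps, assigning ASes to routers and building the AS layer on top of the router layer, can be illustrated with a deliberately simple sketch. It assumes alias resolution has already grouped interfaces into routers and that each interface has an origin AS from BGP data. The majority-vote rule, router IDs, and ASNs (from the 64496-64511 documentation range) are illustrative assumptions; this is not the heuristic used to build the ITDK.

```python
from collections import Counter


def assign_router_as(router_ifaces, iface_as):
    """Label each router with the origin AS seen on most of its interfaces.

    router_ifaces: router ID -> interface addresses (output of alias resolution).
    iface_as: interface address -> origin AS (e.g. from BGP longest-prefix match).
    Interfaces without a known origin are skipped; ties go to the lowest ASN.
    """
    router_as = {}
    for router, ifaces in router_ifaces.items():
        votes = Counter(iface_as[ip] for ip in ifaces if ip in iface_as)
        if votes:
            top = max(votes.values())
            router_as[router] = min(asn for asn, n in votes.items() if n == top)
    return router_as


def as_links(router_links, router_as):
    """A router link whose endpoints carry different AS labels yields an AS link."""
    links = set()
    for r1, r2 in router_links:
        a1, a2 = router_as.get(r1), router_as.get(r2)
        if a1 is not None and a2 is not None and a1 != a2:
            links.add(frozenset((a1, a2)))
    return links


# Hypothetical routers; R2 has one interface numbered from AS64500's space,
# as often happens on interdomain point-to-point links.
router_ifaces = {
    "R1": ["192.0.2.1", "192.0.2.5"],
    "R2": ["192.0.2.9", "198.51.100.1", "198.51.100.7"],
    "R3": ["203.0.113.2"],
}
iface_as = {
    "192.0.2.1": 64500, "192.0.2.5": 64500, "192.0.2.9": 64500,
    "198.51.100.1": 64501, "198.51.100.7": 64501,
    "203.0.113.2": 64502,
}
router_as = assign_router_as(router_ifaces, iface_as)
dual_as_links = as_links([("R1", "R2"), ("R2", "R3")], router_as)
print(router_as)
print(dual_as_links)
```

Labelling R2 by its interface addresses alone would split it across two ASes; assigning one AS per router keeps the router whole, which is the point of constructing the AS layer on top of the router layer. The vote itself can still be wrong, for example when most of a border router's visible interfaces are numbered from a neighbor's address space.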
The DHS-S&T-funded Ark platform has improved cybersecurity-related situational awareness of the Internet through macroscopic active measurements, including providing the most comprehensive and coherent pictures of Internet topologies available to date, both at the AS- and router-level. The unique characteristics of the Ark platform place it in a special niche, capable of supporting a range of scientific uses from sourcing sophisticated measurements from dedicated machines, to targeting simple measurements at the Ark nodes from a diverse set of topological sources. Ark enables us to perform more complete alias resolution [1, 3] and annotate routers with AS ownership  and derive a more detailed and validated topological view than has ever previously been available for analysis. We have shared raw and curated forms of our resulting data with the research community at unprecedented scale to enable research reproducibility and correlation with other data sources. Annual workshops have enabled us to get feedback from the community and explore cooperative solutions and coordinated strategies to address future data needs of the network and security research communities [7, 8, 9].
We recommend further research and development of new technologies for mapping router-level and AS-level topologies that will increase their completeness, increase their accuracy, and enrich the annotations provided. We recommend implementing such techniques on distributed network research measurement infrastructure, as well as creating a new on-demand measurement functionality to address a current gap in the U.S. government's visibility into critical cyberinfrastructure. Ultimately, we recommend integrating the developed technology into a common platform to deliver richer cybersecurity-relevant knowledge to DHS than existing data sources have thus far provided.
We suggest several approaches to increasing the completeness of data representing the Internet core: installing new monitoring infrastructure in underserved regions; integrating new techniques to improve the efficiency and coverage of IP-level topology probing; and analyzing and correlating forward path (traceroute) data with other types of recently available reachability data to augment undersampled portions of the IP topology graph.
Increasing the accuracy of these graphs relies on a number of data sources not previously available. We recommend using the growing set of ground truth data to improve the current algorithms for identifying peering links, and we also recommend investigating and mitigating the impact of false link inferences on router/PoP-level and AS-level graphs . We recommend further design and development of a user-friendly interactive validation functionality for PoP-level map inferences, to lower the barrier to providing ground truth information. We also recommend using IP address lists from recently available Internet-scale data sets as additional input to the alias resolution process.
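Improving a peering-link inference algorithm requires a repeatable way to score it against whatever ground truth is available. The sketch below assumes links are keyed by AS pairs and labelled with hypothetical relationship strings ("p2p" for peering, "c2p" for customer-to-provider); it measures an inference algorithm rather than implementing one. Because ground truth typically covers only part of the graph, the scores are computed over links present in both sets, and coverage is reported separately.

```python
def link(a, b):
    """Undirected AS link key."""
    return frozenset((a, b))


def score_against_ground_truth(inferred, truth, label="p2p"):
    """Compare inferred link types with (partial) ground truth.

    inferred, truth: link -> relationship label (hypothetical "p2p"/"c2p").
    Accuracy, precision, and recall for `label` use only links present in both
    dicts; coverage is the fraction of ground-truth links the inference labelled.
    """
    overlap = inferred.keys() & truth.keys()
    if not overlap:
        return {"coverage": 0.0, "accuracy": None, "precision": None, "recall": None}
    correct = sum(inferred[k] == truth[k] for k in overlap)
    predicted = [k for k in overlap if inferred[k] == label]
    actual = [k for k in overlap if truth[k] == label]
    hits = sum(truth[k] == label for k in predicted)
    return {
        "coverage": len(overlap) / len(truth),
        "accuracy": correct / len(overlap),
        "precision": hits / len(predicted) if predicted else None,
        "recall": hits / len(actual) if actual else None,
    }


# Hypothetical inferences and volunteered ground truth (documentation ASNs).
inferred = {link(64500, 64501): "p2p", link(64501, 64502): "c2p",
            link(64502, 64503): "p2p", link(64503, 64504): "c2p"}
truth = {link(64500, 64501): "p2p", link(64501, 64502): "p2p",
         link(64502, 64503): "p2p", link(64504, 64505): "c2p"}
print(score_against_ground_truth(inferred, truth))
```

Tracking these numbers as the algorithm changes, and as more ground truth arrives, shows whether a modification actually improves peering-link identification or merely shifts errors between classes.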
We also recommend additional annotations on topology maps with meta-data of direct relevance to DHS's situational awareness mission, e.g., AS type, owning organization, and peering link type. We suggest research into how to extract anomalies from measurements, such as the topology and performance reports that Ark currently supports for individual monitors, e.g., performance changes to regions of interest around the world as observed across monitors in different countries. We also recommend investigating how to integrate additional information inferred from forward paths (traceroute) into topology graphs, e.g., latency, path stability, and which prefixes a given AS announces.
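One simple way to turn per-monitor performance reports into region-level anomalies is to require that a change be seen from several vantage points at once. The sketch below is a hypothetical heuristic, not an existing Ark feature: it compares each monitor's median RTT to a region in a current window against a baseline window, and reports a region only when at least `min_monitors` distinct monitors see a rise beyond `ratio`. Monitor names, region codes, thresholds, and RTT values are all illustrative assumptions; grouping by monitor country instead of monitor would be a straightforward variation.

```python
from collections import defaultdict
from statistics import median


def region_anomalies(baseline, current, ratio=1.5, min_monitors=2):
    """Flag regions whose latency rose for several monitors at once.

    baseline, current: lists of (monitor, region, rtt_ms) samples.
    A monitor sees an anomaly for a region when its current median RTT exceeds
    `ratio` times its baseline median. A region is reported only when at least
    `min_monitors` distinct monitors see it, filtering out single-vantage noise.
    """
    def medians(samples):
        groups = defaultdict(list)
        for monitor, region, rtt in samples:
            groups[(monitor, region)].append(rtt)
        return {key: median(values) for key, values in groups.items()}

    base, cur = medians(baseline), medians(current)
    seen_by = defaultdict(set)
    for (monitor, region), rtt in cur.items():
        ref = base.get((monitor, region))
        if ref is not None and rtt > ratio * ref:
            seen_by[region].add(monitor)
    return {r: sorted(ms) for r, ms in seen_by.items() if len(ms) >= min_monitors}


# Hypothetical monitors and RTT samples (milliseconds).
monitors = ("mon-us", "mon-jp", "mon-de")
baseline = [(m, "EG", 100.0) for m in monitors] + [(m, "BR", 80.0) for m in monitors]
current = [
    ("mon-us", "EG", 250.0), ("mon-jp", "EG", 240.0), ("mon-de", "EG", 105.0),
    ("mon-us", "BR", 200.0), ("mon-jp", "BR", 80.0), ("mon-de", "BR", 81.0),
]
print(region_anomalies(baseline, current))
```

Here the rise toward BR is visible from only one monitor and is treated as a local problem, while the rise toward EG is corroborated by two monitors.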
We also recommend development of: a series of improved Internet Topology Data Kits; topology visualization software to reveal insight into ownership structure, business relationships, geographic coverage, and financial indicators of ISPs; support for interactive queries regarding observable performance changes and trends from, to, and across specific regions of the world; and support for interactive user corrections of AS meta-data, such as AS category (e.g., backbone, content provider, exchange point), geolocation, and organization ownership. Geolocation requires an understanding of which external databases and possibly augmenting algorithms are the most appropriate for a needed level of precision .
We also suggest creation of a new on-demand capability to address gaps in the U.S. government's current visibility into critical cyberinfrastructure, in particular its limited ability to observe macroscopic reachability changes, such as the unreachability of an entire geographic region. This will require a user-friendly graphical user interface whereby a DHS-appointed official could request to view existing measurement results, such as "Show me all connectivity statistics from all monitors to all addresses that geolocate to Egypt, Libya, and Algeria." This new functionality would allow the user to select probing destinations by country, AS, BGP prefix, or organization, using previously developed research to map ASes to owning organizations.
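Behind such a graphical interface, a query like the Egypt/Libya/Algeria example reduces to filtering stored probe outcomes by the geolocated country of each destination and aggregating per monitor. The sketch below assumes a hypothetical prefix-to-country table (RFC 5737 documentation prefixes with made-up country assignments) and stored (monitor, destination, reached) records; a real deployment would draw on a geolocation database and on Ark's actual measurement archive, neither of which is modelled here.

```python
import ipaddress
from collections import defaultdict


def build_geo(table):
    """Parse a hypothetical prefix -> ISO country table, most specific prefix first."""
    nets = [(ipaddress.ip_network(p), cc) for p, cc in table.items()]
    return sorted(nets, key=lambda item: item[0].prefixlen, reverse=True)


def geolocate(ip, geo):
    """Return the country of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    for net, cc in geo:  # sorted by prefix length, so the first hit is the longest match
        if addr in net:
            return cc
    return None


def connectivity_by_country(results, geo, countries):
    """Summarize reachability from every monitor to destinations in `countries`.

    results: iterable of (monitor, destination IP, reached) probe outcomes.
    Returns {(monitor, country): (destinations reached, destinations probed)}.
    """
    stats = defaultdict(lambda: [0, 0])
    for monitor, dst, reached in results:
        cc = geolocate(dst, geo)
        if cc in countries:
            counts = stats[(monitor, cc)]
            counts[0] += int(reached)
            counts[1] += 1
    return {key: tuple(v) for key, v in stats.items()}


# Hypothetical geolocation table and stored probe results.
geo = build_geo({
    "192.0.2.0/24": "EG", "198.51.100.0/24": "LY",
    "203.0.113.0/24": "DZ", "203.0.113.128/25": "MA",
})
results = [
    ("mon-a", "192.0.2.10", False), ("mon-a", "192.0.2.11", False),
    ("mon-b", "192.0.2.10", True), ("mon-a", "198.51.100.3", True),
    ("mon-a", "203.0.113.5", True), ("mon-a", "203.0.113.200", True),
]
summary = connectivity_by_country(results, geo, {"EG", "LY", "DZ"})
for (monitor, cc), (ok, total) in sorted(summary.items()):
    print(f"{monitor} -> {cc}: {ok}/{total} destinations reachable")
```

Selecting destinations by AS, BGP prefix, or organization would follow the same pattern with a different lookup table in place of `geolocate`.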
There are several scientific, technical, and engineering issues that must be resolved for the proposed recommendations to succeed. First, although we have substantially improved the accuracy of the best alias resolution methods on topologies with a few million IP addresses [1, 3], extending these methods with fundamentally new technological capabilities will further improve both the accuracy and the scalability of our process. Second, mapping IP addresses to ASes, and ASes to organizations, to derive an ISP-level map is another challenging process. Since publicly available BGP repositories are biased toward revealing primary transit paths, other data are required to capture the complete interdomain connectivity of ASes, including settlement-free or partially paid peering links, backup links, and knowledge of exchange points. We recommend leveraging other data sources and relationships with cooperating ISPs to better understand this connectivity.
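The two-stage mapping (IP address to AS, then AS to organization) can be sketched as a longest-prefix match against a BGP-derived prefix-to-origin table followed by a lookup in an AS-to-organization table. Both tables, the organization names, and the ASNs below are hypothetical. The sketch assumes a single origin AS per prefix, uses a linear scan for clarity where a production system would use a radix trie, and maps the two ends of each link independently, so it inherits the known pitfall that interdomain link interfaces are often numbered from a neighbor's address space.

```python
import ipaddress


def longest_prefix_match(ip, pfx2as):
    """Return the origin AS of the most specific prefix covering `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for prefix, asn in pfx2as.items():  # linear scan, for clarity only
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, asn)
    return best[1] if best else None


def org_links(ip_links, pfx2as, as2org):
    """Map both ends of each IP link to an organization via its AS; keep inter-org links.

    Links between sibling ASes of the same organization collapse away.
    """
    links = set()
    for a, b in ip_links:
        org_a = as2org.get(longest_prefix_match(a, pfx2as))
        org_b = as2org.get(longest_prefix_match(b, pfx2as))
        if org_a and org_b and org_a != org_b:
            links.add(frozenset((org_a, org_b)))
    return links


# Hypothetical BGP-derived prefixes and AS-to-organization mapping.
pfx2as = {"192.0.2.0/24": 64500, "192.0.2.128/25": 64510,
          "198.51.100.0/24": 64501, "203.0.113.0/24": 64502}
as2org = {64500: "ExampleNet", 64510: "ExampleNet",  # sibling ASes
          64501: "TransitCo", 64502: "EdgeISP"}
ip_links = [("192.0.2.1", "192.0.2.130"), ("192.0.2.130", "198.51.100.1"),
            ("198.51.100.1", "203.0.113.7")]
print(org_links(ip_links, pfx2as, as2org))
```

The first link joins two sibling ASes of the same organization and therefore disappears from the ISP-level map, which is why the AS-to-organization step matters for an ISP-level view.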
Finally, acquisition of ground truth data for validation is widely recognized as the greatest obstacle to Internet topology research. The current instantiation of our AS Rank service has now gathered an unprecedented quantity of AS-level ground truth volunteered by representatives of ASes via our interactive interface.
[Disclosure: CAIDA is proposing a form of the above recommendations to DHS S&T's new Cybersecurity BAA 11-02 TTA-07.]
[1] Ken Keys, "Internet-Scale IP Alias Resolution Techniques", ACM SIGCOMM Computer Communication Review (CCR), vol. 40, no. 1, pp. 50-55, January 2010.
[2] Bradley Huffaker, Marina Fomenkov, kc claffy, "Geocompare: a comparison of public and commercial geolocation databases", Technical Report, May 2011. Presented at the Network Mapping and Measurement Conference (NMMC), May 2011.
[3] Ken Keys, Young Hyun, Matthew Luckie, kc claffy, "Internet-Scale IPv4 Alias Resolution with MIDAR: System Architecture", Technical Report, May 2011.
[4] Bradley Huffaker, Amogh Dhamdhere, Marina Fomenkov, kc claffy, "Toward Topology Dualism: Improving the Accuracy of AS Annotations for Routers", Proceedings of the Passive and Active Measurement Conference (PAM), Lecture Notes in Computer Science, vol. 6032, pp. 101-110, April 2010.
[5] Matthew Luckie, Amogh Dhamdhere, kc claffy, David Murrell, "Measured Impact of Crooked Traceroute", ACM SIGCOMM Computer Communication Review Online (CCR Online), January 2011.
[6] kc claffy, Young Hyun, Ken Keys, Marina Fomenkov, Dmitri Krioukov, "Internet Mapping: from Art to Science", Proceedings of the IEEE DHS Cybersecurity Applications and Technologies Conference for Homeland Security (CATCH), pp. 205-211, March 2009.
[7] kc claffy, Marina Fomenkov, Ethan Katz-Bassett, Robert Beverly, Beverly A. Cox, Matthew Luckie, "The Workshop on Active Internet Measurements (AIMS) Report", editorial in CCR Online and in ACM SIGCOMM Computer Communication Review (CCR), vol. 39, no. 5, pp. 32-36, October 2009.
[8] kc claffy, Emile Aben, Jordan Augé, Robert Beverly, Fabian Bustamante, Benoit Donnet, Timur Friedman, Marina Fomenkov, Peter Haga, Matthew Luckie, Yuval Shavitt, "The ISMA 2010 AIMS-2 - Workshop on Active Internet Measurements Report", ACM SIGCOMM Computer Communication Review Online (CCR Online), October 2010.
[9] kc claffy, "The ISMA 2011 3rd Workshop on Active Internet Measurements (AIMS-3) Report", February 2011.
[10] Pascal Mérindol, Benoit Donnet, Jean-Jacques Pansiot, Matthew Luckie, Young Hyun, "MERLIN: MEasure the Router Level of the INternet", Technical Report 2010-3, September 2010.
[11] Srinivas Shakkottai, Marina Fomenkov, Ryan Koga, Dmitri Krioukov, kc claffy, "Evolution of the Internet AS-Level Ecosystem", European Physical Journal B, vol. 74, no. 2, pp. 271-278, March 2010. First presented at the First International Conference on Complex Sciences: Theory and Applications (Complex'2009).