2 Performance Goals
3 Detailed Technical Approach
3.1 Archipelago (step 1a)
3.2 Alias resolution (step 1b)
3.3 AS mapping (step 1c)
3.4 Dual AS+router-level topology construction (step 1d)
3.5 Topology analysis software (step 1e)
3.6 Annotations and AS relationships (step 2a)
3.7 Geographic locations and latencies (step 2b and 2c
3.9 Visualizations, step 2d
(see PDF for full proposal)
1 ProposalThe Regents of the University of California; University of California, San Diego on the behalf of San Diego Supercomputer Center under the program Cooperative Association for Internet Data Analysis (CAIDA) offer the technical proposal that has as its main deliverable, periodic updates for router- and AS-level Internet topologies integrated into the dual-layer router+AS-level topologies, and richly annotated with AS business relationships, geographic, latency, etc., attributes. To achieve this main task, the project will also deliver a new Internet topology data acquisition infrastructure and Internet topology data processing, analysis, annotation, and generations software. We describe the details of our technical approach in Section 3.
2 Performance GoalsThe proposed work directly targets the goals and deliverables outlined in the BAA, in particular in TTA5: Internet Tomography/Topology. The resulting technologies and data will increase our ability to understand the structure, dynamics, and vulnerabilities of Internet topology that includes U.S. critical infrastructure.
3 Detailed Technical ApproachOur main deliverable will be periodic updates of richly annotated dual router+AS-level topologies of Internet infrastructure. To achieve this goal we will use a tightly integrated methodology consisting of the following building blocks, described in detail in the following section.
Data acquisition and analysis.
- scamper data collection (IP-level) on archipelago. Use our traceroute-based scamper measurement tool on archipelago, our new active measurement platform, to continuously collect raw traceroute data.
- Alias resolution (IP®router-level). Resolve IP interface addresses observed in scamper topology measurements to common routers using state-of-the-art alias resolution techniques. Provide an evaluation of the relative accuracy of available techniques.
- AS mapping (IP®AS-level). Map IP-links (pairs of observed adjacent IP interfaces) to AS-links (pairs of BGP-peering ASes) via BGP tables using the CAIDA-developed ASFinder tool.
- AS+router-level merge. Merge the router-level topologies from step 1b and AS-level topologies from step 1c into integrated dual-layer AS+router-level topologies using our powerful dK-series framework.
- Topology characteristics. Release software that analyzes the most important and definitive topology characteristics, including those indicative of vulnerability to various forms of attacks.
- IPv6 measurements prototype. Prototype IPv6 topology data collection using scamper and archipelago
Topology annotations and visualizations.
- AS relationships and taxonomy. Annotate the AS-level topologies obtained at step 1c with AS business relationships and AS types.
- Geolocation. Annotate merged router+AS-level topologies with node geolocations, using the best available techniques and our dK-series method.
- Latencies (optional). Annotate merged router+AS-level topologies with link latencies, using the best available traceroute data and the dK-series method.
- Visualizations. Develop new techniques and adapt existing ones, e.g., Walrus, to visualize our richly annotated topologies.
- Topology comparisons. Using the set of important metrics defined in step 1e, compare the archipelago-, skitter-, and DIMES-extracted topologies and relate the differences to specifics of the associated datasets.
- AS-ranking++ Improve and enrich our AS-ranking suite, as-rank.caida.org
- Topology generator. Release a topology generator producing annotated AS+router-level topologies for analyses and "what-if" experiments.
- Telco hotel data integration. If the telco hotel data mentioned in the BAA is made available, integrate the knowledge into the AS-level map.
- iffinder. The CAIDA's iffinder tool implements one of the first IP alias resolutions techniques introduced in the Mercator project . The tool sends UDP probe packets to all or a subset of IP addresses seen in the traces with destination UDP ports set to presumably unused values. If router R receives such a packet from prober P destined to one of R's IP interfaces, X, while R's route back to P goes via some other of R's IP interfaces, Y, then R is supposed to reply to P with an ICMP Port Unreachable message with its source address set to Y. Prober P can thus conclude that X and Y belong to the same router.
The ally tool is a part of the Rocketfuel tool. It uses a combination of techniques, all providing some level of confidence that two IP addresses are configured on the same router. Ally resolution techniques include:
- checking source addresses of responses to probe packets as described above;
- checking if IP identifier fields in pairs of response packets are approximately consecutive;
- utilizing the ICMP rate limiting function, i.e., the feature that the router responds only to the first probe of a burst of probe packets.
- checking for differences in TTL values: significantly different TTL values indicate that a pair of IP addresses are not aliases.
- AAR and APAR. More recent are the the Analytical and Probe-based Alias Resolvers (AAR and APAR). The idea behind both techniques is to recognize the structure of the set of IP addresses observed in traces against common IP address assignment schemes. For example, IP addresses configured on point-to-point interfaces often belong to either /30 or /31 subnets. Given this observation, we can check for the boundary IP addresses in such /30 and /31 subnets in the original and reverse traces, thus inferring which IP addresses are likely to be configured on the same router. For example, if the direct trace is two IP addresses X,Y, while the reverse trace is Y¢,X¢, and both pairs (X,X¢) and (Y,Y¢) belong to the same /30 or /31 subnets, then we can conclude that X and Y¢ are configured on the same router. The authors claim that this approach is more accurate, efficient, and simpler than all other existing techniques.
- map3: router ID ® a set of AS numbers announcing IP addresses that are part of this router ID.
- use the dK-series extended for dual graph (see Section 3.8 for more details);
- filter out all AS links not present in the AS-level graph;
- filter out AS links violating routing polices imposed by inferred AS relationships;
- assign routers to ASes maximizing the number of valid paths according to inferred AS relationships;
- assign routers to ASes based on AS sizes estimated by AS degree or by the total number of observed IP addresses advertised by this AS;
- assign routers to ASes based on the number of IP addresses attached to the same router and advertised by the same AS;
- combinations of the above.
- Spectrum. The smaller the number of alternative paths between a pair of communicating nodes in the network, the easier it is for an attacker to sever all these paths to disrupt the communication. The average path diversity can be estimated as the minimum number of links one needs to cut to decompose the network into isolated pieces of approximately equal size. The spectrum of the topology graph representing a network, i.e., a collection of the eigenvalues of its adjacency matrix, provides tight estimates for this min-cut number.
- Betweenness. The number of shortest paths passing through a particular node or link, called betweenness, is a measure of its communication importance. The higher the betweenness of a node or link, the more communication paths will be disrupted and/or re-routed if an attacker interrupts the normal operation of this node or link.
- Assortativity. All other things equal, networks with more links connecting high-degree nodes, i.e., nodes directly connected to many other nodes, to low-degree nodes are more vulnerable than networks with links interconnecting nodes of similar degrees. The former are called disassortative, while the latter are assortative. The reason that that the min-cut of disassortative network is smaller is that disassortative networks have fewer links in the network core than assortative networks do. Because links in the core interconnect high-degree network communication hubs, severance of such links decomposes a network into disconnected islands. The smaller the number of such links, the easier it is for an attacker to disrupt communications of the global network.
- customer-to-provider (c2p), connecting customer and provider ASes;
- peer-to-peer (p2p), connecting two peer ASes; and
- sibling-to-sibling (s2s), connecting two sibling ASes.
- exporting routes to a provider or a peer, an AS advertises its local routes and routes received from its customer ASes only;
- exporting routes to a customer or a sibling, an AS advertises all its routes, i.e., its local routes and routes received from all its AS neighbors.
- uphill zero or more c2p links, followed by
- top zero or one p2p links, followed by
- downhill zero or more p2c links,
- shortest paths between nodes are shorter than in reality;
- path diversity is larger than in reality;
- traffic load on nodes and links is lower than in reality.
3.8 dK-seriesThroughout the proposed work we will use the powerful dK-series framework that we introduced in P. Mahadevan, D. Krioukov, K. Fall, and A. Vahdat, Systematic topology analysis and generation using degree correlations, in SIGCOMM, 2006. The dK-series formalism provides a calculus for analysis of correlations within a network topology. In its simplest form, dK-series are simply correlations among node degrees. More formally, the dK-series for a given network topology is a collection of dK-distributions defined as correlations of degrees of nodes within subnetworks of size d of the given network. Our most recent work with dK-series extends the formalism to apply to correlations among any kind of node or link annotations, with node degrees being the simplest case of node annotations. The dK-series formalism is elegantly simple and generic in nature. It provides a rigorous analytical framework and unprecedented power to create, analyze, and compare annotated dual AS+router-level topologies. Specifically, we consider the following three concrete examples of how we will use dK-series at different stages of the project:
- Topology analysis (step 1e). As discussed in Section 3.5 and in our previous work, structural topology characteristics such as spectrum, betweenness, distance distribution, assortativity, clustering, are related to the robustness of the topology, defined in terms of minimal number of links or nodes that must be removed in order to fragment a network. Our dK-series approach introduces a basis that unifies all these metrics-as well as any new metrics-into a single series of metrics. The dK-series method allows for evaluation of the robustness of a given topology without having to compute all these metrics - good news since computation of some of them can be prohibitively hard. Instead, we can limit computations to appropriate dK-distributions and as soon as they match the dK-distribution of some point-of-reference topology whose robustness properties are already known, we can rely on the dK-randomness theory to conclude that robustness properties of the given topology are approximately the same.
AS+router-level quasi-duality (step 1d).
The input to step 1d is scamper (traceroute) data processed at steps 1b and 1c to derive router- and AS-level topologies. We thus have information on:
- number n(k) of routers of degree k;
- number n(k,k¢) of router-links connecting routers of degrees k and k¢;
- number N(K) of ASes of degree K;
- number N(K,K¢) of AS-links connecting ASes of degrees K and K¢;
- number N(K,K¢;k) of AS-links connecting ASes of degrees K and K¢, and going through a router of degree k;
- number n(k,k¢;K) of router-links connecting routers of degrees k and k¢, and addressed from IP address space advertised by an AS of degree K;
- Geolocations and latencies (step 2b and 2c). One type of correlation in topologies annotated with geolocations and latencies is between geographic distance and inferred latencies of links. Drastic anti-correlation between two statistics indicates problems with either geolocation mappings, or latency inferences, or both. We can use this type of correlation as a cross-validation tool to verify our inferences (cf. Section 3.7).