The Internet is composed of thousands of ISPs that operate individual parts of the Internet infrastructure. ISPs engage in both formal and informal relationships to collectively and ubiquitously route traffic in the Internet. These relationships are usually realized in the form of business agreements that translate into engineering constraints on traffic flows within and across individual networks participating in the global Internet routing system.
Accurate data on the structure of actual relationships among ASes is required for many research efforts concerned with performance, robustness, and evolution of the global Internet. Examples of both research and operational tasks that cannot neglect AS relationships include:
- realistic simulations trying to model path inflation effects caused by routing policies;
- understanding how packets are routed in the Internet and how to optimize Internet paths by analyzing existing deficiencies;
- development of more scalable interdomain routing protocols and architectures, like HLP, that take into account the structure of AS relationships to optimize their performance;
- evaluating how AS relationships affect the evolution of the Internet infrastructure and constructing economy-based models of the global Internet growth;
- analysis of the spectrum of BGP configuration scenarios in order to develop more expressive routing protocols and configuration languages;
- inference of AS paths in the Internet;
- modeling the structure of routing tables and developing synthetic routing tables needed for simulations of routing table lookup algorithms;
- development of better topology generators that account for the topological idiosyncrasies associated with AS relationships;
- selection of data centers for server replicas by measuring the origin of traffic to existing servers and evaluating connectivity and AS relationships of candidate data centers; and
- selection of peers or upstream providers based on connectivity and AS relationships of candidate ISPs.
figure 1. Types of AS relationships. The ASes at the bottom of the graph, D, E, and F, are customers of those above. ISPs in the middle, B and C, are both providers of ASes below and customers of ISPs above. ISPs B and C are also peers of each other. ISP A at the top is a provider to B and C and a customer of no one.
Although business agreements between ISPs can be complicated, the original model introduced by Gao 1 abstracts business relationships into the following three most common types:
- customer-to-provider (c2p) (or if looked at from the opposite direction, provider-to-customer p2c),
- peer-to-peer (p2p), and
- sibling-to-sibling (s2s)
The justification for this classification is that an AS must buy transit services for any traffic destined to parts of the Internet that this AS neither owns nor can reach through its customers. In Figure 1, arrow directions reflect flows of money -- ASes at lower levels are customers who pay ISPs (providers) at higher levels in exchange for access to the rest of the Internet, also known as transit. We refer to links between a customer and a provider as c2p (p2c) links. In Figure 1, D->B, E->B, F->C, B->A, and C->A are c2p links.
A p2p link connects two ISPs who have agreed to exchange traffic on a quid pro quo basis. Peers should exchange traffic only between each other and each other's customers. Peering allows growing ISPs to save money on transit costs they would otherwise have to pay to deliver traffic to/from their customers. In Figure 2, B-C is a p2p link, unidirectional since neither B nor C pays the other for the traffic they exchange.
An s2s link connects two ASes with a common administrative boundary. Such links usually appear as a result of mergers and acquisitions, or under certain network management scenarios.
figure 2. The top two paths 1 and 2 are valid, while the bottom two 3 and 4 are invalid.
We use the notion of money transfers between ASes to define valid and invalid AS paths. A valid path between source and destination ASes is one in which for every ISP providing transit (a transit provider), there exists a customer immediately adjacent to the ISP in the AS path. An invalid path has at least one transit provider not paid by a neighbor in the path.
In figure 2 the top two examples are valid paths, while the bottom two are invalid. In Example 1 the transit providers are A, B, and C. ISPs B and C pay to A, D pays to B, and F pays to C. In Example 2 the transit providers are B and C, and they are paid by D and F, respectively. In contrast, in Example 3 the transit provider is B, but not only does no one pay B, but B itself pays both A and Z. Example 4 also illustrates a situation where nobody pays transit provider B.
In other words, a valid path must have the following valid path pattern: zero or more c2p links, followed by zero or one p2p link, followed by zero or more p2c links. In addition, s2s links can appear in any number anywhere in the path.
History of inference algorithms
Gao's1 pioneering work inspired many researchers to seek approaches to inferring ISP business relationships using information from publicly available BGP routing tables. Gao used the concept of valid paths as the basis for her inference heuristic and identified the "top" provider in a given path based on AS degree (the number of ASes with observed connectivity to a given AS).
Subramanian et al.2 provided a more elegant mathematical formulation based on the concept of valid paths, but they simplified the problem by not inferring s2s links. Using maximization of valid paths as an objective, they formulated the AS relationship inference problem as a combinatorial optimization problem: given an undirected graph G derived from a set of BGP paths P, assign the edge type (c2p or p2p) to every edge in G such that the total number of valid paths in P is maximized. They called the problem the type-of-relationship (ToR) problem, conjectured that it is NP-complete, and provided a heuristic solution.
Di Battista et al.3 and independently Erlebach et al.4 proved that the ToR problem is indeed NP-complete. The latter proved also that it is even harder, APX-complete. More importantly for practical purposes, both studies demonstrated that p2p links cannot be inferred in the ToR problem formulation, and they developed mathematically rigorous approximate solutions to the ToR problem but inferred only c2p and p2c links. No technique thus far reliably identifies p2p links.
Dimitropoulos, et al.6 identified still other issues with the ToR formulation, like the random breaking of ties which can yield obviously incorrect inferences, e.g., well-known large providers are inferred as customers of small ASes. In the first paper6 we handled this issue with multiobjective optimization techniques that incorporated AS degree into the inference. In a subsequent paper7 we introduced improved algorithms that determine not only c2p but also p2p links (for those we can detect from BGP data). These improvements achieved more accurate AS relationship inferences, which we demonstrate against ground truth for a set of ASes. Benjamin Hummel and Sven Kosub 8 introduced the idea that the resulting graph should be acyclic, i.e. should contain no cycles, and presented a new algorithm that does the assignment and reduces the number of cycles. Our current technique creates no cycles, but at the cost of "valid" (valley-free) paths in the graph. M. Luckie et al.9 dropped the valley-free assumption and instead relied on three assumptions about the Internet's inter-domain structure: (1) an AS enters into a provider relationship to become globally reachable; and (2) there exists a peering clique of ASes at the top of the hierarchy, and (3) there is no cycle of p2c links.
One metric of the resulting AS relationship graph that allows comparison across ASes is the customer cone -- the set of ASes, IPv4 prefixes, or IPv4 addresses that can be reached from a given AS following only customer links.
Looking specifically at the AS customer cone, we define an AS A's AS customer cone as the AS A itself plus all the ASes that can be reached from A following only p2c links in BGP paths we observed. In other words, A's customer cone contains A, plus A's customers, plus its customers' customers, and so on.
Each AS announces a set of IPv4 prefixes. Each IPv4 prefix represents a set of contiguous IPv4 addresses which are routed as a unit. Prefixes can be nested, with the most specific prefix used for routing over less specific prefixes. To find the set of prefixes which are reachable in AS A's IPv4 prefix customer cone create the union of prefixe announced by all ASes found in AS A's AS customer cone. AS A's IPv4 address customer cone is the set of addresses covered by AS A's IPv4 prefix customer cone. Prefixes overlap, which represent a set of IPv4 addresses.
The size of the customer cone of an AS reflects the number of other elements (ASes, IPv4 prefixes, or IPv4 addresses) found in it's set. An AS in the customer cone is assumed to pay, directly or indirectly, for transit, and provides a coarse metric of the size or influence of an AS in the routing system.
figure 3 depicts several AS customer cones, ASes D, E, F , and I all sit at the bottom of the hierarchy and so only have a single AS in their cone. H ranks a little bit higher with 2 ASes. Both C and B tie with 3 ASes. Note that B and C both have E in their respective cones. A is ranked at the top of the hierarchy with 6 ASes in its customer cone.
ASes with large customer cones play an important role in the Internet's capital and governance structure. At the top of this hierarchy are ISPs commonly known as Tier-1 ISPs, which do not pay for transit to upstream providers at all; instead they peer with each other to provide connectivity to all destinations in the Internet. At the bottom of the hierarchy are customer ASes who do not have their own customers and pay providers to reach all destinations in the Internet.
We define peering cone size ratio as the ratio in customer cone sizes of a pair of ASes if they (hypothetically) peered. Similar customer cone sizes will have this ratio closer to 100, also an indication the ASes have more incentive to peer. The closer this ratio is to zero, the larger the difference in customer cone sizes, and the less incentive the larger provider will have to peer with the smaller.
If a given link (AS relationship) is currently a customer link and changes to a peering link, then the provider-turned-peer's customer cone will likely shrink, because the cone will lose any customer ASes that the given AS used to access through that customer-turned-peer. The customer-turned-peer's cone size will not change since the provider-turned-peer is not included in the customer cone. To compare magnitude of differences, the peering cone size ratio always uses the larger customer cone as the denominator. For example, for AS pair S and N, with customer cone sizes C(S) and C(N), respectively , if C'(S) and C'(N) were their respective customer cone sizes if S and N became p2p peers (with other links unchanged), then the peering cone size ratio is C'(N)/C'(S) if C'(S) > C'(N), otherwise C'(S)/C'(N).
figure 4 illustrates how different relationships affect the customer cone sizes of AS A and B. If the original graph had B as a customer of A then A's cone contains 7 ASes: A,B,C,D,E,F,G. B's cone contains three ASes:B,F,G. If the link between A and B is changed to a peering link, A loses customers B and G, which it had access to exclusively through its customer relationship with B. A's cone does not lose F, since it can still reach it through its customer relationship with C. A's cone size thus shrinks to 4 ASes:A,C,D,F. Since AS B did not previously reach any customers through A, its customer cone is unaffected by this change.
ISPs maintain a set of points-of-presence (PoPs) -- locations where ISPs have routers/servers and related equipment/personnel -- across the world. The refer to the set of these PoP locations as the ISP's geographic footprint . Geographic footprint is an important part of peer selection, as an ISP can only peer with other ISPs at locations where both ISPs have a PoP.
To infer an ISP's geographic footprint, we start with the set of prefixes in the BGP tables from the Route Views project routeview2 node and RIPE NCC's rcc12 node. We then break down these prefixes into the smallest set of IP addresses which Netacuity maps to the same geographic location. We set this as the lower bound on the number of metro areas in which the ISP has presence.
Caveats and Limitations
Although we know of no more rigorous empirical analysis of macroscopic Internet topology enriched with AS relationships, we recognize that resource limitations constrain the quality of the science we can do.
- AS relationships are more complex than allowed for in our approach. The semantics of routing relationships between the same two ASes can differ by peering location or even by prefix; our model oversimplifies these cases by assigning a single relationship to each pair of ASes.
- A truly accurate picture of the Internet topology would require collection of data from every AS, while our automated ranking procedure is limited to the measurement points publicly available at Route Views.
- As in all analyses of massive datasets, our heuristics have a number of associated external parameters. We fine tune the values of these parameters based on our pre-existing (but limited) notion of the correct answer as well as experience with the algorithm that suggests auspicious ranges.
More monitor points, more probing, cross-correlative analysis in conjunction with other sources of data, and more powerful data processing techniques to support larger topology samples would improve the integrity and utility of the relationship-based AS ranking.
Inferred AS Relationships Dataset
CAIDA provides two AS Relationship datasets: serial-1 and serial-2. The 'serial-1' directory contains AS relationships inferred from BGP using a method similar to the method described in "AS Relationships, Customer Cones, and Validation". Serial-2 adds links inferred from BGP communities using the method described in "Inferring Multilateral Peering" and traceroute.
Serial-1 data is available from 1998 to present, with one file created per month. Each file contains a full AS graph derived from RouteViews and RIPE RIS BGP table snapshots taken at 24-hour intervals over a 5-day period. The AS relationships available are customer-provider (and provider-customer in the opposite direction), and peer-to-peer. See the README in the Serial-1 data directory for details of the file formats.
- Extract and clean AS paths from the BGP table snapshots.
- Infer the ASes that form a clique of transit-free networks at the top of the AS topology (Tier-1 ASes), or use a supplied list of these ASes.
- Break AS paths into triplets, and use these triplets to infer customer-provider relationships.
- Infer peer-to-peer relationships for links that the previous step did not annotate with a customer-provider relationship.
Serial-2 Data is available from October 2015 to present, with one file created per month. In addition to the links from the serial-1 graph, we add AS links inferred from BGP communities collected from IX looking glass servers collected in a single day and traceroute data collected on the same day from CAIDA's ark monitors.
To do this we first infer which AS owns each router independent of the interface addresses observed at that router. The ownership inferences are based on IP-to-AS mapping derived from public BGP data, list of peering prefixes from PeeringDB, and the previously inferred business AS relationships. Then we convert the observed IP path into an AS path using the router ownership information (rather than mapping each observed IP to AS directly) and retain the first AS link in the resulting path for the AS graph.
Links discovered in this way are assumed to be peering links, since customer provider links are normally visible in the Routeviews BGP tables.
- Collect BGP communites from IX looking glass servers.
- Infer peering links between pairs of AS which accept routes from each other.
- Collect archived BGP data from Routeviews and RIPE RIS.
- Infer peering links at points in the observed AS paths that cross an known IX.
- Collect traceroutes from ark monitors.
- Convert the IP path to AS path using inferred ownership and keep the first AS link in the path.
- Merge all newly inferred links to the serial-1 graph as peering links
For details of the algorithms used to infer AS relationships, see the following papers:
- AS Relationships: Inference and Validation
- Inferring AS Relationships: Dead End or Lively Beginning?
- Inferring Multilateral Peering
- AS Relationships, Customer Cones, and Validation
- Inferring Complex AS Relationships
- IPv6 AS Relationships, Cliques, and Congruence
For more information, email email@example.com.
Acceptable Use Agreement
Please read the terms of the CAIDA Acceptable Use Agreement (AUA) for Publicy Accessible Datasets below:
When referencing this data (as required by the AUA), please use:
The CAIDA AS Relationships Dataset, <date range used>You are required to report your publications using this dataset to CAIDA.
- Access the publicly available CAIDA AS Relationships Dataset
- AS Rank: AS relationships are used to determine the ranking of ASes based on the size of customer cones. The rankings are available through the AS Rank web application.
- The paper AS Relationships, Customer Cones, and Validation further describes the concepts covered on this page.
- In the paper Revealing the Autonomous System Taxonomy: The Machine Learning Approach, AS relationships play an important role in classification of ASes into the following broad categories: large ISPs, small ISPs, customer networks, university networks, Internet exchange points, and network information centers. An empirical AS taxonomy based on these categories is available for download at the Autonomous System Taxonomy Repository page.
- In the paper Stable and Practical AS Relationship Inference with ProbLink, a probabilistic AS relationship inference algorithm called ProbLink was explored to overcome the challenges in inferring hard links, such as non-valley-free routing, limited visibility, and non-conventional peering practices.
- L. Gao, On Inferring Autonomous System Relationships in the Internet, IEEE/ACM Transactions on Networking, December 2001.
- L. Subramanian, S. Agarwal, J. Rexford, and R. H. Katz, Characterizing the Internet Hierarchy from Multiple Vantage Points, IEEE INFOCOM, 2002.
- G. Di Battista, M. Patrignani, and M. Pizzonia, Computing the Types of the Relationships between Autonomous Systems, IEEE INFOCOM, 2003.
- T. Erlebach, A. Hall, and T. Schank, Classifying Customer-Provider Relationships in the Internet, Proceedings of the IASTED International Conference on Communications and Computer Networks (CCN), 2002.
- J. Xia and L. Gao, On the Evaluation of AS Relationship Inferences, IEEE Globecom, 2004.
- X. Dimitropoulos, D. Krioukov, B. Huffaker, kc claffy, and G. Riley, Inferring AS Relationships: Dead End or Lively Beginning, International Workshop on Efficient and Experimental Algorithms (WEA), 2005.
- X. Dimitropoulos, D. Krioukov, M. Fomenkov, B. Huffaker, Y. Hyun, kc claffy, and G. Riley, AS Relationships: Inference and Validation, ACM SIGCOMM Computer Communication Review (CCR), v.37, n.1, pp.29-40, 2007.
- B. Hummel and S. Kosub Acyclic Type-of-Relationship Problems on the Internet: An Experimental Analysis , In Proceedings of the 7th ACM SIGCOMM Internet Measurement Conference (IMC'2007), pages 221-226. ACM Press, New York, NY, 2007.
- M. Luckie, B. Huffaker, k. claffy, A. Dhamdhere, and V. Giotsas, AS Relationships, Customer Cones, and Validation , in Internet Measurement Conference (IMC), Oct 2013, pp. 243--256.