The Internet is composed of thousands of ISPs that operate individual parts of the Internet infrastructure. ISPs engage in both formal and informal relationships to collectively and ubiquitously route traffic in the Internet. These relationships are usually realized in the form of business agreements that translate into engineering constraints on traffic flows within and across individual networks participating in the global Internet routing system.
Accurate data on the structure of actual relationships among ASes is required for many research efforts concerned with performance, robustness, and evolution of the global Internet. Examples of both research and operational tasks that cannot neglect AS relationships include:
- realistic simulations trying to model path inflation effects caused by routing policies;
- understanding how packets are routed in the Internet and how to optimize Internet paths by analyzing existing deficiencies;
- development of more scalable interdomain routing protocols and architectures, like HLP, that take into account the structure of AS relationships to optimize their performance;
- evaluating how AS relationships affect the evolution of the Internet infrastructure and constructing economy-based models of the global Internet growth;
- analysis of the spectrum of BGP configuration scenarios in order to develop more expressive routing protocols and configuration languages;
- inference of AS paths in the Internet;
- modeling the structure of routing tables and developing synthetic routing tables needed for simulations of routing table lookup algorithms;
- development of better topology generators that account for the topological idiosyncrasies associated with AS relationships;
- selection of data centers for server replicas by measuring the origin of traffic to existing servers and evaluating connectivity and AS relationships of candidate data centers; and
- selection of peers or upstream providers based on connectivity and AS relationships of candidate ISPs.
Although business agreements between ISPs can be complicated, the model introduced by Gao [GAO] abstracts business relationships into the following three most common types: customer-to-provider (c2p) (or if looked at from the opposite direction, provider-to-customer p2c), peer-to-peer (p2p), and sibling-to-sibling (s2s).
The justification for this classification is that an AS must buy transit services for any traffic destined to parts of the Internet that this AS neither owns nor can reach through its customers. In Figure 1, where arrow directions reflect flows of money, ASes at lower levels pay ISPs at higher levels in exchange for access to the rest of the Internet. The customer ISP pays the provider ISP for transit. Links between a customer and a provider are c2p (p2c) links. In Figure 1, examples of c2p links are D->B, E->B, F->C, B->A, and C->A.
A p2p link connects two ISPs who have agreed to exchange traffic on a quid pro quo basis. Peers exchange traffic only between each other and each other's customers. This relationship allows peering ISPs to save money on transit costs they would otherwise have to pay to their transit providers for such traffic. In Figure 1, B-C is a p2p link. Note the lack of direction of the link, indicating that neither B nor C is paying each other for the traffic they exchange.
An s2s link connects two ASes administratively belonging to the same ISP. Such links usually appear as a result of mergers and acquisitions, or under certain network management scenarios.
We also use the notion of money transfers between ASes to define valid and invalid AS paths. A valid path between source and destination ASes is one in which for every ISP providing transit (a transit provider), there is a payee. The payee of the transit provider must be its immediate neighbor in the path. An invalid path is one in which there is at least one transit provider not paid by a neighbor in the path.
In Figure 2 the top two examples are valid paths, while the bottom two are invalid. In Example 1 the transit providers are A, B, and C. ISPs B and C pay to A, D pays to B, and F pays to C. In Example 2 the transit providers are B and C, and they are paid by D and F respectively. In contrast, in Example 3 the transit provider is B, but not only does no one pay B, but B itself pays both A and Z. Example 4 also illustrates a situation where nobody pays transit provider B.
We conclude that a valid path must have the following valid path pattern: zero or more c2p links, followed by zero or one p2p link, followed by zero or more p2c links. In addition, s2s links can appear in any number anywhere in the path.
History of inference algorithms
Service providers consider the policy details of their business relationships as proprietary information and do not generally make them public. Therefore, Internet researchers have to rely upon indirect AS relationship inference algorithms in order to build a picture of Internet business structure.
Gao's pioneering work [GAO] inspired many researchers to seek approaches to inferring ISP business relationships using information from publicly available BGP routing tables. Gao used the concept of valid paths as the basis for her inference heuristic and identified the top provider in a given path based on AS degree (the number of ASes connected to a given AS).
Subramanian et al. [SARK] slightly relaxed the problem by not inferring s2s links, and provided a more elegant mathematical formulation based on the concept of valid paths. Assuming maximization of the number of valid paths as a natural objective, they formulated the AS relationship inference problem as a combinatorial optimization problem: given an undirected graph G derived from a set of BGP paths P, assign the edge type (c2p or p2p) to every edge in G such that the total number of valid paths in P is maximized. [SARK] called the problem the type-of-relationship (ToR) problem, conjectured that it is NP-complete, and provided a heuristic solution.
Di Battista et al. [DPP] and independently Erlebach et al. [EHS] proved that the ToR problem is indeed NP-complete. EHS proved also that it is even harder, APX-complete. More importantly for practical purposes, both DPP and EHS demonstrated that p2p links cannot be inferred in the ToR problem formulation and developed mathematically rigorous approximate solutions to the ToR problem but inferred only c2p and p2c links. We note that neither [SARK] nor [GAO] technique offers a solution to the problem of reliable identification of p2p links due to their low accuracy as demonstrated by Xia et al. [XG].
In addition to its inability to infer p2p links, there are other issues with the ToR formulation that we identified in [DKH1]. In particular, for some links either relationship (c2p or p2c) results in the same number of invalid paths. As a result, ToR labels such links randomly, classifying them as c2p or p2c with 50%-50% probability. In some cases this approach leads to obviously incorrect inferences, e.g., well-known large providers are inferred as customers of small ASes. In [DKH1] we resolved this issue by using multiobjective optimization techniques incorporating both the notion of valid paths and AS importance as reflected in AS degree. In [DKH2] we introduce new improved algorithms to determine not only c2p but also p2p links. These improvements achieve high levels of accuracy of AS relationship inference as we demonstrate via direct validation with network administrators of a set of ASes.
One useful view of the resulting relationship graph is to examine the customer cone -- the set of ASes that can be reached from each AS following only its customer links. The size of the customer cone of an AS reflects the number of ASes that pay, directly or indirectly for transit, and provides a better metric of the size of an AS than its degree. A naive method to maximize the number of valid paths creates Strongly Connected Components (SCC)s in the graph [HK]. By definition, an AS i can reach every other AS j in the SCC by following customer links, and hence it can also reach every AS in the customer cone of AS j. In all cases we examined this SCC eventually contains all Tier 1 ASes, making it impossible to differentiate between them. We are currently working on a new method which prevents the creation of an SCC, albeit at the cost of reducing the number of valid paths from 99.6% to around 97%.
We implemented the following methodology of the relationship-based AS ranking:
- Build an AS-level graph of the Internet from publicly available BGP table data and annotate links in this graph with inferred AS relationships.
- Define the AS customer cone of an AS A as the AS A itself plus all the ASes that can be reached from A following only p2c and s2s links (but not c2p or p2p!). In other words, A's customer cone is A, plus A's customers, plus its customers' customers, and so on.
- Rank ASes by the following three customer cone size metrics: the number of ASes in the cone, the number of unique prefixes advertised by these ASes, and the number of /24 blocks in the union of these prefixes.
The size of the AS customer cone in terms of the number of ASes in the cone provides a coarse measure since individual AS sizes can differ drastically.
ASes with larger customer cones have an especially important role in the Internet's capital and governance structure. At the top of this hierarchy are ISPs commonly known as Tier-1 ISPs. They do not pay for transit to upstream providers at all; instead they peer with each other to guarantee their connectivity to all destinations in the Internet. At the bottom of the hierarchy are customer ASes who do not have their own customers and pay providers to reach all destinations in the Internet.
ISPs maintain a set of points-of-presence (PoPs) -- locations where ISPs have routers/servers and related equipment/personnel -- across the world. The refer to the set of these PoP locations as the ISP's geographic footprint . Geographic footprint is an important part of peer selection, as an ISP can only peer with other ISPs at locations where both ISPs have a PoP.
To infer an ISP's geographic footprint, we start with the set of prefixes in the BGP tables from the Route Views project routeview2 node and RIPE NCC's rcc12 node. We then break down these prefixes into the smallest set of IP addresses which Netacuity maps to the same geographic location. We set this as the lower bound on the number of metro areas in which the ISP has presence.
Although we know of no more rigorous empirical analysis of macroscopic Internet topology enriched with AS relationships, we recognize that resource limitations constrain the quality of the science we can do.
- AS relationships are more complex than allowed for in our approach. The semantics of routing relationships between the same two ASes can differ by peering location or even by prefix; our model oversimplifies these cases by assigning a single relationship to each pair of ASes.
- A truly accurate picture of the Internet topology would require collection of data from every AS, while our automated ranking procedure is limited to the measurement points publicly available at Route Views.
- As in all analyses of massive datasets, our heuristics have a number of associated external parameters. We fine tune the values of these parameters based on our pre-existing (but limited) notion of the correct answer as well as experience with the algorithm that suggests auspicious ranges.
More monitor points, more probing, cross-correlative analysis in conjunction with other sources of data, and more powerful data processing techniques to support larger topology samples would improve the integrity and utility of the relationship-based AS ranking.
Download the AS links annotated with AS relationships Dataset
Please cite this dataset as follows:
The CAIDA AS Relationships Dataset, <date range used>. http://www.caida.org/data/as-relationships/
For more information, email firstname.lastname@example.org.
Data is available from 2004 to present, with one file created per week in 2006 and one per month in prior years. Each file contains a full AS graph derived from RouteViews BGP table snapshots taken at 8-hour intervals over a 5-day period. The AS relationships available are customer-provider (and provider-customer in the opposite direction), peer-to-peer, and sibling-to-sibling. See the comments at the beginning of each file for details of the file format.
The general procedure for creating a file is as follows:
- Extract all AS links from RouteViews snapshots.
- Infer customer-provider relationships, and annotate AS links.
- Infer peer-to-peer relationships, and annotate AS links, possibly overriding customer-provider relationships inferred in step 2.
- Heuristically fix suspicious looking inferred relationships (e.g., a low-degree AS acting as provider to a high-degree AS).
- Infer sibling ASes (that is, ASes belonging to the same organization) from WHOIS, and annotate AS links, possibly overriding previous relationship annotations.
- AS Relationships: Inference and Validation
- Inferring AS Relationships: Dead End or Lively Beginning?
- AS relationships are used to determine the ranking of ASes based on the size of customer cones. The rankings are available through the AS rank CGI.
- In the paper Revealing the Autonomous System Taxonomy: The Machine Learning Approach, AS relationships play an important role in classification of ASes into the following broad categories: large ISPs, small ISPs, customer networks, university networks, Internet exchange points, and network information centers. An empirical AS taxonomy based on these categories is available for download at the Autonomous System Taxonomy Repository page.
|[GAO]||L. Gao, On Inferring Autonomous System Relationships in the Internet, IEEE/ACM Transactions on Networking, December 2001.|
|[SARK]||L. Subramanian, S. Agarwal, J. Rexford, and R. H. Katz, Characterizing the Internet Hierarchy from Multiple Vantage Points, IEEE INFOCOM, 2002.|
|[DPP]||G. Di Battista, M. Patrignani, and M. Pizzonia, Computing the Types of the Relationships between Autonomous Systems, IEEE INFOCOM, 2003.|
|[EHS]||T. Erlebach, A. Hall, and T. Schank, Classifying Customer-Provider Relationships in the Internet, Proceedings of the IASTED International Conference on Communications and Computer Networks (CCN), 2002.|
|[XG]||J. Xia and L. Gao, On the Evaluation of AS Relationship Inferences, IEEE Globecom, 2004.|
|[DKH1]||X. Dimitropoulos, D. Krioukov, B. Huffaker, kc claffy, and G. Riley, Inferring AS Relationships: Dead End or Lively Beginning, International Workshop on Efficient and Experimental Algorithms (WEA), 2005.|
|[DKH2]||X. Dimitropoulos, D. Krioukov, M. Fomenkov, B. Huffaker, Y. Hyun, kc claffy, and G. Riley, AS Relationships: Inference and Validation, ACM SIGCOMM Computer Communication Review (CCR), v.37, n.1, pp.29-40, 2007.|
|[HK]||B. Hummel and S. Kosub Acyclic Type-of-Relationship Problems on the Internet: An Experimental Analysis , In Proceedings of the 7th ACM SIGCOMM Internet Measurement Conference (IMC'2007), pages 221-226. ACM Press, New York, NY, 2007.|