BGP data
The AS topology snapshots used in this paper were built from publicly available BGP dumps provided by the Routeviews and RIPE NCC collectors. To create a topology snapshot, we first collected five routing table dumps from all available Routeviews and RIPE collectors over the course of a month. We then applied a majority filtering algorithm that retains only those AS paths seen in a majority of the samples, and extracted AS links from the resulting set of persistent AS paths. Note that these are not the most complete topologies available for each time period. Each snapshot represents the primary topology: backup and transient links that appear only during routing events are removed. Please refer to the paper for details of the majority filtering algorithm. The raw data (AS paths, BGP tables, etc.) is available on request. We provide the processed topology snapshots after annotating AS links with business relationships (see below).
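The filtering and link-extraction steps described above can be sketched roughly as follows. This is only an illustration and not the algorithm from the paper: it assumes that each sample is a set of AS paths (tuples of AS numbers) and that "majority" means more than half of the samples. The function names persistent_paths and extract_links are hypothetical.

```python
from collections import Counter

def persistent_paths(samples):
    """Keep AS paths that appear in more than half of the samples.

    samples: list of sets, each set holding AS paths as tuples of ints.
    The "more than half" threshold is an assumption of this sketch.
    """
    counts = Counter(path for sample in samples for path in sample)
    threshold = len(samples) / 2
    return {path for path, n in counts.items() if n > threshold}

def extract_links(paths):
    """Collect undirected AS links from adjacent hops of each path."""
    links = set()
    for path in paths:
        for a, b in zip(path, path[1:]):
            if a != b:  # skip repeated ASes caused by AS-path prepending
                links.add((min(a, b), max(a, b)))
    return links

# Five hypothetical samples; only the path (1, 2, 3) is seen in a majority.
samples = [
    {(1, 2, 3)},
    {(1, 2, 3), (1, 4, 3)},
    {(1, 2, 3)},
    {(1, 2, 2, 3)},
    {(1, 4, 3)},
]
links = extract_links(persistent_paths(samples))
```

In this toy input the path through AS 4 appears in only two of five samples, so its links are treated as transient and dropped.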
AS relationships
We used the BGP data to annotate each interdomain link with one of three simplified business relationships -- customer-provider (the customer pays the provider), settlement-free peer (typically no money is exchanged), and sibling (both ASes belong to the same organization) -- using the classification algorithm by Lixin Gao.
AS classification
We classify ASes according to their business functions. Our classification is based on the average customer and peer degrees of an AS over the entire lifetime of that AS. Please refer to the paper for details of the classification.
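As a rough illustration of the input features, the sketch below computes the average customer and peer degree of each AS over a series of annotated snapshots. It is not the classifier from the paper. It assumes the relationship codes listed under Datasets, assumes that code 1 means the first AS is the customer of the second and code 2 the reverse, and takes the "lifetime" of an AS to be the snapshots in which it appears. The function name average_degrees is hypothetical.

```python
from collections import defaultdict

# Relationship codes as listed in the Datasets section.
SIBLING, CUSTOMER_PROVIDER, PROVIDER_CUSTOMER, PEER = 0, 1, 2, 3

def average_degrees(snapshots):
    """Return {asn: (avg_customer_degree, avg_peer_degree)}.

    snapshots: list of snapshots, each a list of (as1, as2, rel) tuples.
    Averages are taken over the snapshots in which the AS appears
    (an assumption of this sketch).
    """
    totals = defaultdict(lambda: [0, 0, 0])  # customers, peers, snapshots seen
    for links in snapshots:
        customers = defaultdict(set)
        peers = defaultdict(set)
        seen = set()
        for as1, as2, rel in links:
            seen.update((as1, as2))
            if rel == CUSTOMER_PROVIDER:    # assumed: as1 is the customer of as2
                customers[as2].add(as1)
            elif rel == PROVIDER_CUSTOMER:  # assumed: as1 is the provider of as2
                customers[as1].add(as2)
            elif rel == PEER:
                peers[as1].add(as2)
                peers[as2].add(as1)
        for asn in seen:
            entry = totals[asn]
            entry[0] += len(customers[asn])
            entry[1] += len(peers[asn])
            entry[2] += 1
    return {asn: (c / n, p / n) for asn, (c, p, n) in totals.items()}

# Two hypothetical snapshots.
snapshots = [
    [(10, 1, 1), (1, 2, 3)],
    [(1, 11, 2), (1, 10, 2), (1, 2, 3), (1, 3, 3)],
]
degrees = average_degrees(snapshots)
```

Sibling links are counted toward neither degree here, which is a simplification.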
Additional AS information
We extract additional information about each AS using WHOIS queries. For this purpose, we use the WHOIS service provided by Team Cymru. Their WHOIS service provides information such as the registry where an AS is registered (arin/ripencc/apnic/lacnic/afrinic), the country code for that AS, and a brief description of the AS.
Datasets
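We provide two versions of these datasets, corresponding to the Internet Measurement Conference (2008) and the IEEE/ACM Transactions on Networking (2011) versions of this paper. In the file names below, imc08|ton11 stands for either the imc08 or the ton11 prefix, depending on the version.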
- imc08|ton11.rel_files.tar.gz (relationship snapshots) is the set of
relationship snapshots, one file per snapshot.
File format:
<AS1> <AS2> <rel>
where the relationships are:
0 = sibling
1 = customer-provider
2 = provider-customer
3 = peer
- AS.train (AS classification training data) is the training data used
to create the decision tree classifier.
File format:
<AS_number>|<AS_description>|<AS_type>
- imc08|ton11.sn.3m.AS.class (AS classification data) is the output of
the decision tree classifier, which classifies each AS according to its
business type.
File format:
<AS_number> <AS_type>
where the AS types are:
1 = Enterprise Customer
2 = Small Transit Provider
3 = Large Transit Provider
4 = Content/Access/Hosting Provider
- imc08|ton11.sn.3m.AS.info (AS information data) gives, for each ASN,
the country code, the registry (ARIN/RIPENCC/APNIC/LACNIC/AFRINIC) where
the ASN is registered, and a brief description of that AS.
File format:
<AS_number> <country_code> <registry> <AS_description>
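The file formats listed above for the relationship, training, and classification files can be read with a few small parsers, sketched below. This is a minimal example under stated assumptions: fields in the relationship and classification files are separated by whitespace, the files contain no header or comment lines, and an AS description in AS.train may itself contain the "|" character. All function names are hypothetical.

```python
# Relationship codes as listed in the relationship file format above.
REL_NAMES = {
    0: "sibling",
    1: "customer-provider",
    2: "provider-customer",
    3: "peer",
}

def parse_rel_line(line):
    """Parse '<AS1> <AS2> <rel>' into (as1, as2, relationship_name)."""
    as1, as2, rel = line.split()
    return int(as1), int(as2), REL_NAMES[int(rel)]

def parse_train_line(line):
    """Parse '<AS_number>|<AS_description>|<AS_type>' from AS.train.

    Only the first and last '|' are treated as separators, so a
    description that contains '|' is kept intact.
    """
    asn, rest = line.rstrip("\n").split("|", 1)
    description, as_type = rest.rsplit("|", 1)
    return int(asn), description, as_type

def parse_class_line(line):
    """Parse '<AS_number> <AS_type>' into (asn, type_code)."""
    asn, as_type = line.split()
    return int(asn), int(as_type)

def load(path, parse_line):
    """Apply parse_line to every non-blank line of the file at path."""
    with open(path) as f:
        return [parse_line(line) for line in f if line.strip()]
```

For example, load("some_snapshot_file", parse_rel_line) would return a list of (AS1, AS2, relationship) tuples for one snapshot, where the file name is a placeholder for a file extracted from the archive.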