IP Prefix-to-AS Mapping comparison

For various topology-related projects, we need a mapping from an IP address to the Autonomous System (AS) that is assigned (some say "owns") that IP address. The most common approach to mapping IP addresses to ASes is to use BGP table dumps from public sources such as Routeviews and RIPE, and then perform a longest-prefix match on the set of prefixes. In the past we have chosen a single table from each data source to maximize coverage while minimizing computational overhead. We have recently analyzed the utility of adding more BGP tables to this process, in terms of the increase in address space coverage, ASes, prefixes, AS links, and AS paths from each additional table. As another source of calibration, we also compare the IP-AS mapping from Routeviews and RIPE tables with that obtained from Team Cymru's whois service.
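The longest-prefix match mentioned above can be sketched in Python as follows. This is a minimal illustration, not the tooling used for the analysis: it assumes IPv4 only and a table already reduced to (prefix, origin AS) pairs, and the function names, prefixes, and AS numbers are hypothetical.

```python
import ipaddress

def build_table(entries):
    """Index (prefix, origin AS) pairs by prefix length (IPv4 only)."""
    by_len = {}
    for prefix, origin in entries:
        net = ipaddress.ip_network(prefix)
        by_len.setdefault(net.prefixlen, {})[int(net.network_address)] = origin
    return by_len

def lookup(by_len, address):
    """Return (prefix, origin AS) for the most specific covering prefix, or None."""
    addr = int(ipaddress.ip_address(address))
    # Try the longest prefix lengths first so the most specific match wins.
    for plen in sorted(by_len, reverse=True):
        mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
        key = addr & mask
        if key in by_len[plen]:
            return f"{ipaddress.ip_address(key)}/{plen}", by_len[plen][key]
    return None

# Hypothetical entries: a /16 and a more specific /24 inside it.
table = build_table([("192.0.0.0/16", 64501), ("192.0.2.0/24", 64500)])
print(lookup(table, "192.0.2.7"))     # matched by the /24
print(lookup(table, "192.0.9.1"))     # falls back to the /16
print(lookup(table, "198.51.100.1"))  # no covering prefix
```

Scanning one dictionary per prefix length keeps the sketch short; a radix trie would be the usual choice for full-size tables.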

Amogh Dhamdhere did this analysis while at CAIDA in spring 2010.

General table statistics

We collected routing table dumps from each Routeviews and RIPE collector from March 05, 2010 at approximately the same time (00:00 hrs). We first calculate the following statistics separately for each table dump from Routeviews and RIPE.

1) The address space coverage provided by the table, in terms of the number of IP addresses.
2) The number of prefixes seen in the table.
3) The number of unique AS paths seen in the table.
4) The number of AS links seen in the table.
5) The number of origin ASes seen in the table.
6) The mean number of Addresses Per AS (APA), which is the mean of the distribution of the number of IP addresses owned by an AS.
7) The standard deviation of the number of Addresses Per AS, which is a measure of the diversity of the IP-AS mapping provided by the table.
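One way to compute the statistics listed above is sketched below in Python. The details are assumptions rather than the method used for this analysis: the origin AS is taken to be the last AS on the path, each prefix is assumed to have a single origin, each address is credited to the origin of its most specific prefix, and the standard deviation is the population form. The function name and sample routes are hypothetical.

```python
import ipaddress
import statistics
from collections import defaultdict

def table_stats(routes):
    """routes: iterable of (prefix string, AS path as a sequence of AS numbers)."""
    paths, links, origin_of = set(), set(), {}
    for prefix, path in routes:
        path = tuple(path)
        paths.add(path)
        for a, b in zip(path, path[1:]):
            if a != b:  # ignore AS-path prepending
                links.add((min(a, b), max(a, b)))
        origin_of[ipaddress.ip_network(prefix)] = path[-1]  # assumed: last hop is the origin

    nets = list(origin_of)
    # Collapse overlapping prefixes so that each address is counted once.
    coverage = sum(n.num_addresses for n in ipaddress.collapse_addresses(nets))

    # Addresses per AS: subtract the space carved out by more specific prefixes.
    # This is quadratic in the number of prefixes, which is fine for a sketch only.
    per_as = defaultdict(int)
    for net in nets:
        inner = [m for m in nets if m != net and m.subnet_of(net)]
        carved = sum(m.num_addresses for m in ipaddress.collapse_addresses(inner))
        per_as[origin_of[net]] += net.num_addresses - carved

    counts = list(per_as.values())
    return {
        "coverage": coverage,
        "prefixes": len(nets),
        "as_paths": len(paths),
        "as_links": len(links),
        "origin_ases": len(per_as),
        "apa_mean": statistics.mean(counts),
        "apa_std_dev": statistics.pstdev(counts),
    }

# Hypothetical routes: a /8, a more specific /16 with another origin, and a /24.
sample = [("10.0.0.0/8", (1, 2, 3)), ("10.1.0.0/16", (1, 4)), ("192.0.2.0/24", (5, 2, 3))]
print(table_stats(sample))
```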

The entries of the table below are color-coded, with red indicating the largest value in each column. If a particular row (corresponding to a routing table) has red entries all through, this routing table is best with respect to each metric analyzed.

Routeviews tables

table                       coverage    AS paths  prefixes  AS links  origin ASes  APA mean  APA std dev
rib_eqx.20100305.0156.bz2   2211947849  453125    315010    126574    33808        65426     987668
rib_linx.20100305.0021.bz2  2229336991  831343    318390    134212    33922        65719     990002
rib_paix.20100305.0135.bz2  2212584490  409250    317935    125970    34010        65056     984136
rib_rv2.20100305.0000.bz2   2212616212  1552841   321308    144576    34410        64301     978268
rib_rv4.20100305.0000.bz2   2220951409  306116    315464    122800    34163        65010     982993
rib_wide.20100305.0132.bz2  2212738760  198064    317018    103266    33718        65624     986999

RIPE tables

table                         coverage    AS paths  prefixes  AS links  origin ASes  APA mean  APA std dev
bview_rrc00.20100304.2359.gz  2212478168  462998    323207    125788    34241        64614     980963
bview_rrc01.20100304.2359.gz  2228603950  503862    314037    128236    34021        65506     988936
bview_rrc03.20100304.2359.gz  2212323364  541541    316163    129518    34175        64735     982198
bview_rrc04.20100304.2359.gz  2212418821  291745    331780    121244    34086        64906     983440
bview_rrc05.20100304.2359.gz  2212414720  386706    314316    123754    34117        64847     982893
bview_rrc06.20100304.2359.gz  2212142272  95925     312927    93776     33691        65659     990074
bview_rrc07.20100304.2359.gz  2211947552  236494    313136    116048    33875        65297     986934
bview_rrc10.20100304.2359.gz  2222686249  277884    312981    115750    33880        65604     988153
bview_rrc11.20100304.2359.gz  2211833892  379801    314201    125562    34021        65013     984654
bview_rrc12.20100304.2359.gz  2211800864  349557    312952    121780    33917        65212     986417
bview_rrc13.20100304.2359.gz  2212372000  417226    326348    123346    34203        64683     981556
bview_rrc15.20100304.2359.gz  2212315401  197206    315115    108410    33935        65192     986180
bview_rrc16.20100304.2359.gz  2214412879  143079    320967    110684    33833        65451     988009

For Routeviews, we found that table RV2 maximized the number of AS paths, unique prefixes, unique AS links and origin ASes, although the RV_linx table covered the most IPv4 address space. For RIPE, we found that no single table dump (from March 05, 2010) was best with respect to all metrics; RRC01 covered the most IPv4 address space.

The utility of adding additional tables

Next we studied the utility of adding additional routing tables. We started with a base table -- the table dump from March 05, 2010 that gave the largest coverage of IP address space. We then added a single table at a time in decreasing order of address space coverage, calculating changes in the following metrics due to the newly added table:

1) Additional address space coverage.
2) Address space allocation changed.
3) Unique ASes and origin ASes.
4) Unique AS links.
5) Unique prefixes.
6) Unique more specific prefixes.
7) More specific prefixes with different origin AS.
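Several of the metrics above can be computed per added table along the following lines. This Python sketch rests on assumptions: a table is a dictionary from prefix to a single origin AS, and a "more specific" prefix is read as a new prefix strictly contained in some prefix already in the base table. All names and sample values are hypothetical.

```python
import ipaddress

def parse(table):
    """Turn {prefix string: origin AS} into {ip_network: origin AS}."""
    return {ipaddress.ip_network(p): asn for p, asn in table.items()}

def coverage(nets):
    """Number of distinct addresses covered by a collection of prefixes."""
    return sum(n.num_addresses for n in ipaddress.collapse_addresses(list(nets)))

def covering_prefix(table, net):
    """Longest prefix in table that strictly contains net, or None."""
    for plen in range(net.prefixlen - 1, -1, -1):
        sup = net.supernet(new_prefix=plen)
        if sup in table:
            return sup
    return None

def contribution(base, extra):
    """Summarize what the table `extra` adds on top of the table `base`."""
    new = {n: asn for n, asn in extra.items() if n not in base}
    parents = {n: covering_prefix(base, n) for n in new}
    more_specific = [n for n, p in parents.items() if p is not None]
    diff_origin = [n for n in more_specific if new[n] != base[parents[n]]]
    return {
        "added_coverage": coverage(list(base) + list(new)) - coverage(base),
        "new_prefixes": len(new),
        "new_more_specifics": len(more_specific),
        "more_specifics_diff_origin": len(diff_origin),
    }

base = parse({"10.0.0.0/8": 3, "192.0.2.0/24": 5})
extra = parse({"10.0.0.0/8": 3, "10.1.0.0/16": 4, "10.2.0.0/16": 3, "198.51.100.0/24": 6})
print(contribution(base, extra))
```

Dividing each count by the corresponding total in the base table gives percentages of the kind reported in this section.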

Tables ordered by overall address space coverage

We first performed this analysis starting with the table dump that covered the largest amount of IPv4 address space as the base table. For the set of table dumps we collected on March 05, 2010, RV_linx covered the largest amount of address space. We then added individual tables in decreasing order of address space coverage, measuring the change in the above listed metrics caused by the additional table. We found that additional tables led to less than 1% increase in address space coverage, address allocation change, and numbers of unique ASes and unique origin ASes. As far as the number of ASes (origin as well as non-origin ASes) is concerned, this result confirms previous observations that most ASes are observable from even a few vantage points.

However, for other metrics additional tables matter: adding a table can yield up to 4.8% more AS links than seen in the base table, up to 4.6% more prefixes, and up to 4.7% more more-specific prefixes. For most of the added tables, between 10% and 70% of the more-specific prefixes actually have a different origin AS.

[graph: all_addr_ch]
[graph: all_unq_AS]
[graph: all_unq_lnk]
[graph: all_unq_pfx]
[graph: all_new_MSP]
[graph: all_new_MSP_DO]

Choosing the best additional table

Starting from the base table RV_linx (which covers the largest amount of IPv4 address space) from the Routeviews and RIPE dumps collected on March 05, 2010, we evaluated which table was the best to add in order to optimize a certain metric in the aggregated table. We selected the metric of IP-AS mapping changes caused by the new table, which quantifies how much new IP-AS mapping information the new table provides. Let t_cuml be the cumulative prefix-AS mapping at a given point, i.e., after adding the Nth table. Of the remaining tables, we must determine which one, when added to t_cuml, causes the largest change in IP-AS mapping. We found that even adding the "best" table for this purpose at every step resulted in less than 1% change in the address space coverage and IP-AS mapping.
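The selection procedure described above is a greedy loop, sketched here in Python under simplifying assumptions: the change in IP-AS mapping is measured over a fixed list of probe addresses rather than over the whole address space, and when two tables contain the same prefix the origin already in the cumulative table is kept. The function names and sample data are hypothetical.

```python
import ipaddress

def origin(table, addr):
    """Longest-prefix match of an IPv4 address in {ip_network: origin AS}."""
    for plen in range(32, -1, -1):
        net = ipaddress.ip_network((addr, plen), strict=False)
        if net in table:
            return table[net]
    return None

def mapping_change(cum, cand, probes):
    """Count probes whose origin AS changes when cand is merged into cum."""
    merged = {**cand, **cum}  # on identical prefixes the cumulative origin wins
    return sum(origin(merged, a) != origin(cum, a) for a in probes)

def greedy_order(base, others, probes):
    """Repeatedly add the remaining table that changes the mapping the most."""
    cum, remaining, order = dict(base), dict(others), []
    while remaining:
        scores = {name: mapping_change(cum, t, probes) for name, t in remaining.items()}
        best = max(scores, key=scores.get)
        order.append((best, scores[best]))
        cum = {**remaining.pop(best), **cum}
    return order

net = ipaddress.ip_network
base = {net("10.0.0.0/8"): 3}
others = {
    "A": {net("10.1.0.0/16"): 4},
    "B": {net("10.1.0.0/16"): 4, net("198.51.100.0/24"): 6},
}
probes = ["10.0.0.1", "10.1.0.1", "198.51.100.1"]
print(greedy_order(base, others, probes))
```

In this toy input, table B is chosen first because it changes two probes; table A then adds nothing new.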

[graph: all_bestcomb_addr_ch]

Comparison with Team Cymru's WHOIS service

As a form of independent validation, or at least calibration, we compared the IP-AS mappings we obtained from Routeviews and RIPE BGP dumps with the IP-AS mapping service provided by Team Cymru on March 22, 2010. For this purpose, we constructed a sample list of 24M IP addresses collected from Ark traces seen during our IPv4 topology probing during January 2010. To limit the load on Cymru servers, we queried (on March 22, 2010) only one address per /24 (the *.1 address), thus reducing the set of queried addresses to 2.7M. For each address, we compared the AS returned by Cymru with the AS obtained from the prefix-AS mappings derived from Routeviews and RIPE table dumps collected on the same date. In the table below, Cym denotes the mapping from Cymru's service and Tab the mapping from the RV+RIPE tables.
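Reducing an address list to one query per /24 can be done as in the short Python sketch below. The function name and sample addresses are hypothetical, and IPv4 input is assumed.

```python
import ipaddress

def one_per_slash24(addresses):
    """Return the *.1 address of every /24 that contains an input address."""
    # Dropping the low 8 bits identifies the /24 block of each address.
    blocks = {int(ipaddress.ip_address(a)) >> 8 for a in addresses}
    return [str(ipaddress.ip_address((b << 8) | 1)) for b in sorted(blocks)]

print(one_per_slash24(["192.0.2.77", "192.0.2.200", "198.51.100.9"]))
```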

table        IP addresses  IP-AS mismatches  mismatch %  Single Cym undef  %     Single Tab undef  %     Single mismatch  %   MOAS missing in Tab  %    MOAS missing in Cym  %   MOAS missing both  %
RV LINX      2702414       8676              0.3         77                0.8   250               3     3157             36  3053                 35   2132                 25  7                  0.08
RIPE RRC01   2702414       9652              0.4         61                0.6   409               4.2   3833             40  3462                 36   1879                 19  8                  0.08
all RV+RIPE  2702414       17839             0.7         178               1     129               0.72  10049            56  766                  4.2  6142                 34  575                3.2

  • The "single Cym undef" column refers to the number of mismatches that were because Cymru did not have a matching prefix for that IP address.
  • "Single Tab undef" is for mismatches where our combination of routing tables did not have a match for the IP address.
  • "Single mismatch" refers to the case where both the table and Cymru each found a different (but only one) matching AS.
  • "MOAS missing in Tab" refers to the cases where the IP address mapped to multiple origin ASes, and one of the ASes was missing in our tables.
  • "MOAS missing in Cym" refers to the cases where the IP address mapped to multiple origin ASes, and one of the ASes was missing in the Cymru lookup.
  • "MOAS missing in both" refers to the cases where the IP address mapped to multiple origin ASes, and some AS from that set was missing in both the table dumps and Cymru lookups.

We first compare the IP-AS mapping obtained from Cymru with that obtained from the largest Routeviews and RIPE collectors in terms of address space coverage: RV-LINX from Routeviews and RRC01 from RIPE. Overall, we find a 0.3% mismatch between the IP-AS mapping from RV-LINX and Cymru, and 0.4% between RRC01 and Cymru. We find that 36% of IP-AS mismatches between RV-LINX and Cymru (and 40% between RRC01 and Cymru) are due to IP addresses for which Cymru and the table dump each return a single, but different, AS. A significant fraction of IP-AS mismatches (55% between RRC01 and Cymru and 60% between RV-LINX and Cymru) is due to MOASes. Among these MOAS cases, those in which the table dumps return origin ASes that are not in the Cymru mapping account for 19% of all mismatches between RRC01 and Cymru (and 25% between RV-LINX and Cymru).

The overall IP-AS mismatch fraction between Cymru and our tables increases to 0.7% as we go from using the largest Routeviews and RIPE collectors to a combination of all Routeviews and RIPE table dumps. We find that 56% of IP-AS mismatches between the combined table dumps and Cymru are due to IP addresses for which Cymru and the table dump each return a single, but different, AS. As expected, the combined table dumps result in an increased fraction (34%) of MOAS mismatches where an AS was missing in the Cymru mapping.
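The mismatch categories used in this comparison can be expressed as a small classifier. The Python sketch below is one reading of the column definitions, not the script used for the analysis: each lookup is assumed to return a set of origin ASes (empty when there is no match), and the category names are hypothetical.

```python
from collections import Counter

def classify(cym, tab):
    """Classify one address given the AS sets from Cymru (cym) and the tables (tab)."""
    cym, tab = set(cym), set(tab)
    if cym == tab:
        return "match"
    if not cym:
        return "single_cym_undef"   # Cymru had no matching prefix
    if not tab:
        return "single_tab_undef"   # the routing tables had no matching prefix
    if len(cym) == 1 and len(tab) == 1:
        return "single_mismatch"    # one AS on each side, but different
    missing_in_tab, missing_in_cym = cym - tab, tab - cym
    if missing_in_tab and missing_in_cym:
        return "moas_missing_both"
    return "moas_missing_in_tab" if missing_in_tab else "moas_missing_in_cym"

def tally(pairs):
    """Count categories over an iterable of (cym, tab) AS-set pairs."""
    return Counter(classify(c, t) for c, t in pairs)

print(tally([({64500}, {64500}), ({64500}, {64501}), ({64500}, {64500, 64501})]))
```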

Temporal analysis

We also analyzed how the previously defined metrics changed in routing tables over time. We collected 10 routing table dumps in March 2010 from each of Routeviews' RV-LINX and RIPE's RRC01, which are the largest collectors in terms of IPv4 address space coverage. We labeled these table dumps chronologically as table 0 to table 9. Note that consecutive tables are 3 days apart. We then calculated our list of metrics for the following cases:
1) Across immediately adjacent tables: For i= 0 .. 8, consider table i as the base table, and compute the effect of integrating table i+1 with table i on the list of metrics.
2) Compare the first table with each successive table: Consider table 0 as the base table, and study the effect of integrating each table i with table 0 on the previous metrics.
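The two comparison schemes above differ only in which pairs of tables are compared. A minimal Python sketch, assuming each table has already been reduced to a set of items for one metric (for example its AS links); the function names and sample sets are hypothetical.

```python
def adjacent_pairs(n):
    """Case 1: (i, i+1) for i = 0 .. n-2."""
    return [(i, i + 1) for i in range(n - 1)]

def first_vs_later(n):
    """Case 2: table 0 paired with each later table."""
    return [(0, i) for i in range(1, n)]

def percent_new(base, added):
    """Items in `added` that `base` lacks, as a percentage of `base`."""
    return 100.0 * len(set(added) - set(base)) / len(base)

def compare(tables, pairs):
    """Apply percent_new to each (base index, added index) pair."""
    return [percent_new(tables[i], tables[j]) for i, j in pairs]

# Hypothetical AS-link sets for three consecutive tables.
tables = [{(1, 2), (2, 3)}, {(1, 2), (2, 3), (3, 4)}, {(1, 2), (3, 4), (4, 5)}]
print(compare(tables, adjacent_pairs(len(tables))))
print(compare(tables, first_vs_later(len(tables))))
```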

Comparing adjacent tables

We first compare immediately adjacent tables from Routeviews and RIPE with respect to the difference in IP-AS mapping, number of unique ASes and origin ASes, and the number of unique AS links. We consider each pair of immediately adjacent tables, using the first of the pair as the base table, and computing changes in our metrics if the second table were combined with the first. Overall, the metrics did not change significantly across the adjacent tables we analyzed. Over all pairs of adjacent tables, we see a < 2% increase in the address space coverage, a < 3% change in IP-AS mapping, and increases of < 1% in unique ASes, < 1% in unique origin ASes, and < 3% in unique AS links.

[graph: rv_ripe_sp_addr_ch]
[graph: rv_ripe_sp_unq_AS]
[graph: rv_ripe_sp_unq_lnk]

Comparing additional tables with the first table

We performed a slightly different temporal analysis, treating the first table (of the 10) collected in the month as the base table. We added successive individual tables collected over the month to this first table, and computed the change in IP-AS mapping, number of unique ASes and AS links over the duration of the month. Note that we still only consider pairs of tables, with the first table fixed.

As expected, the difference from comparing the first table with each subsequent table in the month is greater than when we compare consecutive table pairs. Also, the difference increases as we combine the first table with tables later in the month. We see the largest difference in the number of unique AS links, where the difference between the first and the last table of the month is about 5%.

[graph: rv_ripe_fp_addr_ch]
[graph: rv_ripe_fp_unq_AS]
[graph: rv_ripe_fp_unq_lnk]
  Last Modified: Fri Aug-6-2010 12:19:20 PDT