The contents of this legacy page are no longer maintained nor supported, and are made available only for historical purposes.

IP Prefix-to-AS Mapping comparison

For various topology-related projects, we need a mapping from an IP address to the Autonomous System (AS) that owns that IP address. The most common approach to map IP addresses to ASes is to use BGP table dumps from public sources like Routeviews and RIPE, and then perform a longest-prefix match on the set of prefixes. We are currently using one routing table from Routeviews (RV2) and one table from RIPE (RRC12) to map IP addresses to ASes.
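As an illustration of the longest-prefix match step, the following is a minimal sketch and not the code we actually run. It assumes IPv4 only, and the names build_table and lookup, as well as the prefixes and AS numbers (taken from documentation ranges), are hypothetical.

```python
import ipaddress

def build_table(entries):
    """Index (prefix string, origin AS) pairs by prefix length."""
    by_len = {}
    for prefix, origin in entries:
        net = ipaddress.ip_network(prefix)
        by_len.setdefault(net.prefixlen, {})[int(net.network_address)] = origin
    return by_len

def lookup(by_len, address):
    """Return the origin AS of the longest matching prefix, or None."""
    addr = int(ipaddress.ip_address(address))
    # Try the most specific prefix lengths first (IPv4 only in this sketch).
    for plen in sorted(by_len, reverse=True):
        mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
        origin = by_len[plen].get(addr & mask)
        if origin is not None:
            return origin
    return None

# Hypothetical prefixes and AS numbers, for illustration only.
table = build_table([
    ("192.0.2.0/24", 64500),
    ("192.0.2.128/25", 64501),
    ("198.51.100.0/24", 64502),
])
print(lookup(table, "192.0.2.200"))  # matched by both prefixes; the /25 wins
```

A production lookup would more likely use a radix tree; the per-length dictionaries here only keep the sketch short.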

The goal of this analysis is to study whether the current choice of routing tables is the best with respect to the metrics that we are interested in. Further, we study the utility of adding more tables, in terms of the increase in address space coverage and the new ASes, prefixes, AS links, and AS paths that each additional table gives. We also compare the IP-AS mapping from Routeviews and RIPE tables with that obtained from Team Cymru's WHOIS service.

General table statistics

We first calculate the following statistics for each routing table collected from Routeviews and RIPE:

1) The address space coverage provided by the table, in terms of the number of IP addresses.
2) The number of prefixes seen in the table.
3) The number of unique AS paths seen in the table.
4) The number of AS links seen in the table.
5) The number of origin ASes seen in the table.
6) The mean APA, which is the mean of the distribution of the number of IP addresses owned by an AS.
7) The standard deviation of APA, which is a measure of the diversity of the IP-AS mapping provided by the table.
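The statistics listed above could be computed along the following lines. This is a sketch under stated assumptions, not the original analysis code: the table is given as (prefix, AS path) pairs, the origin AS is the last AS on the path, AS links are treated as undirected, each address is attributed to the origin of its longest matching prefix, and the population standard deviation is used. The function name table_stats and the sample routes are hypothetical.

```python
import ipaddress
from statistics import mean, pstdev

def table_stats(routes):
    """Summarize a table given as (prefix string, AS path list) pairs."""
    origin_of = {}                  # prefix -> origin AS (last AS on the path)
    paths, links = set(), set()
    for prefix, path in routes:
        origin_of[ipaddress.ip_network(prefix)] = path[-1]
        paths.add(tuple(path))
        for a, b in zip(path, path[1:]):
            if a != b:              # ignore AS-path prepending
                links.add(frozenset((a, b)))
    nets = list(origin_of)
    # Coverage counts each address once, even if several prefixes cover it.
    coverage = sum(n.num_addresses for n in ipaddress.collapse_addresses(nets))
    # Addresses per AS: a prefix owns its addresses minus those covered
    # by more specific prefixes (quadratic, fine for a small example).
    owned = {asn: 0 for asn in origin_of.values()}
    for net in nets:
        inner = [m for m in nets if m != net and m.subnet_of(net)]
        hidden = sum(c.num_addresses for c in ipaddress.collapse_addresses(inner))
        owned[origin_of[net]] += net.num_addresses - hidden
    counts = list(owned.values())
    return {
        "coverage": coverage, "prefixes": len(nets), "as_paths": len(paths),
        "as_links": len(links), "origin_ases": len(owned),
        "apa_mean": mean(counts), "apa_std": pstdev(counts),
    }

# Hypothetical routes using documentation prefixes and AS numbers.
routes = [
    ("192.0.2.0/24", [64496, 64500]),
    ("192.0.2.128/25", [64496, 64497, 64501]),
    ("198.51.100.0/24", [64496, 64497, 64502]),
]
stats = table_stats(routes)
print(stats)
```

With this attribution, the mean APA equals the coverage divided by the number of origin ASes, which is consistent with the values in the tables below.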

The entries of the table are color-coded: the entry in red is the largest value in its column. If every entry in a row (corresponding to a routing table) is red, then that routing table is the best with respect to each metric of interest.

Routeviews tables

table                       coverage    AS paths  prefixes  AS links  origin ASes  APA mean  APA std dev
rib_eqx.20100305.0156.bz2   2211947849  453125    315010    126574    33808        65426     987668
rib_linx.20100305.0021.bz2  2229336991  831343    318390    134212    33922        65719     990002
rib_paix.20100305.0135.bz2  2212584490  409250    317935    125970    34010        65056     984136
rib_rv2.20100305.0000.bz2   2212616212  1552841   321308    144576    34410        64301     978268
rib_rv4.20100305.0000.bz2   2220951409  306116    315464    122800    34163        65010     982993
rib_wide.20100305.0132.bz2  2212738760  198064    317018    103266    33718        65624     986999

Ripe tables

table                         coverage    AS paths  prefixes  AS links  origin ASes  APA mean  APA std dev
bview_rrc00.20100304.2359.gz  2212478168  462998    323207    125788    34241        64614     980963
bview_rrc01.20100304.2359.gz  2228603950  503862    314037    128236    34021        65506     988936
bview_rrc03.20100304.2359.gz  2212323364  541541    316163    129518    34175        64735     982198
bview_rrc04.20100304.2359.gz  2212418821  291745    331780    121244    34086        64906     983440
bview_rrc05.20100304.2359.gz  2212414720  386706    314316    123754    34117        64847     982893
bview_rrc06.20100304.2359.gz  2212142272  95925     312927    93776     33691        65659     990074
bview_rrc07.20100304.2359.gz  2211947552  236494    313136    116048    33875        65297     986934
bview_rrc10.20100304.2359.gz  2222686249  277884    312981    115750    33880        65604     988153
bview_rrc11.20100304.2359.gz  2211833892  379801    314201    125562    34021        65013     984654
bview_rrc12.20100304.2359.gz  2211800864  349557    312952    121780    33917        65212     986417
bview_rrc13.20100304.2359.gz  2212372000  417226    326348    123346    34203        64683     981556
bview_rrc15.20100304.2359.gz  2212315401  197206    315115    108410    33935        65192     986180
bview_rrc16.20100304.2359.gz  2214412879  143079    320967    110684    33833        65451     988009

We find that in the case of Routeviews, the table RV2 is the best in terms of the number of AS paths, unique prefixes, unique AS links and origin ASes. RV_linx, however, provides the most address space coverage. For RIPE, we find that no single table is best with respect to all metrics. RRC01 gives the most coverage of address space. The currently used table (RRC12) does not seem to be the best with respect to any particular metric.

The utility of adding additional tables

Here, we study the utility of adding additional routing tables. We start with a base table, and keep adding new tables successively. We then measure how the following metrics change due to the addition of the new table.

1) Additional address space coverage provided by the new table.
2) Address space allocation changed due to the new table.
3) Unique ASes and origin ASes provided by the new table.
4) Unique AS links provided by the new table.
5) Unique prefixes provided by the new table.
6) Unique more specific prefixes provided by the new table.
7) More specific prefixes with different origin AS provided by the new table.
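Several of the metrics above can be sketched as follows. This is an illustration under assumptions rather than the original code: each table is reduced to a dictionary from prefix to origin AS, a "more specific" prefix is a new prefix strictly contained in some prefix of the base table, and its origin is compared against the longest such covering prefix. All function names and sample data are hypothetical.

```python
import ipaddress

def parse(table):
    """Convert {prefix string: origin AS} into {ip_network: origin AS}."""
    return {ipaddress.ip_network(p): asn for p, asn in table.items()}

def covering_origin(nets, net):
    """Origin AS of the longest prefix in nets strictly containing net, else None."""
    covers = [c for c in nets if c != net and net.subnet_of(c)]
    return nets[max(covers, key=lambda c: c.prefixlen)] if covers else None

def span(nets):
    """Number of distinct addresses covered by a collection of prefixes."""
    return sum(c.num_addresses for c in ipaddress.collapse_addresses(nets))

def gain(base, new):
    """Count what the table `new` adds on top of the table `base`."""
    base, new = parse(base), parse(new)
    new_prefixes = [n for n in new if n not in base]
    more_specific = [n for n in new_prefixes
                     if covering_origin(base, n) is not None]
    different = [n for n in more_specific
                 if covering_origin(base, n) != new[n]]
    return {
        "added_coverage": span(list(base) + list(new)) - span(base),
        "new_origin_ases": len(set(new.values()) - set(base.values())),
        "new_prefixes": len(new_prefixes),
        "more_specific": len(more_specific),
        "more_specific_diff_origin": len(different),
    }

# Hypothetical tables using documentation prefixes and AS numbers.
base = {"192.0.2.0/24": 64500, "198.51.100.0/24": 64502}
new = {"192.0.2.0/24": 64500, "192.0.2.128/25": 64501,
       "198.51.100.0/25": 64502, "203.0.113.0/24": 64503}
result = gain(base, new)
print(result)
```

Dividing each count by the corresponding total in the base table gives percentages of the kind reported in the following sections.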

Tables ordered by overall address space coverage

In the first comparison, we use the table with the largest address space coverage as the starting (base) table. We then successively add tables in decreasing order of address space coverage, measuring the change in the aforementioned properties caused by each additional table.

We find that additional tables lead to less than 1% increase in address space coverage, address allocation change, the number of unique ASes and unique origin ASes. The largest change caused by additional tables is in the number of unique AS links, unique prefixes and more specific prefixes. In those cases, additional tables can result in between 1% and 10% increase in the number of unique AS links, prefixes and more specific prefixes.

Adding additional routing tables leads to less than 1% increase in address space coverage. Also, less than 1% of the IP-AS allocations change when we add successive tables.

[graph: all_addr_ch]

Adding additional routing tables leads to less than 1% increase in the number of unique ASes that we see. This confirms previous observations that most ASes are seen from even a small number of vantage points.

[graph: all_unq_AS]

The effect of adding additional tables is larger in the case of unique AS links. We find that additional tables can yield up to 7% more AS links than seen in the base table.

[graph: all_unq_lnk]

We study the number of additional prefixes that we gain by adding more routing tables. We find that additional tables yield up to 8% more prefixes. The fraction of more specific new prefixes closely follows the fraction of new prefixes.

[graph: all_unq_pfx]
[graph: all_new_MSP]

Finally, we study how many of the more specific new prefixes yield a different origin AS. We find that for most of the tables, between 10% and 70% of the more specific prefixes actually give a different origin AS.

[graph: all_new_MSP_DO]

Tables ordered by current usage

We study our current usage, which consists of one table from Routeviews (RV2), and one table from RIPE (RRC12). We use these tables as the starting point, and add successive tables in decreasing order of the address space coverage.

We find that adding additional tables starting from our current usage causes less than 1% change in the address space coverage, address space allocation, the number of unique ASes and unique origin ASes. The change in the number of unique AS links, unique prefixes and unique more specific prefixes is between 1% and 10%.

[graph: cur_addr_ch]
[graph: cur_unq_AS]
[graph: cur_unq_lnk]
[graph: cur_unq_pfx]
[graph: cur_new_MSP]
[graph: cur_new_MSP_DO]

Choosing the best additional table

In this section, we determine the best next table to add, in order to optimize a certain metric in the aggregate. The metric we use is the change in address space allocation caused by the new table. At a particular point, let t_cuml be the cumulative prefix-AS mapping up to that point. We now have a set of remaining tables, and must determine the best table to add to t_cuml, i.e., the table that would cause the largest change in address space allocation. This can be done in parallel for each possible table t to add.
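The selection procedure amounts to a greedy loop, sketched below under simplifying assumptions. Each table is reduced to a dictionary from a fixed-size address block to an AS, the change in allocation is counted in blocks rather than addresses, and a newly added table overrides the cumulative mapping where the two disagree. The names allocation_change and greedy_order and the sample tables are hypothetical.

```python
def allocation_change(cuml, table):
    """Blocks that are new, or mapped to a different AS, if table is merged in."""
    return sum(1 for block, asn in table.items() if cuml.get(block) != asn)

def greedy_order(base, candidates):
    """Repeatedly add the remaining table that changes the allocation most."""
    cuml = dict(base)
    remaining = dict(candidates)
    order = []
    while remaining:
        # Each candidate is scored independently of the others, so this
        # step could be run in parallel across the remaining tables.
        scores = {name: allocation_change(cuml, t)
                  for name, t in remaining.items()}
        best = max(scores, key=scores.get)
        order.append((best, scores[best]))
        cuml.update(remaining.pop(best))   # assumption: the new table wins
    return order

# Hypothetical tables keyed by /24 block, with documentation AS numbers.
base = {"192.0.2.0/24": 64500}
candidates = {
    "A": {"198.51.100.0/24": 64502, "203.0.113.0/24": 64504},
    "B": {"192.0.2.0/24": 64501, "198.51.100.0/24": 64502,
          "203.0.113.0/24": 64503},
}
order = greedy_order(base, candidates)
print(order)
```

Because the scores are recomputed after every addition, a table that looks useful at the start can contribute little once an overlapping table has been merged.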

We find that considering all tables, adding the best possible next table still results in less than 1% change in the address space coverage and address allocation. We see the same trend when we start with the currently used tables (RV2+RRC12) and successively add the best possible tables.

[graph: all_bestcomb_addr_ch]
[graph: cur_bestcomb_addr_ch]

Comparison with Team Cymru's WHOIS service

We compare the IP-AS mapping obtained from Routeviews and RIPE BGP dumps with the IP-AS mapping service provided by Team Cymru. For this purpose, we construct a sample list of 24M IP addresses collected from Ark traces seen in January 2010. To limit the load on Cymru servers, we query only one address per /24 (the *.1 address), thus reducing the set of queried addresses to 2.7M. For each address, we compare the AS returned by Cymru with the AS obtained from the prefix-AS mappings derived from Routeviews and RIPE tables. (Cym = mapping from Cymru's service, and Tab = mapping from RV+RIPE tables).
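The reduction to one query address per /24 can be sketched as follows, assuming IPv4 addresses in dotted-quad form. The function name one_per_slash24 and the sample addresses are hypothetical.

```python
import ipaddress

def one_per_slash24(addresses):
    """Return the sorted *.1 address of every /24 containing an input address."""
    # Dropping the low 8 bits identifies the /24 that an address belongs to.
    blocks = {int(ipaddress.ip_address(a)) >> 8 for a in addresses}
    return [str(ipaddress.ip_address((b << 8) | 1)) for b in sorted(blocks)]

sample = ["192.0.2.7", "192.0.2.200", "198.51.100.9"]
print(one_per_slash24(sample))
```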

Mismatch % is relative to the number of addresses queried; the % of each mismatch category is relative to the number of mismatches.

                      RV2+RRC12         all RV+RIPE
                      count    %        count    %
addresses             2702414           2702414
mismatch              7926     0.3      17839    0.7
Single Cym undef      85       1        178      1
Single Tab undef      226      2.8      129      0.72
Single mismatch       2850     36       10049    56
MOAS missing in Tab   394      5        766      4.2
MOAS missing in Cym   4355     56       6142     34
MOAS missing both     16       0.2      575      3.2

"Single Cym undef" refers to the number of mismatches that occurred because Cymru did not have a matching prefix for that IP address. "Single Tab undef" refers to mismatches where our combination of routing tables did not have a match for the IP address. "Single mismatch" refers to the case where both the table and Cymru found a matching AS, but the ASes differed. "MOAS missing in Tab" refers to the cases where the IP address mapped to multiple origin ASes, and one of the ASes was missing in our tables. "MOAS missing in Cym" refers to the cases where the IP address mapped to multiple origin ASes, and one of the ASes was missing in the Cymru lookup. "MOAS missing both" refers to the cases where the IP address mapped to multiple origin ASes, and some AS from that set was missing in both the table dumps and the Cymru lookups.

Overall, the difference between our current prefix-AS data (RV2+RRC12) and the Cymru mapping is around 0.3%. For RV2+RRC12, the largest share of mismatches (56%) is due to MOASes: our BGP tables find origin ASes for the same prefix that are not seen in the Cymru mapping. Addresses that have a single but different matching AS in the Cymru mapping and in our tables account for a further 36%. With all Routeviews and RIPE tables combined, the order reverses: single mismatches account for 56% of the mismatches and MOASes missing in the Cymru mapping for 34%. The overall mismatch fraction increases from 0.3% to 0.7% as we go from the current subset of tables (RV2+RRC12) to the combination of all Routeviews and RIPE tables.
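One possible reading of these categories is sketched below. The exact rules used to build the table are not spelled out here, so the sketch is an interpretation and not the original classifier: each source returns a set of origin ASes (empty when undefined), two empty sets count as a match, and an address falls in "MOAS missing both" when each side holds an AS that the other lacks. The function name classify is hypothetical.

```python
def classify(cym, tab):
    """Label one address from the origin-AS sets returned by Cymru and our tables."""
    cym, tab = set(cym), set(tab)
    if cym == tab:
        return "match"
    if not cym:
        return "Single Cym undef"
    if not tab:
        return "Single Tab undef"
    if len(cym) == 1 and len(tab) == 1:
        return "Single mismatch"
    # From here on, at least one side reports multiple origin ASes (MOAS).
    only_cym, only_tab = cym - tab, tab - cym
    if only_cym and only_tab:
        return "MOAS missing both"
    return "MOAS missing in Tab" if only_cym else "MOAS missing in Cym"

# Hypothetical AS numbers from the documentation range.
print(classify({64500}, {64500, 64501}))  # our tables see an extra origin AS
```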

Temporal analysis

In this section, we study how the previously defined metrics change as we use routing tables separated in time. For this purpose, we collected 10 routing tables from one Routeviews and one RIPE collector over the duration of one month. We then compare the same metrics defined previously for the following cases:
1) Compare consecutive tables: consider table i as the base table, and study the effect of combining table i+1 with it, with respect to the previous metrics.
2) Compare the first table with each successive table: consider table 0 as the base table, and study the effect of combining each table i with it, with respect to the previous metrics.
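The two comparison modes differ only in which table serves as the base, as the following sketch shows. It assumes each snapshot has already been reduced to a set of items (here AS links) and expresses the gain as a percentage of the base; the function names and sample snapshots are hypothetical.

```python
def percent_added(base, new):
    """Items in new that are absent from base, as a percentage of base."""
    return 100.0 * len(new - base) / len(base)

def consecutive(snapshots):
    """Case 1: table i is the base for table i+1."""
    return [percent_added(a, b) for a, b in zip(snapshots, snapshots[1:])]

def against_first(snapshots):
    """Case 2: table 0 is the base for every later table."""
    return [percent_added(snapshots[0], s) for s in snapshots[1:]]

# Hypothetical AS-link sets for three snapshots taken at different times.
snapshots = [
    {(64496, 64497), (64497, 64498), (64498, 64499), (64499, 64500)},
    {(64496, 64497), (64497, 64498), (64498, 64499), (64499, 64501)},
    {(64496, 64497), (64497, 64498), (64499, 64501), (64499, 64502)},
]
print(consecutive(snapshots))
print(against_first(snapshots))
```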

Comparing consecutive tables

We first compare consecutive tables from Routeviews and RIPE with respect to the difference in address space allocation, number of unique ASes and origin ASes, and the number of unique AS links. We use each pair of consecutive tables, and consider the first of those as the base table. Then we study the changes in the previously enumerated properties if the second table were combined with the first. (Note that consecutive tables are themselves 3 days apart.)

Overall, we find that the metrics do not change significantly across consecutive tables. Over all pairs of consecutive tables, we see less than 2% increase in the address space coverage, less than 3% change in address allocation, less than 1% increase in unique ASes, less than 1% increase in unique origin ASes, and less than 3% increase in unique AS links.

[graph: rv_ripe_sp_addr_ch]
[graph: rv_ripe_sp_unq_AS]
[graph: rv_ripe_sp_unq_lnk]

Comparing additional tables with the first table

In this comparison, we consider the first table collected in the month as the base table. Now, we add successive tables collected over the month, and find the change in address space allocation, number of unique ASes and AS links over the duration of the month. Note that we still only consider pairs of tables, with the first table fixed.

As expected, the difference from comparing the first table with each subsequent table in the month is greater than when we compare consecutive tables. Also, the difference increases as we combine the first table with tables later in the month. We see the largest difference in the number of unique AS links, where the difference between the first and the last table of the month is about 5%.

[graph: rv_ripe_fp_addr_ch]
[graph: rv_ripe_fp_unq_AS]
[graph: rv_ripe_fp_unq_lnk]