IP Prefix-to-AS Mapping comparison

For various topology-related projects, we need a mapping from an IP address to the Autonomous System (AS) that owns that IP address. The most common approach to mapping IP addresses to ASes is to use BGP table dumps from public sources like Routeviews and RIPE, and then perform a longest-prefix match on the set of prefixes. We currently use one routing table from Routeviews (RV2) and one table from RIPE (RRC12) to map IP addresses to ASes.
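The lookup step can be sketched as follows. This is a minimal illustration, assuming each routing table has already been parsed into (prefix, origin AS) pairs. The function names `merge_tables` and `lookup` are hypothetical, and the linear scan is for clarity only; on full tables with roughly 300K prefixes, a radix trie would be the usual choice. Prefixes that several tables (or peers) announce with different origins end up with more than one origin AS (a MOAS prefix).

```python
import ipaddress
from collections import defaultdict


def merge_tables(*tables):
    """Union several lists of (prefix string, origin AS) into prefix -> set of origin ASes."""
    merged = defaultdict(set)
    for table in tables:
        for prefix, origin in table:
            merged[ipaddress.ip_network(prefix)].add(origin)
    return dict(merged)


def lookup(mapping, address):
    """Longest-prefix match: origin ASes of the most specific prefix containing address."""
    addr = ipaddress.ip_address(address)
    best = None
    for net in mapping:
        # Membership is False for mismatched IP versions, so mixed tables are safe.
        if addr in net and (best is None or net.prefixlen > best.prefixlen):
            best = net
    return mapping[best] if best is not None else set()
```

For example, with a hypothetical `rv2 = [("10.0.0.0/8", 64500), ("10.1.0.0/16", 64501)]` and `rrc12 = [("10.1.0.0/16", 64502)]`, looking up `10.1.2.3` in the merged mapping returns the MOAS set `{64501, 64502}`. An address outside every prefix returns an empty set.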

The goal of this analysis is to determine whether the current choice of routing tables is the best with respect to several metrics of interest. We also study the utility of adding more tables, measured as the increase in address space coverage and in the number of ASes, prefixes, AS links, and AS paths that each additional table provides. Finally, we compare the IP-AS mapping from Routeviews and RIPE tables with the mapping obtained from Team Cymru's WHOIS service.

General table statistics

We first calculate the following statistics for each routing table collected from Routeviews and RIPE:

1) The address space coverage provided by the table, in terms of the number of IP addresses.
2) The number of prefixes seen in the table.
3) The number of unique AS paths seen in the table.
4) The number of AS links seen in the table.
5) The number of origin ASes seen in the table.
6) The mean APA, i.e., the mean of the distribution of the number of IP addresses owned by an AS.
7) The standard deviation of APA, which is a measure of the diversity of the IP-AS mapping provided by the table.

The entries of the tables are color-coded: red marks the largest value in each column. A row (routing table) that is red in every column would be the best routing table with respect to every metric of interest.
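The statistics above could be computed per table roughly as in the sketch below. It makes several assumptions: the dump has already been parsed into (prefix, AS path) entries, the origin AS is the last AS on the path, only IPv4 is handled, and each address is attributed to its longest matching prefix. The function name `table_stats`, the undirected definition of an AS link with prepending collapsed, and the use of the population standard deviation are also assumptions and may differ from what produced the tables below. Under this attribution the mean APA is simply coverage divided by the number of origin ASes, which is consistent with the tables (for rib_eqx, 2211947849 / 33808 ≈ 65426.8).

```python
import ipaddress
import statistics


def table_stats(entries):
    """Summary statistics for one IPv4 routing table.

    entries: iterable of (prefix string, AS path tuple). A RIB dump usually
    lists the same prefix once per peer, each time with a different path.
    """
    origin = {}  # prefix -> origin AS (last AS on the path)
    paths = set()
    links = set()
    for prefix, path in entries:
        origin[ipaddress.ip_network(prefix)] = path[-1]  # MOAS: last one seen wins
        paths.add(path)
        # Collapse AS-path prepending, then record undirected adjacent AS pairs.
        hops = [a for i, a in enumerate(path) if i == 0 or a != path[i - 1]]
        links.update(frozenset(pair) for pair in zip(hops, hops[1:]))
    nets = list(origin)
    # Coverage counts each address once, however many prefixes contain it.
    coverage = sum(n.num_addresses for n in ipaddress.collapse_addresses(nets))
    # Longest-prefix-match attribution: a prefix owns its addresses minus
    # those covered by more specific prefixes in the same table.
    apa = {asn: 0 for asn in origin.values()}
    for net in nets:
        inner = [m for m in nets if m != net and m.subnet_of(net)]
        covered = sum(m.num_addresses for m in ipaddress.collapse_addresses(inner))
        apa[origin[net]] += net.num_addresses - covered
    counts = list(apa.values())
    return {
        "coverage": coverage,
        "AS paths": len(paths),
        "prefixes": len(nets),
        "AS links": len(links),
        "origin ASes": len(apa),
        "APA mean": statistics.mean(counts),
        "APA std dev": statistics.pstdev(counts),
    }
```

The quadratic scan for more specific prefixes keeps the sketch short. A full table would need a sorted or trie-based pass instead.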

Routeviews tables

table	coverage (IPv4 addresses)	AS paths	prefixes	AS links	origin ASes	APA mean	APA std dev
rib_eqx.20100305.0156.bz2	2211947849	453125	315010	126574	33808	65426	987668
rib_linx.20100305.0021.bz2	2229336991	831343	318390	134212	33922	65719	990002
rib_paix.20100305.0135.bz2	2212584490	409250	317935	125970	34010	65056	984136
rib_rv2.20100305.0000.bz2	2212616212	1552841	321308	144576	34410	64301	978268
rib_rv4.20100305.0000.bz2	2220951409	306116	315464	122800	34163	65010	982993
rib_wide.20100305.0132.bz2	2212738760	198064	317018	103266	33718	65624	986999

RIPE tables

table	coverage (IPv4 addresses)	AS paths	prefixes	AS links	origin ASes	APA mean	APA std dev
bview_rrc00.20100304.2359.gz	2212478168	462998	323207	125788	34241	64614	980963
bview_rrc01.20100304.2359.gz	2228603950	503862	314037	128236	34021	65506	988936
bview_rrc03.20100304.2359.gz	2212323364	541541	316163	129518	34175	64735	982198
bview_rrc04.20100304.2359.gz	2212418821	291745	331780	121244	34086	64906	983440
bview_rrc05.20100304.2359.gz	2212414720	386706	314316	123754	34117	64847	982893
bview_rrc06.20100304.2359.gz	2212142272	95925	312927	93776	33691	65659	990074
bview_rrc07.20100304.2359.gz	2211947552	236494	313136	116048	33875	65297	986934
bview_rrc10.20100304.2359.gz	2222686249	277884	312981	115750	33880	65604	988153
bview_rrc11.20100304.2359.gz	2211833892	379801	314201	125562	34021	65013	984654
bview_rrc12.20100304.2359.gz	2211800864	349557	312952	121780	33917	65212	986417
bview_rrc13.20100304.2359.gz	2212372000	417226	326348	123346	34203	64683	981556
bview_rrc15.20100304.2359.gz	2212315401	197206	315115	108410	33935	65192	986180
bview_rrc16.20100304.2359.gz	2214412879	143079	320967	110684	33833	65451	988009

For Routeviews, RV2 is the best table in terms of the number of AS paths, prefixes, AS links and origin ASes, while rib_linx provides the most address space coverage. For RIPE, no single table is best with respect to all metrics: RRC01 gives the most address space coverage, RRC03 the most AS paths and AS links, RRC04 the most prefixes, and RRC00 the most origin ASes. The currently used RIPE table (RRC12) is not the best with respect to any metric, and it has the smallest address space coverage of all the RIPE tables.

The utility of adding additional tables

Here, we study the utility of adding additional routing tables. We start with a base table, and keep adding new tables successively. We then measure how the following metrics change due to the addition of the new table.

1) Additional address space coverage provided by the new table.
2) Change in address space allocation (the IP-to-AS mapping) caused by the new table.
3) Unique ASes and origin ASes provided by the new table.
4) Unique AS links provided by the new table.
5) Unique prefixes provided by the new table.
6) Unique more specific prefixes provided by the new table.
7) More specific prefixes with different origin AS provided by the new table.
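Metrics 5 to 7 can be sketched as below, assuming each table is reduced to a mapping from IPv4 prefix to a single origin AS. In this sketch, "more specific" means the new prefix is covered by a shorter prefix already in the base table, and "different origin" compares against the origin of the longest such covering prefix. The denominators (base prefix count for the first two percentages, new more-specific prefixes for the third) and the helper names are assumptions.

```python
import ipaddress


def covering_origin(base, net):
    """Origin AS of the longest base prefix strictly containing net, or None."""
    best = None
    for cand in base:
        if cand != net and net.subnet_of(cand):
            if best is None or cand.prefixlen > best.prefixlen:
                best = cand
    return base[best] if best is not None else None


def new_prefix_stats(base, new):
    """Compare a new table with the cumulative base table.

    base, new: dicts mapping ipaddress.IPv4Network -> origin AS.
    """
    added = [n for n in new if n not in base]
    # Origin of the covering base prefix for each added prefix (None if uncovered).
    cover = {n: covering_origin(base, n) for n in added}
    more_specific = [n for n in added if cover[n] is not None]
    diff_origin = [n for n in more_specific if cover[n] != new[n]]
    return {
        "new prefixes %": 100 * len(added) / len(base),
        "new more-specific %": 100 * len(more_specific) / len(base),
        "more-specific with different origin %": (
            100 * len(diff_origin) / len(more_specific) if more_specific else 0.0
        ),
    }
```

A new /16 inside a base /8 counts as more specific. It also counts toward the last metric only if its origin differs from the /8's origin.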

Tables ordered by overall address space coverage

In the first comparison, we use the table with the largest address space coverage as the starting (base) table. We then successively add additional tables in the decreasing order of address space coverage, measuring the change in the aforementioned properties caused by the additional table.
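The successive-addition procedure can be sketched generically over any set-valued metric (prefixes, origin ASes, or AS links). This is a sketch under assumptions: each step's gain is the percentage of the new table's elements not already in the cumulative set, relative to the size of that set, and `cumulative_gains` is a hypothetical helper. The optional `start` argument covers the case where the base is a fixed set of tables rather than the single largest-coverage table.

```python
def cumulative_gains(tables, coverage, start=None):
    """Add tables in decreasing order of coverage and report the % gain at each step.

    tables: dict name -> set of elements (e.g. AS links or prefixes).
    coverage: dict name -> address space coverage of that table.
    start: optional list of table names used as the base.
    """
    order = sorted(tables, key=coverage.get, reverse=True)
    if start is None:
        start = order[:1]  # the largest-coverage table is the base
    seen = set().union(*(tables[name] for name in start))
    gains = []
    for name in order:
        if name in start:
            continue
        new = tables[name] - seen
        gains.append((name, 100 * len(new) / len(seen)))
        seen |= tables[name]
    return gains
```

Calling it with `start=["rv2", "rrc12"]` (hypothetical keys) would reproduce the "current usage" ordering studied below.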

We find that additional tables increase address space coverage, the number of unique ASes and the number of unique origin ASes by less than 1%, and change the AS allocation of less than 1% of the address space. The largest changes caused by additional tables are in the number of unique AS links, unique prefixes and more specific prefixes, which can increase by between 1% and 10%.

Adding additional routing tables leads to less than 1% increase in address space coverage. Also, less than 1% of the IP-AS allocations change when we add successive tables.

Adding additional routing tables leads to less than 1% increase in the number of unique ASes that we see. This is consistent with previous observations that most ASes are visible from even a small number of vantage points.

The effect of adding additional tables is larger in the case of unique AS links. We find that additional tables can yield up to 7% more AS links than seen in the base table.

We study the number of additional prefixes that we gain by adding more routing tables. We find that additional tables yield up to 8% more prefixes. The fraction of more specific new prefixes closely follows the fraction of new prefixes.

Finally, we study how many of the new more specific prefixes have a different origin AS from the less specific prefix that covers them. For most of the tables, between 10% and 70% of the new more specific prefixes have a different origin AS.

Tables ordered by current usage

Next, we study our current usage, which consists of one table from Routeviews (RV2) and one table from RIPE (RRC12). We use these two tables as the base and add the remaining tables successively in decreasing order of address space coverage.

We find that adding additional tables starting from our current usage causes less than 1% change in the address space coverage, address space allocation, the number of unique ASes and unique origin ASes. The change in the number of unique AS links, unique prefixes and unique more specific prefixes is between 1% and 10%.

Choosing the best additional table

In this section, we determine the best next table to add in order to maximize a chosen metric in the aggregate. The metric we use is the change in address space allocation caused by the new table. At each step, let t_cuml be the cumulative prefix-AS mapping built so far. Among the remaining tables, we pick the one that causes the largest change in address space allocation when added to t_cuml. Because each candidate table t is evaluated independently against t_cuml, this evaluation can be done in parallel for all remaining tables.
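The greedy selection can be sketched as follows. Assumptions: a mapping is a dict from IPv4 network to a frozenset of origin ASes, and merging two tables unions the origin sets of identical prefixes. The allocation change counts only addresses that were already mapped and whose longest-prefix-match origin set changes; newly covered addresses are left to the coverage metric. Splitting the address space at every prefix boundary yields intervals on which the longest-prefix match is constant, so the count is exact without enumerating individual addresses. All function names are hypothetical.

```python
def origin_at(mapping, addr):
    """Origin set of the longest prefix covering integer address addr."""
    best = None
    for net in mapping:
        if int(net.network_address) <= addr <= int(net.broadcast_address):
            if best is None or net.prefixlen > best.prefixlen:
                best = net
    return mapping[best] if best is not None else frozenset()


def merge(a, b):
    """Combine two prefix -> frozenset(origins) mappings, unioning MOAS sets."""
    out = dict(a)
    for net, origins in b.items():
        out[net] = out.get(net, frozenset()) | origins
    return out


def allocation_change(old, new):
    """Number of already-mapped addresses whose LPM origin set differs in new."""
    # No prefix starts or ends strictly inside [lo, hi), so LPM is constant there.
    bounds = sorted({int(n.network_address) for m in (old, new) for n in m}
                    | {int(n.broadcast_address) + 1 for m in (old, new) for n in m})
    changed = 0
    for lo, hi in zip(bounds, bounds[1:]):
        before = origin_at(old, lo)
        if before and before != origin_at(new, lo):
            changed += hi - lo
    return changed


def greedy_order(cumulative, candidates):
    """Repeatedly add the candidate table that changes the most addresses."""
    remaining = dict(candidates)
    steps = []
    while remaining:
        # Each score depends only on `cumulative` and one candidate (parallelizable).
        scores = {name: allocation_change(cumulative, merge(cumulative, t))
                  for name, t in remaining.items()}
        best = max(scores, key=scores.get)
        cumulative = merge(cumulative, remaining.pop(best))
        steps.append((best, scores[best]))
    return steps
```

Starting `greedy_order` from the merged RV2+RRC12 mapping instead of a single table gives the second variant described below.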

We find that considering all tables, adding the best possible next table still results in less than 1% change in the address space coverage and address allocation. We see the same trend when we start with the currently used tables (RV2+RRC12) and successively add the best possible tables.

Comparison with Team Cymru's WHOIS service

We compare the IP-AS mapping obtained from Routeviews and RIPE BGP dumps with the IP-AS mapping service provided by Team Cymru. For this purpose, we construct a sample list of 24M IP addresses collected from Ark traces seen in January 2010. To limit the load on Cymru servers, we query only one address per /24 (the *.1 address), thus reducing the set of queried addresses to 2.7M. For each address, we compare the AS returned by Cymru with the AS obtained from the prefix-AS mappings derived from Routeviews and RIPE tables. (Cym = mapping from Cymru's service, and Tab = mapping from RV+RIPE tables).
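The per-/24 sampling step amounts to the short sketch below. The helper name is hypothetical, and the Cymru queries themselves are not shown.

```python
import ipaddress


def one_per_slash24(addresses):
    """Replace each IPv4 address by the .1 address of its /24, dropping duplicates."""
    sampled = set()
    for a in addresses:
        net = ipaddress.IPv4Network(f"{a}/24", strict=False)
        sampled.add(net.network_address + 1)
    return sorted(sampled)
```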

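table	addresses	mismatches	mismatch (% of addresses)	Single Cym undef	(% of mismatches)	Single Tab undef	(% of mismatches)	Single mismatch	(% of mismatches)	MOAS missing in Tab	(% of mismatches)	MOAS missing in Cym	(% of mismatches)	MOAS missing both	(% of mismatches)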
RV2+RRC12	2702414	7926	0.3	85	1	226	2.8	2850	36	394	5	4355	56	16	0.2
all RV+RIPE	2702414	17839	0.7	178	1	129	0.72	10049	56	766	4.2	6142	34	575	3.2

The "Single Cym undef" column counts mismatches where Cymru had no matching prefix for the IP address. "Single Tab undef" counts mismatches where our combination of routing tables had no matching prefix for the IP address. "Single mismatch" counts cases where both the tables and Cymru found a single matching AS, but the ASes differed. The remaining columns cover MOAS (multiple origin AS) cases, where the IP address mapped to multiple origin ASes. "MOAS missing in Tab" counts cases where one of the ASes was missing from our tables. "MOAS missing in Cym" counts cases where one of the ASes was missing from the Cymru lookup. "MOAS missing in both" counts cases where some AS from that set was missing from both the table dumps and the Cymru lookup. These six categories together account for all mismatches, and their percentages are relative to the total number of mismatches.

Overall, our current prefix-AS data (RV2+RRC12) disagrees with the Cymru mapping for about 0.3% of the queried addresses. For RV2+RRC12, more than half of the mismatches are MOAS cases in which our BGP tables see origin ASes for a prefix that do not appear in the Cymru mapping, and about a third (36%) are addresses with a single but different matching AS. When all Routeviews and RIPE tables are combined, addresses with a single but different matching AS become the largest category (56%), followed by MOAS cases missing in Cymru (34%). The overall mismatch fraction increases from 0.3% to 0.7% as we go from the current subset of tables (RV2+RRC12) to the combination of all Routeviews and RIPE tables.
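The categorization can be expressed as a small classifier over the two origin-AS sets returned for each address. This is a sketch under assumptions: an empty set means no matching prefix, a MOAS case is any case where either side reports more than one origin AS, and "missing in both" is read as each side reporting an AS that the other lacks. The function names are hypothetical.

```python
from collections import Counter


def classify(cym, tab):
    """Mismatch category for one address, or None if Cymru and the tables agree.

    cym, tab: sets of origin ASes from Cymru and from our tables (empty = no match).
    """
    if cym == tab:
        return None
    if len(cym) <= 1 and len(tab) <= 1:
        if not cym:
            return "Single Cym undef"
        if not tab:
            return "Single Tab undef"
        return "Single mismatch"
    # At least one side reports multiple origin ASes (MOAS).
    missing_in_tab = bool(cym - tab)
    missing_in_cym = bool(tab - cym)
    if missing_in_tab and missing_in_cym:
        return "MOAS missing both"
    return "MOAS missing in Tab" if missing_in_tab else "MOAS missing in Cym"


def summarize(pairs):
    """Count mismatches per category; percentages are relative to all mismatches."""
    counts = Counter(c for c in (classify(cym, tab) for cym, tab in pairs) if c)
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}, total
```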

Temporal analysis

In this section, we study how the previously defined metrics change as we use routing tables separated in time. For this purpose, we collected 10 routing tables from one Routeviews and one RIPE collector over the duration of one month. We then compare the same metrics defined previously for the following cases:
1) Compare consecutive tables - Consider table i as the base table, and study the effect of combining table i+1 with it with respect to the previous metrics.
2) Compare the first table with each successive table - Consider table 0 as the base table, and study the effect of combining each table i with it with respect to the previous metrics.
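The two comparison schemes can be sketched for any set-valued metric. This sketch assumes each snapshot has been reduced to a set (for example, of AS links) and measures the change as the percentage of elements in the added table that are not in the base table. `pairwise_gains` is a hypothetical helper.

```python
def pairwise_gains(snapshots):
    """Percent gain when combining pairs of tables from a time series.

    snapshots: list of sets, ordered by collection time.
    Returns (consecutive, from_first): gains of table i+1 over table i, and of
    each later table i over table 0.
    """
    def gain(base, other):
        return 100 * len(other - base) / len(base)

    consecutive = [gain(a, b) for a, b in zip(snapshots, snapshots[1:])]
    from_first = [gain(snapshots[0], s) for s in snapshots[1:]]
    return consecutive, from_first
```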

Comparing consecutive tables

We first compare consecutive tables from Routeviews and RIPE with respect to the difference in address space allocation, the number of unique ASes and origin ASes, and the number of unique AS links. For each pair of consecutive tables, we take the first as the base table and study the changes in these properties when the second table is combined with it. (Consecutive tables are 3 days apart.)

Overall, we find that the metrics do not change significantly across consecutive tables. Over all pairs of consecutive tables, we see less than 2% increase in address space coverage, less than 3% change in address allocation, less than 1% increase in unique ASes and in unique origin ASes, and less than 3% increase in unique AS links.

Comparing additional tables with the first table

In this comparison, we consider the first table collected in the month as the base table. Now, we add successive tables collected over the month, and find the change in address space allocation, number of unique ASes and AS links over the duration of the month. Note that we still only consider pairs of tables, with the first table fixed.

As expected, the difference from comparing the first table with each subsequent table in the month is greater than when we compare consecutive tables. Also, the difference increases as we combine the first table with tables later in the month. We see the largest difference in the number of unique AS links, where the difference between the first and the last table of the month is about 5%.