Their share: diversity and disparity in IP traffic

Supplemental material for the PAM 2004 paper.

Traffic Analysis Tables and Plots

All analyses on these pages that involve prefixes and ASes are based on RouteViews snapshots taken on Aug 14, 2002 and May 8, 2003. Since not all peers are equally represented in RouteViews snapshots, we use only peers that have "full size" tables, and when an AS has more than one router peering with RouteViews, we use BGP data from only one of its routers to avoid bias. Specifically, for the 2002 snapshot, we use routes from 35 peers that each announce at least 111k prefixes, and for the 2003 snapshot, we use 39 peers that each announce at least 119k prefixes. The prefix cutoffs are somewhat subjective, but each was chosen at a fairly clear gap in the distribution of table sizes.
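The peer selection rule can be illustrated with a minimal Python sketch. It is not the authors' code: the `Peer` record, the `select_peers` function, and the sample routers, AS numbers, and table sizes are all hypothetical. Keeping the router with the largest table when an AS has several qualifying routers is also an assumption, since the text does not say which router was used.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Peer:
    router: str      # hypothetical router identifier
    asn: int         # AS number of the peering router
    table_size: int  # number of prefixes in this router's table

def select_peers(peers, min_prefixes):
    """Keep only full-size tables, with at most one router per AS.

    Assumption: if an AS has several qualifying routers, the one with the
    largest table is kept.
    """
    best = {}
    for peer in peers:
        if peer.table_size < min_prefixes:
            continue  # partial table: below the cutoff
        current = best.get(peer.asn)
        if current is None or peer.table_size > current.table_size:
            best[peer.asn] = peer
    return sorted(best.values(), key=lambda p: p.asn)

# Hypothetical peers, using the 2002 cutoff of 111k prefixes.
peers = [
    Peer("r1", 200, 118_000),
    Peer("r2", 200, 117_500),  # second router in the same AS
    Peer("r3", 100, 115_000),
    Peer("r4", 300, 40_000),   # partial table
]
selected = select_peers(peers, min_prefixes=111_000)
print([p.router for p in selected])  # ['r3', 'r1']
```

The cutoff is a parameter because it differs between snapshots (111k for 2002, 119k for 2003).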

When a prefix is announced by more than one AS, we choose the most frequently occurring origin AS, as seen by our set of RouteViews peers, and break ties by choosing the lowest-numbered AS. Furthermore, we consider only semiglobal prefixes, that is, prefixes announced by more than half of the (selected) peers. This means prefixes announced by at least 18 of the 35 peers for the 2002 snapshot and at least 20 of the 39 peers for the 2003 snapshot. Similarly, we consider only semiglobal ASes, that is, the origin ASes of semiglobal prefixes. Atoms are computed with findBgpAtoms, which is part of the rv2atoms package.
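A minimal sketch of the origin-AS and semiglobal rules follows. It assumes the RouteViews data has already been reduced to a mapping from each prefix to the origin AS seen by each peer that announces it. That input format, the function names, and the sample data are hypothetical. The threshold function reproduces the counts in the text: more than half of 35 peers is 18, and more than half of 39 is 20.

```python
from collections import Counter

def majority_origin(origins):
    """Most frequent origin AS among the peers; ties go to the lowest AS number."""
    counts = Counter(origins)
    return min(counts, key=lambda asn: (-counts[asn], asn))

def semiglobal_threshold(n_peers):
    """Smallest number of peers that is strictly more than half of n_peers."""
    return n_peers // 2 + 1

def semiglobal_prefixes(observations, n_peers):
    """Map each semiglobal prefix to its chosen origin AS.

    observations: prefix -> list of origin ASes, one entry per announcing peer.
    """
    need = semiglobal_threshold(n_peers)
    return {
        prefix: majority_origin(origins)
        for prefix, origins in observations.items()
        if len(origins) >= need
    }

# Hypothetical data: "p1" is seen by 20 peers with a 10/10 origin tie,
# "p2" by only 5 peers.
observations = {"p1": [200] * 10 + [100] * 10, "p2": [300] * 5}
chosen = semiglobal_prefixes(observations, n_peers=35)
semiglobal_ases = set(chosen.values())  # origin ASes of semiglobal prefixes
print(semiglobal_threshold(35), semiglobal_threshold(39))  # 18 20
print(chosen, semiglobal_ases)  # {'p1': 100} {100}
```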

The matrix below links to a large number of tables and plots that describe each dataset included in the paper in great detail. The tables show:

the diversity of objects at fixed percentiles of traffic (99, 95, 90, and 50)
the number of objects responsible for fixed percentiles of traffic (90 and 50)
the number of objects that individually contribute 1% of the traffic
the crossover split and various crossover statistics, including the count, minimum size, and traffic volume of elephants
the geographic distribution of traffic

Please see the description of the tables and the description of the plots before proceeding.

Hint: The best way to browse the plots is by clicking on the links appearing to the right of each plot rather than by scrolling around the page. By navigating with the links, you can easily compare two plots with different parameters (e.g., bytes vs. packets, or Backbone 1 vs. Backbone 2) by "flashing" back and forth between them, as with a flipbook. Be sure to resize your browser window so that the plots and the navigation links appear together. Also, when viewing the plots for the first time, be sure to let your browser download all the images (64 in all) before you start clicking on the hyperlinks. Otherwise, your browser may fail to load all images.

BB1 2002-08-14 (D04)
tables:	bytes	|	packets
plots:	bytes	|	packets

UNI 2002-08-14 (D05)
tables:	bytes	|	packets
plots:	bytes	|	packets

BB1 2003-05-07 (D08)
tables:	bytes	|	packets
plots:	bytes	|	packets

BB2 2003-05-07 (D09)
tables:	bytes	|	packets
plots:	bytes	|	packets

Coverage of Semiglobal Prefixes and Semiglobal ASes

This analysis of coverage is based on the same two RouteViews snapshots used for all analysis on these supplemental webpages. We exclude transit-only ASes and private ASes (which have a number equal to or higher than 64,512). When prefixes are announced by more than one AS, we choose the most frequently occurring origin AS, as seen by our set of RouteViews peers. We break ties by choosing the lowest numbered AS. We consider only semiglobal prefixes and semiglobal ASes (that is, the origin ASes of semiglobal prefixes). The following table summarizes the BGP data used for computing coverage:

Summary of RouteViews BGP Data
snapshot date	peers	min. prefixes per peer	semiglobal prefixes	non-private origin ASes	private origin ASes
2002-08-14	35	111k	112,148	13,409	0
2003-05-08	39	119k	121,257	15,002	2
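The last two columns can be derived with a short sketch. It assumes a mapping from each semiglobal prefix to its chosen origin AS, which is a hypothetical input format. Because only origin ASes are collected, transit-only ASes never enter either set, so the sketch only separates private from non-private origins. The function name and sample data are illustrative.

```python
PRIVATE_ASN_MIN = 64512  # ASes numbered 64,512 and above are treated as private

def split_origin_ases(origin_by_prefix):
    """Return (non-private, private) sets of origin ASes.

    Only origin ASes are collected, so transit-only ASes never appear.
    """
    origins = set(origin_by_prefix.values())
    private = {asn for asn in origins if asn >= PRIVATE_ASN_MIN}
    return origins - private, private

# Hypothetical semiglobal prefixes and their chosen origin ASes.
origin_by_prefix = {"p1": 100, "p2": 100, "p3": 64600, "p4": 300}
public, private = split_origin_ases(origin_by_prefix)
print(sorted(public), sorted(private))  # [100, 300] [64600]
```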

The table below shows the coverage of prefixes and ASes by the IP addresses seen in a given dataset. The "src/dst" columns give the combined coverage of source and destination addresses, that is, the fraction of objects seen as a source, a destination, or both.

On Jun 18, 2004, the following table was updated to have the correct values in the "src/dst" columns. Previously, these columns were computed as the simple sum of the corresponding "source" and "destination" columns. That method is incorrect because some objects (prefixes or ASes) appear in both the "source" and "destination" columns and were therefore counted twice.

	prefix						AS
	source		destination		src/dst		source		destination		src/dst
	%	count	%	count	%	count	%	count	%	count	%	count
D04N(0)	25.2%	28,207	9.9%	11,140	32.9%	36,910	38.8%	5,201	13.7%	1,835	45.8%	6,148
D04S(1)	7.5%	8,412	40.0%	44,834	44.5%	49,952	15.0%	2,013	45.9%	6,150	52.4%	7,022
D05O(0)	0.4%	484	41.2%	46,224	41.2%	46,261	1.9%	255	73.7%	9,882	73.7%	9,886
D05I(1)	39.5%	44,330	0.5%	612	39.5%	44,341	70.6%	9,464	2.4%	319	70.6%	9,464
D08N(0)	28.0%	33,939	6.6%	7,983	32.7%	39,652	52.1%	7,812	10.6%	1,592	56.3%	8,447
D09S(0)	2.8%	3,370	15.9%	19,296	18.5%	22,475	4.1%	609	25.8%	3,866	29.1%	4,372
D09N(1)	15.3%	18,595	1.2%	1,510	16.4%	19,932	23.5%	3,529	1.5%	224	24.5%	3,676

Coverage of Semiglobal Prefixes and Semiglobal ASes (RouteViews)
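The "src/dst" correction replaces a sum with the size of a set union, which equals the sum of the two set sizes minus the size of their intersection. For D04N(0), for example, the simple sum would give 28,207 + 11,140 = 39,347 prefixes (35.1% of 112,148), while the table reports 36,910 (32.9%); the difference of 2,437 prefixes is the number seen as both source and destination. The sketch below, with hypothetical object identifiers, contrasts the two computations.

```python
def coverage(src_objects, dst_objects, universe_size):
    """Fractions of the universe (e.g., semiglobal prefixes) covered by a dataset."""
    both = src_objects & dst_objects
    return {
        "source": len(src_objects) / universe_size,
        "destination": len(dst_objects) / universe_size,
        # Correct combined value: the union, counting shared objects once.
        "src/dst": len(src_objects | dst_objects) / universe_size,
        # Old, incorrect method: the simple sum counts `both` twice.
        "simple sum": (len(src_objects) + len(dst_objects)) / universe_size,
        "overlap": len(both) / universe_size,
    }

# Hypothetical prefix identifiers out of a universe of 10 prefixes.
result = coverage({"a", "b", "c"}, {"c", "d"}, universe_size=10)
print(result["src/dst"], result["simple sum"])  # 0.4 0.5
```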

Traffic Excluded from Analysis

Some of the traffic observed on a link has been excluded from the analysis on these supplementary webpages. Specifically, traffic originating from or destined to the following categories of IP addresses has been excluded:

1. addresses matching bogon prefixes
2. addresses without a matching semiglobal prefix
3. addresses without a matching atom

These categories of IP addresses are nested: category 1 is a subset of category 2, and category 2 is in turn a subset of category 3. (We correctly handle accidental announcements of bogon prefixes.) Traffic attributable to category 1 addresses is not included in any statistics at the granularity of IP addresses; in particular, percentages are taken with respect to the total volume remaining after this traffic is excluded. Similarly, traffic attributable to category 2 addresses is not included in any statistics on prefixes or ASes, and traffic attributable to category 3 addresses is not included in any statistics on atoms. As a result, the adjusted total volume of traffic for IP-based analysis can be higher than that for prefix/AS-based analysis, and the adjusted total for prefix/AS-based analysis can in turn be higher than that for atom-based analysis.
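Because the categories are nested, each granularity drops every category up to its own level. The sketch below computes the adjusted totals for one side of the traffic (source or destination). It assumes each record already carries the smallest exclusion category its address belongs to (0 if none); this record format, the function name, and the byte counts are hypothetical.

```python
# Granularity -> highest exclusion category dropped at that granularity.
# Categories are nested: 1 (bogon) is a subset of 2 (no semiglobal prefix),
# which is a subset of 3 (no atom).
DROP_UP_TO = {"ip": 1, "prefix/AS": 2, "atom": 3}

def adjusted_totals(records):
    """records: list of (category, bytes), where category is the smallest
    exclusion category the address belongs to, or 0 if it belongs to none."""
    return {
        granularity: sum(nbytes for cat, nbytes in records if not 1 <= cat <= top)
        for granularity, top in DROP_UP_TO.items()
    }

# Hypothetical byte counts for four groups of addresses.
records = [(0, 1000), (1, 50), (2, 30), (3, 20)]
totals = adjusted_totals(records)
print(totals)  # {'ip': 1050, 'prefix/AS': 1020, 'atom': 1000}
```

Percentages at each granularity would then be taken relative to that granularity's own total, which is why the totals shrink from IP to prefix/AS to atom.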

For our purposes, we use the following minimal list of bogons, derived from RFC 3330, "Special-Use IPv4 Addresses" (note that we do not include the multicast block, 224.0.0.0/4, in this list):

prefix	description
0.0.0.0/8	hosts on "this" network (RFC3330)
10.0.0.0/8	private network (RFC1918)
127.0.0.0/8	loopback interface (RFC3330)
169.254.0.0/16	link local (RFC3330)
172.16.0.0/12	private network (RFC1918)
192.0.2.0/24	test net (for use as examples in documentation, RFC3330)
192.168.0.0/16	private network (RFC1918)
198.18.0.0/15	network device benchmarking (RFC3330)
255.255.255.255/32	"limited broadcast" (RFC3330)
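A bogon check against this list can be sketched with Python's standard `ipaddress` module. The list is copied from the table; the function name `is_bogon` is hypothetical, and a linear scan is used for clarity rather than speed.

```python
import ipaddress

# Minimal bogon list from the table (multicast 224.0.0.0/4 intentionally absent).
BOGONS = [ipaddress.ip_network(p) for p in (
    "0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16",
    "172.16.0.0/12", "192.0.2.0/24", "192.168.0.0/16",
    "198.18.0.0/15", "255.255.255.255/32",
)]

def is_bogon(addr: str) -> bool:
    """True if the IPv4 address falls inside any listed bogon prefix."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BOGONS)

print(is_bogon("172.31.255.255"), is_bogon("224.0.0.1"))  # True False
```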

An address may have a matching prefix but not a matching semiglobal prefix. This can happen, for example, if the matching prefix is longer than /24 and most peers filter out prefixes longer than /24.
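The /24 filtering case can be illustrated with a small sketch. The announcements, peer counts, and the `most_specific_match` helper are hypothetical; the prefixes come from the RFC 5737 documentation ranges. Returning the most specific (longest) match is an assumption; for the exclusion test only the existence of a match matters.

```python
import ipaddress

def most_specific_match(addr, prefixes):
    """Most specific prefix in `prefixes` that contains addr, or None."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in prefixes if ip in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)

# Hypothetical announcements: prefix -> number of selected peers announcing it.
peer_counts = {
    ipaddress.ip_network("198.51.100.0/24"): 30,
    ipaddress.ip_network("203.0.113.0/25"): 3,  # most peers filter this /25
}
n_peers = 35
semiglobal = [net for net, k in peer_counts.items() if k > n_peers / 2]

addr = "203.0.113.10"
print(most_specific_match(addr, peer_counts))  # 203.0.113.0/25
print(most_specific_match(addr, semiglobal))   # None
```

The address has a matching prefix, but since that /25 is announced by only 3 of 35 peers, it has no matching semiglobal prefix and falls into category 2.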

An address may have a matching prefix but not a matching atom if the matching prefix is not announced by all RouteViews peers. This follows from the particular definition of atoms we used, which requires that the constituent prefixes of an atom be announced by all participating peers.

The following table summarizes the amount of excluded traffic attributable to two of the three categories of excluded IP addresses. For each category, the table lists the number of IP addresses (addr), packets (pkt), and bytes (byte) attributable to that category. (Note: As stated before, the "no semiglobal" category is a superset of the "bogon IP" category.)

	source						destination
	bogon IP			no semiglobal			bogon IP			no semiglobal
	addr	pkt	byte	addr	pkt	byte	addr	pkt	byte	addr	pkt	byte
D04N(0)	4.4k	2.1M	124M	5.2k	2.1M	129M	11	41k	18M	185	61k	33M
D04S(1)	1.3k	225k	83M	1.8k	326k	88M	2	3.0k	257k	234k	567k	80M
D05O(0)	0	0	0	1	116	4.6k	725	198k	9.1M	319k	1.6M	217M
D05I(1)	4.3k	56k	5.7M	4.6k	402k	76M	26	5.7k	284k	42	261k	129M
D08N(0)	5.6k	2.7M	197M	7.0k	2.8M	205M	7	2.4k	190k	2.3k	7.1k	648k
D09S(0)	383	2.6M	1.3G	426	2.6M	1.3G	0	0	0	101	25k	20M
D09N(1)	1.2k	218k	183M	1.5k	252k	185M	0	0	0	5	58k	18M

Traffic Volume of Bogon IP Addresses and IP Addresses without Matching Semiglobal Prefixes

Density of ASes

The following plot shows the density of ASes by traffic volume for one dataset. The x-axis is binned logarithmically in base 2; that is, the $n$th bin consists of all $x$ such that

$$2^n \le x < 2^{n+1}.$$
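For integer traffic volumes, the bin index $n$ is the position of the highest set bit, so it can be computed without floating-point logarithms. The sketch below is hypothetical in its names and data. Its density normalization (fraction of ASes in the bin divided by the bin width $2^n$) is an assumption, since the text does not say how the plotted density is normalized.

```python
from collections import Counter

def log2_bin(x: int) -> int:
    """Index n of the bin 2**n <= x < 2**(n + 1), for an integer x >= 1."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    return x.bit_length() - 1

def log2_histogram(volumes):
    """Map bin index n -> (number of ASes, density).

    Assumption: density = fraction of ASes in the bin divided by the bin width 2**n.
    """
    counts = Counter(log2_bin(v) for v in volumes)
    total = len(volumes)
    return {n: (c, c / (total * 2**n)) for n, c in sorted(counts.items())}

# Hypothetical per-AS byte counts.
volumes = [1, 3, 5, 6, 7, 1000]
print([log2_bin(v) for v in volumes])  # [0, 1, 2, 2, 2, 9]
```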

Related Objects

See https://catalog.caida.org/paper/2004_diversity/ to explore related objects to this document in the CAIDA Resource Catalog.