Their share: diversity and disparity in IP traffic

Supplemental material for the PAM 2004 paper.

Traffic Analysis Tables and Plots

All analysis on these pages involving prefixes and ASes is based on RouteViews snapshots taken on Aug 14, 2002 and May 8, 2003. Since not all peers are equally represented in RouteViews snapshots, we use only peers that have "full size" tables, and when an AS has more than one router peering with RouteViews, we use the BGP data from only one router, in order to avoid bias. Specifically, for the 2002 snapshot, we use routes from 35 peers that each announce at least 111k prefixes, and for the 2003 snapshot, we use 39 peers announcing at least 119k prefixes. The prefix cutoffs are subjective, but they are chosen at a more or less clear gap in the distribution of table sizes.
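The peer selection described above can be sketched as follows. This is a minimal illustration, not the code used for the paper: the `Peer` record and the `select_peers` function are hypothetical names, and the rule for choosing among several routers of the same AS (keep the one with the largest table) is an assumption, since the text only says that one router per AS is used.

```python
from typing import NamedTuple


class Peer(NamedTuple):
    """One RouteViews peering session (hypothetical record layout)."""
    router: str        # identifier of the peering router
    asn: int           # AS that the router belongs to
    num_prefixes: int  # number of prefixes the router announces


def select_peers(peers, min_prefixes):
    """Keep only "full size" peers, and at most one router per AS."""
    chosen = {}
    for peer in peers:
        if peer.num_prefixes < min_prefixes:
            continue  # table is too small to count as "full size"
        # Assumption: among routers of the same AS, keep the largest table.
        best = chosen.get(peer.asn)
        if best is None or peer.num_prefixes > best.num_prefixes:
            chosen[peer.asn] = peer
    return sorted(chosen.values(), key=lambda p: p.asn)
```

Reading the stated cutoff of 111k as 111,000, the 2002 selection would correspond to a call such as `select_peers(peers, min_prefixes=111_000)`.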

When prefixes are announced by more than one AS, we choose the most frequently occurring origin AS, as seen by our set of RouteViews peers. We break ties by choosing the lowest numbered AS. Furthermore, we consider only semiglobal prefixes--that is, prefixes announced by more than half of the (selected) peers. This means prefixes announced by at least 18 peers for the 2002 snapshot and 20 peers for the 2003 snapshot. Similarly, we consider only semiglobal ASes--that is, the origin ASes of semiglobal prefixes. Atoms are computed with findBgpAtoms, which is a part of the rv2atoms package.
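The origin selection and the semiglobal filter can be sketched as below. The input layout (one `(peer, prefix, origin_asn)` tuple per route) and both function names are assumptions made for illustration; the rules themselves (most frequent origin, ties to the lowest-numbered AS, more than half of the selected peers) come from the text above.

```python
from collections import Counter, defaultdict


def semiglobal_origins(routes, num_peers):
    """Map each semiglobal prefix to its chosen origin AS.

    routes: iterable of (peer, prefix, origin_asn) tuples, one per
    (peer, prefix) pair. This flattened layout is hypothetical.
    """
    peers_by_prefix = defaultdict(set)
    origins_by_prefix = defaultdict(Counter)
    for peer, prefix, origin in routes:
        peers_by_prefix[prefix].add(peer)
        origins_by_prefix[prefix][origin] += 1

    result = {}
    for prefix, peers in peers_by_prefix.items():
        # Semiglobal: announced by more than half of the selected peers.
        if 2 * len(peers) <= num_peers:
            continue
        counts = origins_by_prefix[prefix]
        # Most frequent origin first; ties go to the lowest-numbered AS.
        result[prefix] = min(counts, key=lambda asn: (-counts[asn], asn))
    return result


def semiglobal_threshold(num_peers):
    """Smallest number of peers that is more than half of num_peers."""
    return num_peers // 2 + 1
```

With 35 and 39 selected peers, `semiglobal_threshold` gives 18 and 20, matching the thresholds stated above.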

The matrix below points to a large number of tables and plots that describe, in great detail, each dataset included in the paper. The provided tables show:

  • the diversity of objects at fixed percentiles of traffic (99, 95, 90, and 50)
  • the number of objects responsible for fixed percentiles of traffic (90 and 50)
  • the number of objects that individually contribute 1% of the traffic
  • the crossover split and various crossover statistics, including the count, minimum size, and traffic volume of elephants
  • the geographic distribution of traffic
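Two of the statistics listed above can be illustrated with a short sketch. It assumes that `volumes` holds one traffic volume (bytes or packets) per object, and that "individually contribute 1%" means at least 1% of the total; the function names are hypothetical and this is not the code that produced the tables.

```python
def objects_for_share(volumes, share):
    """Fewest objects whose combined traffic reaches `share` of the total."""
    total = sum(volumes)
    running = 0
    # Take objects from heaviest to lightest until the share is reached.
    for count, volume in enumerate(sorted(volumes, reverse=True), start=1):
        running += volume
        if running >= share * total:
            return count
    return 0  # only reached for empty input


def heavy_hitters(volumes, share=0.01):
    """Number of objects that each carry at least `share` of the total."""
    total = sum(volumes)
    return sum(1 for v in volumes if v >= share * total)
```

For example, `objects_for_share(volumes, 0.9)` gives the number of objects responsible for 90% of the traffic.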

Please see the description of the tables and the description of the plots before proceeding.

Hint: The best way to browse the plots is to click the links to the right of each plot rather than scroll around the page. Navigating with the links lets you compare two plots with different parameters (e.g., bytes vs. packets, or Backbone 1 vs. Backbone 2) by "flashing" back and forth between them, as with a flipbook. Be sure to resize your browser window so that the plots and the navigation links appear together. Also, when viewing the plots for the first time, let your browser download all the images (64 in all) before you start clicking on the hyperlinks; otherwise, your browser may fail to load some of them.

BB1 2002-08-14 (D04)
  tables: bytes | packets
  plots: bytes | packets

UNI 2002-08-14 (D05)
  tables: bytes | packets
  plots: bytes | packets

BB1 2003-05-07 (D08)
  tables: bytes | packets
  plots: bytes | packets

BB2 2003-05-07 (D09)
  tables: bytes | packets
  plots: bytes | packets

Coverage of Semiglobal Prefixes and Semiglobal ASes

This analysis of coverage is based on the same two RouteViews snapshots used for all analysis on these supplemental webpages. We exclude transit-only ASes and private ASes (which have a number equal to or higher than 64,512). When prefixes are announced by more than one AS, we choose the most frequently occurring origin AS, as seen by our set of RouteViews peers. We break ties by choosing the lowest numbered AS. We consider only semiglobal prefixes and semiglobal ASes (that is, the origin ASes of semiglobal prefixes). The following table summarizes the BGP data used for computing coverage:

Summary of RouteViews BGP Data

snapshot     peers  min. prefixes  semiglobal  non-private  private
date                per peer       prefixes    origin ASes  origin ASes
2002-08-14   35     111k           112,148     13,409       0
2003-05-08   39     119k           121,257     15,002       2

The table below shows the coverage of prefixes and ASes by the IP addresses seen in a given dataset. The "src/dst" column provides the combined coverage of the source and destination addresses.

On Jun 18, 2004, the following table was updated to have the correct values in the "src/dst" columns. The "src/dst" columns were previously computed by taking the simple sum of the corresponding "source" and "destination" columns. This method is incorrect because there are objects (prefixes or ASes) common to both the "source" and "destinations" columns.

         prefix                                          AS
         source          destination     src/dst         source          destination     src/dst
D04N(0)  25.2%  28,207    9.9%  11,140   32.9%  36,910   38.8%   5,201   13.7%   1,835   45.8%   6,148
D04S(1)   7.5%   8,412   40.0%  44,834   44.5%  49,952   15.0%   2,013   45.9%   6,150   52.4%   7,022
D05O(0)   0.4%     484   41.2%  46,224   41.2%  46,261    1.9%     255   73.7%   9,882   73.7%   9,886
D05I(1)  39.5%  44,330    0.5%     612   39.5%  44,341   70.6%   9,464    2.4%     319   70.6%   9,464
D08N(0)  28.0%  33,939    6.6%   7,983   32.7%  39,652   52.1%   7,812   10.6%   1,592   56.3%   8,447
D09S(0)   2.8%   3,370   15.9%  19,296   18.5%  22,475    4.1%     609   25.8%   3,866   29.1%   4,372
D09N(1)  15.3%  18,595    1.2%   1,510   16.4%  19,932   23.5%   3,529    1.5%     224   24.5%   3,676

Coverage of Semiglobal Prefixes and Semiglobal ASes (RouteViews)
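The reason a simple sum overstates the "src/dst" coverage can be shown with a small sketch: the combined coverage is the size of the union of the two sets, and the overlap follows by inclusion-exclusion. The function names are hypothetical; the figures in the usage note are taken from the D04N(0) prefix columns of the coverage table.

```python
def combined_coverage(sources, destinations, total_objects):
    """Count and fraction of objects seen as a source, a destination, or both."""
    combined = set(sources) | set(destinations)  # shared objects count once
    return len(combined), len(combined) / total_objects


def overlap(source_count, destination_count, combined_count):
    """Objects common to both columns, by inclusion-exclusion."""
    return source_count + destination_count - combined_count
```

For D04N(0), the source and destination prefix counts sum to 28,207 + 11,140 = 39,347, while the corrected src/dst count is 36,910, so `overlap(28207, 11140, 36910)` gives 2,437 prefixes that appear in both columns.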

Traffic Excluded from Analysis

Some of the traffic observed on a link has been excluded from the analysis on these supplementary webpages. Specifically, traffic originating from or destined to the following categories of IP addresses has been excluded:

  1. addresses matching bogon prefixes
  2. addresses without a matching semiglobal prefix
  3. addresses without a matching atom

These categories of IP addresses are nested: category 1 is a subset of category 2, which in turn is a subset of category 3. (We correctly deal with accidental announcements of bogon prefixes.) Traffic attributable to category 1 addresses is not included in any statistics provided at the granularity of IP addresses. In particular, percentages are taken with respect to the total volume after excluding this traffic. Similarly, traffic attributable to category 2 is not included in any statistics on prefixes or ASes, and traffic attributable to category 3 is not included in any statistics on atoms. Note that, in theory, the adjusted total volume of traffic for IP-based analysis could be higher than for prefix/AS-based analysis, and similarly, the adjusted total for prefix/AS-based analysis could be higher than for atom-based analysis.
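The effect of the three exclusion categories on the adjusted totals can be sketched as follows. The record layout (one tuple per address, with three precomputed flags) and the function name are assumptions for illustration only; the sketch relies on the subset relations among the categories stated above.

```python
def adjusted_totals(records):
    """Return (ip_total, prefix_as_total, atom_total).

    records: iterable of (volume, is_bogon, has_semiglobal_prefix, has_atom)
    tuples, one per IP address. This layout is hypothetical.
    """
    ip_total = prefix_total = atom_total = 0
    for volume, is_bogon, has_semiglobal, has_atom in records:
        if is_bogon:
            continue  # category 1: excluded from every statistic
        ip_total += volume
        if not has_semiglobal:
            continue  # category 2: excluded from prefix/AS and atom statistics
        prefix_total += volume
        if has_atom:  # category 3 addresses are left out of atom statistics
            atom_total += volume
    return ip_total, prefix_total, atom_total
```

Because each category contains the previous one, the three totals can only stay equal or shrink from the IP level to the prefix/AS level to the atom level.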

For our purposes, we use the following minimal list of bogons, derived from RFC 3330, "Special-Use IPv4 Addresses" (note, we do not include the multicast block, 224.0.0.0/4, in this list):

prefix              description
0.0.0.0/8           hosts on "this" network (RFC3330)
10.0.0.0/8          private network (RFC1918)
127.0.0.0/8         loopback interface (RFC3330)
169.254.0.0/16      link local (RFC3330)
172.16.0.0/12       private network (RFC1918)
192.0.2.0/24        test net (for use as examples in documentation, RFC3330)
192.168.0.0/16      private network (RFC1918)
198.18.0.0/15       network device benchmarking (RFC3330)
255.255.255.255/32  "limited broadcast" (RFC3330)
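A membership test against this list could look like the sketch below, which uses Python's standard `ipaddress` module. The names `BOGON_PREFIXES` and `is_bogon` are hypothetical, and a linear scan is used only for brevity; it is not meant to represent the lookup method used for the paper.

```python
import ipaddress

# The minimal bogon list from the table above (multicast is not included).
BOGON_PREFIXES = [
    ipaddress.ip_network(prefix)
    for prefix in (
        "0.0.0.0/8",
        "10.0.0.0/8",
        "127.0.0.0/8",
        "169.254.0.0/16",
        "172.16.0.0/12",
        "192.0.2.0/24",
        "192.168.0.0/16",
        "198.18.0.0/15",
        "255.255.255.255/32",
    )
]


def is_bogon(address):
    """True if the IPv4 address falls inside any listed bogon prefix."""
    addr = ipaddress.ip_address(address)
    return any(addr in network for network in BOGON_PREFIXES)
```

For instance, `is_bogon("172.31.255.255")` is true because the address lies inside 172.16.0.0/12, while `is_bogon("224.0.0.1")` is false because the multicast block is deliberately left out of the list.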

An address may have a matching prefix but not a matching semiglobal prefix. This can happen, for example, if the matching prefix is longer than /24 and most peers filter out prefixes longer than /24.

An address may have a matching prefix but not a matching atom if the matching prefix is not announced by all RouteViews peers. This follows from the particular definition of atoms we used, which requires that the constituent prefixes of an atom be announced by all participating peers.

The following table summarizes the amount of excluded traffic attributable to two of the three categories of excluded IP addresses. For each category, the table lists the number of IP addresses (addr), packets (pkt), and bytes (byte) attributable to that category. (Note: As stated before, the "no semiglobal" category is a superset of the "bogon IP" category.)

         source                                  destination
         bogon IP            no semiglobal       bogon IP            no semiglobal
         addr  pkt   byte    addr  pkt   byte    addr  pkt   byte    addr  pkt   byte
D04N(0)  4.4k  2.1M  124M    5.2k  2.1M  129M    1141k18M            18561k33M
D04S(1)  1.3k  225k  83M     1.8k  326k  88M     2     3.0k  257k    234k  567k  80M
D05O(0)  0     0     0       11164.6k            725   198k  9.1M    319k  1.6M  217M
D05I(1)  4.3k  56k   5.7M    4.6k  402k  76M     26    5.7k  284k    42    261k  129M
D08N(0)  5.6k  2.7M  197M    7.0k  2.8M  205M    7     2.4k  190k    2.3k  7.1k  648k
D09S(0)  383   2.6M  1.3G    426   2.6M  1.3G    0     0     0       10125k20M
D09N(1)  1.2k  218k  183M    1.5k  252k  185M    0     0     0       5     58k   18M

Traffic Volume of Bogon IP Addresses and IP Addresses without Matching Semiglobal Prefixes

Density of ASes

The following plot shows the density of ASes by traffic volume for one dataset. The x-axis is binned logarithmically by base 2; that is, the nth bin consists of all x such that

2^n <= x < 2^(n+1).
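The base-2 binning can be sketched as follows. The function names are hypothetical, and the sketch assumes positive volumes; it computes the bin index exactly rather than through a floating-point logarithm.

```python
import math
from collections import Counter


def log2_bin(x):
    """Index n of the bin that satisfies 2**n <= x < 2**(n + 1), for x > 0."""
    if x <= 0:
        raise ValueError("volume must be positive")
    if isinstance(x, int):
        return x.bit_length() - 1  # exact for integers of any size
    # frexp returns (m, e) with x == m * 2**e and 0.5 <= m < 1,
    # so 2**(e - 1) <= x < 2**e.
    _, exponent = math.frexp(x)
    return exponent - 1


def density(volumes):
    """Number of objects (e.g., ASes) falling into each base-2 bin."""
    return Counter(log2_bin(v) for v in volumes if v > 0)
```

As a usage example, a volume of 1500 satisfies 2^10 <= 1500 < 2^11 and therefore lands in bin 10.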

Related Objects

See https://catalog.caida.org/paper/2004_diversity/ to explore objects related to this document in the CAIDA Resource Catalog.