Table of Contents
The motivation for this work is the following broad question:
What drives the demand for IPv4 address space?
The many facets to this complex question make a full analysis
infeasible, so the analysis presented here focuses on the following
much more narrowly-defined question:
Do new players (organizations that have received only 1 allocation)
or old players (organizations that have received multiple allocations
over time) drive the growing demand and consumption of IPv4 address space?
We want to understand how the address space is distributed among
and concentrated in two categories of organizations, the new players and
the old players. Naively speaking, if new players hold most
of the address space, then they drive the demand for new space and thus
an expansion of the Internet itself through increases in the number of
participant networks and thus diversity of connectivity around the world.
Conversely, if old players hold most of the address space,
then they drive demand for new space and thus expansion of existing
networks through increases in host density of existing networks,
rather than by increases in the number of independent networks.
The underlying causes of demand for IPv4 space are more complex than
these naive interpretations suggest; here we take only first steps
toward analyzing this issue.
In summary, we find a definite trend in address allocations toward
increasing concentration of address space in old players. As of August
2005, old players make up only between 2% and 11.2% of all organizations,
but they hold more than half of the address space. We also find that most
organizations have relatively few allocations. Depending on the
method of counting, there are between 188k and 1.07M organizations in
our dataset. Of these, only 28 organizations have 32 or more
allocations, and fewer than 500 organizations have 8 or more
allocations. So the vast bulk of organizations have fewer than 8
allocations. In answer to our narrower question, we can say that old
players make use of most of the address space, and we naively
interpret this to mean that expansion of existing networks, rather
than emergence of new networks, drives demand for IPv4 space.
The remainder of this page describes our methodology, presents the
details of our analysis, and provides links to our analysis scripts and
data.
We first delineate three aspects of IPv4 address space:
allocated space, routed space, and occupied space.
Allocated space is the address space given by a Regional Internet
Registry (RIR) to an organization for use. Routed space is
allocated space that actually appears, via BGP announcement, as a route
in global routing tables. Occupied space is allocated space that has
actually been assigned to end hosts, routers, or other network
devices. Information on allocated space appears in whois
records, while BGP tables contain information on routed space.
Information on occupied space is not readily available, though it can
be partly inferred through active network measurements.
Which of these aspects of address space to study depends on the
goal of the analysis. We first seek to understand the group dynamics
and other macroscopic properties of address space allocation
to support informed policy making. Specifically, we are interested
in studying the way IPv4 space has been distributed to, and concentrated in,
organizations over time, and thus we only analyze allocated
address space.
This analysis makes use of the allocation event as the measurement
unit, which we define as all of the address blocks (1 block in the
common case) allocated to a given organization on the same
date. For example, if some organization X received 3
non-overlapping address blocks on Jan 1st and 2 non-overlapping address
blocks on Feb 15th, then we say X has 2 allocations rather than
5 allocations. We formulate our definition in this way because of our
interest in allocation behavior, since the allocation of
multiple address blocks on the same date likely represents a
single allocation event. We then replay the sequence of
address allocations made over time, as recorded in whois
records, to reconstruct the distribution of address space to
organizations at each time step. To classify organizations as new
players and old players, we keep track of the number of
allocations each organization has received up to a given point in time.
We base our analysis on address allocation data from a full whois
dump from ARIN on Aug 31, 2005. (The concise ARIN allocation statistics
files publicly available for download do not include information on
which organization has received each block of address space and thus are
unsuitable for our analysis.) To identify organizations, we rely on the
value of the OrgID field, which is a key in the ARIN database that
links together address blocks and organizations. No other RIR provides
whois dumps with such a well-defined mandatory link between
address blocks and organizations, and this is the main reason why we have
confined our analysis to ARIN data only. We leave the analysis of data from the
other RIRs to future work.
ARIN whois dumps are a rich source of allocation data and include
some of the earliest allocations ever made. However, to provide a
more insightful analysis, and one perhaps more representative of the
recent trends across all RIRs, we exclude the following from our analysis:
-
/8
allocations made directly by IANA to end sites prior to
the establishment of the RIRs (e.g., the 18.0.0.0/8 allocation to MIT),
- allocations managed by the US Department of Defense NIC (DoDNIC), and
- allocations managed by JPNIC.
The resulting pruned dataset (the
first dataset) has 1.07M
OrgID values and on the order of 1M overlapping allocated blocks
covering the equivalent of 42.6 /8's (which equal to a prefix length of
/2.59).
The ARIN whois dumps contain a large number of fine-grained
suballocation records; for example, there are DSL providers who have
entered a whois record for each of their customers--a mix of home
users and small organizations. These special customer records have
an easily identifiable OrgID value in the ARIN records; namely,
they consist of the letter "C" followed by 8 decimal digits, e.g.,
C00797141. The first dataset contains these customer records, but
including them in our analysis has some potential drawbacks because not
every customer record represents an organization and because some
organizations have multiple customer OrgID values. Therefore, to
balance our analysis, we also examine a second dataset, obtained by
pruning out all
883k customer records from the first dataset. The second dataset has 188k
OrgID values and on the order of 250k overlapping allocated blocks
covering the same amount of address space as the first dataset (there is
actually a slight discrepancy of 0.026% or the equivalent of a /14.5). The
excluded customer records cover about 3.3 /8's worth of space (that
overlaps with the retained blocks in the second dataset).
We present our analysis results mainly in the form of plots. These plots
show address allocations as a time series broken down by the category of
the receiving organization. The 7 categories (1, 2,
3, 4-7, 8-15, 16-31, and 32+) are
defined by the number of allocations received by an organization over time.
For instance, the 1 category represents organizations that have only
received 1 allocation up to a given point in time, while the
32+ category represents organizations that have received 32 or more
allocations up to a given point in time. These 7 categories are a
refinement of the informal categories defined in the introduction, with the
1 category corresponding to the new players and the remaining
categories corresponding to the old players.
We create the plots as follows:
for a date t, we examine all prefixes that are allocated on or
before t [the set P(t)], and based on the number of prefixes held by
each organization at the time t, determine the exact fraction
of the total address space [covered by P(t)] held by each category of
organization. This breakdown appears on a plot as a vertical slice at
x=t. We then repeat this process for every date t on the
x-axis. Note that the breakdown computed at x=t is never
influenced in any way by prefixes allocated after t. See the Methodology Details appendix for full
details on how the address space covered by overlapping prefixes is
attributed to the organizations holding those prefixes.
Figure 1. Breakdown by count of organizations in each category for
both datasets. The biggest difference is in the count of the
1 (new players) category, which ends at 1.05M for dataset 1
(black line) and at 167k for dataset 2 (grey line). The
differences in the other categories are minor and are not shown.
(This is not a stacked plot, unlike all remaining plots.)
Figure 1 shows the breakdown of allocations by the
count of organizations in each category for both datasets, and illustrates
the main difference between the two datasets. Dataset 1 includes customer
records while dataset 2 does not, and the impact of these records is
to significantly increase the number of organizations in the 1 (new
players) category for dataset 1 (black line), which reaches 1.05M at the
end. In contrast, the corresponding count is only 167k for dataset 2 (grey
line). There are around 35 times more new players than old players when
customer records are included (dataset 1) and around 6 times more new
players when customer records are excluded (dataset 2). Thus, even without
the special customer records, new players dominate in count.
Only 28 organizations have 32 or more allocations (the 32+
category), and fewer than 500 organizations have 8 or more allocations (the
8-15, 16-31, and 32+ categories). These counts are
small even when compared to the lesser of the new player counts, 167k for
dataset 2, which is at least 3 orders of magnitude greater than 28 and at
least 2 orders of magnitude greater than 500. Hence, organizations do not
commonly have many allocations (which does not necessarily imply
anything about the amount of address space held by organizations, since
individual allocations can vary greatly in size, e.g., from a /9 to a /29).
There are also slight differences in the counts of the other categories
between the two datasets, but they make little qualitative difference.
Figure 2. Breakdown by normalized amount of space in each category
for dataset 1 (which includes customer records). The indisputable
trend is toward increasing concentration of the address space in
old players -- as of August 2005, 56.4% of the address space is
held by old players. (Stacked plot.)
Figure 2 is a stacked plot for the first dataset
showing (on the right y-axis) the breakdown of allocations by the amount of
space held by each category of organization, plotted as a fraction of
total allocated space at a given point in time. The figure also
shows (on the left y-axis) the total amount of allocated space over
time. Fewer than the equivalent of 5 /8's were allocated prior to
1990, so we consider the breakdown only for the period after that
year. The trend in the breakdown is toward an increasing
concentration of the address space in old players -- by August 2005,
old players hold 56.4% of the address space.
Figure 3. Breakdown by normalized count of
organizations in each category for dataset 1 (which includes
customer records). The trend shows the old players constitute
a decreasing fraction of the total -- by August 2005, old
players represent less than 2%. (Stacked plot.)
Figure 3 is complementary to Figure 2 and presents a stacked plot for the first dataset
showing (on the right y-axis) the breakdown of allocations by the count of
organizations in each category. The graph shows a count of organizations as a
fraction of the total number of organizations at a given point in time.
The figure also shows (on the left y-axis) the total number of
organizations over time. The tremendous apparent growth in the number of
organizations after 2000 is largely due to the increasing number of
customer records. The trend in the breakdown is toward the old players
making up a decreasing fraction of the total, and by August 2005, old
players make up fewer than 2%. More impressively, fewer than 2% of all
organizations hold 56.4% of the address space.
However, the large disparity suggested by this figure may
be an overestimation caused by the possible incorrect inflation of the new
player count by the 883k customer records in the dataset. Therefore, we
cautiously interpret the figure as suggesting an upper bound on the
disparity. See Figure 5 for a lower bound.
Figure 4. Breakdown by normalized amount of space in each category
for dataset 2 (which does not include customer records). Even more
so than for dataset 1, the trend shows an increasing concentration
of the address space in old players, and by August 2005, old
players make up 63.4% of the address space. For comparison, the
white lines show the corresponding counts for dataset 1. (Stacked
plot.)
Figure 4 shows the same type of information as Figure 2 but for dataset 2. For comparison, the figure
also includes white lines showing the corresponding counts for dataset 1.
The trend shows an increasing concentration of the address space in old
players, even more so than for dataset 1, and by August 2005 , old players
hold 63.4% of the address space. As one would expect, the even higher
concentration in dataset 2 is caused by a shift to the old players of much
of the 3.3 /8's covered by customer records. Most of this shift is to the
4-7, 8-15, 16-31, and 32+ categories.
Because of the exclusion of customer records, the degree of address space
concentration suggested by this plot is most likely an overestimation and
therefore should be considered an upper bound. Similarly, the degree of
concentration suggested by Figure 2 should be
considered a lower bound.

Figure 5. Breakdown by normalized count of organizations in
each category for dataset 2 (which does not include customer
records). Despite more than a 100-fold increase in the total
number of organizations, the ratio of new players to old
players remains relatively steady from 1990 onwards. For
comparison, the white line shows the normalized count of the
1 category for dataset 1. (Stacked plot.)
Figure 5 shows the same type of information as Figure 3 but for dataset 2. For comparison, the figure
also includes a white line showing the normalized count of the 1
category for dataset 1. Despite more than a 100-fold increase in the total
number of organizations, the ratio of new players to old players remains
relatively steady from 1990 onwards, a marked contrast to the
increasing dominance of new players for dataset 1. Although the growth
rates are similar, new players still dominate over old players in absolute
number, with old players making up only 11.2% of all organizations in Aug
2005. In other words, by August 2005, 11.2% of all organizations hold
63.4% of the address space.
Because of the exclusion of customer records, the degree of disparity
suggested by this plot is most likely an underestimation and therefore
should be considered a lower bound. Similarly, the degree of disparity
suggested by Figure 3 should be considered an upper
bound.
Our analysis shows a trend in address allocations toward an
increasing concentration of the address space in old players. For
dataset 1, which considers customers as organizationally independent
of their provider, fewer than 2% of organizations hold 56% of the
address space, while for dataset 2, which considers customers as part
of their provider's organization (consistent with the difficulty of
changing providers without changing addresses), around 11% of
organizations hold 63% of the space.
We also find that most organizations have relatively few allocations.
In particular, fewer than 500 organizations have 8 or more
allocations, and the number of organizations with fewer than 8
allocations is at least 2 orders of magnitude greater than 500
(dataset 1 has 187k total organizations, and dataset 2 has an upper
bound of 1.07M total organizations). These 500 organizations hold
an equivalent of between 10 and 12 /8's, or around a quarter of the
total ARIN space covered by our analysis.
In answer to our narrower question, we can say that old players control
most of the address space. Under the naive
interpretation, this suggests that expansion of existing networks
drives the demand for IPv4 space rather than the creation of new
networks. A more thorough interpretation is left as future work.
This appendix describes in detail the procedure for attributing the
address space covered by overlapping prefixes to the organizations
holding those prefixes. Two prefixes can overlap in only one way;
namely, with one prefix wholly enclosed within the other. (Partial
overlap between two prefixes is illegal and does not occur in our
dataset. When two prefixes are equal, we exclude the prefix with the
older allocation date from our analysis.) We employ the three rules
described in the remainder of this section to handle overlapping
prefixes.
In the following descriptions, we represent a prefix with the
notation (1-----), where parentheses and hypens indicate the
span of the prefix and where the enclosed number is an arbitrary ID
that also indicates the allocation order. Time always flows from top
to bottom--that is, prefixes on the same line represent allocations
made at the same time, while prefixes on two separate lines represent
allocations made one after another, with the prefix higher up on the
page being allocated earlier than the prefix lower down on the page.
-
Rule 1 (Null). We ignore a suballocation by the same organization.
(1-------------------------) OrgID: X
(2----------) OrgID: X
If an organization holding a prefix makes a suballocation of that
prefix to itself, then we ignore that suballocation
in our analysis; that is, all the space covered by the suballocation
is counted only in the larger enclosing prefix. For example, suppose
an organization has a /16 assigned to it, from which it makes a /24
suballocation also registered to itself. Then that /24
suballocation is ignored for the purposes of counting. This rule
doesn't apply to suballocations assigned to a different
organization (e.g., an ISP may have a /16 from which it has assigned
/24's to customers).
In the following diagram, we show the application of Rule 1 to the
suballocation of prefix 2 from 1 shown in the above diagram. The line
of dots (....) indicates the span of a prefix that is ignored.
(1-------------------------)
(2..........)
-
Rule 2 (Suballocation). When a suballocation is to a different organization, the space covered by the suballocation is not counted in the parent prefix.
(1-------------------------) OrgID: X
(2----------) OrgID: Y
If an organization holding a prefix makes a suballocation of that
prefix to another organization, then the space covered by the
suballocation is counted in the suballocation but not in the original
parent prefix. That is, for the purposes of counting, we effectively
transfer the address space covered by a suballocation from the parent
prefix to the suballocation. In the above example, organization
X has made a suballocation of prefix 1 to organization
Y, leading to the transfer of the overlapping address space
from prefix 1 to prefix 2, as the following diagram illustrates. We
indicate this transfer of address space by cutting out the
suballocation from the parent prefix. (We use the vertical bar
character "|" simply to visually reinforce the boundary of the
suballocation.)
(1..........|--------------) OrgID: X
(2----------) OrgID: Y
-
Rule 3 (Subsumption). When a prefix subsumes a smaller prefix allocated earlier,
the overlapping address space is not counted in the larger prefix.
(1----------) OrgID: X
(2-------------------------) OrgID: Y
Normally, a smaller prefix is suballocated from a larger prefix, but
sometimes the process is reversed, with a larger prefix being
allocated later that subsumes an existing smaller prefix. For the
purposes of counting, when subsumption occurs, the address space of
the overlapping region in the larger prefix is not counted with the
larger prefix but attributed to the smaller prefix. Effectively, a
hole, equal to the smaller prefix, is punched out of the larger
prefix, as the following diagram shows:
(1----------) OrgID: X
(2..........|--------------) OrgID: Y
The following series of diagrams illustrates the application of the
above three rules. The 5 rows represent 5 consecutive time steps,
with a single prefix being introduced at each successive step.
Time Step | Rule | State |
| 0 | n/a |
(1----) (2----------------)
|
| 1 | suballocation |
(1----) (2-----|..........)
(3---------)
|
| 2 | subsumption |
(1----) (2-----|..........)
| | | (3---------)
(4-----|.....|-------|.................|-----)
|
| 3 | suballocation |
(1----) (2-----|..........)
| | | (3....|----)
(4-----|.....|-------|......|.....|....|-----)
(5----)
|
| 4 | suballocation & subsumption |
(1----) (2-----|..........)
| | | (3....|----)
(4-----|.....|---|...|......|.....|....|.....)
| | (5----) | |
(6--|.................|-----)
|
The full set of scripts needed to perform the analysis described on
this page is available at ipv4-concentration-analysis-1.0.tar.gz
You must additionally request and download a bulk whois dump from
ARIN. To exactly reproduce our work, you must have the bulk whois
dump from 2005-08-31, but because whois dumps are not archived, older
snapshots like this are unlikely to be available from ARIN.
The Xmgrace project files, which contain data as well as plot
settings, for the 5 plots appearing on this page are also available
for download. The plot thumbnails below link to the corresponding
project files. To download a project file, right-click a thumbnail
and choose Save Link As... (or the equivalent for your browser).
Figure 1. Breakdown by count of organizations in each category for both datasets.
|
|
Figure 2. Breakdown by normalized amount of space in each category for dataset 1 (which includes customer records).
|
|
Figure 3. Breakdown by normalized count of organizations in each category for dataset 1 (which includes customer records).
|
|
|
Figure 4. Breakdown by normalized amount of space in each category for dataset 2 (which does not include customer records).
|
|
Figure 5. Breakdown by normalized count of organizations in each category for dataset 2 (which does not include customer records).
|
|