Center for Applied Internet Data Analysis
Measuring the use of IPv4 space with Heatmaps
To visualize the use of IPv4 Internet address space, we create heatmaps that use intensity of color (heat) to show the use of addresses belonging to the same network. These heatmaps also make use of a fractal mapping technique based on a space-filling curve. This technique, most recently popularized by Randall Munroe's xkcd #195, John Heidemann's ping-based Censuses of IPv4 Space and Duane Wessels' maps of Routeviews BGP, open DNS resolvers, and RIR IPv4 whois data, keeps adjacent IP addresses close to one another in the map. By creating these maps from observable empirical data, we hope to learn how the current IPv4 address space is used. We use traffic data samples from two OC192 core backbone links in the U.S., and meta-data (only for ARIN's data), to examine what fraction and type of addresses are observably sending traffic on a busy backbone link at a busy weekday hour.
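As a rough illustration of why a space-filling curve keeps numerically adjacent networks adjacent on the map, the sketch below places a /24 network on a 4096 x 4096 Hilbert curve (order 12, assuming one pixel per /24). The function names are hypothetical, the index-to-coordinate routine is the standard iterative Hilbert construction rather than the code used to draw these maps, and neither the curve's orientation nor any reordering of /8 blocks is modeled.

```python
import ipaddress


def hilbert_d2xy(order, d):
    """Convert distance d along a Hilbert curve of the given order to (x, y).

    The curve covers a 2**order by 2**order grid; d runs from 0 to 4**order - 1.
    """
    x = y = 0
    t = d
    s = 1
    side = 1 << order
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so the sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def slash24_to_pixel(addr, order=12):
    """Map an IPv4 address to the pixel of its /24 (assumes one pixel per /24)."""
    index = int(ipaddress.IPv4Address(addr)) >> 8  # drop the host byte
    return hilbert_d2xy(order, index)


if __name__ == "__main__":
    for addr in ("10.0.0.1", "10.0.1.1", "10.0.2.1"):
        print(addr, slash24_to_pixel(addr))
```

Consecutive /24 indexes always land on neighboring pixels, which is the locality property that makes contiguous blocks appear as compact regions.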

We first extracted IP addresses seen in 1 hour of traffic in both directions of a tier-1 ISP backbone link between Chicago and Seattle (sample taken April 2008) and of a backbone link between Los Angeles and San Jose (sample taken July 2008). The number of unique /24 networks that we see on those links is roughly 11% of the total number of /24 networks in the whole IPv4 space.

Visualization method

These heatmaps depict levels of IPv4 traffic observed on two backbone links for each political category of IPv4 address: RIR-allocated, reserved, unallocated, legacy, and legacy_rsa (details in the Data section). Each map represents the entire IPv4 address space as a Hilbert curve [3], where each pixel's color represents the observed fraction of IPv4 addresses belonging to the same /24 network (black = no data, blue = 2 IPs observed, red = all IPs seen). If we see n hosts in a single /24 network, we calculate its corresponding pixel color k as: k = 255 * ln(n) / ln(256). That is, for each IPv4 address seen in traffic over a link during the measurement window, we logarithmically increase the "temperature" (color) of its containing /24 network. In contrast to previous heatmaps [heidemann, wessels], we have remapped /8 blocks so that different categories are contiguous, which emphasizes the sizes of the categories. Note that our data on which addresses are legacy comes from ICANN, and our data on which legacy addresses have signed RSAs comes only from ARIN. Data from other RIRs, as well as other traffic data showing legitimate address usage, would improve the accuracy of the results.
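The color formula can be sketched as follows. This is a minimal illustration, not the code used to render the maps: the function name is hypothetical, and rounding k to an integer and returning 0 for a /24 with no observed hosts are assumptions the text does not specify.

```python
import math


def pixel_color(n):
    """Return k = 255 * ln(n) / ln(256) for n observed hosts in one /24.

    Assumptions (not stated in the text): k is rounded to the nearest
    integer, and a /24 with no observed hosts gets k = 0.
    """
    if n <= 0:
        return 0
    n = min(n, 256)  # a /24 holds at most 256 addresses
    return round(255 * math.log(n) / math.log(256))


if __name__ == "__main__":
    for hosts in (1, 2, 64, 256):
        print(hosts, pixel_color(hosts))
```

Under this formula a /24 with a single observed host maps to k = 0 and a fully observed /24 (n = 256) maps to k = 255, so the logarithmic scale spends most of its range on sparsely used networks.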

Figures 1a and 1b show two separate heatmaps: one of the link between Chicago and Seattle (left), and one of the link between Los Angeles and San Jose (right). In Figure 2a the heatmap represents IP addresses seen on either backbone link during its respective monitored hour (the two hours were not simultaneous). The maps indicate that most traffic activity for this backbone traffic sample is concentrated in the allocated IPv4 space. Most of the traffic in our data set came from and was directed to this type of address. The legacy space (35% of the total address space) did not appear to generate or receive much Internet traffic during the monitoring interval. The measurements showed occasional traffic in reserved and unallocated address space, which appeared to be primarily scanning or multicast activity in our samples.

Since scanning activities mislead our methodology by "heating" networks as they scan them, whether those IPs respond or not, we next tried to remove patterns that look like scans and attacks and plot heatmaps with the remaining "legitimate" traffic. Specifically, we first removed all non-TCP traffic, which leaves 94.57% of the bytes (UDP had 4.70%, ICMP 0.07%) and 88.15% of the packets (UDP had 10.13%, ICMP 0.40%) on the Chicago-Seattle link, and 91.78% of the bytes (UDP had 7.26%, ICMP 0.07%) and 84.96% of the packets (UDP had 13.94%, ICMP 0.35%) on the Los Angeles-San Jose link. We then removed all TCP packets with no payload, to filter out SYN floods, which left 94.09% of the total bytes and 83.21% of the total packets in the Chicago-Seattle link trace, and 91.01% of the total bytes and 78.13% of the total packets in the Los Angeles-San Jose link trace.
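The two filtering steps can be sketched as below, assuming packets have already been parsed into simple records. The `Packet` type, its field names, and the helper functions are hypothetical and stand in for whatever trace-processing tools were actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical parsed packet record (not a real capture-library type)."""
    src: str
    dst: str
    protocol: str       # e.g. "TCP", "UDP", "ICMP"
    total_bytes: int    # size of the whole packet
    payload_bytes: int  # size of the transport payload


def tcp_only(packets):
    """Step 1: drop all non-TCP traffic."""
    return [p for p in packets if p.protocol == "TCP"]


def tcp_with_payload(packets):
    """Step 2: also drop TCP packets with no payload (e.g. bare SYNs)."""
    return [p for p in tcp_only(packets) if p.payload_bytes > 0]


def remaining_share(kept, all_packets):
    """Return (fraction of packets kept, fraction of bytes kept)."""
    total_bytes = sum(p.total_bytes for p in all_packets)
    kept_bytes = sum(p.total_bytes for p in kept)
    return len(kept) / len(all_packets), kept_bytes / total_bytes


if __name__ == "__main__":
    trace = [
        Packet("192.0.2.1", "198.51.100.7", "TCP", 1500, 1460),
        Packet("192.0.2.1", "198.51.100.8", "TCP", 40, 0),
        Packet("192.0.2.9", "198.51.100.7", "UDP", 200, 172),
        Packet("192.0.2.5", "198.51.100.9", "ICMP", 60, 32),
    ]
    print(remaining_share(tcp_only(trace), trace))
    print(remaining_share(tcp_with_payload(trace), trace))
```

Small control packets dominate what each step removes, which is consistent with the packet share dropping much faster than the byte share in the figures above.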

Figures 1a and 1b: IPv4 traffic heatmaps of the Chicago-Seattle link (left) and of the Los Angeles-San Jose link (right) in 2008. The labels a_{arin,afrinic,apnic,lacnic,ripe} and l_{arin,afrinic,apnic,lacnic,ripe} stand for space allocated by and legacy space administered by the corresponding RIR. l_other is the legacy space not administered by an RIR. leg_rsa is the legacy space covered by either a standard RSA or a specific legacy RSA.

Filtered Heatmaps

In Figures 2a, 2b and 2c we visualize how the address space in use changes as we filter out categories of likely non-legitimate traffic. The figures show IP addresses seen on both links for all IP traffic (Figure 2a, left), for only TCP traffic (Figure 2b, center), and for only TCP packets that carry data (Figure 2c, right). Figure 3 animates these images (press "start") to depict how filtering out pollution changes the sizes of each category.

The map gathered by considering all IP traffic clearly shows four /8 blocks (one RIR-allocated, three legacy) substantially in use until we filter out these types of likely non-legitimate traffic. The three blocks in the legacy space see unusually high ICMP (scan) traffic, and filtering out non-TCP traffic turns them completely off. Further filtering out traffic not carrying TCP data packets turns off the fourth block, consistent with scanning behavior across a significant contiguous portion of IPv4 space. We also filter out such traffic 'noise' within blocks, though it is harder to see the difference in the heatmap for blocks with considerable legitimate traffic levels. Interestingly, the fully utilized ("hot") /24s marked in red in Figure 2 turn out to send little actual TCP data traffic -- they go from red to blue or green when filtering out all non-TCP-data traffic -- suggesting they were hot because they were the target of a DoS attack during the trace interval.

Figures 2a, 2b, and 2c: Overall heatmap of both the Chicago-Seattle link and the Los Angeles-San Jose link. From left to right: IP addresses for all IP traffic, for only TCP traffic, and for only TCP data traffic. The labels a_{arin,afrinic,apnic,lacnic,ripe} and l_{arin,afrinic,apnic,lacnic,ripe} stand for space allocated by and legacy space administered by the corresponding RIR. l_other is the legacy space not administered by an RIR. leg_rsa is the legacy space covered by either a standard RSA or a specific legacy RSA.

 Figure 3: Overall heatmap of both Chicago-Seattle and Los Angeles-San Jose links (slideshow cycles through three views).

Caveats

We emphasize that the absence of traffic activity on this map may not mean much: one trace on one link can hardly be argued as representative of all Internet traffic, and existence of traffic on a few links is not the most satisfying definition of "address in use" anyway. But for macroscopic questions like "Is legacy address space being used on the Internet approximately the same way that RIR-allocated address space is being used?" which can inform policy discussions, we find insight even in this small a data sample.

Data

The Internet Assigned Numbers Authority (IANA) [1] is responsible for the allocation of IP addresses to Regional Internet Registries (RIRs), which then allocate them to organizations that need them: either Local or National Internet Registries (LIRs and NIRs) for further suballocation, or directly to Internet Service Providers and end sites participating in the global BGP routing system. The five global RIRs are: ARIN for North America and parts of the Caribbean, RIPE NCC for Europe, the Middle East and Central Asia, APNIC for Asia and the Pacific region, LACNIC for Latin America and parts of the Caribbean region, and AFRINIC for Africa. IANA maintains a table of allocations of IPv4 8-bit prefixes [2] using four categories:

• allocated:
Allocated by the RIRs (under a standard RSA).
• reserved:
Reserved address space (RFC1918, multicast, etc.).
• unallocated:
The free pool of IPv4 addresses.
• legacy:
IPv4 address space given out before RIRs, and not under an RSA.

The Legacy space can be split into three subsets:

• Legacy RSA: (5.4%)
Address blocks that are covered by either a standard RSA or a specific legacy RSA. These are labeled leg_rsa. We only have data from ARIN for this class of legacy address space.
• Legacy space not under RSA, but administered by an RIR: (54.3%)
These blocks are labeled l_afrinic, l_apnic, l_arin, l_lacnic, l_ripe.
• Legacy space not under RSA and not administered by an RIR: (40.3%)
These blocks are labeled l_other.

To the best of our knowledge, "administered by" does not mean that an address block is under an RSA with an RIR.

We first extracted IP addresses seen in 1 hour of traffic in both directions of a tier-1 ISP backbone link between Chicago and Seattle (sample taken April 2008) and of a backbone link between Los Angeles and San Jose (sample taken July 2008). On the Chicago-Seattle link we observe 923,896 /24 networks with at least 1 IP address in use; on the Los Angeles-San Jose link we observe 1,137,559 /24 networks. The number of unique /24 networks that we see on those links is roughly 11% of the total number of /24 networks in the whole IPv4 space.
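The counting behind these figures can be sketched as follows: reduce each observed address to its /24 and compare the number of distinct /24s with the 2^24 possible ones. The helper names and sample addresses are hypothetical. Only the two per-link counts come from the text above.

```python
import ipaddress

TOTAL_SLASH24 = 2 ** 24  # number of /24 networks in the whole IPv4 space


def slash24_of(addr):
    """Return the 24-bit index of the /24 network containing addr."""
    return int(ipaddress.IPv4Address(addr)) >> 8


def unique_slash24s(addresses):
    """Collect the distinct /24 networks seen in an iterable of addresses."""
    return {slash24_of(a) for a in addresses}


def share_of_ipv4(count):
    """Fraction of all /24 networks represented by count distinct /24s."""
    return count / TOTAL_SLASH24


if __name__ == "__main__":
    # Illustrative addresses only; a real trace would supply millions.
    link_a = unique_slash24s(["192.0.2.1", "192.0.2.200", "198.51.100.4"])
    link_b = unique_slash24s(["198.51.100.9", "203.0.113.5"])
    print(len(link_a | link_b))  # /24s seen on either link

    # Per-link counts reported in the text.
    print(f"{share_of_ipv4(923_896):.2%}")
    print(f"{share_of_ipv4(1_137_559):.2%}")
```

Taken alone, the two per-link counts correspond to roughly 5.5% and 6.8% of all /24s. The combined figure counts each /24 once, so networks seen on both links are not double counted.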

References

• [1] http://www.iana.org/