skitter Destination Lists

Details on the creation and composition of the set of addresses ("destination lists") probed by skitter.
Introduction
CAIDA Internet topology monitors continuously probe hosts
("destinations") in their target lists ("destination lists"). The primary
criterion in selecting destinations for monitoring is their
responsiveness. A destination responds to a probe if it returns an
ICMP ECHO_REPLY in answer to an ICMP ECHO_REQUEST. Responding
destinations provide RTT and non-truncated path information.
Destinations with a low response rate, or those not replying at all,
not only contribute little useful information, but also considerably
slow down the probing process.
The pool of replying destinations is constantly decreasing due to
the proliferation of firewalls, changing IP addresses, and other
reasons. Therefore, we need to refresh our lists about every 8 to 12
months for continuing projects.
Current Destination List
-
IPv4 list
-
Description:
A set of 971k IP addresses representing a wide cross section of
the routed IPv4 space and consisting of end hosts, web servers,
routers, etc. These addresses are known to respond to ICMP
ECHO_REQUEST probing by tools such as skitter. This set of
addresses covers 79,660 (62.6% of 127,209 total) semiglobal
prefixes and 917,529 /24's.
(A semiglobal prefix is a prefix announced by at least 22 of the
41 RouteViews peers [one per AS] active on Dec 9, 2003. There
were 127,209 semiglobal prefixes during the making of this list.)
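As a rough illustration of the semiglobal definition above, the following Python sketch counts how many peers announce each prefix and keeps those that meet a threshold. The function name, the `peer_tables` mapping, and the toy data are assumptions made for illustration only; they do not describe how the RouteViews data was actually processed.

```python
from collections import Counter

def semiglobal_prefixes(peer_tables, min_peers):
    """Return the prefixes announced by at least min_peers distinct peers.

    peer_tables: hypothetical mapping of peer name -> iterable of prefixes
    announced by that peer (one peer per AS, as in the text).
    """
    counts = Counter()
    for prefixes in peer_tables.values():
        counts.update(set(prefixes))  # count each peer at most once per prefix
    return {prefix for prefix, n in counts.items() if n >= min_peers}

# Toy data: 3 peers with a threshold of 2 (the list above used 22 of 41).
tables = {
    "peer_a": ["10.0.0.0/8", "192.0.2.0/24"],
    "peer_b": ["10.0.0.0/8", "198.51.100.0/24"],
    "peer_c": ["10.0.0.0/8", "192.0.2.0/24"],
}
semiglobal = semiglobal_prefixes(tables, min_peers=2)
```

With the toy data, `198.51.100.0/24` is dropped because only one peer announces it.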
-
Goal of the list:
The goal of this destination list is to provide representative
coverage of the routed IPv4 space for topology measurements.
Ideally, this list should have an IP address in each populated /24
subdivision of the globally routed space, but this ideal is not
achievable because of (1) considerations of list size, (2) ICMP
blocking by firewalls, and (3) prefixes that have been requested to
be excluded from probing. Nevertheless, we have tried as best we can
to maximize the coverage of prefixes by including at least one destination in
every responding /24 prefix subdivision and by including at least two
destinations in every semiglobal prefix.
By including a destination in every responding /24 subdivision, we
ensure that large semiglobal prefixes are represented in proportion
to their size. This approach is particularly important for large
prefixes (such as /8's) that contain many customer networks but have
few, if any, announced prefixes for these subnetworks.
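The two coverage measures used throughout this page (the number of /24 subdivisions and the number of semiglobal prefixes that contain at least one destination) can be sketched with Python's standard `ipaddress` module. The function names and sample addresses below are illustrative assumptions, and the linear scan over prefixes is chosen for clarity, not for lists with millions of entries.

```python
import ipaddress

def covered_slash24s(addresses):
    """Return the /24 networks that contain at least one of the addresses."""
    return {str(ipaddress.ip_network(f"{a}/24", strict=False)) for a in addresses}

def covered_prefixes(addresses, prefixes):
    """Return the prefixes that contain at least one of the addresses."""
    addrs = [ipaddress.ip_address(a) for a in addresses]
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return {str(net) for net in nets if any(a in net for a in addrs)}

# Illustrative data only (documentation address ranges).
destinations = ["192.0.2.1", "192.0.2.200", "198.51.100.7"]
prefixes = ["192.0.2.0/24", "203.0.113.0/24"]

slash24s = covered_slash24s(destinations)
covered = covered_prefixes(destinations, prefixes)
coverage_pct = 100.0 * len(covered) / len(prefixes)
```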
-
Creating the list:
The process includes the following steps.
-
Collecting initial set of IP addresses.
We gather as many actively used IP addresses as we can from
a number of sources:
-
11.3M addresses in 11 OC48 packet traces taken in two Tier
1 ISP networks (7 traces in 2003; rest in 2001 and 2002;
all source addresses, and only destinations that received
more than 3 packets on average per flow),
-
6.5M hosts sending ICMP echo reply packets on an OC48 ISP
link, as collected by NeTraMet over a 20-hour period,
-
5.9M source addresses (both directions) observed on an OC48
ISP link, as collected by NeTraMet over an 18-hour period,
-
16.3M hosts that sent RFC1918 updates to f-root and m-root
in 2003 (about 1 year of data collection at f-root, and about
11 months at m-root),
-
834k hosts that accessed CAIDA's website in a period of
3 2/3 years,
-
1.7M hosts that accessed NetGeo in a period of 4 years, and
-
427k destinations in the previous ipv4.20030225.865k
skitter list that respond >= 90% of the time.
The above sources yield 33.6M total unique addresses covering
101k (79.5% of 127k) semiglobal prefixes (the coverage, when
excluding addresses carried over from the previous list,
is 91.6k [72.0%] semiglobal prefixes).
-
Filtering out addresses that do not want to be
probed.
Over the years, we have accumulated a list of addresses and
prefixes whose operators complained about receiving skitter packets.
These addresses are carefully excluded from any future probing.
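A minimal sketch of this exclusion step, assuming the do-not-probe list is a mix of single addresses and CIDR prefixes given as strings. The function name and the list format are hypothetical; the actual format of CAIDA's exclusion list is not described here.

```python
import ipaddress

def filter_excluded(candidates, excluded):
    """Drop every candidate address that falls inside an excluded entry.

    excluded: hypothetical list of addresses ("203.0.113.5") and
    prefixes ("198.51.100.0/24"); a bare address is treated as a /32.
    """
    excluded_nets = [ipaddress.ip_network(e, strict=False) for e in excluded]
    kept = []
    for candidate in candidates:
        addr = ipaddress.ip_address(candidate)
        if not any(addr in net for net in excluded_nets):
            kept.append(candidate)
    return kept

candidates = ["192.0.2.1", "203.0.113.5", "198.51.100.9"]
do_not_probe = ["203.0.113.5", "198.51.100.0/24"]
allowed = filter_excluded(candidates, do_not_probe)
```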
-
Finding responding destinations.
We run skdesttest to determine which candidate destinations
respond to ICMP probing. (skdesttest is a CAIDA
tool that can send ICMP Echo Requests to a large number of
hosts in a net-friendly manner.) We also calculate response
rate by making multiple passes (or "cycles") through the set of
candidate destinations, with the probes to any given
destination being separated by many hours.
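The response-rate calculation can be sketched as the fraction of cycles in which a destination replied. The data layout below (one set of responders per cycle) is an assumption for illustration and is not the output format of skdesttest.

```python
def response_rates(candidates, cycles):
    """Return address -> fraction of cycles in which the address replied.

    cycles: hypothetical list with one set of responding addresses per
    probing cycle; every candidate is assumed to be probed in every cycle.
    """
    total = len(cycles)
    if total == 0:
        return {addr: 0.0 for addr in candidates}
    return {addr: sum(addr in cycle for cycle in cycles) / total
            for addr in candidates}

candidates = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
cycles = [
    {"192.0.2.1", "192.0.2.2"},
    {"192.0.2.1"},
    {"192.0.2.1", "192.0.2.2"},
    {"192.0.2.1"},
]
rates = response_rates(candidates, cycles)
```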
All in all, in the process of creating the new list, we execute
around 6 dozen skdesttest cycles (of several sets of candidate
destinations) in a 5 week period starting in Dec 2003.
In addition to probing the 33.6M addresses gathered by various
means, we also systematically probe addresses in routed prefixes
in order to find responding destinations in a greater number
of prefixes (since address space coverage is our goal). In
particular, we decompose prefixes that either have no
responding addresses or have fewer than 3 highly-responsive
addresses into /27's or /28's and probe the .1 address in
each /27 or /28.
The following paragraphs describe the probing procedure and
the various probing sets in great detail.
We first run skdesttest for just 1 cycle on the initial
collection of 33.6M addresses, excluding addresses from the
previous list, and find 8.2M responding destinations
covering 50k semiglobal prefixes. We then probe these
responding destinations over 23 cycles to calculate response
rate.
We next generate a second set of addresses by systematically
picking addresses in prefixes that either have no responding
addresses or have fewer than 3 highly-responsive addresses
(highly-responsive addresses respond 16 or more times in 18
cycles of probing). We decompose such underrepresented
prefixes to /28's and pick the .1 address in each /28.
This second set has 43.1M addresses with the following
composition:
no responses: 31.4M /28's in 35.8k prefixes
low response: 6.9M /28's in 19.3k prefixes
unprobed semiglobals: 7.6M /28's in 35.6k prefixes
Running skdesttest for 4 cycles on this second set produces
723k responding addresses in 37k prefixes.
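The decomposition of an underrepresented prefix into /28's can be sketched as follows. This sketch reads "the .1 address" as the first address after each /28's network address, which is an interpretation and not something the text confirms; the function name is also hypothetical.

```python
import ipaddress

def probe_targets(prefix, new_prefixlen=28):
    """Split a prefix into subnets and return one probe address per subnet.

    Assumption: the probe address is the subnet's network address plus one.
    """
    net = ipaddress.ip_network(prefix)
    if net.prefixlen > new_prefixlen:
        return []  # already smaller than the target size; nothing to split
    return [str(sub.network_address + 1)
            for sub in net.subnets(new_prefix=new_prefixlen)]

targets = probe_targets("192.0.2.0/24")  # a /24 splits into 16 /28's
```

Passing `new_prefixlen=27` gives the /27 variant mentioned earlier in this section.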
We next probe 24 cycles of all responding destinations up to
this point to calculate their response rate. Of the 9.1M
addresses probed, 8.5M respond at least once (3.3M respond
every time), and they cover 71.5k (56.2% of 127k) semiglobal
prefixes and 646k /24's.
Finally, we probe 15 /8's, enclosing 9.3k semiglobal prefixes,
that were inadvertently blocked from probing in earlier stages.
We also probe a new set of 6.3M addresses collected by NeTraMet
over an 8-hour period (all source addresses observed on a large
ISP link). This final probe set has 22.4M addresses, of which
13M are addresses in the initial set of 33.6M collected
addresses that had been blocked, and 3M are addresses
systematically picked to ensure that every /27 of each blocked
prefix has a probe address.
Twelve cycles of this final set produce 12.3M responding
addresses in 34.2k prefixes (6.9k previously blocked prefixes)
and 475k /24's. This set increases the number of prefixes
known to have responding addresses by 8.1k.
-
Producing the final IPv4 list.
We produce the final set of candidate addresses by combining
- responding destinations found with skdesttest, and
- highly-responsive (>= 90% response rate) destinations in
the previous skitter list.
This yields a set of 20.8M addresses that respond at least
once and which cover
79,660 (62.6% of 127,209) semiglobal prefixes,
917,529 /24's, and
5,454,724 /28's.
The set of 10.1M addresses with a 50% or greater response
rate covers
76,958 (60.5%) semiglobal prefixes,
818,506 /24's, and
3,858,631 /28's.
The final skitter list versions and their composition are as follows:
| list size | composition strategy | actual composition |
| 156,777 | 1 BGP + 1 in /20 | 115,182 /20's or 156,777 /24's |
| 294,233 | 2 BGP + 1 in /21 | 198,757 /21's or 240,708 /24's |
| 634,488 | 2 BGP + 1 in /23 | 564,841 /23's or 580,963 /24's |
| 971,054 | 2 BGP + 1 in /24 | 917,529 /24's |
where
x BGP = x best responding destination(s) per semiglobal prefix
y in /zz = y best responding destination(s) in every /zz subdivision
NOTE: The same address can be chosen as the best responding
in both a given semiglobal prefix and a given prefix length subdivision.
Each skitter list is a strict superset of all smaller
lists.
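The "x BGP + y in /zz" strategy can be sketched as two selections whose results are merged into one set, so an address picked by both rules is counted once, as the NOTE above describes. The function names, the `rates` mapping, and the tie-breaking rule are assumptions for illustration; the text does not say how ties between equally responsive destinations were resolved.

```python
import ipaddress
from collections import defaultdict

def top_n(members, n):
    """Return the n addresses with the highest rate (ties broken by address string)."""
    ranked = sorted(members, key=lambda m: (-m[0], m[1]))
    return [addr for _, addr in ranked[:n]]

def compose_list(rates, semiglobal, x, y, zz):
    """Pick x best destinations per semiglobal prefix and y best per /zz block.

    rates: hypothetical mapping of address -> response rate (0.0 to 1.0).
    """
    nets = [ipaddress.ip_network(p) for p in semiglobal]
    by_prefix = defaultdict(list)
    by_block = defaultdict(list)
    for addr_str, rate in rates.items():
        addr = ipaddress.ip_address(addr_str)
        for net in nets:
            if addr in net:
                by_prefix[net].append((rate, addr_str))
        block = ipaddress.ip_network(f"{addr_str}/{zz}", strict=False)
        by_block[block].append((rate, addr_str))
    chosen = set()  # a set, so an address chosen by both rules appears once
    for members in by_prefix.values():
        chosen.update(top_n(members, x))
    for members in by_block.values():
        chosen.update(top_n(members, y))
    return chosen

rates = {"192.0.2.1": 1.0, "192.0.2.2": 0.9,
         "192.0.2.130": 0.6, "198.51.100.5": 0.7}
semiglobal = ["192.0.2.0/24", "198.51.100.0/24"]
small = compose_list(rates, semiglobal, x=1, y=1, zz=25)
larger = compose_list(rates, semiglobal, x=2, y=1, zz=25)
```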
Past Destination Lists
| Family | Version | Date | Size | Monitors |
| IPv4 list | ipv4.20030225.147k | 02/25/2003 | 147,016 | champagne, kaist, riesling |
| IPv4 list | ipv4.20030225.366k | 02/25/2003 | 365,605 | apan-jp, iad, lhr, nrt, sjc, yto |
| IPv4 list | ipv4.20030225.814k | 02/25/2003 | 814,356 | mwest, uoregon |
| DNS Clients list | dns.20030113.0147k | 01/13/2003 | 147,943 | a-root, b-root, d-root, e-root, f-root, g-root, h-root, i-root, k-root, m-root, k-peer, cdg-rssac |
| Archive | various | 1998-2002 | various | retired |
-
IPv4 list
-
Goal of the list:
Provide representative coverage of the routable IPv4 space for
topology measurements. Ideally, this list should have an IP address
in each populated /24 prefix of the global Internet space.
Current versions of the IPv4 list attempt to include (1) one or
two destinations in every semiglobal prefix and (2) a destination in
every CIDR block of a certain prefix length. A semiglobal prefix is one
announced by at least 18 of the 36 Route Views peers providing the
largest BGP tables. Specifically, we used a Route Views snapshot
taken at noon on Feb 15, 2003. By including a destination in every
CIDR block of a given length, we ensure that large semiglobal
prefixes are represented in proportion to their size. This approach
is particularly important for large prefixes (/8) that contain many
customer networks but have few, if any, announced prefixes for
these subnetworks.
-
Creating the list:
The process includes the following steps.
-
Collecting initial set of IP addresses.
We glean as many IP addresses as possible from a number of available
sources:
- existing skitter lists;
- intermediate addresses in skitter traces;
- users who have accessed the CAIDA website;
- users who have made requests to NetGeo;
- various packet traces.
-
Filtering out addresses that do not want to be
probed.
Over the years of the topology probing project, we have accumulated a
list of addresses, networks, and prefixes whose operators complained
about receiving skitter packets. These addresses are carefully excluded
from any future probing.
-
Running CAIDA tool skdesttest.
- The tool takes a list of candidate IP addresses and a list of IP
prefixes that we want our measurements to represent. It tries to
find addresses within each prefix that "are alive", i.e., respond to
a ping. Note that the tool pings each destination in the candidate
list only once, thus having very low impact on the network.
- If there are no candidate addresses in a given prefix or if none
of them responds to a ping, skdesttest probes pre-determined
addresses in this /24 network that are statistically likely to have
a responding host (currently, up to 14 addresses per /24
network).
-
Test run of preliminary destination list on
mw.skitter.caida.org skitter monitor.
This monitor is one of our best connected monitors. It generally
has the best response rate and is capable of handling larger
destination lists than other monitors. Destinations that do not
respond to probing from this monitor (or have a low response rate)
are even less likely to respond to other monitors.
-
Producing the final IPv4 list.
Using test run data from mw.skitter.caida.org (Jan 10 - Feb 12,
2003) and from a-root.skitter.caida.org (Feb 7 - Feb 13, 2003) we
selected destinations with a response rate of at least 50% and made
three versions of the IPv4 list.
1) one destination per semiglobal prefix, one destination per
/20.
List: ipv4.20030225.147k = 147,016 destinations.
Runs on champagne, kaist, riesling.
2) two destinations per semiglobal prefix, one destination per
/22.
List: ipv4.20030225.366k = 365,605 destinations.
Runs on apan-jp, iad, lhr, nrt, sjc, yto.
3) the union of version (2) and the best responding destinations
per /26.
List: ipv4.20030225.814k = 814,356 destinations.
Runs on mwest, uoregon.
-
DNS list
-
Goal of the list:
Provide representative coverage of clients querying DNS root name
servers. Ideally, this list should include name servers that supply
DNS services to large numbers of clients. For representativeness,
we would like to have an IP address in each routed BGP prefix.
Specifically, we tried to match the list of prefixes from a BGP
table from January 11, 2003 by including a destination in every
CIDR block of the /24 length. This approach ensures that large BGP
prefixes are represented more in proportion to their size.
-
Creating the list:
The process includes the following steps.
-
Collecting initial set of IP addresses.
We use the dnsstat tool to collect statistics on
queries to root name servers and extract the IP addresses of hosts
querying the roots. In August 2002, we ran this tool for 7 days
on the E, F, I, K, and M root servers and for 4 days on the A root server
and obtained a few million IP addresses. Out of this pool, we
kept 330,610 addresses that sent at least one MX query and either an
A-type query or a PTR request.
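The selection rule above (at least one MX query, plus either an A query or a PTR request) can be expressed as a simple predicate. The per-host dictionary of query counts is a hypothetical data layout chosen for illustration; it is not the output format of dnsstat.

```python
def is_candidate(query_counts):
    """Return True if a host sent an MX query and also an A or PTR query.

    query_counts: hypothetical mapping of query type -> number of queries
    observed from one host.
    """
    sent_mx = query_counts.get("MX", 0) >= 1
    sent_a_or_ptr = (query_counts.get("A", 0) >= 1
                     or query_counts.get("PTR", 0) >= 1)
    return sent_mx and sent_a_or_ptr

# Illustrative per-host counts only.
observed = {
    "192.0.2.10": {"MX": 3, "A": 12},
    "192.0.2.11": {"A": 40, "PTR": 7},   # no MX query: rejected
    "192.0.2.12": {"MX": 1, "PTR": 2},
    "192.0.2.13": {"MX": 5},             # MX only: rejected
}
kept = sorted(host for host, counts in observed.items() if is_candidate(counts))
```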
-
Test run of candidate addresses.
We ran the list of candidate IP addresses on all skitter
monitors co-located with root servers and discarded 117,770
destinations that did not reply to our probes during the week of
January 7 - January 13, 2003.
-
Producing the final DNS list.
We tried to find one destination in each /24 and preferred
destinations with a higher response rate.
List: dns.20030113.0147k = 147,943 destinations.
Runs on a-root, b-root, d-root, e-root, f-root, g-root, h-root,
i-root, k-root, m-root, k-peer, and cdg-rssac.