News
On Feb 8, 2008, after 10 years of data collection and 4TB of data,
we deactivated skitter data collection and transitioned to our next
generation topology measurement infrastructure named
Archipelago (Ark). We already perform
large-scale topology measurements on Ark, and we recommend researchers
use this new dataset, which employs an improved measurement methodology.
The new IPv4
Routed /24 Topology Dataset collected on Ark extends back
to Sep 13, 2007 and overlaps with the last five months of skitter
data.
Goals of the project
CAIDA started the Macroscopic Topology project in 1998. Our tools have been
tracking global IP-level connectivity by sending probe packets from
a set of source
monitors to hundreds of thousands of destinations spread across the current
IPv4 address space and around the globe.
The gathered data
- characterize macroscopic connectivity and performance of the Internet,
- allow various topological and geographical representations
at multiple levels of aggregation granularity,
- provide a valuable input for empirically-based modelling
of the Internet behavior and properties.
The archive of raw data,
sample analysis code, and daily snapshots of measurements are available to the
research community. The AS adjacencies derived daily from our active connectivity measurements are also available.
Data collection
We use two sources of data for Macroscopic Topology studies:
forward Internet (IP) path information and inter-domain BGP routing tables.
CAIDA has developed a special tool, skitter,
which actively probes forward IP paths and round trip times (RTTs) from a
skitter host to a specified list of destinations. We have deployed
a number of monitors around the world. Each skitter monitor
continuously sends probe packets to destinations in its target list.
The number of times each destination is probed per day depends
primarily on the total number of destinations in the target list and,
to a lesser extent, on the current global conditions of the network.
We store data in individual files classified by skitter host and
by day, where a day is defined as the 24-hour period starting at midnight UTC.
We obtain routing information from inter-domain BGP routing tables provided by
the Route Views project.
This project gathers BGP routing perspectives from more than 60 major ISPs
worldwide. Each BGP table is a list of AS paths that packets should traverse from
a given router to the prefix containing its destination IP address. The AS
terminating an AS path for a given prefix in a core routing table is
administratively responsible for this prefix and is called an origin AS.
We use the combined BGP table to map IP addresses in our IP paths
to their origin ASes. As of 2002, the combined table typically has nearly 120K
globally routable prefixes.
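The mapping from an IP address to its origin AS can be sketched as a longest-prefix match over the combined table. The following is a minimal illustration, not CAIDA's actual code: the function names, the route list, and the AS numbers (taken from the range reserved for documentation) are all assumptions made for the example.

```python
import ipaddress

def build_origin_table(routes):
    """Map each prefix to its origin AS.

    routes: iterable of (prefix_string, as_path) pairs, where as_path
    is a list of AS numbers ending at the AS that originates the prefix.
    """
    table = {}
    for prefix, as_path in routes:
        # The origin AS is the AS that terminates the AS path.
        table[ipaddress.ip_network(prefix)] = as_path[-1]
    return table

def origin_as(table, address):
    """Return the origin AS of the longest matching prefix, or None."""
    ip = ipaddress.ip_address(address)
    best = None
    for network, asn in table.items():
        if ip in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, asn)
    return best[1] if best else None

# Hypothetical routes, invented for illustration only.
ROUTES = [
    ("198.51.100.0/24", [64500, 64501, 64502]),
    ("198.51.0.0/16", [64500, 64510]),
    ("203.0.113.0/24", [64500, 64511]),
]
TABLE = build_origin_table(ROUTES)

print(origin_as(TABLE, "198.51.100.7"))  # matched by the more specific /24
print(origin_as(TABLE, "198.51.7.7"))    # matched only by the /16
print(origin_as(TABLE, "192.0.2.1"))     # no covering prefix
```

The linear scan keeps the sketch short. With a table of roughly 120K prefixes, a radix tree or similar structure would be the more practical choice.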
Advantages and limitations of the data
- skitter
The five-year collection of skitter paths is the most comprehensive
archive of macroscopic topology measurements in the world. These data are
available to the Internet research community and
are a key input for realistic simulation and modeling research efforts.
However, it is important to clearly understand the intrinsic limitations of
the topology data obtained with the skitter tool.
- The success of our measurements depends on both the target
destination and the intermediate hops in a path returning an ICMP response
to the ICMP ECHO_REQUEST probes sent by our tool. If ICMP packets are
filtered out at some hops, we cannot obtain a complete path and/or RTT to
the final destination, which decreases the amount of useful information
from a given probe.
- The skitter tool cannot map IP paths behind firewalls or Network
Address Translators (NATs). The continuing proliferation of these security
measures depletes the worldwide pool of destinations suitable for our
monitoring. Over the lifetime of the project, we have noticed that the number
of replying destinations in our lists decays at a rate of 2-3% per month.
- A fraction of the destinations in our lists have IP addresses assigned by
Dynamic Host Configuration Protocol (DHCP). The association between such an
address and an actual host is temporary and random, which makes topology
measurements to a DHCP address of little value. We are working on a tool
that should allow us to distinguish between permanent and DHCP IP addresses.
- Even the best topology coverage by skitter monitors is far
from complete. We strive to have one monitored destination in each /24 network
(256 IP addresses). However, there are over 16 million potential /24 segments
in the IPv4 address space, and about 4 million of them are currently routable.
Our largest lists usually have only about 800 K destinations.
Contingent on funding availability, we will address the limitations of
the skitter methodology. We also plan to release, in 2003, a new version of
this tool that handles the IPv6 address space.
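To put the coverage and decay figures quoted above in perspective, the arithmetic can be sketched as follows. The constants come from the text. The assumption that the monthly decay compounds, and that each destination falls in a distinct /24, are simplifications made for this example only.

```python
TOTAL_SLASH24 = 2 ** 24        # /24 blocks in the 32-bit IPv4 address space
ROUTABLE_SLASH24 = 4_000_000   # "about 4 million" routable /24s (from the text)
LIST_SIZE = 800_000            # "about 800 K" destinations (from the text)

def coverage(list_size, routable):
    """Upper bound on the share of routable /24s with a monitored
    destination, assuming every destination lies in a distinct /24."""
    return list_size / routable

def surviving_fraction(monthly_decay, months):
    """Fraction of destinations still replying after the given number
    of months, assuming the monthly decay compounds."""
    return (1.0 - monthly_decay) ** months

print(TOTAL_SLASH24)
print(f"coverage: {coverage(LIST_SIZE, ROUTABLE_SLASH24):.0%}")
for rate in (0.02, 0.03):
    print(f"{rate:.0%}/month -> {surviving_fraction(rate, 12):.1%} left after a year")
```

Under these assumptions, the largest lists cover at most about a fifth of the routable /24s, and a list left unrefreshed for a year loses roughly a quarter to a third of its replying destinations.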
- BGP tables
Using publicly available BGP tables is a popular method for inferring
Internet structure. The tables are easy to parse, process, and comprehend.
However, the usefulness of these data is also limited.
BGP connectivity:
- does not capture lateral connectivity among regional networks;
- does not reveal short-term AS path variations and AS load balancing;
- most important, does not reflect how traffic actually travels
toward a destination network.
Of these two methodologies for studying Internet topology, the
skitter tool yields a finer-grained and more precise view of
Internet connectivity than can be inferred from BGP tables alone.
Auxiliary tools and utilities
The Macroscopic Topology project uses a number of auxiliary tools that enhance
the functionality of the skitter tool.
- iffinder
A skitter monitor finds and records a single interface in each
intermediate router along the path to a destination. However, routers usually
have several interfaces, and other skitter probes may discover
different interfaces of the same router. Treating each interface as a separate
router inflates the resulting IP graph (in comparison with the actual
network of Internet hosts) and falsely increases the length of the shortest
paths calculated from the graph. To minimize this type of error, CAIDA has
developed a tool, iffinder, that attempts to discover which interfaces
belong to the same router. By using this tool we aggregate
IP-level graphs generated from skitter data into router-level graphs.
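The aggregation step can be illustrated with a small sketch. It assumes the alias information is already available as disjoint sets of interface addresses. How iffinder discovers those sets is not shown, and the function name and sample addresses are invented for the example.

```python
def router_level_edges(ip_edges, alias_sets):
    """Collapse an IP-level edge list into router-level edges.

    ip_edges: iterable of (ip_a, ip_b) links between interfaces.
    alias_sets: iterable of disjoint sets of interface addresses that
    are believed to belong to the same router.
    """
    router_of = {}
    for alias_set in alias_sets:
        # Use the smallest address string as a stable label for the router.
        label = min(alias_set)
        for interface in alias_set:
            router_of[interface] = label

    edges = set()
    for a, b in ip_edges:
        ra = router_of.get(a, a)  # unresolved interfaces stand for themselves
        rb = router_of.get(b, b)
        if ra != rb:              # drop links internal to a single router
            edges.add(tuple(sorted((ra, rb))))
    return edges

# Two paths cross the same router but see different interfaces of it.
IP_EDGES = [
    ("10.0.0.1", "10.0.1.1"), ("10.0.1.1", "10.0.2.1"),
    ("10.0.0.1", "10.0.1.2"), ("10.0.1.2", "10.0.2.1"),
]
ALIASES = [{"10.0.1.1", "10.0.1.2"}]

print(sorted(router_level_edges(IP_EDGES, ALIASES)))
```

In this toy input, four IP-level nodes and four links collapse to three routers and two links once the two aliased interfaces are merged.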
- dnsstat
This tool collects statistics of DNS queries on a specific nameserver or client.
It counts numbers of messages and numbers of queries. The subjects of queries
are never recorded. The dnsstat package is based on CAIDA's
CoralReef tool.
We used the dnsstat tool in order to build the DNS Clients list
currently probed by 12 skitter monitors. The statistics of DNS queries
were collected on seven DNS root servers.
- skdesttest
We put significant effort into building representative target lists
for skitter probing. skdesttest is a tool that helps us
to cull suitable destinations from much larger lists of candidate IP
addresses collected elsewhere.
The tool takes a list of IP prefixes that we want our measurements to represent
and a list of candidate IP addresses. We can assign
certain weights to candidates based on project-dependent criteria.
skdesttest then tries to find a given number of highest-weight addresses
(or all of them) within each prefix that "are alive", i.e. respond to a ping.
Note that the tool pings each destination in the candidate list
only once, and so has very low impact on the network. Also, it does not
ping any "forbidden" addresses, such as
broadcast addresses (host part all 0's or all 1's) of any prefix,
addresses that are not globally routable unicast ones, and any addresses
in the blocked prefixes specified by the user.
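A check along these lines could look like the sketch below. It is not skdesttest's code: the function name and arguments are hypothetical, it tests against a single prefix rather than every prefix in a list, and it relies on the `is_global` and `is_multicast` properties of Python's `ipaddress` module as an approximation of "globally routable unicast".

```python
import ipaddress

def is_forbidden(address, prefix, blocked=()):
    """Return True if the address should not be pinged.

    address: candidate IPv4 address as a string.
    prefix:  the prefix the candidate is meant to represent.
    blocked: user-specified prefixes that must never be probed.
    """
    ip = ipaddress.ip_address(address)
    net = ipaddress.ip_network(prefix)
    # Host part all 0's or all 1's within the given prefix.
    # Prefixes longer than /30 are not treated specially in this sketch.
    if ip in net and ip in (net.network_address, net.broadcast_address):
        return True
    # Not a globally routable unicast address.
    if not ip.is_global or ip.is_multicast:
        return True
    # Inside a prefix blocked by the user.
    return any(ip in ipaddress.ip_network(b) for b in blocked)

print(is_forbidden("8.8.8.0", "8.8.8.0/24"))                   # host part all 0's
print(is_forbidden("8.8.8.8", "8.8.8.0/24"))                   # ordinary host address
print(is_forbidden("8.8.8.8", "8.8.8.0/24", ["8.8.0.0/16"]))   # blocked by the user
```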
Finally, if none of the candidate addresses with nonnegative weight in a given
prefix responds to skdesttest ping, the tool may try certain "autogenerated"
hosts (assuming they were not already in the candidate list).
For example, for prefix 192.0.0.0/12, "autogen 0.0.0.1 255.255.255.254"
would generate 192.0.0.1 and 192.15.255.254. If those do not
respond, the tool will continue probing addresses with negative weight
until one responds or the candidate list is exhausted.
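The autogen example above is consistent with keeping only the host bits of each pattern and combining them with the network bits of the prefix. The sketch below implements that reading. The masking rule is inferred from the single example in the text, and the function name is an assumption.

```python
import ipaddress

def autogen(prefix, host_patterns):
    """Generate one candidate address per host pattern inside the prefix.

    Each pattern is masked down to the host bits of the prefix and then
    combined with the prefix's network bits.
    """
    net = ipaddress.ip_network(prefix)
    base = int(net.network_address)
    hostmask = int(net.hostmask)
    generated = []
    for pattern in host_patterns:
        host_bits = int(ipaddress.IPv4Address(pattern)) & hostmask
        generated.append(str(ipaddress.IPv4Address(base | host_bits)))
    return generated

print(autogen("192.0.0.0/12", ["0.0.0.1", "255.255.255.254"]))
```

For the /12 prefix in the text, this yields 192.0.0.1 and 192.15.255.254, matching the addresses given above.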