CAIDA: Cooperative Association for Internet Data Analysis
Macroscopic Topology Measurements

-----summary of contents-----

This page describes the CAIDA Macroscopic Topology Measurements project, which actively measures connectivity and latency for a wide cross-section of the commodity Internet.

An analysis of Macroscopic IPv6 Topology Measurements is available.


-----end summary of contents-----

News

On Feb 8, 2008, after 10 years of data collection and 4TB of data, we deactivated skitter data collection and transitioned to our next-generation topology measurement infrastructure, named Archipelago (Ark). We already perform large-scale topology measurements on Ark, and we recommend that researchers use this new dataset, which employs an improved measurement methodology. The new IPv4 Routed /24 Topology Dataset collected on Ark extends back to Sep 13, 2007 and overlaps with the last five months of skitter data.

Goals of the project

CAIDA started the Macroscopic Topology project in 1998. Our tools have been tracking global IP-level connectivity by sending probe packets from a set of source monitors to hundreds of thousands of destinations, chosen as a stratified sample of both the current IPv4 address space and the world's geography.

The gathered data
  • characterize macroscopic connectivity and performance of the Internet,
  • allow various topological and geographical representations at multiple levels of aggregation granularity,
  • provide valuable input for empirically based modeling of Internet behavior and properties.

The archive of raw data, sample analysis code, and daily snapshots of measurements are available to the research community. The AS adjacencies derived daily from our active connectivity measurements are also available.


Data collection

We use two sources of data for Macroscopic Topology studies: forward Internet (IP) path information and inter-domain BGP routing tables.

CAIDA has developed a special tool, skitter, which actively probes forward IP paths and round-trip times (RTTs) from a skitter host to a specified list of destinations. We have deployed a number of monitors around the world. Each skitter monitor continuously sends probe packets to the destinations in its target list. The number of times each destination is probed per day depends primarily on the total number of destinations in the target list and, to a lesser extent, on the current global conditions of the network. We store data in individual files classified by skitter host and by day, where a day is defined as the 24-hour period starting at midnight UTC.
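
The per-host, per-day storage rule can be illustrated with a minimal sketch. The record layout (monitor name, Unix timestamp, destination, RTT) and the `YYYYMMDD` day label are assumptions made for illustration only; they are not the actual skitter file format or naming scheme.

```python
from datetime import datetime, timezone

def utc_day(epoch_seconds):
    """Return the UTC calendar day (YYYYMMDD) containing a Unix timestamp."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y%m%d")

def bucket_by_host_and_day(records):
    """Group probe records by (monitor, UTC day).

    Each record is a hypothetical tuple: (monitor, timestamp, destination, rtt_ms).
    """
    buckets = {}
    for monitor, timestamp, destination, rtt_ms in records:
        key = (monitor, utc_day(timestamp))
        buckets.setdefault(key, []).append((timestamp, destination, rtt_ms))
    return buckets

# Hypothetical records: two probes one second apart, straddling midnight UTC.
RECORDS = [
    ("monitor-a", 1202428799, "198.51.100.7", 41.2),
    ("monitor-a", 1202428800, "198.51.100.7", 40.8),
]
BUCKETS = bucket_by_host_and_day(RECORDS)
```

Because the day boundary is fixed at midnight UTC, the two sample probes land in different buckets regardless of the monitor's local time zone.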


We obtain routing information from inter-domain BGP routing tables provided by the Route Views project. This project gathers BGP routing perspectives from more than 60 major ISPs worldwide. Each BGP table is a list of AS paths that packets should traverse from a given router to the prefix containing their destination IP address. The AS terminating an AS path for a given prefix in a core routing table is administratively responsible for this prefix and is called the origin AS. We use the combined BGP table to map IP addresses in our IP paths to their origin ASes. As of 2002, the combined table typically has nearly 120K globally routable prefixes.
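
Mapping an IP address to its origin AS amounts to a longest-prefix match against the combined table. The following is a minimal sketch of that idea, not the project's actual code: the prefixes are documentation ranges, the AS numbers are from the private-use range, and the function names are hypothetical. A linear scan is used for clarity; a table with ~120K prefixes would normally call for a trie or similar structure.

```python
import ipaddress

# Hypothetical (prefix, origin AS) entries standing in for a combined BGP table.
BGP_ENTRIES = [
    ("198.51.100.0/24", 64500),
    ("198.51.0.0/16", 64501),
    ("203.0.113.0/24", 64502),
]

def build_table(entries):
    """Parse prefixes once and sort them so the most specific prefix comes first."""
    parsed = [(ipaddress.ip_network(prefix), asn) for prefix, asn in entries]
    return sorted(parsed, key=lambda item: item[0].prefixlen, reverse=True)

def origin_as(table, address):
    """Return the origin AS of the longest matching prefix, or None if no prefix matches."""
    ip = ipaddress.ip_address(address)
    for network, asn in table:
        if ip in network:
            return asn
    return None

def as_path(table, ip_path):
    """Convert an IP path to an AS-level path.

    Hops with no matching prefix are skipped and consecutive duplicates are collapsed.
    """
    result = []
    for hop in ip_path:
        asn = origin_as(table, hop)
        if asn is not None and (not result or result[-1] != asn):
            result.append(asn)
    return result

TABLE = build_table(BGP_ENTRIES)
```

For instance, `origin_as(TABLE, "198.51.100.7")` selects the /24 entry over the covering /16, which is the behavior that longest-prefix matching requires.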


Advantages and limitations of the data

  • skitter
    The five-year collection of skitter paths is the most comprehensive archive of macroscopic topology measurements in the world. These data are available to the Internet research community and are a key input for realistic simulation and modeling research efforts. However, it is important to clearly understand the intrinsic limitations of the topology data obtained with the skitter tool.
    1. The success of our measurements depends on both the target destination and the intermediate routers along a path answering the ICMP ECHO_REQUEST probes sent by our tool: the destination with an ICMP ECHO_REPLY, and intermediate hops with ICMP TIME_EXCEEDED messages, as in traceroute. If ICMP packets are filtered at some hops, we cannot obtain a complete path and/or an RTT to the final destination, which reduces the amount of useful information from a given probe.
    2. The skitter tool cannot map IP paths behind firewalls or Network Address Translators (NATs). The continuing proliferation of these security measures depletes the worldwide pool of destinations suitable for our monitoring. Over the lifetime of the project, we have noticed that the number of replying destinations in our lists decays at a rate of 2-3% per month.
    3. A fraction of the destinations in our lists have IP addresses assigned by the Dynamic Host Configuration Protocol (DHCP). The association between such an address and an actual host is temporary and random, which makes topology measurements to a DHCP address of little value. We are working on a tool that should allow us to distinguish between permanent and DHCP IP addresses.
    4. Even the best topology coverage by skitter monitors is far from complete. We strive to have one monitored destination in each /24 network (256 IP addresses). However, there are over 16 million potential /24 segments in the IPv4 address space, and about 4 million of them are currently routable. Our largest lists usually have only about 800K destinations.

    Contingent on funding availability, we will address the limitations of the skitter methodology. We also plan to release a new version of this tool for dealing with the IPv6 address space in 2003.
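
The figures in items 2 and 4 above can be put in perspective with a little arithmetic. The sketch below assumes the round numbers quoted in the text, and it models list decay as a constant compound monthly rate with no replenishment of the list, which is a simplifying assumption rather than a measured behavior.

```python
# Number of /24 networks in the 32-bit IPv4 address space: 2^24.
TOTAL_SLASH24 = 2 ** 24

# Approximate figures quoted in the text.
ROUTABLE_SLASH24 = 4_000_000
LARGEST_LIST_SIZE = 800_000

def coverage(list_size, routable):
    """Fraction of routable /24s covered if each destination lies in a distinct /24."""
    return list_size / routable

def remaining_fraction(months, monthly_decay):
    """Fraction of destinations still replying after `months`.

    Assumes a constant compound decay rate and no new destinations added.
    """
    return (1.0 - monthly_decay) ** months

BEST_CASE_COVERAGE = coverage(LARGEST_LIST_SIZE, ROUTABLE_SLASH24)
AFTER_YEAR_LOW = remaining_fraction(12, 0.02)
AFTER_YEAR_HIGH = remaining_fraction(12, 0.03)
```

Under these assumptions, the largest lists cover at most 20% of routable /24s, and an unmaintained list would retain roughly 78% (at 2% per month) down to roughly 69% (at 3% per month) of its replying destinations after one year.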


  • BGP tables
    Using publicly available BGP tables is a popular method for inferring Internet structure. The tables are easy to parse, process, and comprehend. However, the usefulness of these data is also limited.

    BGP connectivity:
    1. does not capture lateral connectivity among regional networks;
    2. does not reveal short-term AS path variations and AS load balancing;
    3. most important, does not reflect how traffic actually travels toward a destination network.

Of these two methodologies for studying Internet topology, the skitter tool yields a finer-grained and more precise view of Internet connectivity than can be inferred from BGP tables alone.


Auxiliary tools and utilities

The Macroscopic Topology project uses a number of auxiliary tools that enhance the functionality of the skitter tool.

  • iffinder

    A skitter monitor finds and records a single interface of each intermediate router along the path to a destination. However, routers usually have several interfaces, and skitter probes along other paths may discover those interfaces as well. Treating each interface as a separate router inflates the resulting IP graph (in comparison with the actual network of Internet hosts) and falsely increases the length of shortest paths calculated from the graph. To minimize this type of error, CAIDA has developed a tool, iffinder, that attempts to discover which interfaces belong to the same router. Using this tool, we aggregate IP-level graphs generated from skitter data into router-level graphs.
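
The aggregation step can be sketched as follows, assuming alias pairs (two interface addresses believed to belong to the same router) are already available from a tool such as iffinder. This is an illustrative union-find implementation, not iffinder itself and not the project's actual code; the function names and sample addresses are hypothetical.

```python
def find(parent, node):
    """Return the representative interface for the router that owns `node`."""
    parent.setdefault(node, node)
    while parent[node] != node:
        parent[node] = parent[parent[node]]  # path halving keeps trees shallow
        node = parent[node]
    return node

def union(parent, a, b):
    """Record that interfaces `a` and `b` belong to the same router."""
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a != root_b:
        # Pick the smaller string as representative so results are deterministic.
        parent[max(root_a, root_b)] = min(root_a, root_b)

def router_graph(ip_links, alias_pairs):
    """Collapse an IP-level link list into a set of router-level links."""
    parent = {}
    for a, b in alias_pairs:
        union(parent, a, b)
    links = set()
    for a, b in ip_links:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:  # drop links between two interfaces of one router
            links.add((min(root_a, root_b), max(root_a, root_b)))
    return links

# Hypothetical data: 192.0.2.1 and 192.0.2.2 are two interfaces of one router.
IP_LINKS = [
    ("192.0.2.1", "198.51.100.1"),
    ("192.0.2.2", "198.51.100.1"),
    ("198.51.100.1", "203.0.113.1"),
]
ALIASES = [("192.0.2.1", "192.0.2.2")]
ROUTER_LINKS = router_graph(IP_LINKS, ALIASES)
```

In this example the IP-level graph has four nodes and three links, while the router-level graph has three nodes and two links, which illustrates the inflation that alias resolution removes.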

  • dnsstat

    This tool collects statistics of DNS queries on a specific nameserver or client. It counts numbers of messages and numbers of queries. The subjects of queries are never recorded. The dnsstat package is based on CAIDA's CoralReef tool. We used the dnsstat tool in order to build the DNS Clients list currently probed by 12 skitter monitors. The statistics of DNS queries were collected on seven DNS root servers.

  • skdesttest

    We devote significant effort to building representative target lists for skitter probing. skdesttest is a tool that helps us cull suitable destinations from much larger lists of candidate IP addresses collected elsewhere.

    The tool takes a list of IP prefixes that we want our measurements to represent and a list of candidate IP addresses. We can assign weights to candidates based on project-dependent criteria. skdesttest then tries to find a given number of highest-weight addresses (or all of them) within each prefix that "are alive", i.e., respond to a ping. Note that the tool pings each destination in the candidate list only once, and thus has a very low impact on the network. Also, it does not ping any "forbidden" addresses: broadcast addresses (host part all 0's or all 1's) of any prefix, addresses that are not globally routable unicast addresses, and any addresses in the blocked prefixes specified by the user.

    Finally, if none of the candidate addresses with nonnegative weight in a given prefix responds to skdesttest ping, the tool may try certain "autogenerated" hosts (assuming they were not already in the candidate list). For example, for prefix 192.0.0.0/12, "autogen 0.0.0.1 255.255.255.254" would generate 192.0.0.1 and 192.15.255.254. If those do not respond, the tool will continue probing addresses with negative weight until one responds or the candidate list is exhausted.
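
The "autogen" example and the "forbidden" address rules above can be reproduced with Python's standard `ipaddress` module. This is a sketch of one interpretation that is consistent with the example in the text (the template's host bits are combined with the prefix's network bits); it is not skdesttest's actual implementation, and the function names are hypothetical.

```python
import ipaddress

def autogen(prefix, templates):
    """Combine the network bits of `prefix` with the host bits of each template address."""
    network = ipaddress.ip_network(prefix)
    base = int(network.network_address)
    host_mask = int(network.hostmask)
    generated = []
    for template in templates:
        host_bits = int(ipaddress.ip_address(template)) & host_mask
        generated.append(str(ipaddress.ip_address(base | host_bits)))
    return generated

def is_forbidden(prefix, address, blocked_prefixes=()):
    """Return True if `address` should never be pinged under the rules described above."""
    network = ipaddress.ip_network(prefix)
    ip = ipaddress.ip_address(address)
    # Host part all 0's or all 1's within the prefix.
    if ip in (network.network_address, network.broadcast_address):
        return True
    # Not a globally routable unicast address.
    if ip.is_multicast or not ip.is_global:
        return True
    # Inside a user-specified blocked prefix.
    return any(ip in ipaddress.ip_network(blocked) for blocked in blocked_prefixes)

GENERATED = autogen("192.0.0.0/12", ["0.0.0.1", "255.255.255.254"])
```

With the prefix and templates from the text, `GENERATED` is `["192.0.0.1", "192.15.255.254"]`, matching the example given for 192.0.0.0/12.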


Cooperative Association for Internet Data Analysis (CAIDA)
  Last Modified: Tues Apr-8-2008 10:23:58 PDT
  Page URL: http://www.caida.org/projects/macroscopic/index.xml