CAIDA: Cooperative Association for Internet Data Analysis
skitter Destination Lists

Details on the creation and composition of the set of addresses ("destination lists") probed by skitter.

Introduction

CAIDA Internet topology monitors continuously probe the hosts ("destinations") in their target list ("destination list"). The primary criterion in selecting destinations for monitoring is their responsiveness. A destination responds to a probe if it returns an ICMP ECHO_REPLY in answer to an ICMP ECHO_REQUEST. Responding destinations provide RTT and non-truncated path information. Destinations with a low response rate, or those not replying at all, contribute little useful information and also considerably slow down the probing process.
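
The following Python sketch shows what such a probe and its answer look like on the wire. It builds an ICMP ECHO_REQUEST (type 8) header with the RFC 1071 Internet checksum and checks whether a received message is the matching ECHO_REPLY (type 0). It is an illustration only, not code from skitter or skdesttest. The function names are hypothetical. Actually sending packets (raw sockets, privileges, IP header handling) is left out.

```python
import struct

ICMP_ECHO_REPLY = 0    # ICMP type 0: Echo Reply (RFC 792)
ICMP_ECHO_REQUEST = 8  # ICMP type 8: Echo Request (RFC 792)


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data = data + b"\x00"  # pad odd-length input with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Build an ICMP Echo Request (header + payload) with a valid checksum."""
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, ident, seq)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, ident, seq) + payload


def is_matching_reply(icmp_packet: bytes, ident: int, seq: int) -> bool:
    """True if the ICMP message (IP header already stripped) is the Echo Reply to (ident, seq)."""
    if len(icmp_packet) < 8 or internet_checksum(icmp_packet) != 0:
        return False  # truncated or corrupted message
    icmp_type, code, _checksum, r_ident, r_seq = struct.unpack("!BBHHH", icmp_packet[:8])
    return icmp_type == ICMP_ECHO_REPLY and code == 0 and (r_ident, r_seq) == (ident, seq)
```

A message whose checksum field is correct sums to zero under the same checksum routine. The reply check relies on that property before comparing the identifier and sequence number.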

The pool of replying destinations is constantly shrinking due to the proliferation of firewalls, changing IP addresses, and other factors. Therefore, we need to refresh our lists about every 8 to 12 months for continuing projects.

Current Destination List

Family      Version              Date         Size      Monitors
IPv4 list   ipv4.20040120.0971k  01/20/2004   971,054   all monitors


  1. IPv4 list

    • Description:
      A set of 971k IP addresses representing a wide cross section of the routed IPv4 space and consisting of end hosts, web servers, routers, etc. These addresses are known to respond to ICMP ECHO_REQUEST probing by tools such as skitter. This set of addresses covers 79,660 (62.6% of 127,209 total) semiglobal prefixes and 917,529 /24's.

      (A semiglobal prefix is a prefix announced by at least 22 of the 41 RouteViews peers [one per AS] active on Dec 9, 2003. There were 127,209 semiglobal prefixes at the time this list was made.)
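
      As a rough illustration of the semiglobal definition, the sketch below counts distinct peer ASes per prefix and keeps the prefixes seen by at least 22 of them. It assumes the RouteViews table has already been parsed into (peer ASN, prefix) pairs. That input format and the function name are hypothetical.

```python
import ipaddress
from collections import defaultdict


def semiglobal_prefixes(announcements, min_peers=22):
    """Return prefixes announced by at least `min_peers` distinct peer ASes.

    `announcements` is an iterable of (peer_asn, prefix_string) pairs; this
    input format is a hypothetical one chosen for illustration.
    """
    peers_per_prefix = defaultdict(set)
    for peer_asn, prefix in announcements:
        # A set of ASNs counts each peer AS only once ("one per AS").
        peers_per_prefix[ipaddress.ip_network(prefix)].add(peer_asn)
    return {net for net, peers in peers_per_prefix.items() if len(peers) >= min_peers}
```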

    • Goal of the list:
      The goal of this destination list is to provide representative coverage of the routed IPv4 space for topology measurements. Ideally, this list should have an IP address in each populated /24 subdivision of the globally routed space. This ideal is not achievable because of (1) considerations of list size, (2) ICMP blocking by firewalls, and (3) prefixes whose owners have asked to be excluded from probing. Nevertheless, we have tried to maximize prefix coverage by including at least one destination in every responding /24 subdivision and at least two destinations in every semiglobal prefix.

      By including a destination in every responding /24 subdivision, we ensure that large semiglobal prefixes are represented in proportion to their size. This approach is particularly important for large prefixes (such as /8's) that contain many customer networks but have few, if any, announced prefixes for these subnetworks.

    • Creating the list:
      The process includes the following steps.

      1. Collecting initial set of IP addresses.
        We gather as many actively used IP addresses as we can from a number of sources:

        • 11.3M addresses in 11 OC48 packet traces taken in two Tier 1 ISP networks (7 traces from 2003 and the rest from 2001 and 2002). We kept all source addresses but only those destination addresses that received more than 3 packets per flow on average,
        • 6.5M hosts sending ICMP echo reply packets on an OC48 ISP link, as collected by NeTraMet over a 20-hour period,
        • 5.9M source addresses (both directions) observed on an OC48 ISP link, as collected by NeTraMet over an 18-hour period,
        • 16.3M hosts that sent RFC1918 updates to f-root and m-root in 2003 (about 1 year of data collection at f-root, and about 11 months at m-root),
        • 834k hosts that accessed CAIDA's website in a period of 3 2/3 years,
        • 1.7M hosts that accessed NetGeo in a period of 4 years, and
        • 427k destinations in the previous ipv4.20030225.865k skitter list that respond >= 90% of the time.

        The above sources yield 33.6M total unique addresses covering 101k (79.5% of 127k) semiglobal prefixes (the coverage, when excluding addresses carried over from the previous list, is 91.6k [72.0%] semiglobal prefixes).
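
        A minimal sketch of the coverage count follows. It maps each address to the most specific semiglobal prefix containing it and counts the prefixes that receive at least one address. The longest-match rule is our assumption for how nested prefixes are handled, since the text above does not specify it. The function name is hypothetical.

```python
import ipaddress


def covered_prefixes(addresses, prefixes):
    """Return the prefixes that are the longest match for at least one address.

    Assumption: an address counts only toward its most specific containing prefix.
    """
    nets = {ipaddress.ip_network(p) for p in prefixes}
    lengths = sorted({n.prefixlen for n in nets}, reverse=True)  # most specific first
    covered = set()
    for a in addresses:
        for plen in lengths:
            # The enclosing network of this length; check whether it is a listed prefix.
            candidate = ipaddress.ip_network(f"{a}/{plen}", strict=False)
            if candidate in nets:
                covered.add(candidate)
                break
    return covered
```

The percentage coverage figures are then len(covered) divided by the number of semiglobal prefixes.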

      2. Filtering out addresses that do not want to be probed.
        Over the years, we have accumulated a list of addresses and prefixes whose operators complained about receiving skitter packets. These addresses are carefully excluded from any future probing.
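
        One simple way to apply such an exclusion list is a containment test against every excluded prefix, as sketched below. The function name is hypothetical. A linear scan is used for clarity. A large exclusion list would more likely be stored in a radix tree.

```python
import ipaddress


def filter_excluded(addresses, excluded):
    """Drop candidate addresses covered by any excluded address or prefix.

    A bare address such as "192.0.2.7" parses as a /32 network.
    """
    blocked = [ipaddress.ip_network(e, strict=False) for e in excluded]
    kept = []
    for a in addresses:
        addr = ipaddress.ip_address(a)
        if not any(addr in net for net in blocked):
            kept.append(a)
    return kept
```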

      3. Finding responding destinations.
        We run skdesttest to determine which candidate destinations respond to ICMP probing. (skdesttest is a CAIDA tool that can send ICMP Echo Requests to a large number of hosts in a net-friendly manner.) We also calculate each destination's response rate by making multiple passes (or "cycles") through the set of candidate destinations, with consecutive probes to any given destination separated by many hours.

        All in all, in the process of creating the new list, we execute about six dozen skdesttest cycles (over several sets of candidate destinations) in a 5-week period starting in December 2003.
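
        The response rate computed from these cycles is the fraction of cycles in which a destination replied. The minimal sketch below assumes that each cycle's replies are available as a set of addresses. That representation and the function name are hypothetical.

```python
def response_rates(probed, cycles):
    """Fraction of probing cycles in which each probed destination replied.

    probed: destination addresses probed in every cycle.
    cycles: list of sets; cycles[i] holds the destinations that replied in cycle i.
    """
    if not cycles:
        raise ValueError("need at least one probing cycle")
    counts = dict.fromkeys(probed, 0)
    for replied in cycles:
        for dest in replied:
            if dest in counts:  # ignore replies from addresses we did not probe
                counts[dest] += 1
    return {dest: n / len(cycles) for dest, n in counts.items()}
```

Thresholds such as ">= 90%" or "50% or greater" later in this page are then simple comparisons on these values.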

        In addition to probing the 33.6M addresses gathered by various means, we also systematically probe addresses in routed prefixes in order to find responding destinations in a greater number of prefixes (since address space coverage is our goal). In particular, we decompose prefixes that either have no responding addresses or have fewer than 3 highly-responsive addresses into /27's or /28's and probe the .1 address in each /27 or /28.

        The following paragraphs describe the probing procedure and the various probing sets in great detail.

        We first run skdesttest for just 1 cycle on the initial collection of 33.6M addresses, excluding addresses from the previous list, and find 8.2M responding destinations covering 50k semiglobal prefixes. We then probe these responding destinations over 23 cycles to calculate their response rates.

        We next generate a second set of addresses by systematically picking addresses in prefixes that either have no responding addresses or have fewer than 3 highly-responsive addresses (highly-responsive addresses respond 16 or more times in 18 cycles of probing). We decompose such underrepresented prefixes into /28's and pick the .1 address in each /28. This second set has 43.1M addresses with the following composition:
                    no responses: 31.4M /28's in 35.8k prefixes
                    low response: 6.9M /28's in 19.3k prefixes
            unprobed semiglobals: 7.6M /28's in 35.6k prefixes
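
        The selection and decomposition rules of this step can be sketched as follows. The sketch uses the thresholds given above (16 of 18 cycles, fewer than 3 highly-responsive addresses) and the .1 address of each /28. The function names are hypothetical. The same decomposition with sub_len=27 gives the per-/27 probe addresses mentioned earlier.

```python
import ipaddress


def is_underrepresented(reply_counts, min_highly_responsive=3, threshold=16):
    """reply_counts: replies (out of 18 cycles) for each probed address in one prefix."""
    if not any(reply_counts):
        return True  # no responding addresses at all
    highly = sum(1 for c in reply_counts if c >= threshold)
    return highly < min_highly_responsive


def probe_addresses(prefix, sub_len=28):
    """The .1 address of every /sub_len subdivision of `prefix`."""
    net = ipaddress.ip_network(prefix)
    subs = [net] if net.prefixlen >= sub_len else net.subnets(new_prefix=sub_len)
    # network_address + 1 is the ".1" host; skip single-address blocks (/32).
    return [str(s.network_address + 1) for s in subs if s.num_addresses > 1]
```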
        
        Running skdesttest for 4 cycles on this second set produces 723k responding addresses in 37k prefixes.

        We next probe all responding destinations found up to this point for 24 cycles to calculate their response rates. Of the 9.1M addresses probed, 8.5M respond at least once (3.3M respond every time). Together they cover 71.5k (56.2% of 127k) semiglobal prefixes and 646k /24's.

        Finally, we probe 15 /8's, enclosing 9.3k semiglobal prefixes, that were inadvertently blocked from probing in earlier stages. We also probe a new set of 6.3M addresses collected by NeTraMet over an 8-hour period (all source addresses observed on a large ISP link). This final probe set has 22.4M addresses, of which 13M are addresses in the initial set of 33.6M collected addresses that had been blocked, and 3M are addresses systematically picked to ensure that every /27 of each blocked prefix has a probe address.

        Twelve cycles of this final set produce 12.3M responding addresses in 34.2k prefixes (including 6.9k previously blocked prefixes) and 475k /24's. These results increase the number of prefixes known to have responding addresses by 8.1k.

      4. Producing the final IPv4 list.
        We produce the final set of candidate addresses by combining
        1. responding destinations found with skdesttest, and
        2. highly-responsive (>= 90% response rate) destinations in the previous skitter list.

        This yields a set of 20.8M addresses that respond at least once and which cover
              79,660 (62.6% of 127,209) semiglobal prefixes,
             917,529 /24's, and
           5,454,724 /28's.
        
        The set of 10.1M addresses with a response rate of 50% or greater covers
              76,958 (60.5%) semiglobal prefixes,
             818,506 /24's, and
           3,858,631 /28's.
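
        Figures such as the /24 and /28 counts above come from counting the distinct blocks of a given prefix length that contain at least one selected address. The minimal sketch below assumes response rates are held in a dict that maps address strings to rates. That layout and the function names are hypothetical.

```python
import ipaddress


def count_blocks(addresses, prefix_len):
    """Number of distinct /prefix_len blocks that contain at least one address."""
    return len({ipaddress.ip_network(f"{a}/{prefix_len}", strict=False) for a in addresses})


def coverage(rates, min_rate=0.0, lengths=(24, 28)):
    """Count addresses that responded at least once with rate >= min_rate, plus their block counts."""
    chosen = [a for a, r in rates.items() if r > 0 and r >= min_rate]
    return len(chosen), {n: count_blocks(chosen, n) for n in lengths}
```

Calling coverage(rates) corresponds to the "respond at least once" totals, and coverage(rates, min_rate=0.5) to the "50% or greater" totals.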
        
        The final skitter list versions and their composition are as follows:

        List size   Composition strategy   Actual composition
        156,777     1 BGP + 1 in /20       115,182 /20's or 156,777 /24's
        294,233     2 BGP + 1 in /21       198,757 /21's or 240,708 /24's
        634,488     2 BGP + 1 in /23       564,841 /23's or 580,963 /24's
        971,054     2 BGP + 1 in /24       917,529 /24's

        where
           x BGP = x best responding destination(s) per semiglobal prefix
        y in /zz = y best responding destination(s) in every /zz subdivision
        
        NOTE: The same address can be chosen as the best responding in both a given semiglobal prefix and a given prefix length subdivision. Each skitter list is a strict superset of all smaller lists.
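
        The "x BGP + y in /zz" strategies can be sketched as a union of two greedy selections: the x best-responding addresses per semiglobal prefix and the y best-responding addresses per /zz block. This is an illustration under assumptions, not CAIDA's implementation. The input layout and function name are hypothetical, and ties in response rate are broken by address so that the selection is deterministic. With that total order, the best address in a /20 is also the best in its /24. So in this sketch, for the same y, a list built with a longer block length and at least as many per-prefix picks contains the shorter-block list.

```python
import ipaddress
from collections import defaultdict


def build_list(rates, semiglobal_of, x_bgp, y_per_block, block_len):
    """Select destinations using an "x BGP + y in /block_len" strategy.

    rates: address -> response rate (0.0 to 1.0).
    semiglobal_of: address -> semiglobal prefix containing it (hypothetical,
    precomputed mapping; addresses outside any semiglobal prefix may be absent).
    """
    by_prefix = defaultdict(list)
    by_block = defaultdict(list)
    for addr, rate in rates.items():
        if rate <= 0:
            continue  # never responded
        prefix = semiglobal_of.get(addr)
        if prefix is not None:
            by_prefix[prefix].append(addr)
        by_block[ipaddress.ip_network(f"{addr}/{block_len}", strict=False)].append(addr)

    def best(addrs, k):
        # Highest response rate first; ties broken by address for a deterministic order.
        return sorted(addrs, key=lambda a: (-rates[a], ipaddress.ip_address(a)))[:k]

    chosen = set()
    for addrs in by_prefix.values():
        chosen.update(best(addrs, x_bgp))  # "x BGP"
    for addrs in by_block.values():
        chosen.update(best(addrs, y_per_block))  # "y in /zz"
    return chosen  # a set, so an address picked by both rules counts once
```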

Past Destination Lists

Family            Version              Date         Size      Monitors
IPv4 list         ipv4.20030225.147k   02/25/2003   147,016   champagne, kaist, riesling
                  ipv4.20030225.366k   02/25/2003   365,605   apan-jp, iad, lhr, nrt, sjc, yto
                  ipv4.20030225.814k   02/25/2003   814,356   mwest, uoregon
DNS Clients list  dns.20030113.0147k   01/13/2003   147,943   a-root, b-root, d-root, e-root, f-root, g-root, h-root, i-root, k-root, m-root, k-peer, cdg-rssac
Archive           various              1998-2002    various   retired


  1. IPv4 list

    • Goal of the list:
      Provide representative coverage of the routable IPv4 space for topology measurements. Ideally, this list should have an IP address in each populated /24 prefix of the global Internet space.

      Current versions of the IPv4 list attempt to include (1) one or two destinations in every semiglobal prefix and (2) a destination in every CIDR block of a certain prefix length. A semiglobal prefix is one announced by at least 18 of the 36 Route Views peers providing the largest BGP tables. Specifically, we used a Route Views snapshot taken at noon on Feb 15, 2003. By including a destination in every CIDR block of a given length, we ensure that large semiglobal prefixes are represented in proportion to their size. This approach is particularly important for large prefixes (such as /8's) that contain many customer networks but have few, if any, announced prefixes for these subnetworks.

    • Creating the list:
      The process includes the following steps.

      1. Collecting initial set of IP addresses.
        We glean as many IP addresses as possible from a number of available sources:
        - existing skitter lists;
        - intermediate addresses in skitter traces;
        - users who have accessed CAIDA's website;
        - users who have made requests to NetGeo;
        - various packet traces.

      2. Filtering out addresses that do not want to be probed.
        Over the years of the topology probing project, we have accumulated a list of addresses, networks, and prefixes whose operators complained about receiving skitter packets. These addresses are carefully excluded from any future probing.

      3. Running the CAIDA tool skdesttest.
        - The tool takes a list of candidate IP addresses and a list of IP prefixes that we want our measurements to represent. It tries to find addresses within each prefix that "are alive", i.e., respond to a ping. Note that the tool pings each destination in the candidate list only once, so its impact on the network is very low.
        - If there are no candidate addresses in a given prefix, or if none of them responds to a ping, skdesttest probes pre-determined addresses in the /24 network that are statistically likely to have a responding host (currently, up to 14 addresses per /24 network).
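
        The candidate-then-fallback behavior described in these two points can be sketched as below for a single /24. The ping callback, the fallback offsets, and stopping at the first fallback reply are all assumptions made for illustration. The text above says only that up to 14 pre-determined, statistically likely addresses per /24 are tried. This is not skdesttest's actual code.

```python
import ipaddress

# HYPOTHETICAL fallback host offsets within a /24. The actual set of
# "statistically likely" addresses used by skdesttest is not documented here.
FALLBACK_OFFSETS = (1, 254, 129, 65, 193, 2, 10, 33, 97, 126, 161, 225, 30, 100)


def find_alive(prefix, candidates, ping, max_fallback=14):
    """Return responsive addresses for a single /24 prefix.

    ping(addr) -> bool is a caller-supplied (hypothetical) probe function.
    Each address is pinged at most once.
    """
    net = ipaddress.ip_network(prefix)
    if net.prefixlen != 24:
        raise ValueError("this sketch handles one /24 at a time")
    inside = {a for a in candidates if ipaddress.ip_address(a) in net}
    alive = sorted(a for a in inside if ping(a))
    if alive:
        return alive
    # No candidate in this /24 answered: try the fallback addresses,
    # stopping at the first reply (the stopping rule is an assumption).
    for offset in FALLBACK_OFFSETS[:max_fallback]:
        addr = str(net.network_address + offset)
        if addr not in inside and ping(addr):  # never re-ping a candidate
            return [addr]
    return []
```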

      4. Test run of the preliminary destination list on the mw.skitter.caida.org skitter monitor.
        This monitor is one of our best connected monitors. It generally has the best response rate and is capable of handling larger destination lists than other monitors. Destinations that do not respond to probing from this monitor (or have a low response rate) are even less likely to respond to other monitors.

      5. Producing the final IPv4 list.
        Using test run data from mw.skitter.caida.org (Jan 10 - Feb 12, 2003) and from a-root.skitter.caida.org (Feb 7 - Feb 13, 2003), we selected destinations with a response rate of at least 50% and made three versions of the IPv4 list.

        1) one destination per semiglobal prefix, one destination per /20.
        List: ipv4.20030225.147k = 147,016 destinations.
        Runs on champagne, kaist, riesling.

        2) two destinations per semiglobal prefix, one destination per /22.
        List: ipv4.20030225.366k = 365,605 destinations.
        Runs on apan-jp, iad, lhr, nrt, sjc, yto.

        3) the union of version (2) and the best-responding destinations per /26.
        List: ipv4.20030225.814k = 814,356 destinations.
        Runs on mwest, uoregon.



  2. DNS list

    • Goal of the list:
      Provide representative coverage of clients querying DNS root name servers. Ideally, this list should include name servers that supply DNS services to large numbers of clients. For representativeness, we would like to have an IP address in each routed BGP prefix. Specifically, we tried to match the list of prefixes from a BGP table from January 11, 2003 by including a destination in every /24 CIDR block. This approach ensures that large BGP prefixes are represented more nearly in proportion to their size.

    • Creating the list:
      The process includes the following steps.

      1. Collecting initial set of IP addresses.
        We use the dnsstat tool to collect query statistics on root name servers and extract the IP addresses of hosts querying the roots. In August 2002, we ran this tool for 7 days on the E, F, I, K, and M root servers and for 4 days on the A root server, obtaining a few million IP addresses. From this pool, we kept the 330,610 addresses that sent at least one MX query and either an A query or a PTR query.
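
        The selection rule above (at least one MX query, plus either an A or a PTR query) can be expressed as a set test per client. The minimal sketch below assumes the dnsstat output has been reduced to hypothetical (client IP, query type) pairs. The function name is also hypothetical.

```python
from collections import defaultdict


def select_dns_clients(queries):
    """Keep clients that sent an MX query and at least one A or PTR query.

    queries: iterable of (client_ip, qtype) pairs (hypothetical format).
    """
    seen = defaultdict(set)
    for client, qtype in queries:
        seen[client].add(qtype.upper())  # normalize case, e.g. "mx" -> "MX"
    return {c for c, types in seen.items() if "MX" in types and ("A" in types or "PTR" in types)}
```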

      2. Test run of candidate addresses.
        We ran the list of candidate IP addresses on all skitter monitors co-located with root servers and discarded the 117,770 destinations that did not reply to our probes during the week of January 7 - January 13, 2003.

      3. Producing the final DNS list.
        We tried to find one destination in each /24 and preferred destinations with a higher response rate.

        List: dns.20030113.0147k = 147,943 destinations.
        Runs on a-root, b-root, d-root, e-root, f-root, g-root, h-root, i-root, k-root, m-root, k-peer, and cdg-rssac.


Cooperative Association for Internet Data Analysis (CAIDA)
  Last Modified: Tues Jan-8-2008 15:42:53 PDT
  Maintained by: Alex Ma
  Page URL: http://www.caida.org/projects/macroscopic/list.xml