This dataset contains information useful for studying the topology of the Internet. Data is collected by a globally distributed set of Ark monitors. The monitors use team-probing to distribute the work of probing the destinations among the available monitors.
We collect data by sending scamper probes continuously to destination IP addresses. Destinations are selected randomly from each routed IPv4 /24 prefix on the Internet such that a random address in each prefix is probed approximately every 48 hours (one probing cycle). Because team-probing distributes the probing work across all monitors, a single destination /24 will be probed by only one monitor in each probing cycle. The current list of routed IPv4 prefixes was created using RouteViews BGP tables from July 3, 2013. Rather than having a static list of IP addresses to probe, we dynamically pick a new random address in each /24 prefix for every new cycle of probing. The current prefix list includes approximately 10.13 million prefixes (with 6.5 million for data collected before November 2007).
Scamper:
- Measures Forward IPv4 Paths
  - scamper records the IPv4 address seen at each hop from a source to a destination by incrementing the "time to live" (TTL) in the IPv4 header of each successive probe packet, and recording the replies from each router leading to the destination host.
- Measures Round Trip Times (RTT)
  - scamper collects the round trip time measured to each intermediate router as well as to the destination host.
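The hop discovery described above can be sketched with a small simulation. This is an illustrative model only, not scamper's implementation: the `simulated_reply` function, the `Hop` record, and the example path (built from documentation-reserved addresses) are hypothetical stand-ins for real probes and replies.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    ttl: int       # TTL set in the probe that elicited this reply
    addr: str      # address that answered
    rtt_ms: float  # round trip time measured for this probe


# Hypothetical forward path: (address, RTT in ms), destination last.
EXAMPLE_PATH = [("192.0.2.1", 0.4), ("198.51.100.7", 12.9), ("203.0.113.20", 31.5)]


def simulated_reply(path, ttl):
    """Stand-in for sending one probe: a probe with TTL n expires at the
    n-th router; once the TTL is large enough, the destination answers."""
    return path[min(ttl, len(path)) - 1]


def discover_hops(path, max_ttl=32):
    """Increase the TTL one step at a time, recording who answers and how fast."""
    destination = path[-1][0]
    hops = []
    for ttl in range(1, max_ttl + 1):
        addr, rtt_ms = simulated_reply(path, ttl)
        hops.append(Hop(ttl, addr, rtt_ms))
        if addr == destination:  # destination reached, stop probing
            break
    return hops


hops = discover_hops(EXAMPLE_PATH)
```

Each `Hop` pairs an address with the RTT of the probe that reached it, which mirrors the two kinds of measurement listed above. Real probing also has to handle routers that never reply, which this sketch leaves out.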
In the current configuration, scamper probes with ICMP packets, using the Paris traceroute technique (ICMP-paris) to improve measurement integrity across load-balanced links. Data prior to November 2007 used an alternate UDP traceroute method. Data collected for each path probed includes:
- RTTs, including both intermediate hops and the destination
- IPID, TOS, TTL, and size fields of response packets
- IP length, TTL, and TOS fields of the probe packet that reached each hop (extracted from the response packet)
- The ICMP type and code of responses
Scamper is also able to collect Path MTU information, but current measurements do not include that information. A sample binary warts file is available.
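The reason the Paris traceroute technique mentioned above helps on load-balanced links can be illustrated with a toy model. The sketch assumes a per-flow load balancer that picks a branch from a flow identifier carried in each probe. The topology, the modulo-based branch choice, and the identifier values are invented for illustration and do not describe scamper's internals.

```python
# Toy topology: router A, then one of two parallel two-hop branches, then D.
BRANCHES = [["B1", "C1"], ["B2", "C2"]]


def hop_at(ttl, flow_id):
    """Return the router at which a probe with this TTL expires."""
    branch = BRANCHES[flow_id % len(BRANCHES)]  # per-flow load balancing
    path = ["A", *branch, "D"]
    return path[ttl - 1]


def toy_trace(flow_id_for_ttl):
    """Probe TTLs 1..4, using the given rule to choose each probe's flow id."""
    return [hop_at(ttl, flow_id_for_ttl(ttl)) for ttl in range(1, 5)]


# Classic behaviour: the flow identifier changes from probe to probe,
# so consecutive probes may take different branches.
classic = toy_trace(lambda ttl: 1000 + ttl)

# Paris behaviour: the flow identifier is held constant for the whole trace,
# so every probe follows the same branch.
paris = toy_trace(lambda ttl: 1000)
```

In this model the classic trace reports `A, B1, C2, D`, implying a link between B1 and C2 that does not exist, while the constant-identifier trace reports the real path `A, B1, C1, D`.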
Data has been collected continuously since September 13, 2007. It is made available as one-hour files for the most recent ten days, and as a historical archive of 24-hour files covering the full duration of the IPv4 Routed /24 Topology project.
Caveats that apply to this dataset:
- The IPv4 Routed /24 Topology Dataset uses a dynamic destination list. Measurements to consistent IPv4 addresses are not available in this dataset.
- Because team-probing distributes measurements across many monitors, the randomly selected IP addresses in a given routed prefix are not probed by the same set of monitors consistently over time.
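Both caveats follow from how destinations are chosen, which the following minimal sketch tries to make concrete. It assumes that each /24 is handed to one randomly chosen monitor per cycle and that any address in the prefix may be picked. The actual assignment policy is not described here, so `plan_cycle`, the monitor names, and the seeding scheme are hypothetical.

```python
import ipaddress
import random


def plan_cycle(prefixes, monitors, cycle, seed=0):
    """Map each /24 prefix to (destination address, monitor) for one cycle.

    Assumption: a fresh random address and a random monitor are drawn per
    prefix per cycle; the real team-probing scheduler may work differently.
    """
    rng = random.Random(f"{seed}:{cycle}")  # reproducible per cycle
    plan = {}
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix)
        addr = net[rng.randrange(net.num_addresses)]  # new address every cycle
        monitor = rng.choice(monitors)                # exactly one monitor per /24
        plan[prefix] = (str(addr), monitor)
    return plan


# Hypothetical inputs: documentation prefixes and made-up monitor names.
prefixes = ["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"]
monitors = ["monitor-a", "monitor-b", "monitor-c"]

cycle_1 = plan_cycle(prefixes, monitors, cycle=1)
cycle_2 = plan_cycle(prefixes, monitors, cycle=2)
```

Comparing `cycle_1` and `cycle_2` for the same prefix will generally show a different address and possibly a different monitor. Longitudinal comparisons therefore have to be made per prefix rather than per address or per monitor.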
Reading Topology Data
You can analyze this data (available in the warts format) with the sc_analysis_dump tool included in the scamper distribution package. The sc_analysis_dump tool prints out information about each trace in an easy-to-parse textual format (one trace per line). You would typically write a Perl script to analyze the output of sc_analysis_dump.
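As a rough illustration of such a script, the sketch below parses one line of sc_analysis_dump output in Python. It assumes the tab-separated layout described in the comment header that the tool prints: a `T` key, source, destination, list ID, cycle ID, timestamp, destination-reply fields, halt fields, a path-complete flag, then one field per hop holding `IP,RTT,nTries` entries separated by semicolons, or `q` for no response. Check that header in your installed version before relying on the field positions. The `Trace` record and the sample line are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    src: str
    dst: str
    timestamp: int
    dest_replied: bool
    dest_rtt_ms: float
    hops: list  # one entry per TTL; each entry is a list of (ip, rtt_ms, tries)


def parse_trace_line(line):
    """Parse one 'T' line; return None for comments and other line types."""
    if not line.startswith("T\t"):
        return None
    fields = line.rstrip("\n").split("\t")
    hops = []
    for field in fields[13:]:  # per-hop data starts at the 14th field
        if field == "q":       # no response at this hop
            hops.append([])
            continue
        replies = []
        for reply in field.split(";"):  # several addresses may answer at one hop
            ip, rtt, tries = reply.split(",")
            replies.append((ip, float(rtt), int(tries)))
        hops.append(replies)
    return Trace(
        src=fields[1],
        dst=fields[2],
        timestamp=int(fields[5]),
        dest_replied=(fields[6] == "R"),
        dest_rtt_ms=float(fields[7]),
        hops=hops,
    )


# Invented sample line using documentation-reserved addresses.
SAMPLE = "\t".join([
    "T", "192.0.2.1", "203.0.113.9", "1", "42", "1372809600",
    "R", "35.2", "4", "61", "S", "0", "I",
    "192.0.2.254,0.5,1", "q", "198.51.100.1,20.1,1",
])

trace = parse_trace_line(SAMPLE)
```

In practice you would loop over the tool's output line by line, for example from standard input, and skip any line for which `parse_trace_line` returns `None`.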
Another tool you may want to consider is the warts-dump tool, which is also included in the scamper distribution. The output of warts-dump is somewhat less easy to parse, but warts-dump prints out practically all information contained in a warts file.
Finally, you can write your analysis scripts in the Ruby language using rb-wartslib, an easy-to-use Ruby binding to the warts I/O library.
Data availability
- Data older than one year is available as a public dataset. Access to these data is subject to the terms of the following CAIDA Acceptable Use Agreement (printable version in PDF format).
  When referencing this data (as required by the AUA), please use:
  The CAIDA UCSD IPv4 Routed /24 Topology Dataset - < dates used >, http://www.caida.org/data/active/ipv4_routed_24_topology_dataset.xml.
- Access to the most recent one year of data is provided through the website of the Information Marketplace for Policy and Analysis of Cyber-risk and Trust (IMPACT) and subject to the following:
  - IMPACT Memorandum of Agreement (MOA)
  - CAIDA Terms of Use (TOU).
  When referencing this data (as required by the TOU), please use:
  The CAIDA UCSD IPv4 Routed /24 Topology Dataset - < dates used >, www.impactcybertrust.org, DOI 10.23721/107/1354084
Please report your publications using this dataset to CAIDA.
Request Data Access
- Access the publicly available CAIDA IPv4 Routed /24 Topology Dataset (and other topology data)
- Request Access to the restricted CAIDA IPv4 Routed /24 Topology Dataset via IMPACT
Topology Datasets
- Freely Available Datasets
  - The Ark IPv4 Routed /24 Topology Dataset (data older than one year only)
  - The Ark IPv4 Routed /24 DNS Names Dataset (data older than one year only)
  - Ark Internet Topology Data Kits (ITDK) (data older than one year only)
  - The Ark IPv6 Topology Dataset
  - The Ark IPv6 DNS Names Dataset
  - The IPv6 Routed /48 Topology Dataset
  - IPv4 Routed /24 AS Links (September 2007 - ongoing)
  - IPv6 AS Links (December 2008 - ongoing)
  - AS Rank
  - AS Relationships
  - Skitter Macroscopic Topology Data
  - Skitter Internet Topology Data Kits (ITDK) - April 2002 and April/May 2003
  - Skitter AS Links (January 2000 - February 2008)
  - Skitter Router Adjacencies
  - AS Taxonomy
  - PAM 2010 "Improving AS Annotations" Supplement
- Restricted Access Datasets
  - The Ark IPv4 Routed /24 Topology Dataset (incl. most recent one year)
  - The Ark IPv4 Routed /24 DNS Names Dataset (incl. most recent one year)
  - The Ark IPv4 Prefix-Probing Dataset (incl. most recent one year)
  - Ark Internet Topology Data Kits (ITDK) (incl. most recent one year)
  - Complete Routed-Space DNS Lookups