Skitter AS Links Dataset
The skitter infrastructure was retired on February 8, 2008, in favor of the next-generation Archipelago (Ark) topology measurement infrastructure. Current AS links data is available in the IPv4 Routed /24 AS Links Dataset.
The Skitter AS Links Dataset has also been called AS Adjacencies.
As a part of the Macroscopic Topology Project, CAIDA posts the adjacency matrix of the Internet AS-level graph computed daily from observed skitter measurements.
As a traceroute-based tool, skitter provides a view of Internet topology that differs from those derived from BGP tables, e.g. RouteViews. Because skitter data reflects packets that have actually traversed a forward path to a destination, rather than paths calculated and propagated across the loosely coupled BGP system, it is more likely than BGP data in isolation to faithfully correspond to IP topology. We note that while inherent limitations of the traceroute-based probing methodology hinder 100%-accurate extraction of the real Internet topology from skitter data, we seek data sources that are collectively most likely to capture a precise and coherent snapshot of macroscopic Internet structure.
Ideally, an AS-level graph would just list links between ASes. In practice, mapping skitter-observed IP addresses into AS numbers (using RouteViews BGP data) involves potential distortion due to IP prefixes advertised by:
- AS-sets (an aggregated set of ASes advertises the prefix);
- multi-origin ASes, aka MOASes (several separate ASes advertise the same prefix);
- no AS (some IP addresses appear in topology probes but are not advertised by any AS).
The data files we provide here preserve all three effects listed above and observed in actual measurements. The data analyst must decide (and make clear in explanations) how to process the described exceptions, e.g., indirect links may be either discarded or counted as real links. The AS links data files also contain information on the time of the probes and which skitter monitor observed a particular link. The data file headers have further details on the file format.
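As an illustration of the three cases above, the following minimal sketch classifies an observed IP address by longest-prefix match against a prefix-to-origin table. The table contents, the `build_table` and `classify` helpers, and the in-memory representation of origins are assumptions made for this example (documentation-range prefixes and AS numbers); they do not reflect the format of the RouteViews data or of the AS links files.

```python
import ipaddress

# Hypothetical prefix-to-origin table (illustrative values, not real BGP data).
# Each prefix maps to a list of origins; an origin is either an int (one AS)
# or a frozenset of ints (an AS-set).
PREFIX_ORIGINS = {
    "192.0.2.0/24": [64500],                        # single origin AS
    "198.51.100.0/24": [64501, 64502],              # MOAS: two separate origins
    "203.0.113.0/24": [frozenset({64503, 64504})],  # AS-set
    "203.0.113.128/25": [64505],                    # more specific prefix
}


def build_table(prefix_origins):
    """Parse prefixes and sort them so the longest prefix is tried first."""
    table = [(ipaddress.ip_network(p), o) for p, o in prefix_origins.items()]
    table.sort(key=lambda entry: entry[0].prefixlen, reverse=True)
    return table


def classify(addr, table):
    """Return (kind, origins) for the longest prefix covering addr."""
    ip = ipaddress.ip_address(addr)
    for net, origins in table:
        if ip in net:
            if len(origins) > 1:
                return "moas", origins
            if isinstance(origins[0], frozenset):
                return "as_set", origins
            return "single", origins
    # No advertised prefix covers this address.
    return "no_as", []


TABLE = build_table(PREFIX_ORIGINS)
```

An analyst could use the returned kind to decide, for example, whether to drop links whose endpoints fall in the `moas`, `as_set`, or `no_as` categories, and should state that choice when reporting results.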
Notes and Caveats
- Some skitter monitors take more than 24 hours to execute one cycle of probing addresses in their destination lists. Therefore, no single data file provides a complete view of the topology observed by skitter since each data file is a result of aggregation over a 24-hour period only. For ITDK, we usually merge data for a period of approximately two to three weeks.
- Trace files from the new skdriver-based skitter (from late Jun 2004 onward) are not necessarily aligned on day boundaries. For example, in the extreme case, if the cycle starting time is 11:59:59 PM on Oct 1st, then the first trace file of the cycle will be named Oct 1st even though almost all traces were taken on Oct 2nd. Furthermore, different monitors start their cycles at different times. Hence, the AS links file for a given day may be derived, in the worst case, from a 48-hour period. So even though AS links files are produced separately for each day, the granularity of the analysis should not be assumed to be a single day.
- A former bug in the generation scripts caused some daily AS links files to include more days of data than the nominal single day. This bug affected AS links files dated from late Jun 2004 to May 4, 2005, which corresponds to about the first 11 months of the newer skdriver-based skitter data set.
Around late Jun 2004, the naming of skitter files changed in response to improvements in skitter itself. In the past, traces were stored in daily files based on timestamp, but starting around Jun 2004, traces began to be organized instead by "cycle" (that is, a single pass through a destination list). A cycle has a starting time, which is used to name the files containing the traces of that cycle. For various reasons, the traces of a cycle are split into multiple files corresponding to consecutive non-overlapping 24-hour periods beginning on the cycle starting time. These files have a shared prefix containing the date of the cycle starting time, and are differentiated by a numeric suffix, starting with 000. For example, the following are the three files making up the cycle starting on 20050927 for the m-root monitor:
l006.m-root.20050927_000.arts
l006.m-root.20050927_001.arts
l006.m-root.20050927_002.arts
The file 20050927_001 corresponds to the second day of traces for this cycle (that is, these traces were nominally collected on 20050928), and similarly, 20050927_002 corresponds to the third day of traces (that is, 20050929).
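The naming rule just described can be sketched as a small helper that recovers the nominal collection date from a trace filename. The regular expression is inferred only from the three example names above, and the `nominal_date` helper and its group names are hypothetical. Since a cycle rarely starts exactly at midnight, the result is the nominal date, not a guarantee about when every trace in the file was taken.

```python
import re
from datetime import datetime, timedelta

# Pattern inferred from the example names above (an assumption, not a spec):
#   <prefix>.<monitor>.<YYYYMMDD>_<NNN>.arts
ARTS_NAME = re.compile(
    r"^(?P<prefix>[^.]+)\.(?P<monitor>[^.]+)\.(?P<start>\d{8})_(?P<part>\d{3})\.arts$"
)


def nominal_date(filename):
    """Return (monitor, nominal date) for a cycle-based trace filename."""
    m = ARTS_NAME.match(filename)
    if m is None:
        raise ValueError(f"unrecognized trace filename: {filename!r}")
    cycle_start = datetime.strptime(m.group("start"), "%Y%m%d").date()
    # Suffix 000 is the cycle's first 24-hour period, 001 the second, etc.
    return m.group("monitor"), cycle_start + timedelta(days=int(m.group("part")))
```

For the third file in the example, `nominal_date("l006.m-root.20050927_002.arts")` yields the monitor `m-root` and the date 2005-09-29.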
The bug caused the skitter AS links file for each day covered by a cycle (that is, Sep 27th, 28th, and 29th for the above example) to contain the AS links for all the days in the cycle. The actual AS links themselves are undistorted, so the main problem caused by this bug is a slight decrease in the granularity of the data analysis (coarsening from a nominal single day to several days, about 6 days in the worst case).
- The dates in AS links filenames are in UTC, matching the UTC dates in skitter trace files.
- The time zone of the dates in the filenames of RouteViews BGP snapshots changed from Pacific Time to UTC on Mar 4, 2003. So there is a slight mismatch between the dates of skitter files (UTC) and BGP snapshots (Pacific Time) prior to this switchover in 2003, although this shouldn't make much of a difference in the appropriateness of the BGP snapshots selected for prefix-to-AS mappings.
- Only the first known link between two ASes is reported. If both an indirect and a direct link between the ASes were observed, and the indirect link was seen first, only the indirect link is reported; the direct link is not.
- The gap size reported is the first gap size observed for an indirect link between two ASes. If an indirect link with a smaller gap size is observed later, the smaller gap size is not reported.
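The last two notes describe a first-observation-wins rule. The sketch below reproduces that behavior under stated assumptions: observations arrive in time order as `(from_as, to_as, gap)` tuples, a gap of 0 encodes a direct link, and links are keyed by the ordered AS pair. The `aggregate_links` helper and this encoding are hypothetical and are not taken from the generation scripts.

```python
def aggregate_links(observations):
    """Keep only the first observation of each AS pair.

    observations: iterable of (from_as, to_as, gap) in time order, where
    gap == 0 stands for a direct link (an encoding assumed for this sketch).
    Returns a dict mapping (from_as, to_as) to the first gap seen.
    """
    first_seen = {}
    for from_as, to_as, gap in observations:
        # setdefault stores the gap only if the pair has not been seen yet,
        # so later direct links or smaller gaps never replace the first one.
        first_seen.setdefault((from_as, to_as), gap)
    return first_seen
```

With the input `[(64500, 64501, 2), (64500, 64501, 0)]`, the pair is recorded as an indirect link with gap 2 even though a direct link was seen afterwards, which is the caveat analysts need to keep in mind when counting direct links.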
Acceptable Use Agreement
Please read the terms of the CAIDA Acceptable Use Agreement (AUA) for Publicly Accessible Datasets.
When referencing this data (as required by the AUA), please use:
The Skitter AS Links Dataset - <dates used>,
https://www.caida.org/catalog/datasets/skitter_aslinks_dataset/
You are required to report your publications using this dataset to CAIDA.
Data Access
Access the archived public CAIDA Skitter AS Links Dataset (January 2000 - February 2008)
Access the current public CAIDA Ark IPv4 Routed /24 AS Links Dataset (September 2007 - present)
Topology Datasets
- Freely Available Datasets
- The Ark IPv4 Routed /24 Topology Dataset (data older than one year only)
- The Ark IPv4 Routed /24 DNS Names Dataset (data older than one year only)
- IPv4 TNT MPLS Topology Dataset (data older than one year only)
- Ark Internet Topology Data Kits (ITDK) (data older than one year only)
- The Ark IPv6 Topology Dataset
- The Ark IPv6 DNS Names Dataset
- The IPv6 Routed /48 Topology Dataset
- IPv4 Routed /24 AS Links (September 2007 - ongoing)
- IPv6 AS Links (December 2008 - ongoing)
- AS Rank
- AS Relationships
- Skitter Macroscopic Topology Data
- Skitter Internet Topology Data Kits (ITDK) - April 2002 and April/May 2003
- Skitter AS Links (January 2000 - February 2008)
- Skitter Router Adjacencies
- AS Taxonomy
- PAM 2010 "Improving AS Annotations" Supplement
- Restricted Access Datasets
- The Ark IPv4 Routed /24 Topology Dataset (incl. most recent one year)
- The Ark IPv4 Routed /24 DNS Names Dataset (incl. most recent one year)
- IPv4 TNT MPLS Topology Dataset (incl. most recent one year)
- The Ark IPv4 Prefix-Probing Dataset (incl. most recent one year)
- Ark Internet Topology Data Kits (ITDK) (incl. most recent one year)
- Complete Routed-Space DNS Lookups