Skitter AS Links Dataset
This dataset is cataloged in DatCat with handle http://imdc.datcat.org/collection/1-000W-X=CAIDA+skitter+AS+Links+Topology.
The skitter infrastructure was retired on February 8, 2008 in favor of the next generation Archipelago (Ark) topology measurement infrastructure. Current AS Links data is available in the IPv4 Routed /24 AS Links Dataset.
The Skitter AS Links Dataset has also been called AS Adjacencies.
As a part of the Macroscopic Topology Project, CAIDA posts the adjacency matrix of the Internet AS-level graph computed daily from observed skitter measurements.
As a traceroute-based tool, skitter provides a view of Internet topology that differs from those derived from BGP tables, e.g. RouteViews. Because skitter data reflects packets that have actually traversed a forward path to a destination, rather than paths calculated and propagated across the loosely coupled BGP system, it is more likely than BGP data in isolation to faithfully correspond to IP topology. We note that while inherent limitations of the traceroute-based probing methodology hinder 100%-accurate extraction of the real Internet topology from skitter data, we seek data sources that are collectively most likely to capture a precise and coherent snapshot of macroscopic Internet structure.
Ideally, an AS-level graph would just list links between ASes. In practice, mapping skitter-observed IP addresses into AS numbers (using RouteViews BGP data) involves potential distortion due to IP prefixes advertised by:
- AS-sets (an aggregated set of ASes advertises the prefix);
- multi-origin ASes, aka MOASes (several separate ASes advertise the same prefix);
- no AS (some IP addresses appear in topology probes but are not advertised by any AS).
The data files we provide here preserve all three effects listed above and observed in actual measurements. The data analyst must decide (and make clear in explanations) how to process the described exceptions, e.g., indirect links may be either discarded or counted as real links. The AS links data files also contain information on the time of the probes and which skitter monitor observed a particular link. The data file headers have further details on the file format.
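As a rough illustration of the three exceptions listed above, the following Python sketch classifies an observed IP address by the origin information of its longest matching prefix. It is only a sketch under stated assumptions: the `ORIGINS` table, its in-memory layout (a tuple standing for an AS-set), and the `classify` function are hypothetical and do not reflect the actual file format or CAIDA's processing scripts. The prefixes and AS numbers are reserved documentation values.

```python
import ipaddress

# Hypothetical prefix-to-origin table (NOT the dataset's file format).
# Each value lists the origins seen for the prefix; an origin written
# as a tuple stands for an AS-set.
ORIGINS = {
    "192.0.2.0/24": [64500],             # one origin AS
    "198.51.100.0/24": [64501, 64502],   # MOAS: several separate origins
    "203.0.113.0/24": [(64503, 64504)],  # AS-set
}


def classify(address, origins=ORIGINS):
    """Return (kind, origin list) for the longest prefix covering address."""
    ip = ipaddress.ip_address(address)
    best = None
    for prefix, ases in origins.items():
        net = ipaddress.ip_network(prefix)
        # Keep the most specific (longest) prefix that contains the address.
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, ases)
    if best is None:
        return ("no-as", [])      # address not advertised by any AS
    ases = best[1]
    if len(ases) > 1:
        return ("moas", ases)     # multi-origin prefix
    if isinstance(ases[0], tuple):
        return ("as-set", ases)   # aggregated set of ASes
    return ("single", ases)


if __name__ == "__main__":
    for addr in ("192.0.2.7", "198.51.100.1", "203.0.113.9", "10.0.0.1"):
        print(addr, classify(addr))
```

An analysis script could branch on the returned kind to apply whatever policy the analyst has chosen and documented, for example discarding or keeping links whose endpoints are not single-origin.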
Notes and Caveats
- Some skitter monitors take more than 24 hours to execute one cycle of probing addresses in their destination lists. Therefore, no single data file provides a complete view of the topology observed by skitter since each data file is a result of aggregation over a 24-hour period only. For ITDK, we usually merge data for a period of approximately two to three weeks.
- Trace files from the new skdriver-based skitter (from late Jun 2004 onward) are not necessarily aligned on day boundaries. For example, in the extreme case, if the cycle starting time is 11:59:59 PM on Oct 1st, then the first trace file of the cycle will be named Oct 1st even though almost all traces were taken on Oct 2nd. Furthermore, different monitors start their cycles at different times. Hence, the AS links file for a given day may be derived, in the worst case, from a 48-hour period. So even though AS links files are produced separately for each day, the granularity of the analysis should not be assumed to be a single day.
- A former bug in the generation scripts caused some daily AS links files to include more days of data than the nominal single day. This bug affected AS links files dated from late Jun 2004 to May 4, 2005, which corresponds to about the first 11 months of the newer skdriver-based skitter data set.
Around late Jun 2004, the naming of skitter files changed in response to improvements in skitter itself. In the past, traces were stored in daily files based on timestamp, but starting around Jun 2004, traces began to be organized instead by "cycle" (that is, a single pass through a destination list). A cycle has a starting time, which is used to name the files containing the traces of that cycle. For various reasons, the traces of a cycle are split into multiple files corresponding to consecutive non-overlapping 24-hour periods beginning on the cycle starting time. These files have a shared prefix containing the date of the cycle starting time, and are differentiated by a numeric suffix, starting with 000. For example, the following are the three files making up the cycle starting on 20050927 for the m-root monitor:
l006.m-root.20050927_000.arts
l006.m-root.20050927_001.arts
l006.m-root.20050927_002.arts
The file 20050927_001 corresponds to the second day of traces for this cycle (that is, these traces were nominally collected on 20050928), and similarly, 20050927_002 corresponds to the third day of traces (that is, 20050929).
The bug caused the skitter AS links file for each day covered by a cycle (that is, Sep 27th, 28th, and 29th for the above example) to contain the AS links for all the days in the cycle. The actual AS links themselves are undistorted, so the main problem caused by this bug is a slight decrease in the granularity of the data analysis (coarsening from a nominal single day to several days [about 6 days in the worst case]).
- The dates in AS links filenames are in UTC, matching the UTC dates in skitter trace files.
- The time zone of the dates in the filenames of RouteViews BGP snapshots changed from Pacific Time to UTC on Mar 4, 2003. So there is a slight mismatch between the dates of skitter files (UTC) and BGP snapshots (Pacific Time) prior to this switchover in 2003, although this shouldn't make much of a difference in the appropriateness of the BGP snapshots selected for prefix-to-AS mappings.
- Only the first known link between two ASes is reported. If both an indirect and a direct link between the ASes were observed, and the indirect link was seen first, only the indirect link is reported; the direct link is not.
- The gap size reported is the first gap size observed for an indirect link between two ASes. If an indirect link with a smaller gap size is observed, the smaller gap size is not reported.
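The cycle-based trace file naming described in the notes above (for example, l006.m-root.20050927_001.arts) can be decoded mechanically. The following Python sketch recovers the monitor name and the nominal collection date from such a name. The regular expression is inferred only from the three example names in this document, and `nominal_date` is a hypothetical helper, not part of any skitter tool. The meaning of the leading field (l006 in the examples) is not documented here, so the sketch treats it as opaque text.

```python
import re
from datetime import datetime, timedelta

# Pattern inferred from the example names in this document (an assumption):
#   <tag>.<monitor>.<cycle start date YYYYMMDD>_<3-digit part>.arts
NAME_RE = re.compile(
    r"^(?P<tag>[^.]+)\.(?P<monitor>[^.]+)\."
    r"(?P<start>\d{8})_(?P<part>\d{3})\.arts$"
)


def nominal_date(filename):
    """Return (monitor, nominal date) for a cycle-based trace file name.

    Part NNN covers the NNN-th consecutive 24-hour period counted from the
    cycle starting time, so the nominal date is the start date plus NNN days.
    Because a cycle need not start at midnight, the traces in the file may
    spill into the following calendar day.
    """
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError("unrecognized trace file name: %r" % filename)
    start = datetime.strptime(m.group("start"), "%Y%m%d").date()
    return m.group("monitor"), start + timedelta(days=int(m.group("part")))


if __name__ == "__main__":
    for name in (
        "l006.m-root.20050927_000.arts",
        "l006.m-root.20050927_001.arts",
        "l006.m-root.20050927_002.arts",
    ):
        monitor, day = nominal_date(name)
        print(name, monitor, day.isoformat())
```

Grouping files by monitor and cycle start date, rather than by nominal date alone, keeps all parts of one cycle together, which matters when analyzing AS links files affected by the generation bug described above.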
Data Use Terms and Conditions
Acceptable Use Policy for the files of the Skitter AS Links Dataset
- At the end of the research, or semi-annually (whichever is more frequent), a summary of the research and any findings/conclusions will be reported to CAIDA. If any research is described on the WWW, a URL will be provided. This information is primarily used in reports to our funding agencies.
- Insofar as possible, research findings and conclusions using the topology data will be published and/or made publicly available.
- All users who publish a document (including web pages and papers) using data from the topology data must provide CAIDA with a copy of the publication.
- All users who publish a document (including web pages and papers) using data from this dataset must cite:
The Skitter AS Links Dataset - < dates used >, Bradley Huffaker, Young Hyun, Dan Andersen, and kc claffy, http://www.caida.org/data/active/skitter_aslinks_dataset.xml.
- Users are encouraged, but not required, to include the following attribution in the acknowledgments section of their document:
Support for the Skitter AS Links Dataset is provided by DARPA, the National Science Foundation, the WIDE Project, Cisco Systems, the US Department of Homeland Security, and CAIDA Members.
- All users who create a publicly available presentation using data from this dataset must provide CAIDA with a copy of the presentation and must use the full name of the dataset ("The Skitter AS Links Dataset") in the presentation. Users are further encouraged, but not required, to include the URL for the dataset (http://www.caida.org/data/active/skitter_aslinks_dataset.xml) in their presentation.
AS Links Dataset Access
Access the archived Skitter AS Links Dataset (January 2000 - February 2008)
Access the current IPv4 Routed /24 AS Links Dataset (September 2007 - present)
Other Topology Datasets:
- Freely Available Datasets
- Restricted Access Datasets
- Skitter Macroscopic Topology Data (January 1998 - February 2008)
- IPv4 Routed /24 Topology Dataset (September 2007 - ongoing)
- Internet Topology Data Kit (ITDK) - April 2003
- Internet Topology Data Kit (ITDK) - 2010
References
For more information on topology measurements see:
- Skitter macroscopic topology measurements
- University of Oregon Route Views Project
- Scamper home page
- Paris traceroute
- iPlane path measurements
- NetDimes Internet mapping project
The Skitter AS Links Dataset was sponsored by:
![[CAIDA - Cooperative Association for Internet Data Analysis logo]](/images/caida_globe_faded.png)



