CAIDA: Cooperative Association for Internet Data Analysis
Skitter AS Links Dataset

-----summary of contents-----

This is an archive of AS links observed by CAIDA's Skitter-based Macroscopic Topology Project per 24 hour period.

-----end summary of contents-----

The skitter infrastructure was retired on February 8, 2008 in favor of the next generation Archipelago (Ark) topology measurement infrastructure. Current AS Links data is available in the IPv4 Routed /24 AS Links Dataset.

The Skitter AS Links Dataset has also been called AS Adjacencies.

As a part of the Macroscopic Topology Project, CAIDA posts the adjacency matrix of the Internet AS-level graph computed daily from observed skitter measurements.

As a traceroute-based tool, skitter provides a view of Internet topology that differs from those derived from BGP tables, e.g. RouteViews. Because skitter data reflects packets that have actually traversed a forward path to a destination, rather than paths calculated and propagated across the loosely coupled BGP system, it is more likely than BGP data in isolation to faithfully correspond to IP topology. We note that while inherent limitations of the traceroute-based probing methodology hinder 100%-accurate extraction of the real Internet topology from skitter data, we seek data sources that are collectively most likely to capture a precise and coherent snapshot of macroscopic Internet structure.

Ideally, an AS-level graph would just list links between ASes. In practice, mapping skitter-observed IP addresses into AS numbers (using RouteViews BGP data) involves potential distortion due to IP prefixes advertised by:

  • AS-sets (an aggregated set of ASes advertises the prefix);
  • multi-origin ASes, aka MOASes (several separate ASes advertise the same prefix);
  • no AS (some IP addresses appear in topology probes but are not advertised by any AS).

The data files provided here preserve all three effects as they were observed in the actual measurements. The analyst must decide, and state clearly in any write-up, how to handle these exceptions; for example, indirect links may be either discarded or counted as real links. The AS links data files also record the time of the probes and which skitter monitor observed each link. The data file headers describe the file format in further detail.
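
As an illustration of the kind of decision described above, the following minimal Python sketch classifies AS labels and filters links. It does not parse the actual data files. The Link record, the "D"/"I" kind codes, the "," separator for AS-sets, the "_" separator for MOASes, and the empty string for "no AS" are all assumptions made for this example; consult the data file headers for the real conventions.

```python
from typing import Iterable, List, NamedTuple


class Link(NamedTuple):
    """Hypothetical in-memory record for one AS link (not the file format)."""
    kind: str      # assumed codes: "D" = direct link, "I" = indirect link
    src: str       # AS label of the near end
    dst: str       # AS label of the far end
    gap: int = 0   # gap size, meaningful only for indirect links


def classify_as(label: str) -> str:
    """Classify an AS label under the separators assumed for this sketch."""
    if not label:
        return "none"      # address not advertised by any AS
    if "," in label:
        return "as_set"    # prefix advertised by an aggregated set of ASes
    if "_" in label:
        return "moas"      # prefix advertised by several separate ASes
    return "single"


def filter_links(links: Iterable[Link], keep_indirect: bool = False) -> List[Link]:
    """Keep links whose endpoints are both single ASes.

    Indirect links are dropped unless keep_indirect is True. This is one
    possible policy, not a recommended one; whichever policy is used should
    be reported alongside the results.
    """
    kept = []
    for link in links:
        if classify_as(link.src) != "single" or classify_as(link.dst) != "single":
            continue
        if link.kind == "I" and not keep_indirect:
            continue
        kept.append(link)
    return kept
```

Calling filter_links(links) gives the strictest view (direct links between single ASes only), while filter_links(links, keep_indirect=True) counts indirect links as real links.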

Notes and Caveats

  1. Some skitter monitors take more than 24 hours to execute one cycle of probing the addresses in their destination lists. Because each data file aggregates only a 24-hour period, no single data file provides a complete view of the topology observed by skitter. For the Internet Topology Data Kit (ITDK), we usually merge data covering approximately two to three weeks. The data files provided on this page can be merged with the tool available in the download section (e.g., the script asadj2graph.pl with the "-u" and "-r" options).
  2. Trace files from the new skdriver-based skitter (from late Jun 2004 onward) are not necessarily aligned on day boundaries. For example, in the extreme case, if the cycle starting time is 11:59:59 PM on Oct 1st, then the first trace file of the cycle will be named Oct 1st even though almost all traces were taken on Oct 2nd. Furthermore, different monitors start their cycles at different times. Hence, the AS links file for a given day may be derived, in the worst case, from a 48-hour period. So even though AS links files are produced separately for each day, the granularity of the analysis should not be assumed to be a single day.
  3. A former bug in the generation scripts caused some daily AS links files to include more days of data than the nominal single day. This bug affected AS links files dated from late Jun 2004 to May 4, 2005, which corresponds to about the first 11 months of the newer skdriver-based skitter data set.

    Around late Jun 2004, the naming of skitter files changed in response to improvements in skitter itself. In the past, traces were stored in daily files based on timestamp, but starting around Jun 2004, traces began to be organized instead by "cycle" (that is, a single pass through a destination list). A cycle has a starting time, which is used to name the files containing the traces of that cycle. For various reasons, the traces of a cycle are split into multiple files corresponding to consecutive non-overlapping 24-hour periods beginning on the cycle starting time. These files have a shared prefix containing the date of the cycle starting time, and are differentiated by a numeric suffix, starting with 000. For example, the following are the three files making up the cycle starting on 20050927 for the m-root monitor:
             l006.m-root.20050927_000.arts
             l006.m-root.20050927_001.arts
             l006.m-root.20050927_002.arts
    
    The file 20050927_001 corresponds to the second day of traces for this cycle (that is, these traces were nominally collected on 20050928), and similarly, 20050927_002 corresponds to the third day of traces (that is, 20050929).

    The bug caused the skitter AS links file for each day covered by a cycle (that is, Sep 27th, 28th, and 29th for the above example) to contain the AS links for all the days in the cycle. The actual AS links themselves are undistorted, so the main problem caused by this bug is a slight decrease in the granularity of the data analysis (coarsening from a nominal single day to several days [about 6 days in the worst case]).
  4. The dates in AS links filenames are in UTC, matching the UTC dates in skitter trace files.
  5. The time zone of the dates in the filenames of RouteViews BGP snapshots changed from Pacific Time to UTC on Mar 4, 2003. So there is a slight mismatch between the dates of skitter files (UTC) and BGP snapshots (Pacific Time) prior to this switchover in 2003, although this shouldn't make much of a difference in the appropriateness of the BGP snapshots selected for prefix-to-AS mappings.
  6. Only the first observed link between two ASes is reported. If both an indirect and a direct link between the ASes were observed and the indirect link was seen first, only the indirect link is reported; the direct link is not.
  7. The gap size reported for an indirect link between two ASes is the first gap size observed. If an indirect link with a smaller gap size is observed later, the smaller gap size is not reported.
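
Two of the notes above can be made concrete with a short Python sketch. The first function derives the nominal collection date of a cycle-based trace file from its name, following the l006.m-root.20050927_001.arts example in note 3. The second mimics the "first observation wins" rule of notes 6 and 7. The function names, the regular expression, and the tuple layout are assumptions made for illustration; this is not the code used to generate the dataset.

```python
import re
from datetime import date, datetime, timedelta
from typing import Dict, Iterable, Tuple

# Matches names shaped like "l006.m-root.20050927_001.arts":
# <prefix>.<monitor>.<cycle start date, YYYYMMDD>_<3-digit day index>.arts
_TRACE_NAME = re.compile(
    r"^(?P<prefix>[^.]+)\.(?P<monitor>[^.]+)\.(?P<start>\d{8})_(?P<seq>\d{3})\.arts$"
)


def nominal_date(filename: str) -> date:
    """Return the nominal date covered by one file of a cycle.

    The suffix counts consecutive 24-hour periods from the cycle start,
    so the nominal date is the cycle start date plus that many days.
    """
    m = _TRACE_NAME.match(filename)
    if m is None:
        raise ValueError(f"unrecognized trace file name: {filename!r}")
    start = datetime.strptime(m.group("start"), "%Y%m%d").date()
    return start + timedelta(days=int(m.group("seq")))


Observation = Tuple[str, str, str, int]  # (src AS, dst AS, kind, gap), hypothetical


def first_seen_links(
    observations: Iterable[Observation],
) -> Dict[Tuple[str, str], Tuple[str, int]]:
    """Keep only the first observation of each (src, dst) pair.

    Later observations of the same pair are ignored, even if they are
    direct links or have a smaller gap size. Treating (src, dst) as an
    ordered pair is an assumption of this sketch.
    """
    reported: Dict[Tuple[str, str], Tuple[str, int]] = {}
    for src, dst, kind, gap in observations:
        reported.setdefault((src, dst), (kind, gap))
    return reported
```

For example, nominal_date("l006.m-root.20050927_002.arts") yields 2005-09-29, the third day of the cycle. Because note 2 warns that cycles are not aligned on day boundaries, this date should be read as nominal rather than exact.
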

Data Use Terms and Conditions

Acceptable Use Policy for the files of the Skitter AS Links Dataset

  1. At the end of the research, or semi-annually (whichever is more frequent), a summary of the research and any findings/conclusions will be reported to CAIDA. If any research is described on the WWW, a URL will be provided. This information is primarily used in reports to our funding agencies.
  2. Insofar as possible, research findings and conclusions using the topology data will be published and/or made publicly available.
  3. All users who publish a document (including web pages and papers) using the topology data must provide CAIDA with a copy of the publication.
  4. All users who publish a document (including web pages and papers) using data from this dataset must cite:

    The Skitter AS Links Dataset - < dates used >, Bradley Huffaker, Young Hyun, Dan Andersen, and k claffy, http://www.caida.org/data/active/skitter_aslinks_dataset.xml.

  5. Users are encouraged, but not required, to include the following attribution in the acknowledgments section of their document:

    Support for the Skitter AS Links Dataset is provided by DARPA, the National Science Foundation, the WIDE Project, Cisco Systems, the US Department of Homeland Security, and CAIDA Members.

  6. All users who create a publicly available presentation using data from this dataset must provide CAIDA with a copy of the presentation and must use the full name of the dataset ("The Skitter AS Links Dataset") in the presentation. Users are further encouraged, but not required, to include the URL for the dataset (http://www.caida.org/data/active/skitter_aslinks_dataset.xml) in their presentation.

AS Links Dataset Access

Access the archived Skitter AS Links Dataset (January 2000 - February 2008)

Access the current IPv4 Routed /24 AS Links Dataset (September 2007 - present)

The Skitter AS Links Dataset was sponsored by the Defense Advanced Research Projects Agency, the National Science Foundation, Cisco Systems, and the Department of Homeland Security.

This dataset is cataloged in DatCat with handle http://imdc.datcat.org/collection/1-000W-X=CAIDA+skitter+AS+Links+Topology.

Cooperative Association for Internet Data Analysis (CAIDA)
  Last Modified: Wed Jul-23-2008 11:53:24 PDT
  Maintained by: Bradley Huffaker
  Page URL: http://www.caida.org/data/active/skitter_aslinks_dataset.xml