Skitter AS Links Dataset
The skitter infrastructure was retired on February 8, 2008 in favor of
the next-generation Archipelago (Ark) topology measurement infrastructure.
Current AS Links data is available in the IPv4 Routed /24 AS Links Dataset.
The Skitter AS Links Dataset has also been called AS Adjacencies.
As a part of the
Macroscopic Topology Project, CAIDA posts the
adjacency matrix of the Internet AS-level graph
computed daily from observed skitter measurements.
As a traceroute-based tool, skitter provides a view
of Internet topology that differs from those
derived from BGP tables, e.g.
RouteViews.
Because skitter data reflects packets that have actually
traversed a forward path to a destination,
rather than paths calculated and propagated across
the loosely coupled BGP system, it is more likely than
BGP data in isolation to faithfully correspond to IP topology.
Although inherent limitations of the traceroute-based probing methodology
prevent fully accurate extraction of the
real Internet topology from skitter data,
we seek data sources that are collectively
most likely to capture a precise and coherent snapshot
of macroscopic Internet structure.
Ideally, an AS-level graph would just list links
between ASes. In practice, mapping skitter-observed IP addresses into AS numbers
(using RouteViews
BGP data)
involves potential distortion due to IP prefixes advertised by:
- AS-sets (an aggregated set of ASes advertises the
prefix);
- multi-origin ASes, aka MOASes (several separate ASes
advertise the same prefix);
- no AS (some IP addresses appear in topology probes
but are not advertised by any AS).
The data files we provide here preserve all three effects listed
above and observed in actual measurements. The data analyst
must decide (and make clear in explanations) how to process the
described exceptions, e.g., indirect links may be either discarded
or counted as real links. The AS links data files also contain
information on the time of the probes and which skitter monitor
observed a particular link. The data file headers have further
details on the file format.
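The mapping step and its three exceptions can be illustrated with a minimal sketch. It assumes a hypothetical in-memory prefix table (the prefixes, AS numbers, and the labels "single", "moas", "as_set", and "none" are invented for illustration and are not the format of the data files) and does a simple longest-prefix match with Python's standard ipaddress module.

```python
import ipaddress

# Hypothetical prefix-to-origin table. Prefixes are documentation ranges and
# AS numbers are from the private/documentation range; none of this is real
# RouteViews data or the actual file format.
RAW_TABLE = {
    "192.0.2.0/24": ("single", [64500]),
    "192.0.2.128/25": ("single", [64505]),
    "198.51.100.0/24": ("moas", [64501, 64502]),    # several ASes originate the prefix
    "203.0.113.0/24": ("as_set", [64503, 64504]),   # prefix advertised by an AS-set
}


def build_table(raw):
    """Parse prefix strings into network objects once, up front."""
    return [(ipaddress.ip_network(prefix), origin) for prefix, origin in raw.items()]


def lookup(table, addr):
    """Return the origin of the longest matching prefix, or ("none", [])."""
    ip = ipaddress.ip_address(addr)
    best = None
    for net, origin in table:
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, origin)
    return best[1] if best else ("none", [])


TABLE = build_table(RAW_TABLE)
for address in ("192.0.2.1", "192.0.2.200", "198.51.100.7", "203.0.113.9", "10.0.0.1"):
    print(address, lookup(TABLE, address))
```

A linear scan is used here only for brevity; an analyst working with full BGP tables would more likely use a radix tree. How the "moas", "as_set", and "none" cases are then treated is the analyst's decision, as noted above.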
Notes and Caveats
-
Some skitter monitors take more than 24 hours
to execute one cycle of probing addresses in their destination
lists. Therefore, no single data file provides a complete view of
the topology observed by skitter since each data file is a result
of aggregation over a 24-hour period only. For ITDK,
we usually merge data for a period of approximately two to three weeks.
Such merging of the data files provided from this page can be performed
by the tool available in the download section
(e.g., use the script asadj2graph.pl with the "-u" and "-r" options).
-
Trace files from the new skdriver-based skitter (from late Jun 2004
onward) are not necessarily aligned on day boundaries. For example,
in the extreme case, if the cycle starting time is 11:59:59 PM on
Oct 1st, then the first trace file of the cycle will be named Oct 1st
even though almost all traces were taken on Oct 2nd.
Furthermore, different monitors start their cycles at different times.
Hence, the AS links file for a given day may be derived, in the worst
case, from a 48-hour period. So even though AS links files are
produced separately for each day, the granularity of the analysis
should not be assumed to be a single day.
-
A former bug in the generation scripts caused some daily AS links
files to include more days of data than the nominal single day. This
bug affected AS links files dated from late Jun 2004 to May 4, 2005,
which corresponds to about the first 11 months of the newer
skdriver-based skitter data set.
Around late Jun 2004, the naming of skitter files changed in response
to improvements in skitter itself. In the past, traces were stored
in daily files based on timestamp, but starting around Jun 2004,
traces began to be organized instead by "cycle" (that is, a single
pass through a destination list). A cycle has a starting time, which
is used to name the files containing the traces of that cycle. For
various reasons, the traces of a cycle are split into multiple files
corresponding to consecutive non-overlapping 24-hour periods beginning
on the cycle starting time. These files have a shared prefix
containing the date of the cycle starting time, and are differentiated
by a numeric suffix, starting with 000. For example, the following
are the three files making up the cycle starting on 20050927 for the
m-root monitor:
l006.m-root.20050927_000.arts
l006.m-root.20050927_001.arts
l006.m-root.20050927_002.arts
The file 20050927_001 corresponds to the second day of traces for this
cycle (that is, these traces were nominally collected on 20050928),
and similarly, 20050927_002 corresponds to the third day of traces
(that is, 20050929).
The bug caused the skitter AS links file for each day covered by a
cycle (that is, Sep 27th, 28th, and 29th for the above example)
to contain the AS links for all the days in the cycle. The actual
AS links themselves are undistorted, so the main problem caused by
this bug is a slight decrease in the granularity of the data analysis
(coarsening from a nominal single day to several days [about 6 days
in the worst case]).
-
The dates in AS links filenames are in UTC, matching the UTC dates
in skitter trace files.
-
The time zone of the dates in the filenames of RouteViews BGP
snapshots changed from Pacific Time to UTC on Mar 4, 2003. So
there is a slight mismatch between the dates of skitter files (UTC)
and BGP snapshots (Pacific Time) prior to this switchover in 2003,
although this shouldn't make much of a difference in the
appropriateness of the BGP snapshots selected for prefix-to-AS
mappings.
-
The first known link between two ASes is the only link
reported, so if both an indirect and a direct link between the
ASes were observed, and the indirect link was seen first, only
the indirect link between the ASes is reported; the direct
link is not reported.
-
The gap size reported is the first gap size observed for an
indirect link between two ASes. If an indirect link with a
smaller gap size is observed later, the smaller gap size is not
reported.
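Two of the caveats above can be made concrete with a short sketch. The first function derives the nominal collection date of a cycle trace file from its name, using a pattern inferred only from the three example file names shown earlier. The second mimics the "first observation wins" rule for links and gap sizes. The function names, the observation tuple layout, and the use of an ordered AS pair as the key are assumptions for illustration; they are not taken from the generation scripts or the data file format.

```python
import re
from datetime import datetime, timedelta

# Pattern inferred from names such as l006.m-root.20050927_001.arts:
# <prefix>.<monitor>.<cycle start date>_<3-digit part number>.arts
_TRACE_NAME = re.compile(
    r"^(?P<prefix>[^.]+)\.(?P<monitor>[^.]+)\.(?P<start>\d{8})_(?P<part>\d{3})\.arts$"
)


def nominal_date(filename):
    """Return the nominal date (YYYYMMDD) covered by one file of a cycle.

    Part 000 is the cycle start date, 001 the following day, and so on.
    The date is only nominal because a cycle may start late in the day.
    """
    match = _TRACE_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized trace file name: {filename}")
    start = datetime.strptime(match.group("start"), "%Y%m%d")
    day = start + timedelta(days=int(match.group("part")))
    return day.strftime("%Y%m%d")


def first_seen(observations):
    """Keep only the first observation for each AS pair.

    `observations` is a time-ordered iterable of hypothetical tuples
    (as_a, as_b, kind, gap). Later observations of the same pair are
    ignored, even if they are direct or have a smaller gap.
    """
    reported = {}
    for as_a, as_b, kind, gap in observations:
        reported.setdefault((as_a, as_b), (kind, gap))
    return reported


print(nominal_date("l006.m-root.20050927_001.arts"))

example = [
    (64500, 64501, "indirect", 2),
    (64500, 64501, "direct", 0),    # ignored: the pair was already reported
    (64500, 64501, "indirect", 1),  # ignored: smaller gap seen later
]
print(first_seen(example))
```

Under these assumptions, the pair (64500, 64501) is reported as an indirect link with gap size 2, which is why an analyst should not read the absence of a direct link in a file as evidence that none was observed.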
Data Use Terms and Conditions
Acceptable Use Policy for the files of the Skitter AS Links Dataset
- At the end of the research, or semi-annually (whichever
is more frequent), a summary of the research and any
findings/conclusions will be reported to CAIDA. If any
research is described on the WWW, a URL will be provided.
This information is primarily used in reports to our funding
agencies.
- Insofar as possible, research findings and conclusions
using the topology data will be published and/or made publicly
available.
- All users who publish a document (including web pages
and papers) using the topology data must provide
CAIDA with a copy of the publication.
-
All users who publish a document (including web pages and papers) using data
from this dataset must cite:
The Skitter AS Links Dataset - < dates
used >, Bradley Huffaker, Young Hyun, Dan Andersen, and
k claffy,
http://www.caida.org/data/active/skitter_aslinks_dataset.xml.
-
Users are encouraged, but not required, to include the following
attribution in the acknowledgments section of their document:
Support for the Skitter AS Links Dataset is provided by
DARPA, the National Science Foundation, the WIDE Project,
Cisco Systems, the US Department of Homeland Security, and
CAIDA Members.
-
All users who create a publicly available presentation using
data from this dataset must provide CAIDA with a copy of the
presentation and must use the full name of the dataset ("The
Skitter AS Links Dataset") in the presentation.
Users are further encouraged, but not required, to include
the URL for the dataset
(http://www.caida.org/data/active/skitter_aslinks_dataset.xml)
in their presentation.
AS Links Dataset Access
Access the archived Skitter AS Links Dataset (January 2000 - February 2008)
Access the current IPv4 Routed /24 AS Links Dataset (September 2007 - present)
Other Topology Datasets:
- Freely Available Datasets
- Restricted Access Datasets
References
For more information on topology measurements see:
The Skitter AS Links Dataset was sponsored by:
This dataset is cataloged in DatCat with handle
http://imdc.datcat.org/collection/1-000W-X=CAIDA+skitter+AS+Links+Topology.