Introduction
The primary goals of Archipelago (Ark) are to achieve
greater scalability and flexibility than our current measurement
infrastructure and to provide a step toward a community-oriented measurement infrastructure
by eventually allowing collaborators to run
their vetted measurement tasks on a security-hardened distributed
platform. Ark is tailored specifically for network measurement, which
allows it to be simpler and to more directly address the needs of
network researchers than is usually the case with a general-purpose
distributed experimental platform.
The initial and primary focus of Ark is to continue the large-scale
traceroute-based active measurements of the skitter infrastructure.
In both role and implementation, Ark subsumes the skitter
infrastructure and represents its natural evolution.
Ark will evolve from the skitter infrastructure through a gradual
process in which pieces of the skitter infrastructure are extended,
enhanced, or replaced.
Architecture
For details on the design and architecture of Ark, please see the
slides for our talk presented in Nov 2006 at the 7th CAIDA-WIDE
workshop. More details on the tuple space design and example usage
are available in the slides for our UCSD Syslunch talk given in
Apr 2007.
Implementation
Before discussing the implementation, we must first clear up the
confusion surrounding the term "skitter", which has the following
three distinct meanings:
- skitter is a measurement tool,
- skitter is an infrastructure, and
- skitter is a project.
The skitter project is more properly called the Macroscopic Topology
Project. Ark is the evolution of the skitter infrastructure, which
consists of the skitter monitors, skitter measurement tool, several
other tools, an internal web server for distributing destination
lists, and a file storage server for collecting traces from monitors
and providing the data for download via a public web server. The most
significant improvements made so far have been the replacement of the
communication component with a tuple space and the replacement of the
skitter measurement tool with scamper.
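
For readers unfamiliar with the term, a tuple space is a shared store in which processes coordinate by writing tuples and retrieving them with pattern-matching templates. The sketch below is a minimal in-process illustration of that general idea, not Ark's actual implementation. The class name, the method names (write, read, take), the use of None as a wildcard, and the example tuples are all assumptions made for illustration.

```python
import threading


class TupleSpace:
    """Minimal in-process Linda-style tuple space (illustrative only)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def write(self, tup):
        """Add a tuple to the space and wake any waiting readers."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    @staticmethod
    def _matches(template, tup):
        # None in a template acts as a wildcard for that field.
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup)
        )

    def _find(self, template):
        # Return the oldest matching tuple, or None if nothing matches.
        for tup in self._tuples:
            if self._matches(template, tup):
                return tup
        return None

    def read(self, template, timeout=None):
        """Return a matching tuple without removing it (None on timeout)."""
        with self._cond:
            self._cond.wait_for(lambda: self._find(template) is not None, timeout)
            return self._find(template)

    def take(self, template, timeout=None):
        """Remove and return a matching tuple (None on timeout)."""
        with self._cond:
            self._cond.wait_for(lambda: self._find(template) is not None, timeout)
            tup = self._find(template)
            if tup is not None:
                self._tuples.remove(tup)
            return tup


# Hypothetical usage: a coordinator posts probe requests, and a monitor
# takes only the request addressed to it.
space = TupleSpace()
space.write(("probe-request", "monitor-a", "192.0.2.1"))
space.write(("probe-request", "monitor-b", "198.51.100.7"))
task = space.take(("probe-request", "monitor-b", None))
```

In this sketch the writer and the reader never address each other directly. They share only the space and the shape of the tuples, which is the decoupling that makes a tuple space attractive as a communication component.
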
The skitter measurement tool is a
standalone program that reads a file of destinations and writes a file
of traceroute paths. We are discontinuing our use of the skitter tool
and switching to scamper. Scamper is an active
measurement tool like skitter but more powerful and flexible,
supporting IPv6 and ping measurements in addition to IPv4 and
traceroute measurements. Scamper also supports TCP- and UDP-based
traceroutes in addition to the ICMP-based traceroutes supported by
skitter. TCP-based traceroutes and other non-traditional techniques
are becoming more necessary as restrictive firewalls become more
common in edge networks. Scamper has been developed over several
years by our collaborator Matthew Luckie at the University of Waikato.
Scamper outputs traces in a different file format than skitter, but a
tool is available that reads both skitter-format (arts++) and
scamper-format (warts) files and outputs data in a textual format
already familiar to researchers who analyze skitter files. In
addition, a programming library is available for reading
and writing the scamper format for researchers wishing to write their
own analysis tools. For these reasons, we expect the switch to the
new data format will have minimal impact on researchers. Source code
for scamper and tools for analyzing warts files are available at the
scamper home page. We have also made a sample warts file available
for the benefit of researchers.
Data
We have been performing large-scale traceroute measurements with Ark
since September 2007. More information about the data, including how
to obtain the data, is available at the
IPv4 Routed /24 Topology Dataset page.
Active Probes
Has your computer or IP address received a probe from a CAIDA host? Learn more about the probes sent by CAIDA for these
experiments.
Questions about Ark?
Please send questions or comments regarding Ark to
ark-info@caida.org.