Archipelago (Ark for short) is CAIDA's next-generation active measurement infrastructure and represents an evolution of the skitter infrastructure that has been serving the network research community for over ten years.
Introduction
Archipelago (Ark) is CAIDA's newest active measurement infrastructure, the next generation of the skitter infrastructure that CAIDA operated for nearly a decade (what is skitter and how is Ark different from skitter?). Its primary goals are to
- reduce the effort needed to develop and deploy sophisticated large-scale measurements, and
- provide a step toward a community-oriented measurement infrastructure by allowing collaborators to run their vetted measurement tasks on a security-hardened distributed platform.
Ark is tailored specifically for active network measurement. This allows Ark to be simpler than some other general-purpose distributed experimental platforms, and it allows us to concentrate on providing facilities that directly address the needs of networking research. In particular, we provide a facility for communication and coordination that makes it easier to write distributed measurements that must work together to achieve a goal. We are working on providing a high-level API to ease the challenges of writing measurement tools. Our goal is to lower the barrier to bringing novel and interesting measurement techniques to life.
To enable measurements requiring accurate time synchronization, we are working with Julien Ridoux and Darryl Veitch to deploy RADclock on Ark monitors. As of July 2011, we have deployed RADclock on 28 Ark monitors as well as on several Ark servers. We believe RADclock is the best available solution for highly precise time synchronization over the Internet.
Ark Raspberry Pi-based Network Monitor
We are now deploying small, inexpensive network measurement nodes, based on the Raspberry Pi, in the Ark measurement infrastructure. Although tiny, a Raspberry Pi is as capable as a desktop system of several generations ago and offers a flexible, Linux-powered, programmable platform for conducting networking research. These systems can be placed anywhere that is convenient for a hosting site, including on someone's desk, and the transition from traditional rack-mounted servers to Raspberry Pis will allow us to scale up the Ark infrastructure.
Current Measurements
Dataset quick links:
- IPv4 Routed /24 Topology Dataset
- IPv4 Routed /24 AS Links Dataset
- IPv6 Topology Dataset
- Macroscopic Internet Topology Data Kit (ITDK)
We perform traceroute measurements using scamper, a powerful and flexible active measurement tool supporting IPv4, IPv6, traceroute, and ping. Scamper supports TCP-, UDP-, and ICMP-based measurements and Paris traceroute variations. Scamper has been in development for several years by our collaborator Matthew Luckie at the University of Waikato.
We distribute the results of these measurements as the IPv4 Routed /24 Topology Dataset. These measurements have been ongoing since September 2007, and as of Jan 2011, we have collected 10.1 billion traceroutes and 4.0 TB of data.
We augment the Routed /24 Topology Dataset with automated lookups of DNS names. We have an in-house bulk DNS lookup service called HostDB that can look up millions of addresses per day. We look up all intermediate addresses and responding destinations seen in the Topology Dataset.
Finally, we provide the IPv4 Routed /24 AS Links Dataset, which contains Autonomous System (AS) links derived from the IP paths of the Topology Dataset. This AS links dataset is useful for studying the peering relationships between Internet bit transport providers.
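As a rough illustration of how AS links can be derived from an IP path, the sketch below maps each hop to an AS by longest-prefix match and records a link wherever two consecutive responding hops fall in different ASes. The prefix-to-AS table, the function names, and the rule of not inferring a link across an unresponsive hop are all assumptions made for this example; they are not a description of how the AS Links Dataset is actually produced.

```python
import ipaddress

# Hypothetical prefix-to-AS table (documentation prefixes, private-use AS numbers).
PREFIX_TO_AS = {
    "192.0.2.0/24": 64500,
    "198.51.100.0/24": 64501,
    "203.0.113.0/24": 64502,
    "203.0.113.128/25": 64503,
}
NETWORKS = [(ipaddress.ip_network(p), asn) for p, asn in PREFIX_TO_AS.items()]


def ip_to_as(addr):
    """Map an address to an AS number by longest-prefix match, or None if unmapped."""
    ip = ipaddress.ip_address(addr)
    best = None
    for net, asn in NETWORKS:
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, asn)
    return best[1] if best else None


def as_links(ip_path):
    """Return the set of (AS, AS) links between consecutive responding hops."""
    links = set()
    prev = None
    for hop in ip_path:
        asn = ip_to_as(hop) if hop else None
        if asn is None:
            # Unresponsive or unmapped hop: this sketch does not infer a link across the gap.
            prev = None
            continue
        if prev is not None and asn != prev:
            links.add((prev, asn))
        prev = asn
    return links


# One traceroute path; None stands for a hop that did not respond.
path = ["192.0.2.1", "192.0.2.9", "198.51.100.4", None, "203.0.113.5", "203.0.113.200"]
links = as_links(path)
print(sorted(links))
```

A production pipeline would take its prefix-to-AS mapping from BGP routing data and would need a policy for gaps and for addresses that map to more than one AS.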
We are working on combining multiple alias-resolution techniques (Mercator, APAR, and MIDAR) into a unified tool and system for generating router-level topology from the IP Topology Dataset. We distribute these router-level topologies in the Macroscopic Internet Topology Data Kit (ITDK).
We also collect IPv6 topology from 27 Ark monitors (as of July 2011).
The Spoofer Project
Ark monitors participating in the Spoofer Project gather data on IP spoofing by receiving potentially spoofed traffic and forwarding it to the Spoofer Project's server at MIT for analysis. Ark hosting sites interested in participating as receivers need to agree to the Acceptable Use Policy (AUP) for the Spoofer Project.
Tuple Space
One of the distinguishing features of Ark is its focus on coordination. Coordination, broadly speaking, is concerned with planning, executing, and controlling an ensemble of distributed computations. Coordination is the meta-activity that surrounds a computation.
To facilitate coordination, Ark provides a new implementation, called Marinda, of the well-known tuple-space coordination model first introduced by David Gelernter in his Linda coordination language. A tuple space is a distributed shared memory combined with a small number of easy-to-use operations. The tuple space stores tuples, which are arrays of simple values (strings and numbers). Clients retrieve tuples by pattern matching.
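To make the model concrete, here is a minimal single-process sketch of a tuple space with pattern-matched retrieval. It is a toy written for this page, not Marinda's API: the class name, the `write`/`read`/`take` method names, and the `ANY` wildcard are assumptions chosen for illustration, and a real tuple space would be shared across the network.

```python
from typing import Any, List, Optional, Tuple

ANY = object()  # wildcard marker for templates (a name chosen for this sketch)


class TupleSpace:
    """Toy in-memory tuple space; illustrative only."""

    def __init__(self) -> None:
        self._tuples: List[Tuple[Any, ...]] = []

    def write(self, tup: Tuple[Any, ...]) -> None:
        """Store a tuple of simple values."""
        self._tuples.append(tuple(tup))

    @staticmethod
    def _matches(template: Tuple[Any, ...], tup: Tuple[Any, ...]) -> bool:
        # A template matches a tuple of the same length whose fields are
        # equal wherever the template does not hold the wildcard.
        return len(template) == len(tup) and all(
            t is ANY or t == v for t, v in zip(template, tup)
        )

    def read(self, template: Tuple[Any, ...]) -> Optional[Tuple[Any, ...]]:
        """Return the first matching tuple without removing it, or None."""
        for tup in self._tuples:
            if self._matches(template, tup):
                return tup
        return None

    def take(self, template: Tuple[Any, ...]) -> Optional[Tuple[Any, ...]]:
        """Remove and return the first matching tuple, or None."""
        for i, tup in enumerate(self._tuples):
            if self._matches(template, tup):
                return self._tuples.pop(i)
        return None


space = TupleSpace()
space.write(("rtt", "monitor-a", "192.0.2.1", 12.5))
space.write(("rtt", "monitor-b", "192.0.2.1", 48.0))

first = space.read(("rtt", ANY, "192.0.2.1", ANY))      # non-destructive lookup
taken = space.take(("rtt", "monitor-b", ANY, ANY))      # removes the tuple
remaining = space.read(("rtt", "monitor-b", ANY, ANY))  # nothing left to match
```

The distinction between `read` and `take` is what lets the same store serve both as shared state (many readers) and as a work queue (exactly one taker per tuple).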
The tuple space is a many-to-many communication and coordination medium. Over this medium, measurement clients can interact in sophisticated ways, such as exchanging state and triggering actions among monitors. The tuple space abstraction leads to a peer-to-peer architecture, in which participants can be both a client and a server seamlessly. For example, it is simple to write a traceroute service that takes requests and sends responses over the tuple space. We can then layer on top of these traceroute services clients that trigger traceroutes when certain conditions are met. By lowering the barrier to writing and deploying services to just a few lines of code, the tuple space abstraction allows a rich ecosystem of measurement services to thrive, in the same way that HTML empowered users by allowing anyone to become a publisher on the Internet.
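The traceroute-service pattern described above can be sketched as follows. This is a single-process approximation under stated assumptions: the `BlockingTupleSpace` class, the `"trace-request"`/`"trace-response"` tuple layouts, and `fake_traceroute` (which returns a made-up hop list instead of probing) are all hypothetical and do not reflect Marinda's interface or any Ark service.

```python
import threading

ANY = object()  # wildcard marker for templates (a name chosen for this sketch)


class BlockingTupleSpace:
    """Toy thread-safe tuple space whose take() waits for a match."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def write(self, tup):
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()  # wake any client waiting for a match

    def _find(self, template):
        for i, tup in enumerate(self._tuples):
            if len(tup) == len(template) and all(
                t is ANY or t == v for t, v in zip(template, tup)
            ):
                return i
        return None

    def take(self, template, timeout=5.0):
        """Remove and return the first matching tuple, waiting up to timeout seconds."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._find(template) is not None, timeout):
                raise TimeoutError("no matching tuple")
            return self._tuples.pop(self._find(template))


def fake_traceroute(dest):
    """Placeholder for a real measurement; returns a made-up hop list."""
    return ["198.51.100.1", "203.0.113.7", dest]


def traceroute_service(space, n_requests):
    """Serve n_requests traceroute requests posted to the tuple space."""
    for _ in range(n_requests):
        _, req_id, dest = space.take(("trace-request", ANY, ANY))
        hops = ",".join(fake_traceroute(dest))  # tuples hold simple values only
        space.write(("trace-response", req_id, hops))


space = BlockingTupleSpace()
worker = threading.Thread(target=traceroute_service, args=(space, 1))
worker.start()

# The client side: post a request, then wait for the response with the same id.
space.write(("trace-request", 1, "192.0.2.1"))
_, _, hops = space.take(("trace-response", 1, ANY))
worker.join()
print(hops)
```

Neither side holds a reference to the other; they agree only on the shape of the tuples, which is why a participant can act as a client of one service and a server for another at the same time.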
For more information, see the list of coordination references below.
Future Plans
- We will release the source code of the Marinda tuple space implementation under the GPL.
- We will continue implementing the Ark infrastructure software, including a high-level API for performing network measurements and the security layers needed to allow semi-trusted third parties to conduct measurements.
Presentations
- [Feb 2012] Archipelago: On-Demand IPv4 and IPv6 Topology Measurements
- [Feb 2011] Archipelago: Updates
- [Feb 2010] Archipelago: Updates and Case Study
- [Feb 2009] Archipelago: update and analyses
- [Aug 2008] Archipelago Measurement Infrastructure: Status and Experiences
- [Apr 2007] Archipelago: A Coordination-Oriented Measurement Infrastructure (details on tuple space)
- [Nov 2006] The Archipelago Measurement Infrastructure (design details)
References
- David Gelernter. Generative communication in Linda. ACM Trans. Program. Lang. Syst., 7(1):80-112, 1985.
- David Gelernter and Nicholas Carriero. Coordination languages and their significance. Commun. ACM, 35(2):97-107, 1992.
- Sascha Ossowski and Ronaldo Menezes. On coordination and its significance to distributed and multi-agent systems. Concurrency and Computation: Practice and Experience, 18(4):359-370, 2006.
- Nicholas Carriero and David Gelernter. How to write parallel programs: a first course. MIT Press, Cambridge, MA, 1990.
Has your computer received a probe from an Ark monitor?
Learn more about the probes sent by CAIDA for these experiments.
Hosting an Ark Monitor
For sites that might be able to participate in the Ark project, we've prepared a list of Frequently Asked Questions about hosting an Ark monitor.
For an overview of how hosting an Ark monitor can help, we've prepared a flyer, "Why should my network host an Ark node?". A formal letter explaining the Ark project is also available.
Questions about Ark?
Please send questions or comments regarding Ark to ark-info@caida.org.
Size comparison of the Ark Raspberry Pi-based Network Monitor