The Cooperative Association for Internet Data Analysis
Archipelago Measurement Infrastructure
Archipelago (Ark for short) is CAIDA's next-generation active measurement infrastructure and represents an evolution of the skitter infrastructure that has been serving the network research community for over ten years.


Introduction

Archipelago (Ark) is CAIDA's newest active measurement infrastructure, the next generation in the evolution of the skitter infrastructure that CAIDA operated for nearly a decade. The primary goals are to

  • reduce the effort needed to develop and deploy sophisticated large-scale measurements, and
  • provide a step toward a community-oriented measurement infrastructure by allowing collaborators to run their vetted measurement tasks on a security-hardened distributed platform.

Ark is tailored specifically for active network measurement. This allows Ark to be simpler than some other general-purpose distributed experimental platforms, and it allows us to concentrate on providing facilities that directly address the needs of networking research. In particular, we provide a facility for communication and coordination that makes it easier to write distributed measurements that must work together to achieve a goal. We are working on providing a high-level API to ease the challenges of writing measurement tools. Our goal is to lower the barrier to bringing novel and interesting measurement techniques to life.

To enable measurements requiring accurate time synchronization, we are working with Julien Ridoux and Darryl Veitch to deploy RADclock on Ark monitors. As of July 2011, we have deployed RADclock on 28 Ark monitors as well as on several Ark servers. We believe RADclock is the best available solution for highly precise time synchronization over the Internet.

Ark Raspberry Pi-based Network Monitor

We are now deploying small, inexpensive network measurement nodes, based on the Raspberry Pi, in the Ark measurement infrastructure. Although tiny, a Raspberry Pi is as capable as a desktop system of several generations ago and offers a flexible Linux-powered programmable platform for conducting networking research. These systems can be placed anywhere that is convenient for a hosting site, including on someone's desk, and the transition from deploying traditional rack-mounted servers to Raspberry Pis will allow us to scale up the Ark infrastructure.

Current Measurements

The initial focus of Ark is coordinated large-scale traceroute-based topology measurements using a process called team probing. In team probing, we group monitors into teams and dynamically divide up the measurement work among team members. This parallelization allows us to obtain a traceroute measurement to all routed /24's in a short period of time: about 2-3 days for a team of 17-18 monitors probing 9.5 million /24's (that is, the full routed address space subdivided into /24's) at 100pps. We currently have three teams active, and each team probes independently.
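The dynamic division of work can be pictured as team members pulling targets from a shared queue. The Python sketch below is an illustration only, not Ark's implementation: the function names, the turn-taking pull order, the choice of a random address within each /24, and the reading of 100 pps as a per-monitor rate are all assumptions.

```python
import ipaddress
import random
from collections import deque


def pick_destination(prefix, rng):
    """Pick one host address inside a /24 (assumed: uniformly at random)."""
    net = ipaddress.ip_network(prefix)
    return str(net.network_address + rng.randint(1, 254))


def team_probe(prefixes, monitors, seed=0):
    """Assign each /24 to exactly one team member.

    Monitors pull the next prefix from a shared queue, so the team as a
    whole covers the prefix list once. Here they simply take turns; in a
    real deployment a faster monitor would pull more often.
    """
    rng = random.Random(seed)
    queue = deque(prefixes)
    assignments = {m: [] for m in monitors}
    while queue:
        for monitor in monitors:
            if not queue:
                break
            prefix = queue.popleft()
            assignments[monitor].append((prefix, pick_destination(prefix, rng)))
    return assignments


def cycle_days(num_prefixes, team_size, pps, packets_per_trace):
    """Rough cycle time in days, assuming pps is a per-monitor rate."""
    seconds = num_prefixes * packets_per_trace / (team_size * pps)
    return seconds / 86400
```

With the figures quoted above and an assumed 40 packets per traceroute (a hypothetical value, not a measured one), `cycle_days(9_500_000, 17, 100, 40)` comes out between 2 and 3 days, which is in line with the stated cycle time.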

We perform traceroute measurements using scamper, a powerful and flexible active measurement tool supporting IPv4, IPv6, traceroute, and ping. Scamper supports TCP-, UDP-, and ICMP-based measurements and Paris traceroute variations. Scamper has been in development for several years by our collaborator Matthew Luckie at the University of Waikato.

We distribute the results of these measurements as the IPv4 Routed /24 Topology Dataset. These measurements have been ongoing since September 2007, and as of Jan 2011, we have collected 10.1 billion traceroutes and 4.0 TB of data.

We augment the Routed /24 Topology Dataset with automated lookups of DNS names. We have an in-house bulk DNS lookup service called HostDB that can look up millions of addresses per day. We look up all intermediate addresses and responding destinations seen in the Topology Dataset.

Finally, we provide the IPv4 Routed /24 AS Links Dataset, which contains Autonomous System (AS) links derived from the IP paths of the Topology Dataset. This AS links dataset is useful for studying the peering relationships between Internet bit transport providers.
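Deriving AS links from an IP path amounts to mapping each hop to an AS and recording adjacent pairs of distinct ASes. The following is a minimal sketch under stated assumptions, not a description of how the dataset is actually produced: the prefix-to-AS table is hypothetical (it uses documentation prefixes and AS numbers), and the decision to infer no link across unresponsive or unmapped hops is a simplification chosen for the example.

```python
import ipaddress

# Hypothetical prefix-to-AS table (documentation prefixes and AS numbers).
PREFIX_TO_AS = {
    "192.0.2.0/24": 64496,
    "198.51.100.0/24": 64497,
    "203.0.113.0/24": 64498,
}


def lookup_as(addr, table=PREFIX_TO_AS):
    """Return the AS of the longest matching prefix, or None if unmapped."""
    ip = ipaddress.ip_address(addr)
    best = None
    for prefix, asn in table.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, asn)
    return best[1] if best else None


def as_links(ip_path):
    """Collect AS links from consecutive hops that map to different ASes.

    Hops given as None (no response) or with no mapping break adjacency,
    so no link is inferred across them.
    """
    links = set()
    prev = None
    for hop in ip_path:
        cur = lookup_as(hop) if hop is not None else None
        if prev is not None and cur is not None and prev != cur:
            links.add((prev, cur))
        prev = cur
    return links
```

For example, `as_links(["192.0.2.1", "198.51.100.1", "203.0.113.5"])` yields the two links between the three hypothetical ASes, while a `None` hop in the middle of a path suppresses the link that would otherwise span it.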

We are working on combining multiple alias-resolution techniques (Mercator, APAR, and MIDAR) into a unified tool and system for generating router-level topology from the IP Topology Dataset. We distribute these router-level topologies in the Macroscopic Internet Topology Data Kit (ITDK).

We also collect IPv6 topology from 27 Ark monitors (as of July 2011).

The Spoofer Project

In collaboration with the Naval Postgraduate School (NPS) and the Massachusetts Institute of Technology (MIT), Ark monitors participating in the Spoofer Project help measure the Internet's susceptibility to spoofed source address IP packets. The monitors gather data on IP spoofing by receiving potentially spoofed traffic and forwarding it to the Spoofer Project's server at MIT for analysis. Ark hosting sites interested in participating as receivers need to agree to the Acceptable Use Policy (AUP) for the Spoofer Project.

Hosted Experiments

  1. MERLIN: MEasure the Router Level of the INternet: In 2011, researchers at the Université de Strasbourg, the Université catholique de Louvain, and the University of Waikato ran alias resolution experiments with the probing tool MERLIN. Run from an Ark monitor in San Diego as a vantage point, MERLIN takes advantage of mrinfo, a multicast management tool that silently collects all IPv4 multicast-enabled interfaces of a router and all its multicast links toward its neighbors. The group also used 1.2 million IP addresses, sourced from CAIDA's traceroute measurements conducted on Ark, as destination addresses in measurements to discover MPLS and fingerprint networks on the Internet. A paper describing the results of the experiment was published in the proceedings of the Conference on Next Generation Internet (NGI 2011).

  2. IPv6 Topology Discovery Techniques: Using Ark's topo-on-demand interface, researchers at the Naval Postgraduate School run experiments to conduct IPv6 topology discovery. The vast size of the IPv6 address space (2^128 addresses) makes random or sequential approaches impractical. These experiments target practical methods of dividing the address space to discover active subnets. A paper describing the results of these experiments was presented at the Passive and Active Network Measurement Workshop (PAM 2013).

  3. Vela: Web Interface to Ark (multiple accounts): The Vela service provides access to Ark's topo-on-demand functionality. Organizations including the Naval Postgraduate School, the Department of Homeland Security, and the Réseaux IP Européens Network Coordination Centre (RIPE NCC) access the Ark platform via the Vela web interface to run "one-off" measurements. The interface allows users to select a subset of monitors (e.g., all Asian monitors, or one Ark monitor from each continent with IPv6 connectivity) and run ping or traceroute measurements from them.

  4. RIPE NCC World IPv6 Day Measurements:

    The RIPE NCC performed ongoing measurements related to World IPv6 Day, which took place on 8 June 2011. As a part of this effort, they monitored many of the participating websites from a number of vantage points:

    • Nodes of the RIPE NCC Test Traffic Measurements platform;
    • Nodes of the Archipelago Measurement Infrastructure, kindly contributed by CAIDA; and
    • Nodes contributed by partners and other infrastructures, such as RIPE Atlas.

    From these vantage points, our monitoring effort periodically checked:

    • DNS entries -- to check whether the World IPv6 Day participants had A/AAAA records;
    • RTT distances using ping/ping6;
    • Forward path discovery using traceroute and traceroute6; and
    • HTTP page fetches -- over IPv4 and IPv6 when available.

    For this event, we measured 60 participants from 53 vantage points, and executed 6,651,575 measurements between June 1-12, 2011.

  5. Load balancer turnover and packet field sensitivity: Researchers with the WAND research group at the University of Waikato ran an experiment over approximately eight weeks to carry out traceroutes using the Multipath Discovery Algorithm (MDA) in TCP source port, UDP source port, and ICMP echo modes. The traces take several measurements per destination and investigate whether fields outside the standard flow 5-tuple influence the choice of successor in forwarding decisions. The experiment also studies the efficiency of MDA analysis under different modes of flow ID selection.

    The target addresses were derived from prefixes provided by the Route Views Project. A selection of 400,000 pingable addresses was used to create smaller, randomly selected sets of 70,000 addresses thought to be mid-path router interfaces. Publication of the experiment is still in progress.
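To give a sense of scale for the IPv6 topology discovery experiment (item 2 above), the short calculation below estimates how long an exhaustive scan would take. It is a back-of-the-envelope sketch: the probing rate of one million probes per second is a hypothetical figure chosen for illustration, and probing one address per /64 subnet is an assumed simplification.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_to_scan(num_targets, probes_per_second):
    """Years needed to send one probe to every target at a fixed rate."""
    return num_targets / probes_per_second / SECONDS_PER_YEAR


# Hypothetical rate: one million probes per second.
RATE = 1_000_000

all_addresses = 2 ** 128  # every IPv6 address
all_slash64s = 2 ** 64    # one probe per /64 subnet

print(f"every address: {years_to_scan(all_addresses, RATE):.3g} years")
print(f"every /64:     {years_to_scan(all_slash64s, RATE):.3g} years")
```

Even the reduced problem of one probe per /64 takes more than half a million years at this rate, which is why the experiments focus on ways of dividing the address space rather than enumerating it.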

Tuple Space

One of the distinguishing features of Ark is its focus on coordination. Coordination, broadly speaking, is concerned with planning, executing, and controlling an ensemble of distributed computations. Coordination is the meta-activity that surrounds a computation.

To facilitate coordination, Ark provides a new implementation, called Marinda, of the well-known tuple-space coordination model first introduced by David Gelernter in his Linda coordination language. A tuple space is a distributed shared memory combined with a small number of easy-to-use operations. The tuple space stores tuples, which are arrays of simple values (strings and numbers). Clients retrieve tuples by pattern matching.
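The tuple-space model can be illustrated with a toy, single-process version in Python. This sketch is not Marinda and does not reflect its API: the class name, the operation names `write`, `read`, and `take` (loosely modeled on Linda's `out`, `rd`, and `in`), and the use of `None` as a wildcard in templates are all assumptions made for the example.

```python
import threading


class TupleSpace:
    """Toy single-process tuple space (illustration only, not Marinda)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    @staticmethod
    def _matches(template, tup):
        # None in the template is a wildcard; other fields must be equal.
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup)
        )

    def _find(self, template):
        for tup in self._tuples:
            if self._matches(template, tup):
                return tup
        return None

    def write(self, tup):
        """Add a tuple and wake up any waiting readers."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    def read(self, template, timeout=None):
        """Return a matching tuple without removing it (None on timeout)."""
        with self._cond:
            self._cond.wait_for(lambda: self._find(template) is not None, timeout)
            return self._find(template)

    def take(self, template, timeout=None):
        """Remove and return a matching tuple (None on timeout)."""
        with self._cond:
            self._cond.wait_for(lambda: self._find(template) is not None, timeout)
            tup = self._find(template)
            if tup is not None:
                self._tuples.remove(tup)
            return tup
```

A client could store a result with `ts.write(("RESULT", "192.0.2.1", 12.5))` and another could retrieve it by pattern with `ts.take(("RESULT", None, None))`, without either side knowing who the other is.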

The tuple space is a many-to-many communication and coordination medium. Over this medium, measurement clients can interact in sophisticated ways, such as exchanging state and triggering actions among monitors. The tuple space abstraction leads to a peer-to-peer architecture, in which participants can be both a client and a server seamlessly. For example, it is simple to write a traceroute service that takes requests and sends responses over the tuple space. We can then layer on top of these traceroute services clients that trigger traceroutes when certain conditions are met. By lowering the barrier to writing and deploying services to just a few lines of code, the tuple space abstraction allows a rich ecosystem of measurement services to thrive, in the same way that HTML empowered users by allowing anyone to become a publisher on the Internet.
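The traceroute service example can be sketched as a worker that takes request tuples and writes response tuples. This is a hedged illustration, not Ark code: the small `TupleSpace` class stands in for a real tuple space, the tuple layouts (`"TRACE_REQ"`, `"TRACE_RESP"`), the `STOP` sentinel, and the function names are hypothetical, and `stub_traceroute` returns made-up hops instead of running a measurement.

```python
import threading

STOP = "STOP"  # hypothetical sentinel destination that shuts the service down


class TupleSpace:
    """Minimal blocking tuple space; None in a template is a wildcard."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def write(self, tup):
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    def take(self, template):
        """Block until a tuple matches the template, then remove and return it."""
        def find():
            for tup in self._tuples:
                if len(tup) == len(template) and all(
                    t is None or t == v for t, v in zip(template, tup)
                ):
                    return tup
            return None

        with self._cond:
            self._cond.wait_for(lambda: find() is not None)
            tup = find()
            self._tuples.remove(tup)
            return tup


def stub_traceroute(dest):
    """Placeholder for a real measurement; returns a made-up hop list."""
    return ["hop1.example", "hop2.example", dest]


def traceroute_service(ts):
    """Answer ("TRACE_REQ", id, dest) tuples until the STOP sentinel arrives."""
    while True:
        _, req_id, dest = ts.take(("TRACE_REQ", None, None))
        if dest == STOP:
            break
        # Tuples hold simple values, so the hops are joined into one string.
        ts.write(("TRACE_RESP", req_id, ",".join(stub_traceroute(dest))))


def request_trace(ts, req_id, dest):
    """Client side: post a request, then wait for the matching response."""
    ts.write(("TRACE_REQ", req_id, dest))
    _, _, hops = ts.take(("TRACE_RESP", req_id, None))
    return hops.split(",")


def demo():
    ts = TupleSpace()
    worker = threading.Thread(target=traceroute_service, args=(ts,))
    worker.start()
    hops = request_trace(ts, 1, "192.0.2.1")
    ts.write(("TRACE_REQ", 0, STOP))
    worker.join()
    return hops
```

The client and the service never address each other directly; they only agree on tuple layouts, so a process that triggers traceroutes when some condition is met could be layered on top by calling `request_trace`.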

For more information, see the list of coordination references below.

Future Plans

  • We will release the source code of the Marinda tuple space implementation under the GPL.
  • We will continue implementing the Ark infrastructure software, including a high-level API for performing network measurements and the security layers needed to allow semi-trusted third parties to conduct measurements.

Presentations

References

Has your computer received a probe from an Ark monitor?

Learn more about the probes sent by CAIDA for these experiments.

Hosting an Ark Monitor

For those who might be able to participate in the Ark project, we've prepared a Frequently Asked Questions for sites interested in hosting an Ark monitor.

For an overview of how hosting an Ark monitor can help, we've prepared a flyer for a general review, "Why should my network host an Ark node?". A formal letter explaining the Ark project is also available.

Questions about Ark?

Please send questions or comments regarding Ark to ark-info@caida.org.

  Last Modified: Wed Nov-6-2013 17:25:56 PST
  Page URL: http://www.caida.org/projects/ark/index.xml