Archipelago (Ark): CAIDA's active measurement infrastructure serving the network research community since 2007. This page contains information useful to sites interested in hosting an Archipelago node.
Questions about Archipelago (Ark)?
Please send questions or comments regarding Ark to firstname.lastname@example.org.
Institutions interested in hosting an Ark node should take a look at our brochure "Why should my network host an Ark node?", which details the benefits that hosting an Ark node can provide to the local network as well as to the Ark project.
For those who need more background in a more formal format, we have prepared a document explaining the Ark project on official University letterhead.
A site wishing to host an Ark monitor should first review and approve the Memorandum of Cooperation (MOC), which outlines the obligations and expectations of both CAIDA and the hosting site. If needed, CAIDA can work with a prospective site to craft and implement a custom MOC that allows our monitor to conduct measurements within the bounds of a site's local policies.
The Spoofer project is studying the prevalence of IP source-address filtering (BCP 38) among networks attached to the Internet. A hosting site wishing to allow their hosted Ark monitor to receive (not generate) Spoofer measurement traffic should first review and approve the AUP for Spoofer measurements.
We are now deploying small, inexpensive network measurement nodes based on the Raspberry Pi. A Raspberry Pi consumes under 3 watts of power and draws around 700mA. No special cooling is required. These systems can be placed anywhere that is convenient for a hosting site, including on someone's desk.
Can we provide a spare 1U server for CAIDA's use instead of receiving a Raspberry Pi? How about a virtual server?
In the past, we have deployed monitors on 1U servers, including under virtualization, but we now prefer to use Raspberry Pis. If, for whatever reason, a hosting site cannot deploy a Raspberry Pi, we may consider using a traditional server (possibly under virtualization) if the hosting location would particularly increase the topological/geographical diversity of available vantage points.
We currently run traceroute measurements to every routed /24 prefix at 100pps, for about 35kbps of outgoing traffic. This produces about 5MB of trace data per hour, which we download to a CAIDA host concurrently with the measurements. These figures are typical of the bandwidth requirements of an active measurement. We might run a few measurements concurrently, each using about that much bandwidth (more likely less). We do not plan to host any services (web, content, or distributed hash table) that could generate a lot of traffic, nor will we run any high-volume bandwidth measurements (à la iperf), since we want to avoid generating complaints from recipients of measurement traffic.
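The figures above can be cross-checked with a little arithmetic. The following Python sketch assumes decimal units (1 kbps = 1,000 bit/s, 1 MB = 10^6 bytes); the function name `bandwidth_summary` is hypothetical and not part of any Ark tooling.

```python
def bandwidth_summary(probe_pps: float, outgoing_kbps: float,
                      trace_mb_per_hour: float) -> dict:
    """Derive per-probe and per-hour figures from the stated rates.

    Assumes decimal units: 1 kbps = 1,000 bit/s and 1 MB = 10**6 bytes.
    """
    bytes_per_sec_out = outgoing_kbps * 1000 / 8
    return {
        # Average bytes on the wire per outgoing probe packet.
        "avg_bytes_per_probe": bytes_per_sec_out / probe_pps,
        # Total outgoing probe traffic over one hour.
        "outgoing_mb_per_hour": bytes_per_sec_out * 3600 / 1e6,
        # Number of probes sent per hour at the given rate.
        "probes_per_hour": probe_pps * 3600,
        # Bandwidth needed to stream the trace data back to CAIDA.
        "trace_upload_kbps": trace_mb_per_hour * 1e6 * 8 / 3600 / 1000,
    }


summary = bandwidth_summary(probe_pps=100, outgoing_kbps=35, trace_mb_per_hour=5)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With the stated numbers, the implied average is under 44 bytes per probe, and 5MB of trace data per hour corresponds to roughly 11kbps when streamed back to CAIDA.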
Our goal is for Ark monitors to be used for a wide variety of measurements, and the set of measurements will evolve over time. However, current measurements are about Internet topology and, therefore, they employ similar types of low-level probe packets and receive similar types of responses even though the exact details and goals of the measurements may differ. We provide below a list of these low-level probe and response packets in the form of firewall-like rules. We will contact each hosting site to request permission to conduct any measurements that are significantly different from these current measurements (for example, Spoofer measurements).

Current measurement traffic consists mostly of outgoing topology probe packets (e.g., ICMP echo request) and their expected responses (e.g., ICMP echo reply). We also need to perform traceroute and ping measurements to Ark monitors themselves, so a firewall should freely allow ICMP request/response traffic in both incoming and outgoing directions (that is, for both measurements to and from monitors).
In addition to measurement traffic, an Ark monitor will need to open a TCP connection to a central server at CAIDA (this will always be an outgoing connection from the Ark box to CAIDA's server), and a monitor will need to allow incoming SSH connections from CAIDA's /24 prefix (but from nowhere else). To increase security, we generally do not run any network service other than SSH on an Ark box.
The following is a summary of the expected traffic in firewall-like rules:
Outgoing:
- NTP (123/udp) to your local NTP server, to CAIDA's NTP server, or to the nearest NTP Pool server
- DNS (53/udp) to your local DNS server(s) or to CAIDA's DNS server
- TCP connection to CAIDA's tuple-space server from any local (ephemeral) port (for Ark's tuple space communication)
- ICMP echo request, echo reply, port unreachable to any host (for ICMP-based topology measurements to and from the monitor)
- no ICMP rate limiting
- UDP probes from any local port to any host and any port (for UDP-based topology measurements from the monitor)
- TCP probes from any local port to any host and any port regardless of connection state (for non-SYN based TCP measurements such as sending a TCP ACK probe, which won't establish a connection nor be part of an existing connection)
Incoming:
- NTP and DNS responses
- SSH (22/tcp) from only CAIDA's /24 prefix
- ICMP echo request, echo reply, time exceeded, and destination unreachable (type 3, code any) from any host
- no ICMP rate limiting
- TCP packets (SYN, ACK, RST, etc.) from any host and any port regardless of connection state (for TCP-based topology measurements)
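As an illustration only, the incoming half of these rules can be expressed as a small stateless packet classifier. This is a hypothetical Python sketch, not CAIDA's configuration: CAIDA's actual /24 is not given here, so the documentation prefix 192.0.2.0/24 stands in for it, and NTP/DNS responses are matched by source port rather than tracked as connection state, as a real firewall would more likely do.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for CAIDA's /24 (RFC 5737 documentation prefix).
CAIDA_PREFIX = ipaddress.ip_network("192.0.2.0/24")

# Inbound ICMP types: echo reply (0), destination unreachable (3, any code),
# echo request (8), and time exceeded (11).
INBOUND_ICMP_TYPES = {0, 3, 8, 11}

# Source ports of the responses expected from DNS (53) and NTP (123) servers.
RESPONSE_SOURCE_PORTS = {53, 123}


@dataclass(frozen=True)
class Packet:
    proto: str                       # "icmp", "udp", or "tcp"
    src: str                         # source IP address
    src_port: Optional[int] = None   # UDP/TCP only
    dst_port: Optional[int] = None   # UDP/TCP only
    icmp_type: Optional[int] = None  # ICMP only


def allow_inbound(pkt: Packet) -> bool:
    """Return True if an inbound packet matches the incoming rules."""
    if pkt.proto == "icmp":
        # No rate limiting: every packet of an allowed type is accepted.
        return pkt.icmp_type in INBOUND_ICMP_TYPES
    if pkt.proto == "tcp":
        if pkt.dst_port == 22:
            # SSH is accepted only from CAIDA's /24 prefix.
            return ipaddress.ip_address(pkt.src) in CAIDA_PREFIX
        # Any other TCP packet (SYN, ACK, RST, ...) is accepted regardless of
        # connection state, since it may answer a TCP-based topology probe.
        return True
    if pkt.proto == "udp":
        # Stateless simplification: UDP from port 53/123 counts as a DNS/NTP reply.
        return pkt.src_port in RESPONSE_SOURCE_PORTS
    return False
```

Accepting non-SSH TCP packets without state tracking mirrors the "regardless of connection state" rule; it is tolerable here only because the box runs no network service other than SSH.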
IPv6 connectivity is optional but desirable. We are actively
seeking IPv6-capable sites to study the world's adoption of IPv6
following IPv4 address depletion. For this purpose, we need native
IPv6 connectivity and not IPv6 transition technologies like tunnel
brokers and NAT64. We do not need a fixed IPv6 address--a potentially
dynamic autoconfigured address (for example, based on the MAC address
of the network interface) is sufficient. We also do not need a DNS
hostname (a PTR record) assigned to the IPv6 address.
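For reference, one common form of MAC-based autoconfiguration is the modified EUI-64 interface identifier used by SLAAC (RFC 4291): the universal/local bit of the MAC's first octet is flipped and ff:fe is inserted in the middle. The Python sketch below, with a hypothetical function name and the documentation prefix 2001:db8::/64, shows how such an address is formed. Many operating systems now default to privacy or stable opaque identifiers (RFC 4941, RFC 7217) rather than EUI-64.

```python
import ipaddress


def slaac_eui64_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Form an IPv6 address from a /64 prefix and a MAC (modified EUI-64)."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("EUI-64 interface identifiers assume a /64 prefix")
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address like 00:11:22:33:44:55")
    # Flip the universal/local bit of the first octet and insert ff:fe
    # between the third and fourth octets to get a 64-bit identifier.
    iid = bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:]
    # The prefix's low 64 bits are zero, so adding the identifier fills them in.
    return net.network_address + int.from_bytes(iid, "big")


print(slaac_eui64_address("2001:db8::/64", "00:11:22:33:44:55"))
```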
Once deployed, we need regular remote shell access to an Ark monitor
in order to develop, perform, and troubleshoot measurements; and to
manage the system, including configuring, installing, and upgrading
software and the operating system. We use standard OpenSSH for remote
access and take several measures to secure the system against
unauthorized access. Specifically, (1) we generally disallow remote
root logins, (2) only allow passwordless public key-based remote
logins (which protects against password guessing attacks), and (3)
only allow remote logins from a select set of networks (for example,
CAIDA's /24 prefix).
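The three measures above correspond to standard OpenSSH sshd_config settings. As a hedged sketch (the function `audit_sshd_config` is hypothetical, not an Ark tool), the following Python checks a configuration text for `PermitRootLogin no`, `PasswordAuthentication no` with public-key authentication left enabled, and `AllowUsers` entries of the form `user@CIDR`. It ignores `Match` blocks and other password-capable methods such as keyboard-interactive authentication, and the network restriction may equally be enforced by a firewall instead.

```python
import re
from typing import Dict


def audit_sshd_config(text: str) -> Dict[str, bool]:
    """Check an sshd_config text for the three remote-access measures.

    Simplifications: only the global section (before the first Match block)
    is read, and only the first occurrence of each keyword counts, following
    sshd's rule that the first obtained value is used for most keywords.
    """
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                          # blank lines and comments
        # Keywords and values may be separated by whitespace or "=".
        parts = re.split(r"\s*=\s*|\s+", line, maxsplit=1)
        key = parts[0].lower()                # keywords are case-insensitive
        if key == "match":
            break                             # stop at conditional blocks
        settings.setdefault(key, parts[1] if len(parts) > 1 else "")

    allow_users = settings.get("allowusers", "").split()
    return {
        # (1) remote root logins disallowed
        "no_root_login": settings.get("permitrootlogin", "").lower() == "no",
        # (2) key-based logins only: passwords off (default is yes), keys on
        "key_only": (settings.get("passwordauthentication", "yes").lower() == "no"
                     and settings.get("pubkeyauthentication", "yes").lower() == "yes"),
        # (3) every AllowUsers entry restricts the source host/network
        "network_restricted": bool(allow_users) and all("@" in u for u in allow_users),
    }
```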
Residential Ark node deployments present two additional challenges for
gaining remote shell access: dynamic DHCP addresses and port blocking.
We overcome both by using the IPv6 transition technology Teredo.
The intended purpose of Teredo is to provide IPv6 access to those who
cannot get native IPv6 from their provider, but Teredo has built-in
support for NAT/firewall traversal, which is our only reason for using
Teredo. We use Teredo to gain remote shell access using standard
OpenSSH without requiring our volunteers to configure port forwarding
on their home (Wi-Fi) routers.
Technical details on our particular use of Teredo: Teredo works by tunneling IPv6 packets within IPv4 UDP packets. Teredo clients (end hosts running Teredo to gain IPv6 access) automatically get a globally routable IPv6 address in a special-purpose IPv6 prefix set aside for use by Teredo (2001::/32). The availability of a globally routable IPv6 address is what makes Teredo useful for gaining remote shell access to a host behind a NAT; without Teredo, an IPv4 host behind a NAT has a private IPv4 address that cannot be reached from outside the NAT. To ensure robustness, we (CAIDA) control all of the components of the Teredo service needed for this particular scenario, namely gaining remote shell access to Ark nodes behind NATs. We can use this approach even if/when the rest of the Internet turns off Teredo and its supporting services (Teredo 'servers' and 'relays'). In particular, we do not depend on third-party Teredo relays, which may suffer degraded performance or go down without notice (a weakness of Teredo in its intended role as an IPv6 transition technology). Finally, because the special globally routable Teredo IPv6 address is formed from the public-facing IPv4 address of the end host (that is, the public IPv4 address of the NAT), the Teredo IPv6 address changes whenever the underlying dynamically assigned IPv4 address changes. To detect such changes automatically, each Ark node behind a NAT sends an ICMP packet to CAIDA with its IPv6 address in the payload. We use this mechanism to monitor IP connectivity and address changes in IPv4 and IPv6.
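To make the address formation concrete, the following Python sketch assembles a Teredo address from the fields defined in RFC 4380: the 2001::/32 prefix, the Teredo server's IPv4 address, a 16-bit flags field, and the client's public (NAT-side) UDP port and IPv4 address, both stored bit-inverted. The function name is hypothetical, and the server and client addresses used below are documentation addresses, not CAIDA's. Python's standard `ipaddress.IPv6Address.teredo` property decodes the (server, client) pair, which makes the round trip easy to check.

```python
import ipaddress

TEREDO_PREFIX = 0x20010000  # first 32 bits of 2001::/32 (RFC 4380)


def teredo_address(server_v4: str, mapped_v4: str, mapped_port: int,
                   flags: int = 0) -> ipaddress.IPv6Address:
    """Assemble a Teredo address from its RFC 4380 fields.

    mapped_v4 and mapped_port are the client's public (NAT-side) IPv4 address
    and UDP port; both are stored bit-inverted ("obfuscated") in the address.
    """
    server = int(ipaddress.IPv4Address(server_v4))
    mapped = int(ipaddress.IPv4Address(mapped_v4))
    value = ((TEREDO_PREFIX << 96)                          # bits 127..96
             | (server << 64)                               # bits 95..64
             | ((flags & 0xFFFF) << 48)                     # bits 63..48
             | (((mapped_port ^ 0xFFFF) & 0xFFFF) << 32)    # bits 47..32
             | (mapped ^ 0xFFFFFFFF))                       # bits 31..0
    return ipaddress.IPv6Address(value)


addr = teredo_address("192.0.2.45", "198.51.100.7", 40000)
print(addr, addr.teredo)
```

Because the NAT's public IPv4 address occupies the low 32 bits, a new dynamically assigned address yields a different Teredo address, which is why each node reports its current address to CAIDA.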
In general, we try to perform measurements in ways that reduce the
likelihood of complaints. For example, we do relatively low volume
and low frequency measurements (from the point of view of individual
destinations) and prefer to avoid probing the same destinations
Because complaints do occasionally occur, we try our best to direct them to us rather than to the site hosting a monitor. One important way we do this is by setting up the reverse DNS mapping (PTR record) for a monitor's IP address to either monitor.ark.caida.org (e.g., san-us.ark.caida.org) or monitor.ark.caida.site-domain (e.g., san-us.ark.caida.ucsd.edu).
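The reverse mapping itself is an ordinary PTR record in the reverse DNS zone for the monitor's address. The Python sketch below, with hypothetical helper names and a documentation address, builds the two host-name forms described above and a zone-file style PTR line using the standard `ipaddress` `reverse_pointer` property.

```python
import ipaddress


def monitor_hostname(monitor: str, site_domain: str = "") -> str:
    """Build monitor.ark.caida.org, or monitor.ark.caida.<site-domain>."""
    return f"{monitor}.ark.caida.{site_domain or 'org'}"


def ptr_record(ip: str, hostname: str) -> str:
    """Return a zone-file style PTR line mapping ip back to hostname."""
    # reverse_pointer gives e.g. "5.2.0.192.in-addr.arpa" (or an ip6.arpa name).
    rev = ipaddress.ip_address(ip).reverse_pointer
    return f"{rev}. IN PTR {hostname}."


print(ptr_record("192.0.2.5", monitor_hostname("san-us")))
print(ptr_record("192.0.2.5", monitor_hostname("san-us", "ucsd.edu")))
```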
We have weighed the possibility of running a lightweight webserver on the monitors themselves that would describe the measurements, but based on an evaluation of the security vs. benefits tradeoff (something we have considered for many years in the context of the skitter infrastructure), it is not our general policy to set up such a webserver. We can, however, do so upon request by the site hosting a monitor.
Hosting sites should simply forward any complaints they receive to us. We will respond to the complaints and, if necessary, add destinations to our no-probe list, which prevents future complaints from those destinations.
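A no-probe list can be applied by filtering each target list before a measurement run. This Python sketch assumes, for illustration only, that the list holds IPv4/IPv6 prefixes; the actual format of CAIDA's no-probe list is not described here, and `filter_targets` is a hypothetical name.

```python
import ipaddress
from typing import Iterable, List


def filter_targets(targets: Iterable[str], no_probe: Iterable[str]) -> List[str]:
    """Drop every target address that falls inside a no-probe prefix."""
    # strict=False tolerates entries written with host bits set.
    blocked = [ipaddress.ip_network(p, strict=False) for p in no_probe]
    kept = []
    for target in targets:
        addr = ipaddress.ip_address(target)
        # Membership is simply False across IP versions, so v4 and v6 can mix.
        if not any(addr in net for net in blocked):
            kept.append(target)
    return kept


print(filter_targets(["198.51.100.7", "203.0.113.9"], ["198.51.100.0/24"]))
```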
RIPE Atlas, like Ark, is a distributed measurement system that performs ping and traceroute measurements. The two systems differ in several ways:
- RIPE Atlas allows almost anyone to conduct measurements as long as they have credits (earned by hosting a probe). Access to our on-demand Ark measurements is currently restricted to academic researchers, but no credits are required.
- Ark conducts systematic, large-scale ongoing measurements of the global Internet with the goal of obtaining a broad baseline view of the Internet and its change/evolution over long periods of time. Because of this focus on global coverage, our measurements are not as focused on satisfying immediate operational troubleshooting needs (e.g., why is my network not reachable right now?).
- Ark nodes are relatively powerful systems (up to 1GHz quad-core ARM processors with 1GB of RAM and 8GB of flash) running a full Linux distribution. They are used to conduct many other kinds of measurements not currently feasible on RIPE Atlas probes, such as studying congestion at interdomain peering links, the degree and type of header alteration by middleboxes, and the degree of filtering of packets with spoofed source addresses. Researchers can run their own software on Ark nodes, which Atlas does not allow for policy and technical reasons; for example, we conduct large-scale, timing-sensitive alias resolution runs using dozens of Ark nodes in concert.