Frequently Asked Questions about Ark

If you are interested in hosting an Ark node, please fill out a form to acknowledge the Memorandum of Cooperation and someone will follow up with you. Alternatively, send an email to ark-info@caida.org, and include answers to the following questions if possible:

What ISP (or AS number) provides your network connectivity?
What city/country will you be hosting the node from?
Can you provide IPv6 connectivity (not required)?
Can you provide a public IP address (not required, but very useful for us)?

More detailed information useful to sites interested in hosting an Archipelago node follows below.

I've received a probe from an Ark monitor, what is this about?

As part of our research activities, CAIDA runs a number of periodic and ongoing macroscopic topology surveys. See CAIDA Active Probe Information for more about this probe.

Questions about Archipelago (Ark)

If you have a question or comment regarding Ark, email it to ark-info@caida.org. Frequently asked questions are answered below, along with additional information.

1 - Why should I host an Ark node?

Why should my network host an Ark node?

Institutions interested in hosting an Ark node should take a look at our brochure "Why should my network host an Ark node?", which details the benefits that hosting an Ark node can provide for the local network as well as the Ark project.

Do you operate nodes at residential locations (i.e. on a home network)?

Yes. In fact, an increasing fraction of Ark nodes is operated from home networks with the Ark Raspberry Pi node connected to a home router, and we expect this to increase. For these nodes we typically assume that the router dynamically provides the IP address for the node (by DHCP). If you are interested in hosting an Ark node at home, please provide information about your network connectivity (see top of this page).

Do you have a more official letter I can send to management that describes the project?

For those requiring more background and a more formal format, we prepared this document explaining the Ark project on official University letterhead.

2 - Memorandum of Cooperation

Memorandum of Cooperation (MOC) Between Hosting Sites and CAIDA

A site wishing to host an Ark monitor should first review and approve the Memorandum of Cooperation (MOC), which outlines the obligations and expectations of both CAIDA and the hosting site. If needed, CAIDA can work with a prospective site to craft and implement a custom MOC that allows our monitor to conduct measurements within the bounds of a site's local policies.

[Figure: Size comparison of the Ark Raspberry Pi-based Network Monitor]

3 - Hardware and Power

What hardware do you prefer?

Most of our Ark monitors are based on the Raspberry Pi. This is now our preferred hardware, primarily because it is small and relatively cheap. Because the monitors are so small (3 x 4 x 2 inches), they can be easily deployed in a residential setting. We originally started with the Raspberry Pi 1B, but currently a typical new monitor is a Raspberry Pi 4B using a 64GB Class-10 high endurance SD card, and a 5V/2A power supply.

What are the power requirements for the Raspberry Pi?

The original Raspberry Pi 1B consumed about 3 Watts, drawing about 0.7A of current. The latest Raspberry Pi (model 4B) consumes about 5-6 Watts of power and draws 2A. For Ark operations, no special cooling is required (although it is optional for the Raspberry Pi 4B).

Can we provide a spare 1U server for CAIDA's use instead of receiving a Raspberry Pi? How about a virtual server?

Yes. Several of our Raspberry Pi Ark monitors were contributed by organizations and individuals (see our deployment page). A contributed Raspberry Pi is easy to integrate into Ark: we can provide a functional SD card image online. We have also deployed measurements on 1U servers, including under virtualization, so if a hosting site cannot deploy a Raspberry Pi, we can use a traditional server, possibly under virtualization.

How do I configure a virtual server?

Install a minimal Debian 12 operating system, and ensure that it has SSH installed. Then, as root, install the attached ark.list in /etc/apt/sources.list.d/ and run:
- apt update
- apt install ark-users
The ark-users package will provision accounts, which we will then use to access and maintain the system, and configure measurements. These accounts are passwordless, and we can only use them with the corresponding SSH private keys held at CAIDA.

Email us with details of how to reach the node over SSH. The first thing we will do is install the attached hosts.allow into /etc, which will restrict SSH access to CAIDA-owned network prefixes, loopback v4/v6 (for our reverse SSH proxy), and local RFC 1918 and link-local addresses (for site access, if required). If you wish, you can add your own prefix to hosts.allow and drop the file into /etc yourself, so that you know that it is done. If the node cannot be reached directly over SSH, either because the system does not have a public IP address or because inbound port 22 is blocked by site policy, we can supply, on request, a package that will connect the node to our SSH reverse proxy.

4 - Usage Patterns

How much network bandwidth does Ark require, on average?

We run our current traceroute measurements to every routed /24 prefix at 100pps for about 35kbps of outgoing traffic. That produces about 5MB of trace data per hour which we download to a CAIDA host concurrently with the measurements. This exemplifies typical bandwidth requirements for an active measurement. We might run a few measurements concurrently, each having about that much bandwidth usage (or more likely less). We do not plan to host any services (web, content, or distributed hash table) that can potentially generate a lot of traffic. Nor will we do any high volume bandwidth measurements (a la Iperf), since we would like to avoid generating complaints from recipients of measurement traffic.
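As a rough sanity check of the figures above, the following Python sketch derives the average probe size implied by 100pps at 35kbps, and the upload rate implied by 5MB of trace data per hour. It assumes decimal units (1 kbps = 1000 bits/s, 1 MB = 10^6 bytes) and a 30-day month of continuous probing. The function and constant names are illustrative only and are not part of any Ark software.

```python
# Back-of-the-envelope check of the bandwidth figures quoted in the text.
# Assumptions (not from the source): decimal units and a 30-day month.

PROBE_RATE_PPS = 100      # probes per second (from the text)
OUTGOING_KBPS = 35        # outgoing probe traffic, kilobits/s (from the text)
TRACE_MB_PER_HOUR = 5     # trace data sent to CAIDA per hour (from the text)


def avg_probe_size_bytes(kbps: float, pps: float) -> float:
    """Average bytes per probe implied by a bit rate and a packet rate."""
    return kbps * 1000 / pps / 8


def upload_kbps(mb_per_hour: float) -> float:
    """Average bit rate needed to move mb_per_hour of data continuously."""
    return mb_per_hour * 1_000_000 * 8 / 3600 / 1000


def monthly_volume_gb(kbps: float, days: int = 30) -> float:
    """Total volume in GB if a stream of kbps runs nonstop for `days` days."""
    return kbps * 1000 / 8 * 86400 * days / 1e9


probe_bytes = avg_probe_size_bytes(OUTGOING_KBPS, PROBE_RATE_PPS)
trace_kbps = upload_kbps(TRACE_MB_PER_HOUR)
probe_gb_per_month = monthly_volume_gb(OUTGOING_KBPS)
```

Under these assumptions each probe averages a little under 44 bytes, the trace upload adds roughly 11 kbps, and continuous probing amounts to about 11 GB of outgoing traffic per month for one such measurement.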
What type of network traffic should we expect? How should we configure our firewall?

Our goal is for Ark monitors to be used for a wide variety of measurements, and the set of measurements will evolve over time. However, current measurements are about Internet topology and, therefore, they employ similar types of low-level probe packets and receive similar types of responses even though the exact details and goals of the measurements may differ. We provide below a list of these low-level probe and response packets in the form of firewall-like rules. While Ark monitors can be placed behind firewalls, we ask site hosts to deploy them where the local security policy does not require the firewall to obstruct outbound packets from the monitor. Placing the Ark monitor within the organization's DMZ, if it has one, is ideal.

Current measurement traffic consists mostly of outgoing topology probe packets (e.g., ICMP echo request) and their expected responses (e.g., ICMP echo reply). We also need to perform traceroute and ping measurements to Ark monitors themselves, so a firewall should freely allow ICMP request/response traffic in both incoming and outgoing directions (that is, for both measurements to and from monitors). We will contact each hosting site to request permission to conduct any measurements that are significantly different from these current measurements.

In addition to measurement traffic, an Ark monitor will need to open TCP connections to central servers at CAIDA (this will always be an outgoing connection from the Ark box to CAIDA's servers). We do not run any network service except SSH on an Ark box, to increase security.

The following is a summary of the expected traffic in firewall-like rules:

Outgoing:
- TCP connections to CAIDA's control infrastructure located in 192.172.226.0/24 and 2001:48d0:101:501::/64
- NTP (123/udp) to your local NTP server, to CAIDA's NTP server, or to the nearest NTP Pool server
- DNS (53/udp) to your local DNS server(s) or to CAIDA's DNS server
- HTTP (80/tcp and 443/tcp) for fetching web content
- ICMP echo request, echo reply, port unreachable to any host (for ICMP-based topology measurements to and from the monitor)
- no ICMP rate limiting
- UDP probes from any local port to any host and any port (for UDP-based topology measurements from the monitor)
- TCP probes from any local port to any host and any port regardless of connection state (for non-SYN based TCP measurements such as sending a TCP ACK probe, which won't establish a connection nor be part of an existing connection)

Incoming:
- NTP (123/udp), DNS (53/udp), and HTTP (80/tcp and 443/tcp) responses
- SSH (22/tcp) connections from CAIDA's control infrastructure located in 192.172.226.0/24 and 2001:48d0:101:501::/64
- ICMP echo request, echo reply, time exceeded, and destination unreachable (type 3, code any) from any host
- no ICMP rate limiting
- TCP packets from any host and any port, for TCP-based topology measurements, with the exception of inbound SYN packets. We'll need inbound SYN-ACK packets so that we can establish TCP connections with, for example, Debian package repositories, so that we can keep the system up to date.
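To make the inbound TCP rules above concrete, here is a minimal Python sketch of one possible reading of them: a bare SYN (SYN set, ACK clear) is accepted only for SSH from CAIDA's control prefixes, while every other inbound TCP packet, including SYN-ACK, is accepted. The prefixes come from the list above; the function names and the exact decision logic are assumptions made for illustration, not a firewall configuration supplied by CAIDA.

```python
import ipaddress

# CAIDA control infrastructure prefixes, as listed in the rules above.
CAIDA_CONTROL_PREFIXES = [
    ipaddress.ip_network("192.172.226.0/24"),
    ipaddress.ip_network("2001:48d0:101:501::/64"),
]

SSH_PORT = 22


def from_caida_control(addr: str) -> bool:
    """True if addr falls inside one of CAIDA's control prefixes."""
    ip = ipaddress.ip_address(addr)
    return any(
        ip.version == net.version and ip in net
        for net in CAIDA_CONTROL_PREFIXES
    )


def allow_inbound_tcp(src: str, dst_port: int, syn: bool, ack: bool) -> bool:
    """Hypothetical inbound TCP policy derived from the summary above.

    A bare SYN opens a new inbound connection, so it is allowed only for
    SSH from CAIDA. Everything else (SYN-ACK, ACK, RST, ...) is allowed so
    that probe responses and outbound connections keep working.
    """
    bare_syn = syn and not ack
    if bare_syn:
        return dst_port == SSH_PORT and from_caida_control(src)
    return True
```

For example, `allow_inbound_tcp("203.0.113.5", 22, syn=True, ack=False)` is rejected under this reading because the source is outside CAIDA's prefixes, whereas a SYN-ACK from the same address is accepted.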

5 - Optional IPv6 Connectivity

IPv6 connectivity is optional but desirable. We are actively seeking IPv6-capable sites to study the world's adoption of IPv6 following IPv4 address depletion. For this purpose, we need native IPv6 connectivity and not IPv6 transition technologies like tunnel brokers and NAT64. We do not need a fixed IPv6 address -- a dynamic autoconfigured address is sufficient. We also do not need a DNS hostname (a PTR record) assigned to the IPv6 address.

6 - Remote Shell Access

Once deployed, we need regular remote shell access to an Ark monitor in order to develop, perform, and troubleshoot measurements; and to manage the system, including configuring, installing, and upgrading software and the operating system. We use standard OpenSSH for remote access and take several measures to secure the system against unauthorized access. Specifically, we (1) generally disallow remote root logins, (2) allow only passwordless public-key-based remote logins (which protects against password-guessing attacks), and (3) allow remote logins only from a select set of networks (for example, CAIDA's /24 prefix).

For monitors with a known static (fixed) IP address, we use this address for remote access whenever possible. When this is not possible, either because of firewall-related port blocking, or because IP addresses are assigned dynamically by DHCP (common for a residential monitor connected to a cable router), we gain remote shell access by using the CAIDA NAT Portal utility. Each Ark monitor runs a NAT Portal client, which at startup initiates a remote connection with a NAT Portal server at CAIDA. We use this connection for remote login to the monitor. Since the initial connection is done by the monitor, this is not affected by port blocking (firewalls generally block incoming connections only). In addition, this approach does not require prior knowledge of the public IP address of the monitor.

7 - Mitigation and Handling of Complaints

In general, we try to perform measurements in ways that reduce the likelihood of complaints. For example, we do relatively low volume and low frequency measurements (from the point of view of individual destinations) and prefer to avoid probing the same destinations repeatedly.

Because complaints do occasionally occur, we try our best to direct them to us rather than to the site hosting a monitor. An important way to do this is to set up the reverse DNS mapping for a monitor's IP address to either monitor.ark.caida.org (e.g., san-us.ark.caida.org) or monitor.ark.caida.site-domain (e.g., san-us.ark.caida.ucsd.edu).
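A hosting site that wants to confirm the reverse mapping is in place could check it with a few lines of Python. The sketch below builds the two hostname forms described above and compares them with the PTR record returned by the standard library's `socket.gethostbyaddr`. The function names are hypothetical, and the example monitor name and site domain are simply the ones used in the text.

```python
import socket
from typing import List, Optional


def expected_ptr_names(monitor: str, site_domain: Optional[str] = None) -> List[str]:
    """Hostnames a monitor's reverse mapping may point to, per the text."""
    names = [f"{monitor}.ark.caida.org"]
    if site_domain:
        names.append(f"{monitor}.ark.caida.{site_domain}")
    return names


def ptr_matches(ip: str, monitor: str, site_domain: Optional[str] = None) -> bool:
    """True if the reverse lookup of ip yields one of the expected names.

    Requires working DNS; any lookup failure is reported as a mismatch.
    """
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:
        return False
    return hostname.rstrip(".").lower() in expected_ptr_names(monitor, site_domain)
```

Calling `expected_ptr_names("san-us", "ucsd.edu")` yields the two example names from the paragraph above; `ptr_matches` would then be called with the monitor's actual public IP address.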

We have weighed the possibility of running a lightweight webserver on the monitors themselves that would describe the measurements, but based on an evaluation of the security vs. benefits tradeoff (something we have considered for many years in the context of the skitter infrastructure), it is not our general policy to set up such a webserver. We can, however, do so upon request by the site hosting a monitor.

Hosting sites should simply forward any complaints they receive to us. We will respond to the complaints and, if necessary, add destinations to our no-probe list, which will prevent future complaints from the same destination.

8 - How do Ark and RIPE Atlas differ?

RIPE Atlas, like Ark, is a distributed measurement system that does ping and traceroute measurements.

- RIPE Atlas allows almost anyone to conduct measurements as long as they have credit (by hosting a probe). Access to our on-demand Ark measurements is currently restricted to academic researchers, but no credit is required.
- Ark conducts systematic, large-scale ongoing measurements of the global Internet with the goal of obtaining a broad baseline view of the Internet and its change/evolution over long periods of time. Because of this focus on global coverage, our measurements are not as focused on satisfying immediate operational troubleshooting needs (e.g., why is my network not reachable right now?).
- Ark probes are relatively powerful systems running a full Linux distribution. They are used to conduct many other kinds of measurements not currently feasible on RIPE Atlas nodes; e.g., studying congestion at interdomain peering links, the degree and type of header alteration by middle boxes, the degree of filtering of packets with spoofed source addresses. Researchers can run their software on Ark nodes, which Atlas doesn't allow for policy and technical reasons; for example, we conduct large-scale timing-sensitive alias resolution runs using dozens of probes in concert.