Questions about Archipelago (Ark)?
Please send questions or comments regarding Ark to ark-info@caida.org.
Acceptable Use Policy (AUP)
CAIDA seeks sites interested in becoming a part of our next-generation measurement infrastructure. The first step toward participation is negotiating an AUP in line with site-local policies. Archipelago provides a flexible coordination and control mechanism that lets us work with sites to craft and implement custom AUPs, so that our monitors take measurements and behave as dictated by each site's local policies.
A hosted Ark monitor can participate as a receiver in the Spoofer project after negotiating an AUP for the Spoofer project. The Spoofer project helps us gather data on what types of IP source address filtering are prevalent among networks attached to the Internet.
Hardware Requirements and Recommendations
- Hardware Requirements
- What is the minimal hardware required to run the Archipelago software?
We have Ark running on a 400MHz PII with 64MB of RAM. However, new deployments should plan for at least a PIII with 256MB of RAM and 4GB of disk.
- What is the preferred hardware?
A host outfitted with a 1.5GHz PIV with 512MB of RAM and 15GB of disk does nicely.
- Should we provision dedicated hardware, or can we share the hardware platform with other applications?
Ark prefers a dedicated host to avoid any potential interference with timing-sensitive measurements. A dedicated host also allows us to lock down the system and impose a systemwide security mechanism and policy using FreeBSD jails and packet filtering. We need root access to a complete host (not a jail, chroot, or virtual environment), including the ability to patch the kernel. We prefer a dedicated host that we manage and maintain fully (security updates and OS upgrades), though we are happy to provide accounts and access for local system administrators.
Virtualization creates problems for us in two related areas: clocks and timestamping. We need a real-time clock that never jumps backwards. We also need precise timestamping of packet send and receive times. As a test, we have one Ark monitor running as a guest (virtual server), and we have found its RTT values to be coarsely quantized due to a lack of clock precision or other clock issues. Even if the clock were precise, accurate, and real-time, we would expect added latency from sharing a single physical box among multiple guest operating systems.
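As a rough illustration of the two clock concerns above, the following Python sketch checks that a clock never steps backwards and estimates how coarsely a set of RTT samples is quantized. It is not part of the Ark software: the function names are hypothetical, and Python's `time.monotonic_ns()` merely stands in for whatever timestamping a real measurement tool uses.

```python
import time
from functools import reduce
from math import gcd


def smallest_clock_step(samples=100_000):
    """Return the smallest positive step (ns) seen between consecutive reads.

    Raises RuntimeError if the clock ever moves backwards.
    Returns None if the clock never advanced during the loop.
    """
    smallest = None
    prev = time.monotonic_ns()
    for _ in range(samples):
        now = time.monotonic_ns()
        if now < prev:
            raise RuntimeError("clock stepped backwards")
        step = now - prev
        if step > 0 and (smallest is None or step < smallest):
            smallest = step
        prev = now
    return smallest


def rtt_quantum(rtts_us):
    """Greatest common divisor of integer RTT samples (microseconds).

    A large result suggests coarse quantization: every sample is a
    multiple of that value. Returns 0 for an empty list.
    """
    return reduce(gcd, rtts_us, 0)
```

For example, `rtt_quantum([1000, 3000, 2000])` yields 1000, which would hint at a clock that only ticks once per millisecond, whereas samples from a precise clock usually share no large common divisor.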
- Operating System Requirements
We use stock FreeBSD 7.x (though we can use Linux). We have experience in remotely managing the OS, including keeping it patched and doing remote OS upgrades. We like to minimize the amount of work hosting organizations have to do on our behalf.
- Usage Patterns
- How much CPU does the system consume, on average?
CPU usage is currently minimal, even when doing continuous large-scale traceroute measurements at 100pps. Even the 400MHz PII is 90% idle. For a more modern PIV box, CPU usage is hardly noticeable. We do not foresee running computationally intensive workloads (almost all Internet measurements are I/O bound). We may decide to push out some preparatory work to the monitors in the future (such as preprocessing BGP tables), which would increase CPU usage, but we do not anticipate ever requiring large amounts of CPU resources.
- What are the average memory requirements?
Memory usage is currently minimal (< 35MB in all Ark processes). In fact, many of our boxes have only 128MB of RAM. Nevertheless, a box with 512MB provides an ideal platform for hosting our current experiments with room for future possibilities.
- How much network bandwidth does Ark require, on average?
We run our current traceroute measurements to every routed /24 prefix at 100pps for about 35kbps of outgoing traffic. That produces about 5MB of trace data per hour which we download to a CAIDA host concurrently with the measurements. This exemplifies typical bandwidth requirements for an active measurement. We might run a few measurements concurrently, each having about that much bandwidth usage (or more likely less). We do not plan to host any services (web, content, or distributed hash table) that can potentially generate a lot of traffic. Nor will we do any high volume bandwidth measurements (a la Iperf), since we would like to avoid generating complaints from recipients of measurement traffic.
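The figures above can be turned into a quick back-of-the-envelope calculation. The sketch below assumes decimal units (1 kbps = 1000 bits per second, 1 MB = 10^6 bytes), which the text does not specify, and the function names are illustrative only.

```python
PPS = 100               # probe rate quoted in the text
OUT_KBPS = 35           # outgoing probe traffic quoted in the text
TRACE_MB_PER_HOUR = 5   # trace data volume quoted in the text


def bytes_per_probe(pps, kbps):
    """Average on-the-wire size implied by a packet rate and a bit rate."""
    return kbps * 1000 / 8 / pps


def trace_transfer_kbps(mb_per_hour):
    """Average bit rate needed to ship trace data as fast as it is produced."""
    return mb_per_hour * 1_000_000 * 8 / 3600 / 1000


def trace_mb_per_day(mb_per_hour):
    """Trace data accumulated over 24 hours of continuous measurement."""
    return mb_per_hour * 24
```

Under these assumptions, 35kbps at 100pps works out to about 44 bytes per probe, and moving 5MB of trace data per hour needs roughly 11kbps on average, or 120MB per day.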
- What type of network traffic should we expect? How should we configure our firewall?
Our goal is for Ark monitors to be used for a wide variety of measurements, and the set of measurements will evolve over time. However, current measurements are about Internet topology and, therefore, they employ similar types of low-level probe packets and receive similar types of responses even though the exact details and goals of the measurements may differ. We provide below a list of these low-level probe and response packets in the form of firewall-like rules. We will contact each hosting site to request permission to conduct any measurements that are significantly different from these current measurements (for example, for Spoofer measurements).
Current measurement traffic consists mostly of outgoing topology probe packets (e.g., ICMP echo request) and their expected responses (e.g., ICMP echo reply). We also need to perform traceroute and ping measurements to Ark monitors themselves, so a firewall should freely allow ICMP request/response traffic in both incoming and outgoing directions (that is, for both measurements to and from monitors).
In addition to measurement traffic, an Ark monitor will need to open a TCP connection to a central server at CAIDA (this will always be an outgoing connection from the Ark box to CAIDA's server), and a monitor will need to allow incoming SSH connections from CAIDA's /24 (but from nowhere else). In general, to increase security, we do not run any network service except SSH on an Ark box.
The following is a summary of the expected traffic in firewall-like rules:
- Outgoing:
- ntp (123/udp) to your local NTP server, to CAIDA's NTP server, or to the nearest NTP Pool server
- dns (53/udp) to your local DNS server(s) or to CAIDA's DNS server
- TCP connection to CAIDA's tuple-space server from any local (ephemeral) port (for Ark's tuple space communication)
- ICMP echo request, echo reply, port unreachable to any host (for ICMP-based topology measurements to and from the monitor)
- no ICMP rate limiting
- UDP probes from any local port to any host and any port (for UDP-based topology measurements from the monitor)
- TCP probes from any local port to any host and any port regardless of connection state (for non-SYN based TCP measurements such as sending a TCP ACK probe, which won't establish a connection nor be part of an existing connection)
- Incoming:
- NTP and DNS responses
- ssh (22/tcp) from only CAIDA's /24 prefix
- ICMP echo request, echo reply, time exceeded, and destination unreachable (type 3, code any) from any host
- no ICMP rate limiting
- TCP packets (SYN, ACK, RST, etc.) from any host and any port regardless of connection state (for TCP-based topology measurements)
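To make the summary above easier to check against a local policy, here is a minimal Python sketch that encodes one reading of those rules as a predicate. It is not a firewall configuration and not something CAIDA distributes. The `CAIDA_PREFIX` value is a placeholder from the RFC 5737 documentation range (the real /24 is not given here), the ICMP type labels and the `allowed` function are hypothetical, and the sketch does not check which NTP or DNS server a UDP response comes from.

```python
import ipaddress

# Placeholder only: substitute the real prefix supplied by CAIDA.
CAIDA_PREFIX = ipaddress.ip_network("192.0.2.0/24")

ICMP_OUT = {"echo-request", "echo-reply", "port-unreachable"}
ICMP_IN = {"echo-request", "echo-reply", "time-exceeded",
           "destination-unreachable"}


def allowed(direction, proto, *, icmp_type=None, src_ip=None,
            src_port=None, dst_port=None):
    """Return True if a packet fits the summarized rules (one interpretation)."""
    if direction not in ("out", "in"):
        raise ValueError("direction must be 'out' or 'in'")
    if proto == "icmp":
        return icmp_type in (ICMP_OUT if direction == "out" else ICMP_IN)
    if direction == "out":
        # NTP, DNS, the tuple-space connection, and UDP/TCP probes all
        # leave from any local port toward any host.
        return proto in ("udp", "tcp")
    if proto == "udp":
        # Incoming UDP is limited to DNS and NTP responses.
        return src_port in (53, 123)
    if proto == "tcp":
        if dst_port == 22:
            # SSH is accepted only from the CAIDA prefix.
            return (src_ip is not None
                    and ipaddress.ip_address(src_ip) in CAIDA_PREFIX)
        # Other TCP packets are accepted regardless of connection state.
        return True
    return False
```

For instance, `allowed("in", "tcp", dst_port=22, src_ip="198.51.100.7")` is rejected because the source lies outside the placeholder prefix, while an incoming ICMP time exceeded message from any host is accepted.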
IPv6 Network Requirements and Recommendations (Desirable)
CAIDA has plans for continued strategic deployment of IPv6-capable Ark measurement nodes. If your site enjoys IPv6 connectivity, the following describes the minimum requirements for conducting measurements on an Ark node.
- Currently, we only accept native IPv6 connectivity.
- An autoconfigured IPv6 address works fine (that is, we do not require a manually assigned address that stays the same when the hardware changes).
- A DNS PTR record for the IPv6 address is required; it may be offered over either IPv4 or IPv6 transport.
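For sites preparing the PTR record, the following sketch shows how the `ip6.arpa` owner name for an address can be derived with Python's standard `ipaddress` module, plus an optional lookup through the system resolver. The function names are illustrative, and `2001:db8::1` in the usage note is a documentation address, not a real monitor.

```python
import ipaddress
import socket


def ptr_owner_name(addr):
    """Return the reverse-DNS owner name (ip6.arpa or in-addr.arpa) for addr."""
    return ipaddress.ip_address(addr).reverse_pointer


def lookup_ptr(addr):
    """Return the hostname from the PTR record, or None if the lookup fails.

    Uses the system resolver, so the result depends on local DNS settings.
    """
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(addr)
    except OSError:
        return None
    return hostname
```

Calling `ptr_owner_name("2001:db8::1")` returns the 32 nibbles of the address in reverse order, each separated by a dot and followed by `ip6.arpa`; that is the name under which the PTR record must be published.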
Mitigation and Handling of Complaints
In general, we try to perform measurements in ways that reduce the likelihood of complaints. For example, we do relatively low volume and low frequency measurements (from the point of view of individual destinations) and prefer to avoid probing the same destinations repeatedly.
Because complaints do occasionally occur, we try our best to direct them to us rather than to the site hosting a monitor. An important way to do this is to set up the reverse DNS mapping for a monitor's IP address to either monitor.ark.caida.org (e.g., san-us.ark.caida.org) or monitor.ark.caida.site-domain (e.g., san-us.ark.caida.ucsd.edu).
We have weighed the possibility of running a lightweight webserver on the monitors themselves that would describe the measurements, but based on an evaluation of the security vs. benefits tradeoff (something we have considered for many years in the context of the skitter infrastructure), it is not our general policy to set up such a webserver. We can, however, do so upon request by the site hosting a monitor.
Hosting sites should simply forward any complaints they receive to us. We will respond to the complaints, and if necessary, add destinations to our no-probe list which will prevent future complaints from the same destination.
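The effect of a no-probe list can be sketched in a few lines of Python. This is only an illustration of the idea, assuming the list is kept as a set of IP prefixes; the format of the actual list is not described here, and the function names and example prefixes (taken from documentation ranges) are hypothetical.

```python
import ipaddress


def build_no_probe(prefixes):
    """Parse prefix strings such as '198.51.100.0/24' into network objects."""
    return [ipaddress.ip_network(p) for p in prefixes]


def probeable(dest, no_probe):
    """Return True if dest falls outside every prefix on the no-probe list."""
    addr = ipaddress.ip_address(dest)
    return not any(
        addr.version == net.version and addr in net for net in no_probe
    )
```

A measurement would then skip any destination for which `probeable` returns False, for example every address inside `198.51.100.0/24` once that prefix has been added after a complaint.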