AIMS-17 (GMI-AIMS-5) Hackathon

On February 8-9, 2025, GMI project collaborators participated in a hackathon for on-site development on Ark-related projects.

For more information about the GMI project, see the GMI3S website and GMI3S funding page.

Hackathon Dates: February 8 (Saturday) - 9 (Sunday) 2025
Place: Room 408, San Diego Supercomputer Center, UCSD Campus, La Jolla, CA

Background

We hosted an Ark-focused hackathon (a “Harkathon”, one could call it) the weekend before GMI-AIMS-5 (Feb 8 and 9), built around team coding challenges that leverage the new scamper libraries and the Ark infrastructure.

We also hosted hands-on training tracks on Thursday and Friday, Feb 13-14, focused on the UCSD Network Telescope.

Required reading for all hackathon participants:

Hackathon Chairs:

Matthew Luckie (CAIDA/UC San Diego)
kc claffy (CAIDA/UC San Diego)

Scamper Orientation

scamper’s Python module

CAIDA’s scamper Python module allows programmers to drive scamper Internet measurements across CAIDA’s Ark. The scamper tool is a packet probing utility used for active measurements of network characteristics such as latency, forward paths (traceroute), and packet loss. CAIDA’s Archipelago (Ark) platform is a global network measurement infrastructure consisting of strategically located vantage points that collect and analyze Internet data.

Running code

- Request access by signing the CAIDA Computer Facilities Usage Agreement
- Hostname provided to hackathon participants
- The mux socket is located at /run/ark/mux

Background reading

- Full Documentation: full details on architecture, API, and 4 examples
- Paper on the scamper Python module: An Integrated Active Measurement Programming Environment, with more details on architecture and 3 examples. A related slide deck is also available.

Example

The following example implements the well-known shortest-ping measurement technique, which conducts delay measurements to an IP address from a distributed set of vantage points and reports the shortest of all the observed delays, along with the name of the vantage point that observed it. More details can be found in the scamper Python module documentation.

fireball-aims> python3 shortest-ping.py /run/ark/mux 192.172.226.122

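import sys
from datetime import timedelta
from scamper import ScamperCtrl

# expect the mux socket path and the target IP address on the command line
if len(sys.argv) != 3:
  print("usage: shortest-ping.py $mux $ip")
  sys.exit(-1)

with ScamperCtrl(mux=sys.argv[1]) as ctrl:
  # attach to every vantage point available through the mux
  ctrl.add_vps(ctrl.vps())
  # issue one ping measurement per vantage point
  for i in ctrl.instances():
    ctrl.do_ping(sys.argv[2], inst=i)

  # process results as they arrive, subject to the 10-second timeout,
  # remembering the smallest RTT and the vantage point that observed it
  min_rtt = None
  min_vp = None
  for o in ctrl.responses(timeout=timedelta(seconds=10)):
    if o.min_rtt is not None and (min_rtt is None or min_rtt > o.min_rtt):
      min_rtt = o.min_rtt
      min_vp = o.inst

  # report the shortest delay in milliseconds
  if min_rtt is not None:
    print(f"{min_vp.name} {(min_rtt.total_seconds()*1000):.1f} ms")
  else:
    print(f"no responses for {sys.argv[2]}")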
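The selection step in the example above can be exercised without access to the mux. The following is a minimal sketch, assuming the RTTs have already been collected as (vantage point name, RTT) pairs. The function name shortest_ping and the sample values are hypothetical and are not part of the scamper module.

```python
from datetime import timedelta


def shortest_ping(samples):
    """Return the (vp_name, rtt) pair with the smallest RTT, or None.

    samples: iterable of (vp_name, rtt) pairs, where rtt is a timedelta,
    or None when that vantage point saw no reply.
    """
    answered = [(vp, rtt) for vp, rtt in samples if rtt is not None]
    if not answered:
        return None
    return min(answered, key=lambda pair: pair[1])


# Hypothetical sample data, for illustration only.
samples = [
    ("vp-a", timedelta(milliseconds=42.5)),
    ("vp-b", None),
    ("vp-c", timedelta(milliseconds=7.9)),
]

best = shortest_ping(samples)
if best is not None:
    print(f"{best[0]} {best[1].total_seconds() * 1000:.1f} ms")
else:
    print("no responses")
```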

Hackathon projects

ECS-SSD: ECS Scanning: Stateful, Scalable, Distributed

Background: We devised a new method to scan for ECS (EDNS Client Subnet) support (see: https://arxiv.org/abs/2412.08478; code: https://github.com/tumi8/ECSplorer). By keeping state, this method cuts probing costs compared to existing methods. We performed most of our measurements from a single VP, but by using a few additional VPs (4 in total) we showed that our method cannot fully replace a distributed approach.

Mission: Use Ark to investigate the degree to which distributed deployment of this method reveals additional insight.

Intended Outcomes:

Open-source, Ark-interoperable implementation of the method.
Analysis: Savings in probing cost and permissibility of potential user measurement interests (parameterized on #domains and #authoritatives) in consideration of residual burden.

Shepherds: Mattijs Jonker (U Twente), Patrick Sattler (TU Munich)

Team Members: Sumanth Rao (UC San Diego)

Outcome:

We translated the response-aware ECS scanning code from Go to Python.

The Python code could then use the scamper module to perform the ECS requests. The module simplified our implementation: we only had to replace the actual issuance of the query. Our results showed that single-vantage-point measurements could cover a significant share of addresses, but response behavior to specific queries depends on the location/PoP. We also confirmed our previous results from small-scale distributed measurements: Cloudflare does include /24-scoped responses but performs load balancing on the source address of the query instead of the ECS subnet.
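To illustrate how keeping state can cut probing cost, here is a minimal sketch of a response-aware scan loop. It is not the ECSplorer implementation. It assumes that the scope prefix length returned with an answer tells the scanner how large a block that answer covers, so the remaining subnets in that block can be skipped. The names response_aware_scan and fake_query are hypothetical, and the fake authoritative server stands in for the real query issuance.

```python
import ipaddress


def response_aware_scan(space, query, probe_len=24):
    """Scan `space`, skipping blocks already covered by a returned scope.

    query(subnet) is a caller-supplied function that sends one ECS query
    for `subnet` and returns (answer, scope_prefix_length).
    `space` must not be more specific than /probe_len.
    Returns a list of (covered_network, answer) and the number of queries.
    """
    space = ipaddress.IPv4Network(space)
    results = []
    queries = 0
    addr = int(space.network_address)
    end = int(space.broadcast_address)
    while addr <= end:
        subnet = ipaddress.IPv4Network((addr, probe_len))
        answer, scope_len = query(subnet)
        queries += 1
        # Assumption: a scope shorter than the probe means the answer holds
        # for the whole enclosing block; clamp it to the scanned space.
        covered_len = max(min(scope_len, probe_len), space.prefixlen)
        covered = subnet.supernet(new_prefix=covered_len)
        results.append((covered, answer))
        # Resume right after the covered block.
        addr = int(covered.broadcast_address) + 1
    return results, queries


def fake_query(subnet):
    """Hypothetical authoritative server: coarse scope in one half of the space."""
    if subnet.subnet_of(ipaddress.IPv4Network("10.0.0.0/17")):
        return "pop-a", 20
    return "pop-b", 24


results, queries = response_aware_scan("10.0.0.0/16", fake_query)
exhaustive = 2 ** (24 - 16)
print(f"{queries} queries sent; an exhaustive /24 scan would send {exhaustive}")
```

In a distributed setting, one such loop could run per vantage point, since the outcome above notes that response behavior depends on the location/PoP.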

IYP+RPKI: Seeding active measurements with IYP data sources

Background: The Internet Yellow Pages (https://www.iijlab.net/en/members/romain/pdf/romain_imc2024.pdf) provides numerous datasets documenting Internet topology. IYP is a good candidate to automate the selection of targets for active measurements done with Ark.

Mission: Use IYP data to seed/trigger active measurements from Ark and provide a template for doing so.

Intended Outcomes:

Create a simple project that uses both scamper and IYP. It should first query IYP to get a list of target IP addresses or hostnames and then initiate Ark measurements to these targets.
Documentation for future use. Documenting and making public the above scripts would be beneficial for future research. The documentation should briefly explain how it works and possible variations.
Make a real case example. Finally, build a real case example that could be useful for your research. One example analysis is to run traceroute or HTTP measurements to the CrUX Top 1k websites for the country where the monitor is located, so that we can study the paths or latency to popular content in each country.
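The two-step template described above (query IYP for targets, then hand them to Ark) can be sketched as follows. This is a minimal sketch under stated assumptions: fetch_targets is a hypothetical stand-in for a real IYP query, the vantage points are plain dictionaries instead of scamper objects, and the hostnames and country codes are made up. The resulting plan would then be passed to whichever scamper measurement calls the project uses.

```python
from collections import defaultdict


def fetch_targets():
    """Hypothetical stand-in for an IYP query.

    Returns (country_code, hostname) rows; a real project would obtain
    these from IYP instead of hard-coding them.
    """
    return [
        ("JP", "example.com"),
        ("JP", "example.org"),
        ("NL", "example.net"),
    ]


def plan_measurements(vps, targets):
    """Assign to each vantage point the targets listed for its own country."""
    by_country = defaultdict(list)
    for cc, host in targets:
        by_country[cc].append(host)
    return {vp["name"]: list(by_country.get(vp["cc"], [])) for vp in vps}


# Hypothetical vantage points, for illustration only.
vps = [{"name": "vp-1", "cc": "JP"}, {"name": "vp-2", "cc": "US"}]
plan = plan_measurements(vps, fetch_targets())
for name, hosts in plan.items():
    print(name, hosts)
```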

Shepherds: Romain Fontugne (IIJ), Malte Tashiro (IIJ)

Team Members: Deepak Gouda (Georgia Tech), Bradley Huffaker (CAIDA/UC San Diego)

Outcome: RPKI invalid prefixes have lower visibility [1] because many large transit networks validate BGP announcements using ROA objects and drop RPKI invalid prefixes. One study found that 22.3% of Autonomous Systems were fully protected from invalid announcements [2]. However, we still see that 1.05% of routed IPv4 prefixes are RPKI invalid (per NIST’s RPKI monitor [1]). We ran traceroutes from multiple Ark VPs to active hosts in these RPKI invalid prefixes and found that certain ASes do not drop all RPKI invalids. We analyzed reachability and RTT to these prefixes.

Percentage of Traceroutes reaching destination
- RPKI Invalid - 25.7%
- RPKI Valid - 50.2%
70.6% of RPKI invalids have a higher number of intermediate hops
72.5% of RPKI invalids have a higher RTT

The traceroutes to Cloudflare had the following results:

RPKI Invalid
- Percentage of traceroutes reaching destination - 0.67%
- Mean RTT - 27.5 ms
RPKI Valid
- Percentage of traceroutes reaching destination - 99.26%
- Mean RTT - 8 ms
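Summary figures of this kind (share of traceroutes reaching the destination, mean RTT per RPKI status) can be derived from per-traceroute records. The following is a minimal sketch, assuming each traceroute has been reduced to a small dictionary. The field names and the sample records are hypothetical and do not reproduce the measurements reported above.

```python
from collections import defaultdict
from statistics import mean


def summarize(traces):
    """Compute reachability and mean RTT per RPKI status.

    traces: iterable of dicts with keys 'status' (str), 'reached' (bool),
    and 'rtt_ms' (float, or None when the destination did not respond).
    """
    groups = defaultdict(list)
    for trace in traces:
        groups[trace["status"]].append(trace)

    summary = {}
    for status, rows in groups.items():
        # Only traceroutes that reached the destination contribute an RTT.
        rtts = [r["rtt_ms"] for r in rows if r["reached"] and r["rtt_ms"] is not None]
        summary[status] = {
            "reached_pct": 100.0 * sum(1 for r in rows if r["reached"]) / len(rows),
            "mean_rtt_ms": mean(rtts) if rtts else None,
        }
    return summary


# Hypothetical records, for illustration only.
traces = [
    {"status": "invalid", "reached": True, "rtt_ms": 30.0},
    {"status": "invalid", "reached": False, "rtt_ms": None},
    {"status": "invalid", "reached": False, "rtt_ms": None},
    {"status": "invalid", "reached": False, "rtt_ms": None},
    {"status": "valid", "reached": True, "rtt_ms": 8.0},
    {"status": "valid", "reached": True, "rtt_ms": 12.0},
]

for status, stats in summarize(traces).items():
    print(status, stats)
```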

[1] https://rpki-monitor.antd.nist.gov/
[2] https://rovista.netsecurelab.org/analytics
[3] https://link.springer.com/chapter/10.1007/978-3-030-44081-7_5

Anycast-GEO: Anycast Geolocation using Traceroute

Background: Anycast allows for replicating a service at geographically distributed locations, increasing resilience. Conventional IP geolocation services do not work for anycast IPs, which are announced from multiple locations. (Reading: https://www.sysnet.ucsd.edu/sysnet/miscpapers/manycast2-imc20.pdf )

Mission: Enumerate and geolocate known anycasted prefixes by performing traceroute measurements from geographically distributed VPs (Ark with the new scamper).

Intended Outcomes:

Identify opportunities for applying unicast geolocation to the penultimate hop of traceroutes toward anycasted IPs to infer the location of anycast sites.
Develop a method to enumerate different ingress paths of an anycast site.

Shepherds: Remi Hendriks (U Twente), Raffaele Sommese (U Twente)

Team Members: Tim Betzer (TU Munich), Ben Du (CAIDA/UC San Diego), Zesen Zhang (CAIDA/UC San Diego)

Outcome:

Traceroute probing can be used to infer that an IP address is routed with anycast. The intuition is that traceroutes toward an anycast address from distinct Vantage Points (VPs) will reach different anycast sites (instances), with different nearby hops. The top-right image (output 1) shows how traceroutes from two different Ark VPs reach different anycast sites of 1.1.1.1. Using hoiho [1] and IPInfo [2], we infer the location of traceroute hops near the destination, from which we infer the location of the reached anycast site. We find that the Ark VP near ORY, France reaches the 1.1.1.1 site in Paris, and the VP near FRA reaches 1.1.1.1 in Frankfurt. Since 1.1.1.1 implements CHAOS records [3], we obtain ground truth on the actual locations reached (output 2). Next, we plot the ground-truth CHAOS locations and inferred traceroute locations (bottom-right map), where white dots indicate verified traceroute locations, red indicates missed locations, and orange indicates incorrectly inferred locations. This map shows promising results for geolocating anycast sites using traceroute.

Next, we ran this methodology towards 100 known anycast addresses (example shown in output 3). We plot the number of anycast sites inferred using the current state-of-the-art latency-based approach [4] and traceroute, where we found similar enumeration counts.

We implemented all CHAOS and traceroute scripts using scamper. The methodology is promising; the main limitation is falsely classifying unicast as anycast due to neighboring hops in distinct cities. Future work is to improve the methodology to minimize false classifications, and to perform this measurement at large scale to further assess performance and to provide a detailed comparison to the latency-based approach.
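The inference described above can be sketched in a few lines. This is a minimal sketch, not the team's scripts: it assumes each traceroute has already been geolocated (for example with hoiho or IPInfo) into a list of city names per hop, with None where no location could be inferred and with the destination hop excluded. The function names, VP names, and sample paths are hypothetical.

```python
def infer_sites(traces):
    """Group vantage points by the city of the last geolocated hop.

    traces: iterable of (vp_name, hop_cities), where hop_cities lists one
    city per hop before the destination (None if the hop has no location).
    Returns {city: [vp_name, ...]}; each city is a candidate anycast site.
    """
    sites = {}
    for vp, hop_cities in traces:
        # Walk back from the destination to the nearest hop with a location.
        city = next((c for c in reversed(hop_cities) if c is not None), None)
        if city is not None:
            sites.setdefault(city, []).append(vp)
    return sites


def looks_anycast(sites):
    """More than one inferred site suggests anycast.

    As noted above, neighboring hops in distinct cities can make a unicast
    address look anycast, so this is only a first-pass signal.
    """
    return len(sites) > 1


# Hypothetical geolocated paths, for illustration only.
traces = [
    ("vp-1", ["Paris", None, "Paris"]),
    ("vp-2", ["Frankfurt", "Frankfurt", None]),
    ("vp-3", [None, None]),
]

sites = infer_sites(traces)
print(sites, "anycast candidate:", looks_anycast(sites))
```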

[1] Learning to extract geographic information from internet router hostnames, Luckie et al.
[2] IP geolocation Database, https://ipinfo.io/
[3] Requirements for a mechanism identifying a name server instance, RFC4892, Woolf et al.
[4] A fistful of pings: Accurate and lightweight anycast enumeration and geolocation, Cicalese et al.

DNS-RTA: DNS Real-time analysis

Background: DNS security researchers need insights into aggregated statistics regarding DNS infrastructure, such as common hosting infrastructure, performance trends, resolvability patterns, market share of registries. (Reading: https://arxiv.org/pdf/2405.12010 )

Mission: Develop a real-time data processing pipeline leveraging Apache Flink/Clickhouse to analyze data streams from newly registered domains and domains of recently issued certificates.

Intended Outcomes:

Identification and signals for intra-day events, such as DNS record modifications, potential domain hijacking, or abusive activities.
Visualization layer (Grafana?) to present a timeline of measurements.

Shepherds: Raffaele Sommese (U Twente), Antonia Affinito (U Twente)

Team Members: Joseph Khoury (Louisiana State University), Bassel Succar (Louisiana State University)

Outcome: This project aimed to analyze the caching behavior of public DNS providers. Specifically, it focused on understanding how multiple caches within each PoP handle user queries. We ran parallel measurements from all Ark Vantage Points (VPs). We selected a subset of domains from the Tranco list and conducted 30 measurements per resolver for each domain to ensure a comprehensive understanding of cache behavior. These measurements allowed us to aggregate cache counts by domain and compare the caching behavior of popular DNS providers, such as Google’s 8.8.8.8 and Quad9 (9.9.9.9).
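The text does not say how caches were counted from the repeated measurements. One plausible heuristic, which is an assumption here and not necessarily what the team did, is that replies served from the same cache entry count down toward the same expiry time, so distinct values of (query time + remaining TTL) suggest distinct caches. A minimal sketch, with a hypothetical function name and made-up observations:

```python
def count_caches(observations, tolerance=1.0):
    """Estimate the number of distinct caches behind a resolver for one domain.

    observations: iterable of (query_time_s, remaining_ttl_s) pairs.
    Replies whose implied expiry times differ by at most `tolerance`
    seconds from the start of a cluster are attributed to the same cache.
    """
    expiries = sorted(t + ttl for t, ttl in observations)
    caches = 0
    cluster_start = None
    for expiry in expiries:
        if cluster_start is None or expiry - cluster_start > tolerance:
            caches += 1
            cluster_start = expiry
    return caches


# Hypothetical observations: (seconds since start, TTL in the reply).
observations = [
    (0, 300), (5, 295), (10, 290),  # all expire at t=300
    (2, 120), (7, 115),             # both expire at t=122
    (9, 299),                       # expires at t=308
]
print(count_caches(observations))
```

The tolerance absorbs the one-second granularity of TTL values; a real analysis would also need to account for resolvers that cap or rewrite TTLs.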

Our results showed that countries like the U.S. and Singapore had higher cache counts for Google Public DNS, while Quad9 typically had lower cache counts, suggesting stricter caching policies or lower cache retention. The U.S. region had notably higher cache counts, likely due to stronger caching policies or higher query volume.

For Cloudflare, we observed that the maximum cache count was usually 1, in line with the TruffleHunter paper, but in some cases, it reached 2, deviating from the expected behavior.

TracerouteTrigger: Trigger real-time (Ark) measurements based on background (Atlas) probes

Background: RIPE Atlas has broad coverage but limited ability to perform reactive measurements. Meanwhile, we hypothesize that reactive measurements based on signals in RPKI/BGP/Atlas are a good method for outage detection. This project is similar to (and may be merged with) project TracerouteTrigger during the hackathon.

Mission: A potential synergy is to leverage Ark for real-time measurements that react to changes in Atlas background measurements (Ark’s background measurements can serve as an alternative). (BGP) Monitor close-to-real-time BGP updates (RIS-live, RIS kafka (proof of concept, can be made available), or another BGP feed, e.g. routeviews BGP) and reactively probe prefixes that had an announcement change. Note the tradeoff (effectively, a ROC curve) between probing too often and probing only the tail end of an event.
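One simple way to bound how often a prefix is probed is a per-prefix cooldown. This is a minimal sketch of that idea under stated assumptions: the class name ProbeScheduler, the cooldown value, and the event times are hypothetical, and a cooldown is only one of several possible policies for the tradeoff noted above.

```python
class ProbeScheduler:
    """Allow at most one reactive probe per prefix per cooldown window."""

    def __init__(self, cooldown):
        self.cooldown = cooldown  # seconds between probes of the same prefix
        self.last_probe = {}      # prefix -> time of the last accepted probe

    def should_probe(self, prefix, now):
        """Return True (and record the probe) if the prefix is off cooldown."""
        last = self.last_probe.get(prefix)
        if last is not None and now - last < self.cooldown:
            return False
        self.last_probe[prefix] = now
        return True


# Hypothetical stream of (time in seconds, prefix) announcement changes.
events = [(0, "192.0.2.0/24"), (30, "192.0.2.0/24"),
          (31, "198.51.100.0/24"), (61, "192.0.2.0/24")]

scheduler = ProbeScheduler(cooldown=60)
decisions = [scheduler.should_probe(prefix, t) for t, prefix in events]
print(decisions)
```

A short cooldown probes more often and catches the start of an event; a long one saves probes but risks seeing only its tail end.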

Intended Outcomes:

Use continuous RIPE Atlas measurements (for example, probe measurements from the anchor mesh) to determine baseline behaviour (path length, latency).
Sample from outliers + baseline to do more intensive reactive measurements using Ark, e.g. on a baseline latency shift between countries, reactively start measuring the loss to pingable IP addresses in the target AS/country.
(BGP) Have a stream of (probabilistic, temporal) transient events in BGP that are a possible signal for outages or routing changes.
Determine the reference level (latency, path length, …) for targets, and use active measurements to improve the reference data and to confirm the signal.
Evaluate which thresholds in the BGP signal are sufficient to trigger active probing.

Shepherds: Ties de Kock (RIPE NCC)

Team Members: Lion Steger (TU Munich), Max Gao (CAIDA/UC San Diego), Alex Maennel (TU Dresden), Johannes Zirngibl (Max Planck Institute for Informatics)

Outcome:

Our system performed traceroutes from all Ark nodes to a pingable IP address in the target prefix almost instantly (first packet: < 0.5s) after an event was detected. Our system targeted a known pingable IP address (via the ISI hitlist) in the prefix for which the event was detected.
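Choosing the traceroute target for an affected prefix amounts to looking up a known-responsive address inside that prefix. A minimal sketch, assuming the hitlist is available as a plain list of address strings; the function name pick_target and the sample addresses (documentation ranges) are hypothetical.

```python
import ipaddress


def pick_target(prefix, hitlist):
    """Return the first hitlist address that falls inside `prefix`, or None."""
    network = ipaddress.ip_network(prefix)
    for addr in hitlist:
        if ipaddress.ip_address(addr) in network:
            return addr
    return None


# Hypothetical hitlist entries, for illustration only.
hitlist = ["198.51.100.7", "192.0.2.33", "192.0.2.200"]

print(pick_target("192.0.2.0/24", hitlist))
print(pick_target("203.0.113.0/24", hitlist))
```

A linear scan is fine for a sketch; with a full hitlist, an index keyed by prefix would keep lookups fast enough to preserve the sub-second reaction time.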

We detected events using outlier detection on the continuous RIPE Atlas Anchor Mesh measurements, as well as by monitoring which prefixes experienced a recent RPKI change. Our RPKI stream is close to real time because it monitors 10+ RPKI vantage points.
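The text does not state which outlier detector was used. As an illustration only, a robust and common choice is to compare a new latency sample against the median of a baseline window, scaled by the median absolute deviation (MAD). The function name, threshold, and sample values below are assumptions.

```python
from statistics import median


def is_outlier(baseline, value, threshold=5.0):
    """Flag `value` if it deviates from the baseline median by more than
    `threshold` times the median absolute deviation (MAD)."""
    med = median(baseline)
    mad = median(abs(x - med) for x in baseline)
    if mad == 0:
        # Perfectly flat baseline: any change at all is a deviation.
        return value != med
    return abs(value - med) / mad > threshold


# Hypothetical baseline RTTs (ms) between two anchors, for illustration only.
baseline = [20.1, 19.8, 20.3, 20.0, 19.9]

print(is_outlier(baseline, 20.2))  # small fluctuation
print(is_outlier(baseline, 35.0))  # large latency shift
```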

We evaluated the routing changes by comparing the results to traceroutes from the Internet Topology Data Kit (ITDK). This is a sparse baseline (we only have the ground truth from one Ark probe).

Future work is to evaluate the dynamic changes in routing after an event, potentially even measuring the ground truth before an RPKI ROA starts to affect Internet routing.

Using Public-Cloud-Providers for opportunistic passive network measurements

Background: Nowadays many internet services are deployed in public clouds. Measuring the Internet background radiation such a service receives is important to understand potential security threats. Traditional telescopes may not provide all the visibility needed, so another lens in the cloud, living next to the Internet services, seems like a promising idea to get important insights.

Mission: Leverage spot instances and similar cloud-specific services (e.g. Lambda) to build a widely distributed network telescope. It should work on major public cloud providers using a uniform interface.

Intended Outcomes:

A list of cloud services and providers that can capture traffic.
A cost/benefit analysis of the services, providers and associated IP leasing.
A uniform interface to schedule measurements and collect results.
Evaluation of whether it's possible to influence IP allocation.

Shepherds: Bernhard Degen (U Twente/CAIDA), Nils Kempen (University of Muenster), Ricky Mok (CAIDA/UC San Diego)

Team Members: Syed Mujtaba (CAIDA/UC San Diego), Tanmay Nale (CAIDA/UC San Diego)

Outcome: This graph shows (a subset of) Ark monitors at most five (responsive) hops from DigitalOcean regions. We created it by tracing to all VMs in the regions that we have in 4 cloud providers, from 5 random Ark nodes for each VM. We then imported the traceroutes into Neo4j, along with region, Ark monitor, and cloud provider information.
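The "at most five responsive hops" selection can be expressed as a filter over the collected traceroutes. This is a minimal sketch in plain Python, not the Neo4j queries the team used. It assumes each record holds a monitor name, a cloud region, and the hops before the destination, with None marking hops that did not respond. All names and addresses are hypothetical.

```python
from collections import defaultdict


def monitors_near_regions(traces, max_hops=5):
    """Map each cloud region to the monitors within `max_hops` responsive hops.

    traces: iterable of (monitor, region, hops), where hops has one entry
    per TTL before the destination and None for non-responsive hops.
    """
    near = defaultdict(set)
    for monitor, region, hops in traces:
        responsive = sum(1 for hop in hops if hop is not None)
        if responsive <= max_hops:
            near[region].add(monitor)
    return {region: sorted(monitors) for region, monitors in near.items()}


# Hypothetical traceroutes, for illustration only.
traces = [
    ("mon-a", "region-1", ["192.0.2.1", None, "192.0.2.9", "198.51.100.1"]),
    ("mon-b", "region-1", ["203.0.113.%d" % i for i in range(1, 8)]),
    ("mon-c", "region-2", [None, None, None, "203.0.113.1", "203.0.113.2"]),
]

print(monitors_near_regions(traces))
```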