CAIDA's Annual Report for 2025

A report on CAIDA research initiatives, project progress and results, data sets, tool development, publications, presentations, workshops, web site statistics, and operating expenses for 2025.

Mission Statement: CAIDA investigates practical and theoretical aspects of the Internet, focusing on activities that:

  • provide insights into the macroscopic function of Internet infrastructure, behavior, usage, and evolution,
  • foster a collaborative environment in which data can be acquired, analyzed, and (as appropriate) shared,
  • improve the integrity of the field of Internet science,
  • inform science, technology, and communications public policies.

Executive Summary

In 2025, CAIDA completed the four-year GMI3S Design Phase, designing and prototyping the next generation of Internet measurement infrastructure to support collection, curation, archiving, and expanded sharing of the data and tools essential for understanding and strengthening the security, stability, and resilience of the Internet. The project enabled us to redesign, expand, and modernize several core data collection platforms that capture traffic data, routing (BGP) data, active measurements, and DNS data. Our efforts spanned four components: highly distributed measurement and data acquisition infrastructure capable of capturing several types of security-relevant data and hosting vetted experiments; data management infrastructure supporting data usability, curation, discovery, and sharing; data analytics platforms providing interactive access to strategic derived datasets that reveal vulnerabilities, risks, and insights for mitigating threats to Internet infrastructure; and outreach engaging stakeholders in infrastructure development, use, evaluation, and evolution, while scaling up STEM workforce training. We proposed an Implementation Phase to deploy a subset of this designed infrastructure — focused on the active measurement platform and the ancillary datasets that make its measurements interpretable — which remains pending as of June 2026.

In the meantime, we received funding for five new projects: building dual-stack IPv4/IPv6 telescopes and honeypots across distributed vantage points to collect datasets for analyzing malicious Internet activity; creating modules that enable our telescope data to support ML/AI-driven cybersecurity research; building a routing observatory and operational dashboard to improve the security and resilience of U.S. research and education network infrastructure; developing cyberinfrastructure-ready, data-driven cybersecurity training resources using real-world datasets; and applying AI techniques to infer the utility of datasets and software tools based on their documented use in scientific publications.

These investments in infrastructure continued to pay dividends for the community. CAIDA researchers co-authored 14 peer-reviewed papers in 2025, at venues including ACM SIGCOMM, IMC, CoNEXT, PAM, and TMA. Use of many CAIDA data resources grew by as much as two orders of magnitude over 2024: our AS Rank API received requests from nearly 2 million unique IPs, and our most popular public dataset (Routeviews IPv4 Prefix-to-AS) was downloaded by more than 56,000 unique users. We published two new Internet Topology Data Kits — the latter incorporating MPLS tunnel detection for the first time — and hosted the final two GMI-AIMS workshops, whose hackathons yielded six publications in 2025 with more in progress. We also launched ESCALATE, a new NSF CyberTraining project that develops hands-on cybersecurity education modules using real-world Internet datasets on national high-performance computing resources.

These activities kept us busy enough that we did not compile this report until June 2026. The report highlights our work across these projects, covering what we built, improved, and learned in 2025.

Data Measurement and Acquisition Infrastructure

High-level view of CAIDA's Data Acquisition Infrastructure reflecting our modular and comprehensive approach to building an integrated system that supports research across many aspects of the Internet

We completed the GMI3S Design Phase, during which we designed and prototyped multiple next-generation Internet measurement infrastructure components to support cybersecurity and networking research. The project addressed critical gaps in collecting, organizing, and using Internet data, particularly data about security vulnerabilities whose mitigation faces collective-action barriers. The primary project outcome was a foundation for an Implementation Phase that will provide the research community with critical tools for understanding and improving Internet security and resilience. After incorporating feedback from two community workshops (AIMS-5 and AIMS-6), we published a comprehensive Monitor Specification Report defining the design and capabilities of next-generation Internet measurement system components.

Active Measurement Infrastructure

New active measurement platform architecture that includes a programming environment. Based on feedback from community stakeholders over three years of workshops and online meetings, we developed a solution to optimize the tradeoff between capability and security. We designed and prototyped a Python-based integrated programming environment that exposes both a set of distributed vantage points (VPs) and a set of useful measurement primitives from which to build sophisticated measurement tools. Because researchers are restricted to the available measurement primitives, the environment makes it difficult for them to cause harm, intentionally or not. Interested researchers and partners showed significant uptake, using and evaluating our prototype deployment. In 2025, our Ark measurement infrastructure expanded to approximately 300 active VPs: a mix of Raspberry Pis, virtual machines, and containers across multiple continents. Our prototype VP management system (Arkmon) facilitated deployment of 103 containerized nodes across multiple fleets: NRP (56), Vultr (32), DREN (8), M-Lab (3), and a few one-off home deployments.

Sample view of the Arkmon web-based system for configuring and managing Ark node deployments

Arkmon node management system. We designed Arkmon, a prototype for secure, API-based coordination of globally distributed monitors that enables per-node configuration, automated tasking, and external host access. Arkmon replaced a legacy system that had accumulated significant technical debt. The new system will provide a Keycloak-secured API as the single point of entry for all operational scripts, including SSH configuration, metadata generation, container management, and Spoofer integration, all accessible from outside the CAIDA firewall for the first time. A web interface will support host-facing monitor status, volunteer onboarding workflows, and administrative tools. New features include tracking the measurement primitives allowed on each monitor and auto-generating tasks for hosts, laying the groundwork for more granular control over which measurements can run on which vantage points.
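Tracking the primitives allowed on each monitor amounts to an allow-list check performed before a task is dispatched to a vantage point. A minimal sketch, assuming a simple in-memory model; the `Monitor` class, `schedule` function, and primitive names are hypothetical and do not reflect Arkmon's actual API or schema:

```python
from dataclasses import dataclass, field


@dataclass
class Monitor:
    """A vantage point and the measurement primitives its host has allowed (hypothetical model)."""
    name: str
    allowed_primitives: set[str] = field(default_factory=set)


class PrimitiveNotAllowed(Exception):
    """Raised when a task requests a primitive the monitor does not allow."""


def schedule(monitor: Monitor, primitive: str, target: str) -> dict:
    """Build a task description only if the primitive is allowed on this monitor."""
    if primitive not in monitor.allowed_primitives:
        raise PrimitiveNotAllowed(f"{primitive!r} is not allowed on {monitor.name}")
    return {"monitor": monitor.name, "primitive": primitive, "target": target}
```

For example, a home-hosted VP created as `Monitor("home-vp-1", {"ping", "trace"})` accepts `schedule(home, "ping", "192.0.2.1")` but rejects a request for any primitive outside its set before the task ever reaches the node.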

Measurement Pipeline for Large-Scale Deployment of Cloud Vantage Points. We launched Cloud Ark, using Terraform automation to deploy Ark measurement containers on commercial cloud platforms (AWS, Azure), automatically tagged by region and airport code and connected to Arkmon. A parallel Terraform pipeline supports five cloud providers (AWS, Google, Azure, DigitalOcean, Vultr) with IPv4 and IPv6 for broader measurement campaigns. Two new scamper releases added support for path MTU discovery and early work on the QUIC transport protocol, along with improved Python interfaces for researcher workflows.


One-Way Traffic (Internet Background Radiation)

Architecture of UCSD-NT & STARDUST and STARNOVA.

We continued to expand and modernize the UCSD Network Telescope (UCSD-NT), the world’s largest scientific Internet traffic observatory. Using the 400G traffic aggregator deployed in 2024, we began collecting unsolicited traffic from SDSC’s production network. We identified up to 58 unique IP addresses (either network or broadcast addresses in the SDSC production network) that we can leverage to capture ingress traffic as Internet background radiation. To keep up with relentless growth in this type of traffic, we prototyped a new containerized traffic capturing and monitoring framework, which builds on Open vSwitch to mirror and forward incoming packets to different containers. This framework will lower overhead and increase the reliability of UCSD-NT during surges in incoming traffic.

Overview of the iVoyager infrastructure.

iVoyager: IPv6 telescopes. We launched iVoyager (Internet Voyager for Gathering Cyber Threat Intelligence), a new cyberinfrastructure project to expand CAIDA’s ability to gather cyber threat intelligence by deploying distributed dual-stack (IPv4 and IPv6) telescopes and honeypots. Existing network telescopes passively capture only limited types of security events, and the vast IPv6 address space makes traditional scanning-based detection infeasible. iVoyager will provide a flexible virtualized environment for rapidly deploying distributed telescope and honeypot vantage points, including in public clouds. We deployed a proactive IPv6 telescope at NeocomISP that used BGP prefix announcements, hitlist seeding, SSL certificates, and honeypots to attract scanners, capturing the largest volume of unsolicited IPv6 traffic reported in the literature (Unveiling IPv6 Scanning Dynamics: A Longitudinal Study Using Large Scale Proactive and Passive IPv6 Telescopes, CoNEXT). The project builds directly on STARNOVA’s preliminary IPv6 telescope results.

Distributed (including cloud) telescopes. To explore the feasibility of distributed telescope deployment, a team of students investigated commercial cloud platforms (AWS, Azure, GCP, DigitalOcean, Vultr) during a hackathon at the AIMS-5 workshop in February 2025. In April 2025, the team captured two weeks of multi-region background radiation data across all five providers simultaneously — the first such cloud-distributed telescope experiment reported in the literature; a manuscript reporting the analysis results will be presented at TMA 2026.

Two-Way Traffic

We continued monthly one-hour captures of two-way traffic on a 100 Gbps backbone link. When the original Los Angeles–San Jose link was upgraded to 400 Gbps in December 2024, we switched to a different 100 Gbps link between Los Angeles and Dallas to retain our capture capability. We split the resulting data into two catalog datasets: Anonymized Two-Way Traffic Packet Header Traces 2024 (100G) (April–November) and 2025 Traces 100G (January onward). We also provide an Anonymized Two-Way Traffic Packet Header Traces 100G (5 sec) sampler (~880 MB) that lets researchers evaluate the data before committing to large downloads, and we publish a publicly available metadata dataset with summary statistics and visualizations of data rates and packet size distributions.

Infrastructure for Data Management, Discovery, Usability

In 2025, we continued enhancing our data management and discovery infrastructure, introducing new tools and tutorials for data processing, validation of data integrity, automated citation tracking, and resource access management. New capabilities included data integrity tooling for telescope and BGP data, a significantly expanded Resource Access Management (RAM) portal, and an automated system to discover publications using CAIDA resources.

Telescope Data Integrity Tools. In collaboration with TU Dresden, we completed a systematic study of UCSD-NT’s packet-capturing reliability and of the correctness of the prefix filters used to identify unsolicited traffic. The study provides background on the history of the telescope and focuses on the operational challenges that grow as the underlying network evolves. We developed and applied techniques that leverage third-party scanning activity to validate the integrity of the data and to discover misconfigurations in the instrumentation, illustrated the implications of our analysis with concrete examples, and discussed how our findings generalize to other passive techniques for tracking security phenomena, such as honeypots. The resulting paper, Lessons Learned from Operating a Large Network Telescope, was accepted to and presented at the ACM SIGCOMM 2025 Experience Track.
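One way to use third-party scanning as an integrity check: a scanner that sweeps the whole IPv4 space during a window should reach every /24 block covered by the telescope's prefix filters, so blocks that record no packets from it point to capture gaps or filter misconfigurations. The sketch below is a simplified illustration of that idea, not the method from the paper; it assumes IPv4 input prefixes of /24 or shorter and a complete sweep by the chosen scanner, and the function name is hypothetical.

```python
import ipaddress


def missing_blocks(telescope_prefixes, scanner_dsts):
    """Return telescope /24 blocks that received no packets from a full-sweep scanner.

    telescope_prefixes: IPv4 CIDR strings covered by the capture filter (/24 or shorter).
    scanner_dsts: destination addresses of packets captured from one scanner.
    Assumes the scanner swept every /24 during the window, so a silent block
    hints at a capture gap or a prefix-filter misconfiguration.
    """
    expected = set()
    for prefix in telescope_prefixes:
        expected.update(ipaddress.ip_network(prefix).subnets(new_prefix=24))
    # Map each observed destination address to its enclosing /24.
    seen = {ipaddress.ip_network(f"{dst}/24", strict=False) for dst in scanner_dsts}
    return sorted(expected - seen)
```

With a /23 filter and scanner packets seen only in its lower half, `missing_blocks(["198.51.100.0/23"], ["198.51.100.7", "198.51.100.200"])` reports the upper /24 as silent. In practice one would combine several independent scanners before concluding that a block is misconfigured rather than simply skipped.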

Training materials for use of HPC resources for traffic analysis. The huge volume of telescope traffic presents challenges for data analysis. Historically, we mainly provided virtual machines (VMs) on which researchers could run their own code against these highly sensitive data. The computational capacity of our current VMs has reached its limits, prompting the need for a redesign. As we explore the most cost-effective architecture, we have leveraged the SDSC Expanse supercomputer (allocations available through ACCESS) to process telescope data, and offered this mode to other researchers who qualify for ACCESS allocations. Ricky Mok, CAIDA research scientist, and Max Gao, CAIDA graduate student, developed tutorials on telescope data analysis using the SDSC Expanse supercomputer. We conducted a hands-on tutorial session during AIMS 2025 and published the tutorial (slides).

Overview of the Curated AI-ready Network Telescope Datasets for Internet Security (CANIS) infrastructure.

Curated AI-ready Network Telescope Datasets for Internet Security (CANIS). We began a new NSF-funded (CICI) project to transform the applicability of the UCSD Network Telescope for AI-driven cybersecurity research. As AI tools are increasingly applied to telescope data by researchers with varying levels of expertise, data integrity and provenance become critical. CANIS addresses this through three efforts: a monitoring framework that uses active measurements and BGP data to continuously verify telescope data integrity; enhanced metadata pipelines that tag source IPs against blocklists and malware indicators; and a curated library of labeled reference datasets for benchmarking ML models. The project will also migrate telescope data from the legacy FlowTuple format to Apache Parquet for compatibility with modern ML/AI workflows. (CANIS funding website)
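Tagging source IPs against blocklists reduces to prefix containment checks. A minimal sketch, assuming each blocklist is available as a list of CIDR strings; the list names, data layout, and `tag_sources` function are hypothetical, and a production pipeline would likely use a prefix trie rather than this linear scan:

```python
import ipaddress


def tag_sources(src_ips, blocklists):
    """Map each source IP to the sorted names of blocklists whose prefixes contain it.

    blocklists: dict of list name -> iterable of CIDR strings (IPv4 or IPv6).
    """
    parsed = {name: [ipaddress.ip_network(c) for c in cidrs]
              for name, cidrs in blocklists.items()}
    tags = {}
    for ip in src_ips:
        addr = ipaddress.ip_address(ip)
        # `addr in net` is simply False when address and network families differ.
        tags[ip] = sorted(name for name, nets in parsed.items()
                          if any(addr in net for net in nets))
    return tags
```

Returning an empty list for untagged sources (rather than omitting them) keeps every telescope source represented in the metadata, which matters when the tags later serve as labels for ML benchmarks.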

Resource Access Management (RAM) Portal. Our Resource Access Management (RAM) portal project expanded in 2025 to support access to multiple CAIDA resources with centralized user permission management and automated approval workflows. Users complete a one-time registration and can then request access to any supported dataset by submitting a brief use-case justification; access to public datasets is granted automatically upon submission. Backend improvements enable individual user permission tracking and request audit trails; the frontend gives both users and administrators clear visibility into request status and approval history. This portal represents CAIDA’s primary mechanism for shifting from fully open downloads to lightweight authenticated access — providing more reliable usage metrics for funding agencies without unduly burdening researchers.

CAIDA Internet Research Resource Catalog. In 2025, we continued enhancing CAIDA’s Internet Research Resource Catalog, which serves researchers worldwide with a comprehensive collection of publications, datasets, software, and related resources linked through rich cross-references. A key new capability was ExPub, a tool that automates the discovery, curation, and metadata extraction for external publications using CAIDA datasets. ExPub integrates Google Scholar results, automatically parses metadata from major academic platforms (IEEE, ACM, Springer, and MDPI), and generates YAML-formatted entries that feed directly into the catalog — significantly reducing the manual effort required to maintain our index of non-CAIDA publications that use CAIDA data and software resources. We continued expanding catalog collections that group related resources for improved discoverability, building on the Acceptable Use Agreement (AUA) integration and collection support added in 2024.

BGP Data Integrity Tools. In 2025, we developed new tooling to improve the integrity and reliability of BGP routing data collected through RouteViews and other sources. MRT archives from RouteViews and RIPE RIS are essential for research and operational insight, but they sometimes contain corruption from storage errors or software bugs, producing misleading results if undetected. The MRT dissection tool, a fast C-based open-source utility, lets researchers and data administrators check MRT files for corruption before use — tracing anomalies back to their original BGP bytes and linking each affected field to the relevant RFC. This enables diagnosis at the byte level rather than inference from downstream parsing failures. A complementary metadata analysis pipeline filters noisy update bursts from RouteViews archives, reducing file sizes by up to 95% while preserving research-relevant content. We are working with the RouteViews team on annotating or pruning noisy update periods prior to distribution, improving the reliability of BGP data shared with the research community. We also built and tested an MRT-tools-based utility that de-duplicated update-record components — everything except the timestamp and routes (NLRIs), including elements like AS Path and Communities — across every update RouteViews stored in January 2024. More than 9 in 10 records were exact duplicates, yielding roughly 40% disk savings. While significant, this result does not appear transformative, so we will not pursue it further in this project. All the code will be open source at https://github.com/CAIDA/mrt-tools.
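The first structural check such a tool can make is framing. Per RFC 6396, every MRT record begins with a 12-byte common header (timestamp, type, subtype, length), where the length field counts the message bytes that follow. A record whose length runs past the end of the data, or whose type is not one RFC 6396 currently defines, is a strong hint of corruption. The Python sketch below illustrates only that header walk; it is not the C-based MRT dissection tool, and it does not parse BGP message bodies.

```python
import struct

# MRT common header (RFC 6396, section 2): timestamp, type, subtype, length.
MRT_HEADER = struct.Struct("!IHHI")
# Non-deprecated MRT types defined in RFC 6396 (e.g., 13 = TABLE_DUMP_V2, 16 = BGP4MP).
KNOWN_TYPES = {11, 12, 13, 16, 17, 32, 33, 48, 49}


def check_mrt(data: bytes):
    """Walk MRT records by their length fields and report framing problems.

    Returns (records_framed, problems), where problems is a list of
    (byte_offset, description). Message bodies are not parsed.
    """
    offset, count, problems = 0, 0, []
    while offset < len(data):
        if len(data) - offset < MRT_HEADER.size:
            problems.append((offset, "truncated header"))
            break
        _ts, mrt_type, _subtype, length = MRT_HEADER.unpack_from(data, offset)
        end = offset + MRT_HEADER.size + length
        if end > len(data):
            # A corrupted length field makes every later record unreadable.
            problems.append((offset, f"length {length} overruns end of data"))
            break
        if mrt_type not in KNOWN_TYPES:
            problems.append((offset, f"unexpected type {mrt_type}"))
        count += 1
        offset = end
    return count, problems
```

RouteViews archives are bzip2-compressed, so in practice the input bytes would come from something like `bz2.open(path, "rb").read()`. Reporting byte offsets, rather than just failing, is what makes it possible to trace an anomaly back to the raw record.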

Proposed workflow of the IDSU project, inferring scientific dataset usage by training large language models (LLM)

IDSU: Inferring Data Set Utility. Sustaining data collection and curation is expensive, and AI-ready datasets will cost even more to sustain. Funding agencies need ways to assess which datasets meaningfully advance scientific discovery. For years CAIDA has tracked use of its data in publications, partly to support CAIDA’s catalog linking papers to the data or software resources they use. But even state-of-the-art extraction methods cannot handle the complexity and contextual nuance of natural language well enough to both identify a resource and discern whether the paper merely mentions or actually uses it. In 2025 we received an EAGER award (NSF OAC-2526448) to apply AI to this task, and it proved far harder than it sounded. Before we could develop or evaluate anything, we had to hand-label a ground-truth set of papers, capturing the exact sentence that confirmed use rather than a passing reference, and build a separate list of target resources (favoring publicly funded or publicly accessible ones). On the algorithm side, we formalized the full extraction-and-matching workflow (prompt creation, LLM execution, response processing, and evaluation), then ran a comparative evaluation across multiple LLMs, since model performance varies sharply by task. We also tested matching techniques and prompt designs, finding that two constructed few-shot examples outperformed full-paper examples while using far fewer tokens. Finally, we built a pipeline to process, aggregate, and score LLM responses against ground truth.
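Scoring LLM responses against ground truth can be framed as a set comparison over (paper, resource) pairs labeled as actual use. A minimal sketch, assuming responses have already been normalized to such pairs; the function names and identifiers are hypothetical. Under this framing, a model that reports a resource the paper merely mentions produces a false positive.

```python
def score(predicted, truth):
    """Precision, recall, and F1 over (paper_id, resource_id) 'uses' pairs."""
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)  # correctly identified uses
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def compare_models(responses, truth):
    """Score each model's normalized predictions against the same ground truth."""
    return {model: score(pairs, truth) for model, pairs in responses.items()}
```

For instance, a model that finds two of three labeled uses and adds one spurious pair scores 2/3 on precision, recall, and F1. Comparing models on identical ground truth is what makes the task-dependent performance differences visible.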


Internet (Security and Performance) Data Analytics Platforms and Supporting Software

We evaluate and demonstrate the utility and integrity of our own infrastructure and data by building data analytics tools and platforms that reveal vulnerabilities, risks, and crucial insights for strengthening resilience and mitigating threats to Internet infrastructure. In 2025, highlights included the AVOID project advancing to field deployments in U.S. Department of War (DoW)-relevant regions with a commercial spinout (Revelare Networks), DarkSim evaluation scaling to SDSC Expanse HPC resources, our speed test research program expanding to a seventh platform while publishing new findings on network topology and CDN infrastructure, and our new Routing Observatory (ROOTBEER) project, a collaboration with Internet2.

Inferring Security Properties of Internet Topology (AVOID)

The AVOID prototype design ensures that communications for devices in the tactical bubble only traverse benign commercial cellular network base stations.

In collaboration with Johns Hopkins and USC/ISI on the AVOID project, we developed a new API and backend system that accepts traceroute measurements and returns inferred locations, networks, and router vendors along the path, integrating data from CAIDA’s Internet Topology Data Kit. We designed and prototyped a new traceroute visualization and annotation tool, Pathfinder (login required) to give users a deeper view into global Internet topology. Users can submit requests to run, search, or annotate traceroutes, and receive rich information about the organizations, countries, and router vendors associated with each IP address along the path. Users can make requests through a web interface or an API. If a new traceroute is requested, Pathfinder hands it off to Arktrace, which drives Ark monitors to run the traceroute and send the results back. Pathfinder then annotates each IP using a mix of passive data sources (including BGP data) and active measurements.
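Pathfinder's annotation logic is not described in detail here, but a standard building block for mapping hop addresses to networks with BGP data is longest-prefix matching against a prefix-to-AS table, such as the Routeviews Prefix-to-AS dataset. A minimal sketch with a linear scan for clarity (a radix tree would be used at scale); the function name and input format are hypothetical:

```python
import ipaddress


def annotate_path(hops, pfx2as):
    """Annotate traceroute hops with origin ASes by longest-prefix match.

    hops: list of IP strings, with None for unresponsive hops ('*').
    pfx2as: iterable of (prefix, asn) pairs; this input format is an assumption.
    """
    table = [(ipaddress.ip_network(p), asn) for p, asn in pfx2as]
    annotated = []
    for hop in hops:
        if hop is None:
            annotated.append((None, None))
            continue
        addr = ipaddress.ip_address(hop)
        # Collect every covering prefix, then keep the most specific one.
        matches = [(net.prefixlen, asn) for net, asn in table if addr in net]
        annotated.append((hop, max(matches)[1] if matches else None))
    return annotated
```

Hops with no covering prefix come back with no AS; those are the cases where other passive sources or active measurements have to fill in.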

For a separate task of the AVOID project, we deployed a prototype avoidance routing overlay on Ark nodes in eight cities worldwide (Frankfurt, Hong Kong, New York, Los Angeles, London, Paris, Sydney, Warsaw). We began migrating overlay software to Docker containers for compatibility with Defense Research and Engineering Network (DREN) infrastructure.

Unfortunately, funding for this project ended in 2025; we hope to continue this exciting work when resources allow.

Inferring Security Properties of Internet Traffic to Scalably Detect Anomalies

We continued development of DarkSim, our similarity-based framework for detecting anomalies in network telescope traffic time series, in collaboration with Northwestern University. Building on the benchmark results published at ACM IMC 2024, we designed labeled datasets and evaluation benchmarks for comprehensively assessing DarkSim’s accuracy and robustness, and began reproducing three state-of-the-art comparison methods on SDSC Expanse to enable large-scale, reproducible evaluation. A primary infrastructure challenge — network capacity between CAIDA storage and SDSC Expanse — was identified as a pipeline bottleneck; we began exploring the Open Science Data Federation (OSDF) as a data hosting solution to accelerate data transfer.

Scientific Analysis of Network Performance

Reproducible Assessment of BroadBand Internet Topology and Speed (RABBITS). CAIDA’s speed test research program addresses two complementary questions about the commercial speed test infrastructure used by hundreds of millions of people: whether tests produce reliable and reproducible results, and what network topology and routing conditions explain the performance they report. In 2025, we advanced both lines of work. We continued to update and enhance our prototype of RABBITS, a suite of scripts that leverages headless browsers to perform speed tests with customizable measurement parameters. In addition to the speed test platforms we previously supported (Ookla, Fast.com, Cloudflare, Speedof.me, and M-Lab), we added support for the speed test platform hosted by DREN, which provides the research and engineering backbone network for the U.S. Department of War. We also implemented an analysis program to extract measurement-related events from tens of thousands of browser log entries. We developed JSON-based data formats to report details of HTTP transactions (e.g., URLs, HTTP headers, TCP socket information) and network performance metrics (e.g., latency and instantaneous throughput). This design significantly reduces the time-to-insight for RABBITS users by enabling efficient analysis of complex browser-generated data.
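As an illustration of how a structured record shortens time-to-insight, the sketch below derives instantaneous throughput from cumulative byte counts sampled during a test. The record layout (`samples`, `t_ms`, `bytes`) and the function name are hypothetical; the actual RABBITS JSON schema is not described here.

```python
import json

# Hypothetical record layout, for illustration only.
record_json = """{
  "platform": "example",
  "samples": [
    {"t_ms": 0,    "bytes": 0},
    {"t_ms": 500,  "bytes": 6250000},
    {"t_ms": 1000, "bytes": 15000000}
  ]
}"""


def instantaneous_throughput_mbps(samples):
    """Throughput between consecutive cumulative byte-count samples, in Mbit/s."""
    rates = []
    for prev, cur in zip(samples, samples[1:]):
        dt_s = (cur["t_ms"] - prev["t_ms"]) / 1000
        rates.append((cur["bytes"] - prev["bytes"]) * 8 / dt_s / 1e6)
    return rates


record = json.loads(record_json)
print(instantaneous_throughput_mbps(record["samples"]))
```

Here the two half-second intervals work out to 100 and 140 Mbit/s. Keeping raw samples rather than a single summary number lets users see ramp-up behavior that a platform's headline result hides.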

We discovered that Ookla enforces rate-limiting policies on its RESTful API for querying test server information. We updated our server-crawling scripts to respect these limits and utilized public proxies to increase the success rate of server queries. These improvements have paved the way for operationalizing RABBITS for longitudinal data collection. A CAIDA Ph.D. student investigated the dynamics of test server selection logic in Fast.com using CAIDA’s Ark vantage points. We published part of these findings at the Passive and Active Measurement (PAM) Conference 2025 (Appendix A of An Integrated Active Measurement Programming Environment).
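Respecting a server-side rate limit typically means spacing requests and backing off when the server signals overload. The sketch below assumes the API signals rate limiting with HTTP 429 (Too Many Requests) and uses placeholder intervals, not Ookla's actual limits. The class name is hypothetical, and the HTTP call is injected as a `fetch` callable so the sketch stays library-agnostic.

```python
import time


class RateLimitedFetcher:
    """Space out requests and back off exponentially on HTTP 429.

    fetch: callable(url) -> (status_code, body). min_interval and
    max_retries are placeholders, not any provider's published limits.
    """

    def __init__(self, fetch, min_interval=1.0, max_retries=5,
                 sleep=time.sleep, clock=time.monotonic):
        self.fetch = fetch
        self.min_interval = min_interval
        self.max_retries = max_retries
        self.sleep, self.clock = sleep, clock
        self._last = None

    def _wait_turn(self):
        """Sleep just long enough to keep min_interval between requests."""
        if self._last is not None:
            remaining = self.min_interval - (self.clock() - self._last)
            if remaining > 0:
                self.sleep(remaining)
        self._last = self.clock()

    def get(self, url):
        backoff = self.min_interval
        for attempt in range(1, self.max_retries + 1):
            self._wait_turn()
            status, body = self.fetch(url)
            if status != 429:  # 429 = Too Many Requests
                return status, body
            if attempt < self.max_retries:
                self.sleep(backoff)  # exponential backoff before retrying
                backoff *= 2
        raise RuntimeError(f"rate-limited on {url} after {self.max_retries} attempts")
```

Injecting `sleep` and `clock` also makes the pacing testable without real delays, which helps when tuning a crawler for long-running longitudinal collection.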

FlowTrace+. We continued developing FlowTrace+ using SmartNICs, DPDK, and eBPF, and evaluated it on FABRIC. The prototype intercepts iPerf3 measurement flows initiated by clients and injects measurement probes into the same flow. It filters and records timestamps of both probe and response packets using the corresponding libraries and devices. Our analyses focused on client-to-hop latency and its alignment along the end-to-end path to identify potential bottlenecks on paths to speed test servers. We compared the DPDK and eBPF prototype implementations and found that DPDK yields higher throughput due to reduced kernel overhead; we plan to deploy FlowTrace+ on AWS to intercept real-world Ookla speed test flows. As part of this project we also analyzed traceroute data from RIPE Atlas, CAIDA’s Ark, and M-Lab to examine correlations between geographic distance and network latency to speed test servers, developing algorithms to detect routing inefficiencies that inflate user-experienced delay. We also studied CDN and cloud infrastructure serving popular websites, finding that even widely used platforms do not deploy content uniformly across all regions, with measurable performance consequences for some users.
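A common first-order check for routing inefficiency compares a measured RTT with the lower bound implied by geographic distance: light in fiber covers roughly 200 km per millisecond, so the round-trip floor is about 2d/200 ms for a great-circle distance of d km. The sketch below computes that inflation ratio. It is a generic heuristic, not the algorithm the project developed, and the function names are illustrative; real paths never follow great circles, so ratios modestly above 1 are expected.

```python
import math

EARTH_RADIUS_KM = 6371.0
FIBER_KM_PER_MS = 200.0  # roughly 2/3 of the speed of light in vacuum


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rtt_inflation(rtt_ms, client, server):
    """Ratio of measured RTT to the speed-of-light-in-fiber RTT floor."""
    d_km = great_circle_km(*client, *server)
    min_rtt_ms = 2 * d_km / FIBER_KM_PER_MS
    return rtt_ms / min_rtt_ms if min_rtt_ms > 0 else float("inf")
```

For example, `rtt_inflation(200.0, (0, 0), (0, 90))` returns just under 2.0, meaning the measured RTT is about twice the physical floor for a quarter of the Earth's circumference. Ranking client-server pairs by this ratio is one way to surface candidates for closer traceroute inspection.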

Extraction of MPLS topology from traceroute data

One problem in traceroute analysis is that certain types of Multiprotocol Label Switching (MPLS) tunnels hide routers from traceroute output, and there is no simple way to detect or reveal the missing routers. In collaboration with Johns Hopkins, we replicated previous work by Vanaubel et al. (Through the wormhole: tracking invisible MPLS tunnels) to characterize and provide a snapshot of the current deployment of MPLS tunnels. We implemented an MPLS detection tool, PyTNT, and used it to show that the problematic types of MPLS tunnels remain prevalent, although we inferred a general decrease in MPLS usage across the Internet. We also inferred that public clouds accounted for 3 of the top 10 networks with the most routers observed in MPLS tunnels. Finally, we observed more MPLS routers in Europe than in any other continent, and more MPLS routers in the U.S. than in any other country. We presented this work, Replication: Characterizing MPLS Tunnels over Internet Paths, at IMC 2025 in October 2025.
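TNT-style detection relies on TTL signatures. One trigger, FRPLA (forward/return path length analysis), compares a hop's position in the forward traceroute with the return-path length implied by its reply TTL; a return path noticeably longer than the forward path suggests routers hidden from the forward trace. The sketch below is a simplified, single-reply version of that comparison, not PyTNT's implementation. It assumes the responder used one of the common initial TTLs (32, 64, 128, 255) and counts hops so that a symmetric path gives a gap of zero; the function names are illustrative.

```python
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def infer_initial_ttl(reply_ttl):
    """Smallest common initial TTL that is >= the observed reply TTL."""
    return next(t for t in COMMON_INITIAL_TTLS if t >= reply_ttl)


def return_length(reply_ttl):
    """Return-path length in hops, assuming the initial-TTL guess is right."""
    return infer_initial_ttl(reply_ttl) - reply_ttl + 1


def frpla_gap(forward_hop, reply_ttl):
    """Return-path length minus forward hop index.

    Large positive values can hint at hops hidden from the forward trace
    (e.g., an invisible MPLS tunnel), but routing asymmetry also causes gaps.
    """
    return return_length(reply_ttl) - forward_hop
```

A reply from hop 3 arriving with TTL 250 implies a 6-hop return path and a gap of 3, while TTL 62 from the same hop implies a symmetric 3-hop path. Because asymmetry alone produces gaps, practical tools apply thresholds and combine several triggers rather than trusting one value.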

Routing Operations Observational Technology: Building to Enable Education and Research (ROOTBEER)

Proposed configuration for a security-focused routing observatory and operational support system

We launched ROOTBEER, a new infrastructure project in partnership with Internet2 to build a security-focused routing observatory and operational dashboard for the U.S. research and education (R&E) network ecosystem. R&E networks prioritize R&E routes over commodity Internet routes to support scientific collaboration, but this creates vulnerabilities: even minor misconfigurations have caused route leaks that inadvertently routed sensitive scientific traffic through unintended international paths. ROOTBEER will adapt CAIDA’s measurement and analysis capabilities to detect such leaks, develop a user-friendly dashboard for R&E network operators, and work with Internet2 to establish a sustainable routing security auditing framework. (ROOTBEER funding website) The project has already yielded several new measurement methodologies and analytics approaches, the first of which we published in 2025. This method combines BGP data and active probing to infer the relative route preference policies of R&E-connected ASes. We inferred that systems in ≈88% of the ≈12K prefixes that 2,578 ASes announced in the R&E ecosystem were insensitive to AS path length when selecting provider routes; only ≈8-9% appeared to assign the same local preference to available R&E and commodity routes. We validated our method and discussed its broader application to inferring relative route preference, a crucial step toward accurately modeling routing policies. This work appeared at IMC 2025 as R&E Routing Policy: Inference and Implication.

Platforms in Transition

An increasing focus of our activities is technology transition in support of sustainable models for Internet measurement and data sharing.

GILL → bgproutes.io. Over the last two years, we have helped support the primary authors of GILL, the next-generation BGP data collection platform we introduced at ACM SIGCOMM 2024 (available at bgproutes.io), in their pursuit of commercialization to sustain operational use beyond the research prototype phase.

AVOID → Revelare. The Revelare Networks spinout company (founded in 2024) began a pilot deployment of the commercialized cellular base-station fingerprinting system with deployed U.S. military units.

BGP-Related Toolchains. Demand for the BGPStream broker deployed at CAIDA has outgrown its capacity, creating an inevitable sustainability challenge. Surveying users in the U.S. research community, we learned that most have transitioned to BGPKIT, a newer and better-maintained (although not fully feature-compatible) solution for real-time BGP data processing. As part of a path forward, IIJ Labs is testing a new BGPStream-compatible broker implementation based on the BGPKIT broker architecture. The goal is to improve reliability, simplify maintenance, and reduce operational dependencies between the broker and archive infrastructure while maintaining compatibility with existing BGPStream workflows. The new broker is intended to run in parallel with the existing broker endpoints for evaluation and compatibility testing before any migration. This work also serves as a first step toward a more modular and maintainable BGPStream architecture.

We are also still lightly supporting BGP2GO while we look for a new hosting site. BGP2GO lets users find the MRT files that contain a specific resource and thus avoid downloading and processing unrelated data. Users can compile a customized list of relevant MRT files, share that list with others, or stream the matching MRT files, e.g., using BGPStream. See Streamlining Access to BGP Routing Data, which announced the platform's availability. BGP2GO was created and implemented by Thomas Krenc while he was a postdoc at CAIDA. He is now a research scientist at IIJ Labs in Japan and provides support as time permits.

FANTAIL. In 2025, we ended support for the FANTAIL traceroute analytics service due to resource constraints and limited uptake, and plan to open-source all software components in 2026.

Research Highlights

In 2025, CAIDA researchers co-authored 14 peer-reviewed research papers and 3 non-peer-reviewed papers, with additional hackathon-driven work in preparation. Much of this work directly leveraged the measurement infrastructure, platforms, and datasets described in the preceding sections. We summarize the year’s results by theme, followed by the complete list of publications.

Security measurement and network telescopes. Our ACM SIGCOMM 2025 paper distilled two decades of experience operating the UCSD Network Telescope, the largest and longest-operating research darknet, into the first published study of the operational challenges of running such infrastructure, introducing techniques that leverage third-party scanning to validate data integrity and uncover instrumentation misconfigurations. We also broadened where and how unsolicited traffic can be observed: we introduced reflective network telescopes that recover Internet background radiation from routine ICMP error traffic in operational networks, surfacing over 120,000 scanners in 30 days. With collaborators, we deployed the largest-ever proactive IPv6 telescope in a production ISP, collecting over 600 million unsolicited packets across 10 months to characterize IPv6 scanners and their target-selection strategies. Applications of security-relevant data included a threat-hunting study with USC/ISI showing that simple metrics, such as counting unique source /24 blocks, can effectively detect malware outbreaks (demonstrated on the Crackonosh cryptojacking malware), and a longitudinal study of ~700K phishing domains, which found that two-thirds were maliciously registered and that domains remained accessible for an average of 11.5 days after detection.
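The unique-source-/24 metric from the threat-hunting study is simple enough to sketch. The Python below is a minimal illustration, assuming packets have already been reduced to (timestamp, IPv4 source) pairs; the hourly window, the fixed baseline, and the 3× spike factor are hypothetical parameters, not values taken from the paper.

```python
from collections import defaultdict
from ipaddress import ip_network


def unique_src24_per_window(packets, window_seconds=3600):
    """Count distinct source /24 blocks per time window.

    packets: iterable of (unix_timestamp, ipv4_source) pairs.
    Returns {window_start: number_of_distinct_/24s}, ordered by window.
    """
    blocks = defaultdict(set)
    for ts, src in packets:
        window = int(ts) - int(ts) % window_seconds
        # strict=False masks off the host bits, leaving the covering /24.
        blocks[window].add(ip_network(f"{src}/24", strict=False))
    return {w: len(b) for w, b in sorted(blocks.items())}


def flag_spikes(counts, baseline, factor=3.0):
    """Return windows whose /24 count is at least `factor` times a baseline (hypothetical rule)."""
    return [w for w, n in counts.items() if n >= factor * baseline]


packets = [
    (0, "192.0.2.1"), (10, "192.0.2.200"), (20, "198.51.100.7"),
    (3600, "203.0.113.5"), (3700, "203.0.114.5"),
    (3800, "203.0.115.5"), (3900, "203.0.116.5"),
]
counts = unique_src24_per_window(packets)
print(counts)                           # {0: 2, 3600: 4}
print(flag_spikes(counts, baseline=1))  # [3600]
```

Counting /24 blocks rather than individual addresses dampens the effect of a single host cycling through addresses, while a spreading outbreak raises the count quickly.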

Routing and topology analysis. At IMC 2025 we combined BGP data with active probing to infer the route-preference policies of research and education (R&E) networks, finding that systems in roughly 88% of the ~12,000 studied R&E prefixes were insensitive to AS-path length and preferred R&E paths when reaching other R&E networks. We developed a scalable method to infer the geographic meaning of undocumented city-level BGP communities, locating 80% of them to within 70 km. We analyzed over 80 billion RouteViews BGP updates to quantify noisy high-frequency routing churn, finding that fewer than 2% of prefixes accounted for over 90% of update messages in some traces. We replicated and extended earlier work on MPLS tunnel characterization, releasing PyTNT, a sustainable detection tool, and showing that traceroute-hiding tunnels remain prevalent even as overall MPLS usage appears to decline. We examined the emerging IPv4 leasing market, finding that leased prefixes were 2.89 times more likely than non-leased prefixes to appear on abuse blocklists. We also proposed a framework to identify “countries in the middle” that transit traffic between residents and their own government websites.
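The update-concentration finding can be expressed as a simple statistic: sort prefixes by update count and find the smallest share of prefixes whose updates reach 90% of the total. The sketch below assumes per-prefix update counts have already been extracted from the BGP update stream; it illustrates the arithmetic and is not the paper's analysis code.

```python
def min_prefix_share(update_counts, coverage=0.9):
    """Smallest fraction of prefixes whose combined updates reach `coverage` of all updates.

    update_counts: {prefix: number_of_update_messages}.
    """
    counts = sorted(update_counts.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    running = 0
    for k, count in enumerate(counts, start=1):
        running += count
        if running >= coverage * total:
            return k / len(counts)
    return 1.0  # only reached when coverage > 1


# Hypothetical example: one noisy prefix among 100.
counts = {"p0": 950, **{f"p{i}": 1 for i in range(1, 100)}}
print(min_prefix_share(counts))  # 0.01
```

A result below 0.02 at 90% coverage corresponds to the "fewer than 2% of prefixes" pattern reported for some traces.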

Measurement methods, tools, and policy engagement. We collaborated on the LACeS project, a redesigned longitudinal anycast census system that adds distributed probing, more protocols, and latency measurements to produce fast, accurate daily censuses, released as open source. Two PAM 2025 papers advanced the vantage-point-sharing models that motivated our GMI3S infrastructure design. We demonstrated Hilby, an interactive Hilbert-curve framework for visualizing IPv4 and IPv6 address space at scale. Finally, two papers aimed at the policy research community argued that modern open measurement infrastructure, notably Ark’s safety-constrained Python programming environment, now lets non-experts run sophisticated global Internet experiments in minutes, and encouraged collaboration between the policy and measurement communities.
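Hilbert-curve plots work because the curve maps a one-dimensional index onto a square grid while keeping numerically adjacent addresses in adjacent cells, so contiguous prefixes appear as compact blocks. Hilby itself is a React framework; the Python sketch below only illustrates the standard index-to-coordinate conversion, under the assumption (ours, not necessarily Hilby's) that each cell of a 4096 × 4096 grid represents one IPv4 /24.

```python
import ipaddress


def d2xy(order, d):
    """Map index d on a Hilbert curve filling a 2**order x 2**order grid to (x, y)."""
    x = y = 0
    s, t = 1, d
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate/reflect the quadrant so sub-curves join end to end
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def ipv4_to_cell(address, order=12):
    """Place an IPv4 address on the grid; order 12 gives one cell per /24 (assumed layout)."""
    index = int(ipaddress.IPv4Address(address)) >> (32 - 2 * order)
    return d2xy(order, index)


print([d2xy(1, d) for d in range(4)])  # [(0, 0), (0, 1), (1, 1), (1, 0)]
```

Because consecutive indices always land in neighboring cells, a /16 fills a 16 × 16 square of /24 cells, which is what makes aggregation and deaggregation visually legible.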

The complete list of 2025 publications:

  • Lessons Learned from Operating a Large Network Telescope, Presents the first study documenting the operational challenges of running the UCSD Network Telescope, the largest and longest-operating research darknet, introducing techniques that exploit third-party scanning to validate data integrity and uncover instrumentation misconfigurations. ACM SIGCOMM 2025
  • R&E Routing Policy: Inference and Implication, Combines BGP data with active probing to infer route-preference policies of research and education networks, finding that systems in roughly 88% of ~12K studied R&E prefixes were insensitive to AS-path length when selecting provider routes and preferred R&E paths when reaching other R&E networks. IMC 2025
  • Replication: Characterizing MPLS Tunnels over Internet Paths, Replicates earlier work to snapshot current MPLS tunnel deployment and releases PyTNT, a sustainable detection tool, showing that traceroute-hiding tunnels remain prevalent even as overall MPLS usage appears to be declining. IMC 2025
  • LACeS: An Open, Fast, Responsible, and Efficient Longitudinal Anycast Census System, Presents LACeS, a redesigned anycast census system adding distributed probing, more protocols, and latency measurements to produce fast, accurate daily censuses, validated against testbeds and operator ground truth and released as open source. IMC 2025
  • An Integrated Active Measurement Programming Environment, Proposes and prototypes a programming environment that lets platform operators bound which measurements users may run while giving users reusable building-block functions, easing the trust trade-offs of sharing measurement vantage points. PAM 2025
  • Marionette Measurement: Measurement Support under the PacketLab Model, Analyzes measurement feasibility under the PacketLab vantage-point-sharing model, finding it supports 74% (40 of 54) of surveyed studies, and introduces pktwrap, which ports existing measurement executables to PacketLab without modification. PAM 2025
  • Hilby: Hilbert Interactive Prefix Plots, Demonstrates Hilby, a React framework for interactive Hilbert-curve visualization of IPv4 and IPv6 address space that supports simultaneous aggregation and deaggregation of prefixes at scale. ACM SIGCOMM Demo 2025
  • Analyzing Internet Background Radiation with Reflective Network Telescopes, Introduces “reflective” network telescopes that recover Internet background radiation from routine ICMP error traffic in operational networks without reserving address space or inspecting user traffic, surfacing over 120K scanners and tens of thousands of likely spoofed-DoS victims in 30 days. ANRW 2025
  • Hunting in the Dark: Metrics for Early Stage Traffic Discovery, Examines, in a collaboration with USC/ISI, threat-hunting metrics through the detection of the Crackonosh cryptojacking malware, demonstrating that simple metrics such as counting unique source /24 blocks can effectively detect malware outbreak activity. CAIDA Technical Report, 2025
  • Towards Understanding City-Level Routing using BGP Location Communities, Develops a scalable method to infer the geographic meaning of undocumented city-level BGP communities, locating 80% of them to within 70 km using May 2025 data, with all code and datasets released. CoNEXT 2025
  • Unveiling IPv6 Scanning Dynamics: A Longitudinal Study Using Large Scale Proactive and Passive IPv6 Telescopes, Deploys the largest-ever proactive IPv6 telescope in a production ISP, collecting over 600M unsolicited packets across 10 months to characterize IPv6 scanners, their target-selection strategies, and the techniques that attract IPv6 scan traffic. CoNEXT 2025
  • From Scarcity to Opportunity: Examining Abuse of the IPv4 Leasing Market, Examines the emerging IPv4 leasing market and its security implications, finding that in February 2025 leased prefixes were 2.89× more likely than non-leased prefixes to be flagged on abuse blocklists. TMA 2025
  • Country-in-the-Middle: Measuring Paths between People and their Governments, Proposes a framework to identify “countries in the middle” that transit traffic between residents and their own government websites, refining it on a 149-country pilot before an in-depth study of 11 countries spanning over 9,000 IP-level paths. arXiv 2025
  • Noisy Neighbours: Keep the Neighbourhood Quiet, Analyzes over 80 billion RouteViews BGP updates across several years to quantify “noisy” high-frequency repeated updates, showing that fewer than 2% of prefixes can account for over 90% of update messages in some traces. CNSM 2025
  • Registration, Detection, and Deregistration: Analyzing DNS Abuse for Phishing Attacks, Conducts a 39-month longitudinal study of 690,502 phishing domains across their lifecycle, finding that 66.1% are maliciously registered and that domains remain accessible an average of 11.5 days after detection. arXiv 2025
  • Active Internet measurement to support policy research, Argues that recent measurement infrastructure—particularly CAIDA’s Ark, with its safety-constrained Python programming environment—lowers the barrier for non-experts to run sophisticated global Internet experiments in minutes, and illustrates with policy-relevant questions such as path diversity, web-hosting locality, and anycast deployment to spark cross-disciplinary collaboration. ACM PRIME Workshop 2025
  • The Current State of the Art in Network Measurement, Surveys how modern open infrastructure and tools—such as CAIDA’s Ark and the Internet Yellow Pages—have made sophisticated Internet measurement accessible beyond experts (e.g., a global latency test in ~15 lines of Python) and encourages collaboration between the policy and measurement communities. TPRC 2025

We also gave 15 presentations (see CAIDA's full presentations list).

Data Distribution Statistics

CAIDA datasets continued to see heavy use across the research community, and we summarize highlights below. Due to retirements, job transitions, and a lack of dedicated funding for data administration, we were without a data administrator for almost a year. Although we have now hired a terrific one, it will take some time to pay down the resulting technical debt before we can effectively track use of our data. This year's report on data usage statistics is therefore brief, and we plan to overhaul this section for the 2026 report.

Noteworthy CAIDA Data Sets

We summarize our most noteworthy data sets from 2025 by data category: APIs, publicly downloadable datasets, and datasets available by request. Usage of many of CAIDA's resources increased by as much as two orders of magnitude in 2025 compared to 2024. The figures presented here are our best approximation of legitimate access and usage of CAIDA data: we filter out bots and scanners and deduplicate repeated downloads or API requests from genuine users.

Public APIs

AS Rank API.

CAIDA’s ranking of Autonomous Systems, AS Rank, continues to be our most widely used API. In 2025 it received requests from nearly 2 million unique IPs. Unsurprisingly, this figure is inflated by automated traffic: the median request count per IP was just 1 (i.e., a large share of these addresses contacted the API exactly once). As shown in the monthly figure to the right, December’s unique-user count is slightly undercounted, as six days (December 24, 25, 26, 29, 30, 31) were omitted due to a large-scale scanning event that logged requests from over 1.2 million unique IPs.

The table below summarizes how request volume was distributed across all unique IPs that made successful requests in 2025 as cumulative thresholds. The overwhelming majority issued only a single request, while a small set of addresses accounted for sustained, repeated use more representative of genuine research access.

Requests to AS Rank API Unique IPs
≥ 1 1,966,218
> 1 562,302
> 5 59,886
> 100 8,507

Note the scale of traffic in the daily view: the late-December spike reflects a large-scale scanning event, and these anomalous days dominate the vertical axis relative to typical daily traffic. Over an eight-day window between December 24 and 31, daily unique-visitor counts jumped well above the baseline, with the event contributing more than 1.2 million unique IPs in total. Because this traffic is not representative of legitimate API use, we excluded the six affected days from the monthly aggregate above rather than let them distort the year's trend.
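The cumulative thresholds in the AS Rank table, the per-IP median, and the exclusion of the scanning days can all be derived from per-request records. This is a minimal sketch assuming the logs have already been reduced to (day, client IP) pairs for successful requests; CAIDA's actual bot and scanner filtering is not modeled, and the sample addresses are hypothetical.

```python
from collections import Counter
from datetime import date
from statistics import median

# Days dropped from the 2025 aggregate because of the scanning event.
EXCLUDED_DAYS = {date(2025, 12, d) for d in (24, 25, 26, 29, 30, 31)}


def summarize_requests(requests, excluded_days=EXCLUDED_DAYS):
    """requests: iterable of (day, client_ip) pairs, one per successful API request."""
    per_ip = Counter(ip for day, ip in requests if day not in excluded_days)
    counts = list(per_ip.values())
    return {
        ">= 1": len(counts),
        "> 1": sum(c > 1 for c in counts),
        "> 5": sum(c > 5 for c in counts),
        "> 100": sum(c > 100 for c in counts),
        "median": median(counts) if counts else 0,
    }


requests = (
    [(date(2025, 1, 1), "192.0.2.1")]
    + [(date(2025, 1, 2), "192.0.2.2")] * 7
    + [(date(2025, 1, 3), "192.0.2.3")] * 2
    + [(date(2025, 12, 24), "198.51.100.9")] * 3  # falls on an excluded day
)
print(summarize_requests(requests))
# {'>= 1': 3, '> 1': 2, '> 5': 1, '> 100': 0, 'median': 2}
```

Reporting cumulative thresholds rather than a single total makes the one-shot automated traffic visible: the "≥ 1" row is dominated by single-request addresses, while the "> 100" row isolates sustained users.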


BGPStream API.

BGPStream provides programmatic access to live and historical BGP routing data drawn from the RouteViews and RIPE RIS route-collector projects. Client libraries do not query the collectors directly; instead they contact the BGPStream broker — an HTTP service that indexes the available routing-table snapshots and update files and tells clients where to fetch them. The visitor counts reported here reflect requests to that broker.

In 2025 the broker received requests from 6,852 unique IPs and served 3,831 of them successful responses (HTTP 2xx). Web logs for BGPStream are only available from May onward: earlier in the year we changed how the broker logged incoming requests, and pre-May traffic is unavailable. The monthly figure therefore covers May–December, and the annual total is an eight-month rather than full-year count.
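The distinction between requesting IPs and IPs that received HTTP 2xx responses amounts to two distinct-IP counts over the broker's access log. The sketch below assumes the logs use the common log format and uses a hypothetical request path; the broker's real log layout may differ.

```python
import re

# Common log format: host ident user [time] "request" status bytes
CLF = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3})')


def broker_visitor_counts(lines):
    """Return (unique requesting IPs, unique IPs with at least one 2xx response)."""
    all_ips, ok_ips = set(), set()
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue  # skip malformed or truncated lines
        ip, status = m.group(1), int(m.group(2))
        all_ips.add(ip)
        if 200 <= status < 300:
            ok_ips.add(ip)
    return len(all_ips), len(ok_ips)


log = [
    '192.0.2.1 - - [01/May/2025:00:00:01 +0000] "GET /query HTTP/1.1" 200 512',
    '192.0.2.1 - - [01/May/2025:00:00:02 +0000] "GET /query HTTP/1.1" 200 512',
    '198.51.100.9 - - [02/May/2025:10:00:00 +0000] "GET /query HTTP/1.1" 429 0',
    'truncated line',
]
print(broker_visitor_counts(log))  # (2, 1)
```

An IP counts toward the second number if any of its requests succeeded, so the two figures are not simply "requests" versus "successes".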


Hoiho API. The publicly available Hoiho API for hostname-based geolocation lookups, released in 2024, continued to serve researchers in 2025. The ITDK 2025-03 and 2025-08 releases provided updated router datasets with improved IPv6 normalization and standardized file formats, refreshing the underlying topology data used by Hoiho to infer its regular-expression-based geolocation rules. No changes to the API interface were required; the updated ITDK data is reflected automatically in inference results.
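Hoiho's inferred rules are regular expressions that extract a geographic hint, such as an airport code, from router hostnames under a given operator suffix. The sketch below applies one such rule; the domain example.net, the pattern, and the two-entry hint dictionary are hypothetical stand-ins for what Hoiho actually learns from ITDK data, and this is not the Hoiho API.

```python
import re

# Hypothetical learned rule: "<interface>.<iata><digits>.example.net"
RULES = {
    "example.net": re.compile(r"^[a-z0-9-]+\.([a-z]{3})\d+\.example\.net$"),
}
# Hypothetical geohint dictionary (IATA airport code -> location).
GEOHINTS = {"lax": "Los Angeles, US", "dfw": "Dallas, US"}


def geolocate(hostname):
    """Return a location if a rule for the hostname's suffix extracts a known geohint."""
    hostname = hostname.lower().rstrip(".")
    for suffix, rule in RULES.items():
        if hostname.endswith("." + suffix):
            m = rule.match(hostname)
            if m and m.group(1) in GEOHINTS:
                return GEOHINTS[m.group(1)]
    return None


print(geolocate("ae-1.lax01.example.net"))  # Los Angeles, US
```

Because rules are keyed by operator suffix, refreshing the underlying topology data can change which rules are inferred without changing how clients query them.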

Public Datasets

Internet Topology Data (traceroute and BGP based). We released two Internet Topology Data Kits in 2025: ITDK 2025-03 and ITDK 2025-08, the latter incorporating MPLS tunnel detection into topology inference for the first time. Both releases operated through the modernized Arkmon interfaces. Pipeline improvements include dynamic task redistribution for fault tolerance, standardized file formats, completed IPv6 normalization, and fixed AS notation parsing. Public ITDK releases were downloaded by 2,585 unique users in 2025.
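The report does not detail the AS notation parsing fix. One common source of such bugs is that 4-byte AS numbers may be written in asplain form (65546) or in asdot form (1.10), both defined in RFC 5396. The sketch below normalizes either form, with an optional "AS" prefix, to an integer; it illustrates the general technique and is not the ITDK pipeline code.

```python
import re

_ASN = re.compile(r"(?:AS)?(?:(\d+)\.(\d+)|(\d+))", re.IGNORECASE | re.ASCII)


def parse_asn(text):
    """Parse asplain ("65546"), asdot ("1.10"), or "AS"-prefixed forms into an int."""
    m = _ASN.fullmatch(text.strip())
    if not m:
        raise ValueError(f"not an AS number: {text!r}")
    if m.group(3) is not None:
        value = int(m.group(3))
    else:
        high, low = int(m.group(1)), int(m.group(2))
        if high > 0xFFFF or low > 0xFFFF:
            raise ValueError(f"asdot component out of range: {text!r}")
        value = (high << 16) | low  # high.low == high * 65536 + low
    if value > 0xFFFFFFFF:
        raise ValueError(f"AS number out of range: {text!r}")
    return value


def to_asdot(asn):
    """Render an AS number in asdot form (plain decimal below 65536)."""
    return str(asn) if asn < 65536 else f"{asn >> 16}.{asn & 0xFFFF}"


print(parse_asn("1.10"), to_asdot(65546))  # 65546 1.10
```

Normalizing to integers at ingest keeps later joins between traceroute-derived and BGP-derived data from silently missing ASes written in different notations.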

Routeviews IPv4 Prefix-to-AS. Our most downloaded publicly available dataset in 2025 was the Routeviews IPv4 Prefix-to-AS data, which maps each routed IPv4 prefix to the AS that originates it in BGP. We recorded 56,717 unique IPs successfully downloading the data over the year, reflecting its role as a foundational input to topology, routing, and security analyses across the community.
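A typical use of the Prefix-to-AS data is a longest-prefix-match lookup of the origin AS for an address. The sketch below assumes a tab-separated layout of prefix, prefix length, and origin, with multi-origin entries joined by "_"; consult the dataset documentation before relying on these format details. The sample entries use documentation prefixes and private-use ASNs.

```python
import ipaddress


def load_pfx2as(lines):
    """Parse lines of 'prefix<TAB>length<TAB>origin' into {IPv4Network: [origin, ...]}."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        addr, length, origins = line.split("\t")
        # Multi-origin prefixes are assumed to list ASes joined by "_".
        table[ipaddress.IPv4Network(f"{addr}/{length}")] = origins.split("_")
    return table


def origin_of(table, address):
    """Longest-prefix match: try /32 down to /0 and return (prefix, origins) or None."""
    for plen in range(32, -1, -1):
        candidate = ipaddress.IPv4Network(f"{address}/{plen}", strict=False)
        if candidate in table:
            return candidate, table[candidate]
    return None


table = load_pfx2as([
    "192.0.2.0\t24\t64500",
    "192.0.2.128\t25\t64501_64502",
    "198.51.100.0\t24\t64503",
])
print(origin_of(table, "192.0.2.200"))
```

Probing 33 candidate prefixes per lookup is slow for bulk work; a radix tree is the usual choice at scale, but the dictionary version keeps the matching rule easy to see.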

AS Relationships (BGP-based). We continued regular updates to the AS Relationships Serial-2 dataset, which provides inferred peering and transit relationships between Autonomous Systems. In 2025, this dataset was downloaded by 3,341 unique users, reflecting continued demand for BGP-based topology analysis.
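Assuming the documented Serial-2 layout of one relationship per line as AS|AS|relationship|source, where -1 means provider-to-customer and 0 means peer-to-peer, the files are straightforward to load. The sketch below also computes a simple recursive customer cone; that closure is a textbook definition and is simpler than the cone inference CAIDA publishes alongside AS Rank. The ASNs are private-use placeholders.

```python
from collections import defaultdict


def load_relationships(lines):
    """Return (providers, customers, peers) adjacency maps from Serial-2 lines."""
    providers = defaultdict(set)  # customer -> its providers
    customers = defaultdict(set)  # provider -> its customers
    peers = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        a, b, rel = line.strip().split("|")[:3]
        a, b = int(a), int(b)
        if rel == "-1":
            customers[a].add(b)
            providers[b].add(a)
        elif rel == "0":
            peers[a].add(b)
            peers[b].add(a)
    return providers, customers, peers


def customer_cone(customers, asn):
    """The AS itself plus every AS reachable by repeatedly following provider-to-customer links."""
    cone, stack = {asn}, [asn]
    while stack:
        for c in customers.get(stack.pop(), ()):
            if c not in cone:
                cone.add(c)
                stack.append(c)
    return cone


providers, customers, peers = load_relationships([
    "# hypothetical header",
    "64500|64501|-1|bgp",
    "64501|64502|-1|bgp",
    "64500|64503|0|bgp",
    "64503|64504|-1|mlp",
])
print(sorted(customer_cone(customers, 64500)))  # [64500, 64501, 64502]
```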

Geolocation MetaData. The GeofeedWHOIS dataset continued daily collection throughout 2025, now covering over 5 million IPv4 and IPv6 prefixes derived from WHOIS records across six registries and network operators. We are assessing the accuracy of self-reported geofeed data and integrating it into broader topology analyses.
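Self-published geofeeds follow the RFC 8805 CSV layout (prefix, country code, region, city, postal code), and collecting them at scale means tolerating malformed rows. The sketch below is an illustrative parser, not the GeofeedWHOIS pipeline; it skips comment lines and unparseable prefixes rather than rejecting a whole feed.

```python
import csv
import ipaddress


def parse_geofeed(text):
    """Return [(network, country, region, city), ...] from RFC 8805-style CSV text."""
    lines = (
        line for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )
    entries = []
    for row in csv.reader(lines):
        row = [field.strip() for field in row] + [""] * 5  # pad missing trailing fields
        prefix, country, region, city = row[:4]
        try:
            network = ipaddress.ip_network(prefix)  # strict: host bits must be zero
        except ValueError:
            continue  # skip malformed prefixes instead of failing the whole feed
        entries.append((network, country.upper(), region, city))
    return entries


feed = """# example feed
192.0.2.0/24,US,US-CA,San Diego,
2001:db8::/32,jp,,Tokyo,
not-a-prefix,US,,,
"""
for entry in parse_geofeed(feed):
    print(entry)
```

Parsing is only the first step; assessing accuracy means comparing these self-reported locations against independent evidence such as measurement-based geolocation.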

Cloud Prefixes. The Cloud Prefixes dataset, launched in late 2024, continued monthly collection and grew substantially in scale. The dataset now provides IP prefix metadata — including associated regions, geographic locations, and hosted services — across more than 106,000 prefixes from eight cloud provider networks: AWS, Azure, Cloudflare, DigitalOcean, Fastly, GCP, IBM, and OCI. We are integrating Cloud Prefixes into topology analyses to characterize the reach and footprint of cloud infrastructure across the global Internet. This dataset was accessed by 2,653 unique users in 2025.

DDoS Attack 2007. The DDoS 2007 attack dataset was made publicly downloadable in August 2025. Over the remainder of the year it was downloaded by 268 unique users through the new public channel, while 109 users continued to obtain it via the pre-existing by-request process.

Summary of access to publicly available datasets.

Public Dataset Name Unique Downloads in 2025
Routeviews IPv4 Prefix to AS Mappings 56,717
Internet Exchange Points 3,576
AS to Organizations 3,487
AS Relationships Serial-2 3,341
Cloud Prefixes 2,653
ITDK 2,585
DDoS Attack 2007 268

The table here lists publicly accessible CAIDA datasets for which unique visitor counts are reliable in 2025. As with the APIs discussed above, several dataset groups receive enough automated traffic to inflate raw download counts; those dominated by bots and scanners are excluded here so the figures reflect our best estimate of actual researcher access.


By-request datasets

Our most sensitive datasets are shared only with vetted researchers under the Data Stewardship Agreements. This data includes recent Ark topology datasets, anonymized passive traces, and data from the UCSD Network Telescope.

Two-way Internet backbone link traffic traces. Throughout 2025, we continued monthly one-hour two-way traffic captures from a 100 Gbps commercial backbone link between Los Angeles and Dallas. All captures are anonymized using CryptoPan and truncated after the Layer 4 headers to protect privacy. Access to the Anonymized Two-Way Traffic Packet Header Traces 2025 (100G) is available upon request. We also provide a publicly available metadata dataset with summary statistics, and a 5-second sampler (~800 MB) that lets researchers assess fit before downloading full traces. The complete set of 2025 traces from this 100 Gbps link occupies approximately 25 TB on disk; as with UCSD-NT data, we share this large volume of data via OpenStack Swift object storage containers on a by-request basis.
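CryptoPan is prefix-preserving: two addresses that share a k-bit prefix are mapped to anonymized addresses that also share exactly a k-bit prefix, which keeps subnet structure analyzable after anonymization. The sketch below shows the construction's core idea, flipping each bit according to a keyed pseudorandom function of the bits before it. It substitutes HMAC-SHA256 for the AES-based function of the original scheme, so its outputs are not compatible with CryptoPan and it is not intended for anonymizing real traces.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(address, key):
    """Prefix-preserving anonymization in the style of CryptoPan, with HMAC-SHA256 as the PRF.

    Output bit i = input bit i XOR a pseudorandom bit derived from input bits 0..i-1,
    so addresses sharing a k-bit prefix yield outputs sharing exactly a k-bit prefix.
    """
    a = int(ipaddress.IPv4Address(address))
    out = 0
    for i in range(32):
        prefix = a >> (32 - i)  # the first i bits (0 when i == 0)
        # Include i so that equal-valued prefixes of different lengths get different PRF inputs.
        msg = i.to_bytes(1, "big") + prefix.to_bytes(4, "big")
        prf_bit = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (a >> (31 - i)) & 1
        out = (out << 1) | (bit ^ prf_bit)
    return str(ipaddress.IPv4Address(out))


key = b"hypothetical-secret-key"
print(anonymize_ipv4("192.0.2.1", key), anonymize_ipv4("192.0.2.77", key))
```

The two printed addresses share the same 25-bit prefix as the inputs do, but the key controls which prefix that is, so the real network identity is hidden.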

One-Way Traffic (IBR) data from UCSD Network Telescope (UCSD-NT). The UCSD Network Telescope continues to provide valuable data for network security by capturing unsolicited one-way traffic and monitoring Internet background radiation that arises from scanning, misconfiguration, and DDoS backscatter. As of December 31, 2025, we stored over 400 TB of telescope data in OpenStack Swift object storage, which we continue to share with vetted researchers on a by-request basis. (We store many petabytes of older archived telescope data in NERSC's archival storage system.)

By-request dataset summary. The table here highlights the most popular by-request datasets in 2025, based on the number of approved users. As mentioned above, the DDoS Attack 2007 dataset became public in August of 2025, though some users with prior access continued downloading via the by-request channel.

Restricted Dataset Name Total Users
Anonymized Internet Traces (2019) 167
Anonymized Internet Traces (2018) 122
DDoS Attack 2007 109
Anonymized Internet Traces (2016) 100
IPv6 Launch Passive Traces 43
Internet Topology Data Kit (ITDK) 29
Ark IPv4 Routed /24 Topology 27
Ark IPv4 prefix-probing 26
Anonymized Two-Way Traffic (2024, 100G) 21
Aggregated Daily RSDoS Attack Metadata (Corsaro 2) 16
Anonymized Two-Way Traffic (2025, 100G) 12

Passive traffic capture datasets from earlier years continue to lead the list of our most popular datasets, followed by IPv6 Launch Passive Traces, reflecting sustained interest in longitudinal traffic analysis. Only the three most-accessed Anonymized Internet Traces from 2008–2019 are shown.

Data from the Ark measurement platform remains popular as well, with the ITDK and the IPv4 routed /24 and prefix-probing topology datasets together drawing a consistent base of approved users.


Publications Using CAIDA Data (by Non-CAIDA Authors)

Users of CAIDA datasets agree, as part of our Acceptable Use Agreements, to notify us of publications using CAIDA data. We supplement self-reporting with extensive literature searches across Google Scholar, IEEE Xplore, ACM Digital Library, ScienceDirect, and Springer. As of June 2026, we have indexed 4,125 papers in our external publications database. We update this database as we become aware of new publications; please let us know if you know of a paper using CAIDA data not yet on our list: Non-CAIDA Publications using CAIDA Data.

We are aware of 162 publications by non-CAIDA researchers that used CAIDA data and were published in 2025. The ExPub tool, introduced in 2025, now automates much of this discovery process, improving coverage and reducing the lag between publication and indexing. We identified 68 new publications that used 10G and 100G passive trace datasets, 48 that used Topology with BGP data, 13 that used data from the Archipelago measurement infrastructure, 9 that used the now-public DDoS Attack 2007 dataset, 9 that used UCSD-NT data, and 8 that cited other CAIDA paper data and software.


Infrastructure for Outreach: Workshops and Meetings

We hosted the GMI-AIMS-5 workshop during the week of February 8–14, 2025, at SDSC. A two-day hackathon brought researchers together to evaluate and test the prototype of CAIDA’s new active measurement infrastructure. Approximately 15 project ideas were proposed and six were actively developed, including tools for global EDNS Client Subnet (ECS) scanning, seeding active measurements with Internet Yellow Pages data, geolocating anycast instances, and triggering Ark measurements based on observations from other global infrastructure. In parallel, we offered a hands-on network telescope tutorial that introduced 25 of the 68 workshop participants — primarily graduate students and faculty — to analyzing UCSD-NT traffic data on SDSC Expanse via ACCESS-CI, with supporting materials published on GitHub (slides). A separate hackathon team investigated the cost and feasibility of deploying distributed network telescopes on commercial cloud platforms, subsequently capturing two weeks of data from five providers. The remaining workshop days featured presentations and discussions on Ark infrastructure, Scamper extensions (including QUIC support), Arkmon integration, and progress on routing measurement and network telescope projects.

The final GMI3S workshop, GMI-AIMS-6, brought together approximately 50 participants from academic, industry, and research network organizations on September 25, 2025. Hackathon projects included analyzing and visualizing Scamper measurement results using ClickHouse and Grafana, mapping where large multi-homed networks connect to the public Internet, investigating anycast routing behavior within research and education networks, and mapping hospital-related network IP ranges to support outage monitoring. Researchers also shared results from studies initiated at the February hackathon, including an analysis of round-trip delays from vantage points in 28 countries to the top 1,000 websites that revealed significant regional variation in content delivery localization. The hackathon and related activities led to six publications in 2025, with at least five additional studies in progress. Details about hackathon agendas and projects are available at AIMS-5 and AIMS-6.

We continued stakeholder calls with key collaborators including the Department of War, DREN, RNP (Brazilian NREN), RIPE NCC, LSU, and TU Dresden. We also participated in weekly calls with the SALON group on regulatory and policy developments related to Internet measurement data.

STEM Workforce Training and Development: ESCALATE

In July 2025, CAIDA launched a new CyberTraining project — Engaging Scholars in Cybersecurity Analysis: A Laboratory for Teaching and Education (ESCALATE) — to develop a centralized platform for building, delivering, and sharing cyberinfrastructure-ready cybersecurity education and training resources using real-world Internet datasets. Building on CAIDA’s Internet Data Science for Cybersecurity course, the project will seed the platform with hands-on course modules that give students experience applying data science techniques to network security analysis on global-scale datasets, leveraging NSF-funded high-performance computing resources to overcome the storage and compute barriers that typically prevent classroom use of terabyte-scale data. The project collaborates with Johns Hopkins University and Calvin University to test materials across institutions with different class sizes, campus resources, and student demographics. (ESCALATE project website, NSF OAC-2519416)

Funding and Expenses

The tables below show CAIDA's operating expenses, broken down by expense type and by research program area:

Expense type Amount ($) Percentage
Supplies & Expenses $87,346.59 2.05%
(UCSD) Benefits $536,581.42 12.58%
Consultants $305,241.54 7.16%
Equipment $108,282.68 2.54%
(UCSD) Indirect Costs $1,277,556.18 29.96%
Labor $1,465,592.69 34.37%
Professional Development $41,611.11 0.98%
Subcontracts $441,774.67 10.36%
Total $4,263,986.88 100%
Research Program Area Amount ($) Percentage
Infrastructure & Data Sharing $1,924,804 45%
Security, Stability, Resilience $1,693,633 40%
Performance $587,877 14%
Outreach & Education $57,673 1%
Total $4,263,987 100%


Supporting Resources

CAIDA's accomplishments are in large measure due to the high quality of our visiting students and collaborators. We are also fortunate to have financial and IT support from sponsors, members, and collaborators, and from the sites that host our monitors. During 2025, CAIDA employed 14 staff (researchers, programmers, data administrators, and technical support staff) and hosted 1 postdoc, 6 PhD students, 12 master's students, and 32 undergraduate students. We also supported 17 Research Experience for Undergraduates (REU) participants.

UC San Diego Graduate Students

Visiting Scholars

Funding Sources