Mission Statement: CAIDA investigates practical and theoretical aspects of the Internet, focusing on activities that:
- provide insights into the macroscopic function of Internet infrastructure, behavior, usage, and evolution,
- foster a collaborative environment in which data can be acquired, analyzed, and (as appropriate) shared,
- improve the integrity of the field of Internet science,
- inform science, technology, and communications public policies.
Executive Summary
In 2024 CAIDA continued to design, prototype, evaluate and implement a new generation of measurement and research infrastructure for the Internet, which supports collection, curation, archiving, and expanded sharing of data and tools essential for understanding and strengthening the security, stability, and resilience of the Internet infrastructure.
Our efforts focused on four key infrastructure components: (1) Highly distributed network measurement and data acquisition infrastructure capable of capturing several types of data relevant to security research, as well as hosting new vetted experiments; (2) Data management infrastructure supporting data usability, curation, discovery and sharing; (3) Data analytics platforms that provide interactive access to strategic derived datasets that reveal vulnerabilities, risks, and crucial insights for strengthening resilience and mitigating threats to Internet infrastructure; and (4) Infrastructure for outreach to engage a range of stakeholders in infrastructure development, use, evaluation and evolution, and in the process scaling up STEM workforce training.
This annual report highlights our work on these key components — describing what we built, improved, and learned in 2024. We include several metrics that reflect the scale and utility of our activities: the volume and diversity of data collected, the number of users and collaborators we support, the external publications based on CAIDA datasets, and our own research contributions. Together, these indicators help show where our infrastructure is being used, and where it is making a meaningful difference.
Data Measurement and Acquisition Infrastructure

High-level view of CAIDA's Data Acquisition Infrastructure reflecting our modular and comprehensive approach to building an integrated system that supports research across many aspects of Internet
In 2024, we expanded and improved our core data collection platforms, which capture a wide range of Internet measurements including traffic data, routing (BGP) data, active measurements, and DNS data. We discuss the current capabilities, their limitations, research community requirements, and strategies for managing security and privacy concerns. Further, we developed and evaluated preliminary specifications for several infrastructure components. We also explored current and potential approaches to data analysis and visualization, addressing the needs for standardization, interoperability, and AI readiness of collected data.
Active Measurement Infrastructure

Our new active measurement platform architecture provides users with reference implementations of measurement primitives that act as building blocks to more complex measurements. Scamper processes on remote VPs connect to a central controller. Scripts access measurement primitives on VPs using an integrated active measurement development environment deployed on, or next to, the controller.
Network operators and researchers often require the ability to conduct active measurements of networks from a specific location in order to understand some property of the network. However, obtaining access to a vantage point at a given location is challenging, as significant trust barriers may prevent access. Current access control to active measurement infrastructure has two extremes: access that allows trusted users to run arbitrary code, and API access that allows arbitrary users to schedule a (limited) set of measurements and obtain their results.
To address this gap, in 2024 we began development of a new integrated active measurement programming environment that: (1) allows a platform operator to specify the measurements that a user can run, which allows the platform operator to communicate to the VP’s host what their vantage point will do, and (2) provides users with reference implementations of measurement functions that act as building blocks to more complex measurements. Scamper processes on VPs connect to a central controller. Scripts access measurement primitives on VPs using an integrated active measurement development environment deployed on, or next to, the controller.
As we adapted our existing Ark nodes to this new architecture, we also expanded our set of vantage points from under 100 to over 200 around the globe. The platform now includes Raspberry Pis, virtual machines, and containerized nodes. We developed a scalable, community-oriented platform with new software libraries, a domain-specific language (DSL) interface, and a prototype management system for deploying 100 containerized nodes (NRP: 56, Vultr: 32, DREN: 8, M-Lab: 3, plus others). Containerized Ark nodes, built on Debian Bookworm images, were published to Docker Hub. We automated node initialization and certificate management, and created a NAT portal with certificate-based authentication for secure SSH access. We documented the late 2024 status of this system in a paper (accepted to PAM 2025). This paper detailed three case studies: (1) identifying the VP with the shortest delay to a given IP address; (2) characterizing the Netflix CDN infrastructure; and (3) reproducing a study that inferred the popularity of rare domains by querying large public recursive resolvers. These use cases demonstrated that our approach significantly lowers the barrier for implementing complex measurement experiments while allowing platform operators to precisely define and enforce the types of measurements VPs will perform for site hosts.
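Once RTT samples have been collected, the first case study (finding the VP with the shortest delay to a target) reduces to a minimum over VPs. The sketch below illustrates only that final selection step. It does not use the platform's measurement API; the function name, VP names, and RTT values are all hypothetical.

```python
from typing import Optional


def closest_vp(rtts_ms: dict[str, list[float]]) -> Optional[str]:
    """Return the VP whose minimum observed RTT to the target is smallest.

    VPs with no successful replies (empty sample lists) are ignored.
    Returns None if no VP has any sample.
    """
    best_vp, best_rtt = None, float("inf")
    for vp, samples in rtts_ms.items():
        if not samples:
            continue
        rtt = min(samples)  # minimum RTT filters out transient queueing delay
        if rtt < best_rtt:
            best_vp, best_rtt = vp, rtt
    return best_vp


# Hypothetical ping results (milliseconds) from three made-up VPs.
samples = {"vp-a": [82.1, 80.4, 81.0], "vp-b": [12.9, 13.4], "vp-c": []}
print(closest_vp(samples))
```

Using the minimum rather than the mean is a design choice for this sketch: the minimum RTT is the sample least affected by queueing, so it is a common proxy for propagation delay.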
One-Way Traffic (Internet Background Radiation)

Architecture of UCSD-NT & STARDUST (dashed blue box) and STARNOVA (dashed green box). New components (green) are labeled with corresponding tasks in the proposal (red squares).
Over the last two decades, CAIDA has operated the world’s largest Internet traffic observatory (UCSD-NT) to capture Internet background radiation (IBR) from a darknet. CAIDA’s UCSD-NT platform enables researchers to access the captured IBR traffic data for security studies, e.g., characterizing distributed denial of service (DDoS) attacks, network censorship, and spread of malware. This network telescope is a passive traffic monitoring system, capturing unsolicited traffic directed toward a large segment of mostly unutilized IPv4 address space. The infrastructure captures on the order of 1 TB per day, a volume unprecedented for the network research community. The data collection pipeline includes capturing raw packets and processing them into a more compressed flow record format for archiving. In parallel, we also extract thousands of time-series statistics directly from the packet headers. Over the years we have invested considerable effort in data management and curation capability to lower the barrier to research use of a large-volume traffic data set.
Given the scarcity of IPv4 address space needed to sustain such instrumentation, we have also begun efforts to extract IBR traffic from active Internet address space (greynets). In 2024 we completed our plan for deployment of a new hardware prototype for a hybrid telescope deployment that includes greynet instrumentation. We procured and deployed a 400G monitor that distills unsolicited traffic from SDSC’s production network. We worked closely with the SDSC security team and identified inactive addresses within the internal routing tables in the SDSC core routers in near real-time. We also began a project to deploy an IPv6 telescope, which requires fundamentally different techniques to attract IBR traffic to extremely sparsely utilized blocks of address space.
Two-way Traffic (Internet backbone link traffic)
For decades, privacy concerns have made it nearly impossible for researchers to access passively collected Internet backbone traffic data. But through longstanding trust relationships and federal research funding, CAIDA partnered with an ISP to monitor strategic commercial backbone links until escalating infrastructure upgrades culminated in the loss of CAIDA’s last such monitoring point in January 2019. In October 2023 we finally deployed a completely new passive traffic monitor on a 100 Gbps backbone link at an IXP in Los Angeles. In 2024, we conducted eight one-hour packet captures (April–November), consolidated into two unidirectional traces. The system interprets key Layer 4 protocols (ICMP, TCP, UDP, etc.) and uses an updated CryptoPan to anonymize IPv4 and IPv6 addresses. Captured data is post-processed, anonymized, and stored in Swift containers with accompanying statistical metadata for researcher access.
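The property that makes CryptoPan-style anonymization useful for traffic traces is that it preserves prefixes: two addresses that share their first k bits still share exactly their first k bits after anonymization. The sketch below illustrates that property only. It is not the CryptoPan implementation used in the capture pipeline: it substitutes HMAC-SHA256 for the block cipher that the real algorithm uses, and the key and function names are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def anonymize(addr: str, key: bytes) -> str:
    """Prefix-preserving anonymization sketch (NOT real CryptoPan).

    Output bit i is input bit i XOR a pseudorandom bit that depends only
    on the key and on the first i input bits.
    """
    ip = ipaddress.ip_address(addr)
    nbits = ip.max_prefixlen  # 32 for IPv4, 128 for IPv6
    bits = format(int(ip), f"0{nbits}b")
    out = []
    for i in range(nbits):
        # Assumption: HMAC-SHA256 stands in for the keyed pseudorandom function.
        msg = str(nbits).encode() + b"|" + bits[:i].encode()
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        out.append(str(int(bits[i]) ^ flip))
    # Rebuild an address of the same family as the input.
    return str(type(ip)(int("".join(out), 2)))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two addresses of the same family."""
    x, y = ipaddress.ip_address(a), ipaddress.ip_address(b)
    return x.max_prefixlen - (int(x) ^ int(y)).bit_length()


KEY = b"example-key-not-secret"  # hypothetical key, for illustration only
a, b = "192.0.2.1", "192.0.2.200"  # documentation addresses sharing a /24
print(common_prefix_len(a, b), common_prefix_len(anonymize(a, KEY), anonymize(b, KEY)))
```

Because each flip bit depends only on the preceding input bits, the two anonymized addresses agree wherever the originals agree and diverge at the same bit position, which keeps subnet structure analyzable without exposing the real addresses.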
BGP Data Monitoring Platform
Our evaluation of the current state of BGP data collection revealed fundamental scalability challenges: collecting comprehensive global routing data requires a vast increase in peers, yet yields highly redundant data with persistent visibility gaps due to BGP’s design. Current platforms store frequent snapshots and all interim updates, compounding redundancy while still peering with only ~1% of active ASes—a figure stagnant for two decades despite ongoing peer additions. Manual peer vetting and the need to balance coverage with cost further strain platform scalability, often forcing researchers to sample the data and overlook unique connectivity. To address these limitations, we collaborated with European researchers at the University of Strasbourg to introduce a new BGP monitoring platform—detailed in our Best Paper at ACM SIGCOMM 2024—which fully automates peering via a public web form and currently supports over 5,500 peers (bootstrapped using RouteViews and RIPE RIS peers) through the GILL prototype, available at bgproutes.io. The system is ongoing development, with commercialization efforts underway to transition it to sustainable operational use.
DNS Data Platform
In 2024, we collaborated with the University of Twente to evaluate DNS monitoring requirements, summarizing results in the paper “DarkDNS: Revisiting the Value of Rapid Zone Update” which we presented at the ACM Internet Measurement Conference. The study explored the benefit of rapid DNS zone updates to contribute to improved Internet infrastructure security, resilience and performance. Such real-time monitoring enables detection of changes in DNS hosting and name server configurations, capturing transient domains that may not appear in ICANN’s daily zone file snapshots (CZDS). We used public sources of data to demonstrate that this finer-grained view could be partially reconstructed with some effort, and would provide additional insight that would advance anti-abuse efforts.
Noteworthy CAIDA Data Sets
We summarize our most noteworthy data sets from 2024 by data category.
One-way Traffic (IBR) data from UCSD (CAIDA’s) Network Telescope (UCSD-NT)
In 2024, we collected approximately 1.1 PB of compressed raw pcap data, which we archive at NERSC. In addition to the darknet traffic, we captured 253 TB of other data, including 28 TB of Ark traceroute measurements.
In addition, we updated our old Anonymized telescope passive traffic 2018 sampler. This dataset includes one hour of raw, non-anonymized traffic captured at 13:00 UTC on May 17, 2018. This sample provides researchers with a representative snapshot of unsolicited Internet traffic observed by the UCSD-NT, a globally routed darknet. The dataset includes a single compressed pcap file containing one hour of raw packet-level data, corresponding aggregated flow records in Avro format and RSDoS attack metadata in CSV format. Access to this dataset is available upon submission of a request form. This dataset may be complemented by Anonymized bidirectional passive trace data available from the 10 GB Equinix NYC monitor for that same hour, which provides additional visibility into Internet traffic patterns.
Our telescope dataset collection now includes the Anonymized Network Sensing Graph Challenge dataset, contributed by the MIT Lincoln Laboratory team. This dataset consists of (1) anonymized PCAP files derived from CAIDA’s telescope, (2) anonymized GraphBLAS traffic matrices computed from these PCAP files, (3) anonymized cross-correlations of the CAIDA telescope sources with GreyNoise honeyfarm data, (4) anonymized CAIDA telescope sources that did not appear in the GreyNoise honeyfarm database.
Two-way Internet backbone link traffic traces
Since April 2024, we’ve captured a one-hour two-way traffic trace each month from a 100 Gbps commercial backbone link between Los Angeles and San Jose, CA. To protect privacy, we strip all packet payloads after the layer 4 headers, and anonymize IP (v4 and v6) addresses with CryptoPan. Access to this dataset is available upon submission of a request form. The data is stored in our Swift OpenStack object storage. Each one-directional anonymized pcap file is approximately 1TB. Based on a user survey, we created and shared two related datasets:
(1) Publicly available Passive 100G Metadata – statistics for the 100G traffic traces, including: date, time, and duration; packets and bytes captured; mean per-second packet rate; mean per-second bit rate; mean link load as a fraction of nominal maximum link capacity. This metadata helps users assess which trace they need to (or have resources to) download.
(2) Restricted Anonymized Two-Way Traffic Packet Header Traces 100G (5 sec) Sampler. This dataset captures 5-second snapshots of the November 2024 Anonymized Traces 100G dataset. The sampler dataset (about 800 Mb) allows researchers to explore the data before committing to downloading large volumes, helping them assess whether it meets their research needs. Access to this dataset is available upon submission of a request form.
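The per-trace statistics listed in the metadata dataset follow from three totals and the nominal link capacity. The sketch below shows that arithmetic under stated assumptions: the class and field names are hypothetical (they are not the dataset's actual schema), the byte count is assumed to be the on-wire length of each packet rather than the truncated capture size, and the example numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class TraceSummary:
    """Hypothetical per-trace totals for one direction of a link."""
    packets: int
    wire_bytes: int           # assumed: original on-wire bytes, not stored bytes
    duration_s: float
    link_capacity_bps: float  # nominal maximum link capacity

    @property
    def mean_pps(self) -> float:
        return self.packets / self.duration_s

    @property
    def mean_bps(self) -> float:
        return self.wire_bytes * 8 / self.duration_s

    @property
    def mean_load(self) -> float:
        """Mean link load as a fraction of nominal capacity."""
        return self.mean_bps / self.link_capacity_bps


# Made-up numbers for a one-hour trace on a 100 Gbps link.
t = TraceSummary(packets=7_200_000_000, wire_bytes=4_500_000_000_000,
                 duration_s=3600.0, link_capacity_bps=100e9)
print(f"{t.mean_pps:.0f} pps, {t.mean_bps / 1e9:.1f} Gbps, load {t.mean_load:.2f}")
```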
Internet Topology Data (traceroute and BGP based)
We released 2024-02 and 2024-08 Internet Topology Data Kits. The 2024-02 ITDK was produced from traceroutes collected from 2024-02-20 to 2024-02-27 from 101 Archipelago (Ark) monitors located in 43 countries. The 2024-08 ITDK was produced from traceroutes collected from 2024-08-28 to 2024-09-01 from 89 Ark monitors located in 42 countries. These ITDK releases consist of (1) an IPv4 router-level topology, (2) an IPv6 router-level topology, (3) router-to-AS assignments, (4) geographic locations of routers, and (5) DNS lookups of all observed IP addresses. Each router-level topology is provided in two files, one giving the nodes and another giving the links. The ITDK dataset includes files that assign ASes and geolocation to each node.
AS Relationships (BGP-based)
We updated our AS Relationships Serial-2 dataset by discontinuing the use of inferences derived from Ark traceroute data. The current version of the Serial-2 dataset combines the Serial-1 dataset with AS relationships inferred from multi-lateral peering, as described in the corresponding 2013 paper.
Geolocation MetaData
We released a new geolocation data set: the GeofeedWHOIS dataset. This dataset consists of mappings of geofeed URLs to geofeed files, derived from WHOIS dumps from the registries/network operators (“afrinic”, “apnic”, “arin”, “jpnic”, “krnic”, “lacnic”, “ripe”), serving as a resource for IP geolocation related research. Data collection for this ongoing dataset began on October 29, 2024.
Cloud Prefixes
We released a new data set in 2024: Cloud Prefixes.
This dataset provides a list of IP prefixes for eight cloud providers: AWS, Azure, Cloudflare, DigitalOcean, Fastly, GCP, IBM, OCI. Each IP prefix may be associated with a specific region or service label, providing insights into the geographical distribution or service classification of the IP addresses.
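A typical use of such a prefix list is to label an observed IP address with its cloud provider and, where available, its region or service label. The sketch below shows one way to do that with a longest-prefix match using Python's standard `ipaddress` module. The record layout, provider names, labels, and prefixes are hypothetical (the prefixes are reserved documentation ranges) and do not reflect the dataset's actual file format.

```python
import ipaddress
from typing import Optional

# Hypothetical records: (prefix, provider, region-or-service label).
RECORDS = [
    ("198.51.100.0/24", "ExampleCloudA", "region-1"),
    ("198.51.100.128/25", "ExampleCloudA", "region-2"),
    ("2001:db8::/32", "ExampleCloudB", None),
]


def lookup(addr: str, records=RECORDS) -> Optional[tuple]:
    """Return (provider, label) for the most specific matching prefix, or None."""
    ip = ipaddress.ip_address(addr)
    best_len, best = -1, None
    for prefix, provider, label in records:
        net = ipaddress.ip_network(prefix)
        # Only compare addresses and prefixes of the same IP version.
        if net.version == ip.version and ip in net and net.prefixlen > best_len:
            best_len, best = net.prefixlen, (provider, label)
    return best


print(lookup("198.51.100.200"))
print(lookup("203.0.113.1"))
```

The linear scan keeps the example short; for a full prefix list and many lookups, a radix tree or a per-length hash table would be the more usual choice.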
Infrastructure for Data Management, Discovery, Usability
CAIDA Internet Research Resource Catalog
In 2024, we continued enhancing CAIDA’s Internet Research Resource Catalog, a comprehensive collection of publications, datasets, software, and related resources with rich metadata and cross-links that accelerate discovery and innovation in Internet infrastructure research. The catalog now serves around 10,000 unique users each month. In 2024, we added collections to enable sets of related resources to be grouped with their own descriptions and searchability, and we integrated full support for AUAs (Acceptable Usage Agreements), providing a consistent user experience between the catalog and our developing Resource Access Management Portal. We also indexed all of the Internet Yellow Pages (IYP) datasets into the catalog.
Designing Metadata Standards
As part of our effort to codify our dataset formats as we upgrade various infrastructure components, we added the first version of CAIDA’s data ontology to the catalog. This document maps entities in CAIDA’s Annotated Schema to terms used by Schema.org, and illustrates how the schema applies to some of CAIDA’s datasets.
Tools for Data Curation, Documentation, MetaData Extraction
AI-Enabled Extraction of MetaData to Infer Scientific Utility.
For years, CAIDA has tracked the use of its scientific data in research publications, contributing to a rich context catalog that links papers with the datasets and software tools utilized. However, even state-of-the-art techniques for automatically extracting these links struggle with the complexity, variability, and contextual nuances present in natural language. For example, distinguishing between a mere reference to a resource and its actual use in a publication remains a significant challenge. This task requires advanced, context-aware metadata extraction from unstructured text. In 2024 we began to develop techniques to train and use LLMs to support this task, i.e., to extract information about tool and dataset usage from scientific studies of Internet infrastructure. This project will address two critical needs frequently expressed by researchers: guidance on how to get started with Internet-related data, and insights into the utility of specific data sets for advancing research. These insights will also inform funding agencies’ decisions about which data collection and curation efforts should receive continued support.
Packet Filtering Tool Survey.
In collaboration with the Illinois Institute of Technology team, we surveyed the networking community to understand which packet filtering tools are commonly used, their pain points, and unmet needs in current workflows. The resulting paper, “A survey on Packet Filtering”, appeared in the January 2025 issue of the ACM SIGCOMM Computer Communication Review Journal (ACM CCR).
New Tools for Data Dissemination
One-way UCSD-NT telescope traffic data.
The huge volume of Telescope IBR traffic presents challenges in performing data analysis, both due to privacy concerns and the daunting volume of data. Historically, we typically provided local (at SDSC) virtual machines (VMs) for researchers to bring their own code to process and analyze the traffic data. But hardware limitations and user demand have limited the power of the VMs (8 CPUs, 32 GB RAM, 100 GB storage) and thus the capability of researchers to use the data. In 2024 we tested and evaluated alternative ways to share sensitive data with vetted users:
- Sharing a direct stream of curated (reduced from original) RSDoS event data with trusted collaborators. Our collaborators have used this data to correlate DDoS events with active measurement indicators of performance degradation, e.g., investigating the impact of DDoS attacks on DNS infrastructure.
- Access to raw historical pcap files from the NERSC archive, again with trusted collaborators who already have an established relationship with NERSC (Lincoln Labs, a DOD FFRDC). This mode of data sharing with Lincoln Labs has contributed to at least six publications listed in our catalog.
- Customized transmission of a subset of packets received by the telescope to an industry partner (Domain Tools) over our existing infrastructure. This is the most likely mode that commercial users will want to leverage but is costly to sustain.
- Leveraging SDSC’s Expanse HPC systems (ACCESS allocation) to process telescope data, a mode we have offered to other researchers who qualify for ACCESS allocations. We developed and taught tutorials on Telescope data analysis using the SDSC Expanse supercomputer during the AIMS 2025 meeting and shared the tutorial on GitHub.
- Supporting a time-series dashboard of statistics of traffic coming to the UCSD telescope (available to anyone with a Globus or GitHub account), which researchers use to identify suspicious events before they investigate more granular subsets of the telescope data.
- Collaboration with the Open Science Data Federation (OSDF). We engaged with the OSDF team to begin the process of launching UCSD-NT telescope data on their platform. In late 2024 we initiated our first upload of encrypted telescope traffic datasets to the OSDF.
Innovations to Promote Data Sharing with Industry Stakeholders.
We designed, prototyped, and evaluated an innovative methodology for sharing sensitive industry data with researchers. In this approach, we provided industry partners with an aggregated list of DDoS targets, which they joined with their proprietary data to identify and reveal gaps in the visibility of academic data sources. We implemented this methodology through a multi-stakeholder working group effort that resulted in a paper, accepted for presentation and publication at ACM IMC 2024. (The Age of DDoScovery: An Empirical Comparison of Industry and Academic DDoS Assessments, IMC)
Exploring the Limits of Differential Privacy (DP).
DP is a powerful technology, but not well-suited to protecting corporate proprietary information while computing aggregate industry-wide statistics. We elucidated this scenario with an example of cybersecurity management data and considered an alternative approach that relies on a pragmatic assessment of harm to add noise to the data. (Exploring the Limits of Differential Privacy, TPRC, Sept 2024. Revised version Differential Privacy, Firm-level Data and the Binomial Pathology, IEEE Security & Privacy, January 2025)
Data Distribution Statistics
CAIDA datasets fall into two categories: public and by-request. By-request datasets are shared with academic researchers, U.S. government agencies, and corporate entities (through the UC San Diego’s Office of Innovation and Commercialization) – after a vetting process. For by-request data, we know who is using it and the intended use. But for public data and tools (including APIs), we do not have that level of detail since we just observe the IP addresses in our system logs. We do our best to clean the logs by filtering out bots and crawlers, and we try to map IPs to organizations and infer locations and types of users (e.g. academic research, government, industry, etc), but there are limitations to this approach. For example, users behind cloud services, VPNs, or shared university networks often look the same to us, and many ISPs (and thus their users) rotate IPs frequently. Additionally, CAIDA manages restricted data and processes access requests through a system that involves significant manual data administrator work.
To improve insight into usage of our public datasets, and better understand and demonstrate their impact, we are introducing a new authentication framework. Users will complete a one-time registration on the CAIDA portal, after which they can request access to any dataset by submitting a brief data use justification form. Access to public data sets will be granted automatically upon submission. This process allows us to collect more reliable usage metrics while keeping access lightweight and user-friendly. To streamline these processes, we designed and implemented a new Resource Access Management portal (RAM) to centralize user information, manage access requests, and track metadata about resources, including restricted datasets and APIs. In 2024 we tested our authentication component on our new BGP2Go service. In 2025 we plan to transition to the use of our system to access other CAIDA datasets.
In 2024, due to hardware failures we are missing web activity logs for November and December, so we cannot report how many users (based on unique IPs) accessed our tools, APIs, or downloaded data during that period, nor compare usage with previous years. However, based on the first 10 months of data, we can still identify the following trends in platform and dataset usage across our services.

AS Rank monthly usage statistics for 2024 show a sharp increase in activity at the start of the year, with unique IP counts rising to approximately 16,000 in January and peaking at around 20,000 in February. (Note log data outage in Nov-Dec.)
AS Rank remains our most frequently used platform. While the number of unique IPs accessing AS Rank varies by month, it averages around 10,000. In January and February 2024, usage spiked to approximately 16,000 and 20,000 unique IPs, respectively.
BGPStream is our next most frequently used API, with around 2,000 unique users during the first 10 months of 2024.
Among downloadable public datasets, the most popular include the RouteViews IPv4 Prefix to AS mappings datasets (around 7,300 downloads), AS Relationships Serial-2 (around 3,700), AS to organizations mappings (around 2,000), the 2019 IMS Hoiho paper data supplement (around 1,400), and the Ark IPv6 traceroute (around 1,500).
The table below highlights the most accessed “by-request” datasets from CAIDA in 2024, based on the number of approved users. Leading the list is the 2019 Passive Anonymized Internet Traces dataset with 112 users, followed by the DDoS 2007 Attack dataset with 66 users. While we granted 18 users access to the most recent 2024 anonymized passive traces captured on a 100 Gbps link, several passive traffic capture datasets from earlier years (2014–2018) also remain in high demand, reflecting continued interest in longitudinal traffic analysis.
| Top Restricted Datasets | Total Users |
|---|---|
| Anonymized Internet Traces (2019) | 112 |
| DDoS 2007 attack | 66 |
| Anonymized Internet Traces (2018) | 40 |
| Anonymized Internet Traces (2016) | 35 |
| Anonymized Two-Way Traffic Packet Header Traces (2024) | 18 |
| Ark IPv4 Routed /24 Topology | 14 |
| Internet Topology Data Kit (ITDK) | 14 |
| Ark IPv4 prefix-probing | 11 |
| Anonymized Internet Traces (2014) | 10 |
| Anonymized Internet Traces (2015) | 8 |
| Aggregated Daily RSDoS Attack Metadata (Corsaro 2) | 7 |
Publications using CAIDA data (by non-CAIDA authors)
Users of CAIDA datasets agree, as part of the Acceptable Use Agreements, to provide CAIDA with information about their publications using CAIDA data, but many users forget to report. We therefore conduct extensive literature searches to locate relevant papers, searching Google Scholar for names and DOIs of CAIDA datasets and services. We also use computer science search engines, such as IEEE Xplore Digital Library, ACM Digital Library, ScienceDirect.com, and Springer, among others. We are aware of 304 publications authored by non-CAIDA researchers that utilized CAIDA data and that were published in 2024. Our external publications database is updated as we become aware of new publications. As of August 2025, we have indexed 3808 papers in our database. Please let us know if you know of a paper using CAIDA data not yet on our list: Non-CAIDA Publications using CAIDA Data.
Internet (Security and Performance) Data Analytics Platforms
We continued developing platforms to generate and provide interactive access to strategic datasets that reveal vulnerabilities, risks, and security challenges across the global Internet. These datasets support scientific use cases aimed at evaluating the national security posture of critical infrastructure systems, providing crucial insights for strengthening resilience, and addressing emerging threats.
Inferring security properties of Internet Topology to Automate Verification Of Internet Data-paths (AVOID)

The AVOID prototype design ensures that communications for devices in the tactical bubble only traverse benign commercial cellular network base stations.
In 2024 we scaled up our ML classifier that identifies base station vendors, improving accuracy and performance as we gathered new data around the world for training and testing. We also trained and tested a SIB classifier that works with SIB data collected via an SDR or through modem diagnostic logs. Our SIB classifier performs with 100% accuracy against our test set. We created an Android app that uses the classifier to identify the vendor of the base station currently connected to the phone. In parallel, we continued development of the capability to infer security properties of communication paths beyond the first base station hop, based on many of our topology analytics capabilities that are used in the ITDK. We are incorporating these and new analytics into our existing AVOID-Path prototype. (AVOID: Automatic Verification Of Internet Data-paths, MILCOM). We have spun out a company, Revelare Networks, to commercialize results of this work.
Inferring security properties of Internet Traffic to scalably detect anomalies
We introduced DarkSim, a novel analytic framework that utilizes Dynamic Time Warping to measure similarities within the high-dimensional time series of darknet (IBR) traffic. We showcased the effectiveness of our framework in our benchmark against DarkGLASSO, a recent framework that applies the GLASSO algorithm to darknet time series. Whereas DarkGLASSO achieved only a maximum precision of 73.3% and a 37.5% overlap with our framework’s detections, DarkSim achieved perfect precision and a maximum overlap of 91%. (DarkSim: A Similarity-Based Time Series Analytic Framework for Darknet Traffic, IMC)
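Dynamic Time Warping measures how similar two time series are while tolerating shifts and stretches along the time axis, which is why it suits traffic series whose events are slightly offset in time. The sketch below is the textbook dynamic-programming form of DTW on one-dimensional series, included only to illustrate the underlying technique. It is not DarkSim's implementation, and the function name and example series are hypothetical.

```python
def dtw_distance(a: list[float], b: list[float]) -> float:
    """Classic O(len(a) * len(b)) DTW with absolute difference as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # prev[j] holds the best cumulative cost of aligning a[:i-1] with b[:j].
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


# Two made-up series with the same shape, one delayed by a step.
burst = [0, 1, 2, 1, 0]
delayed = [0, 0, 1, 2, 1, 0]
print(dtw_distance(burst, delayed))   # warping absorbs the delay
print(dtw_distance([0, 0], [1, 1]))   # a constant offset cannot be warped away
```

A pointwise distance would penalize the delayed copy at almost every step, whereas DTW aligns the two bursts and reports no difference, which is the behavior a similarity search over time-shifted traffic patterns needs.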
Scientific Analysis of Network Performance
Reliable and Accurate Broadband Performance Measurement Toolkit
RABBITS Measurement Toolkit. The RABBITS (Reliable and Accurate Broadband Benchmarking with Innovative Test Strategies) toolkit is a customizable measurement platform that enables accurate, flexible speed tests across multiple platforms. We prototyped RABBITS to run tests on six platforms (e.g., Ookla, Fast.com, M-Lab) with adjustable parameters such as server selection, connection count, and test duration, using a headless browser on the FABRIC testbed. Our findings revealed that geopolitical events can reduce test server availability by up to 64%, and removing servers in major ISPs significantly limits user access to in-network tests. Throughput estimates varied by over 40× under identical network conditions depending on test parameters. Additionally, we highlighted the consequences of Ookla’s new policy, which prohibits individual users from operating speed test servers and has resulted in a 53% reduction in on-net server coverage among the total population. (Empirical Characterization of Ookla’s Speed Test Platform: Analyzing Server Deployment, Policy Impact, and User Coverage, CCWC)
Platform for Measuring Quality of Experience (QoE)
In 2024, we launched the cloud-native QUINCE platform on AWS, leveraging Jitsi video conferencing to crowdsource Internet quality measurements from diverse users. By optimizing deployment, we reduced application launch times from five minutes to one and integrated M-Lab speed tests, traceroute capabilities, and an AI-assisted video conferencing experiment. This experiment used a headless Chromium browser, streaming videos and interacting with participants while we collected detailed performance metrics such as audio quality, bitrates, reaction times, and order accuracy. In particular we simulated drive-through food ordering with a Llama-based chatbot and AI-generated videos to assess audio/video quality, bitrates, and interaction metrics. After implementing automated DNS/SSL tools and gamification features we have used QUINCE to provide insights into Ookla and Fast.com server assignments and Netflix’s video cache behavior.
Analysis of Cloud Performance and Reachability
We continued developing tools to understand cloud connectivity performance and reachability in the U.S. and around the world. We developed new techniques for detecting bottleneck links in networks, centered around FlowTrace+, a tool that injects probe packets into active TCP flows to measure available bandwidth. We tested these components on the FABRIC testbed, where we set up realistic multi-site network paths with controlled latency and packet loss. We used SmartNIC-equipped bare-metal clients to inject and capture probe packets with hardware timestamps, and validated timing using FABRIC’s packet mirroring. This work is supported by NSF award CNS-2212241 (Cloud Bottlenecks).
Data and Metadata APIs
Facilitating Advances in Network Topology Analysis (FANTAIL)
As described in our 2023 Annual Report, over the past few years CAIDA developed the Facilitating Advances in Network Topology Analysis (FANTAIL) system to enable researchers to discover and analyze end-to-end Internet path measurement data from large-scale archives using high-level queries, without needing to manage infrastructure or learn big data programming. In 2024, CAIDA made significant improvements to the FANTAIL prototype, focusing on both functionality and user experience. We made the interface more intuitive, introduced new ways to search for prefixes by country or AS number, and updated the API to handle larger queries and outputs more efficiently. However, due to resource constraints and low uptake of the service, in 2025 we plan to decommission the service and open source all the software components in case there is interest in maintaining it.
AS Rank
AS Rank is CAIDA’s platform for ranking Autonomous Systems (ASes) and the organizations that operate them. By combining multiple datasets, AS Rank provides a more complete picture of how ASes are connected, their roles in the global Internet, and how they relate to one another. Many network operators and researchers use the AS Rank website and API, and many have asked for more visibility into the inference process. In 2024, we began redesigning the system’s BGP data analysis engine to show exactly which AS paths are used at each step of the inference process, making the logic behind the rankings more transparent. We also updated the AS Rank RESTful API to help users better understand when and how data was collected. Feedback from AS operators continues to be an important part of improving AS Rank: in 2024, we incorporated about 80 corrections submitted by operators. Implementation of the new design is pending funding and resources.
BGP2GO
Our BGP2GO platform helps users pinpoint exactly which MRT files contain data relevant to their research—saving them time and storage by avoiding the need to download and process unrelated files. Users can build a custom list of relevant MRT files, share that list with collaborators, or stream the data directly using tools like BGPStream. In 2024, we published a detailed description of a common BGP2GO use case: Analyzing Prefix Propagation with PEERING Testbed, BGP2GO, and BGPStream. The demo shows how researchers can identify MRT files related to their real-world PEERING experiments, stream just those files into a terminal or script using BGPStream, and begin analysis immediately—without needing to download large datasets. We are hoping to transition this project to a partner or collaborating institution in 2025.
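The file-selection idea can be illustrated with a minimal sketch. It assumes a hypothetical index that maps each MRT file name to the prefixes observed in it; the file names, index contents, and the `files_for_prefix` helper are invented for illustration and do not describe how BGP2GO is actually implemented.

```python
import ipaddress

# Hypothetical index: MRT file name -> prefixes observed in that file.
# File names and contents are invented for illustration only.
FILE_INDEX = {
    "updates.20240101.0000.bz2": ["192.0.2.0/24", "198.51.100.0/24"],
    "updates.20240101.0015.bz2": ["203.0.113.0/24"],
    "updates.20240101.0030.bz2": ["192.0.2.0/25"],
}


def files_for_prefix(index, target):
    """Return sorted names of files that contain the target prefix
    or any more-specific prefix inside it."""
    target_net = ipaddress.ip_network(target)
    selected = []
    for name, prefixes in index.items():
        for prefix in prefixes:
            net = ipaddress.ip_network(prefix)
            # subnet_of() raises TypeError across IP versions, so check first.
            if net.version == target_net.version and net.subnet_of(target_net):
                selected.append(name)
                break  # one match is enough to select this file
    return sorted(selected)


if __name__ == "__main__":
    for name in files_for_prefix(FILE_INDEX, "192.0.2.0/24"):
        print(name)
```

With this toy index, only the two files that mention 192.0.2.0/24 or its more-specific 192.0.2.0/25 are selected, so a user would fetch or stream two files instead of three.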
Hoiho API
Holistic Orthography of Internet Hostname Observations (Hoiho) is an open-source tool released as a part of scamper. It uses CAIDA’s Macroscopic Internet Topology Data Kit (ITDK) and observed round-trip times to infer regular expressions that extract apparent geolocation hints from hostnames. The ITDK contains a large dataset of routers with annotated hostnames, which Hoiho takes as input to infer rules, encoded as regular expressions, that extract these annotations. In 2024, we released the publicly available Hoiho API for hostname-location lookups, providing inferred geolocation from router hostnames.
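To make the idea of a hostname rule concrete, here is a small sketch of how a regular expression might pull a location hint out of a router hostname. The naming convention, the regular expression, the `example.net` hostnames, and the three-entry code table are all assumptions made up for this example; they are not rules inferred by Hoiho, and this is not the Hoiho API.

```python
import re

# Hypothetical naming convention: <interface>.<role><n>.<iata><n>.example.net
# This pattern is illustrative only; it is not a rule inferred by Hoiho.
HINT_RE = re.compile(r"^[^.]+\.[a-z]+\d+\.([a-z]{3})\d+\.example\.net$")

# Tiny illustrative dictionary mapping IATA airport codes to locations.
IATA_TO_CITY = {
    "lax": "Los Angeles, US",
    "fra": "Frankfurt, DE",
    "nrt": "Tokyo (Narita), JP",
}


def extract_location(hostname):
    """Return (hint, city) if the hostname matches the convention and the
    extracted hint is a known code; otherwise return None."""
    match = HINT_RE.match(hostname.lower())
    if match is None:
        return None
    hint = match.group(1)
    city = IATA_TO_CITY.get(hint)
    if city is None:
        return None
    return hint, city


if __name__ == "__main__":
    for name in ["xe-0-0-1.cr2.lax1.example.net", "www.example.com"]:
        print(name, "->", extract_location(name))
```

In this sketch the first hostname yields the hint `lax`, while the second does not match the convention and yields no location. A real rule would also need to be checked against measurements, which is the role the paragraph above assigns to observed round-trip times.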
The Internet Topology Data Kit (ITDK)
Since 2010, CAIDA has maintained and expanded the Macroscopic Internet Topology Data Kit (ITDK) collection, which now includes 26 datasets. These kits provide rich information about Internet connectivity and routing from a broad set of Ark vantage points across the globe. In 2024, our efforts focused on making the ITDK pipeline more scalable, accurate, and efficient. We improved data collection and post-processing automation, upgraded our DNS lookup tools, expanded the probing infrastructure, and enhanced data curation practices. These changes enabled the largest ITDK release to date — ITDK 2024-02 — which leveraged 101 Ark vantage points to probe 3.6 million destination addresses. We also secured a renewal of our research license with Iconectiv, ensuring continued access to CLLI code data, which we use to enhance ITDK annotations to maximize its scientific value. In 2024, we released two datasets (2024-02 and 2024-08), indexed both in our catalog, and shared key updates with the community through our blog post ITDK 2024-02.
Infrastructure for Outreach
We continued developing a comprehensive Infrastructure for Outreach aimed at engaging diverse stakeholders in infrastructure development, use, evaluation, and systematic improvement, and at scaling up STEM workforce training. We placed significant emphasis on fostering collaboration, disseminating research, and developing human resources.
Workshops and Meetings
We held two biannual workshops: GMI-AIMS-3 at SDSC (June 2024) and GMI-AIMS-4 in Madrid (November 2024). GMI-AIMS-3 was held in person at SDSC during the week of June 24-28, 2024. It focused on reviewing progress and discussing various measurement infrastructures that are critical for the next implementation phase of the MSRI GMI project. There were 48 participants representing industry and academic organizations. Key discussions centered on enhancing the collection, curation, and sharing of data relevant to Internet security, stability, and resilience. Several technical topics were addressed, such as advancements in active Internet measurement tools like scamper and a mesh-traceroute system across Ark nodes. The discussions also explored methodologies for improving BGP, network telescope, and DNS measurement infrastructures.
We hosted the GMI-AIMS-4 workshop on November 7, 2024, just after IMC in Madrid. The event brought together over 25 researchers and practitioners from across Europe, North America, and Asia to push forward the GMI3S project. This year’s focus was on shifting from design to implementation, especially around active measurement, BGP visibility, DNS transparency, and network telescope infrastructure. Talks ranged from DNS blocklists and IPv6 scanning quirks to new tools for anycast and route collection. We also spent time discussing how to make measurement data more usable and shareable, how to support long-term sustainability, and how to ensure these tools meet the needs of not just researchers, but also operators and policymakers.
We continued our calls with key stakeholders, including the Department of Defense (DOD), DREN, RNP (Brazilian NREN), RIPE NCC, LSU, and TU Dresden, fostering collaborations and discussing sustainability models.
We also participated in weekly calls with the SALON (Studies on Architecting Legislation of Networks) group, which discussed regulatory and policy developments in Internet-related topics, with a focus on how independent measurement data can shape policy discussions.
Publications and Dissemination
As always, we engaged in a variety of tool development, data sharing, and outreach activities. Our web site attracted approximately 300,000 unique visitors, with an average of 2 visits per visitor, serving an average of 26 pages per visit.
CAIDA Blogs. In 2024, we published 12 blog posts:
- Towards a Domain Specific Language for Internet Active Measurement
- Developing active Internet measurement software locally to run on Ark
- A First Look at Suspicious IRR Records
- ITDK 2024-02
- Understanding the deployment of public recursive resolvers
- Help CAIDA Refine and Enhance the FANTAIL Traceroute Analytics platform
- Seeking Beta Users for 100 GB link Anonymized Passive Traces
- Streamlining Access to BGP Routing Data
- Observing the DDoS Landscape Requires Collaboration
- CAIDA’s 2023 Annual Report
- AS Reachability Visualization
- Observing the DDoS Landscape Requires Collaboration
Promotional Materials. We created a “Why should my network host an Ark node?” flyer and disseminated it at the March 2024 FABRIC workshop.
CAIDA Publications and Presentations. In 2024, CAIDA published 12 peer-reviewed papers and 2 non-peer-reviewed papers. The list below presents these publications, grouped by research category.
Measurement and Data Analysis Infrastructure:
- An Integrated Active Measurement Programming Environment, Passive and Active Measurement Conference (PAM), Dec 2024
- DarkSim: A Similarity-Based Time Series Analytic Framework for Darknet Traffic, ACM Internet Measurement Conference (IMC), Nov 2024
- The Next Generation of BGP Data Collection Platforms, ACM SIGCOMM Conference, Aug 2024
Internet Routing and Security:
- A path forward: improving Internet routing security by enabling zones of trust, Journal of Cybersecurity, Dec 2024
- The Age of DDoScovery: An Empirical Comparison of Industry and Academic DDoS Assessments, ACM Internet Measurement Conference (IMC), Nov 2024
- DarkDNS: Revisiting the Value of Rapid Zone Update, ACM Internet Measurement Conference (IMC), Nov 2024
- AVOID: Automatic Verification Of Internet Data-paths, IEEE Military Communications Conference (MILCOM), Oct 2024
- REVEAL: Real-time Evaluation and Verification of External Adversarial Links, IEEE Military Communications Conference (MILCOM), Oct 2024
- A path forward: Improving Internet routing security by enabling zones of trust, Telecommunications Policy Research Conference (TPRC), Sep 2024
Measuring Network Performance:
- Empirical Characterization of Ookla’s Speed Test Platform: Analyzing Server Deployment, Policy Impact, and User Coverage, Computing and Communication Workshop and Conference (CCWC), Jan 2024
Informing Public Policy:
- Sublet Your Subnet: Inferring IP Leasing in the Wild, ACM Internet Measurement Conference (IMC), Nov 2024
- Differential Privacy, Firm-level Data and the Binomial Pathology, IEEE Security & Privacy, Sep 2024
- Exploring the Limits of Differential Privacy, Telecommunications Policy Research Conference (TPRC), Sep 2024
- Survey on Packet Filtering, ACM SIGCOMM, Jul 2024
We also made 10 presentations.
STEM Workforce Training & Development
During 2024, CAIDA employed 17 staff (researchers, programmers, data administrators, technical support staff) and hosted 1 postdoc, 5 PhD students, 15 master's students, and 33 undergraduate students.
We supported a large number of Research Experience for Undergraduates (REU) participants across our projects — including 9 in ILANDS, 20 in RABBITS, 7 in STARNOVA, 10 in QUINCE, and 11 in GMI3S. These students contributed to a range of tasks and gained hands-on, mentored experience in software development, Internet measurements, data analysis, and web development.
Funding and Expenses
The tables below show CAIDA’s operating expenses, broken down by expense type and by research program area:
Expense type | Amount ($) | Percentage |
---|---|---|
Supplies & Expenses | $87,044.29 | 1.39% |
(UCSD) Benefits | $596,941.41 | 9.56% |
Consultant | $340,093 | 5.45% |
Equipment | $315,109.82 | 5.05% |
(UCSD) Indirect Costs | $1,657,871.19 | 26.54% |
Labor | $1,962,695.10 | 31.42% |
Professional Development | $25,200.66 | 0.40% |
Subcontracts | $1,260,745.10 | 20.19% |
Total | $6,245,700.57 | 100% |
Research Program Area | Amount ($) | Percentage |
---|---|---|
Security, Stability, Resilience | $3,223,970 | 52% |
Infrastructure & Data Sharing | $2,407,358 | 39% |
Performance | $614,372 | 10% |
Total | $6,245,701 | 100% |
Supporting Resources
CAIDA’s accomplishments are in large measure due to the high quality of our visiting students and collaborators. We are also fortunate to have financial and IT support from sponsors, members, collaborators, and monitor hosting sites.
UC San Diego Graduate Students
- Sai (Nikhila) Bandaru, master's student
- Chongyang (Ben) Du, PhD student
- Jie (Samuel) Fu, master's student
- Max Gao, PhD student
- Syed Mujtaba Hadi Jafri, master's student
- Shivani Hariprasad, master's student
- Anish Koulgi, master's student
- Dhruthick Mohan, master's student
- Loukik Naik, master's student
- Hemil Panchiwala, master's student
- Reventh Sharma, master's student
- Sampada Shelke, master's student
- Raymond Sun, master's student
- Amanda Tomlinson, PhD student
- Nipun Wahi, master's student
- Zesen (Jason) Zhang, PhD student
Visiting Scholars
- Bernhard Degen, PhD student from U Twente
- Thomas Krenc, postdoc from Naval Postgraduate School
Funding Sources
- (ended in 2024) CNS-2133452 - A Unified Approach to Internet Performance Measurement.
Principal Investigator: Ka Pui Mok
- (ending in 2025) 312285-00001 - Transforming the Responsibility for Trusted Denial-of-Service Mitigation in the Internet.
Principal Investigator: Karen Sollins
- (ending in 2025) CNS-2323219 - A measurement toolkit for Reproducible Assessment of BroadBand Internet Topology and Speed.
Principal Investigator: Ka Pui Mok
- (ending in 2025) CNS-2212241 - Detection and Analysis of Infrastructure Bottlenecks in a Cloud-Centric Internet.
Principal Investigators: Ka Pui Mok, kc claffy, Alexander Marder
- (ending in 2025) OAC-2131987 - Designing a Global Measurement Infrastructure to Improve Internet Security.
Principal Investigators: kc claffy, David Clark, Bradley Huffaker
- (ending in 2025) service agreement - Supporting AMPRNet and the UCSD Network Telescope.
Principal Investigator: Ka Pui Mok
- (ending in 2026) OAC-2319959 - Scalable Technology to Accelerate Research Network Operations Vulnerability Alerts.
Principal Investigators: Ka Pui Mok, kc claffy, Fabian Bustamante
- (ending in 2026) ITE-2326928 - Automated Verification Of Internet Data-paths for 5G.
Principal Investigators: Alexander Marder, Erik Kline, Ka Pui Mok, kc claffy, Kyle Jamieson
- (ending in 2027) CNS-2120399 - Integrated Library for Advancing Network Data Science.
Principal Investigators: kc claffy, David Clark