Outcomes: Integrated Library for Advancing Network Data Science - (ILANDS)
This page reports the outcomes of the Integrated Library for Advancing Network Data Science (ILANDS) project.
Principal Investigators: kc claffy, David Clark
Funding source: CNS-2120399 Period of performance: October 1, 2021 - September 30, 2026.
Task 1: Traffic Data Infrastructure Enhancement
Deployment of a Traffic Monitor on a 100 Gbps Link
In October 2023, we deployed a completely new passive traffic monitor on a 100 Gbps backbone link at an IXP in Los Angeles. Using Napatech network cards, we recorded these traces while removing the payload (beyond the layer 4 headers) from all packets. The card interprets various layer 4 headers, including ICMP, ICMPv6, TCP, UDP, SCTP, and GRE, and strips others. For performance optimization, our packet capture architecture utilizes 16 streams, which we then combine into two unidirectional traces.
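One plausible way to combine per-stream captures into per-direction traces is a timestamp-ordered k-way merge. The sketch below assumes that each stream is already time-ordered and that every record carries a direction tag. The `Packet` type, its fields, and the direction labels are hypothetical and are not taken from the actual capture software.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Packet(NamedTuple):
    """Hypothetical capture record; real traces carry full packet headers."""
    timestamp: float   # capture time in seconds
    direction: str     # "A" or "B" (assumed labels for the two link directions)
    stream_id: int     # which capture stream delivered the packet


def merge_streams(streams: Iterable[List[Packet]]) -> Dict[str, List[Packet]]:
    """Merge time-ordered streams into one time-ordered trace per direction."""
    traces: Dict[str, List[Packet]] = defaultdict(list)
    # heapq.merge performs a lazy k-way merge of already-sorted inputs.
    for pkt in heapq.merge(*streams, key=lambda p: p.timestamp):
        traces[pkt.direction].append(pkt)
    return dict(traces)
```

Because the merge is lazy, the same approach would work with generators that read packets from disk instead of in-memory lists.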
Trace Capture and Processing
Since April 2024, we have performed a monthly one-hour capture, with separate files for each direction. The current data acquisition workflow involves capturing, post-processing (anonymizing), and transferring each monthly snapshot into the corresponding Swift storage container (one container per year) for researcher access.
We anonymize the traces using CryptoPan prefix-preserving anonymization. Previously, the CryptoPan implementation did not support the encryption of bit strings longer than 32 bits, which limited its ability to anonymize 128-bit IPv6 addresses. We now use an updated version of CryptoPan that can anonymize all 128 bits of IPv6 addresses. Our capture and post-processing workflow is thoroughly documented to ensure clarity and reproducibility.
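To illustrate what prefix-preserving means, here is a minimal sketch of the general construction: each output bit is the input bit XORed with a pseudorandom bit derived from the preceding input bits. It is not the CryptoPan implementation used for the traces. CryptoPan derives its pseudorandom bits from a block cipher, whereas this sketch substitutes HMAC-SHA256 from the Python standard library, so its output is not compatible with CryptoPan. The function names are illustrative.

```python
import hashlib
import hmac
import ipaddress


def anonymize(addr: str, key: bytes) -> str:
    """Prefix-preserving anonymization of an IPv4 or IPv6 address (sketch)."""
    ip = ipaddress.ip_address(addr)
    width = ip.max_prefixlen           # 32 for IPv4, 128 for IPv6
    value = int(ip)
    result = 0
    for i in range(width):
        prefix = value >> (width - i)  # the i most-significant bits (0 when i == 0)
        msg = f"{width}/{i}/{prefix}".encode()
        # One pseudorandom bit per prefix (HMAC stands in for the block cipher).
        pad_bit = hmac.new(key, msg, hashlib.sha256).digest()[0] >> 7
        orig_bit = (value >> (width - 1 - i)) & 1
        result = (result << 1) | (orig_bit ^ pad_bit)
    return str(type(ip)(result))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two addresses of the same family."""
    x, y = ipaddress.ip_address(a), ipaddress.ip_address(b)
    return x.max_prefixlen - (int(x) ^ int(y)).bit_length()
```

Two addresses that share exactly k leading bits map to anonymized addresses that also share exactly k leading bits, and the same loop covers all 128 bits of an IPv6 address.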
The resulting data includes the following fields:
- Monitor Name
- Year and month (including a link to a graphical display of the breakdown by protocol, application, and country)
- Start time of trace (UTC)
- Stop time of trace (UTC)
- Number of IPv4 packets
- Number of IPv6 packets
- Unknown packets (as a fraction of the total number of packets)
- Transmission rate in packets per second
- Transmission rate in bits per second
- Link load (as a fraction of the nominal maximum load for a 100 Gbps link)
- Average packet size (bytes) (including a link to a graph of the packet size distribution)
- ...see more
CAIDA Data Enclave
We built a new storage cluster and migrated data out of our old storage cluster into the new cluster.
We optimized local compute and storage resources to efficiently process the captured data, which averages 2–3 TB per one-hour monthly capture.
Trace Management and Sharing
We indexed the new “Anonymized Two-Way Traffic Packet Header Traces” in the catalog.
We have introduced a new data request form for users seeking access to the dataset. To gain access, users must agree to the CAIDA Acceptable Use Agreement (AUA), which strictly prohibits reverse engineering, decryption, de-anonymization, derivation, or any other attempt to re-identify anonymized information.
We announced the availability of the dataset through a blog post, which has since generated around 30 data requests. After vetting applicants, we have approved approximately 20 requests for access.
Statistics for restricted (by-request) datasets show that, since the beginning of the ILANDS project, we have received 960 requests for the “old” passive anonymized two-way traces captured on a 10 Gbps link, 520 of which were approved.
Users of CAIDA datasets agree, as part of the Acceptable Use Agreements, to inform CAIDA of their publications that use CAIDA data. Our Data Publication Report Page explains the easiest way to report papers. In addition, we are conducting an extensive literature search to locate relevant papers. We are aware of 350 publications by external authors that used datasets containing passive traces captured on a 10 Gbps link. We have not yet identified any external publications based on the 100 Gbps traces, though we anticipate research outputs as adoption increases.
User Training and Support for Traffic Data
Based on a survey of users of the 100 Gbps anonymized traces, we created and shared two complementary datasets:
Publicly available “Passive 100G Metadata” Dataset
A dataset that provides statistics for restricted anonymized data including:
- Trace date and time
- Duration of the trace (hours, minutes, and seconds)
- Total packets and bytes captured
- Mean packet rate (packets per second)
- Mean bit rate (bits per second)
- Mean link load as a fraction of the nominal maximum link capacity
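The rate and load fields listed above follow directly from the trace totals. The sketch below shows the arithmetic under two assumptions: the byte count refers to bytes on the wire (before payload removal), and the nominal link capacity is 100 Gbps. The `TraceTotals` type and its field names are hypothetical and do not reflect the dataset's actual schema.

```python
from dataclasses import dataclass

LINK_CAPACITY_BPS = 100e9  # nominal capacity of a 100 Gbps link


@dataclass
class TraceTotals:
    """Hypothetical totals for one trace."""
    duration_s: float   # trace duration in seconds
    packets: int        # total packets captured
    wire_bytes: int     # total bytes on the wire (assumed: before payload removal)


def summarize(t: TraceTotals) -> dict:
    """Derive mean packet rate, mean bit rate, and mean link load."""
    pps = t.packets / t.duration_s
    bps = t.wire_bytes * 8 / t.duration_s
    return {
        "mean_pps": pps,
        "mean_bps": bps,
        "mean_link_load": bps / LINK_CAPACITY_BPS,
    }
```

For example, a one-hour trace carrying 4.5 TB on the wire corresponds to a mean bit rate of 10 Gbps, or a mean link load of 0.1.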
Restricted Anonymized Two-Way Traffic Packet Header Traces sampler
This dataset is part of the 2024 Anonymized Traces 100G dataset, consisting of 5-second snapshots of bidirectional traffic captured in November 2024. The sampler dataset allows researchers to explore the usability of the data before committing to downloading large volumes, helping them assess whether it meets their research needs.
Task 2: BGP Routing Data Infrastructure Enhancements
BGP2GO Platform for Indexing and Analyzing BGP Data (MRT files)
We developed a new science gateway, BGP2GO, to enable efficient access to MRT files from RouteViews. BGP2GO allows users to quickly retrieve routing data referencing any numeric identifier (e.g., ASN, prefix, or community) over a specified time window, streamlining data acquisition for advanced processing and analytics. The goal is to facilitate new insights into the security and resilience of Internet infrastructure.
Millions of MRT files containing routing table dumps and updates are continuously archived by BGP collectors such as RouteViews and RIPE RIS to support network research and debugging. However, the volume of these archives is growing quadratically, driven by the increasing number of collector peers and the complexity of modern networks. Downloading and processing all MRT files is inefficient and impractical.
BGP2GO solves this by allowing users to filter and select only MRT files that contain a specific resource (prefix, ASN, or community), avoiding unnecessary data downloads and processing. Instead of retrieving entire datasets, users query the database with a numeric identifier, and BGP2GO returns a list of relevant MRT file metadata—not the actual files. The provided MRT file information can then be used to construct URLs for direct downloads.
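As an illustration of the last step, the sketch below turns file metadata into download URLs. The `MrtFileInfo` record is hypothetical, since the actual fields returned by BGP2GO may differ. The archive base URL and directory layout are assumptions modeled on the public RouteViews archive and should be checked against the archive before use.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List

ARCHIVE = "https://archive.routeviews.org"  # assumed archive base URL


@dataclass
class MrtFileInfo:
    """Hypothetical metadata record; BGP2GO's real schema may differ."""
    collector: str      # e.g. "route-views.sydney"
    kind: str           # "updates" or "rib"
    start: datetime     # start time of the file (UTC)
    size_bytes: int     # size of the compressed file


def mrt_url(info: MrtFileInfo) -> str:
    """Build a download URL under the assumed archive directory layout."""
    subdir = "UPDATES" if info.kind == "updates" else "RIBS"
    name = f"{info.kind}.{info.start:%Y%m%d.%H%M}.bz2"
    return f"{ARCHIVE}/{info.collector}/bgpdata/{info.start:%Y.%m}/{subdir}/{name}"


def total_download_size(files: List[MrtFileInfo]) -> int:
    """Total size in bytes of the selected files."""
    return sum(f.size_bytes for f in files)
```

Summing the sizes before downloading lets a user judge whether a query is narrow enough to be practical.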
Key Features & Benefits:
- Enhances BGPStream’s bgpreader and other command-line tools by acting as a broker for archived MRT data.
- Easier and more transparent data acquisition, with a system query returning statistics on:
  - Number of matching MRT files
  - Total download size
  - Earliest and latest matching MRT files
  - Involved collectors (+ geolocation)
- Users can compile, share, and stream customized lists of relevant MRT files (e.g., using BGPStream).
- Indexed 13 years (12.7 TB) of historical RouteViews data into 316 SQL databases, with regular updates.
- Databases contain:
  - 38.5 million IP prefixes
  - 14.9 million IPv4 addresses
  - 23.5 million IPv6 addresses
  - 420 thousand BGP communities
  - 117 thousand AS numbers
In October 2024, we published a blog post “Streamlining Access to BGP Routing Data” informing users that they can now request access to the BGP2GO platform.
To gain access, users must:
- Create an account with the CAIDA Services Single Sign-On (SSO) system by providing basic information and completing authentication via Keycloak.
- Request access to BGP2GO by visiting https://bgp2go.caida.org/ and submitting the request form.
The blog post also highlighted a BGP2GO use case, showcasing how researchers can analyze prefix propagation using the PEERING Testbed, BGP2GO, and BGPStream. This example demonstrates BGP2GO’s ability to streamline access to routing data for advanced research and analysis.
BGPStream Service Broker Enhancements
We built new pybgpstream packages and updated the BGPStream Docker image to use Debian Bookworm. The required supporting packages have been rebuilt for Bullseye, Bookworm, and Jammy, and pushed to the CAIDA package repository.
Additionally, we built new packages for libbgpstream and its dependencies for Ubuntu 22.04 Jammy (as well as Debian Bullseye and Bookworm). We pushed these changes to the libbgpstream GitHub repository and to the CAIDA package repository.
RouteViews coordinated with CAIDA staff to update the dependencies in libbgpstream to ensure compatibility with current OS versions. In particular, they fixed the problem of multiple bgpstream warning messages appearing while uploading data, replaced thread functions deprecated on Ubuntu with their current equivalents, and installed the updated Ubuntu 22.04 Jammy bgpstream package.
RouteViews provided an updated list of collectors to CAIDA staff so BGPStream can use those new data sources.
Data Integrity and Quality Controls
Demonstrating use of BGP and traffic datasets to support security research
Before releasing the anonymized passive traces captured on the 100 Gbps link, we conducted an internal analysis to identify potential issues and verify data quality. We analyzed the first complete two-way passive trace from the 100 Gbps link in 2024, compared it with similar traces from 10 Gbps links in 2014 and 2019, and integrated the data with the corresponding BGP information.
Our analysis focused on two key trends:
- The concentration of Internet traffic across fewer networks
- The adoption of IPv6
We observed a significant increase in traffic concentration on Tier-1 backbone links, with the top source and destination ASes accounting for 26% and 40% of IPv4 traffic, respectively. The concentration was even more pronounced for IPv6 traffic, where over 90% of traffic originated from a single AS.
Despite this concentration, we found evidence of growing IPv6 adoption, with the number of ASes providing IPv6 transit increasing by approximately 25% over the past decade.

Top 10 IPv4 source ASes were responsible for 96.541.8% to 68.6% of total IPv4 traffic in the 100 Gbps traces
These results underscore the importance of combining traffic and topology data to address key questions about Internet evolution that cannot be fully understood using either dataset alone.
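The concentration figures reported above are shares of total traffic attributed to the largest ASes. The sketch below shows that computation, assuming packets have already been mapped to origin ASes (for example, by matching addresses against BGP prefixes) and aggregated into per-AS byte counts. The function name and input format are illustrative and do not describe the analysis pipeline actually used.

```python
from typing import Dict


def top_n_share(bytes_by_as: Dict[int, int], n: int = 1) -> float:
    """Fraction of total bytes carried by the n largest ASes."""
    total = sum(bytes_by_as.values())
    if total == 0:
        return 0.0
    top = sorted(bytes_by_as.values(), reverse=True)[:n]
    return sum(top) / total
```

Calling `top_n_share(counts, 1)` gives the share of the single largest AS, and `top_n_share(counts, 10)` gives the share of the top ten.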
BGP data compression and quality assurance
We analyzed the storage requirements for RIPE RIS and RouteViews BGP data and determined that the fourfold increase in data volume has rendered the current storage and processing mechanisms unsustainable.
To address this challenge, we dedicated significant effort, funded through our GMI3S project, to prototyping a fundamental reconceptualization of public BGP data collection architectures. This work led to the introduction of an overshoot-and-discard strategy, aimed at improving scalability and efficiency in handling ever-growing BGP data volumes.
This innovative approach aims to overcome operational scalability limitations of existing RIPE RIS and RouteViews systems. More details on this effort can be found in our paper, “The Next Generation of BGP Data Collection Platforms”.
Given the inherent data loss in the overshoot-and-discard approach, we simultaneously developed an alternative design that stores BGP data in a new, massively compressed yet lossless format. This effort required re-architecting the MRT data format, which has been in use for over 20 years. The first step involved:
- Thoroughly analyzing the existing MRT format
- Identifying its inefficiencies and shortcomings
- Evaluating its compression performance on real-world BGP data
As part of this initiative, we developed the CAIDA BGP MRT file explainer tool. This open-source tool:
- Fully parses MRT update files
- Detects and explains errors in file structure
- Includes a file-check mode to quickly identify corrupt MRT files
- Links errors to specific RFC sections, providing clear explanations of non-compliance with BGP or MRT standards
This tool is designed to improve the reliability and transparency of BGP data processing and is available in the CAIDA catalog under bgp-explain.
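To give a sense of what a structural file check involves, the sketch below walks the MRT common header defined in RFC 6396 (timestamp, type, subtype, length) and reports whether every record is complete. It is a simplified illustration and not the bgp-explain implementation: it does not parse the BGP messages inside each record, and it expects an already decompressed byte stream. The function names are illustrative.

```python
import struct
from typing import BinaryIO, Iterator, Tuple

# RFC 6396 common header: timestamp (32 bits), type (16), subtype (16), length (32).
MRT_HEADER = struct.Struct("!IHHI")


def iter_records(stream: BinaryIO) -> Iterator[Tuple[int, int, int, bytes]]:
    """Yield (timestamp, type, subtype, message) for each MRT record."""
    while True:
        header = stream.read(MRT_HEADER.size)
        if not header:
            return  # clean end of file
        if len(header) < MRT_HEADER.size:
            raise ValueError("truncated MRT header")
        timestamp, mrt_type, subtype, length = MRT_HEADER.unpack(header)
        message = stream.read(length)
        if len(message) < length:
            raise ValueError("record body shorter than its declared length")
        yield timestamp, mrt_type, subtype, message


def check_file(stream: BinaryIO) -> Tuple[bool, int]:
    """Return (is_well_formed, number_of_complete_records)."""
    count = 0
    try:
        for _ in iter_records(stream):
            count += 1
    except ValueError:
        return False, count
    return True, count
```

Compressed archive files can be passed in by opening them with `bz2.open(path, "rb")` or `gzip.open(path, "rb")` from the standard library.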
Meta-data analysis tool to study RouteViews data integrity
Through our GMI3S funding, we created a Peer Stats tool that the RouteViews team now uses for data quality control. The tool provides daily statistics on RouteViews collectors’ peers, enabling the team to identify noisy peers and contact the respective operators.
RouteViews Peer Stats powered by BGP2GO
RouteViews Infrastructure Updates
The University of Oregon RouteViews (RV) team has made significant improvements to enhance the reliability, security, and efficiency of its infrastructure. Since the beginning of this funding, RV has expanded its global reach by deploying new collectors in key locations, including Los Angeles (USA), Chile, Malaysia, Amsterdam, Iraq, Mexico, Hawaii, and Oregon. Several collectors in Singapore and Miami have been upgraded or replaced, and a new installation is underway in Seoul, South Korea.
To increase operational efficiency, RV transitioned most collectors from Quagga to FRR, improving stability and reducing disruptions. The team also optimized the peer reloading process, allowing updates without restarting services. Additionally, they streamlined the peering request process, introducing GitHub-based submissions and updating policy documentation. These efforts contributed to a major expansion in peering coverage, with over 100 new peers added.
In parallel, the team upgraded RouteViews’ live data feed systems, improving stability, reducing downtime, and replacing outdated network equipment for better connection speeds and overall performance. To strengthen security, they implemented new protective measures to prevent unauthorized access and cyber threats. Additionally, they introduced a more efficient storage and data management system, ensuring researchers can access BGP routing data faster and more reliably.
The RV team made the RouteViews map data pipeline more reliable by replacing Google Drive with RV-hosted resources to feed the mapping application.
The RouteViews team has released a beta version of their RouteViews API designed for network operators and researchers who require regular access to RouteViews data as part of their global routing system monitoring. Rather than replacing the MRT archive, the API serves as a complementary tool, providing faster and more convenient access to commonly queried routing information.
Traditionally, RouteViews collectors have provided command-line access, allowing operators to quickly check BGP announcements and reachability information. However, as the Internet continues to grow, along with the increasing size of both IPv4 and IPv6 routing tables, this direct access places a growing strain on RouteViews’ infrastructure.
To address this, the RouteViews API replaces automated command-line queries, offering a more efficient and scalable solution for frequent users. It supplements BGP UPDATES and RIB dumps stored in the RouteViews BGP data archive, ensuring that researchers and operators can retrieve critical routing data without overloading the system.
These enhancements collectively strengthen RouteViews’ role as a vital resource for the global networking and research community, supporting efforts to analyze, secure, and understand the evolving structure of the Internet.
Task 3: Outreach and Community Engagement
Catalog Management
We manually integrated RV BGP data into the CAIDA catalog.
We automated the process of indexing RouteViews external publications in the catalog and linking them to relevant catalog objects.
We indexed all external and CAIDA publications that use the CAIDA two-way anonymized passive trace datasets and the CAIDA BGP topology datasets, which are powered by RV BGP data.
RouteViews-Related User Training and Support
The NSRC RouteViews team organized numerous public presentations about RV data at venues such as the ThaiNOG meeting (part of the BKNIX Peering Forum) in Bangkok, the Pacific Network Operators Group meetings, mnNOG (Mongolia Network Operators Group), and NANOG in Seattle in 2023. The team also participated in peering forums and Research and Education Network meetings, including the GÉANT TNC meeting, the European Peering Forum, RIPE, APRICOT, and CENIC.
RouteViews Presentations and Network Security Discussions
The RouteViews team and collaborators gave numerous technical presentations on routing security and the RouteViews collection of network routing data. RouteViews has created a page listing presentations by University of Oregon personnel who have worked on the RouteViews project since its inception in 1997.
RouteViews and RPKI
The University of Oregon’s RouteViews system provides detailed public views of Internet routing data. As U.S. government agencies work more closely together to improve Internet routing security, RouteViews provides highly valuable raw BGP data and systems for detecting and limiting the impacts of Internet hijack attempts. Documentation of Resource Public Key Infrastructure (RPKI) deployment is curated in the RouteViews global database of collectors, enabling researchers and operators to assess global progress in RPKI adoption. The RPKI helps to create trust in reachability information by enabling cryptographically verifiable associations between specific IP address blocks, or autonomous system numbers (ASNs), and the operators of those Internet number resources. RouteViews is used as a primary data source to measure RPKI implementation globally. For more information, see: https://www.kentik.com/blog/rpki-rov-deployment-reaches-major-milestone/
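Measurements of RPKI deployment typically classify each observed BGP route by route origin validation, the procedure defined in RFC 6811. The sketch below implements that classification for a single route against a list of validated ROA payloads (VRPs). It is an illustration of the standard rule only, not code from RouteViews or from the linked analysis, and the `Vrp` type and function name are hypothetical.

```python
import ipaddress
from typing import Iterable, NamedTuple


class Vrp(NamedTuple):
    """Validated ROA payload: prefix, maximum length, authorized origin AS."""
    prefix: str
    max_length: int
    asn: int


def validate_origin(prefix: str, origin_asn: int, vrps: Iterable[Vrp]) -> str:
    """Classify a route as 'valid', 'invalid', or 'not-found' (RFC 6811)."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for vrp in vrps:
        roa_net = ipaddress.ip_network(vrp.prefix)
        # Skip VRPs of the other address family or that do not cover the route.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if route.prefixlen <= vrp.max_length and vrp.asn == origin_asn:
            return "valid"
    return "invalid" if covered else "not-found"
```

A route is valid when at least one covering VRP matches both its origin AS and its prefix length, invalid when covering VRPs exist but none match, and not-found when no VRP covers it.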
RouteViews and Internet Routing Security
Robert Cannon, Senior Telecommunications Policy Analyst at the National Telecommunications and Information Administration, U.S. Department of Commerce, emphasized the critical role of RouteViews in federal cybersecurity and policy efforts:
“RouteViews is a valuable resource for the federal government as it helps policy decision makers, government researchers, and network operators make more informed and secure routing decisions. The RouteViews data is relied upon in NIST routing security research and the NIST RPKI Monitor, the National Cybersecurity Strategy Initiative 4.1.5 Advancing Routing Security, and the Federal Communication Commission’s review of routing security. The project is critical to developing effective federal policy to address the ‘pervasive concern’ of routing security.”
Recently, a group of federal government agencies, including the Office of the National Cyber Director, reached out to NSRC about RouteViews and its importance for federal initiatives to improve Internet routing security.
MANRS Compliance Support
NSRC taught Internet routing security practices and key principles for years before the Internet Society (ISOC) initiated the Mutually Agreed Norms for Routing Security (MANRS) program, so partnering with ISOC personnel to emphasize MANRS compliance as a goal of improved network operations was a natural evolution. For example, direct engineering assistance at the University of Guam (UoG) covered routing security, including Resource Public Key Infrastructure (RPKI) deployment and other best practices, to finalize MANRS compliance for the UoG campus network during this reporting period, as documented in the UoG blog entry “UOG participates in a global initiative to strengthen routing security”.
For additional information, see also a previous blog post, “Partnering with NSRC on MANRS & Routing Security Training” by Megan Kruse, which highlights the use of the Oregon RouteViews BGP monitoring infrastructure to improve the MANRS Observatory data sources, data collection, and verification.
Community Workshops
Since the launch of ILANDS funding, CAIDA has hosted four in-person workshops, providing a platform for in-depth discussions on ILANDS progress and future directions. While these workshops were funded by GMI3S (OAC-2131987), they played a crucial role in advancing ILANDS-related research and collaboration.
The most recent and largest workshop, with 82 participants, was held at UCSD from February 10–14, 2025. Leading up to the event, we also organized a Hackathon to foster hands-on engagement.
Key highlights included a dedicated day for BGP data discussions, another for Internet traffic research, and a tutorial on UCSD NT Telescope traffic analysis led by CAIDA research scientist Ricky Mok, which demonstrated the use of ACCESS-CI supercomputing resources. This methodology also applies to two-way passive trace analysis. To further lower the barrier to analyzing 100 Gbps passive traces, we plan to develop a similar data and analysis pipeline, along with comprehensive tutorials.
Publications and Presentations
CAIDA researchers have published 18 peer-reviewed publications, all indexed in the CAIDA catalog: ILANDS publications
CAIDA researchers have given 26 presentations, indexed in the CAIDA catalog: ILANDS presentations
The RouteViews team made 13 presentations: ILANDS presentations made by RouteViews