CAIDA Data - Overview of Datasets, Monitors, and Reports

CAIDA collects several different types of data at geographically and topologically diverse locations, and makes this data available to the research community to the extent possible while preserving the privacy of individuals and organizations who donate data or network access.

This page provides a quick-access overview of available datasets (publicly available or otherwise restricted), with links to the dataset descriptions and access request forms when applicable.


CAIDA Topology Datasets (14)

Internet topology describes the arrangement and interconnectedness of devices within the autonomous systems (ASes) of the Internet at a large scale. Internet topology maps are an important tool for those who seek to describe, analyze, or model various aspects of the Internet's structure, behavior, and evolution.

Ark IPv4 Routed /24 DNS Names
The IPv4 Routed /24 DNS Names Dataset provides fully-qualified domain names for IP addresses seen in the traces of the IPv4 Routed /24 Topology Dataset.

start

2008-03-01

end

ongoing

Ark IPv4 prefix-probing
This dataset results from traceroute-based measurements running on the Archipelago (Ark) measurement infrastructure.

start

2015-12-08

end

ongoing

Ark IPv4 Routed /24 Topology
The IPv4 Routed /24 Topology dataset contains all the Ark IPv4 team-probing data collected by a globally distributed set of Archipelago (Ark) monitors. These data contain information useful for studying the IP and AS topology of the IPv4 Internet.

start

2007-09-13

end

ongoing

Ark IPv6 Topology
These are all the Ark IPv6 probing data, collected by a globally distributed set of IPv6-enabled Archipelago (Ark) monitors. These data contain information useful for studying the IP and AS topology of the IPv6 Internet.

start

2008-12-12

end

ongoing

ITDK: Internet Topology Data Kit
Ark-based macroscopic Internet Topology Data Kits (ITDK)

start

2010-01

end

2024-02

AS Classification
Autonomous Systems (ASes) classified by their inferred business type.

start

2015-08-01

end

ongoing

AS to organizations mappings
Contains AS-to-organization mappings derived from the quarterly WHOIS dumps.

start

2004-04-07

end

ongoing

AS Rank
AS Rank is CAIDA's ranking of Autonomous Systems (ASes), which approximately map to Internet Service Providers (ISPs), and organizations (Orgs), which are collections of one or more ASes. This ranking is derived from topological data collected by CAIDA's Archipelago Measurement Infrastructure and Border Gateway Protocol (BGP) routing data collected by the Route Views Project and RIPE NCC.
ASes and Orgs are ranked by their customer cone size, which is the number of their direct and indirect customers.
Note: We do not have data to rank ASes (ISPs) by traffic, revenue, users, or any other non-topological metric.

start

2011-11-01

end

2024-12
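
The customer cone ranking described for AS Rank can be illustrated with a small sketch. It assumes the input is already a list of (provider, customer) pairs and counts every AS reachable by following provider-to-customer links, excluding the AS itself. The function name and the toy AS numbers are hypothetical, and CAIDA's actual inference is more involved than this.

```python
def customer_cones(provider_customer_links):
    """Map each AS to the set of its direct and indirect customers."""
    customers = {}
    ases = set()
    for provider, customer in provider_customer_links:
        customers.setdefault(provider, set()).add(customer)
        ases.update((provider, customer))

    cones = {}
    for asn in ases:
        seen = set()
        stack = list(customers.get(asn, ()))
        # Depth-first walk down provider-to-customer links.
        while stack:
            current = stack.pop()
            if current == asn or current in seen:
                continue  # guards against cycles in inferred data
            seen.add(current)
            stack.extend(customers.get(current, ()))
        cones[asn] = seen
    return cones


# Toy topology (hypothetical AS numbers): 1 serves 2 and 3, both serve 4, 4 serves 5.
links = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
cones = customer_cones(links)

# Rank by descending cone size, breaking ties by AS number.
ranking = sorted(cones, key=lambda asn: (-len(cones[asn]), asn))
```

In this toy topology AS 1 ranks first because its cone covers all four other ASes, even though it has only two direct customers.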

AS Relationships (serial-1)
Contains AS links annotated with inferred relationships. Each file contains a full AS graph derived from a set of RouteViews BGP table snapshots. Served online in the public AS Relationships dataset. Online since 5 November 2013. Also see: as-relationships-as-relationships-pre-201206, as-relationships-as-relationships-201206-201311, as-relationships-as-relationships-serial2

start

1998-01-01

end

ongoing
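
As a rough illustration of working with annotated AS links, the sketch below parses relationship lines into provider-to-customer and peer-to-peer lists. It assumes a pipe-separated layout of `<AS1>|<AS2>|<relationship>`, with -1 meaning AS1 is a provider of AS2, 0 meaning the two are peers, and `#` starting a comment line. Check the dataset's own README before relying on that layout. The function name and sample lines are hypothetical and use documentation-range AS numbers.

```python
def parse_as_relationships(lines):
    """Split annotated AS links into provider-customer and peer lists."""
    provider_customer = []
    peers = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("|")
        as1, as2, relationship = int(fields[0]), int(fields[1]), int(fields[2])
        if relationship == -1:
            provider_customer.append((as1, as2))
        elif relationship == 0:
            peers.append((as1, as2))
    return provider_customer, peers


# Hypothetical sample, not real data.
sample = [
    "# AS1|AS2|relationship",
    "64496|64497|-1",
    "64496|64498|0",
    "64497|64499|-1",
]
p2c, p2p = parse_as_relationships(sample)
```

Only the first three fields are read, so lines that carry extra trailing fields would still parse under the same assumption.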

DZDB
DZDB is CAIDA's database of TLD zone file data, which tracks the history of a zone's domain, nameserver, and IP records.

start

2011-04

end

ongoing

Hoiho - Holistic Orthography of Internet Hostname Observations
Our system, Hoiho, released as open-source as part of scamper, uses CAIDA's Macroscopic Internet Topology Data Kit (ITDK) and observed round trip times to infer regular expressions that extract apparent geolocation hints from hostnames. The ITDK contains a large dataset of routers with annotated hostnames, which we used as input to Hoiho for it to infer rules (encoded as regular expressions) that extract these annotations. CAIDA has released these inferred rulesets in recent ITDKs.

start

2023-02-13

end

2024-12
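
To make the idea of a rule that extracts a geolocation hint from a hostname concrete, here is a minimal sketch. The regular expression, the `example.net` hostnames, and the hint table are all hypothetical and are not taken from any released Hoiho ruleset. The sketch also leaves out the round-trip-time checks the description mentions.

```python
import re

# Hypothetical rule for one hypothetical naming scheme:
# <interface>.<role><n>.<three-letter site code><n>.example.net
RULE = re.compile(r"^[^.]+\.[a-z]+\d+\.([a-z]{3})\d+\.example\.net$")

# Hypothetical mapping from extracted hints to locations.
HINTS = {"lax": "Los Angeles, US", "fra": "Frankfurt, DE"}


def extract_hint(hostname):
    """Return the apparent geolocation hint in a hostname, or None."""
    match = RULE.match(hostname.lower())
    return match.group(1) if match else None


hint = extract_hint("xe-0-1-0.cr1.lax2.example.net")
location = HINTS.get(hint)
```

A hostname that does not follow the naming scheme, such as `www.example.net`, yields no hint.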

Ark IPv6 Topology Dataset
The IPv6 Topology Dataset contains information useful for studying the IP- and AS-topology of the IPv6 Internet. The focus of this measurement is on discovering topology and not on finding responding destinations.

start

2008-12-12

end

ongoing

PEERINGDB ARCHIVE
CAIDA manages the only repository of daily snapshots of historic PeeringDB data. The repository consists of two parts, version 1 and version 2. Daily snapshots of version 1 are available as SQL and SQLite files covering July 29, 2010, to March 13, 2016. In 2016 PeeringDB switched to a new format. These new version 2 data are available as SQL files from May 27, 2016, to March 10, 2018, and as JSON files from March 11, 2018, onwards.

PeeringDB is an online database of peering policies, traffic volumes, and geographic presence of participating networks. PeeringDB, a non-profit member-based organization, was established to support the practical needs of network operators. However, it is also a valuable source of information for researchers. The first version of PeeringDB resided in a MySQL database, which was not scalable and lacked security features and data validation mechanisms. It presented potential risks of exposing contact information to spammers, and contained typos. Starting at the end of March 2016, PeeringDB switched to a new data schema and API.

start

2010-07-29

end

ongoing
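
The date ranges above determine which snapshot version and file format to expect for a given day. The sketch below encodes only those ranges. The function name and return shape are hypothetical, and it says nothing about how files are actually named or laid out in the archive.

```python
from datetime import date


def peeringdb_snapshot_formats(day):
    """Return (version, formats) listed for a snapshot date, or None."""
    if date(2010, 7, 29) <= day <= date(2016, 3, 13):
        return ("v1", ["sql", "sqlite"])
    if date(2016, 5, 27) <= day <= date(2018, 3, 10):
        return ("v2", ["sql"])
    if day >= date(2018, 3, 11):
        return ("v2", ["json"])
    # Before the archive starts, or in the gap between v1 and v2 coverage.
    return None


formats_2012 = peeringdb_snapshot_formats(date(2012, 1, 1))
formats_gap = peeringdb_snapshot_formats(date(2016, 4, 15))
```

Dates between March 14, 2016, and May 26, 2016, fall outside both stated ranges, so the sketch returns `None` for them rather than guessing.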

RouteViews Prefix to AS mappings
Contains IPv4 and IPv6 Prefix to Autonomous System (AS) mappings derived from RouteViews data (https://www.routeviews.org).

start

2005-05-21

end

ongoing
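
A common use of prefix-to-AS mappings is finding the origin AS for an IP address by longest-prefix match. The sketch below assumes each line holds three tab-separated fields (prefix, prefix length, origin AS) and keeps the origin as a string, since a prefix may map to more than one AS. Confirm the layout against the dataset documentation. The function names and sample rows are hypothetical.

```python
import ipaddress


def load_pfx2as(lines):
    """Parse 'prefix<TAB>length<TAB>origin' lines into (network, origin) pairs."""
    table = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        prefix, length, origin = line.split("\t")
        table.append((ipaddress.ip_network(f"{prefix}/{length}"), origin))
    return table


def lookup(table, address):
    """Return the origin of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(address)
    best = None
    for network, origin in table:
        if network.version != addr.version or addr not in network:
            continue
        if best is None or network.prefixlen > best[0].prefixlen:
            best = (network, origin)
    return best[1] if best else None


# Hypothetical rows, not real routing data.
sample = [
    "198.51.0.0\t16\t64497",
    "198.51.100.0\t24\t64496",
    "2001:db8::\t32\t64498",
]
table = load_pfx2as(sample)
origin = lookup(table, "198.51.100.7")
```

The linear scan keeps the example short. For a full routing table a radix tree or similar structure would be a more practical choice.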

CAIDA Geolocation Datasets (3)

Geolocation is the process of determining the physical location of a device connected to the internet, typically through IP addresses.

Cloud Prefixes (Geolocation)
This dataset provides a list of IP prefixes for each of the following cloud providers: AWS, Azure, Cloudflare, DigitalOcean, Fastly, GCP, IBM, OCI. Each IP prefix may be associated with a specific region or service label, providing insights into the geographical distribution or service classification of the IP addresses.

start

2024-06-04

end

ongoing

Hoiho - Holistic Orthography of Internet Hostname Observations
Our system, Hoiho, released as open-source as part of scamper, uses CAIDA's Macroscopic Internet Topology Data Kit (ITDK) and observed round trip times to infer regular expressions that extract apparent geolocation hints from hostnames. The ITDK contains a large dataset of routers with annotated hostnames, which we used as input to Hoiho for it to infer rules (encoded as regular expressions) that extract these annotations. CAIDA has released these inferred rulesets in recent ITDKs.

start

2023-02-13

end

2024-12

Internet eXchange Points Dataset
Provides information about IXPs and their geographic locations, facilities, prefixes, and member ASes. Derived by combining information from PeeringDB, Hurricane Electric, Packet Clearing House, Wikipedia, BGP Looking Glass, and GeoNames.

start

2018-01-01

end

ongoing

CAIDA Infrastructure Datasets (5)

Network Infrastructure refers to the underlying physical and virtual components that enable network connectivity and allow communication between users, devices, and the Internet.

AS Classification
Autonomous Systems (ASes) classified by their inferred business type.

start

2015-08-01

end

ongoing

AS to organizations mappings
Contains AS-to-organization mappings derived from the quarterly WHOIS dumps.

start

2004-04-07

end

ongoing

Internet eXchange Points Dataset
Provides information about IXPs and their geographic locations, facilities, prefixes, and member ASes. Derived by combining information from PeeringDB, Hurricane Electric, Packet Clearing House, Wikipedia, BGP Looking Glass, and GeoNames.

start

2018-01-01

end

ongoing

PEERINGDB ARCHIVE
CAIDA manages the only repository of daily snapshots of historic PeeringDB data. The repository consists of two parts, version 1 and version 2. Daily snapshots of version 1 are available as SQL and SQLite files covering July 29, 2010, to March 13, 2016. In 2016 PeeringDB switched to a new format. These new version 2 data are available as SQL files from May 27, 2016, to March 10, 2018, and as JSON files from March 11, 2018, onwards.

PeeringDB is an online database of peering policies, traffic volumes, and geographic presence of participating networks. PeeringDB, a non-profit member-based organization, was established to support the practical needs of network operators. However, it is also a valuable source of information for researchers. The first version of PeeringDB resided in a MySQL database, which was not scalable and lacked security features and data validation mechanisms. It presented potential risks of exposing contact information to spammers, and contained typos. Starting at the end of March 2016, PeeringDB switched to a new data schema and API.

start

2010-07-29

end

ongoing

RouteViews Prefix to AS mappings
Contains IPv4 and IPv6 Prefix to Autonomous System (AS) mappings derived from RouteViews data (https://www.routeviews.org).

start

2005-05-21

end

ongoing

CAIDA DNS Datasets (3)

DNS (Domain Name System) is the system which translates human-readable web addresses to computer-readable IP addresses, enabling the proper routing of internet traffic. DNS names are useful for obtaining additional information about routers and hosts making up the Internet topology.

Ark IPv4 Routed /24 DNS Names
The IPv4 Routed /24 DNS Names Dataset provides fully-qualified domain names for IP addresses seen in the traces of the IPv4 Routed /24 Topology Dataset.

start

2008-03-01

end

ongoing

Ark IPv6 Topology DNS Names
The IPv6 DNS Names Dataset provides fully-qualified domain names for IP addresses seen in the traces of the IPv6 Topology Dataset.

start

2014-05-30

end

ongoing

DZDB
DZDB is CAIDA's database of TLD zone file data, which tracks the history of a zone's domain, nameserver, and IP records.

start

2011-04

end

ongoing

CAIDA Telescope Datasets (5)

A network telescope is a system that monitors a large number of IP addresses in order to identify potentially malicious activity on the Internet.

UCSD Network Telescope Aggregated Flow Dataset
Archival aggregated flowtuple Telescope data in Corsaro format

start

2003-11-06

end

ongoing

Aggregated Daily RSDoS Attack Metadata
Consists of daily files of unsolicited traffic captured in UCSD Network Telescope traces and aggregated into the Avro format. The data must be analyzed on CAIDA machines. The entire dataset is stored in Swift.

start

2008-01-01

end

ongoing

UCSD Real-time Network Telescope
Keeps track of the live pcap telescope data available on the telescope data server (currently thor). The on-disk data cover only the nominal 60-day window we keep on disk. Files that roll out of the window are still archived to NERSC. To get accurate information about file counts and size (since 2013-07-01), check the history for this fileset, or the history of telescope-nersc.

start

2020-04-26

end

ongoing

Telescope nDAG Live
A live feed of the traffic observed at the UCSD network telescope. DAG refers to the DAG hardware capture card that is used to capture the telescope traffic and the 'n' stands for 'network' (to reflect that we are taking a DAG capture and exporting it directly to multiple users via a network).

For more information, reference https://stardust.caida.org/docs/data/ndag/

start

end

ongoing

UCSD Telescope data at NERSC
Keeps track of all pcap files currently archived in the NERSC HPSS tape archive from the file telescope-nersc.history. That file contains names and compressed file sizes for all pcap files stored at NERSC.

start

2003-11-06

end

ongoing

Other organizations across the globe, both academic and non-academic, provide access to internet-related data that also are of interest to the research community. Links to several of the more interesting datasets are given in a non-exhaustive list of external (non-CAIDA) data.

We maintain a list of publications from research using CAIDA data. The purpose of this list is to provide insight into past uses of CAIDA data. We rely on researchers who download our data to comply with the Acceptable Use Policies of CAIDA datasets in reporting published papers and presentations to us. See this list of Non-CAIDA Publications using CAIDA Data.

Report a Publication using CAIDA data

Users of CAIDA data agree, under an Acceptable Use Agreement (AUA), to notify CAIDA when they publish work that uses CAIDA data. Report a publication using CAIDA data by completing the form or emailing data-info@caida.org.

Additional Information

Read more about CAIDA's efforts in data curation and promoting data sharing.

To keep up to date on CAIDA datasets you can subscribe to data-announce@caida.org. For other questions about CAIDA data, please contact data-info@caida.org. For more information about using CAIDA data, please see the CAIDA Data Usage FAQ.

Other Datasets

To see a list of all datasets by CAIDA, please visit the catalog page. You can also see a list of all completed datasets by CAIDA here.

Related Objects

See https://catalog.caida.org/search?query=types%3Ddataset%20links%3Dtag%3Acaida%20status%3Dongoing to explore related objects to this document in the CAIDA Resource Catalog.