The CAIDA Anonymized Internet Traces Dataset (April 2008 - January 2019)

CAIDA's passive traces dataset contains traces collected from high-speed monitors on a commercial backbone link. The data collection started in April 2008 and ended in January 2019. These data are useful for research on the characteristics of Internet traffic, including application breakdown, security events, geographic and topological distribution, flow volume and duration. For an overview of all traces, see the trace statistics page.

This dataset contains anonymized passive traffic traces from several of CAIDA's monitors on high-speed Internet backbone links.
Data for 2008 - 2014 contain anonymized passive traffic traces from CAIDA's equinix-chicago and equinix-sanjose monitors on high-speed Internet backbone links.
Data for 2015 - 2016 contain anonymized passive traffic traces from CAIDA's equinix-chicago monitor.
Starting in 2018, the data contain anonymized passive traffic traces from CAIDA's equinix-nyc monitor.
The first traffic trace available is a one-hour traffic trace collected during the DITL 2008 measurement event. This trace contains anonymized packet headers in pcap format on a single direction of the bidirectional OC192 link at equinix-chicago from approximately 2008-03-19 19:00 to 20:00 UTC. The hardware monitoring the other direction of the link was not functioning properly at the time of the traffic capture, so only data for a single direction was captured.

For the equinix-chicago monitor, the first monthly bidirectional traffic trace was taken on April 30 2008, and added to the Anonymized 2008 Internet Trace dataset in June 2008. This one-hour trace resulted in 83 GB of compressed pcap files. The first monthly bidirectional traffic trace from the equinix-sanjose monitor was taken on July 17 2008.
Starting with the 2014 dataset, the yearly passive trace datasets contain only one trace per quarter (previous years contain one trace per month). While we still collect a one-hour trace each month (and add statistics about each trace to the trace statistics page), storage limitations force us to select only one of the three traces in each quarter for inclusion in the yearly collection.

Traffic traces in this dataset are anonymized using CryptoPan prefix-preserving anonymization. The anonymization key changes annually and is the same for all traces recorded during the same calendar year. During capture, packets are truncated at a snap length selected to avoid excessive packet loss due to disk I/O overload. The snap length has historically varied from 64 to 96 bytes. In addition, payload is removed from all packets: only header information up to layer 4 (transport layer) remains.
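To illustrate what "prefix-preserving" means in practice, the following is a minimal sketch, not CAIDA's implementation. It follows the general bit-by-bit construction used by prefix-preserving schemes (each output bit is the input bit XORed with a keyed pseudorandom function of the preceding bits), but it swaps in HMAC-SHA256 from the Python standard library where CryptoPan uses an AES-based function, so its output will not match addresses in the actual traces. The function names and the example key are assumptions made for illustration.

```python
import hashlib
import hmac
import ipaddress


def prf_bit(key: bytes, prefix_bits: str) -> int:
    """Return one pseudorandom bit that depends only on the key and the prefix.

    Assumption: HMAC-SHA256 stands in for the AES-based function of CryptoPan.
    """
    digest = hmac.new(key, prefix_bits.encode("ascii"), hashlib.sha256).digest()
    return digest[0] >> 7  # most significant bit of the first digest byte


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Toy prefix-preserving anonymization of a dotted-quad IPv4 address."""
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = 0
    for i in range(32):
        # Bit i is flipped (or not) based only on the i bits before it, so two
        # addresses that agree on their first k bits get the same first k bits.
        out = (out << 1) | (int(bits[i]) ^ prf_bit(key, bits[:i]))
    return str(ipaddress.IPv4Address(out))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two IPv4 addresses."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


if __name__ == "__main__":
    key = b"hypothetical-key"  # illustrative only, unrelated to any real key
    x = anonymize_ipv4("192.0.2.1", key)
    y = anonymize_ipv4("192.0.2.200", key)
    # The two inputs share a 24-bit prefix, and so do the two outputs.
    print(common_prefix_len("192.0.2.1", "192.0.2.200"), common_prefix_len(x, y))
```

Because the mapping depends on the key, addresses anonymized under different keys are not comparable, which is why it matters that the key is the same within a calendar year and changes between years.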
The Endace network cards used to record these traces provide timestamps with nanosecond precision. However, the anonymized traces are stored in pcap format with timestamps truncated to microseconds. Starting with the 2010 traces, the original nanosecond timestamps are provided as separate ASCII files alongside the pcap files.

The traces can be read with any software that reads the pcap (tcpdump) format, including the CoralReef Software Suite, tcpdump, Wireshark, and many others.
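For readers who want to inspect a trace without installing one of these tools, the sketch below parses the classic pcap file layout (a 24-byte global header followed by 16-byte per-packet record headers) using only the Python standard library. It reports the snap length and yields the microsecond timestamps and the captured and original packet lengths. The function names and the synthetic sample are assumptions for illustration. The sketch does not decompress files or decode packet headers, and the link-layer type written into the sample is an arbitrary placeholder, not a statement about the CAIDA traces.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple


def read_global_header(stream: BinaryIO) -> Tuple[str, int, int]:
    """Read the 24-byte pcap global header; return (endianness, snaplen, linktype)."""
    raw = stream.read(24)
    if len(raw) != 24:
        raise ValueError("file too short for a pcap global header")
    magic = struct.unpack("<I", raw[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"  # file written in little-endian byte order
    elif magic == 0xD4C3B2A1:
        endian = ">"  # file written in big-endian byte order
    else:
        raise ValueError("not a microsecond-resolution classic pcap file")
    # version major, version minor, thiszone, sigfigs, snaplen, linktype
    _, _, _, _, snaplen, linktype = struct.unpack(endian + "HHiIII", raw[4:])
    return endian, snaplen, linktype


def iter_records(stream: BinaryIO, endian: str) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (ts_sec, ts_usec, captured_len, original_len) for each packet record."""
    while True:
        header = stream.read(16)
        if len(header) < 16:
            return  # end of file (or a truncated trailing record header)
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(endian + "IIII", header)
        if len(stream.read(incl_len)) < incl_len:
            return  # truncated packet data
        yield ts_sec, ts_usec, incl_len, orig_len


def sample_pcap() -> bytes:
    """Build a tiny synthetic pcap file in memory (hypothetical values)."""
    global_header = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 96, 1)
    record = struct.pack("<IIII", 1205953200, 123456, 4, 60) + b"\x00" * 4
    return global_header + record


if __name__ == "__main__":
    stream = io.BytesIO(sample_pcap())
    endian, snaplen, linktype = read_global_header(stream)
    print("snap length:", snaplen)
    for ts_sec, ts_usec, incl_len, orig_len in iter_records(stream, endian):
        print(ts_sec, ts_usec, incl_len, orig_len)
```

In a truncated capture, the captured length of a packet can be smaller than its original length, so the second length field is the one to use when estimating traffic volume.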

We are aware that the 2008 data contain more than trivial amounts of packet loss; this has especially been an issue for equinix-chicago direction B. In addition, because of the way the monitoring equipment is set up, we do not know how well synchronized the two directions of a single link are.

The related IPv6 Day and World IPv6 Launch Day Dataset contains three anonymized passive traffic traces from CAIDA's equinix-chicago and equinix-sanjose monitors on high-speed Internet backbone links taken during IPv6 Day on 8 June 2011, and three additional traces from the equinix-sanjose monitor taken during IPv6 Launch Day on 6 June 2012 (the Chicago monitors were offline on IPv6 Launch Day). The traces cover the start, middle and end of the 24-hour IPv6 Day and IPv6 Launch Day periods. The first IPv6 Day trace runs from 7 June 2011 23:45:00 UTC to 8 June 2011 00:45:00 UTC, the second from 8 June 2011 13:00:00 UTC to 14:00:00 UTC, and the third from 8 June 2011 23:45:00 UTC to 9 June 2011 00:45:00 UTC. The three traces on IPv6 Launch Day run from 5 June 2012 23:45:00 UTC to 6 June 2012 00:45:00 UTC, from 6 June 2012 13:00:00 UTC to 6 June 2012 14:00:00 UTC, and from 6 June 2012 23:45:00 UTC to 7 June 2012 00:45:00 UTC, respectively. Traffic traces in this dataset are anonymized using CryptoPan prefix-preserving anonymization. All traces in this dataset are anonymized with the same key. In addition, the payload has been removed from all packets. The three IPv6 Day traces are 194 GB, 172 GB and 198 GB in size, respectively. The three IPv6 Launch Day traces are 177 GB, 121 GB and 171 GB.

Acceptable Use Agreement

Access to these data is subject to the terms of the following CAIDA Acceptable Use Agreement

When referencing this data (as required by the AUA), please use:

The CAIDA UCSD Anonymized Internet Traces - <dates used>
https://www.caida.org/catalog/datasets/passive_dataset
You are required to report your publications using this dataset to CAIDA.

Request Data Access

Request Access to the CAIDA Anonymized Internet Traces Dataset and other Anonymized Internet Traces Datasets

Anonymized Internet Traces Datasets

Restricted Datasets (available through CAIDA)

Publicly Available Datasets

The Data Collection Monitors

Related Objects

See https://catalog.caida.org/dataset/passive_merged_pcap to explore related objects to this document in the CAIDA Resource Catalog.