Center for Applied Internet Data Analysis
UCSD Network Telescope Aggregated Flow Dataset
This dataset consists of hourly files of unsolicited traffic captured by the UCSD Network Telescope and aggregated into the FlowTuple format.

Data Description

The UCSD Network Telescope consists of a globally routed but lightly utilized /8 network prefix, that is, 1/256th of the whole IPv4 address space. It contains few legitimate hosts; inbound traffic to non-existent machines, so-called Internet Background Radiation (IBR), is unsolicited and results from a wide range of events, including misconfiguration (e.g., mistyping an IP address), scanning of address space by attackers or malware looking for vulnerable targets, backscatter from denial-of-service attacks that use randomly spoofed source addresses, and the automated spread of malware. CAIDA continuously captures this anomalous traffic, discarding the legitimate traffic packets destined to the few reachable IP addresses in this prefix. We archive and aggregate these data and provide this valuable resource to network security researchers.
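
The 1/256 figure follows directly from the prefix length: a /8 fixes 8 of the 32 address bits and leaves 24 free. A minimal Python sketch of that arithmetic, using the placeholder prefix 10.0.0.0/8 (an assumption for illustration only, not the telescope's actual prefix):

```python
import ipaddress
from fractions import Fraction

# Placeholder /8 for illustration only; NOT the telescope's real prefix.
telescope = ipaddress.ip_network("10.0.0.0/8")
ipv4_space = ipaddress.ip_network("0.0.0.0/0")

# Share of the whole IPv4 address space covered by any /8 prefix.
fraction = Fraction(telescope.num_addresses, ipv4_space.num_addresses)

print(telescope.num_addresses)  # 16777216, i.e. 2**24 addresses
print(fraction)                 # 1/256
```

If an attacker chose spoofed source addresses uniformly at random (an idealized assumption), the same fraction would be the chance that any single backscatter packet is addressed to the telescope.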

Raw data captured by the UCSD Network Telescope are stored in large pcap files, each containing 1 hour of data. To enable more efficient data storage, processing, and analysis, these hourly pcap files are post-processed using the Corsaro software to extract the most important packet header fields and aggregate the data into FlowTuple files. The FlowTuple format includes the following eight fields:

source IP address; destination IP address; source port; destination port; protocol; TCP Flags; TTL; IP length.
In the hourly FlowTuple output files, the data are broken into 60-second intervals. Within a given interval, each unique key (unique combination of the FlowTuple fields) observed in the raw pcap data is reported on a separate line in the following format:

<src_ip>|<dst_ip>|<src_port>|<dst_port>|<protocol>|<ttl>|<tcp_flags>|<ip_len>,value

where "value" is the number of packets in this interval whose header fields match this FlowTuple key.
Flows are further subdivided into three FlowTuple classes: backscatter, ICMP request, and "other" (neither backscatter nor ICMP request), and the total number of flows in each class is recorded.
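
Because each line carries the packet count for one key, totals are obtained by summing those counts. A minimal Python sketch, assuming the line layout described above (eight key fields separated by "|", then a comma and the packet count). The function names and sample records are hypothetical, and the sample addresses come from ranges reserved for documentation, not from telescope data:

```python
from collections import Counter


def split_record(line: str) -> tuple[list[str], int]:
    """Split one FlowTuple line into its eight key fields and packet count."""
    key, value = line.strip().rsplit(",", 1)
    fields = key.split("|")
    if len(fields) != 8:
        raise ValueError(f"expected 8 key fields, got {len(fields)}: {line!r}")
    return fields, int(value)


def packets_per_protocol(lines) -> Counter:
    """Sum packet counts by IP protocol number (the fifth key field)."""
    totals = Counter()
    for line in lines:
        fields, packets = split_record(line)
        totals[int(fields[4])] += packets
    return totals


# Made-up records for illustration (addresses from RFC 5737 ranges).
sample = [
    "192.0.2.1|198.51.100.7|8|0|1|1|0x00|44,3",
    "192.0.2.9|198.51.100.8|8|0|1|1|0x00|48,2",
    "203.0.113.5|198.51.100.9|0|0|4|63|0x00|256,6",
]
totals = packets_per_protocol(sample)
for protocol, packets in sorted(totals.items()):
    print(protocol, packets)
```

The sketch reads only the first five key fields and the packet count, so it does not depend on how the remaining header fields are interpreted.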

One can use the "cors2ascii" command to display the FlowTuple output in a human-readable ASCII format:

$ cors2ascii example.flowtuple.cors.gz
# CORSARO_INTERVAL_START 0 1527811200
START flowtuple_backscatter 1012004
xxx.181.55.12|xx.68.45.71|3|3|1|19|0x00|100,20
xxx.85.229.197|xx.4.59.12|3|1|1|19|0x00|159,1
......
END flowtuple_backscatter
START flowtuple_icmpreq 314399
xxx.136.34.2|xx.128.122.196|8|0|1|1|0x00|44,3
xxx.52.83.2|xx.165.41.187|8|0|1|1|0x00|48,2
......
END flowtuple_icmpreq
START flowtuple_other 17964239
xxx.228.34.84|xx.151.31.6|0|0|4|63|0x00|256,6
xxx.228.34.84|xx.151.31.6|0|0|4|63|0x00|512,140
......
END flowtuple_other
# CORSARO_INTERVAL_END 0 1527811259
# CORSARO_INTERVAL_START 1 1527811260
START flowtuple_backscatter 991397
.........


Each hourly FlowTuple file contains 60 intervals (0-59), each identified by its start and end Unix timestamps. More information about the structure of the Corsaro global output file can be found on the Corsaro Documentation page. This ongoing dataset, covering the period from February 2008 to the present, is stored locally at CAIDA.
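
The interval and class markers in the cors2ascii output make it straightforward to walk a file block by block. The following Python sketch is one possible way to do so, not part of Corsaro: the function name read_intervals is hypothetical, the sample records are made up, and the code assumes that the number on each START line is the count of records in that class, as the sample output suggests.

```python
from datetime import datetime, timezone


def read_intervals(lines):
    """Yield (interval, start_ts, flow_class, declared, records) per class block."""
    interval = start_ts = flow_class = declared = None
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Interval marker, e.g. "# CORSARO_INTERVAL_START 0 1527811200".
            parts = line.lstrip("# ").split()
            if len(parts) == 3 and parts[0] == "CORSARO_INTERVAL_START":
                interval, start_ts = int(parts[1]), int(parts[2])
        elif line.startswith("START "):
            _, flow_class, count = line.split()
            declared, records = int(count), []
        elif line.startswith("END "):
            yield interval, start_ts, flow_class, declared, records
            flow_class = None
        elif flow_class is not None:
            records.append(line)


# Made-up sample in the cors2ascii layout (addresses from RFC 5737 ranges).
sample = """\
# CORSARO_INTERVAL_START 0 1527811200
START flowtuple_backscatter 1
192.0.2.1|198.51.100.7|3|3|1|19|0x00|100,20
END flowtuple_backscatter
START flowtuple_icmpreq 2
192.0.2.2|198.51.100.8|8|0|1|1|0x00|44,3
192.0.2.3|198.51.100.9|8|0|1|1|0x00|48,2
END flowtuple_icmpreq
# CORSARO_INTERVAL_END 0 1527811259
"""

for interval, start_ts, flow_class, declared, records in read_intervals(sample.splitlines()):
    when = datetime.fromtimestamp(start_ts, tz=timezone.utc)
    print(interval, when.isoformat(), flow_class, declared, len(records))
```

The start timestamp 1527811200 in the sample corresponds to 2018-06-01 00:00:00 UTC, and the end timestamp 1527811259 is 59 seconds later, which matches the 60-second interval length. On real data the same function could read the output of cors2ascii line by line from standard input instead of from a string.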

Caveats that apply to this dataset

This dataset, and the worm and denial-of-service attack traffic it contains, is representative only of some spoofed-source denial-of-service attacks. Many denial-of-service attackers do not spoof source IP addresses when they attack their victim, in which case backscatter would not appear on a telescope. Attackers can also spoof in a non-random fashion, which produces an uneven distribution of backscatter across the IPv4 address space and may cause backscatter traffic to miss the telescope altogether. Note that the telescope does not send any packets in response, which also limits insight into the traffic it sees.

Data Access Policy

These data must be analyzed on CAIDA machines, and cannot be downloaded!

Academic researchers, government agencies, and corporate entities in the DHS-Approved Locations can request access only through the Information Marketplace for Policy and Analysis of Cyber-risk and Trust (IMPACT) portal. After locating this dataset in the IMPACT data catalog, please follow the IMPACT instructions for requesting the dataset. For an application to be considered, researchers must obtain an IMPACT account and complete and agree to the IMPACT Memorandum of Agreement (MOA).

Academic researchers from other countries can request access through CAIDA by filling out and submitting the online form. It usually takes about five to ten business days to process a request. We carefully review each application, and the decision to grant data access is based on the merits of the proposed data use.

Finally, these data also may be available for government and corporate entities not from DHS-Approved Locations who participate in CAIDA's membership program. Information on membership levels, services, and rates can be found on the CAIDA Sponsorship Information page, or by emailing sponsorship@caida.org.

Once users are approved for access to this dataset, they will be set up with an account on the CAIDA machine that provides direct access to the Telescope data they requested. Accounts are valid for a nominal twelve months, during which the research is expected to be completed. CAIDA strictly enforces a "take software to the data" policy for this dataset: all analysis must be performed on CAIDA computers; no download of raw data is allowed. CAIDA provides several basic tools to access the dataset, including CoralReef and Corsaro. Researchers can also upload their own analysis software.

Acceptable Use Agreement

Access to these data is subject to the terms of the following CAIDA Acceptable Use Agreement (printable version in PDF format)
and the supplemental AUA for the Near-Real-Time Telescope Data, below:


When referencing these data (as required by the AUA), please use:

The CAIDA UCSD Network Telescope Aggregated Flow Dataset - < dates used >,
http://www.caida.org/data/passive/telescope-flowtuple.xml
Also, please report your publication to CAIDA.

  Last Modified: Mon Nov-5-2018 13:34:36 PST
  Page URL: http://www.caida.org/data/passive/telescope-flowtuple.xml