CAIDA Data Usage Frequently Asked Questions

Suggestions for this FAQ? Please send them to data-info@caida.org.
We appreciate your feedback.

1. Types of Data
-
1.1: What is pcap?
Pcap is a file format for storing captured network traffic. It is
the format written by the tcpdump program.
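As a rough illustration of what the format looks like on disk, the sketch below decodes the 24-byte global header that starts a classic pcap file. It handles only the microsecond-timestamp magic number, and the function name `parse_pcap_global_header` and the file name `trace.pcap` in the usage note are placeholders chosen for this example, not part of any CAIDA tool.

```python
import struct

# The classic pcap magic number, as seen when the first four bytes of the
# file are read as a little-endian unsigned integer.
MAGIC_LE = 0xA1B2C3D4  # file was written in little-endian byte order
MAGIC_BE = 0xD4C3B2A1  # same magic, file was written in big-endian byte order


def parse_pcap_global_header(header: bytes) -> dict:
    """Decode the 24-byte global header of a classic pcap file."""
    if len(header) < 24:
        raise ValueError("a pcap global header is 24 bytes long")
    (magic,) = struct.unpack("<I", header[:4])
    if magic == MAGIC_LE:
        endian, order = "<", "little"
    elif magic == MAGIC_BE:
        endian, order = ">", "big"
    else:
        raise ValueError(f"unrecognized magic number: {magic:#010x}")
    # version major, version minor, timezone offset, timestamp accuracy,
    # snapshot length, link-layer type
    major, minor, _thiszone, _sigfigs, snaplen, linktype = struct.unpack(
        endian + "HHiIII", header[4:24]
    )
    return {
        "byte_order": order,
        "version": (major, minor),
        "snaplen": snaplen,
        "linktype": linktype,  # 1 means Ethernet
    }
```

To inspect a real file, read its first 24 bytes in binary mode, for example `parse_pcap_global_header(open("trace.pcap", "rb").read(24))`. For actual packet processing, the tools and libraries listed in 1.2 are the better choice.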
-
1.2: What tools can be used to read pcap files?
A non-exhaustive list of applications that can read pcap files:
- CoralReef
- tcpdump
- Wireshark ( fork of the Ethereal project )
If you want to write your own programs to read pcap files, the following
libraries are available:
- libpcap ( C )
- CoralReef ( C and perl )
-
1.3: Why are some files compressed using LZO compression instead of the more common DEFLATE (zip, gzip) compression?
For datasets, both the compression ratio and the speed of decompression are key factors in usability. For some
large datasets we've chosen LZO over DEFLATE because it decompresses faster. LZO-compressed files can be decompressed using
lzop.
-
1.4: How are CAIDA pcap traces anonymized?
For traces where privacy plays an important role (like our OC48 traces taken on a link of a tier1 ISP), we use
Crypto-PAn anonymization. This anonymization is prefix-preserving,
so if two original IP addresses share a k-bit prefix, their anonymized mappings will also share a k-bit prefix.
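The prefix-preserving property can be checked with a few lines of code. The sketch below is a toy mapping built the same general way (each output bit is the input bit flipped by a keyed pseudorandom function of the preceding bits), but it substitutes HMAC-SHA256 for the cipher, so its output does not match Crypto-PAn. The names `toy_anonymize` and `common_prefix_len` and the key are made up for this illustration.

```python
import hashlib
import hmac
import ipaddress


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common (0 to 32)."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


def toy_anonymize(addr: str, key: bytes) -> str:
    """Toy prefix-preserving mapping; NOT compatible with Crypto-PAn.

    Output bit i is input bit i XOR a pseudorandom bit that depends only on
    the key and on input bits 0..i-1, so two addresses that agree on their
    first k bits also agree on their first k anonymized bits.
    """
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = 0
    for i in range(32):
        # HMAC-SHA256 stands in for the keyed pseudorandom function here.
        digest = hmac.new(key, bits[:i].encode(), hashlib.sha256).digest()
        flip = digest[0] & 1
        out = (out << 1) | (int(bits[i]) ^ flip)
    return str(ipaddress.IPv4Address(out))
```

With any fixed key, `common_prefix_len(toy_anonymize(a, key), toy_anonymize(b, key))` equals `common_prefix_len(a, b)` for every pair of addresses, which is exactly the property described above.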
2. Access to Data
-
2.1: How do I download publicly available data from CAIDA data servers?
You can use a web browser to download individual files
from our data servers, but this is tedious for more than a couple of files,
and some browsers have problems with large files. A number of tools can automate
downloading data files from a web server. A few open source tools that support non-interactive downloads
are:
- wget
- cURL
Typical command-line options for downloading all files in http://www.example.com/data using wget are:
wget -np -m http://www.example.com/data
Explanation of options:
-m : mirroring, i.e., make a copy of the directory structure and files
-np : don't ascend to the parent directory
It is not advisable to use older versions of wget (before 1.10) to download CAIDA data,
due to the lack of large file support.
-
2.2: How do I download password restricted data from CAIDA data servers?
To use wget (version 1.10.2) to download from https://www.example.com/data for user joe@caida.org
with password c00lbean$:
- First create a .wgetrc file in your home directory, make sure it is readable only
by you (e.g., on UNIX run chmod 600 .wgetrc), and edit the file to contain the following line:
http-password = c00lbean$
It is advisable to remove the .wgetrc that contains your password after you have finished
downloading.
It is also possible to specify the password on the command-line with the
--http-password=c00lbean$ command-line option. This would allow others to see
your password if they check the process table on your computer, so this is not
advisable on multi-user systems.
- Next you can start the actual download using the command:
wget -np -m --http-user=joe@caida.org --no-check-certificate https://www.example.com/data
You'll need the --no-check-certificate option on some CAIDA servers where self-signed
SSL certificates are used.
-
2.3: My download was aborted, can I restart it from the point it stopped?
Yes, wget has the -c option that allows you to continue downloading partially-downloaded files.
This works together with the mirroring option -m, with the exception of directory listings
that will be refetched when you restart a mirroring download.
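Resuming an HTTP download generally relies on the Range request header: the client asks the server only for the bytes it does not have yet. The sketch below shows that mechanism with Python's standard library; it is an illustration of the idea rather than a description of wget's internals, and the function names are invented for this example.

```python
import os
import urllib.request


def build_resume_request(url: str, local_path: str) -> urllib.request.Request:
    """Build a GET request that asks only for the bytes not yet on disk."""
    have = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    request = urllib.request.Request(url)
    if have > 0:
        # "bytes=N-" means: send everything from byte offset N onward.
        request.add_header("Range", f"bytes={have}-")
    return request


def resume_download(url: str, local_path: str, chunk_size: int = 1 << 20) -> None:
    """Continue a partial download, or start over if the server ignores Range.

    If the local file is already complete, servers typically answer 416 and
    urlopen raises urllib.error.HTTPError; this sketch does not handle that.
    """
    request = build_resume_request(url, local_path)
    with urllib.request.urlopen(request) as response:
        # 206 means the server honored the Range header; on a plain 200 the
        # whole file is being sent again, so overwrite instead of appending.
        mode = "ab" if response.status == 206 else "wb"
        with open(local_path, mode) as out:
            while chunk := response.read(chunk_size):
                out.write(chunk)
```

A resumed file should still be verified against its checksum afterwards, since appending to a file that was corrupted before the interruption will not repair it.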
-
2.4: How do I check if data is downloaded correctly?
Most large data files CAIDA hosts have a corresponding md5 file in the same directory that
you can use to verify your data. The md5 file has the same name as the data file with
the extension .md5 appended, and contains the md5 checksum of the corresponding file.
An md5check.pl script is provided for checking md5 checksums of downloaded data files
against corresponding md5 files (requires perl with the Digest::MD5 module installed,
which is included in the core perl distribution since perl 5.8.0).
Some other utilities to compute MD5 checksums are: md5 (installed on most UNIX versions), openssl,
and FastSum (Windows).
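For readers who prefer to script the check themselves, the following sketch compares a data file against its companion .md5 file. It is not the md5check.pl script mentioned above, and it makes one assumption about the .md5 file layout: that the checksum appears somewhere in the file as a 32-character hexadecimal string. The function names are chosen for this example only.

```python
import hashlib
import re


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file without loading it all into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_file(data_path: str) -> bool:
    """Compare a data file with the checksum stored in '<data_path>.md5'.

    Assumption: the .md5 file contains the checksum as a standalone
    32-character hex string; any other text on the line is ignored.
    """
    with open(data_path + ".md5", "r") as f:
        found = re.search(r"\b[0-9a-fA-F]{32}\b", f.read())
    if found is None:
        raise ValueError("no MD5 checksum found in " + data_path + ".md5")
    return found.group(0).lower() == md5_of_file(data_path)
```

Reading in chunks keeps memory use constant, which matters for multi-gigabyte trace files.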
-
2.5: How can I protect my data?
For our restricted data sets we require our data not to be distributed beyond authorized users.
This includes preventing unauthorized users on a multiuser system from accessing your data. This can
easily be accomplished by changing the access permissions on the data files. For example: on UNIX
systems you can restrict access to the file's owner (read and write for the owner, no access for anyone else; use mode 400 instead if the owner should have read-only access) with the command:
chmod 600 <datafile>
Alternatively, use the umask command so that newly created files are accessible only by their owner
before you start downloading:
umask 077 ; wget -np -m --http-user=joe@caida.org --no-check-certificate https://www.example.com/data
Also make sure the data isn't (inadvertently) shared over a network resource that is accessible through the
Internet (for example a file share, FTP server, or web server), since that is in violation of the
Acceptable Use Policy (AUP) of our datasets.
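After a large mirrored download it can be useful to audit the resulting directory tree for files that other local users could still access. The sketch below assumes a UNIX-like system with standard permission bits; the function names are illustrative and not part of any CAIDA tool.

```python
import os
import stat


def restrict_to_owner(path: str) -> None:
    """Same effect as `chmod 600 path`: owner may read and write, nobody else."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


def accessible_by_others(path: str) -> bool:
    """True if group or other have any permission bit set on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))


def exposed_files(root: str) -> list:
    """List all files below `root` that group or other users can access."""
    exposed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if accessible_by_others(full):
                exposed.append(full)
    return sorted(exposed)
```

Calling `restrict_to_owner` on every path returned by `exposed_files` tightens an existing download in place. Note that this checks local file permissions only; it does not detect data exposed through a network share or web server.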
-
2.6: Why is the display of data from the UCSD network telescope realtime monitor delayed by approximately a day?
The display of data from the UCSD network telescope realtime monitor is delayed for security reasons. The delay makes it harder to find evidence of attacks
in the data that is captured by the telescope. We do this to minimize the risk of evidence of attacks being used in illegal activities, like extortion.
3. Misc
-
3.1: How can I keep up-to-date on data availability at CAIDA?
-
3.2: How can I help CAIDA provide data?
- Give us feedback (data-info@caida.org) on what datasets were useful to you and how you used them.
- Report publications (papers, web pages, class projects, presentations etc.) using our datasets to us. This information allows CAIDA to justify the time and effort spent on data provisioning to our funding agencies. We maintain a list of publications using CAIDA data, based on feedback from our data users.
- Join CAIDA. Available options include donations of software or
equipment, financial contributions to support CAIDA as an organization, and
direct sponsorship of a specific project. For more information, see
Joining CAIDA and CAIDA Sponsorship Information.