Center for Applied Internet Data Analysis
CAIDA Data Usage Frequently Asked Questions
Suggestions for this FAQ? Please send them to data-info@caida.org. We appreciate your feedback.

Types of Data

Access to Data

  • You can use a web browser to download individual files from our data servers, but this becomes tedious for more than a couple of files, and some browsers have problems with large files. A number of tools can automate downloading data files from a web server. Open source tools for non-interactive downloads include:
    - wget
    - cURL

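    A typical wget command for downloading all files under http://www.example.com/data/ is:
    wget -np -m http://www.example.com/data/

    Explanation of options:
    -m : mirroring, i.e., make a copy of the directory structure and files
    -np : don't ascend to the parent directory

    Keep the trailing slash on the URL: for HTTP, wget treats http://www.example.com/data/ as a directory but http://www.example.com/data as a file in the site root, which makes -np meaningless.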

    Versions of wget before 1.10 lack large-file support, so we advise against using them to download CAIDA data.
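    The effect of -np on a recursive download can be sketched in Python. This is a minimal, hypothetical illustration (start_dir and allowed are names made up for this sketch, not part of wget or any CAIDA tool). It assumes each link is resolved against the page it appears on, and it applies the trailing-slash rule from the wget manual: a URL ending in "/" names a directory, anything else is treated as a file in its parent directory.

```python
from urllib.parse import urljoin, urlsplit


def start_dir(start_url):
    """Return (scheme, host, directory) that a no-parent crawl must stay inside.

    A path ending in "/" is a directory; otherwise the last component is
    treated as a file name and its parent directory is used instead.
    """
    parts = urlsplit(start_url)
    path = parts.path or "/"
    if not path.endswith("/"):
        path = path.rsplit("/", 1)[0] + "/"
    return parts.scheme, parts.netloc, path


def allowed(start_url, page_url, href):
    """Resolve href against the page it was found on and accept it only if it
    stays on the same scheme and host, at or below the starting directory."""
    scheme, host, root = start_dir(start_url)
    target = urlsplit(urljoin(page_url, href))
    return (target.scheme == scheme
            and target.netloc == host
            and target.path.startswith(root))
```

    A real mirroring tool would also fetch every allowed page and extract its links; this sketch covers only the link filter. Note how allowed("http://www.example.com/data", ...) accepts links anywhere on the host, because without the trailing slash the starting directory is the site root.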

  • To use wget (version 1.10.2) to download from https://www.example.com/data/ as user joe@caida.org with password c00lbean$:

    - First create a .wgetrc file in your home directory, make sure it is readable only by you (e.g., on UNIX, run chmod 600 .wgetrc), and edit it to contain the following line:
    http-password = c00lbean$

    It is advisable to remove the .wgetrc that contains your password after you have finished downloading.

    You can also specify the password on the command line with the --http-password=c00lbean$ option. However, anyone who inspects the process table on your computer (e.g., with ps) can then see your password, so this is not advisable on multi-user systems.

    - Next, start the actual download with the command:
    wget -np -m --http-user=joe@caida.org --no-check-certificate https://www.example.com/data/

    The --no-check-certificate option is needed for some CAIDA servers that use self-signed SSL certificates.
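    For scripted downloads outside wget, the same precautions can be sketched in Python with only the standard library. This is a hypothetical sketch, not a CAIDA tool: all function names are made up for illustration. It reads the password from a .wgetrc-style file only when that file is accessible by its owner alone, sends it as HTTP Basic authentication (assuming the server accepts Basic auth), and can skip certificate checks in the way --no-check-certificate does.

```python
import base64
import os
import shutil
import ssl
import stat
import urllib.request


def owner_only(mode):
    """True if no group/other permission bits are set (e.g. after chmod 600)."""
    return (stat.S_IMODE(mode) & 0o077) == 0


def parse_wgetrc_password(text):
    """Return the http-password value from .wgetrc-style text, or None.

    wget treats command names as case-, underscore- and minus-insensitive,
    so http-password, http_password and HTTP_PASSWORD all match here.
    """
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip().lower().replace("-", "").replace("_", "") == "httppassword":
            return value.strip()
    return None


def read_password(path):
    """Read the password, refusing a file that other users could read."""
    if not owner_only(os.stat(path).st_mode):
        raise PermissionError(f"{path} is accessible by others; run chmod 600 {path}")
    with open(path, encoding="utf-8") as f:
        return parse_wgetrc_password(f.read())


def basic_auth_header(user, password):
    """Value of an HTTP Basic Authorization header for user:password."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def insecure_tls_context():
    """Skip certificate validation, like wget's --no-check-certificate.
    Use only for servers known to use self-signed certificates."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch_to_file(url, dest, user, password, insecure=False):
    """Download one file with Basic auth, streaming it to disk."""
    req = urllib.request.Request(url, headers={"Authorization": basic_auth_header(user, password)})
    ctx = insecure_tls_context() if insecure else None
    with urllib.request.urlopen(req, context=ctx) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```

    A possible call is fetch_to_file("https://www.example.com/data/file.gz", "file.gz", "joe@caida.org", read_password(os.path.expanduser("~/.wgetrc")), insecure=True), where file.gz is a hypothetical file name. Unlike wget's default, this sketch sends the credentials preemptively instead of waiting for the server's authentication challenge.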

  • Yes. wget's -c option lets you continue downloading partially downloaded files. It works together with the mirroring option -m, except that directory listings are refetched when you restart a mirroring download.
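    Over HTTP, resuming a download comes down to a Range request. The following Python sketch is hypothetical (function names are made up) and assumes the server honors byte ranges: it asks only for the bytes after what is already on disk, appends when the server answers 206 Partial Content, and starts over if the server ignores the range and sends the whole file with 200 OK. It handles a single file only, with no mirroring.

```python
import os
import shutil
import urllib.error
import urllib.request


def resume_headers(existing_size):
    """Ask only for the bytes after the ones already on disk."""
    return {"Range": f"bytes={existing_size}-"} if existing_size > 0 else {}


def open_mode(status):
    """206 Partial Content: append to the partial file.
    200 OK: the server sent the whole file, so overwrite it."""
    if status == 206:
        return "ab"
    if status == 200:
        return "wb"
    raise ValueError(f"unexpected HTTP status {status}")


def resume_download(url, dest):
    size = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(url, headers=resume_headers(size))
    try:
        with urllib.request.urlopen(req) as resp, open(dest, open_mode(resp.status)) as out:
            shutil.copyfileobj(resp, out)
    except urllib.error.HTTPError as err:
        if err.code == 416:  # Range Not Satisfiable: typically the file is already complete
            return
        raise
```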

  • For most large data files that CAIDA hosts, the MD5 checksum is stored in a file named md5.md5 in the same directory as the data file. You can use this checksum to verify your download. The md5check.pl script at http://data.caida.org/scripts/ checks downloaded data files against the md5.md5 file (it requires Perl with the Digest::MD5 module, which has been part of the core Perl distribution since Perl 5.8.0). Other utilities that compute MD5 checksums include md5 (installed on most UNIX versions), openssl, and FastSum (Windows).
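    The same check can be sketched in Python with hashlib; md5check.pl remains the tool CAIDA provides. This sketch is hypothetical, and because the exact layout of md5.md5 is not described here, it assumes each line uses either the BSD md5 format, MD5 (name) = digest, or the md5sum format, a digest followed by two spaces and the name.

```python
import hashlib
import os
import re

# Assumed line formats (not confirmed for CAIDA's md5.md5 files).
BSD_LINE = re.compile(r"^MD5 \((.+)\) = ([0-9a-fA-F]{32})$")
GNU_LINE = re.compile(r"^([0-9a-fA-F]{32}) [ *](.+)$")


def parse_md5_line(line):
    """Return (filename, lowercase hex digest) for one checksum line, or None."""
    line = line.rstrip("\r\n")
    m = BSD_LINE.match(line)
    if m:
        return m.group(1), m.group(2).lower()
    m = GNU_LINE.match(line)
    if m:
        return m.group(2), m.group(1).lower()
    return None


def md5_of_chunks(chunks):
    """MD5 hex digest of a sequence of byte strings."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB pieces so large traces need not fit in memory."""
    with open(path, "rb") as f:
        return md5_of_chunks(iter(lambda: f.read(chunk_size), b""))


def verify_directory(directory):
    """Check every file listed in directory/md5.md5.

    Returns a list of (filename, problem) pairs; an empty list means all OK.
    """
    problems = []
    with open(os.path.join(directory, "md5.md5"), encoding="utf-8") as f:
        for line in f:
            entry = parse_md5_line(line)
            if entry is None:
                continue
            name, expected = entry
            path = os.path.join(directory, name)
            if not os.path.exists(path):
                problems.append((name, "missing"))
            elif md5_of_file(path) != expected:
                problems.append((name, "checksum mismatch"))
    return problems
```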

  • For our restricted datasets, we require that the data not be distributed beyond authorized users. This includes preventing unauthorized users on a multi-user system from accessing your data, which you can easily do by changing the access permissions on the data files. For example, on UNIX systems the following command gives only you (the owner) read and write access:
    chmod 600 <datafile>

    Alternatively, use the umask command before you start downloading, so that newly created files and directories are accessible only by you:
    umask 077 ; wget -np -m --http-user=joe@caida.org --no-check-certificate https://www.example.com/data/

    Also make sure the data isn't (inadvertently) shared through a network resource that is reachable from the Internet, such as a file share, an FTP server, or a web server; doing so violates the Acceptable Use Policy (AUP) of our datasets.
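    The same restriction can be applied to an already downloaded directory tree from Python. This is a minimal sketch with hypothetical function names; it removes all group and other permission bits, roughly like chmod -R go-rwx, and leaves symbolic links alone.

```python
import os
import stat


def owner_only_mode(mode):
    """Keep only the owner's read/write/execute bits; clear group and other bits."""
    return stat.S_IMODE(mode) & 0o700


def restrict_tree(root):
    """Make root and everything below it accessible only by the owner."""
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    for path in paths:
        mode = os.lstat(path).st_mode
        if stat.S_ISLNK(mode):
            continue  # chmod would follow the link to a target that may lie elsewhere
        os.chmod(path, owner_only_mode(mode))
```

    In a Python downloader, calling os.umask(0o077) before creating any files has the same effect as the umask 077 shell command.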

  • The display of data from the UCSD network telescope real-time monitor is delayed for security reasons. The delay makes it harder to find evidence of attacks in the data captured by the telescope, which minimizes the risk of such evidence being used in illegal activities, such as extortion.

Miscellaneous

Topology dataset

Real-time passive network monitor reports

Anonymized Internet traces datasets

  Last Modified: Tue Jul-29-2014 15:32:19 PDT
  Page URL: http://www.caida.org/data/data-usage-faq.xml