Distribution Statistics for CAIDA Online Datasets

This page provides links to information about the distribution of CAIDA online datasets to the scientific community. It answers questions such as: Which are the most popular CAIDA datasets? Where are users of CAIDA datasets located? How many published papers use CAIDA datasets? Several other pages provide statistical information about the data themselves. For statistics about Archipelago (Ark), see the Ark project page and the links provided at the top of that page. For statistics about the CAIDA anonymized passive traces, see the summary statistics page.

Requests for CAIDA Restricted Datasets

CAIDA collects, curates, and distributes several restricted data collections to the scientific community, most notably the active topology data related to the Archipelago project and the passive anonymized Internet traces captured by the Equinix Chicago and San Jose monitors. Interested researchers apply for these data using online request forms and, once approved, receive an online CAIDA account to download the data. The statistics shown here derive from the information provided by our users and from the web log history of downloads.

Usage of CAIDA Public Datasets

Some of our most popular datasets (in terms of the number of researchers actually accessing the online data) are our public datasets (i.e., those with unrestricted access), most notably the AS Relationships dataset. For these datasets we count the number of different users by checking our web logs for unique IP addresses. Potential users are also asked to fill out a User Info form. This entirely optional form includes a field where an email address can be provided. We use the top-level domain (TLD) of these email addresses to get an idea of the geographic distribution of users of public CAIDA datasets.
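The two counts described above can be sketched in a few lines of Python. This is only an illustration, not CAIDA's actual tooling: it assumes the web log puts the client address in the first whitespace-separated field (as in the common Apache/NGINX log formats), and the function names and sample data are hypothetical.

```python
from collections import Counter

def unique_ips(log_lines):
    """Return the set of distinct client addresses seen in the log lines."""
    ips = set()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            ips.add(fields[0])  # assumption: client address is the first field
    return ips

def tld_counts(emails):
    """Count the top-level domain (last dot-separated label) of each address."""
    counts = Counter()
    for email in emails:
        _, sep, domain = email.strip().rpartition("@")
        if not sep or "." not in domain:
            continue  # the form field is optional, so skip empty or malformed entries
        tld = domain.rsplit(".", 1)[-1].lower()
        if tld:
            counts[tld] += 1
    return counts

# Made-up sample data, for illustration only.
sample_log = [
    '192.0.2.10 - - [01/Jan/2020:00:00:01 +0000] "GET /data/a.txt HTTP/1.1" 200 512',
    '192.0.2.10 - - [01/Jan/2020:00:05:00 +0000] "GET /data/b.txt HTTP/1.1" 200 1024',
    '198.51.100.7 - - [01/Jan/2020:00:06:00 +0000] "GET /data/a.txt HTTP/1.1" 200 512',
]
sample_emails = ["alice@example.edu", "bob@example.de", "", "carol@univ.example.de"]

print(len(unique_ips(sample_log)))  # number of distinct client addresses
print(tld_counts(sample_emails))    # per-TLD tallies
```

Note that a TLD is only a rough proxy for location: country-code TLDs such as .de suggest a country, while generic TLDs such as .com or .org do not.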

Publications Using CAIDA Datasets

We ask our users to inform us whenever they publish a research paper with results that depend on the analysis of CAIDA data. We use this information, together with additional references we track down ourselves in periodic literature searches, to maintain an online list of papers published by non-CAIDA authors. The affiliations of the authors and co-authors listed in the papers are used to track the geographic distribution.

Downloads of CAIDA Datasets

The webserver logs provide the actual amounts of data downloaded from our servers. We look at the numbers of unique users (usernames of CAIDA accounts for restricted data; IP addresses for public data) and the total size of downloaded data (where files downloaded multiple times by the same user are counted only once).
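A minimal sketch of this counting rule follows, assuming the log has already been parsed into (user, path, size) records, where "user" is an account name for restricted data or an IP address for public data. The record layout and the function name are assumptions made for illustration, not a description of CAIDA's own scripts.

```python
def summarize_downloads(records):
    """Summarize an iterable of (user, path, size_in_bytes) tuples.

    Returns (number of unique users, total bytes), counting each
    (user, path) pair only once.
    """
    seen = {}
    for user, path, size in records:
        # Keep the first size recorded for this user and file; repeats are ignored.
        seen.setdefault((user, path), size)
    users = {user for user, _ in seen}
    return len(users), sum(seen.values())

# Made-up records, for illustration only.
records = [
    ("alice", "/data/a.gz", 100),
    ("alice", "/data/a.gz", 100),  # repeat download by the same user, ignored
    ("alice", "/data/b.gz", 50),
    ("bob", "/data/a.gz", 100),    # same file, different user, counted
]
print(summarize_downloads(records))  # → (2, 250)
```

Keying on the (user, path) pair means a file fetched by two different users still counts twice toward the total, while a second fetch by the same user does not.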

Related Objects

See https://catalog.caida.org/search?query=types=dataset%20links=tag:caida to explore objects related to this document in the CAIDA Resource Catalog.