CAIDA collects, curates, and distributes several restricted data collections to the scientific community, most notably the active topology data from the Archipelago project and the passive anonymized Internet traces captured by the Equinix Chicago and San Jose monitors. Interested researchers apply for access to these data using online request forms; once approved, they are given a CAIDA account from which to download the data. The statistics shown here derive from the information provided by our users and from the web log history of downloads.
Some of our most popular datasets (in terms of the number of researchers actually accessing the online data) are our public datasets (i.e., those with unrestricted access), most notably the AS Relationships dataset. For these datasets, we count the number of distinct users by checking our web logs for unique IP addresses. Potential users are also asked to fill out a User Info form. This entirely optional form includes a field where an email address can be provided. We use the top-level domain (TLD) of these email addresses to get an idea of the geographic distribution of users of public CAIDA datasets.
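The TLD tally can be sketched as follows, assuming the optional email addresses have already been exported from the User Info form as a list of strings. The function name `tld_distribution` and the sample addresses are hypothetical. Generic TLDs such as `.com` or `.edu` do not map to a single country, so a tally like this gives only a rough geographic picture.

```python
from collections import Counter


def tld_distribution(emails):
    """Count email addresses per top-level domain (the label after the last dot).

    Blank or malformed entries are skipped, since the form field is optional.
    """
    counts = Counter()
    for email in emails:
        email = email.strip().lower()
        if email.count("@") != 1:
            continue  # field left blank or not an email address
        domain = email.split("@")[1]
        if "." not in domain:
            continue  # no TLD to extract
        counts[domain.rsplit(".", 1)[1]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical form submissions (example.* domains, not real users).
    emails = ["alice@example.edu", "Bob@Example.DE", "", "carol@example.ac.jp", "dave@example.de"]
    print(tld_distribution(emails).most_common())
    # [('de', 2), ('edu', 1), ('jp', 1)]
```

Lowercasing before counting ensures that `.DE` and `.de` are tallied together.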
We ask our users to inform us whenever they publish a research paper with results that depend on the analysis of CAIDA data. We use this information, together with additional references we track down ourselves in periodic literature searches, to maintain an online list of papers published by non-CAIDA authors. The affiliations of the authors and co-authors listed in these papers are used to track their geographic distribution.
The web server logs record the actual amount of data downloaded from our servers. We look at the number of unique users (usernames of CAIDA accounts for restricted data; IP addresses for public data) and the total size of downloaded data (where a file downloaded multiple times by the same user is counted only once).
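Both counts can be sketched as follows, assuming the logs use the Apache Common (or Combined) Log Format, that restricted downloads carry the CAIDA username in the authenticated-user field, and that the byte count of a successful (HTTP 200) `GET` equals the file size. The regex, the function name `download_stats`, and the choice to skip partial (206) and failed responses are illustrative assumptions, not CAIDA's actual tooling.

```python
import re

# Assumed Apache Common Log Format: host ident authuser [date] "request" status bytes
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def download_stats(log_lines, restricted=True):
    """Return (unique_users, total_bytes), counting each (user, file) pair once.

    A "user" is the authenticated username for restricted data and the
    client IP address for public data.
    """
    file_sizes = {}  # (user, path) -> bytes of the first successful download
    for line in log_lines:
        m = LOG_RE.match(line)
        if m is None or m["method"] != "GET" or m["status"] != "200" or m["size"] == "-":
            continue  # skip unparsable lines, failed/partial requests, empty bodies
        user = m["user"] if restricted else m["host"]
        if restricted and user == "-":
            continue  # request was not authenticated
        # Repeat downloads of the same file by the same user add nothing.
        file_sizes.setdefault((user, m["path"]), int(m["size"]))
    users = {user for user, _ in file_sizes}
    return len(users), sum(file_sizes.values())


if __name__ == "__main__":
    # Hypothetical log excerpt (documentation IP ranges, made-up users and files).
    log = [
        '192.0.2.1 - alice [01/Mar/2024:10:00:00 -0800] "GET /data/trace1.pcap.gz HTTP/1.1" 200 1000',
        '192.0.2.1 - alice [01/Mar/2024:11:00:00 -0800] "GET /data/trace1.pcap.gz HTTP/1.1" 200 1000',
        '192.0.2.1 - alice [01/Mar/2024:11:05:00 -0800] "GET /data/trace2.pcap.gz HTTP/1.1" 200 500',
        '198.51.100.2 - bob [02/Mar/2024:09:00:00 -0800] "GET /data/trace1.pcap.gz HTTP/1.1" 200 1000',
        '203.0.113.9 - - [02/Mar/2024:09:30:00 -0800] "GET /data/trace1.pcap.gz HTTP/1.1" 401 0',
    ]
    print(download_stats(log))  # (2, 2500)
```

Keying the dictionary on the (user, file) pair is what implements the "counted only once" rule: alice's second download of `trace1.pcap.gz` hits `setdefault` on an existing key and adds nothing, while bob's download of the same file is counted separately.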