Center for Applied Internet Data Analysis
Procedure for locating publications using CAIDA data
The list of papers using CAIDA data is kept as up to date as possible using a combination of sources. These are outlined here.

We maintain our collection of papers that use CAIDA datasets with information collected from several sources, aiming for a reasonably complete list of references.

Users of CAIDA datasets agree, as part of the Acceptable Use Agreement (AUA), to provide CAIDA with information about their publications using CAIDA data. Our Data Publication Report Page provides instructions on how to report papers most easily. To make it as easy as possible for users to include these references in their papers, the AUA includes a cut-and-paste template that can be used for this purpose. As an added benefit, this template, when used, makes it easier for us to locate papers in literature searches, because it lets us use non-trivial search phrases based on the standard reference template. A typical example of this format is:

The CAIDA UCSD Anonymized Internet Traces 2012 - 2012/05/17 13:00:00 UTC
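As a rough illustration of why a standard reference helps literature searches, the sketch below splits such a reference into a dataset name and a date part. The pattern "The CAIDA UCSD <dataset name> - <dates>" is an assumption inferred from the single example above, and the names `REFERENCE_PATTERN` and `parse_reference` are hypothetical; the template in the AUA remains authoritative.

```python
import re

# Assumed shape of a reference: "The CAIDA UCSD <dataset name> - <dates>".
# This is inferred from one example only; it is not the official AUA template.
REFERENCE_PATTERN = re.compile(r"The CAIDA UCSD (?P<dataset>.+?) - (?P<dates>.+)")


def parse_reference(line):
    """Return (dataset, dates) if the line looks like a CAIDA data reference, else None."""
    match = REFERENCE_PATTERN.search(line)
    if match is None:
        return None
    return match.group("dataset").strip(), match.group("dates").strip()


example = "The CAIDA UCSD Anonymized Internet Traces 2012 - 2012/05/17 13:00:00 UTC"
print(parse_reference(example))
```

The dataset part recovered this way ("Anonymized Internet Traces 2012" in the example) is the kind of string that can serve as a specific search phrase.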

In spite of our best efforts, we do not expect that this list of publications is complete. If you know of a publication that should be on our list, but isn't yet, we would like to hear from you (e.g. use the Data Publication Report Page mentioned above).

The following list provides an outline of all factors that contribute to our list of papers that use CAIDA data:

  • A few times each year we receive an email from a user reporting a publication the way they are supposed to according to the AUA (you know who you are, and thank you very much!).
  • Twice a year (in April and October) we send out an email to the CAIDA data-announce mailing list, reporting on new developments in CAIDA datasets. In this email we also remind people to send information about publications. This typically results in a few more responses.
    These two factors combined contribute a few percent (< 5%) of the total.
  • Also twice a year we do a fairly extensive literature search to locate relevant papers. The first cut is done with Google Scholar, using phrases derived from the names of CAIDA datasets and guided by the reference format specified in the AUA (fortunately, most users now do use this format). Typical search phrases are "CAIDA anonymized internet traces", "CAIDA passive traces", "CAIDA topology", "CAIDA skitter", "CAIDA AS relationships", etc. The typical pattern: always include CAIDA, then add a string that is expected to be in the reference and is specific enough to narrow down the number of responses to a manageable number.
    This search gives the largest number of hits. A really successful search provides a few dozen new papers, or typically 80-90% of all hits.
  • A couple of other, more explicitly (computer-)science oriented, search engines are used to complement the results from Google Scholar: IEEE Xplore Digital Library, ACM Digital Library, Springer, etc. These more targeted searches do not provide many additional hits that are not on Google Scholar already (about 5-15% of the total), but they do turn up papers on topics that lie outside computer science proper. An example in this category is
    H. Wu and G. Kvizhinadze
    Martingale limit theorems of divisible statistics in a multinomial scheme with mixed frequencies
    Statistics and Probability Letters 81 (8), 1128-1135, March 2011
  • We also search sites of recent and upcoming computer science meetings, and skim abstracts looking for CAIDA-related items. Candidate conferences are selected from online lists, e.g.
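
The search-phrase pattern described in the list above (always include CAIDA, then add a dataset-specific string) can be sketched as follows. This is a minimal illustration, not the procedure actually used; the function name `build_search_phrases` and its parameters are hypothetical, and the resulting phrases would still be entered into a search engine by hand.

```python
def build_search_phrases(dataset_terms, anchor="CAIDA"):
    """Prefix each dataset-specific term with the anchor word and quote the
    result so that a search engine treats it as an exact phrase."""
    phrases = []
    for term in dataset_terms:
        term = term.strip()
        if term:  # skip empty entries
            phrases.append(f'"{anchor} {term}"')
    return phrases


# Terms taken from the typical search phrases listed above.
terms = [
    "anonymized internet traces",
    "passive traces",
    "topology",
    "skitter",
    "AS relationships",
]
for phrase in build_search_phrases(terms):
    print(phrase)
```

A more specific term narrows the result set; a term that is too generic (for example "topology" without the CAIDA prefix) would return far more hits than can be reviewed.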

