CAIDA: Cooperative Association for Internet Data Analysis
Passive Internet Trace Collection Acceptable Use Agreement

$Revision: 1.1.2.1 $

CAIDA is seeking Internet trace collection sites to host passive monitors as a part of CAIDA's passive data collection infrastructure. CAIDA has more than nine years of experience in collecting, curating, and distributing passively collected Internet trace data. In 2006, more than 4900 researchers downloaded a total of more than 72 terabytes of data from our servers. As of February 2008, we have 16 datasets available, half of which are freely available and the other half of which require registration and are available to academic researchers, government agencies, and CAIDA members.

CAIDA uses the following Acceptable Use Policy to govern its Internet traffic data collection activities:

  1. Passive monitors will run only strictly necessary services and will be kept up-to-date with necessary security patches and operating system upgrades to limit security risk.
  2. Only a minimal number of CAIDA personnel trained in protecting user privacy and in the secure handling of data will have accounts on Internet traffic collection monitors. (At the discretion of the hosting site and the administrator of the monitored links, hosting sites may have accounts for local users who are not bound by this policy.)
  3. No packet payloads will be permanently recorded without specific permission from the hosting site. Because packet headers have dynamic lengths, a few bytes of payload may be initially recorded during an attempt to capture the full length of packet headers, but this information will be filtered and discarded as soon as possible and before the data is used for any research purpose, including CAIDA internal research.
  4. Traces will not be released from CAIDA custody unless the IP addresses are anonymized using prefix-preserving anonymization (or other current state-of-the-art anonymization technology). CAIDA personnel and collaborators who are physically present in CAIDA offices may have access to non-anonymized packet headers for research purposes.
  5. CAIDA will require registration (e.g. a dataset request form) from users who wish to download anonymized traces. This allows us to help researchers determine which datasets will best assist them, and gives us information on how our data is used. We can only continue to be funded to collect and distribute data as long as we can demonstrate that data we host is appropriate for current research projects and supports research on a broad range of topic areas.
  6. Traces will be distributed internationally to registered users, although we are bound by the restrictions of the US State Department's International Traffic in Arms Regulations (ITAR).
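
Item 3 above describes capturing slightly more than the headers and then discarding any payload bytes. The following is a minimal sketch of that filtering step, not CAIDA's actual tooling. It assumes each record is a raw IPv4 packet starting at the IP header, and the function name strip_payload is hypothetical. It reads the variable IPv4 header length (IHL) and, for TCP, the data offset, then cuts the packet at the end of the transport header.

```python
TCP, UDP = 6, 17  # IANA protocol numbers


def strip_payload(packet: bytes) -> bytes:
    """Return only the IPv4 and transport-layer headers of a raw IPv4 packet.

    Hypothetical helper for illustration; assumes `packet` starts at the
    IPv4 header (no link-layer framing).
    """
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not a complete IPv4 header")
    ip_len = (packet[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    if ip_len < 20 or len(packet) < ip_len:
        raise ValueError("invalid or truncated IPv4 header")

    proto = packet[9]
    if proto == TCP:
        if len(packet) < ip_len + 13:
            # The data-offset byte was not captured, so fewer than 20 TCP
            # header bytes are present and none of them is payload.
            return packet
        # TCP data offset is the high nibble of byte 12, in 32-bit words.
        tcp_len = max(20, (packet[ip_len + 12] >> 4) * 4)
        end = ip_len + tcp_len
    elif proto == UDP:
        end = ip_len + 8  # the UDP header is always 8 bytes
    else:
        end = ip_len  # unknown transport: keep the IP header only
    return packet[:end]
```

For example, a 20-byte IPv4 header followed by a 20-byte TCP header and any amount of application data is reduced to its first 40 bytes. Keeping only the IP header for unrecognized protocols is a conservative choice made for this sketch.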

Additional research questions can be answered without compromising user privacy by storing cryptographic hashes of packet payloads for research use. This is useful, but not necessary for the data collection and distribution efforts described above. Please let us know if your site is willing to allow this extended data collection (recording cryptographic hashes of packet payloads). If you are willing to allow us to distribute these hashes as a part of a dataset with anonymized IP addresses, please let us know that as well.
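
As an illustration of recording a payload hash instead of the payload itself, here is a minimal sketch. The policy does not name a hash algorithm; SHA-256 is an assumption made for this example, as are the function name payload_digest and the optional secret key (a keyed hash makes it harder to confirm guessed payloads by hashing them).

```python
import hashlib
import hmac
from typing import Optional


def payload_digest(payload: bytes, key: Optional[bytes] = None) -> str:
    """Return a hex digest standing in for a packet payload.

    Hypothetical helper: SHA-256 and the optional HMAC key are assumptions,
    not choices stated in the policy.
    """
    if key is None:
        return hashlib.sha256(payload).hexdigest()
    # Keyed variant: the digest cannot be recomputed without the key.
    return hmac.new(key, payload, hashlib.sha256).hexdigest()
```

Identical payloads produce identical digests, so a researcher can still count repeated content across packets without ever seeing the content itself.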

Within the guidelines specified above, CAIDA has the following plans for collecting data from the Internet traffic monitors and distributing it to the research community:

  • CAIDA's CoralReef Software Suite includes a real-time report generation tool. CAIDA plans to run the report generator's collection tools on all monitors continuously and provide the reports publicly via a central CAIDA-managed webserver (individual monitors will not host web reports for security reasons). The reports provide configurable breakdowns of packets, bytes, and flows by protocol, port/application, source country, destination country, source AS, and destination AS. Data can be displayed as percentage- and absolute-value-based time-series graphs, pie graphs for a given time period, tables of data, and geographic maps. The report generator can automatically anonymize IP addresses using the CryptoPAn prefix-preserving anonymization technique (or other current state-of-the-art anonymization technology), and this option will be used on all publicly accessible reports. We plan to provide hosting sites with access to non-anonymized reports as well.
  • We plan to collect monthly time-synchronized packet header traces from all locations where CAIDA has a functional Internet traffic monitor. These traces will be anonymized, cataloged in DatCat, and made available to any registered academic researchers and government agencies. Data will also be available to any commercial entities who contribute resources to our data collection and distribution efforts. Trace durations will vary by location depending on the available disk space and other technical constraints.
  • We will collect additional traces to support research needs (of CAIDA and of the wider research community) to the extent possible given funding and other resource constraints. This data will also be anonymized, cataloged in DatCat, and made available using the same criteria as the monthly traces described above.
  • We may distribute flow files or other summarized or sampled information based on packet traces or live packet feeds. These files will be subject to the same anonymization techniques as packet traces, including the use of prefix-preserving anonymization (or other current state-of-the-art anonymization technology) for IP address anonymization.
  • For some ongoing security research into identification, classification, and mitigation of Internet worms, distributed denial-of-service attacks, and botnets, we may wish to test payload detection or inspection algorithms on monitored links. For ongoing research into Internet application classification, we may wish to test payload-based classification algorithms on monitored links. Before performing any activity that involves packet payload, we will get specific approval for each collection event from each collection site.
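
The prefix-preserving anonymization referred to throughout this policy maps addresses so that two addresses sharing their first k bits still share exactly their first k bits after anonymization. The sketch below illustrates that property only. It is not the CryptoPAn implementation: it substitutes HMAC-SHA256 from the Python standard library for the block cipher, and the names anonymize_ipv4 and common_prefix_len are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Prefix-preserving anonymization of one IPv4 address (illustrative).

    Each output bit is the input bit XORed with a pseudorandom bit that
    depends only on the key and on the input bits that precede it.
    """
    value = int(ipaddress.IPv4Address(addr))
    result = 0
    for i in range(32):
        prefix = value >> (32 - i)  # the top i bits (0 when i == 0)
        # Include the prefix length so prefixes of different lengths
        # never produce the same pseudorandom-function input.
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (value >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(result))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


if __name__ == "__main__":
    key = b"example key, not a real secret"
    a, b = "192.0.2.17", "192.0.2.200"
    # The shared prefix length is the same before and after anonymization.
    print(common_prefix_len(a, b))
    print(common_prefix_len(anonymize_ipv4(a, key), anonymize_ipv4(b, key)))
```

Because the flip bit for position i depends only on the preceding bits, addresses in the same subnet remain grouped together in anonymized traces, which is what keeps routing and topology analyses possible on anonymized data.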


Cooperative Association for Internet Data Analysis (CAIDA)
  Last Modified: Thurs Feb-14-2008 15:48:18 PDT
  Maintained by: CAIDA Webmaster webmaster@caida.org
  Page URL: http://www.caida.org/data/collection/aup/internet_traffic_collection_aup.xml