CAIDA uses the following Acceptable Use Policy to govern its data collection activities:
- Passive monitors will run only strictly necessary services and will be kept up to date with necessary security patches and operating system upgrades to limit security risk.
- Only a minimal number of CAIDA personnel trained in protecting user privacy and in the secure handling of data will have accounts on passive data monitors. (At its discretion, a hosting site may maintain accounts for local users who are not bound by this policy.)
- No packet payloads will be permanently recorded without specific permission from the hosting site. Because packet headers have dynamic lengths, a few bytes of payload may be initially recorded during an attempt to capture the full length of packet headers, but this information will be filtered and discarded as soon as possible and before the data is used for any purpose, including CAIDA-internal research.
- Traces will not be released from CAIDA custody unless the IP addresses are anonymized using prefix-preserving anonymization (or other current state-of-the-art anonymization technology). CAIDA personnel and collaborators who are physically present in CAIDA offices may have access to non-anonymized packet headers for research purposes.
- CAIDA will require registration from users who wish to download anonymized traces. This allows us to help researchers determine which datasets will best assist them, and gives us information on how our data is used. We can only continue to be funded to collect and distribute data as long as we can demonstrate that data we host is appropriate for current research projects and supports research on a broad range of topic areas.
- Traces will be distributed internationally to registered users, subject to the US State Department's International Traffic in Arms Regulations (ITAR), by which we are bound.
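The payload rule above depends on header lengths that vary from packet to packet, so a capture has to read the length fields before it can discard payload bytes. The following is a minimal sketch of that filtering step, not CAIDA's actual tooling. It assumes a raw IPv4 packet with no link-layer header, and the function name `strip_payload` is hypothetical.

```python
import struct

IPPROTO_TCP = 6
IPPROTO_UDP = 17


def strip_payload(packet: bytes) -> bytes:
    """Return only the IPv4 and transport headers of a raw IPv4 packet.

    Assumption: `packet` starts at the IPv4 header (no Ethernet framing).
    Anything that cannot be parsed safely is cut back to the IP header.
    """
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not a complete IPv4 header")

    # IHL is the low nibble of byte 0, counted in 32-bit words.
    ip_header_len = (packet[0] & 0x0F) * 4
    if ip_header_len < 20 or len(packet) < ip_header_len:
        raise ValueError("invalid IPv4 header length")

    protocol = packet[9]
    # The low 13 bits of bytes 6-7 hold the fragment offset; only the
    # first fragment (offset 0) carries a transport header.
    fragment_offset = struct.unpack("!H", packet[6:8])[0] & 0x1FFF

    transport_len = 0
    if fragment_offset == 0:
        if protocol == IPPROTO_TCP and len(packet) >= ip_header_len + 13:
            # TCP data offset is the high nibble of byte 12, in 32-bit words.
            tcp_header_len = (packet[ip_header_len + 12] >> 4) * 4
            if tcp_header_len >= 20:
                transport_len = tcp_header_len
        elif protocol == IPPROTO_UDP:
            transport_len = 8  # the UDP header has a fixed length

    return packet[: ip_header_len + transport_len]
```

Unknown protocols and non-first fragments are reduced to the IP header alone, which errs on the side of recording less rather than more.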
Some research projects will require collecting 16 bytes of payload without collecting any IP addresses. No collection of payload will ever occur that is traceable back to an IP address and no IP addresses will ever have any payload associated with them.
Additional research questions can be answered without compromising user privacy by storing cryptographic hashes of packet payloads for research use. This extended collection is useful but not necessary for the data collection and distribution efforts described above. Please let us know if your site is willing to allow it (that is, recording cryptographic hashes of packet payloads). If you are also willing to allow us to distribute these hashes as part of a dataset with anonymized IP addresses, please let us know that as well.
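As an illustration of what recording a cryptographic hash of a payload could look like, here is a minimal sketch using Python's standard library. The policy does not name a hash function, so SHA-256 is an assumption, as are the function name `payload_digest` and the optional key. A keyed hash (HMAC) is shown as one possible option because short or predictable payloads can otherwise be guessed by hashing candidate values.

```python
import hashlib
import hmac
from typing import Optional


def payload_digest(payload: bytes, key: Optional[bytes] = None) -> str:
    """Return a hex digest that stands in for the payload itself.

    Assumptions: SHA-256 as the hash, and an optional secret key.
    With a key, the digest is an HMAC, so identical payloads still map
    to identical digests, but outsiders cannot test guesses offline.
    """
    if key is None:
        return hashlib.sha256(payload).hexdigest()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


# Identical payloads produce identical digests, so repeated content can
# be counted or matched without storing the content.
first = payload_digest(b"example payload bytes")
second = payload_digest(b"example payload bytes")
print(first == second)
```

Only the digest would be stored; the payload bytes themselves are discarded after hashing.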
Within the guidelines specified above, CAIDA has the following plans for collecting data from the COMMONS monitors and distributing it to the research community:
- CAIDA's CoralReef Software Suite includes a real-time report generation tool. CAIDA plans to run the report generator's collection tools continuously on all monitors and to publish the reports via a central CAIDA-managed web server (for security reasons, individual monitors will not host web reports). The reports provide configurable breakdowns of packets, bytes, and flows by protocol, port/application, source country, destination country, source AS, and destination AS. Data can be displayed as percentage- and absolute-value-based timeseries graphs, pie graphs for a given time period, tables of data, and geographic maps. The report generator can automatically anonymize IP addresses using CryptoPAn, and this option will be used on all publicly accessible reports. We hope to provide hosting sites with access to non-anonymized reports as well.
- We plan to collect monthly time-synchronized packet header traces from all locations where CAIDA has a functional passive monitor (including sites that are not COMMONS nodes). These traces will be anonymized, cataloged in DatCat, and made available to any registered academic researchers and government agencies. Data will also be available to any commercial entities that contribute resources to our data collection and distribution efforts. Trace durations will vary by location depending on the available disk space.
- We will collect additional traces to support research needs (of CAIDA and of the wider research community) to the extent possible given funding and other resource constraints. This data will also be anonymized, cataloged in DatCat, and made available using the same criteria as the monthly traces described above.
- For some ongoing security research into identification, classification, and mitigation of Internet worms, distributed denial-of-service attacks, and botnets, we may wish to test payload detection or inspection algorithms on monitored links. Before performing any activity that involves packet payload, we will obtain specific approval for each collection event from each collection site.
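The prefix-preserving anonymization required for released traces and public reports has one defining property: two addresses that share their first k bits before anonymization share exactly their first k bits afterwards. The sketch below illustrates that property only. It is not CryptoPAn and is not compatible with it: it substitutes HMAC-SHA256 from the Python standard library for the block cipher, and the names `anonymize_ipv4` and `shared_prefix_len` are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(address: str, key: bytes) -> str:
    """Toy prefix-preserving anonymization of one IPv4 address.

    Each output bit is the input bit XORed with a keyed pseudorandom bit
    that depends only on the preceding input bits, so addresses with a
    common prefix receive the same flips across that prefix.
    """
    addr = int(ipaddress.IPv4Address(address))
    result = 0
    for i in range(32):
        prefix = addr >> (32 - i)  # the first i bits (0 when i == 0)
        # Include the prefix length so prefixes of different lengths
        # with the same numeric value do not collide.
        message = bytes([i]) + prefix.to_bytes(4, "big")
        flip = hmac.new(key, message, hashlib.sha256).digest()[0] & 1
        bit = (addr >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(result))


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


key = b"example secret key"  # placeholder; a real key must stay private
a, b = "192.0.2.17", "192.0.2.200"
print(shared_prefix_len(a, b))
print(shared_prefix_len(anonymize_ipv4(a, key), anonymize_ipv4(b, key)))
```

Both printed lengths are equal, which is why breakdowns by network prefix remain meaningful on anonymized data while individual addresses are obscured.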