Passive Internet Trace Collection Acceptable Use Agreement
$Revision: 1.1.2.1 $
CAIDA is seeking Internet trace collection sites to host passive
monitors as a part of CAIDA's passive data collection infrastructure.
CAIDA has more than nine years of experience in collecting, curating,
and distributing passively collected Internet trace data. In 2006,
more than 4900 researchers downloaded a total of more than 72
terabytes of data from our servers. As of February 2008, we offer
16 datasets: half are freely available, and the other half require
registration and are available to academic researchers, government
agencies, and CAIDA members.
CAIDA uses the following Acceptable Use Policy to govern
its Internet traffic data collection activities:
- Passive monitors will run only strictly necessary
services and will be kept up-to-date with necessary security
patches and operating system upgrades to limit security risk.
- Only a small number of CAIDA personnel trained in
protecting user privacy and in the secure handling of data will have
accounts on Internet traffic collection monitors. (At the discretion
of the hosting site and the administrator of the monitored links,
local users, who are not bound by this policy, may also have
accounts.)
- No packet payloads will be permanently recorded without
specific permission from the hosting site. Because packet
headers vary in length, capturing the full header may initially
record a few bytes of payload; these bytes will be filtered out
and discarded as soon as possible, before the data is used for
any research purpose, including CAIDA internal research.
- Traces will not be released from CAIDA custody unless
the IP addresses are anonymized using prefix-preserving
anonymization (or other current state-of-the-art anonymization
technology). CAIDA personnel and collaborators who are physically
present in CAIDA offices may have access to non-anonymized
packet headers for research purposes.
- CAIDA will require registration (e.g., a dataset request form) from users who wish to download
anonymized traces. Registration allows us to help researchers determine
which datasets will best serve their needs, and it tells us
how our data is used. Our continued funding to collect and distribute
data depends on demonstrating that the data we host is appropriate
for current research projects and supports research across a broad
range of topic areas.
- Traces will be distributed internationally to registered
users, subject to the restrictions of the U.S. State
Department's International Traffic in Arms Regulations (ITAR).
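The header-only capture rule above can be illustrated with a minimal Python sketch. It assumes each record begins at an IPv4 header with no link-layer framing, parses only TCP and UDP, and uses a hypothetical helper name, `strip_payload`; it is not CoralReef code. The IPv4 IHL field and the TCP data-offset field give the actual header lengths, so every byte after them is payload and can be dropped.

```python
def strip_payload(packet: bytes) -> bytes:
    """Keep only the IPv4 header and, when present, the TCP or UDP header.

    Hypothetical helper: `packet` must begin at the IPv4 header (no
    link-layer framing). IPv6 and other transport protocols are not parsed.
    """
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ip_hlen = (packet[0] & 0x0F) * 4              # IHL counts 32-bit words
    if ip_hlen < 20:
        raise ValueError("invalid IPv4 header length")
    proto = packet[9]
    frag_offset = ((packet[6] & 0x1F) << 8) | packet[7]
    if frag_offset != 0:
        l4_hlen = 0                               # non-first fragment: no TCP/UDP header
    elif proto == 6 and len(packet) >= ip_hlen + 13:
        l4_hlen = (packet[ip_hlen + 12] >> 4) * 4 # TCP data offset, in 32-bit words
    elif proto == 17:
        l4_hlen = 8                               # UDP header is always 8 bytes
    else:
        l4_hlen = 0                               # other protocols: keep the IP header only
    return packet[:ip_hlen + l4_hlen]
```

Non-first IP fragments carry no transport header, so the sketch keeps only their IP header; TCP options are retained because the data-offset field already counts them.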
Storing cryptographic hashes of packet payloads would allow researchers
to answer additional research questions without compromising user
privacy. This is useful but not necessary for the data collection and
distribution efforts described above. Please let us know if your site
is willing to allow this extended data collection (recording
cryptographic hashes of packet payloads). If you are willing to allow
us to distribute these hashes as part of a dataset with anonymized
IP addresses, please let us know that as well.
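A minimal sketch of payload hashing, assuming Python's standard `hashlib` and `hmac` modules. Using a keyed hash (HMAC-SHA-256) rather than a plain hash is our assumption, not a stated CAIDA practice: without the secret key, someone holding the digests cannot confirm a guessed payload by hashing it themselves. The helper names `payload_digest` and `repeated_payloads` are hypothetical.

```python
import hashlib
import hmac
from collections import Counter


def payload_digest(payload: bytes, key: bytes) -> str:
    """Keyed SHA-256 digest of a packet payload (hypothetical helper).

    Only the hex digest would be stored; the payload bytes are discarded.
    """
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def repeated_payloads(payloads, key: bytes, threshold: int = 2) -> dict:
    """Map each digest seen at least `threshold` times to its count."""
    counts = Counter(payload_digest(p, key) for p in payloads)
    return {d: n for d, n in counts.items() if n >= threshold}
```

For example, `repeated_payloads([b"x", b"x", b"y"], b"site-secret")` returns a dictionary with a single digest mapped to 2, showing that identical content was seen twice without retaining the content itself.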
Within the guidelines specified above, CAIDA has the following plans
for collecting data from the Internet traffic monitors and distributing
it to the research community:
- CAIDA's CoralReef Software Suite includes a real-time report generation tool. CAIDA plans to run this
report generator continuously on all monitors
and make the reports publicly available via a central CAIDA-managed
web server (for security reasons, individual monitors will not host
web reports). The reports provide configurable breakdowns
of packets, bytes, and flows by protocol, port/application,
source country, destination country, source AS, and destination
AS. Data can be displayed as time-series graphs of percentages or
absolute values, pie charts for a given time period, tables
of data, and geographic maps. The report generator can
automatically anonymize IP addresses using the CryptoPAn
prefix-preserving anonymization technique (or other current
state-of-the-art anonymization technology), and this option
will be used on all publicly accessible reports. We also plan to
provide hosting sites with access to non-anonymized
reports.
- We plan to collect monthly time-synchronized packet
header traces from all locations where CAIDA has a functional
Internet traffic monitor. These traces will be anonymized,
cataloged in DatCat, and made
available to registered academic researchers and government
agencies. Data will also be available to commercial entities
that contribute resources to our data collection and distribution
efforts. Trace durations will vary by location depending on
the available disk space and other technical constraints.
- We will collect additional traces to support the research
needs of CAIDA and the wider research community, to
the extent that funding and other resources allow.
This data will also be anonymized, cataloged in DatCat, and made available
using the same criteria as the monthly traces described above.
- We may distribute flow files or other summarized or
sampled information based on packet traces or live packet feeds.
These files will be subject to the same anonymization techniques
as packet traces, including the use of prefix-preserving
anonymization (or other current state-of-the-art anonymization
technology) for IP address anonymization.
- For some ongoing security research into identification,
classification, and mitigation of Internet worms, distributed
denial-of-service attacks, and botnets, we may wish to test
payload detection or inspection algorithms on monitored links.
For ongoing research into Internet application classification,
we may wish to test payload-based classification algorithms on
monitored links. Before performing any activity that involves
packet payloads, we will obtain specific approval for each collection
event from each collection site.
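Prefix-preserving anonymization, which this policy applies to released traces, public reports, and flow files, maps addresses so that two addresses sharing a k-bit prefix still share exactly a k-bit prefix after anonymization. The sketch below follows the CryptoPAn construction (each output bit is the input bit XORed with a pseudorandom function of the preceding bits), but it swaps in HMAC-SHA-256 for the AES-based function of the published scheme, so it is illustrative only and not interoperable with CryptoPAn or CoralReef output. The function name `anonymize_ipv4` and the key are hypothetical, and only IPv4 is handled.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Prefix-preserving IPv4 anonymization (CryptoPAn-style sketch).

    Assumption: HMAC-SHA-256 stands in for CryptoPAn's AES-based PRF.
    """
    a = int(ipaddress.IPv4Address(addr))
    out = 0
    for i in range(32):
        prefix = a >> (32 - i)                 # first i bits of the original (0 when i == 0)
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] >> 7  # one pseudorandom bit
        bit = (a >> (31 - i)) & 1              # bit i of the original address
        out = (out << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(out))
```

Because output bit i depends only on bits 0 through i-1 of the original address, addresses such as 10.1.2.3 and 10.1.2.200 keep a shared 24-bit prefix after anonymization, while the prefix value itself is hidden.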