- Researchers tend to want as much data as possible without the
datasets getting too unwieldy. It would be ideal to capture all
TCP and UDP data, queries and responses, from all root instances
and gTLDs for port 53. Although TCP generally accounts for a small
fraction of DNS traffic, having more comprehensive data increases
its value to the research community. Query responses, while requiring
significantly more disk, are necessary to answer questions related
to DNSSEC workload and DNS response sizes.
- All measurements should synchronize to UTC time. This means that machines
where collection occurs must be synchronized with NTP. CAIDA will
send timestamp probes to root server instances during data collection to test
for clock skew.
- We recommend that measurement schedules cover a 50-hour period
(48 hrs + leading and trailing hours), preferably mid-week
(Tuesday-Wednesday or Wednesday-Thursday). This approach is more
likely to capture a continuous 48-hour period, i.e., two full days
of data. Researchers will be able to determine average daily
traffic, to see diurnal patterns, and to rule out that a single day
might be anomalous.
- Analysis tasks are typically easier if collection files are
split based on time rather than on file size boundaries. We
prefer to have each pcap file be one hour long.
- Additionally, pcap files should start and stop on time-based
boundaries. For example, a pcap file should start at 12:00:00
and end at 12:59:59. ISC's dnscap tool will do this automatically.
- Do not collect responses if you have a shortage of local disk
and/or bandwidth resources. In the past, we've had a hard time
getting data from some collection sites because they either run
out of local disk space, or do not have sufficient bandwidth to
transfer the pcap files faster than they are generated. You may
omit DNS responses from packet captures to make them smaller.
We'd much rather have (only) all the queries than some of the
queries and some of the responses.
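The 50-hour schedule above (one leading hour, 48 target hours, one trailing hour) can be worked out with a few lines of shell. This is only a sketch: it assumes GNU date (for the -d option), and the target start date is an arbitrary example, not a scheduled collection date.

```shell
#!/bin/sh
# Sketch: derive a 50-hour collection window around a 48-hour target period.
# Assumes GNU date (-d); target_start is an arbitrary example value.
target_start="2007-01-09 00:00:00 UTC"    # example only: a Tuesday, 00:00 UTC

start=$(date -u -d "$target_start" +%s)   # target start in epoch seconds
begin=$((start - 3600))                   # add one leading hour
end=$((start + 49 * 3600))                # 48 hours plus one trailing hour

# Format both ends of the window in UTC.
begin_utc=$(date -u -d "@$begin" +%Y-%m-%dT%H:%M:%SZ)
end_utc=$(date -u -d "@$end" +%Y-%m-%dT%H:%M:%SZ)

echo "collect from $begin_utc to $end_utc ($(( (end - begin) / 3600 )) hours)"
```

Both ends of the window fall on hour boundaries, so one-hour pcap files line up with it.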
We recommend using dnscap
because it will automatically rotate pcap files,
listen on multiple network interfaces, and only capture port 53 packets.
Sample usage:
    dnscap -i eth0 -t 3600 -w ${root}.${instance}
To capture only queries, add "-s i" to the command line:
    dnscap -i eth0 -t 3600 -w ${root}.${instance} -s i
To capture only queries to specific addresses, add "-z" options to
the command line. For example:
    dnscap -i eth0 -t 3600 -w ${root}.${instance} -s i \
        -z 192.5.5.241 -z 2001:500::1035
To automatically compress each pcap file after each interval, use
the "-k" option:
    dnscap -i eth0 -t 3600 -w ${root}.${instance} -s i \
        -z 192.5.5.241 -z 2001:500::1035 -k 'gzip -9'
Instead of 'gzip -9' you might want to execute a script
that compresses the file and uploads it to a different system.
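A minimal sketch of such a script follows. It assumes, as the 'gzip -9' example implies, that dnscap appends the name of the closed pcap file as the final argument. UPLOAD_DEST is a hypothetical setting introduced here for illustration; when it is empty the upload step is skipped.

```shell
#!/bin/sh
# Hypothetical helper for dnscap's -k option: compress one pcap file,
# then copy it to another system.
# UPLOAD_DEST is an assumed scp destination, for example
# "user@collector.example.org:/ditl/incoming/". Leave it empty to skip upload.
UPLOAD_DEST="${UPLOAD_DEST:-}"

compress_and_upload() {
    pcap="$1"
    # gzip replaces the file with "$pcap.gz" on success.
    gzip -9 "$pcap" || return 1
    if [ -n "$UPLOAD_DEST" ]; then
        # Keep the local copy so a failed transfer can be retried later.
        scp -q "$pcap.gz" "$UPLOAD_DEST" || return 1
    fi
}

# Run only when called with a file name, as dnscap is assumed to do.
if [ "$#" -gt 0 ]; then
    compress_and_upload "$1"
fi
```

If the script were saved as, say, /usr/local/bin/ditl-compress-upload.sh (a hypothetical path), it would replace 'gzip -9' as the argument to "-k". Deleting the local .gz file after a verified transfer is left out on purpose, since sites short on disk will want their own retention policy.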
If you'd like to use tcpdump, you may want to use our ditl-dnsroot-run
script, along with Duane's tcpdump-split program, which will take care
of file rotation, error handling, recording some useful metadata, and
other issues. However, we realize that many sites already
have regular data collection in place or local policy that mandates
certain filters. Please keep the following factors in mind when designing
your collection process:
- Be sure to use the -w option of tcpdump to write the
raw packets rather than parsing and printing them out as text.
- Use the -s0 option of tcpdump to capture full packets.
- We recommend a file naming convention that helps guarantee uniqueness and
encapsulates some of the key metadata used for combining datasets, such as
${root}.${instance}.${date}.${time}.pcap.
- We recommend using one of the following tcpdump filters,
listed from most to least preferred:
  - collect TCP and UDP, requests and responses (preferred, but requires the most disk space):
    "host (${hosts}) and port 53"
  - collect UDP requests, and TCP requests and responses:
    "(udp and dst host (${hosts}) and dst port 53) or (tcp and host (${hosts}) and port 53)"
  - collect TCP and UDP requests:
    "dst host (${hosts}) and dst port 53"
  - collect UDP requests:
    "udp and dst host (${hosts}) and dst port 53"
where ${hosts} is a list of DNS server addresses, separated by
"or", e.g. "192.5.5.241 or 2001:500::1035".
That is, if you must drop some types of packets because of limited
resources, the best thing to drop is responses.
This is because at least some of the information contained in responses
can be reconstructed, given a zone file.
Dropping requests of any kind is a last resort, because they are useful
for more kinds of research and cannot be reconstructed.
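As an illustration of how the ${hosts} list, the most preferred filter, and the recommended file name fit together, here is a small sketch. The root and instance names are placeholders, the date and time layout is just one possible choice, and the script prints the tcpdump command rather than running it.

```shell
#!/bin/sh
# Join server addresses with " or " to form the ${hosts} list.
join_hosts() {
    joined=""
    for addr in "$@"; do
        if [ -z "$joined" ]; then
            joined="$addr"
        else
            joined="$joined or $addr"
        fi
    done
    printf '%s\n' "$joined"
}

root="f-root"          # placeholder root name
instance="example1"    # placeholder instance name

hosts=$(join_hosts 192.5.5.241 2001:500::1035)

# Most preferred filter: TCP and UDP, requests and responses.
filter="host (${hosts}) and port 53"

# File name following ${root}.${instance}.${date}.${time}.pcap, in UTC.
outfile="${root}.${instance}.$(date -u +%Y%m%d.%H%M%S).pcap"

# Print the command for review instead of starting a capture.
echo "tcpdump -n -i eth0 -s0 -w $outfile '$filter'"
```

With the two example addresses, the filter expands to "host (192.5.5.241 or 2001:500::1035) and port 53". Swapping in one of the less preferred filters only requires changing the filter assignment.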
To increase the usability of the data, and particularly for indexing in
DatCat, please collect extra information about the data.
We understand that participating sites will make use of varying
tcpdump options, so please always record the specific tcpdump
command line used to collect the data. For specific recommendations
on what type of metadata to include, refer to CAIDA's web page on
How to Document a Data Collection.
- Time-of-day usage differences:
We hope to see differences in the instances' diurnal patterns,
corresponding to increased user activity within local
daylight/evening hours.
- Distribution of queries across anycast instances:
Plot geographic distribution of clients. Is anycast attracting
geographically local client workload as expected?
- Distribution of queries by gTLD and ccTLD:
Traffic to all root instances will be dominated by the gTLDs (.com
especially), but there will be variations in the sets of ccTLDs
requested from different parts of the world. One could graph
requests/responses to country codes, by node as well as instance.
- Distribution of response sizes and types:
Distribution of response sizes, by node as well as anycast instances.
How is it shifting due to DNSSEC, ENUM, etc.?
- With TCP request and response data:
What fraction are genuine DNS requests, versus bogus?
- Growth in and impact of DNSSEC:
DNSSEC is not yet deployed at the roots, but this is relevant to other TLDs, e.g., .se,
if data becomes available.
- 2007

| Participant | #Instances | Packets (Millions) | Gbytes |
|---|---|---|---|
| c-root (Cogent) | 4 | 3374 | 166 |
| f-root (ISC) | 36 | 4739 | 158 |
| k-root (RIPE) | 15 | 3320 | 177 |
| m-root (WIDE) | 7 | 4845 | 236 |
| B.ORSN | 1 | 0.682 | 0.051 |
| M.ORSN | 1 | 1.072 | 0.053 |
| NAMEX | 1 | 14.54 | 4.10 |
| Total | | | 741 |