CAIDA seeks to characterize the dynamics and performance of the Domain Name System (DNS), a critical infrastructural component of the Internet. We also collect data in support of long-term research on DNS behavior and stability.
The main function of the Domain Name System (DNS) is to provide translation between Internet hostnames and IP addresses. The DNS is therefore a critical infrastructure service whose efficiency and robustness are crucial for the operation of the Internet. Despite the essential nature of the DNS, long-term research and analysis in support of its performance, stability, and security are sparse. Our goal is to enable DNS research pertinent to real Internet problems by supplying the research community with the best available measurement data that are operationally relevant and methodologically sound. In addition, the tools, models, and analysis methodologies developed in the course of this project will contribute to ensuring the vitality and integrity of the DNS as it faces relentless growth of the Internet user population worldwide. CAIDA DNS measurements and analysis were previously sponsored by the NSF grant SCI-0427144 "Improving the Integrity of Domain Name System (DNS) Monitoring and Protection" and are currently partly sponsored by DHS S&T contract NBCHCC040159.
Research topics are:
- Analysis of DNS Root Server Traffic
- Collaboration with NIC Chile and the .CL ccTLD Domain
- Anycast Modeling
- Analysis of RFC1918 Traffic
- DNS Measurement Software
DNS root servers are at the top of the DNS hierarchy. To characterize their workload and performance and to support analysis of DNS root server traffic, we coordinate large-scale data collection events during which participating operators capture concurrent traces from a large number of root server instances. We conduct these 'Day in the Life of the Internet' (DITL) experiments in collaboration with DNS-OARC and ISC, as well as other participating data providers.
As of February 2010, DNS-OARC hosts four global DNS data sets obtained in January 2006, January 2007, March 2008, and March 2009. These data sets give researchers a view of the characteristics and workload of traffic to the Domain Name System (DNS) root nameservers. The data provide a baseline for comparison against the traffic we expect to see in the near future, which will reflect cryptographic signatures, internationalization of the name space, and new global Top Level Domains (TLDs). We most recently posted our analysis of the evolution of traffic to the roots over the last four years as observed in these four data sets. We have aimed to publish these "Day in the Life" DNS root server studies since 2006, including a comparison of traffic from the DNS root nameservers across years. In this most recent study we focus on a few attributes relevant to the impending changes to the DNS root zone.
Working with visiting researcher Mia Zhang, we developed an interactive web interface to the DNS-OARC data that lets users view graphs showing coverage, geography of clients, and distributions of clients and queries across the root nameservers for which data is available. The interactive graphs also display other relevant DNS data, including an example heatmap of the distribution of DNS resolvers in the IPv4 address space that query the participating root servers.
Using the 2006 and 2007 data sets, we developed Influence Maps of DNS anycast servers that visualize the geographic distribution of DNS clients for each anycast instance. As part of our recent paper, "Understanding and preparing for DNS evolution" to be presented at the 2nd International Traffic Monitoring and Analysis (TMA'10) Workshop colocated with the Passive and Active Measurement (PAM) Conference in April 2010, we show the geographic distribution of clients querying the root server instances for the 2009 data.
We have summarized our early experiences with data collection in a set of recommendations for future large-scale simultaneous DNS data collections. The recommendations are intended to optimize collection strategies and to increase the value of future global, multi-site, coordinated measurements.
We studied the Chilean DNS data characterizing the .CL ccTLD domain in collaboration with NIC Chile. Our efforts included: 1) analysis and indexing of daily packet traces captured on three anycast name servers and one unicast name server located in Chile (NIC Chile collected the traces daily at 12:10 pm local time from January 2005 until March 2007; each 10-minute trace contains IPv4 traffic only and includes queries and responses with full payload); 2) anycast switching experiments conducted on the Chilean .CL ccTLD anycast infrastructure; and 3) DNS workload capture and visualization.
In collaboration with CAIDA, in May 2006 Prof. George Riley and his student Sunitha Beeram from Georgia Tech simulated DNS anycast scenarios. They simulated three different scenarios (no failures, a single link failure, and prefix withdrawal) on a 44-node topology with 34 clients and 10 anycast server instances. They found that in the case of a single link failure, the distribution of requests among server instances changed insignificantly: the clients could still reach the same servers through other links. In the explicit prefix withdrawal scenario, the network quickly converged to a new state because the simulated graph was small and strongly connected. The requests were redistributed to other instances with only one "flip" (change in anycast instance) for affected clients. A 2008 paper, "Realistic Topology Modeling for the Internet BGP Infrastructure", details the expansion to a more realistic topology (using CAIDA AS-level graphs). For tractability, the study took a narrow focus on network failure; many aspects remain unmodeled, including more realistic detail on global and local server node behavior and failure modes, dynamics, routing instability, and prefix hijacking, dampening, and misconfiguration.
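The three scenarios can be illustrated with a toy model. The sketch below is not the simulator used in the study: it assumes each client is served by the anycast instance with the fewest hops (a crude stand-in for BGP path selection), and the eight-node topology, node names, and helper functions are all hypothetical.

```python
from collections import deque

# Hypothetical toy topology (undirected): clients c*, routers r*, anycast instances s*.
GRAPH = {
    "c1": {"r1"}, "c2": {"r1"}, "c3": {"r3"},
    "r1": {"c1", "c2", "r2", "s1"},
    "r2": {"r1", "r3", "s1"},
    "r3": {"c3", "r2", "s2"},
    "s1": {"r1", "r2"},
    "s2": {"r3"},
}
CLIENTS = ["c1", "c2", "c3"]

def nearest_instance(graph, client, instances):
    """Breadth-first search: return the closest announced instance, or None."""
    seen = {client}
    queue = deque([client])
    while queue:
        node = queue.popleft()
        if node in instances:
            return node
        for nbr in sorted(graph[node]):  # sorted for deterministic tie-breaking
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return None

def catchment(graph, clients, instances):
    """Map every client to the instance that serves it."""
    return {c: nearest_instance(graph, c, instances) for c in clients}

def remove_link(graph, a, b):
    """Return a copy of the graph with the link a-b removed."""
    new = {node: set(nbrs) for node, nbrs in graph.items()}
    new[a].discard(b)
    new[b].discard(a)
    return new

def count_flips(before, after):
    """Number of clients whose serving instance changed."""
    return sum(1 for c in before if before[c] != after[c])

if __name__ == "__main__":
    baseline = catchment(GRAPH, CLIENTS, {"s1", "s2"})
    link_down = catchment(remove_link(GRAPH, "r1", "s1"), CLIENTS, {"s1", "s2"})
    withdrawn = catchment(GRAPH, CLIENTS, {"s2"})  # s1 stops announcing the prefix
    print("flips after link failure:", count_flips(baseline, link_down))
    print("flips after prefix withdrawal:", count_flips(baseline, withdrawn))
```

In this toy graph the link failure leaves every client on its original instance because an alternate path exists, while withdrawing an instance moves only the clients it had been serving.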
To service intra-enterprise networks that do not directly connect to the Internet, RFC1918 establishes guidelines for address allocation for private internets. Unfortunately, some operating systems do not behave as expected, and traffic that should stay within local area networks leaks onto the Internet at large. In an analysis of RFC1918 data, CAIDA researchers examined the properties and sources of spurious RFC1918 updates that are directed toward the root name servers and captured by a specially created protective system of name servers known as AS112.
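As an illustration of how leaked traffic of this kind might be recognized, the sketch below tests whether an address, or a reverse-lookup name under in-addr.arpa, falls inside the three private blocks defined by RFC1918 (10/8, 172.16/12, 192.168/16). The function names are hypothetical, and this is not the method used in the CAIDA analysis.

```python
import ipaddress

# The three private address blocks defined in RFC1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_rfc1918(addr):
    """True if the IPv4 address string lies in an RFC1918 block."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918_NETWORKS)

def reverse_name_is_rfc1918(qname):
    """True if an in-addr.arpa name refers to RFC1918 address space.

    Only the two most significant octets are inspected, so partial names
    such as '168.192.in-addr.arpa' are handled as well.
    """
    name = qname.rstrip(".").lower()
    suffix = ".in-addr.arpa"
    if not name.endswith(suffix):
        return False
    # Labels appear least-significant first; reverse them into address order.
    octets = name[: -len(suffix)].split(".")[::-1]
    first = octets[0]
    second = octets[1] if len(octets) > 1 else None
    if first == "10":
        return True
    if first == "172":
        return second is not None and second.isdigit() and 16 <= int(second) <= 31
    if first == "192":
        return second == "168"
    return False

if __name__ == "__main__":
    for name in ("4.3.2.10.in-addr.arpa.", "8.8.8.8.in-addr.arpa."):
        print(name, reverse_name_is_rfc1918(name))
```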
DSC - DNS Statistics Collector
DSC is CAIDA's flagship software for DNS measurements. It provides an open-source system for collecting and exploring statistics from busy DNS servers. Duane Wessels and The Measurement Factory developed the DSC software. Currently three root servers and a few smaller operators use the DSC software to monitor the state of their systems.
We strongly encourage operators to deploy DSC. You can run the DSC application directly on a DNS node or on a standalone system configured to "capture" (e.g., using libpcap) bi-directional traffic for a DNS node. Below, we present examples that highlight DSC's capabilities.
- 7-day delayed feed of F-root DNS statistics
- Duane Wessels analyzed data collected using DSC at the F root name server in Palo Alto for cases of DNS abuse:
NeTraMet traffic monitor
NeTraMet is a user-configurable traffic monitor implementing the RTFM architecture for Traffic Flow Measurement (RFC2722). A user supplies a 'ruleset' that specifies which packet attributes NeTraMet should look for in the bi-directional traffic. Only matching packets are then counted. Nevil Brownlee (U. of Auckland, New Zealand) developed this software prior to this project; it is now in maintenance mode.
One example of NeTraMet usage by CAIDA is the ongoing (since January 2002) monitoring of root and gTLD DNS server performance. The meters are installed at the following strategic locations: University of California San Diego, University of Auckland (New Zealand), University of Colorado in Boulder, and Keio University (Tokyo and Fujisawa, Japan). The monitor rulesets capture DNS request packets sent to root and gTLD servers and their corresponding response packets. The round trip time for DNS requests/responses, the percentage of unanswered requests, and the number of identified DNS request/response pairs provide directly observable measures of macroscopic Internet performance, since DNS response times are directly influenced by macroscopic Internet events such as congestion and routing changes. We have accumulated a long-term archive of these data and are working on indexing them in DatCat.
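To make the three measures concrete, the sketch below pairs requests with responses and derives the round trip times, the percentage of unanswered requests, and the number of matched pairs. It is a minimal illustration, not NeTraMet's implementation: the `DnsPacket` record, the matching key (client address, client port, server address, DNS transaction ID), and the 5-second timeout are all assumptions.

```python
from dataclasses import dataclass
from statistics import median

@dataclass(frozen=True)
class DnsPacket:
    """Hypothetical, already-decoded view of one observed DNS packet."""
    ts: float          # capture timestamp in seconds
    client: str        # resolver address
    client_port: int
    server: str        # root or gTLD server address
    txid: int          # DNS transaction ID
    is_response: bool

def summarize(packets, timeout=5.0):
    """Match responses to requests and compute the three summary measures."""
    pending = {}   # matching key -> timestamp of the outstanding request
    rtts = []
    requests = 0
    for p in sorted(packets, key=lambda pkt: pkt.ts):
        key = (p.client, p.client_port, p.server, p.txid)
        if not p.is_response:
            requests += 1
            pending[key] = p.ts  # a repeated key replaces the older request
        else:
            sent = pending.pop(key, None)
            # Responses with no matching request, or arriving late, are ignored.
            if sent is not None and p.ts - sent <= timeout:
                rtts.append(p.ts - sent)
    pairs = len(rtts)
    unanswered_pct = 100.0 * (requests - pairs) / requests if requests else 0.0
    return {
        "pairs": pairs,
        "unanswered_pct": unanswered_pct,
        "median_rtt": median(rtts) if rtts else None,
    }

if __name__ == "__main__":
    trace = [
        DnsPacket(1.00, "192.0.2.1", 40000, "198.51.100.53", 1, False),
        DnsPacket(1.02, "192.0.2.1", 40000, "198.51.100.53", 1, True),
        DnsPacket(2.00, "192.0.2.1", 40001, "198.51.100.53", 2, False),
    ]
    print(summarize(trace))
```

Counting a late response as unanswered is a design choice made for this sketch; a real meter may treat timeouts differently.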
CAIDA would like to deploy meters in more sites. If you are interested in hosting a NeTraMet meter, please see Setting up a NeTraMet meter: background and requirements for more details.
dnsstat statistics tool
The crl_dnsstat application watches for DNS queries on UDP port 53. To collect accurate statistics on a specific nameserver (or client), it must run on an interface that sees all DNS messages to that server (or from that client). It counts the number of messages and the number of queries, aggregated by any of source IP, destination IP, opcode, query type, and query class. The subjects of queries are never recorded.
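The aggregation described above can be sketched as follows. This is not crl_dnsstat's code or interface: the record layout and field names are hypothetical, and "queries" is taken here to mean messages that are not responses, which is an assumption. In keeping with the description, the records carry no query names.

```python
from collections import Counter

# Attributes a record may be grouped by; the field names are hypothetical.
GROUP_FIELDS = ("src", "dst", "opcode", "qtype", "qclass")

def aggregate(messages, by):
    """Count messages and queries, grouped by the chosen attributes.

    `messages` is an iterable of dicts holding the GROUP_FIELDS plus an
    `is_response` flag; no query name is stored anywhere.
    """
    unknown = [field for field in by if field not in GROUP_FIELDS]
    if unknown:
        raise ValueError(f"cannot aggregate by {unknown}")
    message_counts = Counter()
    query_counts = Counter()
    for msg in messages:
        key = tuple(msg[field] for field in by)
        message_counts[key] += 1
        if not msg["is_response"]:  # assumption: a query is a non-response
            query_counts[key] += 1
    return message_counts, query_counts

if __name__ == "__main__":
    sample = [
        {"src": "192.0.2.1", "dst": "198.51.100.53", "opcode": "QUERY",
         "qtype": "A", "qclass": "IN", "is_response": False},
        {"src": "198.51.100.53", "dst": "192.0.2.1", "opcode": "QUERY",
         "qtype": "A", "qclass": "IN", "is_response": True},
    ]
    messages, queries = aggregate(sample, by=("qtype", "qclass"))
    print(dict(messages), dict(queries))
```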
dnstop traffic display tool
dnstop is a libpcap application (à la tcpdump) that displays various tables of DNS traffic on your network, including tables of source and destination IP addresses, query types, top-level domains, and second-level domains.
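The top-level and second-level domain tables can be approximated in a few lines. The sketch below is not dnstop's code; it assumes the query names have already been extracted from captured packets, and the helper names are hypothetical.

```python
from collections import Counter

def domain_suffix(qname, levels):
    """Return the last `levels` labels of a query name, lowercased.

    levels=1 gives the top-level domain, levels=2 the second-level domain.
    The root name "." is returned unchanged.
    """
    labels = [label for label in qname.rstrip(".").lower().split(".") if label]
    if not labels:
        return "."
    return ".".join(labels[-levels:])

def top_table(qnames, levels, n=10):
    """Return the n most frequent suffixes as (suffix, count) pairs."""
    counts = Counter(domain_suffix(q, levels) for q in qnames)
    return counts.most_common(n)

if __name__ == "__main__":
    observed = ["www.example.com.", "mail.example.com", "example.org."]
    for suffix, count in top_table(observed, levels=2):
        print(f"{count:6d}  {suffix}")
```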
We strongly encourage those with access to infrastructure to capture and document datasets to help preserve and promote scientifically rigorous, reproducible research. We encourage anyone who collects data to list the data in DatCat, the Internet Measurement Data Catalog. For specific recommendations on what type of metadata to include, refer to CAIDA's web page on How to Document a Data Collection.
Our data collection efforts support the scientific Internet research community in the process of validating their models, simulations, or theories. The following DNS related CAIDA datasets are available for researchers.
- OARC DNS root traces January 10-11, 2006
- A Day In The Life of the Internet: A Summary of the January 9-10, 2007 Collection Event
- A Day In The Life of the Internet: A Summary of the March 18-19, 2008 Collection Event
- A Day In The Life of the Internet: A Summary of the March 30 - April 1, 2009 Collection Event
- IPv4 Routed /24 DNS Names Dataset
- The most recent report showing the number of open DNS resolvers for each Autonomous System number as well as daily archives.
- DNS root/gTLD RTT Dataset