Analysis of the DNS root and gTLD nameserver system

This page describes CAIDA's work and related activities in the Macroscopic DNS Measurements project, which started in 2001. We conduct both passive and active measurements to study the behavior, connectivity, and performance of the DNS root servers. We also analyze data collected at the servers themselves.

CAIDA research of the DNS root servers currently focuses on the following problems:
Continuous monitoring of the DNS root servers performance.

Investigation and modeling of BIND algorithm behavior.

Analysis of bogus queries and broken resolver configurations.

Evaluation and optimization of servers' placement.

I. Continuous monitoring of the DNS root servers performance.

Goals

Test various approaches to regular monitoring of the DNS root servers behavior.

Develop techniques to track performance of the individual root servers and of the system as a whole.

Investigate long-term temporal trends in the DNS service.

Results

We have installed two passive NeTraMet traffic meters that capture DNS request packets and their corresponding response packets. One meter monitors the UCSD campus network, and the other is installed on an OC-48 link of a large Internet provider in the San Jose area.

We continuously monitor the round trip time of DNS request/response pairs, the percentage of requests that did not get a response, and the number of DNS request/response pairs observed. Strip chart plots present the data collected since early January 2002 for root and gTLD servers and are updated daily at midnight.
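The three quantities above can be computed from matched request and response timestamps. The following is a minimal sketch, not the NeTraMet implementation: the matching key (client address plus DNS message ID) and the function name summarize_interval are assumptions made for illustration.

```python
from statistics import median

def summarize_interval(requests, responses):
    """Summarize one measurement interval.

    requests:  dict mapping a query key to the time the request was seen (seconds)
    responses: dict mapping the same key to the time the response was seen
    The key format (client address, DNS message ID) is an assumption.
    """
    # A request counts as answered only if a response with the same key exists.
    rtts = [responses[key] - sent for key, sent in requests.items() if key in responses]
    total = len(requests)
    lost = total - len(rtts)
    return {
        "pairs": len(rtts),
        "loss_percent": 100.0 * lost / total if total else 0.0,
        "median_rtt_ms": 1000.0 * median(rtts) if rtts else None,
    }

# Made-up sample data: four requests, three of which were answered.
sample_requests = {
    ("10.0.0.1", 1): 0.000,
    ("10.0.0.1", 2): 1.000,
    ("10.0.0.2", 7): 2.000,
    ("10.0.0.2", 8): 3.000,
}
sample_responses = {
    ("10.0.0.1", 1): 0.050,
    ("10.0.0.1", 2): 1.070,
    ("10.0.0.2", 7): 2.090,
}
summary = summarize_interval(sample_requests, sample_responses)
```

The median is used here because a few very slow responses would otherwise dominate the summary; the plots themselves may use a different statistic.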

We have created a long-term archive for DNS response data and plots.

Current work

We are working to establish two or three additional strategically located NeTraMet meters for passive flow data collection. The preferred order of placement (based on previous analysis results) would be:

Europe
Asia/Pacific
East Coast of US

Future work

Future plans include monitoring more than just the top level of DNS. For example, we are extending our passive measurement system to observe the performance of Country-Code servers (ccTLDs). These data should provide an interesting view of global Internet connectivity, at least showing connectivity to each country's ccTLD server.

Papers and Presentations

DNS Root/gTLD Performance Measurements by Nevil Brownlee, kc claffy, Evi Nemeth (LISA 2001)
Passive Global DNS Measurements and Multipathing by Nevil Brownlee, Ilze Ziedins (IEPG, Aug 2001)
Response Time Distributions for Global Name Servers paper by Nevil Brownlee, Ilze Ziedins (PAM 2002)
Root/gTLD DNS Performance Measurements presentation (IEPG, Dec 2001) by Nevil Brownlee
Root/gTLD DNS Performance Web Page presentation (IEPG, Mar 2002) by Nevil Brownlee

II. Investigation and modeling of BIND algorithm behavior.

Goals

Study BIND's name server affinity algorithm (use of RTTs for server selection).

Why does A-root experience a higher load of queries than any other root server?

Discover more about the DNS caching structure, behavior, and scaling.

Develop software simulating a large system of DNS clients and servers.

Current work

We have used the CAIDA dnsstat utility to collect a large volume of query statistics on a number of root servers simultaneously. Starting on August 14, 2002, 16:10 UTC we obtained 26 hours of data on the E, I, K, and M root servers. Two weeks later we obtained 7 consecutive days of data on the E, F, I, K, and M root servers starting from Wednesday 2002-08-28 16:10 UTC and 4 consecutive days of data on the A root server starting from Thursday 2002-08-29 16:10 UTC.

Analysis of these data focuses on the rate of growth of the number of unique clients seen by each individual server and by all servers combined. The most surprising results are:

The growth of the number of unique clients seen by individual servers does not slow down after 7 days of observations.
On all root servers, the rate of growth peaks on hourly boundaries.
The total number of queries per 10 minute interval on the A root is about 25% higher than on the next busiest root server (M root).
- The number of A queries per 10 minute interval is about the same on the A root and the M root, and is about 30% lower on the E, F, I, and K root servers.
- The number of PTR queries per 10 minute interval is the highest on the A root server, is about 25% less on the E, F, and I root servers, and is about 60% less on the K and M root servers.
- A queries and PTR requests together account for the vast majority (at least 90%) of the total number of messages received by each root server.
Half of the clients sent 8 or fewer messages (to all root servers that we observed) in a week. At the same time, the busiest clients sent more than a hundred messages per second to the A-root, more than 60 messages per second to the I-root, and more than 30 messages per second to the other root servers we monitored.
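The growth of the unique client population can be tracked by counting, for each 10 minute interval, how many source addresses appear for the first time. The sketch below assumes a hypothetical record format of (timestamp in seconds, client address); it does not reflect the actual dnsstat output format.

```python
def new_clients_per_interval(records, interval=600):
    """Count first-time clients per interval.

    records: iterable of (timestamp_seconds, client_address) tuples
             (a hypothetical format chosen for this example)
    Returns a list of (interval_start, new_clients, cumulative_unique_clients).
    """
    seen = set()
    counts = {}
    for ts, client in sorted(records):
        bucket = int(ts // interval) * interval
        counts.setdefault(bucket, 0)
        if client not in seen:
            seen.add(client)
            counts[bucket] += 1
    growth = []
    total = 0
    for bucket in sorted(counts):
        total += counts[bucket]
        growth.append((bucket, counts[bucket], total))
    return growth

# Made-up sample: client "a" returns repeatedly, "b" and "c" appear once each.
sample = [(0, "a"), (10, "b"), (599, "a"), (600, "a"), (700, "c"), (1300, "a")]
growth = new_clients_per_interval(sample)
```

A curve of the cumulative count that keeps rising after a week, as reported above, would show up here as a nonzero new_clients value in late intervals.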

There appear to be diurnal variations in the data, and the rate of requests on all root servers drops noticeably during the weekend. Note that the instrumented root servers are located in very different time zones: A is on the East Coast of the US, E is on the West Coast of the US, I and K are in Europe, and M is in Japan. However, the numbers of unique clients observed in 10 minute intervals correlate remarkably well for all these servers. This observation implies that a significant fraction of the new queries coming to the root servers does not originate from human users' requests but is software driven.

We are also starting a more detailed analysis of packets captured with the tcpdump utility on root servers.

Future work

We will collect BIND logfiles (and any other logs we can get from busy name servers) and characterize them in various ways. In particular, we will look at the following parameters:

interarrival rates
popularity (some names are more popular than others)
correlations between popularity and TTLs
message sizes
response codes
duplicate queries
invalid queries
percentage of clients that do not cache DNS replies
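Several of the parameters listed above can be derived in a single pass over a parsed query log. The sketch below assumes a simplified, hypothetical record format of (timestamp, client, query name, query type); real BIND logfiles would need a parser first, which is not shown.

```python
from collections import Counter

def characterize(log):
    """Compute interarrival times, name popularity, and duplicate counts.

    log: list of (timestamp, client, qname, qtype) tuples sorted by timestamp.
         This record layout is an assumption made for the example.
    """
    times = [entry[0] for entry in log]
    # Interarrival time: gap between consecutive queries at the server.
    interarrivals = [later - earlier for earlier, later in zip(times, times[1:])]
    # Popularity: number of queries per name, compared case-insensitively.
    popularity = Counter(qname.lower() for _, _, qname, _ in log)
    # Duplicates: every repeat of the same (client, name, type) after the first.
    per_client = Counter((client, qname.lower(), qtype) for _, client, qname, qtype in log)
    duplicates = sum(count - 1 for count in per_client.values())
    return interarrivals, popularity, duplicates

# Made-up sample log.
sample_log = [
    (0.0, "c1", "example.com", "A"),
    (0.5, "c2", "Example.com", "A"),
    (2.0, "c1", "example.com", "A"),
    (2.5, "c1", "example.org", "MX"),
]
interarrivals, popularity, duplicates = characterize(sample_log)
```

This version counts a duplicate regardless of how much time passed between the two queries. Relating duplicates to TTLs, as the list suggests, would require the TTL of each answer as well.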

Next, we are going to develop software for simulating a large system of DNS clients and servers. Our simulated clients will issue DNS queries with properties derived from the logfile characterization. Using a number of networked computers, we will model the entire system. For servers we will use actual BIND installations, both as intermediate caching nameservers and as root name servers. We plan to also simulate wide-area network delays and packet loss with operating system features such as FreeBSD's Dummynet.

We will run a number of experiments in this fully controlled environment seeking to understand how a single parameter or configuration affects the overall performance of the system.

III. Analysis of bogus queries and broken resolver configurations.

Goals

Search for signatures of the software a DNS client is using based on characteristics of its query message.

Identify clients abusing root servers and their operating systems.

are they bugs in "valid" name servers or viruses?

Make recommendations for best-practice DNS server configuration.

would proper caching reduce the load significantly?

Results

Large numbers of bogus queries and broken resolvers consume valuable root server resources. We have analyzed errors recorded in DNS log files at the F-root. Bogus queries typically fall into one of the following categories:

stupid (i.e. address lookup for addresses, "A 206.168.0.4")
invalid TLDs ("A foo.ntdomain")
repeat queries for the same data
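The three categories above can be told apart with simple checks on the query type and name. This is a minimal sketch under stated assumptions: the set KNOWN_TLDS is a small illustrative subset rather than the real list of delegated TLDs, the category labels are our own, and a "repeat" is defined here as any second query from the same client for the same type and name, with no time window.

```python
import ipaddress

# Illustrative subset only; a real check would load the full list of delegated TLDs.
KNOWN_TLDS = {"com", "net", "org", "edu", "arpa"}

def classify_query(client, qtype, qname, seen):
    """Return a category label for one query.

    seen: a set that the caller keeps across calls to detect repeats.
    """
    name = qname.rstrip(".").lower()
    key = (client, qtype, name)
    if key in seen:
        return "repeat"
    seen.add(key)
    if qtype == "A":
        try:
            # An A query whose name is already a dotted-quad address.
            ipaddress.IPv4Address(name)
            return "address-for-address"
        except ValueError:
            pass
    tld = name.rsplit(".", 1)[-1]
    if tld not in KNOWN_TLDS:
        return "invalid-tld"
    return "ok"

seen = set()
labels = [
    classify_query("c1", "A", "206.168.0.4", seen),
    classify_query("c1", "A", "foo.ntdomain", seen),
    classify_query("c1", "A", "www.example.com", seen),
    classify_query("c1", "A", "www.example.com", seen),
]
```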

We also attempted to identify problem-prone end user applications. Our analysis helped to find and fix a bug in the Microsoft Win2k resolver.

Current work

Currently, we are analyzing log files obtained from the host hazel, an authoritative anycast server located near F-root. These files contain bogus PTR record updates (attempts to modify a PTR record, that is, an association between an IP address and a domain name) made for addresses from private address space. As specified in RFC1918, these private addresses can be used inside a network but should never be communicated globally.
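Whether a PTR name refers to private address space can be checked by reversing the in-addr.arpa labels and testing the result against the three RFC1918 blocks. The sketch below uses Python's standard ipaddress module; the function names are our own and the log parsing step is omitted.

```python
import ipaddress

# The three private IPv4 blocks defined in RFC1918.
RFC1918_NETS = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

def ptr_name_to_address(name):
    """Convert 'd.c.b.a.in-addr.arpa' to the IPv4 address a.b.c.d, or None."""
    labels = name.rstrip(".").lower().split(".")
    if len(labels) != 6 or labels[-2:] != ["in-addr", "arpa"]:
        return None
    try:
        return ipaddress.IPv4Address(".".join(reversed(labels[:4])))
    except ValueError:
        return None

def is_private_ptr(name):
    """True if the PTR name maps to an RFC1918 address."""
    addr = ptr_name_to_address(name)
    return addr is not None and any(addr in net for net in RFC1918_NETS)

private_example = is_private_ptr("4.3.168.192.in-addr.arpa.")
public_example = is_private_ptr("4.0.168.206.in-addr.arpa")
```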

Our main results are:

The rates of bogus PTR record updates are periodic, with a few distinctive periods.
Grouping source addresses of queries by continents of origin reveals:
1. clear diurnal patterns - rates are generally higher during the day, lower during the night
2. sharp prominent peaks at midnight in each time zone. American query sources produce four peaks, European sources cause two peaks (the smaller one corresponds to United Kingdom addresses), and Asian sources display three peaks.
3. We hypothesize that expiration and renewal of DHCP leases at midnight may be the cause of observed peaks.
IP addresses sending bogus PTR record updates belong to 3309 origin ASes. Only 20 ASes are responsible for more than 50% of these updates. The top three offenders are Chinalink (China), Ibernet (Spain) and SW Bell (USA).
Our very limited attempt at dynamic probing of offending addresses (a few samples, containing between 100 and 500 addresses each) did not reveal any operating system predominantly responsible for sending bogus updates. Note that we did not find Apple systems among offenders. Possible causes of this absence are:
- properly configured software
- small number of Apple systems on the Internet
- undercount of Apple systems by the Xprobe utility we used for probing
Updates originating from the same source address often are periodic with periods of 30, 60, and 75 minutes.
- The pattern of a 75 minute update cycle typically consists of three updates made at intervals of 5, 10, and 60 minutes.
Neither "mice" (many hosts that made only 1-2 bogus updates per week) nor "elephants" (few hosts that made hundreds of thousands of bogus updates per week) dominate the total number of bogus updates. The major contribution comes from intermediate ("workhorse") contributors that make between 200 and 500 hundreds updates per week.
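The per-source periodicity reported above can be explored by looking at the gaps between consecutive updates from one source address. The following is a rough sketch, assuming timestamps in seconds and rounding gaps to whole minutes; it is not the method used in the analysis, which the page does not describe.

```python
from collections import Counter

def gap_histogram(timestamps):
    """Count gaps (in whole minutes) between consecutive updates from one source.

    timestamps: update times in seconds for a single source address.
    """
    ordered = sorted(timestamps)
    gaps = [round((later - earlier) / 60) for earlier, later in zip(ordered, ordered[1:])]
    return Counter(gaps)

# Made-up timestamps following a 5, 10, 60 minute pattern (a 75 minute cycle).
minutes = [0, 5, 15, 75, 80, 90, 150]
histogram = gap_histogram([m * 60 for m in minutes])
cycle_length = sum(histogram.keys())
```

In this synthetic example the three distinct gaps add up to the 75 minute cycle. Real data would be noisier and would likely need a tolerance when grouping gaps.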

Future work

We are investigating the parameters of DNS query packets in order to find if they include any diagnostic features ("fingerprinting" analysis). Our goal is to determine more accurately which operating systems are responsible for originating different types of bogus queries.

Papers and Presentations

DNS Measurements at a Root Server by Nevil Brownlee, kc claffy, and Evi Nemeth (GlobeCom 2001)
DNS Damage - Measurements at a Root Server by Evi Nemeth, Nevil Brownlee, and kc claffy (NANOG 2002)

IV. Evaluation and optimization of servers' placement.

Goals

Characterize DNS root name server connectivity on the macroscopic level.

Identify subsets of DNS clients having consistently large latency connections to the root servers.

Determine which root servers are most crucial in providing expeditious service to their clients and which are redundant.

Results

We have instrumented eleven DNS root servers (A, B, D, E, F, G, H, I, K, L, and M) with skitter monitors in order to track their global IP level connectivity.

The J root server is co-located with the A root server and does not require a separate monitor. As of September 2002, the administration of the C root server has not responded to RSSAC's request to host a skitter monitor at their site.

Our tool continuously sends probe packets to destinations in a pre-specified target list and captures forward IP paths and round trip times. We have built a representative target list of DNS root server clients for skitter probing. First, we used the dnsstat utility to collect statistics of DNS queries by passive monitoring of seven DNS root servers. On each root server, the number of messages and the number of queries (but not the subjects of queries) were counted for 24 hours and recorded together with the source IP addresses originating these messages.

The list was used for monitoring six root servers (A, E, F, K, L, and M) in 2000-2001. We identified a subset of destinations that had large latency connections to all instrumented root servers and studied their geographical make-up. We found that destinations in Africa, Asia and South America accounted for over 60% of the observed large latency destinations, but less than 14% of the total target list.

Current work

In March 2002, we updated our target list and increased it to about 140 thousand destinations. We selected target destinations for the new list based on the following goals:

When possible select an IP address from the old DNS Clients list used in 2000-2001.
When possible select an IP address seen by dnsstat in the largest number of DNS root servers.
Provide as much coverage of the routable IPv4 address space as possible but restrict the size of the list to between 100 and 200 thousand destinations. The size restriction ensures that each destination in the list is probed 3-5 times per day making results less sensitive to diurnal variations.

We have been monitoring the new target list since the end of March, 2002.

We are working to evaluate the redundancy among existing DNS root servers. We use the median RTT as the metric of proximity between a skitter monitor (co-located with a DNS root server) and target destinations. When the data from one server are removed from consideration, the distribution of median RTT shifts. The magnitude of this shift shows quantitatively the importance of each root server for the overall set of clients.

We found that M-root is the most crucial root server. Its removal from service would cause a significant increase in RTT for the largest number of clients.
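One way to quantify the shift described above is to compare, for each destination, the lowest median RTT over all monitors with the lowest median RTT once one monitor is excluded. This is a sketch under an assumption of our own, namely that each destination is served by its closest remaining root server; the RTT values are made up and are not measurement results.

```python
def removal_impact(median_rtt, removed):
    """Estimate the effect of removing one root server.

    median_rtt: dict server -> dict destination -> median RTT in ms
    removed:    name of the server taken out of consideration
    Returns (number of destinations whose best RTT increases, mean increase in ms).
    """
    destinations = {dest for rtts in median_rtt.values() for dest in rtts}
    increases = []
    for dest in destinations:
        with_all = [rtts[dest] for rtts in median_rtt.values() if dest in rtts]
        without = [
            rtts[dest]
            for server, rtts in median_rtt.items()
            if server != removed and dest in rtts
        ]
        if not without:
            continue  # destination seen only from the removed server
        increases.append(min(without) - min(with_all))
    affected = sum(1 for inc in increases if inc > 0)
    mean_increase = sum(increases) / len(increases) if increases else 0.0
    return affected, mean_increase

# Made-up median RTTs (ms) from three monitors to three destinations.
sample_rtt = {
    "A": {"d1": 20, "d2": 200, "d3": 90},
    "K": {"d1": 100, "d2": 180, "d3": 30},
    "M": {"d1": 150, "d2": 40, "d3": 120},
}
impact_without_m = removal_impact(sample_rtt, "M")
```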

Next, we define the distance between a pair of root servers. From our target list, we consider a subset of destinations that respond to both skitter monitors co-located with these servers. For each destination in this subset we calculate the absolute difference between median RTTs to these skitter monitors, sum up the differences, and divide the sum by the number of destinations in the subset. We use the resulting metric to represent the distance between the root servers. The shorter the distance between a pair of root servers, the closer the resemblance between them in terms of the RTT distribution of the monitored list of destinations.
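The distance defined above is the mean absolute difference of median RTTs over the destinations that respond to both monitors. A minimal sketch, with made-up RTT values and a function name of our own choosing:

```python
def server_distance(rtt_a, rtt_b):
    """Mean absolute difference of median RTTs over common destinations.

    rtt_a, rtt_b: dict destination -> median RTT (ms) for each monitor.
    Returns None when the two monitors share no responding destination.
    """
    common = rtt_a.keys() & rtt_b.keys()
    if not common:
        return None
    return sum(abs(rtt_a[dest] - rtt_b[dest]) for dest in common) / len(common)

# Made-up data: only d1 and d2 respond to both monitors.
monitor_a = {"d1": 20, "d2": 200, "d4": 5}
monitor_b = {"d1": 100, "d2": 180, "d3": 30}
distance_ab = server_distance(monitor_a, monitor_b)
```

Destinations seen by only one monitor (d3 and d4 here) are ignored, which matches the restriction to the common subset in the definition.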

We further cluster the servers based on their virtual proximity (in terms of the metric above), and thus determine root server groups ("root families"). Within each cluster, any one server can functionally replace another one with the least RTT increase experienced by their clients.

We found four clusters of servers in virtual space. These groupings correlate remarkably well with their geographical location: Europe (K, I, and k-peer), California (B, E, F), US-East (A, D, G, H), and Japan (M). We also investigate the relationship between server clusters and geographical locations of the destinations that have minimum median RTT to one of the servers in a given cluster.
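The page does not say which clustering method was used. As one possible illustration, the sketch below applies single-linkage agglomerative clustering with a distance threshold to pairwise server distances. The algorithm choice, the threshold, and all distance values are assumptions made for this example and are not results of the study.

```python
def cluster_servers(distance, threshold):
    """Group servers by single-linkage clustering.

    distance:  dict frozenset({a, b}) -> distance, given for every pair of servers
    threshold: clusters are merged while the closest pair is at most this far apart
    Returns a sorted list of sorted server-name lists.
    """
    servers = sorted({server for pair in distance for server in pair})
    clusters = [{server} for server in servers]

    def linkage(c1, c2):
        # Single linkage: distance between the closest members of two clusters.
        return min(distance[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        best, i, j = min(
            (linkage(c1, c2), i, j)
            for i, c1 in enumerate(clusters)
            for j, c2 in enumerate(clusters)
            if i < j
        )
        if best > threshold:
            break
        clusters[i] |= clusters[j]
        del clusters[j]
    return sorted(sorted(cluster) for cluster in clusters)

# Made-up pairwise distances (ms) between four servers.
sample_distance = {
    frozenset(("E", "F")): 10,
    frozenset(("I", "K")): 12,
    frozenset(("E", "K")): 90,
    frozenset(("E", "I")): 95,
    frozenset(("F", "K")): 88,
    frozenset(("F", "I")): 92,
}
families = cluster_servers(sample_distance, threshold=30)
```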

Future work

In the near future we will deploy three more skitter monitors (at the B, G, and L root servers). We will also collect DNS statistics from these servers and update our target list to include their clients.

Papers and Presentations

Macroscopic Internet Topology and Performance Measurements from the DNS Root Name Servers by Marina Fomenkov, kc claffy, Bradley Huffaker, and David Moore (LISA 2001)
Distance Metrics in the Internet by Bradley Huffaker, Marina Fomenkov, Daniel J. Plummer, David Moore and kc claffy (ITS 2002)
Thoughts on Measurement and Management of the DNS system by kc claffy (presentation to the CSTB Committee on Internet Navigation and the Domain Name System, 2001)

V. Related work by other groups.

Report from IEPG/IETF meeting, Yokohama, July 2002

http://www.iepg.org

Nameserver coverage of DNS Domains
- Akiro Kato (JPNIC), "JPNIC study on DNS misconfiguration"
- Ed Lewis (ARIN), "DNS Lameness"
- George Michaelson (APNIC), "More Bad DNS"
These three presentations examine the nameservers listed for the corresponding registered domains, testing whether they respond correctly or are 'lame delegations.' Nearly one-third of all the nameservers tested were fully or partially lame. This is a particular problem for reverse name lookup, affecting applications which do reverse lookups. The Registries invite suggestions as to the best way to solve this problem.

DNS studies by WIDE MAWI (Measurement and Analysis on the WIDE Internet) group.

Kenjiro Cho (Sony) uses active measurements to collect performance data for root/gTLD servers (rootprobe tool) and ccTLD servers (ccTLDprobe). These tools are run on hosts, rather than on dedicated servers.

Kenjiro is also investigating the server selection algorithms used by various resolver implementations, and the ways in which these algorithms may influence future root server placement.

DNS data points by Cymru.COM

Rob Thomas collects hourly updated statistics of queries to root and gTLD servers and compares a DNS query response time with an ICMP response time. For each name server, weekly, monthly, and yearly trends are also presented. Links used for measurements are at ASes 6079, 3789, and 7018.