CAIDA's research on the DNS root servers currently focuses on the following problems:
- Continuous monitoring of the DNS root servers' performance.
- Investigation and modeling of BIND algorithm behavior.
- Analysis of bogus queries and broken resolver configurations.
- Evaluation and optimization of server placement.
Goals
- Test various approaches to regular monitoring of the DNS root servers
behavior.
- Develop techniques to track performance of the individual root servers
and of the system as a whole.
- Investigate long-term temporal trends in the DNS service.
Results
We have installed two passive NeTraMet traffic meters that capture DNS
request packets and their corresponding response packets. One meter
monitors the UCSD campus network, and the other is installed
on an OC-48 link of a large Internet provider in the San Jose area.
We continuously monitor the round-trip time of DNS request/response pairs,
the percentage of requests that do not get a response, and the number of DNS
request/response pairs observed.
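As a rough illustration of how these three quantities can be derived from captured packets, here is a minimal Python sketch. It assumes a hypothetical record layout (the `Packet` fields below) and matches a response to its request by client address, server address, and 16-bit DNS message ID, with an arbitrarily chosen 5-second timeout; it is not the NeTraMet rule set actually used.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Packet:
    # Hypothetical record layout for illustration only.
    time: float          # capture timestamp, seconds
    client: str          # resolver (client) IP address
    server: str          # root or gTLD server IP address
    query_id: int        # 16-bit DNS message ID
    is_response: bool


def summarize(packets: Iterable[Packet], timeout: float = 5.0):
    """Pair requests with responses; return (pairs, pct_unanswered, rtts)."""
    pending = {}   # (client, server, id) -> time the request was seen
    rtts = []
    requests = 0
    for pkt in sorted(packets, key=lambda q: q.time):
        key = (pkt.client, pkt.server, pkt.query_id)
        if not pkt.is_response:
            requests += 1
            pending[key] = pkt.time   # a retransmission replaces the older entry
            continue
        sent = pending.pop(key, None)
        if sent is not None and pkt.time - sent <= timeout:
            rtts.append(pkt.time - sent)
    unanswered = requests - len(rtts)
    pct = 100.0 * unanswered / requests if requests else 0.0
    return len(rtts), pct, rtts
```

Requests that see no response within the timeout count as unanswered; because a retransmission with the same ID replaces the pending entry, the earlier copy is also counted as unanswered.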
Strip chart plots present the data collected since early January 2002 for
the root and gTLD servers and are updated daily at midnight.
We have created a long-term archive for DNS response data and plots.
Current work
We are working to establish two or three additional strategically located
NeTraMet meters for passive flow data collection. The preferred order of
placement (based on previous analysis results) would be:
- Europe
- Asia/Pacific
- East Coast of US
Future work
Future plans include monitoring more than just the top level of the DNS.
For example, we are extending our passive measurement system to observe the
performance of country-code top-level domain (ccTLD) servers. These data should
provide an interesting view of global Internet connectivity, at least showing
connectivity to each country's ccTLD server.
Papers and Presentations
Goals
- Study BIND's name server affinity algorithm
(use of RTTs for server selection).
- Determine why A-root receives a higher query load than any other
root server.
- Discover more about the DNS caching structure, behavior, and scaling.
- Develop software simulating a large system of DNS clients and servers.
Current work
We have used the CAIDA dnsstat utility to collect a large volume of query
statistics on a number of root servers simultaneously. Starting on August 14,
2002, 16:10 UTC, we obtained 26 hours of data on the E, I, K, and M root
servers. Two weeks later we obtained 7 consecutive days of data on the E, F,
I, K, and M root servers starting from Wednesday 2002-08-28 16:10 UTC, and
4 consecutive days of data on the A root server starting from Thursday
2002-08-29 16:10 UTC.
Analysis of these data focuses on the rate of growth of the number of unique
clients seen by each individual server and by all servers combined. The most
surprising results are:
- The growth of the number of unique clients seen by individual
servers does not slow down after 7 days of observations.
- On all root servers, the rate of growth peaks on hourly boundaries.
- The total number of queries per 10 minute interval on the A root is about
25% higher than on the next busiest root server (M root).
- The number of type A (address) queries per 10-minute interval is about
the same on the A root and the M root, and is about 30% lower on the
E, F, I, and K root servers.
- The number of PTR queries per 10 minute interval is the highest
on the A root server, is about 25% less on the E, F, and I
root servers, and is about 60% less on the K and M root servers.
- Type A and PTR queries together account for the vast majority
(at least 90%) of the total number of messages received by each
root server.
- Half of the clients sent 8 or fewer messages (to all root servers that we
observed) in a week. At the same time, the busiest clients sent more than
a hundred messages per second to the A-root, more than 60 messages per
second to the I-root, and more than 30 messages per second to the
other root servers we monitored.
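The growth curves described above can be computed from dnsstat-style records with a short sketch like the following. The input layout (timestamp, server, client IP) is an assumption for illustration; the growth rate in each 10-minute interval is simply the difference between successive values of a curve.

```python
from collections import defaultdict

BIN = 600  # 10-minute intervals, in seconds


def cumulative_unique_clients(queries):
    """queries: iterable of (timestamp_s, server, client_ip), hypothetical layout.

    Returns {server: [cumulative unique clients at the end of each bin]},
    plus the key "ALL" for the union of clients over all servers.
    """
    by_bin = defaultdict(list)
    for ts, server, client in queries:
        by_bin[int(ts // BIN)].append((server, client))
    if not by_bin:
        return {}
    servers = sorted({s for items in by_bin.values() for s, _ in items})
    per_server = defaultdict(set)
    union = set()
    curves = {s: [] for s in servers}
    curves["ALL"] = []
    for b in range(min(by_bin), max(by_bin) + 1):
        for server, client in by_bin.get(b, []):
            per_server[server].add(client)
            union.add(client)
        for s in servers:
            curves[s].append(len(per_server[s]))
        curves["ALL"].append(len(union))
    return curves
```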
There appear to be diurnal variations in the data, and the rate of requests
on all root servers drops noticeably during the weekend. Note that the
instrumented root servers are located in very different time zones: A is on the
East Coast of the US, E is on the West Coast of the US, I and K are in Europe, and
M is in Japan. However, the numbers of unique clients observed
in 10-minute intervals correlate remarkably well across all these servers.
This observation implies that a significant fraction of the new queries
coming to the root servers does not originate from human users' requests
but is software-driven.
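The text does not say which correlation measure was used. The sketch below assumes the Pearson coefficient between per-interval counts of unique clients at two servers, computed only over intervals in which both servers saw traffic; the record layout is again a hypothetical (timestamp, server, client IP) tuple.

```python
from collections import defaultdict
from math import sqrt


def unique_clients_per_interval(queries, server, bin_s=600):
    """Count distinct clients per 10-minute bin for one server."""
    bins = defaultdict(set)
    for ts, srv, client in queries:
        if srv == server:
            bins[int(ts // bin_s)].add(client)
    return {b: len(clients) for b, clients in bins.items()}


def pearson(x, y):
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series of length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / sqrt(sxx * syy)


def interval_correlation(queries, server_a, server_b, bin_s=600):
    a = unique_clients_per_interval(queries, server_a, bin_s)
    b = unique_clients_per_interval(queries, server_b, bin_s)
    common = sorted(a.keys() & b.keys())  # bins where both servers saw traffic
    return pearson([a[k] for k in common], [b[k] for k in common])
```

Because the coefficient ignores absolute levels, two servers with very different query volumes can still correlate strongly, which is the property the observation above relies on.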
We are also starting a more detailed analysis of packets captured with
the tcpdump utility on the root servers.
Future work
We will collect BIND logfiles (and any other logs we can get
from busy name servers) and characterize them in various ways. In particular,
we will look at the following parameters:
- interarrival times
- popularity (some names are more popular than others)
- correlations between popularity and TTLs
- message sizes
- response codes
- duplicate queries
- invalid queries
- the percentage of clients that do not cache DNS replies
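A minimal sketch of a few of these measurements follows, assuming log records have been reduced to (timestamp, client, query name, query type) tuples. Treating a "duplicate" as the same client asking for the same name and type within one second is a hypothetical definition chosen for illustration, not one taken from BIND.

```python
from collections import Counter


def characterize(records, dup_window=1.0):
    """records: iterable of (timestamp_s, client, qname, qtype).

    Returns (interarrival times, name popularity counter, duplicate count).
    """
    records = sorted(records)
    interarrivals = [b[0] - a[0] for a, b in zip(records, records[1:])]
    # DNS names are case-insensitive, so normalize before counting.
    popularity = Counter((qname.lower(), qtype) for _, _, qname, qtype in records)
    last_seen = {}
    duplicates = 0
    for ts, client, qname, qtype in records:
        key = (client, qname.lower(), qtype)
        prev = last_seen.get(key)
        if prev is not None and ts - prev <= dup_window:
            duplicates += 1
        last_seen[key] = ts
    return interarrivals, popularity, duplicates
```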
Next, we are going to develop software for simulating a large system of DNS
clients and servers. Our simulated clients will issue DNS queries with
properties derived from the logfile characterization. Using a number of
networked computers, we will model the entire system. For servers we will
use actual BIND installations, both as intermediate caching nameservers and
as root name servers. We plan to also simulate wide-area network delays and
packet loss with operating system features such as FreeBSD's Dummynet.
We will run a number of experiments in this fully controlled environment
seeking to understand how a single parameter or configuration affects
the overall performance of the system.
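Simulated clients of this kind could draw query times and names from fitted distributions. The sketch below assumes Poisson arrivals (exponentially distributed interarrival times) and Zipf-like name popularity; both are illustrative stand-ins for whatever the logfile characterization actually yields, and `generate_queries` is a hypothetical helper.

```python
import random


def zipf_weights(n, s=1.0):
    # Weight of the name at popularity rank r is proportional to 1 / r**s.
    return [1.0 / (rank ** s) for rank in range(1, n + 1)]


def generate_queries(names, rate_per_s, duration_s, s=1.0, seed=None):
    """Return a list of (time_s, name) for one simulated client.

    names must be ordered from most to least popular.
    """
    rng = random.Random(seed)
    weights = zipf_weights(len(names), s)
    t = 0.0
    out = []
    while True:
        t += rng.expovariate(rate_per_s)  # exponential gap => Poisson process
        if t >= duration_s:
            break
        out.append((t, rng.choices(names, weights=weights, k=1)[0]))
    return out
```

Seeding each client separately keeps experiments repeatable, which matters when only one parameter or configuration is varied between runs.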
Goals
- Search for signatures of the software a DNS client is using based on
characteristics of its query message.
- Identify clients abusing root servers and their operating systems.
- Are these bugs in "valid" name servers, or viruses?
- Make recommendations for best-practice DNS server configuration.
- Would proper caching reduce the load significantly?
Results
Large numbers of bogus queries and broken resolvers consume valuable
root server resources. We have analyzed errors recorded in DNS log files
at the F-root. Bogus queries typically fall into one of the following
categories:
- nonsensical (e.g., address lookups for names that are already IP addresses, "A 206.168.0.4")
- invalid TLDs ("A foo.ntdomain")
- repeat queries for the same data
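These three categories can be detected with simple rules. The sketch below is only an illustration: `KNOWN_TLDS` is a deliberately tiny, hypothetical list (a real check would load the current root zone), and repeat detection assumes the caller keeps one `seen` set per client.

```python
import ipaddress

# Hypothetical, incomplete TLD list for illustration only.
KNOWN_TLDS = {"com", "net", "org", "edu", "gov", "mil", "int", "arpa", "jp", "uk", "de"}


def classify(qname, qtype, seen):
    """Label one query; `seen` holds (name, type) pairs already asked by this client."""
    name = qname.rstrip(".").lower()
    key = (name, qtype)
    if key in seen:
        return "repeat"
    seen.add(key)
    if qtype == "A":
        try:
            ipaddress.ip_address(name)  # the "name" is already an address
            return "address lookup for an address"
        except ValueError:
            pass
    if name.rsplit(".", 1)[-1] not in KNOWN_TLDS:
        return "invalid TLD"
    return "ok"
```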
We also made an attempt to identify problem-prone end user applications.
Our analysis helped to find and fix a bug in the Microsoft Windows 2000 resolver.
Current work
Currently, we are analyzing log files obtained from the host hazel,
an authoritative anycast server located near F-root.
These files contain bogus PTR record updates (attempts to modify a PTR record,
that is, an association between an IP address and a domain name) made for
addresses from private address space. As specified in RFC 1918, these
private addresses can be used inside a network but should never be
communicated globally.
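Deciding whether an update targets private space means mapping the reverse (in-addr.arpa) name back to an IPv4 address and testing it against the three RFC 1918 blocks. A minimal sketch using Python's standard `ipaddress` module follows; the function names are hypothetical. The blocks are listed explicitly because `is_private` also covers non-RFC 1918 ranges such as loopback.

```python
import ipaddress

# The three private blocks defined in RFC 1918.
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def ptr_to_ipv4(ptr_name):
    """Convert e.g. '4.0.168.192.in-addr.arpa.' to IPv4Address('192.168.0.4')."""
    labels = ptr_name.rstrip(".").lower().split(".")
    if len(labels) != 6 or labels[-2:] != ["in-addr", "arpa"]:
        raise ValueError(f"not a full IPv4 reverse name: {ptr_name}")
    # Reverse names list the octets in reverse order.
    return ipaddress.IPv4Address(".".join(reversed(labels[:4])))


def is_rfc1918_ptr(ptr_name):
    addr = ptr_to_ipv4(ptr_name)
    return any(addr in net for net in RFC1918)
```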
Our main results are:
- The rates of bogus PTR record updates are periodic, with a few
distinctive periods.
- Grouping source addresses of queries by continents of origin reveals:
- clear diurnal patterns: rates are generally higher during the day and
lower during the night
- sharp, prominent peaks at midnight in each time zone. American
query sources produce four peaks, European sources cause two peaks (the
smaller one corresponds to United Kingdom addresses), and
Asian sources display three peaks.
- We hypothesize that expiration and renewal of DHCP leases at midnight
may be the cause of observed peaks.
- IP addresses sending bogus PTR record updates belong to 3309 origin ASes.
Only 20 ASes are responsible for more than 50% of these updates. The top three
offenders are Chinalink (China), Ibernet (Spain) and SW Bell (USA).
- Our very limited attempt at dynamic probing of offending addresses
(a few samples, containing between 100 and 500 addresses each) did not
reveal any operating system predominantly responsible for sending bogus updates.
Note that we did not find Apple systems among the offenders. Possible causes of
this absence are:
- properly configured software
- a small number of Apple systems on the Internet
- undercounting of Apple systems by the Xprobe utility we used for probing
- Updates originating from the same source address are often periodic, with
periods of 30, 60, and 75 minutes.
- The 75-minute update cycle typically consists of three updates
made at intervals of 5, 10, and 60 minutes.
- Neither "mice" (many hosts that made only 1-2 bogus updates per week)
nor "elephants" (a few hosts that made hundreds of thousands of bogus updates per
week) dominate the total number of bogus updates. The major contribution
comes from intermediate ("workhorse") contributors, which make between 200
and 500 updates per week.
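The 75-minute cycle is consistent with its three gaps, since 5 + 10 + 60 = 75. A minimal sketch of how such a cycle could be recognized in the update times of one source address is shown below; the matching rule (every consecutive gap matches the pattern in some rotation, within a tolerance of one minute) is an assumption for illustration, not the method behind the results above.

```python
def gaps(timestamps):
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def matches_cycle(timestamps_min, pattern=(5, 10, 60), tolerance=1.0):
    """True if the gaps between updates (in minutes) repeat `pattern`.

    Any rotation is accepted, because observation may start mid-cycle.
    The cycle period is sum(pattern), i.e. 75 minutes for the default.
    """
    g = gaps(timestamps_min)
    k = len(pattern)
    if len(g) < k:
        return False
    for shift in range(k):
        rotated = pattern[shift:] + pattern[:shift]
        if all(abs(gap - rotated[i % k]) <= tolerance for i, gap in enumerate(g)):
            return True
    return False
```

A plain 30- or 60-minute period is the special case of a one-element pattern, e.g. `pattern=(30,)`.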
Details of our analysis and graphs are available
here.
Future work
We are investigating the parameters of DNS query packets to determine
whether they include any diagnostic features ("fingerprinting" analysis).
Our goal is to determine more accurately which operating systems are responsible
for originating different types of bogus queries.
Papers and Presentations
Goals
- Characterize DNS root name server connectivity on the macroscopic level.
- Identify subsets of DNS clients having consistently large latency
connections to the root servers.
- Determine which root servers are most crucial in providing expeditious service
to their clients and which are redundant.
Results
We have instrumented eleven DNS root servers (A, B, D, E, F, G, H, I, K,
L, and M) with skitter monitors in order to track their global IP-level
connectivity. The J root server is co-located with the A root server and
does not require a separate monitor. As of September 2002, the administration
of the C root server had not responded to RSSAC's request to host a
skitter monitor at their site.
Our tool continuously sends probe packets to destinations in a pre-specified
target list and captures forward IP paths and round-trip times.
We have built a representative target list of DNS root server clients
for skitter probing. First, we used the dnsstat utility to collect statistics
on DNS queries by passively monitoring seven DNS root servers. On each root
server, the number of messages and the number of queries (but not the subjects
of the queries) were counted for 24 hours and recorded together with the
source IP addresses of these messages.
The list was used for monitoring six root servers (A, E, F, K, L, and M) in
2000-2001. We identified a subset of destinations that had large latency
connections to all instrumented root servers and studied their geographical
make-up. We found that destinations in Africa, Asia and South America accounted
for over 60% of the observed large latency destinations, but less than 14% of
the total target list.
Current work
In March 2002, we updated our target list and increased it to about 140 thousand
destinations. We selected target destinations for the new list based on the
following goals:
- When possible, select an IP address from the old DNS Clients list used
in 2000-2001.
- When possible, select an IP address seen by dnsstat at the
largest number of DNS root servers.
- Provide as much coverage of the routable IPv4 address space as possible,
but restrict the size of the list to between 100 and 200 thousand
destinations. The size restriction ensures that each destination
in the list is probed 3-5 times per day, making the results less sensitive
to diurnal variations.
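One way to apply these three rules together is sketched below. It is a hypothetical illustration: treating each /24 prefix as the unit of address-space coverage and ranking candidates first by old-list membership and then by the number of root servers that saw them are our assumptions, not details of the actual selection.

```python
import ipaddress


def select_targets(candidates, old_list, max_size=200_000):
    """Pick at most one IPv4 target per /24 prefix.

    candidates: {ip_string: number of root servers whose dnsstat data saw it}
    old_list:   set of IP strings from the 2000-2001 DNS Clients list
    """
    best = {}  # prefix -> (rank, ip)
    for ip, n_servers in candidates.items():
        prefix = ipaddress.ip_network(f"{ip}/24", strict=False)
        rank = (ip in old_list, n_servers)  # old-list membership first, then server count
        if prefix not in best or rank > best[prefix][0]:
            best[prefix] = (rank, ip)
    # If there are more prefixes than the cap allows, keep the best-ranked ones.
    chosen = sorted(best.values(), reverse=True)
    return [ip for _, ip in chosen[:max_size]]
```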
We have been monitoring the new target list since the end of March 2002.
We are working to evaluate the redundancy among the existing DNS root servers.
We use the median RTT as the metric of proximity between a skitter
monitor (co-located with a DNS root server) and target destinations. When
the data from one server are removed from consideration, the distribution
of median RTT shifts. The magnitude of this shift shows quantitatively
the importance of each root server for the overall set of clients.
We found that M-root is the most crucial root server. Its removal
from service would cause a significant increase in RTT for the largest
number of clients.
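The removal experiment can be sketched as follows, assuming that each destination is best served by the root server with the lowest median RTT to it. The two summary numbers per server (how many destinations see their best RTT rise, and the mean increase) are our illustrative choice of "shift" measure, not necessarily the one used in the analysis.

```python
from statistics import median


def build_medians(samples):
    """samples: {server: {dest: [rtt_ms, ...]}} -> {server: {dest: median RTT}}."""
    return {
        server: {dest: median(rtts) for dest, rtts in per_dest.items()}
        for server, per_dest in samples.items()
    }


def best_rtt(medians, exclude=None):
    """Per destination, the smallest median RTT over all servers except `exclude`."""
    best = {}
    for server, per_dest in medians.items():
        if server == exclude:
            continue
        for dest, rtt in per_dest.items():
            if dest not in best or rtt < best[dest]:
                best[dest] = rtt
    return best


def removal_impact(medians):
    """For each server: (destinations whose best RTT rises, mean RTT increase)."""
    baseline = best_rtt(medians)
    impact = {}
    for server in medians:
        without = best_rtt(medians, exclude=server)
        # Destinations reachable only via the removed server are skipped here.
        increases = [without[d] - baseline[d] for d in baseline if d in without]
        affected = sum(1 for x in increases if x > 0)
        mean = sum(increases) / len(increases) if increases else 0.0
        impact[server] = (affected, mean)
    return impact
```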
Next, we define the distance between a pair of root servers.
From our target list, we consider a subset of destinations that respond to both
skitter monitors co-located with these servers. For each destination
in this subset we calculate the absolute difference between median RTTs to
these skitter monitors, sum up the differences, and divide the sum
by the number of destinations in the subset. We use the resulting metric
to represent the distance between the root servers. The shorter the distance
between a pair of root servers, the closer the resemblance between them in terms
of the RTT distribution of the monitored list of destinations.
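Written out, with S the set of destinations that respond to both monitors and m_X(d) the median RTT from the monitor at server X to destination d, the distance is D(X, Y) = (1/|S|) * sum over d in S of |m_X(d) - m_Y(d)|, i.e. the mean absolute difference of median RTTs. A direct transcription (the function name is hypothetical):

```python
def server_distance(med_x, med_y):
    """Mean absolute difference of median RTTs over destinations seen by both.

    med_x, med_y: {destination: median RTT in ms} for the two monitors.
    """
    common = med_x.keys() & med_y.keys()
    if not common:
        raise ValueError("no destinations respond to both monitors")
    return sum(abs(med_x[d] - med_y[d]) for d in common) / len(common)
```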
We further cluster the servers based on their virtual proximity (in terms
of the metric above), and thus determine root server groups
("root families").
Within each cluster, any one server can functionally replace another one with
the least RTT increase experienced by their clients.
We found four clusters of servers in virtual space. These groupings correlate
remarkably well with their geographical location: Europe (K, I, and k-peer),
California (B, E, F), US-East (A, D, G, H), and Japan (M).
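The text does not name the clustering method. As one plausible reading, the sketch below applies average-linkage agglomerative clustering to the pairwise distance defined earlier, merging the closest pair of clusters until no pair is closer than a chosen threshold; both the linkage rule and the threshold are assumptions.

```python
from itertools import combinations


def average_linkage(servers, distance, threshold):
    """Group servers whose average pairwise distance stays within `threshold`.

    distance(a, b) returns the server-to-server distance metric.
    """
    clusters = [[s] for s in servers]

    def link(c1, c2):
        # Average linkage: mean distance over all cross-cluster server pairs.
        return sum(distance(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), link(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        clusters[i].extend(clusters[j])  # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters
```

A threshold-based stop was chosen so that the number of clusters is not fixed in advance.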
We also investigate the relationship between server clusters and the
geographical locations of the destinations that have the minimum median RTT
to one of the servers in a given cluster.
Future work
In the near future we will deploy three more skitter monitors (at the
B, G, and L root servers). We will also collect DNS statistics from
these servers and update our target list to include their clients.
Papers and Presentations
Report from the IEPG/IETF meeting, Yokohama, July 2002
http://www.iepg.org
- Nameserver coverage of DNS domains
- Akira Kato (JPNIC), "JPNIC study on DNS misconfiguration"
- Ed Lewis (ARIN), "DNS Lameness"
- George Michaelson (APNIC), "More Bad DNS"
These three presentations look at the proportion of nameservers in the
corresponding registered domains that respond correctly versus those that are
'lame delegations.' Nearly one-third of all the nameservers tested were
fully or partially lame. This is a particular problem for reverse name lookup,
affecting applications that perform reverse lookups. The Registries invite
suggestions as to the best way to solve this problem.
- DNS studies by the WIDE MAWI (Measurement and Analysis on the WIDE
Internet) group.
Kenjiro Cho (Sony) uses active measurements to collect performance data
for root/gTLD servers (rootprobe tool) and ccTLD servers (ccTLDprobe tool).
These tools run on hosts rather than on dedicated servers.
Kenjiro is also investigating the server selection algorithms used by
various resolver implementations, and the ways in which these algorithms may
influence future root server placement.
- DNS data points by Cymru.COM
Rob Thomas collects hourly updated statistics on queries to root and gTLD
servers and compares DNS query response times with ICMP response times.
For each name server, weekly, monthly, and yearly trends are also presented.
The links used for measurements are in ASes 6079, 3789, and 7018.