measurement and analysis of the root DNS system: update
Archived MagicPoint presentation slides, compiled into a single PDF document.
2002_dns0209.pdf (37 slides, 2.7 MB)
Slide text transcript
Slide 1: measurement and analysis of the root DNS system: update
measurement and analysis of the root DNS system: update september 2002 ucsd/sdsc/caida kc@caida.org http://www.caida.org/publications/presentations/
Slide 2: research problems
research problems main directions of caida's DNS research: continuous performance monitoring of root/gtld servers investigation and modeling of bind algorithm behavior analysis of bogus queries and broken name server configurations evaluation and optimization of root server placement
Slide 3: types of collected data
types of collected data caida started DNS measurement in 2001 three kinds of data are collected and analyzed passive capturing of DNS packets (netramet, dnsstat/coralreef, tcpdump) log files from root servers active probing of the infrastructure
Slide 4: I. monitoring dns root servers performance
I. monitoring dns root servers performance (nevil brownlee, caida/u.auckland) NeTraMet traffic meter captures DNS request/response packets root servers and gTLDs passive observations, January 2002 - present From UCSD - nearly continuous From SJC - best effort measurements of: rtt for UCSD and SJC loss% and count for UCSD results at: www.caida.org/cgi-bin/dns_perf/main.pl Updated daily after midnight
Slide 5: I. monitoring dns root servers performance
I. monitoring dns root servers performance www.caida.org/cgi-bin/dns_perf/main.pl interactive plotting of parameters for comparison and analysis examples follow
Slide 6: I. root servers performance - RTT
I. root servers performance - RTT stable response time for most servers (G overloaded on weekdays)
Slide 7: I. root servers performance - losses
I. root servers performance - losses periods of high loss on A, C, I, and J but note: high losses do not necessarily affect the measured RTT
Slide 8: I. gTLD servers performance - RTT
I. gTLD servers performance - RTT gTLDs are consistently more stable than the roots
Slide 9: I. continuous monitoring: future plans
I. continuous monitoring: future plans deploy 3-4 additional NeTraMet meters please contact kc or nevil@caida.org strategic locations: europe asia/pacific east coast of US start monitoring of country code servers (ccTLDs) time-series analysis of the data [lack of] correlation among loss, workload, RTT evaluate ICMP as indicator of DNS performance (see task 3)
Slide 10: II. investigation and modeling of BIND
II. investigation and modeling of BIND
(duane wessels, caida/packet-pushers)
bind certainly works - but how?
who queries root servers?
how does a client select a root server?
presumption: based on RTT measurements
how does a root server acquire clients?
do all root servers "see" all clients?
Note: they are supposed to...
can we model all (any of) this?
Slide 11: II. modeling of BIND (cont.)
II. modeling of BIND (cont.)
how does a client select a root server? (courtesy vix)
1) rtt sorting
try every NS+A until you find best one
2) aged rtt sorting
gradually depref the "best NS+A" to force rescans
3) priming
only use "root.cache" to do an initial ". NS" query sweep
4) static
use servers in order on EVERY query, stop when answer heard
5) random selection among known authorative (including roots)
6) round robin
rotate the NS and/or A list every time one is used.
BIND4/BIND8, and modern BIND9 do (2) and (3)
early BIND9 did (1) and (3)
djbdns does (5)
win2k does (5) [although it may tend to prefer A if it responds quickly]
win2003 "(2)-like" [per usoft: tries to balance across NSes per node]
nobody does (6)
Slide 12: II. modeling BIND (cont.): asymmetric loads
II. modeling BIND (cont.): asymmetric loads
why does A get 2X the query load of B..M
if BIND4/BIND8 and modern BIND9 are still dominant query sources
then they are hitting A..M somewhat evenly, with small localizations
according to RTT variance seen by various client populations.
A's additional ~3Kq/sec in volume from where?
as yet uninvestigated
Slide 13: II. investigating/modeling BIND: measurements
II. investigating/modeling BIND: measurements CAIDA dnsstat utility passive measurements at the root servers collects aggregated statistics of queries: source address number of queries type of queries does not record query subjects can run for days without a problem
Slide 14: II. BIND behavior - dnsstat data
II. BIND behavior - dnsstat data 26 hours at 10 minute intervals from 14 august 2002 instrumented: e-root (california, us) i-root (stockholm, sweden) k-root (london, uk) m-root (tokyo, japan) f-root (california, us) a-root (va, us) need simultaneous runs on all participating servers need cooperation from US servers (2.5 non-US servers are quite forthcoming)
Slide 15: II. dnsstat results - number of queries
II. dnsstat results - number of queries number of queries per 10-minute interval no clear pattern, between 5000 and 12000 queries per second
Slide 16: II. dnsstat results - new clients
II. dnsstat results - new clients accumulated number of unique clients no conversion or slowdown after 1 day
Slide 17: II. dnsstat results - new clients (cont.)
II. dnsstat results - new clients (cont.) number of unique clients per 10 minute interval apparently, no diurnal variations peaks at hourly boundaries - why? some popular software's cron behavior? still investigating
Slide 18: II. dnsstat results - number of messages
II. dnsstat results - number of messages number of messages sent by clients half of the clients sent 8 or fewer messages (in 26 hrs) one client sent about 18M messages (192 per second)
Slide 19: II. BIND behavior analysis - future plans
II. BIND behavior analysis - future plans continue collection and characterization of log files interarrival rates popularity (some names are more popular than others) correlations between popularity and TTLs message sizes response codes duplicate queries invalid queries percentage of caching/non-caching clients develop simulating software run controlled experiments `icmp as indicator of dns' calibration (again)
Slide 20: III. bogus queries
III. bogus queries (evi nemeth, caida/sailboat; andre/ken/duane, caida) misuse of root servers root servers receive a large amount of invalid queries possible classification: stupid (e.g. lookup the IP address of an IP address) invalid TLD (i.e. "foo.ntdomain") repeat queries for the same data (new meaning to `persistent software') our goals: identify clients that do not cache referrals would more consistent caching reduce load significantly? determine the nature of high load clients misconfigured name server installations? unknown DNS implementations? viruses? suggest possible fixes to reduce the load
Slide 21: III. bogus queries
III. bogus queries data from our earlier study: bogus A queries to root servers for a few hours at f-root in 2001 A queries ask for the IP address of a hostname malformed A queries were 14% of the load at F.root asking for the IP address of an IP address example: "A 206.168.0.4" - should not happen guilty: Microsoft Win2k resolver, viruses (win95/98/nt), macOSX resolver (good news: with our help, Microsoft found and fixed this bug in Win2k (although the way to turn off a bad default configuration is 6 or so menus deep...) 20% of queries asking for non-existent TLD lots of internal microsoft names (active directory) lots ending in .local, .localhost, .workgroup, .msft, .domain, etc. hard to track down, nameservers just relay clients queries cannot see back to the actual client that asked the question
Slide 22: insidious problem: private (rfc1918) addresses
insidious problem: private (rfc1918) addresses workload myth: private addresses do not appear in the core reality: private addresses appear all over the place including (consistently) in queries to root name servers Broido's 1st Law: `what should not be seen in the Internet will appear 1% of the time' data: log files from an authoritative RFC1918 (AS112 project) name server hazel bogus PTR record updates attempts to modify a PTR record 51.4M updates in 86.5 hr = 10,000 per minute = 165 per second
Slide 23: Private addresses (cont.) - workload
Private addresses (cont.) - workload 192.168.0.0/16 is the most popular in networks using cable modems and DSL connections
Slide 24: private addresses (cont.) - by continent, time
private addresses (cont.) - by continent, time clear diurnal patterns (by time zones) sharp peaks at midnight of each time zone hypothesis: expiration/renewal of DHCP leases?
Slide 25: private addresses (cont.) - by ASes
private addresses (cont.) - by ASes clients belong to 3309 origin ASes 20 ASes cause more than 50% of RFC1918 PTR updates top offenders: 4134 Chinalink (China) 3352 Ibernet (Spain) 7132 SW Bell (USA) 5673 Pac Bell (USA) 5676 Pac Bell (USA) 4813 China Telecom (Guandong, China) 4812 China Telecom (Shanghai, China) 852 Telus (Canada) 6128 Cablevision (USA) 2828 XO (USA) 1142 Road Runner (USA) 7843 Adelphia (USA) 4760 Netvigator (Hong Kong) 2914 Verio (USA) 1221 Telstra (Australia) 11509 Pajo (USA) 4436 SantaCruz Community I't (USA) 11426 Road Runner (USA) 10994 Time Warner (USA) 2548 Business Internet (USA)
Slide 26: who contributes most of RFC1918 workload?
who contributes most of RFC1918 workload? a week of RFC1918 PTR upates bulk are from hosts sending btw. 256 and 512 updates per week (andre: neither mice, nor elephants - but `workhorses')
Slide 27: private addresses (cont.): identifying OSen
private addresses (cont.): identifying OSen dynamic probing of offending addresses used xprobe utility very limited: a few samples, 100 to 500 addresses each no dominant operating system found mixture of Windows- and Unix- based OS no Apple systems.... need better diagnostic tools
Slide 28: IV. evaluation/optimization of server placement
IV. evaluation/optimization of server placement (bradley huffaker, caida/u.auckland) 13 root servers 10 in US (6 in Washington DC, 4 in California) 2 in Europe 1 in Asia is this arrangement optimal? are some servers redundant? are more servers necessary? how to determine best root server location? politics prestige control
Slide 29: IV. server placement (cont.)
IV. server placement (cont.) macroscopic topology measurements CAIDA skitter tool http://www.caida.org/tools/measurement/skitter traceroute-like methodology increments Time-To-Live (TTL) ICMP echo requests small (52-bytes) probe packets slow-paced probes measure IP forward path information round trip time (RTT) to destination thousands of destinations resulting data hundreds of thousands of paths per day, for years most comprehensive macroscopic Internet topology data
Slide 30: IV. server placement (cont.)
IV. server placement (cont.) skitter measurements for dns root servers 11 (out of 13) root servers instrumented w. skitter monitors J co-located with A C has not responded measures ICMP RTT from skitter to target destinations not actual DNS response time characteristic of infrastructure common destination lists for all monitors => dns clients list
Slide 31: IV. server placement (cont.): measurements
IV. server placement (cont.): measurements
dns clients list
Goal: representative list to run on all skitter monitors
combine individual clients lists from all root name servers
stratify IPv4 address prefix space
prefix - independently routable slice of address space
no more than 150K destinations
so that we can probe 3-5X/day (less sensitive to diurnal variations)
current dns clients list created in March 2002
passive collection of addresses at 7 root servers
select one host per routable /24 prefix
prefer hosts seen by most root servers
nearly 2M addresses passively collected from root servers
selected more than 143K addresses
cover about 50% of prefixes from the global BGP table
Slide 32: IV. server placement (cont.): `remove one'
IV. server placement (cont.): `remove one' m-root is most crucial server f-root is second most crucial server
Slide 33: IV. server placement (cont.)
IV. server placement (cont.) distance btw. a pair of root servers - definition skitter monitor is co-located with each root server select a subset of destinations responding to both skitter monitors distance = the average absolute difference btw. median RTTs short distance <=> similar RTT distributions for destinations cluster root servers based on their virtual proximity => "root families"
Slide 34: IV. Servers' placement (cont.)
IV. Servers' placement (cont.) clusters of root servers clusters correlate with geography remarkably well servers within each cluster are functionally equivalent more at the bottom of http://www.caida.org/projects/dns-analysis/
Slide 35: IV. server placement: nameservers by geography
IV. server placement: nameservers by geography clusters of root servers 2/3 of NA dsts have lowest mRtts to servers in US-E family, other 1/3 to US-W majority of European dsts best-served by Europe family, with some in US-E a few Asian dsts favor Europe and US-W family Oceania prefers US-W family note data missing from 3 root servers but likely not result-changing
Slide 36: conclusions
conclusions
a ton of damage in the root system
much more dns analysis on caida web site
www.caida.org/projects/dns-analysis/
www.caida.org/publications/papers/
www.caida.org/cgi-bin/dns_perf/main.pl
this talk
www.caida.org/publications/presentations/dns200209/
lot more to study than cycles to study it
please send mail if you want to offer monitoring site or analysis cycles (students)
Slide 37: contact info
contact info the purpose of models is not to fit the data but to sharpen the questions. Samuel Karlin, Samuel Karlin, 11th R A Fisher Memorial Lecture, 20 April 1983. k claffy ucsd/sdsc/caida kc@caida.org www.caida.org

