MagicPoint presentation foils
Archived MagicPoint presentation slides, compiled into a single PDF document.
2003_nmspi0305.pdf (29 slides, 4.9 MB)
Slide text transcript
Slide 1: modeling the Domain Name System (DNS)
duane wessels
marina fomenkov
kc claffy
28 may 2003
nms pi meeting
san diego, ca
kc@caida.org
Slide 2: DNS is critical infrastructure
Slide 3: refresher: how DNS works
DNS utilizes a hierarchical name space divided into zones that are distributed among the name servers. Each zone has one or more authoritative name servers responsible for answering queries for names within their zone(s). In order to reach a machine with the name not.invisible.net, one must send a query to the DNS server responsible for machines and/or sub-domains in the domain .invisible.net. To find this DNS server, one must send a query to the server authoritatively responsible for .net. Such a server is called a global top-level domain (gTLD) server. To find the appropriate gTLD server, one must query one of the root servers. Currently there are 13 gTLD servers and 13 root servers.
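The delegation chain described in this refresher can be sketched as a walk over a toy zone table. This is a minimal illustration, not a real resolver: only the name not.invisible.net comes from the slide, while the server names and the address are hypothetical placeholders, and no network queries are sent.

```python
# Toy delegation data: each server maps a zone to the server to ask next,
# or holds the final address. Server names and the address are hypothetical.
ROOT = {"net": "gtld-server"}
SERVERS = {
    "gtld-server": {"invisible.net": "invisible-ns"},
    "invisible-ns": {"not.invisible.net": "192.0.2.10"},
}

def resolve(name):
    """Follow referrals root -> gTLD -> zone server, returning (address, path)."""
    labels = name.rstrip(".").split(".")
    path = ["root"]
    # Ask a root server which server is responsible for the top-level domain.
    server = ROOT[labels[-1]]
    path.append(server)
    # Ask the gTLD server which server is responsible for the second-level domain.
    sld = ".".join(labels[-2:])
    server = SERVERS[server][sld]
    path.append(server)
    # Ask the zone's authoritative server for the host's address.
    address = SERVERS[server][".".join(labels)]
    return address, path

address, path = resolve("not.invisible.net")
print(path, address)
```

A caching name server avoids repeating the first two steps by remembering the referrals it has already received, which is why cache behavior dominates the root and gTLD query load.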
Slide 4: last time: egregious macroscopic dns damage
dns updates for private address space leaking up to roots
spectroscopy analysis of RFC1918 updates
RFC1918 updates coming from DHCP/nameservers
millions a day getting to root name servers (whee)
51.4M updates in 86.5 hr = about 10,000 per minute = 165 per second
weekday, weekend patterns; weird spikes at midnight local time
  4 in the US, 3 in Asia, 2 in Europe
can see that Asians work on the weekend
can see that Europeans and Asians get to work on time
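The update rates quoted on this slide follow directly from the two measured figures. A quick check, using only the numbers from the slide:

```python
updates = 51.4e6   # RFC1918 updates observed (from the slide)
hours = 86.5       # length of the observation window in hours (from the slide)

per_minute = updates / (hours * 60)
per_second = updates / (hours * 3600)

print(f"{per_minute:.0f} per minute, {per_second:.0f} per second")
```

The per-minute figure comes out slightly under 10,000, which the slide rounds up.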
Slide 5: ... global RFC1918 damage in DNS system
rare to get macroscopic Internet data so radically broken
who is trying to update the roots anyway?
  dsl, cablemodem, small population providers, developing countries
verified that vast majority derive from two OSes: Windows 2000 and Windows XP
majority of updates from sources that send them constantly
bulk of workload from contributions of medium size, not mice/elephants
most source IP addresses are of home and small business users (owned by individuals, not organizations) connected to the Internet via cable, DSL or phone-based ISPs
majority using software with default vendor settings
academic, corporate, backbone networks contribute little rfc1918 update traffic
Slide 6: ...global threat arising from single vendor
combination of Microsoft software features & misconfigurations essentially causing a slowly paced massive distributed denial of service (DDOS) attack on the root name server system
current state of fielded desktop software poses substantial & increasing burden on (if not threat to) the robustness of the global Internet
software and setups affecting global systemic Internet stability must be designed more carefully wrt potential effects of:
  software engineering decisions
  misimplementations
  misconfigurations
measurement can make a huge difference
Slide 7: next step: toward realistic DNS simulation
DNS simulation parameters and constraints based on real data
  13 root servers (root zone file)
  13 gTLD servers (159 zone files)
  thousands of SLDs
  use real SOA resource records where possible
real user query workload
test all commonly available nameserver implementations
variety of realistic experimental conditions
just first step toward global DNS modeling
Slide 8: DNS simulation methodology
requires 5 computers
Slide 9: DNS workload generation
derived from 24 hours of IRCache logs
  7,507,544 hostnames
filter invalid data (e.g. query asking to resolve an IP address)
extract unique SLD zones
extract valid unique TLD zones
keep invalid TLDs to model error handling
workload is played back as fast as possible
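The filtering and zone-extraction steps can be sketched as follows. This is a simplified illustration, not the scripts used for the study: the function names are hypothetical, an SLD is taken to be the last two labels of a name, and no check against a list of valid TLDs is made, so invalid TLDs are simply kept.

```python
import ipaddress

def is_ip_literal(name):
    """True when the 'hostname' is really an IP address (invalid for this workload)."""
    try:
        ipaddress.ip_address(name)
        return True
    except ValueError:
        return False

def extract_zones(hostnames):
    """Return (unique SLD zones, unique TLDs) from hostnames, skipping IP literals."""
    slds, tlds = set(), set()
    for raw in hostnames:
        name = raw.strip().rstrip(".").lower()
        if not name or is_ip_literal(name):
            continue
        labels = name.split(".")
        tlds.add(labels[-1])
        if len(labels) >= 2:
            slds.add(".".join(labels[-2:]))
    return slds, tlds

# Made-up sample input; the real workload came from IRCache logs.
sample = ["www.caida.org", "ftp.caida.org", "192.0.2.7", "not.invisible.net", "host.bogus"]
slds, tlds = extract_zones(sample)
print(sorted(slds), sorted(tlds))
```

The two caida.org hostnames collapse into one SLD zone, the IP literal is dropped, and the made-up TLD "bogus" survives so that error handling can be exercised.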
Slide 10: DNS caching name servers under test
client-side caching dns servers
configuration files contain only the hints for the root zone
most common DNS implementations today:
  BIND (8.3.4 and 9.2.2)
  djbdns (1.05 w/default 1M cache and w/ 100M cache)
  DNS software in Windows 2000 (v5.0.49664)
runtime < 2 hours
Slide 11: DNS simulation: experimental conditions
Experiment 1: no loss, no delay (ideal conditions)
Experiment 2: 10% query loss, no delay
Experiment 3: no loss, linear delay
  Root name servers (and those for .com, .uk, .jp) are given delays starting at 10 msec and increasing by 15 msec for each nameserver. The .org nameserver delays overlap with the .com nameservers, going from 100-210 msec in 15 msec increments. SLD nameservers have no delays.
Experiment 4: no loss, constant delays
Experiment 5: 100% loss, no delays
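The linear delay schedule of Experiment 3 for the 13 root servers can be written out as below. The start and step values come from the slide; assigning the delays in letter order a through m is an assumption, and the function name is hypothetical. The .org range is not modeled here.

```python
import string

def linear_delays(n_servers, start_ms=10, step_ms=15):
    """Delay for the i-th nameserver: start_ms + i * step_ms."""
    return [start_ms + i * step_ms for i in range(n_servers)]

# Assumption: delays are assigned in letter order a..m across the 13 roots.
root_delays = dict(zip(string.ascii_lowercase[:13], linear_delays(13)))
print(root_delays)
```

Under this assumption the a root answers after 10 msec and the m root after 190 msec.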
Slide 12: DNS responses from simulated zones
simulated root zone looks much like the real root zone
  same (refresh, retry, expiry, minimum) values in the SOA record
  13 NS records ([a-m].root-servers.net) and 13 glue (A) records for those nameservers
  TTLs for all records match the real root zone
159 top level domain (TLD) zones, each containing:
  SOA record with values taken from real SOA records for its zone
  some number of NS records, same number of glue (A) records (match real world, e.g. 13 NS+A records for .com, 8 NS+A records for .it)
  values inside NS+A records, however, are fictitious; 24-hour TTLs for all NS and glue (A) records
  delegations for subdomains
Slide 13: DNS responses from simulated zones (2)
82,891 second level domain (SLD) zones, each containing:
  1 SOA record with fictitious values
  2 NS records, 2 glue (A) records
  some number of A records for hosts in the zone
both NS and A records match those in the parent zone (TLD) file
  24-hour TTLs
name server IP addresses from a pool of 254 addresses
  allocated sequentially, wrap around as needed
A records for hosts in the SLD zone have random IP addresses
  12-hour TTLs
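Sequential allocation with wrap-around from a 254-address pool can be sketched as follows. The pool size and the two name servers per zone come from the slide; the 10.0.0.x addresses, the zone names, and the function name are hypothetical.

```python
from itertools import cycle, islice

POOL_SIZE = 254  # from the slide
# Hypothetical pool: host addresses .1 through .254 of a private /24.
pool = [f"10.0.0.{i}" for i in range(1, POOL_SIZE + 1)]

def assign_nameservers(zones, ns_per_zone=2):
    """Hand out addresses sequentially, wrapping around when the pool runs out."""
    addresses = cycle(pool)
    return {zone: list(islice(addresses, ns_per_zone)) for zone in zones}

zones = [f"zone{i}.example" for i in range(128)]
assignment = assign_nameservers(zones)
print(assignment["zone0.example"], assignment["zone127.example"])
```

With two addresses per zone the pool is exhausted after 127 zones, so the 128th zone reuses the addresses given to the first one.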
Slide 14: DNS responses from simulated zones (3)
small number (254) of TLD and SLD name server IP addresses
  occurs because BIND binds to each IP address twice (TCP and UDP) and is limited to 1024 file descriptors
zone files are perfectly consistent
  no lame delegations, non-answering name servers, or other errors
each hostname has only one A record
  no CNAME records
each SLD has only two NS records
TTLs are larger than the time to run an experiment
  once a record is cached, it stays cached
Slide 15: overall comparison of caching name servers
W2K 5.0.49664 exhibits more root queries because it appears to delete negative responses from the cache after only 15 minutes.
Slide 16: overall comparison of caching name servers
Slide 17: BIND 9.2.2 queries to roots
BIND 9.2.2 always selects the initial root at random, but transitions to a uniform distribution in all experiments with no delay (red and green). High peaks at M & F (red) and K & E (green) in the root usage graph result from BIND's random initial root selection. In Experiment 3 (blue) with linear delay, BIND initially randomly selected H to handle root queries. For both root and .com TLD queries, BIND detects delay by evaluating RTTs and then tends to select roots with the lowest delay (A-D).
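The general idea of RTT-based server selection can be illustrated with a toy model. This is not BIND's actual algorithm: the smoothing rule, the alpha value, and the function names are all assumptions made for illustration, and the delays assume the linear schedule of Experiment 3 assigned in letter order.

```python
import random

def select_server(srtt):
    """Pick the server with the lowest smoothed RTT estimate; break ties at random."""
    best = min(srtt.values())
    return random.choice([s for s, v in srtt.items() if v == best])

def record_rtt(srtt, server, sample_ms, alpha=0.3):
    """Blend a new RTT sample into the running estimate (exponential smoothing)."""
    srtt[server] = (1 - alpha) * srtt[server] + alpha * sample_ms

# Assumed linear delays: root a = 10 ms, then +15 ms per server.
true_delay = {chr(ord("a") + i): 10 + 15 * i for i in range(13)}
srtt = {s: 0.0 for s in true_delay}  # nothing measured yet, so the first pick is random

counts = {s: 0 for s in true_delay}
for _ in range(1000):
    server = select_server(srtt)
    record_rtt(srtt, server, true_delay[server])
    counts[server] += 1
print(counts)
```

In this toy model every root is tried once in random order, after which nearly all queries go to the lowest-delay root. A real implementation also lets the estimates of unused servers decay so that they are retried, which would spread queries over several low-delay roots rather than one.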
Slide 18: BIND 9.2.2 queries to .com TLDs
Slide 19: djbdns 1.05 queries to roots
1MB is the default cache size for djbdns, but compared to a 100MB cache it increases the load on the TLD and SLD nameservers by a factor of 4 and the root server load by a factor of 10. To avoid this, we simulated with only the 100MB cache. djbdns 1.05 (100M cache) uniformly distributes its queries to roots and .com TLDs in all three experiments. Increased query load in the linear delay experiment (blue) is an artifact of our simulation because the caching name server repeats queries before a response is cached. (Note that this effect does not occur in BIND9.)
Slide 20: djbdns 1.05 queries to .com TLDs
Slide 21: W2K v5.0.49664 queries to roots
W2K v5.0.49664 always selects the A root initially. A few seconds later an "NS ." query is sent to M, which returns an ordered list of roots with a random starting point. The first root in this list becomes the designated choice. Root selection changes temporarily only in the event of packet loss, and then returns to the designated root.
Slide 22: W2K v5.0.49664 queries to .com TLDs
Slide 23: BIND8 queries to roots
Slide 24: BIND8 queries to TLDs
all other plots: http://www.packet-pushers.net/dns/simulations/sample-plots/
Slide 25: contribution to national measurement priorities
challenges in Internet measurements `12-step program': step 3-5, 9-10, 12
NSF ANIR PI meeting Jan 2003 http://www.caida.org/publications/presentations/nsfpi200301/
(3) mathematical frameworks to find structure/patterns in traffic
  a la scott's encouragement to `formalize some of what we (and providers) know'
  macroscopic as well as microscopic
  theory of joint spatial/temporal locality
  spectroscopy, tomography
(4) source modeling (for realistic inputs into simulations, models)
  extract a set of source models from an aggregate trace
  feature extraction problem
  10,000 gnutella port numbers are not 10,000 flows
  ultimate goal: augment libraries of source level models w generation of own
  calibrate models by evaluating their power for prediction
(5) empirically validated simulation of significant aspect of Internet
  already much work in large-scale simulations, but no recognized empirically validated simulation of any significant piece of the Internet.
  requires cooperation from providers and vendors to get default and configured parameters of OSes and algorithms.
  NSF should shepherd/foster this cooperation
  (note: large scale means in size as well as # of protocols)
Slide 26: contribution to national measurement priorities
challenges in Internet measurements `12-step program': step 3-5, 9-10, 12
NSF ANIR PI meeting Jan 2003 http://www.caida.org/publications/presentations/nsfpi200301/
(9) discovering pervasive hidden bugs
  any modeling or analysis must handle impact of this huge component of traffic
(10) how does measurement affect/support security goals
  infer bgp, firewall, and virus spread behavior
  how do you get networks to share security-related information
  protection of measurement infrastructure from security compromises
(12) encouragement of strategic measurement in new networks
  based on what we learned from what we did wrong in old networks
Slide 27: contribution to national measurement priorities
payoffs of Internet measurement
improve accuracy, validity, repeatability of network research
provide reference points or baselines for simulation and model validation
  other fields, e.g., architecture, have had this for years
build a solid understanding of network behavior
  including subtleties not otherwise detected
  including damage not otherwise detected
accelerate present and future modeling, simulation, and analysis efforts
avoid duplication of effort
// scientific apparatus offers a window to knowledge, but as they grow more elaborate, scientists spend ever more time washing the windows. -- Isaac Asimov //
Slide 28: NMS collaborations
students visiting caida
  Robert Nowak's student Ryan King (2002) (Rice)
  Michalis Faloutsos' student Thomas Karagiannis (UCR)
  Srikant Rayadurgam's student Srinivas Shakkottai (UIUC)
  Ellen Zegura's student Ruomei Gao (GaTech)
  Ken Calvert's student Aditya Namjoshi (U.KY)
  Edmond Jonckheere's student Khushboo Shah (USC)
other NMS PI users of caida data/infrastructure: George Riley (GaTech), John Heidemann (USC), Srikant (UIUC), Yuri Pryadkin (USC)
other relevant collaborations
  CAIDA testing of Riedi's (Rice) pathchirp for DOE bwest project
  SLAC (Les Cottrell) on LSN presentation on measurement priorities
Slide 29: // disorder increases with time
// disorder increases with time
because we measure time in the
direction in which disorder increases //
-- stephen hawking
www.caida.org

