"bad data better than no data?"
Archived MagicPoint presentation slides, compiled into a single PDF document.
2000_darpa0010.pdf (61 slides, 3.7 MB)
Slide text transcript
Slide 1: bad data better than no data?
observations on our (in)ability
to accurately predict, analyze
or even measure conditions
on the global Internet
3 oct 2000
the so-called science of poll-taking is not a science at all but a mere necromancy.
people are unpredictable by nature, & though you can take a nation's pulse,
you can't be sure that the nation hasn't just run up a flight of stairs.
--E. B. White New Yorker, Nov 1948.
kc@caida.org
ucsd/sdsc/caida
Slide 2: Internet's resistance to modeling/measurement
evolution-based (good!) reasons
protocols, technologies, applications
independently developed and deployed
by no means synergistic
by all accounts rapid
`punctuated' but no equilibrium
"have done fine without modeling so far"
(let's wait till modeling cheaper than bandwidth)
but simulation/analysis validation
(& lately other stuff) needs data
right granularities hard to come by
measurement technology just not there
argument for it also not there
"helps everyone", but who pays?
losing battle?
Slide 3: Internet's resistance to measurement
measurement tools lack
well-defined traffic metrics
e.g. supporting SLAs, QOS, billing
uniformly applied methodologies
varied topologies, equipment, ISP practices
scalability
ability to explain phenomena
topology changes, routing loops, black holes
relevance to actual ISP problems or mechanisms for fixing
communication of useful results
Slide 4: Internet measurement taxonomy
topology (circulatory/respiratory)
performance (physiology/psychology)
workload (cardiovascular/GI)
routing (neuroscience)
correlation essentially non-explored
.....(holistic Internet measurement?)
buy big posters peacockmaps.com
Slide 5: topology: caida's skitter
track/depict topology cross-sections
22 monitors (inc. some root name servers)
forward IP path and round-trip delay
tens of thousands of dst (multiple lists)
remove targets that complain
architecture
continuous, parallel 52-byte ICMP probes
depending on dst list size, 0.3 to 200 probes/day/dst
kernel time stamping
correlate path perf. w events, e.g. BGP
identify critical pieces of infrastructure
case studies of relevant cross-sections
Slide 6: other active (probed) data sets
MOAT: http://amp.nlanr.net
HPC sites
RTT, traceroutes
I2's Surveyor: http://www.advanced.org/surveyor/
I2 sites
one-way delay, paths
vBNS: http://www.vbns.net:8080/stats/
SLAC, NIMI, XIWT, RIPE, others (too many)
http://www.caida.org/tools/taxonomy/
http://atlas.caida.org/
Slide 7: skitter: colored by countries
Slide 8: topology vis: geographic mapping
difficult
data analysis requires mapping of thousands (millions?) of nodes to latitude/longitude coordinates
NetGeo service designed to help
http://netgeo.caida.org
backbones require company-specific heuristics
DNS registry growth is problematic
no common data formats
Slide 9: GTrace: geographic traceroute
www.caida.org/Tools/GTrace/
Slide 10: topology mapping: interface merging
26 sept 2000, 18 hours
360k interfaces, 505k probes responses
joins: 29893
new i/fs: 2692
nodes with >1 i/fs: 18556
i/faces on nodes: 48005
single interfaces: 311663
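The counts on this slide appear to be mutually consistent. A minimal arithmetic check in Python, assuming that "i/faces on nodes" counts the interfaces merged into multi-interface nodes and that each "single interface" stands for one node (both readings are assumptions, not stated on the slide):

```python
# Counts copied from the slide (26 sept 2000 run).
multi_iface_nodes = 18556   # nodes with >1 i/fs
ifaces_on_nodes = 48005     # i/faces on nodes
single_ifaces = 311663      # single interfaces

# Assumption: every interface is either merged into a node or left single.
total_interfaces = ifaces_on_nodes + single_ifaces

# Assumption: each unmerged interface counts as its own node.
total_nodes = multi_iface_nodes + single_ifaces

mean_ifaces_per_merged_node = ifaces_on_nodes / multi_iface_nodes

print(total_interfaces)                       # 359668, in line with "360k interfaces"
print(total_nodes)                            # 330219
print(round(mean_ifaces_per_merged_node, 2))  # 2.59
```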
Slide 11: hyperbolic viewer (java 3D, 100,000s nodes)
from riesling skitter monitor in san diego
54,893 nodes, 54,892 tree links
spanning tree from src
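A spanning tree over n reachable nodes always has n - 1 links, which matches the 54,893 nodes and 54,892 tree links above. A minimal sketch of how tree and non-tree links could be separated, assuming an undirected adjacency list and a breadth-first traversal from the monitor (the function name, the toy graph, and the choice of BFS are illustrative assumptions, not the viewer's actual method):

```python
from collections import deque

def spanning_tree(adjacency, src):
    """Split the links reachable from src into tree links and non-tree links."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in parent:
                parent[neighbor] = node   # first visit defines the tree link
                queue.append(neighbor)
    tree_links = {frozenset((child, par))
                  for child, par in parent.items() if par is not None}
    all_links = {frozenset((a, b)) for a in parent for b in adjacency[a]}
    return tree_links, all_links - tree_links

# Hypothetical toy topology: 4 nodes, 4 links.
toy = {
    "src": ["a", "b"],
    "a": ["src", "b", "c"],
    "b": ["src", "a"],
    "c": ["a"],
}
tree, non_tree = spanning_tree(toy, "src")
print(len(tree), len(non_tree))   # 3 1
```

Every link that is not chosen as a tree link ends up in the non-tree set, which corresponds to the separately drawn (transparent) links on the following slides.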
Slide 12: hyperbolic viewer (java 3D, 100,000s nodes)
from riesling skitter monitor in san diego
54,893 nodes, 54,892 tree links
24,517 non-tree links (transparent)
Slide 13: hyperbolic viewer (java 3D, 100,000s nodes)
from riesling skitter monitor in san diego
54,893 nodes, 54,892 tree links
24,517 non-tree links (less transparent)
Slide 14: hyperbolic viewer (java 3D, 100,000s nodes)
from london skitter monitor in
535,102 nodes, 535,101 tree links
66,577 nontree links
Slide 15: hyperbolic viewer (java 3D, 100,000s nodes)
from london skitter monitor in
535,102 nodes, 535,101 tree links
66,577 nontree links (transparent)
Slide 16: hyperbolic viewer (java 3D, 100,000s nodes)
from london skitter monitor in
535,102 nodes, 535,101 tree links
66,577 nontree links (less transparent)
Slide 17: skitter case study: DNS roots
RSSAC, DNS technical advisory committee to ICANN
goal: optimize root nameserver location
co-locate skitter hosts w root servers
demonstrate root server performance in serving target community
develop techniques for evaluating architectural optimality for root server placement
visualization to correlate data sources/types
collaborative project to encourage proactive participation (network operators, researchers, others)
(www.caida.org/tools/measurement/skitter/)
Slide 18: skitter case study: DNS roots
get roots instrumented
gather/analyze client lists
correlation among different sources
determine connectivity metrics
closeness
redundancy
persistence of paths
how many clients not secondaries
skitter to client sets from non-root sources
Slide 19: skitter: rtt distribution: tri-modal
Slide 20: skitter: rtt vs longitude (light cone)
Slide 21: skitter: rtt vs longitude (light cone)
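The "light cone" label presumably refers to the speed-of-light lower bound on round-trip time: no measured rtt can fall below the time light needs to cover the distance twice. A minimal sketch of that bound, assuming great-circle distance on a spherical Earth and vacuum light speed (the function names and the example coordinates are illustrative assumptions; real fiber paths are slower and longer, so the bound is loose):

```python
import math

EARTH_RADIUS_KM = 6371.0          # mean Earth radius
C_VACUUM_KM_PER_S = 299_792.458   # speed of light in vacuum

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def min_rtt_ms(lat1, lon1, lat2, lon2, speed_km_per_s=C_VACUUM_KM_PER_S):
    """Lower bound on round-trip time: twice the distance at the given speed."""
    distance = great_circle_km(lat1, lon1, lat2, lon2)
    return 2 * distance / speed_km_per_s * 1000.0

# Two hypothetical points on the equator, 90 degrees of longitude apart.
bound = min_rtt_ms(0.0, 0.0, 0.0, 90.0)
print(round(bound, 1))   # 66.8
```

Passing a smaller `speed_km_per_s` (for example roughly two thirds of the vacuum value, a common rule of thumb for fiber) gives a tighter but less certain bound.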
Slide 22: skitter: modeling rtt along a single path
champagne.caida.org (UIUC) monitor
fewest destinations,
most probes/dst per day, 1/5min/dst (270)
63 days 08 July - 08 Sept 2000
exactly 9 weeks, sun-sat
only use stable subsets of paths
75% responsive, 33% of paths same per day
min 5000 paths same
--> gives us 168 pairs to work with
Slide 23: skitter: modeling rtt along a single path
Slide 24: skitter: modeling rtt along a one-hop path
Slide 25: skitter: modeling rtt along a five-hop path
Slide 26: skitter: modeling rtt along a five-hop path
Slide 27: skitter: modeling rtt along a five-hop path
Slide 28: skitter: bimodal rtt along a 7-hop path
Slide 29: skitter: bimodal rtt along a 7-hop path
Slide 30: skitter: bimodal rtt along a 7-hop path
Slide 31: skitter: bimodal rtt along 7-hop path
Slide 32: skitter: non-independence of rtt for 7-hop path
Slide 33: skitter: dispersion among ASes across paths
Slide 34: skitter: AS dispersion across paths (sdsc)
Slide 35: skitter: other interesting studies
non-shortestPath-ness of topology
idealized topology (skit+ches graph)
median/max # extra hops taken vs optimal
worst case: actual path 18 hops longer than shortest path
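The "extra hops vs optimal" measure can be sketched as the observed hop count minus the shortest hop count between the same endpoints in the merged topology graph. A minimal illustration, assuming an undirected adjacency list and breadth-first search (the function names and the toy graph are hypothetical, not the study's actual code or data):

```python
from collections import deque
from statistics import median

def shortest_hops(adjacency, src, dst):
    """Fewest hops from src to dst by breadth-first search, or None if unreachable."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return None

def extra_hops(adjacency, observed_path):
    """Hops taken by the observed path beyond the shortest path in the graph."""
    best = shortest_hops(adjacency, observed_path[0], observed_path[-1])
    return (len(observed_path) - 1) - best

# Hypothetical toy graph: s-a, a-b, b-d, plus a direct s-d link.
toy = {
    "s": ["a", "d"],
    "a": ["s", "b"],
    "b": ["a", "d"],
    "d": ["b", "s"],
}
observed = [["s", "a", "b", "d"], ["s", "d"], ["s", "a", "b"]]
extras = [extra_hops(toy, path) for path in observed]
print(extras, median(extras), max(extras))   # [2, 0, 0] 0 2
```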
Slide 36: skitter on-going daily summaries
http://www.caida.org/tools/measurement/skitter/summary
path length (in IP hops) distribution
RTT distribution
RTT versus longitude, path dispersion
AS & country granularity
Slide 37: skitter on-going daily summaries
path lengths from yto/sjc.skitter to 35k dsts
Slide 38: skitter on-going daily summaries
rtts from yto/sjc.skitter to 35k dsts
Slide 39: skitter on-going daily summaries
rtt by region from sjc.skitter to 35k dsts
Slide 40: skitter on-going daily summaries
rtt by region from yto.skitter to 35k dsts
Slide 41: skitter on-going daily summaries
as dispersion graph from sjc/yto to 35k dsts, 090300
Slide 42: skitter on-going daily summaries
country dispersion graph from sjc/yto to 35k dsts, 090300
Slide 43: Internet workload
many uses
capacity planning
performance and QOS assurance across ISPs
accounting/billing
security management
measurement tools
router-based (cflowd, netflow)
stand-alone monitors (coral, skitter)
visualization huge challenge
too much data
no one correlates across/with much
evolution requires use envisioning new methods?
better data correlation tools are essential
Slide 44: available data: (passive) header traces
coral: oc3/oc12 `real' networks
HPC sites: http://moat.nlanr.net/Traces/
tcpdump: campus/corporate sites
http://ita.ee.lbl.gov/html/traces.html
have given to NDA'ed researchers
vBNS, ATT
Slide 45: AMES packet size mean/median trend
little change over 9 months
unsurprising as long as TCP dominates
Slide 46: workload: AIX-MAEW fragmentation
relevant to recent IP traceback techniques [Savage00]
definitely on rise (from UDP) at AIX
almost no TCP frags (MTU disc + small pkts)
Slide 47: workload: CERFNET fragmentation
size of first fragment seen in series on UCSD link
component of distribution is uniform/random
Slide 48: workload: AIX fragmentation
size of first fragment seen in series on AMES AIX link
Slide 49: workload: top applications in/out
Slide 50: workload: other recent work
UWisconsin flowscan (dave plonka)
http://net.doit.wisc.edu/data/flow/size/
netramet (nevil brownlee, nz)
http://www.caida.org/analysis/workload/netramet/
nextra (alexander kunz, .de)
http://flowstats.nextra.de/graphs/
coralreef (david moore, ucsd)
https://anala.caida.org/CoralReef/Demos/
Slide 51: workload data: meta-challenges
splintered & competitive core
limited access to data
so difficult to argue `representativeness'
network performance impact
higher b/w increasingly difficult to measure
faster speeds and changing transport technologies complicate data acquisition and processing
e.g. monitor gone when AIX converts to POS
user privacy volatile issue
hard to get data in researchers' hands
CAIDA's UCSD/CERFnet link monitor available:
https://anala.caida.org/CoralReef/Demos/
Slide 52: workload data: challenges
id and present `useful' workload metrics, particularly given persistence of fire-fighting environment
id significant patterns, timeframes, correlations
vary by user need
change as technologies and 'net change
methodology has many weaknesses
dynamic port negotiation (napster)
tons of `other' ports unmapped
ports not really assurance/unique anyway
IPSEC blows away ports anyway
need traffic profiling
things getting worse not better here
Slide 53: routing & addressing data
things getting slightly better (data-wise)
but BGP/infrastructure getting worse faster
not much real-time instrumentation on routers
UO's route-views
http://www.antc.uoregon.edu/route-views/
Merit's IPMA
http://www.merit.edu/ipma/
Slide 54: skitter: AS interconnectivity
Slide 55: routing: address consumption
prefix length distribution for routes announced by core ISPs, 1-6/1998
(courtesy NLANR/MOAT, Jeff Brown)
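A prefix length distribution of this kind can be tallied directly from a list of announced routes. A minimal sketch using Python's standard `ipaddress` module; the sample prefixes are made-up placeholders (documentation and private ranges), not the 1998 data behind the slide:

```python
import ipaddress
from collections import Counter

def prefix_length_distribution(prefixes):
    """Count how many announced prefixes fall at each prefix length."""
    return Counter(ipaddress.ip_network(p).prefixlen for p in prefixes)

# Hypothetical sample of announced routes.
sample_routes = [
    "192.0.2.0/24",
    "198.51.100.0/24",
    "203.0.113.0/24",
    "10.0.0.0/8",
    "172.16.0.0/12",
]
dist = prefix_length_distribution(sample_routes)
for length in sorted(dist):
    print(f"/{length}: {dist[length]}")
```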
Slide 56: routing: address usage of *traffic* sample
32x32 `bitmap' matrix of address space
height is % packets with src IP in that address block
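One way such a matrix could be built: a 32x32 grid has 1024 cells, which matches the top 10 bits of an IPv4 address. The sketch below assumes the row is the top 5 bits and the column the next 5 bits; that layout, the function name, and the sample addresses are all assumptions, since the slide does not say how blocks map to cells.

```python
import ipaddress

def address_bitmap(src_addresses):
    """Return a 32x32 grid of the % of packets whose src IP falls in each block."""
    counts = [[0] * 32 for _ in range(32)]
    for addr in src_addresses:
        value = int(ipaddress.IPv4Address(addr))
        row = value >> 27            # assumed: top 5 bits select the row
        col = (value >> 22) & 0x1F   # assumed: next 5 bits select the column
        counts[row][col] += 1
    total = len(src_addresses)
    if total == 0:
        return [[0.0] * 32 for _ in range(32)]
    return [[100.0 * c / total for c in row] for row in counts]

# Hypothetical source addresses, one entry per observed packet.
packets = ["10.0.0.1", "10.0.0.2", "192.0.2.1", "198.51.100.7"]
grid = address_bitmap(packets)
print(grid[1][8], grid[24][0], grid[24][24])   # 50.0 25.0 25.0
```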
Slide 57: routing: research priorities
better IP routing instrumentation
real-time analysis without interfering with performance
realistic inter-domain routing models
tasks
identification/vis of flaps, outages, critical paths
correlation of performance with some measure of path `length'
comparison of forward path with
BGP path
shortest path
reverse path
effects of unicast/multicast incongruities?
Slide 58: routing: research obstacles
routes may change faster than ability to measure or analyze
sometimes on purpose (load-balancing)
poorly instrumented infrastructure (new tools needed)
prudent security dictates inhibiting research
mapping IP address to anything (deja vu)
Slide 59: now what?
the ideal:
well-instrumented infrastructure
seamless integration of variety of data sources
important for simulation/prediction (& lately, operations)
but unlikely for the foreseeable future
tools still need:
interpretation of vast quantities of data in real-time
geographically & logically distributed
user-friendly integration with network utilities
and control systems
inter- & intra-ISP feature detection
new methods for data collection, reduction,
aggregation, and mining (GByte or Tbyte datasets)
Slide 60: setting expectations
rule 1: no magic data sets
(not so far anyway)
the so-called science of poll-taking
is not a science at all
but a mere necromancy.
people are unpredictable by nature,
& though you can take a nation's pulse,
you can't be sure that the nation
hasn't just run up a flight of stairs.
--E. B. White New Yorker, Nov 1948.
Slide 61: www.caida.org/Presentations/
kc claffy
UCSD/SDSC/CAIDA
kc@caida.org
www.caida.org

