Skip to main content

"bad data better than no data?"

Archived MagicPoint presentation slides, compiled into a single PDF document.

2000_darpa0010.pdf (61 slides, 3.7 MB)

Slide text transcript

Slide 1: bad data better than no data?

bad data better than no data?
observations on our (in)ability
to accurately predict, analyze
or even measure conditions
on the global Internet

3 oct 2000


           the so-called science of poll-taking is not a science at all but a mere necromancy. 
           people are unpredictable by nature, & though you can take a nation's pulse, 
           you can't be sure that the nation hasn't just run up a flight of stairs. 
                       --E. B. White New Yorker, Nov 1948.

kc@caida.org 
ucsd/sdsc/caida

Slide 2: Internet's resistance to modeling/measurement

Internet's resistance to modeling/measurement


evolution-based (good!) reasons
protocols, technologies, applications
independently developed and deployed
by no means synergistic
by all accounts rapid
`punctuated' but no equilibrium 
"have done fine without modeling so far"
                (let's wait till modeling cheaper than bandwidth)

but simulation/analysis validation 
(& lately other stuff) needs data
right granularities hard to come by
measurement technology just not there
argument for it also not there
"helps everyone", but who pays?
losing battle?

Slide 3

Internet's resistance to measurement
                

measurement tools lack

well-defined traffic metrics
e.g supporting SLAs, QOS, billing
uniformly applied methodologies
varied topologies, equipment, ISP practices
scalability
ability to explain phenomena
topology changes, routing loops, black holes
relevance to actual ISP problems or mechanisms for fixing 
communication of useful results

Slide 4: Internet measurement taxonomy

Internet measurement taxonomy


topology  (circulatory/respiratory)
performance (physiology/psychology)
workload  (cardiovascular/GI)
routing   (neuroscience)


   correlation essentially non-explored
     .....(holistic Internet measurement?)

       
                                     





buy big posters peacockmaps.com

Slide 5: topology: caida's skitter

topology:  caida's skitter


track/depict topology cross-sections 
22 monitors (inc. some root name servers)
forward IP path and round-trip delay 
tens of thousands of dst (multiple lists)
remove targets that complain

architecture
continuous, parallel 52-byte ICMP probes
depending on dst list size, 0.3 to 200 probes/day/dst
kernel time stamping

correlate path perf. w events, e.g. BGP
identify critical pieces of infrastructure
case studies of relevant cross-sections

Slide 6: other active (probed) data sets

other active (probed) data sets

MOAT: http://amp.nlanr.net 
HPC sites
RTT, traceroutes

I2's Surveyor: http://www.advanced.org/surveyor/
I2 sites
one-way delay, paths

vBNS  http://www.vbns.net:8080/stats/ 

SLAC, NIMI, XIWT, RIPE, others (too many)

http://www.caida.org/tools/taxonomy/
http://atlas.caida.org/

Slide 7: skitter: colored by countries

skitter: colored by countries

Slide 8: topology vis: geographic mapping

topology vis:  geographic mapping


difficult data analysis
requires mapping of thousands (millions?) of nodes to latitude/longitude coordinates

NetGeo service designed to help
http://netgeo.caida.org

backbones require company-specific heuristics

DNS registry growth is problematic
no common data formats

Slide 9: GTrace: geographic traceroute

GTrace: geographic traceroute

www.caida.org/Tools/GTrace/

Slide 10: topology mapping: interface merging

topology mapping: interface merging


26 sept 2000, 18 hours 
360k interfaces, 505k probes
responses
joins: 29893
new i/fs: 2692
nodes with >1 i/fs: 18556
i/faces on nodes:  48005
single interfaces: 311663

Slide 11: hyperbolic viewer (java 3D, 100,000s nodes)

hyperbolic viewer (java 3D, 100,000s nodes)

        from riesling skitter monitor in san diego
        54,893 nodes, 54,892 tree links
        spanning tree from src

Slide 12: hyperbolic viewer (java 3D, 100,000s nodes)

hyperbolic viewer (java 3D, 100,000s nodes)

        from riesling skitter monitor in san diego
        54,893 nodes, 54,892 tree links 
        24,517 non-tree links (transparent)

Slide 13: hyperbolic viewer (java 3D, 100,000s nodes)

hyperbolic viewer (java 3D, 100,000s nodes)

        from riesling skitter monitor in san diego
        54,893 nodes 54,892 tree links
        24,517 non-tree links (less transparent)

Slide 14: hyperbolic viewer (java 3D, 100,000s nodes)

hyperbolic viewer (java 3D, 100,000s nodes)

        from london skitter monitor in 
        535,102 nodes, 535,101 tree links
        66,577 nontree links

Slide 15: hyperbolic viewer (java 3D, 100,000s nodes)

hyperbolic viewer (java 3D, 100,000s nodes)

        from london skitter monitor in 
        535,102 nodes, 535,101 tree links
        66,577 nontree links (transparent)

Slide 16: hyperbolic viewer (java 3D, 100,000s nodes)

hyperbolic viewer (java 3D, 100,000s nodes)

        from london skitter monitor in 
        535,102 nodes, 535,101 tree links
        66,577 nontree links (less transparent)

Slide 17: skitter case study: DNS roots

skitter case study: DNS roots

RSSAC, DNS technical advisory committee to ICANN

goal: optimize root nameserver location
co-locate skitter hosts w root servers
demonstrate root server performance in serving target community
develop techniques for evaluating architectual optimality for root server placement 
visualization to correlate data sources/types

collaborative project to encourage proactive participation (network operators, researchers, others) 

(www.caida.org/tools/measurement/skitter/)

Slide 18: skitter case study: DNS roots

skitter case study: DNS roots


get roots instrumented
gather/analyze client lists 
correlation among different sources
determine of connectivity metrics 
closeness
redundancy
persistence of paths
how many clients not secondaries
skitter to client sets from non-root sources

Slide 19: skitter: rtt distribution: tri-modal

skitter: rtt distribution: tri-modal

Slide 20: skitter: rtt vs longitude (light cone)

skitter: rtt vs longitude (light cone)

Slide 21: skitter: rtt vs longitude (light cone)

skitter: rtt vs longitude (light cone)

Slide 22: skitter: modeling rtt along a single path

skitter: modeling rtt along a single path


champagne.caida.org (UIUC) monitor 

fewest destinations, 
most probes/dst per day, 1/5min/dst  (270)
63 days 08 July - 08 Sept 2000
exactly 9 weeks, sun-sat
only use stable subsets of paths
75% responsive, 33% of paths same per day
min 5000 paths same

     --> gives us 168 pairs to work with

Slide 23: skitter: modeling rtt along a single path

skitter: modeling rtt along a single path

Slide 24: skitter: modeling rtt along a one-hop path

skitter: modeling rtt along a one-hop path

Slide 25: skitter: modeling rtt along a five-hop path

skitter: modeling rtt along a five-hop path

Slide 26: skitter: modeling rtt along a five-hop path

skitter: modeling rtt along a five-hop path

Slide 27: skitter: modeling rtt along a five-hop path

skitter: modeling rtt along a five-hop path

Slide 28: skitter: bimodal rtt along a 7-hop path

skitter: bimodal rtt along a 7-hop path

Slide 29: skitter: bimodal rtt along a 7-hop path

skitter: bimodal rtt along a 7-hop path

Slide 30: skitter: bimodal rtt along a 7-hop path

skitter: bimodal rtt along a 7-hop path

Slide 31: skitter: bimodal rtt along 7-hop path

skitter: bimodal rtt along 7-hop path

Slide 32: skitter: non-independence of rtt for 7-hop path

skitter: non-independence of rtt for 7-hop path

Slide 33: skitter: dispersion among ASes across paths

skitter: dispersion among ASes across paths

Slide 34: skitter: AS dispersion across paths (sdsc)

skitter: AS dispersion across paths (sdsc)

Slide 35: skitter: other interesting studies

skitter: other interesting studies

non-shortestPath-ness of topology
    idealized topology (skit+ches graph)
    median/max # extra hops taken vs optimal 
    worst case: actual path 18 hops longer than shortest path

Slide 36: skitter on-going daily summaries

skitter on-going daily summaries


http://www.caida.org/tools/measurement/skitter/summary

path length (in IP hops) distribution
RTT distribution
RTT versus longitude, 
path dispersion 
AS & country granularity

Slide 37: skitter on-going daily summaries

skitter on-going daily summaries

path lengths from yto/sjc.skitter to 35k dsts

Slide 38: skitter on-going daily summaries

skitter on-going daily summaries

rtts from yto/sjc.skitter to 35k dsts

Slide 39: skitter on-going daily summaries

skitter on-going daily summaries

rtt by region from sjc.skitter to 35k dsts

Slide 40: skitter on-going daily summaries

skitter on-going daily summaries

rtt by region from yto.skitter to 35k dsts

Slide 41

skitter on-going daily summaries
        
as dispersion graph from sjc/yto to 35k dsts, 090300

Slide 42

skitter on-going daily summaries
        
country dispersion graph from sjc/yto to 35k dsts, 090300

Slide 43: Internet workload

Internet workload 

many uses
capacity planning 
performance and QOS assurance across ISPs
accounting/billing
security management

measurement tools
router-based (cflowd, netflow)
stand-alone monitors (coral,skitter)

visualization huge challenge
too much data 
noone correlates across/with much

evolution requires use
envisioning new methods?
better data correlation tools are essential

Slide 44: available data: (passive) header traces

available data: (passive) header traces


coral: oc3/oc12 `real' networks
HPC sites: http://moat.nlanr.net/Traces/

tcpdump: campus/corporate sites
        http://ita.ee.lbl.gov/html/traces.html

have given to NDA'ed researchers
        vBNS, ATT

Slide 45: AMES packet size mean/median trend

AMES packet size mean/median trend

little change over 9 months
unsurprising as long as TCP dominates

Slide 46: workload: AIX-MAEW fragmentation

workload: AIX-MAEW fragmentation

relevant to recent IP traceback techniques [Savage00]
definitely on rise (from UDP) at AIX
almost no TCP frags (MTU disc + small pkts)

Slide 47: workload: CERFNET fragmentation

workload: CERFNET fragmentation

size of first fragment seen in series on UCSD link
component of distribtion is uniform/random

Slide 48: workload: AIX fragmentation

workload: AIX fragmentation

size of first fragment seen in series on AMES AIX link

Slide 49: workload: top applications in/out

workload: top applications in/out

Slide 50: workload: other recent work

workload: other recent work


UWisconsin flowscan (dave plonka)
http://net.doit.wisc.edu/data/flow/size/

netramet, (nevil brownlee, nz)
http://www.caida.org/analysis/workload/netramet/

nextra (alexander kunz, .de)
http://flowstats.nextra.de/graphs/

coralreef (david moore, ucsd)
https://anala.caida.org/CoralReef/Demos/

Slide 51: workload data: meta-challenges

workload data:  meta-challenges

splintered & competitive core 
limited access to data
so difficult to argue `representativeness'

network performance impact 
higher b/w increasing difficult to measure
faster speeds and changing transport technologies complicate data acquisition and processing 
e.g. monitor gone when AIX converts to POS

user privacy volatile issue
hard to get data in researchers hands

    CAIDA's UCSD/CERFnet link monitor available:
    https://anala.caida.org/CoralReef/Demos/

Slide 52: workload data: challenges

workload data:  challenges


id and present `useful' workload metrics, particularly given persistence of fire-fighting environment

id significant patterns, timeframes, correlations
vary by user need
change as technologies and 'net change

methodology has many weaknesses 
dynamic port negotiation (napster)
tons of `other' ports unmapped
ports not really assurance/unique anyway
IPSEC blows away ports anyway
need traffic profiling 
things getting worse not better here

Slide 53: routing & addressing data

routing & addressing data


things getting slightly better (data-wise)
but BGP/infrastructure getting worse faster

not much real-time instrumentation on routers

UO's route-views  http://www.antc.uoregon.edu/route-views/

Merit's IPMA http://www.merit.edu/ipma/

Slide 54: skitter: AS interconnectivity

skitter: AS interconnectivity

Slide 55: routing: address consumption

routing: address consumption 

prefix length distribution for routes announced by core ISPs, 1-6/1998 (courtesy NLANR/MOAT, Jeff Brown)

Slide 56: routing: address usage of *traffic* sample

routing: address usage of *traffic* sample

32x32 `bitmap' matrix of address space 
height is % packets with src IP in that address block

Slide 57: routing: research priorities

routing:  research priorities 

better IP routing instrumentation 
real-time analysis without interfering with performance
realistic inter-domain routing models

tasks
identification/vis of flaps, outages, critical paths
correlation performance with some measure of path `length'
comparison of forward path with
BGP path
shortest path
reverse path
effects of unicast/multicast incongruities?

Slide 58

routing:  research obstacles


routes may change faster than ability to measure or analyze
sometimes on purpose (load-balancing)

poorly instrumented infrastructure (new tools needed)

prudent security dictates inhibiting research 

mapping IP address to anything (deja vu)

Slide 59: now what?

now what?  


the ideal:
well-instrumented infrastructure 
seamless integration of variety of data sources
important for simulation/prediction (& lately, operations)
but unlikely for the foreseeable future

tools still need:
interpret of vast quantities of data in real-time
geographically & logically distributed 
user-friendly integration with network utilities 
       and control systems
inter- & intra-ISP feature detection
new methods for data collection, reduction, 
       aggregation, and mining (GByte or Tbyte datasets)

Slide 60: setting expectations

setting expectations


rule 1: no magic data sets
(not so far anyway)

                             the so-called science of poll-taking 
                             is not a science at all 
                             but a mere necromancy. 
                             people are unpredictable by nature, 
                             & though you can take a nation's pulse, 
                             you can't be sure that the nation 
                             hasn't just run up a flight of stairs. 
                                    --E. B. White New Yorker, Nov 1948.

Slide 61: www.caida.org/Presentations/

www.caida.org/Presentations/

kc claffy
UCSD/SDSC/CAIDA
kc@caida.org
www.caida.org

Related Objects

See https://catalog.caida.org/media/2000_darpa0010/ to explore related objects to this document in the CAIDA Resource Catalog.