
priorities and challenges in Internet measurement, simulation, and analysis

Archived MagicPoint presentation slides, compiled into a single PDF document.

2003_nsfpi0301.pdf (27 slides, 2.7 MB)

Slide text transcript

Slide 1: priorities and challenges in Internet measurement, simulation, and analysis

priorities and challenges in 
Internet measurement, 
simulation, and analysis 


// somebody has to do something, and it's just 
 incredibly pathetic that it has to be us. //
                         -- jerry garcia 
                        


9 january 2003
nsf pi meeting reston virginia
kc

Slide 2: Jan 2003 on Internet mailing list

Jan 2003 on Internet mailing list



http://www.postel.org/pipermail/end2end-interest/2003-January/002720.html

    I believe (based solely on a single long-ago observation that an M/M/1/K
    queuing model seemed to predict the measured behaviour of core routers
    with short buffers pretty accurately) that the speculation in your third
    sentence above may in fact be true.  To relate this to another recent
    thread on this list, however, this seems like one of those things that
    should require no speculation since it is not so difficult to measure, yet
    I know of no good-quality data from "typical" core circuits which has been
    published anywhere.  The network we've built is constructed with insufficient
    instrumentation to enable us to understand what it is we've built with any
    certainty, so we speculate.
                            ---- dennis@juniper.net
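The post leans on the M/M/1/K queuing model. For reference, here is a minimal sketch (ours, not from the post) of that model's standard loss probability, P_K = (1-rho) rho^K / (1 - rho^(K+1)), where rho is the offered load and K is the system capacity in packets. It assumes Poisson arrivals and exponential service times, which is exactly the kind of assumption the post says should be checked against measured core data.

```python
def mm1k_loss_probability(rho: float, k: int) -> float:
    """Probability that an arriving packet finds an M/M/1/K system full.

    rho: offered load (arrival rate / service rate)
    k:   system capacity in packets (buffer plus the packet in service)
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if rho == 1.0:
        # Limit of the general formula as rho -> 1: all K+1 states equally likely.
        return 1.0 / (k + 1)
    return (1 - rho) * rho**k / (1 - rho ** (k + 1))


if __name__ == "__main__":
    # Loss at 90% load for short versus long buffers.
    for k in (5, 20, 100):
        print(k, mm1k_loss_probability(0.9, k))
```

Whether real core links match these numbers is the open empirical question; the formula only says what the model predicts.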

Slide 3: Internet windmills needing data support

Internet windmills needing data support

topology, topography
connectivity and stability 
routing
dynamics
performance 
diagnostics
bandwidth estimation
congestion avoidance
qos
workload characterization 
traffic flow analysis & modeling
security
anomaly detection
IP traceback
macroscopic level measurement
dos attacks 
dns system

middleware layers
feedback to system at various granularities (dynamics and trends)
new protocol development: routing, transport, IP
feedback to future operational measurement methodologies

Slide 4: why is it so hard to do research w real data?

why is it so hard to do research w real data?


short answer: there hasn't been any since 1995 
long answer: there's way too much data floating around
disadvantage: inappropriate data can be distracting or worse
advantage: publishing inappropriate data can incent people to offer you better data (`desperate times call for desperate measures' methodology)

2 outstanding talks about problems w Internet data
vern's talk aug2001 www.icir.org/vern/talks/vp-nrdm01.ps.gz
david's talk apr2002 www.caida.org/publications/presentations/

Slide 5: two approaches to measurement (vern)

two approaches to measurement (vern)


(1) research-driven
for specific goal
notion may be flawed
may design data to support conclusion

(2) for-the-sake-of-measurement
needed for longitudinal studies
may foster serendipity
often useless data, rife with errors
        (problem with public measurement repositories)

Slide 6: unsettling admissions about dealing w data

unsettling admissions about dealing w data 

(courtesy vern paxson, david moore)
bizarre behavior, misconfigurations, non-RFC, attacks, `impossible' behavior
measurement tools lie 
(packet filters drop, reorder, replicate, miss due to routing)
clocks can be arbitrarily off/drifting; timestamps carry no indication of their accuracy and are applied at different points by different tools
app-level measurement tools miss hidden network stuff (middleware, socket buffer parameters)
asymmetric paths
measurements made 2 different ways always disagree (anisotropic)
even a single measurement may disagree (not atomic): routing tables, traceroute
events ripple through network along trajectory that is unlikely fully instrumented
measurements carry no indication of quality
measurements lack meta-info (e.g., hostnames)
representative data points - there is no typical on the Internet
analysis results not reproducible
lack a culture of calibration
large-scale measurements required for repr/longitudinal analysis overwhelm our current methods
archived data often ad hoc, corrupt, truncated, poorly documented, unnavigable
lack of historical data renders it difficult to assess trends
alas, people do it anyway, see kc's myths talk (or any trade rag)
Internet measurement, although too hard, is too easy
not enough data and too much data
we don't yet know how to measure real traffic in the core
speed, sampling, anonymization
can't keep up with media in core (oc12 monitor arrives right after upgrade to oc48)

Slide 7: intra-scientist irreproducibility

intra-scientist irreproducibility


 missing: systematic approach to reduction/analysis
 e.g., paper trail for analysis train, especially for bugs

  what we don't yet require of ourselves and each other:
keep master script of analysis results
keep intermediate forms of data
keep notebook of what was done
version control for scripts and notebook
ways to visualize what's changed in analysis results after re-run

  (recently funded meta-repository project will try to facilitate 
   the cultivation of a culture of calibration.  long road.)
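One piece of that wish list, seeing what changed in the analysis results after a re-run, can be approximated cheaply. A minimal sketch, assuming (hypothetically) that each run of the master script writes a flat JSON mapping of metric names to values; the file layout and function names are illustrative only.

```python
import json
from pathlib import Path


def load_results(path):
    """Load a flat {metric_name: value} mapping written by one analysis run (hypothetical layout)."""
    return json.loads(Path(path).read_text())


def diff_results(old, new, rel_tol=1e-9):
    """Return (added, removed, changed) keys between two result mappings.

    Numeric values count as changed only if they differ by more than rel_tol
    times the larger magnitude (treated as an absolute tolerance below 1).
    """
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = {}
    for key in sorted(set(old) & set(new)):
        a, b = old[key], new[key]
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            if abs(a - b) > rel_tol * max(abs(a), abs(b), 1.0):
                changed[key] = (a, b)
        elif a != b:
            changed[key] = (a, b)
    return added, removed, changed
```

Committing both the script and each run's result file to version control, then diffing consecutive runs this way, gives a crude paper trail when a bug fix moves a number.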

Slide 8: in spite of it all, amazing stuff has emerged

in spite of it all, amazing stuff has emerged 

topology mapping/inference: caida/ucsd, uw, and icir
simulation: ns (icir,isi...), renesys bgp/ssfnet
bgp/security measurement and analysis: renesys
tcp analysis: icir, cambridge (frank kelly, glenn vinnicombe), ucla (paganini), caltech
dns: gatech (ewz), mit (hari), caida
passive measurement/analysis: sprint, icir, caida, att
dragonflies/tortoises, self-similarity
macroscopic analysis: dns roots, dos attacks
invariance of rtt spectrum 
evolution of bgp system at AS, prefix, and IP granularities
introduce semi-global prefix and related taxonomy
dispel myths of bgp growth and churn (rates and contributors)
route dampening considered harmful
using bgp beacon (unused prefix announced and withdrawn at well-known times)
lots about topology/policy/event inferences from bgp data
patrignani, gao, schulzrinne, nicol, feamster
bgp convergence and scalability (anja&olaf)
udp/tcp performance during bgp update activity (avi)
comparison of routeviews and ripe data for bgp analysis (agilent, caida)
data sets seem congruent, both need improvement/coordination
traffic modeling
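The bgp beacon item works because announcement and withdrawal times are known in advance, so any observed update can be timed against its scheduled trigger. A minimal sketch, assuming a hypothetical schedule (announce at 00:00 UTC and every 4 hours after, withdraw 2 hours after each announcement); substitute the real schedule of whatever beacon is being studied.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical beacon schedule: announce every PERIOD starting at 00:00 UTC,
# withdraw WITHDRAW_OFFSET after each announcement.
PERIOD = timedelta(hours=4)
WITHDRAW_OFFSET = timedelta(hours=2)


def last_scheduled_event(ts):
    """Return (kind, scheduled_time) of the most recent beacon event at or before ts."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    cycle_start = midnight + PERIOD * ((ts - midnight) // PERIOD)
    withdraw_time = cycle_start + WITHDRAW_OFFSET
    if ts >= withdraw_time:
        return "withdraw", withdraw_time
    return "announce", cycle_start


def propagation_delay(observed_ts):
    """Delay between the scheduled beacon event and an update observed at observed_ts."""
    kind, scheduled = last_scheduled_event(observed_ts)
    return kind, observed_ts - scheduled


if __name__ == "__main__":
    seen = datetime(2003, 1, 9, 6, 1, 30, tzinfo=timezone.utc)
    print(propagation_delay(seen))
```

This naive attribution breaks if an update arrives more than one half-cycle late; it would then be charged to the wrong event.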

Slide 9: amazing++: ISPs even sort of care now

amazing++: ISPs even sort of care now

   (can't say the recession hasn't helped raise awareness of the value of data) 
example nanog meeting, oct 2002 (see nanog.org videoarchives) 
route dampening considered harmful (again)
damage in the DNS system 
bgp behavior under stress (lixia et al)
http://www.cs.ucla.edu/~lanw/paper/imw02-bgp.ps
bgp enhancements to prevent persistent route oscillations
http://www.nanog.org/mtg-0210/ppt/sue.pdf
feds want to secure cyberspace (office of cyberspace security)
http://www.whitehouse.gov/pcipb/
`how to own the Internet' 
panels on measurement, complexity/robustness spiral 
http://www.nanog.org/mtg-0210/complexity.html
http://www.nanog.org/mtg-0210/measurement.html
raised more questions than answered ( <- a good thing )
scriptroute: 'public measurement facility' proposal
www.scriptroute.org
now specifically solicit proposals from research community

Slide 10: nanog (oct2002): ISP/researcher interaction

nanog (oct2002): ISP/researcher interaction 


bgp enhancements to prevent persistent route oscillations
major factors for route oscillation:
dependency of IBGP updates
sometimes circular dependency
partial information by RR or confederation
withdrawals (over-reduction) amplify the issue
partial order (due to MED) in route selection
suggestions: modify route reflection spec, allow advertisement of multiple paths
issues:
non-deterministic routing can be demonstrated within full mesh topology
MEDs prevent reasoning about routing policies 
though existing references encourage use of MEDs to influence inbound policies
how much are we facing the inherent limitations of distance vector protocols?
can't tell which tie-breakers determine route selection
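The "partial order (due to MED)" point can be made concrete with a toy model. A minimal sketch that is deliberately NOT the full BGP decision process (it omits local preference, AS path length, origin, eBGP/iBGP preference and router ID): MED is compared only between routes from the same neighbor AS, otherwise the lower IGP cost wins. With three such routes the pairwise preference forms a cycle, so the winner depends on arrival order.

```python
from functools import reduce
from itertools import permutations
from typing import NamedTuple


class Route(NamedTuple):
    name: str
    neighbor_as: int
    med: int
    igp_cost: int


def better(x: Route, y: Route) -> Route:
    """Toy pairwise comparison: MED only within the same neighbor AS, else lowest IGP cost."""
    if x.neighbor_as == y.neighbor_as and x.med != y.med:
        return x if x.med < y.med else y
    return x if x.igp_cost <= y.igp_cost else y


def select(routes):
    """Compare routes one at a time in arrival order, keeping the current best."""
    return reduce(better, routes)


# a beats b (same AS, lower MED); b beats c (lower IGP); c beats a (lower IGP).
a = Route("a", neighbor_as=1, med=0, igp_cost=30)
b = Route("b", neighbor_as=1, med=1, igp_cost=10)
c = Route("c", neighbor_as=2, med=0, igp_cost=20)

for order in permutations([a, b, c]):
    print([r.name for r in order], "->", select(order).name)
```

Every route wins under some arrival order, which is one way to see why operators can't tell which tie-breaker actually decided the outcome.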

Slide 11: vendor example: measurement reveals bugs

vendor example: measurement reveals bugs

dns updates for private address space leaking up to roots
spectroscopy analysis of RFC1918 updates
RFC1918 updates coming from DHCP/nameservers  
millions a day getting to root name servers (whee)
51.4M updates in 86.5 hr = ~10,000 per minute = ~165 per second
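The rate conversion on this slide, using only the two figures given there:

```python
updates = 51.4e6  # RFC1918 updates observed (from the slide)
hours = 86.5      # observation window in hours (from the slide)

per_minute = updates / (hours * 60)
per_second = updates / (hours * 3600)
print(f"{per_minute:,.0f} per minute, {per_second:.0f} per second")
```

This prints 9,904 per minute and 165 per second, so the slide's 10,000 per minute is a rounded figure.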

weekday, weekend patterns; weird spikes at midnight local time
4 in the US, 3 in Asia, 2 in Europe
can see that Asians work on the weekend   
can see that Europeans and Asians get to work on time
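Observations like weekend work and midnight spikes come from binning update timestamps by the source's local time. A minimal sketch, assuming each update already carries a UTC timestamp and a UTC offset for its source; how that offset is obtained (e.g., geolocation) is outside the sketch, and `local_bins` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def local_bins(events):
    """Count events by (weekday/weekend, local hour).

    events: iterable of (aware UTC datetime, utc_offset_hours) pairs; the
    offset for each source is assumed to be known already.
    """
    counts = Counter()
    for utc_ts, offset_hours in events:
        local = utc_ts.astimezone(timezone(timedelta(hours=offset_hours)))
        day_type = "weekend" if local.weekday() >= 5 else "weekday"
        counts[(day_type, local.hour)] += 1
    return counts
```

A fixed offset ignores daylight saving changes; that is acceptable for a January sample but not for a year-long trace.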

Slide 12: ... global RFC1918 damage in DNS system

... global RFC1918 damage in DNS system


rare to get macroscopic Internet data so radically broken
who is trying to update the roots anyway?
dsl, cablemodem, small population providers, developing countries
verified that vast majority derive from two OSes: Windows 2000 and Windows XP 
majority of updates from sources that send them constantly
bulk of workload from contributions of medium size, not mice/elephants
most source IP addresses are of home and small business users (owned by individuals, not organizations)
connected to the Internet via cable, DSL or phone-based ISPs
majority using software with default vendor settings
academic, corporate, backbone networks contribute little RFC1918 update traffic
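Separating this traffic starts with recognizing names in private address space. A minimal sketch using Python's standard `ipaddress` module; it checks the three RFC 1918 blocks explicitly (the module's `is_private` flag also covers loopback, link-local and other ranges). The reverse-name parser is a hypothetical helper that only handles full four-octet in-addr.arpa names.

```python
import ipaddress

# The three private IPv4 blocks defined in RFC 1918.
RFC1918_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    return ip.version == 4 and any(ip in net for net in RFC1918_NETS)


def reverse_name_to_ipv4(qname: str):
    """Map e.g. '5.1.168.192.in-addr.arpa.' to '192.168.1.5'; None if not a full IPv4 reverse name."""
    labels = qname.lower().rstrip(".").split(".")
    if len(labels) != 6 or labels[-2:] != ["in-addr", "arpa"]:
        return None
    try:
        return str(ipaddress.IPv4Address(".".join(reversed(labels[:4]))))
    except ipaddress.AddressValueError:
        return None
```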

Slide 13: ...global threat arising from single vendor

...global threat arising from single vendor 


combination of Microsoft software features & misconfigurations essentially causing a slowly paced massive distributed denial of service (DDoS) attack on the root name server system

current state of fielded desktop software poses substantial & increasing burden on (if not threat to) the robustness of the global Internet

software and setups affecting global systemic Internet stability must be designed more carefully wrt potential effects of:
software engineering decisions
misimplementations
misconfigurations

measurement can make a huge difference

Slide 14: optimism (no, really)

optimism (no, really)


    still the case: security, performance, configuration, 
    and fault management lack effective solutions, 
    and there is an apparent lack of people able to state 
    a concise problem to be solved.
      [ok so that wasn't the optimistic part...]


researchers do have a valuable role to play 
(but we have to get out of our labs)

Slide 15: measurement needs (logistics)

measurement needs (logistics)


performance 
e2e measurements must be relevant to the user
bandwidth estimation tools 
single-source vs two-endpoint tools
available bw vs capacity
calibration against real paths w cross-traffic (requires link and other info from multiple providers)
infrastructural heterogeneity renders this harder by the minute
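On the capacity side of "available bw vs capacity", a common technique (not one named on the slide) is packet-pair dispersion: two back-to-back packets of size L leave the narrowest link spaced L/C apart, so C is roughly L divided by the measured gap. A minimal sketch; taking the median of several samples to damp cross-traffic noise is our simplification, and real tools filter far more carefully.

```python
from statistics import median


def packet_pair_capacity(packet_size_bytes, dispersions_s):
    """Estimate bottleneck capacity in bits/s from packet-pair dispersion samples (seconds)."""
    valid = [d for d in dispersions_s if d > 0]
    if not valid:
        raise ValueError("need at least one positive dispersion sample")
    return packet_size_bytes * 8 / median(valid)
```

For example, 1500-byte pairs arriving about 120 microseconds apart suggest a 100 Mb/s bottleneck. This says nothing about available bandwidth, which depends on the cross-traffic the sketch treats as noise, and which is precisely why calibration against real paths matters.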

workload
longer traces 
tcp/ip headers, w options, other headers if possible
at least 24 hours 
concurrent at several places
several across time for trend analysis
(buy stock in RAID vendors now)
traffic matrices
same constraints as above
diverse locations
disa, enterprise, university, backbone, peering point, CDNs (e.g., akamai)
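The RAID joke becomes concrete with a back-of-envelope estimate for a 24-hour header trace. A minimal sketch; every input except the OC-48 line rate is a hypothetical assumption (50% load, 500-byte mean packet, 64 bytes stored per packet for TCP/IP headers with options plus a timestamp), and SONET overhead is ignored.

```python
OC48_LINE_RATE_BPS = 2488.32e6  # OC-48 SONET line rate, overhead ignored


def header_trace_bytes(duration_s, line_rate_bps, utilization, mean_pkt_bytes, stored_bytes_per_pkt):
    """Rough size of a header-only trace for one direction of one link."""
    packets_per_s = line_rate_bps * utilization / 8 / mean_pkt_bytes
    return packets_per_s * stored_bytes_per_pkt * duration_s


# Hypothetical inputs, see lead-in.
day = header_trace_bytes(24 * 3600, OC48_LINE_RATE_BPS, 0.5, 500, 64)
print(f"{day / 1e12:.2f} TB per direction per day")
```

That comes to roughly 1.7 TB per direction per day for a single link, before multiplying by concurrent sites and repeated captures for trend analysis.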

topology 
large scale traceroute coverage
overcoming problems in traceroute methodology
tracking topology at other layers
IPv6

Slide 16: measurement needs (logistics)

measurement needs (logistics)


routing
strategically designed interdomain routing data collection
constant, sufficiently large, diverse, and representative set of peers
stability across time
IGP updates (requires cooperation/release from ISPs)
configured software parameters from ISPs, e.g., MRAI
other implementation details from vendors 
security
anomaly detection, traceback
wireless
headers, location, signal strength
needs further requirements analysis
sensor
johnh's going to cover

macroscopic monitoring
[how to do] macroscopic monitoring of DNS, BGP system
network telescopes for dos attack tracking
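Telescope-based DoS tracking rests on a simple extrapolation. If attackers spoof source addresses uniformly at random over IPv4, a telescope watching n addresses sees each of the victim's responses (backscatter) with probability n/2^32, so the victim's total response rate is roughly the observed rate times 2^32/n. A minimal sketch; the uniform-spoofing assumption is the weak point and should be stated with any estimate.

```python
IPV4_SPACE = 2 ** 32


def estimated_backscatter_rate(observed_pps, telescope_prefix_len):
    """Scale backscatter seen by a telescope covering one prefix up to the victim's total response rate."""
    telescope_size = 2 ** (32 - telescope_prefix_len)
    return observed_pps * IPV4_SPACE / telescope_size
```

For a /8 telescope the scale factor is 256, so 10 backscatter packets per second observed suggests about 2,560 per second leaving the victim.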

correlation
need repository support

Slide 17: also needed: analyses of causes of damage

also needed: analyses of causes of damage 


   no rigorous study exists of root causes of Internet 
   performance problems/outages. anecdotal survey
     (courtesy sean donelan nanog post):

                 1. network engineers (what's this command do?)
                 2. power failures (what's this switch do?)
                 3. cable cuts (backhoes, enough said)
                 4. hardware failures (what's that smell?)             
                 5. congestion (more bandwidth! Captain, I'm giving you all she's got!)
                 6. attacks (malicious, you know who you are)
                 7. software bugs (your call is very important to us....)
  


          //    ``I prefer the wicked rather than the foolish.
                the wicked sometimes rest.'' - alexandre dumas   //

Slide 18: panelist questions

panelist questions 

(1) how would you characterize past progress and current efforts to 
    characterize, understand or model behaviors of large-scale systems, 
    such as the Internet, where the artifacts being studied are complex 
    and require more detailed understanding of traffic behavior?

(2) it has been argued that `good data outlives bad theory'.
     what are impediments to acquiring a representative set of data points 
     that capture Internet behavior in some significant way and 
     what long-term vision is needed to overcome these impediments?

(3) the challenge of scale, coupled with the lack of configuration-specific  
    knowledge, calls for a more precise formulation of the measurement 
    challenge and associated research directions.  what role would the use 
    of inference techniques and simulation play in yielding insights into 
    behavior of current or future Internet protocols (transport, routing, etc)?
    what type of infrastructure must be available for researchers not only
    to address current questions but new questions that cannot be 
    adequately addressed by current tools?

Slide 19: action items for research community

action items for research community

culture of & passion for sound measurement, as science & discipline
measurements need pedigrees describing them, how to navigate
audit trails, portable analysis scripting language to support reproducibility
well-managed meta-data
understand sampling implications and technology better
anonymization tools & reduction agents
simulation
needs way to calibrate against real data (broken record)
safety tip: still has mostly no respect from providers
analysis 
find ways to assess opex versus capex of any new idea
or at least don't render it impossible to do so later
as scott encourages 'let's try to formalize some of what we (and providers) know'
as dave clark encourages 'it's about the $$$, stupid'
safety tip: providers do lose their patience w this research @#$%

continued/increased interaction with providers and vendors
nanog, ietf 
switch, router, measurement hardware vendors

   take advantage of the industry's regrouping efforts and 
   inclination to listen.  who knows how long it will last.

Slide 20: action items for funding/publishing agents

action items for funding/publishing agents


willingness to invest $$$$ in measurement and databases
software, storage, communications for collaboration, data distribution
frustrating [perception] that it takes funding away from `real' research
encourage tech transfer from other disciplines that have done it 
willingness to bounce manuscripts that don't include raw data + scripts
willingness to publish `mundane' work on measurement management
emphasis on reproducible results

encourage research projects that involve provider/vendor cooperation
encourage researchers to attend industry meetings (nanog, ietf)

   good news: there has never been a better time for this 
overheard 'the recession is stay of execution for the bgp routing system'

Slide 21: challenges in Internet measurement

challenges in Internet measurement





results of breakout session on 
11am fri 10 jan 2003 
(next 5 slides)

Slide 22: challenges in Internet measurement

challenges in Internet measurement

motivating vision: self-aware network

cultivate culture of sound measurement as science & discipline
measurements need pedigrees describing them, how to navigate
audit trails, portable analysis scripting language to support reproducibility
well-managed meta-data
understand sampling implications and technology better
anonymization tools & reduction agents
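A measurement pedigree and audit trail can start as a small provenance record written next to every result. A minimal sketch; the field names and the `pedigree` helper are hypothetical, and a real system would also record tool versions, capture points, sampling and anonymization settings.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large traces need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def pedigree(inputs, script, params):
    """Build a provenance record for one analysis run (field names are hypothetical)."""
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "script": {"path": str(script), "sha256": sha256_of(script)},
        "inputs": [{"path": str(p), "sha256": sha256_of(p)} for p in inputs],
        "params": params,
    }


def write_pedigree(record, out_path):
    Path(out_path).write_text(json.dumps(record, indent=2, sort_keys=True))
```

Hashing inputs and the script lets a later reader confirm that a published number came from exactly these bytes, which is the minimum needed to reproduce it.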

more strategic measurement, guided by rather than constraining research questions
what data is missing and how do we strategically optimize the return on investment in data collection and instrumentation 
recognizing that we don't always know what questions will be asked next year
routing, topology, passive, active
improved standardized interface to data archives

Slide 23: challenges in Internet simulation & analysis

challenges in Internet simulation & analysis


mathematical frameworks to find structure/patterns in traffic
a la scott's encouragement to `formalize some of what we (and providers) know' 
macroscopic as well as microscopic
theory of joint spatial/temporal locality
spectroscopy, tomography
ietf/ippm has been trying for a few years, but without dedicated funding

source modeling (for realistic inputs into simulations, models)
extract a set of source models from an aggregate trace 
feature extraction problem
10,000 gnutella port numbers are not 10,000 flows
ultimate goal: augment libraries of source level models w generation of own
calibrate models by evaluating their power for prediction
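The gnutella point is a feature extraction problem in miniature: counting packets (or records) on a port is not counting flows. A minimal sketch that groups packets by a direction-insensitive 5-tuple; port 6346 is used because it is commonly associated with Gnutella, and flow timeouts, which a real flow definition needs, are ignored.

```python
from collections import namedtuple

Packet = namedtuple("Packet", "src dst proto sport dport")


def flow_key(p):
    """Direction-insensitive 5-tuple, so both halves of a conversation map to one flow."""
    lo, hi = sorted(((p.src, p.sport), (p.dst, p.dport)))
    return (p.proto, lo, hi)


def port_summary(packets, port):
    """Return (packets touching port, distinct flows among them)."""
    on_port = [p for p in packets if port in (p.sport, p.dport)]
    return len(on_port), len({flow_key(p) for p in on_port})
```

Four packets on port 6346 may be only two flows, and in real traces an ephemeral port can collide with a well-known one, so port-based source models need this kind of grouping first.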

empirically validated simulation of a significant aspect of the Internet
already much work in large-scale simulations, but no recognized empirically validated simulation of any significant piece of the Internet.
requires cooperation from providers and vendors to get default and configured parameters of OSes and algorithms.  NSF should shepherd/foster this cooperation
(note: large scale means in size as well as # of protocols)

Slide 24: challenges in Internet meas. & analysis

challenges in Internet meas. & analysis

don't forget the real world
analyses must incorporate expense ratios (opex/capex) into tradeoffs where possible
need to get/keep relevant to providers

systematic studies of outages
assessment of various causes of damage

tools and techniques to tie user-perceived performance w/ network measurement, with statistically significant results 
function of control plane, routing plane, server on other end.
including bugs in routers, servers, client stacks
correlating user experience with events that happen in the network
incorporate into network performance models
validation of tools still doesn't exist
use at truly large scale still doesn't exist
dave clark's `why?' tool (non network geek compatible) still doesn't exist

Slide 25: other challenges in Internet measurement

other challenges in Internet measurement


discovering pervasive hidden bugs
any modeling or analysis must also handle the impact of this huge component of traffic

how does measurement affect/support security goals
infer bgp, firewall, and virus spread behavior
how do you get networks to share security-related information 
protection of measurement infrastructure from security compromises

measurement specific to optical, wireless and sensor networks
especially assessment of new application domains (for wireless/sensor)

encouragement of strategic measurement in new networks
based on what we learned from what we did wrong in old networks

Slide 26: payoffs to Internet measurement

payoffs to Internet measurement


improve accuracy, validity, repeatability of network research

provide reference points or baselines for simulation and model validation
other fields, e.g., architecture, have had this for years

build a solid understanding of network behavior
including subtleties not otherwise detected
including damage not otherwise detected

accelerate present and future modeling, simulation, and analysis efforts 
avoid duplication of effort


// scientific apparatus offers a window to knowledge,
but as they grow more elaborate,
scientists spend ever more time washing the windows. 
-- Isaac Asimov //

Slide 27: // disorder increases with time

// disorder increases with time 
because we measure time in the 
direction in which disorder increases //
                -- stephen hawking

www.caida.org

Related Objects

See https://catalog.caida.org/media/2003_nsfpi0301/ to explore objects related to this document in the CAIDA Resource Catalog.