MagicPoint presentation foils
Archived MagicPoint presentation slides, compiled into a single PDF document.
2003_lsn20030610.pdf (26 slides, 2.7 MB)
Slide text transcript
Slide 1: priorities and challenges in
priorities and challenges in
Internet measurement,
simulation, and analysis
problems that remain persistently insolvable
should always be suspected as
questions asked in the wrong way.
-- alan watts
10 june 2003
lsn meeting
kc
Slide 2: objective of this talk (per LSN request)
objective of this talk (per LSN request)
identify R&D gaps
large-scale deployment issues for federal agencies that fund network research
DOE, NSF, DARPA, NASA, NSA, NIST
themes
develop measurement as a respected field of science and engineering
address relationship/rift between research and operations
acknowledge problems persistently unsolvable in current paradigm
in the face of brilliant expectations, demand, and opportunities,
the need for network self-awareness is crossing a threshold
for both the high and low end
recession combined w/heightened consciousness of homeland security creates an inflection point in the opportunity for U.S. policymakers to make a positive difference
Slide 3: outline of this talk
outline of this talk
why Internet provision market forces are torqued
how to make the problem worse
effect on research community
why policy makers have a vital role to play now more than ever
example of perfectly normal accident
why it matters
long-term challenges in Internet measurement
12-step program as outlined by NSF ANIR PI meeting measurement breakout group
near-term community gaps in measurement
concrete 1-2 year tasks
action items for research community (slide 24)
action items for funding agencies (slide 25)
Slide 4: not our father's Internet
not our father's Internet
radical changes in last 20 years
capacities
increased by 5+ orders of magnitude
penetration
universities -> worldwide commercial and residential
170M+ hosts
usage
academic -> commerce, government, entertainment, porn, spam, broken traffic
not necessarily in that order
trust models
cream cheese -> swiss cheese
service models
best effort -> VoIP, e2e model, VPN
except w/o definitions
e2e architecture
guiding architectural principle -> historical artifact
still the most disputed of Internet holy ground
Slide 5: accidental funding model of Internet
accidental funding model of Internet
more radical changes
originally: funding was for technology not infrastructure
NSF took up infrastructural gauntlet for R&E community
put it down when it was already too hot to touch
built and deployed architecture with inability to account for costs with any degree of granularity
inconvenient when we finally needed someone to pay for it
result: market forces are badly torqued
Slide 6: result: market forces torqued
result: market forces torqued
R&E casualties of Internet industrial (!r)evolution
spam
multicast
qos
DOS attacks
routing announcements
address space
UDP, unfriendly TCP
death of e2e principle
software bugs
_normal accidents_ (charles perrow)
measurement (meta-issue, integral to all of the above)
worse than `tragedy of the commons': active disincentives to fix
competitive pressure, pricing models, privacy, constitutionality, technology, lack of standards, dearth of capital
Slide 7: ways to make the problem worse
ways to make the problem worse
make network management research seem really boring
lend measurement no respect as a field in and of itself
don't fund it
don't require any real data from providers
let market forces take care of it
if any free market has proven incapable of taking care of itself, the Internet is it.
if any system has proven itself more expensive to not measure than to measure, the Internet is it
see IAB recommendations regarding Internet research & evolution
see global recession
Slide 8: what this problem is not
what this problem is not
not a public park
not libraries
not the phone system
not the electricity grid
not the railroad
not the highway system
not tragedy of the commons
not cathedral, not bazaar
these analogies all break down. this is something new.
// The significant problems we face cannot be solved by the same level of thinking that created them. --Albert Einstein //
Slide 9: inherent operating constraints of constituents
inherent operating constraints of constituents
vendors
provide minimum measurement functionality (snmp, netflow)
no incentive to do more since it is expensive and does not sell boxes
providers
dis-incented to share data with others due to competitive and privacy concerns
no apparent payoff (beyond what they do already)
software engineering (operating systems, applications, routers)
lack Internet systems perspective
renders Internet vulnerable or non-optimal
measurers/researchers
oblivious to real world of costs, e.g. opex/capex ratios
largely unable to target measurements at immediate provider problems
struggle with interpretation of non-standard data sets
users
as if we could forget
if policy makers ever had a role to play anywhere..
(manipulator of market forces to protect/enhance consumer welfare)
Slide 10: unsettling admissions about dealing w data
unsettling admissions about dealing w data
[courtesy vern paxson & david moore:]
www.icir.org/vern/talks/vp-nrdm01.ps.gz
www.caida.org/publications/presentations/2002/ipam0203/
bizarre behavior, misconfigurations, non-RFC, attacks, `impossible' behavior
measurement tools lie (packet filters drop, reorder, replicate, miss due to routing)
clocks can be arbitrarily off/moving, timestamps don't know accuracy, applied differently
app-level measurement tools miss hidden network stuff (middleware, socket buffer parameters)
asymmetric paths
measurements made two different ways always disagree (anisotropic)
even a single measurement may disagree (not atomic): routing tables, traceroute
events ripple through network along trajectory that is unlikely fully instrumented
measurements carry no indication of quality
measurements lack meta-info (e.g., hostnames)
representative data points - there is no typical on the Internet
analysis results not reproducible
we lack a culture of calibration
large-scale measurements required for representative/longitudinal analysis overwhelm our current methods
archived data often ad hoc, corrupt, truncated, poorly documented. unnavigable
lack of historical data renders it difficult to assess trends
alas, people do it anyway, see kc's myths talk (or any trade rag)
Internet measurement, although too hard, is too easy
not enough data and too much data
we don't yet know how to measure real traffic in the core
speed, sampling, anonymization
can't keep up with media in core (oc12 monitor arrives right after upgrade to oc48)
Slide 11: in spite of it all, amazing stuff has emerged
in spite of it all, amazing stuff has emerged
recession has helped raise awareness of the value of data (see nanog 2002-2003 meetings, nanog.org videoarchives)
DNS: damage in the DNS system
BGP: route dampening considered harmful (again)
behavior under stress (lixia et al)
non-deterministic routing can be demonstrated w/in full mesh topology
enhancements to prevent persistent route oscillations
partial order (due to MED) in route selection
MEDs prevent reasoning about routing policies though existing references encourage use of MEDs to influence inbound policies
how much are we facing limitations of distance vector protocol and inherent limitations of same
lots more examples
upshot: environment more receptive to collaboration than ever
Slide 12: normal accidents
normal accidents: rfc1918 traffic sent to roots
updates for private addresses leak outside local domains
spectroscopy analysis of RFC1918 updates coming from DHCP/nameservers
tens of millions a day coming to blackhole servers
51.4M updates in 86.5 hrs = 10,000/min = 165/s.
up to 1200 updates/s (nov02), up to 29 pkts/update
weekday, weekend patterns; weird spikes at midnight local time (4 in US, 3 in Asia, 2 in Europe)
can see that Asians work on the weekend, Europeans not so much
can see that Europeans and Asians get to work on time
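The average rates quoted on this slide can be rechecked from the two raw figures (51.4M updates, 86.5 hours). The sketch below is only that arithmetic; the function and variable names are illustrative and not part of the original talk. It gives roughly 9,900 updates per minute (which the slide rounds to 10,000), about 165 per second, and on the order of 14 million per day.

```python
# Figures quoted on the slide.
TOTAL_UPDATES = 51.4e6   # updates observed
WINDOW_HOURS = 86.5      # length of the observation window in hours


def average_rates(total: float, hours: float) -> dict:
    """Return average update rates per hour, minute, second, and day."""
    per_hour = total / hours
    return {
        "per_hour": per_hour,
        "per_minute": per_hour / 60,
        "per_second": per_hour / 3600,
        "per_day": per_hour * 24,
    }


rates = average_rates(TOTAL_UPDATES, WINDOW_HOURS)
for name, value in rates.items():
    # Print each rate rounded to a whole number with thousands separators.
    print(f"{name}: {value:,.0f}")
```

These are averages over the whole window; the peak of up to 1200 updates/s is a separate observation and cannot be derived from them.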
Slide 13: ... global RFC1918 damage in DNS system (2)
... global RFC1918 damage in DNS system (2)
rare to get macroscopic Internet data so radically broken
who is trying to update the roots anyway?
dsl, cablemodem, small population providers, developing countries
verified that vast majority derive from two OSes: Windows 2000 and Windows XP
majority of updates from sources that send them constantly
bulk of workload from contributions of medium size, not mice/elephants
most source IP addresses are of home and small business users (owned by individuals, not organizations) connected to the Internet via cable, DSL or phone-based ISPs
majority using software with default vendor settings
academic, corporate, backbone networks contribute little rfc1918 update traffic
Slide 14: ...global threat arising from single vendor
...global threat arising from single vendor
combination of Microsoft software features & misconfigurations was essentially causing a slowly paced massive distributed denial of service (DDOS) attack on the root name server system
current state of fielded desktop software poses substantial & increasing burden on (if not threat to) the robustness of the global Internet
software and setups affecting global systemic Internet stability must be designed more carefully wrt potential effects of:
software engineering decisions
misimplementations
misconfigurations
measurement can make a huge difference
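The leaked updates discussed on the last three slides concern addresses in the private ranges reserved by RFC 1918. As a minimal sketch (not the analysis code behind these slides), the check below flags such addresses and builds the reverse-DNS name an update for them would target, using Python's standard `ipaddress` module. The helper names `is_rfc1918` and `ptr_name` are hypothetical.

```python
import ipaddress

# The three private IPv4 blocks reserved by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address falls inside an RFC 1918 private block."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in RFC1918_NETWORKS)


def ptr_name(address: str) -> str:
    """Return the reverse-DNS (PTR) name for the address."""
    return ipaddress.ip_address(address).reverse_pointer


for addr in ["192.168.1.1", "172.32.0.1", "10.20.30.40"]:
    # Show whether each sample address is private and its PTR name.
    print(addr, is_rfc1918(addr), ptr_name(addr))
```

A client or local resolver applying a test like this before sending a dynamic update could keep private-address updates inside the local domain; whether any particular product does so is outside what the slides state.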
Slide 15: optimism (no, really)
optimism (no, really)
still the case: security, performance, configuration,
and fault management lack both effective solutions
and people able to state
a concise problem to be solved.
[ok so that wasn't the optimistic part...]
researchers do have a valuable role to play
including because we are a(n albeit
relatively) trusted neutral party.
but we have to explore outside of our labs.
Slide 16: challenges in Internet measurement (1)
challenges in Internet measurement (1)
(12-step program, from NSF ANIR PI meeting breakout session, jan 2003)
motivating vision: self-aware network
cultivate culture of sound measurement as science & discipline
measurements need pedigrees describing them, how to navigate
audit trails, portable analysis scripting language to support reproducibility
well-managed meta-data
understand sampling implications and technology better
anonymization tools & reduction agents
more strategic measurement, guided by rather than constraining research questions
what data is missing and how do we strategically optimize the return
on investment in data collection and instrumentation
recognizing that we don't always know what questions will be asked next year
improved standardized interface to data archives
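One way to picture the "pedigree", audit-trail, and meta-data items in the list above is a small record that travels with each data set. The sketch below is purely illustrative: the class name, field names, and example values are assumptions, not a format proposed in the talk or used by any archive.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict


@dataclass
class MeasurementPedigree:
    """Hypothetical meta-data record describing how a data set was produced."""
    tool: str                 # measurement tool name
    tool_version: str
    vantage_point: str        # where the measurement was taken
    start_utc: str            # ISO 8601 timestamps
    end_utc: str
    sampling: str             # e.g. "none" or "1-in-100 packets"
    anonymization: str        # e.g. "none" or "prefix-preserving"
    data_sha256: str          # checksum tying the record to the raw data
    audit_trail: list = field(default_factory=list)

    def add_step(self, description: str) -> None:
        """Append one processing step to the audit trail."""
        self.audit_trail.append(description)

    def to_json(self) -> str:
        """Serialize the record so it can be archived next to the data."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)


raw = b"example trace bytes"  # stand-in for a real trace file
record = MeasurementPedigree(
    tool="example-capture", tool_version="0.1",
    vantage_point="example-site", start_utc="2003-01-01T00:00:00Z",
    end_utc="2003-01-02T00:00:00Z", sampling="none",
    anonymization="prefix-preserving",
    data_sha256=hashlib.sha256(raw).hexdigest(),
)
record.add_step("dropped truncated packets")
print(record.to_json())
```

Keeping the checksum and the processing steps in the same record is one simple way to let a later reader verify that an analysis was run on the data it claims to describe.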
Slide 17: challenges in Internet simulation & modeling (2)
challenges in Internet simulation & modeling (2)
mathematical frameworks to find structure/patterns in traffic
a la scott's encouragement to `formalize some of what we (and providers) know'
macroscopic as well as microscopic
theory of joint spatial/temporal locality
spectroscopy, tomography
ietf/ippm has been trying for a few years, but without dedicated funding
source modeling (for realistic inputs into simulations, models)
extract a set of source models from an aggregate trace
feature extraction problem
10,000 gnutella port numbers are not 10,000 flows
ultimate goal: augment libraries of source level models w generation of own
calibrate models by evaluating their power for prediction
empirically validated simulation of a significant aspect of the Internet
already much work in large-scale simulations, but no recognized empirically validated simulation of any significant piece of the Internet
note: large scale means in size as well as # of protocols
requires cooperation from providers and vendors to get default and configured parameters of OSes and algorithms.
govt could shepherd/foster this cooperation
Slide 18: challenges in Internet analysis (3)
challenges in Internet analysis (3)
don't forget the real world
analyses must incorporate expense ratios (opex/capex) into tradeoffs where possible
need to get/keep relevant to providers
systematic studies of outages
assessment of various causes of damage
tools and techniques to tie user-perceived performance w/ network measurement, with statistically significant results
function of control plane, routing plane, server on other end. including bugs in routers, servers, client stacks
correlating user experience with events that happen in the network
incorporate into network performance models
validation of tools still doesn't exist
use at truly large scale still doesn't exist
dave clark's `why?' tool (non network geek compatible) still doesn't exist
Slide 19: other challenges in Internet measurement (4)
other challenges in Internet measurement (4)
discovering pervasive hidden bugs
any modeling or analysis must also handle the impact of this huge component of traffic
how does measurement affect/support security goals
infer bgp, firewall, and virus spread behavior
how do you get networks to share security-related information
protection of measurement infrastructure from security compromises
measurement specific to optical, wireless and sensor networks
especially assessment of new application domains (for wireless/sensor)
encouragement of strategic measurement in new networks
based on what we learned from what we did wrong in old networks
Slide 20: near-term community gaps in measurement (1)
near-term community gaps in measurement (1)
(high impact tasks for next 1-2 years)
problem diagnosis & response
detection, location, isolation & reporting
macroscopically, DHS (homeland security)'s GEWIS problem
global early warning information system
DNS
BGP
not so near term, needs support from <next 4 slides>
performance
low impact bandwidth estimation
ideally single source
scaling to Gbits/s & new NICs (interrupt coalescing)
calibration against real paths w cross-traffic
e2e trouble-shooting toolkit
'why?' tool
SLA validation
large scale RTT distributions, publicly available
Slide 21: near-term community gaps in measurement (2)
near-term community gaps in measurement (2)
security
traceback, forensics
network telescope (backscatter analysis)
automated worm response
intrusion detection
topology/routing
`pop/router level map of the Internet'
AS connectivity ranking
refined methodology for topology and routing measurement
strategically designed interdomain routing data collection
constant, sufficiently large, diverse, and representative coverage
stability across time
IGP updates (requires cooperation/release from ISPs)
configured parameters from ISPs, e.g. MRAI, metrics
tracking topology at other layers
IPv6
aggregation, abstraction, and visualization techniques
Slide 22: near-term community gaps in measurement (3)
near-term community gaps in measurement (3)
traffic characterization
workload and flow analysis and modeling
longer traces (at least 24 hours, diverse sites, times)
longitudinal trend analysis (baseline)
router/switch capabilities
options at range of cost/capability
wireless, sensor
headers, location, signal strength
correlation among data from massive nodes
Slide 23: community gaps in measurement (4)
community gaps in measurement (4)
meta-problems
firewalls, filtering, blocking (ports/apps)
VPNs, layer2
archiving
privacy/sharing
interpolation/extrapolation
validation
correlation
feedback to system at various granularities (dynamics and trends)
new protocol development: routing, transport, IP
feedback to future operational measurement methodologies
Slide 24: action items for research community
action items for research community
culture of & passion for sound measurement, as science & discipline
measurements need pedigrees describing them, how to navigate
audit trails, portable analysis scripting language to support reproducibility
well-managed meta-data
understand sampling implications and technology better
anonymization tools & reduction agents
simulation
need way to calibrate against real data (broken record)
safety tip: still has mostly no respect from providers. diplomacy serves.
analysis
find ways to assess opex versus capex of any new idea
or at least don't render it impossible to do so later
as scott encourages 'let's try to formalize some of what we (and providers) know'
as dave clark encourages 'it's about the $$$, stupid'
safety tip: providers do often quickly lose their patience w researchers
continued/increased interaction with providers and vendors
nanog, ietf, caida
ask for help from those who have succeeded
switch, router, measurement hardware vendors show more interest than ever
people don't trust `analyst estimates' anymore... good news in many ways
take advantage of industry regrouping efforts & inclination to listen
Slide 25: action items for funding/publishing agents
action items for funding/publishing agents
set measurement standards for procurement of federal agency networking services (qos, security)
inter-agency measurement/qos agenda (start w cross-community workshop, might result in dedicated FTEs at each network..)
new LSN working group, perhaps eventually program, focused on measurement
willingness to invest $$$$ in measurement and databases
software, storage, communications for collaboration, data distribution
frustrating [perception] that it takes funding away from `real' research
encourage tech transfer from other disciplines who have done it
willingness to bounce manuscripts that don't include raw data + scripts
willingness to publish `mundane' work on measurement management
emphasis on reproducible results
encourage research projects that involve provider/vendor cooperation and discourage ones that don't
encourage researchers to participate in industry meetings and mailing lists (nanog, ietf)
good news: there has never been a better time for this
overheard 'the recession is a stay of execution for the bgp routing system'
Slide 26: // disorder increases with time
// disorder increases with time
because we measure time in the
direction in which disorder increases //
-- stephen hawking
www.caida.org

