unsettling admissions about dealing with data [courtesy vern paxson & david moore]:
  www.icir.org/vern/talks/vp-nrdm01.ps.gz
  www.caida.org/publications/presentations/2002/ipam0203/
bizarre behavior: misconfigurations, non-RFC behavior, attacks, `impossible' behavior
measurement tools lie (packet filters drop, reorder, replicate, or miss packets due to routing)
clocks can be arbitrarily off or drifting; timestamps carry no accuracy information and are applied differently (a sanity-check sketch follows this list)
app-level measurement tools miss hidden network machinery (middleware, socket buffer parameters)
asymmetric paths
measurements made two different ways always disagree (anisotropic)
even a single measurement may disagree with itself (not atomic): routing tables, traceroute
events ripple through the network along a trajectory that is unlikely to be fully instrumented
measurements carry no indication of their quality
measurements lack meta-information (e.g., hostnames)
no representative data points: there is no "typical" on the Internet
analysis results are not reproducible; we lack a culture of calibration
large-scale measurements required for representative/longitudinal analysis overwhelm our current methods
archived data is often ad hoc, corrupt, truncated, poorly documented, and unnavigable
lack of historical data makes it difficult to assess trends
alas, people do it anyway; see kc's myths talk (or any trade rag)
Internet measurement, although too hard, is too easy
not enough data and too much data
we don't yet know how to measure real traffic in the core: speed, sampling, and anonymization can't keep up with the media in the core (the oc12 monitor arrives right after the upgrade to oc48)
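Several of these admissions (lying tools, drifting clocks, no indication of quality) boil down to: sanity-check a trace before trusting it. Below is a minimal sketch of one such check, assuming a classic microsecond-resolution libpcap trace; the script, its 60-second jump threshold, and the whole approach are illustrative additions, not something prescribed by the talks above.

#!/usr/bin/env python3
"""Sketch: flag non-monotonic timestamps and large clock jumps in a
classic libpcap trace (an assumed, simplified calibration step)."""
import struct
import sys

MAX_JUMP_S = 60.0  # assumed threshold for "clock stepped" warnings


def scan(path):
    with open(path, "rb") as f:
        hdr = f.read(24)                      # pcap global header
        if len(hdr) < 24:
            sys.exit("truncated global header")
        if struct.unpack("<I", hdr[:4])[0] == 0xA1B2C3D4:
            endian = "<"
        elif struct.unpack(">I", hdr[:4])[0] == 0xA1B2C3D4:
            endian = ">"
        else:
            sys.exit("not a classic microsecond pcap file")
        prev, n = None, 0
        while True:
            rec = f.read(16)                  # per-packet record header
            if len(rec) < 16:
                break                         # end of file (or truncated record)
            ts_sec, ts_usec, incl_len, _ = struct.unpack(endian + "IIII", rec)
            ts = ts_sec + ts_usec / 1e6
            f.seek(incl_len, 1)               # skip the captured packet bytes
            if prev is not None:
                if ts < prev:
                    print(f"pkt {n}: timestamp went backwards by {prev - ts:.6f}s")
                elif ts - prev > MAX_JUMP_S:
                    print(f"pkt {n}: clock jumped forward {ts - prev:.1f}s")
            prev, n = ts, n + 1
        print(f"scanned {n} packets")


if __name__ == "__main__":
    scan(sys.argv[1])

Such a check catches only gross clock misbehavior within one trace; comparing traces from two monitors, or against NTP-disciplined wall-clock time, is still needed to say anything about absolute accuracy.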