Internet Statistics and Metrics Analysis:
Engineering Data and Analysis
Workshop Report
August 31 - Sept. 1, 1998
San Diego Supercomputer Auditorium
The Internet Statistics and Metrics Analysis: Engineering Data and Analysis workshop was an invitational meeting for individuals involved in developing or deploying Internet traffic measurement or analysis tools associated with backbone engineering. Thirty-nine (39) people attended, representing Internet service providers (ISP), the research and education (R&E) community, and vendors. The meeting was held at the San Diego Supercomputer Center (SDSC) on the campus of the University of California, San Diego (UCSD). The meeting was sponsored by the Cooperative Association for Internet Data Analysis (CAIDA), with a reception co-sponsored by TCG CERFnet and Cisco Systems.
The goals for the meeting included clarification of ISP requirements for Internet statistics and metrics that support backbone engineers' ability to:
The meeting focused on the collection, analysis and visualization of three forms of Internet traffic data: passive (link-specific) data, active (end-to-end) data, and BGP routing data. Passive measurements involve the collection of traffic information from a point within a network, e.g., data collected by the router or switch or by an independent device passively monitoring traffic as it traverses a network link. Common forms of passive monitoring range from collection of utilization or traffic flow information directly from the switch or router to statistics collection by RMON-like probes or Coral (ocXmon) monitors. Active measurements involve the introduction of traffic into the network for the purpose of monitoring performance between specific endpoints. Active measurement techniques are often useful to network engineers in diagnosing network problems; more recently, they have also been applied by network users and researchers to analyze traffic behavior across specific network paths.
The sections that follow describe highlights from these three topic areas, as well as a discussion of the role of Internet exchange points in collecting, analyzing, and providing services relating to Internet traffic data.
A. Collection and Analysis of Passive Traffic Data (discussion questions)
Presentations and discussions of passive monitoring and analysis focused on two areas:
Analysis of Passive Data
David Moore (UCSD/CAIDA) delivered a presentation on Coral: A flexible platform for network monitoring, and Daniel McRobb (UCSD/CAIDA) described CflowD and ARTS++, tools for analyzing and storing NetFlow data exported from Cisco routers.
Coral is a platform for non-intrusive (passive) monitoring and analysis of Internet traffic. Full trace capture or traffic flow summaries are available on a variety of media, from Ethernet or FDDI to OC12, through tools developed by MCI/vBNS, the National Laboratory for Applied Network Research (NLANR) and CAIDA. The coral/ocXmon family of monitors uses optical splitters to tap fiber, filtering 5-10% of the light signal to interface cards in the coral monitoring host. Flow analysis allows answering questions regarding basic traffic characterization, matrices of traffic flow by country or Autonomous System, traffic import and export tables, and routing/address space coverage. Other (non-flows-based) analyses include: interarrival time behavior, protocol-relevant analyses (TCP retransmissions/duplicates, packet size distributions), and security/vulnerability protection applications.
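As a simplified illustration of the basic traffic characterization such flow analysis supports, the sketch below aggregates captured packet headers into per-protocol packet and byte counts. The packet records and field names are invented for illustration; Coral's actual capture formats and analyses differ.

```python
from collections import Counter

# Hypothetical packet records as (protocol, length_bytes) tuples,
# standing in for headers captured by a passive monitor.
packets = [("tcp", 1500), ("tcp", 40), ("udp", 512), ("tcp", 1500), ("icmp", 64)]

def characterize(pkts):
    """Summarize traffic by protocol: packet counts and byte counts."""
    pkt_count = Counter()
    byte_count = Counter()
    for proto, length in pkts:
        pkt_count[proto] += 1
        byte_count[proto] += length
    return pkt_count, byte_count

pkts, octets = characterize(packets)
print(pkts["tcp"], octets["tcp"])  # → 3 3040
```

The same aggregation pattern, keyed on country or Autonomous System instead of protocol, yields the traffic matrices mentioned above.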
Examples of key forms of analyses available through Coral data include:
Comments on Coral-related analyses should be sent to David Moore at info @ caida.org.
CflowD is host software used to collect Cisco version 5 flow-export data and aggregate the raw data into tables, allowing continuous collection of summary data in time series. Daniel McRobb developed the original version of cflowd while at ANS, in support of backbone capacity planning and trend analyses. An enhanced version of cflowd and its ARTS++ binary file format library is now under development by CAIDA; the software package currently supports tracking and visualization of AS matrices, net matrices, and port and protocol tables. Simple display utilities include: summary dumps based on Autonomous Systems, networks, ports, and protocols, all in ARTS format. cflowd uses utilities based on XRT/PDS for graphs and plots and JClass software for charts. Open questions involving this form of analysis include:
CAIDA will put up a cflowd web page at http://www.caida.org/tools/measurement/cflowd in late October 1998. A mail list is now available for discussing cflowd issues (send mail to cflowd-request @ caida.org, with "subscribe" in the body of the message).
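For readers unfamiliar with the version 5 export data cflowd consumes, the sketch below parses a raw NetFlow v5 datagram (a 24-byte header followed by 48-byte flow records) into Python dictionaries. The field layout follows Cisco's published v5 format, but the parser itself is purely illustrative and is not cflowd code.

```python
import struct

# NetFlow v5 export layout: 24-byte header, then 48-byte flow records.
HEADER_FMT = "!HHIIIIBBH"  # version, count, sysuptime, unix_secs, unix_nsecs,
                           # flow_sequence, engine_type, engine_id, sampling
RECORD_FMT = "!IIIHHIIIIHHBBBBHHBBH"  # srcaddr..dst_mask per Cisco's v5 spec

def parse_v5(datagram):
    """Unpack a NetFlow v5 datagram into (version, list of flow dicts)."""
    hdr = struct.unpack(HEADER_FMT, datagram[:24])
    version, count = hdr[0], hdr[1]
    flows = []
    for i in range(count):
        rec = struct.unpack(RECORD_FMT, datagram[24 + 48 * i: 24 + 48 * (i + 1)])
        flows.append({
            "src": rec[0], "dst": rec[1],          # IPs as 32-bit integers
            "packets": rec[5], "octets": rec[6],
            "srcport": rec[9], "dstport": rec[10],
            "proto": rec[13],
            "src_as": rec[15], "dst_as": rec[16],
        })
    return version, flows
```

Aggregating the per-record src_as/dst_as fields over time is essentially how the AS matrices described above are built.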
Participants agreed that trend characterization such as that available through the Coral and cflowd tools is important to capacity planning (of primary interest to ISPs) and to enhancing our understanding of new Internet protocols/applications, e.g., streaming media (video & audio), voice over IP, DNS authentication, IPv6, etc. Evaluating the effects of non-conforming traffic for congestion-avoidance purposes is also important, e.g., TCP accelerators may require use of more sophisticated queuing techniques. In general, however, the actual benefits of analyses of passive data have been poorly articulated to the ISP, vendor, and user communities. Participants suggested stronger efforts to better define the relevance and use of these data and the tools (described below).
Passive Measurement Tools
Joel Apisdorf (MCI/vBNS) provided an overview of plans for an OC48 passive monitoring device, intended to support collection of packet, header, and flow traces and computation of flow aggregate statistics on a full 2.488 Gb/s OC48c link (roughly 300 MB/s in each direction) via fiber optic splitters. Switches would enable users to select a DWDM channel or link to monitor.
Unfortunately, currently available chips are too slow to support host bus requirements for an OC48mon; e.g., the PCI 33 MHz 64-bit bus (264 MB/s) is too slow, and PCI 66 MHz 64-bit and AGP 2X 132 MHz 32-bit buses reach 528 MB/s only in burst mode. Obtaining components in small quantities for prototype development is also proving problematic.
Participants expressed a strong desire for an OC48 measurement capability soon; however, the potential cost (over $100,000), size (15" rack space), and power requirements of the proposed design will likely deter significant deployment by ISPs. Participants agreed that if flow export capabilities (e.g., Netflow and related vendor implementations) were available at OC48 speeds, then separate monitors would be useful mostly for research applications rather than operational use in ISP backbones. The planned realtime playback feature of the OC48mons, however, was perceived as an important and unique attribute, presumably not available via router alternatives.
Key drivers with respect to the operating system for OC48mon or other monitors include: manageability, level of security and cost to maintain (e.g., labor). A mailing list for discussion of OC48mon development issues has been set up. To join, send mail to oc48mon-request @ caida.org, with the word subscribe in the body of the message.
David Rowell (Cisco) described NetFlow Switching: Router Based Statistics and plans for continued development and application of statistics functionality to Cisco routers. NetFlow statistics are used by networks for planning, billing and troubleshooting purposes. Netflow export format version 5 uses either Cisco's CEF-based flow switching or the route cache for route information (note that among other enhancements, CEF resolves historic problems relating to caching of source AS and host route information).
Rowell described recent enhancements and current development efforts involving on-the-router flow aggregation, export filtering, reduced size export records, and application to the GSR router. Under consideration are additional aggregation schemes, AS path export, and a reliable transport mechanism (UDP datagrams are used in version 5). Areas where Rowell (drowell @ cisco.com) is seeking input from ISPs include:
Much of the discussion revolved around the need for common definitions and specifications regarding what key data elements are required/desired from passive measurement devices. Indeed, no common definition of flow is in use today; vendors implement definitions that are often unique to their specific hardware or implementation; research and measurement entities use alternative definitions of flow ranging from timeout-based to TCP SYN/FIN-based to QOS-measurement-based definitions. Participants agreed that to make measurements across vendors and platforms most useful, specification of common definitions and measurement priorities is important. Other features meriting clarification include: acceptable practices with respect to artificially timing out flows; types of data to collect (real-time vs. trend data); frequency of data collection (sampling vs. constant monitoring); and identification of user-specified knobs/functions vs. vendor specified implementations. Note that each of these values must be framed according to the specific use or application of the data; e.g., capacity planning vs. problem identification/diagnosis vs. identifying/tracking denial-of-service attacks vs. billing/accounting.
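To make the definitional question concrete, the sketch below implements one of the candidate flow definitions mentioned above: a timeout-based flow, in which packets sharing a header key belong to the same flow until the gap between packets exceeds an inactivity timeout. The 64-second value is illustrative only; as the discussion noted, no standard exists.

```python
# Timeout-based flow definition: for one 5-tuple, a new flow begins
# whenever the inter-packet gap exceeds the inactivity timeout.
# The 64-second timeout is an illustrative choice, not a standard.
TIMEOUT = 64.0

def count_flows(timestamps, timeout=TIMEOUT):
    """Count flows among packet timestamps (seconds) for one 5-tuple."""
    flows = 0
    last = None
    for t in sorted(timestamps):
        if last is None or t - last > timeout:
            flows += 1  # gap too large (or first packet): new flow
        last = t
    return flows
```

Under a SYN/FIN-based definition the same packet stream could yield a different flow count, which is precisely why cross-vendor comparisons require an agreed definition.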
NOTE that since ISMA, CAIDA has taken steps toward drafting and editing a Measurement Specification for Routers. A strawman document is posted at http://www.caida.org/tools/measurement/measurementspec/. A mailing list is now available to facilitate community discussion and development of this specification. To participate in these efforts, send mail to meas_spec-request @ caida.org, with subscribe in the body of the message.
Many ISPs attending ISMA use Cisco's netflow data to support their own capacity planning. The flexibility of having traffic measurements acquired directly from the router is highly desirable, they explained, and more realistic than trying to deploy and support/maintain an independent monitor. This flexibility is particularly critical given the current high cost of co-location space in peering facilities and the general inclination of ISPs to minimize non-essential equipment on their infrastructures.
Use of these data and specifics of data collection were discussed at some length. Ideally, the group agreed, measurement functionality of routers should be implemented in a manner that:
Participants also urged that efforts be undertaken to compare/benchmark the various passive measurement tools. Specifically, participants suggested
B. Active Measurement Data (discussion questions)
Discussions of active measurements touched on an almost religious argument as to the relevance or benefit of end-user measurements and network problem diagnosis to ISPs. Presentations of existing active measurement infrastructure initiatives were made by Jamshid Mahdavi (PSC) on NIMI, Will Leland (Bellcore) on Felix, Matt Zekauskas (Internet2) on CSG/Surveyor, and Daniel McRobb (UCSD/CAIDA) on Skitter.
The NIMI, Felix and Surveyor projects are designed to provide infrastructure for active performance-oriented measurements. Surveyor is a tool for implementing the one-way delay and packet loss metrics defined in IETF/IPPM RFCs. Its primary uses are in problem determination, engineering (analysis of loads and traffic trends), as feedback to advanced applications users, and in monitoring QoS. Surveyor systems use a TrueTime GPS antenna and send data to a central repository for analysis and reporting purposes. Surveyor monitors are currently located at 28 university campuses throughout the U.S. and abroad, reflecting 623 paths. One-way measurements employ a Poisson schedule averaging a measurement every 10 minutes, followed by a traceroute to identify the path associated with specific measurements. Future deployment plans include placing a monitor at each Abilene (Internet2) backbone site.
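The Poisson schedule Surveyor employs can be sketched as follows: gaps between measurements are drawn from an exponential distribution so that send times form a Poisson process, avoiding synchronization with periodic network behavior. The function name and parameters below are illustrative, not Surveyor's actual interface.

```python
import random

def poisson_schedule(mean_interval_s, duration_s, seed=None):
    """Generate measurement send times (seconds) with exponentially
    distributed gaps, so the samples form a Poisson process rather
    than a fixed-period schedule."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(1.0 / mean_interval_s)  # mean gap = mean_interval_s
        if t >= duration_s:
            break
        times.append(t)
    return times

# One day of measurements averaging one every 10 minutes (~144 samples).
schedule = poisson_schedule(600.0, 86400.0, seed=1)
```

Each scheduled time would trigger a one-way delay probe, followed by a traceroute to record the path in effect.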
Bellcore's Felix project, funded by the Defense Advanced Research Projects Agency (DARPA), seeks to develop a prototype monitoring infrastructure to provide information about the health of large networks, without requiring prior knowledge of network topology or routing information. Topology and performance information gathered through this prototype infrastructure is intended to facilitate automatic detection of network faults and anomalous behavior. One of the most complicated features of this effort is the development of Linear Decomposition Algorithms (LDA) for topology discovery and performance evaluation of specific network elements. Initial deployment of monitors is underway, with current efforts focused on formulating LDAs to determine simple topology characteristics. (See http://govt.argreenhouse.com/felix/.)
The National Internet Measurement Infrastructure (NIMI) is a project at the Pittsburgh Supercomputer Center (PSC) and Lawrence Berkeley Labs (LBL) designed to develop an infrastructure approach for measurement of the global Internet. NIMI is designed as a distributed infrastructure with probe machines situated throughout the Internet. These probes are to be used to monitor end-to-end (probe-to-probe) performance of the Internet to assess how well it continues to meet the needs of specific end-user groups, and to localize faults to particular pieces of the national Internet infrastructure. Measurement tools are treated as external packages to the NIMI probes, thereby allowing additional tools to be added easily. Current tools running on prototype probes at LBL, PSC, SLAC, FNAL, and CERN include Traceroute (measuring the paths); TReno (measuring bandwidth and packet loss); and Poip (measuring loss and delay). Near-term priorities for NIMI include: completing/releasing NIMI v.2.0, developing NIMI's Auto-Upgrade capability, implementing a Public Key Server for security, developing Auto-Mapping features, and developing new tools, e.g., for multicast measurements and passive measurements. Funding for the NIMI initiative is provided by the National Science Foundation (NSF) and the Department of Energy (DOE). Organizations interested in hosting NIMI monitors or otherwise participating in the project should contact Jamshid Mahdavi (mahdavi @ psc.edu). For more information on the project, see http://www.psc.edu/networking/nimi/index.html.
Somewhat distinct from the measurement initiatives described above, the Skitter effort focuses on macro-level analysis of the Internet. Skitter measures the forward IP paths from a single source to many destinations using traceroute-like incrementing of the TTL to elicit a response from each hop. Key goals for this tool are to identify and track routing behavior, e.g., providing indications of low-frequency persistent routing changes, and to assist in dynamic discovery of network connectivity through probing paths to destinations spread throughout the IPv4 address space. A secondary objective for Skitter data is to collect round trip times for the paths to each of these destinations for analysis of general trends in Internet performance.
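The TTL-incrementing technique Skitter borrows from traceroute can be sketched against a toy topology. Here send_probe is a stand-in for the raw ICMP/UDP probing a real tool performs; the topology and addresses are invented for illustration.

```python
# Toy topology: TTL value -> router that answers at that hop distance.
TOPOLOGY = {1: "10.0.0.1", 2: "10.0.1.1", 3: "192.0.2.9"}
DEST = "192.0.2.9"

def send_probe(ttl):
    """Pretend probe: in a real tool, a packet with this TTL elicits an
    ICMP time-exceeded reply from the router at that hop distance."""
    return TOPOLOGY.get(ttl)  # None models a silent or unreachable hop

def trace_path(dest, max_ttl=30):
    """Discover the forward path by probing with increasing TTL."""
    path = []
    for ttl in range(1, max_ttl + 1):
        hop = send_probe(ttl)
        path.append(hop)
        if hop == dest:  # destination reached: stop probing
            break
    return path
```

Repeating this from one source toward many destinations, and recording the round trip times alongside each path, yields the macro-level connectivity and performance data Skitter collects.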
Currently, CAIDA is using Skitter to monitor more than 23,000 destination hosts from five source monitors in the U.S. Preliminary visualization efforts suggest that two-dimensional depictions of large data sets have minimal utility for analysis of connectivity and performance. Three-dimensional visualizations of select datasets and development of tools and scripts to analyze and correlate data from multiple source monitors and from passive monitors and routing tables are key project goals over coming months. Organizations interested in hosting a Skitter measurement host or otherwise participating in the project should contact Nancy Bachman (nlb @ caida.org). For more information on Skitter, see http://www.caida.org/tools/measurement/skitter.
Much of the discussion during this session centered on customers' use of active tools to measure Service Level Agreements (SLAs) with their ISP and upon the emergence of third-party measurement groups and tools such as Keynote, Inverse, VitalSigns, NetMedic and others. Participants acknowledged the growing influence of "published" information about ISP performance. Attendees agreed that there is a need for both:
(a) ISPs (as a community) to define acceptable metrics for public measurement and evaluation of their performance and for
(b) the development of tools to assist ISP Customer Service Departments in responding to claims about their network's performance, e.g., tools that possibly incorporate both active and passive measurement data.
Participants also suggested that in identifying the location of perceived network bottlenecks, customers should start by examining utilization of their tail circuit, e.g., if utilization is high, then bottlenecks may be alleviated by upgrading the circuit. Including passive measurement features in browsers (e.g., frames per unit time) was also recommended as an important means of providing meaningful performance data to both users and providers.
C. Analysis and Visualization of Routing Data (discussion questions)
Craig Labovitz (University of Michigan/Merit) and K.C. Claffy (UCSD/CAIDA) delivered presentations on analysis and visualization of routing data by IPMA and by CAIDA's Otter visualization tool. Claffy also described recent analyses of AS adjacency relationships using BGP routing tables.
The goal of the Internet Performance Measurement and Analysis (IPMA) RouteTracker project is to examine BGP, OSPF, ISIS and RIP routing behavior. Craig noted several interesting observations since his May 1997 ISMA presentation relating to the order of magnitude decrease in routing updates as witnessed by the RouteTracker. The fact that most pathological routing behavior can now be explained is also an indication of increased stability in this area. He also noted the slight increases in BGP announcements due to policy and the existence of some persistently oscillating routes. Graphical depictions of the data collected by the IPMA project were also presented. Additional information is available at http://www.merit.edu/ipma.
K.C. Claffy described the application of the Otter tool to Internet routing tables. Otter is a general purpose, 2-dimensional, Java-based network visualization tool utilizing graph layout algorithms permitting geographic, topological, and IP-clustering depictions. Several examples of individual and multiple network routing relationships were presented. Otter is also being used to depict SNMP data, web cache hierarchy relationships, MBONE traffic, and web tree structures. Next steps in its development include adding temporal features (both animation and simulation) and integrating Otter with real-time data collected by remote probes. Claffy indicated that significant work is needed in the field of visualizing data relating to traffic characterization, traffic performance information, router queue lengths, IPv4 information, routing (AS) relationships and topologies. Additional information on the Otter tool is available at http://www.caida.org/tools/visualization/otter.
Claffy also reviewed recent CAIDA efforts relating to analysis of BGP routing tables. While there are no mappings or tools that enable straightforward analysis of the full Internet infrastructure, examination of autonomous system (AS) numbers in BGP-based routing tables provides indications of how networks and ISPs are interconnected. Examples, using the University of Oregon's RouteViews data (see: http://www.antc.uoregon.edu/route-views), were used to illustrate interconnections between AS-based paths from given Internet routers to target network/mask pairs (i.e., destination networks). Relative connectivity among networks can be depicted by illustrations of which AS advertises a specific network.
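The AS adjacency analysis described above can be sketched simply: consecutive AS numbers in a BGP AS path imply that the two ASes interconnect, so walking a table of paths yields an adjacency set. The paths and AS numbers below are illustrative values, not drawn from an actual RouteViews table.

```python
def as_adjacencies(as_paths):
    """Derive the set of AS adjacency pairs from BGP AS path strings.
    Each consecutive pair of distinct ASes in a path implies a link."""
    adj = set()
    for path in as_paths:
        hops = [int(asn) for asn in path.split()]
        for a, b in zip(hops, hops[1:]):
            if a != b:  # skip AS-path prepending (repeated AS numbers)
                adj.add(tuple(sorted((a, b))))
    return adj

# Illustrative AS paths, as might appear in a routing table dump.
paths = ["3356 701 7018", "3356 701 701 1239"]
```

Counting how many adjacencies each AS participates in gives a rough indication of its relative connectivity, the kind of depiction described above.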
Participants noted the importance of accurate depictions of routing relationships, suggesting that a central repository be established archiving data from the various routeview projects. Participants also noted that existing visualizations still have minimal applicability to ISPs, agreeing that significant challenges remain before visual presentation of complicated, voluminous traffic data is useful for engineering analysis.
D. Role of Internet Exchange Points (discussion questions)
Steve Feldman (Worldcom) delivered a presentation on Measuring MAEs. Measurements at Network Access Points (NAPs) are necessary for obtaining customer utilization data, resource utilization relating to the NAP infrastructure, detection of fault conditions, analysis of NAP performance, and overall trend analysis. It is also important, Feldman explained, for characterization of Internet traffic for:
Traffic characterization data also has potential for supporting service-level agreements and for usage-sensitive billing, e.g., settlements. However, to date, no NAP offers this form of service. Feldman also reviewed the tools used to monitor FDDI and ATM connections at the MAEs, ranging from SNMP polling, to circuit accounting, to Coral monitors (OC3mon and OC12mon).
While participants felt that private peering is the favored means of communicating between large ISPs, they recognized that the importance of NAPs is likely to continue. New services, such as caching and forms of customer-requested measurements may play a growing role in successful NAPs of the future. The potential of NAPs as neutral sites for peering settlements was also discussed, with participants divided on the subject. [Note that participants expressed the view that settlements and billing issues are very sensitive topics currently, with increasing scrutiny by ISP management and regulators. According to participants, ISP engineers (such as those participating in ISMA) have a limited direct role in these business/policy topics.]
E. Other
Other discussion topics during ISMA included a Backbone Engineers Panel (see discussion questions) and a discussion of requirements for correlating the various forms of traffic data. In the latter case, discussions centered on both the importance and the nascent stage of these efforts, as well as the lack of focused resources being applied to this critical sector.
The agenda and specific presentations are available at ./agenda.html.
ISMA: Engineering and Data Analysis was an invitational meeting, targeting participation by select Internet engineers possessing hands-on experience in Internet traffic measurement and analysis. Thirty-nine (39) people representing ISPs, the R&E community, and vendors attended the meeting, see ./participants.html. The high caliber of these individuals and the limited attendance were essential ingredients to ISMA's success.
Additional focused ISMA meetings are planned by CAIDA. ISMA: Passive Measurements & Analysis is scheduled for January 14-15, 1999, and meetings on Visualization of Internet Traffic and Active Measurements and Analysis are expected. For more information, contact info @ caida.org.
This ISMA was organized by Amy Blanchard of UCSD/CAIDA. Many thanks to the staff at SDSC and UCSD who contributed to ISMA's success. Thanks also go to TCG CERFnet and Cisco for sponsoring the meeting's San Diego Bay Cruise reception.