What's Next for Internet Data Analysis?
Status and Challenges Facing the Community

k claffy, mailto:kc@caida.org
Tracie Monk, mailto:tmonk@nlanr.net
University of California, San Diego /
Cooperative Association for Internet Data Analysis (CAIDA)
http://www.caida.org/


Abstract

Most large providers currently collect basic statistics on the performance of their own infrastructure, typically including measurements of utilization, availability, and possibly rudimentary assessments of delay and throughput. In today's commercial Internet, the only baseline against which organizations can calibrate their networks is past performance data; no data or even standard formats are available against which to compare performance with other networks or against an industry norm, nor are there reliable data with which customers can assess performance of providers. Data characterization and traffic flow analysis are also virtually non-existent at this time, yet they remain essential for understanding the internal dynamics of the Internet infrastructure.

Increasingly, both customers and providers need information on end-to-end performance and traffic flows, beyond the realm of what is realistically controllable by individual networks or users. Path performance measurement tools enable users and operators to better evaluate and compare providers and to monitor service quality. Many of these tools treat the Internet as a black box, measuring end-to-end characteristics, e.g., packet latency and loss (ping) and reachability (traceroute), from points originating and terminating outside individual networks. Traffic flow characterization tools, by contrast, focus on the behavior and inner workings of these wide-area networks.

This paper has two goals. We first provide background on the current Internet architecture and describe how measurements are a key element in the development of a robust and financially successful commercial Internet. We then discuss the current state of Internet metrics analysis and steps underway within various forums, particularly the Cooperative Association for Internet Data Analysis (CAIDA) and the National Laboratory for Applied Network Research (NLANR), to encourage the development and deployment of Internet performance monitoring and workload characterization tools.

Key Words: measurement, ISP, CAIDA, NLANR, Internet, statistics, metrics, performance, flow, tools, visualization


"Section 2.2: © 199x IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE."

This material is based partially on work sponsored by the National Science Foundation under NSF Cooperative Agreement No. NCR-97-9796124.


Contents

Current Internet
Performance Measurement
Internet Traffic Flow Measurement
Traffic Analysis, Visualization, Simulation, and Modeling
Conclusions and Future Work


Current Internet

The Internet architecture is in a perpetual state of transition. Its decentralized, global mesh spans several thousand autonomous systems (ASes), whose providers are highly competitive, facing relatively low profit margins and few economic or business models by which they can differentiate themselves or their services.

The challenges inherent in Internet operational support, particularly given its underlying best-effort protocol, fully consume the attention of these Internet Service Providers (ISPs). Because data collection is absent from the list of critical ISP priorities, it continues to languish across individual backbones and at peering points, both for end-to-end data (which require measurement across IP clouds) and for actual traffic flows, e.g., the application (web, e-mail, real-audio, FTP...); packet origin, destination, and size; and the duration of flows.

Yet detailed traffic and performance measurement and analysis have heretofore been essential to identifying and ameliorating network problems. Trend analysis and accurate network system monitoring permit network managers to identify hot spots (overloaded paths), predict problems before they occur, and avoid congestion and outages through efficient deployment of resources and optimized network configurations. As our dependence on the Internet increases, we must deploy mechanisms that enable Internet infrastructure-wide planning and analysis and promote efficient scaling.

User communities will also play an important role in driving this process through their demands for verifiable service guarantees that are not readily available in the current Internet. This is particularly true for users engaged in just-in-time manufacturing, such as the automotive industry, and for users deploying high-bandwidth applications and distance education, such as those proposed in the higher education community's Internet-2 initiative and the federal government's Next Generation Internet (NGI) program.

A first step in achieving measurements that are not only relevant, but also comparable, is the development of common definitions of IP metrics. The Internet Engineering Task Force (IETF) IP Performance Metrics (IPPM) working group was chartered to develop a more rigorous theoretical framework and guidelines for designing robust measurement tools for the Internet's wide variety of disparate signal sources. In late 1996, draft requests for comments (RFCs) were issued delineating metrics for connectivity [Mahdavi and Paxson], one-way delay [Almes and Kalidindi], and empirical bulk transfer capacity [Mathis] (http://www.advanced.org/IPPM).
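As a rough illustration of what a one-way delay metric involves, the following Python sketch pairs send and receive timestamps by probe sequence number and counts probes that never arrive as lost, leaving their delay undefined rather than averaging them in as very large values. It assumes synchronized sender and receiver clocks; the dictionary layout and the function name are hypothetical choices for illustration, not the formats defined in the IPPM drafts.

```python
from statistics import median


def one_way_delays(sent, received):
    """Return (delays, loss_rate) for one sample of probes.

    `sent` and `received` map a probe sequence number to a timestamp in
    seconds (hypothetical layout). Assumes synchronized clocks. A probe
    with no receive timestamp counts as lost and gets no delay value.
    """
    delays = {}
    lost = 0
    for seq, t_send in sent.items():
        t_recv = received.get(seq)
        if t_recv is None:
            lost += 1
        else:
            delays[seq] = t_recv - t_send
    loss_rate = lost / len(sent) if sent else 0.0
    return delays, loss_rate


# Made-up sample: probe 3 never arrives.
sent = {1: 0.000, 2: 0.100, 3: 0.200, 4: 0.300}
received = {1: 0.042, 2: 0.145, 4: 0.351}
delays, loss = one_way_delays(sent, received)
print("loss rate:", loss)
print("median one-way delay (s):", round(median(delays.values()), 3))
```

Keeping loss separate from delay avoids letting a single lost probe dominate a mean delay figure.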

These efforts notwithstanding, the community is still only at a rudimentary stage in its ability to isolate and visualize traffic bottlenecks, routing anomalies, and congestion, and few institutions promote cohesive approaches to these tasks. Toward this end, in mid-1997 the National Science Foundation (NSF) funded the creation of the Cooperative Association for Internet Data Analysis (CAIDA) (http://www.caida.org/). CAIDA is a collaborative undertaking to promote greater cooperation in the engineering and maintenance of a robust, scalable global Internet infrastructure. CAIDA extends to the ISP industry the existing measurement collaborations of the National Laboratory for Applied Network Research (NLANR), which have included supercomputing centers, ISPs, universities, vendors, and government. The project will also respond to evolving needs of the Internet by encouraging continued innovation by the research and education (R&E) community, tempered by the realities of the commercial Internet infrastructure.

CAIDA provides a neutral framework to support cooperative research and operational measurement/analysis endeavors. Its initial goals include:

In May 1997, NLANR/CAIDA held an invitational Internet Statistics and Metrics Analysis (ISMA-97) workshop, where ISP engineers, researchers, and vendors explored the current state of the art and the most important gaps requiring attention. Weaknesses in current tools and end-user measurement initiatives are daunting, most of them lacking:

Table 1. Who cares about measurement

ISPs
  Goals: capacity planning; operations; value-added services (e.g., customer reports); usage-based billing
  Measure: bandwidth utilization; packets per second; round trip time (RTT); RTT variance; packet loss; reachability; circuit performance; routing diagnosis
Users
  Goals: monitor performance; plan upgrades; negotiate service contracts; set user expectations; optimize content delivery; usage policing
  Measure: bandwidth availability; response time; packet loss; reachability; connection rates; service qualities; host performance
Vendors
  Goals: improve design/configuration of equipment; implement real-time debugging/diagnosis of deployed hardware
  Measure: trace samples; log analysis

Performance Measurement

Metrics of delay, packet loss, flow capacity, and availability are fundamental to performance comparison. Tools with reasonable statistical validity have been slow to emerge, but recent prototypes exist for measuring TCP throughput (treno, Mathis & Mahdavi/PSC); analyzing misbehaving TCP implementations (tcpanaly, Paxson/LBL); measuring end-to-end delay distributions (NetNow, Labovitz/Merit, and the Imeter from Intel); performing detailed ping and traceroute analysis (Cottrell/SLAC, http://www.slac.stanford.edu/~cottrell/tcom/escc-berkeley-96.html); and isolating traffic bottlenecks and congestion points, e.g., pathchar (Jacobson/LBL). Merit [Labovitz] is also developing prototype tools to measure routing instabilities and testing them at public exchange points (see http://www.merit.net/IPMA).
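To make the bottleneck-isolation idea concrete, the sketch below shows the core estimation step commonly described for pathchar-style tools: for each hop, keep the minimum RTT seen at each probe size (filtering out queueing delay), fit a line of minimum RTT against probe size, and attribute the increase in slope from one hop to the next to the serialization time of the link between them. This is a simplified reconstruction assuming store-and-forward links and fixed-size replies; the input data are synthetic and the function names are hypothetical, not pathchar's own code.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def per_hop_bandwidth(rtts_by_hop):
    """Estimate each link's bandwidth in bits/s (hypothetical helper).

    rtts_by_hop[k] maps probe size in bytes to RTT samples (s) to hop k+1.
    The minimum per size discards queueing delay; the slope (s/byte) then
    sums 1/bandwidth over every link up to that hop, so each link's share
    is the slope difference from the previous hop.
    """
    estimates = []
    prev_slope = 0.0
    for samples in rtts_by_hop:
        sizes = sorted(samples)
        min_rtts = [min(samples[s]) for s in sizes]
        slope = least_squares_slope(sizes, min_rtts)
        delta = slope - prev_slope
        # Noise can make the slope fail to increase; report no estimate then.
        estimates.append(8 / delta if delta > 0 else None)
        prev_slope = slope
    return estimates


def synthetic_hop(base, s_per_byte, sizes=(64, 256, 512, 1024, 1500)):
    # Made-up measurements: three samples per size, two inflated by queueing.
    return {s: [base + s * s_per_byte + q for q in (0.004, 0.0, 0.002)] for s in sizes}


link1, link2 = 10e6, 1.5e6  # bits/s on a hypothetical two-link path
hop1 = synthetic_hop(0.001, 8 / link1)
hop2 = synthetic_hop(0.003, 8 / link1 + 8 / link2)
est = per_hop_bandwidth([hop1, hop2])
print([round(b / 1e6, 2) for b in est])  # Mbit/s per link
```

Real measurements need many probes per size before the minimum approximates the queue-free delay; the synthetic data here sidestep that difficulty.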

Many emerging end-to-end path performance tools are intended to serve users, both for self-diagnosis of problems they experience and for conducting measurements over the shared infrastructure, which can yield data with which to compare alternative providers and monitor service qualities.

In general, Internet customers are most interested in metrics that indicate the likelihood that their packets will arrive at their destination in a timely manner. Estimates of past and expected future performance across specific paths are therefore perhaps even more important than measurements of current performance. Users also need tools for assessing path availability, particularly for delay- and jitter-sensitive multimedia applications; e.g., a user may want to use such data to schedule an online distance education seminar.
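One simple way to turn past measurements into an estimate of expected near-term performance is exponential smoothing. The sketch below borrows the smoothed-RTT and RTT-variation estimator that TCP uses for its retransmission timer (gains of 1/8 and 1/4 and the factor of 4, as in RFC 6298). It is a swapped-in, well-known technique rather than a method proposed in this paper, the class name is hypothetical, and the variation term is only a rough proxy for jitter.

```python
class PathEstimator:
    """Smoothed RTT and RTT variation using TCP's estimator (RFC 6298).

    Hypothetical wrapper class; the update rules follow RFC 6298.
    """

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variation

    def __init__(self):
        self.srtt = None
        self.rttvar = None

    def update(self, rtt):
        if self.srtt is None:
            # The first sample initializes both estimates.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RFC 6298 updates the variation (using the old mean) first.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        return self.srtt, self.rttvar

    def expected_upper_bound(self, k=4):
        """Pessimistic next-RTT estimate: mean plus k deviations.

        Call only after at least one update().
        """
        return self.srtt + k * self.rttvar


est = PathEstimator()
for sample_ms in [100, 100, 140, 100]:  # made-up RTT samples
    est.update(sample_ms)
print(est.srtt, est.rttvar, est.expected_upper_bound())
```

The small gains make the estimate react slowly to a single outlier, which suits scheduling decisions better than the most recent ping.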

Table 2. Internet measurement tools
(from http://www.caida.org/tools/taxonomy/)

Type                   Example tools
icmp-based             ping, Nikhef ping, fping, gnuplotping, Imeter/Lachesis
per-hop analysis       traceroute, Nikhef traceroute, traceroute servers, pathchar, OTTool
throughput             treno, bing, {b|c}probe
bulk throughput        netperf, ttcp, nettest, netspec
web availability       wwping
packet collection      argus, tcpdump, libpcap, pcapture, Packetman (free)
                       etherfind, iptrace, netsnoop, snoop (bundled software collection)
                       Century LAN Analyzer, EtherPeek, LANSleuth, Monet, netMinder, Observer (commercial packet analyzers)
                       Cellblaster, HP Internet Advisor, Sniffer, W&G (hardware)
                       fs2flows, Coral/OC3mon, NeTraMet (flow collectors)
                       tcptrace, tracelook, xplot (analysis/plotting tools)
flow stats             NetFlow interface, cflowd, OC3mon, mrtg
mbone                  mtrace, mview
route behavior         NPD, NetNow, IPMA

Type                   Measurement efforts
end-to-end monitoring  Keynote, NetScore, timeit
systematic pinging     MIDS Internet weather report, DOE monitoring, traceping
ISP measurement        ClearInk report, Inverse
application perf.      SmartVu, Unidata IDD

Internet Traffic Flow Measurement

Today's infrastructure is unprepared to deal with an unabated increase in the number of large flows, particularly flows several orders of magnitude higher in volume than the rest, e.g., videoconferencing. Providers and users need tools to support more accurate analysis of these flows, as well as mechanisms to account for the resources and bandwidth they consume.

Traffic flow characterization measurements focus on the internal dynamics of individual networks and cross-provider traffic flows, on a per-user basis if necessary. Resulting indications of global traffic trends and behavior can enable network architects to better engineer and operate networks, and to adopt or respond to new technologies and protocols as they are introduced into the infrastructure.

In addition to data on burstiness and loss characteristics, flow measurement tools can typically categorize traffic by application type (e.g., web, e-mail, FTP, real-audio, and CUSeeMe), by source and destination, and by packet size and flow duration distributions. But because these tools must operate within an ISP's infrastructure, e.g., at border routers or peering points, they require more cooperation and involvement from service providers than do end-to-end measurement tools. MCI has placed real-time OC3/OC12 flow monitors (Coral/OC3mon), developed by Joel Apisdorf at MCI, on its commercial backbone and on vBNS nodes (the latter currently at each of the NSF-supported supercomputing centers). MCI then makes detailed vBNS flow data graphics publicly available through the vBNS web site. Figure 1 below is a time series plot of flows across the vBNS node at the National Center for Supercomputing Applications (NCSA) from January 24-28, 1997. Figures 2 and 3 illustrate web server traffic across one of Internet MCI's backbone nodes; Figure 4 shows that approximately 1% of Internet MCI's traffic on July 21, 1997 consisted of RealAudio traffic. Other data on flow volume, duration, packet size distributions, and traffic flow by autonomous system and country are also available through a web form. MCI uses the traffic flow information to analyze and report usage and to flag anomalies.
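The per-flow bookkeeping behind such categorization can be sketched as follows: packets are grouped into flows keyed by the 5-tuple (source and destination address, protocol, source and destination port), a flow ends after an idle timeout, and each flow is labeled by application from well-known ports. The packet tuple layout, the 64-second timeout, and the small port table are illustrative assumptions; this is not the OC3mon or vBNS implementation.

```python
from collections import defaultdict

# Illustrative port-to-application table (an assumption; far from complete).
WELL_KNOWN_PORTS = {80: "web", 25: "e-mail", 21: "FTP"}
IDLE_TIMEOUT = 64.0  # seconds; a hypothetical choice for this sketch


def aggregate_flows(packets, timeout=IDLE_TIMEOUT):
    """Group time-ordered packets into flows keyed by the 5-tuple.

    Each packet is (time, src, dst, proto, sport, dport, size), a
    hypothetical layout. A key idle for longer than `timeout` seconds
    starts a new flow when it reappears. For brevity, flows still open at
    the end are returned as-is rather than swept out on a timer.
    """
    active = {}
    finished = []
    for t, src, dst, proto, sport, dport, size in packets:
        key = (src, dst, proto, sport, dport)
        flow = active.get(key)
        if flow is not None and t - flow["last"] > timeout:
            finished.append(active.pop(key))
            flow = None
        if flow is None:
            app = WELL_KNOWN_PORTS.get(dport) or WELL_KNOWN_PORTS.get(sport, "other")
            flow = {"key": key, "app": app, "first": t, "last": t,
                    "packets": 0, "bytes": 0}
            active[key] = flow
        flow["last"] = t
        flow["packets"] += 1
        flow["bytes"] += size
    return finished + list(active.values())


def bytes_by_application(flows):
    """Total bytes per application label."""
    totals = defaultdict(int)
    for flow in flows:
        totals[flow["app"]] += flow["bytes"]
    return dict(totals)


packets = [
    (0.0, "10.0.0.1", "10.0.0.2", 6, 1025, 80, 500),
    (1.0, "10.0.0.1", "10.0.0.2", 6, 1025, 80, 1500),
    (2.0, "10.0.0.3", "10.0.0.4", 6, 1030, 25, 300),
    (100.0, "10.0.0.1", "10.0.0.2", 6, 1025, 80, 40),  # idle > 64 s: new flow
]
flows = aggregate_flows(packets)
print(len(flows), bytes_by_application(flows))
```

Port-based labeling is cheap but coarse: applications on nonstandard ports fall into "other", which is one reason flow characterization needs ongoing refinement.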


Figure 1. Time series plot of packets across vBNS at NCSA



Figure 2. Web server traffic across iMCI, Jul 15, 1997



Figure 3. Web server traffic across iMCI, Jul 15-22, 1997



Figure 4. Realaudio traffic across iMCI node, July 21, 1997


Daniel McRobb (ANS) and John Hawkinson (BBN Planet) are developing cflowd, a tool for post-processing the NetFlow statistics exported by Cisco routers (http://engr.ans.net/cflowd). Cflowd focuses on capacity planning and trend analysis, in particular via traffic matrices by autonomous system (AS). Figures 5 and 6 illustrate cflowd's AS matrix capabilities.
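An AS-level traffic matrix of the kind cflowd produces can be sketched by summing bytes over flow records keyed by source and destination AS. The record fields below (src_as, dst_as, bytes) mirror information that NetFlow export records can carry, but the dictionary layout and function names are hypothetical, and this is not cflowd's code or file format. The AS numbers are from the private-use range.

```python
from collections import defaultdict


def as_traffic_matrix(flow_records):
    """Sum bytes per (source AS, destination AS) pair (hypothetical layout)."""
    matrix = defaultdict(int)
    for rec in flow_records:
        matrix[(rec["src_as"], rec["dst_as"])] += rec["bytes"]
    return matrix


def print_matrix(matrix):
    """Print a dense table: rows are source ASes, columns destination ASes."""
    ases = sorted({a for pair in matrix for a in pair})
    print("src\\dst " + " ".join(f"{a:>8}" for a in ases))
    for src in ases:
        # .get() does not insert missing pairs into the defaultdict.
        row = " ".join(f"{matrix.get((src, dst), 0):>8}" for dst in ases)
        print(f"{src:>7} {row}")


records = [  # made-up flow records
    {"src_as": 64512, "dst_as": 64513, "bytes": 1200},
    {"src_as": 64512, "dst_as": 64513, "bytes": 800},
    {"src_as": 64513, "dst_as": 64514, "bytes": 500},
]
print_matrix(as_traffic_matrix(records))
```

Aggregating at AS granularity keeps the matrix small enough to track over time, which is what makes it useful for capacity planning and trend analysis.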


Figures 5 and 6. AS matrices from cflowd software

Another flow-based measurement tool is under development within the IETF's Realtime Traffic Flow Measurement (RTFM) working group, an effort led by Nevil Brownlee of the University of Auckland, New Zealand. His flow meter tool suite, NeTraMet/Nifty, was initially motivated by the need to support usage-based resource accounting in New Zealand.
(http://www.auckland.ac.nz/net/Internet/rtfm/TOP.html)
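Usage-based accounting of this kind comes down to attributing metered traffic to the responsible party and pricing the total. The sketch below assigns flow byte counts to customers by longest-prefix match on the local address and applies a per-megabyte tariff. The customer prefixes (documentation ranges), the tariff, and the function names are hypothetical, and this is not the RTFM meter's rule language.

```python
import ipaddress
from collections import defaultdict

# Hypothetical customer address allocations (documentation prefixes).
CUSTOMERS = {
    ipaddress.ip_network("192.0.2.0/24"): "customer-a",
    ipaddress.ip_network("198.51.100.0/24"): "customer-b",
    ipaddress.ip_network("198.51.100.128/25"): "customer-c",
}
PRICE_PER_MB = 0.05  # hypothetical tariff per 10^6 bytes


def customer_for(address):
    """Longest-prefix match of an address against the customer table."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in CUSTOMERS if addr in net]
    if not matches:
        return None
    return CUSTOMERS[max(matches, key=lambda net: net.prefixlen)]


def usage_charges(flows):
    """flows: iterable of (local_address, bytes). Returns charge per customer."""
    usage = defaultdict(int)
    for address, nbytes in flows:
        customer = customer_for(address)
        if customer is not None:  # unattributed traffic is not billed here
            usage[customer] += nbytes
    return {c: round(b / 1_000_000 * PRICE_PER_MB, 2) for c, b in usage.items()}


print(usage_charges([("192.0.2.10", 2_000_000), ("198.51.100.200", 10_000_000)]))
```

The longest-prefix rule lets a more specific allocation (here customer-c's /25) override the enclosing block.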

Traffic Analysis, Visualization, Simulation, and Modeling

A common complaint about traffic measurement studies is that they cannot sustain relevance in an environment where traffic, technology, and topologies change faster than we can measure them, much less analyze them or publish papers about them. Indeed, it seems likely that the dynamic nature of the Internet will continue to render collected traffic data primarily of historical interest unless such data lead to tangible improvements in our ability to analyze and predict network behavior. All the workload and performance data in the world will not get us very far without improvements in analysis, modeling, visualization, and simulation tools, particularly those capable of addressing Internet scalability. The absence of these tools hinders networking engineers and architects in planning capacity expansions, not to mention preparing for the introduction of new technologies and protocols. Without fundamental progress in tool development, skepticism will persist regarding the utility and relevance of empirical measurement studies to the realities of instrumenting large Internet backbones.

The need for a fundamental understanding of Internet traffic behavior calls for identifying common ground among the interdependent provider, research, vendor, and user communities. Yet there is little consensus among these or other groups on how to approach IP traffic modeling or how to incorporate real-time statistics into such analyses. Telephony models developed at Bell Labs and elsewhere rely on queuing theory and other techniques that do not translate readily to Internet-style packet-switched networks. In particular, Erlang distributions, Poisson arrivals, and other tools for predicting call-blocking probabilities and other vital telephony service characteristics typically do not apply to wide-area internetworking technologies.
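For readers less familiar with the telephony toolkit, the Erlang B formula is a classic example of such a tool: it gives the probability that a call is blocked when calls offering A erlangs of load arrive as a Poisson process at N circuits, with blocked calls cleared. The sketch below uses the standard recursion; the function name is our own. Its results rest on assumptions such as Poisson call arrivals, which, as noted above, do not typically hold for wide-area packet traffic.

```python
def erlang_b(offered_load, circuits):
    """Blocking probability for `offered_load` erlangs on `circuits` trunks.

    Uses the recursion B(0) = 1, B(n) = A*B(n-1) / (n + A*B(n-1)), which
    assumes Poisson call arrivals and blocked calls cleared.
    """
    b = 1.0
    for n in range(1, circuits + 1):
        b = offered_load * b / (n + offered_load * b)
    return b


# Smallest number of circuits keeping blocking at or below 1% for 10 erlangs.
load = 10.0
n = 1
while erlang_b(load, n) > 0.01:
    n += 1
print(n)
```

A single closed-form dimensioning rule like this is exactly what packet networks lack: with bursty, non-Poisson arrivals there is no comparably simple link between offered load and service quality.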

Several research efforts aim to depict Internet traffic dynamics and network topologies visually, including topology mapping, depiction of traffic flows, and illustration of BGP peering relationships among autonomous systems (Figures 7-11). In 1996, k claffy (UCSD/CAIDA), Eric Hoffman (Ipsilon), Tamara Munzner (Stanford University), and Bill Fenner (Xerox PARC) experimented with visually depicting what they hoped was a manageable subset of Internet topology: the Mbone infrastructure, illustrated in Figure 7 below (http://www-graphics.stanford.edu/papers/mbone/).


Figure 7. European Mbone topology, characterized by a relatively more efficient star topology than the United States Mbone structure, largely because bandwidth scarcity provides a stronger incentive for efficient configurations. Data from March 17, 1997.

To depict the global Mbone topology, Munzner et al. used the mrwatch utility developed by Atanu Ghosh at University College London to generate data. They then translated these data into a geographic representation of the tunnel structure, drawn as arcs on a globe, by resolving the latitude and longitude of the Mbone routers. The resulting visualizations provide a macro-level view of otherwise overwhelming textual data (host names, IP addresses), yielding an understanding of the global Mbone traffic structure that is unavailable from the data in its original form. These 3-D representations are interactive, permitting users to define groupings and thresholds in order to isolate aspects of the Mbone topology.
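Drawing a tunnel as an arc on a globe needs only the two endpoints' latitude and longitude. A minimal sketch, assuming a unit sphere and endpoints that are neither identical nor antipodal, converts each endpoint to a 3-D vector and spherically interpolates (slerp) between them to produce points along the great-circle arc. The function names are hypothetical, and this is not the Geomview-based code the authors used.

```python
import math


def to_xyz(lat_deg, lon_deg):
    """Latitude/longitude in degrees -> point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def great_circle_arc(a, b, steps=16):
    """Points from a to b along the great circle (slerp on the unit sphere).

    Assumes a and b are distinct and not antipodal, so the arc is unique.
    """
    p, q = to_xyz(*a), to_xyz(*b)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(p, q))))
    omega = math.acos(dot)  # central angle between the endpoints
    points = []
    for i in range(steps + 1):
        t = i / steps
        wp = math.sin((1 - t) * omega) / math.sin(omega)
        wq = math.sin(t * omega) / math.sin(omega)
        points.append(tuple(wp * x + wq * y for x, y in zip(p, q)))
    return points


# Approximate coordinates for two hypothetical tunnel endpoints.
san_diego, london = (32.7, -117.2), (51.5, -0.1)
arc = great_circle_arc(san_diego, london)
print(len(arc), [round(c, 3) for c in arc[len(arc) // 2]])
```

Scaling the interior points slightly above radius 1 would lift the arcs off the globe's surface so that long tunnels remain visible.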

These maps, updated daily, are publicly available as still images, or as Geomview/VRML objects, the latter for use with a VRML (virtual reality modeling language) browser. Other groups have used these tools for depicting other topologies, such as the 6-Bone infrastructure (http://www.6bone.nasa.gov/viz).


Figure 8. Global Mbone traffic, illustrating the concentration of Mbone traffic in the Northern Hemisphere (US & Europe). Data from March 17, 1997.

Anemone is a network visualization tool that has been used for tasks such as delineating relationships among autonomous systems (ASes). Figure 9 uses anemone to depict AS peering adjacencies, sampled from a BGP session on May 1, 1996. Node sizes are proportional to the total number of BGP peering relationships in which the autonomous system participates; line sizes are proportional to the number of routes advertised across the corresponding adjacency. Others are also using anemone to present pathchar results.
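The two quantities mapped onto node and line sizes can be derived from a table of BGP AS paths. A minimal sketch, assuming each route is given as a (prefix, AS path) pair: consecutive ASes in a path are treated as adjacent, each AS's size follows its number of distinct neighbors, and each adjacency's width follows the number of routes whose path crosses it. The input format and function name are hypothetical; anemone's own input format is not described here. Prefixes and AS numbers are from documentation and private-use ranges.

```python
from collections import defaultdict


def peering_graph(routes):
    """routes: iterable of (prefix, [as1, as2, ...]) taken from BGP AS paths.

    Returns (degree, routes_per_edge): the number of distinct peers of each
    AS, and the number of routes carried across each undirected adjacency.
    """
    neighbors = defaultdict(set)
    routes_per_edge = defaultdict(int)
    for _prefix, path in routes:
        # Collapse AS-path prepending (the same AS repeated back to back).
        hops = [a for i, a in enumerate(path) if i == 0 or a != path[i - 1]]
        for left, right in zip(hops, hops[1:]):
            neighbors[left].add(right)
            neighbors[right].add(left)
            routes_per_edge[frozenset((left, right))] += 1
    degree = {asn: len(peers) for asn, peers in neighbors.items()}
    return degree, dict(routes_per_edge)


routes = [  # made-up routing table entries
    ("192.0.2.0/24", [64512, 64513, 64514]),
    ("198.51.100.0/24", [64512, 64513, 64513, 64515]),
    ("203.0.113.0/24", [64512, 64516]),
]
degree, widths = peering_graph(routes)
print(degree)
print({tuple(sorted(e)): n for e, n in widths.items()})
```

Collapsing prepended ASes matters: otherwise an AS repeated for traffic-engineering purposes would appear to peer with itself.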


Figure 9. AS Peering Relationships

Figure 10 provides a 3-D VRML view of AS peering, in contrast to the planar view of Figure 9. This image provides a snapshot of BGP peering relationships for all ASes that peer with at least seven other ASes, as viewed from a specific router.
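Restricting the view to ASes that peer with at least seven others amounts to a degree filter on the adjacency list. The sketch below assumes the adjacencies are already available as pairs of AS numbers (for example, derived from a router's BGP table); the threshold default mirrors the figure's, while the function name and the sample graph are hypothetical.

```python
from collections import defaultdict


def well_connected_subgraph(adjacencies, min_peers=7):
    """Keep ASes with at least `min_peers` distinct peers, and links among them.

    Degrees are computed on the full graph, so an AS qualifies based on all
    of its peers, not only on the ones that survive the filter.
    """
    peers = defaultdict(set)
    for a, b in adjacencies:
        if a != b:
            peers[a].add(b)
            peers[b].add(a)
    kept = {asn for asn, p in peers.items() if len(p) >= min_peers}
    links = {frozenset((a, b)) for a, b in adjacencies
             if a != b and a in kept and b in kept}
    return kept, links


# Made-up graph: two hubs peer with each other and with seven small ASes.
adjacencies = [(100, 200)] + [(100, i) for i in range(1, 8)] + [(200, i) for i in range(1, 8)]
kept, links = well_connected_subgraph(adjacencies)
print(sorted(kept), [tuple(sorted(l)) for l in links])
```

Filtering by degree trades completeness for legibility: the densely connected core remains readable while the many single-homed edge ASes drop out.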


Figure 10. 3D View of AS Peering Relationships

Development of a prototype global web caching hierarchy is another avenue for research into Internet scalability and traffic efficiency. NSF and Digital Equipment Corporation are sponsoring the deployment of root web caches at select nodes on the NSF-supported vBNS network, under the direction of k claffy and Duane Wessels (UCSD/NCAR). These caches, and hundreds of others throughout the world, run the NLANR-maintained Squid caching software, a publicly available package supported by community volunteers (www.nlanr.net/Cache).
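How a request travels through a caching hierarchy can be sketched with a toy model: a cache answers from its own store if it can, otherwise checks whether a sibling already holds the object, and only then forwards the request to its parent (or, at the root, to the origin server), storing the reply on the way back. This loosely mirrors the parent/sibling roles of Squid caches, but the class is a hypothetical in-memory illustration with no ICP messages, timeouts, or expiry.

```python
class Cache:
    """Toy cache node in a parent/sibling hierarchy (hypothetical model)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.siblings = []
        self.store = {}

    def fetch(self, url, origin):
        """Return (content, name of the node that supplied it)."""
        if url in self.store:
            return self.store[url], self.name
        # Siblings are consulted only for hits; they never fetch on our behalf.
        for sib in self.siblings:
            if url in sib.store:
                self.store[url] = sib.store[url]
                return self.store[url], sib.name
        # Miss everywhere nearby: go up the hierarchy, or to the origin server.
        if self.parent is not None:
            content, source = self.parent.fetch(url, origin)
        else:
            content, source = origin(url), "origin"
        self.store[url] = content
        return content, source


def origin_server(url):
    return "body of " + url  # stand-in for a real HTTP fetch


root = Cache("root")
a, b = Cache("a", parent=root), Cache("b", parent=root)
a.siblings.append(b)
b.siblings.append(a)
print(a.fetch("http://example.com/", origin_server)[1])
print(b.fetch("http://example.com/", origin_server)[1])
```

Routing misses upward while limiting siblings to hits keeps the hierarchy a tree, which is the kind of structure that access controls can be used to enforce.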


Figure 11. Cache traffic in Asia

As part of its caching project, NLANR provides daily images of global caching traffic flows. Figure 11 shows a snapshot of Asian caching traffic patterns on January 19, 1997; red arcs indicate high traffic volume between caches. The visualizations have already proven useful in optimizing topologies: in mid-1996 they revealed the need to implement access controls to enforce a sounder hierarchical structure.

Table 3 identifies various Internet traffic visualization tools used in these and other projects, taken from the CAIDA Tool Taxonomy (http://www.caida.org/tools/taxonomy/).

Table 3. Internet visualization tools

Name                              Summary
network topology mapper (Mapnet)  Java-based tool to present and update topologies
Caidants                          network visualization toolkit/dataset
link congestion visualization     plots latency variance on routes to various hosts
planet multicast                  Mbone geographic visualization, updated daily
Web cache visualization           Squid cache hierarchy geographic visualization, updated daily
ASExplorer
NAP route map

In addition to the lack of data on Internet traffic flows and performance, a similar dearth exists in quality analysis, modeling, and simulation tools, particularly those capable of addressing Internet scalability. The commercial tools currently available are generally viewed by users as sorely inadequate. Few of them are designed for today's Internet environment, so most cannot help Internet engineers and architects reasonably plan for backbone expansions or substantive changes in protocols, technology, or routing.

Conclusions and Future Work

The field of Internet measurement and analysis faces enormous challenges. For example, our inability to explain or fully understand phenomena resulting from topology and routing metric changes (e.g., routing loops or black holes) will limit our ability to continue scaling the Internet and will undermine attempts to improve its robustness.

Continued exponential growth confronts the Internet with demands for new equipment and new capacity, which will dictate resource allocation decisions within ISPs for the foreseeable future. Improvements in equipment performance, such as faster forwarding speeds, will likewise dominate the attention of router and other vendors, with significantly less attention accorded to the challenges of network management, measurement, and analysis.

The complexity of the Internet infrastructure, however, is becoming increasingly apparent to users, operators, and vendors alike. No single community can significantly improve the performance or robustness of the national or global infrastructure without the support and cooperation of others. This fact is slowly leading key individuals in the commercial sector and in governments to acknowledge cooperation as a necessary component of the future Internet. The newly formed Internet Operators (IOPS, http://www.iops.org/) organization is indicative of this trend. The establishment of CAIDA and initiatives such as those of CSG, the automotive industry, and others also indicate a change in the way the community views the Internet and illustrate a more proactive approach to topics such as Internet measurement, traffic analysis, and network management.

Partnerships between these groups, designed to facilitate tool development and data analysis, promote testbeds for new networking technologies and techniques, and disseminate results, are critical to furthering our ability to achieve the social, economic, and educational potential of the Internet.


authors:
kc received a doctoral degree from UCSD in June 1994 and is currently at UCSD's San Diego Supercomputer Center (SDSC). kc's research focuses on establishing and improving the efficacy of traffic and performance characterization methodologies on wide-area communication networks, in particular to cope with the changing traffic workload, financial structure, and underlying technologies of the Internet. [kc@caida.org]

Tracie Monk serves as the Director, External Affairs for CAIDA. She began collaborating with researchers at SDSC in 1996 while with the Federal Networking Council (FNC). Prior to the FNC, Tracie was a business consultant focusing on international trade and investment opportunities in the telecommunications and other sectors. [tmonk@caida.org]


24 July 1997, mailto:kc@caida.org