Measurement Infrastructure for the Next Generation Internet

Vern Paxson
Network Research Group
Information and Computing Sciences
Lawrence Berkeley National Laboratory
vern@ee.lbl.gov


A key question inevitably arising with any large, high-speed network is: how come I'm not actually getting high performance? The question is hard to answer because it can be exceedingly difficult to determine where, in the end-to-end chain of links and routers between the sending and receiving applications, bottlenecks have developed. The only way to answer such questions, and in turn to tune the network for high performance, is to include measurement support as a cornerstone of the network's design.

Yet building a solid, large-scale measurement infrastructure is by no means a solved engineering problem; it remains a hard, important area of networking research. For example, the IPPM (IP Performance Metrics) effort within the IETF is still wrestling with developing sound definitions for basic metrics such as loss, delay, and available bandwidth. These metrics, however, supply only the building blocks for a measurement infrastructure: an infrastructure must also include mechanisms for scheduling measurements, disseminating the results, identifying performance problems by analyzing the measurements, and maintaining an archive of past measurements. The archive is particularly important because it gives network researchers access to the data they need to gain insight into the many different ways in which the network behaves.

In recent work, we developed a prototype of part of such an infrastructure, the "Network Probe Daemon" (NPD) framework. This framework then allowed us to gather the data necessary for the first large-scale studies of end-to-end Internet routing and packet dynamics.

The framework consists of a number of hosts located around the Internet, each running a specialized measurement server (an NPD). One unique aspect of the framework is that the NPDs *cooperate* with one another in performing measurements. For example, throughput is measured by initiating a TCP bulk transfer from the NPD at one site to the NPD at another site. The two NPDs themselves act as the sender and receiver of the data, and each uses a packet filter to trace the corresponding packet departure and arrival times. Because the measurement is made at both ends of the network path, subsequent analysis can fully separate the behavior of the forward direction of the path from that of the reverse direction. These differences turn out to be quite significant: our research found that for fully half of all network paths, the routes in the two directions differ in at least one city visited, and furthermore that bottleneck bandwidths and queueing delays are frequently asymmetric.
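
To illustrate why tracing at both ends lets the analysis treat each direction separately, here is a minimal sketch of estimating per-packet queueing delay along one direction from the two packet-filter traces. It assumes each trace has already been reduced to a map from a hypothetical packet identifier to a timestamp, and that the two hosts' clocks differ by a constant offset but do not drift relative to each other over the trace; real traces need clock-skew correction, which this sketch omits. Subtracting the minimum observed transit time cancels both the unknown clock offset and the fixed propagation delay, leaving the variable (queueing) component.

```python
from typing import Dict


def queueing_delays(sent: Dict[int, float], received: Dict[int, float]) -> Dict[int, float]:
    """Estimate per-packet queueing delay along ONE direction of a path.

    sent:     hypothetical packet id -> departure time seen by the sending host's filter
    received: hypothetical packet id -> arrival time seen by the receiving host's filter
    Assumes the two clocks differ only by a constant offset (no skew).
    """
    # Only packets recorded at both ends can be matched; lost packets drop out here.
    common = sent.keys() & received.keys()
    if not common:
        return {}
    # Raw one-way transit times still include the unknown offset between the clocks.
    transit = {pid: received[pid] - sent[pid] for pid in common}
    # The smallest transit time approximates "offset + fixed propagation delay";
    # removing it leaves only the variable, queue-induced part of each delay.
    base = min(transit.values())
    return {pid: t - base for pid, t in transit.items()}
```

Running the same function once on the forward traces and once on the reverse traces yields two independent delay profiles, so an asymmetry in queueing shows up directly as a difference between them rather than being blurred together as it would be in a round-trip measurement.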

Another important property of the "probe platform" approach is that it exhibits N^2 scaling: if the infrastructure consists of N probe machines, then it can probe O(N^2) paths through the network. Adding a single probe location to N existing ones adds N more host pairs (2N paths, counting each direction separately) amenable to study. This scaling is very attractive for developing comprehensive measurement coverage of the network.
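
The counting behind the N^2 claim can be made concrete. The sketch below counts each (sender, receiver) direction separately, since the forward and reverse routes of a path can differ; the function names are illustrative only.

```python
def directed_paths(n: int) -> int:
    """Ordered (sender, receiver) paths among n probe hosts."""
    # Every host can probe every other host, in both directions.
    return n * (n - 1)


def host_pairs(n: int) -> int:
    """Unordered pairs of probe hosts (each pair covers two directed paths)."""
    return n * (n - 1) // 2


def paths_added(n: int) -> int:
    """Directed paths gained when one host joins an existing set of n hosts."""
    # (n + 1) * n - n * (n - 1) simplifies to 2 * n.
    return directed_paths(n + 1) - directed_paths(n)
```

For example, 10 hosts cover 90 directed paths (45 host pairs), and an eleventh host adds 20 directed paths, i.e. 10 new host pairs, while the cost of deployment grows only linearly in the number of hosts.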

For a high-speed network such as the NGI, we envision including a number of dedicated measurement hosts in its design. These would be sited at strategic locations around the network, particularly on both sides of interconnect points. Note, however, that the hosts do *not* need direct access to the primary network links; such access would in any case be difficult to obtain, for privacy and security reasons. Instead, we locate each host a single, high-speed hop away from a primary link. Consequently, the hosts can trace their own measurement traffic using packet filters, which provide the high-resolution "wire times" invaluable for sound analysis, without raising security concerns.

A number of important issues remain. We need access control mechanisms to ensure that probe platforms are not misused. These mechanisms should also be coupled with a notion of a "measurement schedule". Our experiences with the NPD framework highlighted how an infrastructure that supports only "measurement on demand" can suffer from serious bias during outages, since an outage can prevent us from reaching the platform itself in order to examine the outage! In addition, for a widely deployed framework, "measurement on demand" could lead to massive overloading of the measurement infrastructure when legions of users affected by a performance problem all turn to the infrastructure in an attempt to troubleshoot it.
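
As a rough illustration of a "measurement schedule", the sketch below generates measurement events ahead of time rather than in response to user requests, so measurements keep running (and outages get recorded) even when no one can reach a platform. Spacing the measurements with exponentially distributed gaps, which makes the measurement times a Poisson process, is an assumption here, a common choice for avoiding synchronization with periodic network behavior; the function name and parameters are hypothetical.

```python
import random
from typing import List, Tuple


def poisson_schedule(hosts: List[str], mean_interval: float, horizon: float,
                     seed: int = 0) -> List[Tuple[float, str, str]]:
    """Generate (time, source, destination) measurement events up to `horizon`.

    Gaps between measurements are exponentially distributed with mean
    `mean_interval`, so the event times form a Poisson process.
    Requires at least two hosts.
    """
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    events: List[Tuple[float, str, str]] = []
    t = rng.expovariate(1.0 / mean_interval)
    while t < horizon:
        # Pick an ordered pair of distinct hosts uniformly at random.
        src, dst = rng.sample(hosts, 2)
        events.append((t, src, dst))
        t += rng.expovariate(1.0 / mean_interval)
    return events
```

Because the schedule is fixed in advance, a burst of users who notice a problem cannot change the measurement load, which addresses the overload concern as well as the outage bias.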

Instead, in such situations we need a mechanism that permits only a few judiciously chosen measurements, whose results can then be disseminated to the affected users. Thus, the infrastructure needs a notion of "advertising" recent measurements, and of using these advertisements to prune out additional measurements. This problem appears well suited to a solution based on measurement multicast groups, which interested users would join when they wish to learn the results of measurements performed by others. One problem with this approach, however, is that measurement dissemination must be reliable, and efficient, reliable, large-scale multicast dissemination remains an area of research.
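
The pruning idea can be sketched as a per-path cache of advertised results: before probing a path, a platform checks whether a sufficiently fresh result has already been advertised and, if so, reuses it instead of generating new probe traffic. The class, its methods and the freshness window below are hypothetical, and the multicast transport itself (including the reliability problem) is left out entirely; for simplicity, a result is timestamped when it is heard rather than when it was measured.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Path = Tuple[str, str]  # (source host, destination host)


class MeasurementAdvertisements:
    """Hypothetical cache of recently advertised results, keyed by path."""

    def __init__(self, max_age: float, clock: Callable[[], float] = time.monotonic):
        self.max_age = max_age  # seconds a result remains fresh enough to reuse
        self.clock = clock
        self._latest: Dict[Path, Tuple[float, float]] = {}  # path -> (heard_at, result)

    def advertise(self, path: Path, result: float) -> None:
        """Record a result, e.g. one heard on a measurement multicast group."""
        self._latest[path] = (self.clock(), result)

    def fresh_result(self, path: Path) -> Optional[float]:
        """Return the advertised result for `path` if it is still fresh."""
        entry = self._latest.get(path)
        if entry is not None and self.clock() - entry[0] <= self.max_age:
            return entry[1]
        return None

    def measure_or_reuse(self, path: Path, measure: Callable[[Path], float]) -> float:
        """Reuse a fresh advertised result; otherwise measure, then advertise."""
        cached = self.fresh_result(path)
        if cached is not None:
            return cached  # pruned: no additional probe traffic
        result = measure(path)
        self.advertise(path, result)
        return result
```

With many affected users calling `measure_or_reuse` for the same path, only the first call within each freshness window triggers a real measurement; every other caller receives the advertised result.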

In summary, the NGI offers an excellent opportunity to develop a sound measurement infrastructure: one that will help the network achieve its goal of delivering solid, high-performance interconnection between the NGI sites, and one that will provide invaluable insights into how to build such an infrastructure for the commercial Internet. There, the infrastructure could provide a basis for contractual agreements between private parties that include ways of quantifying the service a network provider delivers. Such contracts would in turn generate powerful economic incentives for improving Internet service.


Acknowledgements: Many of the ideas in this paper derive from numerous discussions with Jamshid Mahdavi and Matt Mathis of the Pittsburgh Supercomputing Center, my colleagues on the NSF-sponsored project "Creating a National Internet Measurement Infrastructure."


Last updated 10 April 1997