`but some data is worse than others': measurement of the global Internet


K Claffy
(NLANR)

As the era of the NSFnet Backbone Service came to a close in April 1995, the Internet community lost the ability to rely on what was the only set of publicly available statistics for a large national U.S. backbone. The transition to the new NSFnet program, with commercial operations providing both regional service and cross-service provider switching points (NAPs, also referred to as exchange points), has virtually eliminated the public availability of statistics and analysis at the national level. In this article we cover three areas:

  1. limitations of current Internet statistics and why data collection is more difficult on the Internet than on the public telephone network
  2. who needs Internet statistics and why
  3. possible models for ISPs and users to narrow the gap in understanding the nature of Internet traffic

 

 

how we got here: limitations of current statistics

The existence of the NSFNET (1986-1995) as a central network for the research and education community facilitated research into aspects of aggregate network traffic patterns and the anomalies in those patterns caused by the introduction of new or unique applications. Decommissioning the NSFNET backbone left the Internet community with no dependable public source of statistics on Internet workloads. And yet empirical investigation of the nature of current workloads and their resource requirements, as well as how they change over time, is vital to supporting Internet evolution. Workload profiles are changing more rapidly than ever before, and keeping pace with them in an increasingly competitive, increasingly proprietary environment is even more important now than during the life of the NSFNET backbone.

The transition to the new NSFNET program, with commercially operated services providing both regional service and cross-service provider switching points (NAPs), renders statistics collection a much more difficult task. There are several dimensions to the problem, each with its own cost-benefit tradeoff.

contractual and logistical

Thus far users have required few statistics from their providers. Even the NSF, one of the largest and most forward-looking users, called for few statistics in its cooperative agreements with backbone and NAP providers. Understandably, NSF had never been in a position to specify in detail what statistics its providers (for the NSFNET backbone as well as the vBNS and the NAPs) should collect, since it simply did not know enough about the technology yet (neither did anyone else, although presumably the providers knew slightly more than the NSF).

The situation is similar for other emerging network service providers, whose understanding of the technology and what statistics collection is possible likely exceeds that of NSF. However, as it turned out, the NAPs and NSPs found it challenging enough just getting and keeping their infrastructure operational; statistics have never been a top priority. Nor do the NAPs really have a good sense of what to collect, as all of the technology involved is quite new to them as well. The issue is not whether traffic analysis would help, even with equipment and routing problems, but that traffic analysis is perceived as a secondary issue, and there is no real mechanism (or spare time) for collaborative development of an acceptable model.

 

`we suck less.'
-overheard slogan of an ISP

academic and fiscal

Many emerging Internet services are offered by companies whose primary business has thus far been telecommunications rather than IP; the NAP and vBNS providers are good examples. Traditionally phone companies, they are accustomed to reasonable analytic tools for modeling telephony workload and performance (e.g., Erlang distributions). Unfortunately, the literature in Internet traffic characterization, both in the analytical and performance measurement domains, indicates that wide area networking technology has advanced at a far faster rate than has the analytical and theoretical understanding of Internet traffic behavior.
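
As an illustration of the kind of analytic tool telephony engineers take for granted (our example, not from the article), the Erlang B formula gives the probability that a call arriving at a trunk group of m circuits with offered load E erlangs is blocked:

    \[
      B(E, m) = \frac{E^{m}/m!}{\sum_{k=0}^{m} E^{k}/k!}
    \]

Decades of such closed-form results, validated against call records, underpin the provisioning tables the phone companies rely on.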

The slower and more containable realms of years ago were amenable to characterization with closed-form mathematical expressions, which allowed reasonably accurate prediction of performance metrics such as queue lengths and network delays. But traditional mathematical modeling techniques, e.g., queueing theory, have met with little success in today's Internet environments.
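
For illustration (our example, again, not the article's), the classical M/M/1 queue is exactly this kind of closed-form result: with Poisson arrivals at rate lambda and exponential service at rate mu, the mean number of packets in the system and the mean time in the system are

    \[
      \rho = \frac{\lambda}{\mu} < 1, \qquad
      L = \frac{\rho}{1-\rho}, \qquad
      W = \frac{1}{\mu - \lambda}
    \]

predictions that hold up only when the arrival process really is close to Poisson, which, as discussed below, Internet measurements show it generally is not.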

Years ago, for example, the assumption of Poisson arrivals was acceptable for the purposes of characterizing small LANs. As a theory of network behavior, however, the tenacity of the use of Poisson arrivals, whether in terms of packet arrivals within a connection, connection arrivals within an aggregated stream of traffic, or packet arrivals across multiple connections, has been remarkable in the face of egregious inconsistency with any collected data. Leland et al. [5] and Paxson and Floyd [6] investigate alternatives to Poisson modeling, specifically the use of self-similar (fractal) mathematics to model IP traffic (see sidebar).

 

Until recently, the vast majority of Internet hosts were in the United States and relied upon a common backbone network supported by the U.S. National Science Foundation (NSF). The NSFnet was the largest and most widely used Internet interconnection facility, and backbone traffic measurements provided a reasonable indication of Internet traffic trends worldwide. NSF decommissioned this government-funded backbone in April 1995, as it became clear that multiple commercial providers were now in a position to offer Internet backbone services. NSF withdrawal from support of backbone services to the R&E community involved modifications to the NSFNET architecture that would ensure Internet stability during the transition from government-supported services to full privatization of the network. These modifications involved the creation of four new projects, three infrastructural and one research-related:
  1. general purpose Network Access Points (NAPs), to connect the commercial backbone networks, thus avoiding network partitioning
  2. a routing arbiter, to provide routing coordination among providers during the transition
  3. interconnectivity support for regional networks.
    Specifically, NSF-sponsored regional providers, i.e., those who received funding from the NSF throughout the life of the NSFNET, will continue to receive (an albeit annually decrementing amount of) funding for 4 more years so long as they connect to a backbone provider that connects to all three NSF NAPs. This constraint is the only leverage NSF had to prevent partitioning, since the backbone providers themselves received no funding from NSF and thus had less incentive to `do the right thing' at the time. In any event, regional funding ends after four years, at which point the regional providers will have had ample opportunity to become fully self-sustaining within the marketplace.
  4. At the same time, the NSF wanted to continue to foster leading edge network research and development, and created the last, research-related project to this end: the very high speed Backbone Network Service (vBNS), a wide area network initially connecting the NSF supercomputer centers for use by both application scientists and network researchers.

Yet, there is still no consensus on how statistics can support research in IP traffic modeling. There is also skepticism within the Internet community regarding the utility of empirical studies that rely on collecting real data from the Internet. Some critics claim that because the environment is changing so quickly, within weeks any collected data is only of historical interest. They argue that research is better served by working on mathematical models rather than by empirical surveys that, at most, capture only one stage in network traffic evolution.

A further contributing factor to the lag of Internet traffic modeling is the early financial structure of the Internet. A few U.S. government agencies assumed the financial burden of building and maintaining the transit network infrastructure, leaving little need to trace network usage for the purposes of cost recovery. As a result, Internet customers had little financial leverage with which to motivate ISPs to improve their service quality.

Many of the studies for modeling telephony traffic came largely out of Bell Labs, which had several advantages: no competition forcing profit margins slim, and therefore the financial resources to devote to research, as well as a strong incentive to fund research that could ensure the integrity of the networks for which they charge. The result is a situation today where telephone company tables of ``acceptable blocking probability'' (e.g., the inability to get a dial tone when you pick up the phone) reveal standards that are significantly higher than our typical expectations of the Internet.

The new commercial Internet is characterized by hundreds of ISPs, many on shoestring budgets in low margin competition. They generally view statistics collection as a luxury that has never proven its operational utility. Note that the last publicly available source of Internet workload and performance data, for the NSFNET backbone, was funded by the NSF in the hope that tools, methodologies, theories of traffic, refinements, and feedback would emerge from the efforts of the IETF and other bodies. But there was never any fiscal pressure that the statistics collection activity justify the resources it required within the cost structure of providing Internet service. It was never forced to prove itself worthwhile. And it didn't.

Historically, the ISPs have grown up with the IETF meeting them half way with the technology. But with the simultaneous diversification and usage explosion of the infrastructure, the ISPs are not in a position to figure out accurate statistics models themselves, certainly not while they are so busy trying to keep up with demand. From their perspective statistics collection does not seem to be a wise investment of customer connection fees at all.

 

self-similarity: hope for Internet modeling

Although Internet traffic does not exhibit Poisson arrivals, the cornerstone of telephony modeling, a number of researchers have measured a consistent thread of self-similarity in Internet traffic. Several metrics of network traffic have heavy-tailed distributions:
  • telephone call holding times (CCSN/SS7)
  • telnet packet interarrivals

  • FTP burst size upper tail

  • transmission times of WWW files
Recent theorems have shown that aggregating traffic sources with heavy-tailed distributions leads directly to (asymptotic) self-similarity. Willinger [21] identified three minimal parameters for a self-similar model:
  • the Hurst (H) parameter, which reflects how the time correlations scale with the measurement interval
  • variance of the arrival process
  • mean of the arrival process

Although self-similarity is a parsimonious concept, it comes in many different colors, and we are only now beginning to understand what causes it. Self-similarity implies that a given correlational structure is retained over a wide range of time scales. It can derive from the aggregation of many individual, albeit highly variable, on-off components. The bad news about self-similarity is that it is a significantly different paradigm that requires new tools for dealing with traffic measurement and management. Load service curves (e.g., delay vs. utilization) of classical queueing theory are inadequate; indeed, for self-similar traffic even metrics of means and variances indicate little unless accompanied by details of the correlational structure of the traffic. In particular, self-similarity typically predicts queue lengths much higher than do classical Poisson models. Researchers have analyzed samples and found fractal components of behavior in a wide variety of network traffic (SS7, ISDN, Ethernet and FDDI LANs, backbone access points, and ATM). Still unexplored is the underlying physics that could give rise to self-similarity at different time scales. That is, at millisecond time scales, link layer characteristics (i.e., transmission time on media) would dominate the arrival process profile, while at 1-10 second time scales the effects of the transport layer would likely dominate. Queueing characteristics might dominate a range of time scales in between, but in any case the implication that several different physical networking phenomena manifest themselves with self-similar characteristics merits further investigation into these components.
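
As an illustrative sketch (our own example, not code from any of the cited studies), the following Python fragment aggregates many heavy-tailed on-off sources and estimates the Hurst parameter with the simple aggregated-variance method; the source count, trace length, and Pareto shape are arbitrary choices, and an estimate well above 0.5 is the signature of self-similarity described above.

    # Aggregate heavy-tailed on-off sources and estimate the Hurst parameter H
    # with the aggregated-variance method: for self-similar traffic the variance
    # of the m-aggregated series decays as m^(2H-2); H = 0.5 is Poisson-like.
    import numpy as np

    rng = np.random.default_rng(0)

    def onoff_source(n_slots, alpha=1.5):
        """One on-off source with Pareto (heavy-tailed) on/off period lengths."""
        rate = np.zeros(n_slots)
        t, on = 0, True
        while t < n_slots:
            period = int(rng.pareto(alpha) + 1)
            if on:
                rate[t:t + period] = 1.0      # source emits 1 unit/slot while on
            t += period
            on = not on
        return rate

    def hurst_aggregated_variance(series, scales=(1, 2, 4, 8, 16, 32, 64, 128)):
        """Estimate H from the slope of log Var(X^(m)) versus log m."""
        log_m, log_var = [], []
        for m in scales:
            n_blocks = len(series) // m
            blocks = series[:n_blocks * m].reshape(n_blocks, m).mean(axis=1)
            log_m.append(np.log(m))
            log_var.append(np.log(blocks.var()))
        slope = np.polyfit(log_m, log_var, 1)[0]   # slope is roughly 2H - 2
        return 1.0 + slope / 2.0

    # Aggregate 200 independent on-off sources into one traffic trace.
    trace = sum(onoff_source(20000) for _ in range(200))
    print("estimated Hurst parameter:", round(hurst_aggregated_variance(trace), 2))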

 

on ISPs whining for but not being able to handle the responsibility for commercialized Internet service: i think it probably feels like, yeah, sure, they wanted dad (NSF) to hand over the keys to the car; they wanted to play, could handle it. but i think they just expected to get the car, which at the time was the size of a VW beetle. i don't think they expected that if they left the back doors unlocked, 75 people would crawl in the back and start mouse-clicking holes into the gas tank while they were on the freeway. In this situation, counting the number of people in the back seat and how fast they're clicking just seems to lack urgency. It's not that they would refuse the information, just that they've got other things on their mind. But the real issue is, even if they had time, even if they decided oh, my, this could be a wise investment of resources, now the problem is: they wouldn't know how to start; there are no well-defined, consistent metrics, tools, presentation methods, even a language to describe Internet service, workloads, performance.

 

on ISP performance: `frankly i think they're dancing as fast as they can; it's just that customers are dancing faster, and now some of them are starting to dance in 24-bit color'

The Internet Engineering Task Force (IETF) is a large, open community of network designers, operators, vendors, and researchers whose purpose is to coordinate the operation, management, and evolution of the Internet, and to resolve short-range and mid-range protocol and architectural issues. It is a major source of proposals for protocol standards, which are submitted to the IAB for final approval. The IETF meets three times a year, and extensive minutes are included in the IETF Proceedings. The IETF IP Provider Metrics (IPPM) working group is one of many working groups in the IETF; it comprises researchers and service providers interested in defining basic metrics and measurement methodologies in order to develop standardized performance evaluations across different Internet components, particularly ``IP clouds''.

We are not implying that the monopoly provider paradigm is better, only observing how we got where we are today: we have essentially no way to predict, verify, or in some cases even measure Internet service quality in real time.

Larger telecommunication companies entering the marketplace will inevitably learn to devote more attention to this area. The pressure to do so may not occur until the system breaks, at which point billed customers will demand, and be willing to pay for, better guarantees and data integrity.

technical

With deployment of the vBNS and NAPs, the situation has grown even more disturbing. The National Information Infrastructure continues to drive funding into hardware, pipes, and multimedia-capable tools, with very little attention to any kind of underlying infrastructural sanity checks.

And until now, the primary obstacles to accessing traffic data in order to investigate such models have been political, legal (privacy), logistic, or proprietary. With the transition to ATM and high speed switches, it is in many cases no longer even technically feasible to access the kind of data needed for traffic flow profiling, certainly not within commercial ATM network equipment. The NAPs were chartered as link layer entities, i.e., providing only a service at a level underneath the IP level, indeed without regard for whether the traffic they carry above this low layer is even IP. Because most of the NSFNET statistics reflected information at the IP layer, the NAPs cannot use the NSFNET statistics collection architecture as a model upon which to base their own operational collection. Many newer layer 2 switches, e.g., the DEC gigaswitch and ATM switches, have little if any capability for performing layer 3 statistics collection, or even for looking at traffic in the manner allowed on a broadcast medium (e.g., FDDI, Ethernet), where a dedicated machine can collect statistics without interfering with packet forwarding. Statistics collection functionality in newer switches takes resources directly away from forwarding, driving customers toward switches from competing vendors who sacrifice such functionality in exchange for speed.

privacy

Privacy has always been a serious issue in network traffic analysis. Most ISPs have service agreements that prohibit them from revealing information about individual customer traffic. Collecting and using more than aggregate traffic counts will require customer cooperation regarding what may be collected and how it will be used. For an ISP to breach customer expectations or ethical standards, even for the most noble of research, does not bode well for future business.

However, communications providers have had considerable protection under the Omnibus Crime Control and Safe Streets Act of 1968, Section 2511(2)(a)(i):

It shall not be unlawful under this chapter for an operator of a switchboard, or an officer, employee, or agent of a provider of wire or electronic communication service, whose facilities are used in the transmission of a wire communication, to intercept, disclose, or use that communication in the normal course of his employment while engaged in any activity which is a necessary incident to the rendition of his service or to the protection of the rights of property of the provider of that service, except that a provider of wire communication service to the public shall not utilize service observing or random monitoring except for mechanical or service quality control checks.

Responsible providers could go further than the law and `anonymize' monitored traffic with tools such as tcpdpriv, so that there is no basis for accusations of breach of privacy.
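
As a minimal illustration (this sketch is not tcpdpriv, which performs more sophisticated prefix-preserving mapping), a keyed hash can replace customer addresses with stable pseudonyms before traces ever leave the collection point; the key and the sample address below are hypothetical.

    # Keyed-hash anonymization of IP addresses in collected traffic records,
    # so traces can be analyzed without exposing customer identities.
    import hashlib, hmac, ipaddress

    SECRET_KEY = b"replace-with-a-locally-held-secret"   # hypothetical key

    def anonymize_ip(addr: str) -> str:
        """Map an IPv4 address to a stable pseudonym in the 10.0.0.0/8 range."""
        digest = hmac.new(SECRET_KEY, ipaddress.ip_address(addr).packed,
                          hashlib.sha256).digest()
        host = int.from_bytes(digest[:3], "big")          # 24 pseudorandom host bits
        return str(ipaddress.ip_address((10 << 24) | host))

    print(anonymize_ip("192.0.2.17"))   # same input always maps to the same pseudonym

Unlike a prefix-preserving scheme, a flat keyed hash like this discards topological structure, which is part of why purpose-built tools exist.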

We recognize the difficulty for ISPs of dealing with statistics collection at an already very turbulent period of Internet evolution. However, it is at just such a time, marked ironically by the cessation of the NSFNET statistics, that a baseline architectural model for statistics collection is most critical, so that customers can trust the performance and integrity of the services they procure from their network service providers, and yet so that service providers do not tie their own hands in terms of preserving the robustness, or forestalling the demise, of their own clouds.

Clearly we need to find ways to minimize exposure rather than surrendering the ability to understand network behavior. It seems that no one has determined an `optimal' operating point in terms of what to collect; the best choice often depends on the service provider, and changes with time and new technologies.

 

All Internet data is bad. some is just more bad than others. but the worst thing about most of the data on the Internet is that it doesn't exist at all.
-kc at IETF, Montreal 1996
it's not that they don't care [although i admit at times they do a remarkably realistic simulation of not caring.] it's just that for network service providers, statistics collection on the Internet has historically had a priority somewhere below what they name the routers.

where we are: who needs Internet statistics and why

There is no centralized control over all the providers in the Internet. The providers do not always coordinate their efforts with each other, and quite often are in competition with each other. In Routing in a Multi-Provider Internet, Y. Rekhter (Internet researcher at cisco Systems) writes:

 

Despite all the diversity among the providers, the Internet-wide IP connectivity is realized via Internet-wide distributed routing, which involves multiple providers, and thus implies certain degree of cooperation and coordination. Therefore, we need to balance the provider goals and objectives against the public interest of Internet-wide connectivity and subscriber choices. Further work is needed to understand how to reach the balance.

 

Many Internet service providers currently collect basic statistics on the performance of their own infrastructure, typically measurements of utilization and availability, and perhaps delay and throughput. In the era of the post-NSFnet backbone service, the only baseline against which networks evaluate performance is their own past performance metrics. There are no data available, nor even a standard format defined, against which to compare performance with other networks or some baseline. Increasingly, both users and providers need information on end-to-end performance, which is beyond the realm of what is controllable by individual networks.

Another example of statistics maintained in isolation within an individual ISP is trouble ticket tracking of problems that originate and are resolved within the context of a single ISP. Throughout most of the life of the NSFnet backbone, resolving route instabilities and other trouble tickets was the responsibility of Merit, who held the cooperative agreement with NSF for operation of the NSFNET backbone. In the current environment there is no such entity to claim even shared responsibility for national, much less global, management of the Internet. As a result, there are no scalable mechanisms available for resolving or tracking problems originating or extending beyond the control of an individual network.

Route instability is another area that can have a direct, sometimes profound, effect upon the performance of individual networks. Some networks are seeking to improve the stability of their routing by peering directly with the routing arbiter (RA) at network access points (e.g., SprintNAP and FIX-West/MAE-West). The routing arbiter project awardees, Merit and ISI, have specified and collected routing statistics, e.g., on route flapping and inappropriate routing announcements, that characterize routing stability from a macroscopic perspective, identifying trouble areas for the networks with which they peer. However, these efforts are still in early, prototype stages, and do not yet have sufficient support from commercial players to render them a fundamental component of the Internet architecture.

The vacuum in national-level statistics/metrics collection that followed the transition to the commercial architecture has also complicated planning for national service providers and others. While detailed traffic and performance measurements are essential to identifying the causes of network problems and formulating corrective actions, it is trend analysis and accurate network/systems monitoring that permit network managers to identify hot spots (overloaded paths), predict problems before they occur, and avoid them through efficient deployment of resources and optimization of network configuration. As the nation and world become increasingly dependent on the National and Global Information Infrastructures (NII/GII), it is critical that mechanisms be established to enable infrastructure-wide planning and analysis.

Examples of statistics analyses of immediate relevance to providers include measurements of: round-trip time (RTT), e.g., with probe queries, to assess congestion and other conditions at an infrastructure-wide level; and routing behavior (beyond that currently available through the Routing Arbiter project), to assess status and stability, as well as unusually configured routing and the conflicts arising from the simultaneous presence of more specific routes alongside a given route aggregate.
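
As a rough sketch of the first kind of measurement (our own example; neither the method nor the hosts are from the article, and the probe targets below are placeholders), one can estimate RTT by timing TCP connection setup to a set of probe points:

    # Estimate round-trip time by timing TCP connection establishment.
    import socket, time, statistics

    def connect_rtt(host, port=80, timeout=2.0):
        """Return TCP connection setup time in milliseconds, or None on failure."""
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return (time.perf_counter() - start) * 1000.0
        except OSError:
            return None

    def probe(host, samples=5):
        rtts = [r for r in (connect_rtt(host) for _ in range(samples)) if r is not None]
        if rtts:
            print(f"{host}: median {statistics.median(rtts):.1f} ms over {len(rtts)} probes")
        else:
            print(f"{host}: unreachable")

    for target in ("www.example.com", "www.example.org"):   # hypothetical probe targets
        probe(target)

Connection setup time is only a proxy for network RTT, but it requires no privileged access, unlike raw ICMP probing.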

The table below provides an overview of the types of metrics currently desired for IP traffic and routing. The relevance of these metrics to future financial settlements and to analyzing network performance is included, ranging from low (minimal) relevance to high relevance. The table also indicates the tools currently available, or yet to be developed, for gathering statistics on each metric.

Table 1: Internet Metrics and tools

Type | Applicable where | Relevance to Internet settlements | Relevance to analysis of network performance | Measurement tools

Raw Metrics:
- Access Capacity (bit/sec) | CC charge for bit rate; equipment cost depends on bit rate | high | low | a priori
- Connect Time (sec) | CC charge for connect time | high | low | CC metering
- Total Traffic (bytes) | transit traffic settlement between ISPs | high | low | router or access server stats; packet/flow sampling
- Peak Traffic (bit/sec sustained for n sec) | ISP/NSP overbooks trunks | moderate | moderate | router or access server stats; TCP dump sampling, RTFM meters; etc.

Routing:
- Announced Routes (#) | at peering/exchange points & connection of subscribers to multiple subnets | high | moderate | TBD; analysis of routing tables (e.g., netaxs)
- Route Flaps (#) | at peering/exchange points & connection of multihomed networks | moderate | high | TBD (currently available if peering through RA)
- Stability, e.g., route uptime/downtime, route transitions | at peering/exchange points & connection of multihomed networks | low | moderate | TBD (currently available if peering through RA)
- Presence of more specific routes with less specific routes | at peering/exchange points & connection of multihomed networks | moderate | moderate | TBD
- Number of reachable destinations (not just IP addresses) covered by a route | at peering/exchange points & connection of multihomed networks | moderate | moderate | TBD

Path Metrics:
- Delay (milliseconds) | individual networks | moderate | high | TBD; ping
- Flow Capacity (bits/sec) | everywhere (networks, routers, exchange points) | moderate | high | TBD; treno
- Mean Packet Loss Rate (%) | everywhere | moderate | high | TBD; ping, mapper
- Mean RTT (sec) | everywhere | moderate | high | TBD; ping, mapper
- Hop Counts/Congestion | everywhere | low | high | TBD; traceroute

Other:
- Flow characteristics (protocol profiles, cross-section, traffic matrix, asymmetry) | exchange points, multihomed networks | low | moderate | reporting by ISPs
- Network outage information (remote host unreachable) | individual networks | low | moderate | reporting by ISPs
- AS x AS matrices | individual networks | moderate | low | reporting by ISPs
- Information Source | connection of service provider (DNS or RR server); content provider (web server); info replicator (MBONE router & caches) | high | high | router or access server stats; packet sampling, flow meters; etc.

Topology Visualization:
- MBONE | Internet infrastructure | moderate | moderate | TBD
- Information caching hierarchy | Internet infrastructure / individual caches | moderate | moderate | TBD

Notes: CC - common carrier, ISP - Internet service provider, NSP - national service provider, RA - routing arbiter, TBD - to be determined

Sources: Metrics for Internet Settlements , B. Carpenter (CERN), Internet Draft, May 1996;
A Common Format for the Recording and Interchange of Peering Point Utilization Statistics , K. Claffy (NLANR), D. Siegel (Tucson NAP), and B. Woodcock (PCH), presented at NANOG, May 30, 1996.
Many of the metrics above are inherently problematic for the Internet infrastructure and still require research, in areas such as measuring one-way delay, metrics for variance and other statistics of the delay distribution (e.g., percentiles), and dealing with asymmetric routing. For example, for a given measurement of delay, the delay across a path equals the sum of the delays across the components of the path. It is not clear what analogous statements one can make regarding other delay statistics, e.g., mean, variance, or percentiles.
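
To make the decomposition explicit (a standard observation added here for illustration): one-way delay is additive along a path, and by linearity so is its mean, but the variance of the path delay decomposes only when the per-hop delays are independent, and percentiles do not decompose in general:

    \[
      D_{\mathrm{path}} = \sum_{i=1}^{n} D_i, \qquad
      \mathrm{E}[D_{\mathrm{path}}] = \sum_{i=1}^{n} \mathrm{E}[D_i], \qquad
      \mathrm{Var}(D_{\mathrm{path}}) = \sum_{i=1}^{n} \mathrm{Var}(D_i)
      \;\;\text{only if the } D_i \text{ are independent.}
    \]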

Measuring the throughput or flow capacity of a live connection without detrimental impact upon performance is even more difficult. In theory, flow capacity is also amenable to path decomposition, i.e., one can approximate the flow sustainable across a path by the minimum of the flows sustainable across each of the components of the path. In practice, however, buffering characteristics and routing asymmetries impose friction in the system, limiting the viability of the formal definitions. We will need to compare several alternatives for empirical metrics, and qualify the degree to which they deviate from the corresponding formal metrics. It will be essential to develop a methodology to estimate sustainable throughput from some baseline flow capacity measurements in conjunction with current delay measurements.
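
In symbols, the formal path decomposition referred to here is simply the bottleneck relation

    \[
      C_{\mathrm{path}} \;\approx\; \min_{1 \le i \le n} C_i ,
    \]

which, as noted, buffering characteristics and routing asymmetries can cause real measurements to fall well short of.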

Furthermore, future data collection/analysis activities will likely extend beyond the realms of a plain packet switching infrastructure, toward optimizing overall service quality via mechanisms such as information caching and multicast. Effective visualization techniques will be critical to making sense of all the data sets described above, especially for developing and maintaining the efficiency of logical overlay architectures, such as caching, multicast, mobile, IPsec, and IPv6 tunnel infrastructures. As examples, NLANR has created visualization prototypes for the mbone tunnel logical topology and the NLANR web caching hierarchy topology. For the latter it has also built a tool for automatically generating a nightly update of the caching hierarchy topology map (http://www.nlanr.net/Cache/daily.html). Such visualization could be of great assistance in improving the global efficiency of traffic flows.

Note that one can derive the mbone topology from existing public tools, but it is extremely difficult to make sense of the data without reasonable visualization. One can then make a major contribution to the ISP community by illustrating where optimizations/better redundancies could occur across ISPs. Another useful development could be if network interconnection points provided a LAN medium for native multicast peering (as a tunnel hub) until native multicast is better integrated into the infrastructure.

Many ISPs as well as researchers are skeptical that any measurements that might occur in the short to medium term at exchange points would lead to useful information for ISPs, either because useful data is too hard to collect and analyze or because ISPs are averse to its collection due to customer privacy concerns. (The two factors are not unrelated: ISPs have been unsure about the legal implications as well as the benefit of data collection, and so have not put pressure on their equipment suppliers to support functionality that they now wish they had.) For example, it seems clear that the router vendors are in no position to support the collection of traffic matrices, even though all ISPs vehemently agree that aggregate traffic matrices are crucial to backbone topology engineering. In addition to allowing the discovery of mistraffic, e.g., route leakage or a customer accidentally sending huge amounts of unintended traffic into the core, traffic matrices combined with route flap data are essential to an ISP's ability to communicate problems to peer ISPs when necessary. Backbone engineers consider traffic matrix data significantly more important than flow data for short to medium term engineering, and it may be essential to the investigation of Big Internet issues (routing, addressing) as well.

Notably, although the telcos have long measured traffic matrices for phone network engineering, ISPs have faced technical, legal, and resource limitations as obstacles to collecting, much less sharing, such measurements. Collecting packet headers, though essential for researchers to develop realistic models and analysis techniques, is even more technically and logistically problematic.

Major User Groups (Educom, FARNET, auto industry)

The higher education and research communities, the first communities to depend on the Internet, have also been among the first to complain vocally about its recent state of congestion and its inability to offer anything more than best effort service. Since the advent of the world wide web, the proliferation of users and the lack of cooperation among commercial ISPs have led to significantly degraded service quality.

In meetings over the last year, representatives from EDUCOM, FARNET, and related institutions have discussed their communications requirements and concerns about the ability of the Internet to meet their future needs. Doug Gale and the Monterey Futures Group conclude that by the year 2000, higher education will require an advanced internetworking fabric with the capacity to support:

  • desktop and room-based video teleconferencing
  • high-volume video from distant servers
  • integrated voice traffic
  • dynamic insertion of large-capacity inter-institutional projects into the fabric
  • interconnection of enterprise networks in various stages of migration to higher performance technologies
  • controlled costs and predictable pricing models
Other user groups that view the Internet as mission critical are making similar demands, forming task groups and defining their service requirements. These groups include the Automotive Industry Action Group (AIAG), which in 1995 chose TCP/IP as a standard for data communications among its thousands of trading partners. Specific areas being examined include specifications related to:

  • certifying a small number of highly competent Internet service providers to interconnect automotive trading partners' private networks;
  • monitoring providers' ongoing compliance with performance standards that support business requirements;
  • enforcing strict security mechanisms to authenticate users and protect data, thereby creating a virtual private network for the auto industry.

The AIAG has identified several metrics it views as critical to this initiative and to future monitoring efforts. These include performance metrics such as delay, loss, link utilization, throughput, and route configuration, as well as reliability metrics such as physical route diversity; routing protocol convergence times; disaster recovery plans; backbone, exchange point and access circuit availability; and speed of failed customer premise equipment replacement.

As more user groups (e.g., the financial sector, energy industry, and others) move toward the Internet as their preferred communication vehicle, we will likely see increasing pressure on providers and others to collect, collate, analyze, and share data related to Internet (and provider) performance.

service quality and pricing models

Demands for implementing multiple Internet service levels are increasing. From the providers' standpoint, such offerings will enable increased revenue through business-quality service offerings. From the users' perspective, the ability to contract for higher-priority service will enable many industries to switch from intranets and private networks to Internet-based infrastructure.

The ability to specify or reserve the services one needs from the network will in turn require mechanisms for accounting and pricing (else there is no incentive not to reserve all one can, or not to use the highest priority). [footnote: Many fear pricing will stifle the open, vibrant nature of the Internet community; we suggest that it may rather motivate the constructive exploration of more efficient network implementations of innovative networking applications over the Internet.]

The Internet is still relatively devoid of pricing models or other mechanisms to allocate and prioritize scarce resources -- particularly bandwidth -- and acutely needs mechanisms for more rational cost recovery, that is, more accurate accountability for resources consumed, than current technology supports. In particular, the Internet architecture is not prepared to deal with the large aggregation of flows it handles now if a significant number of those flows carry volumes several orders of magnitude higher than the rest. Of major concern in workload profiles today is the disparity in size between most current Internet flows/transactions, at less than 10 packets, and newer multimedia applications of much higher volume and duration.

The disparity in workload profiles in the current cross-section of Internet applications necessitates revised metrics of network behavior. Simple mean or peak utilization figures are ineffective in addressing a service provider's engineering needs without also knowing the transaction profile constituting, and perhaps dominating, those figures. Keeping track of workload profiles requires measuring flow data at relevant network locations. (The NSF-sponsored National Laboratory for Applied Network Research (NLANR) currently supports http://www.nlanr.net/NA/, an interface to the operational collection of such data at several points, including data from a U.S. federally sponsored multiagency network interconnection facility.)
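
As a minimal sketch of what measuring flow data means in practice (the record format and addresses below are hypothetical, and this is not NLANR's collection code), packets can be aggregated into flows keyed on the usual 5-tuple:

    # Aggregate packet records into flows keyed by (src, dst, sport, dport, proto).
    from collections import defaultdict

    def aggregate_flows(packets):
        flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
        for src, dst, sport, dport, proto, size in packets:
            key = (src, dst, sport, dport, proto)
            flows[key]["packets"] += 1
            flows[key]["bytes"] += size
        return flows

    # Hypothetical packet records; a real collector would read them from a tap.
    packets = [
        ("10.0.0.1", "10.0.0.2", 1234, 80, "tcp", 512),
        ("10.0.0.1", "10.0.0.2", 1234, 80, "tcp", 1460),
        ("10.0.0.3", "10.0.0.2", 2000, 53, "udp", 64),
    ]
    for key, stats in aggregate_flows(packets).items():
        print(key, stats)

From such per-flow records one can build exactly the transaction profiles the text calls for: distributions of flow sizes and durations rather than bare utilization figures.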

More accurate resource consumption measurement, and concomitant pricing models, will allow progress on another severe need in the current infrastructure: a service architecture from the perspective of the end user. Maximizing value for the end user on the Internet is difficult since the economic value model is quite arbitrary. In most markets, value is attributed according to quality of service. The Internet is no exception: rational pricing would provide the right feedback to providers and users to encourage more appropriate use. Quality signals are not now clear: users need operating signals accompanied by measurably distinguishable service qualities so they can declare the QoS for which they will pay. Otherwise high-value users may see their service degraded by high-requirement, low-value users.

In his recent Internet Draft on Metrics for Internet Settlements, Brian Carpenter (CERN) asserts that financial settlements are a `critical mechanism for exerting pressure on providers to strengthen their infrastructures'. He suggests that metrics used in Internet settlements should not rely on expensive instrumentation such as detailed flow analysis, but rather simple measurements, estimated, if necessary, by statistical sampling.

Factors that continue to inhibit implementing settlements include the lack of a common understanding of the business mechanics of inter-ISP relations. Some suggest that the ITU/telco settlements model may have relevance to ISP settlements, but many feel that the connectionless nature of the IP protocol demands that entirely new pricing models be developed. (footnote: Hal Varian's web site at http://www.sims.berkeley.edu/resources/infoecon provides a good introduction to Internet economics.)

Implications for traffic measurement include the need for measurement capability at boundaries, where different providers will no doubt measure different things, e.g., source vs. destination-based billing. Users will also likely need to accept sample-based billing as just another revenue stream, not precise or optimal.
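
As an illustrative sketch of the sampling-based approach (our own example, not Carpenter's proposal; the traffic trace is synthetic), a provider can estimate billable volume by sampling roughly one packet in N and scaling up:

    # Estimate total traffic volume from 1-in-N packet sampling.
    import random

    def sampled_volume_estimate(packet_sizes, n=100, seed=0):
        """Sample each packet with probability 1/n and scale the observed bytes."""
        rng = random.Random(seed)
        sampled_bytes = sum(size for size in packet_sizes if rng.random() < 1.0 / n)
        return n * sampled_bytes

    # Hypothetical traffic: one million packets with a mix of common sizes.
    rng = random.Random(1)
    trace = [rng.choice([40, 576, 1500]) for _ in range(1_000_000)]
    print("true bytes:     ", sum(trace))
    print("estimated bytes:", sampled_volume_estimate(trace))

The estimate is unbiased but noisy, which is precisely why sample-based billing has to be accepted as approximate rather than exact.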

where to go: possible models for concerted data collection

Although most of the community agrees that we should seek out measurement infrastructure and sources of statistics in the commercially decentralized Internet, there is definite dissonance as to which measurements would help, and who should have access to them. While a public measurement infrastructure could help researchers and end users, ISPs would benefit more from the ability to collect statistics that were too sensitive to release publicly, and perhaps from comparing them to corresponding statistics from other ISPs.

infrastructural gaps: where we need to invest attention and resources

  • development of more powerful routers for core Internet components, a prohibitively expensive endeavor with too small a potential market and thus too little return to motivate vendors to pursue it independently
  • short-term research into basic traffic engineering methodologies given limited data, and longer-term research into the implications of realistic theoretical and empirical traffic characterization
  • a public measurement infrastructure
  • development of an ISP consortium for coordination and limited, secure data sharing

The best means for addressing the various statistical needs outlined in this article may be a provider consortium. The National Laboratory for Applied Network Research (NLANR) has suggested a possible framework for such a collaboration, which would serve as a forum for:

  • facilitating the identification, development and deployment of measurement tools across the Internet;
  • providing commercial providers with a neutral, confidential vehicle for data sharing and analysis;
  • providing networking researchers and the general Internet community with additional realtime data on Internet traffic flow patterns;
  • enhancing communications among commercial Internet service providers, exchange/peering point providers, and the broader Internet community.
Focal areas of the consortium would include outage tracking, congestion monitoring, examination of backbone routing use, studies of peering relationships, and a forum for discussion of charging policies. Market pressures upon ISPs to participate in such a consortium include:
  • customers' increasing dependence on the Internet for `mission critical' applications
  • settlements that require authenticated and possibly confidential provider statistics
  • the meshed nature of the Internet, which suggests that no single company can do it alone; systemic improvements will require collaboration
The business constraints hindering such cooperation relate to the competitive nature of the Internet business environment, as well as to the appearance of industry collusion by major providers. However, a charter with principles of openness and inclusion can readily address these concerns, as well as constraints arising from the lack of adequate pricing models and other mechanisms for economic rationality in Internet business practices.

Probably the most relevant constraint on cooperation is that of data privacy, which has always been a serious issue in network traffic analysis. Many ISPs have service agreements prohibiting them from revealing information about individual customer traffic. Collecting and using more than aggregate traffic counts often requires customer cooperation regarding what to collect and how to use it. However, as quoted in the privacy section above, the Omnibus Crime Control and Safe Streets Act of 1968, Section 2511(2)(a)(i), accords communications providers considerable protection from litigation.

Responsible providers could go further than the law and anonymize monitored traffic with tools that are already available, virtually eliminating any accusations of breach of privacy.

Technical Constraints to Cooperation

Technology constraints hindering the collection and analysis of data on Internet metrics center on the nascent development stage of IP and ATM measurement tools and supporting analysis technologies, and on complications arising from adoption of new and emerging technologies, e.g. gigaswitches and ATM. Generally, we view these and other technical constraints as solvable given sufficient technical attention and market pressure.

Next Steps

Despite the business and technical challenges, requirements for cooperation among Internet providers will continue to grow, as will demands for enhanced data collection, analysis, and dissemination. Development of an effective provider consortium to address these needs would require, minimally:

  • participation by 3 or more of the major service providers, e.g., ANS, AT&T, BBN Planet, MCI, Netcom, PSI, Sprint, or UUNet

  • participation by a neutral third party with sufficient technical skills to provide the core data collection and analysis capabilities required by the consortium

  • appropriate privacy agreements to protect the interests of members

  • agreement on which basic metrics to collect, collate, analyze, and present (assuming differences in the granularities of data available to consortium members vs. approved researchers vs. the general public)

  • agreement on which tools to develop, particularly those related to emerging infrastructures using new technologies

This organization would not only coordinate a solid, consistent library of tools that could appeal to both users and providers, but would also serve as a vehicle for testing such tools on real data, with the cooperation of ISPs, without compromising anyone's proprietary data or technology. Data collection would focus strictly on engineering and evolution of the overall Internet environment. Accurate data on traffic patterns would allow engineers to design more efficient architectures, and design them more quickly, conserving both labor and resources now unnecessarily allocated to parts of the network where they are not needed. The right statistics collection and cross-ISP dissemination mechanisms would facilitate faster problem resolution, saving the time and money now devoted to chasing down problems, e.g., routing misbehavior and link saturation. Developing the appropriate metrics and tools to measure such phenomena, as well as end-to-end performance and workload characteristics, is still a looming task. Finally, experience with data will foster the development of more effective usage-based economic models, which, in the final analysis, will allow ISPs to upgrade their infrastructure in accordance with customer demand.

additional information

NLANR maintains a repository of links to operational statistics data from research sites, ISPs and the NAPs, at http://www.nlanr.net.

 

possible focus points for an ISP data analysis consortium:

 

  • network architecture design and resource allocation policies. Accurate data on traffic patterns will allow engineers to design more efficient architectures, and design them more quickly, conserving both labor and resources unnecessarily allocated to parts of the network where they are not needed.
  • faster problem resolution, saving the time and money now devoted to chasing down problems, e.g., route leakage, link saturation, route flapping.
  • identification/development of critical networking metrics and tools, including defining the characteristics of an ideal measurement tool that could gather data on both end-to-end performance and workload characteristics.
  • the development of more effective usage-based economic models

 

 

 


 
