Boardwatch and Keynote recently announced the results of a study (www.keynote.com/company/announcements/pr062597.html, www.boardwatch.com/MAG/97/JUL/bwm22.htm) that described itself as `the first comprehensive, independent measurement of 29 ... Internet backbones'.
Significant feedback from the technical community has characterized this study as over-ambitious and not representative of Internet backbone capacity as claimed, but rather of the quality of connectivity to an Internet provider's corporate web site, which is unlikely to correlate well with backbone capacity.
Feedback from the community has come in the form of mailing list discussions on NANOG (North American Network Operators' Group) and inet-access (a mailing list geared toward small Internet providers). The consensus opinion among technical respondents has been to dismiss the study as of minimal utility for many reasons related to the enormity of the task embodied in that label. A responsible alternative to withdrawing the study altogether would be to modify its label to more accurately reflect the actual measurements and the components that are likely to dominate the results.
The data provided reflect a first attempt to study web hosting server performance of specific ISPs. The largest weakness in the study is the selection of the ISPs' own web sites (e.g., www.sprint.net), rather than servers of web-hosted customers of those ISPs. A superior-quality ISP might have strongly over-engineered backbone bandwidth and customer servers, but decide to use a spare, slower machine for its own web site, or not place that server in the best position in its infrastructure, since that site typically does not have to sustain nearly as many hits as those of its more popular content-providing customers. ISP home pages are also likely located on a single server, whereas customer web pages are more likely distributed among several server clusters, all strategically connected. In this case, measurement of a single site would not necessarily correlate with the other hosting sites.
It also seems likely that, should such measuring organizations continue to misrepresent measurements as an overall indicator of ISP quality, ISPs will learn to capitalize on the gross weaknesses in the methodology. An obvious method of doing so in this case would be for the ISP to place its web server in one or more especially well-connected locations while leaving the majority of its web hosts and users poorly connected. Given the eagerness of large customer sites to have their sites included in the Keynote testing (even paying for it), it seems likely that the measurements could reasonably include non-ISP-specific web sites. This approach too has limitations, in that it is not immediately obvious whether a web site served by a specific provider is hosted by that provider's web hosting service or is located at the customer premises, in which case the data may reflect congestion on an individual customer's leased line or the capacity of the customer's own web server.
Another weakness is the Keynote/Boardwatch failure to disclose from where they launch client queries. They could more accurately describe the study as a measurement of the path from specific client locations to specific ISPs' corporate web sites. The paths taken to a given ISP's corporate web site may traverse other ISP backbones depending on the selection of client sites.
In the final analysis, the Keynote/Boardwatch study draws attention to the increasing emphasis on timely web interactiveness, rather than the traditional engineering lexicon of packet latency, throughput, and loss rate. Keynote's widely spread sample client base also addresses the need to characterize Internet performance as seen by a large cross-section of users. With its current infrastructure in place, the Keynote team could undertake a similar study addressing some of the issues the current study ostensibly aims at, with considerably less effort than most other groups would require.
As noted, Keynote makes audacious assessments of the correlation of their data with overall ISP performance. The web page describing the study, as well as the press release, strongly suggests the utility of the data as a general metric of merit for judging ISP performance, perhaps a SPECmark for ISPs. However, SPECmark avoids the WAN measurement issue, and includes considerably stronger disclaimers. SPECmark also takes the noble approach of distributing source code.
Keynote strikes a somewhat more ambitious chord. It is inconceivable that measurement of a single web server in Virginia can accurately predict performance of a tail circuit to a business client in California. There is some overlap of performance factors (some bits of backbone used may be the same), but on the whole many factors influence web server delay statistics, most of them uncorrelated with the ISP's performance, and vice versa.
It is unfortunate that such strong claims accompanied the study. Tamer claims appear later in the report:
`The Backbone Index indicates how well a server on a particular backbone will be seen by the general Internet population.'
The measurement results are still poor indicators of even this behavior, but the study does make a significant step toward solving the problems associated with obtaining such numbers. For the remainder of this article, we treat this latter goal as the legitimate aim of this study.
Via a diverse list of measurement sources, Keynote's methodology reasonably approximates performance over a diverse client population, although not a diverse server population. To address the latter deficiency, one could measure performance to a wide variety of popular sites, associating server data with the topologically closest ISP, and perhaps weighting measurements according to site popularity. One could go further by using a few separate (and/or equally sized) pages in each commonly referenced web hosting site.
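The popularity weighting suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout (ISP, site, delay in seconds, popularity weight), the function name, and the sample values are all hypothetical and do not come from the Keynote study.

```python
from collections import defaultdict


def weighted_isp_delay(samples):
    """Return a popularity-weighted mean delay per ISP.

    `samples` is an iterable of (isp, site, delay_seconds, weight) tuples.
    The layout and the notion of "weight" (e.g. relative hit counts) are
    assumptions made for this sketch.
    """
    weighted_sum = defaultdict(float)  # sum of weight * delay per ISP
    weight_total = defaultdict(float)  # sum of weights per ISP
    for isp, _site, delay, weight in samples:
        if weight <= 0:
            raise ValueError("popularity weight must be positive")
        weighted_sum[isp] += weight * delay
        weight_total[isp] += weight
    return {isp: weighted_sum[isp] / weight_total[isp] for isp in weighted_sum}


# Hypothetical measurements: two hosted sites on one ISP, one on another.
samples = [
    ("isp-a", "site-1", 2.0, 3.0),
    ("isp-a", "site-2", 4.0, 1.0),
    ("isp-b", "site-3", 3.0, 1.0),
]
result = weighted_isp_delay(samples)
print(result)
```

Weighting by popularity means that a heavily visited hosted site influences an ISP's figure more than a rarely visited one, which is closer to what the general user population would actually experience than a single corporate home page.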
Providing an even more thorough explanation of the methodology on the study's home page would also improve its perception within both the engineering and the educated user communities, e.g., detail on the topological distribution of Keynote clients among ISPs, at the very least with verifiable evidence that the clients are spread among several ISPs.
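One simple form such evidence could take is a published tally of how measurement clients are distributed among ISPs. The sketch below assumes a hypothetical list naming the access ISP of each client; the names and data are invented for illustration, since the study does not disclose this information.

```python
from collections import Counter


def client_share_by_isp(client_isps):
    """Return the fraction of measurement clients attached to each ISP.

    `client_isps` is a list with one entry per client, naming that
    client's access ISP (a hypothetical input for this sketch).
    """
    counts = Counter(client_isps)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no clients given")
    return {isp: n / total for isp, n in counts.items()}


# Hypothetical client placement: four clients on three ISPs.
clients = ["isp-a", "isp-a", "isp-b", "isp-c"]
shares = client_share_by_isp(clients)
largest_isp, largest_share = max(shares.items(), key=lambda kv: kv[1])
print(shares, largest_isp, largest_share)
```

A large share for any single ISP would suggest that the results are biased toward paths originating inside that provider's network.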
It's not so much difficult to measure as it is difficult to develop fair measurements that are empirically capable of duplication. And almost any measurement developed immediately raises a ballyhoo of detractors attempting to prove that it doesn't prove anything.
Indeed, but measurement is useful precisely because it allows us to apply the scientific method to characterizing network behavior, facilitating the dissemination of information that reflects the strongest possible integrity. A claim has a much higher chance of scientific legitimacy if it comes with an independently reproducible method for testing its validity.
The Keynote test does not indicate well how the general Internet population will experience a specific server on a particular backbone. It does reasonably test the performance of the ISP's own web pages as seen by a broad sample of well-connected users, a difficult test in itself, and one that Keynote's technology facilitates.
Advertising the project as more than it really reflects will certainly make the study appear more specious. The tone of the discussions on the list following its initial release, calumnious at best, with arguments going in circles on both sides, discouraged many from even waiting for a resolution, convinced that there was diminishing interest, if there ever was any, in scientific rigor.
On the other side of the argument have been extreme critics, some of whose opposition to publicly available measurements to date may be as counterproductive as the more inaccurate or misrepresented of those measurements. It would undoubtedly be useful if academic or other interested parties were to pursue similar studies, contributing their own (perhaps more rigorous) perspectives. More community participation would be helpful, especially from those quickest to criticize. Such involvement can include better defining and solving these performance measurement challenges and providing feedback to IPPM and CAIDA.
Other related information sources include:
is not an indication that one Internet Service Provider is superior to another. Rather, it is intended to be an indication of Internet health from the perspective of OUR connection here at Clear Ink at a given moment. Also, routes change, variables are many, and certainty is minimal in the crazy world of electrons. Please take all of this information with a grain of salt; we just did the scripts as a grand experiment and people seem to be enjoying the page.