End-to-end Internet performance assessments

The qualitative state of the Internet is currently difficult to assess, and the challenge has increased with the transition away from the NSFNET backbone model. A wide range of competing Internet service providers (ISPs) now fill the role that the NSFNET program largely carried for several years, and these providers are only recently beginning to perceive the need for proactive collaboration given the fate-sharing condition of their industry. When fragmentation and service degradation occur, users begin to seek ways to describe the service, or lack thereof, as it is advertised to them. ISPs in turn seek (different) metrics for describing service and connectivity conditions from a `network-internal' perspective.

As part of a two-pronged approach to assessing Internet service properties from both the end user's and the service provider's points of view, we investigate performance between a set of endpoint pairs covering a large geographic area. With help from the NSF supercomputing centers and NASA at FIX-West, NLANR has deployed web caching machines at these strategic locations. While the primary use of the machines is to support a joint project with NSF and Digital Equipment investigating resource and performance issues in information caching, they also provide an ideal platform for distributed probe studies.

Using a modified ping that records microsecond timestamps on packets as they return from the remote locations, we now ping 10 locations throughout the country from each cache every 15 minutes, using a fast 10-packet (64+20 bytes) burst. We attempt to sequence these runs in time across the caching machines to avoid synchronization among tests (particularly since the caches are time-synchronized). A major objective is to obtain service quality information without being invasive to the infrastructure. For this, we accept a wider margin of error than some other tools, which measure, e.g., throughput more precisely but may impose significant load on the network. As infrastructural weaknesses are observed, other, likely more service-provider-centric, tools would have to be employed to determine more exact performance, the reasons for the weaknesses, and appropriate responses to improve the situation.
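For concreteness, here is a minimal sketch of that probe schedule, written in Python rather than the perl actually used. The target list, cache slot, and cache count are placeholders, and the stock ping invoked here does not record the microsecond arrival timestamps that the modified ping does:

  import subprocess
  import time

  TARGETS = ["www1.cac.washington.edu"]  # placeholder; the real list has ten sites
  CACHE_SLOT = 3        # this cache's position in the rotation (assumed)
  NUM_CACHES = 8        # total participating caches (assumed)
  INTERVAL = 15 * 60    # one probe run per 15 minutes

  def probe(host):
      # 10 packets of 64 data bytes each; the real tool sends them as a fast
      # burst and records microsecond arrival times, which stock ping cannot do
      return subprocess.run(["ping", "-c", "10", "-s", "64", host],
                            capture_output=True, text=True).stdout

  # stagger this cache's slot within the 15-minute cycle so the
  # time-synchronized caches do not all probe at once
  time.sleep(CACHE_SLOT * INTERVAL / NUM_CACHES)
  while True:
      for host in TARGETS:
          print(probe(host))
      time.sleep(INTERVAL)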

The caching machines target ten locations: three on each coast and four distributed more centrally. A perl script run from cron pings each of the ten sites; example output looks like:
  PING Mon Jan 22 11:10:52 1996 www1.cac.washington.edu (140.142.3.7): 64 data bytes at 822337852.548258
  >       0 822337852.620408
  >       1 822337852.623333
  >       2 822337852.632108
  >       3 822337852.656483
  >       6 822337852.674033
  >       7 822337852.674033
  >       8 822337852.675008
  >       9 822337852.682808
  
  ----Mon Jan 22 11:11:02 1996 www1.cac.washington.edu (140.142.3.7) PING Statistics----
  10 transmitted, 8 received, 20.00% packet loss.
  10.134 seconds elapsed, throughput = 0.79 packets/sec; 530.482 bps.
  round-trip (ms) min/avg/max = 50.700/57.403/69.225
                  var/sdev/skew/kurt = 46.020/6.784/0.715/1.722
The timestamps in the PING line show the initial start time of the 10-packet run, and the lines beginning with ">" give the sequence number and microsecond timestamp of each returning ICMP Echo Reply packet as it arrives.
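As a sanity check, the summary line is consistent with counting only the 8 received packets at 84 bytes (64 data + 20 header) each: 8 x 84 x 8 bits over 10.134 seconds is roughly 530 bps, and 8/10.134 gives the 0.79 packets/sec shown. Below is a rough sketch of how such a run might be parsed; the field layout is our reading of the example above, not a specification:

  import re

  def parse_run(text):
      """Return ({sequence: arrival timestamp}, summary statistics) for one run."""
      arrivals = {}
      stats = {}
      for line in text.splitlines():
          line = line.strip()
          if line.startswith(">"):
              # "> <seq> <microsecond arrival timestamp>"
              seq, ts = line[1:].split()
              arrivals[int(seq)] = float(ts)
          elif "packet loss" in line:
              m = re.search(r"(\d+) transmitted, (\d+) received, ([\d.]+)% packet loss", line)
              stats["sent"], stats["received"] = int(m.group(1)), int(m.group(2))
              stats["loss_pct"] = float(m.group(3))
          elif line.startswith("round-trip"):
              mn, avg, mx = line.split("=")[1].strip().split("/")
              stats["rtt_min"], stats["rtt_avg"], stats["rtt_max"] = float(mn), float(avg), float(mx)
          elif "bps" in line:
              m = re.search(r"([\d.]+) seconds elapsed.*?([\d.]+) bps", line)
              stats["elapsed_s"], stats["bps"] = float(m.group(1)), float(m.group(2))
      return arrivals, stats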

The resulting files from the caching machines are collected centrally and can be processed into estimates of Internet performance.

For example, a summary can take specific source/destination pairs and summarize performance for a specific day (a rough sketch of this summarization follows the listing), giving output like:
src  time            throughput
     min  avg   max  min    avg    max   destination  timestamp
bo  4  9.65 10    310   7989   9586         ISI 822556264
bo  6  9.47 10    214   7463   8929         NSF 822556264
bo  5  9.52 10   1778   5606   8115         OSU 822556264
bo  4  9.10 10    303   5314   7660       UUtah 822556264
bo  6  9.96 10   1797   7187   9240       UWash 822556264
bo  7  9.73 10   3837   9091  10928         UCB 822556264
. . .
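A rough sketch of how such a per-pair daily summary could be computed. The record layout here is hypothetical (source label, destination, packets received, and bits per second for each run), and the timestamp column from the listing is omitted:

  from collections import defaultdict

  def summarize(records):
      """records: (src, dst, packets_received, bps) tuples for one day's runs."""
      groups = defaultdict(list)
      for src, dst, received, bps in records:
          groups[(src, dst)].append((received, bps))
      for (src, dst), samples in sorted(groups.items()):
          recv = [r for r, _ in samples]
          tput = [b for _, b in samples]
          # columns: src, min/avg/max packets received, min/avg/max throughput, destination
          print("%-4s %3d %5.2f %3d %6.0f %6.0f %6.0f %12s"
                % (src, min(recv), sum(recv) / len(recv), max(recv),
                   min(tput), sum(tput) / len(tput), max(tput), dst))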

The following three examples summarize performance over approximately the past three months, ending 14 April 1996.

Example 1: service probability

As an example, we can focus on the probability of successfully reaching the destinations, aggregated over several granularities: 1, 2, 3, 7, 14, and 30 days. In the maps below, red lines connect the least successfully connected site pairs and blue the most, with the range normalized from 0.7 to 1.0 and then square-rooted to make the differences more obvious (a small sketch of this normalization follows the maps). Note that the maps show more red lines as you go further back in time, indicating that service to these sites was `better' in the last few days than over the last month.

Maps: 1 day, 2 days, 3 days; 7 days, 14 days, 30 days.
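A minimal sketch of the color normalization described above, assuming a per-pair probability p of a probe getting through, aggregated over the chosen window (the exact endpoint handling is our assumption):

  def color_value(p, lo=0.7, hi=1.0):
      # clip to [0.7, 1.0], rescale to [0, 1], then take the square root
      # to spread out the differences; 0 maps to red (worst), 1 to blue (best)
      clipped = min(max(p, lo), hi)
      return ((clipped - lo) / (hi - lo)) ** 0.5

  # e.g. a pair that answered 9 of 10 probes on average over the window:
  print(color_value(0.9))   # about 0.82, towards the blue end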


A movie of these maps is also available (you need Netscape 2.0; it's a GIF89 animation, ~700K), supplied by Tamara Munzner, Eric Hoffman, and Carl Samos. For the dedicated, the raw data is available.

Example 2: service predictability

Standard, easy-to-use, and minimally invasive Internet performance metrics from an end-user perspective are very difficult to obtain. How does a customer know what performance to expect? We try to use these simple ping results to provide a very rough indication. We advocate this only as a step toward thinking in the right direction; we recognize that these assessments are not optimally accurate, but rather an example of what people can accomplish today, and a baseline from which to develop more accurate metrics.

Each point in a given graph below represents daily average/peak ratios of packet delivery and throughput. More specifically, there are about 24 loss-throughput data pairs (one for each hour), so a single dot represents the mean number of packets that got through divided by the peak number on the x-axis, versus the mean throughput divided by the peak throughput on the y-axis.

In other words: for each 10-packet ping session, x <= 10 packets actually arrive, and those packets get through with a certain throughput y, in bps. We then take summaries of these ping sample pairs. Each dot is based on one day's worth of ping tests (at one per hour, that's about 24, though many don't even make it through the test due to, well, the Internet. Someone should really investigate that.)

So for the 24 tests you get:

(a) 24 numbers between 1 and 10 (how many of the 10 ping packets got through)
(b) 24 numbers between 0 and high (bits per second for the 10 packets)

and then we plot the pair (X, Y), where

X = average(numbers in set (a)) / max(numbers in set (a)), and
Y = the same for set (b),

so this is a rough measure of the variability in service: the average service you get divided by the maximum that you could get.
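A small sketch of that daily (X, Y) computation, with hypothetical sample values:

  def daily_point(received, bps):
      # received: packets (out of 10) that got through, one entry per run
      # bps: throughput of each run in bits per second
      x = (sum(received) / len(received)) / max(received)  # mean / peak delivery
      y = (sum(bps) / len(bps)) / max(bps)                  # mean / peak throughput
      return x, y

  # e.g. a day where most runs deliver all 10 packets but a few drop some:
  print(daily_point([10, 10, 9, 5, 10, 8], [9200, 8800, 7400, 3100, 9000, 6500]))
  # -> roughly (0.87, 0.80): well connected, but with noticeable variation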

You can get larger versions of each graph by clicking on them, but we will try to explain the implications of some of the data. The legend for each graph shows:

color    abbrev   source site
white    sd       San Diego
red      sv       FIX-West (Silicon Valley)
green    bo       Boulder
blue     uc       Urbana-Champaign
yellow   pb       Pittsburgh
brown    it       Ithaca


The dots in the upper right show paths that are fairly well-connected, i.e., Pittsburgh and Ithaca seem to have less loss and throughput variation to ISI than do San Diego, Silicon Valley, Boulder, and Urbana-Champaign. Urbana-Champaign has particularly bad connectivity to ISI, and the connectivity from Northern to Southern California is quite variable, consistent with anecdotal evidence (at least mine).

(Now one can go back and find out why with traceroutes and other tools, but that's a step we haven't taken yet, and it would be quite tedious, so tools to automate it would be good.)
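As one hedged sketch of what such automation might look like (the threshold, labels, and file naming are placeholders, not part of this study):

  import subprocess

  def investigate(src_label, dst_host, x, y, threshold=0.5):
      # when a pair's daily (X, Y) point looks poor, capture a traceroute
      # toward the destination for later analysis
      if x < threshold or y < threshold:
          trace = subprocess.run(["traceroute", dst_host],
                                 capture_output=True, text=True).stdout
          with open("trace-%s-%s.txt" % (src_label, dst_host), "w") as out:
              out.write(trace)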



Reaching Berkeley seems to be the hardest for Urbana-Champaign, San Diego, and Silicon Valley.

The data on the left indicate that, as a destination, the University of Washington has trouble from San Diego and Silicon Valley; Urbana-Champaign and Boulder are rather reliably connected; Pittsburgh and Ithaca are so-so.
Things aren't great to Utah (on the right): Urbana-Champaign, Pittsburgh, and Boulder have the most trouble with it.

Same for Texas.
But it's the opposite for Wisconsin and OSU (on the right and below). Urbana-Champaign and Boulder do okay getting there; Silicon Valley, San Diego, and Ithaca are unhappy sources.

Florida isn't great from anywhere.
(Well, I hear the weather's nice.)

NSF is getting fairly high reachability and throughput from Pittsburgh, Ithaca, San Diego, and even Silicon Valley.

MIT seems to have the fewest reachability problems overall.

Example 3: performance profiles

To get a sense of the range of the above data, we can draw diamonds around the minimum and maximum values for each data set, with the averages crossing at the diamond `center'. We'll just show this for a few of the destination sites, to give an idea.
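A small sketch of how one such diamond can be derived from a pair's (x, y) samples; the corner layout is our reading of the description above:

  def diamond(xs, ys):
      # corners sit at the minimum and maximum of each axis, and the
      # horizontal and vertical diagonals cross at the averages
      x_avg = sum(xs) / len(xs)
      y_avg = sum(ys) / len(ys)
      return [(min(xs), y_avg),   # left
              (x_avg, max(ys)),   # top
              (max(xs), y_avg),   # right
              (x_avg, min(ys))]   # bottom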

Note the San-Diego-to-Berkeley (right) and San-Diego-to-Washington (below) paths have a wide range of performance, consistent with the data from Example 2.

Utah, to the right, and UWisconsin, below.

Texas to the right, and Oregon State below.

And NSF in Washington, DC.

22 Apr 96, comments to info@nlanr.net