Digressive but interesting discussion of self-similarity

Several metrics of network traffic have heavy-tailed distributions. Furthermore, recent theorems have shown that aggregating traffic sources with heavy-tailed distributions leads directly to (asymptotic) self-similarity; distributions with infinite variance lead to self-similarity. Willinger identified three minimal parameters for a self-similar model.
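(The three parameters are presumably the mean, the variance, and the Hurst parameter H of a fractional-Gaussian-noise-style model; the list is not given above, so this is an assumption.) To make the aggregation theorem concrete, here is a minimal simulation sketch with illustrative assumed parameters (alpha = 1.5, 50 sources, 2^16 time slots): it superposes ON/OFF sources whose ON and OFF periods are Pareto-distributed (heavy-tailed, with infinite variance for alpha < 2) and estimates H for the aggregate with the variance-time method. Poisson-like traffic would give H near 0.5; heavy-tailed aggregation typically gives H well above 0.5.

    import numpy as np

    rng = np.random.default_rng(0)

    def onoff_source(n_slots, alpha=1.5):
        # One source alternates between ON (1 unit per slot) and OFF,
        # with Pareto-distributed period lengths (heavy-tailed for alpha < 2).
        out, state, total = [], 1, 0
        while total < n_slots:
            d = int(np.ceil(rng.pareto(alpha) + 1))
            out.append(np.full(d, state))
            total += d
            state ^= 1
        return np.concatenate(out)[:n_slots]

    def hurst_variance_time(x, scales):
        # Variance-time method: Var(X^(m)) ~ m^(2H-2), so H = 1 + slope/2,
        # where slope is fitted on a log-log plot of block-mean variance vs m.
        v = [x[: len(x) // m * m].reshape(-1, m).mean(axis=1).var() for m in scales]
        slope = np.polyfit(np.log(scales), np.log(v), 1)[0]
        return 1 + slope / 2

    n_slots, n_sources = 2**16, 50
    aggregate = sum(onoff_source(n_slots) for _ in range(n_sources))
    scales = [2**k for k in range(1, 11)]
    print("estimated Hurst parameter:", round(hurst_variance_time(aggregate, scales), 2))
    # Expect a value well above 0.5 (roughly (3 - alpha)/2 for Pareto ON/OFF periods),
    # i.e. an asymptotically self-similar aggregate.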

Walter noted that queue lengths predicted by self-similarity are much higher than those from classical theory, which typically assumes Poisson traffic. Ferguson noted that the curves did not apply to anything he had seen on the T3 backbone links he examined while working at MCI. The core T3 links he looked at did not exhibit high burstiness, nowhere near as much as the T1 access links, and he attributed the smoothing effect to the higher degree of multiplexing. (Note that another possible explanation is that the T1 access links coming into the T3 were saturated, so they fed a smooth, constant 1.544 Mbps stream of data.) A number of researchers have analyzed samples from a wide variety of network traffic (SS7, ISDN, Ethernet and FDDI LANs, backbone access points, and ATM) and found fractal behavior of one kind or another in each one, though he had no data from core backbone links. In any case, Ferguson could easily have been experiencing an environment with enough multiplexing that the fractal effect, while still present, no longer dominated. It is quite possible to have fractal traffic that looks smooth, because all of the correlation is in terms of small deviations from the mean. It is the combination of correlations and variance that matters for queueing effects.
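A minimal sketch of that last point, under assumed illustrative parameters (the same heavy-tailed ON/OFF aggregate as above, 80% utilization): the same arrival trace is fed to a simple discrete-time queue twice, once as generated (long-range correlated) and once randomly shuffled, which keeps the mean and variance but destroys the correlations. The shuffled trace behaves much more like classical Poisson-style assumptions would predict and builds far shorter queues.

    import numpy as np

    rng = np.random.default_rng(1)

    def onoff_source(n_slots, alpha=1.5):
        # 0/1 activity of one source with heavy-tailed (Pareto) ON and OFF periods.
        out, state, total = [], 1, 0
        while total < n_slots:
            d = int(np.ceil(rng.pareto(alpha) + 1))
            out.append(np.full(d, state))
            total += d
            state ^= 1
        return np.concatenate(out)[:n_slots]

    def queue_lengths(arrivals, capacity):
        # Discrete-time single-server queue: backlog = max(0, backlog + work in - work out).
        q, hist = 0.0, np.empty(len(arrivals))
        for i, a in enumerate(arrivals):
            q = max(0.0, q + a - capacity)
            hist[i] = q
        return hist

    n_slots, n_sources, utilization = 2**16, 50, 0.8
    arrivals = sum(onoff_source(n_slots) for _ in range(n_sources)).astype(float)
    capacity = arrivals.mean() / utilization

    q_corr = queue_lengths(arrivals, capacity)                   # correlated, LRD-like
    q_shuf = queue_lengths(rng.permutation(arrivals), capacity)  # same marginals, no correlation

    for name, q in (("correlated", q_corr), ("shuffled", q_shuf)):
        print(f"{name:10s} mean queue {q.mean():8.1f}   99th percentile {np.quantile(q, 0.99):8.1f}")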

Partridge agreed that such traffic likely still had long-range dependence (long-range dependence is a more general framework for dealing with fractal and fractal-like correlations), i.e., strong correlation, but that it might simply not be obvious from a cursory look at the traffic levels. Willinger agreed: one could expect a range of impacts of self-similarity, depending on the characteristics of the traffic sources, and aggregation effects will come only at higher utilizations for workloads with more demanding applications.
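For reference, a standard textbook characterization of long-range dependence (not stated explicitly in the discussion): the autocorrelations rho(k) of the stationary traffic process decay hyperbolically and are not summable, in contrast to the exponentially decaying, summable correlations of Poisson or Markovian models:

    \rho(k) \sim c \, k^{-(2-2H)} \ \text{as } k \to \infty,
    \qquad \tfrac{1}{2} < H < 1,
    \qquad \sum_{k=1}^{\infty} \rho(k) = \infty

Here H is the Hurst parameter; H = 1/2 corresponds to uncorrelated, Poisson-like behavior, and values approaching 1 indicate increasingly persistent correlations.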

Sincoskie noted that the lesson was not to sell a T3 pipe into your T3 backbone to a subscriber with T3-rate applications, but only to subscribers with highly multiplexed traffic workloads.

Willinger continued his presentation with what was needed next in terms of measurements, modeling, and network engineering. Was there any point in fine-tuning in a sea of change? He affirmed that it is essential to get as many traffic measurements as possible, and to reach out to the statistics and modeling community, both in this and other areas of science and engineering.

Halpern noted that, if valid, self-similarity justifies sparse sampling, and Craig added the caveat that one would need occasional dense samples to make sure things were behaving as expected. Dave Sincoskie of Bellcore asked about the implications of self-similarity for the use of time averages, such as the 15- or 30-minute averages that are so popular among engineers despite their coarse grain. Willinger agreed that they are of value, and even noted that if one assumes self-similarity, one can use coarse-grained measurements to infer finer time scales. Paxson elaborated that one does need the fine granularity to validate the modeling assumptions, but just not all day long.
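The coarse-to-fine inference mentioned by Willinger presumably rests on the variance scaling of a second-order self-similar process; a sketch under assumed illustrative values (Hurst parameter H = 0.8, 1-second base interval, 15-minute averages, so m = 900):

    \operatorname{Var}\bigl(X^{(m)}\bigr) = \sigma^2 \, m^{2H-2}
    \quad\Longrightarrow\quad
    \sigma^2 = \operatorname{Var}\bigl(X^{(900)}\bigr) \cdot 900^{\,2-2H}
    \approx 15 \cdot \operatorname{Var}\bigl(X^{(900)}\bigr) \ \text{for } H = 0.8

Here X^{(m)} is the base-rate process averaged over blocks of m intervals and sigma^2 its fine-scale variance, so a variance measured from 15-minute averages, combined with an estimate of H, gives the variance one would expect at the 1-second scale.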

Matt Mathis expressed curiosity about the underlying physics that would give rise to self-similarity at different time scales. That is, at millisecond time scales, link-layer characteristics (i.e., transmission time on media) would dominate the arrival-process profile, while at 1-10 second time scales the effects of the transport layer would likely dominate. Queueing characteristics might dominate a range of time scales in between, but in any case the implication that several different physical networking phenomena manifest themselves with self-similar characteristics merits further investigation into these components. There was agreement that answering such questions and refining the models toward greater utility for network engineering would require traffic measurements from a wide variety of locations.

Mark Garrett summarized several points of the discussion:

  1. engineering without measurement is dangerous,
  2. new models (theory) and data are needed,
  3. the federal government should encourage and consider funding projects that leverage the implications of recent theoretical and empirical work, and
  4. we should seek out measurement infrastructure and sources of statistics in the commercially decentralized Internet.