Application of sampling methodologies to wide-area network traffic characterization
K. Claffy and H.-W. Braun
Cooperative Association for Internet Data Analysis - CAIDA
San Diego Supercomputer Center,
University of California, San Diego
G. Polyzos
University of California, San Diego
The relative performance of different data collection methods
in assessing various traffic parameters becomes significant when the
amount of data generated by a complete trace of a traffic interval is
computationally overwhelming and even capturing summary statistics for
all traffic is impractical. This paper presents a study of how well
various sampling methods answer questions
related to wide-area network traffic characterization. Using a packet
trace from a network environment that aggregates traffic from a large
number of sources, we simulate various sampling approaches, including
time-driven and event-driven methods, with both random and deterministic
selection patterns, at a variety of granularities. Using several metrics
that indicate the similarity between two distributions, we then compare
the sampled traces to the parent population. Our results show that
the time-triggered techniques did not perform as well as the
packet-triggered ones. Furthermore, the performance differences within
each class (packet-triggered or time-triggered techniques) are small.
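
The sampling classes named above can be illustrated with a minimal sketch. It is not the authors' simulation code. It assumes a trace stored as a time-ordered list of (timestamp, size) tuples, and all function names, the timer model, and the L1 distance between binned packet-size proportions are hypothetical choices made for illustration; the abstract does not say which similarity metrics the study uses.

```python
import bisect
import random

# Assumption: a trace is a time-ordered list of (timestamp, size_in_bytes).

def systematic_packet_sample(packets, k):
    """Deterministic, packet-triggered: keep every k-th packet."""
    return packets[::k]

def random_packet_sample(packets, k, rng):
    """Random, packet-triggered: keep len(packets)//k packets chosen
    uniformly at random without replacement, in arrival order."""
    n = len(packets) // k
    chosen = sorted(rng.sample(range(len(packets)), n))
    return [packets[i] for i in chosen]

def time_sample(packets, gap):
    """Time-triggered: a timer fires repeatedly and the first packet that
    arrives at or after each firing is kept. `gap` is a callable returning
    the next (positive) timer gap: constant for a deterministic pattern,
    drawn from a distribution for a random one."""
    sample = []
    if not packets:
        return sample
    next_fire = packets[0][0]
    for pkt in packets:
        if pkt[0] >= next_fire:
            sample.append(pkt)
            # Advance the timer past this arrival so that no packet
            # is selected twice.
            while next_fire <= pkt[0]:
                next_fire += gap()
    return sample

def size_proportions(packets, bin_edges):
    """Proportion of packets in each size bin delimited by bin_edges."""
    counts = [0] * (len(bin_edges) + 1)
    for _, size in packets:
        counts[bisect.bisect_right(bin_edges, size)] += 1
    return [c / len(packets) for c in counts]

def l1_distance(p, q):
    """Illustrative similarity metric: sum of absolute differences
    between two binned distributions (0 means identical)."""
    return sum(abs(a - b) for a, b in zip(p, q))
```

Under these assumptions, a sampled trace is compared to the parent population by computing size_proportions for both with the same bin edges and passing the results to l1_distance. A deterministic time-triggered sample is obtained with, for example, time_sample(trace, lambda: 100), and a random one by passing a callable that draws each gap from a distribution.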