Overview
- Internet Data Archives and Analysis
- Passive Measurement Methodology
- The Auckland Measurement Point
- Auckland Data Sets I - IV
- Some Studies which used the Auckland Data
- Martin et al: Internet Delay Times
- Joyce: Games Traffic
- Veitch et al: Wavelet Analysis
- Conclusion
Internet Data Analysis
- Network Research began long before the Internet
- The need for research has grown with the Internet and
its commercialisation
- It's not easy to get good traffic data
- Traffic speed and volumes continue to increase
- Access to production networks is difficult
- It's important to preserve data privacy
- Good data archives, with accurate timestamps,
are rare
- What's needed in a data archive?
- Need bi-directional packet data, i.e. packet traces
- Need accurate timestamps
- Only keep header data, but encrypt IP Addresses
- Compromises must be made, e.g. discarding all
user data makes some measurements (e.g. DNS response
times) impossible
- Interest in Trace File Archives is growing, e.g.
ITA, MOAT, Trace User Community
Clock Synchronisation, the Dag Cards
- Delay measurements require accurate, high-resolution
timestamps
- Many papers published about clock drift, and how
to overcome it
- NTP is not good enough for this purpose, since it
only provides millisecond accuracy
- All interfaces involved in measuring delay must
be synchronised
- WAND group have developed the Dag cards
- Ethernet (10/100 Mbps)
- ATM (OC3, OC12, OC48), AAL5 and PoS
- Hardware support for timestamps, using external
synchronising pulse
- GPS: provides 1-microsecond accuracy at any two locations
but it can be difficult to get GPS signals in a
crowded measurement location
- CDMA: may be a useful alternative to GPS
- DUCK (Dag Universal Clock Kit): one Dag card provides
pulses to synchronise others in the same environment
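To illustrate why millisecond accuracy is too coarse for this work, the sketch below estimates how many ATM cells can pass a measurement point within one clock tick. It assumes the nominal OC3 line rate of 155.52 Mbit/s and 53-byte cells, and ignores SONET/SDH framing overhead; these are textbook figures, not measurements from the Dag cards.

```python
# Rough estimate: how many ATM cells fit into one timestamp interval?
# Assumptions (not from the slides): nominal OC3 line rate, no framing overhead.
OC3_BITS_PER_SEC = 155.52e6
ATM_CELL_BITS = 53 * 8


def cells_per_interval(resolution_seconds, line_rate=OC3_BITS_PER_SEC):
    """Number of back-to-back cells that can arrive within one clock tick."""
    return line_rate * resolution_seconds / ATM_CELL_BITS


cell_time_us = ATM_CELL_BITS / OC3_BITS_PER_SEC * 1e6
print(f"one cell lasts about {cell_time_us:.2f} microseconds")
print(f"cells per 1 ms tick: {cells_per_interval(1e-3):.0f}")
print(f"cells per 1 us tick: {cells_per_interval(1e-6):.2f}")
```

Under these assumptions several hundred cells can share one millisecond tick, while a microsecond tick is shorter than a single cell time.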
The Auckland Measurement Point
- The University of Auckland is a large site -
about 3,000 staff and 27,000 students
- Auckland's traffic reaches the Internet via
a single ATM link; all packets (in both directions)
can be seen
- Since early 1999 we have been collecting packet traces
at Auckland, using two Dag cards in a Pentium-based PC
- The Dag cards connect to the Auckland ATM link via
optical splitters; our measurements are purely passive
- Summary
Link | ATM Virtual Circuit Connection on STM-1c (OC3c equivalent)
Bandwidth | 2 Mbit/s packet peak rate in each direction
System | Celeron 333A, 128 KB cache, Asustek P2B (BX chipset)
Auckland Data Sets I - IV
Data set | Auckland I | Auckland II | Auckland III | Auckland IV
Dates | July 1999 | November 1999 - June 2000 | August 2000 | Scheduled for end of 2000
Memory | 32 MB PC100 | 32 MB PC100 | 32 MB PC100 | 96 MB PC100
Disk | 6.3 GB ATA | 6.3 GB ATA | 50 GB SCSI | 50 GB SCSI
Capture cards | Dag 2.1 | Dag 2.1 | Dag 3.21 | Dag 3.21
Synchronisation | None | Palisade GPS | Palisade GPS | Palisade GPS
Timing | 12.5 MHz | 12.5 MHz soft | 16 MHz DUCK | 16 MHz DUCK
Archive | Internet | DDS-2 Tape | SCSI disk | SCSI disk
- Auckland measurement hardware has evolved during the
project, as shown in the table (memory, disk and archive)
- External time source is Palisade GPS (except for Auckland I)
- Dag cards changed from 2.1 to 3.21 - introduction of
DUCK time synchronisation
- DUCK Timestamp Format: 64 bits
- High-order 32 are Unix seconds
- Low-order 32 are binary fraction of a second
- Arithmetic operations (including comparisons) are single 64-bit
operations
- Simpler (therefore faster) than Unix struct timeval (seconds, microseconds)
- Auckland I is a preliminary test
- Auckland III is a test of two-site measurement
- Auckland II has been published via MOAT, and has drawn
much interest from users
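The 64-bit DUCK timestamp layout described above can be sketched as follows. This is a minimal illustration, not the Dag software interface: the function names and the sample times are hypothetical, and the conversion from a (seconds, microseconds) pair is only there to show the contrast with the two-field Unix format.

```python
# Sketch of a 64-bit fixed-point timestamp: the high 32 bits are Unix seconds,
# the low 32 bits are a binary fraction of a second (units of 2**-32 s).
# Function names are hypothetical; this is not the Dag API.
FRACTION_BITS = 32
FRACTION_SCALE = 1 << FRACTION_BITS


def from_timeval(seconds, microseconds):
    """Pack a (seconds, microseconds) pair into one 64-bit integer."""
    fraction = (microseconds * FRACTION_SCALE) // 1_000_000
    return (seconds << FRACTION_BITS) | fraction


def interval_seconds(difference):
    """Convert the difference of two packed timestamps to float seconds."""
    return difference / FRACTION_SCALE


# Arbitrary sample times, 1.5 seconds apart
t1 = from_timeval(962_388_000, 250_000)
t2 = from_timeval(962_388_001, 750_000)

# Comparison and subtraction are plain integer operations on one value
assert t1 < t2
delta = t2 - t1
print(interval_seconds(delta))
```

With a two-field format the same subtraction needs a borrow between the seconds and microseconds fields, and a comparison has to inspect both fields.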
Auckland II Traces
Measurement period | 7 months
Frequency | irregular intervals
Duration per trace | 2.5 - 38.5 hours
Volume per trace (uncompressed/compressed) | 0.2 - 3.5 GB / 0.1 - 1.6 GB
Volume total (uncompressed/compressed) | 59 GB / 26 GB
IP headers total | 985 million
Number of traces | 42
Active trace duration total | 24 days 3 hours
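A few derived figures follow directly from the totals in the table. The short calculation below is only arithmetic on those published numbers; it assumes 1 GB means 10**9 bytes, which the slides do not state.

```python
# Totals taken from the Auckland II summary table.
TOTAL_UNCOMPRESSED_GB = 59
TOTAL_COMPRESSED_GB = 26
TOTAL_HEADERS = 985e6
ACTIVE_SECONDS = (24 * 24 + 3) * 3600  # 24 days 3 hours

# Assumption: 1 GB = 10**9 bytes (the slides do not say which unit is meant).
BYTES_PER_GB = 10**9

compressed_fraction = TOTAL_COMPRESSED_GB / TOTAL_UNCOMPRESSED_GB
mean_header_rate = TOTAL_HEADERS / ACTIVE_SECONDS
bytes_per_header = TOTAL_UNCOMPRESSED_GB * BYTES_PER_GB / TOTAL_HEADERS

print(f"compressed size is {compressed_fraction:.0%} of the original")
print(f"mean rate: {mean_header_rate:.0f} headers per second")
print(f"stored bytes per header: {bytes_per_header:.0f}")
```

On these assumptions the compressed archive is a little under half the original size, and the long-term mean is a few hundred headers per second over both directions combined.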
- Goal was to produce 24-hour traces
- Traces compressed after collection (up to 2 hours
between runs)
- Data is sanitised (`munged'), i.e. IP Addresses are
non-reversibly mapped into 10.0.0.0/8 before publication
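The slides do not describe how the address mapping is done. One simple possibility, shown here purely as an assumption, is to number addresses in order of first appearance and discard the lookup table after the run, so that the published addresses cannot be mapped back. The class and method names are hypothetical, and the sample addresses come from the documentation ranges.

```python
import ipaddress


class AddressSanitiser:
    """Map each distinct IPv4 address to an address inside 10.0.0.0/8.

    Hypothetical scheme: addresses are numbered in order of first appearance.
    Once the table is thrown away the mapping cannot be inverted.
    """

    def __init__(self):
        self._base = int(ipaddress.IPv4Address("10.0.0.0"))
        self._table = {}

    def sanitise(self, address):
        key = int(ipaddress.IPv4Address(address))
        if key not in self._table:
            index = len(self._table) + 1  # skip 10.0.0.0 itself
            if index >= 1 << 24:
                raise OverflowError("10.0.0.0/8 is exhausted")
            self._table[key] = self._base + index
        return str(ipaddress.IPv4Address(self._table[key]))


sanitiser = AddressSanitiser()
for original in ["192.0.2.7", "198.51.100.3", "192.0.2.7"]:
    print(original, "->", sanitiser.sanitise(original))
```

The same input address always yields the same output within a run, which keeps flows intact for analysis while hiding the real endpoints.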
Auckland II Post-processing
- Available trace analysis packages (e.g. CoralReef) don't
handle large (multi-GB) data sets well
- We have produced `DagTools,' a set of simple analysis
tools which ..
- Can be used in a Unix pipeline
- Demonstrate how to use our trace data
- Provide simple visualisation of the trace content
- May serve as prototypes for other researchers
- Example views from Auckland II: 20 hours data
from 1800, 30 June 2000
- Data is aggregated in one-second bins; one-minute averages
are plotted
- Spike in new connections from about 2200 to 2300
investigated with DagTools ..
- dagcut: select required time interval (65 MB from 1.4 GB)
- dagsess: display active TCP and UDP connections
- sort: into session start-time order
- dagbpf: produce input for tcpdump
- Spike was caused by an active scan of the University's
IP Address space for DNS servers, followed by suspicious
badly-formed TCP packets - clearly a network attack
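The aggregation used for the plots above (one-second bins, one-minute averages) can be sketched as follows. This is not DagTools code: the function names are hypothetical and the input is synthetic, with a burst of extra connection starts in the second minute to mimic the kind of spike described.

```python
from collections import Counter


def per_second_counts(start_times):
    """Count events (e.g. new connections) in one-second bins."""
    return Counter(int(t) for t in start_times)


def per_minute_averages(second_counts, first_second, last_second):
    """Average the one-second counts over each one-minute window.

    Returns a list of (minute_start, mean events per second) pairs.
    """
    averages = []
    for minute_start in range(first_second, last_second + 1, 60):
        window_end = min(minute_start + 60, last_second + 1)
        total = sum(second_counts.get(s, 0)
                    for s in range(minute_start, window_end))
        averages.append((minute_start, total / (window_end - minute_start)))
    return averages


# Synthetic data: 2 events per second for 3 minutes,
# plus 10 extra events per second during the second minute.
background = [s + 0.5 * k for s in range(180) for k in range(2)]
burst = [s + 0.05 * k for s in range(60, 120) for k in range(10)]

counts = per_second_counts(background + burst)
averages = per_minute_averages(counts, 0, 179)
busiest = max(averages, key=lambda pair: pair[1])
print(averages)
print("busiest minute starts at second", busiest[0])
```

Once the busiest window is known, the corresponding time interval can be cut out of the trace and examined in detail.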
Other Auckland Data Sets: IV and Later
- Auckland IV
- Post-processing finished 8 April
- Covers six weeks, with only two days missing
- 75 GB, with 61% compression on-the-fly using gzip level 1
- About to be published
- Auckland V: to be discarded
- Auckland VI: three-point measurement. Being planned
- Auckland VII: ATM cell traces, replaces V. Being planned
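On-the-fly compression at gzip level 1 can be sketched with Python's standard gzip module. The capture software actually used for Auckland IV is not described in the slides; the function names and the synthetic 64-byte records below are illustrative only.

```python
import gzip
import os
import tempfile


def write_compressed(path, records, level=1):
    """Stream byte records into a gzip file as they arrive."""
    with gzip.open(path, "wb", compresslevel=level) as out:
        for record in records:
            out.write(record)


def read_all(path):
    """Read back and decompress the whole file."""
    with gzip.open(path, "rb") as source:
        return source.read()


# Synthetic stand-in for captured records
records = [bytes([i % 7]) * 64 for i in range(1000)]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "trace.gz")
    write_compressed(path, records)
    restored = read_all(path)
    compressed_size = os.path.getsize(path)

print("original bytes:", sum(len(r) for r in records))
print("compressed bytes:", compressed_size)
```

Level 1 is the fastest gzip setting, trading compression ratio for speed, which is a plausible choice when compression has to keep up with capture.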
Some Studies which have used Auckland Data
- Martin et al: Analysis of Internet Delay Times
- Presented at PAM2000
- Attempted to separate web page delays into three components
- Highlighted the need for more data, e.g. performing active
traceroutes during passive trace taking
- Joyce: A Study of Games Traffic
- Honours project report, Waikato
- Examined traffic for several different Internet games
- Games data used for simulation of TCP running
at the same time as UDP traffic
- Veitch et al: Wavelet Analysis of TCP Connection Start Times
- Left plot shows number of active TCP connections; note
`edge effects'
- Right plot shows active but incomplete TCP connections.
(Blue line is a copy of left plot)
- Wavelet analysis represents this data in two functions,
H(q) and n(a), where a = 2^j is the scale
- H(q) (left plot) is non-linear in q, indicating
non-trivial scaling behaviour
- n(a) (right plot) is not a simple ln(a), but is approximately
piecewise linear (note the two linear fits)
- Computations for this `Infinitely Divisible Cascades (IDC)'
analysis can be performed in real time
Conclusion
- Making precise Internet measurements is hard, requiring proper
equipment and technical knowledge
- Traces are not the answer to all questions; they raise storage and
processing problems.
Traces can - and should - be complemented by realtime monitoring;
NeTraMet is one example of such an approach
- There are many good reasons to support a public measurement
data archive, particularly for backbone network data.
In particular ..
- Individual measurement projects only serve a single group,
but Internet traces are useful to a broad audience
of researchers
- Many researchers do not get access to the links of interest;
they depend on other people's data
- We (WAND, NLANR/MOAT and CAIDA) are encouraging the development of
a `Trace User Community.' This involves providing analysis and
visualisation tools as well as trace data