The Auckland Data Set: An Access Link Observed

Joerg Micheel, Ian Graham (The University of Waikato)
and Nevil Brownlee (The University of Auckland / CAIDA)

ITC14 Seminar, April 2001


  1. Overview

    • Internet Data Archives and Analysis
    • Passive Measurement Methodology
    • The Auckland Measurement Point
    • Auckland Data Sets I - IV
    • Some Studies which used the Auckland Data
      • Martin et al: Internet Delay Times
      • Joyce: Games Traffic
      • Veitch et al: Wavelet Analysis

    • Conclusion

  2. Internet Data Analysis

    • Network Research began long before the Internet
    • The need for research has grown with the Internet and its commercialisation
    • It's not easy to get good traffic data
      • Traffic speed and volumes continue to increase
      • Access to production networks is difficult
      • It's important to preserve data privacy
      • Good data archives, with accurate timestamps, are rare

    • What's needed in a data archive?
      • Need bi-directional packet data, i.e. packet traces
      • Need accurate timestamps
      • Only keep header data, but encrypt IP Addresses
      • Compromises must be made, e.g. discarding all user data makes some measurements (e.g. DNS response times) impossible

    • Interest in Trace File Archives is growing, e.g. ITA, MOAT, Trace User Community

  3. Clock Synchronisation, the Dag Cards

    • Delay measurements require accurate, high-resolution timestamps
    • Many papers published about clock drift, and how to overcome it
    • NTP is not good enough for this purpose, since it only provides millisecond accuracy
    • All interfaces involved in measuring delay must be synchronised
    • WAND group have developed the Dag cards
      • Ethernet (10/100 Mbps)
      • ATM (OC3, OC12, OC48), AAL5 and PoS

    • Hardware support for timestamps, using external synchronising pulse
      • GPS: provides 1-microsecond accuracy at any two locations
        but it can be difficult to get GPS signals in a crowded measurement location
      • CDMA: may be a useful alternative to GPS
      • DUCK (Dag Universal Clock Kit): one Dag card provides pulses to synchronise others in the same environment

  4. The Auckland Measurement Point

    • The University of Auckland is a large site - about 3,000 staff and 27,000 students
    • Auckland's traffic reaches the Internet via a single ATM link; all packets (in both directions) can be seen
    • Since early 1999 we have been collecting packet traces at Auckland, using two Dag cards in a Pentium-based PC
    • The Dag cards connect to the Auckland ATM link via optical splitters; our measurements are purely passive
    • Summary
      Link       ATM Virtual Circuit Connection on STM-1c (OC3c equivalent)
      Bandwidth  2 Mbit/s packet peak rate in each direction
      System     Celeron 333A, 128KB cache, Asustek P2B (BX chipset)

  5. Auckland Data Sets I - IV
                       Auckland I   Auckland II                 Auckland III   Auckland IV
      Dates            July 1999    November 1999 - June 2000   August 2000    Scheduled for end of 2000
      Memory           32MB PC100   32MB PC100                  32MB PC100     96MB PC100
      Disk             6.3 GB ATA   6.3 GB ATA                  50 GB SCSI     50 GB SCSI
      Capture cards    Dag 2.1      Dag 2.1                     Dag 3.21       Dag 3.21
      Synchronisation  None         Palisade GPS                Palisade GPS   Palisade GPS
      Timing           12.5 MHz     12.5 MHz soft               16 MHz DUCK    16 MHz DUCK
      Archive          Internet     DDS-2 Tape                  SCSI disk      SCSI disk

    • Auckland measurement hardware has evolved during the project, as shown in the table (memory, disk and archive)
    • External time source is Palisade GPS (except for Auckland I)
    • Dag cards changed from 2.1 to 3.21 - introduction of DUCK time synchronisation
    • DUCK Timestamp Format: 64 bits
      • High-order 32 are Unix seconds
      • Low-order 32 are binary fraction of a second
      • Arithmetic and comparisons are single 64-bit operations
      • Simpler (therefore faster) than the Unix struct timeval (seconds, microseconds)

    • Auckland I is a preliminary test
    • Auckland III is a test of two-site measurement
    • Auckland II has been published via MOAT, and has drawn much interest from users
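
The 64-bit timestamp layout described above (Unix seconds in the high-order 32 bits, a binary fraction of a second in the low-order 32 bits) can be illustrated with a minimal Python sketch. The function names are hypothetical and are not part of any Dag software; the sketch only shows why comparison and subtraction reduce to single integer operations, whereas a (seconds, microseconds) pair needs a test on each field.

```python
# Sketch of a 64-bit fixed-point timestamp: high 32 bits are Unix seconds,
# low 32 bits are a binary fraction of a second (units of 2**-32 s).
# All names here are illustrative assumptions, not a Dag API.

FRACTION_BITS = 32
FRACTION_SCALE = 1 << FRACTION_BITS  # 2**32 fractional units per second


def make_timestamp(seconds: int, fraction: int) -> int:
    """Pack whole seconds and a 32-bit binary fraction into one integer."""
    if not 0 <= seconds < FRACTION_SCALE:
        raise ValueError("seconds must fit in 32 bits")
    if not 0 <= fraction < FRACTION_SCALE:
        raise ValueError("fraction must fit in 32 bits")
    return (seconds << FRACTION_BITS) | fraction


def split_timestamp(ts: int) -> tuple[int, int]:
    """Return (seconds, fraction) from a packed timestamp."""
    return ts >> FRACTION_BITS, ts & (FRACTION_SCALE - 1)


def from_timeval(seconds: int, microseconds: int) -> int:
    """Convert a (seconds, microseconds) pair, rounding the fraction down."""
    return make_timestamp(seconds, (microseconds * FRACTION_SCALE) // 1_000_000)


def interval_seconds(later: int, earlier: int) -> float:
    """A time difference is one integer subtraction followed by one scaling."""
    return (later - earlier) / FRACTION_SCALE


def timeval_less(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """The two-field equivalent of '<' has to examine both fields."""
    return a[0] < b[0] or (a[0] == b[0] and a[1] < b[1])


if __name__ == "__main__":
    t1 = from_timeval(962388000, 250_000)
    t2 = from_timeval(962388001, 750_000)
    print(t1 < t2)                   # ordering is a plain integer comparison
    print(interval_seconds(t2, t1))  # elapsed time in seconds
```

One unit of the low-order field is 2^-32 seconds, so the format itself is much finer than microsecond resolution; the accuracy actually achieved depends on the clock hardware and its synchronisation.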

  6. Auckland II Traces
      Measurement period                          7 months
      Frequency                                   irregular intervals
      Duration per trace                          2.5 - 38.5 hours
      Volume per trace (uncompressed/compressed)  0.2 - 3.5 GB / 0.1 - 1.6 GB
      Volume total (uncompressed/compressed)      59 GB / 26 GB
      IP headers total                            985 million
      Number of traces                            42
      Active trace duration total                 24 days 3 hours
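
A few derived figures follow directly from the totals in the table above. The short Python calculation below is only arithmetic on those published numbers; the variable names are chosen for illustration.

```python
# Figures copied from the Auckland II summary table.
IP_HEADERS_TOTAL = 985_000_000
ACTIVE_DURATION_S = 24 * 86_400 + 3 * 3_600  # 24 days 3 hours, in seconds
UNCOMPRESSED_GB = 59
COMPRESSED_GB = 26
NUMBER_OF_TRACES = 42

# Mean packet (IP header) rate over the active trace time.
mean_packet_rate = IP_HEADERS_TOTAL / ACTIVE_DURATION_S

# Size of the compressed archive relative to the uncompressed one.
compressed_fraction = COMPRESSED_GB / UNCOMPRESSED_GB

# Mean trace length, for comparison with the 24-hour goal.
mean_trace_hours = ACTIVE_DURATION_S / 3_600 / NUMBER_OF_TRACES

print(f"mean rate: {mean_packet_rate:.0f} headers/s")
print(f"compressed size: {compressed_fraction:.0%} of uncompressed")
print(f"mean trace length: {mean_trace_hours:.1f} hours")
```

These work out to roughly 470 headers per second averaged over both directions, a compressed archive a little under half the uncompressed size, and a mean trace length of about 14 hours.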

    • Goal was to produce 24-hour traces
    • Traces compressed after collection (up to 2 hours between runs)
    • Data is sanitised (`munged'), i.e. IP Addresses are non-reversibly mapped into the 10.0.0.0/8 range before publication
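
The slides do not say how the address mapping is implemented. As one possible illustration only, the Python sketch below maps each IPv4 address into 10.0.0.0/8 with a keyed hash (HMAC-SHA256); this technique, the function name `sanitise_address` and the key handling are all assumptions, not the method used for the Auckland traces.

```python
import hashlib
import hmac
import ipaddress
import os


def sanitise_address(address: str, key: bytes) -> str:
    """Map an IPv4 address to a pseudonym inside 10.0.0.0/8.

    Hypothetical scheme: the same input always gives the same output for a
    given key, so flows stay consistent within a trace set. Discarding the
    key after processing is what makes the mapping hard to reverse.
    """
    packed = ipaddress.IPv4Address(address).packed
    digest = hmac.new(key, packed, hashlib.sha256).digest()
    host_bits = int.from_bytes(digest[:3], "big")  # 24 bits for the host part
    return str(ipaddress.IPv4Address((10 << 24) | host_bits))


if __name__ == "__main__":
    key = os.urandom(32)  # secret key, to be thrown away after sanitising
    for original in ("192.0.2.1", "192.0.2.2", "192.0.2.1"):
        print(original, "->", sanitise_address(original, key))
```

Because the full 32-bit address space is folded into 24 bits, distinct addresses can collide in a scheme like this; a lookup table that hands out the next unused 10.x.y.z address would avoid collisions at the cost of keeping state during processing.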

  7. Auckland II Post-processing

    • Available trace analysis packages (e.g. CoralReef) don't handle large (multi-GB) data sets well
    • We have produced `DagTools,' a set of simple analysis tools which ..
      • Can be used in a Unix pipeline
      • Demonstrate how to use our trace data
      • Provide simple visualisation of the trace content
      • May serve as prototypes for other researchers

    • Example views from Auckland II: 20 hours data from 1800, 30 June 2000

    • Data is aggregated in one-second bins; one-minute averages are plotted
    • Spike in new connections from about 2200 to 2300 investigated with DagTools ..
      • dagcut: select required time interval (65 MB from 1.4 GB)
      • dagsess: display active TCP and UDP connections
      • sort: into session start-time order
      • dagbpf: produce input for tcpdump

    • Spike was caused by an active scan of the University's IP Address space for DNS servers, followed by suspicious badly-formed TCP packets - clearly a network attack
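
The aggregation used for the views above (one-second bins, plotted as one-minute averages) can be sketched in a few lines of Python. This is not DagTools code; the function names are hypothetical, and event times are assumed to be non-negative seconds held as floats.

```python
from collections import Counter


def one_second_bins(timestamps):
    """Count events (e.g. new connections) in each whole second."""
    return Counter(int(t) for t in timestamps)


def one_minute_averages(bins, start, end):
    """Average the per-second counts over each full minute in [start, end).

    Returns a list of (minute_start, mean_events_per_second) pairs.
    Seconds with no events count as zero.
    """
    averages = []
    for minute_start in range(start, end - 59, 60):
        total = sum(bins.get(s, 0) for s in range(minute_start, minute_start + 60))
        averages.append((minute_start, total / 60))
    return averages


if __name__ == "__main__":
    events = [0.1, 0.5, 1.2, 61.0]  # example event times in seconds
    bins = one_second_bins(events)
    for minute_start, mean_rate in one_minute_averages(bins, 0, 120):
        print(minute_start, mean_rate)
```

Averaging over a minute smooths the plot; a short burst such as a scan still shows up as a spike if it lasts long enough, and the per-second bins can then be inspected for the interval of interest.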

  8. Other Auckland Data Sets: IV and Later

    • Auckland IV
      • Post-processing finished 8 April
      • Covers six weeks, with only two days missing
      • 75 GB, with 61% compression on-the-fly using gzip level 1
      • About to be published

    • Auckland V: to be discarded
    • Auckland VI: three-point measurement. Being planned
    • Auckland VII: ATM cell traces, replaces V. Being planned
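
On-the-fly compression at gzip level 1, as mentioned for Auckland IV, trades compression ratio for speed so that the compressor can keep up with capture. A minimal Python sketch of the idea follows; the function name and the use of Python are illustrative assumptions, not a description of the capture software.

```python
import gzip
import shutil


def compress_stream(source, destination_path, level=1):
    """Copy a binary stream into a gzip file, compressing as it is written.

    Level 1 is the fastest gzip setting; higher levels compress more
    but use more CPU time per byte.
    """
    with gzip.open(destination_path, "wb", compresslevel=level) as out:
        shutil.copyfileobj(source, out)
```

A capture process could pass its output stream as `source`, so that only compressed data ever reaches the disk.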

  9. Some Studies which have used Auckland Data

    • Martin et al: Analysis of Internet Delay Times
      • Presented at PAM2000
      • Attempted to separate web page delays into three components
      • Highlighted the need for more data, e.g. performing active traceroutes during passive trace taking

    • Joyce: A Study of Games Traffic
      • Honours project report, Waikato
      • Examined traffic for several different Internet games
      • Games data used for simulation of TCP running at the same time as UDP traffic

    • Veitch et al: Wavelet Analysis of TCP Connection Start Times

      • Left plot shows number of active TCP connections; note `edge effects'
      • Right plot shows active but incomplete TCP connections. (Blue line is a copy of left plot)

      • Wavelet analysis represents this data in two functions, H(q) and n(a), where a = 2^j is the scale
      • H(q) (left plot) is non-linear in q, indicating non-trivial scaling behaviour
      • n(a) (right plot) is not a simple ln(a), but is approximately piecewise linear (note the two linear fits)
      • Computations for this `Infinitely Divisible Cascades (IDC)' analysis can be performed in real time

  10. Conclusion

    • Making precise Internet measurements is hard, requiring proper equipment and technical knowledge
    • Traces are not the answer to all questions; they also raise storage and processing problems.
      Traces can - and should - be complemented by realtime monitoring; NeTraMet is one example of such an approach
    • There are many good reasons to support a public measurement data archive, particularly for backbone network data. In particular ..
      • Individual measurement projects only serve a single group, but Internet traces are useful to a broad audience of researchers
      • Many researchers do not get access to the links of interest; they depend on other people's data

    • We (WAND, NLANR/MOAT and CAIDA) are encouraging the development of a `Trace User Community'. This involves providing analysis and visualisation tools as well as trace data

  Nevil Brownlee (nevil@caida.org)
  Last updated: 15 April 2001