NeTraMet - Changes and Extensions
NeTraMet for MS Windows
In order to produce a Windows NeTraMet, libpcap (the packet capture library from ftp://ftp.ee.lbl.gov) was ported to Windows. Two open-source (GPL) packages were used to accomplish the port:
- cygwin
a GNU development environment for MS Windows. The company that provides this package has been acquired by Red Hat. Changes resulting from the acquisition are slated to be made gradually.
- WinDump
an MS Windows implementation of tcpdump produced by the Computer Network Group of the Politecnico di Torino. It includes an NDIS packet capture driver for Windows network interfaces.
Porting libpcap was non-trivial, requiring not only a lot of work to understand the packet capture driver, but also quite a few changes to libpcap's autoconfigure files. Once a cygwin libpcap library was developed, however, the rest was easy. The normal ./configure and make steps worked for nearly all the programs in the NeTraMet distribution. Only NeTraMet - the traffic meter itself - needed cygwin-specific changes to the source code.
The resulting Windows NeTraMet works very well, provided that the interfaces being metered are fairly busy. This is because the WinDump packet capture module doesn't allow the setting of a timeout on read. Reads can therefore take a long time, reducing the rate at which the meter can respond to SNMP requests from NeMaC.
NeTraMet for CoralReef Trace Files
A version of NeTraMet was developed to use the CoralReef library to read packet headers. CoralReef provides routines which open 'sources' and read packets or cells from them. A CoralReef source may be a network interface (i.e. a live source), or it may be a trace file. A range of different trace file formats is supported.
The first step in developing NeTraMet for CoralReef was to modify the meter's Unix front-end (meter_ux.c). A third 'interface' type 'CoralReef' was added to the existing 'libpcap' and 'NetFlow' types, one of which is selected at compile time. Reading packets from tracefiles using the CoralReef API was straightforward: CoralReef timestamps could be used to determine when meter housekeeping needed to be done (e.g. garbage collection) while reading a trace file in batch mode.
Reading flow data in real-time from the meter was a little more difficult because the meter must first build flows from the trace. A simple state machine was therefore implemented in the meter's outer block. It handles different 'CoralReef trace states,' of which the most important are:
- 'ready to read' packet data from the trace file
- 'wait' until the data has been read
Next, a way was needed to synchronize NeMaC operations with the meter. A new variable was added to the University of Auckland MIB (not part of the RTFM standard, but used by the meter for management purposes) to hold the CoralReef state. NeMaC reads this variable once per second until the meter switches into the 'wait' state. NeMaC then reads the flow data, after which the meter switches back to the 'ready to read' state.
Other states allow the meter to wait until NeMaC has downloaded a ruleset before starting to read the trace file, and to shut down gracefully when end-of-file is reached.
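The handshake between the meter and NeMaC can be pictured as a small state machine. The sketch below is only illustrative: the state names, the `TraceMeter` class, and its methods are hypothetical and are not taken from the NeTraMet source or the University of Auckland MIB. It also assumes that collecting the flow data is what releases the meter back to 'ready to read'.

```python
from enum import Enum, auto


class TraceState(Enum):
    """Hypothetical names for the CoralReef trace states described above."""
    AWAIT_RULESET = auto()   # meter waits for NeMaC to download a ruleset
    READY_TO_READ = auto()   # meter may read packet data from the trace
    WAIT = auto()            # an interval has been read; wait for NeMaC
    FINISHED = auto()        # end-of-file reached; shut down gracefully


class TraceMeter:
    """Toy meter: each 'interval' is a pre-batched list of packets."""

    def __init__(self, intervals):
        self._intervals = list(intervals)
        self._next = 0
        self.state = TraceState.AWAIT_RULESET
        self.flow_data = []

    def ruleset_downloaded(self):
        """Manager side: signal that a ruleset is in place."""
        if self.state is TraceState.AWAIT_RULESET:
            self.state = TraceState.READY_TO_READ

    def step(self):
        """Meter side: read one interval's worth of packets, then wait."""
        if self.state is not TraceState.READY_TO_READ:
            return
        if self._next >= len(self._intervals):
            self.state = TraceState.FINISHED
            return
        self.flow_data = self._intervals[self._next]
        self._next += 1
        self.state = TraceState.WAIT

    def collect(self):
        """Manager side: read the flow data, then release the meter."""
        if self.state is not TraceState.WAIT:
            return None
        data, self.flow_data = self.flow_data, []
        self.state = TraceState.READY_TO_READ
        return data
```

In this model the manager would poll `state` (the stand-in for the new MIB variable) and call `collect()` only once it sees `WAIT`.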
New NeTraMet Command Line Options
Three new command-line options were added to the meter:
- -C 'sourcefile'
tells meter which CoralReef source (tracefile) to use
- -T sss
sets the meter reading interval. The meter will read packets until their timestamps show that 'sss' seconds have elapsed, then wait for NeMaC to read the flow data.
- -N nnn
specifies the number of intervals for which data is to be read. The default is zero, indicating that the whole file is to be read.
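How -T and -N interact can be sketched as follows. This is an illustration under stated assumptions, not the meter's code: `read_intervals` is a hypothetical helper, packets are modelled as (timestamp, payload) tuples with timestamps in seconds, and interval boundaries are measured from the first packet's timestamp.

```python
def read_intervals(packets, interval_secs, max_intervals=0):
    """Group (timestamp, payload) packets into reading intervals.

    interval_secs plays the role of -T; max_intervals plays the role of -N,
    where 0 means "read the whole file".
    """
    batches = []
    current = []
    boundary = None
    for ts, payload in packets:
        if boundary is None:
            boundary = ts + interval_secs
        # The timestamp shows that the interval has elapsed: close it.
        # A long gap in the trace closes several (empty) intervals.
        while ts >= boundary:
            batches.append(current)
            current = []
            boundary += interval_secs
            if max_intervals and len(batches) >= max_intervals:
                return batches
        current.append(payload)
    if current:
        # Trailing, partly filled interval at end-of-file.
        batches.append(current)
    return batches
```

In the real meter each closed interval would be followed by a wait for NeMaC to read the flow data; here the batches are simply returned as a list.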
New NeMaC Command Line Options
One new command-line option has been added to NeMaC:
- -C
tells NeMaC that this meter is reading a trace file. NeMaC will collect flow data at the interval specified by the meter's -T option.
Experiences with Trace Files
The meter was tested on MOAT trace files (ftp://moat.nlanr.net/pub/MOAT/Traces). Two of the various sites available were particularly useful:
- AIX
Ames Internet eXchange interconnection to MAE-West. The traces have two interfaces, so one can see packet headers for both directions on the link, useful for testing packet-pair matching. However, this interconnection uses a group of separate physical links in a load-balancing configuration, and the trace only contains data from one of those links. This means that only about one-fifth of the data from the overall link is available. The trace data contains lots of partial TCP streams (streams having sudden jumps in sequence number) and many streams which have no SYNs or maybe no FINs. This provides a real test of NeTraMet's ability to time out such 'damaged' streams and recover the memory they previously held.
- SDC
This is the commodity Internet connection to SDSC. Its traces are complete - containing no damaged streams - and tend to have lots of long-running TCP streams.
Packet-pair matching was implemented differently in this version of NeTraMet. In NeTraMet44b5, packet-pair matching for TCP streams used multiple data queues, while matching for UDP and ICMP pairs used a single queue. Having been rewritten, the meter now uses streams with packet data queues for all IP flows. This turned out to be a big change to the meter, requiring extensive restructuring and rewriting of the stream-handling code. The result is a better structure which is much simpler to understand and maintain.
NeTraMet for Live CoralReef Sources
After getting NeTraMet running well with trace files, it was time to test it on a live source. For the last few years I have been part of the WAND group at Waikato University (http://wand.cs.waikato.ac.nz), which has developed high-speed network measurement interface cards - the DAG cards. Consequently, I was determined to get NeTraMet running using DAG cards.
My first problem was finding a PC with a pair of DAG cards for use on the SDSC commodity Internet connection - an OC3 ATM link. Hans-Werner Braun of NLANR generously lent me a PC with two DAG 3.2 cards. By coincidence, Joerg Micheel was visiting from Waikato at the time I was setting up the DAG/CoralReef meter, and was available from time to time to help me with DAG-related issues.
CoralReef supports three kinds of ATM cards (Fore, Point, and DAG) as well as its various trace file formats. I added a little more code to meter_ux.c so that the meter could realize that its CoralReef source was a live interface rather than a trace file. In the case of a live interface, it can run its normal outer block.
Unfortunately, for this particular configuration, the CoralReef packet-reading routines that reassemble packets from ATM cells would not work. This has been addressed, and will be fixed in the next release of CoralReef. (Ken Keys went to great lengths to support me throughout this NeTraMet/CoralReef development project.)
In the meantime, I reworked meter_ux.c to read cells from a live ATM interface, which was sufficient for me to start collecting data from the SDSC link.
Meter algorithms for Streams and Packet Data
The Meter MIB provides for the meter to recover memory for flows which are no longer active. A flow's memory can be recovered provided that its data has been read by every meter reader which has an interest in the flow, and that no new packets have been seen for a time interval. That interval is controlled by the MIB's InactivityTimeout variable, which has a default value of 10 minutes. This algorithm works well in practice. The user simply has to make sure that when the meter starts, enough memory is allocated for the flows which appear over an interval of InactivityTimeout plus twice the meter reading interval.
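The recovery condition and the memory-sizing rule can be written down compactly. The sketch below is illustrative only: the `Flow` class, its field names, and both helper functions are hypothetical. Only the 10-minute default comes from the text.

```python
from dataclasses import dataclass, field

INACTIVITY_TIMEOUT = 600.0  # seconds; the 10-minute default mentioned above


@dataclass
class Flow:
    """Hypothetical per-flow bookkeeping needed for the recovery test."""
    last_packet_time: float
    interested_readers: set = field(default_factory=set)
    readers_done: set = field(default_factory=set)


def is_recoverable(flow, now, timeout=INACTIVITY_TIMEOUT):
    """True when every interested reader has read the flow and it is idle."""
    all_read = flow.interested_readers <= flow.readers_done
    idle = (now - flow.last_packet_time) >= timeout
    return all_read and idle


def flows_to_allocate(new_flows_per_sec, reading_interval,
                      timeout=INACTIVITY_TIMEOUT):
    """Flows appearing over InactivityTimeout plus twice the reading interval."""
    return int(new_flows_per_sec * (timeout + 2 * reading_interval))
```

As a purely hypothetical example, a link producing 50 new flows per second, read every 300 seconds, would need room for 50 x (600 + 2 x 300) = 60000 flows.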
The New Attributes, however, required a lot more implementation effort. It is easy to write a ruleset that requests distributions of information about the streams making up a flow. This can result in a ruleset that creates only a few very long-lived flows, each of which must maintain information about hundreds of streams. For example, we can make flows for UDP DNS requests to the root name servers, but the meter may see hundreds of those requests from many different hosts. For each stream, the meter can maintain a queue of packet data, and use it to match pairs of packets. Once packets have been matched, they are removed from the queue. However, we must cope with requests having no responses, otherwise the flow's list of streams will grow without limit.
An approach was used where a limit was set on the queue lengths (90 packets for a TCP stream, 60 for UDP and ICMP). The queue length is checked whenever the meter wants to enqueue a new packet. If the queue is already full, the oldest entry is deleted. This keeps the total number of packets in stream queues at a manageable level.
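A bounded queue of this kind might look like the sketch below. The limits are those quoted above. The `PacketQueue` class, its `key` argument (standing in for whatever the meter uses to pair a request with its response), and the method names are assumptions made for illustration.

```python
from collections import deque

# Queue limits quoted in the text.
TCP_QUEUE_LIMIT = 90
OTHER_QUEUE_LIMIT = 60  # UDP and ICMP


class PacketQueue:
    """Per-stream queue of unmatched packets with a fixed maximum length."""

    def __init__(self, limit):
        self.limit = limit
        self._q = deque()

    def enqueue(self, key, timestamp):
        # Check the length before enqueueing; if full, drop the oldest entry.
        if len(self._q) >= self.limit:
            self._q.popleft()
        self._q.append((key, timestamp))

    def match(self, key):
        """Remove the oldest entry with this key; return its timestamp or None."""
        index = None
        for i, (k, _) in enumerate(self._q):
            if k == key:
                index = i
                break
        if index is None:
            return None
        _, timestamp = self._q[index]
        del self._q[index]
        return timestamp

    def __len__(self):
        return len(self._q)
```

Requests that never receive a response simply age out of the front of the queue, so each stream's queue stays within its limit.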
Each flow maintains a queue of streams. A stream is created when the meter sees its first packet, and its information is updated on each successive packet. Eventually the stream terminates; it is then dequeued and its memory is recovered. For TCP streams this is fairly straightforward, since the meter sees their FIN or RST packets and can then time out the stream. This approach is not sufficient by itself, however: if the meter doesn't see the FIN, a TCP stream can be left hanging indefinitely. UDP and ICMP streams must also be timed out, since they have no explicit state signaling.
To cope with this, a dynamic timeout scheme was implemented for streams. This strategy uses a result reported in a paper by Ryu, Cheney & Braun, which found that a stream's packet rate is roughly constant over its lifetime. The algorithm is therefore as follows:
- Look at streams inactive for OTHER_STREAM_TIMEOUT centiseconds (500 centiseconds = 5 seconds).
- For these selected streams, compute the average packet interval: (LastTime-FirstTime) / (ToPDUs + FromPDUs).
- Multiply this average packet interval by STO_MULTIPLIER (default = 20) to estimate timeout interval.
- If (inactive time > estimated timeout interval) then recover the stream.
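The four steps above can be sketched as a single predicate. The constants use the values given in the text; the function name, its argument names, and the handling of a stream with no counted packets are assumptions. All times are in centiseconds.

```python
OTHER_STREAM_TIMEOUT = 500   # centiseconds (5 seconds), from the text
STO_MULTIPLIER = 20          # default multiplier, from the text


def should_recover(first_time, last_time, to_pdus, from_pdus, now):
    """Decide whether an idle stream should be timed out and recovered."""
    inactive = now - last_time
    # Step 1: only consider streams idle for at least OTHER_STREAM_TIMEOUT.
    if inactive < OTHER_STREAM_TIMEOUT:
        return False
    packets = to_pdus + from_pdus
    if packets == 0:
        # Assumption: nothing to average over, so recover the stream.
        return True
    # Step 2: average packet interval over the stream's lifetime.
    avg_interval = (last_time - first_time) / packets
    # Step 3: estimated timeout interval.
    estimated_timeout = avg_interval * STO_MULTIPLIER
    # Step 4: recover once the idle time exceeds the estimate.
    return inactive > estimated_timeout
```

For example, a stream that carried 1000 packets in 10 seconds has an average interval of 1 centisecond and an estimated timeout of 20 centiseconds, so it is recovered as soon as it has been idle for 5 seconds. A stream that carried 10 packets in 60 seconds has an estimated timeout of 120 seconds and is kept until it has been idle for longer than that.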
This algorithm allows streams operating at a steady low rate to last a long time, while quickly timing out busy streams which have become idle.
These three algorithms (for recovering flows, packet data, and streams) together allow NeTraMet to operate effectively even under heavy packet loads. The meter was run for most of June, 2000. Over those weeks it proved highly reliable.