Coral Reef Application Documentation


CoralReef programs starting with 'crl_' operate on coral sources as input, and take options in the form of -C 'coral_command'. For more information, see the CoralReef command usage document. Most commands take a -? option (which you may need to type as -\?, depending on your shell) to print a usage message, including the subset of standard CoralReef options specific to that command. In case of disagreement between the application's usage message and this document, the application is more likely correct.


Utilities

crl_trace

Captures raw ATM cell traces from any CoralReef device, or POS traces from a DAG device, and dumps them to a file in CoralReef format. In addition to standard CoralReef options, crl_trace accepts:
-o outfile
output to outfile (default: "%s.crl"). We recommend using a ".crl" suffix for these files. Outfile may contain % timestamp conversion specifiers as described in the command usage documentation of rotating files; the timestamp used is the current real time. If outfile ends in ".gz", and CoralReef was compiled with libz, the output file will be gzipped.
-rN
rotate the output file after every N blocks (default: do not rotate).
With one or more trace files as input, crl_trace can also be used to join multiple traces into one and to convert old trace formats to the current CoralReef format. Unlike most CoralReef applications, crl_trace does not try to correct for the FATM timestamp bug, but records raw data; other applications that read the trace later can correct for the bug.

If protocol rules configuration commands are given, they will be recorded in the file and used by other applications that read the file. However, "deny" commands do not actually prevent recording of the specified subinterface.

If no -Ccomment option is given, crl_trace writes the hostname and command line into the comment field of the tracefile.

To write traces in other formats, see crl_to_pcap and crl_to_dag.

crl_info

Prints information about the selected file to stderr. Reports the type of the hardware, the iomode it was in, the interface it is on, the hardware revision, bandwidth of the link, etc. In addition to standard CoralReef options, it accepts:
-i
print info for all standard interfaces (not supported on all platforms)
-v
print CoralReef package and file format versions
-d
print counts and first/last timestamps of records (blocks, cells, or packets, depending on file type)
-s
short form: omit format information
-p
like -d, but for packets, regardless of file type
-b
like -d, plus per-block counts (if applicable)
-c
like -d, but on cells instead of blocks (if applicable)
-r
use raw (uncorrected) time

Sample output.

See also crl_guess, crl_stats.

crl_print

Prints ATM cells in hex and ascii. ATM headers are expanded to a more readable form, unless the "-r" (raw) option is specified.

Sample (sanitized) output.

crl_print_pkt

Prints information about full (reassembled) link or sub-network layer packets: interface, time, protocol-specific information, length. Recursively prints human-readable information from headers of each protocol coral is able to parse, including IPv4, IPv6, and common layer 4 protocols. Prints unparsable protocols in hex and ascii. In addition to standard CoralReef options, it accepts:
-m n
print only protocol layers >= n (default 1).
-l n
print only protocol layers <= n (default 7).
-H
print IPv6 extension headers on IPv6 line
-h
print IPv6 extension headers on separate lines (default)
-P
Do not print partial packets.
-p
Print partial packets (default)
-s
Print in short format, one line per packet.
-c
Print and verify IP checksum
-x
with -s, print timestamps in native hex format
-e
with -s, print extra fields: IP ToS, ID, TTL; TCP seq, ack, flags
The source code crl_print_pkt.c is a good example of using the packet API.

Note that when reading from multiple interfaces, crl_print_pkt by default does not sort packets by time; to make it do so, you must use the -Csort option.

With the -e option, the "flags" column will contain six characters, with "-" indicating a TCP flag is off and a letter indicating it is on. The letters are "FSRPEC", for FIN, SYN, RST, PUSH, ECE, and CWR, respectively. ACK is indicated by an acknowledgment number in the "tcp.ack" column; no ACK by "-".
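The mapping from TCP header bits to the "flags" and "tcp.ack" columns can be sketched as follows. This is a minimal Python illustration, not crl_print_pkt's source: it assumes the standard TCP flag bit values from RFC 793 and RFC 3168, and the helper names are hypothetical. URG has no letter in the documented column, so the sketch does not display it.

```python
# Standard TCP flag bit values (RFC 793, RFC 3168).
FIN, SYN, RST, PSH, ACK, URG, ECE, CWR = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80

# Documented column order for crl_print_pkt -e: F S R P E C (no letter for URG or ACK).
FLAG_LETTERS = [(FIN, "F"), (SYN, "S"), (RST, "R"), (PSH, "P"), (ECE, "E"), (CWR, "C")]


def flags_column(tcp_flags):
    """Hypothetical helper: six characters, a letter if the flag is on, '-' if off."""
    return "".join(letter if tcp_flags & bit else "-" for bit, letter in FLAG_LETTERS)


def ack_column(tcp_flags, ack_number):
    """Hypothetical helper: the acknowledgment number if ACK is set, otherwise '-'."""
    return str(ack_number) if tcp_flags & ACK else "-"


print(flags_column(SYN | ACK), ack_column(SYN | ACK, 1001))  # -S---- 1001
print(flags_column(FIN | PSH | ACK))                          # F--P--
```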

Sample (sanitized) output.

crl_time

Outputs to stdout one line per cell in the trace. Each line after the header contains the following fields:
  • if - interface number
  • cell - cell number
  • high - high order bits of card timestamp (in hex)
  • low - low order bits of card timestamp (in hex)
  • seconds - timestamp, converted to seconds
  • difference - difference (in seconds) between this cell and the previous cell on the same interface
  • comment - description of any unusual conditions detected
    • high stamp inc (1) (when the high word of the timestamp increments)
    • XXX wrap error (on Fore cards, when hardware and firmware clocks have wrapping errors.)
    • XXX warning: large diff
    • XXX error: diff too small
    • XXX error: diff zero
    • XXX major error: negative diff
Useful for debugging hardware clock problems, allowing one to look at inter-arrival times and other statistics. In addition to standard CoralReef options, it accepts:
-i interface
Read data only from interface
-p
use packet timestamps (default is to use ATM cell timestamps if all sources are ATM, packet timestamps otherwise)
-r
Raw timestamps: don't try to correct card clock resets or Fore ATM clock wraps
-s
Sorts cells by time, interleaving interfaces (same as -Csort).

Note that when reading packets from multiple interfaces, crl_time by default does not sort packets by time; to make it do so, you must use the -Csort or -s option.
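The per-interface "difference" column, together with the two diff errors whose meaning is unambiguous, can be sketched as below. This Python version is an illustration under stated assumptions, not crl_time's code: timestamps are assumed to be already converted to seconds, and the thresholds behind "large diff" and "diff too small" are not documented here, so those checks are omitted.

```python
def annotate(cells):
    """cells: (interface, seconds) pairs in read order.
    Yields (interface, seconds, difference, comment); difference is None for the
    first cell seen on each interface."""
    previous = {}  # interface -> timestamp of the previous cell on that interface
    for iface, seconds in cells:
        prev = previous.get(iface)
        diff = None if prev is None else seconds - prev
        comment = ""
        if diff is not None and diff < 0:
            comment = "major error: negative diff"
        elif diff == 0:
            comment = "error: diff zero"
        previous[iface] = seconds
        yield iface, seconds, diff, comment


for row in annotate([(0, 1.0), (1, 1.25), (0, 1.5), (0, 1.5), (0, 1.25)]):
    print(*row)
```

Because the previous timestamp is tracked per interface, interleaved input from several interfaces does not by itself produce negative differences.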

Sample output.

crl_cut

Copies a section of an ATM cell trace file to another file. Useful for isolating a portion of a tracefile for later analysis, without requiring the storage of the entire original trace. In addition to standard CoralReef options, it accepts:
-ooutfile
Specifies the file to write the output to.
-iinterface
Read data only from interface
-nnum
Skip cells before the numth (default: 0)
-Nnum
Stop reading after the numth cell
-ttime
Read only cells with timestamp >= time (in seconds)
-Ttime
Read only cells with timestamp <= time (in seconds)

To cut a trace file by packet count or time and output to a pcap file, see crl_to_pcap.

crl_guess

Usage: crl_guess [-d] filename

If filename is in any trace file format that CoralReef can read, crl_guess will attempt to identify the file format (including capture length), number of interfaces, the subinterfaces contained within the file, and, if the source was ATM, the link layer encapsulation protocol used on each subinterface. crl_guess ignores any protocol options recorded in the file, and attempts to guess the protocols by directly analyzing the packet headers. If the file contains records from multiple interfaces, crl_guess assumes they are all the same type of interface with the same set of subinterfaces and encapsulations.

The results of crl_guess are printed in a form suitable for use in a CoralReef configuration file, including a "source" command with prefix and iomode options, and, if the interface was ATM, a "proto" or "deny" command for each subinterface. Note that on ATM interfaces, virtual channels 0:0 through 0:15 are used for signalling, and crl_guess will always report them as UNKNOWN if they are present.

crl_guess can take a minute or more if some subinterfaces are truly unidentifiable, but is usually much faster when all subinterfaces are parsable or the file is total garbage.

Unlike most "crl_*" programs, crl_guess does not take -C options, and never requires a filename prefix or iomode to identify the file format; crl_guess is in fact useful for figuring out this information if you have forgotten it. Options:

-d
Print debug information.

Sample output.

See also crl_info.

crl_stats

Counts a variety of statistics on any source (typically trace files). Statistics include counts of source and destination IPs, ports, packets, bytes, and flows. In addition it gives the first and last timestamps recorded, and the first value (when possible).

If an error is encountered in the input, crl_stats prints the statistics for the data up to that point, prints a warning, and exits with a nonzero status.

In addition to standard CoralReef options, it accepts:

-h
print in more human-friendly format
-f
don't count flows (uses less memory)

Sample output.

See also crl_info, crl_guess.

crl_ips

Efficiently extracts IPv4 addresses from any coral source (typically trace files) and prints them on standard output.

See also crl_print_pkt, crl_time.


Converters/filters

crl_to_pcap (formerly crl_filter)

Roughly, a CoralReef equivalent of tcpdump. Given any live or file Coral source (not necessarily crl format), this application applies a tcpdump filter expression to all packets, and writes the matching packets to the output file in pcap format. crl_to_pcap is useful for converting any packet trace to pcap format, splitting a trace into multiple pcap traces by time (with -Cinterval and -o), or creating a pcap trace that is filtered (with -Cfilter), anonymized (with -Canonymize), with payloads stripped (with -l), etc.

Note that pcap timestamps are less precise than DAG timestamps, so if you want to anonymize or filter a DAG trace, you should probably use crl_to_dag.

If crl_to_pcap is not able to identify a link layer for the trace that it can translate to a pcap equivalent, it will attempt to find an IP layer in each packet and write only that to the output pcap file. This is the case for ATM cell traces, which do not have a default link layer protocol for the entire trace, only for individual virtual channels; however, if you specify a protocol for the entire trace (e.g. -Ciomode=proto=ATM_RFC1483), crl_to_pcap will be able to preserve the link layer. If verbosity is >= 2, timestamps of discarded packets will be printed.

In addition to standard CoralReef options, it accepts:

-o outfile
Dump matching packets to outfile in pcap binary format, suitable for being read as a "pcap:" coral source or by "tcpdump -r". We recommend using a ".pcap" suffix for such files. If a nonzero interval is given (with -Cinterval) and outfile contains '%', the output file will rotate every interval. If outfile ends in ".gz", and CoralReef was compiled with libz, the output file will be gzipped. (default: stdout)
-inum
Read only from interface num (default: read all interfaces of all sources)
-f expression
filtering expression, same as for tcpdump (this does the same thing as the -C'filter=expression' configuration option).
-r
Strip the link layer protocol and write only raw IP packets. This can be used to combine sources with different link layers into a single pcap file, or just to discard unneeded information. Packets that do not contain IP are discarded in their entirety.
-nnum
Skip packets before the numth (default: 0) (like -Cskippackets)
-Nnum
Stop reading after the numth packet (like -Cpackets)
-ttime
Read only packets with timestamp >= time (in seconds)
-Ttime
Read only packets with timestamp <= time (in seconds)
-ln
copy only up to protocol layer n and discard unknown protocols
-k
with -l, keep protocols unknown to CoralReef
-pn
with -l, keep n bytes of payload past the last requested header

crl_to_dag

Roughly, a CoralReef equivalent of dagsnap, dagconvert, and dagsplit. Given any live or file Coral source (not necessarily crl format), this application writes packets to the output file in DAG ERF format. If the input source is a DAG source, full precision of timestamps is preserved (but packet loss counters are not). crl_to_dag is useful for converting any packet trace to DAG format, splitting a trace into multiple DAG traces by time (with -Cinterval and -o), or creating a DAG trace that is filtered (with -Cfilter), anonymized (with -Canonymize), with payloads stripped (with -l), etc.

In addition to standard CoralReef options, it accepts:

-o outfile
Write packets to outfile in DAG ERF format. We recommend using a ".dag" suffix for such files. If a nonzero interval is given (with -Cinterval) and outfile contains '%', the output file will rotate every interval. If outfile ends in ".gz", and CoralReef was compiled with libz, the output file will be gzipped. (default: stdout)
-ln
copy only up to protocol layer n and discard unknown protocols
-pn
with -l, keep n bytes of payload past the last requested header
-k
with -l, keep protocols unknown to CoralReef
The next three options control the output, and should only be used if you want it to be different from the input (e.g. when converting a file):
-V
write variable length (varlen) records (but use -Cm=varlen to set this on input)
-F
write fixed length (novarlen) records (but use -Cm=!varlen to set this on input)
-s slen
write at most slen bytes of packet; 0 means unlimited. [default: 0] (but use -Cm=first=N to set this on input)

When reading from a live DAG card source, the default values of the -V, -F, and -s options are based on the source's iomode (as set by -Ciomode).

crl_tofr+

This application reads one or more sources (in any format readable by CoralReef) and writes the packets to stdout in 'fr+' format. For more information on the fr+ format, see: http://moat.nlanr.net/Coral-tools/. However, the information there on Coral file format is VERY out of date and should be ignored.

crl_tofr

This application reads one or more sources (in any format readable by CoralReef) and writes the packets to stdout in 'fr' format. For more information on the fr format, see: http://moat.nlanr.net/Coral-tools/. However, the information there on Coral file format is VERY out of date and should be ignored.

crl_totsh

This application reads one or more sources (in any format readable by CoralReef) and writes the packets to stdout in 'tsh' format. For more information on the tsh format, see the tsh section of the command usage document.

crl_encode

Anonymizes the IP addresses and strips layer 3 payloads in a crl-format ATM cell trace. Like crl_trace, it does not correct timestamp errors, but leaves timestamps untouched. In addition to standard CoralReef options, it accepts:
-o outfile
Specifies the file to write the output to.

To anonymize or strip other trace formats, or to anonymize or strip a crl trace while simultaneously converting to another format, use the -Canonymize or -l options of one of the crl_to_format applications.


Dynamic reports

crl_rate (formerly crl_vpvc)

Collects IP-level statistics per interface/subinterface: IP lengths, IP packet counts, non-IP packet counts, and IPv6 packet counts. Outputs summaries every interval seconds; interval defaults to 60 seconds. Useful for checking whether a link is being utilized to its maximum allocated bandwidth. In addition to standard CoralReef options, it accepts:
-s
Outputs information in 'short mode', with only a single line for each interval.
-4
Show counts and rates for IPv4 only.
-6
Show counts and rates for IPv6 only.
-D[SITsit]
controls display options as follows:
S
displays subinterface-level information.
I
displays interface-level totals.
T
displays totals for all interfaces.
s, i, and t are the same as S, I, and T, respectively, except that redundant information is suppressed (e.g., if there's only one interface, only that total will be displayed).
The default is equivalent to -Dsit, for IPv4 and IPv6.

Sample output.

crl_rate_layer2 (formerly crl_vpvc_layer2)

Collects ATM-level statistics: cell counts and bandwidth per VPI/VCI pair. Requires iomode = user. Outputs summaries every interval seconds; interval defaults to 60 seconds. Useful for making sure a monitor has been installed correctly: the output can be compared with SNMP and with the expected bandwidth for each VPI/VCI pair. (It also confirms that you're connected to the VPI/VCI pairs you thought you were!)
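To compare the output with SNMP counters or a provisioned rate, cell counts have to be turned into bits per second. A minimal Python sketch of that arithmetic follows; whether crl_rate_layer2 reports full 53-byte cells or only 48-byte payloads is not settled by this document, so the sketch computes both, and the function name is hypothetical.

```python
ATM_CELL_BYTES = 53     # 5-byte header + 48-byte payload
ATM_PAYLOAD_BYTES = 48


def cell_rates(cell_count, interval_s):
    """Return (cells/s, line bits/s, payload bits/s) for one VPI/VCI over one interval."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    cps = cell_count / interval_s
    return cps, cps * ATM_CELL_BYTES * 8, cps * ATM_PAYLOAD_BYTES * 8


# One 60-second interval (the documented default) with 3,000,000 cells on a VPI/VCI.
cps, line_bps, payload_bps = cell_rates(3_000_000, 60)
print(f"{cps:.0f} cells/s, {line_bps / 1e6:.1f} Mb/s on the wire, {payload_bps / 1e6:.1f} Mb/s payload")
# 50000 cells/s, 21.2 Mb/s on the wire, 19.2 Mb/s payload
```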

Sample output.


Static reports

crl_bycountry

Usage: crl_bycountry [-Ccoral_command]... source routing_table

Outputs the amount of traffic flowing to and from networks, and between networks, ASes, and countries. Unlike crl_netnet, it uses routing table information to determine the networks. Useful for finding inter-country or inter-AS traffic patterns. It requires the uncensored data files to generate sensible results. Source is any CoralReef source. Routing_table is the name of a file containing a parsed routing table (the output of parse_bgp_dump). Calls crl_ipmatrix internally, and uses the NetGeoClient library and server.

Sample output.

crl_hist

Outputs a report on packet and byte counts by IP length and protocol, port summary matrices for TCP and UDP, fragment counts by protocol, packet length histograms for the entire trace and for a list of applications, and the top 10 source and destination port numbers seen for TCP and UDP traffic. Useful for general reports on length, port, and protocol usage.

Sample output. (Note: port 0 means all ports.)

crl_hist_helper

Used as a subprocess of crl_hist. The interface between crl_hist and its helper is a private matter, and is subject to change without notice.

crl_tos

For each interface, displays the TOS of IP packets seen, and the percentage of packets that fall under each value. Useful for determining what TOS values are being used on a network, especially since the diffserv group has proposed using 'unused' bits from TOS.

Sample output.

crl_llcsnap

Prints all of the LLC/SNAP values in the trace, separated by interface and VPI/VCI pair. The number of occurrences is shown first. Note: assumes that the input is ATM with ATM_RFC1483 (LLC/SNAP) encapsulation, and that exactly one cell per packet was captured (i.e., iomode=first=48). Useful for determining when a link, known to be carrying LLC/SNAP encapsulated traffic, has corruption, either in the card or elsewhere on the network.
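In RFC 1483 LLC/SNAP encapsulation, the first 8 bytes of the first cell's payload are the LLC header (AA-AA-03), a 3-byte OUI, and a 2-byte PID (AA-AA-03-00-00-00-08-00 for routed IPv4). The tally described above can be sketched in Python as follows; the input tuple layout and helper names are hypothetical, and the validity check only inspects the LLC bytes.

```python
from collections import Counter

LLC_SNAP = b"\xaa\xaa\x03"  # LLC header indicating that a SNAP header follows


def snap_value(payload):
    """First 8 bytes of a 48-byte cell payload: LLC (3) + OUI (3) + PID (2)."""
    if len(payload) < 8:
        raise ValueError("payload too short for an LLC/SNAP header")
    return payload[:8]


def looks_valid(value):
    """True when the value starts with the expected LLC bytes AA-AA-03."""
    return value[:3] == LLC_SNAP


def tally(cells):
    """cells: (interface, vpi, vci, payload). Counts each distinct value per channel."""
    counts = Counter()
    for iface, vpi, vci, payload in cells:
        counts[(iface, vpi, vci, snap_value(payload).hex("-"))] += 1
    return counts


ipv4 = bytes.fromhex("aaaa030000000800") + bytes(40)
noisy = bytes.fromhex("aaab030000000800") + bytes(40)
for (iface, vpi, vci, value), n in tally([(0, 0, 32, ipv4)] * 3 + [(0, 0, 32, noisy)]).most_common():
    print(n, iface, f"{vpi}:{vci}", value)  # occurrence count first
```

A rare value next to a dominant one on the same VPI/VCI, like the AA-AB-03 entry above, is the kind of pattern that suggests corruption.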

Sample output.


Special-purpose applications

crl_portmap

Captures all packets from all hosts that connect to any other host's portmap port (111). It begins by capturing SYN packets sent to the portmap port on any host. Each time it sees such a packet, it also starts capturing all packets sent to or from the host that sent the portmap SYN packet. The idea behind this is that opening a connection to a portmap port could be the first step of an attack, so any host that does so should be carefully monitored.
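The capture rule can be sketched as a small piece of state. This Python version is illustrative only: the packet fields passed in are hypothetical, it assumes the packets are TCP, and it never stops watching a host, since the description above says nothing about when monitoring ends.

```python
PORTMAP_PORT = 111
TCP_SYN = 0x02


class PortmapWatcher:
    """Hypothetical model of the crl_portmap capture decision."""

    def __init__(self):
        self.watched = set()  # hosts that have sent a SYN to some host's portmap port

    def keep(self, src, dst, dst_port, tcp_flags):
        """Return True if this (TCP) packet should be captured."""
        if dst_port == PORTMAP_PORT and tcp_flags & TCP_SYN:
            self.watched.add(src)  # from now on, capture everything to/from this host
            return True
        return src in self.watched or dst in self.watched


w = PortmapWatcher()
print(w.keep("10.0.0.5", "10.0.0.9", 80, 0x10))   # False: nobody watched yet
print(w.keep("10.0.0.5", "10.0.0.9", 111, 0x02))  # True: SYN to portmap
print(w.keep("10.0.0.7", "10.0.0.5", 22, 0x10))   # True: traffic to a watched host
```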

In addition to standard CoralReef options, it accepts:

-o outfile
Dump to file outfile (without this option, there is no output. Use '-o -' to print to stdout)
-P
Do not allow partial packets

crl_dnsstat (members only)

The crl_dnsstat application watches for DNS queries on UDP port 53 and counts numbers of messages and numbers of queries, aggregated by any of source IP, destination IP, opcode, query type, query class. The subjects of queries are never recorded. The example output below shows the finest aggregation (most detail) it is capable of recording; command line options can be used to reduce the detail. The "notes" column displays any unusual statistics: the number of messages that contained multiple queries or zero queries, and the number of messages for which the number of queries was impossible to determine. In order to get complete query type information, the source must include full packet payloads.

In addition to standard CoralReef options, it accepts:

-plen
aggregate hosts by CIDR prefix length len (default: 32)
-a
resolve IP addresses to hostnames (requires -p32)
-n
print DNS code numbers, not symbols
-S
ignore IP source address
-D
ignore IP destination address
-Q
don't count by query opcode/class/type
-h
print in more human-friendly format
-r
do not count messages with RD set
-s
print information on hash table usage
-u
print contents of unusual messages to stderr
-ooutfile
write output to outfile (default: stdout). Outfile may contain % timestamp conversion specifiers as described in the command usage documentation of rotating files; the timestamp used is that of the beginning of the trace.
-Ooutfile
Like -o, except the file is rotated every interval; the timestamp used is that of the beginning of the interval.
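Aggregating hosts by CIDR prefix length, as -p does, amounts to masking each address down to its covering network. A minimal Python sketch using the standard ipaddress module follows (the helper name is hypothetical); it also suggests why -a needs -p32: only a /32 key still names a single host that can be resolved to a hostname.

```python
import ipaddress
from collections import Counter


def prefix_key(addr, prefix_len=32):
    """Covering IPv4 network of addr at the given prefix length, e.g. 192.0.2.0/24."""
    if not 0 <= prefix_len <= 32:
        raise ValueError("prefix length must be between 0 and 32")
    return str(ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False))


sources = ["192.0.2.1", "192.0.2.200", "198.51.100.3"]
print(Counter(prefix_key(a, 24) for a in sources))
# Counter({'192.0.2.0/24': 2, '198.51.100.0/24': 1})
```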

Sample output of crl_dnsstat -D, with source IP addresses changed for privacy.


Traffic flow applications

crl_flow

Creates summaries of IPv4 and IPv6 traffic flow for post-processing by t2_report++ and other scripts. After each interval (defined with the -Ci=n option), crl_flow outputs tables of flows which expired during the interval and tables of flows which are still active, aggregated by source and destination IP addresses, protocol, and source and destination ports. The tables are broken down by subinterface (e.g., ATM VPI/VCI).

Note that crl_flow does not reassemble IP fragments; since only the first fragment contains port information, other fragments are not assigned to the correct flow.

In addition to standard CoralReef options, crl_flow accepts:

-4
Count IPv4 flows only.
-6
Count IPv6 flows only.
-p length
Specify IPv4 prefix masklength; can be between 8 and 32. Defaults to 32.
-A
Print active flows in addition to expired flows every interval (otherwise, still-active flows will be printed at end of run). See also "-ci".
-a
Display the cname of the addresses. (Must use masklength of 32 to make sense.)
-s
Print hashtable statistics (for development).
-b
Dumps tables in binary format, suitable for efficient input to any t2_* application
-B
Dumps tables in a more compact experimental format, which currently cannot be read by any application
-h
Dumps tables in a format that's easier for humans to read.
-o outfile
Specifies the (non-rotating) output file to write to (default: stdout). Outfile may contain % specifiers as described in the command usage documentation of rotating files; the timestamp used is that of the beginning of the trace. We recommend a ".t2" suffix for crl_flow output files.
-r
Rotates the output file every interval. The timestamp used in % specifiers in the filename is that of the beginning of the interval. If the filename does not have % specifiers sufficient to make it unique every interval, the file will be overwritten. When -r is used, the default filename is "%010s.%f.t2".
-O outfile
Equivalent to "-r -o outfile".
-I
Expire all flows at the end of each interval (like crl_traffic2 in earlier versions of CoralReef).
-Talgorithm
Use algorithm to expire flows:
fN
expire flows after a fixed period of N seconds of inactivity (like NetFlow).
mM,I
expire flows after an inactive period of M times the largest inter-packet gap, with the gap initially set to I seconds (like NeTraMet).
N
no expiry
-ci
with -A -I, make byte and packet counters cover only the interval
-cl
with -A -I, make byte and packet counters cover the flow's entire lifetime (default)
-z
with -A -T, include flows with zero packets
-m
merge streams of multiple interfaces (and subinterfaces) into one (implies -Csort). Note that matching flows on separate (sub)interfaces will be counted as 1 flow. See also t2_merge.

If neither -I nor -T is given, -I is assumed. If -Ci is not given, it defaults to 300s.
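The two -T expiry rules can be sketched as a per-flow timer. This Python version is a reading of the descriptions above, not crl_flow's implementation: it assumes the "largest inter-packet gap" starts at I and only grows as larger gaps are observed, and that a flow expires once its idle time strictly exceeds the limit. The class and method names are hypothetical.

```python
class FlowTimer:
    """Hypothetical per-flow timer for the "fN", "mM,I" and "N" algorithms."""

    def __init__(self, algorithm, first_ts):
        self.last_ts = first_ts
        if algorithm == "N":
            self.mode = "none"
        elif algorithm.startswith("f"):
            self.mode, self.idle_limit = "fixed", float(algorithm[1:])
        elif algorithm.startswith("m"):
            mult, initial_gap = algorithm[1:].split(",")
            self.mode, self.mult, self.max_gap = "gap", float(mult), float(initial_gap)
        else:
            raise ValueError(f"unknown expiry algorithm: {algorithm!r}")

    def packet(self, ts):
        """Record a packet at time ts (seconds), tracking the largest gap for "m"."""
        if self.mode == "gap":
            self.max_gap = max(self.max_gap, ts - self.last_ts)
        self.last_ts = ts

    def expired(self, now):
        idle = now - self.last_ts
        if self.mode == "fixed":
            return idle > self.idle_limit
        if self.mode == "gap":
            return idle > self.mult * self.max_gap
        return False  # "N": never expire


t = FlowTimer("f64", 0.0)
t.packet(10.0)
print(t.expired(74.0), t.expired(74.5))  # False True
```

The fixed rule gives every flow the same idle budget, while the gap-based rule lets slow flows (with large natural gaps) stay alive longer.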

If interrupted, crl_flow will stop reading, and will report the last partial interval (prior to version 3.4, crl_traffic2 would lose the data for the last interval).

Sample output.

crl_traffic2

Equivalent to crl_flow -I.

crl_flowest

Roughly similar to crl_flow, but instead of outputting full tables of tuples, it aggregates into smaller tables and calculates flow counts via estimators instead of keeping track of each flow. In this way, it deals much more robustly with large memory demands, at the cost of some accuracy. It can be compiled with several different methods of estimation, with the default using the most accurate method. It also can use the PSH (packet sample and hold) or FSH (flow sample and hold) algorithms to attempt to identify packet/flow hogs. Compared to crl_flow, it doesn't store the first and latest timestamp values, and uses different values for ports_ok, representing different levels of validity. It also has an optional extra 'other' row in its output tables, when sampling is used to further reduce table size.

Note that crl_flowest does not reassemble IP fragments; since only the first fragment contains port information, other fragments are not assigned to the correct flow.

For complete information regarding flow estimation, read the full paper: A Robust System for Accurate Real-time Summaries of Internet Traffic.

In addition to standard CoralReef options, crl_flowest accepts:

-p length
Specify prefix masklength; can be between 8 and 32. Defaults to 32.
-a
Display the cname of the addresses. (Must use masklength of 32 to make sense.)
-s
Print hashtable statistics (for development).
-h
Dumps tables in a format that's easier for humans to read.
-o outfile
Specifies the (non-rotating) output file to write to (default: stdout). Outfile may contain % specifiers as described in the command usage documentation of rotating files; the timestamp used is that of the beginning of the trace. We recommend a ".t2" suffix for crl_flowest output files.
-r
Rotates the output file every interval. The timestamp used in % specifiers in the filename is that of the beginning of the interval. If the filename does not have % specifiers sufficient to make it unique every interval, the file will be overwritten. When -r is used, the default filename is "%010s.%f.t2".
-O outfile
Equivalent to "-r -o outfile".
-e
"Folds" ephemeral ports by setting one to 0 if it is outside the well known range (0-1023) and the other is within it.
-H
Generates source and destination IP tables.
-P1
Generates a protocol/source port/destination port table.
-P2
Generates protocol/source port and protocol/destination port tables.
-M max
Sets max as maximum number of entries in tables.
-K N
Samples only 1 in 2^N packets.
-F N
Samples only 1 in 2^N flows.
-A N
Samples packets/flows adaptively to stay within N entries.
-u N
Updates rates after reaching 1/N of entry limit [2]. Used with -A.
-N N -E E
Primary bitmap parameters, for general usage. Either the (N,E) or the (d,g,c,b) parameters must be specified for crl_flowest to run. -N specifies the maximum number of flows expected to be seen in an interval, and -E specifies the allowable error (e.g., 0.1 for 10% deviation). Error cannot exceed 0.25.
-d d -g g -c c -b b
Primary bitmap parameters, for advanced configuration. Either the (N,E) or the (d,g,c,b) parameters must be specified for crl_flowest to run. -d is used when crl_flowest is compiled to use direct bitmaps for estimation and can be ignored for default usage. -g sets the number of hash values kept in a list before a multiresolution bitmap is used to store flow entries, with typical values of 2-8. -c sets the number of virtual bitmap components in the multiresolution bitmap, with typical values of 3-5, and must be greater than 0. -b sets the number of bits in each bitmap component (except the last), with typical values of 100-800, and must be greater than 1.
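The -e folding rule can be written as a small function. This Python sketch follows the description above literally (ports 0 to 1023 count as well known) and is not taken from crl_flowest's source; the function name is hypothetical. Its effect is that many client connections to the same service collapse into a single table row.

```python
WELL_KNOWN_MAX = 1023


def fold_ports(src_port, dst_port):
    """Zero the port outside 0-1023 when the other port is inside that range."""
    src_known = src_port <= WELL_KNOWN_MAX
    dst_known = dst_port <= WELL_KNOWN_MAX
    if src_known and not dst_known:
        return src_port, 0
    if dst_known and not src_known:
        return 0, dst_port
    return src_port, dst_port  # both well known or both ephemeral: unchanged


print(fold_ports(51234, 80), fold_ports(53, 40000))  # (0, 80) (53, 0)
```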

If interrupted, crl_flowest will stop reading, and will report the last partial interval.
crl_flowest will quit with a variety of error messages if input bitmap parameters are invalid:

  • MultiResBmp c must be > 0
  • MultiResBmp b is too small
  • MultiResBmp b must be > 1
  • DirectBmp b must be > 1
  • BloomBmp b must be >= 8 and <= 2^(H-1)
  • MultiResBmp error must be <= 0.25
  • DirectBmp error must be < 1.0
  • BloomBmp error must be < 1.0
  • These MultiResBmp parameters are unsafe with 32 bits
  • These MultiResBmp parameters are unsafe with 64 bits
The last two messages involve the combination of parameters and don't necessarily indicate a single invalid parameter.
In addition, if the bitmap is too small to properly count flows, the string "overflow" will be output instead of a number. This may cause problems with applications that do not properly parse it. Also, when using FSH or PSH sampling, the 'other' row will show a count of 0 flows, potentially leading to inaccuracies in analysis.

Sample output.

crl_flowbloom

A variant of crl_flowest which uses Bloom filters instead of bitmap estimators. Options are the same as above, except the N,E,d,g,c,b bitmap parameters are replaced with:
-b b
Sets size of Bloom filter in bits (will be rounded up to power of 2). Recommended size is 2^28 (268435456).
-c
Applies bitmap correction factor to Bloom filter counts
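Rounding the -b size up to a power of two can be sketched as below. A common reason for this choice, assumed here rather than documented, is that a hash value can then be reduced to a bit index with a mask instead of a division; the helper names are hypothetical.

```python
def bloom_size(requested_bits):
    """Smallest power of two that is >= requested_bits."""
    if requested_bits < 1:
        raise ValueError("size must be positive")
    return 1 << (requested_bits - 1).bit_length()


def bit_index(hash_value, size):
    """With a power-of-two size, masking gives the same result as hash_value % size."""
    return hash_value & (size - 1)


print(bloom_size(268435456), bloom_size(300000000))  # 268435456 536870912
```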

Sample output.

crl_anf

Estimates bytes, packets, and (optionally) flows seen in IPv4 and IPv6 network traffic, as described in "Building a Better NetFlow" [Estan, Keys, Moore 2004]. The Adaptive NetFlow (ANF) algorithm samples packets and generates tuple table entries with byte and packet counts. The sampling rate adapts dynamically to keep the size of the tuple table within the desired range. An entry's counters do not show the actual counts seen for that tuple, but rather are scaled to indicate the count of the set of tuples of which it is a sample. There are two options for flow estimation. The Flow Counting Extension (FCE) adaptively samples flows and generates additional tuple table entries with scaled flow counts. SYN counting adds scaled flow counts into tuple table entries generated by ANF, but counts only TCP flows that start within the measurement interval, and is less accurate than FCE.
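The scaling of sampled counters can be illustrated with plain random packet sampling at a fixed rate of 1/d. This Python sketch deliberately leaves out what makes ANF adaptive (changing d and renormalizing existing entries as the table fills) as well as flow estimation; see the paper for those. The function name and input layout are hypothetical.

```python
import random


def sample_and_scale(packets, d, rng):
    """packets: iterable of (key, length). Keep each packet with probability 1/d,
    then multiply the kept counts by d so they estimate the true totals."""
    table = {}
    for key, length in packets:
        if rng.random() < 1.0 / d:
            pkts, byts = table.get(key, (0, 0))
            table[key] = (pkts + 1, byts + length)
    return {key: (pkts * d, byts * d) for key, (pkts, byts) in table.items()}


rng = random.Random(1)
estimate = sample_and_scale([("flowA", 100)] * 100000, 10, rng)
print(estimate)  # close to {'flowA': (100000, 10000000)}, but not exact
```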

Like crl_flow, this program outputs full tables of tuples, but uses sampling to keep CPU and memory usage within set bounds. Given a set number of records to keep (per subinterface), it will adaptively change the sampling rate to maintain a constant table size. It can sample either packets or flows. Compared to crl_flow, it doesn't store the first and latest timestamp values, and uses different values for ports_ok, representing different levels of validity.

Note that crl_anf does not reassemble IP fragments; since only the first fragment contains port information, other fragments are not assigned to the correct flow.

For complete information regarding flow estimation, read the full paper: Building a Better NetFlow.

In addition to standard CoralReef options, crl_anf accepts:

-4
Count IPv4 flows only.
-6
Count IPv6 flows only.
-A n
report n table entries for ANF
-f n
count flows with FCE, reporting n table entries (to do FCE without ANF, use "-fn -A0")
-y
count flows with SYN counting
-p length
Specify prefix masklength; can be between 8 and 32. Defaults to 32.
-a
Display the cname of the addresses. (Must use masklength of 32 to make sense.)
-s
Print hashtable statistics (for development).
-b
Dumps tables in binary format, suitable for efficient input to any t2_* application
-h
Dumps tables in a format that's easier for humans to read.
-o outfile
Specifies the (non-rotating) output file to write to (default: stdout). Outfile may contain % specifiers as described in the command usage documentation of rotating files; the timestamp used is that of the beginning of the trace. We recommend a ".t2" suffix for crl_anf output files.
-r
Rotates the output file every interval. The timestamp used in % specifiers in the filename is that of the beginning of the interval. If the filename does not have % specifiers sufficient to make it unique every interval, the file will be overwritten. When -r is used, the default filename is "%010s.%f.t2".
-O outfile
Equivalent to "-r -o outfile".
-R
with -r, do reporting from a child process
-K d
set initial ANF sampling rate to 1/d packets
-F d
set initial FCE sampling rate to 1/d flows
-m
merge streams of multiple interfaces (and subinterfaces) into one
-s
Periodically prints memory usage to coral errfile.
-n
normalize immediately instead of in parallel
-q
periodically print memory usage to stderr
-u
print unscaled counters in tuple table

If interrupted, crl_anf will stop reading, and will report the last partial interval.

Sample output.

t2_top

Takes crl_flow output and sorts entries by keys, bytes, packets, or flows. Data is read from standard input.
Options:
-D[m]
Controls display options as follows:
m
shows meta-information about the input tables.
-S[bpf]
Controls sorting options as follows:
b
sorts by bytes.
p
sorts by packets.
f
sorts by flows.
(with no sorting option, entries will be sorted by keys)
-n topn
Limits the number of displayed entries to topn (default: show all entries).
-h
Formats the output in a more human-readable form, attempting to line up the field columns.
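The sort and limit options can be pictured with the following minimal sketch. It assumes each table entry is a dictionary with hypothetical fields `key`, `bytes`, `pkts`, and `flows`, and that counter sorts put the largest values first (as a "top N" listing implies); t2_top's actual data structures and tie-breaking may differ.

```python
def top_entries(entries, sort_by=None, topn=None):
    """Hypothetical sketch of t2_top's -S and -n behaviour.

    entries: list of dicts with keys 'key', 'bytes', 'pkts', 'flows'.
    sort_by: None (sort by key), 'b' (bytes), 'p' (packets), or 'f' (flows).
    topn: maximum number of entries to return, or None for all.
    """
    field = {"b": "bytes", "p": "pkts", "f": "flows"}
    if sort_by is None:
        # No -S option: entries are ordered by their keys.
        ordered = sorted(entries, key=lambda e: e["key"])
    else:
        # Largest counter first (assumed descending order).
        ordered = sorted(entries, key=lambda e: e[field[sort_by]], reverse=True)
    return ordered if topn is None else ordered[:topn]
```

For example, `top_entries(rows, "b", 10)` corresponds roughly to `t2_top -Sb -n 10`.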

Sample output.

t2_convert

Takes t2-format output, and converts tables from one form to another. Multiple tables can be requested as separate arguments on the command line. If the requested table cannot be found or converted to, it is ignored. Can also be used to convert tables from binary to text format. Data is read from standard input.
Options:
-t
Outputs timestamp information for each entry.
-F
Flattens the flows count to 1 for each entry in the converted table.
-R routetable
specifies a file containing an ASFinder-compatible routing table to be used for AS aggregation.
-P portstable
specifies a file containing AppPorts-compatible port-mapping configurations to be used for application conversion.
-b
Outputs in binary format.

Example call: crl_flow -Ci=10 /dev/point0 | t2_convert IP_Matrix Proto_Ports_Table

The table types that can be aggregated are listed in the documentation for the Tables objects as the various make_XYZ commands. These include: Tuple_Table, IP_Matrix, src_IP_Table, dst_IP_Table, Proto_Ports_Table, Port_Matrix, src_Port_Table, dst_Port_Table.
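Converting a finer-grained table into a coarser one amounts to re-keying and summing counters. The sketch below shows this for a Tuple_Table-to-IP_Matrix conversion, assuming (hypothetically) that tuple keys are `(src, dst, proto, sport, dport)` and counters are `(pkts, bytes, flows)`. The `flatten_flows` flag mimics the description of -F by setting each converted entry's flow count to 1. It is an illustration, not t2_convert's code.

```python
from collections import defaultdict


def tuple_to_ip_matrix(tuple_table, flatten_flows=False):
    """Hypothetical sketch: aggregate a tuple table into an IP matrix.

    tuple_table: dict mapping (src, dst, proto, sport, dport) -> (pkts, bytes, flows).
    Returns a dict mapping (src, dst) -> (pkts, bytes, flows).
    """
    matrix = defaultdict(lambda: [0, 0, 0])
    for (src, dst, _proto, _sport, _dport), (pkts, nbytes, flows) in tuple_table.items():
        # Drop protocol and ports; sum counters per source/destination pair.
        entry = matrix[(src, dst)]
        entry[0] += pkts
        entry[1] += nbytes
        entry[2] += flows
    if flatten_flows:
        # Like -F: each entry in the converted table counts as one flow.
        for entry in matrix.values():
            entry[2] = 1
    return {k: tuple(v) for k, v in matrix.items()}
```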

t2_merge

Reads t2-format data from standard input and merges multiple subinterfaces (or interfaces), outputting on standard output. This is particularly useful when a general analysis of an entire link is desired, instead of just information about each subinterface. If interfaces are merged, all data will be listed under interface 0.
Options:
-t
Outputs timestamp information for each entry.
-i
Merges interfaces as well as subinterfaces.
-b
Outputs tables in binary format.

Example call: crl_flow -I if:eth0 | t2_merge

See also: crl_flow -m.
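Merging subinterfaces (and, with -i, interfaces) means summing the counters of identical table keys across the merged units. A minimal sketch, assuming a hypothetical in-memory layout where each (interface, subinterface) pair owns a table mapping keys to `(pkts, bytes)`; following the description above, merged interfaces are all reported under interface 0.

```python
from collections import defaultdict


def merge_tables(tables, merge_interfaces=False):
    """Hypothetical sketch of t2_merge's merging step.

    tables: dict mapping (interface, subinterface) -> {key: (pkts, bytes)}.
    Returns a dict mapping interface -> {key: [pkts, bytes]}.
    """
    merged = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for (iface, _subif), table in tables.items():
        # With -i, everything is listed under interface 0.
        target = 0 if merge_interfaces else iface
        for key, (pkts, nbytes) in table.items():
            counts = merged[target][key]
            counts[0] += pkts
            counts[1] += nbytes
    return {i: {k: list(v) for k, v in t.items()} for i, t in merged.items()}
```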

t2_rate

Similar to crl_rate, t2_rate outputs information about every interface and subinterface, with the addition of a tuple (unique connection) rate. Unlike crl_rate, it works on crl_flow output instead of on CoralReef devices or tracefiles.
Options:
-s
Outputs information in 'short mode', with only a single line for each interval.
-D[SITsit]
controls display options as follows:
S
displays subinterface-level information.
I
displays interface-level totals.
T
displays totals for all interfaces.
s, i, and t are the same as S, I, and T, respectively, except that redundant information is suppressed (e.g., if there's only one interface, only that total will be displayed).
The default is equivalent to -Dsit.

Sample output (normal mode).

Sample output (short mode).
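Each reported rate is simply a per-interval count divided by the interval length. The sketch below makes that arithmetic explicit for packets, bytes, and tuples; the field names and the choice of per-second units are assumptions for illustration, not t2_rate's actual output format.

```python
def interval_rates(pkts, nbytes, tuples, interval_secs):
    """Hypothetical sketch: per-second rates for one reporting interval."""
    if interval_secs <= 0:
        raise ValueError("interval must be positive")
    return {
        "pkts_per_sec": pkts / interval_secs,
        "bytes_per_sec": nbytes / interval_secs,
        # Tuples are unique connections seen during the interval.
        "tuples_per_sec": tuples / interval_secs,
    }
```

For a 60-second interval containing 600 packets, 60000 bytes, and 30 unique tuples, this gives 10 packets/s, 1000 bytes/s, and 0.5 tuples/s.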

netflow_to_t2

Example script that takes a very specific text input format (flowd-reader -Uc) and converts it into the t2 file format. These t2 files can then be processed with all other t2_ tools, including the report generator applications.

Report generator applications

These applications are used in combination to generate and display automated reports on the output of crl_flow. Their interactions are outlined in a separate document, and there is also a tutorial for setting up a monitor to generate reports. The first four applications read from a common configuration file.

store_monitor_data

Takes crl_flow output and stores it in RRDs for later graphing. It is intended to be modular: it can be replaced by any script that takes incoming data and stores it in a format that create_graphs understands. It reads its input from STDIN and takes two required parameters: the names of the general configuration file and the subinterface configuration file, respectively. The subinterface configuration file is specific to the machine that collects the data, and maps subinterfaces to generalized monitor names. The format of this mapping file can be seen here.

Example usage:
spoolcat -M storage_dir '*.t2' | store_monitor_data report.conf monitor1.conf

For every interval stored, store_monitor_data updates a file named .lastdata in the monitor subdirectory of the primary RRD directory.

store_monitor_data handles two signals specially: SIGHUP causes it to reload its configuration files without quitting, and SIGUSR1 causes it to quit after it finishes processing a complete interval (to avoid RRD corruption issues).
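The deferred-quit behaviour follows a common pattern: the signal handler only sets a flag, and the main loop checks that flag between complete intervals, so an update is never cut off halfway. The sketch below shows that pattern in Python (POSIX only). The class and method names are hypothetical; store_monitor_data itself is a separate script and may be structured differently.

```python
import signal


class Monitor:
    """Hypothetical sketch of SIGHUP (reload) and SIGUSR1 (deferred quit)."""

    def __init__(self, load_config):
        self.load_config = load_config
        self.config = load_config()
        self.quit_requested = False
        signal.signal(signal.SIGHUP, self._on_hup)
        signal.signal(signal.SIGUSR1, self._on_usr1)

    def _on_hup(self, signum, frame):
        # Reload configuration files without exiting.
        self.config = self.load_config()

    def _on_usr1(self, signum, frame):
        # Only set a flag; the loop below exits at an interval boundary.
        self.quit_requested = True

    def run(self, intervals, store):
        for interval in intervals:
            store(self.config, interval)  # process one complete interval
            if self.quit_requested:
                break
```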

create_report

Checks the .lastdata files created by store_monitor_data to see if stored data is newer than the associated graphs. If so, it calls config_graphs and create_graphs and copies the report information to the specified web server. After creating the graphs, it touches a file named .graphsdone in the monitor subdirectory of the main graph directory.
Options:
-f
Forces generation of graphs for all monitors with data, regardless of timestamp.

Example usage:
create_report report.conf
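The "is the data newer than the graphs?" check can be expressed as a comparison of file modification times. A minimal sketch, assuming that comparing the mtimes of a monitor's .lastdata and .graphsdone files is sufficient; the function name is hypothetical, and create_report's actual logic may differ. The `force` flag mirrors -f, which regenerates graphs for every monitor that has data.

```python
import os


def needs_graphs(lastdata_path, graphsdone_path, force=False):
    """Hypothetical sketch: decide whether a monitor's graphs need regenerating."""
    if not os.path.exists(lastdata_path):
        return False  # no stored data for this monitor
    if force or not os.path.exists(graphsdone_path):
        return True
    # Regenerate only when the stored data is newer than the last graph run.
    return os.path.getmtime(lastdata_path) > os.path.getmtime(graphsdone_path)
```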

config_graphs

Reads the general configuration from STDIN and generates configuration for create_graphs that determines which graphs to generate. Its parameters are the monitors for which to generate graph commands.

Example usage:
cat report.conf | config_graphs monitor1 > graph.conf

create_graphs

Reads graph-specific configuration from STDIN and creates graphs based on that configuration. Also outputs text tables and legends for the pie graphs. General users do not need to know the graphing command format; however, it can be useful for requesting something that config_graphs does not.

Example usage:
cat graph.conf | create_graphs

display_report

CGI script that generates a grid of images and/or text tables. It can be configured to allow web users to choose different display characteristics, and can also have preconfigured display styles to choose from. It can also allow one to choose between different monitors.

Data to be displayed exists in two places: the server's CGI directory (for any text data to be output) and a potentially different image directory. Both are separated into subdirectories by monitor.

Currently the display characteristics common to all reports are: Graph type, Timescale, Counter (in the case of tables, this is what the table is sorted by), and Data source. The choices for these characteristics are defined in the config files. In addition, the user can specify which characteristics are used for rows and columns, and which allow only a single selection.

There is one top-level config file named cgi.conf and each monitor's subdirectory has a monitor.conf file.

For JavaScript-enabled browsers, the menus are cleaner and easier to read, and there are some simple warnings about invalid selections, but display_report is also completely functional without JavaScript.


Routing table parser

parse_bgp_dump

Reads in a Cisco BGP dump file (such as those available at http://archive.routeviews.org/oix-route-views/) named as the first argument on the command line, and converts it into a simpler format that maps network prefixes and masklengths to origin ASes. The input file can be uncompressed, or compressed with gzip or bzip2. The output is uncompressed and is written to a new file named after the input file, with any .gz or .bz2 suffix removed and "_parsed.txt" appended. Thus, running parse_bgp_dump oix-full-snapshot-2007-03-01-0000.dat.bz2 will create an output file named oix-full-snapshot-2007-03-01-0000.dat_parsed.txt. The output is used by ASFinder in applications such as crl_bycountry or the report generator. When a prefix maps to multiple conflicting origin ASes, they are merged together into one, such as 10_30_20, sorted by the number of entries in the BGP file. Alternative behavior is specified with command line switches:
-g
'Guesses' the best AS by choosing the one with the most entries.
-c
Includes entry count in AS list.

Sample output.
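The output-file naming rule and the default merging of conflicting origin ASes can be sketched as follows. This assumes "sorted by the number of entries" means most entries first, with ties kept in first-seen order; both are assumptions, as are the function names. The -c output format is not specified here, so it is not modelled.

```python
from collections import Counter


def output_name(input_name):
    """Strip a trailing .gz or .bz2, then append '_parsed.txt'."""
    for suffix in (".gz", ".bz2"):
        if input_name.endswith(suffix):
            input_name = input_name[: -len(suffix)]
            break
    return input_name + "_parsed.txt"


def merge_origins(origin_ases, guess=False):
    """Hypothetical sketch: combine the origin ASes seen for one prefix.

    origin_ases: one AS number per BGP entry for the prefix.
    """
    counts = Counter(origin_ases)
    # Most entries first; most_common() keeps first-seen order for ties.
    ranked = [asn for asn, _ in counts.most_common()]
    if guess:
        return str(ranked[0])  # like -g: keep the AS with the most entries
    return "_".join(str(asn) for asn in ranked)
```

For a prefix seen with origins 30, 10, 10, 20, 10, 30, the default merge yields 10_30_20, and the -g style guess yields 10.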


Traffic validation

crl_idle_verify

Despite the name, crl_idle_verify checks that incoming ATM cells are valid IP (not idle) cells, and displays some statistics on those that are not. The assumption is that since much ATM traffic is IP, if a large percentage of these cells fail IP checks, they must be corrupt. Useful for low-level debugging of ATM networks, in conjunction with crl_fail.

Sample output (stdout).
Sample output (stderr).

crl_fail

Makes a trace of those cells which fail IP validity checks. It does not, however, do checksum calculations. Useful for general network debugging and for culling out the interesting parts of a trace (e.g., possibly bad cells) to save space. For example, if crl_idle_verify shows an abnormally high amount of non-IP cells on an ATM link known to be carrying IP traffic (after culling out VPI/VCI pairs that carry non-IP traffic), one could run crl_fail to output a small trace of bad cells, and run crl_llcsnap on that trace. The default output filename is '%s.crl', where '%s' is replaced with the number of seconds since 1970-01-01 00:00:00 Z. The filename can be overridden with the '-o outfile' command line option. If outfile ends in ".gz", and CoralReef was compiled with libz, the output file will be gzipped.
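The default filename and the ".gz" rule can be illustrated with a short sketch. It expands only the '%s' specifier (whole seconds since the epoch); CoralReef supports other % conversions that this sketch ignores. The function names are hypothetical, and Python's gzip module stands in for libz.

```python
import gzip
import time


def default_output_name(template="%s.crl", now=None):
    """Replace '%s' with whole seconds since 1970-01-01 00:00:00 Z."""
    seconds = int(time.time() if now is None else now)
    return template.replace("%s", str(seconds))


def open_output(path):
    """Open for binary writing, gzip-compressing when the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "wb")
    return open(path, "wb")
```

For example, a run started at 1000000000.7 seconds after the epoch would write to 1000000000.crl by default.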

crl_nonip

Soon-to-be-defunct program.
Displays LLC/SNAP information about any non-IP packets in a coral trace. Note: it assumes LLC/SNAP encapsulation and outputs the corresponding 8 bytes in LLC/SNAP format, so it works only on links with DLT_ATM_RFC1483 encapsulation.

Sample output.
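The 8 bytes in question are the LLC header (DSAP, SSAP, control), the 3-byte SNAP OUI, and a 2-byte protocol identifier (an EtherType when the OUI is 00-00-00). A minimal sketch of splitting them apart and testing for routed IPv4 (LLC AA-AA-03, OUI 00-00-00, EtherType 0x0800) follows; the function names are hypothetical, and crl_nonip's own notion of "non-IP" may differ in detail.

```python
import struct

ETHERTYPE_IPV4 = 0x0800


def parse_llc_snap(header: bytes):
    """Split an 8-byte LLC/SNAP header into its fields."""
    if len(header) < 8:
        raise ValueError("need at least 8 bytes")
    (pid,) = struct.unpack("!H", header[6:8])  # network byte order
    return {
        "dsap": header[0],
        "ssap": header[1],
        "ctrl": header[2],
        "oui": header[3:6].hex(),
        "pid": pid,
    }


def is_ipv4(header: bytes) -> bool:
    """True if the header announces routed IPv4 traffic."""
    f = parse_llc_snap(header)
    return (
        (f["dsap"], f["ssap"], f["ctrl"]) == (0xAA, 0xAA, 0x03)
        and f["oui"] == "000000"
        and f["pid"] == ETHERTYPE_IPV4
    )
```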


Other

spoolcat

Usage: spoolcat glob

Continuously scans for files matching glob and copies them to stdout. glob is a file globbing pattern (as used by Perl's glob() function), and must be quoted to protect it from the shell. spoolcat keeps track of the last file output (alphabetically) so as not to repeat files, but by default does not store this information on exit. To remember the last output file, use the -S option below. Useful for spooling .t2 files into t2_report++.
Options:

-S statefile
Maintains the filename of the last output file in statefile. The stored pathname has the same form as glob.
-M directory
Moves file into directory after outputting it. If the move fails, the file is left where it was.
-d
Deletes file after outputting it.
For example, if a directory contains the files b1.t2 and b2.t2 when a user runs spoolcat '*.t2' in it, first b1.t2 and then b2.t2 will immediately be copied to stdout. If, while spoolcat is running, b3.t2 appears in the directory, it too will be copied. However, if a1.t2 appears as well, it will be ignored, as it is alphabetically sorted before the last output file (in this case, b3.t2).
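The "alphabetically after the last output file" rule is what makes a1.t2 invisible in the example above. A minimal sketch of the scanning loop, assuming simple string ordering and a fixed polling interval; the function names are hypothetical, and -S, -M, and -d are not modelled.

```python
import glob
import sys
import time


def new_files(pattern, last=None):
    """Files matching pattern that sort strictly after last, in order."""
    return [f for f in sorted(glob.glob(pattern)) if last is None or f > last]


def spool(pattern, out=None, poll=1.0, last=None):
    """Copy each newly matching file to out, forever (sketch of the loop)."""
    out = sys.stdout.buffer if out is None else out
    while True:
        for name in new_files(pattern, last):
            with open(name, "rb") as f:
                out.write(f.read())
            last = name  # remembered only in memory, as without -S
        out.flush()
        time.sleep(poll)
```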

playback

Generates a cell, repeatedly, on a device. Currently only works with the Fore ATM devices. Takes the board number and, optionally, the VPI and VCI for the connection. The cell to be generated is hard-coded. This application is not built by default.

crl_hostmatrix

Soon-to-be-defunct.
Takes the output from crl_ipmatrix and displays the source hostname, destination hostname, packets, ratio of packets to total packets, bytes, and ratio of bytes to total bytes. All arguments are passed directly on to crl_ipmatrix. Useful for general debugging of traffic flow and for identifying which hosts are sending a large amount of traffic. Each line has the following tab-delimited format:
src_hostname dst_hostname packet_count packet_percent byte_count byte_percent

Sample (sanitized) output.
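Because each line is tab-delimited with a fixed field order, the output is easy to post-process. A minimal parsing sketch follows; it assumes the two ratio fields are plain numbers (an optional trailing "%" is stripped in case it is present), and the type and function names are hypothetical.

```python
from typing import NamedTuple


class HostMatrixRow(NamedTuple):
    src_hostname: str
    dst_hostname: str
    packet_count: int
    packet_percent: float
    byte_count: int
    byte_percent: float


def parse_row(line: str) -> HostMatrixRow:
    """Parse one tab-delimited crl_hostmatrix output line."""
    src, dst, pkts, pkt_pct, nbytes, byte_pct = line.rstrip("\n").split("\t")
    return HostMatrixRow(
        src, dst,
        int(pkts), float(pkt_pct.rstrip("%")),
        int(nbytes), float(byte_pct.rstrip("%")),
    )
```

Sorting the parsed rows by `byte_count` in descending order then lists the heaviest senders first.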

crl_flowscnt

Groups packets into bins in order to measure flows. In addition to a final report, it reports partial results every 100000 packets. It serves as the basis of a framework for comparing different definitions of flows and analyzing which definition gives more useful data.

Sample output.

crl_sample

Sample Perl application for the CRL module. Prints statistics about a trace. Currently prints the number of IP packets, IP bytes, TCP packets, TCP bytes, TCP acks, TCP pushes, number of packets per protocol, and the number of bytes per protocol.

Sample output.


Obsolete

crl_netnet

Soon-to-be-defunct program.
Outputs a matrix showing the amount of traffic flowing between different networks, separated by VPI/VCI pairs. Note: does not use routing table information, only prefix masklength. This is especially handy for analyzing traffic on a particular network, like a university, and studying (from within it) the traffic between different sections. Options:
-p length
Specify prefix masklength; can be between 8 and 32. Defaults to 16.
-a
Display the cname of the addresses. (Must use '-p 32' to make sense.)
-s
Show statistics; will dump the hashtable's statistics before quitting.

Sample output.
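Aggregating by prefix masklength alone, without routing table information, amounts to zeroing the host bits of each address. The sketch below shows that for IPv4 using Python's ipaddress module and counts packets per network pair. It ignores the VPI/VCI separation crl_netnet performs, and the function names are hypothetical.

```python
import ipaddress
from collections import Counter


def net_of(addr: str, masklen: int = 16) -> str:
    """Truncate an IPv4 address to its /masklen network (no routing table)."""
    if not 8 <= masklen <= 32:
        raise ValueError("masklength must be between 8 and 32")
    return str(ipaddress.ip_network(f"{addr}/{masklen}", strict=False))


def net_matrix(pairs, masklen=16):
    """Count packets between networks; pairs is an iterable of (src, dst)."""
    return Counter((net_of(s, masklen), net_of(d, masklen)) for s, d in pairs)
```

With the default masklength of 16, packets from 10.1.2.3 and 10.1.9.9 to hosts in 10.2.0.0/16 fall into the same matrix cell.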

crl_ipmatrix

Soon-to-be-defunct. Use only if you wish to pipe the output into crl_hostmatrix; otherwise, use "crl_flow <options> | t2_convert IP_Matrix".

Keeps track of the total number of packets and bytes that have been seen between each source IP and destination IP. Can be instructed to dump the information periodically if the dumpRate variable is set. Otherwise, the first two lines of output are the total number of packets and bytes, respectively. Useful for general debugging of traffic flow and for identifying which IP addresses are sending a large amount of traffic.

crl_portsummary

Soon-to-be-defunct program.
Summarizes the ports that packets crossing a coral device are going to and coming from. Lists the interface, protocol, and source and destination ports of each packet, as well as its length. Periodically dumps information. Useful mainly as input to crl_hist. Each line has the following tab-delimited format:
interface protocol ports_ok src_port dst_port packet_len packet_count

Sample output.

crl_ipaddr

Obsolete. Use "crl_print_pkt -s" instead.

Prints out the timestamp, IP source, destination, and protocol for every packet. Useful for looking at the interarrival times in a low-level analysis of packets between specific IP addresses.

Sample output.

crl_timestamps

Obsolete. See crl_time.

crl_toascii

Obsolete. Use crl_print or crl_print_pkt.

t2_aggregate

This program is deprecated. Use t2_convert and t2_top instead.

t2_top10

This program is deprecated. Use "t2_convert src_IP_Table | t2_top [-S[b|p|f]] -n 10" instead.

t2_ASmatrix

This program is deprecated. Use "t2_convert -R route_file AS_Matrix | t2_top [-S[b|p|f]] -n 10" instead.

t2_tuple

This program is deprecated. Use "t2_rate -Ds" instead.

t2_report++

Generates HTML summary reports from the output of crl_flow. Described in its own document.

This program served a similar purpose to the report generator applications but was more limited in functionality. Its use is not recommended.

Related Objects

See https://catalog.caida.org/software/coralreef/ to explore related objects to this document in the CAIDA Resource Catalog.