The Cooperative Association for Internet Data Analysis
Architecture
[Figure: Corsaro-Out Architectural Overview]

Conceptually, the Corsaro package comprises three main components, each of which interacts with the libtrace trace processing library. At the core is the libcorsaro library, which is responsible for coordinating the analysis plugins and for reading and writing files. The plugins module contains a set of analysis plugins that libcorsaro uses to process packets and to deserialize plugin output. Tools can then use the libcorsaro library to carry out analysis.

libcorsaro

At the heart of Corsaro is the libcorsaro library. libcorsaro fundamentally operates in one of two modes, Corsaro-Out and Corsaro-In. At a high level, the Corsaro-Out mode is used to analyze packet traces and generate aggregated output. The Corsaro-In mode, on the other hand, is used to read the data generated by Corsaro in the Corsaro-Out mode and allow further analysis to be carried out.

One function that Corsaro takes care of in both modes is plugin management. Once a plugin has been correctly implemented within the plugin sub-library (see the Creating a New Plugin tutorial for more), Corsaro takes care of compiling, linking and running the plugin.

There are currently two methods for controlling which of the available plugins are used: at compile time, by passing the

--with[out]-<plugin>

option to configure (see the Installation section), and at run time, by giving an explicit list of plugins to the corsaro tool using the -p argument. The configure option causes only the enabled plugins to be compiled and linked; disabled plugins are not present in the final binary. The runtime option, on the other hand, allows a subset of the compiled plugins to be run. Neither method is significantly more or less efficient, but conceptually, one can think of the compile-time method as setting the 'default' plugin set, and the runtime method as specifying a custom list of plugins for a specific run of the tool. Note that if a plugin is disabled at compile time, tools such as cors2ascii will not be able to read data written by that plugin.
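The interaction between the two mechanisms can be modelled in a few lines of C. In this hypothetical sketch the static table stands in for the plugins that configure compiled in, and enable_plugins stands in for the -p list. The struct, the table, the function, and the name "example_plugin" are all illustrative assumptions, not libcorsaro code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical plugin descriptor; not the real libcorsaro type. */
typedef struct {
    const char *name;
    bool enabled; /* decided at run time */
} plugin_desc_t;

/* Models the compile-time choice: only plugins enabled by configure
 * appear in this table (and therefore in the binary).
 * "example_plugin" is a made-up name. */
static plugin_desc_t compiled_plugins[] = {
    {"flowtuple", false},
    {"example_plugin", false},
};
#define N_COMPILED (sizeof(compiled_plugins) / sizeof(compiled_plugins[0]))

/* Models the run-time choice (e.g. corsaro -p): enable the requested
 * subset, or every compiled plugin when no list is given. Names that
 * were not compiled in cannot be enabled. Returns the enabled count. */
static size_t enable_plugins(const char *const *names, size_t n_names)
{
    size_t enabled = 0;
    for (size_t i = 0; i < N_COMPILED; i++) {
        compiled_plugins[i].enabled = (n_names == 0);
        for (size_t j = 0; j < n_names; j++) {
            if (strcmp(compiled_plugins[i].name, names[j]) == 0) {
                compiled_plugins[i].enabled = true;
            }
        }
        if (compiled_plugins[i].enabled) {
            enabled++;
        }
    }
    return enabled;
}
```

Requesting a plugin that is absent from the table enables nothing, which mirrors why data from a plugin disabled at compile time cannot be handled by the resulting binary.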

Intervals

The primary method that Corsaro uses to facilitate data aggregation is the compression of data into time bins. These bins are known as intervals and are an integral part of the Corsaro trace analysis framework.

Based on a configurable interval length (see corsaro_set_interval), Corsaro notifies plugins at the start and end of each interval (see Plugins). Interval times are based on the timestamps of packets (that is, the time at which each packet was captured), with the first interval beginning at the time of the first packet (truncated down to the nearest second). An interval has both a start time and an end time associated with it. All packets that fall within the interval will have a timestamp which is greater than or equal to the start time, and less than or equal to the end time. That is:

start_time <= packet.timestamp <= end_time

Corsaro supports interval durations at second granularity. That is, the shortest possible interval is 1 second. As such, all packet times are truncated down to the nearest second when making interval calculations. For example, given the following interval (of length 60):

# CORSARO_INTERVAL_START 0 1325390400
# CORSARO_INTERVAL_END 0 1325390459

Packets with the following timestamps would be included in this interval:

1325390400.806566 # truncated value == start_time
1325390420.365544 # truncated value > start_time and < end_time
1325390459.734654 # truncated value == end_time

Including the last timestamp may seem counter-intuitive at first, but this policy allows the last interval of an analysis run, whose length may be less than the specified interval length, to be represented accurately. If the last packet is received before the scheduled interval end time, the recorded end time of the interval will be the timestamp of the last packet in the interval, truncated down to the nearest second.
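The inclusive bounds and second-level truncation can be expressed as a small C check. This is a sketch rather than code taken from libcorsaro: trunc_ts, interval_end and in_interval are hypothetical helper names, and the sketch assumes the packet time is available as a double holding seconds since the epoch (the form returned by libtrace's trace_get_seconds).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Truncate a fractional packet timestamp down to whole seconds. */
static uint32_t trunc_ts(double ts)
{
    assert(ts >= 0);
    return (uint32_t)ts;
}

/* Scheduled end of an interval: inclusive, so a 60 second interval
 * starting at S ends at S + 59. */
static uint32_t interval_end(uint32_t start, uint32_t length)
{
    assert(length >= 1);
    return start + length - 1;
}

/* True if the packet falls within [start, end], both bounds inclusive. */
static bool in_interval(uint32_t start, uint32_t end, double pkt_ts)
{
    uint32_t t = trunc_ts(pkt_ts);
    return start <= t && t <= end;
}
```

With start 1325390400 and a length of 60, interval_end yields 1325390459, and all three example timestamps above satisfy in_interval.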

When using the Corsaro-Out mode to process a trace file, plugins can use intervals to generate aggregated statistics. For example, the FlowTuple plugin maintains flow records which are written out and reset at the end of each interval. Note that while plugins may take advantage of the interval architecture, they are not required to, and may simply ignore the interval start and end event notifications (or a subset of them). For example, the RS DoS plugin aggregates data into 5-minute intervals regardless of the interval length passed to Corsaro.

For more information about the implementation of the interval framework, see the Corsaro-Out and Plugins sections of this document.

IO Framework

Corsaro is fundamentally designed to batch process large volumes of trace data, and to output analyzed, but still potentially large, quantities of aggregated data. It has also been designed to be easily extensible. To facilitate this, the Corsaro IO framework was created to abstract away much of the detail of opening, reading from, and writing to Corsaro files.

The Corsaro IO framework is built on the libwandio library (included with libtrace) which we leverage to provide threaded and compressed IO. By default each file has a dedicated thread to perform (de)compression (if needed) and disk access. Compression is enabled based on the suffix of the file passed to corsaro_alloc_output, as described in Corsaro-Out.

IO performance can be affected by several factors, such as disk speed. CPU speed can also significantly affect performance if compression is used. As such, it is important to test Corsaro under peak load if you plan to use it on a live interface.

This section provides an introductory guide to the basic features of the Corsaro IO framework; a more advanced tutorial is planned for future releases.

Opening a File

When a plugin needs to write to an output file, it usually just calls corsaro_io_prepare_file, passing a pointer to the current corsaro state and a string which represents the plugin name. For example, the FlowTuple plugin makes the following call:

state->outfile = corsaro_io_prepare_file(corsaro, plugin->name);

This opens a file using the current compression method and output format, named according to the template given to the corsaro_alloc_output function as described in Corsaro-Out. In this case, the %P in the template is replaced by the name of the plugin (flowtuple).
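The %P substitution can be approximated with a short string routine. This is a hypothetical sketch: expand_template is not a libcorsaro function, the template "trace.%P.cors.gz" is made up, and libcorsaro's real implementation may differ (for instance in how it treats other % sequences).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy tmpl into out, replacing every "%P" with plugin_name.
 * Returns 0 on success, -1 if the result would not fit in out. */
static int expand_template(const char *tmpl, const char *plugin_name,
                           char *out, size_t out_len)
{
    size_t o = 0;
    size_t name_len = strlen(plugin_name);
    assert(out_len > 0);
    for (const char *p = tmpl; *p != '\0'; p++) {
        if (p[0] == '%' && p[1] == 'P') {
            /* need room for the name plus the final NUL */
            if (o + name_len >= out_len) {
                return -1;
            }
            memcpy(out + o, plugin_name, name_len);
            o += name_len;
            p++; /* skip the 'P' */
        } else {
            if (o + 1 >= out_len) {
                return -1;
            }
            out[o++] = *p;
        }
    }
    out[o] = '\0';
    return 0;
}
```

For example, expanding "trace.%P.cors.gz" with "flowtuple" produces "trace.flowtuple.cors.gz"; the same routine with the reserved name "log" would give the log file name described under Logging.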

To determine which output mode the file has been opened in (and as such, which mode Corsaro is operating in), simply use the CORSARO_FILE_MODE macro.

Writing to a File

Corsaro supports many methods of writing to files, but the two simplest (and most flexible) are corsaro_file_write and corsaro_file_printf.

If Corsaro is operating in binary mode, the corsaro_file_write function should be used, whereas in ASCII mode, the corsaro_file_printf function should be used.

Binary Output

corsaro_file_write takes four arguments:

  1. a pointer to the corsaro state
  2. a pointer to a corsaro_file_t opaque structure (returned by corsaro_io_prepare_file)
  3. a pointer to the data to write
  4. the length of the data

For example, the FlowTuple plugin writes a corsaro_flowtuple_t record by calling:

 corsaro_file_write(corsaro, STATE(corsaro)->outfile,
                    flowtuple, sizeof(corsaro_flowtuple_t));

ASCII Output

corsaro_file_printf works in much the same way as fprintf(3), but requires three fixed arguments rather than two, followed by any arguments referenced by the format string:

  1. a pointer to the corsaro state
  2. a pointer to a corsaro_file_t opaque structure (returned by corsaro_io_prepare_file)
  3. a pointer to the format string (just like fprintf(3))

For example, the FlowTuple plugin prints the contents of a corsaro_flowtuple_t record by calling:

corsaro_file_printf(corsaro, file, "%s|%s"
                           "|%"PRIu16"|%"PRIu16
                           "|%"PRIu8"|%"PRIu8"|0x%02"PRIx8
                           "|%"PRIu16
                           ",%"PRIu32"\n",                                      
                           ip_a, ip_b,
                           ntohs(flowtuple->src_port), 
                           ntohs(flowtuple->dst_port),
                           flowtuple->protocol, 
                           flowtuple->ttl,
                           flowtuple->tcp_flags,
                           ntohs(flowtuple->ip_len), 
                           ntohl(flowtuple->packet_cnt)); 
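Because the call above converts the multi-byte fields with ntohs/ntohl before printing, the binary record appears to store them in network byte order. The sketch below reproduces the same format string with snprintf into a memory buffer so the output can be inspected. example_flowtuple_t and format_flowtuple are hypothetical; the struct only mirrors the fields used above and is not the real layout of corsaro_flowtuple_t.

```c
#include <arpa/inet.h> /* htons, htonl, ntohs, ntohl */
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for corsaro_flowtuple_t: only the printed
 * fields, with multi-byte values held in network byte order. */
typedef struct {
    uint16_t src_port, dst_port; /* network byte order */
    uint8_t protocol, ttl, tcp_flags;
    uint16_t ip_len;             /* network byte order */
    uint32_t packet_cnt;         /* network byte order */
} example_flowtuple_t;

/* Format one record like the corsaro_file_printf call above, but into
 * buf. Returns snprintf's result (length excluding the NUL). */
static int format_flowtuple(char *buf, size_t len, const char *ip_a,
                            const char *ip_b,
                            const example_flowtuple_t *ft)
{
    return snprintf(buf, len,
                    "%s|%s"
                    "|%" PRIu16 "|%" PRIu16
                    "|%" PRIu8 "|%" PRIu8 "|0x%02" PRIx8
                    "|%" PRIu16
                    ",%" PRIu32 "\n",
                    ip_a, ip_b,
                    ntohs(ft->src_port), ntohs(ft->dst_port),
                    ft->protocol, ft->ttl, ft->tcp_flags,
                    ntohs(ft->ip_len), ntohl(ft->packet_cnt));
}
```

A record with ports 1024 and 80, protocol 6, TTL 64, flags 0x12, length 40 and a count of 3 formats as "10.0.0.1|192.0.2.7|1024|80|6|64|0x12|40,3" followed by a newline.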

Closing a File

To close an output file, simply call corsaro_file_close and pass a pointer to the corsaro state, and a pointer to the corsaro_file_t structure which represents the file to be closed.

Logging

Corsaro supports limited logging functionality, both to stderr, and to a dedicated log file.

To write to the Corsaro log file (and to stderr if the --enable-debug option is passed to configure), one can simply call the corsaro_log function (corsaro_log_in if operating in Corsaro-In mode). The log file is named using the normal output file template, but using the reserved plugin name, log.

corsaro_log and corsaro_log_in are both based on printf(3), and require 3 arguments:

  1. a string which identifies the function writing to the log
    • the variable __func__ will automatically be converted to the function name by the compiler
  2. a pointer to the corsaro (or corsaro_in) state
  3. a printf(3) style format string

For example, we could record a message in the log when malloc has failed within the corsaro_alloc_output function:

 corsaro_log(__func__, corsaro, "malloc failed");

If this ever gets called, the log will contain a line giving the time at which the log message was issued, the function that issued it, and the actual message. The output for the above example would be similar to the following:

[15:52:43:828] corsaro_alloc_output: malloc failed
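To show how a printf-style logger can produce the line layout above, here is a hypothetical formatter that takes the calling function's name (as a caller would pass with __func__) and a variadic format string. format_log_line is not a libcorsaro function, and the time is passed in explicitly to keep the output deterministic, whereas a real logger would read the clock.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: write "[HH:MM:SS:mmm] func: <formatted message>" to buf.
 * Returns the total length written, or -1 on error or truncation of
 * the prefix. */
static int format_log_line(char *buf, size_t len, int h, int m, int s,
                           int ms, const char *func, const char *fmt, ...)
{
    int n = snprintf(buf, len, "[%02d:%02d:%02d:%03d] %s: ",
                     h, m, s, ms, func);
    if (n < 0 || (size_t)n >= len) {
        return -1;
    }
    va_list ap;
    va_start(ap, fmt);
    int rest = vsnprintf(buf + n, len - (size_t)n, fmt, ap);
    va_end(ap);
    return rest < 0 ? -1 : n + rest;
}
```

Called as format_log_line(buf, sizeof(buf), 15, 52, 43, 828, __func__, "malloc failed") from within corsaro_alloc_output, this would yield the example line shown above.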

There are other logging functions which may be useful on occasions when a corsaro state pointer is not available. See corsaro_log.h for more information.

Note:
The Corsaro Logging framework is very limited and as such will likely be replaced or improved in future.

Corsaro-Out

The Corsaro-Out mode is used when processing packet traces. In this mode, Corsaro is passed a series of packets, which are in turn passed down a chain of analysis plugins.

The basic process for using libcorsaro in the Corsaro-Out mode is:

  1. Allocate a corsaro instance using corsaro_alloc_output
  2. Optionally call corsaro_set_traceuri etc to set parameters
  3. Call corsaro_start_output to initialize the plugins (and create the files)
  4. Call corsaro_per_packet with each packet to be processed
  5. Call corsaro_finalize_output when all packets have been processed

corsaro_alloc_output

To initialize Corsaro-Out, the corsaro_alloc_output function is used. corsaro_alloc_output takes a single argument, a string which describes the template to use for the output files created. Currently the only field supported (and required) is %P, which is replaced with a plugin-specific string, usually the plugin name. The suffix of the file name is checked to determine the appropriate compression (if any) to use. Supported options are .gz (gzip) and .bz2 (bzip2).
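The two template rules in the previous paragraph (%P is required, the suffix selects compression) can be sketched as simple checks. The enum and both functions are hypothetical; in Corsaro the actual compression handling is performed by libcorsaro on top of libwandio.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical compression choices matching the supported suffixes. */
typedef enum { COMPRESS_NONE, COMPRESS_GZIP, COMPRESS_BZIP2 } compress_t;

/* Return nonzero if s ends with suffix. */
static int ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* The template must contain the %P field. */
static int template_is_valid(const char *tmpl)
{
    assert(tmpl != NULL);
    return strstr(tmpl, "%P") != NULL;
}

/* Choose a compression method from the template's suffix. */
static compress_t compression_for(const char *tmpl)
{
    assert(tmpl != NULL);
    if (ends_with(tmpl, ".gz")) {
        return COMPRESS_GZIP;
    }
    if (ends_with(tmpl, ".bz2")) {
        return COMPRESS_BZIP2;
    }
    return COMPRESS_NONE;
}
```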

Upon allocation, libcorsaro sends allocate events to each of the enabled plugins. For the moment, each plugin simply passes back a reference to a static structure describing properties of the plugin, such as its name and unique ID. This allows plugins to be dynamically enabled and disabled based on name. A pointer to an initialized corsaro opaque structure is then passed back to the user. Every function in libcorsaro that references some state must be passed a pointer to the appropriate structure, which is usually a pointer to a corsaro structure. That is to say, there is no global state in libcorsaro, which potentially allows multiple instances of Corsaro-Out to be used simultaneously. While this functionality is fully implemented, no effort has thus far been made to make Corsaro thread-safe.

corsaro_set_[option]

Using the returned corsaro pointer, the user may then optionally call functions to set parameters, such as corsaro_set_traceuri, corsaro_set_interval and corsaro_set_monitorname.

corsaro_start_output

Once all options have been set, the corsaro_start_output function is called, again passing a pointer to the corsaro structure returned by corsaro_alloc_output. The plugin manager is now started, which scans the list of available plugins, and issues an initialization event to all enabled plugins. Each plugin then establishes any state needed for analysis, opens output files, etc. The Global Output File is also opened at this time. This file contains meta-data about the parameters provided to libcorsaro, such as the trace URI, interval duration, start time, finish time and active plugins. It also provides a common place for all plugins to write meta-data to.

corsaro_per_packet

libcorsaro is now ready to start processing packets. The user now simply calls corsaro_per_packet, passing a reference to a libtrace packet object. The first packet will cause a start interval event to be sent to each plugin. Based on the timestamp in this first packet and the interval specified (either by the default in corsaro_int.h or using the corsaro_set_interval function), Corsaro will determine the time at which the interval should complete. The packet is then passed down the chain of plugins.

The corsaro_per_packet function should be repeatedly called, once for each packet to be processed. Once a packet is received which has a timestamp outside the calculated interval bounds, the plugins are notified with an interval end event, and they (optionally) write out aggregated data for the interval. If the time of the packet which triggered the interval end event exceeds the calculated interval end by more than the interval length (i.e. at least an entire interval passed before receiving the packet), interval start and end events will be generated until the packet's timestamp is within the current interval - effectively generating empty intervals. This ensures that plugins receive interval events for every possible interval between the start and end times.
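The following self-contained simulation replays a sorted list of truncated packet times through this interval logic and records every interval the plugins would see, including empty ones. It is illustrative only: simulate_intervals and sim_interval_t are hypothetical, it assumes each new interval starts one second after the previous scheduled end, and it trims the final interval to the last packet's time, as described for the last interval in the Intervals section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one interval as seen by the plugins. */
typedef struct {
    uint32_t start;
    uint32_t end;
    size_t packets;
} sim_interval_t;

/* Replays truncated packet times (sorted ascending) and writes every
 * interval, including empty ones, to out. Returns the number of
 * intervals, or 0 if there are no packets or out is too small. */
static size_t simulate_intervals(const uint32_t *ts, size_t n_pkts,
                                 uint32_t length, sim_interval_t *out,
                                 size_t max_out)
{
    assert(length >= 1);
    if (n_pkts == 0 || max_out == 0) {
        return 0;
    }
    size_t n = 0;
    /* The first packet starts the first interval. */
    out[0] = (sim_interval_t){ts[0], ts[0] + length - 1, 0};
    for (size_t i = 0; i < n_pkts; i++) {
        /* End intervals (possibly empty ones) until ts[i] fits. */
        while (ts[i] > out[n].end) {
            if (n + 1 == max_out) {
                return 0;
            }
            uint32_t next = out[n].end + 1;
            n++;
            out[n] = (sim_interval_t){next, next + length - 1, 0};
        }
        out[n].packets++; /* packet passed down the plugin chain */
    }
    /* Finalize: the last interval ends at the last packet's time. */
    out[n].end = ts[n_pkts - 1];
    return n + 1;
}
```

With a 60 second interval and packets at 1000, 1010, 1200 and 1205, this yields four intervals: 1000 to 1059 with two packets, two empty intervals, and a final interval from 1180 that is cut short at 1205.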

corsaro_finalize_output

Once all packets have been processed, libcorsaro is notified using the corsaro_finalize_output function. The current interval is ended using the timestamp of the last packet received (this is the only interval that may be shorter than the configured duration). The trailers are then written to the Global Output File, which is then closed. Each plugin is sent a finalize event, which allows it to free state and close output files. The corsaro structure is then freed.

Corsaro-In

The Corsaro-In mode is used when processing data which has been created by a Corsaro plugin in the Corsaro-Out mode. This allows further analysis to be carried out on the aggregated data generated by a plugin.

The basic steps for using libcorsaro in the Corsaro-In mode are:

  1. Allocate a corsaro_in instance using corsaro_alloc_input
  2. Call corsaro_start_input to open the input file and load the appropriate plugin
  3. Call corsaro_in_read_record until all records have been read
    • For each record returned, cast to the appropriate type and carry out any processing required
  4. Call corsaro_finalize_input when all records have been processed

corsaro_alloc_input

To initialize Corsaro-In, the corsaro_alloc_input function is used. corsaro_alloc_input takes a single argument: the file to be read. As with the initialization process for Corsaro-Out, the plugin manager generates a set of properties which describe the available plugins, but none of them are initialized until corsaro_start_input is called. Once the necessary data structures have been created, corsaro_alloc_input returns a pointer to a corsaro_in opaque structure that will be used to maintain state as described in the Corsaro-In section.

corsaro_start_input

The corsaro_start_input function can then be called with the corsaro_in pointer returned by corsaro_alloc_input. When corsaro_start_input is called, the input file (as passed to corsaro_alloc_input) is opened, and Corsaro attempts to find the correct plugin to read the data contained in the file.

There are two methods that are used to identify the plugin to use to read an input file. The first is by searching the file name for a string that identifies the plugin that created it. For example, all files generated by the FlowTuple plugin contain the string "flowtuple" in the name (unless manually renamed after creation). If no plugin name is found in the file name, the first few bytes of the file are inspected in an attempt to find a magic number that identifies a plugin. For example, the FlowTuple plugin searches for the number 0x53495854 (or 0x53495855 if compiled without /8 optimizations) beginning in the 14th byte of the file (after the corsaro magic numbers).
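The two identification heuristics can be sketched for the FlowTuple case as follows. looks_like_flowtuple and read_be32 are hypothetical helpers; the sketch assumes the magic number is stored big-endian (the two values spell "SIXT" and "SIXU" in ASCII), and the offset is passed in rather than hard-coded because the layout of the preceding Corsaro header is not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FLOWTUPLE_MAGIC     0x53495854u /* "SIXT" */
#define FLOWTUPLE_MAGIC_NO8 0x53495855u /* "SIXU": built without /8 opts */

/* Read a 32-bit big-endian value from buf at offset off. */
static uint32_t read_be32(const unsigned char *buf, size_t off)
{
    return ((uint32_t)buf[off] << 24) | ((uint32_t)buf[off + 1] << 16) |
           ((uint32_t)buf[off + 2] << 8) | (uint32_t)buf[off + 3];
}

/* Returns 1 if the file looks like FlowTuple output: first by file
 * name, then by a magic number at magic_off in the leading bytes. */
static int looks_like_flowtuple(const char *filename,
                                const unsigned char *head, size_t head_len,
                                size_t magic_off)
{
    assert(filename != NULL && head != NULL);
    if (strstr(filename, "flowtuple") != NULL) {
        return 1;
    }
    if (head_len < magic_off + 4) {
        return 0; /* not enough bytes to hold the magic number */
    }
    uint32_t magic = read_be32(head, magic_off);
    return magic == FLOWTUPLE_MAGIC || magic == FLOWTUPLE_MAGIC_NO8;
}
```

Checking the name first is cheap and needs no IO; the magic-number check is the fallback for files that were renamed after creation.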

The global output file, described in the Corsaro-Out section, is treated as a special plugin and detected using the same heuristics as other plugins.

corsaro_in_read_record

Once Corsaro-In has been started, corsaro_in_read_record can then be called to read the first record in the file. corsaro_in_read_record takes three parameters:

  1. corsaro, a corsaro_in instance pointer
  2. record_type, a pointer to a corsaro_in_record_type_t value
  3. record, a pointer to an allocated corsaro_in_record_t object

The corsaro_in pointer contains the state of this Corsaro-In instance. It is the same pointer that was returned by corsaro_alloc_input, and will be needed for almost all Corsaro-In functions.

The corsaro_in_record_type_t pointer (record_type) is used both for input to, and output from, the function. If the value it points to is set on input, it acts as a filter, and only records of that type will be returned. Setting that value to the NULL record type on input acts as a wildcard, and the next record from the file will be returned, regardless of type. When the function returns, the value will be set to the type of the record that was read. See corsaro_in_record_type for the list of possible values.

The final parameter is a pointer to a corsaro_in_record_t structure. A corsaro_in_record structure provides a reusable buffer for reading data from files. It is allocated by using the corsaro_in_alloc_record function, and should be reused for each call to corsaro_in_read_record.

If successful, corsaro_in_read_record returns the number of bytes read from the file; a return value of 0 indicates EOF, and -1 indicates that an error occurred. If the function was successful, the record_type parameter will be set to the type of record read, and the data will have been loaded into the record parameter. The corsaro_in_get_record_data function returns a void pointer to the record data. This pointer can be cast to the appropriate type by checking the value of the record_type parameter. See the note in the corsaro_in_record_type documentation for more information about how to cast this pointer.

The corsaro_in_read_record function should be called repeatedly until the desired record(s) have been read, or until 0 (EOF) is returned.
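The return-value and filter conventions can be exercised without a real Corsaro file. Everything in this sketch is hypothetical: fake_read_record, fake_input_t and rec_type_t stand in for corsaro_in_read_record, the corsaro_in state and corsaro_in_record_type_t, and each record is reduced to a type and a byte count. It shows the loop shape (stop on 0 or -1) and why, because the function overwrites the type on return, the filter must be reset to the wildcard before each call when every record is wanted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record types; REC_NULL plays the wildcard role. */
typedef enum { REC_NULL = 0, REC_HEADER, REC_INTERVAL, REC_DATA } rec_type_t;

typedef struct {
    rec_type_t type;
    int size; /* bytes "read" for this record */
} fake_record_t;

/* Hypothetical in-memory stand-in for an open Corsaro-In file. */
typedef struct {
    const fake_record_t *recs;
    size_t n, pos;
} fake_input_t;

/* Mimics the documented contract: *type filters on input (REC_NULL
 * matches anything) and holds the record's type on output.
 * Returns bytes read (> 0), 0 at EOF, or -1 on error. */
static int fake_read_record(fake_input_t *in, rec_type_t *type)
{
    if (in == NULL || type == NULL) {
        return -1;
    }
    while (in->pos < in->n) {
        const fake_record_t *r = &in->recs[in->pos++];
        if (*type == REC_NULL || *type == r->type) {
            *type = r->type;
            return r->size;
        }
    }
    return 0;
}

/* Typical loop: read until EOF (0) or error (-1), counting data records. */
static int count_data_records(fake_input_t *in)
{
    int count = 0, rc;
    rec_type_t type = REC_NULL;
    while ((rc = fake_read_record(in, &type)) > 0) {
        if (type == REC_DATA) {
            count++; /* a real tool would cast the record data here */
        }
        type = REC_NULL; /* reset the filter to "any" for the next call */
    }
    return rc < 0 ? -1 : count;
}
```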

corsaro_finalize_input

Once all desired records have been read, corsaro_finalize_input should be called to shutdown the plugins, close the input file, and free any allocated memory. At this point the corsaro_in state object is freed and must no longer be used.

Plugins

Corsaro has been designed to facilitate the easy addition of packet analysis logic. This is implemented by a set of plugins that each contain some specific logic for analyzing packets and generating aggregated output.

At a high level, the plugin interface is a simple set of event notifications from libcorsaro. There are also several other functions which can optionally be implemented if the plugin complies with the Corsaro-In API for de-serializing data.

The events that Corsaro-Out issues to a plugin are:

  • Initialize Output
    • Establish state and open any output files needed.
  • Close Output
    • Close all output files and free state.
  • Start Interval
    • Establish state for a new interval.
  • End Interval
    • Write out any data for the current interval.
  • Process Packet
    • Analyze the given packet and possibly update state.

The Process Packet events are issued in sequence to each plugin, in order from highest to lowest priority. This allows plugins to pass state down the chain by augmenting the corsaro_packet_state structure contained in the corsaro_packet structure passed with the event. Passing state between plugins minimizes rework. For example, the RS DoS plugin uses the class determined by the FlowTuple plugin to detect when a packet has been classified as backscatter.
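The priority-ordered chain and the passing of state between plugins can be modelled with an array of function pointers. This is a hypothetical sketch: pkt_state_t is not the real corsaro_packet_state structure, the two plugins are placeholders, and the 0x1 flag is an arbitrary stand-in for "classified as backscatter".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-packet state shared along the plugin chain. */
typedef struct {
    bool backscatter; /* set by an earlier plugin, read by later ones */
    int plugins_run;
} pkt_state_t;

typedef void (*process_packet_fn)(pkt_state_t *state, int pkt_flags);

/* Higher-priority plugin: classifies the packet once. */
static void classify_plugin(pkt_state_t *state, int pkt_flags)
{
    assert(state != NULL);
    state->backscatter = (pkt_flags & 0x1) != 0;
    state->plugins_run++;
}

/* Lower-priority plugin: reuses the classification instead of redoing it. */
static size_t backscatter_seen = 0;
static void consumer_plugin(pkt_state_t *state, int pkt_flags)
{
    (void)pkt_flags; /* not needed: the classification is in state */
    if (state->backscatter) {
        backscatter_seen++;
    }
    state->plugins_run++;
}

/* Chain ordered from highest to lowest priority. */
static const process_packet_fn chain[] = {classify_plugin, consumer_plugin};

/* Deliver one packet to every plugin in order, with fresh state. */
static pkt_state_t process_packet(int pkt_flags)
{
    pkt_state_t state = {false, 0};
    for (size_t i = 0; i < sizeof(chain) / sizeof(chain[0]); i++) {
        chain[i](&state, pkt_flags);
    }
    return state;
}
```

Because the state is reset for each packet and written only by plugins earlier in the chain, a lower-priority plugin can rely on the classification without knowing how it was computed.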

For more information about the structure of a plugin and instructions for creating a new plugin, see the Creating a New Plugin tutorial.

Tools

In the Corsaro architecture, Tools are pieces of software that use the libcorsaro library to provide functionality to users. As with libcorsaro, these tools fall into two broad categories:

  1. Tools that read trace data and analyze packets
    • use libcorsaro in Corsaro-Out mode
    • e.g. the corsaro tool
  2. Tools that read Corsaro data and perform further transformation/analysis
    • use libcorsaro in Corsaro-In mode
    • e.g. the cors2ascii tool

For more details about these and other tools provided in the Corsaro package, see the Tools page.