The architecture of the CoralReef Internet Traffic monitoring software suite
Ken Keys, David Moore, Ryan Koga, Edouard Lagache, Michael Tesch, and kc
claffy
Cooperative Association for Internet Data Analysis - CAIDA
San Diego Supercomputer Center,
University of California, San Diego
Passive data collection tools have traditionally been designed for
specific tasks such as accounting (NeTraMet) or packet capture (tcpdump).
The CoralReef suite
(http://www.caida.org/tools/measurement/coralreef/) was
designed to provide a uniform interface to passive data for a wide range of
applications: from raw capture to real-time report generation. CoralReef
provides a convenient set of passive data tools for a diverse audience, from
network administrators to researchers.
The CoralReef architecture is based on a toolbox paradigm. The base
programming interface is implemented in a C library named libcoral. In
order to allow users to write a single program to access many data sources,
libcoral provides a consistent API for ATM, POS, and 10/100/1000 Ethernet
capture cards from multiple vendors, as well as pcap interfaces. In
addition, libcoral provides a
consistent interface to many packet capture file formats including: all
coral formats (NLANR formats included), pcap, DAG ATM, and DAG POS. This
API satisfies an important design goal of providing multiple ways to
approach the same problem. The libcoral API can operate on ATM cells,
blocks of cells, or network packets, one at a time or via callbacks; the
application developer can use whichever is most convenient. NeTraMet and
Vern Paxson's Bro are two applications that have been adapted to use the
Coral API. To facilitate rapid prototyping and development, another design
goal is to provide the same interface in C/C++ and Perl. Because the Perl
module CRL.pm directly calls the C routines, Perl scripts using CRL.pm
perform well enough for many data analysis applications. Additional
libraries and modules have been provided to perform more complex tasks.
There are libraries and Perl modules for doing AS lookups from BGP routing
tables and others for determining geographic locations via NetGeo.
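
The idea of exposing the same data either one packet at a time or through
callbacks can be illustrated with a minimal sketch. This is not the libcoral
or CRL.pm API: the names Packet, PacketSource, run, and the sample data are
hypothetical, and Python is used only for brevity.

```python
from typing import Callable, Iterable, Iterator, NamedTuple


class Packet(NamedTuple):
    """Hypothetical minimal packet record (not a CoralReef type)."""
    src: str
    dst: str
    length: int


class PacketSource:
    """Hypothetical stand-in for a capture device or trace file."""

    def __init__(self, packets: Iterable[Packet]) -> None:
        self._packets = list(packets)

    def __iter__(self) -> Iterator[Packet]:
        # Pull style: the caller asks for one packet at a time.
        return iter(self._packets)

    def run(self, handler: Callable[[Packet], None]) -> int:
        # Push style: the source drives the loop and calls the handler.
        count = 0
        for pkt in self:
            handler(pkt)
            count += 1
        return count


def total_bytes_pull(source: PacketSource) -> int:
    """Sum packet lengths by iterating over the source."""
    return sum(pkt.length for pkt in source)


def total_bytes_callback(source: PacketSource) -> int:
    """Sum packet lengths by registering a callback."""
    total = 0

    def handler(pkt: Packet) -> None:
        nonlocal total
        total += pkt.length

    source.run(handler)
    return total


# Illustrative data only; addresses are from the documentation ranges.
SAMPLE = [
    Packet("192.0.2.1", "198.51.100.7", 60),
    Packet("192.0.2.1", "198.51.100.7", 1500),
    Packet("203.0.113.9", "192.0.2.1", 40),
]
```

Both functions compute the same result from the same source, so an
application can pick whichever style fits its control flow.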
CoralReef includes modules for the storage and manipulation of frequently
collected data including: source and destination hosts, IP protocols,
ports, and amounts of traffic in bytes, packets and flows. These modules
provide methods to automatically aggregate data into other table types and
allow for efficiently selecting those entries that generate the most
traffic. For example, it is possible to convert an AS table of byte/packet
counts into a country table with a single method call. Higher-level
applications are written using these building blocks, so that, for example,
creating an IP or AS matrix takes just a small Perl program, while larger
programs (such as the real-time report generator t2_report) are more
complex arrangements of the same building blocks.
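
As a rough sketch of the table operations described above, the following
shows a table of byte/packet counts keyed by AS being re-aggregated into a
country table with one method call, plus selection of the heaviest entries.
The class TrafficTable, its methods, and the AS-to-country mapping are
hypothetical illustrations, not the CoralReef modules; the AS numbers come
from the range reserved for documentation.

```python
from typing import Dict, List, Mapping, Tuple


class TrafficTable:
    """Hypothetical table mapping a key to [bytes, packets] counters."""

    def __init__(self) -> None:
        self.entries: Dict[str, List[int]] = {}

    def add(self, key: str, nbytes: int, npackets: int) -> None:
        # Accumulate counters for the key, creating the row if needed.
        counts = self.entries.setdefault(key, [0, 0])
        counts[0] += nbytes
        counts[1] += npackets

    def aggregate(self, key_map: Mapping[str, str],
                  unknown: str = "??") -> "TrafficTable":
        # Build a coarser table by renaming each key through key_map;
        # rows that map to the same new key are summed.
        out = TrafficTable()
        for key, (nbytes, npackets) in self.entries.items():
            out.add(key_map.get(key, unknown), nbytes, npackets)
        return out

    def top(self, n: int) -> List[Tuple[str, int, int]]:
        # Return the n entries with the most bytes, largest first.
        ranked = sorted(self.entries.items(),
                        key=lambda item: item[1][0], reverse=True)
        return [(key, c[0], c[1]) for key, c in ranked[:n]]


# Illustrative data only.
as_table = TrafficTable()
as_table.add("AS64496", 1500, 3)
as_table.add("AS64497", 400, 2)
as_table.add("AS64498", 9000, 6)

# Assumed mapping; a real one would come from an external lookup.
AS_TO_COUNTRY = {"AS64496": "US", "AS64497": "US", "AS64498": "JP"}

country_table = as_table.aggregate(AS_TO_COUNTRY)
```

Keeping aggregation and top-N selection as methods on one table type is what
lets small scripts and larger report generators share the same pieces.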
In this paper, we will present the CoralReef design philosophy, overall
architecture, and capabilities. CoralReef is a package of libraries, device
drivers, classes, and applications written in, and for use with, several
programming languages. By highlighting design and architectural decisions
at all levels in CoralReef, we will show how CoralReef is a powerful,
extensible, efficient, and convenient package for passive data collection
and analysis.