Guide to the report generator
How the pieces of the report generator fit together
The report generator was designed to be as modular as possible, to allow for easy changes or substitutions in the processing pipeline. It currently takes data from crl_flow and stores them in RRDs.
Collection
For the time series graphs to be useful, monitor data should be output frequently enough to show trends, but not so often as to throw off the flow counts. The default crl_flow interval of 5 minutes is usually sufficient.
crl_flow -I -r -o %s.t2 if:fxp0
Note: The .t2 suffix is from when the main flow analysis program was
crl_traffic2, and is used only out of habit. You may
choose whatever filenames you desire.
Transport and processing
The simple script spoolcat aids in transporting and organizing multiple interval files. It can be used locally or piped over ssh (or equivalent) to another machine for processing. It can also delete or store the files once they've been output. If you want to merge subinterfaces or interfaces together, use t2_merge. At this point, the data are processed by store_monitor_data and stored in the desired format. Note that due to the way RRDtool stores information, storing something with a lot of entries (source and destination ASes, for instance) can create many RRD files and use up a lot of disk space, perhaps tens of gigabytes. Keep this in mind.
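To keep an eye on the disk usage mentioned above, a small helper along these lines could be run against the RRD directory from time to time. This is only a sketch: the function name rrd_usage is hypothetical, and it assumes the RRD files carry a .rrd suffix, which may not match your setup.

```shell
#!/bin/sh
# Hypothetical helper (not part of CoralReef): report how many RRD files a
# directory tree holds and how much disk space the whole tree occupies.
# Assumption: RRD files are named with a .rrd suffix.
rrd_usage() {
    dir=$1
    # Count regular files ending in .rrd anywhere below the directory.
    count=$(find "$dir" -type f -name '*.rrd' | wc -l | tr -d ' ')
    # du -sk prints the total size of the tree in kilobytes.
    kbytes=$(du -sk "$dir" | awk '{print $1}')
    echo "$count files, $kbytes KB"
}
```

It would be called with the directory configured as rrd_dir in report.conf, for example: rrd_usage /path/to/rrd_dir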
Report generation
Once the data have been stored, they can be graphed, turned into tables, or converted to whatever other form the user wants. Another simple script to help with this process is create_report, which checks whether the data are newer than the graphs, calls the graph/table generation scripts if necessary, and then copies the files to the web server. It does not process and store data; its operation is separate from store_monitor_data. Ideally, it will be called regularly via a cron job or similar. It requires passwordless file transfer; otherwise the transfers will fail.
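The "data newer than graphs" check can be pictured with a short shell function. This is an illustration of the idea only, not create_report's actual logic; the name needs_update and the use of a stamp file to record the last graph run are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch, not create_report's real implementation: succeed
# (exit status 0) when any file under data_dir is newer than the stamp file
# that marks the last time graphs were generated.
needs_update() {
    data_dir=$1
    stamp=$2
    # No stamp file yet means nothing has been generated, so regenerate.
    [ -e "$stamp" ] || return 0
    # find lists files modified after the stamp; grep -q . succeeds
    # only if at least one such file was listed.
    find "$data_dir" -type f -newer "$stamp" | grep -q .
}
```

A caller would regenerate the graphs when needs_update succeeds and then touch the stamp file, so that the next run does nothing until new data arrive.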
Those who want to control a single part of the process can run each step manually:
Config parsing
Since the report configuration file contains per-monitor information, but the graphing configuration is done on a per-graph basis, config_graphs is needed to convert the global configuration into specific commands for the grapher.
create_graphs takes the commands generated above and creates graphs as well as any associated text data used by the web page.
Upload
Once generated, graphs and text need to be transferred to the appropriate directories on a web server.
Web page
Reports are currently all viewed via a single CGI web page, generated by display_report.
Putting it all together
This is an example of how one would set up and configure a standard report generator. It is assumed that all the scripts are installed and in one's default path.
First, you need a link to monitor. For this example we'll monitor an interface that captures all traffic in and out of a university campus. There is a machine that listens on a link (which we'll call fxp0), and all this data will be classified as coming from the 'campus' monitor. To start collecting data, we'd use the command:
crl_flow -I -r -o %s.t2 if:fxp0
Now we make sure our configuration files are correctly set up. Make
a copy of
subif_map.conf from the doc
directory. You want a different subinterface map for each machine you
collect data from, so we'll rename it to
campus.conf, and change the entries to match the
interfaces used. We'll map the interface/subinterface
0 to our monitor name, campus. Anything
listed as 'REQUIRED' in the example config files is necessary for
proper operation, and the scripts may not run properly if it is
missing. Some entries in report.conf that must be changed to
ensure proper operation are the rrd_dir and
graph_dir values. If you want to output larger versions
of the timeseries graphs, you must change big_dir, and if
you want to output text tables of stored data, you must change
table_dir. Also, a valid input file for ASFinder is
required in the routes entry if you want to do AS or
Then, the files created by
crl_flow will be transferred
off the monitoring machine and onto a processing machine. (Of course,
one could have the monitoring, processing, and web machines all be the
same, but they are separated here for the sake of clarity.) Those data are
then stored into RRD files for later graphing and table generation. We
do this with the following command:
Assuming the configuration files have been correctly set up, all the data will be copied to the processing machine, converted into the desired table types, stored in RRDs and deleted from the monitoring machine.
At this point, you should make sure the config file is set up to allow
transfer from the processing machine to the web server. The
appropriate block in your
report.conf is named
transfer, and requires that you specify the name of the
server you wish to transfer graphs and tables to, the
cgi_dir directory where
display_report will run
from, and the html_dir where non-CGI items (i.e., image files)
will be stored. The cp_cmd entry specifies what command will be
used to transfer the files, along with any desired options, although
currently only scp and rsync have been tested.
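Because create_report needs passwordless transfers, it can help to confirm that a non-interactive login works before relying on it. The sketch below assumes OpenSSH is the transport behind scp or rsync; the function name can_transfer is hypothetical, and the host you pass should be the one named in the transfer block.

```shell
#!/bin/sh
# Hypothetical pre-flight check (assumes OpenSSH): succeed only if ssh can
# log in to the given host without prompting for a password or passphrase.
can_transfer() {
    host=$1
    # BatchMode=yes makes ssh fail instead of prompting for credentials;
    # ConnectTimeout keeps an unreachable host from hanging the check.
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$host" true
}
```

If can_transfer fails for the web server, set up key-based authentication before running the transfer step.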
Once that's set up, you should be able to simply use:
In most cases, you'll want some sort of periodic report generation, such as from a cron job. Here's an example cron entry:
(It's important to set the PATH appropriately in order to have these
scripts work properly.)
0,15,30,45 * * * * create_report report.conf
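One way to handle the PATH requirement is to call create_report through a small wrapper, since cron typically starts jobs with a minimal PATH. This is a sketch under assumptions: the function name run_report is hypothetical, and /usr/local/Coral/bin is assumed to be where the CoralReef scripts are installed; adjust it to your installation.

```shell
#!/bin/sh
# Hypothetical wrapper for use from cron: put the CoralReef scripts on the
# PATH, then run create_report with the given configuration file.
# Assumption: the scripts are installed in /usr/local/Coral/bin.
run_report() {
    conf=$1
    PATH=/usr/local/Coral/bin:$PATH
    export PATH
    create_report "$conf"
}
```

A script containing this function and ending with a call such as run_report report.conf could then be named in the cron entry in place of the bare create_report command.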
Once the data files are on the web server, you need to put
display_report in
the correct CGI directory. If CoralReef is installed on the web
server, you can either copy
display_report into the CGI
directory or make a symbolic link to the installed version. Otherwise,
you will have to manually copy files from a machine with a CoralReef
installation:
scp /usr/local/Coral/bin/display_report www:/cgi-bin/example
(The directories have to exist prior to copying files into them.)
scp /usr/local/Coral/lib/CAIDA/Traffic2/ReportSupport.pm www:/cgi-bin/example/lib/CAIDA/Traffic2
To view the reports, you would go to the appropriate URL for the specified CGI directory. This will likely differ depending on the web server's configuration. An example URL might look like:
Proper viewing of reports will require configuration of the
cgi.conf file in the cgi-bin directory, and
each monitor's subdirectory will require configuring a
monitor.conf file. If these are set up
according to the comments in the files, you should be able to view
reports about your monitor's traffic!
Transitioning from old report generator to new one
The most important data to transfer from the old report generator (t2_report++) are the RRD files used to generate timeseries graphs. The new report generator system relies on RRD files for all its data storage, although at some point that'll be replaced with a more general archival system.
Different versions of the report generator have used different directory
structures and file formats, necessitating conversion when upgrading.
Information on updating RRD directories can be found in a separate document.
Certain features of
t2_report++ have not yet been implemented
in the new report generator. In particular, it does not show IP addresses
for the most recent interval, as it currently only displays information
stored in RRD files.