Guide to the report generator

How the pieces of the report generator fit together

The report generator was designed to be as modular as possible, allowing for easy changes or substitutions in the processing pipeline. It is currently built around taking data from crl_flow and storing them in RRDs (round-robin databases, managed by RRDtool).
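
At a high level, the pieces fit together like this:

crl_flow (collection)
  -> spoolcat, optionally with t2_merge (transport and merging)
  -> store_monitor_data (storage in RRD files)
  -> create_report, which drives config_graphs and create_graphs (report generation)
  -> display_report (web page)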

Collection

For the time series graphs to be useful, monitor data should be output frequently enough to show trends, but not so often as to throw off the flow counts. The default interval for crl_flow of 5 minutes is usually sufficient.

Example usage:
crl_flow -I -r -o %s.t2 if:fxp0

Note: The .t2 suffix is from when the main flow analysis program was crl_traffic2, and is used only out of habit. You may choose whatever filenames you desire.

Transport and processing

The simple script spoolcat aids in the transport and organizing of multiple interval files. It can be used locally or piped over ssh (or equivalent) to another machine for processing, and it can delete or store the files once they have been output. If you want to merge subinterfaces or interfaces together, use t2_merge.

At this point, the data are processed by store_monitor_data and stored in the desired format. Note that due to the way RRDtool stores information, storing something with a lot of entries (source and destination ASes, for instance) can create many RRD files and consume a great deal of disk space, perhaps tens of gigabytes. Keep this in mind.
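
For example, to process files locally on the machine that collected them (the configuration files are described in the 'Putting it all together' section below):

spoolcat -d '*.t2' | store_monitor_data report.conf subif_map.conf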

Report generation

Once the data have been stored, they can be graphed, turned into tables, or put into whatever other form the user wants. Another simple script to help with this process is create_report, which checks whether the data are newer than the graphs, calls the graph/table generation scripts if necessary, and then copies the files to the web server. It does not process and store data; its operation is separate from store_monitor_data. Ideally, it will be called regularly via a cron job or similar. It requires passwordless file transfer (for example, ssh keys without a passphrase, or an ssh agent); otherwise the transfers will fail.
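
One common way to arrange passwordless transfer is ssh public-key authentication, assuming OpenSSH (the user and host names here are placeholders):

ssh-keygen -t ed25519
ssh-copy-id user@webserver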

Those who want to control a single part of the process can perform each step manually:

Config parsing

Since the report configuration file contains per-monitor information, but the graphing configuration is done on a per-graph basis, config_graphs is needed to convert the global configuration into specific commands for the grapher.

Graph generation

create_graphs takes the commands generated above and creates graphs as well as any associated text data used by the web page.
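
To run these two steps by hand, a plausible invocation would pipe the output of config_graphs into create_graphs. This exact interface is an assumption; check each script's usage message for the real arguments:

config_graphs report.conf | create_graphs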

Upload

Once generated, graphs and text need to be transferred to the appropriate directories on a web server.
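
A manual upload could use the same kind of command that create_report uses, for example scp. All paths here are placeholders; the real destinations come from the transfer block in report.conf, described below:

scp -r graphs/* www:/var/www/html/example/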

Web page

Reports are currently all viewed via a single CGI web page, generated by display_report.

Putting it all together

This is an example of how one would set up and configure a standard report generator installation. It is assumed that all the scripts are installed and in one's default path.

First, you need a link to monitor. For this example we'll monitor an interface that captures all traffic in and out of a university campus. A machine listens on this link via an interface (which we'll call fxp0), and all these data will be classified as coming from the 'campus' monitor. To start collecting data, we'd use the command:

crl_flow -I -r -o %s.t2 if:fxp0

Now we make sure our configuration files are correctly set up. Make a copy of report.conf and subif_map.conf from the doc directory. You want a different subinterface map for each machine you collect data from, so we'll rename subif_map.conf to campus.conf, and the entries must be changed to match the interfaces used. We'll map the interface/subinterface 0[0] to our monitor name, campus. Anything listed as 'REQUIRED' in the example config files is necessary for proper operation, and the scripts may not run properly if it is missing.

Some entries in report.conf that must be changed to ensure proper operation are the rrd_dir and graph_dir values. If you want to output larger versions of the timeseries graphs, you must also change big_dir, and if you want to output text tables of stored data, you must change table_dir. A valid input file for ASFinder is also required in the routes entry if you want to do AS or country lookups.
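
For illustration, the changed entries might look something like the following. The exact syntax, comment character, and layout are assumptions, and the paths are placeholders; the authoritative format is in the commented example files in the doc directory:

# campus.conf: map interface/subinterface 0[0] to the monitor name
0[0]    campus

# report.conf: where RRDs and graphs are written
rrd_dir    /data/report/rrd
graph_dir  /data/report/graphs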

Then, the files created by crl_flow will be transferred off the monitoring machine and onto a processing machine. (Of course, one could have the monitoring, processing, and web machines all be the same, but they are separated here for the sake of clarity.) Those data are then stored in RRD files for later graphing and table generation. We do this with the following command:

spoolcat -d '*.t2' | ssh computeserver "store_monitor_data report.conf campus.conf"

Assuming the configuration files have been correctly set up, all the data will be copied to the processing machine, converted into the desired table types, stored in RRDs and deleted from the monitoring machine.

At this point, you should make sure the config file is set up to allow transfer from the processing machine to the web server. The appropriate block in your report.conf is named transfer; it requires the name of the server you wish to transfer graphs and tables to, the cgi_dir directory that display_report will run from, and the html_dir where non-CGI items (i.e., image files) will be stored. The cp_cmd entry specifies the command used to transfer the files, along with any desired options; currently only scp and rsync have been tested.
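
Put together, a transfer block might look roughly like this. The key names come from the description above, but the block layout itself is an assumption; consult the example report.conf for the real syntax:

transfer
    server     www.example.com
    cgi_dir    /cgi-bin/example
    html_dir   /var/www/html/example
    cp_cmd     scp -q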

Once that's set up, you should be able to simply use:

create_report report.conf

In most cases, you'll want some sort of periodic report generation, such as from a cron job. Here's an example cron entry:

PATH=/bin:/usr/bin:/usr/local/bin:/usr/local/Coral/bin
0,15,30,45 * * * * create_report report.conf
(It's important to set the PATH appropriately in order to have these scripts work properly.)

Once the data files are on the web server, you need to put display_report in the correct CGI directory. If CoralReef is installed on the web server, you can either copy display_report into the CGI directory or make a symbolic link to the installed version. Otherwise, you will have to manually copy files from a machine with a CoralReef installation:

scp /usr/local/Coral/bin/display_report www:/cgi-bin/example
scp /usr/local/Coral/lib/CAIDA/Traffic2/ReportSupport.pm www:/cgi-bin/example/lib/CAIDA/Traffic2
(The directories have to exist prior to copying files into them.)
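
If CoralReef is installed on the web server itself, a symbolic link avoids the copies (the paths follow the example above):

ln -s /usr/local/Coral/bin/display_report /cgi-bin/example/display_report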

To view the reports, you would go to the appropriate URL for the specified CGI directory. This will likely differ depending on the web server's configuration. An example URL might look like:

http://www.example.com/cgi-bin/example/display_report

Proper viewing of reports will require configuration of the cgi.conf file in the cgi-bin directory, and each monitor's subdirectory will require configuring a monitor.conf file. If these are set up according to the comments in the files, you should be able to view reports about your monitor's traffic!

Transitioning from old report generator to new one

The most important data to transfer from the old report generator (t2_report++) are the RRD files used to generate timeseries graphs. The new report generator system relies on RRD files for all its data storage, although at some point that'll be replaced with a more general archival system.

Different versions of the report generator have used different directory structures and file formats, necessitating conversion when upgrading. Information on updating RRD directories can be found in a separate document. Certain features of t2_report++ have not yet been implemented in the new report generator. In particular, it does not show IP addresses for the most recent interval, as it currently only displays information stored in RRD files.

Related Objects

See https://catalog.caida.org/software/coralreef/ to explore objects related to this document in the CAIDA Resource Catalog.