CAIDA::Tables - General purpose table objects


NAME

CAIDA::Tables - General purpose table objects


SYNOPSIS

    use CAIDA::Traffic2::FlowCounter;
    use CAIDA::Tables::Tuple_Table;  # Tuple_Table is only an example
    $table = new CAIDA::Tables::Tuple_Table;
    $counter = new CAIDA::Traffic2::FlowCounter($packets, $bytes, $flows,
                                                $first, $latest);
    $table->entry_add($src_ip, $dst_ip, $ip_protocol, $ports_ok,
                      $src_port, $dst_port, $counter);
    $counter = $table->entry_get($src_ip, $dst_ip, $ip_protocol,
                                    $ports_ok, $src_port, $dst_port);
    $data_hash_ref = $table->data();
    while (($opaque_key, $counter) = each %$data_hash_ref) {
        @fields = $table->get_key_fields($opaque_key);
        # Do stuff with @fields.
    }
    @top5 = $table->sort_by_counter_field('bytes', 5);
    foreach $opaque_key (@top5) {
        # Do stuff with $opaque_key
    }
    $size = $table->num_entries();
    #Many many aggregators...
    %agg_options = ('table_size' => 100);
    $ip_matrix = $table->aggregate_columns(0, 1); # Avoid using.
    $ip_matrix = $table->naggregate_columns(0, 1); # Avoid using.
    $ip_matrix = $table->make_IP_Matrix(\%agg_options);
    # IP_Matrix is only one example, see Conversion functions below.
    $save_file = new FileHandle("> save_file");
    $load_file = new FileHandle("< save_file");
    $table->save_text($save_file, $full_count);
    $table->load_text($load_file);
    $table->save_binary($save_file, $full_count);
    $table->load_binary($load_file);
    $table->add($other_table_of_same_type);
    $table->nadd($other_table_of_same_type);
    $table->clear();


DESCRIPTION

CAIDA::Tables is a set of containers for holding large amounts of data, in which every entry is associated with a counter. By default, a FlowCounter counts the bytes, packets, and flows associated with each entry.

The API for these tables is in Perl, but the processing backends are in both Perl and C++. Other than adding a few extra options, the C++ backend should be transparent (except for the speed increase). When both are available, the C++ backend is tried first; if it fails, the Perl backend is used. (See also FORCE_C and FORCE_PERL.)

The Perl versions exist mainly to support systems with archaic C++ compilers; they may be removed in future releases.

Each table type has its own module name. For example, to use an IP_Matrix:

    use CAIDA::Tables::IP_Matrix;

All tables support a general set of member functions, and in addition have specific transform functions to convert from one table into another. (See diagrams Table_layout, Tables_convert_from and Tables_convert_to for different ways of visualizing how to convert tables.)

The available tables (and associated keys) are:

Tuple_Table
(source IP address, destination IP address, IP protocol, ports ok, source port, destination port) Note: ports ok is a boolean regarding the validity of the source and destination ports.

IP_Table
(IP address)

IP_Matrix
(source IP address, destination IP address)

Proto_Ports_Table
(IP protocol, ports ok, source port, destination port) Note: ports ok is a boolean regarding the validity of the source and destination ports.

Proto_Port_Table
(IP protocol, ports ok, port) Note: ports ok is a boolean regarding the validity of the port.

IP_Proto_Ports_Table
(IP address, IP protocol, ports ok, source port, destination port) Note: ports ok is a boolean regarding the validity of the source and destination ports.

IP_Proto_Port_Table
(IP address, IP protocol, ports ok, port) Note: ports ok is a boolean regarding the validity of the port.

Port_Table
(port)

Port_Matrix
(source port, destination port)

Proto_Table
(IP protocol)

AS_Table
(AS) Note: AS is a string.

AS_Matrix
(source AS, destination AS) Note: AS is a string.

Country_Table
(country)

Country_Matrix
(source country, destination country)

App_Table
(application)

AppInfo_Table
(description, name, group, contrib, date, notes, reference, url) Note: Unlike other Tables, a call to entry_get() will only match on the 'name' field.

VPVC_Table
(vp/vc pair)

Prefix_Table
(prefix/masklength)

Prefix_Matrix
(source prefix/masklength, destination prefix/masklength)

Length_Table
(length)

LatLon_Table
(latitude and longitude as a single comma-separated string, e.g. '33.023,-117.276')

String_Table
(user string)

Global flags

DEBUG
When set, enables certain error messages that would otherwise be silent.

FORCE_C
When set, only allows the usage of the C++ backend.

FORCE_PERL
When set, only allows the usage of the Perl backend.

Common functions

These examples assume Tuple_Table as the object being used.

new (COUNTER, OPTIONS)
new (COUNTER)
new ()
Creates a new table object whose class name specifies its key type. For example, new CAIDA::Tables::Tuple_Table creates a table with tuple keys (as shown above), new CAIDA::Tables::IP_Matrix creates a table of IP address pairs, etc. COUNTER refers to any object that implements new() and add() member functions. COUNTER's add() method takes another object of the same type as an argument, adds that object to itself, and returns a reference to itself. If COUNTER is omitted or undef, the table defaults to using FlowCounter.

NOTE: The C++ backend currently only supports using a FlowCounter object; the use of a non-FlowCounter for COUNTER will force the use of the Perl backend.
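The COUNTER contract described above can be illustrated with a minimal sketch. The class name My::PacketCounter and its single pkts field are hypothetical and are not part of CAIDA::Tables; the sketch only shows the documented requirements (a new() constructor, and an add() that takes a counter of the same type, adds it to itself, and returns itself). How a table receives the counter is not shown here, and per the note above such a counter would only work with the Perl backend.

```perl
use strict;
use warnings;

# Hypothetical counter class, for illustration only.
package My::PacketCounter;

sub new {
    my ($class, $pkts) = @_;
    my $self = { pkts => $pkts // 0 };   # default to zero packets
    return bless $self, $class;
}

# Accessor for the single field this counter tracks.
sub pkts { return $_[0]{pkts} }

# add() folds another counter of the same type into this one
# and returns a reference to itself, as the contract requires.
sub add {
    my ($self, $other) = @_;
    $self->{pkts} += $other->pkts;
    return $self;
}

package main;

my $total = My::PacketCounter->new(3);
my $same  = $total->add(My::PacketCounter->new(4));
print $total->pkts, "\n";   # prints 7
```

Because add() returns the counter itself, $same and $total refer to the same object.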

OPTIONS is a reference to a hash containing configuration options. The available options are:

force_perl
Boolean. Rarely used by the user, this option forces a new table to use the Perl backend instead of the C++ backend. See also FORCE_PERL.

table_size
Used only for the C++ backend, this specifies the size of the underlying hash table to help optimize memory usage.

entry_add (LIST, COUNTER)
Adds an entry into the table. LIST contains all the fields that make up the table's key, as listed above. COUNTER is the counter object specified by new(). If there is an existing entry, COUNTER is added to the existing counter object. Returns a reference to the entry's counter.

entry_get (LIST)
Returns the counter data from the table. LIST contains all the fields that make up the table's key, as listed above.

data ()
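Returns a reference to a hash that gives direct read access to the data in the table. The hash keys are opaque; use get_key_fields() to decode a key into its individual fields. The hash values are the counters, as in the loop shown in the SYNOPSIS.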

get_key_fields (KEY)
Returns the individual fields of a particular opaque KEY (such as that returned by each or keys on the hash referenced by data()'s return value).

add (TABLE)
nadd (TABLE)
Performs a merge of another table into the current one. Returns a reference to the original table. TABLE must be of the same type as the original. nadd() is the same as add(), but it is free to do destructive operations on TABLE. TABLE should not be used again for anything after calling nadd().
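The merge rule can be pictured with plain Perl hashes. This is only a sketch under simplifying assumptions: the "tables" are ordinary hashes keyed by IP address, and the counters are bare integers rather than FlowCounter objects. Keys present in both tables have their counts added; keys present only in the other table are created.

```perl
use strict;
use warnings;

# Stand-ins for two tables of the same type: key => count.
my %table = ('10.0.0.1' => 5, '10.0.0.2' => 1);
my %other = ('10.0.0.2' => 2, '10.0.0.3' => 7);

# Merge %other into %table: add to an existing entry, or create a new one.
$table{$_} += $other{$_} for keys %other;

print join(', ', map { "$_=$table{$_}" } sort keys %table), "\n";
# prints 10.0.0.1=5, 10.0.0.2=3, 10.0.0.3=7
```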

clear ()
Removes all entries from the table.

sort_by_counter_fields (FIELD, NUMBER, ASCEND)
sort_by_counter_fields (FIELD, NUMBER)
Returns a list of opaque keys, sorted by a specific counter field. FIELD specifies the field by which the keys are sorted. FIELD must be the same name as a method of the counter; for the default FlowCounter, acceptable field names are pkts, bytes, and flows.

NOTE: The C++ backend currently only supports using FlowCounter; thus only pkts, bytes, and flows fields are supported in the C++ backend.

NUMBER specifies how many of the top keys should be listed. If NUMBER is set to -1, then all keys are returned. ASCEND is a boolean which determines whether the results are sorted in ascending order. If ASCEND is omitted or set to false, then the results are in descending order.
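The meaning of the (FIELD, NUMBER, ASCEND) arguments for the counter-field sort can be sketched in plain Perl. The helper top_keys_by_field and the sample data are hypothetical; the counters are plain hashes rather than FlowCounter objects, and the real backends may order ties differently.

```perl
use strict;
use warnings;

# Stand-in data: opaque key => counter fields.
my %data = (
    key_a => { pkts => 10, bytes => 1500, flows => 1 },
    key_b => { pkts => 40, bytes =>  900, flows => 2 },
    key_c => { pkts =>  5, bytes => 4000, flows => 1 },
);

# Hypothetical helper mirroring the documented arguments.
sub top_keys_by_field {
    my ($data, $field, $number, $ascend) = @_;

    # Sort ascending by the chosen field, then flip for the default
    # (descending) order.
    my @keys = sort { $data->{$a}{$field} <=> $data->{$b}{$field} } keys %$data;
    @keys = reverse @keys unless $ascend;

    # NUMBER of -1 (or more than we have) means "all keys".
    return @keys if $number < 0 || $number >= @keys;
    return @keys[0 .. $number - 1];
}

my @top2 = top_keys_by_field(\%data, 'bytes', 2);
print "@top2\n";   # prints key_c key_a

my @all_ascending = top_keys_by_field(\%data, 'pkts', -1, 1);
print "@all_ascending\n";   # prints key_c key_a key_b
```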

sort_by_keys (NUMBER, ASCEND)
sort_by_keys (NUMBER)
Returns a list of opaque keys, sorted by the key fields. The first field is the primary sort key, the second field breaks ties in the first, and so on. NUMBER specifies how many of the top keys should be listed. If NUMBER is set to -1, then sort_by_keys() returns all keys. ASCEND is a boolean which determines whether the results are sorted in ascending order. If ASCEND is omitted or set to false, then the results are in descending order.

num_entries ()
Returns the number of entries in the table.

load_text (FILE)
load_binary (FILE)
Loads data into the table from FILE. FILE can either be an open filehandle or a file name. The data in the table previous to the load is not overwritten. Instead, the file data is added as if via a set of entry_add() calls. The file format is that which is used by the save functions, although other applications (such as crl_flow) can output in that format as well.

The return value is the string which terminates the table, which applications might use to store useful information. An example of this might be:

    # end of Tuple Table, interface 0, vp:vc 2:134

In the case of a loading error, these functions return undef.
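An application that stores information in the terminating string might pick it apart as follows. This is a sketch only: the string is the example shown above, assigned directly rather than obtained from a real load call, and the regular expression assumes that exact layout, which other applications need not follow.

```perl
use strict;
use warnings;

# Stand-in for the value returned by a load call (undef would mean failure).
my $terminator = '# end of Tuple Table, interface 0, vp:vc 2:134';

my ($interface, $vp, $vc);
if (!defined $terminator) {
    warn "load failed\n";
}
elsif ($terminator =~ /interface\s+(\d+),\s*vp:vc\s+(\d+):(\d+)/) {
    # Capture the interface number and the vp/vc pair.
    ($interface, $vp, $vc) = ($1, $2, $3);
    print "interface=$interface vp=$vp vc=$vc\n";
    # prints interface=0 vp=2 vc=134
}
```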

The file format is described here.

WARNING: These functions assume that the table uses a FlowCounter object. Use of these functions with a different counter object is a very bad idea.

save_text (FILE, SAVE_ALL)
save_text (FILE)
save_binary (FILE, SAVE_ALL)
save_binary (FILE)
Saves table data to FILE. FILE can either be an open filehandle or a file name. These functions are primarily meant to provide persistent storage of the table and not for reading by humans. However, the text format is provided for those who wish to do quick debugging or scanning of data. SAVE_ALL is a boolean that relates to the FlowCounter object, and when true, saves the first and latest fields. This is, sadly, a hack, which will be removed in the future. When SAVE_ALL is omitted, it defaults to true.

The return value is a boolean indicating whether the function successfully saved the table.

WARNING: These functions assume that the table uses a FlowCounter object. Use of these functions with a different counter object is a very bad idea.

Conversion functions

Several tables have functions to create a different table from their own data. For example, one can turn an IP_Matrix (with a source and destination IP address) into an IP_Table by combining any entries with the same source IP address.
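The IP_Matrix to IP_Table example amounts to regrouping entries under a shorter key and summing their counters. The following sketch shows that idea with plain Perl hashes; the space-separated key and the bare byte counts are assumptions made for illustration and do not reflect how the tables store keys or counters internally.

```perl
use strict;
use warnings;

# Stand-in for an IP_Matrix: "source destination" => byte count.
my %ip_matrix = (
    '10.0.0.1 10.0.0.2' => 100,
    '10.0.0.1 10.0.0.3' => 250,
    '10.0.0.4 10.0.0.2' =>  40,
);

# Build the source-IP view: entries sharing a source address are combined.
my %src_ip_table;
for my $pair (keys %ip_matrix) {
    my ($src, $dst) = split ' ', $pair;
    $src_ip_table{$src} += $ip_matrix{$pair};
}

print "$_ $src_ip_table{$_}\n" for sort keys %src_ip_table;
# prints:
# 10.0.0.1 350
# 10.0.0.4 40
```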

All of the following accept a reference to a hash containing optional arguments to the function. For most functions OPTIONS may be omitted; the exceptions, such as make_AS_Matrix, make_AS_Table, make_Country_Matrix and make_Country_Table, state their required entries in their descriptions.

The only option that applies to all tables is 'table_size', which is used with the C++ backend to specify the size of the internal hash table.

make_IP_Matrix (OPTIONS)
make_IP_Matrix ()
Member function of Tuple_Table; makes an IP_Matrix.

make_src_IP_Table (OPTIONS)
make_src_IP_Table ()
Member function of Tuple_Table and IP_Matrix; makes an IP_Table from the source (first) IP address.

make_dst_IP_Table (OPTIONS)
make_dst_IP_Table ()
Member function of Tuple_Table and IP_Matrix; makes an IP_Table from the destination (second) IP address.

make_IP_Table (OPTIONS)
make_IP_Table ()
Member function of IP_Proto_Ports_Table and IP_Proto_Port_Table; makes an IP_Table from the IP address.

make_Proto_Ports_Table (OPTIONS)
make_Proto_Ports_Table ()
Member function of Tuple_Table and IP_Proto_Ports_Table; makes a Proto_Ports_Table.

make_Port_Matrix (OPTIONS)
make_Port_Matrix ()
Member function of Tuple_Table, IP_Proto_Ports_Table and Proto_Ports_Table; makes a Port_Matrix. OPTIONS may include an entry whose key is 'protocol' and whose value is a protocol number. If the protocol is set, then the resulting Port_Matrix will only include ports used by that protocol.

make_src_Proto_Port_Table (OPTIONS)
make_src_Proto_Port_Table ()
Member function of Tuple_Table, IP_Proto_Ports_Table and Proto_Ports_Table; makes a Proto_Port_Table from the protocol, ports_ok and source (first) port. OPTIONS may include an entry whose key is 'protocol' and whose value is a protocol number. If the protocol is set, then the resulting Proto_Port_Table will only include ports used by that protocol.

make_dst_Proto_Port_Table (OPTIONS)
make_dst_Proto_Port_Table ()
Member function of Tuple_Table, IP_Proto_Ports_Table and Proto_Ports_Table; makes a Proto_Port_Table from the protocol, ports_ok and destination (second) port. OPTIONS may include an entry whose key is 'protocol' and whose value is a protocol number. If the protocol is set, then the resulting Proto_Port_Table will only include ports used by that protocol.

make_Proto_Port_Table (OPTIONS)
make_Proto_Port_Table ()
Member function of IP_Proto_Port_Table; makes a Proto_Port_Table from the protocol, ports_ok and port. OPTIONS may include an entry whose key is 'protocol' and whose value is a protocol number. If the protocol is set, then the resulting Proto_Port_Table will only include ports used by that protocol.

make_src_Port_Table (OPTIONS)
make_src_Port_Table ()
Member function of Tuple_Table, Port_Matrix, IP_Proto_Ports_Table and Proto_Ports_Table; makes a Port_Table from the source (first) port. OPTIONS may include an entry whose key is 'protocol' and whose value is a protocol number. If the protocol is set, then the resulting Port_Table will only include ports used by that protocol.

make_dst_Port_Table (OPTIONS)
make_dst_Port_Table ()
Member function of Tuple_Table, Port_Matrix, IP_Proto_Ports_Table and Proto_Ports_Table; makes a Port_Table from the destination (second) port. OPTIONS may include an entry whose key is 'protocol' and whose value is a protocol number. If the protocol is set, then the resulting Port_Table will only include ports used by that protocol.

make_Port_Table (OPTIONS)
make_Port_Table ()
Member function of Proto_Port_Table and IP_Proto_Port_Table; makes a Port_Table from the port. OPTIONS may include an entry whose key is 'protocol' and whose value is a protocol number. If the protocol is set, then the resulting Port_Table will only include ports used by that protocol.

make_Proto_Table (OPTIONS)
make_Proto_Table ()
Member function of Tuple_Table, Proto_Ports_Table, Proto_Port_Table, IP_Proto_Ports_Table and IP_Proto_Port_Table; makes a Proto_Table.

make_src_IP_Proto_Ports_Table (OPTIONS)
make_src_IP_Proto_Ports_Table ()
Member function of Tuple_Table; makes an IP_Proto_Ports_Table from the source (first) IP address, protocol, ports_ok, source (first) port, and destination (second) port. OPTIONS may include an entry whose key is 'protocol' and whose value is a protocol number. If the protocol is set, then the resulting IP_Proto_Ports_Table will only include ports used by that protocol.

make_dst_IP_Proto_Ports_Table (OPTIONS)
make_dst_IP_Proto_Ports_Table ()
Member function of Tuple_Table; makes an IP_Proto_Ports_Table from the destination (second) IP address, protocol, ports_ok, source (first) port, and destination (second) port. OPTIONS may include an entry whose key is 'protocol' and whose value is a protocol number. If the protocol is set, then the resulting IP_Proto_Ports_Table will only include ports used by that protocol.

make_src_IP_Proto_src_Port_Table (OPTIONS)
make_src_IP_Proto_src_Port_Table ()
Member function of Tuple_Table; makes an IP_Proto_Port_Table from the source (first) IP address, protocol, ports_ok and source (first) port. OPTIONS may include an entry whose key is 'protocol' and whose value is a protocol number. If the protocol is set, then the resulting IP_Proto_Port_Table will only include ports used by that protocol.

make_src_IP_Proto_dst_Port_Table (OPTIONS)
make_src_IP_Proto_dst_Port_Table ()
Member function of Tuple_Table; makes an IP_Proto_Port_Table from the source (first) IP address, protocol, ports_ok and destination (second) port. OPTIONS may include an entry whose key is 'protocol' and whose value is a protocol number. If the protocol is set, then the resulting IP_Proto_Port_Table will only include ports used by that protocol.

make_dst_IP_Proto_src_Port_Table (OPTIONS)
make_dst_IP_Proto_src_Port_Table ()
Member function of Tuple_Table; makes an IP_Proto_Port_Table from the destination (second) IP address, protocol, ports_ok and source (first) port. OPTIONS may include an entry whose key is 'protocol' and whose value is a protocol number. If the protocol is set, then the resulting IP_Proto_Port_Table will only include ports used by that protocol.

make_dst_IP_Proto_dst_Port_Table (OPTIONS)
make_dst_IP_Proto_dst_Port_Table ()
Member function of Tuple_Table; makes an IP_Proto_Port_Table from the destination (second) IP address, protocol, ports_ok and destination (second) port. OPTIONS may include an entry whose key is 'protocol' and whose value is a protocol number. If the protocol is set, then the resulting IP_Proto_Port_Table will only include ports used by that protocol.

make_IP_Proto_src_Port_Table (OPTIONS)
make_IP_Proto_src_Port_Table ()
Member function of IP_Proto_Ports_Table; makes an IP_Proto_Port_Table from the IP address, protocol, ports_ok and source (first) port. OPTIONS may include an entry whose key is 'protocol' and whose value is a protocol number. If the protocol is set, then the resulting IP_Proto_Port_Table will only include ports used by that protocol.

make_IP_Proto_dst_Port_Table (OPTIONS)
make_IP_Proto_dst_Port_Table ()
Member function of IP_Proto_Ports_Table; makes an IP_Proto_Port_Table from the IP address, protocol, ports_ok and destination (second) port. OPTIONS may include an entry whose key is 'protocol' and whose value is a protocol number. If the protocol is set, then the resulting IP_Proto_Port_Table will only include ports used by that protocol.

make_AS_Matrix (OPTIONS)
Member function of Tuple_Table and IP_Matrix; makes an AS_Matrix. OPTIONS must include an entry whose key is 'as_finder' and whose value is an ASFinder object.

make_src_AS_Table (OPTIONS)
make_src_AS_Table ()
Member function of Tuple_Table, IP_Matrix and AS_Matrix; makes an AS_Table from the source (first) IP address or AS. For Tuple_Table and IP_Matrix, OPTIONS must include an entry whose key is 'as_finder' and whose value is an ASFinder object.

make_dst_AS_Table (OPTIONS)
make_dst_AS_Table ()
Member function of Tuple_Table, IP_Matrix and AS_Matrix; makes an AS_Table from the destination (second) IP address or AS. For Tuple_Table and IP_Matrix, OPTIONS must include an entry whose key is 'as_finder' and whose value is an ASFinder object.

make_AS_Table (OPTIONS)
Member function of IP_Table; makes an AS_Table. OPTIONS must include an entry whose key is 'as_finder' and whose value is an ASFinder object.

make_Country_Matrix (OPTIONS)
Member function of Tuple_Table, IP_Matrix and AS_Matrix; makes a Country_Matrix. OPTIONS must either include an entry whose key is 'netgeo' and whose value is a NetGeo or NetGeoClient object, or an entry whose key is 'netacq' and whose value is the path to a netacq executable. In addition, if netacq is not used, then Tuple_Table and IP_Matrix require that OPTIONS also include an entry whose key is 'as_finder' and whose value is an ASFinder object. Because NetGeo returns 2-letter country codes and netacq returns 3-letter country codes, OPTIONS may include an entry whose key is 'countries' and whose value is a Countries object, which is used to convert netacq's responses into 2-letter country codes.

make_src_Country_Table (OPTIONS)
make_src_Country_Table ()
Member function of Tuple_Table, IP_Matrix, AS_Matrix and Country_Matrix; makes a Country_Table from the source IP/AS/Country. For Tuple_Table, IP_Matrix, and AS_Matrix, OPTIONS must either include an entry whose key is 'netgeo' and whose value is a NetGeo or NetGeoClient object, or an entry whose key is 'netacq' and whose value is the path to a netacq executable. In addition, if netacq is not used, then Tuple_Table and IP_Matrix require that OPTIONS also include an entry whose key is 'as_finder' and whose value is an ASFinder object. Because NetGeo returns 2-letter country codes and netacq returns 3-letter country codes, OPTIONS may include an entry whose key is 'countries' and whose value is a Countries object, which is used to convert netacq's responses into 2-letter country codes.

make_dst_Country_Table (OPTIONS)
make_dst_Country_Table ()
Member function of Tuple_Table, IP_Matrix, AS_Matrix and Country_Matrix; makes a Country_Table from the destination IP/AS/Country. For Tuple_Table, IP_Matrix, and AS_Matrix, OPTIONS must either include an entry whose key is 'netgeo' and whose value is a NetGeo or NetGeoClient object, or an entry whose key is 'netacq' and whose value is the path to a netacq executable. In addition, if netacq is not used, then Tuple_Table and IP_Matrix require that OPTIONS also include an entry whose key is 'as_finder' and whose value is an ASFinder object. Because NetGeo returns 2-letter country codes and netacq returns 3-letter country codes, OPTIONS may include an entry whose key is 'countries' and whose value is a Countries object, which is used to convert netacq's responses into 2-letter country codes.

make_Country_Table (OPTIONS)
Member function of IP_Table and AS_Table; makes a Country_Table. For IP_Table, OPTIONS must either include an entry whose key is 'netgeo' and whose value is a NetGeo or NetGeoClient object, or an entry whose key is 'netacq' and whose value is the path to a netacq executable. In addition, if netacq is not used, then IP_Table requires that OPTIONS also include an entry whose key is 'as_finder' and whose value is an ASFinder object. Because NetGeo returns 2-letter country codes and netacq returns 3-letter country codes, OPTIONS may include an entry whose key is 'countries' and whose value is a Countries object, which is used to convert netacq's responses into 2-letter country codes.

make_src_LatLon_Table (OPTIONS)
Member function of Tuple_Table, IP_Matrix, AS_Matrix and Country_Matrix; makes a LatLon_Table from the source IP/AS/Country. For Tuple_Table, IP_Matrix, and AS_Matrix, OPTIONS must either include an entry whose key is 'netgeo' and whose value is a NetGeo or NetGeoClient object, or an entry whose key is 'netacq' and whose value is the path to a netacq executable. In addition, if netacq is not used, then Tuple_Table and IP_Matrix require that OPTIONS also include an entry whose key is 'as_finder' and whose value is an ASFinder object. For Country_Matrix, OPTIONS must include an entry whose key is 'countries' and whose value is a Countries object.

make_dst_LatLon_Table (OPTIONS)
Member function of Tuple_Table, IP_Matrix, AS_Matrix and Country_Matrix; makes a LatLon_Table from the destination IP/AS/Country. For Tuple_Table, IP_Matrix, and AS_Matrix, OPTIONS must either include an entry whose key is 'netgeo' and whose value is a NetGeo or NetGeoClient object, or an entry whose key is 'netacq' and whose value is the path to a netacq executable. In addition, if netacq is not used, then Tuple_Table and IP_Matrix require that OPTIONS also include an entry whose key is 'as_finder' and whose value is an ASFinder object. For Country_Matrix, OPTIONS must include an entry whose key is 'countries' and whose value is a Countries object.

make_LatLon_Table (OPTIONS)
Member function of IP_Table, AS_Table, and Country_Table; makes a LatLon_Table. For IP_Table and AS_Table, OPTIONS must either include an entry whose key is 'netgeo' and whose value is a NetGeo or NetGeoClient object, or an entry whose key is 'netacq' and whose value is the path to a netacq executable. In addition, if netacq is not used, then IP_Table requires that OPTIONS also include an entry whose key is 'as_finder' and whose value is an ASFinder object. For Country_Table, OPTIONS must include an entry whose key is 'countries' and whose value is a Countries object.

make_App_Table (OPTIONS)
Member function of Tuple_Table and Proto_Ports_Table; makes an App_Table. OPTIONS must include an entry whose key is 'app_ports' and whose value is an AppPorts object loaded with application mapping information. In addition, OPTIONS may include an entry whose key is 'name_func' and whose value is a reference to a Perl subroutine that will name any applications not found via AppPorts. This function takes (in order) the protocol, ports ok, source port and destination port. If this function is not given, the default naming for unknown applications is UNKNOWN_$proto_($src_port,$dst_port). Also note that because AppPorts can contain IP-based rules, using a Tuple_Table as the source table can lead to greater accuracy in the mapping.
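A 'name_func' callback might look like the sketch below. The argument order follows the description above (protocol, ports ok, source port, destination port); the naming scheme itself, and the assumption that the subroutine returns the application name as a string, are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical naming callback for applications not found via AppPorts.
my $name_func = sub {
    my ($proto, $ports_ok, $src_port, $dst_port) = @_;

    # Without valid ports, name the application by protocol alone.
    return "PROTO_${proto}_NOPORTS" unless $ports_ok;

    # Otherwise use the lower port, on the guess that it is the service port.
    my $low = $src_port < $dst_port ? $src_port : $dst_port;
    return "PROTO_${proto}_PORT_$low";
};

# The callback would be passed in OPTIONS alongside the required 'app_ports'.
my %options = (name_func => $name_func);

print $name_func->(6, 1, 51234, 8080), "\n";   # prints PROTO_6_PORT_8080
print $name_func->(1, 0, 0, 0), "\n";          # prints PROTO_1_NOPORTS
```

The braces in ${proto} matter in double-quoted Perl strings: without them, the trailing underscore would be read as part of the variable name.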


ERRORS


EXAMPLES


ENVIRONMENT

LD_LIBRARY_PATH
If using the C++ backend, it might be necessary to set your LD_LIBRARY_PATH to find the proper libraries, such as libstdc++.


SEE ALSO

The FlowCounter manpage.


NOTES

There are three additional tables (only in Perl) which are meant to be used only for the creation of new tables. They are called Generic::Split, Generic::SingleKey, and Generic::Pack (within the CAIDA::Tables namespace). They are not meant for the general user, but anyone wishing to create a new table can use them as a starting point.


WARNINGS


DIAGNOSTICS


BUGS


RESTRICTIONS

Because the text file format used by load_text and save_text is tab-delimited, tab characters cannot be used in any keys if the text format is used. However, they are allowable if only load_binary and save_binary are used.
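A caller that builds keys from arbitrary strings (for a String_Table, say) could check for tabs before choosing a save format. The helper key_has_tab below is hypothetical and is not part of CAIDA::Tables; it is a minimal sketch of such a check.

```perl
use strict;
use warnings;

# Hypothetical check: returns the number of key fields that contain a tab.
sub key_has_tab {
    my @fields = @_;
    return scalar grep { defined($_) && /\t/ } @fields;
}

my @ok_key  = ('San Diego, CA');
my @bad_key = ("San\tDiego");

for my $key (\@ok_key, \@bad_key) {
    # Keys with tabs are only safe with the binary format.
    my $advice = key_has_tab(@$key) ? "binary only\n" : "text or binary\n";
    print $advice;
}
# prints:
# text or binary
# binary only
```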


AUTHORS

Ryan Koga <rkoga@caida.org>, David Moore <dmoore@caida.org>
