scamper architecture

General Requirements

Probing of the network should be as unintrusive as possible. This condition means minimizing the number of packets sent at both prefix and individual IP address level within a given time period.
scamper should be able to interleave and concurrently probe different lists of destinations from the same monitor.
The destination lists can overlap, but at any moment there should be no more than one instance of a given IP address in the set of IPs currently being probed.
scamper should probe lists in cycles, where a cycle means a complete run through all the IP addresses in the list. Under normal operating conditions, a cycle will be the smallest logical unit of a list.

scamper Structure

scamper is a distributed active probing system. When fully deployed, it will include a number of remote monitors located all over the world and a central host that controls them. The three main parts of the scamper architecture are: Data Collection, Data Storage, and Control Repository.

Data Collection on each individual monitor comprises three processes: List Generator, List Manager, and Prober. This part of the scamper architecture handles the selection of IP addresses to probe, the start and end of lists and cycles, probing, and recording of the collected data at the monitor host.
1. List Generator supplies a stream of IP addresses to probe and commands to start/end a probing cycle.
  It is possible to have a few List Generators in operation. Each one has its own executable code and, possibly, auxiliary state files stored in its respective directory. The simplest Generator would read IP addresses sequentially from a static file, check and filter out prefixes which are not to be probed, and pass the addresses read to the next process, List Manager, for further handling. However, a Generator can be as complex and dynamic as its user needs.
  A List Generator generates the following commands:
  - Begin List Cycle
    - marks the beginning of the current cycle for a given list and sets up probing options to be used in this list cycle. (Possible options are: collect intermediate hop RTTs, set number of tries for a non-responsive hop, use source routing.)
    - contains ListID, CycleID, list of options, and human readable text field.
  - End List Cycle
    - marks the end of a list and causes scamper to free up the corresponding resources after finishing the current cycle.
    - contains ListID.
  - IP
    - submits a destination IP address to probe.
    - contains IP, ListID, and CycleID.
2. List Manager manages the currently running List Generators and interleaves their command streams.
  There is only one List Manager. It reads commands from active Generators, orders them according to a pre-determined weight of each Generator, and feeds the merged stream of commands to the next process, Prober.
  The List Manager maintains a schedule that stores the list of files for each Generator and directives on what to do if or when these files change. For example, a file change can result in a command to the affected Generator to update its state, start a new cycle, or restart the Generator itself as a process.
  The List Manager periodically checks with the central Control Repository (below) to update its schedule and to synchronize state files and code of each List Generator with the master copy stored in the Repository.
3. Prober sends probe packets and processes and records replies.
  It reads and interprets commands provided by the List Manager to probe lists found in its command stream and to start/end lists and cycles. The Prober stores all collected data in a single file on the monitor host disk. This output file is changed at the request of Data Server (below).
  Prober consists of four logical components:
  1. Command Queue
    - commands passed by the List Manager to the Prober that are yet to be processed.
  2. Active Window
    - IP Probe States for the IP addresses (mixed from all destination lists) that are currently probed. An IP Probe State contains a destination IP address, ListID, CycleID, the state of probing, and partially collected data.
  3. Hold List
    - list of IP Commands which are temporarily on hold because their IP address is currently either in the Active Window (that is, being probed) or in the Hold List (that is, already waiting to be probed).
  4. List-Cycle Table
    - list of all currently active {ListID, CycleID} pairs.
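The three commands and the four Prober components described above can be pictured as plain data structures. The following is only an illustrative sketch: every class and field name is hypothetical, the option strings are invented, and CycleID is assumed to be an integer timestamp of the cycle start.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Set, Tuple, Union


@dataclass(frozen=True)
class BeginListCycle:
    list_id: str
    cycle_id: int               # assumed: timestamp of the start of the cycle
    options: Tuple[str, ...]    # option names are hypothetical, e.g. ("hop_rtts",)
    comment: str = ""           # human readable text field


@dataclass(frozen=True)
class EndListCycle:
    list_id: str


@dataclass(frozen=True)
class IPCommand:
    ip: str
    list_id: str
    cycle_id: int


Command = Union[BeginListCycle, EndListCycle, IPCommand]


@dataclass
class IPProbeState:
    ip: str
    list_id: str
    cycle_id: int
    state: str = "new"          # state of probing; value names are assumptions
    partial_data: List[dict] = field(default_factory=list)


@dataclass
class ProberState:
    # 1. Command Queue: commands received but not yet processed
    command_queue: Deque[Command] = field(default_factory=deque)
    # 2. Active Window: keyed by IP, so one address can appear only once
    active_window: Dict[str, IPProbeState] = field(default_factory=dict)
    # 3. Hold List: (hold_timestamp, command) pairs
    hold_list: List[Tuple[float, IPCommand]] = field(default_factory=list)
    # 4. List-Cycle Table: active (ListID, CycleID) pairs
    list_cycle_table: Set[Tuple[str, int]] = field(default_factory=set)
```

Keying the Active Window by IP address is one simple way to enforce the requirement that a given address is never probed twice at the same moment.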
Data Storage involves three processes: Data Server on each monitor box and Data Client and Data Sorter on the central machine. Together they handle the transfer of data from the monitor to the archive and sorting of individual traces to their final storage locations.
1. Data Server transmits data accumulated over a certain period of time from a monitor to the central host.
  It receives a request for data from the Data Client and sends an interrupt to the Prober requiring it to release its current data file. Upon receiving the interrupt, the Prober finishes the record it is writing, releases the lock on the old file, and starts writing data to a new file. The Data Server then locks the old file and begins transferring the data to the Data Client. Once all the data in the file have been transferred, the file is deleted.
2. Data Client initiates and handles the transfer of accumulated data from the monitor to the central host.
  Initially it stores the trace data into a temporary file. Upon receiving the last trace from the Data Server it closes this file and starts the Data Sorter.
3. Data Sorter sorts the individual traces into final files.
  It separates traces by monitor, ListID, and CycleID. If a probing cycle runs for more than 24 hours, its data will be subdivided into multiple files on 24-hour boundaries measured from the beginning of the cycle.
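The grouping performed by the Data Sorter might look like the sketch below. It is not the actual sorter: the trace record layout (a dict with "monitor", "list_id", "cycle_id", and "timestamp" keys) is a hypothetical stand-in, and both CycleID and the trace timestamp are assumed to be in seconds.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

SECONDS_PER_DAY = 24 * 60 * 60

# Key: (monitor, ListID, CycleID, index of the 24-hour interval within the cycle)
FileKey = Tuple[str, str, int, int]


def sort_traces(traces: Iterable[dict]) -> Dict[FileKey, List[dict]]:
    """Group traces into the final files they would be written to."""
    files: Dict[FileKey, List[dict]] = defaultdict(list)
    for trace in traces:
        # Whole 24-hour intervals elapsed since the beginning of the cycle
        elapsed = int(trace["timestamp"] - trace["cycle_id"])
        segment = elapsed // SECONDS_PER_DAY
        key = (trace["monitor"], trace["list_id"], trace["cycle_id"], segment)
        files[key].append(trace)
    return dict(files)
```

A cycle shorter than 24 hours produces a single group with segment 0; longer cycles spill into segments 1, 2, and so on.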
Control Repository has a single process, Control Server, that resides on the central host. It contains current List Managers' schedules for each scamper monitor and master copies of List Generators' state files and executable code.

Flow of Control on a Probing Monitor

Each of the List Generators generates a stream of commands and passes them to the List Manager.
The List Manager reads commands from active List Generators in rounds. The number of commands read from each List Generator during a given round is equal to that Generator's weight in the List Manager's current schedule. The List Manager merges these commands into a single stream and passes it on to the Prober. It can also issue an Exit command to terminate the Prober process.
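The round-based merge can be sketched as a weighted round-robin over the Generators' command streams. This is a minimal illustration, assuming each Generator is exposed as a Python iterator and that a Generator whose stream ends simply drops out of later rounds; the function and parameter names are hypothetical.

```python
from typing import Dict, Iterator


def merge_streams(generators: Dict[str, Iterator[str]],
                  weights: Dict[str, int]) -> Iterator[str]:
    """Yield commands from all generators, weights[name] per round each."""
    active = dict(generators)
    while active:
        # Iterate over a copy so finished generators can be removed safely
        for name in list(active):
            for _ in range(weights[name]):
                try:
                    yield next(active[name])
                except StopIteration:
                    del active[name]   # this generator has no more commands
                    break
```

For example, with weights of 2 and 1, the merged stream takes two commands from the first Generator for every one taken from the second.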
Prober reads commands from the List Manager, stores them into its Command Queue, and processes them in the order they were received when there is a free space in its Active Window.
- List Cycle commands update the List-Cycle Table. Upon processing a Begin List Cycle command, scamper activates the corresponding {ListID, CycleID} pair by entering it into the Table. When an End List Cycle command has been processed, scamper waits until all IP Probe States with this combination of {ListID, CycleID} clear from the Active Window and then deactivates this pair by removing it from the Table.
- IP commands update either the Active Window or the Hold List. First, scamper searches the list of IP Probe States in the Active Window for an IP address matching the one in the command. If there is no match, it creates a new IP Probe State in the Active Window. If a match is found (meaning that this IP address is currently being probed), scamper adds this IP command to the Hold List with a Hold Timestamp, which indicates the earliest time at which this IP command can be reprocessed.
  The Hold Timestamp is the larger of the current time and the largest Hold Timestamp already in the Hold List for the same IP, plus a certain Hold Period. The Hold Period (TBD) should be much longer than the time it takes to probe the current IP plus a rest period.
  IP commands in the Hold List are sorted by their Hold Timestamp, the smallest value first^*.
  * - we assume that the Hold List will be small and will use a sort method with the lowest possible overhead. For example, insertion sort could be a good choice.
- The Exit command makes the Prober finish probing all currently active IP addresses (those with IP Probe States in the Active Window), process all IP commands in the Hold List, and then terminate^**. The Prober will not process any remaining commands in the Command Queue and will accept no more commands from the List Manager.
  ** - we would save currently active lists to enable continuation of measurements from the break point.
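The handling of IP commands and the Hold Timestamp rule can be sketched as follows. This is an illustration under several assumptions: the Hold Period value is a placeholder (the text leaves it TBD), the Active Window is reduced to a set of addresses with no size limit, and a command is also held when its address is already waiting in the Hold List, following the Hold List definition given earlier. All names are hypothetical.

```python
import bisect
from typing import List, Set, Tuple

HOLD_PERIOD = 300.0  # seconds; placeholder value, the real period is TBD


def submit_ip(ip: str, now: float, active_window: Set[str],
              hold_list: List[Tuple[float, str]]) -> str:
    """Place an IP command in the Active Window or the Hold List.

    hold_list holds (hold_timestamp, ip) pairs, smallest timestamp first.
    """
    held = [ts for ts, held_ip in hold_list if held_ip == ip]
    if ip not in active_window and not held:
        active_window.add(ip)          # not probed or waiting: activate it
        return "active"
    # Larger of the current time and the latest hold for this IP, plus period
    hold_timestamp = max([now] + held) + HOLD_PERIOD
    # Sorted insertion, in the spirit of the insertion sort suggested above
    bisect.insort(hold_list, (hold_timestamp, ip))
    return "held"
```

Because each new hold for the same address is computed from the latest existing one, repeated commands for one IP are spaced at least one Hold Period apart.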
An IP address is active if it has an IP Probe State in the Active Window. The Prober accepts network packets only for active IPs and disregards all other packets. It handles the incoming packets based on (i) the options and state stored in the IP Probe State and (ii) the options set for its corresponding {ListID, CycleID} pair.
An IP Probe State stays in the Active Window until it receives all expected responses OR times out^***, and then it is removed. Depending on the options set, the Prober writes the collected data to a file as an IP Path Object, an MTU Path Object, or both (described below).
*** - perhaps we should wait for a certain amount of time after all expected responses have arrived, in case there are some unexpected responses.
Removal of the IP Probe State from the Active Window deactivates this IP address. The Prober ignores any response arriving after that for this address.
To replenish IP addresses in the Active Window, the Prober will first check the Hold Timestamp of the first IP command in the Hold List to see if it has expired. If yes, then it will process this IP command. If no, or if the Hold List is empty, then it will process the next command in the Command Queue. If the Command Queue is empty and the Hold List is not, the Prober will sleep until the Hold Timestamp expires. If they both are empty and there are no further commands from the List Manager, the Prober will sleep for a short period of time before rechecking the Command Queue. It will continue in a "sleep-n-check" mode until it is killed or the Command Queue is refilled.
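The replenishment decision can be written as a single function that reports what the Prober should do next. This is a sketch only: the polling interval is an assumed value, the Hold List is modeled as a sorted list of (Hold Timestamp, command) pairs, and the function name and return convention are hypothetical.

```python
from collections import deque
from typing import Any, Deque, List, Tuple

POLL_INTERVAL = 1.0  # seconds; assumed length of the short "sleep-n-check" nap


def next_action(now: float, hold_list: List[Tuple[float, Any]],
                command_queue: Deque[Any]) -> Tuple[str, Any]:
    """Return ("process", command) or ("sleep", seconds)."""
    # 1. An expired hold at the head of the Hold List takes priority
    if hold_list and hold_list[0][0] <= now:
        return ("process", hold_list.pop(0)[1])
    # 2. Otherwise take the next command from the Command Queue
    if command_queue:
        return ("process", command_queue.popleft())
    # 3. Queue empty but holds pending: sleep until the first hold expires
    if hold_list:
        return ("sleep", hold_list[0][0] - now)
    # 4. Nothing to do: nap briefly, then recheck the Command Queue
    return ("sleep", POLL_INTERVAL)
```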

Data Organization and Storage

Prober can generate the following Data Objects:
- Start Cycle object
  - marks the beginning of a cycle for a given destination list and records the set of probing options to be used in this cycle.
- End Cycle object
  - marks the end of a cycle for a given destination list and is used by the Data Sorter to close the file handle for this cycle.
- IP Path object
  - stores the results of IP path discovery performed by the Prober.
- MTU Path object
  - stores the results of MTU discovery performed by the Prober.
Prober writes all data objects sequentially into a single file. At a request from Data Client running on the central machine, Data Server on the monitor host directs the Prober's output to a new file, transfers the old file and deletes it. Data Client stores the trace data into a temporary file on the central host. Upon receiving the last trace from the Data Server, it closes this file and starts the Data Sorter. The Data Sorter sorts the individual traces into final files and finally removes the temporary file.
In the final data archive, each data file stores one cycle of data for one destination list from one monitor if the cycle duration is less than 24 hours. Longer cycles are split into multiple files on 24 hour boundaries from the beginning of a cycle.
Data directories have a format of <list>/<monitor>/<year>/<mon>/<day>
Files are named as: <list>.<monitor>.<year><mon><day>.<CycleID><_nn>.warts
The values of <year>, <mon>, and <day> are determined from the CycleID which is the timestamp of the start of the cycle. The <_nn> starts from <_00> and numbers 24 hour intervals since the beginning of the cycle.
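The naming scheme can be illustrated with a small helper. The sketch below makes several assumptions that the text does not confirm: CycleID is a Unix timestamp in seconds interpreted in UTC, <mon> and <day> are zero-padded numbers, and the function name is hypothetical.

```python
from datetime import datetime, timezone


def archive_path(list_name: str, monitor: str, cycle_id: int, segment: int) -> str:
    """Build <list>/<monitor>/<year>/<mon>/<day>/<file name> for one data file.

    segment is the index of the 24-hour interval since the cycle start (0, 1, ...).
    """
    # Assumption: CycleID is seconds since the Unix epoch, read as UTC
    start = datetime.fromtimestamp(cycle_id, tz=timezone.utc)
    ymd = (f"{start.year:04d}", f"{start.month:02d}", f"{start.day:02d}")
    directory = "/".join((list_name, monitor) + ymd)
    date_part = "".join(ymd)
    filename = f"{list_name}.{monitor}.{date_part}.{cycle_id}_{segment:02d}.warts"
    return f"{directory}/{filename}"
```

Since the date components come from the CycleID rather than from the time of each trace, all files of a multi-day cycle land in the same directory and differ only in the <_nn> suffix.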

Related Objects

See https://catalog.caida.org/software/scamper/ to explore objects related to this document in the CAIDA Resource Catalog.