scamper architecture
General Requirements
-
Probing of the network should be as unintrusive as possible. This condition
means minimizing the number of packets sent at both the prefix and the individual IP
address level within a given time period.
- scamper should be able to interleave and concurrently probe different
lists of destinations from the same monitor.
- The destination lists can overlap, but at any moment there
should be no more than one instance of a given IP address in the currently
probed set of IPs.
- scamper should probe lists in cycles, where a cycle means a complete
run through all the IP addresses in the list. Under normal operating conditions,
a cycle will be the smallest logical unit of a list.
scamper Structure
scamper is a distributed active probing system. When fully deployed, it will
include a number of remote monitors located all over the world and a central
host that controls them. The three main parts of the scamper architecture are:
Data Collection,
Data Storage, and
Control Repository.
-
Data Collection on each individual
monitor comprises three processes:
List Generator,
List Manager, and
Prober. This part of the
scamper architecture handles
the selection of IP addresses to probe, the start and end of lists and cycles,
probing, and recording of the collected data at the monitor host.
-
List Generator
supplies a stream of IP addresses to probe and commands to start/end a probing
cycle.
Several List Generators can be in operation at the same time. Each
one has its own executable code and, possibly, auxiliary state files stored in
its respective directory. The simplest Generator would read IP
addresses sequentially from a static file, check and filter out prefixes which
are not to be probed, and pass the addresses read to the next process, List
Manager, for further handling. However, a Generator can be as complex and dynamic
as its user needs.
A List Generator generates the following commands:
-
Begin List Cycle
- marks the beginning of the current cycle for a given list and sets
up probing options to be used in this list cycle. (Possible options are: collect
intermediate hop RTTs, set number of tries for a non-responsive hop, use source
routing.)
- contains ListID, CycleID, list of options, and human readable
text field.
-
End List Cycle
- marks the end of a list and causes scamper to free up the
corresponding resources after finishing the current cycle.
- contains ListID.
-
IP
- submits a destination IP address to probe.
- contains IP, ListID, and CycleID.
-
List Manager manages the currently running
List Generators and interleaves their command streams.
There is only one List Manager. It reads commands from active Generators,
orders them according to a pre-determined weight of each Generator, and
feeds the merged stream of commands to the next process, Prober.
The List Manager maintains a schedule that stores the list
of files for each Generator and directives on what to do if and when these files
change. For example, a file change can result in a command to the affected
Generator to update its state, restart a new cycle, or restart the Generator
itself as a process.
The List Manager periodically checks with the central Control Repository
(below) to update its schedule and to synchronize state files and code of each
List Generator with the master copy stored in the Repository.
-
Prober sends probe packets and
processes and records replies.
It reads and interprets commands provided by the List Manager to probe
lists found in its command stream and to start/end lists and cycles. The
Prober stores all collected data in a single file on the monitor host disk.
This output file is changed at the request of the
Data Server (below).
The Prober consists of four
logical components:
-
Command Queue
- commands passed by the List Manager to the Prober that are yet
to be processed.
-
Active Window
- IP Probe States for the IP addresses (mixed from all destination lists)
that are currently being probed. An IP Probe State contains a destination IP
address, ListID, CycleID, the state of probing, and partially
collected data.
-
Hold List
- list of IP Commands which are temporarily on hold because their
IP address is currently either in the Active Window (that is, being probed) or in
the Hold List (that is, already waiting to be probed).
-
List-Cycle Table
- list of all currently active {ListID, CycleID} pairs.
-
Data Storage involves three
processes: Data Server on each monitor box
and Data Client and
Data Sorter on the central machine.
Together they handle the transfer of data from the monitor to the archive
and sorting of individual traces to their final storage locations.
-
Data Server transmits data accumulated over a
certain period of time from a monitor to the central host.
It receives a request for data from Data Client and sends an interrupt to the
Prober requiring it to release its current data file. Upon receiving the
interrupt, the Prober finishes the record it is writing, releases the lock on
the old file, and starts writing data to the new file. The Data Server then
locks the old file and begins data transfer to the Data Client. Once all the
data in the file have been transferred, the file is deleted.
-
Data Client initiates and handles the transfer
of accumulated data from the monitor to the central host.
Initially it stores the trace data into a temporary file.
Upon receiving the last trace from the Data Server it closes this file and
starts the Data Sorter.
-
Data Sorter sorts the individual traces into
final files.
It separates traces by monitor, ListID, and CycleID. If a probing
cycle runs for more than 24 hours, its data will be subdivided into multiple
files on 24-hour boundaries counted from the beginning of the cycle.
-
Control Repository has
a single process, Control Server,
that resides on the central host. It contains current
List Managers' schedules for each scamper monitor and
master copies of List Generators' state files and executable code.
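The command records and the List Manager's weight-based merging described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not scamper's implementation: the class and field names, the representation of CycleID as an integer timestamp, and the function merge_by_weight are all hypothetical. The merge reads commands in rounds and takes from each Generator as many commands per round as its weight, which is the scheme the Flow of Control section spells out.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Dict, Iterable, Iterator, Tuple, Union


@dataclass(frozen=True)
class BeginListCycle:
    list_id: str
    cycle_id: int                   # assumption: cycle start time, Unix seconds
    options: Tuple[str, ...] = ()   # e.g. ("hop-rtts", "source-routing")
    text: str = ""                  # human-readable text field


@dataclass(frozen=True)
class EndListCycle:
    list_id: str


@dataclass(frozen=True)
class IPCommand:
    ip: str
    list_id: str
    cycle_id: int


Command = Union[BeginListCycle, EndListCycle, IPCommand]


def merge_by_weight(generators: Dict[str, Iterable[Command]],
                    weights: Dict[str, int]) -> Iterator[Command]:
    """Merge Generator streams in rounds.

    In each round, every Generator still running contributes up to
    `weights[name]` commands. A Generator that delivers fewer commands
    than its weight is treated as finished and dropped.
    """
    if any(weights[name] < 1 for name in generators):
        raise ValueError("every Generator needs a weight of at least 1")
    streams = {name: iter(gen) for name, gen in generators.items()}
    while streams:
        for name in list(streams):
            batch = list(islice(streams[name], weights[name]))
            yield from batch
            if len(batch) < weights[name]:
                del streams[name]   # this Generator has no more commands
```

With two Generators weighted 2 and 1, the merged stream carries two commands from the first for every one from the second until either runs out. A real List Manager would read from long-running processes rather than finite iterables, so the "finished" rule here is only a convenience for the sketch.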
Flow of Control on a Probing Monitor
-
Each of the List Generators generates a stream of commands and passes
them to the List Manager.
-
List Manager reads commands from active List Generators in
rounds. The number of commands read from each List Generator during a
given round is equal to its weight in the List Manager's current
schedule. List Manager merges these commands into a single stream and
passes it on to the Prober. It can also issue an Exit command to
terminate the Prober process.
-
Prober reads commands from the List Manager, stores them in its
Command Queue, and processes them in the order they were received whenever there is
free space in its Active Window.
-
List Cycle commands update the List-Cycle Table. Upon processing a Begin
List Cycle command, scamper activates the corresponding {ListID, CycleID}
pair by entering it into the Table. When an End List Cycle command
has been processed, scamper waits until all IP Probe States with
this combination of {ListID, CycleID} clear from the Active Window
and then deactivates this pair by removing it from the Table.
-
IP commands update either the Active Window or the Hold
List. First, scamper searches the list of IP Probe States in the Active
Window for an IP address matching the one in the command. If there is no match,
it creates a new IP Probe State in the Active Window. If a match
is found (meaning that this IP address is currently being probed), scamper
adds this IP command to the Hold List with a Hold Timestamp, which indicates
the earliest time at which this IP command can be reprocessed.
The Hold Timestamp is the larger of the current time and the largest Hold
Timestamp already in the Hold List for the same IP, plus
a certain Hold Period. The Hold Period (TBD) should be much longer than the
time it takes to probe the current IP plus a rest period.
IP commands in the Hold List are sorted by their Hold Timestamp, the
smallest value first*.
* - we assume that the Hold List will be small,
so we will use a sort method with the lowest possible overhead. For example,
insertion sort could be a good choice.
-
The Exit command makes the Prober finish probing all currently
active IP addresses (those with IP Probe States in the Active Window),
process all IP commands in the Hold List, and then terminate**.
It will not process any remaining commands in the Command Queue and will
accept no more commands from the List Manager.
** - we would save the currently active lists so that
measurements can resume from the break point.
-
An IP address is active if it has an IP Probe State in
the Active Window. The Prober accepts network packets only for
active IPs and disregards all other packets. It handles the incoming packets
based on (i) the options and state stored in the IP Probe State and
(ii) the options set for its corresponding {ListID, CycleID} pair.
An IP Probe State stays in the Active Window until it receives all
expected responses OR times out***, and then it is removed.
Depending on the options set, the Prober writes collected data to a file
as an IP Path Object, an MTU Path Object, or both
(described below).
*** - perhaps we should wait for a certain amount
of time after all expected responses have arrived, in case there are some unexpected
responses.
Removal of the IP Probe State from the Active Window deactivates this
IP address. The Prober ignores any response arriving after that for this
address.
-
To replenish IP addresses in the Active Window, the Prober will first
check the Hold Timestamp of the first IP command in the Hold List to see if it
has expired. If yes, then it will process this IP command. If no, or if the
Hold List is empty, then it will process the next command in the Command
Queue. If the Command Queue is empty and the Hold List is not, the
Prober will sleep until the Hold Timestamp expires. If they both are
empty and there are no further commands from the List Manager, the
Prober will sleep for a short period of time before rechecking the
Command Queue. It will continue in a "sleep-n-check" mode until it is killed
or the Command Queue is refilled.
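The handling of IP commands and the replenishment of the Active Window described in this section can be sketched as below. This is a rough Python illustration, not the Prober's actual code. The class ProberSketch, its method names, the numeric window_size and hold_period parameters, and the returned action tuples are all assumptions made for the example. Only IP commands are modelled (as plain (ip, list_id, cycle_id) tuples). List Cycle and Exit commands are left out.

```python
import bisect
from collections import deque
from itertools import count


class ProberSketch:
    def __init__(self, window_size, hold_period):
        self.window_size = window_size  # assumed capacity of the Active Window
        self.hold_period = hold_period  # assumed Hold Period, in seconds
        self.command_queue = deque()    # IP commands: (ip, list_id, cycle_id)
        self.active = {}                # Active Window: ip -> (list_id, cycle_id)
        self.hold = []                  # Hold List, sorted: (hold_ts, seq, command)
        self._seq = count()             # tie-breaker for equal Hold Timestamps

    def _hold_timestamp(self, ip, now):
        # Larger of the current time and the largest Hold Timestamp already
        # held for this IP, plus the Hold Period.
        latest = max((ts for ts, _, cmd in self.hold if cmd[0] == ip),
                     default=now)
        return max(now, latest) + self.hold_period

    def _process_ip(self, cmd, now):
        ip, list_id, cycle_id = cmd
        if ip in self.active:
            # The IP is being probed: park the command in the Hold List,
            # keeping the list ordered by Hold Timestamp (smallest first).
            entry = (self._hold_timestamp(ip, now), next(self._seq), cmd)
            bisect.insort(self.hold, entry)
            return "held"
        self.active[ip] = (list_id, cycle_id)
        return "probing"

    def finish(self, ip):
        # Probing completed or timed out: the IP leaves the Active Window.
        self.active.pop(ip, None)

    def replenish(self, now):
        """Perform one replenishment step and report what happened."""
        if len(self.active) >= self.window_size:
            return ("full", None)
        if self.hold and self.hold[0][0] <= now:
            # The first held command has expired: process it first.
            _, _, cmd = self.hold.pop(0)
            return (self._process_ip(cmd, now), cmd[0])
        if self.command_queue:
            cmd = self.command_queue.popleft()
            return (self._process_ip(cmd, now), cmd[0])
        if self.hold:
            # Nothing queued: sleep until the earliest Hold Timestamp.
            return ("sleep_until", self.hold[0][0])
        return ("sleep", None)          # "sleep-n-check" mode
```

A driver loop would call replenish(now) repeatedly, start probing whenever it returns "probing", and call finish(ip) when an IP Probe State is removed. The sorted insert stands in for the insertion sort suggested in the footnote, on the same assumption that the Hold List stays small.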
Data Organization and Storage
-
Prober can generate the following Data Objects:
-
Start Cycle object
- marks the beginning of a cycle for a given destination list and records
the set of probing options to be used in this cycle.
-
End Cycle object
- marks the end of a cycle for a given destination list and is used by
the Data Sorter to close off the file handler for this cycle.
-
IP Path object
- stores the results of IP path discovery performed by the Prober.
-
MTU Path object
- stores the results of MTU discovery performed by the Prober.
-
Prober writes all data objects sequentially into a single file.
At the request of the Data Client running on the central machine,
the Data Server on the monitor host directs the Prober's output to
a new file, transfers the old file, and deletes it. Data Client
stores the trace data into a temporary file on the central host. Upon
receiving the last trace from the Data Server, it closes this file
and starts the Data Sorter. The Data Sorter sorts the
individual traces into final files and finally removes the temporary file.
-
In the final data archive, each data file stores one cycle of data for one
destination list from one monitor if the cycle duration is less than 24 hours.
Longer cycles are split into multiple files on 24-hour boundaries counted from the
beginning of the cycle.
Data directories have the format
<list>/<monitor>/<year>/<mon>/<day>
Files are named as:
<list>.<monitor>.<year><mon><day>.<CycleID><_nn>.warts
The values of <year>, <mon>, and <day>
are determined from the CycleID, which is the timestamp of the start of
the cycle. The <_nn> suffix starts from <_00> and numbers the
24-hour intervals since the beginning of the cycle.
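As an illustration of the naming scheme above, the following Python sketch derives the directory and file name for a trace. It rests on several assumptions that the text does not state: that CycleID is a Unix timestamp in seconds, that dates are taken in UTC, and that <mon> and <day> are zero-padded numbers. The function archive_location and its parameters are hypothetical names.

```python
from datetime import datetime, timezone

DAY = 24 * 60 * 60  # seconds in one 24-hour interval


def archive_location(list_name, monitor, cycle_id, trace_time):
    """Return (directory, filename) for a trace recorded at trace_time.

    cycle_id and trace_time are assumed to be Unix timestamps in seconds.
    """
    # Year, month, and day come from the start of the cycle (the CycleID).
    start = datetime.fromtimestamp(cycle_id, tz=timezone.utc)
    year, mon, day = f"{start:%Y}", f"{start:%m}", f"{start:%d}"
    # _nn counts whole 24-hour intervals since the cycle began, from _00.
    nn = (trace_time - cycle_id) // DAY
    directory = f"{list_name}/{monitor}/{year}/{mon}/{day}"
    filename = (f"{list_name}.{monitor}.{year}{mon}{day}."
                f"{cycle_id}_{nn:02d}.warts")
    return directory, filename
```

For a cycle that starts at 1000000000 (2001-09-09 UTC), a trace recorded 25 hours later falls into the second interval and so lands in the file with suffix _01, in the directory for the cycle's start date.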