Introduction
Many factors, including load balancing, topology changes, and asymmetric paths, result in a complicated graph structure for network path data gathered over time. Both divergence and reconvergence of paths occur frequently making the visualization of paths a challenging task.
PlotPaths depicts, as clearly as possible, paths collected from a single source host to one or more destinations. It shows the connections between intermediate nodes along each path, while preserving higher order groupings, e.g. Autonomous Systems.Overview
Previous CAIDA tools, including Mapnet and GeoPlot, use latitude and longitude to create a geographic layout of network topology. However this approach is not applicable to data sets in which many nodes have similar physical locations, and therefore overlap. With PlotPaths we lay out a topology graph into columns, based on organizational entities, (e.g., ASes or countries) and rows, based on the topological distance (hop count) of each node from the source.
The PlotPaths program performs the following operations:
- Read input and parameters from configuration files
- Calculate topological distance (hop depth) from the source to each node
- Apply node placement algorithm
- Print output
Phase 1. PlotPaths reads input from two related data files. The paths file specifies the characteristics of all paths in the data set, including source, destination, and intermediate nodes for each. The nodes file allows one to specify the name of each node as well as its column number and name. For example, you might choose Autonomous System as your column variable, and use the AS number as the column number and the AS name as the column name. PlotPaths also allows you to leave the column of the node undefined. A group of path input files may share a single node data file.
Phase 2. Topological distance (node depth) calculation involves the construction of a directed graph that will provide a y-axis coordinate for each node. Placement begins at the source, with nodes along the path increasing in depth. When multiple paths leading to a node yield different node depths, PlotPaths uses the largest, ultimately stretching links along short paths rather than crowding nodes along long paths. The final depth of each node is derived from all paths loaded into the graph. Phase 3. PlotPaths spatially organizes nodes into rows within their preassigned columns. Nodes at the same depth within a column are sorted and spaced to maximize visibility. We describe the layout algorithm in detail below. Phase 4. Generate formatted output for viewing tools. Otter is the only currently supported viewing tool.Algorithm in Detail (phases 2 and 3)
Major steps in image production:
- calculate node depth
- create rows and columns
- disperse extra column (optional)
- sort columns
- space columns
- prevent vertical link overlap
- assign X and Y coordinates to nodes
- arrange nodes horizontally
Calculate Node Depth:
We use the input path files to create a single directed graph. We add each path individually to the graph, using existing nodes (nodes common to other paths) when required. Because the graph can include bidirectional data, we add the nodes of reverse paths to the graph in reverse order. Once all paths have been added, we calculate the maximum depth of each node. Create Rows And Columns: Each individual node is assigned a row position based upon its maximum depth in the graph once all paths have been added. The column position for each node is specified in the node input file. These position coordinates provide a framework for link overlap and link distance minimization algorithms. In particular, the spacing of node clusters at a single node depth within a column relies on preassigned row and column values. We apply our plot simplification algorithms to reposition node clusters in order to produce a plot that minimizes overlap and maximizes legibility. Disperse Extra Columns: As mentioned above, PlotPaths requires that you specify the column value for each node in the node input file. However, many real-world datasets can be ambiguous. Thus PlotPaths allows you to specify in the input file that the column value for a node is unknown. PlotPaths can handle nodes with an unknown column value in two ways. PlotPaths can produce a separate column for all nodes lacking a column attribute. Alternatively, PlotPaths can try to place the nodes in the unknown column into the specified column with which it shares the greatest number of links. In this scenario, PlotPaths is able to attempt placement of all unknown nodes except source and destination nodes. The column dispersion algorithm traverses from the bottom of the unknown column to the top. PlotPaths examines the parents and children of each node and assigns the unknown node to the column that contains the greatest number of its parents and children. In the event that two or more columns have identical maximum numbers of parents and children, PlotPaths chooses randomly among the tied columns. If an unknown node links only to other unknown nodes, it cannot be reassigned.



Examples
Actual Otter Examples: (requires java)
Java Documentation
You can find javadoc documentation here. This comprises a detailed description of the java implementation.
Source Code
Source code with sample scripts and input files. plotpaths.tar.gz.
Authors
Project Design:
- John Gallagher
- Bradley Huffaker
- John Gallagher
Acknowledgments:
This material is based upon work supported by the National Science Foundation under Grant No. 9996248 and Grant No. 9711092.
Please send any comments/suggestions you may have to info@caida.org.