Center for Applied Internet Data Analysis
tools : utilities : topostats
topostats    topostats is a package of programs that calculate various statistics on network topologies (that is, on graphs). The computed statistics are defined in the paper "Lessons from Three Views of the Internet Topology: Technical Report", which also gives a sample of the computed statistics. The topostats package computes most but not all of these statistics, and this package does not itself make any plots.

Also available on GitHub

## Description of results

The statistics currently output by these tools are:

• Number of nodes
The count of unique nodes found in the graph
• Number of edges
The count of undirected edges between nodes found in the graph
• Average node degree
The sum of the degrees for each node, divided by the number of nodes
• Maximum node degree
The largest degree found across all nodes
• Degree distribution exponent
The calculated exponent of a power law distribution that best fits the distribution of degrees in the graph
• Power-law maximum degree
The natural cutoff value of the power law distribution that best fits the distribution of degrees in the graph, calculated from the degree distribution exponent
• Normalized average average neighbor degree
The value of each node's average neighbor degree (the sum of its neighbors' degrees, divided by the number of neighbors), summed over all nodes and divided by the number of nodes, then further divided by the number of nodes minus 1
• Normalized maximum average neighbor degree
The maximum value of average neighbor degree (the sum of a node's neighbors' degrees, divided by the number of neighbors) across all nodes, divided by the number of nodes minus 1
• Assortative coefficient
The Pearson correlation coefficient, which indicates the amount of relationship between nodes of similar or dissimilar degrees
• Mean clustering
The local clustering of each node (defined as the ratio of the number of links between a node's neighbors to the maximum possible number of links), summed over all nodes and divided by the total number of nodes.
• Clustering coefficient
The percentage of 3-cycles (caused by three nodes that are all connected to each other) among all the connected node triplets in the graph
• Top clique size
The largest number of nodes, when sorted by descending degree, that are completely connected to each other
• Minimum node coreness
The smallest coreness (defined as the maximum k for which, after nodes with degree k are iteratively removed from the graph, the node is still in the graph) across all nodes
• Average node coreness
The coreness (defined as the maximum k for which, after nodes with degree k are iteratively removed from the graph, the node is still in the graph) of all nodes, divided by the number of nodes
• Maximum node coreness
The largest coreness (defined as the maximum k for which, after nodes with degree k are iteratively removed from the graph, the node is still in the graph) across all nodes
• Core size
The number of nodes that have maximum coreness (defined above)
• Minimum degree in core
The smallest degree (prior to node removal) found amongst those nodes with maximum coreness (defined above)
• Fringe size
The number of nodes that have minimum coreness (defined above)
• Maximum degree in fringe
The largest degree (prior to node removal) found amongst those nodes with minimum coreness (defined above)
• Average distance
The number of hops between all pairs of nodes, summed and divided by the number of node pairs.
• Standard deviation of distance
The standard deviation of the distribution of the number of hops between all pairs of nodes
• Average eccentricity
The eccentricity (defined as the maximum distance from one node to any other node) of each node, summed and divided by the number of nodes
The minimum eccentricity (defined above) across all nodes
• Minimum degree in graph center
The smallest degree amongst all nodes that have minimum eccentricity (defined above)
• Maximum degree in graph periphery
The largest degree amongst all nodes that have maximum eccentricity (defined above)
• Minimum node betweeness
The smallest betweenness (which measures how often a node or edge is on the shortest path between other nodes) across all nodes
• Average node betweeness
The betweenness (which measures how often a node or edge is on the shortest path between other nodes) of all nodes, summed and divided by the number of nodes
• Maximum node betweeness
The largest betweenness (which measures how often a node or edge is on the shortest path between other nodes) across all nodes
• Minimum edge betweeness
The smallest betweenness (which measures how often a node or edge is on the shortest path between other nodes) across all edges
• Average edge betweeness
The betweenness (which measures how often a node or edge is on the shortest path between other nodes) of all edges, summed and divided by the number of edges
• Maximum edge betweeness
The largest betweenness (which measures how often a node or edge is on the shortest path between other nodes) across all edges

## Tools and example output

The included tools are

• topology_stats: calculates many simple statistics,
• distance: calculates statistics on the distance between all nodes in the graph,
• betweenness: calculates statistics on the betweenness of all nodes/links in the graph, and
• components: a utility for viewing all connected components in a graph and for optionally extracting the largest connected component into a separate graph file.

Shown below is the output of these tools on the skitter dataset from the data supplement to "Lessons from Three Views of the Internet Topology: Technical Report".

topology_stats output:

```Number of nodes:	9204
Number of edges:	28959
Avg node degree:	6.29269882659713
Max node degree:	2070
Degree dist exponent (via CCDF) [warning: can be inaccurate]:	2.13036587884151
Power-law maximum degree [warning: can be inaccurate]:	3212
Normalized avg avg neighbor degree:	0.0469423442421394
Normalized max avg neighbor degree:	0.0530352021237445
Assortative coefficient:	-0.235624710744895
Mean clustering:	0.456656295004884
Clustering coefficient:	0.0257810954106928
Top clique size:	16
Min node coreness:	0
Avg node coreness:	2.22761842677097
Max node coreness:	27
Core size:	47
Min degree in core:	68
Fringe size:	2460
Max degree in fringe:	5
```

distance output:

```loaded 9204 nodes, 28959 undirected links, 42352206 pairs
... decreasing radius from 0 to 5 with node 1
... raising diameter from 0 to 5 with node 1
... raising diameter from 5 to 6 with node 9
... decreasing radius from 5 to 4 with node 22
... raising diameter from 6 to 7 with node 3287
... decreasing radius from 4 to 1 with node 21372
average distance = 3.115
std deviation of distance = 0.635
average eccentricity = 5.108
graph diameter = 7
min degree in graph center = 1
max degree in graph periphery = 1
```

betweenness output:

```loaded 9204 nodes, 28959 undirected links, 42352206 pairs
min node betweenness = 0.0000e+00
average node betweenness = 2.2997e-04
max node betweenness = 2.4114e-01
min edge betweenness = 2.3617e-08
average edge betweenness = 1.0760e-04
max edge betweenness = 8.6025e-03
```

components output:

```loaded 9204 nodes, 28959 undirected links
component at node 1: 9200 nodes, 28957 undirected links
component at node 21372: 2 nodes, 1 undirected links
component at node 21437: 2 nodes, 1 undirected links
3 components; largest at node ID 1
```