Routing information is the most basic and, perhaps, the most complicated function that networks perform. Conventional wisdom states that to find paths to destinations through the complex network maze, nodes must communicate and exchange information about the status of their connections to other nodes, since without some knowledge of changing network connectivity, it is not possible to successfully route information through the network.
In the Internet, this required inter-node communication makes routing both expensive and fragile. The recent Internet Architecture Board report on routing and addressing (RFC4984) identifies convergence costs of deployed routing protocols as one of the most serious scaling issues with the existing Internet routing architecture, aggravated by explosive rates of routing table size growth. Worse yet, in our recent review of compact routing described below, we show that the required number of messages for routing state to converge on small-world networks cannot scale better than linearly with network size for any routing algorithm.
In many other real networks however, nodes can efficiently communicate, even though they do not exchange any information about the current global state of the network topology. In 1969, Stanley Milgram performed the following experiment. He asked some random individuals (sources) to send a letter to a specific person (the destination), described by his name, occupation, age, and the city he lived in. The sources were asked to pass the letter to friends chosen to maximize the probability of the letter reaching its destination. Approximately 30% of the letters reached their destination, even though nodes in this routing example had no global knowledge of the human acquaintance network topology, except their local connections and some characteristics (e.g., occupation, age, city of dwelling) of their connections.
Much later, Jon Kleinberg offered the first popular explanation of this surprising effect. In his model, each node, in addition to being a part of the graph representing the global network topology, resides in a coordinate space -- a grid embedded in the Euclidean plane. The coordinates of a node in the plane, its address, abstracts the information about the destination in Milgram's experiments. Each node knows: 1) its coordinates; 2) the coordinates of its neighbors; and 3) the coordinates of the destination written on the packet. Given these three pieces of information, the node can route greedily by selecting the direct neighbor closest to the destination in the plane.
Clearly, the described greedy routing strategy can be efficient only if the network topology is in some way congruent with the underlying space. Kleinberg indeed finds that the paths produced by greedy routing are polylogarithmically short only if the probability that there is a link between a pair of nodes depends in a very specific way on the distance between the two nodes in the plane. This finding implies that greedy routing cannot be equally efficient on any arbitrary network topology built over a given underlying space. Only certain topologies, congruent with the underlying space, will perform well.
Given that the network topology is so critically important, the Kleinberg model stands closer to the beginning of an explanation for Milgram's experiment than to its end. The model does not (try to) reproduce the basic topological properties of social networks through which letters were traveling in Milgram's experiments. For instance, the Kleinberg model produces k-regular graphs while social networks, the Internet, and many other complex networks are known to be scale-free, meaning that: i) the distribution P(k) of node degrees k in a network follows power laws P(k) ~ k-γ with exponent γ lying between 2 and 3; and ii) the network has strong clustering, i.e., a large number of triangular subgraphs.
Extending the Kleinberg's approach, we assume in this project that nodes in the Internet and other complex networks exist in some spaces that underlie the observed network topologies. We call these spaces hidden metric spaces. The observed network topology is coupled to the hidden space geometry in the following way: a link between two nodes in the topology exists with a certain probability that depends on the distance between two nodes in the hidden geometry. In the most general settings, hidden distances abstract intrinsic similarities between nodes. The assumption that more similar nodes are more likely to be connected makes the connection probability a decreasing function of the hidden distance.
The ultimate goal of this project is to find the hidden spaces underlying real networks and to identify node coordinates in them. Unfortunately, it is quite difficult to find something if one does not know what it is. Therefore, we have developed the following step-by-step research programme that we believe is optimal to achieve our ultimate goal:
- Obtain empirical evidence that hidden spaces do exist and that they are metric indeed;
- Identify navigability mechanisms that influence the efficiency of greedy routing in complex networks;
- Find the basic geometrical and topological properties of hidden spaces that make them maximally congruent with respect to the identified navigability mechanisms;
- Obtain empirical evidence that hidden spaces under real networks do possess these properties; and
- Find mappings of nodes in real networks to the identified spaces or their models.
The results obtained so far are described below.
- D. Krioukov, kc claffy, K. Fall, and A. Brady
On Compact Routing for the Internet
Published in the ACM SIGCOMM Computer Communication Review (CCR), v.37, n.3, pp. 41-52, 2007.
Here we review compact routing and discuss its applicability to routing in the Internet.
Compact routing is the routing theory that studies the fundamental limits for routing efficiency, and tries to construct routing algorithms that meet those limits. Measures of routing efficiency include the routing table size, routing path stretch, and communication overhead, often estimated by the number of messages required for the algorithm to converge upon a network topology change. The global network topology knowledge is usually assumed, and all the efficiency parameters are estimated in the worst case, across all possible network topologies on which the routing algorithm correctly operates.
The most pessimistic fact from this theory is that there can exist no routing algorithm that would be able to converge with the number of control messages growing slower than linearly with the network size in the worst case. The most pessimistic finding in this paper is that the small-world topologies are this worst case. Almost all complex network topologies, including the Internet, are small-world: the average shortest path length in them grows (sub)logarithmically with the network size.
- M. Angeles Serrano, D. Krioukov, and M. Boguna
Self-Similarity of Complex Networks and Hidden Metric Spaces
Published in Physical Review Letters, vol. 100, no. 078701, in February 2008.
Here we obtain empirical evidence that metric spaces do underlie real networks.
We do so the following way. We consider the simplest possible degree-thresholding network renormalization procedure: given a network, obtain its subnetworks by throwing out all nodes of degrees less than a certain threshold. We apply this procedure to real networks and find that the degree distribution, degree correlation, and clustering are self-similar: both before and after renormalization, all these characteristics follow the same curves. We then randomize the networks. We randomly rewire links such that the degree distribution is preserved. In these randomized networks, the degree distribution and correlations remain self-similar, but clustering does not.
We then introduce the model of scale-free networks with the simplest hidden metric space (a circle) underneath, generate synthetic networks in this model, perform all the same operations described above with these synthetic networks, and find that they exhibit qualitatively the same effects as the real networks. Given the intimate relationship between the triangle inequality in the hidden geometry (transitivity of being close) and clustering in the observed topology (transitivity of being connected), these findings provide a strong evidence that metric spaces do underlie real networks, including the Internet, and are a plausible explanation of their self-similarity.
An interesting and fundamentally important "by-product" of these results is that they offer a new perspective on self-similarity of complex networks. These networks turn out to be self-similar with respect to distance rescaling in the hidden space, because this distance rescaling is equivalent to the degree-thresholding renormalization described above.
- M. Boguna, D. Krioukov, and kc claffy
Navigability of Complex Networks
Published online in Nature Physics, vol. 5, no. 1, pp.74-80, January 2009.
Here we discuss the general mechanisms behind navigability of complex networks and show that the structural characteristics of real networks correspond to most navigable configurations.
Specifically, we show that more navigable networks are networks with more heterogenous degree distributions (γ's closer to 2) and stronger clustering. The first property is easy to explain: higher heterogeneity implies relatively more hubs in the network that boost up its navigability. Clustering is trickier. Networks with stronger clustering are those that have stronger ties between the network topology and hidden geometry. More formally, in such networks, connections between nodes close in the hidden space are more preferred, or equivalently, the connection probability function decreases faster with the hidden distance. The stronger the ties between the network topology and hidden geometry, the more congruent they are, the closer and more successfully the greedy routing paths follow the shortest paths in the network.
- M. Boguna and D. Krioukov
Navigating Ultrasmall Worlds in Ultrashort Time
Published in Physical Review Letters, vol.102, no. 058701, 2009.
Here we show that greedy routing finds asymptotically the shortest paths in networks with scale-free degree distributions and strong clustering.
Given that topologies of many real networks, including the Internet, do have these properties, our findings imply, surprisingly, that, even without any global knowledge of network topology, nodes in complex networks can propagate information along the shortest routes. In other words, topologies of many real networks have a peculiar structure that guarantees that the lack of global topological awareness imposes asymptotically no impact on the structure of information flows in the network: with or without the global topology knowledge, information can flow along the shortest routes. There are other, regular networks, such as grids, that also possess these properties, but they require specific embeddings into specific spaces. Greedy routes on scale-free networks, on the other hand, are shortest regardless the specifics of a hidden metric space or connection probability.
Complex networks thus have the structure that allows them to perform, in the most efficient way, one of their most basic and common functions: to propagate or signal information to specific targets through a complex network maze whose global connectivity is unknown to any node. Our findings have optimistic practical implications as they open up a possibility to find shortest-path routing strategies for the Internet that would not require any global topology knowledge.
- D. Krioukov, F. Papadopoulos, M. Boguna, and A. Vahdat
Efficient Navigation in Scale-Free Networks Embedded in Hyperbolic Metric Spaces
Archived in arXiv cond-mat.stat-mech/.1266.
Here we establish the main geometric property of hidden spaces that make them maximally congruent with respect to the described mechanisms behind navigability of complex networks. We argue that these spaces are hyperbolic, i.e., negatively curved.
Specifically, we consider the simplest example of a compact hyperbolic space, a hyperbolic disc, and show that if nodes are uniformly distributed in it, then the scale-free (power-law) network topologies naturally emerge in these settings as a consequence of negative curvature in the hidden space. Greedy routing is maximally efficient in these settings, too. Almost all paths are shortest and successfully reach destinations. If we perturb topologies by removing links from them, thus emulating network dynamics, the efficiency of greedy routing deteriorates very slowly. Removal of up to 10% of the links degrades the percentage of successful paths by less than 1%.
The main reason for such outstanding efficiency is the exceptional congruency between the metric structures of scale-free topologies and hyperbolic geometries. The shortest paths in the network follow very closely the hyperbolic geodesics between the source and destination, and there are many shortest paths between the same source and destination that all do so. Link removals affect some of these shortest paths but not all of them.
The reason why the hidden spaces under real networks are negatively curved is as follows. Complex networks connect distinguishable, heterogenous elements abstracted as nodes. Understood broadly, this heterogeneity implies that there is at least some taxonomy of elements, meaning that all nodes can be somehow classified. In most general settings, this classification implies that nodes can be split in large groups consisting of smaller subgroups, which in turn consist of even smaller subsubgroups, etc. The relationships between such groups and subgroups can thus be approximated by tree-like structures, sometimes called dendrograms, that represent hidden hierarchies in networks. But the geometries of trees and hyperbolic spaces are intimately related. Trees embed with minimal, logarithmic, distortion into hyperbolic spaces, while they embed with exponential distortion into Euclidean spaces. Colloquially, hyperbolic spaces are continuous versions of trees. The metric structures of both are essentially the same. In other words, the hierarchical organization of complex networks translates into the negative curvature of hidden metric spaces.
Other FIND-funded papers
- X. Dimitropoulos, D. Krioukov, A. Vahdat, and G. Riley
Graph Annotations in Modeling Complex Network Topologies
Archived in arXiv cs.NI/.3879.
Here we develop a methodology to generate annotated complex network topologies of different sizes.
We focus on the AS-level Internet with AS links annotated by business relationships between ASs inferred in our previous work. The methodology starts with extracting a set of specific properties of the Internet topology that remain the same during its evolution. This set of properties is contained in the correlation profile of the annotated Internet. We first extend the generic dK-series formalism (also developed in our previous work) to apply to annotated graphs. By annotated graphs we essentially mean colored graphs, except that each link can be colored not by one but by two colors: each stub (half-link) is colored by a different color. These colors can be AS relationships. For example, provider=red, customer=green, and peer=blue, so that links connecting customers to providers are red-green, while links between peers are all blue.
We extract different kind of correlations from the real Internet topologies annotated by AS relationships. The 1K-correlations are correlations among colors of stubs attached to nodes, i.e., the joint distribution of colors showing how many nodes with x red stubs, y green stubs, and z blue stubs the network has. The 2K-correlations tell us how many differently colored links connecting nodes of different colored degrees the network has. It turns out that these types of correlations fully charcterzied the structure of the annotated AS Internet.
We then decompose these joint distributions into a collections of marginals and the normalized correlation profiles using copulas, a technique from statistics specifically designed to do so. We then rescale separately the marginals, by preserving their 1-dimensional shapes, and correlations, by using copulas, to obtain the corresponding joint distributions for target synthetic networks of desired sizes. By construction, these networks have the same correlation profile as the annotated Internet, so that they reproduce the topological invariants of its evolution.
- P. Krapivsky and D. Krioukov
Scale-Free Networks as Pre-Asymptotic Regimes of Super-Linear Preferential Attachment
Published in Physical Review E, v. 78, 026114, 2008.
Here we show that scale-free networks, including the Internet, may exist in pre-asymptotic regimes of network evolution processes that do not have any scale-free, self-similar structures as their asymptotes.
Specifically, we study the Positive Feedback Preference model, which is a growing network model that, among growing models, reproduced the AS topology of the Internet most accurately according to a long list of topological characteristics. However, the model is based on super-liner preferential attachment, which is known to lead to not scale-free but star-like graphs in the limit of large graph sizes.
We resolve this paradox by showing that the model has a vast pre-asymptotic regime, in which it is capable, according to our analysis confirmed by extensive simulations, produce scale-free networks. The asymptotic, star-like graph regime starts, given the specific set of model parameters, only at network sizes of the order of 1017.
These findings have potential to impact both practice and theory of network science. In practice, if we want to construct topology generators of growing Internet-like networks, we no longer have to limit ourselves to mechanisms, such as preferential attachment, that quickly, for small network sizes, achieve power laws as their asymptotes. In theory, the research area on what growth mechanisms produces scale-free networks not in their asymptotic but pre-asymptotic regimes is virtually untouched.