Toward Mathematically Rigorous Next-generation Routing Protocols for Realistic Network Topologies

Network routing research and the theoretical results thereof apply to many domains such as computer science, physics, mathematics, economics, social and political sciences. CAIDA's routing research focuses on applying key theoretical results in distributed computation theory to the development of protocols that will address issues of future Internet scalability. The research includes specific areas such as Autonomous Systems (AS) topology discovery and generation, AS relationship inference, taxonomy and classification, and pioneering research into new interdomain routing algorithms and protocols. CAIDA also conducts analysis on available Internet addressing and routing data in support of policy discussions regarding IPv4 address exhaustion, allocations, concentration of address ownership, and IPv6 adoption.

Introduction

In 2004, in response to NSF's Program Solicitation for Research in Networking Technology and Systems (NeTS), CAIDA researchers proposed a brand new area of routing research that moved toward mathematically rigorous next-generation routing protocols for realistic network topologies. The new research applies key theoretical routing results from distributed computation to the scaling problems we saw, and still see, in the area of Internet routing. The research agenda specifies three related and clearly defined tasks: 1) execute the next step on the path toward construction of practically acceptable next-generation routing protocols based on mathematically rigorous routing algorithms; 2) validate the applicability of the above algorithms against several sources of real Internet topology data; 3) build and evaluate a model for Internet topology evolution, which reflects fundamental laws of evolution of large-scale networks.

Workshops

Internet topology analysis has recently experienced a surge of activity in computer science, physics, mathematics, and statistics communities. Notably, researchers often approach essentially the same problems from different angles, but their findings are not always complementary and sometimes even conflict, leading to inconsistent conclusions. CAIDA believes interdisciplinary workshops will help to move this area of research forward.

To help bring the most active researchers from the involved communities together, CAIDA hosted the ISMA - Workshop on the Internet Topology (WIT). The main objectives of the workshop were to promote synergy, enable interdisciplinary cross-fertilization, and reveal common ground in different approaches. At a high-level, the workshop focused on recent advances, sought to resolve arguments, and identify open problems that require further research. CAIDA plans to host further workshops in routing and topology next year in 2007 and will post details on the CAIDA Workshops web page.

Topology

The first year of this research focused mostly on studies of the Internet topology.[1,2,3,4,5] Since scalability characteristics of routing algorithms depend strongly on topological properties of underlying networks, we first concentrated on topology analysis. Additionally, we conducted research on inference of AS relationships as well as building an AS taxonomy and proposing an AS classification scheme based on empirically observed differences among AS characteristics.

One of the primary goals of routing research is to create the most veracious models of the Internet topology as possible to better understand the performance of network applications and protocols. Several potential data sources exist to generate AS-level topologies; the most frequently used include RouteViews BGP table dumps (BTDs), traceroutes, and the WHOIS data. Our research introduces new AS-topology discovery methodologies that use active measurements derived from CAIDA's Macroscopic Topology Measurements project. The first application of the new methodology delivered a surprising 61.5% more AS-links than the BTD-derived AS-topologies.

Further research into the differences between topologies derived from traceroute, BGP, and WHOIS shows that the BGP and traceroute topologies show a distinct similarity but the WHOIS topology differs substantially. We find that use of the joint degree distribution of ASes provides the best metric for characterizing Internet AS topologies and most narrowly defines values for other important parameters. Additionally, we introduce an evaluation criteria for determining the accuracy of topology generators and clearly show that previous application of generators that rely entirely on reproducing degree distributions cannot capture the full spectrum of the fundamental topological characteristics of the topologies derived from the sources listed above.[2]

Realistic Topology Generation: The dK-series

The ability to capture the fundamental characteristics that define the Internet topology, and then apply this knowledge to produce a tool that generates such a topology stands as a key component toward the development and evaluation of new routing algorithms and protocols and to the study of what drives Internet evolution.

Intuitively, one focuses on practically important characteristics such as 1) distance (shortest path length) distribution, critical to scalability of routing protocols; 2) spectrum (set of eigenvalues of graphs adjacency matrix), important for network robustness and performance analysis; and 3) betweenness (the number of shortest paths passing through a node or link) distribution, important for analyzing the hierarchical structure of the network and for estimating the accuracy of traceroute-like explorations of the network when developing methods for synthesizing topologies.

Brute force attempts at generating synthetic topologies that reproduce any of the above three metrics do not work. We know of no known algorithms for constructing a graph with a target distance distribution, betweenness distribution, or spectrum. Furthermore, these approaches suffer from a methodological problem whereby the discovery of new metrics require updates to the underlying algorithm for the topology generator.

So, CAIDA researchers postulated that one could reproduce most simple characteristics that may not have direct practical importance and, in the process, also capture other properties including practically important attributes. More precisely, for any graph G, we wish to identify a series of graph properties that satisfy the requirements:

constructibility, we can construct graphs having these properties;
inclusion, any property Ρ_d subsumes all properties Ρ_i with i = 0, ... , d - 1: that is, a graph having property Ρ_d is guaranteed to also have all properties Ρ_i for i < d.
convergence: as d increases, the set of graphs having property Ρ_d "converges" to G: that is, there exists a value of index d = D such that all graphs having property Ρ_D are isomorphic to G.

Our research shows that the connectivity of the nodes in a network provide the most fundamental topology metrics for use in synthetic topology generation. Table 1 below lists these characteristics ordered by an increasing amount of information about local connectivity structure of the network.

Tag	Name	Degree correlations of nodes at distance
0K	Average node degree	None
1K	Degree distribution	0
2K	Joint node degree distribution	1
3K	Joint edge degree distribution	2
...	...	...
(d+1)K	Full degree distribution	D

Table 1: Connectivity characteristics of a network with maximum distance D.

Table 1 shows that as we increase the value of d in our notation dK, where d represents the number of specified nodes with degree correlations, we define distributions of degrees of nodes located at a maximum distance D from each other. In doing so, we gain information about the local structure of the topology which allows a more accurate description of a node's (increasingly wider) neighborhood. This area of research asks the question: do we need to go all the way up to (d+1)K to construct graphs that accurately reproduce values of commonly-used graph metrics or can we obtain reasonable accuracy at a lower value of d, such as 2 or 3?

To this end, we implemented efficient topology generators that construct graphs for d = 0,1,2, and 3. We find that these graphs reproduce, with increasing accuracy, important properties of measured and modeled Internet topologies. In particular, we find that the d = 2 case provides sufficient accuracy for most practical purposes, while d = 3 reconstructs the Internet AS- and router-level topologies exactly. This systematic method of analyzing and synthesizing topologies offers a dramatic improvement to the set of tools available to network topology and protocol researchers.[6]

AS Adjacencies

As a part of the Macroscopic Topology Project CAIDA has developed the infrastructure for continual traceroute-based Internet topology measurements. During the first year of this effort, we extended the CAIDA software to automatically map these measurements into AS adjacency matrices representing the Internet graph at the AS level. CAIDA posts the adjacency matrix of the Internet AS-level graph computed daily from observed measurements.

As a traceroute-based tool, skitter provides a view of Internet topology that differs from those derived from BGP tables, e.g. RouteViews. Because skitter data reflects packets that have actually traversed a forward path to a destination, rather than paths calculated and propagated across the loosely coupled BGP system, it is more likely than BGP data in isolation to faithfully correspond to IP topology.

AS Relationships

In addition to AS link discovery, our research includes development of new methodologies for AS-link correction. These methodologies generalize the problem of AS relationship inference as a multiobjective optimization problem with node-degree-based corrections. The method significantly reduces the number of invalid BGP paths. [7]

Many of our research efforts concerned with performance, robustness, and evolution of the global Internet rely on accurate data describing the structure of actual relationships among ASes. For these purposes, CAIDA collects and analyzes, on an ongoing basis, AS-level topology and AS relationships, and provides this inferred data for use by the community. AS links annotated with AS relationships are available for download at http://as-rank.caida.org/data/.

AS Classification and Taxonomy

Though the Internet AS-level topology has received fairly extensive study over the past few years, details of the AS taxonomy have not. Each AS that routes messages on the Internet can represent a wide variety of organizations including but not limited to ISPs, small private businesses, universities, government offices. These organizations each have vastly different network characteristics, external connectivity, network growth tendencies, and other properties that we must understand to make progress on realistic modeling and simulations of the Internet.

Statistical knowledge of the ASes in the Internet provides essential data for identifying the types of ASes that drive AS number exhaustion as well as modeling the structure and evolution of Internet topology. Early application of machine learning techniques that analyzed and indexed the short names and descriptions extracted from the Internet Routing Registries (IRR) were used as the basis for AS classification schemes for building an Internet AS taxonomy that resulted in as many as 63.01% of ASes classified. [8]

CAIDA has more recently developed an AS classification scheme resulting in the most veracious Internet AS taxonomy to date. Using similar collection methods of data from IRRs and RouteViews, we annotate every AS with the following six attributes: 1) the organization description record, 2) the number of inferred customers, 3) the number of inferred providers , 4) the number of inferred peers, 5) the number of advertised IP prefixes, and 6) the equivalent number of /24 prefixes covering all the advertised IP space. Using this method, we successfully classify 95.3% of ASes with an expected accuracy of 78.1%. We release to the community the Autonomous System Taxonomy Repository augmented with: 1) the AS taxonomy information and 2) the set of AS attributes we used to classify ASes.[9]

Interdomain Routing with Compact Algorithms and Protocols

As originally proposed by CAIDA researchers, the progress in topology generation, AS classification and link discovery research has enabled progress in research on compact interdomain routing algorithms and protocols.

Driven by empirically verifiable concerns over scalability of the current routing system and its inherently bad match for the naturally evolving structure of the Internet interdomain topology, we considered the roots of the scalability problems with current Internet interdomain routing and with all known proposals for future Internet interdomain routing. We demonstrated that according to the best available knowledge about Internet topology, a class of algorithms known as compact routing algorithms offers the best candidates for a potential solution. We formulated the following four most important problems concerning the potential applicability of compact routing to interdomain routing: stretch scaling, scale-free routing, name-independence, and dynamic routing.

The stretch of a routing algorithm is defined as a worst-case multiplicative path-length increase factor. Specifically, for every pair of nodes in all graphs in the set of graphs the algorithm can operate on, we find the ratio of the length of the shortest path between the same pair of nodes. The maximum of this ratio among all node pairs in all the graphs represents the algorithm's stretch. When designing routing algorithms, we find a tradeoff between the network stretch and the size of the Routing Table (RT).
The network stretch/RT size tradeoff relates to the next problem of scale-free networks. The routing stretch produced by the current hierarchical approach to network routing employed on the Internet only works optimally for networks where long paths prevail. pertains to networks which have a fat-tail, e.g., power-law, node degree distributions. These networks do not respond well to the current hierarchical routing algorithms
Name-independence
Dynamic routing refers to a network whose member nodes may connect, disconnect, and reconnect in varying geographic locations.

Accordingly, we started to work on the stretch scaling problem and on the scale-free routing problem. Both tasks are currently in progress.

We also started a new task dealing with predictions of upcoming exhaustion of IPv4 and IPv6 address space and AS numbers. (See Internet Identifier Consumption)

Evolution

We have started to construct non-equilibrium network models reflecting the Internet evolution. We seek a simple model that describes how market forces acting on rational players result in the evolution of the multi-faceted topology of the Internet that we see today. We model the system based on two levels of dynamic engagement of players--at the global level with clusters of ASes operating in fixed regions, and at the regional level where each AS tries to make appropriate decisions to increase its sphere of influence. Regions appear on the map at a rate based on global market conditions. The economic relationships between two regions are decided by the dominant ASes in that region. Within each region, ASes appear at a rate based on the economy of the region. The relationships between these ASes follow from the economics of connectivity within the region. We show that this two-level model naturally produces a topology similar to the current AS-level topology. Based on measured economic details, the model explains both the observed power-law node degree distribution and the exact value of the power-law exponent, which is 2.2. Using our model, we also demonstrate that topological awareness of Internet players naturally leads to consolidation of providers over time, unless some exogenous force interferes. We validate our model using historical and current measurements from BGP routing tables.

Collaboration

IRTF Routing Research Group's (RRG) Future Domain Routing (FDR) Scalability Research Subgroup (RR-FS)

A. Brady and L. Cowen (Tufts University)

A comparison of the performance of name-independent and name-dependent compact routing algorithms on both synthetic and measured Internet topologies. We find that the name-independent addressing mechanisms incur only minor performance overhead compared to the name-dependent cases.
Priya Mahadevan, Amin Vahdat (UCSD)

Realistic topology generation and analysis using degree correlations: the dK-series.
George Riley, Xenofontas Dimitropoulos(Georgia Tech)

Work on AS relationships, AS ranking, and AS taxonomy.
S. Sreenivasan, P. Krapivsky (Boston University)

Pursuit of an analytic solution to the stretch scaling problem
Kevin Fall (Intel Research) and Xiaowei Yang (MIT)

Implemented the universal stretch-3 routing scheme of Thorup and Zwick (TZ) on a model of the Internet inter-AS graph, and showed that this scheme's observed performance on this graph was qualitatively better than the worst-case guarantees. This work suggests the average stretch function may be an indirect indicator of the optimization criteria influencing the Internet's inter-domain topology evolution.
Scott Shenker, M. Caesar and J. Kannan (UCB)

Work on virtual ring routing (VRR) on flat labels
C. Papadimitriou, H. C. H. Lin(?)

Dense tree-based decomposition of scale-free graphs
P. Fraigniaud, E. Lebhar (LRI at Universite de Paris-Sud)

Construction of a metric space, underlying a Kleinberg-like model, and an associated link-generating distribution, such that the resulting graphs have power-law degree distributions.

References

X. Dimitropoulos, D. Krioukov, G. Riley. "Revisiting Internet AS-Level Topology Discover", PAM 2005
P. Mahadevan, D. Krioukov, M. Fomenkov, B. Huffaker, X. Dimitropoulos, kc claffy, A. Vahdat. "Lessons from Three Views of the Internet Topology", CAIDA-TR-2005-02, 2005
P. Mahadevan, A. Vahdat, D. Krioukov, B. Huffaker, kc claffy. "Impact of Degree Correlations on Topology Generators", ACM SIGCOMM, 2005
X. Dimitropoulos, G. Riley, D. Krioukov, R. Sundaram. "Towards a Topology Generator Modeling AS Relationships", IEEE International Conference on Network Protocols, 2005
P. Mahadevan, A. Vahdat, D. Krioukov, M. Fomenkov, B. Huffaker, kc claffy, X. Dimitropoulos. "The Internet AS-Level Topology: Three Data Sources and One Definitive Metric", ACM SIGCOMM Computer Communications Review (CCR), 2005
P. Mahadevan, D. Krioukov, K. Fall, A. Vahdat " Systematic Topology Analysis and Generation Using Degree Correlations", SIGCOMM, 2006
X. Dimitropoulos, D. Krioukov, B. Huffaker, kc claffy, G. Riley. "Inferring AS Relationships: Dead End or Lively Beginning?", Workshop on Efficient and Experimental Algorithms (WEA), 2005
X. Dimitropoulos, D. Krioukov, G. Riley, kc claffy. "Classifying the Types of Autonomous Systems in the Internet", SIGCOMM Poster, 2005
X. Dimitropoulos, D. Krioukov, G. Riley, kc claffy. "Revealing the Autonomous System Taxonomy: The Machine Learning Approach", Passive and Active Network Measurement Workshop (PAM), 2006