Contents
In 2004, in response to NSF's Program Solicitation for Research in
Networking Technology and Systems (NeTS), CAIDA researchers proposed
a brand new area of routing research that moved toward mathematically rigorous next-generation routing protocols for realistic network topologies. The
new research applies key theoretical routing results from distributed
computation to the scaling problems we saw, and still see, in the
area of Internet routing. The research agenda specifies three related
and clearly defined tasks: 1) execute the next step on the path
toward construction of practically acceptable next-generation routing
protocols based on mathematically rigorous routing algorithms; 2)
validate the applicability of the above algorithms against several
sources of real Internet topology data; 3) build and evaluate a
model for Internet topology evolution, which reflects fundamental
laws of evolution of large-scale networks.
Internet topology analysis has recently experienced a surge of
activity in computer science, physics, mathematics, and statistics
communities. Notably, researchers often approach essentially
the same problems from different angles, but their findings are not
always complementary and sometimes even conflict, leading to
inconsistent conclusions. CAIDA believes interdisciplinary workshops
will help to move this area of research forward.
To help bring the most active researchers from the involved communities
together, CAIDA hosted the ISMA - Workshop on the Internet Topology (WIT). The main
objectives of the workshop were to promote synergy, enable
interdisciplinary cross-fertilization, and reveal common ground in
different approaches. At a high-level, the workshop focused on
recent advances, sought to resolve arguments, and identify open
problems that require further research. CAIDA plans to host further
workshops in routing and topology next year in 2007 and will post
details on the CAIDA Workshops
web page.
The first year of this research focused mostly on studies of the
Internet topology.[1,2,3,4,5] Since scalability characteristics of
routing algorithms depend strongly on topological properties of
underlying networks, we first concentrated on topology analysis.
Additionally, we conducted research on inference of AS relationships
as well as building an AS taxonomy and proposing an AS classification
scheme based on empirically observed differences among AS
characteristics.
One of the primary goals of routing research is to create
the most veracious models of the Internet topology as possible to
better understand the performance of network applications and
protocols. Several potential data sources exist to generate AS-level
topologies; the most frequently used include RouteViews BGP table
dumps (BTDs), traceroutes, and the WHOIS data. Our research introduces
new AS-topology discovery methodologies that use active measurements
derived from CAIDA's Macroscopic Topology Measurements project. The first application of the new
methodology delivered a surprising 61.5% more AS-links than the
BTD-derived AS-topologies.
Further research into the differences between topologies derived
from traceroute, BGP, and WHOIS shows that the BGP and traceroute
topologies show a distinct similarity but the WHOIS topology differs
substantially. We find that use of the joint degree
distribution of ASes provides the best metric for
characterizing Internet AS topologies and most narrowly defines
values for other important parameters. Additionally, we introduce
an evaluation criteria for determining the accuracy of topology
generators and clearly show that previous application of generators
that rely entirely on reproducing degree distributions cannot
capture the full spectrum of the fundamental topological
characteristics of the topologies derived from the sources listed
above.[2]
The ability to capture the fundamental characteristics that define
the Internet topology, and then apply this knowledge to produce a
tool that generates such a topology stands as a key component toward
the development and evaluation of new routing algorithms and protocols
and to the study of what drives Internet evolution.
Intuitively, one focuses on practically important characteristics
such as 1) distance (shortest path length) distribution, critical
to scalability of routing protocols; 2) spectrum (set of eigenvalues
of graphs adjacency matrix), important for network robustness and
performance analysis; and 3) betweenness (the number of shortest
paths passing through a node or link) distribution, important for
analyzing the hierarchical structure of the network and for estimating
the accuracy of traceroute-like explorations of the network when
developing methods for synthesizing topologies.
Brute force attempts at generating synthetic topologies that reproduce
any of the above three metrics do not work. We know of no known
algorithms for constructing a graph with a target distance distribution,
betweenness distribution, or spectrum. Furthermore, these approaches
suffer from a methodological problem whereby the discovery of new
metrics require updates to the underlying algorithm for the topology
generator.
So, CAIDA researchers postulated that one could reproduce most
simple characteristics that may not have direct practical importance
and, in the process, also capture other properties including
practically important attributes. More precisely, for any graph
G, we wish to identify a series of graph properties
that satisfy the requirements:
-
constructibility, we can construct graphs having these properties;
-
inclusion, any property Ρd subsumes all properties
Ρi with i = 0, ... , d - 1: that is,
a graph having property Ρd is guaranteed to
also have all properties Ρi for i <
d.
-
convergence: as d increases, the set of graphs having
property Ρd "converges" to G: that is, there exists a
value of index d = D such that all graphs having
property ΡD are isomorphic to G.
Our research shows that the connectivity
of the nodes in a network provide the most fundamental topology
metrics for use in synthetic topology generation. Table 1 below lists these
characteristics ordered by an increasing amount of information
about local connectivity structure of the network.
| Tag |
Name |
Degree correlations of nodes at distance |
| 0K | Average node degree | None |
| 1K | Degree distribution | 0 |
| 2K | Joint node degree distribution | 1 |
| 3K | Joint edge degree distribution | 2 |
| ... | ... | ... |
| (d+1)K | Full degree distribution | D |
Table 1: Connectivity characteristics of a network with maximum distance D.
Table 1 shows that as we increase the value of d in our
notation dK, where d represents the number of specified
nodes with degree correlations, we define distributions of degrees
of nodes located at a maximum distance D from each other.
In doing so, we gain information about the local structure of the
topology which allows a more accurate description of a node's
(increasingly wider) neighborhood. This area of research asks the
question: do we need to go all the way up to (d+1)K to
construct graphs that accurately reproduce values of commonly-used
graph metrics or can we obtain reasonable accuracy at a lower value
of d, such as 2 or 3?
To this end, we implemented efficient topology generators that
construct graphs for d = 0,1,2, and 3. We
find that these graphs reproduce, with increasing accuracy, important
properties of measured and modeled Internet topologies. In particular,
we find that the d = 2 case provides
sufficient accuracy for most practical purposes, while d = 3
reconstructs the Internet AS- and router-level topologies exactly. This
systematic method of analyzing and synthesizing topologies offers a
dramatic improvement to the set of tools available to network topology
and protocol researchers.[6]
As a part of the Macroscopic Topology Project CAIDA has developed the
infrastructure for continual traceroute-based Internet topology
measurements. During the first year of this effort, we extended the
CAIDA software to automatically map these measurements into AS
adjacency matrices representing the Internet graph at the AS level.
CAIDA posts the adjacency matrix of the Internet AS-level graph computed daily
from observed measurements.
As a traceroute-based tool, skitter provides a view of Internet
topology that differs from those derived from BGP tables, e.g.
RouteViews.
Because skitter data reflects packets that have actually traversed
a forward path to a destination, rather than paths calculated and
propagated across the loosely coupled BGP system, it is more likely
than BGP data in isolation to faithfully correspond to IP topology.
In addition to AS link discovery, our research includes development
of new methodologies for AS-link correction. These methodologies
generalize the problem of AS relationship inference as a multiobjective
optimization problem with node-degree-based corrections. The method
significantly reduces the number of invalid BGP paths. [7]
Many of our research efforts concerned with performance, robustness,
and evolution of the global Internet rely on accurate data describing
the structure of actual relationships among ASes. For these purposes,
CAIDA collects and analyzes, on an ongoing basis, AS-level topology
and AS relationships, and provides this inferred data for use by
the community. AS links annotated with AS relationships are available
for download at
http://as-rank.caida.org/data/.
Though the Internet AS-level topology has received fairly extensive
study over the past few years, details of the AS taxonomy have not.
Each AS that routes messages on the Internet can represent a wide
variety of organizations including but not limited to ISPs, small
private businesses, universities, government offices. These
organizations each have vastly different network characteristics,
external connectivity, network growth tendencies, and other properties
that we must understand to make progress on realistic modeling and
simulations of the Internet.
Statistical knowledge of the ASes in the Internet provides essential
data for identifying the types of ASes that drive AS number exhaustion
as well as modeling the structure and evolution of Internet topology.
Early application of machine learning techniques that analyzed and
indexed the short names and descriptions extracted from the Internet
Routing Registries (IRR) were used as the basis for AS classification
schemes for building an Internet AS taxonomy that resulted in as many
as 63.01% of ASes classified. [8]
CAIDA has more recently developed an AS classification scheme
resulting in the most veracious Internet AS taxonomy to date. Using
similar collection methods of data from IRRs and RouteViews, we
annotate every AS with the following six attributes: 1) the
organization description record, 2) the number of inferred customers,
3) the number of inferred providers , 4) the number of inferred
peers, 5) the number of advertised IP prefixes, and 6) the equivalent
number of /24 prefixes covering all the advertised IP space. Using
this method, we successfully classify 95.3% of ASes with an expected
accuracy of 78.1%. We release to the community the Autonomous System Taxonomy Repository augmented with: 1) the AS taxonomy information and 2)
the set of AS attributes we used to classify ASes.[9]
As originally proposed by CAIDA researchers, the progress in topology
generation, AS classification and link discovery research has
enabled progress in research on compact interdomain routing algorithms
and protocols.
Driven by empirically verifiable concerns over scalability
of the current routing system and its inherently bad match for the
naturally evolving structure of the Internet interdomain topology,
we considered the roots of the scalability problems with current
Internet interdomain routing and with all known proposals for future
Internet interdomain routing. We demonstrated that according to the
best available knowledge about Internet topology, a class of
algorithms known as compact routing algorithms offers the best
candidates for a potential solution. We formulated the following four
most important problems concerning the potential applicability of
compact routing to interdomain routing: stretch scaling,
scale-free routing, name-independence, and dynamic routing.
-
The stretch of a routing algorithm is defined as a worst-case
multiplicative path-length increase factor. Specifically, for every
pair of nodes in all graphs in the set of graphs the algorithm can
operate on, we find the ratio of the length of the shortest path
between the same pair of nodes. The maximum of this ratio among all
node pairs in all the graphs represents the algorithm's stretch. When
designing routing algorithms, we find a tradeoff between the network
stretch and the size of the Routing Table (RT).
-
The network stretch/RT size tradeoff relates to the next problem of scale-free networks.
The routing stretch produced by the current hierarchical approach to network routing
employed on the Internet only works optimally for networks where long paths prevail.
pertains to networks which have a fat-tail,
e.g., power-law, node degree distributions. These networks do not respond
well to the current hierarchical routing algorithms
-
Name-independence
-
Dynamic routing refers to a network whose member nodes may connect, disconnect, and reconnect
in varying geographic locations.
Accordingly, we started to work on the stretch scaling problem and
on the scale-free routing problem. Both tasks are currently in
progress.
We also started a new task dealing with
predictions of upcoming exhaustion of IPv4 and IPv6 address space
and AS numbers. (See Internet Identifier Consumption)
We have started to construct non-equilibrium network models reflecting
the Internet evolution. We seek a simple model that describes how
market forces acting on rational players result in the evolution
of the multi-faceted topology of the Internet that we see today.
We model the system based on two levels of dynamic engagement of
players--at the global level with clusters of ASes operating in fixed
regions, and at the regional level where each AS tries to make
appropriate decisions to increase its sphere of influence. Regions
appear on the map at a rate based on global market conditions. The
economic relationships between two regions are decided by the
dominant ASes in that region. Within each region, ASes appear at a
rate based on the economy of the region. The relationships between
these ASes follow from the economics of connectivity within the
region. We show that this two-level model naturally produces a
topology similar to the current AS-level topology. Based on measured
economic details, the model explains both the observed power-law
node degree distribution and the exact value of the power-law
exponent, which is 2.2. Using our model, we also demonstrate that
topological awareness of Internet players naturally leads to
consolidation of providers over time, unless some exogenous force
interferes. We validate our model using historical and current
measurements from BGP routing tables.
-
IRTF Routing Research Group's (RRG) Future Domain Routing (FDR) Scalability Research Subgroup (RR-FS)
- A. Brady and L. Cowen (Tufts University)
A comparison of the performance of name-independent and
name-dependent compact routing algorithms on both synthetic and measured Internet
topologies. We find that the name-independent addressing mechanisms incur only minor
performance overhead compared to the name-dependent cases.
- Priya Mahadevan, Amin Vahdat (UCSD)
Realistic topology generation and analysis using degree correlations: the dK-series.
- George Riley, Xenofontas Dimitropoulos(Georgia Tech)
Work on AS relationships, AS ranking, and AS taxonomy.
- S. Sreenivasan, P. Krapivsky (Boston University)
Pursuit of an analytic solution to the stretch scaling problem
- Kevin Fall (Intel Research) and Xiaowei Yang (MIT)
Implemented the universal stretch-3 routing scheme of Thorup and Zwick (TZ)
on a model of the Internet inter-AS graph, and showed that this
scheme's observed performance on this graph was qualitatively
better than the worst-case guarantees. This work suggests the
average stretch function may be an indirect indicator of the
optimization criteria influencing the Internet's inter-domain
topology evolution.
- Scott Shenker, M. Caesar and J. Kannan (UCB)
Work on virtual ring routing (VRR) on flat labels
- C. Papadimitriou, H. C. H. Lin(?)
Dense tree-based decomposition of scale-free graphs
- P. Fraigniaud, E. Lebhar (LRI at Universite de Paris-Sud)
Construction of a metric space, underlying a Kleinberg-like model, and an associated
link-generating distribution, such that the resulting graphs have power-law degree distributions.
- X. Dimitropoulos, D. Krioukov, G. Riley. "Revisiting Internet AS-Level Topology Discover", PAM 2005
- P. Mahadevan, D. Krioukov, M. Fomenkov, B. Huffaker, X. Dimitropoulos, kc claffy, A. Vahdat. "Lessons from Three Views of the Internet Topology", CAIDA-TR-2005-02, 2005
- P. Mahadevan, A. Vahdat, D. Krioukov, B. Huffaker, kc claffy. "Impact of Degree Correlations on Topology Generators", ACM SIGCOMM, 2005
- X. Dimitropoulos, G. Riley, D. Krioukov, R. Sundaram. "Towards a Topology Generator Modeling AS Relationships", IEEE International Conference on Network Protocols, 2005
- P. Mahadevan, A. Vahdat, D. Krioukov, M. Fomenkov, B. Huffaker, kc claffy, X. Dimitropoulos. "The Internet AS-Level Topology: Three Data Sources and One Definitive Metric", ACM SIGCOMM Computer Communications Review (CCR), 2005
- P. Mahadevan, D. Krioukov, K. Fall, A. Vahdat " Systematic Topology Analysis and Generation Using Degree Correlations", SIGCOMM, 2006
- X. Dimitropoulos, D. Krioukov, B. Huffaker, kc claffy, G. Riley. "Inferring AS Relationships: Dead End or Lively Beginning?", Workshop on Efficient and Experimental Algorithms (WEA), 2005
- X. Dimitropoulos, D. Krioukov, G. Riley, kc claffy. "Classifying the Types of Autonomous Systems in the Internet", SIGCOMM Poster, 2005
- X. Dimitropoulos, D. Krioukov, G. Riley, kc claffy. "Revealing the Autonomous System Taxonomy: The Machine Learning Approach", Passive and Active Network Measurement Workshop (PAM), 2006
