The Cooperative Association for Internet Data Analysis
CAIDA's Annual Report for 2010
A report on CAIDA research initiatives, project progress and results, data sets, tool development, publications, presentations, workshops, web site statistics, funding sources, and operating expenses for 2010.

Mission Statement: CAIDA investigates practical and theoretical aspects of the Internet, focusing on activities that:

  • provide insight into the macroscopic function of Internet infrastructure, behavior, usage, and evolution,
  • foster a collaborative environment in which data can be acquired, analyzed, and (as appropriate) shared,
  • improve the integrity of the field of Internet science,
  • inform science, technology, and communications public policies.


Executive Summary

This annual report covers CAIDA's activities in 2010, summarizing highlights from our research, infrastructure, data-sharing and outreach activities. Our current research projects span topology, routing, traffic, economics, and policy. Our infrastructure activities support several measurement-based studies of the Internet's core infrastructure, with focus on the health and integrity of the global Internet's topology, routing, addressing, and naming systems.

We continue to make progress on our Internet topology research agenda, supported by the expanding Ark measurement infrastructure. We collect and share the largest Internet topology data sets (IPv4 and IPv6) available to academic researchers, and we share many aggregated annotated derivative data sets publicly, including rankings of ISPs annotated with (our estimated) business relationships between autonomous networks. Our topology measurement platform supports IPv6, and by the end of 2010, 15 of our hosting sites provided IPv6 connectivity and topology measurements. We are still improving existing techniques and developing and testing new technology for IP address alias resolution for large Internet graphs, and will release a paper and implementation of our algorithms in 2011. Using these new techniques, we collected, analyzed, processed and released three Internet Topology Data Kit (ITDK) Datasets, reflecting measurements taken in January, April, and July 2010. Each 2010 ITDK includes: two related router-level topologies; router-to-AS assignments; geographic location of each router; and DNS lookups of all observed IP addresses. We will be augmenting ITDKs with additional meta-data in 2011.

On the theoretical side of topology research, we developed a geometric model to study the structure and function of complex networks. This model assumes one of our discoveries last year, that hyperbolic geometry seems to underlie many complex networks. If true, then the heterogeneous degree distributions and strong clustering that characterize so many complex networks emerge naturally as simple reflections of the negative curvature and metric property of the underlying hyperbolic geometry. The mathematically inclined will appreciate another accomplishment this year -- we established a mapping between our geometric framework and the statistical mechanics of complex networks.

Our study of real-world Internet topology has always enriched our routing research agenda, a long-term objective of which is to enable dramatically more scalable global Internet routing. We are still exploring the ramifications of our exciting discoveries regarding greedy routing on networks with underlying hyperbolic metric spaces. This year we showed that this type of routing can be maximally efficient and remarkably robust even in the face of damage to the network topology. While motivated by Internet routing, we spent the past year investigating the applications of this work to other disciplines: physics, biology, chemistry, and economics. The most challenging part of this routing research as it pertains to the Internet still lies ahead, and will require a broader community of engaged thinkers: application of these and other theoretical results to real-world Internet security, economic, and policy contexts.

We continue to expand our economics and policy research agenda. In 2010 we received our first (NSF NetSE-Small) research grant dedicated to the economics of transit and peering interconnections in the Internet. Despite much recent interest in the economic aspects of the Internet, such as network interconnection (peering), pricing, performance, and the profitability of various network types, two historical developments contribute to a persistent disconnect between economic models and actual operational practices on the Internet. First, the Internet became too complex - in traffic dynamics, topology, and economics - for currently available analytical tools to allow realistic modeling. Second, the data needed to parameterize more realistic models is simply not available. The problem is fundamental, and familiar: simple models are not valid, and complex models cannot be validated. In 2010 we began an exciting project to pursue progress in both dimensions: creating more powerful, empirically parameterized computational tools, and enabling broader validation than previously possible.

Finally, we engaged in a variety of tool development, data-sharing, and outreach activities, including web sites, peer-reviewed papers, technical reports, presentations, blogging, animations, and workshops. Details of our activities are below. CAIDA's program plan for 2010-2013 is available at http://www.caida.org/home/about/progplan/progplan2010/. Please do not hesitate to send comments or questions to info at caida dot org.


Research Projects


Topology

Macroscopic Topology Measurements, Analysis, and Modeling

Goals

CAIDA's long-term topology research agenda includes three strategic areas: 1) macroscopic topology measurement; 2) analysis of the observable AS-level and router-level hierarchy; 3) topology modeling in support of routing research.

Activities

  1. Macroscopic Topology Measurements:
    1. We continued large-scale macroscopic topology measurements using Archipelago (Ark), our state-of-the-art global measurement platform. We completed the third full calendar year of the IPv4 Routed /24 Topology Dataset. By the end of 2010, we increased the number of vantage points to 50 Ark monitors deployed in 29 countries.
    2. We added more monitors with native IPv6 connectivity to the Ark infrastructure. As of the end of 2010, Ark had 15 monitors collecting the IPv6 Topology Dataset for researchers to get a view of the emerging IPv6 global topology.
    3. We continued to collect automated DNS reverse lookups for IP addresses discovered by the Ark probes and annotated the IPv4 topology data with corresponding DNS names.
  2. Analysis of Observable Topology:
    1. We continued to improve our measurement techniques and analysis methodologies for alias resolution inferences. We use the Ark platform and run the following three tools: kapar, iffinder and MIDAR. We then combine the outcomes to map IP addresses to routers as accurately and completely as feasible. Using publicly available data from many networks and ground-truth data provided to us by a large ISP, we tested the efficiency and veracity of various combinations of alias resolution methods. We published preliminary results in a survey of alias resolution techniques ("Internet-Scale IP Alias Resolution Techniques") in the January 2010 issue of ACM Computer Communications Review (CCR).
    2. Resulting from our improved measurement and analysis techniques, we collected, analyzed, processed and released three Internet Topology Data Kit (ITDK) Datasets, reflecting measurements taken in January, April, and July 2010. Derived from the traces collected as part of the IPv4 Routed /24 Topology Dataset, each 2010 ITDK includes: two related router-level topologies; router-to-AS assignments; geographic location of each router; and DNS lookups of all observed IP addresses. We will be augmenting ITDKs with additional meta-data in 2011.
    3. We continued to produce the AS-level topologies annotated with business relationships between ASes dataset on a bi-weekly basis. We use our published algorithms to infer these relationships, recognizing their directional nature, and annotate each link in an AS topology as a customer-provider or a peer-to-peer (settlement-free interconnection) relationship.
    4. We created a new version of our popular AS Core Graph visualizations for both IPv4 and IPv6 address space using January 2010 data collected by Ark monitors.
  3. Topology Modeling:
    1. We developed a geometric model to study the structure and function of complex networks. Our model assumes that hyperbolic geometry underlies these networks, and we showed that with this assumption, heterogeneous degree distributions and strong clustering in complex networks emerge naturally as simple reflections of the negative curvature and metric property of the underlying hyperbolic geometry. Conversely, in the paper "Hyperbolic Geometry of Complex Networks", published in Physical Review E, 2010, we showed that if a network has some metric structure, and if the network degree distribution is heterogeneous, then the network has an effective hyperbolic geometry underneath. We then established a mapping between our geometric framework and statistical mechanics of complex networks. This mapping interprets edges in a network as non-interacting fermions whose energies are hyperbolic distances between nodes, while the auxiliary fields coupled to edges are linear functions of these energies or distances. The geometric network ensemble subsumes the standard configuration model and classical random graphs as two limiting cases with degenerate geometric structures. Finally, we showed that targeted transport processes without global topology knowledge, made possible by our geometric framework, are maximally efficient, according to all efficiency measures, in networks with strongest heterogeneity and clustering, and that this efficiency is remarkably robust with respect to even catastrophic disturbances and damages to the network structure.
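The fermionic interpretation in the topology modeling item above can be made concrete with a short sketch. The notation below is not given in this report and is an assumption: x is the hyperbolic distance between two nodes at polar coordinates (r, θ) and (r', θ') in a plane of curvature -1, R plays the role of a chemical potential, and T plays the role of a temperature. Under those assumptions, the probability that an edge exists takes the Fermi-Dirac form, with the distance acting as the edge's energy:

```latex
% Hyperbolic law of cosines: distance x between (r, \theta) and (r', \theta'),
% with \Delta\theta the angle between the two nodes (curvature -1 assumed).
\cosh x = \cosh r \,\cosh r' - \sinh r \,\sinh r' \,\cos\Delta\theta

% Fermi-Dirac connection probability: the "energy" of an edge is the distance x,
% R acts as the chemical potential and T as the temperature (assumed symbols).
p(x) = \frac{1}{e^{(x - R)/(2T)} + 1}
```

In the limit T → 0 this expression becomes a step function: two nodes are connected exactly when x < R.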

Major Milestones

Funding Sources

Our topology research received support from:


Routing

Discovering Hyperbolic Metric Spaces Hidden Beneath the Internet and Other Complex Networks

Goals

The primary objective of CAIDA's research in Internet routing remains the development and evaluation of solutions to the impending routing scalability problems. Our relevant activities focused on two related sub-topics: greedy routing based on hidden metric spaces underlying real networks; and the relationship between routing efficiency and the structure of the network topology. While motivated by Internet routing, we spent the past year investigating the implications of this work for other disciplines: physics, biology, chemistry, and economics.

Activities

  1. In "Greedy Forwarding in Dynamic Scale-Free Networks Embedded in Hyperbolic Metric Spaces" published in IEEE INFOCOM 2010, we show that complex (scale-free) network topologies naturally emerge from hyperbolic metric spaces. Hyperbolic geometry facilitates maximally efficient greedy forwarding in these networks. Greedy forwarding is topology-oblivious. Nevertheless, greedy packets find their destinations with 100% probability following almost optimal shortest paths. This remarkable efficiency persists even in highly dynamic networks. Our findings suggest that forwarding information through complex networks, such as the Internet, may be possible without the overhead of existing routing protocols, and may also find practical applications in overlay networks for tasks such as application-level routing, information sharing, and data distribution.
  2. Rapidly growing overheads associated with the primary function of the Internet -- routing information packets between any two computers in the world -- cause concerns among Internet experts that the existing Internet routing architecture may not sustain even another decade. In "Sustaining the Internet with Hyperbolic Mapping", published in Nature Communications, vol 1, no. 62, September 2010, we show a method to map the Internet to a hyperbolic space. Guided by the constructed map, which we release with this paper, Internet routing exhibits scaling properties close to the theoretical best, thus resolving serious scaling limitations that the Internet faces today. Besides this immediate practical viability, our network mapping method can provide a different perspective on the community structure in complex networks.
  3. Networks portray a multitude of interactions through which people meet, ideas are spread, and infectious diseases propagate within a society. Identifying the most efficient "spreaders" in a network is an important step to optimize the use of available resources and ensure the more efficient spread of information. In the paper "Identifying Influential Spreaders in Complex Networks", published in Nature Physics, vol 6, Nov 2010, pp. 888-893, we show that the most influential spreaders in a social network do not correspond to the best connected people or to the most central people (high betweenness centrality). Instead, we find: (i) The most efficient spreaders are those located within the core of the network as identified by the k-shell decomposition analysis. (ii) When multiple spreaders are considered simultaneously, the distance between them becomes the crucial parameter that determines the extent of the spreading. Furthermore, we find that, in the case of infections that do not confer immunity on recovered individuals, the infection persists in the high k-shell layers of the network under conditions where hubs may not be able to preserve the infection. Our analysis provides a plausible route for an optimal design of efficient dissemination strategies.
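The greedy forwarding idea in activity 1 can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the papers: it assumes each node already has polar coordinates (r, theta) in a hyperbolic plane of curvature -1, and the function names, the toy graph, and the coordinates are all hypothetical. Each node forwards to the neighbor closest to the destination, using only its neighbors' coordinates rather than any global view of the topology.

```python
import math

def hyperbolic_distance(a, b):
    """Distance between points a=(r, theta) and b=(r, theta), curvature -1 assumed."""
    (r1, t1), (r2, t2) = a, b
    # Angular separation folded into [0, pi].
    dtheta = math.pi - abs(math.pi - abs(t1 - t2) % (2 * math.pi))
    if dtheta == 0:
        return abs(r1 - r2)
    x = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(dtheta))
    return math.acosh(max(x, 1.0))  # guard against rounding just below 1

def greedy_path(adjacency, coords, source, target):
    """Forward hop by hop to the neighbor nearest the target.

    Returns the list of visited nodes, or None if the packet reaches a
    node with no neighbor closer to the target (a local minimum).
    """
    path = [source]
    current = source
    while current != target:
        neighbors = adjacency.get(current, [])
        if not neighbors:
            return None
        here = hyperbolic_distance(coords[current], coords[target])
        nxt = min(neighbors,
                  key=lambda n: hyperbolic_distance(coords[n], coords[target]))
        if hyperbolic_distance(coords[nxt], coords[target]) >= here:
            return None  # stuck: no neighbor makes progress
        path.append(nxt)
        current = nxt
    return path

# Hypothetical toy network: a hub "h" near the origin and three peripheral nodes.
coords = {"h": (0.1, 0.0), "a": (3.0, 0.2), "b": (3.0, 3.0), "c": (3.0, 3.2)}
adjacency = {"a": ["h"], "h": ["a", "b"], "b": ["h", "c"], "c": ["b"]}
route = greedy_path(adjacency, coords, "a", "c")
```

In this toy graph the packet travels from the periphery through the hub and back out toward the destination, the path shape that hub-rich topologies tend to produce.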

Major Milestones

Student Involvement

  • Wolfgang John from Chalmers University of Technology, Sweden
  • Maurizio Dusi from the Università degli Studi di Brescia, Italy

Funding Sources

Our routing research received support from:


Future Internet Architecture

Named Data Networking (NDN)

Goals

In September 2010 we began participation in the Named Data Networking (NDN) project funded by NSF's new Future Internet Architecture (FIA) program.

The team of collaborating institutions includes UC Los Angeles, Palo Alto Research Center (PARC), Colorado State University, University of Arizona, University of Illinois/Urbana-Champaign, UC Irvine, UC San Diego, University of Memphis, Washington University, and Yale University, and is led by Lixia Zhang (UCLA) and Van Jacobson (PARC).

The goal of this project is to explore a new Internet architecture that shifts the architectural focus from where -- addresses and hosts in today's Internet -- to what -- the content that users and applications care about. By naming data instead of locations, the new architecture transforms data into a first-class entity while addressing the known technical challenges of today's Internet: routing scalability, network security, content protection and privacy. The fundamentals of the NDN approach are motivated by empirical understanding of Internet evolution, dynamics, and usage. CAIDA will participate in NDN testbed measurements, evaluation of the architecture's performance, routing scalability, robustness, and ability to support existing and new applications, and will investigate combinations of name-space structure and network topology that optimize the efficiency of NDN algorithms.

Funding Sources

Our routing research received support from:


Traffic Analysis

Internet traffic measurement, classification, and analysis

Goals

CAIDA has a long history of passive trace acquisition and curation aimed at traffic monitoring, classification, and workload characterization. In 2010 we continued to host visiting researchers who, in collaboration with CAIDA researchers, analyzed properties of available traces.

Activities

  1. We continued to collect passive traces from high-speed backbone links from traffic monitors located in San Jose and Chicago. We make the CAIDA Anonymized 2010 Internet Traces Dataset available to researchers.
  2. Working with visiting researchers Wolfgang John from Chalmers University of Technology, Sweden and Maurizio Dusi from Università degli Studi di Brescia, Italy, we used passively captured network data to estimate the amount of traffic actually routed symmetrically on a specific link. From this analysis, we propose a Flow-based Symmetry Estimator to assess symmetry in terms of flows, packets and bytes through a set of metrics that disregard inherently asymmetrical traffic such as UDP, ICMP and TCP background radiation. This normalized metric allows fair comparison of symmetry across different links. We evaluate our method on a large heterogeneous dataset, and confirm anecdotal reports that routing symmetry typically does not hold for non-edge Internet links, and decreases as one moves toward core backbone links, due to routing policy complexity. We published our work in the paper "Estimating Routing Symmetry on Single Links by Passive Flow Measurements" presented at the 1st International Workshop on TRaffic Analysis and Classification (TRAC) colocated with the 6th International Wireless Communications & Mobile Computing Conference (IWCMC 2010) in June 2010. Our proposed metric for traffic asymmetry induced by routing policies will not only help the community improve traffic characterization techniques and formats, but also support quantitative formalization of routing policy effects on links in the wild.
  3. We published work completed in 2009 analyzing DNS Evolution as observed at the DNS root name servers during the annual Day-in-the-life-of-the-Internet (DITL) experiments.
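The flow-level symmetry idea in activity 2 can be illustrated with a simplified Python sketch. It is not the Flow-based Symmetry Estimator from the paper: the record layout, function names, and the choice to keep only TCP are assumptions made for illustration, and filtering of TCP background radiation is left out. The sketch pairs the two directions of each flow seen on a link and reports the share of flows, packets, and bytes that belong to flows observed in both directions.

```python
from collections import namedtuple

# Hypothetical flow record; field names are illustrative only.
Flow = namedtuple("Flow", "src sport dst dport proto packets bytes")

def flow_key(flow):
    """Return (direction-independent key, direction flag) for a flow."""
    a, b = (flow.src, flow.sport), (flow.dst, flow.dport)
    forward = a <= b
    return ((flow.proto,) + (a + b if forward else b + a), forward)

def symmetry_ratios(flows, protocols=("tcp",)):
    """Share of flows/packets/bytes in flows seen in both directions.

    Traffic whose protocol is not listed (for example UDP or ICMP) is
    ignored, mirroring the idea of disregarding inherently asymmetric traffic.
    """
    seen = {}  # key -> {direction flag: [packets, bytes]}
    for flow in flows:
        if flow.proto not in protocols:
            continue
        key, forward = flow_key(flow)
        counts = seen.setdefault(key, {}).setdefault(forward, [0, 0])
        counts[0] += flow.packets
        counts[1] += flow.bytes
    if not seen:
        return None

    def total(entries, index):
        return sum(c[index] for dirs in entries for c in dirs.values())

    everything = list(seen.values())
    both = [dirs for dirs in everything if len(dirs) == 2]
    return {
        "flows": len(both) / len(everything),
        "packets": total(both, 0) / total(everything, 0),
        "bytes": total(both, 1) / total(everything, 1),
    }

# Made-up records using documentation addresses (192.0.2.0/24).
example = [
    Flow("192.0.2.1", 40000, "192.0.2.2", 80, "tcp", 10, 1000),
    Flow("192.0.2.2", 80, "192.0.2.1", 40000, "tcp", 8, 4000),
    Flow("192.0.2.3", 40001, "192.0.2.4", 443, "tcp", 2, 100),
    Flow("192.0.2.5", 5353, "192.0.2.6", 53, "udp", 1, 60),
]
ratios = symmetry_ratios(example)
```

Reporting the three ratios separately matters because a link can look symmetric by flow count while most of its bytes travel in flows seen in only one direction.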

Major Milestones

Student Participants

CAIDA hosted the following visiting graduate student scholars:

  • Maurizio Dusi from the Università degli Studi di Brescia, Italy
  • Wolfgang John from Chalmers University of Technology, Sweden
  • Mia Zhang from Beijing Jiaotong University, China

Funding Sources

Our traffic research received support from DHS contract (NBCHC 070133) "Supporting Research and Development of Security Technologies through Network and Security Data Collection", and indirect support from the following institutions through their generous sponsorship of capable graduate students to visit our lab to collaborate on research.


Data Sharing for Security

Goals

With support from DHS's Science and Technology Directorate (S&T) through the PREDICT project, CAIDA collects routing data, peering point passive traces, and denial-of-service attack and Internet worm data to provide to academic and government researchers as well as our members. In 2010 CAIDA also maintained the UCSD Network Telescope, a passive data collection system focused on a globally routed /8 network that carries almost no legitimate traffic, as a unique resource whose data may provide insights for network security researchers. The network telescope allows us to monitor traffic pollution to almost 1/256th of all IPv4 destination addresses on the Internet. Delivering different types of Internet data to the research community requires technology to accomplish the data capture, as well as policy infrastructure to protect the rights and minimize risk to stakeholders. CAIDA continued to evolve our policy frameworks that enable the sharing of the data we capture with the network security researcher community.
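The 1/256 figure follows directly from the size of a /8 prefix relative to the 32-bit IPv4 address space. As a worked check (plain arithmetic, not a statement about which addresses within the /8 the telescope actually observes):

```latex
\frac{2^{32-8}}{2^{32}} = \frac{2^{24}}{2^{32}} = \frac{1}{2^{8}} = \frac{1}{256},
\qquad 2^{24} = 16{,}777{,}216 \text{ addresses}
```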

Activities

  1. Passive Trace Collection
  2. UCSD Network Telescope
    • In response to community requests for this specific type of data, we made the CAIDA "DDoS Attack 2007" Dataset available to researchers.
    • Development and refinement of measurement and analysis tools for one-way unsolicited traffic monitoring.
  3. Data Sharing Policy Development

Major Milestones

Funding Sources

Our research in data sharing for security received support from DHS contract (NBCHC 070133) "Supporting Research and Development of Security Technologies through Network and Security Data Collection".


Economics, Ownership, and Trust

Goals

In 2010 we received our first (NetSE-Small) research grant dedicated to the economics of transit and peering interconnections in the Internet, from the National Science Foundation. Despite much recent interest in the economic aspects of the Internet, such as network interconnection (peering), pricing, performance, and the profitability of various network types, two historical developments contribute to a persistent disconnect between economic models and actual operational practices on the Internet. First, the Internet became too complex - in traffic dynamics, topology, and economics - for currently available analytical tools to allow realistic modeling. Second, the data needed to parameterize more realistic models is simply not available. The problem is fundamental, and familiar: simple models are not valid, and complex models cannot be validated. This year we began a project to pursue transformative progress in both dimensions: creating more powerful, empirically parameterized computational tools, and enabling broader validation than previously possible. Our computational model (simulator) takes as input the interconnection policies of various network types, interdomain traffic demands, routing policies, geographical constraints, and pricing/cost factors, and computes an equilibrium - a state where no network has the incentive to change its connectivity. We will pursue a two-pronged approach to validating our model. First, we will verify that it can reproduce known macroscopic properties of the Internet AS topology. Second, we will use historical, publicly available financial and topological data to verify that the model can reproduce known trends in the evolution of the Internet. We will then use the model to study various interconnection practices, the stability and dynamics of interdomain links, and economic properties of the resulting equilibrium.

On the policy side, we regularly respond to requests from government agencies and policymaking bodies (including the FCC, DHS, ICANN) for comments and positions, to inform policy with the best available empirical data. KC continued to serve on two ICANN advisory committees (RSSAC and SSAC), and was appointed a member of the FCC's current Technical Advisory Committee (TAC).

Activities

  1. Collaborator Pierre Francois presented a slideset for the paper "A Value-based Framework for Internet Peering Agreements" published in the proceedings of the 22nd International Teletraffic Congress (ITC 22) in September 2010. This study proposed a quantitative framework for settlement-free and paid-peering links, based on the value of a peering link, i.e., the benefit that networks see from that link. We first studied a solution where a centralized oracle determines a provably stable, optimal and fair price for a paid-peering link, based on perfect knowledge of the revenues and costs of each network. We then studied the effects of inaccurate estimation of peering value by the peering networks. Finally, we examined how value-based peering affects the density of peering links, the nature of end-to-end paths, and the profitability of various network types in the global Internet.
  2. Amogh Dhamdhere presented "The Economics of Transit and Peering Interconnections in the Internet" at the University of New South Wales and National ICT Australia (NICTA) in November 2010.
  3. Amogh Dhamdhere presented "The Internet is Flat: Modeling the Transition from a Transit Hierarchy to a Peering Mesh" at the 6th International Conference on emerging Networking EXperiments and Technologies (CoNEXT), Philadelphia PA, December 2010.
  4. Amogh also published two economics-related essays on CAIDA's blog: AS-level growth trends and IP-AS mappings.
  5. KC attended and wrote about the FCC's Technical Advisory Committee's first meeting in November 2010.
  6. We published a paper describing our hybrid policy/technology strategy to support privacy-sensitive data sharing with the network research community, "Dialing privacy and utility: a proposed data-sharing framework to advance Internet research", in IEEE Security and Privacy v. 8, no. 4, July/Aug 2010.

Major Milestones

Funding Sources

Our economics research received support from:


Infrastructure Projects


Archipelago (Ark):

On September 12, 2007, our next generation active measurement infrastructure, Archipelago (Ark), began collecting its first production data as part of the IPv4 Routed /24 Topology Dataset. Ark provides the hardware and software infrastructure for the Macroscopic Topology Project and replaces the previous skitter-based infrastructure. Ark achieves greater scalability and flexibility than the previous measurement infrastructure and provides steps toward a community-oriented network measurement infrastructure intended to eventually allow collaborators to run their vetted measurement tasks on a security-hardened distributed platform. We still use the scamper macroscopic probing tool on all Ark nodes; by default it uses ICMP Paris traceroute instead of UDP. We also provided feedback throughout the year to Matthew and his collaborators on scamper functionality.

The Macroscopic Topology Project now enjoys the 50-node infrastructure required to probe a broader range of dynamically generated IP addresses covering all routed /24 prefixes in IPv4 address space, and implements better mechanisms for signaling file and cycle completion. We developed tools for downloading and processing collected data in a scalable and fault-tolerant manner. We also developed, implemented, tested and executed procedures for remote upgrades of legacy NLANR AMP and CAIDA skitter machines, which we used to repurpose over a dozen machines for the Archipelago infrastructure. We are continuing to develop features required to support community requests to run measurement experiments.


Tools

CAIDA's mission includes providing access to tools for Internet data collection, analysis and visualization to facilitate network measurement and management. However, CAIDA does not receive specific funding for support and maintenance of the tools we develop. Please check our home page for a complete listing and taxonomy of CAIDA tools.

2010 Tool Development

Topostats

Topostats is a package of programs that calculate various statistics on network topologies (graphs). The computed statistics are defined in the paper "Lessons from Three Views of the Internet Topology: Technical Report", which also gives a sample of the computed statistics. The topostats package computes most but not all of these statistics, and this package does not itself make any plots. We released topostats version 1.0 beta in March of 2010.

RouterToAsAssignment

RouterToAsAssignment is a tool that assigns each router from a router-level graph of the Internet to the Autonomous System (AS) that owns that router. We have presented our router-AS assignment techniques in our PAM 2010 paper. The RouterToAsAssignment code takes as input a router-level topology of the format as in our Macroscopic Internet Topology Data Kit. For each router in the dataset, RouterToAsAssignment infers the AS (from among the ASes that have interfaces on that router) that owns that router. RouterToAsAssignment uses the Election+Degree heuristic to assign routers to ASes. We released RouterToAsAssignment version 0.2 in June of 2010.
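A rough Python sketch of how an election-style assignment could work follows. It is not the RouterToAsAssignment code. It assumes that "Election" means picking the AS that appears on the most interfaces of a router, and that "Degree" serves as a tie-breaker favoring the AS with the smaller AS-level degree; the direction of that tie-break, the function name, and the input layout are all assumptions made for illustration.

```python
from collections import Counter

def assign_router_to_as(interface_ases, as_degree):
    """Pick an owner AS for one router (illustrative heuristic only).

    interface_ases: AS number of each interface observed on the router.
    as_degree: mapping from AS number to its AS-level degree (hypothetical input).
    """
    votes = Counter(interface_ases)
    if not votes:
        return None  # no interface could be mapped to an AS
    top = max(votes.values())
    tied = [asn for asn, count in votes.items() if count == top]
    # Assumed tie-break: prefer the smaller-degree AS, then the lower AS number.
    return min(tied, key=lambda asn: (as_degree.get(asn, 0), asn))

# Made-up example using AS numbers reserved for documentation.
degrees = {64500: 50, 64501: 3}
clear_winner = assign_router_to_as([64500, 64500, 64501], degrees)
tie_broken = assign_router_to_as([64500, 64501], degrees)
```

Candidates are limited to ASes that actually have an interface on the router, matching the description above that the owner is inferred from among those ASes.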

CAIDA Tools Download Report

The tables below list the CAIDA-developed tools, both currently supported and past (unsupported), distributed via our home page at http://www.caida.org/tools/, along with the number of downloads of each tool during 2010.

  • Currently Supported Tools

    Tool | Description | Downloads
    Autofocus | Internet traffic reports and time-series graphs. | 330
    CoralReef | Measures and analyzes passive Internet traffic monitor data. | 537
    Cuttlefish | Produces animated graphs showing diurnal and geographical patterns. | 120
    dnsstat | DNS traffic measurement utility. | 138
    iffinder | Discovers IP interfaces belonging to the same router. | 215
    otter | Visualizes arbitrary network data. | 306
    plotpaths | Displays forward traceroute path data. | 107
    RouterToAsAssignment | Assigns each router from a router-level graph of the Internet to its Autonomous System (AS). | 116
    sk_analysis_dump | A tool for analysis of traceroute-like topology data. | 98
    topostats | Computes various statistics on network topologies. | 149
    Walrus | Visualizes large graphs in three-dimensional space. | 1893
    libsea | Scalable graph file format and graph library. | 131
    Chart::Graph | A Perl module that provides a programmatic interface to several popular graphing packages. Note: Chart::Graph is also available on CPAN.org. The numbers here reflect only downloads directly from caida.org, as download statistics from CPAN are not available. | 76
    plot-latlong | Plots points on geographic maps. | 214
  • Past Tools (Unsupported)

    Tool | Description | Downloads
    arts++ | A binary file format for storing network data. | 919
    cflowd | Former NetFlow analysis tool. | 183
    Mapnet | Historical visualization of international backbone providers. | 9786
    GeoPlot | Geographically plots nodes and links. | 406
    GTrace | Geographical front-end to traceroute. | 527
    plankton | Historical visualization of international cache topology. | 25

Data

In 2010, CAIDA captured and curated data from three primary sources of network data:

  • macroscopic topology data with the Archipelago infrastructure,
  • passive traffic traces at tier1 high-speed Internet backbone links,
  • passive traffic traces from the UCSD Network Telescope
We derived several datasets from these data that we make publicly available to researchers. These include our AS Rank, AS adjacencies, and Router adjacencies datasets. We combined and released Internet backbone traces for 2010 in the Anonymized High-speed Internet Traces 2010 dataset; released a Distributed Denial-Of-Service (DDoS) Attack dataset; and released several macroscopic topology ITDKs. Some datasets are made publicly available by CAIDA without restrictions to the user, while access to other datasets is restricted to academic researchers and CAIDA members, with data access subject to Acceptable Use Policies (AUP) designed to protect the privacy of monitored communications, to ensure security of network infrastructure, and to comply with the terms of our agreements with data providers.

Major Milestones

Data Collected in 2010

The table below lists the amount of data collected in our ongoing data collection operations.

Data Type | First date | Last date | Total size [1]
Macroscopic Topology Measurements, IPv4 (Archipelago) | 2010-01-01 | 2010-12-31 | 509.2 GB (1.6 TB)
Macroscopic Topology Measurements, IPv6 (Archipelago) | 2010-01-01 | 2010-12-31 | 519.2 MB (1.8 GB)
Internet backbone Traces | 2010-01-21 | 2010-12-17 | 4.1 TB (6.9 TB) [3]
"Live" Network Telescope Data | 2010-01-01 | 2010-12-31 | 33 TB (61 TB) [2,4]
DNS Names for IPv4 Routed /24 Topology Dataset | 2010-01-01 | 2010-12-31 | 6.3 GB (24.2 GB)
AS Links for IPv4 Routed /24 Topology Dataset | 2010-01-01 | 2010-12-31 | 124.2 MB (500.7 MB)
Macroscopic Internet Topology Data Kit (ITDK) | 2010-01-01 | 2010-07-31 | 2.6 GB (13.5 GB)
DNS root/gTLD RTT Dataset | 2010-01-01 | 2010-12-31 | 762.6 MB
[1] The total size represents actual disk space. If data are stored in compressed form, the uncompressed size is given in brackets.
[2] The size of these datasets varies over time as we store and serve a rotating window of the last 30 days only. The specified numbers are totals captured over the whole year.
[3] This includes one-hour traces on April 14 during DITL 2010.
[4] This includes 369 GB of data collected during DITL 2010.

Datasets Distributed in 2010

We process raw data into specialized datasets to increase its utility to researchers and to satisfy security and privacy concerns. In 2010, this resulted in the following datasets:

  • Publicly Available Data

    These datasets require that users agree to an Acceptable Use Policy, but are otherwise freely available.

Dataset                    Unique visitors (IPs)  Data Downloaded
AS Rank                    386                    537 MB
AS Links (AS Adjacencies)  562                    5.0 GB
AS Relationships           1189                   10.3 GB
Router Adjacencies         272                    666 MB
AS Taxonomy                176                    89.1 MB *
Witty Worm Dataset         190                    319.3 MB
Code-Red Worms Dataset     1259                   7.7 GB
We count the volume of data downloaded per unique user per unique file, so if a user downloads a file multiple times, we only count that file once for that user. This significantly underestimates the total volume of data served through our dataservers.
* AS Taxonomy dataset is included in a mirror of the GA Tech main AS Taxonomy site, and thus does not represent all access to this data.
  • Restricted Access Data

    These datasets require that users:

    • be academic or government researchers, or join CAIDA;
    • request an account and provide a brief description of their intended use of the data; and
    • agree to an Acceptable Use Policy.
Dataset                                              Unique visitors (usernames)  Data Downloaded *
Anonymized Internet Backbone Traces                  152                          13.2 TB
Backscatter Datasets                                 43                           2.0 TB
Raw Topology Traces from Archipelago infrastructure  54                           1.7 TB
Raw Topology Traces (skitter)                        24                           754.9 GB
DNS Names for IPv4 Routed /24 Topology Dataset       30                           29.6 GB
Macroscopic Internet Topology Data Kit               61                           41.0 GB
Witty Worm Dataset                                   13                           122.1 GB
DNS Root/gTLD server RTT Dataset                     6                            11.7 MB
DDoS Attack Dataset                                  64                           270 GB
Telescope Datasets                                   21                           472 GB
* We count the volume of data downloaded per unique user per unique file, so even if a user downloads a file 100 times, we only count that file once for that user. This methodology results in significantly undercounting the total volume of data served through our dataservers in 2010, but it is necessary because of limitations in dataserver logging combined with aberrant user behaviour.
  • Restricted Access Data Requests

    The following table shows some statistics about data requests for CAIDA datasets: the number of requests received, the number of users whose request was granted, and the number of users that actually downloaded data.

    We received about 14% more requests in 2010 than in 2009, and approved 20% more requests for access to restricted datasets. Almost 80% of the users that are granted access actually access our webservers to download data.

Dataset                                      Number of requests received  Number of users granted access  Number of users that accessed data
Anonymized Backbone and Peering Link Traces  185                          150                             126
Active Topology Trace Datasets               163                          113                             80
Backscatter Datasets                         73                           47                              35
Witty Worm Dataset                           16                           13                              11
DNS Root/gTLD server RTT Dataset             7                            5                               4
DDoS Attack Dataset                          108                          74                              65
Telescope Datasets                           34                           23                              19
Totals                                       586                          425                             340
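
The per-user, per-file counting rule described in the download table footnotes above can be sketched as follows. This is a minimal illustration, not CAIDA's actual log-processing code: the record layout `(user, dataset, filename, size_bytes)`, the function name `summarize_downloads`, and the sample values are all assumptions made for the example.

```python
from collections import defaultdict


def summarize_downloads(records):
    """Return {dataset: (unique_users, bytes_counted)}.

    Each (user, dataset, filename) triple contributes its size once,
    no matter how many times the same user fetched the same file.
    """
    seen = set()
    users = defaultdict(set)
    volume = defaultdict(int)
    for user, dataset, filename, size_bytes in records:
        users[dataset].add(user)
        key = (user, dataset, filename)
        if key in seen:
            continue  # repeat download of the same file by the same user
        seen.add(key)
        volume[dataset] += size_bytes
    return {d: (len(users[d]), volume[d]) for d in users}


# Hypothetical log records: (user, dataset, filename, size_bytes)
sample = [
    ("alice", "ddos", "trace-a.pcap.gz", 1000),
    ("alice", "ddos", "trace-a.pcap.gz", 1000),  # repeat, not counted again
    ("alice", "ddos", "trace-b.pcap.gz", 500),
    ("bob", "ddos", "trace-a.pcap.gz", 1000),
    ("bob", "telescope", "day-01.pcap.gz", 2000),
]

summary = summarize_downloads(sample)
raw_total = sum(size for _, _, _, size in sample)
counted_total = sum(vol for _, vol in summary.values())
print(summary)
print("bytes served:", raw_total, "bytes counted:", counted_total)
```

In the sample, the repeated download is served but not counted, which is why the reported volumes understate the total data actually served.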

Workshops

As part of our mission to investigate both practical and theoretical aspects of the Internet, CAIDA staff actively attend, contribute to, and host workshops relevant to research and better understanding of Internet infrastructure, trends, topology, routing, and security. Our web site has a complete listing of past and upcoming CAIDA Workshops.

ISMA 2010 AIMS - 2nd Workshop on Active Internet Measurements

On February 8-10, 2010, CAIDA hosted the 2nd Active Internet Measurements (AIMS-2) Workshop supporting science and policy in La Jolla, CA. This workshop sought to review priorities, objectives, and plans of various active measurement infrastructures especially aimed at macroscopic security, stability, and performance measurements, as well as discuss approaches to sharing data from measurement infrastructures and promote coordination among them.

3rd CAIDA-WIDE-CASFI Workshop

The 3rd CAIDA-WIDE-CASFI Joint Measurement Workshop was held on April 24-25, 2010 in Osaka, Japan. This workshop continued a tradition of workshops supporting a three-way collaboration between researchers from CAIDA (USA), WIDE (Japan) and CASFI (South Korea). The main areas of the Workshop were Internet measurement projects, analysis of data to reveal current Internet trends, and DNS research. The Workshop also covered miscellaneous research and technical topics of mutual interest for CAIDA, WIDE and CASFI participants.

UCSD Complex Network Seminar - Different Angles on Network Complexity, Engineering, and Science (DANCES)

In October 2010, CAIDA began hosting the UCSD Complex Network Seminar: Different Angles on Network Complexity, Engineering, and Science (DANCES). The goal of this seminar series is to bring together junior and senior researchers studying networks, including UCSD graduate students and post-docs. The seminar will foster communication and collaboration among researchers from diverse disciplines that study networks from different perspectives (physics, biology, sociology, computer science, ECE, math, bioengineering, cognitive science, etc.), and provide young researchers a forum to practice their presentation and communication skills.

Publications

The following table contains the papers published by CAIDA for the calendar year of 2010. Please refer to Papers by CAIDA on our web site for a comprehensive listing of publications.

Year Author(s) Title Publication
2010
Dhamdhere, A.
Dovrolis, C.
The Internet is Flat: Modeling the Transition from a Transit Hierarchy to a Peering Mesh ACM SIGCOMM CoNEXT
2010
Kitsak, M.
Gallos, L.
Havlin, S.
Liljeros, F.
Muchnik, L.
Stanley, H.
Makse, H.
Identifying influential spreaders in complex networks Nature Physics
2010
Kenneally, E. Using Network Science To Understand and Apply Privacy Usage Controls? W3C Workshop on Privacy and Data Usage Control
2010
claffy, k.
Aben, E.
Auge, J.
Beverly, R.
Bustamante, F.
Donnet, B.
Friedman, T.
Fomenkov, M.
Haga, P.
Luckie, M.
Shavitt, Y.
The 2nd Workshop on Active Internet Measurements (AIMS-2) Report ACM SIGCOMM Computer Communication Review (CCR)
2010
Merindol, P.
Donnet, B.
Pansiot, J.
Luckie, M.
Hyun, Y.
MERLIN: MEasure the Router Level of the INternet Universite catholique de Louvain
2010
Krioukov, D.
Papadopoulos, F.
Kitsak, M.
Vahdat, A.
Boguna, M.
Hyperbolic Geometry of Complex Networks Physical Review E
2010
Dhamdhere, A.
Dovrolis, C.
Francois, P.
A Value-based Framework for Internet Peering Agreements International Teletraffic Congress (ITC)
2010
Boguna, M.
Papadopoulos, F.
Krioukov, D.
Sustaining the Internet with Hyperbolic Mapping Nature Communications
2010
Kenneally, E.
claffy, k.
Dialing privacy and utility: a proposed data-sharing framework to advance Internet research IEEE Security & Privacy
2010
Huffaker, B.
Dhamdhere, A.
Fomenkov, M.
claffy, k.
Toward Topology Dualism: Improving the Accuracy of AS Annotations for Routers Passive and Active Network Measurement Workshop (PAM)
2010
claffy, k. Workshop on Internet Economics (WIE2009) Report ACM SIGCOMM Computer Communication Review (CCR)
2010
Shakkottai, S.
Fomenkov, M.
Koga, R.
Krioukov, D.
claffy, k.
Evolution of the Internet AS-Level Ecosystem European Physical Journal B
2010
Papadopoulos, F.
Krioukov, D.
Boguna, M.
Vahdat, A.
Greedy Forwarding in Dynamic Scale-Free Networks Embedded in Hyperbolic Metric Spaces IEEE Conference on Computer Communications (INFOCOM)
2010
Kitsak, M.
Riccaboni, M.
Havlin, S.
Pammolli, F.
Stanley, H.
Scale-free models for the structure of business firm networks Physical Review E
2010
John, W.
Dusi, M.
claffy, k.
Estimating Routing Symmetry on Single Links by Passive Flow Measurements ACM 1st International Workshop on TRaffic Analysis and Classification (TRAC)
2010
Castro, S.
Zhang, M.
John, W.
Wessels, D.
claffy, k.
Understanding and preparing for DNS evolution Traffic Monitoring and Analysis Workshop (TMA)
2010
Keys, K. Internet-Scale IP Alias Resolution Techniques ACM SIGCOMM Computer Communication Review (CCR)
2010
Kenneally, E.
Bailey, M.
Maughan, D.
A Framework for Understanding and Applying Ethical Principles in Network and Security Research Workshop on Ethics in Computer Security Research (WECSR)


Presentations

The following table contains the presentations and invited talks given by CAIDA for the calendar year of 2010. Please refer to Presentations by CAIDA on our web site for a comprehensive listing.

Year Month Presenter(s) Title Venue Topic(s)
2010
Dec Krioukov, D. Complex network geometry and navigation Different Angles on Network Complexity, Engineering, and Science (DANCES)
- network geometry
- routing
- topology
2010
Dec Kitsak, M. Epidemics in Social Networks University of Nevada, Reno
- network geometry
2010
Dec Fomenkov, M. DHS PREDICT project: CAIDA update PREDICT PI
- data
- measurement methodology
- overview
- policy
- security
2010
Dec Dhamdhere, A. The Internet is Flat: Modeling the Transition from a Transit Hierarchy to a Peering Mesh ACM SIGCOMM Conference on emerging Networking EXperiments and Technologies (CoNEXT)
- economics
- routing
- topology
2010
Nov Squarcella, C. Visualizing geolocated Internet measurements RIPE
- software/tools
- visualization
2010
Nov Krioukov, D. Robustness of Targeted Transport in Complex Networks Robustness of Complex Networks
- routing
- topology
2010
Nov Dhamdhere, A. The Economics of Transit and Peering Interconnections in the Internet CAIDA
- economics
- routing
- topology
2010
Oct Kenneally, E. Can Network Science Help Re-Write the Privacy Playbook? W3C Workshop on Privacy and Data Usage Control
- policy
2010
Sep Francois, P. A Value-based Framework for Internet Peering Agreements International Teletraffic Congress (ITC)
- economics
- policy
- routing
- topology
2010
Sep Fomenkov, M. CAIDA Activities, 2009-2010 The Quilt
- overview
2010
Sep claffy, k. Leveraging the Science and Technology of Internet Mapping for Homeland Security DHS Cybersecurity PI Meeting
- active data analysis
- measurement methodology
- security
- topology
- visualization
2010
Jul Polterock, J. IRNC-SP: Sustainable data-handling and analysis methodologies for the IRNC networks NSF-IRNC
- active data analysis
- ipv6
- passive data analysis
- policy
2010
Jul Krioukov, D. Optimal routing in a hyperbolically mapped Internet Toward Evolutive Routing Algorithms for scale-free/internet-like NETworks (TERA-NET)
- network geometry
- routing
- topology
2010
Jul Fomenkov, M. DHS PREDICT project: CAIDA update PREDICT PI
- data
- measurement methodology
- overview
- policy
- security
2010
Jun Krioukov, D. Hyperbolic mapping of complex networks Algorithms for Modern Massive Data Sets (MMDS)
- routing
- topology
2010
Jun John, W. Estimating Routing Symmetry on Single Links by Passive Flow Measurements International Workshop on TRaffic Analysis and Classification (TRAC)
- measurement methodology
- routing
2010
May Krioukov, D. Hyperbolic geometry of complex networks International School and Conference on Network Science (NetSci)
- network geometry
- routing
- topology
2010
May Krioukov, D. Navigability of Networks NSF Workshop on Shared Organizing Principles in the Computing and Biological Sciences
- network geometry
- routing
- topology
2010
May Kitsak, M. Metric Structure of Bipartite Networks Center for Complex Network Research (CCNR)
- network geometry
2010
May Kitsak, M. Identification of Influential Spreaders in Complex Networks International School and Conference on Network Science (NetSci)
- network geometry
2010
May Huffaker, B. Inferring Geolocation Ownership of Internet Identifiers Sprint Electronic Crimes Task Force (ECTF)
- overview
- routing
- security
- topology
2010
Apr John, W. Understanding and Preparing for DNS Evolution Traffic Monitoring and Analysis
- dns
- passive data analysis
- security
- trends
2010
Apr Huffaker, B. CAIDA Report 2010 WIDE-CASFI
- overview
2010
Mar Krioukov, D. Greedy Forwarding in Dynamic Scale-Free Networks Embedded in Hyperbolic Metric Spaces INFOCOM
- network geometry
- routing
- topology
2010
Mar Huffaker, B. DatCat: Overview, Lessons Learned NSF GENI Engineering Conference (GEC7)
- data
- overview
- software/tools
- trends
2010
Mar claffy, k. CAIDA Updates UCSD CSE SysNet Lunch
- overview
2010
Feb Keys, K. Internet-Scale Alias Resolution with MIDAR ISMA AIMS
- measurement methodology
- software/tools
- topology
2010
Feb Hyun, Y. Archipelago Measurement Infrastructure: Updates and Case Study ISMA AIMS
- measurement methodology
- software/tools
- topology
2010
Feb Huffaker, B. Geolocation Comparison: CAIDA's Geolocation Tools Comparison ISMA AIMS
- data
- measurement methodology
- topology
2010
Feb Huffaker, B. AS Assignment for Routers PAM
- active data analysis
- routing
- topology
2010
Feb claffy, k. DHS PREDICT project: CAIDA update PREDICT PI
- data
- measurement methodology
- overview
- policy
- security
2010
Jan Krioukov, D. Navigability of complex networks Decision Making: A Psychophysics Application of Network Science
- network geometry
- routing
- topology
2010
Jan Kenneally, E. Framework for Understanding and Applying Ethical Principles in Network and Security Research Workshop on Ethics in Computer Security Research
- overview
- policy
- trends


Web Site Usage

In 2010, CAIDA's web site continued to attract considerable attention from a broad, international audience.

The graph and table below present the monthly history of traffic to www.caida.org for 2010. To represent website traffic more accurately, these statistics exclude non-viewed traffic, such as traffic from spiders, crawlers, or other robots.



Web Usage Bar Graph
Month     Unique visitors  Number of visits  Pages      Hits        Bandwidth
Jan 2010  42,281           64,676            204,532    1,005,371   46.78 GB
Feb 2010  36,075           57,638            186,712    829,967     46.16 GB
Mar 2010  38,655           63,709            209,298    969,519     49.03 GB
Apr 2010  34,060           58,815            178,703    847,196     40.14 GB
May 2010  30,260           53,730            201,717    822,951     41.79 GB
Jun 2010  27,989           49,751            173,384    797,072     38.95 GB
Jul 2010  26,557           48,964            185,284    765,181     36.21 GB
Aug 2010  27,360           47,993            187,204    804,157     40.94 GB
Sep 2010  31,156           53,223            223,597    889,520     49.32 GB
Oct 2010  34,155           55,490            179,461    840,279     45.11 GB
Nov 2010  31,518           52,790            165,434    756,555     41.30 GB
Dec 2010  28,255           49,578            199,386    794,911     43.50 GB
Total     388,321          656,357           2,294,712  10,122,679  519.24 GB
Averages: 1.69 visits/visitor, 3.49 pages/visit, 15.42 hits/visit, 829.52 kb/visit
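
As a cross-check, the per-visitor and per-visit averages can be recomputed from the yearly totals in the table above. The sketch below is illustrative only; in particular, it assumes that the bandwidth figure uses binary units (1 GB = 1024 × 1024 KB), which the report does not state.

```python
# Yearly totals copied from the web usage table.
unique_visitors = 388_321
visits = 656_357
pages = 2_294_712
hits = 10_122_679
bandwidth_gb = 519.24

# Assumption: binary units, 1 GB = 1024 * 1024 KB.
KB_PER_GB = 1024 * 1024

visits_per_visitor = visits / unique_visitors
pages_per_visit = pages / visits
hits_per_visit = hits / visits
kb_per_visit = bandwidth_gb * KB_PER_GB / visits

print(f"{visits_per_visitor:.4f} visits/visitor")
print(f"{pages_per_visit:.4f} pages/visit")
print(f"{hits_per_visit:.4f} hits/visit")
print(f"{kb_per_visit:.2f} KB/visit")
```

Under these assumptions each ratio lands within 0.01 of the corresponding figure reported with the totals.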

Organizational Chart

CAIDA would like to acknowledge the many people who put forth great effort towards making CAIDA a success in 2010. The image below shows the functional organization of CAIDA. Please check the home page for more complete information about CAIDA staff.

[Image of CAIDA Functional Organization Chart]

CAIDA Functional Organization Chart


Funding Sources

CAIDA thanks our 2010 sponsors, members, and collaborators.

The charts below depict funds received by CAIDA during the 2010 calendar year.

Funding Source  Allocations  Percentage of Total
NSF             3,392,802    71%
DOI             689,186      14%
DHS             610,421      13%
GIFT            100,520      2%
Total           4,792,929    100%

Figure 1. Allocations by funding source received during 2010.
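
The percentage column can be reproduced from the allocation amounts. The following is a small sketch of that arithmetic, assuming percentages are rounded to the nearest whole number; the variable names are chosen for the example.

```python
# Allocation amounts copied from the funding table.
allocations = {
    "NSF": 3_392_802,
    "DOI": 689_186,
    "DHS": 610_421,
    "GIFT": 100_520,
}

total = sum(allocations.values())

# Share of the total for each source, rounded to a whole percent.
shares = {
    source: round(100 * amount / total)
    for source, amount in allocations.items()
}

for source, pct in shares.items():
    print(f"{source}: {pct}%")
print("Total:", total)
```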


Operating Expenses

The charts below depict CAIDA's Annual Expense Report for the 2010 calendar year. The NSF funds received exceed the amount spent because we received the total budgets for four three-year grants, i.e., the money must last the entire duration of the four projects. We expect this number to be considerably leaner next year.

LABOR: Salaries and benefits paid to staff and students.
IDC: Indirect Costs paid to the University of California, San Diego including grant overhead (54.5%).
SUPPLIES & EXPENSES: Computer supplies and equipment (including computer hardware and software costing less than $5000); telephone, Internet, and other IT services; and general office supplies.
TRAVEL: Trips to conferences, PI meetings, operational meetings, and sites of remote monitor deployment.
EQUIPMENT: Computer hardware or other equipment costing more than $5000.
TRANSFERS: Exchange of funds between groups for recharge for IT desktop support and Oracle database services.
Expense Category       Expenses   Percentage of Total
Labor                  1,384,506  60%
IDC                    747,505    32%
Supplies and Expenses  75,298     3%
Travel                 38,947     2%
Equipment              60,030     3%
Transfers              4,407      0%
Total                  2,310,693  100%

Figure 2. 2010 Operating Expenses


Program Area    Expenses   Percentage of Total
Infrastructure  951,726    41%
Topology        862,042    37%
Routing         362,085    16%
Policy          90,573     4%
Outreach        44,267     2%
Total           2,310,693  100%

Figure 3. 2010 Expenses by Program Area

  Last Modified: Wed Nov-6-2013 17:18:19 PST
  Page URL: http://www.caida.org/home/about/annualreports/2010/index.xml