The contents of this legacy page are no longer maintained nor supported, and are made available only for historical purposes.

CAIDA's Annual Report for 2008

A report on CAIDA research initiatives, project progress and results, data sets, tool development, publications, presentations, workshops, web site statistics, funding sources, and operating expenses for 2008.

Mission Statement: CAIDA investigates both practical and theoretical aspects of the Internet, with particular focus on topics that:

  • provide insight into the macroscopic function of Internet infrastructure, behavior, usage, and evolution
  • foster a collaborative environment in which data can be acquired, analyzed, and (as appropriate) shared
  • improve the integrity of the field of Internet science
  • inform science, technology, and communications public policies

Executive Summary

This annual report covers CAIDA's activities in 2008, summarizing highlights from our research, infrastructure, and outreach activities. Our current research projects, funded by the U.S. National Science Foundation (NSF) and the Department of Homeland Security (DHS), include several measurement-based studies of the Internet's core infrastructure, with focus on the health and integrity of the global Internet's topology, routing, addressing, and naming systems. We made fundamental advances in several of our research projects this year, supported by increased coverage by our measurement infrastructure, and increased collaborations with colleagues around the world. We completed the first full calendar year of continuous provisioning of the most comprehensive annotated view of IPv4 topology thus far. We determined experimentally which IPv4 topology probing method worked best, and began to integrate and optimize our IP alias resolution techniques for large graphs. We also began to deploy IPv6 Ark nodes, and early IPv6 probe destination lists.

Some of our topology research focused on how different routing approaches in nature are maximally efficient on certain types of peculiarly structured topologies, conveniently, those structured like the Internet AS graph. Further, we found that self-similarity of clustering in real complex networks provides strong empirical evidence that some hidden metric spaces underlie these networks. In trying to model self-similar (scale-free) networks embedded into such a hidden space, we discovered that a certain approach to routing -- greedy routing -- is phenomenally successful and efficient in such a model. We are still exploring the ramifications of this intense discovery, and the even more intriguing breakthrough that this hidden space seems to be hyperbolic. Our research into network growth dynamics also yielded two papers with surprising results about different regimes of network growth: (1) there may be a vast pre-asymptotic regime of complex network growth that gives rise to power-law-like effects in degree distribution; and (2) a simple customer-provider-based modification of the preferential attachment model can account for Internet topology evolution, including the ISP consolidation toward monopoly.

Per our mission, our infrastructure activities aim to narrow the growing gap that impedes the field of network research, as well as telecommunications policy and infrastructure sustainability: a dearth of available empirical data on the public Internet since the infrastructure privatized in the mid-1990s. In 2008 we continued to maintain a catalog of Internet measurement data sets, contributed to and used the (DHS-funded) PREDICT repository of datasets to support cybersecurity research, and developed and deployed new active and passive measurement infrastructure. We continued expanding our newest active measurement infrastructure, now collecting the most comprehensive set of IPv4 topology measurements ever made available to researchers, enhanced with DNS information. Our data repository includes weekly archives of complete Internet AS-level topologies enriched with AS relationship information, and weekly updates of AS (ISP) rankings. We also coordinated and analyzed another DITL's worth of data, and wrote a few web pages and published a paper in CCR; hopefully someone else will pick up DITL next year since we've not had dedicated funding for it yet.

We also led and participated in tool development to support analysis, indexing, and dissemination of Internet infrastructure data. Highlights include updates to our real-time report generator of passively observed traffic, geographical visualizations of DNS workload to a given set of servers, updates to our IPv4 and IPv6 AScore posters, and visual maps of IPv4 address space consumption.

In 2007 (see that year's annual report) CAIDA began to expand its scope to include economics and policy research. Our notable contribution in 2008 was a set of blog entries that became a short Internet research tutorial for lawyers. Finally, we engaged in a variety of outreach activities, including web sites, peer-reviewed papers, technical reports, presentations, blogging, animations, and workshops. Details of our activities are below. CAIDA's program plan for 2007-2010 is available online. Please do not hesitate to send comments or questions to info at caida dot org.

Research Projects


Macroscopic Topology Measurements, Analysis, and Modeling


CAIDA's topology research agenda is focused on three strategic areas: 1) macroscopic topology measurement; 2) analysis of the observable AS-level and router-level hierarchy; 3) topology modeling in support of routing research.


In 2008 CAIDA made steady progress in all three of its topology research focus areas.

  1. Macroscopic Topology Measurement:
    1. We continued large-scale macroscopic topology measurements using our set of monitors distributed worldwide and coordinated by Archipelago (Ark), our state-of-the-art global measurement platform. We are aware of the gaps in geography and topology coverage -- still not well-quantified by researchers -- induced by the relatively small number of vantage points. Yet with 30 Ark monitors deployed in 21 countries by the end of 2008, we achieve our most comprehensive view of IPv4 topology thus far, completing the first full calendar year of the IPv4 Routed /24 Topology Dataset.

    2. On December 12, 2008, we began to use the Ark infrastructure to collect IPv6 topology data. We released the IPv6 Topology Dataset for researchers to get a view of the nascent IPv6 global topology as seen by six Ark monitors. More IPv6 topology data will be available in 2009.

    3. Led by Matthew Luckie in the WAND research group at the University of Waikato, we conducted experiments to see which traceroute probing methods captured the most topology information and published our results in "Traceroute Probe Method and Forward IP Path Inference" in IMC '08. We also released the corresponding Traceroute Probe Method 2008-08 Dataset.

    4. We implemented a new technique for alias resolution measurements on the Ark platform. Our new tool kapar is a scalable version of APAR developed by M. Gunes and K. Sarac at the University of Texas. Using publicly available data from four networks, we tested the efficiency and veracity of various combinations of alias resolution methods including iffinder, TTL measurements, and kapar. We published our findings as a CAIDA technical report, "IP Alias Resolution Techniques". A paper is in preparation.
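Alias resolution methods such as those named above each yield pairs of interface addresses believed to sit on the same router. The following is a minimal sketch of one way to combine such pairs into per-router address sets using a union-find structure. It is not the kapar or iffinder implementation; the function name `group_aliases` and the input format (a list of address pairs) are assumptions made for illustration.

```python
from collections import defaultdict


def group_aliases(alias_pairs):
    """Merge pairwise alias inferences into per-router address sets.

    alias_pairs is a hypothetical input: an iterable of (addr_a, addr_b)
    tuples, each meaning "these two interfaces appear to share a router".
    """
    parent = {}

    def find(addr):
        # Locate the set representative, shortening the path as we go.
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]
            addr = parent[addr]
        return addr

    for a, b in alias_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two sets

    routers = defaultdict(set)
    for addr in parent:
        routers[find(addr)].add(addr)
    # Return a deterministic, sorted list of sorted address lists.
    return sorted(sorted(addrs) for addrs in routers.values())


pairs = [
    ("192.0.2.1", "192.0.2.9"),
    ("192.0.2.9", "198.51.100.5"),
    ("203.0.113.7", "203.0.113.8"),
]
routers = group_aliases(pairs)
```

Because the merge is transitive, a single false-positive pair can join two unrelated routers, which is one reason the accuracy of each contributing method matters.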

  2. Analysis of the Observable Topology:
    • We continue to annotate the IPv4 topology graph with automated DNS reverse lookups of IP addresses discovered by the probes.

    • We continued work on techniques to accurately annotate Internet topologies based on observations and inference of Internet structural and commercial characteristics. Our efforts focused on the AS-level Internet with AS links annotated by business relationships between ASes. We infer these relationships, recognizing their bidirectional nature, and annotate each link as a customer-provider or a peer-to-peer (settlement-free interconnection) relationship. The paper "Graph Annotations in Modeling Complex Network Topologies" was accepted for publication in ACM Transactions on Modeling and Computer Simulation (TOMACS).

    • We created new versions of our popular AS Core Graph visualizations for both IPv4 and IPv6. The 2008 IPv4 graph was the first to make use of the topology data collected on the new Ark platform. Because we did not have enough IPv6 support in the Ark infrastructure, the 2008 IPv6 graph relied on data collected by volunteers responding to a request sent to the North American Network Operators' Group (NANOG) mailing list. We will have semi-automated IPv6 topology discovery running on Ark early in 2009.

    • Several users of CAIDA's AS relationship inference data asked us why it contained AS relationship cycles, e.g., cases where AS A is a provider of AS B, B is a provider of C, and C is a provider of A, or other cycle types. We published a paper, "On Cycles in AS Relationships" in ACM SIGCOMM Computer Communications Review (CCR), v.38, n.3, p.102-104, 2008 that provides our answers.
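To make the notion of an AS relationship cycle concrete, here is a small sketch that searches a provider-to-customer graph for a directed cycle using depth-first search. The input format (a dict mapping each provider to its customers) and the function name `find_provider_cycle` are assumptions for illustration, not the format of the AS relationship data files.

```python
def find_provider_cycle(provider_to_customers):
    """Return one cycle of provider->customer links as a list of ASes, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    stack = []

    def visit(asn):
        color[asn] = GRAY
        stack.append(asn)
        for customer in provider_to_customers.get(asn, ()):
            state = color.get(customer, WHITE)
            if state == GRAY:
                # Back edge: the cycle is the part of the path from customer on.
                return stack[stack.index(customer):] + [customer]
            if state == WHITE:
                cycle = visit(customer)
                if cycle:
                    return cycle
        stack.pop()
        color[asn] = BLACK
        return None

    for asn in list(provider_to_customers):
        if color.get(asn, WHITE) == WHITE:
            cycle = visit(asn)
            if cycle:
                return cycle
    return None


# A is a provider of B, B of C, and C of A: the example from the text.
cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}
acyclic = {"A": ["B", "C"], "B": ["C"]}
```

The recursion keeps the sketch short; a graph with very long provider chains would need an explicit stack to stay within Python's recursion limit.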

  3. Topology Modeling:
    1. We demonstrated that the self-similarity of some scale-free networks with respect to a simple degree-thresholding renormalization scheme finds a natural interpretation in the assumption that network nodes exist in hidden metric spaces. Clustering, i.e., cycles of length three, plays a crucial role in this framework as a topological reflection of the triangle inequality in the hidden geometry. We prove that a class of hidden variable models with underlying metric spaces are able to accurately reproduce the self-similarity properties that we measured in the real networks. Our findings indicate that hidden geometries underlying these real networks are a plausible explanation for their observed topologies and, in particular, for their self-similarity with respect to the degree-based renormalization. We published our results in "Self-similarity of Complex Networks and Hidden Metric Spaces" in Physical Review Letters, v. 100, 078701, 2008.

    2. We studied the paradox associated with networks growing according to super-linear preferential attachment: super-linear preference cannot produce scale-free networks in the thermodynamic (asymptotic) limit, but there are super-linearly growing network models that perfectly match the scale-free structure of some real networks, such as the Internet. We demonstrated that a super-linearly growing network model can reproduce, in its pre-asymptotic regime, the structure of a real network, if the model captures some sufficiently strong structural constraints, e.g., rich-club connectivity. These findings suggest that real scale-free networks of finite size may exist in pre-asymptotic regimes of network evolution processes that lead to degenerate network formations in the thermodynamic limit. We published our results in "Scale-free networks as pre-asymptotic regimes of super-linear preferential attachment" in Physical Review E, v.78, 026114, 2008.

    3. In collaboration with S. Shakkottai from Texas A&M University we refined our model of Internet growth which attempts to explain preferential attachment based on economic realities of the AS-level Internet. We simulated a growing AS-level topology and annotated links in the model graph with AS relationships (customer-provider or peer-to-peer). We compared the degree distributions for all different types of nodes in the simulated topology to those observed in measured Internet topologies, and found that the distributions are similar. To our knowledge, this is the first Internet evolution model that is realistic, analytically tractable, and entirely based on physical, that is, measurable, parameters. The paper was accepted for publication in The First International Conference on Complex Sciences: Theory and Applications (Complex 2009).
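The difference between linear and super-linear preferential attachment can be illustrated with a toy growth process in which each new node links to one existing node chosen with probability proportional to its degree raised to a power alpha. This is a generic textbook sketch, not the models from the papers above: it omits rich-club constraints and AS relationships, and the function name, parameters, and network size are assumptions.

```python
import random


def grow_network(n_nodes, alpha, seed=0):
    """Grow a tree by preferential attachment and return the degree list.

    Each new node attaches to one existing node, picked with probability
    proportional to degree**alpha (alpha=1 is linear, alpha>1 super-linear).
    """
    rng = random.Random(seed)
    degree = [1, 1]  # start from two nodes joined by a single link
    for _ in range(n_nodes - 2):
        weights = [d ** alpha for d in degree]
        target = rng.choices(range(len(degree)), weights=weights)[0]
        degree[target] += 1  # the chosen node gains a link
        degree.append(1)     # the new node starts with one link
    return degree


linear = grow_network(1000, alpha=1.0)
superlinear = grow_network(1000, alpha=2.0)
print("largest degree, linear:", max(linear))
print("largest degree, super-linear:", max(superlinear))
```

In runs of this kind the super-linear process tends to concentrate links on a single dominant node, which is the degenerate asymptotic behavior the text refers to; at small sizes the degree distribution can still look broad.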

  • Major Milestones

    Student Involvement

    A. Jamakovic (TU Delft) continued her work from 2007 on application of the dK-series methodology to study randomness of various complex networks, which we hope to publish in 2009.

    Funding Sources

    This research received support from:


    Greedy Routing on Hidden Metric Spaces as a Foundation of Scalable Routing Architectures without Topology Updates


    CAIDA's research in Internet routing continued to focus on two related topics: greedy routing based on hidden metric spaces underlying real networks; and the relationship between routing efficiency and the structure of the network topology. Leveraging a decade of CAIDA institutional knowledge of topology discovery, collection, and analysis, we have a bold research objective: a far-reaching solution to the routing scalability problems of today's Internet. But our work in this area has profound implications for network science in other disciplines (physics, biology, chemistry, social sciences).

    To foster our research goals, we developed and pursued the following step-by-step program:

    • Obtain empirical evidence that hidden spaces do underlie complex networks and that they are metric;
    • Identify navigability mechanisms that influence the efficiency of greedy routing in complex networks;
    • Find the basic geometrical and topological properties of hidden spaces that make them maximally congruent with respect to the identified navigability mechanisms;
    • Obtain empirical evidence that hidden spaces underlying real networks do possess these properties; and
    • Find mappings of nodes in real networks to the identified spaces or their models.


    1. We studied the process of routing information through networks as a universal phenomenon existing in both natural and man-made complex systems. In many complex networks found in nature, nodes communicate efficiently even without full knowledge of global network connectivity. We demonstrated that the peculiar structural characteristics of observable complex networks are consistent with maximizing communication efficiency when using greedy routing approaches without global knowledge. We also described a general mechanism that explains this connection between network structure and function. The paper "Navigability of complex networks" was published in Nature Physics online.

    2. We began the follow-on work to the above, and submitted for publication a paper, "Efficient Navigation in Scale-Free Networks Embedded in Hyperbolic Metric Spaces". This paper shows that the hierarchical structure of complex networks is congruent with negatively curved geometries hidden beneath observed topologies, i.e., the hidden metric space is hyperbolic. Mapping nodes to these hidden metrics leads to scale-free topologies in the observable network, and even more pleasantly surprising, greedy routing on this embedding can achieve 100% reachability and optimal paths. The question remains as to whether we can find hidden metric spaces to map and better navigate real world networks such as the Internet.
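Greedy routing as discussed above can be sketched in a few lines: each node forwards a packet to whichever neighbor is closest to the destination in the hidden space, using only local knowledge. The sketch below assumes nodes carry polar coordinates (r, theta) in the hyperbolic plane of curvature -1 and uses the standard hyperbolic law of cosines for distance. The coordinates, graph, and function names are illustrative assumptions, not the embedding or code from the papers.

```python
import math


def hyperbolic_distance(p, q):
    """Distance between two points given as (r, theta) polar coordinates."""
    (r1, t1), (r2, t2) = p, q
    x = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(t1 - t2))
    return math.acosh(max(1.0, x))  # clamp guards against rounding below 1


def greedy_route(graph, coords, source, target):
    """Return the greedy path from source to target, or None if stuck."""
    path = [source]
    current = source
    while current != target:
        here = hyperbolic_distance(coords[current], coords[target])
        best = min(
            graph[current],
            key=lambda n: hyperbolic_distance(coords[n], coords[target]),
            default=None,
        )
        # Stop at a local minimum: no neighbor is closer than we already are.
        if best is None or hyperbolic_distance(coords[best], coords[target]) >= here:
            return None
        path.append(best)
        current = best
    return path


# Hypothetical toy network: a central hub and two peripheral nodes.
coords = {"a": (3.0, 0.2), "hub": (0.5, 0.0), "b": (3.0, 3.0), "x": (3.0, 0.0)}
graph = {"a": ["hub"], "hub": ["a", "b"], "b": ["hub"]}
stuck = {"a": ["x"], "x": ["a"], "b": []}
```

Because every hop must strictly reduce the distance to the target, the walk cannot loop; it either arrives or stops at a local minimum, which is how reachability is measured in this setting.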

    Major Milestones

    Funding Sources

    Our routing research received support from:


    Improving the Integrity of Domain Name System (DNS) Monitoring and Protection


    CAIDA researchers conduct DNS measurements and develop tools, models, and analysis methodologies for use by DNS operators and researchers.


    1. Measurements of traffic at the DNS Root Servers
      1. During our January 2008 CAIDA/WIDE workshop, participants discussed the Day in the Life of the Internet (DITL) project, reflected on the lessons and results of the 2007 collection event, and compiled a sample list of the top research questions and the corresponding data that researchers would like to procure from the DITL project.

      2. In collaboration with ISC and OARC, we held the third large-scale simultaneous "Day in the Life of the DNS Root Servers" data collection event on March 18-19, 2008 (DITL 2008). We captured tcpdump traces at nearly all anycast instances of the A, C, E, F, H, old-J, K, L, old-L, and M root servers and from two alternative Open Root Server Network (ORSN) servers. In comparison with DITL 2007, the total amount of data doubled. This unique dataset represents the most comprehensive measurements of the root servers to date, and provides researchers with unprecedented insight into the root server workload characteristics and performance. We wrote a summary of the collection event, and cataloged the data into DatCat. These data are available to the research community via the DNS-OARC. Academic researchers can participate in the DNS-OARC for free.

      3. We began analysis of the DNS root server data collected during the DITL 2008 and presented our findings at NANOG42 and at the DNS-OARC 2008 DNS Ops Workshop in Brooklyn. We published a paper, "A Day at the Root of the Internet" in the ACM Computer Communications Review, v. 38, pp.41-46, 2008.

      4. Sebastian Castro (a visiting student from Chile) collaborated with CAIDA researchers to analyze DNS data collected during the 2006, 2007, and 2008 DITL events. He focused on the "heavy hitters" analysis, which raised more questions than it answered. The sources of heavy pollution (queries that cannot possibly be appropriate) change over time, often associated with application software that is lazy (laissez-faire) about managing its own DNS traffic. One clear pattern is continuous growth -- there is an order of magnitude more pollution (invalid queries) at the roots than valid queries, and the number of invalid queries grows faster than the number of valid queries. Perhaps more importantly, no organization has both the incentive and the capital to fix this pollution. The root cause of much of it is lazy software, written that way because it is cheaper.

      5. We published a paper "Influence Maps - a novel 2-D visualization of massive geographically distributed data sets" in the Internet Protocol Forum in October, 2008. In this paper, we present a novel visualization technique -- the Influence Map -- which renders a compressed representation of geospatially distributed Internet data.
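As a rough illustration of how queries at a root server might be sorted into valid and invalid (pollution), the sketch below flags two kinds of query name that a root server can never usefully answer: names whose top-level domain is not delegated, and names that are themselves IP addresses. These two categories, the tiny TLD set, and the function name are assumptions for illustration; they are not the classification scheme used in the DITL analysis.

```python
import ipaddress
from collections import Counter

# Illustrative subset only; a real check would load the full list of delegated TLDs.
KNOWN_TLDS = {"com", "net", "org", "arpa", "edu", "cl", "jp", "nz"}


def classify_query(qname, known_tlds=KNOWN_TLDS):
    """Label a query name as 'valid' or with a reason it is invalid."""
    name = qname.rstrip(".").lower()
    if not name:
        return "valid"  # a query for the root zone itself
    try:
        ipaddress.ip_address(name)
        return "invalid: IP address as name"
    except ValueError:
        pass  # not an IP literal, keep checking
    tld = name.rsplit(".", 1)[-1]
    if tld not in known_tlds:
        return "invalid: unknown TLD"
    return "valid"


# Hypothetical sample of observed query names.
sample = ["www.example.com.", "192.0.2.1", "printer.local", "ns1.example.net", "."]
tally = Counter(classify_query(q) for q in sample)
```

Tallying labels per source address over a trace would give a simple "heavy hitters" view of which clients send the most invalid queries.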

    2. Other DNS measurements:
      • To complement the IPv4 Routed /24 Topology Dataset, in March 2008, we began using our custom-built bulk DNS lookup service to resolve the fully-qualified domain names for IP addresses seen by our monitors. We make these names available in the DNS Names for IPv4 Routed /24 Topology Dataset.

      • Duane Wessels, who became director of DNS-OARC in June 2008, continued our open resolvers survey and posted daily reports that identify open DNS resolvers. These resolvers represent a dangerous vulnerability to Internet users since they allow resource squatting, are easy to poison, and can be used in widespread Distributed Denial of Service (DDoS) attacks.

      • In collaboration with Prof. N. Brownlee (University of Auckland (UA), New Zealand), we maintained NeTraMet traffic meters installed at various locations in the US, New Zealand, and Japan, and continued monitoring requests to, paired with responses from, root/gTLD servers generated by large campus/enterprise networks. The resulting longitudinal dataset is available for researchers. It contains information useful for evaluating performance conditions and trends on the global Internet, although note that DNS RTTs are influenced by several factors, including remote server load, network congestion, route instability, and local effects such as link or equipment failures.
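The reverse lookups behind the DNS Names dataset rest on a simple mapping from an IP address to a name under in-addr.arpa (or ip6.arpa for IPv6), which is then queried for a PTR record. The sketch below shows that mapping with Python's standard library. It is not CAIDA's bulk lookup service; `lookup_ptr` is a hypothetical helper that performs one blocking lookup through the system resolver and is far too slow for bulk use.

```python
import ipaddress
import socket


def reverse_name(addr):
    """Return the reverse-DNS name whose PTR record names this address."""
    return ipaddress.ip_address(addr).reverse_pointer


def lookup_ptr(addr):
    """Resolve one address to a hostname, or None if no name is found."""
    try:
        return socket.gethostbyaddr(addr)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


print(reverse_name("192.0.2.1"))  # 1.2.0.192.in-addr.arpa
```

A bulk service would typically issue many such queries concurrently and cache negative answers, since a large share of router interface addresses have no PTR record.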

    Major Milestones

    Funding Sources

    This research received support from:

    Economics and Policy

    Research and Analysis on Internet Protocol version 6 (IPv6)


    In the face of exhaustion of the Internet Assigned Numbers Authority (IANA) IPv4 address resources in the next several years, CAIDA seeks salient data and objective quantitative analysis that will inform the development of address allocation policies that will accommodate continued growth and innovation of the Internet. We hope to foster discussion of scenarios that accommodate four realities: 1) the current Internet has become critical infrastructure for governments, organizations, and individuals throughout the world; 2) the Internet (on any timeline) requires an upgrade to a more scalable and sustainable addressing solution; 3) any such solution requires an infusion of capital and skilled labor; and 4) the major organizations currently associated with ownership, maintenance, and upgrade of such Internet infrastructure do not currently enjoy resources that would allow for such investments in the required upgrades. All four S's -- security, scalability, sustainability, and stewardship -- in one messy problem.


    1. In 2008, CAIDA worked with ARIN to collect and analyze information on IPv6 uptake. We conducted two surveys: the March 2008 survey went to respondents from the ARIN region, and a September 2008 survey collated responses from all regions. Claffy presented these results at the April and October ARIN Public Policy Meetings, and also presented remotely to the Internet Society's (ISOC) Advisory Council in November.

    2. Claffy chaired a panel at TPRC's 36th Research Conference on Communication, Information, and Internet Policy hosted by the Center for Technology and the Law, George Mason University Law School, Arlington, Virginia in September 2008.

    Major Milestones

    Student Involvement

    REU student Jennifer Hsu worked with CAIDA staff to produce A History of Internet Infrastructure Ownership, but gave up on it until we get a better source of available data.

    Funding Sources

    This research received support from a gift made by the American Registry for Internet Numbers (ARIN).

    Cooperative Measurement and Modeling of Open Networked Systems (COMMONS)


    We continued activity on the COMMONS project, mostly in learning what policy changes or support would be required to support such an experiment. Last year we proposed that NLR and/or Internet2 offer measurement technology and connectivity to community networks in exchange for opt-in access to measurement of the resulting interdomain network and its costs. This year we shifted our efforts toward educating enough people in the communications policy community about Internet technology problems so we can have a more interdisciplinary conversation next year.


    1. In collaboration with co-author Sascha Meinrath, we published "The COMMONS Initiative: Cooperative Measurement and Modeling of Open Networked Systems" in the CommLaw Conspectus: Journal of Communications Law and Policy, Volume 16.2. This article proposes to develop a requirements document and roadmap to support the use of a national OC-192 transit backbone for community wireless networks and other public sector networks to reach each other. This would enable a large-scale, incentive-based network of Internet workload, performance, economic, and behavioral measurement on an unprecedented national, inter-segment, inter-provider scale. First we should talk for a couple of years about how to respect privacy in Internet research.

    2. In March 2008, kc claffy attended a meeting hosted by Google and Stanford Law School, Legal Futures, which inspired a follow-up set of blog postings and a slightly updated PDF, "Ten Things Lawyers Should Know About the Internet".

    Major Milestones

    In its second year, the COMMONS project reported the following major milestones.

    Student Involvement

    We supervised an undergraduate student, Connie Lyu, who worked with Adobe Illustrator to add graphics and layout to the top ten things lawyers should know about the Internet blog entries to produce the booklet version.

    Funding Sources

    This research was supported by a gift from Cisco Systems, Inc.

    Infrastructure Projects

    CONMI: Community-Oriented Network Measurement Infrastructure


    The core objective of the "Community-Oriented Network Measurement Infrastructure" (CONMI) project is to provide needed data sets to the scientific community studying the Internet. To accomplish this goal, CAIDA deploys both active and passive infrastructure to measure a wide cross-section of the Internet and collect and distribute the resulting data.


    In 2008, we continued to focus our efforts on the two tasks described in the proposal: 1) implementing Archipelago, our state-of-the-art, community-oriented, active measurement infrastructure; and 2) deploying monitors capable of collecting passive traces on Internet links, including new monitors for OC192 backbone links and web pages displaying reports from publicly accessible realtime traffic monitors.

    1. Archipelago (Ark): A Coordination-Oriented Measurement Infrastructure
      1. The Archipelago (Ark) Project made progress in 2008 on monitor deployment and software development. By the end of 2008, CAIDA had 30 Ark monitors deployed in 21 countries, including eight monitors in the US. The topology of Archipelago includes much of that of our previous skitter infrastructure, which used PCs located in networks around the world that send measurement results via the Internet to a central server located at CAIDA at the San Diego Supercomputer Center. The new design pays a great deal of attention to communication & coordination, software installation & execution environment, and data storage & management. In 2008, the Ark infrastructure was primarily used for ongoing topology measurements in the IPv4 address space; we will expand its research scope in 2009.

      2. We continue to expand the Ark infrastructure, adding 1-2 monitors per month. Increasing the number of monitors is vital for topology measurements since it reduces the gaps in coverage, decreases the cycle times, and allows us to increase the number of traces attempted for each destination /24. The application load gets distributed across the teams and monitors based on resource availability. Locations interested in hosting an Ark monitor should contact CAIDA.

      3. We added infrastructure for automated IPv4 Routed /24 AS Links Dataset creation, automated ongoing DNS lookup of IP addresses seen in the Routed /24 Topology traces, and tcpdump captures of DNS query/response traffic.

      4. We released the rb-wartslib library that enables warts data processing from Ruby. We also produced numerous scripts to further automate collection, data management, and archival. All tools for downloading and managing collected data stress scalability and fault tolerance.

      5. With a nod toward the future, we implemented a prototypical probing methodology for Internet Protocol version 6 (IPv6) on six nodes of the Ark infrastructure with the requisite IPv6 connectivity. We will expand IPv6 measurements in 2009.
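To illustrate what measuring "each destination /24" involves, the sketch below splits routed prefixes into /24 blocks, picks one random address in each block as a probe destination, and spreads the destinations across a set of monitors. The one-address-per-/24 choice, the round-robin assignment, the monitor names, and the function names are all assumptions for illustration; Ark's actual scheduling depends on resource availability as noted above.

```python
import ipaddress
import random


def pick_destinations(routed_prefixes, seed=None):
    """Choose one random probe destination inside every /24 of each prefix."""
    rng = random.Random(seed)
    dests = []
    for prefix in routed_prefixes:
        net = ipaddress.ip_network(prefix)
        if net.prefixlen > 24:
            continue  # this sketch skips prefixes more specific than /24
        for slash24 in net.subnets(new_prefix=24):
            # Any of the 256 addresses may be chosen, including .0 and .255.
            dests.append(slash24.network_address + rng.randrange(256))
    return dests


def assign_to_monitors(dests, monitors):
    """Spread destinations across monitors in simple round-robin order."""
    plan = {m: [] for m in monitors}
    for i, dest in enumerate(dests):
        plan[monitors[i % len(monitors)]].append(dest)
    return plan


dests = pick_destinations(["192.0.2.0/23"], seed=1)
plan = assign_to_monitors(dests, ["monitor-1", "monitor-2"])
```

With more monitors sharing the same destination list, each monitor probes fewer targets per cycle, which is why adding monitors shortens cycle times.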

    2. Deployment of Passive Monitors for Trace Collection on Backbone Links
      • Early in the year, we spent much time and effort dealing with problematic older unsupported hardware hoping to repurpose it, but have not had any resources to upgrade these older systems. Finally, in July 2008, we successfully deployed four passive traffic monitors on high-speed, tier 1, OC192 Internet backbone links. Working with Equinix and a tier 1 ISP, we sited two monitors (four hosts) to tap two bidirectional links, one from Seattle, WA to Chicago, IL and another from San Jose, CA to Los Angeles, CA.

      • The first 1-hour trace was obtained in March 2008 as a part of our global Internet measurement experiment, 'Day in the Life of the Internet 2008'. The data was anonymized using the Crypto-PAn prefix-preserving anonymization technique and is available for use by vetted researchers. We continue monthly collection of traces at the Equinix facilities.

      • We improved our CoralReef based traffic report generator that produces publicly accessible realtime reports from the data we capture on traffic monitors. We also publish observed packet size distributions.
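Prefix-preserving anonymization, the property that Crypto-PAn provides, means that two addresses sharing a k-bit prefix before anonymization share exactly a k-bit prefix afterwards. The sketch below demonstrates that property with a simplified construction that substitutes HMAC-SHA256 for the AES-based function Crypto-PAn uses. It is not Crypto-PAn, it is not compatible with it, and it should not be used to protect real data; the key and function names are assumptions.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr, key):
    """Map an IPv4 address to another one, preserving shared prefix lengths."""
    bits = int(ipaddress.IPv4Address(addr))
    out = 0
    for i in range(32):
        prefix = bits >> (32 - i)  # the i most significant bits (0 when i == 0)
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        # One keyed pseudo-random bit that depends only on the preceding prefix.
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        original_bit = (bits >> (31 - i)) & 1
        out = (out << 1) | (original_bit ^ flip)
    return str(ipaddress.IPv4Address(out))


def common_prefix_len(a, b):
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


key = b"example-key-not-for-real-use"
a, b = "10.1.2.3", "10.1.2.77"
anon_a, anon_b = anonymize_ipv4(a, key), anonymize_ipv4(b, key)
```

Since each output bit is the input bit XORed with a value derived only from the bits before it, two inputs that agree on their first k bits produce outputs that agree on the same k bits and differ at the next one.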


    Major Milestones

    Student Involvement

    CAIDA hosted visits of graduate students Alberto Dainotti and Maurizio Dusi who worked on various research tasks for the CONMI project.

    Funding Sources

    The CONMI project received support from the NSF grant (CRI 05-51542) "Toward Community-Oriented Network Measurement Infrastructure."

    PREDICT: Network Traffic Data Repository to Develop Secure IT Infrastructure


    The Protected Repository for the Defense of Infrastructure against Cyber Threats (PREDICT) was designed to provide sensitive security datasets to qualified researchers, while preserving privacy and preventing data misuse. PREDICT seeks to provide a secure technical and policy framework to process applications for data sharing from network providers that include tools for collection, processing, and hosting of data that PREDICT makes available through the program as well as secured infrastructure to support serving datasets to researchers.


    CAIDA's involvement in the PREDICT effort included assisting with development of background pieces of the project, from iterating on NDAs to deploying measurement infrastructure to curation of data. CAIDA acts as a data provider and a data hosting site, serving denial-of-service backscatter data, Internet worm data, Network Telescope data, and IP topology data to approved researchers. We also hired part-time an attorney with experience in cyberlaw, who gave us feedback on our data descriptions and IRB application, described below.

    1. CAIDA continued delivering both active and passive data as a PREDICT data provider. Our Macroscopic Topology data deliverable is the IPv4 Routed /24 Topology Dataset collected on Ark infrastructure (which has replaced previous skitter-based measurements). Our passive data deliverables are monthly hour-long traffic traces captured by two monitors (four hosts) on two bidirectional links, one from Seattle, WA to Chicago, IL and another from San Jose, CA to Los Angeles, CA. We clean, anonymize, and distribute these traces to researchers.

    2. In October 2008, we submitted an application to the UCSD Human Research Protections Program (HRPP) office requesting review of our research protocol by the campus Institutional Review Board (IRB). The application covered the general traffic and other data analysis work we have done for the last 10 years, not including any research involving payload (which we define as anything past the TCP/IP header). Although we expected it to go to a full panel review, our application was given expedited review and approved within 10 days. Since we would like to begin a longer conversation with our IRB regarding appropriate conduct during network research, we plan to submit a follow-up application that will propose privacy-respecting payload analysis, and we will ask that the application specifically get a full panel review.

    3. We wrote a report describing the landscape of anonymization tools for network data, "summary of anonymization best practice techniques" and created an anonymization bibliography on our web site.

    4. We participated in the first ACM Workshop on Network Data Anonymization (NDA 2008) which convened in Washington DC in association with the 15th ACM Conference on Computer and Communications Security (CCS). The workshop focused on the theory and practice of anonymization as it applies to network data for use by the Internet measurement research community and operators deploying network measurement technologies. PI Dr. Claffy moderated a panel discussion on Economic, Ownership, and Trust Issues in Network Data Sharing. Workshop participants seemed to agree on the need for two documents:

      1. An ethics-based code of conduct for the network measurement community. PREDICT will host a workshop in 2009 to discuss this topic.
      2. A case for legislative change. All three attorneys at the workshop echoed the belief that by the time Internet measurement-related legislation comes up for revision, the network measurement community better have a compelling story for what we want changed and why.

    5. After an internal assessment of the research utility and use of the anonymized telescope data, balanced against the (small) privacy risk and (large) cost of upkeep, CAIDA decommissioned the current network telescope data collection infrastructure on October 13, 2008, but then had to immediately kickstart it again because of the onset of the Conficker worm. With much help from Professor Stefan Savage and his team in CSE, we are re-implementing the network telescope in 2009 with a fresh research agenda, a new data collection and curation methodology, and new hardware.

    6. We collaborated with Internet2 to draft a proposal for the Network Research Review Council (NRRC), which would act in some ways like an IRB for the Internet2 community. Although Internet2 faces the same privacy, fear-of-lawsuit, and operational cost issues as commercial providers, we hope the NRRC can help Internet2 navigate these issues to better provide network data to researchers.

    Major Milestones

    • Improvements in data collection infrastructure
      • Completed conversion from skitter-based active measurements to the Archipelago platform.
      • Installed, configured, tested and deployed four OC192 monitors.
    • Adding Metadata to PREDICT Portal for Researchers
      • Submitted four quarters of the Denial-of-Service (DoS) Backscatter Datasets.
      • Submitted Denial-of-Service (DoS) Backscatter-TOCS Dataset.
    • Outreach
      • Submitted an application to our campus IRB and received expedited review and approval.
      • Published the summary of anonymization best practice techniques and anonymization bibliography.
      • Prepared a recommended reading list on Internet technology policy for students and other researchers.

    Student Involvement

    Graduate student Wolfgang John analyzed data collected on the UCSD Network Telescope, looking for worm traffic.

    Funding Sources

    The PREDICT project received support from DHS contract NBCHC 070133, "Supporting Research and Development of Security Technologies through Network and Security Data Collection."

    DatCat: Internet Measurement Data Catalog


    CAIDA's Internet Measurement Data Catalog (IMDC) facilitates access, archiving, and long-term storage of Internet data as well as sharing of Internet measurement metadata among Internet researchers. Since its launch in June 2006, the catalog has received contributions of metadata for over 100 collections indexing 150,000+ files totaling over 26 TB of data. Funding for the project ended in 2006, but we still hope to add some usability features: (1) incorporate extensive user feedback into development of a streamlined contribution mechanism requiring much less time from the contributor; (2) perform more detailed log analysis of DatCat user behavior, and refine the user interface to reduce the time users spend searching the catalog; (3) maintain and extend the catalog with additional and newer datasets. We are making slight progress on (1) and (3); we'll get back to this in 2009 or 2010 if it is still considered useful.

    Major Milestones

    • New Contributions of Metadata
      • During 2008, the DatCat catalog received 37 entries documenting the metadata for collections and publications. Highlights include:
        • Collection: Day in the Life of the Internet, March 18-19, 2008 (DITL-2008-03-18)
        • Collection: CAIDA Anonymized 2008 Internet Traces Dataset
        • Collection: Mesh Routing Data Collection - Routing tables and ScanWireless Scans
        • Publication: Traceroute Probe Method and Forward IP Path Inference published 2008-10 in ACM SIGCOMM Internet Measurement Conference

    Funding Sources

    Though no longer directly funded, DatCat received some support in 2008 from:


    CAIDA's mission includes providing access to tools for Internet data collection, analysis and visualization to facilitate network measurement and management. However, CAIDA does not receive specific funding for support and maintenance of the tools we develop. Please check our home page for a complete listing and taxonomy of CAIDA tools.

    2008 Tool Development


    The CoralReef software suite, developed by CAIDA, provides a comprehensive software solution for data collection and analysis from passive Internet traffic monitors, in real time or from trace files. Real-time monitoring support includes system network interfaces (via libpcap) and FreeBSD drivers for a number of network capture cards, including the popular Endace DAG (10GE/OC192, POS and ATM) cards. The package also includes programming APIs for C and Perl, and applications for capture, analysis, and web report generation. This package is maintained by CAIDA developers with the support and collaboration of the Internet measurement community.

    We released CoralReef version 3.8.2 late in 2008.

    Anonymization Tools Taxonomy

    In late 2008, we published the Anonymization Tools Taxonomy to assist those searching for tools to anonymize Internet log files and trace data. The Anonymization Tools Taxonomy provides a summary of each tool, pointers to more detailed information, and review comments when available. We also released the Summary of Anonymization Best Practice Techniques as part of the DHS PREDICT Project.

    CAIDA Tools Download Report

    The table below lists all the CAIDA-developed tools distributed via our home page and the number of downloads of each during 2008.

    As a change from 2007, this year's download report does not include accesses by spiders, crawlers, or other robots, nor does it count multiple accesses by the same downloader.

    • Currently Supported Tools

      Tool Description Downloads
      coralreef A software suite to collect and analyze data from passive Internet traffic monitors. 1,157
      dsc A system for collecting and exploring statistics from DNS servers. 2,154
      dnsstat An application that collects DNS queries on UDP port 53 to report statistics. 222
      dnstop A libpcap application that displays tables of DNS traffic. 7,405
      sk_analysis_dump A tool for analysis of traceroute-like topology data. 242
      walrus A tool for interactively visualizing large directed graphs in 3D space. 3,569
      libsea A file format and a Java library for representing large directed graphs. 406
      Chart::Graph A Perl module that provides a programmatic interface to several popular graphing packages. Note: Chart::Graph is also available on CPAN. The numbers here reflect only downloads directly from our site, as download statistics from CPAN are not available. 127
      plot-latlong A tool for plotting points on geographic maps. 255
    • Past Tools (Unsupported)

      Tool Description Downloads
      Mapnet A tool for visualizing the infrastructure of multiple backbone providers simultaneously. 14,065
      GeoPlot A lightweight Java applet that creates a geographical image of a data set. 538
      GTrace A graphical front-end to traceroute. 714
      otter A tool used for visualizing arbitrary network data that can be expressed as a set of nodes, links or paths. 535
      plotpaths An application that displays forward and reverse network path data. 136
      plankton A tool for visualizing NLANR's Web Cache Hierarchy 51
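
    The counting rule described for this download report (robot accesses excluded, repeat accesses by the same downloader counted once) can be sketched as follows. This is a minimal illustration under stated assumptions, not CAIDA's actual log-processing code: the record layout, the `ROBOT_MARKERS` list, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical substrings used to flag robot user agents (an assumption,
# not the list CAIDA actually used).
ROBOT_MARKERS = ("bot", "spider", "crawler")


def is_robot(user_agent):
    """Return True if the user agent looks like a spider, crawler, or robot."""
    agent = user_agent.lower()
    return any(marker in agent for marker in ROBOT_MARKERS)


def count_downloads(records):
    """Count unique downloaders per tool, skipping robot accesses.

    Each record is a (downloader, tool, user_agent) tuple.
    """
    downloaders = defaultdict(set)
    for downloader, tool, user_agent in records:
        if is_robot(user_agent):
            continue  # robots are excluded from the report
        downloaders[tool].add(downloader)  # a set ignores repeat accesses
    return {tool: len(seen) for tool, seen in downloaders.items()}


# Made-up sample records for illustration only.
sample = [
    ("192.0.2.1", "dnstop", "Mozilla/5.0"),
    ("192.0.2.1", "dnstop", "Mozilla/5.0"),     # repeat access, counted once
    ("192.0.2.7", "dnstop", "ExampleBot/1.0"),  # robot, skipped
    ("192.0.2.9", "walrus", "Wget/1.11"),
]
print(count_downloads(sample))
```

    Keying the set on the downloader means the result depends on how a "downloader" is identified (IP address, username, or something else), which the report does not specify.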


    In 2008, CAIDA captured and curated data from three primary sources of network data: 1) macroscopic topology, 2) passive traffic traces at tier1 Internet backbone links, and 3) passive traffic traces from the UCSD Network Telescope. We derived several datasets from this data that we make publicly available to researchers, including our AS Rank, AS adjacencies, and Router adjacencies datasets as well as several Backscatter datasets. CAIDA makes some data available to anyone without restriction. CAIDA makes a subset of its collected data available only to academic researchers and CAIDA members, with data access subject to Acceptable Use Policies (AUP) designed to protect the privacy of monitored communications, ensure security of network infrastructure, and comply with the terms of our agreements with data providers.

    Major Milestones

    • We collected our first trace data from the equinix-chicago and equinix-sanjose passive monitors connected to tier1 ISP backbone links at Equinix facilities in Chicago, IL, and San Jose, CA.
    • We deactivated skitter data collection and transitioned to our next generation topology measurement infrastructure named Archipelago (Ark) for collecting IPv4 topology data.
    • We started collecting IPv6 topology data on the Archipelago infrastructure.
    • We collected data on the Conficker worm on the UCSD Network Telescope.

    Data Collected in 2008

    Data Type First date Last date Total size (on disk)
    Macroscopic Topology Measurements, IPv4 (Archipelago) 2008-01-01 2008-12-31 259 GB
    Macroscopic Topology Measurements, IPv6 (Archipelago) 2008-12-12 2008-12-31 7.6 MB
    Internet backbone Traces 2008-03-19 2008-12-17 2.0 TB
    Network Telescope 2008-01-01 2008-12-31 7.2 TB
    A Day In The Life (DITL) of the Internet - OARC 2008 2008-03-18 2008-03-19 1.9 TB
    DNS Names for IPv4 Routed /24 Topology Dataset 2008-03-01 2008-12-31 11 GB*
    DNS root/gTLD RTT Dataset 2008-01-01 2008-12-31 1.5 GB

    * Size of this dataset may vary as we store and serve a rotating window of the last 30 days for this dataset.

    Data Distributed in 2008

    We process raw data into specialized datasets to increase its utility to researchers and to satisfy security and privacy concerns. In 2008, this resulted in the following datasets:

    • Publicly Available Data

      These datasets require that users agree to an Acceptable Use Policy, but are otherwise freely available.

    Dataset Unique visitors (IPs) Data Downloaded
    AS Rank 4200 10.8 GB
    AS Links (AS Adjacencies) 571 2.69 GB
    AS Relationships 779 11.4 GB
    Router Adjacencies 106 316 MB
    Witty Worm Dataset 139 244 MB
    AS Taxonomy 45 23.9 MB *
    Code-Red Worms Dataset 115 9.07 GB
    We count the volume of data downloaded per unique user per unique file, so even if a user downloads a file 100 times, we only count that file once for that user. This methodology significantly undercounts the total volume of data served through our dataservers in 2008, but is necessary because of limitations in dataserver logging combined with aberrant user behaviour.
    * AS Taxonomy dataset is included in a mirror of the GA Tech main AS Taxonomy site, and thus does not represent all access to this data.
    • Restricted Access Data

      These datasets require that users:

      • be academic or government researchers, or join CAIDA;
      • request an account and provide a brief description of their intended use of the data; and
      • agree to an Acceptable Use Policy.
    Dataset Unique visitors (usernames) Data Downloaded *
    Anonymized Internet Backbone Traces 122 3.31 TB
    Backscatter Datasets 42 1.81 TB
    Raw Topology Traces (Archipelago infrastructure) 58 702 GB
    Raw Topology Traces (skitter) 48 496 GB
    Witty Worm Dataset 20 144 GB
    DNS Names for IPv4 Routed /24 Topology Dataset 32 3.57 GB
    2003 Internet Topology Data Kit 48 8.10 GB
    DNS Root/gTLD server RTT Dataset 6 14.7 MB
    * We count the volume of data downloaded per unique user per unique file, so even if a user downloads a file 100 times, we only count that file once for that user. This methodology significantly undercounts the total volume of data served through our dataservers in 2008, but is necessary because of limitations in dataserver logging combined with aberrant user behaviour.
    • Restricted Access Data Requests

      The table below shows how many requests for data access we received and how many we granted. We received 60% more requests in 2008 than in 2007, and approved 50% more requests for access to restricted datasets.

    Dataset Number of requests received Number of requests granted access
    Anonymized Backbone and Peering Link Traces 207 139
    Active Topology Trace Datasets 134 77
    Backscatter-2008 Dataset 109 52
    Witty Worm Dataset 33 21
    DNS Root/gTLD server RTT Dataset 12 7
    Totals 495 296
    • NLANR Datasets

      This data was collected by the NLANR project. When the project came to an end in July 2006, CAIDA inventoried NLANR equipment and took over temporary curation and distribution of NLANR data. CAIDA now maintains both the NLANR AMP and PMA public data repositories on a best-effort basis. Our efforts to serve this data are currently unfunded, and we plan to cease serving it in May 2009. To sponsor or take over hosting responsibility for this data, please contact us. Significant outages and gaps in logging make it impossible for us to provide relevant statistics for the AMP Topology Traces.

    Dataset Unique visitors (IPs) Data Downloaded
    PMA Traffic Traces 3991 206 TB
    AMP Topology Traces -- --
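
    The download-volume notes above count each file once per user, however many times that user fetched it. A minimal sketch of that rule follows; the record layout and the function name `deduplicated_volume` are assumptions made for illustration, not CAIDA's actual accounting code.

```python
def deduplicated_volume(transfers):
    """Sum file sizes, counting each (user, file) pair only once.

    Each transfer is a (user, filename, size_in_bytes) tuple.
    """
    seen = set()
    total = 0
    for user, filename, size in transfers:
        key = (user, filename)
        if key in seen:
            continue  # repeat download of the same file by the same user
        seen.add(key)
        total += size
    return total


# Made-up sample log for illustration only.
log = [
    ("alice", "trace-a.pcap.gz", 500),
    ("alice", "trace-a.pcap.gz", 500),  # repeat download, ignored
    ("alice", "trace-b.pcap.gz", 200),
    ("bob", "trace-a.pcap.gz", 500),
]
print(deduplicated_volume(log))  # 1200, versus 1700 bytes actually transferred
```

    The gap between the two figures in the sample shows why this method undercounts the volume actually served.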


    As part of our mission to investigate both practical and theoretical aspects of the Internet, CAIDA hosted the 9th CAIDA-WIDE Workshop, co-hosted the SFI Workshop on Networks and Navigation, and held the 1st CAIDA-WIDE-CASFI workshop.

    Please check our web site for a complete listing of past and upcoming CAIDA workshops.

    9th CAIDA-WIDE Workshop

    The 9th CAIDA/WIDE workshop was held on January 19th and 20th, 2008 (by invitation only) in the East-West Center on the University of Hawaii campus as part of Techs in Paradise (TIP2008). The main topics presented and discussed at the workshop included: Internet measurement projects and DNS. We used the venue to discuss the upcoming 2008 Day In The Life of the Internet event and compiled a list of the top questions and data types.

    External - SFI Workshop on Networks and Navigation

    On August 4-6, 2008, the Santa Fe Institute (SFI) hosted an interdisciplinary "Networks and Navigation" workshop jointly organized and supported by SFI and CAIDA. The main topics of discussion included similarities between complex networks sharing small-world characteristics and navigation of such networks using only local information. A deeper understanding of the origin of these locally navigable structures would (1) clarify the role (if any) that latent metric spaces play in the navigability of networks, and potentially point to novel generative mechanisms based on such spaces; (2) point us toward novel routing algorithms and search protocols for Internet-like topologies, other communication networks, and possibly social networks; (3) shed light on the potentially different behavior of passive versus active spreading on these networks, i.e., diffusion versus search; and (4) identify the relationship between navigability and other network properties, e.g., community structure and degree heterogeneity.

    10th CAIDA-WIDE Workshop - 1st CAIDA/WIDE/CASFI Workshop

    The 10th CAIDA/WIDE workshop was held on August 15-16, 2008 (by invitation only) in Marina del Rey, CA. This workshop supported a three-way collaboration between researchers from CAIDA (USA), WIDE (Japan) and CASFI (South Korea). The main topics presented and discussed at the workshop included: updates on active Internet measurements of Internet topology, reverse paths, DNS source port randomness, analysis of DITL 2008 data, trends in residential user traffic, and automated application signature generation for traffic identification.


    The following table contains the papers published by CAIDA for the calendar year of 2008. Please refer to Papers by CAIDA on our web site for a comprehensive listing of publications.

    Year Month Author(s) Title Publication
    2008 Dec
    1. Kim, Hyunchul
    2. claffy, kc
    3. Fomenkov, Marina
    4. Barman, Dhiman
    5. Faloutsos, Michalis
    6. Lee, KiYoung
    Internet Traffic Classification Demystified: Myths, Caveats, and the Best Practices ACM SIGCOMM Conference on emerging Networking EXperiments and Technologies (CoNEXT)
    2008 Dec
    1. Keys, Ken
    IP Alias Resolution Techniques: Technical Report Cooperative Association for Internet Data Analysis (CAIDA)
    2008 Oct
    1. Luckie, Matthew
    2. Hyun, Young
    3. Huffaker, Bradley
    Traceroute Probe Method and Forward IP Path Inference ACM Internet Measurement Conference (IMC)
    2008 Oct
    1. Huffaker, Bradley
    2. Fomenkov, Marina
    3. claffy, kc
    Influence Maps - a novel 2-D visualization of massive geographically distributed data sets Internet Protocol Forum
    2008 Oct
    1. Castro, Sebastian
    2. Wessels, Duane
    3. Fomenkov, Marina
    4. claffy, kc
    A Day at the Root of the Internet ACM SIGCOMM Computer Communication Review (CCR)
    2008 Sep
    1. Raghavan, Veena
    2. Riley, George
    3. Jaafar, Talal
    Realistic Topology Modeling for the Internet BGP Infrastructure IEEE Symposium on Modeling, Analysis and Simulation of Computers and Telecommunication Systems (MASCOTS)
    2008 Aug
    1. Krapivsky, Paul
    2. Krioukov, Dmitri
    Scale-free networks as pre-asymptotic regimes of super-linear preferential attachment Physical Review E
    2008 Aug
    1. claffy, kc
    Ten Things Lawyers Should Know About Internet Research Cooperative Association for Internet Data Analysis (CAIDA)
    2008 Jul
    1. Dimitropoulos, Xenofontas
    2. Serrano, Mirian Ángeles
    3. Krioukov, Dmitri
    On Cycles in AS Relationships ACM SIGCOMM Computer Communication Review (CCR)
    2008 Jul
    1. Meinrath, Sascha
    2. claffy, kc
    The COMMONS Initiative: Cooperative Measurement and Modeling of Open Networked Systems CommLaw Conspectus
    2008 May
    1. Krioukov, Dmitri
    2. Papadopoulos, Fragkiskos
    3. Boguñá, Marián
    4. Vahdat, Amin
    Efficient Navigation in Scale-Free Networks Embedded in Hyperbolic Metric Spaces arXiv cond-mat.stat-mech/0805.1266
    2008 Feb
    1. Serrano, Mirian Ángeles
    2. Krioukov, Dmitri
    3. Boguñá, Marián
    Self-similarity of complex networks and hidden metric spaces Physical Review Letters


    CAIDA staff and collaborators actively attend and contribute to relevant workshops, conferences, and other events to present our research and gain a better understanding of Internet infrastructure, trends, topology, routing, and security. In 2008, CAIDA staff presented at the ARIN meeting, DHS/SRI, the University of Chile and Jornadas Chilenas de Computación, the Internet Measurement Conference, the University of Aveiro, the DHS Cybersecurity PI Meeting, the Santa Fe Institute, NANOG, our own WIDE and WIDE-CASFI workshops, and DNS/OARC workshops.

    The following table contains the presentations and invited talks published by CAIDA for the calendar year of 2008. Please refer to Presentations by CAIDA on our web site for a comprehensive listing.

    Year Month Presenter(s) Title Venue
    2008 Dec
    1. Krioukov, Dmitri
    Hyperbolic geometry and scale-free topology of complex networks BCNet Workshop
    2008 Nov
    1. claffy, kc
    Internet as emerging critical infrastructure: what needs to be measured? UChile / Jornadas Chilenas de Computacion (JCCC)
    2008 Oct
    1. claffy, kc
    Internet Science: Why Wall Street and Main Street Should Care (a survey of CAIDA activities) DHS/SRI Infosec Technology Transition Council Meeting
    2008 Oct
    1. claffy, kc
    ARIN & CAIDA IPv6 Survey Summary ARIN
    2008 Oct
    1. Huffaker, Bradley
    Traceroute Probe Method and Forward IP Path Inference ACM Internet Measurement Conference (IMC)
    2008 Sep
    1. Aben, Emile
    CAIDA Passive Measurement Infrastructure MOMENT
    2008 Sep
    1. claffy, kc
    Leveraging the Science and Technology of Internet Mapping for Homeland Security DHS Cybersecurity PI Meeting
    2008 Sep
    1. Aben, Emile
    DITL 2008 and DatCat MOMENT
    2008 Aug
    1. Hyun, Young
    Archipelago Measurement Infrastructure: Status and Experiences WIDE-CASFI
    2008 Aug
    1. Castro, Sebastian
    Measurements of Root Server Traffic in DITL 2008 WIDE-CASFI
    2008 Aug
    1. Vest, Tom
    What Happens When IPv4 Runs Out? WIDE-CASFI
    2008 Aug
    1. Aben, Emile
    DITL 2008 and DatCat WIDE-CASFI
    2008 Aug
    1. Huffaker, Bradley
    2008 Aug
    1. Papadopoulos, Fragkiskos
    Application of Hyperbolic Embedding in Overlay Network Construction Santa Fe Institute (SFI)
    2008 Aug
    1. Moore, David
    Understanding Global Internet Health Santa Fe Institute (SFI)
    2008 Aug
    1. Aben, Emile
    CAIDA Passive Measurement Infrastructure CAIDA-WIDE-CASFI Joint Measurement Workshop
    2008 Jun
    1. Castro, Sebastian
    DITL 2008 Analysis OARC
    2008 Jun
    1. Krioukov, Dmitri
    Routing in the Internet and Navigability of Scale-Free Networks Various
    2008 Jun
    1. Castro, Sebastian
    In the search of heavy hitters OARC
    2008 May
    1. claffy, kc
    CAIDA participation in PREDICT DHS PREDICT PI Meeting
    2008 May
    1. Krioukov, Dmitri
    What we know and what we don't about the Internet University of Aveiro
    2008 Apr
    1. claffy, kc
    ARIN & CAIDA IPv6 Survey Results ARIN
    2008 Feb
    1. Wessels, Duane
    Day In The Life of the Internet 2008 Data Collection Event North American Network Operators' Group (NANOG)
    2008 Jan
    1. Aben, Emile
    Cataloging DITL data for research use WIDE
    2008 Jan
    1. Wessels, Duane
    October 2007 survey of open resolvers in the Internet WIDE
    2008 Jan
    1. Castro, Sebastian
    DNS: comparison of 2006 and 2007 snapshots WIDE
    2008 Jan
    1. Polterock, Joshua
    DITL 2007 Collection Summary WIDE
    2008 Jan
    1. Wessels, Duane
    Lessons from DITL 2007 - and what we should do different in 2008 WIDE
    2008 Jan
    1. Castro, Sebastian
    Comprehensive approach to the analysis of DITL DNS data WIDE
    2008 Jan
    1. Huffaker, Bradley
    IPv6 Collection 2008 a View of the IPv6 Networks WIDE
    2008 Jan
    1. Polterock, Joshua
    Bulk DNS Lookup Service WIDE
    2008 Jan
    1. Wessels, Duane
    DNS nameserver database at OARC WIDE

    Web Site Usage

    In 2008, CAIDA's web site continued to attract considerable attention from a broad, international audience. Visitors seem to have particular interest in CAIDA's tools and analysis.

    The table below presents the monthly history of traffic to our web site for 2008. To give a more accurate picture of website traffic, these statistics do not include traffic from spiders, crawlers, or other robots.

    Month Unique visitors Number of visits Pages Hits Bandwidth
    Jan 2008 40,897 72,619 307,839 1,362,935 120.67 GB
    Feb 2008 44,872 74,938 249,354 1,437,310 58.29 GB
    Mar 2008 42,439 78,108 278,366 1,535,978 54.83 GB
    Apr 2008 46,375 79,438 280,187 1,571,985 53.85 GB
    May 2008 46,354 78,448 272,240 1,440,674 50.09 GB
    Jun 2008 45,458 76,786 248,134 1,431,765 41.89 GB
    Jul 2008 44,671 75,530 235,541 1,427,714 42.61 GB
    Aug 2008 49,759 77,842 279,990 1,686,476 56.82 GB
    Sep 2008 48,804 72,929 272,963 1,607,294 49.06 GB
    Oct 2008 50,830 76,711 279,988 1,596,161 59.53 GB
    Nov 2008 45,743 70,342 233,295 1,367,139 52.13 GB
    Dec 2008 41,957 63,733 225,140 1,301,988 46.05 GB
    Total 548,159 897,424 3,163,037 17,767,419 685.81 GB

    Organizational Chart

    CAIDA would like to acknowledge the many people who put forth great effort towards making CAIDA a success in 2008. The image below shows the functional organization of CAIDA. Please check the CAIDA Staff page for more complete information about CAIDA staff.

    [Image of CAIDA Functional Organization Chart]

    CAIDA Functional Organization Chart

    Funding Sources

    CAIDA thanks our 2008 sponsors, members, and collaborators.

    The charts below depict funds received by CAIDA during the 2008 calendar year.

    Funding Source Allocations Percentage of Total
    DHS 551,850 33.1%
    DOI 483,216 29.0%
    NSF 274,999 16.5%
    GIFT 231,942 13.9%
    CNS 62,410 3.7%
    CSE 61,503 3.7%
    Total 1,665,920 100%

    Figure 1. Allocations by funding source received during 2008
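
    The percentage column in the funding table above can be reproduced from the allocation amounts. The short sketch below copies the figures from the table; the function name `shares` is hypothetical, and rounding to one decimal place is an assumption about how the published percentages were derived.

```python
# Allocation amounts copied from the funding table above.
allocations = {
    "DHS": 551850,
    "DOI": 483216,
    "NSF": 274999,
    "GIFT": 231942,
    "CNS": 62410,
    "CSE": 61503,
}


def shares(amounts):
    """Return each source's share of the total, as a percentage
    rounded to one decimal place."""
    total = sum(amounts.values())
    return {name: round(100 * value / total, 1) for name, value in amounts.items()}


total = sum(allocations.values())
print(total)  # 1665920
for name, percent in shares(allocations).items():
    print(f"{name}: {percent}%")
```

    Because each share is rounded independently, the rounded percentages need not sum to exactly 100.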

    Operating Expenses

    The charts below depict CAIDA's Annual Expense Report for the 2008 calendar year.

    LABOR Salaries and benefits paid to staff and students
    IDC Indirect Costs paid to the University of California, San Diego including grant overhead (52-54%) and telephone, Internet, and other IT services.
    SUBCONTRACTS Subcontracts to the Internet Systems Consortium (ISC), Georgia Institute of Technology, and The Measurement Factory
    TRAVEL Trips to conferences, PI meetings, operational meetings, and sites of remote monitor deployment.
    SUPPLIES & EXPENSES All office supplies and equipment (including computer hardware and software) costing less than $5000.
    EQUIPMENT Computer hardware or other equipment costing more than $5000.
    TRANSFERS Exchange of funds between groups for recharge for IT desktop support and Oracle database services.
    Expense Category Expenses Percentage of Total
    Labor 1,509,134 56.5%
    IDC 819,285 30.7%
    Subcontract 141,070 5.3%
    Travel 109,833 4.1%
    Supplies & Expenses 67,663 2.5%
    Equipment 20,758 0.8%
    Transfers 5,170 0.2%
    Total 2,672,913 100.0%

    Figure 2. 2008 Operating Expenses

    These numbers do not include salaries or expenses paid by the Computer Science & Engineering Department of the Jacobs School of Engineering at the University of California, San Diego.

    Program Area Expenses Percentage of Total
    DNS 899,198 33.7%
    Topology 551,291 20.6%
    Infrastructure 938,373 35.1%
    Routing 209,306 7.8%
    Policy 51,483 1.9%
    Outreach 20,830 0.8%
    Total 2,670,481 100.0%

    Figure 3. 2008 Expenses by Program Area
