CAIDA's Annual Report for 2009
Mission Statement: CAIDA investigates practical and theoretical aspects of the Internet, focusing on activities that:
- provide insight into the macroscopic function of Internet infrastructure, behavior, usage, and evolution,
- foster a collaborative environment in which data can be acquired, analyzed, and (as appropriate) shared,
- improve the integrity of the field of Internet science,
- inform science, technology, and communications public policies.
Executive Summary
This annual report covers CAIDA's activities in 2009, summarizing highlights from our research, infrastructure, data-sharing and outreach activities. Our current research projects span topology, routing, traffic, economics, and policy. Our infrastructure activities support several measurement-based studies of the Internet's core infrastructure, with focus on the health and integrity of the global Internet's topology, routing, addressing, and naming systems.
We made significant advances (again...) in Internet topology research, supported by the expanding Ark measurement infrastructure and growing interest in understanding more about the Internet's robustness, security, and scalability. We continue to share the largest Internet topology data sets (IPv4 and IPv6) available to academic researchers, and we share many aggregated annotated derivative data sets publicly, including rankings of ISPs annotated with (our estimated) business relationships between autonomous networks. Our topology measurement platform supports IPv6, and ten of our hosting sites provide IPv6 connectivity. We have developed substantial additional software to better support distributed measurement experiments. Specific to our IPv4 topology mapping project, we have taken on the task of optimizing and improving on existing techniques for IP address alias resolution for large Internet graphs, and are planning to package up and release an implementation of our algorithms next year. In 2009 we expanded the capability of other researchers to use the Ark infrastructure for independent experiments, including an extensive Internet-wide test of network filtering hygiene.
On the theoretical side of topology research, we finally published our topology modeling framework that treats annotations as an extended correlation profile of a network, which supports rescaling topologies while retaining the same (measured) annotation profile. We also advanced our exploration of geometric structure underlying Internet-like topologies as observed in our and other measurements. Specifically, hyperbolic geometry captures an important property of complex networks: exponential expansion in space. We explored even deeper connections between network topological structure (e.g., degree distribution, clustering) and physical phenomena such as curvature and temperature.
These discoveries about topology drive our routing research agenda, a long-term objective of which is to enable dramatically more scalable global Internet routing. We explored the ramifications of the discoveries we made last year regarding efficient routing on graph topologies statistically similar to those of the Internet. Based on the evidence, e.g., clustering, observable on the Internet and other complex networks, we found that underlying hyperbolic hidden metric spaces provide a natural explanation for why so many of these complex networks found in nature can achieve such phenomenally efficient (greedy) routing without distributing global topology knowledge. Since the distribution of global knowledge about network structure is perhaps the most critically limiting requirement of the current Internet interdomain routing system, we are still investigating theoretical details of a potentially radical solution to Internet routing scalability, which takes advantage of what nature knows that we do not (yet).
We undertook several traffic analysis activities, including creating a structured taxonomy of Internet traffic classification papers and their data sets, and analyzing the "Day in the Life of the Internet" 2009 data set, consisting of 24 hours of detailed DNS packet data collected at many participating root servers as well as other high-profile DNS servers. We have reduced our traffic analysis activities in favor of pursuing progress in the policy space through participation in DHS's PREDICT project (Protected Repository of Data for Internet Cyber Threats). As part of this project, we have proposed a more flexible privacy-sensitive data-sharing framework and an experiment to test it on the UCSD network telescope instrumentation next year.
We are growing the scope of our economics and policy research. We responded to several requests from Internet governance as well as U.S. government agencies for comments and guidance on policy matters. We launched a workshop series in Internet economics, to try to begin framing a research agenda for the emerging but stunted field of Internet infrastructure economics. On the theoretical side, we published an analytically tractable model of Internet evolution at the level of Autonomous Systems (ASes), which builds on the preferential attachment (PA) model but captures fundamental differences between transit and non-transit networks. This multi-class PA model predicts a definitive set of statistics characterizing the AS topology structure, closing the "measure-model-validate-predict" loop, and providing further evidence that preferential attachment is the main driving force behind Internet evolution.
Finally, we engaged in a variety of tool development, data-sharing, and outreach activities, including web sites, peer-reviewed papers, technical reports, presentations, blogging, animations, and workshops. Details of our activities are below. CAIDA's program plan for 2010-2013 is available at https://www.caida.org/about/progplan/progplan2010/. Please do not hesitate to send comments or questions to info at caida dot org.
Research Projects
Topology
Macroscopic Topology Measurements, Analysis, and Modeling
Goals
CAIDA's topology research agenda includes three strategic areas: 1) macroscopic topology measurement; 2) analysis of the observable AS-level and router-level hierarchy; 3) topology modeling in support of routing research.
Activities
- Macroscopic Topology Measurements:
- We continued large-scale macroscopic topology measurements using Archipelago (Ark), our state-of-the-art global measurement platform. We completed the second full calendar year of the IPv4 Routed /24 Topology Dataset. By the end of 2009, we increased the number of vantage points to 40 Ark monitors deployed in 22 countries.
- We added more monitors with native IPv6 connectivity to the Ark infrastructure. As of the end of 2009, Ark had 10 monitors collecting the IPv6 Topology Dataset for researchers to get a view of the emerging IPv6 global topology.
- We continued to collect automated DNS reverse lookups for IP addresses discovered by the Ark probes and annotated the IPv4 topology data with corresponding DNS names.
- Analysis of the Observable Topology:
- We improved our measurement techniques and analysis methodologies for alias resolution inferences. We use the Ark platform and run the following three tools: kapar, iffinder and MIDAR. We then combine the outcomes in order to map IPs to routers as accurately and completely as feasible. Using publicly available data from many networks and ground-truth data provided to us by a large ISP, we tested the efficiency and veracity of various combinations of alias resolution methods. Our preliminary results were submitted to ACM Computer Communications Review (CCR), and appeared ("Internet-Scale IP Alias Resolution Techniques") in the January 2010 issue.
- We continued to produce the AS-level topologies annotated with business relationships between ASes dataset on a bi-weekly basis. We use our published algorithms to infer these relationships, recognizing their directional nature, and annotate each link in an AS topology as a customer-provider or a peer-to-peer (settlement-free interconnection) relationship.
- We created a new version of our popular AS Core Graph visualizations for both IPv4 and IPv6 address space using January 2009 data collected by Ark monitors.
- We introduced a network topology modeling framework that treats annotations as an extended correlation profile of a network. The framework includes an algorithm to rescale and construct networks of varying size that still reproduce the original measured annotation profile. These results are published in a paper "Graph Annotations in Modeling Complex Network Topologies" in ACM Transactions on Modeling and Computer Simulation (TOMACS).
- We developed an analytically tractable model of Internet evolution at the level of Autonomous Systems (ASes) -- the multi-class preferential attachment (MPA) model. All of the model parameters are measurable from available Internet topology data. Given the estimated values of these parameters, our analytic results predict a definitive set of statistics characterizing the AS topology structure that is not part of the model formulation. The MPA model thus closes the "measure-model-validate-predict" loop, and provides further evidence that preferential attachment is the main driving force behind Internet evolution. The results were published in "Evolution of the Internet AS-Level Ecosystem", presented at the First International Conference on Complex Sciences: Theory and Applications (Complex'2009).
- We established a connection between observed scale-free topologies and hidden hyperbolic geometries of complex networks. Space expands exponentially in hyperbolic geometry, and scale-free topologies emerge as a consequence of this exponential expansion. Fermi-Dirac statistics connects observed topology to hidden geometry: observed edges are fermions, hidden distances are their energies; the curvature of the hidden space affects the heterogeneity of the degree distribution, while clustering is a function of temperature. Understanding the connection between topology and geometry of complex networks contributes to studying the efficiency of their functions, and may find practical applications in many disciplines, ranging from Internet routing to brain, cell signaling, or protein folding research. We published the paper "Curvature and Temperature of Complex Networks" in Physical Review E.
- We showed that the global structure of some real networks is statistically determined by the distributions of local motifs (small building blocks of complex networks) of size at most 3, once we augment motifs to include node degree information. We applied our analysis to various complex networks, such as: a social web of trust, protein interactions, scientific collaborations, air transportation, the Internet, and a power grid. In all cases except the power grid, random networks that maintain the degree-enriched connectivity profiles for node triples in the original network reproduce all its local and global properties. Therefore, network topology generators are guaranteed to reproduce essential local and global network properties as soon as they reproduce 3-node connectivity statistics. Our results are published on our web site ("How Small Are Building Blocks of Complex Networks") and on arXiv.
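The preferential attachment mechanism that the MPA model builds on can be illustrated with a minimal sketch. This is the plain single-class growth process, not the multi-class model from the paper; the function names, the parameter `m`, and the seed graph are assumptions made purely for illustration.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, m=2, seed=0):
    """Grow a graph in which each new node links to m distinct existing
    nodes, chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Seed graph (an assumption of this sketch): a clique on m + 1 nodes,
    # so every initial node starts with degree m.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


def degree_counts(edges):
    """Return a Counter mapping node -> degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


if __name__ == "__main__":
    edges = preferential_attachment(2000, m=2)
    deg = degree_counts(edges)
    print("edges:", len(edges), "max degree:", max(deg.values()))
```

A multi-class variant would, roughly speaking, let the attachment rule depend on the class of the node (for example transit versus non-transit); the sketch deliberately omits that.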
Major Milestones
- Research results
- The paper Graph Annotations in Modeling Complex Network Topologies was published in ACM Transactions on Modeling and Computer Simulation (TOMACS), v.19, n.4, 17, 2009.
- The paper Curvature and Temperature of Complex Networks was published in Physical Review E, v.80, 035101(R), 2009.
- The paper How Small Are Building Blocks of Complex Networks was published in arXiv physics.soc-ph/0908.1143v1.
- We increased the total number of deployed Ark monitors to 40+ sites.
- We created new IPv4 and IPv6 AS Core Graph visualizations using January 2009 Ark data.
- Outreach
- D. Krioukov gave a talk at the First International Conference on Complex Sciences: Theory and Applications (Complex'2009).
- Y. Hyun presented a talk IP-to-Router Mapping Techniques at the ISMA workshop on Active Internet Measurements (AIMS).
- Ongoing data releases
- We made publicly available the following topology datasets:
- The IPv4 Routed /24 Topology Dataset from Ark measurements;
- The adjacency matrix of the observed Internet AS-level graph computed daily from Ark measurements;
- AS relationship repository where we archive, on a weekly basis, a comprehensive Internet AS-level topology enriched with AS relationship information for every pair of AS neighbors;
- Bi-weekly updates of AS-ranking data;
- The Routeviews Prefix to AS mappings Dataset (pfx2as), created on a daily basis starting from 2005-05-09;
- Daily files of the DNS reverse name lookups for the IPv4 core traceroute data.
- December 2009 saw the first complete year of the IPv6 Topology Dataset collection.
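The AS relationship annotations described above label each AS link as either customer-provider (directional) or peer-to-peer. A minimal sketch of how such annotated links might be indexed and queried follows. The tuple encoding, the labels `"p2c"` and `"p2p"`, and the function names are hypothetical choices for this example and are not necessarily the format of the released files; the AS numbers come from the range reserved for documentation.

```python
from collections import defaultdict

# Hypothetical encoding: (as1, as2, rel). rel == "p2c" means as1 is the
# provider of as2; rel == "p2p" means a settlement-free peering link.
LINKS = [
    (64500, 64501, "p2c"),
    (64500, 64502, "p2c"),
    (64501, 64502, "p2p"),
    (64502, 64503, "p2c"),
]


def build_neighbors(links):
    """Index annotated links by AS, preserving relationship direction."""
    nbrs = defaultdict(
        lambda: {"providers": set(), "customers": set(), "peers": set()}
    )
    for as1, as2, rel in links:
        if rel == "p2c":
            nbrs[as1]["customers"].add(as2)
            nbrs[as2]["providers"].add(as1)
        elif rel == "p2p":
            nbrs[as1]["peers"].add(as2)
            nbrs[as2]["peers"].add(as1)
        else:
            raise ValueError(f"unknown relationship: {rel!r}")
    return nbrs


def relationship(nbrs, a, b):
    """Describe link a-b from a's point of view, or None if not adjacent."""
    if b in nbrs[a]["customers"]:
        return "provider-to-customer"
    if b in nbrs[a]["providers"]:
        return "customer-to-provider"
    if b in nbrs[a]["peers"]:
        return "peer-to-peer"
    return None


if __name__ == "__main__":
    nbrs = build_neighbors(LINKS)
    print(relationship(nbrs, 64501, 64500))
    print(relationship(nbrs, 64501, 64502))
```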
Funding Sources
Our topology research received support from:
- DHS Science and Technology Directorate contract (N66001-08-C-2029) "Cybersecurity: Leveraging the Science and Technology of Internet Mapping for Homeland Security"
- NSF grant (CNS-0722070) "NeTS-FIND: Greedy Routing on Hidden Metric Spaces as a Foundation of Scalable Routing Architectures without Topology Updates"
- NSF grant (NSF-0551542) "CRI: Toward Community-Oriented Network Measurement Infrastructure"
- a University Research Program gift from Cisco Systems, Inc.
Routing
Toward Mathematically Rigorous Next-Generation Routing Protocols for Realistic Network Topologies
Goals
The primary objective of CAIDA's research in Internet routing is to develop and evaluate solutions to the impending routing scalability problems. Our relevant activities focused on two related sub-topics: greedy routing based on hidden metric spaces underlying real networks; and the relationship between routing efficiency and the structure of the network topology. While motivated by Internet routing, our work in this area has profound implications for network science in other disciplines (physics, biology, chemistry, social sciences).
Activities
- We studied the process of routing information through networks as a universal phenomenon existing in both natural and man-made complex systems. In many complex networks found in nature, nodes communicate efficiently even without full knowledge of global network connectivity. We demonstrated that the peculiar structural characteristics of observable complex networks are consistent with maximizing communication efficiency when using greedy routing approaches without global knowledge. We also described a general mechanism that explains this connection between network structure and function, in "Navigability of complex networks", published in Nature Physics and given significant press coverage.
- Random scale-free networks are ultrasmall worlds since the average length of the shortest paths in networks of size N scales as ln ln N. We showed that these ultrasmall worlds can be navigated in ultrashort time. Greedy routing on scale-free networks embedded in metric spaces uses only local information yet finds asymptotically the shortest paths, direct computation of which requires global topology knowledge. Our findings imply that the peculiar structure of complex networks ensures that the lack of global topological awareness has asymptotically no impact on the length of communication paths. These results have important consequences for communication systems such as the Internet, where maintaining knowledge of current topology is a major scalability bottleneck. We published "Navigating Ultrasmall Worlds in Ultrashort Time" in Physical Review Letters. This paper received favorable press coverage in Nature, NewScientist, and PhysOrg.
- We showed that complex (scale-free) network topologies naturally emerge from hyperbolic metric spaces. The negatively curved hyperbolic spaces also ensure extremely efficient greedy forwarding on these topologies, achieving almost 100% reachability and optimal (i.e., shortest) path lengths, even under dynamic network conditions. Our findings suggest that forwarding information through complex networks like the Internet may be possible without the current overhead of routing protocols, and may also find practical applications in overlay networks for tasks such as application-level routing, information sharing, and data distribution. These results are published in "Greedy Forwarding in Scale-Free Networks Embedded in Hyperbolic Metric Spaces" in ACM SIGMETRICS Performance Evaluation Review.
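The greedy forwarding idea described in the items above can be sketched in a few lines: each node knows only its own neighbors and their coordinates, and forwards to the neighbor closest to the destination. The sketch below assumes nodes carry polar coordinates (r, theta) in a hyperbolic plane of curvature -1 and uses the hyperbolic law of cosines for distance. The toy topology, the coordinates, and the function names are illustrative assumptions, not the setup used in the papers.

```python
import math


def hyperbolic_distance(p, q):
    """Distance between polar points (r, theta) in the hyperbolic plane
    of curvature -1, via the hyperbolic law of cosines."""
    r1, t1 = p
    r2, t2 = q
    # Angular separation folded into [0, pi].
    dtheta = math.pi - abs(math.pi - (abs(t1 - t2) % (2 * math.pi)))
    arg = (math.cosh(r1) * math.cosh(r2)
           - math.sinh(r1) * math.sinh(r2) * math.cos(dtheta))
    # Clamp to guard against rounding slightly below 1.
    return math.acosh(max(1.0, arg))


def greedy_route(adj, coords, src, dst):
    """Forward hop by hop to the neighbor nearest the destination.
    Returns (path, delivered); stops if no neighbor is strictly closer."""
    path = [src]
    current = src
    while current != dst:
        if not adj[current]:
            return path, False
        here = hyperbolic_distance(coords[current], coords[dst])
        best = min(adj[current],
                   key=lambda v: hyperbolic_distance(coords[v], coords[dst]))
        if hyperbolic_distance(coords[best], coords[dst]) >= here:
            return path, False  # stuck at a local minimum
        current = best
        path.append(current)
    return path, True


# Toy example: a hub near the origin and three peripheral nodes.
COORDS = {
    "H": (0.5, 0.0),
    "A": (3.0, 0.0),
    "B": (3.0, math.pi),
    "C": (3.0, math.pi / 2),
}
ADJ = {
    "H": ["A", "B", "C"],
    "A": ["H", "C"],
    "B": ["H"],
    "C": ["H", "A"],
}

if __name__ == "__main__":
    print(greedy_route(ADJ, COORDS, "A", "B"))
```

Because every accepted hop strictly reduces the distance to the destination, the loop always terminates; a packet either arrives or halts at a node with no closer neighbor, which is the failure case that the reachability figures measure.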
Major Milestones
- Research Results
- The paper Navigability of complex networks was published in Nature Physics, vol. 5, no. 1, 74-80, 2009.
- The paper Navigating Ultrasmall Worlds in Ultrashort Time was published in Physical Review Letters, v.102, 058701, 2009.
- The paper Greedy Forwarding in Scale-Free Networks Embedded in Hyperbolic Metric Spaces was published in ACM SIGMETRICS Performance Evaluation Review, v.37, n.2, 15-17, 2009.
- Outreach
- F. Papadopoulos gave a talk at the 11th ACM SIGMETRICS Workshop on Mathematical Performance Modeling and Analysis (MAMA)
- D. Krioukov presented slides at the Spring 2009 NSF NeTS FIND Initiative meeting.
Student Involvement
Undergraduate student Connie Liu developed informative visualizations representing hyperbolic spaces and other routing research results.
Funding Sources
Our routing research received support from:
- NSF grant (CNS-0722070) "NeTS-FIND: Greedy Routing on Hidden Metric Spaces as a Foundation of Scalable Routing Architectures without Topology Updates"
- DHS Science and Technology Directorate contract (N66001-08-C-2029) "Cybersecurity: Leveraging the Science and Technology of Internet Mapping for Homeland Security"
- a University Research Program gift from Cisco Systems, Inc.
Traffic Analysis
Internet traffic measurement, classification, and analysis
Goals
CAIDA has a long history of passive trace acquisition and curation aimed at traffic monitoring, classification, and workload characterization. In 2009 we continued to host visiting researchers who, in collaboration with CAIDA researchers, analyzed properties of available traces.
Activities
- With the help of visiting scholar Mia Zhang, we created a structured taxonomy of Internet traffic classification papers and their data sets.
- Visiting scholars Maurizio Dusi and Wolfgang John developed a flow-based symmetry estimation tool to evaluate routing asymmetry in Internet traffic.
- Maurizio Dusi developed and tested his new tool, gt, which gathers and indexes ground truth information about passively collected network traffic. A paper describing the tool, "GT: picking up the truth from the ground for Internet traffic" was published in ACM SIGCOMM Computer Communication Review (CCR).
- Working with traffic traces from backbone links in the US and in Sweden collected over the period 2002-2009, visiting scholars Wolfgang John and Mia Zhang analyzed UDP traffic in the Internet. They found that most UDP flows use random high ports and carry few packets with little content, consistent with UDP's role in signaling protocols for increasingly popular P2P applications.
Major Milestones
- Research Results
- Published the paper, "GT: picking up the truth from the ground for Internet traffic" in ACM SIGCOMM Computer Communication Review (CCR) online, October 2009.
Student Participants
CAIDA hosted the following visiting graduate student scholars:
- Maurizio Dusi from the Universita di Brescia, Italy
- Mia Zhang from Beijing Jiaotong University, China
- Wolfgang John from Chalmers University of Technology, Sweden
Funding Sources
Our traffic research received indirect support from the following institutions through their generous sponsorship of capable graduate students to visit our lab to collaborate on research.
- Universita di Brescia, Italy
- Beijing Jiaotong University, China
- Chalmers University of Technology, Sweden
DNS
Improving the Integrity of Domain Name System (DNS) Monitoring and Protection
Goals
CAIDA researchers conduct DNS measurements and develop tools, models, and analysis methodologies for use by DNS operators and researchers.
Activities
NSF funding supporting CAIDA DNS research ended in August 2009. However, we continued collection and analysis of data from the DNS root nameservers continuing the series of annual Day-in-the-life-of-the-Internet (DITL) experiments.
- In collaboration with ISC and OARC, we held the fourth large-scale data collection event on March 30 - April 1, 2009 (DITL 2009). We captured tcpdump traces at nearly all anycast instances of the A, C, E, F, H, K, L, and M root servers as well as numerous AS112, gTLD and ccTLD domain servers. The 2009 collection spans three full days of continuous capture. This unique dataset again represents the most comprehensive measurements of the root servers to date, and provides researchers with unprecedented insight into root server workload characteristics and performance. OARC published a summary of the collection event. These data are available to the research community via the DNS-OARC. Academic researchers can participate in the DNS-OARC for free.
- We also capture tcpdump traces of these DNS queries for other potential annotations and for analysis of EDNS0, DNSSEC, and other emerging protocols.
Major Milestones
- DNS Research Update
- Participation in DITL 2009 large-scale simultaneous DNS root data collection event.
- kc claffy prepared and distributed CAIDA's DNS Research Updates to the attendees of the DNS RSSAC meeting in March 2009.
Student Involvement
CAIDA hosted the following visiting graduate student scholars:
- Mia Zhang, a PhD candidate from Beijing Jiaotong University, China; and
- Wolfgang John, a PhD candidate from Chalmers University of Technology, Sweden.
Funding Sources
Our DNS research received support from NSF grant (SCI-0427144) DNS-ITR: "Improving the Integrity of Domain Name System (DNS) Monitoring and Protection", which ended in August 2009.
Our DNS research also received indirect support from the home institutions of the visiting graduate students listed above, through their generous sponsorship of these students' visits to our lab to collaborate on research.
Data Sharing for Security
Goals
CAIDA recognizes the UCSD Network Telescope, a passive data collection system focused on a globally routed /8 network that carries almost no legitimate traffic, as a unique resource whose data may provide insights for network security researchers. Because we can easily separate the legitimate traffic from the incoming packets, the network telescope provides us with a monitoring point for anomalous traffic that represents almost 1/256th of all IPv4 destination addresses on the Internet.
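The 1/256 figure follows directly from prefix arithmetic: a /8 fixes 8 of the 32 address bits, leaving 2^24 of the 2^32 IPv4 addresses. A small check with Python's standard `ipaddress` module is shown below; the prefix `10.0.0.0/8` is only a placeholder for the arithmetic, since the text does not name the telescope's prefix.

```python
import ipaddress

# Placeholder /8 used only for the arithmetic (an assumption of this sketch).
telescope = ipaddress.ip_network("10.0.0.0/8")
ipv4_space = ipaddress.ip_network("0.0.0.0/0")

# Share of the whole IPv4 address space covered by one /8.
fraction = telescope.num_addresses / ipv4_space.num_addresses

print(telescope.num_addresses)  # number of addresses in a /8: 2**24
print(fraction == 1 / 256)      # a full /8 is exactly 1/256 of IPv4
```

A full /8 is exactly 1/256 of the address space, so "almost 1/256th" is consistent with a telescope that observes slightly less than the entire block.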
Because a network telescope (also known as a blackhole, an Internet sink, or a darknet) does not contain any real computers, the monitor does not capture legitimate traffic, but rather communications that result from a wide range of events, including misconfiguration (e.g., a human being mistyping an IP address), malicious scanning of address space by hackers looking for vulnerable targets, backscatter from random source denial-of-service attacks, and the automated spread of malicious software (worms).
To deliver such data to the research community requires technology to accomplish the data capture and further requires policy infrastructure to protect the rights and avoid risk to stakeholders. CAIDA spent much effort in 2009 on building the policy infrastructure and data sharing framework required to enable the sharing of the data we capture with the network security researcher community.
Activities
- UCSD Network Telescope
- In line with our mission to foster a collaborative environment for data acquisition and sharing, we made the Two days in November 2008 dataset from our network telescope available to researchers.
- Reports in late November 2008 of a worm outbreak drew our attention to our telescope to look for evidence. We published a web report, "Conficker/Conflicker/Downadup worm as seen from the UCSD Network Telescope" that includes background information on the worm, a description of the scanning behavior we observed, heuristics for determining which packets were likely associated with the Conficker worm, some animations that show the growth of TCP/445 scanning globally, and some correlations we observed with other data sources.
- We developed and refined measurement and analysis tools for one-way unsolicited traffic monitoring.
- Data Sharing Policy Development
- We refined our strategy for using technical tools (such as anonymization of data) in conjunction with policy tools (such as limits on distribution and use of data) to enable us to share our data with the network security research community, described in "Dialing privacy and utility: a proposed data-sharing framework to advance Internet research", submitted to IEEE Security and Privacy for the July 2010 issue.
- With support from the DHS contract (NBCHC070133) "Supporting Research and Development of Security Technologies through Network and Security Data Collection", we began drafting a document in the spirit of the Belmont Report that would address ethical principles and guidelines for the protection of human subjects in Information and Communications Technologies (ICT) research. In May 2009, PI kc claffy presented "The Belmont Report Overview" at an SRI workshop in Arlington, VA. (As we move into 2010, we refer to this document as the Menlo Report because most of the meetings have been held in Menlo Park, CA.)
- In May, Erin Kenneally presented "Legal Issues in Network Research: Determining Content in Web-Browsing Communications" to the Department of Homeland Security Directorate for Science and Technology Workshop on Ethical Issues in Network Research.
- In October, Erin Kenneally presented the paper "An Internet Data Sharing Framework For Balancing Privacy and Utility" at Engaging Data: First International Forum on the Application and Management of Personal Electronic Information, held at the Massachusetts Institute of Technology, Cambridge, MA.
- In October, we submitted a paper on "A Framework for Understanding and Applying Ethical Principles in Network and Security Research" to the Workshop on Ethics in Computer Security Research (WECSR 2010), to take place in January 2010.
Major Milestones
- UCSD Network Telescope
- We released the Two days in November 2008 dataset sourced from our network telescope.
- Data Sharing Policy Development
- We published the paper "An Internet Data Sharing Framework For Balancing Privacy and Utility", presented at Engaging Data: First International Forum on the Application and Management of Personal Electronic Information, held at the Massachusetts Institute of Technology, Cambridge, MA.
Funding Sources
Our research in data sharing for security received support from DHS contract (NBCHC070133) "Supporting Research and Development of Security Technologies through Network and Security Data Collection".
Economics, Ownership, and Trust
Goals
Our dependence on the Internet for our professional, personal, and political lives has rapidly grown much stronger than our comprehension of its underlying structure, performance limits, dynamics, and evolution. In light of recent milestones in regulatory policy, our understanding of the underlying economic forces and dynamics of the Internet is of increasing relevance.
To follow up on our several years of work studying IPv4 exhaustion and IPv6 deployment (or lack thereof) in response to RIR needs, in 2009 we offered a draft recommendation for an IPv4 exhaustion research agenda (none of which, so far as we know, has been pursued). We also responded to requests from government agencies and policymaking bodies (including the FCC, DHS, and FTC) for comments and positions to inform policy with the best available empirical data. As society recognizes the need for an equitable way to pay for this new communications infrastructure, policymakers will need metrics to more effectively describe, and policies for more transparently reporting on, infrastructure penetration, performance, peering, and prices for bit transmission services.
Activities
- In March, PI kc claffy presented "Broadband Conditions" at the NTIA Broadband Technology Opportunities Program meeting. The video is also available at the meeting URL under March 23, 2009 "Show Session 1: Roundtable on Nondiscrimination and Interconnection Obligations", and a text transcript of that session is available on the NTIA website. kc claffy followed up with a posting of the "Top ten ($7.2B) broadband stimulus: ideal conditions".
- kc claffy presented "Ten Things the FCC Should Know about the Internet" to the Federal Communications Commission in Washington D.C. on May 29, 2009.
- Early in the year, we put forth a proposal for an ICANN/RIR scenario planning exercise to conduct a more structured conversation according to the established discipline of scenario planning. While this never happened, later in the year, on September 23, 2009, CAIDA, in collaboration with Georgia Tech, hosted the 1st Workshop on Internet Economics via web videoconference. The event made use of the electronic conference hosting facilities supported by the California Institute of Technology (Caltech) EVO Collaboration Network. The goal of this workshop was to bring together researchers, commercial Internet facilities and service providers, technologists, theorists, policy makers, RIR stakeholders, and pundits of Internet economics to try to frame a concrete and useful research agenda for the emerging but stunted field of Internet infrastructure economics. We published the final report in ACM SIGCOMM Computer Communication Review (CCR), April 2010, vol. 40, no. 2, pp. 55-59.
- In October, Dmitri Krioukov presented "Evolution of the Internet Ecosystem" at the Southern California Symposium on Network Economics and Game Theory (SoCal NEGT). The paper was also presented at The First International Conference on Complex Sciences: Theory and Applications (Complex'2009), and published in the European Physical Journal B, vol. 74, no. 2, March 2010, pp. 271-278.
Major Milestones
- We presented the paper "Evolution of the Internet Ecosystem" at the First International Conference on Complex Sciences: Theory and Applications (Complex'2009), and published it in the European Physical Journal B, vol. 74, no. 2, March 2010, pp. 271-278.
- We published the final report for the first Workshop on Internet Economics (WIE09) in ACM SIGCOMM Computer Communication Review (CCR), April 2010 (see above).
- kc claffy presented the slideset "Historical and Architectural Context for Traffic Management Needs Today" at the FCC's Technical Advisory Process Workshop on Broadband Network Management.
Funding Sources
Our economics research received support from:
- NSF grant (CNS-0722070) "NeTS-FIND: Greedy Routing on Hidden Metric Spaces as a Foundation of Scalable Routing Architectures without Topology Updates"
- NSF grant (NeTS-NR 04-540) "Toward Mathematically Rigorous Next-Generation Routing Protocols for Realistic Network Topologies"
- DHS Science and Technology Directorate contract (N66001-08-C-2029) "Cybersecurity: Leveraging the Science and Technology of Internet Mapping for Homeland Security"
- a University Research Program gift from Cisco Systems, Inc.
Infrastructure Projects
Archipelago (Ark): A Coordination-Oriented Measurement Infrastructure
Goals
On September 12, 2007, our next generation active measurement infrastructure, Archipelago (Ark) began collecting its first production data as part of the IPv4 Routed /24 Topology Dataset, using the scamper probing tool. Ark provides the hardware and software infrastructure for the Macroscopic Topology Project and replaces the previous skitter-based infrastructure. Ark achieves greater scalability and flexibility than the previous measurement infrastructure and provides steps toward a community-oriented network measurement infrastructure intended to support vetted measurement tasks on a dedicated distributed platform.
Ark's unique design considers coordination the fundamental activity of a measurement infrastructure. Coordination allows the many pieces of the infrastructure to work together efficiently toward a common goal and is necessary to enable collaborative use of the infrastructure by multiple researchers. Archipelago uses Marinda, a coordination facility inspired by David Gelernter's tuple-space-based Linda coordination language. Archipelago extends Gelernter's tuple space model with features needed to support a globally distributed measurement infrastructure that hosts heterogeneous measurements by a community of researchers.
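To make the tuple-space idea concrete, the following is a minimal in-memory sketch of Linda-style coordination in Python. It is not Marinda and does not reflect Marinda's actual API: the class `TupleSpace`, the wildcard `ANY`, the method names `out`, `rd`, and `in_`, and the example tuples are all assumptions chosen for illustration.

```python
import threading

ANY = object()  # hypothetical wildcard: matches any value in a template field


class TupleSpace:
    """Toy single-process tuple space with Linda-style out/rd/in operations."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):
        """Write a tuple into the space and wake any waiting readers."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    @staticmethod
    def _matches(template, tup):
        # A template matches a tuple of equal length whose fields are equal
        # wherever the template does not hold the ANY wildcard.
        return len(template) == len(tup) and all(
            t is ANY or t == v for t, v in zip(template, tup)
        )

    def _find(self, template):
        for tup in self._tuples:
            if self._matches(template, tup):
                return tup
        return None

    def _take(self, template, remove, timeout):
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._find(template) is not None, timeout
            )
            if not found:
                return None  # timed out without a match
            tup = self._find(template)
            if remove:
                self._tuples.remove(tup)
            return tup

    def rd(self, template, timeout=None):
        """Return a matching tuple without removing it (blocks until one exists)."""
        return self._take(template, remove=False, timeout=timeout)

    def in_(self, template, timeout=None):
        """Remove and return a matching tuple (blocks until one exists)."""
        return self._take(template, remove=True, timeout=timeout)


# Usage: a coordinator posts probing tasks; a monitor claims the one addressed to it.
space = TupleSpace()
space.out(("probe", "192.0.2.0/24", "monitor-a"))
space.out(("probe", "198.51.100.0/24", "monitor-b"))
task = space.in_(("probe", ANY, "monitor-b"))
```

Producers and consumers in this model never address each other directly; they only agree on the shape of the tuples, which is what lets independent measurement processes be added or removed without changing the others.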
Activities
- The Archipelago (Ark) Project expanded its infrastructure in 2009, from 30 monitors in 21 countries at the end of 2008 to 41 monitors in 25 countries at the end of 2009. We also implemented IPv6 measurements on 10 Ark boxes and prototyped a systemwide topology-measurement-on-demand service.
- We improved our infrastructure for meta-data annotations of Autonomous Systems and IP addresses, augmenting it with DNS data.
- Building on our study of existing state-of-the-art IP address alias resolution technology, we researched, developed, and evaluated probing and inference algorithms that map separate IP addresses to the same physical device (router). We are planning to publish the results of this work in 2010.
- Our team-probing application uses scamper as its primary active topology measurement tool. Developed by Matthew Luckie, scamper supports IPv4 and IPv6; TCP-, UDP-, and ICMP-based traceroute; ping; path MTU discovery; fine-grained multiplexing of destination lists; programmatic control via a socket; and the warts file format, which carries more information than arts++ files, including cycle start and end markers and measurement metadata (e.g., probing parameters). We contributed patches to scamper and wrote several software tools, including ScamperDataFeed and ScamperIO, that make it easier to write measurement tools and servers. We also implemented a derivative tool based on scamper to enable lighter-weight measurements that can still benefit from some of scamper's functionality.
- We implemented persistence in the Marinda tuple space, allowing us to transparently checkpoint and restart the global server without disrupting ongoing experiments. We wrote extensive Marinda installation and programming guides and shared the software with collaborators for evaluation and feedback before we release it more broadly.
- In collaboration with Rob Beverly of the Naval Postgraduate School, we developed software support to enhance the spoofer project, which used Ark to globally expand its measurement of source address validation and filtering. Using Ark's distributed infrastructure and approximately 12,000 active measurement clients, our measurements revealed little improvement over four years of measurement. 80% of the source address filters we observed were implemented a single IP hop from sources, with over 95% of blocked packets observably filtered within the source's autonomous system. Our results were published and presented at IMC 2009 in "Understanding the Efficacy of Deployed Internet Source Address Validation Filtering".
Major Milestones
- kc claffy presented "Leveraging the Science and Technology of Internet Mapping for Homeland Security" at the DHS Cybersecurity PI Meeting at SRI Menlo Park on Sep 10, 2009.
- kc claffy presented "Internet Mapping: From Art to Science", published in the proceedings of the IEEE DHS Cybersecurity Applications and Technologies Conference for Homeland Security (CATCH) in March 2009, pp. 205-211.
- kc claffy presented "Archipelago Measurement Infrastructure" at the Africa-Asia Forum Workshop.
Tools
CAIDA's mission includes providing access to tools for Internet data collection, analysis and visualization to facilitate network measurement and management. However, CAIDA does not receive specific funding for support and maintenance of the tools we develop. Please check our home page for a complete listing and taxonomy of CAIDA tools.
2009 Tool Development
CoralReef
The CoralReef software suite, developed by CAIDA, provides a comprehensive software solution for data collection and analysis from passive Internet traffic monitors, in real time or from trace files. Real-time monitoring support includes system network interfaces (via libpcap) and FreeBSD drivers for a number of network capture cards, including the popular Endace DAG (10GE/OC192, POS and ATM) cards. The package also includes programming APIs for C and Perl, and applications for capture, analysis, and web report generation. This package is maintained by CAIDA developers with the support and collaboration of the Internet measurement community.
We released CoralReef version 3.8.6 in June of 2009.
CAIDA Tools Download Report
The tables below list the CAIDA-developed tools distributed via our home page at https://catalog.caida.org/software and the number of downloads of each tool during 2009.
Currently Supported Tools
Tool | Description | Downloads |
---|---|---|
Autofocus | Tool for generating Internet traffic reports and time-series graphs. | 324 |
CoralReef | A software suite to collect and analyze data from passive Internet traffic monitors. | 818 |
Cuttlefish | Produces animated graphs showing diurnal and geographical patterns. | 150 |
dsc | A system for collecting and exploring statistics from DNS servers. | 1,570 |
dnsstat | An application that collects DNS queries on UDP port 53 to report statistics. | 199 |
dnstop | A libpcap application that displays tables of DNS traffic. | 8,944 |
iffinder | One of several tools to perform alias resolution, to discover IP interfaces belonging to the same router. | 288 |
sk_analysis_dump | A tool for analysis of traceroute-like topology data. | 164 |
Walrus | A tool for interactively visualizing large directed graphs in 3D space. | 3,410 |
libsea | A file format and a Java library for representing large directed graphs. | 353 |
Chart::Graph | A Perl module that provides a programmatic interface to several popular graphing packages. Note: Chart::Graph is also available on CPAN.org. The numbers here reflect only downloads directly from caida.org, as download statistics from CPAN are not available. | 91 |
plot-latlong | A tool for plotting points on geographic maps. | 248 |
Past Tools (Unsupported)
Tool | Description | Downloads |
---|---|---|
arts++ | A binary file format for storing network data. | 1,647 |
cflowd | A Netflow analysis tool. | 311 |
Mapnet | A tool for visualizing the infrastructure of multiple backbone providers simultaneously. | 12,215 |
GeoPlot | A light-weight Java applet that creates a geographical image of a data set. | 552 |
GTrace | A graphical front-end to traceroute. | 579 |
otter | A tool used for visualizing arbitrary network data that can be expressed as a set of nodes, links or paths. | 374 |
plotpaths | An application that displays forward and reverse network path data. | 114 |
plankton | A tool for visualizing NLANR's Web Cache Hierarchy. | 35 |
Data
In 2009, CAIDA captured and curated data from three primary sources of network data:
- macroscopic topology data with the Archipelago infrastructure,
- passive traffic traces at tier1 OC192 Internet Backbone links,
- passive traffic traces from the UCSD Network Telescope.
Major Milestones
- We released two datasets based on "raw" traces from the UCSD Telescope (Two days in November 2008 and Three days of Conficker).
- We initiated a real-time collection of traces from the Network Telescope. This "live" dataset currently covers a rolling window from two months ago up to the current time.
- The passive traffic traces from the equinix-chicago and equinix-sanjose monitors connected to tier1 ISP backbone links at Equinix facilities in Chicago, IL, and San Jose, CA, for 2009 are made available in the CAIDA Anonymized 2009 Internet Traces dataset.
Data Collected in 2009
Data Type | First date | Last date | Total size¹ |
---|---|---|---|
Macroscopic Topology Measurements, IPv4 (Archipelago) | 2009-01-01 | 2009-12-31 | 388 GB (1.2 TB) |
Macroscopic Topology Measurements, IPv6 (Archipelago) | 2009-01-01 | 2009-12-31 | 222 MB (784 MB) |
Internet backbone Traces | 2009-01-15 | 2009-12-17 | 4.8 TB (8.8 TB) |
Network Telescope Datasets | 2009-01-01 | 2009-12-31 | 96 GB (233 MB) |
"Live" Network Telescope Data | 2009-01-01 | 2009-12-31 | 3.2 TB (6.2 TB)² |
DNS Names for IPv4 Routed /24 Topology Dataset | 2009-01-01 | 2009-12-31 | 4.7 GB (17 GB)² |
DNS root/gTLD RTT Dataset | 2009-01-01 | 2009-12-31 | 1.2 GB |
¹ The total size represents actual disk space. If data are stored in compressed form, the uncompressed size is given in brackets.
² The size of these datasets varies over time as we store and serve a rotating window of the last 30 days only.
Data Distributed in 2009
We process raw data into specialized datasets to increase its utility to researchers and to satisfy security and privacy concerns. In 2009, this resulted in the following datasets:
- Anonymized 2009 Internet Traces dataset
- UCSD Telescope "Two days in November 2008" dataset
- UCSD Telescope "Three days of Conficker" dataset
- Inferred AS Relationships Dataset (Ongoing)
- AS Links Dataset (Ongoing)
- DNS Names from Topology Measurements (Ongoing)
- DNS Traces from Topology Measurements (Ongoing)
Publicly Available Data
These datasets require that users agree to an Acceptable Use Policy, but are otherwise freely available.
Dataset | Unique visitors (IPs) | Data Downloaded |
---|---|---|
AS Rank | 3419 | 4.7 GB |
AS Links (AS Adjacencies) | 361 | 2.40 GB |
AS Relationships | 1336 | 18.2 GB |
Router Adjacencies | 287 | 657.5 MB |
Witty Worm Dataset | 94 | 231 MB |
AS Taxonomy | 197 | 104.3 MB * |
Code-Red Worms Dataset | 72 | 3.8 GB |
We count the volume of data downloaded per unique user per unique file, so even if a user downloads a file 100 times, we only count that file once for that user. This methodology results in significantly undercounting the total volume of data served through our dataservers in 2008, but is necessary because of limitations in dataserver logging combined with aberrant user behaviour.
* The AS Taxonomy dataset is included in a mirror of the GA Tech main AS Taxonomy site, and thus does not represent all access to this data.
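The counting rule described above (each file counted once per user, regardless of repeat downloads) can be sketched as follows. This is an illustrative Python sketch, not CAIDA's actual log-processing code; the function name `summarize_downloads`, the record layout `(user, filename, size_bytes)`, and the sample records are assumptions.

```python
def summarize_downloads(records):
    """Return (unique_users, total_bytes), counting each (user, file) pair once.

    `records` is an iterable of (user, filename, size_bytes) tuples, a
    hypothetical simplification of a data server access log.
    """
    seen = {}  # (user, filename) -> size_bytes of the first download seen
    for user, filename, size_bytes in records:
        # setdefault keeps the first entry, so repeat downloads add nothing
        seen.setdefault((user, filename), size_bytes)
    unique_users = {user for user, _ in seen}
    return len(unique_users), sum(seen.values())


# Hypothetical log: the first user fetches the same file twice.
records = [
    ("198.51.100.7", "as-rank.txt", 1000),
    ("198.51.100.7", "as-rank.txt", 1000),  # repeat download, not counted again
    ("198.51.100.7", "as-links.txt", 400),
    ("203.0.113.9", "as-rank.txt", 1000),
]
visitors, volume = summarize_downloads(records)
```

Because repeats are dropped, the reported volume is a lower bound on the bytes actually served, which is the undercounting the note describes.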
Restricted Access Data
These datasets require that users:
- be academic or government researchers, or join CAIDA;
- request an account and provide a brief description of their intended use of the data; and
- agree to an Acceptable Use Policy.
Dataset | Unique visitors (usernames) | Data Downloaded * |
---|---|---|
Anonymized Internet Backbone Traces | 129 | 4.5 TB |
Backscatter Datasets | 39 | 567 GB |
Raw Topology Traces from Archipelago infrastructure | 35 | 1.0 TB |
Raw Topology Traces (skitter) | 21 | 1.2 TB |
Witty Worm Dataset | 11 | 71 GB |
DNS Names for IPv4 Routed /24 Topology Dataset | 28 | 19.4 GB |
2003 Internet Topology Data Kit | 19 | 2.9 GB |
DNS Root/gTLD server RTT Dataset | 1 | 2.7 MB |
* We count the volume of data downloaded per unique user per unique file, so even if a user downloads a file 100 times, we only count that file once for that user. This methodology results in significantly undercounting the total volume of data served through our dataservers in 2008, but is necessary because of limitations in dataserver logging combined with aberrant user behaviour.
Restricted Access Data Requests
The following table shows some statistics about data requests for CAIDA datasets: the number of requests received, the number of users whose request was granted, and the number of users that actually downloaded data.
We received about 4% more requests in 2009 than in 2008, and approved 16% more requests for access to restricted datasets. Almost 80% of the users who were granted access actually accessed our webservers to download data.
Dataset | Number of requests received | Number of users granted access | Number of users that accessed data |
---|---|---|---|
Anonymized Backbone and Peering Link Traces | 242 | 181 | 151 |
Active Topology Trace Datasets | 136 | 90 | 63 |
Backscatter-2008 Dataset | 101 | 62 | 45 |
Witty Worm Dataset | 28 | 18 | 14 |
DNS Root/gTLD server RTT Dataset | 7 | 3 | 3 |
Totals | 514 | 354 | 276 |
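As a quick check of the "almost 80%" figure, the short Python sketch below recomputes the totals and rates from the request table. The dictionary layout and the variable names are assumptions for illustration; the numbers are copied from the table.

```python
# (requests received, users granted access, users that accessed data)
requests = {
    "Anonymized Backbone and Peering Link Traces": (242, 181, 151),
    "Active Topology Trace Datasets": (136, 90, 63),
    "Backscatter-2008 Dataset": (101, 62, 45),
    "Witty Worm Dataset": (28, 18, 14),
    "DNS Root/gTLD server RTT Dataset": (7, 3, 3),
}

# Column-wise totals across all datasets
received, granted, accessed = (sum(col) for col in zip(*requests.values()))

grant_rate = granted / received   # share of requests that were approved
access_rate = accessed / granted  # share of approved users who downloaded data

print(f"totals: {received} requests, {granted} granted, {accessed} accessed")
print(f"granted: {grant_rate:.1%}, accessed after grant: {access_rate:.1%}")
```

The column sums reproduce the Totals row, and 276 of 354 approved users is about 78%.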
Workshops
As part of our mission to investigate both practical and theoretical aspects of the Internet, CAIDA staff actively attend, contribute to, and host workshops relevant to research and better understanding of Internet infrastructure, trends, topology, routing, and security. Our web site has a complete listing of past and upcoming CAIDA Workshops.
ISMA 2009 AIMS - 1st Workshop on Active Internet Measurements
On February 12-13, 2009, CAIDA hosted the 1st Active Internet Measurements (AIMS) Workshop supporting science and policy in La Jolla, CA. This workshop sought to define priority directions of various active measurement infrastructures especially aimed at macroscopic security, stability, and performance measurements.
2nd CAIDA/WIDE/CASFI Workshop
The 2nd CAIDA/WIDE/CASFI Workshop was held on April 4-5, 2009 in Seoul, South Korea. This workshop continued a tradition of workshops supporting a three-way collaboration between researchers from CAIDA (USA), WIDE (Japan) and CASFI (South Korea). The main areas of the Workshop were Internet measurement projects, analysis of data to reveal current Internet trends, and DNS research. The Workshop also covered miscellaneous research and technical topics of mutual interest for CAIDA, WIDE and CASFI participants.
1st Workshop on Internet Economics (WIE'09)
The 1st Workshop on Internet Economics (WIE'09) hosted by CAIDA and Georgia Tech was held on September 23, 2009 by web videoconference. The goal of this workshop was to bring together researchers, commercial Internet facilities and service providers, technologists, theorists, policy makers, RIR stakeholders, and pundits of Internet economics to try to frame a concrete and useful research agenda for the emerging but stunted field of Internet infrastructure economics.
Publications
The following table lists the papers published by CAIDA in calendar year 2009. Please refer to Papers by CAIDA on our web site for a comprehensive listing of publications.
Year | Month | Author(s) | Title | Publication |
---|---|---|---|---|
2009 | Nov | | Understanding the Efficacy of Deployed Internet Source Address Validation Filtering | ACM Internet Measurement Conference (IMC) |
2009 | Oct | | The Workshop on Active Internet Measurements (AIMS) Report | ACM SIGCOMM Computer Communication Review (CCR) |
2009 | Oct | | Graph Annotations in Modeling Complex Network Topologies | ACM Transactions on Modeling and Computer Simulation (TOMACS) |
2009 | Oct | | GT: picking up the truth from the ground for Internet traffic | ACM SIGCOMM Computer Communication Review (CCR) |
2009 | Oct | | An Internet Data Sharing Framework For Balancing Privacy and Utility | Engaging Data: First International Forum on the Application and Management of Personal Electronic Information |
2009 | Sep | | Greedy Forwarding in Scale-Free Networks Embedded in Hyperbolic Metric Spaces | ACM SIGMETRICS Performance Evaluation Review |
2009 | Sep | | On Curvature and Temperature of Complex Networks | Physical Review E |
2009 | Sep | | How Small Are Building Blocks of Complex Networks | arXiv physics.soc-ph/0908.1143 |
2009 | Jul | | Greedy Forwarding in Dynamic Scale-Free Networks Embedded in Hyperbolic Metric Spaces: Technical Report | Cooperative Association for Internet Data Analysis (CAIDA) |
2009 | Mar | | Internet Mapping: from Art to Science | IEEE DHS Cybersecurity Applications and Technologies Conference for Homeland Security (CATCH) |
2009 | Feb | | Navigating ultrasmall worlds in ultrashort time | Physical Review Letters |
2009 | Feb | | Evolution of the Internet AS-Level Ecosystem | International Conference on Complex Sciences (Complex) |
2009 | Jan | | Navigability of Complex Networks | Nature Physics |
Presentations
The following table lists the presentations and invited talks given by CAIDA staff in calendar year 2009. Please refer to Presentations by CAIDA on our web site for a comprehensive listing.
Year | Month | Presenter(s) | Title | Venue |
---|---|---|---|---|
2009 | Dec | | Historical and Architectural Context for Traffic Management Needs Today | FCC Technical Advisory Process Workshop |
2009 | Nov | | Archipelago Measurement Infrastructure | Africa-Asia Forum Workshop |
2009 | Oct | | Hyperbolic geometry of complex networks | USC Center for Applied Mathematical Sciences Colloquia |
2009 | Oct | | An Internet Data Sharing Framework For Balancing Privacy and Utility | Engaging Data: First International Forum on the Application and Management of Personal Electronic Information |
2009 | Oct | | Evolution of the Internet Ecosystem | Southern California Symposium on Network Economics and Game Theory (SoCal NEGT) |
2009 | Sep | | Leveraging the Science and Technology of Internet Mapping for Homeland Security | DHS Cybersecurity PI Meeting |
2009 | Aug | | CAIDA participation in PREDICT | DHS PREDICT PI Meeting |
2009 | Jun | | Greedy Forwarding in Scale-Free Networks Embedded in Hyperbolic Metric Spaces | ACM SIGMETRICS MAMA |
2009 | Jun | | dK-series and hidden hyperbolic metric spaces | Telefonica Research |
2009 | Jun | | Belmont Report Overview | DHS Directorate for Science and Technology Workshop |
2009 | May | | Legal Issues in Network Research: Determining Content in Web-Browsing Communications | DHS Directorate for Science and Technology Workshop |
2009 | May | | Ten Things the FCC Should Know about the Internet | Federal Communications Commission (FCC) |
2009 | Apr | | CAIDA's Topology Updates and Analysis | WIDE-CASFI |
2009 | Apr | | CAIDA Report | WIDE-CASFI |
2009 | Apr | | DatCat: Lessons Learned | WIDE-CASFI |
2009 | Apr | | netmap: the mini-Ark project | WIDE-CASFI |
2009 | Apr | | State of the Art in Traffic Classification | WIDE-CASFI |
2009 | Apr | | A Measurement-Based Study of Xunlei | WIDE-CASFI |
2009 | Apr | | Hidden Metric Spaces and Navigability of Complex Networks | NeTS FIND |
2009 | Mar | | DNS Research Update from CAIDA | DNS RSSAC Meeting |
2009 | Mar | | Broadband Conditions | NTIA Broadband Technology Opportunities Program |
2009 | Mar | | Internet Visualization with Walrus | Simulation Interoperability Workshop |
2009 | Feb | | Evolution of the Internet AS-Level Ecosystem | International Conference on Complex Sciences (Complex) |
2009 | Feb | | IP-to-Router Mapping Techniques | ISMA Workshop on Active Internet Measurements (AIMS) |
2009 | Feb | | Archipelago Measurement Infrastructure: Updates and Analyses | ISMA Workshop on Active Internet Measurements (AIMS) |
2009 | Jan | | Conficker | ISOI |
Web Site Usage
In 2009, CAIDA's web site continued to attract considerable attention from a broad, international audience. Part of the increased traffic in the first half of the year can be attributed to attention following our release of the "Conficker/Conflicker/Downadup worm as seen from the UCSD Network Telescope" web report.
The table below presents the monthly history of traffic to www.caida.org for 2009. To show a more accurate representation of website traffic, these statistics do not include traffic from spiders, crawlers or other robots.
Month | Unique visitors | Number of visits | Pages | Hits | Bandwidth (GB) |
---|---|---|---|---|---|
Jan 2009 | 38,521 | 64,600 | 203,654 | 1,155,049 | 40.14 GB |
Feb 2009 | 41,436 | 67,709 | 208,371 | 1,246,721 | 46.61 GB |
Mar 2009 | 42,996 | 70,676 | 229,058 | 1,338,538 | 64.77 GB |
Apr 2009 | 40,486 | 64,965 | 209,267 | 1,183,418 | 44.84 GB |
May 2009 | 33,777 | 57,081 | 201,441 | 1,013,896 | 37.83 GB |
Jun 2009 | 34,253 | 57,040 | 174,455 | 1,105,566 | 43.86 GB |
Jul 2009 | 31,552 | 54,348 | 178,023 | 954,572 | 38.93 GB |
Aug 2009 | 33,145 | 54,392 | 170,597 | 1,040,296 | 37.07 GB |
Sep 2009 | 36,824 | 62,205 | 200,598 | 1,034,684 | 42.82 GB |
Oct 2009 | 39,742 | 67,003 | 220,984 | 1,080,407 | 52.81 GB |
Nov 2009 | 35,262 | 58,451 | 254,290 | 1,007,644 | 46.47 GB |
Dec 2009 | 31,377 | 53,110 | 204,633 | 868,402 | 35.87 GB |
Total | 439,371 | 731,580 | 2,455,371 | 13,029,193 | 532.02 GB |
Organizational Chart
CAIDA would like to acknowledge the many people who put forth great effort towards making CAIDA a success in 2009. The image below shows the functional organization of CAIDA. Please check the home page for more complete information about CAIDA staff.
CAIDA Functional Organization Chart
Funding Sources
CAIDA thanks our 2009 sponsors, members, and collaborators.
The charts below depict funds received by CAIDA during the 2009 calendar year.
Funding Source | Allocations | Percentage of Total |
---|---|---|
DHS | 331,816 | 31% |
DOI | 488,830 | 46% |
NSF | 17,160 | 2% |
GIFT | 200,000 | 19% |
CSE | 27,716 | 3% |
Total | 1,065,522 | 100% |
Figure 1. Allocations by funding source received during 2009.
Operating Expenses
The charts below depict CAIDA's Annual Expense Report for the 2009 calendar year.
LABOR | Salaries and benefits paid to staff and students |
IDC | Indirect Costs paid to the University of California, San Diego including grant overhead (52-54%) and telephone, Internet, and other IT services. |
SUBCONTRACTS | There were no subcontracts in 2009. |
TRAVEL | Trips to conferences, PI meetings, operational meetings, and sites of remote monitor deployment. |
SUPPLIES & EXPENSES | All office supplies and equipment (including computer hardware and software) costing less than $5000. |
EQUIPMENT | Computer hardware or other equipment costing more than $5000. |
TRANSFERS | Exchange of funds between groups for recharge for IT desktop support and Oracle database services. |
Program Area | Expenses | Percentage of Total |
---|---|---|
Labor | 1,413,587 | 60% |
IDC | 769,474 | 33% |
Subcontract | 0 | 0% |
Travel | 62,427 | 3% |
Supplies & Expenses | 69,550 | 3% |
Equipment | 27,586 | 1% |
Transfers | 6,046 | 0% |
Total | 2,348,671 | 100% |
Figure 2. 2009 Operating Expenses
These numbers do not include salaries or expenses paid by the Computer Science & Engineering Department of the Jacobs School of Engineering at the University of California, San Diego.
Program Area | Expenses | Percentage of Total |
---|---|---|
DNS | 407,447 | 17.3% |
Infrastructure | 909,442 | 38.7% |
Routing | 399,360 | 17.0% |
Policy | 36,133 | 1.5% |
Topology | 550,304 | 23.4% |
Outreach | 45,985 | 2.0% |
Total | 2,348,671 | 100.0% |
Figure 3. 2009 Expenses by Program Area