Mission Statement: The Center for Applied Internet Data Analysis (CAIDA) is an independent analysis and research group based at the University of California's San Diego Supercomputer Center. CAIDA investigates both practical and theoretical aspects of the Internet, with particular focus on:
- collection, curation, analysis, visualization, dissemination of sets of the best available Internet data,
- providing macroscopic insight into the behavior of Internet infrastructure worldwide,
- improving the integrity of the field of Internet science,
- improving the integrity of operational Internet measurement and management,
- informing science, technology, and communications public policies.
CAIDA is actively engaged in the following three main program areas:
|I||Measurements and Infrastructure||Create state-of-the-art infrastructure for measurements, data procurement, and curation; conduct measurements for comprehensive characterization of the Internet|
|II||Research and Analysis||Analyze and model pertinent features and trends of current Internet usage, develop novel approaches to enable future Internet growth|
|III||Data Collections||Provide best available datasets and associated analysis tools to the research community|
For the last 20 years UC San Diego's Center for Applied Internet Data Analysis (CAIDA) has been developing data-focused services, products, tools and resources to advance the study of the Internet, which has permeated disciplines ranging from theoretical computer science to political science, from physics to tech law, and from network architecture to public policy. As the Internet and our dependence on it have grown, the structure and dynamics of the network, and how they relate to the political economy in which it is embedded, are attracting increasing attention from researchers, operators and policy makers, all of whom bring questions that they lack the capability to answer themselves. CAIDA has spent years cultivating relationships across disciplines (networking, security, economics, law, policy) with those interested in CAIDA data, but the impact thus far has been limited to a handful of researchers. The current mode of collaboration simply does not scale to the exploding interest in scientific study of the Internet.
On a more operational dimension, large-scale Internet cyber-attacks and incidents -- route hijacking, network outages, phishing campaigns, botnet activities, large-scale bug exploitation, etc. -- represent a major threat to public safety and to both public and private strategic and financial assets. Mitigation and recovery, as well as prevention of further attacks of similar nature, are often impeded by the fact that such events can remain unnoticed or are hard to understand and characterize. Because of their macroscopic nature, identifying such events and understanding their scope and dynamics requires: (a) combining data of different types and origins; (b) teamwork of experts with varied backgrounds and skills; and (c) agile tools for rapid, cooperative, interactive analysis.
These two infrastructure research challenges will require high performance research infrastructure, and CAIDA will embark on a new stage in our infrastructure development endeavors to support these challenges, re-using and sharing software and data components wherever possible. We will integrate existing as well as develop new measurement and analysis components and capabilities into interactive online platforms, accessible via web interfaces as well as APIs. These novel developments will enable researchers from various disciplines including non-networking experts to access and productively use Internet data, thus advancing more complex and visionary scientific studies of the Internet ecosystem. We hope these efforts will enable us and others to widen access to and utility of the best possible Internet measurement data available to research, operational, and policy communities worldwide.
On the research side, we will continue our Internet cartography efforts, improving our IPv4 and IPv6 topology mapping capabilities, and our ability to measure and analyze interdomain congestion. We will also continue development of our Internet Topology Data Kit (ITDK) data sets, but shift our focus to simplified versions of the data and visual interfaces that are easier for researchers to use. We will undertake a new project that studies topological weaknesses from a nation-state security and stability perspective. We will explore implications of these analyses for network resiliency, economics, and policy. And in the intersection between research and infrastructure, we will collaborate on a new research project that explores an ambitious new way of designing measurement infrastructure platforms to facilitate broader deployment and sharing of nodes across scientific experimenters.
As always, we will lead and participate in tool development to support measurement, analysis, indexing, and dissemination of data from operational global Internet infrastructure. Our outreach activities will include peer-reviewed papers, workshops, blogging, presentations, educational videos, and technical reports.
Note that not all of the activities described in this program plan are fully funded yet; we are seeking additional support to enable us to accomplish our ambitious agenda.
This program plan outlines CAIDA's anticipated activities for 2018-2022, in the areas of research, infrastructure, data collection and analysis to support these goals. Our annual reports are at https://www.caida.org/home/about/annualreports/. This program plan is available at https://www.caida.org/home/about/progplan/. Feedback and questions are welcome at info at caida.org.
Our next generation of Internet measurement and data analytics infrastructure will focus on system integration of existing and evolving components into user-friendly platforms to enable new scientific directions, experiments, and data products. Our new PANDA project builds on input from dozens of researchers over seven years of interdisciplinary workshops (networking, security, economics, and policy), and our design plan supports its use, evaluation, extensibility, and sustainability. In parallel, and leveraging some of the same software components, we are pursuing an even more ambitious objective of developing a new operational research environment that will facilitate interactive exploration of live and historic macroscopic Internet data from various sources. We will continue to support our active and passive measurement platforms that will serve as components to these system integration efforts.
PANDA - Integrated Platform for Applied Network Data Analysis
The proposed system will integrate active Internet measurement capabilities, multi-terabyte data archives, heavily curated data sets revealing coverage and business relationships, and traffic measurements that represent, for many researchers, the holy grail of scientific sources of information about Internet structure and dynamics. It will enable a broad set of researchers to access, query, visualize, and analyze Internet data, as well as create new data products in ways that promote valid interpretations of data and derived inferences. We will also develop new visualization tools to allow non-experts to understand various aspects of Internet structure, using geographic and economic annotations on the data, with access controls where appropriate for sensitive data.
Our platform development goals include:
- improve existing PANDA components
- link existing components into a multifunctional platform
- integrate new external data infrastructure building blocks into PANDA
- include data from home routers participating in the FCC Measure Broadband America program into MANIC
- incorporate large scale active measurements of DNS by OpenINTEL project
- integrate user traffic data from home networks with BGP-aware and IXP-aware functionality
Our community development goals include:
- increase community accessibility of unified platform and its
- improve ITDKs
- provide data products in easier-to-use, domain-specific formats
- create user-friendly interface to spoofer results accessible via PANDA
- provide support for multidisciplinary collaborations
- develop and implement a Science Gateway style interface for interactive access to PANDA
We envision HI-CUBE as a web-based private/public interactive collaborative platform used by experts to analyze diverse sets of streamed Internet data with various interactive and visual tools. Building upon existing software components and datasets, it will include the following four elements: (a) web services and visual interfaces; (b) data processing and analytics to support interactive querying and anomaly detection; (c) new incident-based dataset creation and curation, and (d) deployment and community outreach to support iterative improvements. HI-CUBE will serve as an extensible lab for testing cooperative analysis and fusion of diverse Internet data in the context of cyber-security analytics. Ultimately, this platform promises to improve our ability to identify, monitor, and mitigate the infrastructure vulnerabilities that threaten the security and reliability of the nation's communication capabilities.
Our goals for this area include:
- enable combination and correlation of diverse Internet cybersecurity data
- structure an exploratory data architecture around a set of common dimensions: time and Internet coordinates, which identify network elements and their properties
- enable incident detection and investigation
- prototype a collaborative environment as a social web-based platform, including authentication and authorization framework
Archipelago (Ark) is CAIDA's active measurement infrastructure, consisting of a central server at CAIDA and 200+ (and growing) monitors deployed in multiple countries on six continents. Ark represents a unique laboratory in which researchers can quickly design, implement, and easily coordinate the execution of experiments across a globally distributed set of dedicated hosts. CAIDA researchers currently employ Ark for ongoing measurements of macroscopic Internet topology and performance, gathering the largest and longest-running set of Internet topology data available to the research community.
Our goals include:
- manage and maintain existing remote Ark monitors
- expand the scale and manageability of Ark, deploying new monitors in strategic locations
- curate, archive, and distribute collected data
- continue development of alias resolution tool MIDAR toward more automatic execution
- refine Ark's measurement-on-demand web interface (Vela) to improve functionality and user experience
- create an API that integrates Ark as a measurement component for both PANDA and HI-CUBE
The UCSD network telescope is a portion of routed IP address space in which little or no legitimate traffic exists ("darkspace"). Observing unsolicited Internet traffic (or Internet Background Radiation - IBR) reaching such unoccupied address space allows visibility into a wide range of security-related events.
Our goals in this area include:
- upgrade and modernize the UCSD network telescope infrastructure
- transition the data analysis infrastructure to use NSF-funded high performance computing resources on the backend
- deploy cloud-compute support using novel virtualization features on the Comet supercomputer
- develop and deploy live packet capture and distribution software
- reduce processing complexity and automate common data analysis tasks
- organize community workshops and publish reports
We developed and implemented methods to accurately identify interdomain links between autonomous networks, and to analyze congestion on these links. Our technique monitors latency patterns at both ends of an interconnection link; persistently elevated latency to the far end of the link, with no corresponding elevation to the near end, can be a signal of congestion at the interdomain link. We implemented this methodology in MANIC, an adaptive system that provides a near real-time view of congestion events by managing latency probing from our set of distributed vantage points (VPs), collecting and organizing data, and presenting that data for easy analysis and visualization. To adapt to routing changes, we map interconnections continuously on each VP, and update probing with the latest topology information. The system pulls probing data from VPs and indexes it into an InfluxDB time-series database with a Grafana front end, making it available for interactive querying and graphing within 30 minutes of being generated.
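As a rough illustration of the near-end/far-end comparison described above, the following Python sketch flags a link when a large share of far-end round-trip times sit well above their baseline while near-end times do not. The function names, the 10 ms elevation threshold, and the 50% persistence fraction are assumptions chosen for illustration; this is not MANIC's actual implementation.

```python
def elevated_fraction(rtts_ms, elevation_ms):
    """Fraction of samples at least elevation_ms above the window minimum."""
    if not rtts_ms:
        raise ValueError("need at least one RTT sample")
    baseline = min(rtts_ms)  # minimum RTT approximates the uncongested latency
    elevated = sum(1 for rtt in rtts_ms if rtt - baseline >= elevation_ms)
    return elevated / len(rtts_ms)


def looks_congested(near_rtts_ms, far_rtts_ms, elevation_ms=10.0, min_fraction=0.5):
    """Return True when the far end is persistently elevated but the near end is not.

    elevation_ms and min_fraction are hypothetical tuning parameters.
    """
    far_high = elevated_fraction(far_rtts_ms, elevation_ms) >= min_fraction
    near_high = elevated_fraction(near_rtts_ms, elevation_ms) >= min_fraction
    return far_high and not near_high


if __name__ == "__main__":
    near = [10, 11, 10, 12, 10, 11]   # RTTs (ms) to the near end of the link
    far = [12, 40, 42, 41, 13, 43]    # RTTs (ms) to the far end of the link
    print(looks_congested(near, far))
```

Requiring that the near end stay flat is what separates congestion on the interdomain link itself from congestion somewhere earlier on the path, which would raise latency to both ends.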
Our goals include:
- create modules for continuous analysis of time series data to generate automatic alarms when discovering evidence of congestion
- develop a reactive system to conduct on-demand measurements triggered by alarms
- enable different types of reactive measurement tasks, such as:
- confirming the latency-based evidence of congestion by launching probes to measure loss rate
- estimating the impact on achievable throughput by running NDT tests
- estimating potential impacts to user Quality of Experience
- add Periscope and RIPE Atlas VPs to MANIC
- integrate geolocation information including our facility-level data to enable systematic analysis of congestion inferences between networks
- re-architect the InfluxDB database instance to work on a cluster of XSEDE nodes to improve scalability and support many concurrent users and queries
- cross-correlate MANIC results with our AS-relationship data
- correlate MANIC results with outage information from various sources (IODA, NANOG, Twitter feeds)
- open up the data exploration interface to external users
Challenges in Internet outage identification are exacerbated by the heterogeneity of disruption events and their characteristics: many factors can trigger an outage, ranging from human error such as misconfiguration, government-mandated shutdowns, and cyber-attacks, to cable cuts, network failures, natural disasters, power outages, etc. Our previously developed proof-of-concept IODA system demonstrated that combining three data sources - Internet Background Radiation, BGP routing information, and active probing - improves coverage, increases the confidence level of outage inferences, and helps classify disruption events and reveal their root cause. In our next phase, we will try to formalize metric-based definitions of targeted events and quantitative objectives in terms of accuracy and coverage. We will enable visual interfaces for the inspection and correlation of current and historical data, and evaluate the system capabilities and limitations in the real world. Ultimately, our objective is to deploy this framework as a near-realtime capability for 24/7 monitoring of Internet outages affecting large geographic regions and/or specific Internet operators.
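To make the idea of combining data sources concrete, here is a minimal Python sketch that checks each signal against its own recent history and reports which ones have dropped. The signal names, the median baseline, and the 50% drop ratio are assumptions for illustration only; they do not describe IODA's actual detection logic.

```python
from statistics import median


def signal_dropped(history, current, drop_ratio=0.5):
    """True if the current value falls below drop_ratio times the median of history."""
    baseline = median(history)
    return baseline > 0 and current < drop_ratio * baseline


def dropped_signals(signals, drop_ratio=0.5):
    """Return the sorted names of signals showing a drop.

    signals maps a (hypothetical) signal name to a (history, current) pair.
    More agreeing signals means more confidence in an outage inference.
    """
    return sorted(
        name
        for name, (history, current) in signals.items()
        if signal_dropped(history, current, drop_ratio)
    )


if __name__ == "__main__":
    observed = {
        "ibr": ([100, 110, 90, 105], 10),              # darknet source counts
        "bgp": ([50, 50, 50, 50], 49),                 # visible prefixes
        "active_probing": ([200, 210, 190, 205], 40),  # responsive hosts
    }
    print(dropped_signals(observed))
```

In this example two of the three signals drop, which is stronger evidence than either alone; the unchanged BGP signal also hints at the type of event, since prefixes are still announced while traffic and responsiveness fall.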
Our goals include:
- new and improved methodologies for collection, sanitization, aggregation, analysis, and reporting of macroscopic connectivity disruptions in near-realtime
- develop and document APIs to access live stream alerts generated by the system
- develop interactive dashboards for targeted inspection of detected outage signals
- deploy and evaluate performance and usability of system in cooperation with a partner entity
Despite forged source IP addresses (spoofing) being a known vulnerability for at least 25 years, and despite many efforts to shed light on this problem, spoofing remains a viable exploit method enabling redirection, amplification, and anonymity in Distributed Denial-of-Service (DDoS) attacks. Fixing this problem requires operators to ensure their networks block packets with spoofed source IP addresses, a best current practice (BCP) known as source address validation (SAV - BCP38). However, a network that deploys SAV primarily helps other networks (not itself), a classic tragedy of the commons in the Internet. To provide objective data on the status of BCP38 compliance, we developed and support Spoofer, an open-source client-server system for testing deployment of source address validation. When installed on a networked computer running Windows, macOS, or UNIX-like OSes, the client periodically tests a network's ability to both send and receive spoofed packets, including private and neighboring addresses, and sends results to the central server at CAIDA. Measuring whether appropriate inbound filtering of potentially malicious spoofed traffic exists improves user incentives to run the tests since this best practice directly affects them. We produce reports, remediation analyses, and visualizations that help inform operators, response teams, and policy analysts.
Minimizing the Internet's susceptibility to spoofed DDoS attacks is the overarching goal of this project. Other goals include:
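The core check behind source address validation can be sketched in a few lines of Python: a packet leaving a network is legitimate only if its source address falls within the prefixes assigned to that network. This is a simplified illustration using the standard `ipaddress` module; the function name and the example prefixes (taken from documentation address ranges) are assumptions, and the sketch does not represent how the Spoofer client or any router implements filtering.

```python
import ipaddress


def passes_sav(source_ip, allowed_prefixes):
    """Return True if source_ip belongs to one of the network's own prefixes.

    allowed_prefixes is a hypothetical list of prefixes assigned to the
    network performing the egress check.
    """
    addr = ipaddress.ip_address(source_ip)
    # Membership is False when address and prefix IP versions differ.
    return any(addr in ipaddress.ip_network(prefix) for prefix in allowed_prefixes)


if __name__ == "__main__":
    own_prefixes = ["192.0.2.0/24", "2001:db8::/32"]
    for src in ["192.0.2.55", "198.51.100.7", "2001:db8::1", "10.0.0.1"]:
        verdict = "forward" if passes_sav(src, own_prefixes) else "drop (spoofed)"
        print(src, verdict)
```

A network that applies this check at its edge drops packets claiming a neighboring or private source address, which are the cases the client's tests are described as covering.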
- develop software client for deployment in resource-constrained open-source home routers
- analyze characteristics of networks deploying SAV (e.g., country of governance, network location, business type)
- correlate BCP38 (non)compliance with network appearances in various security reputation-based blacklists of networks observed engaging in malicious activity
Many activities in large-scale network empirical analysis, modeling, security, policy, and architecture development require access to real traffic data, including from core Internet backbone links. We collected traffic from backbone links in Internet exchange points in San Jose, CA and Chicago, IL from 2008 until 2016 when the links were upgraded to a higher speed than our equipment could handle. Fortunately, in March 2018 we were able to resume our traffic collection on a link in an Internet exchange point in New York, NY. We continue to support efforts to enable Internet traffic measurements and privacy-respecting sharing.
- upgrade monitoring equipment to 100 Gbps links
- find 100 Gbps backbone links to deploy monitors and collect and sanitize (anonymize) traffic data
- archive collected data and share with approved researchers
CAIDA pursues research activities spanning various domains related to Internet science and engineering. We seek to characterize fundamental behavior of the Internet as an evolving complex system and predict -- and in some cases design -- salient aspects of future evolution. Internet science has become a fundamentally interdisciplinary endeavor -- requiring consideration of economic, policy, regulatory, and international relations factors. We are excited to continue several existing projects on Internet structure, performance, and economic factors, and start an entirely new one that investigates the susceptibility of the Internet topology to country-level connectivity disruption and manipulation.
Internet cartography is emerging as its own discipline, essential to characterizing this critical infrastructure and understanding its macroscopic properties, dynamic behavior, performance, and evolution. Maps are also crucial for realistic modeling, simulation, and analysis of the Internet and other large-scale complex networks. These maps can be constructed for different layers (or granularities), e.g., fiber/copper cable, IP address, router, Point-of-Presence (PoP), autonomous system (AS), ISP/organization. We have demonstrated the utility of maps at these layers to powerfully inform and calibrate vulnerability assessments and situational awareness of critical network infrastructure. ISP-level topologies, sometimes called AS-level or interdomain routing topologies, offer a baseline against which to interpret other topology layers and reveal insights into technical, economic, policy, and security needs of the still largely unregulated peering ecosystem. CAIDA has conducted and shared measurements of the Internet macroscopic topology since 1998. Our current tools (since 2007, scamper deployed on the Archipelago measurement infrastructure) track global IP-level connectivity by sending probe packets from a set of source monitors to millions of geographically distributed destinations across the IPv4 address space. Since 2008, we have continuously probed IPv6 address space as well.
Our goals in this area include:
- improve the accuracy of our derived router-level maps of the Internet, for both IPv4 and IPv6 address spaces, by improving IP address alias resolution techniques, and conducting large scale alias resolution probing measurement experiments
- improve the completeness of our maps by integrating traceroute-based AS-level Internet topologies from all available sources
- add more detailed economic, geographic, and infrastructure annotations to our maps by conducting novel measurement experiments, improving our inference methodologies, and developing and supporting user-friendly interactive validation functionality
- infer congestion at interconnection points using combinations of various methods
- create an AS-traceroute measurement tool that integrates control path and forward path data in real-time
- create informative visualizations of large-scale network topology measurements
- create software to analyze statistical properties of router-level Internet graphs, including awareness of data idiosyncrasies that prevent standard statistical computations
- improve the usability of our Internet Topology Data Kits: add annotated estimates of the number of routers and aliases observed in a given data set, filter out artifacts of data that inhibit insight, identify falsely inferred AS links
- create a simplified version of ITDK with multiple origin ASes, AS loops and sets, and hyperlinks removed
As the global Internet expands to meet demand, profound changes are occurring in its interconnection structure, traffic dynamics, and the economic and political power of different players in the ecosystem. These changes not only impact network engineering and operations, but also shift the balance of power among players in the Internet ecosystem. These shifts are attracting regulatory interest and presenting broader challenges for technology investment, future network design, public policy, and scientific study of the Internet itself. CAIDA researchers study not only technical, but economic and policy aspects of the Internet, developing measurement tools and modeling methodologies to investigate questions related to infrastructure resiliency, emerging trends, network economics, and public policy.
Our goals in this area include:
- explore implications of congestion inference results for network resiliency, economics, policy, and science
- characterize IPv4 address transfer market using IP-level, BGP, and DNS data
- develop a framework for assessment of consumer harms on the Internet and metrics to support their analysis
Systematic monitoring of global Internet behavior remains a disturbingly elusive priority. We have no rigorous framework for measuring, analyzing, or quantifying the impact of abnormal connectivity dynamics on a global scale, nor metrics for assessing vulnerabilities of a network's connectivity to attacks, country-level censorship, or natural disasters. Detecting, understanding, and quantifying the impact of such events requires integration of heterogeneous data types that capture different dimensions of the phenomenon.
Our goals in this area include:
- identify topological vulnerabilities for specific countries/regions
- study the evolution of the topology and topological weaknesses of countries/regions over time
- study methods to detect and characterize BGP hijacking events, including extent, frequency, and impact
- provide hijacking-related datasets to researchers
- analyze and validate metrics for use in detecting and characterizing these vulnerabilities
Research groups invest considerable effort to secure access to diverse end hosts and operate them as measurement endpoints. The result has been a proliferation of Internet measurement platforms with different underlying architectures, implementations, functionalities, APIs, and user bases. To run experiments on these platforms at scale, outside researchers must overcome incompatibility, incentive, and trust issues with platform operators.
A possible new approach to facilitating sharing of infrastructure to support network measurement is by providing a lightweight, universal interface (we call it PacketLab) to existing measurement endpoints. The idea is to give control to platform operators over how their nodes are used, and make it easy to expose existing and new network vantage points to the measurement community. For experimenters, PacketLab will provide a single interface to multiple measurement platforms, so that researchers can develop and test their experiments once and then run them on any endpoint exporting the PacketLab interface.
In collaboration with the University of Illinois Urbana-Champaign researchers, we will investigate architecting such a novel measurement infrastructure and its universal measurement endpoint interface. We will contribute to architecture design discussions, software development, data collection, and data analysis.
CAIDA is renowned worldwide as the leader in conducting macroscopic Internet measurements to support scientific research. Scientific data collection, tool development, and analysis are among CAIDA's core objectives. We continually seek better technology and methods to meet the challenges of Internet measurements. We continue to grow our ongoing data collections, host and share data sets that were either one-time measurements or terminated collections, and regularly release data associated with published research studies. Many of our data sets are available for public download. Please see the Data Overview page for a complete list of our current data offerings. We are always interested in, and regularly request, feedback from researchers on what Internet data is required to support their research.
Privacy-Sensitive Data Sharing
Motivation: Concerns regarding end-user privacy and potential risks stemming from unauthorized or unintended data disclosure present daunting challenges to researchers looking for access to real world Internet data. For over 20 years, CAIDA has been navigating the numerous challenges of collecting, coordinating, curating, and sharing data sets for the network research and operational communities in support of Internet science. We have been providing vetted researchers with network operational data in a secure and controlled manner that respects the privacy, legal, and ethical concerns of Internet users and network operators.
CAIDA's data sharing experience and activities enabled us to become one of the founding and leading members of the Information Marketplace for Policy and Analysis of Cyber-risk & Trust (IMPACT) program founded by the U.S. Department of Homeland Security, Science & Technology Directorate, Cyber Security Division (DHS S&T CSD). IMPACT's mission is to coordinate, enhance, and develop real world data, analytics and information sharing capabilities, tools, models, and methodologies, making these components broadly available as national and international resources to support the three-way partnership among cyber security researchers, technology developers, and policymakers in academia, industry, and the government. CAIDA participates in IMPACT as Data Provider, Data Host, and Data-Analytics-as-a-Service Provider.
Our goals in this area are:
- curate, process, and archive macroscopic Internet measurement data to support cybersecurity research and development activities
- manage, maintain, and share CAIDA data with vetted security researchers
- continue to maintain and distribute previously collected data of interest to researchers
- generate new data sets that reflect immediate threats, vulnerabilities, and hazards to critical infrastructures
- document and distribute traces from a large-ISP backbone link
- index new CAIDA data into the IMPACT catalog as they become available
- work with IMPACT portal developers to optimize the portal utility, convenience, and overall user experience
- advance the policy community's understanding of Internet research, its successes and challenges, and related data needs
- inform the current legal landscape in data collection and sharing, privacy protection mechanisms, and guidelines for addressing ethical issues in network and security research
- promote educational use of Internet data in undergraduate- and graduate-level classes
|Funded Project||Funding Agency||Period of Performance||Measurement and Infrastructure Projects||Research and Analysis|
|(NSF OAC-1724853) PANDA - Integrated Platform for Applied Network Data Analysis||National Science Foundation (NSF)||2017 - 2022||PANDA, Ark, MANIC, Spoofer||-|
|(pending) ASSISTS - Advancing Scientific Study of Internet Security and Topological Stability||Department of Homeland Security (DHS)||2017 - 2019||HI-CUBE, Passive Monitors, Privacy-Sensitive Data Sharing||-|
|(NSF CNS-1513283) iLENS-NP - Internet Laboratory for Empirical Network Science: Next Phase||National Science Foundation (NSF)||2015 - 2019||Ark, MANIC, IODA-NP||Mapping the Internet, Security and Stability|
|(NSF CNS-1730661) STARDUST - Sustainable Tools for Analysis and Research on Darknet Unsolicited Traffic||National Science Foundation (NSF)||2017 - 2020||Telescope, IODA-NP||Security and Stability|
|(DHS S&T contract HHSP 233201600012C) SISTER - Science of Internet Security: Technology and Experimental Research||Department of Homeland Security (DHS)||2016 - 2018||Ark||Mapping the Internet, Security and Stability|
|(DHS S&T contract D15PC00188) Spoofer - Software Systems for Surveying Spoofing Susceptibility||Department of Homeland Security (DHS)||2015 - 2018||Spoofer||Security and Stability|
|(NSF CNS-1705024) Mapkit - Investigating the Susceptibility of the Internet Topology to Country-level Connectivity Disruption and Manipulation||National Science Foundation (NSF)||2017 - 2020||-||Mapping the Internet, Security and Stability|
|(NSF CNS-1414177) Congestion - Mapping Interconnection in the Internet: Colocation, Connectivity and Congestion||National Science Foundation (NSF)||2014 - 2018||MANIC||Mapping the Internet, Internet Economics|
|(NSF CNS-1528148) Modeling IPv6 Adoption - A Measurement-driven Computational Approach||National Science Foundation (NSF)||2015 - 2018||-||Internet Economics|
|(NSF CNS-1513847) Economics of Contractual Arrangements for Internet Interconnections||National Science Foundation (NSF)||2015 - 2019||-||Internet Economics|
|(pending) NTT-QUINCE - A reactive crowdsourcing-based QoE monitoring platform||NTT||2018||-||Internet Economics|
|(AT&T 20161289) AT&T Interconnection - Measuring Internet Interconnection Performance Metrics||AT&T||2015 - 2019||-||Internet Economics|
|(NSF CNS-1423659) HIJACKS - Detecting and Characterizing Internet Traffic Interception Based on BGP Hijacking||National Science Foundation (NSF)||2014 - 2019||-||Security and Stability|
|(pending) CIRI - Critical Infrastructure Resilience Institute: Quantifying Interdependencies of the Logical/Physical Internet topologies||Department of Homeland Security (DHS)||2017 - 2018||-||Security and Stability|
|(pending) PARIDINE - Predict, Assess Risk, Identify (and Mitigate) Disruptive Internet-scale Network Events||Department of Homeland Security (DHS)||2018 - 2020||Ark, Telescope, IODA-NP||Security and Stability|
|(pending) Mapping DNS - Mapping DNS DDoS Vulnerabilities to Improve Protection and Prevention||Department of Homeland Security (DHS)||2018 - 2022||-||Security and Stability|
As of April 2018, CAIDA employs 15 researchers and support staff based at SDSC; 1 remotely based staff; and 3 postdoctoral researchers. We regularly involve UCSD undergraduate students in our research, and we provide summer and/or longer term internships to graduate students and young scientists from all over the world.
Our primary sources of support are competitively awarded grants and contracts from the National Science Foundation and the Science and Technology Directorate of the Department of Homeland Security. In addition, CAIDA could not survive without the generosity of its affiliates, members, and sponsors. The following organizations have made designated gifts or provided in-kind support to CAIDA, enabling us to maximize use of research dollars: Comcast, NTT, and the Internet Society.
For further information about our Program Plan, please send a message to info at caida dot org.