CAIDA's Annual Report for 2018

A report on CAIDA research initiatives, project progress and results, data sets, tool development, publications, presentations, workshops, web site statistics, funding sources, and operating expenses for 2018.

Mission Statement: CAIDA investigates practical and theoretical aspects of the Internet, focusing on activities that:

  • provide insight into the macroscopic function of Internet infrastructure, behavior, usage, and evolution,
  • foster a collaborative environment in which data can be acquired, analyzed, and (as appropriate) shared,
  • improve the integrity of the field of Internet science,
  • inform science, technology, and communications public policies.

Executive Summary

This annual report summarizes CAIDA's activities for 2018 in the areas of research, infrastructure, data collection and analysis. Our research projects span Internet cartography, security and stability studies (of outages, performance, and vulnerabilities), economics, and policy. Our infrastructure, software development, and data sharing activities support measurement-based Internet research, both at CAIDA and around the world, with a focus on the health and integrity of the global Internet ecosystem.

Internet Mapping and Performance Measurement. Most notably, we completed our NSF-funded study of interconnection congestion, which required maintaining significant software, hardware, and data processing infrastructure for years to observe, calibrate and analyze trends. We also undertook several research efforts in how to identify and characterize different types of congestion and effects on quality of experience. Our original motivation for this work was an increase in heated peering disputes between powerful players in the U.S., which raised questions about intentional degradation of performance as a business strategy to obtain (or avoid) interconnection fees. The prevalence of these public disputes dropped around the time of the FCC's 2015 Open Internet Order, in which the FCC asserted authority over interconnection, sending a signal to industry to resolve disputes or trigger regulatory oversight. However, our measurements reveal indications of persistently congested transit links, which, regardless of cause, imply clear motivation for large players to engage in direct peering negotiations. The most important contribution of this work was addressing this decades-long gap in a third party's ability to study peering disputes in an open, objective, scientifically validated way. Especially in today's deregulatory political climate, we consider such measurement to be the most promising strategy for incentivizing good ISP behavior. Other Internet cartography studies we undertook included: extending our ability to identify interconnection boundaries; revealing the load-balancing behavior of YouTube traffic on interdomain links; tracking the topological evolution of content providers in the Internet core; analyzing the African web ecosystem; and inferring carrier-grade NAT deployment without access to a vantage point behind the NAT.

Monitoring Global Internet Security and Stability. Our activities in Internet security and stability monitoring included: surveying network operators on BGP prefix hijacking and developing approaches to quickly neutralize this threat; characterizing the Denial-of-Service ecosystems, and attempts to mitigate DoS attacks via BGP blackholing; and devising metrics to infer the influence of specific Autonomous Systems (ASes) on country-level Internet connectivity. We also continued support for the Spoofer project, including supporting the existing Spoofer measurement platform as well as developing and applying new methods to expand visibility of compliance with source address validation best practices.

Economics and Policy. We published a study on the policy implications of our interconnection measurements, which included attempts to visualize the data in ways we considered most informative to policymakers. We published a study of a game-theoretic approach to interconnection modeling. Finally, we held another lively workshop on Internet economics, where we continued the discussion on what a future Internet regulatory framework should look like. The likelihood of federal regulation is increasing, if only to mitigate the risk of dealing with a patchwork of state laws related to network management or piracy. There is an expanding awareness that if policymakers hope to rely on academic or scientific research to inform policy, there will need to be increased accuracy and disclosure of data relevant to a given question. As the ecosystem evolves, required measurements and reporting could span metrics such as security incidents; outages; broadband availability, cost, and pricing; cloud computing capacity and traffic; consumer usage patterns; and how various parties in the ecosystem use consumer data. Policymakers and academics must tie the need for these measurements to concrete harms that they would support monitoring. There is also an increasing need to identify sustainable sources of funding for independent, open, trusted measurement of the Internet, and its communication to users and policymakers.

Infrastructure Operations. We continued to operate active and passive measurement infrastructure to provide visibility into global Internet behavior, and associated software tools that facilitate network research and security vulnerability analysis for the community. We made progress on our new project to integrate and increase the accessibility of several of our data collection platforms, starting with improving AS Rank and MANIC (Measurement and Analysis of Interdomain Congestion), and creating APIs for these and other platforms. We also maintained data analytics platforms for Internet Outage Detection and Analysis (IODA) and BGP data analytics (BGPStream).

As always, we engaged in a variety of tool development and outreach activities, including maintaining web sites and publishing 15 peer-reviewed papers, 1 technical report, 2 workshop reports, 30 presentations, and 9 blog entries. This report summarizes the status of our activities; details about our research are available in papers, presentations, and interactive resources on our web sites. We provide listings and links to software tools and data sets shared, and statistics reflecting their usage. Finally, we offer a "CAIDA in numbers" section: statistics on our performance, financial reporting, and supporting resources, including visiting scholars and students, and all funding sources.

CAIDA's program plan for 2018-2023 is available at www.caida.org/about/progplan/progplan2018/. Please feel free to send comments or questions to info at caida dot org.


Research and Analysis


Internet Mapping and Performance Measurement

We continued to improve and refine our state-of-the-art topology measurement and analytic techniques aimed at characterizing various aspects of critical Internet infrastructure. Our research advances included: identifying borders between ASes, studying interdomain congestion and load-balancing, tracking the evolution of the Internet core, analyzing specifics of a regional (African) web ecosystem, and designing a system to improve traffic delivery performance by optimizing the rich interconnection opportunities at IXPs.

Inferring Persistent Interdomain Congestion Figure. System for interdomain link discovery, active measurements, and congestion inference. (Inferring Persistent Interdomain Congestion, SIGCOMM).

Inferring Persistent Interdomain Congestion. We provided empirical grounding for discussions of interdomain congestion by developing a system and method to measure congestion on thousands of interdomain links without direct access to them. We implemented a system based on the Time Series Latency Probes (TSLP) technique that identifies links with evidence of recurring significant congestion suggestive of an under-provisioned link. We deployed our system at 86 vantage points worldwide and showed that congestion inferred using our lightweight TSLP method correlated with other metrics of interconnection performance impairment. We used our method to study interdomain links of eight large U.S. broadband access providers from March 2016 to December 2017, and validated our inferences against ground-truth traffic statistics from two of the providers. Our paper describing limitations, open challenges, and a path toward the use of this method for large-scale third-party monitoring of the Internet interconnection ecosystem received the Best Paper Award at SIGCOMM in August 2018 (Inferring Persistent Interdomain Congestion). This publication completed our NSF-funded project Mapping Interconnection in the Internet: Colocation, Connectivity and Congestion, in collaboration with MIT Computer Science and Artificial Intelligence Laboratory -- MIT/CSAIL.
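The intuition behind TSLP can be illustrated with a minimal Python sketch. It assumes we already hold one minimum RTT per time interval to the routers on the near and far side of a link, and it flags intervals in which far-side latency rises above its baseline while near-side latency does not. The function name, the median baseline, and the 10 ms threshold are assumptions made for this illustration; they are not the parameters or the inference procedure of the published system.

```python
from statistics import median


def congested_intervals(near_rtt_ms, far_rtt_ms, threshold_ms=10.0):
    """Return indices of intervals where the far side of a link shows
    elevated latency while the near side stays at its baseline.

    threshold_ms is a hypothetical cut-off chosen for this example.
    """
    if len(near_rtt_ms) != len(far_rtt_ms):
        raise ValueError("both series must cover the same intervals")
    near_base = median(near_rtt_ms)
    far_base = median(far_rtt_ms)
    flagged = []
    for i, (near, far) in enumerate(zip(near_rtt_ms, far_rtt_ms)):
        far_elevated = far - far_base > threshold_ms
        near_flat = near - near_base <= threshold_ms
        if far_elevated and near_flat:
            flagged.append(i)
    return flagged


# Toy series (milliseconds): the far side shows a latency plateau in
# intervals 3-5 that the near side does not share.
near = [5, 5, 6, 5, 5, 6, 5, 5]
far = [12, 12, 13, 40, 45, 42, 12, 13]
print(congested_intervals(near, far))
```

Requiring the near side to stay flat is what localizes the queueing to the probed link rather than to some earlier hop on the path.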

Our original motivation for this work was an increase in heated peering disputes between powerful players in the U.S., which raised questions about intentional degradation of performance as a business strategy to obtain (or avoid) interconnection fees. The prevalence of these public disputes dropped around the time of the FCC's 2015 Open Internet Order, in which the FCC asserted authority over interconnection, sending a signal to industry to resolve disputes or trigger regulatory oversight. However, our measurements reveal indications of persistently congested transit links, which, regardless of cause, imply clear motivation for large players to engage in direct peering negotiations.

The FCC recognized that it lacked sufficient understanding of interconnection to impose any regulations. In part to close this gap in understanding, during the next merger between an access and content provider (AT&T and DirecTV), the FCC imposed interconnection measurement and reporting conditions, for 4 years, under non-disclosure agreements. Like other sources of interconnection data, this data tells a partial story, but in this case, a secret one. Thus, the most important contribution of this work was addressing this decades-long gap in a third party's ability to study peering disputes in an open, objective, scientifically validated way. Especially in today's deregulatory political climate, we consider such measurement to be the most promising strategy for incentivizing good ISP behavior.

Pushing the Boundaries with bdrmapIT Figure. bdrmapIT's three phases: Constructing the Graph, Annotating Last Hops, and Annotating IRs and Interfaces. (Pushing the Boundaries with bdrmapIT: Mapping Router Ownership at Internet Scale, IMC).

Pushing the Boundaries with bdrmapIT: Mapping Router Ownership at Internet Scale. Two complementary approaches to mapping network boundaries from traceroute paths recently emerged: CAIDA's bdrmap and University of Pennsylvania's MAP-IT. Both approaches apply heuristics to inform inferences extracted from traceroute measurement campaigns. bdrmap used targeted traceroutes from a specific network, alias resolution probing techniques, and AS relationship inferences, to infer the boundaries between that specific network and directly connected networks. MAP-IT tackled the ambitious challenge of inferring all AS-level network boundaries in a massive archived collection of traceroutes launched from many different networks. We explored the potential to combine the approaches and developed bdrmapIT, which yielded a more complete, accurate, and general solution to this persistent and central challenge of Internet topology research. bdrmapIT achieved 91.8%-98.8% accuracy when mapping AS boundaries in two Internet-wide traceroute datasets, vastly improving on MAP-IT's coverage without sacrificing bdrmap's ability to map a single network (Pushing the Boundaries with bdrmapIT: Mapping Router Ownership at Internet Scale, IMC). The bdrmapIT source code is available from CAIDA's GitHub repository.

Studying the Evolution of Content Providers in the Internet Core. Recent evidence indicates that the core of the Internet, which was formerly dominated by large transit providers, has been reshaped by the transition to a multimedia-oriented network, first by general-purpose CDNs and now by private CDNs. We used k-cores, an element of graph theory, to define which ASes composed the core of the Internet and to track the evolution of the core since 1999. Specifically, we investigated whether large players in the Internet content and CDN ecosystem belonged to the core and, if so, since when. We also investigated regional differences in the evolution of large content providers. We showed that the core of the Internet had incorporated an increasing number of content ASes in recent years. To enable reproducibility of this work, our collaborators provided a website to allow interactive analysis of our datasets. (Studying the Evolution of Content Providers in the Internet Core, TMA)
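For readers unfamiliar with k-cores, the following Python sketch shows the standard peeling procedure on a toy AS-level graph. A node's core number is the largest k for which it belongs to a subgraph where every node has at least k neighbors. The AS numbers come from the range reserved for documentation and the topology is invented; this illustrates the graph-theoretic concept only and is not the analysis pipeline used in the study.

```python
def core_numbers(adjacency):
    """Compute the core number of every node in an undirected graph.

    adjacency maps each node to the set of its neighbours. Nodes are
    removed in order of lowest remaining degree (the usual peeling
    algorithm); the running maximum of those degrees is the core number.
    """
    degree = {node: len(neigh) for node, neigh in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        node = min(remaining, key=lambda n: degree[n])
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neigh in adjacency[node]:
            if neigh in remaining:
                degree[neigh] -= 1
    return core


# Hypothetical topology: four fully meshed ASes, one AS attached to two
# of them, and one stub AS hanging off that AS.
toy_graph = {
    64496: {64497, 64498, 64499, 64500},
    64497: {64496, 64498, 64499, 64500},
    64498: {64496, 64497, 64499},
    64499: {64496, 64497, 64498},
    64500: {64496, 64497, 64501},
    64501: {64500},
}
cores = core_numbers(toy_graph)
innermost = sorted(asn for asn, k in cores.items() if k == max(cores.values()))
print(cores)
print(innermost)
```

In this toy graph the four meshed ASes form the innermost core; tracking which ASes enter that set over successive snapshots is the kind of question the study asks of real topology data.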

Revealing the Load-balancing Behavior of YouTube Traffic on Interdomain Links. For the last decade, YouTube has consistently been a dominant source of traffic on the Internet. To improve the quality of experience (QoE) for YouTube users, broadband access providers and Google apply techniques to load balance the extraordinary volume of video requests and traffic. We used traceroute-based measurement methods to infer these load-balancing techniques for assigning YouTube requests to specific Google video content caches, including the interconnection links between access providers and Google. We then used a year of measurement data (mid-2016 to mid-2017) collected from SamKnows probes hosted by broadband customers spanning a major ISP in the U.S. and three ISPs in Europe. We investigated two possible causes of different interdomain link usage behavior. We also compared the YouTube video cache hostnames and IPs observed by the probes, and found that the selection of video cache had little impact on BGP selection of interdomain links. (Revealing the Load-balancing Behavior of YouTube Traffic on Interdomain Links, PAM)

Exploring and Analyzing the African Web Ecosystem. We measured the availability and utilization of web infrastructure in Africa, finding that much popular web content was still served from the U.S. and Europe. We discovered a lack of peering between networks hosting our measurement vantage points, preventing the sharing of CDN servers, as well as poorly configured DNS resolvers. Finally, our mapping of middleboxes in the region revealed that there was a greater presence of transparent proxies in Africa than in Europe or the U.S. (Exploring and Analysing the African Web Ecosystem, TWEB) Our new postdoc Roderick Fanou also completed publication of his graduate research, including a description of the African Routing Data Analyzer (ARDA) system he built and deployed to study interconnection in Africa. He showed how such a measurement system can help assess interconnection opportunities, policies, and impacts on traffic localization efforts. (A System for Profiling the IXPs in a Region and Monitoring their Growth: Spotlight at the Internet Frontier, IJNM)

Inferring Carrier-Grade NAT Deployment in the Wild. Given the increasing scarcity of IPv4 addresses, network operators are resorting to measures to expand their address pool or prolong the life of existing addresses. One such approach is Carrier-Grade NAT (CGN), where many end users in a network share a single public IPv4 address. The data about the prevalence of CGN is limited, despite the implications for performance, security, and ultimately, the adoption of IPv6. We used passive measurement-based techniques for detecting CGN deployments across the entire Internet, without the requirement of access to the machine behind the CGN. We identified patterns in how client IP addresses were observed at MLab servers and at the UCSD Network Telescope to infer whether those clients were behind a CGN. We found that CGN deployment increased rapidly from 2014 to 2016, with six times as many ASes using CGN as inferred by recent studies. (Inferring Carrier-Grade NAT Deployment in the Wild, IEEE INFOCOM)

Workshop on Active Internet Measurement Systems (AIMS). In March, CAIDA hosted our annual Workshop on Active Internet Measurement Systems (AIMS) at the UC San Diego Supercomputer Center. This workshop series promotes discussion between academics, industry, policymakers, and funding agencies on active Internet measurement, and enables the exchange of research ideas and questions that have been answered, or could be answered, with existing and future measurement infrastructures. An overarching theme this year was how to inform new legislation of communications policy in the U.S.: what data is or could be measured to shape and support current and emerging policy debates. (Workshop on Active Internet Measurements Report, CCR)


Monitoring Global Internet Security and Stability

ARTEMIS survey Figure. Survey showing results from network operators: (a) ranking of characteristics of a hijacking defense system based on their importance, (b) practices for detecting/learning about hijacking incidents against owned prefixes. (ARTEMIS: Neutralizing BGP Hijacking within a Minute, ToN).

Our activities in Internet security and stability monitoring included: surveying network operators on BGP prefix hijacking and developing approaches to quickly neutralize this threat; studying effects of BGP blackholing during DoS attacks and using passive measurement data to characterize collateral damage due to such blackholing; and devising metrics to infer the influence of specific Autonomous Systems (ASes) on country-level Internet connectivity. We also continued supporting the Spoofer measurement platform that helps assess network compliance with source address validation best practices worldwide.

A Survey among Network Operators on BGP Prefix Hijacking. Several mechanisms or modifications to BGP to protect the Internet against BGP prefix hijacking have been proposed. However, the reality is that most operators have not deployed them and are reluctant to do so. Instead, they rely on basic and often inefficient proactive defenses to reduce the impact of hijacking events, or on detection based on third-party services and reactive approaches that might take several hours to mitigate the attack. We surveyed 75 network operators to study: (a) operators' awareness of BGP prefix hijacking attacks, (b) presently used defenses (if any) against such hijacking, (c) willingness to adopt new defense mechanisms, and (d) reasons that hinder the deployment of BGP hijacking defenses. The findings of this survey increase the understanding of existing BGP hijacking defenses and the needs of network operators, as well as contribute toward designing new defense mechanisms that can satisfy the requirements of the operators -- such as ARTEMIS, described below (A Survey among Network Operators on BGP Prefix Hijacking, CCR).

ARTEMIS: Neutralizing BGP Hijacking within a Minute. Existing BGP hijacking defense approaches, ranging from RPKI to popular third-party services, suffer from lack of detection comprehensiveness, limited accuracy, significant delays (up to days) with verification and mitigation of incidents, and lack of privacy and flexibility in post-hijack counteractions. In collaboration with FORTH/University of Crete, we developed ARTEMIS (Automatic and Real-Time dEtection and MItigation System), a defense approach based on accurate and fast detection operated by the AS itself, leveraging the pervasiveness of publicly available BGP monitoring services and their recent shift towards real-time streaming. Compared to previous work, our approach combines characteristics desirable to network operators enabling flexible and fast mitigation of hijacking events. We showed through real-world experiments that ARTEMIS enables an operator to neutralize a prefix hijacking attack within a minute. (ARTEMIS: Neutralizing BGP Hijacking within a Minute, IEEE/ACM ToN).

HIJACKS poster Figure. On detecting and characterizing Internet traffic interception based on BGP Hijacking. (HIJACKS Poster).
MADDVIPR project Figure. Our poster explaining our new project to map DNS DDoS vulnerabilities. (MADDVIPR poster).

On the Potential of BGP Flowspec for DDoS Mitigation at Two Sources: ISP and IXP. The blackholing approach to mitigating a DoS attack uses BGP tags to mark the (usually /32) IP prefix under attack, enabling adjacent peers to prevent overload by discarding all traffic to the victim. The IETF's BGP Flowspec standard supports more precise filtering rules for 12 different components, e.g., source and destination address, TCP flags. In collaboration with researchers from the Freie Universität Berlin, University of Twente, and HAW Hamburg, we studied DDoS traffic from an interdomain perspective, using passive measurements from a national Internet Service Provider and from a large regional Internet Exchange Point. We characterized the collateral damage that occurs while blackholing DDoS traffic, and quantified the benefits of deploying Flowspec, especially at an IXP. (On the Potential of BGP Flowspec for DDoS Mitigation at Two Sources: ISP and IXP, SIGCOMM Poster)
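The difference in collateral damage between the two approaches can be sketched in a few lines of Python. The example assumes a hypothetical amplification attack arriving from UDP source port 123 at a victim address drawn from the documentation ranges; the Packet class and both functions are illustrative helpers, not part of any BGP implementation, and the rule matches only three of the components that Flowspec supports.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    protocol: str  # "tcp" or "udp"
    src_port: int


def blackhole_drops(packet, victim_prefix):
    """Blackholing discards everything destined to the victim prefix."""
    return ip_address(packet.dst) in ip_network(victim_prefix)


def flowspec_drops(packet, victim_prefix, protocol, src_port):
    """A Flowspec-style rule also matches on protocol and source port,
    so only traffic fitting the attack signature is discarded."""
    return (
        blackhole_drops(packet, victim_prefix)
        and packet.protocol == protocol
        and packet.src_port == src_port
    )


VICTIM = "203.0.113.10/32"
traffic = [
    Packet("198.51.100.1", "203.0.113.10", "udp", 123),  # attack traffic
    Packet("198.51.100.2", "203.0.113.10", "udp", 123),  # attack traffic
    Packet("192.0.2.50", "203.0.113.10", "tcp", 51514),  # legitimate client
]

blackholed = [p for p in traffic if blackhole_drops(p, VICTIM)]
filtered = [p for p in traffic if flowspec_drops(p, VICTIM, "udp", 123)]
print(len(blackholed), len(filtered))
```

Here the blackhole discards the legitimate TCP flow along with the attack, while the narrower rule leaves it untouched; that lost legitimate traffic is the collateral damage the study set out to quantify.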

A First Joint Look at DoS Attacks and BGP Blackholing in the Wild. We analyzed two complementary sources of data on DoS attacks spanning from March 2015 to March 2018, to provide a longitudinal characterization of operational deployment of blackholing during DoS attacks. We found that BGP blackholing defense mechanisms can react extremely fast and appeared highly effective at protecting the targeted network. However, some blackholing events last far longer than the duration of the related attack, unnecessarily impacting the services and systems involved. (A First Joint Look at DoS Attacks and BGP Blackholing in the Wild, IMC)

Investigating the Susceptibility of the Internet Topology to Country-level Connectivity Disruption and Manipulation. At the end of 2017, we began a project focused on inferring the influence of specific Autonomous Systems (ASes) on country-level Internet connectivity. We are developing a method to identify the set of ASes with presence in a country or region, and to estimate their influence in providing connectivity to the country or region. We are considering three candidate metrics of influence: betweenness centrality, AS hegemony, and our new Aggregate Transit Influence (ATI) metric. We devised an experimental methodology to compare these metrics.

Mapping DNS DDoS vulnerabilities to improve protection and prevention (MADDVIPR). With researchers from the University of Twente, Netherlands, we launched (in December) a new project that will try to comprehensively characterize DDoS attacks targeting the DNS, and vulnerabilities that impede resilience of the DNS in the face of such DDoS attacks. An overview poster of the MADDVIPR project further explains the goals of the project and the proposed analysis system.


Economics and Policy

CAIDA researchers also study economic and policy aspects of the Internet. This year we analyzed the policy implications of our persistent interdomain congestion research, proposed a new techno-economic interconnection framework to address the root causes of peering disputes, and hosted our annual WIE workshop to continue structured conversation about measurements to inform public policy debates.

Policy Implications of Third-Party Measurement of Interdomain Congestion on the Internet Figure. Time series of TSLP latency and packet loss percentage for an interdomain link between Verizon and Google. Periods inferred as congested are shaded in gray. (Policy Implications of Third-Party Measurement of Interdomain Congestion on the Internet, TPRC).

Policy Implications of Third-Party Measurement of Interdomain Congestion on the Internet. We developed new techniques for visualizing our interdomain congestion data in ways we believe are conducive to policy analysis, e.g., of infrastructure resilience, performance metrics, and potential consumer harm from persistently under-provisioned interconnection links. We showed how congestion varies over time and by access provider, and described policy-relevant limitations and implications of this work. (Policy Implications of Third-Party Measurement of Interdomain Congestion on the Internet, TPRC)

Nash-Peering: A New Techno-Economic Framework for Internet Interconnections. The current framework of Internet interconnections, based on transit and settlement-free peering relations, has systemic problems that often cause peering disputes and impair performance. We proposed a new game-theoretic interconnection framework based on Nash Bargaining, where payment is not necessarily determined by traffic flow or rigid customer-provider relationships but based on which AS benefits more from the interconnection. (Nash-Peering: A New Techno-Economic Framework for Internet Interconnections, IEEE Global Internet Symposium)
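To make the bargaining idea concrete, the sketch below uses the textbook symmetric Nash bargaining solution with transferable utility. It assumes each AS can put a monetary value on the interconnection and that both receive zero if no agreement is reached. Maximizing (benefit_a - p) * (benefit_b + p) over the payment p from A to B gives p = (benefit_a - benefit_b) / 2, so the AS that benefits more pays the other regardless of traffic direction. The function name and these simplifying assumptions are ours, for illustration; the model in the paper may differ.

```python
def nash_payment(benefit_a, benefit_b):
    """Payment from AS A to AS B under symmetric Nash bargaining.

    Assumes transferable utility and zero payoff for both ASes when no
    agreement is reached (simplifications made for this example).
    Returns None when the joint surplus is not positive, since no
    payment can then leave both ASes better off. A negative result
    means B pays A.
    """
    if benefit_a + benefit_b <= 0:
        return None
    return (benefit_a - benefit_b) / 2


# A values the link at 10 units and B at 4, so A pays B 3 units and
# each AS ends up with a net gain of 7.
payment = nash_payment(10, 4)
print(payment, 10 - payment, 4 + payment)
```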

Workshop on Internet Economics (WIE). In December, CAIDA hosted the 9th interdisciplinary Workshop on Internet Economics (WIE) at the UC San Diego Supercomputer Center. To try to add clarity to a range of vigorous policy debates, and in pursuit of specific, actionable objectives, for this year's meeting the organizers used a slightly different approach to structuring the agenda. Each attendee chose a specific policy goal or harm, and structured their presentation to describe what data is needed to measure progress toward/away from this goal/harm, what methods could gather such data, and how such data should be managed and shared. Topics discussed included: analyzing the evolution of the Internet in a layered-platform context to gain new insights; measurement and analysis of economic impacts of new technologies using old tools; security and trustworthiness; reach (universal service) and reachability; and sustainability of investment into Internet infrastructure, as well as infrastructure to measure the Internet. The workshop report highlights the discussions and presents relevant open research questions identified by participants. (Workshop on Internet Economics Final Report, CCR 2019)


Measurement Infrastructure and Data Sharing Projects


Platform for Applied Network Data Analysis (PANDA)

PANDA infrastructure overview Figure. Proposed architecture of the PANDA infrastructure.

For more than 20 years, CAIDA has developed many data-focused services, products, tools and resources to advance the study of the Internet. We have also spent years cultivating relationships across disciplines (networking, security, economics, law, policy) with those interested in CAIDA data, but the impact thus far has been limited to a handful of researchers. The current mode of collaboration simply does not scale to the exploding interest in scientific study of the Internet. To address this gap, we are integrating a number of existing measurement and analysis components previously developed by CAIDA into a new Platform for Applied Network Data Analysis (PANDA). Our goal is to enable new scientific directions, experiments and data products for a wide set of researchers from four targeted disciplines: networking, security, economics, and public policy. The platform will employ efficient indexing and processing of terabyte archives, provide advanced visualization tools to show geographic and economic aspects of Internet structure, and support careful interpretation of displayed results. Our initial development tasks focused on improving existing components to optimize performance and building APIs to access the data. In 2018 we re-architected our AS Rank service, upgrading to a new backend database, using a web application framework and development environment. We built similar capabilities for web-based API access to the data resulting from our MANIC (Measurement and Analysis of Interdomain Congestion) system, enabling researchers to explore and analyze the data. (announcement on CAIDA's blog.)


IODA platform

A high-level view of the architecture of IODA Figure. A high-level view of the architecture of IODA.

Our approach to detecting macroscopic Internet-edge outage events uses three data sources: Internet Background Radiation (IBR -- one-way unsolicited traffic generated by millions of Internet hosts worldwide), Border Gateway Protocol (BGP) update messages (used to exchange reachability information between Internet Service Providers), and active probing results that reveal the reachability of end-hosts. Fusing event signals extracted from these data sources increases IODA's overall accuracy and coverage. By analyzing how an event manifested itself across various data sources, we can investigate its potential underlying cause(s). The prototype IODA platform runs 24/7, with interactive dashboards accessible at ioda.caida.org. We deployed a beta version of the IODA HTTP API that allows users to query for subsets of outage alerts, e.g., those affecting a specific country or Autonomous System, or those generated during specific periods. To promote use of IODA by the security and political science communities, we attended the Citizen Lab Summer Institute, and created a Twitter account for IODA.
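The value of fusing several signals can be illustrated with a small Python sketch. It flags a drop in a single time series when the latest value falls below a fraction of the recent median, and raises an alert only when at least two of the three data sources agree. The window, the 50% fraction, the two-source rule, and all names are assumptions made for this example; this is not IODA's actual detection logic.

```python
from statistics import median


def drop_detected(series, window=5, fraction=0.5):
    """True if the latest value is below `fraction` of the median of
    the `window` values preceding it (hypothetical thresholds)."""
    if len(series) < window + 1:
        return False
    baseline = median(series[-(window + 1):-1])
    return series[-1] < fraction * baseline


def fused_alert(signals, min_sources=2):
    """Return the sorted names of sources showing a drop, or an empty
    list if fewer than `min_sources` of them agree."""
    dropped = sorted(name for name, s in signals.items() if drop_detected(s))
    return dropped if len(dropped) >= min_sources else []


# Toy per-country signals: visible BGP prefixes, IBR source counts, and
# the share of hosts responding to active probes.
signals = {
    "bgp": [100, 100, 101, 99, 100, 40],
    "ibr": [50, 52, 49, 51, 50, 10],
    "active": [80, 81, 79, 80, 80, 78],
}
print(fused_alert(signals))
```

In this toy case the BGP and IBR series both collapse while active probing barely moves, a pattern an analyst might then examine further to reason about the underlying cause.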


Archipelago (Ark), Vela, Henya

CAIDA has conducted measurements of the Internet topology since 1998. Our current measurement tool scamper (deployed on Ark since 2007) tracks global IP-level connectivity by sending probe packets from a set of source monitors to millions of geographically distributed destinations across the IPv4 and IPv6 address space. We also use a subset of Ark monitors to conduct daily traceroutes to every announced BGP prefix, with each monitor probing the entire set of targets independently and completing exactly one pass of the target set every calendar day (aligned on UTC boundaries). In addition to supporting these continuous measurements, Ark provides a secure and stable platform for researchers to run their own vetted experiments including: acting as probe receivers for the Spoofer Project; exploring new IPv6 topology measurement methods with yarrp6; quality-of-experience (QoE) measurements of YouTube; inferring borders between ISPs; and detecting congestion on these interdomain links (Inferring Persistent Interdomain Congestion, SIGCOMM best paper).

In 2018, we deployed 35 vantage points (6 of which were replacements), adding 20 new nodes in the U.S. and nodes in Bhutan, Canada, China, Czech Republic, France, Ghana, Israel, Nigeria, Paraguay, Uruguay, and South Africa. Our Archipelago Monitor Locations web page shows all current hosting sites. We also continued to develop and improve Ark measurement capabilities: we completed the MIDAR web API, created a database to store the ITDK IP address aliases, and developed a web API and command-line tool for efficiently querying that database (aliasq). We also continued development of Vela, a prototype system for executing on-demand measurements on the Ark platform, and completed a Vela web API that enables third-party applications to trigger and run on-demand measurements on Ark nodes. This year, we received 30 requests for Vela accounts, of which 23 were granted. Example usage includes research on BGP hijack detection, situational awareness of ISP security indicators, and peering strategy analysis.

We also completed development of a prototype system Henya for querying and visualizing historical and ongoing traceroute datasets. Henya allows users to find traceroute paths that contain one or more specified targets, which can be IP addresses, IP prefixes, AS numbers, or countries. One can also query to find all IP prefixes announced by a given AS in BGP, or all prefixes announced by ASes that geolocate to that country. The Henya system also prototypes a web-based interactive visualization of round-trip time (RTT) measurements to targets over time.


UCSD Network Telescope

We maintain and continue developing the UCSD Network Telescope measurement infrastructure to enable the study of Internet phenomena by monitoring and analyzing unsolicited traffic arriving at a globally routed underutilized /8 network. We maximize the research utility of these data by enabling near-real-time data access to vetted researchers, which requires tackling the associated challenges in flexible storage, curation, and privacy-protected sharing of large volumes of data. In collaboration with University of Waikato, in 2018 we developed a new architecture and library (nDAG) that captures the telescope traffic from a DAG card and distributes it using IP multicast to interested applications. Our implementation is efficient enough to keep up with line rates of 10 Gbps. We continued developing our open source software framework Corsaro for capture, processing, management, analysis, visualization, and reporting of collected Telescope data. We are preparing a new version (v3.0) of Corsaro for release. It will include new meta-data tagging modules (related to spoofed traffic, erratic traffic components, IP geolocation, and AS lookup tagging) that will be faster and easier to maintain.


Spoofer

Despite forged source IP addresses (spoofing) being a known vulnerability for at least 25 years, and despite many efforts to remedy this problem, spoofing remains a viable exploit method enabling redirection, amplification, and anonymity in Distributed Denial-of-Service (DDoS) attacks. Fixing this problem requires operators to ensure their networks block packets with spoofed source IP addresses, a best current practice (BCP) known as source address validation (SAV - BCP38). We continue to develop and support an open-source client-server system, Spoofer, for testing deployment of SAV worldwide. When installed on a networked computer, the client periodically tests a network's ability to both send and receive spoofed packets, and sends results to the central server at CAIDA. An overview poster of the Spoofer project and an accompanying video describe spoofing and our approach to elucidating the problem. In response to feedback from operational security communities, we produce reports, remediation analyses, and informative visualizations that are used by operators, response teams, and policy analysts. We automatically generate monthly reports of ASes from which we received packets with a spoofed source address, and publish these reports to network and security operations lists to ensure this information reaches operational contacts in these ASes.

Spoof Percentage by Country Figure. A world map showing the percentage of IP blocks with evidence of spoofing, grouped by country, generated from data collected by the Spoofer project (Spoofer Results by Country).

Since March 2017, the Spoofer client also tests whether the network has appropriate ingress filtering in place; that is, whether the client is able to receive traffic from our server with spoofed source IP addresses in the same /30 (/120 for IPv6) subnet as the client. We also improved the client GUI across all supported platforms (Windows, Mac OSX, Ubuntu, OpenBSD) and released client 1.4.2, the first version featuring auto-update for the Spoofer client. This modification makes it easier for enterprises to add the Spoofer infrastructure to their network hygiene toolkit. We began work on a prototype API for programmatic access to the Spoofer data by external researchers. Our preliminary work focused on the needs of the Internet Society's Mutually Agreed Norms for Routing Security (MANRS) participants who have signed up to deploy SAV. Our code reports MANRS participants with unremediated spoofing in the past month.
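
As an illustration of what "the same /30 (/120 for IPv6) subnet as the client" covers, the following Python sketch enumerates the other addresses in that subnet. It is not Spoofer code: the function name is hypothetical, and which of these addresses the real server actually uses as spoofed sources is not stated here, so treating every other address in the subnet as a candidate is an assumption.

```python
import ipaddress

def same_subnet_sources(client_ip):
    """List the other addresses in the client's /30 (IPv4) or /120 (IPv6).

    Assumption: every address in the subnet except the client's own is a
    candidate spoofed source for the ingress-filtering test.
    """
    addr = ipaddress.ip_address(client_ip)
    prefix_len = 30 if addr.version == 4 else 120
    subnet = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
    return [str(a) for a in subnet if a != addr]

print(same_subnet_sources("192.0.2.9"))
# prints ['192.0.2.8', '192.0.2.10', '192.0.2.11']
print(len(same_subnet_sources("2001:db8::1")))  # prints 255
```

If the client receives a packet whose source is one of these neighbouring addresses but which actually came from outside the network, the network is not filtering inbound packets that claim an internal source.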


Data

In the interests of reproducibility of our own work and to facilitate expanded scientific analysis of the research topics pursued, we invest significant effort to ensure that data we gather or derive from various raw data sources is available to other researchers. We list all available data sets, including legacy ones, on our CAIDA Data Overview page, and twice a year email our data users with updates and important news. In 2018, we added six new datasets (described below), and by the end of the year were serving 67 unique datasets.

New Datasets

BGP Community Dictionary Dataset provides geographical information encoded in BGP Community attributes. It represents our best effort to extract meaningful geolocation information encoded by network operators into the Community attributes they set up for their networks.

Internet eXchange Points (IXPs) Dataset provides information about Internet eXchange Points (IXPs) and their geographic locations, facilities, prefixes, and member ASes, derived by combining information from PeeringDB, Hurricane Electric, Packet Clearing House (PCH), and GeoNames.

CYMRU Bogon Reference Dataset provides historic daily bogons and fullbogons lists compiled by Team Cymru. The most recent daily lists are available at Team Cymru portal. This dataset contains a historical archive of lists that CAIDA has downloaded daily since September 2013.

Randomly and Uniformly Spoofed Denial-of-Service (RSDoS) Attack Metadata contains aggregated meta-data of the randomly spoofed denial-of-service attacks inferred from backscatter packets collected by the UCSD Network Telescope between March 1, 2015 and February 28, 2017. Possible uses of this data include: studying and modeling DoS attacks and characterizing victim populations. It is a restricted dataset that can be requested through IMPACT.

The ITDK 2018-03 was added to our ongoing collection of Macroscopic Internet Topology Data Kits (ITDK) that started in 2010 and now includes 16 Kits. This ITDK utilizes traceroutes not only from our Archipelago measurement infrastructure but also some traceroutes from the RIPE Atlas Internet measurement platform.

US backbone bidirectional traffic data: In March 2018 we took the first monthly trace on our new 10 Gb link monitor in New York City. This monitor picks up where we left off in Chicago in March 2016, when the upgrade of that link to 100 Gb ended our collection there. All monthly 2018 traces (March-December) are available online.

The data supplement for our paper Revealing the Load-balancing Behavior of YouTube Traffic on Interdomain Links (PAM 2018) was added to the growing number of CAIDA's publicly available datasets.

Data Collection Statistics

The graphs below show the cumulative amount of data accrued over the last several years by our primary data collection infrastructures, Archipelago and the UCSD Network Telescope. We are currently collecting about 4 TB of uncompressed data per day (more than 95% of which is Telescope data). In 2018 CAIDA captured about 13 TB of uncompressed topology traceroute data, and about 1.4 PB of Internet background radiation (IBR) traffic data.

[Figure: Archipelago cumulative data capture] [Figure: UCSD Network Telescope capture]
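
The yearly totals and the daily rate quoted above can be cross-checked with a few lines of arithmetic. This Python sketch assumes decimal units (1 PB = 1000 TB) and a 365-day year, and it treats the two 2018 totals as the whole of the daily volume; all three are assumptions made for the check, not statements from the data.

```python
TB_PER_PB = 1000      # assumption: decimal units
DAYS_IN_YEAR = 365    # assumption: 2018 totals spread evenly over the year

telescope_tb = 1.4 * TB_PER_PB  # about 1.4 PB of IBR traffic data
topology_tb = 13                # about 13 TB of traceroute data

total_tb = telescope_tb + topology_tb
daily_tb = total_tb / DAYS_IN_YEAR
telescope_share = telescope_tb / total_tb

print(f"average daily volume: {daily_tb:.2f} TB")    # 3.87 TB
print(f"telescope share: {telescope_share:.1%}")     # 99.1%
```

Under these assumptions the yearly totals agree with the stated figures of about 4 TB per day, more than 95% of it Telescope data.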


In 2018 we started collecting Anonymized Internet Passive traces on a 10 Gbps backbone link in New York City. (Our last Internet backbone traffic trace was captured in April 2016, at which point the link was upgraded to 100 Gbps, while our hardware could only handle 10 Gbps.) We secured access to a new 10 Gbps link at equinix-nyc and resumed our trace collection in March 2018. We are still pursuing resources to develop monitors that can operate on 100 Gbps links.

Data Distribution Statistics

There are two complementary ways that users can request access to CAIDA's data: through the CAIDA portal and through the Information Marketplace for Policy and Analysis of Cyber-risk and Trust (IMPACT) portal. Datasets shared through the CAIDA portal fall into two categories: public and by-request. Public datasets are available to users who agree to CAIDA's Acceptable Use Policy for public data. By-request datasets are available for use by academic researchers, US government agencies, and corporate entities who participate in CAIDA's membership program. Users provide a brief description of their intended use of the data, and agree to an Acceptable Use Policy.

Access to the CAIDA datasets through IMPACT is subject to the corresponding IMPACT Terms. These datasets are available for use by academic researchers, government agencies and corporate entities from DHS-Approved Locations (US, Canada, Australia, United Kingdom, Israel, Japan, the Netherlands, and Singapore).

The graphs below show the annual counts of unique visitors who downloaded CAIDA datasets (public, by-request, and IMPACT) and the total size of downloaded data. In 2018 we granted access to the CAIDA by-request and IMPACT datasets to more than 400 new users. Even though the number of users who downloaded Anonymized Internet traces and AS Relationships datasets increased significantly in 2018, the volume of downloaded data decreased to around 90 TB. This decrease might be explained by returning users downloading only the most recent Anonymized Internet traces. These statistics do not include dissemination of Near-Real-Time Telescope datasets (raw traffic traces in pcap format, aggregated flow data, and daily RSDoS attack metadata). Users can analyze these datasets only on CAIDA computers and are not allowed to download them. Currently, 14 days of the most recently collected raw telescope data are kept on disk. All (2008 - current) RSDoS metadata and Aggregated Flow data are stored in our cloud-based OpenStack Swift platform.

[Figure: total request counts statistics for data] [Figure: download statistics for CAIDA data]

Data Distribution Statistics: Number of unique users downloading CAIDA data and volume of data downloaded annually. Colors indicate different datasets. Multiple downloads of the same file by the same user, which are common, are counted only once.
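
The counting rule in the caption (repeat downloads of the same file by the same user count once) can be sketched as follows. This is an illustration only, not CAIDA's accounting code: the log layout, names, and sizes are hypothetical, and applying the same deduplication to the volume figure is an assumption.

```python
from collections import defaultdict

def summarize_downloads(log):
    """Return {dataset: (unique_users, bytes)} with repeats counted once.

    `log` is an iterable of (user, dataset, filename, size_bytes) tuples.
    A (user, dataset, filename) combination is counted the first time it
    appears and ignored afterwards.
    """
    seen = set()
    users = defaultdict(set)
    volume = defaultdict(int)
    for user, dataset, filename, size in log:
        key = (user, dataset, filename)
        if key in seen:  # same user fetched the same file again
            continue
        seen.add(key)
        users[dataset].add(user)
        volume[dataset] += size
    return {d: (len(users[d]), volume[d]) for d in users}

# Hypothetical download log.
log = [
    ("alice", "traces", "a.pcap", 100),
    ("alice", "traces", "a.pcap", 100),  # repeat, ignored
    ("bob", "traces", "a.pcap", 100),
    ("alice", "as-rel", "r.txt", 5),
]

print(summarize_downloads(log))
# prints {'traces': (2, 200), 'as-rel': (1, 5)}
```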

Publications using public and/or restricted CAIDA data (by non-CAIDA authors)

We know of a total of 51 publications in 2018 by non-CAIDA authors that used CAIDA data. (We update this count as we learn of new publications. Some papers used more than one dataset.) Please let us know if you know of a paper using CAIDA data that is not on our list: Non-CAIDA Publications using CAIDA Data.

[Figure: request statistics for restricted data] [Figure: Cumulative number of citations of non-CAIDA papers using CAIDA data]

Impact of CAIDA data sharing: (a) Annual number of non-CAIDA publications using CAIDA data; (b) Cumulative number of citations of non-CAIDA publications using CAIDA data. Between 2002 and 2018 more than 1500 non-CAIDA papers using CAIDA datasets were published. These publications were cited more than 30,000 times, including about 600 mentions in various patents.

Tools

CAIDA develops and maintains supporting tools for Internet data collection, analysis and visualization. In 2018, Matthew Luckie (U. Waikato, subcontracted with CAIDA) made three releases of scamper, improving the tool's efficiency and adding documentation (Latest 2018 release: 20181219). We made five releases of our Spoofer software (Latest 2018 release: 20181219). We released two updates to BGPStream (Latest 2018 release: 1.2.1). We updated MIDAR to version 0.7.1 and kapar to version 0.6. ARTEMIS version 1.0 was released in December 2018 as well (Latest 2018 release: 1.0.0).

The following chart and table list the currently supported tools developed by CAIDA and the number of external downloads of each (by unique IP address) during 2018.

[Figure: The number of times each tool was downloaded from the CAIDA web site in 2018.]
| Tool | Description | Downloads |
|---|---|---|
| arkutil | RubyGem containing utility classes used by the Archipelago measurement infrastructure and the MIDAR alias-resolution system. | 387 |
| Autofocus | Internet traffic reports and time-series graphs. | 248 |
| BGPStream | Open-source software framework for live and historical BGP data analysis, supporting scientific research, operational monitoring, and post-event analysis. | 1084 |
| Chart::Graph* | A Perl module that provides a programmatic interface to several popular graphing packages. | 124 |
| CoralReef | Measures and analyzes passive Internet traffic monitor data. | 315 |
| Corsaro | Extensible software suite designed for large-scale analysis of passive trace data captured by darknets, but generic enough to be used with any type of passive trace data. | 302 |
| Cuttlefish | Produces animated graphs showing diurnal and geographical patterns. | 150 |
| dbats | High-performance time series database engine. | 78 |
| dnsstat | DNS traffic measurement utility. | 201 |
| iatmon | Ruby+C+libtrace analysis module that separates one-way traffic into defined subsets. | 100 |
| iffinder | Discovers IP interfaces belonging to the same router. | 301 |
| kapar | Graph-based IP alias resolution. | 316 |
| libsea | Scalable graph file format and graph library. | 142 |
| libtimeseries | Provides a high-performance abstraction layer for efficiently writing to time series databases. | 90 |
| Marinda | A distributed tuple space implementation. | 216 |
| MIDAR | Monotonic ID-Based Alias Resolution tool that identifies IPv4 addresses belonging to the same router (aliases) and scales up to millions of nodes. | 301 |
| Motu | Dealiases pairs of IPv4 addresses. | 61 |
| mper | Probing engine for conducting network measurements with ICMP, UDP, and TCP probes. | 183 |
| otter | Visualizes arbitrary network data. | 425 |
| plot-latlong | Plots points on geographic maps. | 121 |
| plotpaths | Displays forward traceroute path data. | 89 |
| rb-mperio | RubyGem for writing network measurement scripts in Ruby that use the mper probing engine. | 364 |
| RouterToAsAssignment | Assigns each router from a router-level graph to its Autonomous System (AS). | 488 |
| rv2atoms (including straightenRV) | A tool to analyze and process a Route Views table and compute BGP policy atoms. | 28 |
| scamper | A tool to actively probe the Internet to analyze topology and performance. | 332 |
| sk_analysis_dump | A tool for analysis of traceroute-like topology data. | 78 |
| topostats | Computes various statistics on network topologies. | 143 |
| Walrus | Visualizes large graphs in three-dimensional space. | 464 |

* Note: Chart::Graph is also available on CPAN.org. The number shown is direct downloads from caida.org only (statistics from CPAN not available).

CAIDA 2018 in Numbers

In 2018, CAIDA published 15 peer-reviewed papers (see below), 1 technical report, and 2 workshop reports, made 30 presentations, and posted 9 blog entries. A complete list of presented materials is available on the CAIDA Presentations page. We also organized and hosted three workshops: AIMS 2018: Workshop on Active Internet Measurements, IMAPS 2018: Internet Measurement And Political Science Workshop, and WIE 2018: 9th Workshop on Internet Economics.

In 2018, our web site www.caida.org attracted 440,511 unique visitors, with an average of 1.9 visits per visitor, serving an average of 5.7 pages per visit.

During 2018, CAIDA employed 20 staff (researchers, programmers, data administrators, technical support staff), hosted 4 postdocs, 3 PhD students, 3 graduate students, 17 undergraduate students, and 1 high school volunteer.

We received $2.8M to support our research activities from the following sources:

[Figure: Allocations by funding source]
| Funding Source | Amount ($) | Percentage |
|---|---|---|
| NSF | $543,091 | 19% |
| DHS | $1,890,000 | 67% |
| Other | $199,913 | 7% |
| Gift & Members | $175,900 | 6% |
| Total | $2,808,904 | 100% |

Two views of historical funding allocations are shown below, presented by total amount received and by percentage based on funding source.


The charts and tables below show CAIDA expenses by type of operating expense and by program area:

[Figure: Operating Expenses]
| Expense Type | Amount ($) | Percentage |
|---|---|---|
| Labor | $2,536,608.66 | 59% |
| Indirect Costs (IDC) | $1,379,303.06 | 32% |
| Professional Development | $8,353.96 | <1% |
| Supplies & Expenses | $92,529.33 | 2% |
| Workshop & Visitor Support | $23,638.10 | <1% |
| CAIDA Travel | $69,773.97 | 2% |
| Subcontracts | $69,176.91 | 2% |
| Equipment | $144,631.87 | 3% |
| Total | $4,324,015.86 | 100% |

[Figure: Expenses by Program Area]
| Program Area | Amount ($) | Percentage |
|---|---|---|
| Economics & Policy | $493,040.88 | 11% |
| Future Internet | $107,021.29 | 2% |
| Mapping & Congestion | $144,299.29 | 3% |
| Infrastructure | $2,341,949.82 | 54% |
| Security & Stability | $1,167,982.70 | 27% |
| Outreach | $61,062.19 | 1% |
| CAIDA Internal Operations | $8,659.70 | <1% |
| Total | $4,324,015.86 | 100% |


Publications

(listed by primary topic area, but many cross multiple topics)

Supporting Resources

CAIDA's accomplishments are in large measure due to the high quality of our visiting students and collaborators. We are also fortunate to have financial and IT support from sponsors, members, collaborators, and monitor hosting sites.

UC San Diego Students

  • Alex Gamero-Garrido, PhD student at UC San Diego
  • Ojas Gupta, graduate student at UC San Diego
  • Gautam Akiwate, graduate student at UC San Diego
  • Chongyang Du, graduate student at UC San Diego
  • Unnikrishnan Sivaprasad, graduate student at UC San Diego

Visiting Scholars

  • Ricky Mok, postdoc from Hong Kong Polytechnic University
  • Danilo Cicalese, PhD student from Telecom ParisTech, France
  • Mattijs Jonker, PhD student from University of Twente, The Netherlands
  • Esteban Carisimo, graduate student from University of Buenos Aires, Argentina
  • Yves Vanubel, graduate student from Montefiore Institute, Belgium
  • Ahmed Adnan, graduate student from University of Iowa
  • Shinyoung Cho, PhD student from Stony Brook University
  • Raphael Hiesgen, PhD student from Hamburg University of Applied Sciences, Germany
  • Lucas Müller, PhD student from Federal University of Rio Grande do Sul (UFRGS), Brazil
  • Roderick Fanou, postdoc from IMDEA Networks Institute

Funding Sources
