Center for Applied Internet Data Analysis
Detection and analysis of large-scale Internet infrastructure outages
Sponsored by:
National Science Foundation (NSF)
We will apply successful results in analyzing large-scale Internet outages to the development, testing, and deployment of an operational capability to detect, monitor, and characterize future episodes of Internet connectivity disruptions.

Funding source: NSF CNS-1228994. Period of performance: September 1, 2012 - August 31, 2015.

Project Summary

Our dependence on the Internet has rapidly grown much stronger than our comprehension of its underlying structure, global dynamics, operational threats, and overall network health. Wide-scale Internet service disruptions, and even politically motivated interference with Internet access in order to hinder anti-government organization, are not new. But the scale, duration, coverage, and violent context of the government-mandated country-level Internet censorship episodes in 2011 inspired scientific as well as popular interest in capabilities to not only detect reachability problems but also quickly and thoroughly characterize their causes.

We have developed and demonstrated a methodology that can identify not only which networks have been affected by an outage, but also which techniques have been used to effect a deliberate disruption (e.g., control plane vs. data plane intervention). We have also developed metrics to quantitatively gauge the geographic and topological extent of impact of geophysical disasters on Internet infrastructure, and techniques to investigate the chronological dynamics of the outage and restoration. Our approach relies on:

  • extracting signal from a pervasive and continuous source of malware-induced Internet background radiation (IBR);
  • combining multiple types of data (active probing, passive IBR measurement, BGP routing data, and address geolocation and registry databases) to assess the scope and progression of the outage.
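
As a rough illustration of how several of these data types might be combined, the sketch below flags a region when at least two independent signals fall well below their recent baseline. It is a minimal sketch under stated assumptions, not the project's method: the signal names, the median baseline, the 0.5 threshold, and the rule that a BGP drop points to a control-plane cause are all hypothetical choices made for this example.

```python
from statistics import median


def drop_ratio(series, baseline_len):
    """Latest value divided by the median of the preceding baseline window."""
    if len(series) <= baseline_len:
        raise ValueError("series must be longer than the baseline window")
    baseline = median(series[-baseline_len - 1:-1])
    if baseline == 0:
        return 1.0  # nothing was visible before, so nothing can have dropped
    return series[-1] / baseline


def corroborated_outage(signals, baseline_len=6, threshold=0.5, min_agree=2):
    """Return (flag, dropped): flag is True when enough signals dropped together."""
    dropped = sorted(
        name
        for name, series in signals.items()
        if drop_ratio(series, baseline_len) < threshold
    )
    return len(dropped) >= min_agree, dropped


def likely_mechanism(dropped):
    """Hypothetical heuristic: a BGP drop suggests a control-plane cause."""
    if "bgp_visible_prefixes" in dropped:
        return "control plane"
    if dropped:
        return "data plane"
    return "none"


# Hypothetical per-country counts, one value per time bin (assumed data).
signals = {
    "ibr_unique_sources": [100, 98, 103, 101, 99, 102, 12],
    "bgp_visible_prefixes": [50, 50, 50, 50, 50, 50, 5],
    "probe_responses": [40, 41, 39, 40, 42, 40, 38],
}
flag, dropped = corroborated_outage(signals)
mechanism = likely_mechanism(dropped)
```

Requiring agreement between signals is one way to reduce false alarms from a single noisy source; the right threshold and window would have to be tuned against real measurements.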

This project will result in an experimental operational deployment to validate and extend an empirically grounded methodology for detection and analysis of large-scale Internet outages. In addition to improving our understanding of how measurements yield insights into network behavior, and strengthening our ability to model large-scale complex networks, use of such a system will also illuminate infrastructure vulnerabilities that derive from architectural, topological, or economic constraints, suggesting how to mitigate or eliminate these weaknesses in future Internet architecture and measurement research. A deployed platform will be able to detect and monitor connectivity disruption and censorship events on a planetary scale, giving national decision-makers, who must determine the type and extent of proper response, situational awareness of the nature and causes of network outages.

Management Plan

The requested budget supports approximately 2 full-time positions (25 person-months of effort per year) at CAIDA. The main proposed tasks overlap in time, and each task will inform the others:

  • Task 1: investigating and defining strategies and methodologies for how to combine multiple heterogeneous data sources to detect and characterize outage events (Years 1, 2, and 3);
  • Task 2: defining (and refining) the system requirements for continuous monitoring and (near) real-time analysis of outages as they occur (will start in the second half of Year 1);
  • Task 3: testing and experimental deployment of such a system (Years 2 and 3).

Additional ongoing project activities will include:

  • developing project web pages to track project progress and disseminate data and tools;
  • maintaining a blog for timely dissemination of analysis and discussion of detected events;
  • coordinating our observations with other research and operational groups;
  • interacting with various stakeholders interested in our results.

The tentative schedule below details subtasks for each year of the project.

| Subtask | Description | Projected Timeline | Status |
|---|---|---|---|
| 1.1 | Select a geolocation license provider for the project and purchase a license | Year 1 | done |
| 1.2 | Define prefix and AS groupings by countries and/or by geographic regions | Year 1 | in progress |
| 1.3 | Work with UCSD telescope researchers to define most relevant IBR traffic indicators | Year 1 | done |
| 1.4 | Start developing automated methods of monitoring prefix reachability in BGP tables | Year 1 | in progress |
| 1.5 | Experiment with more frequent probing of globally routed prefixes by the Ark platform | Year 1 | in progress |
| 1.6 | Test on-demand active probing capabilities of the Ark measurement infrastructure | Year 1 | |
| 1.7 | Investigate combined indicators for event detection, characterization, and analysis | Year 1 | in progress |
| 2.1 | Evaluate the volume of data that needs to be stored locally | Year 1 (2nd half) | in progress |
| 2.2 | Evaluate the size of the time window for data aggregation and processing | Year 1 (2nd half) | in progress |
| 2.3 | Evaluate the computational resources required for fast ongoing processing | Year 1 (2nd half) | in progress |
| 2.4 | Analyze the feasibility of emerging requirements, balancing storage and processing resources vs. desired functionality vs. cost | Year 1 (2nd half) | in progress |
| 1.8 | Experiment with monitoring and analyzing IBR traffic by geographic regions | Year 2 | in progress |
| 1.9 | Evaluate indicators for detection, characterization, and analysis of events with specific regard to aggregation by geographic region | Year 2 | |
| 1.10 | Develop methods to integrate BGP data from Route Views and from RIPE RIS | Year 2 | |
| 1.11 | Develop methods to integrate probing data from CAIDA's Ark and RIPE's Atlas platforms | Year 2 | |
| 1.12 | Develop triggers for on-demand active probing based on observed routing changes | Year 2 | |
| 1.13 | Develop triggers for on-demand active probing based on observed IBR traffic changes | Year 2 | |
| 1.14 | Develop and integrate change-point detection algorithms into the system | Year 2 | in progress |
| 2.5 | Specify hardware parameters, obtain quotes, and purchase compute server and disk storage | Year 2 | |
| 2.6 | Put the server and storage into production mode | Year 2 | |
| 2.7 | Design and prototype web interfaces to control input/output of the monitoring system | Year 2 | |
| 2.8 | Define efficient data structures for detection algorithms | Year 2 | |
| 2.9 | Design and prototype web interfaces to present the analysis results | Year 2 | |
| 2.10 | Define requirements for merging routing data from Route Views and RIPE RIS | Year 2 | |
| 2.11 | Define requirements for merging active probing data from CAIDA Ark and RIPE Atlas | Year 2 | |
| 2.12 | Document the system requirements and the selected design | Year 2 | |
| 3.1 | Implement a software library for a common layer of functions and data structures | Year 2 | |
| 3.2 | Implement a web-based interface to focus the monitoring process on specific regions and/or to use specific subsets of data | Year 2 | |
| 3.3 | Implement the monitoring software modules | Year 2 | |
| 3.4 | Implement the analysis software modules | Year 2 | |
| 3.5 | Test the platform using simulated replications of previously (manually) detected events | Year 2 | |
| 3.6 | Create informative demos of the system capabilities | Year 2 | |
| 3.7 | Implement interactive web interface to visualize the results of data analysis | Year 2 | |
| 1.15 | Evaluate the efficiency of implemented automated detection algorithms | Year 3 (1st half) | |
| 1.16 | Evaluate the effectiveness of data integration, data visualization, and user interface | Year 3 (1st half) | |
| 2.13 | (optional) Investigate the possibility to plug-in additional data sources | Year 3 (1st half) | |
| 2.14 | If necessary, adjust the system requirements based on experience | Year 3 (1st half) | |
| 2.15 | Update the documentation | Year 3 (1st half) | |
| 3.8 | Test the system on real cases | Year 3 | |
| 3.9 | Experiment with various methods to deliver alerts (e.g., email, instant messaging) | Year 3 | |
| 3.10 | Release the developed software under an open source license | Year 3 | |
| 3.11 | Evaluate the potential impacts (positive and negative) of our analysis and dissemination of results on the network operators involved in the observed outage cases | Year 3 | |
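
Subtask 1.14 calls for change-point detection algorithms. The page does not say which algorithm the project uses, so the sketch below shows one common choice, a lower-sided CUSUM, purely as an illustration. The function name, the fixed-length baseline, and the `slack` and `threshold` parameters are assumptions made for this example.

```python
def cusum_drop(series, baseline_len, slack, threshold):
    """Return the index where a sustained downward shift is detected, or None.

    The mean of the first `baseline_len` samples serves as the reference level.
    Each later sample adds its shortfall below that level, minus `slack`, to a
    running sum that is clamped at zero; detection fires when the sum exceeds
    `threshold`.
    """
    if len(series) <= baseline_len:
        raise ValueError("series must be longer than the baseline window")
    reference = sum(series[:baseline_len]) / baseline_len
    cumulative = 0.0
    for i in range(baseline_len, len(series)):
        cumulative = max(0.0, cumulative + (reference - series[i]) - slack)
        if cumulative > threshold:
            return i
    return None


# Hypothetical time series, e.g. a per-region count per time bin (assumed data).
observed = [100, 102, 98, 101, 99, 100, 97, 60, 20, 15]
change_at = cusum_drop(observed, baseline_len=5, slack=5, threshold=50)
steady = cusum_drop([100] * 10, baseline_len=5, slack=5, threshold=50)
```

The `slack` term absorbs ordinary fluctuation so that only a persistent shortfall accumulates; a larger `threshold` trades detection delay for fewer false alarms.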

  Last Modified: Fri Nov-8-2013 15:55:15 PST
  Page URL: http://www.caida.org/funding/dals-satc/index.xml