Center for Applied Internet Data Analysis
Detection and analysis of large-scale Internet infrastructure outages (IODA)
Sponsored by:
National Science Foundation (NSF)
The Internet Outage Detection and Analysis (IODA) project will apply successful results in analyzing large-scale Internet outages to the development, testing, and deployment of an operational capability to detect, monitor, and characterize future episodes of Internet connectivity disruptions.

Funding source: NSF CNS-1228994. Period of performance: September 1, 2012 - August 31, 2016.


Project Summary

Our dependence on the Internet has rapidly grown much stronger than our comprehension of its underlying structure, global dynamics, operational threats, and overall network health. Wide-scale Internet service disruptions, and even politically motivated interference with Internet access to hinder anti-government organizing, are not new. But the scale, duration, coverage, and violent context of the government-mandated country-level Internet censorship episodes in 2011 inspired scientific as well as popular interest in capabilities not only to detect reachability problems but also to characterize their causes quickly and thoroughly.

We have developed and demonstrated a methodology that can identify not only which networks have been affected by an outage, but also which techniques have been used to effect a deliberate disruption (e.g., control plane vs. data plane intervention). We have also developed metrics to quantitatively gauge the geographic and topological extent of impact of geophysical disasters on Internet infrastructure, and techniques to investigate the chronological dynamics of the outage and restoration. Our approach relies on:

  • extracting signal from a pervasive and continuous source of malware-induced Internet background radiation (IBR) in Internet traffic;
  • combining multiple types of data (active probing, passive IBR measurement, BGP routing data, and address geolocation and registry databases) to assess the scope and progression of the outage.
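
As an illustration of how the data types listed above might be combined, the following is a minimal sketch and not the project's actual method. It assumes three hypothetical per-region signals (BGP-visible prefixes, unique IBR source addresses, and the fraction of active probes answered), compares each against the median of recent history, and applies a simple heuristic: a drop that shows up in BGP suggests control-plane intervention, while a drop seen only in IBR or probing suggests data-plane intervention. The names `Sample`, `assess`, and `classify`, and the 50% threshold, are assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Sample:
    """One time bin of (hypothetical) per-region measurements."""
    bgp_prefixes: int      # prefixes visible in BGP tables
    ibr_sources: int       # unique source addresses seen in IBR traffic
    probe_response: float  # fraction of active probes answered (0..1)


def relative_drop(history, current):
    """Fractional drop of `current` below the median of `history`."""
    baseline = median(history)
    if baseline <= 0:
        return 0.0
    return max(0.0, (baseline - current) / baseline)


def assess(history, current, threshold=0.5):
    """Return per-signal drops and the sorted names of signals past the threshold."""
    drops = {
        "bgp": relative_drop([s.bgp_prefixes for s in history],
                             current.bgp_prefixes),
        "ibr": relative_drop([s.ibr_sources for s in history],
                             current.ibr_sources),
        "probing": relative_drop([s.probe_response for s in history],
                                 current.probe_response),
    }
    affected = sorted(name for name, d in drops.items() if d >= threshold)
    return drops, affected


def classify(affected):
    """Illustrative heuristic only: map affected signals to a disruption type."""
    if "bgp" in affected:
        return "control-plane"
    if affected:
        return "data-plane"
    return "none"


history = [Sample(100, 1000, 0.9) for _ in range(5)]
# Routes are still announced, but traffic and probe responses collapse.
drops, affected = assess(history, Sample(100, 200, 0.2))
print(affected, classify(affected))
```

A real system would need per-signal thresholds and a baseline that tolerates diurnal patterns; the single threshold here only keeps the sketch short.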

This project will result in an experimental operational deployment to validate and extend an empirically grounded methodology for detection and analysis of large-scale Internet outages. In addition to improving our understanding of how measurements yield insights into network behavior and strengthening our ability to model large-scale complex networks, use of such a system will also illuminate infrastructure vulnerabilities that derive from architectural, topological, or economic constraints, suggesting how to mitigate or eliminate these weaknesses in future Internet architecture and measurement research. A deployed platform will be able to detect and monitor connectivity disruption and censorship events on a planetary scale, thus giving situational awareness of the nature and causes of network outages to national decision-makers who must determine the type and extent of proper response.

Management Plan

The requested budget supports approximately 2 full-time positions (25 person-months of effort per year) at CAIDA. The main proposed tasks overlap in time, and each task will inform the others:

  •  Task 1 : investigating and defining strategies and methodologies for how to combine multiple heterogeneous data sources to detect and characterize outage events (Years 1, 2, and 3);
  •  Task 2 : defining (and refining) the system requirements for continuous monitoring and (near) real-time analysis of outages as they occur (will start in the second half of Year 1);
  •  Task 3 : testing and experimental deployment of such a system (Years 2 and 3).

Additional ongoing project activities will include:

  • developing project web pages to track project progress and disseminate data and tools;
  • maintaining a blog for timely dissemination of analysis and discussion of detected events;
  • coordination of our observations with other research and operational groups;
  • interaction with various stakeholders interested in our results.

The tentative schedule below details subtasks for each year of the project.

| Subtask | Description | Projected Timeline | Status |
|---|---|---|---|
| 1.1 | Select a geolocation license provider for the project and purchase a license | Year 1 | done |
| 1.2 | Define prefix and AS groupings by countries and/or by geographic regions | Year 1 | done |
| 1.3 | Work with UCSD telescope researchers to define most relevant IBR traffic indicators | Year 1 | done |
| 1.4 | Start developing automated methods of monitoring prefix reachability in BGP tables | Year 1 | done |
| 1.5 | Experiment with more frequent probing of globally routed prefixes by the Ark platform | Year 1 | done |
| 1.6 | Test on-demand active probing capabilities of the Ark measurement infrastructure | Year 1 | done |
| 1.7 | Investigate combined indicators for event detection, characterization, and analysis | Year 1 | done |
| 2.1 | Evaluate the volume of data that needs to be stored locally | Year 1 (2nd half) | done |
| 2.2 | Evaluate the size of the time window for data aggregation and processing | Year 1 (2nd half) | done |
| 2.3 | Evaluate the computational resources required for fast ongoing processing | Year 1 (2nd half) | done |
| 2.4 | Analyze the feasibility of emerging requirements, balancing storage and processing resources vs. desired functionality vs. cost | Year 1 (2nd half) | done |
| 1.8 | Experiment with monitoring and analyzing IBR traffic by geographic regions | Year 2 | done |
| 1.9 | Evaluate indicators for detection, characterization, and analysis of events with specific regard to aggregation by geographic region | Year 2 | done |
| 1.10 | Develop methods to integrate BGP data from Route Views and from RIPE RIS | Year 2 | done |
| 1.11 | Develop methods to integrate probing data from CAIDA's Ark and RIPE's Atlas platforms | Year 2 | done |
| 1.12 | Develop triggers for on-demand active probing based on observed routing changes | Year 2 | done |
| 1.13 | Develop triggers for on-demand active probing based on observed IBR traffic changes | Year 2 | done |
| 1.14 | Develop and integrate change-point detection algorithms into the system | Year 2 | done |
| 2.5 | Specify hardware parameters, obtain quotes, and purchase compute server and disk storage | Year 2 | done |
| 2.6 | Put the server and storage into production mode | Year 2 | done |
| 2.7 | Design and prototype web interfaces to control input/output of the monitoring system | Year 2 | done |
| 2.8 | Define efficient data structures for detection algorithms | Year 2 | done |
| 2.9 | Design and prototype web interfaces to present the analysis results | Year 2 | done |
| 2.10 | Define requirements for merging routing data from Route Views and RIPE RIS | Year 2 | done |
| 2.11 | Define requirements for merging active probing data from CAIDA Ark and RIPE Atlas | Year 2 | done |
| 2.12 | Document the system requirements and the selected design | Year 2 | done |
| 3.1 | Implement a software library for a common layer of functions and data structures | Year 2 | done |
| 3.2 | Implement a web-based interface to focus the monitoring process on specific regions and/or to use specific subsets of data | Year 2 | done |
| 3.3 | Implement the monitoring software modules | Year 2 | done |
| 3.4 | Implement the inference software modules | Year 2 | done |
| 3.5 | Create informative demos of the system capabilities | Year 2 | done |
| 3.6 | Implement interactive web interface to visualize the results of data analysis | Year 2 | done |
| 1.15 | Evaluate the efficiency of implemented automated detection algorithms | Year 3 (1st half) | done |
| 1.16 | Evaluate the effectiveness of data integration, data visualization, and user interface | Year 3 (1st half) | done |
| 2.13 | (optional) Investigate the possibility to plug-in additional data sources | Year 3 (1st half) | done |
| 2.14 | If necessary, adjust the system requirements based on experience | Year 3 (1st half) | done |
| 2.15 | Update the documentation | Year 3 (1st half) | done |
| 3.8 | Test the system on real cases | Year 3 | done |
| 3.9 | Experiment with various methods to deliver alerts (e.g., email, instant messaging) | Year 3 | done |
| 3.10 | Release the developed software under an open source license | Year 3 | done |
| 3.11 | Evaluate the potential impacts (positive and negative) of our analysis and dissemination of results on the network operators involved in the observed outage cases | Year 3 | done |
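
Subtask 1.14 mentions change-point detection without naming an algorithm. As one possible illustration, the sketch below uses a one-sided CUSUM detector, a standard technique chosen here for brevity and not confirmed as the one the project used. It flags the first time bin where a signal (for example, a count of IBR sources per bin) shifts persistently below its baseline. The function name `cusum_drop` and the parameters `baseline_len`, `slack`, and `threshold` are assumptions made for the example.

```python
def cusum_drop(series, baseline_len=10, slack=0.5, threshold=5.0):
    """Return the index of the first bin where a downward shift is flagged.

    The first `baseline_len` values define the normal mean and standard
    deviation. Each later value contributes its standardized shortfall
    (minus `slack`) to a running sum that is clamped at zero; a change is
    reported once the sum reaches `threshold`. Returns None if no change
    is flagged.
    """
    base = series[:baseline_len]
    mean = sum(base) / len(base)
    var = sum((x - mean) ** 2 for x in base) / len(base)
    std = var ** 0.5 or 1.0  # fall back to 1.0 for a perfectly flat baseline

    s = 0.0
    for i in range(baseline_len, len(series)):
        z = (mean - series[i]) / std  # positive when the value is below baseline
        s = max(0.0, s + z - slack)
        if s >= threshold:
            return i
    return None


# Hypothetical counts per time bin: steady around 100, then a sharp drop.
counts = [100, 102, 98, 101, 99, 100, 103, 97, 100, 100, 100, 99, 20, 18, 19]
print(cusum_drop(counts))  # → 12
```

The `slack` term absorbs small fluctuations so that only a sustained or large shortfall accumulates; tuning it against `threshold` trades detection delay for false alarms.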

  Last Modified: Tue Jan-17-2017 22:18:36 PST