Center for Applied Internet Data Analysis
FANTAIL - Facilitating Advances in Network Topology Analysis
Sponsored by:
National Science Foundation (NSF)

We propose to develop a system to enable discovery of the full potential value of massive raw Internet end-to-end path measurement data sets.

Funding source: NSF CNS-1925729. Period of performance: September 1, 2019 - August 31, 2022.


Project Summary

Internet cartography has emerged as a new field spanning both computer science and network science, with several global Internet measurement infrastructures continuously executing comprehensive topology mapping experiments for years. UCSD's Center for Applied Internet Data Analysis (CAIDA) has operated the longest-running of these measurement platforms (Archipelago), which has supported scientific measurement experiments of the global Internet since September 2007. This platform has collected 90 billion traceroutes in 39 TB of files, growing by 16 billion traces and 7 TB annually (5-year doubling rate). These data sets have already yielded impacts across a broad range of CISE sub-disciplines. Yet the biggest remaining obstacle to even more productive scientific use of this unbounded wealth of information is infrastructural: the lack of an easy-to-use and analytically powerful exploratory interface to the data.

Researchers have made explicit requests for search functionality that would have a transformative impact on several focused areas of CISE-funded research. In response to community feedback, we propose to develop the FANTAIL system (Facilitating Advances in Network Topology Analysis) to enable discovery of the full potential value of massive raw Internet end-to-end path measurement data sets. We envision a four-component system: (1) an interactive web interface; (2) an API built on web standards; (3) a full-text search system based on Elasticsearch; and (4) a big data processing system based on Spark, leveraging SDSC's cluster resources. Although our goal is to enhance the general accessibility and utility of this data, our project will be driven by specific compelling use cases, in response to research community needs for interactive exploratory capabilities.

To this end, we will identify and implement reusable components, called analysis modules, that will serve as primitives for constructing more complex data-processing pipelines. Users will specify, via the web interface or API, a sequence of analysis modules to execute on the set of traceroute paths matched by their queries. The FANTAIL system will then perform the queries, run the analysis modules, and provide the output for download so that researchers can analyze or process it further on their own systems. We will implement analysis modules that are useful for (1) performing data reduction (to minimize the amount of data users have to download and process), (2) enhancing raw traceroute data with various annotations available publicly or created by us, and (3) offloading commonly needed analysis and data processing tasks from users.
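The idea of chaining analysis modules over the traces matched by a query can be illustrated with a minimal Python sketch. This is not FANTAIL's API or data format: the record layout (`src`, `dst`, `hops`), the module names (`match_destination`, `annotate_hop_count`, `keep_fields`), and `run_pipeline` are all hypothetical, chosen only to show one query step, one annotation step, and one data-reduction step composed in sequence.

```python
from typing import Callable, Iterable, List

# Hypothetical trace record (an assumption, not FANTAIL's schema):
# {"src": str, "dst": str, "hops": [{"addr": str, "rtt": float}, ...]}
Trace = dict
Module = Callable[[Iterable[Trace]], Iterable[Trace]]


def match_destination(prefix: str) -> Module:
    """Query step: keep traces whose destination string starts with `prefix`."""
    def module(traces: Iterable[Trace]) -> Iterable[Trace]:
        return (t for t in traces if t["dst"].startswith(prefix))
    return module


def annotate_hop_count(traces: Iterable[Trace]) -> Iterable[Trace]:
    """Annotation step: add a derived `hop_count` field to each trace."""
    for t in traces:
        yield {**t, "hop_count": len(t["hops"])}


def keep_fields(*fields: str) -> Module:
    """Data-reduction step: drop every field except the named ones."""
    def module(traces: Iterable[Trace]) -> Iterable[Trace]:
        return ({f: t[f] for f in fields} for t in traces)
    return module


def run_pipeline(traces: Iterable[Trace], modules: List[Module]) -> List[Trace]:
    """Apply each module in order; each consumes the previous module's output."""
    for module in modules:
        traces = module(traces)
    return list(traces)


# Made-up sample data using documentation-only address ranges.
SAMPLE = [
    {"src": "192.0.2.1", "dst": "198.51.100.7",
     "hops": [{"addr": "192.0.2.254", "rtt": 1.2},
              {"addr": "198.51.100.7", "rtt": 9.8}]},
    {"src": "192.0.2.1", "dst": "203.0.113.9",
     "hops": [{"addr": "192.0.2.254", "rtt": 1.1}]},
]

result = run_pipeline(SAMPLE, [
    match_destination("198.51.100."),
    annotate_hop_count,
    keep_fields("src", "dst", "hop_count"),
])
print(result)
```

Because every module takes and returns a stream of trace records, modules can be reordered or reused freely, which is the property that lets simple primitives be combined into larger pipelines.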


Projected Timeline

Infrastructure Development

| Task | Description | Projected Date | Status |
|------|-------------|----------------|--------|
| 1 | Acquire and deploy a new server to support FANTAIL | Year 1 | |
| 2 | Set up Elasticsearch and Spark on SDSC cluster computers, and develop software needed to connect Elasticsearch with Spark | Year 1 | |
| 3 | Convert most recent few years of CAIDA traceroute data into JSON format and import into Elasticsearch | Year 1 | |
| 4 | Develop a command-line tool to perform queries through the Elasticsearch API, and determine the exact Elasticsearch query expressions needed to execute traceroute queries | Year 1 | |
| 5 | Implement analysis modules to (1) reduce a set of traceroute paths to the set of unique paths or to a graph, and (2) extract, analyze, and compute various statistics on round-trip time data | Year 1 | |
| 6 | Implement a web API to perform traceroute queries, construct and execute a data processing pipeline, and download results in a suitable format | Year 1 | |
| 7 | Develop tools to store DNS, IXP, bdrmapIT, and TNT data in a database and make them accessible from FANTAIL | Year 2 | |
| 8 | Implement all remaining analysis modules | Year 2 | |
| 9 | Implement an interactive web site to perform traceroute queries, construct and execute a data processing pipeline, and download results in a suitable format | Year 2 | |
| 10 | Implement all analysis recipes | Year 3 | |
| 11 | Add support for executing analysis recipes to the interactive web site and API | Year 3 | |
| 12 | Import most recent few years of RIPE Atlas traceroute data into Elasticsearch | Year 3 | |
| 13 | Import the remainder of CAIDA traceroute data into Elasticsearch | Year 3 | |
| 14 | Develop tools to automate importing of new CAIDA and RIPE data | Year 3 | |
| 15 | Document FANTAIL and its capabilities for operational maintenance | Year 3 | |

Community Activities

| Task | Description | Projected Date | Status |
|------|-------------|----------------|--------|
| 1 | Create and open a project web site | Year 1 | |
| 2 | Organize a Community Workshop; publish the workshop report and recommendations | Year 1 | |
| 3 | Attend CCRI PI Community meeting | Year 1 | |
| 4 | Create a mailing list to support FANTAIL users | Year 1 | |
| 5 | Organize a Community Workshop; publish the workshop report and recommendations | Year 2 | |
| 6 | Discuss with researchers their needs for generally usable analysis modules | Year 2 | |
| 7 | Attend CCRI PI Community meeting | Year 2 | |
| 8 | Conduct FANTAIL user survey #1 | Year 2 | |
| 9 | Identify and prioritize improvements of FANTAIL capabilities based on users' feedback | Year 2 | |
| 10 | Organize a Community Workshop; publish the workshop report and recommendations | Year 3 | |
| 11 | Attend CCRI PI Community meeting | Year 3 | |
| 12 | Conduct FANTAIL user survey #2 | Year 3 | |
| 13 | Refine FANTAIL capabilities based on users' feedback | Year 3 | |
| 14 | Engage FANTAIL users in sustainability discussions | Year 3 | |
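
As an illustration of the reductions named in Infrastructure Development Task 5 (unique paths, a graph, and round-trip time statistics), the following Python sketch shows one plausible shape for such modules. It is an assumption-laden example, not the project's implementation: the trace layout and the function names `unique_paths`, `paths_to_graph`, and `rtt_stats` are hypothetical, and it ignores complications of real traceroute data such as unresponsive hops.

```python
from statistics import mean, median

# Hypothetical trace layout (an assumption): each trace carries an ordered
# list of hops, and each hop has an address and an RTT in milliseconds.
TRACES = [
    {"hops": [{"addr": "192.0.2.1", "rtt": 1.0},
              {"addr": "198.51.100.1", "rtt": 5.0},
              {"addr": "203.0.113.1", "rtt": 10.0}]},
    {"hops": [{"addr": "192.0.2.1", "rtt": 3.0},
              {"addr": "198.51.100.1", "rtt": 7.0},
              {"addr": "203.0.113.1", "rtt": 12.0}]},
    {"hops": [{"addr": "192.0.2.1", "rtt": 2.0},
              {"addr": "198.51.100.2", "rtt": 6.0}]},
]


def unique_paths(traces):
    """Return the distinct hop-address sequences, in first-seen order."""
    seen = {}
    for t in traces:
        path = tuple(h["addr"] for h in t["hops"])
        seen.setdefault(path, None)  # dicts preserve insertion order
    return list(seen)


def paths_to_graph(paths):
    """Collapse paths into a set of directed edges between consecutive hops."""
    edges = set()
    for path in paths:
        edges.update(zip(path, path[1:]))
    return edges


def rtt_stats(traces):
    """Summarize RTT samples per hop address: count, min, median, mean."""
    samples = {}
    for t in traces:
        for h in t["hops"]:
            samples.setdefault(h["addr"], []).append(h["rtt"])
    return {
        addr: {"n": len(v), "min": min(v), "median": median(v), "mean": mean(v)}
        for addr, v in samples.items()
    }


paths = unique_paths(TRACES)
edges = paths_to_graph(paths)
stats = rtt_stats(TRACES)
print(len(paths), len(edges), stats["192.0.2.1"])
```

Here three traces reduce to two unique paths and three directed edges, which is the kind of reduction that shrinks what a user would need to download.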

  Last Modified: Fri Sep-13-2019 15:49:58 PDT
  Page URL: http://www.caida.org/funding/ccri-fantail/index.xml