FANTAIL: Facilitating Advances in Network Topology Analysis

We propose to develop a system to enable discovery of the full potential value of massive raw Internet end-to-end path measurement data sets.

Sponsored by:
National Science Foundation (NSF)

Principal Investigator: kc claffy

Funding source: CNS-1925729
Period of performance: September 1, 2019 - August 31, 2023.


Project Summary

Internet cartography has emerged as a new field of computer and network science, with several global Internet measurement infrastructures continuously executing comprehensive topology mapping experiments for years. UCSD's Center for Applied Internet Data Analysis (CAIDA) has operated the longest-running of these measurement platforms (Archipelago), which has supported scientific measurement experiments of the global Internet since September 2007. This platform has collected 90 billion traceroutes in 39 TB of files, growing by 16 billion traces and 7 TB annually (roughly a 5-year doubling time). These data sets have already yielded impacts across a broad range of CISE sub-disciplines. Yet the biggest remaining obstacle to even more productive scientific use of this wealth of information is infrastructural: the lack of an easy-to-use and analytically powerful exploratory interface to the data.

Researchers have made explicit requests for search functionality that would have a transformative impact on several focused areas of CISE-funded research. In response to community feedback, we propose to develop the FANTAIL system - Facilitating Advances in Network Topology Analysis - to enable discovery of the full potential value of massive raw Internet end-to-end path measurement data sets. We envision a four-component system: (1) an interactive web interface; (2) an API built on web standards; (3) a full-text search system based on Elasticsearch; and (4) a big data processing system based on Spark, leveraging SDSC's cluster resources.

Although our goal is to enhance the general accessibility and utility of this data, our project will be driven by specific compelling use cases, in response to research community needs for interactive exploratory capabilities. To this end, we will identify and implement reusable components, called analysis modules, that serve as primitives for constructing more complex data-processing pipelines. Users will specify, via the web interface or API, a sequence of analysis modules to execute on the set of traceroute paths matched by their queries. The FANTAIL system will then perform the queries, run the analysis modules, and make the output available for download, so that researchers can analyze or process it further on their own systems. We will implement analysis modules that are useful for (1) performing data reduction (to minimize the amount of data users have to download and process), (2) enhancing raw traceroute data with various annotations available publicly or created by us, and (3) offloading commonly needed analysis and data-processing tasks from users.
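To illustrate the idea of chaining analysis modules over the traces matched by a query, here is a minimal sketch in Python. It is not FANTAIL's actual interface: the trace layout (`src`, `dst`, `hops`, `addr`, `rtt`) and the function names `drop_unresponsive_hops`, `annotate_hop_count`, and `run_pipeline` are hypothetical, chosen only to show one data-reduction module and one annotation module applied in sequence.

```python
from typing import Any, Callable, Dict, Iterable, List

# A traceroute path is modeled as a plain dict. The field names used here
# are illustrative assumptions, not FANTAIL's real schema.
Trace = Dict[str, Any]
Module = Callable[[Iterable[Trace]], Iterable[Trace]]


def drop_unresponsive_hops(traces: Iterable[Trace]) -> Iterable[Trace]:
    """Data reduction: remove hops that did not respond (addr is None)."""
    for t in traces:
        yield {**t, "hops": [h for h in t["hops"] if h["addr"] is not None]}


def annotate_hop_count(traces: Iterable[Trace]) -> Iterable[Trace]:
    """Annotation: attach the number of hops currently in each trace."""
    for t in traces:
        yield {**t, "hop_count": len(t["hops"])}


def run_pipeline(traces: Iterable[Trace], modules: List[Module]) -> List[Trace]:
    """Apply each module, in order, to the traces matched by a query."""
    stream = traces
    for module in modules:
        stream = module(stream)
    return list(stream)


# Hypothetical query result: one trace with an unresponsive middle hop.
matched = [
    {
        "src": "192.0.2.1",
        "dst": "198.51.100.7",
        "hops": [
            {"addr": "192.0.2.254", "rtt": 1.2},
            {"addr": None, "rtt": None},
            {"addr": "198.51.100.7", "rtt": 23.4},
        ],
    },
]

result = run_pipeline(matched, [drop_unresponsive_hops, annotate_hop_count])
print(result[0]["hop_count"])  # 2
```

Because each module takes and returns a stream of traces, the order of the list passed to `run_pipeline` is the order of execution, which mirrors the "sequence of analysis modules" a user would specify.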


Projected Timeline

Task | Description | Projected Date | Status

Infrastructure Development
1 | Acquire and deploy a new server to support FANTAIL | Year 1 | done
2 | Set up Elasticsearch and Spark on SDSC cluster computers, and develop software needed to connect Elasticsearch with Spark | Year 1 | done
3 | Convert most recent few years of CAIDA traceroute data into JSON format and import into Elasticsearch | Year 1 | done
4 | Develop a command-line tool to perform queries through the Elasticsearch API, and determine the exact Elasticsearch query expressions needed to execute traceroute queries | Year 1 | done
5 | Implement analysis modules to (1) reduce a set of traceroute paths to the set of unique paths or to a graph, and (2) extract, analyze, and compute various statistics on round-trip time data | Year 1 | done
6 | Implement a web API to perform traceroute queries, construct and execute a data processing pipeline, and download results in a suitable format | Year 1 | ongoing
7 | Develop tools to store DNS, IXP, and bdrmapIT data in a database and make them accessible from FANTAIL | Year 2 | done
8 | Implement all remaining analysis modules | Year 2 | done
9 | Implement an interactive web site to perform traceroute queries, construct and execute a data processing pipeline, and download results in a suitable format | Year 2 | done
10 | Implement analysis recipes | Year 3 | done
11 | Add support for executing analysis recipes to the interactive web site and API | Year 3 |
12 | Import the remainder of CAIDA traceroute data into Elasticsearch | Year 3 |
13 | Develop tools to automate importing of new CAIDA data | Year 3 | done
14 | Document FANTAIL and its capabilities for operational maintenance | Year 3 | done

Community Activities
1 | Create and open a project web site | Year 1 | done
2 | Organize a Community Workshop; publish the workshop report and recommendations | Year 1 | done
3 | Attend CCRI PI Community meeting | Year 1 | done
4 | Create a mailing list to support FANTAIL users | Year 1 | done
5 | Discuss with researchers their needs for generally usable analysis modules | Year 2 | done
6 | Identify and prioritize improvements of FANTAIL capabilities based on users' feedback | Year 2 | done
7 | Organize a Community Workshop | Year 3 | done
8 | Attend CCRI PI Community meeting | Year 3 | done
9 | Refine FANTAIL capabilities based on users' feedback | Year 3 | ongoing
10 | Engage FANTAIL users in sustainability discussions | Year 3 | ongoing
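
Infrastructure task 5 in the timeline above names two kinds of analysis module: reducing a set of traceroute paths to unique paths or to a graph, and computing statistics on round-trip time data. The following Python sketch shows one plausible reading of those operations. It is an illustration under assumptions, not the project's code: the trace layout and the names `unique_paths`, `paths_to_graph`, and `rtt_stats` are hypothetical, and the addresses come from documentation-reserved ranges.

```python
from statistics import mean, median


def unique_paths(traces):
    """Collapse traces to the set of distinct hop-address sequences."""
    return {tuple(h["addr"] for h in t["hops"]) for t in traces}


def paths_to_graph(paths):
    """Reduce paths to a set of directed edges between consecutive hops."""
    edges = set()
    for path in paths:
        edges.update(zip(path, path[1:]))
    return edges


def rtt_stats(traces, addr):
    """Summarize the RTTs observed for one hop address, or None if unseen."""
    rtts = [
        h["rtt"]
        for t in traces
        for h in t["hops"]
        if h["addr"] == addr and h["rtt"] is not None
    ]
    if not rtts:
        return None
    return {
        "count": len(rtts),
        "min": min(rtts),
        "median": median(rtts),
        "mean": mean(rtts),
    }


# Hypothetical input: three traces, two of which follow the same path.
traces = [
    {"hops": [{"addr": "192.0.2.1", "rtt": 1.0},
              {"addr": "198.51.100.1", "rtt": 10.0}]},
    {"hops": [{"addr": "192.0.2.1", "rtt": 3.0},
              {"addr": "198.51.100.1", "rtt": 14.0}]},
    {"hops": [{"addr": "192.0.2.1", "rtt": 2.0},
              {"addr": "203.0.113.1", "rtt": 30.0}]},
]

paths = unique_paths(traces)
graph = paths_to_graph(paths)
print(len(paths))   # 2
print(sorted(graph))
print(rtt_stats(traces, "192.0.2.1"))
```

Reducing three traces to two unique paths, and then to two edges, is the kind of data reduction that shrinks what a user has to download. A production version would also need to decide how to treat unresponsive hops and load-balanced paths, which this sketch ignores.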

Publications


Acknowledgment of awarding agency's support

National Science Foundation (NSF)

This material is based on research sponsored by the National Science Foundation (NSF) under grant CNS-1925729. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of NSF.


Additional Content

FANTAIL - Facilitating Advances in Network Topology Analysis

An abbreviated version of the original proposal.
