FANTAIL: Facilitating Advances in Network Topology Analysis
We propose to develop a system to enable discovery of the full potential value of massive raw Internet end-to-end path measurement data sets.
Principal Investigator: kc claffy
Funding source: CNS-1925729. Period of performance: September 1, 2019 - August 31, 2023.
Project Summary
Internet cartography has emerged as a new field of computer science as well as network science, with several global Internet measurement infrastructures continuously executing comprehensive topology mapping experiments for years. UCSD's Center for Applied Internet Data Analysis (CAIDA) has operated the longest-running of these measurement platforms (Archipelago), which has supported scientific measurement experiments of the global Internet since September 2007. This platform has collected 90 billion traceroutes in 39 TB of files, growing by 16 billion traces and 7 TB annually (5-year doubling rate). These data sets have already yielded impacts across a broad range of CISE sub-disciplines. Yet the biggest remaining obstacle to even more productive scientific use of this wealth of information is infrastructural: the lack of an easy-to-use and analytically powerful exploratory interface to the data.
Researchers have made explicit requests for search functionality that would have a transformative impact on several focused areas of CISE-funded research. In response to community feedback, we propose to develop the FANTAIL system - Facilitating Advances in Network Topology Analysis - to enable discovery of the full potential value of massive raw Internet end-to-end path measurement data sets. We envision a four-component system: (1) an interactive web interface; (2) an API built on web standards; (3) a full-text search system based on Elasticsearch; and (4) a big data processing system based on Spark, leveraging SDSC's cluster resources.
Although our goal is to enhance the general accessibility and utility of this data, our project will be driven by specific compelling use cases, in response to research community needs for interactive exploratory capabilities. To this end, we will identify and implement reusable components, called analysis modules, which will serve as primitives for constructing more complex data-processing pipelines. Users will specify, via the web interface or API, a sequence of analysis modules to execute on the set of traceroute paths matched by their queries. The FANTAIL system will then perform the queries, run the analysis modules, and provide the output for download so that researchers can analyze or process it further on their own systems.
We will implement analysis modules that are useful for (1) performing data reduction (to minimize the amount of data users have to download and process), (2) enhancing raw traceroute data with various annotations available publicly or created by us, and (3) offloading commonly needed analysis and data processing tasks from users.
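The idea of chaining analysis modules over the paths matched by a query can be illustrated with a minimal sketch. It is not FANTAIL's actual implementation: the path representation (a tuple of hop addresses), the module names `unique_paths` and `to_edges`, and the `run_pipeline` helper are all hypothetical, chosen only to show two data-reduction steps (unique paths, then a graph's edge set) applied in sequence.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical representation: a traceroute path is a tuple of hop IP addresses.
Path = Tuple[str, ...]
Module = Callable[[list], list]


def unique_paths(paths: List[Path]) -> List[Path]:
    """Data reduction: keep one copy of each distinct hop sequence (first-seen order)."""
    return list(dict.fromkeys(paths))


def to_edges(paths: List[Path]) -> List[Tuple[str, str]]:
    """Data reduction: collapse paths into the sorted set of directed links between adjacent hops."""
    edges = set()
    for path in paths:
        edges.update(zip(path, path[1:]))
    return sorted(edges)


def run_pipeline(paths: Iterable[Path], modules: List[Module]) -> list:
    """Apply each analysis module in order, feeding each output to the next module."""
    data = list(paths)
    for module in modules:
        data = module(data)
    return data


# Stand-in for the set of traceroute paths matched by a user's query
# (addresses are from documentation-reserved ranges).
matched = [
    ("192.0.2.1", "198.51.100.1", "203.0.113.9"),
    ("192.0.2.1", "198.51.100.1", "203.0.113.9"),
    ("192.0.2.1", "198.51.100.7", "203.0.113.9"),
]

result = run_pipeline(matched, [unique_paths, to_edges])
for src, dst in result:
    print(src, "->", dst)
```

Treating each module as a plain function from a list to a list keeps the primitives composable, so a user-specified sequence is just a list of functions; a production system would presumably run such steps on Spark rather than in memory.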
Projected Timeline
Task | Description | Projected Date | Status |
---|---|---|---|
Infrastructure Development | |||
1 | Acquire and deploy a new server to support FANTAIL | Year 1 | done |
2 | Set up Elasticsearch and Spark on SDSC cluster computers, and develop software needed to connect Elasticsearch with Spark | Year 1 | done |
3 | Convert most recent few years of CAIDA traceroute data into JSON format and import into Elasticsearch | Year 1 | done |
4 | Develop a command-line tool to perform queries through the Elasticsearch API, and determine the exact Elasticsearch query expressions needed to execute traceroute queries | Year 1 | done |
5 | Implement analysis modules to (1) reduce a set of traceroute paths to the set of unique paths or to a graph, and (2) extract, analyze, and compute various statistics on round-trip time data | Year 1 | done |
6 | Implement a web API to perform traceroute queries, construct and execute a data processing pipeline, and download results in a suitable format | Year 1 | ongoing |
7 | Develop tools to store DNS, IXP, and bdrmapIT data in a database and make accessible from FANTAIL | Year 2 | done |
8 | Implement all remaining analysis modules | Year 2 | done |
9 | Implement an interactive web site to perform traceroute queries, construct and execute a data processing pipeline, and download results in a suitable format | Year 2 | done |
10 | Implement analysis recipes | Year 3 | done |
11 | Add support for executing analysis recipes to the interactive web site and API | Year 3 | |
12 | Import the remainder of CAIDA traceroute data into Elasticsearch | Year 3 | |
13 | Develop tools to automate importing of new CAIDA data | Year 3 | done |
14 | Document FANTAIL and its capabilities for operational maintenance | Year 3 | done |
Community Activities | |||
1 | Create and open a project web site | Year 1 | done |
2 | Organize a Community Workshop; publish the workshop report and recommendations | Year 1 | done |
3 | Attend CCRI PI Community meeting | Year 1 | done |
4 | Create a mailing list to support FANTAIL users | Year 1 | done |
5 | Discuss with researchers their needs for generally usable analysis modules | Year 2 | done |
6 | Identify and prioritize improvements of FANTAIL capabilities based on users' feedback | Year 2 | done |
7 | Organize a Community Workshop | Year 3 | done |
8 | Attend CCRI PI Community meeting | Year 3 | done |
9 | Refine FANTAIL capabilities based on users' feedback | Year 3 | ongoing |
10 | Engage FANTAIL users in sustainability discussions | Year 3 | ongoing |
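Infrastructure Task 4 above refers to determining the Elasticsearch query expressions needed for traceroute queries. As a rough illustration of what such an expression might look like, the sketch below builds a query body using the standard Elasticsearch `bool`, `term`, and `range` query clauses. The field names (`dst`, `start_time`, `hops.addr`) and the function `build_traceroute_query` are assumptions made for this example; the actual FANTAIL index schema is not described here.

```python
import json
from typing import Optional


def build_traceroute_query(dst_ip: str, start: str, end: str,
                           via_ip: Optional[str] = None) -> dict:
    """Build an Elasticsearch query body selecting traceroutes to dst_ip in [start, end).

    All field names are hypothetical placeholders for whatever the index defines.
    """
    filters = [
        {"term": {"dst": dst_ip}},                             # exact destination match
        {"range": {"start_time": {"gte": start, "lt": end}}},  # measurement time window
    ]
    if via_ip is not None:
        # Optionally require that some hop on the path has this address.
        filters.append({"term": {"hops.addr": via_ip}})
    return {"query": {"bool": {"filter": filters}}}


body = build_traceroute_query(
    "203.0.113.9", "2021-01-01", "2021-02-01", via_ip="198.51.100.1"
)
print(json.dumps(body, indent=2))
```

Filter clauses are used instead of scoring clauses because a traceroute either matches the constraints or it does not; relevance ranking is not meaningful for this kind of lookup.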
Publications
- Jitterbug: A new framework for jitter-based congestion inference.
  E. Carisimo, R. Mok, D. Clark, k. claffy.
  Passive and Active Measurement Conference (PAM), Mar 2022.
- Learning to Extract Geographic Information from Internet Router Hostnames.
  M. Luckie, B. Huffaker, A. Marder, Z. Bischof, M. Fletcher, k. claffy.
  ACM SIGCOMM Conference on emerging Networking EXperiments and Technologies (CoNEXT), Dec 2021.
- Learning Regexes to Extract Network Names from Hostnames.
  M. Luckie, A. Marder, B. Huffaker, k. claffy.
  Asian Internet Engineering Conference (AINTEC), Dec 2021.
Acknowledgment of awarding agency's support
This material is based on research sponsored by the National Science Foundation (NSF) grant CNS-1925729. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of NSF.
Additional Content
FANTAIL - Facilitating Advances in Network Topology Analysis
An abbreviated version of the original proposal.