PANDA: Integrated Platform for Applied Network Data Analysis

We are developing a new Platform for Applied Network Data Analysis (PANDA) that will offer researchers more accessible, calibrated, user-friendly tools for collecting, analyzing, querying, and interpreting measurements of the Internet ecosystem.

Sponsored by:
National Science Foundation (NSF)

Principal Investigators: kc claffy, Bradley Huffaker, Alberto Dainotti, Amogh Dhamdhere, Alistair King

Funding source: OAC-1724853
Period of performance: September 1, 2017 to August 31, 2022


Project Summary

For the last 20 years, CAIDA has developed many data-focused services, products, tools, and resources to advance the study of the Internet, a field that has permeated disciplines ranging from theoretical computer science to political science, from physics to technology law, and from network architecture to public policy. As the Internet and our dependence on it have grown, the structure and dynamics of the network, and how they relate to the political economy in which the network is embedded, are attracting increasing attention from researchers, operators, and policy makers, all of whom bring questions that they lack the capability to answer themselves. CAIDA has spent years cultivating relationships across disciplines (networking, security, economics, law, policy) with those interested in CAIDA data, but the impact thus far has been limited to a handful of researchers. The current mode of collaboration simply does not scale to the exploding interest in scientific study of the Internet.

In response to feedback from these communities, we will integrate existing research infrastructure components for measurement and analysis, previously developed by CAIDA, into a new Platform for Applied Network Data Analysis (PANDA). Our goal is to enable new scientific directions, experiments, and data products for a wide set of researchers from the four targeted disciplines: networking, security, economics, and public policy. We will emphasize efficient indexing and processing of terabyte-scale archives, advanced visualization tools that show geographic and economic aspects of Internet structure, and careful interpretation of displayed results. To demonstrate that our platform is easily extensible and adaptable to new opportunities, we will seek to augment it with new data products for unmet research needs: a comprehensive DNS data set (facilitating the mapping of network behavior to a human view) and anonymized residential network traffic data (supporting privacy-sensitive security monitoring of in-home networks, even by non-technical users).

We will ensure active engagement of our collaborators: we will organize annual workshops; develop online video tutorials targeting non-networking experts, as well as classroom-focused materials; maintain an annotated bibliography and discussion forum; and institute an advisory board to provide strategic direction.

The success of our project will enable new empirical studies in the four targeted disciplines, promising innovations in: Internet mapping and path prediction; detection of route hijacking and other disruptive events; cybersecurity preparedness; economic studies of correlations between ISP characteristics, market power, performance degradations, security practices, and regional economic growth; and regulatory discourse, which has thus far occurred largely without data. It will lower the barrier to using CAIDA's data products and tools for research and education (R&E) needs, inform discussion of critical issues in current and future large-scale networking, and increase public awareness of Internet structure, dynamics, performance, and evolution. The developed platform will address NSF's CIF21 goal of interconnecting cyberinfrastructure components and developing a comprehensive, robust, scalable shared resource that will bridge diverse communities and integrate HPC, data, software, and facilities to expand the potential of Internet-related science.


Development of Platform for Applied Network Data Analysis (PANDA)

Task 1: Improvements to existing PANDA components

Description | Projected Date | Status
1.1: Re-architect AS-rank to serve research community needs
a. develop new indexing schemes for an efficient AS path database | --- | done
b. implement tracking of changes to AS paths over time (support historic queries) | --- |
c. implement tracking of changes to AS relationships over time | --- | done
d. implement tracking of changes to customer cone sizes over time | --- | done
e. enable computation and archiving of the set of ASes comprising a customer cone | --- | in progress
f. improve AS-level visualizations to highlight structure from a given AS perspective | --- | in progress
1.2: Traceroute measurements and inferences
a. implement a smooth transition between querying archived Ark probing data and requesting on-demand measurements in real time | --- | in progress
b. link IP-level and AS-level views to display all archived and derivative data related to all networks crossed by the probed path | --- |
c. combine bdrmap and MAP-IT (by UPenn) into a unified border mapping module | --- | done
1.3: Improve MANIC functionality
b. enable comparative views of a given interconnect from different locations | Year 2 |
c. integrate geolocation information about networks | Year 3 | done
d. integrate facility-level information about interconnections | Year 2 |
e. re-architect InfluxDB to run on a cluster to support expanded community use | --- | in progress
1.4: Improve AS-level derivative data sets
a. update AS-to-organization data set and associated tools (e.g., API) | Year 3 | done
b. re-architect prefix2AS database, build API | Year 3 | done
1.5: Continue BGPStream development
a. implement bindings to the main BGPStream C library to facilitate use by external software modules | Year 1 | done
b. create distribution-specific BGPStream packages for various operating systems (Ubuntu, Debian, FreeBSD, CentOS) | Year 2 | done
c. enable consumption of Periscope BGP data through the BGPStream API | Year 2 | done
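Items 1.1d and 1.1e concern customer cones. As a minimal sketch, assuming AS relationships in CAIDA's pipe-delimited as-rel layout (`<provider-as>|<customer-as>|-1` for provider-to-customer links, `<peer-as>|<peer-as>|0` for peering, `#` for comments), the simplest recursive customer cone of an AS is the set of ASes reachable by following only provider-to-customer links downward. The function names below are hypothetical, not AS-rank's code, and AS-rank's production cone definitions may differ from this recursive version.

```python
from collections import defaultdict, deque


def parse_as_rel(lines):
    """Build a provider -> {direct customers} map from as-rel style lines.

    Assumed layout: '<as1>|<as2>|<rel>', where rel == -1 means as1 is a
    provider of as2 and rel == 0 means the two ASes peer. Peering links
    are ignored because they do not extend a customer cone.
    """
    customers = defaultdict(set)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        as1, as2, rel = line.split("|")[:3]  # tolerate extra trailing fields
        if int(rel) == -1:
            customers[int(as1)].add(int(as2))
    return customers


def customer_cone(asn, customers):
    """Return asn plus every AS reachable via provider-to-customer links (BFS)."""
    cone = {asn}
    queue = deque([asn])
    while queue:
        current = queue.popleft()
        # .get() avoids inserting empty entries into the defaultdict
        for cust in customers.get(current, ()):
            if cust not in cone:
                cone.add(cust)
                queue.append(cust)
    return cone


# Toy, made-up relationships (not measurement data).
sample = ["# toy data", "1|2|-1", "1|3|-1", "2|4|-1", "3|4|-1", "2|3|0"]
rels = parse_as_rel(sample)
print(sorted(customer_cone(1, rels)))  # [1, 2, 3, 4]
print(sorted(customer_cone(2, rels)))  # [2, 4]
```

Tracking cone sizes over time would, under the same assumptions, amount to running this computation on successive relationship snapshots and comparing `len(customer_cone(asn, rels))` per AS.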

Task 2: Linking components into a multifunctional platform

2.1: enable use of BGPStream modules by all PANDA components
2.2: enable cross-use of results from Ark, RIPE Atlas, and Periscope
2.3: build a user interface that supports queries synthesizing multiple data sets
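Item 2.3 calls for queries that synthesize multiple data sets. A minimal sketch of one such query, assuming in-memory tables: map an IP address to its origin AS by longest-prefix match over a prefix-to-AS table, then map that AS to an organization via an AS-to-organization table. All prefixes, ASNs, organization names, and function names here are hypothetical (documentation prefixes from RFC 5737 and ASNs from RFC 5398), not PANDA's actual API.

```python
import ipaddress

# Hypothetical prefix-to-AS and AS-to-organization tables.
PREFIX_TO_AS = {
    "203.0.113.0/24": 64496,
    "203.0.113.128/25": 64497,  # more-specific prefix nested inside the /24
    "198.51.100.0/24": 64498,
}
AS_TO_ORG = {
    64496: "Example Transit Co.",
    64497: "Example University",
    64498: "Example Residential ISP",
}


def build_prefix_table(prefix_to_as):
    """Parse prefixes once and sort longest-first so the first match wins."""
    table = [(ipaddress.ip_network(p), asn) for p, asn in prefix_to_as.items()]
    table.sort(key=lambda entry: entry[0].prefixlen, reverse=True)
    return table


def lookup(ip, table, as_to_org):
    """Return (matched prefix, origin ASN, organization), or None if unrouted."""
    addr = ipaddress.ip_address(ip)
    for network, asn in table:
        if addr in network:
            return str(network), asn, as_to_org.get(asn, "unknown")
    return None


table = build_prefix_table(PREFIX_TO_AS)
print(lookup("203.0.113.200", table, AS_TO_ORG))
# ('203.0.113.128/25', 64497, 'Example University')
print(lookup("203.0.113.5", table, AS_TO_ORG))
# ('203.0.113.0/24', 64496, 'Example Transit Co.')
print(lookup("192.0.2.1", table, AS_TO_ORG))  # None
```

The linear scan keeps the sketch short; a service handling full routing tables would more likely use a radix trie for longest-prefix match.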

Task 3: Integrate new external data infrastructure building blocks into PANDA

3.1: include data from home routers in MANIC
3.2: explore the possibility of using video quality reports for cross-correlation with MANIC data
3.3: incorporate large scale active measurements of DNS
3.4: integrate user traffic data from home networks with BGP-aware and IXP-aware functionality

PANDA Community Activities

Task 1: Increase community accessibility of unified platform and its underlying components

Description | Projected Date | Status
1.1: Improve ITDK
a. create a simplified version removing complex artifacts (MOAs, AS loops and sets, hyperlinks) | --- |
b. make the router graph amenable to processing by basic graph database tools | --- |
c. provide documentation | --- |
d. create an economist-friendly version | --- |
1.2: Provide data products in easier-to-use, domain-specific formats (JSON, standard graph formats, inputs for network simulators)
a. develop custom tools for format conversions | --- | ongoing
b. improve libipmeta libraries for geolocation | --- | ongoing
1.3: Create a user-friendly interface to Spoofer results accessible via PANDA
a. integrate Spoofer data into the PANDA web UI | --- | ---
b. integrate public BGP info on the stability of edge network address space (to evaluate the feasibility of deploying static access control lists) | --- | done
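Item 1.2a calls for format-conversion tools. As an illustrative sketch, assuming input in CAIDA's pipe-delimited as-rel layout (`<as1>|<as2>|-1` when as1 is a provider of as2, `<as1>|<as2>|0` for peers, `#` for comments), the function below emits a simple node/link JSON document that generic graph tools can ingest. The function name and output schema are hypothetical choices, not an existing CAIDA tool.

```python
import json

# Relationship codes assumed from the as-rel layout described above.
REL_NAMES = {-1: "provider-to-customer", 0: "peer-to-peer"}


def as_rel_to_json(lines):
    """Convert as-rel style lines into a node/link JSON string."""
    nodes = set()
    links = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Keep the first three fields; later fields (if any) are ignored.
        as1, as2, rel = (int(field) for field in line.split("|")[:3])
        if rel not in REL_NAMES:
            raise ValueError(f"unexpected relationship code {rel} in line: {line!r}")
        nodes.update((as1, as2))
        links.append({"source": as1, "target": as2, "relationship": REL_NAMES[rel]})
    doc = {"nodes": [{"id": asn} for asn in sorted(nodes)], "links": links}
    return json.dumps(doc, indent=2)


# Toy, made-up input (not measurement data).
print(as_rel_to_json(["# toy data", "1|2|-1", "2|3|0"]))
```

Failing loudly on an unknown relationship code, rather than silently dropping the line, keeps conversion errors visible to downstream users.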

Task 2: Provide support for multidisciplinary collaborations

Description | Projected Date | Status
2.1: Regularly interact with PANDA users
c. conduct annual surveys on usability and impact of the platform | --- |
2.2: Engage users via CAIDA's annual workshop series: Active Internet Measurement Systems (AIMS) and Workshop on Internet Economics (WIE)
a. present new data sets associated with and/or resulting from PANDA | --- | done
b. introduce new PANDA capabilities as they develop | --- | done
c. conduct hands-on tutorials | --- | done
2.3: Create and maintain an online community resource of project materials
a. create tutorials for using PANDA components, suitable for classroom use | --- | done
e. project wiki | --- | done (internal)
2.4: Organize and host annual external advisory board meetings
a. seek board members' advice on enriching linkages between PANDA and targeted communities | --- |
b. identify emerging national and international issues to be tackled by PANDA | --- | ongoing
c. discuss data collection and analysis developments to inform policy-making | --- |

Task 3: Develop and implement a Science Gateway style interface for interactive access to PANDA


PANDA Strategic Advisory Council

As of April 2019, the PANDA Strategic Advisory Council consists of:

  • David Clark (Chair, MIT/CSAIL)
  • kc claffy (CAIDA/UC San Diego)
  • Robert Cannon (FCC)
  • Harold Feld (Public Knowledge)
  • Shane Greenstein (Harvard Business School)
  • Geoff Huston (APNIC)
  • Scott Jordan (UC Irvine)
  • Marvin Sirbu (Carnegie Mellon University)

Publications


Acknowledgment of awarding agency's support

National Science Foundation (NSF)

This material is based on research sponsored by the National Science Foundation (NSF) grant OAC-1724853. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of NSF.


Additional Content

DIBBs: Integrated Platform for Applied Network Data Analysis (PANDA): Proposal

An abbreviated version of the original proposal is shown below.
