PANDA: Integrated Platform for Applied Network Data Analysis
We are developing a new Platform for Applied Network Data Analysis (PANDA) that will offer researchers more accessible, calibrated, and user-friendly tools for collecting, analyzing, querying, and interpreting measurements of the Internet ecosystem.
Principal Investigators: kc claffy, Bradley Huffaker, Alberto Dainotti, Amogh Dhamdhere, Alistair King
Funding source: OAC-1724853. Period of performance: September 1, 2017 - August 31, 2022.
Project Summary
For the last 20 years, CAIDA has developed many data-focused services, products, tools, and resources to advance the study of the Internet, which has permeated disciplines ranging from theoretical computer science to political science, from physics to techlaw, and from network architecture to public policy. As the Internet and our dependence on it have grown, the structure and dynamics of the network, and how they relate to the political economy in which it is embedded, have attracted increasing attention from researchers, operators, and policy makers, all of whom bring questions that they lack the capability to answer themselves. CAIDA has spent years cultivating relationships across disciplines (networking, security, economics, law, policy) with those interested in CAIDA data, but the impact thus far has been limited to a handful of researchers. The current mode of collaboration simply does not scale to the exploding interest in scientific study of the Internet.
In response to feedback from these communities, we will integrate existing research infrastructure measurement and analysis components previously developed by CAIDA into a new Platform for Applied Network Data Analysis (PANDA). Our goal is to enable new scientific directions, experiments and data products for a wide set of researchers from the four targeted disciplines: networking, security, economics, and public policy. We will emphasize efficient indexing and processing of terabyte archives, advanced visualization tools to show geographic and economic aspects of Internet structure, and careful interpretation of displayed results. To prove that our platform is easily extensible and adaptable to new opportunities, we will seek to augment it with new data products for unmet research needs: a comprehensive DNS data set (facilitating mapping network behavior to a human view) and anonymized residential network traffic data (supporting privacy sensitive security monitoring of in-home networks by even non-technical users).
We will ensure active engagement of our collaborators: organize annual workshops; develop online video tutorials targeting non-networking experts as well as classroom-focused materials; maintain an annotated bibliography and discussion forum; and institute an advisory board to provide strategic directions.
The success of our project will enable new empirical studies in the four targeted disciplines, promising innovations in: Internet mapping and path prediction; detection of route hijacking and other disruptive events; cybersecurity preparedness; economic studies of correlations between ISP characteristics, market power, performance degradations, security practices, and regional economic growth; and regulatory discourse that has thus far occurred largely without data. It will lower the threshold to use CAIDA's data products and tools for R&E needs, inform discussion of critical issues in current and future large-scale networking, and increase public awareness about Internet structure, dynamics, performance, and evolution. The developed platform will address NSF's CIF21 goal of interconnecting cyberinfrastructure components and developing a comprehensive, robust, scalable shared resource that will bridge diverse communities and integrate HPC, data, software, and facilities to expand the potential of Internet-related science.
Development of Platform for Applied Network Data Analysis (PANDA)
Task 1: Improvements of existing PANDA components
Item | Description | Projected Date | Status |
---|---|---|---|
1.1: Re-architect AS-rank to serve research community needs. | |||
a. | develop new indexing schemes for an efficient AS path database | --- | done |
b. | implement tracking of changes to AS paths over time (support historic queries) | --- | |
c. | implement tracking of changes to AS relationships over time | --- | done |
d. | implement tracking of changes to customer cone sizes over time | --- | done |
e. | enable computation and archiving of the set of ASes comprising a customer cone | --- | in progress |
f. | improve AS-level visualizations to highlight structure from a given AS perspective | --- | in progress |
1.2: Traceroute measurements and inferences | |||
a. | implement smooth transition between querying archived Ark probing data and requesting on-demand measurements in real time | --- | in progress |
b. | link IP-level and AS-level views to display all archived and derivative data related to all networks crossed by the probed path | --- | |
c. | combine bdrmap and MAP-IT (by UPenn) into a unified border mapping module | --- | done |
1.3: Improve MANIC functionality | |||
b. | enable comparative views of a given interconnect from different locations | Year 2 | |
c. | integrate geolocation information about networks | Year 3 | done |
d. | integrate facility-level information about interconnections | Year 2 | |
e. | re-architect influxDB to work on a cluster to support expanded community use | --- | in progress |
1.4: Improve AS-level derivative data sets | |||
a. | update AS-to-organization data set and associated tools (e.g., API) | Year 3 | done |
b. | re-architect prefix2AS database, build API | Year 3 | done |
1.5: Continue BGPStream development | |||
a. | implement bindings to the main BGPStream C library to facilitate use by external software modules | Year 1 | done |
b. | create distribution-specific BGPStream packages for various OSes (Ubuntu, Debian, FreeBSD, CentOS) | Year 2 | done |
c. | enable consumption of Periscope BGP data through the BGPStream API | Year 2 | done |
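
To make item 1.1e concrete, the following is a minimal sketch of one simple way to compute the set of ASes in a customer cone. It is not CAIDA's implementation. It assumes the simplest "recursive" definition (an AS, plus every AS reachable from it by following provider-to-customer links), and the function name `customer_cones` and the toy topology are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def customer_cones(p2c_links):
    """Map each AS to its recursive customer cone (the AS itself included).

    p2c_links is an iterable of (provider_asn, customer_asn) pairs.
    This is an illustrative definition, not the AS Rank algorithm.
    """
    customers = defaultdict(set)
    ases = set()
    for provider, customer in p2c_links:
        customers[provider].add(customer)
        ases.update((provider, customer))

    cones = {}
    for asn in ases:
        cone = {asn}
        stack = [asn]
        # Iterative depth-first walk; the membership check also guards
        # against cycles in inferred relationship data.
        while stack:
            current = stack.pop()
            for cust in customers.get(current, ()):
                if cust not in cone:
                    cone.add(cust)
                    stack.append(cust)
        cones[asn] = cone
    return cones


# Hypothetical toy topology: (provider, customer) pairs.
links = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
cones = customer_cones(links)
print(sorted(cones[1]))   # members of AS 1's cone
print(len(cones[2]))      # customer cone size of AS 2
```

Archiving the full member set, rather than only its size, is what allows later queries such as "which ASes left this cone between two snapshots".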
Task 2: Linking components into a multifunctional platform
2.1: enable use of BGPStream modules by all PANDA components | |||
2.2: enable cross-use of results from Ark, RIPE Atlas, and Periscope | |||
2.3: user interface that allows queries that synthesize multiple data sets |
Task 3: Integrate new external data infrastructure building blocks into PANDA
3.1: include data from home routers into MANIC | |||
3.2: explore the possibility of using video quality reports for cross-correlation with MANIC data | |||
3.3: incorporate large scale active measurements of DNS | |||
3.4: integrate user traffic data from home networks with BGP-aware and IXP-aware functionality |
PANDA Community Activities
Task 1: Increase community accessibility of unified platform and its underlying components
Item | Description | Projected Date | Status |
---|---|---|---|
1.1: Improve ITDK | |||
a. | create a simplified version removing complex artifacts (MOAs, AS loops and sets, hyperlinks) | --- | |
b. | render router graph amenable to processing by basic graph database tools | --- | |
c. | provide documentation | --- | |
d. | create an economist-friendly version | --- | |
1.2: Provide data products in easier-to-use, domain-specific formats (JSON, standard graph formats, inputs for network simulators) | |||
a. | develop custom tools for format conversions | --- | ongoing |
b. | improve libipmeta libraries for geolocation | --- | ongoing |
1.3: Create user-friendly interface to spoofer results accessible via PANDA | |||
a. | integrate spoofer data into PANDA web UI | --- | --- |
b. | integrate public BGP info on the stability of edge network address space (to evaluate the feasibility of deploying static access control lists) | --- | done |
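
As a rough illustration of the kind of format conversion described in item 1.2, the sketch below turns pipe-delimited AS relationship lines into JSON. The input layout (`<AS1>|<AS2>|<relationship>`, with `-1` meaning provider-to-customer and `0` meaning peer-to-peer), the output field names, and the function name `as_rel_to_json` are all assumptions made for this example; they do not describe any specific PANDA tool.

```python
import json

# Assumed relationship codes for this sketch.
REL_NAMES = {-1: "provider-to-customer", 0: "peer-to-peer"}


def as_rel_to_json(lines):
    """Convert '<AS1>|<AS2>|<rel>' lines into a JSON document string."""
    links = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        as1, as2, rel = line.split("|")[:3]
        links.append({
            "as1": int(as1),
            "as2": int(as2),
            "relationship": REL_NAMES[int(rel)],
        })
    return json.dumps({"links": links}, indent=2)


# Hypothetical sample input.
sample = ["# illustrative data only", "1|2|-1", "2|3|0"]
result = as_rel_to_json(sample)
print(result)
```

A self-describing output like this trades compactness for ease of use with generic JSON tooling, which is the audience item 1.2 targets.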
Task 2: Provide support for multidisciplinary collaborations
Item | Description | Projected Date | Status |
---|---|---|---|
2.1: Regularly interact with PANDA users | |||
c. | conduct annual surveys on usability and impact of the platform | --- | |
2.2: Engage users via CAIDA annual workshop series: Active Internet Measurement Systems (AIMS) and Workshop on Internet Economics (WIE) | |||
a. | present new data sets associated with and/or resulting from PANDA | --- | done |
b. | introduce new PANDA capabilities as they develop | --- | done |
c. | conduct hands-on tutorials | --- | done |
2.3: Create and maintain an online community resource of project materials | |||
a. | create tutorials for using PANDA components, suitable for classroom use | --- | done |
e. | project wiki | --- | done (internal) |
2.4: Organize and host annual external advisory board meetings | |||
a. | seek board members' advice on enriching linkages between PANDA and targeted communities | --- | |
b. | identify emerging national and international issues to be tackled by PANDA | --- | ongoing |
c. | discuss data collection and analysis developments to inform policy-making | --- |
Task 3: Develop and implement a Science Gateway style interface for interactive access to PANDA
PANDA Strategic Advisory Council
As of April 2019, the PANDA Strategic Advisory Council consists of:
- David Clark (Chair, MIT/CSAIL)
- kc claffy (CAIDA/UC San Diego)
- Robert Cannon (FCC)
- Harold Feld (Public Knowledge)
- Shane Greenstein (Harvard Business School)
- Geoff Huston (APNIC)
- Scott Jordan (UC Irvine)
- Marvin Sirbu (Carnegie Mellon University)
Publications
- Access Denied: Assessing Physical Risks to Internet Access Networks.
  A. Marder, Z. Zhang, R. Mok, R. Padmanabhan, B. Huffaker, M. Luckie, A. Dainotti, k. claffy, A. Snoeren, A. Schulman.
  USENIX Security Symposium, Aug 2023.
- A path forward: Improving Internet routing security by enabling trust zones.
  D. Clark, C. Testart, M. Luckie, k. claffy.
  Federal Communications Commission (FCC), Jul 2023.
- Notice of Ex Parte Meeting, Secure Internet Routing.
  k. claffy, D. Clark.
  Federal Communications Commission (FCC), Jan 2023.
- Mind Your MANRS: Measuring the MANRS Ecosystem.
  B. Du, C. Testart, R. Fontugne, G. Akiwate, A. Snoeren, k. claffy.
  ACM Internet Measurement Conference (IMC), Oct 2022.
- No Time for Downtime: Understanding Post-Attack Behaviors by Customers of Managed DNS Providers.
  M. Haq, M. Jonker, R. Van Rijswijk-Deij, k. claffy, L. Nieuwenhuis, A. Abhishta.
  IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), Jun 2022.
- Challenges in measuring the Internet for the public Interest.
  k. claffy, D. Clark.
  Journal of Information Policy, May 2022.
- Temporal Correlation of Internet Observatories and Outposts.
  J. Kepner, M. Jones, D. Andersen, A. Buluc, C. Byun, k. claffy, T. Davis, W. Arcand, J. Bernays, D. Bestor, W. Bergeron, V. Gadepally, D. Grant, M. Houle, M. Hubbell, H. Jananthan, A. Klein, C. Meiners, L. Milechin, A. Morris, J. Mullen, S. Pisharody, A. Prout, A. Reuther, A. Rosa, S. Samsi, D. Stetson, C. Yee, P. Michaleas.
  Workshop on Graphs, Architectures, Programming, and Learning (GrAPL), May 2022.
- Design and Implementation of Web-based Speed Test Analysis Tool Kit.
  R. Yang, R. Mok, S. Wu, X. Luo, H. Zou, W. Li.
  Passive and Active Measurement Conference (PAM), Mar 2022.
- IRR Hygiene in the RPKI Era.
  B. Du, G. Akiwate, T. Krenc, C. Testart, A. Marder, B. Huffaker, A. Snoeren, k. claffy.
  Passive and Active Measurement Conference (PAM), Mar 2022.
- Jitterbug: A new framework for jitter-based congestion inference.
  E. Carisimo, R. Mok, D. Clark, k. claffy.
  Passive and Active Measurement Conference (PAM), Mar 2022.
- Learning to Extract Geographic Information from Internet Router Hostnames.
  M. Luckie, B. Huffaker, A. Marder, Z. Bischof, M. Fletcher, k. claffy.
  ACM SIGCOMM Conference on emerging Networking EXperiments and Technologies (CoNEXT), Dec 2021.
- Learning Regexes to Extract Network Names from Hostnames.
  M. Luckie, A. Marder, B. Huffaker, k. claffy.
  Asian Internet Engineering Conference (AINTEC), Dec 2021.
- Inferring Regional Access Network Topologies: Methods and Applications.
  Z. Zhang, A. Marder, R. Mok, B. Huffaker, M. Luckie, k. claffy, A. Schulman.
  ACM Internet Measurement Conference (IMC), Nov 2021.
- IRR Hygiene in the RPKI Era.
  B. Du, A. Snoeren, k. claffy.
  ACM Internet Measurement Conference (IMC) Poster, Nov 2021.
- Measuring the network performance of Google Cloud Platform.
  R. Mok, H. Zou, R. Yang, T. Koch, E. Katz-Bassett, k. claffy.
  ACM Internet Measurement Conference (IMC), Nov 2021.
- Risky BIZness: Risks Derived from Registrar Name Management.
  G. Akiwate, S. Savage, G. Voelker, k. claffy.
  ACM Internet Measurement Conference (IMC), Nov 2021.
- Characterization of Anycast Adoption in the DNS Authoritative Infrastructure.
  R. Sommese, G. Akiwate, M. Jonker, G. Moura, M. Davids, R. Van Rijswijk-Deij, G. Voelker, S. Savage, k. claffy, A. Sperotto.
  Network Traffic Measurement and Analysis Conference (TMA), Sep 2021.
- Challenges in measuring the Internet for the public Interest.
  D. Clark, k. claffy.
  Research Conference on Communications, Information, and Internet Policy (TPRC), Aug 2021.
- Spatial Temporal Analysis of 40,000,000,000,000 Internet Darkspace Packets.
  J. Kepner, M. Jones, D. Andersen, A. Buluc, C. Byun, k. claffy, T. Davis, W. Arcand, J. Bernays, D. Bestor, W. Bergeron, V. Gadepally, M. Houle, M. Hubbell, A. Klein, C. Meiners, L. Milechin, J. Mullen, S. Pisharody, A. Prout, A. Reuther, A. Rosa, S. Samsi, D. Stetson, A. Tse, C. Yee, P. Michaleas.
  IEEE High Performance Extreme Computing Conference (HPEC), Aug 2021.
- Workshop on Overcoming Measurement Barriers to Internet Research (WOMBIR 2021) Final Report.
  k. claffy, D. Clark, F. Bustamante, J. Heidemann, M. Jonker, A. Schulman, E. Zegura.
  ACM SIGCOMM Computer Communication Review (CCR), Jul 2021.
- Workshop on Internet Economics (WIE 2020) Final Report.
  k. claffy, D. Clark.
  ACM SIGCOMM Computer Communication Review (CCR), Apr 2021.
- The impact of the General Data Protection Regulation on internet interconnection.
  R. Zhuo, B. Huffaker, k. claffy, S. Greenstein.
  Telecommunications Policy, Mar 2021.
- Inferring Cloud Interconnections: Validation, Geolocation, and Routing Behavior.
  A. Marder, k. claffy, A. Snoeren.
  Passive and Active Measurement Conference (PAM), Mar 2021.
- A Data-Driven Approach to Understanding the State of Internet Routing Security.
  C. Testart, D. Clark.
  Research Conference on Communications, Information, and Internet Policy (TPRC), Feb 2021.
- Measuring the impact of COVID-19 on cloud network performance.
  R. Mok, k. claffy.
  COVID-19 Network Impacts Workshop, Nov 2020.
- Improving the Efficiency of QoE Crowdtesting.
  R. Mok, G. Kawaguti, J. Okamoto.
  ACM Quality of Experience in Visual Multimedia Applications (QOEVMA), Oct 2020.
- Learning to Extract and Use ASNs in Hostnames.
  M. Luckie, A. Marder, M. Fletcher, B. Huffaker, k. claffy.
  ACM Internet Measurement Conference (IMC), Oct 2020.
- Trufflehunter: Cache Snooping Rare Domains at Large Public DNS Resolvers.
  A. Randall, E. Liu, G. Akiwate, R. Padmanabhan, G. Voelker, S. Savage, A. Schulman.
  ACM Internet Measurement Conference (IMC), Oct 2020.
- Unresolved Issues: Prevalence, Persistence, and Perils of Lame Delegations.
  G. Akiwate, M. Jonker, R. Sommese, I. Foster, G. Voelker, S. Savage, k. claffy.
  ACM Internet Measurement Conference (IMC), Oct 2020.
- Spoofed traffic inference at IXPs: Challenges, methods and analysis.
  L. Müller, M. Luckie, B. Huffaker, k. claffy, M. Barcellos.
  Computer Networks, Aug 2020.
- Policy challenges in mapping Internet interdomain congestion.
  k. claffy, D. Clark, S. Bauer, A. Dhamdhere.
  Journal of Information Policy, Jul 2020.
- The Forgotten Side of DNS: Orphan and Abandoned Records.
  R. Sommese, M. Jonker, R. Van Rijswijk-Deij, A. Dainotti, k. claffy, A. Sperotto.
  Workshop on Traffic Measurements for Cybersecurity, Jun 2020.
- vrfinder: Finding Outbound Addresses in Traceroute.
  A. Marder, M. Luckie, B. Huffaker, k. claffy.
  SIGMETRICS, Jun 2020.
- RIPE IPmap Active Geolocation: Mechanism and Performance Evaluation.
  B. Du, M. Candela, B. Huffaker, A. Snoeren, k. claffy.
  ACM SIGCOMM Computer Communication Review (CCR), Apr 2020.
- APPLE: Alias Pruning by Path Length Estimation.
  A. Marder.
  Passive and Active Measurement Conference (PAM), Mar 2020.
- To Filter or not to Filter: Measuring the Benefits of Registering in the RPKI Today.
  C. Testart, P. Richter, A. King, A. Dainotti, D. Clark.
  Passive and Active Measurement Conference (PAM), Mar 2020.
- Unintended consequences: Effects of submarine cable deployment on Internet routing.
  R. Fanou, B. Huffaker, R. Mok, k. claffy.
  Passive and Active Measurement Conference (PAM), Mar 2020.
- When parents and children disagree: Diving into DNS delegation inconsistency.
  R. Sommese, G. Moura, M. Jonker, R. Van Rijswijk-Deij, A. Dainotti, k. claffy, A. Sperotto.
  Passive and Active Measurement Conference (PAM), Mar 2020.
- Challenges in Inferring Spoofed Traffic at IXPs.
  L. Müller, M. Luckie, B. Huffaker, k. claffy, M. Barcellos.
  ACM SIGCOMM Conference on emerging Networking EXperiments and Technologies (CoNEXT), Dec 2019.
- Profiling BGP Serial Hijackers: Capturing Persistent Misbehavior in the Global Routing Table.
  C. Testart, P. Richter, A. King, A. Dainotti, D. Clark.
  ACM Internet Measurement Conference (IMC), Oct 2019.
- Regulation When Platforms Are Layered.
  W. Lehr, D. Clark, S. Bauer, k. claffy.
  Telecommunications Policy Research Conference (TPRC), Sep 2019.
- Toward a Theory of Harms in the Internet Ecosystem.
  D. Clark, k. claffy.
  Telecommunications Policy Research Conference (TPRC), Sep 2019.
- The 11th Workshop on Active Internet Measurements (AIMS-11) Workshop Report.
  k. claffy, D. Clark.
  ACM SIGCOMM Computer Communication Review (CCR), Jul 2019.
- Workshop on Internet Economics (WIE2018) Final Report.
  k. claffy, D. Clark.
  ACM SIGCOMM Computer Communication Review (CCR), Apr 2019.
- Exploring and Analysing the African Web Ecosystem.
  R. Fanou, G. Tyson, E. Leao Fernandes, P. Francois, F. Valera, A. Sathiaseelan.
  ACM Transactions on the Web (TWEB), Nov 2018.
Acknowledgment of awarding agency's support
This material is based on research sponsored by the National Science Foundation (NSF) grant OAC-1724853. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of NSF.
Additional Content
DIBBs: Integrated Platform for Applied Network Data Analysis (PANDA): Proposal
An abbreviated version of the original proposal is shown below.