NSF Workshop on Overcoming Measurement Barriers to Internet Research (WOMBIR 2021)

On January 11-12, 2021, we hosted the NSF-funded Workshop on Overcoming Measurement Barriers to Internet Research (WOMBIR 2021) online via a closed-session teleconference. The sessions were not recorded for public rebroadcast.

This is the first in a series of WOMBIR workshops. The second workshop, WOMBIR-2, is to be held in April.

Date: Jan 11 (Mon) - 12 (Tue) 2021, 11am EST (8am PST) - 3:30pm EST (12:30pm PST)
Place: Closed-session Video Teleconference via Zoom

Background

A large part of computer science is the discovery and transition to practice of new concepts, advances in performance, and derivation of foundational principles. However, another essential part of the computer science agenda is to understand the behavior of large systems that arise from our earlier innovation. The Internet is the best example of a computer-science artifact so complex that its behavior cannot be derived from the specification of its components. One has to measure to see how it is behaving. For this reason, measurement needs to be an essential component of the portfolio of computer science research.

We propose to hold a virtual workshop focused on the identification of critical questions about the Internet that justify research, exploration of barriers to successful execution of that research, and collective activities that might facilitate that research. Long-term data collection and persistent infrastructure foster reproducibility and repeatability, robustness, and extensibility of research results. However, as in many fields, measurement and data sharing infrastructure is expensive to create, deploy, and maintain. We must find ways to nurture and support longitudinal data collection, but also weigh the cost of maintaining such infrastructure against the potential benefits of the generated data.

Another consideration for the workshop is whether there are critical questions that are relevant to the future of the Internet but not amenable to third-party measurement. If so, the research community may need new policies and institutions to support data collection and sharing, similar to other disciplines. Given this reality, what is the role of traditional NSF-funded academic research in advancing scientific study of the Internet?

Workshop Overview

The goals of this workshop are to identify critical research questions that warrant a call for network measurement (broadly defined), identify barriers and facilitators of that research, and discuss how research results can have impact beyond the research community.

As we think about how NSF can foster a robust community of network researchers, we see a range of issues that form an arc from question through technique to impact. We have structured this workshop around that arc: it starts with the identification of critical questions that justify research, and then explores the possible barriers to the successful execution of that research and the collective activities that might facilitate it.

For the critical questions that we identify, the workshop will explore:

What data is needed?
What infrastructure is needed to collect such data?
Given that access to data is the essential methodological component of modern data-driven analytic techniques (ML/AI), should the community collectively tackle the issues of equitable and sustainable data sharing and curation?
How can we develop best practices to facilitate cooperation or collaboration with commercial service providers as we collect data? How can the community deal with controlled sharing of proprietary data? Are practices such as secure enclaves, anonymization and acceptable usage agreements effective? What are the roles of various actors in the research ecosystem (e.g., PIs, institutions, funders) in fostering data sharing for Internet science?
The ultimate goal of network measurement is often to enable new action: how do we make knowledge useful? What steps make the data we collect useful for addressing real-world problems?

Workshop Chairs

David Clark (MIT)
John Heidemann (USC)
kc claffy (CAIDA/UC San Diego)

Steering Committee

Mattijs Jonker (U Twente)
Fabian Bustamante (Northwestern University)
David Clark (MIT)
John Heidemann (USC)
kc claffy (CAIDA/UC San Diego)

Structure of the Workshop

Potential attendees must submit a one-page white paper. Every white paper should identify one or more critical questions that can justify a call to arms. We will organize these questions into clusters, and the workshop will begin with a set of sessions, each centered on one of the clusters. In each cluster discussion, we will explore the issues listed above, with the goal of understanding, for this cluster of critical questions, a realistic path from research to impact.

We will then end the workshop with a plenary session where we compare the conclusions of the various cluster discussions, identify and explore common issues as well as issues that might have been missed, and attempt to draw overall conclusions.

We do not want to constrain the thinking behind the white papers by listing what the clusters will be. Typical topics for network measurement include performance, security, privacy, stability and resilience, growth, censorship, economics, and neutrality. But we want to hear what participants consider most important looking to the future. Whether the focus is new technology such as broadband wireless, cloud and data centers, mobile, new edge devices or underserved regions, and whether it is at layers of infrastructure, applications or user experience, the goal of each white paper should be a vision of how the field should approach challenges of the future.

Agenda

The workshop will be held on Monday and Tuesday, January 11 and 12, starting at 11am EST (8am PST) and running for about 4 hours each day. For presenters, each session will run for 1 hour and include three or four 5-minute paper presentations, each followed by 5 minutes of Q&A, with the remainder of the time reserved for general discussion.

Bolded entries indicate a talk.

January 11 (Monday)

11:00am - 11:15am EST (8:00am - 8:15am PST) Session 0: Framing the Workshop

Everyone is encouraged to attend this opening session.

Moderator: kc claffy (CAIDA/UC San Diego)

NSF Viewpoint: Deepankar Medhi (Program Director in the CNS Division), Gurdip Singh (Division Director for CNS Division)

11:15am - 12:05pm EST (8:15am - 9:05am PST) Session 1: Challenge: Understanding Properties of Internet Access

This session is focused on a challenge question that several papers addressed: What sort of measurement will be useful in addressing the digital divide between the served and the un(der)served? We organized the session under the assumption that the challenge was well understood; what is interesting is the range of measurements that different papers proposed to address it.

Moderator: David Clark (MIT/CSAIL)

What should we measure?

Udit Paul (UC Santa Barbara), Arpit Gupta (UC Santa Barbara), Elizabeth Belding (UC Santa Barbara), Matt Calder (Microsoft / Columbia University) Internet Quality Assessment and Measurement Technique Challenges slides

Challenges

Beatriz Palacios Abad (Georgia Institute of Technology), Elizabeth Belding (UC Santa Barbara), Morgan Vigil-Hayes (Northern Arizona University), Ellen Zegura (Georgia Institute of Technology) Measuring and Improving Underserved Community Access
Nick Feamster, Kyle MacMillan, Nicole Marwell, Tarun Mangla, Jamie Saxon (University of Chicago) Scale-Up Community-Level Measurements of Internet Adoption and Performance

Other Related Reading:

Henning Schulzrinne (Columbia University) How are different user demographics actually experiencing the internet?

12:15pm - 1:05pm EST (9:15am - 10:05am PST) Session 2: Broader Goals for Measurement-Based Internet Research

What should NSF’s role be in the area of network measurement as it relates to network research? How should NSF relate to other government agencies (FCC, FTC, new efforts, etc.)? What research questions might drive federal investment in the area of network measurement?

Moderator: Mattijs Jonker (U Twente)

Measurement to inform policy: who should do it?

Scott Jordan (UC Irvine) Interconnection, privacy, and broadband performance

Transparency

Ralph Holz, Roland van Rijswijk-Deij (University of Twente) Quest(s) for transparency

What is a realistic role for academic research in this modern era?

David Clark (MIT CSAIL), kc claffy (CAIDA, UC San Diego) The Role of Academic Research in a Bureau of Cyber Statistics

1:15pm - 2:05pm EST (10:15am - 11:05am PST) Session 3: Cooperation with industry and operations to get relevant data

Many papers emphasized the need to cooperate with industry to get the necessary data, and different research questions will require different approaches to obtaining it. A challenge is that industry data is often not available for peer review or subsequent research. We ask the presenters to mention the specific problem with which they are concerned, but to focus on how we can facilitate or encourage cooperation.

Moderator: David Clark (MIT/CSAIL)

What research questions warrant NSF action to improve cooperation with industry?

Paul Barford (University of Wisconsin - Madison) Bringing the Optical Layer of the Internet into Focus for Measurement-based Study
Zhi-Li Zhang (University of Minnesota), Feng Qian (University of Minnesota), Arvind Narayanan (University of Minnesota), Z. Morley Mao (University of Michigan) A Collaborative Crowd-sourced User-carrier-app Ecosystem to enable Next-generation Wireless Research
Alex Moura (RNP) The Challenge of Delivering Open OSS Data for Research slides

Other Related Reading:

Monisha Ghosh, Vanlin Sathya, Muhammad Rochman, William Payne (University of Chicago) Broadband Wireless Network Measurements Using Smartphones
Soudeh Ghorbani (Johns Hopkins) Cloud Network Trace Anonymization
Jennifer Schopf (Indiana University), Brian Tierney (ESnet), Hans Addleman (Indiana University), Doug Southworth (Indiana University) Routing Behaviors on R&E Networks
Matthew Luckie (University of Waikato), kc claffy (CAIDA, UC San Diego) Using Proprietary Data Sets to Validate Analytic Methods for Topological and Geographic Inference

2:15pm EST (11:15am PST) Session 4: New methods and analytics

A number of papers called for advances in methodology to deal with new capabilities and known barriers, and to make better use of the data (and metadata) that we have. We have collected several of these topics into this session.

Moderator: Fabian Bustamante (Northwestern University)

Capture data in a form suited to AI/ML tools. What would good labeled data be?

Ram Durairajan (University of Oregon), Reza Rejaie (University of Oregon), Walter Willinger (NIKSUN, Inc.) Challenges in Using AI/ML for Networking slides

Processing methods to deal with privacy? e.g., “Code to data”

Arpit Gupta (UC Santa Barbara), Walter Willinger (NIKSUN, Inc.) Democratizing Networking Research in the Era of AI/ML
Anita Nikolich (UIUC) Collecting, Aggregating and Sharing Better Internet Maps slides

New methods to deal with sample bias

Esteban Carisimo (Universidad de Buenos Aires), Fabian Bustamante (Northwestern University) The added value of ads - Measuring the rest of the Internet

Other analytic techniques

John Heidemann (USC/ISI), Wes Hardaker (USC/ISI), Michalis Kallitsis (MERIT Networks), Jelena Mirkovic (USC/ISI) Getting to "Join" in Privacy-Aware Data Sharing slides

Other Related Reading:

Leon Reznik, Igor Khokhlov (RIT) From Extensive Data Collection to Intensive Quality Knowledge Production
Feng Ye (University of Dayton) AI Development and Validation in Network Traffic Analytics
Achilles Petras, Paul Veitch (BT Applied Research) Data analytics & forensics for emerging cloud native converged networks

January 12 (Tuesday)

11:00am - 11:50am EST (8:00am - 8:50am PST) Session 5A: Challenges of data collection and curation

The community has looked at data collection and curation for a long time, but research continues on new approaches. We encourage speakers in this session to focus on new ideas and how they bring a fresh perspective on the challenges, and how these new approaches may influence NSF priorities. We also encourage speakers to identify specific research questions that may change the shape, characteristics, or functionality of a proposed platform.

Moderator: John Heidemann (USC)

Lack of a widely-deployed open platform for measurement

Berat Can Şenel (Sorbonne Université), Maxime Mouchet (Sorbonne Université), Justin Cappos (NYU Tandon School), Olivier Fourmaux (Sorbonne Université), Timur Friedman (Sorbonne Université), Rick McGeer (US Ignite) Shared internet-scale measurement platforms slides
Paul Barford (University of Wisconsin - Madison), Fabian Bustamante (Northwestern University) An Observatory for the Submarine Cable Network
Matt Calder (Microsoft / Columbia University), Ethan Katz-Bassett (Columbia University) Not all paths are created equal: The case for datasets weighted by traffic volume
Avi Freedman (Kentik) The importance and feasibility of traffic weighting on Internet performance analyses; and Planned approaches to obtain massively distributed edge performance observation

Other Related Reading:

Susmit Shannigrahi (Tennessee Tech), Craig Partridge (Colorado State University), Anton Betten (Colorado State University) Measuring and understanding errors in the Internet

12:00pm - 12:50pm EST (9:00am - 9:50am PST) Session 5B: Challenges of data collection and curation, Part B

A continuation of Session 5A.

Moderator: John Heidemann (USC)

Serious focus on barriers to data sharing

Richard Clayton (University of Cambridge) Position Paper: Cambridge Cyber-crime lab
Jelena Mirkovic (USC Information Sciences Institute), Stephen Hayne (Colorado State University), Michalis Kallitsis (MERIT Networks), Wes Hardaker (USC Information Sciences Institute), John Heidemann (USC Information Sciences Institute), Christos Papadopoulos (University of Memphis), Devkishen Sisodia (University of Oregon) Cybersecurity Datasets: A Mirage slides

Attention to importance of stable long-term funding and longitudinal measurement

Mark Allman (ICSI) Moving From Opportunistic to Systematic Measurement slides
Ethan Katz-Bassett, Matt Calder (Columbia University) The case for (support for) public tools and longitudinal measurements

Other Related Reading:

Emile Aben (RIPE NCC), Romain Fontugne (IIJ) Towards Fixing Internet Measurement Infrastructure Biases
Ramakrishna Padmanabhan, Alberto Dainotti (CAIDA, UC San Diego) Internet Outages: How much of a problem are they?
Amy Csizmar Dalal (Carleton College) Internet Science Starts at Home: Integrating Residential Network Data into the Internet Measurement Space
Danny Yuxing Huang (New York University), Deepak Kumar (Stanford University) Improving Measurement Vantage Points Within Smart Home Networks to Identify and Mitigate IoT Security and

1:00pm - 1:50pm EST (10:00am - 10:50am PST) Session 6: Ethics of measurement and analysis

As research has grown, the research community is increasingly recognizing the need to grapple with ethical issues in data collection and use. Reflecting on where we are, this session will consider where we should be going and what roles NSF and peer review may take. Does the community need an advisory IRB?

Moderator: David Clark (MIT/CSAIL)

Ethics of measurement and analysis

Fred Cohen (fc0.co) The limits of measurement for cyber security
Dave Levin (University of Maryland) Challenges in Measuring and Evading Nation-state Censors slides
Jason Livingood (Comcast) Policy Influence & Ethical Considerations in the Use of Measurement Data slides

2:00pm - 3:30pm EST (11:00am - 12:30pm PST) Session 7: Closing Discussion and Directions

The outcome of this workshop will be a report to NSF with recommendations about promising directions. In this session we will encourage cross-cutting discussion.

Moderator: David Clark (MIT/CSAIL)

NSF Viewpoint: Erwin Gianchandani (Deputy Assistant Director for CISE)

Roundtable

What did we learn?
What is the point you want the workshop report to make?
Will you write a paragraph?

Adjourn and fill out the exit survey

Participating in the Workshop

We solicit one-page white papers from researchers who wish to participate in the workshop. Each submission must provide the following information:

Identify one or more critical research questions.
For each question, identify which points in the above list of issues are of most concern.
Describe how NSF could support one or more aspects of the research arc (new infrastructure, new analytic techniques, data sharing, long-term collection, and making knowledge useful).

Note: It is not a goal of this workshop for participants to present their latest research!

White papers that help shape one or more sessions will help us structure the workshop to delve more deeply into the material.

The workshop will be held in a virtual format using Zoom. We expect sessions to mix lightning talks with considerable discussion time, with 4 hours on each of the two days.

Participants must commit to attend one or more of the cluster discussions and the plenary session. Participants are welcome to attend all the cluster discussions, but we envision serial breakout sessions, a format we believe will better fit a virtual workshop.

Key Dates

CFP out: Monday 30 Nov 2020
Registration deadline: Monday 21 Dec 2020 11:59:58pm PST
White papers due: Wednesday 23 Dec 2020 11:59:59pm PST (extended from Friday 18 Dec 2020)
Talk acceptance notification: Monday 31 Dec 2020
Workshop dates: Monday 11 Jan 2021 - Tuesday 12 Jan 2021
Submissions are accepted at https://ant.isi.edu/wombir2021/

Acknowledgments

Funding for the WOMBIR workshop is provided by the National Science Foundation (NSF) under grant CNS-2111828, Workshop on Overcoming Measurement Barriers to Internet Research.