NSF Workshop on Overcoming Measurement Barriers to Internet Research (WOMBIR 2021)
On January 11-12, 2021, we hosted the NSF-funded Workshop on Overcoming Measurement Barriers to Internet Research (WOMBIR 2021) online via a closed-session teleconference. The sessions were not recorded for public rebroadcast.
This is the first in a series of WOMBIR workshops. The second workshop, WOMBIR-2, is to be held in April.
Place: Closed-session Video Teleconference via Zoom
Background
A large part of computer science is the discovery and transition to practice of new concepts, advances in performance, and derivation of foundational principles. However, another essential part of the computer science agenda is to understand the behavior of large systems that arise from our earlier innovation. The Internet is the best example of a computer-science artifact so complex that its behavior cannot be derived from the specification of its components. One has to measure to see how it is behaving. For this reason, measurement needs to be an essential component of the portfolio of computer science research.
We propose to hold a virtual workshop focused on the identification of critical questions about the Internet that justify research, exploration of barriers to successful execution of that research, and collective activities that might facilitate that research. Long-term data collection and persistent infrastructure foster reproducibility and repeatability, robustness, and extensibility of research results. However, as in many fields, measurement and data sharing infrastructure is expensive to create, deploy, and maintain. We must find ways to nurture and support longitudinal data collection, but also weigh the cost of maintaining such infrastructure against the potential benefits of the generated data.
Another consideration for the workshop is whether there are critical questions that are relevant to the future of the Internet but not amenable to third-party measurement. If so, the research community may need new policies and institutions to support data collection and sharing, similar to other disciplines. Given this reality, what is the role of traditional NSF-funded academic research in advancing scientific study of the Internet?
Workshop Overview
The goals of this workshop are to identify critical research questions that warrant a call for network measurement (broadly defined), identify barriers and facilitators of that research, and discuss how research results can have impact beyond the research community.
As we think about how NSF can foster a robust community of network researchers, we see a range of issues that form an arc from question through technique to impact. We have structured this workshop around that arc: it starts with the identification of critical questions that justify research, and then explores the possible barriers to the successful execution of that research and the collective activities that might facilitate it.
For the critical questions that we identify, the workshop will explore:
- What data is needed?
- What infrastructure is needed to collect such data?
- Given that access to data is the essential methodological component of modern data-driven analytic techniques (ML/AI), should the community collectively tackle the issues of equitable and sustainable data sharing and curation?
- How can we develop best practices to facilitate cooperation or collaboration with commercial service providers as we collect data? How can the community deal with controlled sharing of proprietary data? Are practices such as secure enclaves, anonymization and acceptable usage agreements effective? What are the roles of various actors in the research ecosystem (e.g., PIs, institutions, funders) in fostering data sharing for Internet science?
- The ultimate goal of network measurement is often to enable new action--how do we make knowledge useful? What steps make the data we collect useful for addressing real-world problems?
Workshop Chairs
- David Clark (MIT)
- John Heidemann (USC)
- kc claffy (CAIDA/UC San Diego)
Steering Committee
- Mattijs Jonker (U Twente)
- Fabian Bustamante (Northwestern University)
- David Clark (MIT)
- John Heidemann (USC)
- kc claffy (CAIDA/UC San Diego)
Structure of the Workshop
Potential attendees must submit a one-page white paper. Every white paper should identify one or more critical questions that can justify a call to arms. We will organize these questions into clusters, and the workshop will begin with a set of sessions, each centered on one of the clusters. In each cluster discussion, we will explore the issues listed above, with the goal of understanding, for this cluster of critical questions, a realistic path from research to impact.
We will then end the workshop with a plenary session where we compare the conclusions of the various cluster discussions, identify and explore common issues as well as issues that might have been missed, and attempt to draw overall conclusions.
We do not want to constrain the thinking behind the white papers by listing what the clusters will be. Typical topics for network measurement include performance, security, privacy, stability and resilience, growth, censorship, economics, and neutrality. But we want to hear what participants consider most important looking to the future. Whether the focus is new technology such as broadband wireless, cloud and data centers, mobile, new edge devices or underserved regions, and whether it is at layers of infrastructure, applications or user experience, the goal of each white paper should be a vision of how the field should approach challenges of the future.
Agenda
The workshop will be held on Monday and Tuesday, January 11 and 12, starting at 11am EST (8am PST) and running for about 4 hours each day. For presenters: each session runs about 1 hour, with three or four 5-minute paper presentations, each followed by 5 minutes of Q&A, and the remaining time given to general discussion.
Bolded entries indicate a talk.
January 11 (Monday)
- 11:00am - 11:15am EST (8:00am - 8:15am PST) Session 0: Framing the Workshop. Everyone is encouraged to attend this opening session.
- 11:15am - 12:05pm EST (8:15am - 9:05am PST) Session 1: Challenge: Understanding Properties of Internet Access. This session focuses on a challenge question that several papers addressed: what sort of measurement will be useful in addressing the digital divide -- the served and the un(der)served? We organized the session under the assumption that the challenge is well understood; what is interesting is the range of measurements that different papers proposed to address it.
- What should we measure?
- Challenges
- Measuring and Improving Underserved Community Access
- Scale-Up Community-Level Measurements of Internet Adoption and Performance
- Other Related Reading:
- 12:15pm - 1:05pm EST (9:15am - 10:05am PST) Session 2: Broader Goals for Measurement-Based Internet Research. What should NSF’s role be in the area of network measurement as it relates to network research? How should NSF relate to other government agencies (FCC, FTC, new efforts, etc.)? What research questions might drive federal investment in the area of network measurement?
- Measurement to inform policy: who should do it?
- Transparency
- What is a realistic role for academic research in this modern era?
- 1:15pm - 2:05pm EST (10:15am - 11:05am PST) Session 3: Cooperation with industry and operations to get relevant data. Many papers emphasized the need to cooperate with industry to get the necessary data. Different research questions will require different approaches to getting that data. A challenge is that industry data is often not available for peer review or subsequent research. We ask the presenters to mention the specific problem with which they are concerned, but to focus on how we can facilitate or encourage cooperation.
- What research questions warrant NSF action to improve cooperation with industry?
- Bringing the Optical Layer of the Internet into Focus for Measurement-based Study
- A Collaborative Crowd-sourced User-carrier-app Ecosystem to enable Next-generation Wireless Research
- The Challenge of Delivering Open OSS Data for Research
- Other Related Reading:
- 2:15pm EST (11:15am PST) Session 4: New methods and analytics. A number of papers called for advances in methodology to deal with new capabilities and known barriers, and to make better use of the data (and metadata) that we have. We have collected several of these topics into this session.
- Capture data in a form suited to AI/ML tools. What would good labeled data be?
- Processing methods to deal with privacy (e.g., “Code to data”)?
- Democratizing Networking Research in the Era of AI/ML
- Collecting, Aggregating and Sharing Better Internet Maps
- New methods to deal with sample bias
- Other analytic techniques
- Other Related Reading:
Moderator (Session 0): kc claffy (CAIDA/UC San Diego)
NSF Viewpoint: Deepankar Medhi (Program Director in the CNS Division), Gurdip Singh (Division Director for CNS Division)
Moderator (Session 1): David Clark (MIT/CSAIL)
Moderator (Session 2): Mattijs Jonker (U Twente)
Moderator (Session 3): David Clark (MIT/CSAIL)
Moderator (Session 4): Fabian Bustamante (Northwestern University)
January 12 (Tuesday)
- 11:00am - 11:50am EST (8:00am - 8:50am PST) Session 5A: Challenges of data collection and curation. The community has looked at data collection and curation for a long time, but research continues on new approaches. We encourage speakers in this session to focus on new ideas and how they bring a fresh perspective on the challenges, and how these new approaches may influence NSF priorities. We also encourage speakers to identify specific research questions that may change the shape, characteristics, or functionality of a proposed platform.
- Lack of a widely-deployed open platform for measurement
- Shared internet-scale measurement platforms
- An Observatory for the Submarine Cable Network
- Not all paths are created equal: The case for datasets weighted by traffic volume
- The importance and feasibility of traffic weighting on Internet performance analyses; and Planned approaches to obtain massively distributed edge performance observation
- Other Related Reading
- 12:00pm - 12:50pm EST (9:00am - 9:50am PST) Session 5B: Challenges of data collection and curation, Part B. A continuation of Session 5A.
- Serious focus on barriers to data sharing
- Attention to importance of stable long-term funding and longitudinal measurement
- Moving From Opportunistic to Systematic Measurement
- The case for (support for) public tools and longitudinal measurements
- Other related reading:
- Towards Fixing Internet Measurement Infrastructure Biases
- Internet Outages: How much of a problem are they?
- Internet Science Starts at Home: Integrating Residential Network Data into the Internet Measurement Space
- Improving Measurement Vantage Points Within Smart Home Networks to Identify and Mitigate IoT Security and Privacy Risks
- 1:00pm - 1:50pm EST (10:00am - 10:50am PST) Session 6: Ethics of measurement and analysis. As research has grown, the research community is increasingly recognizing the need to grapple with ethical issues in data collection and use. Reflecting on where we are, this session will consider where we should be going and what roles NSF and peer review may take. Does the community need an advisory IRB?
- Ethics of measurement and analysis
- 2:00pm - 3:30pm EST (11:00am - 12:30pm PST) Session 7: Closing Discussion and Directions. The outcome of this workshop will be a report to NSF with recommendations about promising directions. In this session we will encourage cross-cutting discussion.
- Roundtable
- What did we learn?
- What is the point you want the workshop report to make?
- Will you write a paragraph?
- Adjourn and fill out exit survey
Moderator (Session 5A): John Heidemann (USC)
Moderator (Session 5B): John Heidemann (USC)
Moderator (Session 6): David Clark (MIT/CSAIL)
Moderator (Session 7): David Clark (MIT/CSAIL)
NSF Viewpoint: Erwin Gianchandani (Deputy Assistant Director for CISE)
Participating in the Workshop
We solicit one-page white papers from researchers who wish to participate in the workshop. Each submission must provide the following information:
- Identify one or more critical research questions.
- For each question, identify which points in the above list of issues are of most concern.
- Describe how NSF could support one or more aspects of the research arc (new infrastructure, new analytic techniques, data sharing, long-term collection, and making knowledge useful).
Note: It is not a goal of this workshop for participants to present their latest research!
White papers that help shape one or more sessions will help us structure the workshop to delve more deeply into the material.
The workshop will be held in a virtual format using Zoom. We expect sessions to mix lightning talks with considerable discussion time, with 4 hours on each of the two days.
Participants must commit to attend one or more of the cluster discussions and the plenary session. Participants are welcome to attend all the cluster discussions, but we envision serial breakout sessions, a format we believe will better fit a virtual workshop.
- CFP out: Monday 30 Nov 2020
- Registration deadline: Monday 21 Dec 2020 11:59:59pm PST
- White papers due: Wednesday 23 Dec 2020 11:59:59pm PST (extended from Friday 18 Dec 2020)
- Talk acceptance notification: Thursday 31 Dec 2020
- Workshop dates: Monday 11 Jan 2021 - Tuesday 12 Jan 2021
- Submissions are accepted at https://ant.isi.edu/wombir2021/