Integrated Library for Advancing Network Data Science (ILANDS)
This work is conducted in collaboration with subcontractors at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and the University of Oregon's Network Startup Resource Center (NSRC).
Principal Investigators: kc claffy, David Clark
Funding source: CNS-2120399
Period of performance: October 1, 2021 to September 30, 2026
Project Summary
Understanding the Internet’s changing character is impossible without realistic and representative datasets. It also requires measurement infrastructure that can support both sustained longitudinal measurements and new experiments, and that makes the resulting data available to scientific researchers. Yet good data to support research is scarce, for several reasons: the complexity, scale, and cost of measurement instrumentation; the information-hiding properties of the routing system; security and commercial sensitivities; the costs of storing and processing the data; and the lack of incentives to gather data in the first place. This lack of data hinders our ability to understand and reason about real-world properties of the Internet such as robustness, resilience, security, and stability.
CAIDA and NSRC propose to upgrade and integrate two of our measurement capabilities, 100 Gbps traffic capture and BGP routing data collection. This will enable a community of researchers across many institutions to collaborate on a focused, high-level research agenda.
We propose to enhance this infrastructure to handle 100 Gbps packet rates and projected routing table growth. This work includes deploying enhanced storage and compute resources to support long-term use of the data.
Our proposed approach integrates the community into the process from the beginning. This aligns the research goals and directs NSF’s investment toward achieving them. Our outreach coordination process will have five objectives: (1) shape what data we collect and store, (2) find new users of the infrastructure, especially from underrepresented groups, (3) bring our focus research collaborators together, (4) publish research results and analysis methods, and (5) establish a sustainability plan.
Projected Timeline
Task | Description | Projected Date | Organizations | Status |
---|---|---|---|---|
1 | Traffic Data Infrastructure Enhancements | | | |
1.1 | Build 100 Gbps traffic monitor | Year 1 | CAIDA | Done |
1.2 | Test and evaluate monitors | Year 2 | CAIDA | Done |
1.3 | Deploy monitors | Year 2 | CAIDA, DREN, Kentik | Done |
1.4 | Establish data enclave at CAIDA | Year 4 | CAIDA | |
1.5 | Manage and share traces | Year 3 | CAIDA | Done |
1.6 | Augment with Kentik data sources | Year 4 | CAIDA, Kentik | |
1.7 | User training and support | Year 5 | CAIDA | |
2 | BGP Routing Data Infrastructure Enhancements | | | |
2.1 | Enhance BGPStream service broker | Year 3 | NSRC, CAIDA | Done |
2.2 | Interface to scamper at RouteViews | Year 2 | CAIDA, NSRC, Waikato | Done |
2.3 | Scale up BGP data collection | Ongoing | CAIDA, NSRC | Ongoing |
2.4 | Upgrade libbgpstream | Year 4 | CAIDA, NSRC | Done |
2.5 | Data integrity and quality controls | Ongoing | CAIDA, NSRC | Ongoing |
2.6 | Authentication | Year 3 | CAIDA | Done |
2.7 | RouteViews infrastructure updates | Ongoing | NSRC | Ongoing |
3 | Outreach and Community Engagement | | | |
3.1 | Catalog management | Ongoing | CAIDA | Ongoing |
3.2 | Ongoing user support | Ongoing | CAIDA | Ongoing |
3.3 | Biannual newsletters | Ongoing | CAIDA | Ongoing |
3.4 | Biannual community meetings | Ongoing | CAIDA | Ongoing |
3.5 | Annual community workshops | Ongoing | CAIDA | Ongoing |
3.6 | Annual community surveys | Ongoing | CAIDA | Ongoing |
3.7 | Sustainability plan report | Ongoing | CAIDA | Ongoing |
Collaborators
ILANDS relies on substantial participation by CISE researchers from the following organizations to advance our focused research agenda (listed alphabetically):
- Army Cyber Institute at West Point
- California Institute of Technology (Caltech)
- Canadian Internet Registration Authority (CIRA)
- Carnegie Mellon University (CMU)
- Colgate University
- Columbia University
- Freie Universität Berlin
- HAW Hamburg
- Indiana University
- International Computer Science Institute (ICSI)
- Kentik
- Microsoft Research
- MIT/Computer Science and Artificial Intelligence Laboratory (CSAIL)
- MIT/Lincoln Laboratory
- Princeton University
- Purdue University
- RIPE NCC
- UC Davis
- Universidad de Buenos Aires
- University of Illinois Urbana-Champaign (UIUC)
- University of Minnesota
- University of Oregon/Network Startup Resource Center (NSRC)
- University of Waikato
- USC/ISI
Acknowledgment of awarding agency’s support
This material is based on research sponsored by the National Science Foundation (NSF) grant CNS-2120399. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of NSF.
Additional Content
Proposal for CCRI: Integrated Library for Advancing Network Data Science (ILANDS)