Sustainable data-handling and analysis methodologies for the IRNC networks

The goal of this IRNC Special Project is to identify and support the measurement priorities of the International Research Network Connections community.

Sponsored by:
National Science Foundation (NSF)

Principal Investigator: kc claffy

Funding source: OCI-0963073
Period of performance: March 1, 2010 - February 28, 2014.

Project Summary

Effective Internet measurement raises daunting issues for the research community and funding agencies. Improved understanding of the structure and dynamics of Internet topology, routing, workload, performance, and vulnerabilities remains a disturbingly elusive priority, in part because too few large-scale distributed network measurements are available to scientific researchers. Ironically, even the research community networks struggle to make progress on these essential obstacles to cyberinfrastructure research. The data dearth is understandable. Measurement of operational Internet infrastructure involves navigating more complex and interconnected dimensions (logistical, financial, methodological, technical, legal, and ethical) than measurement in most scientific disciplines. CAIDA has been navigating these challenges with modest success for fifteen years, collecting, coordinating, curating, and sharing data sets for the Internet research and operational community in support of Internet science.

We propose three concrete contributions to the IRNC community's measurement efforts: to foster and distill discussion of how to best make IRNC data and statistics available, to adapt two CAIDA measurement technologies for IRNC community needs, and to experiment with two innovations in data-handling procedures applied to existing IRNC measurements. We will accomplish the first task by organizing and hosting a series of workshops, including half-day workshops at IRNC PI meetings to discuss IRNC measurement priorities and identify how CAIDA and other researchers can support them. In between IRNC PI meetings we will have 2-day annual workshops dedicated to measurement activities, to build on and extend previous efforts of the IRNC measurement group and explore in depth how the community can make better use of perfSONAR, metadata, and other data-handling and data-protection technologies.

Second, we propose to improve two CAIDA measurement technologies we already know can better serve the IRNC community: (1) We will upgrade our traffic reporting software to better suit IRNC needs by adding functionality to recognize IPv6 and DNSSEC, support anonymization and aggregation for privacy protection, and read data formats used by the majority of the IRNC operators, such as NetFlow output from routers. (2) We will (optionally) install, deploy, and manage IPv6-capable active measurement nodes at each interested IRNC site. IPv6 Internet reachability measurements are of particular interest since available data suggest the educational and government-supported communities are deploying IPv6 before the commercial sector.
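As an illustration of what aggregation for privacy protection can look like in a traffic report, the following minimal Python sketch truncates IPv4 and IPv6 source addresses to a covering prefix before summing byte counts. It is not the CoralReef implementation; the prefix lengths (/24 and /48), the function names, and the (source address, bytes) flow representation are assumptions made only for this example.

```python
import ipaddress
from collections import Counter

# Assumed aggregation granularity; these values are illustrative only.
V4_PREFIX = 24
V6_PREFIX = 48

def aggregate_address(addr: str) -> str:
    """Map an IPv4 or IPv6 address to its covering prefix, hiding host bits."""
    ip = ipaddress.ip_address(addr)
    prefix_len = V4_PREFIX if ip.version == 4 else V6_PREFIX
    # strict=False lets ip_network() zero the host bits instead of raising.
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(network)

def bytes_per_prefix(flows):
    """Sum byte counts per aggregated source prefix.

    `flows` is an iterable of (source_address, byte_count) pairs, a
    hypothetical stand-in for records parsed from flow export data.
    """
    totals = Counter()
    for src, nbytes in flows:
        totals[aggregate_address(src)] += nbytes
    return dict(totals)
```

A report built from `bytes_per_prefix` output exposes only per-prefix totals, so individual hosts are not identifiable from the published statistics. Coarser prefixes give stronger protection at the cost of detail.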

Third, we propose to apply two innovations in data-handling procedures to existing IRNC measurement data. The first is a recently proposed framework for privacy-sensitive data sharing, intended for data that are not appropriate for public posting but that researchers explicitly request through designated channels for use in clearly defined research. The second is a landmark reporting deliverable that illustrates our community-building effort: a prototype of a "Bureau of Internet Statistics" report, which we hope will inspire other network infrastructure communities to join in this effort.

Intellectual merit. The proposed work will help IRNC operators better understand their networks by making more effective use of data they already collect as well as newer technologies for measurement and visibility of their networks. The data, tools, and distillations resulting from this effort will be made available to researchers using a privacy-sensitive sharing framework and will advance research in a number of sub-disciplines of network science.

Broader impact. Contributions from this project promise to strengthen activities in network modeling, simulation, analysis, and theoretical research, enabling the IRNC program to play a formative role in the emerging discipline of network science, and enhancing NSF's leading role in sustainable stewardship of cyberinfrastructure.

Management Plan

CAIDA personnel will be responsible for working on the proposed tasks. The requested budget supports 12.8 person-months of effort per year. The schedule of work below shows how we plan to accomplish the proposed tasks over the three years of the project.

Task Number | Task Description | Projected Timeline | Status
Task 1 | Conduct the 1st workshop introducing the IRNC community to available CAIDA (and other) measurement tools and techniques | Year 1 (1st quarter) | done
Task 2 | Create project web pages and start regular updates with relevant information | Year 1 (1st and 2nd quarter) | done
Task 3 | Start deploying Ark monitors at interested collaborating IRNC sites | Year 1 | done (syd-au)
Task 4 | Implement proposed modifications to CoralReef software suite | Year 1 (1st and 2nd quarter) | done
Task 5 | Update report generator to include statistics of interest for the IRNC community | Year 1 (2nd and 3rd quarter) | done
Task 6 | Introduce the Privacy-Sensitive Data Sharing (PSS) Framework to IRNC members | Year 1 (2nd quarter) | done
Task 7 | Continue deploying Ark monitors at collaborating IRNC sites | Year 2 | done (per-au, sao2-br, bjl-gm)
Task 8 | Create web pages showing per-node connectivity and performance monitored by Ark at participating IRNC sites | Year 2 (1st quarter) | done
Task 9 | Organize an add-on workshop co-located with the IRNC PI meeting to report progress and discuss ongoing measurement issues | Year 2 (1st quarter) | done
Task 10 | Assist the IRNC Data Providers with preparing MOUs and MOAs for their data sharing projects | Year 2 | done
Task 11 | Continue CoralReef modifications to implement user feedback and requests | Year 2 | done
Task 12 | Co-host (with ISC) a workshop to discuss novel case studies of network and security data analysis and data sharing | Year 3 (3rd quarter) | done
Task 13 | Publish the workshop report | Year 3 (4th quarter) | done
Task 14 | Maintain Ark monitors at collaborating IRNC sites and the corresponding statistics web pages | Year 3 | in progress (hnl-us)
Task 15 | Conduct an add-on workshop co-located with the IRNC PI meeting to report progress and discuss ongoing measurement issues | Year 3 (4th quarter) | done
Task 16 | Refine the Privacy-Sensitive Data Sharing Framework concept based on feedback and experience of IRNC Data Providers | Year 3 (2nd quarter) | done
Task 17 | Lead discussion at workshop regarding how to begin data collection toward a sample report actualizing the "Bureau of Internet Statistics" (BIS) concept | Year 3 (4th quarter) | done
Task 18 | Attend the final workshop to discuss and finalize the usage and value measurement report | Year 3 (4th quarter) | done