The Cooperative Association for Internet Data Analysis
(NSF OCI-1127500) SDCI-DatCat: Metadata Management Software Tools to Support Cybersecurity Research and Development of Sustainable Cyberinfrastructure
Sponsored by: National Science Foundation (NSF)

Project Summary

Collecting representative Internet measurement data has remained a challenging and often elusive goal for the networking community. Obstacles include the Internet's scale and scope, technical challenges in capturing, filtering and sampling high data rates, difficulty obtaining measurements across a decentralized network with radically distributed ownership, cost of building and operating instrumentation, and political hurdles. Even (or especially) with all these obstacles, the demand for and importance of representative Internet data sets is increasing -- which is good news for rigorous scientific Internet research. The primary driver of this demand is the now pervasive acknowledgement that we are unable to keep up with cybersecurity threats to various critical and increasingly interdependent infrastructures, and that a primary limiting factor in the escalating arms race is our surprisingly still primitive approach to sharing cyberinfrastructure data.

CAIDA has developed an Internet Measurement Data Catalog -- IMDC -- an index of information (metadata) about data sets and their availability under various usage policies. This catalog confronted a significant challenge in network science: reducing the cost of searching for data by organizing metadata about accessible Internet data sets into a single repository. We developed the underlying DatCat architecture and prototype software implementation to support the IMDC.

We propose to integrate the lessons we have learned during our research, development and operational experience with the IMDC to expand the underlying software capabilities to support the cybersecurity research and cyberinfrastructure development communities. Our three primary deployment goals are to: (1) reduce the burden on those contributing data via a streamlined interface and tools for easier indexing, annotation and navigation of relevant data; (2) convert from use of a proprietary database backend (Oracle) to a completely open source solution; and (3) expand DatCat's relevance to the cybersecurity and other research communities. This last goal includes outreach activities such as workshops and demonstrations at security-related PI meetings, creating and indexing new data sets -- ccTLD DNS zone files -- which have been declared critically lacking by the cybersecurity community, and creation of public web forums for discussion of specific and broader data-sharing issues.

Although our focus for this SDCI project will be enhancing DatCat's utility for the cybersecurity and cyberinfrastructure research community, our proposed design objectives and outreach plans explicitly target a range of science and engineering communities. In particular, we believe the proposed software development can support and promote NSF's newly announced Data Sharing Policy, which as of January 2011 requires all proposals to include a plan for how researchers intend to share their data with other researchers.

Intellectual Merit. The proposed software development activities will support a range of measurable benefits to cyberinfrastructure research: maximizing the re-use of existing Internet data; decreasing the time spent collecting redundant data; reducing the effort needed to start a new study; promoting validation and reproducibility of analyses and results; enabling longitudinal and cross-disciplinary studies of the Internet; and opening up new cross-domain areas of transformative networking research.

Broader Impact. The broader impacts of this project are diverse. The success of the catalog and related workshops will facilitate wide dissemination of Internet measurement data to researchers and security experts across academic, commercial, and government sectors. By including education-oriented data collections in the catalog, this project creates an immediate link between research and education, and improves access to Internet research for underrepresented groups in computer science and engineering. Most importantly, the software created through this project will help other disciplines and sectors to develop their own catalog instances to support the type of data management plans now articulated as essential to NSF.

  Last Modified: Thurs Aug-4-2011 12:28:51 PDT
  Page URL: http://www.caida.org/funding/sdci-datcat/index.xml