STARDUST: Sustainable Tools for Analysis and Research on Darknet Unsolicited Traffic
Sponsored by:
National Science Foundation (NSF)

This project aims to maintain continued operation of the UCSD Network Telescope infrastructure and to maximize its utility to researchers across disciplines.

Funding source: NSF CNS-1730661. Period of performance: October 1, 2017 - September 30, 2020.


Project Summary

The UCSD Network Telescope (UCSD-NT) is a passive monitoring system that captures unsolicited Internet traffic sent to a large segment of unassigned IPv4 address space. For over a decade, this instrumentation has enabled global visibility into macroscopic Internet phenomena that few other data sources can offer. It has provided data used in a broad set of sub-disciplines in Computer & Information Science & Engineering (CISE) and beyond: from network and systems security and stability, to machine learning and big data processing techniques, and, most recently, studies of cyberwarfare and political repression of communication. In 2011 we enhanced the Telescope instrumentation to enable access to raw and live telescope traffic data, thus expanding the scope of possible research questions and the circle of researchers using the data. As of January 2017 we were aware of over 100 publications (a lower bound) - without UCSD co-authors - that used UCSD-NT data. Yet the infrastructure lagged behind increasing demands for storage, computing resources, and system administration. These issues hindered our ability to continue sharing UCSD-NT data with researchers, and required compromises that limited the availability of this unique resource.
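A common first step in telescope traffic analysis, and the approach taken by CAIDA's Corsaro suite, is to aggregate raw packets into per-interval "FlowTuple" records, collapsing the many packets a single scanner sends into one summarized row. A minimal sketch of the idea, assuming packets have already been parsed into Python dicts (the field names here are illustrative, not Corsaro's actual schema):

```python
from collections import Counter

def flowtuple_key(pkt):
    """The packet fields that FlowTuple-style aggregation groups on.

    Field names are illustrative; Corsaro's actual FlowTuple includes
    additional fields such as TTL, TCP flags, and IP length.
    """
    return (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"],
            pkt["dst_port"], pkt["proto"])

def aggregate(packets):
    """Collapse parsed packets into a FlowTuple -> packet-count mapping."""
    return dict(Counter(flowtuple_key(p) for p in packets))

packets = [
    {"src_ip": "203.0.113.7", "dst_ip": "192.0.2.1",
     "src_port": 4444, "dst_port": 23, "proto": 6},
    {"src_ip": "203.0.113.7", "dst_ip": "192.0.2.1",
     "src_port": 4444, "dst_port": 23, "proto": 6},
    {"src_ip": "198.18.0.9", "dst_ip": "192.0.2.5",
     "src_port": 5555, "dst_port": 445, "proto": 6},
]
flows = aggregate(packets)
# Two packets from the same scanner collapse into one FlowTuple record.
```

Aggregation of this kind is what makes terabytes of raw capture tractable for downstream analysis on shared HPC resources.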

The STARDUST project will help extend and sustain operation of the UCSD-NT infrastructure. We will upgrade and modernize the current infrastructure to handle the predicted growth in traffic, leverage virtualization and NSF-funded HPC platforms at the San Diego Supercomputer Center for computational data analysis, and introduce meta-data semantics to simplify common tasks researchers perform on UCSD-NT data. The proposed modifications will leave researchers more time (and available HPC resources) to focus on their specific scientific questions. Moreover, the project will forge an interdisciplinary collaboration between computer-network researchers and HPC scientists and engineers to experiment with novel approaches to live traffic analysis.

The stabilized and enhanced infrastructure capabilities will better serve a diverse range of academic researchers, the vast majority of whom have no access to any other source of global Internet traffic data. The proposed enhancements will support invaluable hands-on experience in operationally relevant network security and traffic analysis research, engaging a wide audience of computer science faculty and students in the use of our tools and data. Project results will contribute to advancing knowledge in diverse CISE disciplines, e.g., facilitating the development of efficient strategies for early detection and mitigation of cyber attacks, supporting macroscopic Internet performance and reliability assessments, and opening a new domain for the application of live streaming big data analysis and in situ machine learning techniques.

Project Milestones

  •  Task 1 : Upgrade and modernize the UCSD-NT infrastructure (Years 1 and 2);
  •  Task 2 : Transition the data analysis infrastructure to use NSF HPC resources (Years 1, 2, and 3);
    •  Task 2.1 : Deploy cloud-compute support using novel virtualization features on Comet supercomputer (Years 1 and 2);
    •  Task 2.2 : Develop and deploy live packet capture and distribution software (Years 1 and 2);
    •  Task 2.3 : Support dynamically provisioned specialized Big Data environments (Years 1 and 2);
  •  Task 3 : Reduce processing complexity and simplify data analysis (Years 1, 2, and 3);
  •  Task 4 : Communal activities (Years 1, 2, and 3).
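Task 3's meta-data tagging annotates each FlowTuple with attributes such as geolocation and origin AS, typically derived by longest-prefix matching of the source address against external datasets. A hedged sketch of the lookup logic using Python's ipaddress module; the prefix tables, country codes, and AS numbers below are hypothetical stand-ins for real geolocation and BGP-derived data:

```python
import ipaddress

# Hypothetical prefix tables; a real deployment would load these from
# geolocation databases and BGP routing data.
GEO = {ipaddress.ip_network("203.0.113.0/24"): "NL",
       ipaddress.ip_network("198.18.0.0/15"): "US"}
ORIGIN_AS = {ipaddress.ip_network("203.0.113.0/24"): 64500,
             ipaddress.ip_network("198.18.0.0/15"): 64511}

def longest_prefix_match(addr, table):
    """Return the value for the most-specific prefix containing addr,
    or None if no prefix matches."""
    addr = ipaddress.ip_address(addr)
    best, best_len = None, -1
    for net, value in table.items():
        if addr in net and net.prefixlen > best_len:
            best, best_len = value, net.prefixlen
    return best

def tag(src_ip):
    """Annotate a source address with illustrative meta-data fields."""
    return {"geo": longest_prefix_match(src_ip, GEO),
            "origin_as": longest_prefix_match(src_ip, ORIGIN_AS)}
```

Production tagging systems use purpose-built longest-prefix-match structures (e.g., a patricia trie) rather than a linear scan, but the lookup semantics are the same.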

Project Timeline

  •  4.1 : Open project web site (Oct 2017)
  •  4.2 : Start a mailing list of STARDUST users (Oct 2017)
  •  4.3 : Create internal project wiki (Nov 2017)
  •  1.1 : Purchase and deploy a high-performance 10 Gbps capture card with accurate time stamping (Dec 2017)
  •  1.2 : Upgrade connected device interfaces (NP-router, storage server) to 10 Gbps (Dec 2017)
  •  3.1 : Extend Corsaro and related libraries to tag FlowTuple information with meta-data: geolocation, origin AS, spoofed source (Jun 2018)
  •  4.4 : Organize and host the first DUST Workshop (Jun 2018)
  •  2.1.1 : Provision and deploy a Comet Virtual Cluster (Sep 2018)
  •  2.2.1 : Customize and extend the WDcap packet capture software to forward traffic over a 10 Gbps management network interface to a CAIDA server (Sep 2018)
  •  4.5 : Publish workshop report (Sep 2018)
  •  1.3 : Purchase and deploy an additional storage server and attached disk array (~200 TB capacity) (Dec 2018)
  •  2.2.2 : Customize and extend the libtrace "RT" format for encapsulation and distribution of captured traffic (Dec 2018)
  •  2.3.1 : Develop and deploy an interface to request resources for processing historical telescope data (Dec 2018)
  •  2.1.2 : Develop and document a pre-configured OS image tailored for telescope data analysis (Mar 2019)
  •  2.3.2 : Develop helper routines/APIs for Spark and Hadoop to retrieve historical data directly from the archive during processing (Jun 2019)
  •  3.2 : Deploy meta-data tagging system on cloud-compute environment (Jun 2019)
  •  2.1.3 : Develop and deploy management interfaces to export a snapshot of a researcher's VM for archiving (Sep 2019)
  •  2.1.4 : Implement resource accounting on a per-user basis and analyze its applicability (Sep 2019)
  •  2.3.3 : Develop sample analysis scripts and documentation for implementing longitudinal analyses (Sep 2019)
  •  3.3 : Deploy several "operational" analysis VMs to process the telescope traffic and derive multi-level aggregated datasets (Sep 2019)
  •  4.6 : Write and publish an AUP for access to the new virtual environments (Sep 2019)
  •  4.7 : Announce new UCSD-NT capabilities online (Sep 2019)
  •  3.4 : Explore efficient indexing of FlowTuple records to enable analysis of traffic with specific meta-data characteristics (Dec 2019)
  •  3.5 : Provide real-time streaming access to FlowTuple data using Kafka, an open-source stream processing platform (Mar 2020)
  •  4.8 : Organize and host the second DUST Workshop (Mar 2020)
  •  3.6 : Create streams containing specified subsets of the overall traffic (e.g., one stream per country) (Jun 2020)
  •  4.9 : Publish workshop report and recommendations (Jun 2020)
  •  4.10 : Refine the upgraded UCSD-NT platform based on users' feedback (Sep 2020)
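Milestones 3.5 and 3.6 call for publishing tagged FlowTuple data through Kafka and splitting it into substreams, e.g., one per source country. The partitioning logic can be sketched independently of Kafka itself; in a real deployment, each group below would be published to its own topic (the record fields and the topic naming scheme in the comment are assumptions, not the project's actual schema):

```python
from collections import defaultdict

def partition_by_country(records):
    """Group tagged FlowTuple records into per-country substreams.

    With Kafka, each key here would map to its own topic, e.g. a
    hypothetical naming scheme like 'stardust.flowtuple.<country>'.
    """
    streams = defaultdict(list)
    for rec in records:
        streams[rec.get("geo", "unknown")].append(rec)
    return dict(streams)

records = [
    {"src_ip": "203.0.113.7", "geo": "NL", "packets": 2},
    {"src_ip": "198.18.0.9", "geo": "US", "packets": 1},
    {"src_ip": "203.0.113.8", "geo": "NL", "packets": 5},
]
streams = partition_by_country(records)
```

Keying streams on a meta-data field this way lets a researcher interested in, say, traffic from one country subscribe to a single narrow stream instead of filtering the full telescope feed.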
  Last Modified: Mon Aug-14-2017 16:39:19 PDT