STARDUST - Sustainable Tools for Analysis and Research on Darknet Unsolicited Traffic
Sponsored by:
National Science Foundation (NSF)

This project aims at maintaining continued operation of the UCSD Network Telescope infrastructure and maximizing its utility to researchers from various disciplines.

Principal Investigator(s): Alberto Dainotti, Alistair King

Funding source: NSF CNS-1730661. Period of performance: October 1, 2017 - September 30, 2021.


Project Summary

The UCSD Network Telescope (UCSD-NT) is a passive monitoring system that captures unsolicited Internet traffic sent to a large segment of unassigned IPv4 address space. For over a decade, this instrumentation has provided global visibility into macroscopic Internet phenomena that few other data sources can offer. Its data have been used across a broad set of sub-disciplines in Computer & Information Science & Engineering (CISE) and beyond: from network and systems security and stability, to machine learning and big data processing techniques, and, most recently, to studies of cyberwarfare and political repression of communication. In 2011 we enhanced the Telescope instrumentation to provide access to raw and live telescope traffic data, expanding both the scope of possible research questions and the circle of researchers using the data. As of January 2017, we were aware of over 100 publications without UCSD co-authors that used UCSD-NT data, and this count is a lower bound. Yet the infrastructure had fallen behind increasing demands for storage, computing resources, and system administration. These issues hindered our ability to continue sharing UCSD-NT data with researchers and forced compromises that limited the availability of this unique resource.
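The core idea behind a network telescope can be illustrated with a short Python sketch: because no legitimate host occupies the monitored address space, any packet whose destination falls inside it is, by construction, unsolicited. The prefix below (198.18.0.0/15, a range reserved for benchmarking) is a hypothetical stand-in, since this page does not state which address block the UCSD-NT monitors, and the `Packet` type is a simplified illustration rather than a real capture format.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical darknet prefix for illustration only; the actual address
# block monitored by the UCSD-NT is not given on this page.
TELESCOPE_PREFIX = ipaddress.ip_network("198.18.0.0/15")


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    proto: int  # IP protocol number (6 = TCP, 17 = UDP, 1 = ICMP)


def is_unsolicited(pkt: Packet, prefix=TELESCOPE_PREFIX) -> bool:
    """Return True if the packet targets an address inside the unassigned prefix.

    No host in the prefix ever initiates traffic, so anything arriving there
    was not requested (e.g., scanning or backscatter).
    """
    return ipaddress.ip_address(pkt.dst) in prefix


def capture(packets):
    """Keep only the packets a passive telescope would record."""
    return [p for p in packets if is_unsolicited(p)]
```

A real telescope makes this decision in routing (all traffic for the prefix is delivered to the capture host) rather than by filtering in software; the filter here only makes the "unsolicited by construction" property explicit.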

The STARDUST project will help extend and sustain operation of the UCSD-NT infrastructure. We will upgrade and modernize the current infrastructure to handle the predicted growth in traffic, leverage virtualization and NSF-funded HPC platforms at the San Diego Supercomputer Center for computational data analysis, and introduce meta-data semantics that simplify many common tasks researchers perform with UCSD-NT data. These modifications will leave researchers more time (and more available HPC resources) to focus on their specific scientific questions. Moreover, the project will forge an interdisciplinary collaboration between computer networking researchers and HPC scientists and engineers to experiment with novel approaches to live traffic analysis.
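To make "meta-data semantics" concrete, the Python sketch below tags an aggregated flow record with the source's country and origin AS using longest-prefix matching. The `FlowTuple` fields are only loosely modeled on a telescope flow record and do not reproduce the Corsaro schema; the lookup tables are hypothetical placeholders (documentation prefixes and documentation-range ASNs) standing in for a real geolocation database and BGP-derived prefix-to-AS data.

```python
import ipaddress
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FlowTuple:
    # Hypothetical, simplified flow record; not the actual Corsaro schema.
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int
    packet_cnt: int
    country: Optional[str] = None
    origin_asn: Optional[int] = None


# Hypothetical lookup tables keyed by prefix (placeholder values only).
GEO_TABLE = {
    ipaddress.ip_network("203.0.113.0/24"): "AU",
    ipaddress.ip_network("198.51.100.0/24"): "BR",
}
ASN_TABLE = {
    ipaddress.ip_network("203.0.113.0/24"): 64500,
    ipaddress.ip_network("198.51.100.0/24"): 64501,
}


def longest_match(table, ip):
    """Return the value for the most specific prefix containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]


def tag(flow: FlowTuple) -> FlowTuple:
    """Return a copy of the flow annotated with source country and origin AS."""
    return replace(
        flow,
        country=longest_match(GEO_TABLE, flow.src_ip),
        origin_asn=longest_match(ASN_TABLE, flow.src_ip),
    )
```

Tagging once, at aggregation time, means every downstream analysis can filter on `country` or `origin_asn` directly instead of repeating the lookups itself.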

The stabilized and enhanced infrastructure will better serve a diverse range of academic researchers, the vast majority of whom have no access to any other source of global Internet traffic data. The proposed enhancements will provide valuable hands-on experience in operationally relevant network security and traffic analysis research, engaging a wide audience of computer science faculty and students in the use of our tools and data. Project results will contribute to advancing knowledge in diverse CISE disciplines, e.g., by facilitating the development of efficient strategies for early detection and mitigation of cyber attacks, supporting macroscopic assessments of Internet performance and reliability, and opening a new domain for the application of live streaming big data analysis and in situ machine learning techniques.

Project Milestones

  •  Task 1: Upgrade and modernize the UCSD-NT infrastructure (Years 1 and 2);
  •  Task 2: Transition the data analysis infrastructure to use NSF HPC resources (Years 1, 2, and 3);
    •  Task 2.1: Deploy cloud-compute support using novel virtualization features on the Comet supercomputer (Years 1 and 2);
    •  Task 2.2: Develop and deploy live packet capture and distribution software (Years 1 and 2);
    •  Task 2.3: Support dynamically provisioned specialized Big Data environments (Years 1 and 2);
  •  Task 3: Reduce processing complexity and simplify data analysis (Years 1, 2, and 3);
  •  Task 4: Communal activities (Years 1, 2, and 3).

Project Timeline

ID | Task | Target date | Status
4.1 | Open project web site | Oct 2017 | done
4.2 | Start a mailing list of STARDUST users | Sep 2019 | done
4.3 | Create internal project wiki | Nov 2017 | done
1.1 | Purchase and deploy a high performance 10 Gbps capture card with accurate time stamping | Dec 2017 | done
1.2 | Upgrade connected device interfaces (NP-router, storage server) to 10 Gbps | Dec 2017 | done
3.1 | Extend Corsaro and related libraries to tag FlowTuple information with meta-data (geolocation, origin AS, spoofed source) | Jun 2018 | done
4.4 | Organize and host the first DUST Workshop | Sep 2019 | done
2.1.1 | Provision and deploy a virtualized cloud environment | Mar 2020 | done
2.2.1 | Customize and extend the WDcap packet capture software to forward traffic over a 10 Gbps management network interface to a CAIDA server | Sep 2018 | done
1.3 | Purchase and deploy an additional storage server and attached disk array (~200 TB capacity) | Dec 2018 | done
2.2.2 | Customize and extend the libtrace "RT" format for encapsulation and distribution of captured traffic | Dec 2018 | done
2.3.1 | Develop and deploy an interface to request resources for processing historical telescope data | Jul 2020 | done
2.1.2 | Develop and document a pre-configured OS image tailored for telescope data analysis | May 2020 | done
2.3.2 | Develop helper routines/APIs for Spark and Hadoop to retrieve historical data directly from the archive during processing | Jun 2020 | done
3.2 | Deploy meta-data tagging system on cloud-compute environment | Jun 2019 | done
2.1.3 | Develop and deploy management interfaces to export a snapshot of a researcher's VM for archiving | Sep 2019 | done
2.1.4 | Implement resource accounting on a per-user basis, analyze its applicability | Aug 2020 | in progress
2.3.3 | Develop sample analysis scripts and documentation for implementing longitudinal analyses | Jul 2020 | in progress
3.3 | Deploy several "operational" analysis VMs to process the telescope traffic and derive multi-level aggregated datasets | Sep 2019 | done
4.5 | Write and publish AUP for access to the new virtual environments | Aug 2020 |
4.6 | Announce new UCSD-NT capabilities online | Jul 2020 |
3.4 | Explore efficient indexing of FlowTuple records to enable analysis of traffic with specific meta-data characteristics | Dec 2019 | done
3.5 | Extend the Corsaro3 FlowTuple plugin to support publication of flows to a Kafka cluster | Mar 2020 | done
4.7 | Organize and host the second DUST Workshop | Mar 2021 |
3.6 | Create streams containing specified subsets of the overall traffic (e.g., one stream per country) | Jun 2020 | done
4.8 | Publish workshop report and recommendations | Jun 2021 |
4.10 | Refine the upgraded UCSD-NT platform based on users' feedback | Sep 2020 | in progress
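Timeline items 3.5 and 3.6 describe publishing flows to a Kafka cluster and creating streams that carry specified subsets of the traffic, such as one stream per country. The Python sketch below shows only the partitioning step, in memory: each tagged flow is routed to a stream key derived from its source country. The `TaggedFlow` type and the `stardust.flowtuple.<country>` naming scheme are hypothetical, and no Kafka client calls are shown; in a deployment, each key would plausibly map to a separate topic.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class TaggedFlow:
    # Hypothetical, simplified flow record already tagged with meta-data.
    src_ip: str
    dst_port: int
    protocol: int
    packet_cnt: int
    country: Optional[str]


def stream_for(flow: TaggedFlow, prefix: str = "stardust.flowtuple") -> str:
    """Hypothetical naming scheme: one stream per source country.

    Flows without a country tag go to an "unknown" stream so none are dropped.
    """
    return f"{prefix}.{(flow.country or 'unknown').lower()}"


def split_streams(flows: Iterable[TaggedFlow]) -> Dict[str, List[TaggedFlow]]:
    """Group flows by their target stream, preserving arrival order within each."""
    streams: Dict[str, List[TaggedFlow]] = defaultdict(list)
    for flow in flows:
        streams[stream_for(flow)].append(flow)
    return dict(streams)
```

Splitting by a meta-data key at the source lets a researcher interested in a single country subscribe to one small stream instead of consuming and filtering the full telescope feed.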
  Last Modified: Sat Oct-31-2020 04:24:21 UTC