ASSISTS - Advancing Scientific Study of Internet Security and Topological Stability

CAIDA participates in the Information Marketplace for Policy and Analysis of Cyber-risk & Trust (IMPACT) program as a Data Provider and as a Decision Analytics-as-a-Service Provider.

Sponsored by:
Department of Homeland Security (DHS)

Principal Investigators: kc claffy, Alberto Dainotti

Funding source: FA8750-18-2-0049
Period of performance: December 18, 2017 - August 31, 2020


Project Summary

Large-scale Internet cyber-attacks and incidents (route hijacking, network outages, phishing campaigns, botnet activity, large-scale bug exploitation, etc.) represent a major threat to public safety and to both public and private strategic and financial assets. Mitigation and recovery, assessment of impacts and restoration costs, and prevention of further attacks of a similar nature are all impeded because such events can go unnoticed or be hard to characterize in terms of motivation, attacker infrastructure, and scope. Because of their macroscopic nature, identifying such events and understanding their scope and dynamics requires three critical inputs:

  • heterogeneous sources and types of data to cross-validate inferences;
  • a system that enables near-real-time integration and interactive visualization of such data;
  • a team of experts with varied backgrounds and skills to soundly interpret the fused data.

We are pursuing these three inputs through a strategically planned, two-fold participation in the IMPACT program. As a Data Provider, we will continue to provide data sets that have already proven relevant to researchers studying the security, stability, and resilience of networks. As a Decision Analytics-as-a-Service Provider, we will support new analytic capabilities that integrate, correlate, and cross-validate multiple sources of measurement data and metadata to enable informed mitigation of, and response to, attacks and other disruptive events.

Statement of Work

CAIDA performs fundamental research on a reasonable-efforts basis and in accordance with UC policy. Technical reports will be submitted triannually.


Technical Topic Area #1: Supporting Cybersecurity Research through Network Data Collection and Curation


Subtask | Description | Projected Timeline | Status
1. Data Provider Tasks
1.1 | Curate and package the Internet Topology Measured from Ark Platform datasets | ongoing | Ark topology datasets indexed in IMPACT
1.2 | Curate and package the Internet Topology Data Kits (ITDK) | every 3-6 months | ITDK CAIDA page
1.3 | Curate and package the UCSD Real-time Network Telescope Datasets | ongoing | Telescope datasets indexed in IMPACT
1.4 | Collect, process, and archive the U.S. backbone bidirectional traffic data* | ongoing | Anonymous Internet Traces Dataset
1.5 | Acquire a 100 Gb/s packet capture monitor | Year 2 | done
1.6 | Deploy the packet capture monitor on a 100 Gb/s national backbone link | Year 2 | work in progress
2. Data Host Tasks
2.1 | Maintain and expand our hosting capabilities | ongoing | Size of datasets indexed in IMPACT
2.2 | Manage, maintain, and serve previously collected CAIDA data | ongoing | CAIDA Data Overview Table
2.3 | Index and share new CAIDA data sets with researchers | ongoing | CAIDA data available in IMPACT
2.4 | Compile statistics on data volumes, requests, and downloads | ongoing | IMPACT dataset access request statistics
3. New Data Sets
3.1 | Generate new data sets that are crucial for studying threats, vulnerabilities, and hazards to critical infrastructure | ongoing | List of new datasets indexed in IMPACT
3.2 | Generate derivative data sets that reveal signals of connectivity disruptions from active and passive measurement methods | Year 2 |
3.3 | Experiment to determine which data sets are most amenable to live streaming in support of HI-CUBE's near-real-time analytic capabilities | Year 2 |
4. Project Support
4.1 | Work closely with other IMPACT project team members | ongoing |
4.2 | Work closely with IMPACT Portal developers | ongoing |
4.3 | Update IMPACT MOAs to support new data offerings | as needed |
4.4 | Host and attend project meetings | as needed | DHS IMPACT PI Meetings/Presentations
4.5 | Provide documentation, outreach materials, and marketing efforts | ongoing | List of Outreach Publications and Presentations
*As long as conditions permit and links and traffic monitors are available.

Deliverables

1 | Hosting Infrastructure Description | Annually | Apr 2018
2 | Summary of use and utility of CAIDA's IMPACT data | Annually | Summary

Technical Topic Area #2: Developing HI-CUBE: Hub for Internet Incident Investigation


Subtask | Description | Projected Timeline | Status
1. Development of web services and visual interfaces
1.1 | Extend the authorization functionality of the current Charthouse web application to support fine-grained data access control | Year 1 | done
1.2 | Develop a management interface for users, groups, and shared data | Year 2 | ongoing
2. Design and development of software infrastructure for data storage, query, and transformation
2.1 | Replace our monolithic time-series database (DBATS) with a distributed database for time-series analytics (e.g., Apache Kudu, InfluxDB Enterprise) | Year 2 | done
2.2 | Replace the Graphite back end that queries DBATS with a data analytics query engine | Year 2 | ongoing
3. Integration and testing of the HI-CUBE system in operational research environments
3.1 | Acquire the hardware needed to host the service | Year 1 | done
3.2 | Migrate current databases and integrate the additional datasets that have been developed | Year 1 | done
3.3 | Deploy, benchmark, and tune the upgraded components of the infrastructure for big data analytics | Year 2 | ongoing
3.4 | Migrate the time series currently stored in DBATS into the new system | Year 2 | ongoing
3.5 | Deploy the query engine and the HTTP query server | Year 2 | ongoing
3.6 | Integrate and test the data analytics query engine | Year 2 |
4. Community outreach and service
4.1 | Collect feedback during meetings and presentations | ongoing |
4.2 | Interact with platform users to better focus our efforts on the needs of cybersecurity researchers and analysts | ongoing |
4.3 | Present the HI-CUBE platform at one or more CAIDA workshops | May 2020 | done

Milestones

1 | Deploy SSD cluster machine, storage server, and disk tray | Jun 2018 | done
2 | Deploy Web Application Server | Sep 2018 | done
3 | Extend the authentication and authorization functionality | Sep 2018 | done
4 | Release alpha version of prototype website | Sep 2018 | done
5 | Migrate time series currently stored in DBATS | Mar 2019 | ongoing
6 | Deploy second Web Application Server | Mar 2019 | done
7 | Complete the development of a distributed database for time-series analytics | Mar 2019 | done
8 | Develop management interfaces for users, groups, and shared data | Mar 2019 | ongoing
9 | Deploy second SSD cluster machine | May 2019 | done
10 | Complete the tuning of the distributed database system | Jul 2019 | ongoing
11 | Deploy the query engine and the HTTP query server | Aug 2019 |
12 | Complete the development of the data analytics query engine | May 2020 | ongoing
13 | Release beta version of prototype website | May 2020 | done
14 | Release the distributed time-series database and query engine as open source | May 2020 |

Deliverables

1 | Capability Design Plan | Feb 2018 | done
2 | Demonstrate web service at PI Meetings | Triannually | done
3 | Open-source HI-CUBE software | May 2020 | ongoing

Acknowledgment of awarding agency's support

This material is based on research sponsored by Air Force Research Laboratory under agreement number FA8750-18-2-0049. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air Force Research Laboratory or the U.S. Government.
