How to Create a NIDS Module

A NIDS module is a pair of GitHub repositories, one public and one private, under the CAIDA organization. Together they deliver one driving question, one dataset, and one analysis notebook for the Network Infrastructure Data Science curriculum. This page walks through the four steps to create one.

Overview

Each Network Infrastructure Data Science (NIDS) educational module consists of two GitHub repositories under the CAIDA organization:

  • Public repository — student-facing content: exercises, analysis notebooks, dataset references, and instructions
  • Private repository — instructor-facing content: solution notebooks, answer keys, grading notes, and facilitation guidance

Both repositories are managed through the Team Network Infrastructure Data Science GitHub team.

Module Content Structure

Every NIDS module contains the following sections. Each module must work as a guide to understanding and using at least one Internet-related dataset. Although the goal of the module is a greater understanding of the data, the module is framed around a driving question.

This question acts as the higher-level goal for the student, and every other section is framed so that it helps answer it. By the end of the module, the student will have gained insight into the data and should be able to answer the driving question.

Example

  • Question: Which country has the most important network?

  • Background: Internet Yellow Pages (IYP) and CAIDA’s AS Rank each provide a ranking of the most important networks. Each …

  • Resources:

    • Internet Yellow Pages can be downloaded (here) with example code (here)
    • CAIDA’s AS Rank can be downloaded (here) and this (recipe) provides an example dataset
  • Analysis: write a script that will download both IYP and AS Rank, then build a table with ASes sorted by their average rank

    ASN   Name     IYP Ranking  AS Rank  Country
    3356  Level 3  1            1        US
  • Answer: IYP and AS Rank agree on the top-ranked AS, 3356; they disagree on the second …
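
The table-building part of the Analysis step in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rankings are hard-coded instead of downloaded, only the AS 3356 row comes from the example table, and the other two rows use documentation-reserved ASNs with made-up ranks. The function name build_table and the dictionary layout are hypothetical.

```python
from statistics import mean

# Placeholder rankings keyed by ASN. Only the AS 3356 row comes from the
# example table above; the other rows use documentation-reserved ASNs
# (RFC 5398) and made-up ranks purely to exercise the sort.
iyp_rank = {3356: 1, 64496: 3, 64497: 2}
as_rank = {3356: 1, 64496: 2, 64497: 5}
as_info = {
    3356: ("Level 3", "US"),
    64496: ("Example A", "ZZ"),  # hypothetical name and country code
    64497: ("Example B", "ZZ"),  # hypothetical name and country code
}


def build_table(iyp, asrank, info):
    """Return rows for ASes present in both rankings, sorted by average rank."""
    rows = []
    for asn in iyp.keys() & asrank.keys():
        name, country = info.get(asn, ("unknown", "??"))
        rows.append({
            "asn": asn,
            "name": name,
            "iyp": iyp[asn],
            "as_rank": asrank[asn],
            "country": country,
            "avg": mean([iyp[asn], asrank[asn]]),
        })
    # Ties on the average are broken by ASN so the order is deterministic.
    return sorted(rows, key=lambda r: (r["avg"], r["asn"]))


table = build_table(iyp_rank, as_rank, as_info)
print(f"{'ASN':<6} {'Name':<10} {'IYP':>4} {'AS Rank':>8} {'Avg':>5}  Country")
for r in table:
    print(f"{r['asn']:<6} {r['name']:<10} {r['iyp']:>4} "
          f"{r['as_rank']:>8} {r['avg']:>5.1f}  {r['country']}")
```

Restricting the table to ASes that appear in both rankings keeps the average well defined; a real module would have to decide how to treat ASes that only one source ranks.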

Question

The driving question frames why students are working with the dataset. State it at the top of the module so students know what they are working toward. They should be able to provide an answer when they finish the module.

Background

Provide the context students need before they can answer the question: slides, webpages, PDFs, or links to other NIDS modules that introduce prerequisite concepts.

Resources

A full list of the resources used in the module.

  • Resources Known
    List the datasets, tools, and libraries the student is expected to understand before beginning this module. This list should be organized around the prerequisite modules which introduce this information.

  • Resources New
    List the datasets, tools, and libraries that are introduced for the first time in this module. For each new resource:

    • Provide enough detail for students to understand what the resource is and why it is used.

    • Optionally, link to an existing catalog.caida.org recipe, or create a new one, for scripts and deeper documentation.

      Catalog recipes are technical methodologies used to derive a particular conclusion or a particular dataset, which can be used for further synthesis and analysis. They include example code and more detail than is present in the module.

Analysis

Step-by-step notebook cells that work toward answering the driving question:

  1. For each new resource, begin with a self-contained introduction — pose a small question about that resource alone and walk students through the analysis needed to answer it.
  2. Then combine the known and new resources to answer the driving question.

Answer

Write a concise answer to the driving question, grounded in the charts and results produced in the Analysis section.

Prerequisites

Step 1: Create the Public Repository

  1. Create a new repository in the CAIDA GitHub organization. Use a descriptive name that reflects the dataset or topic (e.g., nids-module-bgp-hijacks).
  2. Set visibility to Public.
  3. Add the following to the repository:
    • README.md — module overview, learning objectives, and dataset citation
    • Optional: exercises or examples, in at least one of the following directories:
      • notebooks/ — Jupyter notebooks
      • scripts/ — scripts
      • src/ — source for compiled code
    • data/ directory or a data/README.md — instructions for accessing the dataset on NRP (do not commit large data files)
    • requirements.txt or environment.yml — dependencies needed to run the notebooks
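
The layout in item 3 can be created locally before the first commit. The following is a minimal sketch, assuming all three optional directories are wanted and that requirements.txt is chosen over environment.yml; the function name scaffold_public_repo and the placeholder file contents are hypothetical. The demonstration writes into a temporary directory so nothing is left behind.

```python
import tempfile
from pathlib import Path


def scaffold_public_repo(root):
    """Create the public-repository skeleton under `root` and list what exists."""
    root = Path(root)
    # notebooks/, scripts/ and src/ are optional in the guide; all three are
    # created here only for illustration.
    for sub in ("notebooks", "scripts", "src", "data"):
        (root / sub).mkdir(parents=True, exist_ok=True)

    placeholders = {
        "README.md": (
            "# Module title\n\n## Overview\n\n"
            "## Learning objectives\n\n## Dataset citation\n"
        ),
        "data/README.md": "Instructions for accessing the dataset go here.\n",
        "requirements.txt": "",
    }
    for rel, text in placeholders.items():
        path = root / rel
        if not path.exists():  # never overwrite existing work
            path.write_text(text, encoding="utf-8")

    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


with tempfile.TemporaryDirectory() as tmp:
    layout = scaffold_public_repo(Path(tmp) / "nids-module-bgp-hijacks")
    print("\n".join(layout))
```

Git does not track empty directories, so notebooks/, scripts/, and src/ only appear in the repository once they contain at least one file.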

Step 2: Create the Private Repository

  1. Create a second repository named with a -key suffix (e.g., nids-module-bgp-hijacks-key).
  2. Set visibility to Private.
  3. Add the following:
    • notebooks/ — completed solution notebooks with full answers
    • grading/ — rubrics, answer keys, and grading notes
    • facilitation.md — tips for running the module, common student questions, timing guidance

Step 3: Configure Team Access

  1. Go to the Team Network Infrastructure Data Science repositories page.
  2. Add both repositories (public and private) to the team.
  3. Confirm that all team members have at least Read access to the public repository and that instructors have Write access to the private repository.
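
These steps describe the GitHub web interface. For anyone who prefers to script them, the sketch below builds (but does not send) the GitHub REST API request that adds a repository to a team with a given permission. The organization name CAIDA, the team slug, and the token are placeholders to replace with real values. The request grants one permission to the whole team, so whether instructors need a separate team for Write access is a decision this sketch does not make.

```python
import json
import urllib.request

API = "https://api.github.com"
# GitHub's REST API calls the Read and Write roles "pull" and "push".
ROLE_TO_PERMISSION = {"Read": "pull", "Write": "push"}


def team_repo_request(org, team_slug, repo, role, token):
    """Build a PUT request that adds `repo` to a team with the given role."""
    url = f"{API}/orgs/{org}/teams/{team_slug}/repos/{org}/{repo}"
    body = json.dumps({"permission": ROLE_TO_PERMISSION[role]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical values: the team slug and the token are not given in this guide.
requests_to_send = [
    team_repo_request("CAIDA", "example-team-slug", "nids-module-bgp-hijacks",
                      "Read", "TOKEN"),
    team_repo_request("CAIDA", "example-team-slug", "nids-module-bgp-hijacks-key",
                      "Write", "TOKEN"),
]
for req in requests_to_send:
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    # urllib.request.urlopen(req) would send it; left out on purpose.
```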

Step 4: Link a Catalog Recipe

Required if the module uses a dataset not already represented in catalog.caida.org; otherwise skip.

  1. Find or create a catalog.caida.org recipe for the dataset(s) used in the module.
  2. Add the recipe URL to the README.md of the public repository.
  3. Notify the NIDS team so the module can be linked from the NIDS project page.
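
Before notifying the team, a quick check can confirm that the README actually carries a catalog link. This is a minimal sketch, assuming any URL on the catalog.caida.org host counts as a recipe link; the function name catalog_links and the sample URL are hypothetical.

```python
import re

# Matches any URL on the catalog.caida.org host and stops at whitespace or at
# the closing bracket of a Markdown link.
CATALOG_URL = re.compile(r"https?://catalog\.caida\.org/[^\s)>\]]+")


def catalog_links(readme_text):
    """Return the catalog.caida.org URLs found in README text, in order."""
    return CATALOG_URL.findall(readme_text)


# Hypothetical README snippets; the recipe path is a placeholder.
with_link = "See the [recipe](https://catalog.caida.org/recipe/example-recipe).\n"
without_link = "# Module\n\nNo recipe has been linked yet.\n"

print(catalog_links(with_link))
print(catalog_links(without_link))
```

In practice the text would come from the public repository's README.md, for example via Path("README.md").read_text().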