How to Create a NIDS Module
Overview
Each Network Infrastructure Data Science (NIDS) educational module consists of two GitHub repositories under the CAIDA organization:
- Public repository — student-facing content: exercises, analysis notebooks, dataset references, and instructions
- Private repository — instructor-facing content: solution notebooks, answer keys, grading notes, and facilitation guidance
Both repositories are managed through the Team Network Infrastructure Data Science GitHub team.
Module Content Structure
Every NIDS module contains the following sections. Each module must work as a guide to understanding and using at least one Internet-related dataset. Although the goal of the module is a deeper understanding of the data, the module is framed around a driving question.
This question acts as the higher-level goal for the student, and every other section is framed so as to help answer it. By the end of the module, the student will have gained insight into the data and should be able to answer the driving question.
Example
- Question: Which country has the most important network?
- Background: Internet Yellow Pages and CAIDA’s AS Rank each provide a ranking of the most important networks. Each …..
- Resources:
  - Internet Yellow Pages can be downloaded (here) with example code (here)
  - CAIDA’s AS Rank can be downloaded (here) and this (recipe) provides an example dataset
- Analysis: Write a script that downloads both IYP and AS Rank, then builds a table with ASes sorted by their average rank.

  ASN   Name     IYP Ranking  AS Rank  Country
  3356  Level 3  1            1        US

- Answer: Both IYP and AS Rank agree on the top-ranked AS, 3356; they disagree on the second ….
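The Analysis step of this example could be sketched in Python as follows. This is a minimal illustration rather than the module's actual script: it skips the download step, and the two rank tables are hand-typed stand-ins. Only the AS 3356 row comes from the example; the other two ASNs are taken from the documentation-reserved range, and their names, ranks, and country codes are invented placeholders.

```python
# Hypothetical stand-ins for the downloaded rankings: ASN -> rank.
# Only AS 3356 (rank 1 in both) comes from the example table; the other
# entries use documentation-reserved ASNs with invented ranks.
iyp_rank = {3356: 1, 64500: 2, 64501: 3}
as_rank = {3356: 1, 64500: 3, 64501: 2}

# Placeholder metadata: ASN -> (name, country).
as_info = {
    3356: ("Level 3", "US"),
    64500: ("Example A", "ZZ"),
    64501: ("Example B", "ZZ"),
}


def build_table(iyp, asrank, info):
    """Return rows for ASes present in both rankings, sorted by average rank."""
    rows = []
    for asn in sorted(iyp.keys() & asrank.keys()):
        name, country = info.get(asn, ("unknown", "unknown"))
        average = (iyp[asn] + asrank[asn]) / 2
        rows.append((asn, name, iyp[asn], asrank[asn], country, average))
    # Sort by average rank; ties fall back to ASN order.
    rows.sort(key=lambda row: (row[5], row[0]))
    return rows


table = build_table(iyp_rank, as_rank, as_info)
print(f"{'ASN':<7}{'Name':<12}{'IYP Ranking':<13}{'AS Rank':<9}{'Country'}")
for asn, name, iyp, asr, country, _ in table:
    print(f"{asn:<7}{name:<12}{iyp:<13}{asr:<9}{country}")
```

Restricting the table to ASes that appear in both rankings is one possible design choice; a real module would need to decide how to treat ASes that are missing from one source.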
Question
The driving question frames why students are working with the dataset. State it at the top of the module so students know what they are working toward. They should be able to provide an answer when they finish the module.
Background
Provide the context students need before they can answer the question: slides, webpages, PDFs, or links to other NIDS modules that introduce prerequisite concepts.
Resources
List every resource the module uses, grouped as follows.
- Resources Known
  List the datasets, tools, and libraries the student is expected to understand before beginning this module. Organize this list around the prerequisite modules that introduce them.
- Resources New
  List the datasets, tools, and libraries that are introduced for the first time in this module. For each new resource:
  - Provide enough detail for students to understand what the resource is and why it is used.
  - Optional: link to an existing catalog.caida.org recipe, or create a new one, for scripts and deeper documentation.
    Catalog recipes are technical methodologies used to derive a particular conclusion or a particular dataset, which can be used for further synthesis and analysis. They include example code and more detail than is present in the module.
Analysis
Step-by-step notebook cells that work toward answering the driving question:
- For each new resource, begin with a self-contained introduction — pose a small question about that resource alone and walk students through the analysis needed to answer it.
- Then combine the known and new resources to answer the driving question.
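As a rough sketch of that two-stage pattern, the cells below first answer a small question about one resource on its own, then combine it with a second resource. All records are invented placeholders, and the field names (`asn`, `country`, `rank`) are assumptions rather than the schema of any real dataset.

```python
from collections import Counter

# Cell 1: introduce the new resource alone.
# Small question: how many ranked ASes does each country have?
# Placeholder records; ASNs come from the documentation-reserved range.
new_resource = [
    {"asn": 64500, "country": "AA", "rank": 1},
    {"asn": 64501, "country": "BB", "rank": 2},
    {"asn": 64502, "country": "AA", "rank": 3},
]
ases_per_country = Counter(rec["country"] for rec in new_resource)
print(ases_per_country.most_common())

# Cell 2: combine with a resource students already know.
# Illustrative driving question: which country has the best-ranked AS
# when the two rankings are averaged?
known_resource = {64500: 3, 64501: 1, 64502: 2}  # ASN -> rank, placeholder


def best_country(new_records, known_ranks):
    """Return (country, asn, average_rank) for the lowest average rank."""
    scored = [
        (rec["country"], rec["asn"], (rec["rank"] + known_ranks[rec["asn"]]) / 2)
        for rec in new_records
        if rec["asn"] in known_ranks
    ]
    # Lowest average wins; ties fall back to the smaller ASN.
    return min(scored, key=lambda item: (item[2], item[1]))


print(best_country(new_resource, known_resource))
```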
Answer
Write a concise answer to the driving question, grounded in the charts and results produced in the Analysis section.
Prerequisites
- A GitHub account
- Membership in Team Network Infrastructure Data Science — contact a CAIDA team member to request access
- A dataset and at least one corresponding catalog.caida.org recipe. If the recipe doesn’t exist, you will have to make it.
Step 1: Create the Public Repository
- Create a new repository in the CAIDA GitHub organization. Use a descriptive name that reflects the dataset or topic (e.g., nids-module-bgp-hijacks).
- Set visibility to Public.
- Add the following to the repository:
  - README.md: module overview, learning objectives, and dataset citation
  - Optional: exercises/examples (at least one)
    - notebooks/: Jupyter notebooks
    - scripts/: script code
    - src/: compiled code
  - data/ or a data/README.md: instructions for accessing the dataset on NRP (do not commit large data files)
  - requirements.txt or environment.yml: dependencies needed to run the notebooks
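The layout above can be scaffolded locally before the first commit. The following is a minimal sketch, assuming Python 3 and a local working directory; the `scaffold_public_repo` helper, the `.gitkeep` files, and the placeholder file contents are hypothetical, and only the file and directory names come from the list above.

```python
import tempfile
from pathlib import Path


def scaffold_public_repo(root):
    """Create the public-module skeleton under root and return the file paths."""
    root = Path(root)
    created = []
    # Directories from the list above. A .gitkeep file is a common convention
    # for keeping an otherwise empty directory in Git.
    for name in ("notebooks", "scripts", "src", "data"):
        directory = root / name
        directory.mkdir(parents=True, exist_ok=True)
        keep = directory / ".gitkeep"
        keep.touch()
        created.append(keep)
    # Placeholder text files; files that already exist are left untouched.
    files = {
        "README.md": "# Module title\n\nOverview, learning objectives, dataset citation.\n",
        "data/README.md": "Instructions for accessing the dataset on NRP.\n",
        "requirements.txt": "# Dependencies needed to run the notebooks\n",
    }
    for relative, text in files.items():
        path = root / relative
        if not path.exists():
            path.write_text(text, encoding="utf-8")
        created.append(path)
    return created


# Demonstration in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    made = scaffold_public_repo(Path(tmp) / "nids-module-bgp-hijacks")
    for path in made:
        print(path.relative_to(tmp).as_posix())
```

In practice the function would be pointed at the cloned repository rather than a temporary directory.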
Step 2: Create the Private Repository
- Create a second repository named with a -key suffix (e.g., nids-module-bgp-hijacks-key).
- Set visibility to Private.
- Add the following:
  - notebooks/: completed solution notebooks with full answers
  - grading/: rubrics, answer keys, and grading notes
  - facilitation.md: tips for running the module, common student questions, timing guidance
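A small check can catch naming mismatches between the two repositories. This sketch assumes the convention described above (the private name is the public name plus `-key`); the function names are hypothetical.

```python
KEY_SUFFIX = "-key"


def private_repo_name(public_name):
    """Derive the private repository name from the public one."""
    return public_name + KEY_SUFFIX


def is_matching_pair(public_name, private_name):
    """Return True when private_name is public_name with the -key suffix."""
    return private_name == private_repo_name(public_name)


print(private_repo_name("nids-module-bgp-hijacks"))
print(is_matching_pair("nids-module-bgp-hijacks", "nids-module-bgp-hijacks-key"))
```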
Step 3: Configure Team Access
- Go to the Team Network Infrastructure Data Science repositories page.
- Add both repositories (public and private) to the team.
- Confirm that all team members have at least Read access to the public repo and that instructors have Write access to the private repo.
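Team access can also be set through the GitHub REST API instead of the web page. The sketch below only builds the requests for the "add or update team repository permissions" endpoint and does not send them. The organization login `CAIDA`, the team slug, the repository names, and the token are assumptions to replace with real values; `pull` and `push` are the API names for Read and Write.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"
ORG = "CAIDA"  # assumed organization login
# Assumed slug; copy the real one from the team's URL on GitHub.
TEAM_SLUG = "team-network-infrastructure-data-science"


def team_repo_request(repo, permission, token):
    """Build (but do not send) a request granting the team access to repo."""
    url = f"{API_ROOT}/orgs/{ORG}/teams/{TEAM_SLUG}/repos/{ORG}/{repo}"
    body = json.dumps({"permission": permission}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


# Read on the public repository, Write on the private one.
requests_to_send = [
    team_repo_request("nids-module-bgp-hijacks", "pull", "YOUR_TOKEN"),
    team_repo_request("nids-module-bgp-hijacks-key", "push", "YOUR_TOKEN"),
]
for req in requests_to_send:
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    # To apply, call urllib.request.urlopen(req) with a token that is
    # allowed to manage the team's repositories.
```

Printing the requests first makes it easy to review the repository names and permission levels before anything changes on GitHub.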
Step 4: Link to Catalog
Required if the module uses a dataset not already represented in catalog.caida.org; otherwise skip.
- Find or create a catalog.caida.org recipe for the dataset(s) used in the module.
- Add the recipe URL to the README.md of the public repository.
- Notify the NIDS team so the module can be linked from the NIDS project page.
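Before notifying the team, it can help to confirm that the README actually contains a catalog link. A minimal sketch, assuming recipe links appear as plain URLs on the catalog.caida.org host; the function name and the sample URL path are hypothetical.

```python
import re

# Matches http(s) URLs on catalog.caida.org; no particular path shape is assumed.
CATALOG_URL = re.compile(r"https?://catalog\.caida\.org/[^\s)>\]]+")


def catalog_links(readme_text):
    """Return every catalog.caida.org URL found in the README text."""
    return CATALOG_URL.findall(readme_text)


# Sample README line with a made-up recipe path, for illustration only.
sample = "Dataset recipe: [recipe](https://catalog.caida.org/recipe/example_recipe)\n"
print(catalog_links(sample))
```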

