Center for Applied Internet Data Analysis
A Day in the Life of the Internet (DITL)

This page chronicles the efforts of CAIDA, ISC, DNS-OARC, many partnering root nameserver operators and other organizations to coordinate and conduct large-scale, simultaneous traffic data collection events with the goal of capturing datasets of strategic interest to researchers. Over the last several years, we have come to refer to this project and related activities as "A Day in the Life of the Internet" (DITL).

Motivation

In 2002, as part of our DNS research activities, CAIDA responded to the Root Server System Advisory Committee's invitation to help DNS root operators study and improve the integrity of the root server system. In 2006, after a few more years of building trust with these operators, we asked them to participate in a simultaneous collection of a day of traffic to (and in some cases from) the DNS root nameservers. We collaborated with the Internet Systems Consortium (ISC) and the DNS Operations, Analysis, and Research Center (DNS-OARC) to coordinate four annual large-scale data collection events, which took place in January 2006, January 2007, March 2008, and March 2009. While these measurements can be considered prototypes of a Day in the Life of the Internet [1], their original goal was to collect as complete a dataset as possible about the DNS root servers' operations and evolution, particularly as root operators introduce new technologies, such as anycast, with no rigorous way to evaluate their impacts in advance. As word of these experiments spread, the number and diversity of participants and datasets grew.

Building Community and Broadening Scope

By establishing a tradition of periodic, synchronized measurements, along with supporting tools, analysis, visualization, and data catalogs in which to index collected traces (DatCat [2], the Internet Traffic Archive [3], CRAWDAD [4], MoMe [5], the Datapository [6], PREDICT [7]), we hope to significantly increase the quantity, quality, and accessibility of empirical data supporting Internet research.

Several complementary projects at CAIDA provided the impetus for our first attempts to coordinate large-scale, distributed measurement activities in late 2006. As part of an NSF-sponsored DNS measurement project [8], CAIDA and ISC performed a 48-hour measurement event on dozens of root server anycast nodes. Integrating recommendations based on lessons learned from previous measurement experiments [9], ISC coordinated the collection of packet header traces from multiple anycast instances of three root nameservers. To our knowledge, the 2006 event and each event since stand as the largest-scale simultaneous collections from critical components of the global Internet infrastructure made available to academic researchers. We consider these events prototypes for eventual regular "Day in the Life of the Internet" measurement events.

If you have access to or influence over Internet measurement infrastructure and can contribute datasets (anonymized according to your needs [10,11]), please email ditl-info@caida.org for details regarding already planned measurement dates, times, locations, and types of data. (We conduct informal vetting to avoid manipulation of the experiments.)

We also seek input from others interested in gathering specific complementary measurements on the same days, to help us maximize the return on investment of participation in the experiment.

Commercial pressures make it next to impossible to get many types of Internet measurement data to the research community, yet empirical network science is not possible without such data. As with similar efforts in other disciplines [12], the proposed project involves building a global cooperative community to support the simultaneous capture of a variety of measurements from and across many strategic links around the globe for further analysis by research scientists. Ideally, participating partners would provide a variety of trace data (workload, topology, routing, and performance), using privacy-sensitive anonymization, aggregation, or analysis techniques appropriate to local jurisdictions [13,14]. We hope that over time, annual measurement activities supporting "A Day in the Life of the Internet" datasets will gather increasing momentum, including expanded partnerships with public and private sector stakeholders, Internet infrastructure providers, and educational networks, as well as legal and policy expertise to advise on and review privacy-protecting data disclosure control mechanisms [15].

DITL Collection Events

The table below lists the DITL-style Internet collection events and their associated publications. The table also shows the growing interest of the global community of infrastructure operators, who need resources to support regular, coordinated, simultaneous collection events.

Event                     Root         RIRs*  TLDs**  Other participants             Length
                          nameservers
2002 (prototypical)
  1) 14 August            4            -      -       -                              26 hours (10 min. interval)
  2) 28 August            6            -      -       -                              7 days (10 min. interval)
  3) 21 October           4            -      -       -                              3 days (10 min. interval)
2006, 10-11 January       3            -      -       -                              2 days
2007, 9-10 January        5***         -      -       2 alt root servers             2 days
                                                      1 AS112 server
                                                      5 ASes (passive traces)
2008, 18-19 March         8            2      5       2 alt root servers             2 days
                                                      7 AS112 servers
                                                      2 caching DNS resolvers
                                                      6 ASes (passive traces)
                                                      Various NetFlow, BGP, syslogs
2009, 30 March - 1 April  8            3      17      2 alt root servers             3 days
                                                      6 AS112 servers
                                                      5 ASes (passive traces)

*   Regional Internet Registry (RIR) in-addr.arpa data.
**  Generic and country-code Top-Level Domains (gTLDs/ccTLDs).
*** E-root participated in data collection, although logistical issues prevented a successful upload of its data to the OARC servers.

Research Questions

During informal discussion of the DITL project at the January 2008 CAIDA/WIDE workshop, researchers brainstormed questions to analyze in potential DITL data collections. The list consists mostly of questions that require data not currently available to researchers, but we hope it serves as inspiration for project participation. We also experimented with Google Moderator by posting a poll asking, "What are the most important empirical questions to be asking about the Internet?"

DNS Evolution

Early analysis of the DITL data collected since 2006 has focused on the DNS root nameservers. These datasets give researchers a view of the characteristics and workload of traffic to this critical component of the global Internet. This data provides a baseline for comparison against the traffic we expect to see in the near future, which will contain cryptographic signatures, internationalized names, and new generic Top-Level Domains (gTLDs). We have posted our most recent analysis of the evolution of the DNS derived from these four datasets. Previously, we posted analyses of DNS root server traffic for 2002, 2006, and 2007 and published results from the 2006 dataset in the paper "Two Days in the Life of the DNS Anycast Root Servers". We also offer a comparison of traffic from the DNS root nameservers as measured in DITL 2006 and 2007.

Working with visiting researcher Mia Zhang, we developed an interactive web interface to the DNS-OARC data that lets users view graphs of coverage, the locations of open DNS resolvers in the address space, the geography of clients, pollution, and the distributions of clients and queries across eight of the 13 root nameservers.

Using the 2006 and 2007 data sets, we developed Influence Maps of DNS anycast servers that visualize the geographic distribution of DNS clients for each anycast instance.

References

[1] "Looking over the Fence at Networks: A Neighbor's View of Networking Research (2001)", National Academies Press, http://www.nap.edu/books/0309076137/html/
[2] "Internet Measurement Data Catalog (DatCat)", http://www.datcat.org/
[3] "The Internet Traffic Archive", http://ita.ee.lbl.gov/index.html
[4] "Community Resource for Archiving Wireless Data at Dartmouth", http://crawdad.cs.dartmouth.edu/
[5] "Cluster of European Projects aimed at Monitoring and Measurement -- MoMe Database", http://www.ist-mome.org/database/
[6] "The Datapository: A collaborative network data analysis and storage facility", http://www.datapository.net/
[7] "Protected Repository of Data for Internet CyberThreats", http://www.predict.org/
[8] "Improving the Integrity of Domain Name System (DNS) Monitoring and Protection" (NSF grant SCI-0427144), http://www.caida.org/funding/dns-itr/
[9] "Recommendations for future large scale simultaneous DNS data collections", http://www.caida.org/research/dns/roottraffic/dnsroot_measurement_recommendations.xml
[10] "Crypto-PAn: Cryptography-based Prefix-preserving ANonymization", http://www-static.cc.gatech.edu/computing/Telecomm/projects/cryptopan/
[11] "The Devil and Packet Trace Anonymization", http://www.icir.org/enterprise-tracing/papers.html
[12] "IGY: International Geophysical Year", http://en.wikipedia.org/wiki/IGY
[13] "An Internet Data Sharing Framework For Balancing Privacy and Utility", http://www.caida.org/publications/papers/2009/engaging_data/
[14] "Dialing privacy and utility: a proposed data-sharing framework to advance Internet research", http://www.caida.org/publications/papers/2009/dialing_privacy_utility/
[15] "Promotion of Data Sharing", http://www.caida.org/data/sharing/
  Last Modified: Wed Jul-6-2011 12:00:07 PDT
  Page URL: http://www.caida.org/projects/ditl/index.xml