Motivation
In 2002, as part of our DNS research activities, CAIDA responded to the Root Server System Advisory Committee's invitation to help DNS root operators study and improve the integrity of the root server system. In 2006, after a few more years of building trust with these operators, we asked them to participate in a simultaneous collection of a day of traffic to (and in some cases from) the DNS root nameservers. We collaborated with the Internet Systems Consortium (ISC) and the DNS Operations, Analysis, and Research Center (DNS-OARC) to coordinate four annual large-scale data collection events, which took place in January 2006, January 2007, March 2008, and March 2009. While these measurements can be considered prototypes of a Day in the Life of the Internet [1], their original goal was to collect as complete a dataset as possible about the DNS root servers' operations and evolution, particularly as root operators introduce new technologies, such as anycast, with no rigorous way to evaluate their impacts in advance. As word of these experiments spread, the number and diversity of participants and datasets grew.
Building Community and Broadening Scope
By establishing a tradition of periodic, synchronized measurements, together with supporting tools, analysis, visualization, and data catalogs (DatCat, http://www.datcat.org [2], Internet Traffic Archive [3], CRAWDAD [4], MOME [5], Datapository [6], PREDICT [7]) in which to index collected traces, we hope to significantly increase the quantity, quality, and accessibility of empirical data supporting Internet research.
Several complementary projects at CAIDA provided the impetus for our first attempts to coordinate large-scale, distributed measurement activities in late 2006. As part of an NSF-sponsored DNS measurement project (https://www.caida.org/funding/dns-itr/ [8]), CAIDA and ISC performed a 48-hour measurement event on dozens of root server anycast nodes. Integrating recommendations based on lessons learned from previous measurement experiments [9], ISC coordinated the collection of packet header traces from multiple anycast instances of three root nameservers. To our knowledge, the 2006 event, and each event since, stand as the largest-scale simultaneous collections from critical components of the global Internet infrastructure made available to academic researchers. We consider these events prototypes for eventual regular "Day in the Life of the Internet" measurement events. If you have access to or influence over Internet measurement infrastructure and can contribute datasets (anonymized according to your needs [10,11]), please email ditl-info@caida.org for details regarding already planned measurement dates, times, locations, and types of data. (We conduct informal vetting to avoid manipulation of the experiments.)
We also seek input from others interested in gathering specific complementary measurements on the same days, to help us maximize the return on investment of participation in the experiment.
Commercial pressures make it next to impossible to get many types of Internet measurement data to the research community, but empirical network science is not possible without such data. As with similar efforts in other disciplines [12], the proposed project involves building a global cooperative community to support the simultaneous capture of a variety of measurements from and across many strategic links around the globe for further analysis by research scientists. Ideally, participating partners would provide a variety of trace data: workload, topology, routing, and performance, with privacy-sensitive techniques for anonymization, aggregation, or analysis appropriate to local jurisdictions [13,14]. We hope that over time, annual measurement activities to support "A Day in the Life of the Internet" data sets will gather increasing momentum, including expanded partnerships with public and private sector stakeholders, Internet infrastructure providers, and educational networks, as well as legal and policy expertise to advise and review privacy-protecting data disclosure control mechanisms [15].
DITL Collection Events
The table below lists the DITL-style Internet collection events and their associated publications. The table also shows the growth in interest from the global community of infrastructure operators in need of resources to support regular, coordinated, simultaneous collection events.
Event | Root nameservers | RIRs* | TLDs** | Other Participants | Length
---|---|---|---|---|---
2002 (prototypical) | | | | |
2006 | 3 | | | | 2 days
2007 | 5*** | | | | 2 days
2008 | 8 | 2 | 5 | | 2 days
2009 | 8 | 3 | 17 | | 3 days
* Regional Internet Registries (RIR) in-addr.arpa data.
** Global and country code Top-Level Domain (gTLD/ccTLD).
*** E-root participated in data collection though logistical issues prevented successful data upload to OARC servers.
Research Questions
During informal discussion of the DITL project at the January 2008 CAIDA/WIDE workshop, researchers brainstormed questions of interest to analyze in potential DITL data collections. The list mostly includes questions that require data not currently available to researchers, but we hope the list serves as inspiration for project participation. We also experimented with Google Moderator functionality, by posting a poll asking, "What are the most important empirical questions to be asking about the Internet?".
DNS Evolution
Early analysis of the DITL data collected since 2006 focuses on the DNS root nameservers. These datasets give researchers a view of the characteristics and workload of traffic to this critical component of the global Internet. The data provide a baseline for comparison against traffic we expect to see in the near future, which will contain cryptographic signatures, internationalization of the name space, and new global Top Level Domains (TLDs). We post our most recent analysis of the evolution of the DNS derived from these four data sets. Previously, we posted analysis of DNS root server traffic for 2002, 2006, and 2007 and published results from the 2006 data set in the paper "Two Days in the Life of the DNS Anycast Root Servers". Further, we offer a comparison of traffic from the DNS root nameservers as measured in DITL 2006 and 2007.
Working with visiting researcher Mia Zhang, we developed an interactive web interface to the DNS-OARC data that lets users view graphs showing coverage, locations of open DNS resolvers in the address space, the geography of clients, pollution, and distributions of clients and queries across eight of the 13 root nameservers.
Using the 2006 and 2007 data sets, we developed Influence Maps of DNS anycast servers that visualize the geographic distribution of DNS clients for each anycast instance.
References
Funding support
Support for the "A Day in the Life of the Internet" project is provided by the National Science Foundation (NSF) grant OAC-0427144, Improving the Integrity of Domain Name System (DNS) Monitoring and Protection. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of NSF.