The Heartbeat of Private Nets:
Spectroscopy of DNS Update Traffic

Andre Broido, Evi Nemeth and kc claffy

CAIDA, SDSC, UCSD, University of Colorado

Work in progress, May-Oct.2002.

Presented at IETF 54 in Yokohama, Jul.14-19, 2002, by Nevil Brownlee

Abstract

We classify attempts to dynamically update DNS records, primarily for private (RFC1918) blocks (10/8, 192.168/16, 172.16/12), by analyzing the frequency spectrum of update packets seen at one of the authoritative servers for RFC1918 zones. RFC1918 addresses are strictly local and should never leak to the Internet or be updated on servers outside the local intranet. All such update attempts fail. We find that most updates come in apparently infinite periodic series, and that most update sources have periods of 60 or 75 minutes. We identify both periods as default settings of out-of-the-box Microsoft Windows 2000 DNS software.

Introduction

Several services are vital for the Internet's continued ability to satisfy our demands. A search system (implemented in search engines) converts a verbal description of a user's request to a human-readable domain name. A naming system (such as DNS, the Domain Name System) finds an address for this domain name. The routing system (whose interdomain implementation is given by BGP) finds out how to reach this address, given the constraints of relationships and policies of connectivity providers. The forwarding system carries the traffic generated by these requests between end nodes.

All of these systems were designed for traffic loads that match the rate and complexity of human-generated requests. A high-end workstation, a cluster of these, or a distributed set of clusters can serve the whole population of Internet users. The service, however, is easily overwhelmed when streams of repeating requests come from devices about as powerful as the servers themselves. This can happen even when each device produces only a trickle of requests, if large numbers of request streams converge on a few servers. The computing power required to process N streams at a rate of nu requests per second each, variations notwithstanding, is close to

E = N . h . nu,
where h is the resources spent per request. This can be large even when nu is small.
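
As a rough illustration of the estimate above, the following sketch evaluates E = N . h . nu for made-up values. The function name and the numbers (one million sources, one request per hour each, unit cost per request) are assumptions chosen only to show that a small per-source rate can still add up to a substantial aggregate load.

```python
def processing_load(n_streams: int, cost_per_request: float, rate_per_sec: float) -> float:
    """Return E = N * h * nu, the resources consumed per second."""
    return n_streams * cost_per_request * rate_per_sec


# Hypothetical inputs: 1,000,000 sources, each sending one request per hour
# (nu = 1/3600 requests per second), at a cost h of one unit per request.
load = processing_load(1_000_000, 1.0, 1 / 3600)
print(f"{load:.1f} cost units per second")
```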

In this paper, we analyze spurious machine-generated traffic which pushes the worldwide Domain Name System (DNS) service to and beyond the edge of its performance constraints. Large fractions of this traffic are completely repetitive and periodic. Classifying and identifying phenomena by their frequencies is known to practitioners of Natural Science as the art of Spectroscopy. We introduced the term Network Spectroscopy to refer to all methods of identifying discrete network components such as links, paths, transmission technologies, device models, operating systems and the like by delay, frequency, periods and other kinds of spectra. This notion encompasses various approaches based on phenomenology, deductive reasoning or sound theory, and motivated by different applications [Dovrolis] [Nowak] [Katabi & Blake].

Repetitive requests and, more generally, delay quantization, either in the form of strict periodicity or in delays belonging to a set of equispaced values, are present in many other types of Internet traffic data. We found these phenomena especially important for understanding BGP updates [Andre], broadband end-user traffic [Andre & RyanKing], round-trip times (RTTs) and precision of packet timestamps. We are currently preparing a description of these and related questions which can be analyzed in the framework of Internet spectroscopy.

Private nets

The history of private network address blocks begins with RFC1918 [1], published in 1996. Private nets or RFC1918 blocks are segments of IP address space reserved by IANA for use within an organization. These addresses can be used by anyone without being officially assigned by a registry or an ISP. RFC1918 reserves three blocks: 192.168/16, 10/8 and 172.16/12. The addresses in these blocks are called "private IP addresses" in the Internet vernacular. We will use this term sparingly in the rest of the paper, since the addresses in question are public in the strictest sense, and since the expression "private internet" from the title of RFC1918 is in itself something of an oxymoron. RFC1918 says:

An enterprise that decides to use IP addresses out of the address space defined in this document can do so without any coordination with IANA or an Internet registry.

and continues:

Indirect references to such addresses should be contained within the enterprise. Prominent examples of such references are DNS Resource Records and other information referring to internal private addresses. In particular, Internet service providers should take measures to prevent such leakage.

In this paper, we examine how the above statement is implemented today and observe that millions of DNS packets are sent daily to nameservers outside private nets requesting or containing information on RFC1918 addresses. DNS records for RFC1918 addresses (and thus updates to these records) are legitimate only within the network on which a host with RFC1918 address resides. They should not appear on the public Internet; they are not unique and are not globally routed.

IP addresses are often assigned dynamically using DHCP (Dynamic Host Configuration Protocol) [RFC1541]. When this is done, the requesting host receives an IP address lease valid for a fixed period of time that is guaranteed to be unique in the local context. But the mapping between the hostname and the IP address may have changed since the host last was active on the network and the DNS records for that host may be incorrect. The Internet Software Consortium's DNS software has had the ability to receive dynamic updates of new address assignments since 1996.

Flaws in software implementations or configurations from Microsoft (and others) have caused these update packets using RFC1918 addresses to leak out to the global Internet and arrive at the root servers -- the top of the Internet naming tree. Initially, the root servers refused the updates and logged the error, but as the load increased, separate servers were deployed to handle just the RFC1918 addresses. This has reduced the spurious update load on the root servers significantly.

We examine these attempted updates to try to determine which operating systems are guilty of leaking private names and addresses onto the global Internet and what configuration can be done to alleviate the problem. Our data source is the log files from an RFC1918 authoritative server, in particular the attempts to dynamically update the reverse DNS records (PTR records) that map from an IP address to a hostname. We also see attempts to update the DNS A records, but in much smaller numbers.

An RFC1918 address can appear in DNS packets as either the source address of the packet or as part of the DNS data inside the packet. In the first case, there is no route back to the sending host and the packet cannot be answered at all. In the second case, the sending host has a valid IP address, but the root servers receiving the packet have no interest in the local RFC1918 address mappings.

The ambiguity in the status of RFC1918 addresses (legitimate only within the scope of the local organization) results in DNS software being unable to deny all RFC1918 updates a priori, since this will disrupt operation of internal networks. Software misconfiguration and incorrect default behavior allow local nameservers to send information about hostnames in their RFC1918 address blocks to the root servers.

Dynamic update packets for RFC1918 addresses are generated by DHCP servers on networks where private addresses are used internally (see NANOG discussion [NANOG]). DHCP servers periodically assign and renew hosts' IP addresses on their networks. We have run small-scale attempts (about 500 hosts at a time) to identify the operating system of the DHCP server, but the fact that a fully patched Windows 2K or XP system currently shows up as "unknown" hampers these efforts. At a recent IEPG (Internet Engineering Planning Group) meeting, many other DNS-based misconfigurations were documented [1].

The analysis presented here extends CAIDA's earlier work on measurement, performance and placement of DNS root servers [2] [3] [4] and on the use of private and unrouted addresses [5]. In particular, [2] and [3] discuss the vast extent of DNS misconfiguration that manifests itself in queries reaching the root servers.

In the sections that follow, we describe our data source and then examine the prevalence of update attempts to the various RFC1918 address blocks. We also identify the sources of these update attempts and categorize them by continent, country, and ISP or organization. The log data provides timestamps and we use these to determine periodicity in the updates. Finally, we coalesce all this information to fingerprint the guilty operating systems and suggest configuration changes to ameliorate some of the damage.

Data Sets

The data presented here is obtained from hazel.isc.org, an authoritative server for RFC1918 addresses that is located near F-root in Palo Alto, California. Hazel is part of the AS112 project (http://as112.net). Whenever a nameserver tries to update a root server with data about an RFC1918 address (like 192.168.0.1), it is told the machine at 192.175.48.1 (hazel) has authority for this zone and should be contacted instead. This is called a referral; Hazel is then contacted with the update request. Hazel logs the request and returns an update denied answer to the sending host. The log record includes a timestamp, source IP address, source port, and the RFC1918 zone to be updated. The timestamp has 1 millisecond resolution. This is a finer resolution than that typically used with nameserver software and will allow us to study interarrival times in detail.
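
To illustrate how millisecond timestamps can expose periodic sources, here is a minimal sketch that finds the most common interarrival time for one source, rounded to whole minutes. It does not assume hazel's actual log format: the input is simply a list of timestamps in seconds, and the function name and the synthetic series are hypothetical.

```python
from collections import Counter


def dominant_period_minutes(timestamps):
    """Return (period, share): the most common interarrival time in whole
    minutes and the fraction of interarrival gaps that match it."""
    ts = sorted(timestamps)
    gaps = [round((b - a) / 60.0) for a, b in zip(ts, ts[1:])]
    if not gaps:
        return None, 0.0
    period, count = Counter(gaps).most_common(1)[0]
    return period, count / len(gaps)


# Synthetic series: one source updating every 60 minutes with sub-second jitter.
jitter = [0.0, 0.4, -0.3, 0.2, 0.1, -0.2]
series = [i * 3600 + j for i, j in enumerate(jitter)]
print(dominant_period_minutes(series))
```

Rounding to minutes is a deliberate simplification; a finer histogram or a Fourier transform of the arrival process would be needed to resolve closely spaced periods.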

Hazel is actually multiple machines on the network 192.175.48.0/24, an IANA reserved address block. All dynamic updates are referred to the machine with host byte .1, while queries go to .6 and .42. The route to the AS112 network, 192.175.48.0/24, is carried by most networks worldwide and by all networks participating in the University of Oregon RouteViews project [RouteViews]. The AS112 netblock is globally reachable. It is an anycast block in the sense that there are several places in the Internet where machines are assigned these three addresses; the routing system chooses the one closest to the sender. All of our measurements are from hazel, the instance of a server at address 192.175.48.1 that is run by Paul Vixie at the Internet Software Consortium (isc.org). Any ISP intending to run servers to confine RFC1918 updates to its own network is encouraged to use those same three IP addresses for its RFC1918 servers [Vixie, NANOG].

Our analysis uses two data sets: one collected May 28 to June 4, 2002 and the other collected from July 4 to July 30, 2002. We have monitored hazel since April, 2002, but operational issues have interfered a bit and our largest continuous stream of data at the time of this writing is 26 days in July. Hazel logs about 1/2 gigabyte every 8 hours.

Updates per RFC1918 address block

In this section, we present statistics of RFC1918 updates overall and on a per IP address block basis.

Recall that the major RFC1918 reverse DNS zones are:

  168.192.in-addr.arpa   for   192.168.0.0/16
       10.in-addr.arpa   for   10.0.0.0/8
16-31.172.in-addr.arpa***   for   172.16.0.0/12
*** footnote: 
The log files contain entries for each of the /16 networks
making up the 172.16.0.0/12 block; we aggregate the results
over the whole /12 block in the analysis that follows.
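
The mapping from an address to its RFC1918 reverse zone can be sketched with Python's standard ipaddress module. The helper name is hypothetical; the block list and zone names follow the table above, with 172.16.0.0/12 resolved to the individual /16 zones as in the log files.

```python
import ipaddress

RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def rfc1918_reverse_zone(addr):
    """Return the RFC1918 reverse zone for addr, or None if addr lies
    outside the three RFC1918 blocks."""
    ip = ipaddress.ip_address(addr)
    for block in RFC1918_BLOCKS:
        if ip in block:
            octets = str(ip).split(".")
            # 10/8 has a single one-label zone; the other two blocks
            # use zones named after the first two octets.
            depth = 1 if block.prefixlen == 8 else 2
            return ".".join(reversed(octets[:depth])) + ".in-addr.arpa"
    return None


for a in ("192.168.0.1", "10.44.72.110", "172.20.5.9", "24.1.2.3"):
    print(a, rfc1918_reverse_zone(a))
```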

To see the extent of the problem, we computed the number of distinct hosts that were sending at least one DNS update packet toward hazel, our authoritative server for the RFC1918 zones. The count of these hosts over time is shown in Figure 1 below.

Figure 1: Update Attempts over Time
RFC1918 in-addr.arpa reverse zones
July, 2002 (26 days), Palo Alto, CA

Because Figure 1a is a cumulative count of distinct hosts, it is necessarily monotonically increasing. It increases steadily over the 26 days of July, growing roughly like a square root function. Figure 1b shows this same data, but relative to the number of updates, rather than time. Both curves show that a very large number of hosts are contributing to the problem, not simply a few badly broken systems. The distributions over time and over number of updates are smooth, indicating many small contributions rather than a few large spikes.

Figure 2 shows the persistence of the updating hosts over the 26-day period (624 hours) in July, 2002. The x-axis is the duration of an update series, measured as the time between the first update from a host and the last update from that same host. The y-axis is the fraction of updates from hosts that were sending update packets over a particular duration. About 60% of the updates came from hosts that were updating for the whole measurement period.
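
The duration statistic described above can be sketched as follows. This is only an illustration under simple assumptions: records are (timestamp in seconds, host) pairs, and the function name and sample data are hypothetical rather than taken from the hazel logs.

```python
from collections import defaultdict


def update_share_by_duration(records, min_hours):
    """Fraction of all updates sent by hosts whose series (first to last
    update) spans at least min_hours."""
    first, last = {}, {}
    count = defaultdict(int)
    for ts, host in records:
        first[host] = min(ts, first.get(host, ts))
        last[host] = max(ts, last.get(host, ts))
        count[host] += 1
    total = sum(count.values())
    if total == 0:
        return 0.0
    long_lived = sum(
        n for host, n in count.items()
        if (last[host] - first[host]) / 3600.0 >= min_hours
    )
    return long_lived / total


# Host "a" updates hourly for two hours; host "b" is seen only once.
records = [(0, "a"), (3600, "a"), (7200, "a"), (100, "b")]
print(update_share_by_duration(records, 2))
```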

Figure 2: Duration of Update Series
RFC1918 in-addr.arpa reverse zones
July, 2002 (26 days), Palo Alto, CA

Host count

We now look at the update behavior in more detail separating each of the 3 address blocks in RFC1918 space and also using finer time granularity.

IP addresses in the old ARPAnet range, 10.0.0.0/8, are often used in corporate environments; in particular, many VPNs are numbered in 10-net space. This space is presumably managed by professional system administrators, resulting in fewer instances of address leakage. The 192.168.0.0/16 block is often used by manufacturers of networking gear for home and small office use - NATs, firewalls, DSL "routers", combinations thereof and sundry boxes, the multitude of which escapes classification. These devices either have manufacturer's defaults that assign 192.168.0.0/16 addresses to the LAN computers, or come with instruction manuals that advise users to set up addresses in that range. 172.16.0.0/12 does not enjoy the same level of popularity and, as expected, has fewer RFC1918 updates. It is used by some universities [6] for internal routing.

We examined the attempted updates per address block during a 3 1/2 day period in our May-June data set. Table 1 below shows the number of attempted updates per RFC1918 address block and Figure 3 plots their distribution over the data collection interval.

Table 1: Number of Updates by DNS Zone

Start: 01-Jun-2002 06:28:35.835
End:   04-Jun-2002 20:58:34.648

DNS zone               #Updates   Percentage
------------------------------------------
168.192.in-addr.arpa   35055154     68.3%
10.in-addr.arpa        12391040     24.2%
16-31.172.in-addr.arpa  3834284      7.5%

Total updates:         51370999

Figure 3: Update Attempts per Minute
RFC1918 in-addr.arpa reverse zones
DNS root server f.root-servers.net

Figure 3 shows the number of updates arriving per minute for IP addresses in each RFC1918 block. Note that the baselines of each plot are in general agreement with the share of updates for each block given in Table 1 above.

We see a periodic diurnal and weekly pattern which we discuss in detail in the following section. Figure 4 uses per second granularity to detail the 9AM spike on June 3.

Figure 4: 9AM Spike of DNS Update Attempts
RFC1918 reverse zones, Palo Alto, CA

Many systems attempt to update the nameserver at 9:00. Because their clocks are not synchronized, the updates are spread over about 6 minutes. This is good news for the nameserver, which has to process 107,000 updates in those 6 minutes (roughly 300 per second): with proper clock synchronization the same updates would arrive in a much shorter burst and could overwhelm the server. The other whole-hour boundaries where spikes tend to occur show similar results.

Sources in RFC1918 space

A noticeable number of the update packets have a source IP address assigned from RFC1918 space. These, of course, cannot be answered because there is no route back to the sending host in the global routing tables. The breakdown by address block and their update counts is given in Table 2.
Table 2: Update Packets from RFC1918 Sources

RFC1918 block    #IPs   #Updates
--------------------------------
10/8             1554     216408
172.16/12         589      34844
192.168/16       1234     297734

Total            3378     548995
While this is a small fraction (1%) of the total update attempts in the data set, it has slightly different characteristics. The average number of updates per IP address is 162, double that for the whole sample (81 updates/IP). This is not surprising: because the senders' source addresses are not unique, data from different machines are coalesced under one address, which results in an undercount of the hosts leaking their private addresses onto the Internet. This will not influence our conclusions, because the number of updates contributed by this segment of the host population is so small. If each host in this group were to contribute 80 updates, as in the rest of the sample, the total number of RFC1918 addressed hosts would increase to about 7,000, which is less than 1% of all IP addresses (1.2M) observed.
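
The arithmetic behind this estimate can be retraced in a few lines. The inputs are the totals from Table 2 and the rounded figures quoted in the text (80 updates per host, 1.2M observed addresses); the variable names are ours.

```python
rfc1918_updates = 548995       # total updates from RFC1918 sources (Table 2)
rfc1918_ips = 3378             # distinct RFC1918 source addresses (Table 2)
typical_updates_per_host = 80  # rounded per-host figure for the rest of the sample
observed_ips = 1_200_000       # about 1.2M source addresses overall

updates_per_ip = rfc1918_updates / rfc1918_ips
implied_hosts = rfc1918_updates / typical_updates_per_host
share_of_all = implied_hosts / observed_ips

print(f"updates per RFC1918 address: {updates_per_ip:.1f}")
print(f"implied number of hosts:     {implied_hosts:.0f}")
print(f"share of all addresses:      {share_of_all:.2%}")
```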

Table 3 below shows the RFC1918 IP addresses that sourced the largest numbers of updates over the 3.6 day period of measurement.

Table 3: Most Popular RFC1918 Addresses Seen

IP address	#Updates
------------------------
192.168.0.186     29196
192.168.206.2     14686
192.168.19.6      11294
192.168.0.1       10866
10.0.0.1          10335
10.0.1.1          10298
10.44.72.110      10148
192.168.0.2        9056
10.191.1.2         7842
192.168.50.31      5679
 ...
Addresses like 192.168.0.1 or 10.0.0.1 are popular because they are first in their block. Addresses like 192.168.0.186 are likely to be either a misconfigured host spewing lots of update traffic or a default address assigned by a DSL or cable modem manufacturer. The first address from the 172.16 block is number 27 in the list, and contributes 2632 updates. The first 36 addresses in the list all have counts over 2000 and contribute a total of over 200K updates (36.6% of the total.)

Updates per source IP and source AS

In this section we explore the source of the update attempts and try to classify them by their layer 3 attributes such as IP address, port, network prefix and autonomous system (AS).

One way to tackle this problem is to see whether the machines are in address space allocated to end users or to corporations. Traditionally class B space was allocated to universities and medium sized businesses. Many class B allocations happened before allocations in class C space and the upper half of class A space. Figure 5 below shows the distribution of IP addresses that are the source of update packets. The bands of points correspond loosely to IP address allocation policies.

 refer to hwb address space plot 

Figure 5: IP Addresses Responsible for DNS Update Attempts
RFC1918 reverse zones, Palo Alto, CA

Figure 5 shows the counts of updates originating in IP addresses that have one byte in common (/8s, squares) or two bytes in common (/16s, dots). The largest individual /8 contribution comes from the 24.0.0.0/8 block that is the cable companies' traditional address space. Many newer allocations with first byte between 60 and 68 also belong to broadband end-user connectivity providers. These users have a connection with a large enough bitrate to put multiple computers on an internal network and typically use RFC1918 addresses for numbering their private networks since providers charge extra for real IP addresses. Not being professional system administrators, they are likely to use whatever defaults are provided by the vendors of their operating systems. This suggests an explanation for the prominence of update counts in those blocks.

Classful statistics

Table 3.5 below shows the number of IP addresses in our sample that come from each of the traditional Class A, B, and C ranges.

Table 3.5: Hosts and Updates by Address Class

Class     #IPs   Percent   #Updates   Percent
---------------------------------------------
A       507541    42.2%    47263887    48.2%
B        65177     5.4%     3048519     3.1%
C       631432    52.4%    47787751    48.7%

Total  1204150   100.0%    98100157   100.0%

Class B networks are rarely the source of RFC1918 updates (only 5% of all IP addresses and 3% of all updates come from class B sources).
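
For reference, the traditional classful ranges can be determined from the first byte of an address alone. The helper below is a minimal sketch with a hypothetical name; it ignores special cases inside class A such as 0/8 and 127/8.

```python
def address_class(addr):
    """Classify a dotted-quad IPv4 address by its first byte."""
    first = int(addr.split(".")[0])
    if first < 128:
        return "A"       # leading bit 0
    if first < 192:
        return "B"       # leading bits 10
    if first < 224:
        return "C"       # leading bits 110
    return "other"       # multicast (class D) and reserved (class E) space


for a in ("24.10.1.1", "130.5.6.7", "200.1.2.3"):
    print(a, address_class(a))
```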

Sources, registry geographic data

Recent Internet geography is based on the notion that a part of the world is associated with a particular registry. This may not exactly match geographic boundaries, but does in most cases.

The major established registries are APNIC (Asia-Pacific), RIPE (Europe, Middle East, and the former Soviet Union) and ARIN (the rest). As of mid-2002, NICs (Network Information Centers) for Latin America and Africa are in the process of becoming fully operational.

To analyze updates per registry region, we used the tables of allocated address blocks dated April 1, 2002, available from ARIN (ftp://ftp.arin.net/pub/stats/). We converted all IP addresses that attempted to update hazel to their respective countries of origin and continents. With minor exceptions, the blocks in the tables are unique and those few that are common to two registries have the same country information (for details, see [5].)

Before the existence of RIPE and APNIC, ARIN allocated address blocks to Asian and European countries. We included these with the RIPE and APNIC data. Some IP addresses assigned to companies registered in one country and having equipment in another may be misplaced through the use of registries' tables, but their number is very small. (CAIDA geographic studies [7] [8] are usually done with Netgeo or its commercial counterpart, Ixia's IxMapper, that try to resolve these ambiguities.) However, these tools are not universally available, and using a publicly accessible source for address mappings makes our analysis clearer. We compare the accuracy of the two methods of IP to continent identification in a later section; the registry method is sufficient for our analysis.

Table 4 below shows the number of sources (IP addresses) and the number of attempted updates generated by those sources for each registry area.

Table 4: Hosts and Update Attempts by Continent

Total hosts:   1204150
Total updates: 98100157

Region    #Hosts  Percent   #Updates  Percent
-------------------------------------------------
America   327616   27.2%    49029151   50.0%
Asia      372974   31.0%    25041172   25.5%
Europe    484227   40.2%    22314423   22.7%
Unknown    19345    1.6%     1541059    1.6%

Total    1204162            97925805

Note: the regional rows sum to 1204162 hosts (+12) and 97925805 updates (-174352) relative to the overall totals of 1204150 and 98100157 given above, and the update percentages sum to 99.8%. The missing updates do not correspond to the 548995 updates from RFC1918 sources. These discrepancies are not yet resolved.

Unknown addresses include RFC1918 sources discussed in the previous section and IP addresses not found in registries' lists of allocated blocks. At least one matching IP block is missing from the ARIN table, even though it is present in Whois databases and contains IPs with assigned DNS names.

Figure 6, below, shows different regional patterns of diurnal and weekly variation in the flow of RFC1918 updates. It is a mixture of singular spikes and smooth periodic patterns; the spikes are probably automatically generated, for example by many DHCP leases expiring at the same time, while the smooth swells are more likely human-related events such as turning on the computer at the beginning of the day.

As Figure 6 shows, the spikes coincide with midnight in the time zones of the major centers of Internet user population. We checked two such midnights and found that samples taken in the first 6 minutes after midnight on the US East and West Coasts tend to underrepresent companies located outside the US. For example, Telstra (AS 1221) ranks 4th among update sources in the July 27 nighttime traffic (3% of all IPs), but 11th in the midnight EDT and PDT samples (2% of all IPs). Swiss IpPlus moves from 7th place to 23rd and 19th place, respectively. American and Canadian ISPs move up at the same time, but this shift appears to depend only marginally on whether they serve one or both US coasts, and on which one. The most prominent ASes in both the EDT and PDT samples are Pacific Bell and Southwestern Bell, which are closest to the server at ISC, Telus from Canada (serving the West Coast and Ontario regions) and Cablevision from New York state. ASes such as Megapath (AS 23215), XO (AS 2828), Earthlink and Bell Advanced (Canada) also occupy high positions in both midnight samples. In any event, the midnight spike does not appear to be caused by a few ISPs resetting all their DHCP leases.

Figure 6: Update Attempts per Minute by Registry
RFC1918 in-addr.arpa reverse zones
DNS root server f.root-servers.net

The large spikes of updates occur near midnight in the various time zones where many Internet users are located. We see four in America an hour apart with the east and west coasts dominating the middle of the country, three in Asia and two in Europe - one in Britain and another in the rest of Western Europe.

The smooth patterns closely resemble the weekly life cycles of individuals in the respective countries. These updates appear to occur at times when people turn on their computers. In particular, update activity in Asia and Europe rises much more sharply at the beginning of a business day than in America. This may be caused by the larger number of time zones in America, but may also reflect more uniform daily behaviour of people in Asia and Europe. It remains an open question whether this could also be because people in America have computers at home, while people in Europe and Asia may have access to computers only at work.

The weekend in Europe is characterized by much lower Internet use than on weekdays, whereas the activity pattern in Asia differs little between weekends and weekdays. This may be influenced by countries where Saturday remains a (mandatory or voluntary) working day. A large surge of Asian activity is associated with the onset of Monday's business hours. As this is the first Monday of the month, it may be that vacations in some countries are scheduled on a whole-month basis, although this cannot be asserted without checking that other Mondays lack a comparable spike. We also see two abrupt transients in the European updates on May 29 and early May 30. Those are most likely associated with routing changes that influence which of the anycast servers is closest to the source. BGP routing makes decisions based largely on the length of the AS path to a destination. A change in AS path length, even if it is the result of path prepending (a common practice in traffic engineering), will influence the choice of anycast server at any particular update source.

The frequency of updates per IP source address ranges widely between regions. American sources generate about 150 updates per IP in a week; Asian sources generate 67 and European 46. Assuming that misconfigured DNS servers are equally frequent in different regions, the larger updates-per-IP ratio for America suggests the number of computers on networks behind DHCP servers is larger.
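
The per-region ratios quoted above follow directly from the host and update counts in Table 4, as the short check below shows. The dictionary layout is our own; the numbers are copied from the table.

```python
# (hosts, updates) per registry region, copied from Table 4.
table4 = {
    "America": (327616, 49029151),
    "Asia": (372974, 25041172),
    "Europe": (484227, 22314423),
}

updates_per_ip = {
    region: updates / hosts for region, (hosts, updates) in table4.items()
}

for region, ratio in updates_per_ip.items():
    print(f"{region:8s} {ratio:6.1f} updates per IP")
```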

Sources, IxMapper geographic data

We now compare a source's geographic location determined from the registries' tables with that given by IxMapper. The breakdown for IPs by region according to IxMapper is shown in Table 5:
Table 5: IxMapper -- Update Attempts by Continent

Region          # Hosts  Percent 
-----------------------------------
Europe          453415   37.6%
North America   340285   28.3%
Asia            332321   27.6%
Oceania          31687    2.6%
South America     5497    0.46%
Africa            2076    0.17%
Unknown            164    0.01%
Unresolved       38716    3.2%

Total          1204162
If we use the Registry classifications: Europe, Asia, America and Unknown, to aggregate the IxMapper data we get 30.2% for Asia (Asia + Oceania), 29% for America (North + South America + Africa), and 3.2% unresolved (including RFC1918 source addresses.) The per-region percentage of IP addresses inferred from the registry data shown in Table 4 and that from IxMapper are close within a few percent, except for unresolved addresses. Almost all registered addresses unresolved by IxMapper are in America (167 blocks, 33K addresses). Despite that fact, the IxMapper count of IPs in the Americas and Africa is larger than registries' count by about 20K. There are 14K addresses which are not present in April 01 registries' tables, but which IxMapper locates in Japan (7K), Italy (2.8K), in the US (1.6K) and the rest mostly scattered in Europe. This again points to incompleteness of some of these registry tables.

We used IxMapper to identify the country where the update attempts originated; the top 10 countries are shown in Table 6 below.
Table 6: IxMapper -- Update Attempts by Country

Country       # Hosts  Percent  Cumul.%
---------------------------------------
USA            320981   27.5    27.5
China          147776   12.7    40.2
Japan          126960   10.9    51.1
Switzerland     73630    6.3    57.4
United Kingdom  59595    5.1    62.5
Netherlands     58077    5.0    67.5
Germany         53559    4.6    72.1
Austria         50802    4.4    76.5
Spain           46270    4.0    80.5
France          37432    3.2    83.7
The next 10 countries are Australia, Portugal, Italy, Taiwan, Canada, Hong Kong, United Arab Emirates, South Korea, Poland and Belgium. The top 20 countries account for over 95% of the sources of updates.

Source ports used by updating hosts

The figures in this section are from May 16 data rather than from the two data sets described above.

44.3% of all updates come from the source port range 1024-5000. The upper boundary of this range is a sharp edge: 17 times more updates come from port 5000 than from port 5001. The entropy of the source port distribution is 14.9 bits, close to the maximum of 16 bits that a uniform distribution would reach.
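
The entropy figure can be understood through the standard Shannon formula, H = -sum p_i log2 p_i, taken over the observed source ports. The sketch below is a generic implementation with a hypothetical function name; it is not the code used to obtain the 14.9-bit value.

```python
import math
from collections import Counter


def port_entropy_bits(ports):
    """Shannon entropy, in bits, of the empirical distribution of ports."""
    counts = Counter(ports)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A uniform distribution over all 65536 port numbers reaches the maximum.
uniform = port_entropy_bits(range(65536))
# A source that always uses the same port has zero entropy.
constant = port_entropy_bits([5000] * 100)
print(uniform, constant)
```

An entropy of 14.9 bits corresponds to an effective spread over roughly 2^14.9, or about 30,000, equally likely ports.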

AS contributions

Using the RouteViews BGP tables, we can see that hosts from 3309 different ASes attempted to update the RFC1918 zones. Table 7 shows the 20 ASes that together contribute half of all updates.

Table 7: Top 20 AS sources of RFC1918 updates

  AS#   #Updates   Percent Cumul.%  AS Name, Country
 ---------------------------------------------------
 4134    7329178    7.51    7.51    CHINALINK, China
 3352    6166266    6.32   13.84    Ibernet (TDE), Spain
 7132    4559748    4.67   18.51    SW Bell, US
 5673    3271669    3.35   21.86    Pac Bell, US
 5676    2936073    3.01   24.87    Pac Bell, US
 4813    2765227    2.83   27.71    China Telecom (Guangdong)
 4812    2644362    2.71   30.42    China Telecom (Shanghai)
  852    2176242    2.23   32.65    Telus, Canada
 6128    2083593    2.14   34.79    Cablevision, US
 2828    1855065    1.90   36.69    XO, US
11427    1753091    1.80   38.49    Road Runner, US
 7843    1504131    1.54   40.03    Adelphia, US
 4760    1413921    1.45   41.48    Netvigator, Hong Kong
 2914    1393102    1.43   42.90    Verio, US
 1221    1378306    1.41   44.32    Telstra, AU
11509    1226816    1.26   45.58    Pajo, US
 4436    1142608    1.17   46.75    SantaCruz Community I't, US
11426    1135058    1.16   47.91    Road Runner, US
10994    1129898    1.16   49.07    Time Warner, US
 2548    1091393    1.12   50.19    Business Internet, US
We see that more than half of the updates come from 20 ASes, which is only 0.6% of the 3309 ASes observed. At that aggregation level, RFC1918 update traffic is clearly dominated by elephants. The largest numbers come from the incumbent telecom carriers of the respective regions and from cable companies. Backbone ISPs produce fewer updates. This is not surprising, since these ISPs cater mostly to medium and large business customers, who often have their own AS number, have fewer but larger networks, and use globally unique addresses. Even when these corporations use RFC1918 space, they are more likely to be properly configured. The cable and DSL companies charge for globally unique addresses, which encourages customers to use RFC1918 addresses internally, thus creating more potential for leakage. Countries that joined the Internet relatively late, such as China, have trouble getting enough global address space allocated from the registries.

In terms of the IP addresses of the hosts sending the update requests, the bias is even higher. The 20 top ASes contain over 54% of all IP addresses from which updates were sent. See Table 8.

Table 8: Top 20 ASes Updating RFC1918 Zones, by #Hosts

  AS#  # Hosts  Percent  Cumul.%  AS Name, Country
 ---------------------------------------------------
 4134    74758  6.2262    6.2262  CHINALINK, China
 3352    47647  3.9683   10.195   Ibernet (TDE), Spain
 3303    47379  3.9460   14.141   Swisscom IP-plus, Switzerland
 7132    46445  3.8682   18.009   SW Bell, US
 4713    44828  3.7335   21.742   NTT Communications, JP
 5673    41129  3.4254   25.168   Pac Bell, US
 4813    40379  3.3630   28.531   China Telecom (Guangdong)
 5388    37874  3.1544   31.685   Energis Squared, UK
 8447    33079  2.7550   34.440   TELEKOM-AT, Austria
 3209    26932  2.2430   36.683   Arcor, Germany
 4812    26720  2.2254   38.908   China Telecom (Shanghai)
 1221    26106  2.1742   41.083   Telstra, AU
 5676    25183  2.0974   43.180   Pac Bell, US
 3215    23774  1.9800   45.160   France Telecom, France
 4355    21949  1.8280   46.988   EarthLink, US
 4760    20428  1.7014   48.689   Netvigator, Hong Kong
 8737    18674  1.5553   50.245   Planet Media, Netherlands
 3462    18094  1.5070   51.752   GSA Data Communications, US
 6730    17210  1.4333   53.185   Sunrise, Switzerland
 4732    11125  0.92655  54.112   Dion KDDI Japan
Note that the largest update contributors in terms of number of updates have only 9 ASes in common with the largest contributors in terms of the number of hosts sending the updates.

DNS names

We looked up the DNS names for the source IP addresses found in 8M update packets sent on May 16, 2002.
Note that those names belong to the routed (globally unique) IP addresses from which the updates were sent. The DNS server logs from hazel contain the IP address of the updating host, and the RFC1918 zone that the packet attempts to update, but no details on the update payload.

The following DNS names are present in half (54%) of the 222364 source IP addresses observed on May 16, 2002.

# IPs   Percent  Cumul.%  DNS name
-----------------------------------------------
17375    7.81     7.81    rr.com
14063    6.32    14.14    pacbell.net
12918    5.81    19.95    nombres.ttd.es
 9012    4.05    24.00    swbell.net
 5622    2.53    26.53    optonline.net
 4794    2.16    28.68    interbusiness.it
 4140    1.86    30.55    netvigator.com
 4081    1.84    32.38    tin.it
 3878    1.74    34.13    pol.co.uk
 3850    1.73    35.86    highway.telekom.at
 3675    1.65    37.51    bigpond.net.au
 3559    1.60    39.11    mindspring.com
 3548    1.60    40.71    libero.it
 3405    1.53    42.24    adelphia.net
 2785    1.25    43.49    arcor-ip.net
 2627    1.18    44.67    attbi.com
 2562    1.15    45.82    rima-tde.net
 2435    1.10    46.92    dialup.online.no
 2294    1.03    47.95    dial.wxs.nl
 1983    0.89    48.84    telus.net
 1978    0.89    49.73    turboline.skynet.be
 1864    0.84    50.57    snet.net
 1782    0.80    51.37    megapath.net
 1716    0.77    52.14    shawcable.net
 1657    0.75    52.89    dsl.net
 1613    0.73    53.61    direcpc.com
 1488    0.67    54.28    cox.net
Again, as in the AS contribution analysis, relatively few (23) second-level domain names account for more than half of the hosts originating updates.

The webpages of these organizations reveal that they are almost exclusively cable and DSL providers.

In addition, DNS names containing one of the words: catv, cable, client, cust, dial, direc, dsl, host, hsia ("high-speed Internet access"), nat, online, pool, port, are present in 113847 (51.2%) of the DNS names of hosts attempting to update the RFC1918 zones.

We examined all 222K pairs of source IP addresses and corresponding DNS names and found that the full DNS name quite often contains a numeric IP address in decimal notation. The values of individual bytes are usually connected by dashes. Hex addresses are also used, albeit less often.

More than 60% of all DNS names in the data contain 7 or more digits. When dots and dashes are viewed as field separators, 98776 or 44% of the names contain at least 4 fields of digits, and 114333 or 51.4% of the names contain at least two fields that consist of digits only. This indicates that many, if not most, of the DNS names present in RFC1918 updates are generated automatically from IP addresses, or from internal customer IDs.
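The digit and field counts described above can be sketched as follows. This is an illustration only: the function name `digit_stats` and the sample hostname are hypothetical, and the exact field rules used in the original analysis may differ.

```python
import re

def digit_stats(name):
    """Return (total digits, number of all-digit fields) for a DNS name.

    Dots and dashes are treated as field separators, as in the text.
    """
    fields = [f for f in re.split(r"[.-]", name) if f]
    n_digits = sum(ch.isdigit() for ch in name)
    n_numeric_fields = sum(f.isdigit() for f in fields)
    return n_digits, n_numeric_fields

# Hypothetical name embedding an address in dashed decimal notation.
print(digit_stats("adsl-192-0-2-17.dsl.example.net"))   # (7, 4)
```

A name like the one above would count toward both the "7 or more digits" group and the "at least 4 fields of digits" group.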

OS breakdown

We used Ofir Arkin's fingerprinting utility Xprobe (www.sys-security.com/) on a list of 413 IP addresses collected on 2002-07-12. IP addresses were probed as soon as they appeared in the log. The OS breakdown returned by Xprobe follows.

106:  No response
 77:  Unknown, not Microsoft Windows
 56:  Windows 2k. SP1, SP2/Windows XP
 47:  Windows Based.  Open/Net/FreeBSD/DG-UX/HP-UX 10.x etc
 33:  Novell (FreeBSD 4.3-current(?))
 31:  Ultrix!HPUX 10.20(?)
 16:  3Com SuperStack II Switch SWNBBSI-CF,11.1.0.00S38 | Nokia IPSO
      +3.2-2.3.1 releng 783-849 | Ricoh Aficio AP4500 Network Laster Printer |
      +Linux 2.0.x/2.2.x/2.4.x | Shiva AccessPort Bridge/Router Software V.2.1.0
 11:  OpenBSD 2.4-2.5!NetBSD 1.5, 1.4.1, 1.4
 10:  AIX
  5:  Windows NTsp4+
  4:  Windows 95
  4:  Linux 2.2.x/2.4.5+ kernel
  3:  Cisco IOS 11.x-12.x
  2:  Little endian BSDI/NetBSD 1.1.x-1.2.x! MacOS X 1.0-1.2
  2:  HPUX 10.x
  1:  Windows ME
  1:  Unknown Unix (Accuracy dropped) or MacOS X
  1:  ULTRIX
  1:  NetBSD
  1:  Linux kernel 2.2.x! 2.4.x! assumed.
  1:  IBM OS/390
413   Total
Although we could not resolve the ambiguity in the fourth largest count (Windows Based. Open/Net/FreeBSD/DG-UX/HP-UX 10.x etc), it appears that there is no dominant operating system in the set. Note that IP addresses in this sample were not weighted by their number of updates. However, when we applied such weighting in previous experiments, we got a qualitatively similar picture. Machines that did not respond were presumably on xDSL or cable modem connections and had simply been turned off.

Microsoft Windows based platforms were recognized in 66 instances out of 413 (16%). If in addition we assume that about half of the "Windows-Unix" group is in fact Windows, their number increases to 90, or 22%. Furthermore, if the statistics of the non-responding hosts are similar to those of the responding ones, dividing the assumed number of Windows hosts by the number of responses (307) gives about 30% Windows boxes. This gives a rough idea of how many Windows boxes sent these updates. Unix boxes make up a comparable number of systems, with the same (factor of 2) degree of uncertainty. Notably missing from the list are Apple systems, but through Mac OS 10.1 they do not do dynamic DNS updates at all.
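The three estimates in this paragraph follow from the counts in the Xprobe breakdown. The short sketch below only retraces that arithmetic; the function name is ours, and the "half of the ambiguous group" split is the assumption stated in the text.

```python
def windows_share_estimates(total=413, no_response=106,
                            windows=56 + 5 + 4 + 1, ambiguous=47):
    """Return the three Windows-share estimates (in percent) discussed above."""
    recognized = 100 * windows / total
    assumed = windows + round(ambiguous / 2)   # assume half the ambiguous group
    with_ambiguous = 100 * assumed / total
    among_responders = 100 * assumed / (total - no_response)
    return recognized, with_ambiguous, among_responders

labels = ("recognized", "plus half of ambiguous", "among responders")
for label, value in zip(labels, windows_share_estimates()):
    print(f"{label}: {value:.1f}%")
```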

Our operating system fingerprinting efforts did not yield a very coherent picture of the sources of DNS update attempts. In section xxx, we describe a laboratory test network that we built to try to understand the sources and regularity of update attempts.

Analysis of source contribution sizes

Data analysis often begins with finding outlying instances among source contributions. In our case, these would be the most badly broken hosts. However, the actual contribution of these fringe sources may or may not be a decisive part of the total volume of data. It is therefore necessary to distinguish between hosts that individually contribute large amounts of data (elephants), hosts that generate small amounts of data but are present in extremely large numbers (mice), and hosts in between (workhorses). We devised a simple presentation technique that caters to all three groups. We estimate the relative importance of each group by separating hosts' contributions to the total traffic into intervals (bins) whose boundaries are successive powers of 2. We then evaluate the number of hosts and amount of traffic in each bin.
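The binning by powers of 2 can be sketched as follows. This is a minimal illustration under the assumption that per-source update counts are already available as positive integers; the function name and the sample counts are hypothetical.

```python
from collections import defaultdict

def bin_by_powers_of_two(updates_per_source):
    """Group sources into bins [2**k, 2**(k+1)) by their update count.

    Returns {k: (number of sources, total updates)} for each non-empty bin.
    """
    bins = defaultdict(lambda: [0, 0])
    for count in updates_per_source:
        if count < 1:
            continue                      # sources with no updates are ignored
        k = count.bit_length() - 1        # floor(log2(count)) for positive ints
        bins[k][0] += 1
        bins[k][1] += count
    return {k: tuple(v) for k, v in sorted(bins.items())}

# Hypothetical per-source counts: several mice and one heavy source.
print(bin_by_powers_of_two([1, 1, 1, 2, 3, 5, 1000]))
# {0: (3, 3), 1: (2, 5), 2: (1, 5), 9: (1, 1000)}
```

Comparing the two numbers in each bin shows whether many small sources or a few large ones carry the traffic.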

Figure 7 below shows the orders of magnitude of the contributions of elephants, mice and workhorses. The dashed vertical lines mark the middle of each distribution, that is, half the sources (or updates) lie to the left of the line and half to the right.

Summarizing, for the weekly update log from May 28 to June 4, 2002:

The number of sources that amass 1/2 of all updates is relatively high (1.6% or about 20K). The average is 81.5 updates per source. 95% of the updates come from 20% of the sources, which is a relatively large share of sources compared to other cases of Internet traffic volume disparity. This shows that the sample is dominated by midrange contributions (1000 being the square root of 1 million) -- workhorses, not elephants.
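Concentration figures of this kind (the fraction of sources that accounts for a given share of updates) can be obtained by sorting sources by contribution and accumulating. The sketch below is illustrative; the function name and the sample counts are hypothetical.

```python
def fraction_of_sources_for_share(updates_per_source, share):
    """Smallest fraction of sources (largest first) covering `share` of updates."""
    counts = sorted(updates_per_source, reverse=True)
    target = share * sum(counts)
    running = 0
    for n, c in enumerate(counts, start=1):
        running += c
        if running >= target:
            return n / len(counts)
    return 1.0

# Hypothetical counts for eight sources (200 updates in total).
counts = [100, 50, 10, 10, 10, 10, 5, 5]
print(fraction_of_sources_for_share(counts, 0.5))    # 0.125
print(fraction_of_sources_for_share(counts, 0.95))   # 0.75
```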

Interarrival Times

The knowledge of interarrival times is important for the purpose of modeling the flow of DNS updates and the flow of requests to nameservers in general. A well developed mathematical theory [Kleinrock] deals with Poisson processes whose interarrival time distribution is exponential. The equations for Poisson processes admit simple analytic solutions for expected queue size and service time. We will show the distribution of interarrival times is for the most part exponential ***.
*** Footnote: In the general scientific context, exponential interarrival distributions represent the simplest model for a flow of events that occur independently, at random, and with a constant average arrival rate. *** end of footnote ***
Cases in which the distribution significantly deviates from exponential are rare. They occur when large gaps between requests are present with higher frequency. Figure 8 shows the distribution of interarrival times of one million updates between 6:28 and 8:12 AM on Saturday, June 1, 2002. The distribution is very close to exp(-x/6.5), with x in milliseconds, which translates into an average interarrival time of 6.5 milliseconds, or roughly 154 updates per second.
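The relation between the mean interarrival time and the update rate, and the exponential model itself, can be checked on synthetic data. The sketch below generates Poisson arrivals with a 6.5 ms mean gap (synthetic, not the measured log) and recovers the mean; function names are ours.

```python
import math
import random

def mean_interarrival_ms(timestamps_ms):
    """Mean gap between consecutive timestamps (milliseconds)."""
    ts = sorted(timestamps_ms)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return sum(gaps) / len(gaps)

def exp_ccdf(x_ms, mean_ms):
    """P(gap > x) under the exponential model with the given mean."""
    return math.exp(-x_ms / mean_ms)

# Synthetic Poisson arrivals with a 6.5 ms mean gap (not measured data).
random.seed(1)
t, stamps = 0.0, []
for _ in range(100_000):
    t += random.expovariate(1 / 6.5)
    stamps.append(t)

mean = mean_interarrival_ms(stamps)
print(f"mean gap {mean:.2f} ms, rate {1000 / mean:.0f} updates/s")
print(f"model P(gap > 70 ms) = {exp_ccdf(70, mean):.2e}")
```

Under this model, gaps above 70 ms should be extremely rare, which is why the deviations at long interarrival times stand out in the measured distributions.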

The distribution shown in Figure 8 deviates from exponential for very small interarrival times: the probability of packets having 0 or 1 ms interarrival time is less than that predicted by the exponential model. There are also a few longer interarrival times in the range of 100 ms. Figure 9 below compares 21 interarrival time distributions for measurements taken at approximately 8 hour intervals. Most of the distributions are very close to exponential, with only one deviating significantly in the range of interarrival times exceeding 70 ms.


The distribution of interarrival times for the larger dataset (26 days) is very close to an exponential when the interval is less than 0.1 sec. It crosses over to a power function in the range of larger times. The largest interval we saw in 26 days is 64 sec.

Periodic updates

The ideal random mix of interarrival times described above might lead to the conclusion that the individual sources generate updates at random as well. It comes as a surprise that many of the updates exhibit regular periodic patterns. We have already discussed the beginning-of-the-hour spikes positioned at midnight for the respective time zones. Yet another set of patterns arises from updates with periods that are multiples of 75 minutes or of 1 hour, and from updates with periods under 1 minute.

To see how many of the update sources are periodic, we analyzed the average update rates for sources present in the 26-day July dataset. An average update rate is the number of updates from a given source minus one, divided by the timestamp difference between the last and first update in the series.

Figure 10 shows the density of updates vs. the update rate with a resolution of 20 bins per decade (a factor of 1.122 between successive bin boundaries). We took only sources whose update series lasted longer than 1 hour. This resulted in the removal of 882,633 sources (1,582,417 updates) leaving 1.45M hosts with 302M updates over 26 days in July 2002. The solid line is updates and the dashed line source IP addresses.
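The average update rate and the logarithmic binning (20 bins per decade) can be sketched as follows. This is an illustration under the definitions given above; function names are ours and the example timestamps are synthetic.

```python
import math

def average_rate_per_hour(timestamps_s):
    """(number of updates - 1) / (last - first timestamp), in updates per hour."""
    ts = sorted(timestamps_s)
    if len(ts) < 2 or ts[-1] == ts[0]:
        return None
    return (len(ts) - 1) / ((ts[-1] - ts[0]) / 3600.0)

def rate_bin(rate, bins_per_decade=20):
    """Index k of the bin [10**(k/20), 10**((k+1)/20)) that contains `rate`."""
    return math.floor(bins_per_decade * math.log10(rate))

# Synthetic source repeating gaps of 5, 10 and 60 minutes (two full cycles).
minutes = [0, 5, 15, 75, 80, 90, 150]
rate = average_rate_per_hour([60 * m for m in minutes])
k = rate_bin(rate)
print(rate, k, 10 ** (k / 20), 10 ** ((k + 1) / 20))
```

For this synthetic source the rate is 2.4 updates per hour, which falls in bin 7, that is, between about 2.24 and 2.51 updates per hour.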

The two large spikes represent periods of 60 minutes and 75 minutes. Five percent of all updates come from sources with average update rates in the range 1-1.122 per hour (60 minute cycle) and 8% from sources with 2.24-2.51 updates per hour. This 8% actually matches a cycle of 3 updates in 75 minutes. The next noticeable spike is at twice this rate. It is most likely caused by networks with two hosts in RFC1918 space, for which 6 updates are generated in 75 min.

As the dashed line shows, most of the IPs are sending updates at much lower rates; half of the IP addresses are sending at a rate of 0.09 updates per hour or less. However, half of the updates come at rates of 5 or less per hour. The rates of 1 per hour and 3 per 75 min. account for 6.43 and 3.53% of all observed sources. Neither of these numbers, however, reveals how strict or loose the periodicity is, nor the spacing of updates within a period.

It is difficult to find the precise period of updates because every now and then an update is missing from the series, either because a host is switched off, a DNS packet is lost in the network, or some activity on the source network interferes with updates. Furthermore, an extra sequence of updates often becomes interleaved into the series because another host becomes active on the private network. For that reason we could not use the Fourier transform on update arrivals to extract a period. The lack of coherence in update arrival times would defeat the amplifying properties of the transform. We tested two approaches to finding the update period. Both of them evaluate a binary autocorrelation function. By determining the lag (shift) at which the autocorrelation is maximal, we find how many updates constitute a period. We then recover the actual (temporal) period from the original interarrival times.

In the first method, we sorted each logfile*** by the IP address of the source of the update packet, and only used sources with 15 or more updates.

*** Footnote: An update logfile usually covers about 8 hours and contains up to 5M updates. *** end of footnote ***
We then computed sequences of update interarrival times for each source, rounding them to whole minutes. We counted how many of these rounded values coincide with the corresponding values of the same sequence shifted by 1 to 9 positions. Sources for which some shift matched more than 90% of the rounded inter-update times were classified as periodic. The sum of the minute counts over the lag (shift length) was taken as their period.
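The first method can be sketched as below. This is our reading of the description, not the original code: we assume that the match fraction is taken over the overlapping positions of the original and shifted sequences, and that the smallest lag passing the threshold is used. Function names and sample data are hypothetical.

```python
def periodic_by_minutes(timestamps_s, max_lag=9, threshold=0.9, min_updates=15):
    """Return the period in minutes, or None if the source is not periodic."""
    ts = sorted(timestamps_s)
    if len(ts) < min_updates:
        return None
    # Interarrival times rounded to whole minutes.
    gaps = [round((b - a) / 60) for a, b in zip(ts, ts[1:])]
    for lag in range(1, max_lag + 1):
        pairs = len(gaps) - lag
        if pairs <= 0:
            break
        matches = sum(gaps[i] == gaps[i + lag] for i in range(pairs))
        if matches > threshold * pairs:
            return sum(gaps[:lag])        # sum of minute counts over one lag
    return None

def make_series(gaps_min, repeats):
    """Synthetic timestamps (seconds) repeating the given gaps in minutes."""
    t, out = 0, [0]
    for _ in range(repeats):
        for g in gaps_min:
            t += 60 * g
            out.append(t)
    return out

print(periodic_by_minutes(make_series([5, 10, 60], 5)))   # 75
print(periodic_by_minutes(make_series([60], 15)))         # 60
```

A single missing update changes one rounded gap and shifts the alignment of those after it, which is one reason this method loses periodic sources on longer logs.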

Figure 11 above presents a histogram obtained by that method. In that example, we used a 7.5 hour logfile from early Wednesday, May 29, 2002 that contained 4.67M updates and 240K source IPs. Of those, 78K sent 15 or more updates over the duration of the log, of which 32K (40%) were found to be periodic. Among the periodic sources, 2001 (6%) have a period of 60 minutes, 22333 (70%) a period of 75 minutes and 10% a period of 76 minutes.

In the whole set of 21 logfiles, 38-56% of the sources were periodic. Of these, 6-12% were 60 minute periods, 64-70% 75 minute periods, and 1% the 76 minute period.

This approach discovers a smaller percentage of periodic sources when run over the whole one-week dataset. 314996 sources were found to have 15 or more updates; out of those, 86580 (27.5%) are identified as having one period. Among those IPs, 32456 (38%) have a period of 60 minutes, 37575 (43%) 75 minutes, and 5503 (6.4%) 76 minutes. The significant drop in the fraction of 75 minute periods is most likely caused by occasional missing updates and/or rounding errors in converting interarrival times to whole minutes, which destroy the periodicity of the minute counts.

As a remedy against these variations, we chose to use a more robust algorithm, which finds the fraction of periodic updates from one source as follows:

0) Take all sources with ten or more updates.

1) Take the sequence of interarrival times expressed as integers in milliseconds.

2) Convert them to logarithms base two truncated to integer parts ***.
*** Footnote: We add 1 to the truncated integers to disambiguate them from 0, which represents 0 milliseconds. *** end of footnote ***

3) For each shift of the update sequence by 1, 2, ..., 30 updates, count the number of positions in which truncated logarithms in the original and shifted sequences are equal.

4) Find the lag (shift) at which this overlap is maximal. Discard the source if the maximal count is less than 10% of all its updates.

5) Find the longest contiguous stretch in which every entry equals its shifted counterpart.

6) Extract the interarrival times from the beginning of this longest stretch. Take the sum of these interarrival times as the period.
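The steps above can be sketched as follows. This is an illustrative reading of the algorithm, not the original implementation: ties between lags are broken in favour of the smallest lag, and timestamps are assumed to be integers in milliseconds. For a positive integer x, Python's `x.bit_length()` equals the truncated base-two logarithm plus one, and it is 0 for x = 0, which matches the encoding described in the footnote.

```python
def find_period_ms(timestamps_ms, max_lag=30, min_updates=10, min_fraction=0.10):
    """Return the estimated period in milliseconds, or None if discarded."""
    ts = sorted(timestamps_ms)
    if len(ts) < min_updates:                              # step 0
        return None
    gaps = [b - a for a, b in zip(ts, ts[1:])]             # step 1
    codes = [g.bit_length() for g in gaps]                 # step 2
    best_lag, best_count = None, -1
    for lag in range(1, min(max_lag, len(codes) - 1) + 1): # step 3
        count = sum(codes[i] == codes[i + lag]
                    for i in range(len(codes) - lag))
        if count > best_count:                             # step 4: best lag
            best_lag, best_count = lag, count
    if best_lag is None or best_count < min_fraction * len(ts):
        return None
    # Step 5: longest contiguous run of positions matching at best_lag.
    best_start, best_len, run_start, run_len = 0, 0, 0, 0
    for i in range(len(codes) - best_lag):
        if codes[i] == codes[i + best_lag]:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    # Step 6: sum one lag's worth of gaps from the start of that run.
    return sum(gaps[best_start:best_start + best_lag])

# Synthetic source: gaps of 5, 10 and 60 minutes with small jitter (ms).
base = [300_000, 600_000, 3_600_000] * 5
stamps, t = [0], 0
for i, g in enumerate(base):
    t += g + (i % 7) * 100
    stamps.append(t)
print(find_period_ms(stamps) / 60_000)   # close to 75 minutes
```

Because only the binary logarithms are compared, the jitter of up to a few hundred milliseconds in this example does not break the match, whereas an exact comparison of the gaps would.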

While this seems involved, it was the only method we found that worked well. The problem is that update data contain interleaved sequences sent on behalf of several local hosts that can join and leave the private network at arbitrary times. This, together with the occasional missing or extra updates, requires a very robust algorithm. Clock skew in the source hosts also contributes to the noise that must be filtered out. That is why we chose to match binary logarithms of the data rather than numeric values, and relaxed the threshold condition for a source's periodicity (matching 10% of the updates as opposed to 90% in the first algorithm).

Figure 12 below shows the number of source IP addresses that send a significant fraction of their updates at periodic intervals, and the number of updates produced by these sources. 360710 source IP addresses were included in the analysis; each source contributed at least 10 updates. The largest observed period was 75 hours.

The pattern of the 75-minute update cycle is especially revealing. It usually involves three updates, made at intervals of 5, 10, and 60 minutes. The most likely cause is that an attempted update (at "0" minutes) is repeated after timeout of 5 minutes and then again after doubling the timeout to 10 minutes, after which the system falls back to a default of 60 minutes.

There is also a strong spike which represents a simpler nameserver behavior in which updates are sent strictly at 60 minute intervals. The most frequent periods and their prevalence are shown in Table 8.5 below.

Table 8.5: Update periods and their prevalence

Period       % Sources         % Updates
----------------------------------------
 0 minutes        8%                 24%
60 minutes       24%                 14%
75 minutes       34%                 28%
76 minutes        8%                  5%
These and nearby periods account for 3/4 of the sources and updates.

The largest contributions to the computation of periodicity are shown in Table 9 below. The first line is the 3 part period of 75 minutes; the second is the single 60 minute period. Later lines are the smaller spikes in Figure 12, which correspond to multiple computers on the private network with 75 minute periods. Note that 82% of the updates are in sequences with periods listed in the table.

Table 9: Update Contributions Determining Periodicity

Updates   Update      Percent  Cumulative
/Period   Count       of Data  Percent
-----------------------------------------
3       2.44132e+07   25.61    25.61
1       2.10346e+07   22.07    47.68
2       8.76354e+06    9.19    56.88
6       8.56021e+06    8.98    65.86
9       4.15423e+06    4.36    70.21
12      3.86912e+06    4.06    74.27
18      3.07479e+06    3.23    77.50
4       3.02154e+06    3.17    80.67
15      1.91918e+06    2.01    82.68
We can also see the periodicity if we look at update rate relative to the port spectrum. In Figure 13 we plot the update rate vs. the largest port number used by a particular host and indicate with the plot symbol the number of updates contributing to any update rate and port range.

Notice the black bands with update rate between 1 and 10 per hour and between 1 and 50. The first corresponds to a TCP stack that uses ports up to 5000 and the second a stack that uses the full port range.


Test Laboratory

Because of the numbers of hosts involved, and because they are predominantly home or small business computers connected via xDSL or cable modem, we suspected that more of the update traffic came from Windows boxes than was indicated by the fingerprinting program Xprobe. To understand the situation better, we set up a laboratory experiment with freshly installed Windows boxes running the Win2k desktop, Win2k server, and WinXP operating systems. We also had machines at various patch levels and had one of the systems configured as an Active Directory server. We captured packet traces for all traffic on this test network over a period of several days. The packet traces showed the xxx machine sending DNS update packets to the nameserver configured in DHCP at regular intervals. The update sequence was periodic, with bursts of packets on 5, 10, 60 and 75 minute boundaries, as seen in the data arriving at hazel, the RFC1918 authoritative server. Several back-to-back packets were sent at each period. The source port ranges used by the test network machines matched the peaks in the 1025-5000 range of Figure xxx and would correspond to the dark band at port 5000 in Figure 13.

We also examined packet traces between the campus Active Directory server and its primary DNS nameserver. These showed xxx. The packet traces clearly showed that Microsoft Windows boxes were sending regular periodic DNS dynamic updates. However, we have not shown that other operating systems do not do it too. To check this, we built another test network that included several Unix/Linux operating systems and used DHCP to address them. The resulting packet traces showed xxx.

Conclusions

The prevalence of RFC1918 DNS updates is a sign of widespread misconfiguration of nameserver and DHCP software. Precisely periodic update attempts are much more likely to come from software errors or misconfiguration than from human actions. The majority of updates come from hosts with periods of 0, 60 and 75 minutes. These periods were confirmed by several independent algorithms (average update rate, discrete autocorrelation) and for both host and update counts. They were also observed on a test network running Win2K in a default configuration.

An overwhelming majority of the hosts that are trying to update RFC1918 zones at the AS112 server are connected to the Internet via DSL and cable modem providers. Since these companies serve almost exclusively home-based users and (to a lesser extent) small business customers, we conclude that the bulk of RFC1918 updates originate in home office and small business environments. This is further corroborated by the diurnal and weekly variation in the flow of updates, by the prevalence of personal operating systems (such as Windows and Linux), and by the generally small numbers of updates contained in one update period (for each source IP), reflecting the small number of hosts on local LANs getting their addresses from the same DHCP server.

We found that the process of update arrivals has three specific timescales. On the timescale of milliseconds, the interarrival time of all updates is close to an exponential distribution, with an average of 6.5 ms for the May-June data and 8.5 ms for the July data. On the timescale of minutes, individual sources display periodic behavior, with dominant interarrival times of 5 min, 10 min, 30 min and 1 hour. Finally, on the timescale of hours, updates from hosts in different time zones increase by a factor of four over the 6 min. intervals immediately following midnight local time; the most prominent of these spikes can be identified with time zones in the US, Western Europe and Pacific Asia.


Acknowledgements

Many thanks to Paul Vixie, who initiated the AS112 project and provided CAIDA with this data set, and to Peter Losher of Internet Software Consortium (isc.org), whose help was instrumental in handling the technical matters. We benefited greatly from discussions with Brian Kantor of UCSD Networking Services, and with CAIDA elves. Public availability of Xprobe by Ofir Arkin greatly simplified the task of OS fingerprinting. Ken Keys' Xprobe redesign and execution, and Ryan King's help with Xprobe data collection and analysis, are also highly appreciated. Thanks to Tom Guptill and other system administration personnel at SDSC for setting up the Windows testbed, which confirmed that most of the updates come from default settings in Windows machines. Thanks to Piet Barber for helping us understand the dynamics of the updates at the Verisign AS112 nameservers. Finally, we want to thank Nevil Brownlee of CAIDA and University of Auckland, New Zealand, for presenting a preliminary version of this work at IETF 54 in Yokohama in July 2002, and participants of the NANOG mailing list for two discussions of then-current versions of this work in July and September 2002.

References

[1] IEPG meeting - July 2002. http://www.potaroo.net/iepg/july2002/

[2] Nevil Brownlee, kc claffy, and Evi Nemeth. DNS Root/gTLD Performance Measurements. Usenix LISA, 2001.

[3] Nevil Brownlee, kc claffy, and Evi Nemeth. DNS Measurements at a Root Server. Globecom 2001.

[4] Marina Fomenkov, kc claffy, Bradley Huffaker, and David Moore. Macroscopic Internet Topology and Performance Measurements From the DNS Root Name Servers. Usenix LISA, 2001.

[5] Andre Broido, kc claffy. Inter-domain routing evolution - Episode II: Dark Space. ARIN IX, Apr. 2002. http://www.caida.org/outreach/presentations/

[6] Brian Kantor, UCSD Network Services, private communication, July 4, 2002.

[7] Bradley Huffaker. Skitter daily summaries. http://www.caida.org/cgi-bin/skitter_summary/main.pl

[8] Bradley Huffaker, Daniel Plummer, David Moore, and kc claffy. Topology discovery by active probing. http://www.caida.org/outreach/papers/2002/SkitterOverview/

[9] Route Views archive. http://archive.routeviews.org/

[10] Andre Broido, Evi Nemeth, kc claffy. Packet arrivals on rate-limited Internet links. CAIDA, Nov.2000 http://www.caida.org/~broido/coral/packarr.html

[11] Constantinos Dovrolis, M.Jain. Bandwidth estimation, 2001.

[12] Dina Katabi, Charles Blake. Inferring congestion sharing and link characteristics from packet interarrival times. MIT LCS Technical Report, 2001.

[13] Mark Coates, Alfred Hero, Robert Nowak, Bin Yu. Internet tomography. IEEE Signal Processing Magazine, May 2002, vol.19, No.3, 47-65.

[NANOG] Discussion of RFC1918 updates. NANOG mailing list, April 2002. www.irbs.net/internet/nanog/0204/0450.html.