The traces used for this study were collected from the NASA Ames Internet
Exchange (AIX) in Mountain View, CA [AIX] as part of an NSF/NASA
collaborative effort with NLANR/MOAT. They were collected from one of
four (now five) OC-3 ATM links that interconnect AIX and MAE-West in San
Jose, CA.
Fig 1: A diagram showing the location of the optical splitter used to collect
the data. Note that there are currently five links between NASA-Ames and
MAE-West. Thanks to Hans-Werner Braun and NLANR/MOAT for use of this figure.
This group of links forms a striped connection between two DEC Gigaswitches,
with an aggregate bandwidth approximately equal to that of a single OC-12
link. The Gigaswitches use a proprietary scheduling algorithm for sending
packets across this connection, but each packet is sent across an individual
link inside an AAL5 PDU. This means the scheduling inside the Gigaswitches
happens at the packet level, since all cells from a PDU are sent over the
same link.
Consequently, the data we collect from this site is essentially a sub-sample
of the traffic traversing the striped connection, selected by the proprietary
scheduling algorithm inside the Gigaswitches. This algorithm is approximately
round-robin, but also depends on internal load characteristics in the
Gigaswitch switching fabric. In particular, we know that the distribution of
packets among the OC-3 links is not entirely uniform, since measurements of
the link utilizations show that two of them carry approximately twice the
traffic of the other two (measured by byte volume, not packet volume)
[Feldman98].
We assume that the Gigaswitch scheduling algorithm is independent of the
encapsulated protocol, e.g., that it does not depend on packet length.
Because of this complex scheduling algorithm, we are not able to accurately
estimate the number of conversations traversing the monitored links, or the
length of these conversations in packets or bytes. Consequently, we
characterize the workload observed at AIX only in terms of relative fractions
of packets and bytes.
The data collection system is similar to the one used in [Thompson97]. A
Coral/OC3mon platform was connected to one link in each direction using
optical splitters. The traces we studied were collected as part of
NLANR/MOAT's Network Analysis Infrastructure (NAI) project [Braun98].
For each packet that passes the monitor, only the first ATM cell from the
AAL5 PDU is captured and written to disk. The first cell contains the first
40 bytes of each packet, which is usually enough to extract the TCP or UDP
port numbers from the transport layer headers. However, the monitor does not
verify that the entire AAL5 PDU is carried by the link, and so estimates of
the data rate carried by the link may be inflated in the presence of cell
loss.
Six to eight traces were collected each day, usually with a duration of 90
seconds each. The nominal starting times were spaced at equal intervals
across the 24-hour period, and each actual starting time was randomized over
a range of an hour at the beginning of its interval.
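The sampling schedule described above can be illustrated with a minimal sketch. It assumes the day is divided into equal intervals and that the random offset is drawn uniformly from the first hour of each interval; the uniform distribution and the function name `trace_start_times` are assumptions made for illustration, not details taken from the collection software.

```python
import random

SECONDS_PER_DAY = 24 * 3600

def trace_start_times(traces_per_day, jitter_seconds=3600, rng=None):
    """Return one start offset (seconds since midnight) per trace.

    Each trace starts at the beginning of its interval plus a random
    offset of at most `jitter_seconds` (assumed uniform).
    """
    rng = rng or random.Random()
    interval = SECONDS_PER_DAY / traces_per_day
    starts = []
    for i in range(traces_per_day):
        base = i * interval                      # start of the i-th interval
        starts.append(base + rng.uniform(0, jitter_seconds))
    return starts

# Example: eight traces per day gives one trace per three-hour interval.
schedule = trace_start_times(8, rng=random.Random(0))
```

With eight traces per day, each start time falls somewhere in the first hour of a three-hour interval, so consecutive traces are never closer than two hours apart.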
After collection, the traces are processed to remove any information that
might compromise the privacy of the individuals generating the traffic.
This processing masks the source and destination IP addresses, and deletes
all data from the IP payload except for a TCP or UDP header (if present), or
the ICMP or IGMP type and code fields. If the packet carries enough bytes of
IP header options, then the TCP or UDP port numbers may not be present in the
first cell of the PDU. In this case, we ignore that packet in subsequent
application workload analysis. Since the fraction of packets with IP header
options is typically less than 0.003%, this does not seriously affect our
measurements of the traffic fraction generated by the most popular TCP and
UDP applications.
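The condition under which the port numbers are lost can be made concrete. The sketch below assumes that exactly 40 bytes of each IP packet are available, as stated above, and relies on the fact that the source and destination ports occupy the first four bytes of both the TCP and UDP headers. The function name `ports_in_first_cell` is hypothetical and is not part of the collection software.

```python
CAPTURED_BYTES = 40  # bytes of each packet held in the first cell, per the text

def ports_in_first_cell(ihl_words, captured=CAPTURED_BYTES):
    """Return True if the 4 bytes of TCP/UDP port numbers fit in the capture.

    `ihl_words` is the IPv4 header length field, counted in 32-bit words
    (5 means no options, 15 is the maximum).
    """
    if not 5 <= ihl_words <= 15:
        raise ValueError("IHL must be between 5 and 15 32-bit words")
    ip_header_len = ihl_words * 4
    # Ports occupy bytes [ip_header_len, ip_header_len + 4) of the packet.
    return ip_header_len + 4 <= captured
```

Under these assumptions, a header with up to 16 bytes of options (IHL of 9) still exposes the ports, while 20 or more bytes of options (IHL of 10 or higher) pushes them out of the captured cell.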
We used CoralReef [CoralReef] to reduce each raw trace to a set of summary
tables that we archived for later analysis. The tables
include aggregate numbers such as the number of packets and bytes in the
trace as well as distributions of packet lengths and
the number of packets and bytes seen for each IP-layer protocol.
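As an illustration of the kind of per-protocol summary described here, the following sketch computes the fraction of packets and bytes for each IP-layer protocol. It does not use the CoralReef API; the input format (an iterable of protocol number and IP length pairs) and the name `protocol_fractions` are assumptions chosen for the example.

```python
from collections import Counter

def protocol_fractions(packets):
    """Map each IP protocol number to (packet fraction, byte fraction).

    `packets` is an iterable of (ip_protocol_number, ip_length_bytes)
    pairs, a hypothetical stand-in for a decoded trace.
    """
    pkt_counts, byte_counts = Counter(), Counter()
    for proto, length in packets:
        pkt_counts[proto] += 1
        byte_counts[proto] += length
    total_pkts = sum(pkt_counts.values())
    total_bytes = sum(byte_counts.values())
    return {
        proto: (pkt_counts[proto] / total_pkts,
                byte_counts[proto] / total_bytes)
        for proto in pkt_counts
    }

# Example: two TCP packets (protocol 6) and one UDP packet (protocol 17).
sample = [(6, 1500), (6, 500), (17, 2000)]
fractions = protocol_fractions(sample)
```

In this example TCP accounts for two thirds of the packets but only half of the bytes, which shows why packet and byte fractions are reported separately.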
For TCP and UDP, we analyze application usage using pairs of source and
destination port numbers. The packet traces available from the NAI archive
[Braun98] include only IP and transport layer headers, because all payload
data has been removed to protect the privacy of Internet users. Our
methodology therefore does not use encapsulated data to identify the
application that generated the packets.
In most cases, we have assumed that packets sent between any port number
higher than 1023 and a well-known port number below 1024 are generated by
the protocol assigned to the well-known port (e.g., HTTP on port 80). This
matches typical end host behavior, in which clients allocate ephemeral ports
from the range 1024 to 32767 [Stevens94].
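The port-pair rule can be sketched as follows, treating ports 0 through 1023 as the well-known range. The three-entry table is an illustrative subset of standard assignments, and the function name `classify` is hypothetical. Packets in which both ports are well-known, or neither is, are left unclassified in this sketch, which is a simplifying assumption.

```python
# Illustrative subset of well-known port assignments.
WELL_KNOWN = {25: "SMTP", 53: "DNS", 80: "HTTP"}

def classify(src_port, dst_port, table=WELL_KNOWN):
    """Attribute a packet to the protocol of its well-known port.

    The rule applies only when exactly one port is at or below 1023 and
    the other is above 1023; direction does not matter.
    """
    low, high = sorted((src_port, dst_port))
    if low <= 1023 < high and low in table:
        return table[low]
    return "unknown"
```

Because the ports are sorted before the test, client-to-server and server-to-client packets of the same conversation fall into the same category.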
For some of the protocols, we have condensed ranges of port numbers in both
the source and destination fields. For example, the RealAudio category
in the UDP table includes all traffic with destination ports between 6970 and
7170 inclusive [RealNetworks]. Unfortunately, this range also includes the
ports used by AFS, and so we are potentially confusing an unknown amount of
AFS traffic with RealAudio. However, the majority of RealAudio traffic
appears on UDP ports 6970, 6971, and 6972, none of which are used by AFS. By
considering only traffic on UDP ports from this range that are not used by
AFS, the amount of RealAudio traffic can be estimated independently of the
amount of AFS traffic that may be present as well.
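A minimal sketch of this separation is shown below. The RealAudio range comes from the text; the set of AFS ports (UDP 7000 through 7009) is an assumption based on the conventional AFS service port assignments and should be checked against the deployment being measured. The function name `realaudio_bucket` is hypothetical.

```python
REALAUDIO_RANGE = range(6970, 7171)  # 6970..7170 inclusive, from the text
AFS_PORTS = range(7000, 7010)        # assumption: AFS services on 7000..7009

def realaudio_bucket(dst_port):
    """Place a UDP destination port into one of three buckets.

    'realaudio' : in the RealAudio range and not an assumed AFS port
    'ambiguous' : in the RealAudio range but also an assumed AFS port
    'other'     : outside the RealAudio range
    """
    if dst_port not in REALAUDIO_RANGE:
        return "other"
    if dst_port in AFS_PORTS:
        return "ambiguous"
    return "realaudio"
```

Summing bytes over the 'realaudio' bucket alone gives an estimate that is not contaminated by AFS, at the cost of missing any RealAudio traffic that happens to use the overlapping ports.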
We are currently investigating better techniques for differentiating
RealAudio and AFS traffic using packet size distributions and packet
interarrival patterns, and we hope to be able to conclusively differentiate
between the two in the future. A recent analysis of the traffic patterns
exhibited by RealAudio traffic [Mena00] has shown several parameters that may
be used to differentiate between RealAudio and other protocols. A further
study characterizing AFS traffic patterns needs to be undertaken to identify
the best metrics to use to separate the two.
For both TCP and UDP traffic, there is a significant fraction of traffic
that cannot be mapped to applications using well-known port numbers.
Many protocols do not depend on well-known port numbers, but either use
a well-known service for negotiating the port numbers used by secondary
connections, or use arbitrary but fixed port numbers that are not registered
with IANA. The most popular application with negotiated port numbers
is passive-mode FTP, in which the server sends the port number to
use for a data connection over the command channel. Many other protocols
behave similarly, such as Napster and Internet telephony applications.
Most online games do not register well-known ports with IANA, but use
arbitrary port numbers above 5000. We have collected the port numbers used
by several of the popular games and use this information to estimate the
fraction of traffic generated by them. Our analysis of online game traffic
includes game traffic on the following UDP ports:
| Game | UDP ports used |
|---|---|
| Half Life | any to or from 27005 or 27015 |
| Quake 3: Arena | any to or from 27960 |
| Starcraft | 6112 to 6112 |
| Quake II | any to or from 27901 or 27910 |
| QuakeWorld | any to or from 27500 or 27001 |
| Unreal | any to or from 7777 |
Table 1: UDP ports used by online games
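The rules in Table 1 can be expressed as a small lookup, sketched below. The port numbers are taken from the table; the function name `classify_game` is hypothetical, and checking the destination port before the source port when both match is an arbitrary tie-breaking assumption.

```python
# Ports matched in either direction ("any to or from"), from Table 1.
GAME_PORTS = {
    27005: "Half Life", 27015: "Half Life",
    27960: "Quake 3: Arena",
    27901: "Quake II", 27910: "Quake II",
    27500: "QuakeWorld", 27001: "QuakeWorld",
    7777: "Unreal",
}
STARCRAFT_PORT = 6112  # matched only when both ports are 6112

def classify_game(src_port, dst_port):
    """Return the game name for a UDP packet, or None if no rule matches."""
    if src_port == STARCRAFT_PORT and dst_port == STARCRAFT_PORT:
        return "Starcraft"
    for port in (dst_port, src_port):   # destination checked first (assumed)
        if port in GAME_PORTS:
            return GAME_PORTS[port]
    return None
```

Note that the Starcraft rule is stricter than the others: a packet with only one port equal to 6112 is not counted as game traffic.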
As is the case with RealAudio and AFS, there are many possibilities for
confusion between game traffic and other applications when only port numbers
are used to make the classification. We assume that
there are no other protocols that preferentially use these same ports, and
that applications that ephemerally use these ports contribute equal amounts of
traffic across all traffic categories. This assumption carries significant
risks, and needs further analysis to fully evaluate its impact on our data.
- [AIX] Ames Internet eXchange, http://aix.arc.nasa.gov/
- [Braun98] H.-W. Braun. Towards a systemic understanding of the Internet
  organism: a framework for the creation of a Network Analysis
  Infrastructure, http://moat.nlanr.net/NAI
- [CoralReef] CoralReef home page,
  http://www.caida.org/tools/measurement/coralreef
- [Feldman98] S. Feldman. MAE-West Link Utilization Statistics,
  http://www.mae.net/~feldman/gigaswitch/ames
- [Mena00] A. Mena and J. Heidemann. An Empirical Study of RealAudio
  Traffic, IEEE INFOCOM 2000,
  http://www.isi.edu/~johnh/PAPERS/Mena00a.html
- [RealNetworks] RealNetworks RealSystem Firewall Support,
  http://service.real.com/firewall/adminfw.html
- [Stevens94] W. Richard Stevens. TCP/IP Illustrated, Volume 1: The
  Protocols, Addison-Wesley, 1994.
- [Thompson97] K. Thompson, G. Miller, and R. Wilder. Wide Area Internet
  Traffic Patterns and Characteristics, IEEE Network, Vol. 11 No. 6,
  pp. 10-23, Nov/Dec 1997,
  http://www.vbns.net/presentations/papers/MCItraffic.ps