- Ongoing Research
- Measurement Projects
- Routing asymmetry (2010)
- Taxonomy of Internet traffic classification papers (1994-2009)
- Analyzing UDP usage in Internet traffic (2009)
- Internet traffic classification methods (2008)
- Packet size distribution comparison between Internet links in 1998 and 2008
- Measuring IPv4 address space utilization (2008)
- Remote physical device fingerprinting (2005)
- Spectroscopy of traceroute delays (2005)
While traffic classification techniques are improving in accuracy and efficiency, traffic classification remains an open problem in Internet research due to the continued proliferation of different Internet application behaviors, further aggravated by growing incentives to disguise some applications to avoid filtering or blocking. In the paper Issues and future directions in traffic classification we provide an overview of both the evolution of traffic classification techniques and the constraints on their development, highlighting key differences across existing approaches and techniques. We propose strategies and actions to address the challenges that have remained unsolved in the field over the last decade.
In the paper Towards a Statistical Characterization of the Interdomain Traffic Matrix we examine spatial properties of the Interdomain Traffic Matrix (ITM) using passive flow data collected at a Europe-wide measurement infrastructure deployed in the GEANT backbone network. This network is the largest academic/research backbone in Europe, connecting hundreds of universities and research organizations to the global Internet. Although we directly measure the ITM elements that are routed via the GEANT network, our goal is not to accurately measure each entry of the ITM, but to infer its statistical properties from the observable elements. We believe that such properties can yield a better understanding of the nature of the ITM and can be used to generate synthetic, but realistic, ITMs for simulation and modeling purposes.
Researchers observe one-way (unsolicited) Internet traffic arriving at network telescopes and analyze it to study malicious activity on the Internet. Such traffic is now so pervasive and diverse that using it to identify new events requires examining more than raw packet, byte, or port counts. Seeking a better understanding of current and emerging one-way traffic behavior, we developed and implemented iatmon, a freely available measurement and analysis tool that differentiates among categories of unsolicited traffic. The tool uses 14 source types and 10 inter-arrival-time groups to separate observed one-way traffic into a matrix of 140 type-and-group subsets. This taxonomy allows us to determine which source subsets were active during any hour, and to track subset behavior over weeks or months as the characteristics of one-way traffic evolve. In the paper One-way Traffic Monitoring with iatmon we applied iatmon to analyze changes in one-way traffic collected by the UCSD network telescope over the first half of 2011.
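The two-dimensional taxonomy described above can be sketched as a small bucketing routine. This is a hypothetical simplification: real iatmon defines 14 specific source types and 10 inter-arrival-time (IAT) groups, whereas the type labels and IAT boundaries below are illustrative placeholders.

```python
from collections import defaultdict

# Illustrative stand-ins for iatmon's real categories (14 types, 10 groups).
SOURCE_TYPES = ["tcp_scan", "udp_probe", "backscatter", "other"]
IAT_BOUNDS = [0.001, 0.1, 1.0, 10.0]  # seconds; upper bounds of IAT groups

def iat_group(mean_iat):
    """Map a source's mean inter-arrival time to an IAT group index."""
    for i, bound in enumerate(IAT_BOUNDS):
        if mean_iat < bound:
            return i
    return len(IAT_BOUNDS)  # last, open-ended group

def build_matrix(sources):
    """sources: iterable of (source_type, mean_iat) per observed source.
    Returns a {(type, iat_group): active-source count} matrix, the
    structure iatmon tracks hour by hour."""
    matrix = defaultdict(int)
    for stype, mean_iat in sources:
        matrix[(stype, iat_group(mean_iat))] += 1
    return matrix

counts = build_matrix([("tcp_scan", 0.0005), ("backscatter", 2.5),
                       ("tcp_scan", 0.02)])
```

Tracking how each cell's count changes across hours is what lets the taxonomy surface new one-way traffic events.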
We monitor optical networks using an optical splitter that diverts a small fraction of the light from the optical fiber to the monitoring device. CAIDA researchers maintain a few realtime traffic monitors. The CoralReef report generator produces graphs and tables of derived statistics, including the composition of the observed traffic by protocol, application, and host, measured in packets, bytes, and flow tuples.
CAIDA maintains Internet data collection monitors equipped with Endace network cards at two Equinix datacenters. The monitor in Chicago, IL is connected to a backbone link of a Tier 1 ISP between Chicago, IL and Seattle, WA. The monitor in San Jose, CA is connected to a backbone link of a Tier 1 ISP between San Jose, CA and Los Angeles, CA. We attempt to collect a one-hour traffic trace at each of the monitors once a month. The anonymized packet headers are available to academic researchers and CAIDA members by request.
Aggregated statistical information is available for all collected traffic traces. Metadata for each trace include: the monitor that captured the trace, year and month of collection, start and stop time of trace (UTC), graphical display of the trace composition by protocol, application and country, numbers of IPv4 and IPv6 packets, transmission rate (in packets per second and in bits per second), link load (as fraction of the nominal maximum load), average packet size (in bytes), and a graph of the packet size distribution.
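The derived rate and load statistics listed in the metadata follow directly from the captured packet sizes and the trace duration. A minimal sketch, assuming a simple list of packet sizes (the function name and arguments are illustrative, not CAIDA's actual tooling):

```python
def trace_summary(packet_sizes_bytes, duration_s, link_capacity_bps):
    """Derive per-trace statistics like those in the trace metadata.
    packet_sizes_bytes: sizes of the captured packets, in bytes;
    duration_s: trace length in seconds;
    link_capacity_bps: nominal link capacity (e.g. 10e9 for 10 Gb/s)."""
    total_bytes = sum(packet_sizes_bytes)
    pps = len(packet_sizes_bytes) / duration_s
    bps = total_bytes * 8 / duration_s
    return {
        "packets_per_second": pps,
        "bits_per_second": bps,
        # link load as a fraction of the nominal maximum
        "link_load": bps / link_capacity_bps,
        "avg_packet_size": total_bytes / len(packet_sizes_bytes),  # bytes
    }

stats = trace_summary([1500, 40, 40, 1500], duration_s=2.0,
                      link_capacity_bps=10e9)
```

The packet size distribution graph mentioned above would be a histogram over the same `packet_sizes_bytes` input.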
The UCSD Network Telescope is a passive traffic monitoring system built on a globally routed, but lightly utilized /8 network. It continuously collects unsolicited one-way traffic arising from various security-related events, such as scanning of address space by hackers looking for vulnerable targets, backscatter from denial-of-service attacks using random spoofed source addresses, the automated spread of worms and viruses, and misconfigurations (e.g., mistyping an IP address). CAIDA personnel maintain and evolve the telescope instrumentation to support collection, curation, archiving, and analysis of the data, as well as sharing access to the data with vetted security researchers.
Publications regarding traffic analysis can be found under the Measurement Methodology category, but not all listed papers pertain specifically to traffic analysis.
Many people have assumed routing symmetry in traffic on Internet links, that is, that both directions of a conversation flow across the same physical link. In fact, except at network edges, Internet traffic exhibits routing asymmetry that, if ignored, can impair or invalidate the results of tools and models that rely on seeing both directions of a flow.
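One simple way to quantify this effect on a monitored link is to check, for each observed source/destination pair, whether the reverse direction was ever seen. A minimal sketch, assuming flows are reduced to (src, dst) address pairs:

```python
def asymmetric_fraction(flows):
    """flows: set of (src_ip, dst_ip) pairs observed on one link.
    Returns the fraction of pairs whose reverse direction was never
    seen on the same link -- a rough indicator of routing asymmetry."""
    seen = set(flows)
    one_way = [f for f in seen if (f[1], f[0]) not in seen]
    return len(one_way) / len(seen)

frac = asymmetric_fraction({("10.0.0.1", "10.0.0.2"),
                            ("10.0.0.2", "10.0.0.1"),
                            ("10.0.0.3", "10.0.0.4")})
```

On an edge link this fraction tends toward zero; on a backbone link it can be large, which is exactly the hazard for symmetry-assuming tools.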
Internet traffic classification papers tend to try to work with whatever traffic samples a researcher can find, with no systematic integration of results. To fill this gap, we created a structured taxonomy of traffic classification papers published from 1994 to 2009. Our taxonomy summarizes the relevant attributes from these papers, including data sets and methods used, goals, and basic empirical findings.
Although it was still an accepted assumption that most Internet traffic was transmitted via the TCP protocol, we expected the rise of new streaming applications and new P2P protocols to increase the usage of UDP as a transport protocol. Performing an analysis on UDP usage in Internet traffic, we found that most UDP flows used random high ports and carried few packets with little content, consistent with its use as a signaling protocol for increasingly popular P2P applications.
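The observed flow pattern lends itself to a simple per-flow heuristic. The thresholds below are illustrative assumptions for the sketch, not values taken from the study:

```python
def looks_like_p2p_signaling(src_port, dst_port, packets, payload_bytes):
    """Flag UDP flows matching the pattern described above: ephemeral
    high ports on both ends, few packets, little payload -- consistent
    with P2P signaling traffic. Thresholds are illustrative only."""
    high_ports = src_port >= 1024 and dst_port >= 1024
    return high_ports and packets <= 3 and payload_bytes <= 100

flag = looks_like_p2p_signaling(51413, 62201, packets=2, payload_bytes=48)
```

A flow to a well-known low port (e.g., DNS on 53) would not match, even if it is equally short.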
Using seven traces with payload collected in Japan, Korea, and US, we conducted a thorough evaluation of three approaches to traffic classification: based on transport layer ports, host behavior, and flow features. The diverse geographic locations, link characteristics and application mix in these data allowed us to compare the approaches under a wide variety of conditions. In the paper Internet Traffic Classification Demystified: Myths, Caveats, and the Best Practices we analyzed the advantages and limitations of each approach, evaluated methods to overcome the limitations, and extracted insights and recommendations for both the research and practical applications of various traffic classification methods. Our software, classifiers, and data are available to researchers interested in validating or extending this work.
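Of the three approaches compared, the transport-layer-port method is the simplest to sketch. The port-to-application table below is a tiny illustrative subset, not the classifier used in the paper:

```python
# Minimal port-based classifier; the table is illustrative only.
WELL_KNOWN = {80: "http", 443: "https", 25: "smtp", 53: "dns", 22: "ssh"}

def classify_by_port(src_port, dst_port):
    """Return an application label if either port is well known, else
    'unknown' -- the failure mode (ports reused or randomized) that
    motivates the host-behavior and flow-feature approaches."""
    for port in (dst_port, src_port):
        if port in WELL_KNOWN:
            return WELL_KNOWN[port]
    return "unknown"

label = classify_by_port(49152, 443)
```

Host-behavior and flow-feature methods replace this lookup with models of a host's connection patterns or of per-flow statistics, at higher computational cost.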
To elucidate long-term evolutionary changes in the characteristics of Internet traffic, we analyzed and compared the distributions of packet sizes observed at various network links in 1998 and in 2008.
To visualize and measure the use of IPv4 Internet address space as observed in traffic samples from a few core (OC192) U.S. backbone links, we created heatmaps that use intensity of color (heat) to show the use of addresses belonging to the same network.
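The underlying computation is an occupancy count over address blocks, which the heatmap then renders as color intensity. A hypothetical simplification, counting observed addresses per /24 within a /8 on a plain row/column grid (a stand-in for the actual published layout):

```python
def occupancy_grid(addresses):
    """Count observed addresses per /24 inside a /8, indexed by the
    second and third octets of each dotted-quad address. A renderer
    would map each cell's count to color intensity ('heat')."""
    grid = [[0] * 256 for _ in range(256)]
    for addr in addresses:
        _, o2, o3, _ = (int(x) for x in addr.split("."))
        grid[o2][o3] += 1
    return grid

g = occupancy_grid(["1.2.3.4", "1.2.3.99", "1.200.7.1"])
```

Addresses in the same /24 land in the same cell, so densely used networks stand out as hot regions while unused blocks stay dark.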
In the paper Remote physical device fingerprinting we introduced the area of fingerprinting a physical device, as opposed to an operating system or class of devices, remotely, and without the fingerprinted device's known cooperation. We accomplished this goal by exploiting small, microscopic deviations in device hardware: clock skews. Our techniques reported consistent measurements when the measurer was thousands of miles, multiple hops, and tens of milliseconds away from the fingerprinted device, and when the fingerprinted device was connected to the Internet from different locations and via different access technologies. Further, one can apply our passive and semi-passive techniques when the fingerprinted device is behind a NAT or firewall, and also when the device's system time is maintained via NTP or SNTP. One can use our techniques to obtain information about whether two devices on the Internet, possibly shifted in time or IP addresses, are actually the same physical device. Example applications include: computer forensics; tracking, with some probability, a physical device as it connects to the Internet from different public access points; counting the number of devices behind a NAT even when the devices use constant or random IP IDs; remotely probing a block of addresses to determine if the addresses correspond to virtual hosts, e.g., as part of a virtual honeynet; and unanonymizing anonymized network traces.
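The core of the skew measurement can be sketched as a line fit: pair each packet's local receive time with the remote clock reading recovered from its TCP timestamp option, and fit the growing offset against time. The paper uses a linear-programming fit that is robust to queueing delay; ordinary least squares is used here only to keep the sketch short.

```python
def estimate_clock_skew(samples):
    """samples: (receive_time_s, remote_timestamp_s) pairs recovered
    from a device's TCP timestamps. Fit offset = remote_ts - recv_time
    against recv_time by least squares; the slope estimates the
    device's clock skew, returned in parts per million (ppm)."""
    n = len(samples)
    xs = [t for t, _ in samples]
    ys = [ts - t for t, ts in samples]  # offset of remote clock vs ours
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope * 1e6

# Synthetic device whose clock runs 50 ppm fast, sampled once a minute:
obs = [(t, t * 1.00005) for t in range(0, 3600, 60)]
skew_ppm = estimate_clock_skew(obs)
```

Because the skew is a property of the oscillator hardware, the fitted slope stays stable across network paths and access technologies, which is what makes it usable as a fingerprint.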
In the paper Spectroscopy of Traceroute Delays, we analyzed delays of traceroute probes, i.e. packets that elicited ICMP TimeExceeded messages, for a full range of probe sizes up to 9000 bytes as observed on unloaded high-end routers. Our motivation was to use traceroute RTTs for mapping of router and ISP Point-of-Presence nodes, including potentially gleaning information on equipment models, link technologies, capacities, latencies, and spatial positions. To our knowledge it was the first study examining the detailed statistics of ICMP response generation in a reliable testbed setting. We found that ICMP delays were not a linear function of packet size and that ICMP generation rate was not equal to the capacity of the interface on which probes were received. The primary causes of these violations appeared to be internal segmentation of packets into cells and limiting of ICMP packet rates and bit rates inside a router. Our findings suggested possibilities of developing new techniques for bandwidth estimation and router fingerprinting.
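One way to probe for the internal cell segmentation described above is to compare how well the measured delay tracks raw packet size versus the number of fixed-size internal cells the packet occupies. A sketch under illustrative assumptions (the 48-byte cell size and the synthetic staircase delays below are examples, not measurements from the paper):

```python
import math

def cell_step_evidence(samples, cell_bytes=48):
    """samples: (probe_size_bytes, delay) pairs for one router.
    Returns Pearson correlations of delay with raw size and with
    ceil(size / cell_bytes); if segmentation into cells dominates,
    the second correlation should be the stronger one."""
    def pearson(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs)
        vy = sum((y - my) ** 2 for y in ys)
        return cov / math.sqrt(vx * vy)

    sizes = [s for s, _ in samples]
    delays = [d for _, d in samples]
    cells = [math.ceil(s / cell_bytes) for s in sizes]
    return pearson(sizes, delays), pearson(cells, delays)

# Synthetic staircase: delay grows per 48-byte cell, not per byte.
probes = [(s, 10 + 2 * math.ceil(s / 48))
          for s in (40, 50, 90, 100, 140, 200)]
r_size, r_cells = cell_step_evidence(probes)
```

A step-like delay-versus-size curve of this kind is one signature that could feed the router fingerprinting techniques the paper proposes.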