Bibliography Details

A. Moore and K. Papagiannaki, "Toward the Accurate Identification of Network Applications", in Passive and Active Measurement Conference (PAM), Mar 2005.

Toward the Accurate Identification of Network Applications
Authors:	A. Moore, K. Papagiannaki
Published:	Passive and Active Measurement Conference (PAM), 2005
URL:	http://www.pamconf.org/2005/PDF/34310042.pdf
Entry Date:	2009-02-06
Abstract:	Well-known port numbers can no longer be used to reliably identify network applications. There is a variety of new Internet applications that either do not use well-known port numbers or use other protocols, such as HTTP, as wrappers in order to go through firewalls without being blocked. One consequence of this is that a simple inspection of the port numbers used by flows may lead to the inaccurate classification of network traffic. In this work, we look at these inaccuracies in detail. Using a full payload packet trace collected from an Internet site we attempt to identify the types of errors that may result from port-based classification and quantify them for the specific trace under study. To address this question we devise a classification methodology that relies on the full packet payload. We describe the building blocks of this methodology and elaborate on the complications that arise in that context. A classification technique approaching 100% accuracy proves to be a labor-intensive process that needs to test flow-characteristics against multiple classification criteria in order to gain sufficient confidence in the nature of the causal application. Nevertheless, the benefits gained from a content-based classification approach are evident. We are capable of accurately classifying what would be otherwise classified as unknown as well as identifying traffic flows that could otherwise be classified incorrectly. Our work opens up multiple research issues that we intend to address in future work.
Results:	The data set is a full-payload packet trace from a Genome Campus site (about 1,000 researchers, administrators and technical staff) connected to the Internet by a full-duplex Gigabit Ethernet link. The trace covers a full 24-hour weekday period in both link directions: 573,429,697 packets and 268,543 MBytes in total. The classification methodology relies on the full packet payload and uses nine distinct identification methods: port-based, packet header, single-packet signature, single-packet protocol, signature on the first KByte, protocol on the first KByte, selected-flows protocol, all-flows protocol, and host history. The content-based scheme operates on traffic flows: (1) it aggregates packets into flows according to their 5-tuple; (2) it iteratively tests flow characteristics against different criteria until sufficient certainty has been gained as to the identity of the application.
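
Note:	The two-step flow procedure in the Results field can be illustrated with a minimal Python sketch. It is not the authors' implementation. The Packet type, the port table, the payload signatures, the ordering of the tests, and the rule of stopping at the first test that returns a label are all assumptions made for illustration. Only two of the identification methods are sketched (a first-KByte signature and the port number).

```python
from collections import defaultdict
from typing import Callable, List, NamedTuple, Optional, Tuple


class Packet(NamedTuple):
    """Hypothetical packet record; field names are illustrative."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str
    payload: bytes


def aggregate(packets: List[Packet]) -> dict:
    """Step 1: group packets into flows keyed by their 5-tuple."""
    flows = defaultdict(list)
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.proto)
        flows[key].append(p)
    return dict(flows)


def by_first_kbyte_signature(flow: List[Packet]) -> Optional[str]:
    """Look for a known signature in the first KByte of the flow payload.
    The signatures below are assumed examples, not the paper's rule set."""
    data = b"".join(p.payload for p in flow)[:1024]
    if data.startswith((b"GET ", b"POST ", b"HTTP/")):
        return "www"
    if data.startswith(b"SSH-"):
        return "interactive"
    return None


def by_port(flow: List[Packet]) -> Optional[str]:
    """Port lookup with an assumed, tiny port table."""
    well_known = {80: "www", 25: "mail", 21: "ftp"}
    return well_known.get(flow[0].dst_port)


# Ordered list of (method name, test). Placing the payload check before
# the port lookup is an assumption of this sketch.
TESTS: List[Tuple[str, Callable[[List[Packet]], Optional[str]]]] = [
    ("first-kbyte signature", by_first_kbyte_signature),
    ("port", by_port),
]


def classify(flow: List[Packet]) -> Tuple[str, Optional[str]]:
    """Step 2: try each test in turn until one yields a label."""
    for name, test in TESTS:
        label = test(flow)
        if label is not None:
            return label, name
    return "unknown", None


packets = [
    Packet("10.0.0.1", "10.0.0.2", 40000, 8080, "tcp", b"GET / HTTP/1.1\r\n"),
    Packet("10.0.0.1", "10.0.0.3", 40001, 25, "tcp", b"\x00\x01"),
    Packet("10.0.0.1", "10.0.0.4", 40002, 6000, "udp", b"\xff\xff"),
]
results = {key: classify(flow) for key, flow in aggregate(packets).items()}
```

In this sketch the HTTP request sent to port 8080 is labelled from its payload, which a port lookup alone would miss; the flow to port 25 falls back to the port table; the third flow stays unknown.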