Bibliography Details

L. Bernaille, R. Teixeira, and K. Salamatian, "Early Application Identification", in ACM CoNEXT 2006, Jul 2006.

Early Application Identification
Authors:	L. Bernaille, R. Teixeira, K. Salamatian
Published:	ACM CoNEXT, 2006
URL:	https://dl.acm.org/doi/10.1145/1368436.1368445
Entry Dates:	2009-02-09
Abstract:	The automatic detection of applications associated with network traffic is an essential step for network security and traffic engineering. Unfortunately, simple port-based classification methods are not always efficient, and systematic analysis of packet payloads is too slow. Most recent research proposals use flow statistics to classify traffic flows once they are finished, which limits their applicability for online classification. In this paper, we evaluate the feasibility of application identification at the beginning of a TCP connection. Based on an analysis of packet traces collected on eight different networks, we find that it is possible to distinguish the behavior of an application from the observation of the size and the direction of the first few packets of the TCP connection. We apply three techniques to cluster TCP connections: K-Means, Gaussian Mixture Model and spectral clustering. Resulting clusters are used together with assignment and labeling heuristics to design classifiers. We evaluate these classifiers on different packet traces. Our results show that the first four packets of a TCP connection are sufficient to classify known applications with an accuracy over 90% and to identify new applications as unknown with a probability of 60%.
Results:	Datasets:
	1) Four payload traces. The first three were collected in 2004 and 2005 on the University of Paris 6 network using an optical splitter and a DAG card; the fourth was captured at the edge of an enterprise network.
	2) Two packet-header traces, which capture only the first 64 bytes of every packet (the IP and layer-4 headers). One was captured in 2003 on a 1 Gbit/s link between a large college and the Dutch academic and research network; the other was collected in 2004 on a 1 Gbit/s ADSL access network.
	3) A wireless network trace from the CRAWDAD repository.
	4) A trace captured at the edge of the University of Massachusetts Amherst campus network.
	Clustering techniques: K-Means, Gaussian Mixture Model, and spectral clustering, applied to TCP connections.
	Findings: the first four packets of a TCP connection are sufficient to classify known applications with an accuracy over 90% and to identify new applications as unknown with a probability of 60%.
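The pipeline summarized above (describe each TCP connection by the size and direction of its first few packets, cluster the connections, then label the clusters and assign new connections to them) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses only K-Means, and the signed-size encoding (positive for client-to-server, negative for server-to-client), the zero padding of short connections, the farthest-first initialisation, the majority-vote cluster labeling, the radius-based "unknown" rule, and the names `features`, `kmeans`, `train`, `classify` and `slack` are all hypothetical choices.

```python
import math
from collections import Counter

P = 4  # leading packets per connection; the paper reports that four suffice


def features(packets, p=P):
    """Signed sizes of the first p packets (hypothetical encoding).

    `packets` is a list of (size_bytes, direction), with direction "out" for
    client->server and "in" for server->client. Shorter connections are
    zero-padded so every vector has length p.
    """
    vec = [size if d == "out" else -size for size, d in packets[:p]]
    return vec + [0] * (p - len(vec))


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(pt, centroids):
    """Index of the centroid closest to pt."""
    return min(range(len(centroids)), key=lambda c: dist(pt, centroids[c]))


def kmeans(points, k, iters=50):
    """Plain Lloyd's K-Means with deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda q: min(dist(q, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for pt in points:
            groups[nearest(pt, centroids)].append(pt)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        new = [[sum(col) / len(g) for col in zip(*g)] if g else centroids[j]
               for j, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return centroids


def train(conns, labels, k):
    """Cluster labeled connections and label each cluster by majority vote."""
    points = [features(c) for c in conns]
    centroids = kmeans(points, k)
    members = [[] for _ in range(k)]
    for pt, lab in zip(points, labels):
        members[nearest(pt, centroids)].append((pt, lab))
    model = []
    for j, m in enumerate(members):
        if not m:
            continue  # drop clusters that received no training connections
        label = Counter(lab for _, lab in m).most_common(1)[0][0]
        radius = max(dist(pt, centroids[j]) for pt, _ in m)
        model.append((centroids[j], label, radius))
    return model


def classify(conn, model, slack=1.5):
    """Assign to the nearest cluster; report "unknown" if it lies too far out."""
    pt = features(conn)
    centroid, label, radius = min(model, key=lambda e: dist(pt, e[0]))
    return label if dist(pt, centroid) <= slack * radius else "unknown"
```

In use, `train` would be fed connections from a payload trace, where ground-truth application labels can be derived, and `classify` would then see only the first four packets of each new connection. A Gaussian Mixture Model or spectral clustering could stand in for `kmeans`, though the assignment and "unknown" rule would need to be adapted to that model.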