<?xml version="1.0" standalone="no"?>
                    <!DOCTYPE div SYSTEM "/www/backend/www-xml-443/dtd/caidaML.dtd">
                    <!-- do NOT ERASE the DOCTYPE declaration! --><div>


<tr bgcolor="#f4f4f4">
  <td>
<font face="helvetica,arial" size="2">
<b>URL:</b>
</font>
</td>
  <td>
<font face="helvetica,arial" size="2">
<a href="http://conferences.sigcomm.org/sigcomm/2005/paper-HafSen.pdf">http://conferences.sigcomm.org/sigcomm/2005/paper-HafSen.pdf</a>
</font>
  </td>
</tr>


<tr bgcolor="#e9e9e9">
  <td>
<font face="helvetica,arial" size="2">
<b>Entry Dates:</b>
</font>
</td>
  <td>
<font face="helvetica,arial" size="2">
2009-02-11


</font>
  </td>
</tr>


<tr bgcolor="#f4f4f4">
  <td>
<font face="helvetica,arial" size="2">
<b>Abstract:</b>
</font>
</td>
  <td>
<font face="helvetica,arial" size="2">
An accurate mapping of traffic to applications is important for a broad range of network management and measurement tasks. Internet applications have traditionally been identified using well-known default server network-port numbers in the TCP or UDP headers. However this approach has become increasingly inaccurate. An alternate, more accurate technique is to use specific application-level features in the protocol exchange to guide the identification. Unfortunately deriving the signatures manually is very time consuming and difficult.
In this paper, we explore automatically extracting application signatures from IP traffic payload content. In particular we apply three statistical machine learning algorithms to automatically identify signatures for a range of applications. The results indicate that this approach is highly accurate and scales to allow online application identification on high speed links. We also discovered that content signatures still work in the presence of encryption. In these cases we were able to derive content signature for unencrypted handshakes negotiating the encryption parameters of a particular connection. 


</font>
  </td>
</tr>


<tr bgcolor="#e9e9e9">
  <td>
<font face="helvetica,arial" size="2">
<b>Results:</b>
</font>
</td>
  <td>
<font face="helvetica,arial" size="2">
<ul>
<li>
datasets: 100Gbyte of packet traces on a high speed access network serving more than 500 residential customers; the training data was collected in August 2004, the test data was collected in August 2004 and March 2005;
</li>
<li>
use machine learning algorithms: Naive Bayes, AdaBoost, Maximum Entropy;
</li>
<li>
explore the feasiblity of automatically identifying application signautes; the resluting signatures remain highly accurate t otraffic variations on the time of serveral months;
</li>
</ul>


</font>
  </td>
</tr>
</div>

