Bibliography Details
J. Erman, A. Mahanti, and M. Arlitt, "Internet Traffic Identification using Machine Learning", in IEEE GLOBECOM 2006, Nov 2006.
Internet Traffic Identification using Machine Learning
Authors: J. Erman
A. Mahanti
M. Arlitt
Published: IEEE GLOBECOM, 2006
Entry Dates: 2009-02-13
Abstract: We apply an unsupervised machine learning approach for Internet traffic identification and compare the results with those of a previously applied supervised machine learning approach. Our unsupervised approach uses an Expectation Maximization (EM) based clustering algorithm and the supervised approach uses the Naive Bayes classifier. We find the unsupervised clustering technique has an accuracy of up to 91% and outperforms the supervised technique by up to 9%. We also find that the unsupervised technique can be used to discover traffic from previously unknown applications and has the potential to become an excellent tool for exploring Internet traffic.
  • datasets: parts of the auckland-iv (20010314-20010319) and auckland-vi (20010608-20010609) traces;
  • presents an unsupervised machine learning approach (AutoClass) for Internet traffic classification;
  • AutoClass achieves an average accuracy greater than 90% and outperforms Naive Bayes by up to 9%;
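
The general idea of EM-based clustering followed by cluster labeling can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's AutoClass setup: the two flow features, the class names "web" and "p2p", the synthetic data, the fixed number of clusters, the quantile-based initialization, and the majority-vote labeling step are all hypothetical choices made for the example.

```python
import math
import random
from collections import Counter

def gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def em_cluster(points, k, iters=50):
    """Fit a k-component diagonal Gaussian mixture with EM; return hard cluster ids."""
    n, dim = len(points), len(points[0])
    # Assumption: spread the initial means over quantiles of the sorted points.
    ordered = sorted(points)
    means = [list(ordered[(2 * j + 1) * n // (2 * k)]) for j in range(k)]
    variances = [[1.0] * dim for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in points:
            logs = [math.log(weights[j]) + gaussian_logpdf(x, means[j], variances[j])
                    for j in range(k)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weights, means, and variances from the posteriors.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, points)) / nj
                        for d in range(dim)]
            variances[j] = [
                max(sum(r[j] * (x[d] - means[j][d]) ** 2
                        for r, x in zip(resp, points)) / nj, 1e-6)
                for d in range(dim)]
    return [max(range(k), key=lambda j: r[j]) for r in resp]

def label_clusters(cluster_ids, labels):
    """Map each cluster to the most common application label among its members."""
    votes = {}
    for c, y in zip(cluster_ids, labels):
        votes.setdefault(c, Counter())[y] += 1
    return {c: counter.most_common(1)[0][0] for c, counter in votes.items()}

def accuracy(cluster_ids, labels, mapping):
    """Fraction of flows whose cluster label matches the true label."""
    return sum(mapping[c] == y for c, y in zip(cluster_ids, labels)) / len(labels)

def synthetic_flows(n_per_class=100, seed=0):
    """Hypothetical two-feature flows for two made-up application classes."""
    rng = random.Random(seed)
    centers = {"web": (0.0, 0.0), "p2p": (4.0, 4.0)}
    points, labels = [], []
    for name, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            points.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
            labels.append(name)
    return points, labels

points, labels = synthetic_flows()
clusters = em_cluster(points, k=2)
mapping = label_clusters(clusters, labels)
print(mapping, accuracy(clusters, labels, mapping))
```

The labels are used only after clustering, to name the clusters and score them. A cluster that matches no known application well would be a candidate for the "previously unknown applications" mentioned in the abstract.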
  Last Modified: Tue Oct-13-2020 22:21:42 UTC