Bibliography Details

J. Erman, A. Mahanti, and M. Arlitt, "Internet Traffic Identification using Machine Learning", in IEEE GLOBECOM 2006, Nov 2006.

Internet Traffic Identification using Machine Learning
Authors: J. Erman, A. Mahanti, M. Arlitt
Published: IEEE GLOBECOM, 2006
Entry Dates: 2009-02-13
Abstract: We apply an unsupervised machine learning approach for Internet traffic identification and compare the results with those of a previously applied supervised machine learning approach. Our unsupervised approach uses an Expectation Maximization (EM) based clustering algorithm and the supervised approach uses the Naive Bayes classifier. We find the unsupervised clustering technique has an accuracy of up to 91% and outperforms the supervised technique by up to 9%. We also find that the unsupervised technique can be used to discover traffic from previously unknown applications and has the potential to become an excellent tool for exploring Internet traffic.
  • datasets: auckland-iv (20010314-20010319) and auckland-vi (20010608-20010609), using part of each trace;
  • presents an unsupervised machine learning approach (AutoClass) for Internet traffic classification;
  • AutoClass achieves an average accuracy greater than 90% and outperforms Naive Bayes by up to 9%.
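
The clustering-then-labelling idea in the abstract can be illustrated with a minimal sketch. This is not AutoClass and not the authors' code: it fits a plain one-dimensional Gaussian mixture with EM, labels each cluster with the majority class of the flows it contains, and scores the result. The feature (a per-flow mean packet size), the class names "interactive" and "bulk", and all data values are synthetic assumptions chosen only for illustration.

```python
import math
from collections import Counter


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_cluster(xs, k=2, iters=50):
    """Fit a 1-D Gaussian mixture with EM and return hard cluster assignments."""
    lo, hi = min(xs), max(xs)
    # Deterministic start: spread the initial means evenly over the data range.
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    variances = [((hi - lo) / k) ** 2 or 1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(max(iters, 1)):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps the density finite
    # Assign each point to its most responsible component.
    return [max(range(k), key=lambda j: r[j]) for r in resp]


def label_clusters(assignments, labels):
    """Map each cluster id to the majority class label among its members."""
    votes = {}
    for c, y in zip(assignments, labels):
        votes.setdefault(c, Counter())[y] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}


def accuracy(assignments, labels, mapping):
    """Fraction of flows whose cluster label matches the true label."""
    hits = sum(mapping[c] == y for c, y in zip(assignments, labels))
    return hits / len(labels)


# Synthetic toy data (assumption): mean packet size in bytes per flow.
sizes = [88, 95, 102, 110, 120, 97, 1380, 1420, 1460, 1500, 1440, 1400]
labels = ["interactive"] * 6 + ["bulk"] * 6

assignments = em_cluster(sizes, k=2)
mapping = label_clusters(assignments, labels)
acc = accuracy(assignments, labels, mapping)
print(mapping, acc)
```

Only the cluster-to-class mapping step needs labels, so flows that land in a cluster with no clear majority class could be flagged for inspection, which is one way to read the abstract's remark about discovering previously unknown applications.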