V. Seeberg and S. Petrovic, "A New Classification Scheme for Anonymization of Real Data Used in IDS Benchmarking", in ARES, 2007.
|A New Classification Scheme for Anonymization of Real Data Used in IDS Benchmarking
|Artificially generated network traffic sources for IDS benchmarking have been harshly criticized because of their inability to realistically simulate networks. Benchmarking data sets based on real data have many advantages over artificially generated ones, but due to privacy concerns and legal restrictions the original data sets cannot be widely distributed. They must be anonymized ("sanitized") before they can be used in IDS testing. In this paper, we define a new variable-strength filter-in methodology for the anonymization of IDS benchmarking data sets. It is based on an original classification criterion that categorizes informational objects in network data according to the action to be performed on them in the anonymization process. The action depends on how likely these objects are to disclose sensitive information. We analyze the potential of various HTTP header fields to disclose sensitive information. We also study the influence of the new anonymization methodology on the percentage of attacks detectable by an IDS. Experimental results show that a great number of the attacks present in the input data without anonymization are still detectable by the tested IDS even after the application of the strongest anonymization scheme defined by our methodology. Although the new anonymization method focuses on application data, it could also be used in the link, network, and transport protocol contexts.
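The filter-in idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the action names (`KEEP`, `PSEUDONYMIZE`, `REMOVE`), the two strength levels, the assignment of HTTP header fields to actions, and the salted-hash pseudonym are all assumptions made for illustration. The only property taken from the abstract is that each object is handled according to a per-class action, and that (as "filter-in" suggests) anything not explicitly classified is dropped rather than passed through.

```python
import hashlib
from enum import Enum


class Action(Enum):
    """Hypothetical actions an anonymizer may apply to a header field."""
    KEEP = "keep"
    PSEUDONYMIZE = "pseudonymize"
    REMOVE = "remove"


# Hypothetical policies: header names and their classes are illustrative
# only and are NOT taken from the paper. Keys are lowercase header names.
POLICIES = {
    "weak": {
        "accept": Action.KEEP,
        "content-type": Action.KEEP,
        "content-length": Action.KEEP,
        "user-agent": Action.KEEP,
        "host": Action.PSEUDONYMIZE,
        "referer": Action.PSEUDONYMIZE,
    },
    "strong": {
        "accept": Action.KEEP,
        "content-type": Action.KEEP,
        "content-length": Action.KEEP,
        "host": Action.PSEUDONYMIZE,
    },
}


def pseudonymize(value, salt="demo-salt"):
    """Replace a value with a stable pseudonym derived from a salted hash."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "anon-" + digest[:12]


def anonymize_headers(headers, strength="strong"):
    """Apply a filter-in policy to a dict of HTTP header fields.

    Fields absent from the chosen policy default to REMOVE, so only
    explicitly classified fields can appear in the output.
    """
    policy = POLICIES[strength]
    result = {}
    for name, value in headers.items():
        action = policy.get(name.lower(), Action.REMOVE)
        if action is Action.KEEP:
            result[name] = value
        elif action is Action.PSEUDONYMIZE:
            result[name] = pseudonymize(value)
        # Action.REMOVE: the field is omitted from the output.
    return result
```

Calling `anonymize_headers({"Host": "intranet.example.org", "Cookie": "session=abc123", "Accept": "text/html"})` under these assumed policies keeps `Accept`, replaces `Host` with a pseudonym, and drops `Cookie`, because `Cookie` was never classified. Defaulting to removal is the design choice that distinguishes a filter-in approach from a filter-out one, where unknown fields would pass through unchanged.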