Center for Applied Internet Data Analysis
Anonymization Tools Taxonomy
co-sponsored by:
Cisco Systems
This section provides a listing of tools for performing anonymization of Internet log files and trace data. We provide a summary of each tool along with pointers to more detailed information. Review comments are also included when available.

This listing has not been actively maintained since 2004. These pages are made available for historical purposes.


Researchers studying the Internet face a significant challenge when looking for traffic traces: the fundamental conflict between end-user privacy and the research utility of data. When data is heavily anonymized, important attributes that reveal the structure and function of networks are obscured. When data is not heavily anonymized, details about end users, including geographic and network location, organization, names, passwords, and other personal information, could be exposed to unauthorized access.

We provide the following list of anonymization tools to help researchers find them, and to better understand and track the availability of such methods.

Terms

hiding
Value is replaced with a constant value (typically 0) of the same size. Sometimes called "black marker".
hash
A hash function maps each value to a new (not necessarily unique) value.
permutation
Maps each original value to a unique new value.
prefix-preserving
Any two values that had the same n-bit prefix before anonymization will still have the same n-bit prefix as each other after anonymization. (Would be more accurately called "prefix-relationship-preserving", because the actual prefix values are not preserved.)
shift
Adds a fixed offset to each value.
enumeration
Maps each original value to a new value such that their ordering is preserved.
partitioning
Possible values are partitioned into meaningful sets; actual values are replaced with a fixed value from the same set. E.g., TCP port numbers 0 to 1023 are replaced with 0, and 1024 to 65535 replaced with 65535.
updated
Checksums are recalculated to reflect changes made to other fields.
truncation
Field is shortened, losing data at the end.
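The simpler terms above can be illustrated with a minimal Python sketch. All function names, the salt, and the seed are hypothetical and chosen only for illustration; none of this is taken from any tool listed on this page. The permutation uses a seeded non-cryptographic shuffle purely to show the one-to-one property, whereas a real tool would normally use a keyed or cryptographic construction.

```python
import hashlib
import random


def hide(value: int) -> int:
    """Hiding ("black marker"): replace the value with a constant (here 0)."""
    return 0


def hash_value(value: int, salt: bytes = b"example-salt", bits: int = 16) -> int:
    """Hash: map a value to a new value; two inputs may collide."""
    digest = hashlib.sha256(salt + value.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


def make_permutation(bits: int, seed: int):
    """Permutation: build a one-to-one mapping over all values of `bits` bits."""
    table = list(range(1 << bits))
    random.Random(seed).shuffle(table)  # illustrative only, not cryptographic
    return table.__getitem__


def shift(timestamp: float, offset: float) -> float:
    """Shift: add the same fixed offset to every value."""
    return timestamp + offset


def enumerate_values(values):
    """Enumeration: replace each value with its rank, so ordering is preserved."""
    rank = {v: i for i, v in enumerate(sorted(set(values)))}
    return [rank[v] for v in values]


def partition_port(port: int) -> int:
    """Partitioning: ports 0-1023 become 0, ports 1024-65535 become 65535."""
    return 0 if port <= 1023 else 65535


def truncate(payload: bytes, keep: int) -> bytes:
    """Truncation: keep only the first `keep` bytes, dropping the rest."""
    return payload[:keep]
```

For example, `partition_port(80)` yields 0 and `partition_port(8080)` yields 65535, matching the TCP port example in the partitioning definition.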

Tools

| Tool | Input | Fields | Methods | Notes |
|---|---|---|---|---|
| AnonTool | Netflow (v5 and v9) traces in tcpdump format or on live interfaces | IP address | partial hiding, random permutation, prefix-preserving permutation, hashing, etc. | Built on AAPI. |
| | | most NetFlow fields | random permutation, hash, hiding, prefix-preserving permutation, etc. | |
| | | NetFlow checksums | updated, etc. | |
| CANINE | Cisco NetFlow (v5 and v7), NFDUMP, CiscoNCSA, ArgusNCSA | IPv4 address | partial hiding, random permutation, prefix-preserving permutation | GUI. Predecessor to FLAIM. |
| | | timestamp | partial hiding, shift, enumeration | |
| | | port number | high/low partitioning, hiding | |
| | | protocol number, byte count | hiding | |
| CoralReef | network interfaces; DAG, FORE, and POINT capture cards; trace files in CoralReef (.crl), tcpdump/pcap, DAG (legacy and ERF), or TSH (.tsh) formats | IPv4 address (including those in ICMP headers) | partial hiding, cryptographic prefix-preserving permutation (using Crypto-PAn) | The CoralReef suite provides many other analysis tools, and C and Perl APIs; all allow anonymization. |
| | | IPv6 header | truncation | |
| | | IPv4, TCP, and UDP checksums | updated | |
| | | headers below IP layer | discard | |
| | | payload of any layer 1-4 | truncation | |
| FLAIM | tcpdump/pcap, netfilter (iptable) syslogs, NFDUMP, Linux process accounting logs, etc. | IPv4/IPv6 addresses | partial hiding, random permutation, cryptographic prefix-preserving permutation, hash, etc. | Scriptable command line tool. Extensible via dynamically loadable modules. Successor to CANINE. |
| | | Ethernet addresses, other numbers | partial hiding, random permutation, hash, etc. | |
| | | various other fields in Ethernet, IP, TCP, UDP, ICMP | partial hiding, partitioning (for numbers), shift (for timestamps), enumeration (for timestamps), hash, truncation, etc. | |
| ipsumdump | tcpdump/pcap, DAG (legacy and ERF), FR, FR+, TSH, ipsumdump (text), NetFlow summary (text), linux network device | IPv4 address (outer header only) | prefix-preserving permutation, class-preserving permutation (based on tcpdpriv) | Outputs only tcpdump (pcap) format or ipsumdump text format |
| | | checksum | updated | |
| | | most Ethernet, IP, TCP, UDP, ICMP fields; payload | discarded | |
| NFDUMP | NetFlow (v5, v7, v9) in NFDUMP format or on live interfaces | IPv4 address | cryptographic prefix-preserving permutation (using Crypto-PAn) | |
| SCRUB-tcpdump | tcpdump/pcap, network interface | IPv4 address | partial hiding, random permutation, subnet/host permutation | |
| | | TCP/UDP ports, TCP sequence number, TCP flags, TTL, packet length, transport protocol | hiding, random permutation, partitioning | |
| | | packet timestamp | partial hiding, enumeration, shift, random permutation | |
| | | fragmentation flag | hiding, random permutation | |
| | | payload | partial hiding | |
| tcpanon | tcpdump/pcap | fields in application layers: HTTP, SMTP, POP3, IMAP4, FTP, FTP-data | hiding | |
| tcpdpriv | tcpdump/pcap, network interface | IPv4 address (including those in nested headers) | permutation, prefix-preserving permutation, class-preserving permutation | |
| | | TCP/UDP port numbers | permutation | |
| | | IP/TCP options | hidden | |
| | | checksum | updated | |
| | | payload | truncation | |
| tcpmkpub | tcpdump/pcap | IPv4 address (differentiable by external, internal, multicast, private, etc.), including those in ICMP headers | cryptographic prefix-preserving permutation (using Crypto-PAn algorithm), subnet-preserving permutation, etc. | IP address algorithm is particularly well suited for edge networks, which are especially vulnerable to signature attacks. Extensible via policy configuration files and C++ functions. |
| | | Ethernet address | vendor-preserving anonymization, etc. | |
| | | checksums | updated, hidden, etc. | |
| | | many other fields in Ethernet, ARP, IP, ICMP, UDP, TCP | hidden, etc. | |
| | | payload | truncation | |
| TCPurify | tcpdump/pcap, network interface | IPv4 address | hiding, random permutation within specified networks | |
| | | payload | truncation | |
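Several of the tools above list checksums as "updated": once an address or other header field has been rewritten, the checksum covering it must be recomputed or the packet will no longer validate. The following is a minimal Python sketch of that step for an IPv4 header, using the standard Internet checksum (one's-complement sum of 16-bit words). The function names are hypothetical, and the sketch assumes the caller passes exactly the header bytes; it does not reflect how any listed tool implements the update.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def rewrite_ipv4_source(header: bytes, new_src: bytes) -> bytes:
    """Replace the source address in an IPv4 header and refresh the checksum."""
    if len(header) < 20 or len(header) % 4 or len(new_src) != 4:
        raise ValueError("expected a full IPv4 header and a 4-byte address")
    patched = bytearray(header)
    patched[12:16] = new_src          # source address field
    patched[10:12] = b"\x00\x00"      # zero the checksum before recomputing
    patched[10:12] = struct.pack("!H", internet_checksum(bytes(patched)))
    return bytes(patched)
```

A header with a correct checksum sums to zero under `internet_checksum`, which gives a quick way to check the rewritten header.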

Libraries

| Library | Lang | Input | Fields | Methods | Notes |
|---|---|---|---|---|---|
| AAPI | C | network interfaces, tcpdump/pcap, and Netflow (v5 and v9) traces in tcpdump format or on live interfaces | IPv4 address | partial hiding, random permutation, prefix-preserving permutation, hashing, etc. | Users may write their own decoders for new protocols. |
| | | | many other header fields in Ethernet, IPv4, TCP, UDP, NetFlow, HTTP, FTP | random permutation, hash, hiding, prefix-preserving permutation, etc. | |
| | | | checksums | updated | |
| Crypto-PAn | C++ | IPv4 address | IPv4 address | cryptographic prefix-preserving permutation | original address can be recovered with a key |
| Lucent's extensions to Crypto-PAn | C++ | IPv4 address | IPv4 address | cryptographic prefix-preserving permutation, random permutation | output contains random permutation; one key can be used to recover a prefix-preserving permutation; two keys can be used to recover original address |
| IP::Anonymous | Perl | IPv4 address | IPv4 address | cryptographic prefix-preserving permutation | Perl port of Crypto-PAn |
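To make the idea of a keyed, reversible prefix-preserving permutation concrete, here is a minimal Python sketch. It is not the Crypto-PAn implementation: Crypto-PAn is built on a block cipher, whereas this sketch substitutes HMAC-SHA256 as the keyed pseudorandom function, and all function names are hypothetical. Each output bit is the input bit XORed with a keyed pseudorandom bit derived from the preceding bits, so two addresses that share an n-bit prefix before anonymization still share an n-bit prefix afterward.

```python
import hashlib
import hmac
import ipaddress


def _flip_bit(key: bytes, length: int, prefix: int) -> int:
    """Keyed pseudorandom bit that depends only on the first `length` bits."""
    msg = bytes([length]) + prefix.to_bytes(4, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[0] & 1


def anonymize(addr: str, key: bytes) -> str:
    """Prefix-preserving mapping of an IPv4 address (illustrative sketch)."""
    original = int(ipaddress.IPv4Address(addr))
    result = 0
    for i in range(32):
        prefix = original >> (32 - i)        # the i most significant bits
        bit = (original >> (31 - i)) & 1     # bit i, counted from the top
        result = (result << 1) | (bit ^ _flip_bit(key, i, prefix))
    return str(ipaddress.IPv4Address(result))


def recover(anon: str, key: bytes) -> str:
    """Invert anonymize() with the same key, one bit at a time."""
    anonymized = int(ipaddress.IPv4Address(anon))
    original = 0                              # holds the bits recovered so far
    for i in range(32):
        bit = ((anonymized >> (31 - i)) & 1) ^ _flip_bit(key, i, original)
        original = (original << 1) | bit
    return str(ipaddress.IPv4Address(original))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    x = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - x.bit_length()
```

With any fixed key, `common_prefix_len(anonymize(a, key), anonymize(b, key))` equals `common_prefix_len(a, b)`, and `recover(anonymize(a, key), key)` returns `a`, which mirrors the note that the original address can be recovered with a key.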

Related Tools

FPGA-based Packet Header Anonymization
http://www.cesnet.cz/doc/techzpravy/2006/anon/
tcprewrite
Part of the tcpreplay suite; primarily intended to rewrite a trace so it can be replayed on a different network.
Bit-Twist
http://bittwist.sourceforge.net/. Primarily intended for generating and rewriting packets for replay.
Netdude
http://netdude.sourceforge.net/. GUI packet editor.
Bro IDS
http://www.bro-ids.org/wiki/index.php/Version_1.2
Anonymizer
http://sourceforge.net/projects/anonymizer. Appears to be Linux-only, incomplete, undocumented, and unmaintained.

  Last Modified: Wed Apr-16-2014 13:31:41 PDT
  Page URL: http://www.caida.org/tools/taxonomy/anonymization.xml