Center for Applied Internet Data Analysis
Anonymization Tools Taxonomy
co-sponsored by:
Cisco Systems
This section provides a listing of tools for performing anonymization of Internet log files and trace data. We provide a summary of each tool along with pointers to more detailed information. Review comments are also included when available.

This listing has not been actively maintained since 2004. These pages are made available for historical purposes.


Researchers studying the Internet face a significant challenge in looking for traffic traces: the fundamental conflict between end-user privacy and the research utility of data. When data is heavily anonymized, important attributes of data that reveal the structure and function of networks are obscured. If data is not heavily anonymized, details about end users, including geographic and network location, organization, names, passwords, and other personal information could be subject to unauthorized access.

We provide the following list to help researchers find anonymization tools, and to better understand and track the availability of such methods.

Terms

hiding
Value is replaced with a constant value (typically 0) of the same size. Sometimes called "black marker".
hash
A hash function maps each value to a new (not necessarily unique) value.
permutation
Maps each original value to a unique new value.
prefix-preserving
Any two values that had the same n-bit prefix before anonymization will still have the same n-bit prefix as each other after anonymization. (Would be more accurately called "prefix-relationship-preserving", because the actual prefix values are not preserved.)
shift
Adds a fixed offset to each value.
enumeration
Maps each original value to a new value such that their ordering is preserved.
partitioning
Possible values are partitioned into meaningful sets; actual values are replaced with a fixed value from the same set. E.g., TCP port numbers 0 to 1023 are replaced with 0, and 1024 to 65535 are replaced with 65535.
updated
Checksums are recalculated to reflect changes made to other fields.
truncation
Field is shortened, losing data at the end.
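
As a rough illustration of several of the terms above, the following Python sketch applies hiding, a hash, shift, enumeration, partitioning, and truncation to individual field values. All function names are hypothetical and are not taken from any tool listed on this page. The use of a keyed HMAC-SHA256 for the hash is an assumption made for the example; the definition above does not require a key or any particular hash function.

```python
import hashlib
import hmac


def hide(value: bytes) -> bytes:
    """Hiding ("black marker"): replace the value with zeros of the same size."""
    return bytes(len(value))


def keyed_hash(value: bytes, key: bytes, size: int = 4) -> bytes:
    """Hash: map a value to a new, not necessarily unique, value.

    Assumption: a keyed HMAC-SHA256 cut down to `size` bytes, so distinct
    inputs may collide.
    """
    return hmac.new(key, value, hashlib.sha256).digest()[:size]


def shift(value: int, offset: int) -> int:
    """Shift: add a fixed offset to the value (e.g. a timestamp)."""
    return value + offset


def enumerate_values(values: list[int]) -> list[int]:
    """Enumeration: replace each value with its rank, preserving ordering."""
    ranks = {v: i for i, v in enumerate(sorted(set(values)))}
    return [ranks[v] for v in values]


def partition_port(port: int) -> int:
    """Partitioning: ports 0-1023 become 0, ports 1024-65535 become 65535."""
    return 0 if port <= 1023 else 65535


def truncate(field: bytes, keep: int) -> bytes:
    """Truncation: keep only the first `keep` bytes of the field."""
    return field[:keep]


if __name__ == "__main__":
    print(hide(b"\xc0\xa8\x01\x0a"))
    print(shift(1_000_000, 3600))
    print(enumerate_values([30, 10, 20, 10]))
    print(partition_port(80), partition_port(8080))
    print(truncate(b"GET /index.html", 3))
```

Permutation and prefix-preserving methods need state or a key shared across all values in a trace, so they are left out of this per-value sketch.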

Tools

Tool | Input | Fields | Methods | Notes
AnonTool | NetFlow (v5 and v9) traces in tcpdump format or on live interfaces | IP address | partial hiding, random permutation, prefix-preserving permutation, hashing, etc. | Built on AAPI.
 |  | most NetFlow fields | random permutation, hash, hiding, prefix-preserving permutation, etc. |
 |  | NetFlow checksums | updated, etc. |
CANINE | Cisco NetFlow (v5 and v7), NFDUMP, CiscoNCSA, ArgusNCSA | IPv4 address | partial hiding, random permutation, prefix-preserving permutation | GUI. Predecessor to FLAIM.
 |  | timestamp | partial hiding, shift, enumeration |
 |  | port number | high/low partitioning, hiding |
 |  | protocol number, byte count | hiding |
CoralReef | network interfaces; DAG, FORE, and POINT capture cards; trace files in CoralReef (.crl), tcpdump/pcap, DAG (legacy and ERF), or TSH (.tsh) formats | IPv4 address (including those in ICMP headers) | partial hiding, cryptographic prefix-preserving permutation (using Crypto-PAn) | The CoralReef suite provides many other analysis tools, and C and Perl APIs; all allow anonymization.
 |  | IPv6 header | truncation |
 |  | IPv4, TCP, and UDP checksums | updated |
 |  | headers below IP layer | discarded |
 |  | payload of any layer 1-4 | truncation |
FLAIM | tcpdump/pcap, netfilter (iptable) syslogs, NFDUMP, Linux process accounting logs, etc. | IPv4/IPv6 addresses | partial hiding, random permutation, cryptographic prefix-preserving permutation, hash, etc. | Scriptable command line tool. Extensible via dynamically loadable modules. Successor to CANINE.
 |  | Ethernet addresses, other numbers | partial hiding, random permutation, hash, etc. |
 |  | various other fields in Ethernet, IP, TCP, UDP, ICMP | partial hiding, partitioning (for numbers), shift (for timestamps), enumeration (for timestamps), hash, truncation, etc. |
ipsumdump | tcpdump/pcap, DAG (legacy and ERF), FR, FR+, TSH, ipsumdump (text), NetFlow summary (text), Linux network device | IPv4 address (outer header only) | prefix-preserving permutation, class-preserving permutation (based on tcpdpriv) | Outputs only tcpdump (pcap) format or ipsumdump text format.
 |  | checksum | updated |
 |  | most Ethernet, IP, TCP, UDP, ICMP fields; payload | discarded |
NFDUMP | NetFlow (v5, v7, v9) in NFDUMP format or on live interfaces | IPv4 address | cryptographic prefix-preserving permutation (using Crypto-PAn) |
SCRUB-tcpdump | tcpdump/pcap, network interface | IPv4 address | partial hiding, random permutation, subnet/host permutation |
 |  | TCP/UDP ports, TCP sequence number, TCP flags, TTL, packet length, transport protocol | hiding, random permutation, partitioning |
 |  | packet timestamp | partial hiding, enumeration, shift, random permutation |
 |  | fragmentation flag | hiding, random permutation |
 |  | payload | partial hiding |
tcpanon | tcpdump/pcap | fields in application layers: HTTP, SMTP, POP3, IMAP4, FTP, FTP-data | hiding |
tcpdpriv | tcpdump/pcap, network interface | IPv4 address (including those in nested headers) | permutation, prefix-preserving permutation, class-preserving permutation |
 |  | TCP/UDP port numbers | permutation |
 |  | IP/TCP options | hidden |
 |  | checksum | updated |
 |  | payload | truncation |
tcpmkpub | tcpdump/pcap | IPv4 address (differentiable by external, internal, multicast, private, etc.), including those in ICMP headers | cryptographic prefix-preserving permutation (using Crypto-PAn algorithm), subnet-preserving permutation, etc. | IP address algorithm is particularly well suited for edge networks, which are especially vulnerable to signature attacks. Extensible via policy configuration files and C++ functions.
 |  | Ethernet address | vendor-preserving anonymization, etc. |
 |  | checksums | updated, hidden, etc. |
 |  | many other fields in Ethernet, ARP, IP, ICMP, UDP, TCP | hidden, etc. |
 |  | payload | truncation |
TCPurify | tcpdump/pcap, network interface | IPv4 address | hiding, random permutation within specified networks |
 |  | payload | truncation |
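
Several tools above list checksums as "updated". As a minimal sketch of what that step involves for IPv4, the Python below rewrites the source address in a raw header and recomputes the header checksum using the standard 16-bit ones' complement sum. The function names are hypothetical, the replacement address is arbitrary, and none of this is taken from the source code of any listed tool. It assumes an unfragmented view of a single raw IPv4 header and does not touch TCP or UDP checksums, which also cover the addresses through the pseudo-header.

```python
import struct


def ipv4_header_checksum(header: bytes) -> int:
    """Compute the IPv4 header checksum, treating bytes 10-11 as zero."""
    data = header[:10] + b"\x00\x00" + header[12:]
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold carries back into the low 16 bits (ones' complement addition).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def rewrite_source(header: bytes, new_src: bytes) -> bytes:
    """Replace the source address (bytes 12-15) and update the checksum."""
    patched = bytearray(header)
    patched[12:16] = new_src
    patched[10:12] = struct.pack("!H", ipv4_header_checksum(bytes(patched)))
    return bytes(patched)


if __name__ == "__main__":
    # A 20-byte header with source 192.168.0.1 and destination 192.168.0.199.
    original = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
    anonymized = rewrite_source(original, bytes([10, 0, 0, 1]))
    print(anonymized.hex())
```

Without the recalculation, the anonymized packet would carry a checksum that no longer matches its header, which some analysis tools treat as a corrupt packet.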

Libraries

Library | Lang | Input | Fields | Methods | Notes
AAPI | C | network interfaces, tcpdump/pcap, and NetFlow (v5 and v9) traces in tcpdump format or on live interfaces | IPv4 address | partial hiding, random permutation, prefix-preserving permutation, hashing, etc. | Users may write their own decoders for new protocols.
 |  |  | many other header fields in Ethernet, IPv4, TCP, UDP, NetFlow, HTTP, FTP | random permutation, hash, hiding, prefix-preserving permutation, etc. |
 |  |  | checksums | updated |
Crypto-PAn | C++ | IPv4 address | IPv4 address | cryptographic prefix-preserving permutation | Original address can be recovered with a key.
Lucent's extensions to Crypto-PAn | C++ | IPv4 address | IPv4 address | cryptographic prefix-preserving permutation, random permutation | Output contains random permutation; one key can be used to recover a prefix-preserving permutation; two keys can be used to recover original address.
IP::Anonymous | Perl | IPv4 address | IPv4 address | cryptographic prefix-preserving permutation | Perl port of Crypto-PAn.
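
To make the prefix-preserving property concrete, here is a toy Python sketch of a keyed prefix-preserving permutation on IPv4 addresses. Each output bit is the input bit XORed with one pseudorandom bit derived from the key and the preceding input bits, so two addresses that share an n-bit prefix still share an n-bit prefix afterwards. This is not Crypto-PAn and will not reproduce its output: HMAC-SHA256 is used here as a stand-in keyed function purely as an assumption for illustration, and the function names are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(address: str, key: bytes) -> str:
    """Toy prefix-preserving permutation (illustrative, not Crypto-PAn)."""
    original = int(ipaddress.IPv4Address(address))
    result = 0
    for i in range(32):
        # The first i bits of the original address (0 when i == 0).
        prefix = original >> (32 - i)
        # Keyed pseudorandom bit that depends only on that prefix.
        msg = i.to_bytes(1, "big") + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (original >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(result))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two IPv4 addresses."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


if __name__ == "__main__":
    key = b"example key, not for real use"
    a, b = "192.168.1.10", "192.168.1.200"
    anon_a, anon_b = anonymize_ipv4(a, key), anonymize_ipv4(b, key)
    print(common_prefix_len(a, b), common_prefix_len(anon_a, anon_b))
```

Because each flipped bit depends only on the key and the earlier original bits, a holder of the key can undo the mapping one bit at a time, which is consistent with the note above that the original address can be recovered with a key.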

Related Tools

FPGA-based Packet Header Anonymization
http://www.cesnet.cz/doc/techzpravy/2006/anon/
tcprewrite
Part of the tcpreplay suite; primarily intended to rewrite a trace so it can be replayed on a different network.
Bit-Twist
http://bittwist.sourceforge.net/. Primarily intended for generating and rewriting packets for replay.
Netdude
http://netdude.sourceforge.net/. GUI packet editor.
Bro IDS
http://www.bro-ids.org/wiki/index.php/Version_1.2
Anonymizer
http://sourceforge.net/projects/anonymizer. Appears to be linux-only, incomplete, undocumented, and unmaintained.

  Last Modified: Wed Apr-16-2014 13:31:41 PDT
  Page URL: http://www.caida.org/tools/taxonomy/anonymization.xml