Table of Contents
- Traceroute Format
- Traceroute Annotations
- Hop Addresses (hop-addrs)
- IP Links (ip-links)
- IP Paths (ip-paths)
- IP-RTT Distributions (ip-rtts)
- AS Peering Links (peering-links)
- Router-Level Graph (router-graph)
- Caveat: Uniqueness in JSON Format
Traceroute Format
The traceroute file format consists of a sequence of newline-terminated JSON objects, each representing a single traceroute path. This format is based on the scamper JSON traceroute format but with changes to the top-level keys of each JSON object.
Sample
{ "stop_reason": "COMPLETED", "stop_data": 0, "timestamp": 1578141145, "timestamp_usec": 247633, "src_addr": "192.42.115.98", "dest_addr": "119.218.204.32", "hops": [ { "addr": "192.42.115.1", "probe_ttl": 1, "probe_id": 1, "probe_size": 44, "tx": { "sec": 1578141145, "usec": 247649 }, "rtt": 0.299, "reply_ttl": 255, "reply_tos": 0, "reply_size": 76, "reply_ipid": 0, "icmp_type": 11, "icmp_code": 0, "icmp_q_ttl": 1, "icmp_q_ipl": 44, "icmp_q_tos": 0 }, { "addr": "145.145.17.88", "probe_ttl": 2, "probe_id": 1, "...": "... more keys ..." }, { "...": "... more intermediate hops ..." }, { "addr": "119.218.204.32", "probe_ttl": 19, "probe_id": 1, "...": "... more keys ..." } ], "vp_name": "ams-nl", "dest_rtt_ms": 272.53, "path_len": 19, "hop_addrs": [ "64.71.165.22", "80.249.208.50", "192.42.115.1", "145.145.17.88", "... more addrs ..." ] }
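Because each line of the file holds exactly one JSON object, the file can be read one line at a time with a standard JSON parser. The following is a minimal sketch under that assumption. The function names read_traces and count_stop_reasons are hypothetical, and the example tallies the stop_reason key described below.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def read_traces(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one traceroute object per non-blank line of newline-terminated JSON."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


def count_stop_reasons(lines: Iterable[str]) -> Counter:
    """Tally the stop_reason value of every traceroute in the input."""
    return Counter(trace["stop_reason"] for trace in read_traces(lines))
```

For example, with open("traces.jsonl") as f: print(count_stop_reasons(f)), where traces.jsonl is a hypothetical file of FANTAIL query results. Reading line by line avoids loading a large result file into memory at once.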
Description
Each traceroute object contains the following keys:
- vp_name
The name/ID of the vantage point conducting the traceroute.
"vp_name": "ams-nl"
- src_addr and dest_addr
The source and destination addresses of the traceroute. You should not use the source address as an identifier because (1) the source address may be a non-unique private address (RFC 1918) and (2) there is no guarantee that a vantage point will maintain the same address (public or private) over time.
"src_addr": "192.42.115.98", "dest_addr": "119.218.204.32"
- timestamp and timestamp_usec
Timestamp at the start of the traceroute. The timestamp value gives the traditional Unix timestamp in seconds and timestamp_usec gives subsecond precision in microseconds; that is, the full timestamp is equal to timestamp + timestamp_usec / 1_000_000.
"timestamp": 1578141145, "timestamp_usec": 247633
- stop_reason and stop_data
The reason why the traceroute measurement stopped and, where applicable, additional data about the cause.
"stop_reason": "COMPLETED", "stop_data": 0
Common stop_reason values are:
- COMPLETED: the traceroute finished successfully by getting a response from the destination;
- GAPLIMIT: the traceroute stopped after 5 successive non-responding hops despite multiple attempts per non-responding hop (this often means that a firewall is blocking probes or that there’s no host at the destination address);
- UNREACH: the traceroute stopped on an ICMP destination unreachable message; and
- LOOP: the traceroute stopped because it detected a loop; that is, an address seen earlier in the traceroute path was seen again, except when repeated in adjacent hops (with some caveats).
Please see the scamper documentation for more details.
- dest_rtt_ms
The RTT to the destination in milliseconds, if the destination responded.
"dest_rtt_ms": 272.53
- path_len
The highest TTL value at which the traceroute received a response. For example, a path_len of 1 means there was a response at TTL 1, the first hop from the source. A given path_len value doesn’t necessarily mean that there were responses at all intervening hops.
"path_len": 19
- hop_addrs
An array of the unique responding addresses (including the destination if it responded) in the traceroute path. The addresses are not in any guaranteed or meaningful order.
"hop_addrs": ["64.71.165.22", "..."]
- hops
An array of the responding hops, including the destination if it responded. No information is included in this array for hops without responses. Each hop object has the same definition as in the scamper JSON traceroute format, with the possible addition of a few FANTAIL-specific keys (see [Traceroute Annotations]) depending on FANTAIL query options. The following sample shows the most commonly used keys:
"hops": [
  {
    "addr": "192.42.115.1",
    "probe_ttl": 1,
    "probe_id": 1,
    "tx": { "sec": 1578141145, "usec": 247649 },
    "rtt": 0.299,
    ...  # more keys
  },
  ...  # more objects
]
- addr: the address of the responding hop,
- probe_ttl: the hop number,
- probe_id: the per-hop 1-based index of the probe packet that elicited a response, with the index increasing with each attempt,
- tx: the timestamp of the probe packet that elicited a response, and
- rtt: the RTT of the response in milliseconds (e.g., 0.299 is less than 1 ms).
The hops array is ordered first by probe_ttl and then by probe_id at the same probe_ttl.
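The key definitions above translate into a few small helpers. This is a sketch, not part of FANTAIL: the function names are hypothetical, and addrs_by_ttl relies on the documented ordering of the hops array (by probe_ttl, then probe_id) so that all responses at one TTL are adjacent. nonresponding_ttls illustrates why path_len alone says nothing about gaps.

```python
from itertools import groupby


def full_timestamp(trace: dict) -> float:
    """Start time in seconds: timestamp + timestamp_usec / 1_000_000."""
    return trace["timestamp"] + trace["timestamp_usec"] / 1_000_000


def addrs_by_ttl(trace: dict) -> dict:
    """Map each responding probe_ttl to its distinct responding addresses.

    groupby() only merges adjacent items, which is correct here because the
    hops array is ordered by probe_ttl and then by probe_id.
    """
    result = {}
    for ttl, hops in groupby(trace["hops"], key=lambda h: h["probe_ttl"]):
        addrs = []
        for hop in hops:
            if hop["addr"] not in addrs:  # several probes may elicit the same address
                addrs.append(hop["addr"])
        result[ttl] = addrs
    return result


def nonresponding_ttls(trace: dict) -> list:
    """TTLs from 1 to path_len that have no entry in the hops array."""
    responding = {hop["probe_ttl"] for hop in trace["hops"]}
    return [ttl for ttl in range(1, trace["path_len"] + 1) if ttl not in responding]
```

The same sec/usec arithmetic applies to the tx object of each hop.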
Traceroute Annotations
FANTAIL can annotate hop addresses using several CAIDA datasets, as follows:
- aliases from ITDK MIDAR aliases,
- ASes from ITDK bdrmapIT,
- hostnames from the IPv4 Routed /24 DNS Names Dataset, and
- IXPs from the CAIDA Internet eXchange Points (IXPs) Dataset.
An annotation appears as an extra FANTAIL-specific key of a hop object in the hops array.
FANTAIL uses the start timestamp of the traceroute to match a hop address to the closest relevant record of the annotation source. For annotation datasets provided as periodic releases, such as the ITDK, FANTAIL first finds the dataset release closest to the traceroute timestamp and then finds the record matching the hop address within that release. For datasets collected continuously, such as the DNS names dataset, FANTAIL finds the individual matching record with the timestamp closest to the traceroute timestamp; for example, the DNS names dataset may have many lookups over time for a given hop address, but FANTAIL will select the hostname result collected closest in time to the traceroute timestamp.
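For periodic releases, the "closest release" rule reduces to a nearest-neighbor lookup over sorted release timestamps. The sketch below illustrates the rule as described and is not FANTAIL's implementation. The function name closest_release is hypothetical, and breaking ties toward the earlier release is an arbitrary choice.

```python
import bisect


def closest_release(release_timestamps: list, trace_ts: float) -> int:
    """Return the release timestamp nearest to trace_ts.

    release_timestamps must be non-empty and sorted ascending; on a tie the
    earlier release wins (an assumption of this sketch).
    """
    i = bisect.bisect_left(release_timestamps, trace_ts)
    # Only the neighbors on either side of the insertion point can be nearest.
    candidates = release_timestamps[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda ts: abs(ts - trace_ts))
```

Using the ITDK timestamps that appear on this page, closest_release([1547164800, 1554076800, 1578528000], 1578141145) selects 1578528000, the release a few days after the sample traceroute.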
All annotation objects have an id
field that provides details, where
available, about the matching dataset, timestamp, and record ID so that
users may, for example, determine how close in time the matching annotation
is to the traceroute (if it matters that the annotation be timely) and
consult the annotation source for additional details (where available).
Samples
{ "hops": [ { "addr": "...", "...": "... more keys ...", "alias_set": { "id": "itdk-20190111-midar@1547164800:38885", "addrs": [ "145.145.17.89", "145.145.17.91", "192.42.115.1" ] }, "asnum": { "id": "itdk-201904@1554076800", "value": "1103" }, "hostnames": { "id": "dns-names@1584370551", "values": [ "forbidden-fruit.prutsnet.nl" ] }, "ixp": { "id": "ixps-201802@1517443200:235", "prefix": "206.126.236.0/22", "name": "Equinix IBX Ashburn" } } ] }
Aliases
The alias_set
annotation provides a list of the IP addresses belonging to
the ITDK MIDAR alias set containing the hop address. The id
value has
the following form
<dataset_name>@<timestamp>:<set_id>
where dataset_name is the name of the ITDK release, timestamp is the timestamp of the dataset (alias sets don’t have individual timestamps), and set_id is the ID of the alias set internal to the dataset release (and thus not meaningful across releases).
"alias_set": {
"id": "itdk-20190111-midar@1547164800:38885",
"addrs": [
"145.145.17.89",
"145.145.17.91",
"192.42.115.1"
]
}
The addrs
value is an array of the addresses in the alias set, including
the hop address.
ASes
The asnum
annotation gives the AS number for the AS owning the router
with the given hop address, according to the ITDK bdrmapIT dataset. The
id
value has the following form
<dataset_name>@<timestamp>
where dataset_name is the name of the ITDK release and timestamp is the timestamp of the dataset (records don’t have individual timestamps).
"asnum": {
"id": "itdk-201904@1554076800",
"value": "1103"
}
The value
key gives the AS number as a string because the value may be a
32-bit AS number in dotted notation.
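If you need the AS number as an integer, a dotted value can be converted using the asdot convention of RFC 5396, where "X.Y" stands for X * 65536 + Y. This is a hedged sketch: asn_to_int is a hypothetical helper, and it assumes every value is either a plain decimal or a two-part dotted string.

```python
def asn_to_int(value: str) -> int:
    """Convert an asnum value ("1103" or dotted "1.10") to an integer."""
    if "." in value:
        high, low = (int(part) for part in value.split(".", 1))
        return high * 65536 + low  # asdot: high 16 bits, then low 16 bits
    return int(value)
```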
Hostnames
The hostnames
annotation gives the hostname(s) of the hop address
according to the IPv4 Routed /24 DNS Names Dataset. The id
value has the
following form
dns-names@<trace_timestamp>
where trace_timestamp is the timestamp of the traceroute rather than of the hostname records because the latter timestamps aren’t available to FANTAIL.
"hostnames": {
"id": "dns-names@1584370551",
"values": [
"forbidden-fruit.prutsnet.nl"
]
}
The values
array contains one or more hostnames.
IXPs
The ixp
annotation gives information about the Internet exchange point
that announces the hop address, according to the CAIDA Internet eXchange
Points (IXPs) Dataset. The id
value has the following form
<dataset_name>@<timestamp>:<ix_id>
where dataset_name is the name of the dataset release, timestamp is the timestamp of the dataset (individual records don’t have timestamps), and ix_id is the ID of the IXP record internal to the dataset release (and thus not meaningful across releases).
"ixp": {
"id": "ixps-201802@1517443200:235",
"prefix": "206.126.236.0/22",
"name": "Equinix IBX Ashburn"
}
The prefix
value is the prefix containing the hop address, and name
gives the name of the IXP.
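All annotation id values follow one of two forms: <dataset_name>@<timestamp> or <dataset_name>@<timestamp>:<record_id>. The sketch below parses them so you can, for example, measure how far an annotation is from the traceroute in time. The names AnnotationId, parse_annotation_id, and annotation_age are hypothetical. It assumes dataset names never contain "@". For dns-names ids, the timestamp is the traceroute's own timestamp, so the computed age is always 0 and not meaningful.

```python
from typing import NamedTuple, Optional


class AnnotationId(NamedTuple):
    dataset: str
    timestamp: int
    record_id: Optional[str]  # None for ids of the form <dataset_name>@<timestamp>


def parse_annotation_id(id_value: str) -> AnnotationId:
    dataset, _, rest = id_value.rpartition("@")
    ts, _, record = rest.partition(":")
    return AnnotationId(dataset, int(ts), record or None)


def annotation_age(id_value: str, trace_ts: int) -> int:
    """Seconds between the annotation dataset timestamp and the traceroute start."""
    return abs(trace_ts - parse_annotation_id(id_value).timestamp)
```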
Hop Addresses (hop-addrs)
The hop-addrs
data transformation extracts unique IP addresses, including
the destination if it responded, from traceroute paths. The output doesn’t
include the address of the vantage point.
The text output has one address, in dotted-decimal format, per line. There is no JSON output.
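Since each traceroute object already carries a hop_addrs array of its unique responding addresses, output equivalent to hop-addrs can be approximated by taking the union of those arrays. This is a sketch, not FANTAIL's code. hop_addrs_text is a hypothetical name, and the numeric sort order is an assumption because the output order isn't specified.

```python
import ipaddress


def hop_addrs_text(traces) -> str:
    """One unique address per line, sorted numerically (an assumed order)."""
    unique = set()
    for trace in traces:
        unique.update(trace["hop_addrs"])
    ordered = sorted(unique, key=ipaddress.ip_address)  # numeric, not lexicographic
    return "".join(addr + "\n" for addr in ordered)
```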
IP Links (ip-links)
The ip-links
data transformation extracts unique IP links, direct and
indirect, from traceroute paths. A direct IP link is a link between two
adjacent responding hops, including between the last intermediate hop and
the destination. An indirect IP link is a link between two responding
hops (or hop and destination) separated by one or more non-responding hops.
Text Format
Text output contains one link per line in the following format:
<link> <count>
where <link> is
- A=B – for adjacent addresses A and B, and
- A-n-B – for addresses A and B separated by n non-responding hops;
and <count> is the number of traceroute paths where the link appeared.
If a trace has a responding destination, then the destination address will appear prefixed with “D” in a link.
Sample output:
101.95.120.149=101.95.95.69 1
112.174.172.6=D175.242.126.233 5
112.174.83.117-2-112.189.127.98 7
123.56.34.25-3-D182.92.208.122 4
The first link is a direct link between two intermediate hops. The second is a direct link between a hop and the destination. The third is an indirect link between two hops separated by 2 non-responding hops. The fourth is an indirect link between a hop and the destination separated by 3 non-responding hops.
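The link definitions above suggest a straightforward way to derive text-format links from the hops array: pair the addresses at each pair of consecutive responding TTLs and count the silent TTLs between them. This is a sketch under stated assumptions, not FANTAIL's implementation. The function names are hypothetical. A hop is treated as the destination when its address equals dest_addr. When several addresses respond at one TTL, every pairing is emitted. Each trace contributes at most once to a link's count.

```python
from collections import Counter


def _by_ttl(hops):
    """Distinct responding addresses per probe_ttl, in hops order."""
    by_ttl = {}
    for hop in hops:
        addrs = by_ttl.setdefault(hop["probe_ttl"], [])
        if hop["addr"] not in addrs:
            addrs.append(hop["addr"])
    return by_ttl


def trace_links(trace: dict) -> set:
    """Text-format links (A=B or A-n-B) between consecutive responding TTLs."""
    by_ttl = _by_ttl(trace["hops"])
    ttls = sorted(by_ttl)
    links = set()
    for near, far in zip(ttls, ttls[1:]):
        gap = far - near - 1  # number of non-responding hops in between
        sep = "=" if gap == 0 else f"-{gap}-"
        for a in by_ttl[near]:
            for b in by_ttl[far]:
                label = "D" + b if b == trace["dest_addr"] else b
                links.add(f"{a}{sep}{label}")
    return links


def ip_links_text(traces) -> list:
    counts = Counter()
    for trace in traces:
        counts.update(trace_links(trace))  # a set, so each trace counts once
    return [f"{link} {n}" for link, n in sorted(counts.items())]
```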
JSON Format
There is a straightforward mapping between the text and JSON IP-links formats, as illustrated in the following example with the same IP link shown in both formats:
123.56.34.25-3-D182.92.208.122 4
{ "count": 4, "gap": 3, "src": { "addr": "123.56.34.25", "is_dest": false, "annotations": {} }, "dest": { "addr": "182.92.208.122", "is_dest": true, "annotations": {} } }
The gap key gives the number of non-responding hops separating the source (src) and destination (dest) addresses of the IP link. So if gap is zero, then the link is direct.
The is_dest key indicates whether the given link endpoint (src/dest) is a responding destination (only the dest endpoint can have a true is_dest, but we include the key in src for completeness and ease of use).
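To make the text-to-JSON mapping concrete, the sketch below parses one text-format line into an object with the JSON keys described above. It is illustrative only. parse_link_line is a hypothetical name, the regular expression assumes addresses contain no "=", "-", or whitespace (true for dotted-decimal IPv4), and annotations is always empty because the text format carries none.

```python
import re

# src, then "=" or "-n-", then an optional "D" marking a responding destination, dest, count
LINK_RE = re.compile(r"^([^\s=-]+)(?:=|-(\d+)-)(D?)([^\s=-]+)\s+(\d+)$")


def _endpoint(addr: str, is_dest: bool) -> dict:
    return {"addr": addr, "is_dest": is_dest, "annotations": {}}


def parse_link_line(line: str) -> dict:
    m = LINK_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not an ip-links text line: {line!r}")
    src, gap, dest_flag, dest, count = m.groups()
    return {
        "count": int(count),
        "gap": int(gap) if gap is not None else 0,  # "=" means a direct link
        "src": _endpoint(src, False),
        "dest": _endpoint(dest, dest_flag == "D"),
    }
```

Applied to the sample line 123.56.34.25-3-D182.92.208.122 4, this yields the same object as the JSON sample above.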
The annotations key contains any annotations that a given hop address had in the traceroute path(s) that produced the IP link (see [Traceroute Annotations]); for example:
{ "src": { "annotations": { "alias_set": { "id": "itdk-20200109-midar@1578528000:43504", "addrs": [ "1.255.72.227", "218.145.43.10" ] }, "asnum": { "id": "itdk-201904@1554076800", "value": "111" } }, "...": "..." }, "...": "..." }
See Caveat: Uniqueness in JSON Format.
IP Paths (ip-paths)
The ip-paths
data transformation extracts unique IP paths from traceroute
paths and outputs them in a highly condensed text format.
Text output contains one path per line in the following format:
<path> <count>
where <path> is a sequence of hops separated by “=” or “-n-”, in the same way as in the [IP Links (ip-links)] text format, and <count> is the number of source traceroute paths where the unique IP path appeared. The <path> does not contain the address of the vantage point.
If a trace has a responding destination, then the destination address will appear prefixed with “D” in the path. If more than one address responded at a hop, then all hop addresses are listed separated with commas in the output.
Sample output (normally a single line that we’ve wrapped for convenient display):
129.122.31.254=129.122.0.5=41.189.172.33,196.201.62.220=
196.201.62.221=212.187.195.161-2-4.68.72.254=D109.27.101.45 1
represents the following path
129.122.31.254
129.122.0.5
41.189.172.33,196.201.62.220 (two addresses at this hop)
196.201.62.221
212.187.195.161
*
*
4.68.72.254
109.27.101.45 (responding destination)
where “*” indicates a non-responding hop.
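The condensed path string can be rebuilt from a traceroute object by grouping the hops by TTL, joining same-TTL addresses with commas, and choosing "=" or "-n-" between consecutive responding TTLs. This is a sketch, not FANTAIL's code. The function names are hypothetical. It assumes the path starts at the first responding TTL, that a hop is the destination when its address equals dest_addr, and that same-TTL addresses keep their order of appearance.

```python
from collections import Counter


def ip_path(trace: dict) -> str:
    """Condensed path: '=' joins adjacent TTLs, '-n-' marks n silent TTLs."""
    by_ttl = {}
    for hop in trace["hops"]:  # hops are ordered by probe_ttl, then probe_id
        addr = hop["addr"]
        if addr == trace["dest_addr"]:
            addr = "D" + addr  # responding destination
        addrs = by_ttl.setdefault(hop["probe_ttl"], [])
        if addr not in addrs:
            addrs.append(addr)
    ttls = sorted(by_ttl)
    parts = [",".join(by_ttl[ttls[0]])] if ttls else []
    for near, far in zip(ttls, ttls[1:]):
        gap = far - near - 1
        parts.append("=" if gap == 0 else f"-{gap}-")
        parts.append(",".join(by_ttl[far]))
    return "".join(parts)


def ip_paths_text(traces) -> list:
    counts = Counter(ip_path(t) for t in traces if t["hops"])
    return [f"{path} {n}" for path, n in sorted(counts.items())]
```

Fed a traceroute with responses at TTLs 1 to 5, 8, and 9 matching the sample, ip_path reproduces the sample line (unwrapped).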
IP-RTT Distributions (ip-rtts)
The ip-rtts
data transformation calculates the min, max, average,
standard deviation, and percentiles (25th, 50th, 75th, and 95th) of RTTs
for each hop or responding destination per vantage point.
Text output contains one set of statistics per pair of (vantage point, IP address) per line in the following format:
<vp=addr> <count> <min> <max> <avg> <stddev> <25th> <50th> <75th> <95th>
where vp is the name/ID of the vantage point, addr is the hop or responding destination address, <count> is the number of RTT samples for the given <vp=addr>, and the remaining fields are the RTT statistics. If <count> is 1, then we set <stddev> and all percentile values to 0.0, since these computations require at least two values.
Sample output:
ams-nl=145.145.17.88 500 0.216 82.84 3.096 10.553 0.307 0.374 0.501 27.143
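The per-(vantage point, address) statistics can be sketched with Python's statistics module. This document doesn't specify how FANTAIL computes the standard deviation or percentiles. The sketch assumes the sample standard deviation and linearly interpolated percentiles (method="inclusive"), so its values may differ slightly from FANTAIL's. rtt_stats is a hypothetical name, and rtts is the list of RTT samples in milliseconds for one pair.

```python
import statistics


def rtt_stats(rtts: list) -> tuple:
    """(count, min, max, avg, stddev, p25, p50, p75, p95) for one (vp, addr) pair."""
    count = len(rtts)
    base = (count, min(rtts), max(rtts), statistics.fmean(rtts))
    if count == 1:
        return base + (0.0,) * 5  # stddev and percentiles need at least two samples
    # 99 cut points; q[i - 1] is the i-th percentile.
    q = statistics.quantiles(rtts, n=100, method="inclusive")
    return base + (statistics.stdev(rtts), q[24], q[49], q[74], q[94])
```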
AS Peering Links (peering-links)
The peering-links
data transformation examines IP links with AS number
annotations (see [Traceroute Annotations]), as produced by the ip-links
transformation, and extracts unique AS peering links, which are IP links
between two routers with different AS ownership.
Because peering links are IP links, the peering-links JSON output format (the only supported format) is a superset of the IP-links format (see [IP Links (ip-links)]) and includes two additional keys, type and middle_hop.
The type key indicates the type of the peering link and has the following values:
- “direct_peering” is a peering link between adjacent hops,
- “missing_middle” is a potential peering link between two hops separated by one or more non-responding hops (specified by gap), and
- “unidentified_middle” is a potential peering link between two hops separated by a single responding hop lacking an AS annotation (and thus no determination could be made).
In the case of “unidentified_middle”, the peering-link object will also have the middle_hop key, which has the same structure as src and dest and describes the hop separating src and dest.
The above type
values don’t exhaustively cover all possible combinations
of IP links over which a peering link could conceivably be inferred; for
example, two AS-annotated hops separated by both non-responding hops and
non-AS annotated hops. If such cases are important to you, then you should
perform additional processing and inferences yourself over the input
IP-links data.
We never try to infer a peering link from an IP link involving a responding destination because such a link doesn’t necessarily represent a physical link between two routers. Hence, the value of is_dest in src, dest, and middle_hop will always be false in peering-link objects.
The value of gap
will be non-zero for “missing_middle” and zero otherwise
(the hop of the “unidentified_middle” type isn’t a non-responding hop).
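For a single ip-links JSON object, the "direct_peering" and "missing_middle" cases can be sketched as follows. This is an approximation of the rules above, not FANTAIL's logic. peering_type is a hypothetical name, and the sketch assumes both types require two different AS numbers. "unidentified_middle" is omitted because it requires combining two adjacent direct links that share an un-annotated middle hop.

```python
from typing import Optional


def _asn(endpoint: dict) -> Optional[str]:
    asnum = endpoint.get("annotations", {}).get("asnum")
    return asnum["value"] if asnum else None


def peering_type(link: dict) -> Optional[str]:
    """Return "direct_peering", "missing_middle", or None for one IP link."""
    src, dest = link["src"], link["dest"]
    if src["is_dest"] or dest["is_dest"]:
        return None  # links involving a responding destination are never used
    a, b = _asn(src), _asn(dest)
    if a is None or b is None or a == b:
        return None  # need two AS-annotated hops owned by different ASes
    return "direct_peering" if link["gap"] == 0 else "missing_middle"
```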
Sample output:
{ "count": 1, "type": "unidentified_middle", "src": { "addr": "145.145.17.88", "is_dest": false, "annotations": { "asnum": { "id": "itdk-201904@1554076800", "value": "1103" }, "hostnames": { "id": "dns-names@1493979030", "values": [ "ae5.2216.jnr01.asd002a.surf.net" ] } } }, "gap": 0, "middle_hop": { "addr": "145.145.166.86", "is_dest": false, "annotations": { "hostnames": { "id": "dns-names@1493979030", "values": [ "google-router2.peering.surf.net" ] } } }, "dest": { "addr": "108.170.241.204", "is_dest": false, "annotations": { "asnum": { "id": "itdk-201904@1554076800", "value": "15169" } } } }
See Caveat: Uniqueness in JSON Format.
Router-Level Graph (router-graph)
The router-graph
data transformation examines IP links with alias set
annotations (see [Traceroute Annotations]), as produced by the ip-links
transformation, and extracts unique router links, which represent links
between two alias sets (rather than between two interface addresses as in
IP links).
An IP link is converted to a router link by replacing the src
and dest
endpoint IP addresses with their corresponding alias set ID (that is, the
id
value of the alias_set
annotation), which acts as the router ID. If
an IP link endpoint doesn’t have an alias set annotation, then the endpoint
IP address itself serves as the router ID and represents a router with just
one (known) interface. Hence, every input IP link becomes a router link
(possibly a duplicate router link), and in the degenerate case of no input
IP links having alias set annotations, each unique IP link becomes its own
unique router link.
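The conversion rule above (alias set id if present, otherwise the endpoint address) can be sketched as follows. This is not FANTAIL's implementation. The function names are hypothetical. Two behaviors are assumptions: duplicate router links sum the counts of their input IP links, and the first annotations seen for a router are kept. As in the documented output, all "router" objects come before "router_link" objects.

```python
from collections import Counter


def router_id(endpoint: dict) -> str:
    alias_set = endpoint.get("annotations", {}).get("alias_set")
    return alias_set["id"] if alias_set else endpoint["addr"]


def router_graph(links: list) -> list:
    routers = {}  # router ID -> annotations (first seen wins; an assumption)
    link_counts = Counter()
    for link in links:
        src_id, dest_id = router_id(link["src"]), router_id(link["dest"])
        for rid, endpoint in ((src_id, link["src"]), (dest_id, link["dest"])):
            routers.setdefault(rid, endpoint.get("annotations", {}))
        link_counts[(src_id, dest_id)] += link["count"]  # assumption: counts are summed
    out = [{"type": "router", "router_id": rid, "annotations": ann}
           for rid, ann in routers.items()]
    out += [{"count": n, "type": "router_link", "src": s, "dest": d}
            for (s, d), n in link_counts.items()]
    return out
```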
The JSON output format (the only supported format) has two types of
objects, as indicated by the type
value, “router” and
“router_link”.
The “router” type specifies a unique router with a router ID and any
annotations found in the input IP link, as follows (see [Traceroute
Annotations] for the structure of the annotations
value):
{"type": "router", "router_id": "R1", "annotations": {}} {"type": "router", "router_id": "R2", "annotations": {}}
The “router_link” type specifies a unique router link between two router IDs, as follows:
{"count": 1, "type": "router_link", "src": "R1", "dest": "R2"}
All “router” objects will occur before “router_link” objects in the output.
Sample output (note: not all “router” objects shown):
{ "type": "router", "router_id": "1.255.76.126", "annotations": {} } { "type": "router", "router_id": "itdk-20200109-midar@1578528000:43504", "annotations": { "alias_set": { "id": "itdk-20200109-midar@1578528000:43504", "addrs": [ "1.255.72.227", "218.145.43.10" ] } } } { "count": 2, "type": "router_link", "src": "1.255.76.126", "dest": "itdk-20200109-midar@1578528000:1286" } { "count": 5, "type": "router_link", "src": "itdk-20200109-midar@1578528000:2354", "dest": "itdk-20200109-midar@1578528000:1625" }
See Caveat: Uniqueness in JSON Format.
Caveat: Uniqueness in JSON Format
The inclusion of annotations from the source traceroute paths is the key benefit of the JSON format over the text format, but annotations complicate the definition of uniqueness for JSON data objects, and users must take this into account to avoid incorrect analyses. Of course, if you choose not to annotate the source traceroute paths, or if there are no available annotations for a given set of query results, then this caveat won’t apply.
Uniqueness for JSON objects is based on the entirety of the object; that is, two JSON objects are the same if, and only if, the objects have the same keys with the same values. Hence, although the JSON format technically contains unique objects, one per line, an object can be unique merely because its annotations differ from those of another object. More subtly, two annotations with the same value (e.g., hostname) but different IDs (derived from the specific annotation dataset matching the timestamp of the source traceroute path) will still cause the enclosing JSON objects to be considered distinct.
For these reasons, you should not assume in your analyses that distinct JSON objects represent distinct IP links. You must postprocess the entirety of the file and extract the IP links in whatever way makes the most sense for your particular analyses, taking into account annotation values.
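As one example of such postprocessing, the sketch below collapses IP-link objects that differ only in their annotations by keying each object on its addresses, gap, and destination flag. It is illustrative only. link_key and merge_links are hypothetical names. Summing counts is an assumption: it fits if each source traceroute path contributes to only one annotation variant of a link, which you should confirm for your analysis.

```python
from collections import Counter


def link_key(link: dict) -> tuple:
    """Identity of an IP link, ignoring annotations."""
    return (link["src"]["addr"], link["gap"], link["dest"]["addr"], link["dest"]["is_dest"])


def merge_links(links) -> Counter:
    """Sum the counts of JSON objects that differ only in their annotations."""
    totals = Counter()
    for link in links:
        totals[link_key(link)] += link["count"]
    return totals
```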