FANTAIL File Formats


Traceroute Format

The traceroute file format consists of a sequence of newline-terminated JSON objects, each representing a single traceroute path. This format is based on the scamper JSON traceroute format but with changes to the top-level keys of each JSON object.

Sample

{
  "stop_reason": "COMPLETED",
  "stop_data": 0,
  "timestamp": 1578141145,
  "timestamp_usec": 247633,
  "src_addr": "192.42.115.98",
  "dest_addr": "119.218.204.32",
  "hops": [
    {
      "addr": "192.42.115.1",
      "probe_ttl": 1,
      "probe_id": 1,
      "probe_size": 44,
      "tx": {
        "sec": 1578141145,
        "usec": 247649
      },
      "rtt": 0.299,
      "reply_ttl": 255,
      "reply_tos": 0,
      "reply_size": 76,
      "reply_ipid": 0,
      "icmp_type": 11,
      "icmp_code": 0,
      "icmp_q_ttl": 1,
      "icmp_q_ipl": 44,
      "icmp_q_tos": 0
    },
    {
      "addr": "145.145.17.88",
      "probe_ttl": 2,
      "probe_id": 1,
       "...": "... more keys ..."
    },
    {
      "...": "... more intermediate hops ..."
    },
    {
      "addr": "119.218.204.32",
      "probe_ttl": 19,
      "probe_id": 1,
       "...": "... more keys ..."
    }
  ],
  "vp_name": "ams-nl",
  "dest_rtt_ms": 272.53,
  "path_len": 19,
  "hop_addrs": [
    "64.71.165.22",
    "80.249.208.50",
    "192.42.115.1",
    "145.145.17.88",
    "... more addrs ..."
  ]
}

Description

Each traceroute object contains the following keys:

  • vp_name

    The name/ID of the vantage point conducting the traceroute.

      "vp_name": "ams-nl"
  • src_addr and dest_addr

    The source and destination addresses of the traceroute. You should not use the source address as an identifier because (1) the source address may be a non-unique private address (RFC1918) and (2) there is no guarantee that a vantage point will maintain the same address (public or private) over time.

      "src_addr": "192.42.115.98",
      "dest_addr": "119.218.204.32"
  • timestamp and timestamp_usec

    Timestamp at the start of the traceroute. The timestamp value gives the traditional Unix timestamp in seconds and timestamp_usec gives subsecond precision in microseconds; that is, the full timestamp is equal to timestamp + timestamp_usec / 1_000_000.

      "timestamp": 1578141145,
      "timestamp_usec": 247633
  • stop_reason and stop_data

    The reason why the traceroute measurement stopped and, where applicable, additional data about the cause.

       "stop_reason": "COMPLETED",
       "stop_data": 0

Common stop_reason values are

  • COMPLETED: the traceroute finished successfully by getting a response from the destination,

  • GAPLIMIT: the traceroute stopped after 5 successive non-responding hops despite multiple attempts per non-responding hop (this often means that a firewall is blocking probes or that there’s no host at the destination address),

  • UNREACH: the traceroute stopped on receiving an ICMP destination unreachable message, and

  • LOOP: the traceroute stopped because it detected a loop; that is, an address seen earlier in the traceroute path appeared again, other than in adjacent hops (with some caveats).

Please see the scamper documentation for more details.

  • dest_rtt_ms

    The RTT to the destination in milliseconds, if the destination responded.

       "dest_rtt_ms": 272.53
  • path_len

    The highest TTL value at which the traceroute received a response. For example, a path_len of 1 means there was a response at TTL 1, the first hop from the source. A given path_len value doesn’t necessarily mean that there were responses at all intervening hops.

        "path_len": 19
  • hop_addrs

    An array of the unique responding addresses (including the destination if it responded) in the traceroute path. The addresses are not in any guaranteed or meaningful order.

        "hop_addrs": ["64.71.165.22", "..."]
  • hops

    An array of the responding hops, including the destination, if it responded. No information is included in this array for hops without responses. Each hop object has the same definition as in the scamper JSON traceroute format with the possible addition of a few FANTAIL-specific keys (see [Traceroute Annotations]) depending on FANTAIL query options.

    The following sample shows the most commonly used keys:

        "hops": [
          {
            "addr": "192.42.115.1",
            "probe_ttl": 1,
            "probe_id": 1,
            "tx": {
              "sec": 1578141145,
              "usec": 247649
            },
            "rtt": 0.299,
            ... # more keys
          },
          ... # more objects
        ]
  • addr: the address of the responding hop,
  • probe_ttl: the hop number,
  • probe_id: the per-hop 1-based index of the probe packet that elicited a response, with the index increasing with each attempt,
  • tx: the timestamp of the probe packet that elicited a response, and
  • rtt: the RTT of the response in milliseconds (e.g., 0.299 is less than 1 ms).

The hops array is ordered first by probe_ttl and then by probe_id at the same probe_ttl.
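The key descriptions above can be tied together with a short reader. The following is a minimal sketch, assuming the file has already been decompressed and is read line by line; the function names (parse_traceroutes, start_time, addrs_by_ttl) are illustrative and not part of FANTAIL.

```python
from datetime import datetime, timedelta, timezone
import json


def parse_traceroutes(lines):
    """Yield one dict per non-empty line of newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def start_time(trace):
    """Return the traceroute start time as a timezone-aware UTC datetime."""
    # Adding the microseconds as a timedelta avoids float rounding error.
    seconds = datetime.fromtimestamp(trace["timestamp"], tz=timezone.utc)
    return seconds + timedelta(microseconds=trace["timestamp_usec"])


def addrs_by_ttl(trace):
    """Map every TTL from 1 to path_len to its responding addresses.

    TTLs with no entry in "hops" (non-responding hops) map to an empty list.
    """
    by_ttl = {ttl: [] for ttl in range(1, trace["path_len"] + 1)}
    for hop in trace["hops"]:
        addrs = by_ttl.setdefault(hop["probe_ttl"], [])
        if hop["addr"] not in addrs:
            addrs.append(hop["addr"])
    return by_ttl
```

Because the hops array omits non-responding hops, rebuilding the per-TTL view from probe_ttl and path_len, as addrs_by_ttl does, is one way to make the gaps in a path explicit.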


Traceroute Annotations

FANTAIL can annotate hop addresses using several CAIDA datasets, as follows:

  • aliases from ITDK MIDAR aliases,
  • ASes from ITDK bdrmapIT,
  • hostnames from the IPv4 Routed /24 DNS Names Dataset, and
  • IXPs from the CAIDA Internet eXchange Points (IXPs) Dataset.

An annotation appears as an extra FANTAIL-specific key of a hop object in the hops array.

FANTAIL uses the start timestamp of the traceroute to match a hop address to the closest relevant record of the annotation source. For annotation datasets provided as periodic releases, such as the ITDK, FANTAIL first finds the dataset release closest to the traceroute timestamp and then finds the record matching the hop address within that release. For datasets collected continuously, such as the DNS names dataset, FANTAIL finds the individual matching record whose timestamp is closest to the traceroute timestamp; for example, the DNS names dataset may have many lookups over time for a given hop address, and FANTAIL selects the hostname result collected closest in time to the traceroute timestamp.

All annotation objects have an id field that provides details, where available, about the matching dataset, timestamp, and record ID so that users may, for example, determine how close in time the matching annotation is to the traceroute (if it matters that the annotation be timely) and consult the annotation source for additional details (where available).

Samples

{
  "hops": [
    {
      "addr": "...",
      "...": "... more keys ...",
      "alias_set": {
        "id": "itdk-20190111-midar@1547164800:38885",
        "addrs": [
          "145.145.17.89",
          "145.145.17.91",
          "192.42.115.1"
        ]
      },
      "asnum": {
        "id": "itdk-201904@1554076800",
        "value": "1103"
      },
      "hostnames": {
        "id": "dns-names@1584370551",
        "values": [
          "forbidden-fruit.prutsnet.nl"
        ]
      },
      "ixp": {
        "id": "ixps-201802@1517443200:235",
        "prefix": "206.126.236.0/22",
        "name": "Equinix IBX Ashburn"
      }
    }
  ]
}

Aliases

The alias_set annotation provides a list of the IP addresses belonging to the ITDK MIDAR alias set containing the hop address. The id value has the following form

<dataset_name>@<timestamp>:<set_id>

where dataset_name is the name of the ITDK release, timestamp is the timestamp of the dataset (alias sets don’t have individual timestamps), and set_id is the ID of the alias set internal to the dataset release (and thus not meaningful across releases).

     "alias_set": {
       "id": "itdk-20190111-midar@1547164800:38885",
       "addrs": [
         "145.145.17.89",
         "145.145.17.91",
         "192.42.115.1"
       ]
     }

The addrs value is an array of the addresses in the alias set, including the hop address.

ASes

The asnum annotation gives the AS number for the AS owning the router with the given hop address, according to the ITDK bdrmapIT dataset. The id value has the following form

<dataset_name>@<timestamp>

where dataset_name is the name of the ITDK release and timestamp is the timestamp of the dataset (records don’t have individual timestamps).

    "asnum": {
      "id": "itdk-201904@1554076800",
      "value": "1103"
    }

The value key gives the AS number as a string because the value may be a 32-bit AS number in dotted notation.

Hostnames

The hostnames annotation gives the hostname(s) of the hop address according to the IPv4 Routed /24 DNS Names Dataset. The id value has the following form

dns-names@<trace_timestamp>

where trace_timestamp is the timestamp of the traceroute rather than of the hostname records because the latter timestamps aren’t available to FANTAIL.

    "hostnames": {
      "id": "dns-names@1584370551",
      "values": [
        "forbidden-fruit.prutsnet.nl"
      ]
    }

The values array contains one or more hostnames.

IXPs

The ixp annotation gives information about the Internet exchange point that announces the hop address, according to the CAIDA Internet eXchange Points (IXPs) Dataset. The id value has the following form

<dataset_name>@<timestamp>:<ix_id>

where dataset_name is the name of the dataset release, timestamp is the timestamp of the dataset (individual records don’t have timestamps), and ix_id is the ID of the IXP record internal to the dataset release (and thus not meaningful across releases).

    "ixp": {
      "id": "ixps-201802@1517443200:235",
      "prefix": "206.126.236.0/22",
      "name": "Equinix IBX Ashburn"
    }

The prefix value is the prefix containing the hop address, and name gives the name of the IXP.
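All four id forms described above share the shape <dataset_name>@<timestamp>, optionally followed by :<record_id>. The following is a minimal sketch of splitting such an id and measuring how far the annotation dataset is from the traceroute in time; the helper names are hypothetical. Note that for hostnames the id carries the traceroute timestamp itself, so the computed age says nothing about the age of the hostname record.

```python
def parse_annotation_id(id_value):
    """Split '<dataset_name>@<timestamp>[:<record_id>]' into its parts.

    Returns (dataset_name, timestamp, record_id); record_id is None when
    the id has no ':<record_id>' suffix (as for asnum and hostnames).
    """
    dataset, _, rest = id_value.partition("@")
    ts_text, _, record_id = rest.partition(":")
    return dataset, int(ts_text), record_id or None


def annotation_age_seconds(trace_timestamp, id_value):
    """Seconds between the traceroute start and the annotation timestamp.

    Positive when the annotation dataset predates the traceroute.
    """
    _, annotation_ts, _ = parse_annotation_id(id_value)
    return trace_timestamp - annotation_ts
```

A caller could, for example, discard annotations whose absolute age exceeds a threshold chosen for the analysis at hand.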


Hop Addresses (hop-addrs)

The hop-addrs data transformation extracts unique IP addresses, including the destination if it responded, from traceroute paths. The output doesn’t include the address of the vantage point.

The text output has one address, in dotted-decimal format, per line. There is no JSON output.


IP Links (ip-links)

The ip-links data transformation extracts unique IP links, direct and indirect, from traceroute paths. A direct IP link is a link between two adjacent responding hops, including between the last intermediate hop and the destination. An indirect IP link is a link between two responding hops (or hop and destination) separated by one or more non-responding hops.

Text Format

Text output contains one link per line in the following format:

<link> <count>

where <link> is

  • A=B – for adjacent addresses A and B, and
  • A-n-B – for addresses A and B separated by n nonresponding hops;

and <count> is the number of traceroute paths where the link appeared.

If a trace has a responding destination, then the destination address will appear prefixed with “D” in a link.

Sample output:

    101.95.120.149=101.95.95.69 1
    112.174.172.6=D175.242.126.233 5
    112.174.83.117-2-112.189.127.98 7
    123.56.34.25-3-D182.92.208.122 4

The first link is a direct link between two intermediate hops. The second is a direct link between a hop and the destination. The third is an indirect link between two hops separated by 2 non-responding hops. The fourth is an indirect link between a hop and the destination separated by 3 non-responding hops.
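The text format above can be parsed with a single regular expression. The following is a minimal sketch, assuming dotted-decimal IPv4 addresses as in the samples; the IPLink type and parse_ip_link function are illustrative names, not part of FANTAIL.

```python
import re
from typing import NamedTuple


class IPLink(NamedTuple):
    src: str
    gap: int               # number of non-responding hops; 0 for a direct link
    dest: str
    dest_is_destination: bool
    count: int


# <addr>=<addr> or <addr>-<n>-<addr>, with an optional "D" before the second address.
_LINK_RE = re.compile(r"^([0-9.]+)(?:=|-(\d+)-)(D?)([0-9.]+)$")


def parse_ip_link(line):
    """Parse one '<link> <count>' line of ip-links text output."""
    link, count = line.split()
    match = _LINK_RE.match(link)
    if match is None:
        raise ValueError(f"unrecognized link: {link!r}")
    src, gap, dest_flag, dest = match.groups()
    return IPLink(src, int(gap) if gap else 0, dest, dest_flag == "D", int(count))
```

Applied to the fourth sample line, the parser yields a gap of 3 and marks the second address as a responding destination.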

JSON Format

There is a straightforward mapping between the text and JSON IP-links formats, as illustrated in the following example with the same IP link shown in both formats:

    123.56.34.25-3-D182.92.208.122 4

{
  "count": 4,
  "gap": 3,
  "src": {
    "addr": "123.56.34.25",
    "is_dest": false,
    "annotations": {}
  },
  "dest": {
    "addr": "182.92.208.122",
    "is_dest": true,
    "annotations": {}
  }
}

The gap key gives the number of non-responding hops separating the source (src) and destination (dest) addresses of the IP link. So if gap is zero, then the link is direct.

The is_dest key indicates whether the given link endpoint (src/dest) is a responding destination (only the dest endpoint can have a true is_dest, but we include the key in src for completeness and ease of use).
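The mapping between the two formats can be sketched as a small function that renders a JSON IP-link object as its text-format line. This is an illustration of the correspondence described above, under the assumption that the object has the count, gap, src, and dest keys shown in the example; link_to_text is a hypothetical name.

```python
def link_to_text(obj):
    """Render an ip-links JSON object as its '<link> <count>' text line."""
    # A gap of zero means a direct link ("="); otherwise "-n-".
    separator = "=" if obj["gap"] == 0 else f"-{obj['gap']}-"
    dest = obj["dest"]
    # Only the dest endpoint can be a responding destination.
    prefix = "D" if dest["is_dest"] else ""
    return f"{obj['src']['addr']}{separator}{prefix}{dest['addr']} {obj['count']}"
```

The conversion is lossy: any annotations carried by the JSON object have no counterpart in the text format.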

The annotations key contains any annotations that a given hop address had in the traceroute path(s) that produced the IP link (see [Traceroute Annotations]); for example:

{
  "src": {
    "annotations": {
      "alias_set": {
        "id": "itdk-20200109-midar@1578528000:43504",
        "addrs": [
          "1.255.72.227",
          "218.145.43.10"
        ]
      },
      "asnum": {
        "id": "itdk-201904@1554076800",
        "value": "111"
      }
    },
    "...": "..."
  },
  "...": "..."
}

See Caveat: Uniqueness in JSON Format.


IP Paths (ip-paths)

The ip-paths data transformation extracts unique IP paths from traceroute paths and outputs them in a highly condensed text format.

Text output contains one path per line in the following format:

<path> <count>

where <path> is a sequence of hops separated by “=” or “-n-”, in the same way as in the [IP Links (ip-links)] text format, and <count> is the number of source traceroute paths where the unique IP path appeared. The <path> does not contain the address of the vantage point.

If a trace has a responding destination, then the destination address will appear prefixed with “D” in the path. If more than one address responded at a hop, then all hop addresses are listed separated with commas in the output.

Sample output (normally a single line that we’ve wrapped for convenient display):

        129.122.31.254=129.122.0.5=41.189.172.33,196.201.62.220=
        196.201.62.221=212.187.195.161-2-4.68.72.254=D109.27.101.45 1

represents the following path

        129.122.31.254
        129.122.0.5
        41.189.172.33,196.201.62.220 (two addresses at this hop)
        196.201.62.221
        212.187.195.161
        *
        *
        4.68.72.254
        109.27.101.45 (responding destination)

where “*” indicates a non-responding hop.
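The expansion shown above can be reproduced in code. The following is a minimal sketch, assuming the path is on a single unwrapped line; parse_ip_path is an illustrative name, and representing a non-responding hop as an empty list is a choice made for this example.

```python
import re


def parse_ip_path(line):
    """Parse one '<path> <count>' line of ip-paths text output.

    Returns (hops, dest, count): hops is a list with one entry per hop,
    each a list of addresses (empty for a non-responding hop); dest is the
    responding destination address or None.
    """
    path, count = line.split()
    hops, dest = [], None
    # The capturing group makes re.split keep the separators as tokens.
    for token in re.split(r"(=|-\d+-)", path):
        if token == "=":
            continue
        if token.startswith("-"):
            # "-n-" stands for n non-responding hops.
            hops.extend([] for _ in range(int(token.strip("-"))))
        elif token.startswith("D"):
            dest = token[1:]
            hops.append([dest])
        else:
            hops.append(token.split(","))
    return hops, dest, int(count)
```

For the sample line above, the result has nine hop entries, two of them empty, and the last one is the responding destination.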


IP-RTT Distributions (ip-rtts)

The ip-rtts data transformation calculates the min, max, average, standard deviation, and percentiles (25th, 50th, 75th, and 95th) of RTTs for each hop or responding destination per vantage point.

Text output contains one set of statistics per pair of (vantage point, IP address) per line in the following format:

<vp=addr> <count> <min> <max> <avg> <stddev> <25th> <50th> <75th> <95th>

where vp is the name/ID of the vantage point, addr is the hop or responding destination address, <count> is the number of RTT samples for the given <vp=addr>, and the remaining fields are the RTT statistics. If <count> is 1, then we set <stddev> and all percentile values to 0.0, since these computations require at least two values.

Sample output (normally a single line that we’ve wrapped for convenient display):

       ams-nl=145.145.17.88 500 0.216 82.84 3.096 10.553 0.307 0.374
	0.501 27.143
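A line in this format splits cleanly on whitespace into ten fields. The following is a minimal sketch of a parser; the RttStats type and its field names are assumptions made for illustration.

```python
from typing import NamedTuple


class RttStats(NamedTuple):
    vp: str
    addr: str
    count: int
    min_rtt: float
    max_rtt: float
    avg: float
    stddev: float
    p25: float
    p50: float
    p75: float
    p95: float


def parse_ip_rtts(line):
    """Parse one line of ip-rtts text output into an RttStats record."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    # Split on the last "=" so the address is always the final component.
    vp, sep, addr = fields[0].rpartition("=")
    if not sep:
        raise ValueError(f"missing '=' in {fields[0]!r}")
    return RttStats(vp, addr, int(fields[1]), *map(float, fields[2:]))
```

When count is 1, the stddev and percentile fields are placeholders (0.0) rather than measurements, so analyses may want to filter on count before using them.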


Peering Links (peering-links)

The peering-links data transformation examines IP links with AS number annotations (see [Traceroute Annotations]), as produced by the ip-links transformation, and extracts unique AS peering links, which are IP links between two routers with different AS ownership.

Because peering links are IP links, the peering-links JSON output format (the only supported format) is a superset of the IP-links format (see [IP Links (ip-links)]) and includes two additional keys, type and middle_hop.

The type key indicates the type of the peering link and has the following values:

  • “direct_peering” is a peering link between adjacent hops,

  • “missing_middle” is a potential peering link between two hops separated by one or more non-responding hops (specified by gap), and

  • “unidentified_middle” is a potential peering link between two hops separated by a single responding hop lacking an AS annotation (and thus no determination could be made).

In the case of “unidentified_middle”, the peering-link object will also have the middle_hop key, which has the same structure as src and dest and describes the hop separating src and dest.

The above type values don’t exhaustively cover all possible combinations of IP links over which a peering link could conceivably be inferred; for example, two AS-annotated hops separated by both non-responding hops and non-AS annotated hops. If such cases are important to you, then you should perform additional processing and inferences yourself over the input IP-links data.

We never try to infer a peering link from an IP link involving a responding destination because such a link doesn’t necessarily represent a physical link between two routers. Hence, the value of is_dest in src, dest, and middle_hop will always be false in peering-link objects.

The value of gap will be non-zero for “missing_middle” and zero otherwise (the hop of the “unidentified_middle” type isn’t a non-responding hop).
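For readers who want to run their own inference over IP-links data, the rules above for the two-endpoint cases can be sketched as follows. This is not FANTAIL's implementation: it is an illustration built only from the descriptions in this section, the function names are hypothetical, and it does not cover “unidentified_middle”, which requires examining the responding hop between the two endpoints.

```python
def asn_of(endpoint):
    """Return the AS number string of a link endpoint, or None if unannotated."""
    annotation = endpoint.get("annotations", {}).get("asnum")
    return annotation["value"] if annotation else None


def classify_ip_link(link):
    """Classify one ip-links JSON object as a (potential) peering link.

    Returns "direct_peering", "missing_middle", or None when the link does
    not qualify under the rules described in this section.
    """
    src, dest = link["src"], link["dest"]
    # Links involving a responding destination are never treated as peering links.
    if src["is_dest"] or dest["is_dest"]:
        return None
    src_asn, dest_asn = asn_of(src), asn_of(dest)
    # Both endpoints need an AS annotation, and the ASes must differ.
    if src_asn is None or dest_asn is None or src_asn == dest_asn:
        return None
    return "direct_peering" if link["gap"] == 0 else "missing_middle"
```

AS numbers are compared as strings here because the asnum value is a string in the annotation format.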

Sample output:

{
  "count": 1,
  "type": "unidentified_middle",
  "src": {
    "addr": "145.145.17.88",
    "is_dest": false,
    "annotations": {
      "asnum": {
        "id": "itdk-201904@1554076800",
        "value": "1103"
      },
      "hostnames": {
        "id": "dns-names@1493979030",
        "values": [
          "ae5.2216.jnr01.asd002a.surf.net"
        ]
      }
    }
  },
  "gap": 0,
  "middle_hop": {
    "addr": "145.145.166.86",
    "is_dest": false,
    "annotations": {
      "hostnames": {
        "id": "dns-names@1493979030",
        "values": [
          "google-router2.peering.surf.net"
        ]
      }
    }
  },
  "dest": {
    "addr": "108.170.241.204",
    "is_dest": false,
    "annotations": {
      "asnum": {
        "id": "itdk-201904@1554076800",
        "value": "15169"
      }
    }
  }
}

See Caveat: Uniqueness in JSON Format.


Router-Level Graph (router-graph)

The router-graph data transformation examines IP links with alias set annotations (see [Traceroute Annotations]), as produced by the ip-links transformation, and extracts unique router links, which represent links between two alias sets (rather than between two interface addresses as in IP links).

An IP link is converted to a router link by replacing the src and dest endpoint IP addresses with their corresponding alias set ID (that is, the id value of the alias_set annotation), which acts as the router ID. If an IP link endpoint doesn’t have an alias set annotation, then the endpoint IP address itself serves as the router ID and represents a router with just one (known) interface. Hence, every input IP link becomes a router link (possibly a duplicate router link), and in the degenerate case of no input IP links having alias set annotations, each unique IP link becomes its own unique router link.
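The conversion rule above can be sketched in a few lines. This is an illustration under stated assumptions rather than FANTAIL's code: the function names are hypothetical, the input is a sequence of ip-links JSON objects already loaded as dicts, and how count values are combined across duplicate router links is not specified here, so the sketch collects only the set of unique router links.

```python
def router_id(endpoint):
    """Return the alias set ID if present, else the endpoint address itself."""
    alias_set = endpoint.get("annotations", {}).get("alias_set")
    return alias_set["id"] if alias_set else endpoint["addr"]


def unique_router_links(ip_links):
    """Map ip-links JSON objects to the set of unique (src, dest) router ID pairs."""
    return {(router_id(link["src"]), router_id(link["dest"])) for link in ip_links}
```

Two IP links whose source addresses belong to the same alias set and that share a destination collapse into a single router link, which is the deduplication the paragraph above describes.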

The JSON output format (the only supported format) has two types of objects, as indicated by the type value, “router” and “router_link”.

The “router” type specifies a unique router with a router ID and any annotations found in the input IP link, as follows (see [Traceroute Annotations] for the structure of the annotations value):

{"type": "router", "router_id": "R1", "annotations": {}}
{"type": "router", "router_id": "R2", "annotations": {}}

The “router_link” type specifies a unique router link between two router IDs, as follows:

{"count": 1, "type": "router_link", "src": "R1", "dest": "R2"}

All “router” objects will occur before “router_link” objects in the output.

Sample output (note: not all “router” objects shown):

{
  "type": "router",
  "router_id": "1.255.76.126",
  "annotations": {}
}
{
  "type": "router",
  "router_id": "itdk-20200109-midar@1578528000:43504",
  "annotations": {
    "alias_set": {
      "id": "itdk-20200109-midar@1578528000:43504",
      "addrs": [
        "1.255.72.227",
        "218.145.43.10"
      ]
    }
  }
}
{
  "count": 2,
  "type": "router_link",
  "src": "1.255.76.126",
  "dest": "itdk-20200109-midar@1578528000:1286"
}
{
  "count": 5,
  "type": "router_link",
  "src": "itdk-20200109-midar@1578528000:2354",
  "dest": "itdk-20200109-midar@1578528000:1625"
}

See Caveat: Uniqueness in JSON Format.


Caveat: Uniqueness in JSON Format

The inclusion of annotations from the source traceroute paths is the key benefit of the JSON format over the text format, but annotations complicate the definition of uniqueness for JSON data objects, and users must take this into account to avoid incorrect analyses. Of course, if you choose not to annotate the source traceroute paths, or if there are no available annotations for a given set of query results, then this caveat won’t apply.

Uniqueness for JSON objects is based on the entirety of the object; that is, two JSON objects are the same if, and only if, the objects have the same keys with the same values. Hence, although the JSON format technically contains unique objects, one per line, two objects can be distinct while differing only in their annotations. More subtly, two annotations with the same value (e.g., hostname) but different IDs (derived from the specific annotation dataset matching the timestamp of the source traceroute path) will still cause the enclosing JSON objects to be considered distinct.

For these reasons, you should not assume in your analyses that distinct JSON objects represent distinct IP links. You must postprocess the entirety of the file and extract the IP links in whatever way makes the most sense for your particular analyses, taking into account annotation values.
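One possible form of this postprocessing for ip-links JSON output is sketched below. It keys each object on the link itself (source address, gap, destination address, and whether the destination responded) and ignores annotations. The function name is hypothetical, and summing the count values of the merged objects is an assumption; whether that is appropriate depends on your analysis.

```python
import json


def merge_ip_links(lines):
    """Collapse ip-links JSON objects that differ only in their annotations.

    Returns a dict keyed by (src_addr, gap, dest_addr, dest_is_dest) whose
    values hold the summed count and the number of JSON objects merged.
    """
    merged = {}
    for line in lines:
        if not line.strip():
            continue
        obj = json.loads(line)
        key = (obj["src"]["addr"], obj["gap"],
               obj["dest"]["addr"], obj["dest"]["is_dest"])
        entry = merged.setdefault(key, {"count": 0, "objects": 0})
        entry["count"] += obj["count"]  # assumption: counts are additive
        entry["objects"] += 1
    return merged
```

An objects value greater than 1 for a key indicates that the same IP link appeared with more than one distinct set of annotations.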
