Learning Regexes to Extract Router Names from Hostnames
M. Luckie, B. Huffaker, and k. claffy, "Learning Regexes to Extract Router Names from Hostnames", in ACM Internet Measurement Conference (IMC), Oct 2019.

The data supplement is designed to be used with sc_hoiho, which is included as part of scamper.



Matthew Luckie (2)
Bradley Huffaker (1)
kc claffy (1)

(1) CAIDA, San Diego Supercomputer Center, University of California San Diego
(2) University of Waikato

We present the design, implementation, evaluation, and validation of a system that automatically learns regular expressions (regexes) to extract router names (router identifiers) from the hostnames that network operators store in different DNS zones. Our supervised-learning approach evaluates automatically generated candidate regexes against sets of hostnames for IP addresses that other alias resolution techniques previously inferred to identify interfaces on the same router. Conceptually, if three conditions hold: (1) a regex extracts the same value from a set of hostnames associated with IP addresses on the same router; (2) the value is unique to that router; and (3) the regex extracts names for multiple routers in the suffix, then we conclude the regex accurately represents the naming convention for the suffix. We train our system using router aliases inferred from active probing to learn regexes for 2550 different suffixes.
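The three conditions can be illustrated with a minimal Python sketch. This is not the sc_hoiho implementation: the function name, the input format, the `example.net` hostnames, and the strict pass/fail check are all assumptions made for illustration, and a real system would likely score candidates rather than reject them on a single mismatch.

```python
import re
from collections import defaultdict


def regex_matches_convention(pattern, training_routers):
    """Check the three conditions for one candidate regex.

    training_routers maps a router label to the hostnames of interfaces
    that alias resolution placed on that router (hypothetical format).
    """
    regex = re.compile(pattern)
    routers_by_name = defaultdict(set)

    for router, hostnames in training_routers.items():
        extracted = set()
        for hostname in hostnames:
            match = regex.search(hostname)
            extracted.add(match.group(1) if match else None)
        # Condition 1: every hostname on the router yields the same value.
        if len(extracted) != 1 or None in extracted:
            return False
        routers_by_name[extracted.pop()].add(router)

    # Condition 2: each extracted value belongs to exactly one router.
    if any(len(routers) > 1 for routers in routers_by_name.values()):
        return False

    # Condition 3: the regex names more than one router in the suffix.
    return len(routers_by_name) > 1


# Made-up training data for the hypothetical suffix example.net.
training = {
    "R1": ["ae1.core1.lax.example.net", "xe-0-0.core1.lax.example.net"],
    "R2": ["ae2.core2.sjc.example.net", "xe-1-0.core2.sjc.example.net"],
    "R3": ["ae1.core2.lax.example.net", "xe-2-0.core2.lax.example.net"],
}

router_and_city = r"^[^.]+\.([a-z]+\d+\.[a-z]+)\.example\.net$"
city_only = r"^[^.]+\.[a-z]+\d+\.([a-z]+)\.example\.net$"
interface_only = r"^([^.]+)\."

accepted = regex_matches_convention(router_and_city, training)
rejected_city = regex_matches_convention(city_only, training)
rejected_interface = regex_matches_convention(interface_only, training)
```

In this toy data, the regex that captures both the router label and the city satisfies all three conditions. The city-only regex fails condition 2 because R1 and R3 share the value `lax`, and the interface-only regex fails condition 1 because one router yields different values.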

We then demonstrate the utility of this system by using the regexes to find 105% additional aliases for these suffixes. Regexes inferred in IPv4 perfectly predict aliases for ≈85% of suffixes with IPv6 aliases, i.e., IPv4 and IPv6 addresses representing the same underlying router, and find 9.0 times more routers in IPv6 than found by prior techniques.
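As a rough illustration of how a learned regex can yield additional aliases, the sketch below groups addresses whose hostnames produce the same extracted router name. The function name, the regex, the `example.net` hostnames, and the documentation-range addresses are hypothetical, and the sketch omits any validation the real system performs.

```python
import re
from collections import defaultdict


def infer_aliases(pattern, hostname_by_address):
    """Group addresses by the router name a regex extracts from each hostname."""
    regex = re.compile(pattern)
    addresses_by_name = defaultdict(set)
    for address, hostname in hostname_by_address.items():
        match = regex.search(hostname)
        if match:
            addresses_by_name[match.group(1)].add(address)
    # Only names shared by two or more addresses imply an alias.
    return {
        name: addresses
        for name, addresses in addresses_by_name.items()
        if len(addresses) > 1
    }


# Assumed regex for the hypothetical suffix example.net.
learned = r"^[^.]+\.([a-z]+\d+\.[a-z]+)\.example\.net$"

# Made-up reverse DNS data mixing IPv4 and IPv6 addresses.
hostnames = {
    "192.0.2.1": "ae1.core1.lax.example.net",
    "2001:db8::1": "xe-0-0.core1.lax.example.net",
    "192.0.2.9": "ae2.core2.sjc.example.net",
    "198.51.100.7": "customer-7.example.net",
}

aliases = infer_aliases(learned, hostnames)
```

Here the IPv4 and IPv6 addresses whose hostnames both yield `core1.lax` are grouped as one router, while the unmatched hostname and the router seen at only one address contribute no alias.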

Keywords: measurement methodology, software/tools, topology
  Last Modified: Tue Nov-17-2020 04:47:39 UTC