Distance metrics in the Internet

Bradley Huffaker, Marina Fomenkov, Daniel J. Plummer, David Moore and k claffy
CAIDA, SDSC, UC San Diego {bradley,marina,djp,info,kc}@caida.org

IEEE International Telecommunications Symposium (ITS2002)

overview

  • server selection problem
  • data used in this study
  • distance metrics
  • metric success rates
  • conclusions


server selection

  • Many Internet services are provided by multiple servers.
  • Clients want to select a server that optimizes their access to
    a given service.
  • Possible optimizations:
    server load, available bandwidth, loss rate, transit time.
  • We will look at the problem of minimizing transit time,
    or Round Trip Time (RTT).


Approach for selecting minimum RTT

  • A common solution to this problem is a selection system
    which is local to a client and can do the selection for it.
  • This system requires a metric of distance in order to sort
    the list of potential servers.
  • We will present four metrics which represent the distance between
    two nodes on the Internet.
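
The selection step described above can be sketched as a sort over candidate servers. This is a minimal illustration only: `rank_servers`, the server names, and the metric values are hypothetical and not part of the study.

```python
from typing import Callable, Iterable, List


def rank_servers(servers: Iterable[str],
                 distance: Callable[[str], float]) -> List[str]:
    """Return the servers sorted from smallest to largest metric value."""
    return sorted(servers, key=distance)


# Made-up metric values (they could be km, ms, or hop counts).
example_distance = {"server-a": 120.0, "server-b": 35.5, "server-c": 80.0}

ranking = rank_servers(example_distance, lambda s: example_distance[s])
best = ranking[0]  # the server the client would be directed to
```

Any of the four metrics can be plugged in as the `distance` function; the ranking logic itself does not change.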


study background

  • data: CAIDA Macroscopic Topology Project

    • IP forward path topology and RTT
    • Collected at 9 monitors around the world.
    • Continuously monitors many thousands of destinations
  • methodology: success rate
    Providing a single value which represents the rate
    at which a metric successfully predicts low RTT.

data: CAIDA Macroscopic Topology Project

  • monitors

    • trace
      forward IP path and RTT between monitor and destination
    • cycle
      a single run through the destination list
  • destination lists

    • DNS clients

      • one DNS client per routable prefix
      • 8 to 14 cycles per day
      • 58,000 destinations
    • IPv4 list

      • one IP address per /24 in routable prefix
      • 1 cycle per day
      • 300,000 destinations

methodology: success rate

  • For each pair of traces we compare RTTs and the metrics.

    • lower metric & lower RTT = success
    • lower metric & higher RTT = failure
    • equal metric = useless
    • Note: RTTs are never equal.
  • For each metric, we count the total number of success, failure,
    and useless trials.
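
The pairwise comparison above can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: the names `count_trials` and `success_rate` are hypothetical, the choice of which traces to pair is left to the caller, and dividing successes by all trials (including useless ones) is an assumption, since the slides do not state the denominator.

```python
from itertools import combinations
from typing import Dict, Iterable, Tuple

Trace = Tuple[float, float]  # (metric value, observed RTT in ms)


def count_trials(traces: Iterable[Trace]) -> Dict[str, int]:
    """Classify every pair of traces as success, failure, or useless."""
    counts = {"success": 0, "failure": 0, "useless": 0}
    for (m1, r1), (m2, r2) in combinations(list(traces), 2):
        if m1 == m2:
            # The metric cannot choose between the two traces.
            counts["useless"] += 1
        elif (m1 < m2) == (r1 < r2):
            # The trace with the lower metric also has the lower RTT.
            counts["success"] += 1
        else:
            # RTTs are assumed never equal, as stated in the slides.
            counts["failure"] += 1
    return counts


def success_rate(counts: Dict[str, int]) -> float:
    """Successes over all trials (assumed denominator)."""
    total = sum(counts.values())
    return counts["success"] / total if total else 0.0


# Made-up traces: (IP path length, RTT in ms).
example_traces = [(3, 40.0), (5, 90.0), (5, 70.0), (8, 60.0)]
example_counts = count_trials(example_traces)
example_rate = success_rate(example_counts)
```

With these four made-up traces there are six pairs: three successes, two failures, and one useless trial.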


distance metrics

figures/distance_typesb.png


metrics

  • IP path length (number of IP addresses)
    The number of routers, represented by their IP addresses.
  • AS path length (number of ASes)
    The number of ISPs, represented by their ASes.
  • geographic distance (km)
    The great circle distance from the client to the server.
  • median RTT (ms)
    The median RTT sampled from midnight GMT to midnight GMT
    on the previous day.

metric description

  • possible deployable system
    A system which could be created to provide end-user
    access to a given metric.
  • our approximation
    The method used to estimate a given metric in our study.


IP path length

  • possible deployable system

    • Shortest path found between client and server in an IP graph.
    • This IP graph can be built by a remote system and shared
      between multiple distance resolvers.
  • our approximation

    • The actual forward IP path seen in the Internet.
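
For the deployable variant, a shortest path in an IP graph could be found with a breadth-first search. The sketch below assumes an unweighted, directed adjacency map; the function name `ip_path_length` and the example graph (built from documentation addresses) are hypothetical. It returns the hop count; whether the endpoints themselves are counted is a convention the slides do not fix.

```python
from collections import deque
from typing import Dict, Optional, Set


def ip_path_length(graph: Dict[str, Set[str]],
                   src: str, dst: str) -> Optional[int]:
    """Hop count of a shortest path from src to dst, or None if unreachable."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, hops = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return hops + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


# Made-up IP graph: each key links to the next-hop addresses seen after it.
example_graph = {
    "192.0.2.1": {"192.0.2.2", "192.0.2.3"},
    "192.0.2.2": {"192.0.2.4"},
    "192.0.2.3": {"192.0.2.4"},
    "192.0.2.4": {"192.0.2.5"},
}

hops = ip_path_length(example_graph, "192.0.2.1", "192.0.2.5")
```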

AS path length

  • possible deployable system

    • AS paths can be collected from Border Gateway Protocol
      (BGP) announcements.
    • This information is already distributed by the Internet routers
      and so would not introduce additional traffic to the network.
  • our approximation

    • We could not collect BGP data for all our monitors, so we
      converted our IP paths to AS paths using information collected
      by Oregon's Routeviews Project (www.routeviews.org).
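
One plausible way to convert an IP path to an AS path is a longest-prefix match of each hop against a prefix-to-origin-AS table, collapsing consecutive hops in the same AS. The slides do not describe the exact procedure, so this is only a sketch: the table below uses documentation prefixes and AS numbers, the function names are hypothetical, and skipping unmapped hops is an assumption.

```python
import ipaddress
from typing import Dict, List, Optional

# Hypothetical prefix-to-origin-AS table (documentation ranges only).
PREFIX_TO_AS: Dict[str, int] = {
    "192.0.2.0/24": 64496,
    "198.51.100.0/24": 64497,
    "198.51.100.128/25": 64498,
    "203.0.113.0/24": 64499,
}


def origin_as(ip: str, table: Dict[str, int]) -> Optional[int]:
    """Return the AS of the longest matching prefix, or None if no match."""
    addr = ipaddress.ip_address(ip)
    best_len, best_as = -1, None
    for prefix, asn in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_as = net.prefixlen, asn
    return best_as


def as_path(ip_path: List[str], table: Dict[str, int]) -> List[int]:
    """Map each hop to its AS and collapse consecutive duplicates."""
    path: List[int] = []
    for ip in ip_path:
        asn = origin_as(ip, table)
        # Assumption: hops with no matching prefix are skipped.
        if asn is not None and (not path or path[-1] != asn):
            path.append(asn)
    return path


example_ip_path = ["192.0.2.1", "192.0.2.9", "198.51.100.5",
                   "198.51.100.200", "203.0.113.7"]
example_as_path = as_path(example_ip_path, PREFIX_TO_AS)
as_path_length = len(example_as_path)
```

A linear scan over the table is enough for illustration; a full BGP table would call for a prefix trie.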

geographic distance

  • possible deployable system

    • A service that knows the geographic location of IP addresses
      can be used to find the location of servers.
    • The location of the client should already be known (or can be
      retrieved from the same geographic service).
    • Then the distance between these two points can be calculated.
  • our approximation

    • The locations of the skitter monitors are already known.
    • We used a commercial geographic service, IxMapper, to find
      geographic locations.
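
Once both locations are known, the great circle distance can be computed with the haversine formula. The sketch below assumes a spherical Earth with a mean radius of 6371 km; the function name `great_circle_km` is hypothetical, and the slides do not say which formula or radius the study used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a common approximation


def great_circle_km(lat1: float, lon1: float,
                    lat2: float, lon2: float) -> float:
    """Great circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Two points a quarter of the way around the equator from each other.
quarter_equator = great_circle_km(0.0, 0.0, 0.0, 90.0)
```

For the quarter-equator example the result equals one quarter of the circumference of the assumed sphere, i.e. (pi / 2) times the radius.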

geographic distance and RTT
figures/dist_density_rie_20010513.png

  • three clusters of high density in both RTT and geography:
    West Coast, East Coast, Europe/Asia

median RTT

  • median RTT from sample of the previous day
  • possible deployable system

    • Set of monitors which systematically sample possible client RTT.
    • This monitoring increases traffic on the network.
    • Due to high variability in RTT values, previous values cannot
      precisely predict the next RTT value.
  • our approximation

    • We used our sampled data from the previous day and calculated
      the median value from these samples.

percentage of successful trials
figures/game_servers_successful.png

  • median RTT provides a success rate of over 90%
  • geographic distance provides 75% success within the US,
    but only two of the five non-US monitors achieved this
  • IP path length is a little better than random
  • AS path length is only as good as random

stability of results
figures/game_ries_successful.png

  • all metrics highly stable, with only minor local fluctuation
    (IP path length, AS path length, geographic distance, median RTT)

RTT based metrics

  • median RTT
    the value in the middle of the sorted list of values
  • single value RTT
    a single RTT value previously observed
  • average RTT
    the sum of all values divided by the number of values
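
The three RTT based metrics can be written down directly. This is a sketch with made-up samples; the function names are hypothetical, and picking the single value by list position is an assumption (the slides only say it is a previously observed value).

```python
from statistics import mean, median
from typing import Sequence


def median_rtt(samples: Sequence[float]) -> float:
    """The value in the middle of the sorted list of samples."""
    return median(samples)


def average_rtt(samples: Sequence[float]) -> float:
    """The sum of all samples divided by the number of samples."""
    return mean(samples)


def single_rtt(samples: Sequence[float], index: int = -1) -> float:
    """One previously observed sample (by default the most recent one)."""
    return samples[index]


# Made-up RTT samples in ms, in collection order, with one outlier.
example_samples = [42.0, 40.0, 310.0, 41.0, 43.0]

med = median_rtt(example_samples)
avg = average_rtt(example_samples)
last = single_rtt(example_samples)
```

In this made-up example the single 310 ms outlier pulls the average well above the typical value, while the median stays close to it.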

RTT accumulation
figures/a-root.png

  • the success rate of RTT metrics as the number of cycles increases
  • median has the greatest success up to a 24-hour period
  • taking a single RTT value near the current time of day
    is more effective than averaging all values in between
  • storing RTT values for longer than a 24-hour period does not
    improve the success rate of the median RTT metric

conclusions

  • RTT based metrics provide a high success rate for server ranking.

    • No more than 24 hours of data should be collected.
    • The success rate is better than that of the next best metric
      even when a single RTT value is used.
    • But RTT data are hard to collect.
  • Geography provides a reasonable indicator of low RTT within the US.

    • This can be done at no cost to the network.
  • AS path length has no predictive power when selecting for low RTT.





