
Distance metrics in the Internet

Bradley Huffaker, Marina Fomenkov, Daniel J. Plummer, David Moore and k claffy
CAIDA, SDSC, UC San Diego {bradley,marina,djp,info,kc}@caida.org

IEEE International Telecommunications Symposium (ITS2002)

overview

server selection problem

data used in this study

distance metrics

metric success rates

conclusions

server selection

Many Internet services are provided by multiple servers.

Clients want to select a server that optimizes their access to
a given service.

Possible optimizations:
server load, available bandwidth, loss rate, transit time.

We will look at the problem of optimizing minimum transit time
or Round Trip Time (RTT).

Approach for selecting minimum RTT

A common solution to this problem is a selection system
which is local to a client and can do the selection for it.

This system requires a metric of distance in order to sort
the list of potential servers.

We will present four metrics which represent the distance between
two nodes on the Internet.
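
The selection step described above amounts to sorting the candidate servers by a distance metric. A minimal sketch, assuming the metric is available as a function from server to distance; the name `rank_servers` and the host names in the usage note are hypothetical and not part of the study:

```python
def rank_servers(servers, distance):
    """Return the servers sorted from nearest to farthest.

    `distance` is any callable mapping a server to its metric value
    (hypothetical interface, chosen for illustration).
    """
    return sorted(servers, key=distance)
```

For example, with `distances = {"a.example": 80.0, "b.example": 20.0}`, calling `rank_servers(list(distances), distances.get)` places `b.example` first. Any of the four metrics could be plugged in as `distance`.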

study background

data: CAIDA Macroscopic Topology Project
- IP forward path topology and RTT
- Collected at 9 monitors around the world.
- Continuously monitors many thousands of destinations

methodology: success rate

Providing a single value which represents the rate
at which a metric successfully predicts low RTT.

data: CAIDA Macroscopic Topology Project

monitors
- trace
  - forward IP path and RTT between monitor and destination
- cycle
  - a single run through the destination list

destination lists
- DNS clients
  - one DNS client per routable prefix
  - 8 to 14 cycles per day
  - 58,000 destinations
- IPv4 list
  - one IP address per /24 in routable prefix
  - 1 cycle per day
  - 300,000 destinations

methodology: success rate

For each pair of traces we compare RTTs and the metrics.
- lower metric & lower RTT = success
- lower metric & higher RTT = failure
- equal metric = useless
- RTTs are never equal.

For each metric, we count the total number of success, failure,
and useless trials.
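
The counting scheme above can be sketched as follows. This is an illustration only, not the code used in the study; the `Trace` tuple, its field names, and the function `tally` are hypothetical. It relies on the statement that RTTs are never equal.

```python
from itertools import combinations
from typing import NamedTuple


class Trace(NamedTuple):
    metric: float  # value of the distance metric for this trace (hypothetical field)
    rtt: float     # measured round trip time in ms (hypothetical field)


def tally(traces):
    """Count success, failure, and useless trials over all pairs of traces."""
    counts = {"success": 0, "failure": 0, "useless": 0}
    for a, b in combinations(traces, 2):
        if a.metric == b.metric:
            # Equal metric values cannot rank the pair.
            counts["useless"] += 1
        elif (a.metric < b.metric) == (a.rtt < b.rtt):
            # The trace with the lower metric also has the lower RTT.
            counts["success"] += 1
        else:
            counts["failure"] += 1
    return counts
```

The sketch returns raw counts only; how the counts are combined into a single success rate is left to the caller.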

distance metrics

metrics

IP path length (number of IP addresses)

The number of routers, represented by their IP address.

AS path length (number of ASes)

The number of ISPs, represented by their AS.

geographic distance (km)

The great circle distance from client to the server.

median RTT (ms)

The median RTT sampled from midnight GMT to midnight
GMT on the previous day.

metric description

possible deployable system

A system which could be created to give end users
access to a given metric.

our approximation

The method used to estimate a given metric in our study.

IP path length

possible deployable system
- Shortest path found between client and server in an IP graph.
- This IP graph can be built by a remote system and shared
  between multiple distance resolvers.

our approximation
- The actual forward IP path seen in the Internet.
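
The shortest-path lookup described for the deployable system could be sketched as a breadth-first search. This assumes the IP graph is available as a dictionary mapping each address to the addresses reachable in one hop; that representation and the name `ip_path_length` are assumptions made for illustration. The function counts hops (edges); counting routers instead shifts every value by a constant and does not change the ranking.

```python
from collections import deque


def ip_path_length(graph, client, server):
    """Return the hop count of the shortest path, or None if unreachable.

    `graph` maps an IP address to an iterable of next-hop addresses
    (hypothetical representation of the shared IP graph).
    """
    if client == server:
        return 0
    seen = {client}
    queue = deque([(client, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == server:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None
```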

AS path length

possible deployable system
- AS paths can be collected from Border Gateway Protocol
  (BGP) announcements.
- This information is already distributed by the Internet routers
  and so would not introduce additional traffic to the network.

our approximation
- We could not collect BGP data for all our monitors.
- So we converted our IP paths to AS paths using information collected by Oregon's Routeviews Project (www.routeviews.org).
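
One way such a conversion could work is a longest-prefix match of each hop against a prefix-to-origin-AS table derived from BGP data. The slides do not describe the exact procedure, so the sketch below is an assumption: the table layout, the function names, and the choice to skip unmapped hops are all hypothetical.

```python
import ipaddress


def origin_as(ip, prefix_table):
    """Longest-prefix match of `ip` against (network, asn) pairs; None if unmatched."""
    addr = ipaddress.ip_address(ip)
    best = None
    for network, asn in prefix_table:
        if addr in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, asn)
    return best[1] if best else None


def as_path(ip_path, prefix_table):
    """Map each hop to its origin AS and collapse consecutive repeats.

    Hops with no matching prefix are skipped (an assumption of this sketch).
    """
    path = []
    for ip in ip_path:
        asn = origin_as(ip, prefix_table)
        if asn is not None and (not path or path[-1] != asn):
            path.append(asn)
    return path
```

The AS path length is then `len(as_path(ip_path, prefix_table))`. The linear scan keeps the sketch short; a real table with many prefixes would call for a trie or similar structure.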

geographic distance

possible deployable system
- A service that knows the geographic location of IP addresses can
  be used to find the location of servers.
- The location of the client should already be known (or can be
  retrieved from the same geographic service).
- Then the distance between these two points can be calculated.

our approximation
- The location of the skitter monitors is already known.
- We used a commercial geographic service, IxMapper, to find
  geographic locations.
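
Once both locations are known, the great circle distance can be computed with the haversine formula. A minimal sketch, assuming coordinates in decimal degrees and a spherical Earth with a mean radius of 6371 km; the function name and the constant are choices made here, not taken from the study:

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```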

geographic distance and RTT

three clusters of high density on both RTT and geography

West Coast, East Coast, Europe/Asia

median RTT

median RTT from samples of the previous day

possible deployable system
- A set of monitors which systematically sample the RTT to possible clients.
- This monitoring increases traffic on the network.
- Due to high variability in RTT values, previous values cannot
  precisely predict the next RTT value.

our approximation
- We used our sampled data from the previous day and calculated
  median value from these samples.

percentage of successful trials

- median RTT provides over 90% success rate
- geographic distance provides 75% success within the US,
  but only two of the five non-US monitors achieved this
- IP path length is a little better than random
- AS path length is only as good as random

stability of results

all metrics highly stable, with only minor local fluctuation
IP path length, AS path length, geographic distance, median RTT

RTT based metrics

median RTT

the value in the middle of the sorted list of values

single value RTT

a single RTT value previously observed

average RTT

the sum of all values divided by the number of values
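
The three RTT based metrics can be sketched with the standard library. The function names are hypothetical, and taking the most recent sample as the "single value" is an assumption of this sketch; the slides only say the value was previously observed.

```python
import statistics


def median_rtt(samples):
    """Middle value of the sorted samples (ms)."""
    return statistics.median(samples)


def single_value_rtt(samples):
    """One previously observed RTT; here the most recent sample (an assumption)."""
    return samples[-1]


def average_rtt(samples):
    """Sum of all samples divided by the number of samples (ms)."""
    return statistics.mean(samples)
```

With samples such as `[42.0, 40.0, 250.0, 41.0, 43.0]`, the single outlier pulls the average well above the median, which stays at 42.0.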

RTT accumulation

the success rate of RTT metrics as number of cycles increases

median has the greatest success up to a 24 hour period

taking a single RTT value near the current time of day
is more effective than averaging all values in between

storing RTT values for longer than a 24 hour period does not
improve the success rate of the median RTT metric

conclusions

RTT based metrics provide a high success rate for server ranking.
- No more than 24 hours of data should be collected.
- The success rate is better than that of the next best metric
  even when a single RTT value is used.
- But these are hard to collect.

Geography provides a reasonable indicator of low RTT within the US.
- This can be done at no cost to the network.

AS Path length has no predictive power when selecting low RTT.

File translated from TeX by TtH, version 2.92.
On 17 Sep 2002, 11:04.
