Web traffic characterization: an assessment of the impact of caching documents from the NCSA's web server
H.-W. Braun and K. Claffy
Cooperative Association for Internet Data Analysis - CAIDA
San Diego Supercomputer Center,
University of California, San Diego
We analyze two days of queries to
the popular NCSA Mosaic server to assess
the geographic distribution of the
transaction requests.
The wide geographic
diversity of query sources presents a strong
case for deployment of
geographically distributed caching mechanisms
to improve server efficiency.
The NCSA web server consists of
four servers in a cluster. We
show time series of bandwidth and transaction
demands for the individual servers, as well as
aggregated statistics for the server cluster.
We then break down the transaction load
by geographic areas, specifically nine
zones within the United States.
We analyze the impact of caching the results of
repetitive queries from the same areas,
in terms of reduction of flows from the main server.
We present metrics that reflect the
implications of such savings if there were
usage-based costs for using the backbone
that would connect several cache sites.
We investigate multiple timeouts for
cached objects, and visualize the results.
We also discuss future issues that
caching will inevitably force us to face,
such as mechanisms for redirecting queries initially
destined for a central server to a preferred
cache site. The preferred cache site may be a function
of not only geographic proximity, but also
current load on nearby servers or network links.
Such refinements in the web architecture
will be essential to the stability of the
network as the web continues to grow,
and operational geographic analysis of
queries to archive and library servers will
be fundamental to its effective evolution.
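
The caching analysis described above (repeated queries from the same area, served from a cache until a timeout expires) can be illustrated with a minimal sketch. It is not the procedure used in the study. The function name, the record layout (timestamp, zone, URL, size), and the rule that an object expires a fixed time after it was fetched from the main server are all assumptions made for illustration.

```python
from collections import defaultdict


def simulate_zone_caches(requests, timeout):
    """Replay a request log against one hypothetical cache per zone.

    requests: iterable of (timestamp_seconds, zone, url, size_bytes),
              assumed to be sorted by timestamp.
    timeout:  seconds a cached object stays valid after it was fetched
              from the main server (assumed expiry rule).

    Returns {zone: (hits, misses, bytes_saved)}.
    """
    fetched_at = {}                        # (zone, url) -> time of last origin fetch
    stats = defaultdict(lambda: [0, 0, 0])  # zone -> [hits, misses, bytes_saved]

    for ts, zone, url, size in requests:
        key = (zone, url)
        last_fetch = fetched_at.get(key)
        if last_fetch is not None and ts - last_fetch <= timeout:
            # Served by the zone's cache: no flow from the main server.
            stats[zone][0] += 1
            stats[zone][2] += size
        else:
            # Absent or expired: fetch from the main server and re-cache.
            fetched_at[key] = ts
            stats[zone][1] += 1

    return {zone: tuple(counts) for zone, counts in stats.items()}


# Hypothetical four-request log with two zones.
sample_log = [
    (0, "west", "/a", 100),
    (10, "west", "/a", 100),
    (20, "east", "/a", 100),
    (100, "west", "/a", 100),
]
short_timeout = simulate_zone_caches(sample_log, timeout=60)
long_timeout = simulate_zone_caches(sample_log, timeout=1000)
```

Running the same log with several timeout values, as in the last two lines, shows how a longer timeout turns more repeated requests into cache hits, at the cost of serving older copies.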