Web traffic characterization: an assessment of the impact of caching documents from the NCSA's web server
We analyze two days of queries to the popular NCSA Mosaic server to assess the geographic distribution of the transaction requests. The wide geographic diversity of query sources presents a strong case for deploying geographically distributed caching mechanisms to improve server efficiency.
The NCSA web server consists of four servers in a cluster. We show time series of bandwidth and transaction demands for the individual servers, as well as aggregated statistics for the cluster as a whole.
We then break down the transaction load by geographic area, specifically nine zones within the United States. We analyze the impact of caching the results of repetitive queries from the same areas, in terms of the reduction in flows from the main server. We present metrics that reflect the implications of such savings if there were usage-based costs for the backbone connecting several cache sites. We investigate multiple timeouts for the cached objects and visualize the results.
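To make the caching analysis concrete, the following is a minimal sketch of how a per-zone cache with a fixed timeout can be replayed against a request log. It assumes that every request is already tagged with one geographic zone, that each zone has a single cache, and that a cached object expires a fixed `ttl` seconds after it was last fetched from the main server. The function names, the log tuple layout, and the zone labels (`zone-A`, `zone-B`) are hypothetical illustrations, not the authors' actual methodology or data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical log record: (timestamp_seconds, zone, url, size_bytes)
Request = Tuple[float, str, str, int]


def simulate_zone_caches(requests: Iterable[Request], ttl: float) -> Dict[str, dict]:
    """Replay time-ordered requests through one TTL cache per zone.

    A miss (object never cached in that zone, or cached copy older than ttl)
    is counted as a flow from the main server; a hit is served locally.
    """
    cached_at: Dict[Tuple[str, str], float] = {}  # (zone, url) -> fetch time
    stats = defaultdict(lambda: {"requests": 0, "misses": 0,
                                 "bytes_total": 0, "bytes_from_server": 0})
    for ts, zone, url, size in requests:
        s = stats[zone]
        s["requests"] += 1
        s["bytes_total"] += size
        key = (zone, url)
        fetched = cached_at.get(key)
        if fetched is None or ts - fetched >= ttl:
            # Miss: fetch from the main server and (re)start the timeout.
            s["misses"] += 1
            s["bytes_from_server"] += size
            cached_at[key] = ts
    return dict(stats)


def byte_savings(stats: Dict[str, dict]) -> float:
    """Fraction of bytes that did not have to come from the main server."""
    total = sum(s["bytes_total"] for s in stats.values())
    from_server = sum(s["bytes_from_server"] for s in stats.values())
    return (total - from_server) / total if total else 0.0


# Tiny illustrative log (invented for the example), sorted by timestamp.
SAMPLE_REQUESTS = [
    (0, "zone-A", "/index.html", 1000),
    (5, "zone-B", "/index.html", 1000),
    (10, "zone-A", "/index.html", 1000),
    (400, "zone-A", "/index.html", 1000),
]

if __name__ == "__main__":
    # Sweep several timeouts, mirroring the idea of comparing cache lifetimes.
    for ttl in (60, 300, 3600):
        stats = simulate_zone_caches(SAMPLE_REQUESTS, ttl)
        print(f"ttl={ttl:>5}s  byte savings={byte_savings(stats):.2f}")
```

Counting savings in bytes rather than only in transactions is a design choice here: if backbone usage were billed by volume, as the abstract contemplates, a usage-based cost would be roughly proportional to `bytes_from_server`.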
We also discuss future issues that caching will inevitably force us to face, such as mechanisms for redirecting queries initially destined for a central server to a preferred cache site. The preferred cache site may be a function of not only geographic proximity, but also current load on nearby servers or network links. Such refinements in the web architecture will be essential to the stability of the network as the web continues to grow, and operational geographic analysis of queries to archive and library servers will be fundamental to its effective evolution.