CAIDA's Annual Report for 1998
Disclaimer: This page was written in 1998 and is made available for historical purposes. Due to the changing nature of the internet, many of the links referenced below are no longer valid.
A report on CAIDA research initiatives, project progress and results, data sets, tool development, publications, presentations, workshops, web site statistics, funding sources, and operating expenses for 1998.
I. SUMMARY
The National Science Foundation (NSF) is providing a seed-fund grant to the University of California, San Diego (UCSD) to establish the Cooperative Association for Internet Data Analysis (CAIDA), grant number NCR-9711092. This report is a summary of CAIDA's progress during its first 11 months under the leadership of CAIDA's principal investigator, Dr. K.C. Claffy.
CAIDA is a collaborative undertaking between government, academia and commercial firms to promote greater cooperation in the engineering and maintenance of a robust, scalable global Internet infrastructure. Having NSF support for this emerging industry-focused effort assists in promoting balance among the needs of the various communities (private, research, government, and users) and facilitating the near term development and deployment of critical measurement technology and techniques.
CAIDA is designed to address complex, multidimensional needs, spanning technical, research, and organizational boundaries. It provides a neutral forum for organizations to collaborate directly with CAIDA staff and/or with one another to address current and future operational and engineering requirements. CAIDA also provides an environment conducive to the collection of routing data and passive and active measurement data from throughout the infrastructure and aggregation of these data to form a baseline for comparison of individual network's performance against industry norms. CAIDA also hopes to develop a historic traffic archive of proprietary and public data permitting CAIDA staff to analyze macro-level Internet trends relating to traffic and routing behavior.
During CAIDA's initial months, the focus has been on (1) developing the infrastructure required to support this effort, including legal documentation, staffing resources, and equipment; and (2) initiating projects reflective of the type of work that will be accomplished under CAIDA's auspices.
CAIDA's current priorities are developing and deploying traffic measurement and analysis tools, and collaborating with staff of the National Laboratory for Applied Network Research's Measurement and Operations Analysis Team (NLANR/MOAT) on the collection of traffic data and the analysis of those data according to research and member requirements. Details on CAIDA's background, goals, collaborators, and projects are available on the CAIDA website, https://www.caida.org.
This report is divided into four sections:
- Organizational Progress
- Technical Progress
- Outreach
- Next Steps
II. ORGANIZATIONAL PROGRESS
1.0 Agreements
During its first year, CAIDA emphasized development of the appropriate legal and organizational structure necessary to permit it to operate as an entity within the University of California. While UCSD supports CAIDA's goals, the private sector focus of the project delayed CAIDA's implementation due to requirements for interpretation of university policies (as they pertain to CAIDA) and negotiation of appropriate language with university officials to ensure compliance with university regulations. Key documents and agreements developed during the first year are described below.
1.1 Membership Agreement & Overhead - CAIDA staff finalized the Membership Agreement in April 1998 (see https://www.caida.org/Caida/join.html). In addition to working with UCSD officials on the agreement language, CAIDA places significant priority on a low overhead on funds derived through CAIDA corporate participation. An effective overhead rate of 14% will be applied to this service agreement (versus the university's standard overhead of 51.5% on research agreements). Based on discussions with private sector collaborators, this reduced overhead is an important assurance that the majority of commercial participants' funds will apply directly to CAIDA initiatives.
1.2 SD-NAP - In April, CAIDA announced plans to host a Network Access Point in San Diego, the SD-NAP. The participation agreement for this facility is available at https://www.caida.org/Docs/sdparticipation.html. Since SD-NAP is not intended as a profit-making operation and will serve as a real-world environment in which CAIDA research can be conducted, UCSD plans to waive overhead on this facility.
1.3 Proprietary Data Agreement - Participation of private sector firms in CAIDA's operations is in part contingent on CAIDA's being able to protect the privacy of the companies and confidentiality of their traffic and other data. The Proprietary Data Agreement (see https://www.caida.org/Docs/proprietary.html) vests ownership of monitored traffic data with the participating organizations, while reserving CAIDA's right to analyze the data. CAIDA hopes that participating firms may be willing to share some data with third parties; however, any such transfers will be directly between the owner of the data and the potential user.
1.4 Copyright Notice - CAIDA's policy is to make source code for its tools publicly available; however, it is our intention that use of our products be in accordance with the globally accepted terms and conditions of the GNU General Public License for public software. The copyright notice that CAIDA places on all tools and scripts developed under our auspices is available at https://www.caida.org/Docs/copyright.html.
2.0 Staffing
When CAIDA officially began (October 1, 1997), five individuals who had been supporting the National Laboratory for Applied Network Research (NLANR) transitioned to CAIDA tasks. Staff of CAIDA and NLANR's Measurement and Operations Analysis Team (MOAT) continue a tradition of close collaboration in the acquisition and analysis of Internet traffic data.
As of August 1, 1998, the CAIDA and NLANR staff includes 33 individuals. Principal Investigators are kc claffy (CAIDA), Hans-Werner Braun (NLANR-MOAT), and Duane Wessels (NLANR-Cache). Tracie Monk manages the group. Current sabbatical personnel include Dr. Evi Nemeth (University of Colorado, Boulder) who is heading CAIDA's Internet Engineering Curriculum Repository project and Dr. Tony McGregor (University of Waikato, New Zealand) who is designing NLANR/MOAT's active measurement architecture. Full time CAIDA and NLANR staff include: Nancy Bachman, Amy Blanchard, Glenn Chisholm, Brad Huffaker, Sean McCreary, Daniel McRobb, David Moore, Alex Rousskov, Brendan White, and Jenniffer Woodson. Interns include: Karin Braun, Jeff Brown, Jambi Ganbar, Oliver Jakubiec, Mary Kalra, Esmond Lee, Alex Ma, Kai Tang, Mike Tesch, and Michael Young. SDSC staff working with CAIDA part-time include: Harry Ammons (on video production), Mike Gannis (technical writing), David Nadeau (visualization), and Glenn Sager (security).
Foreign interns currently include Jaeyeon Jung from Korea's Advanced Institute for Science and Technology (KAIST). Yamamoto Shigeru, from the Internet Initiative of Japan, Inc. will join CAIDA's staff in October 1998.
3.0 Collaborations
3.1 NLANR/MOAT - The technical scope of work of CAIDA and NLANR bears many similarities. Both involve the acquisition and analysis of traffic data and development of related tools. Many of SDSC's staff share responsibilities under both projects. NLANR/MOAT is currently focusing on the deployment of Coral (OC3mon and OC12mon) monitors and CAIDA staff is focusing on the analysis tools and scripts to use these data. Details on the NLANR Measurement and Operations Analysis Team effort led by Hans-Werner Braun of UCSD/SDSC are available at http://moat.nlanr.net.
3.2 NLANR/Cache - Web caching has been popular internationally for several years due to the significant inter-continental bandwidth savings it provides. Only recently, however, has caching gained favor in the United States among commercial providers and large institutional users. In response to these trends, CAIDA and NLANR/Cache staff are collaborating with MAE-West and several participating ISPs (most notably Zocalo) to implement a prototype commercial cache model at the MAE-West Exchange Point. The collaboration began in December 1997. Unlike the successful NLANR Global Cache Hierarchy 'research' model, this new cache model requires participating ISPs to be responsible for their own HTTP misses, i.e., to provide their own transit to resolve HTTP misses rather than relying on the vBNS or another third-party transit provider. The prototype, described at http://ircache.nlanr.net/Cache/mae-west/, now has 33 networks peering with the cache's router. Additional details on the NLANR-Cache effort are available at http://ircache.nlanr.net/. Details on the proposed CAIDA Cache Working Group and 1999 Web Caching Workshop are described in the Next Steps section of this report.
3.3 NGI/DARPA - In July 1998, the Defense Advanced Research Projects Agency (DARPA) awarded UCSD/CAIDA $2.4 million for a Next Generation Internet (NGI) initiative. This effort focuses on monitoring, depicting, and predicting traffic behavior on current and advanced networks, through developing and deploying tools to better engineer and operate networks and to identify traffic anomalies in real time. CAIDA will concentrate efforts on: developing tools to automate the discovery and visualization of Internet topology and peering relationships (through application of the CAIDA-developed Skitter tool); monitoring and analyzing Internet traffic behavior on high speed links (through the development of OC48 and Light monitors); detecting and mitigating threats (through security application of the Coral monitors); and providing for storage and analysis of massive volumes of traffic data. This project leverages existing NSF-supported work, including efforts by the MCI/vBNS team, CAIDA and NLANR on the development and deployment of Coral monitors (OC3mon and OC12mon); deployment of the Skitter performance tool (primarily with NSF funding); and analysis/visualization techniques developed through NSF support of CAIDA. CAIDA's home page on these efforts is available at https://www.caida.org/NGI.
3.4 ARTS File Format - In February 1998, ANS Communications Inc. made an important engineering data storage and analysis program - ARTS - available to the community through licensing the software to CAIDA. Development of the ARTS binary file format specification was initiated in 1992 by David Bolen of ANS, as part of ANS support of the NSFnet Backbone Network Service. Daniel McRobb (previously with ANS, now with CAIDA) has led ARTS enhancements since 1993, including integration of ARTS into other ANS public code products such as CflowD. ARTS efficiently stores several types of IP network traffic data, ranging from traffic flow information to performance data. As per the terms of this agreement, CAIDA will continue to enhance the ARTS code and will sublicense it to other users in the community. The press release on ARTS is available at https://www.caida.org/Caida/pr-arts.html.
3.5 SD-NAP - In April 1998, CAIDA announced the formation of a Network Access Point. SD-NAP is hosted by CAIDA at the San Diego Supercomputer Center on the campus of UCSD, with support by both SDSC and the Packet Clearing House (PCH). The primary purpose of the SD-NAP is to facilitate efficient interconnection of Internet Protocol transit networks within and to the San Diego Local Access and Transport Area, California LATA 6. The secondary purpose of the SD-NAP is to provide a platform for traffic analysis by CAIDA researchers with the goal of promoting a robust, scalable global Internet infrastructure. The website https://www.caida.org/Caida/caidaix.html describes the goals and terms and conditions for participation in the SD-NAP. Connection to the SD-NAP is open to all data network service providers and research and educational organizations whose networks use the conventions required for connection to the Global Internet, as defined by the Internet Engineering Task Force (IETF) in the specifications for interconnectivity called Requests for Comments (RFCs). We anticipate that the primary use of SD-NAP will be for BGP peering between organizations for the purpose of exchanging local traffic. Initial participants at SD-NAP include: Accentric, MillenniaNet, American Digital Network, CSUnet, CTS Networking Services, Electriciti, Epoch Internet, iDREN, SDSC, SDSU, UCSD, Zocalo/PCH.
3.6 One-Way Performance Measurements - In April 1998, Waikato University's Stephen Donnelly visited CAIDA to install a Global Positioning System (GPS) antenna for active measurements between SDSC and New Zealand. The results of these ongoing tests are posted on a Waikato web site at http://atm.cs.waikato.ac.nz/atm/delay/.
3.7 CflowD - CflowD is experimental software used to collect data from Cisco's flow-export feature, NetFlow. Daniel McRobb developed this code while at ANS Communications Inc. It is now running in production to analyze NetFlow data from numerous commercial providers' routers. Information on the ANS implementation of CflowD is available at http://engr.ans.net/cflowd/. CAIDA is working with the community (especially Cisco and UUNET/ANS) to continue enhancements to this code. The next version release will be made available on a CAIDA CflowD web page in late 1998.
3.8 Multicast - CAIDA staff (Nancy Bachman and Evi Nemeth) are working with Bill Fenner (Xerox-PARC) to identify sources of duplication of multicast packets over the global multicast backbone (MBONE). Fenner collects data on duplicate packets arriving at a PC router at Xerox PARC. Preliminary analysis of these data involved visualizing the patterns of duplicate packet arrival times to determine whether patterns (as viewed by an end receiver of multicast services) provide indications as to the source and cause of the duplications. Fenner's description of the problem and the data are available at http://sandbox.parc.xerox.com/fenner/mcast/dups.html. CAIDA's analyses should be available in late 1998.
In related areas, CAIDA staff are involved in an IETF experiment with David Meyer (Cisco) and several ISPs (notably Kevin Thompson of MCI/vBNS) to deploy native multicast using PIM-SM and to interconnect those ISPs using MBGP at three "multicast-friendly exchange points". Meyer described the scope of this effort at the April 1998 (Los Angeles) meeting of the IETF's IDMR Working Group.
3.9 Visualizations - Visualization of Internet data is still in the nascent stages of development. Early prototypes were initiated in 1996 as part of a UCSD collaboration with Tamara Munzner of Stanford University. Collaborations with Munzner are continuing, as are visualization efforts with Bill Cheswick (Lucent/Bell Laboratories) and Hal Burch of Carnegie Mellon University (a 1998 summer intern with Lucent/Bell Laboratories).
3.10 Cache Visualization - The Plankton visualization tool (described below) is a collaboration between CAIDA and NLANR/Cache staff. In addition, Jaeyeon Jung, an intern from Korea's KAIST institute and representative of the Asia Pacific Advanced Networking (APAN) organization, is a co-developer of the tool.
3.11 Equipment Loans/Donations - Several organizations are participating in CAIDA projects through provision of testing and operational equipment, including Cisco, Digital, Mae-West, Sun, and Zocalo. Equipment loaned or donated during CAIDA's first year includes:
- Cisco 2500 router for SD-NAP
- 3-COM bridge for SD-NAP
- Cisco 2503 ISDN router for use in CAIDA's Ann Arbor, MI operations
- Cisco GSR 12004 router (OC12) for use in the CaidaLab
- Cisco Lightstream 1010 switch (OC12, OC3) for use in the CaidaLab
- Sun Enterprise 450 (4.5GB disk, 512MB RAM) for use in CAIDA's data acquisition/analysis effort
- Digital Alphaserver 1200 5/533 with Dual 533 MHz processors, 1 GB RAM, 80 GB disk for caching research
- 2 Cisco Transparent Cache Engines (CE2050s) and a Cisco 7200 router for caching research
- 5 active measurement hosts, located at Boulder, Pittsburgh, San Diego, San Jose, and Urbana-Champaign
- DNS server for use in the MAE-West Cache project
In addition, both Digital (through the PAIX) and Verio donated connectivity to remote CAIDA researchers.
3.12 Internet Engineering Curriculum (IEC) Repository - CAIDA initiated the Internet Engineering Curriculum Repository project in 1998 under the leadership of Dr. Evi Nemeth and Dr. Claffy. This collaborative project involves the coordination and operation of a Web-based 'living repository' for Internet engineering. An annual report on its progress is due late August 1998 and will be linked from the IEC web page (see http://iec.caida.org).
III. TECHNICAL PROGRESS
1.0 Tools
1.1 Mapnet - Mapnet is a tool for visualizing the backbone infrastructure of major Internet Service Providers simultaneously, and for updating and correcting information that may be invalid or out of date. The tool's accuracy relies on the cooperation of providers in keeping the information accurate. The tool also includes a module for determining performance or routing information about components of the infrastructure amenable to measurement. UCSD/SDSC staff initiated work on this Java-based tool as part of the NSF-funded National Engineering Grant (NEG) award to UCSD, and continued its development in late 1997 under the auspices of CAIDA. The author of the tool is Bradley Huffaker. The Mapnet web page is now a popular site routinely receiving more than 2,000 hits per day. The tool, including source code, is available at https://www.caida.org/catalog/software/mapnet/. A video describing Mapnet is under development by CAIDA staff member Oliver Jakubiec and SDSC's Harry Ammons.
1.2 Plankton - CAIDA released Plankton, another Java-based tool, in Spring 1998. The tool offers a visualization of the NLANR Global Cache Hierarchy topology as seen from the perspective of the NLANR root caches. Plankton can depict the hierarchy either in geographic form or as a logical topology. Performance of the hierarchy, as viewed by the root caches, is illustrated in the form of the rates of URL hits or misses. The tool also permits the user to depict caches according to domain (e.g., .com, .edu, .us, .nz, .au) and to simulate changes in the relationships of the hierarchy's caches using information from the NLANR cache database. The NSF-funded NLANR Cache Hierarchy Project, see http://ircache.nlanr.net, collaborated with CAIDA in the development of this tool. Information on the Plankton tool, including source code, is available at https://www.caida.org/catalog/software/plankton. Plankton's developers, Bradley Huffaker and Jaeyeon Jung, described the tool in a presentation at the Third International Web Caching Workshop in Manchester, England in June 1998. The resulting paper (see: https://www.caida.org/catalog/software/plankton/paper/turnin/plankton/) will be published in a special issue of Computer Networks and ISDN Systems in late 1998.
Plankton Image - Logical Topology https://www.caida.org/catalog/software/plankton
1.3 Manta - CAIDA's multicast collaborations with Bill Fenner, Xerox PARC, led to the development of the tool known as Manta. This Java-based tool is a follow-on to work in 1996 between Claffy and Tamara Munzner (Stanford) relating to the PlanetMulticast visualization, illustrated below.
Early multicast image developed by Munzner/Claffy in 1996 -- see http://oceana.nlanr.net/PlanetMulticast
Manta visualizes the multicast backbone, including both PIM (native multicast) connections and IP multicast tunnels. A draft paper describing the functionality of the Manta tool and its relevance for analyzing global infrastructures is under development by Bradley Huffaker, K.C. Claffy and Evi Nemeth. Information on Manta, including source code, is available at https://www.caida.org/tools/manta.
1.4 Otter - In developing the above tools, it became apparent that each aspect of the Internet's infrastructure has visualization opportunities (and challenges). In Spring 1998, Bradley Huffaker initiated efforts to develop a tool capable of visualizing arbitrary (generically non-strictly formatted) Internet data sets. In July 1998, we released a beta version of the resulting tool, Otter. Otter currently visualizes a range of data types, from vBNS SNMP data (see https://www.caida.org/catalog/software/otter/vbns/), to a web tree (see https://www.caida.org/catalog/software/otter/htmlcrawl/), to relationships among Autonomous Systems (see https://www.caida.org/catalog/software/otter/AS/), to CAIDA's active measurement data (see https://www.caida.org/catalog/software/otter/Skitter/). Otter's versatility makes it a critical addition to CAIDA's visualization tools.
Logical Depiction of vBNS Routers Using the Otter Tool https://www.caida.org/catalog/software/otter
1.5 Skitter - In early 1998, CAIDA's Daniel McRobb began development on a new measurement tool for conducting infrastructure-wide measurements. Skitter tracks the forward IP path (the "hops") from a source to many destinations. Key features of Skitter include:
- Measurement of round trip time and path to thousands of destinations - Skitter measures the IP path to a destination in a manner similar to traceroute: it increments TTL when sending packets to a destination and records the router that replies at each TTL until a TTL sufficient to reach the destination is used. Skitter uses ICMP echo requests as probes, unlike the default of UDP used by traceroute.
- Visibility and frequency of routing changes - Low-frequency persistent routing changes (as opposed to things like per-packet equal-cost multipath) are visible through Skitter measurements; often a step function in the RTT indicates a change in either forward or reverse path, and a change in the forward path is visible in the Skitter data.
- Network connectivity (characteristics of directed graph from a source) is a fundamental goal in the initial design of Skitter. By probing the path to many destinations spread throughout the IPv4 address space, CAIDA is trying to get a picture of the directed graph from a source to much of the Internet.
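The TTL-stepping technique described in the first feature above can be illustrated with a short Python sketch. It is a simulation against a made-up path, not Skitter code and not a real prober: no ICMP packets are sent, and the names SIMULATED_PATH, send_probe, and trace are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical forward path from a source to one destination. The last entry
# is the destination itself. These are documentation-range addresses, not data.
SIMULATED_PATH = ["192.0.2.1", "198.51.100.7", "203.0.113.9", "203.0.113.80"]


def send_probe(path: List[str], ttl: int) -> Tuple[Optional[str], bool]:
    """Simulate one probe and return (replying address, reached_destination).

    A router that decrements the TTL to zero answers with "time exceeded";
    the destination answers the echo request itself.
    """
    if ttl < 1:
        return None, False
    if ttl >= len(path):
        return path[-1], True
    return path[ttl - 1], False


def trace(path: List[str], max_ttl: int = 30) -> List[Optional[str]]:
    """Increment the TTL from 1 and record the replying address at each step."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        addr, reached = send_probe(path, ttl)
        hops.append(addr)
        if reached:
            break  # a TTL sufficient to reach the destination was used
    return hops


if __name__ == "__main__":
    for hop_number, addr in enumerate(trace(SIMULATED_PATH), start=1):
        print(hop_number, addr)
```

Repeating such a trace toward many destinations and merging the recorded hops would give the directed graph from one source that the third feature describes.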
Image created by Hal Burch (CMU/Lucent) using Skitter Data
https://www.caida.org/catalog/software/skitter
CAIDA is currently monitoring more than 20,000 destination hosts (mostly web servers) from around the globe, from five source machines in the U.S. Initial visualizations are being developed in collaboration with Hal Burch, Carnegie Mellon University (an intern with Lucent Technologies). Burch's image above depicts IP paths to roughly 20,000 destinations from a single source in Ann Arbor, Michigan (USA). The lighter colors represent links closer to the source and darker colors are used for links further from the source (by hop count).
The Skitter graphic below depicts the connectivity between key backbones. This graphic is being used on the reference workbook for the ISMA workshop August 31-September 1, 1998.
Image of ISP Backbone Connectivity reflecting Skitter Path Data
https://www.caida.org/catalog/software/skitter/viz9808.html
More details on Skitter and on its deployment as part of CAIDA's Next Generation Internet project funded by the Defense Advanced Research Projects Agency (DARPA) are available at https://www.caida.org/catalog/software/skitter/.
1.6 Other Tools - CAIDA is continuing exploration of alternative measurement and visualization tools. MapRoute is an alpha-version tool displaying a geographic representation of a traceroute output. The lack of data on latitude and longitude coordinates for major Internet nodes significantly hinders the usefulness of tools like MapRoute. (See https://www.caida.org/tools/iptll.html on how ISPs can help.) CAIDA's visualization staff are currently exploring programming alternatives associated with displaying 3-dimensional information.
2.0 Data Analysis
CAIDA is in the initial stages of analyzing Internet traffic data being gathered by NLANR/MOAT monitors and by CAIDA-deployed monitors. The sections below group this analysis by category: Global Internet Issues And Characteristics; Measurement/Visualization (Active and Passive Data); Protocol and Architecture Research and Development; Backbone Engineering: Capacity Planning; Backbone Engineering: Routing; Backbone Engineering: Advanced Topics; and Equipment Design Constraints. An online tutorial covering each of these topics will be distributed in early 1999.
2.1 Global Internet Issues and Characteristics - CAIDA is analyzing traffic traces, routing tables, and Skitter measurement data for details on the behavior of packets as they cross multiple Autonomous Systems. Information on path lengths (hops) and connectivity among networks can provide insights to questions relating to:
- How many people are using the net, and how much are they using?
- What is the average number of hops a packet travels?
- How many ASes does the 'average' Internet packet traverse?
- How distributed is Internet traffic?
- Who routes most of the traffic in the Internet?
- Who is peering where? And, how much of it is public?
- How much of the IPv4 address space is really utilized/routed?
The graphs below reflect preliminary analyses of AS Path Lengths, AS Favoritism, and IPv4 Address Space Allocations.
Preliminary Analyses of AS Path Lengths using Coral (OC3mon) Data
Analysis of Fix-West Trace against Routing Data from Oregon Route Views Project
(http://www.antc.uoregon.edu/route-views)
Data indicate significant concentration among a few ASes
Analysis of IPv4 Address Space as seen by BGP Routing Tables,
via specific clusters of AS numbers, and addresses within one AS hop from sample AS (Worldcom).
Analysis conducted in collaboration with NLANR
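As a rough illustration of the AS path length analysis mentioned above, the following Python sketch tallies path lengths from a list of AS paths. It is not the analysis code used for the graphs. The sample paths are made up (documentation-range AS numbers), and counting a prepended AS only once is an assumption of this sketch.

```python
from collections import Counter
from itertools import groupby
from typing import Iterable, List

# Made-up AS paths, one per route, written as lists of AS numbers.
SAMPLE_PATHS = [
    [64500, 64501, 64501, 64502],  # 64501 is prepended once
    [64500, 64502],
    [64500, 64503, 64502],
]


def as_path_length(as_path: List[int]) -> int:
    """Count runs of identical ASes, so a prepended AS counts once (assumption)."""
    return sum(1 for _ in groupby(as_path))


def path_length_distribution(as_paths: Iterable[List[int]]) -> Counter:
    """Map each AS path length to the number of paths with that length."""
    return Counter(as_path_length(path) for path in as_paths)


if __name__ == "__main__":
    distribution = path_length_distribution(SAMPLE_PATHS)
    for length in sorted(distribution):
        print(length, distribution[length])
```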
2.2 Measurement/Visualization (Active and Passive Data) - The types of questions being addressed here include:
- What is meant by 'flow'? Are 'flows' SYN/FIN-delimited or something else?
- What tools can collect traffic at high speeds (e.g., at least OC3), and what post-processing utilities are available?
- What forms of traffic data might ISPs be willing/wanting to collect?
- How useful are wide-area network measurement tools for intranet measurement applications?
- How many 'transit backbones' exist, and to what extent do their topologies overlap?
- What are legitimate tools for accurately measuring end-to-end performance?
- What initiatives are pursuing macroscopic views of Internet performance? What are their strengths/weaknesses?
CAIDA is working with NLANR/MOAT, the MCI/vBNS team, and Coral-Developers (individuals participating on the Coral-Dev mailing list) to develop analysis techniques using data from the OC3mon and OC12mon monitors. David Moore, Sean McCreary, Claffy and NLANR's Hans-Werner Braun are developing a library of analysis-related scripts. Scripts from this Libcoral family will be made publicly available starting late 1998.
CAIDA's active measurement work focuses on Skitter. Initial observations from a 15-minute sample run of Skitter from a host in Michigan to about 13,900 destinations resulted in packets visiting 22,756 intermediate IP addresses. Interestingly, only 165 IP addresses showed up in more than 1% of paths, suggesting that a few key routers play a huge role in global connectivity from a given source. The graphics below depict the resulting Hop Distance Distribution; Paths Per Hop Address; and Skitter Performance Measurements. (Skping is the round-trip measurement portion of the Skitter tool.)
Skitter measurement plot of Hop Distributions, June 1998
see https://www.caida.org/Presentations/Nanog698.dwm.skitter
Skitter plot of frequency of IP addresses in paths, June 1998
Skitter measurement plot of Paths Per Hop, June 1998
Examples of Skitter's skping round trip measurements, June 1998
see https://www.caida.org/Presentations/Nanog698.dwm.skitter
The last graphic illustrates measurements of routers that possess prefix cache agers - note the statistically significant performance variation. Routers running CEF do not seem to exhibit this behavior. From our initial data sets, there seems to be correlation between the outdegree of some routers and the duration of performance degradation (as reflected in sustained higher ICMP delay to the router).
Skitter measurements of routers with cache agers, June 1998
see https://www.caida.org/Presentations/Nanog698.dwm.skitter
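The observation above that only a small number of addresses appear in more than 1% of paths comes from a simple tally over the measured paths. The Python sketch below shows one way such a tally could be computed. The function names and the list-of-lists path format are assumptions for illustration, not the Skitter data format.

```python
from collections import Counter
from typing import Dict, Iterable, List


def path_share_by_address(paths: Iterable[List[str]]) -> Dict[str, float]:
    """Return, for each address, the fraction of paths that contain it."""
    path_list = list(paths)
    counts: Counter = Counter()
    for path in path_list:
        counts.update(set(path))  # count an address once per path
    return {addr: n / len(path_list) for addr, n in counts.items()}


def frequent_addresses(paths: Iterable[List[str]], threshold: float = 0.01) -> List[str]:
    """Addresses that appear in more than `threshold` of all paths."""
    shares = path_share_by_address(paths)
    return sorted(addr for addr, share in shares.items() if share > threshold)


if __name__ == "__main__":
    # Made-up paths: every path leaves through the same first-hop router.
    sample = [
        ["192.0.2.1", "198.51.100.1", "203.0.113.10"],
        ["192.0.2.1", "198.51.100.2", "203.0.113.20"],
        ["192.0.2.1", "198.51.100.1", "203.0.113.30"],
        ["192.0.2.1", "198.51.100.3", "203.0.113.40"],
    ]
    print(frequent_addresses(sample, threshold=0.25))
```

With real data the threshold would stay at the 1% used in the text; the larger value in the usage example only makes the effect visible with four paths.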
2.3 Protocol and Architecture Research and Development - Coral packet traces provide detailed insight into the nature of and trends associated with specific Internet protocols. CAIDA researchers are examining various R&D issues, including:
- How does the number of individual traffic flows/conversations change as a function of flow granularity?
- How elastic are flows?
- How does buffering affect TCP throughput?
- How relevant is current state-of-the-art TCP simulation to backbone problems?
- What is the current profile of DNS traffic, and how has/will it change over the last/next 4 years?
  [note - DNS profiling may increase in importance as DNS authentication and routing are introduced into the Net]
- What kind of hit rates/performance gains does web caching really provide the user/network?
- What does the global caching hierarchy look like?
- Has anyone articulated a business case ('ROI') for deploying caches? What metrics are relevant?
The plot below depicts the distribution of packets per flow by protocol as a function of the number of flows (roughly, conversations, or packet trains using a 64-second timeout) of that application based on an NLANR/MOAT trace sample from FIX-West.
Analysis of protocols found in FIX-West trace
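The notion of a flow as a packet train delimited by a 64-second timeout can be sketched in a few lines of Python. This is an illustration only, not the code behind the plot. Keying flows on source, destination, protocol, and ports is an assumption of this sketch, and the packet tuple format is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

FlowKey = Tuple[str, str, int, int, int]  # src, dst, protocol, sport, dport
Packet = Tuple[float, FlowKey]            # (timestamp in seconds, flow key)


def packets_per_flow(packets: Iterable[Packet], timeout: float = 64.0) -> Dict[FlowKey, List[int]]:
    """Split each key's packets into flows and return packet counts per flow.

    A new flow starts when a key is first seen or when the gap since the
    previous packet with the same key exceeds `timeout` seconds.
    """
    last_seen: Dict[FlowKey, float] = {}
    flows: Dict[FlowKey, List[int]] = defaultdict(list)
    for timestamp, key in sorted(packets, key=lambda packet: packet[0]):
        if key not in last_seen or timestamp - last_seen[key] > timeout:
            flows[key].append(0)  # open a new flow for this key
        flows[key][-1] += 1
        last_seen[key] = timestamp
    return dict(flows)


if __name__ == "__main__":
    web = ("192.0.2.1", "198.51.100.2", 6, 1025, 80)   # made-up TCP conversation
    dns = ("192.0.2.1", "198.51.100.3", 17, 1026, 53)  # made-up UDP exchange
    sample = [(0.0, web), (5.0, dns), (10.0, web), (70.0, web), (200.0, web)]
    print(packets_per_flow(sample))
```

Grouping the resulting counts by the protocol field of the key would give a packets-per-flow distribution by protocol.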
CAIDA and NLANR/Cache are collaborating on issues of caching economics. Claffy presented initial results from these efforts at the February 1998 meeting of the North American Network Operator's Group (NANOG), https://www.caida.org/Presentations/Nanog9802/. The figures below were developed by NLANR-Cache's Alex Rousskov through analysis of data from the NLANR root caches.
ROI analysis based on T1: an ISP with monthly income (2 T1s, no cache) of $32k and a sustained hit ratio of 25% would double its investment in about 2*6 = 12 months
(2 = number of connections; 6 = value of the 25% 'double' curve at x=32)
NLANR supercache latency savings: an average hit saves US users about 1 sec, European users ~2 sec
total (crude) NLANR supercache savings in response time: (sec_saved_per_hit * #hits)
assumes investment of 1 hour/day (365 days/year) to monitor cache hierarchy
2.4 Backbone Engineering: Capacity Planning - Information on the source and destination of packets (by Autonomous System) is of critical importance to Internet Backbone Providers in making decisions about capacity and peering. In addition, these forms of data can be useful in answering questions, such as:
- How are people using the Internet?
- Which is harder on Internet infrastructure (from a performance and hardware/software cost standpoint) -- web or multimedia?
- Does most traffic remain local or travel across wide area? (80/20 rule?)
- How much traffic from non-U.S. sites traverses Internet infrastructure as a transit hub/backbone?
At least two different tools (Coral and CflowD) now let you extract AS information from raw packet traces and output from Cisco's NetFlow. CflowD is a public tool developed originally by Daniel McRobb under the auspices of ANS Communications. McRobb, now with CAIDA, is continuing collaborations with ANS, Cisco's David Rowell (primary NetFlow programmer), and others in the industry, and will be releasing an enhanced version of the CflowD tool in late 1998. CAIDA researchers are using the XRT/PDS libraries from KL Group to develop 3-D plots of Coral and CflowD data.
2.5 Backbone Engineering: Routing - Internet service providers (ISPs) seek to maintain stable interactions among one another to ensure service predictability to users. Unfortunately, even several years after the transition to a commercially provided multi-backbone infrastructure, the overall routing system continues to exhibit instability, as indicated by frequent changes in routing advertisements.
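One way to picture the extraction of AS information from packet data is a longest-prefix match of each address against a routing table, followed by a tally per source and destination AS. The Python sketch below illustrates that idea only. It does not reproduce how Coral or CflowD work, and the route table, AS numbers, and packet tuples are made up.

```python
import ipaddress
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical routing table: prefix -> origin AS (documentation ranges only).
ROUTES: Dict[str, int] = {
    "192.0.2.0/24": 64500,
    "198.51.100.0/24": 64501,
    "198.51.100.128/25": 64502,
}


def origin_as(addr: str, routes: Dict[str, int]) -> Optional[int]:
    """Return the origin AS of the longest matching prefix, or None."""
    ip = ipaddress.ip_address(addr)
    best: Optional[Tuple[int, int]] = None  # (prefix length, AS number)
    for prefix, asn in routes.items():
        network = ipaddress.ip_network(prefix)
        if ip in network and (best is None or network.prefixlen > best[0]):
            best = (network.prefixlen, asn)
    return best[1] if best else None


def as_matrix(packets: Iterable[Tuple[str, str, int]], routes: Dict[str, int]) -> Counter:
    """Sum bytes for each (source AS, destination AS) pair."""
    matrix: Counter = Counter()
    for src, dst, nbytes in packets:
        matrix[(origin_as(src, routes), origin_as(dst, routes))] += nbytes
    return matrix


if __name__ == "__main__":
    sample = [
        ("192.0.2.10", "198.51.100.200", 1500),
        ("192.0.2.11", "198.51.100.5", 40),
        ("192.0.2.10", "198.51.100.201", 60),
    ]
    for pair, total in as_matrix(sample, ROUTES).items():
        print(pair, total)
```

The linear scan over prefixes keeps the sketch short; a trie or similar structure would be the usual choice for a full routing table.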
Routing related questions under review by CAIDA researchers include:
- How unstable is Internet routing? (How often do packets change paths to the same destination?)
- How much do we really know about BGP dynamics?
- Has anyone demonstrated a correlation between BGP instability and reachability?
- How is traffic distributed among addresses of given sizes?
- How well is route aggregation really happening?
- What tools/methods are available to visualize AS connectivity?
- How can one depict/visualize the difference between two routing tables?
- How asymmetric is routing?
- Is there any way to visualize routing information in real-time?
- How do operators/engineers visualize traffic going through their network infrastructure clouds?
- How do network engineers use such information (or not) to determine/tune IGP metrics within their clouds?
Analysis of 'routing churn' (new route advertisements) per network, based on Merit data from MAE-West, is depicted in the graph below.
3-D AS Matrix Plot using Merit Data from MAE-West
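Routing churn as defined above is a count of new route advertisements per network. A minimal sketch follows, assuming the update stream has been reduced to hypothetical (kind, prefix) tuples and that "network" means an announced prefix. Neither assumption reflects the actual Merit data format, and the prefixes are documentation addresses.

```python
from collections import Counter


def churn_per_prefix(updates):
    """Count new route advertisements seen for each prefix."""
    return Counter(prefix for kind, prefix in updates if kind == "announce")


# Hypothetical update log: (kind, prefix).
updates = [
    ("announce", "192.0.2.0/24"),
    ("withdraw", "192.0.2.0/24"),
    ("announce", "192.0.2.0/24"),
    ("announce", "198.51.100.0/24"),
]
churn = churn_per_prefix(updates)
for prefix, count in churn.most_common():
    print(f"{prefix}: {count} advertisements")
```

Withdrawals are ignored here because the report defines churn in terms of new advertisements only.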
Router vendors have a strong interest in the prefix lengths that packets match, since their hardware must sustain a weighted-average lookup rate across a full route table. Each segment of the address space has an average (efficiency) cost associated with this lookup process and will see different average traffic patterns because of differing utilization efficiencies.
The first two graphs show the distribution of packets and bytes seen in a trace relative to the length of the network mask of the network that either generated or received the data.
The green line in the first graph superimposes the number of routes for each prefix length in the composite routing table. The largest number of routes have a 24-bit mask (corresponding to the old class C networks), but the largest amount of traffic is for routes with a 16-bit mask (corresponding to the old class B networks).
Distribution of Packets and Bytes from FIX-West Trace - 3/98
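A distribution of this kind can be derived by matching each packet address against the routing table and tallying packets and bytes under the length of the most specific matching prefix. The sketch below is an assumption about the general method, not CAIDA's actual analysis code. It matches destination addresses only, uses a linear scan for clarity where a router would use a trie, and the routes and packets are hypothetical.

```python
import ipaddress
from collections import defaultdict


def longest_match(addr, routes):
    """Return the most specific route containing addr, or None."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in routes if ip in net]
    return max(matches, key=lambda net: net.prefixlen) if matches else None


def tally_by_prefix_length(packets, routes):
    """Map prefix length -> (packet count, byte count) by destination."""
    totals = defaultdict(lambda: [0, 0])
    for dst, size in packets:
        net = longest_match(dst, routes)
        if net is not None:  # unrouted addresses are skipped
            totals[net.prefixlen][0] += 1
            totals[net.prefixlen][1] += size
    return {length: tuple(counts) for length, counts in totals.items()}


# Hypothetical routing table and (destination, bytes) packet records.
routes = [ipaddress.ip_network(p)
          for p in ("10.0.0.0/8", "172.16.0.0/16", "192.0.2.0/24")]
packets = [("172.16.5.9", 1500), ("172.16.7.1", 40),
           ("192.0.2.10", 576), ("203.0.113.1", 40)]
distribution = tally_by_prefix_length(packets, routes)
for length in sorted(distribution):
    count, nbytes = distribution[length]
    print(f"/{length}: {count} packets, {nbytes} bytes")
```

Dividing bytes by packets for each length gives the mean packet size per prefix length.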
The next plot depicts distributions of packets as a function of prefix length of source and destination IP addresses in a 20-minute trace of IP packets from a wide-area exchange point (FIX-West).
Distribution of Packets and Bytes from FIX-West Trace - 3/98
The last graph shows that the longer prefixes are net producers of traffic compared to the shorter prefixes (indicated by the larger mean packet size and thus greater payload on average), suggesting the presence of more servers on long-prefix networks than on short-prefix ones.
Distribution of Packets and Bytes from FIX-West Trace - 3/98
2.6 Backbone Engineering: Advanced Topics - Some of the questions of concern to CAIDA in this area include:
- What does the MBONE (multicast) tunnel topology look like?
- Why do people always talk about multicast being much harder/less stable than unicast service?
- What is being done to address infrastructure growth/scalability problems?
- What research efforts can help address problems that ISPs (should) care about?
CAIDA researchers are working with Bill Fenner to visualize his and Dan Massey's multicast routing summary report (see: http://ganef.cs.ucla.edu/~masseyd/Route/) using the Manta tool.
Efforts are also underway, in collaboration with NLANR/Cache staff, to explore infrastructure operations issues associated with caching. This includes collaborations with Cisco, involving evaluation of their transparent cache, and deployment of a prototype cache infrastructure model in collaboration with MAE-West.
2.7 Equipment Design Constraints - Specifics relating to Internet packet size and the sequencing of similarly sized packets are fundamental to performance assumptions relating to Internet hardware. Questions being explored by CAIDA researchers include:
- How are packets changing, e.g., their size and sequencing?
- What about tinygrams (i.e., 40-byte packets/flows)? Are they reproducing?
- How long between packets within a given flow? How many of those are arriving back-to-back?
- How long are the tinygram troops (i.e., chains of small packets)?
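Two of the questions above, the length of tinygram chains and the time between packets within a flow, reduce to simple passes over a packet trace. The sketch below assumes a hypothetical trace of (timestamp_ms, flow_id, size_bytes) tuples in arrival order; the record layout and the sample values are illustrative and do not come from any CAIDA tool.

```python
from collections import defaultdict

TINYGRAM_BYTES = 40  # packet size the report uses to describe a tinygram


def tinygram_runs(sizes, limit=TINYGRAM_BYTES):
    """Return lengths of consecutive runs of packets no larger than limit."""
    runs, current = [], 0
    for size in sizes:
        if size <= limit:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:  # close a run that reaches the end of the trace
        runs.append(current)
    return runs


def gaps_per_flow(packets):
    """Map flow id -> list of inter-arrival gaps (ms) within that flow."""
    last_seen, gaps = {}, defaultdict(list)
    for ts, flow, _size in packets:
        if flow in last_seen:
            gaps[flow].append(ts - last_seen[flow])
        last_seen[flow] = ts
    return dict(gaps)


# Hypothetical trace: (timestamp_ms, flow_id, size_bytes).
trace = [(0, "a", 40), (2, "a", 40), (3, "b", 1500), (10, "a", 40), (11, "b", 40)]
print(tinygram_runs([size for _, _, size in trace]))
print(gaps_per_flow(trace))
```

Very small gaps in the per-flow lists point to back-to-back arrivals; the threshold for "back-to-back" depends on link speed and is left to the analyst.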
IV. OUTREACH
1.0 BGP Workshop
As an introduction for potential participants in the SD-NAP, CAIDA joined with the Packet Clearing House and SDSC to sponsor a BGP training seminar for local San Diego ISPs. The event included presentations by Paul Ferguson and David Meyer (Cisco) and Bill Woodcock (Zocalo). Details on the event and copies of the presentation materials are available at https://www.caida.org/Caida/bgp_980407.html.
2.0 Presentations & Papers
Outreach to the community is a key CAIDA goal. During its first year, CAIDA staff made presentations to the Internet Engineering Task Force (IETF) in December 1997 and August 1998; the North American Network Operators Group (NANOG) in October 1997, February 1998, and June 1998; the Cross Industry Working Team (XIWT) in October 1998, February 1998, and June 1998; the Corporation for Educational Network Initiatives in California (CENIC) in May 1998; SpaWar officials in June 1998; the Third International Caching Workshop in June 1998; the INET98 conference and the Coordinating Committee for Intercontinental Research Networking in July 1998; and DARPA and ISI personnel in July 1998. Presentations were made to visiting international delegations from China in January 1998; Singapore in September 1997 and July 1998; and Korea in July 1998. Corporate presentations included Nortel in February 1998; Cisco in February and April 1998; 3COM in April 1998; Bellcore in May 1998; and Hewlett Packard (UK) in June 1998. Many of these papers/presentations are available at https://www.caida.org/Presentations and https://www.caida.org/Papers.
3.0 ISMA
The first Internet Statistics and Metrics Analysis (ISMA) Workshops were held in 1996 and 1997 at SDSC under the auspices of NLANR. ISMA workshops are continuing under CAIDA's leadership, with the next meeting scheduled for August 31-September 1, 1998. The workshop, entitled ISMA: Engineering Data and Analysis, targets the requirements of backbone engineers. Approximately 40 engineers from major ISPs in the U.S., Europe, and Canada, along with representatives from vendors and research organizations, will attend. Details of this meeting are available at https://www.caida.org/ISMA/isma9808/.
Additional focused ISMA workshops are planned for Passive Monitoring Techniques and Analysis (November 1998 in collaboration with NLANR/MOAT); Active Monitoring for Macro-Level Analyses (January 1999); and Visualization Tools for Internet Traffic Analysis (April 1999).
V. NEXT STEPS
1.0 Organizational/Administrative Priorities
CAIDA's priorities during its first year were establishing its infrastructure and developing the technical means for collecting and analyzing traffic data. During its second year, CAIDA will focus on opening membership and addressing the requirements of these members. CAIDA also plans to form working groups on Internet Engineering and Caching, two areas identified as critical by ISPs. Videos on CAIDA tools, notably Mapnet, Plankton and Otter, should be released in late 1998 and 1999.
CAIDA management will work with participants to develop the technical workplan for its second year. Key priorities are described below:
2.1 Measurement Hardware - Development of an OC48mon prototype monitor is the focus of a task under the DARPA-funded NGI project. CAIDA staff will also work with NLANR and the community to continue enhancements to the Unix versions of the OC3mon and OC12mon boxes and to test prototype DS3, OC3, and OC12 monitoring cards from Waikato University in New Zealand.
2.2 Measurement Software - CAIDA will continue enhancements to Skitter and to CflowD software. Mail lists will be formed to encourage collaborative development of these tools, as well as the ARTS library (this process is similar to the collaborative models associated with Squid's and Coral's development).
2.3 Data Acquisition - CAIDA deployed its first OC12mon at MAE-West in early August 1998. For the near term, most of the data analyzed by CAIDA staff will be from NLANR/MOAT collection sites (see http://moat.nlanr.net/Images/moatmap.jpg). Active measurement hosts are currently located at Ann Arbor, MI; Boulder, CO; Palo Alto, CA; San Diego, CA; and Urbana-Champaign, IL. Additional boxes will be located in Europe, Asia, and several sites in the U.S. during 1998/99.
Security will be a foremost consideration throughout the data acquisition process. Measurement hosts are tightly secured and all transmissions are encrypted using SSH. Only authorized CAIDA researchers have access to data, with data ownership retained by the hosting network (see CAIDA's Proprietary Data Agreement at https://www.caida.org/Docs/proprietary.html).
2.4 Data Storage - In June 1998, a Request for Quotes was released to vendors of RAID Array products. Based on the results of the RFQ, the Digital solution was selected for testing. UCSD is currently in the process of purchasing a 345 GB disk array to use as an intermediary storage device for CAIDA traffic data. Long-term archiving of data will be accomplished using the SDSC High Performance Storage System (HPSS); the HPSS is a tape-based archive currently storing more than 130 TB of data.
ARTS - We will use the ARTS binary file format library to store the various forms of passive and active measurement data. Enhancements to this code will continue during the next year and, per CAIDA's agreement with ANS, sublicensing of the code to third parties should begin by late 1998.
2.5 Viz Tools - Work will continue on enhancements to existing CAIDA visualization tools, with priority placed on Otter, the versatile tool capable of displaying an array of different forms of Internet topology and measurement data. The DARPA-funded NGI effort will include development of new techniques for analyzing and visualizing Skitter data.
2.6 Analyses/Correlation of Data - Development of the Libcoral library of scripts for analyzing packet traces will continue. Analysis of Skitter active measurement data will also be prioritized, with particular attention on insights relating to routing stability and topology changes. Data analysis requirements of CAIDA participants (ISPs and vendors) will also receive increasing priority during the coming year.
2.7 Education/Outreach - CAIDA outreach will continue through presentations to participants of IETF, IEPG, NANOG, CCIRN, XIWT, DARPA NGI, and other meetings and through preparation of peer-reviewed publications (workshops, conferences, journals). Quarterly ISMA workshops are anticipated, focused on specific technical topics associated with Internet measurement, analysis, and visualization. The Internet Engineering Curriculum Repository will also host two training workshops during 1999, and Dr. Evi Nemeth will conduct a prototype distance education course jointly at UCSD and the University of Colorado, Boulder in Fall 1998.
2.8 Security Applications for Coral Tools - The Pacific Institute for Computer Security (PICS) at UCSD will collaborate with CAIDA to develop security applications for high performance monitors (OC3 and OC12). Funding for this initiative is through the DARPA NGI effort.
2.9 Caching - Collaborations with NLANR/Cache and with cache vendors, ISPs, and Internet eXchange providers will expand during the coming year. CAIDA/NLANR personnel are initiating testing and analysis of a prototype of Cisco's transparent cache. An equipment grant from Digital/Compaq for a state-of-the-art digital cache server is also being processed with receipt anticipated by October 1998. Collaborations with Internet eXchange providers and ISPs on cache deployment/architectures are also planned and the creation of a CAIDA Cache Working Group to support these efforts is under discussion.
2.10 4th International Web Caching Conference - NLANR/Cache and CAIDA propose to collaborate to host the fourth in a series of international meetings on caching technology. The first workshop was held in Warsaw, Poland in September 1996. NLANR/Cache hosted the second workshop in Boulder, CO in June 1997 (see http://ircache.nlanr.net/Cache/Workshop97/) and participated on the program committee for the third workshop, held in Manchester, England in June 1998 (see http://wwwcache.ja.net/events/workshop/).
NLANR and CAIDA propose that the fourth event in this series be a formal conference with juried paper submissions and published conference proceedings. Approximately 200 participants are envisioned. Duane Wessels, Co-PI for NLANR/Cache, will lead this effort.