Internet Application Use
cflowd is no longer supported by CAIDA. Instead, please consider using flow-tools, which provides a toolset for working with NetFlow data. Like cflowd, flow-tools can be used in conjunction with FlowScan, maintained by Dave Plonka at the University of Wisconsin, Madison.
Analysis using cflowd on a connection between a university and a commerce link.
A preview of the analysis performed on cflowd data collected on an outgoing connection between a large research university and a commerce link.
Characteristics of traffic traversing a router between a university and its commercial Internet provider
General features:
This stacked bar chart shows the proportion of traffic generated by various Internet applications, as identified by the port numbers they use. The port numbers were gathered from both "official" sources (e.g. IANA, developer specifications) and unofficial ones (e.g. students who happen to know popular gaming ports). This particular graph shows data from a typical 24-hour period near the end of an academic quarter, with time given in Universal Time (GMT). We chose this interval to illustrate the flavor of traffic during and between sessions. Unfortunately, the present cflowd aggregation method does not permit independent accumulation of protocol and port information, so this graph aggregates over all protocols, including TCP and UDP.
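The port-based classification behind such a chart can be sketched roughly as follows. This is a minimal illustration, not the cflowd implementation: the port table, the flow tuple layout, and the names `PORT_TO_APP`, `classify`, and `hourly_shares` are all assumptions made for the example. The ports listed are commonly cited defaults, and the actual table used for the graph is not given here. As in the graph, the protocol (TCP or UDP) is ignored.

```python
from collections import defaultdict

# Illustrative port table (an assumption, not the table used for the graph).
# It mixes registered ports with commonly cited unofficial defaults.
PORT_TO_APP = {
    80: "web", 443: "web",
    119: "nntp",
    20: "ftp", 21: "ftp",
    25: "smtp",
    53: "dns",
    6667: "irc",        # common IRC default
    7070: "realaudio",  # commonly cited RealAudio port
    5500: "hotline",    # commonly cited Hotline server port
}


def classify(src_port, dst_port):
    """Map a flow to an application name, trying the lower port first."""
    for port in sorted((src_port, dst_port)):
        if port in PORT_TO_APP:
            return PORT_TO_APP[port]
    return "other"


def hourly_shares(flows):
    """Return {hour_utc: {app: fraction_of_bytes}}.

    flows is an iterable of (unix_time_utc, src_port, dst_port, byte_count)
    tuples; this layout is hypothetical.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for ts, src_port, dst_port, nbytes in flows:
        hour = int(ts // 3600) % 24
        totals[hour][classify(src_port, dst_port)] += nbytes

    shares = {}
    for hour, per_app in totals.items():
        hour_total = sum(per_app.values())
        if hour_total == 0:
            continue  # skip hours with no bytes to avoid dividing by zero
        shares[hour] = {app: b / hour_total for app, b in per_app.items()}
    return shares
```

Each hour's dictionary corresponds to one stacked bar: the fractions sum to 1, and anything whose ports are not in the table falls into "other".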
Specific graph features:
1. The traffic observed on this outgoing link displays the diurnal cycle typical of network traffic. Minimum traffic levels occur in the early morning hours.
2. Maximum traffic levels occur during peak business hours. However, the appearance of the typical diurnal cycle is somewhat misleading, since it reflects traffic leaving the university network rather than traffic generated by users within the university. This observation suggests two plausible hypotheses: either a sizable amount of the traffic generated from this site is "local" (at least, local to the time zone), or variations in traffic through this link should be less pronounced than on links that service local users. Further investigation would be necessary to verify either hypothesis.
3. As is generally true on the present Internet, web-related traffic (using HTTP and associated protocols) makes up the majority of the workload. Again, the fact that this is an outgoing link is important to the analysis. Bulk web traffic leaving the university can only come from outside users accessing web servers housed on campus. The pleasant implication for the campus is that it creates content of interest to the Internet, and that the web plays a significant role in providing access to information about this campus.
4. Historically, network news (NNTP) has accounted for a large amount of traffic at universities. This graph does not bear this out, but that is not particularly surprising given that this is, once more, an outgoing link. Only students, faculty, and staff connecting from off-campus and reading news from on-campus NNTP servers would impose NNTP traffic on this link. Since this campus provides extensive dorm facilities, its off-campus student population is smaller than might be found at a commuter school. In addition, this campus provides local networking to dorms, while students off-campus have more limited access to high-speed Internet connections (although both ADSL and cable modems are now available in this area).
5. Perhaps unexpected is the amount of traffic that is not accounted for in the main categories. Possible candidates for new sources of traffic include IRC (Internet Relay Chat), RealAudio players, and the Hotline collaborative environment. As it turns out, none of these sources amounts to more than a few percent of traffic. The low amount of RealAudio traffic outbound from the university is not terribly surprising (presumably students are not running many of their own radio stations). The Hotline traffic is more of a surprise: it was observed exceeding 5% of traffic during off-peak hours. We have learned that the Hotline architecture has become a favorite system for students seeking to put up servers for video clips, a hypothesis that would explain why this service generates as much traffic as it does.
6. Even with the three new applications introduced in item 5 above, a great deal of traffic is still unaccounted for. The top five applications account for over 86% of the total traffic, yet the next 15 applications account for only 10% of the remaining traffic. With a cutoff at 0.10% of traffic, about 5% of traffic remains unaccounted for. Preliminary analysis suggests that this remaining traffic consists of a large number of very small transmissions.
7. Also unresolved are unexpected late evening peaks in traffic. Since these peaks appear in the known applications as well as in the unresolved category, it is clear that they are not caused by one application. Yet they do not correspond to any obvious user-driven activity. Further investigation is ongoing.
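The coverage figures quoted above (share of the top applications, share of the next group, and the residue below a cutoff) are the kind of numbers that can be derived from per-application byte totals. The sketch below shows one way to compute them. The function name `coverage_summary`, its parameters, and the input layout are hypothetical, each share is taken as a fraction of total traffic, and the sample numbers are made up for illustration rather than taken from the measured data.

```python
def coverage_summary(bytes_by_app, top_n=5, next_n=15, cutoff=0.001):
    """Summarize how much traffic the largest applications explain.

    bytes_by_app maps an application label to its byte count.
    All returned values are fractions of total traffic:
      top          - share of the top_n largest applications
      next         - share of the next_n largest after those
      below_cutoff - combined share of applications that each
                     contribute less than `cutoff` of the total
    """
    total = sum(bytes_by_app.values())
    if total == 0:
        raise ValueError("no traffic to summarize")

    # Per-application shares, largest first.
    shares = sorted((b / total for b in bytes_by_app.values()), reverse=True)

    return {
        "top": sum(shares[:top_n]),
        "next": sum(shares[top_n:top_n + next_n]),
        "below_cutoff": sum(s for s in shares if s < cutoff),
    }


# Made-up example: five applications, 100 bytes in total.
sample = {"a": 50, "b": 30, "c": 15, "d": 4, "e": 1}
summary = coverage_summary(sample, top_n=2, next_n=2, cutoff=0.02)
```

A large `below_cutoff` value spread over many labels would be consistent with the observation that the unexplained traffic consists of many very small transmissions.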