Computing the Traffic Demands
Operational data
- Large, diverse, and collected at multiple places at different times
- Traffic: unsynchronized NetFlow clocks and lost records
- Routing: route instability and upgraded access links
Notes:
In practice, there are several other challenges associated with collecting and cleaning big data in a big IP network.
- In some cases, the production systems did not support the data collection; this was particularly true for NetFlow.
- Information fluctuates over the interval in which it is collected. First, reachability fluctuates intrinsically (as the paper by Labovitz et al. illustrates). Second, the network itself changes; e.g., new interfaces are added, causing various measurement glitches.
- By inspecting the data wherever we saw anomalous results, we were able to identify these problems and in some cases correct for them. This buttressed our confidence in the methodology. See the paper!
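As an illustration of the kind of check that exposes lost records and clock problems, here is a minimal Python sketch that assumes NetFlow version 5 export packets. It is not the authors' procedure. Two fields in the v5 header matter here. `flow_sequence` is a 32-bit running count of flows exported, so a gap between consecutive packets from one exporter signals lost records. `sys_uptime`, `unix_secs`, and `unix_nsecs` let a flow's uptime-relative start time be mapped to wall-clock time. The `V5Header` class, the helper names, and the idea of correcting a router's clock by a collector-measured offset (which also absorbs transit delay) are hypothetical choices made for this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

SEQ_MOD = 2 ** 32  # flow_sequence is a 32-bit counter that wraps around


@dataclass
class V5Header:
    """Hypothetical container for a subset of NetFlow v5 header fields."""
    count: int          # number of flow records in this export packet
    sys_uptime: int     # ms since the exporting router booted
    unix_secs: int      # router wall-clock time at export, seconds
    unix_nsecs: int     # residual nanoseconds of the export time
    flow_sequence: int  # sequence number of the first flow in this packet


def lost_records(headers: Iterable[V5Header]) -> List[Tuple[int, int]]:
    """Return (expected_sequence, missing_count) for each gap.

    Assumes packets come from a single exporter and are in arrival order;
    a reordered packet would show up here as a very large spurious gap.
    """
    gaps: List[Tuple[int, int]] = []
    prev: Optional[V5Header] = None
    for h in headers:
        if prev is not None:
            expected = (prev.flow_sequence + prev.count) % SEQ_MOD
            missing = (h.flow_sequence - expected) % SEQ_MOD
            if missing:
                gaps.append((expected, missing))
        prev = h
    return gaps


def export_time_ms(h: V5Header) -> int:
    """Router's own wall-clock export time, in ms (possibly skewed)."""
    return h.unix_secs * 1000 + h.unix_nsecs // 1_000_000


def estimate_offset_ms(h: V5Header, collector_recv_ms: int) -> int:
    """Hypothetical clock-offset estimate: collector receive time minus
    router export time (includes network transit delay)."""
    return collector_recv_ms - export_time_ms(h)


def flow_start_ms(h: V5Header, first_uptime_ms: int, offset_ms: int = 0) -> int:
    """Map a flow record's uptime-relative start time to wall-clock ms,
    optionally shifted by an estimated router clock offset."""
    boot_ms = export_time_ms(h) - h.sys_uptime
    return boot_ms + first_uptime_ms + offset_ms
```

Tracking the expected sequence number per exporter keeps the loss check independent of the record contents. The offset correction only aligns a router's clock with the collector's, so it cannot remove per-packet delay variation.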
The guy with the hard hat is a composite view of the coauthors, in their data mining gear. Just kidding...