In a network service provider environment, several considerations
apply when configuring flow-switching and flow-export. For data
collection using cflowd, these are the critical issues:
Data duplication has more than just performance and disk space penalties; if the possibility exists that you've recorded the same traffic in two different places, you often can't aggregate the traffic data from those two places in a meaningful way without resorting to very granular duplication detection.
A typical network service provider wants source-to-destination traffic information at the AS and network prefix level (the AS matrix and the net matrix). To obtain this information, they must use either version 5 flow-export, or version 8 flow-export with the prefix aggregation cache configured on the routers.
A provider may optionally want the port matrix and the protocol table. The necessary information is available in version 1, version 5 and version 8 flow-export. In the case of version 8, you need to configure the protocol/port aggregation cache on the router(s). In all cases, a provider will want the information per input interface (not aggregated across interfaces on a router) where available.
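As a rough illustration of what the AS matrix and the net matrix contain, here is a minimal Python sketch that sums octets per input interface from a handful of flow records. The FlowRecord class and its field names are hypothetical, chosen only to mirror the kind of fields a version 5 flow record carries (addresses, mask lengths, AS numbers, input interface, octet count); it is not cflowd's data format or its implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical subset of the fields in a version 5 flow record."""
    src_addr: str
    dst_addr: str
    src_mask: int   # source prefix length in bits
    dst_mask: int   # destination prefix length in bits
    src_as: int
    dst_as: int
    input_if: int   # index of the interface the flow was received on
    octets: int


def build_matrices(records):
    """Sum octets per (input interface, source, destination).

    Returns two dicts: one keyed at the AS level, one at the prefix level.
    """
    as_matrix = defaultdict(int)
    net_matrix = defaultdict(int)
    for r in records:
        as_matrix[(r.input_if, r.src_as, r.dst_as)] += r.octets
        # strict=False masks off the host bits to get the covering prefix.
        src_net = str(ip_network(f"{r.src_addr}/{r.src_mask}", strict=False))
        dst_net = str(ip_network(f"{r.dst_addr}/{r.dst_mask}", strict=False))
        net_matrix[(r.input_if, src_net, dst_net)] += r.octets
    return dict(as_matrix), dict(net_matrix)


# Made-up sample data using private AS numbers and documentation addresses.
records = [
    FlowRecord("10.1.2.3", "192.0.2.9", 16, 24, 64512, 64513, 1, 1500),
    FlowRecord("10.1.9.9", "192.0.2.77", 16, 24, 64512, 64513, 1, 500),
    FlowRecord("10.1.2.3", "192.0.2.9", 16, 24, 64512, 64513, 2, 40),
]
as_matrix, net_matrix = build_matrices(records)
```

Keeping the input interface in the key is what preserves the per-interface view; dropping it would aggregate across interfaces on the router.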
NetFlow data is input-based. Flows are instantiated as traffic enters the router, not as it exits. When you configure flow-switching for an interface on a Cisco router, flow data will be recorded for packets received from the network by the interface (and not for packets received from other interfaces on the router). In other words, flow data is recorded only in the receive direction per flow-switching interface.
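One consequence of input-based recording is that the traffic sent out an interface can only be inferred from flows received on other interfaces. The sketch below assumes each record is a hypothetical (input interface, output interface, octets) tuple and tallies both directions; the transmitted totals are only complete if every interface that can feed traffic to the output interface has flow-switching enabled.

```python
from collections import Counter


def octets_by_direction(records):
    """Tally octets received on, and forwarded out of, each interface.

    records: iterable of (input_if, output_if, octets) tuples, a
    hypothetical reduction of exported flow records.
    """
    received = Counter()
    transmitted = Counter()
    for input_if, output_if, octets in records:
        # Each flow is recorded once, by the interface that received it.
        received[input_if] += octets
        # The output side is inferred from the same record.
        transmitted[output_if] += octets
    return received, transmitted


# Made-up sample: flow-switching is enabled on interfaces 1 and 3 only.
records = [(1, 2, 1000), (1, 3, 200), (3, 2, 50)]
received, transmitted = octets_by_direction(records)
```

Here interface 2 shows no received octets because no flows were recorded in its receive direction, not necessarily because it received no traffic.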
Recording traffic data when the traffic first enters your network is critical to security-related activities such as attack backtracking. It's also important for determining the offered load to your network. It increases your data integrity (you get data on incoming traffic, possibly above and beyond what your transit facilities will handle), and it provides greater topological information (the source of your data is closer to the source of the traffic for which the data was collected).
Another reason to collect data as it enters your network: in IOS images using the prefix cache (not running any form of CEF), source netmask and source AS lookups frequently return a cache miss and IOS will not resort to a routing table lookup. The result is frequent zero values in the source AS and source netmask length fields in version 5 flow-export. The prefix cache is populated by destinations, not sources; if traffic to the source network is not seen frequently by the router, a prefix cache entry will not exist for the source network. Hence, asymmetric paths aggravate the zero value problem.
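To gauge how badly a given router is affected, you could measure the share of records whose source fields are zero. The following is a minimal sketch, assuming each record has been reduced to a hypothetical (source AS, source mask length) pair. Treat the result as an upper bound: a zero can also be legitimate, for example when the source matches a default route.

```python
def zero_source_fraction(records):
    """Return the fraction of records with a zero source AS or source mask.

    records: iterable of (src_as, src_mask) pairs, a hypothetical
    reduction of version 5 flow records.
    """
    total = 0
    zero = 0
    for src_as, src_mask in records:
        total += 1
        if src_as == 0 or src_mask == 0:
            zero += 1
    # Avoid dividing by zero when no records were supplied.
    return zero / total if total else 0.0


# Made-up sample: two of the four records have a zero source field.
sample = [(64512, 16), (0, 0), (0, 24), (64513, 8)]
fraction = zero_source_fraction(sample)
```

Comparing this fraction between interfaces that see symmetric traffic and those that don't is one way to check whether asymmetric paths are the cause.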
Taking these constraints into account, an optimal configuration for a network service provider usually looks something like Figure 3. To avoid recording data more than once, enable flow-switching only on interfaces at the edge of your network (external interfaces). Since you'll be collecting data as it enters the network, you'll also meet the other constraints.
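The reasoning behind the edge-only rule can be checked with a small sketch. It models a packet's path as the list of (router, interface) points at which the packet is received, and counts how many of those points have flow-switching enabled. The router and interface names are hypothetical.

```python
def times_recorded(path, flow_enabled):
    """Count how many times a packet following path is recorded.

    path: list of (router, input_interface) pairs where the packet is
    received. flow_enabled: set of (router, interface) pairs with
    flow-switching enabled. Only receiving interfaces record flow data.
    """
    return sum(1 for hop in path if hop in flow_enabled)


# A packet enters at edge1 on an external interface, crosses core1,
# and reaches edge2 on an internal interface before leaving the network.
path = [("edge1", "ext0"), ("core1", "int0"), ("edge2", "int1")]

# Flow-switching on every interface the packet is received on.
everywhere = set(path)
# Flow-switching only on the external interfaces of the edge routers.
edge_only = {("edge1", "ext0"), ("edge2", "ext0")}

count_everywhere = times_recorded(path, everywhere)
count_edge_only = times_recorded(path, edge_only)
```

With flow-switching everywhere the same packet is recorded at each of the three routers; with the edge-only configuration it is recorded once, at the point where it enters the network.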