cflowd - Frequently Asked Questions
cflowd is no longer supported by CAIDA. Instead, please consider the use of flow-tools, which will provide a toolset for working with NetFlow data. flow-tools can also be used (like cflowd) in conjunction with FlowScan, maintained by Dave Plonka at the University of Wisconsin, Madison.
Q: What is a flow?
A: A flow is a uni-directional traffic stream with a unique [source-IP-address, source-port, destination-IP-address, destination-port, IP-protocol, TOS, input interface ID] tuple. When a router receives a packet for which it currently does not have a flow entry, a flow structure is initialized to maintain state information regarding that flow, i.e. number of bytes exchanged, IP addresses, port numbers, AS numbers, etc. Each subsequent packet matching the same parameters of the flow will contribute to the byte count and packet count of the flow structure until the flow is terminated and exported.
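As a rough illustration of the bookkeeping described above, here is a minimal Python sketch of a flow cache keyed on the seven-field tuple. The names FlowKey, FlowEntry and account_packet are made up for this example; this is not how a router or cflowd implements it.

```python
from collections import namedtuple

# The seven key fields named in the answer above.
FlowKey = namedtuple(
    "FlowKey", "src_ip src_port dst_ip dst_port protocol tos input_if"
)


class FlowEntry:
    """State kept for one uni-directional flow (hypothetical structure)."""

    def __init__(self, first_seen):
        self.packets = 0
        self.bytes = 0
        self.first_seen = first_seen
        self.last_seen = first_seen


def account_packet(cache, key, length, now):
    """Create the entry on the first packet, then update its counters."""
    entry = cache.get(key)
    if entry is None:
        entry = FlowEntry(now)
        cache[key] = entry
    entry.packets += 1
    entry.bytes += length
    entry.last_seen = now
    return entry
```

Note that the reverse direction of the same conversation has a different key (source and destination swapped, usually a different input interface), so it lands in a separate entry.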
Q: What is NetFlow?
A: NetFlow was initially developed by Cisco in its Quality of Service (QoS) program. It is a switching method that allows more efficient switching of packets according to the type of packet.
Q: How are flows terminated in netflow?
A: According to a document at http://www.cisco.com/warp/public/cc/pd/iosw/ioft/neflct/tech/napps_wp.htm, Cisco's netflow architecture will terminate and export a flow for one of four reasons: (1) expiration of an 'inactive' timeout (no packets seen for the flow for N seconds, configured via ip flow-cache inactive-timeout on a Cisco); (2) expiration of an 'active' timeout (the flow is terminated after a fixed duration regardless of whether packets are still arriving for it, configurable via ip flow-cache active-timeout on a Cisco, 30 minutes by default); (3) upon seeing a FIN or RST; and (4) by a number of (undocumented?) heuristics that aggressively age flows as the cache becomes too full.
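The first three reasons can be sketched in Python roughly as follows. This is only an illustration: the function name and the 15-second inactive default are assumptions for the example, the order in which a real router applies the checks is not documented here, and the cache-pressure heuristics in (4) are not modelled at all.

```python
TCP_FIN = 0x01
TCP_RST = 0x04


def expiry_reason(first_seen, last_seen, now, tcp_flags=0,
                  inactive_timeout=15.0, active_timeout=30 * 60.0):
    """Return why a flow would be expired, or None if it stays cached.

    Times are in seconds. inactive_timeout is an assumed example value;
    active_timeout uses the 30-minute default mentioned above.
    """
    if tcp_flags & (TCP_FIN | TCP_RST):
        return "tcp-close"   # reason (3): FIN or RST seen
    if now - last_seen >= inactive_timeout:
        return "inactive"    # reason (1): no packets for N seconds
    if now - first_seen >= active_timeout:
        return "active"      # reason (2): flow has simply lasted too long
    return None
```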
Q: How does netflow interact with ACL configs?
Access control lists (ACLs) on Cisco routers enable packet filtering based on source and destination addresses, protocols, and specific interfaces.
Normally, every packet is matched against an access list to determine acceptability. With netflow, only the first packet of a flow needs to be matched; if the first packet passes the filters, an entry is added to the NetFlow cache. Subsequent packets in the same flow are then switched based on this cache entry, without needing to be matched against the complete set of access lists. Specific performance will vary based on the number and complexity of the access lists.
Q: How many flows are exported at once in netflow?
A: Expired flows are bundled into UDP export datagrams of up to 30 flow records in V5 (25 records for V1, 28 for V7 Catalyst netflow), at least once per second. To configure export, specify the IP address and application port number of the device configured to receive the exported flow data.
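For readers who want to look inside a v5 export datagram, here is a minimal Python parsing sketch. The field layout (24-byte header, 48-byte records) is my understanding of Cisco's published v5 format rather than something stated in this FAQ, so check it against Cisco's documentation before relying on it. parse_v5 is a made-up name and has nothing to do with cflowd's own code.

```python
import socket
import struct

# Assumed v5 layout: 24-byte header followed by up to 30 48-byte records.
V5_HEADER = struct.Struct("!HHIIIIBBH")
V5_RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")
V5_MAX_RECORDS = 30


def parse_v5(datagram):
    """Return the flow records of one v5 export datagram as dicts."""
    version, count = struct.unpack_from("!HH", datagram, 0)
    if version != 5:
        raise ValueError("not a v5 datagram (version %d)" % version)
    if count > V5_MAX_RECORDS:
        raise ValueError("record count %d exceeds %d" % (count, V5_MAX_RECORDS))
    if len(datagram) < V5_HEADER.size + count * V5_RECORD.size:
        raise ValueError("datagram is truncated")
    flows = []
    for i in range(count):
        f = V5_RECORD.unpack_from(datagram, V5_HEADER.size + i * V5_RECORD.size)
        flows.append({
            "src": socket.inet_ntoa(f[0]),
            "dst": socket.inet_ntoa(f[1]),
            "nexthop": socket.inet_ntoa(f[2]),
            "input_if": f[3], "output_if": f[4],
            "packets": f[5], "bytes": f[6],
            "src_port": f[9], "dst_port": f[10],
            "tcp_flags": f[12], "protocol": f[13], "tos": f[14],
            "src_as": f[15], "dst_as": f[16],
            "src_mask": f[17], "dst_mask": f[18],
        })
    return flows
```

With this layout a full datagram of 30 records is 24 + 30 * 48 = 1464 bytes, which fits in a single Ethernet-MTU packet.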
Q: What if I want bi-directional flows?
A: netflow captures only uni-directional flow statistics. For users wanting to summarize flow statistics that represent two-way traffic, netflow provides "canned" SQL procedures which the user can leverage without change, or choose to modify/enhance them for different uses. The SQL procedures allow the user to summarize the byte, packet, flow counts and active flow times (when present) for flows between IP address pairs. The user can then use the SQL procedure to make requests such as:
SQL procedures are provided for each FlowCollector aggregation scheme, which contains IP address pairs in the aggregation scheme key. The SQL procedures can be applied to As-Is Data, or data produced by any time-based consolidation using either SRC or IRC schemes.
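If you just want to see the idea without a database, the same kind of two-way summary can be sketched in a few lines of Python. This is not the FlowCollector SQL; summarize_pairs and its input tuple layout are made up for the example.

```python
from collections import defaultdict


def summarize_pairs(flows):
    """Fold uni-directional flows into per-address-pair totals.

    flows: iterable of (src_ip, dst_ip, packets, bytes) tuples.
    A -> B and B -> A are counted under the same (A, B) key.
    """
    totals = defaultdict(lambda: {"flows": 0, "packets": 0, "bytes": 0})
    for src, dst, packets, nbytes in flows:
        pair = tuple(sorted((src, dst)))  # direction-independent key
        totals[pair]["flows"] += 1
        totals[pair]["packets"] += packets
        totals[pair]["bytes"] += nbytes
    return dict(totals)
```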
Q: What is cflowd?
A: cflowd was developed to collect and analyze the information available from NetFlow flow-export. It allows the user to store the information and enables several views of the data. It produces port matrices, AS matrices, network matrices and pure flow structures. The amount of data stored depends on the configuration of cflowd and varies from a few hundred Kbytes to hundreds of Mbytes in one day per router.
Q: What is an AS matrix? How can I find out about its usage in flow data?
A: An AS matrix contains packet and byte counters for traffic from either source-destination autonomous systems (ASes), or last-peer to next-peer ASes.
Q: Does cflowd aggregate all the flows 'through' an AS (e.g. a transit AS) into the traffic numbers for that AS?
A: It can if you configure your exporting Cisco with:
"ip flow-export version 5 peer-as"
Whether or not the Cisco reports the "peer-as" or the "origin-as" in the flows that cflowd collects depends on how you've configured the exporting Cisco(s). Here's what the IOS says:
cisco(config)# ip flow-export version 5 ?
origin-as record origin AS
peer-as record peer AS
Using peer-as, popular for peering engineering, also keeps the AS-matrix smaller.
Q: Can netflow seriously help me with peering engineering?
A: uh, you could say that. ISPs are utilizing NetFlow's source and destination autonomous system (AS#) number recording capabilities both for network planning purposes (who to peer with and bandwidth requirements) as well as accounting purposes. In particular, the AS# can be used to determine if packets are bound for a local, domestic, regional or distant international destination and both distance and volume based usage charged accordingly to offset scarce network resource utilization.
Q: What is a net matrix?
A: A net matrix contains packet and byte counters for traffic from source networks to destination networks. A network is identified by the network address and netmask (CIDR), i.e., an address prefix.
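To make the idea concrete, here is a small Python sketch that builds such a matrix from flow records that carry source and destination mask lengths (as v5 flow-export does). net_matrix and its input layout are assumptions for the example; cflowd's own implementation is C++ and stores the result in ARTS format.

```python
import ipaddress
from collections import defaultdict


def net_matrix(flows):
    """Build {(src_prefix, dst_prefix): [packets, bytes]}.

    flows: iterable of (src_ip, src_mask_len, dst_ip, dst_mask_len,
    packets, bytes) tuples.
    """
    matrix = defaultdict(lambda: [0, 0])
    for src, src_mask, dst, dst_mask, packets, nbytes in flows:
        # strict=False masks off the host bits to get the prefix.
        src_net = ipaddress.ip_network("%s/%d" % (src, src_mask), strict=False)
        dst_net = ipaddress.ip_network("%s/%d" % (dst, dst_mask), strict=False)
        cell = matrix[(str(src_net), str(dst_net))]
        cell[0] += packets
        cell[1] += nbytes
    return dict(matrix)
```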
Q: What is a port matrix?
A: A port matrix contains packet and byte counters for traffic from source ports to destination ports. Obviously, this data is only applicable to UDP and TCP traffic.
Q: What is ARTS?
A: ARTS is a binary file format specification for storing network data. Initially developed at ANS by David Bolen in 1992, ARTS was licensed to CAIDA in March of 1998. CAIDA has developed a C++ class library for ARTS, independent from the software licensed from ANS. The C++ class library is used by the packages cflowd and skitter. In addition to the class library, CAIDA distributes some simple applications for viewing and manipulating ARTS data. The entire package is called arts++.
Q: How do netflow/cflowd compare with RMON?
A: NetFlow is not intended as a replacement for SNMP and/or RMON/RMON2 capabilities. Instead, Cisco suggests that SNMP, RMON and NetFlow data collection capabilities be combined to maximize network monitoring, management, and planning. In particular, Cisco recommends:
- utilize carefully-designed SNMP polling policies to gather key statistics on backbone/core routers and MIB objects not related to flow-by-flow measurements, including interface errors and memory and CPU utilization statistics required for real time monitoring
- utilize RMON capabilities for detailed drill-down including application tracking, interface error analysis and packet capture for problem diagnosis and resolution and for threshold driven network management via the events and alarms capabilities.
Q: What are popular ways to read and/or post-process NetFlow data?
A: Here are a few:
- Use CAIDA's cflowd, and its accompanying flowdump util
- Use cflowd, and Dave Plonka's Cflow.pm perl module:
http://net.doit.wisc.edu/~plonka/Cflow/ which comes with a sample perl script called "flowdumper" which mimics flowdump, but shows you how to use the perl API to access the NetFlow flow fields.
- Use the Ohio State University "flow-tools" package by Mark Fullmer, etc.
(You can find that:
Q: Is a flow produced when an ACL hit occurs? Or is it only once it passes through the ACL?
A: In my experience, if a flow of traffic is black-holed by an ACL you'll still get a flow, but the output ifIndex will show as zero. My patch to cflowd makes use of this "feature" to reduce the size of raw flow files by letting you filter out flows whose output interface is zero, like this:
$ cflowd -O 0
Note however, that multicast flows always show the output if as zero, so I have added a special option to retain those:
$ cflowd -O 0 -m
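In case the effect of those two options isn't obvious, here is the filtering decision written out as a Python sketch. This only illustrates the behavior described above, it is not the patch's code; in particular, recognizing multicast by the destination address is an assumption of this example.

```python
import ipaddress


def keep_flow(dst_ip, output_if, drop_output_if=0, keep_multicast=True):
    """Decide whether a raw flow would be kept.

    Flows whose output interface equals drop_output_if are dropped
    (the -O 0 idea), unless keep_multicast is set and the destination
    is a multicast address (the -m idea).
    """
    if output_if != drop_output_if:
        return True
    return keep_multicast and ipaddress.ip_address(dst_ip).is_multicast
```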
The patch and these options are explained at these URLs:
Q: If a flow is produced when an ACL hit occurs, is there any indicator in the data of this (for example, 0.0.0.0 appearing in the next hop field)?
A: I think an $output_if of zero is the best indicator that a flow was black-holed.
I have found the nexthop value to be more a function of the routing method configured on the router than whether or not the flow's traffic was forwarded or not.
For instance, with the full BGP routing table, I get valid/correct nexthop values. However, FlowScan users that simply configure a default route to an ip unnumbered interface have reported that the destination IP is repeated in the nexthop field. (FlowScan's CampusIO report has a special option to deal with this scenario since normally FlowScan likes to determine if a flow is outbound based on its nexthop.)
Other FlowScan users have occasionally reported seeing zeroes regularly as the nexthop values, but I never got full details on their Cisco config. Right now, FlowScan just plain ignores all flows with nexthop of zero unless you have specified an array of output ifIndexes to use instead to ID outbound traffic.
IIRC, I've also seen zeroes in the nexthop field when route flaps occur. Personally, I don't trust the NetFlow nexthop field to a high degree, esp. not as a reliable indicator of whether or not the traffic was forwarded. IMO, it's simply a function of the destination IP and whether or not the router successfully did a routing table lookup for the given flow.
If you're interested in FlowScan, check the aforementioned URLs and consider joining the mailing list mentioned at http://net.doit.wisc.edu/~plonka/FlowScan/ and archived at http://pages.cs.wisc.edu/~plonka/lisa/FlowScan/.
Q: How fast does the dedicated network between the router and the PC running cflowd need to be?
A: Your first concern should be whether the router can handle the load of running netflow itself. Unfortunately, trying it is the best way to find out, since everyone's situation is different.
Everyone's situation is different for the flow-export traffic load too. A general rule of thumb is to expect 30 (Ethernet MTU) packets per second per DS-3 or FDDI when using v5 flow-export. Expect more for an OC-3 (say 50 packets per second). A 10Mb/sec Ethernet should be fine in most situations unless you wind up putting a bunch of other stuff on it and it's not switched.
As for the machine running cflowd, it again depends on the load. CAIDA runs cflowd on Pentium II 233MHz machines with 128MB of RAM running FreeBSD. We haven't had Ethernet card issues, using old 3Com 509B (ISA) cards, newer 3Com cards (9xx), cards based on the DEC DC21x4x chipset (the DE500 card in particular), and Intel EtherExpress 10/100 cards (normally our preferred card).
I don't know that you'll see any performance differences in FreeBSD vs Linux for cflowd. Probably not. The primary development platform for cflowd has thus far been FreeBSD, so more testing occurs for that platform than any other. We support other platforms as best we can.
Q: Upon startup, cfdcollect creates a file per router named "arts.YYYYMMDD". I've also noticed that sending cfdcollect a SIGHUP creates a new file if I rename the old one. Do I need to write a file rotation script that will run daily?
A: You don't need a rotation script; cfdcollect is creating "arts.YYYYMMDD" files correctly according to cfdcollect's notion of time in GMT. If you want to adjust cflowd to begin and end arts files at the local midnight instead of GMT midnight, edit CflowdServer.cc to use localtime() instead of gmtime().
No SIGHUP is needed to create a new file. What you probably observed was that sending cfdcollect a SIGHUP causes it to reload the configuration and start a collection round immediately, hence creating new files as necessary. Normally, cfdcollect only tries to write data in intervals roughly equal to minPollInterval and it won't create a new file until it grabs new data from an instance of cflowd.
Q: How is the value for k_maxFlowPacketSize determined? What impact does it have on cflowdmux performance?
A: k_maxFlowPacketSize is the buffer size for reading incoming packets. It is set to 2048 because for any of the supported versions of flow-export, incoming packets are always smaller than 2048 bytes. This parameter should have no effect on cflowdmux performance. There is another parameter, k_maxPacketReads, that is used to set how many packets we try to read from a socket once we've determined it has data waiting (using select()).
Q: On Solaris 2.x, why does cflowdmux report a shmget() error at startup (and then fail to work)?
A: Solaris' default shared memory segment size maximum is very low (1MB). cflowdmux wants a shared memory segment for packet buffers that's just over 1MB, which is beyond the default maximum limit imposed by the operating system. To permit a larger shared memory packet buffer, you should add this to /etc/system:
set shmsys:shminfo_shmmax = 4194304
and reboot the host. The setting above is 4MB; a higher setting is acceptable as well (no resources are allocated by this change, since shared memory is allocated on demand).
Note that the packet buffer size may be configured using the PKTBUFSIZE setting in the OPTIONS stanza in cflowd.conf.
Q: Are there issues with running cflowd/cfdcollect atop NFS?
A: Running cflowd or cfdcollect atop NFS is generally a very bad idea. NFS doesn't support disconnected operation, so a network problem can cause you to lose data. (In the worst case, the data that is lost may be just what you need to point you to the cause of the NFS problem...). If NFS is in the picture, run cfdcollect on the NFS server and cflowd on the client. Also note that cflowd uses mmap() on the active raw flow file; doing this over NFS tends to either 1) not work, or 2) interact poorly.
Q: What visualization applications are available?
A: Visualization front-ends will be supported by Caimis to display cflowd data, based on the ARTS data format. For example, the program "FlowScan" produces graph images that provide a continuous, near real-time view of the network border traffic. FlowScan binds together (1) a flow collection engine (a patched version of cflowd), (2) a high performance database (Round Robin Database - RRD) and (3) a visualization tool (RRDtool). It examines flow data and maintains counters reflecting what was found. Counter values are stored using RRDtool, a database system for time-series data. Finally, FlowScan uses visualization capabilities of both RRDtool and other front-ends to report on the processed flow data.
The program "xartsprotos" will display a stacked bar chart of traffic (in bits/sec) per IP protocol versus time. It uses ARTS files as input, and permits the user to cycle through all of the datasets present in the input files (typically, there would be a dataset per interface per router). You may also initiate time-domain aggregation from within xartsprotos, and you can also print a plot to a file (in PostScript).
Q: What if I want to aggregate with a different time granularity, e.g. daily instead of 5 min.?
A: You aggregate using one or more of the aggregation utilities (artsasagg, artsnetagg, artsprotoagg and artsportmagg). In the typical case, you might have a cron job that aggregates for you on a regular basis (for example, once a night you could create hourly and daily aggregates). This is how you get macroscopic views of traffic data for trend analysis.
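The arts*agg utilities work on ARTS files, but the underlying operation is just re-bucketing counters by time. A minimal Python sketch of that idea, on plain tuples rather than ARTS objects (reaggregate and the sample layout are made up for the example):

```python
from collections import defaultdict


def reaggregate(samples, bucket_seconds):
    """Sum fine-grained counters into coarser time buckets.

    samples: iterable of (start_time_seconds, packets, bytes), e.g. one
    entry per 5-minute interval. Returns {bucket_start: (packets, bytes)}.
    """
    out = defaultdict(lambda: [0, 0])
    for start, packets, nbytes in samples:
        bucket = start - (start % bucket_seconds)  # align to bucket start
        out[bucket][0] += packets
        out[bucket][1] += nbytes
    return {b: tuple(v) for b, v in sorted(out.items())}
```

Calling it with bucket_seconds=3600 gives hourly totals and with 86400 gives daily totals (aligned to GMT if the timestamps are Unix time).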
Q: Is there any way I could choose to group routers, e.g., all the routers associated with a region/network/customer/ etc?
A: It's possible to do this with aggregation, but as yet this hasn't been implemented. It's actually pretty simple to add. The current aggregation utilities key by router IP address and interface index (ifIndex) when aggregating; if you set the IP address and the interface index fields to a single phony value for all of the ARTS data as it's read, the aggregate produced will be for all data in the input. The limitation here is that you have to think about data duplication whenever you do inter-router aggregation. If this is done at the data collection level, you never have duplicate data (for example, the best thing to do is to only run flow-switching and flow-export on the border interfaces in your network). If you don't do it this way, inter-router aggregation can be difficult (a lot of traffic may be counted twice, but not all of it, which can lead to meaningless aggregates).
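The "single phony key" trick might look something like this in Python. It is only a sketch of the idea on plain tuples; aggregate_group and the group_of mapping are hypothetical, and nothing like this ships with the aggregation utilities. It also does nothing about the duplication problem described above, so feed it border interfaces only.

```python
from collections import defaultdict


def aggregate_group(records, group_of):
    """Collapse per-router, per-interface counters into per-group totals.

    records: iterable of (router_ip, if_index, packets, bytes).
    group_of: dict mapping router_ip to a group name (region, customer...).
    Replacing the (router, ifIndex) key with the group name is the
    equivalent of setting both fields to one phony value.
    """
    totals = defaultdict(lambda: [0, 0])
    for router_ip, _if_index, packets, nbytes in records:
        group = group_of.get(router_ip)
        if group is None:
            continue  # router not assigned to any group
        totals[group][0] += packets
        totals[group][1] += nbytes
    return {g: tuple(v) for g, v in totals.items()}
```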
Q: How can I format arts data so that the KLgroup's Java bean "JClass Chart" or class JCChartApplet can be used to graphically display Netflow data?
A: If you own a copy of JClass JChart, check out the man pages for jcdstasbar, jcprotobar, and jcsrcasbar. These utilities will output an HTML-format page containing the necessary data to have JClass Chart and friends transform it into pretty graphs.
Don't forget: decompress the JClass Chart file into the web tree so that the bits and pieces get included by the browser when fetching the main page. (From "William F. Maton" email@example.com)
Q: Why do I see different packet and byte counts for the same ports depending on how I select the ports with artsportmagg?
A: The selection algorithm for a selected port table object does not retain the source to destination port information. Instead, it lumps traffic into the lower of the source or destination port for ports in portlist, and lumps those that aren't in portlist into the input counters for port 0.
This means that if you use "artsportmagg -s 1-10000" and there is a src port = 6699 and a dst port = 2089, then the pkts/bytes will be added to 2089 and not to 6699.
On the other hand, if you use "artsportmagg -s 80,20,6699,6688" and there is a src port = 6699 and a dst port = 2089, then the pkts/bytes will be added to 6699 and not to 0.
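The rule is easier to see in code. Here is a Python sketch of the selection rule as described above (selected_port is a made-up name, and this is a paraphrase of the behavior, not the artsportmagg source):

```python
def selected_port(src_port, dst_port, portlist):
    """Return the port whose counters a flow is lumped into.

    If both ports are in portlist the lower one wins; if only one is,
    that one is used; if neither is, the traffic goes to port 0.
    """
    candidates = [p for p in (src_port, dst_port) if p in portlist]
    return min(candidates) if candidates else 0
```

With portlist covering 1-10000, a flow with src 6699 and dst 2089 is counted under 2089; with portlist {80, 20, 6699, 6688} the same flow is counted under 6699.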
Q: How do I deal with the large amount of netflow data I might want to archive?
A: netflow allows users to fine-tune data retention to their specific needs, e.g., allowing a maximum age for each aggregation scheme, and allowing the user to specify:
- number of days as-is data is held
- number of days/months/quarters/years for which daily consolidated data is to be held
Q: With "ip flow-export version 5 origin-as" specified on my router, I'm getting a lot of src AS 0, inconsistent with what I know the traffic to be. What's happening?
A: Are you running CEF on the router? It usually helps a lot. If you're using the prefix cache (in other words, not running CEF), flow-export packets from the Cisco will contain AS 0 in many of the flows, especially in the source AS field. Flow-export looks in the prefix cache; if it fails to find a matching prefix cache entry, it sets the corresponding AS field to 0. It will not look in the routing table because that was deemed too expensive. The prefix cache is of course populated by destinations (as that's how forwarding works), hence cache misses for a source network lookup can be very frequent in many situations. The typical cause is unidirectional traffic, which is often caused by asymmetric routes. The best way to resolve this problem is to run CEF. (There are also other reasons to use CEF; poke around the Cisco Express Forwarding Overview.)
Q: In our data I noticed a large amount of records with nexthop 0.0.0.0 . Is there some particular case in NetFlow export v5 (like AS 0) that is mapped to 0.0.0.0? What does it mean?
A: A nexthop of 0.0.0.0 means that traffic is going to the bit bucket instead of being forwarded. If it's unicast traffic, it means there was no route. However, multicast may also cause large quantities of traffic to map to 0.0.0.0; (as of 1998, not sure if it's fixed) there was a bug in some IOS FDDI (and possibly other) drivers that caused all incoming multicast to be put in the flow cache, even though there are no group members on any other interfaces (and hence the traffic isn't actually being forwarded). This effectively makes NetFlow a multicast sniffer for FDDI (and maybe other media, but I've only seen it personally on FDDI).
If you are seeing packets on your Cisco router with source address 0.0.0.0, it might mean that someone is attacking you. Watch out for problems that result when someone is sending packets with src address 0.0.0.0 to any destination and you have console logging enabled. This error is always reported to the console.
While there may or may not be malicious intent, in 99.9% of the cases, it's someone sending packets with a bogus source address in order to perform a DoS attack on a host that is not your router. The destination address usually indicates the targeted host.
If the destination address is 0.0.0.0, it's usually just bogus (and relatively harmless since they just go to the bit bucket). You shouldn't see these for more than one hop in a default-free environment.
If you see an IP nexthop of 0.0.0.0, it means there was no route to the destination. Lots of these are legitimate, but many are not (e.g., the SYN ACKs from a SYN flood DoS attack using spoofed sources). The first non-defaulting router will drop these (and the IP nexthop field in the flow record will be 0.0.0.0.)
Q: Why are there so many entries with source and destination AS 0?
A: A value of 0 in an AS field is (unfortunately) overloaded in v5 flow-export; it can mean a failed AS lookup or it can mean that the traffic is from/to the router's local AS. However, the situation is not really horrible unless:
- You're not running Cisco Express Forwarding (CEF), and traffic is asymmetric. This will cause many source AS (and netmask) lookups (which are only done via the prefix cache, not the routing table) to fail. Solution: Use CEF.
- The traffic is from the local AS (the one the router sits in), but you haven't configured LOCALAS in the CISCOEXPORTER stanza for the router. Solution: set LOCALAS to the local AS of the router in the CISCOEXPORTER stanza in cflowd.conf.
- Traffic is destined for the router, but output ifindex is zero.
- Flows are unroutable, in which case nexthop, output ifindex, and destination mask should be zero.
- A flow doesn't have an entry in the route-cache for either the source or the destination. If the source AS is zero, examine the source prefix mask. If the source prefix mask is zero, then it indicates the absence of the route-cache. This is typically caused by asymmetric traffic. Solution: Use CEF switching instead of demand caching.
Q: The following data was taken from our NAP router at the interface directly connected to the NAP. We are AS237. We have a peering relationship with AS5696. If NetFlow is all inbound, how is it possible to get this data? If the source is within my AS and it is destined for a peer, why is traffic coming IN this interface?
How can this be correct:
Src AS Dst AS Pkts Pkts/sec Bytes Bits/sec
237 5696 35759 0.442064 16729764 1654.55
A: This is possibly due to the fudging that cflowd does with AS 0, and may indicate someone defaulting at you at the NAP. Any traffic from the NAP should not head toward you for AS5696, unless you're announcing their routes to others at the NAP. (I assume you aren't.) The AS 0 fudge would only come into play if you've got LOCALAS set in your config file. What happens here is that if cflowd gets a flow where the netmask looks valid but AS is 0, it assumes that the traffic is for the local AS and will plug in whatever was specified as LOCALAS (replacing the 0). I suggest that you log in to the cflowd host and watch for the raw flows causing this entry:
flowwatch '(srcas == 0 || srcas == 237) && dstas == 5696'
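For what it's worth, the LOCALAS substitution described in this answer amounts to something like the following Python sketch. It is a paraphrase, not cflowd's code: apply_localas is a made-up name, and treating "netmask looks valid" as "mask length is non-zero" is an assumption of this example.

```python
def apply_localas(as_number, mask_len, local_as):
    """Replace AS 0 with the configured local AS when the mask looks valid.

    local_as is None when LOCALAS is not configured, in which case
    nothing is substituted.
    """
    if as_number == 0 and mask_len > 0 and local_as is not None:
        return local_as
    return as_number
```

So with LOCALAS set to 237, a raw flow with source AS 0 and a /24 source mask would show up in the AS matrix with source AS 237, which is how an entry like the one in the question can appear.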
Q: When multiple Cisco routers are configured to send NetFlow data to one host running cflowdmux, is one IP address specified for that host, or can multiple IP addresses be specified for it? What is the most common configuration?
A: If the cflowd host is multihomed (or just has multiple IP addresses on a single interface), you could use more than one of the IP addresses. However, it's of no benefit to anything in cflowd since any reasons for doing this are outside the scope of cflowd.
Q: In the CISCOEXPORTER stanza, multiple IP addresses can be defined for a Cisco router. What is the reasoning behind this functionality? How common is it for a router to originate data from more than one interface in a typical network configuration?
A: That depends on your version of IOS and your router configuration. In some versions of IOS, if you don't specify the source address to be used by flow-export, you'll get packets with source addresses of one of the interfaces on the box; for other versions of IOS you'll get packets with a bogus (0.0.0.0, for example) source address if you don't specify the source address for flow-export in your router config. I forget Cisco's exact logic, but suffice it to say you're best off configuring the source address for flow-export in your Cisco config and configuring multiple IP addresses in cflowd.conf to be sure.
Q: In the COLLECTOR stanza, multiple IP addresses can be defined for a host running cfdcollect. What is the reasoning behind this functionality? A backup host is discussed in the configuration guide - is this the only reason?
A: No. A backup cfdcollect host would typically be configured via an additional COLLECTOR stanza. Multiple IP addresses in a single COLLECTOR stanza permit a multihomed host to connect from more than one of its IP addresses.
Q: What about sending multiple copies of the flow export streams from the Cisco to independent collector hosts?
A: In July 2000, Cisco released a Netflow Multiple Export Destinations feature, in response to customers wanting improvement in the chance of receiving complete flow export data by providing redundant streams of data. See https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/fnetflow/configuration/xe-3se/fnf-xe-3e-book/fnf-export-ipv4.html#GUID-C9DD6529-6C9F-4BD4-A775-14ADEE2C8BA3 for details.
Q: Why don't source and destination IP addresses that are present in the raw flows get passed on by cfdcollect to the arts++ files?
A: I assume that you mean host IP addresses, in which case that's correct. Host-level granularity is not carried through the system (doesn't go to cfdcollect, doesn't get written to the arts++ file). In other words, there is no 'host matrix' data type. Keeping data at such a granularity is very resource-intensive if it's flooded through the whole system. It is also often just not useful for what cflowd was designed to do: collect data for capacity planning and engineering.
The net matrix is as granular as we get at this point, and is usually more than granular enough for any capacity planning activities. It contains network/mask ==> network/mask traffic data. You need to use flow-export version 5 on your router, and enable 'netmatrix' collection in cflowd.conf. The cflowd system will store the net matrix information.
Note that it's somewhat common to see /32 networks (host addresses) in the net matrix, but usually not enough to cause it to explode into a host matrix. Multicast destinations are always /32. Directly-connected destinations are often marked as /32. Most other destinations will have a /24 or shorter netmask.
See the manpages for artsnets and artsnetagg.
Note that there is low-level flow information in the raw flow files. If you wanted to keep that data indefinitely, you could set the number of raw flow files (NUMFLOWFILES in cflowd.conf) to something very large (say 1000) and use a flow file size (FLOWFILELEN in cflowd.conf) of 2000000 or so, assuming you have the disk space. Of course, you still have to get it off the cflowd host (or at least out of the way of cflowd) before cflowd re-uses the files (this is perhaps the hardest part if you've got a lot of data), but you could just write a simple perl script to do it.
This isn't a trivial task; in many cases you'll have a lot of raw flow data. Busy routers will generate approximately 30-50 packets per second of v1 or v5 flow-export per interface. Each packet contains 20-30 flows. In a large network, the quantity of raw flow data adds up fast. cflowd is designed to reduce that data to something you can pull back to a centralized data warehouse from your whole network, yet still use for capacity planning. Anything that needs raw flow information today has to get it from the raw flow files on the cflowd host, or, it must hook into cflowdmux in the same way that cflowd does.
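To put rough numbers on "adds up fast", here is the arithmetic as a small Python sketch. The 48-byte record size is an assumption (the size of a v5 record on the wire, as I understand the format); cflowd's on-disk raw flow size may well differ.

```python
def raw_flow_volume(packets_per_second, flows_per_packet,
                    record_bytes=48, seconds=86400):
    """Return (flows, bytes) exported by one interface over `seconds`."""
    flows = packets_per_second * flows_per_packet * seconds
    return flows, flows * record_bytes


# Low and high ends of the ranges quoted above, per interface per day.
low = raw_flow_volume(30, 20)    # 51,840,000 flows, about 2.5 GB
high = raw_flow_volume(50, 30)   # 129,600,000 flows, about 6.2 GB
```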
Q: I set ip flow-cache active-timeout N down to 1 minute and still see flows reported that last an hour. There's a possibility that we're seeing fewer of them than before setting N so short, but they still most definitely show up. Assuming an active timeout of M minutes, are the following situations true:
A: That's essentially my (dwm @ caimis.com) understanding of how active timeout is 'supposed' to work, but note that this isn't duplicate data, since an export of data implies removal from the flow cache and in this case an immediate new cache entry. The long flows mentioned above apparently indicate otherwise (and only Cisco can answer to that, unfortunately).
Q: We notice significant peaks (sometimes 3 times more than the nominal speed of the subinterface) in our traffic volume, where there are no peaks in SNMP data reported from the same (serial) interface. What's going on?
A: Probably the timeout considerations: netflow traffic volume will not be reported until the flow is expired. You probably have a significant number of long flows. Also if you have flows that span multiple 5-minute intervals, but only count them in the last interval, this could generate such incongruity.
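The effect is easy to reproduce with a toy example. The Python sketch below (attribute_bytes is a made-up helper, not part of cflowd) bins flow bytes into 5-minute intervals either all at once in the interval where the flow ended, or spread evenly over the flow's lifetime:

```python
from collections import defaultdict


def attribute_bytes(flows, interval=300, spread=False):
    """Bin flow bytes into fixed intervals.

    flows: iterable of (start, end, bytes) with integer second timestamps.
    spread=False credits everything to the interval containing `end`
    (what you get if you count a flow when it is exported);
    spread=True distributes the bytes evenly over the flow's duration.
    """
    buckets = defaultdict(float)
    for start, end, nbytes in flows:
        if not spread or end <= start:
            buckets[(end // interval) * interval] += nbytes
            continue
        t = start
        while t < end:
            bucket_start = (t // interval) * interval
            slice_end = min(end, bucket_start + interval)
            buckets[bucket_start] += nbytes * (slice_end - t) / (end - start)
            t = slice_end
    return dict(buckets)
```

A single 10-minute flow counted only at export time looks like a spike in its final interval, whereas SNMP counters would have seen the same bytes arrive gradually.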
Q: Have there been any other tests to calibrate the accuracy of Cisco's netflow statistics?
A: This is a hard one since the flow definition aspects are still not clear to us. A few have done rough comparisons, e.g., the study done at UC Santa Cruz by Mark Boolootian (firstname.lastname@example.org) and Jim Warner, but this was mostly to make sure the netflow stats weren't way off. We are unaware of any methodological testing or comparison.
Q: High-end Cisco routers support router-based netflow aggregation to reduce export data volume and improve scalability. Does cflowd support all these aggregations?
A: More detail from Cisco available at http://www.cisco.com/warp/public/cc/pd/iosw/ioft/neflct/tech/napps_wp.htm. (as of Aug 2000).
The five Cisco aggregation schemes allow summarization of data on the router before export. In fact, Ciscos will support multiple netflow `aggregation caches'. The normal flow ager process runs on each active aggregation cache the same way it runs on the main cache. On-demand aging is also supported.
NetFlow v8 aggregation is available on all Cisco netflow-supported platforms beginning in 12.0(3)T and 12.0(3)S. Default secondary NetFlow v8 aggregation cache size is 4096 entries on all platforms that support Cisco IOS NetFlow. Note that traditional v1/v5 aggregation netflow services may be enabled simultaneously, but only the router-based aggregation feature uses v8 export format.
Q: What are the differences between Cisco's v1/v5/v7/v8 flow-export formats?
- V1: IPs, nexthop, input i/f index, output i/f index, pkts, octets, time.start, time.finish, src.port, dst.port, protocol, TOS, OR of tcp.flags
- V5: all of V1 stuff plus: (1) dst (peer | final.dest) AS and src (pre-peer | origin) AS; (2) src/dst route.masks
- V7: all of V5 plus: (1) a 'flags' field that indicates what kind of flow it is (note this has to do with catalyst.switch munging of flow.definition); (2) a router field:
ipaddrtype router_sc; /* Router which is shortcut by switch */
- V8: V8 is trying to MINIMIZE output.size. So there are several different v8 aggregated formats, all trying to minimize output.flow overhead. See http://www.cisco.com/warp/public/cc/pd/iosw/ioft/neflct/tech/napps_wp.htm. Formats include: asMatrix, protocolPortMatrix, srcPrefixMatrix, dstPrefixMatrix, totalMatrix
UDP datagram sizes by type (max udp pktsize includes the v8 header, in bytes):
type                 records/datagram   max udp pktsize
ASMatrix             51                 1456
ProtocolPortMatrix   51                 1456
SourcePrefixMatrix   44                 1436
DestPrefixMatrix     44                 1436
PrefixMatrix         35                 1428
Q: Is it a problem to have all three of our routers sending stats to the same port on the cflowd machine?
A: It depends. Flow export traffic tends to be bursty, so overrunning the socket buffer is always possible. If you see dropped flow messages in the syslogs, you might be seeing socket buffer overflows. You can usually check this condition if it's persistent by looking at the Recv-Q length in "netstat -an" output; if it looks high all the time (for instance, pegged at the maximum), you should try using more than one port. This will get you the additional room per router in the socket buffers that you need.
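As a rough illustration of the Recv-Q check, the sketch below pulls the Recv-Q value for a UDP port out of captured "netstat -an" text. It assumes the common column order (Proto, Recv-Q, Send-Q, Local Address), which varies between operating systems; the function name and the port number 2055 are hypothetical, not cflowd defaults.

```python
def udp_recv_queues(netstat_text, port):
    """Return the Recv-Q values of UDP sockets bound to `port`.

    Assumes each socket line starts with: Proto Recv-Q Send-Q Local-Address.
    """
    queues = []
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].lower().startswith("udp"):
            continue
        recv_q, local = fields[1], fields[3]
        # The local address ends in ":port" on some systems, ".port" on others.
        local_port = local.replace(":", ".").rsplit(".", 1)[-1]
        if local_port == str(port) and recv_q.isdigit():
            queues.append(int(recv_q))
    return queues


# Hypothetical captured output; the numbers are made up for illustration.
sample = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
udp   212992      0 0.0.0.0:2055            0.0.0.0:*
udp        0      0 0.0.0.0:2056            0.0.0.0:*
"""
print(udp_recv_queues(sample, 2055))
```

Sampling this repeatedly and finding the value consistently near the socket buffer limit would be the persistent condition described above.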
Q: We are encountering some problems porting legacy scripts that provide 'outgoing' numbers to cflowd. I realize that NetFlow is input-interface-specific, but management wants both incoming and outgoing aggregated numbers. Our legacy script uses separate threads for incoming and outgoing traffic, and filters on the dst interface to allow only flows that terminate on the 'external' interface (outgoing), or filters on the same interface as the src (incoming). It seems that cflowd has no similar capability to filter on src/dst interface. It also looks like the arts++ file formats do not save the dst interface, except for the interface object. The workaround I've been using is to add up the numbers from all the interfaces in the output generated by 'artsases', except for the edge interface going out of our network. This is obviously a horrible kludge and has numerous problems. Has anybody else tried to get aggregated 'outgoing' numbers? (Note that each outgoing interface has its own BGP peering session, such that you can't tell, from an AS Matrix object, which one is the destination interface. You can with Interface Matrix objects, but then, how do I correlate the two? I need SRC/DST AS data in both directions for a given external peering interface.)
A: Not in cflowd, you're right. Hendrik Visage (email@example.com) modified flowdump in January 2000 to take raw flows (after modifying Plonka's cflowd patch to have the file dropped around the 1, 2, 3, 5, 10, 15, etc. minute marks with the date stamped) and then, based on modified flowdump CLI expressions, provide summary output ready for, e.g., RRDtool.
Q: How do I de-duplicate flows?
A: De-duplicating flows is defined as recognizing when two netflow export records are redundant, i.e., where a single flow is seen and reported by two different routers in the network. Billing/accounting applications make it essential to know when not to double-count the statistics of such a flow.
Given current netflow data, a simple flow correlation technique involves matching the aggregation keys of two or more records output by FlowCollector(s), and then comparing the starting and ending flow timestamps to determine if an overlap exists. If an overlap exists, we can conclude that at least some portion of the flows is redundant.
The NetFlow Server will provide flow de-duplication functionality, with some basic limitations:
- For short flows, or those with non-trivial time differences, timestamps may not overlap. In this case, it is impossible to recognize the flow records as being redundant. A single flow with this profile would be reported by each router and then double-counted by the NetFlow Server.
- Only a few of the FlowCollector aggregation schemes, such as DetailHostMatrix and CallRecord, contain sufficient detail (starting and ending flow timestamps) to be able to correlate duplicate flows in the network. The other aggregation schemes are lacking this detail and, therefore, cannot be used to recognize duplicate flows unless their records were extended to include timestamp fields (with possible performance and disk space impacts).
- Consider a single flow sent through two parallel routers using per-packet load balancing available with Cisco Express Forwarding (CEF). With NetFlow enabled, the two routers would each report a flow with similar timestamps. Given the timestamp approach to recognizing duplicate flows, the NetFlow Server would identify the NetFlow Export data from the two routers as being redundant even though they are not.
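The key-plus-timestamp correlation described above can be sketched as follows. This is only an illustration under stated assumptions: the FlowRecord fields, the five-element key, and the function name are hypothetical and are not the FlowCollector or NetFlow Server record layout.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowRecord(NamedTuple):
    router: str   # which router exported the record
    key: tuple    # aggregation key, e.g. (src, dst, sport, dport, proto)
    start: float  # flow start time, seconds
    end: float    # flow end time, seconds
    octets: int


def find_overlapping_duplicates(records):
    """Return pairs of records from different routers that share a key
    and whose [start, end] intervals overlap."""
    by_key = defaultdict(list)
    for rec in records:
        by_key[rec.key].append(rec)

    pairs = []
    for group in by_key.values():
        group.sort(key=lambda r: r.start)
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if b.start > a.end:
                    break  # sorted by start, so nothing later overlaps a
                if a.router != b.router:
                    pairs.append((a, b))
    return pairs


long_key = ("192.0.2.1", "198.51.100.2", 1025, 80, 6)
short_key = ("192.0.2.3", "198.51.100.4", 1026, 53, 17)
records = [
    FlowRecord("r1", long_key, 0.0, 10.0, 500),
    FlowRecord("r2", long_key, 1.0, 11.0, 500),       # overlaps: flagged
    FlowRecord("r1", short_key, 100.0, 100.2, 40),
    FlowRecord("r2", short_key, 100.5, 100.7, 40),    # no overlap: missed
]
pairs = find_overlapping_duplicates(records)
print(len(pairs))
```

The second pair shows the first limitation listed above: a short flow seen by both routers does not overlap in time, so it goes unrecognized. The load-balancing case in the last bullet would, conversely, be flagged even though the records are not redundant.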
Q: Will cflowd work with Juniper routers? Is the file format the same?
A: Juniper routers perform packet sampling, and one of the options is for the sampling data to be generated in cflowd-recognizable format.
The file format is close enough that Daniel McRobb seemed to be able to easily make the mods to cflowd. I don't think it is identical.
For more info, see:
Q: I'm just starting to use cflowd. What would you suggest to handle: Cisco 7206, Fast Ethernet, 1 HSSI, 8 V.35 up to E1? Assuming all of the serial interfaces are running at full speed, what dedicated machine do I need to run the collector on? Will a PC and free Unix do? Would the disk I/O require a SCSI system?
A: (Chet F Johnson) I did a test to try and understand platform capacity constraints of cflowd in a fairly large local area network. I configured 13 fully populated production 7513 campus routers, IOS11.2(15a), and directed ip flow export to a single platform. Each router had a mix of 45Mb ATM, FastEthernet, and Ethernet interfaces. On average, each router would transmit roughly 80 flows per second to the collector (approximately 45kbps). The aggregate data rate to the collector was approximately 3.6Mbps for all router feeds. It should be noted that I only had Type 1 NetFlow available and we do not run BGP so I had no AS information.
cfdcollect (configured with default 5 minute sample interval) was operating on the same platform as cflowdmux and cflowd. During each five minute interval, cflowd memory utilization would grow to approximately 25Mbytes. This memory utilization would double when cfdcollect retrieved the data from memory and wrote it to disk. The cfdcollect process took approximately 19 seconds to complete. Raw flows were written to the same partition as the cfdcollect data.
This exercise was performed under Red Hat Linux 6.1 running on a dual 166MHz Pentium CPU platform, 256 MBytes RAM, and 24 GBytes of disk space (Adaptec SCSI and six 4-Gbyte drives). CPU utilization was < 1%. RAM utilization never approached critical (see note below). It took a month of collection to consume the disk space.
Based on what I observed and the environment you describe, I would expect that any standard PC with 128+Mbytes RAM and a large disk drive (or two) would serve well in getting you started.
Q: I have two routers with five "external" connections. How can I account for data perceived as coming from the outside to individual IP addresses within our network (e.g. to a /32)? I would like to be able to do this in N minute samples. I want to then "summarize" this data and stick it in a mysql database for reporting. What's the best way?
A: (Alex Pilosov firstname.lastname@example.org) There are a few ways to do it:
One is to patch cflowd to make it summarize differently. (I didn't have much success with this approach.)
Another way is to feed a perl script with an ASCII dump of flow traffic, and store it in an RRDtool database. This is slightly tricky, because RRDtool requires sorted data, and Cisco exports flows in the order in which they ENDED. Therefore, the script has to take care of storing, caching, and sorting, in order to keep RRDtool happy.
Note: I'm not using cflowd for input. Instead, I'm using flow-tools at https://www.caida.org/catalog/software/cflowd/ to parse flow packets and convert them to ASCII. The perl script isn't super-fast, but it uses <1% of CPU on 3Mbps of traffic. (Contact Alex for details/script.)
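The store, cache, and sort step might look roughly like the sketch below. It is not Alex's script: the class name, the bin width, and the hold-back window are assumptions chosen for illustration (the 1800-second default mirrors the 30-minute active timeout mentioned earlier in this FAQ).

```python
from collections import defaultdict


class BinBuffer:
    """Accumulate per-bin byte counts and release them in time order."""

    def __init__(self, step=300, holdback=1800):
        self.step = step          # bin width in seconds (the "N minutes")
        self.holdback = holdback  # how long to wait for late-arriving flows
        self.bins = defaultdict(int)

    def add(self, start_time, octets):
        """Credit a flow to the bin that contains its start time."""
        bin_start = int(start_time) // self.step * self.step
        self.bins[bin_start] += octets

    def flush(self, now):
        """Return (bin_start, octets) pairs older than the hold-back
        window, in ascending time order, and drop them from the buffer."""
        ready = sorted(t for t in self.bins
                       if t + self.step + self.holdback <= now)
        return [(t, self.bins.pop(t)) for t in ready]


buf = BinBuffer()
# Flows arrive in the order they ended, not the order they started.
buf.add(620, 100)
buf.add(10, 50)
buf.add(290, 25)
buf.add(310, 5)
first = buf.flush(now=2400)
later = buf.flush(now=10000)
print(first, later)
```

Each flushed pair is already in increasing time order, which is what a time-series store that rejects out-of-order updates needs. A flow that shows up after its bin has been flushed would still arrive too late, so the hold-back has to be at least as long as the longest flow the router can export.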
Q: How can cflowd (in particular, flowdump) be used to produce a workable, dead-easy-to-implement border router traffic accounting system?
A: (Andrew Fort email@example.com) We hacked flowdump to do this, specifically changing the definition of the ostream& operator in /classes/src/CflowdRawFlow.cc.
For version cflowd2-1-a3, this definition starts on line 1498 of CflowdRawFlow.cc:
ostream& operator << (ostream& os, const CflowdRawFlow & flow)...
This tends to be more parseable than the default flowdump output.
Then, a simple Perl script turns flowdump output into a list of single IPs (no distinction between source and destination) with input and output bytes for a given period. Note that each source is also a destination at some point in our environment. The modified flowdump proves to be a terrific traffic logger for border routers. The output files are exported to an SQL server and billing is performed at that level. Since we know that a particular IP address 'X' is a local co-hosted server, the database can provide an accurate billing report based on accurately identifying the IP in question.
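A minimal sketch of that per-IP summary step is shown below, in Python rather than Perl. The three-column 'src_ip dst_ip octets' input is a hypothetical stand-in, since the actual layout of the modified flowdump output is not given here; the function name is likewise made up.

```python
from collections import defaultdict


def per_ip_totals(lines):
    """Sum bytes per IP from lines of the form 'src_ip dst_ip octets'.

    The three-column layout is an assumption for illustration; adjust the
    parsing to whatever the modified flowdump actually prints.
    """
    totals = defaultdict(lambda: {"out": 0, "in": 0})
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or not fields[2].isdigit():
            continue  # skip blank or malformed lines
        src, dst, octets = fields[0], fields[1], int(fields[2])
        totals[src]["out"] += octets  # bytes sent by this address
        totals[dst]["in"] += octets   # bytes received by this address
    return dict(totals)


lines = [
    "192.0.2.10 198.51.100.7 1500",
    "198.51.100.7 192.0.2.10 400",
    "192.0.2.10 203.0.113.5 100",
    "",
]
totals = per_ip_totals(lines)
print(totals["192.0.2.10"])
```

Every address appears once in the result with both directions totalled, so a billing database only has to pick out the rows for addresses it knows to be local.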
Q: How do you collect flow data for unnumbered serial interfaces? If I'm using IP unnumbered FastEthernet0/0 in the router, do the stats get reported under my FastEthernet interface?
A: (from John jtk @ depaul.edu) Set up 'ip route-cache flow' on your serial interface as usual and in your cflowd.conf. Then, you'll just watch for export data coming from the Ethernet interface of your router. The HOST and ADDRESSES fields in the CISCOEXPORTER section of your cflowd.conf file will probably just contain the same IP address (assuming that's all you're watching).
The flow information itself should be what you expect. However, the IP nexthop field in the flow data will probably be of little value to you if you're watching traffic go out the serial interface too.
Q: Do you have any advice for folks worried about overwhelming the data collection/storage/analysis requirements?
A: Cisco would probably want us to keep in mind that netflow is an input-based measurement technology that should be deployed carefully to get traffic matrices for accounting, monitoring or network planning data. Data volume manageability is critical; seriously consider deploying netflow incrementally (i.e. interface by interface) and strategically (i.e. on well chosen routers), rather than unilaterally.
In particular, understand the impact of topology and routing policy on flow collection strategy. For example, avoid collecting duplicate flows by activating NetFlow on key aggregation routers where traffic originates or terminates and not on routers that provide duplicate views of flow information. Service providers in the "transit carrier" business (i.e. carrying traffic neither originating nor terminating on their network) may utilize NetFlow Export data for measuring transit traffic usage of network resources for accounting and billing purposes.
Q: I'm using artsnets to try and trace source/destination flows during DoS attacks. Problem is, the destination net is always an aggregate network (/18 for example). Is there another tool I can use to drill down to the specific destination /32 so I can figure out exactly what customer is getting attacked and ban him from accessing IRC?
A: (Christian B. Wiik firstname.lastname@example.org) As long as it's during an attack, using flowwatch works great. Use 'flowwatch all | grep "dst IP"' to find the receiver and "flowwatch '(dstaddr == ?.?.?.?)'" to examine all flows to that host.
I've not had much luck finding DoS attacks in cflowd's statistics. I guess you need to collect raw flows to be able to do this after an attack has finished.
Also, if the DoS attacks are big enough, the flow dumps may have rolled over by the time you get to them. But, if the relevant dump file is still around, you can read it with flowdump. (from Christian B. Wiik christbw @ tihlde.org and Jason Lixfeld jlixfeld @ team.look.ca)
Q: How do I identify anomalies such as DoS attacks after the fact?
A: (Dave Plonka, email@example.com) I've written a patch for cflowd that enables you to have cflowd produce flow files with the timestamp in the name, in addition to cflowd's normal processing.
I use a 2Gbyte file system to keep about 28 hours worth of flows when I launch the patched cflowd like this:
cflowd -s 300 -O 0 -m /path/to/cflowd.conf
where:
-s 300 means 'drop a flow file every 300 seconds'
-O 0 means 'skip flows with output_if of zero'
-m means 'retain multicast flows (even though their output_if is 0)'
I haven't talked to Daniel recently about incorporating this patch into an upcoming cflowd release, but I think it would be great to do this since it peacefully co-exists with existing features. For more info, see the docs at the patch URL (http://net.doit.wisc.edu/~plonka/cflowd/) or the FlowScan INSTALL doc (http://net.doit.wisc.edu/~plonka/FlowScan/). The INSTALL doc explains how the features added by the patch are used by FlowScan. Examples of crontab entries are included in the FlowScan distribution that will help you to compress and purge your flow files.
It is my understanding that my current "a3" patch applies against "a6" just fine. This is because there weren't significant changes to the files it patches ("cflowd.cc" and "configure.in") between "a3" and "a6". (from Dave Plonka plonka @ doit.wisc.edu)
Q: I've been using Dave Plonka's patched cflowd and perl module to do some custom analysis of NetFlow data. Call me crazy, but ideally I would like to have about a 2 month archive of flow data in some database. I'm currently dumping the data into a mysql database on a Linux box. Collecting the data is no problem; generating reports takes forever. The problem is that the data set grows large very fast. Since I'm looking at 8-10 million records a day, I'm figuring 1 month's worth of data will be about 40Gbytes. Although it seems that keeping the raw flow data will allow me to generate whatever reports I need, is there a more efficient approach? Can anyone recommend an economical hardware/operating system/database to use? (I've tried FlowScan, but it doesn't give me all that I need.)
A: (Alex Pilosov firstname.lastname@example.org) Mysql should be able to handle that. The problem is that for many reports, mysql would have to examine 40Gbytes worth of data, which takes ages. So, you need some sort of aggregation on-the-fly as data is inserted into mysql. I considered using postgres and triggers and stored procedures to extract and total up statistics as data is loaded. Instead, I currently use OSU's flowtools. flowtools logs raw flows into 1Gbyte partitions (for us, this is about 2 weeks' worth of data at 4M/sec avg), and changes filenames every 4 hours. That allows me to do certain reports reasonably quickly, as long as they don't need to span more than 4 hours of raw data.
(from Dave email@example.com) Storing raw flows as rows in a relational database seems difficult. The storage space and run-time work-set size overhead incurred by maintaining either relational tables or ISAM indexes on all interesting flow fields probably doesn't justify the convenience of executing ad hoc SELECT statements.
I'm assuming you started by preserving the timestamped raw flow files produced by the patched cflowd (e.g., "flows.YYYYMMDD_HH:MI:SS"), then found that you wanted to do on-demand searches of flows that met particular selection criteria.
One way of approaching this problem is to process/organize the flows once (as FlowScan currently does) and run multiple FlowScan reports (or enhance existing reports), such that flows are diverted into separate raw flow files having descriptive names:
Napster/flows.YYYYMMDD_HH:MI:SS-TZ # named by application
ResNet/flows.YYYYMMDD_HH:MI:SS-TZ # named by network/subnet (or dept/cust)
10.0.1.0_24/flows.YYYYMMDD_HH:MI:SS-TZ
hostname/flows.YYYYMMDD_HH:MI:SS-TZ # named by host
Obviously, some flows would meet multiple selection criteria. Those flows could be duplicated into sub-dirs.
This way, although your selection criteria are limited to the subset you chose when FlowScan processed the file, you don't have to slog through all the flows to find something of interest. This particularly helps when investigating abuse, where timely reports are much better than taking 'n' hours to rescan all the raw flows. If your selection criteria usually contain the subnet or value for source or destination address, then this simple method could speed things up tremendously.
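As a sketch of the subnet-based part of this scheme, the code below decides which sub-directories a flow would be diverted into, given its source and destination addresses. It is not FlowScan code: the SUBNETS table, the "ResNet" range, and the function names are hypothetical, and the directory naming simply follows the examples above.

```python
import ipaddress

# Hypothetical selection criteria: directory name -> network.
SUBNETS = {
    "ResNet": ipaddress.ip_network("10.0.0.0/16"),
    "10.0.1.0_24": ipaddress.ip_network("10.0.1.0/24"),
}


def divert_dirs(src, dst, subnets=SUBNETS):
    """Return the sorted directory names whose network contains src or dst."""
    addrs = (ipaddress.ip_address(src), ipaddress.ip_address(dst))
    return sorted(name for name, net in subnets.items()
                  if any(addr in net for addr in addrs))


def flow_file_path(directory, stamp):
    """Build a path in the 'dir/flows.STAMP' style used above."""
    return f"{directory}/flows.{stamp}"


# A flow matching two criteria is duplicated into both sub-directories.
for directory in divert_dirs("10.0.1.5", "192.0.2.1"):
    print(flow_file_path(directory, "YYYYMMDD_HH:MI:SS-TZ"))
```

A later search for one subnet then only has to read the files under that subnet's directory.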
By the way, have you noticed in the current Cflow perl module that there is a variable called "$Cflow::raw"? It's described like this:
Cflow::raw - the entire "packed" flow record as read from the input file. This is useful when the "wanted" subroutine wants to write the flow to another FILEHANDLE, i.e.:
syswrite(FILEHANDLE, $Cflow::raw, length $Cflow::raw)
I added this variable so that it would be easy for us to divert the flows to a new raw flow file of our choosing. (This is described in the FlowScan "TODO" list as a possible "FlowDivert.pm" report.)
If you do continue with a relational DB approach, perhaps you could create all the "empty" flow rows in advance. Then, just fill them round-robin and use a unique "row-id" for the "current" row. This is similar to RRDtool's RRAs, but with non-numeric data. This might help to keep the number of rows from changing and causing the DB engine to have to extend, truncate, or rewrite portions of the table on the disk.
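The round-robin idea can be sketched as follows, using SQLite (via Python's standard sqlite3 module) purely as a stand-in for whatever database is actually in use. The table names, columns, and slot count are hypothetical.

```python
import sqlite3

SLOTS = 4  # tiny for illustration; a real table would hold far more rows


def create_ring(conn, slots=SLOTS):
    """Pre-create every row, plus a one-row table holding the write pointer."""
    conn.execute("CREATE TABLE flows (slot INTEGER PRIMARY KEY, "
                 "src TEXT, dst TEXT, octets INTEGER)")
    conn.execute("CREATE TABLE ring_state (next_slot INTEGER)")
    conn.executemany("INSERT INTO flows (slot) VALUES (?)",
                     [(i,) for i in range(slots)])
    conn.execute("INSERT INTO ring_state VALUES (0)")


def store_flow(conn, src, dst, octets, slots=SLOTS):
    """Overwrite the 'current' row and advance the pointer, wrapping around."""
    (slot,) = conn.execute("SELECT next_slot FROM ring_state").fetchone()
    conn.execute("UPDATE flows SET src = ?, dst = ?, octets = ? WHERE slot = ?",
                 (src, dst, octets, slot))
    conn.execute("UPDATE ring_state SET next_slot = ?", ((slot + 1) % slots,))


conn = sqlite3.connect(":memory:")
create_ring(conn)
for i in range(6):  # six flows into four slots: the two oldest are overwritten
    store_flow(conn, f"192.0.2.{i}", "198.51.100.1", 100 * i)
print(conn.execute("SELECT * FROM flows ORDER BY slot").fetchall())
```

Only UPDATE statements run after the table is created, so the row count stays fixed no matter how many flows are stored.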
As for FlowScan, it certainly does make you decide beforehand what you might look for, but if we build some more reports for it (based on Cflow.pm) then it's more likely that one of them will have done what you want.
Q: Are there any plans to support NetFlow version 7 (generated by Cisco Cat5000s)?
A: There are no plans to support v7 flow-export at this time.
(David Spindler firstname.lastname@example.org) I believe that Catalyst switches will export V1. We use the exports from our Catalyst to get per-interface MRTG-like statistics for the router that was bypassed.
Q: Are there other options besides cflowd?
A: Well, in terms of free stuff, we only know of cflowd and Ohio State's flowtools. Cisco has expressed intention to partner with 3rd-party vendors to support customer flow analysis needs. Not much information seems available at this time.
(from http://www.cisco.com/warp/public/cc/pd/iosw/ioft/neflct/tech/napps_wp.htm ) Also, Cisco lists customers that have used and continue to integrate netflow into their usage-based billing and services, including: Portal Software, Solect, Belle Systems, Saville Systems, X-Cel Communications, and Kenan Systems.
Q: So in the meantime, any recommendations?
A: In the meantime, we recommend a single collector or a few collectors to gather all netflow data for significantly-sized networks. Note that cflowd can collect on multiple collection hosts from multiple netflow servers at once.