Overview
- Measurement in large production networks
- Measurement infrastructure design issues
- Passive Measurement Methodology
- UCSD Network Topology
- Measurement goals, Writing rulesets
- Case Studies
- Short-term data rates
- Time Variations of Stream Lifetime and Size
- Conclusion
Measurement Design Issues
- Active vs passive
- Monitor placement within the network
- Need to understand the physical topology
- Can discover the IP Address ranges (netblocks) in use
- Could monitor every link, but this doesn't scale
- Simpler approaches:
monitor only busiest links, monitor only at edges
- Selecting Metrics
- Many metrics to choose from, e.g. IPPM, CAIDA Metrics FAQ
- We use NeTraMet (RTFM) attributes, i.e. those described in
RFC 2720 and RFC 2724
- Data Collection and Archiving
- What data is to be measured and stored?
(NeTraMet flows, 5-minute meter readings)
- How will the data be stored and accessed?
(Flow data files, one per day)
- What interface will be provided to make the data
easily accessible to users?
(Web page for daily/weekly plots)
UCSD/SDSC network topology
- SDSC has links to CERFnet, vBNS+ and Abilene
- UCSD network has CalREN link and SDSC links
- Routing is asymmetric across the four Internet links
- NeTraMet meters are installed at two points, `UCSD' and `SDSC'
Developing Rulesets
- Write ruleset to select flows of interest, and gather
the required flow data
- Careful testing is essential!
- Must make sure ruleset covers all possible pairs of hosts
(even the ones one didn't expect)
- Must check that distribution parameters (number of bins, upper
and lower limits, etc.) work well for the expected traffic load
- Rulesets evolve - they are refined as one's understanding
of the measured traffic improves
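One way to test coverage is to run every pair of sample hosts through the classification logic and check that each pair gets exactly one consistent answer. The sketch below is a toy Python stand-in for a ruleset, not NeTraMet itself; the netblocks, labels and helper names are all assumptions chosen for illustration.

```python
from ipaddress import ip_address, ip_network
from itertools import product

# Hypothetical "inside" netblocks standing in for the measured site
INSIDE = [ip_network("132.239.0.0/16"), ip_network("137.110.0.0/16")]

# Label expected when source and destination are swapped
MIRROR = {"outbound": "inbound", "inbound": "outbound",
          "internal": "internal", "transit": "transit"}

def is_inside(addr):
    a = ip_address(addr)
    return any(a in net for net in INSIDE)

def classify(src, dst):
    """Toy stand-in for a ruleset: label one (src, dst) host pair."""
    s, d = is_inside(src), is_inside(dst)
    if s and d:
        return "internal"
    if s:
        return "outbound"
    if d:
        return "inbound"
    return "transit"

def inconsistent_pairs(hosts):
    """Return pairs whose label is unknown or does not mirror when reversed."""
    bad = []
    for src, dst in product(hosts, repeat=2):
        fwd, rev = classify(src, dst), classify(dst, src)
        if fwd not in MIRROR or MIRROR[fwd] != rev:
            bad.append((src, dst))
    return bad

# Sample hosts: two inside, two outside (documentation addresses)
hosts = ["132.239.1.1", "137.110.2.2", "192.0.2.1", "198.51.100.7"]
problems = inconsistent_pairs(hosts)
```

An empty `problems` list means every sampled pair was labelled consistently; the sample should include the unexpected cases, such as inside-to-inside pairs.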
Case Study 1: Short-term Data Rates
- We measure the total data rate into and out of SDSC via
the CERFnet link. The link is rate-limited to 20 Mbps
- We determine each packet's direction by testing whether
its Source IP Address lies within one of UCSD/SDSC's
14 netblocks
- Our ruleset for this is as follows ..
# Ruleset to get 10-second data rates for CERFnet link
define CAIDA = 192.172.226/24;
define HYPERNET = 153.105/16;
define MPL106 = 192.135.237/24;
define MPL4 = 192.135.238/24;
define NET_NSI = 198.133.185/24;
define SCRIPPSNET_BIG = 137.131/16;
define SDSCFDDIDMZ = 198.17.46/24;
define SDSC2 = 132.249/16;
define SDSC_APOLLO = 192.31.21/24;
define SDSCNET_CBLK = 198.202.64/18;
define UCSD = 128.54/16;
define UCSD_CERF = 199.105.0/18;
define UCSD_EXTRN = 137.110/16;
define UCSD_SUB = 132.239/16;
define UCSD_NETS =
UCSD, UCSD_SUB, UCSD_EXTRN, MPL106, MPL4, UCSD_CERF;
define SDSC_NETS =
SDSC2, SCRIPPSNET_BIG, HYPERNET, SDSC_APOLLO, CAIDA,
SDSCFDDIDMZ, SDSCNET_CBLK, NET_NSI;
define SOURCE_NETS = UCSD_NETS, SDSC_NETS;
if SourcePeerType == IPv4 save;
else ignore;
if SourcePeerAddress == (SOURCE_NETS) {
# To means 'away from SOURCE'
save ToBitRate = 48.10.0!0 & 1.3.1!24000;
save FromBitRate = 48.10.0!0 & 1.3.1!24000;
# 48 buckets, 10s rates, linear, **3 => 1k..24M B/s
count;
}
set data_rate_n;
format
FlowRuleSet FlowIndex FirstTime SourcePeerType
" " ToPDUs FromPDUs " " ToOctets FromOctets
" (" ToBitRate
") (" FromBitRate
")";
- This ruleset builds a single flow, with `all netblocks
within UCSD/SDSC' as its Source
- The Save To/FromBitRate statements tell the meter it
should compute n-second bit rates To and From
- The distribution parameters are given in the comment line.
We are using 10-second bit rate distributions with 48 bins
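The bullets above can be sketched outside the meter as follows. This is a minimal Python illustration, not the meter's implementation: it uses only two of the netblocks (written in full dotted form), computes rates in bits per second, clamps out-of-range rates to the end bins, and skips intervals with no traffic. All of these choices, and the function names, are assumptions.

```python
from ipaddress import ip_address, ip_network

# Two of the ruleset's netblocks, in full dotted form
SOURCE_NETS = [ip_network("132.249.0.0/16"), ip_network("128.54.0.0/16")]

BINS = 48                       # number of buckets
LOW, HIGH = 1_000, 24_000_000   # linear range, 1k..24M
INTERVAL = 10                   # seconds per rate sample

def direction(src):
    """'to' when the packet's source is in SOURCE_NETS (away from SOURCE)."""
    a = ip_address(src)
    return "to" if any(a in net for net in SOURCE_NETS) else "from"

def bin_index(rate):
    """Linear bin for a rate; out-of-range rates are clamped (assumption)."""
    if rate <= LOW:
        return 0
    if rate >= HIGH:
        return BINS - 1
    return int((rate - LOW) * BINS / (HIGH - LOW))

def rate_distributions(packets):
    """packets: iterable of (timestamp_s, src_ip, size_bytes)."""
    totals = {}   # (direction, interval number) -> bits seen
    for ts, src, size in packets:
        key = (direction(src), int(ts // INTERVAL))
        totals[key] = totals.get(key, 0) + size * 8
    dists = {"to": [0] * BINS, "from": [0] * BINS}
    for (d, _), bits in totals.items():
        dists[d][bin_index(bits / INTERVAL)] += 1
    return dists
```

Each 10-second interval contributes one count to one bin per direction, so a week of data gives roughly 60,000 samples per distribution.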
10-second Data Rates for week from Sat 17 Feb 2001
- 18T09 means 0900 (UTC) on 18 Feb 2001, i.e. 0100 (PST)
- Diurnal variations In and Out (minimum around 1200 UTC,
i.e. 0400 PST)
- Maximum is clearly rate-limited. This limiting is not at
all visible in the 10-second medians
- More data out than in for this week (probably not true
for the other three research/education networks)
- Could do this with SNMP, but would need to read interface
counters every 10 seconds
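For comparison, the SNMP alternative amounts to differencing successive octet-counter readings. The Python sketch below assumes 32-bit counters (such as ifInOctets) read exactly every 10 seconds; the function names are hypothetical and the polling itself is not shown.

```python
COUNTER32_MOD = 2 ** 32   # a Counter32 wraps back to zero at this value

def bit_rate(prev_octets, curr_octets, interval_s=10.0):
    """Average bit rate between two readings, allowing for one counter wrap."""
    delta = (curr_octets - prev_octets) % COUNTER32_MOD
    return delta * 8 / interval_s

def rates(readings, interval_s=10.0):
    """Bit rates for each consecutive pair of counter readings."""
    return [bit_rate(a, b, interval_s) for a, b in zip(readings, readings[1:])]
```

At 20 Mbps a 32-bit octet counter takes about 28 minutes to wrap, so with 10-second polling at most one wrap can occur between readings.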
Case Study 2: Time Variations of Stream Lifetime and Size
- We extended NeTraMet to build stream size and lifetime distributions.
When a stream terminates, its size (packets and kB) and
duration (ms) are added into the distributions for its flow
- As well as time variations, we are interested in differences
between protocols: UDP, non-web TCP, web
- We also distinguish `outside' web (data imported to UCSD)
and `inside' web (data exported from UCSD)
- Our ruleset for this is as follows ..
# Collect stream lifetime and size distributions
define UCSD_SUB = 132.239/16;
define UCSD_EXTRN = 137.110/16;
define UCSD_CERF = 199.105.0/26;
define SOURCE_NETS = UCSD_SUB, UCSD_EXTRN, UCSD_CERF;
define WWW = 80; # www port number
if SourcePeerType == IPv4 save;
else ignore;
if SourceTransType == TCP save,
store FlowKind := 2;
else if SourceTransType == UDP save,
store FlowKind := 1;
else ignore;
if SourcePeerAddress == (SOURCE_NETS) {
# To means 'away from SOURCE'
if DestPeerAddress == (SOURCE_NETS)
ignore; # Internal UCSD flow, ambiguous
if SourceTransType == TCP {
if SourceTransAddress == WWW &&
DestTransAddress == WWW
store FlowKind := 5; # Would be ambiguous
else if DestTransAddress == WWW
store FlowKind := 3; # Server outside UCSD
else if SourceTransAddress == WWW
store FlowKind := 4; # Server inside UCSD
}
save ToFlowOctets = 50.0.0!0 & 2.2.1!1000;
save FromFlowOctets = 50.0.0!0 & 2.2.1!1000;
# 50 buckets, PP_NO_TEST, log, 100..100k B
save FlowTime = 50.0.0!0 & 2.4.1!12000;
# 50 buckets, PP_NO_TEST, log, 10 ms .. 120 s
count;
}
set flow_stats_size;
format
FlowRuleSet FlowIndex FirstTime SourcePeerType
SourceTransType " " FlowKind
" " ToPDUs FromPDUs " " ToOctets FromOctets
" (" ToFlowOctets ") (" FromFlowOctets
") (" FlowTime
")";
Stream Lifetimes for 5 days from Fri 2 Feb 2001
Medians of 5-minute distributions.
- Bottom trace shows number of streams for each protocol
per 5-minute interval.
- UDP streams are mostly short; median <= 10 ms
- Non-WWW streams last much longer
- `Outside' web streams have median around 300 ms
- `Inside' web streams are very similar to `outside'
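The protocol classes in these traces correspond to the FlowKind values the ruleset assigns. The Python sketch below restates that decision logic; it assumes the inside host is treated as the source (swapping the two ends when needed), and the netblock list and function names are illustrative only.

```python
from ipaddress import ip_address, ip_network

# Two of the ruleset's SOURCE_NETS, in full dotted form
SOURCE_NETS = [ip_network("132.239.0.0/16"), ip_network("137.110.0.0/16")]
WWW = 80
TCP, UDP = 6, 17   # IP protocol numbers

def inside(addr):
    a = ip_address(addr)
    return any(a in net for net in SOURCE_NETS)

def flow_kind(proto, src, sport, dst, dport):
    """Return the FlowKind for a packet, or None if it would be ignored."""
    if proto not in (TCP, UDP):
        return None
    if inside(src) and inside(dst):
        return None                      # internal flow, ambiguous
    if not inside(src):
        if not inside(dst):
            return None                  # neither end inside (assumption)
        # Make the inside host the source
        src, sport, dst, dport = dst, dport, src, sport
    if proto == UDP:
        return 1
    if sport == WWW and dport == WWW:
        return 5                         # would be ambiguous
    if dport == WWW:
        return 3                         # server outside: `outside' web
    if sport == WWW:
        return 4                         # server inside: `inside' web
    return 2                             # non-web TCP
```

For example, an outside client fetching a page from an inside server on port 80 yields FlowKind 4, whichever direction the packet is travelling.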
Inbound Stream Sizes for 5 days from Fri 2 Feb 2001
Medians of 5-minute distributions.
- Most UDP streams are very small; 95% import < 500 Bytes
- Non-WWW streams are bigger; 75% <= 10 kB
- `Outside' web streams have median below 800 Bytes, but
95%-ile is higher, around 30 kB
- `Inside' web streams have median below 800 Bytes, but
95%-ile about 200 Bytes. For inside servers, inbound
packets are only carrying TCP acks for exported web objects
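Medians and 95th percentiles like these can only be read from a binned distribution to the resolution of its bins. The Python sketch below shows one way to do it, assuming log-spaced bins and reporting the upper edge of the bin in which the percentile falls; the function names and the reporting convention are assumptions, not the method used for the plots.

```python
def bin_edges(low, high, bins):
    """Upper edges of `bins` log-spaced bins covering low..high."""
    ratio = (high / low) ** (1.0 / bins)
    return [low * ratio ** (i + 1) for i in range(bins)]

def percentile_upper_bound(counts, edges, pct):
    """Upper edge of the first bin where the cumulative count reaches pct %."""
    total = sum(counts)
    if total == 0:
        return None                  # no streams in this interval
    need = total * pct / 100.0
    running = 0
    for count, edge in zip(counts, edges):
        running += count
        if running >= need:
            return edge
    return edges[-1]
```

For a 50-bin size distribution, `bin_edges(100, 100_000, 50)` gives the edges and `percentile_upper_bound(counts, edges, 50)` gives a bound on the median. A result equal to the first edge only says the median is at or below that value.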
Outbound Stream Sizes for 5 days from Fri 2 Feb 2001
Medians of 5-minute distributions.
- UDP outbound streams are even smaller than inbound ones
- Non-WWW outbound streams are also similar to inbound ones
- Web streams inbound are very similar to outbound, except that
their `import' and `export' roles are reversed
Cumulative Distributions for Streams
- Some diurnal variations are visible, together with occasional
bursty changes, but overall the distributions are
surprisingly stable
- The next three plots are cumulative distributions for
the 5-minute interval ending at 2200 (UTC) on Fri 2 Feb
Cumulative Stream Lifetime Distributions
Medians of 5-minute distributions.
- Nearly 60% of UDP streams last 10 ms or less
- TCP streams are longer-lived; only 10% of them last 10 ms
or less, and their 60th percentile is close to 1 s
- Non-WWW streams are longer-lived than web streams
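A cumulative distribution of this kind is a running sum over the bin counts for one 5-minute interval, expressed as a percentage. A minimal Python sketch, with a hypothetical function name:

```python
from itertools import accumulate

def cumulative_percent(counts):
    """Cumulative percentage of streams at or below each bin's upper edge."""
    total = sum(counts)
    if total == 0:
        return []                    # nothing measured in this interval
    return [100.0 * c / total for c in accumulate(counts)]
```

With made-up counts of `[6, 3, 1]`, the first entry is 60.0: 60% of the streams fall in the lowest bin, which for the lifetime distribution is the bin ending at 10 ms.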
Cumulative Stream Inbound Size Distributions
Medians of 5-minute distributions.
- UDP streams reach 99% at about 10 kB, but there are a few
larger ones.
- Non-WWW and `outside' web streams have similar
distribution shapes. These should be similar to
distributions of file sizes.
- `Inside' web streams show a sharp rise at 600 Bytes;
these are TCP acks from outside web clients
Conclusion
- Building measurement infrastructure is a non-trivial task.
It requires careful design and implementation so as to ensure
that the measurements provide effective support for their
users
- A clear understanding of the network topology is vital,
but it is not always easy to achieve.
- NeTraMet is a very effective measurement tool, providing
a very general way to specify flows, and a reasonable
amount of front-end (within the meter) data reduction.
However, care is needed when creating rulesets. In
particular, a ruleset should be unambiguous for all possible
source-destination pairs
- Short-term bit rate distributions are very useful for monitoring
links. NeTraMet can easily produce them for flows within a
torrnet (which is not possible using SNMP interface counters)
- We have extended NeTraMet to collect Stream Size and Lifetime
distributions. Their behaviour reinforces our earlier experiences
(most UDP and TCP streams are short-lived, etc.)
However, there is scope for more work ..
- Overall, the distributions are fairly stable
- But there are short-term bursts, which don't seem to be
correlated between the various protocols
- These distributions could provide a way to identify various
kinds of network attacks (similar to FlowScan), in real time
- Overall, NeTraMet provides a good platform on which to build
network measurement systems