(underlying motivation: does every stub AS really needs its own AS number?)
Definition: leaf AS: AS node with outdegree 0 and indegree 1.
Definition: stub AS : either leaf AS
or an AS that becomes a leaf after removal of leaves
in previous stripping cycle.
As of 29 November 2000, the Oregon Route Views BGP tables 2000 contained 9,443 different ASes and 6 AS sets. AS in these sets were also present as separate ASes elsewhere in the table. We will call these 9,449 nodes AS nodes. 58 of these ASes did not originate any prefixes, serving only as transit ASes.
Of the remaining 9385 ASes, 9257 had at least one prefix for which this AS was a single origin AS. 128 ASes originated prefixes only as multiple origin ASes, i.e. together with some other AS, and were not the single origin AS for any prefix. The total number of different combinations of ASes in multiple origin groups was 607. The number of prefixes advertised by multiple origin groups of ASes was 1186 out of a total of 97,688 prefixes. The largest number of prefixes advertised by a multiple origin group was 57. The total number of single and multiple origin groups is 9864 (9257+607). We will call these origins.
Multiple origin groups are essentially sets of ASes (a pseudo-AS) who announce prefixes in common (among prefixes that they also exclusively announce). We do not view them as leaves and rather group then with non-stub ASes in the analysis below.
There were 3387 leaf (outdegree 0 and indegree 1) AS nodes in the AS graph obtained from U Oregon Route Views tables, where each pair of consecutive ASes in a path generates a link. After removing these 3387 leaves, 69 nodes become leaves. After one more iteration removing these 69 nodes, one created leaf node remains. This set of 3457 nodes will be called stub AS nodes. The remaining 5992 nodes are non-stub ASes.
3348 stub ASes were present as single origins. 109 stub AS were present only in multiple origin groups. The total number of origins that are not stub AS nodes is 6516.
The following plot shows the size distribution, i.e. the number of prefixes associated with each of 9864 origins. The first plot shows the fraction of origins that advertized more than a given number of prefixes, starting from 1. The main part of the curve fits closely to a Weibull distribution 8 exp(-(x/0.01)^0.2). In addition to the quite good fit to Weibull that we have seen with several other granularities of Internet topology data (e.g., IP, prefix, and AS nodal degree), this data follows Weibull distribution astoundingly closely even at the lowest X-values.
Only five points with the largest number of prefixes (of which the picture shows four) deviate from the Weibull. This is a common phenomenon, caused by the fact that the ccdf is statistically less reliable in this part of the plot due to the large gaps between and low occurrence probability of those data values. Essentially, ccdf gets truncated (turns to 0) at the largest data value, where an approximating function has nonzero value. This difference makes the values of ccdf approaching maximum X to be much smaller than the value predicted by the formula, since subtracting a constant makes for much larger relative error here.
Yet another difference from other Internet size distributions is that the shape parameter (power) for IP-related quantities is normally between 0.28 and 0.35. On the other hand, for the number of IP addresses observed by skitter within an AS, the shape parameter is 0.24, which is closer to the 0.2 obtained in Figure 1. This discrepancy suggests that AS-related quantities have fatter tails, i.e. larger values are more probable, than those related to prefixes and IPs. This gap is consistent with the range of variability in size and organizational strength of existing ASes in the Internet.
Figure 1
The second plot shows the numbers of origins in stub
and non-stub portion of the AS graph.
It is natural that a stub AS generally advertizes
fewer prefixes than non-stub ASes. For example,
the maximum number of prefixes advertized by
a stub AS is 313, whereas for a non-stub AS
it is 2325 (although the second largest is just 986.)
About 24% of stub ASes advertise more than 3 prefixes.
For non-stub ASes, this number is 45%.
The gap between cumulative counts becomes wider after
about 40 prefixes. Figure 2 also
shows non-cumulative frequencies of both types of AS nodes.
It is no surprise that the influence of the stub AS is almost lost in the overall distribution of Figure 1, and that their distribution curve is much more irregular. In particular, it does not appear to follow Weibull distribution as closely as two other curves do. The shape parameter of the Weibull distribution for non-stub ASes is 0.21, which is compellingly close to 0.20 for the overall curve.
Figure 2
The numerical data for stub and non-stub AS is shown in the following table:
#prefixes #nsAS Sum cdf: P(<=X) #rem.AS ccdf: P(>X) #Stubs Sum cdf: P(<=X) rem.st ccdf: P(>X)
1 1967 1967 0.301872314 4549 0.698127686 1694 1694 0.505973716 1654 0.494026
2 995 2962 0.454573358 3554 0.545426642 585 2279 0.680704898 1069 0.319295
3 597 3559 0.546193984 2957 0.453806016 274 2553 0.762544803 795 0.237455
4 423 3982 0.611111111 2534 0.388888889 157 2710 0.809438471 638 0.190562
5 350 4332 0.664825046 2184 0.335174954 109 2819 0.841995221 529 0.158005
6 253 4585 0.703652548 1931 0.296347452 96 2915 0.870669056 433 0.129331
7 208 4793 0.735573972 1723 0.264426028 52 2967 0.886200717 381 0.113799
8 187 4980 0.764272560 1536 0.235727440 50 3017 0.901135006 331 0.098865
9 158 5138 0.788520565 1378 0.211479435 56 3073 0.917861410 275 0.082139
10 121 5259 0.807090239 1257 0.192909761 32 3105 0.927419355 243 0.072581
............................................................................................
313 0 6484 0.995089012 32 0.004910988 1 3348 1.000000000 0 0.000000
............................................................................................
616 1 6507 0.998618785 9 0.001381215 0 3348 1.000000000 0 0.000000
678 1 6508 0.998772253 8 0.001227747 0 3348 1.000000000 0 0.000000
699 1 6509 0.998925721 7 0.001074279 0 3348 1.000000000 0 0.000000
802 1 6510 0.999079190 6 0.000920810 0 3348 1.000000000 0 0.000000
815 1 6511 0.999232658 5 0.000767342 0 3348 1.000000000 0 0.000000
843 1 6512 0.999386126 4 0.000613874 0 3348 1.000000000 0 0.000000
905 1 6513 0.999539595 3 0.000460405 0 3348 1.000000000 0 0.000000
959 1 6514 0.999693063 2 0.000306937 0 3348 1.000000000 0 0.000000
986 1 6515 0.999846532 1 0.000153468 0 3348 1.000000000 0 0.000000
2325 1 6516 1.000000000 0 0.000000000 0 3348 1.000000000 0 0.000000
P.S. The reason for ubiquity of Weibull is that is is an extreme value distribution, i.e. a distribution which is likely to be followed by a minimum of many positive random variables, as shown by Gumbel in 1958. (The result of Gumbel is proved for the variables which have a common continuous and invertible distribution [NIST ESH].)
Quantities that measure intrinsic strength of an object (such as node outdegree or a number of observable IPs in a prefix) reflect a minimum of many constraints that influence object size (existence time, equipment, placement etc.) It is known that sums of random variables often follow a Gaussian distribution even when the conditions of Central Limit Theorem are not fully satisfied. Weibull may likewise be followed by minima even in cases where Gumbel's result does not apply.
On the other hand, quantites that depend upon extrinsic or global properties of the network (e.g. average shortest path length from a node to other nodes) tend to follow other distributions than Weibull with c<1 (exponential in case of path length.)
andre broido