Question: of the `stub AS's', how many inject only few (<10) network prefixes?

(underlying motivation: does every stub AS really needs its own AS number?)

Definition: leaf AS: AS node with outdegree 0 and indegree 1.
Definition: stub AS : either leaf AS or an AS that becomes a leaf after removal of leaves in previous stripping cycle.

As of 29 November 2000, the Oregon Route Views BGP tables 2000 contained 9,443 different ASes and 6 AS sets. AS in these sets were also present as separate ASes elsewhere in the table. We will call these 9,449 nodes AS nodes. 58 of these ASes did not originate any prefixes, serving only as transit ASes.

Of the remaining 9385 ASes, 9257 had at least one prefix for which this AS was a single origin AS. 128 ASes originated prefixes only as multiple origin ASes, i.e. together with some other AS, and were not the single origin AS for any prefix. The total number of different combinations of ASes in multiple origin groups was 607. The number of prefixes advertised by multiple origin groups of ASes was 1186 out of a total of 97,688 prefixes. The largest number of prefixes advertised by a multiple origin group was 57. The total number of single and multiple origin groups is 9864 (9257+607). We will call these origins.

Multiple origin groups are essentially sets of ASes (a pseudo-AS) who announce prefixes in common (among prefixes that they also exclusively announce). We do not view them as leaves and rather group then with non-stub ASes in the analysis below.

There were 3387 leaf (outdegree 0 and indegree 1) AS nodes in the AS graph obtained from U Oregon Route Views tables, where each pair of consecutive ASes in a path generates a link. After removing these 3387 leaves, 69 nodes become leaves. After one more iteration removing these 69 nodes, one created leaf node remains. This set of 3457 nodes will be called stub AS nodes. The remaining 5992 nodes are non-stub ASes.

3348 stub ASes were present as single origins. 109 stub AS were present only in multiple origin groups. The total number of origins that are not stub AS nodes is 6516.

The following plot shows the size distribution, i.e. the number of prefixes associated with each of 9864 origins. The first plot shows the fraction of origins that advertized more than a given number of prefixes, starting from 1. The main part of the curve fits closely to a Weibull distribution 8 exp(-(x/0.01)^0.2). In addition to the quite good fit to Weibull that we have seen with several other granularities of Internet topology data (e.g., IP, prefix, and AS nodal degree), this data follows Weibull distribution astoundingly closely even at the lowest X-values.

Only five points with the largest number of prefixes (of which the picture shows four) deviate from the Weibull. This is a common phenomenon, caused by the fact that the ccdf is statistically less reliable in this part of the plot due to the large gaps between and low occurrence probability of those data values. Essentially, ccdf gets truncated (turns to 0) at the largest data value, where an approximating function has nonzero value. This difference makes the values of ccdf approaching maximum X to be much smaller than the value predicted by the formula, since subtracting a constant makes for much larger relative error here.

Yet another difference from other Internet size distributions is that the shape parameter (power) for IP-related quantities is normally between 0.28 and 0.35. On the other hand, for the number of IP addresses observed by skitter within an AS, the shape parameter is 0.24, which is closer to the 0.2 obtained in Figure 1. This discrepancy suggests that AS-related quantities have fatter tails, i.e. larger values are more probable, than those related to prefixes and IPs. This gap is consistent with the range of variability in size and organizational strength of existing ASes in the Internet.

Figure 1

The second plot shows the numbers of origins in stub and non-stub portion of the AS graph. It is natural that a stub AS generally advertizes fewer prefixes than non-stub ASes. For example, the maximum number of prefixes advertized by a stub AS is 313, whereas for a non-stub AS it is 2325 (although the second largest is just 986.) About 24% of stub ASes advertise more than 3 prefixes. For non-stub ASes, this number is 45%. The gap between cumulative counts becomes wider after about 40 prefixes. Figure 2 also shows non-cumulative frequencies of both types of AS nodes.

It is no surprise that the influence of the stub AS is almost lost in the overall distribution of Figure 1, and that their distribution curve is much more irregular. In particular, it does not appear to follow Weibull distribution as closely as two other curves do. The shape parameter of the Weibull distribution for non-stub ASes is 0.21, which is compellingly close to 0.20 for the overall curve.

Figure 2

The numerical data for stub and non-stub AS is shown in the following table:

#prefixes #nsAS Sum  cdf: P(<=X) #rem.AS ccdf: P(>X) #Stubs Sum cdf: P(<=X) rem.st ccdf: P(>X)
       1  1967  1967 0.301872314    4549 0.698127686  1694  1694 0.505973716  1654  0.494026
       2   995  2962 0.454573358    3554 0.545426642   585  2279 0.680704898  1069  0.319295
       3   597  3559 0.546193984    2957 0.453806016   274  2553 0.762544803   795  0.237455
       4   423  3982 0.611111111    2534 0.388888889   157  2710 0.809438471   638  0.190562
       5   350  4332 0.664825046    2184 0.335174954   109  2819 0.841995221   529  0.158005
       6   253  4585 0.703652548    1931 0.296347452    96  2915 0.870669056   433  0.129331
       7   208  4793 0.735573972    1723 0.264426028    52  2967 0.886200717   381  0.113799
       8   187  4980 0.764272560    1536 0.235727440    50  3017 0.901135006   331  0.098865
       9   158  5138 0.788520565    1378 0.211479435    56  3073 0.917861410   275  0.082139
      10   121  5259 0.807090239    1257 0.192909761    32  3105 0.927419355   243  0.072581
............................................................................................
     313     0  6484 0.995089012      32 0.004910988     1  3348 1.000000000     0  0.000000
............................................................................................
     616     1  6507 0.998618785       9 0.001381215     0  3348 1.000000000     0  0.000000
     678     1  6508 0.998772253       8 0.001227747     0  3348 1.000000000     0  0.000000
     699     1  6509 0.998925721       7 0.001074279     0  3348 1.000000000     0  0.000000
     802     1  6510 0.999079190       6 0.000920810     0  3348 1.000000000     0  0.000000
     815     1  6511 0.999232658       5 0.000767342     0  3348 1.000000000     0  0.000000
     843     1  6512 0.999386126       4 0.000613874     0  3348 1.000000000     0  0.000000
     905     1  6513 0.999539595       3 0.000460405     0  3348 1.000000000     0  0.000000
     959     1  6514 0.999693063       2 0.000306937     0  3348 1.000000000     0  0.000000
     986     1  6515 0.999846532       1 0.000153468     0  3348 1.000000000     0  0.000000
    2325     1  6516 1.000000000       0 0.000000000     0  3348 1.000000000     0  0.000000

P.S. The reason for ubiquity of Weibull is that is is an extreme value distribution, i.e. a distribution which is likely to be followed by a minimum of many positive random variables, as shown by Gumbel in 1958. (The result of Gumbel is proved for the variables which have a common continuous and invertible distribution [NIST ESH].)

Quantities that measure intrinsic strength of an object (such as node outdegree or a number of observable IPs in a prefix) reflect a minimum of many constraints that influence object size (existence time, equipment, placement etc.) It is known that sums of random variables often follow a Gaussian distribution even when the conditions of Central Limit Theorem are not fully satisfied. Weibull may likewise be followed by minima even in cases where Gumbel's result does not apply.

On the other hand, quantites that depend upon extrinsic or global properties of the network (e.g. average shortest path length from a node to other nodes) tend to follow other distributions than Weibull with c<1 (exponential in case of path length.)

References

[NIST ESH] . Extreme value distributions. In: Engineering Statistics Handbook, NIST.


you also might like ISP Tiers analysis

andre broido