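The standard procedure for testing a power law distribution is: first estimate the parameters by maximum likelihood, then run a bootstrap analysis, then perform a K-S test, and finally test against real alternatives.
The empirical data on page 121 of the paper "Bibliometrics or Informetrics....." serve as an example: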
Lotka's Law : y = c*x^(-beta)
Here
y = [709,92,24,13,6,10]
x = [1,2,3,4,5,6]
1. Estimate c and beta with the S-PLUS nonlinear least-squares algorithm
x <- c(1,2,3,4,5,6)
y <- c(709,92,24,13,6,10)
Table2.data <- data.frame(x,y)
fit <- nls(y ~ c * x^(-beta), data = Table2.data, start = list(c = 500, beta = 2))
fit
The output is as follows:
Residual sum of squares : 56.14811
parameters:
c beta
709.0087 2.957453
formula: y ~ c * x^( - (beta))
6 observations
These estimates are close to the c and beta values reported by the authors.
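The fit above uses nonlinear least squares, while the procedure described at the start names maximum likelihood as the estimation step. A minimal base-R sketch of that step on the same data is given here for comparison. It rests on assumptions that are not in the source: the support is truncated to 1..6, the last class is treated as exactly 6, and the search interval for beta is chosen arbitrarily. The names loglik and beta_mle are illustrative.

```r
# Data from the example: x = class value, y = observed frequency
x <- c(1, 2, 3, 4, 5, 6)
y <- c(709, 92, 24, 13, 6, 10)

# Log-likelihood of a discrete power law truncated to the support 1..6
# (assumption: the last class is exactly 6, not "6 or more")
loglik <- function(beta) {
  norm_const <- sum(x^(-beta))   # normalizing constant over 1..6
  sum(y * (-beta * log(x) - log(norm_const)))
}

# Maximize over a plausible range for beta (the interval is an assumption)
mle <- optimize(loglik, interval = c(1.01, 6), maximum = TRUE)
beta_mle <- mle$maximum
beta_mle
```

Only beta is estimated here, because under a normalized distribution the constant is fixed by the requirement that the probabilities sum to 1.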
2. Use the estimated c and beta to compute the fitted values f'(Lotka)
f'(Lotka) = [709.04,90.89,27.33,11.65,6.01,8.04]
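Step 2 amounts to evaluating the fitted curve y = c*x^(-beta) at each x. A minimal sketch, plugging in the estimates printed in the nls output (the names c_hat, beta_hat and f_fit are illustrative). Values obtained this way need not coincide exactly with the list above, which may be based on the authors' own estimates rather than on this fit.

```r
x <- 1:6
c_hat    <- 709.0087   # estimate of c from the nls output
beta_hat <- 2.957453   # estimate of beta from the nls output

# Fitted frequencies under Lotka's law, y = c * x^(-beta)
f_fit <- c_hat * x^(-beta_hat)
round(f_fit, 2)
```

If the fit object from step 1 is still in the workspace, predict(fit) should return the same fitted values.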
3. Use the observed frequencies f(x) to build the empirical cumulative distribution S(x),
and use f'(Lotka) to build the hypothesized cumulative distribution F*(x)
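4. Compute the KS test statistic.
The KS test statistic is the maximum of the absolute differences |F*(x)-S(x)|.
Across the six data points in this example, that maximum is 0.00265,
which is why the authors report K-S = 0.0027 in Table 2 on page 121.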
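Steps 3 and 4 can be sketched in a few lines of R. The source does not say how the cumulative sums are normalized; the sketch assumes both are divided by the total observed count, 854, and under that assumption the maximum difference occurs at x = 3 and comes out at about 0.00265, in line with the figure quoted above. The names f_fit, S, F_star and D are illustrative.

```r
y     <- c(709, 92, 24, 13, 6, 10)                    # observed frequencies f(x)
f_fit <- c(709.04, 90.89, 27.33, 11.65, 6.01, 8.04)   # fitted values from step 2

n <- sum(y)                    # total observed count (854)

S      <- cumsum(y) / n        # empirical cumulative distribution S(x)
F_star <- cumsum(f_fit) / n    # hypothesized cumulative distribution F*(x)
                               # (assumption: same denominator as S)

D <- max(abs(F_star - S))      # KS test statistic
D
```

Dividing the fitted cumulative sums by their own total instead would give a different maximum, so the choice of denominator matters when comparing with the published value.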
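# Zipf distribution check in R; dzipf() and pzipf() are provided by the VGAM package
library(VGAM)
N = 10; s = 0.5; y = 1:N
proby = dzipf(y, N = N, s = s)   # probability mass at y = 1..N
plot(proby ~ y, type = "h", col = "blue", ylab = "Probability",
     ylim = c(0, 0.2), main = paste("Zipf(N = ",N,", s = ",s,")", sep = ""),
     lwd = 2, las = 1)
sum(proby)   # the probabilities should sum to 1
# Largest gap between the cumulated pmf and the cdf; should be essentially 0
max(abs(cumsum(proby) - pzipf(y, N = N, s = s)))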