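GiniMd(x) is a function in the R package Hmisc that computes the mean absolute difference between all pairs of elements of a numeric vector.
I first noticed this index in the output of describe(), the descriptive-statistics function, so I looked into it. The help page gives some explanation, and the key sentence is: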
This index is defined as the mean absolute difference between any two distinct elements of a vector.
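To make the quoted definition concrete, here is a minimal base-R sketch that averages |x_i - x_j| over all pairs with i ≠ j. The helper name gmd_pairwise is hypothetical (it is not part of Hmisc), and the input vector is made up for illustration.

```r
# Hypothetical helper, NOT part of Hmisc: brute-force mean absolute
# difference over all pairs of distinct positions (i != j).
gmd_pairwise <- function(x) {
  n <- length(x)
  stopifnot(is.numeric(x), n >= 2, !anyNA(x))
  d <- abs(outer(x, x, "-"))  # n x n matrix of |x_i - x_j|, zeros on the diagonal
  sum(d) / (n * (n - 1))      # divide by the number of ordered pairs with i != j
}

x <- c(1, 3, 6, 10)           # made-up example data
gmd_pairwise(x)               # pairwise gaps 2, 5, 9, 3, 7, 4 average to 5
sum(abs(outer(x, x, "-"))) / length(x)^2  # dividing by n^2 instead gives 3.75
```

One possible reason a hand calculation disagrees with the function is the denominator: the definition counts only distinct pairs, so the divisor is n(n-1) for ordered pairs (or n(n-1)/2 if each unordered pair is summed once), not n^2.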
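Although the help page gives a definition and formulas, the result never matches what I get when I compute it by hand (see my hand-worked example below). In practice one could set the formula aside and simply call the R function, but that still leaves two questions:
1) What does this "Gini mean distance" index actually mean? (Note: "Gini mean distance" is my own translation of GMD.)
2) What can the GMD index be used for?
I have thought about this repeatedly and searched for material without reaching an answer, so I am posting it here for discussion. Any pointers from those who know would be appreciated. Thanks.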
Appendix 1: The explanation of GMD in the R help documentation
Gini's Mean Difference
Description
GiniMD computes Gini's mean difference on a numeric vector. This index is defined as the mean absolute difference between any two distinct elements of a vector. For a Bernoulli (binary) variable with proportion of ones equal to p and sample size n, Gini's mean difference is 2np(1-p)/(n-1). For a trinomial variable (e.g., predicted values for a 3-level categorical predictor using two dummy variables) having (predicted) values A, B, C with corresponding proportions a, b, c, Gini's mean difference is 2n[ab|A-B|+ac|A-C|+bc|B-C|]/(n-1).
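The two closed-form expressions in the Description can be checked against a brute-force pairwise average. The sketch below is only an illustration in base R: gmd_pairwise is a hypothetical helper (not an Hmisc function), and the sample vectors are invented.

```r
# Hypothetical helper, NOT part of Hmisc: mean of |x_i - x_j| over i != j.
gmd_pairwise <- function(x) {
  n <- length(x)
  sum(abs(outer(x, x, "-"))) / (n * (n - 1))
}

# Bernoulli case: 3 ones and 7 zeros, so n = 10 and p = 0.3.
xb <- c(rep(1, 3), rep(0, 7))
n  <- length(xb)
p  <- mean(xb)
bern_closed <- 2 * n * p * (1 - p) / (n - 1)
all.equal(gmd_pairwise(xb), bern_closed)

# Trinomial case: values A, B, C occurring with proportions a, b, c.
A <- 0; B <- 1; C <- 3
xt <- c(rep(A, 2), rep(B, 3), rep(C, 5))   # a = 0.2, b = 0.3, c = 0.5
n  <- length(xt)
a  <- mean(xt == A); b <- mean(xt == B); c3 <- mean(xt == C)
tri_closed <- 2 * n * (a * b * abs(A - B) + a * c3 * abs(A - C) +
                       b * c3 * abs(B - C)) / (n - 1)
all.equal(gmd_pairwise(xt), tri_closed)
```

Both comparisons should agree up to floating-point tolerance, since the closed forms simply count how many pairs fall into each pair of categories and divide by the n(n-1)/2 unordered pairs.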
Usage
GiniMd(x, na.rm=FALSE)
Arguments
x | a numeric vector (for GiniMd) |
na.rm | set to TRUE if you suspect there may be NAs in x; these will then be removed. Otherwise an error will result. |
Value
a scalar numeric
Appendix 2: A hand-worked example of the GMD index