The definition can differ substantially from one scenario to another, so what follows is offered only as a reference to help readers understand what the kappa test does; please do not treat it as a best practice.
Kappa measures the strength of agreement between the row and column variables, which typically represent the same categorical rating variable as applied by two raters to a set of subjects or items. When there is perfect agreement, all cell counts off the diagonal are 0 and kappa is 1. Kappa is zero when there is no more agreement than would be expected under independence of the row and column variables, and it is negative when agreement is worse than chance. Landis and Koch (Biometrics, 1977) give this interpretation of the range of kappa: values below 0 indicate poor agreement, 0.00 to 0.20 slight, 0.21 to 0.40 fair, 0.41 to 0.60 moderate, 0.61 to 0.80 substantial, and 0.81 to 1.00 almost perfect agreement.
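As a sketch of the calculation, Cohen's kappa can be computed directly from a square contingency table of counts using the standard formula κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement (the diagonal) and p_e is the agreement expected by chance from the row and column marginals. The function name and table layout below are illustrative, not from the original text:

```python
def cohen_kappa(table):
    """Compute Cohen's kappa from a square confusion matrix of counts.

    Rows are rater A's categories, columns are rater B's categories.
    """
    n = sum(sum(row) for row in table)           # total number of subjects
    k = len(table)
    # Observed agreement: proportion of counts on the diagonal
    p_o = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of marginal proportions, summed over categories
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Perfect agreement: all off-diagonal cells are 0, so kappa is 1
print(cohen_kappa([[5, 0], [0, 5]]))
# Partial agreement example
print(cohen_kappa([[20, 5], [10, 15]]))
```

For the second table, p_o = 35/50 = 0.7 and p_e = (25·30 + 25·20)/2500 = 0.5, giving κ = 0.4, which falls in the "fair" band of the Landis and Koch scale.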