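Abstract (translated):
Statistical hypothesis tests are a cornerstone of scientific research. The tests are informative when their size is properly controlled, so that the frequency of rejecting a true null hypothesis (type I error) stays below a prespecified nominal level. Publication bias, however, inflates test size. Because scientists can typically publish only results that reject the null hypothesis, they have an incentive to keep conducting studies until they obtain a rejection. Such $p$-hacking takes many forms, from collecting additional data to examining multiple regression specifications, all in search of statistical significance. The process inflates test size above its nominal level because the critical values used to determine rejection assume that the test statistic is constructed from a single study, abstracting from $p$-hacking. This paper addresses the problem by constructing critical values that are compatible with scientists' behavior. We assume that researchers conduct studies until they find a test statistic that exceeds the critical value, or until the benefit of conducting an additional study falls below its cost. We then solve for the incentive-compatible critical value (ICCV). When the ICCV is used to determine rejection, readers can be confident that size is controlled at the desired significance level and that the researcher's response to the incentives delineated by the critical value is accounted for. Because they allow researchers to search for significance across multiple studies, ICCVs are larger than classical critical values. Yet for a broad range of researcher behaviors and beliefs, ICCVs lie in a fairly narrow range.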
---
English title:
Incentive-Compatible Critical Values
---
Authors:
Adam McCloskey, Pascal Michaillat
---
Year of latest submission:
2020
---
Classification:
Top-level category: Economics
Subcategory: Econometrics
Category description: Econometric Theory, Micro-Econometrics, Macro-Econometrics, Empirical Content of Economic Relations discovered via New Methods, Methodological Aspects of the Application of Statistical Inference to Economic Data.
--
Top-level category: Statistics
Subcategory: Methodology
Category description: Design, Surveys, Model Selection, Multiple Testing, Multivariate Methods, Signal and Image Processing, Time Series, Smoothing, Spatial Statistics, Survival Analysis, Nonparametric and Semiparametric Methods
--
---
Abstract (original English):
Statistical hypothesis tests are a cornerstone of scientific research. The tests are informative when their size is properly controlled, so the frequency of rejecting true null hypotheses (type I error) stays below a prespecified nominal level. Publication bias exaggerates test sizes, however. Since scientists can typically only publish results that reject the null hypothesis, they have the incentive to continue conducting studies until attaining rejection. Such $p$-hacking takes many forms: from collecting additional data to examining multiple regression specifications, all in the search of statistical significance. The process inflates test sizes above their nominal levels because the critical values used to determine rejection assume that test statistics are constructed from a single study---abstracting from $p$-hacking. This paper addresses the problem by constructing critical values that are compatible with scientists' behavior given their incentives. We assume that researchers conduct studies until finding a test statistic that exceeds the critical value, or until the benefit from conducting an extra study falls below the cost. We then solve for the incentive-compatible critical value (ICCV). When the ICCV is used to determine rejection, readers can be confident that size is controlled at the desired significance level, and that the researcher's response to the incentives delineated by the critical value is accounted for. Since they allow researchers to search for significance among multiple studies, ICCVs are larger than classical critical values. Yet, for a broad range of researcher behaviors and beliefs, ICCVs lie in a fairly narrow range.
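The size inflation described in the abstract can be illustrated with a minimal sketch under a much simpler assumption than the paper's model: the researcher runs at most a fixed number of independent studies and stops at the first rejection. The function names `inflated_size` and `adjusted_critical_value` and the fixed study cap `max_studies` are hypothetical and introduced only for illustration. The paper instead derives the stopping point from the researcher's costs and benefits, so the adjusted value below is not the ICCV itself, only an analogue of the same idea.

```python
from statistics import NormalDist


def inflated_size(alpha: float, max_studies: int) -> float:
    """Chance of at least one rejection under a true null.

    Assumption (not from the paper): up to `max_studies` independent
    studies are run, each tested at level `alpha`.
    """
    return 1.0 - (1.0 - alpha) ** max_studies


def adjusted_critical_value(alpha: float, max_studies: int) -> float:
    """Two-sided z critical value that keeps overall size at `alpha`.

    Assumption (not from the paper): the number of studies is capped
    at `max_studies` and the test statistics are independent standard
    normals under the null.
    """
    # Per-study level that makes the overall rejection rate equal alpha.
    per_study = 1.0 - (1.0 - alpha) ** (1.0 / max_studies)
    return NormalDist().inv_cdf(1.0 - per_study / 2.0)


if __name__ == "__main__":
    alpha = 0.05
    for n in (1, 2, 3, 5):
        size = inflated_size(alpha, n)
        cv = adjusted_critical_value(alpha, n)
        print(f"studies={n}  size at classical cv={size:.4f}  adjusted cv={cv:.3f}")
```

With a single study the adjusted value coincides with the classical two-sided critical value; as the cap on studies grows, the adjusted value rises, which mirrors the abstract's statement that ICCVs are larger than classical critical values.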
---
PDF link:
https://arxiv.org/pdf/2005.04141