Power Analysis Software
Traditionally, data collected in a research study is submitted to a significance test to assess the viability of the null hypothesis. The p-value provided by the significance test, and used to reject the null hypothesis, is a function of three factors: the larger the observed effect, the larger the sample size, and/or the more liberal the criterion required for significance (alpha), the more likely it is that the test will yield a significant p-value.
A power analysis, executed when the study is being planned, is used to anticipate the likelihood that the study will yield a significant effect and is based on the same factors as the significance test itself. Specifically, the larger the effect size used in the power analysis, the larger the sample size, and/or the more liberal the criterion required for significance (alpha), the higher the expectation that the study will yield a statistically significant effect.
These three factors, together with power, form a closed system - once any three are established, the fourth is completely determined. The goal of a power analysis is to find an appropriate balance among these factors by taking into account the substantive goals of the study, and the resources available to the researcher.
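This closed system can be illustrated with a short computation. The sketch below is a minimal Python example using the common pooled-variance normal approximation for a two-tailed test of two independent proportions (a textbook formula; the exact routine in any given power-analysis program may differ slightly). It fixes an effect size, a sample size, and alpha, and returns the power they jointly determine:

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed z-test comparing two independent
    proportions, with the variance pooled under the null hypothesis.
    (A common textbook approximation; software packages may differ slightly.)"""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)             # two-tailed critical value
    p_bar = (p1 + p2) / 2                         # pooled proportion under H0
    se_null = sqrt(2 * p_bar * (1 - p_bar))       # SE scale under the null
    se_alt = sqrt(p1 * (1 - p1) + p2 * (1 - p2))  # SE scale under the alternative
    d = abs(p2 - p1)
    return z.cdf((d * sqrt(n_per_group) - z_crit * se_null) / se_alt)

# Fix three factors and the fourth (power) is determined;
# increasing n or relaxing alpha each raises power, as the text describes.
print(power_two_proportions(0.30, 0.50, 93))              # ~0.80
print(power_two_proportions(0.30, 0.50, 150))             # larger n -> higher power
print(power_two_proportions(0.30, 0.50, 93, alpha=0.10))  # larger alpha -> higher power
```

With all three inputs fixed, power is fully determined; solving for any one of the other factors is just a matter of inverting the same relationship.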
The term "effect size" refers to the magnitude of the effect under the alternative hypothesis. The nature of the effect size will vary from one statistical procedure to the next (it could be the difference in cure rates, or a standardized mean difference, or a correlation coefficient) but its function in power analysis is the same in all procedures.
The effect size should represent the smallest effect that would be of clinical or substantive significance, and for this reason it will vary from one study to the next. In clinical trials for example, the selection of an effect size might take account of the severity of the illness being treated (a treatment effect that reduces mortality by one percent might be clinically important while a treatment effect that reduces transient asthma by 20% may be of little interest). It might take account of the existence of alternate treatments (if alternate treatments exist, a new treatment would need to surpass these other treatments to be important). It might also take account of the treatment's cost and side effects (a treatment that carried these burdens would be adopted only if the treatment effect was very substantial).
Power analysis gives power for a specific effect size. For example, the researcher might report "If the treatment increases the recovery rate by 20 percentage points the study will have power of 80% to yield a significant effect". For the same sample size and alpha, if the treatment effect is less than 20 points then power will be less than 80%. If the true effect size exceeds 20 points, then power will exceed 80%.
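This pattern can be tabulated directly. The sketch below (Python, using the same pooled-variance normal approximation found in standard two-proportion power formulas; a sample size of 93 per group and alpha = .05 two-tailed are assumed, matching the worked example in this text) shows power falling below 80% for effects smaller than 20 points and rising above it for larger effects:

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate two-tailed power for two independent proportions
    (pooled-variance normal approximation; a textbook sketch, not the
    exact routine of any particular program)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar))
    se_alt = sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return z.cdf((abs(p2 - p1) * sqrt(n_per_group) - z_crit * se_null) / se_alt)

# True treatment effects of 15, 20, 25, and 30 points above a 30% base rate:
for p2 in (0.45, 0.50, 0.55, 0.60):
    print(f"30% vs. {p2:.0%}: power = {power_two_proportions(0.30, p2, 93):.0%}")
```

Only the 20-point row lands at the stated 80%; smaller true effects yield less power, larger ones more, exactly as the paragraph above describes.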
While one might be tempted to set the "clinically significant effect" at a small value to ensure high power for even a small effect, this determination cannot be made in isolation. The selection of an effect size reflects the need for balance between the size of the effect that we can detect, and the resources available for the study.
Small effects will require a larger investment of resources than large effects. Figure 1 shows power as a function of sample size for three levels of effect size (assuming alpha, 2-tailed, is set at .05). For the smallest effect (30% vs. 40%) we would need a sample of 356 per group to yield power of 80%. For the intermediate effect (30% vs. 50%) we would need a sample of 93 per group to yield this level of power. For the largest effect (30% vs. 60%) we would need a sample of 42 per group to yield power of 80%. We may decide that it would make sense to enroll 93 patients per group to detect the intermediate effect, but inappropriate to enroll 356 patients per group to detect the smallest effect.
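The sample sizes quoted above can be reproduced with the standard pooled-variance formula for two independent proportions. The sketch below (rounding up to the next whole subject; programs that add a continuity correction will give somewhat larger values) recovers 356, 93, and 42 per group for the three effects:

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for a two-tailed test of two independent
    proportions (pooled-variance normal approximation, rounded up).
    A textbook sketch; other software may use slightly different formulas."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for two-tailed alpha
    z_beta = z.inv_cdf(power)            # z corresponding to the desired power
    p_bar = (p1 + p2) / 2                # pooled proportion under the null
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return ceil((numerator / (p1 - p2)) ** 2)

for p2 in (0.40, 0.50, 0.60):
    print(f"30% vs. {p2:.0%}: n = {n_per_group(0.30, p2)} per group")
```

The steep rise in required n as the effect shrinks is the trade-off the text describes: halving the detectable effect roughly quadruples the sample size.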
The "true" (population) effect size is not known. While the effect size in the power analysis is assumed to reflect the population effect size for the purpose of calculations, the power analysis is more appropriately expressed as "If the true effect is this large power would be ... " rather than "The true effect is this large, and therefore power is ..."
This distinction is an important one. Researchers sometimes assume that a power analysis cannot be performed in the absence of pilot data. In fact, it is usually possible to perform a power analysis based entirely on a logical assessment of what constitutes a clinically (or theoretically) important effect. Indeed, while the effect observed in prior studies might help to provide an estimate of the true effect it is not likely to be the true effect in the population - if we knew that the effect size in these studies was accurate, there would be no need to run the new study.
Since the effect size used in power analysis is not the "true" population value, the researcher may elect to present a range of power estimates. For example (assuming N=93 per group and alpha=.05, 2-tailed), "The study will have power of 80% to detect a treatment effect of 20 points (30% vs. 50%), and power of 99% to detect a treatment effect of 30 points (30% vs. 60%)".
Cohen has suggested "conventional" values for "small", "medium" and "large" effects in the social sciences. The researcher may want to use these values as a kind of reality-check, to ensure that the values he/she has specified make sense relative to these anchors. The program also allows the user to work directly with one of the conventional values rather than specifying an effect size, but it is preferable to specify an effect based on the criteria outlined above, rather than relying on conventions.
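For the difference between two proportions, Cohen's benchmarks are usually stated on the h scale, where h is the difference between arcsine-transformed proportions and h = 0.2, 0.5, and 0.8 correspond to "small", "medium", and "large". As an illustration of the reality-check idea, the sketch below computes h for the three effects used earlier in this text (the benchmarks are Cohen's; where each effect falls relative to them is for orientation only):

```python
from math import asin, sqrt

def cohens_h(p1, p2):
    """Cohen's h: the difference between arcsine-transformed proportions."""
    return abs(2 * asin(sqrt(p2)) - 2 * asin(sqrt(p1)))

# Cohen's conventional benchmarks: 0.2 (small), 0.5 (medium), 0.8 (large).
for p2 in (0.40, 0.50, 0.60):
    print(f"30% vs. {p2:.0%}: h = {cohens_h(0.30, p2):.2f}")
```

The 10-point effect comes out near the "small" benchmark and the 30-point effect between "medium" and "large", which is consistent with the sample-size gradient shown in Figure 1.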