[Original] What is the difference between Monte Carlo simulation and bootstrap simulation?


2007-10-27

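Hello everyone! Recently I have been reading papers on bootstrap simulation methods. I often see that, after a bootstrap method has been introduced, Monte Carlo simulation is used to evaluate it. One paper also seemed to say that bootstrap simulation is a kind of Monte Carlo simulation. Specifically, what is the difference between Monte Carlo simulation and the bootstrap method, and how are the two related? I still have many questions and would be grateful for any guidance.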

Thanks in advance!

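I also hope that colleagues who work on the bootstrap will join the discussion; I am currently preparing my PhD thesis proposal. I also have quite a few bootstrap papers and am happy to share them with anyone interested.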

[This post was edited by the author at 2007-10-27 11:16:00]


All replies

xuelida

2007-10-27 16:24:00

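Monte Carlo simulation is a method in which one specifies a stochastic process (a data-generating process), repeatedly generates time series from it, computes parameter estimates and statistics for each series, and then studies the distributional properties of those estimates and statistics. The name comes from Monte Carlo in Monaco, Europe, a city famous for its casino. The term is said to have been proposed by Metropolis in 1949. Had it been coined somewhat later, Monte Carlo simulation might instead have been called Las Vegas simulation (after Las Vegas, the famous gambling city in Nevada, USA).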
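A minimal sketch of the procedure just described, assuming a stationary AR(1) data-generating process y_t = rho * y_{t-1} + e_t with e_t ~ N(0, 1) and a true rho of 0.5 (both are illustrative choices, not from the post). The series is regenerated many times, rho is re-estimated by OLS each time, and the collection of estimates is the simulated sampling distribution of the estimator. The function names are hypothetical.

```python
import random
import statistics


def simulate_ar1(rho, n, rng):
    """One series from the assumed DGP y_t = rho * y_{t-1} + e_t, e_t ~ N(0, 1), y_0 = 0."""
    y, prev = [], 0.0
    for _ in range(n):
        prev = rho * prev + rng.gauss(0.0, 1.0)
        y.append(prev)
    return y


def ols_rho(y):
    """OLS estimate of rho from regressing y_t on y_{t-1} (no intercept)."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den


def monte_carlo_rho(rho=0.5, n=50, reps=2000, seed=1):
    """Regenerate data from the known DGP `reps` times and re-estimate rho each time."""
    rng = random.Random(seed)
    return [ols_rho(simulate_ar1(rho, n, rng)) for _ in range(reps)]


est = monte_carlo_rho()
print("mean of estimates:", statistics.mean(est))
print("std of estimates: ", statistics.stdev(est))
```

Comparing the mean of `est` with the true value 0.5 measures the small-sample bias of the OLS estimator, and the standard deviation measures its sampling variability; rerunning with other `n` values shows how both change with sample size.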

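Bootstrap simulation is related to Monte Carlo simulation but is not the same thing. The term "bootstrap" (in Chinese also rendered as 靴襻, literally "boot strap") was introduced by Efron in 1979. It comes from a children's story in which a person who has fallen into the water tries to save himself by pulling on his own bootstraps. The method developed rapidly in the 1980s and 1990s. The bootstrap repeatedly draws samples, with replacement, from the observed sample (which stands in for the unknown population), uses them to compute parameter estimates, confidence intervals, or test statistics, and estimates the distributions of these quantities. What is described here is far from the full picture of bootstrap simulation; it covers only its application to parameter estimation.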
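A minimal nonparametric bootstrap sketch, assuming the statistic of interest is the median (chosen because its standard error has no simple closed form) and using a made-up sample. Each bootstrap sample is drawn with replacement from the observed data, and the spread of the recomputed medians estimates the standard error. `bootstrap_replicates` and `percentile_ci` are hypothetical helper names.

```python
import random
import statistics


def bootstrap_replicates(data, stat, B=2000, seed=0):
    """Resample the observed data with replacement B times; recompute `stat` each time."""
    rng = random.Random(seed)
    n = len(data)
    return [stat(rng.choices(data, k=n)) for _ in range(B)]


def percentile_ci(reps, level=0.95):
    """Percentile interval: cut (1 - level)/2 of the sorted replicates from each tail."""
    s = sorted(reps)
    k = int(round((1 - level) / 2 * len(s)))
    return s[k], s[len(s) - 1 - k]


# Hypothetical observed sample (made-up numbers).
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 3.9, 4.7, 2.5, 3.0]
reps = bootstrap_replicates(sample, statistics.median)
se = statistics.stdev(reps)
ci = percentile_ci(reps)
print("median:", statistics.median(sample), "bootstrap SE:", round(se, 3), "95% CI:", ci)
```

Note that no distribution is ever specified: the observed sample plays the role of the population.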

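Both Monte Carlo simulation and bootstrap simulation begin by specifying a data-generating process, and the key to doing so is generating a large number of random numbers. For example, to simulate the distribution of the DF statistic for a stochastic-trend process with sample size 100 over 10,000 replications, generating the series alone takes 100 × 10,000 = 1,000,000 random numbers, and more if extra start-up observations are generated and discarded.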
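The DF example can be sketched as follows, assuming the simplest DF regression dy_t = rho * y_{t-1} + e_t with no constant or trend, and a driftless random walk started at y_0 = 0 (the post does not say which DF variant it means). The counter confirms that each replication of length 100 consumes 100 standard-normal draws. Function names are hypothetical.

```python
import random


def df_stat(y):
    """DF t-statistic for rho in  dy_t = rho * y_{t-1} + e_t  (no constant, no trend)."""
    x = y[:-1]
    dy = [b - a for a, b in zip(y[:-1], y[1:])]
    sxx = sum(v * v for v in x)
    rho = sum(a * b for a, b in zip(x, dy)) / sxx
    s2 = sum((b - rho * a) ** 2 for a, b in zip(x, dy)) / (len(dy) - 1)
    return rho / (s2 / sxx) ** 0.5


def simulate_df(n=100, reps=10_000, seed=2007):
    """Simulate the DF statistic under a driftless random walk; also count N(0,1) draws."""
    rng = random.Random(seed)
    stats, draws = [], 0
    for _ in range(reps):
        y = [0.0]                                   # y_0 = 0
        for _ in range(n):
            y.append(y[-1] + rng.gauss(0.0, 1.0))   # one N(0,1) draw per observation
            draws += 1
        stats.append(df_stat(y))
    return sorted(stats), draws


stats, draws = simulate_df()
q05 = stats[int(0.05 * len(stats))]
print("random numbers drawn:", draws)
print("simulated 5% quantile of the DF statistic:", q05)
```

The sorted statistics form the simulated null distribution, and `q05` is the simulated 5% critical value, which can be compared with the published Dickey-Fuller tables for the no-constant case.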

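Random numbers generated by a computer are not "truly random"; they are numbers constructed to share the statistical properties of a target distribution. In econometric Monte Carlo simulation (and in the parametric bootstrap) the random numbers are usually N(0,1) draws, whereas the nonparametric bootstrap uses uniformly distributed random indices to pick which observations to resample. Computer-generated random numbers are called pseudo-random numbers, and the program that generates them is called a pseudo-random number generator. A deterministic algorithm cannot, in fact, produce truly random numbers.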
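To make the "pseudo" in pseudo-random concrete, here is a toy generator: a linear congruential generator (using the commonly cited Numerical Recipes constants, an assumption for illustration) feeding the Box-Muller transform to produce N(0,1) draws. Because every output is a deterministic function of the seed, two generators with the same seed produce exactly the same sequence. This is illustrative only; real simulations should use a well-tested library generator.

```python
import math


class LCG:
    """Toy linear congruential generator x_{k+1} = (a * x_k + c) mod m (illustration only)."""

    def __init__(self, seed):
        self.m, self.a, self.c = 2**32, 1664525, 1013904223
        self.state = seed % self.m

    def uniform(self):
        self.state = (self.a * self.state + self.c) % self.m
        return (self.state + 0.5) / self.m   # strictly inside (0, 1), so log() below is safe

    def normal(self):
        """One N(0,1) draw via the Box-Muller transform."""
        u1, u2 = self.uniform(), self.uniform()
        return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


g1, g2 = LCG(seed=12345), LCG(seed=12345)
first = [g1.normal() for _ in range(5)]
again = [g2.normal() for _ in range(5)]
print(first == again)   # same seed -> identical "random" sequence: True
```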

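Monte Carlo simulation is usually run under a range of conditions: for example, sample sizes of 50, 100, 200, and so on, and sometimes several model specifications, so as to study the distributions of parameter estimators and test statistics under each condition. When results are needed only for those particular conditions, it is enough to record them. When results are needed for many conditions, the usual approach is to estimate a response surface function. For example, Dickey and Fuller's DF tables give the distribution of the DF statistic only at sample sizes 25, 50, 100, 250, and 500. Simulating the DF distribution separately for every sample size between 25 and 500 would be impractical and unnecessary. Instead, the DF percentiles obtained at those few sample sizes can be treated as data points and a regression fitted to them, which then gives the DF percentile for any sample size. This fitted regression is called a response surface function. MacKinnon's critical-value tables for cointegration tests were obtained in this way.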
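A sketch of the response-surface step, assuming the functional form C(T) = b_inf + b1/T + b2/T^2, which is the form used in MacKinnon's 1991 cointegration response surfaces (other forms are possible). The inputs below are hypothetical quantiles generated from made-up coefficients, used only to check that the least-squares fit recovers them; in practice they would be simulated percentiles at each sample size. Function names are hypothetical.

```python
def fit_response_surface(points, scale=100.0):
    """Least-squares fit of q(T) = b_inf + b1/T + b2/T**2 to (T, quantile) points.

    Regressors are rescaled to s = scale/T so the normal equations stay well
    conditioned; the coefficients are mapped back to the 1/T scale at the end."""
    rows = [[1.0, scale / T, (scale / T) ** 2] for T, _ in points]
    ys = [q for _, q in points]
    # Augmented normal equations [X'X | X'y] as a 3x4 matrix.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    g = [0.0] * 3
    for i in (2, 1, 0):   # back substitution
        g[i] = (A[i][3] - sum(A[i][j] * g[j] for j in range(i + 1, 3))) / A[i][i]
    return g[0], g[1] * scale, g[2] * scale ** 2


def response_surface(T, coefs):
    b_inf, b1, b2 = coefs
    return b_inf + b1 / T + b2 / T ** 2


# Made-up coefficients, only to generate test points at the tabulated sample sizes.
true = (-2.9, -3.0, -10.0)
pts = [(T, response_surface(T, true)) for T in (25, 50, 100, 250, 500)]
coefs = fit_response_surface(pts)
print(coefs)
print(response_surface(75, coefs))   # interpolated quantile at an untabulated T
```

Once the coefficients are estimated, `response_surface` gives a percentile for any sample size, including ones that were never simulated.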


hanszhu

2007-10-27 23:20:00

Bootstrapping is a modern, computer-intensive, general-purpose approach to statistical inference, falling within a broader class of resampling methods. Bootstrapping is the practice of estimating properties of an estimator (such as its variance) by measuring those properties when sampling from an approximating distribution. One standard choice for an approximating distribution is the empirical distribution of the observed data. It may also be used for constructing hypothesis tests. It is often used as an alternative to inference based on parametric assumptions when those assumptions are in doubt, or where parametric inference is impossible or requires very complicated formulas for the calculation of standard errors.

The advantage of bootstrapping over analytical methods is its great simplicity: it is straightforward to apply the bootstrap to derive estimates of standard errors and confidence intervals for complex estimators of complex parameters of the distribution, such as percentile points, proportions, odds ratios, and correlation coefficients. The disadvantage of bootstrapping is that while (under some conditions) it is asymptotically consistent, it does not provide general finite-sample guarantees, and it has a tendency to be overly optimistic.

[This post was edited by the author at 2007-10-27 23:33:46]
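As a sketch of the correlation-coefficient case mentioned above, assuming a percentile interval and made-up paired data: (x, y) pairs are resampled together so the dependence between them is preserved, and the 2.5% and 97.5% quantiles of the resampled correlations form the interval. Function names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_corr_ci(xs, ys, B=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the correlation, resampling (x, y) pairs jointly."""
    rng = random.Random(seed)
    n = len(xs)
    reps = []
    while len(reps) < B:
        idx = [rng.randrange(n) for _ in range(n)]
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        if len(set(bx)) > 1 and len(set(by)) > 1:   # skip resamples with zero variance
            reps.append(pearson(bx, by))
    reps.sort()
    k = int(round((1 - level) / 2 * B))
    return reps[k], reps[B - 1 - k]


# Hypothetical paired observations (made-up numbers).
xs = [1.2, 2.0, 2.9, 3.1, 4.4, 4.8, 5.5, 6.1, 6.9, 7.3, 8.0, 8.8, 9.5, 10.1, 11.0]
ys = [2.3, 2.9, 3.5, 4.8, 4.1, 5.9, 6.2, 5.8, 7.7, 7.1, 8.9, 8.2, 9.9, 10.4, 10.8]
lo, hi = bootstrap_corr_ci(xs, ys)
print("r =", round(pearson(xs, ys), 3), "95% percentile CI:", (round(lo, 3), round(hi, 3)))
```

No formula for the standard error of r is needed, which is the simplicity advantage described in the quote.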


hanszhu

2007-10-27 23:21:00

A permutation test (also called a randomization test, re-randomization test, or an exact test) is a type of statistical significance test in which a reference distribution is obtained by calculating all possible values of the test statistic under rearrangements of the labels on the observed data points. In other words, the method by which treatments are allocated to subjects in an experimental design is mirrored in the analysis of that design. If the labels are exchangeable under the null hypothesis, then the resulting tests yield exact significance levels. Confidence intervals can then be derived from the tests. The theory has evolved from the works of R.A. Fisher and E.J.G. Pitman in the 1930s.
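A small sketch of a complete permutation test, assuming a two-sample comparison of means with made-up data. With 5 + 5 observations there are C(10, 5) = 252 ways to relabel the pooled data, few enough to enumerate, and the p-value is the share of relabellings whose statistic is at least as extreme as the observed one. The function name is hypothetical.

```python
from itertools import combinations


def exact_permutation_test(a, b):
    """Two-sided exact permutation test for a difference in means.

    Under H0 the labels are exchangeable, so each way of choosing which len(a)
    pooled values carry label A is equally likely; all of them are enumerated."""
    pooled = list(a) + list(b)
    n_a, n_b, total = len(a), len(b), sum(pooled)
    observed = abs(sum(a) / n_a - sum(b) / n_b)
    count = n_perm = 0
    for idx in combinations(range(len(pooled)), n_a):
        s_a = sum(pooled[i] for i in idx)
        diff = abs(s_a / n_a - (total - s_a) / n_b)
        if diff >= observed - 1e-12:    # small tolerance so float rounding does not drop ties
            count += 1
        n_perm += 1
    return count / n_perm, n_perm


a = [12.1, 14.3, 13.8, 15.0, 14.7]   # hypothetical group A
b = [11.0, 12.2, 11.8, 10.9, 12.5]   # hypothetical group B
p, n_perm = exact_permutation_test(a, b)
print("rearrangements:", n_perm, "exact p-value:", p)
```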

Monte Carlo testing
An asymptotically equivalent permutation test can be created when there are too many possible orderings of the data to conveniently allow complete enumeration. This is done by generating the reference distribution by Monte Carlo sampling, which takes a small (relative to the total number of permutations) random sample of the possible replicates.
The realization that this could be applied to any permutation test on any dataset was an important breakthrough in the area of applied statistics. The earliest known reference to this approach is Dwass (1957)[1]. This type of permutation test is known under various names: approximate permutation test, Monte Carlo permutation tests or random permutation tests[2]. However, it should be noted that all permutation tests are theoretically the same test, so it is important to understand that those different names only refer to one small and unimportant practical difference: to what level of detail the p-value is calculated.
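The Monte Carlo version replaces full enumeration with random shuffles of the labels. This sketch uses the same kind of two-sample mean comparison on made-up data and, as a common convention rather than anything stated above, counts the observed labelling as one of the rearrangements by reporting (count + 1) / (N + 1), so the estimated p-value is never exactly 0. The function name is hypothetical.

```python
import random


def monte_carlo_permutation_test(a, b, n_resamples=20_000, seed=42):
    """Approximate permutation test: sample random relabellings instead of enumerating all."""
    rng = random.Random(seed)
    pooled = list(a) + list(b)
    n_a = len(a)

    def stat(values):
        # Absolute difference between the mean of the first n_a values and the rest.
        return abs(sum(values[:n_a]) / n_a - sum(values[n_a:]) / (len(values) - n_a))

    observed = stat(pooled)
    count = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)             # one random rearrangement of the labels
        if stat(pooled) >= observed - 1e-12:
            count += 1
    return (count + 1) / (n_resamples + 1)


a = [12.1, 14.3, 13.8, 15.0, 14.7]   # hypothetical group A
b = [11.0, 12.2, 11.8, 10.9, 12.5]   # hypothetical group B
p_mc = monte_carlo_permutation_test(a, b)
print("Monte Carlo p-value:", p_mc)
```

For this small example all 252 rearrangements could be enumerated, so the Monte Carlo p-value can be checked against the exact one; the random-sampling version pays off when enumeration is infeasible.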

The necessary size of the Monte Carlo sample depends on the required accuracy of the test. If one merely wants to know whether the p-value is significant, sometimes as few as 400 rearrangements are sufficient to give a reliable answer. However, for most scientific applications the required size is much larger. For an observed p = 0.05, the accuracy from 10,000 random permutations is 0.0056, and from 50,000 it is 0.0025. For an observed p = 0.10, the corresponding accuracies are 0.0077 and 0.0035. Accuracy is defined from the binomial 99% confidence interval: p ± accuracy.
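The accuracy figures quoted above agree with the normal approximation to a binomial 99% confidence interval, accuracy = z * sqrt(p(1 - p)/N) with z ≈ 2.576. Reading the definition this way is an assumption, but the short sketch below reproduces all four numbers; `mc_pvalue_accuracy` is a hypothetical name.

```python
from statistics import NormalDist


def mc_pvalue_accuracy(p, n_perm, conf=0.99):
    """Half-width of the normal-approximation binomial CI for an estimated p-value."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)   # about 2.576 for a 99% interval
    return z * (p * (1 - p) / n_perm) ** 0.5


for p in (0.05, 0.10):
    for n in (10_000, 50_000):
        print(p, n, round(mc_pvalue_accuracy(p, n), 4))
```

Because the half-width shrinks like 1/sqrt(N), halving it requires four times as many permutations.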


alphastatist

2008-3-27 20:02:00

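My understanding is as follows.

The idea of the bootstrap is to replace the population distribution with the empirical distribution.

Under this substitution, the distribution of an estimator is usually obtained by simulation: bootstrap samples are generated to obtain bootstrap realizations of the estimator, and the empirical distribution of these realizations is used to approximate the estimator's distribution.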
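One way to see "empirical distribution in place of the population" at work, as a sketch with a made-up sample: for the sample mean, the exact variance under resampling from the empirical distribution is sigma_hat^2 / n, where sigma_hat^2 is the variance of the empirical distribution (divisor n, not n - 1). The simulated bootstrap realizations should land close to that value. The function name is hypothetical.

```python
import random


def bootstrap_var_of_mean(data, B=20_000, seed=3):
    """Variance of the sample mean across B bootstrap samples drawn from the data."""
    rng = random.Random(seed)
    n = len(data)
    means = [sum(rng.choices(data, k=n)) / n for _ in range(B)]
    m = sum(means) / B
    return sum((x - m) ** 2 for x in means) / (B - 1)


data = [4.1, 5.3, 2.2, 6.8, 3.9, 5.0, 4.4, 7.1, 3.2, 5.6]   # hypothetical sample
n = len(data)
xbar = sum(data) / n
plug_in_var = sum((x - xbar) ** 2 for x in data) / n   # variance of the empirical distribution
boot_var = bootstrap_var_of_mean(data)
print("bootstrap variance of the mean:", boot_var)
print("theoretical sigma_hat^2 / n:   ", plug_in_var / n)
```

Here the simulation is only approximating a quantity that is known exactly; for statistics without such a formula, the simulated realizations are the only practical route.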


spoonshen

2008-3-28 00:15:00

One key difference: they rest on different assumptions.

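For bootstrap, the population distribution is treated as unknown and is replaced by the empirical distribution of the observed sample, whose parameters are then "known" (computable from the data).
For Monte Carlo simulation, the researcher specifies the data-generating process, so the population parameters are "known" by construction.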
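The contrast discussed in this thread can be sketched for the standard error of a sample mean, assuming a normal data-generating process with made-up parameters: the Monte Carlo run draws fresh samples from the fully specified DGP, while the bootstrap run can only resample one observed sample. Function names are hypothetical.

```python
import random
import statistics


def mc_se_of_mean(mu, sigma, n, reps, rng):
    """Monte Carlo: draw fresh samples from a fully specified DGP N(mu, sigma^2)."""
    means = [statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(reps)]
    return statistics.stdev(means)


def bootstrap_se_of_mean(sample, reps, rng):
    """Bootstrap: only the observed sample is available, so resample it with replacement."""
    n = len(sample)
    means = [statistics.fmean(rng.choices(sample, k=n)) for _ in range(reps)]
    return statistics.stdev(means)


rng = random.Random(99)
mu, sigma, n = 10.0, 2.0, 40                             # assumed DGP for illustration
observed = [rng.gauss(mu, sigma) for _ in range(n)]      # stands in for real data
mc_se = mc_se_of_mean(mu, sigma, n, 5000, rng)
boot_se = bootstrap_se_of_mean(observed, 5000, rng)
print("theoretical SE:", sigma / n ** 0.5)
print("Monte Carlo SE:", mc_se)
print("bootstrap SE:  ", boot_se)
```

The Monte Carlo figure uses the true mu and sigma, while the bootstrap figure uses only `observed`, so it inherits that one sample's estimation error.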