Introduction to bootstrap with applications to mixed-effect models

oliyiyi

1534

收藏 2015-11-29

This article was first published on DataScience+, and kindly contributed to R-bloggers)

Bootstrap is one of the most famous resampling technique and is very useful to get confidence intervals in situations where classical approach (t- or z- tests) would fail.

What is bootstrap

Instead of writing down some equations let’s directly see how one may perform bootstrap. Below we will show a simple bootstrap example using the height of 100 person in the population.

set.seed(20151101)height<-rnorm(100,175,6)

Now we will resample with replacement 1000 times and compute the median:

t0<-median(height)t<-sapply(1:1000,function(x) median(sample(x=height,size=100,replace=TRUE)))hist(t)abline(v=t0,col="orange",lwd=3)

Here is the histogram we get:

[color=rgb(255, 255, 255) !important]

And that’s it, this is the essence of bootstrap: resampling the observed data with replacement and computing the statistic of interest (here the median) many times on the resampled data to get a distribution of the statistic of interest. This distribution of the statistic of interest can then be used to compute, for example, confidence intervals.

When to use bootstrap

Bootstrap is used to enable inference on the statistic of interest when the true distribution of this statistic is unknown. For example in linear model the parameter of interest have a known distribution from which standard errors and formal tests can be performed. On the other hand for some statistics (median, differences between two models …), if the analyst do not want to spend time writing down equations,bootstrapping might be a great approach to get standard errors and confidence intervals from the bootstrapped distribution.

When will bootstrap fail

There are some situations where bootstrapped will fail: (i) the statistic of interest is at the edge of the parameter space (like minimum or maximum), the bootstraped distribution does not converge (as the number of bootstrap sample increase to infinity) to the true distribution of the statistic. (ii) sample size is small, bootstrapping will not increase the power of statistical tests. If you sample to few data to detect an effect of interest, using bootstrap will not magically solve your problem even worse the bootstrap approach will perform less well than others.

How many bootstrap samples

As much as possible will be the answer. Note that in this post I used low numbers to speed up the computations on my “old” computer.

Non-parametric and parametric bootstrap using the boot library

The boot library in R is very convenient to easily compute confidence intervals from bootstrap samples. Non-parametric bootstrap with boot:

library(boot)b1<-boot(data=height,statistic=function(x,i) median(x),R=1000)boot.ci(b1)BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONSBased on 1000 bootstrap replicatesCALL : boot.ci(boot.out = b1)Intervals : Level    Normal             Basic       95% (174.1, 176.2 ) (174.2, 176.4 )  Level    Percentile          BCa       95% (173.9, 176.1 ) (173.8, 176.1 )  Calculations and Intervals on Original Scale
Parametric bootstrap with boot:
x<-runif(100,-2,2)y<-rnorm(100,1+2*x,1)dat<-data.frame(x=x,y=y)
Simple example with a linear model
m<-lm(y~x)
We are interested in getting the confidence intervals for the coefficient of the model:
foo<-function(out){  m<-lm(y~x,out)  coef(m)}
The function rgen generate new response vector from the model:
rgen<-function(dat,mle){  out<-dat  out$y<-unlist(simulate(mle))  return(out)}
Generate 1000 bootstrap sample
b2<-boot(dat,foo,R=1000,sim="parametric",ran.gen=rgen,mle=m)
Compute the confidence intervals for the two coefficients:
boot.ci(b2,type="perc",index=1)BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONSBased on 1000 bootstrap replicates CALL : boot.ci(boot.out = b2, type = "perc", index = 1)Intervals : Level    Percentile    95% ( 0.8056,  1.1983 )  Calculations and Intervals on Original Scale
Compute the confidence intervals for the two coefficients:
boot.ci(b2,type="perc",index=2)BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONSBased on 1000 bootstrap replicatesCALL : boot.ci(boot.out = b2, type = "perc", index = 2)Intervals : Level    Percentile    95% ( 1.871,  2.217 )  Calculations and Intervals on Original Scale
In the non-parametric case boot expect two arguments in the function returning the statistic of interest: the first one is the object from which to compute the statistic and the second is a vector of index (i), frequencies (f) or weight (w) defining the bootstrap sample. In our example, boot will generate a series of indices (named i) with replacement and these will be used to subset the original height vector. In the parametric case the function returning the statistic(s) of interest only need one argument: the original dataset. We then need to supply another function (ran.gen) describing how to generate the new data, it needs to return an object of the same form as the original dataset. This random data generating function need two arguments: the first one is the original dataset and the second one contain maximum likelihood estimate for the parameter of interest, basically a model object. The new dataset generated by theran.gen function will then be passed to the statistic function to compute the bootstrapped value for the statistic of interest. It is then straightforward to get the confidence intervals for the statistic usingboot.ci.
Bootstrap applied to mixed-effect models
Mixed-effect models are rather complex and the distributions or numbers of degrees of freedom of various output from them (like parameters …) is not known analytically. Which is why the author of the lme4 package recommend the use of bootstrap to get confidence intervals around the model parameters, the predicted values but also to get p-values from likelihood ratio tests.
A simple random intercept model:
dat<-data.frame(x=runif(100,-2,2),ind=gl(n=10,k=10))dat$y<-1+2*dat$x+rnorm(10,0,1.2)[dat$ind]+rnorm(100,0,0.5)m<-lmer(y~x+(1|ind),dat)
Get the bootstrapped confidence intervals for the model parameters:
b_par<-bootMer(x=m,FUN=fixef,nsim=200)boot.ci(b_par,type="perc",index=1)BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONSBased on 200 bootstrap replicatesCALL : boot.ci(boot.out = b_par, type = "perc", index = 1)Intervals : Level    Percentile    95% ( 0.393,  1.824 )  Calculations and Intervals on Original ScaleSome percentile intervals may be unstable
Get the bootstrapped confidence intervals for the model parameters:
boot.ci(b_par,type="perc",index=2)BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONSBased on 200 bootstrap replicatesCALL : boot.ci(boot.out = b_par, type = "perc", index = 2)Intervals : Level    Percentile    95% ( 1.857,  2.041 )  Calculations and Intervals on Original ScaleSome percentile intervals may be unstable
Or alternatively:
confint(m,parm=c(3,4),method="boot",nsim=200,boot.type="perc")             2.5 % 97.5 %(Intercept) 0.4337793 1.819856x          1.8611089 2.058930
Get the bootstrapped confidence intervals around the regression curves:
new_dat<-data.frame(x=seq(-2,2,length=20))mm<-model.matrix(~x,data=new_dat)predFun<-function(.) mm%*%fixef(.)bb<-bootMer(m,FUN=predFun,nsim=200) #do this 200 times
As we did this 200 times the 95% CI will be bordered by the 5th and 195th value:
bb_se<-apply(bb$t,2,function(x) x[order(x)])new_dat$LC<-bb_se[1,]new_dat$UC<-bb_se[2,] new_dat$pred<-predict(m,newdata=new_dat,re.form=~0)
Plot the results
plot(y~x,dat,pch=16)lines(pred~x,new_dat,lwd=2,col="orange")lines(LC~x,new_dat,lty=2,col="orange")lines(UC~x,new_dat,lty=2,col="orange")
Here is the plot:

[color=rgb(255, 255, 255) !important]

Finally get bootstrapped p-values from the likelihood ratio test between two models.
library(pbkrtest)m_0<-update(m,.~.-x)PBmodcomp(m,m_0,nsim=200)Parametric bootstrap test; time: 14.35 sec; samples: 200 extremes: 0;large : y ~ x + (1 | ind)small : y ~ (1 | ind)    stat df p.value LRT 275.19  1 < 2.2e-16 ***PBtest 275.19    0.004975 ** ---Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Drawing confidence intervals around the regression curves is tricky due to the random effect estimated values which comes with there own uncertainty (have a look at dotplot(ranef(m,condVar=TRUE)) to see it). Bootstrapping is an efficient way to take these uncertainties into account since the random deviates are re-computed for each draw. Finally getting p-values for the effect of a fixed-effect term can be done using a parametric bootstrap approach as described here and implemented in the function PBmodcompfrom the pbkrtest package. In the output of PBmodcomp the bootstrapped p-values is in the PBtest line, the LRT line report the standard p-value assuming a chi-square distribution for the LRT value.
Conclusion
With computers being always faster, bootstrap enable us to get reliable confidence interval estimate (given that your original sample size is large enough) without relying on some hypothetical distributions. Bootstrap can also be extended in a bayesian framework, see for example this nice post.
If you have any comments or suggestion feel free to post a comment below.

扫码加我拉你入群

请注明：姓名-公司-职位

以便审核进群资格，未注明则拒绝

全部回复

沙发

condmn

2015-11-29 15:58:22

扫码加我拉你入群

请注明：姓名-公司-职位

以便审核进群资格，未注明则拒绝

藤椅

seahhj

2015-11-29 16:30:37

thanks for sharing

扫码加我拉你入群

请注明：姓名-公司-职位

以便审核进群资格，未注明则拒绝

相关推荐

[下载]金融数学基础课程《 Introduction To Stochastic Calculus With Applications》

An introduction to Lévy processes with applications in finance

WEILY出版社：MATLAB：An Introduction with Applications(第4版）

Measure_theory_and_filtering__introduction_and_applications

MATLAB.. An Introduction with Applications 第四版

An Introduction to Bootstrap Methods with Applications to R

an introduction to machine learning with applications in R

求书：An Informal Introduction to Stochastic Calculus with Applications

求书：An Introduction to Population Genetics: Theory and Applications

Point Process Calculus in Time and Space: An Introduction with Applications

栏目导航

LATEX论坛

金融实务版

经管文库（原现金交易版）

外文文献专区

行业分析报告

经管在职研

热门文章

还在为论文写作头秃？这位发表100+篇SSCI的 ...

《CDA教材一级：商业数据分析》2025版官方教 ...

CDA 数据分析师：主成分分析（PCA）实战指南 ...

An Introduction to Mathematical Analysis ...

Management 16th Edition by Stephen P. Ro ...

Managing for Quality and Performance Exc ...

2022爱情伤感说说大全

2025AI时代超级设计师研究手册

《国富论》第一至三卷The Wealth of Nation ...

2025年健康保险实务考试回忆

推荐文章

AI狂潮席卷学术圈，不会编程也能打造专属智 ...

最快1年拿证，学费不足5W！热门美国人工智能 ...

关于如何利用文献的若干建议

关于学术研究和论文发表的一些建议

关于科研中如何学习基础知识的一些建议 (一 ...

一个自编的经济学建模小案例 --写给授课本科 ...

AI智能体赋能教学改革: 全国AI教育教学应用 ...

2025中国AIoT产业全景图谱报告-406页

关于文献求助的一些建议

几种免费下载文献的方法----我的文献应助经

扫码加我 拉你入群

扫码加我 拉你入群

扫码加我 拉你入群

分享

扫码加好友，拉您进群

扫码加我拉你入群

扫码加我拉你入群

扫码加我拉你入群