HLM sample data - 经管之家

ainur

2014-04-01

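When analyzing data with HLM, what is the minimum sample size needed at each level? My model has two levels: the individual level and the team level.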

All replies

ReneeBK

2014-4-1 22:31:42

Although there is not much you can do from a power perspective, there are some precautions you can take to ensure that the estimates are unbiased. For the variance components, using REML will provide unbiased estimates. A study by Browne and Draper (2006) in Bayesian Analysis attained unbiased variance component estimates with as few as 6 clusters using REML.

For fixed effects, a Kenward-Roger degrees-of-freedom adjustment has been shown to provide unbiased estimates with small sample sizes. An advance article in Methodology by Bell et al. uses Kenward-Roger, and the estimates showed negligible bias with as few as 10 clusters. Kenward-Roger is available in SAS through the DDFM=KR option in the MODEL statement of PROC MIXED.

Another option is to use a Bayesian framework. A 2010 study in the International Journal of Biostatistics by Austin found that Bayesian estimates were unbiased with as few as about 7 clusters with only 10 observations in each cluster.

Hope That Helps!

Dan McNeish

ReneeBK

2014-4-1 22:33:12

If you look into it, you will find that multilevel models will not perform adequately with as few as 8 groups. There are debates about how many groups are enough, but the debate does not get anywhere near 8. It is not even a question of power, but of bias in the estimation.

Robert Brennan

ReneeBK

2014-4-1 22:34:30

There is a good chance of negative estimated variance components with n2 = 8 (eight level-2 units). It is better to use survey-sensitive analysis software that corrects the variance estimates for clustering without trying to estimate components of variance at the same time.

Dave Judkins
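
As a rough illustration of the point about negative variance estimates, the sketch below simulates balanced two-level data and counts how often a method-of-moments (one-way ANOVA) estimate of the between-group variance comes out negative. This is a swapped-in estimator chosen for simplicity, not what HLM or REML software does (REML typically stops at a zero boundary estimate instead). The function names, the group size of 10, and the variance values are all assumptions made for illustration.

```python
import random
import statistics


def anova_between_variance(groups):
    """Method-of-moments estimate of the between-group variance (balanced groups)."""
    n = len(groups[0])  # observations per group; assumes every group has the same size
    g = len(groups)
    group_means = [statistics.fmean(grp) for grp in groups]
    grand_mean = statistics.fmean(group_means)
    # Mean square between groups and mean square within groups.
    msb = n * sum((m - grand_mean) ** 2 for m in group_means) / (g - 1)
    msw = sum((y - m) ** 2 for grp, m in zip(groups, group_means) for y in grp) / (g * (n - 1))
    return (msb - msw) / n  # negative whenever MSB < MSW


def share_negative(n_groups=8, n_per_group=10, tau2=0.05, sigma2=1.0, n_sims=2000, seed=1):
    """Fraction of simulated data sets whose between-group variance estimate is negative.

    tau2 is the true between-group variance, sigma2 the true residual variance
    (both hypothetical values chosen for this sketch).
    """
    rng = random.Random(seed)
    negative = 0
    for _ in range(n_sims):
        groups = []
        for _ in range(n_groups):
            u = rng.gauss(0.0, tau2 ** 0.5)  # group-level random effect
            groups.append([u + rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n_per_group)])
        if anova_between_variance(groups) < 0:
            negative += 1
    return negative / n_sims


if __name__ == "__main__":
    # Eight groups with a small true intraclass correlation.
    print(share_negative())
```

Raising `tau2` or `n_groups` should make negative estimates rarer, which is one way to see why the problem is tied to having few level-2 units.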

ReneeBK

2014-4-1 22:35:31

I'm not sure if we are referring to the same method, but if by clustered regression you mean design-based methods such as GEEs and sandwich estimators, those methods encounter similar difficulties, with downwardly biased standard errors in small samples. There are some attempts to correct the bias (e.g., Pan and Wall 2002, or Morel, Bokossa, and Neerchal 2003), but from my understanding they are not any more effective than using Kenward-Roger, and they also require the assumption that the model is properly specified. Kenward-Roger also allows the variance components to be estimated, rather than producing only marginal estimates as is the case with GEEs.

Dan McNeish

ReneeBK

2014-4-1 22:37:58

Clustered regression (which is also what Dave calls the survey-sensitive methods) is not advisable for a number of groups as small as 8.

For clustered regression, I reviewed the literature on the sandwich estimator, and this is summarized in Section 12.2 of the 2nd edition of Snijders & Bosker, "Multilevel Analysis" (Sage, 2012). The executive summary is that it is doubtful for small numbers of groups, such as fewer than 20 or 30. It certainly is not a panacea.

Best regards,

Tom Snijders

ReneeBK

2014-4-1 22:39:13

First, here is a StataCorp FAQ on what they do for cluster-adjusted robust SEs: http://www.stata.com/support/faqs/statistics/references/

Second, especially regarding a small number of clusters, I recommend the following article: Cameron, AC and Miller, DL (2011), "Robust inference with clustered data," in A Ullah and DE Giles, _Handbook of Empirical Economics and Finance_, CRC Press, pp. 1-28. Their basic answer is that cluster-adjusted SEs should not be used with a small number of clusters; they suggest a variation on the clustered bootstrap, for which there is some, but not full, code available online.

Richard Goldstein

ReneeBK

2014-4-1 22:42:26

I think Bob's and Dave's suggestions are good (treat characteristics of the sites as fixed effects, and don't try to estimate level-2 variance components). You could at least account for some of the dependency within sites by treating site as a clustering variable in Mplus (not as a second level of a multilevel model) using Type = Complex, with MLR to obtain robust standard errors. You could also do this with the regress command in Stata, using vce(cluster site) to obtain robust standard errors that account for within-site dependency. In both programs, you can also use this approach for non-normally distributed dependent variables, e.g., binary outcomes examined with logistic regression.

Bruce Cooper
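
To make the idea of cluster-robust standard errors concrete, here is a minimal sketch of the sandwich calculation for a regression with an intercept and one predictor, treating site as the clustering variable. It is not a reproduction of the Mplus or Stata implementations: the function name `ols_cluster_robust`, the toy data, and the finite-sample factor G/(G-1) × (N-1)/(N-k) are assumptions for illustration.

```python
import math


def ols_cluster_robust(x, y, cluster):
    """OLS of y on [1, x] with cluster-robust (sandwich) standard errors."""
    n = len(y)
    # Elements of X'X and X'y for the design matrix [1, x].
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    # "Bread": the inverse of X'X (a symmetric 2x2 matrix).
    bread = [[sxx / det, -sx / det], [-sx / det, n / det]]
    b0 = bread[0][0] * sy + bread[0][1] * sxy
    b1 = bread[1][0] * sy + bread[1][1] * sxy
    # Sum the score contributions X_g' e_g within each cluster.
    scores = {}
    for xi, yi, gi in zip(x, y, cluster):
        e = yi - b0 - b1 * xi
        s = scores.setdefault(gi, [0.0, 0.0])
        s[0] += e
        s[1] += xi * e
    g = len(scores)
    # Assumed finite-sample correction with k = 2 estimated coefficients.
    correction = g / (g - 1) * (n - 1) / (n - 2)
    # Diagonal of bread * meat * bread, where meat = sum over clusters of s s'.
    se = []
    for i in range(2):
        var_i = correction * sum(
            (bread[i][0] * s[0] + bread[i][1] * s[1]) ** 2 for s in scores.values()
        )
        se.append(math.sqrt(var_i))
    return (b0, b1), tuple(se)


if __name__ == "__main__":
    # Toy data: two sites ("a" and "b"), two observations each.
    coefs, ses = ols_cluster_robust([0, 1, 0, 1], [0, 1, 0, 3], ["a", "a", "b", "b"])
    print(coefs, ses)
```

Because the variance is built from only as many score vectors as there are sites, the estimate rests on very few independent pieces when there are, say, 8 sites, which is the concern raised elsewhere in this thread.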

ReneeBK

2014-4-1 22:44:05

The references I cited describe methods for obtaining unbiased estimates of all parameters. According to the studies by Maas and Hox (2004, 2005), fixed-effect standard errors and level-2 variance components are the biggest worry with small samples. REML can address the level-2 variance components, and Kenward-Roger can address the bias in the fixed-effect standard errors.

Also, Bethany Bell and colleagues presented a paper at the M3 conference at UConn in 2011 that compared an MLM using KR and REML with a single-level model that treated clusters as fixed effects, both at small sample sizes. In their simulation study with 10 clusters, they found that the MLM estimates were much closer to the true values. I don't believe this study has been published yet, however.

Dan McNeish

ReneeBK

2014-4-1 22:46:22

I was interested in this discussion, and wondered about results from an extreme case, say just five clusters with six observations in each. The results of a quick simulation, and the R code for conducting it, are below. The first row of the output gives the means across 1000 simulations (which take just a couple of minutes to run), and the second row gives the standard deviations. The means should all be 1.

Maybe people will find this useful. The biases don't seem large to me, but there are lots of other issues to take into account. I haven't checked whether the estimated SEs are anti- (or over-)conservative, for instance, though that wouldn't be hard to do.
=================================================================
library(lme4.0)  # legacy lme4 branch used in the original post; current lme4 should also provide these calls
nsims <- 1000
set.seed(080813)

# Generate the data: N groups of n observations; x2 is constant within a group.
dgp <- function(N = 5, n = 6) {
  within(data.frame(grp = gl(N, n),
                    x1  = rnorm(N * n),
                    x2  = runif(N)[rep(1:N, each = n)]),
         y <- 1 + x1 + x2 + rnorm(N)[grp] + rnorm(N * n))
}

# Extract the fixed effects, the group-level variance, and the residual variance.
c.mer <- function(mod) {
  c(fixef(mod), unlist(lapply(VarCorr(mod), diag)), attr(VarCorr(mod), "sc")^2)
}

# Run one simulation; the argument is only the replication index.
sim <- function(i) c.mer(lmer(y ~ x1 + x2 + (1 | grp), dgp()))

# Run nsims simulations, get means (row 1) and SDs (row 2).
apply(sapply(1:nsims, sim), 1, function(xx) c(mean(xx), sd(xx)))

Output (columns: intercept, x1, x2, group-level variance, residual variance):

[1,] 1.050036 1.0013884 0.9049677    1.0322423 1.0104995
[2,] 1.429368 0.2163859 2.7632334    0.9928147 0.2887274
=================================================================

Malcolm Fairbrother

ReneeBK

2014-4-1 22:47:32

After vacation, I was able to revisit this issue, triggered by David Kondrat's question of what to do with just 8 level-2 units. As you may recall, Dan McNeish stepped in with a reference to a pre-publication paper by Bethany Bell et al., and others mentioned relevant sections of Snijders and Bosker and of Ullah and Giles. Bell et al. report good performance for multilevel modeling with the REML and Kenward-Roger options for level-2 sample sizes as small as 10.

Section 12.2 of Snijders and Bosker took me back to work by a different Bell, Robert M. Bell (with Dan McCaffrey) in Survey Methodology.  They showed that the sandwich estimator works for level-2 n>1 provided that the relevant portion of the design matrix is constant across clusters.  I am going to oversimplify their work and say that if the independent variable of central interest has itself an ICC of zero, then there is no need to worry about using the sandwich estimator that is common in survey-sensitive regression software to adjust for the effects of clustering on the variances of the estimated parameters in a single-level model.  On the other hand, if there is substantial ICC in the independent variable of interest, then there might be severe problems in sandwich variance estimators.

I am particularly interested in multi-site individually randomized trials. Here the independent variable of interest is randomly assigned treatment status. So Bell and McCaffrey's work would seem to suggest that as long as the randomization fraction is constant across sites, there are no problems with the sandwich variance estimator. I verified this with a simulation study with as few as 3 sites and 100 level-1 units per site. I used a constant randomization fraction of 0.67 within each site. I also had a level-1 covariate that explained a widely varying portion of the variance of the dependent variable across sites. Even with variable treatment effects across sites, the SAS procedure SURVEYREG performed nearly perfectly. The same cannot be said of the SAS procedure MIXED with the REML and Kenward-Roger options, random slopes, and random intercepts. MIXED worked just as well as SURVEYREG for 8 sites, but started to be slightly liberal with 5 sites, and was considerably liberal with 3 sites.

Dave Judkins

ReneeBK

2014-4-1 22:48:08

Try the EAM journal Methodology (see http://www.eam-online.org/)

Joop Hox, Professor, Utrecht University
