I am fitting a nested random-effects logistic regression model in which the sample size at level 2 is extremely large (several hundred thousand units). There are three levels: level 1 has about 4 observations per level-2 unit, and there are several hundred level-3 units. The model has some additional complexities that are not worth mentioning at the moment.
Virtually any estimation method I try (quadrature, Bayesian) either fails to converge because it exhausts my machine's memory or takes days to weeks to reach convergence.
One often reads about the minimum sample size needed to obtain nearly unbiased estimates, but surely there is also a point at which a sample size becomes excessive.
I know that if I randomly drop, say, 50% of the level-2 units, I can achieve convergence within a reasonable amount of time. Moreover, the estimates, standard errors, and p-values are virtually identical. I would like references to support what I am finding.
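For concreteness, here is a minimal sketch in Python of the subsampling check I am describing. The simulated data, column names such as `level2_id`, and the commented-out `fit_three_level_logit` call are placeholders, not my actual data or model:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy stand-in for the real data: ~4 level-1 obs per level-2 unit,
# level-2 units nested within level-3 units (sizes shrunk for illustration).
n3, n2_per_3, n1_per_2 = 20, 100, 4
df = pd.DataFrame({
    "level3_id": np.repeat(np.arange(n3), n2_per_3 * n1_per_2),
    "level2_id": np.repeat(np.arange(n3 * n2_per_3), n1_per_2),
})
df["y"] = rng.integers(0, 2, size=len(df))  # placeholder binary outcome

def subsample_level2(data, frac=0.5, seed=0):
    """Randomly keep a fraction of level-2 units, preserving whole
    clusters (every level-1 row of each kept unit stays in)."""
    r = np.random.default_rng(seed)
    ids = data["level2_id"].unique()
    keep = r.choice(ids, size=int(frac * len(ids)), replace=False)
    return data[data["level2_id"].isin(keep)]

# Stability check: refit on several independent 50% subsamples and
# compare estimates and standard errors across seeds.
for seed in range(3):
    sub = subsample_level2(df, frac=0.5, seed=seed)
    # fit = fit_three_level_logit(sub)  # hypothetical: quadrature/MCMC fit here
    print(seed, sub["level2_id"].nunique(), "level-2 units,", len(sub), "rows")
```

Refitting on a few independent subsamples, rather than just one, is what convinces me the results are stable rather than an artifact of a particular random draw.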
Are there any articles/textbooks which speak to this issue and provide guidelines/recommendations?
Subramanian, S. V., Duncan, C., & Jones, K. (2001). "Multilevel perspectives on modeling census data." Environment and Planning A, 33(3), 399–417.