Weighting in GENLINMIXED

2014-04-18

I am asked to assist in an application of a multilevel model through GENLINMIXED, for a survey based on a three-stage clustered sample (first selecting towns, then neighborhoods, then households). Since the multilevel model would look for effects within and between towns, neighborhoods in towns, and households in towns and neighborhoods, the question is what to do in this case about sample weighting.
For other analyses (including tabulations of results), each household is weighted by the product of the reciprocals of the sampling ratios at the three stages, (Pt/St)(Ptn/Stn)(Ptnh/Stnh), where P = population, S = sample, t = towns, n = neighborhoods, and h = households. In this form the weights correct for disproportionate sampling and also inflate the weighted sample to population size. The weighted sample can be deflated back to its original size by multiplying those weights by the ratio R = (total household sample size / total households in the population).
The question is whether any weight (inflationary or most probably not inflationary) should be used when analyzing a multilevel model where the levels are the sampling stages. I have heard opinions for and against, and would like to hear more on the subject.
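
As a rough illustration of the weighting scheme described above, the following Python sketch computes the inflationary weight for each household and then rescales the weights so that they sum to the sample size. The function names and the counts for the two households are made up for illustration. The rescaling uses n / (sum of weights), which coincides with the ratio R above only when the inflationary weights sum exactly to the population total.

```python
def inflation_weight(pt, st, ptn, stn, ptnh, stnh):
    """Product of the reciprocal sampling ratios at the three stages."""
    return (pt / st) * (ptn / stn) * (ptnh / stnh)


def rescale_to_sample_size(weights):
    """Rescale weights so they sum to the number of sampled cases (mean weight 1)."""
    factor = len(weights) / sum(weights)
    return [w * factor for w in weights]


# Hypothetical counts (Pt, St, Ptn, Stn, Ptnh, Stnh) for two sampled households.
households = [
    (10, 2, 20, 4, 100, 10),
    (10, 2, 8, 4, 60, 20),
]

inflationary = [inflation_weight(*h) for h in households]  # 250.0 and 30.0
proportional = rescale_to_sample_size(inflationary)        # sums to 2, the sample size

print(inflationary)
print(proportional)
```

In this toy case the second household ends up with a proportional weight well below 0.5, which is the kind of case the rounding concern raised later in the thread would affect.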

All replies

ReneeBK

2014-4-18 11:11:30

To repeat my situation: I am asked to assist in applying a multilevel model
(through GENLINMIXED) with a survey based on a three-stage clustered sample
(first selecting towns, then neighborhoods, then households). Since the
multilevel model would look for effects within and between towns,
neighborhoods within towns, and households within towns and neighborhoods,
the question is what to do in this case about sample weighting.

Since the sampling ratio is not uniform, sample cases in different clusters
and strata represent different numbers of cases in the population. Using no
weights would be evidently wrong.

One may use inflationary frequency weights, whereby the weighted sample size
equals population size, OR one may use merely proportional weights where
total weighted sample size equals the actual number of cases in the sample.
In this latter option, some cases have weights >1, while others have weights
<1 (the average case has weight=1).

SPSS computes statistical tests relative to the weighted sample. Thus the
first approach (inflationary weights) would fool SPSS into believing that
the sample size is enormous, and consequently statistical error would be
grossly understated.
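
To make the concern concrete, here is a small Python sketch of how a standard error behaves when weights are treated as frequency weights, that is, when the sum of the weights is taken as the sample size. This is the conventional textbook treatment of frequency weights and is only an assumption about what happens here, not a description of GENLINMIXED internals. The data values and the inflation factor of 100 are hypothetical.

```python
import math


def freq_weighted_mean_and_se(values, weights):
    """Mean and its standard error, treating each weight as a case count."""
    n = sum(weights)  # the "sample size" the software believes it has
    mean = sum(w * x for x, w in zip(values, weights)) / n
    variance = sum(w * (x - mean) ** 2 for x, w in zip(values, weights)) / (n - 1)
    return mean, math.sqrt(variance / n)


values = [1.0, 2.0, 3.0, 4.0]            # hypothetical responses
proportional_weights = [1.0] * 4         # weights sum to the real sample size
inflationary_weights = [100.0] * 4       # each case "stands for" 100 households

mean_prop, se_prop = freq_weighted_mean_and_se(values, proportional_weights)
mean_infl, se_infl = freq_weighted_mean_and_se(values, inflationary_weights)

print(mean_prop, se_prop)
print(mean_infl, se_infl)
```

The point estimate is the same under both sets of weights, but the standard error under the inflationary weights is smaller by roughly the square root of the inflation factor, although no additional households were actually interviewed.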

On the other hand, the second approach (proportional non-inflationary
weights) would produce many cases with weights well below 1. This would not
cause trouble if fractional weights were totalled first and then rounded,
but that appears not to be the case. The Generalized Linear Mixed Model
Algorithms say (in the section about notation, concerning frequency weights
"f"): "Non-integer elements are treated by rounding the value to the nearest
integer. For values less than 0.5 or missing, the corresponding records are
not used." If that is so, then all cases with proportional weights lower
than 0.5 (i.e. all cases representing less than half the reciprocal of the
average sampling ratio) would be ignored, which is also unacceptable.

Anybody?

ReneeBK

2014-4-18 11:12:02

Hector,

From what you've reported, it sounds like GENLINMIXED will not account for the probability weighting scheme that you desire. Perhaps there is a workaround, but I have not explored this area enough to provide any recommendations.

Having said that, you may not be aware that you've stepped into a highly debated topic--when and how to use weighting in the context of hierarchical models. And even more broadly, whether one should even employ a hierarchical model in the context of multi-stage sampling.

Here's a very informative SUGI paper (by David Cassell) that touches on the latter topic (a mixed model procedure versus a survey sampling procedure that allows for clustering). It is a MUST read:

http://www2.sas.com/proceedings/sugi31/193-31.pdf

David Cassell also posted a message on SAS-L (among many similar messages) that I've found to be informative when thinking about using (or perhaps not using) hierarchical models with data obtained from complex survey designs:

http://listserv.uga.edu/cgi-bin/wa?A2=ind0602D&L=sas-l&P=R44987

I also suggest you consider reviewing Chapter 14 in the second edition of "Multilevel Analysis: An introduction to basic and advanced multilevel modeling":

http://www.stats.ox.ac.uk/~snijders/SnBos_contents.html

HTH,

Ryan
