To restate my situation: I have been asked to help apply a multilevel model
(through GENLINMIXED) to a survey based on a three-stage clustered sample
(first selecting towns, then neighborhoods, then households). Since the
multilevel model would look for effects within and between towns,
neighborhoods within towns, and households within towns and neighborhoods,
the question is how to handle sample weighting in this case.
Since the sampling ratio is not uniform, sample cases in different clusters
and strata represent different numbers of cases in the population. Using no
weights would evidently be wrong.
One may use inflationary frequency weights, whereby the weighted sample size
equals the population size, or merely proportional weights, whereby the
total weighted sample size equals the actual number of cases in the sample.
With the latter option, some cases have weights >1 while others have weights
<1 (the mean weight is 1).
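
To make the two options concrete, here is a minimal Python sketch. The
sampling fractions are made-up illustration values, not figures from the
survey, and the function names are my own.

```python
# Hypothetical sampling fractions for five sample cases (illustration only).
sampling_fractions = [0.01, 0.01, 0.02, 0.05, 0.10]

def inflation_weights(fractions):
    """Each case counts for 1/f population cases; weights sum to population size."""
    return [1.0 / f for f in fractions]

def proportional_weights(fractions):
    """Inflation weights rescaled so that they sum to the number of sample cases."""
    infl = inflation_weights(fractions)
    mean_w = sum(infl) / len(infl)
    return [w / mean_w for w in infl]

infl = inflation_weights(sampling_fractions)
prop = proportional_weights(sampling_fractions)

print("inflation weights:   ", infl, "sum =", sum(infl))
print("proportional weights:", [round(w, 3) for w in prop], "sum =", round(sum(prop), 3))
```

In this toy example the proportional weights sum to 5 (the sample size), and
two of the five cases end up with weights below 0.5.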
SPSS computes statistical tests on the basis of the weighted sample size.
Thus the first approach (inflationary weights) would fool SPSS into
believing that the sample size is enormous, and consequently standard errors
would be grossly understated.
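
The size of the problem can be illustrated with a small Python sketch. It
assumes the textbook formula for the standard error of a mean when weights
are treated as case counts; it is not the actual GENLINMIXED computation,
and the data values and weights are hypothetical.

```python
import math

def freq_weighted_se(values, weights):
    """Standard error of the mean when weights are treated as frequencies."""
    n = sum(weights)  # the "sample size" as the software would see it
    mean = sum(w * x for x, w in zip(values, weights)) / n
    var = sum(w * (x - mean) ** 2 for x, w in zip(values, weights)) / (n - 1)
    return math.sqrt(var / n)

# Hypothetical data: five cases and their inflation weights (sum = 280).
values = [10.0, 12.0, 9.0, 15.0, 20.0]
infl_weights = [100.0, 100.0, 50.0, 20.0, 10.0]

# Proportional weights: same ratios, rescaled to sum to the 5 actual cases.
scale = len(values) / sum(infl_weights)
prop_weights = [w * scale for w in infl_weights]

se_infl = freq_weighted_se(values, infl_weights)
se_prop = freq_weighted_se(values, prop_weights)
print("SE with inflation weights:   ", round(se_infl, 4))
print("SE with proportional weights:", round(se_prop, 4))
```

The weighted mean is the same under both sets of weights; only the apparent
sample size changes, and the standard error shrinks with it.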
On the other hand, the second approach (proportional non-inflationary
weights) would produce many cases with weights well below 1. This would not
cause trouble if fractional weights were totalled first and then rounded,
but that appears not to be the case: the Generalized Linear Mixed Model
Algorithms document says (in the section about notation, concerning
frequency weights "f"): "Non-integer elements are treated by rounding the
value to the nearest integer. For values less than 0.5 or missing, the
corresponding records are not used." If that is so, then all cases with
proportional weights below 0.5 (i.e. all cases representing fewer than half
as many population cases as the average sample case) would be ignored, which
is also unacceptable.
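
As a rough check of what the quoted rule would do, here is a minimal Python
sketch. It assumes ties at .5 round upward, which the quoted text does not
specify, and the weights are hypothetical proportional weights rather than
values from the survey.

```python
import math

def effective_frequency(weight):
    """Frequency a record would get under the quoted rule (0 = record not used)."""
    if weight is None or weight < 0.5:
        return 0
    # Assumption: round half up; the documentation only says "nearest integer".
    return math.floor(weight + 0.5)

# Hypothetical proportional weights for five cases (they sum to about 5).
prop_weights = [1.79, 1.79, 0.89, 0.36, 0.18]

frequencies = [effective_frequency(w) for w in prop_weights]
dropped = sum(1 for f in frequencies if f == 0)

print(frequencies)  # [2, 2, 1, 0, 0]
print("records dropped:", dropped, "of", len(prop_weights))
```

Here the two cases from the most heavily sampled clusters drop out entirely,
and the remaining weights are coarsened to 1 or 2.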
Anybody?