GEE Using SAS Procedure Genmod?

2014-03-20

I need to help a monitoring program estimate linear temporal trends in annual means. In all cases, multiple observations (L1) are nested within years (L2). For this project, the program has no interest in within-group associations. Estimating linear trends would be routine using LMMs, except that sampling probabilities for some combinations vary within years. An alternative would be to equate the annual means with sample means (or design-based sample means), subject to some predefined criterion for reliability. The minimum estimated reliability is 0.5, with 90% of estimated reliabilities attaining or exceeding 0.93 (n = 1456).
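
The post does not say how reliability was computed. As an assumption, a common multilevel definition of the reliability of the sample mean for year j is the share of the variance of that mean that is due to true between-year differences (the symbols below are not from the thread):

```latex
\lambda_j = \frac{\tau^2}{\tau^2 + \sigma^2 / n_j}
```

Here \tau^2 is the between-year (L2) variance of the true annual means, \sigma^2 is the within-year (L1) variance, and n_j is the number of observations in year j. For a design-based mean, \sigma^2 / n_j would presumably be replaced by the design-based variance estimate of that mean. Under this reading, a reliability of 0.93 means that about 7% of the variance of an observed annual mean is sampling error.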

I'll be grateful for comments on estimating trends in means using sample means, provided a caveat is supplied for trends estimated where reliabilities fall below some arbitrary threshold. Also, if anyone has seen rules of thumb for reliabilities, I'd be grateful for those too.

Thanks.

All replies

农村固定观察点

2014-3-20 04:02:45

Not sure I completely understand your situation, but it sounds as though the simplest solution might be GEE, which would allow a different correlation structure within each year and would treat the within-year correlation structure as a nuisance. Alternatively, you could model the heteroscedasticity within years (Snijders & Bosker have a chapter on this), but it sounds like there isn't interest in that.

In survey research I've seen a guideline that estimates would be considered unreliable if the standard error exceeded 30% of the estimate. This was for NAMCS/NHAMCS in particular, but Wikipedia attributes the same guideline to the National Center for Health Statistics.
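
A minimal sketch of how such a 30% rule could be applied in SAS. The dataset `estimates` and its columns `est` and `se` are hypothetical names, not anything from the thread:

```sas
/* Hypothetical input: one row per estimate, with columns est and se. */
data flagged;
   set estimates;
   /* relative standard error; left missing when the estimate is zero */
   if est ne 0 then rse = se / abs(est);
   /* 1 if the SE exceeds 30% of the estimate, 0 if not, */
   /* missing if the relative standard error is undefined */
   if missing(rse) then unreliable = .;
   else unreliable = (rse > 0.30);
run;
```

A relative criterion like this behaves poorly when the estimate itself is close to zero, so it may not suit every water characteristic.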

Hope this helps.

Alan Ellis

农村固定观察点

2014-3-20 04:03:16

Thanks. I wasn't aware that I could let the correlation structure vary by subject (year) using GEEs. This might be an option, although not where selection probabilities varied. I would like to address heteroskedasticity in sampling variation but, given the other concerns, this may not be an option.

I should've provided more detail. The data represent measurements on each of multiple water characteristics, and derive from stratified random samples from multiple river impoundments. Selection probabilities varied by stratum. The impoundments were revisited annually, but the sampling units within the impoundments were not revisited. Hence, the repeated measurement is at the grouping (year) scale. Impoundments were revisited over a period of 20 years (one year missing).

Trend across years could be estimated using survey software, with year treated as the cluster. A disadvantage is that I don't know of a way to let L1 variance vary by year using survey software. On the other hand, as I mentioned yesterday, the L1 variance of the mean is typically trivial. The alternative method of estimating trends that I described yesterday would be to estimate trends over sample means (derived using the Horvitz-Thompson estimator when estimated over strata). This approach also suffers from treating L1 variance as constant across years (as zero, in this alternative). However, given the typically high reliabilities I mentioned yesterday, the use of sample means seems moderately defensible.
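
The means-based alternative described above could be sketched in SAS roughly as follows. This is only an illustration for one impoundment and one water characteristic. The names `dsname`, `stratum`, `sampwt`, `waterchar`, and a numeric `year` are assumptions, and the weighted mean from PROC SURVEYMEANS stands in for the Horvitz-Thompson-based mean:

```sas
/* Hypothetical names: dsname, stratum, sampwt, waterchar, year (numeric). */
proc sort data=dsname out=sorted;
   by year;
run;

/* Design-based (weighted, stratified) mean and SE for each year */
proc surveymeans data=sorted mean stderr;
   by year;
   strata stratum;
   weight sampwt;
   var waterchar;
   ods output Statistics=annual_means;
run;

/* Simple linear regression of the annual means on year; */
/* this step ignores the sampling error of each mean */
proc reg data=annual_means;
   model Mean = year;
run;
quit;
```

The `StdErr` column saved in `annual_means` could be used to check how small the within-year sampling variance is relative to the spread of the annual means.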

农村固定观察点

2014-3-20 04:05:03

I'd hate to see you throw away information by focusing only on the means (though, as you say, most of the time you wouldn't be throwing away much).  I think you could use GEE to address both the sampling issue and the repeated measures issue.  Assuming that you have a separate model for each water characteristic, and you assign inverse-probability-of-selection weights to deal with the differential sampling, I believe the code in SAS PROC GENMOD would look something like this:

proc genmod data=dsname;
   /* variables named in SUBJECT= must also appear in CLASS */
   class impound year;
   model waterchar = predictors / link=identity dist=normal;
   /* inverse-probability-of-selection weight */
   weight sampwt;
   /* year is nested within impound; within impound and year, */
   /* the correlation between any 2 sampling units is the same */
   repeated subject=year(impound) / type=exch;
run;

I think you'd need to include a categorical version of year to identify the repeated observations within impound and year, and a numeric version to predict the mean of the water characteristic.  Of course, I wouldn't want to lead you astray-- you're in the best position to check that the modeling assumptions make sense for your data.
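
A minimal sketch of the two versions of year, assuming `year` is stored as a numeric variable. `yearnum` and `dsname2` are hypothetical names, and any other predictors are omitted:

```sas
/* yearnum is a hypothetical name for the numeric copy of year */
data dsname2;
   set dsname;
   yearnum = year;   /* assumes year is stored as a numeric variable */
run;

proc genmod data=dsname2;
   /* the categorical year identifies the clusters */
   class impound year;
   /* the numeric copy carries the linear trend */
   model waterchar = yearnum / link=identity dist=normal;
   weight sampwt;
   repeated subject=year(impound) / type=exch;
run;
```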

Hope this helps.  I'd be interested in hearing how your analysis turns out.

Alan Ellis

农村固定观察点

2014-3-20 04:06:33

My understanding is that weighting for non-survey methods relates to variances (for GEEs, to a dispersion parameter) and that, for these methods, the use of sampling weights will be inappropriate. I'll be glad to be corrected.

I've since run a few simulations to compare trends estimated over clustered, normally-distributed data (with explicit adjustment for sampling weights and stratification) using survey regression software with those estimated using simple linear regression (SLR) over design-based means estimated from those same data. Mean reliabilities were high (~0.95). Mean trend/slope and SE(trend) estimates were effectively indistinguishable across methods. I suspect these results follow from the use of linear models with normal data.

Comments welcome.

农村固定观察点

2014-3-20 04:07:35

Hmm, maybe that's true. I read some related threads on other lists, and re-read the SAS manual for PROC GENMOD, and was reminded that (as you pointed out) the WEIGHT statement in GENMOD affects the dispersion parameter.
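
In generalized linear model notation (the symbols are not from the thread), the role of the weight can be written as:

```latex
\operatorname{Var}(Y_i) = \frac{\phi \, V(\mu_i)}{w_i}
```

Here \phi is the dispersion parameter, V(\mu_i) is the variance function, and w_i is the value supplied in the WEIGHT statement. For the normal distribution, V(\mu_i) = 1, so an observation with weight w_i is treated as having variance \phi / w_i. In other words, the weight is read as a statement about precision rather than about how many population units the observation represents.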

In epidemiology it is common to implement propensity score weighting with the WEIGHT and REPEATED statements in PROC GENMOD (i.e., a GEE model), and I have always seen propensity score weights as a type of sampling weight, but apparently not everyone agrees with this view. I would love to learn more. Maybe someone else can explain or provide some relevant citations? (But, failing that, some additional simulations might provide a relatively simple way to compare what happens when sampling weights are used in a GEE model vs. used in a survey regression model.)
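
For such a comparison, the survey-regression counterpart to the weighted GEE model discussed in this thread might look like the sketch below. It assumes data for a single impoundment and a numeric year variable, here given the hypothetical name `yearnum`, already in the dataset:

```sas
/* Survey regression with year as the cluster and the same sampling weight. */
/* yearnum is a hypothetical numeric year variable assumed to be in dsname. */
proc surveyreg data=dsname;
   cluster year;
   weight sampwt;
   /* how to declare the sampling strata depends on the design, */
   /* so no STRATA statement is shown here */
   model waterchar = yearnum;
run;
```

Fitting both models to the same simulated datasets and comparing the slope estimates and their standard errors would show whether the two treatments of the weights differ in practice.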

In case it helps, here are a few of the things I read:

http://grokbase.com/t/r/r-help/1 ... with-sample-weights

http://www.stata.com/statalist/archive/2009-07/msg00802.html

http://www.stata.com/support/faq ... timation-and-xtgee/

http://www.sas.com/offices/NA/ca ... terofSomeWeight.pdf

http://support.sas.com/documenta ... _genmod_sect034.htm
