Model Selection for Incomplete and Design-Based Samples: An Application to Cervix Cancer Screening
AbstractThe Akaike Information Criterion, AIC, is one of the most frequently used methods to select one or a few good, optimal regression models from a set of candidate models. In case the sample is incomplete, the naive use of this criterion on the so-called complete cases can lead to the selection of poor or inappropriate models. A similar problem occurs when a sample based on a design with unequal selection probabilities, is treated as a simple random sample. In this paper we consider a modification of AIC, based on reweighing the sample in analogy with the weighted Horvitz-Thompson estimates. It is shown that this weighted AIC-criterion provides better model choices for both incomplete and design-based samples. The use of the weighted AIC-criterion is illustrated on data from the Belgian Health Interview Survey, which motivated this research. Simulations show its performance in a variety of settings.
http://missingdata.lshtm.ac.uk/preprints/hens_et_al_v2.pdf