Maximum likelihood estimation for sample surveys
Chambers, R. L. (2012). Maximum likelihood estimation for sample surveys, CRC Press.
Contents
Preface xv
1 Introduction 1
1.1 Nature and role of sample surveys 1
1.2 Sample designs 3
1.3 Survey data, estimation and analysis 6
1.4 Why analysts of survey data should be interested in maximum likelihood estimation 8
1.5 Why statisticians should be interested in the analysis of survey data 9
1.6 A sample survey example 9
1.7 Maximum likelihood estimation for infinite populations 12
1.7.1 Data 12
1.7.2 Statistical models 13
1.7.3 Likelihood 14
1.7.4 Score and information functions 15
1.7.5 Maximum likelihood estimation 17
1.7.6 Hypothesis tests 19
1.7.7 Confidence intervals 20
1.7.8 Sufficient and ancillary statistics 20
1.8 Bibliographic notes 21
2 Maximum likelihood theory for sample surveys 25
2.1 Introduction 25
2.2 Maximum likelihood using survey data 26
2.2.1 Basic concepts 26
2.2.2 The missing information principle 30
2.3 Illustrative examples with complete response 33
2.3.1 Estimation of a Gaussian mean: Noninformative selection 33
2.3.2 Estimation of an exponential mean: Cutoff sampling 37
2.3.3 Estimation of an exponential mean: Size-biased sampling 38
2.4 Dealing with nonresponse 39
2.4.1 The score and information functions under nonresponse 40
2.4.2 Noninformative nonresponse 41
2.5 Illustrative examples with nonresponse 42
2.5.1 Estimation of a Gaussian mean under noninformative nonresponse: Noninformative selection 42
2.5.2 Estimation of a Gaussian mean under noninformative item nonresponse: Noninformative selection 43
2.5.3 Estimation of a Gaussian mean under informative unit nonresponse: Noninformative selection 47
2.5.4 Estimation of an exponential mean under informative nonresponse: Cutoff sampling 49
2.6 Bibliographic notes 51
3 Alternative likelihood-based methods for sample survey data 55
3.1 Introduction 55
3.1.1 Design-based analysis for population totals 56
3.2 Pseudo-likelihood 60
3.2.1 Maximum pseudo-likelihood estimation 60
3.2.2 Pseudo-likelihood for an exponential mean under size-biased sampling 62
3.2.3 Pseudo-likelihood for an exponential mean under cutoff sampling 63
3.3 Sample likelihood 64
3.3.1 Maximum sample likelihood for an exponential mean under size-biased sampling 66
3.3.2 Maximum sample likelihood for an exponential mean under cutoff sampling 70
3.4 Analytic comparisons of maximum likelihood, pseudo-likelihood and sample likelihood estimation 72
3.5 The role of sample inclusion probabilities in analytic analysis 75
3.6 Bayesian analysis 83
3.7 Bibliographic notes 85
4 Populations with independent units 89
4.1 Introduction 89
4.2 The score and information functions for independent units 89
4.3 Bivariate Gaussian populations 91
4.4 Multivariate Gaussian populations 96
4.5 Non-Gaussian auxiliary variables 104
4.5.1 Modeling the conditional distribution of the survey variable 109
4.5.2 Modeling the marginal distribution of the auxiliary variable 111
4.5.3 Maximum likelihood analysis for µ and σ² 115
4.5.4 Fitting the auxiliary variable distribution via method of moments 117
4.5.5 Semiparametric estimation 121
4.6 Stratified populations 122
4.7 Multinomial populations 126
4.8 Heterogeneous multinomial logistic populations 135
4.9 Bibliographic notes 144
5 Regression models 145
5.1 Introduction 145
5.2 A Gaussian example 148
5.3 Parameterization in the Gaussian model 152
5.4 Other methods of estimation 154
5.5 Non-Gaussian models 157
5.6 Different auxiliary variable distributions 158
5.6.1 The folded Gaussian model for the auxiliary variable 159
5.6.2 Regression in stratified populations 160
5.7 Generalized linear models 164
5.7.1 Binary regression 165
5.7.2 Generalized linear regression 166
5.8 Semiparametric and nonparametric methods 168
5.9 Bibliographic notes 170
6 Clustered populations 173
6.1 Introduction 173
6.2 A Gaussian group dependent model 178
6.2.1 Auxiliary information at the unit level 178
6.2.2 Auxiliary information at the cluster level 187
6.2.3 No auxiliary information 191
6.3 A Gaussian group dependent regression model 193
6.4 Extending the Gaussian group dependent regression model 201
6.5 Binary group dependent models 203
6.6 Grouping models 207
6.7 Bibliographic notes 214
7 Informative nonresponse 217
7.1 Introduction 217
7.2 Nonresponse in innovation surveys 223
7.2.1 The mixture approach 224
7.2.2 The mixture approach with an additional variable 228
7.2.3 The mixture approach with a follow up survey 233
7.2.4 The selection approach 237
7.3 Regression with item nonresponse 242
7.3.1 Item nonresponse in y 248
7.3.2 Item nonresponse in x 250
7.3.3 Selection models for item nonresponse in y 254
7.4 Regression with arbitrary nonresponse 267
7.4.1 Calculations for s₀₁ 280
7.4.2 Calculations for s₁₀ 281
7.4.3 Calculations for s₀₀ 284
7.5 Imputation versus estimation 290
7.6 Bibliographic notes 295
8 Maximum likelihood in other complicated situations 299
8.1 Introduction 299
8.2 Likelihood analysis under informative selection 301
8.2.1 When is selection informative? 301
8.2.2 Maximum likelihood under informative Hartley–Rao sampling 302
8.2.3 Maximum sample likelihood under informative Hartley–Rao sampling 306
8.2.4 An extension to the case with auxiliary variables 309
8.2.5 Informative stratification 310
8.3 Secondary analysis of sample survey data 316
8.3.1 Data structure in secondary analysis 316
8.3.2 Approximate maximum likelihood with partial information 317
8.4 Combining summary population information with likelihood analysis 321
8.4.1 Summary population information 321
8.4.2 Linear regression with summary population information 323
8.4.3 Logistic regression with summary population information 329
8.4.4 Smearing and saddlepoint approximations under case-control sampling 333
8.4.5 Variance estimation 336
8.4.6 A derivation of the saddlepoint approximation in Subsection 8.4.3 339
8.5 Likelihood analysis with probabilistically linked data 341
8.5.1 A model for probabilistic linkage 342
8.5.2 Linear regression with population-linked data 344
8.5.3 Linear regression with sample-linked data 348
8.6 Bibliographic notes 350
Notation 353
Author Index 357
Example Index 361
Subject Index 365