How This Book Is Organized
x CONTENTS
3 Reformulating Ordinary Regression Analysis in Matrix Notation 30
3.1 Writing the Ordinary Regression Equation in Matrix Notation 31
3.1.1 Example 32
3.2 Obtaining the Least-Squares Estimator β̂ in Matrix Notation 33
3.2.1 Example: Matrices in Regression Analysis 34
3.3 List of Matrix Operations to Know 36
4 Variance Matrices and Linear Transformations 38
4.1 Variance and Correlation Matrices 38
4.1.1 Example 40
4.2 How to Obtain the Variance of a Linear Transformation 40
4.2.1 Two Variables 40
4.2.2 Many Variables 42
5 Variance Matrices of Estimators of Regression Coefficients 51
5.1 Usual Standard Error of Least-Squares Estimator of Regression
Slope in Nonmatrix Formulation 51
5.2 Standard Errors of Least-Squares Regression Estimators in Matrix
Notation 52
5.2.1 Example 53
5.3 The Large Sample Variance Matrix
of Maximum Likelihood Estimators 54
5.4 Tests and Confidence Intervals 56
5.4.1 Example: Comparing PROC REG and PROC MIXED 57
6 Dealing with Unequal Variance Around the Regression Line 62
6.1 Ordinary Least Squares with Unequal Variance 62
6.1.1 Examples 64
6.2 Analysis Taking Unequal Variance into Account 66
6.2.1 The Functional Transformation Approach 66
6.2.2 The Linear Transformation Approach 68
6.2.3 Standard Errors of Weighted Regression Estimators 73
Output Packet III: Applying the Empirical Option to Adjust
Standard Errors 75
Output Packet IV: Analyses with Transformation of the Outcome Variable
to Equalize Residual Variance 83
Output Packet V: Weighted Regression Analyses of GHb
Data on Age 93
7 Application of Weighting with Probability Sampling
and Nonresponse 97
7.1 Sample Surveys with Unequal Probability Sampling 98
7.1.1 Example 101
7.2 Examining the Impact of Nonresponse 102
7.2.1 Example (of Reweighting as Well
as Some SAS Manipulations) 104
7.2.2 A Few Comments on Weighting by a Variable Versus
Including it in the Regression Model 107
Output Packet VI: Survey and Missing Data Weights 109
8 Principles in Dealing with Correlated Data 119
8.1 Analysis of Correlated Data by Ordinary Unweighted
Least-Squares Estimation 120
8.1.1 Example 121
8.1.2 Deriving the Variance Estimator 122
8.1.3 Example 124
8.2 Specifying Correlation and Variance Matrices 124
8.3 The Least-Squares Equation Incorporating Correlation 126
8.3.1 Another Application of the Spectral Theorem 127
8.4 Applying the Spectral Theorem to the Regression Analysis
of Correlated Data 128
8.5 Analysis of Correlated Data by Maximum Likelihood 129
8.5.1 Nonequal Variance 130
8.5.2 Correlated Errors 131
8.5.3 Example 132
Output Packet VII: Analysis of Longitudinal Data in Wisconsin
Sleep Cohort 135
9 A Further Study of How the Transformation Works
with Correlated Data 145
9.1 Why Would β_W and β_B Differ? 147
9.2 How the Between- and Within-Individual Estimators are Combined 149
9.3 How to Proceed in Practice 151
9.3.1 Example 152
Output Packet VIII: Investigating and Fitting Within- and
Between-Individual Effects 154
10 Random Effects 156
10.1 Random Intercept 156
10.1.1 Example 159
10.1.2 Example 161
10.2 Random Slopes 161
10.2.1 Example 165
10.3 Obtaining “The Best” Estimates of Individual Intercepts
and Slopes 167
10.3.1 Example 167
Output Packet IX: Fitting Random Effects Models 169
11 The Normal Distribution and Likelihood Revisited 181
11.1 PROC GENMOD 182
11.1.1 Example 183
Output Packet X: Introducing PROC GENMOD 184
12 The Generalization to Non-normal Distributions 190
12.1 The Exponential Family 190
12.1.1 The Binomial Distribution 192
12.1.2 The Poisson Distribution 193
12.1.3 Example 194
12.2 Score Equations for the Exponential Family
and the Canonical Link 194
12.3 Other Link Functions 196
12.3.1 Example 197
13 Modeling Binomial and Binary Outcomes 199
13.1 A Brief Review of Logistic Regression 199
13.1.1 Example: Review of the Output from PROC LOGIST 200
13.2 Analysis of Binomial Data in the Generalized Linear
Models Framework 202
13.2.1 Example of Logistic Regression with Binary Outcome 206
13.2.2 Example with Binomial Outcome 207
13.2.3 Some More Examples of Goodness-of-Fit Tests 209
13.3 Other Links for Binary and Binomial Data 209
13.3.1 Example 211
Output Packet XI: Logistic Regression Analysis with PROC LOGIST
and PROC GENMOD 212
Output Packet XII: Analysis of Grouped Binomial Data 221
Output Packet XIII: Some Goodness-of-Fit Tests for Binomial Outcome 223
Output Packet XIV: Three Link Functions for Binary Outcome 229
14 Modeling Poisson Outcomes: The Analysis of Rates 236
14.1 Review of Rates 236
14.1.1 Relationship Between Rate and Risk 238
14.2 Regression Analysis 239
14.3 Example with Cancer Mortality Rates 241
14.3.1 Example with Hospitalization of Infants 242
14.4 Overdispersion 243
14.4.1 Fitting a Dispersion Parameter 244
14.4.2 Fitting a Different Distribution 245
14.4.3 Using Robust Standard Errors 245
14.4.4 Applying Adjustments for Overdispersion
to the Examples 246
Output Packet XV: Poisson Regression 247
Output Packet XVI: Dealing with Overdispersion in Rates 254
15 Modeling Correlated Outcomes with Generalized
Estimating Equations 263
15.1 A Brief Review and Reformulation of the Normal Distribution,
Least Squares and Likelihood 263
15.2 Further Developments for the Exponential Family 264
15.3 How are the Generalized Estimating Equations Justified? 266
15.3.1 Analysis of Longitudinal Systolic Blood Pressure
by PROC MIXED and GENMOD 267
15.3.2 Analysis of Longitudinal Hypertension Data
by PROC GENMOD 269
15.3.3 Analysis of Hospitalizations Among VLBW Children
Up to Age 5 271
15.4 Another Way to Deal with Correlated Binary Data 273
Output Packet XVII: Mixed Versus GENMOD for Longitudinal SBP
and Hypertension Data 274
Output Packet XVIII: Longitudinal Analysis of Rates 285
Output Packet XIX: Conditional Logistic Regression
of Hypertension Data 288
References 290
Appendix: Matrix Operations 295
A.1 Adding Matrices 296
A.2 Multiplying Matrices by a Number 297
A.3 Multiplying Matrices by Each Other 297
A.4 The Inverse of a Matrix 299
Index