This text is intended for a broad audience, both as an introduction to predictive models and as a guide to applying them. It provides intuitive explanations of the techniques, while its emphasis on problem-solving with real data across a wide variety of applications will aid practitioners who wish to extend their expertise. Readers should have knowledge of basic statistical ideas, such as correlation and linear regression analysis.
Table of Contents
Preface
Chapter 1 Introduction
Prediction Versus Interpretation; Key Ingredients of Predictive Models; Terminology; Example Data Sets and Typical Data Scenarios; Overview; Notation (15 pages, 3 figures)
Part I: General Strategies
Chapter 2 A Short Tour of the Predictive Modeling Process
Case Study: Predicting Fuel Economy; Themes; Summary (8 pages, 6 figures, R packages used)
Chapter 3 Data Pre-Processing
Case Study: Cell Segmentation in High-Content Screening; Data Transformations for Individual Predictors; Data Transformations for Multiple Predictors; Dealing with Missing Values; Removing Variables; Adding Variables; Binning Variables; Computing; Exercises (32 pages, 11 figures, R packages used)
Chapter 4 Over-Fitting and Model Tuning
The Problem of Over-Fitting; Model Tuning; Data Splitting; Resampling Techniques; Case Study: Credit Scoring; Choosing Final Tuning Parameters; Data Splitting Recommendations; Choosing Between Models; Computing; Exercises (29 pages, 13 figures, R packages used)
Part II: Regression Models
Chapter 5 Measuring Performance in Regression Models
Quantitative Measures of Performance; The Variance-Bias Tradeoff; Computing (4 pages, 3 figures)
Chapter 6 Linear Regression and Its Cousins
Case Study: Quantitative Structure-Activity Relationship Modeling; Linear Regression; Partial Least Squares; Penalized Models; Computing; Exercises (37 pages, 20 figures, R packages used)
Chapter 7 Non-Linear Regression Models
Neural Networks; Multivariate Adaptive Regression Splines; Support Vector Machines; K-Nearest Neighbors; Computing; Exercises (28 pages, 10 figures, R packages used)
Chapter 8 Regression Trees and Rule-Based Models
Basic Regression Trees; Regression Model Trees; Rule-Based Models; Bagged Trees; Random Forests; Boosting; Cubist; Computing; Exercises (46 pages, 24 figures, R packages used)
Chapter 9 A Summary of Solubility Models
(3 pages, 3 figures)
Chapter 10 Case Study: Compressive Strength of Concrete Mixtures
Model Building Strategy; Model Performance; Optimizing Compressive Strength; Computing (12 pages, 5 figures, R packages used)
Part III: Classification Models
Chapter 11 Measuring Performance in Classification Models
Class Predictions; Evaluating Predicted Classes; Evaluating Class Probabilities; Computing (20 pages, 9 figures, R packages used)
Chapter 12 Discriminant Analysis and Other Linear Classification Models
Case Study; Logistic Regression; Linear Discriminant Analysis; Partial Least Squares Discriminant Analysis; Penalized Models; Nearest Shrunken Centroids; Computing; Exercises (52 pages, 20 figures, R packages used)
Chapter 13 Non-Linear Classification Models
Nonlinear Discriminant Analysis; Neural Networks; Flexible Discriminant Analysis; Support Vector Machines; K-Nearest Neighbors; Naive Bayes; Computing; Exercises (38 pages, 16 figures, R packages used)
Chapter 14 Classification Trees and Rule-Based Models
Basic Classification Trees; Rule-Based Models; Bagged Trees; Random Forests; Boosting; C5.0; Wrap-Up; Computing (46 pages, 15 figures, R packages used)
Chapter 15 A Summary of Grant Application Models
(3 pages, 2 figures)
Chapter 16 Remedies for Severe Class Imbalance
Case Study: Predicting Caravan Policy Ownership; The Effect of Class Imbalance; Model Tuning; Alternate Cutoffs; Adjusting Prior Probabilities; Unequal Case Weights; Sampling Methods; Cost-Sensitive Training; Computing; Exercises (24 pages, 7 figures, R packages used)
Chapter 17 Case Study: Job Scheduling
Data Splitting and Model Strategy; Results; Computing (13 pages, 6 figures, R packages used)
Part IV: Other Considerations
Chapter 18 Measuring Predictor Importance
Numeric Outcomes; Categorical Outcomes; Other Approaches; Computing; Exercises (24 pages, 10 figures, R packages used)
Chapter 19 An Introduction to Feature Selection
Consequences of Using Non-Informative Predictors; Approaches for Reducing the Number of Predictors; Wrapper Methods; Filter Methods; Selection Bias; Misuse of Feature Selection; Case Study: Predicting Cognitive Impairment; Computing; Exercises (34 pages, 7 figures, R packages used)
Chapter 20 Factors That Can Affect Model Performance
Type III Errors; Measurement Error in the Outcome; Measurement Error in the Predictors; Discretizing Continuous Outcomes; When Should You Trust Your Model’s Prediction?; The Impact of a Large Sample; Computing; Exercises (26 pages, 12 figures, R packages used)
Appendix
These are included in the sample pages on Springer's website.
Appendix A A Summary of Various Models
Appendix B An Introduction to R
Startup and Getting Help; Packages; Creating Objects; Data Types and Basic Structures; Working with Rectangular Data Sets; Objects and Classes; R Functions; The Three Faces of =; The AppliedPredictiveModeling Package; The caret Package; Software Used in This Text (16 pages, 1 figure, R packages used)
Appendix C Interesting Websites
References
Index