Contents
Series Foreword xi
Preface xiii
Symbols and Notation xvii
1 Introduction 1
1.1 A Pictorial Introduction to Bayesian Modelling 3
1.2 Roadmap 5
2 Regression 7
2.1 Weight-space View 7
2.1.1 The Standard Linear Model 8
2.1.2 Projections of Inputs into Feature Space 11
2.2 Function-space View 13
2.3 Varying the Hyperparameters 19
2.4 Decision Theory for Regression 21
2.5 An Example Application 22
2.6 Smoothing, Weight Functions and Equivalent Kernels 24
2.7 Incorporating Explicit Basis Functions 27
2.7.1 Marginal Likelihood 29
2.8 History and Related Work 29
2.9 Exercises 30
3 Classification 33
3.1 Classification Problems 34
3.1.1 Decision Theory for Classification 35
3.2 Linear Models for Classification 37
3.3 Gaussian Process Classification 39
3.4 The Laplace Approximation for the Binary GP Classifier 41
3.4.1 Posterior 42
3.4.2 Predictions 44
3.4.3 Implementation 45
3.4.4 Marginal Likelihood 47
3.5 Multi-class Laplace Approximation 48
3.5.1 Implementation 51
3.6 Expectation Propagation 52
3.6.1 Predictions 56
3.6.2 Marginal Likelihood 57
3.6.3 Implementation 57
3.7 Experiments 60
3.7.1 A Toy Problem 60
3.7.2 One-dimensional Example 62
3.7.3 Binary Handwritten Digit Classification Example 63
3.7.4 10-class Handwritten Digit Classification Example 70
3.8 Discussion 72
3.9 Appendix: Moment Derivations 74
3.10 Exercises 75
4 Covariance Functions 79
4.1 Preliminaries 79
4.1.1 Mean Square Continuity and Differentiability 81
4.2 Examples of Covariance Functions 81
4.2.1 Stationary Covariance Functions 82
4.2.2 Dot Product Covariance Functions 89
4.2.3 Other Non-stationary Covariance Functions 90
4.2.4 Making New Kernels from Old 94
4.3 Eigenfunction Analysis of Kernels 96
4.3.1 An Analytic Example 97
4.3.2 Numerical Approximation of Eigenfunctions 98
4.4 Kernels for Non-vectorial Inputs 99
4.4.1 String Kernels 100
4.4.2 Fisher Kernels 101
4.5 Exercises 102
5 Model Selection and Adaptation of Hyperparameters 105
5.1 The Model Selection Problem 106
5.2 Bayesian Model Selection 108
5.3 Cross-validation 111
5.4 Model Selection for GP Regression 112
5.4.1 Marginal Likelihood 112
5.4.2 Cross-validation 116
5.4.3 Examples and Discussion 118
5.5 Model Selection for GP Classification 124
5.5.1 Derivatives of the Marginal Likelihood for Laplace’s Approximation 125
5.5.2 Derivatives of the Marginal Likelihood for EP 127
5.5.3 Cross-validation 127
5.5.4 Example 128
5.6 Exercises 128
6 Relationships between GPs and Other Models 129
6.1 Reproducing Kernel Hilbert Spaces 129
6.2 Regularization 132
6.2.1 Regularization Defined by Differential Operators 133
6.2.2 Obtaining the Regularized Solution 135
6.2.3 The Relationship of the Regularization View to Gaussian Process Prediction 135
6.3 Spline Models 136
6.3.1 A 1-d Gaussian Process Spline Construction 138
6.4 Support Vector Machines 141
6.4.1 Support Vector Classification 141
6.4.2 Support Vector Regression 145
6.5 Least-squares Classification 146
6.5.1 Probabilistic Least-squares Classification 147
6.6 Relevance Vector Machines 149
6.7 Exercises 150