2016-11-29
Machine Learning, Neural and Statistical Classification

Contents
1 Introduction 1
1.1 INTRODUCTION .......................................................... 1
1.2 CLASSIFICATION ........................................................ 1
1.3 PERSPECTIVES ON CLASSIFICATION .................................. 2
1.3.1 Statistical approaches .............................................. 2
1.3.2 Machine learning .................................................. 2
1.3.3 Neural networks .................................................... 3
1.3.4 Conclusions ........................................................ 3
1.4 THE STATLOG PROJECT ................................................ 4
1.4.1 Quality control ...................................................... 4
1.4.2 Caution in the interpretations of comparisons ...................... 4
1.5 THE STRUCTURE OF THIS VOLUME .................................. 5
2 Classification 6
2.1 DEFINITION OF CLASSIFICATION ...................................... 6
2.1.1 Rationale ............................................................ 6
2.1.2 Issues .............................................................. 7
2.1.3 Class definitions .................................................... 8
2.1.4 Accuracy ............................................................ 8
2.2 EXAMPLES OF CLASSIFIERS ............................................ 8
2.2.1 Fisher’s linear discriminants ........................................ 9
2.2.2 Decision tree and Rule-based methods .............................. 9
2.2.3 k-Nearest-Neighbour ................................................ 10
2.3 CHOICE OF VARIABLES .................................................. 11
2.3.1 Transformations and combinations of variables .................... 11
2.4 CLASSIFICATION OF CLASSIFICATION PROCEDURES .............. 12
2.4.1 Extensions to linear discrimination .................................. 12
2.4.2 Decision trees and Rule-based methods ............................ 12
2.4.3 Density estimates .................................................. 12
2.5 A GENERAL STRUCTURE FOR CLASSIFICATION PROBLEMS ...... 12
2.5.1 Prior probabilities and the Default rule .............................. 13
2.5.2 Separating classes .................................................. 13
2.5.3 Misclassification costs .............................................. 13
2.6 BAYES RULE GIVEN DATA ............................................ 14
2.6.1 Bayes rule in statistics .............................................. 15
2.7 REFERENCE TEXTS ...................................................... 16
3 Classical Statistical Methods 17
3.1 INTRODUCTION .......................................................... 17
3.2 LINEAR DISCRIMINANTS ................................................ 17
3.2.1 Linear discriminants by least squares .............................. 18
3.2.2 Special case of two classes .......................................... 20
3.2.3 Linear discriminants by maximum likelihood ...................... 20
3.2.4 More than two classes .............................................. 21
3.3 QUADRATIC DISCRIMINANT ............................................ 22
3.3.1 Quadratic discriminant - programming details ...................... 22
3.3.2 Regularisation and smoothed estimates ............................ 23
3.3.3 Choice of regularisation parameters ................................ 23
3.4 LOGISTIC DISCRIMINANT .............................................. 24
3.4.1 Logistic discriminant - programming details ........................ 25
3.5 BAYES’ RULES ............................................................ 27
3.6 EXAMPLE .................................................................. 27
3.6.1 Linear discriminant ................................................ 27
3.6.2 Logistic discriminant ................................................ 27
3.6.3 Quadratic discriminant .............................................. 27
4 Modern Statistical Techniques 29
4.1 INTRODUCTION .......................................................... 29
4.2 DENSITY ESTIMATION .................................................. 30
4.2.1 Example ............................................................ 33
4.3 k-NEAREST NEIGHBOUR .............................................. 35
4.3.1 Example ............................................................ 36
4.4 PROJECTION PURSUIT CLASSIFICATION .............................. 37
4.4.1 Example ............................................................ 39
4.5 NAIVE BAYES ............................................................ 40
4.6 CAUSAL NETWORKS .................................................... 41
4.6.1 Example ............................................................ 45
4.7 OTHER RECENT APPROACHES .......................................... 46
4.7.1 ACE ................................................................ 46
4.7.2 MARS .............................................................. 47
5 Machine Learning of Rules and Trees 50
5.1 RULES AND TREES FROM DATA: FIRST PRINCIPLES ................ 50
5.1.1 Data fit and mental fit of classifiers .................................. 50
5.1.2 Specific-to-general: a paradigm for rule-learning .................. 54
5.1.3 Decision trees ...................................................... 56
5.1.4 General-to-specific: top-down induction of trees .................... 57
5.1.5 Stopping rules and class probability trees .......................... 61
5.1.6 Splitting criteria .................................................... 61
5.1.7 Getting a “right-sized tree” .......................................... 63
5.2 STATLOG’S ML ALGORITHMS .......................................... 65
5.2.1 Tree-learning: further features of C4.5 .............................. 65
5.2.2 NewID .............................................................. 65
5.2.3 AC2 ................................................................ 67
5.2.4 Further features of CART .......................................... 68
5.2.5 Cal5 ................................................................ 70
5.2.6 Bayes tree .......................................................... 73
5.2.7 Rule-learning algorithms: CN2 .................................... 73
5.2.8 ITrule .............................................................. 77
5.3 BEYOND THE COMPLEXITY BARRIER ................................ 79
5.3.1 Trees into rules ...................................................... 79
5.3.2 Manufacturing new attributes ...................................... 80
5.3.3 Inherent limits of propositional-level learning ...................... 81
5.3.4 A human-machine compromise: structured induction .............. 83
6 Neural Networks 84
6.1 INTRODUCTION .......................................................... 84
6.2 SUPERVISED NETWORKS FOR CLASSIFICATION .................... 86
6.2.1 Perceptrons and Multi Layer Perceptrons .......................... 86
6.2.2 Multi Layer Perceptron structure and functionality .................. 87
6.2.3 Radial Basis Function networks .................................... 93
6.2.4 Improving the generalisation of Feed-Forward networks ............ 96
6.3 UNSUPERVISED LEARNING ............................................101
6.3.1 The K-means clustering algorithm ..................................101
6.3.2 Kohonen networks and Learning Vector Quantizers ................102
6.3.3 RAMnets ..........................................................103
6.4 DIPOL92 ....................................................................103
6.4.1 Introduction ........................................................104
6.4.2 Pairwise linear regression ..........................................104
6.4.3 Learning procedure ................................................104
6.4.4 Clustering of classes ................................................105
6.4.5 Description of the classification procedure ..........................105
7 Methods for Comparison 107
7.1 ESTIMATION OF ERROR RATES IN CLASSIFICATION RULES ........107
7.1.1 Train-and-Test ......................................................108
7.1.2 Cross-validation ....................................................108
7.1.3 Bootstrap ..........................................................108
7.1.4 Optimisation of parameters ........................................109
7.2 ORGANISATION OF COMPARATIVE TRIALS ..........................110
7.2.1 Cross-validation ....................................................111
7.2.2 Bootstrap ..........................................................111
7.2.3 Evaluation Assistant ................................................111
7.3 CHARACTERISATION OF DATASETS ..................................112
7.3.1 Simple measures ....................................................112
7.3.2 Statistical measures ................................................112
7.3.3 Information theoretic measures ....................................116
7.4 PRE-PROCESSING ........................................................120
7.4.1 Missing values ......................................................120
7.4.2 Feature selection and extraction ....................................120
7.4.3 Large number of categories ........................................121
7.4.4 Bias in class proportions ............................................122
7.4.5 Hierarchical attributes ..............................................123
7.4.6 Collection of datasets ..............................................124
7.4.7 Preprocessing strategy in StatLog ..................................124
8 Review of Previous Empirical Comparisons 125
8.1 INTRODUCTION ..........................................................125
8.2 BASIC TOOLBOX OF ALGORITHMS ....................................125
8.3 DIFFICULTIES IN PREVIOUS STUDIES ................................126
8.4 PREVIOUS EMPIRICAL COMPARISONS ................................127
8.5 INDIVIDUAL RESULTS ..................................................127
8.6 MACHINE LEARNING vs. NEURAL NETWORK ........................127
8.7 STUDIES INVOLVING ML, k-NN AND STATISTICS ....................129
8.8 SOME EMPIRICAL STUDIES RELATING TO CREDIT RISK ..........129
8.8.1 Traditional and statistical approaches ..............................129
8.8.2 Machine Learning and Neural Networks ............................130
9 Dataset Descriptions and Results 131
9.1 INTRODUCTION ..........................................................131
9.2 CREDIT DATASETS ......................................................132
9.2.1 Credit management (Cred.Man) ....................................132
9.2.2 Australian credit (Cr.Aust) ..........................................134
9.3 IMAGE DATASETS ........................................................135
9.3.1 Handwritten digits (Dig44) ..........................................135
9.3.2 Karhunen-Loeve digits (KL) ........................................137
9.3.3 Vehicle silhouettes (Vehicle) ........................................138
9.3.4 Letter recognition (Letter) ..........................................140
9.3.5 Chromosomes (Chrom) ............................................142
9.3.6 Landsat satellite image (SatIm) ....................................143
9.3.7 Image segmentation (Segm) ........................................145
9.3.8 Cut ..................................................................146
9.4 DATASETS WITH COSTS ................................................149
9.4.1 Head injury (Head) ..................................................149
9.4.2 Heart disease (Heart) ................................................152
9.4.3 German credit (Cr.Ger) ............................................153
9.5 OTHER DATASETS ........................................................154
9.5.1 Shuttle control (Shuttle) ............................................154
9.5.2 Diabetes (Diab) ....................................................157
9.5.3 DNA ................................................................158
9.5.4 Technical (Tech) ....................................................161
9.5.5 Belgian power (Belg) ..............................................163
9.5.6 Belgian power II (BelgII) ..........................................164
9.5.7 Machine faults (Faults) ..............................................165
9.5.8 Tsetse fly distribution (Tsetse) ......................................167
9.6 STATISTICAL AND INFORMATION MEASURES ......................169
9.6.1 KL-digits dataset ....................................................170
9.6.2 Vehicle silhouettes ..................................................170
9.6.3 Head injury ........................................................173
9.6.4 Heart disease ........................................................173
9.6.5 Satellite image dataset ..............................................173
9.6.6 Shuttle control ......................................................173
9.6.7 Technical ............................................................174
9.6.8 Belgian power II ....................................................174
10 Analysis of Results 175
10.1 INTRODUCTION ..........................................................175
10.2 RESULTS BY SUBJECT AREAS ..........................................176
10.2.1 Credit datasets ......................................................176
10.2.2 Image datasets ......................................................179
10.2.3 Datasets with costs ..................................................183
10.2.4 Other datasets ......................................................184
10.3 TOP FIVE ALGORITHMS ................................................185
10.3.1 Dominators ........................................................186
10.4 MULTIDIMENSIONAL SCALING ........................................187
10.4.1 Scaling of algorithms ..............................................188
10.4.2 Hierarchical clustering of algorithms ................................189
10.4.3 Scaling of datasets ..................................................190
10.4.4 Best algorithms for datasets ........................................191
10.4.5 Clustering of datasets ..............................................192
10.5 PERFORMANCE RELATED TO MEASURES: THEORETICAL ........192
10.5.1 Normal distributions ................................................192
10.5.2 Absolute performance: quadratic discriminants ....................193
10.5.3 Relative performance: Logdisc vs. DIPOL92 ......................193
10.5.4 Pruning of decision trees ............................................194
10.6 RULE BASED ADVICE ON ALGORITHM APPLICATION ..............197
10.6.1 Objectives ..........................................................197
10.6.2 Using test results in metalevel learning ..............................198
10.6.3 Characterizing predictive power ....................................202
10.6.4 Rules generated in metalevel learning ..............................205
10.6.5 Application Assistant ..............................................207
10.6.6 Criticism of metalevel learning approach ..........................209
10.6.7 Criticism of measures ..............................................209
10.7 PREDICTION OF PERFORMANCE ......................................210
10.7.1 ML on ML vs. regression ..........................................211
11 Conclusions 213
11.1 INTRODUCTION ..........................................................213
11.1.1 User’s guide to programs ............................................214
11.2 STATISTICAL ALGORITHMS ............................................214
11.2.1 Discriminants ......................................................214
11.2.2 ALLOC80 ..........................................................214
11.2.3 Nearest Neighbour ..................................................216
11.2.4 SMART ............................................................216
11.2.5 Naive Bayes ........................................................216
11.2.6 CASTLE ............................................................217
11.3 DECISION TREES ........................................................217
11.3.1 AC2 and NewID ....................................................218
11.3.2 C4.5 ................................................................219
11.3.3 CART and IndCART ................................................219
11.3.4 Cal5 ................................................................219
11.3.5 Bayes Tree ..........................................................220
11.4 RULE-BASED METHODS ................................................220
11.4.1 CN2 ................................................................220
11.4.2 ITrule ..............................................................220
11.5 NEURAL NETWORKS ....................................................221
11.5.1 Backprop ............................................................221
11.5.2 Kohonen and LVQ ..................................................222
11.5.3 Radial basis function neural network ................................223
11.5.4 DIPOL92 ..........................................................223
11.6 MEMORY AND TIME ....................................................223
11.6.1 Memory ............................................................223
11.6.2 Time ................................................................224
11.7 GENERAL ISSUES ........................................................224
11.7.1 Cost matrices ......................................................224
11.7.2 Interpretation of error rates ..........................................225
11.7.3 Structuring the results ..............................................225
11.7.4 Removal of irrelevant attributes ....................................226
11.7.5 Diagnostics and plotting ............................................226
11.7.6 Exploratory data ....................................................226
11.7.7 Special features ....................................................227
11.7.8 From classification to knowledge organisation and synthesis ........227
12 Knowledge Representation 228
12.1 INTRODUCTION ..........................................................228
12.2 LEARNING, MEASUREMENT AND REPRESENTATION ..............229
12.3 PROTOTYPES ..............................................................230
12.3.1 Experiment 1 ........................................................230
12.3.2 Experiment 2 ........................................................231
12.3.3 Experiment 3 ........................................................231
12.3.4 Discussion ..........................................................231
12.4 FUNCTION APPROXIMATION ..........................................232
12.4.1 Discussion ..........................................................234
12.5 GENETIC ALGORITHMS ................................................234
12.6 PROPOSITIONAL LEARNING SYSTEMS ................................237
12.6.1 Discussion ..........................................................239
12.7 RELATIONS AND BACKGROUND KNOWLEDGE ......................241
12.7.1 Discussion ..........................................................244
12.8 CONCLUSIONS ............................................................245
13 Learning to Control Dynamic Systems 246
13.1 INTRODUCTION ..........................................................246
13.2 EXPERIMENTAL DOMAIN ..............................................248
13.3 LEARNING TO CONTROL FROM SCRATCH: BOXES ..................250
13.3.1 BOXES ............................................................250
13.3.2 Refinements of BOXES ............................................252
13.4 LEARNING TO CONTROL FROM SCRATCH: GENETIC LEARNING ..252
13.4.1 Robustness and adaptation ..........................................254
13.5 EXPLOITING PARTIAL EXPLICIT KNOWLEDGE ......................255
13.5.1 BOXES with partial knowledge ....................................255
13.5.2 Exploiting domain knowledge in genetic learning of control ........256
13.6 EXPLOITING OPERATOR’S SKILL ......................................256
13.6.1 Learning to pilot a plane ............................................256
13.6.2 Learning to control container cranes ................................258
13.7 CONCLUSIONS ............................................................261
A Dataset availability ..........................................................262
B Software sources and details ................................................262
C Contributors ................................................................265

Attachment list

All replies
2016-11-29 12:42:39
Published in 1994.

2016-11-29 21:42:44
Thanks for sharing.

2016-11-29 22:19:57
Thanks for sharing.
