PREFACE ix
1. CLUSTER ANALYSIS 1
1.1. Classification and Clustering / 1
1.2. Definition of Clusters / 3
1.3. Clustering Applications / 8
1.4. Literature of Clustering Algorithms / 9
1.5. Outline of the Book / 12
2. PROXIMITY MEASURES 15
2.1. Introduction / 15
2.2. Feature Types and Measurement Levels / 15
2.3. Definition of Proximity Measures / 21
2.4. Proximity Measures for Continuous Variables / 22
2.5. Proximity Measures for Discrete Variables / 26
2.6. Proximity Measures for Mixed Variables / 29
2.7. Summary / 30
3. HIERARCHICAL CLUSTERING 31
3.1. Introduction / 31
3.2. Agglomerative Hierarchical Clustering / 32
3.3. Divisive Hierarchical Clustering / 37
3.4. Recent Advances / 40
3.5. Applications / 46
3.6. Summary / 61
4. PARTITIONAL CLUSTERING 63
4.1. Introduction / 63
4.2. Clustering Criteria / 64
4.3. K-Means Algorithm / 67
4.4. Mixture Density-Based Clustering / 73
4.5. Graph Theory-Based Clustering / 81
4.6. Fuzzy Clustering / 83
4.7. Search Techniques-Based Clustering Algorithms / 92
4.8. Applications / 99
4.9. Summary / 109
5. NEURAL NETWORK–BASED CLUSTERING 111
5.1. Introduction / 111
5.2. Hard Competitive Learning Clustering / 113
5.3. Soft Competitive Learning Clustering / 130
5.4. Applications / 146
5.5. Summary / 162
6. KERNEL-BASED CLUSTERING 163
6.1. Introduction / 163
6.2. Kernel Principal Component Analysis / 165
6.3. Squared-Error-Based Clustering with Kernel Functions / 167
6.4. Support Vector Clustering / 170
6.5. Applications / 175
6.6. Summary / 176
7. SEQUENTIAL DATA CLUSTERING 179
7.1. Introduction / 179
7.2. Sequence Similarity / 181
7.3. Indirect Sequence Clustering / 185
7.4. Model-Based Sequence Clustering / 186
7.5. Applications—Genomic and Biological Sequence Clustering / 201
7.6. Summary / 211
8. LARGE-SCALE DATA CLUSTERING 213
8.1. Introduction / 213
8.2. Random Sampling Methods / 216
8.3. Condensation-Based Methods / 219
8.4. Density-Based Methods / 220
8.5. Grid-Based Methods / 225
8.6. Divide and Conquer / 227
8.7. Incremental Clustering / 229
8.8. Applications / 229
8.9. Summary / 235
9. DATA VISUALIZATION AND HIGH-DIMENSIONAL DATA CLUSTERING 237
9.1. Introduction / 237
9.2. Linear Projection Algorithms / 239
9.3. Nonlinear Projection Algorithms / 244
9.4. Projected and Subspace Clustering / 253
9.5. Applications / 258
9.6. Summary / 260
10. CLUSTER VALIDITY 263
10.1. Introduction / 263
10.2. External Criteria / 265
10.3. Internal Criteria / 267
10.4. Relative Criteria / 268
10.5. Summary / 277
11. CONCLUDING REMARKS 279
PROBLEMS 283
REFERENCES 293
AUTHOR INDEX 331
SUBJECT INDEX 341