dimxu posted on 2012-4-10 19:07 
Could an expert explain this in more detail? I'm a beginner and don't really understand these steps.
Many thanks.
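It depends on what exactly you want to do. If you only care about the result and not the algorithm behind it, standardize the variables with proc stdize first and then cluster with proc cluster. Standardizing keeps variables measured on large scales from dominating the distances, so the clusters should be more reasonable. For choosing the method in proc cluster, see the documentation excerpt below. If your dataset is fairly large, I'd suggest proc fastclus instead.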
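A minimal sketch of that workflow, in case it helps. The dataset name work.mydata, the variables x1-x5, the ID variable id, and the choice of 3 clusters are all made up for illustration; METHOD=WARD is only one possible choice, not a recommendation for your data.

```sas
/* Hypothetical input: work.mydata with numeric variables x1-x5
   and an identifier variable named id (all names are assumptions). */

/* Step 1: standardize each variable to mean 0 and standard deviation 1 */
proc stdize data=work.mydata out=work.std method=std;
   var x1-x5;
run;

/* Step 2: hierarchical clustering on the standardized data.
   OUTTREE= saves the cluster history for PROC TREE. */
proc cluster data=work.std method=ward outtree=work.tree;
   var x1-x5;
   id id;
run;

/* Step 3: cut the tree into 3 clusters (3 is an arbitrary example value).
   The output data set has one row per object with its CLUSTER number. */
proc tree data=work.tree nclusters=3 out=work.clusters noprint;
run;

/* Alternative for large data: k-means style clustering with PROC FASTCLUS.
   You must choose the number of clusters up front. */
proc fastclus data=work.std maxclusters=3 maxiter=100 out=work.kclusters;
   var x1-x5;
run;
```

proc cluster has to work with all pairwise distances, so it gets slow as the number of observations grows. proc fastclus avoids that, which is why it is the usual suggestion for big datasets.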
PROC CLUSTER METHOD=name <options> ; 
The PROC CLUSTER statement starts the CLUSTER procedure, specifies a clustering method, and optionally specifies details for clustering methods, data sets, data processing, and displayed output. 
The METHOD= specification determines the clustering method used by the procedure. Any one of the following 11 methods can be specified for name: 
AVERAGE  |  AVE 
requests average linkage (group average, unweighted pair-group method using arithmetic averages, UPGMA). Distance data are squared unless you specify the NOSQUARE option. 
CENTROID  |  CEN 
requests the centroid method (unweighted pair-group method using centroids, UPGMC, centroid sorting, weighted-group method). Distance data are squared unless you specify the NOSQUARE option. 
COMPLETE  |  COM 
requests complete linkage (furthest neighbor, maximum method, diameter method, rank order typal analysis). To reduce distortion of clusters by outliers, the TRIM= option is recommended. 
DENSITY  |  DEN 
requests density linkage, which is a class of clustering methods using nonparametric probability density estimation. You must also specify either the K=, R=, or HYBRID option to indicate the type of density estimation to be used. See also the MODE= and DIM= options in this section. 
EML 
requests maximum-likelihood hierarchical clustering for mixtures of spherical multivariate normal distributions with equal variances but possibly unequal mixing proportions. Use METHOD=EML only with coordinate data. See the PENALTY= option for details. The NONORM option does not affect the reported likelihood values but does affect other unrelated criteria. The EML method is much slower than the other methods in the CLUSTER procedure. 
FLEXIBLE  |  FLE 
requests the Lance-Williams flexible-beta method. See the BETA= option in this section. 
MCQUITTY  |  MCQ 
requests McQuitty’s similarity analysis (weighted average linkage, weighted pair-group method using arithmetic averages, WPGMA). 
MEDIAN  |  MED 
requests Gower’s median method (weighted pair-group method using centroids, WPGMC). Distance data are squared unless you specify the NOSQUARE option. 
SINGLE  |  SIN 
requests single linkage (nearest neighbor, minimum method, connectedness method, elementary linkage analysis, or dendritic method). To reduce chaining, you can use the TRIM= option with METHOD=SINGLE. 
TWOSTAGE  |  TWO 
requests two-stage density linkage. You must also specify the K=, R=, or HYBRID option to indicate the type of density estimation to be used. See also the MODE= and DIM= options in this section. 
WARD  |  WAR 
requests Ward’s minimum-variance method (error sum of squares, trace W). Distance data are squared unless you specify the NOSQUARE option. To reduce distortion by outliers, the TRIM= option is recommended. See the NONORM option.
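To make the METHOD= list a bit more concrete, here is a rough sketch of how a few of the options described above might be written. The dataset work.std and variables x1-x5 are hypothetical, and the values K=10, BETA=-0.25, and TRIM=5 are arbitrary examples rather than recommended settings. As far as I know TRIM= also needs K= or R= to be specified, so please check the documentation for your SAS version.

```sas
/* Hypothetical standardized input: work.std with variables x1-x5 */

/* Average linkage (UPGMA) */
proc cluster data=work.std method=average outtree=work.tree_avg;
   var x1-x5;
run;

/* Density linkage: K=, R=, or HYBRID is required; K=10 is an example value */
proc cluster data=work.std method=density k=10 outtree=work.tree_den;
   var x1-x5;
run;

/* Lance-Williams flexible-beta method with an explicit BETA= value */
proc cluster data=work.std method=flexible beta=-0.25 outtree=work.tree_fle;
   var x1-x5;
run;

/* Ward's method, trimming the 5% of points with the lowest estimated density.
   K=10 is supplied because TRIM= relies on a density estimate (assumption). */
proc cluster data=work.std method=ward trim=5 k=10 outtree=work.tree_ward;
   var x1-x5;
run;
```

Each run writes its own OUTTREE= data set, so you can pass them to proc tree separately and compare the resulting clusters.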