I still have some doubts, so let me copy the original passage from the paper for you. Could you please help explain it again:
“Next, the raw data consisting of 396 cases was randomly split into two data sets, A and B, each containing 198 cases. The K-means cluster procedure was administrated with the two sets of data.
With the possible cluster solution n (n=2,3...5,or 6), Data A were utilized to generate the distances between initial clusters by the K-means procedure.
The distance generated then was used with Data B computed by K-means analysis. Data B were computed in an unconstrained manner using the same procedure that was used for Data A.
Then a constrained computation using the cluster distances acquired in Data A was determined.
This procedure essentially provided a cross-validation for Data B. For a given n, the constrained solution clustered the cases in Data B according to the cluster distance generated from Data A, while the unconstrained solution was free of restrictions. Accordingly, Kappa co-efficiencies (the chance corrected coefficients of agreement) were calculated for the two solutions of Data B cases.
For each n, the optional n with the maximal Kappa was chosen as candidate N for the entire data for the final cluster analysis.”
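
To make the question concrete, here is a rough sketch of how I currently read the quoted procedure, so that the misunderstanding can be pointed out if there is one. It is only my interpretation, not the paper's actual implementation. I am assuming that "the distances generated from Data A" means the final cluster centers obtained from Data A, that the "constrained" solution assigns each Data B case to the nearest Data A center, and that the "unconstrained" solution is an ordinary K-means run on Data B alone. Because cluster labels from two separate runs are arbitrary, the sketch also relabels the unconstrained clusters to best match the constrained ones before computing Kappa. The passage does not mention this step; it is my own addition. All function names (`kmeans`, `assign`, `cohen_kappa`, `best_aligned_kappa`, `kappa_by_k`) are made up for illustration.

```python
import itertools
import random


def dist2(p, q):
    """Squared Euclidean distance between two points (tuples of numbers)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: dist2(p, centers[j]))
            for p in points]


def kmeans(points, k, rng, iters=100):
    """Plain Lloyd-style K-means; returns the final cluster centers."""
    centers = rng.sample(points, k)  # k distinct cases as initial centers
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def cohen_kappa(a, b, k):
    """Cohen's kappa (chance-corrected agreement) for two label lists."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    p_exp = sum((a.count(j) / n) * (b.count(j) / n) for j in range(k))
    return 1.0 if p_exp == 1 else (p_obs - p_exp) / (1 - p_exp)


def best_aligned_kappa(constrained, unconstrained, k):
    """Assumption: try every relabeling of the unconstrained clusters
    and keep the highest kappa (feasible because k <= 6)."""
    best = -1.0
    for perm in itertools.permutations(range(k)):
        relabeled = [perm[lab] for lab in unconstrained]
        best = max(best, cohen_kappa(constrained, relabeled, k))
    return best


def kappa_by_k(data, ks=range(2, 7), seed=0):
    """Split data into halves A and B, then compute kappa for each k."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    data_a, data_b = shuffled[:half], shuffled[half:]
    result = {}
    for k in ks:
        centers_a = kmeans(data_a, k, rng)
        constrained = assign(data_b, centers_a)    # B forced onto A's centers
        centers_b = kmeans(data_b, k, rng)
        unconstrained = assign(data_b, centers_b)  # B clustered freely
        result[k] = best_aligned_kappa(constrained, unconstrained, k)
    return result
```

With `data` as a list of tuples, one tuple per case, `kappas = kappa_by_k(data)` followed by `max(kappas, key=kappas.get)` would give the candidate N under this reading. Is this what the passage means?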