kuangsir6 posted on 2012-3-30 09:11
K-Means works by defining a set of starting cluster centers derived from the data.
It then assigns each record, based on its input field values, to ...
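Thanks first of all. What you describe is the iterative procedure that clustering generally follows once the initial centers have been fixed. Of course, it's best if you can specify the initial centers yourself; if you can't, different tools pick them in different ways. After some digging I found the algorithm Modeler uses to determine the initial centers, and I'm sharing it here. It should be easy enough to follow, so I'll leave it in the original English: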
Selecting Initial Cluster Centers
The user specifies k, the number of clusters in the model. Initial cluster centers are chosen using a maximin algorithm:
1. Initialize the first cluster center as the values of the input fields for the first data record.
2. For each data record, compute the minimum (Euclidean) distance between the record and each defined cluster center.
3. Select the record with the largest minimum distance from the defined cluster centers. Add a new cluster center with values of the input fields for the selected record.
4. Repeat steps 2 and 3 until k cluster centers have been added to the model.
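To make the four steps concrete, here is a minimal Python sketch of the maximin selection, written from the description above rather than from Modeler's code. Assumptions: records are plain numeric tuples (no field scaling or handling of categorical/missing values, which Modeler may do), and ties on the largest minimum distance go to the earliest record, since the excerpt does not say. The function name `maximin_init` is hypothetical.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def maximin_init(records: List[Sequence[float]], k: int) -> List[List[float]]:
    """Hypothetical helper: choose k initial centers with the maximin rule."""
    if not 1 <= k <= len(records):
        raise ValueError("k must be between 1 and the number of records")
    # Step 1: the first center is simply the first data record.
    centers = [list(records[0])]
    while len(centers) < k:
        # Step 2: for every record, the distance to its nearest existing center.
        min_dists = [min(euclidean(r, c) for c in centers) for r in records]
        # Step 3: take the record farthest from all current centers.
        # max() returns the first maximum, so ties favour the earliest record
        # (an assumption; the excerpt does not specify tie-breaking).
        best = max(range(len(records)), key=lambda i: min_dists[i])
        centers.append(list(records[best]))
    # Step 4: the loop stops once k centers exist.
    return centers


records = [(0, 0), (1, 0), (10, 0), (0, 5)]
print(maximin_init(records, 3))  # [[0, 0], [10, 0], [0, 5]]
```

In the example, (10, 0) is picked second because it is 10 units from (0, 0), and (0, 5) is picked third because its nearest center is 5 units away, farther than (1, 0)'s 1 unit. Note that the result depends on record order (step 1 always uses the first record), so shuffling the data can change the starting centers.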
Once initial cluster centers have been chosen, the algorithm begins the iterative assign/update process.
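For completeness, the assign/update loop that follows can be sketched as a standard Lloyd-style K-means iteration. This is a generic illustration, not Modeler's implementation: the stopping rule (`max_iter`, and a `tol` on the largest center shift) and keeping an empty cluster's old center are assumptions, and the function name `kmeans` is hypothetical. The initial centers are passed in, so they can come from the maximin step or anywhere else.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(r: Sequence[float], centers: List[List[float]]) -> int:
    """Index of the closest center (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda j: euclidean(r, centers[j]))


def kmeans(records: List[Sequence[float]], centers: List[Sequence[float]],
           max_iter: int = 100, tol: float = 1e-9) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical helper: iterate assign/update from the given initial centers."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        # Assign step: each record joins the cluster of its nearest center.
        labels = [nearest(r, centers) for r in records]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # empty cluster keeps its center (assumption)
        shift = max(euclidean(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:  # stop once no center moves noticeably
            break
    # Final assignment against the returned centers.
    labels = [nearest(r, centers) for r in records]
    return centers, labels


centers, labels = kmeans([(0, 0), (1, 0), (10, 0), (11, 0)], [(0, 0), (10, 0)])
print(centers, labels)  # [[0.5, 0.0], [10.5, 0.0]] [0, 0, 1, 1]
```

Here the first pass moves the centers from (0, 0) and (10, 0) to the cluster means (0.5, 0) and (10.5, 0); the second pass changes nothing, so the loop stops.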