Can anyone help me translate SPSS's algorithm for the initial cluster centers in k-means clustering?

2012-04-16

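I found the algorithm for the initial cluster centers, but I don't understand what it means. Could someone knowledgeable explain it? My English is poor, unfortunately. Here is the English description of the initial cluster center algorithm.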
(a) If min_i d(x_k, M_i) > d_mn and d(x_k, M_m) > d(x_k, M_n), then x_k replaces M_n. If min_i d(x_k, M_i) > d_mn and d(x_k, M_m) < d(x_k, M_n), then x_k replaces M_m; that is, if the distance between x_k and its closest cluster mean is greater than the distance between the two closest means (M_m and M_n), then x_k replaces either M_m or M_n, whichever is closer to x_k.
(b) If x_k does not replace a cluster mean in (a), a second test is made:
Let M_q be the closest cluster mean to x_k.
Let M_p be the second closest cluster mean to x_k.
If d(x_k, M_p) > min_i d(M_q, M_i), then M_q = x_k;
That is, if x_k is further from the second closest cluster's center than the closest cluster's center is from any other cluster's center, replace the closest cluster's center with x_k.
At the end of one pass through the data, the initial means of all NC clusters are set. Note that if NOINITIAL is specified, the first NC cases with no missing values are the initial cluster means.
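
One way to read the two tests above is the Python sketch below. It is only an illustration, not SPSS's actual code. It assumes the pass starts with the first NC complete cases as provisional centers (the quoted text does not state the starting point), uses Euclidean distance, and needs at least two clusters. The names `select_initial_centers`, `closest_pair` and `dist` are made up for this example.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest_pair(centers):
    """Return (d_mn, m, n): the smallest center-to-center distance and its two indices."""
    best = None
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            d = dist(centers[i], centers[j])
            if best is None or d < best[0]:
                best = (d, i, j)
    return best

def select_initial_centers(cases, nc):
    """Single pass over `cases`; returns nc initial centers (requires nc >= 2)."""
    # Assumption: the first nc cases serve as the provisional centers.
    centers = [list(c) for c in cases[:nc]]
    for x in cases[nc:]:
        d_to = [dist(x, c) for c in centers]      # d(x_k, M_i) for every i
        d_mn, m, n = closest_pair(centers)        # two closest means M_m, M_n
        # Test (a): x_k is farther from its nearest mean than M_m is from M_n.
        if min(d_to) > d_mn:
            # Replace whichever of M_m, M_n is closer to x_k.
            # A tie is not covered by the quoted text; this sketch replaces M_m.
            centers[n if d_to[m] > d_to[n] else m] = list(x)
            continue
        # Test (b): M_q is the closest mean to x_k, M_p the second closest.
        order = sorted(range(nc), key=lambda i: d_to[i])
        q, p = order[0], order[1]
        nearest_to_q = min(dist(centers[q], centers[i]) for i in range(nc) if i != q)
        if d_to[p] > nearest_to_q:
            centers[q] = list(x)
    return centers

# Test (a) fires for (10, 0); neither test fires for (5, 0).
print(select_initial_centers([(0, 0), (1, 0), (10, 0), (5, 0)], 2))
# Test (b) fires for (-8,): it replaces its closest center, (0,).
print(select_initial_centers([(0,), (10,), (20,), (-8,)], 3))
```

The effect of both tests is to push the centers apart: a new case displaces an existing center only when keeping the case would leave the centers more spread out than before.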

All replies

yanziwoaini

2012-4-16 14:36:06

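The book SPSS统计分析从入门到精通 (roughly "SPSS Statistical Analysis: From Beginner to Expert") explains this in detail. The e-book is easy to find and is available on the forum. Good luck!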

kuangsir6

2012-4-16 17:35:03

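Basic process:
K-Means works by defining a set of starting cluster centers derived from the data.
It then assigns each record to the cluster it is most similar to, based on the record's
input field values. After all records have been assigned, the cluster centers are updated
to reflect the new set of records assigned to each cluster. The records are then checked
again to see whether any should be reassigned to a different cluster. This assign/update
iteration continues until either the maximum number of iterations is reached or the change
between one iteration and the next does not exceed a specified threshold.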

kuangsir6

2012-4-16 18:23:30

Model Parameters
The primary calculation in k-means is an iterative process of calculating cluster centers and
assigning records to clusters. The primary steps in the procedure are:
1. Select initial cluster centers
2. Assign each record to the nearest cluster
3. Update the cluster centers based on the records assigned to each cluster
4. Repeat steps 2 and 3 until either:
   • In step 3, there is no change in the cluster centers from the previous iteration, or
   • The number of iterations exceeds the maximum iterations parameter
Clusters are defined by their centers. A cluster center is a vector of values for the (encoded) input
fields. The vector values are based on the mean values for records assigned to the cluster.
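
The four steps above can be sketched in plain Python as follows. This is a minimal illustration, not Modeler's implementation. The function name `kmeans`, the `tol` parameter (0 reproduces the "no change" rule) and the choice to keep an empty cluster's old center are assumptions made for this example. The initial centers from step 1 are passed in by the caller.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(records, centers, max_iter=20, tol=0.0):
    """Run the assign/update loop; returns (centers, labels)."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(max_iter):                     # stop rule: maximum iterations
        # Step 2: assign each record to the nearest cluster center.
        labels = [min(range(len(centers)), key=lambda j: dist(r, centers[j]))
                  for r in records]
        # Step 3: move each center to the mean of the records assigned to it.
        new_centers = []
        for j, old in enumerate(centers):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)           # assumption: empty cluster keeps its center
        # Step 4: stop when no center moved by more than tol.
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, labels

records = [(0, 0), (0, 2), (10, 0), (10, 2)]
centers, labels = kmeans(records, [(0, 0), (10, 0)])
print(centers)  # [[0.0, 1.0], [10.0, 1.0]]
print(labels)   # [0, 0, 1, 1]
```

In this toy run the centers move once, to the midpoint of each pair of records, and the second iteration detects no change and stops.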

Selecting Initial Cluster Centers
The user specifies k, the number of clusters in the model. Initial cluster centers are chosen using a maximin algorithm:
1. Initialize the first cluster center as the values of the input fields for the first data record.
2. For each data record, compute the minimum (Euclidean) distance between the record and each
defined cluster center.
3. Select the record with the largest minimum distance from the defined cluster centers. Add a new
cluster center with values of the input fields for the selected record.
4. Repeat steps 2 and 3 until k cluster centers have been added to the model.
Once initial cluster centers have been chosen, the algorithm begins the iterative assign/update
process.
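
The maximin selection described above can be sketched as below. This is a rough illustration under stated assumptions: records are numeric tuples, there are at least k distinct records, and ties go to the earliest record. The function name `maximin_centers` is made up for this example.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def maximin_centers(records, k):
    """Pick k initial centers: each new one is the record farthest from all current centers."""
    # Step 1: the first data record becomes the first cluster center.
    centers = [list(records[0])]
    while len(centers) < k:                       # step 4: repeat until k centers exist
        # Step 2: distance from each record to its nearest defined center.
        min_d = [min(dist(r, c) for c in centers) for r in records]
        # Step 3: the record with the largest such distance becomes a new center.
        far = max(range(len(records)), key=lambda i: min_d[i])
        centers.append(list(records[far]))
    return centers

print(maximin_centers([(0, 0), (1, 0), (10, 0), (5, 0)], 3))
# [[0, 0], [10, 0], [5, 0]]
```

Note that this differs from the one-pass replacement procedure quoted in the first post: maximin rescans every record each time it adds a center, so it makes k - 1 scans of the data rather than one.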

-------- From the IBM SPSS Modeler 14.2 Algorithms Guide

forex95

2012-4-18 08:58:00

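yanziwoaini posted on 2012-4-16 14:36
The book SPSS统计分析从入门到精通 explains this in detail. The e-book is easy to find and is available on the forum. Good luck!

I read it. It does not give the actual algorithm; it only teaches you how to use the software.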

forex95

2012-4-19 13:00:21

I found the algorithm for the initial cluster centers, but I don't understand what it means. Could someone knowledgeable explain it? Here is the English description of the initial cluster center algorithm.
(a) If min_i d(x_k, M_i) > d_mn and d(x_k, M_m) > d(x_k, M_n), then x_k replaces M_n. If min_i d(x_k, M_i) > d_mn and d(x_k, M_m) < d(x_k, M_n), then x_k replaces M_m; that is, if the distance between x_k and its closest cluster mean is greater than the distance between the two closest means (M_m and M_n), then x_k replaces either M_m or M_n, whichever is closer to x_k.

(b) If x_k does not replace a cluster mean in (a), a second test is made:
Let M_q be the closest cluster mean to x_k.
Let M_p be the second closest cluster mean to x_k.
If d(x_k, M_p) > min_i d(M_q, M_i), then M_q = x_k;
That is, if x_k is further from the second closest cluster's center than the closest cluster's center is from any other cluster's center, replace the closest cluster's center with x_k.

At the end of one pass through the data, the initial means of all NC clusters are set. Note that if NOINITIAL is specified, the first NC cases with no missing values are the initial cluster means.
