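Abstract translation:
Matching clusters by permuting cluster labels is important in many clustering contexts, such as cluster validation and cluster ensemble techniques. The classic approach minimizes the Euclidean distance between two cluster solutions, which leads to inappropriate stability in certain settings. We therefore present the truematch algorithm, which introduces two improvements that are best explained in the crisp case. First, instead of maximizing the trace of the cluster crosstable, we propose to maximize a chi-square transformation of that crosstable. Taking the marginals into account, the trace is then dominated not by the cells with the largest counts but by the cells with the most non-random observations. Second, we propose a probabilistic component that breaks ties and makes the matching algorithm truly random on random data. The truematch algorithm is designed as a building block of the truecluster framework and scales in polynomial time. Initial simulation results confirm that the truematch algorithm gives more consistent truecluster results when cluster sizes are unequal. Free R software is available.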
---
English title:
Truecluster matching
---
Author:
Jens Oehlschlägel
---
Year of latest submission:
2007
---
Classification:
Primary category: Computer Science
Secondary category: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
---
English abstract:
Cluster matching by permuting cluster labels is important in many clustering contexts such as cluster validation and cluster ensemble techniques. The classic approach is to minimize the Euclidean distance between two cluster solutions, which induces inappropriate stability in certain settings. Therefore, we present the truematch algorithm, which introduces two improvements best explained in the crisp case. First, instead of maximizing the trace of the cluster crosstable, we propose to maximize a chi-square transformation of this crosstable. Thus, the trace will not be dominated by the cells with the largest counts but by the cells with the most non-random observations, taking into account the marginals. Second, we suggest a probabilistic component in order to break ties and to make the matching algorithm truly random on random data. The truematch algorithm is designed as a building block of the truecluster framework and scales in polynomial time. Initial simulation results confirm that the truematch algorithm gives more consistent truecluster results for unequal cluster sizes. Free R software is available.
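The two ideas in the abstract (maximizing the trace of a chi-square-transformed crosstable, and breaking ties at random) can be illustrated for the crisp case with a minimal Python sketch. This is not the author's truematch implementation or the R software mentioned above. The abstract does not specify the exact transformation, so the signed chi-square contribution used here is an assumption, and the function names `crosstable`, `signed_chi_square`, and `best_permutation` are hypothetical. The search enumerates all permutations for clarity, which is factorial in the number of clusters; a polynomial-time assignment solver such as the Hungarian algorithm would replace it in practice.

```python
import itertools
import random


def crosstable(labels_a, labels_b, k):
    """Count how often label i in solution A co-occurs with label j in solution B."""
    table = [[0] * k for _ in range(k)]
    for a, b in zip(labels_a, labels_b):
        table[a][b] += 1
    return table


def signed_chi_square(table):
    """Replace each count by its signed chi-square contribution.

    ASSUMPTION: sign(O - E) * (O - E)^2 / E, with E computed from the
    marginals. The abstract only says "a chi-square transformation".
    """
    k = len(table)
    row = [sum(r) for r in table]
    col = [sum(table[i][j] for i in range(k)) for j in range(k)]
    n = sum(row)
    out = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            expected = row[i] * col[j] / n
            if expected > 0:
                diff = table[i][j] - expected
                sign = 1 if diff >= 0 else -1
                out[i][j] = sign * diff * diff / expected
    return out


def best_permutation(score, rng=None):
    """Return a permutation p maximizing sum(score[i][p[i]]).

    Ties are broken uniformly at random, so the matching is random
    when the data carry no structure. Brute force, for illustration only.
    """
    rng = rng or random.Random()
    k = len(score)
    best, winners = None, []
    for perm in itertools.permutations(range(k)):
        total = sum(score[i][perm[i]] for i in range(k))
        if best is None or total > best + 1e-12:
            best, winners = total, [perm]
        elif abs(total - best) <= 1e-12:
            winners.append(perm)
    return rng.choice(winners)


# Two crisp solutions with unequal cluster sizes and permuted labels.
solution_a = [0, 0, 0, 0, 1, 1, 2]
solution_b = [1, 1, 1, 1, 2, 2, 0]
score = signed_chi_square(crosstable(solution_a, solution_b, 3))
print(best_permutation(score))
```

In this sketch, `best_permutation` returns a tuple `p` in which `p[i]` is the label in the second solution matched to label `i` in the first. Because each cell is scored relative to its expected count, a small cluster that is perfectly reproduced can contribute as much to the trace as a large one.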
---
PDF link:
https://arxiv.org/pdf/0705.4302