Abstract:
Principal Component Analysis (PCA) finds a linear mapping that maximizes the variance of the data, which makes PCA sensitive to outliers and may yield wrong eigendirections. In this paper, we propose techniques to address this problem: we center the data and re-estimate the covariance matrix using robust statistical techniques such as the median, robust scaling (which reinforces the data centering), and the Huber M-estimator (which measures the presence of outliers and reweights them with small values). Experimental results on several real-world data sets show that, in classification tasks, the proposed method handles outliers better than the original PCA and achieves the same accuracy as Kernel PCA with the polynomial kernel at a lower computational cost.
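To make the described approach concrete, below is a minimal sketch (not the authors' code) of the idea in the abstract: median-based centering, MAD-based robust scaling to judge how far each sample lies from the bulk of the data, and Huber-style downweighting of outlying samples before estimating the covariance. The function name `robust_pca`, the threshold `delta=1.345`, and the MAD-based distance are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def robust_pca(X, n_components=2, delta=1.345):
    """Rough illustration: PCA on a robustly re-estimated covariance matrix."""
    # Robust centering: subtract the per-feature median instead of the mean.
    center = np.median(X, axis=0)
    Xc = X - center

    # Robust scaling: normalize by the median absolute deviation (MAD) per feature,
    # used here only to measure how far each sample lies from the data's bulk.
    mad = np.median(np.abs(Xc), axis=0)
    mad[mad == 0] = 1.0                      # avoid division by zero
    dist = np.linalg.norm(Xc / mad, axis=1) / np.sqrt(X.shape[1])

    # Huber-style weights: samples farther than `delta` are downweighted,
    # so outliers contribute little to the covariance estimate.
    w = np.minimum(1.0, delta / np.maximum(dist, 1e-12))

    # Weighted covariance of the median-centered data.
    Xw = Xc * w[:, None]
    cov = (Xw.T @ Xw) / w.sum()

    # Leading eigenvectors of the robust covariance give the principal directions.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order], center

# Usage (hypothetical data matrix X of shape [n_samples, n_features]):
# components, center = robust_pca(X, n_components=2)
# X_proj = (X - center) @ components
```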
---
Title:
Robust Principal Component Analysis Using Statistical Estimators
---
Authors:
Peratham Wiriyathammabhum, Boonserm Kijsirikul
---
Latest submission year:
2012
---
Classification:
Primary: Computer Science
Secondary: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
---
PDF link:
https://arxiv.org/pdf/1207.0403