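Abstract translation:
In this study, we examine the application of machine learning methods to modeling retention constants in thin-layer chromatography (TLC). The problem can be described with hundreds or even thousands of descriptors related to various molecular properties, most of which are redundant and irrelevant to predicting the retention constant. We therefore used feature selection to significantly reduce the number of attributes, and we also tested applying a bagging procedure to the feature selection. Random forest regression models were built from the selected variables. The resulting models correlate better with the experimental data than reference models obtained with linear regression. Cross-validation confirms the robustness of the models.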
---
English title:
Random forest models of the retention constants in the thin layer chromatography
---
Authors:
Miron B. Kursa, Łukasz Komsta, and Witold R. Rudnicki
---
Latest submission year:
2011
---
Classification:
Primary category: Computer Science
Secondary category: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
---
English abstract:
In the current study we examine an application of the machine learning methods to model the retention constants in the thin layer chromatography (TLC). This problem can be described with hundreds or even thousands of descriptors relevant to various molecular properties, most of them redundant and not relevant for the retention constant prediction. Hence we employed feature selection to significantly reduce the number of attributes. Additionally we have tested application of the bagging procedure to the feature selection. The random forest regression models were built using selected variables. The resulting models have better correlation with the experimental data than the reference models obtained with linear regression. The cross-validation confirms robustness of the models.
---
PDF link:
https://arxiv.org/pdf/1106.3361