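Abstract (translation):
Methods based on machine learning, data mining, and artificial intelligence (AI) have been used to determine the relationships between the chemical structure of compounds and their biological activity, known as quantitative structure-activity relationships (QSARs). Pre-processing of the dataset is the first step in this process: it maps a large number of molecular descriptors in a high-dimensional space to a small number of components in a lower-dimensional space while retaining the features of the original data. A common practice is to apply a mapping method to a dataset without any prior analysis. Our work stresses this pre-analysis by applying it to two important classes of QSAR prediction problems: drug design (predicting anti-HIV-1 activity) and predictive toxicology (estimating the hepatocarcinogenicity of chemicals). We apply one linear and two nonlinear mapping methods to each dataset. From this analysis, we infer the nature of the inherent relationships among the elements of each dataset and hence the mapping method best suited to it. We also show that proper pre-processing can help in choosing the right feature extraction tool and can give insight into the type of classifier pertinent to the given problem.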
---
English title:
Pre-processing in AI based Prediction of QSARs
---
Authors:
Om Prasad Patri, Amit Kumar Mishra
---
Latest submission year:
2009
---
Classification:
Primary category: Computer Science
Secondary category: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
--
Primary category: Computer Science
Secondary category: Neural and Evolutionary Computing
Category description: Covers neural networks, connectionism, genetic algorithms, artificial life, adaptive behavior. Roughly includes some material in ACM Subject Class C.1.3, I.2.6, I.5.
--
Primary category: Quantitative Biology
Secondary category: Quantitative Methods
Category description: All experimental, numerical, statistical and mathematical contributions of value to biology
--
---
English abstract:
Machine learning, data mining and artificial intelligence (AI) based methods have been used to determine the relations between chemical structure and biological activity, called quantitative structure activity relationships (QSARs) for the compounds. Pre-processing of the dataset, which includes the mapping from a large number of molecular descriptors in the original high dimensional space to a small number of components in the lower dimensional space while retaining the features of the original data, is the first step in this process. A common practice is to use a mapping method for a dataset without prior analysis. This pre-analysis has been stressed in our work by applying it to two important classes of QSAR prediction problems: drug design (predicting anti-HIV-1 activity) and predictive toxicology (estimating hepatocarcinogenicity of chemicals). We apply one linear and two nonlinear mapping methods on each of the datasets. Based on this analysis, we conclude the nature of the inherent relationships between the elements of each dataset, and hence, the mapping method best suited for it. We also show that proper preprocessing can help us in choosing the right feature extraction tool as well as give an insight about the type of classifier pertinent for the given problem.
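
The idea of analysing a dataset before committing to a mapping method can be illustrated with a small sketch. The abstract does not name the linear or nonlinear methods used, so this example assumes principal component analysis (PCA) as a stand-in linear mapping and uses two toy datasets in place of real molecular descriptors; the function names are hypothetical and this is not the authors' procedure. It measures how much variance a linear projection onto `k` components retains, which gives a rough indication of whether the data have linear or nonlinear structure.

```python
import math
import random


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvalues(cov, k, iters=500, seed=0):
    """k largest eigenvalues of a symmetric PSD matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    d = len(cov)
    mat = [row[:] for row in cov]
    eigenvalues = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # remaining matrix is zero: no variance left
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged direction.
        lam = sum(v[i] * sum(mat[i][j] * v[j] for j in range(d)) for i in range(d))
        eigenvalues.append(lam)
        # Deflate: remove the component just found.
        mat = [[mat[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return eigenvalues


def explained_variance_ratio(rows, k):
    """Fraction of total variance kept by a linear (PCA) projection onto k components."""
    cov = covariance(center(rows))
    total = sum(cov[i][i] for i in range(len(cov)))
    return sum(top_eigenvalues(cov, k)) / total


# Toy data (assumption): points on a straight line in 3-D, i.e. linear structure.
line_rows = [[t, 2 * t, -t] for t in (i / 50 - 1 for i in range(101))]

# Toy data (assumption): points on a circle, a 1-D manifold that is not linear.
circle_rows = [[math.cos(2 * math.pi * i / 200), math.sin(2 * math.pi * i / 200), 0.0]
               for i in range(200)]

print("line, 1 component:  ", round(explained_variance_ratio(line_rows, 1), 3))
print("circle, 1 component:", round(explained_variance_ratio(circle_rows, 1), 3))
```

On the line data, one linear component retains essentially all of the variance, whereas on the circle it retains only about half, even though both datasets are intrinsically one-dimensional. A gap of this kind is one possible signal that a nonlinear mapping may suit a dataset better than a linear one.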
---
PDF link:
https://arxiv.org/pdf/0910.0542