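Abstract (translated):
The large number of spectral variables in most data sets encountered in spectral chemometrics often makes it difficult to predict a dependent variable. The number of variables can be reduced with either projection techniques or selection methods; the latter allow the selected variables to be interpreted. Because the optimal approach of testing every possible subset of variables with the prediction model is impractical, an incremental selection approach based on a nonparametric statistic is a good option, since it avoids computationally intensive use of the model itself. It has two drawbacks, however: the number of groups of variables to test remains very large, and collinearity can make the results unstable. To overcome these limitations, the paper proposes a method for selecting groups of spectral variables. It consists of a forward-backward procedure applied to the coefficients of a B-spline representation of the spectra. The criterion used in the forward-backward procedure is the mutual information, which can reveal nonlinear dependencies between variables, in contrast to the commonly used correlation. The spline representation makes the results interpretable, because groups of consecutive spectral variables are selected. Experiments on near-infrared (NIR) spectra of fescue grass and diesel fuels show that the method yields clearly identified groups of selected variables, which makes interpretation easy while keeping the computational load low. The prediction performance obtained with the selected coefficients is higher than that obtained by applying the same method directly to the original variables, and similar to that of traditional models, although far fewer spectral variables are used.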
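The claim that selecting a spline coefficient amounts to selecting a group of consecutive spectral variables follows from the local support of B-spline basis functions. The sketch below illustrates this under assumptions that are not taken from the paper: cubic B-splines (order 4), a uniform clamped knot vector, and spectral variables indexed 0 to n-1. The names `bspline_basis` and `support_of_coefficients` are hypothetical helpers written for this illustration.

```python
def bspline_basis(i, k, knots, x):
    """Value at x of the i-th B-spline basis function of order k
    (degree k - 1), computed with the Cox-de Boor recursion."""
    if k == 1:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left_den = knots[i + k - 1] - knots[i]
    right_den = knots[i + k] - knots[i + 1]
    # Terms with a zero denominator (repeated knots) are defined as zero.
    left = 0.0 if left_den == 0 else (
        (x - knots[i]) / left_den * bspline_basis(i, k - 1, knots, x))
    right = 0.0 if right_den == 0 else (
        (knots[i + k] - x) / right_den * bspline_basis(i + 1, k - 1, knots, x))
    return left + right


def support_of_coefficients(n_wavelengths, n_interior, order=4):
    """For each spline coefficient, list the indices of the spectral
    variables on which its basis function is nonzero."""
    # Assumed knot layout: uniform interior knots, clamped at both ends.
    step = n_wavelengths / (n_interior + 1)
    interior = [step * j for j in range(1, n_interior + 1)]
    knots = [0.0] * order + interior + [float(n_wavelengths)] * order
    n_coef = len(knots) - order
    supports = []
    for i in range(n_coef):
        idx = [j for j in range(n_wavelengths)
               if bspline_basis(i, order, knots, float(j)) > 0.0]
        supports.append(idx)
    return supports


# Toy usage: 100 spectral variables compressed to 10 coefficients.
supports = support_of_coefficients(n_wavelengths=100, n_interior=6)
for i, idx in enumerate(supports):
    print(f"coefficient {i}: spectral variables {idx[0]}..{idx[-1]}")
```

Each coefficient maps to one contiguous range of spectral variables, so a selected coefficient can be read as a wavelength interval rather than a scattered set of individual variables.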
---
English title:
Fast Selection of Spectral Variables with B-Spline Compression
---
Authors:
Fabrice Rossi (INRIA Rocquencourt / INRIA Sophia Antipolis), Damien
  François (CESAME), Vincent Wertz (CESAME), Marc Meurens (BNUT), Michel
  Verleysen (DICE - MLG)
---
Latest submission year:
2007
---
Classification:
Primary category: Computer Science
Secondary category: Machine Learning
Category description: Papers on all aspects of machine learning research (supervised, unsupervised, reinforcement learning, bandit problems, and so on) including also robustness, explanation, fairness, and methodology. cs.LG is also an appropriate primary category for applications of machine learning methods.
--
Primary category: Statistics
Secondary category: Applications
Category description: Biology, Education, Epidemiology, Engineering, Environmental Sciences, Medical, Physical Sciences, Quality Control, Social Sciences
--
---
English abstract:
The large number of spectral variables in most data sets encountered in spectral chemometrics often makes the prediction of a dependent variable difficult. The number of variables can be reduced by using either projection techniques or selection methods; the latter allow for the interpretation of the selected variables. Since the optimal approach of testing all possible subsets of variables with the prediction model is intractable, an incremental selection approach using a nonparametric statistic is a good option, as it avoids the computationally intensive use of the model itself. It has two drawbacks, however: the number of groups of variables to test is still huge, and collinearities can make the results unstable. To overcome these limitations, this paper presents a method to select groups of spectral variables. It consists of a forward-backward procedure applied to the coefficients of a B-spline representation of the spectra. The criterion used in the forward-backward procedure is the mutual information, which can detect nonlinear dependencies between variables, unlike the commonly used correlation. The spline representation is used to make the results interpretable, as groups of consecutive spectral variables are selected. Experiments conducted on NIR spectra from fescue grass and diesel fuels show that the method provides clearly identified groups of selected variables, making interpretation easy, while keeping the computational load low. The prediction performance obtained using the selected coefficients is higher than that obtained by the same method applied directly to the original variables and similar to that obtained using traditional models, while using significantly fewer spectral variables.
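The forward-backward procedure driven by mutual information can be sketched as follows. This is a minimal illustration, not the paper's implementation: it swaps in a simple histogram (plug-in) estimator of mutual information, and because that estimator never decreases when a variable is added, it uses an assumed gain threshold `min_gain` as the stopping rule. All function names, the bin count, and the toy data are hypothetical; the columns stand in for B-spline coefficients.

```python
import math
import random
from collections import Counter


def discretize(values, n_bins):
    """Map real values to equal-width bin indices in 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against constant columns
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(columns, y, n_bins=4):
    """Histogram estimate (in nats) of the mutual information between
    the joint of the given columns and y."""
    n = len(y)
    xs = list(zip(*[discretize(c, n_bins) for c in columns]))
    ys = discretize(y, n_bins)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def forward_backward(cols, y, min_gain=0.05):
    """Greedy forward selection followed by backward pruning.
    min_gain is an assumed threshold, not a value from the paper."""
    selected, best = [], 0.0
    # Forward: add the column that raises the estimate the most.
    while len(selected) < len(cols):
        scores = {j: mutual_information([cols[k] for k in selected + [j]], y)
                  for j in range(len(cols)) if j not in selected}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best < min_gain:
            break
        selected.append(j_best)
        best = scores[j_best]
    # Backward: drop any column whose removal costs less than min_gain.
    changed = True
    while changed and len(selected) > 1:
        changed = False
        for j in list(selected):
            rest = [k for k in selected if k != j]
            score = mutual_information([cols[k] for k in rest], y)
            if best - score < min_gain:
                selected, best, changed = rest, score, True
                break
    return selected, best


# Toy data: y depends nonlinearly (quadratically) on column 2 only.
rng = random.Random(0)
cols = [[rng.uniform(-1, 1) for _ in range(400)] for _ in range(5)]
y = [v * v for v in cols[2]]
selected, mi = forward_backward(cols, y)
print(selected, round(mi, 3))
```

The quadratic dependence in the toy data is the kind of relationship that a linear correlation criterion tends to miss on a symmetric range, which is the motivation the abstract gives for using mutual information. A nearest-neighbour estimator would be a more natural choice for continuous coefficients; the histogram version is used here only to keep the sketch dependency-free.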
---
PDF link:
https://arxiv.org/pdf/709.3639