http://clementine-blog.beauregar ... ariable-importance/
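The page above discusses this. It does not explain the calculation clearly, but it should be the same, or at least the results are comparable. The page can only be opened through a proxy.
I won't translate it; the original text follows.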
How Clementine 12 calculates variable importance
13 Nov 08
A long-awaited feature in Clementine 12 is that all, or almost all, modelling algorithms generate a summary listing the relative importance of the variables. In version 11, a handful of algorithms ranked variables in order of importance, each using a different technique. For instance, you could work out which variables in a regression were important by browsing the coefficients. Neural networks generated a chart by a means that now escapes me.
Version 12 standardises how variable importance is calculated, so the importance charts of different models can be compared, and models that did not previously generate “native” variable importance can be evaluated with the new technique. According to information received, the following algorithms all use the same calculation:
C5.0
C&RT
QUEST
CHAID
Regression
Logistic
Discriminant
GenLin
SVM
Bayesian Networks
How does it work?
It uses factor prioritisation: that is, which factor (input variable) leads to the greatest reduction in the variance of the output, when the value of that input variable is known? Which leads to the second-greatest?
The maths behind the calculation is quite involved. For me the most useful piece of knowledge is that all of the algorithms above use an identical means of determining variable importance, so the results can be directly compared. No word yet on whether neural networks use the new calculation.
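To make the factor prioritisation idea more concrete, here is a minimal Python sketch. It is not Clementine's actual calculation, which the article does not spell out. It assumes the importance of an input can be approximated by the share of output variance that disappears once that input is known, Var(E[Y|X]) / Var(Y), estimated by sorting the records on the input and cutting them into equal-count bins. The function names, the bin count and the synthetic data are all made up for illustration.

```python
import random
from statistics import fmean, pvariance


def first_order_importance(xs, ys, bins=10):
    """Estimate Var(E[Y | X]) / Var(Y) using equal-count bins of X.

    Hypothetical helper for illustration; not Clementine's algorithm.
    """
    total_var = pvariance(ys)
    if total_var == 0:
        return 0.0
    # Order the outputs by the input value, then cut into equal-count bins.
    ordered = [y for _, y in sorted(zip(xs, ys), key=lambda pair: pair[0])]
    n = len(ordered)
    grand_mean = fmean(ordered)
    between = 0.0
    for b in range(bins):
        chunk = ordered[b * n // bins:(b + 1) * n // bins]
        if chunk:
            # Weighted squared distance of the bin mean from the overall mean.
            between += len(chunk) * (fmean(chunk) - grand_mean) ** 2
    # Between-bin variance as a fraction of the total output variance.
    return between / n / total_var


def rank_inputs(columns, ys, bins=10):
    """Return (name, score) pairs, most important input first."""
    scores = {
        name: first_order_importance(xs, ys, bins)
        for name, xs in columns.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic data (an assumption for the demo): y depends strongly on x1,
# weakly on x2 and not at all on x3.
rng = random.Random(0)
n_rows = 2000
columns = {
    "x1": [rng.gauss(0, 1) for _ in range(n_rows)],
    "x2": [rng.gauss(0, 1) for _ in range(n_rows)],
    "x3": [rng.gauss(0, 1) for _ in range(n_rows)],
}
ys = [
    3 * a + b + rng.gauss(0, 0.5)
    for a, b in zip(columns["x1"], columns["x2"])
]

ranking = rank_inputs(columns, ys)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because the score is computed from the inputs and the output values only, the same ranking procedure can be applied to the predictions of any model, which is in the spirit of the comparability the article describes. The binning estimator is crude: it only captures each input's effect on its own and is slightly biased upwards for irrelevant inputs when bins are small.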
On a practical note, for some algorithms generation of variable importance is disabled by default because it can take a long time to calculate. If you want it for SVM, logistic regression, or the binary classifier, you need to turn it on before building the model. You might want to use feature selection prior to modelling in these cases, to reduce the number of low-impact variables being entered into the models.