About randomForest

zhzhw91


2015-11-25

Bounty: 100 forum coins (resolved)

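After fitting a model with randomForest(), I called plot() on the result directly and got the figure below. How should each line in the plot be interpreted, and how can I add a legend?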

[Attachment: 屏幕快照 2015-11-25 下午10.49.43.png (screenshot of the plot() output)]

Best answer: neuroexplorer


All replies

neuroexplorer

2015-11-25 23:03:14

Judging from your plot, there are no test-set results (that is, no ytest was supplied when the model was fitted), so:

The solid black line is the overall OOB (out-of-bag) error, and each coloured line is the error for one class (i.e. 1 minus that class's recall).

Suppose you use the iris data: the red curve is the error rate for the setosa class, the green and blue curves above it are for versicolor and virginica, and the black curve is the out-of-bag error rate.

The code is as follows:

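plot(fit)
# plot() draws the columns of fit$err.rate in order (OOB first, then one per
# class) via matplot(), so the legend entries must follow the same order.
legend("topright", legend=colnames(fit$err.rate),
   lty=1:4,                                 # matplot() default line types (4 curves for iris)
   col=c('black', 'red', 'green', 'blue'))  # matplot() default colours 1:4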
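
For a version that can be run end to end, here is a minimal sketch on the built-in iris data. The seed, ntree = 500 and the variable names are illustrative choices, not part of the original answer. It assumes that plot() on a randomForest object draws the columns of err.rate in order using the default matplot() colours and line types. It also checks the "1 minus recall" reading of the per-class curves against the confusion matrix stored in the fit.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only so the run is reproducible
fit <- randomForest(Species ~ ., data = iris, ntree = 500)

# One column per curve: "OOB" first, then one column per class level
curve_names <- colnames(fit$err.rate)
k <- length(curve_names)

plot(fit, main = "Error rate vs. number of trees")
legend("topright", legend = curve_names,
       col = rep_len(1:6, k),   # matplot() default colours
       lty = rep_len(1:5, k))   # matplot() default line types

# Per-class error computed by hand as 1 - recall, from the OOB predictions
tab <- table(observed = iris$Species, predicted = fit$predicted)
class_error <- 1 - diag(tab) / rowSums(tab)
print(cbind(by_hand = class_error, stored = fit$confusion[, "class.error"]))
```

Taking the legend labels from colnames(fit$err.rate) rather than typing them by hand keeps the labels and the curves in the same order even if the class levels change.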

[Attachment: example.png]


jgchen1966

2015-11-26 15:12:26

rfMt<-randomForest(x=xda[Idx,],y=yv[Idx],xtest=xda[-Idx,],ytest=yv[-Idx],ntree=1500)
Use str() to display the model result:
str(rfMt)
List of 17
$ call          : language randomForest(x = xda[Idx, ], y = yv[Idx], xtest = xda[-Idx, ], ytest = yv[-Idx], ntree = 1500,    corr.bias = TRUE)
$ type          : chr "regression"
$ predicted    : Named num [1:120] 1.22 2.06 1.58 1.91 1.45 ...
  ..- attr(*, "names")= chr [1:120] "69" "303" "22" "13" ...
$ mse          : num [1:1500] 0.394 0.349 0.322 0.311 0.316 ...
$ rsq          : num [1:1500]

................................
........................
$ y             : num [1:120] 1.79 2.08 1.39 1.61 1.1 ...
$ test          :List of 4
  ..$ predicted: Named num [1:83] 1.78 1.71 1.69 1.74 1.94 ...
  .. ..- attr(*, "names")= chr [1:83] "5" "9" "12" "14" ...
  ..$ mse    : num [1:1500] 0.392 0.292 0.303 0.316 0.305 ...
  ..$ rsq    : num [1:1500] 0.345 0.512 0.493 0.471 0.49 ...
  ..$ proximity: NULL
$ inbag

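Note: the parts highlighted in red in the original post (the mse and test$mse components used below) are what plot(rfMt) draws. The default plot is not ideal, though, so you can draw it yourself:
> yda<-data.frame(oobmse=rfMt$mse,testmse=rfMt$test$mse)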
> library(tidyr)
> yda$id<-1:1500
> str(yda)
'data.frame': 1500 obs. of  3 variables:
$ oobmse : num  0.394 0.349 0.322 0.311 0.316 ...
$ testmse: num  0.392 0.292 0.303 0.316 0.305 ...
$ id    : int  1 2 3 4 5 6 7 8 9 10 ...
> yda<-gather(yda,key=Type,value=mse,-id)
> str(yda)
'data.frame': 3000 obs. of  3 variables:
$ id  : int  1 2 3 4 5 6 7 8 9 10 ...
$ Type: Factor w/ 2 levels "oobmse","testmse": 1 1 1 1 1 1 1 1 1 1 ...
$ mse : num  0.394 0.349 0.322 0.311 0.316 ...
> library(ggplot2)
> ggplot(yda,aes(x=id,y=mse,colour=Type))+geom_line()
You can of course polish the plot further to suit your own needs.
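
The same OOB-versus-test comparison can also be sketched with base graphics only. The poster's xda and yv are not available, so the built-in mtcars data stands in for them; the seed, the 22-row training split and ntree = 300 are hypothetical choices made purely for illustration.

```r
library(randomForest)

set.seed(1)                       # arbitrary seed
idx <- sample(nrow(mtcars), 22)   # hypothetical training rows
x <- mtcars[, -1]                 # predictors (everything except mpg)
y <- mtcars$mpg                   # numeric response, so a regression forest

rf <- randomForest(x = x[idx, ], y = y[idx],
                   xtest = x[-idx, ], ytest = y[-idx], ntree = 300)

# One row per forest size, one column per error curve
err <- cbind(OOB = rf$mse, Test = rf$test$mse)

matplot(seq_len(rf$ntree), err, type = "l", lty = 1,
        col = c("black", "red"),
        xlab = "Number of trees", ylab = "MSE")
legend("topright", legend = colnames(err), lty = 1,
       col = c("black", "red"))
```

Building the matrix with named columns means the legend labels come straight from the data, so there is no need to guess which line is which.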

Note that the OOB mse only tells you whether ntree has been set reasonably; the mse on the test set is what indicates how well the model predicts.
mse : (regression only) vector of mean square errors: sum of squared residuals divided  by n.
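
To make the definition concrete, the following sketch recomputes the final OOB mse by hand on the built-in mtcars data (a stand-in dataset; the seed and ntree = 500 are arbitrary). It assumes that the predicted component holds the out-of-bag predictions and that every observation has been out of bag at least once, in which case the hand-computed value should agree with the last element of mse.

```r
library(randomForest)

set.seed(7)  # arbitrary seed
rf <- randomForest(mpg ~ ., data = mtcars, ntree = 500)

# OOB residuals: observed response minus out-of-bag prediction
oob_resid <- mtcars$mpg - rf$predicted

# Sum of squared residuals divided by n
mse_by_hand <- sum(oob_resid^2) / length(oob_resid)

print(c(by_hand = mse_by_hand, stored = rf$mse[rf$ntree]))
```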