How do I run k-fold cross-validation with randomForest?

晓茜

2013-11-12

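I am fitting a model with randomForest using the following code:
library(randomForest)
x <- read.table("1.txt")
set.seed(150)
x.rf <- randomForest(V22 ~ ., data = x, importance = TRUE, proximity = TRUE)
print(x.rf)
I would like to add 5-fold cross-validation. Which statements do I need to add, and where?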

All replies

晓茜

2013-11-13 10:42:11

Does anyone know how to do this? It's urgent.

CRouGD

2013-11-14 16:44:27

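K-fold cross-validation splits the data into K parts. K-1 of them (the training set) are used to train the model, and the remaining part (the test set) is used to evaluate it. Because there are K parts, each one can serve as the test set in turn, which gives K rounds of training and testing.

I suggest finding a book on the subject; it will go into more detail.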
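
The splitting described above can be sketched in base R. This is only an illustration of how rows are assigned to folds; no model is fitted, and the sample size `n`, the seed, and the variable names are assumptions made up for the example.

```r
# Minimal base-R sketch of the K-fold idea (no model is fitted here).
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 20                          # hypothetical number of observations
K <- 5                           # number of folds

# Give every observation a fold label 1..K, in random order
fold_id <- sample(rep(seq_len(K), length.out = n))

for (k in seq_len(K)) {
  test_idx  <- which(fold_id == k)   # the one held-out part
  train_idx <- which(fold_id != k)   # the other K-1 parts
  cat(sprintf("Fold %d: %d training rows, %d test rows\n",
              k, length(train_idx), length(test_idx)))
}
```

Every row ends up in the test set exactly once, so the K test-set predictions together cover the whole data set.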

funpipi

2013-11-15 12:57:45

You can try this self-written function, rf.cross.validation: x is the data matrix, y is the class factor, and nfolds is the number of cross-validation folds.

library(randomForest)

# Get balanced folds where each fold has close to the overall class ratio.
# Returns an integer vector giving the fold index of every observation.
balanced.folds <- function(y, nfolds = 10) {
  y <- as.factor(y)
  folds <- rep(0, length(y))
  classes <- levels(y)
  # Size of each class
  Nk <- table(y)
  # -1 or nfolds == length(y) means leave-one-out
  if (nfolds == -1 || nfolds == length(y)) {
    return(seq_along(y))
  }
  # Can't have more folds than there are items in the largest class
  nfolds <- min(nfolds, max(Nk))
  # Assign folds evenly within each class, then shuffle within each class
  for (k in seq_along(classes)) {
    ixs <- which(y == classes[k])
    folds_k <- rep(1:nfolds, ceiling(length(ixs) / nfolds))
    folds_k <- folds_k[seq_along(ixs)]
    folds[ixs] <- sample(folds_k)
  }
  folds
}

# K-fold cross-validation for a randomForest classifier.
# x: data matrix or data frame of predictors; y: class labels;
# nfolds: number of folds (-1 means leave-one-out); ...: passed to randomForest().
rf.cross.validation <- function(x, y, nfolds = 10, verbose = TRUE, ...) {
  y <- as.factor(y)
  if (nfolds == -1) nfolds <- length(y)
  folds <- balanced.folds(y, nfolds = nfolds)
  fold.ids <- sort(unique(folds))
  result <- list()
  result$y <- y
  result$predicted <- result$y
  result$probabilities <- matrix(0, nrow = length(y), ncol = nlevels(y))
  rownames(result$probabilities) <- rownames(x)
  colnames(result$probabilities) <- levels(y)
  result$importances <- matrix(0, nrow = ncol(x), ncol = length(fold.ids))
  result$errs <- numeric(length(fold.ids))

  # K-fold cross-validation: train on all folds but one, predict the held-out fold
  for (fold in fold.ids) {
    if (verbose) cat(sprintf('Fold %d...\n', fold))
    foldix <- which(folds == fold)
    model <- randomForest(x[-foldix, , drop = FALSE], factor(y[-foldix]),
                          importance = TRUE, do.trace = verbose, ...)
    # drop = FALSE keeps a single held-out row as a one-row matrix or data frame
    newx <- x[foldix, , drop = FALSE]
    result$predicted[foldix] <- predict(model, newx)
    probs <- predict(model, newx, type = 'prob')
    result$probabilities[foldix, colnames(probs)] <- probs
    result$errs[fold] <- mean(result$predicted[foldix] != y[foldix])
    result$importances[, fold] <- model$importance[, 'MeanDecreaseAccuracy']
  }

  result$nfolds <- length(fold.ids)
  result$params <- list(...)
  # Rows are the true classes, columns the cross-validated predictions
  result$confusion.matrix <- table(actual = y, predicted = result$predicted)
  return(result)
}

jgchen1966

2013-11-16 21:25:34

I don't know why the original poster didn't read the randomForest manual carefully. The randomForest package itself includes a function for cross-validation:
Usage
rfcv(trainx, trainy, cv.fold=5, scale="log", step=0.5,
     mtry=function(p) max(1, floor(sqrt(p))), recursive=FALSE, ...)
Many of R's general-purpose machine-learning packages also provide cross-validation, for example caret, CMA, rminer, TDMR, mlr, ...
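
As a rough sketch of how rfcv might be called: note that the package documents it as cross-validation for feature selection, so it reports the cross-validated error for models with progressively fewer predictors. The iris data, the seed, and the choice of columns below are assumptions standing in for the original poster's file.

```r
library(randomForest)

set.seed(647)                         # arbitrary seed
# iris stands in for the poster's data; columns 1-4 are the predictors
cv <- rfcv(trainx = iris[, 1:4], trainy = iris$Species, cv.fold = 5)

# Cross-validated error rate for each number of predictors tried
print(cv$error.cv)

# Cross-validated predictions of the first model (the one using the most
# predictors), tabulated against the true classes
print(table(actual = iris$Species, predicted = cv$predicted[[1]]))
```

The `predicted` component holds one vector of held-out predictions per model size, which is what makes a confusion table possible here.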

晓茜

2013-11-17 17:05:36

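jgchen1966 posted on 2013-11-16 21:25
I don't know why the original poster didn't read the randomForest manual carefully. The randomForest package itself includes a function for cross-validation: ...

I'm a beginner at this. I read in the literature that the RF package in R works well for classification, so I wanted to try it, but there is a lot I don't understand, so please give me some more guidance. I did read the manual, but some things are still unclear to me. What I ultimately want are the Sn (sensitivity), Sp (specificity), Acc (prediction accuracy) and MCC (Matthews correlation coefficient) values under k-fold cross-validation or a jackknife test. Even just the predicted classes and the counts of correct and incorrect predictions would do. But the program I wrote following the manual only produces two plots when run, so I'm confused.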
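
For a two-class problem, the four measures asked for here follow directly from the counts of true and false positives and negatives. The sketch below uses only base R; the function name `binary_metrics`, its arguments, and the example labels are hypothetical and made up for illustration. In practice `truth` and `pred` could be the true classes and the held-out predictions collected across all folds (for example `result$y` and `result$predicted` from rf.cross.validation above).

```r
# Sn, Sp, Acc and MCC from true and predicted labels of a two-class problem.
# `binary_metrics` and its arguments are hypothetical names for illustration.
binary_metrics <- function(truth, pred, positive) {
  tp <- sum(truth == positive & pred == positive)   # true positives
  tn <- sum(truth != positive & pred != positive)   # true negatives
  fp <- sum(truth != positive & pred == positive)   # false positives
  fn <- sum(truth == positive & pred != positive)   # false negatives
  # as.numeric() avoids integer overflow in the MCC denominator
  denom <- sqrt(as.numeric(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  c(Sn  = tp / (tp + fn),
    Sp  = tn / (tn + fp),
    Acc = (tp + tn) / length(truth),
    MCC = if (denom == 0) NA_real_ else (tp * tn - fp * fn) / denom)
}

# Made-up labels, only to show the call
truth <- c("pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg")
pred  <- c("pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "pos", "neg")
m <- binary_metrics(truth, pred, positive = "pos")
print(round(m, 3))
```

With these made-up labels there are 3 true positives, 5 true negatives, 1 false positive and 1 false negative, so Sn is 0.75 and Acc is 0.8.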

晓茜

2013-11-17 17:11:40

jgchen1966 posted on 2013-11-16 21:25
I don't know why the original poster didn't read the randomForest manual carefully. The randomForest package itself includes a function for cross-validation: ...

Following the example in the manual, I entered this code:
library(randomForest)
set.seed(71)
iris.rf <- randomForest(Species ~ ., data=iris, importance=TRUE, proximity=TRUE)
print(iris.rf)
The result was:
Call:
 randomForest(formula = Species ~ ., data = iris, importance = TRUE, proximity = TRUE)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 4%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          3        47        0.06
Does this include cross-validation? This is exactly the form of output I ultimately want, but I don't understand what kind of cross-validation produced these numbers.

jgchen1966

2013-11-17 17:24:20

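晓茜 posted on 2013-11-17 17:11
Following the example in the manual, I entered this code:
set.seed(71)
iris.rf

That is an OOB (out-of-bag) estimate. Computer experiments are a fairly complex topic, so I suggest asking your teacher for help. I work as a technical consultant for a private company and cannot say more; please understand.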
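
To make the distinction concrete, here is a hedged sketch that puts the OOB confusion matrix next to one from an explicit 5-fold cross-validation on the same iris data. The fold assignment is a plain random split (not stratified by class), and the names `fold_id`, `cv_pred` and `cv_table` are assumptions introduced for this example.

```r
library(randomForest)

set.seed(71)
iris.rf <- randomForest(Species ~ ., data = iris)

# Out-of-bag (OOB): each tree is grown on a bootstrap sample, and every row is
# predicted only by the trees whose sample did not contain it.
print(iris.rf$confusion)                 # the matrix shown by print(iris.rf)
oob_error <- mean(iris.rf$predicted != iris$Species)

# The same kind of table from an explicit 5-fold cross-validation.
K <- 5
fold_id <- sample(rep(seq_len(K), length.out = nrow(iris)))
cv_pred <- iris$Species                  # placeholder, overwritten fold by fold
for (k in seq_len(K)) {
  test <- fold_id == k
  fit <- randomForest(Species ~ ., data = iris[!test, ])
  cv_pred[test] <- predict(fit, newdata = iris[test, ])
}
cv_table <- table(actual = iris$Species, predicted = cv_pred)
print(cv_table)
cv_error <- mean(cv_pred != iris$Species)
cat(sprintf("OOB error: %.3f, 5-fold CV error: %.3f\n", oob_error, cv_error))
```

Both tables count held-out predictions against the true classes; they differ in how the held-out rows are chosen (bootstrap leftovers per tree versus one fold per refit).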

麻烦and纠结

2013-12-16 11:44:30

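晓茜 posted on 2013-11-17 17:11
Following the example in the manual, I entered this code:
set.seed(71)
iris.rf

No cross-validation is performed there. Here is a function that does it, quoted from the R documentation: rrfcv {RRF}

Random Forest Cross-Validation for feature selection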
Description
This function shows the cross-validated prediction performance of models with sequentially reduced number of predictors (ranked by variable importance) via a nested cross-validation procedure.
