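Suppose I have a set of data samples with no class labels. Based on the existing literature, the samples can be roughly divided into 5 classes, but which class each sample belongs to is not known in advance.
My question: if I want to assign each sample to one of these 5 classes, is there a statistical method for doing so? Can classification and regression trees (CART) be used?
Any pointers would be appreciated. Thanks!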
Example: classification and regression trees. Source: http://www.statmethods.net/advstats/cart.html
Classification Tree example
Let's use the data frame kyphosis to predict a type of deformation (kyphosis) after surgery, from age in months (Age), number of vertebrae involved (Number), and the highest vertebrae operated on (Start).
# Classification Tree with rpart
library(rpart)
# grow tree
fit <- rpart(Kyphosis ~ Age + Number + Start,
             method = "class", data = kyphosis)
printcp(fit) # display the results
plotcp(fit) # visualize cross-validation results
summary(fit) # detailed summary of splits
# plot tree
plot(fit, uniform = TRUE,
     main = "Classification Tree for Kyphosis")
text(fit, use.n=TRUE, all=TRUE, cex=.8)
# create attractive postscript plot of tree
post(fit, file = "c:/tree.ps",
     title = "Classification Tree for Kyphosis")
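My question: is Kyphosis here a variable whose classes are already known in advance? If there is no such pre-classified dependent variable, can a decision tree analysis still be done?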
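
For what it's worth, here is a minimal sketch of how one might check the first point, assuming the kyphosis data frame that ships with rpart. It inspects how the response is stored. The second half is only an illustration of the unlabelled case: it swaps in k-means clustering, which is a different, unsupervised technique and not CART. The choice of centers = 5 (to mirror the five groups mentioned above), the seed, and the scaling of the predictors are all assumptions made for the sketch.

```r
library(rpart)  # the kyphosis data frame ships with rpart

# Inspect the response used on the left-hand side of the formula.
# If it is a factor with known levels, the labels were recorded in advance.
str(kyphosis$Kyphosis)
levels(kyphosis$Kyphosis)
table(kyphosis$Kyphosis)

# With no labelled response there is nothing for rpart to predict.
# k-means is an unsupervised alternative (NOT CART); centers = 5 is an
# assumption that mirrors the five groups in the question.
set.seed(1)                                        # arbitrary seed for reproducibility
x  <- scale(kyphosis[, c("Age", "Number", "Start")])  # standardise the predictors
km <- kmeans(x, centers = 5, nstart = 20)
table(km$cluster)                                  # cluster sizes
```

The cluster numbers from such a run are arbitrary labels, so matching them to the five classes described in the literature would still have to be done by looking at the cluster characteristics.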