2016-12-12
My data set has 600+ predictors (a mix of multi-category, binary, and continuous variables); the response is a multi-category variable, and there are only a little over 100 observations. How can I select suitable variables from the 600+ predictors to reduce the dimensionality? I tried principal components, but `princomp` fails with the error that it can only be used when there are more units (observations) than variables. What other methods could I use? Thanks!
All replies
2016-12-12 16:50:39
Use stepwise logistic regression.
2016-12-12 18:11:29
愚弱厚木 replied on 2016-12-12 16:50:
Use stepwise logistic regression.
Stepwise regression is extremely slow here. Is there another suitable method?
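For reference, a minimal sketch of what the stepwise suggestion above looks like in R for a multiclass response, using `nnet::multinom` with `MASS::stepAIC` (both standard CRAN/recommended packages; the toy data here is made up for illustration). As the reply notes, with 600+ predictors and ~100 rows this is very slow and unstable, which is why screening methods are discussed further down the thread.

```r
## Sketch: backward stepwise selection for a multiclass outcome.
## Toy data: 100 observations, 10 predictors (the real problem has 600+,
## for which stepwise search over all predictors is impractical).
library(nnet)   # multinom: multinomial logistic regression
library(MASS)   # stepAIC: stepwise selection by AIC

set.seed(1)
dat <- data.frame(y = factor(sample(3, 100, replace = TRUE)),
                  matrix(rnorm(100 * 10), nrow = 100))

full <- multinom(y ~ ., data = dat, trace = FALSE)  # fit on all predictors
sel  <- stepAIC(full, direction = "backward", trace = FALSE)
formula(sel)  # the retained predictors
```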
2016-12-12 18:23:00
J Am Stat Assoc. 2011 June; 106(494): 544–557. doi:10.1198/jasa.2011.tm09779
Nonparametric Independence Screening in Sparse Ultra-High Dimensional Additive Models

Jianqing Fan, Yang Feng, and Rui Song

A variable screening procedure via correlation learning was proposed in Fan and Lv (2008) to
reduce dimensionality in sparse ultra-high dimensional models. Even when the true model is
linear, the marginal regression can be highly nonlinear. To address this issue, we further extend
the correlation learning to marginal nonparametric learning. Our nonparametric independence
screening is called NIS, a specific member of the sure independence screening. Several closely
related variable screening procedures are proposed. Under general nonparametric models, it is
shown that under some mild technical conditions, the proposed independence screening methods
enjoy a sure screening property. The extent to which the dimensionality can be reduced by
independence screening is also explicitly quantified. As a methodological extension, a data-driven
thresholding and an iterative nonparametric independence screening (INIS) are also proposed to
enhance the finite sample performance for fitting sparse additive models. The simulation results
and a real data analysis demonstrate that the proposed procedure works well with moderate sample
size and large dimension and performs better than competing methods.
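The core idea of the correlation-learning screening described above can be sketched in a few lines of R: rank each predictor by its marginal association with the response and keep only the top `d`. Since the thread's response is multiclass, this sketch substitutes a per-variable Kruskal-Wallis statistic for a plain correlation (my own adaptation, not the paper's exact procedure; the data are simulated for illustration).

```r
## Minimal marginal-screening sketch in the spirit of sure independence
## screening: score each column of X by its marginal association with y,
## then keep the d highest-scoring columns.
screen_top <- function(X, y, d = 20) {
  stat <- apply(X, 2, function(x) kruskal.test(x, g = y)$statistic)
  order(stat, decreasing = TRUE)[seq_len(min(d, ncol(X)))]
}

set.seed(1)
X <- matrix(rnorm(100 * 600), nrow = 100)      # 100 obs, 600 predictors
y <- factor(sample(3, 100, replace = TRUE))    # multiclass response
keep <- screen_top(X, y, d = 30)               # indices of screened columns
```

After screening, a penalized or nonparametric model can be fit on just `X[, keep]`, which is the two-stage strategy the paper formalizes.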
2016-12-12 19:15:53
jgchen1966 replied on 2016-12-12 18:23:
J Am Stat Assoc. 2011 June; 106(494): 544–557. doi:10.1198/jasa.2011.tm09779
   Nonparametric Ind ...
How would I implement this in R? My plan is to drop part of the variables first and then use a random forest for prediction.
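The screen-then-random-forest plan described above could be sketched as follows, using the `randomForest` package. The crude ANOVA-F screen here is my own illustrative choice, not a prescription, and the data are simulated:

```r
## Sketch: screen variables first, then fit a random forest on the
## retained columns; importance() then gives a further ranking.
library(randomForest)

set.seed(1)
X <- matrix(rnorm(100 * 600), nrow = 100)
y <- factor(sample(3, 100, replace = TRUE))

## crude screen: keep the 30 columns with the largest one-way ANOVA F
f <- apply(X, 2, function(x) summary(aov(x ~ y))[[1]][["F value"]][1])
keep <- order(f, decreasing = TRUE)[1:30]

rf <- randomForest(x = X[, keep], y = y, importance = TRUE)
head(importance(rf))  # per-variable importance measures
```

Note that screening and then validating on the same data risks selection bias; with only ~100 rows, cross-validating the whole screen-plus-forest pipeline is advisable.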
2016-12-12 20:17:03
声辐射体 replied on 2016-12-12 19:15:
How would I implement this in R? My plan is to drop part of the variables first and then use a random forest for prediction.
Package ‘SIS’

Title Sure Independence Screening
Author Jianqing Fan, Yang Feng, Diego Franco Saldana, Richard Samworth, Yichao Wu

Description Variable selection techniques are essential tools for model selection and estimation
in high-dimensional statistical models. Through this publicly available package, we provide
a unified environment to carry out variable selection using iterative sure independence
screening (SIS) and all of its variants in generalized linear models and the Cox proportional
hazards model.
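A minimal sketch of using the CRAN `SIS` package described above (check `?SIS` for the current argument list; the data here are simulated). `family = "binomial"` handles two classes; for the thread's multiclass response, one possible workaround is to screen each one-vs-rest coding separately and take the union of the selected indices.

```r
## Sketch: iterative sure independence screening with the SIS package.
library(SIS)

set.seed(1)
x <- matrix(rnorm(100 * 600), nrow = 100)  # numeric design matrix
y <- rbinom(100, 1, 0.5)                   # binary response for illustration

fit <- SIS(x, y, family = "binomial", penalty = "lasso", tune = "bic")
fit$ix  # indices of the variables retained after screening + selection
```

Categorical predictors would need to be expanded to dummy columns (e.g. via `model.matrix`) before being passed to `SIS`, since it expects a numeric matrix.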
