How to run stepwise regression on multiple variables in SAS

2011-07-14

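My data set has 1153 variables, and the last 1151 of them are the ones I want to analyze. I want to run a stepwise regression for each variable in turn: with p1 as y, the remaining p2-p1151 are the candidate x variables; after p1, p2 becomes y, with p1 and p3-p1151 as the candidate x variables, and so on. Each variable has at most 30 observations while there are more than a thousand variables, so I would like to cap the number of x variables finally kept in the equation (for example, at 10).
Not every variable has all 30 values; some are missing. So my plan is to build a stepwise regression equation for each variable and then fill in the missing values with the values predicted by that equation.
I think this needs a loop, and I would like the parameters for all variables to end up in one table and the predicted values for all variables in another.
I tried writing it myself, but when I run it I get a macro interpretation error.
I hope someone can help me fix it. I don't know much about macros and loops, so complete statements would be much appreciated. The attachment is the data I mentioned.
Thank you very much!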

%macro reg1(i,j) ;
%do i = 2 %to 1151 ;
   j =1 %to (i-1) and j = (i+1) %to 1151;
   proc reg data = p_stepwise outest = p_reg_log&j noprint;
         model p&i = p1 - p&j  / edf selection = stepwise slstay = 0.10;
            output out= p_stepwise_reg p = pre&i;
   run;
%end;
%mend regl;
%regl;

Attachment: p.txt (248.52 KB)

All replies

suzhzh

2011-7-15 12:36:16

Interesting question and hope somebody can solve this

redaring

2011-7-15 13:44:42

Help, please!

redaring

2011-7-15 17:54:13

up!up!
Help me, please!
Thanks!

bobguy

2011-7-16 10:10:07

Replying to redaring's post of 2011-7-14 22:19:
This is a really good example of spurious regression.

In the example below, p1-p50 are IID (there is no relationship among them at all). Yet selecting 5 regressors out of 49 still produces some statistically "meaningful" results. Remember that choosing 5 out of 49 gives 49*48*47*46*45/(5*4*3*2*1) = 1,906,884 possible regressions. If each one has a 1/1000 chance of showing a "meaningful" result, about 1,900 of them will.
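The count above can be checked with a short DATA step. This is only a sketch: the variable names n_models and expected are made up for illustration, and the 1/1000 rate is the hypothetical figure from the post, not an estimate.

```sas
data _null_;
  /* number of ways to choose 5 regressors out of 49 candidates:
     49*48*47*46*45 / (5*4*3*2*1) */
  n_models = comb(49, 5);
  /* expected number of "meaningful"-looking fits if each subset has
     a 1-in-1000 chance of looking significant */
  expected = n_models / 1000;
  /* write both values to the SAS log */
  put n_models= expected=;
run;
```

Since 49 choose 5 is 1,906,884, the expected count comes to roughly 1,900 spurious fits.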

Second, given the number of variables, the computing time would be enormous.

The macro coding is not hard. But the whole thing is meaningless. Do you still want to do it?

                         The REG Procedure
                           Model: MODEL1
                      Dependent Variable: p1

             Number of Observations Read          30
             Number of Observations Used          30

                        Analysis of Variance

                               Sum of        Mean
Source             DF         Squares      Square    F Value     Pr > F
Model               5        17.58377     3.51675       9.51    0.00004
Error              24         8.87974     0.36999
Corrected Total    29        26.46351

Root MSE            0.60827    R-Square    0.6645
Dependent Mean      0.10530    Adj R-Sq    0.5945
Coeff Var         577.63000

                        Parameter Estimates

                     Parameter     Standard
Variable     DF       Estimate        Error    t Value    Pr > |t|
Intercept     1       -0.17922      0.12820      -1.40     0.17491
p8            1        0.65484      0.11577       5.66     <.00001
p20           1       -0.50819      0.13802      -3.68     0.00117
p25           1        0.27015      0.10986       2.46     0.02153
p28           1        0.51935      0.12975       4.00     0.00052
p47           1        0.60056      0.13900       4.32     0.00023

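/* Simulate 30 observations of 50 independent standard-normal variables */
data p_stepwise;
  array p(*) p1-p50;
  do i = 1 to 30;
    do j = 1 to dim(p);
      p[j] = rannor(897);
    end;
    output;
  end;
  drop i j;
run;

/* Search subsets of up to 5 regressors for p1, ranked by R-square */
proc reg data = p_stepwise;
  model p1 = p2-p50 / selection = RSQUARE stop = 5;
run;
quit;

/* Refit the 5-variable model from the listing above and save its estimates */
proc reg data = p_stepwise outest = est;
  model p1 = p8 p20 p25 p28 p47;
run;
quit;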
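For completeness, the loop asked for in the original post could be sketched roughly as follows. The original attempt mixes DATA-step-style assignments into macro code, and it defines the macro as reg1 but closes and calls it as regl. This version is an untested sketch, not a recommended analysis: the names reg_all, all_est, all_pred, p_imputed, _est, _pred and depvar are all hypothetical, and using MAXSTEP= to cap the number of selected regressors is an assumption about how to approximate the "at most 10 variables" requirement with SELECTION=STEPWISE.

```sas
/* Sketch only: reg_all, all_est, all_pred, p_imputed, _est, _pred and
   depvar are hypothetical names, not taken from the original post. */
%macro reg_all(n=1151, maxstep=10);
  %local i j;

  /* Start clean: drop accumulators left over from an earlier run */
  proc datasets lib=work nolist;
    delete all_est all_pred;
  quit;

  /* all_pred starts as a copy of the input; predictions are added below */
  data all_pred;
    set p_stepwise;
  run;

  %do i = 1 %to &n;
    proc reg data=p_stepwise outest=_est noprint;
      /* the current variable is the response; all others are candidates */
      model p&i =
        %do j = 1 %to &n;
          %if &j ne &i %then %do; p&j %end;
        %end;
        / selection=stepwise slentry=0.10 slstay=0.10 maxstep=&maxstep;
      output out=_pred(keep=pre&i) p=pre&i;
    run;
    quit;

    /* Tag the estimates with the response name, then stack them */
    data _est;
      length depvar $32;
      set _est;
      depvar = "p&i";
    run;
    proc append base=all_est data=_est force;
    run;

    /* One-to-one merge (no BY statement): rows stay in input order */
    data all_pred;
      merge all_pred _pred;
    run;
  %end;

  /* Replace each missing value with its predicted value, if one exists */
  data p_imputed;
    set all_pred;
    array p{*} p1-p&n;
    array pre{*} pre1-pre&n;
    do k = 1 to dim(p);
      if missing(p{k}) then p{k} = pre{k};
    end;
    drop k pre1-pre&n;
  run;
%mend reg_all;

/* Try it first on the 50-variable simulated data above */
%reg_all(n=50, maxstep=5)
```

Here all_est would hold one row of estimates per response and all_pred the original data plus pre1, pre2, and so on. Note that PROC REG excludes any observation with a missing value in a variable named in the MODEL statement, so with many partly missing variables there may be few or no usable rows. The spurious-regression warning above applies to every one of these fits.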

zkymath

2011-8-4 16:25:26

Reading English is really painful for me, but there's no way around it.
