Weighted least squares regression: why is the first Y in each group changed to -1?

zhouyijoy1988

2013-12-13

Dear all,
Below are the first five rows of my raw data:

NO	IND	YEAR	Y	X1	X2	X3	X4	X5	X6	X7	X8	X9	X10
000892	0002	2000	-0.220610872	0.000000001253039639329720000	0.231898552	0.331953596	0.088139	20.87047503	1.301571724	0.174032405	0.169023479	0.695188711	0.268306
600834	0002	2000	-0.152963157	0.000000002059844887153580000	-0.0245712	0.42852705	0.100523	20.20161689	0.24863549	0.043036409	0.09321455	1.359202284	0.200086
000421	0002	2000	-0.140540342	0.000000001436834614561080000	0.112320096	0.404368262	0.035705	20.53049399	1.067407663	0.180926955	0.021184526	0.421314133	0.452166
000544	0002	2000	-0.13827197	0.000000000779274879521076000	0.272756108	0.27706574	-0.000384	20.96651651	1.928423226	0.41110824	-0.005036504	-0.012728888	0.488551
600662	0002	2000	-0.135518148	0.000000000825818041234081000	0.083971604	0.35501825	0.075335	21.0482187	0.721423604	0.183097691	0.202827047	0.099186507	0.465287

I then used SAS to run a weighted least squares regression separately for each industry (IND) and year (YEAR), with the following code:
** run the original regression to get the residuals**;
proc reg data=data noprint;
  model y=x1-x10;
  by IND YEAR;
  output out=WORK.PRED r=residual;
run;

** compute the absolute residuals**;
data work.resid;
  set work.pred;
  absresid=abs(residual);
run;

** regress the absolute residuals on X (no P= on OUTPUT, so fitted values are not saved)**;
proc reg data=work.resid noprint;
  model absresid=x1-x10;
  by IND YEAR;
  output out=WORK.s_weights;
run;

** compute the weights as the reciprocal of the absolute first-stage residual**;
data work.s_weights;
  set work.s_weights;
  s_weight=1/(abs(residual));
  label s_weight = "weights using absolute residuals";
run;

** run the weighted least squares regression with those weights**;
proc reg data=work.s_weights outest=coef;
  weight s_weight;
  model y=x1-x10;
  by IND YEAR;
run;
quit;

** rename the coefficient columns so they do not clash with x1-x10**;
data coef;
  set coef;
  rename x1=a x2=b x3=c x4=d x5=e x6=f x7=g x8=h x9=i x10=j;
run;

** merge the coefficients back and compute the residual by hand**;
data reg;
  merge data coef;
  by IND YEAR;
  wr=y-Intercept-a*x1-b*x2-c*x3-d*x4-e*x5-f*x6-g*x7-h*x8-i*x9-j*x10;
run;

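In the final residual output table I noticed that, for every industry and every year, the Y value of the first observation is changed to -1 by SAS, whatever its original value was. As a result, the residual computed for that observation is wrong.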

NO	IND	YEAR	Y	X1	X2	X3	X4	X5	X6	X7	X8	X9	X10	wr
892	2	2000	-1	1.25E-09	0.231899	0.331954	0.088139	20.87048	1.301572	0.174032	0.169023	0.695189	0.268306	-0.93322
600834	2	2000	-0.15296	2.06E-09	-0.02457	0.428527	0.100523	20.20162	0.248635	0.043036	0.093215	1.359202	0.200086	-0.11876
421	2	2000	-0.14054	1.44E-09	0.11232	0.404368	0.035705	20.53049	1.067408	0.180927	0.021185	0.421314	0.452166	-0.09756
544	2	2000	-0.13827	7.79E-10	0.272756	0.277066	-0.00038	20.96652	1.928423	0.411108	-0.00504	-0.01273	0.488551	-0.08169
600662	2	2000	-0.13552	8.26E-10	0.083972	0.355018	0.075335	21.04822	0.721424	0.183098	0.202827	0.099187	0.465287	-0.11534

NO	IND	YEAR	Y	X1	X2	X3	X4	X5	X6	X7	X8	X9	X10	wr
892	2	2000	-1	1.25E-09	0.231899	0.331954	0.088139	20.87048	1.301572	0.174032	0.169023	0.695189	0.268306	-0.93322
600880	2	2001	-1	2.51E-09	0.424173	0.178917	0.079324	19.74005	0.610613	0.236453	0.087196	-0.26891	0.72113	-0.95491
600646	2	2002	-1	1.19E-09	0.78086	0.012734	-1.41708	18.3805	2.474379	0.075539	-5.95578	-0.17564	0.01696	-0.72805
600899	2	2003	-1	1.34E-09	0.176978	0.037479	-0.52829	19.91676	1.164224	-0.84587	-0.92142	0.204259	0.013242	-0.75888
600897	2	2004	-1	7.93E-10	0.216053	0.717263	0.063238	21.04809	0.061314	0.375615	-0.00278	0.865155	0.19876	-0.93913

Could anyone help me diagnose this? Many thanks!

All replies

peipei0805

2014-10-15 15:26:08

Add me on QQ, I am looking into this too: 379022704

jingju11

2014-10-15 19:35:23

The cause is not complicated. In the data set written by OUTEST=, the dependent variable (Y) is set to -1 (the SAS default). When the two data sets are merged, Y from the first data set 'data' is overwritten by Y from the second data set 'coef', because they share the variable name Y. As you said, that eventually produces a wrong residual. You can modify the merge step so that the two Y variables no longer collide, or simply drop y from coef.
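The code blocks that accompanied this reply were not preserved. As a minimal sketch of the problem and of the "drop y" fix (not the reply's original code), the toy example below uses made-up values and hypothetical data set names (work.demo, work.est, work.bad, work.good). It also shows why only the first row of each BY group is affected: in a one-to-many match-merge, the single row of estimates is read only on the first observation of the group, so its Y (-1) overwrites the original there; on later observations Y is read from the original data again.

```sas
/* Toy data: two BY groups, three observations each (values are made up). */
data work.demo;
  input grp y x;
  datalines;
1 0.50 1
1 0.70 2
1 0.90 3
2 1.10 1
2 1.40 2
2 1.80 3
;
run;

/* One row of estimates per BY group. OUTEST= stores -1 in the column
   named after the dependent variable (here: y). */
proc reg data=work.demo outest=work.est noprint;
  model y = x;
  by grp;
run;
quit;

/* Problem: both data sets carry Y, and work.est is listed last,
   so its Y (-1) wins on the first observation of each group. */
data work.bad;
  merge work.demo work.est(rename=(x=b));
  by grp;
  resid = y - Intercept - b*x;
run;

/* Fix: drop Y from the estimates before merging. */
data work.good;
  merge work.demo work.est(drop=y rename=(x=b));
  by grp;
  resid = y - Intercept - b*x;
run;
```

In work.bad the first observation of each grp should show y = -1, while work.good keeps the original values. Applied to the code in the question, the same idea would read `merge data coef(drop=y);`.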

Jingju
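
An alternative that avoids the merge altogether, sketched here under the assumption that work.s_weights (including s_weight) has already been built as in the question: the OUTPUT statement of the weighted PROC REG can write the residuals directly. The output data set name work.wls is hypothetical, and the r= value is, as far as I know, actual minus predicted, the same quantity as the hand-computed wr.

```sas
/* Assumes work.s_weights exists and is sorted by IND YEAR, as in the question. */
proc reg data=work.s_weights noprint;
  weight s_weight;
  model y = x1-x10;
  by IND YEAR;
  /* r= stores the residual of the weighted fit for each observation */
  output out=work.wls r=wr;
run;
quit;
```

Because no coefficient table is merged back, the original Y values are never touched.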
