Heckman Two stages

2012-05-13

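I found a Heckman two-stage macro on the official SAS website, at http://support.sas.com/kb/24/979.html . The original code is also attached here. The last part, /***** Estimate Consistent Standard Errors of OLS Stage *****/, which uses proc iml, throws errors. I am running SAS 9.3. That section of the original code and the errors it produces are shown below. Is this a version incompatibility? How should I adjust the code? Thanks!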

Attachment: Heckman.txt (9.46 KB)

。。。。
。。。。
/***** Estimate Consistent Standard Errors of OLS Stage *****/
title2 'Consistent Estimates of Standard Errors for Second Stage (OLS)';
proc iml;

/* First, calculate asymptotic variance-covariance matrix of
the probit estimates.  SAS isn't very friendly and doesn't
allow us to save them from the probit estimation.
Be sure to check these estimates against those produced by
the probit procedure above.

See Greene, Econometric Analysis, pp. 677-678 for formulae.  */

use  x;
read all var _all_ into x;

use h;
read all var _all_ into h;
k=ncol(x);
n=nrow(h);
invsig=J(k,k,0);
do i= 1 to n;
invsig=invsig+J(k,k,h[i,])#(x[i,]`*x[i,]);
end;
sig=inv(invsig);

prbtnm={INTERCEP %prbtrhs};
print,"Asymptotic Variance-Covariance Matrix",
   "of First Stage (Probit) Coefficients",
   sig[r=prbtnm c=prbtnm format=12.6];

free x h invsig;
/* Now estimate the selection-corrected standard error for the Second
Stage (OLS) */
/* Get estimate of coefficient on lambda from olsest, the dataset
containing the ols estimates; SAS is nice in that we don't need
to keep track of which element of the beta vector has the coefficient,
since it's named. */
use olsest;
read all var{lambda} into theta;

/* deltas */
use delta var{delta};
read all var{delta} into delta;

deltabar=sum(delta)/nrow(delta);

/* residuals */
use err var{e};
read all var{e} into e;

/* calculate adjusted standard error */
sigsqe=e`*e/nrow(e)+theta**2*deltabar;
sige=sqrt(sigsqe);
print,"Standard Error of Second Stage (OLS)",
   "Corrected for Selection",
   sige[format=12.4];

numrowe=nrow(e);
free e ;

/* calculate rho squared */
rhosq=theta**2/sigsqe;
rho=(theta/abs(theta))*sqrt(rhosq);
print,"Correlation of Disturbance in Regression",
   "and Selection Criterion (Rho)",rho[format=8.4];

use xstar;
read all var _all_ into xstar;
use w;
read all var _all_ into w;

/* Calculate Consistent Standard Errors
See Greene, Econometric Analysis, pp. 744-747 for formulae */
delcol=delta;
do i=1 to ncol(w)-1;
delcol=delcol||delta;
end;
cdeltaw=delcol#w;

free delcol;

delcol=delta;
do i=1 to ncol(xstar)-1;
delcol=delcol||delta;
end;
cdeltaxs=delcol#xstar;

free delcol;

/****    Version 1.3 (January 1993)    ****/
/* cdeltaw=capdelta*w, cdeltaxs=capdelta*xstar */
/* where capdelta=diag(delta).  capdelta is */
/* n x n, whereas cdeltaw is n x ncol(w) and */
/* cdeltaxs is n x ncol(xstar).  This       */
/* reduces memory use.                      */
/***********************************************/

Q=rhosq*(xstar`*cdeltaw)*sig*(w`*cdeltaxs);

Irhosqd=1-rhosq*delta;
delcol=Irhosqd;

free delta;

do i=1 to ncol(xstar)-1;
delcol=delcol||Irhosqd;
end;
Irsdltxs=delcol#xstar;

free Irhosqd delcol;

/****          Version 1.3 (January 1993)          ****/
/* Irsdltxs=(ident(nrow(capdelta))-rhosq*capdelta)*xstar */
/* again, this is an n x ncol(xstar) matrix, rather than  */
/* needing capdelta, which is n x n.                   */
/**********************************************************/

asyvcov=sigsqe*inv(xstar`*xstar)*
      (xstar`*Irsdltxs+ Q)*
      inv(xstar`*xstar);

olsnm={INTERCEPT %olsrhs LAMBDA};
print ,"Consistent Asymptotic Covariance Matrix of Estimates",
   "in Second Stage (OLS)",asyvcov[r=olsnm c=olsnm format=12.6];

asyse=sqrt(vecdiag(asyvcov));

use olsest;
read all var{INTERCEPT %olsrhs LAMBDA} into coeff;
variable=coeff`||asyse||(coeff`/asyse)||
      2*(1-probt(abs(coeff`/asyse),numrowe-nrow(coeff)));
colnm={"Coeff." "Std. Err." "T-Ratio" "P Value"};
print ,,,"Parameter Estimates and ",
   "Consistent Asymptotic Standard Errors of Estimates",
   "in Second Stage (OLS)",variable[r=olsnm c=colnm format=12.4];

quit;

The errors are as follows:
NOTE: IML Ready
ERROR: (execution) Invalid argument or operand; contains missing values.

operation : `* at line 2584 column 59
operands  : _TEM1003, _TEM1005
_TEM1003    1 row    27 cols (numeric)
_TEM1005    1 row    27 cols (numeric)

statement : ASSIGN at line 2584 column 25
ERROR: (execution) Matrix should be non-singular.

operation : INV at line 2584 column 80
operands  : invsig
invsig    27 rows    27 cols (numeric)

statement : ASSIGN at line 2584 column 73
ERROR: Matrix sig has not been set to a value.

statement : PRINT at line 2584 column 119
ERROR: (execution) Invalid argument or operand; contains missing values.

operation : `* at line 2588 column 210
operands  : e, e
e 1191 rows    1 col    (numeric)
e 1191 rows    1 col    (numeric)

statement : ASSIGN at line 2588 column 201
ERROR: (execution) Matrix has not been set to a value.

operation : SQRT at line 2588 column 249
operands  : sigsqe

sigsqe    0 row    0 col    (type ?, size 0)

statement : ASSIGN at line 2588 column 240
ERROR: Matrix sige has not been set to a value.

statement : PRINT at line 2590 column 4
ERROR: (execution) Matrix has not been set to a value.

operation : / at line 2590 column 151
operands  : _TEM1001, sigsqe

_TEM1001    1 row    1 col    (numeric)

7425.3869

sigsqe    0 row    0 col    (type ?, size 0)

statement : ASSIGN at line 2590 column 137
ERROR: (execution) Matrix has not been set to a value.

operation : SQRT at line 2590 column 187
operands  : rhosq

rhosq    0 row    0 col    (type ?, size 0)

statement : ASSIGN at line 2590 column 160
ERROR: Matrix rho has not been set to a value.

statement : PRINT at line 2590 column 196
ERROR: (execution) Invalid argument or operand; contains missing values.

operation : `* at line 2592 column 111
operands  : xstar, cdeltaw
xstar 1191 rows    29 cols (numeric)
cdeltaw 1191 rows    27 cols (numeric)

statement : ASSIGN at line 2592 column 96
ERROR: (execution) Matrix has not been set to a value.

operation : * at line 2592 column 156
operands  : rhosq, delta

rhosq    0 row    0 col    (type ?, size 0)

delta 1191 rows    1 col    (numeric)

statement : ASSIGN at line 2592 column 141
ERROR: (execution) Matrix has not been set to a value.

operation : MOVE at line 2592 column 164
operands  : Irhosqd

Irhosqd    0 row    0 col    (type ?, size 0)

statement : ASSIGN at line 2592 column 164
ERROR: (execution) Matrix has not been set to a value.

operation : || at line 2592 column 235
operands  : delcol, Irhosqd

delcol    0 row    0 col    (type ?, size 0)

Irhosqd    0 row    0 col    (type ?, size 0)

statement : ASSIGN at line 2592 column 222
ERROR: (execution) Matrix has not been set to a value.

operation : # at line 2594 column 17
operands  : delcol, xstar

delcol    0 row    0 col    (type ?, size 0)

xstar 1191 rows    29 cols (numeric)

statement : ASSIGN at line 2594 column 2
ERROR: (execution) Invalid argument or operand; contains missing values.

operation : `* at line 2594 column 79
operands  : xstar, xstar
xstar 1191 rows    29 cols (numeric)
xstar 1191 rows    29 cols (numeric)

statement : ASSIGN at line 2594 column 54
ERROR: Matrix asyvcov has not been set to a value.

statement : PRINT at line 2594 column 180
ERROR: (execution) Matrix has not been set to a value.

operation : VECDIAG at line 2598 column 90
operands  : asyvcov

asyvcov    0 row    0 col    (type ?, size 0)

statement : ASSIGN at line 2598 column 72
ERROR: (execution) Matrix has not been set to a value.

operation : / at line 2598 column 197
operands  : _TEM1003, asyse
_TEM1003    29 rows    1 col    (numeric)

asyse    0 row    0 col    (type ?, size 0)

statement : ASSIGN at line 2598 column 166
ERROR: Matrix variable has not been set to a value.

statement : PRINT at line 2602 column 66
NOTE: Exiting IML.

All replies

bobguy

2012-5-13 21:16:35

Forget this macro. It is simpler to write your own code or to use PROC QLIM in SAS/ETS.
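
If you still want to debug the macro: the first error in the log complains about missing values in the product of x[i,] with its transpose, and most of the later errors simply cascade from matrices that were never assigned. That points to missing values in the data sets the IML step reads rather than to a version problem. A quick check, as a sketch only, assuming the data sets x, h, err and xstar named in the macro's USE statements are still in WORK after the failed run:

```sas
/* Count missing values in each data set the IML step reads.  */
/* Data set names are taken from the USE statements in the    */
/* macro; that they are still in WORK is an assumption.       */
proc means data=x     n nmiss; run;
proc means data=h     n nmiss; run;
proc means data=err   n nmiss; var e; run;
proc means data=xstar n nmiss; run;
```

Any nonzero NMISS count would explain the log. One plausible remedy, untested against this macro, is to drop observations with missing values in the model variables before calling it.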

bobguy

2012-5-13 22:13:35

Here is a simple way to fit Heckman's two-step sample selection model.

1) First step: a probit regression for selection.
2) Second step: the usual regression with one additional variable, the inverse Mills ratio, to correct for the selection bias.
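
In symbols, the two steps correspond to the standard textbook formulation below. The notation (gamma, beta, rho, sigma, lambda) is introduced here for illustration and does not come from the post, so read it as a sketch of the usual setup rather than a description of any particular macro:

```latex
\begin{aligned}
z_i &= \mathbf{1}\{\gamma' w_i + u_i > 0\}, \\
y_i &= \beta' x_i + \varepsilon_i \quad \text{(observed only when } z_i = 1\text{)}, \\
(u_i, \varepsilon_i) &\sim N\!\left(0,\ \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{pmatrix}\right), \\
E[y_i \mid x_i, w_i, z_i = 1] &= \beta' x_i + \rho\sigma\,\lambda(\gamma' w_i),
\qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)} .
\end{aligned}
```

Step 1 estimates gamma by probit. Step 2 regresses y on x and the estimated inverse Mills ratio, so the coefficient on that extra regressor estimates rho times sigma.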

%let rho=0.8;
%let size=10000;

data binormal;
rho=&rho;
a1=sqrt((1+rho)/2);
a2=sqrt((1-rho)/2);
do i=1 to &size;
   rd1=rannor(12390);
   rd2=rannor(12390);
   e1=a1*rd1+a2*rd2;
   e2=a1*rd1-a2*rd2;
   output;
end;
run;

*verify the sample data - corr(e1,e2) should be near the rho set above;
proc corr data=binormal;
var e1 e2;
run;
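
Why this construction gives the intended correlation, assuming rd1 and rd2 behave as independent standard normal draws (which is what rannor is meant to return):

```latex
\begin{aligned}
\operatorname{Var}(e_1) &= a_1^2 + a_2^2 = \tfrac{1+\rho}{2} + \tfrac{1-\rho}{2} = 1,
\qquad \operatorname{Var}(e_2) = a_1^2 + a_2^2 = 1, \\
\operatorname{Cov}(e_1, e_2) &= a_1^2 - a_2^2 = \tfrac{1+\rho}{2} - \tfrac{1-\rho}{2} = \rho .
\end{aligned}
```

So PROC CORR should report a sample correlation between e1 and e2 close to 0.8 at this sample size.
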

data simu_data;
set binormal;
*participation eq - z=1 when 1+2w exceeds e1;
w=rannor(12340);
z=(1+2*w > e1);
*y is observed only when z=1;
x=rannor(12340);
if z=1 then y=3+3*x+e2;
else y=.;
*err=0;
run;

title 'selection biased results with OLS<<<';
proc reg data=simu_data;
model y=x;
where y ne .;
run;
quit;

title '2-step approach 1-probit model 2-inverse mills ratio';
proc logistic data=simu_data desc;
model z=w/link=probit;
output out=simu_data2 xbeta=xbeta;
run;

*calculate inverse mills ratio;
data simu_data2;
set simu_data2;
imr=pdf('NORMAL',xbeta)/cdf('NORMAL',xbeta);
run;

proc reg data=simu_data2;
model y=x imr;
where y ne .;
run;
quit;
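
What to expect from the runs above, given how simu_data is built. Writing c = 1 + 2w (shorthand introduced here), selection is z = 1 exactly when e1 < c, and y carries the error e2, which has correlation rho with e1:

```latex
\begin{aligned}
E[y \mid x, w, z = 1] &= 3 + 3x + E[e_2 \mid e_1 < c] \\
&= 3 + 3x + \rho\,E[e_1 \mid e_1 < c] \\
&= 3 + 3x - \rho\,\frac{\phi(c)}{\Phi(c)},
\qquad c = 1 + 2w,\ \rho = 0.8 .
\end{aligned}
```

If this reading is right, the probit step should recover an intercept near 1 and a slope near 2 on w, and the second regression an intercept near 3, a slope near 3 on x, and a coefficient near -0.8 on imr. Because x is drawn independently of w in this simulation, the selection bias in the uncorrected regression should show up mainly in the intercept rather than in the slope on x.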

edwardzxf

2012-5-13 22:48:57

bobguy posted on 2012-5-13 22:13
Here is a simple way for heckman's 2 step sample selection model.

1) first step probit regression ...

Thanks, your approach is easy to apply. However, one improvement I would like to make is to estimate consistent standard errors for the OLS stage, which your code does not account for. It would be great if you could work that out. Thanks again.

爱萌

2012-5-13 23:31:46

Thank you both, I learned something new again, haha.

bobguy

2012-5-14 01:12:18

edwardzxf posted on 2012-5-13 22:48
thanks, your approach is easy to apply,however, one improvement that I want to make is to estimate ...

For this problem, maximum likelihood estimation is readily available.

title 'results from heckman approaches ---QLIM';
proc qlim data=simu_data;
   model z = w /discrete (d=normal);
   model y = x / select(z=1);
run;

title 'results from heckman approaches ---nlmixed';
proc nlmixed data=simu_data;
   bounds s > 0, -1 < r < 1;
   parms a=2 b=2 c=1 d=1 s=1 r=0.5;
   *selection function;
   xbeta=c+d*w;
   p=probnorm(xbeta);
   *not selected - only the probit part contributes;
   if z=0 then l=log(1-p);
   else if z=1 then do;
      *outcome residual;
      e=y-(a+b*x);
      *normal density of e with std dev s;
      l2=(1/(sqrt(2*3.1415927)*s))*exp(-(e**2)/(2*s**2));
      *prob of selection given e, with r the error correlation;
      l3=probnorm((xbeta+r*e/s)/sqrt(1-r**2));
      l=log(l2)+log(l3);
   end;
   model z ~ general(l);
run;
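
The NLMIXED program above codes the per-observation log-likelihood of the selection model by hand. Written out in the program's own parameter names (a, b, c, d, s, r), and assuming the code has been read correctly, it is:

```latex
\ell_i =
\begin{cases}
\log\bigl(1 - \Phi(c + d\,w_i)\bigr), & z_i = 0, \\[6pt]
\log\!\left[\dfrac{1}{s}\,\phi\!\left(\dfrac{e_i}{s}\right)\right]
+ \log \Phi\!\left(\dfrac{c + d\,w_i + r\,e_i/s}{\sqrt{1 - r^2}}\right), & z_i = 1,
\end{cases}
\qquad e_i = y_i - (a + b\,x_i).
```

Here r is the correlation between the selection and outcome disturbances. In the simulated data it should come out near -0.8 rather than 0.8, since z = 1 when e1 is small while y carries e2.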

Heckman proposed the two-step method in the 1970s, when computing power was far more limited than it is today.

Other estimators are available as well. I need to dig up my notes.
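
One option that may also cover the earlier request for consistent second-stage standard errors: later SAS/ETS releases document a HECKIT option on the PROC QLIM statement that runs Heckman's two-step estimator directly. Whether it is available in your installation is an assumption (as far as I know it arrived after the original 9.3 release), so check the PROC QLIM documentation for your version before relying on this sketch:

```sas
/* Heckman two-step via PROC QLIM, same two equations as the MLE run above. */
/* Assumes the HECKIT option exists in the installed SAS/ETS release.       */
title 'results from heckman approaches ---QLIM two-step';
proc qlim data=simu_data heckit;
   model z = w / discrete(d=normal);   /* selection (probit) equation          */
   model y = x / select(z=1);          /* outcome equation, observed when z=1  */
run;
```

If the option is not recognized, the maximum likelihood runs above remain the fallback.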
