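Let me give it a try too. @邓贵大's code is excellent! But I wasn't sure whether two people of the same age with different income levels would cause a problem, so I wrote a macro. It's a modest attempt. The test data is generated with @邓贵大's code.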
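To pin down what is being computed (my own formalization: $a_i$ is the age and $y_i$ the income of person $i$, and $\delta$ is the macro's `delta` parameter), each person is assigned the mean income of everyone whose age is within $\delta$ years of theirs, themselves included:

$$\bar{y}_i = \frac{\sum_{k:\,|a_k - a_i| \le \delta} y_k}{\left|\{\,k : |a_k - a_i| \le \delta\,\}\right|}$$

Every person in the window contributes separately, so people of the same age with different incomes are all counted rather than collapsed into one record.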
** Generate a test dataset;
data test;
    do id = 1 to 100;
        age = 18 + ceil(60*ranuni(12345));
        income = 100000 + 20000*rannor(12345);
        output;
    end;
run;
* Create a one-row placeholder dataset (both variables missing),
  used below to initialize the result table;
data one;
    input avg_income age;
    cards;
. .
;
run;
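** Note: max_age only needs to be at least the largest age in the data. delta is the half-width of the age window, so for age j the window is j-delta to j+delta;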
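%macro est(max_age, delta);
    %local j;
    /* Initialize the result table with the placeholder row */
    data age_mean_income;
        set one;
    run;
    /* For each age j, average the income of everyone aged j-delta to j+delta */
    %do j = 1 %to &max_age.;
        data temp;
            set test;
            where &j.-&delta. <= age <= &j.+&delta.;
        run;
        proc means data=temp noprint mean;
            var income;
            output out=mean_income mean=avg_income;
        run;
        /* Tag the window mean with its center age, drop PROC MEANS bookkeeping */
        data mean_income;
            set mean_income;
            age = &j.;
            drop _type_ _freq_;
        run;
        proc append base=age_mean_income data=mean_income;
        run;
    %end;
    /* Both tables must be sorted by age for the MERGE below */
    proc sort data=age_mean_income;
        by age;
    run;
    proc sort data=test;
        by age;
    run;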
    /* Attach the window mean to every person; keep only rows that come from TEST */
    data income;
        merge age_mean_income test(in=a);
        by age;
        if a;
    run;
%mend;
%est(100,2);
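For checking results outside SAS, here is a minimal Python sketch of the same logic as the `%est` macro (the function name `window_mean_income` is hypothetical, not part of the original code). It computes one mean per distinct age and then attaches it to each person by age, mirroring the PROC MEANS loop and the final MERGE. Unlike the macro, it only visits ages that actually occur in the data, which yields the same value for every person.

```python
def window_mean_income(ages, incomes, delta):
    """For each person, return the mean income of everyone (self included)
    whose age lies in [age - delta, age + delta]. Brute-force sketch."""
    mean_by_age = {}
    # One window mean per distinct age (the macro's %DO loop + PROC MEANS).
    for a in set(ages):
        window = [y for b, y in zip(ages, incomes) if a - delta <= b <= a + delta]
        # The window always contains the people aged exactly a, so len > 0.
        mean_by_age[a] = sum(window) / len(window)
    # Attach each age's mean back to every person (the final MERGE BY age).
    return [mean_by_age[a] for a in ages]
```

For example, `window_mean_income([30, 30], [10, 20], 0)` returns `[15.0, 15.0]`: both same-age people enter the average, which is exactly the tie case raised at the top of the post.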
I used a DATA step with arrays to solve the problem. Here age has to be a positive integer. Because no table merge is involved, it runs very fast. How large a dataset the code can handle depends on your data structure. I ran it on a dataset of 7 million records with 100 distinct BY levels, ages from 1 to 101 (uniformly distributed), and 2 analysis variables, and it finished in about 12 s.
Jingju
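The array code itself is not shown, so the following is only a guess at the idea, sketched in Python (the function name `window_mean_prefix` is hypothetical). Because ages are positive integers, per-age sums and counts can live in arrays indexed by age; cumulative (prefix) sums then give any window total as a difference of two entries, $P[\min(a+\delta, A)] - P[\max(a-\delta, 1) - 1]$ with $A$ the largest age. That is one pass to accumulate, one pass over the ages, and one pass to output, with no join. The sketch handles a single group; with BY groups, as in the 7-million-record run, the arrays would be rebuilt for each BY level.

```python
def window_mean_prefix(ages, incomes, delta):
    """Mean income within +/- delta years of each person's age, computed with
    per-age arrays and prefix sums instead of a merge. Ages must be positive ints."""
    if not ages or any(not isinstance(a, int) or a < 1 for a in ages):
        raise ValueError("ages must be a non-empty list of positive integers")
    max_age = max(ages)
    # Per-age totals; slot 0 is unused so that the index equals the age.
    tot = [0.0] * (max_age + 1)
    cnt = [0] * (max_age + 1)
    for a, y in zip(ages, incomes):
        tot[a] += y
        cnt[a] += 1
    # Prefix sums: p_tot[k] = tot[1] + ... + tot[k] (same for counts).
    p_tot = [0.0] * (max_age + 1)
    p_cnt = [0] * (max_age + 1)
    for k in range(1, max_age + 1):
        p_tot[k] = p_tot[k - 1] + tot[k]
        p_cnt[k] = p_cnt[k - 1] + cnt[k]
    result = []
    for a in ages:
        hi = min(a + delta, max_age)  # clip the window's upper end to the array
        lo = max(a - delta, 1) - 1    # prefix index just below the window
        result.append((p_tot[hi] - p_tot[lo]) / (p_cnt[hi] - p_cnt[lo]))
    return result
```

The integer restriction matters because age is used directly as an array index; that lookup is what replaces the per-age PROC MEANS loop and the MERGE.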