原数据:
姓名 日期 收入
AAA 200803 5300
AAA 200806 5500
AAA 200809 5800
AAA 2008012 5700
BBB 200806 6400
BBB 2008012 6900
现在假设对AAA来说4,5月收入和3月的一样,7,8月收入和6月的一样,10,11月收入和9月的一样,对BBB来说,7,8,9,10,11月的收入和6月的一样。
如何把这些缺失的月份和收入加进去?最后生成:
姓名 日期 收入
AAA 200803 5300
AAA 200804 5300
AAA 200805 5300
AAA 200806 5500
AAA 200807 5500
AAA 200808 5500
AAA 200809 5800
AAA 200810 5800
AAA 200811 5800
AAA 2008012 5700
BBB 200806 6400
BBB 200806 6400
BBB 200806 6400
BBB 2008012 6900

proc sql;
insert into _datatable_
values()
.......;
run;
proc sort data=_datatable_;
by 姓名 日期 ;
run;
data test;
infile datalines missover;
input name : $3. date : 6. income : 4.;
datalines;
AAA 200803 5300
AAA 200806 5500
AAA 200809 5800
AAA 200812 5700
BBB 200806 6400
BBB 200812 6900
;
run;
proc sort data=test out=testsort;
by name date;
run;
data result;
set testsort;
by name date;
retain ic1 0 dt1 0;
if first.name then
do;
ic1=income;
dt1=date;
output;
end;
if not first.name then
do;
dt2=date;
ic2=income;
output;
do i=dt1 +1 to dt2-1;
income=ic1;
date=i;
output;
end;
ic1=ic2;
dt1=dt2;
end;
keep name date income;
run;
proc sort data=result;
by name date;
run;
proc print;
run;
现在假设再把数据复杂化一些:
1,如果有前一年的数据,比如要将200801 and 200802的数据和200712月的相同;
2,收入的品种增加了,有工资,奖金,津贴
原数据:
姓名 | 日期 | 收入种类 | 金额 |
AAA | 200712 | 工资 | 5000 |
AAA | 200712 | 奖金 | 2000 |
AAA | 200712 | 津贴 | 1000 |
AAA | 200803 | 工资 | 5400 |
AAA | 200806 | 工资 | 5500 |
BBB | 200710 | 工资 | 3500 |
BBB | 200710 | 奖金 | 3600 |
BBB | 200804 | 工资 | 6100 |
仍要把缺失了的月份的值补上。 假设后面的月份和前一次出现的月份的收入种类和金额都相同,
即:对AAA来说,200801 and 200802的数据和200712的相同;200804 and 200805的数据和200803的相同(收入种类和金额均相同);对BBB,200711, 200712, 2009801, 200802, 200803 的数据和200710的相同。 如何把这些缺失的值加进去?
望高手指点!希望变成这样的数据:
姓名 | 日期 | 收入种类 | 金额 |
AAA | 200712 | 工资 | 5000 |
AAA | 200712 | 奖金 | 2000 |
AAA | 200712 | 津贴 | 1000 |
AAA | 200801 | 工资 | 5000 |
AAA | 200801 | 奖金 | 2000 |
AAA | 200801 | 津贴 | 1000 |
AAA | 200802 | 工资 | 5000 |
AAA | 200802 | 奖金 | 2000 |
AAA | 200802 | 津贴 | 1000 |
AAA | 200803 | 工资 | 5400 |
AAA | 200804 | 工资 | 5400 |
AAA | 200805 | 工资 | 5400 |
AAA | 200806 | 工资 | 5500 |
BBB | 200710 | 工资 | 3500 |
BBB | 200710 | 奖金 | 3600 |
BBB | 200711 | 工资 | 3500 |
BBB | 200711 | 奖金 | 3600 |
BBB | 200712 | 工资 | 3500 |
BBB | 200712 | 奖金 | 3600 |
BBB | 200801 | 工资 | 3500 |
BBB | 200801 | 奖金 | 3600 |
BBB | 200802 | 工资 | 3500 |
BBB | 200802 | 奖金 | 3600 |
BBB | 200803 | 工资 | 3500 |
BBB | 200803 | 奖金 | 3600 |
BBB | 200804 | 工资 | 6100 |
希望四楼的zhitler再次给与帮助!多谢了!
data origin;
length item $10;
input name $ date : yymmn6. item $ money;
cards;
AAA 200712 salary 5000
AAA 200712 bonus 2000
AAA 200712 allowance 1000
AAA 200803 salary 5400
AAA 200806 salary 5500
BBB 200710 salary 3500
BBB 200710 bonus 3600
BBB 200804 salary 6100
;
proc sort data=origin;
by name descending date;
run;
data result;
set origin;
by name descending date;
nextdate=lag(date);
if first.name then nextdate=.;
if date and nextdate then datedif=datdif(date,nextdate,'30/360');
retain dif;
if first.date then dif=datedif;
if dif then i=dif/30;
if dif le 1 or dif=. then output;
else do j=0 to i-1;
date=date+30;
output;
end;
format date yymmn6.;
keep name date item money;
run;
proc sort data=result;
by name date;
run;
The result:
Obs name item date money
1 AAA salary 200712 5000
2 AAA bonus 200712 2000
3 AAA allowance 200712 1000
4 AAA salary 200801 5000
5 AAA bonus 200801 2000
6 AAA allowance 200801 1000
7 AAA salary 200802 5000
8 AAA bonus 200802 2000
9 AAA allowance 200802 1000
10 AAA salary 200803 5400
11 AAA salary 200804 5400
12 AAA salary 200805 5400
13 AAA salary 200806 5500
14 BBB salary 200710 3500
15 BBB bonus 200710 3600
16 BBB salary 200711 3500
17 BBB bonus 200711 3600
18 BBB salary 200712 3500
19 BBB bonus 200712 3600
20 BBB salary 200801 3500
21 BBB bonus 200801 3600
22 BBB salary 200802 3500
23 BBB bonus 200802 3600
24 BBB salary 200803 3500
25 BBB bonus 200803 3600
26 BBB salary 200804 6100
data origin;
length item $10;
input name $ date : yymmn6. item $ money;
cards;
AAA 200712 salary 5000
AAA 200712 bonus 2000
AAA 200712 allowance 1000
AAA 200803 salary 5400
AAA 200806 salary 5500
BBB 200710 salary 3500
BBB 200710 bonus 3600
BBB 200804 salary 6100
;
proc sort data=origin;
by name descending date;
run;
data result;
set origin;
by name descending date;
nextdate=lag(date);
if first.name then nextdate=.;
if date and nextdate then datedif=datdif(date,nextdate,'30/360');
retain dif;
if first.date then dif=datedif;
if dif then i=dif/30;
if dif le 1 or dif=. then output;
else do j=0 to i-1;
date=date+30;
output;
end;
format date yymmn6.;
keep name date item money;
run;
proc sort data=result;
by name date;
run;

data origin;
length item $10;
input name $ date : yymmn6. item $ money;
cards;
AAA 200712 salary 5000
AAA 200712 bonus 2000
AAA 200712 allowance 1000
AAA 200803 salary 5400
AAA 200806 salary 5500
BBB 200710 salary 3500
BBB 200710 bonus 3600
BBB 200804 salary 6100
;
proc sort data=origin;
by name descending date;
run;
data result;
set origin;
by name descending date;
nextdate=lag(date);
if first.name then nextdate=.;
if date and nextdate then datedif=datdif(date,nextdate,'30/360');
retain dif;
if first.date then dif=datedif;
if dif then i=dif/30;
if dif le 1 or dif=. then output;
else do j=0 to i-1;
date=date+30;
output;
end;
format date yymmn6.;
keep name date item money;
run;
proc sort data=result;
by name date;
run;
出现这种情况:
如果下一个月按前一个月加30天来得出,可能出现1月后面是3月,缺少了2月。
比如198912后面是199001,但再后面就出现了199003。
如果把上面的AAA的第一个日期改为198912,就会发现没有199002了;
是否与一开始把日期 input date yymmn6.有关?因为我要输入的日期其实是每个月的月末,但如果用format date yymmddn8. 可以发现其实输入了月初。
望高手解答一下!
关键不在月初 月末,主要还在于30/360这个假设。
如果是闰年,比如1990,2月只有28天,1月31,加30天就过了2月,到了3月1,再加30天,是3月31,还在3月。所以出现没有2月却有两个3月的情况!
这个问题如何解决??
data origin;
length item $10;
input name $ year 5-8 month 9-10 item $ money;
monnum=year*12+month;
cards;
AAA 198912 salary 5000
AAA 198912 bonus 2000
AAA 198912 allowance 1000
AAA 200803 salary 5400
AAA 200806 salary 5500
BBB 200710 salary 3500
BBB 200710 bonus 3600
BBB 200804 salary 6100
;
proc sort data=origin;
by name descending monnum;
run;
data result;
set origin;
by name descending monnum;
nextmon=lag(monnum);
if first.name then nextmon=.;
if monnum and nextmon then dif=nextmon-monnum;
retain dif1;
if first.monnum then dif1=dif;
if dif1 le 1 or dif1=. then output;
else do i=0 to dif1-1;
monnum1=monnum+i;
year=int(monnum1/12);
month=monnum1-year*12;
mod=mod(monnum1,12);
if mod=0 then do;
year=year-1;
month=12;
end;
output;
end;
keep name item year month money;
run;
data result1;
set result;
yearchar=trim(left(year));
monchar=trim(left(month));
if length(monchar)=1 then monchar="0"||monchar;
yymm=trim(yearchar)||monchar;
date=input(yymm,yymmn6.);
format date yymmn6.;
keep date name item money;
run;
proc sort data=result1;
by name date;
run;
data origin;
length item $10;
input name $ year 5-8 month 9-10 item $ money;
monnum=year*12+month;
cards;
AAA 198912 salary 5000
AAA 198912 bonus 2000
AAA 198912 allowance 1000
AAA 200803 salary 5400
AAA 200806 salary 5500
BBB 200710 salary 3500
BBB 200710 bonus 3600
BBB 200804 salary 6100
;
proc sort data=origin;
by name descending monnum;
run;
data result;
set origin;
by name descending monnum;
nextmon=lag(monnum);
if first.name then nextmon=.;
if monnum and nextmon then dif=nextmon-monnum;
retain dif1;
if first.monnum then dif1=dif;
if dif1 le 1 or dif1=. then output;
else do i=0 to dif1-1;
monnum1=monnum+i;
year=int(monnum1/12);
month=monnum1-year*12;
mod=mod(monnum1,12);
if mod=0 then do;
year=year-1;
month=12;
end;
output;
end;
keep name item year month money;
run;
data result1;
set result;
yearchar=trim(left(year));
monchar=trim(left(month));
if length(monchar)=1 then monchar="0"||monchar;
yymm=trim(yearchar)||monchar;
date=input(yymm,yymmn6.);
format date yymmn6.;
keep date name item money;
run;
proc sort data=result1;
by name date;
run;
Perfect!
Pobel, thanks so much!
扫码加好友,拉您进群



收藏
