2014-07-25
Bounty: 300 forum coins (solved)


Dataset A has three variables: stock code (code), trade date (date), and trade amount (volume). For example:

000001 200203 5200
000001 200204 8100
000001 200205 3600
...
000002 200203 3300
000002 200204 4500
000002 200205 6700
...

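I want to use the trade amounts of all stocks in dataset A to determine two quantile cutoffs (0.1 and 0.9) and split dataset A into three groups. Then, for each stock, I want the number of observations and the total trade amount in each of the three groups (low, medium, high). Here is the program I wrote: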

proc univariate data=a;
     var volume;
     output out=a_pct pctlpre=p pctlpts=(10 90);
run;

data b;
set a;
if _n_=1 then set a_pct;
if volume<=p10 then group1=1;
else if volume>p90 then group1=3;
else group1=2;
run;

proc sql;
create table result as
select code, group1, count(group1) as num, sum(volume) as sumv from b
group by code, group1;
quit;


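My example is monthly, but my real data is daily and very large, and the program above is too slow. Could an expert help me rewrite it into something faster?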

Best answer

bobguy

All replies
2014-7-25 22:05:50
In the following simulation the data set contains 1,001 stocks (codes 1000 to 2000) with one transaction per day from Jan 1, 1980 to Jan 1, 2014, for a total of more than 12,000,000 observations.

Real time ~ 2.40 + 6.21 seconds (percentile step + summary step)
CPU time ~ 8.59 + 14.13 seconds

*****************log**************;
200
201  proc means data=stock p10 p90 noprint;
202  var volume;
203  output out=pctl p10=p10 p90=p90;
204  run;

NOTE: There were 12432420 observations read from the data set WORK.STOCK.
NOTE: The data set WORK.PCTL has 1 observations and 4 variables.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           2.40 seconds
      cpu time            8.59 seconds


205
206  data pctl_fmt;
207    set pctl;
208    length start $20.;
209    fmtname='group';
210    start='low' ; end=put(p10,best.); label=1;output;
211    start=put(p10,best.); ; end=put(p90,best.); label=2;output;
212    start=put(p90,best.); ; end='high'; label=3;output;
213
214    run;

NOTE: There were 1 observations read from the data set WORK.PCTL.
NOTE: The data set WORK.PCTL_FMT has 3 observations and 8 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


215
216    proc format cntlin=pctl_fmt;
NOTE: Format GROUP is already on the library WORK.FORMATS.
NOTE: Format GROUP has been output.
217    run;

NOTE: PROCEDURE FORMAT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

NOTE: There were 3 observations read from the data set WORK.PCTL_FMT.

218
219   data stock_view/view=stock_view;
220     set stock;
221     group=put(volume,group.);
222     run;

NOTE: DATA STEP view saved on file WORK.STOCK_VIEW.
NOTE: A stored DATA STEP view cannot run under a different operating system.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds


223
224    proc means data=stock_view n sum noprint;
225    class code group ;
226    var volume;
227    output out=sum n=count  sum=sum_volume;
228    run;

NOTE: View WORK.STOCK_VIEW.VIEW used (Total process time):
      real time           6.21 seconds
      cpu time            14.13 seconds

NOTE: There were 12432420 observations read from the data set WORK.STOCK.
NOTE: There were 12432420 observations read from the data set WORK.STOCK_VIEW.
NOTE: The data set WORK.SUM has 4008 observations and 6 variables.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           6.22 seconds
      cpu time            14.13 seconds
***********************************************************;
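
A note on the 4008 observations in WORK.SUM above: CLASS is used without NWAY, so PROC MEANS also writes the overall total and the one-way totals, distinguished by _TYPE_. That gives 1 + 3 + 1001 + 3003 = 4008 rows. If only the per-stock, per-group rows are wanted, a variant along these lines should do it. This is a sketch, not part of the original answer; the output name sum_nway is made up for illustration, and it assumes STOCK_VIEW and the GROUP format from the log already exist.

```sas
/* Sketch: keep only the code*group combinations (highest _TYPE_).      */
/* Assumes WORK.STOCK_VIEW and format GROUP. were created as in the log. */
/* SUM_NWAY is a hypothetical output name.                               */
proc means data=stock_view n sum noprint nway;
  class code group;
  var volume;
  output out=sum_nway(drop=_type_ _freq_) n=count sum=sum_volume;
run;
```

With the simulated data this should leave 3003 rows, one per stock and group, instead of 4008.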

/* Simulate daily data: codes 1000-2000, 1jan1980 through 1jan2014 */
data stock;
  do date='1jan1980'd to '1jan2014'd;
    do code=1000 to 2000;
      volume=ceil(ranuni(123)*10000);
      output;
    end;
  end;
run;

proc print data=stock(obs=10);
run;

/* Step 1: 10th and 90th percentiles of volume over all stocks */
proc means data=stock p10 p90 noprint;
  var volume;
  output out=pctl p10=p10 p90=p90;
run;

/* Step 2: control data set describing the ranges of the numeric format GROUP */
data pctl_fmt;
  set pctl;
  length start $20;
  fmtname='group';
  start='low';          end=put(p10,best.); label=1; output;
  start=put(p10,best.); end=put(p90,best.); label=2; output;
  start=put(p90,best.); end='high';         label=3; output;
run;

proc format cntlin=pctl_fmt;
run;

/* Step 3: the view applies the format on the fly; no grouped copy of STOCK is written */
data stock_view/view=stock_view;
  set stock;
  group=put(volume,group.);
run;

/* Step 4: count and total volume by stock and group */
proc means data=stock_view n sum noprint;
  class code group;
  var volume;
  output out=sum n=count sum=sum_volume;
run;

proc print data=sum;
run;
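
To make it easier to see what the control data set pctl_fmt is building, here is a hand-written format that should behave roughly the same way. It is only an illustration: the cutoffs 1000 and 9000 and the name group_demo are made-up stand-ins for p10, p90 and the GROUP format, and the boundary handling is spelled out explicitly with `<-` rather than left to how PROC FORMAT resolves shared endpoints.

```sas
/* Illustration only: 1000 and 9000 are hypothetical stand-ins for p10 and p90, */
/* and GROUP_DEMO is a made-up name so the real GROUP format is not replaced.   */
proc format;
  value group_demo
    low  -  1000 = '1'    /* volume <= 1000          */
    1000 <- 9000 = '2'    /* 1000 < volume <= 9000   */
    9000 <- high = '3';   /* volume > 9000           */
run;

/* Write a few boundary values to the log to check the grouping */
data _null_;
  do volume = 500, 1000, 1001, 9000, 9001;
    group = put(volume, group_demo.);
    put volume= group=;
  end;
run;
```

This matches the grouping rule in the original question (at or below p10 is low, above p90 is high). With real data that has very large or non-integer amounts, a wider format than the default best. in the put() calls may be needed to avoid rounding the cutoffs.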
2014-7-25 22:26:56
First figure out which of the three steps above is the slowest.
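
One way to check, as a rough sketch assuming data set A from the original post: turn on FULLSTIMER, rerun each step, and compare the real and CPU times in the log notes. The first step is shown here with NOPRINT added, since without it PROC UNIVARIATE also produces its full printed report.

```sas
/* Sketch: assumes WORK.A (code, date, volume) from the original post. */
options fullstimer;   /* log detailed time and memory usage for every step */

proc univariate data=a noprint;   /* NOPRINT: skip the printed report, keep only OUT= */
  var volume;
  output out=a_pct pctlpre=p pctlpts=10 90;
run;

/* ...rerun the DATA step and the PROC SQL step here, then compare the NOTEs... */

options nofullstimer;
```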
2014-7-25 22:38:21
zhanglianbo35 posted on 2014-7-25 22:26
First figure out which of the three steps above is the slowest.
They are all slow. My data is at least several GB.
2014-7-25 22:44:56
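A few GB is not large. You could build an index, and do the last two steps in a single SQL statement with create view result as, then query the view to preview part of the results.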
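
A minimal sketch of the single-SQL idea, assuming data set A and the one-row percentile data set A_PCT (with p10 and p90) from the original post already exist. The index part is left out here, since whether it helps depends on the data and on how the result is queried.

```sas
/* Sketch only: assumes WORK.A (code, date, volume) and WORK.A_PCT (p10, p90). */
proc sql;
  create view result as
  select a.code,
         case when a.volume <= p.p10 then 1   /* low    */
              when a.volume >  p.p90 then 3   /* high   */
              else 2                          /* medium */
         end as group1,
         count(*)      as num,
         sum(a.volume) as sumv
  from a, a_pct as p          /* A_PCT has one row, so every row of A is kept once */
  group by a.code, group1;
quit;

/* The view is evaluated when it is read; preview the first rows */
proc print data=result(obs=20);
run;
```

Because RESULT is a view, the intermediate data set B from the original program is never written to disk; the work happens when the view is read.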
2014-7-25 22:50:49
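zhanglianbo35 posted on 2014-7-25 22:44
A few GB is not large. You could build an index, and do the last two steps in a single SQL statement with create view result as, then query the view to preview part of the ...
Could you write it out for me? I'm still a beginner and only know the basics.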