2014-11-24

Hi! I am a junior SAS analyst.

I want to split a data set of 50,000 or more observations into train and test sets, fit a model on the train set, and use that model to predict the test set.

The easiest way I can think of is to use PROC SURVEYSELECT to draw a simple random sample from the whole data. For example, I can ask SAS to sample 30% of the observations as the test set (leaving the remaining 70% as the train set):

PROC SURVEYSELECT DATA=whole.data OUT=test.set METHOD=srs SAMPRATE=0.3;
RUN;


Now I have the test set in the data set 'test.set'. However:

1. How can I create a data set (e.g. 'train.set') that holds the remaining 70% of the data?

2. After using 'train.set' to build a predictive model (e.g. a linear model), how can I use that model to predict the data in 'test.set', with output showing the predicted value and residual for every observation?

Thanks for your patience!

David
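
One possible approach to question 1, sketched under assumptions rather than given as a definitive answer: with the OUTALL option, PROC SURVEYSELECT writes every input observation to the output data set and flags the sampled ones in a variable named Selected (1 = sampled, 0 = not sampled). A DATA step can then route the rows into two data sets. The names work.all_flagged, work.train, and work.test, and the SEED= value, are placeholders that do not appear in the post.

```sas
/* Sketch only: whole.data is the poster's input data set;         */
/* all_flagged, train, test and the seed value are hypothetical.    */
PROC SURVEYSELECT DATA=whole.data OUT=work.all_flagged OUTALL
                  METHOD=srs SAMPRATE=0.3 SEED=12345;
RUN;

/* Route each row by the Selected flag that OUTALL adds */
DATA work.train work.test;
    SET work.all_flagged;
    IF Selected = 1 THEN OUTPUT work.test;   /* sampled 30%: test set   */
    ELSE OUTPUT work.train;                  /* remaining 70%: train set */
    DROP Selected;
RUN;
```

Fixing SEED= makes the split reproducible across runs; without it, each run draws a different sample.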




All replies
2014-11-25 08:08:57
sssh307 posted on 2014-11-24 23:27
Hi! I am a junior SAS analyst.

I intend to split data into train and test sets, and use the model ...
I don't understand.

2014-11-25 18:42:59
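Let me try restating this in Chinese.

I have a data file with more than 50,000 records and am using SAS. I want to split it into train and test sets, fit a model on the train set, and then use that model to predict the test set.

First, I used PROC SURVEYSELECT to split the data, with a train-to-test ratio of 7:3. I have also selected the train set rows and fitted a linear model on them. The SAS code is as follows: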

PROC SURVEYSELECT DATA=WORK.MERGED OUTALL OUT=all METHOD=SRS SAMPRATE=0.3;
/* Simple random sampling to separate the train and test sets; SAMPRATE is the proportion of observations assigned to the test set */
RUN;
PROC PRINT DATA=all (obs=100);
RUN;
PROC FREQ DATA=all;
TABLES selected;              /* Count the observations in the train (coded as 0) and test (coded as 1) sets to confirm the split is correct */
RUN;
PROC REG DATA=all;
where selected=0;               /* Use the observations with selected=0 as the train set to fit the model */
MODEL y= x1-x1000/ stb;
RUN;
QUIT;

Finally, I would like to know how to use the fitted model to predict the data in the test set (producing the predicted value and residual for each observation)?

Thank you very much!
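
One way this might be done, offered as a sketch under assumptions rather than the definitive method, is the "missing response" trick: PROC REG leaves observations with a missing response out of the fit but still computes predicted values for them in the OUTPUT data set, provided their regressors are nonmissing. The data set all and the variables selected, y, and x1-x1000 come from the post above; all_masked, y_train, scored, yhat, resid, and test_pred are hypothetical names introduced for illustration.

```sas
/* Sketch only: hide the response on test rows so they do not enter the fit */
DATA all_masked;
    SET all;
    IF selected = 0 THEN y_train = y;   /* train rows keep the response  */
    ELSE y_train = .;                   /* test rows: response set to missing */
RUN;

/* Fit on the train rows; OUTPUT still scores the test rows */
PROC REG DATA=all_masked;
    MODEL y_train = x1-x1000 / STB;
    OUTPUT OUT=scored P=yhat;           /* yhat = predicted value for every scorable row */
RUN;
QUIT;

/* Residuals on the test set are computed by hand, because the      */
/* procedure's own residual is missing wherever y_train is missing. */
DATA test_pred;
    SET scored;
    WHERE selected = 1;
    resid = y - yhat;                   /* actual minus predicted */
    KEEP y yhat resid;                  /* add an ID variable here if the data has one */
RUN;

PROC PRINT DATA=test_pred (OBS=20);
RUN;
```

The same fitted coefficients could also be saved with OUTEST= and applied with PROC SCORE; the version above is shown because it needs only one model-fitting step and no extra data set of estimates.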
