2014-01-20
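As the title says, I have run into a cluster sampling problem in my data analysis, but I don't know which books or papers to consult, especially on the SAS programming side. Any pointers from the experts on this board would be much appreciated. Many thanks!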

All replies
2014-3-5 11:38:31
I need this too. If the original poster finds something, please share it!

2014-4-9 16:50:52
This is covered in the SAS sample programs; you won't find a more detailed introduction than that.

2014-5-6 04:22:39
Usage Note 24555: Using PROC SURVEYSELECT for single-stage cluster sampling
Beginning with SAS/STAT 9.22 in SAS 9.2 TS2M3, use the SAMPLINGUNIT or CLUSTER statement to name variables that identify the sampling units as groups of observations (clusters).
For example, suppose you have 10 different clusters with one to five people per cluster.
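      /* Build 10 clusters, each with a random number (1 to 5) of people */
      data A;
        do ClusterID=1 to 10;
          /* The upper bound is evaluated once, when the inner loop starts */
          do i=1 to 1+int(5*ranuni(34920));
            if i=1 then PersonID=0;   /* restart person numbering in each cluster */
            PersonID+1;               /* sum statement: increments PersonID */
            output;
          end;
        end;
        drop i;
        run;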
These statements select a simple random sample of three clusters without replacement:
      proc surveyselect data=a out=sample method=srs sampsize=3 seed=377183 noprint;
        samplingunit ClusterID;
        run;
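The note names CLUSTER as an alternative to SAMPLINGUNIT. As a minimal sketch, assuming SAS/STAT 9.22 or later, the same request can be written with the CLUSTER statement and then checked with a quick count. The data set name SampleC and the column names NClusters and NObs are made up for this illustration and do not appear in the usage note.

```sas
/* Sketch: the same request, written with the CLUSTER statement.       */
/* SampleC is a hypothetical output data set name.                     */
proc surveyselect data=A out=SampleC method=srs sampsize=3 seed=377183 noprint;
  cluster ClusterID;
run;

/* Sanity check: the sample should contain three distinct ClusterIDs.  */
/* NObs reports how many observations were written to SampleC.         */
proc sql;
  select count(distinct ClusterID) as NClusters,
         count(*)                  as NObs
  from SampleC;
quit;
```

If NClusters is not 3, the sampling unit variable or the SAMPSIZE= value is the first thing to recheck.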
Cluster sampling involves sampling units that are groups or clusters, each consisting of one or more subunits. Often, a listing of clusters is available while the complete listing of subunits or observations within clusters is not. Clusters can be sampled, and an enumeration of subunits obtained later for data collection or further subsampling. Even if an enumerated list is available, there could be other constraints on collecting data from units that are selected randomly from among the entire population, and cluster sampling is done instead. (Note that when a listing of all subunits is available, estimates based on a random sample from the entire population are often more precise than those obtained from a cluster sample. This is because of the tendency for units within clusters to be more alike than units between clusters.)
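The loss of precision mentioned in the parenthetical note is often summarized with a design effect. The formula below is a standard textbook approximation rather than something stated in the usage note, and it assumes that every cluster has the same size m and that ρ is the intraclass correlation of the variable being measured:

```latex
\mathrm{deff}
  \;=\; \frac{\operatorname{Var}_{\text{cluster}}(\bar{y})}{\operatorname{Var}_{\text{SRS}}(\bar{y})}
  \;\approx\; 1 + (m - 1)\,\rho
```

For example, with m = 5 and ρ = 0.2 the approximation gives 1 + 4(0.2) = 1.8, so the cluster sample would have roughly 1.8 times the variance of a simple random sample with the same number of people. The example data set above has unequal cluster sizes, so this is only a rough guide there.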
If a listing of the entire target population is available and you want to carry out a cluster sample, the following shows how PROC SURVEYSELECT can be used in releases prior to SAS 9.2 TS2M3. The steps are to identify the individual clusters, select a random sample of clusters, and then collect all the original observations from each sampled cluster.
Using the same 10-cluster data set as above, first identify the individual clusters:
      proc freq data=A noprint;
        tables ClusterID / out=ClusterIDList(drop=count percent);
        run;
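The list of distinct cluster IDs can also be built in other ways. One possible sketch uses PROC SORT with the NODUPKEY option; the output name ClusterIDList2 is hypothetical and chosen so that it does not overwrite the data set created above.

```sas
/* Sketch: one row per distinct ClusterID, built with PROC SORT.       */
/* ClusterIDList2 is a hypothetical name used only in this example.    */
proc sort data=A(keep=ClusterID) out=ClusterIDList2 nodupkey;
  by ClusterID;
run;
```

Either list can serve as the input to the sampling step, as long as it holds exactly one observation per cluster.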
The following statements select a simple random sample, without replacement, of three of the cluster IDs:
      proc surveyselect data=ClusterIDList out=ClusterSample method=srs n=3 noprint;
        run;
Collect all the observations for each sampled cluster from the original data set to create the final sample:
      data Sample;
        merge ClusterSample(in=sample) A(in=all);
        by ClusterID;
        if Sample and All;
        run;
The IN= data set option creates a new variable that indicates whether the data set contributes to the current observation. Using the MERGE and BY statements above to match-merge the sample of clusters with the original data, and then subsetting using the IF statement causes only those CLUSTERIDs that exist in both the sample of clusters and the original data set to be included in the SAMPLE data set.
      proc print data=Sample;
        run;
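The match-merge relies on both data sets being ordered by ClusterID. As an alternative sketch for the same step, a PROC SQL inner join keeps only the people whose cluster was sampled and does not require sorted inputs. The name SampleSQL is hypothetical, and the code assumes that the data sets A and ClusterSample from the steps above already exist.

```sas
/* Sketch: collect every person in the sampled clusters with a join.   */
/* SampleSQL is a hypothetical output name.                            */
proc sql;
  create table SampleSQL as
  select a.*
  from A as a
       inner join ClusterSample as s
       on a.ClusterID = s.ClusterID
  order by a.ClusterID, a.PersonID;
quit;

proc print data=SampleSQL;
run;
```

Selecting only a.* keeps the columns of the original data, so the result should have the same layout as the merged data set.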


Operating System and Release Information
Product family: SAS System. Product: SAS/STAT. System: All. SAS release reported: n/a.