全部版块 我的主页
论坛 数据科学与人工智能 数据分析与数据科学 SAS专版
4610 19
2007-09-05
In USA, one teacher must have some certificates to be assigned to teach a subject. One certificate could be used in several assignments, for example, teaching math needs either cert1 or cert2 or cert3.
There are two datasets, ‘certificate’ which shows how many certificates one 'id' has, at most 30 certs.
the ‘assignment’shows what type of certificates one assignment needs. If one assignment have more than 2 qualified_cert,
which means either of them could be counted.

data certificate;
input id (certArea1-certArea30) ($) ;
datalines;
1 32 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2 32 . 23 . . . . . . . . . . . . . . . . . . . . . . . . . . .
3 23 21 32 . . . . . . . . . . . . . . . . . . . . . . . . . . .
4 30 24 25 . . . . . . . . . . . . . . . . . . . . . . . . . . .
;
run;
data assignment;
input assign $ (qualified_cert1- qualified_cert13) ($) ;
datalines;
a 11 . . . . . . . . . . . .
b 30 11 . . . . . . . . . . .
c 23 . . . . . . . . . . . .
d 32 23 . . . . . . . . . . .
;
run;

My question is, how to calculate how many people have the required certificate for
each assignment. If one person can be qualified for two jobs, then count ‘1/2’for each
assignment.

Basically, I have to first output each value of cert1-cert30 for each observation,
then use that value to check if this certificate meets the requirement of
assignment(go through the assignment dataset). To avoid count one person with
multiple certificates as multiple person, we have to go through the assignment
dataset with all his/her certificates for one teacher to get the total number of his
match, then we go through the assignment dataset again.

Would any body tell me how to write this code?


二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

全部回复
2007-9-5 01:32:00
any one trying to help must keep one fact in mind that 'certificate' dataset is a over 500,000 records dataset. So optimization must be considered.
I tried one code with a lot of MACRO which took me 6-hour CPU time. So I don't think my code is usable and that's why I'm asking help for master of SAS.
二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

2007-9-5 03:44:00

对于你的dataset有点疑惑,eg:

data certificate;
input id (certArea1-certArea30) ($) ;
datalines;
1 32 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2 32 . 23 . . . . . . . . . . . . . . . . . . . . . . . . . . .
3 23 21 32 . . . . . . . . . . . . . . . . . . . . . . . . . . .
4 30 24 25 . . . . . . . . . . . . . . . . . . . . . . . . . . .

如上,比如certArea1,为何它对应的cert的种类会不同?即,certarea1 在第二个第三个obs里面分别是32,23 -是说

第二三个人在certarea1里面分别取得了32和23分数么?(但你用的又是char)

二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

2007-9-5 04:11:00
thanks for reading my question.

The following explanation might be helpful. You can assume certArea1 is the first Certificate he/she obtained. and CertArea30 is the last one.
So id ='1' obtained his certificate which is '32' ,for example ,32='math', id='3' obtained his certificate which is '23', for example 23='english'.

Let's put it in another way, the value of certArea1-certArea30 specify the area of certificate.

Please let me know if I haven't put it clear.
二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

2007-9-5 09:49:00

希望是你要求的结果。

这里必须要说明一下,因为没有看你的sample,所以做这个假设- - - 科目种类,即23,32这样的,代表math或者eng的代码种类的数量是有限的!(或者说,即使数量很大,也有办法替换)- - - 虽然我也相信不会出现几百种科目-_-|||

那么:

首先,替换你的raw data values,比如23 32这类的,既然他们没有算术上的意义,那么一律替换为除1以外的素数,举个例子,如果整个
dataset中出现过的科目种类是11,21,23,32,那么,就把它们全部对应替换为3,5,7,11,如果科目种类更多,那么就以此替换下去,换成质数。同时,把certarea1-certarea30里面所有的missing values替换为 1.这个在data step中即使是50w obs相信也可以很快完成。

然后,在新的dataset中增加一个variable,比如叫做prod,作为certarea1-certarea30的乘积。

这样一来,只要各个assignment中的值能够被prod整除,就说明改obs 可以做这个assignment从而避免了你文中提到的用2个表的数据做cross comparison,也不存在人数重复计算的问题。

但是由于没有你的dataset sample,所以我不把整个code写完,下面是部分草稿,仅为了阐述我上面说的。

data certificate ;
input id (certArea1-certArea30) ($) ;
datalines ;
1 11 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2 11 . 5 . . . . . . . . . . . . . . . . . . . . . . . . . . .
3 5 13 11 . . . . . . . . . . . . . . . . . . . . . . . . . . .
4 7 17 19 . . . . . . . . . . . . . . . . . . . . . . . . . . .
;
run ;
data assignment ;
input assign $ (certArea1-certArea30) ($) ;
datalines ;
a 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
b 7 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
c 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
d 11 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
;
run ; *这里我直接替换了数据 ;
data certificate1 ;
set certificate ;
array certa[30] certarea1 - certarea30 ;
array certa_n[30] certarea_n1 - certarea_n30 ;
do i= 1 to 30 ;
certa_n= input( certa, $8.) ;
end ;
drop i certarea1 - certarea30 ;
run ;

* 把char换成numeric ;

data certificate2 ;
set certificate1 ;
array certa_n[30] certarea_n1 - certarea_n30 ;
do i= 1 to 30 ;
if certa_n=. then certa_n= 1 ;
end ;
prod = 1 ;
do j =1 to 30 ;

prod =prod*certa_n[j] ;
end ;
keep id prod ;
run ; *加了一个var叫做prod ;
data certificate3 ;
set certificate2 ;
if mod(prod,7)=0 or mod(prod,3)=0 then qualified= 1 ;
else qualified = 0 ;
run ;

* 如果可以整除,那么qualified写为1 ,(这里我简略了下,就直接用assign a来测试了下。)



[此贴子已经被作者于2007-9-5 9:59:20编辑过]

二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

2007-9-5 11:12:00
I can see you really have strong math foundation. But I'm not sure if your result is correct.

Let's put it simpler. Let's suppose we only have four persons 'id=1,2,3,4. Each person has up to 3 certificates .
there are only four assignments and each assignment has up to 2 qualified certificates.

data certificate;
input id (certArea1-certArea3) ($) ;
datalines;
1 32 11 .
2 32 . 23
3 23 21 32
4 30 24 25
;
run;
data assignment;
input assign $ (qualified_cert1- qualified_cert2) ($) ;
datalines;
a 11 .
b 30 11
c 23 .
d 32 23
;
run;

Let's do it one by one;
for 'id=1', he has two certs '32','11', and can meet the requirement of three assignments 'a' , 'b','d'. Thus any assignment
counting 'id=1' will be 1/3. now :
a: 1/3
b: 1/3
c: 0
d: 1/3

for 'id=2', he has two certs '32' '23', and can meet the requirement of two assignments 'c' 'd', but three qualified certs,
c:23, d:32 , d:23. thus any assignment counting 'id=2' will be 1/3. now:
a: 1/3
b: 1/3
c: 1/3
d: 1/3+1/3+1/3= 1

after 'id=3', the assignment will be :
a:1/3
b: 1/3
c: 1/3+1/3= 2/3
d: 1+1/3+1/3=1.67

after 'id=4', the assignment will be:
a:1/3
b: 1/3+1= 1.3
c: 2/3
d: 1.67

So is your result the same?


二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

点击查看更多内容…
相关推荐
栏目导航
热门文章
推荐文章

分享

扫码加好友,拉您进群