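I have two datasets. How can I apply a weighting to one of them so that its variable distributions match those of the other? Here is a concrete example. Dataset A has three columns: panelist, GROUPS and INCOME, with 100 records in total: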
| panelist | Groups | income |
| 10001602 | ADULT FAMIL | INCOME 3001-5000 RMB |
| 10001603 | ADULT FAMIL | INCOME > 5000 RMB |
| 10002003 | YOUNG FAMILY | INCOME > 5000 RMB |
| 10002101 | ADULT FAMIL | INCOME > 5000 RMB |
| 10002103 | ADULT FAMIL | INCOME > 5000 RMB |
| 10003305 | ADULT FAMIL | INCOME < 3000 RMB |
| 10004503 | OLDER FAMILY | INCOME > 5000 RMB |
| 10004907 | YOUNG FAMILY | INCOME > 5000 RMB |
| 10007606 | ADULT FAMIL | INCOME > 5000 RMB |
| 10007610 | ADULT FAMIL | INCOME 3001-5000 RMB |
| 10007704 | YOUNG FAMILY | INCOME > 5000 RMB |
| 10008118 | ADULT FAMIL | INCOME > 5000 RMB |
| 10008306 | YOUNG FAMILY | INCOME > 5000 RMB |
| 10008404 | OLD SINGLE/COUPLE | INCOME 3001-5000 RMB |
| 10008408 | OLDER FAMILY | INCOME 3001-5000 RMB |
| 10008412 | ADULT FAMIL | INCOME > 5000 RMB |
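Dataset B has the same structure, but the panelists are different, and it has 80 records in total. I now need to compute a weight for each panelist in dataset B, for example: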
| panelist | Groups | income | weight |
| 10008716 | ADULT FAMIL | INCOME < 3000 RMB | 1.2 |
| 10009106 | ADULT FAMIL | INCOME > 5000 RMB | 1.3 |
| 10009111 | OLDER FAMILY | INCOME > 5000 RMB | 0.8 |
| 10009112 | YOUNG FAMILY | INCOME 3001-5000 RMB | 2.1 |
| 10009604 | ADULT FAMIL | INCOME > 5000 RMB | 1.5 |
| 10009808 | OLDER FAMILY | INCOME 3001-5000 RMB | 0.3 |
| 10010709 | ADULT FAMIL | INCOME > 5000 RMB | 2.3 |
| 10011009 | YOUNG FAMILY | INCOME > 5000 RMB | 0.1 |
| 10011204 | OLD SINGLE/COUPLE | INCOME < 3000 RMB | 1.5 |
| 10011206 | ADULT FAMIL | INCOME 3001-5000 RMB | 1.2 |
The purpose of the weight is that when I run
proc freq;
tables groups;
tables income;
run;
the results for the two datasets come out close to each other. So the question is: how do I compute this weight?
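Hoping an expert can help. My basic idea is: for each variable, divide each value's count by the total for that variable to get its proportion, then adjust the weights over several iterations until the final freq output is reasonably close.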
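The iterative idea above resembles what is usually called raking (iterative proportional fitting): for one variable at a time, every weight in a category is multiplied by the ratio of dataset A's share to dataset B's current weighted share, and the passes are repeated until the weights stop changing. Below is a minimal sketch of that, written in Python rather than SAS just to keep it short. The helper names `rake` and `weighted_shares` are hypothetical, and the toy rows are made up for illustration; they are not the real 100 and 80 records.

```python
from collections import Counter

def weighted_shares(rows, weights, var):
    """Weighted proportion of each category of `var` (what a weighted freq would show)."""
    totals = Counter()
    for row, w in zip(rows, weights):
        totals[row[var]] += w
    grand_total = sum(weights)
    return {k: v / grand_total for k, v in totals.items()}

def rake(target_rows, sample_rows, variables, n_iter=100, tol=1e-9):
    """Return one weight per sample row so that the weighted marginals of
    `variables` in the sample approach the unweighted marginals of the target."""
    # Marginal shares in the target dataset (dataset A in the post).
    targets = {
        v: weighted_shares(target_rows, [1.0] * len(target_rows), v)
        for v in variables
    }
    weights = [1.0] * len(sample_rows)
    for _ in range(n_iter):
        max_change = 0.0
        for v in variables:
            current = weighted_shares(sample_rows, weights, v)
            for i, row in enumerate(sample_rows):
                share_now = current[row[v]]
                if share_now == 0.0:
                    continue  # category already weighted to zero
                # A category absent from the target gets share 0, hence weight 0.
                factor = targets[v].get(row[v], 0.0) / share_now
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:
            break
    # Rescale so the weights average 1.
    mean_w = sum(weights) / len(weights)
    return [w / mean_w for w in weights]

# Made-up toy data (not the real records): 10 target rows, 8 sample rows.
def _rows(spec):
    return [{"groups": g, "income": inc} for g, inc, n in spec for _ in range(n)]

toy_a = _rows([
    ("ADULT FAMIL", "INCOME > 5000 RMB", 4),
    ("ADULT FAMIL", "INCOME 3001-5000 RMB", 2),
    ("YOUNG FAMILY", "INCOME > 5000 RMB", 3),
    ("YOUNG FAMILY", "INCOME 3001-5000 RMB", 1),
])
toy_b = _rows([
    ("ADULT FAMIL", "INCOME > 5000 RMB", 2),
    ("ADULT FAMIL", "INCOME 3001-5000 RMB", 1),
    ("YOUNG FAMILY", "INCOME > 5000 RMB", 2),
    ("YOUNG FAMILY", "INCOME 3001-5000 RMB", 3),
])

toy_weights = rake(toy_a, toy_b, ["groups", "income"])
print(weighted_shares(toy_b, toy_weights, "groups"))
print(weighted_shares(toy_b, toy_weights, "income"))
```

Two caveats with this sketch: a category that appears in dataset A but not in dataset B cannot be matched by any weights, and a category that appears only in B ends up with weight 0. Rows that share the same GROUPS and INCOME values always receive the same weight.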
Any help from the experts would be much appreciated!