2015-07-01
Could any experts here help answer a few questions about logistic modeling?

1. Create the data

data credit_risk;
  do cus_id=1 to 1000;
    os=round(1000*ranuni(_n_),1);
    ut=round(min(2,max(0,rannor(99)+1)),0.01);
    tr_num_3m=round(31*ranuni(5),1);
    tr_num_1m=max(round(tr_num_3m*0.31+2.6*rannor(9),1),0);
    if ranuni(2)>0.7 then overdu_num=round(5*ranuni(8),1);
    else overdu_num=0;
    /* sc is a latent score used only to generate target; it is dropped below */
    sc=overdu_num*(-1.26)+os/78+tr_num_3m/13+tr_num_1m/2.25+0.5+2*rannor(7);
    if ranuni(32)>0.15 then target=(exp(sc)/(exp(sc)+1)>0.93);
    else target=0;
    output;
  end;
  drop sc;
run;


2. Modeling: fit a logistic model on the independent variables (all variables except target and cus_id)

    proc logistic data=credit_risk descending;
      model target=os ut tr_num_3m tr_num_1m overdu_num / stb;
    run;


The questions are:


Question 1: How much is the concordance, and explain it. (How should this "Concordance" be interpreted?)

Question 2: Score each customer according to the model. (I don't understand this one either. Does it mean computing a probability for each customer?)

Question 3: What is the average target rate for the top 10% group (worst customer or high risk customer group)?

Question 4: What are the gaps between actual target rate and predicted probability in the top 10% group?

Question 5: Explain the odds ratio for each factor and the negative/positive impact of each variable.

Question 6: Is there any correlation among the independent variables? How to deal with the higher correlation?



All replies
2015-7-1 12:40:19
Reading the proc logistic documentation should basically answer these.

2015-7-1 13:40:18
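I'm a beginner too, so this is just my own take and I can't guarantee it's correct.
1.
Concordance is essentially the c statistic, I think (c additionally gives half credit to tied pairs).
Here is an excerpt; I'm not sure whether it will be clear to you. The idea is to pair up the observed cases two at a time and keep only the pairs whose outcomes are (1, 0) or (0, 1).
Then check whether the predicted value for the 1 is higher than the predicted value for the 0. If it is, the pair is concordant.
The share of concordant pairs among all such (1, 0) pairs is the Percent Concordant.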
For the 147 observations in the sample, there are 147(146)/2 =10731 different
ways to pair them up (without pairing an observation with itself). Of these, 5881 pairs have either both 1's on the
dependent variable or both 0's. We ignore these, leaving 4850 pairs in which one case has a 1 and the other case has a 0.
For each pair, we ask the question, "Does the case with a 1 have a higher predicted value (based on the model) than the
case with a 0?" If the answer is yes, we call that pair concordant. If no, the pair is discordant. If the two cases have the
same predicted value, we call it a tie.
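
To make the pairing concrete, here is a minimal SAS sketch that counts the pairs by brute force for the credit_risk data above. The data set name scored and the variable name p_hat are my own choices for illustration. PROC LOGISTIC already prints these figures in its "Association of Predicted Probabilities and Observed Responses" table; the numbers below may differ slightly from that table because the procedure can bin the predicted probabilities before comparing them (see the BINWIDTH= option of the MODEL statement).

```sas
/* Score the data: p_hat (assumed name) is the predicted P(target=1) */
proc logistic data=credit_risk descending noprint;
  model target = os ut tr_num_3m tr_num_1m overdu_num;
  output out=scored p=p_hat;
run;

/* Pair every target=1 case (a) with every target=0 case (b) */
proc sql;
  select count(*)                    as pairs,
         100*mean(a.p_hat > b.p_hat) as pct_concordant,
         100*mean(a.p_hat < b.p_hat) as pct_discordant,
         100*mean(a.p_hat = b.p_hat) as pct_tied
  from scored(where=(target=1)) as a,
       scored(where=(target=0)) as b;
quit;
```

The c statistic is then (pct_concordant + 0.5*pct_tied)/100, which equals the area under the ROC curve.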

2.
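I'm not sure about this one either. You could email to confirm what is wanted; failing that, just compute both the probability and the predicted target value.
That said, the model's goodness of fit doesn't look very good to me, so take another look.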
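
One common reading of "score each customer" is to attach the model's predicted probability to every row. A minimal sketch using the OUTPUT statement of PROC LOGISTIC follows; the names scored and p_hat are assumptions chosen for illustration, not anything from the original post.

```sas
/* Fit the model and write one predicted probability per customer */
proc logistic data=credit_risk descending;
  model target = os ut tr_num_3m tr_num_1m overdu_num / stb;
  output out=scored p=p_hat;   /* p_hat = predicted P(target=1) */
run;

/* Look at the first few scored customers */
proc print data=scored(obs=10);
  var cus_id target p_hat;
run;
```

Because of the DESCENDING option, p_hat is the probability of target=1, so a higher score means a riskier customer.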

3. I don't know how this group is defined, nor whether "target rate" here means the predicted target rate.

4.
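
For questions 3 and 4, one plausible interpretation (an assumption, since the question does not say how the group is formed) is to rank customers by predicted probability, take the highest-scoring 10%, and compare the actual target rate with the mean predicted probability in that group. A sketch, where scored, ranked, p_hat and decile are names made up for illustration:

```sas
/* Score the data: p_hat (assumed name) is the predicted P(target=1) */
proc logistic data=credit_risk descending noprint;
  model target = os ut tr_num_3m tr_num_1m overdu_num;
  output out=scored p=p_hat;
run;

/* GROUPS=10 with DESCENDING puts the highest p_hat values in decile 0 */
proc rank data=scored out=ranked groups=10 descending;
  var p_hat;
  ranks decile;
run;

/* Actual rate, predicted rate, and their gap for each decile */
proc sql;
  select decile,
         count(*)                   as n,
         mean(target)               as actual_rate,
         mean(p_hat)                as predicted_rate,
         mean(target) - mean(p_hat) as gap
  from ranked
  group by decile
  order by decile;
quit;
```

The row with decile = 0 is the top 10% (highest-risk) group; its actual_rate would answer question 3 and its gap question 4 under this interpretation.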


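5. For the odds ratio: say the coefficient of os is 0.00258; then the odds ratio is exp(0.00258) ≈ 1.0026 > 1, so os has a positive effect on target=1.
Interpretation: the estimated odds of target=1 increase by about 0.26% for each one-unit increase in os, holding the other variables fixed.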
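
A one-unit change in os is tiny, so the per-unit odds ratio sits very close to 1. In general a k-unit increase multiplies the odds by exp(k*beta); with a coefficient of 0.00258 and k = 100 that is exp(0.258), roughly 1.29. A coefficient below zero gives an odds ratio below 1, which is a negative effect on target=1. As a sketch, the UNITS statement asks PROC LOGISTIC for such customized odds ratios; the step size of 100 is an arbitrary value picked for illustration.

```sas
proc logistic data=credit_risk descending;
  /* CLODDS=WALD adds Wald confidence limits for the odds ratios */
  model target = os ut tr_num_3m tr_num_1m overdu_num / stb clodds=wald;
  /* assumption: 100 is an illustrative unit of change for os */
  units os=100;
run;
```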

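6. tr_num_3m and tr_num_1m are fairly strongly collinear.
As for a fix, the crude approach is to simply drop the one with the smaller Wald Chi-Square, which here is tr_num_3m.
Your course has probably covered other ways of handling collinearity too.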
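
A quick way to see which predictors move together is a pairwise correlation matrix. This is only a sketch of one possible check, not necessarily what was run above:

```sas
/* Pearson correlations among the five independent variables */
proc corr data=credit_risk;
  var os ut tr_num_3m tr_num_1m overdu_num;
run;
```

Since the data step builds tr_num_1m from tr_num_3m plus noise, that pair is the one expected to stand out.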

Interestingly, I also checked this with proc reg, following the approach in some books.

The VIFs of these two variables are actually not that large, though maybe I got something wrong somewhere.
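
The proc reg check can be sketched as below. VIF depends only on the predictors, so using the binary target as the response is just a convenience. Since VIF = 1/(1 - R²), where R² comes from regressing one predictor on the others, a pairwise correlation of roughly 0.7 would give a VIF of only about 2, which may explain why the values look modest even though the two variables are clearly related.

```sas
/* VIF and tolerance for each predictor */
proc reg data=credit_risk;
  model target = os ut tr_num_3m tr_num_1m overdu_num / vif tol;
run;
quit;
```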
