Forum › Econometrics and Statistics › Econometrics and Statistical Software › Stata Board
2021-09-29
Starting a thread to record problems encountered when running binary-outcome regressions:
1. logit assumes a logistic error distribution, probit a standard normal one; both are estimated by maximum likelihood (MLE).
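As a quick sketch, the two commands are called the same way and differ only in the assumed distribution (the variable names y, x1, x2 here are hypothetical placeholders):

```stata
* Hypothetical outcome y and covariates x1, x2
logit  y x1 x2    // assumes a logistic error distribution
probit y x1 x2    // assumes a standard normal error distribution
```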

2. MLE has a characteristic feature: it works by iteration, searching across successive iterations for the estimates that maximize the likelihood function.

3. Problems therefore usually arise during the iterations, and they generally come in two kinds: the first is a "(not concave)" message, the second a "(backed up)" message. The key point is that as long as neither appears after the last iteration line, they are harmless and can be ignored. But if one appears on the final line, or the iterations keep running without ever reporting a result, then something is wrong. For example:

Iteration 4441: log pseudolikelihood = -51554.273  (backed up)
Iteration 4442: log pseudolikelihood = -51554.273  (backed up)
Iteration 4443: log pseudolikelihood = -51554.273  (backed up)



4. According to Stata's official documentation, a "not concave" problem means one of two things: there may be collinearity among the independent variables; or, given the structure of the data, the region around the optimum is not concave but a flat "plateau" in the likelihood.

If a “not concave” message appears at the last step, there are two possibilities. One is that the result is valid, but there is collinearity in the model that the command did not otherwise catch. Stata checks for obvious collinearity among the independent variables before performing the maximization, but strange collinearities or near collinearities can sometimes arise between coefficients and ancillary parameters. The second, more likely cause for a “not concave” message at the final step is that the optimizer entered a flat region of the likelihood and prematurely declared convergence.




5. According to Stata's official documentation, a "backed up" problem means one of two things: the optimizer may have found a perfect maximum and cannot step to any better point (highly unlikely); or the optimization path is bad, and the optimizer cannot determine a good direction for the next step. [Some forum users attribute it to poor data quality, collinearity, or outliers interfering with the estimation.]


If a “backed up” message appears at the last step, there are also two possibilities. One is that Stata found a perfect maximum and could not step to a better point; if this is the case, all is fine, but this is a highly unlikely occurrence. The second is that the optimizer worked itself into a bad concave spot where the computed gradient and Hessian gave a bad direction for stepping.




6. Remedies: use the gradient or difficult option; or, in my view, you can also change the technique() option, i.e., switch the algorithm used for the iterations. The common choices are nr, bhhh, dfp, and bfgs, with nr being the default.


【difficult】 specifies that the likelihood function is likely to be difficult to maximize because of nonconcave regions. When the message “not concave” appears repeatedly, ml’s standard stepping algorithm may not be working well. difficult specifies that a different stepping algorithm be used in nonconcave regions. There is no guarantee that difficult will work better than the default; sometimes it is better and sometimes it is worse. You should use the difficult option only when the default stepper declares convergence and the last iteration is “not concave” or when the default stepper is repeatedly issuing “not concave” messages and producing only tiny improvements in the log likelihood.


【gradient】 adds to the iteration log a display of the current gradient vector.


【technique(algorithm spec)】 specifies how the likelihood function is to be maximized. The following algorithms are allowed. For details, see Gould, Pitblado, and Poi (2010). technique(nr) specifies Stata’s modified Newton–Raphson (NR) algorithm. technique(bhhh) specifies the Berndt–Hall–Hall–Hausman (BHHH) algorithm. technique(dfp) specifies the Davidon–Fletcher–Powell (DFP) algorithm. technique(bfgs) specifies the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm.
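Putting these options together, a minimal sketch (y, x1, x2 are again hypothetical placeholders):

```stata
* Try the alternate stepper for nonconcave regions
logit y x1 x2, difficult

* Print the current gradient vector in the iteration log
logit y x1 x2, gradient

* Switch the maximization algorithm from the default nr
logit y x1 x2, technique(bfgs)
```

Per [R] maximize, technique() can also alternate algorithms, e.g. technique(bhhh 10 nr 20) runs 10 BHHH iterations, then 20 NR iterations, and repeats.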




After adding gradient: if the gradient goes to zero, the final result is an acceptable optimum; but if it does not, the result is not valid, and you need to impose a stricter definition of convergence, for example with ltol(0) tol(1e-7).
If the gradient goes to zero, the optimizer has found a maximum that may not be unique but is a maximum. From the standpoint of maximum likelihood estimation, this is a valid result. If the gradient is not zero, it is not a valid result, and you should try tightening up the convergence criterion, or try ltol(0) tol(1e-7) to see if the optimizer can work its way out of the bad region.
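For example, tightening the convergence criteria as the manual suggests (variable names again hypothetical):

```stata
* ltol(0) tightens the log-likelihood criterion to zero change;
* tol(1e-7) tightens the coefficient-vector criterion
logit y x1 x2, gradient ltol(0) tol(1e-7)
```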




Also note that when using the difficult option, you may end up with a worse result.
If you get repeated “not concave” steps with little progress being made at each step, try specifying the difficult option. Sometimes difficult works wonderfully, reducing the number of iterations and producing convergence at a good (that is, concave) point. Other times, difficult works poorly, taking much longer to converge than the default stepper.






7. A final reminder: logit and probit accept the iterate(#) option to cap the maximum number of iterations, but using it is generally not recommended, because the result obtained that way is very likely invalid.
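If you do cap the iterations, at least verify convergence afterwards (a sketch with hypothetical variable names):

```stata
* Stop after at most 50 iterations
logit y x1 x2, iterate(50)

* e(converged) is 1 only if the optimizer actually converged;
* if it is 0, the reported estimates should not be trusted
display e(converged)
```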




That's all for now; additions are welcome.

All replies
2021-9-29 22:04:04
stata.com/manuals/rmaximize.pdf#rMaximizeSyntaxalgorithm_spec

Attached: the original help text from the Stata website.

2021-9-30 15:20:27
Follow-up: how I finally solved my "backed up" non-convergence problem.

1. It turned out to be a data-quality problem with a control variable!

2. One control variable, household per-capita expenditure, had a skewed, non-normal distribution. After a log transformation brought it close to normal, the non-convergence problem was solved.
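A sketch of that fix (the variable name hh_expense is hypothetical; add a small constant before logging if the variable contains zeros):

```stata
* Log-transform the right-skewed control variable
gen ln_hh_expense = ln(hh_expense)
probit y x1 ln_hh_expense
```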

2021-9-30 16:15:12
Adding a solution someone else posted online: he also found the difficult option unhelpful, and instead recommends checking whether there is a problem with your model, variables, or data.

Often and often. It is very difficult to tell from this kind of report whether you are trying to fit a model that is a bad idea for your data or the fitting process is just a bit tricky (or both). As you have several predictors, fitting an overcomplicated model really is a possibility, whatever the scientific (or non-scientific, e.g. economic) grounds for wanting to use them all. You could try tuning the -ml- engine by e.g. changing -technique()-. Simplifying the model first and then introducing complications gradually can sometimes isolate problematic predictors.

Nick
n.j.cox@durham.ac.uk
Anna-Leigh Stone
~~~~~~~~~~~~~~
I am using Stata 12.0 and I am attempting to run a fractional probit
regression with the command: glm dependent independent, fa(bin)
link(probit) cluster(gck). I have made sure that the dependent
variable values are not negative. I have 1500 dependent variable
observations out of 69,900 that fall at 0 and 1. Regardless of whether
I leave in the 0 and 1 values or take them out, I get the same log
likelihood iteration followed by backed up. It continues like this and
does not converge until I break it.
Has anyone else had this problem and know a solution to it? I do have
several variables but they are all necessary to my regression. I have
also tried the difficult option but that does not work either.

2021-10-1 07:55:26
fengbjmu posted on 2021-9-29 22:03:
Starting a thread to record problems encountered when running binary-outcome regressions:
1. logit assumes a logistic distribution, probit a standard normal ...
Really sorry: I was on my phone and fumbled the buttons. I meant to upvote but accidentally downvoted, and I cannot undo it. Apologies!

2021-11-29 08:16:37
fengbjmu posted on 2021-9-29 22:03:
Starting a thread to record problems encountered when running binary-outcome regressions:
1. logit assumes a logistic distribution, probit a standard normal ...
When I run latent class analysis, the two-class model converges, but with three or more classes it does not: it keeps repeating the same number with (not concave) and never produces a result. What should I do? Can I force it to stop by setting a maximum number of iterations?
