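Hi everyone,
I asked this question on another forum but have not had any replies yet, so I'm reposting it here in the hope that an expert can help. I'm online waiting for an answer. Thank you all.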
I'm hoping someone can help me with a problem I ran into while fitting a zero-inflated negative binomial model in Stata. With this model, I'm trying to predict 1) the probability that time spent on caregiving is greater than zero, and 2) the total amount of time spent on caregiving (hours/week) when it is greater than zero. The model ran fine with only one independent variable, but it takes forever once I add more variables. Below is my code for the survey data I'm using. Please see the results in the attachment.
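* unadjusted model
capture program drop trysimple
program trysimple
    args ylist subset
    * declare the survey design
    svyset w1varunit [pweight=w1anfinwgt0], strata(w1varstrat) singleunit(centered)
    * ZINB on the subpopulation, with race in both the count and the inflation equations
    svy, subpop(`subset' if `subset' < 2): zinb `ylist' ib0.aaWhite, inflate(ib0.aaWhite)
    * predicted outcome at aaWhite = 0 and aaWhite = 1
    margins, subpop(`subset' if `subset' < 2) at(aaWhite=(0 1)) vce(unconditional) post
    * test whether the two posted margins differ
    test _b[1._at] = _b[2._at]
end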
trysimple time raceEthStrokeSubset
trysimple TOTnumhrswk1 raceEthStrokeSubset
* adjusted for sociodemographics, comorbidity, and physical capacity
capture program drop tryadjusted
program tryadjusted
    args ylist subset
    * declare the survey design
    svyset w1varunit [pweight=w1anfinwgt0], strata(w1varstrat) singleunit(centered)
    * ZINB with the same covariates in the count and the inflation equations
    svy, subpop(`subset' if `subset' < 2): zinb `ylist' ib0.aaWhite i.ageCat i.gender i.educ3 i.married i.meanIncome5 ///
        mi2 cad2 htn2 dm2 cancer2 dementia2 osteoporosis2 athritis2 i.phq2Positive i.gad2Positive c.capacityIndex, ///
        inflate(ib0.aaWhite i.ageCat i.gender i.educ3 i.married i.meanIncome5 ///
        mi2 cad2 htn2 dm2 cancer2 dementia2 osteoporosis2 athritis2 i.phq2Positive i.gad2Positive c.capacityIndex)
    * predicted outcome at aaWhite = 0 and aaWhite = 1
    margins, subpop(`subset' if `subset' < 2) at(aaWhite=(0 1)) vce(unconditional) post
    * test whether the two posted margins differ
    test _b[1._at] = _b[2._at]
end
tryadjusted time raceEthStrokeSubset
tryadjusted TOTnumhrswk1 raceEthStrokeSubset
Here I use two different outcome variables with the model: TOTnumhrswk and time (int(TOTnumhrswk)). Since this is a count model, I assume it is only appropriate for count outcomes? The unadjusted model worked, but the adjusted one did not: Stata kept running for a long time without returning any result. My questions are:
1. The maximum of the variable 'time' is over 500. Is that why Stata takes so long to produce a result? Do I need to consider a different model? If so, what would the alternatives be?
2. If not, do I need a different variable list for the inflate part, for example with at least one x variable that differs from the count part?
3. Variables like mi2, cad2, and cancer2 are disease indicators. They are supposed to be binary, but they have some missing data. Could that be a problem for the estimation?
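In case it helps with questions 1 and 3, here is a minimal sketch of the checks I could run on the outcome and on the disease indicators. It only uses the variable names from my code above and does not show any actual results. As far as I understand, Stata drops observations with a missing value in any model variable from the estimation, so the missing-data counts should show how many cases the adjusted model could lose.

```stata
* Diagnostic sketch only; variable names are the ones used in the models above.

* Question 1: how large and how skewed is the outcome, and how many zeros are there?
summarize time TOTnumhrswk1, detail
count if time == 0

* Question 3: are the disease indicators really coded 0/1, and how much is missing?
tabulate mi2, missing
tabulate cad2, missing
tabulate cancer2, missing

* Missing-value counts for all comorbidity indicators at once
misstable summarize mi2 cad2 htn2 dm2 cancer2 dementia2 osteoporosis2 athritis2
```

I can post the output of these commands if that would be useful.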