2007-07-26
Has anyone done this? Could you briefly explain how to do it?

All replies
2007-7-27 03:49:00

1. Introduction

An event count is the realization of a nonnegative integer-valued random variable (Cameron and Trivedi 1998). Examples are the number of car accidents per month, thunderstorms per year, and wildfires per year. Applying the ordinary least squares (OLS) method to event count data results in biased, inefficient, and inconsistent estimates (Long 1997). Thus, researchers have developed various nonlinear models that are based on the Poisson distribution and the negative binomial distribution.

1.1 Count Data Regression Models

The left-hand side (LHS) of the equation has event count data. Independent variables are, as in the OLS, located at the right-hand side (RHS). These RHS variables may be interval, ratio, or binary (dummy). Table 1 below summarizes the categorical dependent variable regression models (CDVMs) according to the level of measurement of the dependent variable.

Table 1. Comparison between OLS and CDVMs

Model | Dependent (LHS) | Method | Independent (RHS)
OLS: Ordinary least squares | Interval or ratio scale | Moment-based method | A linear function of interval/ratio or binary independent variables
CDVM: Binary response | Binary (0 or 1) | Maximum likelihood method | Same as OLS
CDVM: Ordinal response | Ordinal (1st, 2nd, ...) | Maximum likelihood method | Same as OLS
CDVM: Nominal response | Nominal (A, B, ...) | Maximum likelihood method | Same as OLS
CDVM: Event count data | Count (0, 1, 2, ...) | Maximum likelihood method | Same as OLS

The Poisson regression model (PRM) and negative binomial regression model (NBRM) are basic models for count data analysis. Either the zero-inflated Poisson (ZIP) or the zero-inflated negative binomial regression model (ZINB) is used when there are many zero counts. Other count models are developed to handle censored, truncated, or sample selected count data. This document, however, focuses on PRM, NBRM, ZIP, and ZINB.

1.2 Poisson Models versus Negative Binomial Models

The Poisson probability distribution, P(y|mu) = exp(-mu) * mu^y / y! for y = 0, 1, 2, ..., has the same mean and variance (equidispersion), Var(y)=E(y)=mu. As the mean of a Poisson distribution increases, the probability of zeros decreases and the distribution approximates a normal distribution (Figure 1). The Poisson distribution also rests on the strong assumption that events are independent. Thus, this distribution does not fit well if the rate mu differs across observations (heterogeneity) (Long 1997).
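As a rough numerical check of the equidispersion property, the sketch below evaluates the standard Poisson probability function exp(-mu) * mu^y / y! for the four means used in Figure 1. The function names and the truncation point max_y are illustrative choices, not part of any of the packages discussed here.

```python
import math

def poisson_pmf(y, mu):
    """P(y | mu) = exp(-mu) * mu**y / y! for a nonnegative integer y."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def poisson_moments(mu, max_y=100):
    """Approximate mean and variance by summing over y = 0..max_y.

    max_y is an assumed truncation point; the omitted tail is negligible
    for the small means used here.
    """
    probs = [poisson_pmf(y, mu) for y in range(max_y + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

# Means taken from the caption of Figure 1.
for mu in (0.5, 1, 2, 5):
    mean, var = poisson_moments(mu)
    print(f"mu={mu}: P(y=0)={poisson_pmf(0, mu):.4f}, "
          f"mean={mean:.4f}, var={var:.4f}")
```

The printed mean and variance should agree for each mu, and P(y=0) should shrink as mu grows.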

The Poisson regression model (PRM) incorporates observed heterogeneity into the Poisson distribution function, Var(y|x)=E(y|x)=mu=exp(xb). As mu increases, the conditional variance of y increases, the proportion of predicted zeros decreases, and the distribution around the expected value becomes approximately normal (Long 1997). The conditional mean of the errors is zero, but the variance of the errors is a function of the independent variables, Var(y|x)=exp(xb), so the errors are heteroscedastic. In practice, the PRM rarely fits well because the conditional variance usually exceeds the conditional mean (overdispersion) (Long 1997; Maddala 1983).
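To make the mean structure mu=exp(xb) concrete, here is a minimal sketch of maximum likelihood estimation for a PRM with an intercept and one regressor, using plain Newton-Raphson. The data and the function name fit_poisson are made up for illustration; SAS, STATA, and LIMDEP use their own optimizers, so this is not how those packages are implemented.

```python
import math

def fit_poisson(x, y, iterations=25):
    """Fit mu_i = exp(b0 + b1 * x_i) by Newton-Raphson on the Poisson
    log-likelihood. Returns the estimates (b0, b1)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: first derivatives of the log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian), 2x2 and symmetric.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: b <- b + H^{-1} g, with the 2x2 inverse written out.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: x is a single regressor, y is an event count.
x = [0, 1, 2, 3, 4]
y = [1, 1, 2, 4, 7]
b0, b1 = fit_poisson(x, y)
fitted = [math.exp(b0 + b1 * xi) for xi in x]
print("b0 =", round(b0, 4), "b1 =", round(b1, 4))
print("fitted means:", [round(m, 3) for m in fitted])
```

At the solution the fitted means sum to the observed counts, which is the first-order condition for the intercept. exp(b1) can be read as the multiplicative change in the expected count for a one-unit increase in x.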

Figure 1. Poisson Probability Distribution with Means of .5, 1, 2, and 5

The negative binomial probability distribution is P(y|x) = [Gamma(y+v) / (y! * Gamma(v))] * (v/(v+mu))^v * (mu/(v+mu))^y, where 1/v=alpha determines the degree of dispersion and Gamma is the gamma function. As the dispersion parameter alpha increases, the variance of the negative binomial distribution also increases, Var(y|x)=mu(1+mu/v)=mu(1+alpha*mu).

The negative binomial regression model (NBRM) incorporates observed and unobserved heterogeneity into the conditional mean, mu=exp(xb+e) (Long 1997). Thus, the conditional variance of y becomes larger than its conditional mean, E(y|x)=mu, which remains unchanged. Figure 2 illustrates how the probabilities for small and larger counts increase in the negative binomial distribution as the conditional variance of y increases, given mu=2.

Figure 2. Negative Binomial Probability Distribution with Alpha of .01, .5, 1, and 5
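The pattern in Figure 2 can be reproduced approximately with the sketch below. It assumes the usual mean-dispersion form of the negative binomial probability, Gamma(y+v) / (y! * Gamma(v)) * (v/(v+mu))^v * (mu/(v+mu))^y with v=1/alpha, and checks the variance mu(1+alpha*mu) numerically. The function names and the truncation point max_y are illustrative.

```python
import math

def negbin_pmf(y, mu, alpha):
    """Negative binomial probability with mean mu and dispersion alpha = 1/v.

    Computed on the log scale with lgamma to avoid overflow.
    """
    v = 1.0 / alpha
    log_p = (math.lgamma(y + v) - math.lgamma(y + 1) - math.lgamma(v)
             + v * math.log(v / (v + mu)) + y * math.log(mu / (v + mu)))
    return math.exp(log_p)

def poisson_pmf(y, mu):
    """Poisson probability, used here only as the alpha -> 0 benchmark."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def negbin_moments(mu, alpha, max_y=2000):
    """Approximate mean and variance by summing over y = 0..max_y - 1.

    max_y is an assumed truncation point.
    """
    probs = [negbin_pmf(y, mu, alpha) for y in range(max_y)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

mu = 2.0  # conditional mean used in Figure 2
for alpha in (0.01, 0.5, 1, 5):  # dispersion values from the caption
    mean, var = negbin_moments(mu, alpha)
    print(f"alpha={alpha}: P(y=0)={negbin_pmf(0, mu, alpha):.4f}, "
          f"mean={mean:.4f}, var={var:.4f}, "
          f"mu(1+alpha*mu)={mu * (1 + alpha * mu):.4f}")
```

The mean stays at 2 for every alpha while the variance and the probability of a zero count grow with alpha. With alpha close to zero the probabilities are close to the Poisson ones.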

The PRM and NBRM, however, have the same mean structure. If alpha=0, the NBRM reduces to the PRM (Cameron and Trivedi 1998; Long 1997).

1.3 Overdispersion

When Var(y|x) > E(y|x), the data are said to be overdispersed. Estimates of a PRM for overdispersed data are consistent but inefficient, with standard errors biased downward (Cameron and Trivedi 1998; Long 1997). The likelihood ratio test for overdispersion examines the null hypothesis of alpha=0. The LR statistic follows the chi-squared distribution with one degree of freedom. If the null hypothesis is rejected, the NBRM is preferred to the PRM.
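The test can be sketched as follows, given the maximized log-likelihoods of the two fitted models. The function name and the two log-likelihood values are hypothetical and are not output from any model in this document. The p-value uses the fact that the upper tail of a chi-squared distribution with one degree of freedom equals erfc(sqrt(x/2)).

```python
import math

def lr_overdispersion_test(loglik_poisson, loglik_negbin):
    """Likelihood ratio test of H0: alpha = 0 (PRM against NBRM).

    Returns the LR statistic and its p-value under a chi-squared
    distribution with one degree of freedom, as described in the text.
    """
    lr = 2.0 * (loglik_negbin - loglik_poisson)
    # Upper-tail probability of chi-squared(1); guard against tiny
    # negative values caused by rounding in the log-likelihoods.
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return lr, p_value

# Hypothetical log-likelihoods, for illustration only.
lr, p = lr_overdispersion_test(loglik_poisson=-150.0, loglik_negbin=-140.0)
print(f"LR = {lr:.2f}, p-value = {p:.6f}")
if p < 0.05:
    print("Reject alpha = 0: the NBRM is preferred to the PRM.")
else:
    print("Do not reject alpha = 0: the PRM is adequate.")
```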

Zero-inflated models handle overdispersion by changing the mean structure to explicitly model the production of zero counts (Long 1997). These models assume two latent groups. One is the always-zero group and the other is not-always-zero or sometime-zero group (Long 1997). Thus, zero counts come from the former group and some of the latter group with a certain probability.
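The two-group idea can be written down directly for the zero-inflated Poisson case. In the sketch below, psi denotes the probability of belonging to the always-zero group; the symbol, the function names, and the parameter values are assumptions made for illustration. In a full ZIP model psi would itself depend on covariates.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability for the not-always-zero group."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def zip_pmf(y, mu, psi):
    """Zero-inflated Poisson probability.

    psi is the probability of the always-zero group. A zero count comes
    either from that group or from the Poisson group with probability
    (1 - psi) * exp(-mu).
    """
    if y == 0:
        return psi + (1.0 - psi) * poisson_pmf(0, mu)
    return (1.0 - psi) * poisson_pmf(y, mu)

mu, psi = 2.0, 0.3  # hypothetical parameter values
probs = [zip_pmf(y, mu, psi) for y in range(101)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(f"P(y=0): Poisson={poisson_pmf(0, mu):.4f}, ZIP={zip_pmf(0, mu, psi):.4f}")
print(f"ZIP mean={mean:.4f}, variance={var:.4f}")
```

With these values the variance exceeds the mean, which shows how extra zeros alone can produce overdispersion relative to a plain Poisson distribution.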

The likelihood ratio test of the null hypothesis alpha=0 compares the ZIP and ZINB, which are nested. The PRM and ZIP, and the NBRM and ZINB, cannot be compared by this likelihood ratio test, since neither pair is nested. The Vuong statistic compares these non-nested models. If V is greater than 1.96, the ZIP or ZINB is favored. If V is less than -1.96, the PRM or NBRM is preferred (Long 1997).
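A minimal sketch of the Vuong comparison follows. It assumes the common form V = sqrt(N) * mean(m) / sd(m), where m_i is the log of the ratio of the predicted probabilities of the observed count under the two models, and it uses the sample standard deviation. The counts and the parameter values are made up and chosen by hand rather than estimated, so the result only illustrates the mechanics.

```python
import math
import statistics

def vuong_statistic(probs_model1, probs_model2):
    """V = sqrt(N) * mean(m) / sd(m), with m_i = ln(P1(y_i) / P2(y_i)).

    probs_model1 and probs_model2 hold each model's predicted probability
    of the observed count for every observation.
    """
    m = [math.log(p1 / p2) for p1, p2 in zip(probs_model1, probs_model2)]
    return math.sqrt(len(m)) * statistics.mean(m) / statistics.stdev(m)

def interpret(v):
    """Decision rule from the text, with model 1 the zero-inflated model."""
    if v > 1.96:
        return "zero-inflated model favored"
    if v < -1.96:
        return "PRM or NBRM preferred"
    return "neither model is favored"

def poisson_pmf(y, mu):
    return math.exp(-mu) * mu ** y / math.factorial(y)

def zip_pmf(y, mu, psi):
    zero_part = psi if y == 0 else 0.0
    return zero_part + (1.0 - psi) * poisson_pmf(y, mu)

# Hypothetical counts with many zeros.
counts = [0] * 12 + [1, 1, 2, 2, 2, 3, 3, 4]
# Assumed (not estimated) parameters for the two competing models.
p_zip = [zip_pmf(y, mu=2.0, psi=0.55) for y in counts]
p_prm = [poisson_pmf(y, mu=0.9) for y in counts]

v = vuong_statistic(p_zip, p_prm)
print(f"V = {v:.3f}: {interpret(v)}")
```

Swapping the two arguments flips the sign of V, so the order in which the models are passed determines which one a large positive value favors.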

1.4 Estimation in SAS, STATA, and LIMDEP

The SAS GENMOD procedure estimates Poisson and negative binomial regression models. STATA has individual commands (e.g., .poisson and .nbreg) for the corresponding count data models. LIMDEP has the Poisson$ and Negbin$ commands to estimate various count data models, including zero-inflated and zero-truncated models. Table 2 summarizes the procedures and commands for count data regression models.

Table 2. Comparison of the Procedures and Commands for Count Data Models

Model | SAS 9.1 | STATA 9.0 SE | LIMDEP 8.0
Poisson Regression (PRM) | GENMOD | .poisson | Poisson$
Negative Binomial Regression (NBRM) | GENMOD | .nbreg | Negbin$
Zero-inflated Poisson (ZIP) | - | .zip | Poisson; Zip; Rh2$
Zero-inflated Negative Binomial (ZINB) | - | .zinb | Negbin; Zip; Rh2$
Zero-truncated Poisson (ZTP) | - | .ztp | Poisson; Truncation$
Zero-truncated Negative Binomial (ZTNB) | - | .ztnb | Negbin; Truncation$

The example here examines how waste quotas (emps) and the strictness of policy implementation (strict) affect the frequency of waste spill accidents of plants (accident).

1.5 Long and Freese's SPost Module

STATA users may take advantage of user-written modules such as SPost, written by J. Scott Long and Jeremy Freese. The module allows researchers to conduct follow-up analyses of various CDVMs, including event count data models. See 2.3 for examples of major SPost commands.

In order to install SPost, execute the following commands consecutively. For more details, visit J. Scott Long’s Web site at http://www.indiana.edu/~jslsoc/spost_install.htm.

. net from http://www.indiana.edu/~jslsoc/stata/

. net install spost9_ado, replace

. net get spost9_do, replace


2007-7-27 03:51:00

2007-7-27 03:53:00

Negative Binomial; Testing For Overdispersion in Poisson regression

http://www.uky.edu/ComputingCenter/SSTARS/P_NB_3.htm

[This post was edited by the author on 2007-7-27 3:56:05]


2007-7-27 09:21:00
Stata is the best tool for this.

2010-4-10 22:30:11
Stata is relatively concise and fast.