Logit is estimated by ML,
and the desirable properties of ML estimation are large-sample properties.
J. Scott Long's book mentions that a sample size above 500 is safe.
See p. 80 of Allison, P.D. (1995), Survival Analysis Using the SAS System: A Practical Guide:
All these approximations get better as the sample size gets larger.
The fact that these desirable properties have only been proven for large
samples does not mean that ML has bad properties for small samples. It simply
means that we usually don't know what the small-sample properties are. And
in the absence of attractive alternatives, researchers routinely use ML
estimation for both large and small samples. Although I won't argue against
that practice, I do urge caution in interpreting p-values and confidence
intervals when samples are small. Despite the temptation to accept larger
p-values as evidence against the null hypothesis in small samples, it is actually
more reasonable to demand smaller values to compensate for the fact that the
approximation to the normal or chi-square distributions may be poor.
The other reason for ML's popularity is that it is often
straightforward to derive ML estimators when there are no other obvious
possibilities. As we will see, one case that ML handles nicely is data with
censored observations. While you can use least squares with certain
adjustments for censoring (Lawless 1982, p. 328), such estimates often have
much larger standard errors, and there is little available theory to justify the
construction of hypothesis tests or confidence intervals.
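To make the censoring point concrete, here is a minimal Python sketch of my own (made-up data, not from Allison's book): suppose survival times are exponential with rate lam, let each observed event contribute the density f(t) = lam*exp(-lam*t) and each censored case contribute the survivor function S(t) = exp(-lam*t), then maximize the resulting log-likelihood numerically.

import numpy as np
from scipy.optimize import minimize_scalar

t = np.array([2.0, 5.0, 1.5, 8.0, 3.0])   # observed times (hypothetical data)
d = np.array([1, 0, 1, 0, 1])             # 1 = event observed, 0 = censored

def neg_log_lik(lam):
    # log L = sum_i [ d_i*(log lam - lam*t_i) + (1 - d_i)*(-lam*t_i) ]
    return -np.sum(d * (np.log(lam) - lam * t) + (1 - d) * (-lam * t))

fit = minimize_scalar(neg_log_lik, bounds=(1e-6, 10.0), method="bounded")
print(fit.x)   # MLE of lam; for this simple model it equals d.sum() / t.sum()

Least squares has no comparably natural way to let the censored cases enter the fit, which is the point being made here.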
The basic principle of ML is to choose as estimates those values
that will maximize the probability of observing what we have, in fact,
observed. There are two steps to this: (1) write down an expression for the
probability of the data as a function of the unknown parameters, and (2) find
the values of the unknown parameters that make the value of this expression
as large as possible.
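As a rough Python illustration of those two steps for a logit model (simulated data, my own sketch rather than anything in the book): step 1 writes the probability of the observed 0/1 outcomes as a function of the unknown coefficients, and step 2 hands that function to a numerical optimizer.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])  # intercept + one covariate
beta_true = np.array([-0.5, 1.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

def neg_log_lik(beta):
    # Step 1: probability of the observed data under the logit model
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Step 2: find the parameter values that make that probability as large as possible
fit = minimize(neg_log_lik, x0=np.zeros(2), method="BFGS")
print(fit.x)   # ML estimates of the intercept and slope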
The first step is known as constructing the likelihood function. To
accomplish this, you must specify a model, which amounts to choosing a
probability distribution for the dependent variable and choosing a functional
form that relates the parameters of this distribution to the values of the
covariates. We have already considered those two choices.
The second step—maximization—typically requires an iterative
numerical method, that is, one involving successive approximations. Such
methods are often computationally demanding, which explains why ML
estimation has become popular only in the last two decades.
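To give a feel for what "successive approximations" means, here is a hypothetical Newton-Raphson sketch for the logit log-likelihood (again my own illustration, not the book's code): starting from zeros, each pass solves for a correction to the current coefficients and stops once the corrections become negligible.

import numpy as np

def logit_mle_newton(X, y, tol=1e-8, max_iter=25):
    beta = np.zeros(X.shape[1])                    # starting approximation
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # fitted probabilities
        grad = X.T @ (y - p)                       # score (gradient of log-likelihood)
        info = X.T @ (X * (p * (1 - p))[:, None])  # information matrix
        step = np.linalg.solve(info, grad)
        beta = beta + step                         # next approximation
        if np.max(np.abs(step)) < tol:
            break
    return beta

In practice, packaged routines such as SAS PROC LOGISTIC handle this iteration internally; the sketch just shows why ML used to be computationally expensive.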