RIDGE REGRESSION ESTIMATION OF THE LINEAR PROBABILITY MODEL
By RAJARAM GANA
SUMMARY This paper proposes using ridge regression to resolve the problem of getting the least-squares-estimated values of the 0-1 dummy regressand in the linear probability model to lie between 0 and 1.
1 Introduction 
The linear probability model (LPM) is Y = Xb + u, where Y takes only the values 0 and 1. It is usually estimated by logit or probit analysis. Ordinary least-squares (OLS) estimation of Y is problematic. However, for large samples, the non-normality of u does not matter and its heteroskedasticity can be resolved using weighted least squares (WLS). The remaining problem is that the least-squares-estimated values of Y do not necessarily lie in the range 0-1.
This paper indicates, by example, that ridge regression (Hoerl & Kennard, 1970; Maddala, 1992) may resolve the problem of getting the estimated values of Y to lie between 0 and 1. Here, b is estimated by adding a constant k to each of the diagonal elements of the X'X matrix of the regressors, after they have been standardized to have zero means and unit variances. The problem is that the selection of k is arbitrary. This estimation shortens the length of b but is biased. Shortening the length of the least-squares-estimated coefficient vector may not be unreasonable, because Brook and Moore (1980) show that it is much too long on average.
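The ridge estimator described above can be sketched in a few lines of numpy. This is a minimal illustration, not code from the paper: the arrays X (an n x p matrix of regressors) and y (the 0-1 regressand) are hypothetical placeholders, and the coefficients returned are on the standardized scale.

```python
import numpy as np

def ridge_fit(X, y, k):
    """Ridge-estimate b after standardizing the regressors to zero mean
    and unit variance; k is added to each diagonal element of X'X.
    With k = 0 this reduces to OLS on the standardized, centred data."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize regressors
    ys = y - y.mean()                            # centre the regressand
    p = Xs.shape[1]
    # Solve (X'X + kI) b = X'y rather than inverting explicitly
    return np.linalg.solve(Xs.T @ Xs + k * np.eye(p), Xs.T @ ys)
```

Increasing k shrinks the length of the estimated coefficient vector, which is the property the procedure below exploits.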
2 An example 
The data set of Spector and Mazzeo (1980) is presented as an end-of-chapter exercise by Gujarati (1988), and is re-examined here. OLS estimates of the regressand, i.e. Y (GRADE), produce a total sum of squares (TSS) of 7.219 and a residual sum of squares (RSS) of 4.216. The resultant OLS-estimated equation and standard error (SE) values are 
Y = -1.4980 + 0.4639 GPA + 0.3786 PSI + 0.0105 TUCE
     (0.5239) (0.1620)     (0.1392)     (0.0195)
No predictions exceed unity. Five of the 32 predictions given by this equation are negative. The predictions which correspond to observations 1, 4, 7, 11 and 16 are -0.0543, -0.0176, -0.0394, -0.0682 and -0.0277 respectively. Gujarati assumed that these five values were all nearly zero, set them equal to 0.001 and re-estimated the LPM using the WLS method. The resultant values of the weighted TSS and RSS are 95.637 and 22.739, respectively, and the WLS-estimated equation and SE values are 
Y= -1.3087 + 0.3982 GPA + 0.3878 PSI + 0.0122 TUCE
   (0.2885) (0.0878)     (0.1052)     (0.0045)
No predictions exceed unity. Four predictions given by this equation are negative. 
This paper demonstrates first that some shrinkage of the coefficients in the OLS-estimated equation for Y will make all 32 estimated Yi values lie between 0 and 1. The smallest value of k that achieves this is selected. Next, the resultant ridge-estimated Yi values are used to compute the residual variances Yi(1- Yi). Finally, these residual variances are used to re-estimate the LPM, using the usual WLS method to take account of the heteroskedasticity. Results indicate that this procedure may improve the estimation properties of the LPM without creating unreasonable problems. 
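The final WLS step of this procedure can be sketched as follows. This is a hedged illustration assuming hypothetical arrays X (n x p, including an intercept column), y (the 0-1 regressand), and yhat (the ridge-estimated Yi values, all strictly inside the interval (0, 1)); each observation is divided by the square root of its residual variance Yi(1 - Yi).

```python
import numpy as np

def wls_fit(X, y, yhat):
    """Re-estimate the LPM by weighted least squares, weighting each
    observation by 1/sqrt(yhat*(1 - yhat)), the reciprocal square root
    of the residual variance implied by the 0-1 regressand."""
    w = 1.0 / np.sqrt(yhat * (1.0 - yhat))
    Xw = X * w[:, None]   # scale each row of X by its weight
    yw = y * w
    return np.linalg.lstsq(Xw, yw, rcond=None)[0]
```

When all the fitted values are equal, the weights are constant and this reduces to OLS, as expected.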
When k = 0.2, the ridge estimate of Y11, i.e. the only estimate outside the range 0-1, is -5.73 x 10^-3 and the RSS is 4.287. When k = 0.3, all the ridge estimates of Y are in the range 0-1 and the RSS is 4.350. The classical bisection method is used to determine the smallest value of k (0.2 < k < 0.3) for which all the ridge estimates of Y are in the range 0-1. When k = 0.222 402 588 580, the ridge estimate of Y11, i.e. the only estimate outside the range 0-1, is -9.3 x 10^-16. When k = 0.222 402 588 581, all the ridge estimates of Y are in the range 0-1 and the ridge estimate of Y11 becomes +2.504 x 10^-13. At this level of precision, bisection is stopped. The predictions that correspond to observations 1, 4, 7 and 16 are 0.0113, 0.0072, 0.0107 and 0.0286 respectively. The resultant RSS is 4.300 (a value not unreasonably distant from the minimum RSS of 4.216), with 28.1248 degrees of freedom (see Hoerl & Kennard, 1990). Hence, the resultant mean square error, used to calculate the SE values, is 0.1529. The resultant ridge-regression-estimated equation and SE values are
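The bisection search described above can be sketched as follows. This is an illustrative sketch, not the paper's code: fitted(k) stands for any hypothetical function returning the vector of ridge-estimated Yi values for a given k, and the bracket [0.2, 0.3] matches the values reported in the text.

```python
def smallest_k(fitted, lo=0.2, hi=0.3, tol=1e-12):
    """Classical bisection on k: shrink the bracket [lo, hi] until its
    width is below tol, keeping lo a failing value (some fitted Yi
    outside [0, 1]) and hi a succeeding one (all fitted Yi in [0, 1])."""
    def ok(k):
        yhat = fitted(k)
        return bool((yhat >= 0).all() and (yhat <= 1).all())

    assert not ok(lo) and ok(hi)   # valid bracket: lo fails, hi succeeds
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ok(mid):
            hi = mid               # mid succeeds: tighten from above
        else:
            lo = mid               # mid fails: tighten from below
    return hi                      # smallest succeeding k, to within tol
```

Each iteration halves the bracket, so reaching the twelve-decimal precision reported above takes roughly forty ridge fits.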
Y= - 1.2432 + 0.3759 GPA + 0.3107 PSI + 0.0127 TUCE
    (0.3714) (0.1404)     (0.1363)     (0.0185)
The residual variance is estimated using the ridge-regression-estimated equation for Y and the LPM is re-estimated using the WLS method. The resultant weighted TSS and RSS values are 55.213 and 20.361, respectively, and the WLS-estimated equation and SE values are 
Y= - 1.2546 + 0.3772 GPA + 0.3608 PSI + 0.0131 TUCE
    (0.3390) (0.1025)     (0.1148)     (0.0079)
The WLS estimate of Y11, i.e. the only estimate outside the range 0-1, is -2.44 x 10^-12 (which is very nearly zero).
It is hoped that this paper will encourage applied researchers to consider the LPM as an option, so providing further empirical evidence on the usefulness of ridge regression estimation of the LPM. 
Acknowledgements 
It is a pleasure to acknowledge that I have profited from the many conversations that I have had with Arthur E. Hoerl. I thank the anonymous referee for suggesting how to focus my presentation. 
A version of this paper was presented at the Decision Sciences Institute meeting in 1994. 
REFERENCES 
BROOK, R. J. & MOORE, T. (1980) On the expected length of the least squares coefficient vector, Journal of Econometrics, 12, pp. 245-246. 
GUJARATI, D. N. (1988) Basic Econometrics (New York, McGraw-Hill). 
HOERL, A. E. & KENNARD, R. W. (1970) Ridge regression: biased estimation of nonorthogonal problems, Technometrics, 12, pp. 56-67. 
HOERL, A. E. & KENNARD, R. W. (1990) Ridge regression: degrees of freedom in the analysis of variance, Communications in Statistics, 19, pp. 1485-1495. 
MADDALA, G. S. (1992) Introduction to Econometrics (New York, Macmillan). 
SPECTOR, L. C. & MAZZEO, M. (1980) Probit analysis and economic education, Journal of Economic Education, 11, pp. 37-44.