Title:  Computing the Chow statistic
Author: William Gould, StataCorp
Date:   January 1999; minor revisions July 2005
Let’s start with the Chow test to which many refer. Consider the model
    y = a + b*x1 + c*x2 + u

and say that we have two groups of data. We could estimate that model on the two groups separately:
    y = a1 + b1*x1 + c1*x2 + u     for group == 1
    y = a2 + b2*x1 + c2*x2 + u     for group == 2

and we could estimate a single, pooled regression
    y = a + b*x1 + c*x2 + u        for both groups

In the last regression, we are asserting that a1==a2, b1==b2, and c1==c2. The formula for the “Chow test” of this constraint is
           ess_c - (ess_1+ess_2)
           ---------------------
                     k
    F  =   ---------------------
              ess_1 + ess_2
             ----------------
             N_1 + N_2 - 2*k

and this is the formula to which people refer. ess_1 and ess_2 are the error sums of squares from the separate regressions, ess_c is the error sum of squares from the pooled (constrained) regression, k is the number of estimated parameters (k=3 in our case), and N_1 and N_2 are the numbers of observations in the two groups.
The resulting test statistic is distributed F(k, N_1+N_2-2*k).
Let’s try this. I have created small datasets:
    clear
    set obs 100
    set seed 1234
    generate x1 = uniform()
    generate x2 = uniform()
    generate y = 4*x1 - 2*x2 + 2*invnormal(uniform())
    generate group = 1
    save one, replace

    clear
    set obs 80
    generate x1 = uniform()
    generate x2 = uniform()
    generate y = -2*x1 + 3*x2 + 8*invnormal(uniform())
    generate group = 2
    save two, replace

    use one, clear
    append using two
    save combined, replace

The models are different in the two groups, the residual variances are different, and so are the numbers of observations. With this dataset, I can carry out the Chow test. First, I run the separate regressions:
. regress y x1 x2 if group==1

  Source |       SS       df       MS              Number of obs =     100
---------+------------------------------           F(  2,    97) =   36.10
   Model |  328.686307     2  164.343154           Prob > F      =  0.0000
Residual |  441.589627    97  4.55247038           R-squared     =  0.4267
---------+------------------------------           Adj R-squared =  0.4149
   Total |  770.275934    99  7.78056499           Root MSE      =  2.1337

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
      x1 |   5.121087    .728493     7.03   0.000      3.67523    6.566944
      x2 |  -3.227026   .7388209    -4.37   0.000    -4.693381   -1.760671
   _cons |  -.1725655   .5698273    -0.30   0.763    -1.303515    .9583839
------------------------------------------------------------------------------

. regress y x1 x2 if group==2

  Source |       SS       df       MS              Number of obs =      80
---------+------------------------------           F(  2,    77) =    5.02
   Model |   544.11726     2   272.05863           Prob > F      =  0.0089
Residual |  4169.24211    77  54.1460014           R-squared     =  0.1154
---------+------------------------------           Adj R-squared =  0.0925
   Total |  4713.35937    79  59.6627768           Root MSE      =  7.3584

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
      x1 |   -1.21464     2.9578    -0.41   0.682    -7.104372    4.675092
      x2 |    8.49714   2.688249     3.16   0.002     3.144152    13.85013
   _cons |    -2.2591    1.91076    -1.18   0.241     -6.06391    1.545709
------------------------------------------------------------------------------

and then I run the combined regression:
. regress y x1 x2

  Source |       SS       df       MS              Number of obs =     180
---------+------------------------------           F(  2,   177) =    2.93
   Model |  176.150454     2  88.0752272           Prob > F      =  0.0559
Residual |  5316.21341   177   30.035104           R-squared     =  0.0321
---------+------------------------------           Adj R-squared =  0.0211
   Total |  5492.36386   179   30.683597           Root MSE      =  5.4804

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
      x1 |   2.692373    1.41842     1.90   0.059    -.1068176    5.491563
      x2 |   2.061004   1.370448     1.50   0.134    -.6435156    4.765524
   _cons |  -1.380331   1.017322    -1.36   0.177    -3.387973      .62731
------------------------------------------------------------------------------

For the Chow test,
           ess_c - (ess_1+ess_2)
           ---------------------
                     k
    F  =   ---------------------
              ess_1 + ess_2
             ----------------
             N_1 + N_2 - 2*k

here are the relevant numbers copied from the output above:
    ess_c = 5316.21341    (from combined regression)
    ess_1 = 441.589627    (from group==1 regression)
    ess_2 = 4169.24211    (from group==2 regression)
    k     = 3             (we estimate 3 parameters)
    N_1   = 100           (from group==1 regression)
    N_2   = 80            (from group==2 regression)

So, plugging in, we get
    5316.21341 - (441.589627+4169.24211)       705.38167
    ------------------------------------       ---------
                      3                            3
    ------------------------------------   =   ---------
          441.589627 + 4169.24211              4610.8317
          ------------------------             ---------
                100+80-2*3                        174

                                                235.12722
                                            =   ---------
                                                26.499033

                                            =   8.8730491

The Chow test is F(k, N_1+N_2-2*k) = F(3,174), so our test statistic is F(3,174) = 8.8730491.
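The same arithmetic can also be scripted rather than done by hand. Here is a minimal sketch (not part of the original FAQ) that assumes the error sums of squares and sample sizes have been copied from the output above; Stata's Ftail() returns the upper-tail p-value:

    scalar ess_c = 5316.21341
    scalar ess_1 = 441.589627
    scalar ess_2 = 4169.24211
    scalar k  = 3
    scalar N1 = 100
    scalar N2 = 80
    * Chow F statistic and its p-value
    scalar chowF = ((ess_c - (ess_1+ess_2))/k) / ((ess_1+ess_2)/(N1+N2-2*k))
    display "Chow F(3,174) = " chowF "   p-value = " Ftail(k, N1+N2-2*k, chowF)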
Now, I will do the same problem by running one regression and using test to test certain coefficients equal to zero. What I want to do is estimate the model
    y = a3 + b3*x1 + c3*x2 + a3'*g2 + b3'*g2*x1 + c3'*g2*x2 + u

where g2=1 if group==2 and g2=0 otherwise. I can do this by typing
. generate g2 = (group==2)
. generate g2x1 = g2*x1
. generate g2x2 = g2*x2
. regress y x1 x2 g2 g2x1 g2x2

Think about the predictions from this model. The model says
    y = a3 + b3*x1 + c3*x2 + u                       when g2==0
    y = (a3+a3') + (b3+b3')*x1 + (c3+c3')*x2 + u     when g2==1

Thus the model is equivalent to estimating the separate models
    y = a1 + b1*x1 + c1*x2 + u     for group == 1
    y = a2 + b2*x1 + c2*x2 + u     for group == 2

the relationship being
    a1 = a3        a2 = a3 + a3'
    b1 = b3        b2 = b3 + b3'
    c1 = c3        c2 = c3 + c3'

Some of you may be concerned that in the pooled model (the one estimating a3, b3, etc.), we are constraining the var(u) to be the same for each group, whereas, in the separate-equation model, we estimate different variances for group 1 and group 2. This does not matter, because the model is fully interacted. That is probably not convincing, but what should be convincing is that I am about to obtain the same F(3,174) = 8.87 answer and, in my concocted data, I have different variances in each group.
So, here is the result of the alternative approach: testing the coefficients against 0 in a pooled specification:
. generate g2 = (group==2)
. generate g2x1 = g2*x1
. generate g2x2 = g2*x2
. regress y x1 x2 g2 g2x1 g2x2

  Source |       SS       df       MS              Number of obs =     180
---------+------------------------------           F(  5,   174) =    6.65
   Model |  881.532123     5  176.306425           Prob > F      =  0.0000
Residual |  4610.83174   174   26.499033           R-squared     =  0.1605
---------+------------------------------           Adj R-squared =  0.1364
   Total |  5492.36386   179   30.683597           Root MSE      =  5.1477

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
      x1 |   5.121087   1.757587     2.91   0.004     1.652152    8.590021
      x2 |  -3.227026   1.782504    -1.81   0.072    -6.745139    .2910877
      g2 |  -2.086535   1.917507    -1.09   0.278    -5.871102    1.698032
    g2x1 |  -6.335727   2.714897    -2.33   0.021     -11.6941   -.9773583
    g2x2 |   11.72417    2.59115     4.52   0.000     6.610035     16.8383
   _cons |  -.1725655   1.374785    -0.13   0.900    -2.885966    2.540835
------------------------------------------------------------------------------

. test g2 g2x1 g2x2

 ( 1)  g2 = 0
 ( 2)  g2x1 = 0
 ( 3)  g2x2 = 0

       F(  3,   174) =    8.87
            Prob > F =    0.0000

Same answer.
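As a side check (my addition, not part of the original FAQ), lincom can be used after the interacted regression to recover the group-2 coefficients implied by the relationships above; for example, the sum for x1 should reproduce the -1.21464 reported by the separate group==2 regression:

    . lincom x1 + g2x1      // group-2 coefficient on x1
    . lincom x2 + g2x2      // group-2 coefficient on x2
    . lincom _cons + g2     // group-2 intercept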
This definition of the “Chow test” is equivalent to pooling the data, estimating the fully interacted model, and then testing the group 2 coefficients against 0.
That is why I said, “Chow Test is a term I have heard used by economists in the context of testing a set of regression coefficients being equal to 0.”
Admittedly, that leaves a lot unsaid.
The issue of the variance of u being equal in the two groups is subtle, but I do not want that to get in the way of understanding that the Chow test is equivalent to the “pool the data, interact, and test” procedure. They are equivalent.
Concerning variances, the Chow test itself is testing against a pooled, uninteracted model and so has buried in it an assumption of equal variances. It is really a test that the coefficients are equal and variance(u) in the groups are equal. It is, however, a weak test of the equality of variances because that assumption manifests itself only in how the pooled coefficient estimates are manufactured. Since the Chow test and the “pool the data, interact, and test” procedure are the same, the same is true of both procedures.
Your second concern might be that in the “pool the data, interact, and test” procedure there is an extra assumption of equality of variances because everything comes from the pooled model. As shown, that is not true. It is not true because the model is fully interacted and so the assumption of equal variances never makes a difference in the calculation of the coefficients.
http://www.stata.com/support/faqs/stat/chow2.html
Title:  Chow and Wald tests
Author: William Gould, StataCorp
Date:   July 1999; minor revision August 2007
First, see the FAQ How can I compute a Chow test statistic? The point of that FAQ is that you can do Chow tests using Stata’s test command and, in fact, Chow tests are what the test command reports.
Well, that’s not exactly right. test uses the estimated variance–covariance matrix of the estimators, and test performs Wald tests,
    W = (Rb - r)' (RVR')^(-1) (Rb - r)
where V is the estimated variance–covariance matrix of the estimators.
For linear regression with the conventionally estimated V, the Wald test is the Chow test and vice versa.
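As an illustration of the formula (my addition, not from the FAQ), the Wald statistic for the fully interacted regression from the first FAQ can be computed directly from e(b) and e(V) with Stata's matrix commands; with the conventional V left behind by regress, W divided by the number of constraints reproduces the F(3,174) = 8.87 result:

    . quietly regress y x1 x2 g2 g2x1 g2x2
    . matrix b = e(b)'                     // coefficient column vector: x1 x2 g2 g2x1 g2x2 _cons
    . matrix V = e(V)                      // estimated variance-covariance matrix of the estimators
    . matrix R = (0,0,1,0,0,0 \ 0,0,0,1,0,0 \ 0,0,0,0,1,0)    // picks g2, g2x1, g2x2; r = 0
    . matrix W = (R*b)' * invsym(R*V*R') * (R*b)
    . display "W = " W[1,1] "   F(3,174) = " W[1,1]/3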
You might say that you are performing a Chow test, but I say that you are performing a Wald test. That distinction is important, because the Wald test generalizes to different variance estimates of V, whereas the Chow test does not. After regress, vce(robust), for instance, test uses the V matrix estimated by the robust method because that is what regress, vce(robust) left behind.
Thus the short answer is that you estimate your model using regress, vce(robust) and then use Stata’s test command. You then call the result a Wald test.
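Concretely, reusing the interacted variables built in the first FAQ (a sketch, not the FAQ's own output):

    . regress y x1 x2 g2 g2x1 g2x2, vce(robust)
    . test g2 g2x1 g2x2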
If you are bothered that a Wald test produces F rather than chi-squared statistics, also see the FAQ Why does test sometimes produce chi-squared and other times F statistics?
Title:  Chow tests
Author: William Gould, StataCorp
Date:   January 2002; updated August 2005
In the past, I have always given in and cast my answer in Chow-test terms. In this reply, I try a different approach and, I think, the result is more useful.
This reply concerns linear regression (though the technique is really more general than that), and I gloss over the detail of pooling the residuals and whether the residual variances are really the same. For the last, I think I can be forgiven.
Here is what I wrote:
Is a Chow test the correct test to determine whether data can be pooled together?
History: In the days when statistical packages were not as sophisticated as they are now, testing whether coefficients were equal was not so easy. You had to write your own program, typically in FORTRAN. Chow showed a way you could perform the test based on statistics that were commonly reported, and that would produce the same result as if you performed the Wald test.
What does it mean “whether data can be pooled together”? Do you often meet nonprofessionals who say to you, “I was wondering whether the data could be pooled?” Forget that phrase, too: it is another piece of jargon for testing whether the behavior is the same, as measured by whether the coefficients are the same.
Let’s pretend that you have some model and two or more groups of data. Your model predicts something about the behavior within the group based on certain characteristics that vary within the group. Under the assumption that each group's behavior is unique, you have
    y_1 = X_1*b_1 + u_1     (equation for group 1)
    y_2 = X_2*b_2 + u_2     (equation for group 2)

and so on. Now, you want to test whether the behavior for one group is the same as for another, which means you want to test
    b_1 = b_2 = ...

How do you do that? Testing coefficients across separately estimated models is difficult to impossible, depending on things we need not go into right now. A trick is to “pool” the data to convert the multiple equations into one giant equation:
    y = d1*(X_1*b1 + u1) + d2*(X_2*b2 + u2) + ...

where y is the set of all outcomes (y_1, y_2, ...), and d1 is a variable that is 1 when the data are for group 1 and 0 otherwise, d2 is 1 when the data are for group 2 and 0 otherwise, ....
Notice that from the above I can retrieve the original equations. Setting d1=1 and d2=d3=...=0, I get the equation for group 1; setting d1=0 and d2=1 and d3=...=0, I get the equation for group 2; and so on.
Now, let’s start with
    y = d1*(X_1*b1 + u1) + d2*(X_2*b2 + u2) + ...

and rewrite it by a little algebraic manipulation:
    y = d1*(X_1*b1 + u1) + d2*(X_2*b2 + u2) + ...
      = d1*X_1*b1 + d1*u1 + d2*X_2*b2 + d2*u2 + ...
      = d1*X_1*b1 + d2*X_2*b2 + ... + d1*u1 + d2*u2 + ...
      = X_1*d1*b1 + X_2*d2*b2 + ... + d1*u1 + d2*u2 + ...
      = (X_1*d1)*b1 + (X_2*d2)*b2 + ... + d1*u1 + d2*u2 + ...

By stacking the data, I can get back estimates of b1, b2, ...
I include not X_1 in my model, but X_1*d1 (a set of variables equal to X_1 when group is 1 and 0 otherwise); I include not X_2 in my model, but X_2*d2 (a set of variables equal to X_2 when group is 2 and 0 otherwise); and so on.
Let’s use the auto dataset and pretend that I have two groups.
. sysuse auto, clear
. generate group1 = rep78==3
. generate group2 = group1==0

I could fit the models separately:
. regress price mpg weight if group1==1

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  2,    27) =   16.20
       Model |   196545318     2  98272658.8           Prob > F      =  0.0000
    Residual |   163826398    27  6067644.36           R-squared     =  0.5454
-------------+------------------------------           Adj R-squared =  0.5117
       Total |   360371715    29  12426610.9           Root MSE      =  2463.3

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |   13.14912   184.5661     0.07   0.944    -365.5492    391.8474
      weight |   3.517687   1.015855     3.46   0.002     1.433324      5.60205
       _cons |  -5431.147   6599.898    -0.82   0.418    -18973.02    8110.725
------------------------------------------------------------------------------

. regress price mpg weight if group2==1

      Source |       SS       df       MS              Number of obs =      44
-------------+------------------------------           F(  2,    41) =    5.16
       Model |  54562909.6     2  27281454.8           Prob > F      =  0.0100
    Residual |   216614915    41  5283290.61           R-squared     =  0.2012
-------------+------------------------------           Adj R-squared =  0.1622
       Total |   271177825    43  6306461.04           Root MSE      =  2298.5

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -170.5474    93.3656    -1.83   0.075     -359.103      18.0083
      weight |   .0527381   .8064713     0.07   0.948    -1.575964      1.68144
       _cons |   9685.028   4190.693     2.31   0.026     1221.752      18148.3
------------------------------------------------------------------------------

I could fit the combined model:
. generate mpg1 = mpg*group1
. generate weight1 = weight*group1
. generate mpg2 = mpg*group2
. generate weight2 = weight*group2
. regress price group1 mpg1 weight1 group2 mpg2 weight2, noconstant

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  6,    68) =   91.38
       Model |  3.0674e+09     6   511232168           Prob > F      =  0.0000
    Residual |   380441313    68  5594725.19           R-squared     =  0.8897
-------------+------------------------------           Adj R-squared =  0.8799
       Total |  3.4478e+09    74  46592355.7           Root MSE      =  2365.3

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      group1 |  -5431.147   6337.479    -0.86   0.394    -18077.39    7215.096
        mpg1 |   13.14912   177.2275     0.07   0.941    -340.5029    366.8012
     weight1 |   3.517687   .9754638     3.61   0.001     1.571179    5.464194
      group2 |   9685.028   4312.439     2.25   0.028      1079.69    18290.37
        mpg2 |  -170.5474   96.07802    -1.78   0.080    -362.2681    21.17334
     weight2 |   .0527381   .8299005     0.06   0.950    -1.603303    1.708779
------------------------------------------------------------------------------

What is this noconstant option? We must remember that when we fit the separate models, each has its own intercept. There was an intercept in X_1, X_2, and so on. What I have done above is literally translate
    y = (X_1*d1)*b1 + (X_2*d2)*b2 + d1*u1 + d2*u2

and so included the variables group1 and group2 (variables equal to 1 for their respective groups) and told Stata to omit the overall intercept.
I do not recommend you fit the model the way I have just illustrated because of numerical concerns—we'll get to that later. Fit the models separately or jointly, and you will get the same estimates for b_1 and b_2.
Now we can test whether the coefficients are the same for the two groups:
. test _b[mpg1]=_b[mpg2], notest

 ( 1)  mpg1 - mpg2 = 0

. test _b[weight1]=_b[weight2], accum

 ( 1)  mpg1 - mpg2 = 0
 ( 2)  weight1 - weight2 = 0

       F(  2,    68) =    5.61
            Prob > F =    0.0056

That is the Chow test. Something was omitted: the intercept. If we really wanted to test whether the two groups were the same, we would test
. test _b[mpg1]=_b[mpg2]

 ( 1)  mpg1 - mpg2 = 0

. test _b[weight1]=_b[weight2], accum

 ( 1)  mpg1 - mpg2 = 0
 ( 2)  weight1 - weight2 = 0

. test _b[group1]=_b[group2], accum

 ( 1)  mpg1 - mpg2 = 0
 ( 2)  weight1 - weight2 = 0
 ( 3)  group1 - group2 = 0

       F(  3,    68) =    4.07
            Prob > F =    0.0102

Using this approach, however, we are not tied down by what the "Chow test" can test. We can formulate any hypothesis we want. We might think that weight works the same way in both groups but that mpg works differently, and each group has its own intercept. Then, we could test
. test _b[mpg1]=_b[mpg2]

 ( 1)  mpg1 - mpg2 = 0

       F(  1,    68) =    0.83
            Prob > F =    0.3654

by itself. If we had more variables, we could test any subset of variables.
Is “pooling the data” justified? Of course it is: we just established that pooling the data is just another way of fitting separate models and that fitting separate models is certainly justified—we got the same coefficients. That’s why I told you to forget the phrase about whether pooling the data is justified. People who ask that don’t really mean to ask what they are saying: they mean to ask whether the coefficients are the same. In that case, they should say that. Pooling is always justified, and it corresponds to nothing more than the mathematical trick of writing separate equations,
    y_1 = X_1*b_1 + u_1     (equation for group 1)
    y_2 = X_2*b_2 + u_2     (equation for group 2)

as one equation
    y = (X_1*d1)*b1 + (X_2*d2)*b2 + d1*u1 + d2*u2

There are many ways I can write the above equation, and I want to write it a little differently because of numerical concerns. Starting with
    y = (X_1*d1)*b1 + (X_2*d2)*b2 + d1*u1 + d2*u2

let’s do some algebra to obtain
    y = X*b1 + d2*X_2*(b2-b1) + d1*u1 + d2*u2

where X = (X_1, X_2). In this formulation, I measure not b1 and b2, but b1 and (b2-b1). This is numerically more stable, and I can still test that b2==b1 by testing whether (b2-b1)==0. Let’s fit this model:
. regress price mpg weight mpg2 weight2 group2

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  5,    68) =    9.10
       Model |   254624083     5  50924816.7           Prob > F      =  0.0000
    Residual |   380441313    68  5594725.19           R-squared     =  0.4009
-------------+------------------------------           Adj R-squared =  0.3569
       Total |   635065396    73  8699525.97           Root MSE      =  2365.3

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |   13.14912   177.2275     0.07   0.941    -340.5029    366.8012
      weight |   3.517687   .9754638     3.61   0.001     1.571179    5.464194
        mpg2 |  -183.6965   201.5951    -0.91   0.365    -585.9733    218.5803
     weight2 |  -3.464949   1.280728    -2.71   0.009    -6.020602   -.9092956
      group2 |   15116.17   7665.557     1.97   0.053    -180.2075    30412.56
       _cons |  -5431.147   6337.479    -0.86   0.394    -18077.39    7215.096
------------------------------------------------------------------------------

and, if I want to test whether the coefficients are the same, I can do
. test _b[mpg2]=0

 ( 1)  mpg2 = 0

. test _b[weight2]=0, accum

 ( 1)  mpg2 = 0
 ( 2)  weight2 = 0

       F(  2,    68) =    5.61
            Prob > F =    0.0056

and that gives the same answer yet again. If I want to test whether *ALL* the coefficients are the same (including the intercept), I can use
. test _b[mpg2]=0, notest

 ( 1)  mpg2 = 0

. test _b[weight2]=0, accum notest

 ( 1)  mpg2 = 0
 ( 2)  weight2 = 0

. test _b[group2]=0, accum

 ( 1)  mpg2 = 0
 ( 2)  weight2 = 0
 ( 3)  group2 = 0

       F(  3,    68) =    4.07
            Prob > F =    0.0102

Just as before, I can test any subset.
Using this difference formulation, if I had three groups, I would start with

    y = (X_1*d1)*b1 + (X_2*d2)*b2 + (X_3*d3)*b3 + d1*u1 + d2*u2 + d3*u3

and rewrite it as

    y = X*b1 + (X_2*d2)*(b2-b1) + (X_3*d3)*(b3-b1) + d1*u1 + d2*u2 + d3*u3

Let’s create the group variables and fit this model:
. sysuse auto, clear
. generate group1 = rep78==3
. generate group2 = rep78==4
. generate group3 = (group1+group2)==0
. generate mpg1 = mpg*group1
. generate weight1 = weight*group1
. generate mpg2 = mpg*group2
. generate weight2 = weight*group2
. generate mpg3 = mpg*group3
. generate weight3 = weight*group3
. regress price mpg weight mpg2 weight2 group2 ///
>        mpg3 weight3 group3

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  8,    65) =    5.80
       Model |   264415585     8  33051948.1           Prob > F      =  0.0000
    Residual |   370649811    65  5702304.78           R-squared     =  0.4164
-------------+------------------------------           Adj R-squared =  0.3445
       Total |   635065396    73  8699525.97           Root MSE      =  2387.9

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |   13.14912   178.9234     0.07   0.942    -344.1855    370.4837
      weight |   3.517687   .9847976     3.57   0.001      1.55091    5.484463
        mpg2 |   130.5261   336.6547     0.39   0.699    -541.8198     802.872
     weight2 |   -2.18337   1.837314    -1.19   0.239     -5.85274       1.486
      group2 |   4560.193   12222.22     0.37   0.710    -19849.27    28969.66
        mpg3 |  -194.1974   216.3459    -0.90   0.373      -626.27    237.8752
     weight3 |  -3.160952    1.73308    -1.82   0.073    -6.622152    .3002481
      group3 |   14556.66   9167.998     1.59   0.117    -3753.101    32866.41
       _cons |  -5431.147    6398.12    -0.85   0.399    -18209.07    7346.781
------------------------------------------------------------------------------

If I want to test whether the three groups were the same in the Wald-test sense, I can use
. test (_b[mpg2]=0) (_b[weight2]=0) (_b[group2]=0) /*
>      */ (_b[mpg3]=0) (_b[weight3]=0) (_b[group3]=0)

 ( 1)  mpg2 = 0
 ( 2)  weight2 = 0
 ( 3)  group2 = 0
 ( 4)  mpg3 = 0
 ( 5)  weight3 = 0
 ( 6)  group3 = 0

       F(  6,    65) =    2.28
            Prob > F =    0.0463

which I could more easily type as
. testparm mpg2 weight2 group2 mpg3 weight3 group3

 ( 1)  mpg2 = 0
 ( 2)  weight2 = 0
 ( 3)  group2 = 0
 ( 4)  mpg3 = 0
 ( 5)  weight3 = 0
 ( 6)  group3 = 0

       F(  6,    65) =    2.28
            Prob > F =    0.0463
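On a modern Stata (version 11 or later), the same interacted model and the same six-constraint test can be set up with factor-variable notation instead of hand-built interaction variables. This is a sketch of my own, not part of the FAQ, and it assumes a single categorical variable grp coded 1, 2, 3 built from the indicators above; it should reproduce the same joint test:

    . generate grp = cond(group1, 1, cond(group2, 2, 3))
    . regress price i.grp##c.mpg i.grp##c.weight
    . testparm i.grp i.grp#c.mpg i.grp#c.weight        // 6 constraints, as above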
Title:  Pooling data and performing Chow tests in linear regression
Author: William Gould, StataCorp
Date:   December 1999; updated August 2005
Consider the model

    y = a + b*x1 + c*x2 + u

and let us pretend that we have two groups of data, group=1 and group=2. We could have more groups; everything said below generalizes to more than two groups.
We could estimate the models separately by typing
    . regress y x1 x2 if group==1

and

    . regress y x1 x2 if group==2

or we could pool the data and estimate a single model, one way being
    . gen g2 = (group==2)
    . gen g2x1 = g2*x1
    . gen g2x2 = g2*x2
    . regress y x1 x2 g2 g2x1 g2x2

The difference between these two approaches is that we are constraining the variance of the residual to be the same in the two groups when we pool the data. When we estimated separately, we estimated

    y = a1 + b1*x1 + c1*x2 + u1,   Var(u1) estimated from group=1 alone
    y = a2 + b2*x1 + c2*x2 + u2,   Var(u2) estimated from group=2 alone

When we pooled the data, we estimated

    y = a + b*x1 + c*x2 + a'*g2 + b'*g2x1 + c'*g2x2 + u,   one common Var(u)

If we evaluate this equation for the groups separately, we obtain

    y = a + b*x1 + c*x2 + u                         for group=1
    y = (a+a') + (b+b')*x1 + (c+c')*x2 + u          for group=2

with the same Var(u) in both. The difference is that we have now constrained the variance of u for group=1 to be the same as the variance of u for group=2.
If you perform this experiment with real data, you will observe the following:
If u is known to have the same variance in the two groups, the standard errors obtained from the pooled regression are better—they are more efficient. If the variances really are different, however, then the standard errors obtained from the pooled regression are wrong.
I have created a dataset (containing made-up data) on y, x1, and x2. The dataset has 74 observations for group=1 and another 71 observations for group=2. Using these data, I can run the regressions separately by typing
    [1] . regress y x1 x2 if group==1
    [2] . regress y x1 x2 if group==2

or I can run the pooled model by typing
        . gen g2 = (group==2)
        . gen g2x1 = g2*x1
        . gen g2x2 = g2*x2
    [3] . regress y x1 x2 g2 g2x1 g2x2

I did that in Stata; let me summarize the results. When I typed command [1], I obtained the following results (standard errors in parentheses):
    y = -8.650993 + 1.21329*x1 + -.8809939*x2 + u,        Var(u) = 15.8912
       (22.73703)  (.54459)      (.405401)

and when I ran command [2], I obtained
    y = 4.646994 + .9307004*x1 + .8812369*x2 + u,         Var(u) = 7.56852
       (11.1593)   (.236696)     (.1997562)

When I ran command [3], I obtained
    y = -8.650993 + 1.21329*x1 + -.8809939*x2 +
       (17.92853)   (.42942)     (.3196656)

        13.29779*g2 + -.2825893*g2x1 + 1.762231*g2x2 + u,     Var(u) = 12.5312
       (25.74446)     (.6123452)       (.459958)

The intercept and coefficients on x1 and x2 in [3] are the same as in [1], but the standard errors are different. Also, if I sum the appropriate coefficients in [3], I obtain the same results as [2]:
    Intercept:   13.29779 + -8.650993 = 4.646797     ([2] has 4.646994)
    x1:         -.2825893 + 1.21329   = .9307004     ([2] has .9307004)
    x2:          1.762231 + -.8809939 = .8812371     ([2] has .8812369)

The coefficients are the same, estimated either way. (The fact that the coefficients in [3] are a little off from those in [2] is just because I did not write down enough digits.)
The standard errors for the coefficients are different.
I also wrote down the estimated Var(u), what is reported as RMSE in Stata’s regression output. In standard deviation terms, u has s.d. 15.891 in group=1, 7.5685 in group=2, and if we constrain these two very different numbers to be the same, the pooled s.d. is 12.531.
To relax the assumption that the variances are equal, we start exactly the same way:

    . gen g2 = (group==2)
    . gen g2x1 = g2*x1
    . gen g2x2 = g2*x2
    . regress y x1 x2 g2 g2x1 g2x2

To that, we add
        . predict r, resid
        . sum r if group==1
        . gen w = r(Var)*(r(N)-1)/(r(N)-3) if group==1
        . sum r if group==2
        . replace w = r(Var)*(r(N)-1)/(r(N)-3) if group==2
    [4] . regress y x1 x2 g2 g2x1 g2x2 [aw=1/w]

In the above, the constant 3 appears twice because three coefficients are being estimated in each group (an intercept, a coefficient for x1, and a coefficient for x2). If there were a different number of coefficients being estimated, that number would change.
In any case, this will reproduce exactly the standard errors reported by estimating the two models separately. The advantage is that we can now test equality of coefficients between the two equations. For instance, we can now read right off the pooled regression results whether the effect of x1 is the same in groups 1 and 2 (answer: is _b[g2x1]==0?, because _b[x1] is the effect in group 1 and _b[x1]+_b[g2x1] is the effect in group 2, so the difference is _b[g2x1]). And, using test, we can test other constraints as well.
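For example (a sketch of my own, after running [4]), the question posed in the parentheses above — is the effect of x1 the same in the two groups? — is just

    . test g2x1            // H0: effect of x1 is the same in both groups
    . test g2x1 g2x2       // H0: effects of x1 and x2 are both the same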
For instance, if you wanted to prove to yourself that the results of [4] are the same as typing regress y x1 x2 if group==2, you could type
    . test x1 + g2x1 == 0      (reproduces test of x1 for group==2)

and

    . test x2 + g2x2 == 0      (reproduces test of x2 for group==2)
Using the made-up data, I did exactly that. To recap, first I estimated separate regressions:
    [1] . regress y x1 x2 if group==1
    [2] . regress y x1 x2 if group==2

and then I ran the variance-constrained regression,
        . gen g2 = (group==2)
        . gen g2x1 = g2*x1
        . gen g2x2 = g2*x2
    [3] . regress y x1 x2 g2 g2x1 g2x2

and then I ran the variance-unconstrained regression,
        . predict r, resid
        . sum r if group==1
        . gen w = r(Var)*(r(N)-1)/(r(N)-3) if group==1
        . sum r if group==2
        . replace w = r(Var)*(r(N)-1)/(r(N)-3) if group==2
    [4] . regress y x1 x2 g2 g2x1 g2x2 [aw=1/w]

Just to remind you, here is what commands [1] and [2] reported:
    y = -8.650993 + 1.21329*x1 + -.8809939*x2 + u,        Var(u) = 15.8912
       (22.73703)  (.54459)      (.405401)

    y = 4.646994 + .9307004*x1 + .8812369*x2 + u,         Var(u) = 7.56852
       (11.1593)   (.236696)     (.1997562)

Here is what command [4] reported:
    y = -8.650993 + 1.21329*x1 + -.8809939*x2 +
       (22.73703)  (.54459)      (.405401)

        13.29779*g2 + -.2825893*g2x1 + 1.762231*g2x2 + u
       (25.3269)      (.6050657)       (.451943)

Those results are the same as [1] and [2]. (Pay no attention to the RMSE reported by regress at this last step; the reported RMSE is the standard deviation of neither of the two groups but is instead a weighted average; see the FAQ on this if you care. If you want to know the standard deviations of the respective residuals, look back at the output from the summarize statements typed when producing the weighting variable.)
Technical Note: Note that in creating the weights, we typed

    . sum r if group==1
    . gen w = r(Var)*(r(N)-1)/(r(N)-3) if group==1

and similarly for group 2. The 3 that appears in the finite-sample normalization factor (r(N)-1)/(r(N)-3) appears because there are three coefficients per group being estimated. If our model had fewer or more coefficients, that number would change. In fact, the finite-sample normalization factor changes results very little. In real work, I would have ignored it and just typed

    . sum r if group==1
    . gen w = r(Var) if group==1

unless the number of observations in one of the groups was very small. The normalization factor was included here so that [4] would produce the same results as [1] and [2].
If, after fitting the variance-unconstrained model

    [4] . regress y x1 x2 g2 g2x1 g2x2 [aw=1/w]

we test whether group 2 is the same as group 1, we obtain
. test g2x1 g2x2 g2

 ( 1)  g2x1 = 0.0
 ( 2)  g2x2 = 0.0
 ( 3)  g2 = 0.0

       F(  3,   139) =  307.50
            Prob > F =   0.0000

If instead we had constrained the variances to be the same, estimating the model using
    [3] . regress y x1 x2 g2 g2x1 g2x2

and then repeated the test, the reported F-statistic would be 300.81.
If there were more groups, and the variance differences were great among the groups, this could become more important.
Anyway, an alternative to the weighted OLS approach [4] is xtgls, panels(het). To use it, you pool the data just as always,
    . gen g2 = (group==2)
    . gen g2x1 = g2*x1
    . gen g2x2 = g2*x2

and then type
    [5] . xtgls y x1 x2 g2 g2x1 g2x2, panels(het) i(group)

to estimate the model. The result of doing that with my fictional data is
    y = -8.650993 + 1.21329*x1 + -.8809939*x2 +
       (22.27137)  (.53344)      (.397099)

        13.29779*g2 + -.2825893*g2x1 + 1.762231*g2x2 + u
       (24.80488)     (.5925734)       (.442610)

These are the same coefficients we have always seen.
The standard errors produced by xtgls, panels(het) here are about 2% smaller than those produced by [4] and in general will be a little smaller because xtgls, panels(het) is an asymptotically based estimator. The two estimators are asymptotically equivalent, however, and in fact quickly become identical. The only caution I would advise is not to use xtgls, panels(het) if the number of degrees of freedom (observations minus number of coefficients) is below 25 in any of the groups. Then, the weighted OLS approach [4] is better (and you should make the finite-sample adjustment described in the above technical note).
* Chow test by hand: fit the two group regressions and the pooled regression,
* then compute the Chow statistic's p-value from the error sums of squares.
* y is the dependent variable, x1-xn the regressors, g the group variable.
reg y x1-xn if g==1
scalar r1 = e(rss)
scalar n1 = e(N)
reg y x1-xn if g==2
scalar r2 = e(rss)
scalar n2 = e(N)
reg y x1-xn if g==1 | g==2
scalar r = e(rss)
scalar k = e(df_m) + 1
* the statistic is distributed F(k, n1+n2-2k)
di 1 - F(k, n1+n2-2*k, (r-r1-r2)*(n1+n2-2*k)/((r1+r2)*k))
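For concreteness, a hypothetical run of the same do-file on the auto dataset (my illustration, not the poster's): price as y, mpg and weight as the regressors, and foreign as the two-valued group variable (coded 0/1 rather than 1/2, which does not affect the formula):

    sysuse auto, clear
    reg price mpg weight if foreign==0
    scalar r1 = e(rss)
    scalar n1 = e(N)
    reg price mpg weight if foreign==1
    scalar r2 = e(rss)
    scalar n2 = e(N)
    reg price mpg weight
    scalar r = e(rss)
    scalar k = e(df_m) + 1
    di 1 - F(k, n1+n2-2*k, (r-r1-r2)*(n1+n2-2*k)/((r1+r2)*k))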
Hehe, moderator Lan is as good as ever. I will learn from you and keep working at it steadily.
You are too modest.
I just go and look at the Stata website, that is all.
This thread is very good; it has helped me understand the Chow test better.
By the way, may I ask everyone: Stata has a ready-made chow command (I saw it earlier in arlionn's post).
In the help file for the chow command, the example is:
chow y x1 x2, chow(year>1975)
My data are for 2005 and 2006, so the command I typed was chow y x1 x2, chow(year>2005)
But the Results window shows: "program error: code follows on the same line as open brace"
Could someone tell me what is going on here?
* Suppose the dependent variable is y, the regressors are x1-xn, and the group
* variable g takes two values (g=1, 2). Joint F test on x1-xn and the
* intercept (the null hypothesis is that all coefficients are equal across the
* two groups).
qui g grp = (g==1)
foreach i of var x1-xn {
    qui g `i'a = `i'*(g==1)
    qui g `i'b = `i'*(g==2)
}
* grp is created before the interaction variables, so the varlist range must
* run over the interactions (x1a ... xnb) with grp added separately
qui qreg y x1a-xnb grp
foreach i of var x1-xn {
    qui test _b[`i'a]=_b[`i'b], a
}
n test _b[grp]=0, a
* One difficulty with the quantile-regression version of the test: Stata does
* not seem to allow quantile regression without a constant.
* Replace qreg with reg and this becomes the ordinary Chow test for OLS
* regression.
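For concreteness, a hypothetical run of the same idea on the auto data used earlier (price on mpg and weight, two groups built from rep78); the group coding below is my assumption for illustration, not part of the original post:

    sysuse auto, clear
    generate g = cond(rep78==3, 1, 2)
    generate grp = (g==1)
    foreach i of var mpg weight {
        generate `i'a = `i'*(g==1)
        generate `i'b = `i'*(g==2)
    }
    qreg price mpga-weightb grp
    test _b[mpga]=_b[mpgb], notest
    test _b[weighta]=_b[weightb], accum notest
    test _b[grp]=0, accum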