How to fix a missing F statistic after using cluster-robust standard errors

2019-12-13

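I ran into this problem in my day-to-day work. I went through many replies on the forum without finding a clear solution; after repeated experimenting I tracked down the cause, and I am sharing it here. A simple example illustrates it.
1. reg roa sif lnage lncopen i.year i.ind if region == 2, vce(cluster code): in this model I control for year and industry and cluster on the individual firm, using cluster-robust standard errors.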

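With cluster-robust standard errors, the F statistic and its p-value are both reported as missing. Clicking the blue link on the F value opens Stata's own explanation of the problem. As I read it, the gist is that clustering cannot work if a code (a cluster) contains only one observation. But when I counted the observations per code, the minimum was 2: every sample firm has at least two years of data, so no firm contributes only a single observation. (Note, though, that if your data do contain such cases, they need to be dropped.)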
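The per-firm count described here can be reproduced with a few lines. This is a minimal sketch, assuming the variable names from the regression command above (code, region and the regressors); insample and n_code are hypothetical helper variables, and the count is restricted to the estimation sample through e(sample) rather than taken over the whole dataset.

```stata
* Fit the model once so that e(sample) marks the observations actually used
quietly regress roa sif lnage lncopen i.year i.ind if region == 2, vce(cluster code)
generate byte insample = e(sample)

* Number of estimation-sample observations contributed by each firm (cluster)
bysort code: egen n_code = total(insample)

* The minimum shown here is the smallest cluster size in the sample
summarize n_code if insample

* Number of sample observations that belong to single-observation firms
count if insample & n_code == 1
```

Counting inside the estimation sample matters because the if condition and missing values can leave a firm with fewer usable rows than it has in the raw data.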

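Since no code in the sample has only a single observation, the first possibility is ruled out.
2. Among the year and industry controls, some industry may have only one observation in a given year. So, using the same approach, first check the data for such cases.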
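One way to run this check is sketched below. It assumes the industry and year variables are named ind and year, as in the regression command; in_est and n_cell are hypothetical helper names.

```stata
* Mark the estimation sample of the clustered regression
quietly regress roa sif lnage lncopen i.year i.ind if region == 2, vce(cluster code)
generate byte in_est = e(sample)

* Observations in each industry-year cell, counted within the estimation sample
bysort ind year: egen n_cell = total(in_est)

* How many sample observations sit alone in their industry-year cell
count if in_est & n_cell == 1

* Show which firm, industry and year they are
list code ind year if in_est & n_cell == 1, sepby(ind)
```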

Sure enough, there are industries with only one observation in a given year.

Next, drop these single-observation cases and run the regression again.
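The drop-and-refit step might look like the sketch below, under the same naming assumptions as the regression command (est_obs and cell_size are hypothetical helpers). It wraps the drop in preserve/restore so the original data stay in memory; remove those two lines to make the deletion permanent.

```stata
* Count estimation-sample observations per industry-year cell
quietly regress roa sif lnage lncopen i.year i.ind if region == 2, vce(cluster code)
generate byte est_obs = e(sample)
bysort ind year: egen cell_size = total(est_obs)

* Keep an untouched copy of the data, then drop the singleton cells
preserve
drop if est_obs & cell_size == 1

* Re-fit; e(F) displays as "." if the model test is still unavailable
regress roa sif lnage lncopen i.year i.ind if region == 2, vce(cluster code)
display "F = " e(F) "   clusters = " e(N_clust)
restore
```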

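The results now appear, and the F statistic is reported.
To summarize, a missing F statistic in this setting has two possible causes, each handled accordingly:
1. Check whether any clustering unit (code in the example above) has only one observation.
2. With year and industry controls included, also check whether any industry has only one observation in a given year.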

All replies

eileenqhy

2020-2-10 04:52:41

Are any standard errors missing?

If any standard errors are reported as dots, something is wrong with your model:  one or more coefficients could not be estimated in the normal statistical sense.  You need to address that problem and ignore the rest of this discussion.

Are you using bootstrap or jackknife?

The VCE you have just estimated is not of sufficient rank to perform the model test.  This is most likely due to not having enough replications.

The bootstrap command has a reps(#) option, and if # is less than the number of coefficients in the model, the VCE will have insufficient rank.  The solution is to rerun bootstrap with a much larger number of replications.

The jackknife command estimates the VCE by refitting the model for each observation in the dataset, leaving the associated observation out of the estimation sample each time. As with the conventional variance estimator, the VCE will be singular if the number of observations is less than the number of parameters. See the following discussion if you supplied the cluster() option to jackknife.

Are you using a svy estimator or did you specify the vce(cluster clustvar) option?

The VCE you have just estimated is not of sufficient rank to perform the model test. As discussed in [R] test, the model test with clustered or survey data is distributed as F(k,d-k+1) or chi2(k), where k is the number of constraints and d=number of clusters or d=number of PSUs minus the number of strata. Because the rank of the VCE is at most d and the model test reserves 1 degree of freedom for the constant, at most d-1 constraints can be tested, so k must be less than d. The model that you just fit does not meet this requirement.

To simplify the remaining discussion, let's consider the case of clustered data.  This discussion applies to survey estimation in general by substituting, "PSUs - strata" for "clusters".

There is no mechanical problem with your model, but you need to consider carefully whether any of the reported standard errors mean anything. The theory that justifies the standard error calculation is asymptotic in the number of clusters, and we have just established that you are estimating at least as many parameters as you have clusters.

That concern aside, the model test statistic issue is that you cannot simultaneously test that all coefficients are zero because there is not enough information. You could test a subset, but not all, and so Stata refuses to report the overall model test statistic.

Here note the degrees of freedom reported for the chi2 or F. You might see chi2(6) or F(6, 5). If you were to count the number of coefficients that would be constrained to 0 in a model test in this case, you would find that number to be greater than 6. You could find out what that number is by reestimating the model parameters without the vce(robust) and vce(cluster clustvar) options (or, for the survey commands, using the corresponding non-svy estimator). In any case, the 6 reported is the maximum number of coefficients that could be simultaneously tested.

Is there a regressor that is nonzero for only 1 observation or for one cluster?

The VCE you have just estimated is not of sufficient rank to perform the model test. This can happen if there is a variable in your model that is nonzero for only 1 observation in the estimation sample. Likewise, it can happen if a variable is nonzero for only one cluster when using the cluster-robust VCE. In such cases the derivative of the sum-of-squares or likelihood function with respect to that variable's parameter is zero for all observations. That implies that the outer-product-of-gradients (OPG) variance matrix is singular. Because the OPG variance matrix is used in computing the robust variance matrix, the latter is therefore singular as well.

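One more point: the cause may also be insufficient degrees of freedom. The number of clusters must be greater than the number of regressors in the equation.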
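The comparison between the number of clusters and the number of tested coefficients can be made directly from stored results. The sketch below follows the suggestion in the help text quoted above (re-estimate without the cluster option to see how many coefficients the model test would constrain) and reuses the original poster's regression command, which is an assumption about your own model.

```stata
* Without a robust or cluster VCE, e(df_m) is the number of coefficients
* that the overall model test constrains to zero (k in the help text)
quietly regress roa sif lnage lncopen i.year i.ind if region == 2
display "coefficients in the model test (k): " e(df_m)

* With the cluster-robust VCE, e(N_clust) is the number of clusters (d)
quietly regress roa sif lnage lncopen i.year i.ind if region == 2, vce(cluster code)
display "clusters (d): " e(N_clust)

* The overall F test can be reported only when k is less than d
```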

cct25

2020-4-13 14:11:28

Thanks!

梓煜

2020-5-2 14:09:43

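So can the regression results from a model with a missing F statistic be used? And if there are two main test equations, is it acceptable to cluster one and not the other?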

y97986

2020-5-24 17:01:35

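梓煜 posted on 2020-5-2 14:09
So can the regression results from a model with a missing F statistic be used? And if there are two main test equations, is it acceptable to cluster one and not the other?

I have the same question. Did you find out? When the F is shown in blue and missing, can the regression results be used in a paper?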

kuaixuantou3

2020-6-5 14:30:53

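y97986 posted on 2020-5-24 17:01
I have the same question. Did you find out? When the F is shown in blue and missing, can the regression results be used in a paper?

According to Stata's official explanation, the missing F statistic does not affect the accuracy of the results.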

y97986

2020-6-7 21:26:22

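kuaixuantou3 posted on 2020-6-5 14:30
According to Stata's official explanation, the missing F statistic does not affect the accuracy of the results.

Oh, I see, thanks! So that means it can be reported in a paper? But I ran into this when splitting the sample to look at heterogeneity: for the subsample with the missing F, the number of observations reported under the regression results is clearly smaller than for the other subsample. Is that acceptable?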

苏幕遮Doris

2021-2-2 15:38:05

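On checking whether any clustering unit (code in the example above) has only one observation: if such cases exist, how should they be dropped? Is there a command that drops them directly?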

henry.chan

2021-2-25 20:31:05

After removing both kinds of cases, the F statistic is still not reported. What is going on?

Singerwood

2021-3-3 11:53:58

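henry.chan posted on 2021-2-25 20:31
After removing both kinds of cases, the F statistic is still not reported. What is going on?

I hit the same problem after applying the original poster's method.
However, if I do not use cluster-robust standard errors
and use only heteroskedasticity-robust standard errors, the F statistic is there.
That is, reg with robust works,
while reg with cluster, or xtreg with robust, does not.
Can you check whether it is the same for you?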
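The comparison in this reply can be run side by side as sketched below. The regression command is borrowed from the original post as an assumption, and code is assumed to be a numeric firm identifier so that xtset accepts it. One detail that fits the pattern described: in xtreg, vce(robust) is treated as clustering on the panel variable, so it faces the same cluster-count limit as vce(cluster code).

```stata
* Heteroskedasticity-robust VCE
quietly regress roa sif lnage lncopen i.year i.ind if region == 2, vce(robust)
display "F (robust) = " e(F)

* Cluster-robust VCE: the model test is limited by the number of clusters
quietly regress roa sif lnage lncopen i.year i.ind if region == 2, vce(cluster code)
display "F (cluster) = " e(F) "   clusters = " e(N_clust)

* xtreg with vce(robust) clusters on the panel variable declared by xtset
xtset code
quietly xtreg roa sif lnage lncopen i.year i.ind if region == 2, fe vce(robust)
display "clusters used by xtreg = " e(N_clust)
```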

冯梦媛

2021-6-17 23:45:23

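苏幕遮Doris posted on 2021-2-2 15:38
On checking whether any clustering unit (code in the example above) has only one observation: if such cases exist, how should they be dropped? Is there ...

Use the duplicates command.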
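The reply names the command without spelling it out. One possible reading, offered as a sketch rather than as what the poster had in mind, uses duplicates tag on the cluster variable; dup_code is a hypothetical helper variable, and the count here is over the whole dataset in memory, not only the estimation sample.

```stata
* duplicates tag stores, for each observation, how many other observations
* share the same value of code; 0 means the firm appears only once
duplicates tag code, generate(dup_code)
tabulate dup_code

* Drop firms that contribute a single observation
drop if dup_code == 0

* An equivalent one-liner that needs no helper variable:
* bysort code: drop if _N == 1
```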

周生辰

2022-2-14 22:02:16

冯梦媛 posted on 2021-6-17 23:45
Use the duplicates command.

Hello, could you share the exact command?

qgmyysj

2022-4-9 08:54:17

Thank you very much, now I understand what was going on.

X_X_

2022-5-6 13:11:38

周生辰 posted on 2022-2-14 22:02
Hello, could you share the exact command?

Hello, did you find out the exact command?

tianliwu

2023-3-13 13:54:27

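Thanks! Very useful! My situation was the same as the original poster's. My approach is rather clumsy: the command is bysort industry：tab year, which reports the number of observations per year for each industry, and then you can drop the problem cases. PS: being a bit clumsy, and with a balanced panel in mind, I simply dropped the whole industry (in my data, such industries generally have very few observations anyway).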

愿景岚

2023-3-28 16:11:06

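tianliwu posted on 2023-3-13 13:54
Thanks! Very useful! My situation was the same as the original poster's. My approach is rather clumsy: the command is bysort industry：tab year, which reports ...

Hello, after I typed the command you gave, red text saying "required" appeared. What is going on? Could you share the complete command? Thanks~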

2022000017

2023-4-9 21:11:52

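I would still like to know what is going on; this feels like black magic. I added one more control variable and the F statistic appeared, and I have no idea why.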

xcdbg

2023-5-31 00:08:07

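愿景岚 posted on 2023-3-28 16:11
Hello, after I typed the command you gave, red text saying "required" appeared. What is going on? Could you share the complete command? Thanks~

Change the colon to an English (ASCII) colon.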
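With the full-width colon replaced, the command from tianliwu's post reads as below. The variable names industry and year are taken from that post; the two-way table is an added alternative that shows the same counts in one grid, which may be easier to scan when the number of industries and years is small.

```stata
* The by-prefix needs an ASCII colon between the prefix and the command
bysort industry: tabulate year

* The same counts as a single industry-by-year table
tabulate industry year
```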

1004_4043

2024-8-12 12:48:14

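Going by Stata's explanation (the last section of eileenqhy's reply above), the underlying problem is that some variable in the model is nonzero for only one observation, or nonzero within only one cluster. So when you regress with i.city, a city that appears only once in the data necessarily has a dummy with a single nonzero observation, and it has to be dropped.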

Is there a regressor that is nonzero for only 1 observation or for one cluster?

The VCE you have just estimated is not of sufficient rank to perform the model test. This can happen if there is a variable in your model that is nonzero for only 1 observation in the estimation sample. Likewise, it can happen if a variable is nonzero for only one cluster when using the cluster-robust VCE. In such cases the derivative of the sum-of-squares or likelihood function with respect to that variable's parameter is zero for all observations. That implies that the outer-product-of-gradients (OPG) variance matrix is singular. Because the OPG variance matrix is used in computing the robust variance matrix, the latter is therefore singular as well.
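The condition in this passage can be checked for a set of factor-variable dummies as sketched below. Everything here is illustrative: y and x stand in for your own outcome and regressors, city is the factor variable from the example in this reply, code is the cluster variable, and used, one_per_pair and n_clust_city are hypothetical helper names.

```stata
* Mark the estimation sample of the clustered regression
quietly regress y x i.city, vce(cluster code)
generate byte used = e(sample)

* Tag one observation per (city, cluster) pair within the sample,
* then count how many distinct clusters each city appears in
egen one_per_pair = tag(city code) if used
bysort city: egen n_clust_city = total(one_per_pair)

* Cities whose dummy is nonzero in only one cluster
tabulate city if used & n_clust_city == 1
```

A city observed in a single cluster is the multi-observation version of the "city appears only once" case, so this check covers both.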

sxno1

2024-9-11 00:34:42

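henry.chan posted on 2021-2-25 20:31
After removing both kinds of cases, the F statistic is still not reported. What is going on?

May I ask how you eventually solved this problem?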
