Stata synthetic control fits poorly: how do I deal with it?

nsjwzx2022

2025-07-16

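When `synth` or `synth_runner` in Stata gives a poorly fitting synthetic control (SCM), the underlying problem is that no weighted combination of donor units reproduces the treated unit's pre-treatment path, which shows up as a large pre-treatment root mean squared prediction error (RMSPE). Below is a three-step workflow (diagnose, improve, check robustness) that you can apply right away, with Stata snippets.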
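For reference, the pre-treatment RMSPE can be written as follows. The notation is an assumption borrowed from the standard SCM setup, not from this post: unit 1 is the treated unit, units 2 to J+1 are donors with weights w_j, and T_0 is the number of pre-treatment periods.

$$
\mathrm{RMSPE}_{\text{pre}} = \sqrt{\frac{1}{T_0}\sum_{t=1}^{T_0}\Bigl(Y_{1t} - \sum_{j=2}^{J+1} w_j\,Y_{jt}\Bigr)^{2}}
$$

The term in parentheses is the gap between the actual and the synthetic outcome in period t, so a large value means the weights do not track the treated unit before treatment.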

1. Quick diagnosis: where exactly is the fit poor?
```stata
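* Install or update the packages
ssc install synth, replace
ssc install synth_runner, replace

* synth needs tsset panel data; fit and save the results to res.dta
tsset unit year
synth y x1 x2 x3 x4, trunit(3) trperiod(2015) ///
      counit(1 2 4 5 6 7 8) ///
      fig keep(res) replace

* synth has no predict; the fitted path is stored in the keep() file
preserve
use res, clear
twoway (line _Y_treated _time, lcolor(red)) ///
   (line _Y_synthetic _time, lcolor(blue)), ///
   xline(2015) legend(order(1 "Actual" 2 "Synthetic"))

* Pre-treatment RMSPE (synth also returns it in e(RMSPE))
gen diff2 = (_Y_treated - _Y_synthetic)^2 if _time < 2015
quietly summarize diff2
display "Pre-treatment RMSPE = " sqrt(r(mean))
restore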
```
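- Visual check: if the blue (synthetic) line strays far from the red (actual) line before 2015, the fit is poor.
- Numeric check: as a rough rule of thumb, a pre-treatment RMSPE above about 20% of the standard deviation of the actual outcome usually signals an inadequate fit.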
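The numeric check can be sketched as below. This is a minimal example that assumes the same variable names as the snippet above (`y`, `unit`, `year`, treated unit 3, treatment in 2015) and takes the standard deviation over the treated unit's pre-treatment outcomes, which is one reasonable reading of the rule of thumb.

```stata
* Sketch: compare pre-treatment RMSPE with the treated unit's own variability
tsset unit year
synth y x1 x2 x3 x4, trunit(3) trperiod(2015) counit(1 2 4 5 6 7 8)
matrix R = e(RMSPE)                       // 1x1 matrix returned by synth
quietly summarize y if unit == 3 & year < 2015
display "RMSPE / SD(y, pre-treatment) = " R[1,1] / r(sd)
```

A ratio well above 0.2 would fail the rule of thumb. The threshold is a heuristic, not a formal test.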

---

2. Six practical ways to improve the fit

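| Problem | Fix | Stata example |
|---|---|---|
| 1. Too few predictors | Add more lagged outcomes and structural predictors (GDP, population, policy intensity, etc.) | `synth y y(2010) y(2011) y(2012) x1 x2 x3, ...` |
| 2. Donor pool too small | Enlarge the pool of potential controls (e.g. from prefecture-level to county-level cities, or add neighboring provinces); the treated unit must not appear in `counit()` | `counit(1 2 4/300)` |
| 3. Predictors on very different scales | Standardize the covariates (z-score) or switch to differences or growth rates | `egen std_x1 = std(x1)` |
| 4. Weights concentrated on a few units | `synth` already constrains weights to be non-negative and to sum to one, so try the nested optimizer for a better solution; a ridge-type penalty is not available in `synth` and needs one of the extended methods in section 4 | `synth y x1 x2, trunit(3) trperiod(2015) nested allopt` |
| 5. Pre-treatment period too short | Add earlier years of data (e.g. from 5 to 10 pre-treatment years) or validate with a rolling window; keep `trperiod()` at the true treatment year | `mspeperiod(2005(1)2014)` |
| 6. Structural breaks | Leave anomalous years (e.g. a financial crisis) out of the fitting window, or fit by sub-period; dropping the year outright leaves a gap in the panel | `mspeperiod(2005(1)2007 2009(1)2014)` |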
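Several of these tips can be combined in one call and then inspected. The sketch below is illustrative only: the choice of lag years and predictors is an assumption, and `nested allopt` can be slow on large donor pools.

```stata
* Sketch: lagged outcomes as predictors plus the nested optimizer
tsset unit year
synth y y(2010) y(2012) y(2014) x1 x2 x3, ///
      trunit(3) trperiod(2015) counit(1 2 4 5 6 7 8) ///
      nested allopt

* Inspect what changed
matrix list e(W_weights)   // donor weights
matrix list e(X_balance)   // treated vs synthetic predictor values
matrix list e(RMSPE)       // pre-treatment RMSPE
```

Comparing `e(RMSPE)` and `e(X_balance)` before and after each change shows which tip actually helps. Note that using every pre-treatment outcome as a predictor can make the other covariates irrelevant to the fit.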

---

3. Robustness checks: make sure the improvement is credible
```stata
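* 3.1 In-space placebo test: synth_runner reassigns treatment to each donor in turn
synth_runner y x1 x2 x3, trunit(3) trperiod(2015) ///
            gen_vars pre_limit_mult(1) training_propr(0.5) ///
            keep(res_placebo) replace

* 3.2 Leave-one-out: drop one donor at a time and compare the RMSPE
local donors 1 2 4 5 6 7 8
foreach i of local donors {
    local pool : list donors - i
    quietly synth y x1 x2 x3, trunit(3) trperiod(2015) counit(`pool')
    matrix R = e(RMSPE)
    display "Without unit `i': RMSPE = " R[1,1]
}

* 3.3 Two-way fixed-effects DID as a benchmark
gen post = (year>=2015 & unit==3)
reg y post x1 x2 i.unit i.year, vce(cluster unit)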
```
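After `synth_runner` finishes, the placebo results can be read off the stored results and plotted. This is a minimal sketch that assumes the same variable names as above and relies on the `gen_vars` option, which the graph commands need.

```stata
* Sketch: read and plot the placebo-based inference from synth_runner
tsset unit year
synth_runner y x1 x2 x3, trunit(3) trperiod(2015) gen_vars
ereturn list                  // shows the stored p-values, among other results
display "Joint post-treatment p-value: " e(pval_joint_post)
effect_graphs                 // treated vs synthetic path and the estimated effect
pval_graphs                   // per-period p-values
```

`gen_vars` adds variables to the dataset in memory, so run this on a fresh copy of the data if an earlier call already generated them.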

---

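4. Last resort: switch to an extended model
If none of the above helps, the assumptions of classic SCM probably do not hold: the treated unit should be well approximated by a convex combination of donors, with no unobserved confounders that the pre-treatment fit fails to capture.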
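- Augmented SCM (`ascm` command): adds a ridge-regularized outcome model on top of SCM to correct the remaining pre-treatment imbalance. The reference implementation is the R package `augsynth`, so confirm that `ascm` is actually available for your Stata (e.g. `ssc describe ascm`) before relying on the call below.
  ```stata
  ssc install ascm, replace
  ascm y x1 x2, trunit(3) trperiod(2015) lambda(0.1)  // tune lambda
  ```
- Synthetic difference-in-differences (SDID): combines DID and SCM; use the `sdid` command, which can be installed from SSC (`ssc install sdid`).
- Bayesian SCM (`bsynth`): handles high-dimensional covariates and quantifies uncertainty. To my knowledge `bsynth` is an R package rather than a Stata command.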
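For SDID, a call might look like the sketch below. It assumes `sdid` is already installed, a balanced panel, and the same variable names as earlier. The variable `treated` is introduced here for illustration. With a single treated unit, placebo-based inference is the applicable variance option.

```stata
* Sketch: synthetic difference-in-differences with one treated unit
* treated = 1 only for the treated unit in post-treatment years
gen treated = (unit == 3 & year >= 2015)
sdid y unit year treated, vce(placebo) reps(200) seed(12345) graph
```

The argument order is outcome, unit identifier, time variable, treatment indicator. The `graph` option plots the treated and synthetic paths so the fit can be compared with the `synth` result.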

---

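5. Checklist (before you submit)
1. [ ] Pre-treatment RMSPE < 0.2 × standard deviation of the actual outcome
2. [ ] No extreme weights (largest weight < 0.5)
3. [ ] Placebo test p-value < 0.1
4. [ ] Leave-one-out RMSPE is stable
5. [ ] Figures include: actual vs synthetic, ranked weights, placebo distribution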
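One caveat on the placebo p-value threshold: the size of the donor pool limits how small the p-value can get. Under the common convention that ranks the treated unit's post/pre RMSPE ratio among all J+1 units (an assumption here; `synth_runner` may compute its p-values slightly differently), the p-value is:

$$
p = \frac{1}{J+1}\sum_{j=1}^{J+1}\mathbf{1}\{r_j \ge r_1\},\qquad r_j = \frac{\mathrm{RMSPE}_{j,\text{post}}}{\mathrm{RMSPE}_{j,\text{pre}}},\qquad p \ge \frac{1}{J+1}
$$

Here unit 1 is the treated unit. With the 7 donors used in the examples above, the smallest attainable value is 1/8 = 0.125, so a p-value below 0.1 needs at least 10 donors under this convention.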

---

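In one sentence: first improve the fit by adding predictors, enlarging the donor pool, and standardizing; then confirm robustness with placebo and leave-one-out checks; if that still fails, move to an extended model such as Augmented SCM or SDID.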
