The best possible score is 1.0, and the score can be negative (because the model can be arbitrarily worse). A negative score occurs when the model's predictions fit the data worse than simply predicting the mean of the observed values.
score(X, y, sample_weight=None) returns R², defined as (1 - u/v), where
u = ((y_true - y_pred) ** 2).sum(),
v = ((y_true - y_true.mean()) ** 2).sum().
The best score is 1.0; typical scores fall below 1.0, and the lower the score, the worse the fit.
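As a sketch of the definition above (NumPy only; the array values are made up for illustration):

```python
import numpy as np

def r2_score_manual(y_true, y_pred):
    """R^2 as sklearn defines it: 1 - u/v."""
    u = ((y_true - y_pred) ** 2).sum()          # residual sum of squares
    v = ((y_true - y_true.mean()) ** 2).sum()   # total sum of squares
    return 1 - u / v

y_true = np.array([1.0, 2.0, 3.0, 4.0])

# A reasonable prediction: score close to 1
print(r2_score_manual(y_true, np.array([1.0, 2.0, 3.0, 5.0])))  # 0.8

# Predicting the mean everywhere gives exactly 0
print(r2_score_manual(y_true, np.full(4, y_true.mean())))       # 0.0

# Predictions worse than the mean give a negative score
print(r2_score_manual(y_true, np.array([4.0, 3.0, 2.0, 1.0])))  # -3.0
```

The last call shows why the score is unbounded below: the worse the predictions relative to the mean, the more negative R² becomes.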
The identity SST = SSE + SSR does not hold in a regression model without an intercept term. For linear regression without an intercept, R² can therefore fall outside [0, 1]: computed as 1 - SSE/SST it can be negative, and computed as SSR/SST it can exceed 1. In that case the uncentered R² is used instead.
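A small numeric check of that claim (NumPy only; the data are made up). With an intercept the decomposition SST = SSE + SSR holds; fitting through the origin, it does not:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 4.0])

def decomposition(y, y_hat):
    """Return (SST, SSE, SSR) for fitted values y_hat."""
    sst = ((y - y.mean()) ** 2).sum()       # total sum of squares
    sse = ((y - y_hat) ** 2).sum()          # residual sum of squares
    ssr = ((y_hat - y.mean()) ** 2).sum()   # explained sum of squares
    return sst, sse, ssr

# OLS with an intercept: y_hat = a + b*x
b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
a = y.mean() - b * x.mean()
sst, sse, ssr = decomposition(y, a + b * x)
print(sst, sse + ssr)   # 5.0 5.0  -> identity holds

# Regression through the origin: y_hat = b0*x, with b0 = sum(x*y)/sum(x^2)
b0 = (x * y).sum() / (x ** 2).sum()
sst, sse, ssr = decomposition(y, b0 * x)
print(sst, sse + ssr)   # 5.0 12.0 -> identity fails
```

The identity fails through the origin because the residuals of a no-intercept fit need not sum to zero, so the cross term in the decomposition does not vanish.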
【1】The score (R²) computed by sklearn follows the formula above strictly.
【2】For the R² computed by statsmodels, when there is no intercept: "R2 is computed without centering (uncentered) since the model does not contain a constant."
- rsquared – R-squared of the model. This is defined here as 1 - ssr/centered_tss if the constant is included in the model and 1 - ssr/uncentered_tss if the constant is omitted.
- rsquared_adj – Adjusted R-squared. This is defined here as 1 - (nobs-1)/df_resid * (1-rsquared) if a constant is included and 1 - nobs/df_resid * (1-rsquared) if no constant is included.
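The two definitions quoted above can be reproduced directly (a sketch in plain NumPy, following the statsmodels documentation rather than calling the library; the data are made up):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 4.0])

# Regression through the origin (no constant): slope = sum(x*y)/sum(x^2)
b = (x * y).sum() / (x ** 2).sum()
ssr = ((y - b * x) ** 2).sum()            # statsmodels calls the residual SS "ssr"

centered_tss = ((y - y.mean()) ** 2).sum()
uncentered_tss = (y ** 2).sum()

r2_centered = 1 - ssr / centered_tss      # what sklearn's formula would give
r2_uncentered = 1 - ssr / uncentered_tss  # what statsmodels reports without a constant
print(r2_centered, r2_uncentered)

# Adjusted R^2 without a constant: 1 - nobs/df_resid * (1 - rsquared)
nobs, df_resid = len(y), len(y) - 1       # one fitted parameter (the slope)
r2_adj = 1 - nobs / df_resid * (1 - r2_uncentered)
print(r2_adj)
```

On this data the uncentered value is larger than the centered one, simply because the uncentered total sum of squares is bigger; the two numbers answer different questions and are not comparable.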
【3】Note that centered/uncentered R² and R²/adjusted R² are two different distinctions. Also, as standard textbooks caution, R² carries little meaning under IV estimation (which is why R² is usually not reported in that setting).
See https://www.stata-journal.com/sjpdf.html?articlenum=st0030
Regression through the origin is an important and useful tool in applied statistics, but it remains a subject of pedagogical neglect, controversy and confusion. Hopefully, this synthesis provides some clarity. However, in the light of the unresolved debate, perhaps the strongest conclusion to be drawn from this review is that the practice of statistics remains as much an art as it is a science, and the development of statistical judgment is therefore as important as computational skill.