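Ridge regression requires standardized predictors. It minimizes the residual sum of squares plus a penalty, λ times the sum of the squared coefficients, and this penalty is what limits the size of the coefficients. The size of a coefficient, however, depends on the units of its predictor. If a variable is recorded in thousands instead of units, its coefficient must be 1000 times larger to give the same fit, so the same λ penalizes it far more heavily. Without standardization, how strongly each predictor is shrunk therefore depends on an arbitrary choice of units. In R you do not need to standardize by hand: MASS::lm.ridge centers and scales the predictors internally and reports the coefficients on the original scale.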
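To see why the units matter, here is a minimal sketch in base R. It assumes the closed-form ridge estimate on centered data, beta = (X'X + lambda I)^-1 X'y, with an unpenalized intercept; `ridge_fitted` is a hypothetical helper written only for this illustration, and lambda = 1 is an arbitrary choice. It regresses `GNP.deflator` on the other `longley` columns twice, once with `GNP` as stored and once with `GNP` divided by 1000, and compares the fitted values with and without standardization.

```r
# Hypothetical helper for illustration: closed-form ridge on centered data,
# beta = (X'X + lambda * I)^{-1} X'y. Centering X and y first leaves the
# intercept unpenalized. Returns the fitted values.
ridge_fitted <- function(X, y, lambda, standardize = FALSE) {
  if (standardize) X <- scale(X)                # center, then divide by sd
  Xc <- scale(X, center = TRUE, scale = FALSE)  # center each column
  yc <- y - mean(y)
  beta <- solve(crossprod(Xc) + lambda * diag(ncol(Xc)), crossprod(Xc, yc))
  drop(Xc %*% beta) + mean(y)
}

y   <- longley$GNP.deflator
X   <- as.matrix(longley[, -1])      # predictors in their stored units
X_k <- X
X_k[, "GNP"] <- X_k[, "GNP"] / 1000  # same information, GNP in other units

lambda <- 1  # arbitrary penalty, chosen only for the demonstration

raw_a <- ridge_fitted(X,   y, lambda)
raw_b <- ridge_fitted(X_k, y, lambda)
std_a <- ridge_fitted(X,   y, lambda, standardize = TRUE)
std_b <- ridge_fitted(X_k, y, lambda, standardize = TRUE)

# Unstandardized: changing the units of GNP changes the ridge fit
max(abs(raw_a - raw_b))
# Standardized: the fits agree up to floating-point rounding
max(abs(std_a - std_b))
```

Without standardization the two fits differ: dividing `GNP` by 1000 forces its coefficient to be 1000 times larger, so the penalty on it grows by a factor of 10^6. After standardization both design matrices are the same (up to rounding), so the same λ shrinks every predictor equally regardless of units.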
> library(MASS)
> example(lm.ridge)
lm.rdg> longley # not the same as the S-PLUS dataset
     GNP.deflator     GNP Unemployed Armed.Forces Population Year Employed
1947         83.0 234.289      235.6        159.0    107.608 1947   60.323
1948         88.5 259.426      232.5        145.6    108.632 1948   61.122
1949         88.2 258.054      368.2        161.6    109.773 1949   60.171
1950         89.5 284.599      335.1        165.0    110.929 1950   61.187
1951         96.2 328.975      209.9        309.9    112.075 1951   63.221
1952         98.1 346.999      193.2        359.4    113.270 1952   63.639
1953         99.0 365.385      187.0        354.7    115.094 1953   64.989
1954        100.0 363.112      357.8        335.0    116.219 1954   63.761
1955        101.2 397.469      290.4        304.8    117.388 1955   66.019
1956        104.6 419.180      282.2        285.7    118.734 1956   67.857
1957        108.4 442.769      293.6        279.8    120.445 1957   68.169
1958        110.8 444.546      468.1        263.7    121.950 1958   66.513
1959        112.6 482.704      381.3        255.2    123.366 1959   68.655
1960        114.2 502.601      393.1        251.4    125.368 1960   69.564
1961        115.7 518.173      480.6        257.2    127.852 1961   69.331
1962        116.9 554.894      400.7        282.7    130.081 1962   70.551
lm.rdg> names(longley)[1] <- "y"
lm.rdg> lm.ridge(y ~ ., longley)
                        GNP    Unemployed  Armed.Forces    Population          Year
2946.85636017    0.26352725    0.03648291    0.01116105   -1.73702984   -1.41879853
     Employed
   0.23128785
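The claim that lm.ridge standardizes internally can be checked by reproducing its coefficients by hand. This sketch assumes, based on the MASS source, that lm.ridge centers each predictor, divides it by its root mean square (the standard deviation with divisor n rather than n - 1), solves the ridge problem on that scale, and converts the coefficients back to the original units, which is what `coef()` and `print()` report. The penalty value 0.05 is arbitrary. The sketch also checks that with the default `lambda = 0` the fit is plain least squares, which is why the coefficients printed above are the ordinary least-squares estimates; the unnamed first entry is the intercept.

```r
library(MASS)

dat <- longley
names(dat)[1] <- "y"      # same renaming as in example(lm.ridge)

# With the default lambda = 0 there is no penalty, so the result should be
# the ordinary least-squares fit
ols <- coef(lm(y ~ ., dat))
all.equal(unname(coef(lm.ridge(y ~ ., dat))), unname(ols), tolerance = 1e-6)

# Reproduce lm.ridge by hand for a positive penalty. Assumption (based on
# the MASS source): predictors are centered, divided by their root mean
# square (sd with divisor n), and penalized on that scale.
lambda <- 0.05                        # arbitrary value for the check
fit <- lm.ridge(y ~ ., dat, lambda = lambda)

X  <- as.matrix(dat[, -1])
n  <- nrow(X)
xm <- colMeans(X)
Xc <- sweep(X, 2, xm)                 # center each predictor
s  <- sqrt(colSums(Xc^2) / n)         # root mean square of each column
Xs <- sweep(Xc, 2, s, "/")            # standardized predictors
yc <- dat$y - mean(dat$y)

beta_s  <- solve(crossprod(Xs) + lambda * diag(ncol(Xs)), crossprod(Xs, yc))
beta    <- drop(beta_s) / s              # back to the original units
alpha   <- mean(dat$y) - sum(beta * xm)  # intercept
by_hand <- c(alpha, beta)

# Side by side: coefficients from lm.ridge and from the manual computation
cbind(lm.ridge = coef(fit), by_hand = by_hand)
```

If the two columns agree, the standardization is indeed happening inside lm.ridge, so the predictors can be passed in their original units.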