全部版块 我的主页
论坛 计量经济学与统计论坛 五区 计量经济学与统计软件
2445 2
2006-05-04

实在心急了,苦苦看了一些书,也没找到怎样用主成分缩减指标!

具体是这样的:我选取了18个指标,做主成分分析,出来三个主成分,可是每个主成分涉及的指标比较分散。我想把一些指标剔除,可是不知道如何操作来剔除,斑竹推荐了SAS,指点说使用SAS软件的PRINCOMP过程可以求得主成分,然后再剔除指标。本人没学过SAS,不知道有没有具体实例可以参照的,还望各位具体指点下,多谢了!!!

也许我的确急功近利了,可是实在急着要交了,又不想太马虎的应付了事,各位帮忙一下了,谢谢了!!!

二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

全部回复
2006-5-4 12:40:00

Here is what I thought:

1) Look at the scale of measurement of all the variables. If scales are different, e.g., Height vs age, you should normalize all the variables. Or use correlation matrix.

2) Check each variable, if they are close to normally distributed. Otherwise use transformation. This will ensure that all variables are in the elliptical space.


3) In SAS, here is an example: (using correlation matrix)


A) The data:


 data Crime;
title 'Crime Rates per 100,000 Population by State';
input State $1-15 Murder Rape Robbery Assault
Burglary Larceny Auto_Theft;
datalines;
Alabama 14.2 25.2 96.8 278.3 1135.5 1881.9 280.7
Alaska 10.8 51.6 96.8 284.0 1331.7 3369.8 753.3
Arizona 9.5 34.2 138.2 312.3 2346.1 4467.4 439.5
Arkansas 8.8 27.6 83.2 203.4 972.6 1862.1 183.4
California 11.5 49.4 287.0 358.0 2139.4 3499.8 663.5
Colorado 6.3 42.0 170.7 292.9 1935.2 3903.2 477.1
Connecticut 4.2 16.8 129.5 131.8 1346.0 2620.7 593.2
Delaware 6.0 24.9 157.0 194.2 1682.6 3678.4 467.0
Florida 10.2 39.6 187.9 449.1 1859.9 3840.5 351.4
Georgia 11.7 31.1 140.5 256.5 1351.1 2170.2 297.9
Hawaii 7.2 25.5 128.0 64.1 1911.5 3920.4 489.4
Idaho 5.5 19.4 39.6 172.5 1050.8 2599.6 237.6
Illinois 9.9 21.8 211.3 209.0 1085.0 2828.5 528.6
Indiana 7.4 26.5 123.2 153.5 1086.2 2498.7 377.4
Iowa 2.3 10.6 41.2 89.8 812.5 2685.1 219.9
Kansas 6.6 22.0 100.7 180.5 1270.4 2739.3 244.3
Kentucky 10.1 19.1 81.1 123.3 872.2 1662.1 245.4
Louisiana 15.5 30.9 142.9 335.5 1165.5 2469.9 337.7
Maine 2.4 13.5 38.7 170.0 1253.1 2350.7 246.9
Maryland 8.0 34.8 292.1 358.9 1400.0 3177.7 428.5
...
;


2) Use the 'PrinComp' procedure:

proc princomp out=Crime_Components;
run;

3) Look at the output (results): pay attention to the correlation matirx. If the correlation between two variables close to 1 or -1, you can omit one of them.
Also pay attention to the eigenvalues of correlation matrix. If eigenvalue < 1 do not use it


Hope this helps.



Crime Rates per 100,000 Population by State

The PRINCOMP Procedure

Observations 50
Variables 7

Simple Statistics
Murder Rape Robbery Assault Burglary Larceny Auto_Theft
Mean 7.444000000 25.73400000 124.0920000 211.3000000 1291.904000 2671.288000 377.5260000
StD 3.866768941 10.75962995 88.3485672 100.2530492 432.455711 725.908707 193.3944175

Correlation Matrix
Murder Rape Robbery Assault Burglary Larceny Auto_Theft
Murder 1.0000 0.6012 0.4837 0.6486 0.3858 0.1019 0.0688
Rape 0.6012 1.0000 0.5919 0.7403 0.7121 0.6140 0.3489
Robbery 0.4837 0.5919 1.0000 0.5571 0.6372 0.4467 0.5907
Assault 0.6486 0.7403 0.5571 1.0000 0.6229 0.4044 0.2758
Burglary 0.3858 0.7121 0.6372 0.6229 1.0000 0.7921 0.5580
Larceny 0.1019 0.6140 0.4467 0.4044 0.7921 1.0000 0.4442
Auto_Theft 0.0688 0.3489 0.5907 0.2758 0.5580 0.4442 1.0000

Eigenvalues of the Correlation Matrix
Eigenvalue Difference Proportion Cumulative
1 4.11495951 2.87623768 0.5879 0.5879
2 1.23872183 0.51290521 0.1770 0.7648
3 0.72581663 0.40938458 0.1037 0.8685
4 0.31643205 0.05845759 0.0452 0.9137
5 0.25797446 0.03593499 0.0369 0.9506
6 0.22203947 0.09798342 0.0317 0.9823
7 0.12405606 0.0177 1.0000

Eigenvectors
Prin1 Prin2 Prin3 Prin4 Prin5 Prin6 Prin7
Murder 0.300279 -.629174 0.178245 -.232114 0.538123 0.259117 0.267593
Rape 0.431759 -.169435 -.244198 0.062216 0.188471 -.773271 -.296485
Robbery 0.396875 0.042247 0.495861 -.557989 -.519977 -.114385 -.003903
Assault 0.396652 -.343528 -.069510 0.629804 -.506651 0.172363 0.191745
Burglary 0.440157 0.203341 -.209895 -.057555 0.101033 0.535987 -.648117
Larceny 0.357360 0.402319 -.539231 -.234890 0.030099 0.039406 0.601690
Auto_Theft 0.295177 0.502421 0.568384 0.419238 0.369753 -.057298 0.147046


二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

2006-5-4 17:12:00
主成分分析中得出的主成分只是原始指标的线性组合,即载荷矩阵,要实现主成分与原始指标的关联,还需要对载荷矩阵进行因子旋转,得出公共因子,有的时候并不一定得出结论。请参考人大出版社出版的《多元统计分析》何晓群编著,有主成分分析因子分析的上机实现(spss)很详细,在SPSS中主成分分析和因子分析是结合在一起的
二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

相关推荐
栏目导航
热门文章
推荐文章

说点什么

分享

扫码加好友,拉您进群
各岗位、行业、专业交流群