2008-02-12
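A correlation analysis between two variables can be run just by entering them, but how do I run a correlation analysis between two sets of variables? Urgent question, ^_^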

All replies
2008-2-13 02:12:00
 
UCLA Academic Technology Services

Stata Data Analysis Examples
Canonical Correlation Analysis

Examples of Canonical Correlation Analysis

Example 1. A researcher has collected data on three psychological variables, four academic variables (standardized test scores) and gender for 600 college freshmen. She is interested in how the set of psychological variables relates to the academic variables and gender. In particular, the researcher is interested in how many dimensions are necessary to understand the association between the two sets of variables.

Description of the Data

Let's pursue Example 1 from above.

We have a data file, mmreg.dta, with 600 observations on eight variables. The psychological variables are locus of control, self-concept and motivation. The academic variables are standardized tests in reading, writing, math and science. Additionally, the variable female is a zero-one indicator variable with the one indicating a female student.

Let's look at the data.

    use http://www.ats.ucla.edu/stat/stata/dae/mmreg, clear

    summarize

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
              id |       600       300.5    173.3494          1        600
    locus_of_c~l |       600    .0965333    .6702799      -2.23       1.36
    self_concept |       600    .0049167    .7055125      -2.62       1.19
      motivation |       600    .6608333    .3427294          0          1
            read |       600    51.90183    10.10298       28.3         76
    -------------+--------------------------------------------------------
           write |       600    52.38483    9.726455       25.5       67.1
            math |       600      51.849    9.414736       31.8       75.5
         science |       600    51.76333    9.706179         26       74.2
          female |       600        .545    .4983864          0          1

    tabulate female

         female |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |        273       45.50       45.50
              1 |        327       54.50      100.00
    ------------+-----------------------------------
          Total |        600      100.00

We did not include correlations among the variables at this point because we will get them later as part of the canonical correlation analysis.

Some Strategies You Might Be Tempted To Try

Before we show how you can analyze this with a canonical correlation analysis, let's consider some other methods that you might use.
  • Separate OLS Regressions - You could analyze these data using separate OLS regression analyses for each variable in one set. The OLS regressions will not produce multivariate results and do not report information concerning dimensionality.
  • Multivariate multiple regression - This is a reasonable option if you have no interest in dimensionality.

Stata Canonical Correlation Analysis

    canon (locus_of_control self_concept motivation)(read write math science female), test(1 2 3)

    Linear combinations for canonical correlations        Number of obs =     600
    ------------------------------------------------------------------------------
                 |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    u1           |
    locus_of_c~l |   1.253834   .1210229    10.36   0.000     1.016153    1.491515
    self_concept |  -.3513499    .116424    -3.02   0.003    -.5799987   -.1227012
      motivation |    1.26242   .2435532     5.18   0.000     .7840983    1.740742
    -------------+----------------------------------------------------------------
    v1           |
            read |   .0446206   .0122741     3.64   0.000     .0205152     .068726
           write |   .0358771   .0122944     2.92   0.004     .0117318    .0600224
            math |   .0234172   .0127339     1.84   0.066    -.0015914    .0484258
         science |   .0050252   .0122762     0.41   0.682    -.0190845    .0291348
          female |   .6321192   .1747222     3.62   0.000     .2889767    .9752618
    -------------+----------------------------------------------------------------
    u2           |
    locus_of_c~l |  -.6214775   .3731786    -1.67   0.096    -1.354375      .11142
    self_concept |  -1.187687   .3589975    -3.31   0.001    -1.892733   -.4826399
      motivation |   2.027264   .7510053     2.70   0.007     .5523406    3.502187
    -------------+----------------------------------------------------------------
    v2           |
            read |    -.00491   .0378475    -0.13   0.897      -.07924    .0694199
           write |   .0420715   .0379101     1.11   0.268    -.0323814    .1165244
            math |   .0042295   .0392656     0.11   0.914    -.0728854    .0813444
         science |  -.0851622   .0378541    -2.25   0.025    -.1595052   -.0108192
          female |   1.084642   .5387622     2.01   0.045       .02655    2.142735
    -------------+----------------------------------------------------------------
    u3           |
    locus_of_c~l |  -.6616896   .6064262    -1.09   0.276     -1.85267    .5292904
    self_concept |   .8267209   .5833814     1.42   0.157    -.3190007    1.972443
      motivation |   2.000228   1.220406     1.64   0.102    -.3965655    4.397022
    -------------+----------------------------------------------------------------
    v3           |
            read |   .0213806   .0615033     0.35   0.728    -.0994078    .1421689
           write |   .0913073   .0616051     1.48   0.139    -.0296808    .2122955
            math |   .0093982   .0638077     0.15   0.883    -.1159158    .1347122
         science |   -.109835   .0615141    -1.79   0.075    -.2306445    .0109745
          female |  -1.794647   .8755045    -2.05   0.041    -3.514078   -.0752155
    ------------------------------------------------------------------------------
                                        (Standard errors estimated conditionally)

    Canonical correlations:
      0.4641  0.1675  0.1040

    ----------------------------------------------------------------------------
    Tests of significance of all canonical correlations

                             Statistic      df1      df2            F     Prob>F
             Wilks' lambda     .754361       15  1634.65      11.7157     0.0000 a
            Pillai's trace     .254249       15     1782      11.0006     0.0000 a
    Lawley-Hotelling trace     .314297       15     1772      12.3763     0.0000 a
        Roy's largest root     .274496        5      594      32.6101     0.0000 u

    ----------------------------------------------------------------------------
    Test of significance of canonical correlations 1-3

                             Statistic      df1      df2            F     Prob>F
             Wilks' lambda     .754361       15  1634.65      11.7157     0.0000 a

    ----------------------------------------------------------------------------
    Test of significance of canonical correlations 2-3

                             Statistic      df1      df2            F     Prob>F
             Wilks' lambda      .96143        8     1186       2.9445     0.0029 e

    ----------------------------------------------------------------------------
    Test of significance of canonical correlation 3

                             Statistic      df1      df2            F     Prob>F
             Wilks' lambda     .989186        3      594       2.1646     0.0911 e

    ----------------------------------------------------------------------------
    e = exact, a = approximate, u = upper bound on F
The output for canonical correlation analysis is long and complex, and it comes in two parts. The first part gives the raw canonical coefficients with their standard errors, Wald t-tests, p-values and confidence intervals. The second part begins with the canonical correlations and includes the multivariate tests for dimensionality. Stata is unusual among statistics packages in giving standard errors and Wald tests for the raw canonical coefficients.
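The four multivariate statistics in the second part of the output are all functions of the canonical correlations. The following is a minimal Python sketch, not Stata's own computation: it assumes the textbook definitions of the four statistics and uses the correlations as printed to four decimals, so the results agree with the output only to about four decimal places.

```python
# Canonical correlations as printed (4 decimals) in the canon output.
CANONICAL_CORRELATIONS = [0.4641, 0.1675, 0.1040]


def multivariate_statistics(correlations):
    """Return the four test statistics computed from canonical correlations.

    Textbook definitions, with r2 the squared canonical correlations:
      Wilks' lambda          = product of (1 - r2)
      Pillai's trace         = sum of r2
      Lawley-Hotelling trace = sum of r2 / (1 - r2)
      Roy's largest root     = largest r2 / (1 - largest r2)
    """
    squared = [r * r for r in correlations]
    wilks = 1.0
    for r2 in squared:
        wilks *= 1.0 - r2
    largest = max(squared)
    return {
        "Wilks' lambda": wilks,
        "Pillai's trace": sum(squared),
        "Lawley-Hotelling trace": sum(r2 / (1.0 - r2) for r2 in squared),
        "Roy's largest root": largest / (1.0 - largest),
    }


stats = multivariate_statistics(CANONICAL_CORRELATIONS)
for name, value in stats.items():
    print(f"{name:<24}{value:.4f}")
```

The printed values can be compared against the Statistic column of the "Tests of significance of all canonical correlations" table; small differences in the fifth decimal come from the rounded inputs.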

The output for the coefficients is further divided into three sections, one for each of the canonical dimensions. Within each of the dimensions there are the "u" variables, variables in the first set, which for this example are the psychological variables. There are also the "v" variables, variables in the second set, in this case, the academic variables plus gender.

In general, the number of canonical dimensions is equal to the number of variables in the smaller set; however, the number of significant dimensions may be even smaller. Canonical dimensions, also known as canonical variates, are latent variables that are analogous to factors obtained in factor analysis. For this particular model there are three canonical dimensions, of which only the first two are statistically significant. The first test of dimensions tests whether dimensions 1 through 3 combined are significant (they are), and the next test tests whether dimensions 2 and 3 combined are significant (they are). Finally, the last test tests whether dimension 3, by itself, is significant (it is not). Because dimension 3 contributes nothing significant on its own, the significance of the first two tests is attributed to dimensions 1 and 2.
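The sequential tests can be reproduced from the printed numbers. The sketch below is an illustration under stated assumptions rather than Stata's implementation: it assumes Wilks' lambda for dimensions k through 3 is the product of (1 - r²) over those dimensions, and it assumes Rao's F approximation with the usual degrees-of-freedom formulas. The function names `wilks_lambda` and `rao_f` are made up for this example.

```python
import math

CANONICAL_CORRELATIONS = [0.4641, 0.1675, 0.1040]  # as printed by canon
N_OBS, P, Q = 600, 3, 5  # observations, size of set 1, size of set 2


def wilks_lambda(correlations, first):
    """Wilks' lambda for dimensions first..last (1-based)."""
    result = 1.0
    for r in correlations[first - 1:]:
        result *= 1.0 - r * r
    return result


def rao_f(wilks, n_obs, p, q, first):
    """Rao's F for the test that dimensions first..min(p, q) are all zero.

    Returns (F, df1, df2). After removing first - 1 dimensions the
    effective set sizes shrink to p - first + 1 and q - first + 1.
    """
    p_k = p - first + 1
    q_k = q - first + 1
    df1 = p_k * q_k
    w = n_obs - 1 - (p + q + 1) / 2
    denom = p_k ** 2 + q_k ** 2 - 5
    # Conventionally t = 1 when the ratio is undefined (0/0) or trivial.
    t = math.sqrt((df1 ** 2 - 4) / denom) if denom > 0 else 1.0
    df2 = w * t - df1 / 2 + 1
    root = wilks ** (1 / t)
    return (1 - root) / root * df2 / df1, df1, df2


# Wilks' lambda values as reported in the output (more digits than the
# lambdas rebuilt from the rounded correlations).
REPORTED_WILKS = {1: 0.754361, 2: 0.96143, 3: 0.989186}

for first, reported in REPORTED_WILKS.items():
    rebuilt = wilks_lambda(CANONICAL_CORRELATIONS, first)
    f_stat, df1, df2 = rao_f(reported, N_OBS, P, Q, first)
    print(f"dims {first}-3: lambda={rebuilt:.5f} F={f_stat:.4f} "
          f"df1={df1} df2={df2:.2f}")
```

Under these assumptions the degrees of freedom come out as 15 and about 1634.65, 8 and 1186, and 3 and 594, which is the pattern shown in the three test tables.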

The raw canonical coefficients are interpreted in a manner analogous to interpreting regression coefficients. For example, for the variable read, a one unit increase in reading leads to a .0446 increase in the first canonical variate of set 2 when all of the other variables are held constant. Here is another example: being female leads to a .6321 increase in dimension 1 for set 2 with the other predictors held constant.
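To make the interpretation concrete, here is a small Python sketch that treats the first canonical variate of set 2 as a linear combination of the raw variables. The coefficients are copied from the v1 block of the output; the student profile is hypothetical, and any constant shift Stata may apply when it computes variate scores is ignored, because it cancels when two scores are compared.

```python
# Raw coefficients for v1 (first canonical variate of set 2), from the output.
V1_RAW = {
    "read": 0.0446206,
    "write": 0.0358771,
    "math": 0.0234172,
    "science": 0.0050252,
    "female": 0.6321192,
}


def canonical_score(coefficients, values):
    """Linear combination of the variables, weighted by raw coefficients."""
    return sum(coefficients[name] * values[name] for name in coefficients)


# Hypothetical male student scoring 50 on every test (illustrative only).
student = {"read": 50.0, "write": 50.0, "math": 50.0, "science": 50.0, "female": 0}
one_more_reading_point = dict(student, read=51.0)
same_scores_female = dict(student, female=1)

base = canonical_score(V1_RAW, student)
reading_effect = canonical_score(V1_RAW, one_more_reading_point) - base
female_effect = canonical_score(V1_RAW, same_scores_female) - base

print(f"one more reading point: {reading_effect:+.4f}")
print(f"female versus male:     {female_effect:+.4f}")
```

Changing one variable while holding the others fixed moves the score by exactly that variable's coefficient, which is the sense in which the coefficients read like regression coefficients.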

Note that for the first dimension all of the variables except for math and science are statistically significant, along with the dimension as a whole. For the second dimension only self-concept, motivation, science and female are significant (math is not, with p = 0.914). The third dimension is not significant and no attention will be paid to its coefficients or Wald tests.
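The Wald statistics can be checked directly from the coefficient table, since each t value is the coefficient divided by its standard error. The Python sketch below uses the dimension 2 rows (u2 and v2) copied from the output and takes the printed p-values as given rather than recomputing them, because the standard library has no t distribution.

```python
# Dimension 2 rows from the output: (coefficient, std. err., reported p-value).
DIMENSION_2 = {
    "locus_of_control": (-0.6214775, 0.3731786, 0.096),
    "self_concept": (-1.187687, 0.3589975, 0.001),
    "motivation": (2.027264, 0.7510053, 0.007),
    "read": (-0.00491, 0.0378475, 0.897),
    "write": (0.0420715, 0.0379101, 0.268),
    "math": (0.0042295, 0.0392656, 0.914),
    "science": (-0.0851622, 0.0378541, 0.025),
    "female": (1.084642, 0.5387622, 0.045),
}


def wald_t(coefficient, std_err):
    """Wald t statistic: coefficient divided by its standard error."""
    return coefficient / std_err


t_values = {name: wald_t(c, se) for name, (c, se, _) in DIMENSION_2.items()}
significant = [name for name, (_, _, p) in DIMENSION_2.items() if p < 0.05]

for name, t in t_values.items():
    flag = "*" if name in significant else " "
    print(f"{name:<18}{t:>7.2f} {flag}")
```

Rows marked with an asterisk are those whose reported p-value is below .05.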

When the variables in the model have very different standard deviations, the standardized coefficients allow for easier comparisons among the variables. Next we'll display the standardized canonical coefficients for the first two (significant) dimensions.

    canon (locus_of_control self_concept motivation)(read write math science female), first(2) stdcoef notest

    Canonical correlation analysis                    Number of obs =     600

    Standardized coefficients for the first variable set

                 |        1        2
    -------------+--------------------
    locus_of_c~l |   0.8404  -0.4166
    self_concept |  -0.2479  -0.8379
      motivation |   0.4327   0.6948
    ----------------------------------

    Standardized coefficients for the second variable set

                 |        1        2
    -------------+--------------------
            read |   0.4508  -0.0496
           write |   0.3490   0.4092
            math |   0.2205   0.0398
         science |   0.0488  -0.8266
          female |   0.3150   0.5406
    ----------------------------------

    Canonical correlations:
      0.4641  0.1675  0.1040

The standardized canonical coefficients are interpreted in a manner analogous to interpreting standardized regression coefficients. For example, for the variable read, a one standard deviation increase in reading leads to a 0.45 standard deviation increase in the score on the first canonical variate for set 2 when the other variables in the model are held constant.
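The standardized coefficients can be related back to the raw ones. The Python sketch below assumes that each standardized coefficient is the raw coefficient multiplied by that variable's sample standard deviation; the numbers in this thread are consistent with that, but it is an inference from the output, not a statement of how Stata computes them. Standard deviations come from the summarize output and raw coefficients from dimension 1 of the first canon run.

```python
# Sample standard deviations from the summarize output.
STD_DEV = {
    "locus_of_control": 0.6702799,
    "self_concept": 0.7055125,
    "motivation": 0.3427294,
    "read": 10.10298,
    "write": 9.726455,
    "math": 9.414736,
    "science": 9.706179,
    "female": 0.4983864,
}

# Raw coefficients for dimension 1 (u1 and v1) from the first canon output.
RAW_DIM1 = {
    "locus_of_control": 1.253834,
    "self_concept": -0.3513499,
    "motivation": 1.26242,
    "read": 0.0446206,
    "write": 0.0358771,
    "math": 0.0234172,
    "science": 0.0050252,
    "female": 0.6321192,
}


def standardize(raw, std_dev):
    """Assumed relation: standardized coefficient = raw coefficient * SD."""
    return {name: raw[name] * std_dev[name] for name in raw}


standardized_dim1 = standardize(RAW_DIM1, STD_DEV)
for name, value in standardized_dim1.items():
    print(f"{name:<18}{value:>8.4f}")
```

This also shows why the raw coefficient for female (.63) looks so much larger than the one for read (.045) even though their standardized coefficients are similar in size: read varies over a far wider range.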

Next, we'll use the estat correlations command to look at all of the correlations within and between sets of variables.

    estat correlations

    Correlations for variable list 1

                 | locus_~l self_c~t motiva~n
    -------------+------------------------------
    locus_of_c~l |   1.0000
    self_concept |   0.1712   1.0000
      motivation |   0.2451   0.2886   1.0000
    --------------------------------------------

    Correlations for variable list 2

                 |     read    write     math  science   female
    -------------+--------------------------------------------------
            read |   1.0000
           write |   0.6286   1.0000
            math |   0.6793   0.6327   1.0000
         science |   0.6907   0.5691   0.6495   1.0000
          female |  -0.0417   0.2443  -0.0482  -0.1382   1.0000
    ----------------------------------------------------------------

    Correlations between variable lists 1 and 2

                 | locus_~l self_c~t motiva~n
    -------------+------------------------------
            read |   0.3736   0.0607   0.2106
           write |   0.3589   0.0194   0.2542
            math |   0.3373   0.0536   0.1950
         science |   0.3246   0.0698   0.1157
          female |   0.1134  -0.1260   0.0981
    --------------------------------------------

Finally, we'll use the estat loadings command to display the loadings of the variables on the canonical dimensions (variates). These loadings are correlations between variables and the canonical variates.

    estat loadings

    Canonical loadings for variable list 1

                 |        1        2
    -------------+--------------------
    locus_of_c~l |   0.9040  -0.3897
    self_concept |   0.0208  -0.7087
      motivation |   0.5672   0.3509
    ----------------------------------

    Canonical loadings for variable list 2

                 |        1        2
    -------------+--------------------
            read |   0.8404  -0.3588
           write |   0.8765   0.0648
            math |   0.7639  -0.2979
         science |   0.6584  -0.6768
          female |   0.3641   0.7549
    ----------------------------------

    Correlation between variable list 1 and canonical variates from list 2

                 |        1        2
    -------------+--------------------
    locus_of_c~l |   0.4196  -0.0653
    self_concept |   0.0097  -0.1187
      motivation |   0.2632   0.0588
    ----------------------------------

    Correlation between variable list 2 and canonical variates from list 1

                 |        1        2
    -------------+--------------------
            read |   0.3900  -0.0601
           write |   0.4068   0.0109
            math |   0.3545  -0.0499
         science |   0.3056  -0.1134
          female |   0.1690   0.1265
    ----------------------------------
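The loadings tie together the earlier tables. As a sketch in Python, and assuming the standard identities for canonical correlation analysis, the loadings for set 1 equal the within-set correlation matrix times the standardized coefficients, and the correlations with the other set's variates equal the loadings scaled by the canonical correlation of that dimension. All inputs are the rounded values printed above, so agreement is only to about three decimals.

```python
# Within-set correlations for set 1 (estat correlations), in the order
# locus_of_control, self_concept, motivation.
R_SET1 = [
    [1.0000, 0.1712, 0.2451],
    [0.1712, 1.0000, 0.2886],
    [0.2451, 0.2886, 1.0000],
]

# Standardized coefficients for set 1; columns are dimensions 1 and 2.
STD_COEF_SET1 = [
    [0.8404, -0.4166],
    [-0.2479, -0.8379],
    [0.4327, 0.6948],
]

CANONICAL_CORRELATIONS = [0.4641, 0.1675]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Loading = correlation between each variable and its own set's variate.
loadings = matmul(R_SET1, STD_COEF_SET1)

# Cross-loading = correlation with the other set's variate of that dimension.
cross_loadings = [
    [value * CANONICAL_CORRELATIONS[j] for j, value in enumerate(row)]
    for row in loadings
]

for name, load, cross in zip(
    ["locus_of_control", "self_concept", "motivation"], loadings, cross_loadings
):
    print(f"{name:<18}{load[0]:>8.4f}{load[1]:>8.4f}"
          f"{cross[0]:>9.4f}{cross[1]:>8.4f}")
```

The first two printed columns can be compared with "Canonical loadings for variable list 1" and the last two with "Correlation between variable list 1 and canonical variates from list 2".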

Sample Write-Up of the Analysis

There is a lot of variation in the write-ups of canonical correlation analyses. The write-up below is fairly minimal, including only the tests of dimensionality and the standardized coefficients. Typically, one does not include raw coefficients with standard errors and Wald tests of significance.
    Table 1: Tests of Canonical Dimensions

                 Canonical    Mult.
    Dimension      Corr.        F      df1       df2        p
        1          0.46       11.72     15    1634.65    0.000
        2          0.17        2.94      8    1186       0.003
        3          0.10        2.16      3     594       0.091

    Table 2: Standardized Canonical Coefficients

                                          Dimension
                                          1        2
    Psychological Variables
        locus of control                0.84    -0.42
        self-concept                   -0.25    -0.84
        motivation                      0.43     0.69
    Academic Variables plus Gender
        reading                         0.45    -0.05
        writing                         0.35     0.41
        math                            0.22     0.04
        science                         0.05    -0.83
        gender (female=1)               0.32     0.54

Tests of dimensionality for the canonical correlation analysis, as shown in Table 1, indicate that two of the three canonical dimensions are statistically significant at the .05 level. Dimension 1 had a canonical correlation of 0.46 between the sets of variables, while for dimension 2 the canonical correlation was much lower at 0.17.

Table 2 presents the standardized canonical coefficients for the first two dimensions across both sets of variables. For the psychological variables, the first canonical dimension is most strongly influenced by locus of control (.84), and the second dimension by self-concept (-.84) and motivation (.69). For the academic variables plus gender, the first dimension was dominated by reading (.45), writing (.35) and gender (.32). For the second dimension, writing (.41), science (-.83) and gender (.54) were the dominating variables.

Cautions, Flies in the Ointment

  • Multivariate normal distribution assumptions are required for both sets of variables.
  • Canonical correlation analysis is not recommended for small samples.

    These pages are Copyrighted (c) by UCLA Academic Technology Services



    2008-2-13 02:12:00
    INCLUDE '[installdir]/Canonical correlation.sps'.
    CANCORR SET1=varlist1 /
            SET2=varlist2 / .
    The two variable lists must be separated with a slash.
    [installdir] is the installation directory.

    2008-2-14 13:04:00
    thank you very much ,^_^

    2008-2-20 14:53:00

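    I opened my Canonical correlation file and added the following to the syntax:

    INCLUDE'C:\Program Files\SPSS\Canonical correlation.sps'.
    CANCORR set1=conflictall IN COM OB DO AV/
            set2=sensiall ENJ RES CON ENJ ATT.

    But I keep getting a lot of errors and it will not run. What is going wrong?

    Thanks, this is urgent!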

     


    2008-3-21 22:21:00
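    Correlation analysis between two sets of variables is called canonical correlation analysis. You can refer to the advanced tutorial by Zhang Xiaotong (张晓彤), which explains it fairly clearly.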