2014-03-29
Hello, I was asked to run a factor analysis of 40 variables, but I have only 70 cases. I had to raise the iteration limit to 100 to get the program to converge, and I still believe it makes no sense to run a factor analysis with fewer than 2 cases per variable. I was then asked to provide a citation for that. Could someone point me to a citable source that discusses the minimum cases-per-variable requirement for factor analysis? Thanks a lot.

All replies
2014-3-29 10:10:25
Comrey & Lee (1992, A First Course in Factor Analysis) give the following sample sizes as a guide for factor analysis:

50 as very poor
100 as poor
200 as fair
300 as good
500 as very good
1000 as excellent

Tabachnick & Fidell (Using Multivariate Statistics, 4th ed.) recommend at least 300 cases.
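
As a rough illustration, the guide values quoted above can be turned into a small lookup. This is only a sketch: the function name `rate_sample_size` is made up here, and treating each value as the lower bound of a band is an assumption, since the guide only lists the values themselves.

```python
# Guide values as quoted in the post above: (minimum N, label).
# Assumption: each value is treated as the lower bound of a band.
COMREY_LEE_GUIDE = [
    (1000, "excellent"),
    (500, "very good"),
    (300, "good"),
    (200, "fair"),
    (100, "poor"),
    (50, "very poor"),
]


def rate_sample_size(n_cases):
    """Return the label of the highest guide value that n_cases reaches."""
    for threshold, label in COMREY_LEE_GUIDE:
        if n_cases >= threshold:
            return label
    return "below the lowest guide value (50)"


# The original poster has 70 cases.
print(rate_sample_size(70))
```

Under that reading, 70 cases has passed the "very poor" value but not yet reached the "poor" one.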
2014-3-29 10:10:39
Perhaps not the most authoritative citation, but see the APA publication edited by Grimm and Yarnold, Reading and Understanding Multivariate Statistics, 8th Ed., 2003, Washington, DC, page 100. It refers to the subjects-to-variables ratio (STV): "the minimum number of observations in ones sample should be at least five times the number of variables."
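
To put numbers on the five-to-one rule for the case in the question (70 cases, 40 variables), here is a minimal sketch. The helper name `stv_check` and its `min_ratio` parameter are hypothetical, introduced only for this illustration.

```python
import math


def stv_check(n_cases, n_variables, min_ratio=5.0):
    """Compare a sample against a subjects-to-variables (STV) rule of thumb."""
    ratio = n_cases / n_variables
    required = math.ceil(min_ratio * n_variables)
    return {
        "ratio": ratio,                      # cases per variable actually available
        "required_cases": required,          # cases the rule of thumb asks for
        "meets_rule": n_cases >= required,
    }


# Values from the original question: 70 cases, 40 variables.
print(stv_check(70, 40))
```

With these inputs the ratio is 1.75 cases per variable, against the 200 cases the rule would ask for.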
2014-3-29 10:11:10
I would look at the article by MacCallum et al. in Psychological Methods, as well as some articles in MBR (Multivariate Behavioral Research), that show problems with rules of thumb for EFA. One needs to take into account scaling issues, over/under-determination, communalities/saturation, etc.

Robert Marshall <marshall_pmp@comcast.net> wrote: Perhaps not the most authoritative citation, but the APA publication Edited by Grimm and Yarnold, Reading and Understanding Multivariate Statistics, 8th Ed. 2003. Washington DC. Page 100. Referred to as the subjects to variables ratio (STV), "the minimum number of observations in ones sample should be at least five times the number of variables."
2014-3-29 10:12:28
I do not remember a specific citation, but the general idea is that factor analysis is closely related to regression, and inference in regression relies on estimation errors being approximately normally distributed. That tendency toward normality as N grows is described by the central limit theorem (often loosely called "the law of large numbers"). More exactly, what matters is the degrees of freedom, which in regression equal the number of cases minus the number of variables minus one, N-k-1; in your case that is quite small (70-40-1 = 29). Because you have few cases, the margin of error of your estimates will be very wide, and you cannot be sure of their probable true value in the population, especially for minor factors after the first or second, where the loadings will be close to zero (so it may be difficult to tell whether they differ from zero in the population).
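
The point about wide margins of error can be illustrated for a single correlation, which is the raw material of a factor analysis. The sketch below uses the standard Fisher z approximation for a Pearson correlation; it does not compute intervals for factor loadings themselves, and the value r = 0.30 and the function name `correlation_ci` are assumptions chosen only for illustration.

```python
import math


def correlation_ci(r, n_cases, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson correlation.

    Uses Fisher's z transform: atanh(r) is roughly normal with
    standard error 1 / sqrt(N - 3).
    """
    if n_cases <= 3:
        raise ValueError("need more than 3 cases")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n_cases - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Same observed correlation, different sample sizes.
for n in (70, 300, 1000):
    low, high = correlation_ci(0.30, n)
    print(f"N={n}: r=0.30, approx. 95% CI = ({low:.2f}, {high:.2f})")
```

The interval narrows as N grows, which is the practical content of the sample-size guidelines cited earlier in the thread.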

An old rule of thumb says you need at the very least 10 cases per variable, and that really is the bare minimum. With fewer than 30-50 cases, experimental error distributions rarely resemble a normal curve. So my advice is to try a model with fewer variables, possibly a single underlying factor if your 40 variables are mostly explained by one overarching factor, or to abandon factor analysis altogether and try more modest approaches such as a simple summated scale, simple regression, two- or three-way cross-tabulations, and the like. Next time, plan for a bigger sample. And then again, do you really have a theory so complex that it requires no fewer than 40 independent factors? Isaac Newton explained the universe with only two or three variables, and did very well indeed, thank you.
2014-3-29 10:13:39
I have been following this discussion with much interest, as I have a similar problem at hand. For years, we have been conducting a consumer satisfaction survey that consists of one page, about 10 questions, plus a single open-ended question. Although the questions were intended to probe consumer satisfaction in a number of different areas, basically the level of correlation is so high that it seems that we're really only tracking one factor: overall satisfaction.

So we conducted literature reviews and went back to the drawing board, formulating more than 100 questions in 6 broad areas of consumer satisfaction. Our intention was to pilot test these questions with participants, examine the results, throw out the redundant questions (discerned through factor analysis), and emerge with, say, 20 questions known to reflect different dimensions of consumer satisfaction. However, our sample size thus far is in the pitiful range: perhaps 35 respondents. Needless to say, we have a long way to go. With our response rates and consumer base, we would be lucky to get more than 100 respondents in a year.

In order to improve the subjects-to-variables ratio (STV), we need either to greatly increase the sample size (which is difficult for us to do), or reduce the number of variables, or both. Our questions are short, simple statements requesting responses on a 5-point Likert scale. Some of the questions are worded in almost identical language, and some of these are almost certainly redundant. Given our relatively small sample size thus far, what is the best way to remove redundant questions while retaining maximum diversity of responses?

From one perspective, rank correlations would appear to be the preferred measure of association, but I wonder whether Likert scales are, analytically speaking, equivalent to rank-order variables. What other measures would be most appropriate? I hesitate to downgrade the measure of association to categorical, because that throws out the information on directionality and degree. Likewise, I hesitate to upgrade the measure of association to ratio, because clearly the intervals are arbitrary and not additive.
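
For reference, a rank correlation between two Likert items can be computed as a Pearson correlation on average ranks, which handles the many ties that 5-point items produce. This is a minimal standard-library sketch; the two response lists are invented for illustration, not survey data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical responses of eight people to two 5-point Likert items.
item_a = [5, 4, 4, 3, 2, 5, 1, 3]
item_b = [4, 4, 5, 3, 1, 5, 2, 2]
print(round(spearman(item_a, item_b), 3))
```

Whether a rank-based measure is the right choice for Likert items is exactly the open question raised above; the sketch only shows the mechanics.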

Intuitively, I am seeking to extract, out of these 100 questions, 4-5 groups of 2-3 questions each, such that within-group correlations are high but correlations with the other groups are low. The within-group redundancy reinforces degree of satisfaction with that particular factor, and the low between-group correlation assures that different aspects of satisfaction are represented.
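
One very simple way to sketch that idea is a greedy grouping on the correlation matrix: an item joins a group only if it correlates strongly with every item already in it. This is an illustrative heuristic, not an established procedure from the thread. The 0.7 threshold, the function names, and the toy responses are all assumptions, and with only about 35 respondents the correlations themselves would be noisy.

```python
import math


def pearson(x, y):
    """Pearson correlation; any other association measure could be swapped in."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def group_items(responses, within_min=0.7):
    """Greedy grouping: an item joins the first group in which it
    correlates at least within_min with every current member."""
    groups = []
    for name in responses:
        for group in groups:
            if all(pearson(responses[name], responses[m]) >= within_min for m in group):
                group.append(name)
                break
        else:
            groups.append([name])  # no group fits, so start a new one
    return groups


def max_between_group_correlation(responses, groups):
    """Largest absolute correlation between items in different groups."""
    worst = 0.0
    for i, g1 in enumerate(groups):
        for g2 in groups[i + 1:]:
            for a in g1:
                for b in g2:
                    worst = max(worst, abs(pearson(responses[a], responses[b])))
    return worst


# Toy data: eight hypothetical respondents, four hypothetical items.
responses = {
    "q1": [1, 1, 5, 5, 1, 1, 5, 5],
    "q2": [1, 2, 5, 4, 1, 1, 5, 5],
    "q3": [1, 5, 1, 5, 1, 5, 1, 5],
    "q4": [2, 5, 1, 5, 1, 4, 1, 5],
}
groups = group_items(responses)
print(groups)
print(max_between_group_correlation(responses, groups))
```

The result of a greedy pass depends on the order in which items are visited, so in practice it would be worth checking whether the groups are stable under reordering.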


