Jumping Straight into Factor Analysis (Part 3): Trying Out Various Estimation Methods

This post continues my series on trying out factor analysis.

Part 1

Part 2

This time the topic is estimation methods. As always, I lean on Professor Shimizu's slides while trying things out.

Rで因子分析　商用ソフトで実行できない因子分析のあれこれ ("Factor analysis in R: assorted factor analyses that commercial software cannot run")

The dataset is the same as in the previous two posts: HolzingerSwineford1939, which ships with lavaan.

The psych package

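In the very first post I used factanal(), which comes with R by default (in the stats package), but that function apparently supports only maximum likelihood. The fa() function in psych seems to be the one to use for other estimation methods.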

Running the factor analysis also seems to require the GPArotation package, so I install both and run it.

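library(lavaan)       # provides the HolzingerSwineford1939 data
library(psych)
library(GPArotation)

# Assumed to match how dat was defined in the earlier posts;
# columns 7 to 15 are the observed variables x1 to x9
dat <- HolzingerSwineford1939

result <- fa(r = dat[7:15],
             nfactors = 3,
             fm = "minres",
             rotate = "promax",
             use = "pairwise"
             )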

The data go into the argument r. You can pass raw data, as here, or apparently a correlation matrix.

nfactors is the number of factors. The previous post concluded that three is the recommended number, so I use that.

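The fm argument specifies the estimation method. Many options are available: minimum residual (minres), maximum likelihood (ml), iterated principal axis factoring (pa), generalized least squares (gls), weighted least squares (wls), and so on. (Many of them are new to me.)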

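The use argument controls how missing values are handled; the default is pairwise. According to the help page, it accepts the same values as the use argument of cor().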
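Since use is said to follow cor(), the difference between pairwise and listwise handling can be sketched with cor() alone. The toy data frame below is made up for illustration and has nothing to do with HolzingerSwineford1939.

```r
# Toy data (hypothetical), with one missing value in a and one in b
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, NA),
  b = c(2, 1, 4, 3, NA, 6),
  c = c(1, 3, 2, 5, 4, 6)
)

# Pairwise: each correlation uses every row where that pair is complete
r_pairwise <- cor(toy, use = "pairwise")

# Listwise: only rows with no missing value at all (rows 1 to 4 here)
r_listwise <- cor(toy, use = "complete.obs")

print(round(r_pairwise, 3))
print(round(r_listwise, 3))
```

With pairwise handling, the a-c correlation is based on five rows, while listwise handling drops to four rows for every pair.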

Let's look at the result.

print(result, digits = 3)

Factor Analysis using method =  minres
Call: fa(r = dat[7:15], nfactors = 3, rotate = "promax", fm = "minres",
    use = "pairwise")
Standardized loadings (pattern matrix) based upon correlation matrix
      MR1    MR2    MR3    h2    u2  com
x1  0.157  0.039  0.598 0.477 0.523 1.15
x2  0.010 -0.120  0.531 0.255 0.745 1.10
x3 -0.109  0.028  0.699 0.453 0.547 1.05
x4  0.849  0.007  0.005 0.728 0.272 1.00
x5  0.895  0.005 -0.078 0.754 0.246 1.02
x6  0.804 -0.013  0.072 0.691 0.309 1.02
x7  0.047  0.759 -0.229 0.519 0.481 1.19
x8 -0.050  0.710  0.060 0.520 0.480 1.02
x9  0.001  0.476  0.344 0.460 0.540 1.82

                        MR1   MR2   MR3
SS loadings           2.211 1.328 1.318
Proportion Var        0.246 0.148 0.146
Cumulative Var        0.246 0.393 0.540
Proportion Explained  0.455 0.273 0.271
Cumulative Proportion 0.455 0.729 1.000

 With factor correlations of
      MR1   MR2   MR3
MR1 1.000 0.257 0.391
MR2 0.257 1.000 0.350
MR3 0.391 0.350 1.000

Mean item complexity =  1.2
Test of the hypothesis that 3 factors are sufficient.

The degrees of freedom for the null model are  36  and the objective function was  3.053 with Chi Square of  904.097
The degrees of freedom for the model are 12  and the objective function was  0.077

The root mean square of the residuals (RMSR) is  0.019
The df corrected root mean square of the residuals is  0.033

The harmonic number of observations is  301 with the empirical chi square  7.867  with prob <  0.795
The total number of observations was  301  with Likelihood Chi Square =  22.555  with prob <  0.0317

Tucker Lewis Index of factoring reliability =  0.9633
RMSEA index =  0.0553  and the 90 % confidence intervals are  0.0157 0.0882
BIC =  -45.93
Fit based upon off diagonal values = 0.996
Measures of factor score adequacy             
                                                    MR1   MR2   MR3
Correlation of (regression) scores with factors   0.944 0.859 0.848
Multiple R square of scores with factors          0.892 0.738 0.719
Minimum correlation of possible factor scores     0.783 0.477 0.439

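The top block shows the factor loadings. The h2 column is the communality and u2 is the uniqueness. Unlike the default function, both are shown, which is handy. com is the complexity; apparently, the closer it is to 1, the closer that variable is to simple structure. Among the observed variables above, x9 loads on both factor 2 and factor 3 without clearly belonging to either, so its complexity is far from 1.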
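As a rough check on what com measures, the sketch below applies Hofmann's complexity index (squared sum of squared loadings divided by the sum of fourth powers) to three rows of loadings copied from the output. That this is the index psych reports is my assumption, based on the values lining up; hofmann_complexity is a helper name made up here.

```r
# Pattern loadings for three variables, copied from the printed output
L <- rbind(
  x1 = c(0.157, 0.039,  0.598),
  x7 = c(0.047, 0.759, -0.229),
  x9 = c(0.001, 0.476,  0.344)
)

# Hofmann's index: (sum of squared loadings)^2 / (sum of fourth powers).
# It equals 1 when a variable loads on a single factor only.
hofmann_complexity <- function(loadings) {
  rowSums(loadings^2)^2 / rowSums(loadings^4)
}

com <- round(hofmann_complexity(L), 2)
print(com)
```

The results agree with the com column for these rows (1.15, 1.19, 1.82), and x9 stands out because its squared loadings are split between two factors.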

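The block below shows the factor contributions (SS loadings), the proportion of variance (Proportion Var), and the cumulative proportion (Cumulative Var). Under those are the proportion explained (Proportion Explained) and its cumulative value (Cumulative Proportion).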
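The relationship between these rows can be reproduced from the SS loadings alone. This is a small sketch using the printed values, not psych's internal code: Proportion Var divides by the number of variables, and Proportion Explained divides by the total of the SS loadings.

```r
# SS loadings copied from the printed output
ss <- c(MR1 = 2.211, MR2 = 1.328, MR3 = 1.318)
n_vars <- 9  # x1 to x9

prop_var  <- ss / n_vars    # share of the total variance of the 9 variables
prop_expl <- ss / sum(ss)   # share of the variance explained by the factors

summary_table <- rbind(
  "Proportion Var"        = prop_var,
  "Cumulative Var"        = cumsum(prop_var),
  "Proportion Explained"  = prop_expl,
  "Cumulative Proportion" = cumsum(prop_expl)
)
print(round(summary_table, 3))
```

Rounded to three digits, the four rows match the ones in the output.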

The next block shows the correlations between factors. Incidentally, this block is not displayed with an orthogonal rotation.

After that comes the mean complexity. Removing the observed variable x9 would probably improve it a little (that is, bring it closer to 1)?

Further down, various fit indices are shown in order: the chi-square value, RMSR, the Tucker Lewis Index, RMSEA, BIC, and so on.

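Each index has its own rule of thumb for judging fit: for RMSEA, 0.05 or below is considered good; for the Tucker Lewis Index, the closer to 1 the better. BIC is used for comparing models fitted to the same data, and smaller is said to be better. (I have never looked into how it is calculated, so I don't know.)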
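For what it's worth, the printed BIC is consistent with the likelihood chi-square minus the degrees of freedom times the log of the sample size. The sketch below only checks that arithmetic against the numbers in the output; it is not a statement about how psych computes the value internally.

```r
# Values copied from the printed output
chisq <- 22.555  # Likelihood Chi Square
df    <- 12      # degrees of freedom for the model
n_obs <- 301     # total number of observations

# Chi-square based BIC: chi-square minus df * log(N)
bic <- chisq - df * log(n_obs)
print(round(bic, 2))
```

This gives -45.93, the same as the BIC line in the output.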

Comparing different estimation methods

Now that the usage of fa() is covered, I change only the estimation method and see how the results differ.

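result_minres <- fa(r = dat[7:15],
                    nfactors = 3,
                    fm = "minres",
                    rotate = "promax",
                    use = "pairwise"
                    )
result_ml <- fa(r = dat[7:15],
                nfactors = 3,
                fm = "ml",
                rotate = "promax",
                use = "pairwise"
                )
result_gls <- fa(r = dat[7:15],
                 nfactors = 3,
                 fm = "gls",
                 rotate = "promax",
                 use = "pairwise"
                 )
# The two fits below were printed but never defined in the original listing
result_wls <- fa(r = dat[7:15],
                 nfactors = 3,
                 fm = "wls",
                 rotate = "promax",
                 use = "pairwise"
                 )
result_pa <- fa(r = dat[7:15],
                nfactors = 3,
                fm = "pa",
                rotate = "promax",
                use = "pairwise"
                )
print(result_minres, digits = 3)
print(result_ml, digits = 3)
print(result_gls, digits = 3)
print(result_wls, digits = 3)
print(result_pa, digits = 3)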

Minimum residual (minres)

	MR1	MR2	MR3	h2	u2	com
x1	0.157	0.039	0.598	0.477	0.523	1.15
x2	0.010	-0.120	0.531	0.255	0.745	1.1
x3	-0.109	0.028	0.699	0.453	0.547	1.05
x4	0.849	0.007	0.005	0.728	0.272	1
x5	0.895	0.005	-0.078	0.754	0.246	1.02
x6	0.804	-0.013	0.072	0.691	0.309	1.02
x7	0.047	0.759	-0.229	0.519	0.481	1.19
x8	-0.050	0.710	0.060	0.520	0.480	1.02
x9	0.001	0.476	0.344	0.460	0.540	1.82

Maximum likelihood (ml)

	ML1	ML2	ML3	h2	u2	com
x1	0.153	0.036	0.609	0.487	0.513	1.13
x2	0.013	-0.116	0.525	0.251	0.749	1.1
x3	-0.115	0.029	0.703	0.457	0.543	1.06
x4	0.844	0.005	0.010	0.721	0.279	1
x5	0.898	0.007	-0.082	0.757	0.243	1.02
x6	0.807	-0.011	0.068	0.695	0.305	1.01
x7	0.044	0.743	-0.215	0.498	0.502	1.17
x8	-0.049	0.722	0.049	0.531	0.469	1.02
x9	0	0.479	0.335	0.457	0.543	1.79

Generalized least squares (gls)

	GLS1	GLS2	GLS3	h2	u2	com
x1	0.154	0.036	0.613	0.492	0.508	1.13
x2	0.011	-0.117	0.525	0.251	0.749	1.1
x3	-0.108	0.028	0.696	0.451	0.549	1.05
x4	0.845	0.007	0.009	0.723	0.277	1
x5	0.902	0.007	-0.081	0.766	0.234	1.02
x6	0.807	-0.013	0.073	0.697	0.303	1.02
x7	0.049	0.741	-0.219	0.497	0.503	1.18
x8	-0.051	0.728	0.054	0.542	0.458	1.02
x9	0	0.483	0.347	0.469	0.531	1.82

Weighted least squares (wls)

	WLS1	WLS2	WLS3	h2	u2	com
x1	0.156	0.038	0.603	0.482	0.518	1.14
x2	0.011	-0.118	0.528	0.253	0.747	1.1
x3	-0.110	0.028	0.701	0.456	0.544	1.05
x4	0.846	0.007	0.007	0.723	0.277	1
x5	0.896	0.006	-0.080	0.756	0.244	1.02
x6	0.806	-0.013	0.071	0.693	0.307	1.02
x7	0.047	0.746	-0.222	0.502	0.498	1.18
x8	-0.051	0.720	0.056	0.531	0.469	1.02
x9	0	0.477	0.342	0.459	0.541	1.81

Iterated principal axis factoring (pa)

	PA1	PA2	PA3	h2	u2	com
x1	0.157	0.039	0.598	0.477	0.523	1.15
x2	0.010	-0.120	0.531	0.256	0.744	1.1
x3	-0.109	0.028	0.699	0.453	0.547	1.05
x4	0.850	0.007	0.005	0.728	0.272	1
x5	0.894	0.005	-0.078	0.753	0.247	1.02
x6	0.805	-0.013	0.072	0.692	0.308	1.02
x7	0.047	0.754	-0.226	0.512	0.488	1.19
x8	-0.050	0.714	0.059	0.524	0.476	1.02
x9	0	0.477	0.344	0.461	0.539	1.82

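None of them seem to differ much numerically. You might then ask which one to use. According to Professor Shimizu's website, maximum likelihood is a fine first choice. When the sample size is small and you have to give up on maximum likelihood, least squares, iterated principal axis factoring, and the like become candidates. He suggests deciding by checking things such as whether improper solutions appear and whether the estimation converges.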
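To put a number on "not much different", here is a small sketch that takes the h2 (communality) columns from the five tables and computes, for each variable, the gap between the largest and smallest value across methods. The values are typed in by hand from the tables, so this only summarizes what is already shown.

```r
# Communalities (h2) copied from the five tables, one column per method
h2 <- cbind(
  minres = c(0.477, 0.255, 0.453, 0.728, 0.754, 0.691, 0.519, 0.520, 0.460),
  ml     = c(0.487, 0.251, 0.457, 0.721, 0.757, 0.695, 0.498, 0.531, 0.457),
  gls    = c(0.492, 0.251, 0.451, 0.723, 0.766, 0.697, 0.497, 0.542, 0.469),
  wls    = c(0.482, 0.253, 0.456, 0.723, 0.756, 0.693, 0.502, 0.531, 0.459),
  pa     = c(0.477, 0.256, 0.453, 0.728, 0.753, 0.692, 0.512, 0.524, 0.461)
)
rownames(h2) <- paste0("x", 1:9)

# Largest minus smallest communality across the five methods, per variable
spread <- round(apply(h2, 1, max) - apply(h2, 1, min), 3)
print(spread)
print(max(spread))
```

The largest gap is 0.022 (for x7 and x8), which fits the impression that the methods agree closely on this dataset.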

Bonus

While looking things up, I came across the following paper, which seemed good.

Current Methodological Considerations in Exploratory and Confirmatory Factor Analysis

It was written in 2011 by someone named Schmitt. After noting that running a factor analysis involves many decisions, the abstract says the following.

Unfortunately, researchers continue to use outdated methods in each of these areas. The present article provides a current overview of these areas in an effort to provide researchers with up-to-date methods and considerations in both exploratory and confirmatory factor analysis.

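It offers cautionary notes on what to base decisions on for sample size, the number of factors, the estimation method, rotation, and so on, so it should be useful for anyone who uses factor analysis.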