Looking into model identification in confirmatory factor analysis

I had to look into model identification, and these are my notes. The data is the familiar HolzingerSwineford1939 dataset that ships with lavaan.

First, build the following model, in which the latent variables are assumed to be uncorrelated:

library(lavaan)

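dat <- HolzingerSwineford1939   # example data shipped with lavaan

model1 <- "
A =~ x1 + x2        # factor A has only two indicators
B =~ x4 + x5 + x6
C =~ x7 + x8 + x9
# fix all factor covariances to 0 (uncorrelated factors)
A ~~ 0*B
A ~~ 0*C
B ~~ 0*C
"

res1 <- cfa(model1, data = dat)
summary(res1)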

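Because factor A has only two indicators (x1 and x2), lavaan warns that the model may not be identified. Strictly speaking this is a warning rather than an error: the model is still estimated, but without standard errors.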

Warning message:
In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
 lavaan WARNING:
   Could not compute standard errors! The information matrix could
   not be inverted. This may be a symptom that the model is not
   identified.

As the warning says, no standard errors are computed (Std.Err is NA):

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  A =~                                                
    x1                1.000                           
    x2                0.661       NA                  
  B =~                                                
    x4                1.000                           
    x5                1.133       NA                  
    x6                0.924       NA                  
  C =~                                                
    x7                1.000                           
    x8                1.225       NA                  
    x9                0.854       NA
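Why the information matrix cannot be inverted can be seen from a rank argument. In the A part, the three implied moments (var x1, var x2, cov x1x2) depend on four free parameters (the x2 loading, the variance of A, and two residual variances), so the 3 x 4 Jacobian of the moments with respect to the parameters has rank at most 3. As far as I understand, the ML expected information has the form J'WJ with a positive-definite weight matrix W, so it inherits this rank deficiency. The base-R sketch below checks the rank using J'J as a stand-in for lavaan's actual information matrix; jacobian_A and the parameter values are hypothetical.

```r
# Jacobian of the implied moments of the A block, (var x1, var x2, cov x1x2),
# with respect to the free parameters (lambda2, phi, theta1, theta2), where
#   var x1 = phi + theta1
#   var x2 = lambda2^2 * phi + theta2
#   cov    = lambda2 * phi            (x1 loading fixed to 1)
jacobian_A <- function(lambda2, phi) {
  rbind(
    var_x1 = c(lambda2 = 0,                 phi = 1,         theta1 = 1, theta2 = 0),
    var_x2 = c(lambda2 = 2 * lambda2 * phi, phi = lambda2^2, theta1 = 0, theta2 = 1),
    cov_12 = c(lambda2 = phi,               phi = lambda2,   theta1 = 0, theta2 = 0)
  )
}

J    <- jacobian_A(lambda2 = 0.5, phi = 2)   # hypothetical parameter values
info <- crossprod(J)                         # 4 x 4 matrix t(J) %*% J
qr(info)$rank                                # 3, not 4: info is singular

# A direction in parameter space that leaves all three moments unchanged
# (to first order): the data carry no information along it.
J %*% c(-0.25, 1, -1, 0.25)                  # all zeros
```

Along that null direction the likelihood is flat, so there is no curvature from which a standard error could be computed.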

According to Prof. Muthén, the cause is as follows (his explanation was given in a discussion of bifactor models):

When specific factors have only 2 indicators you cannot identify the loading for the second of those indicators. Think of the specific factor as absorbing a residual correlation between those 2 indicators - there is only 1 such correlation and therefore you can only identify 1 parameter, in this case the specific factor variance.

Source: Bifactor Model Problems
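The same kind of problem can be checked numerically for factor A in model1. In the base-R sketch below (implied_cov and both parameter sets are made up for illustration), two different sets of values for the x2 loading, the variance of A and the two residual variances imply exactly the same covariance matrix of x1 and x2, so no amount of data could tell them apart.

```r
# Model-implied covariance matrix of (x1, x2) for a one-factor block
# with two indicators; the x1 loading is fixed to 1 (marker variable).
implied_cov <- function(lambda2, phi, theta1, theta2) {
  Lambda <- matrix(c(1, lambda2), ncol = 1)   # loadings of x1, x2
  Theta  <- diag(c(theta1, theta2))           # residual variances
  phi * (Lambda %*% t(Lambda)) + Theta        # Lambda phi Lambda' + Theta
}

# Two hypothetical parameter sets for the A block
s1 <- implied_cov(lambda2 = 1.0, phi = 1, theta1 = 2, theta2 = 2.0)
s2 <- implied_cov(lambda2 = 0.5, phi = 2, theta1 = 1, theta2 = 2.5)

s1                    # 3 1 / 1 3
all.equal(s1, s2)     # TRUE: both imply the same covariance matrix
```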

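If the model is instead specified with correlated latent variables, the identification problem goes away: x1 and x2 then also covary with the indicators of B and C, and those covariances carry additional information about A. Since cfa() estimates factor covariances by default, it is enough to drop the three lines that fix them to 0: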

model2 <-"
A =~ x1 + x2
B =~ x4 + x5 + x6
C =~ x7 + x8 + x9
# no ~~ lines here: cfa() estimates the factor covariances by default
"

res2 <- cfa(model2, data = dat)
summary(res2)

This time the standard errors are computed properly:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  A =~                                                
    x1                1.000                           
    x2                0.438    0.129    3.401    0.001
  B =~                                                
    x4                1.000                           
    x5                1.113    0.065   17.073    0.000
    x6                0.923    0.055   16.708    0.000
  C =~                                                
    x7                1.000                           
    x8                1.180    0.165    7.175    0.000
    x9                1.018    0.141    7.205    0.000
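To see the mechanics, note that with A correlated with B and both marker loadings fixed to 1, cov(x1, x4) = cov(A, B) and cov(x2, x4) = lambda2 * cov(A, B), so the x2 loading can be recovered as a ratio of covariances with B's indicators. The base-R sketch below builds a population covariance matrix from hypothetical values (every number is made up) and recovers the A-block parameters from it. It only illustrates the logic; lavaan estimates all parameters jointly by ML.

```r
# Hypothetical population values: A measured by x1, x2; B by x4, x5, x6.
# Marker-variable scaling: the first loading of each factor is fixed to 1.
Lambda <- cbind(
  A = c(x1 = 1, x2 = 0.5, x4 = 0, x5 = 0,   x6 = 0),
  B = c(x1 = 0, x2 = 0,   x4 = 1, x5 = 1.2, x6 = 0.8)
)
Phi   <- matrix(c(2,   0.4,
                  0.4, 1), 2)             # var(A) = 2, cov(A, B) = 0.4, var(B) = 1
Theta <- diag(c(1, 2.5, 0.5, 0.5, 0.5))  # residual variances
Sigma <- Lambda %*% Phi %*% t(Lambda) + Theta

# Because cov(A, B) != 0, the x2 loading is a ratio of covariances with
# B's indicators (two routes give the same answer: overidentification).
lambda2_via_x4 <- Sigma["x2", "x4"] / Sigma["x1", "x4"]
lambda2_via_x5 <- Sigma["x2", "x5"] / Sigma["x1", "x5"]

# The rest of the A block then follows from its own variances and covariance.
phi_A  <- Sigma["x1", "x2"] / lambda2_via_x4
theta1 <- Sigma["x1", "x1"] - phi_A
theta2 <- Sigma["x2", "x2"] - lambda2_via_x4^2 * phi_A

est <- c(lambda2 = lambda2_via_x4, phi_A = phi_A, theta1 = theta1, theta2 = theta2)
est   # approximately 0.5, 2, 1, 2.5
```

In model1 the covariances between A's indicators and B's indicators are fixed to zero by construction, so this route is closed and the A part has to stand on its own.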

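In model1 the latent variables are uncorrelated, so the A part can be examined on its own. With p observed variables there are p(p+1)/2 distinct variances and covariances, which for A (p = 2) gives 3. The number of free parameters in that part must not exceed this, i.e. 3 minus the number of free parameters must not be negative (a necessary but not sufficient condition). In model1 the A part has four free parameters: the loading of x2 (the x1 loading is fixed to 1 to set the scale of A), the variance of A, and the two residual variances. That gives 3 - 4 = -1, which is the problem. Adding one constraint, for example as follows, leaves three free parameters (3 - 3 = 0, just identified), and the model is identified: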

model3 <-"
A =~ x1 + x2
B =~ x4 + x5 + x6
C =~ x7 + x8 + x9
A ~~ 0*B
A ~~ 0*C
B ~~ 0*C
x1 ~~ 1*x1    # fix the residual variance of x1 to 1
"

res3 <- cfa(model3, data = dat)
summary(res3)
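The parameter counting from the paragraph before model3 can be tallied with a small helper. This is a sketch assuming lavaan's default marker-variable scaling (first loading of each factor fixed to 1, all factor and residual variances free) and no mean structure; n_moments and count_free are hypothetical helper names.

```r
# Number of distinct variances and covariances among p observed variables
n_moments <- function(p) p * (p + 1) / 2

# Free parameters under marker-variable scaling:
# (p - k) loadings + p residual variances + k factor variances
# + free factor covariances - parameters fixed by extra constraints.
count_free <- function(indicators_per_factor, n_factor_cov = 0, n_fixed = 0) {
  k <- length(indicators_per_factor)   # number of factors
  p <- sum(indicators_per_factor)      # number of observed variables
  (p - k) + p + k + n_factor_cov - n_fixed
}

check <- data.frame(
  part    = c("model1, A only", "model3, A only", "model1", "model2", "model3"),
  moments = c(n_moments(2), n_moments(2), rep(n_moments(8), 3)),
  free    = c(count_free(2),
              count_free(2, n_fixed = 1),                # x1 ~~ 1*x1
              count_free(c(2, 3, 3)),
              count_free(c(2, 3, 3), n_factor_cov = 3),  # A~~B, A~~C, B~~C
              count_free(c(2, 3, 3), n_fixed = 1))
)
check$df <- check$moments - check$free
check   # df: -1, 0, 20, 17, 21
```

The A-only rows reproduce 3 - 4 = -1 and 3 - 3 = 0. The whole-model row for model1 is a reminder that the overall count can be positive (here 20) even though the model is not identified: the 21 covariances between blocks, all fixed to zero by the orthogonality constraints, add degrees of freedom but say nothing about the A parameters.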

For model3, the standard errors are computed as well:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  A =~                                                
    x1                1.000                           
    x2                1.137    0.333    3.410    0.001
  B =~                                                
    x4                1.000                           
    x5                1.133    0.067   16.906    0.000
    x6                0.924    0.056   16.391    0.000
  C =~                                                
    x7                1.000                           
    x8                1.225    0.190    6.460    0.000
    x9                0.854    0.121    7.046    0.000
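Comparing the outputs, the B and C loadings in model3 are identical to those in model1 (1.133, 0.924, 1.225, 0.854), since those blocks are unaffected, while the x2 loading moves from 0.661 to 1.137. Fixing the x1 residual variance makes the A part just identified, but the value 1 is an arbitrary choice and it determines which of the equally fitting solutions is reported. The base-R sketch below solves the A block in closed form for a given residual variance of x1; solve_A_block and the covariance matrix S are hypothetical.

```r
# Solve the A block exactly when theta1 (residual variance of x1) is fixed.
# Implied moments of (x1, x2), with the x1 loading fixed to 1:
#   S[1, 1] = phi + theta1
#   S[1, 2] = lambda2 * phi
#   S[2, 2] = lambda2^2 * phi + theta2
solve_A_block <- function(S, theta1) {
  phi <- S[1, 1] - theta1
  if (phi <= 0) stop("theta1 must be smaller than var(x1)")
  lambda2 <- S[1, 2] / phi
  theta2  <- S[2, 2] - lambda2^2 * phi
  c(lambda2 = lambda2, phi = phi, theta1 = theta1, theta2 = theta2)
}

S <- matrix(c(3, 1,
              1, 3), 2)        # hypothetical covariance matrix of (x1, x2)

solve_A_block(S, theta1 = 1)   # lambda2 = 0.5, phi = 2, theta1 = 1, theta2 = 2.5
solve_A_block(S, theta1 = 2)   # lambda2 = 1,   phi = 1, theta1 = 2, theta2 = 2
```

Both solutions reproduce S exactly; only the extra constraint decides between them.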