# 2 Notation, Terminology, and Software

Structural Equation Modeling

Table 16.1: Conventional Symbols and Terminology

| Diagram | Chapter notation | Description | Conventional notation |
|---|---|---|---|
| Circle | F | A latent variable that is not directly measured (may also be called a factor or construct) | η (eta) represents latent endogenous variables; ξ (ksi) represents latent exogenous variables |
| Square | Y | Observed variable (may also be called measured or manifest variable) | Y represents observed endogenous variables (also indicators of η) |
| Square | X | Observed variable (may also be called measured or manifest variable) | X represents observed exogenous variables (also indicators of ξ) |
| Circle | D | Errors or disturbances for latent endogenous variables | ζ (zeta) represents errors for latent endogenous variables |
| Circle | E | Errors or residuals for observed endogenous variables | ε (epsilon) represents errors for Y variables; δ (delta) represents errors for X variables |

Figure 16.1: Observed variable path model example.

also modeled to directly affect Y2 (represented by path e). Because these variables are
illustrated pictorially with squares, they represent observed variables that are directly
measured (e.g., annual income, math score). In this example, X1 and X2 are exogenous or independent variables and Y1 and Y2 are endogenous or dependent variables.
Chapter 16

In general, variables (observed or latent) that have arrows pointing toward them are
endogenous, whereas variables (observed or latent) that directly affect other variables and have no arrows pointing toward them are exogenous. Additionally, X1 and X2 are modeled to covary because they are connected with a double-headed arrow (represented by the covariance f).

Another concept that is important to introduce is error. As in standard regression analysis, when a variable is endogenous, it will have explained variance (e.g., R squared) as well as unexplained variance. Thus, endogenous variables in SEM have error terms associated with them. These are often represented with circles because they are also unobserved or latent and have direct effects on their respective endogenous variables. Errors are generally exogenous variables since they have direct effects on endogenous variables. In Figure 16.1, both Y1 and Y2 have errors associated with them (E1 and E2, respectively) since they are endogenous variables. You may also notice double-headed arrows associated with exogenous variables in Figure 16.1. This reflects the fact that these variables are free to vary and that their variances will be estimated because they are not directly affected by other variables in the model. Table 16.2 provides a summary of directional symbols used in SEM.

If the observed variables in Figure 16.1 were instead unobserved constructs or latent factors (e.g., self-efficacy, perceived social support), they would be pictorially illustrated with circles instead of squares. This is demonstrated in the structural equation model or latent variable path model in Figure 16.2.
Table 16.2: Additional Symbols in SEM

| Diagram | Description |
|---|---|
| X → Y | Represents a direct effect from X to Y |
| X ↔ Y | Represents a covariance between X and Y |
| Double-headed arrow on a single variable | Variances associated with exogenous variables, including errors or disturbances associated with endogenous variables |

Figure 16.2: Latent variable path model example.

F1 and F2 are exogenous or independent latent variables, and F3 and F4 are endogenous or dependent latent variables. In Figure 16.2, the endogenous factors (F3 and F4) still have errors associated with them, but they are typically differentiated from the errors associated with observed variables and commonly called disturbances (e.g., D1 and D2, respectively).

Various software programs exist that are capable of analyzing structural equation models. Some of the programs are stand-alone software, such as LISREL, EQS, AMOS, Mx, and Mplus. Other procedures that estimate parameters for structural equation models are available and are subsumed within larger software platforms, such as the CALIS procedure in SAS, the SEPATH procedure in STATISTICA, the RAMONA procedure in SYSTAT, and add-on packages in R (lavaan and sem). A review of the capabilities of these software programs is beyond the scope of this chapter. A fairly comprehensive discussion of available SEM software programs is provided in Kline (2011) as well as the available syntax for examples using LISREL, EQS, and Mplus. Further, Barbara Byrne has written a chapter describing popular SEM software (Byrne, 2012b) in addition to several books describing the analysis of structural equation models using different software, including LISREL (Byrne, 1998), EQS (Byrne, 2006), AMOS (Byrne, 2010), and Mplus (Byrne, 2012a). To be consistent with the software used in this textbook, software application examples in this chapter will be done using the CALIS procedure in SAS (SPSS does not include a structural equation modeling procedure). Hence, basic knowledge of SAS will be assumed when presenting these examples.
16.3 CAUSAL INFERENCE
Given the various types of relationships that may exist in path models, it is important to briefly discuss the issue of causal inference. By no means is this discussion
exhaustive. As such, readers should refer to seminal readings in the area (Davis, 1985;
Mulaik, 2009; Pearl, 2000; Pearl, 2012; Sobel, 1995).
As is evident in the models previously presented in this chapter, one-headed arrows
represent hypotheses concerning causal directions. For instance, in Figure 16.1, X1 is
hypothesized to directly affect both Y1 and Y2. On theoretical grounds, the implication is that X1 causes Y1 and Y2. In SEM, three conditions are generally necessary
to infer causality: association, isolation, and temporal precedence. Association simply
means that the cause and effect are observed to covary with one another. Isolation
signifies that the cause and effect continue to covary when they are isolated from other
influential variables. This condition is generally the most difficult to meet in its entirety,
but it may be closer to realization if data are collected for the variables that may influence the relationship between the cause and effect (e.g., SES) and controlled for statistically and/or with respect to design considerations (e.g., collecting data only on
females to avoid the influence of sex differences). Temporal precedence indicates that the hypothesized cause occurs prior to the hypothesized effect in time. Temporal precedence may be established by collecting data using methods incorporated in experimental (e.g., random assignment) or quasi-experimental designs (e.g., manipulation of treatment exposure) or by collecting data for the causal variables prior to collecting data for the outcome variables that they are hypothesized to affect.
16.4 FUNDAMENTAL TOPICS IN SEM
Before introducing the models associated with the three SEM techniques mentioned in
the opening paragraph (i.e., observed variable path analysis, CFA, and latent variable
path analysis), there are some fundamental topics in SEM that must first be outlined
because they generally apply to all of the models in the SEM arena. These topics
include model identification, model estimation, model fit, and model modification and
selection. Some of these fundamentals will be discussed again in the context of the
model being introduced for more clarity.
16.4.1 Identification
Model identification is a fundamental requirement for parameters to be estimated in a
model. Before elaborating upon this important issue, it is first necessary to introduce
some relevant information concerning the data input and the parameters to be estimated in a model. Figure 16.3 will be used to demonstrate these issues.

The model in Figure 16.3 is a basic multiple regression model with two predictor variables (X1 and X2) and one outcome variable (Y). In SEM, procedures are performed on the covariances among the observed variables. Thus, the input data used in SEM analyses consists of a sample covariance matrix, which is simply the unstandardized version of a correlation matrix. The sample covariance matrix (S) is given here for the variables used in Figure 16.3:

$$
S = \begin{bmatrix}
\mathbf{200} & \mathit{110} & \mathit{115} \\
\mathit{110} & \mathbf{220} & \mathit{130} \\
\mathit{115} & \mathit{130} & \mathbf{175}
\end{bmatrix}
$$

Figure 16.3: Multiple regression model with two predictors.

As you know, the variances of X1, X2, and Y appear in the main diagonal of the sample covariance matrix (noted in the covariance matrix with bold font). Here, the variances of X1, X2, and Y are equal to 200, 220, and 175, respectively. Further, the covariances between all possible pairs of variables appear in the off-diagonal elements of the sample covariance matrix (noted in the covariance matrix with italics). For instance, the covariance between X1 and X2 is 110; the covariance between X1 and Y is 115; and the covariance between X2 and Y is 130. SEM software uses this matrix as input during the estimation process, which is discussed in more detail later.
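To make the link between a covariance matrix and a correlation matrix concrete, the following minimal sketch standardizes the sample covariance matrix given above. The function name `covariance_to_correlation` is a hypothetical helper introduced only for this illustration; it is not part of any SEM package.

```python
import math

# Sample covariance matrix S from the text (order: X1, X2, Y).
S = [
    [200.0, 110.0, 115.0],
    [110.0, 220.0, 130.0],
    [115.0, 130.0, 175.0],
]


def covariance_to_correlation(cov):
    """Standardize a covariance matrix: r_ij = s_ij / (sd_i * sd_j)."""
    sds = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [
        [cov[i][j] / (sds[i] * sds[j]) for j in range(len(cov))]
        for i in range(len(cov))
    ]


R = covariance_to_correlation(S)
for row in R:
    print([round(r, 3) for r in row])
```

Each diagonal element becomes 1 after standardization, and each off-diagonal element is the corresponding covariance divided by the product of the two standard deviations.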
You can determine if a given model is identified by calculating the difference between the number of nonredundant observations in the sample covariance matrix (p*) and the number of model parameters that must be estimated (q; for a more detailed explanation of the various rules of model identification, see Bollen, 1989, and Kenny & Milan, 2012). Nonredundant observations do not pertain to the number of participants for which data were collected. Rather, the nonredundant observations in a sample covariance matrix include the variance elements in the main diagonal of the sample covariance matrix and the upper or lower triangle of covariance elements in the sample covariance matrix. In the sample covariance matrix for the model in Figure 16.3, there are three variances and three covariances. Thus, there are six nonredundant observations. The number of nonredundant observations in a sample covariance matrix can more easily be calculated with the following formula:

$$
p^* = \frac{p(p + 1)}{2} = \frac{3(3 + 1)}{2} = 6,
$$

where p* is the number of nonredundant observations and p is the number of observed variables in the model. The second quantity needed to determine if a model is identified is the number of model parameters (q) requiring estimation in SEM. This number consists of the variances of exogenous variables (which include error and/or disturbance variances), direct effects (represented with one-headed arrows), and covariances (represented with double-headed arrows).

Three different scenarios may occur when subtracting the number of model parameters that are to be estimated from the number of nonredundant observations, resulting in three different types of models: (1) just-identified, (2) over-identified, and (3) under-identified. A just-identified model has the same number of nonredundant observations in the sample covariance matrix as model parameters to estimate and is sometimes referred to as a saturated model. An over-identified model contains more nonredundant observations in the sample covariance matrix than model parameters to estimate. Models that are just- and over-identified allow model parameters to be estimated. However, an under-identified model, having fewer nonredundant observations in the sample covariance matrix than parameters to estimate, does not allow for parameters to be estimated. Thus, prior to collecting data, you should determine whether your hypothesized model is identified so that parameter estimates can be obtained.

To illustrate, consider the multiple regression model in Figure 16.3. To obtain the number of parameters that will be estimated, we first observe that the variances of X1, X2, and E1 (exogenous variables) will be estimated. In addition, the direct effect from X1 to Y and the direct effect from X2 to Y will be estimated. Lastly, the covariance between X1 and X2 will be estimated. As a result, we have six parameters to be estimated, and recall that there are six nonredundant observations in the sample covariance matrix. The difference between nonredundant observations and parameters to estimate (p* − q), in this case, is 6 − 6 = 0. This value is referred to as the degrees of freedom associated with the theoretical model (dfT). Thus, there are zero dfT associated with our multiple regression model presented in Figure 16.3, which means that it is a just-identified model. Over-identified models will be associated with more than zero (positive) dfT whereas under-identified models will result in negative (less than zero) dfT.
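The counting rule just described can be sketched in a few lines of code. The function names below are hypothetical and chosen only for this illustration; the sketch applies nothing beyond the p* − q comparison and does not check the more detailed identification rules cited above.

```python
def nonredundant_observations(p: int) -> int:
    """Number of unique variances and covariances among p observed variables."""
    return p * (p + 1) // 2


def classify_identification(p: int, q: int) -> tuple[int, str]:
    """Return (df_T, label) from the counting rule p* - q."""
    df_t = nonredundant_observations(p) - q
    if df_t == 0:
        label = "just-identified"
    elif df_t > 0:
        label = "over-identified"
    else:
        label = "under-identified"
    return df_t, label


# Figure 16.3: 3 variances (X1, X2, E1) + 2 direct effects + 1 covariance.
q_regression = 3 + 2 + 1
print(classify_identification(p=3, q=q_regression))
```

Calling the function with a smaller or larger value of q for the same three observed variables shows how the label changes as parameters are removed from or added to the model.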
Under-identified models are mathematically impossible to analyze because there is an infinite set of solutions that will satisfy the structural model, which makes estimation of a unique set of model parameters unattainable. To help illustrate the notion of under-identification, we borrow an example from Kline (2011, chap. 6) because it nicely clarifies the concept. Consider the following equation in which we have one known value (6) and two unknown values (a and b):

$$
a + b = 6
$$

Note that you would not be able to uniquely solve for a and b in this equation because they could take on numerous sets of corresponding values that satisfy the equation (i.e., that sum to 6).
Further, while models that are just- and over-identified allow for parameters to be estimated, there is an important difference between these models. That is, just-identified models reproduce the data exactly whereas there may be multiple solutions when estimating over-identified models. Just-identified models reproduce the data perfectly because there is only one solution for the parameter estimates. Further, because just-identified models simply reproduce the data, the model fits the data perfectly. Thus, you cannot test the model fit of just-identified models (which is often desired) whereas you can test the model fit of over-identified models (more to come later about model fit). To help illustrate this point, consider the following set of equations in which we have three known values (17, 13, and 5) and three unknown values (a, b, and c):

$$
\begin{aligned}
a + bc &= 17 \\
b + ac &= 13 \\
c &= 5
\end{aligned}
$$

Solving for c was easy enough (because it was given). To solve for the remaining unknown values, however, you will have to revisit the algebra course you took during your high school days. Here are the solved values:

$$
\begin{aligned}
a &= 2 \\
b &= 3 \\
c &= 5
\end{aligned}
$$

Note that there is only one solution for a, b, and c values that will satisfy the three equations (as in a just-identified model).
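For readers who want to check the algebra, one way to reach these values is by substitution. Setting c = 5 turns the first two equations into a linear system in a and b:

$$
\begin{aligned}
a + 5b &= 17 \;\Rightarrow\; a = 17 - 5b, \\
b + 5(17 - 5b) &= 13 \;\Rightarrow\; 85 - 24b = 13 \;\Rightarrow\; b = 3, \\
a &= 17 - 5(3) = 2 .
\end{aligned}
$$

Because each step admits exactly one value, no other combination of a, b, and c satisfies all three equations.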
When estimating parameters in over-identified models, you will not be able to solve the equations so easily. For instance, suppose that you instead hypothesized that X1 directly affected X2, which in turn directly affected Y (as opposed to the multiple regression model). This would render the model illustrated in Figure 16.4.

Again, there are six nonredundant observations in the sample covariance matrix:

$$
p^* = \frac{p(p + 1)}{2} = \frac{3(3 + 1)}{2} = 6
$$

Note that the number of nonredundant observations is the same as before because we have three observed variables. Although you are hypothesizing a different causal relationship between the three variables, you are using the same sample covariance matrix. What does change, however, is the number of parameters to estimate given that the hypothesized model differs from the previous one. With this model, there are now five parameters to estimate, including the variances of X1, E1, and E2 in addition to the two direct effects, one from X1 to X2 and the other from X2 to Y. Thus, the dfT associated with this model is 6 − 5 = 1, resulting in an over-identified model.
To help illustrate the difficulty with solving for unknown values in over-identified models, consider the following set of equations with three known values (5, 4, and 18) and two unknown values (a and b):

$$
\begin{aligned}
a &= 5 \\
b &= 4 \\
ab &= 18
\end{aligned}
$$
As seen from this example, you would not be able to solve for values of a and b that would reproduce the data perfectly (i.e., ab = 5 × 4 = 20 ≠ 18). Consequently, a criterion is necessary to determine which estimates are optimal for the model. In ordinary least squares regression, for example, the sum of squared differences between observed and predicted Y values is minimized when solving for the intercept and slope values. A similar concept is implemented in SEM. The underlying principle in SEM analysis is to minimize the discrepancy between the elements in the sample covariance matrix and the corresponding elements in the covariance matrix that is implied (or reproduced) by the hypothesized model. More specifically, structural equation models are tested to determine how well they account for the variances and covariances among the observed variables.

Figure 16.4: Over-identified observed variable path model.
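As a rough illustration of such a criterion, the three equations a = 5, b = 4, and ab = 18 can be combined into a single unweighted least-squares function. This particular function is an assumption made here for simplicity and is not taken from the chapter:

$$
f(a, b) = (a - 5)^2 + (b - 4)^2 + (ab - 18)^2 .
$$

Taking the first two equations at face value gives

$$
f(5, 4) = 0 + 0 + (20 - 18)^2 = 4,
$$

whereas giving up a little on each equation does much better overall:

$$
f(4.8, 3.75) = (-0.2)^2 + (-0.25)^2 + (18 - 18)^2 = 0.1025 .
$$

The pair (4.8, 3.75) is not claimed to be the minimizer; it only shows that the criterion ranks candidate solutions when no exact solution exists.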
16.4.2 Estimation
Consider the basic equation used in structural equation procedures:

$$
\Sigma = \Sigma(\theta),
$$

where Σ is the population covariance matrix for p observed variables, θ is the vector containing model parameters, and Σ(θ) is the covariance matrix implied by the function of model parameters (θ) (Bollen, 1989). In applications of structural equation modeling, the population covariance matrix (Σ) is unknown and is estimated by the sample covariance matrix (S). The unknown model parameters (θ) are also estimated ($\hat{\theta}$) by minimizing a discrepancy function between the sample covariance matrix (S) and the implied covariance matrix [Σ(θ)]:

$$
F[S, \Sigma(\theta)].
$$

Substituting the estimates of the unknown model parameters in Σ(θ) results in the implied covariance matrix, $\hat{\Sigma} = \Sigma(\hat{\theta})$. The unknown model parameters are estimated to reduce the discrepancy between the implied covariance matrix and the sample covariance matrix. An indication of the discrepancy between the sample covariance matrix and the implied covariance matrix may be deduced from the residual matrix:

$$
(S - \hat{\Sigma}),
$$

with values closer to zero indicating better fit of the structural model to the data (Bollen, 1989).
Estimation of model parameters in SEM is an iterative process that begins with initial structural model parameter estimates, or starting values, which are either generated by the model fitting software package or provided by the user. Depending upon these values, the model fitting program will iterate through sequential cycles that calculate improved estimates. That is, the elements of the implied covariance matrix that are based on the parameter estimates from each iteration will become closer to the elements of the observed covariance matrix (Kline, 2011). The discrepancy function is fundamentally the sum of squared differences between respective elements in the sample covariance matrix (S) and the implied covariance matrix ($\hat{\Sigma}$) and will result in a single value.
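One simple function of this kind is the unweighted least squares (ULS) discrepancy, shown here only as an illustration of how a whole residual matrix collapses to a single number; it is not necessarily the function a given program uses by default. It is commonly written as

$$
F_{ULS} = \tfrac{1}{2}\,\mathrm{tr}\!\left\{ \left[ S - \Sigma(\theta) \right]^2 \right\}
        = \tfrac{1}{2} \sum_{i} \sum_{j} \left[ s_{ij} - \sigma_{ij}(\theta) \right]^2 ,
$$

where $s_{ij}$ and $\sigma_{ij}(\theta)$ denote the respective elements of S and Σ(θ). The second equality holds because the residual matrix is symmetric, so the trace of its square equals the sum of its squared elements.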

If the structural model is just-identified, the discrepancy function will equal zero because the elements in the sample covariance matrix will exactly equal the elements in the implied covariance matrix, $S = \hat{\Sigma}$ (Bollen, 1989). For instance, the residual matrix associated with the just-identified model in Figure 16.3 would be calculated as follows:

$$
S - \hat{\Sigma} =
\begin{bmatrix} 200 & 110 & 115 \\ 110 & 220 & 130 \\ 115 & 130 & 175 \end{bmatrix}
-
\begin{bmatrix} 200 & 110 & 115 \\ 110 & 220 & 130 \\ 115 & 130 & 175 \end{bmatrix}
=
\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
$$

For over-identified structural models, the sample covariance matrix will not equal the implied covariance matrix, but the estimation iterations will proceed until the change in the discrepancy function from one iteration to the next falls below a specified default value used in the software program (e.g., < .00005 in Mplus) or until the maximum number of iterations has been reached (Kline, 2011). For instance, after estimating parameters for the over-identified model in Figure 16.4, the residual matrix would be calculated as follows:

$$
S - \hat{\Sigma} =
\begin{bmatrix} 200 & 110 & 115 \\ 110 & 220 & 130 \\ 115 & 130 & 175 \end{bmatrix}
-
\begin{bmatrix} 200 & 110 & 65 \\ 110 & 220 & 130 \\ 65 & 130 & 175 \end{bmatrix}
=
\begin{bmatrix} 0 & 0 & 50 \\ 0 & 0 & 0 \\ 50 & 0 & 0 \end{bmatrix}
$$

Notice how all of the values in the implied covariance matrix ($\hat{\Sigma}$) equal their respective values in the sample covariance matrix (S) with the exception of the covariance between X1 and Y. Specifically, the relationship (or covariance) between X1 and Y was not fully explained by the hypothesized model as compared to the remaining relationships (or variances and covariances). This is a consequence of not modeling X1 as directly affecting Y in the model. Accordingly, the difference between respective elements will not equal zero for this relationship in the residual matrix.
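The implied covariance of 65 between X1 and Y can be reproduced by hand. The sketch below assumes that each path estimate equals the regression slope for its own equation and that each error variance is the leftover variance; under that assumption the result matches the matrices shown above. The function name `implied_covariance_chain` is hypothetical, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction

# Sample covariance matrix S from the text (order: X1, X2, Y).
S = [
    [Fraction(200), Fraction(110), Fraction(115)],
    [Fraction(110), Fraction(220), Fraction(130)],
    [Fraction(115), Fraction(130), Fraction(175)],
]


def implied_covariance_chain(S):
    """Implied covariance matrix for the chain X1 -> X2 -> Y.

    Assumes each path estimate equals the regression slope for its own
    equation and each error variance equals the leftover variance.
    """
    var_x1 = S[0][0]
    b1 = S[0][1] / S[0][0]               # direct effect X1 -> X2
    b2 = S[1][2] / S[1][1]               # direct effect X2 -> Y
    var_e1 = S[1][1] - b1 ** 2 * var_x1  # error variance for X2
    var_x2 = b1 ** 2 * var_x1 + var_e1
    var_e2 = S[2][2] - b2 ** 2 * var_x2  # error variance for Y
    cov_x1_x2 = b1 * var_x1
    cov_x2_y = b2 * var_x2
    cov_x1_y = b1 * b2 * var_x1          # only the indirect route X1 -> X2 -> Y
    var_y = b2 ** 2 * var_x2 + var_e2
    return [
        [var_x1, cov_x1_x2, cov_x1_y],
        [cov_x1_x2, var_x2, cov_x2_y],
        [cov_x1_y, cov_x2_y, var_y],
    ]


sigma_hat = implied_covariance_chain(S)
residual = [[S[i][j] - sigma_hat[i][j] for j in range(3)] for i in range(3)]
for row in residual:
    print([int(value) for value in row])
```

The only route from X1 to Y in this model runs through X2, so the implied covariance is the product of the two path estimates and the variance of X1, which falls short of the observed covariance of 115.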
SEM software programs have default settings for the maximum number of iterations allowed during the estimation process. When the number of iterations necessary to obtain parameter estimates exceeds the maximum number of iterations without reaching the specified minimum difference between the discrepancy function from one iteration to the next, the estimates have failed to converge on the parameters. That is, the estimation process failed to reach a solution for the parameter estimates. Nonconvergent solutions may provide unstable parameter estimates that should not be considered reliable. Nonconvergence may be corrected by increasing the maximum number of iterations allowed in the SEM software, changing the minimum difference stopping criterion (e.g., < .0005) between iterations, or providing start values that are closer to the eventual parameter estimates. If these corrections do not result in a convergent solution, other issues may need to be addressed, such as sample size and model complexity (Bollen, 1989).
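The interplay between start values, the stopping criterion, and the iteration limit can be sketched with a toy problem. The example below reuses the small system a = 5, b = 4, ab = 18 with an unweighted least-squares criterion and plain gradient descent; both choices, along with the step size and tolerance, are assumptions made for illustration and do not reflect the algorithms or defaults of any particular SEM program.

```python
def discrepancy(a, b):
    """Toy least-squares criterion for the system a = 5, b = 4, ab = 18."""
    return (a - 5) ** 2 + (b - 4) ** 2 + (a * b - 18) ** 2


def gradient(a, b):
    """Partial derivatives of the toy criterion with respect to a and b."""
    r = a * b - 18
    return 2 * (a - 5) + 2 * r * b, 2 * (b - 4) + 2 * r * a


def estimate(start, step=0.005, tol=1e-10, max_iter=1000):
    """Gradient descent that stops when the discrepancy changes by < tol.

    Returns (a, b, final discrepancy, iterations used, converged flag).
    """
    a, b = start
    previous = discrepancy(a, b)
    for iteration in range(1, max_iter + 1):
        grad_a, grad_b = gradient(a, b)
        a, b = a - step * grad_a, b - step * grad_b
        current = discrepancy(a, b)
        if abs(previous - current) < tol:
            return a, b, current, iteration, True
        previous = current
    return a, b, previous, max_iter, False


# Too few iterations: the routine stops without meeting the criterion.
print(estimate(start=(5.0, 4.0), max_iter=5))
# A larger iteration limit lets the same start values converge.
print(estimate(start=(5.0, 4.0), max_iter=20000))
```

The final element of each returned tuple plays the role of a convergence flag: when it is False, the reported values are simply where the routine happened to stop and should not be interpreted as estimates.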

Recall that the unknown model parameters (θ) are estimated ($\hat{\theta}$) by minimizing a discrepancy function. Different types of discrepancy functions may be used during the estimation process. Again, a discrepancy function, sometimes referred to as a loss or fit function, basically reflects the sum of squared differences between respective elements in the sample covariance matrix (S) and the implied covariance matrix ($\hat{\Sigma}$). However, the various estimators currently available implement different matrix weighting procedures while calculating these differences (see Bollen, 1989, and Lei & Wu, 2012, for more detailed explanations concerning estimation procedures).

The most widely employed discrepancy function in structural equation modeling, and usually the default discrepancy function in structural equation modeling software (e.g., LISREL, EQS, AMOS, and Mplus), is the maximum likelihood (ML) discrepancy function (see Ferron & Hess, 2007, for a detailed example using ML estimation). ML estimation is based on the assumption of multivariate normality among the observed variables and is often referred to as normal theory ML. The popularity of the ML discrepancy function is evident when considering the following properties of the estimator. Under small sample size conditions, ML estimators may be biased, although they are asymptotically unbiased. Thus, as sample size increases, the expected values of the ML estimates represent the true values in the population. The ML estimator is also consistent, meaning that as sample size approaches infinity, the probability that the estimate is close to the true value becomes larger (approaches 1.0). Another essential property of ML estimators is asymptotic efficiency. That is, the ML estimator has the lowest asymptotic variance among a class of consistent estimators. Further, the ML estimator is scale invariant in that the values of the ML discrepancy function will be the same for any change in the scale of the observed variables (Bollen, 1989).
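For reference, the normal theory ML discrepancy function is commonly written in the following form, using the symbols already defined in this section (p is the number of observed variables):

$$
F_{ML} = \ln\left| \Sigma(\theta) \right| + \mathrm{tr}\!\left[ S\,\Sigma^{-1}(\theta) \right] - \ln\left| S \right| - p .
$$

When the implied matrix reproduces the sample matrix exactly, Σ(θ) = S, the two log-determinant terms cancel and the trace term equals tr(I) = p, so the function equals zero. This agrees with the earlier statement that the discrepancy function is zero for a just-identified model.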
Another normal theory estimator is generalized least squares (GLS; for a review of GLS, see Bollen, 1989). When the assumption of multivariate normality is met, ML and GLS estimates are asymptotically equal. Thus, as sample size increases, the estimates produced by GLS are approximately equal to the estimates produced by ML. However, ML estimation has been shown to outperform GLS estimation under model misspecification conditions (Olsson, Foss, Troye, & Howell, 2000).
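The weighting idea behind GLS can be seen in its usual form, given here as a sketch in the notation of this section with the inverse of the sample covariance matrix serving as the weight:

$$
F_{GLS} = \tfrac{1}{2}\,\mathrm{tr}\!\left\{ \left( \left[ S - \Sigma(\theta) \right] S^{-1} \right)^2 \right\} .
$$

Each residual is scaled by $S^{-1}$ before being squared and summed, so the function still returns a single value and still equals zero when Σ(θ) = S.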
Under violations of the multivariate normality assumption, ML parameter estimates are generally robust and remain consistent (Beauducel & Herzberg, 2006; DiStefano, 2002; Dolan, 1994). However, skewed and kurtotic distributions may sometimes render an incorrect asymptotic covariance matrix of parameter estimates (Bollen, 1989). Further, increased levels of skewness (e.g., greater than 3.0) and/or kurtosis (e.g., greater than 8.0) largely invalidate the property of asymptotic efficiency associated with the estimated parameters, producing inaccurate model test statistics (Kline, 2011). Consequently, observed variables with nonnormal distributions may affect statistical significance tests of overall model fit as well as the consistency and efficiency of the estimated parameters.
Other discrepancy functions that produce asymptotically efficient estimators have been proposed that do not require multivariate normality among the observed variables. One of these discrepancy functions is the weighted least squares (WLS) function (Browne, 1984), also referred to as asymptotically distribution free (ADF) estimation (see Browne, 1984, and Muthén & Kaplan, 1985, for more information concerning WLS estimation). WLS was proposed as an efficient estimator for any arbitrary distribution of observed variables, including ordered categorical variables (Browne, 1984). Although WLS estimation has been shown to be efficient and more consistent than ML estimation under the presence of nonnormality among categorical variables (Muthén & Kaplan, 1985), the performance of WLS estimation in other studies has been questionable under certain conditions. For instance, model fit tests associated with WLS have been shown to reject the correct factor model too frequently, even under normal distributions at small sample sizes (Hu, Bentler, & Kano, 1992), and increasingly overestimate the expected value of the model fit test statistic as nonnormality and model misspecification increase (Curran, West, & Finch, 1996). Although WLS has demonstrated better efficiency under nonnormal distributions than ML (Chou, Bentler, & Satorra, 1991; Muthén & Kaplan, 1985), WLS efficiency is adversely affected under conditions of increasing nonnormality, small sample sizes, and large model size (Muthén & Kaplan, 1992). Thus, WLS estimation requires very large (and possibly inaccessible) sample sizes (approximately 2,500 to 5,000) for accurate model fit tests and parameter estimates (Finney & DiStefano, 2006; Hu et al., 1992; Loehlin, 2004). In addition, WLS estimation is more computationally intensive than other estimation procedures due to taking the inverse of a full weighting matrix, which increases in size as the number of observed variables increases (Loehlin, 2004).
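To give a sense of how quickly the full weight matrix grows, the sketch below counts its dimensions for several model sizes. It assumes the usual setup for covariance structures in which the weight matrix has one row and one column per nonredundant element of the covariance matrix; the function name `wls_weight_matrix_size` is hypothetical.

```python
def wls_weight_matrix_size(p: int) -> tuple[int, int]:
    """Return (order, distinct elements) of the full WLS weight matrix.

    The weight matrix is square with one row per nonredundant element
    of the covariance matrix, so its order is p* = p(p + 1) / 2.
    """
    p_star = p * (p + 1) // 2
    distinct = p_star * (p_star + 1) // 2  # the weight matrix is symmetric
    return p_star, distinct


for p in (3, 10, 20, 30):
    order, distinct = wls_weight_matrix_size(p)
    print(f"p = {p:2d}: weight matrix is {order} x {order}, "
          f"{distinct} distinct elements")
```

The order of the matrix grows roughly with the square of the number of observed variables, and the number of distinct elements to estimate grows roughly with its fourth power, which is consistent with the large sample sizes and heavy computation noted above.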
Robust WLS approaches were subsequently developed in order to correct for the difficulties inherent with full WLS estimation (see Muthén, du Toit, & Spisic, 1997, and Jöreskog & Sörbom, 1996, for more information concerning robust WLS). Generally, these approaches use a diagonal weight matrix instead of a full weight matrix. Robust WLS has been shown to outperform full WLS with respect to chi-square test and parameter estimate accuracy (Flora & Curran, 2004; Forero & Maydeu-Olivares, 2009).
As discussed earlier, model test statistics and the standard errors of the parameter estimates may become biased under increased conditions of nonnormality when using normal theory estimators, such as ML estimation (Hoogland & Boomsma, 1998; Hu et al., 1992). While nonnormal theory estimators (e.g., WLS) and their robust counterparts may be implemented, another alternative is to implement the Satorra and Bentler (1994) scaling correction that adjusts the model test statistic (i.e., a chi-square test statistic, χ²) to provide a chi-square test statistic that more closely approximates the chi-square distribution and adjusts the standard errors to be more robust when