Doing Quantitative Research 26E02900, 6 ECTS Lecture 6: Structural Equations Modeling Olli-Pekka Kauppila Daria Kautto Session VI, September 20 2017
Learning objectives
1. Get familiar with the basic idea of structural equation modeling (SEM)
2. Identify special characteristics of SEM
3. Develop an understanding of latent variable constructs
4. Understand the process of SEM
5. Learn to estimate and interpret the results of measurement (CFA) and structural models in SEM
Part I: Structural equation modeling (SEM) overview
Research question
What are the antecedents and consequences of emotional exhaustion among individuals who do people work?
[Figure: conceptual model with the constructs organizational commitment, role ambiguity, role conflict, emotional exhaustion, job satisfaction, performance, and intention to leave]
Babakus, E., Cravens, D. W., Johnston, M., & Moncrief, W. C. (1999). The Role of Emotional Exhaustion in Sales Force Attitude and Behavior Relationships. Journal of the Academy of Marketing Science, 27(1), 58-70.
How many regression analyses do you need to estimate this model?
[Figure: conceptual model with the constructs organizational commitment, role ambiguity, role conflict, emotional exhaustion, job satisfaction, performance, and intention to leave (Babakus et al., 1999)]
SEM can do that in one model.
What is SEM?
- A multivariate statistical technique that combines (confirmatory) factor analysis and multiple regression modeling
- Can simultaneously test measures and structural relationships for the purpose of analyzing hypothesized relationships
- Tests models that are conceptually derived a priori
- Tests whether the theory fits the data among latent (i.e. unobserved or theoretical) variables measured by manifest variables (i.e. observed or empirical indicators)
- SEM encompasses an entire family of models known by many names, e.g. covariance structure analysis, latent variable analysis, confirmatory factor analysis, LISREL analysis
Components of SEM
SEM typically consists of two parts (or sub-models):
- The measurement model
  - specifies how latent variables depend upon or are indicated by the observed variables
  - describes the measurement properties (reliabilities and validities) of the observed variables
- The structural equation model
  - specifies causal relationships among the latent variables
  - describes the causal effects
  - assigns the explained and unexplained variance
Why should we use SEM?
The key benefits of SEM are:
1. Estimation of multiple and interrelated dependence relationships via a series of separate, but interdependent, multiple regression equations
2. Ability to model both observed and unobserved (latent) variables and to account for measurement error in the estimation process (parameter estimates closer to population values)
3. Ability to define a model that explains the entire set of relationships simultaneously
Latent Variables
- A latent variable (often called a factor) is an unobserved concept that can only be approximated by observable or measurable variables (e.g. happiness, satisfaction, emotional exhaustion).
- The observed variables, which are gathered from respondents through various data collection methods, are known as indicators or manifest variables.
REFLECTIVE (factor → manifest variables/indicators)
- The latent variable is viewed as an underlying construct that gives rise to something that is observed (i.e. an observed variable).
- The latent variable is the presumed cause of the item values.
FORMATIVE (manifest variables/indicators → factor)
- The latent variable is viewed as a summary (weights of the relative importance) of the observed variables.
- Changes in the indicators cause change in the latent variable.
- The latent variable is a summary of the measurement.
Example: Which causal direction is more accurate?
Latent variable: Intrinsic job motivation
Indicators:
- The tasks that I do at work are enjoyable
- I really think that my job is meaningful
- The tasks that I do at work are themselves an important driving force to me
- My job is so interesting that it is a motivation in itself
Do the values of the indicators cause the latent variable (intrinsic job motivation), or does the latent variable cause the values of the indicators?
Measurement Error
- No matter how concrete we think our variables are, they always contain some error when we try to measure them.
- Measurement error is that proportion of the variable which our measure is unable to capture for various reasons (systematic or random).
- It is vital to consider the amount of error in our measurement, no matter how confident we are that we have got it right.
- However, in all other multivariate techniques we assume that the variables are measured without error.
- δ (delta): an error term associated with an estimated, measured x-variable
Measurement Error Impact & Correction
The impact of measurement error: $\beta_{yx} = \beta_s \cdot \rho_x$
- $\beta_{yx}$: observed regression coefficient
- $\beta_s$: true structural coefficient
- $\rho_x$: reliability of the predictor variable
Unless the reliability is 100%, the observed correlation (and resulting regression coefficient) will always understate the true relationship.
SEM accounts for, or corrects for, the amount of measurement error in the variables (latent constructs) and estimates what the relationship would be if there were no measurement error: $\beta_s = \beta_{yx} / \rho_x$
Due to this correction, SEM regression coefficients are more accurate (closer to the population value) and tend to be larger than coefficients obtained with multiple regression analysis.
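The attenuation formula on this slide can be tried out with a few lines of Python. This is only an illustrative sketch: the function names and the example values (a true coefficient of 0.50 and a reliability of 0.80) are assumptions made up for the example, not results from the lecture.

```python
def observed_coefficient(beta_true: float, reliability: float) -> float:
    """Attenuated coefficient: beta_yx = beta_s * rho_x."""
    return beta_true * reliability


def corrected_coefficient(beta_observed: float, reliability: float) -> float:
    """Disattenuated coefficient: beta_s = beta_yx / rho_x."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    return beta_observed / reliability


# Hypothetical values: true effect 0.50, predictor reliability 0.80
beta_obs = observed_coefficient(0.50, 0.80)       # understates the true effect
beta_corr = corrected_coefficient(beta_obs, 0.80)  # recovers the true effect
print(beta_obs, beta_corr)
```

With a perfectly reliable predictor (reliability 1.0) the two coefficients coincide, which is the case that ordinary regression implicitly assumes.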
Types of Relationships
1. Correlation
2. Dependence
[Figure: two path diagrams of role ambiguity and role conflict with indicators x1-x6 and error terms δ1-δ6; the dependence diagram also contains emotional exhaustion with indicators y1-y3 and error terms ε1-ε3]
Types of Variables
- Exogenous variables: role ambiguity, role conflict (indicators x1-x6)
- Endogenous variables (determined by constructs within the model): emotional exhaustion (indicators y1-y3), organizational commitment, job satisfaction, intention to leave (indicators y4-y12)
Theory in SEM Modeling
Importance of theory:
- An SEM model should not be developed without underlying theory. SEM analyses should be dictated first and foremost by a strong theoretical base.
- Theory implies consequences, some of which are tested against data. Refuting any consequence refutes the theory (i.e. SEM is primarily a confirmatory method).
Testing theory-based models:
- The model implies a pattern in the covariance matrix.
- Provided the multiple regression assumptions hold, we can compare the model-implied covariance matrix (i.e. the one based on the model we have developed) with the empirical covariance matrix (i.e. the one based on the collected data).
- If the difference between the covariance matrices is non-significant (χ² test), the hypothesized theoretical relationships are supported.
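The comparison of the model-implied and empirical covariance matrices can be sketched numerically. The slide does not say which estimator is used, so the code below assumes the common maximum-likelihood discrepancy function, F = ln|Σ| + tr(SΣ⁻¹) − ln|S| − p, and a test statistic of (N − 1)·F (some programs multiply by N instead). All function names and the 2×2 example matrices are hypothetical.

```python
import math


def det_and_inverse(m):
    """Gauss-Jordan elimination with partial pivoting: returns (determinant, inverse)."""
    n = len(m)
    # Augment the matrix with the identity matrix
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        p = a[col][col]
        det *= p
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [rv - f * cv for rv, cv in zip(a[r], a[col])]
    return det, [row[n:] for row in a]


def ml_discrepancy(S, Sigma):
    """ML discrepancy between empirical S and model-implied Sigma (assumed estimator)."""
    p = len(S)
    det_sigma, sigma_inv = det_and_inverse(Sigma)
    det_s, _ = det_and_inverse(S)
    trace = sum(S[i][k] * sigma_inv[k][i] for i in range(p) for k in range(p))
    return math.log(det_sigma) + trace - math.log(det_s) - p


def model_chi_square(S, Sigma, n_obs):
    """Chi-square statistic, assuming the (N - 1) * F convention."""
    return (n_obs - 1) * ml_discrepancy(S, Sigma)


# Hypothetical example: the data show a correlation of .50,
# but the model implies that the two variables are unrelated.
S_example = [[1.0, 0.5], [0.5, 1.0]]
Sigma_example = [[1.0, 0.0], [0.0, 1.0]]
print(model_chi_square(S_example, Sigma_example, n_obs=101))
```

When the model reproduces the data exactly (Σ equal to S), the discrepancy is zero; the larger the chi-square relative to its degrees of freedom, the worse the model reproduces the observed covariances.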
Additional Considerations
Causation evidence:
- Covariation: SEM can determine significant covariation between the cause and effect constructs
- Sequence: evidence of causation in the temporal sequence can be provided through an experimental or longitudinal research design
- Nonspurious covariation: the size and nature of the relationship between the cause and the effect should not be affected by including other constructs (variables) in the model (Support => Job satisfaction and Work environment)
- Theoretical support: a compelling theoretical rationale to support a cause-and-effect relationship
Input matrix:
- SEM differs from other multivariate techniques in that it uses only the variance-covariance or correlation matrix as its input data.
- Individual observations can be input into the programs, but they are converted into one of these two types of matrices before estimation.
- The focus of SEM is on the pattern of relationships across respondents.
Number of indicators:
- The minimum number of indicators for a construct is one, but the use of only a single indicator requires the researcher to provide estimates of reliability.
- A construct can be represented with two indicators, but three is the preferred minimum number of indicators because using only two indicators increases the chances of reaching an infeasible solution.
- There is no upper limit in terms of the number of indicators. In practice, 5-7 indicators should represent most constructs.
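To illustrate the point about the input matrix, the following sketch shows how individual observations could be converted into a variance-covariance matrix and then into a correlation matrix. It is a plain-Python illustration, not what any SEM program does internally; the function names and the tiny dataset are made up for the example.

```python
import math


def covariance_matrix(rows):
    """Sample variance-covariance matrix (n - 1 denominator) from raw observations."""
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two observations")
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)]
            for i in range(p)]


def correlation_matrix(cov):
    """Rescale a covariance matrix so that every variance equals 1."""
    p = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(p)]
            for i in range(p)]


# Hypothetical data: three respondents, two observed variables
data = [[1, 2], [2, 4], [3, 6]]
cov = covariance_matrix(data)
corr = correlation_matrix(cov)
print(cov)
print(corr)
```

Once the data are summarized this way, the individual respondents no longer matter for estimation; only the pattern of variances and covariances does.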
Part II: Confirmatory Factor Analysis (CFA)
What is CFA?
- Confirmatory factor analysis tests the extent to which a researcher's a priori, theoretical pattern of factor loadings on prespecified constructs represents the actual data (i.e. confirms or rejects our preconceived theory)
- The factors are assigned based on the researcher's prior theoretical knowledge (the statistical technique does not assign variables to factors as in Exploratory Factor Analysis)
- Each measured variable loads on only one pre-defined factor
- Cross-loadings are not assigned
- CFA provides information about the validities and reliabilities of the observed indicators
Steps in CFA
1. Developing a theoretically based model
2. Constructing a path diagram of causal relationships
3. Converting the path diagram into a measurement model
4. Choosing the input matrix type
5. Assessing the identification
6. Evaluating goodness-of-fit criteria
7. Interpreting and modifying the model (if theoretically justified)
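Step 5 (assessing the identification) usually starts with counting degrees of freedom: the number of unique variances and covariances among the indicators minus the number of freely estimated parameters. The sketch below does this count for a CFA under several assumptions that are not stated on the slide: every indicator loads on exactly one factor, one loading per factor is fixed to 1, error terms are uncorrelated, and all factors are allowed to covary. Non-negative degrees of freedom are a necessary condition for identification, not a sufficient one. The function name is hypothetical.

```python
def cfa_degrees_of_freedom(indicators_per_factor):
    """Degrees of freedom of a simple-structure CFA (see assumptions in the text)."""
    p = sum(indicators_per_factor)       # number of observed indicators
    k = len(indicators_per_factor)       # number of latent factors
    unique_moments = p * (p + 1) // 2    # variances and covariances in the input matrix
    free_loadings = p - k                # one loading per factor is fixed to 1
    error_variances = p
    factor_variances = k
    factor_covariances = k * (k - 1) // 2
    free_parameters = (free_loadings + error_variances
                       + factor_variances + factor_covariances)
    return unique_moments - free_parameters


print(cfa_degrees_of_freedom([3, 3, 3, 3, 3]))  # five factors, three indicators each
print(cfa_degrees_of_freedom([3]))              # one factor, three indicators
print(cfa_degrees_of_freedom([2]))              # one factor, two indicators
```

A single factor with three indicators has zero degrees of freedom (just-identified), while a single factor with only two indicators has a negative count, which is one way to see why three indicators is the preferred minimum.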
Assumptions of the path diagram
1. All causal relationships are indicated. Theory is the basis for inclusion or omission of any relationship. It is just as important to justify why a causal relationship does not exist between two constructs as it is to justify the existence of another relationship. Yet it is important to remember that the objective is to model the relationships among constructs with the smallest number of causal paths or correlations among constructs that can be theoretically justified (parsimony).
2. All causal relationships are assumed to be linear. Nonlinear relationships cannot be directly estimated in structural equation modeling, but modified structural models can approximate nonlinear relationships. The assumption of linearity of the relationships requires all other assumptions for multivariate analysis to hold true.
CFA Measurement Model Example
[Figure: measurement model with the latent variables role ambiguity, role conflict, emotional exhaustion, job satisfaction, and intention to leave; indicators x1-x6 with error terms δ1-δ6 and indicators y1-y9 with error terms ε1-ε9]
- x1-x3 correlate, but the correlation is zero once we identify a common cause of x1-x3, i.e. Role ambiguity
- Indicators are unidimensional; error terms should not be correlated
- First, we test the measures (we can fix them)
- Second, we test the theory between the constructs
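The idea that the indicators correlate only because they share a common cause can be made concrete by computing the covariance matrix that a one-factor model implies. In the sketch below each off-diagonal element is the product of two loadings and the factor variance, and the uncorrelated error variances appear only on the diagonal. The loadings and error variances are invented for illustration and are not estimates from the lecture data.

```python
def implied_covariance(loadings, factor_variance, error_variances):
    """Covariance matrix implied by a one-factor model with uncorrelated errors."""
    p = len(loadings)
    return [[loadings[i] * loadings[j] * factor_variance
             + (error_variances[i] if i == j else 0.0)
             for j in range(p)]
            for i in range(p)]


# Hypothetical standardized solution for three indicators of one latent variable
loadings = [0.8, 0.7, 0.6]
errors = [0.36, 0.51, 0.64]   # 1 - loading**2, so every indicator has variance 1
sigma = implied_covariance(loadings, 1.0, errors)
for row in sigma:
    print([round(v, 2) for v in row])
```

Every covariance among the indicators is produced by the loadings alone, so nothing is left to correlate once the latent variable is taken into account. Comparing such an implied matrix with the empirical one is the basis of the fit assessment.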
SIMPLIS translation of the Model
[Figure: measurement model with the latent variables role ambiguity, role conflict, emotional exhaustion, job satisfaction, and intention to leave; indicators x1-x6 with error terms δ1-δ6 and indicators y1-y9 with error terms ε1-ε9]
    ROLAM1 = 1*Rolam
    ROLCONF1 = 1*Rolconf
    EMEXH1 = 1*Emexh
    JOBSAT1 = 1*Jobsat
    LEAV1 = 1*Leav
    ROLAM2 = Rolam
    ROLAM3 = Rolam
    ROLCONF2 = Rolconf
    ROLCONF3 = Rolconf
    EMEXH2 = Emexh
    EMEXH3 = Emexh
    JOBSAT2 = Jobsat
    JOBSAT3 = Jobsat
    LEAV2 = Leav
    LEAV3 = Leav
e.g. Rolam = W1(ROLAM1) + W2(ROLAM2) + W3(ROLAM3)
Example: Measurement model
[Figure: measurement model with three latent variables: General self-efficacy (indicators x1-x6, error terms δ1-δ6), Tertius iungens orientation (indicators x1-x5, error terms δ1-δ5), and Affective commitment (indicators x1-x4, error terms δ1-δ4)]
And then, let's do it with Amos!
Opening Amos
- Open your dataset (Role clarity data) in SPSS
- Choose Analyze → IBM SPSS AMOS 22
Creating a new model (only if the work area isn't open already)
- From the menus, choose File → New
Specify a dataset
- From the menus, choose File → Data Files
- Choose the dataset, or click File Name to find another dataset, and then click OK
Drawing variables
Start by drawing an unobserved (latent) variable for GSE
- From the menus, choose Diagram → Draw unobserved
- Draw an oval in the drawing area
Next, draw five rectangles to represent the observed indicators for GSE
- From the menus, choose Diagram → Draw observed
- Draw five rectangles in the drawing area
Finally, draw five unobserved variables for the error terms of GSE's indicators
- From the menus, choose Diagram → Draw unobserved
- Draw five ovals in the drawing area
It should look like this:
[Figure: Amos drawing area with one large oval, five rectangles, and five small ovals]
Specifying the model
Name the variables
- Click a rectangle with the right mouse button → Object properties → in the variable name box, type @1A, then repeat for the others
- Using the same procedure, name the large oval GSEL (note that the name has to be different from the name in the dataset)
- Finally, name the small ovals, which indicate the error terms (e.g. e1a)
Draw arrows
- Choose a single-headed arrow and draw paths from GSE to the observed indicators
- Then, draw a path from each error term to its observed indicator
Constrain parameters
- Constrain one path from a latent variable to an observed indicator to 1
- Constrain all paths from error terms to observed indicators to 1
- Right-click the arrow that you wish to constrain → click the Parameters tab
- In the regression weight box, type 1
It should look something like this:
[Figure: Amos path diagram of GSE with named indicators, error terms, and constrained paths]
Add TIO and Affective commitment
Repeat the same procedures
To get this, or hopefully something prettier:
[Figure: Amos path diagram with GSE, TIO, and Affective commitment]
Add covariances between the latent variables
Select some additional outputs
Go to View → Analysis properties → Output
Select:
- Standardized estimates
- Squared multiple correlations (that is, R² values)
And finally, run the analysis
- Click Analyze → Calculate estimates
- To access the output, choose View → Text output
- Estimates and Model fit are of particular relevance for us
Model fit indices:
[Figure: Amos model fit output]
Evaluating goodness-of-fit criteria
In evaluating the measurement part of the model, focus on the relationships between the latent variables and their indicators (i.e. manifest variables). The evaluation of the measurement part of the model should precede the detailed evaluation of the structural part of the model.
The aim is to determine the validity and reliability of the measures used to represent the constructs of interest.
- Validity reflects the extent to which an indicator actually measures what it is supposed to measure
- Reliability refers to the consistency of measurement (i.e. the extent to which an indicator is free of random error)
Evaluating goodness-of-fit criteria: Reliability
Reliability of the indicators can be examined by looking at the squared multiple correlations (R²) of the indicators
- they show the proportion of variance in an indicator that is explained by its underlying latent variable
- the rest is due to measurement error
- a high squared multiple correlation denotes high reliability for the indicator
In addition to assessing the reliability of the individual indicators, it is possible to calculate a composite reliability value for each latent variable
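As a small illustration of indicator reliability: when an indicator loads on a single latent variable, its squared multiple correlation in a completely standardized solution equals the squared standardized loading, and the remainder is error variance. The sketch assumes exactly that situation; the function name and the loading of 0.80 are hypothetical.

```python
def indicator_reliability(standardized_loading: float):
    """Return (R-squared, error variance) for a single-factor standardized loading."""
    r_squared = standardized_loading ** 2
    return r_squared, 1.0 - r_squared


# Hypothetical standardized loading of 0.80
r2, error_share = indicator_reliability(0.80)
print(round(r2, 2), round(error_share, 2))
```

Here about two thirds of the indicator's variance is explained by the latent variable and about one third is attributed to measurement error.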
Evaluating goodness-of-fit criteria: Reliability (Cont'd)
Composite Reliability (CR)
- also known as construct reliability
- uses information on the indicator loadings and error variances from the Completely Standardized Solution
- with single-item measures, it is not possible to empirically estimate the reliability (it could be fixed at 1.0 = no error, or estimated by the researcher)
- the recommended threshold value is .60
- the reliability for the latent construct must be computed separately for each multiple-indicator construct in the model
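The slide does not spell out the formula, so the sketch below assumes the commonly used definition of composite reliability: the squared sum of the standardized loadings divided by that squared sum plus the sum of the error variances. If no error variances are supplied, they are taken as 1 minus the squared loading, which holds only for a completely standardized solution. The loadings are invented for illustration.

```python
def composite_reliability(loadings, error_variances=None):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    if error_variances is None:
        # Assumption: completely standardized solution, so error = 1 - loading^2
        error_variances = [1.0 - l ** 2 for l in loadings]
    squared_sum = sum(loadings) ** 2
    return squared_sum / (squared_sum + sum(error_variances))


# Hypothetical standardized loadings for one latent variable
cr = composite_reliability([0.8, 0.7, 0.6])
print(round(cr, 3))
print(cr >= 0.60)   # compare against the .60 threshold from the slide
```

The function has to be called once per latent variable, using only that construct's own indicators.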
Evaluating goodness-of-fit criteria: Convergent validity
Average Variance Extracted (AVE):
- Reflects the overall amount of variance in the indicators accounted for by the latent construct
- Higher variance extracted occurs when the indicators are truly representative of the latent construct
- AVE greater than .50 provides evidence of convergent validity of a latent variable
- AVE is quite similar to the CR measure but differs in that the standardized loadings are squared before summing them
- Both AVE and CR are used to assess construct validity
- If the AVE of a variable is higher than its squared correlations with other variables, it provides evidence of its discriminant validity
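A minimal sketch of the AVE calculation and of the discriminant validity comparison (often called the Fornell-Larcker criterion). It assumes standardized loadings, in which case the AVE reduces to the mean of the squared loadings. The function names, loadings, and the latent correlation of .60 are hypothetical.

```python
def average_variance_extracted(loadings):
    """AVE for standardized loadings: mean of the squared loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def has_discriminant_validity(ave, correlations_with_others):
    """True if the AVE exceeds every squared correlation with the other constructs."""
    return all(ave > r ** 2 for r in correlations_with_others)


# Hypothetical standardized loadings and latent correlation
ave = average_variance_extracted([0.8, 0.7, 0.6])
print(round(ave, 3))                           # compare against the .50 threshold
print(has_discriminant_validity(ave, [0.60]))  # AVE vs. squared correlation of .36
```

With these made-up numbers the construct falls slightly short of the .50 convergent validity threshold even though it passes the discriminant validity comparison, which shows that the two checks answer different questions.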
Evaluating goodness-of-fit criteria: Overall Model Fit
Goodness-of-fit measures reflect the correspondence of the actual or observed input (covariance or correlation) matrix with that predicted from the proposed model.
There are three types of goodness-of-fit measures:
- Absolute fit measures
- Incremental fit measures
- Parsimonious fit measures
Evaluating goodness-of-fit criteria: Overall Model Fit (Cont'd)
Absolute fit measures:
- Chi-square: provides a test of perfect fit in which the null hypothesis is that the model fits the population data perfectly (i.e. we want the chi-square test to be non-significant, p > .05)
- RMSEA (Root Mean Square Error of Approximation): focuses on the discrepancy between the predicted and observed covariance matrices per degree of freedom, thereby taking model complexity into account. It shows how well the model, with unknown but optimally chosen parameter values, would fit the population covariance matrix if it were available. Ideally below 0.05
- Goodness of Fit Index (GFI)
Incremental fit measures:
- Normed Fit Index (NFI)
- Non-normed Fit Index (NNFI)
- Comparative Fit Index (CFI)
Parsimonious fit measures:
- Adjusted Goodness of Fit Index (AGFI)
GFI, NFI, NNFI, CFI, and AGFI are all ideally above 0.90
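Several of these indices can be computed by hand from the chi-square values of the hypothesized model and of a baseline (independence) model. The sketch below uses the textbook formulas for RMSEA, NFI, NNFI, and CFI; note that some programs use N rather than N − 1 in the RMSEA denominator. GFI and AGFI are left out because they require the covariance matrices themselves. All input numbers are hypothetical and do not come from the lecture output.

```python
import math


def rmsea(chi2, df, n_obs):
    """Root Mean Square Error of Approximation (N - 1 convention assumed)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n_obs - 1)))


def nfi(chi2_model, chi2_baseline):
    """Normed Fit Index: relative chi-square improvement over the baseline model."""
    return (chi2_baseline - chi2_model) / chi2_baseline


def nnfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Non-normed Fit Index (also known as the Tucker-Lewis Index)."""
    ratio_baseline = chi2_baseline / df_baseline
    ratio_model = chi2_model / df_model
    return (ratio_baseline - ratio_model) / (ratio_baseline - 1.0)


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative Fit Index based on the non-centrality of both models."""
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, d_model)
    if d_baseline == 0.0:
        return 1.0
    return 1.0 - d_model / d_baseline


# Hypothetical values: model chi2 = 120 (df = 80), baseline chi2 = 1500 (df = 105), N = 300
fit = {
    "RMSEA": rmsea(120.0, 80, 300),
    "NFI": nfi(120.0, 1500.0),
    "NNFI": nnfi(120.0, 80, 1500.0, 105),
    "CFI": cfi(120.0, 80, 1500.0, 105),
}
for name, value in fit.items():
    print(name, round(value, 3))
```

With these made-up numbers the RMSEA falls below 0.05 and the incremental indices exceed 0.90, so such a model would meet the thresholds listed on the slide.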
Part III: Structural model in SEM
From measurement to structural model
[Figure: structural model in which role ambiguity and role conflict (indicators x1-x6, error terms δ1-δ6) are exogenous and emotional exhaustion, organizational commitment, job satisfaction, and intention to leave (indicators y1-y12, error terms ε1-ε12) are endogenous, with residual terms ζ1-ζ4]
- δn: measurement error in exogenous (independent) variables
- εn: measurement error in endogenous (dependent) variables
- ζn: residual error (unexplained variance) in the endogenous latent variables
Testing the hypotheses
[Figure: structural model of role ambiguity, role conflict, emotional exhaustion, organizational commitment, job satisfaction, and intention to leave with estimated path coefficients; values shown in the diagram: .514*, -.164*, .495*, -.223*, -.252*, -.322*, .392*]
Example: Structural model
[Figure: structural model with three latent variables: General self-efficacy (indicators x1-x6, error terms δ1-δ6), Tertius iungens orientation (indicators x1-x5, error terms δ1-δ5), and Affective commitment (indicators x1-x4, error terms δ1-δ4)]
And then, let's try it with Amos!
- Use the measurement model as the baseline
- Replace double-headed arrows with single-headed arrows to indicate the direction of causality (retain double-headed arrows between exogenous variables)
- Add residual error variables (ζ) to endogenous variables and constrain their parameters to 1
- Run the analysis as we did before: Analyze → Calculate estimates
Structural model input in Amos
[Figure: Amos path diagram of the structural model]
Output
View → Text output → Estimates
[Figure: Amos estimates output, with one value marked as non-significant]
All other values are positive (see the estimates) and significant (see the p-values)
Results: Structural model
[Figure: estimated structural model with three latent variables: General self-efficacy (indicators x1-x6, error terms δ1-δ6), Tertius iungens orientation (indicators x1-x5, error terms δ1-δ5), and Affective commitment (indicators x1-x4, error terms δ1-δ4)]
Fit indices for the structural model
The same indices apply as for a measurement model:
- Chi-square, p-value
- RMSEA
- GFI and the incremental indices (NFI, NNFI, CFI)
- Parsimony indices (AGFI)
What did we learn today?
1. We got familiar with the basic idea of structural equation modeling (SEM)
2. We identified special characteristics of SEM compared to other statistical modeling methods known to us
3. We developed an understanding of what latent variables are
4. We developed an understanding of the process of SEM
5. We learned to estimate and interpret the results of measurement (CFA) and structural models in SEM