DIFFERENTIAL EFFECTS OF OMITTING FORMATIVE INDICATORS: A COMPARISON OF TECHNIQUES


Completed Research Paper

Miguel I. Aguirre-Urreta
DePaul University
1 E. Jackson Blvd., Chicago, IL, USA
maguirr6@depaul.edu

George M. Marakas
Florida International University
S.W. 8th Street RB250, Miami, FL, USA
gmarakas@fiu.edu

Abstract

Research examining the formative specification of constructs has highlighted the need for researchers to capture all relevant causes of a construct of interest. However, the consequences of omitting a formative indicator have not been thoroughly examined. Given that one of the commonly employed techniques for modeling formatively specified constructs, Partial Least Squares, implicitly assumes that all relevant causes of a construct have been modeled, the consequences of omitting one of those are of prime importance. In this research we compare latent variable and PLS techniques on this issue based on theoretical arguments and results from Monte Carlo simulations. In particular, we focus on the presence or absence of estimation bias in the relationships between formative indicators and the formatively specified construct, and between the latter and other constructs in the research model. Our results highlight differences in how these two techniques cope with the omission of formative indicators, and discuss why those differences occur.

Keywords: partial least squares, formative specification, latent variables, simulation, omitted variable bias

Introduction

A significant amount of attention has been given to issues of instrument validity in the past, including content, construct, internal and statistical conclusion validity, as well as reliability (Straub 1989). Only recently, however, have researchers begun to focus on the underlying relationship between constructs and their empirical indicators, prompted by the seminal work of Diamantopoulos and Winklhofer (2001), although much of the theoretical development dates from earlier (Bollen 1984; Bollen and Lennox 1991; Cohen et al. 1990; Curtis and Jackson 1962). This relationship can be either reflective or formative. While the former is quite well understood, recent efforts have been made to better understand the alternative, formative specification, and its implications for research. Exemplars of this work include Jarvis, Mackenzie and Podsakoff (2003), Mackenzie, Podsakoff and Jarvis (2005), and, recently in the information systems literature, Petter, Straub and Rai (2007) and Marakas, Johnson and Clay (2007).

While most of the focus in the theory building and testing process is generally placed on the substantive relationships between constructs of interest, more than forty years ago Costner (1969) noted the need to include what he termed auxiliary theories, those relating abstract dimensions and their empirical indicators, as an integral part of scientific theories. In addition, he argued that these should be treated as any other theoretical proposition. Empirical testing of auxiliary theories, then, would serve to tentatively establish the adequacy of particular sets of indicators for testing the implications of their respective abstract formulations. In more modern terms, researchers should establish whether validity is adequate, although the original emphasis was solely on the issue of measurement error and its implications for theory testing. Having found that the indicators are adequate for this purpose, only then should researchers attempt to ascertain whether the relationships between constructs are themselves tenable. This logic is consistent with the work of Straub (1989) on instrument validity.

To better establish the extent to which formatively specified constructs are actively being employed in mainstream IS research, as well as which statistical techniques are employed in the estimation of research models involving them, journal issues of MIS Quarterly, Information Systems Research, the Journal of Management Information Systems, and the Journal of the Association for Information Systems for the period January 1998 through December 2011 were examined [1]. Forty-six empirical studies that included one or more formatively specified first-order constructs were found. Although the research included in this review represents a relatively small fraction of all research published in these outlets during this period, a large proportion of these studies (38 out of 46, or 83%) have appeared since 2006 alone, which we believe reflects researchers' newfound interest in this topic.
Examples of formatively specified constructs include Virtual Copresence (a subjective feeling of being together with others in a virtual environment; sample items: "I find that people respond to my posts quickly," "I am usually aware of who are logged on online") (Ma and Agarwal 2007) and Technology Interaction (IT interactions undertaken with the purpose of accomplishing an individual or organizational task; sample items: "I use this system (or application) to solve various problems," "I use this system (or application) to justify my decisions") (Barki et al. 2007). Defining for what reasons and under which circumstances researchers would want to specify the relationship between constructs and their indicators as formative or reflective is beyond the scope of this work, and others (Jarvis et al. 2003; MacKenzie et al. 2005; Petter et al. 2007) have provided quite extensive treatments of the issue. That said, it must be understood that once a decision has been made to specify a construct as formative, the researcher must choose between one of two families of statistical techniques, component- or covariance-based SEM, for the subsequent data analysis and model estimation.

As expected, PLS was the most popular procedure used for this purpose in the reviewed research (37 out of 46 studies, or 80%), with only five studies using a latent variable technique (LISREL or AMOS) to analyze the research model, and another four studies employing OLS regression. Further examination of these studies indicates that IS researchers testing models which include formatively specified constructs operate largely on two main assumptions: (a) covariance-based techniques (of which LISREL is an example) cannot handle models postulating first-order formative relationships (Choudhury & Karahanna, 2008; Liang, Saraf, Hu, & Xue, 2007; Limayem, Hirt, & Cheung, 2007; Ma & Agarwal, 2007), and (b) PLS is a viable alternative for doing so (Chin, Marcolin, & Newsted, 2003; Gefen et al., 2000; Petter et al., 2007).

[1] This review of extant research was confined only to first-order formative constructs, which are the main focus of interest in this paper. The list of articles is not included here due to space limitations but is available from the first author upon request.

However, despite its widespread use in this regard, an extensive review of the literature on these methods and associated simulations was unable to uncover any in-depth examination of the ability of either alternative to analyze research models including formatively specified constructs when some of those formative indicators are omitted, what the effects of these omissions are on other parameters in the model, and whether those effects vary by technique. In general, most literature on the specification of formative constructs has been limited to latent variable techniques, such as LISREL, whereas PLS remains underexamined in this area, despite being the most popular alternative for modeling these scenarios. Given that most research in this area highlights the need for researchers to specify all causes of a formatively specified construct (e.g., Bollen and Lennox, 1991), and that PLS implicitly assumes all such causes have been included in the model by virtue of not including a residual disturbance term that captures omitted causes, the consequences of violating this assumption seem worthy of detailed examination.

In the rest of this article we first discuss the specific research models employed here to ground our discussion of these issues. Then, the specification and analysis of models containing a formatively specified construct are examined from the perspective of latent variable techniques. Problems resulting from the omission of a relevant formative indicator are discussed conceptually and then validated with Monte Carlo simulations. Next, we examine the same issues as they relate to PLS analyses, and further extend the concept of reliability as shared variance between a composite and the latent variable it represents to include components representing formatively specified constructs. We conclude our work with a comparison of our results for each technique, and some of the limitations of this research.

Research Model and Simulation Parameters

Throughout this research we employ a set of models to ground our discussion of the many issues associated with formatively specified constructs, omitted indicators, and the choice of statistical analysis technique. These models also provide the population parameters used in our simulations. Given that these apply to both techniques under examination here, it is worth briefly reviewing them at this time. The basic structure of the model is shown in Figure 1.

Figure 1. Population Model

This model has been adapted from previous work investigating related issues by Jarvis, Mackenzie and Podsakoff (2003); see also Petter, Straub and Rai (2007) and Aguirre-Urreta and Marakas (2012). Population covariance matrices for all our models are available from the first author upon request. All values shown in Figure 1 for the parameters of interest are expressed in standardized metric. The model shown in Figure 1 has been modified from earlier uses to allow us to examine the effects of omitted formative indicators of varying importance. In particular, modifications have been made to ensure that the variance of the formatively specified latent variable remains constant across all three scenarios (i.e., Models A, B, and C), whereas the paths from the formative indicators to the latent variable vary to accomplish this, given that each model features different correlation strengths amongst these indicators. In all three cases the path coefficients from the formative indicators have been set to represent varying degrees of relative importance, as follows: the path from x1 is four times as large as the path from x4, that from x2 three times as large, and that from x3 two times as large. In all three models the residual variance of the formatively specified variable has been set at five percent of the total variance. We take this residual variance to represent the random shocks (minor, unstable influences on a variable) discussed by James, Mulaik and Brett (1983) (see our discussion below when dealing with identification issues). All variables in the model, both latent and manifest, follow a multivariate normal distribution.

Data were simulated and subject to analysis with both of the statistical techniques under examination. All data generation was performed with EQS 6.1 (Bentler and Wu 1995). Statistical analyses within the latent variable framework were also conducted with EQS 6.1, and those for PLS with PLS-Graph 3.0. All analyses were conducted on standardized estimates, which are directly comparable across techniques and with those included in the models shown in Figure 1. Aside from the omission of a specific formative indicator, all other aspects of the models employed in the analyses were correctly specified. Data were generated for each different model (Models A, B, and C, varying in the strength of correlation amongst formative indicators) for N = 200, N = 350 and N = 500, with one thousand replications in each condition. Within each one of these scenarios, the datasets were alternatively analyzed including all formative indicators, or missing one formative indicator at a time. Simulated data for all conditions are available from the first author upon request.
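To make this setup concrete, the sketch below builds a set of formative paths in the 4:3:2:1 ratio described above, scales them so that the latent variable has unit variance with a 5% residual, and generates one replication of raw data. It is only an illustration under assumed values: the inter-indicator correlation of 0.30, the downstream structural path, and the reflective loadings are not the population parameters of Figure 1, and the study's actual data generation was carried out in EQS.

```python
import numpy as np

rng = np.random.default_rng(42)

def formative_paths(r, ratio=(4.0, 3.0, 2.0, 1.0), residual_var=0.05):
    """Scale the 4:3:2:1 ratio so that Var(eta) = gamma' R gamma + residual_var = 1."""
    ratio = np.asarray(ratio)
    R = np.full((4, 4), r) + (1.0 - r) * np.eye(4)   # equicorrelated formative indicators
    k = np.sqrt((1.0 - residual_var) / (ratio @ R @ ratio))
    return k * ratio, R

def simulate(n, r):
    """One replication: formative indicators, the latent variable, and two
    reflective indicators of one downstream construct (illustrative values)."""
    gamma, R = formative_paths(r)
    x = rng.multivariate_normal(np.zeros(4), R, size=n)        # x1..x4
    eta = x @ gamma + rng.normal(scale=np.sqrt(0.05), size=n)  # 5% random-shock residual
    eta3 = 0.5 * eta + rng.normal(scale=np.sqrt(0.75), size=n) # assumed structural path of 0.5
    y = np.column_stack([0.8 * eta3 + rng.normal(scale=0.6, size=n) for _ in range(2)])
    return x, eta, y

gamma, _ = formative_paths(r=0.30)
print("formative paths for x1..x4:", np.round(gamma, 3))
x, eta, y = simulate(n=500, r=0.30)
print("sample Var(eta):", round(eta.var(ddof=1), 3))           # close to 1 by construction
```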
Formative Modeling with Latent Variables

The first of the two alternative approaches for modeling formatively specified constructs discussed here is structural equation modeling with latent variables (SEM-LV), also commonly referred to as covariance-based structural equation modeling. Estimation of both measurement and structural parameters included in the model is achieved by iterative minimization of a fit function that compares the observed covariance matrix with that implied by the research model. Maximum likelihood (ML) is the most commonly employed estimator, although others are available. When ML is employed, the following assumptions are made: sample observations are independent and identically distributed following a multivariate normal distribution, the hypothesized model is approximately correct, a sample covariance matrix is analyzed, and the sample size is large (Boomsma and Hoogland 2000).

The research model employed here and shown in Figure 1 has been identified as follows. In all reflectively specified latent variables the loading of the first indicator (that is, the loadings for y1, y5, y9 and y13) has been fixed to one. For the identification of the formatively specified latent variable, three alternatives were considered. First, the disturbance term for the formatively specified construct could be set to zero. This is equivalent to assuming that the latent variable is perfectly determined by the formative indicators. This appears undesirable for a number of reasons. First, the definition of a formatively specified latent variable as (Bollen 2007; Bollen and Lennox 1991):

η = γ1 x1 + γ2 x2 + ... + γq xq + ζ

recognizes that there is something more to the latent variable of interest than what is captured by its set of formative indicators. Fixing the disturbance term to zero for identification purposes would transform the latent variable into a weighted composite of its formative indicators, which creates the additional conundrum of positing that a set of manifest variables cause their own weighted sum. Bollen and Davis (2009) take a similar position.

Second, seminal work in causal analysis (James et al. 1983) (see, in particular, their discussion of the self-containment of functional equations) has defined the variance of a variable as the aggregate of a set of stable, non-minor and direct causes that are related to each other (i.e., correlated amongst themselves), if any; a set of stable, non-minor and direct causes that are uncorrelated with each other, if any; and random shocks, which are minor and unstable causes of a variable. In order to obtain unbiased estimates of path coefficients (see our discussion of omitted variable bias below), only the first set needs to be completely specified in the functional equation for a latent variable, as the disturbance term will capture any unmeasured but uncorrelated causes, and any random shocks that affect the latent variable at any given time. Fixing the residual variance to zero ignores the fact that, even if all relevant causes of a latent variable are accounted for, there is still some portion of the variance that is due to minor variations (i.e., random shocks, as put by James et al., 1983). Third, fixing the residual term to zero deprives researchers of an important diagnostic tool that can be used to assess the extent to which all causes of the formatively specified latent variable have indeed been incorporated in the model (Diamantopoulos 2006). Finally, recognizing that present knowledge about these relationships is generally incomplete, one would be hard pressed to provide complete and unequivocal assurance that all possible causes of a latent variable have been included. As noted by James (1980, p. 415): "The operative question is not whether one has an unmeasured variables problem but rather the degree to which the unavoidable unmeasured variables problem biases the estimates of path coefficients and provides a basis for alternative explanations of results" (see below for a more comprehensive discussion of the omitted variables problem). Given these considerations, we cannot recommend this approach to the identification of formatively specified latent variables, though at first sight it would appear to be the simplest alternative.

The two other approaches considered, which are also discussed by Bollen and Davis (2009), require constraining a path coefficient to a non-zero value. This could be done either by fixing one of the path coefficients emitted by the formatively specified latent variable to another latent variable to one, or by fixing the path from a formative indicator to the formatively specified latent variable to one. Either of these alternatives will result in identical model fit, since they are merely setting the scale of the formatively specified latent variable; whereas unstandardized coefficients vary depending on the scaling approach and the particular non-zero value chosen for identification, standardized coefficients will be equal across both approaches. The downside of either of these approaches is that standard errors cannot be estimated for the parameter on which the constraint is placed. This occurs as well when constraining a loading to a non-zero value in order to identify a reflectively specified variable, but in those cases the significance of a particular loading is likely not of theoretical importance, and assessing its magnitude, as well as the magnitude and significance of other items loading on the same variable, is generally deemed sufficient to establish its validity.
In the case of formatively specified latent variables, however, the significance of either of these paths (from one latent variable to another, or from a formative indicator to its latent variable) is surely of interest. Given that researchers are generally more interested in assessing parameters relating latent variables to each other, we believe they would be better served by fixing the path from a formative indicator to its latent variable for the purpose of establishing identification [2]. In all our analyses and simulations, the formatively specified construct in the model shown in Figure 1 was identified by constraining the path from x1 to one when x1 was included in the model, and the path from x2 in those cases where x1 was omitted.

[2] When the path from a formative indicator to the latent variable is fixed, the standardized solution still reflects the relative contribution of all formative indicators, including the one with the fixed path, to the latent variable. However, the significance of a fixed path cannot be ascertained. Researchers should choose which paths to fix for identification purposes based on the goals of their research. In this case we have chosen to focus our attention on the structural paths relating latent variables to each other.
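To illustrate why the two scaling choices are equivalent, the sketch below computes the covariance matrix implied by a small formative model under arbitrary assumed values (these are not the parameters of Figure 1) and shows that rescaling the latent variable so that either a formative path or an emitted path equals one leaves the implied covariances, and therefore model fit, unchanged.

```python
import numpy as np

def implied_cov(gamma, Phi, lam, theta, psi):
    """Implied covariance of (x1, x2, y1, y2) for eta = gamma'x + zeta, y_j = lam_j*eta + eps_j."""
    var_eta = gamma @ Phi @ gamma + psi
    Sxy = np.outer(Phi @ gamma, lam)                  # Cov(x_i, y_j) = lam_j * (Phi gamma)_i
    Syy = np.outer(lam, lam) * var_eta + np.diag(theta)
    return np.block([[Phi, Sxy], [Sxy.T, Syy]])

# Arbitrary illustrative population values (not the paper's).
gamma = np.array([0.6, 0.4]); Phi = np.array([[1.0, 0.3], [0.3, 1.0]])
lam = np.array([0.8, 0.7]);   theta = np.array([0.36, 0.51]); psi = 0.05

# Scaling 1: fix the path from x1 at one. Scaling 2: fix the first emitted path at one.
s_formative = implied_cov(gamma / gamma[0], Phi, lam * gamma[0], theta, psi / gamma[0] ** 2)
s_emitted   = implied_cov(gamma * lam[0], Phi, lam / lam[0], theta, psi * lam[0] ** 2)

print(np.allclose(s_formative, s_emitted))            # True: same implied covariances, same fit
```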

The Problem of Omitted Variables and its Consequences

When analyzing models using SEM-LV, the coefficient linking the formative indicators to their respective latent variable takes the form of a regression estimate. The omission of one of those indicators from the model therefore represents a special case of a more general issue of great importance for empirical research: that of omitted (or left-out, or unmeasured) variables. The issue has been examined in great detail by James (1980), James, Mulaik and Brett (1983), Hosman, Hansen and Holland (2010), Mauro (1990), Cellini (2008) and Meade, Behrend and Lance (2009), to name a few. Here we provide a summary of the problem and how it is expected to affect research models with omitted formative indicators.

The problem occurs when a research model omits a variable that (a) has a substantial effect on the dependent variable of interest, (b) is correlated with another predictor that is included in the model, and (c) makes a unique contribution to the prediction of the dependent variable, that is, is not itself linearly dependent on other predictors included in the model (James et al. 1983; Mauro 1990). When such a variable exists in the population but is not included in the model tested by a researcher, bias will occur in the regression coefficients of the predictors that are included in the model. Such bias occurs because the omission of a relevant predictor results in covariation between predictors that are explicitly included in the model and the disturbance term. More precisely, it violates the requirement that the functional equations representing the causal model be self-contained (James et al. 1983).

A simple example helps illustrate the problem (the principle is the same for more complex models, but the demonstration is more involved; see Mauro, 1990, for an example with three predictors). Consider the following linear model:

y = b1 x1 + b2 x2 + e

where y is the dependent variable, x1 and x2 are the only two predictors, correlated at r12, and b1 and b2 are standardized regression coefficients. The closed-form solutions for the coefficients are (where ry1 is the correlation of x1 and y, and ry2 is the correlation between x2 and y):

b1 = (ry1 − r12 ry2) / (1 − r12²)   and   b2 = (ry2 − r12 ry1) / (1 − r12²)

If a researcher omits one of the predictors, say x2, then the formula above for b1 simplifies to b1 = ry1; that is, the standardized regression coefficient equals its correlation with the dependent variable. If the two predictors were not correlated, which is one of the required conditions for bias to occur, the standardized coefficient b1 would always equal its correlation with the dependent variable (as r12 would equal zero in the formula above). If the two predictors are correlated, however, this is no longer the case, as:

ry1 = b1 + b2 r12

so that the coefficient obtained when x2 is omitted differs from its population value by b2 r12. Both the direction and degree of bias will vary as a function of the sign and magnitude of the correlations involved. For models involving multiple predictor variables, the net effect of omitting one or more of them on the coefficients of the included variables is more complex to determine, given the multiple correlations, in both strength and sign, that play a role. However, unless all correlations involved are very small or close to zero (and leaving aside suppressor effects), omitting a relevant predictor variable will cause some degree of bias in the remaining ones in the model. See also Cenfetelli and Bassellier (2009) for some discussion of the possibility of suppressor effects within formatively specified latent variables.
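The bias term can be verified directly from the closed-form expressions above; the correlations in the snippet below are arbitrary illustrative values, not parameters from the study.

```python
# Closed-form illustration of omitted variable bias with two standardized predictors.
r12, b1, b2 = 0.40, 0.50, 0.30
ry1 = b1 + b2 * r12                           # implied correlation of y with x1
ry2 = b2 + b1 * r12                           # implied correlation of y with x2

b1_full = (ry1 - r12 * ry2) / (1 - r12 ** 2)  # both predictors included
b1_omitted = ry1                              # x2 omitted

print(round(b1_full, 3))     # 0.5: recovers the population value
print(round(b1_omitted, 3))  # 0.62: biased upward by exactly b2 * r12 = 0.12
```
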
First, as expected from our discussion of the omitted variables problem above, the omission of a relevant formative indicator results in a significant upward bias in the path coefficients relating the included formative indicators to the latent variable, as compared to the population values obtained when the full model is estimated. The magnitude of this bias is itself a function of three different variables: the relative importance of the missing indicator, the strength of the correlation between indicators, and the relative importance of each of the included indicators for which bias occurs. First, bias increases as the strength of the correlation between formative indicators increases. Second, the degree of bias is a function of the relative importance of the missing indicator: when that indicator contributes the most toward the variance of the latent variable (e.g., x1 in these models), bias will be higher than when the missing indicator is of more limited importance to the definition of the latent variable. Finally, relatively less important indicators, that is, those with smaller path coefficients in the population model, will exhibit more bias than relatively more important ones. For example, in one extreme case (Table 1, estimated path for x4 when x1 is missing from the model), the resulting estimate is 90% larger than its population counterpart.

The second major result of importance is that the omission of formative indicators from the model has no consequence for the estimates of the structural parameters of interest (i.e., those path coefficients that relate latent variables to each other). As can be seen in Table 1, all structural coefficients are identical to their population values, regardless of which indicators are missing and how strongly those are correlated amongst themselves. As a result, though most discussions of the formative specification of latent variables emphasize the need to obtain a complete set of formative indicators (e.g., a census, according to Bollen and Lennox, 1991), our results indicate that violating this requirement will not create bias in the path coefficients between latent variables [3].

[3] In order to validate this result we ran various other analyses including only one or two formative indicators in the analyzed models. In all cases the standardized structural coefficients were identical to their population values.

To see how this is the case, consider the following simple MIMIC example (which abstracts away the complexities associated with other latent variables in the model, but in no way affects our results) with two formative indicators and two reflective indicators [4]. The implied population covariance matrix is shown in Figure 2 below, with a graphical representation of this example shown in Figure 3. In this model there are six equations (for the variances of the two reflective indicators and the four covariances among the reflective and formative indicators) but only four unknowns [5] (the two loadings and the two regression paths from the formative indicators), which means that the system of equations is overidentified and that two of the covariances can be expressed as a function of the others. In this sense, one of the formative indicators is redundant, and the model could still be solved if only one of them were present; indeed, missing a formative indicator in this example would result in an equation system with three equations (for the variances of the two reflective indicators and their covariance) and three unknowns (the two loadings and the regression path from the single remaining indicator), which can be solved. Another way to consider this is that the proportionality constraints imposed in the model imply redundancy between the formative indicators.

Figure 2. Population Covariance Matrix for the MIMIC Example (symmetric upper triangle omitted). Its non-redundant elements are:

Var(y1) = λ1² Var(η) + Var(ε1)
Var(y2) = λ2² Var(η) + Var(ε2)
Cov(y1, y2) = λ1 λ2 Var(η)
Cov(x1, y1) = λ1 [γ1 Var(x1) + γ2 φ]
Cov(x2, y1) = λ1 [γ2 Var(x2) + γ1 φ]
Cov(x1, y2) = λ2 [γ1 Var(x1) + γ2 φ]
Cov(x2, y2) = λ2 [γ2 Var(x2) + γ1 φ]
Cov(x1, x2) = φ
where Var(η) = γ1² Var(x1) + γ2² Var(x2) + 2 γ1 γ2 φ + Var(ζ)

Figure 3. MIMIC Example Graphical Representation: x1 and x2, correlated at φ, determine η through γ1 and γ2 (with disturbance ζ); η is reflected by y1 and y2 through λ1 and λ2, with measurement errors ε1 and ε2.

[4] We thank Mikko Rönkkö for this insight.
[5] Setting the scale of the latent variable with some of the available identification approaches removes one unknown. In this case we have omitted discussion of the residual for the latent variable.
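The redundancy argument can be checked numerically. The sketch below builds the implied covariance matrix for the MIMIC example under assumed parameter values (not the paper's), drops x2, and shows that the covariances involving x1 alone still pin down the ratio of the two loadings and how Cov(y1, y2) is partitioned between them.

```python
import numpy as np

def implied_cov(gamma, Phi, lam, theta, psi):
    var_eta = gamma @ Phi @ gamma + psi
    Sxy = np.outer(Phi @ gamma, lam)
    return np.block([[Phi, Sxy], [Sxy.T, np.outer(lam, lam) * var_eta + np.diag(theta)]])

gamma = np.array([0.6, 0.4]); Phi = np.array([[1.0, 0.3], [0.3, 1.0]])
lam = np.array([0.8, 0.7]);   theta = np.array([0.36, 0.51]); psi = 0.05
sigma = implied_cov(gamma, Phi, lam, theta, psi)        # order: x1, x2, y1, y2

# Use only the moments that remain when x2 is dropped.
c_x1y1, c_x1y2, c_y1y2 = sigma[0, 2], sigma[0, 3], sigma[2, 3]

ratio = c_x1y2 / c_x1y1                 # equals lam2 / lam1
lam1_sq_var_eta = c_y1y2 / ratio        # equals lam1^2 * Var(eta): the partition is solved

var_eta = gamma @ Phi @ gamma + psi
print(round(ratio, 4), round(lam[1] / lam[0], 4))                  # identical
print(round(lam1_sq_var_eta, 4), round(lam[0] ** 2 * var_eta, 4))  # identical
```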

Our analysis shows that, in a simple MIMIC model, the formative indicators are redundant when estimating the path coefficients from the latent variable toward the indicators. The next question is how well this analysis generalizes to more complex models. Adding more formative indicators makes the equations more complex, but it does not change the basic principle: the covariance between the two reflective indicators determines the product of the two paths λ1 and λ2, and one formative indicator is sufficient to solve how the covariance between the reflective indicators is partitioned between the two parameters. Similarly, replacing the reflective indicators with two latent variables maintains the basic principle. The only difference is that we would not be using an observed covariance between two reflective indicators, but rather a model-implied covariance between two latent variables, when solving for the values of the paths emitted by the formatively measured latent variable. However, our results hold here as well, as the issue is not the nature of the dependent variables, but the fact that there are more equations than unknowns when solving the system of equations implied by the models examined here. Our analysis indicates that, for a correctly specified formative model, omitting a formative indicator in latent variable estimation is not as severe as earlier thought. As is the case in reflective measurement models, where multiple indicators of the same latent variable add a layer of redundancy to the model-implied equations, formative indicators exhibit similar characteristics in this regard.

Given the likely controversial nature of these results, we also conducted a population analysis to validate our Monte Carlo simulations. Our approach here is similar to that for conducting power analyses developed by Satorra and Saris (1985). In this approach, the covariance matrix implied by the known population model (derived from Figure 1) is subjected to analysis using alternative models which contain some misspecified component. In this case, the misspecification is the omission of a formative indicator from the alternative model. While some empirical issues cannot be examined with this approach (for example, convergence and solution propriety), results thus obtained will hold when data are sampled from the population and subjected to analysis using the same alternative models. In essence, this approach allows a researcher to develop expectations about the consequences of various misspecifications if an infinitely large number of samples from the original population were collected and analyzed. In the case of Satorra and Saris (1985), misspecification due to constraining a particular parameter to zero allows researchers to estimate the a priori power of a variety of sample sizes to detect that parameter, based on the non-centrality resulting from the analysis, which follows a non-central chi-square distribution (for an extended discussion and annotated example see Brown, 2006, Chapter 10). Results from these additional analyses are identical to those obtained from the Monte Carlo simulations previously discussed, lending additional credence to our findings.

Third, our results in Table 1 show that all models fit the population covariance matrix equally well. That is, neither absolute (i.e., chi-square) nor relative (e.g., RMSEA, CFI) fit indexes will alert researchers that a relevant formative indicator has been omitted from the model.
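This perfect-fit result can also be illustrated at the population level. Using the same assumed MIMIC values as in the earlier sketches (again, not the paper's parameters), the snippet below drops x2, solves the reduced, just-identified model by hand, and shows that it reproduces the remaining population covariances exactly, leaving no discrepancy for a fit index to detect. This is only a toy analogue of the Satorra and Saris (1985) style analysis reported in the text.

```python
import numpy as np

def implied_cov(gamma, Phi, lam, theta, psi):
    var_eta = gamma @ Phi @ gamma + psi
    Sxy = np.outer(Phi @ gamma, lam)
    return np.block([[Phi, Sxy], [Sxy.T, np.outer(lam, lam) * var_eta + np.diag(theta)]])

gamma = np.array([0.6, 0.4]); Phi = np.array([[1.0, 0.3], [0.3, 1.0]])
lam = np.array([0.8, 0.7]);   theta = np.array([0.36, 0.51]); psi = 0.05

keep = [0, 2, 3]                                            # x1, y1, y2 (x2 dropped)
S = implied_cov(gamma, Phi, lam, theta, psi)[np.ix_(keep, keep)]

# Solve the reduced, just-identified model, scaling the latent variable to unit variance.
l1 = np.sqrt(S[1, 2] * S[0, 1] / S[0, 2])
l2 = S[1, 2] / l1
g1 = S[0, 1] / l1
S_reduced = implied_cov(np.array([g1]), np.array([[1.0]]), np.array([l1, l2]),
                        np.array([S[1, 1] - l1 ** 2, S[2, 2] - l2 ** 2]), 1.0 - g1 ** 2)

print(np.allclose(S, S_reduced))     # True: the reduced model leaves no detectable ill fit
```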
This constitutes a special case of the phenomena described by Tomarken and Waller (2003), who discuss how otherwise misspecified models can fit as well as correctly specified ones when the restrictions on the covariance matrix implied by the misspecified model are a subset of those imposed by the correctly specified one. As a result, a lack of significant ill fit cannot be taken as validation that all relevant formative indicators have been included in the research model.

Finally, we consider the often-raised possibility (Cenfetelli and Bassellier 2009; Diamantopoulos 2006; Diamantopoulos et al. 2008) of using the degree of variance explained (or unexplained) in the formatively specified construct by its indicators as a measure of whether all causes of the construct have been accounted for, that is, of the completeness of the specification. While the underlying logic is sound, researchers should be mindful that the bias in the path coefficients relating formative indicators to their construct that occurs when a relevant indicator is omitted will also have an upward biasing effect on the reported variance explained in the formatively specified latent variable. As a result, though the proportion of variance explained may seem high, that is not a clear-cut indication that there are no missing indicators. As discussed before, however, omitted indicators do not appear to bias the structural coefficients relating latent variables to one another.
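A small regression illustration of this point, under assumed population values rather than those of Figure 1: even with a relevant formative indicator left out, the variance explained in the latent variable by the remaining indicators stays high, while their weights are biased upward.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 100_000, 0.30
R = np.full((4, 4), r) + (1 - r) * np.eye(4)
gamma = np.array([0.4, 0.3, 0.2, 0.1]) * 1.35                    # roughly unit Var(eta)
x = rng.multivariate_normal(np.zeros(4), R, size=n)
eta = x @ gamma + rng.normal(scale=np.sqrt(0.05), size=n)

def ols(X, y):
    Xc = np.column_stack([np.ones(len(X)), X])
    b, res, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return b[1:], 1 - res[0] / ((y - y.mean()) @ (y - y.mean()))

for cols, label in [([0, 1, 2, 3], "all indicators"),
                    ([0, 1, 2], "x4 omitted"),
                    ([1, 2, 3], "x1 omitted")]:
    w, r2 = ols(x[:, cols], eta)
    print(label, np.round(w, 2), round(r2, 2))
# all indicators: weights near their population values, R2 about 0.95
# x4 omitted: weights slightly inflated, R2 essentially unchanged
# x1 omitted: weights clearly inflated, R2 still sizable (about 0.7)
```
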

TABLE 1. Latent Variable Simulation Results, Model B, N = 500

Parameter | Full Model | X1 Missing | X2 Missing | X3 Missing | X4 Missing
x1 → Ksi | -1% | -- | +16% | +11% | +5%
x2 → Ksi | 0% | +30% | -- | +15% | +8%
x3 → Ksi | 0% | +45% | +34% | -- | +11%
x4 → Ksi | 0% | +90% | +68% | +46% | --
Ksi → Eta | 0% | 0% | 0% | 0% | 0%
Ksi → Eta | 0% | 0% | 0% | 0% | 0%
Eta1 → Eta | 0% | 0% | 0% | 0% | 0%
Eta1 → Eta | 0% | 0% | 0% | 0% | 0%
Χ² (d.f.) | (160) | (145) | (145) | (145) | (145)

Note: Cell values show the average percentage bias compared to the known population value for each parameter over 1,000 replications, calculated as (average estimate − population value) / population value; a double dash marks the weight of the indicator omitted in that condition. Degrees of freedom for the chi-square test are shown in parentheses.

Formative Modeling with Partial Least Squares

In PLS the latent variables of interest are represented as weighted composites of the observed variables that are directly related to them. The practice of substituting composites of observed variables as proxies for those of theoretical interest is certainly not new, and has been discussed before by many authors (e.g., McDonald, 1996) and, specifically in the context of formative specification of latent variables, by Bollen and Lennox (1991). In the particular case of the composites employed by PLS, those are weighted combinations of observed variables, whereas most of the literature discussing the use of composites refers to the unweighted case. As is well known, estimates of the relationship between latent variables obtained from the relationship between composites that substitute for them will be biased unless those composites are perfectly reliable. Whether that bias will be upward or downward compared to the population value of the relationship under examination is a function of the complexity of the functional equation relating the composites and of which composites are less than perfectly reliable. In the case of only one predictor and one dependent variable, the obtained estimate will be biased downward when either of those composites is less than perfectly reliable. For a more general discussion of the effects of lack of reliability on various statistical procedures see, for example, Ree and Carretta (2006), and Bollen (1989) specifically dealing with latent variables.

As is also well known, estimates of the relationships between composites in PLS will be biased for any finite number of indicators related to each of those composites (i.e., the consistency-at-large requirement). In the case of reflectively specified latent variables this occurs because the composites representing them in PLS analyses, being themselves a weighted aggregate of individual indicators containing both true score and error components, are not perfect representations of the latent variables under examination, i.e., they are not perfectly reliable. To the extent that the number of indicators is large and those indicators are of high quality, the reliability of the composites will be relatively high and bias more limited. However, as noted by Bollen and Lennox (1991), "this is not the same as saying that it [the composite] has perfect reliability" (p. 310). The issue is generally overlooked by researchers who, after showing that the composites exhibit reliability above a certain threshold (typically 0.70 or 0.80), proceed as if estimates of the relationships of interest obtained from those composites accurately reflected those present in the population.
Unless the reliabilities of the involved composites are quite high, however, there will be substantial bias in the estimates. Following Raykov (1997), and extending work in this regard by Bollen and Lennox (1991), the reliability of a composite can be understood as the shared variance between the composite and the latent variable it represents in the analysis; that is, the squared correlation between composite and latent variable. This is consistent with the commonly used measure of composite reliability by Werts, Linn and Jöreskog (1974) that is employed in PLS analyses; see also Fornell and Larcker (1981a; 1981b).
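For reference, a minimal sketch of that composite reliability statistic for standardized reflective indicators with uncorrelated errors; the loadings used here are illustrative, not values from the study.

```python
def composite_reliability(loadings):
    """Composite reliability of the Werts-Linn-Joreskog form for standardized indicators:
    (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    num = sum(loadings) ** 2
    return num / (num + sum(1 - l ** 2 for l in loadings))

print(round(composite_reliability([0.80, 0.75, 0.70]), 3))   # 0.795 for these illustrative loadings
```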

More generally, reliability as the squared correlation between a composite and the corresponding latent variable can be expressed as follows (Bollen and Lennox 1991; Raykov 1997), where c is the composite, η the corresponding latent variable, wi the weights employed in the formation of the composite, and λi the loadings relating the reflective items to their latent variable:

Rel(c) = Corr(c, η)² = (Σ wi λi)² Var(η) / [ (Σ wi λi)² Var(η) + Σ wi² Var(εi) ]

In this formulation, the numerator represents the variance in the composite that is due to the latent variable, whereas the denominator captures the total variance of the composite. The ratio between the two is the reliability of said composite. In the case of an equally weighted composite it simplifies to the well-known statistic by Werts et al. (1974). Although the notion of reliability is usually associated with reflective specifications (indeed, various authors have noted that the idea of reliability as internal consistency does not apply to formative indicators), we argue here that the idea of reliability as shared variance between composite and latent variable is applicable, at least in principle, to composites representing formatively specified latent variables, as follows. In the case of PLS, composites representing the latent variables are weighted aggregates of the formative indicators included in the model without, as noted above, a disturbance term. To the extent that these composites are not perfectly correlated with the theoretical latent variable (i.e., are not perfectly reliable proxies for the latent variable of interest), any relationship between these composites and other variables in the model will be biased with respect to its population-level counterpart (Bollen and Lennox 1991).

As a result, lack of perfect composite reliability, for either formative or reflective composites, explains the presence of bias in the estimates obtained from our PLS simulations. Even when all formative indicators are included in the models, the lack of a disturbance term slightly reduces the correlation of the formative composite with the latent variable it represents, resulting in a small amount of bias in the relationships between this composite and others in the model (lack of perfect reliability in those other composites is also responsible for this bias). More problematic, however, are those cases where a formative indicator is omitted from the model. In those scenarios, the reliability of the formative composite greatly suffers, as the absence of the indicator reduces the theoretical correlation with the latent variable it represents, since the variance due to the omitted indicator is no longer included. This is the reason why estimates obtained from models with omitted indicators exhibit substantial bias in the relationships between composites representing latent variables, and a major downside of employing PLS for analyzing these models.

To better show how lack of reliability lies behind these occurrences, we conducted the following exercise. Adapting a procedure outlined by Raykov (1997), we estimated the reliability of the formative composite and that of one of the reflective composites it is related to (in this exercise, Eta3) for one scenario (Model A, X2 missing, N = 500) and show how, after correcting for unreliability, the relationship between the two composites is no longer biased. We proceeded as follows. First, we extracted the weights used to form both composites from the results of our PLS analyses.
Next, using EQS 6.1 and the procedure by Raykov (1997), we created weighted composites (using the weights determined by PLS) and included them in the latent variable analysis of the same data, omitting the same indicator as well. As an additional parameter, the software estimates the correlation between these composites and the latent variables included in the model. Using those correlations, and the correction procedures described by Ree and Carretta (2006), we then calculated a corrected value for the relationship between Ksi and Eta3 in Figure 1. The process was repeated for each replication, and results averaged over all replications are discussed next. To highlight that the inclusion of these composites has no bearing on the latent variable analysis itself, we compared the chi-square statistic and its p-value for models with and without the composites, with identical results. For the particular scenario discussed here, the average path coefficient between Ksi and Eta3 obtained from the PLS analyses was . The average reliability of the formative composite was , and that of the reflective composite . Correcting the path estimates obtained from PLS for lack of reliability in the composites [6], the average path is 0.713, which compares well with a population value of . (Note that we discuss these results here in terms of averages for ease of exposition, but the procedures outlined above were conducted for each replication separately and only then averaged for presentation.) These results underscore the fact that the presence of bias in these estimates is a result of employing weighted composites as (imperfect) substitutes for the latent variables of interest. When correcting for the lack of complete reliability introduced by that substitution, the estimated parameters are in agreement with the expected values based on the population settings.
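A sketch of the arithmetic behind this correction, with assumed weights, loadings, and an assumed observed path (none of these are values from the study): reliability is computed as the squared correlation between a weighted composite and the latent variable it stands for, and the observed correlation is then disattenuated by the product of the two reliabilities, in the spirit of Ree and Carretta (2006).

```python
import numpy as np

def composite_lv_reliability(w, lam, var_lv, err_var):
    """Squared correlation between c = sum(w_i * y_i) and the latent variable,
    for indicators y_i = lam_i * LV + e_i with uncorrelated errors."""
    w, lam, err_var = map(np.asarray, (w, lam, err_var))
    true_part = (w @ lam) ** 2 * var_lv
    return true_part / (true_part + w @ (err_var * w))

# Illustrative weights, loadings, and error variances (not the values from the study).
rel_ksi  = composite_lv_reliability([0.45, 0.30, 0.20], [0.7, 0.6, 0.5], 1.0, [0.51, 0.64, 0.75])
rel_eta3 = composite_lv_reliability([0.35, 0.33, 0.32], [0.8, 0.8, 0.8], 1.0, [0.36, 0.36, 0.36])

r_observed = 0.55                                    # assumed PLS path (a correlation here)
r_corrected = r_observed / np.sqrt(rel_ksi * rel_eta3)
print(round(rel_ksi, 3), round(rel_eta3, 3), round(r_corrected, 3))
```
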

Bias as a Result of Omitted Indicators

We are now able to explain the pattern of results reported next with regard to the presence of bias in the relationships between the various composites representing the latent variables of interest, and between those composites and their formative indicators. Given that all these composites are always imperfect representations of the ideal latent variables, any relationships between them will exhibit bias when compared to their population values. This is a well-known feature of PLS. For the specific case of interest here, formative specifications, unreliability in the formative composite will be limited when all measurable formative indicators are included in the model, and thus limited bias is likely to occur (subject to high reliability of the other composites related to the focal one). This occurs because the only omitted portion of the latent variable in the weighted composite is the residual disturbance due to random shocks, which should be a small portion of the overall variance and, further, uncorrelated with the formative indicators. Therefore, its omission does not represent a major problem. As shown in our results, this is the case when all formative indicators are included in the model. Table 2 shows results for Model B where N = 500, which can be directly compared with those in Table 1; results for other cases are available from the first author upon request.

TABLE 2. PLS Simulation Results, Model B, N = 500

Parameter | Full Model | X1 Missing | X2 Missing | X3 Missing | X4 Missing
x1 → Ksi | +2% | -- | +26% | +16% | +9%
x2 → Ksi | +3% | +48% | -- | +21% | +11%
x3 → Ksi | +2% | +65% | +45% | -- | +14%
x4 → Ksi | +4% | +117% | +83% | +53% | --
Ksi → Eta | -6% | -16% | -12% | -8% | -7%
Ksi → Eta | -6% | -16% | -11% | -8% | -7%
Eta1 → Eta | -8% | -8% | -8% | -8% | -8%
Eta1 → Eta | -8% | -8% | -8% | -8% | -8%

Note: Cell values show the average percentage bias compared to the known population value for each parameter over 1,000 replications, calculated as (average estimate − population value) / population value; a double dash marks the weight of the indicator omitted in that condition.

When a formative indicator is omitted, however, the shared variance between the theoretical latent variable and the composite (i.e., its reliability) decreases as a function of how important the omitted indicator was to the determination of the formatively specified latent variable. When a major indicator is omitted, this drop in reliability has major consequences for the estimates obtained. As can be seen in our results, omitting an indicator leads to significant bias in the structural parameters of the model.
The situation is most problematic when the formative indicators are not highly correlated amongst themselves, as the omission of one indicator results in a larger portion of the variance of the composite that is not shared with the latent variable, resulting in lower reliability of the composite. For example, the estimated relationship between the formatively specified construct in Figure 1 and its direct consequent constructs exhibits a 28% downward bias when x1 is missing and the formative indicators are not highly correlated, which improves to 16% and 10% downward bias as the correlations between those indicators increase (Table 2 shows the 16% case). When the least important formative indicator in our models was omitted, the resulting bias in the structural relationships was no different than when all indicators were included (compare, for example, the X4 Missing column with the Full Model column in Table 2 for any one of our model variations with regard to the estimates of the structural relationships).

[6] Since this is a direct path with only one predictor, the estimate represents the correlation between the two composites. The formula for correcting correlations for unreliability is rxy / √(rxx ryy), where rxy is the attenuated correlation and rxx and ryy the reliabilities of x and y, respectively.
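The sketch below illustrates this mechanism under assumed values (the 4:3:2:1 ratio from the text, with everything else illustrative): the squared correlation between the formative composite and the latent variable it represents drops far more when x1 is omitted under a low inter-indicator correlation than under a high one.

```python
import numpy as np

def formative_composite_reliability(keep, r, gamma=np.array([0.4, 0.3, 0.2, 0.1]) * 1.35, psi=0.05):
    """Squared correlation between a composite of the included indicators (weighted by
    their population paths) and the latent variable eta = gamma'x + zeta."""
    R = np.full((4, 4), r) + (1 - r) * np.eye(4)
    w = gamma[keep]
    cov_c_lv = w @ R[keep] @ gamma
    var_c = w @ R[keep][:, keep] @ w
    var_lv = gamma @ R @ gamma + psi
    return cov_c_lv ** 2 / (var_c * var_lv)

for r in (0.1, 0.5):
    full = formative_composite_reliability([0, 1, 2, 3], r)
    no_x1 = formative_composite_reliability([1, 2, 3], r)
    print(r, round(full, 2), round(no_x1, 2))
# r = 0.1: roughly 0.93 with all indicators, about 0.54 without x1
# r = 0.5: roughly 0.96 with all indicators, about 0.81 without x1
```
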

Whereas the degree of bias in structural relationships due to an omitted formative indicator decreases as a function of how correlated the formative indicators are amongst themselves, bias in the measurement relationships between indicators and composite increases with those correlations when an indicator is omitted, as can be seen in Table 2. There are two related explanations for the occurrence of this bias. First, similar to the discussion above with respect to latent variables, when a relevant cause of a variable is omitted from a model, this leads to bias in the parameters linking the included causes to the dependent variable; in this case, the weights linking formative indicators to the composite. When all correlations amongst the predictors, as well as the population parameters between those and the dependent variable, are positive, upward bias will occur. The issue, however, is compounded by the fact that omitting a relevant formative indicator from the model omits it from the composite as well. Whereas in the case of latent variables the residual disturbance term captured the effects of omitted variables (and the violation of the self-containment assumption, which made the residual term correlated with the included indicators, was responsible for the bias in those estimates), in the case of PLS the weighted composite is formed only by those indicators explicitly included in the model. This results in a further biasing of the relationships between indicators and composite, as each indicator ends up representing an even larger portion of the composite than would have been the case had the omitted indicator been included in the model.

These effects can best be seen by comparing the bias in the relationships between indicators and composite shown in Table 2. First, the biasing effect of omitting a relevant formative indicator decreases as the omitted indicator becomes a relatively less important part of the definition of the construct. For example, bias when x4 is omitted is an order of magnitude smaller than when x1 is omitted. Second, the degree of bias itself is also a function of the relative importance of the indicator for which it is quantified. As a percentage of their population value, more important indicators (e.g., x1 or x2) will be less severely biased when an indicator is omitted than less important ones (such as x3 or x4). Bias for the latter in some combinations can be, in fact, quite extreme (consider, for instance, that the average estimated coefficient for x4 when x1 is omitted in Table 2 is more than twice as large as its population value). Finally, irrespective of which indicator is omitted, the general level of bias increases as the magnitude of the correlations amongst the formative indicators increases. For example, bias in the relationship between x3 and the formatively specified composite when x2 is omitted equals 48% in Model A, 83% in Model B and 97% (essentially twice as large as its population value) in Model C (Table 2 shows the 83% case).

Discussion and Limitations

Comparison Between Approaches

All results and conclusions discussed in this research are specific to the particular set of research models and conditions previously discussed, and are subject to the other limitations noted in more detail below. There are, however, a number of interesting results arising from our work, which we discuss in detail next. See Table 3.
First, although the performance of the two examined techniques for those cases where models are correctly specified in all aspects was not our main focus of interest, the work conducted here sheds some light on that as well. While most research in this area (e.g., Jarvis et al., 2003; Petter et al., 2007) has focused on latent variables, there is limited evidence on the performance of PLS, even for correctly specified models, when those include a formatively specified construct. Results shown in Table 3 indicate that when models are correctly specified, the latent variable technique (LV-SEM) exhibits no bias in the estimation of either structural parameters (those relating constructs to each other) or measurement parameters (those relating formative indicators to their corresponding construct), whereas PLS does exhibit some bias. Bias in the estimation of measurement parameters in this case is quite small, and is due to the lack of a disturbance term in the composites modeled by PLS, leading to the paths from the formative indicators appearing slightly biased upward. Bias for the structural parameters is small but not negligible, due to the lack of perfect reliability in the composites that substitute for the latent variables of interest; this bias is a function of the reliabilities of both composites involved in the estimates. Second, the omission of a formative indicator results in no bias for structural parameters under LV-SEM.


More information

Appendix B Construct Reliability and Validity Analysis. Initial assessment of convergent and discriminate validity was conducted using factor

Appendix B Construct Reliability and Validity Analysis. Initial assessment of convergent and discriminate validity was conducted using factor Appendix B Construct Reliability and Validity Analysis Reflective Construct Reliability and Validity Analysis Initial assessment of convergent and discriminate validity was conducted using factor analysis

More information

Use of Structural Equation Modeling in Social Science Research

Use of Structural Equation Modeling in Social Science Research Asian Social Science; Vol. 11, No. 4; 2015 ISSN 1911-2017 E-ISSN 1911-2025 Published by Canadian Center of Science and Education Use of Structural Equation Modeling in Social Science Research Wali Rahman

More information

Running head: INDIVIDUAL DIFFERENCES 1. Why to treat subjects as fixed effects. James S. Adelman. University of Warwick.

Running head: INDIVIDUAL DIFFERENCES 1. Why to treat subjects as fixed effects. James S. Adelman. University of Warwick. Running head: INDIVIDUAL DIFFERENCES 1 Why to treat subjects as fixed effects James S. Adelman University of Warwick Zachary Estes Bocconi University Corresponding Author: James S. Adelman Department of

More information

Modeling the Influential Factors of 8 th Grades Student s Mathematics Achievement in Malaysia by Using Structural Equation Modeling (SEM)

Modeling the Influential Factors of 8 th Grades Student s Mathematics Achievement in Malaysia by Using Structural Equation Modeling (SEM) International Journal of Advances in Applied Sciences (IJAAS) Vol. 3, No. 4, December 2014, pp. 172~177 ISSN: 2252-8814 172 Modeling the Influential Factors of 8 th Grades Student s Mathematics Achievement

More information

Applications of Structural Equation Modeling (SEM) in Humanities and Science Researches

Applications of Structural Equation Modeling (SEM) in Humanities and Science Researches Applications of Structural Equation Modeling (SEM) in Humanities and Science Researches Dr. Ayed Al Muala Department of Marketing, Applied Science University aied_muala@yahoo.com Dr. Mamdouh AL Ziadat

More information

Running head: NESTED FACTOR ANALYTIC MODEL COMPARISON 1. John M. Clark III. Pearson. Author Note

Running head: NESTED FACTOR ANALYTIC MODEL COMPARISON 1. John M. Clark III. Pearson. Author Note Running head: NESTED FACTOR ANALYTIC MODEL COMPARISON 1 Nested Factor Analytic Model Comparison as a Means to Detect Aberrant Response Patterns John M. Clark III Pearson Author Note John M. Clark III,

More information

Sampling Weights, Model Misspecification and Informative Sampling: A Simulation Study

Sampling Weights, Model Misspecification and Informative Sampling: A Simulation Study Sampling Weights, Model Misspecification and Informative Sampling: A Simulation Study Marianne (Marnie) Bertolet Department of Statistics Carnegie Mellon University Abstract Linear mixed-effects (LME)

More information

CLUSTER-LEVEL CORRELATED ERROR VARIANCE AND THE ESTIMATION OF PARAMETERS IN LINEAR MIXED MODELS

CLUSTER-LEVEL CORRELATED ERROR VARIANCE AND THE ESTIMATION OF PARAMETERS IN LINEAR MIXED MODELS CLUSTER-LEVEL CORRELATED ERROR VARIANCE AND THE ESTIMATION OF PARAMETERS IN LINEAR MIXED MODELS by Joseph N. Luchman A Dissertation Submitted to the Graduate Faculty of George Mason University in Partial

More information

ASSESSING THE UNIDIMENSIONALITY, RELIABILITY, VALIDITY AND FITNESS OF INFLUENTIAL FACTORS OF 8 TH GRADES STUDENT S MATHEMATICS ACHIEVEMENT IN MALAYSIA

ASSESSING THE UNIDIMENSIONALITY, RELIABILITY, VALIDITY AND FITNESS OF INFLUENTIAL FACTORS OF 8 TH GRADES STUDENT S MATHEMATICS ACHIEVEMENT IN MALAYSIA 1 International Journal of Advance Research, IJOAR.org Volume 1, Issue 2, MAY 2013, Online: ASSESSING THE UNIDIMENSIONALITY, RELIABILITY, VALIDITY AND FITNESS OF INFLUENTIAL FACTORS OF 8 TH GRADES STUDENT

More information

STRUCTURAL EQUATION MODELING AND REGRESSION: GUIDELINES FOR RESEARCH PRACTICE

STRUCTURAL EQUATION MODELING AND REGRESSION: GUIDELINES FOR RESEARCH PRACTICE Volume 4, Article 7 October 2000 STRUCTURAL EQUATION MODELING AND REGRESSION: GUIDELINES FOR RESEARCH PRACTICE David Gefen Management Department LeBow College of Business Drexel University gefend@drexel.edu

More information

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution

More information

Convergence Principles: Information in the Answer

Convergence Principles: Information in the Answer Convergence Principles: Information in the Answer Sets of Some Multiple-Choice Intelligence Tests A. P. White and J. E. Zammarelli University of Durham It is hypothesized that some common multiplechoice

More information

Multivariable Systems. Lawrence Hubert. July 31, 2011

Multivariable Systems. Lawrence Hubert. July 31, 2011 Multivariable July 31, 2011 Whenever results are presented within a multivariate context, it is important to remember that there is a system present among the variables, and this has a number of implications

More information

A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) *

A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) * A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) * by J. RICHARD LANDIS** and GARY G. KOCH** 4 Methods proposed for nominal and ordinal data Many

More information

Assessing Unidimensionality Through LISREL: An Explanation and an Example

Assessing Unidimensionality Through LISREL: An Explanation and an Example Communications of the Association for Information Systems Volume 12 Article 2 July 2003 Assessing Unidimensionality Through LISREL: An Explanation and an Example David Gefen Drexel University, gefend@drexel.edu

More information

Session 1: Dealing with Endogeneity

Session 1: Dealing with Endogeneity Niehaus Center, Princeton University GEM, Sciences Po ARTNeT Capacity Building Workshop for Trade Research: Behind the Border Gravity Modeling Thursday, December 18, 2008 Outline Introduction 1 Introduction

More information

Preliminary Report on Simple Statistical Tests (t-tests and bivariate correlations)

Preliminary Report on Simple Statistical Tests (t-tests and bivariate correlations) Preliminary Report on Simple Statistical Tests (t-tests and bivariate correlations) After receiving my comments on the preliminary reports of your datasets, the next step for the groups is to complete

More information

Chapter 1: Explaining Behavior

Chapter 1: Explaining Behavior Chapter 1: Explaining Behavior GOAL OF SCIENCE is to generate explanations for various puzzling natural phenomenon. - Generate general laws of behavior (psychology) RESEARCH: principle method for acquiring

More information

Chapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS)

Chapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS) Chapter : Advanced Remedial Measures Weighted Least Squares (WLS) When the error variance appears nonconstant, a transformation (of Y and/or X) is a quick remedy. But it may not solve the problem, or it

More information

Ambiguous Data Result in Ambiguous Conclusions: A Reply to Charles T. Tart

Ambiguous Data Result in Ambiguous Conclusions: A Reply to Charles T. Tart Other Methodology Articles Ambiguous Data Result in Ambiguous Conclusions: A Reply to Charles T. Tart J. E. KENNEDY 1 (Original publication and copyright: Journal of the American Society for Psychical

More information

12/31/2016. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

12/31/2016. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 Introduce moderated multiple regression Continuous predictor continuous predictor Continuous predictor categorical predictor Understand

More information

CHAPTER VI RESEARCH METHODOLOGY

CHAPTER VI RESEARCH METHODOLOGY CHAPTER VI RESEARCH METHODOLOGY 6.1 Research Design Research is an organized, systematic, data based, critical, objective, scientific inquiry or investigation into a specific problem, undertaken with the

More information

Inclusive Strategy with Confirmatory Factor Analysis, Multiple Imputation, and. All Incomplete Variables. Jin Eun Yoo, Brian French, Susan Maller

Inclusive Strategy with Confirmatory Factor Analysis, Multiple Imputation, and. All Incomplete Variables. Jin Eun Yoo, Brian French, Susan Maller Inclusive strategy with CFA/MI 1 Running head: CFA AND MULTIPLE IMPUTATION Inclusive Strategy with Confirmatory Factor Analysis, Multiple Imputation, and All Incomplete Variables Jin Eun Yoo, Brian French,

More information

Russian Journal of Agricultural and Socio-Economic Sciences, 3(15)

Russian Journal of Agricultural and Socio-Economic Sciences, 3(15) ON THE COMPARISON OF BAYESIAN INFORMATION CRITERION AND DRAPER S INFORMATION CRITERION IN SELECTION OF AN ASYMMETRIC PRICE RELATIONSHIP: BOOTSTRAP SIMULATION RESULTS Henry de-graft Acquah, Senior Lecturer

More information

HIGH-ORDER CONSTRUCTS FOR THE STRUCTURAL EQUATION MODEL

HIGH-ORDER CONSTRUCTS FOR THE STRUCTURAL EQUATION MODEL HIGH-ORDER CONSTRUCTS FOR THE STRUCTURAL EQUATION MODEL Enrico Ciavolino (1) * Mariangela Nitti (2) (1) Dipartimento di Filosofia e Scienze Sociali, Università del Salento (2) Dipartimento di Scienze Pedagogiche,

More information

Propensity Score Analysis Shenyang Guo, Ph.D.

Propensity Score Analysis Shenyang Guo, Ph.D. Propensity Score Analysis Shenyang Guo, Ph.D. Upcoming Seminar: April 7-8, 2017, Philadelphia, Pennsylvania Propensity Score Analysis 1. Overview 1.1 Observational studies and challenges 1.2 Why and when

More information

An Empirical Study of the Roles of Affective Variables in User Adoption of Search Engines

An Empirical Study of the Roles of Affective Variables in User Adoption of Search Engines An Empirical Study of the Roles of Affective Variables in User Adoption of Search Engines ABSTRACT Heshan Sun Syracuse University hesun@syr.edu The current study is built upon prior research and is an

More information

Meta-Analysis and Publication Bias: How Well Does the FAT-PET-PEESE Procedure Work?

Meta-Analysis and Publication Bias: How Well Does the FAT-PET-PEESE Procedure Work? Meta-Analysis and Publication Bias: How Well Does the FAT-PET-PEESE Procedure Work? Nazila Alinaghi W. Robert Reed Department of Economics and Finance, University of Canterbury Abstract: This study uses

More information

A Comparison of First and Second Generation Multivariate Analyses: Canonical Correlation Analysis and Structural Equation Modeling 1

A Comparison of First and Second Generation Multivariate Analyses: Canonical Correlation Analysis and Structural Equation Modeling 1 Florida Journal of Educational Research, 2004, Vol. 42, pp. 22-40 A Comparison of First and Second Generation Multivariate Analyses: Canonical Correlation Analysis and Structural Equation Modeling 1 A.

More information

Session 3: Dealing with Reverse Causality

Session 3: Dealing with Reverse Causality Principal, Developing Trade Consultants Ltd. ARTNeT Capacity Building Workshop for Trade Research: Gravity Modeling Thursday, August 26, 2010 Outline Introduction 1 Introduction Overview Endogeneity and

More information

Context of Best Subset Regression

Context of Best Subset Regression Estimation of the Squared Cross-Validity Coefficient in the Context of Best Subset Regression Eugene Kennedy South Carolina Department of Education A monte carlo study was conducted to examine the performance

More information

Chapter 9. Youth Counseling Impact Scale (YCIS)

Chapter 9. Youth Counseling Impact Scale (YCIS) Chapter 9 Youth Counseling Impact Scale (YCIS) Background Purpose The Youth Counseling Impact Scale (YCIS) is a measure of perceived effectiveness of a specific counseling session. In general, measures

More information

Scale Building with Confirmatory Factor Analysis

Scale Building with Confirmatory Factor Analysis Scale Building with Confirmatory Factor Analysis Latent Trait Measurement and Structural Equation Models Lecture #7 February 27, 2013 PSYC 948: Lecture #7 Today s Class Scale building with confirmatory

More information

The Bilevel Structure of the Outcome Questionnaire 45

The Bilevel Structure of the Outcome Questionnaire 45 Psychological Assessment 2010 American Psychological Association 2010, Vol. 22, No. 2, 350 355 1040-3590/10/$12.00 DOI: 10.1037/a0019187 The Bilevel Structure of the Outcome Questionnaire 45 Jamie L. Bludworth,

More information

A Brief Introduction to Bayesian Statistics

A Brief Introduction to Bayesian Statistics A Brief Introduction to Statistics David Kaplan Department of Educational Psychology Methods for Social Policy Research and, Washington, DC 2017 1 / 37 The Reverend Thomas Bayes, 1701 1761 2 / 37 Pierre-Simon

More information

Catherine A. Welch 1*, Séverine Sabia 1,2, Eric Brunner 1, Mika Kivimäki 1 and Martin J. Shipley 1

Catherine A. Welch 1*, Séverine Sabia 1,2, Eric Brunner 1, Mika Kivimäki 1 and Martin J. Shipley 1 Welch et al. BMC Medical Research Methodology (2018) 18:89 https://doi.org/10.1186/s12874-018-0548-0 RESEARCH ARTICLE Open Access Does pattern mixture modelling reduce bias due to informative attrition

More information

AIS Electronic Library (AISeL) Association for Information Systems. Wynne Chin University of Calgary. Barbara Marcolin University of Calgary

AIS Electronic Library (AISeL) Association for Information Systems. Wynne Chin University of Calgary. Barbara Marcolin University of Calgary Association for Information Systems AIS Electronic Library (AISeL) ICIS 1996 Proceedings International Conference on Information Systems (ICIS) December 1996 A Partial Least Squares Latent Variable Modeling

More information

Examining the efficacy of the Theory of Planned Behavior (TPB) to understand pre-service teachers intention to use technology*

Examining the efficacy of the Theory of Planned Behavior (TPB) to understand pre-service teachers intention to use technology* Examining the efficacy of the Theory of Planned Behavior (TPB) to understand pre-service teachers intention to use technology* Timothy Teo & Chwee Beng Lee Nanyang Technology University Singapore This

More information

Methods for Addressing Selection Bias in Observational Studies

Methods for Addressing Selection Bias in Observational Studies Methods for Addressing Selection Bias in Observational Studies Susan L. Ettner, Ph.D. Professor Division of General Internal Medicine and Health Services Research, UCLA What is Selection Bias? In the regression

More information

Mostly Harmless Simulations? On the Internal Validity of Empirical Monte Carlo Studies

Mostly Harmless Simulations? On the Internal Validity of Empirical Monte Carlo Studies Mostly Harmless Simulations? On the Internal Validity of Empirical Monte Carlo Studies Arun Advani and Tymon Sªoczy«ski 13 November 2013 Background When interested in small-sample properties of estimators,

More information

HPS301 Exam Notes- Contents

HPS301 Exam Notes- Contents HPS301 Exam Notes- Contents Week 1 Research Design: What characterises different approaches 1 Experimental Design 1 Key Features 1 Criteria for establishing causality 2 Validity Internal Validity 2 Threats

More information

Preliminary Conclusion

Preliminary Conclusion 1 Exploring the Genetic Component of Political Participation Brad Verhulst Virginia Institute for Psychiatric and Behavioral Genetics Virginia Commonwealth University Theories of political participation,

More information

Lec 02: Estimation & Hypothesis Testing in Animal Ecology

Lec 02: Estimation & Hypothesis Testing in Animal Ecology Lec 02: Estimation & Hypothesis Testing in Animal Ecology Parameter Estimation from Samples Samples We typically observe systems incompletely, i.e., we sample according to a designed protocol. We then

More information

A reply to Rose, Livengood, Sytsma, and Machery

A reply to Rose, Livengood, Sytsma, and Machery A reply to Rose, Livengood, Sytsma, and Machery Chandra Sekhar Sripada 1,2, Richard Gonzalez 3,4,5, Daniel Kessler 2, Eric Laber 6, Sara Konrath 7,8, Vijay Nair 4 1 Department of Philosophy, University

More information

Module 14: Missing Data Concepts

Module 14: Missing Data Concepts Module 14: Missing Data Concepts Jonathan Bartlett & James Carpenter London School of Hygiene & Tropical Medicine Supported by ESRC grant RES 189-25-0103 and MRC grant G0900724 Pre-requisites Module 3

More information

EXPERIMENTAL RESEARCH DESIGNS

EXPERIMENTAL RESEARCH DESIGNS ARTHUR PSYC 204 (EXPERIMENTAL PSYCHOLOGY) 14A LECTURE NOTES [02/28/14] EXPERIMENTAL RESEARCH DESIGNS PAGE 1 Topic #5 EXPERIMENTAL RESEARCH DESIGNS As a strict technical definition, an experiment is a study

More information

On the Performance of Maximum Likelihood Versus Means and Variance Adjusted Weighted Least Squares Estimation in CFA

On the Performance of Maximum Likelihood Versus Means and Variance Adjusted Weighted Least Squares Estimation in CFA STRUCTURAL EQUATION MODELING, 13(2), 186 203 Copyright 2006, Lawrence Erlbaum Associates, Inc. On the Performance of Maximum Likelihood Versus Means and Variance Adjusted Weighted Least Squares Estimation

More information

A critical look at the use of SEM in international business research

A critical look at the use of SEM in international business research sdss A critical look at the use of SEM in international business research Nicole F. Richter University of Southern Denmark Rudolf R. Sinkovics The University of Manchester Christian M. Ringle Hamburg University

More information

Some General Guidelines for Choosing Missing Data Handling Methods in Educational Research

Some General Guidelines for Choosing Missing Data Handling Methods in Educational Research Journal of Modern Applied Statistical Methods Volume 13 Issue 2 Article 3 11-2014 Some General Guidelines for Choosing Missing Data Handling Methods in Educational Research Jehanzeb R. Cheema University

More information

Ec331: Research in Applied Economics Spring term, Panel Data: brief outlines

Ec331: Research in Applied Economics Spring term, Panel Data: brief outlines Ec331: Research in Applied Economics Spring term, 2014 Panel Data: brief outlines Remaining structure Final Presentations (5%) Fridays, 9-10 in H3.45. 15 mins, 8 slides maximum Wk.6 Labour Supply - Wilfred

More information

CHAPTER - 6 STATISTICAL ANALYSIS. This chapter discusses inferential statistics, which use sample data to

CHAPTER - 6 STATISTICAL ANALYSIS. This chapter discusses inferential statistics, which use sample data to CHAPTER - 6 STATISTICAL ANALYSIS 6.1 Introduction This chapter discusses inferential statistics, which use sample data to make decisions or inferences about population. Populations are group of interest

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

Studying the effect of change on change : a different viewpoint

Studying the effect of change on change : a different viewpoint Studying the effect of change on change : a different viewpoint Eyal Shahar Professor, Division of Epidemiology and Biostatistics, Mel and Enid Zuckerman College of Public Health, University of Arizona

More information

Comparing Direct and Indirect Measures of Just Rewards: What Have We Learned?

Comparing Direct and Indirect Measures of Just Rewards: What Have We Learned? Comparing Direct and Indirect Measures of Just Rewards: What Have We Learned? BARRY MARKOVSKY University of South Carolina KIMMO ERIKSSON Mälardalen University We appreciate the opportunity to comment

More information

A NON-TECHNICAL INTRODUCTION TO REGRESSIONS. David Romer. University of California, Berkeley. January Copyright 2018 by David Romer

A NON-TECHNICAL INTRODUCTION TO REGRESSIONS. David Romer. University of California, Berkeley. January Copyright 2018 by David Romer A NON-TECHNICAL INTRODUCTION TO REGRESSIONS David Romer University of California, Berkeley January 2018 Copyright 2018 by David Romer CONTENTS Preface ii I Introduction 1 II Ordinary Least Squares Regression

More information

existing statistical techniques. However, even with some statistical background, reading and

existing statistical techniques. However, even with some statistical background, reading and STRUCTURAL EQUATION MODELING (SEM): A STEP BY STEP APPROACH (PART 1) By: Zuraidah Zainol (PhD) Faculty of Management & Economics, Universiti Pendidikan Sultan Idris zuraidah@fpe.upsi.edu.my 2016 INTRODUCTION

More information

Multilevel analysis quantifies variation in the experimental effect while optimizing power and preventing false positives

Multilevel analysis quantifies variation in the experimental effect while optimizing power and preventing false positives DOI 10.1186/s12868-015-0228-5 BMC Neuroscience RESEARCH ARTICLE Open Access Multilevel analysis quantifies variation in the experimental effect while optimizing power and preventing false positives Emmeke

More information

baseline comparisons in RCTs

baseline comparisons in RCTs Stefan L. K. Gruijters Maastricht University Introduction Checks on baseline differences in randomized controlled trials (RCTs) are often done using nullhypothesis significance tests (NHSTs). In a quick

More information

George B. Ploubidis. The role of sensitivity analysis in the estimation of causal pathways from observational data. Improving health worldwide

George B. Ploubidis. The role of sensitivity analysis in the estimation of causal pathways from observational data. Improving health worldwide George B. Ploubidis The role of sensitivity analysis in the estimation of causal pathways from observational data Improving health worldwide www.lshtm.ac.uk Outline Sensitivity analysis Causal Mediation

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction 1.1 Motivation and Goals The increasing availability and decreasing cost of high-throughput (HT) technologies coupled with the availability of computational tools and data form a

More information

Sample Size Determination and Statistical Power Analysis in PLS Using R: An Annotated Tutorial

Sample Size Determination and Statistical Power Analysis in PLS Using R: An Annotated Tutorial Communications of the Association for Information Systems 1-2015 Sample Size Determination and Statistical Power Analysis in PLS Using R: An Annotated Tutorial Miguel Aguirre-Urreta Information Systems

More information

Sample Sizes for Predictive Regression Models and Their Relationship to Correlation Coefficients

Sample Sizes for Predictive Regression Models and Their Relationship to Correlation Coefficients Sample Sizes for Predictive Regression Models and Their Relationship to Correlation Coefficients Gregory T. Knofczynski Abstract This article provides recommended minimum sample sizes for multiple linear

More information

UMbRELLA interim report Preparatory work

UMbRELLA interim report Preparatory work UMbRELLA interim report Preparatory work This document is intended to supplement the UMbRELLA Interim Report 2 (January 2016) by providing a summary of the preliminary analyses which influenced the decision

More information

Basic concepts and principles of classical test theory

Basic concepts and principles of classical test theory Basic concepts and principles of classical test theory Jan-Eric Gustafsson What is measurement? Assignment of numbers to aspects of individuals according to some rule. The aspect which is measured must

More information

Complex modeling in marketing using component based SEM

Complex modeling in marketing using component based SEM University of Wollongong Research Online Faculty of Commerce - Papers (Archive) Faculty of Business 2011 Complex modeling in marketing using component based SEM Shahriar Akter University of Wollongong,

More information

Throughout this book, we have emphasized the fact that psychological measurement

Throughout this book, we have emphasized the fact that psychological measurement CHAPTER 7 The Importance of Reliability Throughout this book, we have emphasized the fact that psychological measurement is crucial for research in behavioral science and for the application of behavioral

More information

WELCOME! Lecture 11 Thommy Perlinger

WELCOME! Lecture 11 Thommy Perlinger Quantitative Methods II WELCOME! Lecture 11 Thommy Perlinger Regression based on violated assumptions If any of the assumptions are violated, potential inaccuracies may be present in the estimated regression

More information

Durham Research Online

Durham Research Online Durham Research Online Deposited in DRO: 15 April 2015 Version of attached le: Accepted Version Peer-review status of attached le: Peer-reviewed Citation for published item: Wood, R.E. and Goodman, J.S.

More information

Measuring and Assessing Study Quality

Measuring and Assessing Study Quality Measuring and Assessing Study Quality Jeff Valentine, PhD Co-Chair, Campbell Collaboration Training Group & Associate Professor, College of Education and Human Development, University of Louisville Why

More information

You must answer question 1.

You must answer question 1. Research Methods and Statistics Specialty Area Exam October 28, 2015 Part I: Statistics Committee: Richard Williams (Chair), Elizabeth McClintock, Sarah Mustillo You must answer question 1. 1. Suppose

More information

Structural Equation Modeling (SEM)

Structural Equation Modeling (SEM) Structural Equation Modeling (SEM) Today s topics The Big Picture of SEM What to do (and what NOT to do) when SEM breaks for you Single indicator (ASU) models Parceling indicators Using single factor scores

More information

Detection of Unknown Confounders. by Bayesian Confirmatory Factor Analysis

Detection of Unknown Confounders. by Bayesian Confirmatory Factor Analysis Advanced Studies in Medical Sciences, Vol. 1, 2013, no. 3, 143-156 HIKARI Ltd, www.m-hikari.com Detection of Unknown Confounders by Bayesian Confirmatory Factor Analysis Emil Kupek Department of Public

More information

Impact of an equality constraint on the class-specific residual variances in regression mixtures: A Monte Carlo simulation study

Impact of an equality constraint on the class-specific residual variances in regression mixtures: A Monte Carlo simulation study Behav Res (16) 8:813 86 DOI 1.3758/s138-15-618-8 Impact of an equality constraint on the class-specific residual variances in regression mixtures: A Monte Carlo simulation study Minjung Kim 1 & Andrea

More information

EPSE 594: Meta-Analysis: Quantitative Research Synthesis

EPSE 594: Meta-Analysis: Quantitative Research Synthesis EPSE 594: Meta-Analysis: Quantitative Research Synthesis Ed Kroc University of British Columbia ed.kroc@ubc.ca March 28, 2019 Ed Kroc (UBC) EPSE 594 March 28, 2019 1 / 32 Last Time Publication bias Funnel

More information

Multilevel IRT for group-level diagnosis. Chanho Park Daniel M. Bolt. University of Wisconsin-Madison

Multilevel IRT for group-level diagnosis. Chanho Park Daniel M. Bolt. University of Wisconsin-Madison Group-Level Diagnosis 1 N.B. Please do not cite or distribute. Multilevel IRT for group-level diagnosis Chanho Park Daniel M. Bolt University of Wisconsin-Madison Paper presented at the annual meeting

More information

Choose an approach for your research problem

Choose an approach for your research problem Choose an approach for your research problem This course is about doing empirical research with experiments, so your general approach to research has already been chosen by your professor. It s important

More information