Inclusive Strategy with Confirmatory Factor Analysis, Multiple Imputation, and All Incomplete Variables. Jin Eun Yoo, Brian French, Susan Maller


Inclusive strategy with CFA/MI
Running head: CFA AND MULTIPLE IMPUTATION

Inclusive Strategy with Confirmatory Factor Analysis, Multiple Imputation, and All Incomplete Variables
Jin Eun Yoo, Brian French, Susan Maller
Purdue University
March, 2007

CORRESPONDING AUTHOR: Jin Eun Yoo 400 Center Ridge Drive Pearson Educational Measurement Austin, TX (512) X4005 jineun.yoopearson.com

Paper to be presented at the annual meeting of National Council on Measurement in Education, Chicago, April 2007.

Abstract

Even very well-designed, well-executed research can result in missing responses at some rate, particularly in survey research. This Monte Carlo study investigated the effectiveness of the inclusive strategy with multiple imputation, in a confirmatory factor analysis framework with incomplete data. Specifically, the study examined the influence of sample size, missing rates, missingness mechanism combinations, and the inclusive strategy on convergence failure, bias, standard error, and confidence interval coverage of parameters, and model fit. The inclusive strategy, which includes additional variables in the imputation model, was found to improve parameter estimation in most cases, particularly with the convex type of missingness and with the nonignorable cases that arise when data are MAR but the inclusive strategy is not used. Implications and future directions are discussed.

Inclusive Strategy with Confirmatory Factor Analysis, Multiple Imputation, and All Incomplete Variables

Standard statistical methods are designed for a complete data set, yet missing responses occur. Although missing data are a common problem, and the theory behind them has been known for some decades, this topic has not received the attention it deserves: few statistical textbooks have addressed the issue, and software packages (e.g., AMOS, LISREL, MPLUS, SAS, etc.) have only recently started to incorporate missing data techniques (Allison, 2001; Schafer & Olsen, 1998). Not surprisingly, the lack of knowledge about missing data may result in inappropriate practice. For instance, listwise deletion, one of the most commonly used methods, yields biased parameter estimates if missing completely at random (MCAR) does not hold (Wothke, 2000). However, it is unclear what percentage of researchers who employed listwise deletion actually considered whether their missing data mechanism met the MCAR assumption, and what percentage of the incomplete data sets really followed MCAR. Most journals in the field of educational psychology (e.g., Applied Measurement in Education, Applied Psychological Measurement, Educational Assessment, Multivariate Behavioral Research, Structural Equation Modeling, etc.) do not overtly state publication guidelines about missing data. Some journals (e.g., American Psychologist, Educational and Psychological Measurement, etc.) tell authors to follow the Publication Manual of the American Psychological Association (5th ed.), which does not mention missing data either. Currently, the plausibility of MCAR is testable in SAS or SPSS (Enders, 2006). Apart from the need to meet the strict assumption of MCAR, listwise

deletion can lead to inefficient utilization of the data and thus lower power of hypothesis tests (Allison, 2001). More seriously, some ad hoc methods tend to distort test statistics even under MCAR. For instance, pairwise deletion produces unbiased parameter estimates but biased standard error estimates, which may invalidate the hypothesis tests (Allison, 2001, 2003). Mean imputation (unconditional mean imputation) retains means, but distorts marginal distributions and measures of covariation (Little & Rubin, 2002; Schafer & Graham, 2002). Regression imputation (conditional mean imputation) systematically underestimates variability and inflates correlations (Allison, 2001; Little & Rubin, 2002; Schafer & Graham, 2002). Hot deck imputation (draws from the unconditional distribution) preserves variances, but distorts correlations and other measures of association (Schafer & Graham, 2002).

On the other hand, maximum likelihood (ML) and multiple imputation (MI) techniques produce consistent, asymptotically normal, and efficient parameter estimates under missing at random (MAR), which is less restrictive than MCAR, and under suitable regularity conditions (Allison, 2003; Schafer & Graham, 2002). ML and MI are considered the two accepted approaches to handling missing data (Allison, 2003; Little & Rubin, 2002; Schafer, 1997). Although ML and MI are known to be asymptotically equivalent under some suitable conditions (Collins, Schafer, & Kam, 2001; Gelman, Carlin, Stern, & Rubin, 1995), MI offers more flexibility of analysis and easier incorporation into research design than ML (Allison, 2003; Schafer, 1997). Particularly, with full information maximum likelihood (FIML), the ML method for incomplete data in the field of SEM, the inclusive strategy is

difficult to implement (Enders & Peugh, 2004; Graham, 2003), while SEM/MI can easily implement the inclusive strategy.

The inclusive strategy (IS) generously employs auxiliary variables in the ML or imputation model, whereas the restrictive strategy (RS) utilizes few or no auxiliary variables (Collins et al., 2001). Auxiliary variables can be any variables in the data set that are not included in the model (Enders, 2006), and are defined as those that are not of essential interest but can be conducive to analyses with missing data (Collins et al., 2001). More specifically, auxiliary variables are categorized by whether they are a cause of the missingness or a correlate of the imputed variable. Notably, an auxiliary variable which is both the cause of missingness and a correlate of the imputed variable should be included in the ML or the imputation model to meet the ignorability assumption of missing data techniques, but in reality it is not obvious which variable falls into which category. The inclusive strategy is believed to increase the likelihood that the ignorability assumption holds when the auxiliary variables are highly correlated with the imputed variables and with the cause of missingness. Given that data sets are not likely to meet the ignorability assumption (Schafer & Graham, 2002), the inclusive strategy can be important to analysis with incomplete data.

Superiority of MI with the inclusive strategy has been shown (Collins et al., 2001; Van Buuren, Boshuizen, & Knook, 1999), but research is lacking with structural equation modeling (SEM) or confirmatory factor analysis (CFA) using MI with IS. For the last 10 plus years, a substantial body of Monte Carlo studies investigated missing data in SEM or CFA (Arbuckle, 1996; Brown, 1994; Enders, 2001; Enders & Bandalos, 2001; Enders & Peugh, 2004; Gold & Bentler, 2000;

Gold, Bentler, & Kim, 2003; Marsh, 1998; Newman, 2003; Olinsky, Chen, & Harlow, 2003; Wothke, 2000). Because the focus of this study was ML and MI with normal data, rather than ad hoc methods, studies only with ad hoc methods (Brown, 1994; Marsh, 1998) and studies with nonnormal data (Enders, 2001; Gold et al., 2003) were screened out. Consequently, there remained a total of seven studies.

All seven studies generated MCAR data. MCAR holds if missingness occurs independent of observed data and missing data. If missingness depends on observed data, but does not depend on missing data, then the missingness mechanism is said to follow MAR. In the case of missing not at random (MNAR), observed data cannot completely explain the missing data mechanism. The missing data mechanism needs to be modeled, but this usually requires substantial prior information and an individualized approach for each problem (Allison, 2001; Schafer & Olsen, 1998). To say that the missing data mechanism is ignorable, (a) at least MAR should hold, and (b) parameters for the model and the missing mechanism need to be assumed independent (Rubin, 1976, 1987; Schafer, 1997). The second condition for the ignorable mechanism is unlikely to be violated in the real world, and even if it is violated, methods that assume only MAR still perform very well (Allison, 2003). Therefore, if at least MAR holds, the missing mechanism can be said to be ignorable in a practical sense.

MCAR was implemented either by giving all the variables the same designated percentage of missing values (Enders & Bandalos, 2001; Enders & Peugh, 2004) or by giving every data point the same and independent chance of being deleted (Arbuckle, 1996; Gold & Bentler, 2000; Newman, 2003; Olinsky et al., 2003; Wothke, 2000). For MAR, the values of one or more completely observed

variables determined the missingness of other variables (Arbuckle, 1996; Enders & Bandalos, 2001; Newman, 2003; Wothke, 2000). For instance, two variables were deleted in a two-latent, six-observed CFA if one of the completely observed variables was less than or equal to 12 (Arbuckle, 1996). In nonignorable missingness (MNAR) situations, whether a variable is missing or not depends on the variable itself. For example, in the three-wave model by Newman (2003) a variable in wave 2 was dropped or kept, conditional on the value of the variable itself in the same wave. Notably, this study had both MAR and MNAR mechanisms in wave 3 by the nature of a longitudinal study. Observations in wave 2 influenced the missingness of wave 3 (MAR), and at the same time the missingness of a variable in wave 3 depended upon the variable itself in wave 3 (MNAR). Newman referred to this as partially MNAR, in contrast to the strictly MNAR of wave 2, but there was no further elaboration on this situation.

Interestingly, some studies reported that smaller samples performed as well as larger samples. There was little difference in parameter estimation with sample sizes of 145 and 500 (Arbuckle, 1996), in estimation of relative efficiency with sample sizes of 100, 250, 500, and 750 (Enders & Bandalos, 2001), and in confidence interval coverage with sample sizes of 200, 400, and 600 (Enders & Peugh, 2004). However, a sample size of 100 resulted in higher rates of convergence failures under MCAR and MAR (Enders & Bandalos, 2001) and with the structured EM model (Gold & Bentler, 2000). A sample size of 100 also produced biased estimates (Enders & Bandalos, 2001).

Not surprisingly, higher missing rates generally lead to more estimation problems, particularly with ad hoc techniques. High missing rates such as 75%

and 32% resulted in convergence failures, which was most problematic with MNAR (Newman, 2003; Olinsky et al., 2003). Parameter estimation was mostly unacceptable with 75% of the data missing (Newman, 2003). On the other hand, missing rates under 25% were unrelated to convergence failures (Gold & Bentler, 2000; Enders & Bandalos, 2001) and did not influence confidence interval coverage (Enders & Peugh, 2004). For the comparison of sample size, missing rates, and number of replications for each Monte Carlo combination cell, see Table 1.

Studies comparing the efficacy of ad hoc and maximum likelihood methods in SEM (or CFA) almost always showed the superiority of maximum likelihood (ML) methods to ad hoc methods in terms of convergence failures (Enders & Bandalos, 2001; Gold & Bentler, 2000; Newman, 2003; Olinsky et al., 2003), bias of parameter estimates (Arbuckle, 1996; Enders & Bandalos, 2001; Gold & Bentler, 2000; Newman, 2003; Olinsky et al., 2003; Wothke, 2000), standard error (Newman, 2003) or efficiency of parameter estimates (Enders & Bandalos, 2001; Wothke, 2000), and model fit (Enders & Bandalos, 2001; Olinsky et al., 2003). Some recent simulation studies which examined multiple imputation (MI) also showed the superiority of MI to ad hoc methods in bias and standard error estimation (Newman, 2003; Olinsky et al., 2003).

Auxiliary Variables

Another recommended approach to handling missing data is the inclusive strategy, which entails auxiliary variables (Collins et al., 2001). Collins et al. (2001) categorized three kinds of auxiliary variables: Type A auxiliary variables,

correlates of both the incomplete variable of research interest (y) and the indicator variable of missingness in y (R); Type B auxiliary variables, correlates of y, but not correlates of R; and Type C auxiliary variables, correlates of neither y nor R.

To illustrate, suppose a researcher wants to know if her mathematics achievement test will be a good measure of students' mathematics achievement. The test is administered to all 12th graders of a local school, along with several background questions such as gender, ethnicity, and date of birth. The researcher has decided to use only selected items of the test for her CFA model. The selection decision may come from reliability evidence such as Cronbach's α, as well as the consideration of content validity. Suppose one of the excluded items was difficult to answer and made students skip the following items. This item is both a correlate of mathematics achievement and a cause of the missingness of other items, and hence can be an example of a Type A auxiliary variable. Any other items under the mathematics test can serve as Type B auxiliary variables, if the items do not influence the missingness of each other. Some other background questions such as gender and ethnicity can also be of this type, if the questions are correlates of mathematics achievement but not correlates of the missingness of the test items. Type C auxiliary variables are junk variables (Collins et al., 2001, p. 345) from the aspect of missing data imputation. Variables of this type will have zero correlations with variables of research interest. An example can be students' date of birth, provided that it is correlated neither with mathematics achievement nor with missingness on the test.

The use of auxiliary variables (the inclusive strategy) produced better estimation of parameters than no use of auxiliary variables (the restrictive

strategy) (Collins et al., 2001). Specifically, Type A auxiliary variables will need to be included in the imputation model, particularly with a high missing rate (e.g., 25%) and a high correlation coefficient between auxiliary and incomplete variables (e.g., .90). Type B auxiliary variables will increase the efficiency of parameters, and this finding is particularly promising, since Type B auxiliary variables are fairly common in practice (Graham, 2003) and merely including this type in the imputation models can improve the parameter estimation of variables of research interest. Finally, adding Type C auxiliary variables will not harm the analysis with a large sample. To summarize, the study by Collins et al. (2001) demonstrated that the inclusive strategy would benefit, or at least not harm, analyses of incomplete data.

Auxiliary, Stratifying, and MAR-mediate Variables

By the mathematical definition, any variables in the data set can be auxiliary variables, unless they are in the analysis model. Although Collins et al. (2001) intended to define auxiliary variables as those not of research interest, there can be auxiliary variables which in fact were of research interest but were omitted from the analysis for some reason. For instance, the aforementioned example of Type A auxiliary variables fits this description. That is to say, items under the same construct but not included in the subsequent analysis are either Type A or Type B auxiliary variables, and these items will need to be included in the imputation model for more accurate estimation.

As the auxiliary variables were comprehensively defined, stratifying (Rubin, 1996; Wothke, 2000) or design (Schafer, 1997) variables are said to be subsumed by the auxiliary variables. Examples of stratifying or design variables

can be strata of surveys such as gender, ethnicity, citizenship, or region of residence. Notably, stratifying variables should be complete, since stratifying variables are fixed, and fixed variables may result in the violation of the independently, identically distributed (iid) assumption (Schafer, 1997, pp ). Omitting these variables will be likely to result in the violation of assumptions such as ignorability. Stratifying variables will be obvious in practice. Once researchers know that these variables should be in the imputation model, it will not be difficult to include them.

There are variables which are correlates of both the cause of missingness and the imputed variable, and at the same time are of research interest and in the analysis model. The term auxiliary does not fit this description. Therefore, a need emerged to create a new term: MAR-mediate variables. MAR-mediate variables are of research interest, as well as both the cause of missingness and correlates of the imputed variable. Type A auxiliary variables can be the same as MAR-mediate variables, if variables of this type are of research interest and included in the analysis. Omitting MAR-mediate or Type A auxiliary variables from the imputation model will be likely to violate the ignorability assumption and hence invalidate the subsequent analyses.

It should be noted that before Collins et al. (2001) labeled and defined the auxiliary variables and showed the superiority of the inclusive strategy, many researchers contended that it would be worthwhile to include in the imputation model extra variables that are a potential cause of the missingness or a correlate of the incomplete variable, as well as variables of research interest (Allison, 2001; Schafer, 1997; Schafer & Olsen, 1998; Van Buuren et al., 1999). Rubin (1996)

even recommended that researchers should include as many variables as possible in the imputation model.

Purpose of the Study

Previous studies indicate the superiority of ML to ad hoc methods, but few examined MI, and none examined the effectiveness of the inclusive strategy with MI. There has been one simulation study by Enders and Peugh (2004) to compare the restrictive and inclusive strategies in CFA/ML. It was disappointing that the two strategies yielded results that differed little. Using Type B auxiliary variables under MCAR and employing low correlations (.10 and .30) between auxiliary and research variables appear to have contributed to the unimpressive results.

As the first study to employ the inclusive strategy in CFA with multiple imputation (MI), this study differs from previous studies in many ways. Previous studies only dealt with a single missingness mechanism (e.g., MCAR or MAR) throughout a data set, although there can be more than one mechanism present (e.g., MAR and MNAR in Newman's (2003) study). Missingness mechanisms are difficult to verify with real data. This difficulty seems to have contributed to the practice of considering only one missingness mechanism throughout a data set. However, it is unknown, for instance, whether MNAR-MCAR will yield results similar to MNAR-MAR (both situations are nonignorable). The number of missing situations increases exponentially with the number of factors. As there are three mechanisms, a two-factor measurement or structural model can have a total of nine ($= 3^2$) missing situations (e.g., MCAR-MCAR, MNAR-MAR, etc.), a three-factor model can have twenty-seven ($= 3^3$) (e.g.,

MCAR-MCAR-MNAR, etc.), and so on. Since there has been no research on all incomplete variables and their various missing situations, this study employed a basic two-factor measurement model.

While the study by Enders and Peugh (2004) employed only Type B auxiliary variables, this study employed both Type A and Type B auxiliary variables, under MAR and MCAR, respectively. Notably, auxiliary variables were placed under the extant latent variables. In the field of psychometrics, CFA, a submodel of SEM, is commonly used to examine the construct validity of an assessment instrument. Specifically, by placing the auxiliary variables under extant latent variables, this study illustrates a situation where there exist alternative indicators and only a subset is selected for subsequent analysis. ML methods cannot execute this design, as they handle missing data problems simultaneously with model-fitting.

Furthermore, previous studies imposed a restriction that some of the variables were completely observed. However, correct imputation does not depend on some variables being completely observed, unless they are stratifying variables which are fixed (Schafer, 1997). Besides, in practice, it might be rare that some variables are completely observed. Thus, understanding how imputation functions where all variables are incomplete is warranted.

Another unexamined factor was the type of missingness. Previous studies only dealt with the linear type of missingness. For example, if students whose scores were in the lower group in the first wave did not participate in the second wave measurement, this is said to be MAR-linear. If both the upper and the lower groups in the first wave refused to come back for the second wave measurement, this is said to be MAR-convex (Collins et al., 2001). This study examined both

linear and convex types under MAR and MNAR situations. Lastly, the higher the correlation coefficient is between auxiliary and research variables, the better the inclusive strategy seems to perform (Collins et al., 2001). The study explored correlations between .48 and .72, which is explained in detail in the Methods section.

In conclusion, the effectiveness of the inclusive strategy with MI was investigated for use with CFA with incomplete data through a Monte Carlo simulation. The main research question was under what condition combinations the inclusive strategy would be effective in the CFA and MI framework. Specifically, the study examined the influence of sample size, missing rates, missing situations, missing types, and strategies on convergence failure, bias, standard error, and confidence interval coverage of parameters, and model fit.

Methods

There has been no research on all incomplete variables and their various missing situations, and therefore a basic two-factor measurement model was used (Enders & Peugh, 2004). There were 96 condition combinations (2 × 2 × 6 × 2 × 2), with 1000 replications for each combination: two sample sizes (N = 200, 500), two missing rates (10%, 20%), six missing situations (MCAR-MCAR, MCAR-MAR, MAR-MAR, MCAR-MNAR, MAR-MNAR, and MNAR-MNAR), two missing types (linear and convex), and two strategies (RS and IS). A six-observed, two-factor measurement model was examined in terms of convergence failure, bias, standard error, and confidence interval coverage of parameters, and model fit. All the conditions except sample size are closely associated with auxiliary

variables as follows. First, six different missing situations from a two-factor model served as one of the conditions (missing situations). Type B auxiliary variables were used under MCAR, and Type A auxiliary variables under MAR. MNAR had no relation to auxiliary variables.

Second, including auxiliary variables in the imputation model was the inclusive strategy, and omitting them was the restrictive strategy. This served as another condition (strategies) in this study. It is notable that missingness mechanism and strategies are confounded in some situations containing MAR and the restrictive strategy. For instance, design combination cells such as MAR-MAR with the restrictive strategy in fact cannot meet the ignorability assumption, since Type A auxiliary variables are not included in the imputation model with the restrictive strategy. For that combination, although the label is MAR-MAR, the missingness mechanism was nonignorable.

Third, there were two types of missing conditions for MAR and MNAR: linear and convex (Collins et al., 2001). Research variables of each factor were arranged in ascending order of the factor's auxiliary variable. For the linear missing conditions, the research variables corresponding to the lower values of the auxiliary variable had the same chance of being deleted, at the missing rate specified. For the convex missing conditions, the research variables corresponding to the upper and the lower values of the auxiliary variable had the same chance of being deleted, at the missing rate specified. This procedure deals with two conditions, missing types and missing rates.
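The design described above can be illustrated with a minimal Python sketch (the study itself was programmed in SAS). The sketch enumerates the condition combinations and applies linear or convex MAR deletion driven by the rank of an auxiliary variable. The function names, the `tail_fraction` and `p_delete` parameters, and the choice to delete each eligible data point independently are assumptions made for illustration, not the authors' implementation.

```python
import itertools
import numpy as np

# Design factors as listed in the text: 2 * 2 * 6 * 2 * 2 = 96 cells.
SAMPLE_SIZES = (200, 500)
MISSING_RATES = (0.10, 0.20)
SITUATIONS = ("MCAR-MCAR", "MCAR-MAR", "MAR-MAR",
              "MCAR-MNAR", "MAR-MNAR", "MNAR-MNAR")
MISSING_TYPES = ("linear", "convex")
STRATEGIES = ("RS", "IS")


def design_cells():
    """Return every combination of the five design factors."""
    return list(itertools.product(SAMPLE_SIZES, MISSING_RATES, SITUATIONS,
                                  MISSING_TYPES, STRATEGIES))


def mar_eligible(aux, tail_fraction, missing_type):
    """Mark cases whose auxiliary value falls in the deletion region.

    linear: the lowest `tail_fraction` of cases by auxiliary rank.
    convex: `tail_fraction` split evenly between the lowest and highest ranks.
    """
    n = len(aux)
    ranks = np.argsort(np.argsort(aux))  # rank 0 = smallest auxiliary value
    if missing_type == "linear":
        k = int(round(tail_fraction * n))
        return ranks < k
    if missing_type == "convex":
        k = int(round(tail_fraction / 2 * n))
        return (ranks < k) | (ranks >= n - k)
    raise ValueError(f"unknown missing type: {missing_type}")


def delete_mar(research, aux, tail_fraction, missing_type, p_delete, rng):
    """Set eligible data points to NaN, each with probability `p_delete`.

    Assumption: eligible data points are deleted independently of each other.
    """
    out = np.array(research, dtype=float)  # copy; the input is left untouched
    eligible = mar_eligible(aux, tail_fraction, missing_type)
    drop = eligible[:, None] & (rng.random(out.shape) < p_delete)
    out[drop] = np.nan
    return out
```

For example, `delete_mar(x, aux, 0.20, "linear", 0.50, rng)` makes the lowest 20% of cases eligible and deletes about half of their values, for an overall missing rate near 10%.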

Measurement Model

This study modified the measurement model of Enders and Peugh (2004), as there is no dominant measurement or structural model across simulation studies (Arbuckle, 1996; Brown, 1994; Enders, 2001; Enders & Peugh, 2004; Gold & Bentler, 2000; Muthén, Kaplan, & Hollis, 1987; Olinsky et al., 2003). The factor loadings of one factor were .60, .65, and .70 (Enders & Peugh, 2004), and the other factor had higher loadings of .80, .85, and .90 for the comparison of the different magnitudes of factor loadings. The error variances ($\Theta_{\delta}$) were defined as one minus the squared factor loadings ($1 - \lambda_x^2$), and the factor correlation ($\phi_{12}$) was set at .40 (Enders & Peugh, 2004). There was little difference between the two strategies when the correlations between observed variables ranged between .06 and .21 (Enders & Peugh, 2004). Thus, each auxiliary variable had the same factor loading of .80, to ensure that the correlations between auxiliary and research variables were not that low. After accounting for measurement error, the correlations between auxiliary and research variables ranged between .48 and .72 (the product of the two standardized loadings, from .80 × .60 = .48 to .80 × .90 = .72). These values are in between the .40 and .90 in Collins et al. (2001), and higher than the values between .06 and .21 in Enders and Peugh (2004). Therefore, this design is expected to yield more meaningful results than the study by Enders and Peugh (2004), if not as dramatic as those in Collins et al. (2001).

Data Generation

All the variables in this study were continuous and multivariate normal. Data were randomly generated by the RANNOR function in SAS/IML,

conforming to the measurement model of the study. Each factor had four observed variables, one of which served as the auxiliary variable for each latent variable. Therefore, an eight-variable, two-factor measurement model was used for data generation, but a six-variable, two-factor measurement model was evaluated, after excluding the two auxiliary variables.

Data deletion was executed with random uniform numbers. If the rank of the random number was equal to or less than the percentage specified (Enders & Peugh, 2004), the corresponding data point was deleted. For MCAR, every data point had the same chance of being deleted with a probability of 10% (20% for the 20% missing rate). For MAR-linear, the three research variables of each factor had a chance of being deleted with a probability of 50% if the auxiliary variable was in the lower 20% (40% for the 20% missing rate), which gives an expected missing rate of .50 × .20 = 10% (.50 × .40 = 20%). In the MAR-convex situations, data deletion was executed in the same manner as in MAR-linear, if the auxiliary variable was either in the lower or the upper 10% (both 20% for the 20% missing rate). For MNAR-linear, the lower 10% (20% for the 20% missing rate) of each research variable was deleted. For MNAR-convex, the upper and the lower 5% of each research variable were deleted (both 10% for the 20% missing rate). The two auxiliary variables of the study had an equal and independent probability of 10% missing.

Ignorability versus MNAR

The missingness mechanisms of any situations which include MNAR are nonignorable and those of MCAR-MCAR are ignorable. However, the ignorable situations associated with MAR may not be as straightforward as the former situations without proper understanding of missing data theory and the design of

the study. Whether the missingness mechanism of this study was ignorable or not depended on the combination of the missing situation (e.g., MCAR-MAR) and the strategy (e.g., restrictive or inclusive). Particularly under MAR, this study by design had the missingness of imputed variables depend on auxiliary variables, and the auxiliary variables were also correlates of the imputed variables. In other words, the auxiliary variables of this study were MAR-mediate variables. Therefore, under MAR, including the auxiliary variables (the inclusive strategy) would meet the ignorability assumption, while omitting the auxiliary variables (the restrictive strategy) would make the missingness mechanism nonignorable.

Consequently, there were largely two kinds of ignorable situations: (a) MCAR-MCAR combinations and (b) any MAR-IS combinations. As for nonignorable situations, there were also mainly two kinds: (a) any MNAR-included combinations and (b) any MAR-RS combinations. To reiterate, the missingness mechanisms of any MAR-IS combinations (e.g., MAR-MAR-IS and MCAR-MAR-IS) were ignorable, and those of any MAR-RS combinations (e.g., MAR-MAR-RS and MCAR-MAR-RS) were in fact nonignorable. Specifically, the MNAR-MNAR combination was the most severe case among the nonignorable cases, and was expected to produce the most problematic estimates.

In a similar vein, it needs to be noted that throughout this paper MNAR as a missing situation element (e.g., MNAR in MCAR-MNAR) indicated that the missingness of a variable depended on the variable itself, whereas nonignorable cases included both MAR-RS and MNAR cases. In theories of missing data, MNAR is interchangeable with nonignorability, but the missingness mechanisms of this study were more complex than is typically assumed, so

that this distinction had to be made.

Data Analysis

Upon the completion of the missing data generation, there were four remaining steps. First, multiple imputation (MI) was conducted using PROC MI in SAS, and ten imputed data sets were created. Although five sets are considered sufficient for MI (Schafer, 1997; Little & Rubin, 2002) and five is the default in SAS (SAS Institute, 2001), more runs of MI are expected to yield more stable results (Allison, 2003; Collins et al., 2001; Enders, 2006). Second, CFA was conducted with each of the imputed data sets by PROC CALIS. Third, the results of the second step were combined into a single analysis by PROC MIANALYZE. Fourth, bias, standard error, and confidence interval coverage of each parameter were obtained, as well as convergence failures and model fit. All the programs were written in SAS/BASE, IML, MACRO, and STAT.

Evaluation Criteria

The evaluation criteria were convergence failure (Enders & Bandalos, 2001; Gold & Bentler, 2000; Newman, 2003; Olinsky et al., 2003), bias of parameter estimates (Arbuckle, 1996; Enders & Bandalos, 2001; Gold & Bentler, 2000; Newman, 2003; Olinsky et al., 2003; Wothke, 2000), standard error of parameter estimates (Newman, 2003), confidence interval coverage (Enders & Peugh, 2004), and model fit (Enders & Bandalos, 2001; Enders & Peugh, 2004; Olinsky et al., 2003). Convergence failure was obtained by counting the number of replications that did not converge within 1000 iterations (Jöreskog & Sörbom, 1993). There were 13 bias, standard error, and confidence interval coverage of parameters, respectively,

consisting of four factor loadings and nine variances. Since there were 96 condition combinations, a total of 1248 (= 13 × 96) bias, standard error, and confidence interval coverage values were obtained, respectively.

Bias of a parameter estimate is defined as the difference between the estimate and its population parameter. In Monte Carlo simulations, bias can be obtained as the difference between the average of the estimates over K replications ($\hat{\theta}$; Equation 1) and the population parameter in the design combination cell ($\theta$). For better interpretation, the bias of this study, the percentage of relative bias, was obtained as in Equation 2 (Enders & Bandalos, 2001; Olinsky et al., 2003), and is referred to as bias throughout the study:

\[ \hat{\theta} = \sum_{k=1}^{K} \hat{\theta}_k / 1000. \tag{1} \]

\[ \text{Bias} = (\hat{\theta} - \theta)/\theta \times 100. \tag{2} \]

Bias smaller than 10% to 15% is considered to be acceptable for SEM research (Enders & Bandalos, 2001; Muthén et al., 1987). As for the confidence interval coverage, the 95% confidence interval is expected to cover the population parameter 95% of the time. Coverage below 90% is considered to be unacceptable (Collins et al., 2001).

Finally, model fit was assessed by the percentage of model rejections for each design combination cell, using an overall F-test (Allison, 2001; Li, Raghunathan, & Rubin, 1991; Meng & Rubin, 1992; Schafer, 1997). An overall F-test statistic was obtained with every replication, and a binary variable was created to indicate the status of model fit. For instance, if the p-value of an F-test statistic is smaller than .05, then the binary variable will be recorded as 1. A

21 Inclusive strategy with CFA/MI 21 p-value equal to or larger than.05 will make the binary variable as 0. The average of this binary variable was be calculated across condition combinations, which served as model fit. This test statistic allows for appropriate model fit decisions. Since the data were generated from the population covariance matrix with perfect fit, model fit was expected to reach the nominal rate of.05 with the correct model (Enders & Bandalos, 2001; Enders & Peugh, 2004). Results After missing data generation, data sets sometimes had rows whose observations were all missing. However, it rarely occurred, at most 0.15% under some condition combinations, and most of the data sets were free from this problem. Therefore, these rows with all missing observations were simply deleted from the subsequent analysis. Convergence Failures Nonconvergence was none or minimal in every condition combination. Most of the cells reported no convergence failures, and even the most problematic condition yielded less than 5% convergence failures. Particularly, the linear type of missingness and the inclusive strategy (IS) tended to yield fewer convergence failures than the convex type of missingness and the restrictive strategy (RS). Compared to the sample of 500, the sample size of 200 resulted in most of the nonconvergence, and the missing situation of MNAR-MNAR yielded more convergence failures than the other missing situations, when all the other conditions were the same.
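The bias and model fit criteria defined under Evaluation Criteria can be sketched in a few lines. The study itself was programmed in SAS; the Python below is only an illustrative sketch, and the function names and the example numbers are hypothetical rather than taken from the study.

```python
from statistics import mean

def percent_relative_bias(estimates, theta):
    """Equations 1 and 2: average the replication estimates, then express
    the deviation from the population value theta as a percentage of theta."""
    theta_bar = mean(estimates)
    return (theta_bar - theta) / theta * 100

def rejection_rate(p_values, alpha=0.05):
    """Proportion of replications in which the overall test rejects the model.
    A replication is coded 1 when p < alpha and 0 otherwise."""
    flags = [1 if p < alpha else 0 for p in p_values]
    return mean(flags)

# Hypothetical estimates of one factor loading whose population value is .60
loading_estimates = [0.58, 0.61, 0.63, 0.62]
bias = percent_relative_bias(loading_estimates, theta=0.60)

# Hypothetical p-values from five replications in one design cell
rate = rejection_rate([0.03, 0.20, 0.64, 0.049, 0.05])
```

With the correct model, a rejection rate computed this way would be expected to sit near the nominal .05, and a bias value would be judged against the 10% to 15% guideline cited above.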

Bias

As expected, ignorable situations resulted in acceptably low levels of bias for every parameter estimate. The inclusive strategy tended to yield smaller levels of bias than the restrictive strategy in almost all condition combinations, but it was not able to completely eliminate the problematic bias resulting from MNAR. The inclusive strategy was most effective with the convex-nonignorable cases caused by the restrictive strategy.

More specifically, all the ignorable cases, including any MCAR-MCAR and MAR-IS combinations, produced negligible levels of bias irrespective of sample size, missing rate, and missing type. Surprisingly, the linear-nonignorable cases of MAR-RS were so robust that they also yielded negligible levels of bias. However, the convex-nonignorable cases of MAR-RS introduced problematic levels of bias, a problem that was solved by employing the inclusive strategy. The inclusive strategy did not completely solve the bias problems of any MNAR-included cases, but it lowered the bias count for those problematic cases, particularly with linear-10% combinations and with most of the convex combinations except 500-MNAR-MNAR (see Table 2). The convex-RS condition most clearly showed the effect of missing situation on bias count, depending on the level of nonignorability: MCAR-MAR resulted in fewer biased estimates than MAR-MAR, followed by MCAR-MNAR, then MAR-MNAR, and lastly MNAR-MNAR (see Table 2).

Overall, the linear-IS combinations tended to yield the smallest levels of bias, and the convex-RS combinations the largest. There was no obvious trend between the linear-RS and convex-IS combinations with regard to levels of bias. Sample size appears to have had little influence with the linear type of missingness, but the sample size of 200 tended to yield more estimation problems with convex-RS.

Standard Error

Strategies and missingness types had little effect on standard error estimation, particularly under ignorability. However, when the missingness situations were nonignorable, the inclusive strategy and the linear type of missingness tended to outperform the restrictive strategy and the convex type of missingness, respectively. This was most noticeable with estimates of factor loadings. In addition, the smaller sample size and the higher missing rate resulted in larger standard errors.

Confidence Interval

The rule of flagging bias larger than 10% in absolute value appears more conservative than the rule of flagging coverage below 90%, but the results for bias and coverage were generally consistent with each other. As with the bias of parameter estimates, the inclusive strategy was unable to solve the coverage problems in the nonignorable cases caused by MNAR. However, the inclusive strategy tended to yield higher coverage rates than the restrictive strategy, particularly in cases with problematic coverage rates (see Table 3).
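The standard errors and coverage rates reported here rest on combining the results from the ten imputed data sets. A minimal sketch of that combining step, using Rubin's rules, is given below. It is not the study's SAS code: the function names and example values are hypothetical, and the interval uses a normal reference distribution as a simplifying assumption, whereas MI software typically uses a t reference distribution with adjusted degrees of freedom.

```python
from statistics import NormalDist, mean, variance

def pool_rubin(estimates, squared_ses):
    """Combine m point estimates and their squared standard errors.
    Total variance = within-imputation variance
                     + (1 + 1/m) * between-imputation variance."""
    m = len(estimates)
    q_bar = mean(estimates)
    within = mean(squared_ses)
    between = variance(estimates)  # sample variance, denominator m - 1
    total = within + (1 + 1 / m) * between
    return q_bar, total ** 0.5

def covers(q_bar, se, theta, level=0.95):
    """True if the interval around the pooled estimate contains theta.
    Assumption: normal reference distribution instead of Rubin's t."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return q_bar - z * se <= theta <= q_bar + z * se

def coverage_rate(pooled_results, theta):
    """Proportion of replications whose interval covers theta."""
    return mean([1 if covers(q, se, theta) else 0 for q, se in pooled_results])

# Hypothetical results for one parameter from three imputed data sets
q_bar, se = pool_rubin([0.60, 0.62, 0.58], [0.0025, 0.0025, 0.0025])
```

A coverage rate computed over all replications in a design cell would then be compared with the 90% threshold used in this study.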

Model Fit

Model fit in this study was obtained excluding nonconverged cases. Specifically, no more than 0.1% of the replications (e.g., fewer than 10 out of 10000) failed to report model fit within each condition combination. Model fit was quite stable across the condition combinations, ranging between .05 and .10. The lower missing rate and the larger sample size tended to result in model fit closer to the optimal level of .05 (see Table 4).

Discussion

Like other statistical methods, analysis with MI relies on some critical assumptions, violation of which invalidates the results. In particular, the ignorability assumption needs to be satisfied for correct inferences, and including relevant auxiliary variables in the imputation model, although they are not of research interest, is believed to increase the likelihood that the ignorability assumption holds. Strictly speaking, however, the missingness mechanism of real data sets is unlikely to meet the ignorability assumption unless missingness is completely under the researcher's control (Schafer & Graham, 2002). The questions now are how serious the consequences of violating this assumption are and how effective the inclusive strategy will be.

First, this simulation study found that the effectiveness of the inclusive strategy varied by missingness type and missingness situation. There were three main missingness cases determined by missing situation and strategy: (a) ignorable cases, (b) nonignorable cases caused by MAR-RS, and (c) nonignorable cases caused by MNAR. As expected, MI performed very well under ignorability regardless of missingness type, missingness situation, and strategy. For the nonignorable cases caused by MAR-RS, MI was surprisingly robust with the linear type of missingness, yielding negligible levels of bias, relatively small standard error estimates, and acceptable levels of confidence interval coverage. On the other hand, the nonignorable cases caused by MAR-RS led to unacceptable parameter estimates when the missingness type was convex, a problem that was completely resolved by employing the inclusive strategy. Although the inclusive strategy either outperformed or performed as well as the restrictive strategy on the evaluation criteria, it was unable to eliminate the problems of the nonignorable cases caused by MNAR. Even with only a 10% missing rate, MNAR-included situations yielded irreparably problematic parameter estimates, and increasing the missing rate tended to exacerbate the problems.

Interestingly, the linear type of missingness overall yielded better results than the convex type, which was most obvious with the nonignorable cases resulting from MAR-RS. This may be due to the design of the missingness type: the convex type of missingness tended to delete the upper and the lower values, while the linear type removed only the lower values. Removing both ends of the data set leads to more restriction of range than removing only one end. More restriction of range corresponds to a greater reduction in variability, which in turn may result in more biased estimates, larger standard error estimates, and more disrupted confidence interval coverage of variability. Likewise, Collins et al. (2001) reported that the convex type of missingness generally yielded more problems with estimates of variability than the linear type with regard to bias, RMSE, and confidence interval coverage. Given this, it is not surprising that this CFA design was more adversely influenced by the convex type of missingness, since CFA and SEM utilize covariance matrices.

Second, and related to the first finding, the degree of nonignorability influenced parameter estimation, which may be intuitively appealing but had not been investigated. Among the MNAR-included situations, the MNAR-MNAR combinations always yielded the most problematic estimates, followed by the other MNAR-included situations. In particular, the MAR-MNAR combinations appeared more problematic than the MCAR-MNAR combinations with regard to bias of parameter estimation when the missingness type was convex, and with regard to confidence interval coverage when the missingness type was linear and the missing rate was 20%. Because MCAR is a more restrictive assumption than MAR, combinations containing MCAR appear to have produced better results than those containing MAR when other conditions were the same. Similarly, when nonignorability was introduced by the restrictive strategy, MCAR-MAR combinations tended to produce better parameter estimates than MAR-MAR combinations, particularly when the missingness type was convex. Compared to the nonignorable cases caused by MNAR, these nonignorable cases caused by MAR-RS showed fewer problematic estimates.

As discussed, the nonignorable cases resulting from MNAR unavoidably yielded problematic estimates. Considering the missing data generation procedures, it can be said that the nonignorable cases caused by MNAR have much stronger nonignorability than those caused by MAR-RS. Because MNAR of a variable depended on the variable itself, once the variable was removed from the analysis, there appears to be no way to recover from this situation. On the other hand, whether MAR of a variable was ignorable depended on whether its auxiliary variable was included in the imputation model. Omitting the auxiliary variables from the imputation model made the missingness mechanism nonignorable, but these nonignorable cases caused by MAR-RS readily changed to ignorable cases once the inclusive strategy was employed. To summarize, MNAR-MNAR situations had the strongest nonignorability, followed by the other MNAR-included situations, and finally the MAR-RS nonignorable cases.

Third, and as expected, the magnitude of the correlation between auxiliary and research variables appears critical to whether the inclusive strategy will outperform the restrictive strategy. There was little difference between the two strategies when the correlations between observed variables ranged from .06 to .21 (Enders & Peugh, 2004). The present study explored correlations between .48 and .72, and its results generally showed the superiority of the inclusive strategy, but not as noticeably as in the study by Collins et al. (2001), whose higher correlation coefficient was .90. This finding was not surprising: the higher the correlations between auxiliary and research variables, the more information the auxiliary variables carry about the variables of research interest, and therefore the more the analysis benefits from including them.

More specifically, there was little difference in parameter estimation between different magnitudes of factor loadings within a factor, partly because the factor loadings increased in increments of .05 (e.g., .60, .65, and .70). However, the higher correlation coefficients between .64 and .72 appear to have been more sensitive to the degree of nonignorability than the lower correlation coefficients between .48 and .56. The variance estimates associated with the lower correlation coefficients yielded unacceptable levels of bias and confidence interval coverage only in MNAR-MNAR situations, while those associated with the higher correlation coefficients produced problematic parameter estimates in the other MNAR-included situations as well. In other words, the parameter estimates associated with the higher correlation coefficients seem to have been more easily disrupted by any degree of MNAR-included nonignorability than those associated with the lower correlation coefficients. This may be because the higher correlations were less successfully recovered after the variables were removed in a nonrandom fashion and then multiply imputed. The lower correlations, on the other hand, were relatively stable after deletion and imputation, and were disrupted only under the strongest nonignorability.

Fourth, as the first SEM/MI simulation study, this study was also the first to report model fit, employing one of the appropriate procedures, that of Meng and Rubin (1992), which was first proposed by Li et al. (1991). Compared to the previous ML studies, multiple imputation (MI) appears to yield quite stable model fit across condition combinations, and missing rate and sample size seem to have an effect on model fit. Specifically, the model fit of this SEM/MI study ranged between .04 and .10, but in most of the condition combinations the model fit was closer to the optimal .05 rate. In contrast, model fit with EM fluctuated widely between .00 and 1.00, especially depending on the choice of sample size (Enders & Peugh, 2004), while model fit with FIML showed better results than EM, varying between .06 and .20 (Enders & Peugh, 2004) or between .02 and .08 (Enders & Bandalos, 2001).
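The restriction-of-range argument made earlier in this Discussion can be illustrated with a deliberately simplified sketch. The assumptions here are not the study's data generation procedure: the scores are evenly spaced quantiles of a standard normal distribution, and deletion is deterministic (the lowest 20% for a "linear" pattern, the lowest and highest 10% for a "convex" pattern), whereas the study's mechanisms only tended to remove such values. All names are hypothetical.

```python
from statistics import NormalDist, variance

n = 100
# Evenly spaced quantiles stand in for a bell-shaped variable (sorted ascending)
scores = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

# "Linear" pattern: drop the lowest 20% of scores
linear_observed = scores[20:]

# "Convex" pattern: drop the lowest 10% and the highest 10% of scores
convex_observed = scores[10:90]

var_full = variance(scores)
var_linear = variance(linear_observed)
var_convex = variance(convex_observed)

# Both patterns keep 80 cases, but trimming both tails shrinks the
# variance of the observed scores more than trimming one tail.
print(round(var_full, 3), round(var_linear, 3), round(var_convex, 3))
```

Under these assumptions the observed variance is smallest for the two-tailed pattern, which is consistent with the observation that variance-based analyses such as CFA were more adversely affected by the convex type of missingness.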

Recommendations

Recommendations for Research

The conditions of this study, such as missingness type and missing situation, are mostly unverifiable in practice. Nevertheless, the results of this simulation study showed that the inclusive strategy outperformed the restrictive strategy with regard to parameter estimation in most condition combinations, which supports the use of the inclusive strategy. Specifically, the findings correspond to the following recommendations for research. First, the inclusive strategy is recommended, particularly with the MAR-RS nonignorable cases. Second, given a choice, an MCAR mechanism is preferable to MAR, particularly when the missingness type is likely to be convex. Third, when the correlation between auxiliary and research variables is relatively high (e.g., .48), the auxiliary variables need to be included in the imputation model.

More specifically, the design of this study is applicable to the CFA context where item selection is required due to the presence of many alternative items (e.g., instrument development from an item bank). Including all the alternative items in the imputation model will increase the likelihood of meeting the ignorability assumption and thus of producing correct inferences in the CFA context, and possibly in the SEM context. However, an overly complex model is as undesirable as an overly simple model, particularly in the context of CFA or SEM with ML. ML methods handle missing data problems simultaneously with model fitting, and therefore all the parameters relating to auxiliary variables are estimated with ML. Such a complex model with ML may suffer from reduced effect size problems. In contrast, imputation is completed before model fitting in MI, and thus MI does not estimate the unnecessary parameters associated with auxiliary variables. To be precise, MI does not estimate any of the parameters of the analysis model, which are dealt with in the subsequent analysis phase such as CFA or SEM. In the same context, only MI can perform the inclusive strategy of this design: the auxiliary indicators of this study fall under the same latent variable as the indicators of research interest, and ML methods cannot execute this design. In conclusion, MI with the inclusive strategy, applied before further analysis such as Cronbach's α, will take care of missing data problems better than ad hoc methods, especially in cases where many alternative items exist and item selection is therefore required.

Recommendations for Practitioners

A well-done imputation can solve many analysts' future problems caused by incomplete data (Schafer, 1997). Although MI can be exceptionally useful with large-scale data sets (e.g., complex surveys), MI can benefit any analysis, except when the missing rate is minimal. First, the imputation model should not hinder common analysis models. For instance, if a correlation between two variables is a common research interest, then the two variables should be included in the imputation model, and the correlation between them should not be restricted to a certain value such as 0 (Barnard & Meng, 1999; Schafer, 1997). Second, stratifying (Rubin, 1996; Wothke, 2000) or design (Schafer, 1997) variables should be in the imputation model. Examples are survey strata such as gender, ethnicity, citizenship, or region of residence. Omitting the stratifying variables is very likely to result in the violation of assumptions such as ignorability. Besides, the stratifying

The Relative Performance of Full Information Maximum Likelihood Estimation for Missing Data in Structural Equation Models

The Relative Performance of Full Information Maximum Likelihood Estimation for Missing Data in Structural Equation Models University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Educational Psychology Papers and Publications Educational Psychology, Department of 7-1-2001 The Relative Performance of

More information

Missing Data and Imputation

Missing Data and Imputation Missing Data and Imputation Barnali Das NAACCR Webinar May 2016 Outline Basic concepts Missing data mechanisms Methods used to handle missing data 1 What are missing data? General term: data we intended

More information

Selected Topics in Biostatistics Seminar Series. Missing Data. Sponsored by: Center For Clinical Investigation and Cleveland CTSC

Selected Topics in Biostatistics Seminar Series. Missing Data. Sponsored by: Center For Clinical Investigation and Cleveland CTSC Selected Topics in Biostatistics Seminar Series Missing Data Sponsored by: Center For Clinical Investigation and Cleveland CTSC Brian Schmotzer, MS Biostatistician, CCI Statistical Sciences Core brian.schmotzer@case.edu

More information

Logistic Regression with Missing Data: A Comparison of Handling Methods, and Effects of Percent Missing Values

Logistic Regression with Missing Data: A Comparison of Handling Methods, and Effects of Percent Missing Values Logistic Regression with Missing Data: A Comparison of Handling Methods, and Effects of Percent Missing Values Sutthipong Meeyai School of Transportation Engineering, Suranaree University of Technology,

More information

S Imputation of Categorical Missing Data: A comparison of Multivariate Normal and. Multinomial Methods. Holmes Finch.

S Imputation of Categorical Missing Data: A comparison of Multivariate Normal and. Multinomial Methods. Holmes Finch. S05-2008 Imputation of Categorical Missing Data: A comparison of Multivariate Normal and Abstract Multinomial Methods Holmes Finch Matt Margraf Ball State University Procedures for the imputation of missing

More information

Advanced Handling of Missing Data

Advanced Handling of Missing Data Advanced Handling of Missing Data One-day Workshop Nicole Janz ssrmcta@hermes.cam.ac.uk 2 Goals Discuss types of missingness Know advantages & disadvantages of missing data methods Learn multiple imputation

More information

Multiple Imputation For Missing Data: What Is It And How Can I Use It?

Multiple Imputation For Missing Data: What Is It And How Can I Use It? Multiple Imputation For Missing Data: What Is It And How Can I Use It? Jeffrey C. Wayman, Ph.D. Center for Social Organization of Schools Johns Hopkins University jwayman@csos.jhu.edu www.csos.jhu.edu

More information

An Introduction to Multiple Imputation for Missing Items in Complex Surveys

An Introduction to Multiple Imputation for Missing Items in Complex Surveys An Introduction to Multiple Imputation for Missing Items in Complex Surveys October 17, 2014 Joe Schafer Center for Statistical Research and Methodology (CSRM) United States Census Bureau Views expressed

More information

MISSING DATA AND PARAMETERS ESTIMATES IN MULTIDIMENSIONAL ITEM RESPONSE MODELS. Federico Andreis, Pier Alda Ferrari *

MISSING DATA AND PARAMETERS ESTIMATES IN MULTIDIMENSIONAL ITEM RESPONSE MODELS. Federico Andreis, Pier Alda Ferrari * Electronic Journal of Applied Statistical Analysis EJASA (2012), Electron. J. App. Stat. Anal., Vol. 5, Issue 3, 431 437 e-issn 2070-5948, DOI 10.1285/i20705948v5n3p431 2012 Università del Salento http://siba-ese.unile.it/index.php/ejasa/index

More information

Module 14: Missing Data Concepts

Module 14: Missing Data Concepts Module 14: Missing Data Concepts Jonathan Bartlett & James Carpenter London School of Hygiene & Tropical Medicine Supported by ESRC grant RES 189-25-0103 and MRC grant G0900724 Pre-requisites Module 3

More information

Accuracy of Range Restriction Correction with Multiple Imputation in Small and Moderate Samples: A Simulation Study

Accuracy of Range Restriction Correction with Multiple Imputation in Small and Moderate Samples: A Simulation Study A peer-reviewed electronic journal. Copyright is retained by the first or sole author, who grants right of first publication to Practical Assessment, Research & Evaluation. Permission is granted to distribute

More information

Best Practice in Handling Cases of Missing or Incomplete Values in Data Analysis: A Guide against Eliminating Other Important Data

Best Practice in Handling Cases of Missing or Incomplete Values in Data Analysis: A Guide against Eliminating Other Important Data Best Practice in Handling Cases of Missing or Incomplete Values in Data Analysis: A Guide against Eliminating Other Important Data Sub-theme: Improving Test Development Procedures to Improve Validity Dibu

More information

SESUG Paper SD

SESUG Paper SD SESUG Paper SD-106-2017 Missing Data and Complex Sample Surveys Using SAS : The Impact of Listwise Deletion vs. Multiple Imputation Methods on Point and Interval Estimates when Data are MCAR, MAR, and

More information

A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY

A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY Lingqi Tang 1, Thomas R. Belin 2, and Juwon Song 2 1 Center for Health Services Research,

More information

Master thesis Department of Statistics

Master thesis Department of Statistics Master thesis Department of Statistics Masteruppsats, Statistiska institutionen Missing Data in the Swedish National Patients Register: Multiple Imputation by Fully Conditional Specification Jesper Hörnblad

More information

Missing by Design: Planned Missing-Data Designs in Social Science

Missing by Design: Planned Missing-Data Designs in Social Science Research & Methods ISSN 1234-9224 Vol. 20 (1, 2011): 81 105 Institute of Philosophy and Sociology Polish Academy of Sciences, Warsaw www.ifi span.waw.pl e-mail: publish@ifi span.waw.pl Missing by Design:

More information

Running head: SELECTION OF AUXILIARY VARIABLES 1. Selection of auxiliary variables in missing data problems: Not all auxiliary variables are

Running head: SELECTION OF AUXILIARY VARIABLES 1. Selection of auxiliary variables in missing data problems: Not all auxiliary variables are Running head: SELECTION OF AUXILIARY VARIABLES 1 Selection of auxiliary variables in missing data problems: Not all auxiliary variables are created equal Felix Thoemmes Cornell University Norman Rose University

More information

Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation study

Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation study STATISTICAL METHODS Epidemiology Biostatistics and Public Health - 2016, Volume 13, Number 1 Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation

More information

Help! Statistics! Missing data. An introduction

Help! Statistics! Missing data. An introduction Help! Statistics! Missing data. An introduction Sacha la Bastide-van Gemert Medical Statistics and Decision Making Department of Epidemiology UMCG Help! Statistics! Lunch time lectures What? Frequently

More information

Exploring the Impact of Missing Data in Multiple Regression

Exploring the Impact of Missing Data in Multiple Regression Exploring the Impact of Missing Data in Multiple Regression Michael G Kenward London School of Hygiene and Tropical Medicine 28th May 2015 1. Introduction In this note we are concerned with the conduct

More information

Modern Strategies to Handle Missing Data: A Showcase of Research on Foster Children

Modern Strategies to Handle Missing Data: A Showcase of Research on Foster Children Modern Strategies to Handle Missing Data: A Showcase of Research on Foster Children Anouk Goemans, MSc PhD student Leiden University The Netherlands Email: a.goemans@fsw.leidenuniv.nl Modern Strategies

More information

Section on Survey Research Methods JSM 2009

Section on Survey Research Methods JSM 2009 Missing Data and Complex Samples: The Impact of Listwise Deletion vs. Subpopulation Analysis on Statistical Bias and Hypothesis Test Results when Data are MCAR and MAR Bethany A. Bell, Jeffrey D. Kromrey

More information

Catherine A. Welch 1*, Séverine Sabia 1,2, Eric Brunner 1, Mika Kivimäki 1 and Martin J. Shipley 1

Catherine A. Welch 1*, Séverine Sabia 1,2, Eric Brunner 1, Mika Kivimäki 1 and Martin J. Shipley 1 Welch et al. BMC Medical Research Methodology (2018) 18:89 https://doi.org/10.1186/s12874-018-0548-0 RESEARCH ARTICLE Open Access Does pattern mixture modelling reduce bias due to informative attrition

More information

Some General Guidelines for Choosing Missing Data Handling Methods in Educational Research

Some General Guidelines for Choosing Missing Data Handling Methods in Educational Research Journal of Modern Applied Statistical Methods Volume 13 Issue 2 Article 3 11-2014 Some General Guidelines for Choosing Missing Data Handling Methods in Educational Research Jehanzeb R. Cheema University

More information

Abstract. Introduction A SIMULATION STUDY OF ESTIMATORS FOR RATES OF CHANGES IN LONGITUDINAL STUDIES WITH ATTRITION

Abstract. Introduction A SIMULATION STUDY OF ESTIMATORS FOR RATES OF CHANGES IN LONGITUDINAL STUDIES WITH ATTRITION A SIMULATION STUDY OF ESTIMATORS FOR RATES OF CHANGES IN LONGITUDINAL STUDIES WITH ATTRITION Fong Wang, Genentech Inc. Mary Lange, Immunex Corp. Abstract Many longitudinal studies and clinical trials are

More information

Multiple imputation for handling missing outcome data when estimating the relative risk

Multiple imputation for handling missing outcome data when estimating the relative risk Sullivan et al. BMC Medical Research Methodology (2017) 17:134 DOI 10.1186/s12874-017-0414-5 RESEARCH ARTICLE Open Access Multiple imputation for handling missing outcome data when estimating the relative

More information

Strategies for handling missing data in randomised trials

Strategies for handling missing data in randomised trials Strategies for handling missing data in randomised trials NIHR statistical meeting London, 13th February 2012 Ian White MRC Biostatistics Unit, Cambridge, UK Plan 1. Why do missing data matter? 2. Popular

More information

The prevention and handling of the missing data

The prevention and handling of the missing data Review Article Korean J Anesthesiol 2013 May 64(5): 402-406 http://dx.doi.org/10.4097/kjae.2013.64.5.402 The prevention and handling of the missing data Department of Anesthesiology and Pain Medicine,

More information

Analysis of TB prevalence surveys

Analysis of TB prevalence surveys Workshop and training course on TB prevalence surveys with a focus on field operations Analysis of TB prevalence surveys Day 8 Thursday, 4 August 2011 Phnom Penh Babis Sismanidis with acknowledgements

More information

On the Performance of Maximum Likelihood Versus Means and Variance Adjusted Weighted Least Squares Estimation in CFA

On the Performance of Maximum Likelihood Versus Means and Variance Adjusted Weighted Least Squares Estimation in CFA STRUCTURAL EQUATION MODELING, 13(2), 186 203 Copyright 2006, Lawrence Erlbaum Associates, Inc. On the Performance of Maximum Likelihood Versus Means and Variance Adjusted Weighted Least Squares Estimation

More information

Confidence Intervals On Subsets May Be Misleading

Confidence Intervals On Subsets May Be Misleading Journal of Modern Applied Statistical Methods Volume 3 Issue 2 Article 2 11-1-2004 Confidence Intervals On Subsets May Be Misleading Juliet Popper Shaffer University of California, Berkeley, shaffer@stat.berkeley.edu

More information

In this module I provide a few illustrations of options within lavaan for handling various situations.

In this module I provide a few illustrations of options within lavaan for handling various situations. In this module I provide a few illustrations of options within lavaan for handling various situations. An appropriate citation for this material is Yves Rosseel (2012). lavaan: An R Package for Structural

More information

Cahiers Recherche et Méthodes

Cahiers Recherche et Méthodes Numéro 1 Janvier 2012 Cahiers Recherche et Méthodes Multiple imputation in a longitudinal context: A simulation study using the TREE data André Berchtold & Joan-Carles Surís Jean-Philippe Antonietti &

More information

Dichotomizing partial compliance and increased participant burden in factorial designs: the performance of four noncompliance methods
