Running head: SELECTION OF AUXILIARY VARIABLES

Selection of auxiliary variables in missing data problems: Not all auxiliary variables are created equal

Felix Thoemmes
Cornell University

Norman Rose
University of Tuebingen

Author Note

The authors would like to thank the participants of the colloquium of the Methodology Center at Pennsylvania State University. Inquiries regarding this article should be addressed to the first author, Felix Thoemmes, MVR G62A, Cornell University, Ithaca, NY 14853, felix.thoemmes@cornell.edu.

Abstract

The treatment of missing data in the social sciences has changed tremendously during the last decade. Modern missing data techniques such as multiple imputation and full-information maximum likelihood are used much more frequently. These methods assume that data are missing at random. One very common approach to making the missing at random assumption more plausible consists of including many covariates as so-called auxiliary variables. These variables are either selected based on data considerations or in an inclusive fashion, i.e., by taking all available auxiliary variables. However, neither approach accounts for the fact that under a wide range of circumstances there is a class of variables that, when used as auxiliary variables, will always increase bias in the estimation of parameters from data with missing values. In this paper we show that this bias exists, quantify it in a simulation study, and discuss ways to avoid selecting bias-inducing covariates as auxiliary variables.

Keywords: missing data, auxiliary variables, multiple imputation, full information maximum likelihood

Introduction

The presence of missing data is a prevalent problem in social science research (Peugh & Enders, 2004). Given that a large portion of social science studies are conducted outside the confines of a laboratory, the threat of missing data due to non-compliance or attrition is even more pronounced. The pervasiveness of this problem has triggered much research during the last 30 years. Rubin (1976) laid the foundation of modern missing data theory, which has culminated in sophisticated methods to deal with missing values, specifically full-information maximum likelihood (FIML) and multiple imputation (MI). For an overview see, e.g., Enders (2010). Both of these so-called modern missing data techniques are expected to yield unbiased estimates of parameters in the presence of missing data, given that certain assumptions about missingness hold. It should be noted that MI in particular, while conceptually straightforward (Rubin, 1996), can be conducted with various techniques; see, e.g., Schafer (1999), King, Honaker, Joseph, and Scheve (2001), van Buuren and Groothuis-Oudshoorn (2011), or Raghunathan, Lepkowski, Hoewyk, and Solenberger (2001). However, despite computational differences, all techniques, whether FIML or variants of MI, rely on the same untestable assumptions, notably the missing at random (MAR) assumption (Rubin, 1976), which we will define more formally later in the manuscript.

The goal of this paper is to critically examine current recommendations to increase the plausibility of MAR, especially with regard to the selection of auxiliary variables. We argue that the current recommendations are incomplete and ignore the possibility of complex relationships between substantive analysis variables and variables that are solely used to improve the missing data estimation, so-called auxiliary variables. Further, we believe that the complexities of the assumptions are not widely appreciated among social science researchers and many quantitative scientists alike, who have long believed that inclusion of as many auxiliary variables as possible is a safe strategy to asymptotically achieve or approximate unbiasedness. We will show in a small example and a larger simulation study that this strategy is not guaranteed to yield unbiased results and that biases due to missing data and the use of auxiliary variables are much more complex than previously thought. As a result, the use of modern missing data techniques, while laudable, often does not guarantee that bias in studies with missing data has been adequately dealt with.

We will first review classic missingness mechanisms and discuss which conditional independencies these conditions imply and how these independencies can be encoded in a graph. Further, we demonstrate that there are situations and classes of variables that should not be used as auxiliary variables in FIML or MI because they tend to increase bias. We will quantify the bias in our simulation studies and suggest possible ways to avoid it. Finally, we will discuss implications for applied research and offer an alternative framework to think about and communicate assumptions of missing data problems.

Missing data mechanism

We begin by reviewing the classic mechanisms defined by Rubin (1976): missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). In our overview we use a slightly modified version of the notation employed by Schafer and Graham (2002). In addition, we express missingness mechanisms using conditional independence statements. In conjunction with the conditional independence statements, we present graphical displays to illustrate the mechanisms. Using graphs to illustrate how missingness relates to other variables in a model is not a novel approach and has in fact been used in popular texts and articles to aid understanding of the mechanisms (Enders, 2010; Schafer & Graham, 2002). In this paper, however, we do not use graphs simply as illustrations, but also use formal graph theory (Pearl, 2000) to derive certain results.

MCAR

Following the notation of Schafer and Graham (2002), we denote an $N \times K$ data matrix by $Y$. The rows of $Y$ represent the cases $n = 1, \ldots, N$ of the sample and the columns represent the variables $i = 1, \ldots, K$. $Y$ can be partitioned into an observed part, labeled $Y_{obs}$, and a missing part, $Y_{mis}$, which yields $Y = (Y_{obs}, Y_{mis})$. Further, we denote an indicator matrix of missingness, $R$, whose elements take on values of 0 or 1 for observed or missing values of $Y$, respectively. Accordingly, $R$ is also an $N \times K$ matrix. Each variable in $Y$ can therefore have both observed and unobserved values.

Missing completely at random (MCAR) is the most restrictive assumption, but, when fulfilled, the least problematic. It states that the unconditional distribution of missingness $P(R)$ is equal to the conditional distribution of missingness given $Y_{obs}$ and $Y_{mis}$, or simply $Y$:

$$P(R \mid Y) = P(R \mid Y_{obs}, Y_{mis}) = P(R) \tag{1}$$

These equalities of probabilities imply (can be expressed as) conditional independence statements, here in particular

$$R \perp (Y_{obs}, Y_{mis}). \tag{2}$$

The MCAR condition is therefore fulfilled when the missingness has no relationship with either the observed or the unobserved part of $Y$. In an applied research context we could imagine MCAR being fulfilled if the missing data arose from a purely accidental (random) process, like dropping a single sheet from a questionnaire. In other words, the probability of missingness is related only to factors that are completely unrelated to any other variable in the model. MCAR is rare in applied research and usually does not hold, unless it has been planned by the researcher in so-called missingness by design studies (Graham, Taylor, Olchowski, & Cumsille, 2006). When MCAR holds, even simple techniques like listwise deletion will yield unbiased estimates (Enders, 2010), even though it might still not be advisable to use these simple methods due to the loss in statistical power.
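The notation above can be made concrete with a small array example. The following is only an illustrative sketch: the data values are invented, and coding a missing entry as `np.nan` is an assumption of this example, not something the notation prescribes.

```python
import numpy as np

# Hypothetical 4 x 2 data matrix Y (N = 4 cases, K = 2 variables);
# np.nan marks a missing entry. The values are made up for illustration.
Y = np.array([
    [1.2, 0.7],
    [0.4, np.nan],
    [np.nan, 2.2],
    [0.9, 1.5],
])

# R has the same shape as Y: 1 where a value is missing, 0 where it is observed.
R = np.isnan(Y).astype(int)

# Y_obs collects the observed entries; Y_mis is unknown by definition.
Y_obs = Y[R == 0]

# Listwise deletion keeps only the rows that have no missing value at all.
complete_rows = Y[R.sum(axis=1) == 0]
```

Here `R` plays the role of the indicator matrix and `complete_rows` is what a listwise-deletion analysis would use.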
As Gelman and Hill (2007) described and Raykov (2011) more formally showed, MCAR cannot be tested empirically, and homogeneity of means, variances, or more generally distributions, of observed variables across missing data patterns constitutes only necessary, but not sufficient, evidence for MCAR. The inability to directly test MCAR can also be seen from the fact that it posits independence assumptions about quantities that are by definition unobserved, here in particular $Y_{mis}$.

Before we proceed further, it is necessary to address the graphical displays that we will be using. First, they are constructed as so-called directed acyclic graphs (Pearl, 2000), which we will abbreviate as DAGs. DAGs are widely used in epidemiology (Greenland, Pearl, & Robins, 1999; Hernán, Hernández-Díaz, Werler, & Mitchell, 2002; VanderWeele & Robins, 2007; VanderWeele & Shpitser, 2011), medicine (Merchant & Pitiphat, 2002; Shrier & Platt, 2008), computer science (Koller & Friedman, 2009; Pearl, 1995, 2000; Textor & Liśkiewicz, 2011), and other fields. They have also been used to examine missing data situations (Daniel, Kenward, Cousens, & De Stavola, 2011). Researchers who are familiar with structural equation models (SEM) will also feel familiar with DAGs; however, there are some differences (for a complete overview of differences refer to Shadish and Sullivan (2012)). Briefly explained, in a DAG we use the $\varepsilon$ terms, the so-called disturbance terms, to denote all unmeasured variables that may have an effect on the variable that is endowed with this $\varepsilon$ term. Note that these disturbance terms are not identical to regression residuals, which are by definition uncorrelated with the variables that were used to predict the variable with the $\varepsilon$ term. Further, the DAG is completely non-parametric and encodes conditional independencies among the variables displayed.
Precisely because of this ability to encode conditional independencies, DAGs are well suited to express missing data mechanisms (which can be expressed as such conditional independencies, as we have shown earlier in the example of MCAR). We will use DAGs to express conditional independencies that are prescribed by different missingness mechanisms and, in doing so, show how novel insights about missingness problems can be gathered.

In Figure 1 we present a graphical display of MCAR for the simple case in which a single variable $X$ has an effect on a unidimensional variable $Y$. In this simple case, $X$ is completely observed and only $Y$ suffers from missingness. Whether data on $Y$ are missing is encoded by the indicator $R_Y$ in the graph. We use an additional subscript for $R$ here to denote that this missingness indicator pertains only to variable $Y$. Note that we could have visually partitioned $Y$ in the graph into $Y_{obs}$ and $Y_{mis}$, but for clarity we simply denote it as $Y$. In this example, equation 2, which expresses the condition that needs to hold for MCAR, can be written as $R_Y \perp (X, Y)$.

Independence relations in DAGs are expressed as so-called d-separation statements. d-separation is a graphical criterion that can be applied to DAGs to infer independence relations among variables. In short, if two variables are said to be d-separated, there exists no traceable, unblocked path in the diagram between the variables. Conversely, if two variables are d-connected, there exists a traceable and unblocked path between the variables. A traceable path is defined as any path that connects two variables in a graph. For the definition of a path, it does not matter whether the segments of the path have arrows pointing in one or the other direction. To examine d-separation, one checks for every path whether it is open or blocked. A path is said to be blocked if one conditions on a variable in the path that acts as a mediator, i.e., takes on the form $\rightarrow X \rightarrow$ or $\leftarrow X \leftarrow$, or is an arrow-emanating variable, i.e., takes on the form $\leftarrow X \rightarrow$. Further, a path is blocked if one does not condition on a variable that has two arrows pointing into it, i.e., takes on the form $\rightarrow X \leftarrow$. Such a variable is usually called a collider variable (Pearl, 2000). If two variables are said to be d-connected, there exists at least one traceable path between them that has not been blocked. Being d-connected implies that the two variables are stochastically dependent on each other.
Pearl (2000) has provided a proof that variables in a graph that are d-separated are stochastically independent of each other, regardless of the functional form of the relationships among the variables in the graph. For a more thorough introduction to d-separation for social scientists, consult Hayduk et al. (2003) or the original text by Pearl (2000).
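The blocking rules for a single path can be written down as a short function. This is a minimal sketch rather than a full d-separation algorithm: the function name `path_is_blocked`, the list-based path encoding, and the optional `descendants` argument are all hypothetical choices made for this illustration. The `descendants` argument reflects the standard refinement that conditioning on a descendant of a collider also opens the path, a detail the summary above does not spell out.

```python
def path_is_blocked(nodes, arrows, conditioned, descendants=None):
    """Decide whether one path is blocked given a conditioning set.

    nodes:       variables along the path, e.g. ["Y", "X", "R_Y"]
    arrows:      arrow between consecutive nodes, "->" or "<-",
                 e.g. ["<-", "->"] encodes Y <- X -> R_Y
    conditioned: variables that are conditioned on
    descendants: optional dict mapping a collider to its descendants
    """
    conditioned = set(conditioned)
    descendants = descendants or {}
    for i in range(1, len(nodes) - 1):
        middle = nodes[i]
        # A collider has both neighbouring arrows pointing into it.
        is_collider = arrows[i - 1] == "->" and arrows[i] == "<-"
        if is_collider:
            opened = middle in conditioned or bool(
                set(descendants.get(middle, ())) & conditioned
            )
            if not opened:
                return True   # unconditioned collider blocks the path
        elif middle in conditioned:
            return True       # conditioned mediator or fork blocks the path
    return False              # no blocking variable found: path is open


# Fork Y <- X -> R_Y: open unless X is conditioned on.
fork_open = not path_is_blocked(["Y", "X", "R_Y"], ["<-", "->"], [])
fork_blocked = path_is_blocked(["Y", "X", "R_Y"], ["<-", "->"], ["X"])

# Collider Y -> C <- R_Y: blocked unless C is conditioned on.
collider_blocked = path_is_blocked(["Y", "C", "R_Y"], ["->", "<-"], [])
collider_open = not path_is_blocked(["Y", "C", "R_Y"], ["->", "<-"], ["C"])
```

Two variables are d-separated only if every path between them is blocked, so a complete check would apply such a function to all paths in the graph.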

In the graph in Figure 1 we can see that there is only a single arrow pointing to $R_Y$, from the disturbance term $\varepsilon_R$, meaning that missingness arises only due to unobserved factors. Further, these unobserved factors have no association with any other variable or disturbance term in the model, as can be seen from the fact that $\varepsilon_R$ is unassociated with other parts of the model. In this graph, there is no traceable path between $Y$ and $R_Y$ (or $X$ and $R_Y$), and they are said to be d-separated without having to condition on any other variables, implying unconditional stochastic independence between the variables $Y$ and $R_Y$ (as defined in equation 1). Therefore the missing data mechanism is MCAR. So far we have used the expression "to condition on"; in the context of missing data problems, this relates to observing and using a variable in a FIML or multiple imputation model.

MAR

A somewhat less restrictive condition is missing at random (MAR). MAR states that the conditional distribution of missingness given the observed part $Y_{obs}$ is equal to the conditional distribution of missingness given the observed and the unobserved part $(Y_{obs}, Y_{mis})$:

$$P(R \mid Y) = P(R \mid Y_{obs}, Y_{mis}) = P(R \mid Y_{obs}). \tag{3}$$

These equalities of probabilities again imply (can be expressed as) conditional independence statements, here in particular

$$R \perp Y_{mis} \mid Y_{obs}. \tag{4}$$

In words, MAR states that the missingness is stochastically independent of the unobserved values given the observed values, whereas dependencies between observed variables and missingness are allowed. In an applied research context, we could imagine that missingness is caused by certain observed variables that may also have an effect on important analysis variables. For example, missingness on an achievement measure could be caused by motivation (or lack thereof). Further, we can assume that motivation also has an effect on achievement. MAR is an important condition because, when it holds, modern estimation techniques (MI and FIML) yield unbiased results. Just like MCAR, MAR cannot be tested empirically, as it also posits conditional stochastic independence assumptions among quantities that are by definition unobserved, specifically $Y_{mis}$.

Returning to the example with variable $X$, the unidimensional variable $Y$, and the respective missingness indicator $R_Y$, the MAR condition (see equation 4) implies the conditional stochastic independence $R_Y \perp Y \mid X$. In Figure 2 (a) we show the simple situation in which MAR holds. In this figure, $Y$ and $R_Y$ are d-connected via the path $Y \leftarrow X \rightarrow R_Y$. However, if one conditions on $X$, this path becomes blocked and $Y$ and $R_Y$ are now d-separated, implying the conditional stochastic independence $R_Y \perp Y \mid X$, as similarly defined in equation 4. Therefore MAR holds, as long as one has observed $X$ and uses it in the estimation of $Y$ in a FIML framework, or uses it as a predictor variable in an MI framework.

Often, researchers use variables to predict missingness that may not be of substantive interest. Such variables are usually called auxiliary variables, because they are not of theoretical interest to the applied researcher but aid in the estimation of the missing data. In the second graphical example in Figure 2 (b), we explicitly describe an auxiliary variable and how it can help to create conditional independence between the missingness and the variable with missing values, thereby implying MAR. We use the same set of variables as in Figure 1 (a), but introduce a new variable $A$, which in this example must be used as an auxiliary variable for an unbiased estimate of the relationship of interest between $X$ and $Y$ in the presence of missing data on $Y$. In Figure 2 (b), $Y$ and $R_Y$ are d-connected via the path $Y \leftarrow A \rightarrow R_Y$ and via the path $Y \leftarrow X \rightarrow R_Y$. However, if one conditions on $A$, the first path becomes blocked, and if one conditions on $X$, the second path becomes blocked. $Y$ and $R_Y$ are then d-separated, implying the conditional stochastic independence $R_Y \perp Y \mid (A, X)$, and therefore MAR holds. Note that $A$ in the graph could be a multidimensional set of variables that all exhibit the same structure.

MNAR

Finally, missing not at random (MNAR) is the least stringent assumption, but also the most problematic, as even FIML and MI will typically, though not always for all parameter estimates, yield biased results. MNAR is characterized by the probability of missingness being dependent on both the observed part, $Y_{obs}$, and the unobserved part, $Y_{mis}$. That is,

$$P(R \mid Y_{obs}, Y_{mis}) \neq P(R \mid Y_{obs}). \tag{5}$$

No conditional independencies are implied by equation 5. In an applied research context, we could consider different ways in which MNAR could arise. One situation would be if missingness was caused by the variable with missing data itself, e.g., participants with a very high income are more likely to not report their income. This situation is depicted in Figure 3 (a), in which $Y$ and $R_Y$ are directly connected by a path. $Y$ and $R_Y$ are said to be d-connected through the direct path $Y \rightarrow R_Y$. Two adjacent, connected variables in a graph can never be d-separated. Hence, no conditional stochastic independence can arise, and MNAR is present.

A similar MNAR situation would arise when an unobserved variable has an effect on both the missingness $R_Y$ and $Y$. In an applied research context, this could happen whenever a variable that influences missingness also has an effect on analysis variables, but the variable has not been measured and is therefore omitted. This omitted variable can be displayed as a latent, unobserved variable in the graph, or simply as correlated disturbance terms. Figure 3 (b) displays such a situation, in which an omitted variable influences both $Y$ and $R_Y$. Here, $Y$ and $R_Y$ are d-connected via the path $Y \leftarrow L_1 \rightarrow R_Y$. This path cannot be blocked via conditioning, because no observed variables reside in the middle of the path. Again, no stochastic conditional independence can be achieved through conditioning and MNAR holds. Note that the variable $L_1$ in the graph should not be confused with a modeled latent variable in a SEM, but rather is a simple depiction of an unobserved variable. To make this clear, we deviate slightly from the regular symbolic language of DAGs and SEM graphs and use a dashed outline for the unobserved variable.
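The practical consequences of the three mechanisms can be illustrated with a small simulation. The sketch below is not taken from the paper: the sample size, seed, coefficients (0.5 and 0.7), and the use of a simple regression estimator of the mean of $Y$ (fit on observed cases, averaged over all values of $X$) are assumptions chosen for illustration. The regression estimator stands in for "conditioning on $X$" and is not a FIML or MI implementation.

```python
import numpy as np


def regression_mean(y, x, observed):
    """Mean of Y from a regression of Y on X fitted to the observed cases
    and evaluated at the mean of X over all cases."""
    slope, intercept = np.polyfit(x[observed], y[observed], 1)
    return float(intercept + slope * x.mean())


def simulate(n=200_000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    y = 0.5 * x + np.sqrt(0.75) * rng.standard_normal(n)  # true mean of Y is 0
    noise = np.sqrt(0.51) * rng.standard_normal(n)
    missing = {
        "MCAR": noise > 0,             # unrelated to X and Y
        "MAR": -0.7 * x + noise > 0,   # depends only on the observed X
        "MNAR": -0.7 * y + noise > 0,  # depends on Y itself
    }
    out = {}
    for name, r in missing.items():
        observed = ~r
        listwise = float(y[observed].mean())
        out[name] = (listwise, regression_mean(y, x, observed))
    return out


# Each entry is (listwise mean, regression-based mean); the true mean is 0.
results = simulate()
```

Under these assumptions one would expect the listwise mean to be close to zero only under MCAR, the regression-based mean to recover the true value under MAR, and neither estimator to remove the bias under MNAR.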

Equivalence of missing data mechanisms and graphs

In the previous section we showed how the classic missingness mechanisms can be expressed via graphs that encode conditional independencies, and we applied the graph-theoretic concept of d-separation. In summary, when a variable $Y$ and its associated missingness indicator $R_Y$ are d-connected, MNAR holds and bias will typically emerge. If $Y$ and $R_Y$ can be d-separated using any set of other observed variables, then MAR holds, and parameters related to $Y$ can be estimated without bias when using methods that rely on MAR (FIML, MI) and when including those variables that are needed to d-separate $Y$ and $R_Y$ in the imputation model (for MI) or the analysis model (for FIML). A special case arises when $Y$ and $R_Y$ are d-separated given no other variables (unconditionally independent), which maps onto the classic MCAR condition. As we shall see, relying on the graph-theoretic concept of d-separation will allow us to further determine whether any given auxiliary variable is needed to achieve d-separation of $Y$ and $R_Y$, or whether a variable would in fact make these two variables d-connected and induce conditional dependencies. We believe that herein lies an important advantage of using graphical models: we can easily spot auxiliary variables that may be bias-reducing or, as we will show, bias-inducing, something that is not apparent when relying on the classic conditional independence notation that has been used to describe the missing data mechanisms.

Current approaches

While all assumptions about the missingness mechanisms are important, insofar as they prescribe which methods will yield biased or unbiased estimates, MAR is the assumption that is necessary for the two missing data approaches that are considered state-of-the-art, FIML and MI. A pertinent question is therefore how a researcher can achieve MAR, or at least make MAR plausible, in his or her study. As seen in equations 3 and 4 and in the accompanying graphs, it is necessary to include all variables in the imputation or FIML model that make $Y$ and $R_Y$ independent of each other. In other words, researchers need to capture all variables that they believe have a direct or indirect effect on the probability of being missing and at the same time a direct or indirect effect on the variable with missing data. Some of these variables might already be part of the analytic model; others might not be part of the analytic model but might be needed to satisfy the MAR assumption, i.e., auxiliary variables. We now describe current approaches that aim to achieve MAR and present an example that illustrates potential problems with these approaches.

Inclusive approach

The so-called inclusive approach (Collins, Schafer, & Kam, 2001) to achieving MAR directs researchers to include many auxiliary variables in their imputation model (or in their FIML estimation, following guidelines by Graham (2003)). The reasoning behind the inclusive strategy is as follows: if many variables are included, it becomes less likely that variables that are causes of both the missingness and the analytic variables with missing data are omitted. Such an omission would be harmful, as it would destroy the conditional independence posited in MAR and induce bias. Collins et al. (2001) showed that bias in means, variances, and regression estimates can be substantial if this kind of variable is omitted. A second rationale for adopting an inclusive strategy is that the inclusion of variables that may not be causes of the missingness or causes of the analytic variables with missing data was shown to be "far from being harmful[,] ... at worst neutral, and at best extremely beneficial" (Collins et al., 2001, p. 349). In particular, Collins et al. (2001) examined the influence of including variables that are completely uncorrelated with missingness and with analytic variables with missing data (so-called "trash variables"), or only related to analytic variables with missing data but not to the missingness itself.
Completely uncorrelated variables did not have any impact on bias, and variables that were only correlated with $Y$ were shown to attenuate bias in MNAR situations and to reduce standard errors.

Data-driven approach

Even if one fully acknowledges the benefits of an inclusive strategy, such a strategy can reach its limits, especially when applied to large-scale datasets, which may contain hundreds of variables. If analytic models include many variables and many auxiliary variables are added, both MI and FIML will likely encounter convergence problems. To mitigate this problem, it has been suggested to examine the data to decide which variables to include as auxiliaries. Schafer (1997) suggests that variables make good candidates for auxiliary variables if they are related to the missingness or to the analytic variable that exhibits missingness. The rationale behind this advice is straightforward: a variable that is completely uncorrelated with the probability of being missing cannot induce any dependencies between $R_Y$ and $Y$. Likewise, a variable that is completely uncorrelated with the analysis variable with missing values also cannot induce any dependencies between $R_Y$ and $Y$.

As a demonstration of this principle, consider Figure 4, in which three auxiliary variables $A_1$, $A_2$, and $A_3$ are added to a model in which $X$ d-connects $Y$ and $R_Y$ via $Y \leftarrow X \rightarrow R_Y$, and $A_1$ d-connects $Y$ and $R_Y$ via $Y \leftarrow A_1 \rightarrow R_Y$. The two variables $A_2$ and $A_3$ do not d-connect $Y$ and $R_Y$, and conditioning on them is therefore not needed to render $Y$ and $R_Y$ conditionally independent and hence to fulfill the MAR condition. Simply using $X$ and $A_1$ is sufficient in this example.¹

The data-driven approach advises us to screen our set of potential auxiliary variables as to whether they are related (usually examined using correlations) to any of the analysis variables or to any of the missing value indicator variables. Variables that are related to either or both should be included as auxiliary variables, while variables that fall below a certain correlation threshold for both should not be used.
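The screening rule just described can be sketched in a few lines. This is a hypothetical helper, not code from the paper or from any imputation package: the function name `screen_auxiliaries`, the default threshold of 0.1, the assumption that the candidate variables are fully observed, and the simulated example data are all choices made for this illustration.

```python
import numpy as np


def screen_auxiliaries(y, candidates, threshold=0.1):
    """Keep candidates whose absolute correlation with Y (observed cases)
    or with the missingness indicator of Y exceeds the threshold.

    y:          1-D array with np.nan for missing values
    candidates: dict mapping a name to a fully observed 1-D array
    """
    r = np.isnan(y).astype(float)   # 1 = missing, 0 = observed
    observed = r == 0
    keep = {}
    for name, a in candidates.items():
        corr_y = np.corrcoef(a[observed], y[observed])[0, 1]
        corr_r = np.corrcoef(a, r)[0, 1]
        if abs(corr_y) > threshold or abs(corr_r) > threshold:
            keep[name] = (round(float(corr_y), 2), round(float(corr_r), 2))
    return keep


# Made-up example: one candidate related to Y, one unrelated to Y and R_Y.
rng = np.random.default_rng(0)
n = 20_000
related = rng.standard_normal(n)
unrelated = rng.standard_normal(n)
y = 0.5 * related + rng.standard_normal(n)
y[rng.random(n) < 0.3] = np.nan     # 30% of Y set to missing, completely at random

selected = screen_auxiliaries(y, {"related": related, "unrelated": unrelated})
```

A screen of this kind looks only at the size of correlations; it carries no information about why a candidate is correlated with $Y$ or $R_Y$.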
¹ Note that if the disturbance terms of $A_2$ and $A_3$ were correlated (e.g., due to an unobserved variable that has a relationship to both of these variables), an active path $Y \leftarrow A_2 \leftarrow \varepsilon_{A_2} \leftrightarrow \varepsilon_{A_3} \rightarrow A_3 \rightarrow R_Y$ would be present, which could be blocked by conditioning on $A_2$, $A_3$, or both. Hence at least one of these variables would need to be included in a FIML or imputation model.

Particular guidelines on the inclusion and exclusion of auxiliary variables were formulated by Van Buuren, Boshuizen, Knook, et al. (1999), who recommend including a variable if its correlation with either missingness or the variable with missing data exceeds ±.1 (or any other chosen threshold; e.g., Enders (2010) suggests correlations with the analysis variables greater than ±.4). The implicit assumption is that variables that are correlated below the chosen threshold will have little power to induce any dependencies, whereas variables that are correlated above it are assumed to induce biases in the estimation of parameters in the presence of missing data.

Generally, the advice to include auxiliary variables in missing data problems is sound and has been shown to be useful in both simulation studies (Collins et al., 2001) and theoretical work (Schafer, 1997). However, both the inclusive strategy and the data-driven approach ignore the possibility that there are certain instances and classes of variables that should not be used as auxiliary variables, because they induce bias in the estimation of parameters in the presence of missing data by destroying the conditional independence between $Y$ and $R_Y$, hence violating MAR. We now turn to these situations and variables and show, using illustrative examples and simulations, that this bias can become large if ignored.

Bias-enhancing auxiliary variables

Consider first a simplified illustrative example of a single variable $Y$ with missing data, a missing data indicator $R_Y$, and two potential auxiliary variables $A_1$ and $A_2$ that are at the disposal of the applied researcher. In addition, two unobserved variables $L_1$ and $L_2$ are part of the true data-generating model. The full model is displayed in Figure 5. An initial reaction to this model might be that the unobserved variables $L_1$ and $L_2$ make this an MNAR situation and that some bias would be expected and is not surprising. However, the situation is more subtle.
Variable $A_1$ indeed induces a dependency between $Y$ and $R_Y$ via the path $Y \leftarrow A_1 \rightarrow R_Y$ and therefore biases the estimates of $Y$ in the presence of missing data. Therefore, if one uses $A_1$ as an auxiliary variable, bias due to $A_1$ will be eliminated, as the biasing path is blocked. Variable $A_2$, on the other hand, even though spuriously correlated with $Y$ and $R_Y$, does not induce a dependency via the path $Y \leftarrow L_1 \rightarrow A_2 \leftarrow L_2 \rightarrow R_Y$ and therefore cannot bias the estimates of $Y$, no matter what values the constituent path coefficients take on. This is because $A_2$ is a collider variable on this path: as long as it is not conditioned on, the path remains closed and does not induce any dependencies between $Y$ and $R_Y$.

What happens, however, when $A_2$ is also used as an auxiliary variable, along with $A_1$? The inclusion of $A_2$ will actually destroy the conditional independence that was achieved earlier with the inclusion of $A_1$ and induce an MNAR situation. The path $Y \leftarrow L_1 \rightarrow A_2 \leftarrow L_2 \rightarrow R_Y$ that was initially blocked becomes open when $A_2$ is conditioned on (used as an auxiliary variable).

To illustrate this point further using data, we simulated a single dataset based on the model in Figure 5. The data generation is fully described in the first simulation study below. Briefly described, we chose a large sample size of n = [value missing]. All continuous variables were multivariate normally distributed with a mean of 0 and a variance of 1. Path coefficients in the model were completely standardized, and the size of the path coefficients was chosen so that the total $R^2$ (or the respective McKelvey-Zavoina pseudo-$R^2$ (McKelvey & Zavoina, 1975)) of every single dependent variable in the model ($Y$, $A_2$, $R_Y$) was equal to 50%. We chose the signs of the path coefficients so that the bias due to the omission of $A_1$ and the bias due to the inclusion of $A_2$ were in the same direction and did not incidentally offset each other. The amount of missing data was set to 50%. We estimated the mean and standard deviation of the variable $Y$ using a listwise deletion approach and using FIML estimation in Mplus (Muthén, 2011) and lavaan (Rosseel, 2012) with only $A_1$ as the auxiliary variable, with only $A_2$ as the auxiliary variable, or with both $A_1$ and $A_2$ as auxiliary variables.
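A rough analogue of this single-dataset illustration can be sketched as follows. This is not the authors' simulation code. The sample size, the seed, the path coefficients of ±0.5 (chosen so that each dependent variable has an explained variance of 50%), the latent-threshold model for missingness, and the regression estimator of the mean (used here in place of FIML or MI) are all assumptions of this sketch.

```python
import numpy as np


def regression_mean(y, observed, aux):
    """Estimate the mean of Y: fit OLS of Y on the auxiliary variables among
    observed cases, predict Y for every case, and average the predictions."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + list(aux))
    beta, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)
    return float((design @ beta).mean())


def simulate(n=200_000, seed=1):
    rng = np.random.default_rng(seed)
    a1, l1, l2 = rng.standard_normal((3, n))
    sd = np.sqrt(0.5)                                    # residual SD for R^2 = 50%
    y = 0.5 * a1 + 0.5 * l1 + sd * rng.standard_normal(n)
    a2 = 0.5 * l1 + 0.5 * l2 + sd * rng.standard_normal(n)   # collider of L1, L2
    r_star = -0.5 * a1 + 0.5 * l2 + sd * rng.standard_normal(n)
    observed = r_star <= 0                               # about 50% missing on Y
    return {
        "complete": float(y.mean()),
        "listwise": float(y[observed].mean()),
        "A1": regression_mean(y, observed, [a1]),
        "A2": regression_mean(y, observed, [a2]),
        "A1+A2": regression_mean(y, observed, [a1, a2]),
    }


results = simulate()
```

Under these assumptions the true mean of $Y$ is 0, so each entry of `results` can be read directly as the bias of the corresponding strategy. The qualitative pattern to look for is that `"A1"` is close to zero, `"A2"` is further from zero than `"listwise"`, and `"A1+A2"` is worse than `"A1"` alone.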
Auxiliary variables in the FIML estimation were included using the Mplus auxiliary command, which automatically fits a model suggested by Graham (2003). We also used mice (van Buuren & Groothuis-Oudshoorn, 2011) to generate 5 multiple imputations whose results were pooled following standard recommendations (Rubin, 1976). As expected, and as previously reported by Collins et al. (2001), results of FIML and MI did not differ substantially when the same set of auxiliary variables was used. We only report results of the FIML estimation in Table 1.

In the single simulated dataset, the completely observed data of $Y$ had a mean of .03 and a standard deviation (SD) of [value missing]. When using listwise deletion, the mean of $Y$ was .19 and the SD was .98. Not surprisingly, we observed bias in the means, as would be expected under a MAR situation in which missingness was induced through a linear function of other variables. Using $A_1$ as an auxiliary variable and estimating the mean of $Y$ with FIML estimation yielded a mean of .06. Using $A_1$ thus does a very good job of reducing bias. The relative percent reduction of bias compared to the listwise model was [value missing]%. Using $A_2$ as an auxiliary variable, on the other hand, actually increases bias. The estimated mean of $Y$ was now .30, with a resulting percent bias amplification of [value missing]% compared to the listwise results. Finally, when using both $A_1$ and $A_2$ as auxiliary variables, the mean of $Y$ was estimated to be .14, resulting in a bias reduction of a mere [value missing]%. We observed that using both variables as auxiliary variables was worse than using $A_1$ alone.

This result may not be obvious when considering the formulas for MAR or MNAR, and in fact it runs counter to the advice that an auxiliary variable can be at worst neutral. Clearly, this auxiliary variable was not neutral, but highly bias-inducing. When one uses a graph to encode the structural relationships between the auxiliary variables and the missingness and analysis variables, however, this result is expected and can be directly seen from the fact that conditioning on $A_2$ d-connects $Y$ and $R_Y$ by opening a previously blocked path.

A single simulated dataset is seldom a convincing argument; however, it can serve as a point of departure for a more developed argument. First, it shows that an auxiliary variable can increase bias in the estimation of parameters in the presence of missing data.
Second, a bias-inducing variable cannot be distinguished from a helpful auxiliary variable by examining correlations with analysis variables and missingness indicators. In fact, in this example, the variable A_2 posed as a perfectly innocent and potentially very helpful auxiliary variable. In the complete dataset A_2 was significantly correlated with both the analysis variable Y (r = .26, p < .001) and the missing data indicator R_Y (point-biserial correlation r_pb = .25, p < .001). Using inclusion criteria that rely solely on correlations would incorrectly lead to the inclusion of A_2 in the set of auxiliary variables. In addition, a simple example like this one helps to link what could simply be a mathematical curiosity to an applied context.

To make this illustrative example more concrete, consider that Y, the variable with missing data, is a measure of mathematical ability with a missingness indicator R_Y. For this example, we assume that MAR holds and that there is no direct path from Y to R_Y. Variable A_1 is a measure of motivation of the participant that has been observed and is used in the analysis as a potential auxiliary variable. Specifically, more motivated participants score higher on the math achievement test, and are less likely to have missing data. Consider further that A_2 is the income of the participant, another variable that was assessed as part of the study. The two unobserved variables L_1 and L_2 are IQ and gender of the participant, respectively. Note that we are assuming in this model that IQ and gender are in fact uncorrelated (which seems like a tenable assumption). The model further expresses that participants with higher IQ scores also score higher on math achievement, and that the participant's gender has an influence on missingness (maybe one gender group was more likely to skip certain items). While this example is admittedly somewhat artificial due to its constrained nature, we believe that it is not entirely implausible and suggests that auxiliary variables of the same type as A_2 in our example could in fact be lurking among seemingly benign potential auxiliary variables. Henceforth, we will refer to these variables as collider auxiliary variables.
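The illustrative example can be mimicked with a small sketch. Everything in it is an assumption made for illustration rather than the authors' actual setup: the path values (.5 and .6), the sign pattern, the latent-threshold rule for missingness, treating the second unobserved variable as continuous, and the function names. FIML with a single fully observed auxiliary variable is approximated by the regression estimator of the mean, which is the maximum likelihood solution under bivariate normality.

```python
import math
import random
from statistics import NormalDist, fmean


def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def corr(x, y):
    """Pearson correlation (point-biserial when one input is coded 0/1)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def aux_mean(y, aux, observed):
    """Mean of Y estimated with one fully observed auxiliary variable."""
    y_obs = [yi for yi, o in zip(y, observed) if o]
    a_obs = [ai for ai, o in zip(aux, observed) if o]
    return fmean(y_obs) + slope(a_obs, y_obs) * (fmean(aux) - fmean(a_obs))


def simulate(n=50_000, seed=1):
    """One large dataset; all path values below are illustrative assumptions."""
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(0.30)  # roughly 30% of Y goes missing
    y, a1, a2, observed = [], [], [], []
    for _ in range(n):
        l1, l2, mot = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        # Y depends on the helpful variable A1 and on the unobserved L1.
        y.append(0.5 * mot + 0.5 * l1 + rng.gauss(0, math.sqrt(0.50)))
        a1.append(mot)
        # A2 is a collider: a common effect of L1 and L2, with no effect on Y or R_Y.
        a2.append(0.6 * l1 - 0.6 * l2 + rng.gauss(0, math.sqrt(0.28)))
        # Missingness depends on A1 and the unobserved L2, never on Y itself.
        propensity = 0.5 * mot + 0.6 * l2 + rng.gauss(0, math.sqrt(0.39))
        observed.append(propensity > cutoff)
    missing = [0.0 if o else 1.0 for o in observed]
    return {
        "complete": fmean(y),
        "listwise": fmean(yi for yi, o in zip(y, observed) if o),
        "with_a1": aux_mean(y, a1, observed),
        "with_a2": aux_mean(y, a2, observed),
        "r_a2_y": corr(a2, y),
        "r_a2_missing": corr(a2, missing),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:>13}: {value: .3f}")
```

Under these assumed values the collider is positively correlated with both Y and the missingness indicator, so a purely correlational screen would select it, yet the estimate that uses it should lie farther from the complete-data mean than the listwise estimate does.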
Research questions

Having established in a single example that auxiliary variables can induce bias, we set forth to answer several research questions.

1. First, we are interested in the absolute magnitude of bias that can be induced when using collider auxiliary variables as a function of the magnitude of the constituent paths that connect a collider auxiliary variable to missingness and analysis variables. In addition, we want to put this magnitude into context and contrast it with bias that is induced due to the omission of a helpful auxiliary variable. This latter form of bias has been examined before and we only include it to provide a benchmark for the bias that we expect to observe with the inclusion of a collider auxiliary variable. Earlier research by Greenland (2003) in the area of confounding in causal inference suggests that the magnitude of bias due to conditioning on a collider, especially of the kind that we presented in our example, is usually smaller than the bias due to omitting a confounder. We therefore suspect that bias due to including a collider variable as an auxiliary variable will be noticeable, but smaller in magnitude than the bias due to omitting a true confounding auxiliary variable (i.e., a variable that is directly or indirectly causing both missingness and analysis variables with missing data).

2. The second research question examines the behavior of auxiliary variables in data situations that are inherently MNAR. In the MAR cases considered in the first simulation study, the conditional independence between missingness and analysis variables with missing data can always be created by using some observed variables. Hence there is an expectation that including the collider auxiliary variable will necessarily increase bias by disturbing the conditional independence. In the MNAR case collider auxiliary variables are expected to behave differently, insofar as the relationship that they induce between the missingness and variables with missing data can either enhance or reduce the already existing relationship between missingness and analysis variables with missing data. In a similar fashion, we will also explore the behavior of auxiliary variables that are directly related to both missingness and analysis variables.
Simulation studies

Simulation study 1.1

Our first simulation study explores the absolute magnitude of bias that can be induced when using a collider auxiliary variable in a MAR situation. The simulation study roughly followed Collins et al. (2001) in terms of data generation and evaluation criteria. Generally speaking, data are first generated under a specific model, then missing data are imposed based on a described mechanism, then parameters are estimated using listwise deletion and FIML with auxiliary variables. Lastly, results of replications are pooled within condition and performance criteria are assessed. While it is possible to examine bias in many different parameters of interest (means, variances, skew, regression coefficients, factor loadings, etc.), we only focus on estimates of the population mean. The reason behind this choice was that mean responses (potentially across different groups) are still one of the most widely used measures to describe research phenomena in the social sciences. The examination of regression coefficients is left to future studies and is briefly mentioned in the discussion.

Data generation and analysis. The data-generating model for simulation 1.1 is shown in Figure 6. In the model, a single independent variable Y is generated with missing data, indicated by R_Y. Auxiliary variable A_1 is spuriously correlated with the probability of missingness and the outcome Y, via two unobserved, uncorrelated variables L_1 and L_2. In the model Y and R_Y are d-separated but become d-connected as soon as A_1, the collider auxiliary variable, is used in MI or FIML. All continuous variables were multivariate normally distributed and completely standardized by fixing the total variance of each variable to 1 and setting means to 0. We did not vary sample size, but chose a single constant sample size of 500. This single sample size was also chosen by other authors in similar simulations (Collins et al., 2001; Saris, Satorra, & Van der Veld, 2009), as a somewhat large, but still reasonable sample size to consider.
Furthermore, changes in sample size usually yield predictable results when other factors are held constant, namely that standard errors decrease with increased sample size. We also did not vary the amount of missing data, but fixed it at a relatively high value of 30%, which was in between the two values chosen by Collins et al. (2001). Varying the amount of missing data is often not very interesting, as such variation has previously been shown to yield expected results (bias gets worse as missing data increases).

All path coefficients in the data-generating model, labeled α, were chosen so that the uniquely explained variance in the outcome variable that these paths were connected to was set to a particular value. Path coefficients were set at 0, .224, .387, .500, .592, and .671. This corresponds to uniquely explained variance of 0%, 5%, 15%, 25%, 35%, and 45%, respectively. See the Appendix for details on how missingness was generated and how explained variance in R_Y was defined. Finally, we varied the sign of the coefficient labeled α (positive or negative). This sign change of a single path among the constituent paths of the collider auxiliary variable does not alter the magnitude of the bias that is induced, but alters its direction. Note that it is not of importance which of the four paths labeled α is varied in sign, because the direction of bias is determined by the product of all four constituent paths (Pearl, 2000). Note also that the condition in which all paths were set to 0 corresponds to a pure MCAR condition. In this simulation design we varied all paths labeled α simultaneously. Our primary interest was to observe overall bias and not bias due to differential changes in constituent paths. This simulation design thus yielded 5 conditions with a positive sign, 5 conditions with a negative sign, and one condition in which all paths were set to 0, for a total of 11 conditions. We replicated each condition 1000 times. All simulations were conducted using R (R Development Core Team, 2011) and the following packages: lavaan (Rosseel, 2012), MASS (Venables & Ripley, 2002), mice (van Buuren & Groothuis-Oudshoorn, 2011), MplusAutomation (Hallquist, 2012), and plyr (Wickham, 2011). For the generation of graphs we used ggplot2 (Wickham, 2009) and tikzdevice (Sharpsteen & Bracken, 2012).

Performance measures. In order to analyze the results of our simulation study, we assess a range of standard criteria commonly employed in simulation studies.

1. We assessed standardized bias in the estimates (mean, variance) of variables with missing data, defined identically to Collins et al. (2001) as raw bias (average parameter estimate across replications minus true parameter value) divided by the standard error, defined as the standard deviation across all replication estimates. Collins et al. (2001) give a rule of thumb that absolute values of .4 or higher are worrisome on the standardized bias metric.

2. We recorded the precision of the estimates, defined as the average standard error across all replications. In general it is desirable to have estimates with smaller standard errors, and hence narrower confidence intervals and more precise estimates.

3. We computed the root mean squared error (RMSE), defined as the square root of the average squared difference between a parameter estimate and the true value of the parameter.

4. Lastly, we observed coverage rates, defined as the percentage of replications whose 95% confidence interval included the true parameter value. Ideally, one observes 95% coverage rates, as this would indicate that the confidence intervals of the estimator are in the long run accurately capturing the true parameter and have the nominal α error rate. Again, relying on rules of thumb by Collins et al. (2001), we regard coverage rates below 90% as worrisome.

Results of simulation study 1.1. The complete results are shown in Table B1 in the Appendix. In order to communicate the most important findings, we display the amount of standardized bias in the means in Figure 7, and coverage values in Figure 8. Both figures show that the listwise model is unbiased and has perfect coverage across all conditions. The inclusion of A_1 as an auxiliary variable in the FIML estimation induced bias in the mean, as would be expected based on missingness patterns that are imposed in a linear fashion. Bias emerges in all conditions that used FIML, except the one in which all paths labeled α are set to 0 (the MCAR condition). Note that this is true even though variable A_1 is related to both Y and R_Y and would be included as an auxiliary variable under all current recommendations to achieve MAR. The general pattern as seen in Figures 7 and 8 is that increases in the amount of explained variance yield monotonic increases in bias.
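The four performance measures defined above can be written down compactly. The following is a minimal sketch, not the authors' code: the function name `performance_measures` and its arguments are hypothetical, and the 95% interval is assumed to be the normal-theory interval, estimate ± 1.96 × standard error.

```python
import math
from statistics import fmean, stdev


def performance_measures(estimates, std_errors, true_value, z=1.959964):
    """Summarize one simulation condition.

    estimates  -- one parameter estimate per replication
    std_errors -- the matching estimated standard errors
    true_value -- the population value used to generate the data
    z          -- normal quantile for the interval (assumed: 95%, two-sided)
    """
    raw_bias = fmean(estimates) - true_value
    empirical_se = stdev(estimates)  # SD of the estimates across replications
    squared_errors = [(e - true_value) ** 2 for e in estimates]
    covered = sum(
        1 for e, s in zip(estimates, std_errors)
        if e - z * s <= true_value <= e + z * s
    )
    return {
        "standardized_bias": raw_bias / empirical_se,
        "average_se": fmean(std_errors),
        "rmse": math.sqrt(fmean(squared_errors)),
        "coverage": covered / len(estimates),
    }


if __name__ == "__main__":
    # Three made-up replications, only to show the call.
    print(performance_measures([0.1, 0.2, 0.3], [0.05, 0.05, 0.2], 0.0))
```

With the rules of thumb cited above, a condition would be flagged when the absolute standardized bias reaches .4 or coverage falls below .90.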
Little to no bias is observed in conditions with weak path coefficients, and stronger bias is observed in more extreme conditions. The standardized bias (and other performance measures) reaches a critical threshold, based on the rules of thumb by Collins et al. (2001), when path coefficients are so strong that they explain slightly less than 25% of the variance. Bias in conditions with even stronger effects is so large that confidence intervals approach 40% coverage. Also, not surprisingly, the direction of bias changes when the coefficient α changes its sign. In conditions in which the sign is negative, positive bias is induced due to the inclusion of the collider auxiliary variable, and negative bias is induced when the path coefficient has a positive sign. The results of this simulation clearly show that an auxiliary variable, even though it exhibits strong correlations with missingness and analysis variables, can increase bias. This somewhat surprising result is evident from the graphical model, in which we can see that A_1 is a collider auxiliary variable whose inclusion opens a biasing path between Y and R_Y.

Simulation study 1.2

To put the results of the first simulation study into a broader context, we performed a second simulation study that was essentially a replication of earlier findings that an omitted variable that has an effect on both missingness and analysis variables with missing data can bias estimates. While this simulation study by itself does not give us any new insights, we performed it to answer the benchmarking part of our first research question, aimed at exploring whether the magnitude of bias due to inclusion of a bias-inducing collider auxiliary variable is similar in strength to the bias due to omission of a potentially more helpful auxiliary variable. We replicated the first simulation study using the exact same values of explained variance in our data-generating model, but changed the role of the collider auxiliary variable to an auxiliary variable that has direct influences on both missingness and analysis variables.

Data generation and analysis. The data-generating model for simulation 1.2 is shown in Figure 9. In this model, a single independent variable Y is generated with missing data, indicated by R_Y. This time, an auxiliary variable A_2 is directly affecting both Y and R_Y, thus d-connecting the two variables.
The graphical criterion therefore tells us that A_2 is a bias-reducing variable that should be used in the FIML estimation. The generation of all variables was identical to simulation study 1.1. The unique explained variance of each effect labeled β was also identical to the previous simulation and set to 0%, 5%, 15%, 25%, 35%, and 45%. Again, we varied the sign of the path labeled β, for a total of 11 simulation conditions.

Results of simulation study 1.2. Table B2 in the Appendix lists the complete results of the second simulation study. To visualize our main findings we present standardized bias in the means and coverage rates of means in Figure 10 and Figure 11 for all conditions. In this simulation we observe a slightly different pattern than in the previous simulation. Not surprisingly, and as shown previously by other researchers, the listwise model is biased in the parameter estimates of the means, and in the more extreme cases even in the variance of Y (not shown in the figures, but reported in the table). The FIML model that included A_2 is virtually unbiased in all conditions and has perfect coverage, because the true data-generating mechanism of the missingness is captured.

Several important observations can be made. First, the bias that is induced through the omission of a helpful auxiliary variable is larger in magnitude than the bias induced through the inclusion of a bias-inducing collider auxiliary variable. This can also be observed when examining coverage rates, which drop much more dramatically than in the case of an included collider auxiliary variable. For example, in the condition with 25% explained variance, the standardized bias in the previous simulation was .61, whereas in this simulation with an omitted and helpful auxiliary variable, the bias is A second observation is that the direction of bias is flipped compared to the results of the previous study. A negative sign of the path coefficient labeled β yielded negative bias, and likewise a positive path coefficient yielded positive bias.

Intermediate summary of results of simulation study 1

We have shown that in cases that are not MNAR, bias can be induced through the inclusion of auxiliary variables in a FIML estimation framework.
The fact that an auxiliary variable can actually make bias worse in parameter estimates in the presence of missing data is a novel point that is not addressed by the currently practiced approaches of including auxiliary variables. It also provides a counter-argument to a claim that is sometimes brought forth in defense of including many variables, namely that as soon as the explained variance in the missingness or the outcome variable gets very large, there is no more room for any potential biasing influences. This is clearly wrong, as our simulation examined cases in which explained variance through the inclusion of a collider auxiliary variable was very large and yet bias increased. In our simulation studies this bias seemed to become problematic (as assessed through rules of thumb for standardized bias and coverage) as soon as the explained variance of the unobserved variables associated with the collider auxiliary variable crossed a threshold of slightly less than 25%. On a correlation metric we therefore would have to observe correlations in the magnitude of approximately .4 to .5. While this may seem very high, it is important to remember that in our simulation studies there was only a single collider auxiliary variable with only 2 unobserved variables, while in reality there could be a multitude of both colliders and unobserved variables, especially if one is considering psychological constructs that are often multiply caused. Those taken together might be able to explain more variance and potentially make the inclusion of collider auxiliary variables more problematic.

However, the second simulation study also demonstrated that the bias that is observed due to the inclusion of a collider auxiliary variable is much smaller than the bias observed due to the omission of an auxiliary variable that has directional effects on both missingness and analysis variables with missing data. In our simulation setup we observed troublesome levels of bias as soon as the omitted auxiliary variable explained slightly less than 15% of the variance in the related variables, which translates to correlations of approximately .3 to .4.
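The translation between the explained-variance thresholds and the correlation metric used above is a square root. As a rough check, assuming fully standardized variables so that the uniquely explained variance equals the squared standardized coefficient:

```latex
\[
  r = \sqrt{R^2}, \qquad
  \sqrt{.25} = .500, \qquad
  \sqrt{.15} \approx .387 .
\]
```

These are the same values as the path coefficients of .500 and .387 used for the 25% and 15% conditions, so thresholds of slightly less than 25% and slightly less than 15% fall just below .5 and just below .39, respectively.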
These intermediate results should not give the impression that listwise deletion is generally preferable to MI or FIML models with auxiliary variables, as may erroneously be believed based on the result of the first simulation study. Rather, they show that the inclusion of auxiliary variables does not always mitigate bias, but can amplify it, and that researchers should take care to pick good auxiliary variables. We discuss some strategies later in the discussion.
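The contrast between the two designs can be explored with a small Monte Carlo sketch for the 25% condition (path coefficient .5, n = 500, 30% missing). Several choices here are assumptions and not the authors' procedure: missingness is generated by thresholding a latent normal propensity (the Appendix describing the actual mechanism is not reproduced here), FIML with one fully observed auxiliary variable is approximated by the regression estimator of the mean, only 300 replications are run, and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

CUTOFF = NormalDist().inv_cdf(0.30)  # threshold giving roughly 30% missing Y


def both_estimates(y, aux, keep):
    """Return (listwise mean, mean using the auxiliary variable)."""
    y_obs = [yi for yi, k in zip(y, keep) if k]
    a_obs = [ai for ai, k in zip(aux, keep) if k]
    my, ma = fmean(y_obs), fmean(a_obs)
    b = (sum((ai - ma) * (yi - my) for ai, yi in zip(a_obs, y_obs))
         / sum((ai - ma) ** 2 for ai in a_obs))
    return my, my + b * (fmean(aux) - ma)


def replicate(rng, design, path, n):
    """Generate one dataset under the named design and estimate the mean of Y."""
    resid_one = math.sqrt(1 - path ** 2)      # one incoming path
    resid_two = math.sqrt(1 - 2 * path ** 2)  # two incoming paths
    y, aux, keep = [], [], []
    for _ in range(n):
        if design == "collider":
            # Auxiliary variable is a common effect of L1 (-> Y) and L2 (-> R_Y).
            l1, l2 = rng.gauss(0, 1), rng.gauss(0, 1)
            yi = path * l1 + resid_one * rng.gauss(0, 1)
            ai = path * l1 + path * l2 + resid_two * rng.gauss(0, 1)
            ri = path * l2 + resid_one * rng.gauss(0, 1)
        elif design == "confounder":
            # Auxiliary variable directly affects both Y and R_Y.
            ai = rng.gauss(0, 1)
            yi = path * ai + resid_one * rng.gauss(0, 1)
            ri = path * ai + resid_one * rng.gauss(0, 1)
        else:
            raise ValueError(f"unknown design: {design}")
        y.append(yi)
        aux.append(ai)
        keep.append(ri > CUTOFF)  # Y is observed when the propensity is high
    return both_estimates(y, aux, keep)


def run(design, path=0.5, reps=300, n=500, seed=2024):
    """Standardized bias (true mean is 0) for both estimators in one condition."""
    rng = random.Random(seed)
    results = [replicate(rng, design, path, n) for _ in range(reps)]
    listwise = [r[0] for r in results]
    with_aux = [r[1] for r in results]
    return {
        "listwise": fmean(listwise) / stdev(listwise),
        "with_aux": fmean(with_aux) / stdev(with_aux),
    }


if __name__ == "__main__":
    for design in ("collider", "confounder"):
        print(design, run(design))
```

Under these assumptions the sketch should show the qualitative pattern described above: in the collider design the listwise estimate is essentially unbiased while the estimate using the auxiliary variable is not, and in the confounder design the roles are reversed, with the omitted-variable bias being the larger of the two. Exact values depend on the assumed missingness mechanism and are not expected to match the reported tables.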


More information

Recent developments for combining evidence within evidence streams: bias-adjusted meta-analysis

Recent developments for combining evidence within evidence streams: bias-adjusted meta-analysis EFSA/EBTC Colloquium, 25 October 2017 Recent developments for combining evidence within evidence streams: bias-adjusted meta-analysis Julian Higgins University of Bristol 1 Introduction to concepts Standard

More information

Impact and adjustment of selection bias. in the assessment of measurement equivalence

Impact and adjustment of selection bias. in the assessment of measurement equivalence Impact and adjustment of selection bias in the assessment of measurement equivalence Thomas Klausch, Joop Hox,& Barry Schouten Working Paper, Utrecht, December 2012 Corresponding author: Thomas Klausch,

More information

S Imputation of Categorical Missing Data: A comparison of Multivariate Normal and. Multinomial Methods. Holmes Finch.

S Imputation of Categorical Missing Data: A comparison of Multivariate Normal and. Multinomial Methods. Holmes Finch. S05-2008 Imputation of Categorical Missing Data: A comparison of Multivariate Normal and Abstract Multinomial Methods Holmes Finch Matt Margraf Ball State University Procedures for the imputation of missing

More information

MEA DISCUSSION PAPERS

MEA DISCUSSION PAPERS Inference Problems under a Special Form of Heteroskedasticity Helmut Farbmacher, Heinrich Kögel 03-2015 MEA DISCUSSION PAPERS mea Amalienstr. 33_D-80799 Munich_Phone+49 89 38602-355_Fax +49 89 38602-390_www.mea.mpisoc.mpg.de

More information

How should the propensity score be estimated when some confounders are partially observed?

How should the propensity score be estimated when some confounders are partially observed? How should the propensity score be estimated when some confounders are partially observed? Clémence Leyrat 1, James Carpenter 1,2, Elizabeth Williamson 1,3, Helen Blake 1 1 Department of Medical statistics,

More information
