Stephen G. West and Felix Thoemmes


24 Equating Groups

Stephen G. West and Felix Thoemmes

EQUATING GROUPS

One of the most central tasks of both basic and applied behavioral science is to estimate the size of treatment effects. The basic procedure is conceptually very straightforward. The researcher identifies a treatment (T) of interest, such as a new drug treatment or a new cognitive approach to psychotherapy. In our illustration, T is designed as a possible means of reducing depression in a clinical population. The researcher then identifies a comparison (C) condition to which the treatment is to be compared. In the case of the new drug treatment, the researcher might choose a placebo, which has no pharmaceutical effect on depression, or another drug that is the current standard drug prescribed to help relieve depression. Similarly, in the case of the new psychotherapy, the researcher might choose no psychotherapy, psychotherapy without the new cognitive elements, or the standard psychotherapeutic treatment that is commonly delivered (standard of practice). Each patient's level of depression is then measured following treatment. The difference between the mean levels of depression in the treatment and control groups, Ȳ_T − Ȳ_C, is then taken as the estimate of the treatment effect. However, the estimate of the treatment effect will be valid if and only if the two groups have been successfully equated prior to the implementation of the treatment. Otherwise stated, only if the groups are equated will Ȳ_T − Ȳ_C be an unbiased estimate of the causal effect of the treatment.

This chapter will examine some major methods of equating groups. We will draw on insights from statistics (Holland, 1986; Rosenbaum, 2002; Rubin, 1974, 1978, 2005), psychology (Reichardt, 2006; Shadish et al., 2002; West et al., 2000), public health (Little & Rubin, 2000), and sociology and econometrics (Winship & Morgan, 1999).
We will focus on comparisons of a treatment and a comparison group in two commonly used research designs: the randomized experiment and the observational study (i.e. the nonequivalent control group design). The key feature that distinguishes these two designs is the process through which units are assigned to the T and C groups (Judd & Kenny, 1981). The randomized experiment uses some random process (e.g. flipping a coin, a random number generator) to determine the assignment of the units to the T and C groups.

[15:26 10/9/ Alasuutari-Ch24.tex] Paper: a4 Job No: 4997 Alasuutari: Social Research Methods (SAGE Handbook)

The units are typically individual participants, but they may be larger aggregations such as schools or entire communities. This process implies that the expected mean of the units in the T group will equal the expected mean of the C group on any conceivable measured or unmeasured baseline variable, so that Ȳ_T − Ȳ_C may be taken as an unbiased estimate of the treatment effect.

In contrast, the observational study uses an unknown process to assign participants to the groups. Participants may choose to receive T versus C, or participants may receive the treatment because they are located in a single community, school, hospital, or other larger unit that has agreed to participate in the study. Because the process through which participants end up in the T versus C groups is unknown, researchers should expect potential mean differences on background variables between the T and C groups at baseline, even before treatment commences. Now Ȳ_T − Ȳ_C no longer represents an unbiased estimate of the causal effect of the treatment, but rather a confounded estimate reflecting some combination of the true causal effect of treatment and preexisting differences between the groups on measured or unmeasured variables at baseline (Reichardt, 2006). Only by carefully assessing critical participant characteristics at baseline and developing methods to equate the T and C groups prior to the beginning of treatment can the researcher even approximately estimate the desired effect of the treatment.

We begin this chapter by briefly reviewing the randomized experiment. The randomized experiment is often described as the gold standard design, and it serves as an important benchmark for the observational study. We identify some ways in which even randomized experiments can be enhanced through the use of additional procedures designed to more closely equate the groups at baseline.
We then briefly review studies comparing the treatment effect estimates from randomized experiments to those of observational studies of similar treatments, to provide information about the conditions under which these two designs may lead to different estimates of the treatment effect. We then introduce modern methods of adjusting treatment effects in observational studies for measured differences at baseline. These methods can substantially reduce any bias in the estimate of the treatment effect. Other approaches attempt to bracket the size of the treatment effect so that it represents a reasonable estimate even if there are variations on important unmeasured differences at baseline. Finally, we consider design enhancements that help rule out likely effects of unmeasured variables that may provide alternative explanations for the observed effect of treatment.

RANDOMIZED EXPERIMENTS

Randomization approximately equates the T and C groups at baseline. More formally, randomization produces two important results (Holland, 1986; West et al., 2000). First, as we observed above, the expected mean on any participant characteristic at baseline will be equal in the T and C groups, E(Ȳ_T,baseline) = E(Ȳ_C,baseline), where E( ) is the expected value of the variable in parentheses. Second, the binary variable X (1 = T; 0 = C) indicating the treatment condition is expected to be unrelated to all possible participant characteristics at baseline, E(r_XY,baseline) = 0. These two results imply that Ȳ_T − Ȳ_C at posttest will be an unbiased estimate of the treatment effect, so that no adjustment of this effect is needed. Note, however, that these results are expectations. They will hold exactly only given very large sample sizes or across a large number of exact replications of the same experiment conducted on a single population.
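These two expectations can be illustrated with a small simulation (our own sketch; the population values are invented, not from the chapter). Random assignment leaves the baseline means of the T and C groups equal up to sampling error, whereas self-selection on the baseline variable, as in an observational study, does not:

```python
import random
import statistics

random.seed(0)

# A baseline characteristic (e.g. pretest depression) for 10,000 units;
# the distribution is invented for illustration.
baseline = [random.gauss(50, 10) for _ in range(10_000)]

def mean_difference(assignment):
    """Baseline T-group mean minus baseline C-group mean."""
    t = [y for y, x in zip(baseline, assignment) if x == 1]
    c = [y for y, x in zip(baseline, assignment) if x == 0]
    return statistics.mean(t) - statistics.mean(c)

# Randomized experiment: a fair coin determines T (1) versus C (0),
# so the baseline difference is near zero, as the expectations imply.
diff_random = mean_difference([random.randint(0, 1) for _ in baseline])

# Observational study: units with higher baseline scores tend to end
# up in T, so the groups already differ before treatment begins.
diff_selected = mean_difference(
    [1 if y + random.gauss(0, 10) > 55 else 0 for y in baseline])

print(round(diff_random, 2), round(diff_selected, 2))
```

With 10,000 units, the randomized difference is a small fraction of a point, while the self-selected difference is roughly the size of a standard deviation of the baseline variable.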
In any single experiment using more modest sample sizes, unfortunate randomization, in which the T and C groups differ at baseline on some subset of important background variables, can be expected to occur with some regularity. For this reason, many journals in the public health area formally require that the means of the T and C groups on important baseline measures be reported as a check on the success of the randomization in the experiment. Following our presentation of

additional requirements for randomized field experiments, we will discuss procedures that use these baseline measures to equate groups more adequately prior to treatment, in order to provide more statistically powerful tests of the treatment effects.

Additional requirements

Randomized experiments involve additional requirements that must be met for valid estimation of the treatment effect (see Chapter 8). These requirements are routinely met in most laboratory experiments, but can be easily violated in community settings. Failure to meet these requirements may necessitate the use of special procedures, the inclusion of additional design features, or the use of special analysis procedures that adjust for the potential bias (Barnard et al., 1998). Four requirements over which the experimenter may have only limited control are of particular importance in randomized field experiments.¹

1 Proper Randomization. The randomization process must be properly carried out and adhered to. Treatment providers must not be permitted to alter the assignment of participants to the T and C conditions. Kopans (1994) presents evidence that reassignment of high-risk women to the treatment condition apparently occurred in a large national randomized trial evaluating the effectiveness of screening mammography. Connor (1977) provides other examples of experiments in which randomization failed or was not maintained by treatment providers. He suggests procedures that potentially minimize the likelihood of such randomization failures. Robins (1989) and Hernán et al. (2001) present methods of adjusting treatment effect estimates in complex longitudinal studies, for example, when participants are reassigned to another treatment, as in certain medical studies in which the patient does not respond to the assigned treatment.

2 Treatment Compliance. The participants must receive the intended treatment.
In randomized experiments studying mammography screening, some participants have refused screening (T). Other participants in the C group have sought out mammography screening outside the experiment (Baker, 1998). West and Sagarin (2000; see also Angrist et al., 1996; Jo, 2002) review statistical procedures that can provide proper estimates of the treatment effect when there is treatment noncompliance.

3 Absence of Attrition. All participants who are assigned to the T and C conditions must be measured on the outcome variable. Even though randomization serves to equate participants on average at baseline, this equating is potentially lost if some participants are not measured at posttest. Of most concern is differential attrition, in which participants with different characteristics drop out of the two groups. For example, in an experiment investigating a new method of mathematics instruction, less mathematically talented students might find the new course too challenging and withdraw prior to the collection of the outcome measure. Ȳ_T would then be based only on the scores of the more talented students assigned to the T condition, leading to an overestimate of the effectiveness of the course. Modern missing data techniques (Little & Rubin, 2002; Schafer & Graham, 2002) can improve the estimation of the treatment effect, particularly if variables that are highly related to the outcome (e.g. baseline measures on the outcomes of interest), to missingness, or ideally to both are measured at baseline. Full information maximum likelihood estimation (FIML), now available in several statistical packages (e.g. Mplus), combines all of the observed data to produce optimal estimates and standard errors for the treatment effect and other parameters of interest in the statistical model. Multiple imputation (MI), also available in several statistical packages (e.g. SAS), makes multiple copies of the dataset.
In each copy, the optimal predicted value for each missing datum is calculated, and then random error matching that in the complete data is added. The step of adding random error ensures that the original variability of the observed data is retained in the imputed values. The statistical model testing the treatment effect is then estimated in each copy of the dataset. Finally, the estimates of the treatment effect (and other parameters of interest) from each copy of the dataset are recombined. FIML and MI will both produce unbiased estimates of the treatment effect with proper standard errors if missingness is related to measured variables in the dataset, but not if there are other aspects of the missing variables that are not captured by other variables in the dataset. Consider two potential reasons why participants might be missing from a measurement session in a study of health outcomes in a large company. In the first case,

each participant's baseline measure of health (e.g. number of days of illness in the previous year) is the only variable that systematically predicts whether the participant will be present for the session. In the second case, several of the participants in a division of the company are missing because they are suffering health problems from working day and night on an intensive new project. In the first case, both FIML and MI will produce unbiased estimates because the source(s) of missingness were measured at baseline and are present in the dataset. In the second case, both FIML and MI will produce biased estimates of the treatment effect unless information about project participation and the current project-related health problems is present in the dataset. Suppose, however, that the researchers had used available substantive theory and research to select an extensive set of baseline variables that were expected to be related to the outcome variables, missingness, or both. Once again, information about project participation and project-related health problems is not available in the dataset. In this case, the use of FIML or MI will typically lead to estimates of the treatment effect that are less biased, perhaps substantially so, than methods that ignore missing data or that use traditional approaches such as listwise deletion, pairwise deletion, and mean imputation to address missing data.

4 Stable-Unit-Treatment-Value Assumption. The response of a participant should not be affected by the treatments that other participants receive (or by the participant's knowledge thereof). This condition is known as the stable-unit-treatment-value assumption (SUTVA); its purpose is to ensure that each participant can have only one true response in the treatment condition (see Rubin, 1978, 1980). Otherwise, the outcomes of the participants in the C group are likely to be atypical.
For example, if cancer patients learn that other participants have been assigned to a more promising treatment condition, they may give up hope and stop performing their normal health-supportive practices (e.g. proper diet), so that they will have worse outcomes than they would have had in the absence of this knowledge.

Some effects of improving group comparability at baseline

Randomization combined with meeting the four requirements outlined above ensures that the estimate of the treatment effect is unbiased. Unfortunately, this only means that the treatment effect will be correct on average. There is no guarantee that unfortunate randomization will not occur in a particular experiment. If the T and C groups can be closely equated at baseline on variables thought to be important predictors of the outcome, then the likelihood of unfortunate randomization can be substantially reduced. Equating procedures thus reduce the potential for an incorrect estimate of the treatment effect in a specific experiment. Equating procedures can also have the benefit of increasing the statistical power of the test, the probability that a true treatment effect of a specified size can be detected. Finally, they may help reduce some of the uncertainty associated with statistical methods of correcting treatment effect estimates when the four additional requirements are not met. The use of equating procedures is particularly important when the number of units to be assigned is small, the units are not homogeneous, or the treatment effect is not constant, but rather differs in magnitude as a function of the variable(s) on which equating is based.

Consider the following example, which captures the importance of equating with a small number of non-homogeneous units. Suppose a randomized experiment is conducted in which the units are six different US cities.
Each city receives either an intensive mass media campaign of anti-smoking public service announcements (T) or no smoking-related messages in the media (C). The cities chosen for study are from three groups: (a) large cities: Chicago, IL and Los Angeles, CA; (b) medium-sized cities: Baltimore, MD and Portland, OR; and (c) small cities: Terre Haute, IN and San Angelo, TX. Three cities are to be assigned to T and three cities to C. Assume that the size of the city is known to be strongly related to the effectiveness of mass media campaigns in health. Following Cochran and Cox (1957), when there are equal numbers (n) of units in the T and C groups, there are (2n)!/(n!n!) possible randomizations. In the present example, there

are 6!/(3!3!) = (6 × 5 × 4 × 3 × 2 × 1)/[(3 × 2 × 1)(3 × 2 × 1)] = 20 possible randomizations. A randomization that compared Chicago, Baltimore, and Terre Haute to Los Angeles, Portland, and San Angelo would be desirable. In contrast, a randomization that compared Chicago, Los Angeles, and Baltimore to Portland, Terre Haute, and San Angelo would be unfortunate. To avoid this problem, the researcher could match the two large cities, the two medium cities, and the two small cities. Within each matched pair, one city would be randomly assigned to T and one to C, leading to a randomization in which the T and C groups will be more adequately balanced, particularly on the critical baseline variable of size of city.

This procedure of pair matching followed by randomization is very general. For example, in a randomized experiment evaluating a new math instruction program, students could be assessed on a baseline measure of math ability that is expected to be highly related to the outcome variable, here math achievement. The students could be ranked based on their scores and pairs formed (the two highest; the next two highest; ... down to the two lowest). Once again, within each pair students would be randomly assigned to the T and C groups. This procedure ensures that the T and C groups will be closely equated on the important baseline variable of pretest math ability, preventing any possibility of unfortunate randomization with respect to this critical variable. A second advantage of this procedure is that it can lead to far more statistically powerful tests of the treatment effect.² For example, Student (1931) showed that an early randomized experiment on 10,000 children studying the effects of pasteurized (T) versus raw (C) milk on height and weight gains could have achieved the same level of statistical power with 50 pairs of identical twins.
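The counting argument can be verified directly. A short sketch (our code, using the cities from the example) enumerates the possible randomizations with and without pair matching:

```python
from itertools import product
from math import comb

# Unrestricted randomization: choose which 3 of the 6 cities receive T.
n_unrestricted = comb(6, 3)
print(n_unrestricted)  # 20

# Pair matching followed by randomization: one city from each matched
# pair (large, medium, small) is assigned to T, so only size-balanced
# assignments remain possible.
pairs = [("Chicago", "Los Angeles"),
         ("Baltimore", "Portland"),
         ("Terre Haute", "San Angelo")]
balanced = [tuple(pair[pick] for pair, pick in zip(pairs, picks))
            for picks in product((0, 1), repeat=3)]
print(len(balanced))  # 8
```

Matching thus shrinks the 20 possible randomizations to 8, every one of which is balanced on city size, including the desirable Chicago-Baltimore-Terre Haute assignment noted above.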
Matching followed by randomization may also lead to a third benefit: providing a stronger foundation for addressing failures to adequately meet the additional requirements of randomized experiments (presented above). For example, the existence of well-matched pairs may provide a stronger basis for modeling the effects of treatment non-compliance and attrition, particularly in experiments in which sample sizes are moderately rather than extremely large and the size of the treatment effect is not constant, but rather depends on the level of the baseline variable (i.e. a baseline × treatment condition interaction). Conceptually, matching followed by randomization may also have other potential advantages in certain contexts, as it implicitly identifies a specific comparison participant with whom each treatment recipient may be compared. For example, many clinicians would ideally like to understand the effects of treatments on single cases rather than the average effect of the treatment on patients in general. The matching and randomization procedure can permit a closer approximation of this ideal than simple randomization.

When many measures are collected at baseline, matching becomes more difficult. In some cases the multiple measures can be combined a priori into a single composite variable on which matching can occur. For example, in research related to breast cancer, a set of measures including age at menarche, number of first-degree relatives (mother, sister) with breast cancer, number of previous breast biopsies, and age are combined into a single risk score using a formula based on prior epidemiological research (Gail et al., 1989). Alternatively, measures can be collected on the entire sample prior to randomization. The researcher can generate several thousand different possible randomizations and calculate Hotelling's T² for each randomization using the key variables measured at baseline.
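A minimal sketch of this generate-and-screen procedure (simulated baseline data; the sample size, number of variables, and candidate count are our own illustrative choices) scores each candidate randomization with Hotelling's T² and keeps one from the least imbalanced 5%:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 40, 3  # 40 participants measured on 3 baseline variables
baseline = rng.normal(size=(n, k))

def hotelling_t2(data, t_idx):
    """Two-sample Hotelling's T² for the T-vs-C difference at baseline."""
    c_idx = np.setdiff1d(np.arange(len(data)), t_idx)
    t, c = data[t_idx], data[c_idx]
    diff = t.mean(axis=0) - c.mean(axis=0)
    # Pooled within-group covariance matrix of the baseline variables.
    s = ((len(t) - 1) * np.cov(t, rowvar=False)
         + (len(c) - 1) * np.cov(c, rowvar=False)) / (len(t) + len(c) - 2)
    return float(len(t) * len(c) / (len(t) + len(c)) * diff @ np.linalg.solve(s, diff))

# Generate candidate randomizations (here 2,000), score each one, and
# keep a randomization from the 5% with the least multivariate imbalance.
candidates = [rng.permutation(n)[: n // 2] for _ in range(2000)]
scores = [hotelling_t2(baseline, idx) for idx in candidates]
cutoff = float(np.quantile(scores, 0.05))
chosen = candidates[int(np.argmin(scores))]  # one acceptable choice
print(round(hotelling_t2(baseline, chosen), 3))
```

In practice the final randomization would be drawn at random from the whole set below the cutoff rather than taken as the single minimum, so that the design retains a random element.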
Hotelling's T² describes the magnitude of the multivariate difference between the groups, here on the baseline variables. The randomizations are sorted from low to high in terms of their values of Hotelling's T². From the 5% or 10% of the randomizations with the lowest values of Hotelling's T², a randomization is chosen, thereby minimizing potential problems of unfortunate randomization. More complicated blocking and randomization procedures to achieve these same goals in other specialized experimental contexts (e.g. trickle

flow randomization, in which participants are recruited over an extended period of time) are described in Friedman et al. (1998) and Matthews (2000).

RESEARCH COMPARING THE RESULTS OF RANDOMIZED EXPERIMENTS AND OBSERVATIONAL STUDIES

As a starting point for studying methods to improve the results of observational studies, it is useful to review the literature comparing the results of randomized experiments with those of observational studies. Properly implemented randomized experiments serve as the gold standard: they typically provide the best, unbiased estimates of the magnitude of the treatment effect. In contrast, the unknown rules through which participants in observational studies are assigned to the T or C conditions lead to far greater uncertainty about the treatment effect estimate. The researcher would like to claim that some aspect of the treatment caused the observed results; however, it may be possible that a failure to successfully equate the groups at the beginning of the experiment provides a strong alternative explanation (Reichardt, 2006). Even when adjustments to the treatment effect can be made on the basis of measures collected at baseline, there may be less than complete certainty that the T and C groups have been properly equated.

Even though statistical theory clearly identifies failure to equate the T and C groups on important variables at baseline as an important, plausible problem that may occur in observational studies, it provides little guidance as to the likely frequency of this problem in practice, or as to the contexts in which estimates of treatment effects are most likely to be biased. To gain some insight into this issue, below we briefly review the literature comparing the results of randomized experiments with those of observational studies that employed similar treatments. We then turn to an examination of modern statistical and design solutions that attempt to address these issues.
Two types of comparisons have been made: (a) single investigations of parallel randomized experiments and observational studies using similar (possibly identical) treatments; and (b) extensive meta-analyses of research areas investigating the effect of a treatment. Of note, exact agreement of the estimates of treatment effects in randomized experiments and observational studies should not be expected given sampling error; even exact replications of a randomized experiment using the same population would not be expected to produce identical treatment effects. In addition, other differences between the studies representing the two designs may exist. For example, the populations sampled in the two designs, the treatment delivery, the research setting, or other methodological features (e.g. a less adequate control condition is constructed in the observational study) may differ in addition to the focal difference of randomized versus non-randomized design (Cook et al., 2006; Reichardt, 2006; West et al., 2006).

Single comparative studies

Studies comparing treatment effect estimates from randomized experiments and observational studies have produced diverse results. A classic example is Meier's (1972) large-scale evaluation of the effectiveness of the Salk polio vaccine in the US. In some states, a randomized experiment was used; in others, an observational study. Even though both designs led to the conclusion that the Salk vaccine was effective, the effect size in the randomized experiment was substantially larger. Gilbert et al. (1975) suggested that the difference in effect sizes primarily resulted from the different populations on which the polio rates were based in the C conditions. In the randomized experiment, the comparison group included only children who had permission to be vaccinated, in contrast to the observational study, in which the full population was represented. Cook et al.
(2006) reviewed a unique subset of investigations in which a single randomized treatment group was compared

with both a randomized control group (randomized experiment) and a second non-randomized comparison group (yoked observational study). Those observational studies that created a high-quality comparison group produced results comparable to those of the yoked randomized experiment. Investigations with a poorly selected comparison group, poor statistical adjustment for baseline differences, or other procedural or design differences between the observational study and the yoked randomized experiment often produced discrepant findings.

Meta-analyses

Across diverse substantive research areas, such as skill training, organizational development, psychotherapy, and medical interventions, meta-analyses have produced heterogeneous outcomes in which randomized experiments have shown larger, smaller, and no differences in treatment effect estimates relative to observational studies. An early influential meta-analytic investigation by Sacks et al. (1983) identified six medical therapies that had been studied using both randomized experiments and observational studies. Sacks et al. concluded that observational studies produced biased results in comparison to randomized controlled trials. Attempts to adjust treatment effects in observational studies for available prognostic factors did not remove this bias. More recently, Ioannidis et al. (2001) conducted meta-analyses of 45 medical interventions (e.g. vaccines for meningitis; local vs. general anesthesia) involving a total of 240 randomized trials and 168 observational studies. Overall, there was no consistent pattern of over- or under-estimation of treatment effects by the observational studies relative to the randomized experiments. Significant differences between the randomized experiments and observational studies were found in only a small proportion of the meta-analyses. Ioannidis et al.
provided evidence of smaller between-study variance among the randomized experiments than among the observational studies, an important finding suggesting that the effect size estimates of observational studies may be associated with more uncertainty than those of randomized experiments.

Reviews of other areas also suggest that the direction of mean bias is by no means certain. Lipsey and Wilson (1993) analyzed 74 meta-analyses of behavioral and educational interventions, finding no difference in the mean effect sizes of randomized experiments and observational studies. Heinsman and Shadish (1996) analyzed four meta-analyses in the areas of drug-use prevention, psychosocial interventions for surgery, coaching for the SAT, and ability grouping in secondary schools. They found a larger effect size for randomized experiments than for observational studies. Taken together, the meta-analytic results suggest that the magnitude of bias resulting from the use of an observational study rather than a randomized experiment is typically not large and that its direction is uncertain. They also suggest that area-specific choices of samples and methodological features (e.g. type of comparison group) may be important determinants of any bias that is observed.

Methodological features

Heinsman and Shadish (1996) coded methodological features that might potentially account for the observed difference in effect sizes between randomized experiments and observational studies in four behavioral science research areas (e.g. SAT coaching, drug-use prevention). Of importance, they found in a regression analysis that not allowing self-selection into the T vs. C conditions in observational studies, using a control group from the same population as the treatment group, minimizing the baseline effect size difference between the T and C groups, and minimizing both overall attrition and differential attrition made the treatment effect estimates more comparable in the two designs.
Shadish and Ragsdale (1996) found similar results in a meta-analysis of randomized experiments and observational studies of marital or family psychotherapy. Consistent with these findings, Heckman and Robb (1986)

also point to conceptual and statistical reasons why allowing participants to self-select into the T and C groups is particularly likely to lead to biased estimates. These results suggest that it may be possible to improve estimates of treatment effects in observational studies through the careful use of design and analysis strategies.

Adjustment strategies for equating groups at baseline

Matching

Matching is used in observational studies to identify a set of participants in the T and C groups who are comparable. To illustrate, consider two small school classrooms, labeled A and B, one of which implements an innovative new math curriculum, whereas the other implements a standard math curriculum in 6th grade. Table 24.1 illustrates the basic process of simple 1:1 matching. All students in both classrooms are given an IQ test at the beginning of the school year. For each student in classroom A, an attempt is made to identify a student in classroom B who is closely equated on IQ. This matching process diminishes the mean difference in baseline IQ between the two groups in our example from M_A − M_B = 5 in the full unmatched sample to M_A − M_B = 0.5 in the reduced, matched sample.

Table 24.1 Illustration of simple matching of two small classrooms on baseline IQ scores (columns: Pair, Classroom A, Classroom B; the individual scores are not reproduced here). Note: Scores were ordered within units and represent pretest IQ scores of participants. Pairs of participants on the same line represent matched pairs. One person in Classroom A and two persons in Classroom B have no matched pairs. The mean IQ score for all participants in Classroom A is 113; the mean IQ score for all participants in Classroom B is 108. The mean difference (Ȳ_A − Ȳ_B) for the full unmatched sample is 5. The mean for the matched pairs in Classroom A is 111.6 and for Classroom B is 111.1, yielding a mean difference of 0.5. n_A = 13 and n_B = 14 for the full sample.
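The matching process can be sketched as a simple greedy nearest-neighbor algorithm with a caliper (a maximum allowed discrepancy). The IQ scores below are invented for illustration and are not the values from Table 24.1, though they mirror its pattern (13 students in A, 14 in B, and a few students left unmatched):

```python
# Hypothetical pretest IQ scores, ordered within classrooms.
classroom_a = [135, 124, 121, 118, 116, 115, 113, 111, 109, 107, 104, 101, 96]
classroom_b = [125, 122, 120, 117, 115, 114, 112, 110, 108, 106, 103, 99, 94, 90]

CALIPER = 3  # maximum allowed IQ difference within a matched pair
pairs, used_b = [], set()
for a in classroom_a:
    # Nearest not-yet-matched B student within the caliper, if any.
    best = min((b for b in classroom_b
                if b not in used_b and abs(a - b) <= CALIPER),
               key=lambda b: abs(a - b), default=None)
    if best is not None:
        pairs.append((a, best))
        used_b.add(best)

mean_a = sum(a for a, _ in pairs) / len(pairs)
mean_b = sum(b for _, b in pairs) / len(pairs)
print(len(pairs), round(mean_a - mean_b, 2))
```

With these invented scores, one A student and two B students find no match within the caliper, and the matched-sample mean difference falls well below one IQ point, paralleling the reduction shown in Table 24.1.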
A variety of computer algorithms are available that match T and C participants to produce the minimum discrepancy on the pretest variable (see Ming & Rosenbaum, 2001; Rosenbaum, 2002). These computer algorithms are particularly useful when the T and C groups are large, are of dramatically different sizes, or both. For example, observational studies of initial trials of innovative programs (T) may involve a relatively small number of participants, whereas there is a substantially larger number of participants in the standard program (C) that serves as the comparison. In such cases, the algorithm will select a variable number of optimal matches (e.g. up to 5) for each participant.³ These variable matching procedures lead to more adequate equating of the groups on the matching variable and greater statistical power for the T vs. C comparison, given the larger sample size (Ming & Rosenbaum, 2000).

Researchers are encouraged to measure many variables at baseline, particularly those that may be related to treatment group assignment or to the outcome variable. Substantive theory and prior research can provide guidance in the selection of a set of measures that will capture as fully as possible potential baseline differences between the T and C groups. However, the availability of a large number of baseline variables makes matching far more complex. In rare cases, a composite variable can be created (e.g. the Gail score for breast cancer risk described earlier). More commonly, propensity scores are used. Propensity scores provide an estimate of the probability that a participant will be assigned to the treatment group (Rosenbaum, 2002; Rosenbaum & Rubin, 1983, 1984; Rubin, 1997; Shadish et al., 2006; Smith, 1997). The researcher uses all baseline variables (or a subset containing the most important ones

9 422 THE SAGE HANDBOOK OF SOCIAL RESEARCH METHODS if this number is very large) and predicts the probability that the participant will be in the T group. This probability is known as the propensity score. There are two major issues in the creation of propensity scores. The first is to make sure that subject matter expertise in the form of prior research and theory has been used to select baseline measures that will capture as fully as possible important baseline differences between the T and C groups. The second is to choose a statistical model that adequately represents the form of the relationship between the variables and each participant s propensity score. Rosenbaum and Rubin (1983) used simple linear logistic regression to produce these estimates. Dehejia and Wahba (1999) used more complicated logistic regression models involving specification of interactions and curvilinear effects of baseline variables. McCaffrey et al. (2004) used automated stepwise nonparametric regression tree methods to model possible complex relationships between the variables and the propensity score. In each case the goal is to achieve T and C groups that are balanced on all important baseline variables and for which the error of prediction in the sample has been minimized (Shadish et al., 2006). As an important check on the success of this procedure, the data are divided into five strata and the balance of the baseline variables within each stratum is compared. When balance is achieved, there is a strong basis for comparing the groups. If balance is not achieved within one (or more) stratum, the comparison of the treatment and control groups is carried out only over those strata on which balance has been achieved. Each participant s propensity score may then be taken as the best summary of the baseline information. The propensity score is used as the basis for equating the groups. 
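The estimate-then-check-balance procedure described above can be sketched roughly as follows. This is an illustration on simulated data only: it uses a single baseline covariate, a hand-rolled gradient-ascent logistic regression standing in for the logistic regression software a real analysis would use, and arbitrary numeric choices (sample size, selection slope, learning rate) that are assumptions of the demo, not recommendations.

```python
import math
import random

random.seed(0)
n = 400

# One baseline covariate; a real study would use many.
x = [random.gauss(0, 1) for _ in range(n)]
# Non-random assignment: higher x raises the chance of treatment
# (true selection slope 0.8, an arbitrary value for the demo).
z = [1 if random.random() < 1 / (1 + math.exp(-0.8 * xi)) else 0
     for xi in x]

# Fit P(Z = 1 | x) = logistic(b0 + b1*x) by plain gradient ascent.
b0 = b1 = 0.0
for _ in range(3000):
    g0 = g1 = 0.0
    for xi, zi in zip(x, z):
        p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
        g0 += zi - p
        g1 += (zi - p) * xi
    b0 += 0.05 * g0 / n
    b1 += 0.05 * g1 / n

# Each participant's propensity score.
scores = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]

def mean(xs):
    return sum(xs) / len(xs)

# Balance check: divide the sample into five strata on the propensity
# score and compare T and C covariate means within each stratum.
order = sorted(range(n), key=lambda i: scores[i])
strata = [order[k * (n // 5):(k + 1) * (n // 5)] for k in range(5)]

full_gap = abs(mean([x[i] for i in range(n) if z[i]])
               - mean([x[i] for i in range(n) if not z[i]]))
stratum_gaps = []
for s in strata:
    xt = [x[i] for i in s if z[i]]
    xc = [x[i] for i in s if not z[i]]
    if xt and xc:
        stratum_gaps.append(abs(mean(xt) - mean(xc)))
# Within strata the groups are far better balanced than overall.
```

Stratifying on the score shrinks the within-stratum covariate gap relative to the raw T vs. C gap, which is the balance diagnostic the text describes; if a stratum remained unbalanced, the comparison would be restricted to the balanced strata.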
The groups may be equated using the standard 1-to-1 or variable many-to-1 matching procedures described above. Alternatively, analysis of covariance or blocking on the strata may be used (but see footnote 3). As an illustration of the matching strategy, Wu et al. (2006) constructed propensity scores for retention in first grade from a large set of baseline variables measured early in the school year. In the full sample (n = 769) of children at risk for grade retention, there were large differences between students on the Woodcock-Johnson reading score at baseline: students who were later retained in first grade had substantially lower scores than students who were later promoted to second grade (Ȳ_baseline = 420 for retained versus Ȳ_baseline = 438 for promoted students). Optimal 1-to-1 matching on propensity scores yielded 97 matched pairs with Ȳ_baseline = [value missing] for the retained students and Ȳ_baseline = [value missing] for the promoted students. Similar reductions in baseline differences were achieved for other variables measured at baseline.

Theoretically, propensity scores will provide a proper adjustment for the unknown assignment rule if all important baseline variables have been included and the form of the propensity model has been correctly specified. Matching has substantial strengths: it does not require specification of the form of the relationship between the baseline and outcome variables; it clearly delimits the range of the baseline variables over which T and C can be appropriately compared; and it leads to efficient estimates of the treatment effect because of the small number of parameter estimates involved. Hypothesized treatment group × baseline level interactions can also be examined within the matched propensity score framework. There are two primary limitations of the matched propensity score framework.
First, it does not adjust the treatment effect for measurement error in the baseline variables, giving rise to potential regression to the mean effects if highly reliable and stable measures of important baseline variables are not available. Second, it does not adjust for other important variables (hidden variables) that are not measured at pretest, again emphasizing the importance of selecting the full range of potential baseline variables based on subject matter expertise.

Statistical adjustment strategies based on measured baseline differences

A variety of statistical models may be developed that attempt to adjust for baseline differences in measured variables. Perhaps the simplest is analysis of covariance (ANCOVA; Huitema, 1980; Reichardt, 1979), which is used to provide an adjustment of the treatment effect for one or more baseline variables. Typically, a simple linear model is used,

Ŷ = b0 + b1 COV + b2 X,

where Y is the outcome variable, COV is the covariate measured at baseline, and X is the binary treatment indicator. This model can be extended to include multiple covariates, other parametric relationships (e.g. addition of a b3 COV² term to represent a quadratic relationship between COV and Y), and treatment × covariate interactions (Cohen et al., 2003; Huitema, 1980; Reichardt, 1979). Nonparametric methods can be used to model more complex relationships between the covariates and Y (see Little et al., 2000).

The primary limitation of ANCOVA methods is that their success in equating the T and C groups depends heavily on the correct specification of the adjustment model. For example, if the relationship between COV and Y is nonlinear and a simple linear ANCOVA model is used, the treatment effect estimate will be biased. The basic ANCOVA approach shares with matching the limitation that baseline variables may be measured with less than perfect reliability. This problem is most serious when the T and C groups are selected from different populations, so that regression to the mean will occur (see Campbell & Kenny, 1999; Shadish et al., 2002). Even if the statistical adjustment model is otherwise correctly specified, measurement error will typically lead to under-adjustment of the treatment effect for baseline differences. Huitema (1980) provides an introduction and Fuller (1987) a more advanced treatment of methods for correcting for measurement error in the context of ANCOVA. Alternatively, when multiple indicators are available for each important construct measured at pretest, structural equation models can be used to provide measurement-error-free estimates of the treatment effect. Aiken et al.
(1994) provide a good discussion of the use of this approach and apply it to the evaluation of a drug treatment program. One limitation of the structural equation modeling approach is that the models to date have specified a linear relationship between the baseline measures and the outcome. Lee et al. (2004), Marsh et al. (2004), and Wall and Amemiya (2007) describe extensions of structural equation models that may account for curvilinear and interactive effects.

Correction for measurement error can also be desirable when treatment participants are selected on the basis of a variable that is unstable over time. For example, if T participants are selected based on high scores on a measure of depression (or because they are seeking treatment during a severe depressive episode), it is likely that some of the participants are in a temporary state of high depression and would return to their typical level of depression in the absence of any treatment, simply given the passage of time. Reliability correction methods that adjust the estimate of the treatment effect for the test-retest reliability over the interval between the baseline and outcome measures, in the absence of treatment, can improve the estimate of the treatment effect. If repeated measures are collected on multiple indicators of the outcome variable at baseline and at multiple other time points, special structural equation models can be used that partition the variance at each time point into state (temporary) and trait (true score) components (cf. Khoo et al., 2006; Steyer et al., 1992).

Adjusting for unmeasured baseline differences (hidden variables)

The matching and statistical adjustment strategies described above can provide appropriate correction of the estimate of the treatment effect for variables measured at baseline. However, it is also possible that variables that are not measured at baseline could account for all or part of the estimated treatment effect.
Three general strategies exist for addressing this problem. First, a variety of methods have been proposed for conducting sensitivity analyses of treatment effect estimates (Marcus, 1997; McCaffrey et al., 2004; Rosenbaum, 2002). As an illustration of one simple method, imagine a researcher has found a 0.8 standard deviation difference (a large effect size) between the T and C groups on the outcome variable. The researcher would then identify the largest standardized difference between the T and C groups on the set of variables measured at baseline. Suppose the largest baseline difference were d = 0.5 standard deviations. The researcher then identifies the maximum correlation between any of the baseline measures and the posttest measure of the outcome of interest. Suppose the maximum correlation were r = 0.6. The product of these two quantities,

adjustment = ((Ȳ_baseline,T − Ȳ_baseline,C) / SD) × r_baseline,outcome,

here adjustment = 0.5 × 0.6 = 0.3, provides a rough estimate of the maximum extent to which the estimate of the standardized treatment effect would need to be reduced under a worst-case scenario for an important hidden variable. If the standardized treatment effect were reduced by this amount, to 0.8 − 0.3 = 0.5 in our example, we would have a plausible estimate of its lower bound. If this value were still statistically significant, it would provide evidence that the treatment effect is robust. Note that there is no theoretical reason why the actual adjustment required for hidden variables could not exceed this value. However, if a number of variables are measured at baseline and they can be presumed to be representative of important hidden variables, this adjustment will nearly always be an overestimate of the adjustment needed in practice.

Second, econometric approaches (e.g. Barnow et al., 1980; Heckman, 1979, 1989, 1990; Muthén & Jöreskog, 1983) have been proposed that adjust for the effects of both measured and unmeasured variables at baseline. Two separate equations are used in these models. The first (selection model) equation uses measured baseline variables to predict the assignment of the participant to the treatment or control group.
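The rough worst-case bound from the sensitivity illustration a paragraph above reduces to a one-line calculation. The function name is ours, and the inputs are the numbers of the running example:

```python
def hidden_variable_bound(effect_d, max_baseline_d, max_r):
    """Rough lower bound on a standardized treatment effect under the
    worst-case heuristic described in the text: subtract the product of
    the largest standardized baseline difference and the largest
    baseline-outcome correlation."""
    return effect_d - max_baseline_d * max_r

# Running example: observed d = 0.8, largest baseline d = 0.5,
# largest baseline-outcome correlation r = 0.6.
lower = hidden_variable_bound(0.8, 0.5, 0.6)  # 0.8 - 0.5*0.6 = 0.5
```

If this lower bound remains statistically significant, the treatment effect is judged robust to a hidden variable of plausible strength; as the text cautions, the true adjustment could in principle exceed this product.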
The second equation uses this selection probability, an indicator variable (T = 1; C = 0) for treatment condition, and potentially other covariates to estimate the outcome. A key feature of this approach is the requirement of an instrumental variable (see footnote 4), a variable that strongly predicts treatment assignment in the first equation but has no separate relationship to the outcome (see Figure 24.1). In essence, the instrument can be thought of as a naturally occurring randomization (Heckman, 1996). The instrumental variable can affect the outcome only indirectly, through its effect on treatment assignment, an assumption known as the exclusion restriction. If the assumptions of this approach are met, the treatment effect estimate will include proper adjustment for both measured and unmeasured baseline variables. However, in practice, this method is extremely sensitive to violations of its underlying assumptions, particularly the exclusion restriction (Heckman, 1997; Stolzenberg & Relles, 1990; Winship & Mare, 1992).

Figure 24.1 Illustration of the econometric selection bias model. [Path diagram: the Instrumental Variable points to the Treatment Indicator, which points to the Outcome via the path B_OT; Residual 1 attaches to the Treatment Indicator and Residual 2 to the Outcome.] Note: The instrumental variable directly affects only the Treatment Indicator (T = 1; C = 0). This condition is known as the exclusion restriction. Residual 1 is the error of prediction of the Treatment Indicator, including error produced by hidden variables. The hidden variables may also be associated with the residual of the Outcome (Residual 2). If the model is correctly specified, an adjustment of the regression coefficient B_OT will yield an unbiased estimate of the treatment effect controlling for the hidden variables. If the assumptions of the model are violated (notably the exclusion restriction), the estimate of the treatment effect may be severely biased.

When assumptions are violated, the treatment effect estimates of econometric models can be far more biased than those based on simpler approaches like ANCOVA or matching. In addition, even if the assumptions of the approach are met, the standard errors of the estimate of the treatment effect can be extremely large if the instrument is not very strongly related to treatment assignment. Finally, the econometric approach assumes that the treatment effect is constant across all participants.

A third approach, suggested by Manski (1994), Manski and Nagin (1998), and Manski and Pepper (2000), has explored the effects of making weaker assumptions about instrumental variables in econometric selection models. This approach results in the estimation of a plausible range of values for the treatment effect within upper and lower bounds. However, in some cases the bounds may be very wide, so that little information is conveyed about the size of the treatment effect.

Adjusting for growth

A final issue occurs when participants show different rates of natural growth (e.g. young children in math skills) or decline (e.g. Alzheimer's patients in memory) on the outcome variable of interest. With observations taken only at baseline, no measure of the natural growth rate in the absence of treatment is available for the participants. Change score analysis (Judd & Kenny, 1981) can be used to estimate the treatment effect. Participants are measured on the same measure at baseline and at outcome. These baseline and outcome measures are then transformed so that their variances are equated (see Huitema, 1980). The mean change in the T group is then compared with the mean change in the C group to provide an estimate of the treatment effect.
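A minimal sketch of the change-score comparison just described, with invented scores and with the variance-equating transformation mentioned in the text omitted for brevity:

```python
def mean(xs):
    return sum(xs) / len(xs)

def change_score_effect(pre_t, post_t, pre_c, post_c):
    """Difference between the mean change in the T group and the mean
    change in the C group. (The variance-equating transformation
    discussed in the text is omitted here for simplicity.)"""
    return (mean(post_t) - mean(pre_t)) - (mean(post_c) - mean(pre_c))

# Invented math-achievement scores: T gains 6 points on average
# while C gains 2, so the estimated treatment effect is 4.
pre_t, post_t = [10, 12, 14], [15, 18, 21]
pre_c, post_c = [11, 13, 15], [13, 15, 17]
effect = change_score_effect(pre_t, post_t, pre_c, post_c)  # 4.0
```

The control group's mean change stands in for the natural growth that the treated participants would have shown anyway, which is exactly why this simple estimate is trustworthy only under the constant-rate or fan-spread growth patterns discussed next.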
This approach adequately models special situations in which growth is occurring at a constant rate across all participants or is of the "fan spread" variety, in which growth occurs at a rate proportional to the participant's baseline score (e.g. those advantaged at baseline gain more). Treatment effects for other forms of growth are not well represented using this approach. More adequate modeling of growth requires the collection of additional data at multiple time points, ideally both before and after the treatment (Shadish et al., 2002; West et al., 2000). If sufficient additional time points are collected, the natural pattern of growth prior to treatment can be estimated; this pattern can then be compared with the pattern of growth following the introduction of treatment in the T group. Singer and Willett (2002) describe multilevel modeling methods that estimate the treatment effect while allowing for differences between participants in growth rates.

Design enhancements

In line with the topic of this chapter, we have focused on methods of equating the T and C groups at baseline. However, we would be remiss if we did not remind readers of an important alternative strategy emphasized by Shadish and Cook (1999) and Shadish et al. (2002). This strategy involves adding design features that address specific threats to validity that arise in observational studies. Shadish and Cook (1999) argue that the use of design enhancements will often be preferable to the use of statistical adjustment strategies. We present three methods of enhancing the design of the basic observational study here (see Shadish & Cook, 1999, for an extensive list).

Multiple control groups

When a treatment and a control group are selected in an observational study, they will be similar at baseline in some respects and different in others. This feature gives rise to the possibility that some hidden variable may be accounting for the result.
If multiple control groups can be identified and the estimates of the treatment effects are similar when different control groups are used, the researcher's confidence that the treatment effect is not biased is increased. For example, using a large database, Roos et al. (1978) compared children receiving tonsillectomies (T) with two different comparison groups: (a) children having a matched history of …


More information

Catherine A. Welch 1*, Séverine Sabia 1,2, Eric Brunner 1, Mika Kivimäki 1 and Martin J. Shipley 1

Catherine A. Welch 1*, Séverine Sabia 1,2, Eric Brunner 1, Mika Kivimäki 1 and Martin J. Shipley 1 Welch et al. BMC Medical Research Methodology (2018) 18:89 https://doi.org/10.1186/s12874-018-0548-0 RESEARCH ARTICLE Open Access Does pattern mixture modelling reduce bias due to informative attrition

More information

Chapter 17 Sensitivity Analysis and Model Validation

Chapter 17 Sensitivity Analysis and Model Validation Chapter 17 Sensitivity Analysis and Model Validation Justin D. Salciccioli, Yves Crutain, Matthieu Komorowski and Dominic C. Marshall Learning Objectives Appreciate that all models possess inherent limitations

More information

Using Propensity Score Matching in Clinical Investigations: A Discussion and Illustration

Using Propensity Score Matching in Clinical Investigations: A Discussion and Illustration 208 International Journal of Statistics in Medical Research, 2015, 4, 208-216 Using Propensity Score Matching in Clinical Investigations: A Discussion and Illustration Carrie Hosman 1,* and Hitinder S.

More information

EXPERIMENTAL RESEARCH DESIGNS

EXPERIMENTAL RESEARCH DESIGNS ARTHUR PSYC 204 (EXPERIMENTAL PSYCHOLOGY) 14A LECTURE NOTES [02/28/14] EXPERIMENTAL RESEARCH DESIGNS PAGE 1 Topic #5 EXPERIMENTAL RESEARCH DESIGNS As a strict technical definition, an experiment is a study

More information

Improving Causal Claims in Observational Research: An Investigation of Propensity Score Methods in Applied Educational Research

Improving Causal Claims in Observational Research: An Investigation of Propensity Score Methods in Applied Educational Research Loyola University Chicago Loyola ecommons Dissertations Theses and Dissertations 2016 Improving Causal Claims in Observational Research: An Investigation of Propensity Score Methods in Applied Educational

More information

Structural Approach to Bias in Meta-analyses

Structural Approach to Bias in Meta-analyses Original Article Received 26 July 2011, Revised 22 November 2011, Accepted 12 December 2011 Published online 2 February 2012 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/jrsm.52 Structural

More information

VALIDITY OF QUANTITATIVE RESEARCH

VALIDITY OF QUANTITATIVE RESEARCH Validity 1 VALIDITY OF QUANTITATIVE RESEARCH Recall the basic aim of science is to explain natural phenomena. Such explanations are called theories (Kerlinger, 1986, p. 8). Theories have varying degrees

More information

Experimental and Quasi-Experimental designs

Experimental and Quasi-Experimental designs External Validity Internal Validity NSG 687 Experimental and Quasi-Experimental designs True experimental designs are characterized by three "criteria for causality." These are: 1) The cause (independent

More information

Causal Inference in Randomized Experiments With Mediational Processes

Causal Inference in Randomized Experiments With Mediational Processes Psychological Methods 2008, Vol. 13, No. 4, 314 336 Copyright 2008 by the American Psychological Association 1082-989X/08/$12.00 DOI: 10.1037/a0014207 Causal Inference in Randomized Experiments With Mediational

More information

Abstract Title Page Not included in page count. Title: Analyzing Empirical Evaluations of Non-experimental Methods in Field Settings

Abstract Title Page Not included in page count. Title: Analyzing Empirical Evaluations of Non-experimental Methods in Field Settings Abstract Title Page Not included in page count. Title: Analyzing Empirical Evaluations of Non-experimental Methods in Field Settings Authors and Affiliations: Peter M. Steiner, University of Wisconsin-Madison

More information

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n. University of Groningen Latent instrumental variables Ebbes, P. IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document

More information

The Stable Unit Treatment Value Assumption (SUTVA) and Its Implications for Social Science RCTs

The Stable Unit Treatment Value Assumption (SUTVA) and Its Implications for Social Science RCTs The Stable Unit Treatment Value Assumption (SUTVA) and Its Implications for Social Science RCTs Alan S. Gerber & Donald P. Green Yale University From Chapter 8 of Field Experimentation: Design, Analysis,

More information

Logistic regression: Why we often can do what we think we can do 1.

Logistic regression: Why we often can do what we think we can do 1. Logistic regression: Why we often can do what we think we can do 1. Augst 8 th 2015 Maarten L. Buis, University of Konstanz, Department of History and Sociology maarten.buis@uni.konstanz.de All propositions

More information

Logistic Regression with Missing Data: A Comparison of Handling Methods, and Effects of Percent Missing Values

Logistic Regression with Missing Data: A Comparison of Handling Methods, and Effects of Percent Missing Values Logistic Regression with Missing Data: A Comparison of Handling Methods, and Effects of Percent Missing Values Sutthipong Meeyai School of Transportation Engineering, Suranaree University of Technology,

More information

Chapter 5: Field experimental designs in agriculture

Chapter 5: Field experimental designs in agriculture Chapter 5: Field experimental designs in agriculture Jose Crossa Biometrics and Statistics Unit Crop Research Informatics Lab (CRIL) CIMMYT. Int. Apdo. Postal 6-641, 06600 Mexico, DF, Mexico Introduction

More information

Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation study

Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation study STATISTICAL METHODS Epidemiology Biostatistics and Public Health - 2016, Volume 13, Number 1 Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

The use of quasi-experiments in the social sciences: a content analysis

The use of quasi-experiments in the social sciences: a content analysis Qual Quant (2011) 45:21 42 DOI 10.1007/s11135-009-9281-4 The use of quasi-experiments in the social sciences: a content analysis Marie-Claire E. Aussems Anne Boomsma Tom A. B. Snijders Published online:

More information

Working Paper: Designs of Empirical Evaluations of Non-Experimental Methods in Field Settings. Vivian C. Wong 1 & Peter M.

Working Paper: Designs of Empirical Evaluations of Non-Experimental Methods in Field Settings. Vivian C. Wong 1 & Peter M. EdPolicyWorks Working Paper: Designs of Empirical Evaluations of Non-Experimental Methods in Field Settings Vivian C. Wong 1 & Peter M. Steiner 2 Over the last three decades, a research design has emerged

More information

A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY

A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY Lingqi Tang 1, Thomas R. Belin 2, and Juwon Song 2 1 Center for Health Services Research,

More information

George B. Ploubidis. The role of sensitivity analysis in the estimation of causal pathways from observational data. Improving health worldwide

George B. Ploubidis. The role of sensitivity analysis in the estimation of causal pathways from observational data. Improving health worldwide George B. Ploubidis The role of sensitivity analysis in the estimation of causal pathways from observational data Improving health worldwide www.lshtm.ac.uk Outline Sensitivity analysis Causal Mediation

More information

Chapter 9 Experimental Research (Reminder: Don t forget to utilize the concept maps and study questions as you study this and the other chapters.

Chapter 9 Experimental Research (Reminder: Don t forget to utilize the concept maps and study questions as you study this and the other chapters. Chapter 9 Experimental Research (Reminder: Don t forget to utilize the concept maps and study questions as you study this and the other chapters.) In this chapter we talk about what experiments are, we

More information

Some General Guidelines for Choosing Missing Data Handling Methods in Educational Research

Some General Guidelines for Choosing Missing Data Handling Methods in Educational Research Journal of Modern Applied Statistical Methods Volume 13 Issue 2 Article 3 11-2014 Some General Guidelines for Choosing Missing Data Handling Methods in Educational Research Jehanzeb R. Cheema University

More information

Formative and Impact Evaluation. Formative Evaluation. Impact Evaluation

Formative and Impact Evaluation. Formative Evaluation. Impact Evaluation Formative and Impact Evaluation Formative Evaluation 2 An evaluation designed to produce qualitative and quantitative data and insight during the early developmental phase of an intervention, including

More information

Quantitative Methods. Lonnie Berger. Research Training Policy Practice

Quantitative Methods. Lonnie Berger. Research Training Policy Practice Quantitative Methods Lonnie Berger Research Training Policy Practice Defining Quantitative and Qualitative Research Quantitative methods: systematic empirical investigation of observable phenomena via

More information

Analysis of Environmental Data Conceptual Foundations: En viro n m e n tal Data

Analysis of Environmental Data Conceptual Foundations: En viro n m e n tal Data Analysis of Environmental Data Conceptual Foundations: En viro n m e n tal Data 1. Purpose of data collection...................................................... 2 2. Samples and populations.......................................................

More information

Analysis of TB prevalence surveys

Analysis of TB prevalence surveys Workshop and training course on TB prevalence surveys with a focus on field operations Analysis of TB prevalence surveys Day 8 Thursday, 4 August 2011 Phnom Penh Babis Sismanidis with acknowledgements

More information

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES Correlational Research Correlational Designs Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are

More information

Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha

Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha attrition: When data are missing because we are unable to measure the outcomes of some of the

More information

A Potential Outcomes View of Value-Added Assessment in Education

A Potential Outcomes View of Value-Added Assessment in Education A Potential Outcomes View of Value-Added Assessment in Education Donald B. Rubin, Elizabeth A. Stuart, and Elaine L. Zanutto Invited discussion to appear in Journal of Educational and Behavioral Statistics

More information

Introduction to Meta-Analysis

Introduction to Meta-Analysis Introduction to Meta-Analysis Nazım Ço galtay and Engin Karada g Abstract As a means to synthesize the results of multiple studies, the chronological development of the meta-analysis method was in parallel

More information

Essential Skills for Evidence-based Practice Understanding and Using Systematic Reviews

Essential Skills for Evidence-based Practice Understanding and Using Systematic Reviews J Nurs Sci Vol.28 No.4 Oct - Dec 2010 Essential Skills for Evidence-based Practice Understanding and Using Systematic Reviews Jeanne Grace Corresponding author: J Grace E-mail: Jeanne_Grace@urmc.rochester.edu

More information

Title:Bounding the Per-Protocol Effect in Randomized Trials: An Application to Colorectal Cancer Screening

Title:Bounding the Per-Protocol Effect in Randomized Trials: An Application to Colorectal Cancer Screening Author's response to reviews Title:Bounding the Per-Protocol Effect in Randomized Trials: An Application to Colorectal Cancer Screening Authors: Sonja A Swanson (sswanson@hsph.harvard.edu) Oyvind Holme

More information

The Logic of Causal Inference

The Logic of Causal Inference The Logic of Causal Inference Judea Pearl University of California, Los Angeles Computer Science Department Los Angeles, CA, 90095-1596, USA judea@cs.ucla.edu September 13, 2010 1 Introduction The role

More information

Lecture Notes Module 2

Lecture Notes Module 2 Lecture Notes Module 2 Two-group Experimental Designs The goal of most research is to assess a possible causal relation between the response variable and another variable called the independent variable.

More information

Empirical Knowledge: based on observations. Answer questions why, whom, how, and when.

Empirical Knowledge: based on observations. Answer questions why, whom, how, and when. INTRO TO RESEARCH METHODS: Empirical Knowledge: based on observations. Answer questions why, whom, how, and when. Experimental research: treatments are given for the purpose of research. Experimental group

More information

Overview of Non-Parametric Statistics

Overview of Non-Parametric Statistics Overview of Non-Parametric Statistics LISA Short Course Series Mark Seiss, Dept. of Statistics April 7, 2009 Presentation Outline 1. Homework 2. Review of Parametric Statistics 3. Overview Non-Parametric

More information

PharmaSUG Paper HA-04 Two Roads Diverged in a Narrow Dataset...When Coarsened Exact Matching is More Appropriate than Propensity Score Matching

PharmaSUG Paper HA-04 Two Roads Diverged in a Narrow Dataset...When Coarsened Exact Matching is More Appropriate than Propensity Score Matching PharmaSUG 207 - Paper HA-04 Two Roads Diverged in a Narrow Dataset...When Coarsened Exact Matching is More Appropriate than Propensity Score Matching Aran Canes, Cigna Corporation ABSTRACT Coarsened Exact

More information

CHAMP: CHecklist for the Appraisal of Moderators and Predictors

CHAMP: CHecklist for the Appraisal of Moderators and Predictors CHAMP - Page 1 of 13 CHAMP: CHecklist for the Appraisal of Moderators and Predictors About the checklist In this document, a CHecklist for the Appraisal of Moderators and Predictors (CHAMP) is presented.

More information

Proof. Revised. Chapter 12 General and Specific Factors in Selection Modeling Introduction. Bengt Muthén

Proof. Revised. Chapter 12 General and Specific Factors in Selection Modeling Introduction. Bengt Muthén Chapter 12 General and Specific Factors in Selection Modeling Bengt Muthén Abstract This chapter shows how analysis of data on selective subgroups can be used to draw inference to the full, unselected

More information

Data Analysis in Practice-Based Research. Stephen Zyzanski, PhD Department of Family Medicine Case Western Reserve University School of Medicine

Data Analysis in Practice-Based Research. Stephen Zyzanski, PhD Department of Family Medicine Case Western Reserve University School of Medicine Data Analysis in Practice-Based Research Stephen Zyzanski, PhD Department of Family Medicine Case Western Reserve University School of Medicine Multilevel Data Statistical analyses that fail to recognize

More information

The Regression-Discontinuity Design

The Regression-Discontinuity Design Page 1 of 10 Home» Design» Quasi-Experimental Design» The Regression-Discontinuity Design The regression-discontinuity design. What a terrible name! In everyday language both parts of the term have connotations

More information

Validity and reliability of measurements

Validity and reliability of measurements Validity and reliability of measurements 2 3 Request: Intention to treat Intention to treat and per protocol dealing with cross-overs (ref Hulley 2013) For example: Patients who did not take/get the medication

More information

Supplement 2. Use of Directed Acyclic Graphs (DAGs)

Supplement 2. Use of Directed Acyclic Graphs (DAGs) Supplement 2. Use of Directed Acyclic Graphs (DAGs) Abstract This supplement describes how counterfactual theory is used to define causal effects and the conditions in which observed data can be used to

More information

2. How do different moderators (in particular, modality and orientation) affect the results of psychosocial treatment?

2. How do different moderators (in particular, modality and orientation) affect the results of psychosocial treatment? Role of psychosocial treatments in management of schizophrenia: a meta-analytic review of controlled outcome studies Mojtabai R, Nicholson R A, Carpenter B N Authors' objectives To investigate the role

More information

Institute for Policy Research, Northwestern University, b Friedrich-Schiller-Universität, Jena, Germany. Online publication date: 11 December 2009

Institute for Policy Research, Northwestern University, b Friedrich-Schiller-Universität, Jena, Germany. Online publication date: 11 December 2009 This article was downloaded by: [Northwestern University] On: 3 January 2010 Access details: Access Details: [subscription number 906871786] Publisher Psychology Press Informa Ltd Registered in England

More information

Cross-Lagged Panel Analysis

Cross-Lagged Panel Analysis Cross-Lagged Panel Analysis Michael W. Kearney Cross-lagged panel analysis is an analytical strategy used to describe reciprocal relationships, or directional influences, between variables over time. Cross-lagged

More information

1. Introduction Consider a government contemplating the implementation of a training (or other social assistance) program. The decision to implement t

1. Introduction Consider a government contemplating the implementation of a training (or other social assistance) program. The decision to implement t 1. Introduction Consider a government contemplating the implementation of a training (or other social assistance) program. The decision to implement the program depends on the assessment of its likely

More information

Experimental Design. Dewayne E Perry ENS C Empirical Studies in Software Engineering Lecture 8

Experimental Design. Dewayne E Perry ENS C Empirical Studies in Software Engineering Lecture 8 Experimental Design Dewayne E Perry ENS 623 Perry@ece.utexas.edu 1 Problems in Experimental Design 2 True Experimental Design Goal: uncover causal mechanisms Primary characteristic: random assignment to

More information

Accuracy of Range Restriction Correction with Multiple Imputation in Small and Moderate Samples: A Simulation Study

Accuracy of Range Restriction Correction with Multiple Imputation in Small and Moderate Samples: A Simulation Study A peer-reviewed electronic journal. Copyright is retained by the first or sole author, who grants right of first publication to Practical Assessment, Research & Evaluation. Permission is granted to distribute

More information

Evidence- and Value-based Solutions for Health Care Clinical Improvement Consults, Content Development, Training & Seminars, Tools

Evidence- and Value-based Solutions for Health Care Clinical Improvement Consults, Content Development, Training & Seminars, Tools Definition Key Points Key Problems Bias Choice Lack of Control Chance Observational Study Defined Epidemiological study in which observations are made, but investigators do not control the exposure or

More information

A Comparison of Robust and Nonparametric Estimators Under the Simple Linear Regression Model

A Comparison of Robust and Nonparametric Estimators Under the Simple Linear Regression Model Nevitt & Tam A Comparison of Robust and Nonparametric Estimators Under the Simple Linear Regression Model Jonathan Nevitt, University of Maryland, College Park Hak P. Tam, National Taiwan Normal University

More information

Cochrane Pregnancy and Childbirth Group Methodological Guidelines

Cochrane Pregnancy and Childbirth Group Methodological Guidelines Cochrane Pregnancy and Childbirth Group Methodological Guidelines [Prepared by Simon Gates: July 2009, updated July 2012] These guidelines are intended to aid quality and consistency across the reviews

More information

The ROBINS-I tool is reproduced from riskofbias.info with the permission of the authors. The tool should not be modified for use.

The ROBINS-I tool is reproduced from riskofbias.info with the permission of the authors. The tool should not be modified for use. Table A. The Risk Of Bias In Non-romized Studies of Interventions (ROBINS-I) I) assessment tool The ROBINS-I tool is reproduced from riskofbias.info with the permission of the auths. The tool should not

More information

CHAPTER LEARNING OUTCOMES

CHAPTER LEARNING OUTCOMES EXPERIIMENTAL METHODOLOGY CHAPTER LEARNING OUTCOMES When you have completed reading this article you will be able to: Define what is an experiment Explain the role of theory in educational research Justify

More information