Effects of propensity score overlap on the estimates of treatment effects. Yating Zheng & Laura Stapleton

Introduction

Recent years have seen remarkable development in methods for estimating average treatment effects in non-experimental designs. Researchers have developed various approaches, including matching (Rosenbaum, 1989), regression (Hahn, 1998; Heckman et al., 1998), and propensity score methods (Rosenbaum & Rubin, 1983; Hirano et al., 2003). Among these, propensity score methods are popular because 1) compared with matching, they can easily construct matched sets with similar distributions of multiple covariates, which facilitates unbiased estimation of treatment effects, and 2) they avoid the assumption violations that can arise in regression (e.g., the functional form may not correctly specify the relation between the covariates and the outcome for data not observed).

Propensity score methods are not without limitations, however. A potential concern is that they require sufficient overlap of the propensity score distributions between the treatment and control groups (Crump et al., 2009), which may not hold in practice. Previous studies have seldom explored what constitutes sufficient overlap and how the degree of overlap influences the estimates of treatment effects. This study uses a simulation to explore the effects of propensity score overlap on the point estimates of treatment effects and on their sampling variance.

Theoretical Framework

The propensity score is defined as the probability of receiving the treatment given the observed covariates (Rosenbaum & Rubin, 1983).
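Written as a formula, this definition reads as follows. The symbols $e(X)$, $T$ (treatment indicator, 1 = treated) and $X$ (observed covariates) are notation assumed here for illustration. The second line states the usual overlap requirement in the same notation.

```latex
\[
  e(X) = \Pr(T = 1 \mid X), \qquad 0 < e(X) < 1 .
\]
```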
In general, propensity score methods (matching, weighting and sub-classification) work through five steps: 1) identify baseline confounding covariates that could bias estimates of the treatment effect, 2) estimate propensity scores for treatment using logistic regression (or a nonparametric approach) on the baseline covariates, 3) condition on the propensity scores across the treatment and control groups through matching or reweighting of the data, 4) check the conditioning quality (e.g., balance check) of the matched samples, and 5) estimate the treatment effects (Stuart, 2010).

Reliable estimates of treatment effects require sufficient overlap of the propensity score distributions between the treatment and control groups. A lack of overlap can lead to imprecise estimates of the treatment effects (Crump et al., 2009), as insufficient overlap implies that the treatment and control groups are not balanced on the covariates. Step 3 aims to address this issue of unbalanced covariate distributions, but the current methods have limitations. A common approach is to discard individuals with propensity scores outside the range of the other group (Grzybowski et al., 2003; Vincent et al., 2002), which may change the population to which the results apply (Crump et al., 2009). Another approach is to change the weight, or contribution, of data from participants in the control group, increasing the weights of those with propensity scores similar to the participants in the treatment group and decreasing the weights of those with propensity scores different from those in the treatment group (Heckman et al., 1998; Dehejia & Wahba, 1999). A potential drawback of the weighting method is that the variance may be high if the weights are extreme (Stuart, 2010).

Few studies have explored the effects of propensity score overlap on the estimates of treatment effects, or how imprecise the estimates are at different levels of overlap. In this study, we use an index to quantify the overlap and explore the reliability and validity of the estimates at different overlap levels. We also explore the effects of insufficient overlap on the estimates obtained from different propensity score methods (weighting, matching and doubly robust methods) to provide guidance on which method performs better at which overlap level.

Methods

Research Design

The data generation process follows a common approach used in prior propensity score simulation studies (Kaplan & Chen, 2011; Craycrot, 2016):

1. Generate confounding covariates $X_1$, $X_2$ and $X_3$ from normal and binomial distributions.
2. Calculate the propensity score (ps) using Eq(1):

\[ ps = \frac{\exp(\beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3)}{1 + \exp(\beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3)} \qquad \text{Eq(1)} \]

3. Use a Bernoulli distribution with probability equal to the calculated propensity score to decide treatment assignment.
4. Calculate the outcome value using Eq(2):

\[ Y = \alpha_1 X_1 + \alpha_2 X_2 + \alpha_3 X_3 + \alpha_4 T \qquad \text{Eq(2)} \]

where $T$ is an indicator of treatment assignment (control = 0, treatment = 1) and $\alpha_4$ is the true treatment effect.

The values of all parameters are listed in Table 1. (In the full presentation, interactions between predictors, and interactions between the treatment assignment and the predictors, will be added.) Ten thousand replications are run. For each replication, the sample size is 1,000.

Table 1. Values of the parameters
$X_1 \sim \mathrm{Normal}(\mathrm{mean1}, 1)$, $\mathrm{mean1} \sim \mathrm{Normal}(0, 1)$
$X_2 \sim \mathrm{Normal}(\mathrm{mean2}, 1)$, $\mathrm{mean2} \sim \mathrm{Normal}(0.5, 1)$
$X_3 \sim \mathrm{Binomial}(1{,}000, 0.5)$
$\beta_1 = 0.3$, $\beta_2 = 0.4$, $\beta_3 = -1$
$\alpha_1 = 0.4$, $\alpha_2 = -0.3$, $\alpha_3 = 0.2$, $\alpha_4 = 0.15$
Note. Mean1 and mean2 are both vectors of size 10,000.
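The four data generation steps can be sketched in Python with NumPy. This is a minimal illustration and not the authors' code. The names `simulate_replication`, `BETA`, `ALPHA` and `TRUE_EFFECT` are hypothetical. It assumes that the binomial entry for the third covariate in Table 1 means one binary draw with probability 0.5 for each of the 1,000 units, and it follows Eq(2) as written, with no error term.

```python
import numpy as np

# Parameter values taken from Table 1.
BETA = np.array([0.3, 0.4, -1.0])    # coefficients of the propensity model, Eq(1)
ALPHA = np.array([0.4, -0.3, 0.2])   # covariate coefficients of the outcome model, Eq(2)
TRUE_EFFECT = 0.15                   # alpha_4, the true treatment effect


def simulate_replication(rng, n=1000):
    """Generate one replication of the simulated data (steps 1 to 4)."""
    # Step 1: covariates. The means of X1 and X2 are drawn once per replication.
    mean1 = rng.normal(0.0, 1.0)
    mean2 = rng.normal(0.5, 1.0)
    x1 = rng.normal(mean1, 1.0, size=n)
    x2 = rng.normal(mean2, 1.0, size=n)
    # ASSUMPTION: X3 is treated as a binary covariate with P(X3 = 1) = 0.5.
    x3 = rng.binomial(1, 0.5, size=n)
    X = np.column_stack([x1, x2, x3])

    # Step 2: true propensity score, Eq(1), written in the equivalent
    # form 1 / (1 + exp(-linear predictor)).
    ps = 1.0 / (1.0 + np.exp(-(X @ BETA)))

    # Step 3: Bernoulli treatment assignment with probability ps.
    t = rng.binomial(1, ps)

    # Step 4: outcome, Eq(2), with no error term, as in the text.
    y = X @ ALPHA + TRUE_EFFECT * t
    return X, t, y, ps


rng = np.random.default_rng(2024)
X, t, y, ps = simulate_replication(rng)
print(X.shape, t.mean(), ps.min(), ps.max())
```

Drawing `mean1` and `mean2` afresh in every call is what makes the degree of overlap vary from one replication to the next.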
Propensity score methods are then used to estimate treatment effects. First, a logistic regression model is fitted with the generated covariates as predictors and the treatment assignment as the outcome, and the fitted model is used to obtain estimated propensity scores. Next, the overlap rate of the propensity score distributions is calculated as the intersection area of the density plots of the two groups divided by the total area covered by the two density plots (with the intersection area counted only once). For example, in Figure 1, the overlap rate is the ratio of area 2 to the sum of areas 1, 2 and 3. As the overlap rate is empirically defined, we cannot control the number of replications for each overlap rate level. Finally, different propensity score methods are used to estimate the treatment effect. For propensity score weighting, the method of weighting by the odds (WBO) is used, with weights calculated as:

\[ w_i = T_i + (1 - T_i) \frac{e_i}{1 - e_i} \qquad \text{Eq(3)} \]

where $w_i$ is the weight for subject $i$, $T_i$ indicates whether subject $i$ received the treatment, and $e_i$ is the estimated propensity score for subject $i$.

Figure 1. Propensity score distributions of treatment and control groups (regions labeled 1, 2 and 3).

Analysis

The estimated treatment effect is calculated as the average group mean difference after matching/weighting/sub-classification. Relative bias (the proportional difference between the true and estimated treatment effect) and variance (the variance of the estimated treatment effects over replications at a specific overlap level) are used to measure the performance of the estimates.

Preliminary Results

The results from propensity score weighting show that, in general, as the overlap rate increases, the variance of the estimates decreases (see Figure 2), which is consistent with the findings of previous studies (Stuart, 2010); we quantify this decrease in variance for different overlap levels. Regarding bias, when the overlap rate is very small (<0.2), the relative mean bias is comparatively large (see Table 2): 9.9% for overlap rates in [0, 0.1) and 1.9% for overlap rates in [0.1, 0.2). Once the overlap rate exceeds 0.2, the relative mean bias is small (at most 0.5%). This implies that we need to be cautious about using propensity score weighting to estimate treatment effects when the propensity score overlap rate is smaller than 0.2. A possible reason for the comparatively higher bias at low overlap levels is that the WBO method does not exclude control individuals who are very different from those receiving treatment. Although their weights are decreased, the inclusion of a large number of individuals with very different propensities, which is the case at low overlap rates, may bias the estimates.

In the full presentation, other propensity score methods (e.g., matching, doubly robust methods) will be explored, as will the inclusion of interactions within the treatment effect generation model (Eq(2)). Results from different methods will be compared to provide guidelines about which method is recommended at which overlap rate.

Figure 2. Relationship between bias and estimated overlap rate.

Table 2. Average bias and variance of the estimated treatment effect

Overlap level   N      Relative mean bias   Variance*
[0, 0.1)        1700   9.9%                 9713.1
[0.1, 0.2)      2591   1.9%                 2146.8
[0.2, 0.3)      2440   0.3%                 565.7
[0.3, 0.4)      1539   0.2%                 172.6
[0.4, 0.5)      866    0.5%                 54.2
[0.5, 0.6)      498    0.1%                 17.0
[0.6, 0.7)      244    0.4%                 7.2
[0.7, 0.8)      91     0.2%                 2.9
[0.8, 0.9)      29     0.1%                 0.5
[0.9, 1]        2      0.3%                 0.0

Note. N is the number of replications with empirical overlap rates in the category listed. Extremely high overlap rates are difficult to obtain given the generation model in Eq(1), so the number of replications with high overlap levels is very small. Relative mean bias is the ratio of mean bias to the true treatment effect, which is 0.15 in this case.
* Variance has been rescaled by a factor of 100,000.
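The estimation steps described in the Methods and Analysis sections (logistic regression for the propensity scores, the overlap rate, weighting by the odds, and relative bias) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. All function names are hypothetical. The densities are approximated with histograms on a common grid, whereas the text only refers to "density plots", so the exact density estimator is an assumption. The logistic regression is fitted with a hand-written Newton-Raphson loop to keep the example dependent on NumPy only.

```python
import numpy as np


def fit_propensity(X, t, n_iter=25):
    """Estimate propensity scores with a logistic regression (Newton-Raphson)."""
    Z = np.column_stack([np.ones(len(t)), X])  # add an intercept column
    coef = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ coef)))
        grad = Z.T @ (t - p)
        # Tiny ridge term added for numerical stability only (an assumption).
        hess = (Z * (p * (1.0 - p))[:, None]).T @ Z + 1e-8 * np.eye(Z.shape[1])
        coef = coef + np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-(Z @ coef)))


def overlap_rate(e, t, bins=50):
    """Intersection area of the two score densities divided by their union area."""
    # ASSUMPTION: histogram density estimates on a shared grid over [0, 1].
    edges = np.linspace(0.0, 1.0, bins + 1)
    f1, _ = np.histogram(e[t == 1], bins=edges, density=True)
    f0, _ = np.histogram(e[t == 0], bins=edges, density=True)
    width = edges[1] - edges[0]
    intersection = np.minimum(f1, f0).sum() * width   # area 2 in Figure 1
    union = np.maximum(f1, f0).sum() * width          # areas 1 + 2 + 3
    return intersection / union


def wbo_weights(e, t):
    """Weighting by the odds, Eq(3): 1 for treated units, e / (1 - e) for controls."""
    return t + (1 - t) * e / (1.0 - e)


def wbo_effect(y, t, w):
    """Weighted difference in mean outcome between treated and control units."""
    treated_mean = np.average(y[t == 1], weights=w[t == 1])
    control_mean = np.average(y[t == 0], weights=w[t == 0])
    return treated_mean - control_mean


def relative_bias(estimates, true_effect):
    """Mean bias of the estimates divided by the true treatment effect."""
    return (np.mean(estimates) - true_effect) / true_effect
```

In a full simulation, each replication would call `fit_propensity`, `overlap_rate` and `wbo_effect` in turn, and the resulting estimates would be grouped by overlap level before computing `relative_bias` and the variance within each group.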

References

Crump, R. K., Hotz, V. J., Imbens, G. W., & Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187-199.
Dehejia, R. H., & Wahba, S. (1999). Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs. Journal of the American Statistical Association, 94(448), 1053-1062.
Grzybowski, M., Clements, E. A., Parsons, L., Welch, R., Tintinalli, A. T., & Ross, M. A. (2003). Mortality benefit of immediate revascularization of acute ST-segment elevation myocardial infarction in patients with contraindications to thrombolytic therapy: A propensity analysis. Journal of the American Medical Association, 290, 1891-1898.
Hahn, J. (1998). On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica, 66, 315-331.
Heckman, J., Ichimura, H., & Todd, P. (1998). Matching as an econometric evaluation estimator. The Review of Economic Studies, 65, 261-294.
Hirano, K., Imbens, G. W., & Ridder, G. (2003). Efficient estimation of average treatment effects using the estimated propensity score. Econometrica, 71(4), 1161-1189.
Kaplan, D., & Chen, C. J. (2011). Bayesian propensity score analysis: Simulation and case study. Presentation at the annual conference of the Society for Research on Educational Effectiveness, Washington, D.C.
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41-55.
Rosenbaum, P. R. (1989). Optimal matching in observational studies. Journal of the American Statistical Association, 84, 1024-1032.
Stuart, E. A. (2010). Matching methods for causal inference: A review and a look forward. Statistical Science, 25(1), 1-21.
Vincent, J. L., Baron, J., Reinhart, K., Gattinoni, L., Thijs, L., Webb, A., Meier-Hellmann, A., Nollet, G., & Peres-Bota, D. (2002). Anemia and blood transfusion in critically ill patients. Journal of the American Medical Association, 288, 1499-1507.