PharmaSUG Paper HA-04 Two Roads Diverged in a Narrow Dataset...When Coarsened Exact Matching is More Appropriate than Propensity Score Matching

Similar documents
Methodology for Non-Randomized Clinical Trials: Propensity Score Analysis Dan Conroy, Ph.D., inventiv Health, Burlington, MA

Research Design. Beyond Randomized Control Trials. Jody Worley, Ph.D. College of Arts & Sciences Human Relations

Randomized Controlled Trials Shortcomings & Alternatives icbt Leiden Twitter: CAAL 1

Propensity Score Analysis Shenyang Guo, Ph.D.

Recent advances in non-experimental comparison group designs

Propensity Score Matching with Limited Overlap. Abstract

The Logic of Causal Inference

Identifying Peer Influence Effects in Observational Social Network Data: An Evaluation of Propensity Score Methods

Quantitative Methods. Lonnie Berger. Research Training Policy Practice

Propensity Score Methods for Causal Inference with the PSMATCH Procedure

MS&E 226: Small Data

Causal Methods for Observational Data Amanda Stevenson, University of Texas at Austin Population Research Center, Austin, TX

TRIPLL Webinar: Propensity score methods in chronic pain research

Practical propensity score matching: a reply to Smith and Todd

Overview of Perspectives on Causal Inference: Campbell and Rubin. Stephen G. West Arizona State University Freie Universität Berlin, Germany

Introduction to Observational Studies. Jane Pinelis

Sensitivity Analysis in Observational Research: Introducing the E-value

Jake Bowers Wednesdays, 2-4pm 6648 Haven Hall ( ) CPS Phone is

Instrumental Variables Estimation: An Introduction

Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy Research

Causal Validity Considerations for Including High Quality Non-Experimental Evidence in Systematic Reviews

You must answer question 1.

For general queries, contact

DRAFT (Final) Concept Paper On choosing appropriate estimands and defining sensitivity analyses in confirmatory clinical trials

Pearce, N (2016) Analysis of matched case-control studies. BMJ (Clinical research ed), 352. i969. ISSN DOI: /bmj.

Recent developments for combining evidence within evidence streams: bias-adjusted meta-analysis

Exploring the Impact of Missing Data in Multiple Regression

Propensity Score Methods with Multilevel Data. March 19, 2014

1.4 - Linear Regression and MS Excel

Propensity Score Methods to Adjust for Bias in Observational Data SAS HEALTH USERS GROUP APRIL 6, 2018

Analysis of a Medical Center's Cardiac Risk Screening Protocol Using Propensity Score Matching

The Logic of Data Analysis Using Statistical Techniques M. E. Swisher, 2016

Researchers often use observational data to estimate treatment effects

Alan S. Gerber Donald P. Green Yale University. January 4, 2003

Effects of propensity score overlap on the estimates of treatment effects. Yating Zheng & Laura Stapleton

%CEM: A SAS MACRO TO PERFORM COARSENED EXACT MATCHING

Chapter 17 Sensitivity Analysis and Model Validation

Pros. University of Chicago and NORC at the University of Chicago, USA, and IZA, Germany

Political Science 15, Winter 2014 Final Review

Challenges of Observational and Retrospective Studies

Propensity scores: what, why and why not?

extraction can take place. Another problem is that the treatment for chronic diseases is sequential based upon the progression of the disease.

P E R S P E C T I V E S

Asian Journal of Research in Biological and Pharmaceutical Sciences Journal home page:

THE USE OF NONPARAMETRIC PROPENSITY SCORE ESTIMATION WITH DATA OBTAINED USING A COMPLEX SAMPLING DESIGN

Chapter 11. Experimental Design: One-Way Independent Samples Design

Bayesian and Frequentist Approaches

Biostatistics II

Matched Cohort designs.

Book review of Herbert I. Weisberg: Bias and Causation, Models and Judgment for Valid Comparisons Reviewed by Judea Pearl

Carrying out an Empirical Project

Regression and causal analysis. Harry Ganzeboom Research Skills, December Lecture #5

The University of North Carolina at Chapel Hill School of Social Work

Propensity scores and causal inference using machine learning methods

Agents with Attitude: Exploring Coombs Unfolding Technique with Agent-Based Models

T. Kushnir & A. Gopnik (2005 ). Young children infer causal strength from probabilities and interventions. Psychological Science 16 (9):

Causal Inference Course Syllabus

Instrumental Variables I (cont.)

OLS Regression with Clustered Data

Regression Discontinuity Designs: An Approach to Causal Inference Using Observational Data

Statistics Mathematics 243

Essential Skills for Evidence-based Practice Understanding and Using Systematic Reviews

Lecture II: Difference in Difference. Causality is difficult to Show from cross

PubH 7405: REGRESSION ANALYSIS. Propensity Score

RESEARCH METHODS. Winfred, research methods, ; rv ; rv

Treatment Adaptive Biased Coin Randomization: Generating Randomization Sequences in SAS

Regression Discontinuity Design

Improving Asthma Care: An Update for Managed Care

I. Identifying the question Define Research Hypothesis and Questions

Introducing Meta-Analysis. and Meta-Stat. This chapter introduces meta-analysis and the Meta-Stat program.

Promotional content for the

Key questions when starting an econometric project (Angrist & Pischke, 2009):

Introduction to Program Evaluation

Further Properties of the Priority Rule

Detecting Anomalous Patterns of Care Using Health Insurance Claims

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.

A Practical Guide to Getting Started with Propensity Scores

Manitoba Centre for Health Policy. Inverse Propensity Score Weights or IPTWs

2.2 Psychologists Use Descriptive, Correlational, and Experimental Research Designs to Understand Behavior LEARNING OBJECTIVES

ICPSR Causal Inference in the Social Sciences. Course Syllabus

Complier Average Causal Effect (CACE)

Chapter 21 Multilevel Propensity Score Methods for Estimating Causal Effects: A Latent Class Modeling Strategy

Regression Discontinuity Analysis

Estimating average treatment effects from observational data using teffects

UNIT 5 - Association Causation, Effect Modification and Validity

Impact and adjustment of selection bias. in the assessment of measurement equivalence

Underlying Theory & Basic Issues

Proposal to reduce continuing professional development (CPD) audit sample size from 5% to 2.5% from June 2009 and CPD update

Bayesian methods for combining multiple Individual and Aggregate data Sources in observational studies

Psychology 205, Revelle, Fall 2014 Research Methods in Psychology Mid-Term. Name:

RESEARCH METHODS. Winfred, research methods,

Introducing a SAS macro for doubly robust estimation

Choose an approach for your research problem

BIOSTATISTICAL METHODS

A Comparison of Linear Mixed Models to Generalized Linear Mixed Models: A Look at the Benefits of Physical Rehabilitation in Cardiopulmonary Patients

Cambridge International Examinations Cambridge International Advanced Subsidiary and Advanced Level. Published

Pilot & Feasibility Randomized Controlled Trials

Confounding by indication developments in matching, and instrumental variable methods. Richard Grieve London School of Hygiene and Tropical Medicine

Applied Econometrics for Development: Experiments II

Advanced Handling of Missing Data

Transcription:

PharmaSUG 207 - Paper HA-04 Two Roads Diverged in a Narrow Dataset...When Coarsened Exact Matching is More Appropriate than Propensity Score Matching Aran Canes, Cigna Corporation ABSTRACT Coarsened Exact Matching (CEM) is a relatively new causal inference technique that allows the researcher to non-parametrically create a matched dataset to evaluate the effect of a treatment. Propensity Score Matching (PSM) is the older, more established technique in the literature. Both methods have been turned into SAS macros which are used by many SAS data scientists. In one particular instance, I was tasked to deal with a dataset generated from pharmacy data where it seemed that, because PSM would not work due to the ratio of cases to controls, the only viable alternative would be the use of regression. However, this alternative was unappealing as it imposes a linear model on the data. Remarkably, I found an N:N CEM was able to match 00% of the case population even though the ratio of cases to controls was close to :. In this paper, I show why PSM was not feasible in this particular instance and why CEM provided a nonparametric alternative to regression. The paper contributes to the growing literature on the subject of non-parametric observational studies, both inside and outside the SAS community, by providing a real-life example of a seemingly intractable problem from the perspective of PSM which could in fact be solved by the use of CEM. Readers of this paper will be introduced to the techniques, and instead of general advice about their applicability, provided a real world example where one method (once implemented in SAS) was clearly preferable to the other. INTRODUCTION Even though causal inference is a relatively new branch of statistics, with Donald Rubin and Paul Rosenbaum s landmark paper The Central Role of the Propensity Score in Observational Studies for Causal Effects 2 only published in 983, there is already a well-developed literature not only on propensity score analysis but other causal inference techniques as well. 3 There is even a wide variety of textbooks available for use in universities. 4 Much of this literature examines only the theoretical properties of different causal inference techniques, but there is also a wealth of papers comparing these approaches from a practical perspective, to which I have contributed previously. While this paper does not try to resolve the question of which causal inference technique should be used in all circumstances, I do point out a real world case in which a particular method, Coarsened Exact Matching (CEM), was applicable where Propensity Score Matching (PSM) was infeasible. This is somewhat of a departure from the overall trend in the literature, where the dimensionality problem 5 is often cited to show why Propensity Score Analysis is the only practical method of choice because exact matching is impossible. 6 The key factor which makes CEM the preferred method is that it allows for N:N matching, which is impossible using a propensity score approach. I believe that, once familiar with CEM, many researchers will find similar instances where they can draw a causal inference which is impossible to find using PSM. In this paper, I will briefly summarize the problem of causal inference, PSM and some reasons why such techniques have become recommended instead of simply performing regressions. I will then explain the mechanics of CEM to those who do not know of this causal inference technique. The rest of the paper will then be devoted to a real-world example that illustrates when CEM is a practical method while PSM is not feasible. I ll then conclude with a summary of the results and recommendations for further reading. THE BASIC PROBLEM OF CAUSAL INFERENCE When a new drug is developed or a particular promotional campaign is run, the researcher often wants to know what the effect of this treatment was. The basic problem of causal inference is that, while one can

evaluate the outcomes of those who received the treatment, one cannot simultaneously observe these same individuals not receiving the treatment and so cannot directly assess the treatment effect. The most statistically rigorous response is to run a randomized experiment, ideally with blocking 7, in which the case and control populations have similar characteristics on all key confounding variables (such as gender, age, or other confounders particular to a certain treatment) because those who received the treatment were selected at random. Thus, if the populations are similar, or the same, on all variables which could have an effect on the outcome the researcher can find the Average Treatment Effect (ATE) simply by evaluating: ATE=E(Y())-E(Y(0)) where stands for those individuals who received the treatment and 0 stands for the control. PROPENSITY SCORE MATCHING The key insight of Rubin and Rubinstein that sparked the modern causal inference revolution is that one can simulate a randomized control trial on retrospective data by knowledge of the mechanism by which participants in the experiment chose to be either case or control. Thus, if outcomes for users of drug A are being compared with users of drug B, one may want to know what comorbidities each patient has, their age, gender, amount of prior year cost, etc. in order to determine who was prescribed drug A versus who was prescribed drug B. Once all the variables are known and measured that contributed to the assignment mechanism, a logistic regression can be run to give the probability that an individual used either drug A or drug B. This predicted probability can then represent the propensity score and patients can be matched based on similar propensity scores to create a simulation of a randomized controlled experiment. One can then find the ATE as: ATE=E(Y() X)- E(Y(0) X) where X are the confounding variables one is matching on. WHY REGRESSION IS NOT A GOOD CAUSAL INFERENCE TECHNIQUE Before learning new and somewhat complicated statistical techniques, or, so to speak, joining the causal inference revolution, one may ask why wouldn t one simply evaluate the outcome by performing a regression with the treatment as a dummy variable and controlling for the confounders by making them independent variables? There are many reasons why this is not a good choice. The most important are: Performing a regression implies unrealistic assumptions about knowledge of the data generation process. 8 How does one know that the outcome is related to the treatment and independent variables linearly (as all linear regression assumes)? Even if the variables are related linearly how is one to know what particular functional form this relationship takes? It is unlikely that when one retrospectively looks at medical claims data, for example, one can be confident that one knows the true process by which the data were generated. 9 There may be individuals in either the case or control cohort who simply had no comparable individual in the contrasting population. Extrapolating results to these individuals is dubious at best. Matching can help alleviate these problems by not making parametric assumptions, excluding participants for whom there is no appropriate match and not making the data fit a particular functional form. PROPENSITY SCORE MATCHING--CONTINUED Although preferable to regression analysis, Propensity Score Matching has itself sparked a considerable literature critical of the plausibility of its assumptions. 0 In particular, what is known as the ignorability assumption has been questioned. This assumption states, loosely, that, conditional on all the variables which determined assignment to case or control the outcome is independent of any other variable. In many instances, such as a health care company trying to figure out why patient X used drug A instead of drug B, this seems like an unrealistic assumption. The distance between what can be known based on 2

medical claims data and the actual patient/doctor decision is just too great to imagine that one has accurately modeled this decision with the variables at hand. Another criticism, the one I would like to examine in detail in this study, is the fact that, since each observation gets its own particular propensity, given the continuous and categorical variables used in the logistic regression, there is no theoretically derived tolerance to use in matching individuals to one another. Even more, if one wants to use all the information possible in an N:N match, it is computationally impossible to calculate the difference in propensity score between all permutations once the number of observations is sufficiently large (>,000). COARSENED EXACT MATCHING Coarsened Exact Matching resolves this issue. In CEM, instead of matching based on a composite propensity to be in either treatment or control in order to create a quasi-randomized sample, one simply coarsens each variable to a reasonable degree of clustering for each variable and then performs an exact match on these coarsened variables. For example, if one has years of education it is probably going to be difficult to match all individuals precisely if one is also matching on gender, age, income, etc. Instead, one can coarsen this variable to less than high school, high school graduate, some college, college graduate, and post-graduate studies and have enough similarity between individuals for the purposes of a particular study. By not computing a composite score for each observation, and then having to select a certain tolerance, one can easily compute how many cases and controls fell into each permutation and then weight the controls based on how many cases they matched to. For example, suppose one had a dataset with only nine observations in which one case matched to two controls and four different cases matched to two other controls. One would assign of a weight of 4/2*4/5=.6 to the 2 controls which matched to four cases and /2*4/5=0.4 to the 2 controls which matched to one case. Each case would receive a weight of one. In this way, all the useful information in the case and control datasets can be included in the analysis, a significant advantage over PSM. When one encounters larger, and thus more realistic, numbers of cases and controls the same method of weighting every control by the number of cases it matched to and giving each case a weight of one is used. Moreover, when the ratio of cases to controls falls somewhere below :3, propensity score matching, even with replacement, can become problematic as one cannot retain enough individuals in the postmatched sample to evaluate the ATE properly. And, as previously stated, N:N matching is often not computationally possible. CASE STUDY OF N:N COARSENED EXACT MATCHING I was given the task at Cigna of evaluating a very large treatment where there were only seven confounders that needed to be controlled for. What made this analysis difficult was the ratio of cases to controls: 525,788 treated to 483,339 controls. Furthermore, the customers had statistically and substantively different averages across these confounders making the p-value of all significance tests far below an alpha level of 0.05 (see table ). A first reaction may be to respond by stating that the analysis cannot be done since there are more treated than controlled. Therefore, even if the populations were relatively balanced across the confounders (which they are not) it would not be feasible to perform a : or N: match simply because there are not enough controls to make a valid comparison. Fortunately, CEM allows for N:N comparisons. Prior to matching, I determined appropriate coarsening on all confounders. Zip codes were divided into five regions of the United States where one could expect to find relatively similar costs. Age was bucketed into six categories which correspond to different stages of development where similar health care costs are experienced. Customer Cost Share was divided into three buckets of low, medium and high. An overall measure of the overall health of customer, called ERG, was divided into 29 different categories after trial and error determined that this led to insignificant differences between case and controls. I then used a Coarsened Exact Matching Macro developed by Professor Gary King and several colleagues. 3

Pre-Matching Statistics on Confounders Treatment Control P-Value N 525,788 483,339 N/A Male 5.64% 52.08% <0.000 Midwest 3.42% 3.75% South 5.25% 54.65% Northeast 9.69% 9.79% <0.000 West 5.53%.8% Other 0.%.0% 7 or Younger 2.78% 2.34% 8 to 24 2.2% 2.9% 25 to 34 5.22% 4.80% <0.000 35 to 44 5.33% 3.89% 45 to 54 32.25% 3.6% 55 to 65 42.2% 45.6% 0%-9% Cost Share 44.77% 42.58% 9%-58% Cost Share 39.97% 4.07% <0.000 >=58% Cost Share 6.35% 5.26% HRA/HSA 29.99% 35.7% <0.000 PHS Plus 70.63% 79.67% <0.000 ERG 2.38 2.47 <0.000 Specialty Med Utilizer 3.58% 3.45% 0.0003 Post-Matching Statistics on Confounders Treatment Control P-Value N 525,788 483,339 N/A Male 5.64% 5.64% Midwest 3.42% 3.42% South 5.25% 5.25% Northeast 9.69% 9.69% West 5.53% 5.53% Other 0.%.% 4

7 or Younger 2.78% 2.78% 8 to 24 2.2% 2.2% 25 to 34 5.22% 5.22% 35 to 44 5.33% 5.33% 45 to 54 32.25% 32.25% 55 to 65 42.2% 42.2% 0%-9% Cost Share 44.77% 44.77% 9%-58% Cost Share 39.97% 39.97% >=58% Cost Share 6.35% 6.35% HRA/HSA 29.99% 29.99% PHS Plus 70.63% 70.63% ERG 2.38 2.38 0.6530 Specialty Med Utilizer 3.58% 3.58% Table. Pre and Post Matching Differences in Treatment and Controls Remarkably, given the differences in the pre-matching population, 00% of both the cases and controls could be matched via the N:N CEM macro. And, while statistical significance tests are not necessarily the best guide to whether confounders have been appropriately controlled for2, all post-matching differences between the confounding variables were statistically insignificant. In other words, without losing a single customer, I had transformed over,000,000 customers, simply by using appropriate weights on each control, into a matched dataset on which I could evaluate the average treatment effect. Without exaggeration, I would describe the transformation of this dataset as near magical. Not only is there no statistically significant difference, there is actually no difference at all on all the confounders except ERG which, because it is continuous, had to allow for some differences within each of the 29 bins. This is accomplished by using 00% of the information in the pre-matched data was used in the match, a result which is almost impossible to achieve in PSM. While I cannot guarantee similar results for all uses of CEM in retrospective analysis, the overwhelming success of the method in this instance should be strong support for its use by others encountering similar datasets where one needs to evaluate the treatment effect with a ratio of case to controls that approximates :. CONCLUSION There have been a lot of previous work comparing both the theoretical and experimental properties of different kinds of causal inference. What I hope I have achieved in this modest contribution, is to point out that N:N Coarsened Exact Matching often makes possible causal inferences which cannot be made with methods, such as PSM, which don t easily allow for N:N matching. REFERENCES 5

Cigna as used herein refers to operating subsidiaries of Cigna Corporation, including Cigna Health and Life Insurance Company and Cigna Health Management, Inc. All Cigna products and services are provided exclusively by such operating subsidiaries 2 Donald B. Rubin and Paul R. Rosenbaum, The Central Role of the Propensity Score in Observational Studies for Causal Effects, Biometrika Vol 70, 983 4-55. https://academic.oup.com/biomet/article/70//4/240879/the-central-role-of-the-propensity-score-in 3 See, for example, Judea Pearl, Madelyn Glymour and Nicholas P. Jewell, 206. Causal Inference in Statistics: A Primer. New York, NY. John Wiley and Sons 4 Examples are: Guido W. Imbens and Donald W. Rubin, Causal Inference for Statistics, Social and Biomedical Science. An Introduction, 205. New York, NY. Cambridge University Press. Shenyang Gao and Mark W. Fraser, 205. Propensity Score Analysis, Statistical Methods and Applications. Thousand Oaks, CA. Sage Publications. 5 The dimensionality problem is that, as the number of confounders increases, it becomes increasingly difficult to find exact matches between case and controls. 6 On this see Gary King and Richard Nielsen, Working Paper, Why Propensity Scores Should Not Be Used for Matching. http://j.mp/sexgvw 7 Idem 8 See Guido W. Imbens and Donald B. Rubin,opus cited. 9 Gary King and Richard Nielsen, opus cited. 0 See Guido W. Imbens and Donald B. Rubin, opus cited pp 257-280. The Macro can be found at: http://gking.harvard.edu/cem 2 See Guido W. Imbens and Donald B. Rubin, opus cited pp.349-358. 3 Idem. ACKNOWLEDGMENTS The author would like to thank Mr. Anthony Sumner and Mr. Jigar Shah for providing me the opportunity to conduct this analysis and for helpful suggestions throughout. RECOMMENDED READING Guido W. Imbens and Donald W. Rubin, Causal Inference for Statistics, Social and Biomedical Science. An Introduction Shenyang Gao and Mark W. Fraser, Propensity Score Analysis, Statistical Methods and Applications. Papers on Coarsened Exact Matching may be found at http://gking.harvard.edu/cem CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Name: Aran Canes Enterprise: Cigna Health and Life Insurance Company Address: 70 Corporate Center Drive City, State ZIP: Raleigh, NC 27607 Work Phone: 99-854-726 E-mail:aran.canes@cigna.com 6

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. 7