Using register data to estimate causal effects of interventions: An ex post synthetic control-group approach

Magnus Bygren and Ryszard Szulkin

The self-archived version of this journal article is available at Linköping University Institutional Repository (DiVA): http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-139601

N.B.: When citing this work, cite the original publication: Bygren, M. and Szulkin, R. (2017), Using register data to estimate causal effects of interventions: An ex post synthetic control-group approach, Scandinavian Journal of Public Health, 45, 50-55. https://doi.org/10.1177/1403494817702338

Original publication available at: https://doi.org/10.1177/1403494817702338
Copyright: SAGE Publications (UK and US), http://www.uk.sagepub.com/home.nav

Using Register Data to Estimate Causal Effects of Interventions: An Ex Post Synthetic Control Group Approach

Magnus Bygren 1,2 and Ryszard Szulkin 1
1 Department of Sociology, Stockholm University
2 Institute for Analytical Sociology, Linköping University

Abstract

Aims: It is common in the context of evaluations that participants have not been selected on the basis of transparent participation criteria, and researchers and evaluators often have to make do with observational data to estimate the effects of job training programs and similar interventions. The techniques developed by researchers in such endeavors are useful not only to researchers narrowly focused on evaluations, but also to social and population science more generally, as observational data are overwhelmingly the norm, and the endogeneity challenges encountered in the estimation of causal effects with such data are nontrivial. The aim of this article is to illustrate how register data can be used strategically to evaluate programs and interventions, and to estimate causal effects of participation in them.
Methods: We use propensity score matching on pre-treatment period variables to derive a synthetic control group, and use this group as a comparison to estimate the effect on employment of participation in a large job training program.
Results: We find the effect of treatment to be small, positive, but transient.
Conclusions: Our method reveals a strong regression-to-the-mean effect that would have been easy to misinterpret as a treatment effect had a less advanced design been used (e.g., a within-individual panel data analysis), and it illustrates one of the unique advantages of using population register data for research purposes.

Key words: programme evaluation, job training, unemployment, endogeneity, confounding, propensity score matching, synthetic control group, causal effects, regression to the mean

Introduction

It is commonplace in the context of evaluations of programs aiming at improving some aspect of individual lives that the groups of participants have not been created on the basis of transparent participation criteria, which is why researchers and evaluators often have to make do with observational data to estimate the effects of interventions. The techniques developed by researchers in such endeavors are thus useful not just to researchers narrowly focused on programs, but also to the social and population sciences more generally, as observational data are overwhelmingly the norm, and the challenges encountered in the estimation of causal effects with such data are not trivial [1, 2]. The aim of this article is to illustrate how register data can be used strategically to evaluate such programs and to provide causal estimates of participation in them. Our empirical case is the European Social Fund in Sweden's so-called program area two, which was implemented during 2007-2013. In this program area, the Social Fund financed projects aiming at improving the chances of employment for individuals at a substantial distance from the labor market: individuals who had been unemployed for at least one year, among whom job-seekers of foreign background and young job-seekers were prioritized. In total, the Swedish ESF Council financed 515 projects in this program area, mainly with municipalities as project owners. The funding provided by the Social Fund for these projects amounted in total to 463 million Euros [3].

When an individual has participated in a project, the initial and outcome values regarding the individual's labor market status are usually known. It is, however, well known that attempting to assess the effectiveness of measures through a before-after design can lead to incorrect conclusions, as the outcome after participation in a given program is virtually never exclusively caused by the content of the program.

Selection on unobservables into participation and time-varying conditions in the economy may play a central role for the outcome. Job-seekers who participate in a program may be selected on motivation and ability to find employment, biasing the estimated effect of participation. A shift in the economic cycle independently influences the likelihood that participants will be able to find employment. Although the terminology differs across disciplines, researchers are well aware of the fact that with observational data, the effect of a program is easily mixed up with the effects of unknown factors. In the sociological literature, this is usually discussed as spuriousness, in econometrics as endogeneity, and in epidemiology as confounding.

The relevant counterfactual question to ask is: what would have happened to these individuals had they not participated in the program in question? Since it is not possible to observe the outcomes of individuals' participation and non-participation simultaneously, it is necessary to find a relevant group with whom the participants can be compared, the gold standard being randomized controlled trials, where participants are randomly assigned either to the training program that is the object of the evaluation or to a control group. The random assignment of individuals to the experimental and control groups means that the distributions of observed and unobserved individual characteristics and experiences that may be of relevance to the outcome are expected to be equivalent at the group level. If the experimental group, following participation in the program, transitions into work to a greater extent than the control group, this may be regarded as evidence of a positive causal effect of participation in the specific job program examined. Sometimes the researcher has access to data in which some naturally occurring randomization has taken place, which can be used as an instrument, but good instruments are rare. Usually, groups of participants and non-participants have been created on the basis of less transparent participation criteria. To overcome these difficulties, we use a method similar to the so-called synthetic control method [4]. Our approach has several advantages. The control group is selected in a data-driven manner, considerably reducing arbitrariness.

It is a weighted average of individuals that in the aggregate is very close to the participant group on a large set of relevant pre-participation characteristics and experiences. Neither the total adult population nor a control group selected on a single variable (e.g., all unemployed individuals) would come nearly as close to serving as a meaningful control group. Compared with within-individual panel designs, the synthetic control group method is better able to take into account the effects of unobservable confounders that vary over time, in particular when there is selection into treatment on the dependent variable. The reason is that a carefully selected control group is as likely as the treatment group to experience unobserved positive events that increase their job chances during the post-treatment period. Prior to participation, such unobserved positive events are by definition rare in the treatment group (otherwise its members would not have been treated), making the treatment group in the pre-treatment period (the implicit control group in within-individual panel designs) far from appropriate as a comparison. We will elaborate on this point in the results section.

Data and Analytical Strategy

The data employed in the empirical analyses, which were part of the official evaluation of ESF programmes, cover the period 2004-2011 and include all individuals who participated in an ESF-financed training program during the years 2008-2010 (N = 41,590) and a matched sample of non-participant individuals aged between 16 and 64, selected from Statistics Sweden's population registers (N = 247,419) [5]. The non-participant individuals were selected from a 20 percent random sample of the total adult population residing in Sweden during 2008-2010 (about 1.1 million individuals). None of these individuals participated in an ESF-financed project, but our matching procedure made sure they shared similar characteristics and experiences with the participant group and, on average, were equivalent to this group on several crucial variables. For each individual who participated in an ESF-financed project, we used propensity score matching to select the participant's ten nearest neighbours on certain observed characteristics and events that are correlated with participation, using the Stata module PSMATCH2 [6].

The same individual may be selected as a nearest neighbour to more than one participant, i.e. the matching was conducted with replacement. We based the matching on an additive participation prediction (logit) model that included the following independent variables, where t denotes the matching year, t-1 the matching year minus one, t-2 the matching year minus two, and so on:

(1) Dummy variables for employment status (in work yes/no) for t, t-1, t-2, t-3 and t-4.
(2) Dummy variables for income rank in percentiles for t, t-1, t-2, t-3 and t-4 (income from work/business activity), with indicators for the following percentile groups: 0, 1, 2, 3, 4, 5-6, 7-8, 9-10, 11-12, 13-14, 15-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-100.
(3) Dummy variables for municipality of residence at t.
(4) Dummy variables for educational level (7 categories, ranging from compulsory to postgraduate) at t.
(5) Length of residence in Sweden at t (dummy variable with 6 categories: since birth; 0-2 years; 2-5 years; 5-10 years; 11-20 years; 20+ years).
(6) Dummy variables indicating country of birth.
(7) An indicator for gender.

Since a not inconsiderable proportion of the participants are youths and recently arrived immigrants, register data for these individuals are often unavailable for one or more of the years in question. In order to produce a matched group that was equivalent in this regard, we coded missing data as a separate category and balanced the groups on these values. A balancing test indicated that the participant and control groups were well balanced on the variables employed in the matching [5]. It should be noted that we can only match on information available in registers. Although motivation and ability most probably affect employment chances, we are only able to take the influence of such factors into account insofar as they are captured by individuals' employment histories and other observables. Relatedly, our matching on missingness as a separate category hinges on the assumption that individuals with missing data in the treatment group do not differ on unknown parameters from individuals with missing data in the control group. The outcome we are interested in is the extent to which participants in ESF-financed projects improved their chances of obtaining work subsequent to project participation, relative to the control group. Employment is a dichotomous variable, with the value zero representing a lack of income from work and the value one indicating that the individual had some income from work during the year in question.
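To make the matching step concrete, the following is a minimal sketch of how a 10-nearest-neighbour propensity score matching of this kind could be set up with the PSMATCH2 module cited above. All variable names (esf, emp_t0 through emp_t4, incrank_*, and so on) are hypothetical stand-ins for the register variables listed in points (1) to (7); the sketch illustrates the procedure and is not the authors' actual code.

```stata
* Sketch only: hypothetical variable names standing in for the register
* variables listed above. esf = 1 for ESF-participants, 0 for potential
* controls; emp_t0-emp_t4 are employment dummies for the matching year and
* the four preceding years; incrank_t0-incrank_t4 hold the income-rank
* percentile groups (with missing register data coded as its own category).
* Matching with replacement is PSMATCH2's default behaviour.
xi: psmatch2 esf emp_t0 emp_t1 emp_t2 emp_t3 emp_t4                      ///
        i.incrank_t0 i.incrank_t1 i.incrank_t2 i.incrank_t3 i.incrank_t4 ///
        i.municipality i.educ_level i.years_in_sweden i.birth_country    ///
        female,                                                          ///
        outcome(emp_post1 emp_post2 emp_post3) neighbor(10) logit common

* Balance check on selected matching variables, before vs. after matching
pstest emp_t0 emp_t1 emp_t2 emp_t3 emp_t4 female, both
```

The outcome() option reports the estimated average treatment effect on the treated for each listed post-entry employment indicator, and pstest provides the kind of balancing test referred to above.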

We checked the robustness of our results using income rank as the dependent variable, and the pattern of results was very similar to that reported below. We account for the matching with replacement in the calculation of standard errors, using the method detailed in [7].
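The paper accounts for matching with replacement using the variance estimator of Abadie and Imbens [7]. As an illustrative alternative (not the route reported by the authors), Stata's built-in teffects psmatch command performs nearest-neighbour propensity score matching and reports Abadie-Imbens robust standard errors directly; variable names are again hypothetical:

```stata
* Alternative sketch: teffects psmatch reports Abadie-Imbens standard errors
* that account for matching with replacement. Hypothetical variable names.
teffects psmatch (emp_post1)                                             ///
        (esf emp_t0 emp_t1 emp_t2 emp_t3 emp_t4                          ///
         i.incrank_t0 i.municipality i.educ_level                        ///
         i.years_in_sweden i.birth_country female, logit),               ///
        atet nneighbor(10)
```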

Results

In Table 1 we present mean values of employment for the control group, together with the extent to which the corresponding values for the ESF-participants deviate from these control-group means prior to and subsequent to the ESF-participants' entry into a project. The table shows that there is no major difference between the groups prior to project participation, indicating that the matching was successful. For both groups, the employment level initially lies at just over 20 percent and then decreases to 14-15 percent in the year in which the ESF-participants are first assigned to an ESF-financed project. During the follow-up years the proportion of ESF-participants in employment increased substantially, from 14.7 percent in the year of project entry to 37.3 percent three years later. When we shift the focus to the control group, however, we see that the trend in the proportion in employment was very similar in this group as well. Here the proportion in employment increased from 14.4 percent in the year during which their twins entered ESF-financed projects to 37.0 percent three years later, which is much the same as the proportion noted among the ESF-participants at this point. A somewhat larger proportion of the ESF-participants were in employment during the intervening years, however. This difference is clearly visible and statistically significant, but it is relatively small, amounting at most to a 2.2 percentage point advantage. Thus, the estimated positive effect of participation is relatively small and transitory. The substantial improvement that follows participation also characterizes the matched group of twins that were selected on the basis of the ESF-participants' characteristics and experiences during the period prior to participation.

To check whether the recovery in both groups was the result of some kind of idiosyncratic shock, we also employ a form of placebo analysis. We repeat our analysis, including the matching, the difference being that this time the groups are matched on the basis of data relating to the individuals' situation up to one year prior to the participants' entry into the ESF-financed projects. It should be noted that in this instance the matching process produces a control group comprised of new individuals.
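A rough sketch of this placebo re-matching, under the same hypothetical variable names as above: the covariates measured in the entry year are dropped, so the groups are matched only on information up to one year before programme entry, and employment from the entry year onwards is then followed as the outcome.

```stata
* Placebo re-matching sketch (hypothetical variable names): identical to the
* main matching, but only information up to year t-1 enters the propensity
* score model, so a new control group is selected.
xi: psmatch2 esf emp_t1 emp_t2 emp_t3 emp_t4                             ///
        i.incrank_t1 i.incrank_t2 i.incrank_t3 i.incrank_t4              ///
        i.municipality i.educ_level i.years_in_sweden i.birth_country    ///
        female,                                                          ///
        outcome(emp_t0 emp_post1 emp_post2 emp_post3) neighbor(10) logit common
```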

The placebo line in Figure 1 shows that the recovery in this control group begins immediately in the year following the final year on which the groups were matched on the relevant characteristics and experiences. Thus, one clear result is that the employment rate increases dramatically during the years subsequent to participation, as well as subsequent to placebo participation. We do not believe that these recoveries are due to all groups having participated in some form of measure that served to lift them out of the marginalized position in which they found themselves. That explanation raises the question of why the recovery always starts immediately after the control group and the placebo group are left to their own devices. We find the evidence pointing in a different direction. If an analysis is focused on a group characterized by extreme values (in our case extremely low values) on the dependent variable at the first observation point (t), it is highly likely that the same group will have moved towards the mean value of the same variable (which would in our case involve a significant improvement on the outcome variable) at the next observation point (t + 1). It is conceivable that there may be different reasons for a certain group of individuals having an employment rate of 15-20 percent. Obviously, it may be because they quite simply lack certain qualifications or characteristics that are valued by employers. However, it may also be due in part to transitory problems in the form of, for example, illness, substance abuse, negative events in the family, or other traumatic events. There is likely a disproportionately large accumulation of problems of this kind in a group of individuals that persists in having an employment rate of just 15-20 percent over a period of several years. Given the role played in everybody's lives by chance occurrences, however, such an accumulation of problems is unlikely to persist over time. In the statistical literature, this well-known phenomenon is labelled regression to the mean: a phenomenon that occurs when repeated measurements are made on the same subject or unit of observation and there is measurement error and/or true random fluctuation between measurements. The tendency is more powerful the weaker the correlation ρ between two measurement points, in our case the observation points t and t + 1. In our data, the correlation between employment at point t and employment at point t + 1 is ρ = .75. The population mean for employment among individuals with the same characteristics (education, number of years in Sweden, gender, municipality of residence) as the ESF-participants is 57 percent. The average expected regression to the mean between two measurement points can in our case be calculated to amount to 25 percent (1 - ρ, i.e. 1 - .75 = .25) [8]. Given that the ESF group's employment rate is 15 percent in the year prior to entering an ESF-financed project, the expected mean value for the group in the first year after project entry can be calculated as 15 + .25(57 - 15) = 25.5, i.e. an employment rate of 25.5 percent. The corresponding expected regression to the mean in year two is 25.5 + .25(57 - 25.5), producing an expected employment rate of 33.4 percent. The calculation for year three is 33.4 + .25(57 - 33.4), i.e. an expected employment rate of 39.3 percent.
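Written compactly, the calculation above follows the recursion

```latex
\[
\hat{y}_{k+1} = \hat{y}_{k} + (1-\rho)\,(\mu - \hat{y}_{k}),
\qquad \hat{y}_{0} = 15,\quad \rho = 0.75,\quad \mu = 57,
\]
\[
\hat{y}_{1} = 15 + 0.25\,(57-15) = 25.5,\qquad
\hat{y}_{2} = 25.5 + 0.25\,(57-25.5) \approx 33.4,\qquad
\hat{y}_{3} = 33.4 + 0.25\,(57-33.4) \approx 39.3,
\]
```

where ŷ_k denotes the expected employment rate (in percent) k years after project entry, ŷ_0 the observed rate in the year prior to entry, μ the population mean for the matched characteristics, and ρ the correlation between adjacent observation points.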

As can be seen in Figure 1 (and Table 1), these figures closely match the employment rates of both the control group and the ESF group in the years following entry into an ESF-financed project.

Conclusion

The task of evaluating the causal effects of programs aiming at improving some aspects of individual lives is a difficult one. This is because such programs usually have not been designed in a way that makes it possible to evaluate them in a scientifically acceptable manner. As a side note, it is quite problematic that policy makers accept a situation in which the intentions behind social interventions are the only thing that can be evaluated, while the results of these interventions remain obscure. Here we propose a register-based method for evaluating the treatment effects of an intervention when the participants have not been recruited on the basis of transparent participation criteria. We create, ex post, a control group matched on central relevant observables in a population panel. Nordic population registers have obvious advantages in the context of evaluation studies like ours. They do not suffer from selective nonresponse, recall error, or follow-up attrition bias. In the database, individual histories can be followed retrospectively as well as prospectively, and central variables relevant to the description of labor market outcomes are covered. Our approach is, as far as we can judge, applicable to the evaluation of many other social interventions for which individual register data on relevant variables are available.

When we compare the employment levels in the control group and the treatment group, we find the effect of treatment to be small, positive, and transient. Considering the amount of funding provided to these projects, 463 million Euros, the value added, in terms of increased employment chances for participants, seems meager. Our analysis shows a powerful recovery in the rate of employment and in incomes for the treatment as well as the control groups in the year following the final year in which the groups were matched with one another. We find convincing evidence that this recovery constitutes a manifestation of what the statistical literature refers to as regression to the mean, meaning that units with extreme values tend over time to move towards the population mean. In other words, the major part of the improvement in labor market outcomes that we observe in our analyses should not be interpreted as a causal effect of the programs in which the individuals participated. Instead, the observed recovery appears largely to be due to a natural tendency for marginalized groups to move closer to their true underlying employment chances. We initially estimated within-individual panel models, and they misled us into believing there were strong positive treatment effects (these results are not reported but are available from the authors on request). That is, the pattern we report in this study would be extremely easy to interpret as a strongly positive causal treatment effect had a less advanced design been used, and our application illustrates one of the unique advantages of combining a strong design for causal inference with population register data.

Tables and Figures

Table 1: Employment over time among ESF-participants compared with a matched control group. Employment refers to the proportion with work incomes.

Years prior to/after entry into      Matched control group:    ESF-participants: employment,
ESF-financed job training project    employment (percent)      percentage point difference
                                                               relative to control group
4 years prior to entry               22.4                      -0.3
3 years prior to entry               21.1                      -0.2
2 years prior to entry               20.8                       0.0
1 year prior to entry                15.5                       0.3
Year of entry                        14.4                       0.3
1 year after entry                   26.8                       1.7**
2 years after entry                  33.6                       2.2**
3 years after entry                  37.0                       0.3
Number of observations               247,419                   41,590

* p < .05, ** p < .01 (two-tailed t-test of the difference against zero)

Figure 1: Proportion with income from work/business activity, prior to and subsequent to participation in the job training program. Control group matched on characteristics during the period -4 to 0 years and -4 to -1 years (placebo), respectively. [Line graph; x-axis: year before/after program participation, -4 to 3; y-axis: proportion employed, 0-50%; series: treated group, placebo control group, control group.]

References

1. Angrist JD and Pischke J-S. Mostly harmless econometrics: an empiricist's companion. Princeton: Princeton University Press, 2009.
2. Morgan SL and Winship C. Counterfactuals and causal inference: methods and principles for social research. Cambridge: Cambridge University Press, 2015.
3. ESF. Nationellt strukturfondsprogram för regional konkurrenskraft och sysselsättning (ESF) 2007-2013. Stockholm: Europeiska socialfonden i Sverige, 2007.
4. Abadie A, Diamond A and Hainmueller J. Synthetic control methods for comparative case studies: estimating the effect of California's tobacco control program. Journal of the American Statistical Association 2010; 105: 493-505.
5. Bygren M, Szulkin R and Lindblom C. The more things change, the more they stay the same. Institute for Futures Studies Research Report. Stockholm: Institute for Futures Studies, 2014.
6. Leuven E and Sianesi B. PSMATCH2: Stata module to perform full Mahalanobis and propensity score matching, common support graphing, and covariate imbalance testing. 2012.
7. Abadie A and Imbens GW. Large sample properties of matching estimators for average treatment effects. Econometrica 2006; 74: 235-267.
8. Shephard RJ. Regression to the mean: a threat to exercise science? Sports Medicine 2003; 33: 575-584.