Rules Versus Discretion in Social Programs: Empirical Evidence on Profiling In Employment and Training Programs


Rules Versus Discretion in Social Programs: Empirical Evidence on Profiling in Employment and Training Programs

Miana Plesca, University of Guelph
Jeff Smith, University of Maryland

This version: January 2005

We thank the CIBC Chair in Human Capital and Productivity and the Social Sciences and Humanities Research Council of Canada for research support. We are grateful to Audra Bowlus, Bo Honoré, Chris Robinson, Todd Stinebrickner and Tiemen Woutersen for helpful comments.

Table of Contents

Abstract
1. Introduction
2. Statistical Treatment Rules
   2.1 Profiling as an allocation mechanism
   2.2 The choice of a profiling variable
   2.3 Empirical evidence on profiling on program outcomes
   2.4 Empirical evidence on profiling on program impacts
3. The NJS Data
4. Estimation Methodology
   4.1 Profiling on largest predicted impacts
   4.2 Profiling specifications
   4.3 Profiling on small Y0
   4.4 Profiling at positive sites
   4.5 Profiling into JTPA from Application
   4.6 Profiling into JTPA from Eligibility
   4.7 Profiling into Alternative Treatment Streams within JTPA
5. Results from profiling into JTPA
   5.1 Profiling into JTPA on predicted program impacts
       Different covariate sets used in profiling
       Profiling based on positive predicted earnings impacts
       Profiling based on positive predicted employment impacts
   5.2 Profiling into JTPA on predicted non-treatment outcomes

       Profiling based on low predicted non-treatment earnings
       Profiling based on low predicted non-treatment employment
       Sensitivity to different cutoffs in outcome profiling
   5.3 Sensitivity to different proportions of splitting the estimating and validating samples
   5.4 Conclusions from profiling into JTPA
       Profiling performance
       Efficiency versus equity
       Specifications
6. Profiling into JTPA at positive sites
7. Profiling into JTPA from Application
8. Profiling into JTPA from Eligibility
9. Profiling into Alternative Treatment Streams within JTPA
   9.1 Profiling pooled adult males and females into JTPA treatment streams
       Profiling the pooled sample on predicted earnings impacts
       Profiling the pooled sample on predicted employment impacts
   9.2 Profiling separately adult males and adult females into JTPA treatment streams
       Profiling adult males into treatment streams on high treatment outcomes Y1
   9.3 Conclusions from profiling into treatment streams
10. Summary and Conclusions
References
Main result tables
Appendix 1: Results from profiling young males and young females
Appendix 2: Some sensitivity results

Abstract

A substantial body of empirical evidence indicates that public employment and training programs generally fail to achieve their stated goal of increasing the earnings and employment of those they serve. One possible way to improve the performance of re-employment and training programs is to use statistical models to profile potential participants either into or out of a program, or into alternative treatments within a program. Because program impacts are heterogeneous, profiling based on predicted program impacts can yield a more efficient allocation mechanism. Using experimental data from the National JTPA Study, we evaluate the potential improvement in program performance from profiling into the JTPA program. We find that statistical profiling improves program impacts for adult males and females, but not for young males and females. We find the same profiling impacts for program applicants as for JTPA participants, as well as for adult female eligible non-applicants; for adult male eligible non-applicants, profiling impacts are worse. In general, profiling on large predicted impacts is more efficient than profiling on low predicted non-participation outcomes. Program performance can also be improved by eliminating sites with negative mean impacts. We also provide evidence on the potential benefits of profiling into alternative treatment streams conditional on program participation. For adult females, profiling into treatment streams improves program performance in all treatment streams; this is not so for adult males. Caseworker assignment and random assignment yield similar predicted program impacts, which are less efficient than those from statistical profiling in the cases where statistical profiling is an efficiency-improving mechanism.
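The efficiency-oriented allocation rule evaluated in the paper (treat exactly those individuals with a positive predicted impact) can be illustrated in a few lines. This is a simulated sketch: the predicted impacts below are randomly generated stand-ins, not estimates from the NJS data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical predicted impacts of treatment on 18-month earnings for
# 1,000 potential participants (illustrative numbers, not NJS estimates).
predicted_impact = rng.normal(loc=500, scale=2000, size=1000)

# Impact-based profiling rule: treat exactly those with a positive
# predicted gain; everyone else is profiled out of the program.
treat = predicted_impact > 0

# Screening out negative-impact individuals raises the mean impact
# among the treated relative to serving everyone.
mean_all = predicted_impact.mean()
mean_profiled = predicted_impact[treat].mean()
```

The gap between `mean_profiled` and `mean_all` is exactly the efficiency gain the paper measures when it compares profiled allocations to the program as implemented.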

1 Introduction

Profiling methods seek to improve program performance by using statistical decision rules to assign individuals to participate in a program or not, or to assign participants to particular treatment streams within a program. Any program has a social agenda, be it equity (serving those most in need) or efficiency (serving those who would benefit most from the program). Given the goal of the program, policymakers face the important choice of an allocation mechanism to assign participation. Statistical profiling is one possible treatment allocation mechanism. Other possibilities include deterministic rules, which base assignment on one or more observable characteristics (e.g. race in affirmative action programs), and caseworker discretion. Within program eligibility rules, caseworker assignment bases the allocation on participant characteristics, observed and unobserved (to the statistician), at the discretion of program staff.

In this paper we examine how the impact of a program can be improved by selecting program participants through statistical profiling mechanisms that exploit the heterogeneity in outcomes and impacts across individuals. In practice, governments have implemented profiling on outcome levels, assigning to treatment individuals with low outcomes in the absence of treatment. [Footnote 1: This is the case, for instance, with the U.S. Unemployment Insurance profiling system.] This kind of profiling serves an equity goal. If efficiency is the desired goal, then the allocation mechanism of choice ought to be statistical profiling based on predicted program impacts. The mechanism relies on predicting, for each individual, his or her gains in every state of a program (participation, non-participation, treatment streams within a program) and assigning the state that maximizes expected gains. Besides comparing program impacts from assignments given by statistical profiling

rules, we also investigate how the assignment into particular treatment streams within a program differs between statistical profiling and other assignment mechanisms such as random assignment or caseworker discretion. [Footnote 2: The evaluation of statistical profiling as an allocation mechanism and the evaluation of program impacts are two separate issues. A good allocation mechanism can only do the best it can when implemented on a badly designed program. Similarly, a bad allocation mechanism can misallocate participants and thus deteriorate the impacts of an otherwise attractive program.]

Even though profiling systems for social programs are fairly new in practice, there are some important examples. For instance, in the U.S., the Unemployment Insurance (UI) system in most states profiles individuals into mandatory reemployment services based on their predicted duration of UI benefit receipt. [Footnote 3: See, e.g., Dickinson, Decker and Kreutzer (1999) for a detailed description of UI profiling in the U.S.] Although it is sometimes referred to as a program that attempts to maximize the total gains of its participants, this allocation system in fact builds on equity concerns. While the UI profiling system is an example of a profiling system that allocates potential participants into a program or not, both the U.S. and Canada are also considering systems that will allocate persons to alternative treatments within a program. In the U.S., the Frontline Decision Support System (FDSS) will allocate persons to treatments funded under the Workforce Investment Act (WIA). In Canada, an earlier system called the Service and Outcome Measurement System (SOMS) was intended to allocate unemployed persons to various publicly funded employment and training programs. [Footnote 4: See Chapter 12 on FDSS by Eberts and O'Leary and Chapter 10 on SOMS by Colpitts in Targeting Employment Services, Eberts, O'Leary and Wandner, eds., 2002, Upjohn Institute.]

The goal of any program is to obtain the best results according to some social welfare criterion, be that efficiency or equity (or maximization of re-election probabilities). Conditional on having decided on the allocation mechanism, the choice of a target ("profiling") variable is essential to achieving the program goals. If the goal of the program is efficiency, then the program targets maximum impacts on the outcome of interest, e.g. earnings or employment. [Footnote 5: This is a partial equilibrium world where general equilibrium effects such as displacement are ignored.] If the goal of the program is equity, treatment is administered to those individuals identified as neediest, as in the case of the UI profiling system, where claimants with the highest predicted probabilities of exhausting UI benefits are profiled into treatment. [Footnote 6: It is rarely the case that the goals of efficiency and equity do not conflict. See Berger, Black, and Smith (2001).]

The choice of a profiling variable depends not only on the goal of the program, but also on the availability of data to be used in estimating the profiling model. Depending on the program under evaluation, profiling at different points in the participation decision may achieve different goals. For instance, if a program is running under capacity constraints, profiling the pool of applicants may increase program efficiency by making sure that the people who would benefit most from the program get enrolled. If the intention of a program is to provide services for the whole eligible population, not just for individuals self-selected into participation, then the profiling analysis should be carried out at the eligibility stage. Finally, profiling can further improve a program's performance by recommending to program participants the treatment stream that would bring them the highest expected gains (possibly conditional on capacity constraints).

In the empirical analysis we use data from an influential social experiment conducted in the U.S., the National JTPA Study (NJS) evaluation of the Job Training Partnership Act (JTPA) program. We first consider profiling individuals into or out of the program at various stages in the application and enrollment process: profiling from random

assignment, from application, and from eligibility. Using the experimental data, we estimate a heterogeneous program impact function based on a large set of individual characteristics. Under certain assumptions, the impact function allows us to forecast potential program gains for each individual. Based on this forecast, we profile individuals either into participation or into non-participation, and we re-compute program impacts under the resulting allocation. Likewise, we compute impact functions conditional on personal characteristics for each JTPA treatment stream, and we profile participants into the treatment that yields the highest expected gain for each participant. A comparison between the program gains under the current implementation and the average program impacts under profiling indicates the potential for using statistical profiling to improve program performance. We document under what circumstances profiling can improve the efficiency of the JTPA program by allocating treatment to those participants who would benefit most from it. Although economically relevant, the results do not always achieve statistical significance because of the small samples involved in the estimation, a problem that commonly plagues the program evaluation literature.

From a methodological standpoint, we bring innovations to the program evaluation literature. We introduce a profiling function that generates program impacts as a function of individuals' observed covariates X. We note that in small samples the program impact estimators may suffer from over-fitting bias. To avoid this bias, we introduce a procedure in which we randomly split the sample into an estimating sample and a validating (or holdout) sample. We use the observations in the estimating sample to generate, for individuals in the validating sample, the predicted impact functions on which we base the profiling allocations. We repeat this procedure in a bootstrap fashion enough

times (500) to account for sampling variation. We report average impact measures (as well as average standard errors) over the 500 repetitions.

The paper proceeds as follows. Section 2 discusses the theoretical grounds for statistical treatment rules, the choice of profiling as an allocation mechanism, and existing evidence on profiling. Section 3 describes the experimental data used in this exercise. Section 4 presents the methodological approach. Results for profiling into the program at the point of random assignment are discussed in section 5. Further profiling exercises are undertaken in section 6 (profiling at positive sites), section 7 (profiling from application) and section 8 (profiling from eligibility). Section 9 provides results from the analysis of profiling into treatment streams, and section 10 concludes.

2 Statistical Treatment Rules

2.1 Profiling as an allocation mechanism

As argued by Manski (1997), if a program were administered by a benevolent social planner, the allocation that maximizes a utilitarian social welfare function is the same as the allocation that maximizes the utility each individual derives from the program. Abstracting from unobserved individual heterogeneity pertaining to motivation or ability, the allocation of choice for individuals who have access to the same information set as the social planner is also the allocation based on profiling on program impacts. In other words, an individual will choose to participate or not in a program, or in a treatment stream within a program, depending on whether the net outcome from participation exceeds the net outcome from non-participation. The central motivation behind this argument is that program impacts are heterogeneous (Manski, 1997 and 2000). Even

conditional on some broad characteristics like belonging to a certain demographic group, program impacts vary across participants. The idea behind statistical profiling is to identify which personal characteristics are responsible for the heterogeneity in individual impacts, and in what manner, and to use this knowledge to predict individual responses and best assignments under different treatment scenarios.

Statistical profiling is an intermediate allocation strategy between caseworker discretion and a deterministic rule: less ad hoc than caseworker selection and providing a much finer allocation than a deterministic rule. Caseworker assignment is the strategy at work in most existing multiple-treatment programs. The (hopefully) benevolent caseworker has access to background information, the results of formal tests of aptitudes and interests, and possibly other variables related to the applicant's motivation and enthusiasm. Based on her available information, the caseworker makes the treatment recommendation she considers will achieve the program goals for the eligible applicant, subject to budgetary and administrative constraints. Caseworker assignment has the virtue that it allows idiosyncratic information about the client and about the institutional environment (information that may be difficult to include in a statistical decision system or to encode in a deterministic rule) to affect the allocation process. The downside of this allocation mechanism is that program outcomes will depend on subjective decisions by caseworkers, who may make mistakes or engage in cream-skimming to achieve performance standard goals. There is also considerable variation across caseworkers' decisions: caseworkers are not all equally well informed, nor do they all use

the same criteria in decision-making. [Footnote 7: See the evidence in Bell and Orr (2002).]

A deterministic rule places individuals in treatment based on observable characteristics, as in means-tested transfer programs, affirmative action programs, or a rule that all welfare participants with no children below an age cutoff must participate in employment-related activities. Deterministic rules have the virtue of simplicity and of equity, in the sense of treating observationally equivalent cases in the same way. Statistical profiling is an intermediate case between caseworker discretion and a deterministic rule. Observable characteristics are included in the profiling model to assign persons to a program or to treatments within a program, where the importance of each characteristic depends on its estimated relationship with the profiling variable. In practice, profiling results in a finer allocation that incorporates more information than most deterministic rules, at the cost of setting up and operating the profiling system. Relative to caseworker discretion, profiling gives up the use of idiosyncratic information. At the same time, profiling is likely to be less costly and to generate fewer concerns about unequal treatment across caseworkers.

At one extreme of statistical profiling is the random assignment mechanism, which allocates participants at random to a program or into treatment streams. Since random assignment equates the distributions of observables and unobservables between treatment and control groups in large samples, it is used in social experiments to allow unbiased estimation of program impacts by a simple mean difference in outcomes between the treatment and control groups. We use this allocation method in computing experimental program impacts, as well as a benchmark in the exercise of profiling participants into alternative treatment streams.

2.2 The choice of a profiling variable

The selection of the profiling variable depends on the goals of the allocation rule and on the available data (Berger, Black, and Smith, 2001). If the goal of the allocation is to assign the neediest persons to a program, then a profiling variable that correlates positively with need will be used, and individuals with high predicted values of the variable will be assigned to treatment. If efficiency is the goal of the program, then the logical profiling variable is the predicted net impact of the program, whose determinants can be estimated using experimental or non-experimental methods. [Footnote 8: See Heckman, LaLonde and Smith (1999) and Angrist and Krueger (1999) for extended discussions of experimental and non-experimental methods for estimating the impact of social programs.]

Profiling requires information on the observable characteristics in the profiling model for everyone who is to be profiled, as well as data on the profiling variable for the sample that will be used to estimate the profiling model. When the profiling variable is equity-related, such as expected duration of UI or welfare receipt, the profiling variable itself can typically be obtained from administrative data on earlier cohorts of participants. When the profiling variable is an expected impact of the program, the ideal is to have experimental data on the population to be profiled, so that the experimental impacts as a function of observable characteristics can be estimated without bias. In some cases, the available experimental data will not correspond exactly to the program as currently implemented (perhaps coming from another state) or to the population being profiled (perhaps because of changes in participation over the business cycle). Given a choice of profiling variable, the choice of which predictor variables to include in the model depends on the costs of obtaining the data and on how fine-grained an allocation mechanism is desired. Manski (1997, 2000) and Pepper (2000) discuss the issues in detail, and show that in some cases profiling based on poor data may do worse than simple deterministic rules.

2.3 Empirical evidence on profiling on program outcomes

The outcome from program participation, Y1, is the level of the variable of interest, be it income or employment, at the end of the program; it is identified by data on program participants. The outcome in the absence of the program, Y0, is the corresponding level in the absence of the program; it is identified by data on non-participants. Profiling on program outcomes, or levels, singles out and assigns to treatment individuals with the highest (or lowest) predicted levels of Y1 or Y0, while profiling on program impacts assigns to treatment individuals with the highest predicted gain from the program, Y1 − Y0. Because of equity concerns, most programs that implement profiling base it on low levels of Y0.

O'Leary, Decker and Wandner (2001) look at potential gains from profiling persons into the UI bonus treatment using data from the Washington and Pennsylvania UI reemployment bonus experiments. The experimental impact estimates in both cases were statistically insignificant. The idea is that profiling based on the probability of benefit exhaustion (the same profiling variable currently used in the UI profiling system) would exclude persons most likely to have a short spell even without the bonus, and could therefore improve the mean impact of the treatment. Their profiling results are presented as mean experimental impacts conditional on various cutoff levels of the probability of benefit exhaustion. [Footnote 9: The program outcomes considered are UI benefit receipt, bonus payments, and earnings.] They find that although profiling can increase the cost-effectiveness of the program, the paid UI benefits do not

steadily decline as the cutoff levels increase.

Using an ingenious quasi-experimental design, Black, Smith, Berger, and Noel (2003) examine the relationship between the impact of mandatory reemployment services and the probability of finding employment. Under the implementation of the UI profiling system in Kentucky, UI claimants receive a profiling score between 1 and 20 that represents their predicted duration of UI benefit receipt, from lowest to highest. Individuals are assigned to mandatory reemployment services each week in each local UI office in descending order of the profiling score until the slots for that office in that week are filled. Individuals with the marginal profiling score in each office each week (the score at which there are not enough slots to treat everyone) are randomly assigned. If the profiling system enhanced the efficiency of the allocation mechanism by directing the treatment to those who benefit most from it, the experimental impact estimates should increase with the score. The authors find no evidence that the individuals targeted by the program, namely claimants likely to have a long spell of UI receipt, benefit more from the program than those predicted to have shorter spells, and conclude that the goal of the UI profiling system is equity rather than efficiency. [Footnote 10: In fact, the largest point estimate ($1507, not statistically significant) occurs for claimants with a profiling score of 6, while the point estimates for claimants with profiling scores of 17 and 18 are negative (and not significant).]

An experimental evaluation of a statistical profiling mechanism used to allocate welfare recipients to three different treatment streams was designed at the request of the U.S. Department of Labor (DOL) (Eberts, 2001). The program under consideration was Michigan's implementation of a Work-First program for welfare recipients. The service offered, job search assistance, was uniform across all participants, but the three treatment providers differed in their approach to implementing the services. Participants were assigned a score: the probability of finding and keeping employment for more than 90 days after the treatment. The score, estimated from a previous cohort of program participants, controlled for education, demographics, and past experience with the program. Based on the score, one third of participants were assigned to each of the three service providers. At this point, half of the participants in each score group, the treatments, were sent to treatment at the assigned provider. The other half, the controls, were randomly assigned to one of the three providers. Results from this evaluation seem to indicate that, conditional on program participation, individuals have better outcomes from the treatment to which they were originally assigned. Nothing more can be said about program impacts. The grouping by score does not necessarily equate participants' unobservable characteristics (or observables, for that matter) within each group. [Footnote 11: The wide difference in outcomes between treatments and controls in the second score group, who end up with the same (preferred) provider, seems to confirm that there may be systematic differences between the treatment and control groups in score group two.] Moreover, if, however unlikely, program impacts were to turn out homogeneous across participants, then the allocation distribution would not even matter.

Finally, the quality of the predictions generated by a profiling model is essential in designing the profiling allocation. Policymakers who elect to use profiling as an allocation mechanism face two additional, interrelated choices: what variable to use as the profiling variable and what variables to include in the profiling model. The more covariates are available, the finer the allocation that can be achieved, but at the cost of collecting the extra variables. In the UI profiling system, states have varied widely in the types and number of additional variables that they have included beyond those suggested in the basic state model. Including additional covariates can change the ordering of individuals on the predicted profiling variable and can also yield better predictions. In a recent study requested by the U.S. DOL, Black et al. (2003) document that improving data quality, including enough conditioning variables and, to some extent, using a continuous profiling variable (fraction of UI benefits exhausted) are key ingredients for improving the predictive performance of UI profiling models.

2.4 Empirical evidence on profiling on program impacts

Dehejia (2000) evaluates four different allocation mechanisms using results from an experimental evaluation of the GAIN (Greater Avenues for Independence) program in Alameda County, California. The treatment group receives the GAIN program, consisting of regular AFDC (Aid to Families with Dependent Children) benefits plus employment and training services (mostly job search assistance and basic education), while the control group receives only the regular AFDC program. Deterministically assigning everyone to AFDC dominates assigning everyone to GAIN if impacts are assumed to be homogeneous, while the reverse is true for heterogeneous program impacts. If impacts are heterogeneous and individuals are profiled into GAIN based on expected impacts, then this allocation mechanism second-order stochastically dominates either of the two deterministic rules.

Lechner and Smith (2003) use Swiss administrative records to assess the performance of caseworkers in assigning unemployed individuals to one of eight possible treatment streams within the Swiss re-employment program. They adapt the popular matching estimator to the framework of multiple treatment streams. Their results indicate that, if caseworkers were to assign individuals to the treatment that would generate the largest

predicted impact, overall program impacts would increase by 14%. Conversely, if individuals were instead assigned to the treatment stream that suited them least, overall program performance would decrease by 15.8%. [Footnote 12: As is common in the literature, general equilibrium effects are ignored.] Their conclusion is that, while caseworkers do not perform very well, they are not harmful either. Caseworkers are not necessarily uninformed or incompetent, but they may face different objectives (they may favor equity concerns over efficiency) or external restrictions such as imposed participation requirements.

One more piece of evidence on profiling based on non-experimental predicted outcomes comes from Frölich (2001), who applies profiling to the choice of treatment for participants in Swedish worker rehabilitation programs. Recipients of temporary disability benefits are profiled into three different treatment streams: no rehabilitation, vocational rehabilitation, and non-vocational rehabilitation. By assigning persons to treatment based on non-experimental estimates of the impact of each treatment (as a function of observable characteristics), it is estimated that the re-employment rate would increase from 46%, its level under current program operation, to 56%. A comparison of the profiling assignment based on estimated impacts with the actual assignments of caseworkers reveals that caseworkers assign only 42% of participants to what is considered optimal under profiling.

To summarize, the literature indicates that profiling on predicted individual program impacts can exploit the heterogeneity in treatment responses across participants and result in a better allocation of participants to the program. In general, profiling on predicted program impacts serves efficiency goals, while profiling on predicted program outcomes serves equity goals, and there is usually a trade-off between the two. Using data from

the National JTPA Study, we add to the literature by examining to what extent and under what circumstances profiling on impacts is more efficient than profiling on outcomes. Moreover, we investigate the relative gains from profiling at different points in the participation decision process: eligibility, application, or assignment. Answers to such questions have substantial policy relevance.

Having good estimates of the predicted profiling variable is essential to obtaining the desired allocation mechanism. Even more important, good estimates of the heterogeneous impact profiling function are needed. Frölich (2001) aims to compute unbiased profiling function estimates by applying semiparametric techniques to non-experimental data. Nevertheless, the literature indicates that, although sample selection bias can be reduced by carefully applying non-experimental estimators, the bias is not completely removed when compared to unbiased experimental estimators. Our results regarding profiling into participation or non-participation will be more reliable, since our impact profiling function relies on experimental estimators.

3 The NJS Data

The data come from the National Job Training Partnership Act (JTPA) Study (NJS). JTPA was a U.S. federal training program targeted at disadvantaged Americans. The program started in 1982 as a substitute for the Comprehensive Employment and Training Act (CETA) and continued to operate until the late 1990s, when it was replaced by the Workforce Investment Act (WIA). JTPA provided disadvantaged youth and adults with training and re-employment services. Prior to random assignment, participants were recommended services that can be grouped into three service streams: classroom training in occupational skills (denoted "CT-OS" in what follows), job search assistance

and subsidized on-the-job training at private firms ("OJT"), and a mix of other services ("OTHER"), including basic education, possibly in combination with some CT-OS or OJT. Classroom training programs in North America tend to have a short average duration (typically just a few months) and aim to prepare their participants for entry-level positions in semi-skilled occupations. Subsidized on-the-job training provides an incentive for private firms to hire and train disadvantaged workers, though whether these workers actually get any more training than other newly hired workers is subject to debate. Job search assistance programs attempt to facilitate the matching process between firms and workers. This category includes services ranging from lectures on resume writing and how to give a good interview to the formal job matching services of the Employment Service. [Footnote 13: A caveat applies to our exercise of profiling into treatment streams. The three treatment streams are a bit artificial relative to how the program actually operates. A real-world profiling system for JTPA should focus on assignment to particular treatments, not groups of treatments or treatment streams.]

The JTPA program was subject to an experimental evaluation, the National JTPA Study (NJS), commissioned by the U.S. Department of Labor in 1986 to measure the impacts and costs of JTPA (e.g. Bloom et al. 1993). The evaluation took place at a non-random sample of 16 of the more than 600 JTPA training centers. Applicants over the period November 1987 to September 1989 were first allocated to a treatment stream based on caseworkers' decisions. Subsequently, the applicant group was split by random assignment into experimental treatment and control groups, in a proportion of two treatments to one control. Follow-up surveys examined the employment and earnings outcomes of the experimental treatment and control groups. We consider the moment of random assignment to the treatment or control group as the time-zero mark. The outcomes of interest are the

sum of earnings over all months up to month 18 after random assignment, and employment at month 18. The analysis is based on forecasting heterogeneous experimental program impacts as a function of personal characteristics. Since the parameters of the program impact function are computed using the randomly assigned experimental treatment and control participants, the results are free of sample selection bias. The outcomes of reference are earnings and employment in the 18 months following random assignment.

In profiling from random assignment and profiling into treatment streams we use information on the experimental treatments and controls available at the 16 non-randomly selected sites that agreed to participate in the NJS. In profiling from application we use additional data on non-experimental applicants (NEAs in what follows). These individuals initially applied for JTPA but dropped out somewhere between the first and second interviews, so they did not participate in the JTPA experiment. One reason (among many) that individuals may appear in the NEA sample but not in the experimental sample is that they may turn out to be ineligible for JTPA; as such, the NEAs are drawn from a different population than the ENPs. Other reasons NEAs may not follow up with participation are that they may not like their caseworker, they may have learned enough to conclude that JTPA services are not for them, or they may have received a job offer or gone to jail. Information on NEAs is available at only 12 of the NJS sites and was collected on a voluntary basis, at different points in time, by caseworkers. Finally, for profiling from eligibility we use additional data on eligible non-participants (ENPs in what follows). Information on ENPs is available at only 4 NJS sites.

The analysis is conducted separately for each of four demographic groups: adult males and adult females (ages 22 and older), and male and female out-of-school youth (ages 16 to 21). Youth were randomly assigned at only 15 of the 16 experimental sites. Information on sample composition by demographic characteristics is available in Table 1.

4 Estimation Methodology

A widely used parameter in the evaluation literature is the treatment on the treated (TT) parameter, the impact of a program on its participants. Treatment on the treated is the parameter of interest in many obvious instances, for example to evaluate how well a program performs for the people it was designed to serve, or to decide whether to continue the program as is or to shut it down. When experimental data are available, as is the case with the NJS, then under assumptions that rule out changes in the normal operation of a program due to experimentation, the simple mean difference between the outcomes of the treatment and control groups gives an unbiased estimate of the treatment on the treated parameter. These assumptions are not likely to hold in the JTPA experiment, due to treatment group dropout and control group substitution. Without accounting for dropout and substitution biases, the experimental estimator is an unbiased estimator of the effect of the intent to treat rather than of the treatment actually received. Although sometimes invoked in the evaluation literature, the common effect assumption, which states that all treatment participants experience the same program gain, does not hold in practice (e.g., Heckman, Smith and Clements 1997). 14

14 Note that evidence of heterogeneity in program impacts is implicit in any study that finds a change in the mean impact associated with profiling.

We

proceed by designing an impact function that predicts individual program gains based on observable individual characteristics. The idea behind the profiling experiment is straightforward: we exploit the random assignment design of the NJS to estimate, for all participants regardless of their assignment status, an unbiased individual impact function conditional on personal characteristics.

4.1 Profiling on largest predicted impacts

Let Y_0 denote the program outcome in the absence of treatment, Y_1 the post-treatment outcome, and let D be an indicator for program participation: D takes the value 1 if the experimental participant was assigned to the treatment group and 0 if s/he was assigned to the control group. Following Heckman, LaLonde and Smith (1999) we define the switching regression:

(1) Y = (1 - D) Y_0 + D Y_1

When D = 1 we observe Y_1; when D = 0 we observe Y_0. The impact of the program on its participants (the treatment-on-the-treated parameter) is Δ = E[Y_1 - Y_0 | X, D = 1].

The two equations describing the outcome in the two states, treatment and non-treatment, can be written in simplified linear form as:

Y_1 = X β_1 + u_1 for participation,
Y_0 = X β_0 + u_0 for non-participation.

Substituting the two outcome equations into (1) gives:

Y = X β_0 + D X (β_1 - β_0) + u_0 + D (u_1 - u_0),

or, denoting γ = β_1 - β_0 and ε = u_1 - u_0:

(2) Y = X β_0 + D X γ + u_0 + D ε

The treatment-on-the-treated impact can therefore be obtained as:

Δ(X) = E[Y_1 - Y_0 | X, D = 1] = E[Y | X, D = 1] - E[Y | X, D = 0]
Δ(X) = X γ + (E[u_0 | X, D = 1] - E[u_0 | X, D = 0]) + E[ε | X, D = 1]

Because of the random assignment design, E[u_0 | X, D = 0] = E[u_0 | X, D = 1], so the second term on the right-hand side disappears. The same is true of the last term:

E[ε | X, D = 1] = E[u_1 - u_0 | X, D = 1] = E[u_1 | X, D = 1] - E[u_0 | X, D = 1] = 0,

and thus Δ(X) = X γ. For each individual, estimating (2) generates a predicted individual treatment response:

(3) Δ_i(X_i) = X_i ˆγ

Averaging all individual predicted impacts produces the desired parameter of interest:

(4) Δ(X) = (1/N) Σ_i Δ_i(X_i) = (1/N) Σ_i X_i ˆγ

The allocation mechanism that profiles on predicted program impacts sorts all participants by the value of the heterogeneous program impact Δ_i(X_i) and then assigns to treatment only those participants with positive predicted impacts (or with predicted impacts above a given threshold value). The profiled impact is obtained by re-averaging Δ_j(X_j) over the profiled individuals j for whom Δ_j(X_j) > 0:

(5) Δ_profiling^bias = E[Δ_j(X_j) | Δ_j(X_j) > 0] = E[X_j ˆγ | X_j ˆγ > 0].
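The estimation in equations (2)-(5) can be sketched in a few lines. The sketch below is illustrative only: it uses simulated data in place of the NJS sample, and all names and parameter values (n, k, beta0, gamma, the standard-normal covariates) are assumptions of the sketch, not the paper's estimates.

```python
import numpy as np

# Illustrative data: random assignment D, heterogeneous impacts X @ gamma.
rng = np.random.default_rng(0)
n, k = 5000, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
D = rng.integers(0, 2, size=n)            # experimental assignment indicator
beta0 = np.array([1.0, 0.5, -0.3])        # non-treatment outcome coefficients
gamma = np.array([0.8, 1.2, -0.6])        # heterogeneous impact coefficients
Y = X @ beta0 + D * (X @ gamma) + rng.normal(size=n)

# Equation (2): regress Y on X and the interaction D*X in one OLS step.
W = np.column_stack([X, D[:, None] * X])
coef, *_ = np.linalg.lstsq(W, Y, rcond=None)
gamma_hat = coef[k:]

# Equation (3): predicted individual impacts; (4): their average;
# (5): the re-averaged impact over the profiled (positive-impact) group.
impacts = X @ gamma_hat
overall_mean = impacts.mean()
profiled_mean = impacts[impacts > 0].mean()
```

Because the profiled group here is selected on the estimated impacts themselves, `profiled_mean` illustrates the in-sample rule of equation (5); the sample-splitting procedure described next is what removes the associated over-fitting bias.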

A caveat applies. As the "bias" superscript indicates, this methodology can be flawed if the sample size is too small. Although E[ε_j | X_j, D_j = 1] = 0, once we condition on X_j ˆγ > 0 it may be the case that E[ε_j | X_j, D_j = 1, X_j ˆγ > 0] ≠ 0 unless the sample size is large enough. Conditioning on X ˆγ > 0 truncates the conditional distribution of ε toward the values of ε most highly correlated with X ˆγ (an "over-fitting" bias). In this situation, averaging over the profiled observations induces an upward bias in the estimated profiling impact.

To avoid the over-fitting bias we implement the following procedure: we randomly split the sample in two, an estimating sample and a validating (or holdout) sample. 15 We use the observations in the estimating sample to generate predicted individual impact functions Δ_iE(X_i) for participants in the validating sample. 16 Sorting individuals based on Δ_iE(X_i) does not induce an inconsistency. We profile into treatment those participants from the uncontaminated validating sample who have positive predicted impacts, as given by the predictions from the estimating sample. Profiling impacts are then re-estimated on the profiled subsample, with the impact coefficient now denoted δ rather than γ:

(6) Y = X α + D X δ + u_0 + D ε

While it is not necessary to re-estimate the impact function (a simple mean difference between the outcomes of the experimental treatments and controls would also give an unbiased profiling impact), the re-estimated impact function can be used to profile eligible

15 In the forecasting literature the estimating sample is also referred to as the training sample, and the validating sample as the testing or holdout sample.
16 The E subscript indicates that the impacts are predicted from the estimating sample.

individuals who have never taken part in the JTPA experiment. 17 The unbiased profiled treatment impact is given by:

(7) Δ_profiling(X) = (1/K) Σ_k Δ_k(X_k) = (1/K) Σ_k X_k ˆδ

where k indexes the individuals in the validating sample for whom a positive predicted impact was projected from the estimating sample, Δ_kE(X_k) > 0. We replicate the random split into estimating and validating samples 500 times and report results averaged over the 500 repetitions.

17 Actually, we report both profiling impacts: one given by re-estimating the profiling function in the validating sample, the other as the mean difference between profiled treatment and control outcomes in the validating sample.

4.2 Profiling specifications

One possible concern with estimating the profiling impacts as described above is that, in order to achieve a fine enough allocation mechanism, the dimension of the X vector has to be very large. In turn, the precision of the impact function decreases as the size of X grows. A compromise is to incorporate the covariates X into the profiling function as a scalar linear combination of all X. We choose the propensity score, the probability that an individual participates in training conditional on X, as that scalar measure. Thus we implement a second allocation whereby the vector X is first converted into an index P(X), the probability of selecting into participation conditional on observed characteristics (that is, the propensity score). In this case, the equations that generate the profiling function Δ(P(X)) are:

Y = α + β P(X) + δ D + γ P(X) D

Δ_i = ˆδ + ˆγ ˆP(X_i) and Δ(P(X)) = ˆδ + ˆγ (1/N) Σ_i ˆP(X_i)

where P is the propensity score, P(X) = Pr(R = 1 | X), with R denoting selection into treatment: R = 1 for individuals who decide to participate in JTPA. Because of the experimental design of the NJS sample, both the experimental treatment (D = 1) and experimental control (D = 0) groups have R = 1. To estimate the probability of participation in JTPA we need a larger sample that also includes individuals who were eligible to participate but decided not to (eligible non-participants, or ENPs), for whom R = 0. Data on all experimentals, treatments and controls, are used for R = 1; for R = 0 we use all ENPs for whom data are available (collected at 4 of the 16 NJS sites). The propensity scores P come from a weighted logit, where population weights account for the fact that the ENPs are a much larger fraction of the eligible population (97%) than of our sample. Propensity scores are computed only once, before starting the 500 iterations on random samples. (Sensitivity results show that computing propensity scores separately for the estimating and validating samples within each repetition does not change the results.)

The two specifications described so far, one where the vector of covariates X enters linearly and one where it enters in its scalar form P(X), are described as specifications 1 and 2 in Table 2. To account for non-linearities in X and P(X) we implement hybrid specifications in which the covariates X still enter in index fashion, only instead of the propensity score itself we use quantiles of its distribution. These are specifications 3 and 4 and some of their variants described in Table 2. The main difference between specifications 3 and 4 is

that for the former the quantiles are computed from the distribution of propensity scores in the experimental sample, while for the latter they are computed from the distribution of propensity scores in the eligible population (including the ENPs). Specification 4 allows for less variation in the quantiles of the experimental treatments and controls, since the propensity scores for the experimental group tend to cluster in the upper tail of the population propensity score distribution. To allow for more variation in the quantiles of the experimental treatment and control groups, specification 4 therefore uses more quantiles (20) than specification 3, which uses only 10. We also implement a variant of specification 3, denoted 3*, in which 20 quantiles of the distribution of propensity scores in the experimental population are used in the profiling estimation. Because specifications 3, 3* and 4 combine the virtues of a parsimonious specification (smaller over-fitting bias and better out-of-sample forecasts) with allowance for non-linearities in the propensity score, we expect these specifications to give the best profiling results.

Table 2 mentions three more sets of profiling specifications: 2b-4b, 2c-4c, and 2d and 4d. These specifications are used at the stage of profiling from application or profiling from eligibility, exercises described further on. The main difference among them comes from different ways of including another segment of the eligible population, the non-experimental applicants (NEAs), in the propensity score analysis.

Since the conditioning variables X determine the impact function Δ(X) or Δ(P(X)), the choice of X is of crucial relevance for the performance of the profiling method. Faced with a trade-off between precision in the allocation mechanism and specification concerns, we selected the subset of covariates that best predicts the outcome variable using an automated backward-forward stepwise routine. For each of the four demographic groups,

these covariates are listed in Table 3. The pool of X variables from which the stepwise procedure chooses is guided by what the program evaluation literature has taught us about participation and outcomes. The backward selection routine removes from the pool of available variables, one by one, the variables that contribute least to R-squared, until the backward selection threshold (P = 0.20) is reached. Next, the forward selection routine adds to the variables already included in the model those with the highest partial correlations with the dependent variable, as long as the forward selection threshold (P = 0.15) has not been reached. In general, the forward selection routine rarely adds any variable to those selected by the backward stepwise routine. Similar results are obtained whether or not the pool of available variables is interacted with the treatment participation dummy. The automated procedure is quite robust to small variations in the selection thresholds.

The difference between specifications Stepwise I and Stepwise II is that the former considers variables grouped when they belong to the same category (for instance, all education variables are considered jointly by the stepwise routine), while the latter allows them to be split (the stepwise routine may pick only selected education categories). We also provide different sets of covariates X used in the sensitivity analysis. Finally, when implementing the specifications that use the propensity score we did not have as much latitude in choosing the covariates X, because we were limited by the availability of data for the ENPs. The same covariates were used for all four demographic groups; they are listed at the end of Table 3.
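The backward step of such a routine can be sketched as a simple p-value screen. This is a rough illustration only, on simulated data: the 0.20 removal threshold mirrors the text, but the variable names, the OLS-based p-values, and the normal approximation are assumptions of the sketch, not the authors' actual routine.

```python
import numpy as np
from math import erf, sqrt

# Simulated data: only columns 0 and 1 actually predict y; columns 2-5 are noise.
rng = np.random.default_rng(4)
n = 500
X = rng.normal(size=(n, 6))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

def p_values(cols):
    """Two-sided p-values (normal approximation) for the OLS slopes on X[:, cols]."""
    Xd = np.column_stack([np.ones(n), X[:, cols]])
    coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ coef
    s2 = resid @ resid / (n - Xd.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(Xd.T @ Xd)))
    t = coef / se
    return np.array([2 * (1 - 0.5 * (1 + erf(abs(ti) / sqrt(2)))) for ti in t[1:]])

# Backward elimination: drop the least significant variable, one at a time,
# until every remaining p-value is at or below the 0.20 removal threshold.
keep = list(range(6))
while True:
    p = p_values(keep)
    worst = int(np.argmax(p))
    if p[worst] <= 0.20:
        break
    keep.pop(worst)
```

A forward step would then re-screen the dropped variables by partial correlation against a 0.15 entry threshold; as the text notes, in practice it rarely adds anything back.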

4.3 Profiling on small Y_0

A different allocation mechanism, based on notions of equity rather than efficiency, would assign people to treatment not based on their predicted gains from the program (an efficiency goal) but based on the low outcomes they would experience if not enrolled in the program (an equity goal). We implement an exercise that evaluates the impact gains for the program if individuals were selected for participation not on predicted program impacts but on low values of the outcome in the absence of treatment, Y_0. The methodology is similar to profiling on predicted impacts, only instead of assigning to treatment all individuals with positive predicted program impacts, we assign them in ascending order of their predicted non-participation outcome Y_0. In this situation no endogenous threshold emerges, so we must decide where to impose the cut-off for the profiled sample. We choose two thirds of the sample, but we also perform the analysis with half of the sample assigned to treatment, as well as with one third.

4.4 Profiling at positive sites

As an extension to individual profiling, we combine profiling on the largest predicted individual impacts with profiling of the sites participating in the program, eliminating from the analysis the sites with negative mean impacts. The profiling impacts are generated using the same algorithm as in the base profiling case, with the only difference that the set of sites with negative mean impacts is first identified from separate site regressions and eliminated from the analysis. Computing the experimental impact at positive mean


More information

Introduction to Observational Studies. Jane Pinelis

Introduction to Observational Studies. Jane Pinelis Introduction to Observational Studies Jane Pinelis 22 March 2018 Outline Motivating example Observational studies vs. randomized experiments Observational studies: basics Some adjustment strategies Matching

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n. University of Groningen Latent instrumental variables Ebbes, P. IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document

More information

MEA DISCUSSION PAPERS

MEA DISCUSSION PAPERS Inference Problems under a Special Form of Heteroskedasticity Helmut Farbmacher, Heinrich Kögel 03-2015 MEA DISCUSSION PAPERS mea Amalienstr. 33_D-80799 Munich_Phone+49 89 38602-355_Fax +49 89 38602-390_www.mea.mpisoc.mpg.de

More information

Why randomize? Rohini Pande Harvard University and J-PAL.

Why randomize? Rohini Pande Harvard University and J-PAL. Why randomize? Rohini Pande Harvard University and J-PAL www.povertyactionlab.org Agenda I. Background on Program Evaluation II. What is a randomized experiment? III. Advantages and Limitations of Experiments

More information

6. Unusual and Influential Data

6. Unusual and Influential Data Sociology 740 John ox Lecture Notes 6. Unusual and Influential Data Copyright 2014 by John ox Unusual and Influential Data 1 1. Introduction I Linear statistical models make strong assumptions about the

More information

Helmut Farbmacher: Copayments for doctor visits in Germany and the probability of visiting a physician - Evidence from a natural experiment

Helmut Farbmacher: Copayments for doctor visits in Germany and the probability of visiting a physician - Evidence from a natural experiment Helmut Farbmacher: Copayments for doctor visits in Germany and the probability of visiting a physician - Evidence from a natural experiment Munich Discussion Paper No. 2009-10 Department of Economics University

More information

Recent advances in non-experimental comparison group designs

Recent advances in non-experimental comparison group designs Recent advances in non-experimental comparison group designs Elizabeth Stuart Johns Hopkins Bloomberg School of Public Health Department of Mental Health Department of Biostatistics Department of Health

More information

Research to Practice. What Are the Trends in Employment Outcomes of Youth with Autism: ? Alberto Migliore, John Butterworth, & Agnes Zalewska

Research to Practice. What Are the Trends in Employment Outcomes of Youth with Autism: ? Alberto Migliore, John Butterworth, & Agnes Zalewska Research to Practice Issue No. 53 2012 What Are the Trends in Employment Outcomes of Youth with Autism: 2006 2010? Alberto Migliore, John Butterworth, & Agnes Zalewska Introduction In recent years, the

More information

Fit to play but goalless: Labour market outcomes in a cohort of public sector ART patients in Free State province, South Africa

Fit to play but goalless: Labour market outcomes in a cohort of public sector ART patients in Free State province, South Africa Fit to play but goalless: Labour market outcomes in a cohort of public sector ART patients in Free State province, South Africa Frikkie Booysen Department of Economics / Centre for Health Systems Research

More information

Volume 36, Issue 3. David M McEvoy Appalachian State University

Volume 36, Issue 3. David M McEvoy Appalachian State University Volume 36, Issue 3 Loss Aversion and Student Achievement David M McEvoy Appalachian State University Abstract We conduct a field experiment to test if loss aversion behavior can be exploited to improve

More information

Section 4.1. Chapter 4. Classification into Groups: Discriminant Analysis. Introduction: Canonical Discriminant Analysis.

Section 4.1. Chapter 4. Classification into Groups: Discriminant Analysis. Introduction: Canonical Discriminant Analysis. Chapter 4 Classification into Groups: Discriminant Analysis Section 4.1 Introduction: Canonical Discriminant Analysis Understand the goals of discriminant Identify similarities between discriminant analysis

More information

Key questions when starting an econometric project (Angrist & Pischke, 2009):

Key questions when starting an econometric project (Angrist & Pischke, 2009): Econometric & other impact assessment approaches to policy analysis Part 1 1 The problem of causality in policy analysis Internal vs. external validity Key questions when starting an econometric project

More information

Econometric analysis and counterfactual studies in the context of IA practices

Econometric analysis and counterfactual studies in the context of IA practices Econometric analysis and counterfactual studies in the context of IA practices Giulia Santangelo http://crie.jrc.ec.europa.eu Centre for Research on Impact Evaluation DG EMPL - DG JRC CRIE Centre for Research

More information

Working When No One Is Watching: Motivation, Test Scores, and Economic Success

Working When No One Is Watching: Motivation, Test Scores, and Economic Success Working When No One Is Watching: Motivation, Test Scores, and Economic Success Carmit Segal Department of Economics, University of Zurich, Zurich 8006, Switzerland. carmit.segal@econ.uzh.ch This paper

More information

Assignment 4: True or Quasi-Experiment

Assignment 4: True or Quasi-Experiment Assignment 4: True or Quasi-Experiment Objectives: After completing this assignment, you will be able to Evaluate when you must use an experiment to answer a research question Develop statistical hypotheses

More information

Sawtooth Software. The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? RESEARCH PAPER SERIES

Sawtooth Software. The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? RESEARCH PAPER SERIES Sawtooth Software RESEARCH PAPER SERIES The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? Dick Wittink, Yale University Joel Huber, Duke University Peter Zandan,

More information

Keywords: causal channels, causal mechanisms, mediation analysis, direct and indirect effects. Indirect causal channel

Keywords: causal channels, causal mechanisms, mediation analysis, direct and indirect effects. Indirect causal channel Martin Huber University of Fribourg, Switzerland Disentangling policy effects into causal channels Splitting a policy intervention s effect into its causal channels can improve the quality of policy analysis

More information

Causality and Statistical Learning

Causality and Statistical Learning Department of Statistics and Department of Political Science, Columbia University 29 Sept 2012 1. Different questions, different approaches Forward causal inference: What might happen if we do X? Effects

More information

Placebo and Belief Effects: Optimal Design for Randomized Trials

Placebo and Belief Effects: Optimal Design for Randomized Trials Placebo and Belief Effects: Optimal Design for Randomized Trials Scott Ogawa & Ken Onishi 2 Department of Economics Northwestern University Abstract The mere possibility of receiving a placebo during a

More information

Identifying Endogenous Peer Effects in the Spread of Obesity. Abstract

Identifying Endogenous Peer Effects in the Spread of Obesity. Abstract Identifying Endogenous Peer Effects in the Spread of Obesity Timothy J. Halliday 1 Sally Kwak 2 University of Hawaii- Manoa October 2007 Abstract Recent research in the New England Journal of Medicine

More information

A NON-TECHNICAL INTRODUCTION TO REGRESSIONS. David Romer. University of California, Berkeley. January Copyright 2018 by David Romer

A NON-TECHNICAL INTRODUCTION TO REGRESSIONS. David Romer. University of California, Berkeley. January Copyright 2018 by David Romer A NON-TECHNICAL INTRODUCTION TO REGRESSIONS David Romer University of California, Berkeley January 2018 Copyright 2018 by David Romer CONTENTS Preface ii I Introduction 1 II Ordinary Least Squares Regression

More information

Cancer survivorship and labor market attachments: Evidence from MEPS data

Cancer survivorship and labor market attachments: Evidence from MEPS data Cancer survivorship and labor market attachments: Evidence from 2008-2014 MEPS data University of Memphis, Department of Economics January 7, 2018 Presentation outline Motivation and previous literature

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write

More information

Practitioner s Guide To Stratified Random Sampling: Part 1

Practitioner s Guide To Stratified Random Sampling: Part 1 Practitioner s Guide To Stratified Random Sampling: Part 1 By Brian Kriegler November 30, 2018, 3:53 PM EST This is the first of two articles on stratified random sampling. In the first article, I discuss

More information

Causality and Statistical Learning

Causality and Statistical Learning Department of Statistics and Department of Political Science, Columbia University 27 Mar 2013 1. Different questions, different approaches Forward causal inference: What might happen if we do X? Effects

More information

SAMPLING AND SAMPLE SIZE

SAMPLING AND SAMPLE SIZE SAMPLING AND SAMPLE SIZE Andrew Zeitlin Georgetown University and IGC Rwanda With slides from Ben Olken and the World Bank s Development Impact Evaluation Initiative 2 Review We want to learn how a program

More information

Which Comparison-Group ( Quasi-Experimental ) Study Designs Are Most Likely to Produce Valid Estimates of a Program s Impact?:

Which Comparison-Group ( Quasi-Experimental ) Study Designs Are Most Likely to Produce Valid Estimates of a Program s Impact?: Which Comparison-Group ( Quasi-Experimental ) Study Designs Are Most Likely to Produce Valid Estimates of a Program s Impact?: A Brief Overview and Sample Review Form February 2012 This publication was

More information

Using ASPES (Analysis of Symmetrically- Predicted Endogenous Subgroups) to understand variation in program impacts. Laura R. Peck.

Using ASPES (Analysis of Symmetrically- Predicted Endogenous Subgroups) to understand variation in program impacts. Laura R. Peck. Using ASPES (Analysis of Symmetrically- Predicted Endogenous Subgroups) to understand variation in program impacts Presented by: Laura R. Peck OPRE Methods Meeting on What Works Washington, DC September

More information

Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha

Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha attrition: When data are missing because we are unable to measure the outcomes of some of the

More information

PLANNING THE RESEARCH PROJECT

PLANNING THE RESEARCH PROJECT Van Der Velde / Guide to Business Research Methods First Proof 6.11.2003 4:53pm page 1 Part I PLANNING THE RESEARCH PROJECT Van Der Velde / Guide to Business Research Methods First Proof 6.11.2003 4:53pm

More information

Randomized Evaluations

Randomized Evaluations Randomized Evaluations Introduction, Methodology, & Basic Econometrics using Mexico s Progresa program as a case study (with thanks to Clair Null, author of 2008 Notes) Sept. 15, 2009 Not All Correlations

More information

Do children in private Schools learn more than in public Schools? Evidence from Mexico

Do children in private Schools learn more than in public Schools? Evidence from Mexico MPRA Munich Personal RePEc Archive Do children in private Schools learn more than in public Schools? Evidence from Mexico Florian Wendelspiess Chávez Juárez University of Geneva, Department of Economics

More information

The Regression-Discontinuity Design

The Regression-Discontinuity Design Page 1 of 10 Home» Design» Quasi-Experimental Design» The Regression-Discontinuity Design The regression-discontinuity design. What a terrible name! In everyday language both parts of the term have connotations

More information

Methodology for Non-Randomized Clinical Trials: Propensity Score Analysis Dan Conroy, Ph.D., inventiv Health, Burlington, MA

Methodology for Non-Randomized Clinical Trials: Propensity Score Analysis Dan Conroy, Ph.D., inventiv Health, Burlington, MA PharmaSUG 2014 - Paper SP08 Methodology for Non-Randomized Clinical Trials: Propensity Score Analysis Dan Conroy, Ph.D., inventiv Health, Burlington, MA ABSTRACT Randomized clinical trials serve as the

More information

Ec331: Research in Applied Economics Spring term, Panel Data: brief outlines

Ec331: Research in Applied Economics Spring term, Panel Data: brief outlines Ec331: Research in Applied Economics Spring term, 2014 Panel Data: brief outlines Remaining structure Final Presentations (5%) Fridays, 9-10 in H3.45. 15 mins, 8 slides maximum Wk.6 Labour Supply - Wilfred

More information

Module 14: Missing Data Concepts

Module 14: Missing Data Concepts Module 14: Missing Data Concepts Jonathan Bartlett & James Carpenter London School of Hygiene & Tropical Medicine Supported by ESRC grant RES 189-25-0103 and MRC grant G0900724 Pre-requisites Module 3

More information

SRDC Technical Paper Series How Random Must Random Assignment Be in Random Assignment Experiments?

SRDC Technical Paper Series How Random Must Random Assignment Be in Random Assignment Experiments? SRDC Technical Paper Series 03-01 How Random Must Random Assignment Be in Random Assignment Experiments? Paul Gustafson Department of Statistics University of British Columbia February 2003 SOCIAL RESEARCH

More information

The Logic of Causal Order Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 15, 2015

The Logic of Causal Order Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 15, 2015 The Logic of Causal Order Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 15, 2015 [NOTE: Toolbook files will be used when presenting this material] First,

More information

From Field Experiments to Program Implementation: Assessing the Potential Outcomes of an Experimental Intervention Program for Unemployed Persons 1

From Field Experiments to Program Implementation: Assessing the Potential Outcomes of an Experimental Intervention Program for Unemployed Persons 1 American Journal of Community Psychology, Vol. 19, No. 4, 1991 From Field Experiments to Program Implementation: Assessing the Potential Outcomes of an Experimental Intervention Program for Unemployed

More information

Measuring Impact. Program and Policy Evaluation with Observational Data. Daniel L. Millimet. Southern Methodist University.

Measuring Impact. Program and Policy Evaluation with Observational Data. Daniel L. Millimet. Southern Methodist University. Measuring mpact Program and Policy Evaluation with Observational Data Daniel L. Millimet Southern Methodist University 23 May 2013 DL Millimet (SMU) Observational Data May 2013 1 / 23 ntroduction Measuring

More information

CHAPTER 2: TWO-VARIABLE REGRESSION ANALYSIS: SOME BASIC IDEAS

CHAPTER 2: TWO-VARIABLE REGRESSION ANALYSIS: SOME BASIC IDEAS CHAPTER 2: TWO-VARIABLE REGRESSION ANALYSIS: SOME BASIC IDEAS 2.1 It tells how the mean or average response of the sub-populations of Y varies with the fixed values of the explanatory variable (s). 2.2

More information

MARK SCHEME for the May/June 2011 question paper for the guidance of teachers 9699 SOCIOLOGY

MARK SCHEME for the May/June 2011 question paper for the guidance of teachers 9699 SOCIOLOGY UNIVERSITY OF CAMBRIDGE INTERNATIONAL EXAMINATIONS GCE Advanced Subsidiary Level and GCE Advanced Level MARK SCHEME for the May/June 2011 question paper for the guidance of teachers 9699 SOCIOLOGY 9699/23

More information

Constructing AFQT Scores that are Comparable Across the NLSY79 and the NLSY97. Joseph G. Altonji Prashant Bharadwaj Fabian Lange.

Constructing AFQT Scores that are Comparable Across the NLSY79 and the NLSY97. Joseph G. Altonji Prashant Bharadwaj Fabian Lange. Constructing AFQT Scores that are Comparable Across the NLSY79 and the NLSY97 Introduction Joseph G. Altonji Prashant Bharadwaj Fabian Lange August 2009 Social and behavioral scientists routinely use and

More information

Evaluating the Regression Discontinuity Design Using Experimental Data

Evaluating the Regression Discontinuity Design Using Experimental Data Evaluating the Regression Discontinuity Design Using Experimental Data Dan Black University of Chicago and NORC danblack@uchicago.edu Jose Galdo McMaster University and IZA galdojo@mcmaster.ca Jeffrey

More information

Lecture 4: Research Approaches

Lecture 4: Research Approaches Lecture 4: Research Approaches Lecture Objectives Theories in research Research design approaches ú Experimental vs. non-experimental ú Cross-sectional and longitudinal ú Descriptive approaches How to

More information

UN Handbook Ch. 7 'Managing sources of non-sampling error': recommendations on response rates

UN Handbook Ch. 7 'Managing sources of non-sampling error': recommendations on response rates JOINT EU/OECD WORKSHOP ON RECENT DEVELOPMENTS IN BUSINESS AND CONSUMER SURVEYS Methodological session II: Task Force & UN Handbook on conduct of surveys response rates, weighting and accuracy UN Handbook

More information

Trends in Ohioans Health Status and Income

Trends in Ohioans Health Status and Income October 200 Trends in Ohioans Health Status and Income Since 2005, household incomes in Ohio have steadily declined. In 2005, 65% of Ohio adults were living in households with an annual income over 200%

More information

Meta-Analysis. Zifei Liu. Biological and Agricultural Engineering

Meta-Analysis. Zifei Liu. Biological and Agricultural Engineering Meta-Analysis Zifei Liu What is a meta-analysis; why perform a metaanalysis? How a meta-analysis work some basic concepts and principles Steps of Meta-analysis Cautions on meta-analysis 2 What is Meta-analysis

More information

Lecture II: Difference in Difference. Causality is difficult to Show from cross

Lecture II: Difference in Difference. Causality is difficult to Show from cross Review Lecture II: Regression Discontinuity and Difference in Difference From Lecture I Causality is difficult to Show from cross sectional observational studies What caused what? X caused Y, Y caused

More information