Propensity Score. Overview:

Candace Quinn

1 Propensity Score
Overview:
What do we use a propensity score for?
How do we construct the propensity score?
How do we implement propensity score estimation in Stata?

2 Joke (kind of)
Two heart surgeons (Jack and Jill) walk into a bar.
Jack: I just finished my 100th heart surgery!
Jill: I finished my 100th heart surgery last week. Which probably means I'm a better heart surgeon. How many of your patients died within 3 months of surgery? I've only had 10 die.
Jack: Five. So I'm probably the better surgeon.
Jill: Or maybe mine are older and have a higher risk than your patients.
There may be differences in patient characteristics between Jack and Jill.
We want to isolate the difference due to the treatment (being operated on by Jill rather than Jack).
We want to compare apples to apples, not apples to oranges.

3 Purpose of propensity scores
They can produce apples-to-apples comparisons when treatment is non-random (non-ignorable treatment assignment).
They provide a way to summarize covariate information about treatment selection into a single number (a scalar).
They can be used to adjust for differences via study design (matching) or during estimation of the treatment effect (e.g., subclassification or regression).

4 Propensity score estimation: some caveats
This is only relevant for selection on observables.
If you cannot write down a conditioning strategy such that conditioning on X will satisfy the backdoor criterion, then this is not the research design to choose.
You need to identify the confounders, X, that will block all back doors based on economic theory, and you will need data on them.

5 Better example: a case in which the propensity score is useful for causal inference
Suppose that we are interested in whether a scholarship program caused children to spend more years in high school (grades 9-12).
Suppose every 8th grade graduate is eligible for this program.
You have data on every child, including test scores, family income, age, gender, etc.
Scholarships are awarded based on some combination of test scores, family income, gender, etc., but you don't know the exact formula.

6 Motivation (cont.)
Ignorable treatment assignment: scholarships are assigned to students randomly, independent of how a student is expected to perform in high school.
Calculate the ATE by estimating the simple difference in mean outcomes:
$\frac{1}{N_1}\sum (Y_i \mid D_i = 1) - \frac{1}{N_0}\sum (Y_i \mid D_i = 0)$
where $N_1$ and $N_0$ are the numbers of treated and control students.
But what if ignorability is violated? For instance, assume you know that children with higher test scores are more likely to get the scholarship (positive selection), but you don't know how important this and other factors are. You just know that the decision is based on information you have (X) and some randomness. What can you do with this information?
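The simple difference in means from the slide above can be sketched in a few lines of Python. This is a minimal illustration, not the deck's own code: the function name `difference_in_means` and the six-student dataset are hypothetical, made up only to show the arithmetic.

```python
from statistics import mean


def difference_in_means(outcomes, treated):
    """Mean outcome of treated units (D = 1) minus mean outcome of controls (D = 0)."""
    y1 = [y for y, d in zip(outcomes, treated) if d == 1]
    y0 = [y for y, d in zip(outcomes, treated) if d == 0]
    if not y1 or not y0:
        raise ValueError("need at least one treated and one control unit")
    return mean(y1) - mean(y0)


# Hypothetical data: years of high school completed and scholarship status.
years = [4, 3, 4, 2, 3, 1]
scholarship = [1, 1, 1, 0, 0, 0]
naive_estimate = difference_in_means(years, scholarship)
print(round(naive_estimate, 3))
```

Under random assignment this number estimates the ATE. Under positive selection on test scores it mixes the scholarship effect with the pre-existing advantage of the students who received it.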

7 Motivation (cont.)
In principle, you could estimate it using OLS controlling for X, where X is a matrix of covariates that you think affect the probability of receiving a scholarship.
OLS consistently estimates the conditional mean, but if the probability of getting a scholarship is not a linear function of X, this conditional mean estimate may not be informative.
Usually, we won't know how the selection depended on X, only that it did. For instance, they may use discrete cutoffs rather than a linear function.

8 Motivation (cont.)
Suppose your variables are not continuous but categorical (with somewhat arbitrary categories), e.g., family income above or below $50 per week, scores above or below the mean, sex, age, etc.
Now, you could put in dummy variables for each category and interactions between all dummies. This would distinguish every group formed by the categories.
Or you could run separate regressions for each group. This is more flexible since it allows the effect of the scholarship to differ by group.
These methods are in principle correct, but they are only feasible if you have a lot of data and few categories.
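To see why the fully interacted approach needs a lot of data, it helps to count cells: k binary covariates create 2^k groups, and a group is only usable if it contains both treated and control units. The sketch below illustrates that count. The helper names (`cell_counts`, `usable_cells`) and the six-unit dataset are hypothetical.

```python
from itertools import product


def cell_counts(rows, n_covariates):
    """Count units in every cell formed by fully crossing binary covariates.

    rows: list of (covariate_tuple, treatment_indicator) pairs.
    Returns {cell: [n_control, n_treated]} for all 2**n_covariates cells.
    """
    counts = {cell: [0, 0] for cell in product((0, 1), repeat=n_covariates)}
    for x, d in rows:
        counts[tuple(x)][d] += 1
    return counts


def usable_cells(counts):
    """Cells with at least one control and one treated unit."""
    return [cell for cell, (n0, n1) in counts.items() if n0 > 0 and n1 > 0]


# Hypothetical data: (high income, high score, female) and scholarship status.
rows = [
    ((1, 0, 1), 1), ((1, 0, 1), 0), ((0, 0, 0), 0),
    ((1, 1, 1), 1), ((0, 1, 0), 0), ((0, 0, 0), 1),
]
counts = cell_counts(rows, 3)
print(len(counts), len(usable_cells(counts)))  # prints: 8 2
```

With only three binary covariates and six units, six of the eight cells already allow no within-cell comparison. Adding covariates doubles the number of cells each time.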

9 Constructing the Propensity Score
Estimation of average treatment effects based on the propensity score can handle sparseness and ignorance about the functional form associated with treatment assignment.
You first need selection into the treatment (in our case the scholarship) to be based on observables (selection on observables).
The following gives a brief overview of how the propensity score is constructed. In practice, you can download a canned Stata command that will do all of this for you.

10 Definition and General Idea
Definition: The propensity score is the conditional probability of being assigned to the treatment group (e.g., the grade 9-12 scholarship), conditional on the particular covariates (X).
$\Pr(D = 1 \mid X)$ is some probability (e.g., 55%).
The idea is to compare units who, based solely on their observables, had very similar probabilities of being placed into treatment.
If, conditional on X, two units have a similar probability of treatment, then we say they have similar propensity scores.
We then attribute all of the difference in the outcome variable to the treatment.
If we compare a unit in the treatment group to a control group unit with a similar propensity score, then, conditional on the propensity score, all remaining variation between the two is random, provided selection is on observables.

11 First stage
Estimation using this method is a two-stage procedure.
First stage: estimate the propensity score.
Second stage: calculate the average causal effect of interest by averaging differences in outcomes over units with similar propensity scores.
First stage: estimate the propensity score.
First, estimate the following equation with binary treatment (D) on the LHS and the covariates (X) that determine selection into treatment on the RHS, using a logit or probit model:
$\Pr(D = 1 \mid X) = \beta X + \varepsilon$
Second, using the estimated coefficients, calculate the predicted LHS:
$\hat{p}_i = \hat{\beta} X_i$
The propensity score is just the predicted conditional probability of treatment (using the estimated coefficients on X) for each unit. With a logit or probit, the fitted index $\hat{\beta} X_i$ is passed through the model's link function to obtain that probability.
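The first stage can be sketched without any statistics package. The example below fits a logit with an intercept and a single covariate by Newton's method and then computes each unit's predicted probability of treatment. It is a simplified stand-in for a canned logit routine, under stated assumptions: one covariate only, a fixed number of iterations, and hypothetical names (`fit_logit`, `propensity_scores`) and data.

```python
from math import exp


def fit_logit(x, d, iterations=25):
    """Maximum-likelihood logit of d on an intercept and x, via Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(di - pi for di, pi in zip(d, p))
        g1 = sum((di - pi) * xi for di, pi, xi in zip(d, p, x))
        # Negative Hessian (a 2x2 matrix), inverted by hand.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def propensity_scores(x, b0, b1):
    """Predicted probability of treatment for each unit."""
    return [1.0 / (1.0 + exp(-(b0 + b1 * xi))) for xi in x]


# Hypothetical data: standardized test score and scholarship status.
test_score = [-2, -1, -1, 0, 0, 1, 1, 2]
scholarship = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logit(test_score, scholarship)
scores = propensity_scores(test_score, b0, b1)
```

The data are deliberately not perfectly separated by the test score. With perfect separation the logit coefficients diverge and the scores pile up at 0 and 1, which is the overlap failure the later slides warn about.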

12 Algorithm
1) Sort your data by the propensity score and divide it into blocks (groups) of observations with similar propensity scores.
2) Within each block, test (using a t-test) whether the means of the covariates are equal in the treatment and control groups. If so, stop: you're done with the first stage.
3) If a particular block has one or more unbalanced covariates, divide that block into finer blocks and re-evaluate.
4) If a particular covariate is unbalanced for multiple blocks, modify the initial logit or probit equation by including higher order terms and/or interactions with that covariate and start again.

13 Second Stage
In the second stage, we look at the effect of treatment on the outcome (in our example, the effect of getting the scholarship on years of schooling), using the propensity score.
Once you have determined your propensity score with the procedure above, there are several ways to use it. I'll present two of them (there is a canned version in Stata for both).
Stratifying on the propensity score:
Divide the data into blocks based on the propensity score (blocks are determined with the algorithm).
Run the second stage regression within each block.
Calculate the weighted mean of the within-block estimates to get the average treatment effect.
Matching on the propensity score:
Match each treatment observation with one or more control observations, based on similar propensity scores.
You then include a dummy for each matched group in the regression, which controls for everything that is common within that group.
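The stratification estimator can be sketched as follows. Two simplifications are assumptions of this example rather than statements from the slides: the within-block "regression" is just a treated-minus-control difference in means, and blocks are weighted by their total number of units (weighting by the number of treated units instead would target the effect on the treated). The function name and data are hypothetical.

```python
from statistics import mean


def stratified_ate(outcomes, treated, blocks):
    """Weighted mean of within-block treated-minus-control differences in means.

    Blocks are weighted by their number of units. Blocks lacking either
    treated or control units allow no comparison and are skipped.
    """
    weighted_sum = 0.0
    n_used = 0
    for b in sorted(set(blocks)):
        y1 = [y for y, d, k in zip(outcomes, treated, blocks) if k == b and d == 1]
        y0 = [y for y, d, k in zip(outcomes, treated, blocks) if k == b and d == 0]
        if not y1 or not y0:
            continue
        n_b = len(y1) + len(y0)
        weighted_sum += n_b * (mean(y1) - mean(y0))
        n_used += n_b
    if n_used == 0:
        raise ValueError("no block contains both treated and control units")
    return weighted_sum / n_used


# Hypothetical data: two propensity-score blocks, "A" and "B".
outcome = [10, 12, 8, 20, 15, 17, 16]
treat = [1, 1, 0, 1, 0, 0, 0]
block = ["A", "A", "A", "B", "B", "B", "B"]
ate = stratified_ate(outcome, treat, block)
```

Here block A gives a difference of 3 with 3 units and block B a difference of 4 with 4 units, so the estimate is (3*3 + 4*4)/7 = 25/7.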

14 Balancing within blocks
1. Sort the data by the propensity score.
2. Divide the data into groups called blocks that have similar propensity scores (e.g., up to 0.10, 0.10 to 0.20, etc.).
3. For each block, test whether the means of the covariates are equal for treatment and control using a t-test.
   a. If they are, you are done with the first stage.
4. If a particular block has one or more unbalanced covariates (X), divide that block into finer blocks and re-evaluate.
5. If a particular covariate is unbalanced for multiple blocks, modify the initial logit or probit equation by including higher order terms and/or interactions with that covariate and start again.
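Step 3 of the balancing procedure can be sketched as a within-block two-sample t-test on one covariate. The example rests on several assumptions that are not from the slides: it uses the unequal-variance (Welch) t statistic, it flags a block when |t| exceeds 1.96 (a large-sample approximation to the 5% critical value, rough for small blocks), and it skips blocks with fewer than two treated or two control units. All names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Two-sample t statistic allowing unequal variances (Welch)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0.0:
        return 0.0 if diff == 0.0 else float("inf")
    return diff / se


def unbalanced_blocks(scores, treated, covariate, edges, threshold=1.96):
    """Return the score blocks [lo, hi) in which the covariate is unbalanced."""
    flagged = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = [i for i, s in enumerate(scores) if lo <= s < hi]
        x1 = [covariate[i] for i in idx if treated[i] == 1]
        x0 = [covariate[i] for i in idx if treated[i] == 0]
        if len(x1) < 2 or len(x0) < 2:
            continue  # too few units to test in this block
        if abs(t_statistic(x1, x0)) > threshold:
            flagged.append((lo, hi))
    return flagged


# Hypothetical data: the covariate is balanced in the low-score block only.
scores = [0.20, 0.30, 0.40, 0.25, 0.35, 0.45, 0.60, 0.70, 0.80, 0.65, 0.75, 0.85]
treated = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]
covariate = [1.0, 1.2, 0.9, 1.1, 1.0, 1.05, 5.0, 5.2, 5.1, 1.0, 1.1, 0.9]
flagged = unbalanced_blocks(scores, treated, covariate, [0.0, 0.5, 1.0])
```

A flagged block would then be split into finer blocks (step 4), or the propensity score model would be respecified (step 5), before testing again.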

15 Implementation in Stata
There are multiple methods for estimating the propensity score.
Download psmatch2 from SSC: ssc install psmatch2, replace
First stage: pscore treat X1 X2 X3, pscore(scorename)
Second stage: attr (for matching) or atts (for stratifying): attr outcome treat, pscore(scorename)

16 General Remarks
The propensity score approach becomes more appropriate the more randomness there is in determining who gets treatment (closer to a randomized experiment).
The propensity score doesn't work very well if almost everyone with a high propensity score gets treatment and almost everyone with a low score doesn't: we need to be able to compare people with similar propensities who did and did not get treatment.
The propensity score approach doesn't correct for unobservable variables that affect whether observations receive treatment.

17 NSW example Comparison of propensity score matching with experimental results

18 NSW program
During the mid-1970s, the Manpower Demonstration Research Corporation (MDRC) operated the National Supported Work Demonstration (NSW).
NSW was a temporary employment program designed to help disadvantaged workers lacking basic job skills move into the labor market by giving them work experience and counseling in a sheltered environment.
Unlike other federally sponsored employment and training programs, though, the NSW program assigned qualified applicants to training positions randomly.
Treatment group: received all the benefits of the NSW program.
Control group: left to fend for themselves.
NSW admitted into the program AFDC women, ex-drug addicts, ex-criminal offenders, and high school dropouts of both sexes.

19 NSW Program
Treatment group members were:
guaranteed a job for 9-18 months, depending on the target group and site;
divided into crews of 3-5 participants who worked together and met frequently with an NSW counselor to discuss grievances and performance;
paid for their work.
The wage schedule offered the trainees lower wage rates than they would have received on a regular job, but allowed their earnings to increase for satisfactory performance and attendance.
After their term expired, they were forced to find regular employment.
The type of work varied within sites (gas station attendant, working at a printer shop), and males and females frequently performed different kinds of work.
This was why the program costs varied across sites and target groups.
The program cost $9,100 per AFDC participant and approximately $6,800 for other target groups' trainees, in 1982 US dollars.

20 NSW Program
MDRC collected earnings and demographic information from both treatment and control groups at baseline and every 9 months thereafter.
It conducted up to 4 post-baseline interviews.

21 LaLonde (1986) study
LaLonde, Robert J. (1986). Evaluating the Econometric Evaluations of Training Programs with Experimental Data. American Economic Review, 76(4).
LaLonde's ideas:
Outcome variable: annual earnings in 1978.
Get an unbiased estimate of the job training program's effects using the randomized control group.
Compare that with what you get by selecting a control group from the entire population that looks like the treatment group, using various causal inference methods.

22 Need for a control group
The fundamental problem of causal inference is that causality is defined as the difference between two potential outcomes, but for each individual we only observe one of them.
We are missing data on each trainee's counterfactual: what they would have earned had they not been in the NSW experiment.

23 Choice of a control group
Best option: randomize so that independence is satisfied.
The control group and treatment group then differ only by random chance.
This eliminates bias due to baseline differences between the two groups and the heterogeneous treatment effects bias.
Oftentimes these kinds of randomized controls aren't available, so labor economists would instead sample from various datasets to create (non-experimental) control groups.
So LaLonde sampled a non-experimental control group from two surveys: the Current Population Survey (CPS) and the Panel Study of Income Dynamics (PSID).
Sampled the entire working population.
Sampled those not working in 1976.
Sampled those not working in 1975 or 1976.

24 Similarity of treatment and control groups
Treatment and control groups need to be similar. But in what way should they be similar?
Most importantly, they need to be similar with regard to pre-treatment income, since income is what we'll be examining post-treatment.
So what did LaLonde find?
The first column is treatment group earnings.
The second column is the randomized control group.
Everything else is a non-random control group.


25 TABLE 2-ANNUAL EARNINGS OF NSW TREATMENTS, CONTROLS, AND EIGHT CANDIDATE COMPARISON GROUPS FROM THE PSID AND THE CPS-SSA
Comparison Group (a, b)
Columns: Year, Treatments, Controls, PSID-1, PSID-2, PSID-3, PSID-4, CPS-SSA-1, CPS-SSA-2, CPS-SSA-3, CPS-SSA-4
$895 $877 7,303 2, ,654 7,788 3,748 4,575 2, (81) $1,794 (90) $646 (317) 7,442 (286) 2,697 (189) 665 (428) 6,770 (63) 8,547 (250) 4,774 (135) 3,800 (333) 2,036 (99) (63) (327) (317) (157) (463) (65) (302) (128) (337)
1977 $6,143 $1,518 7,983 3, ,213 8,562 4,851 5,277 2,844
     (140) (112) (335) (376) (229) (484) (68) (317) (153) (450)
1978 $4,526 $2,885 8,146 3,636 1,631 7,564 8,518 5,343 5,665 3,700
     (270) (244) (339) (421) (381) (480) (72) (365) (166) (593)
1979 $4,670 $3,819 8,016 3,569 1,602 7,482 8,023 5,343 5,782 3,733
     (226) (208) (334) (381) (334) (462) (73) (371) (170) (543)
Number of Observations , ,
a The Comparison Groups are defined as follows:
PSID-1: All female household heads continuously from 1975 through 1979, who were between 20 and 55 years old and did not classify themselves as retired in 1975;
PSID-2: Selects from the PSID-1 group all women who received AFDC in 1975;
PSID-3: Selects from the PSID-2 group all women who were not working when surveyed in 1976;
PSID-4: Selects from the PSID-1 group all women with children, none of whom are less than 5 years old;
CPS-SSA-1: All females from the Westat CPS-SSA sample;
CPS-SSA-2: Selects from CPS-SSA-1 all females who received AFDC in 1975;
CPS-SSA-3: Selects from CPS-SSA-1 all females who were not working in the spring of 1976;
CPS-SSA-4: Selects from CPS-SSA-2 all females who were not working in the spring of
b All earnings are expressed in 1982 dollars. The numbers in parentheses are the standard errors. For the NSW treatments and controls, the number of observations refer only to 1975 and In the other years there are fewer observations, especially in At the time of the resurvey in 1979, treatments had been out of Supported Work for an average of 20 months.


26 TABLE 3-ANNUAL EARNINGS OF NSW MALE TREATMENTS, CONTROLS, AND SIX CANDIDATE COMPARISON GROUPS FROM THE PSID AND CPS-SSA
Comparison Group (a, b)

Year  Treatments  Controls  PSID-1  PSID-2  PSID-3  CPS-SSA-1  CPS-SSA-2  CPS-SSA-3
1975  $3,066      $3,027    19,056  7,569   2,611   13,650     7,387      2,729
      (283)       (252)     (272)   (568)   (492)   (73)       (206)      (197)
1976  $4,035      $2,121    20,267  6,152   3,191   14,579     6,390      3,863
      (215)       (163)     (296)   (601)   (609)   (75)       (187)      (267)
1977  $6,335      $3,403    20,898  7,985   3,981   15,046     9,305      6,399
      (376)       (228)     (296)   (621)   (594)   (76)       (225)      (398)
1978  $5,976      $5,090    21,542  9,996   5,279   14,846     10,071     7,277
      (402)       (227)     (311)   (703)   (686)   (76)       (241)      (431)
Number of Observations , ,992 1,

a The Comparison Groups are defined as follows:
PSID-1: All male household heads continuously from 1975 through 1978, who were less than 55 years old and did not classify themselves as retired in 1975;
PSID-2: Selects from the PSID-1 group all men who were not working when surveyed in the spring of 1976;
PSID-3: Selects from the PSID-1 group all men who were not working when surveyed in either spring of 1975 or 1976;
CPS-SSA-1: All males based on Westat's criteria, except those over 55 years old;
CPS-SSA-2: Selects from CPS-SSA-1 all males who were not working when surveyed in March 1976;
CPS-SSA-3: Selects from the CPS-SSA-1 unemployed males in 1976 whose income in 1975 was below the poverty level.
b All earnings are expressed in 1982 dollars. The numbers in parentheses are the standard errors. The number of observations refer only to 1975 and In the other years there are fewer observations. The sample of treatments is smaller than the sample of controls because treatments still in Supported Work as of January 1978 are excluded from the sample, and in the young high school target group there were by design more controls than treatments.


27 TABLE 4-EARNINGS COMPARISONS AND ESTIMATED TRAINING EFFECTS FOR THE NSW AFDC PARTICIPANTS USING COMPARISON GROUPS FROM THE PSID AND THE CPS-SSA (a, b)
Columns, by name of comparison group (d):
(1) Comparison group earnings growth
(2)-(3) NSW treatment earnings less comparison group earnings, pre-training year 1975: unadjusted, adjusted (c)
(4)-(5) NSW treatment earnings less comparison group earnings, post-training year 1979: unadjusted, adjusted (c)
(6)-(7) Difference in differences (difference in earnings growth, treatments less comparisons): without age, with age
(8)-(9) Unrestricted difference in differences (quasi difference in earnings growth): unadjusted, adjusted (c)
(10)-(11) Controlling for all observed variables and pre-training earnings: without AFDC, with AFDC
2, Controls (220) (122) (122) (307) (306) (323) (323) (308) (306) (312) PSID ,443-4,882-3,357-2,143 3,097 2, , ,097 (210) (326) (336) (403) (425) (317) (333) (357) (380) (409) (491) PSID-2 1,242-1,467-1,515 1, ,568 2,392 1,764 1,535 1,826 - (314) (216) (224) (468) (484) (473) (481) (472) (487) (537) PSID ,057 2,915 3,145 3,020 3,070 2,930 2,919 - PSID-4 (351) (202) (208) (532) (543) (557) (563) (531) (543) (592) 928-5,694-4,976-2,822-2,268 2,883 2,655 1, ,406 2,146 (311) (306) (323) (460) (491) (417) (434) (483) (503) (542) (652) CPS-SSA ,928-5,813-3,363-2,650 3,578 3,501 1,214 1, ,041 (64) (272) (309) (320) (365) (280) (282) (272) (309) (349) (503) CPS-SSA-2 1,595-2,888-2, ,215 2, (360) (204) (256) (428) (536) (438) (446) (468) (554) (651) CPS-SSA-3 1,207-3,715-3,150-1, ,603 2, ,246 CPS-SSA-4 (166) (226) (325) (311) (452) (307) (328) (305) (429) (481) (720) 1,684-1, ,126 1,833 1, (524) (249) (283) (630) (716) (654) (663) (637) (717) (814)
a The columns above present the estimated training effect for each econometric model and comparison group. The dependent variable is earnings in 1979. Based on the experimental data, an unbiased estimate of the impact of training presented in col. 4 is $851. The first three columns present the difference between each comparison group's 1975 and 1979 earnings and the difference between the pre-training earnings of each comparison group and the NSW treatments.
b Estimates are in 1982 dollars. The numbers in parentheses are the standard errors.
c The exogenous variables used in the regression adjusted equations are age, age squared, years of schooling, high school dropout status, and race.
d See Table 2 for definitions of the comparison groups.


28 TABLE 5-EARNINGS COMPARISONS AND ESTIMATED TRAINING EFFECTS FOR THE NSW MALE PARTICIPANTS USING COMPARISON GROUPS FROM THE PSID AND THE CPS-SSA (a, b)
Columns, by name of comparison group (d):
(1) Comparison group earnings growth
(2)-(3) NSW treatment earnings less comparison group earnings, pre-training year 1975: unadjusted, adjusted (c)
(4)-(5) NSW treatment earnings less comparison group earnings, post-training year 1978: unadjusted, adjusted (c)
(6)-(7) Difference in differences (difference in earnings growth, treatments less comparisons): without age, with age
(8)-(9) Unrestricted difference in differences (quasi difference in earnings growth): unadjusted, adjusted (c)
(10) Controlling for all observed variables and pre-training earnings

Group      (1)     (2)       (3)      (4)       (5)      (6)      (7)      (8)      (9)      (10)
Controls   $2,063  $39       -$21     $886      $798     $847     $856     $897     $802     $662
           (325)   (383)     (378)    (476)     (472)    (560)    (558)    (467)    (467)    (506)
PSID-1     $2,043  -$15,997  -$7,624  -$15,578  -$8,067  $425     -$749    -$2,380  -$2,119  -$1,228
           (237)   (795)     (851)    (913)     (990)    (650)    (692)    (680)    (746)    (896)
PSID-2     $6,071  -$4,503   -$3,669  -$4,020   -$3,482  $484     -$650    -$1,364  -$1,694  -$792
           (637)   (608)     (757)    (781)     (935)    (738)    (850)    (729)    (878)    (1024)
PSID-3     $3,322  $455      $455     $697      -$509    $242     -$1,325  $629     -$552    $397
           (780)   (539)     (704)    (760)     (967)    (884)    (1078)   (757)    (967)    (1103)
CPS-SSA-1  $1,196  -$10,585  -$4,654  -$8,870   -$4,416  $1,714   $195     -$1,543  -$1,102  -$805
           (61)    (539)     (509)    (562)     (557)    (452)    (441)    (426)    (450)    (484)
CPS-SSA-2  $2,684  -$4,321   -$1,824  -$4,095   -$1,675  $226     -$488    -$1,850  -$782    -$319
           (229)   (450)     (535)    (537)     (672)    (539)    (530)    (497)    (621)    (761)
CPS-SSA-3  $4,548  $337      $878     -$1,300   $224     -$1,637  -$1,388  -$1,396  $17      $1,466
           (409)   (343)     (447)    (590)     (766)    (631)    (655)    (582)    (761)    (984)

a The columns above present the estimated training effect for each econometric model and comparison group. The dependent variable is earnings in 1978. Based on the experimental data, an unbiased estimate of the impact of training presented in col. 4 is $886. The first three columns present the difference between each comparison group's 1975 and 1978 earnings and the difference between the pre-training earnings of each comparison group and the NSW treatments.
b Estimates are in 1982 dollars. The numbers in parentheses are the standard errors.
c The exogenous variables used in the regression adjusted equations are age, age squared, years of schooling, high school dropout status, and race.
d See Table 3 for definitions of the comparison groups.

29 Lessons
What were the take-aways?
Fairly pessimistic findings: observational data and the causal inference methods available at that time performed poorly when trying to reproduce the known ATE from the randomization.
What did he do?
Linear regression, fixed effects, latent variable selection modeling.
His estimates tended to overstate the impact of the program for women (positive self-selection).
But they tended to understate the impact of the program for men (negative self-selection).
Why should you care?
Even though the comparison group might seem like a good stand-in for the treatment group, your answers may still be significantly biased.

30 Dehejia and Wahba (1999; 2002)
Dehejia, Rajeev H. and Sadek Wahba (1999). Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs. Journal of the American Statistical Association, 94(448).
Dehejia, Rajeev H. and Sadek Wahba (2002). Propensity Score-Matching Methods for Nonexperimental Causal Studies. The Review of Economics and Statistics, February, 84(1).
These two studies introduce propensity score matching methods to economists and perform a kind of replication of LaLonde's study.

31 Dehejia and Wahba (1999)
DW (1999) re-analyze the data using propensity score matching and stratification.
These were new to economists at the time, although the method was first established in Rosenbaum and Rubin (1983).
Identifying assumptions:
$(Y^0, Y^1) \perp D \mid p(x)$, where $p(x)$ is the propensity score.
$0 < \Pr(D = 1 \mid X) < 1$ (common support).
Stable unit treatment value assumption (SUTVA): the response of subject i to the treatment D depends only on the treatment given to i, not on the treatment given to anyone else.

32 Assumptions
$e(x) = \Pr(D = 1 \mid X)$ is the conditional probability of treatment, also called the propensity score.
This is a scalar summary of all observed covariates, X.
The key result is that the propensity score is a balancing score:
$X \perp D \mid e(x)$, i.e., $\Pr[D \mid X, e(x)] = \Pr[D \mid e(x)]$
The ATE at e(x) is the average difference between the observed responses in each treatment group at e(x):
$E[Y^1 - Y^0 \mid e(x)] = E[Y \mid e(x), D = 1] - E[Y \mid e(x), D = 0]$

33 Interpretation
The overall estimated ATE from this method is the treatment effect at each value of e(x), averaged over the distribution of e(x).
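Written out, the averaging step is an application of the law of iterated expectations followed by the identification result for the ATE at a given e(x). The derivation below is a sketch; writing potential outcomes with superscripts ($Y^0$, $Y^1$) is a notational assumption, since the markup of the original slides is not recoverable.

```latex
\begin{align*}
\tau_{ATE} &= E\left[Y^1 - Y^0\right] \\
           &= E_{e(x)}\Big\{ E\left[Y^1 - Y^0 \mid e(x)\right] \Big\} \\
           &= E_{e(x)}\Big\{ E\left[Y \mid e(x), D = 1\right] - E\left[Y \mid e(x), D = 0\right] \Big\}
\end{align*}
```

The second line conditions on the propensity score and averages over its distribution. The third replaces the unobservable contrast with observed group means, which is valid only under the identifying assumptions (conditional independence given the score, and common support).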

34 Analytical use of the propensity score
Matching: subsets consisting of both treatment and control subjects with the same propensity score are matched.
Stratification: the data are divided into several strata (or "blocks") based on the propensity score, then the regular analysis is carried out within each stratum.

35 Implementation
Include as many observed pretreatment variables ("covariates") as possible.
The statistical significance of individual terms isn't important.
Functional form of covariates: consider higher order polynomials as well as interaction terms. Why? BALANCE BETWEEN TREATMENT AND CONTROL.
Selection of the model: probit or logit.

36 Matching algorithm
Nearest neighbor algorithm: iteratively find the pair of subjects with the shortest distance.
It is easy to understand and implement, offers good results in practice, and runs fast, but it rarely gives the best possible matching compared with an optimal matching procedure.

37 Implementation
Choices of distance:
An exact match is not possible because the propensity score is a continuous variable, and the probability of two units having the same value of a continuous score is zero.
Use one distance measure to summarize the information:
Mahalanobis distance
Propensity score
Mahalanobis distance with a propensity score caliper
Any distance with the requirement of an exact match on a specific variable
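The nearest neighbor idea, combined with a caliper on the propensity score, can be sketched as follows. This is one simple variant chosen for illustration, and its specifics are assumptions of the example: greedy 1:1 matching without replacement, absolute difference in propensity scores as the distance, and hypothetical function names and data. It is not the algorithm of any particular Stata or R routine.

```python
from statistics import mean


def greedy_match(treated_scores, control_scores, caliper=None):
    """Greedy 1:1 matching without replacement on the propensity score.

    Repeatedly takes the closest remaining treated-control pair.
    Returns a list of (treated_index, control_index) pairs.
    """
    candidates = sorted(
        (abs(t - c), i, j)
        for i, t in enumerate(treated_scores)
        for j, c in enumerate(control_scores)
    )
    used_t, used_c, pairs = set(), set(), []
    for dist, i, j in candidates:
        if caliper is not None and dist > caliper:
            break  # every remaining candidate pair is even farther apart
        if i in used_t or j in used_c:
            continue
        pairs.append((i, j))
        used_t.add(i)
        used_c.add(j)
    return pairs


def mean_pair_difference(pairs, y_treated, y_control):
    """Average treated-minus-control outcome difference over matched pairs."""
    return mean(y_treated[i] - y_control[j] for i, j in pairs)


# Hypothetical propensity scores and outcomes.
p_treated = [0.30, 0.62, 0.90]
p_control = [0.28, 0.35, 0.60, 0.10]
y_treated = [10, 12, 15]
y_control = [9, 8, 10, 7]

pairs_all = greedy_match(p_treated, p_control)
pairs_caliper = greedy_match(p_treated, p_control, caliper=0.1)
effect = mean_pair_difference(pairs_caliper, y_treated, y_control)
```

Without a caliper, the treated unit with score 0.90 is paired with a control at 0.35, a poor match. With a caliper of 0.1 that unit is dropped instead, which trades sample size for comparability.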

38 Software
R functions by Ben Hansen: http://
Stata functions:
Stata 13 has new treatment effects methods built in, which include nearest neighbor matching as well as propensity score matching methods.
Pre-Stata 13: psmatch2; pscore; nnmatch

39 Procedures for PSM
Identify the propensity score model (e.g., logit or probit; covariates).
Estimate the propensity score with all the data.
Compute the distance between any two subjects.
Create matched pairs/groups using a specific matching algorithm.
Check covariate balance between the treatment and control groups among matched subjects; if it is not good enough, go back and improve the propensity score model.
Contrast treated and control subjects within each pair/group.
Obtain the ATE by averaging over all pairs/groups.

40 Why are we doing this?
Remember the goal of DW: to investigate the credibility of conventional analytical results from non-experimental data.
So the authors compared the results from the experimental data to the results from non-experimental data formed by combining the treatment group with a comparable control dataset.

44 Checking the balance after matching

45 Comparison of the analytical results

46 Observations
The results after propensity score matching/stratification were much closer to the truth (if we assume the randomized experiment is the correct benchmark).
The variances seem to be larger due to the loss of data.
The results aren't very sensitive to the functional form of the chosen covariates in the propensity score model; however, they are sensitive to the selection of covariates included in the propensity score model.

47 Comments
Limitations of the propensity score method:
It relies on an unverified assumption: conditional independence, or selection on observables.
Unlike randomization, propensity score matching cannot be used if there are unobserved confounders (selection on unobservables).
Overlap: you need substantial overlap between the treatment and control groups; otherwise, you may lose a significant share of the data in your analysis.
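The overlap requirement can be checked directly on the estimated propensity scores. The sketch below applies one common rule of thumb, which is an assumption of this example rather than something the slides prescribe: keep only units whose score lies between the larger of the two group minima and the smaller of the two group maxima. The function name and data are hypothetical.

```python
def common_support(scores, treated):
    """Min/max common-support rule on the propensity score.

    Returns (lo, hi, keep), where keep[i] is True if unit i's score lies in
    the range covered by both the treated and the control group.
    """
    t = [s for s, d in zip(scores, treated) if d == 1]
    c = [s for s, d in zip(scores, treated) if d == 0]
    if not t or not c:
        raise ValueError("need at least one treated and one control unit")
    lo = max(min(t), min(c))
    hi = min(max(t), max(c))
    keep = [lo <= s <= hi for s in scores]
    return lo, hi, keep


# Hypothetical scores: controls are mostly low, treated units mostly high.
scores = [0.05, 0.20, 0.40, 0.60, 0.30, 0.55, 0.80, 0.95]
treated = [0, 0, 0, 0, 1, 1, 1, 1]
lo, hi, keep = common_support(scores, treated)
share_dropped = 1 - sum(keep) / len(keep)
```

In this made-up example the overlap region is [0.30, 0.60] and half of the sample falls outside it, which illustrates how limited overlap translates into lost data.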
