Methods of Randomization Lupe Bedoya Development Impact Evaluation Field Coordinator Training Washington, DC April 22-25, 2013
Content 1. Important Concepts 2. What vs. Why 3. Some Practical Issues 4. Select Randomization Methods
Important Concepts
Outcomes: what we observe, measure, and want to affect.
Counterfactual outcomes: the potential outcome that would have taken place if the individual had not been exposed to the program.
Impact: the change in outcomes caused by the intervention.
Use of the word "cause": "More schooling causes higher earnings" means that a person with more schooling has higher earnings relative to the earnings that same person would have had with less schooling.
The Evaluation Problem
Basically a missing data problem: we don't observe the counterfactual outcomes for the same people. If we use an inappropriate counterfactual, we get biased estimates of the impact.
Impact estimate: $\hat{D}_i = Y_i^t - Y_j^c$, which decomposes as $\hat{D}_i = D_i + x_{i,j}$, where $D_i$ is the true impact and $x_{i,j}$ is the matching error from comparing individuals $i$ and $j$, i.e., selection bias.
Example: individuals with more ability tend to study more years, so an impact estimate that compares those who study more (treatment) with those who study less (comparison) will have a significant matching error, i.e., selection bias.
The Statistical Solution
Randomization balances out the selection problem; it does not eliminate it.
Definition: the purposeful manipulation of a social program or policy, randomly assigning groups to treatment and control status.
$E[\hat{D}_i] = E[D_i]$ if and only if $E[x_{i,j}] = 0$.
Independence assumption: the two groups will, on average, have ALL the same characteristics (observable and unobservable); the only difference is the treatment.
If the randomization is not well implemented, we are back to selection problems.
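A minimal simulation, not from the slides, illustrating the decomposition above: when people self-select into the program based on (unobserved) ability, the naive comparison picks up the matching error $x_{i,j}$, while random assignment makes that term zero in expectation. All numbers (effect size, ability distribution) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_impact = 2.0                      # D: true effect of the program (assumed)
ability = rng.normal(0, 1, n)          # unobserved characteristic

# Self-selection: higher-ability people are more likely to take the program
takes_program = (ability + rng.normal(0, 1, n)) > 0
earnings = 10 + true_impact * takes_program + 3 * ability + rng.normal(0, 1, n)
naive = earnings[takes_program].mean() - earnings[~takes_program].mean()

# Random assignment: treatment is independent of ability
assigned = rng.random(n) < 0.5
earnings_rct = 10 + true_impact * assigned + 3 * ability + rng.normal(0, 1, n)
rct = earnings_rct[assigned].mean() - earnings_rct[~assigned].mean()

print(f"true impact: {true_impact}, naive (biased): {naive:.2f}, randomized: {rct:.2f}")
```

The naive estimate is well above the true impact because treated individuals also have higher ability; the randomized estimate is close to the true impact.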
In a nutshell
We observe outcomes of policies and programs. We want to say that these policies or programs caused outcomes to change; we define these changes as an impact. We do not observe counterfactual outcomes, so we create a counterfactual. We must convincingly solve the selection problem, and we need to make sure the randomization/intervention is implemented as planned.
Content 1. Important Concepts 2. What vs. Why 3. Some Practical Issues 4. Select Randomization Methods
Example: A pencil sharpener
Example: A pencil sharpener Hidden from the researcher
Two approaches: What vs. How/Why
Approach A (What): come up with a treatment that might affect pencils (windows). Run an RCT. Conclusion: windows affect pencils.
Approach B (How/Why): make an economic theory about an economic mechanism (smoke gives opossums incentives to leave home). Find a testable implication (opening windows affects pencils) and run an RCT. Conclusion: we can't reject that smoke incentivizes opossums.
Implication for policy: creating targeted incentives is a solution if we want to affect pencil sharpening (and of course the same logic applies to many real problems).
Need to focus on the most interesting questions: Why does financial literacy affect growth? Through which mechanism (how) does it affect growth, if at all? These questions affect the generalizability of the model. That is why DIME focuses on the How/Why: treatment variations to understand mechanisms, test hypotheses, and contribute to scale-up and learning for other settings.
Content 1. Important Concepts 2. What vs. Why 3. Some Practical Issues 4. Select Randomization Methods
What Potential Problems Do Randomized Designs Face?
Ethical issues: denying services; can only be used when not everyone has a right to the program.
Program/operational concerns: adequate participant flow, contamination, inability to randomize at the correct unit of analysis.
External validity.
Where Do Ethical Concerns Arise?
Voluntary programs that can enroll all applicants
Mandatory programs that can enroll all eligibles
Entitlement programs
The control group will be made worse off.
...And When Can They Be Addressed?
Voluntary programs: there is often far more interest than can be accommodated, and programs often simply enroll first comers.
Mandatory programs: capacity may be limited relative to eligibility, or the program is a demonstration.
Entitlement programs: difficult to justify; one possibility is compensation for the control group, another is an encouragement design.
Need to Define Subgroups Prior to RA
Subgroups defined prior to random assignment (exogenous), e.g., demographics, baseline behaviors, or baseline outcome values, can generate unbiased subgroup estimates.
Subgroups defined by events/actions after random assignment (endogenous), e.g., program dosage or stayers vs. leavers, are problematic.
Illustration 1: Mandatory Testing Program Variants
Eligible population (e.g., all children in public HS) → RA → Treatment 1, Treatment 2, Control.
Implications: treatment variations (cannot evaluate the "What"); the impact can be extrapolated to the entire eligible population.
Illustration 2: Voluntary Program
Eligible population (e.g., the unemployed) → outreach → applicants (e.g., unemployed who apply to job training programs) and non-applicants → RA among applicants → Treatment, Control.
Implications: could evaluate the "What"; the impact can only be extrapolated to applicants within the eligible population.
Implications for internal validity as well as generalizability
Internal validity: does it get the causal effect right in the sample you are studying (e.g., the children in public HS who volunteer to participate in the experiment)?
External validity: is it generalizable to the entire eligible population and to other populations (e.g., all children in public HS)?
Content 1. Important Concepts 2. What vs. Why 3. Some Practical Issues 4. Select Randomization Methods Simple Random Design Stratified Random Design Clustered Random Design
Simple Random Design
Eligible population (all students) → RA at the student level → Treatment, Control.
The unit of randomization equals the unit of analysis. The sample is drawn at random from the universe (with or without replacement), so results can be extrapolated to the eligible population.
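A minimal sketch, not from the slides, of simple random assignment at the student level: shuffle the list of eligible students and split it in half, so each student has the same probability of treatment. The population size and IDs are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2013)
student_ids = np.arange(1000)              # hypothetical eligible population
shuffled = rng.permutation(student_ids)    # random order, each student equally likely anywhere
treatment_ids = shuffled[:500]             # first half assigned to treatment
control_ids = shuffled[500:]               # second half assigned to control
```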
Stratified Random Design (oversampling a stratum)
Why? We think the impact of the program differs across groups (i.e., strata), and we may be able to estimate impacts more precisely by stratum if we randomize at that level. For instance, if we are interested in the impact of a program on women, who are under-represented in the eligible population (e.g., women in male-dominated careers), we need to oversample women.
Illustration: eligible population of 1,500 students, of whom 400 are women and 1,100 are men; RA within each stratum (150 treatment, 150 control) yields 300 treatment and 300 control overall.
Implication: we need to use weights to estimate the overall impact because women are oversampled.
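A minimal sketch of the stratified design illustrated above: randomize within each stratum (150T/150C among 400 women, 150T/150C among 1,100 men) and re-weight the stratum-level impacts by each stratum's share of the eligible population to recover the overall impact. Outcomes and effect sizes below are simulated and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
strata = {"women": 400, "men": 1100}                # eligible population by stratum
sample = {"women": (150, 150), "men": (150, 150)}   # (treated, control) drawn per stratum

impact_by_stratum, weights = {}, {}
for s, n_eligible in strata.items():
    n_t, n_c = sample[s]
    effect = 3.0 if s == "women" else 1.0           # assumed heterogeneous effect
    y_t = 10 + effect + rng.normal(0, 1, n_t)       # simulated treated outcomes
    y_c = 10 + rng.normal(0, 1, n_c)                # simulated control outcomes
    impact_by_stratum[s] = y_t.mean() - y_c.mean()
    weights[s] = n_eligible / sum(strata.values())  # share of the eligible population

overall_impact = sum(weights[s] * impact_by_stratum[s] for s in strata)
print(impact_by_stratum, overall_impact)
```

Because women make up 400/1,500 of the population but half of the sample, the unweighted average of the two groups would overstate the overall impact; the population-share weights correct for the oversampling.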
Cluster Random Design
Eligible population: 900 villages (the unit of randomization) → RA → Treatment: 450 villages; Control: 450 villages → 3,000 households in each arm (the unit of analysis, where outcomes are measured).
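A minimal sketch of cluster randomization as in the diagram above: assign 450 of 900 villages to treatment, so every household inherits the status of its village. Village and household identifiers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
villages = np.arange(900)                                   # unit of randomization
treated_villages = rng.choice(villages, size=450, replace=False)

# households (unit of analysis) inherit treatment status from their village
household_village = rng.integers(0, 900, size=6000)         # village of each household
household_treated = np.isin(household_village, treated_villages)
```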
Cluster Random Design
The unit of randomization (e.g., a village) is different from the unit of analysis (e.g., the household): primary and secondary sampling units. Treatment effects may be correlated within a village, which usually decreases precision. In the worst-case scenario, units within a cluster are identical and you are effectively left with only the number of clusters as your sample size.
Individuals within clusters tend to be more similar to one another than individuals selected at random. Intra-cluster correlation strongly reduces precision, so a larger sample is needed: we get more information about the impact on the whole population by randomizing at the individual level than at the cluster level. It is also easy to end up with unbalanced groups.
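A minimal sketch, not from the slides, of the standard design-effect formula that quantifies this sample-size penalty: DEFF = 1 + (m - 1) * rho, where m is the number of units per cluster and rho is the intra-cluster correlation. The baseline sample size and cluster size below are illustrative.

```python
def design_effect(m: int, rho: float) -> float:
    # DEFF = 1 + (m - 1) * rho: inflation factor relative to individual randomization
    return 1 + (m - 1) * rho

n_individual = 1000                       # sample needed under individual-level randomization
for rho in (0.0, 0.05, 0.2):
    needed = round(n_individual * design_effect(m=20, rho=rho))
    print(f"rho={rho}: roughly {needed} households needed with 20 households per village")
```

With rho = 0 the cluster design needs the same 1,000 households; with rho = 0.2 and 20 households per village it needs almost five times as many.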
What can go wrong? Common problems can be serious!
Potentially underpowered situations: small samples, clustered samples, high-variance outcomes.
High non-participation or low dosage: a serious concern in voluntary programs.
Control group crossover and other contamination.
Sample attrition, especially differential attrition by treatment status: the control group can be harder to retain/locate.
Potential selection problems.
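To flag potentially underpowered designs, a minimal sketch (standard textbook formula, not from the slides) of the per-arm sample size for a two-arm comparison at 5% significance and 80% power; the effect size and standard deviation are illustrative.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(effect: float, sd: float, alpha: float = 0.05, power: float = 0.8) -> int:
    # two-sided, two-arm formula: n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / effect)^2
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(2 * (z * sd / effect) ** 2)

print(n_per_arm(effect=0.2, sd=1.0))   # roughly 393 per arm to detect a 0.2 SD effect
```

For clustered designs, this number would be further multiplied by the design effect shown earlier.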
Critical when conducting experiments
Design experiments that help answer how/why things work (e.g., test predictions of theories), especially if your purpose is to claim applicability beyond the specific context of your experiment.
Recognize the limitations of your particular experiment (e.g., what you are estimating, what population your results apply to).
Guarantee that the randomization is well implemented and the intervention goes as planned; failures here are a very frequent cause of low-quality IEs, and field coordinators (FCs) are crucial for this.
Make the appropriate/feasible corrections (e.g., standard errors, selection problems).