The Late Pretest Problem in Randomized Control Trials of Education Interventions

Similar documents
Instrumental Variables Estimation: An Introduction

A re-randomisation design for clinical trials

Instrumental Variables I (cont.)

Inference with Difference-in-Differences Revisited

Statistical Science Issues in HIV Vaccine Trials: Part I

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug?

Regression Discontinuity Design

Population-adjusted treatment comparisons Overview and recommendations from the NICE Decision Support Unit

An Introduction to Bayesian Statistics

Data Analysis in Practice-Based Research. Stephen Zyzanski, PhD Department of Family Medicine Case Western Reserve University School of Medicine

1 Simple and Multiple Linear Regression Assumptions

Methods of Reducing Bias in Time Series Designs: A Within Study Comparison

REPEATED MEASURES DESIGNS

Dylan Small Department of Statistics, Wharton School, University of Pennsylvania. Based on joint work with Paul Rosenbaum

Small-area estimation of mental illness prevalence for schools

Received: 14 April 2016, Accepted: 28 October 2016 Published online 1 December 2016 in Wiley Online Library

The Pretest! Pretest! Pretest! Assignment (Example 2)

Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy Research

SAMPLING AND SAMPLE SIZE

EXPERIMENTAL RESEARCH DESIGNS

Alan S. Gerber Donald P. Green Yale University. January 4, 2003

Internal Model Calibration Using Overlapping Data

Econometric analysis and counterfactual studies in the context of IA practices

Placebo and Belief Effects: Optimal Design for Randomized Trials

Bayesian methods for combining multiple Individual and Aggregate data Sources in observational studies

Simple Linear Regression the model, estimation and testing

Midterm Exam MMI 409 Spring 2009 Gordon Bleil

Multilevel analysis quantifies variation in the experimental effect while optimizing power and preventing false positives

Statistical Power Sampling Design and sample Size Determination

SCHOOL OF MATHEMATICS AND STATISTICS

Design and Analysis Plan Quantitative Synthesis of Federally-Funded Teen Pregnancy Prevention Programs HHS Contract #HHSP I 5/2/2016

The Effect of Extremes in Small Sample Size on Simple Mixed Models: A Comparison of Level-1 and Level-2 Size

Problem set 2: understanding ordinary least squares regressions

STATISTICAL CONCLUSION VALIDITY

Methods for Addressing Selection Bias in Observational Studies

THE UNIVERSITY OF OKLAHOMA HEALTH SCIENCES CENTER GRADUATE COLLEGE A COMPARISON OF STATISTICAL ANALYSIS MODELING APPROACHES FOR STEPPED-

On Regression Analysis Using Bivariate Extreme Ranked Set Sampling

Key questions when starting an econometric project (Angrist & Pischke, 2009):

Thursday, April 25, 13. Intervention Studies

Lecture 14: Adjusting for between- and within-cluster covariates in the analysis of clustered data May 14, 2009

Lec 02: Estimation & Hypothesis Testing in Animal Ecology

Applied Quantitative Methods II

Dichotomizing partial compliance and increased participant burden in factorial designs: the performance of four noncompliance methods

Comparing Pre-Post Change Across Groups: Guidelines for Choosing between Difference Scores, ANCOVA, and Residual Change Scores

APPENDIX D REFERENCE AND PREDICTIVE VALUES FOR PEAK EXPIRATORY FLOW RATE (PEFR)

MEA DISCUSSION PAPERS

Implementing double-robust estimators of causal effects

Abstract Title Page Not included in page count. Title: Analyzing Empirical Evaluations of Non-experimental Methods in Field Settings

Sampling for Impact Evaluation. Maria Jones 24 June 2015 ieconnect Impact Evaluation Workshop Rio de Janeiro, Brazil June 22-25, 2015

LIHS Mini Master Class Multilevel Modelling

CHAPTER 8 Estimating with Confidence

L4, Modeling using networks and other heterogeneities

Supplementary Information for Avoidable deaths and random variation in patients survival

MULTIPLE RANDOM START SYSTEMATIC SAMPLING: AN EMPIRICAL STUDY

Content. Basic Statistics and Data Analysis for Health Researchers from Foreign Countries. Research question. Example Newly diagnosed Type 2 Diabetes

Biostatistics II

Final Exam - section 2. Thursday, December hours, 30 minutes

Treatment effect estimates adjusted for small-study effects via a limit meta-analysis

EPSE 594: Meta-Analysis: Quantitative Research Synthesis

Lecture 10: Learning Optimal Personalized Treatment Rules Under Risk Constraint

10-1 MMSE Estimation S. Lall, Stanford

Introduction to Observational Studies. Jane Pinelis

Appendix 1. Sensitivity analysis for ACQ: missing value analysis by multiple imputation

Bayesian hierarchical modelling

UN Handbook Ch. 7 'Managing sources of non-sampling error': recommendations on response rates

Thursday, April 25, 13. Intervention Studies

Abstract. Introduction A SIMULATION STUDY OF ESTIMATORS FOR RATES OF CHANGES IN LONGITUDINAL STUDIES WITH ATTRITION

Chapter 1: Exploring Data

Chapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS)

Abstract. September 2016

Lecture Outline Biost 517 Applied Biostatistics I. Statistical Goals of Studies Role of Statistical Inference

WWC Standards for Regression Discontinuity and Single Case Study Designs

investigate. educate. inform.

Formative and Impact Evaluation. Formative Evaluation. Impact Evaluation

Can Quasi Experiments Yield Causal Inferences? Sample. Intervention 2/20/2012. Matthew L. Maciejewski, PhD Durham VA HSR&D and Duke University

Selected Topics in Biostatistics Seminar Series. Missing Data. Sponsored by: Center For Clinical Investigation and Cleveland CTSC

The Impact of Cellphone Sample Representation on Variance Estimates in a Dual-Frame Telephone Survey

Applications. DSC 410/510 Multivariate Statistical Methods. Discriminating Two Groups. What is Discriminant Analysis

Mostly Harmless Simulations? On the Internal Validity of Empirical Monte Carlo Studies

QUANTIFYING THE EFFECT OF SETTING QUALITY CONTROL STANDARD DEVIATIONS GREATER THAN ACTUAL STANDARD DEVIATIONS ON WESTGARD RULES

Unit 1 Exploring and Understanding Data

Empirical Knowledge: based on observations. Answer questions why, whom, how, and when.

Exposure-response in the presence of confounding

Kelvin Chan Feb 10, 2015

Comparing heritability estimates for twin studies + : & Mary Ellen Koran. Tricia Thornton-Wells. Bennett Landman

1.4 - Linear Regression and MS Excel

Abdul Latif Jameel Poverty Action Lab Executive Training: Evaluating Social Programs Spring 2009

University of Michigan School of Public Health

An Empirical Study to Evaluate the Performance of Synthetic Estimates of Substance Use in the National Survey of Drug Use and Health

MS&E 226: Small Data

Applied Econometrics for Development: Experiments II

QUASI-EXPERIMENTAL HEALTH SERVICE EVALUATION COMPASS 1 APRIL 2016

Diurnal Pattern of Reaction Time: Statistical analysis

Example 7.2. Autocorrelation. Pilar González and Susan Orbe. Dpt. Applied Economics III (Econometrics and Statistics)

HPS301 Exam Notes- Contents

CRITICAL APPRAISAL SKILLS PROGRAMME Making sense of evidence about clinical effectiveness. 11 questions to help you make sense of case control study

Using Ensemble-Based Methods for Directly Estimating Causal Effects: An Investigation of Tree-Based G-Computation

A Cue Imputation Bayesian Model of Information Aggregation

Analysis of Vaccine Effects on Post-Infection Endpoints Biostat 578A Lecture 3

Transcription:

The Late Pretest Problem in Randomized Control Trials of Education Interventions Peter Z. Schochet ACF Methods Conference, September 2012 In Journal of Educational and Behavioral Statistics, August 2010, Vol. 35

Pretest-Posttest Designs Are Common for Education RCTs Posttests are typically the key outcome measures since the NCLB Act of 2001 Pretests improve the precision of the impact estimates Critical for clustered designs Need 130 schools without pretests Need 40-60 schools with pretests 2

But Pretests Are Often Collected Late School staff prefer to wait until routines have been established Want enough signed consent forms Scheduling issues 3

Using Late Pretests Could Bias The Impact Estimates Interventions are often implemented before school starts e.g. Teacher training Late pretests could capture early impacts Pretests are sometimes collected later for controls than treatments 4

The Late Pretest Problem From a Statistical Perspective Involves a variance-bias tradeoff Late pretests can improve precision But can yield biased posttest impact estimates Positive early effects Negative early effects Downward bias Upward bias 5

Paper Addresses Three Research Questions 1. Should late pretest data be collected? 2. What are statistical power losses with late pretests? 3. Is it preferable to obtain true baseline data from other sources? School-level test scores from previous years 6

Outline of Presentation Quantifying the variance-bias tradeoff Properties of commonly-used posttest estimators with late pretests Simulation results Conclusions 7

Quantifying the Variance-Bias Tradeoff Use the Mean Square Error for a posttest estimator ˆγ : MSE(γ) ˆ = Var(γ)+Bias(γ) ˆ ˆ 2 8

The MSE Graphically I I = True Impact ( U ) U = Unbiased Estimator and Confidence Interval ( B I ) B = Biased Estimator and Confidence Interval 9

Key Features of the MSE Smaller values are better Requires estimates of Bias(γ) ˆ ˆ Var(γ) Data collection costs are not considered and 10

Impact Estimation Framework

Two-Level Clustered Designs Schools are randomly assigned Treatments: T i = 1 Controls: T i = 0 Students selected within schools Pretests are administered in the fall and posttests in the spring ( same test) Pretest values: y 0ij Posttest values: y 1ij Data analyzed at the student level 12

Two Regression Models Posttest Impact Model: i = Schools; j = Students (1) y 1ij =α 0 +α1t i +(u 1i +e 1ij) Pretest Impact Model: ρ 01 λ 01 (2) y 0ij =β 0 +β1t i +(u 0i +e 0ij) Errors Are Correlated α 1 β 1 = Average Treatment Effect = Early Average Treatment Effect 13

Bias and Variance Properties of Four ATE Estimators for γˆ1

I. Posttest-Only Estimator 1. Regress y1ij on ˆ Posttest 1T 1C γ 2. =y -y is unbiased 3. Variance = MSE Ti 4. No variance gains or biases from pretests 15

II. Differences-in-Differences (DID) Estimator 1. Regress on 2. Yields γˆ (y1ij - y 0ij) Ti = ( y -y ) ( y -y ) (α -β ) p DID 1T 1C 0T 0C 1 1 3. Bias = -β 1 4. But get variance gains from pretests Var(γˆ 5. Find if DID Var(γˆ )< ) Posttest ρ 0.25 2 01 16

III. Analysis of Covariance (ANCOVA) Estimator y 1ij Regress on: Treatment status, T i School-level pretest mean, Within-school pretest score, y 0i w y 0ij =y0ij -y0i Statistical complications Pretests correlated with and error terms Bias and variance formulas are complex T i 17

IV. Unbiased ANCOVA (UANCOVA) Estimator 1. Regress y1ij on T and i y0ia are aggregate test scores from prior years ˆ UANCOVA γ y 0ia 2. is unbiased 3. But variance gains are likely to be less than using the pretests 18

Simulation Results

Simulation Methods Calculate MSE values for the four estimators Use key parameter values found in the RCT literature for achievement test scores ICC values Pretest-posttest correlations School sample sizes 20

Extent of Bias Will Depend Critically on the Growth Trajectory of Impacts Plausible Trajectories for Test Scores: Assume Ultimate Posttest Impact Is.25 Stds Linear Growth Quadratic Growth 0.25 0.20 Impact 0.15 0.10 0.05 0.00 0 2 4 6 8 10 Months After Start of School Year 21

Other Plausible Trajectories Quadratic Growth After Initial Negative Impacts Logarithmic Growth Impact 0.25 0.20 0.15 0.10 0.05 Impact 0.25 0.20 0.15 0.10 0.00-0.05-0.10 0 2 4 6 8 10 Months After Start of School Year 0.05 0.00 0 2 4 6 8 10 Months After Start of School Year 22

Estimators Using Late Pretests Are Better Than Those That Do Not The ANCOVA and DID estimators will typically yield smaller MSE values than the posttestonly estimator Holds if early impacts <.10 stds In graphs, holds if data are collected within: 4 months if linear impact growth 7 months if quadratic growth 2 months if logarithmic growth 23

The ANCOVA Estimator Tends to Dominate the DID Estimator The ANCOVA estimator yields: Less bias Smaller variance Smaller MSE values 24

Using Pretests Is Likely to Be Better Than Using Alternative Baselines Suppose: R 2 =.70 for the ANCOVA model R 2 =.50 for the UANCOVA model ANCOVA is better if early impacts <.065 stds In graphs, holds if data are collected within: 2 months if linear impact growth 5 months if quadratic growth 25

Key Reason We Want to Include Late Pretests Even small gains in R 2 values offset biases 26

Need Larger Samples With Late Pretests to Get the Same Power ANCOVA (R 2 =.7) Early Required School Sample Sizes Impact To Get an MDE = 0.20 0.00 40 0.02 45 0.04 51 0.06 58 0.08 67 27

Take Away Messages Late pretests are worth collecting if obtained within several months Not necessarily true if: Impacts grow very fast Alternative baselines have high R 2 values Data collection costs are considered Need larger samples to offset power losses 28

Other Applications of Methods Other RCT areas that collect pretests Early childhood Health Nutrition Propensity score matching that uses late pretests for matching 29