WWC Standards for Regression Discontinuity and Single Case Study Designs


March 2011 presentation to the Society for Research on Educational Effectiveness
John Deke, Shannon Monahan, Jill Constantine

What is the What Works Clearinghouse (WWC)?
- An initiative of the U.S. Department of Education's Institute of Education Sciences
- A central, trusted source of scientific evidence for what works in education
- Develops and implements standards for reviewing and synthesizing education research
- Assesses the rigor of research evidence on the effectiveness of interventions

Producers and Users of Information
[Diagram: producers of information (universities, research organizations, developers) supply research that meets WWC evidence standards; the WWC produces evidence reports, practice guides, effectiveness information, and a help desk; users of information (school boards, superintendents and advisors, state education departments, funders, intermediaries, consultants, technical assistance providers, principals, teachers, federal and state government, national groups, and the public) apply evidence-based interventions and practices to improve student outcomes.]

Current WWC Study Design Standards
- Randomized Controlled Trials (RCTs): well-executed RCTs can obtain the highest level of causal validity and meet evidence standards.
- Quasi-Experimental Designs (QEDs): some RCTs and matched comparison designs meet evidence standards with reservations.
- Regression Discontinuity Designs (RDDs): Version 1.0
- Single-Case Study Designs: Version 1.0

Goals and Procedures for Developing a Research Design Standard
- Establish design standards for causal inference studies.
- The standard must be operational: replicable, reliable, and transparent.
- Operationalized through a protocol and a study review guide.
- Development process: convene an expert panel, then pilot test and revise.

What is RDD?
- Treatment and control groups are formed by design (like an RCT), not by unobserved self-selection.
- Groups are formed purposefully, not randomly.
- Groups are formed using a cutoff on a continuous forcing variable.
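The defining feature of an RDD, assignment fully determined by a cutoff on the forcing variable, can be sketched in a few lines of Python. This is an illustrative sketch only; the function name and the convention that units at or above the cutoff are treated are assumptions for the example, not part of the WWC standard.

```python
def assign_by_cutoff(forcing_scores, cutoff):
    """Return 1 (treatment) or 0 (comparison) for each forcing-variable score.
    Assignment is fully determined by the cutoff, not by self-selection."""
    return [1 if score >= cutoff else 0 for score in forcing_scores]

# Hypothetical example: units scoring at or above 60 receive the intervention.
print(assign_by_cutoff([45, 58, 60, 72, 85], 60))  # [0, 0, 1, 1, 1]
```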

Rationale for Adding RDD Standards
- RDD studies are becoming more common as large, longitudinal administrative data sets become more common in education.
- IES is conducting evaluation studies with RDDs.
- The literature on RDD methodological best practices has evolved to a point that supports the development of standards.

Process for Developing Standards
1. Panel convenes
2. Standards drafted
3. Study Review Guide drafted
4. Standards pilot tested
5. Standards approved by IES

Regression Discontinuity Design (RDD) Expert Panel Members
- Peter Schochet, Co-Chair, Mathematica
- John Deke, Co-Chair, Mathematica
- Tom Cook, Northwestern
- Guido Imbens, Harvard
- J. R. Lockwood, RAND
- Jack Porter, University of Wisconsin
- Jeff Smith, University of Michigan

Key Considerations
- Should the WWC allow RDD studies to join RCTs as studies that meet standards without reservations?
- How can we distinguish between RDD studies that meet standards with versus without reservations?

Overview of RDD Standards
- A screener identifies studies as RDDs.
- Three possible designations:
  1. Meets standards without reservations
  2. Meets standards with reservations
  3. Does not meet standards
- Four individual standards:
  1. Integrity of the forcing (assignment) variable
  2. Attrition
  3. Continuity of the outcome-forcing variable relationship
  4. Functional form and bandwidth
- A reporting requirement for standard errors

RDD Screener
- Treatment assignment is based on a forcing variable: units on one side of a cutoff value are in the treatment group; units on the other side are in the comparison group.
- The forcing variable must be ordinal, with at least 4 values above and 4 values below the cutoff value.
- No factor may be confounded with the cutoff value of the forcing variable.
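The value-count criterion of the screener is mechanical enough to check in code. A minimal sketch, assuming the criterion means distinct values of the forcing variable on each side of the cutoff; the function name is invented for the example, and the other screener criteria (confounding with the cutoff) cannot be checked this way.

```python
def passes_value_count_check(forcing_values, cutoff):
    """Screener criterion: the forcing variable must take at least 4 distinct
    values above the cutoff and at least 4 distinct values below it."""
    above = {v for v in forcing_values if v > cutoff}
    below = {v for v in forcing_values if v < cutoff}
    return len(above) >= 4 and len(below) >= 4

print(passes_value_count_check(list(range(10)), 4.5))  # True
print(passes_value_count_check([1, 2, 5, 6], 3))       # False (too few values)
```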

1. Integrity of the Forcing Variable
- The primary concern is manipulation.
- Two criteria: institutional integrity and statistical integrity.
- Both criteria must be met to pass this standard without reservations; one criterion must be met to pass with reservations.

2. Attrition
- The primary concern is nonresponse bias.
- The standard is almost the same as the WWC RCT standard on attrition.
- One difference: overall attrition and differential attrition can be reported either at the cutoff value of the forcing variable or for the full treatment and comparison samples.
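The two attrition quantities are simple shares of the assigned sample. The sketch below is illustrative; the function name and the sample sizes in the example are invented, and the WWC's acceptable-attrition boundaries (which depend on the review protocol) are not encoded here.

```python
def attrition_rates(n_treat_assigned, n_treat_analyzed,
                    n_comp_assigned, n_comp_analyzed):
    """Overall attrition: share of the full assigned sample lost.
    Differential attrition: gap between treatment and comparison attrition."""
    treat_attr = 1 - n_treat_analyzed / n_treat_assigned
    comp_attr = 1 - n_comp_analyzed / n_comp_assigned
    overall = 1 - (n_treat_analyzed + n_comp_analyzed) / (
        n_treat_assigned + n_comp_assigned)
    differential = abs(treat_attr - comp_attr)
    return overall, differential

# Hypothetical example: 100 assigned per group; 90 and 80 analyzed.
overall, diff = attrition_rates(100, 90, 100, 80)
print(round(overall, 2), round(diff, 2))  # 0.15 0.1
```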

3. Continuity of the Outcome-Forcing Variable Relationship
- The primary concern is phantom impacts from a lack of smoothness in the relationship between the outcome and the forcing variable.
- Two criteria: baseline equivalence on key covariates, and no evidence of unexplained discontinuities away from the cutoff value.
- Both criteria must be met to pass this standard without reservations.

4. Functional Form and Bandwidth
The primary concern is misspecification bias. Five criteria:
A. An adjustment must be made for the forcing variable.
B. A graphical analysis must be included and should be consistent with the bandwidth/functional form choice.
C. Statistical evidence must be provided of an appropriate bandwidth or functional form.
D. The model should interact the forcing variable with treatment status, or evidence must be provided that an interaction is not needed.
E. All of the above must be done for every combination of outcome, forcing variable, and cutoff value.
All criteria must be met to pass this standard without reservations; to meet it with reservations, a study must satisfy A and D, plus either B or C.
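The decision rule for this standard is pure Boolean logic, so it can be written down directly. A sketch of the slide's rule, with invented argument names; criterion E (that A-D hold for every outcome/forcing variable/cutoff combination) is noted in the docstring but assumed to be folded into the four flags.

```python
def functional_form_rating(a_adjusts_for_forcing, b_graphical_analysis,
                           c_statistical_evidence, d_interaction_handled):
    """Rate the functional form and bandwidth standard.
    Without reservations: all criteria met (per criterion E, for every
    combination of outcome, forcing variable, and cutoff value).
    With reservations: A and D, plus B or C."""
    if all([a_adjusts_for_forcing, b_graphical_analysis,
            c_statistical_evidence, d_interaction_handled]):
        return "without reservations"
    if a_adjusts_for_forcing and d_interaction_handled and (
            b_graphical_analysis or c_statistical_evidence):
        return "with reservations"
    return "does not meet"
```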

Example of Graphical Analysis
[Figure omitted.]

Applying the RDD Evidence Standards
- To meet standards without reservations, a study must meet each individual standard without reservations.
- To meet standards with reservations, a study must meet individual standards 1, 3, and 4, with or without reservations.
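The aggregation of the four individual ratings into an overall designation can be sketched as below. The sketch transcribes the slide's rule as stated; note that standard 2 (attrition) does not enter the with-reservations condition on the slide. The function name and rating strings are invented for the example.

```python
def overall_rdd_rating(ratings):
    """ratings: dict mapping standard number (1-4) to 'without reservations',
    'with reservations', or 'does not meet'."""
    if all(r == "without reservations" for r in ratings.values()):
        return "meets standards without reservations"
    # Per the slide: standards 1, 3, and 4 must be met (with or without
    # reservations) for the overall with-reservations designation.
    if all(ratings[k] in ("without reservations", "with reservations")
           for k in (1, 3, 4)):
        return "meets standards with reservations"
    return "does not meet standards"
```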

Reporting Requirement for Standard Errors
- Standard errors must reflect clustering at the unit of assignment.
- The Lee and Card (2008) clustering issue applies when the forcing variable is not truly continuous ("random misspecification error").
- This requirement does not affect whether a study meets standards, but it does affect how its findings can be used.

Overview
- Defining features of SCDs
- Common designs
- Internal validity and replication
- WWC SCD design standards
- WWC SCD evidence standards (visual analysis)

Single Case Designs (SCDs)
- Involve repeated, systematic measurement of a dependent variable.
- Measurement occurs before, during, and/or after the active manipulation of an independent variable (e.g., applying an intervention).
- Most often used in applied and clinical fields.
- A systematic review that includes SCDs will speak to the effectiveness of an intervention in a variety of specific settings, but cannot speak to effectiveness in all settings.

Contributions of the SCD Standards
- Expand the body of rigorous research that can be reviewed by the WWC.
- Offer guidelines to the research community at large on best practices in SCDs.

Rationale for Adding SCD Standards

WWC Topic Area              SCD Studies Planned for Review by June 2011
Adolescent Literacy         20
Autism                      190
Behavior Disorders          61
Intellectual Disabilities   159
Learning Disabilities       13

Single Case Design (SCD) Standards Panel
- Tom Kratochwill, Chair, University of Wisconsin
- John Hitchcock, Ohio University
- Rob Horner, University of Oregon
- Joel Levin, University of Arizona
- Sam Odom, University of North Carolina
- David Rindskopf, City University of New York
- Will Shadish, University of California, Merced

Example SCD Graph
[Figure omitted. Source: Fraenkel and Wallen, 2006]

Defining Features of an SCD
- An individual case is the unit of intervention and data analysis. A case may be a single participant or a cluster of participants (e.g., a classroom or community).
- Within the design, the case provides its own control for purposes of comparison.
- The outcome variable is measured repeatedly within and across different conditions or levels of the independent variable.
- The ratio of data points (measures) to the number of cases is usually large, which distinguishes SCDs from other longitudinal designs (e.g., traditional pretest-posttest and general repeated-measures designs).

Core Designs in the WWC SCD Standards
- Reversal/withdrawal (ABAB) designs
- Multiple baseline designs
- Alternating treatment designs

Reversal/Withdrawal Design (ABAB)
[Figure: the dependent variable plotted over days/weeks/months/sessions across four phases: baseline (A1), intervention (B1), baseline (A2), intervention (B2).]

Multiple Baseline Design
[Figure: the dependent variable plotted over days/weeks/months/sessions for four participants, each with a baseline phase followed by an intervention phase.]

Types of Questions an Individual SCD Might Answer
- Which intervention is effective for this case?
- Does Intervention A reduce problem behavior for this case?
- Does adding B to Intervention A further reduce problem behavior for this case?
- Is Intervention B or Intervention C more effective in reducing problem behavior for this case?

The 5-3-20 Rule
- A minimum of five SCD studies examining the intervention with a design that Meets Evidence Standards or Meets Evidence Standards with Reservations.
- The SCD studies must be conducted by at least three research teams at three different institutions with no overlapping authorship.
- The combined number of cases (e.g., individual participants, classrooms) totals at least 20.
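The three counts in the rule can be checked mechanically over a list of qualifying studies. A sketch with an invented data shape; the no-overlapping-authorship condition is not modeled here (it would require comparing full author lists across studies).

```python
def meets_5_3_20(studies):
    """studies: list of dicts with 'team', 'institution', and 'n_cases' for
    each SCD study that meets standards (with or without reservations)."""
    enough_studies = len(studies) >= 5
    enough_teams = len({s["team"] for s in studies}) >= 3
    enough_sites = len({s["institution"] for s in studies}) >= 3
    enough_cases = sum(s["n_cases"] for s in studies) >= 20
    return enough_studies and enough_teams and enough_sites and enough_cases

# Hypothetical example: 5 studies from 3 teams/institutions, 4 cases each.
studies = [{"team": f"T{i % 3}", "institution": f"I{i % 3}", "n_cases": 4}
           for i in range(5)]
print(meets_5_3_20(studies))      # True
print(meets_5_3_20(studies[:4]))  # False (only 4 studies, 16 cases)
```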

SCD Standards Are Designed to Mitigate Threats to Internal Validity
- Ambiguous temporal precedence
- Selection
- History
- Testing
- Instrumentation
- Additive and interactive effects of threats
- Maturation
- Regression to the mean

Importance of Replication
[Figure: ABAB (peer tutoring) and multiple baseline graphs annotated with the first, second, and third demonstrations of effect. Source: Horner and Spaulding, 2008]
Methods of replication:
1. Introduction and withdrawal (i.e., reversal) of the independent variable (e.g., ABAB design)
2. Staggered introduction of the independent variable across different points in time (e.g., multiple baseline design)
3. Iterative manipulation of the independent variable across different observational phases (e.g., alternating treatments design)

WWC SCD Rating Process
1. Evaluate the design: Meets Evidence Standards, Meets Evidence Standards with Reservations, or Does Not Meet Evidence Standards.
2. Conduct visual analysis for each outcome variable: Strong Evidence, Moderate Evidence, or No Evidence.
3. Effect-size estimation.

Systematic Manipulation of the Independent Variable
- The independent variable (i.e., the intervention) must be systematically manipulated, with the researcher determining when and how the independent variable conditions change.
- If this standard is not met, the study Does Not Meet Evidence Standards.

Inter-Assessor Agreement
- Each outcome variable must be measured systematically over time by more than one assessor.
- The study needs to collect inter-assessor agreement in each phase and on at least 20% of the data points in each condition (e.g., baseline, intervention), and the inter-assessor agreement must meet minimal thresholds (e.g., 80% agreement, or 0.60 for Cohen's kappa).
- Regardless of the statistic, inter-assessor agreement must be assessed for each case on each outcome variable.
- If no outcomes meet these criteria, the study Does Not Meet Evidence Standards.
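Both example statistics named on the slide are standard computations. A minimal sketch of percent agreement and Cohen's kappa for two assessors coding the same observation intervals; the function names and the example codes are invented, and real reviews would compute these per case, per phase, and per outcome.

```python
def percent_agreement(codes_a, codes_b):
    """Share of intervals on which the two assessors assigned the same code."""
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: agreement corrected for chance, based on each
    assessor's marginal coding frequencies."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    categories = set(codes_a) | set(codes_b)
    expected = sum((codes_a.count(c) / n) * (codes_b.count(c) / n)
                   for c in categories)
    return (observed - expected) / (1 - expected)

# Hypothetical example: two assessors, 10 intervals, 1 = behavior observed.
a = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
b = [1, 1, 0, 1, 1, 1, 1, 1, 0, 1]
print(percent_agreement(a, b))        # 0.9 (meets the 80% example threshold)
print(round(cohens_kappa(a, b), 3))   # 0.737 (above the 0.60 example threshold)
```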

At Least Three Attempts to Demonstrate an Effect
- The study must include at least three attempts to demonstrate an intervention effect at three different points in time or with three different phase repetitions. If this standard is not met, the study Does Not Meet Evidence Standards.
- Designs that could meet this standard: ABAB design; multiple baseline design with at least 3 baseline conditions; alternating treatment design.
- Designs that do not meet this standard: AB design; ABA design; BAB design.

Minimum Data Points for a Reversal/Withdrawal Design
- For a phase to qualify as an attempt to demonstrate an effect, it must have a minimum of 3 data points.
- To Meet Standards, a reversal/withdrawal design must have a minimum of 4 phases per case with at least 5 data points per phase.
- To Meet Standards with Reservations, a reversal/withdrawal design must have a minimum of 4 phases per case with at least 3 data points per phase.
- Any phase based on fewer than 3 data points cannot be used to demonstrate the existence or lack of an effect.

Minimum Data Points for a Multiple Baseline Design
- For a phase to qualify as an attempt to demonstrate an effect, it must have a minimum of 3 data points.
- To Meet Standards, a multiple baseline design must have a minimum of 6 phases with at least 5 data points per phase.
- To Meet Standards with Reservations, a multiple baseline design must have a minimum of 6 phases with at least 3 data points per phase.
- Any phase based on fewer than 3 data points cannot be used to demonstrate the existence or lack of an effect.
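The phase-count and data-point minimums for both designs reduce to a small rating function. A simplified sketch with invented names: it treats any phase with fewer than 3 data points as failing the design outright, whereas the standards more precisely say such a phase cannot be used to demonstrate an effect.

```python
def rate_phase_counts(design, phases):
    """phases: list of data-point counts, one per phase.
    Reversal/withdrawal needs >= 4 phases; multiple baseline needs >= 6.
    All phases need >= 5 points to Meet Standards, >= 3 to Meet Standards
    with Reservations (a simplification of the WWC rules)."""
    min_phases = 4 if design == "reversal" else 6
    if len(phases) < min_phases or any(p < 3 for p in phases):
        return "does not meet"
    if all(p >= 5 for p in phases):
        return "meets standards"
    return "meets standards with reservations"

print(rate_phase_counts("reversal", [5, 5, 5, 5]))  # meets standards
print(rate_phase_counts("reversal", [5, 3, 4, 5]))  # meets standards with reservations
```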

Meets Standards
[Figure omitted. Source: Lee Kern et al., 1994]

Meets Standards with Reservations as ABABs
[Figure omitted. Source: Stahr et al., 2006]

Applying Visual Analysis
- Goal: determine whether the evidence demonstrates a basic effect at three different points in time.
- A basic effect is a change in the dependent variable when the independent variable is manipulated.
- Assessment of basic effects involves simultaneous evaluation of several features: level, trend, variability, overlap, immediacy, consistency across similar phases, and baseline stability in non-intervened series.
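One of these features, overlap, has a simple quantitative echo: the share of intervention data points that fall outside the baseline range (similar in spirit to percentage-of-nonoverlapping-data metrics). The sketch below is purely illustrative and is not the WWC's visual-analysis procedure, which is a judgment across all the features at once; the function name and data are invented.

```python
def nonoverlap_share(baseline, intervention, expect_increase=True):
    """Share of intervention data points outside the baseline range:
    above the baseline maximum if the intervention should raise the
    behavior, below the baseline minimum if it should reduce it."""
    if expect_increase:
        threshold = max(baseline)
        outside = [x for x in intervention if x > threshold]
    else:
        threshold = min(baseline)
        outside = [x for x in intervention if x < threshold]
    return len(outside) / len(intervention)

# Hypothetical data: 4 of 5 intervention points exceed the baseline maximum.
print(nonoverlap_share([2, 3, 3, 4], [5, 6, 4, 7, 8]))  # 0.8
```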

Level
[Figure: example graph illustrating a change in level between phases.]

Trend
[Figure: example graph illustrating a change in trend between phases.]

Variability
[Figure: example graph illustrating a change in variability between phases.]

Overlap
[Figure: example graph illustrating overlap between phases.]

Immediacy of Effect
[Figure: example graph illustrating the immediacy of an effect at a phase change.]

SCD Evidence Ratings
- Applied to SCDs that Meet Evidence Standards (with or without reservations).
- Rating algorithm:
  - Strong: three demonstrations of a basic effect and no noneffects
  - Moderate: three demonstrations of a basic effect and at least one noneffect
  - No Evidence: fewer than three demonstrations of a basic effect
- Additional requirement for multiple baseline designs: stability in baseline patterns when the intervention is introduced in an earlier series (described later).
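The rating algorithm above is a direct three-way classification. A minimal sketch, assuming the demonstrations and noneffects have already been counted through visual analysis; the function name is invented, and the extra multiple baseline stability requirement is not modeled.

```python
def scd_evidence_rating(n_effects, n_noneffects):
    """Strong: >= 3 demonstrations of a basic effect and no noneffects.
    Moderate: >= 3 demonstrations and at least one noneffect.
    No Evidence: fewer than 3 demonstrations."""
    if n_effects < 3:
        return "No Evidence"
    return "Strong" if n_noneffects == 0 else "Moderate"

print(scd_evidence_rating(3, 0))  # Strong
print(scd_evidence_rating(3, 1))  # Moderate
print(scd_evidence_rating(2, 0))  # No Evidence
```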

4 Steps in WWC Visual Analysis
1. Evaluate the baseline
2. Compare data within each phase
3. Compare data in similar phases
4. Compare data in adjacent phases

Evaluate the Baseline
Do the baseline data document:
- a pattern of behavior that demonstrates the problem the intervention is designed to address; and
- a pattern that would be expected to continue if no intervention were implemented?
If yes, proceed. Otherwise, the outcome receives a No Evidence rating.

Baseline and Within Phases
[Figure: example graphs showing baseline and Intervention X phases.]

Similar Phases and Adjacent Phases
[Figure: example graph annotated with the first, second, and third demonstrations of effect.]

Example: No Evidence and Strong Evidence
[Figure omitted. Source: Stahr et al., 2006]

Additional Consideration for Multiple Baselines
- Stability in non-intervened series
- Looking for carry-over effects
- The highest possible rating is Moderate if there is evidence of instability
[Figure omitted. Source: Lohrmann and Talerico, 2004]