A note on evaluating Supplemental Instruction


Journal of Peer Learning, Volume 8, Article 2 (2015)

Alfredo R. Paloyo, University of Wollongong, apaloyo@uow.edu.au

The author would like to thank the editor, Jacques van der Meer, and two anonymous reviewers, whose suggestions to improve this manuscript have been incorporated in the final version. The author is a fellow of the RWI, IZA, the ARC's LCC, and a member of UOW's CHSCR.

Recommended citation: Paloyo, Alfredo R. (2015). A note on evaluating Supplemental Instruction. Journal of Peer Learning, 8, 1-4. Available at: http://ro.uow.edu.au/ajpl/vol8/iss1/2


ABSTRACT

Selection bias pervades the evaluation of Supplemental Instruction (SI) in non-experimental settings. This brief note provides a formal framework for understanding this issue. The objective is to contribute to the accumulation of credible evidence on the impact of SI.

INTRODUCTION

In a recent systematic review on the effectiveness of Peer-Assisted Study Sessions (PASS) or Supplemental Instruction (SI), Dawson, van der Meer, Skalicky, and Cowley (2014, p. 609) concluded that this kind of academic support is correlated with higher mean grades, lower failure and withdrawal rates, and higher retention and graduation rates. Their conclusion is based on the body of evidence on SI collected in recent years.

The purpose of this brief note is to provide a formal framework to demonstrate the problem of selection bias, which pervades the majority of research on SI, including some of the published research that Dawson et al. (2014) cite [1]. The intended audience is educational researchers who wish to conduct their own evaluations or to understand the weaknesses of the existing evidence. The ultimate objective is to contribute to the effort to accumulate credible evidence on the impact of SI on a number of interesting outcomes, including not just final marks but perhaps also non-traditional outcomes such as lecture attendance and student satisfaction.

[1] An annotated bibliography on peer learning outcomes is available from http://z.umn.edu/peerbib.

THE EVALUATION PROBLEM

Naive impact evaluation typically involves a comparison of observed mean outcomes between those who received the treatment and those who did not. In the present context, for example, the mean final marks of SI participants and non-participants may be obtained, and the difference between the two used as an estimate of the impact of SI. Unfortunately, this approach does not take into account the fact that participation in SI is typically a voluntary decision and, as such, is influenced by individual characteristics, observed and (crucially) unobserved to the program evaluator, that may also contribute to the final mark. The typical example is innate but unobserved motivation or ability, which may influence both the student's likelihood of participating in SI and her final mark. This confounds the estimate of the program impact (i.e., the exclusive impact of SI) obtained from a simple comparison of means. We refer to this confounding effect as self-selection bias.

To formalise ideas, suppose one is interested in the impact of SI on final marks. The following discussion is primarily based on Angrist and Pischke (2009), but one may also refer to Holland (1986) for an earlier treatment from a statistics perspective. Let $y_i$ denote the outcome for individual $i$, and let $d_i$ denote a binary indicator variable that equals 1 if individual $i$ received SI. Before the receipt of SI, an individual has two potential outcomes, $y_i(0)$ and $y_i(1)$, representing the potential outcomes without SI and with SI, respectively. However, after the delivery of SI, we only observe

$$y_i = y_i(d_i) = y_i(0)(1 - d_i) + y_i(1)\, d_i.$$

That is, only one of the potential outcomes can ever be realised. This is the well-known fundamental problem of causal inference (Holland, 1986), which arises because we are unable to observe the counterfactual situation for any single individual.

In terms of conditional expectations, adding and subtracting the unobserved counterfactual mean $E[y_i(0) \mid d_i = 1]$ shows that

$$E[y_i \mid d_i = 1] - E[y_i \mid d_i = 0] = \{E[y_i(1) \mid d_i = 1] - E[y_i(0) \mid d_i = 1]\} + \{E[y_i(0) \mid d_i = 1] - E[y_i(0) \mid d_i = 0]\}. \tag{1}$$

As Angrist and Pischke (2009) explain, the difference in outcomes between those who received SI and those who did not consists of two components. First, there is the impact of SI on those who actually received SI. This is the first pair of terms inside braces (note the conditioning on $d_i = 1$), usually called the average treatment effect on the treated. Second, there is the selection-bias term, which is the pair contained in the second braces.

In the context of evaluating the impact of SI on student outcomes, we expect the selection-bias term to be nonzero, implying that the observed difference in, say, final marks is not equal to the effect of SI because it is contaminated by self-selection. Good final marks can be expected from motivated students, but motivation is also positively correlated with participation in SI. This implies that the inequality $E[y_i(0) \mid d_i = 1] > E[y_i(0) \mid d_i = 0]$ holds; that is, the bias term is positive. In other words, without taking selection into account, one would overestimate the impact of SI using a basic comparison of mean outcomes between participants and non-participants [2].

One way to ensure that the selection bias is actually zero is to randomise the provision of SI to the students. In that case, the potential final marks would be independent of treatment status: by design, the researcher eliminates the selection bias in Equation (1) through random assignment. This means that participation in SI is no longer correlated with individual observed and unobserved characteristics, which explains why randomised controlled trials still constitute the gold standard for impact evaluation [3].

[2] That motivation should be controlled for is highlighted in a number of previous studies (e.g., Gattis, 2002).

[3] In this setting, one does not formally need additional control variables. However, their inclusion could greatly reduce the sampling error, resulting in more precise estimates (i.e., lower standard errors). Naturally, increasing the sample size will also increase the precision of the estimates.
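To make Equation (1) concrete, the following is a minimal simulation sketch in Python. Everything in it is an illustrative assumption rather than an estimate from any actual SI program: a true effect of SI of 5 marks, marks driven by unobserved motivation, and more motivated students selecting into SI. The sketch shows the naive difference in means splitting into the ATT plus a positive selection-bias term, and the bias vanishing under random assignment.

```python
# Illustrative simulation of Equation (1); all parameter values are
# assumptions for demonstration, not estimates from real SI data.
import numpy as np

rng = np.random.default_rng(seed=42)
n = 100_000

# Unobserved motivation drives both potential marks and SI take-up.
motivation = rng.normal(0.0, 1.0, size=n)

# Potential final marks: y0 without SI, y1 with SI.
# The assumed true effect of SI is 5 marks for everyone.
y0 = 60 + 8 * motivation + rng.normal(0.0, 5.0, size=n)
y1 = y0 + 5

# Self-selection: more motivated students are more likely to attend SI.
p_attend = 1.0 / (1.0 + np.exp(-motivation))
d = rng.uniform(size=n) < p_attend

# Only one potential outcome is ever observed per student.
y = np.where(d, y1, y0)

naive = y[d].mean() - y[~d].mean()    # E[y | d=1] - E[y | d=0]
att = (y1 - y0)[d].mean()             # E[y(1) - y(0) | d=1]
bias = y0[d].mean() - y0[~d].mean()   # E[y(0) | d=1] - E[y(0) | d=0]
print(f"naive = {naive:.2f}, ATT = {att:.2f}, selection bias = {bias:.2f}")

# Under random assignment, take-up is independent of motivation, so the
# bias term is zero and the naive difference recovers the true effect.
d_rand = rng.uniform(size=n) < 0.5
y_rand = np.where(d_rand, y1, y0)
print(f"randomised difference = "
      f"{y_rand[d_rand].mean() - y_rand[~d_rand].mean():.2f}")
```

With these assumed parameters, the naive comparison overstates the 5-mark effect considerably, exactly as the positive bias term in Equation (1) predicts.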

However, controlled trials are difficult to implement, especially outside the clinical or laboratory setting. Unlike experiments on bacteria in a petri dish, social experiments involve major ethical and practical considerations. Consider, for instance, the fact that there is some evidence that SI can improve student outcomes. It would be difficult to ethically justify depriving a random group of students of access to SI simply because we want to evaluate its impact.

Nonetheless, there are a number of quasi-experimental approaches that still provide credible impact estimates under certain conditions. Examples of quasi-experimental approaches are instrumental-variables estimation, difference-in-differences, and regression-discontinuity designs. In the context of evaluating SI, these methods are particularly useful, especially since a randomised experiment may not be possible for ethical or practical reasons [4]. A minimal sketch of one such design follows below.

[4] The most important non-experimental evaluation techniques are discussed in, for example, Gertler et al. (2011), Imbens and Wooldridge (2009), TenHave et al. (2003), and West et al. (2008). A detailed discussion of these methods is beyond the scope of this brief note. We refer the interested reader to the references provided.
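As an illustration of one of these designs, here is a difference-in-differences sketch on simulated data. The setting is hypothetical: marks are observed for a cohort with access to SI and a cohort without, both before and after SI is rolled out, and the identifying parallel-trends assumption is built into the data-generating process by construction.

```python
# Hypothetical difference-in-differences (DiD) example with simulated
# data; the cohort gap, time trend, and true effect are assumptions.
import numpy as np

rng = np.random.default_rng(seed=7)
n = 50_000

treated = rng.uniform(size=n) < 0.5   # cohort with access to SI
post = rng.uniform(size=n) < 0.5      # observed after the rollout

# Marks: a permanent 3-mark cohort gap (even without SI), a common
# +2 time trend, and a true 5-mark SI effect that operates only for
# the treated cohort after the rollout.
y = (60.0
     + 3.0 * treated
     + 2.0 * post
     + 5.0 * (treated & post)
     + rng.normal(0.0, 5.0, size=n))

def group_mean(outcome, mask):
    """Mean outcome within one treatment/period cell."""
    return outcome[mask].mean()

# DiD: the change over time in the treated cohort minus the change in
# the control cohort. Differencing removes both the cohort gap and the
# common trend, provided the parallel-trends assumption holds.
did = ((group_mean(y, treated & post) - group_mean(y, treated & ~post))
       - (group_mean(y, ~treated & post) - group_mean(y, ~treated & ~post)))
print(f"DiD estimate = {did:.2f} (true effect = 5)")

# A naive post-period comparison mixes in the cohort gap (3 + 5 = 8).
naive = group_mean(y, treated & post) - group_mean(y, ~treated & post)
print(f"naive post-only comparison = {naive:.2f}")
```

The credibility of such an estimate hinges entirely on the parallel-trends assumption, which is guaranteed here by construction but must be argued for in any real evaluation of SI.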

CONCLUSION

The evaluation of PASS or SI based on experimentally generated data is rare. The majority of the literature on the topic relies on evidence obtained from non-experimental approaches that fail to account for the presence of self-selection bias. This note discusses how this bias causes problems in impact evaluation. The hope is that education researchers, especially those interested in estimating the impact of SI, can use this note to justify the use of experimental or quasi-experimental methods and to be critical of weak evidence. Ultimately, this will enable education researchers to contribute to a larger body of credible evidence on the impact of SI on a number of interesting outcomes.

REFERENCES

Angrist, J. D., & Pischke, J.-S. (2009). Mostly harmless econometrics: An empiricist's companion. Princeton, NJ: Princeton University Press.

Dawson, P., van der Meer, J., Skalicky, J., & Cowley, K. (2014). On the effectiveness of supplemental instruction: A systematic review of Supplemental Instruction and Peer-Assisted Study Sessions literature between 2001 and 2010. Review of Educational Research, 84(4), 609-639.

Gattis, K. W. (2002). Responding to self-selection bias in assessments of academic support programs: A motivational control study of Supplemental Instruction. Learning Assistance Review, 7(2), 26-36.

Gertler, P. J., Martinez, S., Premand, P., Rawlings, L. B., & Vermeersch, C. M. J. (2011). Impact evaluation in practice. Washington, DC: The International Bank for Reconstruction and Development/The World Bank.

Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945-960.

Imbens, G. W., & Wooldridge, J. M. (2009). Recent developments in the econometrics of program evaluation. Journal of Economic Literature, 47(1), 5-86.

TenHave, T. R., Coyne, J., Salzer, M., & Katz, I. (2003). Research to improve the quality of care for depression: Alternatives to the simple randomized clinical trial. General Hospital Psychiatry, 25(2), 115-123.

West, S. G., Duan, N., Pequegnat, W., Gaist, P., Des Jarlais, D. C., Holtgrave, D., & Dolan Mullen, P. (2008). Alternatives to the randomized controlled trial. American Journal of Public Health, 98(8), 1359-1366.