Evaluating Social Programs Course: Evaluation Glossary (Sources: 3ie and The World Bank)


Attribution
The extent to which the observed change in outcome is the result of the intervention, having allowed for all other factors which may also affect the outcome(s) of interest.

Attrition
Either the dropout of subjects from the sample during the intervention, or the failure to collect data from a subject in subsequent rounds of data collection. Either form of attrition can result in biased impact estimates.

Baseline
Pre-intervention, ex ante. The situation prior to an intervention, against which progress can be assessed or comparisons made. Baseline data are collected before a program or policy is implemented to assess the before state.

Bias
The extent to which the estimate of impact differs from the true value as a result of problems in the evaluation or sample design.

Cluster
A cluster is a group of subjects that are similar in one way or another. For example, in a sample of school children, children who attend the same school would belong to a cluster, because they share the same school facilities and teachers and live in the same neighborhood.

Cluster sample
A sample obtained by drawing a random sample of clusters, after which either all subjects in the selected clusters constitute the sample or a number of subjects within each selected cluster is randomly drawn (a short sketch follows this group of entries).

Comparison group
A group of individuals whose characteristics are similar to those of the treatment group (or participants) but who do not receive the intervention. Comparison groups are used to approximate the counterfactual. In a randomized evaluation, where the evaluator can ensure that no confounding factors affect the comparison group, it is called a control group.

Confidence level
The level of certainty that the true value of impact (or any other statistical estimate) will fall within a specified range.
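
To make the two-stage draw described under cluster sample concrete, here is a minimal Python sketch. The roster of pupils, the school identifiers, and the choice of five schools and ten pupils per school are all hypothetical, for illustration only:

    import random

    random.seed(42)  # fix the seed so the draw is reproducible

    # Hypothetical sampling frame: (pupil_id, school_id) pairs for 20 schools of 30 pupils.
    roster = [(f"pupil_{s}_{p}", s) for s in range(20) for p in range(30)]

    # Stage 1: draw a random sample of clusters (schools).
    schools = sorted({school_id for _, school_id in roster})
    sampled_schools = set(random.sample(schools, k=5))

    # Stage 2, variant (a): all subjects in the selected clusters form the sample...
    full_cluster_sample = [unit for unit in roster if unit[1] in sampled_schools]

    # ...or variant (b): randomly draw a fixed number of subjects within each selected cluster.
    subsample = []
    for school_id in sampled_schools:
        pupils = [unit for unit in roster if unit[1] == school_id]
        subsample.extend(random.sample(pupils, k=10))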

Confounding factors
Other variables or determinants that affect the outcome of interest.

Contamination
When members of the control group are affected by either the intervention (see spillover effects) or another intervention that also affects the outcome of interest. Contamination is a common problem, as there are multiple development interventions in most communities.

Cost-effectiveness
An analysis of the cost of achieving a one-unit change in the outcome. The advantage compared to cost-benefit analysis is that the (often controversial) valuation of the outcome is avoided. It can be used to compare the relative efficiency of programs in achieving the outcome of interest.

Counterfactual
The counterfactual is an estimate of what the outcome would have been for a program participant in the absence of the program. By definition, the counterfactual cannot be observed. Therefore it must be estimated using comparison groups.

Dependent variable
A variable believed to be predicted or caused by one or more other variables (independent variables). The term is commonly used in regression analysis.

Difference-in-differences (also known as double difference or D-in-D)
The difference between the change in the outcome in the treatment group and the equivalent change in the control group (a worked formula follows this group of entries). This method allows us to take into account any differences between the treatment and comparison groups that are constant over time. The two differences are thus before and after, and between the treatment and comparison groups.

Evaluation
Evaluations are periodic, objective assessments of a planned, ongoing, or completed project, program, or policy. Evaluations are used to answer specific questions, often related to design, implementation, and/or results.

Ex ante evaluation design
An impact evaluation design prepared before the intervention takes place. Ex ante designs are stronger than ex post evaluation designs because of the possibility of considering random assignment and collecting baseline data from both treatment and control groups. Also called prospective evaluation.

Ex post evaluation design
An impact evaluation design prepared once the intervention has started, and possibly been completed. Unless the program was randomly assigned, a quasi-experimental design has to be used.
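
The double difference can be written out explicitly; the numbers in the worked example below are hypothetical, chosen only to show the arithmetic:

    \widehat{DD} = (\bar{Y}_{T,\mathrm{post}} - \bar{Y}_{T,\mathrm{pre}}) - (\bar{Y}_{C,\mathrm{post}} - \bar{Y}_{C,\mathrm{pre}})

If mean test scores in the treatment group rise from 20 to 35 while mean scores in the comparison group rise from 22 to 27 over the same period, the estimated impact is (35 - 20) - (27 - 22) = 15 - 5 = 10 points, provided the two groups would have followed the same trend in the absence of the program.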

External validity
The extent to which the causal impact discovered in the impact evaluation can be generalized to another time, place, or group of people. External validity increases when the evaluation sample is representative of the universe of eligible subjects.

Follow-up survey
Also known as a post-intervention or ex post survey. A survey that is administered after the program has started, once the beneficiaries have benefited from the program for some time. An evaluation can include several follow-up surveys.

Hawthorne effect
The Hawthorne effect occurs when the mere fact that subjects are being observed makes them behave differently.

Hypothesis
A specific statement regarding the relationship between two variables. In an impact evaluation, the hypothesis typically relates to the expected impact of the intervention on the outcome.

Impact
The effect of the intervention on the outcome for the beneficiary population.

Impact evaluation
An impact evaluation tries to make a causal link between a program or intervention and a set of outcomes. An impact evaluation tries to answer the question of whether a program is responsible for changes in the outcomes of interest. Contrast with process evaluation.

Independent variable
A variable believed to cause changes in the dependent variable, usually applied in regression analysis.

Indicator
An indicator is a variable that measures a phenomenon of interest to the evaluator. The phenomenon can be an input, an output, an outcome, or a characteristic.

Inputs
The financial, human, and material resources used for the development intervention.

Intention to treat (ITT) estimate
The average treatment effect calculated across the whole treatment group, regardless of whether they actually participated in the intervention or not. Compare to treatment on the treated estimate.

Intra-cluster correlation
Intra-cluster correlation is correlation (or similarity) in outcomes or characteristics between subjects that belong to the same cluster. For example, children who attend the same school would typically be similar or correlated in terms of their area of residence or socio-economic background.
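
Intra-cluster correlation matters for sample size because it reduces the amount of independent information a cluster sample carries. A standard way to quantify this, not spelled out in the glossary itself, is the design effect for clusters of equal size m with intra-cluster correlation \rho:

    \mathrm{DEFF} = 1 + (m - 1)\rho

For instance, with 30 children per school and \rho = 0.10, DEFF = 1 + 29 x 0.10 = 3.9: the clustered design needs roughly 3.9 times as many subjects as a simple random sample to reach the same precision.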

Logical model
Describes how a program should work, presenting the causal chain from inputs, through activities and outputs, to outcomes. While logical models present a theory about the expected program outcome, they do not demonstrate whether the program caused the observed outcome. A theory-based approach examines the assumptions underlying the links in the logical model.

John Henry effect
The John Henry effect happens when comparison subjects work harder to compensate for not being offered the treatment. When treated units are compared to those harder-working comparison units, the estimate of the impact of the program will be biased: we will estimate a smaller impact of the program than the true impact we would find if the comparison units did not make the additional effort.

Minimum desired effect
The minimum change in outcomes that would justify the investment made in an intervention, accounting not only for the cost of the program and the type of benefits that it provides, but also for the opportunity cost of not having invested the funds in an alternative intervention. The minimum desired effect is an input for power calculations: evaluation samples need to be large enough to detect at least the minimum desired effect with sufficient power.

Null hypothesis
A null hypothesis is a hypothesis that might be falsified on the basis of observed data. The null hypothesis typically proposes a general or default position. In evaluation, the default position is usually that there is no difference between the treatment and control groups, or in other words, that the intervention has no impact on outcomes.

Outcome
A variable that measures the impact of the intervention. Can be intermediate or final, depending on what it measures and when.

Output
The products and services that are produced (supplied) directly by an intervention. Outputs may also include changes that result from the intervention and are relevant to the achievement of outcomes.

Power calculation
A calculation of the sample size required for the impact evaluation, which depends on the minimum effect size that we want to be able to detect (see minimum desired effect) and the required level of confidence (a sketch follows this group of entries).

Pre-post comparison
Also known as a before-and-after comparison. A pre-post comparison attempts to establish the impact of a program by tracking changes in outcomes for program beneficiaries over time, using measures from both before and after the program or policy is implemented.
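
A minimal sketch of a power calculation in Python using the statsmodels library; the standardized effect size of 0.2, the 5% significance level, and 80% power are illustrative inputs rather than values prescribed by the glossary:

    from statsmodels.stats.power import TTestIndPower

    # Solve for the sample size per arm needed to detect a standardized effect
    # of 0.2 standard deviations with alpha = 0.05 and power = 0.80 (two-sided test).
    analysis = TTestIndPower()
    n_per_arm = analysis.solve_power(effect_size=0.2, alpha=0.05, power=0.8,
                                     alternative='two-sided')
    print(round(n_per_arm))  # roughly 394 subjects per group

With a clustered design, this number would then be inflated by the design effect described under intra-cluster correlation.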

Process evaluation
A process evaluation is an evaluation that tries to establish the level of quality or success of the processes of a program: for example, the adequacy of the administrative processes, the acceptability of the program benefits, the clarity of the information campaign, the internal dynamics of implementing organizations, their policy instruments, their service delivery mechanisms, their management practices, and the linkages among these. Contrast with impact evaluation.

Quasi-experimental design
Impact evaluation designs that create a control group using statistical procedures. The intention is to ensure that the characteristics of the treatment and control groups are identical in all respects other than the intervention, as would be the case in an experimental design.

Random assignment
An intervention design in which members of the eligible population are assigned at random to either the treatment group (receives the intervention) or the control group (does not receive the intervention). That is, whether someone is in the treatment or control group is solely a matter of chance, and not a function of any of their characteristics, either observed or unobserved (a sketch follows this group of entries).

Random sample
The best way to avoid a biased or unrepresentative sample is to select a random sample. A random sample is a probability sample in which each individual in the population being sampled has an equal chance (probability) of being selected.

Randomized evaluation (RE) (also known as randomized controlled trial, or RCT)
An impact evaluation design in which random assignment is used to allocate the intervention among members of the eligible population. Since there should be no correlation between participant characteristics and treatment assignment, differences in outcomes between the treatment and control groups can be fully attributed to the intervention; that is, there is no selection bias. However, REs may be subject to several other types of bias and so need to follow strict protocols. Also called experimental design.

Regression analysis
A statistical method that determines the association between the dependent variable and one or more independent variables.

Selection bias
A possible bias introduced into a study by the selection of different types of people into treatment and comparison groups. As a result, outcome differences may potentially be explained by pre-existing differences between the groups rather than by the treatment itself.
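
A minimal sketch of random assignment in Python; the household identifiers and the 50/50 split are hypothetical, and real evaluations often stratify before randomizing:

    import random

    random.seed(2024)  # fix the seed so the assignment is reproducible and auditable

    eligible = [f"household_{i:03d}" for i in range(100)]  # hypothetical eligible population

    # Shuffle the eligible units, then split: the first half is assigned to
    # treatment and the remainder to control, purely by chance.
    random.shuffle(eligible)
    treatment = set(eligible[:50])
    assignment = {unit: ("treatment" if unit in treatment else "control")
                  for unit in eligible}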

Significance level
The significance level is usually denoted by the Greek letter α (alpha). Popular levels of significance are 5% (0.05), 1% (0.01), and 0.1% (0.001). If a test of significance gives a p-value lower than the α level, the null hypothesis is rejected. Such results are informally referred to as statistically significant. The lower the significance level, the stronger the evidence required. Choosing a level of significance is an arbitrary task, but for many applications a level of 5% is chosen, for no better reason than that it is conventional.

Spillover effects
When the intervention has an impact (either positive or negative) on units not in the treatment group. Ignoring spillover effects results in a biased impact estimate. If there are spillover effects, then the group of beneficiaries is larger than the group of participants.

Stratified sample
Obtained by dividing the population of interest (the sampling frame) into groups (for example, male and female) and then drawing a random sample within each group. A stratified sample is a probability sample: every unit in each group (or stratum) has the same probability of being drawn.

Treatment group
The group of people, firms, facilities, or other subjects that receive the intervention. Also called participants.

Treatment on the treated (TOT) estimate
The treatment on the treated estimate is the impact (average treatment effect) only on those who actually received the intervention. Compare to intention to treat.

Unobservables
Characteristics that cannot be observed or measured. The presence of unobservables can cause selection bias in quasi-experimental designs.
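
The ITT and TOT estimates defined in this glossary are linked by a simple rescaling when noncompliance is one-sided, that is, when no one in the control group obtains the treatment. This is the Bloom estimator, a standard result rather than something stated in the glossary:

    \widehat{\mathrm{TOT}} = \widehat{\mathrm{ITT}} / p,
    \quad \text{where } p = \Pr(\text{participates} \mid \text{assigned to treatment})

For example, if assignment to the program raises average outcomes by 4 points (the ITT) but only half of those assigned actually participate, the effect on participants is roughly 4 / 0.5 = 8 points.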