Evaluating Social Programs Course: Evaluation Glossary (Sources: 3ie and The World Bank)

Attribution
The extent to which the observed change in outcome is the result of the intervention, having allowed for all other factors which may also affect the outcome(s) of interest.

Attrition
Either the dropout of subjects from the sample during the intervention, or the failure to collect data from a subject in subsequent rounds of data collection. Either form of attrition can result in biased impact estimates.

Baseline
Pre-intervention, ex ante. The situation prior to an intervention, against which progress can be assessed or comparisons made. Baseline data are collected before a program or policy is implemented to assess the pre-program state.

Bias
The extent to which the estimate of impact differs from the true value as a result of problems in the evaluation or sample design.

Cluster
A cluster is a group of subjects that are similar in one way or another. For example, in a sampling of school children, children who attend the same school would belong to a cluster, because they share the same school facilities and teachers and live in the same neighborhood.

Cluster sample
A sample obtained by drawing a random sample of clusters, after which either all subjects in the selected clusters constitute the sample or a number of subjects within each selected cluster is randomly drawn.

Comparison group
A group of individuals whose characteristics are similar to those of the treatment group (or participants) but who do not receive the intervention. Comparison groups are used to approximate the counterfactual. In a randomized evaluation, where the evaluator can ensure that no confounding factors affect the comparison group, it is called a control group.

Confidence level
The level of certainty that the true value of impact (or any other statistical estimate) will fall within a specified range.
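To illustrate the two-stage cluster sample described above, here is a minimal sketch in Python's standard library. The schools, pupils, and sample sizes are hypothetical, chosen only for illustration.

```python
import random

random.seed(42)  # fixed seed so the draw is reproducible

# Hypothetical sampling frame: 10 schools (clusters), each with 20 pupils.
frame = {f"school_{s}": [f"pupil_{s}_{p}" for p in range(20)] for s in range(10)}

# Stage 1: draw a random sample of clusters (here, 3 schools).
sampled_schools = random.sample(sorted(frame), k=3)

# Stage 2, option A: ALL subjects in the selected clusters constitute the sample.
all_pupils = [p for s in sampled_schools for p in frame[s]]

# Stage 2, option B: randomly draw a number of subjects within each selected cluster.
sub_pupils = [p for s in sampled_schools for p in random.sample(frame[s], k=5)]

print(len(all_pupils), len(sub_pupils))  # 60 15
```

Either option yields a valid cluster sample; option B is cheaper to survey, but intra-cluster correlation means each additional pupil from the same school adds less new information than an independently drawn pupil would.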
Confounding factors
Other variables or determinants that affect the outcome of interest.

Contamination
When members of the control group are affected by either the intervention (see spillover effects) or another intervention that also affects the outcome of interest. Contamination is a common problem, as there are multiple development interventions in most communities.

Cost-effectiveness
An analysis of the cost of achieving a one-unit change in the outcome. The advantage compared to cost-benefit analysis is that the (often controversial) valuation of the outcome is avoided. Can be used to compare the relative efficiency of programs in achieving the outcome of interest.

Counterfactual
The counterfactual is an estimate of what the outcome would have been for a program participant in the absence of the program. By definition, the counterfactual cannot be observed. Therefore it must be estimated using comparison groups.

Dependent variable
A variable believed to be predicted or caused by one or more other variables (independent variables). The term is commonly used in regression analysis.

Difference-in-differences (also known as double difference or D-in-D)
The difference between the change in the outcome in the treatment group and the equivalent change in the comparison group. This method allows us to take into account any differences between the treatment and comparison groups that are constant over time. The two differences are thus before and after, and between the treatment and comparison groups.

Evaluation
Evaluations are periodic, objective assessments of a planned, ongoing, or completed project, program, or policy. Evaluations are used to answer specific questions, often related to design, implementation, and/or results.

Ex ante evaluation design
An impact evaluation design prepared before the intervention takes place. Ex ante designs are stronger than ex post evaluation designs because of the possibility of considering random assignment, and the collection of baseline data from both treatment and control groups. Also called prospective evaluation.

Ex post evaluation design
An impact evaluation design prepared once the intervention has started, and possibly been completed. Unless the program was randomly assigned, a quasi-experimental design has to be used.
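The difference-in-differences estimate defined earlier can be computed directly from four group means. A minimal numeric sketch, using hypothetical outcome values (e.g., test scores):

```python
from statistics import mean

# Hypothetical outcomes for treatment and comparison groups,
# measured at baseline ("before") and at follow-up ("after").
treat_before = [50, 52, 48, 51]
treat_after  = [60, 63, 58, 61]
comp_before  = [49, 51, 50, 50]
comp_after   = [53, 55, 54, 54]

# First difference: change over time within each group (before vs. after).
treat_change = mean(treat_after) - mean(treat_before)  # 10.25
comp_change  = mean(comp_after) - mean(comp_before)    # 4.0

# Second difference: treatment change net of the comparison change.
did = treat_change - comp_change
print(did)  # 6.25
```

Note that the comparison group also improved (by 4.0), so a naive pre-post comparison of the treatment group alone (10.25) would overstate the impact; the double difference nets out that common time trend.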
External validity
The extent to which the causal impact discovered in the impact evaluation can be generalized to another time, place, or group of people. External validity increases when the evaluation sample is representative of the universe of eligible subjects.

Follow-up survey
Also known as a post-intervention or ex post survey. A survey that is administered after the program has started, once the beneficiaries have benefited from the program for some time. An evaluation can include several follow-up surveys.

Hawthorne effect
The Hawthorne effect occurs when the mere fact that subjects are being observed makes them behave differently.

Hypothesis
A specific statement regarding the relationship between two variables. In an impact evaluation, the hypothesis typically relates to the expected impact of the intervention on the outcome.

Impact
The effect of the intervention on the outcome for the beneficiary population.

Impact evaluation
An impact evaluation tries to make a causal link between a program or intervention and a set of outcomes. An impact evaluation tries to answer the question of whether a program is responsible for changes in the outcomes of interest. Contrast with process evaluation.

Independent variable
A variable believed to cause changes in the dependent variable, usually applied in regression analysis.

Indicator
An indicator is a variable that measures a phenomenon of interest to the evaluator. The phenomenon can be an input, an output, an outcome, or a characteristic.

Inputs
The financial, human, and material resources used for the development intervention.

Intention to treat (ITT) estimate
The average treatment effect calculated across the whole treatment group, regardless of whether they actually participated in the intervention or not. Compare to the treatment on the treated estimate.

Intra-cluster correlation
Intra-cluster correlation is correlation (or similarity) in outcomes or characteristics between subjects that belong to the same cluster. For example, children who attend the same school would typically be similar or correlated in terms of their area of residence or socio-economic background.

Logical model
Describes how a program should work, presenting the causal chain from inputs, through activities and outputs, to outcomes. While logical models present a theory about the expected program outcome, they do not demonstrate whether the program caused the observed outcome. A theory-based approach examines the assumptions underlying the links in the logical model.

John Henry effect
The John Henry effect happens when comparison subjects work harder to compensate for not being offered a treatment. When one compares treated units to those harder-working comparison units, the estimate of the impact of the program will be biased: we will estimate a smaller impact of the program than the true impact we would find if the comparison units did not make the additional effort.

Minimum desired effect
The minimum change in outcomes that would justify the investment that has been made in an intervention, accounting not only for the cost of the program and the type of benefits that it provides, but also for the opportunity cost of not having invested funds in an alternative intervention. The minimum desired effect is an input for power calculations: evaluation samples need to be large enough to detect at least the minimum desired effect with sufficient power.

Null hypothesis
A null hypothesis is a hypothesis that might be falsified on the basis of observed data. The null hypothesis typically proposes a general or default position. In evaluation, the default position is usually that there is no difference between the treatment and control groups, or in other words, that the intervention has no impact on outcomes.

Outcome
A variable that measures the impact of the intervention. Can be intermediate or final, depending on what it measures and when.

Output
The products and services that are produced (supplied) directly by an intervention.
Outputs may also include changes that result from the intervention which are relevant to the achievement of outcomes.

Power calculation
A calculation of the sample size required for the impact evaluation, which depends on the minimum effect size that we want to be able to detect (see minimum desired effect) and the required level of confidence.

Pre-post comparison
Also known as a before-and-after comparison. A pre-post comparison attempts to establish the impact of a program by tracking changes in outcomes for program beneficiaries over time, using measures both before and after the program or policy is implemented.

Process evaluation
A process evaluation is an evaluation that tries to establish the level of quality or success of the processes of a program. For example: the adequacy of the administrative processes, the acceptability of the program benefits, the clarity of the information campaign, the internal dynamics of implementing organizations, their policy instruments, their service delivery mechanisms, their management practices, and the linkages among these. Contrast with impact evaluation.

Quasi-experimental design
Impact evaluation designs that create a control group using statistical procedures. The intention is to ensure that the characteristics of the treatment and control groups are identical in all respects, other than the intervention, as would be the case in an experimental design.

Random assignment
An intervention design in which members of the eligible population are assigned at random to either the treatment group (receives the intervention) or the control group (does not receive the intervention). That is, whether someone is in the treatment or control group is solely a matter of chance, and not a function of any of their characteristics (either observed or unobserved).

Random sample
The best way to avoid a biased or unrepresentative sample is to select a random sample. A random sample is a probability sample in which each individual in the population being sampled has an equal chance (probability) of being selected.

Randomized evaluation (RE) (also known as randomized controlled trial, or RCT)
An impact evaluation design in which random assignment is used to allocate the intervention among members of the eligible population. Since there should be no correlation between participant characteristics and treatment assignment, differences in outcome between the treatment and control groups can be fully attributed to the intervention; that is, there is no selection bias. However, REs may be subject to several types of bias and so need to follow strict protocols. Also called experimental design.

Regression analysis
A statistical method which determines the association between the dependent variable and one or more independent variables.

Selection bias
A possible bias introduced into a study by the selection of different types of people into treatment and comparison groups. As a result, outcome differences may potentially be explained by pre-existing differences between the groups, rather than by the treatment itself.
Significance level
The significance level is usually denoted by the Greek symbol α (alpha). Popular levels of significance are 5% (0.05), 1% (0.01), and 0.1% (0.001). If a test of significance gives a p-value lower than the α level, the null hypothesis is rejected. Such results are informally referred to as 'statistically significant'. The lower the significance level, the stronger the evidence required. Choosing a level of significance is an arbitrary task, but for many applications a level of 5% is chosen, for no better reason than that it is conventional.

Spillover effects
When the intervention has an impact (either positive or negative) on units not in the treatment group. Ignoring spillover effects results in a biased impact estimate. If there are spillover effects, then the group of beneficiaries is larger than the group of participants.

Stratified sample
Obtained by dividing the population of interest (the sampling frame) into groups (for example, male and female), and then drawing a random sample within each group. A stratified sample is a probabilistic sample: every unit in each group (or stratum) has the same probability of being drawn.

Treatment group
The group of people, firms, facilities, or other subjects who receive the intervention. Also called participants.

Treatment on the treated (TOT) estimate
The treatment on the treated estimate is the impact (average treatment effect) only on those who actually received the intervention. Compare to intention to treat.

Unobservables
Characteristics which cannot be observed or measured. The presence of unobservables can cause selection bias in quasi-experimental designs.
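The relationship between the ITT and TOT estimates can be illustrated numerically. A minimal sketch with hypothetical data; the division of the ITT by the take-up rate is the standard Bloom adjustment, which is valid only when no one in the control group receives the intervention.

```python
from statistics import mean

# Hypothetical randomized evaluation: outcomes by ASSIGNED group.
treated_outcomes = [12, 14, 13, 15, 11, 10]  # assigned to treatment
control_outcomes = [10, 11, 9, 10, 11, 9]    # assigned to control
took_up = [True, True, True, True, False, False]  # actual participation

# Intention to treat: compare groups by assignment, ignoring actual take-up.
itt = mean(treated_outcomes) - mean(control_outcomes)  # 2.5

# Take-up (compliance) rate within the treatment group.
take_up_rate = sum(took_up) / len(took_up)  # 4/6

# Treatment on the treated, assuming non-participants were unaffected
# and no control units were treated (Bloom adjustment).
tot = itt / take_up_rate
print(round(itt, 2), round(tot, 2))  # 2.5 3.75
```

The ITT dilutes the effect across everyone assigned to treatment, including the two non-participants, which is why the TOT estimate is larger here. Naively comparing only participants to the control group would instead reintroduce selection bias, since take-up is rarely random.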