PM12 Validity
Prof. Dr. Pasquale Ruggiero, Department of Business and Law


Internal and External Validity. The concept of validity is very important in program evaluation (PE): to be useful, a PE must be believable. Validity usually concerns data collection, procedures, design and analysis; here we refer more specifically to the validity of the design. In judging the quality of a design, the main criteria are internal validity and external validity.

Internal validity. Internal validity refers to the certainty about cause-and-effect relationships: did the program cause the observed outcome? To use the words of Cronbach, internal validity refers to the trustworthiness of an inference. Internal validity concerns the conclusions regarding the subjects, time and context of the implemented PE.

Internal validity. A threat to internal validity refers precisely to those conclusions, that is, conclusions regarding the subjects, time and context of the implemented research. A threat to internal validity is an objection that the design employed allows the causal link between treatment and outcome to remain uncertain: the design is weak in some way and does not enable one to have confidence in one's conclusions about what the program actually accomplished for the subjects, time and context observed (Mohr, 1995).

Internal validity. To sum up, evaluators and stakeholders wish to have confidence that, when a PE shows significant results, the findings have been caused by the program itself and not by other activities and factors (alternative explanations).

Counterfactual situation. To understand the importance of the internal validity of the evaluation design, it is fundamental to introduce the concept of the counterfactual situation. In theory, to exclude alternative explanations when evaluating the impact of a program, it would be necessary to compare the observed outcome measure after the program (gross effect) with what would have appeared if the program had not been implemented (the counterfactual situation).

Counterfactual situation. Since the counterfactual situation did not actually happen ("the darkness of the unfulfilled"), the only possible alternative is to estimate it. There can be no design in quantitative program evaluation without an estimate of the counterfactual. When made empirically, this estimate always comes from (a) one or more previous time periods, or (b) a group of comparable subjects.

Counterfactual situation. If the measure after the program is P (normally a mean) and the counterfactual situation is C (normally a mean), then P - C is, essentially, what we need to know. P - C, as a difference of means (or a difference of proportions, or a regression coefficient), has to be compared to some standard to become an understandable and useful number for making judgments and decisions.

P - C. P - C could first be compared to the program's objective (the planned outcome, Pp). Of course, even the planned objective should take the counterfactual situation into account (the one estimated during the planning phase, Cp): (P - C) : (Pp - Cp). Estimating the counterfactual situation in advance is very difficult, but we can use a before measure.

Example (Mohr, 1995). Objective: to reduce average travel time along a stretch of road from 15 minutes to 10 minutes by widening the road to three lanes. Suppose the measured average travel time after the program is 10.5 minutes. Our before measure, 15 minutes, serves as the estimate of both C and Cp: (P - C) : (Pp - Cp) = (10.5 - 15) : (10 - 15) = -4.5 : -5 = 0.9, or 90%. The hoped-for result was not fully achieved. Why?
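The ratio above can be checked with a few lines of Python; the function and variable names are illustrative, not from Mohr:

```python
# Share of the planned improvement actually achieved (Mohr's example).
# p  = observed post-program measure, c  = counterfactual estimate,
# pp = planned outcome,               cp = counterfactual planned for.
def achievement_ratio(p, c, pp, cp):
    """(P - C) : (Pp - Cp)"""
    return (p - c) / (pp - cp)

ratio = achievement_ratio(p=10.5, c=15.0, pp=10.0, cp=15.0)
print(f"{ratio:.0%}")  # (-4.5) / (-5) = 0.9, i.e. 90%
```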

Example (Mohr, 1995). Alternatively, we could use the adequacy ratio (the proportion of the problem eliminated by the program) as a standard for evaluating the accomplishment: Adequacy = 1 - (P : C) = 1 - (10.5 : 15) = 1 - 0.7 = 0.3, or 30%. Looking at the data in our case, it is clearly unfair to say that the program was only 30% adequate: complete elimination of the problem (a travel time of zero) is impossible (unrealistic)!

Example (Mohr, 1995). In such cases it is necessary to establish the travel time that we would consider as eliminating the problem (say, 9 minutes) and to express P and C as gaps from this standard: C = 15 - 9 = 6 (the real problem without the program); P = 10.5 - 9 = 1.5 (the remaining true problem); Adequacy = 1 - (1.5 : 6.0) = 1 - 0.25 = 0.75, or 75%. Adequacy is not 30% but 75%.
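Both versions of the adequacy ratio can be written as one small function; a minimal sketch using Mohr's numbers, where the `floor` parameter is an illustrative name for the "elimination of the problem" standard:

```python
# Adequacy = share of the problem eliminated by the program.
# With floor=0 this is the naive 1 - (P : C); with a realistic floor,
# P and C are expressed as gaps above the "problem eliminated" standard.
def adequacy(p, c, floor=0.0):
    return 1 - (p - floor) / (c - floor)

print(f"naive:          {adequacy(10.5, 15.0):.0%}")           # 30%
print(f"floor-adjusted: {adequacy(10.5, 15.0, floor=9):.0%}")  # 75%
```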

Threats to internal validity. Threats to internal validity are threats to the validity of the conclusions (inferences) about a program's effectiveness (impact) drawn on the basis of a specific design. Using quantitative methods, in order to reach a conclusion about a program's impact we need to estimate a counterfactual situation. The counterfactual situation comes from: the same subjects at one or more previous time periods; or a group of comparable subjects.

Threats to internal validity: History. In general, history is the possibility that something besides the program accounts for all or part of the observed change over time. The importance of history can be better understood by considering major events (an economic recession, a terrorist attack, ...) that can influence people's behaviour. Note that even when the observed change is zero, this could be because some change-producing force of history counteracted a true impact of the program. If the event affected only the program group, the effect is called local history.

In August 1987, 27 children were killed outside Cincinnati when a drunk driver crashed into a bus returning from a weekend outing. Assume that prior to this you had been asked to evaluate a public education program designed to reduce the number of DUIs, and you had planned to use a relatively weak evaluation design that monitored the number of arrests for driving under the influence. Some months after the accident, you conclude that the decrease in DUI arrests was due to the intervention, when it was probably the tragedy that resulted in fewer drivers driving while intoxicated.

Threats to internal validity: Maturation. It can happen that problems improve on their own simply with the passage of time. An evaluator may attribute to the program the merit of having improved or solved a problem when, in fact, an important contribution came simply from the circumstance that people naturally change over time. This threat is particularly important when the program is devoted to young persons, whose attitudes toward specific social problems can change fast (the process of aging).

The evaluator may well have found that real changes occurred during the course of the program; however, the reason for the changes could be that the program lasted six months, and thus the participants are six months older and more experienced, not that they gained anything from the program.

Threats to internal validity: Selection. Selection is a powerful alternative explanation when participation is on a voluntary basis (a pretest/posttest design helps). It can have several meanings; let us consider the most common: a potential difference in outcome between two observed groups is due to a difference that already existed between them when they were selected; that is, you have selected persons who would have changed even without the treatment, or who are particularly likely to be changed by the treatment.

College teachers most likely to join faculty development programs are often already good teachers. After a faculty development program, most of these teachers will continue to be good teachers. Their competence tells us nothing about the quality of the program: these teachers were better than typical teachers from the beginning.

Threats to internal validity: Attrition (Mortality). Attrition concerns the possibility that subjects leave the group after the beginning and before the end of the program, which means that the outcome measures for these subjects become unavailable. Since we cannot know whether the subjects who left are average, or mutually cancelling in their outcome performance, attrition introduces bias (a pretest/posttest design helps). The probability of attrition is correlated with the length of the program.

In a juvenile crime prevention program in which Y is measured by the number of offenses over an extended period, those who are incarcerated because of a serious offense will have Y scores that are artificially low, and therefore irrelevant. Or suppose you were running a program for parents of adolescents. Twelve parents sign up to learn how to communicate better with their adolescents. A few parents drop out during the 9-week program, but this does not concern you because you can objectively show that the program is working. However, as you begin to examine your data, you realize that the parents who remained in the program were all college graduates, while the parents who dropped out were high school graduates. Although the intervention may have worked, it did so only for parents who were college graduates.

Threats to internal validity: Regression (toward the mean). While maturation refers to a sort of development, regression involves cyclical or episodic change. For many phenomena, subjects scoring toward an extreme are likely to drift naturally toward a less extreme norm over time. In test taking, for example, many extreme high and low scores are due to transient conditions rather than innate ability, and a retest after some time passes is likely to produce scores that are less extreme.
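The drift of extreme scorers can be illustrated with a toy simulation; the score model (stable ability plus transient noise) and all numbers here are assumptions for illustration only:

```python
import random

# Toy simulation of regression toward the mean: each observed score is
# stable ability plus transient noise, so part of any extreme score is
# due to transient conditions rather than ability.
random.seed(0)
ability = [random.gauss(100, 10) for _ in range(10_000)]
first   = [a + random.gauss(0, 10) for a in ability]   # test
second  = [a + random.gauss(0, 10) for a in ability]   # retest

# Take the top 5% of first-test scorers and compare their mean retest score.
cutoff = sorted(first)[int(0.95 * len(first))]
top = [i for i, s in enumerate(first) if s >= cutoff]
mean_first  = sum(first[i]  for i in top) / len(top)
mean_second = sum(second[i] for i in top) / len(top)
print(f"extreme group, first test: {mean_first:.1f}")
print(f"same group, retest:        {mean_second:.1f}")  # less extreme
```

The retest mean of the extreme group falls back toward the overall mean of 100, even though nothing was done to the subjects between the two tests.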

Threats to internal validity: Testing. This threat refers to the pretest: the possibility that posttest scores differ from what they otherwise would have been, not because of the treatment, but because the persons were subjected to a previous measurement (the pretest) [familiarity; reactivity]. With IQ tests, for example, scores on subsequent tests are usually better than on the first one.

Threats to internal validity: Instrumentation. Just as those enrolled in a program can become bored by taking the same test on numerous occasions, the evaluator or other persons making observations might subtly or unconsciously modify their procedures. Instead of counting every time a hyperactive child got out of his seat in the classroom, the weary observer may, by the end of the study, be counting only the incidents in which the child got out of his seat and was corrected by the teacher. Observations ought to be made in the same way throughout the course of the evaluation, and tests should be administered the same way (e.g., in the same setting, at the same time of day, using the same rules or set of instructions) each time. For example, a teacher gave one class more than the allowed time to finish the posttest and the control group less.

Threats to internal validity: Placebo factors. Placebo effects are the generally mild and positive effects experienced by people as a result of their exposure to an innocuous intervention. Why placebo effects exist: any form of health or psychosocial care delivered by a caring and sensitive service provider is capable of producing some generalized sense of wellbeing, or even symptomatic improvement. Spending time in a treatment program, and making a personal investment of energy, hope, and thought, tends to produce expectations, so that the natural positive fluctuations of labile conditions (e.g., pain, depression, stress, anxiety, mobility) are attributed to the innovative treatment. If the service provider possesses great credibility and a favorable reputation in the community, the stage is set for even greater placebo effects. How to recognise placebo effects: a comparison group of clients needs to receive some sort of benign but credible intervention, and their outcomes are in effect subtracted from the outcomes of those who received the legitimate, experimental treatment.
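The subtraction just described can be sketched in a few lines; the improvement scores below are invented purely for illustration:

```python
# Netting out the placebo effect: the comparison group receives a benign
# but credible intervention, and its mean outcome is subtracted from that
# of the group receiving the experimental treatment.
treatment = [7, 9, 8, 10, 6]   # improvement scores, experimental treatment
placebo   = [3, 4, 2, 5, 3]    # improvement scores, benign intervention

def mean(xs):
    return sum(xs) / len(xs)

net_effect = mean(treatment) - mean(placebo)
print(f"net effect: {net_effect:.1f}")  # 8.0 - 3.4 = 4.6
```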

External validity. External validity concerns the extent to which one may safely generalize the conclusions derived from an evaluation; that is, whether the program could be replicated with the same success in different situations (other subjects, other times, other settings). Because it concerns generalization, external validity brings PE closer to research.

External validity. There is always sound doubt about whether a program can be replicated with the same results in different time, space and culture situations, even when experimental designs (and large samples) have been used: social, political, demographic and economic conditions normally interact with the program. External validity is achievable to the extent that the evaluation design allows the subjects, setting and time observed to be equivalent to those to which we would like to generalize.

External validity. Experimental designs are not the best in this respect, since random assignment makes it difficult to include typical subjects and natural settings. Both internal validity (knowing whether the program, as implemented, was effective) and external validity (knowing how effective the program would be if continued or repeated) are important. Internal validity and external validity can look slightly contradictory, since the first concerns the past and the second concerns the future.