PM12 Validity
Prof. Dr. Pasquale Ruggiero, Department of Business and Law


Internal and External Validity. The concept of validity is very important in program evaluation (PE): to be useful, a PE must be believable. Validity usually concerns data collection, procedures, design and analysis; here we refer more specifically to the validity of the design. In judging the quality of a design, the main criteria are internal validity and external validity.

Internal validity. Internal validity refers to the certainty about cause-and-effect relationships: did the program cause the observed outcome? To use the words of Cronbach, internal validity refers to the trustworthiness of an inference. Internal validity concerns the conclusions regarding the subjects, time and context of the implemented PE.

Internal validity. A threat to internal validity refers precisely to those conclusions, that is, conclusions regarding the subjects, time and context of the implemented research. A threat to internal validity is an objection that the design employed allows the causal link between treatment and outcome to remain uncertain: the design is weak in some way and does not enable one to have confidence in one's conclusions about what the program actually accomplished for the subjects, time and context observed (Mohr, 1995).

Internal validity. To sum up, evaluators and stakeholders wish to have confidence that, when a PE shows significant results, the findings have been caused by the program itself and not by other activities and factors (alternative explanations).

Counterfactual situation. To understand the importance of the internal validity of the evaluation design, it is fundamental to introduce the concept of the counterfactual situation. In theory, to exclude alternative explanations when evaluating the impact of a program, it would be necessary to compare the observed outcome measure after the program (gross effect) with what would have appeared if the program had not been implemented (the counterfactual situation).

Counterfactual situation. Since the counterfactual situation did not actually happen ("the darkness of the unfulfilled"), the only possible alternative is to estimate it. There can be no design in quantitative program evaluation without an estimate of the counterfactual. When made empirically, this estimate always comes from (a) one or more previous time periods, or (b) a group of comparable subjects.

Counterfactual situation. If the measure after the program is P (normally a mean) and the counterfactual situation is C (normally a mean), then P - C is, essentially, what we need to know. P - C, as a difference of means (or a difference of proportions, or a regression coefficient), has to be compared to some standard to become an understandable and useful number for making judgments and decisions.

P - C. P - C could first be compared to the program's objective (the planned outcome, Pp). Of course, even the planned objective should take the counterfactual situation into account (the one estimated during the planning phase, Cp): (P - C) : (Pp - Cp). Estimating the counterfactual situation in advance is very difficult, but we can use a before measure.

Example (Mohr, 1995). Objective: to reduce average travel time along a stretch of road from 15 minutes to 10 minutes by widening the road to three lanes. Suppose the measured average travel time after the program is 10.5 minutes. Our before measure, 15 minutes, serves as the estimate of both C and Cp: (P - C) : (Pp - Cp) = (10.5 - 15) : (10 - 15) = -4.5 : -5 = 0.9, or 90%. The hoped-for result was not fully achieved. Why?
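The ratio above can be checked with a few lines of Python; the function and variable names are illustrative, not from Mohr:

```python
# Share of the planned improvement actually achieved (Mohr's example).
# p  = observed post-program measure, c  = counterfactual estimate,
# pp = planned outcome,               cp = counterfactual planned for.
def achievement_ratio(p, c, pp, cp):
    """(P - C) : (Pp - Cp)"""
    return (p - c) / (pp - cp)

ratio = achievement_ratio(p=10.5, c=15.0, pp=10.0, cp=15.0)
print(f"{ratio:.0%}")  # (-4.5) / (-5) = 0.9, i.e. 90%
```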

Example (Mohr, 1995). Alternatively, we could use the adequacy ratio (the proportion of the problem eliminated by the program) as a standard for evaluating the accomplishment: Adequacy = 1 - (P : C) = 1 - (10.5 : 15) = 1 - 0.7 = 0.3, or 30%. Looking at the data in our case, it is clearly unfair to say that the program was only 30% adequate: complete elimination of the problem (a travel time of zero) is impossible (unrealistic)!

Example (Mohr, 1995). In such cases it is necessary to establish the travel time that we would consider as eliminating the problem (say, 9 minutes) and to express P and C as gaps from this standard: C = 15 - 9 = 6 (the real problem without the program); P = 10.5 - 9 = 1.5 (the remaining true problem); Adequacy = 1 - (1.5 : 6.0) = 1 - 0.25 = 0.75, or 75%. Adequacy is not 30% but 75%.
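Both versions of the adequacy ratio can be written as one small function; a minimal sketch using Mohr's numbers, where the `floor` parameter is an illustrative name for the "elimination of the problem" standard:

```python
# Adequacy = share of the problem eliminated by the program.
# With floor=0 this is the naive 1 - (P : C); with a realistic floor,
# P and C are expressed as gaps above the "problem eliminated" standard.
def adequacy(p, c, floor=0.0):
    return 1 - (p - floor) / (c - floor)

print(f"naive:          {adequacy(10.5, 15.0):.0%}")           # 30%
print(f"floor-adjusted: {adequacy(10.5, 15.0, floor=9):.0%}")  # 75%
```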

Threats to internal validity. Threats to internal validity are threats to the validity of the conclusions (inferences) about a program's effectiveness (impact) drawn on the basis of a specific design. Using quantitative methods, in order to reach a conclusion about a program's impact we need to estimate a counterfactual situation. The counterfactual situation comes from: the same subjects at one or more previous time periods; or a group of comparable subjects.

Threats to internal validity: History. In general, history is the possibility that something besides the program accounts for all or part of the observed change over time. The importance of history can be better understood by considering major events (an economic recession, a terrorist attack, ...) that can influence people's behaviour. Note that even when the observed change is zero, this could be because some change-producing force of history counteracted a true impact of the program. If the event affected only the program group, the effect is called local history.

In August 1987, 27 children were killed outside Cincinnati when a drunk driver crashed into a bus returning from a weekend outing. Assume that prior to this you had been asked to evaluate a public education program designed to reduce the number of DUIs, and you had planned to use a relatively weak evaluation design that monitored the number of arrests for driving under the influence. Some months after the accident, you conclude that the decrease in DUI arrests was due to the intervention, when it was probably the tragedy that resulted in fewer drivers driving while intoxicated.

Threats to internal validity: Maturation. It can happen that problems improve on their own simply with the passage of time. An evaluator may attribute to the program the merit of having improved or solved a problem when, in fact, an important contribution came simply from the circumstance that people naturally change over time. This threat is particularly important when the program is devoted to young persons, whose attitudes toward specific social problems can change fast (the process of aging).

The evaluator may well have found that real changes occurred during the course of the program; however, the reason for the changes could be that the program lasted six months, and thus the participants are six months older and more experienced, not that they gained anything from the program.

Threats to internal validity: Selection. Selection is a powerful alternative explanation when participation is on a voluntary basis (a pretest/posttest design helps). It can have several meanings; let us consider the most common: a potential difference in outcome between two observed groups is due to a difference that already existed between them when they were selected; that is, you have selected persons who would have changed even without the treatment, or who are particularly likely to be changed by the treatment.

College teachers most likely to join faculty development programs are often already good teachers. After a faculty development program, most of these teachers will continue to be good teachers. Their competence tells us nothing about the quality of the program: these teachers were better than typical teachers from the beginning.

Threats to internal validity: Attrition (Mortality). Attrition concerns the possibility that subjects leave the group after the beginning and before the end of the program, which means that the outcome measures for these subjects become unavailable. Since we cannot know whether the subjects who left are average, or mutually cancelling in their outcome performance, attrition introduces bias (a pretest/posttest design helps). The probability of attrition is correlated with the length of the program.

In a juvenile crime prevention program in which Y is measured by the number of offenses over an extended period, those who are incarcerated because of a serious offense will have Y scores that are artificially low, and therefore irrelevant. Or suppose you were running a program for parents of adolescents. Twelve parents sign up to learn how to communicate better with their adolescents. A few parents drop out during the 9-week program, but this does not concern you because you can objectively show that the program is working. However, as you begin to examine your data, you realize that the parents who remained in the program were all college graduates, while the parents who dropped out were high school graduates. Although the intervention may have worked, it did so only for parents who were college graduates.

Threats to internal validity: Regression (toward the mean). While maturation refers to a sort of development, regression involves cyclical or episodic change. For many phenomena, subjects scoring toward an extreme are likely to drift naturally toward a less extreme norm over time. In test taking, for example, many extreme high and low scores are due to transient conditions rather than innate ability, and a retest after some time passes is likely to produce scores that are less extreme.
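The drift of extreme scorers can be illustrated with a toy simulation; the score model (stable ability plus transient noise) and all numbers here are assumptions for illustration only:

```python
import random

# Toy simulation of regression toward the mean: each observed score is
# stable ability plus transient noise, so part of any extreme score is
# due to transient conditions rather than ability.
random.seed(0)
ability = [random.gauss(100, 10) for _ in range(10_000)]
first   = [a + random.gauss(0, 10) for a in ability]   # test
second  = [a + random.gauss(0, 10) for a in ability]   # retest

# Take the top 5% of first-test scorers and compare their mean retest score.
cutoff = sorted(first)[int(0.95 * len(first))]
top = [i for i, s in enumerate(first) if s >= cutoff]
mean_first  = sum(first[i]  for i in top) / len(top)
mean_second = sum(second[i] for i in top) / len(top)
print(f"extreme group, first test: {mean_first:.1f}")
print(f"same group, retest:        {mean_second:.1f}")  # less extreme
```

The retest mean of the extreme group falls back toward the overall mean of 100, even though nothing was done to the subjects between the two tests.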

Threats to internal validity: Testing. This threat refers to the pretest: the possibility that posttest scores differ from what they otherwise would have been, not because of the treatment, but because the persons were subjected to a previous measurement (the pretest) [familiarity; reactivity]. With IQ tests, for example, scores on subsequent tests are usually better than on the first one.

Threats to internal validity: Instrumentation. Just as those enrolled in a program can become bored by taking the same test on numerous occasions, the evaluator or other persons making observations might subtly or unconsciously modify their procedures. Instead of counting every time a hyperactive child got out of his seat in the classroom, the weary observer may, by the end of the study, be counting only the incidents in which the child got out of his seat and was corrected by the teacher. Observations ought to be made in the same way throughout the course of the evaluation, and tests should be administered the same way (e.g., in the same setting, at the same time of day, using the same rules or set of instructions) each time. For example, a teacher gave one class more than the allowed time to finish the posttest and the control group less.

Threats to internal validity: Placebo factors. Placebo effects are the generally mild and positive effects experienced by people as a result of their exposure to an innocuous intervention. Why placebo effects exist: any form of health or psychosocial care delivered by a caring and sensitive service provider is capable of producing some generalized sense of wellbeing, or even symptomatic improvement. Spending time in a treatment program, and making a personal investment of energy, hope, and thought, tends to produce expectations, so that the natural positive fluctuations of labile conditions (e.g., pain, depression, stress, anxiety, mobility) are attributed to the innovative treatment. If the service provider possesses great credibility and a favorable reputation in the community, the stage is set for even greater placebo effects. How to recognise placebo effects: a comparison group of clients needs to receive some sort of benign but credible intervention, and their outcomes are in effect subtracted from the outcomes of those who received the legitimate, experimental treatment.
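The subtraction just described can be sketched in a few lines; the improvement scores below are invented purely for illustration:

```python
# Netting out the placebo effect: the comparison group receives a benign
# but credible intervention, and its mean outcome is subtracted from that
# of the group receiving the experimental treatment.
treatment = [7, 9, 8, 10, 6]   # improvement scores, experimental treatment
placebo   = [3, 4, 2, 5, 3]    # improvement scores, benign intervention

def mean(xs):
    return sum(xs) / len(xs)

net_effect = mean(treatment) - mean(placebo)
print(f"net effect: {net_effect:.1f}")  # 8.0 - 3.4 = 4.6
```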

External validity. External validity concerns the extent to which one may safely generalize the conclusions derived from an evaluation; that is, whether the program could be replicated with the same success in different situations (other subjects, other times, other settings). Because it concerns generalization, external validity brings PE closer to research.

External validity. There is always sound doubt about whether a program can be replicated with the same results in different time, space and culture situations, even when experimental designs (and large samples) have been used: social, political, demographic and economic conditions normally interact with the program. External validity is achievable to the extent that the evaluation design allows the subjects, setting and time observed to be equivalent to those to which we would like to generalize.

External validity. Experimental designs are not the best in this respect, since random assignment makes it difficult to include typical subjects and natural settings. Both internal validity (knowing whether the program, as implemented, was effective) and external validity (knowing how effective the program would be if continued or repeated) are important. Internal validity and external validity can look slightly contradictory, since the first concerns the past and the second concerns the future.