Underlying Theory & Basic Issues


Underlying Theory & Basic Issues
Dewayne E Perry
ENS 623
Perry@ece.utexas.edu

All Too True

Validity
In software engineering, we worry about various issues:
  E-Type systems: usefulness
    Is it doing what is needed?
    Is it doing it in an acceptable or appropriate way?
  S-Type programs: correctness of functionality
    Is it doing what it is supposed to do?
    Are the structures consistent with the way it should perform?
In empirical work, we worry about similar kinds of things:
  Are we testing what we mean to test?
  Are the results due solely to our manipulations?
  Are our conclusions justified?
  What are the results applicable to?
These questions correspond to different validity concerns.
We are concerned with the logic of evidence.

Validity
Four primary types of validity:
  Construct Validity
  Internal Validity
  Statistical Conclusion Validity
  External Validity
Comments:
  This organization differs somewhat from R&R.
  Each type is sequentially dependent on the preceding one.

Construct Validity
Are we measuring what we intend to measure?
Akin to the requirements problem: are we building the right system?
If we don't get this right, the rest doesn't matter.
Constructs: abstract concepts
  Theoretical constructions
  Must be operationalized in the experiment
  A necessary condition for a successful experiment
Divide construct validity into three parts:
  Intentional Validity
  Representation Validity
  Observation Validity

Construct Validity: Intentional Validity
Do the constructs we chose adequately represent what we intend to study?
Akin to the requirements problem where our intent is fair scheduling but our requirement is FIFO.
Are our constructs specific enough?
Do they focus in the right direction?
E.g., is it intelligence or cunning?

Construct Validity: Representation Validity
How well do the constructs or abstractions translate into observable measures?
Two primary questions:
  Do the sub-constructs properly define the constructs?
  Do the observations properly interpret, measure, or test the constructs?
Two ways to argue for representation validity:
  Face validity
    Claim: on the face of it, it seems like a good translation
    A very weak argument; strengthened by a consensus of experts
  Content validity
    Check the operationalization against the domain for the construct
    The extent to which the tests measure the content of the domain being tested, i.e., cover the domain
    The more it covers the relevant areas, the more content valid it is
Both are nonquantitative judgments.

Construct Validity: Observation Validity
How good are the measures themselves?
Different aspects are illuminated by:
  Predictive validity
  Criterion validity
  Concurrent validity
  Convergent validity
  Discriminant validity

Construct Validity
Predictive Validity
  The observed measure predicts what it should predict and nothing else
  E.g., college aptitude tests are assessed for their ability to predict success in college
Criterion Validity
  The degree to which the results of a measure agree with those of an independent standard
  E.g., for college aptitude, GPA or a successful first year
Concurrent Validity
  The observed measure correlates highly with an established set of measures
  E.g., shorter forms of tests against longer forms

Construct Validity
Convergent Validity
  The observed measure correlates highly with other observable measures for the same construct
  Its utility is not that it duplicates a measure, but that it is a new way of distinguishing a particular trait while correlating with similar measures
Discriminant Validity
  The observable measure distinguishes between two groups that differ on the trait in question
  Lack of divergence argues for poor discriminant validity
R&R discuss various interesting correlations on convergent and discriminant validity among various psychological tests, in terms of validity, reliability, and stability.
A minimal numeric sketch of these two checks follows below.
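As a rough illustration of the two checks just defined, the following Python sketch (not part of the lecture; all scores and names are invented) correlates a hypothetical new measure with an established measure of the same construct, and tests whether the new measure separates two groups known to differ on the trait.

import numpy as np
from scipy import stats

# Hypothetical scores for 10 participants (illustrative data only)
new_measure = np.array([12, 15, 9, 20, 17, 11, 14, 18, 10, 16])
established = np.array([30, 35, 24, 44, 40, 27, 33, 41, 26, 37])  # same construct

# Convergent validity: the new measure should correlate highly with
# an established measure of the same construct
r = np.corrcoef(new_measure, established)[0, 1]

# Discriminant validity: the measure should distinguish two groups
# known to differ on the trait in question
group_high = np.array([19, 21, 18, 22, 20])   # known to possess the trait
group_low = np.array([10, 12, 9, 11, 13])     # known to lack it
t, p = stats.ttest_ind(group_high, group_low)

print(f"convergent r = {r:.2f}; discriminant t = {t:.2f}, p = {p:.4f}")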

Internal Validity
Are the values of the dependent variables solely the result of the manipulations of the independent variables?
Have we ruled out rival hypotheses?
Have we eliminated confounding variables?
  Participant variables
  Experimenter variables
  Stimulus, procedural, and situational variables
  Instrumentation
  Nuisance variables

Internal Validity
Never completely satisfied (systems are never error free either).
Campbell, Stanley, and Cook:
  Standardize the choice of control groups
  Try to isolate potential sources of invalidity
Confounding effects: the treatment effect and some other effect cannot be separated.
Confounding sources of internal invalidity:
  H: History (events between pre-test and post-test may contaminate post-test results)
  M: Maturation (participants become older/wiser/better between pre-test and post-test)
  I: Instrumentation (changes due to the test instrument)
  S: Selection (the nature of the participants; control over assignment may have effects)

Statistical Conclusion Validity
Are the presumed causal variable X and its effect Y statistically related, i.e., do they covary?
If they are unrelated, then one cannot be the cause of the other.
Three questions (sequentially dependent):
  Is the study sufficiently sensitive?
  What is the evidence that they covary?
  How strongly do they covary?
A minimal covariation sketch follows below.
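To illustrate the covariation questions, here is a minimal Python sketch (not from the lecture): it checks whether two hypothetical sets of paired observations, x for the presumed cause and y for the presumed effect, covary and how strongly. The data and variable names are invented for illustration.

import numpy as np
from scipy import stats

# Hypothetical paired observations: level of the presumed cause X
# and the corresponding observed effect Y (illustrative data only)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.9, 3.8, 5.2, 4.9, 6.1, 7.3, 7.9])

# Do X and Y covary? Pearson r gives the strength, p the evidence.
r, p = stats.pearsonr(x, y)
cov = np.cov(x, y, ddof=1)[0, 1]   # raw (unscaled) covariance

print(f"cov(X, Y) = {cov:.2f}, r = {r:.3f}, p = {p:.4f}")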

External Validity
Two positions:
  The generalizability of the causal relationship beyond what was studied/observed
    E.g., do studies of very large, reliable, real-time systems generalize to small dot-com companies?
  The extent to which the results support the claims of generalizability
    E.g., do the studies of 5ESS support the claim that they are representative of ultra-reliable real-time systems?

External Validity (EV)
SWE: lab studies tend to be done with students, which restricts EV and generalization.
Ecological validity: representativeness of the real world.
Efficacy vs. effectiveness studies:
  The former are very rigorous, the latter more open
  The former need to be more concerned with EV, the latter with internal validity
EV: the demonstrated validity of the generalizations that the researcher intended the research to make at the outset, and the validity of the generalized inferences that the researcher offers at the end.

Generalizability
Generalizability considerations:
  People
  Researchers
  Places, environments, settings
  Time
  Treatments, levels of treatments
  Procedures, conditions, and measurements
  Technology
Reproducibility
  Key to generalizability is whether the study can be reproduced.
  A replication is an as-exact-as-possible repeat: same procedure, different sample; look for congruent results.
  Replications on a broader set of subjects under additional conditions further strengthen generalizability.

Causation
Philosophical issues:
  Aristotle: formal, material, efficient, final causes
  Alternatives: concomitant variation, invariable sequence
  The issue of power, causal efficacy -> invariably joined
  Necessity versus invariable sequence
  Sufficiency
  Temporal precedence

Causation
Plurality of causes
  In the behavioral sciences, think more in terms of one of the causes than a single cause.
  Experiment while holding everything else constant.
  Causality is reserved for experimental results.
Working definition of cause
  Forget the infinitely regressive trail of reasoning.
  Non-philosophical working definition: a proximal antecedent agent or agency that initiates a sequence of events that are necessary and sufficient in bringing about the observed effects.
    Proximal: it occurs at a time near the result
    Antecedent: it clearly precedes the effect
    Agent: it is set up intentionally; the experimenter exercises the control lever
    Sufficient: the effect is not seen in the absence of the treatment

Lack of Causation
Correlation and causation
  Correlations show a relationship.
  With path analysis and multiple regression analysis, one can begin to make causal inferences or build causal models.
  But demonstrating causality is a logical and experimental problem, rather than a statistical one.
Enabling versus causing
  Enabling permits but does not cause.
  E.g.: marriage is the primary cause of divorce - NOT.
Other issues
  Not sufficient
  May not be necessary (e.g., high IQ and good living)
  May be probabilistic
Some unsubstantiated causal claims:
  Drinking wine prevents arteriosclerosis
  Eating broccoli prevents colon cancer
Post hoc ergo propter hoc: necessity is not demonstrated.

Hume's Classical Rules
  C and E must be contiguous in space and time.
  C must be prior to E.
  There is a constant union between C and E.
  The same C always produces the same E, and the same E never arises but from the same C.
  Like Es imply like Cs; like Cs produce like Es.
  Cs may have multiple components, i.e., sub-Cs.
  Some Cs are not complete in themselves.

Distillation
  Co-variation rule: the cause is positively correlated with the effect.
  Temporal precedence rule: causes must precede effects.
  Internal validity rule: all plausible alternative explanations must be ruled out.
  Reality: we must settle for the best available evidence.

Control
Constancy of conditions
  Maintaining extraneous conditions that might affect variables
  Calibrating various elements in an experiment
Control series
  Expectation control
  Behavior control
Control condition
  Of primary concern to us
  Control + treatment

Mill's Methods
Method of agreement
  If X then Y
  If there are several instances of this and only X is present, then X is a sufficient condition for Y
Method of difference
  If ~X then ~Y
  If Y does not occur when X is absent, then X is a necessary condition for Y
Joint method
  Should lead to better, more highly justified conclusions than either method separately
Method of concomitant variation
  Relates changes in the amount or degree of change: Y = f(X), i.e., Y is functionally related to variations in X
  E.g., stronger treatments show larger effects

Practical Application
Practical problems:
  It is rare to find perfect covariance, r = 1.0
  It is rare to rule out every plausible alternative hypothesis
  Experience is needed to determine adequate control conditions
Two-group design (a minimal sketch follows below):
  Treatment: if X then Y
  Control: if ~X then ~Y
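As a sketch of the two-group design just described, this Python fragment (not part of the lecture) compares a hypothetical treatment group against a control group with an independent-samples t-test; the outcome data are invented for illustration.

import numpy as np
from scipy import stats

# Hypothetical outcome measurements (illustrative data only)
treatment = np.array([7.1, 6.8, 7.9, 8.2, 7.5, 6.9, 8.0, 7.7])  # X present
control = np.array([6.2, 5.9, 6.8, 6.4, 6.1, 6.6, 5.8, 6.5])    # X absent

# If X then Y, if ~X then ~Y: do the two groups differ on Y?
t, p = stats.ttest_ind(treatment, control)
print(f"t = {t:.2f}, p = {p:.4f}")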

Randomness
Randomness is fundamental in experiments.
  Otherwise the study is quasi-experimental and we must compensate for the lack of it.
  We must have a clear view of its role.
It is the reasoned basis for inference in experiments.
  The basis for statistical methods.
R. A. Fisher, The Design of Experiments, Edinburgh: Oliver & Boyd, 1935, 1949.
  Credited with the invention of randomization
  Introduced the formal properties of randomization
A random-allocation sketch follows below.
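The random allocation of treatments can be sketched as follows; this is a minimal Python illustration (not from the lecture), with a hypothetical list of experimental units.

import random

# Hypothetical experimental units (e.g., participants or program modules)
units = [f"unit{i}" for i in range(1, 9)]

# Randomly allocate half the units to treatment and half to control,
# so that probability enters only through this assignment.
random.shuffle(units)
treatment_group = units[:len(units) // 2]
control_group = units[len(units) // 2:]

print("treatment:", treatment_group)
print("control:  ", control_group)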

The Lady Tasting Tea
A lady declares that by tasting a cup of tea made with milk she can discriminate whether the milk or the tea infusion was added to the cup first.
Experiment:
  8 cups of tea, 4 made each way
  Presented in random order, often determined by a random number table
  The subject knows the experimental design
  Her task is to divide the cups into two sets of 4, agreeing if possible with the treatments received

The Lady Tasting Tea
What would be expected if the Lady had no faculty of discrimination, i.e., if she made no changes in her judgments in response to changes in the order of presentation?
There are C(8,4) = 70 possible ways of dividing the 8 cups into two sets of 4.
Randomization has ensured that all orderings are equally probable.
The chance of accidentally choosing the right division is 1/70, i.e., the probability of the random ordering agreeing with the Lady's fixed judgments is 1/70, or about 0.014.
0.014 is the significance level for testing the null hypothesis of having no faculty of judgment.
A sketch of this calculation follows below.
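The 1/70 figure can be checked directly; this minimal Python sketch (not from the lecture) computes the binomial coefficient and, equivalently, enumerates all divisions of the 8 cups and counts the single correct one. The choice of which cups are "milk first" is hypothetical.

from math import comb
from itertools import combinations

# Number of ways to pick which 4 of the 8 cups had milk added first
n_divisions = comb(8, 4)                 # 70
p_exact = 1 / n_divisions                # chance of guessing all 8 correctly
print(n_divisions, round(p_exact, 3))    # 70 0.014

# Equivalently, enumerate all divisions and count the single correct one
truth = frozenset({0, 1, 2, 3})          # hypothetical "milk-first" cups
hits = sum(1 for c in combinations(range(8), 4) if frozenset(c) == truth)
print(hits / n_divisions)                # 0.0142857...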

The Lady Tasting Tea
The example serves well:
  The Lady is not a sample from a population of ladies; the experiment concerns her alone.
  Her eight judgments are not independent observations (the rule of 4 of each kind).
  Later cups differ from earlier ones.
Inferences are justified because the only probability distribution used in the inference is the one created by the experimenter.

Fisher's Argument
Experiments do not require:
  That experimental units be homogeneous
  That experimental units be a random sample from a population of units
It is sufficient to require that treatments be allocated at random to experimental units for valid inferences.
Probability enters the experiment only through the random assignment of treatments.