Overview of Experimentation


The Basics of Experimentation

Overview of Experiments. IVs & DVs. Operational Definitions. Reliability. Validity. Internal vs. External Validity. Classic Threats to Internal Validity. Lab: FP Overview; Help on OBS Project &/or Assign 8.

Overview of Experimentation

Experiments are the most powerful research methods in science because, if done correctly (i.e., if high in internal validity), they allow us to establish cause and effect. Establishing cause and effect is the first step toward explanation and control.

The main components of any experiment:
1. Statement of a hypothesis.
2. Random assignment of participants to conditions.
3. Manipulation of antecedent conditions (IVs).
4. Measurement of behavior (DVs).
5. (Statistical) analysis of results.

A hypothesis is a concrete, testable, falsifiable statement about the relation between two or more variables [If IV, then DV].

Independent Variable (IV): What the experimenter manipulates. Examples include degree of anxiety, noise level, amount of training, time between study and test, color of survey, position of product, type of therapy, amount of counseling, and so on.

Levels of an IV: The different values of the variable. All IVs have at least two levels and often more. For example:
1 IV: Degree of Anxiety; 2 levels: Calm, Anxious.
1 IV: Degree of Anxiety; 3 levels: Low, Moderate, High.
1 IV: Color of Survey; 4 levels: White, Red, Blue, Yellow.
1 IV: Noise Level; 5 levels: 20 dB, 40 dB, 60 dB, 80 dB, 100 dB.
1 IV: Type of Therapy; 2 levels: Drug, Cognitive-Behavioral.
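Random assignment (component 2 above) is simple to implement. Below is a minimal sketch in Python; the participant IDs and condition names are hypothetical, chosen only to illustrate shuffling a participant pool and dealing it out evenly across the levels of an IV.

    import random

    # Hypothetical participant pool and one IV with two levels
    # (illustration only; not tied to any particular experiment).
    participants = [f"P{i:02d}" for i in range(1, 21)]   # P01 ... P20
    conditions = ["Calm", "Anxious"]

    random.seed(42)               # fixed seed so the assignment is reproducible
    random.shuffle(participants)  # randomize the order of the pool

    # Deal shuffled participants round-robin into conditions,
    # which yields equal group sizes when the pool divides evenly.
    assignment = {c: [] for c in conditions}
    for i, p in enumerate(participants):
        assignment[conditions[i % len(conditions)]].append(p)

    for condition, group in assignment.items():
        print(condition, group)

Because the shuffle is random, neither the experimenter nor the participants determine who ends up in which condition, which is what protects against the selection threat discussed later.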

Dependent Variable (DV): What the experimenter measures. Examples include % errors, % correct responses, exam score, degree of affiliation, level of extroversion, number of aggressive acts, change in academic performance, sales, and so on. DVs do not have levels (although one might measure more than one thing, or one thing in more than one way).

Operational Definitions: Defining variables in terms of specific, concrete operations or procedures, leaving no room for guesswork or interpretation, so that they can be reproduced by another (naive) experimenter.
Experimental operational definitions: the precise procedures involved in implementing and/or manipulating an IV.
Measured operational definitions: the precise procedures involved in measuring a DV.

Reliability & Validity

Reliability and validity are at the heart of measurement.

Reliability refers to whether a measure is consistent. Is my measuring instrument (test, survey, scale, etc.) dependable? Can I count on it to give the same scores over people, items, and time? In general, reliability is assessed via correlational measures that are interpreted like Pearson correlations.

Validity refers to whether an instrument (test) truly measures the variable we think it measures (or want it to measure). Is my measure (of IQ, Depression, Memory, etc.) really getting at the construct I'm interested in? In general, validity is also assessed via correlational measures, but usually in more sophisticated (theory-based) ways than those associated with reliability.

Without reliable and valid measures there is no reason to do experiments.

Reliability

Inter-rater Reliability: Do two or more people produce the same ratings or measurements? Often assessed via (a) average % agreement, (b) Cohen's Kappa, or (c) the average r over raters.

Inter-item Reliability (a.k.a. Internal Consistency): Do different parts of a test produce similar measurements?
Split-half technique: the correlation between two halves of a test or survey (e.g., 1st & 2nd half; odd & even items).
Cronbach's Alpha (α): the average r among all possible split-halves.

Test-Retest Reliability: Does a test produce similar measurements over time (e.g., at Time 1 and Time 2)? Treated as a special case of split-half; assessed as the correlation between Test 1 and Test 2 scores.
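All three reliability indices are easy to compute. The sketch below is a minimal illustration using simulated data (the 10-item test and its scores are made up, not from any real instrument); note that alpha is computed here with the standard variance formula rather than by literally averaging all possible split-halves.

    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(0)

    # Simulated data: 50 people take a 10-item test twice.
    # A shared latent "true score" makes items and occasions hang together.
    n_people, n_items = 50, 10
    true_score = rng.normal(0.0, 1.0, size=(n_people, 1))
    test1 = true_score + rng.normal(0.0, 0.5, size=(n_people, n_items))
    test2 = true_score + rng.normal(0.0, 0.5, size=(n_people, n_items))

    # Split-half reliability: correlate odd-item totals with even-item totals.
    split_half_r, _ = pearsonr(test1[:, 0::2].sum(axis=1),
                               test1[:, 1::2].sum(axis=1))

    # Cronbach's alpha: (k / (k-1)) * (1 - sum(item variances) / var(total)).
    item_vars = test1.var(axis=0, ddof=1)
    total_var = test1.sum(axis=1).var(ddof=1)
    alpha = (n_items / (n_items - 1)) * (1 - item_vars.sum() / total_var)

    # Test-retest reliability: correlate total scores at Time 1 and Time 2.
    retest_r, _ = pearsonr(test1.sum(axis=1), test2.sum(axis=1))

    print(f"split-half r     = {split_half_r:.2f}")
    print(f"Cronbach's alpha = {alpha:.2f}")
    print(f"test-retest r    = {retest_r:.2f}")

All three values come out high here because every item score shares the same latent true score; shrinking the noise raises them, and pure-noise items would drive them toward zero.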

Face & Content Validity

Face and content validity are non-empirical (non-data-based) forms of validity, based on argument and theory.

Face Validity: Does my instrument (test, survey, question, etc.) look or feel like it's measuring what it's supposed to measure? Measures with high face validity may not always provide truly valid indices of the construct we are trying to measure (e.g., survey questions about racism/prejudice; crossword puzzles as IQ tests). Measures with low face validity can provide excellent measures of the construct we are after (e.g., Perceptual-Motor Speed or Raven's Progressive Matrices as indices of IQ).

Content Validity: Is my instrument capturing the entirety of the construct I'm trying to measure? The extent to which the measure (a) includes relevant aspects of the construct and (b) excludes irrelevant aspects of the construct. E.g., does the SAT measure more than academic achievement?

Construct Validity

Construct validity looks at how well a measure (or factor) captures the relevant aspects of a construct and excludes irrelevant aspects. It can be thought of as the hard-core, data-based version of content validity.

Convergent Validity: Does my instrument correlate with other instruments designed to measure the same construct? Does my measure of intelligence (Raven's, crossword puzzle ability) correlate with other measures of intelligence (WAIS)?

Divergent Validity: Does my instrument not correlate with other instruments designed to measure different constructs? Does my measure of intelligence (Raven's, crossword puzzle ability) fail to correlate with measures of memory, personality, athletic ability, etc.?

Criterion Validity

Criterion validity looks at the degree to which a measure (or factor) is related to other (often real-world) outcomes. It can be thought of as the hard-core, data-based version of face validity.

Concurrent Validity: Does my instrument predict (correlate with) a currently available outcome related to the construct? Does my measure of intelligence (Raven's, SAT, crossword puzzle ability) correlate with current GPA?

Predictive Validity: Does my instrument predict (correlate with) a future outcome related to the construct? Does my measure of intelligence (Raven's, SAT, crossword puzzle ability) correlate with GPA at graduation (or with future job/pay level)?
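The same correlational logic can be shown in a few lines. This is a minimal sketch with simulated scores (the measure names are stand-ins; none of the numbers come from real tests): convergent validity shows up as a high r between two intelligence measures, divergent validity as a near-zero r with an unrelated personality scale, and concurrent (criterion) validity as a substantial r with a currently available outcome such as GPA.

    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(1)
    n = 100

    # Two simulated latent constructs: intelligence and (unrelated) extroversion.
    intelligence = rng.normal(100, 15, n)
    extroversion = rng.normal(0, 1, n)

    # Two intelligence measures tap the same construct (plus noise);
    # the personality scale taps extroversion instead.
    ravens = intelligence + rng.normal(0, 5, n)
    wais = intelligence + rng.normal(0, 5, n)
    personality = extroversion + rng.normal(0, 0.5, n)

    # A currently available criterion: GPA driven partly by intelligence.
    gpa = 3.0 + 0.02 * (intelligence - 100) + rng.normal(0, 0.3, n)

    r_convergent, _ = pearsonr(ravens, wais)         # expected: high
    r_divergent, _ = pearsonr(ravens, personality)   # expected: near zero
    r_concurrent, _ = pearsonr(ravens, gpa)          # expected: substantial

    print(f"convergent (Raven's vs. WAIS):        r = {r_convergent:.2f}")
    print(f"divergent  (Raven's vs. personality): r = {r_divergent:.2f}")
    print(f"concurrent (Raven's vs. GPA):         r = {r_concurrent:.2f}")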

Different Kinds of Validity

Validity divides into two broad families:
Non-Data-Based: Face, Content.
Data-Based: Construct (Convergent, Divergent) and Criterion (Concurrent, Predictive).

Internal & External Validity

Up until now we have been discussing the validity of measurements. Internal and external validity are concerned with the validity of an experiment as a whole.

Internal Validity: The degree to which a research design allows you to make causal statements (or draw firm conclusions). Well-designed experiments have high internal validity. Experiments with confounds have low internal validity.

External Validity: The degree to which research findings generalize to people or situations outside the research setting. As a general rule, high external validity (generalization) depends on high internal validity (a well-designed experiment).

Classic Threats to Internal Validity

The internal validity of an experiment can be lowered by many things (e.g., bad design, a noisy room, etc.). However, there are eight "classic" threats that should be especially guarded against.

1. History: Any outside event that affects the DV. Especially a problem when multiple measures are taken over time, or when different groups of subjects are tested at different points in time.
2. Maturation: Any physical or psychological change that affects the DV. Mainly a problem when measures are taken over time (pre-/post-testing). Examples include fatigue, boredom, increased knowledge, etc.
3. Testing: Any change in performance due to prior test experience. Problematic when measures are taken over time (e.g., within-subjects designs). Examples include practice effects and test pre-sensitization.
4. Instrumentation: Changes in a measurement device, or in the criteria used by observers for recording behavioral events. E.g., changes in the sensitivity of a button; different criteria for "violence."

5. Regression to the Mean: Movement away from an extreme value toward the mean value. Mainly a measures-over-time (pre/post) problem. (A small simulation at the end of these notes illustrates the effect.)
6. Selection: Anything that results in non-equivalent groups being exposed to different treatment conditions. Ex post facto designs; non-random assignment; self-selection; etc.
7. Attrition (a.k.a. Mortality): Differential loss of subjects from particular treatment groups or conditions. Often a problem in drug or aging studies, or when more difficult or boring conditions are compared to easier or more interesting conditions.
8. Selection Interactions: When any of the above threats affects one treatment group more than another. E.g., history or testing effects on males vs. females, or young vs. old.

Final Project Overview I

The Final Project for this course is a Paper [50 points] & Presentation [20 points] based on a 2x2 factorial experiment [2 IVs with 2 levels each]. Experiments will be designed and conducted by teams of 4 students. Topic ideas will be handed out in class.

I strongly encourage each team to (a) get started early in designing and setting up their experiment, and (b) conduct test runs of their experiment before actual data are collected.

Two class periods have been set aside for data collection (see course Schedule). You may also collect data outside of these periods, but all experiments must be reviewed and cleared by me before any data are collected.

Final Project Overview II

Paper: A complete report of your experiment. Papers should (a) strictly follow APA format, (b) use the hourglass method, and (c) include at least three references to papers you have read. Grading will be based on the format of your report as well as its content (including stats). More detailed grading guidelines will be posted on the web.

Presentation: Each team will give a brief (10-15 minute) presentation of their experiment to the class on Dec. 4. Presentations should be in PowerPoint and include an Intro, Method, Results & Discussion, with each member presenting one section.

To receive full credit, PowerPoint presentations must be mailed to MethodsTA@yahoo.com by 5pm on Dec. 4. To receive full credit, hard copies of your final report must be in my mailbox by 5pm on Dec. 6.
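As promised under threat 5, here is a minimal simulation of regression to the mean (all numbers are invented for illustration). People selected for extreme pre-test scores move back toward the population mean at post-test even though no treatment occurs between the two tests, which is exactly why an apparent "improvement" in a pre/post remediation study can be an artifact.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 10_000

    # Each observed score = stable true score + occasion-specific noise.
    true_score = rng.normal(100, 10, n)
    pre = true_score + rng.normal(0, 10, n)
    post = true_score + rng.normal(0, 10, n)   # no treatment in between

    # Select the lowest-scoring 10% at pre-test, as a remediation study might.
    extreme = pre < np.percentile(pre, 10)

    print(f"population mean (pre):       {pre.mean():6.1f}")
    print(f"selected group mean at pre:  {pre[extreme].mean():6.1f}")
    print(f"selected group mean at post: {post[extreme].mean():6.1f}")

The selected group scores low at pre-test partly because of unlucky noise; that noise does not repeat at post-test, so the group's post-test mean drifts back toward 100 with no treatment at all.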