Reliability & Validity Dr. Sudip Chaudhuri


Reliability & Validity
Dr. Sudip Chaudhuri, M. Sc., M. Tech., Ph.D., M. Ed., Assistant Professor, G.C.B.T. College, Habra, India; Honorary Researcher, Saha Institute of Nuclear Physics; Life Member, Indian Society for Radiation and Photochemical Sciences (ISRAPS). chaudhurisudip@yahoo.co.in

Measurement Error
A participant's score on a particular measure consists of two components:
Observed score = True score + Measurement error
True score = the score the participant would have obtained if measurement were perfect, i.e., if we were able to measure without error.
Measurement error = the component of the observed score that results from factors that distort the score from its true value.
In symbols: $x_{\text{total}} = x_{\text{true}} + x_{\text{error}}$ and $\sigma^2_{\text{total}} = \sigma^2_{\text{true}} + \sigma^2_{\text{error}}$.
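A minimal sketch (illustrative only, not from the slides) of this true-score model in Python, with made-up normal distributions for true scores and errors; it shows the observed-score variance matching the sum of the true-score and error variances.

```python
# Illustrative sketch of the classical true-score model: x_total = x_true + x_error.
# All distributions and numbers are made up for demonstration.
import numpy as np

rng = np.random.default_rng(seed=1)
n_participants = 10_000

true_scores = rng.normal(loc=50, scale=10, size=n_participants)  # scores under perfect measurement
errors = rng.normal(loc=0, scale=5, size=n_participants)         # random measurement error
observed = true_scores + errors                                   # what we actually record

print(np.var(observed))                      # ~125
print(np.var(true_scores) + np.var(errors))  # ~100 + 25 = 125
```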

Factors that Influence Measurement Error
Transient states of the participants: transient mood, health, fatigue level, etc.
Stable attributes of the participants: individual differences in intelligence, personality, motivation, etc.
Situational factors of the research setting: room temperature, lighting, crowding, etc.

Characteristics of Measures and Manipulations
Precision and clarity of operational definitions
Training of observers
Number of independent observations on which a score is based (more is better?)
Measures that induce fatigue or fear

Actual Mistakes
Equipment malfunction
Errors in recording behaviors by observers
Confusing response formats for self-reports
Data entry errors
Measurement error undermines the reliability (repeatability) of the measures we use.

Reliability
The reliability of a measure is an inverse function of measurement error: the more error, the less reliable the measure. Reliable measures provide consistent measurement from occasion to occasion.

Estimating Reliability
Total variance in a set of scores = variance due to true scores + variance due to error.
Reliability is the ratio of true-score variance to total variance:
$r_{tt} = \dfrac{\sigma^2_{\text{true}}}{\sigma^2_{\text{total}}}$
Reliability can range from 0 to 1.0. When a reliability coefficient equals 0, the scores reflect nothing but measurement error. Rule of thumb: measures with reliability coefficients of .70 or greater have acceptable reliability.
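A quick worked example with made-up numbers: if $\sigma^2_{\text{true}} = 16$ and $\sigma^2_{\text{total}} = 20$, then $r_{tt} = 16/20 = 0.80$, which clears the .70 rule of thumb; if added error variance pushed $\sigma^2_{\text{total}}$ to 25, the same true-score variance would give $r_{tt} = 0.64$, below the cutoff.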

Different Methods for Assessing Reliability
Test-retest reliability
Internal consistency reliability

Test-Retest Reliability
Test-retest reliability refers to the consistency of participants' responses over time (usually a few weeks; why?). It assumes the characteristic being measured is stable over time, i.e., not expected to change between test and retest.
$r_t = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, S_x S_y}$ and $r_{tt} = \dfrac{2 r_t}{1 + r_t}$
where x and y are the scores from the two administrations.
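As a rough sketch with hypothetical scores (not from the slides), the test-retest coefficient is just the Pearson correlation between the two administrations:

```python
# Hypothetical sketch: test-retest reliability as the correlation between
# the same participants' scores at two testing occasions.
import numpy as np

time1 = np.array([12, 15, 9, 20, 17, 11, 14, 18])   # scores at the first administration
time2 = np.array([13, 14, 10, 19, 18, 12, 13, 17])  # same participants a few weeks later

r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest reliability = {r_test_retest:.2f}")
```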

Internal Consistency Reliability
Relevant for measures that consist of more than one item (e.g., total scores on scales, or when several behavioral observations are used to obtain a single score). Internal consistency refers to inter-item reliability: it assesses the degree of consistency among the items in a scale, or among the different observations used to derive a score. We want to be sure that all the items (or observations) are measuring the same construct.

Estimates of Internal Consistency
Item-total score consistency
Split-half reliability: randomly divide the items into two subsets and examine the consistency in total scores across the two subsets (any drawbacks?)
$r_{tt} = \dfrac{n}{n - 1}\left\{1 - \dfrac{M(n - M)}{n\,\sigma_t^2}\right\}$
(the Kuder-Richardson KR-21 formula, where n is the number of items, M the mean total score, and $\sigma_t^2$ the variance of the total scores)
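A minimal split-half sketch in Python, assuming a made-up 20-item test scored 0/1 and a crude latent-ability model to generate responses; the half-score correlation is stepped up to full test length with the Spearman-Brown formula $r_{tt} = 2r/(1+r)$.

```python
# Hypothetical split-half reliability sketch: split items at random into two halves,
# correlate the half-scores, then apply the Spearman-Brown step-up r_tt = 2r / (1 + r).
import numpy as np

rng = np.random.default_rng(seed=7)
n_people, n_items = 200, 20

# Made-up data: probability of answering an item correctly rises with a latent ability.
ability = rng.normal(size=(n_people, 1))
difficulty = rng.normal(size=(1, n_items))
p_correct = 1 / (1 + np.exp(-(ability - difficulty)))
item_scores = (rng.random((n_people, n_items)) < p_correct).astype(int)

order = rng.permutation(n_items)
half_a = item_scores[:, order[: n_items // 2]].sum(axis=1)
half_b = item_scores[:, order[n_items // 2 :]].sum(axis=1)

r_half = np.corrcoef(half_a, half_b)[0, 1]
r_split_half = 2 * r_half / (1 + r_half)
print(f"half correlation = {r_half:.2f}, split-half reliability = {r_split_half:.2f}")
```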

Visual example

Factors Affecting Reliability
Extrinsic factors:
1. Group variability (greater variability in the group raises the reliability coefficient)
2. Guessing by the examinees
3. Environmental conditions
4. Momentary fluctuations in the examinee

Factors Affecting Reliability
Intrinsic factors:
1. Length of the test (longer tests are more reliable)
2. Homogeneity of items (more homogeneous items raise reliability)
3. Facility value (FV) of test items (ideally around 0.5)
4. Discrimination index (DI) of test items (higher DI raises reliability)
5. Range of total scores (a wider range raises reliability)
6. Scorer reliability

How to Improve Reliability?
Quality of items: concise statements, homogeneous wording (some degree of uniformity)
Adequate sampling of the content domain: comprehensiveness of items
Longer assessments: less distorted by chance factors
Developing a scoring plan (especially rubrics for subjective items)
Ensuring VALIDITY

Estimating the Validity of a Measure
A good measure must not only be reliable, but also valid: a valid measure measures what it is intended to measure. Validity is not a property of a measure, but an indication of the extent to which an assessment measures a particular construct in a particular context; thus a measure may be valid for one purpose but not another. A measure cannot be valid unless it is reliable, but a reliable measure may not be valid.

Estimating Validity
Like reliability, validity is not absolute. Validity is the degree to which variability (individual differences) in participants' scores on a particular measure reflects individual differences in the characteristic or construct we want to measure. Three types of measurement validity:
Face validity
Construct validity
Criterion validity

Face Validity
Face validity refers to the extent to which a measure appears to measure what it is supposed to measure. It is not statistical; it involves the judgment of the researcher (and the participants). A measure has face validity if people think it does. Face validity alone does not ensure that a measure is valid (and measures lacking face validity can still be valid).

Construct Validity
Most scientific investigations involve hypothetical constructs: entities that cannot be directly observed but are inferred from empirical evidence (e.g., intelligence). Construct validity is assessed by studying the relationships between the measure of a construct and scores on measures of other constructs; we assess construct validity by seeing whether a particular measure relates as it should to other measures.

Convergent and Discriminant Validity
To have construct validity, a measure should both:
Correlate with other measures that it should be related to (convergent validity), and
Not correlate with measures that it should not correlate with (discriminant validity).
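A small illustrative sketch of this logic with simulated data (the scales and constructs are hypothetical, not from the slides): a new anxiety scale should correlate strongly with an established anxiety measure and near zero with an unrelated vocabulary test.

```python
# Hypothetical convergent/discriminant validity check; all data are simulated.
import numpy as np

rng = np.random.default_rng(seed=3)
n = 300
anxiety_latent = rng.normal(size=n)

new_anxiety_scale = anxiety_latent + rng.normal(scale=0.5, size=n)
established_anxiety = anxiety_latent + rng.normal(scale=0.5, size=n)  # related construct
vocabulary_test = rng.normal(size=n)                                  # unrelated construct

print("convergent r:", round(np.corrcoef(new_anxiety_scale, established_anxiety)[0, 1], 2))  # should be high
print("discriminant r:", round(np.corrcoef(new_anxiety_scale, vocabulary_test)[0, 1], 2))    # should be near 0
```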

Criterion-Related Validity
Refers to the extent to which a measure distinguishes participants on the basis of a particular behavioral criterion. The Scholastic Aptitude Test (SAT) is valid to the extent that it distinguishes between students who do well in college and those who do not. A valid measure of marital conflict should correlate with behavioral observations; a valid measure of depressive symptoms should distinguish between people in treatment for depression and those who are not in treatment.
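A hedged sketch of the depression example with simulated groups (hypothetical numbers throughout): the criterion here is treatment status, and the point-biserial correlation indicates how well scale scores separate the two groups.

```python
# Hypothetical criterion-related validity sketch: do scores on a depressive-symptoms
# scale separate people currently in treatment for depression from those who are not?
import numpy as np

rng = np.random.default_rng(seed=5)
in_treatment = np.concatenate([np.ones(50), np.zeros(50)])   # 1 = in treatment, 0 = not
scale_scores = np.concatenate([rng.normal(30, 6, 50),        # simulated treated group
                               rng.normal(18, 6, 50)])       # simulated untreated group

r_pb = np.corrcoef(scale_scores, in_treatment)[0, 1]         # point-biserial correlation
print(f"criterion correlation = {r_pb:.2f}")
print(f"group means: treated {scale_scores[:50].mean():.1f}, untreated {scale_scores[50:].mean():.1f}")
```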

Two Types of Criterion-Related Validity
Concurrent validity: the measure and the criterion are assessed at the same time.
Predictive validity: a relatively long period (e.g., months or years) elapses between administration of the measure to be validated and assessment of the criterion. Predictive validity refers to a measure's ability to distinguish participants on a relevant behavioral criterion at some point in the future.

Examples of Concurrent and Predictive Validity
High school seniors who score high on the SAT are better prepared for college than low scorers (concurrent validity). Probably of greater interest to college admissions administrators, SAT scores predict academic performance four years later (predictive validity).

Factors That Can Lower Validity
Unclear directions
Difficult reading vocabulary and sentence structure
Ambiguity in statements
Inadequate time limits
Inappropriate level of difficulty
Poorly constructed test items
Test items inappropriate for the outcomes being measured
Tests that are too short
Improper arrangement of items (complex to easy?)
Identifiable patterns of answers
Teaching
Administration and scoring
Students
Nature of the criterion