Importance of Good Measurement

Technical Adequacy of Assessments: Validity and Reliability
Dr. K. A. Korb, University of Jos

The conclusions in a study are only as good as the data that is collected, and the data that is collected is only as good as the instrument that collects it. A poorly designed instrument will lead to bad data, which will lead to bad conclusions. Therefore, developing a good instrument is a key part of conducting a high-quality research study.

Considerations in Assessment Development
Error: anything that causes a student's score on the assessment to be an inaccurate representation of their true score.
The assessment must be carefully developed so as to avoid error.
For formal assessments, allow plenty of time to revise, pilot test, re-revise, and re-re-revise the assessment.

Considerations in Assessment Development
The characteristics of the students taking the assessment should guide instrument development and the language used.
Keep the assessment as short as possible while still including sufficient items to measure the key variables.
DIRECTLY measure the key variables.

Developing a Good Assessment
Step 1: Identify other studies assessing the same variable, or other existing assessments. There are two reasons for identifying other studies: they help you develop a construct definition of the variable, and they provide ideas on how each variable should be measured.
Step 2: Develop a construct definition for each variable.

Developing a Good Assessment
Step 3: Operationalize the construct definition. It is vital that the construct and operational definitions are clearly related. Consider practical limitations in the variable definition when operationalizing the variable.

Types of Assessments
Self-Report: participants report their own demographic characteristics, attitudes, beliefs, knowledge, feelings, and behavior. Can take the form of a questionnaire or an interview.
Performance Assessment: directly assess performance on a contrived task.
Observation: researchers observe participants' behavior.
Checklist: identify the frequency or presence of behaviors or characteristics.
Examination/Test: test participants' knowledge of a topic.
Archival Data: collect information from existing records.

Developing a Good Assessment
Step 4: Choose an instrument to measure each variable (except an IV that is manipulated in an experimental design). "Development of new tests is a complex and difficult process that requires considerable training in psychological measurement. Therefore, we recommend that you make certain no suitable test is available before developing your own" (Gall, Gall, & Borg, 2003, p. 216).
Advantages of using an already-developed instrument: it saves time and energy, it has likely already been well validated, and it connects your study to the entire body of research that uses that instrument.

Adopting or Adapting an Assessment
Adopting: use the assessment nearly verbatim.
Adapting: significantly alter the assessment.
Generally, adopting is preferable to adapting because the existing reliability and validity studies still apply. However, the instrument may not be applicable to your population, requiring adaptation.

Developing a Good Assessment
Step 5: Write a draft of the assessment.
Step 6: Revise the draft. Give the draft to other colleagues/experts to vet.
Step 7: Pilot the draft on a sample with similar characteristics to your population. Ask them to note places on the instrument where they are unclear.
Step 8: Revise, revise, revise, re-pilot; revise, revise, revise, re-pilot; revise, revise. Repeat.

Reliability: Consistency of results
[Figure: target diagrams labeled Reliable, Reliable, and Unreliable]

Reliability Theory
Actual score on test = True score + Error
True Score: the hypothetical, error-free score on the test.
The reliability coefficient indicates the ratio between the true score variance on the test and the total variance. As the error in testing decreases, reliability increases.
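In the standard notation of classical test theory (the symbols below are conventional, not from the original slides), this relationship can be written as:

```latex
% Observed score decomposes into true score plus error
X = T + E

% Reliability: the proportion of observed-score variance that is true-score variance
\rho_{XX'} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}
```

As the error variance shrinks toward zero, the coefficient rises toward 1, matching the statement that reliability increases as error decreases.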

Reliability: Sources of Error
Error in test construction
Error due to constructing two or more forms of the instrument
Error in test administration
Error in test scoring

Error in Test Construction
Results from items measuring more than one variable.
Internal Consistency: measured by statistics that look at item consistency, e.g., Cronbach's alpha or split-half reliability, which correlate the items with each other. Low internal consistency indicates poor test construction: the items are measuring more variables than they were designed to measure. Solution: revise the items to focus more directly on the key variable, based on its definition.
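To make the internal-consistency idea concrete, here is a minimal Python sketch of Cronbach's alpha using NumPy; the function name and the small score matrix are hypothetical illustrations, not material from the slides.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of scores."""
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: five respondents answering four Likert-type items
scores = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
])
print(round(cronbach_alpha(scores), 2))  # a low alpha would suggest revising the items
```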

Error in Multiple Forms of the Instrument
Parallel Forms Reliability: determines the similarity of two different versions of the same instrument. To calculate: administer the two tests to the same participants within a short period of time, then correlate the test scores.

Error in Test Administration
Test environment: room temperature, amount of light, noise, etc.
Test-taker variables: illness, amount of sleep, test anxiety, etc.
Examiner-related variables: absence of the examiner, the examiner's demeanor, etc.

Error in Test Administration
Test-Retest Reliability: determines how much error in a test score is due to error in test administration. To calculate: administer the same test to the same participants on two different occasions, then correlate the two test scores.

Error in Test Scoring
When educators score subjective assessments, different educators may give different scores to the same responses.
Inter-Rater Reliability: determines how closely two different raters mark the assessment. To calculate: give the exact same test results from one test administration to two different raters, then calculate the inter-rater reliability, generally with Cohen's Kappa.
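As a sketch of the inter-rater step, the following Python function computes Cohen's Kappa for two raters marking the same set of responses; the pass/fail ratings are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's Kappa: agreement between two raters, corrected for chance."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    counts1, counts2 = Counter(rater1), Counter(rater2)
    expected = sum(counts1[c] * counts2[c] for c in set(rater1) | set(rater2)) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical marks given by two raters to the same ten essay responses
rater_a = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass", "pass", "pass"]
print(round(cohens_kappa(rater_a, rater_b), 2))  # 1.0 = perfect agreement, 0 = chance level
```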

Validity: Measuring what is supposed to be measured
[Figure: target diagrams labeled Valid, Invalid, and Invalid]

Validity
Three types of validity:
Construct validity: measures the appropriate psychological construct
Criterion validity: predicts appropriate outcomes
Content validity: adequate sampling of content

Construct Validity
Construct Validity: the appropriateness of inferences drawn from test scores regarding an individual's status on the psychological construct of interest.
Two considerations: construct underrepresentation and construct-irrelevant variance.

Construct Validity
Construct underrepresentation: the test does not measure all of the important aspects of the construct. Example: an academic self-efficacy measure may assess self-efficacy only in math and science, ignoring other important academic subjects.
Construct-irrelevant variance: test scores are affected by other, unrelated processes. Example: a mathematics test requires students to understand a language they are not familiar with.

Criterion Validity
Criterion Validity: the correlation between the measure and a criterion. A criterion can be any standard with which the test should be related:
Behavior
Other test scores
Ratings
Psychiatric diagnosis

Criterion Validity Example
Criterion validity evidence for a new Science Reasoning Test: correlations between the Science Reasoning test and other measures.
WAEC Science Scores: .83
School Science Marks: .75
WAEC Writing Scores: .34
WAEC Reading Scores: .24
Future marks in university science courses: .65
The high correlations with related measures indicate good convergent validity; the low correlations with unrelated measures indicate good divergent validity; and the high correlation with future university science marks indicates good predictive validity.
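A minimal sketch of how such criterion correlations might be computed, assuming score arrays are available for the same students; the numbers below are invented for illustration and are not the WAEC data from the slide.

```python
import numpy as np

# Hypothetical scores for the same ten students
new_test      = np.array([55, 62, 70, 48, 81, 66, 59, 74, 52, 68])  # new science reasoning test
science_marks = np.array([58, 65, 72, 50, 85, 64, 61, 78, 49, 70])  # related criterion
reading_marks = np.array([62, 60, 65, 58, 64, 70, 61, 63, 66, 59])  # unrelated criterion

# Pearson correlation between the new measure and each criterion
print(round(np.corrcoef(new_test, science_marks)[0, 1], 2))  # expect high (convergent evidence)
print(round(np.corrcoef(new_test, reading_marks)[0, 1], 2))  # expect low (divergent evidence)
```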

Content Validity
Content Validity: whether the instrument samples the entire domain of the variable it was designed to measure.
To assess:
Gather a panel of judges.
Give the judges a table of specifications of the content in the domain.
Give the judges the instrument.
The judges draw a conclusion as to whether the proportion of content covered on the instrument matches the proportion of content in the domain.
[Figure: paired charts comparing Class Coverage and Test Coverage across Addition, Subtraction, Multiplication, and Division]
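A minimal sketch of the proportion check a judging panel performs, comparing how much of each topic was taught with how much the test covers; the coverage counts are hypothetical.

```python
# Hypothetical table of specifications: lessons taught per topic vs. items on the test
class_coverage = {"Addition": 10, "Subtraction": 10, "Multiplication": 15, "Division": 5}
test_coverage  = {"Addition": 8,  "Subtraction": 2,  "Multiplication": 8,  "Division": 2}

class_total = sum(class_coverage.values())
test_total = sum(test_coverage.values())

# Flag topics whose share of test items departs noticeably from their share of instruction
for topic in class_coverage:
    taught = class_coverage[topic] / class_total
    tested = test_coverage[topic] / test_total
    flag = "ok" if abs(taught - tested) < 0.10 else "mismatch"
    print(f"{topic:<15}{taught:>5.0%} taught {tested:>5.0%} tested  {flag}")
```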

Face Validity
Face validity: whether the instrument appears to measure what it purports to measure.
To assess: ask test users and test takers to evaluate whether the test appears to measure the construct of interest.

Face Validity
Face validity is rarely of interest to test developers and test users. The only instance where face validity is of interest is to instill confidence in test takers that the test is worthwhile. Face validity is generally NOT a consideration for psychological researchers. Face validity CANNOT be used to determine the actual interpretive validity of a test.