Introduction to Reliability


Reliability

Thought Questions: How does (or how will) reliability affect what you do in your current or future job? Which method of reliability analysis do you find most confusing?

Introduction to Reliability
What is reliability? Reliability is an index that estimates the dependability (consistency) of scores.
Why is it important? It is a prerequisite to validity: if you are not measuring something accurately and consistently, you do not know whether your inferences are valid. You should not base decisions on test scores that are not reliable.

Two Sources of Measurement Error
1) Random: individual fluctuation; not too serious (the use of large samples corrects for this).
2) Systematic: due to the test itself; a big problem that makes the test unreliable.

Factors that Impact Reliability
Measurement errors can be caused by examinee-specific factors: motivation, concentration, fatigue, boredom, momentary lapses of memory, carelessness in marking answers, and luck in guessing.
Measurement errors can be caused by test-specific factors: ambiguous or tricky items and poor directions.
Measurement errors can be caused by scoring-specific factors: nonuniform scoring guidelines, carelessness, and counting or computational errors.

Reliability
The extent to which the assessment instrument yields consistent results for each student.
How much are students' scores affected by temporary conditions unrelated to the characteristic being measured (test-retest reliability)?
Do different parts of a single assessment instrument lead to similar conclusions about a student's achievement (internal consistency/split-half reliability)?
Do different people score students' performance similarly (inter-rater reliability)?
Are different forms of the instrument equivalent (alternate/equivalent/parallel forms reliability)?

Test-Retest Reliability
Used to assess the consistency of a measure from one time to another, i.e., to determine whether the score generalizes across time.
The same test form is given twice, and the scores are correlated to yield a coefficient of stability.
High test-retest reliability tells us that examinees would probably get similar scores if tested at different times.
The interval between test administrations is important because of practice/learning effects.
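As a concrete illustration of the coefficient of stability, here is a minimal Python sketch that correlates two hypothetical administrations of the same form; the score values are invented for illustration.

```python
# A minimal sketch of a test-retest reliability check, assuming two score
# arrays from hypothetical administrations of the same form two weeks apart.
import numpy as np

time1 = np.array([52, 61, 47, 75, 68, 59, 80, 66])  # first administration
time2 = np.array([55, 60, 50, 73, 70, 57, 78, 69])  # second administration

# The coefficient of stability is the Pearson correlation between the two
# sets of scores; np.corrcoef returns the 2x2 correlation matrix.
stability = np.corrcoef(time1, time2)[0, 1]
print(f"Test-retest reliability (coefficient of stability): {stability:.3f}")
```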

Internal Consistency Reliability
Involves only one test administration.
Used to assess the consistency of results across items within a test (the consistency of an individual's performance from item to item, and item homogeneity).
Determines the degree to which all items measure a common characteristic of the person.
Ways of assessing internal consistency: Kuder-Richardson (KR20)/coefficient alpha, and split-half reliability.
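The split-half approach can be sketched in a few lines: correlate odd-item and even-item half scores, then apply the Spearman-Brown correction (the standard adjustment for halving the test; the slide does not name it). The response matrix below is invented.

```python
# A minimal sketch of split-half reliability, assuming a hypothetical
# examinee-by-item score matrix (rows = examinees, columns = items).
import numpy as np

scores = np.array([
    [1, 0, 1, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 1, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 1, 0, 0],
])

# Split the items into odd and even halves and total each half.
odd_half = scores[:, 0::2].sum(axis=1)
even_half = scores[:, 1::2].sum(axis=1)

# Correlate the half-test scores, then apply the Spearman-Brown correction
# to estimate the reliability of the full-length test.
r_half = np.corrcoef(odd_half, even_half)[0, 1]
split_half = (2 * r_half) / (1 + r_half)
print(f"Half-test r: {r_half:.3f}, corrected split-half: {split_half:.3f}")
```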

Internal Consistency: Cronbach's Alpha
Estimates how consistently learners respond to the items within a scale.
Alpha measures the extent to which item responses obtained at the same time correlate highly with each other.
The widely accepted social science cut-off is that alpha should be .70 or higher for a set of items to be considered a scale.
Rule of thumb: the more items, the more reliable the scale (alpha increases).
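A minimal sketch of the alpha computation, using the standard formula on an invented 4-item Likert-scale data set:

```python
# Cronbach's alpha for a hypothetical 4-item Likert scale (rated 1-5);
# the responses are invented for illustration.
import numpy as np

items = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [1, 2, 1, 2],
])

k = items.shape[1]                         # number of items
item_vars = items.var(axis=0, ddof=1)      # variance of each item
total_var = items.sum(axis=1).var(ddof=1)  # variance of the scale totals

# alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha: {alpha:.3f}")    # compare against the .70 cut-off
```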

Types of Internal Reliability
KR20: for dichotomously scored items with a range of difficulty (multiple choice, short answer, fill in the blank).
Coefficient alpha: for items that have more than dichotomous, right/wrong scores (Likert scales, e.g., rated 1 to 5; short answer; partial credit).
KR21: used for dichotomously scored items that are all of about the same difficulty.
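For dichotomous items, KR20 works the same way as alpha, with each item's variance replaced by p times q (proportion passing times proportion failing). Again, the response matrix is invented for illustration.

```python
# A minimal sketch of KR-20 for dichotomously scored (0/1) items.
import numpy as np

responses = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 1, 0],
])

k = responses.shape[1]
p = responses.mean(axis=0)   # proportion answering each item correctly
q = 1 - p                    # proportion answering incorrectly
# Population variance (ddof=0) of total scores, to match the p*q item variances.
total_var = responses.sum(axis=1).var(ddof=0)

# KR-20 = k/(k-1) * (1 - sum(p*q) / variance of total scores); this is
# Cronbach's alpha specialized to right/wrong items.
kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
print(f"KR-20: {kr20:.3f}")
```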

Alternate-Forms Reliability
Used to assess the consistency of the results of two tests constructed in the same way from the same content domain.
The two forms of the test are correlated to yield a coefficient of equivalence.

Reliability for Subjectively Scored Tests
Training and scoring
Intra-rater reliability
Inter-rater reliability

Intra-Rater Reliability
Used to assess each rater's consistency over time: agreement between scores assigned to the same examinee at different times.

Inter-Rater Reliability
Used to assess the degree to which different raters/observers give consistent estimates of the same phenomenon.
Agreement between the scores assigned by two raters is calculated either as a percentage of agreement between the two or as a correlation between the two.
Exact agreement is typically used for rating scales of 5 points or fewer; adjacent agreement (scores within one point of each other) for scales of more than 5 points.
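A minimal sketch of exact and adjacent agreement (plus the correlation index) for two hypothetical raters scoring ten performances on a 10-point rubric; the ratings are invented.

```python
# Exact and adjacent inter-rater agreement on an invented 10-point rubric.
import numpy as np

rater_a = np.array([7, 5, 9, 6, 8, 4, 7, 10, 6, 5])
rater_b = np.array([7, 6, 9, 5, 8, 4, 6, 9, 6, 5])

exact = np.mean(rater_a == rater_b)                 # identical scores
adjacent = np.mean(np.abs(rater_a - rater_b) <= 1)  # within one point

# A correlation between the two raters is the other common index.
r = np.corrcoef(rater_a, rater_b)[0, 1]
print(f"Exact: {exact:.0%}, adjacent: {adjacent:.0%}, r = {r:.3f}")
```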

Strategies to Enhance Reliability
Objectively scored tests: write better items, lengthen the test, manage item difficulty.
Subjectively scored tests: train the scorers, use a reasonable rating scale.

Considerations: Lengthening the Test
Time available for testing
Fatigue of examinees
Ability to construct good test items
Point of diminishing returns: increasing test length by a lot will keep increasing reliability, but eventually not by enough to make it worth the testing time needed (the sketch below quantifies this).
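The point of diminishing returns can be quantified with the Spearman-Brown prophecy formula, the standard projection for lengthened tests; the slide does not name it, so take this as an illustrative sketch.

```python
# Spearman-Brown prophecy formula: projected reliability when a test of
# reliability r is made n times as long.
def spearman_brown(r: float, n: float) -> float:
    return (n * r) / (1 + (n - 1) * r)

# Starting from a reliability of .70, doubling helps a lot; each further
# doubling buys less and less for the extra testing time.
for n in (1, 2, 4, 8):
    print(f"{n}x length: projected reliability = {spearman_brown(0.70, n):.3f}")
# Output climbs .700 -> .824 -> .903 -> .949: the gains shrink each time.
```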

Classical Test Theory
Assessment of observation (measurement): Observed Score = True Score + Error.

Standard Error of Measurement
The amount of variation to be expected in test scores.
SEM numbers given in tests are typically based on 1 standard error and provide a confidence interval.
Example: a score of 52 with an SEM of 2.5 means that, based on repeated testing, about 68% of scores would fall between 49.5 and 54.5.

[Normal curve figure: about 34% of scores fall within 1 SD of the mean on either side, 13.5% between 1 and 2 SDs, and 2.5% beyond 2 SDs; axis marked from -3 to +3 standard deviations.]

SEM = SD × √(1 − r), where SD = the standard deviation for the test and r = the reliability coefficient for the test.
A Wechsler test with a split-half reliability coefficient of .96 and a standard deviation of 15 yields an SEM of 3: SEM = 15 × √(.04) = 3.
Now that we have an SEM of 3, we can apply it to a real-life situation. Example: Joe took the Wechsler test and received a score of 100. Let's build a band of error around Joe's test score of 100, using a 68% interval. A 68% interval corresponds to 1 standard error on either side of the observed score. For a 68% interval, use the following formula: test score ± 1(SEM), where the test score is Joe's score: 100 ± (1 × 3) = 100 ± 3.

Why the ± 1? Because we are adopting the normal distribution as our theoretical distribution of error, and 68% of the values lie within the area between 1 standard deviation below the mean and 1 standard deviation above the mean. Chances are 68 out of 100 that Joe's true score falls within the range of 97 to 103.
What about a 95% confidence interval? Test score ± 2(SEM) = 100 ± (2 × 3) = 100 ± 6. Chances are 95 out of 100 that Joe's true score falls within the range of 94 to 106.
The higher a test's reliability coefficient, the smaller the test's SEM; the larger the SEM, the less reliable the test.
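The worked example above translates directly into code; this sketch reuses the slide's numbers (SD = 15, r = .96, Joe's observed score of 100).

```python
# SEM and confidence bands for the Wechsler-style example above.
import math

sd, r = 15, 0.96
sem = sd * math.sqrt(1 - r)          # 15 * sqrt(.04) = 3.0

score = 100                          # Joe's observed score
band_68 = (score - 1 * sem, score + 1 * sem)   # ~68% interval: 97-103
band_95 = (score - 2 * sem, score + 2 * sem)   # ~95% interval: 94-106
print(f"SEM = {sem:.1f}; 68% band = {band_68}; 95% band = {band_95}")
```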

Could you develop a reliable scale to measure the quality of 3rd-5th graders' drawings of Madagascar Giant Hissing Cockroaches?