The Psychometric Principles: Maximizing the quality of assessment


Summer School 2009: Psychometric Principles
Professor John Rust, University of Cambridge

The Psychometric Principles: maximizing the quality of assessment
Reliability; validity; standardisation; equivalence.

What can be measured?
Length, blood pressure, knowledge, desire, intelligence. Temperature is what thermometers measure. Measurements, decisions, the umpire, judgements, competitions, awards.

Psychometrics as measurement
Reliability is the extent to which a measurement is free from error. "If anything exists it must exist in some quantity and can therefore be measured." (Lord Kelvin, 1824-1907). In 1900, Lord Kelvin claimed: "There is nothing new to be discovered in physics now. All that remains is more and more precise measurement."

The theory of true scores
"Whatever precautions have been taken to secure unity of standard, there will occur a certain divergence between the verdicts of competent examiners. If we tabulate the marks given by the different examiners they will tend to be disposed after the fashion of a gendarme's hat. I think it is intelligible to speak of the mean judgment of competent critics as the true judgment; and deviations from that mean as errors. This central figure which is, or may be supposed to be, assigned by the greatest number of equally competent judges, is to be regarded as the true value..., just as the true weight of a body is determined by taking the mean of several discrepant measurements."
Edgeworth, F. Y. (1888). The statistics of examinations. Journal of the Royal Statistical Society, LI, 599-635.

The theory of true scores
Charles Spearman (1904). "General intelligence," objectively determined and measured. American Journal of Psychology, 15, 201-293.
If we have two measures of the same characteristic we can estimate true values. The accuracy of this estimation is called its reliability.
Melvin Novick, Frederic Lord and Allan Birnbaum used classical test theory to derive latent trait theory, the fundamental building block of item response theory and the Rasch model.
Ref: Lord, F. M. & Novick, M. R. (1968). Statistical theories of mental test scores.

Measuring reliability
The reliability of a score is a value between 0 and 1: at 0 the score is all error, at 1 it is perfectly accurate. Once we have an estimate of reliability we can use it to:
1. Compare different forms of assessment
2. Assign confidence to a test result

Expected reliabilities
Individual ability tests    0.92
Group ability tests         0.85
Personality scales          0.75
Essays                      0.66
Creativity tests            0.50
Projective tests            0.32
Graphology/astrology        ?

Using reliability
Reliability gives us the standard error of measurement:
$\mathrm{SEM} = S \sqrt{1 - r}$
where S = standard deviation of test scores and r = reliability.

Example
Emma obtains a mark of 67 on her final year essay. Assuming the reliability of essays is 0.66 and a standard deviation of 10, the standard error of measurement is $10 \times \sqrt{1 - 0.66}$, which is approximately 6. The half-width of the 95% confidence interval is this value multiplied by 1.96, which is approximately 12. The 95% confidence interval of her mark is therefore 67 ± 12. That is, her true score could be anything between 55 and 79.
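The arithmetic in Emma's example can be checked with a minimal Python sketch. The function names are illustrative only, and the standard error is taken as S multiplied by the square root of (1 - r), which is the form that reproduces the "approximately 6" quoted on the slide.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Standard error of measurement: S * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_interval(score, sd, reliability, z=1.96):
    """Interval of score +/- z * SEM (z = 1.96 for 95%)."""
    half_width = z * standard_error_of_measurement(sd, reliability)
    return score - half_width, score + half_width

# Emma's essay: mark 67, standard deviation 10, essay reliability 0.66.
sem = standard_error_of_measurement(10, 0.66)
low, high = confidence_interval(67, 10, 0.66)
print(f"SEM = {sem:.2f}, 95% interval = {low:.1f} to {high:.1f}")
```

The unrounded values come out slightly inside the slide's figures, because the slide rounds the standard error up to 6 before multiplying by 1.96.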

More uses for reliability
Spearman-Brown prophecy formula:
$r_{\text{new}} = \dfrac{n r}{1 + (n - 1) r}$
where n = ratio by which test length has changed and r = old reliability.

Example
If Emma completed 3 essays as part of her examination paper in a single subject, then the new reliability = 3 × 0.66 / (1 + (3 - 1) × 0.66) = 0.85. This gives a confidence interval of 67 ± 8, i.e. from 59 to 75.
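As a rough check on the three-essay example, the Spearman-Brown calculation can be sketched in Python. The function name is hypothetical, and the interval half-width assumes the same standard deviation of 10 and the same 1.96 multiplier used in the single-essay example.

```python
import math

def spearman_brown(reliability, n):
    """Predicted reliability when test length changes by a factor of n."""
    return n * reliability / (1 + (n - 1) * reliability)

# Three essays instead of one, each with reliability 0.66.
new_r = spearman_brown(0.66, 3)

# Half-width of the 95% interval, assuming the standard deviation stays at 10.
half_width = 1.96 * 10 * math.sqrt(1 - new_r)
print(f"new reliability = {new_r:.2f}, half-width = {half_width:.1f}")
```

Setting n below 1 models a shortened test, so the same function shows how much reliability is lost by cutting items.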

Forms of validity
Face validity; content validity; predictive validity; concurrent validity; criterion-related validity; construct validity.

Face validity
Appropriateness; relevance; fairness. Face validity for the candidate AND the client. Face reliability.

Content validity
The extent to which the content of the test matches the content of the job description, person specification or curriculum.

Test specification
Number of items per cell, with content areas A to D as columns and Bloom taxonomy levels as rows:

Bloom taxonomy            A (25%)   B (25%)   C (25%)   D (25%)
Knowledge (25%)              4         4         4         4
Understanding (25%)          4         4         4         4
Application (25%)            4         4         4         4
Generalisation (25%)         4         4         4         4

Concurrent validity
Does the test measure the same thing as other tests that also purport to measure it? Concurrent validity as differential validity. Multitrait-multimethod approach (Campbell & Fiske): 3 or more traits assessed by 3 or more methods. Convergent validity (concurrent). Discriminant validity.

Differential validity
Does the test measure the trait it purports to measure? Anxiety but not depression. Potential but not ability. Critical thinking but not intelligence. Conscientiousness but not impression management.

The multitrait-multimethod technique
Traits E, N and C, each assessed by self-report (SR), 360 and projective (P) methods:

         SR-E   SR-N   SR-C   360-E  360-N  360-C  P-E    P-N    P-C
SR-E     1      -0.15  0.32   0.65   -0.04  0.23   0.46   -0.56  0.13
SR-N            1
SR-C                   1
360-E                         1
360-N                                1
360-C                                       1
P-E                                                1
P-N                                                       1
P-C                                                              1

Criterion-related validity
Does the test predict success on a criterion? E.g. are students with three straight A's at A level more likely to become successful doctors? I.e. do they do better (a) in their medical school exams? (b) as doctors?

Predictive validity
Validates the test against its ability to predict behaviour, motivation, success and potential.

Accuracy of predictors
[Bar chart: prediction (vertical axis, 0 to 0.7) for AC (prom), WS Tests, Ability Ts, AC (perf), Biodata, Pers Tests, Interviews, References, Astrology and Graphology.]

Construct validity
Constructs (e.g. intelligence, justice). Definitions. Networks of associated ideas, e.g. the biological basis of personality: arousal, brain structure, mental illness, conditioning, sensory deprivation.

Three types of standard
1. Criterion referenced: what can a person with this score be expected to do or know how to do?
2. Norm referenced: compare with others.
3. Ipsative: strengths and weaknesses (or training needs).

The normal distribution

The standard score (z score)
$z = \dfrac{\text{raw score} - \text{mean}}{\text{standard deviation}}$
Scores mostly range between -3 and +3, with a mean of zero. E.g. for a set of scores with a mean of 60 and a standard deviation of 6, what is the z score of persons with raw scores of 60, 66, 54 and 69? Percentiles are obtainable from z tables.
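The exercise above can be worked through with a short Python sketch. The helper name is illustrative. The standard library's `statistics.NormalDist` stands in for a printed z table, which assumes the scores are normally distributed.

```python
from statistics import NormalDist

def z_score(raw, mean, sd):
    """Standard score: (raw score - mean) / standard deviation."""
    return (raw - mean) / sd

raw_scores = [60, 66, 54, 69]
z_scores = [z_score(raw, mean=60, sd=6) for raw in raw_scores]

# Percentile = share of a normal distribution falling below each z score.
percentiles = [100 * NormalDist().cdf(z) for z in z_scores]

for raw, z, pct in zip(raw_scores, z_scores, percentiles):
    print(f"raw {raw}: z = {z:+.1f}, percentile = {pct:.1f}")
```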

Standardised scores
$T = 10z + 50$
$\text{Stanine} = 2z + 5$
$\text{Sten} = 2z + 5.5$
$\text{IQ format} = 15z + 100$
A Level grades?

Bias and offensiveness
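The four conversions listed above are all linear transformations of z, which a small Python sketch makes explicit. The dictionary and function names are hypothetical, and the sketch applies the formulas exactly as given, without the rounding or clipping to whole bands that published stanine and sten scales normally apply.

```python
# (multiplier, offset) for each standardised scale: score = multiplier * z + offset.
SCALES = {
    "T": (10, 50),
    "stanine": (2, 5),
    "sten": (2, 5.5),
    "IQ": (15, 100),
}

def standardise(z, scale):
    """Convert a z score to the named standardised scale."""
    multiplier, offset = SCALES[scale]
    return multiplier * z + offset

# Example: a z score of 1.5 expressed on each scale.
converted = {name: standardise(1.5, name) for name in SCALES}
print(converted)
```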

How are tests perceived?
The predictive model. The competition model. The examinations model. Popular conceptions of bias.

The correction for guessing
$\text{Corrected score} = R - \dfrac{W}{N - 1}$
where R = number correct, W = number incorrect and N = number of response options (in True/False, N = 2).

Raw    R     W     Corrected
50     50    0     50
50     50    50    0
75     75    25    50
47     47    53    0
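The correction for guessing can be sketched in Python as below. The function name and the `floor_at_zero` option are assumptions. For R = 47 and W = 53 the formula gives -6 while the table shows 0, so the option models one possible reading, namely that negative corrected scores are reported as zero.

```python
def corrected_score(right, wrong, options, floor_at_zero=False):
    """Correction for guessing: R - W / (N - 1).

    floor_at_zero is an assumed convention (not stated in the source)
    that reports negative corrected scores as 0.
    """
    score = right - wrong / (options - 1)
    if floor_at_zero:
        return max(0.0, score)
    return score

# True/False items (N = 2), using the R and W values from the table.
rows = [(50, 0), (50, 50), (75, 25), (47, 53)]
results = [corrected_score(r, w, 2, floor_at_zero=True) for r, w in rows]
print(results)
```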

Equivalence (bias)
Differential Item Functioning (DIF); item bias; test equivalence; intrinsic test bias; adverse impact; extrinsic test bias; cultural insensitivity.

Item bias
Second languages; dialects within a language; language subsets; pictorial forms; puzzles.

Testing for item bias (using difficulty values)
Item number    Group 1    Group 2
36             .86        .87
29*            .75        .35
3              .72        .74
48             .68        .59
15             .61        .55
9*             .45        .60

Example of US case law
1970. Diana v. California State Board of Education (settled out of court). Use of the WISC with Spanish-speaking children for special education placement. All bilingual children must be tested in their primary language. Unfair verbal items should not be used. Currently enrolled bilingual children to be retested. State psychologists to develop tests for Mexican American children, with appropriate items and their own norms. Any school district with a disparity must submit an explanation.
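One simple way to screen the difficulty values in the item bias table is to flag items whose difficulty differs between the two groups by more than some cut-off. The Python sketch below does this. The 0.10 threshold is an assumption chosen for illustration, not a rule from the source, although it does happen to pick out the two starred items.

```python
# Item difficulty values (proportion correct) for Group 1 and Group 2,
# copied from the table; keys are item numbers.
difficulties = {
    36: (0.86, 0.87),
    29: (0.75, 0.35),
    3: (0.72, 0.74),
    48: (0.68, 0.59),
    15: (0.61, 0.55),
    9: (0.45, 0.60),
}

THRESHOLD = 0.10  # assumed cut-off for a "large" difference

def flag_items(values, threshold):
    """Return item numbers whose group difficulties differ by more than threshold."""
    return [item for item, (p1, p2) in values.items() if abs(p1 - p2) > threshold]

flagged = flag_items(difficulties, THRESHOLD)
print(flagged)
```

A raw difference in difficulty is only a first screen. It does not separate item bias from a genuine difference in ability between the groups, which is what formal DIF procedures are designed to do.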

Translinguistic and transcultural equivalence
Obtained by translation and back translation, focus groups and cognitive interviews.

Intrinsic test bias 1
The predictive validity model allows us to predict a candidate's success from their test score using regression. But suppose this regression equation is different between two groups.

Intrinsic test bias 2

Intrinsic test bias 3
This is a statistical model for positive discrimination. But do psychometricians agree on the procedures? No: Cleary; Einhorn and Bass; and others.
$y = \alpha + \beta x$
$y = \alpha + \beta x + \varepsilon$
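To illustrate what it means for the regression equation to differ between two groups, the Python sketch below fits y = α + βx separately to each group by ordinary least squares. The data are invented purely for illustration, and the helper name is hypothetical. This is not a reconstruction of the Cleary or the Einhorn and Bass procedures.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = alpha + beta * x; returns (alpha, beta)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta

# Invented data: test scores (x) and criterion scores (y) for two groups.
group_a = ([40, 50, 60, 70], [50, 55, 60, 65])
group_b = ([40, 50, 60, 70], [55, 60, 65, 70])

alpha_a, beta_a = fit_line(*group_a)
alpha_b, beta_b = fit_line(*group_b)
print(f"group A: y = {alpha_a:.1f} + {beta_a:.2f}x")
print(f"group B: y = {alpha_b:.1f} + {beta_b:.2f}x")
```

In this made-up case the slopes match but the intercepts differ, so a single pooled regression line would systematically mispredict the criterion for at least one of the groups.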

Example of US case law
Bakke v. the Regents of the University of California (Medical School at Davis). 1977: California Supreme Court ruled that positive discrimination on grounds of race violated the equal protection provision of the US Constitution. 1978: US Supreme Court also ruled, by 5 to 4. The Court upheld affirmative action provided rigid racial quotas were not used.

UK equal opportunities legislation
Sex Discrimination Act (1975); Race Relations Act (1976); Data Protection Act (1984); Disabilities Act (1998); Employment Equality Regulations (2003): sexual orientation, religion and belief.

Conclusions
The four psychometric principles can be used to evaluate an assessment, to improve an assessment, to establish degrees of confidence, to address issues of inequality and to improve efficiency.