Survey Question: What are appropriate methods to reaffirm the fairness, validity, reliability, and general performance of examinations?


Clause 9.3.5. Appropriate methodology and procedures (e.g., collecting and maintaining statistical data) shall be documented and implemented in order to affirm, at justified defined intervals, the fairness, validity, reliability, and general performance of each examination, and that all identified deficiencies are corrected.

Survey Question: What are appropriate methods to reaffirm the fairness, validity, reliability, and general performance of examinations?

Terms
Fairness: equal opportunity for success provided to each candidate in the certification process. (Note 1 to entry: fairness includes freedom from bias in examinations.)
Validity: evidence that the assessment measures what it is intended to measure, as defined by the certification scheme.
Reliability: an indicator of the extent to which examination scores are consistent across different examination times and locations, different examination forms, and different examiners.
General performance.

Fairness. What kind of data exists to show that the examination process is fair? Does the CAB look at data associated with various language and cultural groups to ensure that no particular group performs worse on the examination form? Two common approaches are a bias review of test questions and the test process, and a statistical screen for Differential Item Functioning (DIF), sketched below.
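As an illustration, here is a minimal sketch of a Mantel-Haenszel DIF screen for one dichotomous item, assuming 0/1 item scores, a total-score matching variable, and a reference/focal group label. The function name and data layout are this example's assumptions, not a prescribed procedure.

```python
import numpy as np

def mantel_haenszel_dif(item, total, group):
    """Mantel-Haenszel DIF screen for one dichotomous item.

    item  : 0/1 scores on the item under review
    total : total test scores, used as the matching variable
    group : 0 = reference group, 1 = focal group
    Returns the common odds-ratio estimate and the ETS delta
    statistic (MH D-DIF); |delta| >= 1.5 is conventionally
    flagged as large DIF.  Production use would need sparse-cell
    safeguards and an item-purified matching score.
    """
    item, total, group = map(np.asarray, (item, total, group))
    num = den = 0.0
    for s in np.unique(total):                       # stratify by total score
        m = total == s
        a = np.sum(m & (group == 0) & (item == 1))   # reference, correct
        b = np.sum(m & (group == 0) & (item == 0))   # reference, incorrect
        c = np.sum(m & (group == 1) & (item == 1))   # focal, correct
        d = np.sum(m & (group == 1) & (item == 0))   # focal, incorrect
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    alpha_mh = num / den                             # MH common odds ratio
    delta = -2.35 * np.log(alpha_mh)                 # ETS delta metric
    return alpha_mh, delta
```

An odds ratio near 1 (delta near 0) means examinees of equal ability answer the item similarly regardless of group membership.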

Validity. There are many types of validity: content validity, concurrent validity, predictive validity, and face validity.

Validity: content validity. The most important type of validity for most certification and licensure programs is probably content validity. It is a logical process in which connections between the test items and the job-related tasks are established. It is typically estimated by gathering a group of subject matter experts (SMEs) to review the test items. The SMEs are asked to indicate whether or not they agree that each item is appropriately matched to the content area indicated. Any items that the SMEs identify as inadequately matched to the test blueprint, or flawed in any other way, are either revised or dropped from the test.

Validity: concurrent validity. Concurrent validity is a statistical method using correlation: the degree to which examination results correlate with other measures of the same construct measured at the same time. Examinees who are known to be either masters or non-masters of the content measured by the test are identified, and the test is administered to them under realistic exam conditions. Once the tests have been scored, the relationship between the examinees' known status as masters or non-masters and their classification as masters or non-masters (i.e., pass or fail) based on the test is estimated. This type of validity provides evidence that the test is classifying examinees correctly: the stronger the correlation, the greater the concurrent validity of the test. If an examination result is compared to another, similar examination result, do the results correlate?

Validity: predictive validity. Predictive validity is the relationship between test scores and an examinee's future performance as a master or non-master: the degree to which the examination can predict (correlate with) the performance or knowledge (the construct) measured at some time in the future. To what degree does the test predict examinees' future status as masters or non-masters? The correlation is computed between examinees' classifications as master or non-master based on the test and their later performance, perhaps on the job (see the sketch below).
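Computationally, the concurrent and predictive cases reduce to the same calculation: correlate the test's pass/fail classification with a criterion measure, collected either at the same time (concurrent) or later (predictive). A minimal sketch, with hypothetical data:

```python
import numpy as np

# Criterion: 1 = known master (or later success on the job), 0 = non-master.
criterion   = np.array([1, 1, 1, 0, 0, 1, 0, 1, 0, 0])
# Test decision: 1 = pass, 0 = fail.
test_result = np.array([1, 1, 0, 0, 0, 1, 0, 1, 1, 0])

# For two 0/1 variables the Pearson correlation is the phi coefficient.
phi = np.corrcoef(criterion, test_result)[0, 1]
print(f"validity coefficient (phi) = {phi:.2f}")
```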

Validity: face validity. Face validity is determined by a review of the items, not through statistical analysis. Anyone who looks over the test, including examinees and other stakeholders, may develop an informal opinion as to whether or not the test is measuring what it is supposed to measure. Does it appear to candidates to measure what it is supposed to measure? Asking parents whether a spelling test for school children is a good test is an example of face validity.

Validity. Validity is on a continuum: there can be more or less evidence for the validity of an assessment process.


Reliability has to do with the consistency, or reproducibility, of an examinee's performance on the test. For example, if you were to administer a test with high reliability to an examinee on two occasions, you would be very likely to reach the same conclusions about the examinee's performance both times. A test with poor reliability, on the other hand, might result in very different scores for the examinee across the two test administrations.

Reliability. There are many types of reliability: test-retest reliability, parallel forms reliability, decision consistency, internal consistency, and interrater reliability.

Reliability: test-retest reliability. Administer a test form on two separate occasions, only a few days or a few weeks apart; the interval should be short enough that the examinees' skills in the area being assessed have not changed through additional learning. The relationship between the examinees' scores from the two administrations is then estimated through statistical correlation to determine how similar the scores are. This type of reliability demonstrates the extent to which a test is able to produce stable, consistent scores across time.

Reliability: parallel forms reliability. Parallel forms are all constructed to match the test blueprint and to be similar in average item difficulty. Parallel forms reliability is estimated by administering both forms of the exam to the same group of examinees. While the time between the two test administrations should be short, it does need to be long enough that examinees' scores are not affected by fatigue. The examinees' scores on the two test forms are correlated in order to determine how similarly the two forms function. This reliability estimate is a measure of how consistent examinees' scores can be expected to be across test forms (see the sketch below, which applies equally to test-retest reliability).
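Test-retest and parallel-forms reliability both come down to correlating two score vectors from the same group of examinees. A minimal sketch, with hypothetical scores:

```python
import numpy as np

# Total scores for the same examinees on two administrations
# (two occasions for test-retest, or two forms for parallel forms).
scores_a = np.array([78, 85, 92, 64, 70, 88, 73, 95])
scores_b = np.array([80, 83, 90, 68, 66, 91, 75, 93])

# The Pearson correlation is the reliability estimate.
r = np.corrcoef(scores_a, scores_b)[0, 1]
print(f"reliability estimate: r = {r:.2f}")
```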

Reliability: decision consistency. For many criterion-referenced tests (CRTs), a more useful way to think about reliability may be in terms of examinees' classifications. For example, a typical CRT will result in an examinee being classified as either a master or a non-master; the examinee will either pass or fail the test. It is the reliability of this classification decision that is estimated in decision consistency reliability. If an examinee is classified as a master on both test administrations, or as a non-master on both occasions, the test is producing consistent decisions. This approach can be used either with parallel forms or with a single form administered twice in test-retest fashion.
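A minimal sketch of a decision-consistency check on two sets of pass/fail decisions for the same examinees, reporting raw agreement alongside Cohen's kappa, which corrects the agreement rate for chance (the data are hypothetical):

```python
import numpy as np

def decision_consistency(dec_a, dec_b):
    """Raw agreement and Cohen's kappa for two 0/1 pass-fail vectors."""
    a, b = np.asarray(dec_a), np.asarray(dec_b)
    p_agree = np.mean(a == b)                       # observed agreement
    pa, pb = a.mean(), b.mean()                     # marginal pass rates
    p_chance = pa * pb + (1 - pa) * (1 - pb)        # chance agreement
    kappa = (p_agree - p_chance) / (1 - p_chance)
    return p_agree, kappa

# Hypothetical decisions from two administrations (1 = pass, 0 = fail).
p0, k = decision_consistency([1, 1, 0, 1, 0, 1, 1, 0],
                             [1, 1, 0, 1, 1, 1, 1, 0])
print(f"agreement = {p0:.2f}, kappa = {k:.2f}")
```

Kappa is worth reporting alongside raw agreement because a very high pass rate alone can inflate raw agreement.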

Reliability: internal consistency. The internal consistency method estimates how well the set of items on a test correlate with one another; that is, how similar the items on a test form are to one another. Many test analysis software programs produce this reliability estimate automatically. It is quantifiable: the Kuder-Richardson formula or coefficient alpha is the usual measure of this kind of test reliability.
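Coefficient alpha can be computed directly from an examinees-by-items score matrix; for 0/1 item scores the result is KR-20. A minimal sketch, with hypothetical responses:

```python
import numpy as np

def coefficient_alpha(scores):
    """Cronbach's alpha for an (examinees x items) score matrix.
    With dichotomous 0/1 items this equals KR-20."""
    X = np.asarray(scores, dtype=float)
    k = X.shape[1]                              # number of items
    item_vars = X.var(axis=0, ddof=1)           # variance of each item
    total_var = X.sum(axis=1).var(ddof=1)       # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 0/1 responses: 6 examinees x 5 items.
X = np.array([[1, 1, 1, 0, 1],
              [1, 0, 1, 1, 1],
              [0, 0, 1, 0, 0],
              [1, 1, 1, 1, 1],
              [0, 0, 0, 0, 1],
              [1, 1, 0, 1, 1]])
print(f"alpha (KR-20) = {coefficient_alpha(X):.2f}")
```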

Reliability: interrater reliability. When a test includes performance tasks, or other items that need to be scored by human raters, the reliability of those raters must be estimated. This reliability method asks the question: if multiple raters scored a single examinee's performance, would the examinee receive the same score? Interrater reliability provides a measure of the dependability or consistency of scores that might be expected across raters.
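One common way to quantify interrater reliability is an intraclass correlation. Below is a minimal sketch of ICC(2,1), the two-way random-effects, single-rater, absolute-agreement form described by Shrout and Fleiss, computed on a hypothetical examinees-by-raters matrix:

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    ratings: (n examinees x k raters) matrix of scores."""
    X = np.asarray(ratings, dtype=float)
    n, k = X.shape
    grand = X.mean()
    row_means = X.mean(axis=1)                    # per-examinee means
    col_means = X.mean(axis=0)                    # per-rater means
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # between examinees
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # between raters
    sse = np.sum((X - row_means[:, None] - col_means[None, :] + grand) ** 2)
    mse = sse / ((n - 1) * (k - 1))                        # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical OSCE-style scores: 5 examinees rated by 3 raters.
R = np.array([[7, 8, 7],
              [5, 5, 6],
              [9, 9, 8],
              [4, 5, 4],
              [8, 7, 8]])
print(f"ICC(2,1) = {icc_2_1(R):.2f}")
```

The absolute-agreement form is used here because it penalizes raters who are systematically harsher or more lenient, which matters when raw rater scores decide pass/fail.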

WHAT????

Testing of People is a Science. Psychometrics: the technique of mental measurement (Merriam-Webster); the measurement of mental traits, abilities, and processes (Dictionary.com); the field of study concerned with the theory and technique of measurement... of skills, knowledge, abilities, attitudes... (Wikipedia).

Testing of People is a Science. You wouldn't try to fly a plane on your own or do brain surgery on your own.

Testing of People is a Science. If you are struggling to do this on your own, find a psychometrics or measurement expert in your country to help you (universities, etc. have experts).

Questions?