Psychological testing

1 Psychological testing Lecture 12 Mikołaj Winiewski, PhD

2 Test Construction Strategies: content validation; empirical (criterion) keying; factor analysis; mixed approach (all of the above).

3 Content Validation: Define all aspects of the construct and create test items, derived from theory or based on the purpose of the test. The content of the item is of primary importance. Consult experts about the constructs using qualitative methods. Employ experts as judges to assess each potential item using quantitative measures. Perform psychometric analyses of the items.

4 Empirical Keying: Create test items to measure one or more traits, derived from theory or based on the purpose of the test. The content of the item is not of primary importance. Administer the test items to a criterion group and a control group. Select the items that best distinguish between these two groups.
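The selection step of empirical keying can be sketched as follows. This is a minimal illustration, assuming dichotomous responses (1 = endorsed, 0 = not endorsed); the function names, the data and the 0.30 threshold are hypothetical and do not come from the lecture.

```python
def endorsement_rate(responses, item):
    """Proportion of respondents in a group who endorsed the item (scored 1)."""
    return sum(person[item] for person in responses) / len(responses)


def select_keyed_items(criterion, control, min_gap=0.30):
    """Keep items whose endorsement rates differ between the groups.

    criterion, control: lists of per-person response lists (1 or 0 per item).
    min_gap: arbitrary illustrative threshold, not a recommended value.
    Returns (item index, rate difference) pairs, largest absolute gap first.
    """
    n_items = len(criterion[0])
    gaps = []
    for item in range(n_items):
        gap = endorsement_rate(criterion, item) - endorsement_rate(control, item)
        if abs(gap) >= min_gap:
            gaps.append((item, gap))
    return sorted(gaps, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical data: 4 people per group, 3 items.
criterion_group = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 0, 0]]
control_group = [[0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 0, 0]]

selected = select_keyed_items(criterion_group, control_group)
print(selected)  # [(0, 0.75)]
```

Only item 0 separates the two groups here, whatever its content says, which is the point of the strategy. In practice the selected items would need cross-validation on a new sample.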

5 Factor Analysis: Create test items to measure one or more traits derived from theory. The content of the item is of primary importance. Administer the test items to an appropriate sample drawn from the population of interest: large (depending on the technique and the base number of items) and, if possible, representative. Employ factor analysis (a family of correlational techniques used to determine the underlying structure of data).

6 Mixed approach: Employing a mixed strategy. For example: define all aspects of the construct and create test items; employ experts as judges to assess each potential item, or use experts to create the item pool; in parallel, employ factor analysis and administer the test items to a criterion group and a control group.

7 Adaptation: Producing instruments (adjusting existing instruments) that measure the target constructs adequately in the target cultures. Using a set of procedures and techniques to create an equivalent tool. The purpose / application is the central issue in adaptations.

8 Main Applications of Translations/Adaptations. Comparative studies (diagnosis & research): focus: comparison of constructs or mean scores across cultures; strategy: maximizing comparability. Studies in the target culture (diagnosis & research): focus: validity in the new context; strategy: maximizing local suitability.

9 Considerations: cultural equivalence. Psychological theory / dimensions; psychological concepts / terms; behavioral indicators; procedures.

10 Considerations: test equivalence. Face equivalence (superficial); psychometric (statistics, validity and reliability); psychological (functional); translation; construction.

11 Adoption / translation: Not only language! Literal/close translation: "What is the name of the queen of England?" Problem: the item is more difficult for American children than for English children. Adaptation: "What is the name of the president of the USA?" Problem: the queen and the president are not equally well known in their respective countries.

12 Equivalence: words (linguistic); meanings (psychological).

13 Linguistic Equivalence (broader than similarity of words): Linguistic equivalence refers to similarity of the linguistic features of a text. Examples of relevant linguistic features are lexical similarity and grammatical accuracy. In general, the emphasis is on formal-textual characteristics (cf. automatic translations).

14 Psychological Equivalence: Psychological equivalence refers to similarity of (psychological) meaning and scores. Similarity in a broad sense: textual (e.g., connotation of words, implied context of the text, comprehensibility) and metrical (score comparability).

15 Relationship between Two Perspectives: Three possible relations between linguistic and psychological features, depending on their overlap: a. complete (translatable); b. partial (poorly translatable); c. none (essentially non-translatable).

16 Cultural adaptation: options / strategies. Adoption / transcription (close literal translation): advantage: maintains metric equivalence; disadvantage: adequacy is (too) readily assumed and should be demonstrated. Adaptation (translation, travesty, paraphrase): advantage: more flexible, more tailored to the context; disadvantage: fewer statistical techniques are available to compare scores across cultures. Assembly / re-assembly (composing a new instrument): advantage: very flexible; disadvantage: almost no comparability is maintained.

17 Adoption / transcription: Literal translation of all items. Focus: extreme translation fidelity. Assumption: universality of constructs and behaviors. Pros: metric equivalence; possibility of straightforward comparisons. Cons: language and psychometric problems.

18 Adaptation: translation. Faithful translation of the original pool of items, with possible changes. Focus: translation fidelity. Assumption: universality of constructs and behaviors, but not of language. Pros: better psychometric properties; better construct and ecological validity. Cons: fewer comparison options; still some language and psychometric problems.

19 Adaptation: travesty. Free translation of the original pool of items, keeping the meaning while changing the wording to fit linguistic and psychological needs. Focus: psychological meaning. Assumption: universality of constructs but not of language, with possible cultural differences in behaviors. Pros: better cultural adjustment; less metric equivalence, but still fairly good; better psychometric properties. Cons: few comparison options; major differences between versions of the test.

20 Adaptation: paraphrase. Creating a new tool using the original items as inspiration rather than as a base. Focus: psychological meaning. Assumption: universality of constructs, but not of behaviors and language. Pros: good cultural adjustment; good psychometric properties; cultural equivalence. Cons: no metric equivalence; major differences between versions of the test.

21 Assembly (re-assembly): Composing a new instrument using the original theoretical model and development strategy. Focus: adaptation of the tool and the theory. Assumption: no cultural universality of behaviors and language, and possible differences in constructs. Pros: best cultural adjustment. Cons: no metric equivalence; two different tools.

22 Item Analysis

23 Purpose of Item Analysis: Evaluates the quality of each item. Rationale: the quality of the items determines the quality of the test (i.e., its reliability and validity). May suggest ways of improving the measurement of a test. Can help with understanding why certain tests predict some criteria but not others.

24 Item Analysis: When analyzing the test items, we have several questions about the performance of each item. Some of these questions include: Are the items congruent with the test objectives? Are the items valid? Do they measure what they're supposed to measure? Are the items reliable? Do they measure consistently? How long does it take an examinee to complete each item? Which items are most difficult to answer correctly? Which items are easy? Are there any poorly performing items that need to be discarded?

25 Types of Item Analyses for CTT (classical test theory): Three major types: 1. Assess the quality of the distractors. 2. Assess the difficulty of the items. 3. Assess how well an item differentiates between high and low performers.
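The third type of analysis is often summarized with an upper-lower discrimination index, D = p(upper group) - p(lower group). The slides do not say which index the lecture uses, so the sketch below shows only one common option. The 27% group size, the function name and the data are assumptions made for illustration.

```python
def discrimination_index(scores, item, fraction=0.27):
    """Upper-lower index D = p(upper group) - p(lower group) for one item.

    scores: list of per-person lists of 0/1 item scores.
    fraction: share of examinees placed in each extreme group (assumed 27%).
    """
    ranked = sorted(scores, key=sum)  # lowest total score first
    n_group = max(1, round(len(ranked) * fraction))
    lower, upper = ranked[:n_group], ranked[-n_group:]
    p_upper = sum(person[item] for person in upper) / n_group
    p_lower = sum(person[item] for person in lower) / n_group
    return p_upper - p_lower


# Hypothetical 0/1 scores of ten examinees on five items.
scores = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]

print(discrimination_index(scores, 3))  # 1.0
print(discrimination_index(scores, 4))  # 0.0
```

Item 3 is answered correctly by the whole top group and by nobody in the bottom group, so D = 1.0. Item 4 is answered equally often in both groups, so D = 0.0 and it does not differentiate high from low performers.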

26 Distractor Analysis. [Diagram: a multiple-choice question with four options, A to D; option B is marked as the correct answer and the remaining options are the distractors.]

27 Distractor Analysis: The first question of item analysis: how many people choose each response? If there is only one best response, then all other response options are distractors. Example (N = 35): "Which method has the best internal consistency?" Number choosing each option: a) projective test: 1; b) peer ratings: 1; c) forced choice: 21; d) differences n.s.: 12.

28 Distractor Analysis: A perfect test item would have 2 characteristics: 1. Everyone who knows the item gets it right. 2. People who do not know the item will have responses equally distributed across the wrong answers. It is not desirable to have one of the distractors chosen more often than the correct answer. This result indicates a potential problem with the question. The distractor may be too similar to the correct answer, and/or there may be something in either the stem or the alternatives that is misleading.

29 Distractor Analysis (cont'd): Calculate the number of people expected to choose each of the distractors. If wrong answers are chosen at random, the expected number is the same for each wrong response (Figure 10-1). Number of persons expected to choose each distractor = (N answering incorrectly) / (number of distractors) = 14 / 3 ≈ 4.7.
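The same arithmetic as a short sketch, using the response counts from the N = 35 example. Option c is taken as the keyed answer because the slide's own figure of 14 incorrect responses implies 21 correct ones; the variable names are illustrative.

```python
# Response counts from the example item (N = 35).
counts = {"a": 1, "b": 1, "c": 21, "d": 12}
correct = "c"  # assumed key: 35 - 21 = 14 incorrect, matching the slide

n_incorrect = sum(n for option, n in counts.items() if option != correct)
n_distractors = len(counts) - 1
expected_per_distractor = n_incorrect / n_distractors

print(n_incorrect, n_distractors, round(expected_per_distractor, 1))
# 14 3 4.7
```

Against an expected count of about 4.7, distractor d (12 choices) is clearly over-chosen, while a and b (1 choice each) are nearly unused.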

30 Distractor Analysis (cont'd): When the number of persons choosing a distractor significantly exceeds the number expected, there are 2 possibilities: 1. The choice may reflect partial knowledge. 2. The item is a poorly worded trick question. An unpopular distractor may lower item and test difficulty because it is easily eliminated; an extremely popular distractor is likely to lower the reliability and validity of the test.
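One way to make "significantly exceeds" concrete is a chi-square goodness-of-fit test of the distractor counts against equal expected counts. The slides do not name a test, so this is only a sketch of one possible check. The counts are the wrong options from the N = 35 example, and the critical values are the standard upper-tail chi-square values at alpha = .05.

```python
def distractor_chi_square(distractor_counts):
    """Pearson chi-square statistic against equal expected counts."""
    total = sum(distractor_counts.values())
    expected = total / len(distractor_counts)
    return sum((observed - expected) ** 2 / expected
               for observed in distractor_counts.values())


# Upper-tail critical values of the chi-square distribution at alpha = .05,
# indexed by degrees of freedom (number of distractors minus 1).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

distractors = {"a": 1, "b": 1, "d": 12}  # wrong options from the example
statistic = distractor_chi_square(distractors)
df = len(distractors) - 1
uneven = statistic > CRITICAL_05[df]

print(round(statistic, 2), uneven)
# 17.29 True
```

The expected count here (about 4.7) is below the usual rule of thumb of 5, so the chi-square approximation is rough. The result should be read as a flag for reviewing option d, not as a firm conclusion.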

31 Item Difficulty: The percentage of test takers who respond correctly (p). What if p = .00? What if p = 1.00?

32 Item Difficulty: An item with a p value of .0 or 1.0 does not contribute to measuring individual differences and is thus certain to be useless. When comparing two test scores, we are interested in who had the higher score or in the differences between scores. Items with a p value of .5 show the most variation, so seek items in this range and remove those with extreme values. The p value can also be examined to determine the proportion answering in a particular way for items that don't have a correct answer.
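The claim about variation follows from the variance of a dichotomously scored (0/1) item, which is p(1 - p): it is zero at p = .0 and p = 1.0 and largest at p = .5. A minimal sketch, with hypothetical scores and illustrative function names:

```python
def item_difficulty(item_scores):
    """Proportion of test takers who answered the item correctly (p)."""
    return sum(item_scores) / len(item_scores)


def item_variance(p):
    """Variance of a 0/1 item with difficulty p."""
    return p * (1 - p)


# Hypothetical scores of ten test takers on one item (1 = correct).
scores = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
p = item_difficulty(scores)
print(p, round(item_variance(p), 2))  # 0.6 0.24

# Variance across the difficulty range: zero at the extremes, peak at .5.
for value in (0.0, 0.2, 0.5, 0.8, 1.0):
    print(value, round(item_variance(value), 2))
```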

33 Item Difficulty (cont.): What is the best p-value? The optimal p-value is .50: maximum discrimination between good and poor performers. Should we only choose items of .50? When shouldn't we?

34 Item Difficulty (cont.): Should we only choose items of .50? Not necessarily. When the aim is to screen for the very top group of applicants (e.g., admission to university or medical school), cutoffs may be much higher. Other institutions want a minimum level (e.g., a minimum reading level), so cutoffs may be much lower.

35 Item Difficulty (cont'd): General rules of item difficulty: p low (< .20): difficult test item; p moderate (.20 to .80): moderately difficult item; p high (> .80): easy item.
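These rules of thumb translate directly into a small classifier. The sketch assumes the moderate band is everything between the two stated cutoffs (.20 and .80, inclusive); the function name and labels are illustrative.

```python
def classify_difficulty(p):
    """Label an item by its p value using the rule-of-thumb cutoffs."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    if p < 0.20:
        return "difficult"
    if p > 0.80:
        return "easy"
    return "moderate"  # assumed band: .20 to .80 inclusive


for p in (0.10, 0.50, 0.60, 0.95):
    print(p, classify_difficulty(p))
```

The item from the distractor example, answered correctly by 21 of 35 test takers (p = .60), would fall in the moderate band.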


More information

Adaptive Testing With the Multi-Unidimensional Pairwise Preference Model Stephen Stark University of South Florida

Adaptive Testing With the Multi-Unidimensional Pairwise Preference Model Stephen Stark University of South Florida Adaptive Testing With the Multi-Unidimensional Pairwise Preference Model Stephen Stark University of South Florida and Oleksandr S. Chernyshenko University of Canterbury Presented at the New CAT Models

More information

Variables in Research. What We Will Cover in This Section. What Does Variable Mean?

Variables in Research. What We Will Cover in This Section. What Does Variable Mean? Variables in Research 9/20/2005 P767 Variables in Research 1 What We Will Cover in This Section Nature of variables. Measuring variables. Reliability. Validity. Measurement Modes. Issues. 9/20/2005 P767

More information

Computerized Mastery Testing

Computerized Mastery Testing Computerized Mastery Testing With Nonequivalent Testlets Kathleen Sheehan and Charles Lewis Educational Testing Service A procedure for determining the effect of testlet nonequivalence on the operating

More information

Chapter Three: Sampling Methods

Chapter Three: Sampling Methods Chapter Three: Sampling Methods The idea of this chapter is to make sure that you address sampling issues - even though you may be conducting an action research project and your sample is "defined" by

More information

Principles of Sociology

Principles of Sociology Principles of Sociology DEPARTMENT OF ECONOMICS ATHENS UNIVERSITY OF ECONOMICS AND BUSINESS [Academic year 2017/18, FALL SEMESTER] Lecturer: Dimitris Lallas Principles of Sociology 4th Session Sociological

More information

2016 Technical Report National Board Dental Hygiene Examination

2016 Technical Report National Board Dental Hygiene Examination 2016 Technical Report National Board Dental Hygiene Examination 2017 Joint Commission on National Dental Examinations All rights reserved. 211 East Chicago Avenue Chicago, Illinois 60611-2637 800.232.1694

More information

Differential Item Functioning

Differential Item Functioning Differential Item Functioning Lecture #11 ICPSR Item Response Theory Workshop Lecture #11: 1of 62 Lecture Overview Detection of Differential Item Functioning (DIF) Distinguish Bias from DIF Test vs. Item

More information

Ch. 11 Measurement. Measurement

Ch. 11 Measurement. Measurement TECH 646 Analysis of Research in Industry and Technology PART III The Sources and Collection of data: Measurement, Measurement Scales, Questionnaires & Instruments, Sampling Ch. 11 Measurement Lecture

More information

Reliability and Validity checks S-005

Reliability and Validity checks S-005 Reliability and Validity checks S-005 Checking on reliability of the data we collect Compare over time (test-retest) Item analysis Internal consistency Inter-rater agreement Compare over time Test-Retest

More information

Psychometrics for Beginners. Lawrence J. Fabrey, PhD Applied Measurement Professionals

Psychometrics for Beginners. Lawrence J. Fabrey, PhD Applied Measurement Professionals Psychometrics for Beginners Lawrence J. Fabrey, PhD Applied Measurement Professionals Learning Objectives Identify key NCCA Accreditation requirements Identify two underlying models of measurement Describe

More information

Bayesian Tailored Testing and the Influence

Bayesian Tailored Testing and the Influence Bayesian Tailored Testing and the Influence of Item Bank Characteristics Carl J. Jensema Gallaudet College Owen s (1969) Bayesian tailored testing method is introduced along with a brief review of its

More information

AP STATISTICS 2008 SCORING GUIDELINES (Form B)

AP STATISTICS 2008 SCORING GUIDELINES (Form B) AP STATISTICS 2008 SCING GUIDELINES (Form B) Question 2 Intent of Question The primary goals of this question were to assess a student s ability to (1) recognize an unbiased estimator and explain why the

More information

Work, Employment, and Industrial Relations Theory Spring 2008

Work, Employment, and Industrial Relations Theory Spring 2008 MIT OpenCourseWare http://ocw.mit.edu 15.676 Work, Employment, and Industrial Relations Theory Spring 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

On the usefulness of the CEFR in the investigation of test versions content equivalence HULEŠOVÁ, MARTINA

On the usefulness of the CEFR in the investigation of test versions content equivalence HULEŠOVÁ, MARTINA On the usefulness of the CEFR in the investigation of test versions content equivalence HULEŠOVÁ, MARTINA MASARY K UNIVERSITY, CZECH REPUBLIC Overview Background and research aims Focus on RQ2 Introduction

More information

CHAPTER III METHODOLOGY

CHAPTER III METHODOLOGY CHAPTER III METHODOLOGY This chapter discusses things related to the way this study is conducted. Research design, data collection methods which consist of population, sample and setting, research instruments,

More information

Measurement Issues in Concussion Testing

Measurement Issues in Concussion Testing EVIDENCE-BASED MEDICINE Michael G. Dolan, MA, ATC, CSCS, Column Editor Measurement Issues in Concussion Testing Brian G. Ragan, PhD, ATC University of Northern Iowa Minsoo Kang, PhD Middle Tennessee State

More information

Models in Educational Measurement

Models in Educational Measurement Models in Educational Measurement Jan-Eric Gustafsson Department of Education and Special Education University of Gothenburg Background Measurement in education and psychology has increasingly come to

More information

INSPECT Overview and FAQs

INSPECT Overview and FAQs WWW.KEYDATASYS.COM ContactUs@KeyDataSys.com 951.245.0828 Table of Contents INSPECT Overview...3 What Comes with INSPECT?....4 Reliability and Validity of the INSPECT Item Bank. 5 The INSPECT Item Process......6

More information

Item Analysis for Beginners

Item Analysis for Beginners Item Analysis for Beginners John Kleeman Questionmark Executive Director and Founder All rights reserved. Questionmark is a registered trademark of Questionmark Computing Limited. All other trademarks

More information

Interpretation in neuropsychological assessment

Interpretation in neuropsychological assessment Interpretation in neuropsychological assessment What does interpretation of a neuropsychological test involve? What you need to consider in interpretation? hat do we mean by nterpreta1on? Distinction between

More information

Small Group Presentations

Small Group Presentations Admin Assignment 1 due next Tuesday at 3pm in the Psychology course centre. Matrix Quiz during the first hour of next lecture. Assignment 2 due 13 May at 10am. I will upload and distribute these at the

More information

Non-profit education, research and support network offers money in exchange for missing science

Non-profit education, research and support network offers money in exchange for missing science Alive & Well $50,000 Fact Finder Award Find One Study, Save Countless Lives Non-profit education, research and support network offers money in exchange for missing science http://www.aliveandwell.org Tel

More information

Multiple Act criterion:

Multiple Act criterion: Common Features of Trait Theories Generality and Stability of Traits: Trait theorists all use consistencies in an individual s behavior and explain why persons respond in different ways to the same stimulus

More information

Special guidelines for preparation and quality approval of reviews in the form of reference documents in the field of occupational diseases

Special guidelines for preparation and quality approval of reviews in the form of reference documents in the field of occupational diseases Special guidelines for preparation and quality approval of reviews in the form of reference documents in the field of occupational diseases November 2010 (1 st July 2016: The National Board of Industrial

More information

Competency Rubric Bank for the Sciences (CRBS)

Competency Rubric Bank for the Sciences (CRBS) Competency Rubric Bank for the Sciences (CRBS) Content Knowledge 1 Content Knowledge: Accuracy of scientific understanding Higher Order Cognitive Skills (HOCS) 3 Analysis: Clarity of Research Question

More information

A Comparison of Several Goodness-of-Fit Statistics

A Comparison of Several Goodness-of-Fit Statistics A Comparison of Several Goodness-of-Fit Statistics Robert L. McKinley The University of Toledo Craig N. Mills Educational Testing Service A study was conducted to evaluate four goodnessof-fit procedures

More information

Reliability AND Validity. Fact checking your instrument

Reliability AND Validity. Fact checking your instrument Reliability AND Validity Fact checking your instrument General Principles Clearly Identify the Construct of Interest Use Multiple Items Use One or More Reverse Scored Items Use a Consistent Response Format

More information

SLEEP DISTURBANCE ABOUT SLEEP DISTURBANCE INTRODUCTION TO ASSESSMENT OPTIONS. 6/27/2018 PROMIS Sleep Disturbance Page 1

SLEEP DISTURBANCE ABOUT SLEEP DISTURBANCE INTRODUCTION TO ASSESSMENT OPTIONS. 6/27/2018 PROMIS Sleep Disturbance Page 1 SLEEP DISTURBANCE A brief guide to the PROMIS Sleep Disturbance instruments: ADULT PROMIS Item Bank v1.0 Sleep Disturbance PROMIS Short Form v1.0 Sleep Disturbance 4a PROMIS Short Form v1.0 Sleep Disturbance

More information

Making a psychometric. Dr Benjamin Cowan- Lecture 9

Making a psychometric. Dr Benjamin Cowan- Lecture 9 Making a psychometric Dr Benjamin Cowan- Lecture 9 What this lecture will cover What is a questionnaire? Development of questionnaires Item development Scale options Scale reliability & validity Factor

More information

Cochrane Pregnancy and Childbirth Group Methodological Guidelines

Cochrane Pregnancy and Childbirth Group Methodological Guidelines Cochrane Pregnancy and Childbirth Group Methodological Guidelines [Prepared by Simon Gates: July 2009, updated July 2012] These guidelines are intended to aid quality and consistency across the reviews

More information

Session 401. Creating Assessments that Effectively Measure Knowledge, Skill, and Ability

Session 401. Creating Assessments that Effectively Measure Knowledge, Skill, and Ability Practical and Effective Assessments in e-learning Session 401 Creating Assessments that Effectively Measure Knowledge, Skill, and Ability Howard Eisenberg, Questionmark Effectively Measuring Knowledge,

More information

Psych 1Chapter 2 Overview

Psych 1Chapter 2 Overview Psych 1Chapter 2 Overview After studying this chapter, you should be able to answer the following questions: 1) What are five characteristics of an ideal scientist? 2) What are the defining elements of

More information

Survey Question. What are appropriate methods to reaffirm the fairness, validity reliability and general performance of examinations?

Survey Question. What are appropriate methods to reaffirm the fairness, validity reliability and general performance of examinations? Clause 9.3.5 Appropriate methodology and procedures (e.g. collecting and maintaining statistical data) shall be documented and implemented in order to affirm, at justified defined intervals, the fairness,

More information

USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION

USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION Iweka Fidelis (Ph.D) Department of Educational Psychology, Guidance and Counselling, University of Port Harcourt,

More information