Empowered by Psychometrics: The Fundamentals of Psychometrics. Jim Wollack, University of Wisconsin–Madison
1 Empowered by Psychometrics The Fundamentals of Psychometrics Jim Wollack University of Wisconsin Madison
2 Psycho-what? Psychometrics is the field of study concerned with the measurement of mental and psychological traits, abilities, and attitudes.
3 Purpose of Session Introduce several key psychometric concepts and gain an appreciation for the theoretical underpinnings of standardized tests. Scales, Norms, and Equating Validity Reliability Test Theory (as time permits) Classical Test Theory Item Response Theory
4 Poll Question Because of my comfort-level with psychometrics, I am ____ to discuss measurement-related issues with faculty and students. A. eager B. willing C. hesitant D. unwilling
5 SECTION I Scales, Norms, and Equating
6 Measurement and Scaling Measurement is the process of assigning scores in a systematic and coherent way. For purposes of reporting, these scores are often transformed in some way to facilitate interpretations. Scaling is the process of constructing a score scale that associates numbers or other ordered indicators with the performance of examinees.
7 Why Scale? Imagine three students (A, B, and C) take a standardized test Student A answered 20 of 30 items correct. Student B answered 20 of 29 items correct. Student C answered 21 of 30 items correct. What can we say about the achievement level of these three students?
8 What can we say about A, B, and C? A: 20/30 B: 20/29 C: 21/30 Suppose you learned that the students completed different test forms? the 3 hardest items were all on Test A? the 12 easiest items were all on Test C? The overall difficulty levels of two different tests are rarely identical Even if tests are of equal average difficulty, they may be differently difficult for students at different levels.
9 The Rationale Behind Scaling Raw scores (number correct scores) depend on the items on the test and do not have consistent meaning across forms. Same is true for percentage correct scores Makes score interpretations very difficult
10 Score Scales The score scale is the metric which is actually used for purposes of reporting scores to users. Moving from raw scores to the score scale involves either a linear or non-linear transformation The transformed scores themselves are called scaled scores or derived scores
11 Common Scales
12 Common Scales (figure: common score scales aligned under the normal curve, with the percentage of examinees falling in each region)
13 Advantage of Measurement Scales Standardization Scale must not measure differently depending on what it is that's being measured Pieces, bites, handfuls, and number/percent correct Pounds, inches, ºF, level on construct Without a standardized reporting metric, direct comparisons are impossible
14 Transforming between two scales Mean = 67 St. Dev = 8.5 Mean = 72 St. Dev = 5.7
15 Transforming between two scales Mean = 67 St. Dev = 8.5 Mean = 72 St. Dev = 5.7 Linear transformation 1. Make means equal Add (72 − 67) = 5 to all blue scores
16 Transforming between two scales Mean = 72 St. Dev = 8.5 Mean = 72 St. Dev = 5.7 Linear transformation 1. Make means equal Add (72 − 67) = 5 to all blue scores
17 Transforming between two scales Mean = 72 St. Dev = 8.5 Mean = 72 St. Dev = 5.7 Linear transformation 2. Make St. devs equal Multiply all blue scores by (5.7/8.5)
18 Transforming between two scales Mean = 72 St. Dev = 5.7 Mean = 72 St. Dev = 5.7 Linear transformation 2. Make St. devs equal Multiply all blue scores by (5.7/8.5)
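The two steps above amount to a single linear map: stretch deviations from the old mean by the ratio of standard deviations, then shift to the new mean. A minimal Python sketch (the `rescale` helper and the blue-score values are hypothetical; the target mean of 72 and SD of 5.7 come from the slides):

```python
import statistics

def rescale(scores, target_mean, target_sd):
    """Linearly transform scores to have the target mean and SD.

    Deviations from the old mean are stretched by the ratio of
    standard deviations, then shifted to the new mean -- the same
    two steps as on the slides, done in one pass.
    """
    m = statistics.mean(scores)
    s = statistics.pstdev(scores)
    return [(x - m) * (target_sd / s) + target_mean for x in scores]

blue = [58, 63, 67, 71, 76]         # hypothetical blue-test scores
red_scale = rescale(blue, 72, 5.7)  # put them on the red scale
```

Note that `pstdev` is the population standard deviation; using the sample version would work equally well as long as it is used consistently on both scales.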
19 Any set of test scores can easily be transformed to some other metric. This allows for direct norm-referenced comparisons Candidate A scored 82 on red test (96th percentile) Candidate B scored 82 on blue test (96th percentile)
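A percentile rank like the 96th above is computed against the norm group. A toy sketch (the function name and norm-group data are hypothetical; this uses the percent-below definition, one of several conventions in use):

```python
def percentile_rank(score, norm_group):
    """Percent of the norm group scoring strictly below the given score
    (the percent-below convention; other conventions credit half of ties)."""
    below = sum(1 for s in norm_group if s < score)
    return 100.0 * below / len(norm_group)

norms = list(range(1, 101))      # toy norm group: one examinee per score 1..100
pr = percentile_rank(97, norms)  # 96 of 100 norm-group scores fall below 97
```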
20 Transforming between two scales Mean = 72 St. Dev = 5.7 Mean = 72 St. Dev = 5.7 Score = 82 (96th percentile)
21 Transforming between two scales Are these two students comparable? Score = 82 (96th percentile)
22 Poll Question Are these two students comparable? A. Yes B. No C. It cannot be determined
23 Are two students comparable? We don't know. Students' scaled scores and percentile ranks are relative to other students who completed that same form. If the populations of test takers were different, it is quite likely that the examinees are not of equal ability.
24 SAT GRE GED (figure: a score of 600 marked on three different scales) Does SAT = 600 mean the same thing as GRE = 600?
25 Norming Perform initial scaling on a single (base) test form (calibration) The sample completing the base form should be large and representative of the target population The sample taking the base form is used as a reference point for purposes of comparison with all subsequent samples
26 Norm Group The norm group is the group of individuals for whom the test scale was established SAT is scaled to have an average of 500 the average of the norm group was 500
27 Test Equating Need to transform the data so that the scores candidates receive are the same scores they would have received if they (1) were part of the normative sample, and (2) had been administered the base form This process, known as equating, ensures that test scores have identical meaning across administrations, even as items and populations change.
28 Equating The process of determining the transformation to convert between the raw score metric and the reporting metric (based on the norm group) Equating is a topic worthy of a full-length graduate-level course Requires comparison across common elements between base form and new form: Common items Assumed randomly equivalent populations Very difficult assumption to make across years
29 Simple Equating Design (table: average and standard deviation for the Base Test and New Test on the common items, before and after equating) In much the same way we did before, we can now align these two assessments using only data from common items Add (difference between common-item averages) = 15 to all New Test scores Multiply all New Test scores by 98/105 = 0.93
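The common-item alignment above can be sketched as mean-sigma linear equating. The helper name and the anchor-item numbers below are hypothetical; only the method (matching means and standard deviations on the common items) comes from the slide:

```python
import statistics

def linear_equate(new_scores, common_new, common_base):
    """Mean-sigma linear equating: place New Test scores on the Base
    Test scale using only performance on the common (anchor) items."""
    m_new, s_new = statistics.mean(common_new), statistics.pstdev(common_new)
    m_base, s_base = statistics.mean(common_base), statistics.pstdev(common_base)
    slope = s_base / s_new
    return [slope * (y - m_new) + m_base for y in new_scores]

# Toy anchor data: common items averaged 10 on the new form, 12 on the base form
equated = linear_equate([10], common_new=[8, 10, 12], common_base=[11, 12, 13])
```

A candidate who scores at the common-item mean of the new form is mapped to the common-item mean of the base form, which is exactly the intuition behind the slide's add-then-multiply steps.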
30 SECTION I Scales, Norms, and Equating Questions?
31 SECTION II Validity
32 Definitions of Validity Formal Definition of Validity: Degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of interpretations and actions based on test scores. Less Formal Definition of Validity: Degree to which the inferences made from a test (e.g., that the student knows the material of interest, is competent to practice in the discipline, is likely to be successful in training/school, etc.) are justified and accurate. Informal Definition of Validity: A test is valid if it measures what it's supposed to measure
33 Problem with the Informal Definition Informal Definition of Validity: A test is valid if it measures what it's supposed to measure Tests cannot be valid in and of themselves Tests are valid only for specific purposes Placement/Admissions exams for exemption VSA Testing (differences in target population) Validity is a matter of degree
34 Measuring Educ/Psychological Variables Unlike physical attributes (height, weight, etc.), educational and psychological attributes cannot be directly observed, hence cannot be directly measured. To the extent that a variable is abstract and latent rather than concrete and observable, it is called a construct. In order to move from construct formation to measurement, the construct which is to be measured must first be operationally defined.
35 Embodiment of a Construct Operationalize What behaviors comprise the construct? How does this construct relate to or distinguish itself from other constructs? Plan How will samples of these behaviors be obtained? Instrument development Develop a standard procedure for obtaining the sample of behaviors from the specified domain. Measurement
36 Test Validation The degree to which the evidence supports the claim that the assessment measures the intended construct. Three types of validity? Construct Validity Criterion-Related Validity Content Validity No, just one. It's really all about Construct Validity
37 Assessing Construct Validity Target group differences Is there a logical differentiation between groups? e.g., placement test math scores for students who completed HS Calculus vs. those who only completed HS Algebra Correlational studies between test and related (or unrelated) measures Convergent validity: Does test correlate with other measures that are theoretically related? ACT and SAT, Compass & Accuplacer, Different IQ tests Divergent validity: Does test fail to correlate with other measures that are theoretically unrelated? ACT and Stanford Binet Math and English placement scores
38 Assessing Construct Validity Factor Analysis Statistical procedure to empirically test whether performance on observed variables (items) can be explained by a smaller number of unobserved constructs. Dimensionality assessment Does empirical structure match theoretical structure? Can also assess whether clusters of items are related in ways that are expected. Are items that are intended to measure the same subscores (e.g., trigonometry, algebra, etc.) more similar to each other than to other items?
39 Assessing Construct Validity Content Validity The extent to which the set of items on the test are representative of and relevant to the construct Items should cover the breadth and depth of the construct Weight assigned to each content area should reflect importance of that content area within construct For employment and certification exams, often necessary to conduct a practice analysis Panels of content experts are often utilized to assess relevance of items
40 Assessing Construct Validity Criterion-Related Validity Examines the relationship of the test results to other variables/criteria external to the test Predictive The extent to which an individual's future level on the criterion can be predicted from prior test performance Correlation between ACT/SAT scores and first year GPA Concurrent The extent to which test scores estimate an individual's present standing on the criterion. Correlation between Prior Learning Assessment and final course grade
41 SECTION II Validity Questions?
42 SECTION III Reliability
43 A Game of Darts Validity: Confidence that the test will hit the bullseye Reliability: Confidence that any one dart is a good predictor of where the next dart would go (clustering the darts together)
44 Unreliable
45 Reliable, but not valid
46 Reasonably reliable and valid
47 Highly reliable and valid
48 Reliability and Validity A test cannot be valid (for any purpose) unless it is reliable. Validity: Confidence that the test will hit the bullseye Not that it will average out to the bullseye
49 Working Definitions of Reliability The degree to which a test is consistent and stable in measuring what it is intended to measure Measurement repeatability Will an examinee score similarly when administered an independent alternate form of the test under the same conditions and with no opportunity for learning (or forgetting)?
50 Understanding Reliability No two tests will consistently produce identical results. All test scores contain some random error Observed Score = True Score + Random Error = Signal + Noise This equation is often written as X = T + E
51 What is Random Error? Any non-systematic source of variance that is unrelated to the construct of interest. Examinee-specific factors Motivation Concentration Fatigue Boredom Test-specific factors Specific questions Ambiguous items Memory lapses Carelessness Luck in guessing Clarity of directions Reading load of items Scoring-specific factors Non-uniform scoring Carelessness Computational errors
52 Formal Definition of Reliability X = T + E A measure of the extent to which an examinee's score reflects their true score (as opposed to random measurement error) Reliability = Variance True / Variance Observed = 1 − Variance Error / Variance Observed A test with reliability of .80 contains 20% random error variance
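The X = T + E decomposition can be checked numerically: simulate true scores and independent noise, and the ratio of variances recovers the reliability. A sketch with made-up distribution parameters (true-score SD 100 and error SD 50 give a theoretical reliability of 100² / (100² + 50²) = .80):

```python
import random
import statistics

random.seed(1)

# Simulate X = T + E with independent true scores and errors.
# SD(T) = 100 and SD(E) = 50, so theoretical reliability is
# 100**2 / (100**2 + 50**2) = 0.80.
true_scores = [random.gauss(500, 100) for _ in range(10_000)]
errors = [random.gauss(0, 50) for _ in range(10_000)]
observed = [t + e for t, e in zip(true_scores, errors)]

# Reliability = Var(True) / Var(Observed); should land near 0.80
reliability = statistics.pvariance(true_scores) / statistics.pvariance(observed)
```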
53 Reliability and the SEM If reliability is a measure of the stability of measurement, the standard error of measurement (SEM) provides a measure of the instability of measurement. SEM = (st. dev.) × √(1 − reliability) Provides a measure of the expected variability in an individual's score (X_i) upon retesting. Score interval and probability of the score falling in the interval: X_i ± 1 SEM: 68%; X_i ± 2 SEM: 95%
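The SEM formula (SD times the square root of one minus reliability) and the ±1/±2 SEM bands can be sketched as follows (the function names are mine; the SD of 10 and reliability of .84 are illustrative):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def score_band(score, sd, reliability, k=1):
    """Interval X_i +/- k*SEM: roughly 68% coverage for k=1, 95% for k=2."""
    s = sem(sd, reliability)
    return (score - k * s, score + k * s)

# With SD = 10 and reliability .84, SEM = 10 * sqrt(.16), about 4 points
```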
54 Why Care About Reliability? Measurement error is random; its effect on a student's test score is unpredictable. In an unreliable test, students' scores consist largely of measurement error. An unreliable test offers no advantage over randomly assigning test scores to students. Reliability is a necessary precursor to validity
55 Estimating Reliability Test-Retest Reliability Administer the same exam to the same group of candidates and correlate the scores Interval should be short enough for no learning, and long enough for no remembering Parallel/Alternate Forms Reliability 1. Develop equivalent forms of the test. 2. Have examinees take both tests. 3. Correlate the scores.
56 Estimating Reliability: 1 Administration Split-Half Reliability 1. Split exam into two random halves 2. Correlate scores across the two halves. 3. Apply a formula to estimate reliability Internal Consistency Cronbach's Coefficient α, KR-20, KR-21 Measures of the extent to which the test items throughout a test are homogeneous α is the average split-half reliability across all possible split-halves. α and KR-20 are lower-bound estimates of reliability
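Cronbach's α can be computed directly from an item-by-examinee score matrix using the standard formula α = k/(k−1) · (1 − Σ item variances / total-score variance). A minimal sketch (the function name and toy data are mine):

```python
import statistics

def cronbach_alpha(item_scores):
    """Cronbach's alpha from a list of per-item score vectors
    (each inner list holds one item's scores across all examinees)."""
    k = len(item_scores)
    item_var_sum = sum(statistics.pvariance(item) for item in item_scores)
    totals = [sum(examinee) for examinee in zip(*item_scores)]
    return (k / (k - 1)) * (1 - item_var_sum / statistics.pvariance(totals))

# Three perfectly parallel items -> alpha of 1.0 (to floating-point precision)
alpha = cronbach_alpha([[0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1]])
```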
57 Reliability in Practice High-stakes standardized testing vs. subscores or low-stakes tests
58 Improving Reliability Improve item quality Increase the number of points or item alternatives Increase the number of items
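The effect of increasing the number of items is usually projected with the Spearman-Brown prophecy formula; the slide does not name it, but it is the standard tool for this kind of projection (function name is mine):

```python
def spearman_brown(reliability, length_factor):
    """Spearman-Brown prophecy formula: predicted reliability when the
    number of (parallel) items is multiplied by length_factor."""
    n, r = length_factor, reliability
    return (n * r) / (1 + (n - 1) * r)

# Doubling a test with reliability .60 projects to about .75
projected = spearman_brown(0.60, 2)
```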
59 SECTION III Reliability Questions? End?
60 SECTION IV Test Theory Classical Test Theory Item Response Theory
61 Classical Test Theory X = T + E Person characteristics Total test score serves as a proxy for examinee's level on the construct Item characteristics Item difficulty is estimated as the proportion of examinees who answer an item correctly Item discrimination measures how effectively the item differentiates between high- and low-performing examinees. Correlation between item score (1/0) and total score
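The two CTT item statistics can be sketched directly from the definitions above (helper names and toy data are mine; item scores are coded 1/0):

```python
def item_difficulty(item):
    """CTT item difficulty: proportion of examinees answering correctly."""
    return sum(item) / len(item)

def item_discrimination(item, totals):
    """CTT item discrimination: point-biserial correlation between
    the 1/0 item score and the total test score."""
    n = len(item)
    mi, mt = sum(item) / n, sum(totals) / n
    cov = sum((i - mi) * (t - mt) for i, t in zip(item, totals)) / n
    si = (sum((i - mi) ** 2 for i in item) / n) ** 0.5
    st = (sum((t - mt) ** 2 for t in totals) / n) ** 0.5
    return cov / (si * st)

# Toy data: an item answered correctly by the two highest scorers
p = item_difficulty([1, 1, 0, 0])                       # 0.5
r = item_discrimination([1, 1, 0, 0], [30, 25, 10, 5])  # high, near 1
```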
62 Item Response Theory Mathematical modeling approach to test scoring and analysis Less intuitive, but more sophisticated approach Solves many problems with CTT Sample-dependency of item/exam statistics Test-dependency of total scores Tough to compare people and items Equal item weighting No good way to account for guessing
63 Trait Level vs. Prob. Correct Response (figure: probability of a correct response plotted against θ, examinee trait level)
64 An Item Characteristic Curve (figure: an S-shaped curve of probability against θ)
65 Sample Independent Same Curve (figure: the same item characteristic curve, regardless of the examinee sample)
66 Item Response Theory Directly models the probability of a candidate getting an item correct based on their overall level on the construct and item characteristics: P_i(θ) = c_i + (1 − c_i) / (1 + e^(−a_i(θ − b_i))) θ is the person's level on the construct a_i, b_i, and c_i are item parameters corresponding to the item's discrimination, difficulty, and guessing likelihood
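A sketch of the item response function with the a (discrimination), b (difficulty), and c (guessing) parameters, assuming the standard three-parameter logistic (3PL) form those parameters imply:

```python
import math

def p_correct(theta, a, b, c):
    """3PL item response function: probability of a correct response,
    given trait level theta, discrimination a, difficulty b, and
    pseudo-guessing lower asymptote c."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# At theta == b the probability sits halfway between c and 1;
# as theta falls far below b, the probability approaches c.
```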
67 Item Difficulty (figure: probability vs. θ, examinee trait level)
68 Item Difficulty (figure: probability vs. θ)
69 Item Discrimination (figure: probability vs. θ)
70 Accounting for Guessing (figure: probability vs. θ)
71 Putting it all Together (figure: probability vs. θ)
72 Test Characteristic Curve (TCC) Describes relationship between total test score and examinee trait level (θ) TCC is obtained by adding item characteristic curves across all values of θ Each test has its own TCC
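Summing item curves into a TCC can be sketched as follows (the 3PL curve form and the toy item parameters are assumptions; the summing-across-items step is exactly what the slide describes):

```python
import math

def p_correct(theta, a, b, c):
    """3PL item characteristic curve (assumed form)."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected total score at trait level
    theta, the sum of the item curves over all items on the form."""
    return sum(p_correct(theta, a, b, c) for a, b, c in items)

# Toy three-item form: (a, b, c) per item
form = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 1.0, 0.2)]
```

The curve rises monotonically from the sum of the guessing parameters toward the number of items, so each test form traces out its own TCC.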
73 Test Characteristic Curve A form with slightly easier items will shift the TCC to the left, requiring the examinee to answer a greater number of items correctly in order to pass
74 Test Characteristic Curve A form with slightly harder items will shift the TCC to the right, requiring the examinee to answer a smaller number of items correctly in order to pass
75 3 Hypothetical TCCs (figure: projected test score vs. θ for three forms: Easier on top, Anchor in the middle, Harder on the bottom) IRT is also independent of characteristics of the specific test form
76 IRT Summary Although dealing with raw scores is conceptually appealing, it is problematic in practice IRT overcomes many of these problems IRT difficulty and person trait estimates are scaled together Item and person parameters are properties of the items and people, and do not change across samples or test forms. Majority of programs use IRT scoring and linearly transform θ to scale of interest
77 SECTION IV Test Theory Classical Test Theory Item Response Theory Questions?
78 Thank you For more information, please contact Jim Wollack, University of Wisconsin–Madison
Doing Quantitative Research 26E02900, 6 ECTS Lecture 6: Structural Equations Modeling Olli-Pekka Kauppila Daria Kautto Session VI, September 20 2017 Learning objectives 1. Get familiar with the basic idea
More informationChapter 9: Intelligence and Psychological Testing
Chapter 9: Intelligence and Psychological Testing Intelligence At least two major "consensus" definitions of intelligence have been proposed. First, from Intelligence: Knowns and Unknowns, a report of
More informationBruno D. Zumbo, Ph.D. University of Northern British Columbia
Bruno Zumbo 1 The Effect of DIF and Impact on Classical Test Statistics: Undetected DIF and Impact, and the Reliability and Interpretability of Scores from a Language Proficiency Test Bruno D. Zumbo, Ph.D.
More informationMCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2
MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and Lord Equating Methods 1,2 Lisa A. Keller, Ronald K. Hambleton, Pauline Parker, Jenna Copella University of Massachusetts
More informationSamantha Sample 01 Feb 2013 EXPERT STANDARD REPORT ABILITY ADAPT-G ADAPTIVE GENERAL REASONING TEST. Psychometrics Ltd.
01 Feb 2013 EXPERT STANDARD REPORT ADAPTIVE GENERAL REASONING TEST ABILITY ADAPT-G REPORT STRUCTURE The Standard Report presents s results in the following sections: 1. Guide to Using This Report Introduction
More informationSLEEP DISTURBANCE ABOUT SLEEP DISTURBANCE INTRODUCTION TO ASSESSMENT OPTIONS. 6/27/2018 PROMIS Sleep Disturbance Page 1
SLEEP DISTURBANCE A brief guide to the PROMIS Sleep Disturbance instruments: ADULT PROMIS Item Bank v1.0 Sleep Disturbance PROMIS Short Form v1.0 Sleep Disturbance 4a PROMIS Short Form v1.0 Sleep Disturbance
More informationIntelligence. Exam 3. iclicker. My Brilliant Brain. What is Intelligence? Conceptual Difficulties. Chapter 10
Exam 3 iclicker Mean: 32.8 Median: 33 Mode: 33 SD = 6.4 How many of you have one? Do you think it would be a good addition for this course in the future? Top Score: 49 Top Cumulative Score to date: 144
More informationOn Test Scores (Part 2) How to Properly Use Test Scores in Secondary Analyses. Structural Equation Modeling Lecture #12 April 29, 2015
On Test Scores (Part 2) How to Properly Use Test Scores in Secondary Analyses Structural Equation Modeling Lecture #12 April 29, 2015 PRE 906, SEM: On Test Scores #2--The Proper Use of Scores Today s Class:
More informationIntroduction to the HBDI and the Whole Brain Model. Technical Overview & Validity Evidence
Introduction to the HBDI and the Whole Brain Model Technical Overview & Validity Evidence UPDATED 2016 OVERVIEW The theory on which the Whole Brain Model and the Herrmann Brain Dominance Instrument (HBDI
More information2016 Technical Report National Board Dental Hygiene Examination
2016 Technical Report National Board Dental Hygiene Examination 2017 Joint Commission on National Dental Examinations All rights reserved. 211 East Chicago Avenue Chicago, Illinois 60611-2637 800.232.1694
More informationThe Psychometric Principles Maximizing the quality of assessment
Summer School 2009 Psychometric Principles Professor John Rust University of Cambridge The Psychometric Principles Maximizing the quality of assessment Reliability Validity Standardisation Equivalence
More informationReliability, validity, and all that jazz
Reliability, validity, and all that jazz Dylan Wiliam King s College London Introduction No measuring instrument is perfect. The most obvious problems relate to reliability. If we use a thermometer to
More informationChapter 12. The One- Sample
Chapter 12 The One- Sample z-test Objective We are going to learn to make decisions about a population parameter based on sample information. Lesson 12.1. Testing a Two- Tailed Hypothesis Example 1: Let's
More informationChapter 1 Applications and Consequences of Psychological Testing
Chapter 1 Applications and Consequences of Psychological Testing Topic 1A The Nature and Uses of Psychological Testing The Consequences of Testing From birth to old age, people encounter tests at all most
More informationValidity and reliability of measurements
Validity and reliability of measurements 2 Validity and reliability of measurements 4 5 Components in a dataset Why bother (examples from research) What is reliability? What is validity? How should I treat
More informationPÄIVI KARHU THE THEORY OF MEASUREMENT
PÄIVI KARHU THE THEORY OF MEASUREMENT AGENDA 1. Quality of Measurement a) Validity Definition and Types of validity Assessment of validity Threats of Validity b) Reliability True Score Theory Definition
More informationVariables in Research. What We Will Cover in This Section. What Does Variable Mean?
Variables in Research 9/20/2005 P767 Variables in Research 1 What We Will Cover in This Section Nature of variables. Measuring variables. Reliability. Validity. Measurement Modes. Issues. 9/20/2005 P767
More informationTHE NATURE OF OBJECTIVITY WITH THE RASCH MODEL
JOURNAL OF EDUCATIONAL MEASUREMENT VOL. II, NO, 2 FALL 1974 THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL SUSAN E. WHITELY' AND RENE V. DAWIS 2 University of Minnesota Although it has been claimed that
More informationItem Response Theory. Steven P. Reise University of California, U.S.A. Unidimensional IRT Models for Dichotomous Item Responses
Item Response Theory Steven P. Reise University of California, U.S.A. Item response theory (IRT), or modern measurement theory, provides alternatives to classical test theory (CTT) methods for the construction,
More informationIDENTIFYING DATA CONDITIONS TO ENHANCE SUBSCALE SCORE ACCURACY BASED ON VARIOUS PSYCHOMETRIC MODELS
IDENTIFYING DATA CONDITIONS TO ENHANCE SUBSCALE SCORE ACCURACY BASED ON VARIOUS PSYCHOMETRIC MODELS A Dissertation Presented to The Academic Faculty by HeaWon Jun In Partial Fulfillment of the Requirements
More informationReliability and Validity
Reliability and Today s Objectives Understand the difference between reliability and validity Understand how to develop valid indicators of a concept Reliability and Reliability How accurate or consistent
More information6. Assessment. 3. Skew This is the degree to which a distribution of scores is not normally distributed. Positive skew
6. Assessment 1. Measurement: general process of determining the dimensions of an attribute or trait. Assessment: Processes and procedures for collecting information about human behavior. Assessment tools:
More informationValidity refers to the accuracy of a measure. A measurement is valid when it measures what it is suppose to measure and performs the functions that
Validity refers to the accuracy of a measure. A measurement is valid when it measures what it is suppose to measure and performs the functions that it purports to perform. Does an indicator accurately
More informationAssociate Prof. Dr Anne Yee. Dr Mahmoud Danaee
Associate Prof. Dr Anne Yee Dr Mahmoud Danaee 1 2 What does this resemble? Rorschach test At the end of the test, the tester says you need therapy or you can't work for this company 3 Psychological Testing
More informationDATA GATHERING. Define : Is a process of collecting data from sample, so as for testing & analyzing before reporting research findings.
DATA GATHERING Define : Is a process of collecting data from sample, so as for testing & analyzing before reporting research findings. 2012 John Wiley & Sons Ltd. Measurement Measurement: the assignment
More informationChapter -6 Reliability and Validity of the test Test - Retest Method Rational Equivalence Method Split-Half Method
Chapter -6 Reliability and Validity of the test 6.1 Introduction 6.2 Reliability of the test 6.2.1 Test - Retest Method 6.2.2 Rational Equivalence Method 6.2.3 Split-Half Method 6.3 Validity of the test
More informationAP PSYCH Unit 11.2 Assessing Intelligence
AP PSYCH Unit 11.2 Assessing Intelligence Review - What is Intelligence? Mental quality involving skill at information processing, learning from experience, problem solving, and adapting to new or changing
More informationReliability Theory for Total Test Scores. Measurement Methods Lecture 7 2/27/2007
Reliability Theory for Total Test Scores Measurement Methods Lecture 7 2/27/2007 Today s Class Reliability theory True score model Applications of the model Lecture 7 Psych 892 2 Great Moments in Measurement
More informationThe Current State of Our Education
1 The Current State of Our Education 2 Quantitative Research School of Management www.ramayah.com Mental Challenge A man and his son are involved in an automobile accident. The man is killed and the boy,
More informationITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION SCALE
California State University, San Bernardino CSUSB ScholarWorks Electronic Theses, Projects, and Dissertations Office of Graduate Studies 6-2016 ITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION
More information2 Types of psychological tests and their validity, precision and standards
2 Types of psychological tests and their validity, precision and standards Tests are usually classified in objective or projective, according to Pasquali (2008). In case of projective tests, a person is
More information11-3. Learning Objectives
11-1 Measurement Learning Objectives 11-3 Understand... The distinction between measuring objects, properties, and indicants of properties. The similarities and differences between the four scale types
More informationPsychological testing
Psychological testing Lecture 12 Mikołaj Winiewski, PhD Test Construction Strategies Content validation Empirical Criterion Factor Analysis Mixed approach (all of the above) Content Validation Defining
More informationA Comparison of Several Goodness-of-Fit Statistics
A Comparison of Several Goodness-of-Fit Statistics Robert L. McKinley The University of Toledo Craig N. Mills Educational Testing Service A study was conducted to evaluate four goodnessof-fit procedures
More informationTHE ROLE OF PSYCHOMETRIC ENTRANCE TEST IN ADMISSION PROCESSES FOR NON-SELECTIVE ACADEMIC DEPARTMENTS: STUDY CASE IN YEZREEL VALLEY COLLEGE
THE ROLE OF PSYCHOMETRIC ENTRANCE TEST IN ADMISSION PROCESSES FOR NON-SELECTIVE ACADEMIC DEPARTMENTS: STUDY CASE IN YEZREEL VALLEY COLLEGE Tal Shahor The Academic College of Emek Yezreel Emek Yezreel 19300,
More informationUnderlying Theory & Basic Issues
Underlying Theory & Basic Issues Dewayne E Perry ENS 623 Perry@ece.utexas.edu 1 All Too True 2 Validity In software engineering, we worry about various issues: E-Type systems: Usefulness is it doing what
More informationRegression Discontinuity Analysis
Regression Discontinuity Analysis A researcher wants to determine whether tutoring underachieving middle school students improves their math grades. Another wonders whether providing financial aid to low-income
More informationChapter 4: Defining and Measuring Variables
Chapter 4: Defining and Measuring Variables A. LEARNING OUTCOMES. After studying this chapter students should be able to: Distinguish between qualitative and quantitative, discrete and continuous, and
More information