Models in Educational Measurement
1 Models in Educational Measurement. Jan-Eric Gustafsson, Department of Education and Special Education, University of Gothenburg
2 Background Measurement in education and psychology has increasingly come to rely on explicitly formulated statistical models. Statistical models offer power, precision and flexibility by focusing on a few essential aspects of the phenomenon under study. But if the model assumptions do not agree with the phenomenon, inferences may be incorrect. Therefore, issues of model fit are centrally important. However, it is not always easy to determine whether a model fits the data, or what the consequences of model misfit are.
3 The Rasch model According to the Rasch model, the probability of a correct response to a test item is a function of the ability of the person and of the difficulty of the item. As the ability of the person increases, the probability of a correct answer increases. As the difficulty of the item increases, the probability of a correct answer decreases. Under certain assumptions, it is possible to estimate the difficulty of items independently of the ability of persons, and to estimate the ability of persons independently of the difficulty of the items.
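The relation described on this slide can be made concrete with the usual logistic form of the Rasch model, in which ability and difficulty are placed on the same logit scale. The following is a minimal sketch; the function name `rasch_probability` and the example values are illustrative assumptions, not taken from the presentation.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the Rasch model.

    Both arguments are on the same logit scale; the function name and
    signature are illustrative assumptions.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# A person whose ability equals the item difficulty has an even chance.
p_equal = rasch_probability(0.0, 0.0)

# Higher ability raises the probability; higher difficulty lowers it.
p_more_able = rasch_probability(1.0, 0.0)
p_harder_item = rasch_probability(0.0, 1.0)
```

Only the difference between ability and difficulty enters the formula, which is what makes it possible to compare persons who have answered different items.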
4 The Rasch model, cont. Once the difficulty parameters of a set of items have been estimated, the items can be used to construct different tests which all measure the same ability. Different test-takers can be given different items to measure one and the same ability. This offers great advantages for solving practical measurement problems, such as adaptive testing, horizontal and vertical linking of tests, and constructing matrix-sampling test designs.
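As an illustration of the adaptive-testing application mentioned above, one common selection rule is to give the test-taker the unused item that is most informative at the current ability estimate. Under the Rasch model the item information is p(1 - p), which is largest when the item difficulty is closest to the ability. The sketch below assumes that the item difficulties have already been estimated; the function names and the example difficulties are hypothetical.

```python
import math


def item_information(ability: float, difficulty: float) -> float:
    """Fisher information of a Rasch item at a given ability: p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-(ability - difficulty)))
    return p * (1.0 - p)


def pick_next_item(ability_estimate, difficulties, administered):
    """Return the index of the most informative item not yet administered.

    `difficulties` is a list of previously estimated item difficulties and
    `administered` is a set of item indices already used (both hypothetical).
    """
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    return max(
        candidates,
        key=lambda i: item_information(ability_estimate, difficulties[i]),
    )


difficulties = [-1.0, 0.2, 1.5]
first_choice = pick_next_item(0.3, difficulties, administered=set())
second_choice = pick_next_item(0.3, difficulties, administered={first_choice})
```

Operational adaptive tests add further constraints, such as content balancing and exposure control, which this sketch leaves out.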
5 Does the model fit data? The Rasch model is attractive because, if it fits the data, the properties of the model guarantee simple and powerful solutions to practical measurement problems. Assumptions: unidimensionality, and homogeneous discrimination of all items. These assumptions may be tested with statistical tests constructed within the framework of the Rasch model.
6 Gustafsson, J. E. (1980). Testing and obtaining fit of data to the Rasch model. British Journal of Mathematical and Statistical Psychology, 33(2). Two categories of statistical tests: ICC-tests investigate if item parameters are invariant across subsets of persons (e.g., high scorers vs. low scorers; boys vs. girls). PCC-tests investigate if ability parameters are invariant across subsets of items. Some results: ICC-tests do not detect multidimensionality. Muthén (1978) analyzed 15 locus of control items with a new method for factor analysis of dichotomous variables and found three weakly correlated factors. However, according to an ICC-test the Rasch model had good fit to these data. The PCC-test supported the conclusion that there were three separate dimensions. The statistical power of the ICC-tests is strongly dependent on sample size and on the heterogeneity of the sample. The Rasch model does not fit speeded tests or tests which allow guessing.
7 Gustafsson, J. E. & Lindblad, T. (1978). The Rasch model for dichotomous items: A solution of the conditional estimation problem for long tests and some thoughts about item screening procedures. Paper presented at the European Conference on Psychometrics and Mathematical Psychology, Uppsala, June 15-17. The Rasch model was used to analyze a test of English grammar for Swedish students. The model had poor fit, which was primarily due to a set of items measuring knowledge of irregular verbs having too high discrimination. In separate analyses, good fit was found for the irregular verb items, as well as for the other items, after some poorly constructed items had been excluded.
8 What to do when model fit is poor? Exclude the offending items: this would have caused unacceptable construct underrepresentation. It also would have been illogical, because too high or too low discrimination typically is not an intrinsic characteristic of an item, but depends on whether the other items have similar discrimination. Put the problematic items in a separate scale: this would have been impractical, unless we aimed to differentiate between different domains of English grammar, and to do that reliably we would need more items testing irregular verbs. Turn to another, less restrictive, model, such as Verhelst's OPLM, which is a Rasch model that allows different but fixed discrimination parameters: this model had not been developed at the time. Keep the items in the test and accept the poor fit: this could imply loss of credibility.
9 Robustness George Box: "Essentially, all models are wrong, but some are useful." Many applications of the Rasch model and other IRT models, such as TIMSS and PISA, define both an overall score and subscores for different domains or processes. This must be a violation of the unidimensionality assumption. Still, this practice seems meaningful and useful. Coefficient α is often described as not being based on any strict assumptions, but the formula is in fact based on the same assumptions as the Rasch model: unidimensionality and homogeneous item discrimination. If the assumptions are violated, α underestimates reliability. However, even in the presence of large variation in item discrimination, the underestimation is marginal (e.g., Reuterberg & Gustafsson, 1992).
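For reference, coefficient α is computed from the number of items k, the item variances and the variance of the total score: α = k/(k - 1) · (1 - Σ item variances / total-score variance). The sketch below implements this standard formula; the function name and the toy data are illustrative assumptions and are not taken from the studies cited on the slide.

```python
def cronbach_alpha(scores):
    """Coefficient alpha for a persons-by-items score matrix.

    `scores` is a list of rows, one per person, each row a list of item
    scores. Sample variances (n - 1 in the denominator) are used throughout.
    """

    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    k = len(scores[0])
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


# Toy data (hypothetical): three items that order four persons identically.
toy_scores = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
alpha = cronbach_alpha(toy_scores)
```

With perfectly consistent items, as in the toy data, α reaches its maximum of 1; with real data it is lower.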
10 Some conclusions It is difficult to assess the fit of the Rasch model. It is even more difficult to develop well-fitting models. There is a risk of conflict between the model requirements and the validity of the test. Use of the model needs to rely on trust in robustness. The capability of the Rasch model to deal with issues of multidimensionality is limited.
11 Dimensionality of cognitive abilities Factor analysis was invented to investigate the dimensionality of variables. Spearman invented factor analysis to test the hypothesis that individual differences in cognition can be captured by a g-factor. Thurstone invented exploratory factor analysis and demonstrated that there are seven primary mental abilities. Followers of Thurstone extended this number to at least 100 primary abilities. Cattell applied factor analysis to correlations among factors to identify second- and third-order factors, and introduced the distinction between Fluid Intelligence (Gf) and Crystallized Intelligence (Gc). Jöreskog developed confirmatory factor analysis and structural equation modeling, allowing flexible and powerful building and testing of latent variable models.
12 Gustafsson, J. E. (1984). A unifying model for the structure of intellectual abilities. Intelligence, 8(3).
13 Does the model hold? The g = Gf relation was replicated in several studies, but far from all. Carroll's (1993) meta-analysis did not replicate the perfect relation, but it showed that Gf was the broad ability most highly related to g.
14 Valentin Kvist, A., & Gustafsson, J. E. (2008). The relation between fluid intelligence and the general factor as a function of cultural background: A test of Cattell's investment theory. Intelligence, 36(5). The correlation between g and Gf was .83 in a heterogeneous group of adults, but correlations of ... were found within each of the three sub-groups: non-immigrants, European immigrants, and non-European immigrants. Explanation: Gf is a determinant of learning in all domains, along with motivation, effort and opportunity to learn. When everyone has equal opportunity to learn, Gf influences learning and development in all domains, and so it becomes a general factor. If subgroups of persons have had different opportunities to learn certain domains, the generality of Gf breaks down. These results provide general support for Cattell's Investment theory.
15 The Investment theory The Investment theory basically says that Gf is a causal factor in the development of individual differences in learning. If we knew the mechanisms through which Gf influences the development of fundamental skills such as decoding and vocabulary, we would have a better basis for educational interventions.
16 Methodological aspects of the hierarchical model The standard view of measurement implies that the phenomenon can be described in terms of a set of correlated dimensions, each of which is unidimensional. However, some constructs are broad and encompass a very wide range of phenomena (e.g., g), others are broad and encompass wide domains of phenomena (e.g., Gc), while other constructs are narrow and encompass a more limited range of phenomena (e.g., knowledge of irregular verbs). The constructs differ in referent generality. Imposing the unidimensionality requirement has implied a focus on constructs with narrow referent generality. Typically the consequence has been that broad constructs have been splintered into more and more narrow constructs, as happened in the research on cognitive abilities.
17 Gustafsson, J.E. (2002). Measurement from a hierarchical point of view. In H. I. Braun, D. N. Jackson, & D. E. Wiley (Eds.), The role of constructs in psychological and educational measurement (pp ). London: Lawrence Erlbaum Associates, Publishers. Three propositions: To measure constructs with high referent generality it is necessary to use heterogeneous measurement devices. A homogeneous test always measures several dimensions. To measure constructs with low referent generality it is also necessary to measure constructs with high generality.
18 Measurement from a hierarchical point of view, cont. The principle of aggregation: Aggregation causes the general factor to account for a larger proportion of variance in the sum of scores than it does in each observed measure. Each observed variable is complex, but aggregate scores may be essentially unidimensional. Aggregation over broad domains of performance is a way to approximate unidimensionality, so that robust use of the Rasch model and other IRT models may still be possible.
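The principle of aggregation can be illustrated with a small calculation under simplifying assumptions that are not part of the slide: every observed measure is standardized, has the same loading on the general factor, and its remaining variance is uncorrelated with that of the other measures. In a sum of n such measures the general-factor variance grows as (n · loading)², while the remaining variance grows only as n · (1 - loading²). The function name below is hypothetical.

```python
def general_factor_share(loading: float, n_measures: int) -> float:
    """Proportion of variance in a sum score due to the general factor.

    Assumes standardized measures with equal loadings on the general factor
    and mutually uncorrelated residuals (a deliberately simple model).
    """
    common = (n_measures * loading) ** 2
    residual = n_measures * (1.0 - loading ** 2)
    return common / (common + residual)


# One measure with loading 0.5: the general factor explains 25 % of variance.
share_single = general_factor_share(0.5, 1)

# The sum of ten such measures is dominated by the general factor.
share_ten = general_factor_share(0.5, 10)
```

Under these assumptions the share rises from 0.25 for a single measure to roughly 0.77 for the sum of ten, which is the sense in which an aggregate of complex variables can be essentially unidimensional.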
19 Grading in Sweden In Sweden grades have always been high-stakes, because they have been the primary instrument for eligibility and selection to the next level of the educational system. Teachers have always been trusted to grade their students. Exams were abolished in the 1960s, and standardized testing has traditionally had a comparatively limited role. Up until the mid-1990s the grading system was norm-referenced, but in 1998 a criterion-referenced grading system was introduced.
20 The norm-referenced grading system The norm-referenced grading system was developed in the 1940s by Frits Wigforss (SOU 1942:11), after it had been observed that the grades used for admission to grammar school (realskola) severely lacked comparability across schools and teachers. The proposed system specified that grades should be normally distributed in the population, with a specified percentage of the students at each step of the grading scale. So-called standard tests were developed to guide the teachers' grading at the class level. With the introduction of the comprehensive school (grundskola) in 1962, a five-step grading scale (1-5) was introduced, without any pass level.
21 Critique of the norm-referenced grading system The norm-referenced grading system was criticized on many grounds. It inspired competition rather than cooperation. It was unfair to students in different classes ("There are no 5s left"). Because the grade distribution was specified to be Normal (3,1) in the population, the grades could not be used to describe change in levels of knowledge and skills. Along with a curriculum reform in 1994, the norm-referenced grades were abolished, and a criterion-referenced system was introduced.
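The Normal (3,1) specification mentioned above fixes the share of students at each step of the five-step scale. The sketch below computes those shares under the assumption, not stated on the slide, that the boundaries between adjacent grades lie halfway between them (1.5, 2.5, 3.5 and 4.5); the function names are hypothetical.

```python
import math


def normal_cdf(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def grade_percentages(mean=3.0, sd=1.0, boundaries=(1.5, 2.5, 3.5, 4.5)):
    """Percentage of the population at each grade step.

    The boundary values are an assumption made for illustration.
    """
    edges = [-math.inf, *boundaries, math.inf]
    return [
        100.0 * (normal_cdf(upper, mean, sd) - normal_cdf(lower, mean, sd))
        for lower, upper in zip(edges, edges[1:])
    ]


percentages = [round(p) for p in grade_percentages()]
```

With these boundaries the rounded shares for grades 1 to 5 are 7, 24, 38, 24 and 7 percent. Because the distribution is fixed in advance, the average grade in the population stays at 3 whatever happens to the actual level of knowledge, which is the point of the critique above.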
22 The criterion-referenced grading system The system which was first put to use in 1998 had a scale with 4 steps: Pass with Special Distinction (MVG), Pass with Distinction (VG), Pass (G) and Fail (IG). In 2011 a new scale with six steps was introduced (A-F), where F = fail. According to the original plans, the proportion of failed students was expected to be a few percent, but the first results showed the percentage of failed students to be much higher (9 %). It has since increased to 14 %. The grading is guided by verbally formulated knowledge requirements for the different steps of the grading scale.
23 Knowledge requirements (partial) for grades /E/C/A/ at the end of year 9 Grade E: Pupils can choose and use basically functional mathematical methods with some adaptation to the context in order to make calculations and solve routine tasks in arithmetic, algebra, geometry, probability, statistics, and also relationships and change with satisfactory results. Grade C: Pupils can choose and use appropriate mathematical methods with relatively good adaptation to the context in order to make calculations and solve routine tasks in arithmetic, algebra, geometry, probability, statistics, and also relationships and change with good results. Grade A: Pupils can choose and use appropriate and effective mathematical methods with good adaptation to the context in order to make calculations and solve routine tasks in arithmetic, algebra, geometry, probability, statistics, and also relationships and change with very good results.
24 Gustafsson, J.-E., Cliffordson, C., & Erickson, G. (2014). Likvärdig kunskapsbedömning i och av den svenska skolan - problem och möjligheter [Equitable knowledge assessment in and of the Swedish school - problems and possibilities]. Stockholm: SNS Förlag. Substantial grade inflation, particularly in upper secondary school. Considerable variation in grading practices among teachers and schools. Instability in the national tests across years and subjects. These problems, along with several others, seem to be due to the lack of precision in the verbally formulated knowledge requirements for the different steps on the grading scale. Wigforss (1942) concluded that it is not possible to achieve sufficient comparability in grading based on verbally formulated criteria, which was why he developed the norm-referenced grading system.
25 Olsen, R.V., & Nilsen, T. (in press). Standard setting in PISA and TIMSS. In S. Blömeke & J.E. Gustafsson (Eds.), Standard Setting in Education - The Nordic Countries in an International Perspective. New York: Springer Publishing. The authors compare and discuss similarities and differences in the way PISA and TIMSS set and formulate descriptions of standards or do scale anchoring (International Benchmarks in TIMSS based on a curriculum model; Proficiency Levels in PISA based on a competence model). Focus on the empirical basis for development of performance-level descriptors. Their interest in standard setting and performance-level descriptions was partially driven by the fact that the Norwegian grading system has problems of comparability in grading.
26 A generic example of an item map (from Olsen & Nilsen, in press) Decide on the number and location of cut-scores to be used. Develop Performance-Level Descriptors (PLDs) based on descriptions of the clusters of items identified and on the general description of the construct stated in the framework. This typically requires lots of items, given that the PLDs should not be formulated in item-specific terms.
27 TIMSS International Benchmarks (partial) Low (400): Students have some knowledge of whole numbers and decimals, operations, and basic graphs. Intermediate (475): Students can solve problems involving decimals, fractions, proportions, and percentages in a variety of settings. For example, they can determine proportions of a whole in order to construct pie charts and calculate unit prices to solve a problem. High (550): Students can use information from several sources to solve problems involving different types of numbers and operations. Students can relate fractions, decimals, and percents to each other. They can solve problems with fractions, proportions, and percentages. Students show understanding of whole number exponents. They can identify the prime factorization of a given number. Advanced (625): Students can solve a variety of fraction, proportion, and percent problems and justify their conclusions. They can reason with different types of numbers, including whole numbers, negative numbers, fractions, and percentages in abstract and non-routine situations. For example, given two points on a number line representing unspecified fractions, students can identify the point that represents their product.
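The four benchmark scores listed above act as cut-scores on the TIMSS scale. A minimal sketch of how a scale score could be mapped to the highest benchmark reached follows; it assumes that a score equal to a cut-score counts as reaching that benchmark, and the label "Below Low" and the function name are hypothetical additions for scores under 400.

```python
from bisect import bisect_right

# Cut-scores from the slide; "Below Low" is a hypothetical label.
CUT_SCORES = [400, 475, 550, 625]
LABELS = ["Below Low", "Low", "Intermediate", "High", "Advanced"]


def benchmark_level(scale_score: float) -> str:
    """Return the highest International Benchmark reached by a scale score."""
    return LABELS[bisect_right(CUT_SCORES, scale_score)]


levels = [benchmark_level(s) for s in (399, 400, 480, 625)]
```

The descriptors attached to each level are what turn such a classification into a statement about what students at that level can typically do.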
28 PLDs and grading Empirically based PLDs could potentially provide a more stable foundation to support classroom-based assessment and criterion-referenced grading than the currently used knowledge requirements, if formulated at an appropriate level of abstraction. Linking national tests to PISA, TIMSS and the other international studies could provide a broader basis for constructing PLDs. Dimensionality?
29 Individual differences versus development of competence The PLDs could, perhaps, be developed into empirically and theoretically based descriptions of learning trajectories, which could inform curricula, instruction and assessment. The classical measurement models focus on individual differences: the notions of dimensionality, discrimination, reliability and validity are defined in terms of variance and covariance, and their application requires that the population of persons is defined. These models have limited applicability for the study of individual growth. The main aim of education is to support development of competence, so in educational measurement issues of development should be a central concern. The tensions between differential and developmental psychology illustrate the difficulty of integrating research on individual differences and development. However, progress has lately been made through growth curve modeling and applications of IRT to solve measurement problems. Hopefully we will see more integration of differential and developmental approaches in the future.
Basic concepts and principles of classical test theory
Basic concepts and principles of classical test theory Jan-Eric Gustafsson What is measurement? Assignment of numbers to aspects of individuals according to some rule. The aspect which is measured must
More informationInvestigating the Invariance of Person Parameter Estimates Based on Classical Test and Item Response Theories
Kamla-Raj 010 Int J Edu Sci, (): 107-113 (010) Investigating the Invariance of Person Parameter Estimates Based on Classical Test and Item Response Theories O.O. Adedoyin Department of Educational Foundations,
More informationEmpowered by Psychometrics The Fundamentals of Psychometrics. Jim Wollack University of Wisconsin Madison
Empowered by Psychometrics The Fundamentals of Psychometrics Jim Wollack University of Wisconsin Madison Psycho-what? Psychometrics is the field of study concerned with the measurement of mental and psychological
More informationComprehensive Statistical Analysis of a Mathematics Placement Test
Comprehensive Statistical Analysis of a Mathematics Placement Test Robert J. Hall Department of Educational Psychology Texas A&M University, USA (bobhall@tamu.edu) Eunju Jung Department of Educational
More informationHaving your cake and eating it too: multiple dimensions and a composite
Having your cake and eating it too: multiple dimensions and a composite Perman Gochyyev and Mark Wilson UC Berkeley BEAR Seminar October, 2018 outline Motivating example Different modeling approaches Composite
More informationINVESTIGATING FIT WITH THE RASCH MODEL. Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form
INVESTIGATING FIT WITH THE RASCH MODEL Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form of multidimensionality. The settings in which measurement
More informationDevelopment, Standardization and Application of
American Journal of Educational Research, 2018, Vol. 6, No. 3, 238-257 Available online at http://pubs.sciepub.com/education/6/3/11 Science and Education Publishing DOI:10.12691/education-6-3-11 Development,
More informationMeasurement and Descriptive Statistics. Katie Rommel-Esham Education 604
Measurement and Descriptive Statistics Katie Rommel-Esham Education 604 Frequency Distributions Frequency table # grad courses taken f 3 or fewer 5 4-6 3 7-9 2 10 or more 4 Pictorial Representations Frequency
More informationCEMO RESEARCH PROGRAM
1 CEMO RESEARCH PROGRAM Methodological Challenges in Educational Measurement CEMO s primary goal is to conduct basic and applied research seeking to generate new knowledge in the field of educational measurement.
More informationOn indirect measurement of health based on survey data. Responses to health related questions (items) Y 1,..,Y k A unidimensional latent health state
On indirect measurement of health based on survey data Responses to health related questions (items) Y 1,..,Y k A unidimensional latent health state A scaling model: P(Y 1,..,Y k ;α, ) α = item difficulties
More informationContents. What is item analysis in general? Psy 427 Cal State Northridge Andrew Ainsworth, PhD
Psy 427 Cal State Northridge Andrew Ainsworth, PhD Contents Item Analysis in General Classical Test Theory Item Response Theory Basics Item Response Functions Item Information Functions Invariance IRT
More informationTechnical Specifications
Technical Specifications In order to provide summary information across a set of exercises, all tests must employ some form of scoring models. The most familiar of these scoring models is the one typically
More informationPaul Irwing, Manchester Business School
Paul Irwing, Manchester Business School Factor analysis has been the prime statistical technique for the development of structural theories in social science, such as the hierarchical factor model of human
More informationItem Response Theory. Steven P. Reise University of California, U.S.A. Unidimensional IRT Models for Dichotomous Item Responses
Item Response Theory Steven P. Reise University of California, U.S.A. Item response theory (IRT), or modern measurement theory, provides alternatives to classical test theory (CTT) methods for the construction,
More informationBy Hui Bian Office for Faculty Excellence
By Hui Bian Office for Faculty Excellence 1 Email: bianh@ecu.edu Phone: 328-5428 Location: 1001 Joyner Library, room 1006 Office hours: 8:00am-5:00pm, Monday-Friday 2 Educational tests and regular surveys
More informationlinking in educational measurement: Taking differential motivation into account 1
Selecting a data collection design for linking in educational measurement: Taking differential motivation into account 1 Abstract In educational measurement, multiple test forms are often constructed to
More informationRe-Examining the Role of Individual Differences in Educational Assessment
Re-Examining the Role of Individual Differences in Educational Assesent Rebecca Kopriva David Wiley Phoebe Winter University of Maryland College Park Paper presented at the Annual Conference of the National
More informationFundamental Concepts for Using Diagnostic Classification Models. Section #2 NCME 2016 Training Session. NCME 2016 Training Session: Section 2
Fundamental Concepts for Using Diagnostic Classification Models Section #2 NCME 2016 Training Session NCME 2016 Training Session: Section 2 Lecture Overview Nature of attributes What s in a name? Grain
More informationAndré Cyr and Alexander Davies
Item Response Theory and Latent variable modeling for surveys with complex sampling design The case of the National Longitudinal Survey of Children and Youth in Canada Background André Cyr and Alexander
More informationalternate-form reliability The degree to which two or more versions of the same test correlate with one another. In clinical studies in which a given function is going to be tested more than once over
More informationTHE NATURE OF OBJECTIVITY WITH THE RASCH MODEL
JOURNAL OF EDUCATIONAL MEASUREMENT VOL. II, NO, 2 FALL 1974 THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL SUSAN E. WHITELY' AND RENE V. DAWIS 2 University of Minnesota Although it has been claimed that
More informationMeasurement Invariance (MI): a general overview
Measurement Invariance (MI): a general overview Eric Duku Offord Centre for Child Studies 21 January 2015 Plan Background What is Measurement Invariance Methodology to test MI Challenges with post-hoc
More informationResults & Statistics: Description and Correlation. I. Scales of Measurement A Review
Results & Statistics: Description and Correlation The description and presentation of results involves a number of topics. These include scales of measurement, descriptive statistics used to summarize
More informationDoes factor indeterminacy matter in multi-dimensional item response theory?
ABSTRACT Paper 957-2017 Does factor indeterminacy matter in multi-dimensional item response theory? Chong Ho Yu, Ph.D., Azusa Pacific University This paper aims to illustrate proper applications of multi-dimensional
More informationMeasurement Issues in Concussion Testing
EVIDENCE-BASED MEDICINE Michael G. Dolan, MA, ATC, CSCS, Column Editor Measurement Issues in Concussion Testing Brian G. Ragan, PhD, ATC University of Northern Iowa Minsoo Kang, PhD Middle Tennessee State
More informationExamining the Psychometric Properties of The McQuaig Occupational Test
Examining the Psychometric Properties of The McQuaig Occupational Test Prepared for: The McQuaig Institute of Executive Development Ltd., Toronto, Canada Prepared by: Henryk Krajewski, Ph.D., Senior Consultant,
More informationThe Classification Accuracy of Measurement Decision Theory. Lawrence Rudner University of Maryland
Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, April 23-25, 2003 The Classification Accuracy of Measurement Decision Theory Lawrence Rudner University
More informationAPPLYING THE RASCH MODEL TO PSYCHO-SOCIAL MEASUREMENT A PRACTICAL APPROACH
APPLYING THE RASCH MODEL TO PSYCHO-SOCIAL MEASUREMENT A PRACTICAL APPROACH Margaret Wu & Ray Adams Documents supplied on behalf of the authors by Educational Measurement Solutions TABLE OF CONTENT CHAPTER
More informationChapter 1: Explaining Behavior
Chapter 1: Explaining Behavior GOAL OF SCIENCE is to generate explanations for various puzzling natural phenomenon. - Generate general laws of behavior (psychology) RESEARCH: principle method for acquiring
More information2 Critical thinking guidelines
What makes psychological research scientific? Precision How psychologists do research? Skepticism Reliance on empirical evidence Willingness to make risky predictions Openness Precision Begin with a Theory
More information10 Intraclass Correlations under the Mixed Factorial Design
CHAPTER 1 Intraclass Correlations under the Mixed Factorial Design OBJECTIVE This chapter aims at presenting methods for analyzing intraclass correlation coefficients for reliability studies based on a
More informationData and Statistics 101: Key Concepts in the Collection, Analysis, and Application of Child Welfare Data
TECHNICAL REPORT Data and Statistics 101: Key Concepts in the Collection, Analysis, and Application of Child Welfare Data CONTENTS Executive Summary...1 Introduction...2 Overview of Data Analysis Concepts...2
More informationThe relation between fluid intelligence and the general factor as a function of cultural background: a test of Cattell s investment theory
The relation between fluid intelligence and the general factor as a function of cultural background: a test of Cattell s investment theory Ann Valentin Kvist Jan-Eric Gustafsson WORKING PAPER 2007:23 The
More informationUsing the Rasch Modeling for psychometrics examination of food security and acculturation surveys
Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Jill F. Kilanowski, PhD, APRN,CPNP Associate Professor Alpha Zeta & Mu Chi Acknowledgements Dr. Li Lin,
More informationCHAPTER VI RESEARCH METHODOLOGY
CHAPTER VI RESEARCH METHODOLOGY 6.1 Research Design Research is an organized, systematic, data based, critical, objective, scientific inquiry or investigation into a specific problem, undertaken with the
More informationPsych 1Chapter 2 Overview
Psych 1Chapter 2 Overview After studying this chapter, you should be able to answer the following questions: 1) What are five characteristics of an ideal scientist? 2) What are the defining elements of
More informationPsychometric properties of the PsychoSomatic Problems scale an examination using the Rasch model
Psychometric properties of the PsychoSomatic Problems scale an examination using the Rasch model Curt Hagquist Karlstad University, Karlstad, Sweden Address: Karlstad University SE-651 88 Karlstad Sweden
More informationThe Modification of Dichotomous and Polytomous Item Response Theory to Structural Equation Modeling Analysis
Canadian Social Science Vol. 8, No. 5, 2012, pp. 71-78 DOI:10.3968/j.css.1923669720120805.1148 ISSN 1712-8056[Print] ISSN 1923-6697[Online] www.cscanada.net www.cscanada.org The Modification of Dichotomous
More informationItem Analysis: Classical and Beyond
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Modified for EPE/EDP 711 by Kelly Bradley on January 8, 2013 Why is item analysis relevant? Item analysis provides
More informationGENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS
GENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS Michael J. Kolen The University of Iowa March 2011 Commissioned by the Center for K 12 Assessment & Performance Management at
More informationMartin Senkbeil and Jan Marten Ihme
neps Survey papers Martin Senkbeil and Jan Marten Ihme NEPS Technical Report for Computer Literacy: Scaling Results of Starting Cohort 4 for Grade 12 NEPS Survey Paper No. 25 Bamberg, June 2017 Survey
More informationReliability and Validity of the Hospital Survey on Patient Safety Culture at a Norwegian Hospital
Paper I Olsen, E. (2008). Reliability and Validity of the Hospital Survey on Patient Safety Culture at a Norwegian Hospital. In J. Øvretveit and P. J. Sousa (Eds.), Quality and Safety Improvement Research: