Having your cake and eating it too: multiple dimensions and a composite

1 Having your cake and eating it too: multiple dimensions and a composite Perman Gochyyev and Mark Wilson UC Berkeley BEAR Seminar October, 2018

2 Outline
- Motivating example
- Different modeling approaches
- Composite model
- Reliability
- Plausible values
- Empirical example

3 Micro- and macro-level
- individual dimensions
- summative combination of those multiple dimensions: the composite
- three main modeling options: the uni- and multidimensional pair, the bi-factor model, the higher-order model

4 Micro- and macro-level
- Mathematics Achievement: Algebra, Geometry, Statistics
- Administrators: What is the mathematics achievement of students?
- Teachers: Which topic needs closer attention?

5 Classical Test Theory

6 Item Response Theory

7 Bifactor model
- a serious limitation for interpretation in this context
- not useful for practitioners

8 Bifactor model
"Perhaps the methodologists who are promoting this model know some secret unknown to the authors, but we have no conceptualization what such things ('Algebra uncorrelated with Mathematics Achievement,' 'Geometry uncorrelated with Mathematics Achievement' and 'Statistics uncorrelated with Mathematics Achievement') might be, and/or how they could be interpreted." (Wilson & Gochyyev, forthcoming, p. 7)

9 Second-order (higher-order) model
- the lower-order estimates are a linear function of the higher-order estimate
- if the relationship is linear, each person has only one estimate (the higher-order one); the lower-order ones are all determined by that

10 Composite model
Assumptions:
- The sub-test level (the "parts") is the main focus for measurement
- The sum-total level (the "whole") is needed for other pragmatic uses
Two parts:
1. a multidimensional model for the sub-tests
2. a predictive model for a composite of the latent variables based on each sub-test

11

12 Composite model: hybrid of two measurement traditions
- reflective measurement: the dominant trend; the latent variable is seen as being the source of the responses to the items
- formative measurement: the items are seen as being the source of the general variable

13 Composite model
- Howell, Breivik & Wilcox (2007, p. 205): "formative measurement is not an equally attractive alternative to reflective measurement and that whenever possible, in developing new measures or choosing among alternative existing measures, researchers should opt for reflective measurement."
- we agree
- the key: which level of the measurement should be optimized?
- in the educational context, the level of the sub-tests should be optimized: reflective measurement at the sub-test level

14 Estimation

15 Weighting Schemes
- Weighting by the number of items ("item-frequency weighting"): not ideal; confounded by design-related decisions; implicitly encoded in the unidimensional modeling approach
- Reliability weighting: the more reliable the score for a dimension, the higher the weight it gets; affected by the number of items for that dimension
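
A minimal sketch of how item-frequency and reliability weights might be turned into a composite score. The item counts are the ones reported for the ADM example later in the talk; the reliabilities, the person estimates, and the helper names `normalize` and `composite` are hypothetical and only serve the illustration. Normalizing the weights to sum to 1 is also an assumption, not something the slides prescribe.

```python
import numpy as np

def normalize(raw):
    """Scale non-negative raw weights so that they sum to 1."""
    raw = np.asarray(raw, dtype=float)
    return raw / raw.sum()

def composite(theta, weights):
    """Weighted sum of dimension estimates; theta has shape (persons, dims)."""
    return np.asarray(theta, dtype=float) @ np.asarray(weights, dtype=float)

# Item counts per dimension (DAD, COS, CHA, MOV) from the ADM example.
item_counts = [11, 8, 3, 3]
# Hypothetical per-dimension reliabilities, for illustration only.
reliabilities = [0.80, 0.75, 0.60, 0.55]

w_items = normalize(item_counts)   # item-frequency weights
w_rel = normalize(reliabilities)   # reliability weights

# Hypothetical dimension estimates for two persons.
theta = np.array([[0.5, -0.2, 1.0, 0.3],
                  [-1.0, 0.4, 0.0, 0.8]])

comp_items = composite(theta, w_items)
comp_rel = composite(theta, w_rel)
```

With these counts, item-frequency weighting gives the first dimension 11/25 of the total weight, which illustrates how design decisions about test length carry straight into the composite.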

16 Weighting Schemes
- Weighting by mean item difficulty ("item-difficulty weighting")
- if a dimension's items are more difficult, that dimension should have a higher weight in the composite
- one should use either proportion correct or IRT difficulties obtained from the unidimensional model
- if one finds that the dimension-specific difficulty means differ substantially, this may hint at possible design flaws
- as a good practice in instrument design, one should aim to have the items from each dimension span the ability continuum

17 Weighting Schemes
- Weighting by intended use ("consequential weighting")
- not all strands are created equal
- depending on the grade level, some topics/content areas dominate the school year compared to others
- adjusting the weights accordingly, by giving more weight to topics that are covered more, might be useful for one important reason: reflecting in the test the apparent amount of a topic in the curriculum
- particularly relevant in educational achievement testing

18 Common scale across dimensions
- often overlooked, regardless of how insensible it sounds
- justifies the combination of these dimensions into a single summary score (the composite score)
- option 1: construct the composite scores after aligning the different dimensions
- option 2: implement this alignment within the estimation routine itself, so that the dimensions are forced into a common metric

19 Reliability of the composite

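20 EAP reliability
- EAP: the mean of the posterior distribution
- The variance of the posterior is used to represent uncertainty
- Mislevy, Beaton, Kaplan & Sheehan (1992): reliability can be viewed as the amount by which the measurement process has reduced uncertainty in the prediction of each individual's ability

$$R_{EAP} = 1 - \frac{\sigma_p^2}{\sigma_\theta^2} = \frac{\operatorname{var}(\mathrm{EAP})}{\sigma_\theta^2}$$

where $\sigma_p^2$ is the posterior variance and $\sigma_\theta^2$ is the variance of the latent variable $\theta$.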
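
A minimal sketch of the EAP reliability computation, read as one minus the ratio of the mean posterior variance to the latent variance. The function name and the toy numbers are hypothetical. When no latent variance is supplied, the sketch assumes it equals the variance of the EAPs plus the mean posterior variance (the law of total variance); estimation software would normally report its own estimate of the latent variance.

```python
import numpy as np

def eap_reliability(eap, post_var, latent_var=None):
    """EAP reliability: 1 - mean posterior variance / latent variance."""
    eap = np.asarray(eap, dtype=float)
    post_var = np.asarray(post_var, dtype=float)
    mean_post_var = post_var.mean()
    if latent_var is None:
        # Assumption: total variance = var(EAP) + mean posterior variance.
        latent_var = eap.var() + mean_post_var
    return 1.0 - mean_post_var / latent_var

# Toy data: three persons with EAPs -1, 0, 1 and posterior variance 1/3 each.
r = eap_reliability([-1.0, 0.0, 1.0], [1 / 3, 1 / 3, 1 / 3])
```

Under the total-variance assumption the two forms of the formula agree: here var(EAP) is 2/3 and the latent variance is 1, so both give a reliability of 2/3.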

21 Variance and reliability for the composite
To construct this model-based variance estimate for the composite, we use plausible values (PVs; Mislevy et al., 1992):
(1) randomly generate 5 PVs for each person and for each dimension
(2) obtain the composite score resulting from each draw (using the weights)
(3) estimate the variance of each of the 5 composite distributions
(4) average the variance across the five draws
To obtain the EAP reliability, divide the observed variance of the composite (obtained from the dimension-specific EAP scores) by the variance obtained from the steps above.
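
The four steps and the final ratio can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' implementation: plausible values are drawn here from independent normal posteriors centered on each EAP, which ignores posterior covariances between dimensions, and the EAPs, posterior standard deviations, and equal weights are simulated. In practice the PVs would come from the estimation software.

```python
import numpy as np

def composite_reliability(eap, post_sd, weights, n_draws=5, seed=0):
    """Ratio of the EAP-composite variance to the mean PV-composite variance."""
    rng = np.random.default_rng(seed)
    eap = np.asarray(eap, dtype=float)          # shape (persons, dims)
    post_sd = np.asarray(post_sd, dtype=float)  # shape (persons, dims)
    weights = np.asarray(weights, dtype=float)  # shape (dims,)
    # (1) draw PVs for each person and dimension: (n_draws, persons, dims)
    pvs = rng.normal(loc=eap, scale=post_sd, size=(n_draws,) + eap.shape)
    # (2) composite score for each draw: (n_draws, persons)
    comps = pvs @ weights
    # (3) variance of each composite distribution, (4) averaged over draws
    pv_var = comps.var(axis=1, ddof=1).mean()
    # observed variance of the composite built from the EAP scores
    eap_var = (eap @ weights).var(ddof=1)
    return eap_var / pv_var, comps

# Simulated inputs: 500 persons, 4 dimensions, equal weights.
sim = np.random.default_rng(1)
eap = sim.normal(size=(500, 4))
post_sd = np.full((500, 4), 0.5)
rel, comps = composite_reliability(eap, post_sd, weights=[0.25] * 4)
```

The returned `comps` array holds one composite distribution per draw, so it can be reused for other PV-based summaries.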

22 Alternative reliability for the composite
- Reliability coefficient (Spearman, 1910): the correlation between one half and the other half of several measures of the same thing
- classical formulation of reliability: the correlation between two random measurements of the composite
- using PVs as above, obtain the correlations between each pair of the 5 composite distributions and calculate the mean over the 10 possible pairings (i.e., 5!/(3! 2!) = 10)

23 Example: ADM
- Data Modeling curriculum designed to improve middle school students' statistical reasoning
- schools were randomly assigned to treatment/control
- pre- and post-test; we used data from the post-test
- five sub-dimensions (domains): Data Display (DAD), Models of Variability (MOV), Chance (CHA), Concepts of Statistics (COS), Informal Inference (INI)
- due to the very high correlation between the DAD and INI dimensions, we combined these two dimensions
- 25 items: DAD (11); COS (8); CHA (3); MOV (3)

24 Example: multidimensional Rasch model
- unidimensional Rasch: variance: (0.024); EAP reliability of 0.89; Cronbach's alpha of 0.87
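
A minimal sketch of this correlation-based reliability, assuming the composite scores from the plausible-value draws are already available as an array with one row per draw. The simulated scores and the function name are hypothetical.

```python
import numpy as np
from itertools import combinations

def mean_pairwise_correlation(comps):
    """Mean Pearson correlation over all pairs of composite draws."""
    comps = np.asarray(comps, dtype=float)  # shape (n_draws, persons)
    pairs = list(combinations(range(comps.shape[0]), 2))
    cors = [np.corrcoef(comps[i], comps[j])[0, 1] for i, j in pairs]
    return float(np.mean(cors)), len(pairs)

# Simulated composites: a common "true" score plus independent noise per draw.
rng = np.random.default_rng(0)
true_score = rng.normal(size=300)
comps = true_score + rng.normal(scale=0.5, size=(5, 300))

reliability, n_pairs = mean_pairwise_correlation(comps)
```

With 5 draws, `combinations` enumerates the 10 pairings mentioned on the slide.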

25 Example: multidimensional Rasch model

26 Example: naïve correlations
- overestimated due to the correlated bivariate priors used when computing EAP estimates
- EAP estimates are shrunken towards each other, and the amount of shrinkage depends (inversely) on their reliabilities

27 Example: Bifactor model the latent variable correlation between the common and the unidimensional latent variable is estimated at calculated using plausible values for the unidimensional latent variable, and using the reliability of the common factor to correct for the overestimation of the EAP correlations

28 Example: Bifactor model naïve correlations

29 Example: Second-order model the latent variable correlation between the common and the unidimensional latent variable is estimated at (calculated using plausible values for the unidimensional latent variable, and using the reliability of the overall factor to correct for the overestimation)

30 Example: Second-order model Correlations between latent variables Naïve correlations (between EAP estimates)

31 Example: Composite model with equal weights The latent variable correlation between the composite and the unidimensional latent variable: 0.84

32 Example: Composite model with reliability weights The latent variable correlation between the composite and the unidimensional latent variable: 0.85

33 Conclusion
- inherently multidimensional contexts (the "parts") nevertheless also include a certain level of interest in the overarching combination of those multiple dimensions (the "whole")
- using the uni- and multidimensional pair of modeling techniques can give both perspectives
- to bring them together under a single analytic umbrella, the composite model offers some very useful advantages
- we see it as being readily useful quite broadly to address a very long-standing measurement problem

34 Thank you. Questions?


Convergence Principles: Information in the Answer Convergence Principles: Information in the Answer Sets of Some Multiple-Choice Intelligence Tests A. P. White and J. E. Zammarelli University of Durham It is hypothesized that some common multiplechoice

More information

Evaluation Models STUDIES OF DIAGNOSTIC EFFICIENCY

Evaluation Models STUDIES OF DIAGNOSTIC EFFICIENCY 2. Evaluation Model 2 Evaluation Models To understand the strengths and weaknesses of evaluation, one must keep in mind its fundamental purpose: to inform those who make decisions. The inferences drawn

More information

A Comparison of Methods of Estimating Subscale Scores for Mixed-Format Tests

A Comparison of Methods of Estimating Subscale Scores for Mixed-Format Tests A Comparison of Methods of Estimating Subscale Scores for Mixed-Format Tests David Shin Pearson Educational Measurement May 007 rr0701 Using assessment and research to promote learning Pearson Educational

More information

In this chapter we discuss validity issues for quantitative research and for qualitative research.

In this chapter we discuss validity issues for quantitative research and for qualitative research. Chapter 8 Validity of Research Results (Reminder: Don t forget to utilize the concept maps and study questions as you study this and the other chapters.) In this chapter we discuss validity issues for

More information

Multi-level approaches to understanding and preventing obesity: analytical challenges and new directions

Multi-level approaches to understanding and preventing obesity: analytical challenges and new directions Multi-level approaches to understanding and preventing obesity: analytical challenges and new directions Ana V. Diez Roux MD PhD Center for Integrative Approaches to Health Disparities University of Michigan

More information

11/24/2017. Do not imply a cause-and-effect relationship

11/24/2017. Do not imply a cause-and-effect relationship Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are highly extraverted people less afraid of rejection

More information

Bayesians methods in system identification: equivalences, differences, and misunderstandings

Bayesians methods in system identification: equivalences, differences, and misunderstandings Bayesians methods in system identification: equivalences, differences, and misunderstandings Johan Schoukens and Carl Edward Rasmussen ERNSI 217 Workshop on System Identification Lyon, September 24-27,

More information

Using Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items

Using Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items University of Wisconsin Milwaukee UWM Digital Commons Theses and Dissertations May 215 Using Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items Tamara Beth

More information

Validating Measures of Self Control via Rasch Measurement. Jonathan Hasford Department of Marketing, University of Kentucky

Validating Measures of Self Control via Rasch Measurement. Jonathan Hasford Department of Marketing, University of Kentucky Validating Measures of Self Control via Rasch Measurement Jonathan Hasford Department of Marketing, University of Kentucky Kelly D. Bradley Department of Educational Policy Studies & Evaluation, University

More information

Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys

Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Jill F. Kilanowski, PhD, APRN,CPNP Associate Professor Alpha Zeta & Mu Chi Acknowledgements Dr. Li Lin,

More information

The application of Classical Test Theory (CTT) to the development of Patient- Reported Outcome Measures (PROMs) in Health Services Research

The application of Classical Test Theory (CTT) to the development of Patient- Reported Outcome Measures (PROMs) in Health Services Research The application of Classical Test Theory (CTT) to the development of Patient- Reported Outcome Measures (PROMs) in Health Services Research Matthew Hankins Submission for PhD by Publication University

More information

CHAPTER - 6 STATISTICAL ANALYSIS. This chapter discusses inferential statistics, which use sample data to

CHAPTER - 6 STATISTICAL ANALYSIS. This chapter discusses inferential statistics, which use sample data to CHAPTER - 6 STATISTICAL ANALYSIS 6.1 Introduction This chapter discusses inferential statistics, which use sample data to make decisions or inferences about population. Populations are group of interest

More information

Multilevel IRT for group-level diagnosis. Chanho Park Daniel M. Bolt. University of Wisconsin-Madison

Multilevel IRT for group-level diagnosis. Chanho Park Daniel M. Bolt. University of Wisconsin-Madison Group-Level Diagnosis 1 N.B. Please do not cite or distribute. Multilevel IRT for group-level diagnosis Chanho Park Daniel M. Bolt University of Wisconsin-Madison Paper presented at the annual meeting

More information

GUIDELINE COMPARATORS & COMPARISONS:

GUIDELINE COMPARATORS & COMPARISONS: GUIDELINE COMPARATORS & COMPARISONS: Direct and indirect comparisons Adapted version (2015) based on COMPARATORS & COMPARISONS: Direct and indirect comparisons - February 2013 The primary objective of

More information

2013 Supervisor Survey Reliability Analysis

2013 Supervisor Survey Reliability Analysis 2013 Supervisor Survey Reliability Analysis In preparation for the submission of the Reliability Analysis for the 2013 Supervisor Survey, we wanted to revisit the purpose of this analysis. This analysis

More information

A Comparison of Several Goodness-of-Fit Statistics

A Comparison of Several Goodness-of-Fit Statistics A Comparison of Several Goodness-of-Fit Statistics Robert L. McKinley The University of Toledo Craig N. Mills Educational Testing Service A study was conducted to evaluate four goodnessof-fit procedures

More information

Lecture Outline. Biost 590: Statistical Consulting. Stages of Scientific Studies. Scientific Method

Lecture Outline. Biost 590: Statistical Consulting. Stages of Scientific Studies. Scientific Method Biost 590: Statistical Consulting Statistical Classification of Scientific Studies; Approach to Consulting Lecture Outline Statistical Classification of Scientific Studies Statistical Tasks Approach to

More information

Reviewing the TIMSS Advanced 2015 Achievement Item Statistics

Reviewing the TIMSS Advanced 2015 Achievement Item Statistics CHAPTER 11 Reviewing the TIMSS Advanced 2015 Achievement Item Statistics Pierre Foy Michael O. Martin Ina V.S. Mullis Liqun Yin Kerry Cotter Jenny Liu The TIMSS & PIRLS conducted a review of a range of

More information

Blending Psychometrics with Bayesian Inference Networks: Measuring Hundreds of Latent Variables Simultaneously

Blending Psychometrics with Bayesian Inference Networks: Measuring Hundreds of Latent Variables Simultaneously Blending Psychometrics with Bayesian Inference Networks: Measuring Hundreds of Latent Variables Simultaneously Jonathan Templin Department of Educational Psychology Achievement and Assessment Institute

More information

Tech Talk: Using the Lafayette ESS Report Generator

Tech Talk: Using the Lafayette ESS Report Generator Raymond Nelson Included in LXSoftware is a fully featured manual score sheet that can be used with any validated comparison question test format. Included in the manual score sheet utility of LXSoftware

More information

On the Targets of Latent Variable Model Estimation

On the Targets of Latent Variable Model Estimation On the Targets of Latent Variable Model Estimation Karen Bandeen-Roche Department of Biostatistics Johns Hopkins University Department of Mathematics and Statistics Miami University December 8, 2005 With

More information

Meta-Analysis. Zifei Liu. Biological and Agricultural Engineering

Meta-Analysis. Zifei Liu. Biological and Agricultural Engineering Meta-Analysis Zifei Liu What is a meta-analysis; why perform a metaanalysis? How a meta-analysis work some basic concepts and principles Steps of Meta-analysis Cautions on meta-analysis 2 What is Meta-analysis

More information

How Does Analysis of Competing Hypotheses (ACH) Improve Intelligence Analysis?

How Does Analysis of Competing Hypotheses (ACH) Improve Intelligence Analysis? How Does Analysis of Competing Hypotheses (ACH) Improve Intelligence Analysis? Richards J. Heuer, Jr. Version 1.2, October 16, 2005 This document is from a collection of works by Richards J. Heuer, Jr.

More information

The Regression-Discontinuity Design

The Regression-Discontinuity Design Page 1 of 10 Home» Design» Quasi-Experimental Design» The Regression-Discontinuity Design The regression-discontinuity design. What a terrible name! In everyday language both parts of the term have connotations

More information

TECHNICAL REPORT. The Added Value of Multidimensional IRT Models. Robert D. Gibbons, Jason C. Immekus, and R. Darrell Bock

TECHNICAL REPORT. The Added Value of Multidimensional IRT Models. Robert D. Gibbons, Jason C. Immekus, and R. Darrell Bock 1 TECHNICAL REPORT The Added Value of Multidimensional IRT Models Robert D. Gibbons, Jason C. Immekus, and R. Darrell Bock Center for Health Statistics, University of Illinois at Chicago Corresponding

More information

commentary Time is a jailer: what do alpha and its alternatives tell us about reliability?

commentary Time is a jailer: what do alpha and its alternatives tell us about reliability? commentary Time is a jailer: what do alpha and its alternatives tell us about reliability? Rik Psychologists do not have it easy, but the article by Peters Maastricht University (2014) paves the way for

More information

Daniel Boduszek University of Huddersfield

Daniel Boduszek University of Huddersfield Daniel Boduszek University of Huddersfield d.boduszek@hud.ac.uk Introduction to Correlation SPSS procedure for Pearson r Interpretation of SPSS output Presenting results Partial Correlation Correlation

More information

SWITCH Trial. A Sequential Multiple Adaptive Randomization Trial

SWITCH Trial. A Sequential Multiple Adaptive Randomization Trial SWITCH Trial A Sequential Multiple Adaptive Randomization Trial Background Cigarette Smoking (CDC Estimates) Morbidity Smoking caused diseases >16 million Americans (1 in 30) Mortality 480,000 deaths per

More information

CFPB Financial Well-Being Scale

CFPB Financial Well-Being Scale May 2017 CFPB Financial Well-Being Scale Scale development technical report Table of contents Table of contents... 1 1. Introduction... 3 2. Defining financial well-being... 6 3. Overview of a typical

More information

Smiley Faces: Scales Measurement for Children Assessment

Smiley Faces: Scales Measurement for Children Assessment Smiley Faces: Scales Measurement for Children Assessment Wan Ahmad Jaafar Wan Yahaya and Sobihatun Nur Abdul Salam Universiti Sains Malaysia and Universiti Utara Malaysia wajwy@usm.my, sobihatun@uum.edu.my

More information