Initial Report on the Calibration of Paper and Pencil Forms UCLA/CRESST August 2015

Size: px
Start display at page:

Download "Initial Report on the Calibration of Paper and Pencil Forms UCLA/CRESST August 2015"

Transcription

1 This report describes the procedures used in obtaining parameter estimates for items appearing on the Smarter Balanced Assessment Consortium (SBAC) summative paper-pencil forms. Among the items appearing on these forms, a subset were identified as modified meaning it was assumed that these items would function differently in this mode of administration and that it would not be reasonable to use the item parameters from the previous calibration (in which items were administered via computer). The remaining items which we refer to in this report as unmodified were treated as candidate anchor items for the purpose of linking the paper-pencil forms to the scale of the computer-based test administrations. For each grade and subject, our calibration of items for paper and pencil administration included the following steps: 1. Exclusion of ineligible cases. Data for the calibration of items appearing on the paper and pencil forms were provided by two consortium member states. We excluded from the calibrations any cases with fewer than nine valid item scores on the form. Invalid score codes included B=blank, L=non-scorable language, T=off-topic, M=off-purpose. Items with no score or score code were also treated as invalid for the purpose of determining eligibility. Table 1 below shows the number of eligible cases used in the paper-pencil calibrations, by grade level and subject. Table 1. Calibration sample size by grade and subject. Grade ELA Math HS Recoding of invalid or missing item scores. For cases meeting the inclusion criterion in step #1 (i.e., having a minimum of nine valid item scores on the form), any items with an invalid score code (or missing a score or score code entirely) were recoded with a score of zero (i.e., treated as incorrect or receiving the minimum score). 3. Initial calibration. We performed an initial calibration in which (a) the parameters of all unmodified (candidate anchor) items were fixed to their prior estimates (the values obtained from the online administration), (b) the parameters of all modified items were freely estimated, and (c) the latent variable density was estimated as an empirical histogram (see, e.g., Woods, 2007; Houts & Cai, 2013) with estimated mean and variance. This population distribution was estimated due to our uncertainty about the proficiency of the students used in this calibration and the expectation that these students were unlikely to be representative of all students in the grade level. In order to match the scoring procedures used with the computer-based tests, scores for two items were first recoded (two score levels collapsed into one). Another two items required similar recoding in order to obtain stable item parameter estimates. All such recoding was noted in the scoring parameter files created after the final calibration. From the initial calibration performed in 1

2 this step, we obtained item and model goodness-of-fit indices. The calibrations in this step (as well as in later steps #4 and #6) were performed with the flexmirt item response modeling software (Cai, 2015). 4. Calibrations to estimating of parameters of unmodified items. We then performed a series of calibrations identical to step #3 but with the parameters of one unmodified item at a time now freely estimated. The parameters of all other unmodified items were fixed to their prior estimates (from the prior field test, in which items were administered online). As in step #3, the parameters of all modified items were freely estimated, along with the population distribution s mean, variance, and shape. The number of calibrations performed in this step was equal to the number of unmodified items on the form (for the given subject and grade level). From these calibrations, we obtained item and model goodness-of-fit indices and item parameter estimates for the unmodified items when these parameters were estimated freely, rather than fixed to their prior estimates (as in step #3). 5. Evaluation of candidate anchor items. We then examined the extent to which each unmodified item should be retained or rejected as an anchor in the final calibration for the paper and pencil forms (i.e., whether or not it would be reasonable to fix the parameters of these items to their prior estimates). We used the parameter estimates from the prior item calibration (in which the items were administered via computer) and the calibrations from step #4 to compute the expected score functions for the two modes of test administration. For dichotomous items, the expected score function is simply the item traceline. The two expected score functions (for the computer-based and paper-pencil administrations) were plotted, and differences in item functioning across the modes of administration were quantified by computing a weighted Area Between the Curves (wabc; see Hansen, Cai, Stucky, Tucker, Shadel, & Edelen, 2014). w ( ) ( ) ( ) ( ) g( ). ( Here, ) ( ) is the expected score function based on the parameter estimates from the prior ( calibration, in which items were administered via computer, and ) ( ) is the expected score function based on the parameters estimated from the paper-pencil calibration (in step #4). The proficiency distribution for the grade and subject is given by g( ). The weighted differences were integrated over the range of proficiency levels (from -4 to +4 on the logit scale, in increments of 0.1). We rejected as anchors any items with a wabc value greater than Note that all items with wabc > also showed significant misfit based on the marginal fit index obtained in the initial calibration and significant improvement in overall model fit when the parameters of the item were estimated freely rather than fixed to their prior estimates (evaluated though a likelihood ratio test; note that the model in step #3 is nested within each model estimated in step #4). There were, however, items that displayed statistically significant differences in item functioning (based on the likelihood ratio test) that showed little meaningful difference in the expected score functions (wabc < 0.150); these items were retained as anchors. Examples of the expected score functions and corresponding wabc values are shown in Figure 1, for six Grade 3 ELA items. 2

3 Figure 1. Expected score functions for six items, by mode of administration. Notes: Items shown are from the Grade 3 ELA paper-pencil form. Solid black lines show the expected score functions based on the parameter estimates obtained from the prior calibration (when items were administered via computer). Dotted blue lines show the expected score functions when the item s parameters were estimated, rather than fixed. The p-value is for a likelihood ratio test examining whether the overall fit of model in which the item s parameters are fixed to the previous estimates (i.e., the model estimated in step #3) is significantly worse than a model in which the item s parameters are freely estimated (as in step #4). wabc is the weighted difference in expected score functions. For the two items in the first row, model fit was not significantly worse when the item parameters were fixed (i.e., p > 0.05 for the likelihood ratio test comparing the models from steps #3 and #4). For the two items in the second row, model fit was significantly worse when the parameters were fixed, but the wabc value remained below the threshold value of For items in the third row, model fit was significantly worse when item parameters were fixed to the previous values, and the wabc values exceeded Among these six items, only these last two items were rejected as anchors. Table 2 presents a summary of results from our evaluation of the unmodified (candidate anchor) items. 3

4 Table 2. Number of items in calibration, by grade and subject. grade modified unmodified items items anchor rejected total ELA HS Math HS Notes: Performance task items for the HS ELA and Math forms were completed by only a small proportion of students the sample used in these analyses. As a result, the HS performance task items were excluded from the calibration. Two items in Grade 6 ELA were also excluded from the calibration because very few examinees answered these items correctly, leading to unstable parameter estimates. The excluded items are not included in the item counts shown in this table but are used in scoring the paper-pencil forms (using the parameters from the prior calibration). The number of unmodified items within the form for a particular grade and subject that displayed wabc values exceeding the threshold value of ranged from zero to three. For those subjects and grades in which no items were rejected, all unmodified items were used as anchors, the model from step #3 was used as the final calibration model, and the parameter estimates from this model were provided to SBAC for use in scoring the paper-pencil forms. When one or more items were rejected as anchors, we proceeded to step #6. 6. Final item calibration. For grades and subjects in which any unmodified item was rejected as an anchor, we estimated a final calibration model that was identical to the model from step #3, except that the parameters of all rejected anchor items were freely estimated, rather than fixed to their prior estimates. Parameters of the modified items were also freely estimated. As in steps #3 and #4, the latent variable density was estimated as an empirical histogram. The parameter estimates from this final calibration were provided to SBAC for use in scoring the paper-pencil forms. 4

5 References Cai, L. (2015). flexmirt: Flexible multilevel item factor analysis and test scoring [Computer software]. Seattle, WA: Vector Psychometric Group. Hansen, M., Cai, L., Stucky, B. D., Tucker, J. S., Shadel, W. S., & Edelen, M. O. (2014). Methodology for developing and evaluating the PROMIS smoking item banks. Nicotine and Tobacco Research, 16, Supplement 3, S175-S189. Houts, C. R., & Cai, L. (2013). flexmirt user s manual version 2: Flexible multilevel multidimensional item analysis and test scoring. Chapel Hill, NC: Vector Psychometric Group. Woods, C. M. (2007). Empirical histograms in item response theory with ordinal data. Educational and psychological measurement, 67,

Maximum Marginal Likelihood Bifactor Analysis with Estimation of the General Dimension as an Empirical Histogram

Maximum Marginal Likelihood Bifactor Analysis with Estimation of the General Dimension as an Empirical Histogram Maximum Marginal Likelihood Bifactor Analysis with Estimation of the General Dimension as an Empirical Histogram Li Cai University of California, Los Angeles Carol Woods University of Kansas 1 Outline

More information

FATIGUE. A brief guide to the PROMIS Fatigue instruments:

FATIGUE. A brief guide to the PROMIS Fatigue instruments: FATIGUE A brief guide to the PROMIS Fatigue instruments: ADULT ADULT CANCER PEDIATRIC PARENT PROXY PROMIS Ca Bank v1.0 Fatigue PROMIS Pediatric Bank v2.0 Fatigue PROMIS Pediatric Bank v1.0 Fatigue* PROMIS

More information

A Comparison of Several Goodness-of-Fit Statistics

A Comparison of Several Goodness-of-Fit Statistics A Comparison of Several Goodness-of-Fit Statistics Robert L. McKinley The University of Toledo Craig N. Mills Educational Testing Service A study was conducted to evaluate four goodnessof-fit procedures

More information

MEANING AND PURPOSE. ADULT PEDIATRIC PARENT PROXY PROMIS Item Bank v1.0 Meaning and Purpose PROMIS Short Form v1.0 Meaning and Purpose 4a

MEANING AND PURPOSE. ADULT PEDIATRIC PARENT PROXY PROMIS Item Bank v1.0 Meaning and Purpose PROMIS Short Form v1.0 Meaning and Purpose 4a MEANING AND PURPOSE A brief guide to the PROMIS Meaning and Purpose instruments: ADULT PEDIATRIC PARENT PROXY PROMIS Item Bank v1.0 Meaning and Purpose PROMIS Short Form v1.0 Meaning and Purpose 4a PROMIS

More information

SLEEP DISTURBANCE ABOUT SLEEP DISTURBANCE INTRODUCTION TO ASSESSMENT OPTIONS. 6/27/2018 PROMIS Sleep Disturbance Page 1

SLEEP DISTURBANCE ABOUT SLEEP DISTURBANCE INTRODUCTION TO ASSESSMENT OPTIONS. 6/27/2018 PROMIS Sleep Disturbance Page 1 SLEEP DISTURBANCE A brief guide to the PROMIS Sleep Disturbance instruments: ADULT PROMIS Item Bank v1.0 Sleep Disturbance PROMIS Short Form v1.0 Sleep Disturbance 4a PROMIS Short Form v1.0 Sleep Disturbance

More information

PAIN INTERFERENCE. ADULT ADULT CANCER PEDIATRIC PARENT PROXY PROMIS-Ca Bank v1.1 Pain Interference PROMIS-Ca Bank v1.0 Pain Interference*

PAIN INTERFERENCE. ADULT ADULT CANCER PEDIATRIC PARENT PROXY PROMIS-Ca Bank v1.1 Pain Interference PROMIS-Ca Bank v1.0 Pain Interference* PROMIS Item Bank v1.1 Pain Interference PROMIS Item Bank v1.0 Pain Interference* PROMIS Short Form v1.0 Pain Interference 4a PROMIS Short Form v1.0 Pain Interference 6a PROMIS Short Form v1.0 Pain Interference

More information

PSYCHOLOGICAL STRESS EXPERIENCES

PSYCHOLOGICAL STRESS EXPERIENCES PSYCHOLOGICAL STRESS EXPERIENCES A brief guide to the PROMIS Pediatric and Parent Proxy Report Psychological Stress Experiences instruments: PEDIATRIC PROMIS Pediatric Item Bank v1.0 Psychological Stress

More information

PHYSICAL FUNCTION A brief guide to the PROMIS Physical Function instruments:

PHYSICAL FUNCTION A brief guide to the PROMIS Physical Function instruments: PROMIS Bank v1.0 - Physical Function* PROMIS Short Form v1.0 Physical Function 4a* PROMIS Short Form v1.0-physical Function 6a* PROMIS Short Form v1.0-physical Function 8a* PROMIS Short Form v1.0 Physical

More information

Smoking Social Motivations

Smoking Social Motivations Smoking Social Motivations A brief guide to the PROMIS Smoking Social Motivations instruments: ADULT PROMIS Item Bank v1.0 Smoking Social Motivations for All Smokers PROMIS Item Bank v1.0 Smoking Social

More information

MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2

MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2 MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and Lord Equating Methods 1,2 Lisa A. Keller, Ronald K. Hambleton, Pauline Parker, Jenna Copella University of Massachusetts

More information

INTRODUCTION TO ASSESSMENT OPTIONS

INTRODUCTION TO ASSESSMENT OPTIONS DEPRESSION A brief guide to the PROMIS Depression instruments: ADULT ADULT CANCER PEDIATRIC PARENT PROXY PROMIS-Ca Bank v1.0 Depression PROMIS Pediatric Item Bank v2.0 Depressive Symptoms PROMIS Pediatric

More information

ANXIETY. A brief guide to the PROMIS Anxiety instruments:

ANXIETY. A brief guide to the PROMIS Anxiety instruments: ANXIETY A brief guide to the PROMIS Anxiety instruments: ADULT ADULT CANCER PEDIATRIC PARENT PROXY PROMIS Bank v1.0 Anxiety PROMIS Short Form v1.0 Anxiety 4a PROMIS Short Form v1.0 Anxiety 6a PROMIS Short

More information

GENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS

GENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS GENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS Michael J. Kolen The University of Iowa March 2011 Commissioned by the Center for K 12 Assessment & Performance Management at

More information

Computerized Mastery Testing

Computerized Mastery Testing Computerized Mastery Testing With Nonequivalent Testlets Kathleen Sheehan and Charles Lewis Educational Testing Service A procedure for determining the effect of testlet nonequivalence on the operating

More information

ABOUT SMOKING NEGATIVE PSYCHOSOCIAL EXPECTANCIES

ABOUT SMOKING NEGATIVE PSYCHOSOCIAL EXPECTANCIES Smoking Negative Psychosocial Expectancies A brief guide to the PROMIS Smoking Negative Psychosocial Expectancies instruments: ADULT PROMIS Item Bank v1.0 Smoking Negative Psychosocial Expectancies for

More information

ABOUT SUBSTANCE USE INTRODUCTION TO ASSESSMENT OPTIONS SUBSTANCE USE. 2/26/2018 PROMIS Substance Use Page 1

ABOUT SUBSTANCE USE INTRODUCTION TO ASSESSMENT OPTIONS SUBSTANCE USE. 2/26/2018 PROMIS Substance Use Page 1 SUBSTANCE USE A brief guide to the PROMIS Substance Use instruments: ADULT PROMIS Item Bank v1.0 Appeal of Substance Use (Past 3 Months) PROMIS Item Bank v1.0 Appeal of Substance Use (Past 30 days) PROMIS

More information

Item Analysis: Classical and Beyond

Item Analysis: Classical and Beyond Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Modified for EPE/EDP 711 by Kelly Bradley on January 8, 2013 Why is item analysis relevant? Item analysis provides

More information

POST GRADUATE DIPLOMA IN BIOETHICS (PGDBE) Term-End Examination June, 2016 MHS-014 : RESEARCH METHODOLOGY

POST GRADUATE DIPLOMA IN BIOETHICS (PGDBE) Term-End Examination June, 2016 MHS-014 : RESEARCH METHODOLOGY No. of Printed Pages : 12 MHS-014 POST GRADUATE DIPLOMA IN BIOETHICS (PGDBE) Term-End Examination June, 2016 MHS-014 : RESEARCH METHODOLOGY Time : 2 hours Maximum Marks : 70 PART A Attempt all questions.

More information

Description of components in tailored testing

Description of components in tailored testing Behavior Research Methods & Instrumentation 1977. Vol. 9 (2).153-157 Description of components in tailored testing WAYNE M. PATIENCE University ofmissouri, Columbia, Missouri 65201 The major purpose of

More information

A Bayesian Nonparametric Model Fit statistic of Item Response Models

A Bayesian Nonparametric Model Fit statistic of Item Response Models A Bayesian Nonparametric Model Fit statistic of Item Response Models Purpose As more and more states move to use the computer adaptive test for their assessments, item response theory (IRT) has been widely

More information

Multidimensional Modeling of Learning Progression-based Vertical Scales 1

Multidimensional Modeling of Learning Progression-based Vertical Scales 1 Multidimensional Modeling of Learning Progression-based Vertical Scales 1 Nina Deng deng.nina@measuredprogress.org Louis Roussos roussos.louis@measuredprogress.org Lee LaFond leelafond74@gmail.com 1 This

More information

References. Embretson, S. E. & Reise, S. P. (2000). Item response theory for psychologists. Mahwah,

References. Embretson, S. E. & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, The Western Aphasia Battery (WAB) (Kertesz, 1982) is used to classify aphasia by classical type, measure overall severity, and measure change over time. Despite its near-ubiquitousness, it has significant

More information

linking in educational measurement: Taking differential motivation into account 1

linking in educational measurement: Taking differential motivation into account 1 Selecting a data collection design for linking in educational measurement: Taking differential motivation into account 1 Abstract In educational measurement, multiple test forms are often constructed to

More information

GLOBAL HEALTH. PROMIS Pediatric Scale v1.0 Global Health 7 PROMIS Pediatric Scale v1.0 Global Health 7+2

GLOBAL HEALTH. PROMIS Pediatric Scale v1.0 Global Health 7 PROMIS Pediatric Scale v1.0 Global Health 7+2 GLOBAL HEALTH A brief guide to the PROMIS Global Health instruments: ADULT PEDIATRIC PARENT PROXY PROMIS Scale v1.0/1.1 Global Health* PROMIS Scale v1.2 Global Health PROMIS Scale v1.2 Global Mental 2a

More information

Does factor indeterminacy matter in multi-dimensional item response theory?

Does factor indeterminacy matter in multi-dimensional item response theory? ABSTRACT Paper 957-2017 Does factor indeterminacy matter in multi-dimensional item response theory? Chong Ho Yu, Ph.D., Azusa Pacific University This paper aims to illustrate proper applications of multi-dimensional

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 39 Evaluation of Comparability of Scores and Passing Decisions for Different Item Pools of Computerized Adaptive Examinations

More information

INTRODUCTION TO ASSESSMENT OPTIONS

INTRODUCTION TO ASSESSMENT OPTIONS ASTHMA IMPACT A brief guide to the PROMIS Asthma Impact instruments: PEDIATRIC PROMIS Pediatric Item Bank v2.0 Asthma Impact PROMIS Pediatric Item Bank v1.0 Asthma Impact* PROMIS Pediatric Short Form v2.0

More information

Contents. What is item analysis in general? Psy 427 Cal State Northridge Andrew Ainsworth, PhD

Contents. What is item analysis in general? Psy 427 Cal State Northridge Andrew Ainsworth, PhD Psy 427 Cal State Northridge Andrew Ainsworth, PhD Contents Item Analysis in General Classical Test Theory Item Response Theory Basics Item Response Functions Item Information Functions Invariance IRT

More information

ANXIETY A brief guide to the PROMIS Anxiety instruments:

ANXIETY A brief guide to the PROMIS Anxiety instruments: ANXIETY A brief guide to the PROMIS Anxiety instruments: ADULT PEDIATRIC PARENT PROXY PROMIS Pediatric Bank v1.0 Anxiety PROMIS Pediatric Short Form v1.0 - Anxiety 8a PROMIS Item Bank v1.0 Anxiety PROMIS

More information

The Classification Accuracy of Measurement Decision Theory. Lawrence Rudner University of Maryland

The Classification Accuracy of Measurement Decision Theory. Lawrence Rudner University of Maryland Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, April 23-25, 2003 The Classification Accuracy of Measurement Decision Theory Lawrence Rudner University

More information

Detecting Suspect Examinees: An Application of Differential Person Functioning Analysis. Russell W. Smith Susan L. Davis-Becker

Detecting Suspect Examinees: An Application of Differential Person Functioning Analysis. Russell W. Smith Susan L. Davis-Becker Detecting Suspect Examinees: An Application of Differential Person Functioning Analysis Russell W. Smith Susan L. Davis-Becker Alpine Testing Solutions Paper presented at the annual conference of the National

More information

COGNITIVE FUNCTION. PROMIS Pediatric Item Bank v1.0 Cognitive Function PROMIS Pediatric Short Form v1.0 Cognitive Function 7a

COGNITIVE FUNCTION. PROMIS Pediatric Item Bank v1.0 Cognitive Function PROMIS Pediatric Short Form v1.0 Cognitive Function 7a COGNITIVE FUNCTION A brief guide to the PROMIS Cognitive Function instruments: ADULT PEDIATRIC PARENT PROXY PROMIS Item Bank v1.0 Applied Cognition - Abilities* PROMIS Item Bank v1.0 Applied Cognition

More information

PHYSICAL STRESS EXPERIENCES

PHYSICAL STRESS EXPERIENCES PHYSICAL STRESS EXPERIENCES A brief guide to the PROMIS Physical Stress Experiences instruments: PEDIATRIC PROMIS Pediatric Bank v1.0 - Physical Stress Experiences PROMIS Pediatric Short Form v1.0 - Physical

More information

Blending Psychometrics with Bayesian Inference Networks: Measuring Hundreds of Latent Variables Simultaneously

Blending Psychometrics with Bayesian Inference Networks: Measuring Hundreds of Latent Variables Simultaneously Blending Psychometrics with Bayesian Inference Networks: Measuring Hundreds of Latent Variables Simultaneously Jonathan Templin Department of Educational Psychology Achievement and Assessment Institute

More information

Mantel-Haenszel Procedures for Detecting Differential Item Functioning

Mantel-Haenszel Procedures for Detecting Differential Item Functioning A Comparison of Logistic Regression and Mantel-Haenszel Procedures for Detecting Differential Item Functioning H. Jane Rogers, Teachers College, Columbia University Hariharan Swaminathan, University of

More information

AP Psych - Stat 2 Name Period Date. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

AP Psych - Stat 2 Name Period Date. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. AP Psych - Stat 2 Name Period Date MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) In a set of incomes in which most people are in the $15,000

More information

AP Psych - Stat 1 Name Period Date. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

AP Psych - Stat 1 Name Period Date. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. AP Psych - Stat 1 Name Period Date MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) In a set of incomes in which most people are in the $15,000

More information

GENERAL SELF-EFFICACY AND SELF-EFFICACY FOR MANAGING CHRONIC CONDITIONS

GENERAL SELF-EFFICACY AND SELF-EFFICACY FOR MANAGING CHRONIC CONDITIONS GENERAL SELF-EFFICACY AND SELF-EFFICACY FOR MANAGING CHRONIC CONDITIONS A brief guide to the PROMIS Self-Efficacy Instruments ADULT PROMIS Item Bank v1.0 General Self-Efficacy PROMIS Short Form v1.0 General

More information

USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION

USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION Iweka Fidelis (Ph.D) Department of Educational Psychology, Guidance and Counselling, University of Port Harcourt,

More information

Section 5. Field Test Analyses

Section 5. Field Test Analyses Section 5. Field Test Analyses Following the receipt of the final scored file from Measurement Incorporated (MI), the field test analyses were completed. The analysis of the field test data can be broken

More information

A Broad-Range Tailored Test of Verbal Ability

A Broad-Range Tailored Test of Verbal Ability A Broad-Range Tailored Test of Verbal Ability Frederic M. Lord Educational Testing Service Two parallel forms of a broad-range tailored test of verbal ability have been built. The test is appropriate from

More information

Running head: NESTED FACTOR ANALYTIC MODEL COMPARISON 1. John M. Clark III. Pearson. Author Note

Running head: NESTED FACTOR ANALYTIC MODEL COMPARISON 1. John M. Clark III. Pearson. Author Note Running head: NESTED FACTOR ANALYTIC MODEL COMPARISON 1 Nested Factor Analytic Model Comparison as a Means to Detect Aberrant Response Patterns John M. Clark III Pearson Author Note John M. Clark III,

More information

Utilizing the NIH Patient-Reported Outcomes Measurement Information System

Utilizing the NIH Patient-Reported Outcomes Measurement Information System www.nihpromis.org/ Utilizing the NIH Patient-Reported Outcomes Measurement Information System Thelma Mielenz, PhD Assistant Professor, Department of Epidemiology Columbia University, Mailman School of

More information

Yang Liu, Ph.D. Education. Professional History. Awards

Yang Liu, Ph.D. Education. Professional History. Awards , Ph.D. Department of Human Development and Quantitative Methodology 3942 Campus Drive University of Maryland College Park, MD 20742-1115 Office: 1230B Benjamin Building Phone: (301) 314-1126 Fax: (301)

More information

INVESTIGATING FIT WITH THE RASCH MODEL. Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form

INVESTIGATING FIT WITH THE RASCH MODEL. Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form INVESTIGATING FIT WITH THE RASCH MODEL Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form of multidimensionality. The settings in which measurement

More information

Empowered by Psychometrics The Fundamentals of Psychometrics. Jim Wollack University of Wisconsin Madison

Empowered by Psychometrics The Fundamentals of Psychometrics. Jim Wollack University of Wisconsin Madison Empowered by Psychometrics The Fundamentals of Psychometrics Jim Wollack University of Wisconsin Madison Psycho-what? Psychometrics is the field of study concerned with the measurement of mental and psychological

More information

ABOUT PHYSICAL ACTIVITY

ABOUT PHYSICAL ACTIVITY PHYSICAL ACTIVITY A brief guide to the PROMIS Physical Activity instruments: PEDIATRIC PROMIS Pediatric Item Bank v1.0 Physical Activity PROMIS Pediatric Short Form v1.0 Physical Activity 4a PROMIS Pediatric

More information

CFPB Financial Well-Being Scale

CFPB Financial Well-Being Scale May 2017 CFPB Financial Well-Being Scale Scale development technical report Table of contents Table of contents... 1 1. Introduction... 3 2. Defining financial well-being... 6 3. Overview of a typical

More information

Adjusting for mode of administration effect in surveys using mailed questionnaire and telephone interview data

Adjusting for mode of administration effect in surveys using mailed questionnaire and telephone interview data Adjusting for mode of administration effect in surveys using mailed questionnaire and telephone interview data Karl Bang Christensen National Institute of Occupational Health, Denmark Helene Feveille National

More information

Bruno D. Zumbo, Ph.D. University of Northern British Columbia

Bruno D. Zumbo, Ph.D. University of Northern British Columbia Bruno Zumbo 1 The Effect of DIF and Impact on Classical Test Statistics: Undetected DIF and Impact, and the Reliability and Interpretability of Scores from a Language Proficiency Test Bruno D. Zumbo, Ph.D.

More information

Quick start guide for using subscale reports produced by Integrity

Quick start guide for using subscale reports produced by Integrity Quick start guide for using subscale reports produced by Integrity http://integrity.castlerockresearch.com Castle Rock Research Corp. November 17, 2005 Integrity refers to groups of items that measure

More information

INSPECT Overview and FAQs

INSPECT Overview and FAQs WWW.KEYDATASYS.COM ContactUs@KeyDataSys.com 951.245.0828 Table of Contents INSPECT Overview...3 What Comes with INSPECT?....4 Reliability and Validity of the INSPECT Item Bank. 5 The INSPECT Item Process......6

More information

Diagnostic Classification Models

Diagnostic Classification Models Diagnostic Classification Models Lecture #13 ICPSR Item Response Theory Workshop Lecture #13: 1of 86 Lecture Overview Key definitions Conceptual example Example uses of diagnostic models in education Classroom

More information

THE APPLICATION OF ORDINAL LOGISTIC HEIRARCHICAL LINEAR MODELING IN ITEM RESPONSE THEORY FOR THE PURPOSES OF DIFFERENTIAL ITEM FUNCTIONING DETECTION

THE APPLICATION OF ORDINAL LOGISTIC HEIRARCHICAL LINEAR MODELING IN ITEM RESPONSE THEORY FOR THE PURPOSES OF DIFFERENTIAL ITEM FUNCTIONING DETECTION THE APPLICATION OF ORDINAL LOGISTIC HEIRARCHICAL LINEAR MODELING IN ITEM RESPONSE THEORY FOR THE PURPOSES OF DIFFERENTIAL ITEM FUNCTIONING DETECTION Timothy Olsen HLM II Dr. Gagne ABSTRACT Recent advances

More information

Clever Hans the horse could do simple math and spell out the answers to simple questions. He wasn t always correct, but he was most of the time.

Clever Hans the horse could do simple math and spell out the answers to simple questions. He wasn t always correct, but he was most of the time. Clever Hans the horse could do simple math and spell out the answers to simple questions. He wasn t always correct, but he was most of the time. While a team of scientists, veterinarians, zoologists and

More information

Item Response Theory. Steven P. Reise University of California, U.S.A. Unidimensional IRT Models for Dichotomous Item Responses

Item Response Theory. Steven P. Reise University of California, U.S.A. Unidimensional IRT Models for Dichotomous Item Responses Item Response Theory Steven P. Reise University of California, U.S.A. Item response theory (IRT), or modern measurement theory, provides alternatives to classical test theory (CTT) methods for the construction,

More information

Global estimation of child mortality using a Bayesian B-spline bias-reduction method

Global estimation of child mortality using a Bayesian B-spline bias-reduction method Global estimation of child mortality using a Bayesian B-spline bias-reduction method Leontine Alkema and Jin Rou New Department of Statistics and Applied Probability, National University of Singapore,

More information

Computerized Adaptive Testing With the Bifactor Model

Computerized Adaptive Testing With the Bifactor Model Computerized Adaptive Testing With the Bifactor Model David J. Weiss University of Minnesota and Robert D. Gibbons Center for Health Statistics University of Illinois at Chicago Presented at the New CAT

More information

Using the Distractor Categories of Multiple-Choice Items to Improve IRT Linking

Using the Distractor Categories of Multiple-Choice Items to Improve IRT Linking Using the Distractor Categories of Multiple-Choice Items to Improve IRT Linking Jee Seon Kim University of Wisconsin, Madison Paper presented at 2006 NCME Annual Meeting San Francisco, CA Correspondence

More information

Item Selection in Polytomous CAT

Item Selection in Polytomous CAT Item Selection in Polytomous CAT Bernard P. Veldkamp* Department of Educational Measurement and Data-Analysis, University of Twente, P.O.Box 217, 7500 AE Enschede, The etherlands 6XPPDU\,QSRO\WRPRXV&$7LWHPVFDQEHVHOHFWHGXVLQJ)LVKHU,QIRUPDWLRQ

More information

The Patient-Reported Outcomes Measurement Information

The Patient-Reported Outcomes Measurement Information ORIGINAL ARTICLE Practical Issues in the Application of Item Response Theory A Demonstration Using Items From the Pediatric Quality of Life Inventory (PedsQL) 4.0 Generic Core Scales Cheryl D. Hill, PhD,*

More information

1 Introduction. st0020. The Stata Journal (2002) 2, Number 3, pp

1 Introduction. st0020. The Stata Journal (2002) 2, Number 3, pp The Stata Journal (22) 2, Number 3, pp. 28 289 Comparative assessment of three common algorithms for estimating the variance of the area under the nonparametric receiver operating characteristic curve

More information

Influences of IRT Item Attributes on Angoff Rater Judgments

Influences of IRT Item Attributes on Angoff Rater Judgments Influences of IRT Item Attributes on Angoff Rater Judgments Christian Jones, M.A. CPS Human Resource Services Greg Hurt!, Ph.D. CSUS, Sacramento Angoff Method Assemble a panel of subject matter experts

More information

Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys

Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Jill F. Kilanowski, PhD, APRN,CPNP Associate Professor Alpha Zeta & Mu Chi Acknowledgements Dr. Li Lin,

More information

Differential Item Functioning from a Compensatory-Noncompensatory Perspective

Differential Item Functioning from a Compensatory-Noncompensatory Perspective Differential Item Functioning from a Compensatory-Noncompensatory Perspective Terry Ackerman, Bruce McCollaum, Gilbert Ngerano University of North Carolina at Greensboro Motivation for my Presentation

More information

Parameter Estimation of Cognitive Attributes using the Crossed Random- Effects Linear Logistic Test Model with PROC GLIMMIX

Parameter Estimation of Cognitive Attributes using the Crossed Random- Effects Linear Logistic Test Model with PROC GLIMMIX Paper 1766-2014 Parameter Estimation of Cognitive Attributes using the Crossed Random- Effects Linear Logistic Test Model with PROC GLIMMIX ABSTRACT Chunhua Cao, Yan Wang, Yi-Hsin Chen, Isaac Y. Li University

More information

Factors Affecting the Item Parameter Estimation and Classification Accuracy of the DINA Model

Factors Affecting the Item Parameter Estimation and Classification Accuracy of the DINA Model Journal of Educational Measurement Summer 2010, Vol. 47, No. 2, pp. 227 249 Factors Affecting the Item Parameter Estimation and Classification Accuracy of the DINA Model Jimmy de la Torre and Yuan Hong

More information

André Cyr and Alexander Davies

André Cyr and Alexander Davies Item Response Theory and Latent variable modeling for surveys with complex sampling design The case of the National Longitudinal Survey of Children and Youth in Canada Background André Cyr and Alexander

More information

Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality

Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality Week 9 Hour 3 Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality Stat 302 Notes. Week 9, Hour 3, Page 1 / 39 Stepwise Now that we've introduced interactions,

More information

Yang Liu, Ph.D. School of Social Sciences, Humanities and Arts University of California, Merced

Yang Liu, Ph.D. School of Social Sciences, Humanities and Arts University of California, Merced , Ph.D. Department of Human Development and Quantitative Methodology 3942 Campus Drive University of Maryland College Park, MD 20742-1115 Office: 1230B Benjamin Building Phone: (301) 314-1126 Fax: (301)

More information

The Effect of Review on Student Ability and Test Efficiency for Computerized Adaptive Tests

The Effect of Review on Student Ability and Test Efficiency for Computerized Adaptive Tests The Effect of Review on Student Ability and Test Efficiency for Computerized Adaptive Tests Mary E. Lunz and Betty A. Bergstrom, American Society of Clinical Pathologists Benjamin D. Wright, University

More information

Survey Question. What are appropriate methods to reaffirm the fairness, validity reliability and general performance of examinations?

Survey Question. What are appropriate methods to reaffirm the fairness, validity reliability and general performance of examinations? Clause 9.3.5 Appropriate methodology and procedures (e.g. collecting and maintaining statistical data) shall be documented and implemented in order to affirm, at justified defined intervals, the fairness,

More information

Nondestructive Inspection and Testing Section SA-ALC/MAQCN Kelly AFB, TX

Nondestructive Inspection and Testing Section SA-ALC/MAQCN Kelly AFB, TX PROFICIENCY EVALUATION OF NDE PERSONNEL UTILIZING THE ULTRASONIC METHODOLOGY Mark K. Davis Nondestructive Inspection and Testing Section SA-ALC/MAQCN Kelly AFB, TX 7824-5000 INTRODUCTION The measure by

More information

The Influence of Conditioning Scores In Performing DIF Analyses

The Influence of Conditioning Scores In Performing DIF Analyses The Influence of Conditioning Scores In Performing DIF Analyses Terry A. Ackerman and John A. Evans University of Illinois The effect of the conditioning score on the results of differential item functioning

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report. Assessing IRT Model-Data Fit for Mixed Format Tests

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report. Assessing IRT Model-Data Fit for Mixed Format Tests Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 26 for Mixed Format Tests Kyong Hee Chon Won-Chan Lee Timothy N. Ansley November 2007 The authors are grateful to

More information

Construct Invariance of the Survey of Knowledge of Internet Risk and Internet Behavior Knowledge Scale

Construct Invariance of the Survey of Knowledge of Internet Risk and Internet Behavior Knowledge Scale University of Connecticut DigitalCommons@UConn NERA Conference Proceedings 2010 Northeastern Educational Research Association (NERA) Annual Conference Fall 10-20-2010 Construct Invariance of the Survey

More information

Fundamental Concepts for Using Diagnostic Classification Models. Section #2 NCME 2016 Training Session. NCME 2016 Training Session: Section 2

Fundamental Concepts for Using Diagnostic Classification Models. Section #2 NCME 2016 Training Session. NCME 2016 Training Session: Section 2 Fundamental Concepts for Using Diagnostic Classification Models Section #2 NCME 2016 Training Session NCME 2016 Training Session: Section 2 Lecture Overview Nature of attributes What s in a name? Grain

More information

The Use of Unidimensional Parameter Estimates of Multidimensional Items in Adaptive Testing

The Use of Unidimensional Parameter Estimates of Multidimensional Items in Adaptive Testing The Use of Unidimensional Parameter Estimates of Multidimensional Items in Adaptive Testing Terry A. Ackerman University of Illinois This study investigated the effect of using multidimensional items in

More information

version User s Guide nnnnnnnnnnnnnnnnnnnnnn AUTOMATIC POULTRY SCALES BAT2 Lite

version User s Guide nnnnnnnnnnnnnnnnnnnnnn AUTOMATIC POULTRY SCALES BAT2 Lite version 1.02.0 User s Guide nnnnnnnnnnnnnnnnnnnnnn AUTOMATIC POULTRY SCALES BAT2 Lite 1. INTRODUCTION... 2 1.1. Scales Description... 2 1.2. Basic Technical Parameters... 2 1.3. Factory Setup of the Scales...

More information

A simple guide to IRT and Rasch 2

A simple guide to IRT and Rasch 2 A Simple Guide to the Item Response Theory (IRT) and Rasch Modeling Chong Ho Yu, Ph.Ds Email: chonghoyu@gmail.com Website: http://www.creative-wisdom.com Updated: October 27, 2017 This document, which

More information

A Hierarchical Linear Modeling Approach for Detecting Cheating and Aberrance. William Skorupski. University of Kansas. Karla Egan.

A Hierarchical Linear Modeling Approach for Detecting Cheating and Aberrance. William Skorupski. University of Kansas. Karla Egan. HLM Cheating 1 A Hierarchical Linear Modeling Approach for Detecting Cheating and Aberrance William Skorupski University of Kansas Karla Egan CTB/McGraw-Hill Paper presented at the May, 2012 Conference

More information

Confirmatory Factor Analysis of Preschool Child Behavior Checklist (CBCL) (1.5 5 yrs.) among Canadian children

Confirmatory Factor Analysis of Preschool Child Behavior Checklist (CBCL) (1.5 5 yrs.) among Canadian children Confirmatory Factor Analysis of Preschool Child Behavior Checklist (CBCL) (1.5 5 yrs.) among Canadian children Dr. KAMALPREET RAKHRA MD MPH PhD(Candidate) No conflict of interest Child Behavioural Check

More information

bivariate analysis: The statistical analysis of the relationship between two variables.

bivariate analysis: The statistical analysis of the relationship between two variables. bivariate analysis: The statistical analysis of the relationship between two variables. cell frequency: The number of cases in a cell of a cross-tabulation (contingency table). chi-square (χ 2 ) test for

More information

Sawtooth Software. MaxDiff Analysis: Simple Counting, Individual-Level Logit, and HB RESEARCH PAPER SERIES. Bryan Orme, Sawtooth Software, Inc.

Sawtooth Software. MaxDiff Analysis: Simple Counting, Individual-Level Logit, and HB RESEARCH PAPER SERIES. Bryan Orme, Sawtooth Software, Inc. Sawtooth Software RESEARCH PAPER SERIES MaxDiff Analysis: Simple Counting, Individual-Level Logit, and HB Bryan Orme, Sawtooth Software, Inc. Copyright 009, Sawtooth Software, Inc. 530 W. Fir St. Sequim,

More information

Technical Specifications

Technical Specifications Technical Specifications In order to provide summary information across a set of exercises, all tests must employ some form of scoring models. The most familiar of these scoring models is the one typically

More information

Likelihood Ratio Based Computerized Classification Testing. Nathan A. Thompson. Assessment Systems Corporation & University of Cincinnati.

Likelihood Ratio Based Computerized Classification Testing. Nathan A. Thompson. Assessment Systems Corporation & University of Cincinnati. Likelihood Ratio Based Computerized Classification Testing Nathan A. Thompson Assessment Systems Corporation & University of Cincinnati Shungwon Ro Kenexa Abstract An efficient method for making decisions

More information

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES Correlational Research Correlational Designs Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are

More information

The following page contains the final YODA Project review approving this proposal.

The following page contains the final YODA Project review approving this proposal. The YODA Project Research Proposal Review The following page contains the final YODA Project review approving this proposal. The Yale University Open Data Access (YODA) Project Yale University Center for

More information

Building Evaluation Scales for NLP using Item Response Theory

Building Evaluation Scales for NLP using Item Response Theory Building Evaluation Scales for NLP using Item Response Theory John Lalor CICS, UMass Amherst Joint work with Hao Wu (BC) and Hong Yu (UMMS) Motivation Evaluation metrics for NLP have been mostly unchanged

More information

On the purpose of testing:

On the purpose of testing: Why Evaluation & Assessment is Important Feedback to students Feedback to teachers Information to parents Information for selection and certification Information for accountability Incentives to increase

More information

Analyzing Teacher Professional Standards as Latent Factors of Assessment Data: The Case of Teacher Test-English in Saudi Arabia

Analyzing Teacher Professional Standards as Latent Factors of Assessment Data: The Case of Teacher Test-English in Saudi Arabia Analyzing Teacher Professional Standards as Latent Factors of Assessment Data: The Case of Teacher Test-English in Saudi Arabia 1 Introduction The Teacher Test-English (TT-E) is administered by the NCA

More information

Math Released Item Grade 3. Find the Area and Identify Equal Areas 1749-M23082

Math Released Item Grade 3. Find the Area and Identify Equal Areas 1749-M23082 Math Released Item 2018 Grade 3 Find the Area and Identify Equal Areas 1749-M23082 Anchor Set A1 A6 With Annotations Prompt 1749-M23082 Rubric Part A Score Description 1 This part of the item is machine

More information

Bayesian Tailored Testing and the Influence

Bayesian Tailored Testing and the Influence Bayesian Tailored Testing and the Influence of Item Bank Characteristics Carl J. Jensema Gallaudet College Owen s (1969) Bayesian tailored testing method is introduced along with a brief review of its

More information

V. Measuring, Diagnosing, and Perhaps Understanding Objects

V. Measuring, Diagnosing, and Perhaps Understanding Objects V. Measuring, Diagnosing, and Perhaps Understanding Objects Our purpose when undertaking this venture was not to explain data or even to build better instruments. It may not seem like it based on the discussion

More information

Impact of Differential Item Functioning on Subsequent Statistical Conclusions Based on Observed Test Score Data. Zhen Li & Bruno D.

Impact of Differential Item Functioning on Subsequent Statistical Conclusions Based on Observed Test Score Data. Zhen Li & Bruno D. Psicológica (2009), 30, 343-370. SECCIÓN METODOLÓGICA Impact of Differential Item Functioning on Subsequent Statistical Conclusions Based on Observed Test Score Data Zhen Li & Bruno D. Zumbo 1 University

More information

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Greg Pope, Analytics and Psychometrics Manager 2008 Users Conference San Antonio Introduction and purpose of this session

More information

RAG Rating Indicator Values

RAG Rating Indicator Values Technical Guide RAG Rating Indicator Values Introduction This document sets out Public Health England s standard approach to the use of RAG ratings for indicator values in relation to comparator or benchmark

More information

Table of Contents. Preface to the third edition xiii. Preface to the second edition xv. Preface to the fi rst edition xvii. List of abbreviations xix

Table of Contents. Preface to the third edition xiii. Preface to the second edition xv. Preface to the fi rst edition xvii. List of abbreviations xix Table of Contents Preface to the third edition xiii Preface to the second edition xv Preface to the fi rst edition xvii List of abbreviations xix PART 1 Developing and Validating Instruments for Assessing

More information