Item Response Theory
Steven P. Reise, University of California, U.S.A.


Item response theory (IRT), or modern measurement theory, provides alternatives to classical test theory (CTT) methods for the construction, analysis, and scoring of psychological measures. A clear difference between traditional CTT and modern IRT psychometric methods is that the former represents constructs through an aggregate composite score, whereas the latter represents constructs through a latent variable (or latent trait) that is assumed to underlie item responses. Moreover, IRT measurement models are formal statistical models that attempt to capture the interaction between person and item properties as they jointly determine an individual's response to an item (Embretson, 1996). As such, IRT modeling rests on a set of testable assumptions, and IRT models can be statistically evaluated as to their fit to item-response data. From a technical perspective, IRT measurement models are closely related to confirmatory factor analytic models for ordinal data and have their origin in the work of Derrick Lawley (Lawley, 1943), Frederick Lord (Lord, 1980), and Darrell Bock (Bock & Aitkin, 1981), among many others. From an applied perspective, IRT measurement models were developed to solve real-world practical problems in large-scale aptitude and achievement testing that were challenging, and in some cases impossible, under CTT psychometrics. It is only since the beginning of the century, however, that IRT models have been more commonly employed in the measurement of personality, psychopathology (Reise & Waller, 2009), and medical outcomes constructs (Cella et al., 2007). The defining features of IRT measurement models are the specification and estimation of the parameters of a mathematical function, typically a logistic function.
This function is called an item response function (IRF), and its purpose is to model the relation between individual differences on a continuous latent trait construct (theta, θ) and the probability of responding to a scale item in a given way (e.g., endorsing a true/false item in the keyed direction, or responding in category 3 on a five-point ordered rating item). Thus, in the following two sections, commonly applied IRT models appropriate for dichotomous and polytomous item response data are described. The discussion is restricted to unidimensional IRT models (i.e., models that assume only a single common latent variable underlies the covariances among item responses) because all the basic principles described here generalize easily to multidimensional IRT models. Interpretation of indices derived from IRT model parameters and assumptions underlying IRT models are then detailed, and common applications of IRT models, such as computerized adaptive testing, are presented.

Unidimensional IRT Models for Dichotomous Item Responses

In large-scale national and statewide achievement testing, the most commonly used item response format is multiple choice, where responses are dichotomously scored as either correct (1) or incorrect (0). Moreover, for many popular personality and psychopathology measures, the endorsed (1) versus not endorsed (0) dichotomous yes/no, true/false, or agree/disagree response formats are very common. In all these cases, and assuming that a single common latent variable (θ) underlies item responses, a researcher may be interested in estimating an IRF to describe the relation between standing on a latent variable (representing the construct of interest) and the probability of endorsing an item in the keyed direction.

The Encyclopedia of Clinical Psychology, First Edition. Edited by Robin L. Cautin and Scott O. Lilienfeld. 2015 John Wiley & Sons, Inc. Published 2015 by John Wiley & Sons, Inc. DOI: /wbecp357

The primary goal of fitting a dichotomous IRT model is to find an IRF that best represents, or fits, the observed item response data. To achieve this goal, one must select among several commonly applied models that vary in complexity. Dichotomous IRT models differ primarily in the number of item properties that need to be accounted for. The least complex IRT measurement model is the one-parameter logistic (1PL) model shown in Equation 1, where the subscript i refers to an item, and x refers to a particular item response, scored 0 for a not keyed/endorsed and 1 for a keyed/endorsed response. The a represents the common-item slope parameter, and b is an item location parameter that is allowed to vary between items within a scale.

P(x_i = 1) = exp[a(θ − b_i)] / {1 + exp[a(θ − b_i)]}   (1)

The 1PL model in Equation 1, depending on its specification, is sometimes referred to as a Rasch model. Specifically, if a is fixed to 1 for all items (for identification) and the variance of the latent variable is estimated, then technically Equation 1 is a Rasch model. If, as considered here, a is constant across items and the variance of the latent variable is fixed to 1 (for identification), then the model is more appropriately referred to as a 1PL model. In describing this model, and throughout this entry, it is assumed that the metric for the latent variable has been identified by fixing its mean to 0 and its variance to 1. Thus, assuming normality, the metric for the latent trait can be interpreted like a z-score. In the 1PL model, all items have a constant slope a, but items may differ from each other in their location parameter (represented by b). Item location parameters typically range between −2.0 and 2.0 and indicate the point on the latent trait metric where the probability of endorsing the item in the keyed direction is .5. Thus, the b parameter serves to shift the IRF from left to right along the latent trait continuum.
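To make Equation 1 concrete, the IRF can be evaluated directly. The sketch below is illustrative only; the function name and example parameter values are ours, chosen to echo the example items in Figure 1A:

```python
import math

def irf_1pl(theta, b, a=1.5):
    """1PL item response function: P(x = 1 | theta) for an item located at b.

    In the 1PL model the slope a is shared by all items; a = 1.5 echoes
    the example items in Figure 1A (illustrative values, not estimates).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Three example items located at -1, 0, and 1.
for b in (-1.0, 0.0, 1.0):
    # At theta = b the endorsement probability is exactly .5,
    # so b shifts the curve left or right along the trait continuum.
    print(f"b = {b:+.1f}: P(theta = b) = {irf_1pl(b, b):.2f}")
```

Note that an item with a negative b yields a high endorsement probability for an average (θ = 0) respondent, matching the proportion-endorsed intuition described in the text.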
Item location parameters are analogous, but not equivalent, to the item proportion endorsed in traditional item analysis. Items that are commonly endorsed tend to have negative location parameters, and the IRF will be shifted to the left. Items that are rarely endorsed (endorsed only by individuals high on the latent variable) will have positive location parameters, and the IRF will be shifted to the right. To illustrate, Figure 1A shows the IRFs for three example scale items with a fixed to 1.5 and b parameters set to −1, 0, and 1, respectively. The a slope parameter, which typically ranges between 1.0 and 2.0, is often referred to as a discrimination parameter (constant across items within the 1PL model). It determines the steepness of the IRF at its inflection point. Slope parameters are analogous to item-test biserial correlations in traditional scale analysis. Figures 1B and 1C show three items with the same location parameters as in Figure 1A, but with a values of 2.5 and 0.5, respectively. These figures make clear the interpretation of the parameter; the higher the slope, the more differentiating or discriminating the item is, in the sense that response probabilities change rapidly as scores on the latent variable increase. This is especially true in the latent variable range around the item's location. The 1PL model requires all scale items to have equal slopes, but items may vary in location, and thus items vary in where along the latent trait continuum they provide the most discrimination (see subsequent section). The 1PL model is analogous to the concept of essential tau-equivalence in classical test theory. A slightly more complex, or less restricted, model can be specified by allowing the scale items to vary in discrimination. This two-parameter logistic (2PL) model is shown in Equation 2. The 2PL model is analogous to the concept of congeneric measurement in classical test theory.
P(x_i = 1) = exp[a_i(θ − b_i)] / {1 + exp[a_i(θ − b_i)]}   (2)

In this model, items are allowed to vary in two ways, slope (a) and location (b), and the interpretation of the parameters remains the same as in the 1PL model. Equation 2 states that

Figure 1 A, IRFs, slope = 1.5. B, IRFs, slope = 2.5. C, IRFs, slope = 0.5.

the probability of endorsing an item in the keyed direction is a function of the difference between an individual's standing on the latent variable (θ) and the item's location parameter (b), and this difference is weighted by the slope (a). Thus, for items with relatively low slopes (discrimination), the difference between an individual's trait level and the item location has little effect on the response probability. In contrast, when the slope parameter is relatively high, differences between an individual's trait level and the item's location have a large effect on the response probability. To illustrate, Figure 2 displays the IRFs for two items that have the same location parameter (b = 0) but different slopes (0.5 and 1.5, respectively). Clearly, the item with the larger slope provides relatively more discrimination around the middle of the trait continuum in the sense that the response probabilities are changing very rapidly. As a consequence, it is easier to discriminate between individuals who are in the middle of the trait range using the item with the larger slope, relative to the item with the lower slope. In the case of multiple-choice aptitude or achievement tests, where examinees who are low on the latent variable can obtain a correct answer by guessing, a commonly applied IRT model is the three-parameter logistic model

Figure 2 IRFs, location = 0, slopes = 0.5 and 1.5.

(3PL), shown in Equation 3.

P(x_i = 1) = c_i + (1 − c_i) exp[a_i(θ − b_i)] / {1 + exp[a_i(θ − b_i)]}   (3)

Equation 3 expands the 2PL model by adding an additional parameter (c), often referred to as the pseudo-guessing or lower asymptote parameter. The c parameter is on a proportion metric and typically ranges between 0 and .5 (for multiple-choice items with four response options). The value of this parameter sets a boundary on the lower asymptote of the IRF such that, at low trait levels, the response probability never goes toward 0 but rather stays constant. To illustrate, Figure 3 displays the IRFs for three items, each having a slope of 1.5 and location of 1.0 but differing lower asymptotes (0, .1, and .5). It is important to notice that, in Figure 3, the interpretation of the location parameter has now changed slightly relative to its interpretation in the 1PL or 2PL models. Specifically, the location parameter remains the point on the latent trait continuum where the IRF is steepest (i.e., the inflection point), but this point no longer corresponds to the location on the latent trait at which the probability of endorsement is .5. Instead, the probability of endorsing an item in the keyed direction at θ = b is (1 + c)/2.

Figure 3 IRFs, slope = 1.5, location = 1.0, lower = 0, .1, .5.

The 3PL model has rarely been applied in personality or psychopathology measurement; nevertheless, an extension of this model that allows for both a nonzero lower asymptote and a non-one upper asymptote, called the four-parameter logistic model (Equation 4), has received some attention (Reise & Waller, 2009).

P(x_i = 1) = c_i + (d_i − c_i) exp[a_i(θ − b_i)] / {1 + exp[a_i(θ − b_i)]}   (4)

Equation 4 is simply Equation 3 with 1 replaced by d.
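The asymptote algebra is easy to verify in code. The sketch below implements Equation 4; setting d = 1 recovers the 3PL of Equation 3, and c = 0 with d = 1 recovers the 2PL (the function name and parameter values are illustrative, not from the source):

```python
import math

def irf_4pl(theta, a, b, c=0.0, d=1.0):
    """4PL item response function (Equation 4).

    c is the lower (pseudo-guessing) asymptote and d the upper asymptote.
    With d = 1 this is the 3PL (Equation 3); with c = 0 and d = 1, the 2PL.
    """
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return c + (d - c) * logistic

# At theta = b the logistic part equals .5, so the 3PL probability
# is (1 + c) / 2 and the 4PL probability is (c + d) / 2.
p3 = irf_4pl(theta=1.0, a=1.5, b=1.0, c=0.2)         # 3PL: d defaults to 1
p4 = irf_4pl(theta=1.0, a=1.5, b=1.0, c=0.1, d=0.9)  # 4PL, as in Figure 4
assert abs(p3 - (1 + 0.2) / 2) < 1e-12
assert abs(p4 - (0.1 + 0.9) / 2) < 1e-12
```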
This highly complex model may be appropriate for measurement contexts where the probability of endorsing an item (e.g., having a symptom) is not zero even for low-trait individuals (e.g., endorsing sad mood within the last 7 days has a nonzero probability even for individuals who are low on depression). Conversely, the probability of endorsing an item (e.g., having a symptom) does not approach 1.0 even for individuals who are in the high range on the latent variable (e.g., suicidal ideation within the last 7 days is not universally endorsed even by individuals at the highest levels of depression). To illustrate this model, Figure 4 displays a 4PL IRF with a slope of 1.5, location of 1.0, lower asymptote of .1, and upper asymptote of .9. In this model, the interpretation of the location parameter is even more complicated because it now reflects

Figure 4 IRF: a = 1.5, b = 1.0, c = .1, d = .9.

the point on the latent trait scale where the response probability is (c + d)/2.

Unidimensional IRT Models for Polytomous Item Responses

When item responses are scored in ordered categories, polytomous IRT models are required. In the models for dichotomous items described above, only one IRF, reflecting the probability of a keyed response, needed to be estimated. This is because once the IRF is known, the probability of responding in the nonkeyed direction as a function of the latent variable is known by subtraction, that is, 1 − P. Conceptually, one can think of a dichotomous item as having a single threshold between the nonkeyed (0) and keyed (1) response, and one goal of IRT modeling is to estimate this threshold via a location parameter indicating where on the latent variable a keyed response becomes more likely than a nonkeyed one. With a polytomous item response format, the complexity of the IRT model increases, and a slightly different terminology is needed to describe response propensities. Specifically, instead of estimating a single IRF for a dichotomous item, a researcher needs to estimate K category response curves (CRCs) for a K-category polytomous item. Here, let response categories be coded 0 to K − 1. Each CRC will model the relation between standing on the latent variable and the probability of responding to an item in a specific category. Although there are numerous potential polytomous IRT models that one may consider, the illustration here is the graded-response model (GRM; Samejima, 1969). In the GRM, for each item, a set of K − 1 location (b) parameters needs to be estimated, along with one common item slope (a).
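The GRM's threshold-and-subtraction construction can be sketched as follows. This is an illustrative implementation (the function names are ours), using the four-category example item of Figure 5 (a = 2, thresholds at −1, 0, and 1):

```python
import math

def trf(theta, a, b):
    """Threshold response function: a 2PL curve for responding above threshold b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def grm_crcs(theta, a, bs):
    """Category response curves for one graded-response item.

    bs holds the K - 1 ordered threshold locations. By stipulation the
    probability of responding in at least the lowest category is 1 and of
    responding above the highest category is 0; each CRC is then the
    difference of adjacent cumulative curves.
    """
    cum = [1.0] + [trf(theta, a, b) for b in bs] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(bs) + 1)]

# Four-category example item: the K category probabilities sum to 1
# at every point on the latent trait continuum.
probs = grm_crcs(theta=0.5, a=2.0, bs=(-1.0, 0.0, 1.0))
assert abs(sum(probs) - 1.0) < 1e-12
```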
Stated differently, in the GRM, for each item, a set of K − 1 2PL IRFs is estimated, with the slopes constrained to be equal within an item (but not between items). These functions are called threshold response functions (TRFs; Equation 5), with location parameters that indicate the trait level necessary to have a 50% chance of responding above one of the K − 1 thresholds between the response categories.

P*_xi(θ) = exp[a_i(θ − b_ji)] / {1 + exp[a_i(θ − b_ji)]},   (5)

where j = 1, ..., K − 1 (the number of response categories minus 1) indexes the thresholds, and x is the response category. For example, for a four-point item, three 2PL IRFs are estimated: for response 0 versus 1, 2, 3; for responses 0, 1 versus 2, 3; and for 0, 1, 2 versus 3. To illustrate, Figure 5A displays, from left to right, threshold response functions for a four-category item with the slope parameter equal to 2 and location parameters of −1, 0, and 1, respectively. Given the parameters of the threshold response functions, and the stipulation that the conditional probability of responding in at least the first category is 1 and the conditional probability of responding above the highest category is 0, the CRCs can be estimated by subtraction, as shown below. To illustrate, Figure 5B shows the CRCs for the example item in Figure 5A. Going from left to right, the probability of responding in the lowest category (x = 0) monotonically decreases as a function of trait level. For the middle two response categories (x = 1 or 2), response propensity is a unimodal function that increases and then

Figure 5 A, TRFs: a = 2.0, b1 = −1, b2 = 0, b3 = 1. B, Category response curves.

decreases as a function of trait level. Finally, the probability of responding in the highest category (x = 3) monotonically increases with increasing trait level. Observe that, at any point on the latent trait continuum, the category response probabilities sum to 1. Graded-response model item parameters are easily interpretable and determine the shapes and locations of the TRFs (and thus the CRCs). The higher the slope parameter, the steeper the TRFs and the narrower and more peaked the CRCs, indicating that the response categories differentiate well among individuals at different trait levels. The threshold parameters (b) determine the locations of the TRFs and where each of the CRCs for the middle response options peaks. Specifically, each CRC peaks in the middle of two adjacent threshold parameters. The distances between adjacent location parameters are also important. A large distance between locations shows that an item discriminates across the entire trait range. Ideally, an item will be highly discriminating (high slope) and will have location parameters spread out across an appropriate range of the trait. Finally, it is important to note that the CRCs for a polytomous item can be aggregated into a single IRF that is analogous to the IRF in the dichotomous models. By weighting the CRCs (i.e., the conditional probabilities of responding in a specific category) by the integers used to score the responses (e.g., 0, 1, 2, 3), an item response curve (IRC) for a polytomous item is obtained.

E(X_i) = IRC_i = Σ_{x=0}^{K−1} x P_xi(θ)   (6)

Figure 6 Item response curve.

The one important difference is that the y-axis for a polytomous model will range from 0 to K − 1 (assuming categories scored 0 to K − 1), whereas the y-axis of an IRF for a dichotomous model will range between 0 and 1. The IRC in Figure 6 displays how the expected raw

score on an item changes as a function of the latent trait for the example item.

Model Features: Information and Conditional Standard Errors

Interpretation of the parameter estimates of the models described above is critical to the psychometric analysis of an instrument. Generally speaking, researchers are most concerned that the items provide good discrimination and that the location parameters are spread out (between items in dichotomous models, and within items for polytomous models) across the full range of the latent trait continuum. However, to aid in the psychometric assessment of a set of scale items, IRT modeling provides several useful tools that are derived from the estimated item parameters. Most useful are the item and scale information functions and the corresponding conditional standard error function, described below. For any item, once the model parameters are estimated (i.e., the IRFs are known), their values can be transformed into an item-information function (IIF). An IIF describes how much psychometric information, or discrimination, an item provides at each level of the latent variable. For dichotomous items, items with higher discrimination (slope) parameters provide more information, and the position along the latent variable continuum where that information is concentrated is determined by the item's location. Some items may provide information in the high trait range, whereas others differentiate best among low-trait individuals or among individuals in the middle of the trait range. For polytomous items, similar principles apply in that items with higher slopes provide more information, and the concentration of the information is peaked around the item's location parameters. However, because polytomous items have multiple location parameters, ideally spread across the latent trait continuum, polytomous information functions tend to spread the information out across the trait range.
Indeed, that is the entire purpose of a polytomous response format: to allow one item to make multiple (and hopefully meaningful) distinctions between people across the trait range. To illustrate the concept of information, Figure 7A shows the IRFs for five items that vary widely in slope and location. Figure 7B displays the corresponding item information functions and the scale information function derived by summing the IIFs across the five items. Item information, considered alone, is difficult to interpret because its metric has no simple definition. However, as described below, item information is critically related to the conditional standard error. Specifically, assuming that items are locally independent after controlling for the latent variable (see next section), IIFs are additive across items within a scale. Thus, a researcher can easily create a scale-information function (SIF) that indicates the amount of psychometric information an item set provides at each trait level. Then, the square root of 1 divided by the scale information yields the conditional standard error of the maximum likelihood trait-level estimate. When this transformation is made, the resulting function is a standard error function, indicating how precisely trait levels can be estimated. This function is shown in Figure 7C for the five example items. The SIF and resulting standard error function are extremely useful in scale or short-form construction and in diagnosing the strengths and weaknesses of various instruments. They are also valuable in designing instruments to meet specific measurement needs (e.g., selecting items to differentiate best among high-trait individuals).

Item Response Theory Model Assumptions and Consequences

The utility of latent variable measurement models depends critically on the extent to which the data meet the assumptions.
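The chain from item information to scale information to conditional standard error can be sketched for the 2PL, whose item information has the well-known closed form a²P(1 − P); the five example items below are hypothetical, not the items plotted in Figure 7:

```python
import math

def p_2pl(theta, a, b):
    """2PL response probability."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """2PL item information, a^2 * P * (1 - P); it peaks at theta = b."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def conditional_se(theta, items):
    """Conditional standard error of the maximum likelihood trait estimate.

    Assuming local independence, the item informations add up to the
    scale information, and the SE is the square root of its reciprocal.
    """
    sif = sum(item_information(theta, a, b) for a, b in items)
    return math.sqrt(1.0 / sif)

# Five hypothetical (a, b) pairs varying widely in slope and location.
items = [(1.8, -1.5), (1.2, -0.5), (2.0, 0.0), (0.9, 0.8), (1.5, 1.5)]
for theta in (-2.0, 0.0, 2.0):
    print(f"theta = {theta:+.1f}: SE = {conditional_se(theta, items):.3f}")
```

Adding items (or raising slopes) near a trait level lowers the standard error there, which is exactly the logic used when building short forms to target a specific trait range.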
Moreover, even if data are consistent with the requirements of IRT modeling, after model parameter estimation one then needs to show that the selected model provides an acceptable

Figure 7 A, IRFs for five items. B, IIFs and SIF for five items. C, Standard error function for five items.

fit to the data. This section considers the former topic, IRT modeling assumptions, only. The complex topic of fit assessment is difficult to summarize, and readers are referred to the recommended readings provided in the Further Reading section. Commonly applied IRT models make three fundamental assumptions about item-response data. First, they assume that there is a fully continuous dimensional latent variable (or variables, for multidimensional IRT models) that underlies the reliable item response variance. If there is no continuous underlying latent trait, then estimating a latent trait measurement model is a meaningless exercise because the model parameters would have no sensible interpretation. Second, IRT models assume that response probabilities are monotonically increasing: as individuals increase on the latent variable, their probabilities of endorsing a dichotomous item, or of responding in a higher response category of a polytomous item, increase. This is a necessary assumption because the parametric models described above fit (or force) monotonically increasing IRFs onto the data. Alternative nonparametric and parametric (e.g., unfolding) IRT models are available when this assumption is not met, but these are beyond the scope of the present discussion.

The most critical assumption, and the one that has drawn the most research attention, is that item responses be locally independent (uncorrelated) after controlling for the latent variable (or latent variables in multidimensional IRT models). In unidimensional IRT models, it must be assumed that all the common variance in an item set can be explained by a single common factor; this is analogous to having no correlated residuals in structural equation modeling. When the local independence assumption is not met (or at least well approximated), item parameter estimates can be biased because the latent trait is not properly specified. In turn, all functions derived from the item parameter estimates, such as the item or scale information and standard error, may also be erroneous to some degree, depending on the severity of the violation. The most serious consequence of a local independence violation is that IRT models may lose their most important property, namely, the invariance of item and person parameters. The concepts of item and person invariance are commonly misunderstood. Simply stated, item-parameter invariance means that an item's parameters do not depend on the other items that are included in the analysis or on the subsample of the population that is used to calibrate the item parameters, within a linear transformation. Person-parameter invariance means that an individual's standing on the latent variable does not depend on which items are administered, again within a linear transformation. These item- and person-parameter invariance properties depend entirely on meeting the IRT model's assumptions. When the assumptions described above are not met, especially local independence, all the applications of IRT modeling, including those described in the next section, are questionable.

Item Response Theory Applications

Beyond providing a more informed basis for basic psychometric analysis, the increasing popularity of IRT models is driven by their utility.
For example, in large-scale aptitude and achievement testing, IRT models are used to link the scales for different versions of a test administered to different examinee subgroups so that scores (latent trait estimates) are on the same scale (i.e., comparable). More generally, across a wide range of disciplines, IRT models have been used to form the basis for computerized adaptive testing (CAT) and for the examination of measurement equivalence across sociodemographic groups. These two topics, which depend critically on the assumption of item and person parameter invariance, are briefly reviewed below. The creation of a precalibrated item pool (i.e., a set of items measuring the same trait with known IRT model parameters) and the efficient administration of a subset of items tailored to an individual's trait level is an attractive alternative to the CTT counterpart of short-form creation. A simple CAT algorithm may begin by administering one or more items with location parameters in the middle of the trait range. The individual's responses are then used to estimate the person's position on the latent trait continuum. If, for example, the person is estimated to be relatively high on the latent variable, a new item that has a higher location parameter is administered and scored, and the response is used to update the estimate of the individual's trait standing. This process continues until either the individual has responded to a predetermined number of items or the standard error of the trait estimate falls below some threshold. The key to CAT is that individuals are administered the items most relevant to differentiating among people in their trait range. In theory, high-trait individuals would receive only hard items, whereas low-trait individuals would receive only easy items. In this way, individuals do not waste their time responding to items that are not discriminating because their endorsement probability is either nearly 0 or close to 1.
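A toy version of this adaptive loop can be simulated with a maximum-information item-selection rule and a crude grid-search trait estimator; the item pool, stopping rule, and estimator below are deliberate simplifications for illustration, not the algorithm of any particular testing program:

```python
import math
import random

def p_2pl(theta, a, b):
    """2PL response probability."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info(theta, a, b):
    """2PL item information at theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def estimate_theta(responses):
    """Crude maximum-likelihood estimate over a bounded grid of theta values."""
    grid = [g / 10.0 for g in range(-40, 41)]
    def loglik(t):
        return sum(math.log(p_2pl(t, a, b)) if x else math.log(1.0 - p_2pl(t, a, b))
                   for (a, b), x in responses)
    return max(grid, key=loglik)

def run_cat(pool, true_theta, n_items=5, seed=0):
    """Administer the most informative remaining item at each step."""
    rng = random.Random(seed)
    remaining, responses, theta = list(pool), [], 0.0  # start in the middle
    for _ in range(n_items):
        item = max(remaining, key=lambda it: info(theta, *it))
        remaining.remove(item)
        x = 1 if rng.random() < p_2pl(true_theta, *item) else 0  # simulated answer
        responses.append((item, x))
        theta = estimate_theta(responses)  # update after each response
    return theta
```

For instance, `run_cat([(1.5, -2.0), (1.5, -1.0), (1.5, 0.0), (1.5, 1.0), (1.5, 2.0)], true_theta=1.0)` adaptively walks the simulated examinee toward items located near their trait level and returns the final estimate.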
A second popular application of IRT models is as the basis for modern explorations of measurement invariance hypotheses, traditionally called item-bias analysis but now known as differential item functioning analysis

(DIF analysis). Because of parameter invariance (within a linear transformation), items may be calibrated in two sociodemographic samples that differ in mean and variance on the latent trait, and the IRT item parameter estimates in one sample can be placed onto the same scale as the item parameter estimates in the second sample. The IRFs estimated separately in the two groups may then be tested for equivalence. If equivalence is found, a researcher may conclude that the item functions the same as a trait indicator across the groups, and a common set of item parameters may be used. If, on the other hand, the IRFs differ in slope or location after being placed onto a common metric, a researcher may conclude that the item functions differently for the two groups. In other words, if the IRFs for the same item estimated in the two samples are not equal, then conditional on any trait level, one group will have a higher (or lower) expected score on the item. Depending on the severity of DIF, it may be difficult to validly apply the measure in different groups of examinees.

SEE ALSO: Coefficient Alpha and Coefficient Omega Hierarchical; Item Response Theory, Approach to Test Construction; Measurement Invariance; Reliability; Scale Development; Structural Equation Modeling

References

Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46.

Cella, D., Yount, S., Rothrock, N., Gershon, R., Cook, K., Reeve, B., ... Rose, M. (2007). The Patient-Reported Outcomes Measurement Information System (PROMIS): Progress of an NIH roadmap cooperative group during its first two years. Medical Care, 45(5 Suppl. 1), S3–S11.

Embretson, S. E. (1996). The new rules of measurement. Psychological Assessment, 8.

Lawley, D. N. (1943). The application of the maximum likelihood method to factor analysis. British Journal of Psychology, General Section, 33.

Lord, F. M. (1980).
Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.

Reise, S. P., & Waller, N. G. (2009). Item response theory and clinical measurement. Annual Review of Clinical Psychology, 5.

Samejima, F. (1969). Estimation of ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 34(4, Pt. 2).

Further Reading

de Ayala, R. J. (2009). The theory and practice of item response theory. New York, NY: Guilford Press.

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.

Millsap, R. E. (2011). Statistical approaches to measurement invariance. New York, NY: Routledge.

Thissen, D., & Wainer, H. (2001). Test scoring. Mahwah, NJ: Erlbaum.

Wainer, H. (2000). Computerized adaptive testing: A primer (2nd ed.). Mahwah, NJ: Erlbaum.


Connexion of Item Response Theory to Decision Making in Chess. Presented by Tamal Biswas Research Advised by Dr. Kenneth Regan Connexion of Item Response Theory to Decision Making in Chess Presented by Tamal Biswas Research Advised by Dr. Kenneth Regan Acknowledgement A few Slides have been taken from the following presentation

More information

A Comparison of Several Goodness-of-Fit Statistics

A Comparison of Several Goodness-of-Fit Statistics A Comparison of Several Goodness-of-Fit Statistics Robert L. McKinley The University of Toledo Craig N. Mills Educational Testing Service A study was conducted to evaluate four goodnessof-fit procedures

More information

Differential Item Functioning

Differential Item Functioning Differential Item Functioning Lecture #11 ICPSR Item Response Theory Workshop Lecture #11: 1of 62 Lecture Overview Detection of Differential Item Functioning (DIF) Distinguish Bias from DIF Test vs. Item

More information

Survey Sampling Weights and Item Response Parameter Estimation

Survey Sampling Weights and Item Response Parameter Estimation Survey Sampling Weights and Item Response Parameter Estimation Spring 2014 Survey Methodology Simmons School of Education and Human Development Center on Research & Evaluation Paul Yovanoff, Ph.D. Department

More information

References. Embretson, S. E. & Reise, S. P. (2000). Item response theory for psychologists. Mahwah,

References. Embretson, S. E. & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, The Western Aphasia Battery (WAB) (Kertesz, 1982) is used to classify aphasia by classical type, measure overall severity, and measure change over time. Despite its near-ubiquitousness, it has significant

More information

Adaptive EAP Estimation of Ability

Adaptive EAP Estimation of Ability Adaptive EAP Estimation of Ability in a Microcomputer Environment R. Darrell Bock University of Chicago Robert J. Mislevy National Opinion Research Center Expected a posteriori (EAP) estimation of ability,

More information

Measuring mathematics anxiety: Paper 2 - Constructing and validating the measure. Rob Cavanagh Len Sparrow Curtin University

Measuring mathematics anxiety: Paper 2 - Constructing and validating the measure. Rob Cavanagh Len Sparrow Curtin University Measuring mathematics anxiety: Paper 2 - Constructing and validating the measure Rob Cavanagh Len Sparrow Curtin University R.Cavanagh@curtin.edu.au Abstract The study sought to measure mathematics anxiety

More information

Likelihood Ratio Based Computerized Classification Testing. Nathan A. Thompson. Assessment Systems Corporation & University of Cincinnati.

Likelihood Ratio Based Computerized Classification Testing. Nathan A. Thompson. Assessment Systems Corporation & University of Cincinnati. Likelihood Ratio Based Computerized Classification Testing Nathan A. Thompson Assessment Systems Corporation & University of Cincinnati Shungwon Ro Kenexa Abstract An efficient method for making decisions

More information

Development, Standardization and Application of

Development, Standardization and Application of American Journal of Educational Research, 2018, Vol. 6, No. 3, 238-257 Available online at http://pubs.sciepub.com/education/6/3/11 Science and Education Publishing DOI:10.12691/education-6-3-11 Development,

More information

Scoring Multiple Choice Items: A Comparison of IRT and Classical Polytomous and Dichotomous Methods

Scoring Multiple Choice Items: A Comparison of IRT and Classical Polytomous and Dichotomous Methods James Madison University JMU Scholarly Commons Department of Graduate Psychology - Faculty Scholarship Department of Graduate Psychology 3-008 Scoring Multiple Choice Items: A Comparison of IRT and Classical

More information

André Cyr and Alexander Davies

André Cyr and Alexander Davies Item Response Theory and Latent variable modeling for surveys with complex sampling design The case of the National Longitudinal Survey of Children and Youth in Canada Background André Cyr and Alexander

More information

The Influence of Test Characteristics on the Detection of Aberrant Response Patterns

The Influence of Test Characteristics on the Detection of Aberrant Response Patterns The Influence of Test Characteristics on the Detection of Aberrant Response Patterns Steven P. Reise University of California, Riverside Allan M. Due University of Minnesota Statistical methods to assess

More information

Item Response Theory (IRT): A Modern Statistical Theory for Solving Measurement Problem in 21st Century

Item Response Theory (IRT): A Modern Statistical Theory for Solving Measurement Problem in 21st Century International Journal of Scientific Research in Education, SEPTEMBER 2018, Vol. 11(3B), 627-635. Item Response Theory (IRT): A Modern Statistical Theory for Solving Measurement Problem in 21st Century

More information

Type I Error Rates and Power Estimates for Several Item Response Theory Fit Indices

Type I Error Rates and Power Estimates for Several Item Response Theory Fit Indices Wright State University CORE Scholar Browse all Theses and Dissertations Theses and Dissertations 2009 Type I Error Rates and Power Estimates for Several Item Response Theory Fit Indices Bradley R. Schlessman

More information

The Psychometric Development Process of Recovery Measures and Markers: Classical Test Theory and Item Response Theory

The Psychometric Development Process of Recovery Measures and Markers: Classical Test Theory and Item Response Theory The Psychometric Development Process of Recovery Measures and Markers: Classical Test Theory and Item Response Theory Kate DeRoche, M.A. Mental Health Center of Denver Antonio Olmos, Ph.D. Mental Health

More information

Rasch Versus Birnbaum: New Arguments in an Old Debate

Rasch Versus Birnbaum: New Arguments in an Old Debate White Paper Rasch Versus Birnbaum: by John Richard Bergan, Ph.D. ATI TM 6700 E. Speedway Boulevard Tucson, Arizona 85710 Phone: 520.323.9033 Fax: 520.323.9139 Copyright 2013. All rights reserved. Galileo

More information

Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys

Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Jill F. Kilanowski, PhD, APRN,CPNP Associate Professor Alpha Zeta & Mu Chi Acknowledgements Dr. Li Lin,

More information

Termination Criteria in Computerized Adaptive Tests: Variable-Length CATs Are Not Biased. Ben Babcock and David J. Weiss University of Minnesota

Termination Criteria in Computerized Adaptive Tests: Variable-Length CATs Are Not Biased. Ben Babcock and David J. Weiss University of Minnesota Termination Criteria in Computerized Adaptive Tests: Variable-Length CATs Are Not Biased Ben Babcock and David J. Weiss University of Minnesota Presented at the Realities of CAT Paper Session, June 2,

More information

Utilizing the NIH Patient-Reported Outcomes Measurement Information System

Utilizing the NIH Patient-Reported Outcomes Measurement Information System www.nihpromis.org/ Utilizing the NIH Patient-Reported Outcomes Measurement Information System Thelma Mielenz, PhD Assistant Professor, Department of Epidemiology Columbia University, Mailman School of

More information

Assessing Measurement Invariance in the Attitude to Marriage Scale across East Asian Societies. Xiaowen Zhu. Xi an Jiaotong University.

Assessing Measurement Invariance in the Attitude to Marriage Scale across East Asian Societies. Xiaowen Zhu. Xi an Jiaotong University. Running head: ASSESS MEASUREMENT INVARIANCE Assessing Measurement Invariance in the Attitude to Marriage Scale across East Asian Societies Xiaowen Zhu Xi an Jiaotong University Yanjie Bian Xi an Jiaotong

More information

Item Response Theory and Health Outcomes Measurement in the 21st Century

Item Response Theory and Health Outcomes Measurement in the 21st Century MEDICAL CARE Volume 38, Number 9 Supplement II, pp II-28 II-42 2000 Lippincott Williams & Wilkins, Inc. Item Response Theory and Health Outcomes Measurement in the 21st Century RON D. HAYS, PHD,* LEO S.

More information

A Bayesian Nonparametric Model Fit statistic of Item Response Models

A Bayesian Nonparametric Model Fit statistic of Item Response Models A Bayesian Nonparametric Model Fit statistic of Item Response Models Purpose As more and more states move to use the computer adaptive test for their assessments, item response theory (IRT) has been widely

More information

Item Response Theory. Author's personal copy. Glossary

Item Response Theory. Author's personal copy. Glossary Item Response Theory W J van der Linden, CTB/McGraw-Hill, Monterey, CA, USA ã 2010 Elsevier Ltd. All rights reserved. Glossary Ability parameter Parameter in a response model that represents the person

More information

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Greg Pope, Analytics and Psychometrics Manager 2008 Users Conference San Antonio Introduction and purpose of this session

More information

accuracy (see, e.g., Mislevy & Stocking, 1989; Qualls & Ansley, 1985; Yen, 1987). A general finding of this research is that MML and Bayesian

accuracy (see, e.g., Mislevy & Stocking, 1989; Qualls & Ansley, 1985; Yen, 1987). A general finding of this research is that MML and Bayesian Recovery of Marginal Maximum Likelihood Estimates in the Two-Parameter Logistic Response Model: An Evaluation of MULTILOG Clement A. Stone University of Pittsburgh Marginal maximum likelihood (MML) estimation

More information

Multidimensional Item Response Theory in Clinical Measurement: A Bifactor Graded- Response Model Analysis of the Outcome- Questionnaire-45.

Multidimensional Item Response Theory in Clinical Measurement: A Bifactor Graded- Response Model Analysis of the Outcome- Questionnaire-45. Brigham Young University BYU ScholarsArchive All Theses and Dissertations 2012-05-22 Multidimensional Item Response Theory in Clinical Measurement: A Bifactor Graded- Response Model Analysis of the Outcome-

More information

A Comparison of Methods of Estimating Subscale Scores for Mixed-Format Tests

A Comparison of Methods of Estimating Subscale Scores for Mixed-Format Tests A Comparison of Methods of Estimating Subscale Scores for Mixed-Format Tests David Shin Pearson Educational Measurement May 007 rr0701 Using assessment and research to promote learning Pearson Educational

More information

THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL

THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL JOURNAL OF EDUCATIONAL MEASUREMENT VOL. II, NO, 2 FALL 1974 THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL SUSAN E. WHITELY' AND RENE V. DAWIS 2 University of Minnesota Although it has been claimed that

More information

The Use of Unidimensional Parameter Estimates of Multidimensional Items in Adaptive Testing

The Use of Unidimensional Parameter Estimates of Multidimensional Items in Adaptive Testing The Use of Unidimensional Parameter Estimates of Multidimensional Items in Adaptive Testing Terry A. Ackerman University of Illinois This study investigated the effect of using multidimensional items in

More information

Sensitivity of DFIT Tests of Measurement Invariance for Likert Data

Sensitivity of DFIT Tests of Measurement Invariance for Likert Data Meade, A. W. & Lautenschlager, G. J. (2005, April). Sensitivity of DFIT Tests of Measurement Invariance for Likert Data. Paper presented at the 20 th Annual Conference of the Society for Industrial and

More information

Running head: NESTED FACTOR ANALYTIC MODEL COMPARISON 1. John M. Clark III. Pearson. Author Note

Running head: NESTED FACTOR ANALYTIC MODEL COMPARISON 1. John M. Clark III. Pearson. Author Note Running head: NESTED FACTOR ANALYTIC MODEL COMPARISON 1 Nested Factor Analytic Model Comparison as a Means to Detect Aberrant Response Patterns John M. Clark III Pearson Author Note John M. Clark III,

More information

GENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS

GENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS GENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS Michael J. Kolen The University of Iowa March 2011 Commissioned by the Center for K 12 Assessment & Performance Management at

More information

INVESTIGATING FIT WITH THE RASCH MODEL. Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form

INVESTIGATING FIT WITH THE RASCH MODEL. Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form INVESTIGATING FIT WITH THE RASCH MODEL Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form of multidimensionality. The settings in which measurement

More information

Confirmatory Factor Analysis and Item Response Theory: Two Approaches for Exploring Measurement Invariance

Confirmatory Factor Analysis and Item Response Theory: Two Approaches for Exploring Measurement Invariance Psychological Bulletin 1993, Vol. 114, No. 3, 552-566 Copyright 1993 by the American Psychological Association, Inc 0033-2909/93/S3.00 Confirmatory Factor Analysis and Item Response Theory: Two Approaches

More information

The Patient-Reported Outcomes Measurement Information

The Patient-Reported Outcomes Measurement Information ORIGINAL ARTICLE Practical Issues in the Application of Item Response Theory A Demonstration Using Items From the Pediatric Quality of Life Inventory (PedsQL) 4.0 Generic Core Scales Cheryl D. Hill, PhD,*

More information

Mantel-Haenszel Procedures for Detecting Differential Item Functioning

Mantel-Haenszel Procedures for Detecting Differential Item Functioning A Comparison of Logistic Regression and Mantel-Haenszel Procedures for Detecting Differential Item Functioning H. Jane Rogers, Teachers College, Columbia University Hariharan Swaminathan, University of

More information

Using the Score-based Testlet Method to Handle Local Item Dependence

Using the Score-based Testlet Method to Handle Local Item Dependence Using the Score-based Testlet Method to Handle Local Item Dependence Author: Wei Tao Persistent link: http://hdl.handle.net/2345/1363 This work is posted on escholarship@bc, Boston College University Libraries.

More information

An item response theory analysis of Wong and Law emotional intelligence scale

An item response theory analysis of Wong and Law emotional intelligence scale Available online at www.sciencedirect.com Procedia Social and Behavioral Sciences 2 (2010) 4038 4047 WCES-2010 An item response theory analysis of Wong and Law emotional intelligence scale Jahanvash Karim

More information

Influences of IRT Item Attributes on Angoff Rater Judgments

Influences of IRT Item Attributes on Angoff Rater Judgments Influences of IRT Item Attributes on Angoff Rater Judgments Christian Jones, M.A. CPS Human Resource Services Greg Hurt!, Ph.D. CSUS, Sacramento Angoff Method Assemble a panel of subject matter experts

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 39 Evaluation of Comparability of Scores and Passing Decisions for Different Item Pools of Computerized Adaptive Examinations

More information

Information Structure for Geometric Analogies: A Test Theory Approach

Information Structure for Geometric Analogies: A Test Theory Approach Information Structure for Geometric Analogies: A Test Theory Approach Susan E. Whitely and Lisa M. Schneider University of Kansas Although geometric analogies are popular items for measuring intelligence,

More information

Description of components in tailored testing

Description of components in tailored testing Behavior Research Methods & Instrumentation 1977. Vol. 9 (2).153-157 Description of components in tailored testing WAYNE M. PATIENCE University ofmissouri, Columbia, Missouri 65201 The major purpose of

More information

The Classification Accuracy of Measurement Decision Theory. Lawrence Rudner University of Maryland

The Classification Accuracy of Measurement Decision Theory. Lawrence Rudner University of Maryland Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, April 23-25, 2003 The Classification Accuracy of Measurement Decision Theory Lawrence Rudner University

More information

Scaling TOWES and Linking to IALS

Scaling TOWES and Linking to IALS Scaling TOWES and Linking to IALS Kentaro Yamamoto and Irwin Kirsch March, 2002 In 2000, the Organization for Economic Cooperation and Development (OECD) along with Statistics Canada released Literacy

More information

Does factor indeterminacy matter in multi-dimensional item response theory?

Does factor indeterminacy matter in multi-dimensional item response theory? ABSTRACT Paper 957-2017 Does factor indeterminacy matter in multi-dimensional item response theory? Chong Ho Yu, Ph.D., Azusa Pacific University This paper aims to illustrate proper applications of multi-dimensional

More information

Research and Evaluation Methodology Program, School of Human Development and Organizational Studies in Education, University of Florida

Research and Evaluation Methodology Program, School of Human Development and Organizational Studies in Education, University of Florida Vol. 2 (1), pp. 22-39, Jan, 2015 http://www.ijate.net e-issn: 2148-7456 IJATE A Comparison of Logistic Regression Models for Dif Detection in Polytomous Items: The Effect of Small Sample Sizes and Non-Normality

More information

Evaluating the quality of analytic ratings with Mokken scaling

Evaluating the quality of analytic ratings with Mokken scaling Psychological Test and Assessment Modeling, Volume 57, 2015 (3), 423-444 Evaluating the quality of analytic ratings with Mokken scaling Stefanie A. Wind 1 Abstract Greatly influenced by the work of Rasch

More information

Adaptive Testing With the Multi-Unidimensional Pairwise Preference Model Stephen Stark University of South Florida

Adaptive Testing With the Multi-Unidimensional Pairwise Preference Model Stephen Stark University of South Florida Adaptive Testing With the Multi-Unidimensional Pairwise Preference Model Stephen Stark University of South Florida and Oleksandr S. Chernyshenko University of Canterbury Presented at the New CAT Models

More information

Introduction to Item Response Theory

Introduction to Item Response Theory Introduction to Item Response Theory Prof John Rust, j.rust@jbs.cam.ac.uk David Stillwell, ds617@cam.ac.uk Aiden Loe, bsl28@cam.ac.uk Luning Sun, ls523@cam.ac.uk www.psychometrics.cam.ac.uk Goals Build

More information

Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria

Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria Thakur Karkee Measurement Incorporated Dong-In Kim CTB/McGraw-Hill Kevin Fatica CTB/McGraw-Hill

More information

CYRINUS B. ESSEN, IDAKA E. IDAKA AND MICHAEL A. METIBEMU. (Received 31, January 2017; Revision Accepted 13, April 2017)

CYRINUS B. ESSEN, IDAKA E. IDAKA AND MICHAEL A. METIBEMU. (Received 31, January 2017; Revision Accepted 13, April 2017) DOI: http://dx.doi.org/10.4314/gjedr.v16i2.2 GLOBAL JOURNAL OF EDUCATIONAL RESEARCH VOL 16, 2017: 87-94 COPYRIGHT BACHUDO SCIENCE CO. LTD PRINTED IN NIGERIA. ISSN 1596-6224 www.globaljournalseries.com;

More information

Jason L. Meyers. Ahmet Turhan. Steven J. Fitzpatrick. Pearson. Paper presented at the annual meeting of the

Jason L. Meyers. Ahmet Turhan. Steven J. Fitzpatrick. Pearson. Paper presented at the annual meeting of the Performance of Ability Estimation Methods for Writing Assessments under Conditio ns of Multidime nsionality Jason L. Meyers Ahmet Turhan Steven J. Fitzpatrick Pearson Paper presented at the annual meeting

More information

Copyright. Kelly Diane Brune

Copyright. Kelly Diane Brune Copyright by Kelly Diane Brune 2011 The Dissertation Committee for Kelly Diane Brune Certifies that this is the approved version of the following dissertation: An Evaluation of Item Difficulty and Person

More information

ABERRANT RESPONSE PATTERNS AS A MULTIDIMENSIONAL PHENOMENON: USING FACTOR-ANALYTIC MODEL COMPARISON TO DETECT CHEATING. John Michael Clark III

ABERRANT RESPONSE PATTERNS AS A MULTIDIMENSIONAL PHENOMENON: USING FACTOR-ANALYTIC MODEL COMPARISON TO DETECT CHEATING. John Michael Clark III ABERRANT RESPONSE PATTERNS AS A MULTIDIMENSIONAL PHENOMENON: USING FACTOR-ANALYTIC MODEL COMPARISON TO DETECT CHEATING BY John Michael Clark III Submitted to the graduate degree program in Psychology and

More information

MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2

MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2 MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and Lord Equating Methods 1,2 Lisa A. Keller, Ronald K. Hambleton, Pauline Parker, Jenna Copella University of Massachusetts

More information

EFFECTS OF OUTLIER ITEM PARAMETERS ON IRT CHARACTERISTIC CURVE LINKING METHODS UNDER THE COMMON-ITEM NONEQUIVALENT GROUPS DESIGN

EFFECTS OF OUTLIER ITEM PARAMETERS ON IRT CHARACTERISTIC CURVE LINKING METHODS UNDER THE COMMON-ITEM NONEQUIVALENT GROUPS DESIGN EFFECTS OF OUTLIER ITEM PARAMETERS ON IRT CHARACTERISTIC CURVE LINKING METHODS UNDER THE COMMON-ITEM NONEQUIVALENT GROUPS DESIGN By FRANCISCO ANDRES JIMENEZ A THESIS PRESENTED TO THE GRADUATE SCHOOL OF

More information

Building Evaluation Scales for NLP using Item Response Theory

Building Evaluation Scales for NLP using Item Response Theory Building Evaluation Scales for NLP using Item Response Theory John Lalor CICS, UMass Amherst Joint work with Hao Wu (BC) and Hong Yu (UMMS) Motivation Evaluation metrics for NLP have been mostly unchanged

More information

On indirect measurement of health based on survey data. Responses to health related questions (items) Y 1,..,Y k A unidimensional latent health state

On indirect measurement of health based on survey data. Responses to health related questions (items) Y 1,..,Y k A unidimensional latent health state On indirect measurement of health based on survey data Responses to health related questions (items) Y 1,..,Y k A unidimensional latent health state A scaling model: P(Y 1,..,Y k ;α, ) α = item difficulties

More information

Item-Rest Regressions, Item Response Functions, and the Relation Between Test Forms

Item-Rest Regressions, Item Response Functions, and the Relation Between Test Forms Item-Rest Regressions, Item Response Functions, and the Relation Between Test Forms Dato N. M. de Gruijter University of Leiden John H. A. L. de Jong Dutch Institute for Educational Measurement (CITO)

More information

Modeling DIF with the Rasch Model: The Unfortunate Combination of Mean Ability Differences and Guessing

Modeling DIF with the Rasch Model: The Unfortunate Combination of Mean Ability Differences and Guessing James Madison University JMU Scholarly Commons Department of Graduate Psychology - Faculty Scholarship Department of Graduate Psychology 4-2014 Modeling DIF with the Rasch Model: The Unfortunate Combination

More information

The Impact of Item Sequence Order on Local Item Dependence: An Item Response Theory Perspective

The Impact of Item Sequence Order on Local Item Dependence: An Item Response Theory Perspective Vol. 9, Issue 5, 2016 The Impact of Item Sequence Order on Local Item Dependence: An Item Response Theory Perspective Kenneth D. Royal 1 Survey Practice 10.29115/SP-2016-0027 Sep 01, 2016 Tags: bias, item

More information

COMPARING THE DOMINANCE APPROACH TO THE IDEAL-POINT APPROACH IN THE MEASUREMENT AND PREDICTABILITY OF PERSONALITY. Alison A. Broadfoot.

COMPARING THE DOMINANCE APPROACH TO THE IDEAL-POINT APPROACH IN THE MEASUREMENT AND PREDICTABILITY OF PERSONALITY. Alison A. Broadfoot. COMPARING THE DOMINANCE APPROACH TO THE IDEAL-POINT APPROACH IN THE MEASUREMENT AND PREDICTABILITY OF PERSONALITY Alison A. Broadfoot A Dissertation Submitted to the Graduate College of Bowling Green State

More information

Nonparametric DIF. Bruno D. Zumbo and Petronilla M. Witarsa University of British Columbia

Nonparametric DIF. Bruno D. Zumbo and Petronilla M. Witarsa University of British Columbia Nonparametric DIF Nonparametric IRT Methodology For Detecting DIF In Moderate-To-Small Scale Measurement: Operating Characteristics And A Comparison With The Mantel Haenszel Bruno D. Zumbo and Petronilla

More information

Differential Item Functioning from a Compensatory-Noncompensatory Perspective

Differential Item Functioning from a Compensatory-Noncompensatory Perspective Differential Item Functioning from a Compensatory-Noncompensatory Perspective Terry Ackerman, Bruce McCollaum, Gilbert Ngerano University of North Carolina at Greensboro Motivation for my Presentation

More information

The effects of ordinal data on coefficient alpha

The effects of ordinal data on coefficient alpha James Madison University JMU Scholarly Commons Masters Theses The Graduate School Spring 2015 The effects of ordinal data on coefficient alpha Kathryn E. Pinder James Madison University Follow this and

More information

Incorporating Measurement Nonequivalence in a Cross-Study Latent Growth Curve Analysis

Incorporating Measurement Nonequivalence in a Cross-Study Latent Growth Curve Analysis Structural Equation Modeling, 15:676 704, 2008 Copyright Taylor & Francis Group, LLC ISSN: 1070-5511 print/1532-8007 online DOI: 10.1080/10705510802339080 TEACHER S CORNER Incorporating Measurement Nonequivalence

More information

Exploring rater errors and systematic biases using adjacent-categories Mokken models

Exploring rater errors and systematic biases using adjacent-categories Mokken models Psychological Test and Assessment Modeling, Volume 59, 2017 (4), 493-515 Exploring rater errors and systematic biases using adjacent-categories Mokken models Stefanie A. Wind 1 & George Engelhard, Jr.

More information

A DIFFERENTIAL RESPONSE FUNCTIONING FRAMEWORK FOR UNDERSTANDING ITEM, BUNDLE, AND TEST BIAS ROBERT PHILIP SIDNEY CHALMERS

A DIFFERENTIAL RESPONSE FUNCTIONING FRAMEWORK FOR UNDERSTANDING ITEM, BUNDLE, AND TEST BIAS ROBERT PHILIP SIDNEY CHALMERS A DIFFERENTIAL RESPONSE FUNCTIONING FRAMEWORK FOR UNDERSTANDING ITEM, BUNDLE, AND TEST BIAS ROBERT PHILIP SIDNEY CHALMERS A DISSERTATION SUBMITTED TO THE FACULTY OF GRADUATE STUDIES IN PARTIAL FULFILMENT

More information

PROMIS ANXIETY AND MOOD AND ANXIETY SYMPTOM QUESTIONNAIRE PROSETTA STONE ANALYSIS REPORT A ROSETTA STONE FOR PATIENT REPORTED OUTCOMES

PROMIS ANXIETY AND MOOD AND ANXIETY SYMPTOM QUESTIONNAIRE PROSETTA STONE ANALYSIS REPORT A ROSETTA STONE FOR PATIENT REPORTED OUTCOMES PROSETTA STONE ANALYSIS REPORT A ROSETTA STONE FOR PATIENT REPORTED OUTCOMES PROMIS ANXIETY AND MOOD AND ANXIETY SYMPTOM QUESTIONNAIRE SEUNG W. CHOI, TRACY PODRABSKY, NATALIE MCKINNEY, BENJAMIN D. SCHALET,

More information

Issues That Should Not Be Overlooked in the Dominance Versus Ideal Point Controversy

Issues That Should Not Be Overlooked in the Dominance Versus Ideal Point Controversy Industrial and Organizational Psychology, 3 (2010), 489 493. Copyright 2010 Society for Industrial and Organizational Psychology. 1754-9426/10 Issues That Should Not Be Overlooked in the Dominance Versus

More information
