Item Response Theory
Steven P. Reise, University of California, U.S.A.
Item response theory (IRT), or modern measurement theory, provides alternatives to classical test theory (CTT) methods for the construction, analysis, and scoring of psychological measures. A clear difference between traditional CTT and modern IRT psychometric methods is that the former represents constructs through an aggregate composite score, whereas the latter represents constructs through a latent variable (or latent trait), which is assumed to underlie item responses. Moreover, IRT measurement models are formal statistical models that attempt to capture the interaction between person and item properties as they jointly determine an individual's response to an item (Embretson, 1996). As such, IRT modeling rests on a set of testable assumptions, and IRT models can be statistically evaluated as to their fit to item-response data. From a technical perspective, IRT measurement models are closely related to confirmatory factor analytic models for ordinal data and have their origin in the work of Derrick Lawley (Lawley, 1943), Frederick Lord (Lord, 1980), and Darrell Bock (Bock & Aitkin, 1981), among many others. From an applied perspective, IRT measurement models were developed to solve real-world practical problems in large-scale aptitude and achievement testing that were challenging, and in some cases impossible, under CTT psychometrics. It is only since the beginning of the century, however, that IRT models have been more commonly employed in the measurement of personality, psychopathology (Reise & Waller, 2009), and medical outcomes constructs (Cella et al., 2007). The defining features of IRT measurement models are the specification and estimation of the parameters of a mathematical function, typically a logistic function.
This function is called an item response function (IRF), and its purpose is to model the relation between individual differences on a continuous latent trait construct (theta, θ) and the probability of responding to a scale item (e.g., endorsing a true/false item in the keyed direction, or responding in category 3 on a five-point ordered rating item). Thus, in the following two sections, commonly applied IRT models appropriate for dichotomous and polytomous item response data are described. The discussion is restricted to unidimensional IRT models (i.e., models that assume only a single common latent variable underlies the covariances among item responses) because all the basic principles described here generalize easily to multidimensional IRT models. Interpretation of indices derived from IRT model parameters and assumptions underlying IRT models are then detailed, and common applications of IRT models, such as computerized adaptive testing, are presented.

Unidimensional IRT Models for Dichotomous Item Responses

In large-scale national and state-wide achievement testing, the most commonly used item response format is multiple choice, where responses are dichotomously scored as either correct (1) or incorrect (0). Moreover, for many popular personality and psychopathology measures, the endorsed (1) versus not endorsed (0), dichotomous yes/no, true/false, or agree/disagree response formats are very common. In all these cases, and assuming that a single common latent variable (θ) underlies item responses, a researcher may be interested in estimating an IRF to describe the relation between standing on a latent variable (representing the construct of interest) and the probability of endorsing an item in the keyed direction.

The Encyclopedia of Clinical Psychology, First Edition. Edited by Robin L. Cautin and Scott O. Lilienfeld. 2015 John Wiley & Sons, Inc. Published 2015 by John Wiley & Sons, Inc. DOI: /wbecp357
The primary goal of fitting a dichotomous IRT model is to find an IRF that best represents, or fits, the observed item response data. To achieve this goal, one must select among several commonly applied models that vary in complexity. Dichotomous IRT models differ primarily in the number of item properties that need to be accounted for. The least complex IRT measurement model is the one-parameter logistic (1PL) model shown in Equation 1, where the subscript i refers to an item, and x refers to a particular item response scored 0 for a not keyed/endorsed response and 1 for a keyed/endorsed response. The a represents the common-item slope parameter, and b is an item location parameter that is allowed to vary between items within a scale.

P(x_i = 1) = exp(a(θ − b_i)) / [1 + exp(a(θ − b_i))]   (1)

The 1PL model in Equation 1, depending on its specification, is sometimes referred to as a Rasch model. Specifically, if a is fixed to 1 for all items (for identification), and the variance of the latent variable is estimated, then technically Equation 1 is a Rasch model. If, as considered here, a is constant across items, and the variance of the latent variable is fixed to 1 (for identification), then the model is more appropriately referred to as a 1PL model. In describing this model, and throughout this entry, it is assumed that the metric for the latent variable has been identified by fixing its mean to 0 and its variance to 1. Thus, assuming normality, the metric for the latent trait can be interpreted like a z-score. In the 1PL model, all items have a constant slope a, but items may differ from each other in their location parameter (represented by b). Item location parameters typically range between −2.0 and 2.0 and indicate the point on the latent trait metric where the probability of endorsing the item in the keyed direction is .5. Thus, the b parameter serves to shift the IRF from left to right along the latent trait continuum.
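The 1PL IRF in Equation 1 is straightforward to compute. The sketch below is illustrative only; the common slope value of 1.5 and the item locations are hypothetical, not taken from the entry.

```python
import math

def irf_1pl(theta, b, a=1.5):
    """One-parameter logistic IRF: probability of a keyed response.

    All items share the slope `a` (1.5 here is an arbitrary illustration);
    each item has its own location parameter `b`.
    """
    z = a * (theta - b)
    return math.exp(z) / (1.0 + math.exp(z))

# At theta = b the endorsement probability is exactly .5,
# so b marks where the IRF crosses the .5 line.
p_at_location = irf_1pl(theta=0.0, b=0.0)

# An easier item (b = -1) is endorsed more often at theta = 0
# than a harder item (b = +1): the IRF has simply shifted left or right.
p_easy = irf_1pl(theta=0.0, b=-1.0)
p_hard = irf_1pl(theta=0.0, b=1.0)
```

Because the 1PL curves differ only by a horizontal shift, an item with b = −1 at θ = 0 gives the same probability as an item with b = 0 at θ = 1.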
Item location parameters are analogous, but not equivalent, to item proportion endorsed in traditional item analysis. Items that are commonly endorsed tend to have negative location parameters, and the IRF will be shifted to the left. Items that are rarely endorsed (endorsed only by individuals high on the latent variable) will have positive location parameters, and the IRF will be shifted to the right. To illustrate, Figure 1A shows the IRFs for three example scale items with a fixed to 1.5 and b parameters set to −1, 0, and 1, respectively. The a slope parameter, which typically ranges between 0.5 and 2.0, is often referred to as a discrimination parameter (constant across items within the 1PL model). It determines the steepness of the IRF at its inflection point. Slope parameters are analogous to item-test biserial correlations in traditional scale analysis. Figures 1B and 1C show three items with the same location parameters as in Figure 1A, but with a values of 2.5 and 0.5, respectively. These figures make clear the interpretation of the a parameter: the higher the slope, the more differentiating or discriminating the item is, in the sense that response probabilities change rapidly as scores on the latent variable increase. This is especially true in the latent variable range around the item's location. The 1PL model requires all scale items to have equal slopes, but items may vary in location, and thus items vary in where along the latent trait continuum they provide the most discrimination (see subsequent section). The 1PL model is analogous to the concept of essential tau-equivalence in classical test theory. A slightly more complex, or less restricted, model can be specified by allowing the scale items to vary in discrimination. This two-parameter logistic (2PL) model is shown in Equation 2. The 2PL model is analogous to the concept of congeneric measurement in classical test theory.
P(x_i = 1) = exp(a_i(θ − b_i)) / [1 + exp(a_i(θ − b_i))]   (2)

In this model, items are allowed to vary in two ways, slope (a) and location (b), and interpretation of the parameters remains the same as in the 1PL model. Equation 2 states that
the probability of endorsing an item in the keyed direction is a function of the difference between an individual's standing on the latent variable (θ) and the item's location parameter (b), and this difference is weighted by the slope (a). Thus, for items with relatively low slopes (discrimination), the difference between an individual's trait level and the item location has little effect on the response probability. In contrast, when the slope parameter is relatively high, differences between an individual's trait level and the item's location have a large effect on the response probability. To illustrate, Figure 2 displays the IRFs for two items that have the same location parameter (b = 0) but different slopes (2.5 and 0.5, respectively). Clearly, the item with the larger slope provides relatively more discrimination around the middle of the trait continuum in the sense that the response probabilities are changing very rapidly. As a consequence, it is easier to discriminate between individuals who are in the middle of the trait range using the item with the larger slope, relative to the item with the lower slope.

Figure 1 A, IRFs, slope = 1.5. B, IRFs, slope = 2.5. C, IRFs, slope = 0.5.

In the case of multiple-choice aptitude or achievement tests, where examinees who are low on the latent variable can obtain a correct answer by guessing, a commonly applied IRT model is the three-parameter logistic model
(3PL), shown in Equation 3.

Figure 2 IRFs, location = 0, slopes = 2.5 and 0.5.

P(x_i = 1) = c_i + (1 − c_i) · exp(a_i(θ − b_i)) / [1 + exp(a_i(θ − b_i))]   (3)

Equation 3 expands the 2PL model by adding an additional parameter (c), often referred to as the pseudo-guessing or lower asymptote parameter. The c parameter is on a proportion metric and typically ranges between 0 and .25 (for multiple-choice items with four response options). The value of this parameter sets a boundary on the lower asymptote of the IRF such that, at low trait levels, the response probability never goes toward 0, but rather stays constant. To illustrate, Figure 3 displays the IRFs for three items, each having a slope of 1.5 and a location of 0.0 but differing lower asymptotes (0, .1, and .25). It is important to notice that, in Figure 3, the interpretation of the location parameter has now changed slightly relative to its interpretation in the 1PL or 2PL models. Specifically, the location parameter remains the point on the latent trait continuum where the IRF is most steep (i.e., the inflection point), but this point no longer corresponds to the location on the latent trait at which the probability of endorsement is .5. Instead, the probability of endorsing an item in the keyed direction at θ = b is (1 + c)/2.

Figure 3 IRFs, slope = 1.5, location = 0.0, lower asymptotes = 0, .1, .25.

The 3PL model has rarely been applied in personality or psychopathology measurement; nevertheless, an extension of this model that allows for both a nonzero lower asymptote and a non-one upper asymptote, called the four-parameter logistic model (Equation 4), has received some attention (Reise & Waller, 2009).

P(x_i = 1) = c_i + (d_i − c_i) · exp(a_i(θ − b_i)) / [1 + exp(a_i(θ − b_i))]   (4)

Equation 4 is simply Equation 3 with 1 replaced by d.
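Because Equations 2 through 4 nest within one another, they can be sketched as a single function. The parameter values below are hypothetical illustrations, not values from the entry; the checks at θ = b follow directly from the equations.

```python
import math

def irf_4pl(theta, a, b, c=0.0, d=1.0):
    """Four-parameter logistic IRF. Special cases:
    c=0, d=1 gives the 2PL; d=1 alone gives the 3PL."""
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return c + (d - c) * logistic

# At theta = b the logistic part equals .5, so the 3PL endorsement
# probability there is (1 + c)/2 rather than .5 ...
p3 = irf_4pl(theta=0.0, a=1.5, b=0.0, c=0.2)         # (1 + .2)/2 = .6
# ... and the 4PL probability there is (c + d)/2.
p4 = irf_4pl(theta=0.0, a=1.5, b=0.0, c=0.1, d=0.9)  # (.1 + .9)/2 = .5
```

At very low trait levels the 3PL probability stays near its floor c instead of approaching 0, which is exactly the guessing behavior the model is meant to capture.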
This highly complex model may be appropriate in measurement contexts where the probability of endorsing an item (e.g., having a symptom) is not zero even for low trait individuals (e.g., endorsing sad moods within the last 7 days has a nonzero probability even for individuals who are low on depression). Conversely, the probability of endorsing an item (e.g., having a symptom) does not approach 1.0 even for individuals who are in the high range on the latent variable (e.g., suicide ideation within the last 7 days is not universally endorsed even by individuals at the highest levels of depression). To illustrate this model, Figure 4 displays a 4PL IRF with a slope of 1.5, location of 0.0, lower asymptote of .1, and upper asymptote of .9. In this model, the interpretation of the location parameter is even more complicated because it now reflects
the point on the latent trait scale where the response proportion is (c + d)/2.

Figure 4 IRF: a = 1.5, b = 0.0, c = .1, d = .9.

Unidimensional IRT Models for Polytomous Item Responses

When item responses are scored as ordered categories, polytomous IRT models are required. In the models for dichotomous items described above, only one IRF, reflecting the probability of a keyed response, needed to be estimated. This is because once the IRF is known, the probability of responding in the nonkeyed direction as a function of the latent variable is known by subtraction, that is, 1 − P. Conceptually, one can think of a dichotomous item as having a single threshold between the nonkeyed (0) and keyed (1) response, and one goal of IRT modeling is to estimate this threshold via a location parameter indicating where on the latent variable a keyed response becomes more likely than a nonkeyed one. With a polytomous item response format, the complexity of the IRT model increases, and a slightly different terminology is needed to describe response propensities. Specifically, instead of estimating a single IRF for a dichotomous item, a researcher needs to estimate K category response curves (CRCs) for a K-category polytomous item. Here, let response categories be coded 0 to K − 1. Each CRC will model the relation between standing on the latent variable and the probability of responding to an item in a specific category. Although there are numerous potential polytomous IRT models that one may consider, the illustration here is the graded-response model (GRM; Samejima, 1969). In the GRM, for each item, a set of K − 1 location parameters (b) needs to be estimated, along with one common item slope (a).
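Under the standard GRM formulation, the category probabilities are differences of adjacent threshold response functions. A minimal sketch, with hypothetical parameters for a four-category item:

```python
import math

def trf(theta, a, b):
    """Threshold response function: P(responding above threshold b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def grm_crcs(theta, a, bs):
    """Category response curves for a K-category graded-response item.

    `bs` holds the K-1 ordered threshold locations; the slope `a` is
    common within the item. Category probabilities are differences of
    adjacent threshold functions, with P(above the lowest boundary) = 1
    and P(above the highest boundary) = 0 as the bookends.
    """
    above = [1.0] + [trf(theta, a, b) for b in bs] + [0.0]
    return [above[k] - above[k + 1] for k in range(len(bs) + 1)]

# A four-category item (hypothetical parameters):
probs = grm_crcs(theta=0.0, a=2.0, bs=[-1.0, 0.0, 1.0])
# The K category probabilities sum to 1 at every trait level.
```

With equally spaced thresholds around 0, the two middle categories are equally likely at θ = 0, mirroring the symmetric CRCs of this kind of item.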
Stated differently, in the GRM, for each item, a set of K − 1 2PL IRFs is estimated, with the slopes constrained to be equal within an item (but not between items). These functions are called threshold response functions (TRFs; Equation 5), with location parameters that indicate the trait level necessary to have a 50% chance of responding above one of the K − 1 thresholds between the response categories.

P_x(θ) = exp[a(θ − b_j)] / (1 + exp[a(θ − b_j)]),   (5)

where j = number of response categories minus 1 (i.e., j = 1, ..., K − 1) and x is the response category. For example, for a four-point item, three 2PL IRFs are estimated: for responses 0 versus 1, 2, 3; for responses 0, 1 versus 2, 3; and for responses 0, 1, 2 versus 3. To illustrate, Figure 5A displays, from left to right, threshold response functions for a four-category item with the slope parameter equal to 2 and location parameters of −1.0, 0.0, and 1.0, respectively. Given the parameters for the threshold response functions, and the stipulation that the conditional probability of responding in at least the first category is 1, and the conditional probability of responding above the highest category is 0, the CRCs can be estimated by subtraction as shown below. To illustrate, Figure 5B shows the CRCs for the example item in Figure 5A. Going from left to right, the probability of responding in the lowest category (x = 0) monotonically decreases as a function of trait level. For the middle two response categories (x = 1 or 2), response propensity is a unimodal function that increases and then
decreases as a function of trait level. Finally, the probability of responding in the highest category (x = 3) monotonically increases with increasing trait level. Observe that at any point on the latent trait continuum, the probabilities of category response sum to 1.

Figure 5 A, TRFs: a = 2.0, b1 = −1.0, b2 = 0.0, b3 = 1.0. B, Category response curves (x = 0, 1, 2, 3).

Graded-response model item parameters are easily interpretable and determine the shapes and locations of the TRFs (and thus the CRCs). The higher the slope parameter, the steeper the TRFs and the narrower and more peaked the CRCs, indicating that the response categories differentiate well among individuals at different trait levels. The threshold parameters (b) determine the locations of the TRFs and where each of the CRCs for the middle response options peaks. Specifically, each such CRC peaks in the middle of two adjacent threshold parameters. The distances between adjacent location parameters are also important: a large distance between locations shows that an item discriminates across the entire trait range. Ideally, an item will be highly discriminating (high slope) and will have location parameters spread out across an appropriate range of the trait. Finally, it is important to note that the CRCs for a polytomous item can be aggregated into a single IRF that is analogous to the IRF in the dichotomous models. By weighting the CRCs (i.e., the conditional probabilities of responding in a specific category) by the integers used to score the responses (e.g., 0, 1, 2, 3), an item-response curve (IRC) for a polytomous item is obtained.

E(X_i) = IRC_i = Σ (from x = 0 to K − 1) x · P_x(θ)   (6)

The one important difference is that the y-axis for a polytomous model will range from 0 to K − 1 (assuming categories scored 0 to K − 1), whereas the y-axis for an IRF for a dichotomous model will range between 0 and 1.

Figure 6 Item response curve (y-axis: expected score).

The IRC in Figure 6 displays how the expected raw
score on an item changes as a function of the latent trait for the example item.

Model Features: Information and Conditional Standard Errors

Interpretation of the parameter estimates of the models described above is critical to the psychometric analysis of an instrument. Generally speaking, researchers are most concerned that the items provide good discrimination and that the location parameters are spread out (between items in dichotomous models, and within items for polytomous models) across the full range of the latent trait continuum. However, to aid in the psychometric assessment of a set of scale items, IRT modeling provides several useful tools that are derived from the estimated item parameters. Most useful are the item and scale information functions and the corresponding conditional standard error function, described below. For any item, once the model parameters are estimated (i.e., the IRFs are known), their values can be transformed into an item-information function (IIF). An IIF describes how much psychometric information, or discrimination, an item provides at each level of the latent variable. For dichotomous items, items with higher discrimination (slope) parameters provide more information, and the position along the latent variable continuum where that information is concentrated is determined by the item's location. Some items may provide information in the high trait range, whereas others differentiate best among low-trait individuals or among individuals in the middle of the trait range. For polytomous items, similar principles apply in that items with higher slopes provide more information, and the concentration of the information is peaked around the item's location parameters. However, because polytomous items have multiple location parameters, ideally spread across the latent trait continuum, polytomous information functions tend to spread the information out across the trait range.
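For a dichotomous 2PL item, the information function has the well-known closed form a²P(1 − P), and scale information and the conditional standard error follow directly. A sketch with a hypothetical five-item scale (the slope/location pairs below are invented for illustration):

```python
import math

def irf_2pl(theta, a, b):
    """2PL item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info_2pl(theta, a, b):
    """Item information for a 2PL item: a^2 * P * (1 - P)."""
    p = irf_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical five-item scale: (slope, location) pairs spread
# across the trait range, most discriminating near the middle.
items = [(0.8, -2.0), (1.2, -1.0), (2.0, 0.0), (1.2, 1.0), (0.8, 2.0)]

def scale_info(theta):
    # Under local independence, item informations are additive.
    return sum(info_2pl(theta, a, b) for a, b in items)

def standard_error(theta):
    # Conditional standard error of the ML trait estimate: 1 / sqrt(I).
    return 1.0 / math.sqrt(scale_info(theta))
```

Each item's information peaks at its own location (where P = .5, so information equals a²/4), which is why spreading locations across the continuum spreads measurement precision across the trait range.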
Indeed, that is the entire purpose of a polytomous response format: to allow one item to make multiple (and hopefully meaningful) distinctions between people across the trait range. To illustrate the concept of information, Figure 7A shows the IRFs for five items that vary widely in slope and location. Figure 7B displays the corresponding item information functions and the scale information function derived by summing the IIFs across the five items. Item information, considered alone, is difficult to interpret because its metric has no simple definition. However, as described below, item information is critically related to the conditional standard error. Specifically, assuming that items are locally independent after controlling for the latent variable (see next section), IIFs are additive across items within a scale. Thus, a researcher can easily create a scale-information function (SIF) that indicates the amount of psychometric information an item set provides at each trait level. Then, the square root of 1 divided by the scale information yields the conditional standard error of the maximum likelihood trait-level estimate. When this transformation is made, the resulting function is a standard error function, indicating how precisely trait levels can be estimated. This function is shown in Figure 7C for the five example items. The SIF and resulting standard error function are extremely useful in scale or short-form construction and in diagnosing the strengths and weaknesses of various instruments. They are also valuable in designing instruments to meet specific measurement needs (e.g., selecting items to differentiate best among high trait individuals).

Item Response Theory Model Assumptions and Consequences

The utility of latent variable measurement models depends critically on the extent to which the data meet the assumptions.
Moreover, even if data are consistent with the requirements of IRT modeling, after model parameter estimation one then needs to show that the selected model provides an acceptable
fit to the data. This section considers only the former topic, IRT modeling assumptions. The complex topic of fit assessment is difficult to summarize, and readers are referred to the recommended readings provided in the Further Reading section.

Figure 7 A, IRFs for five items. B, IIFs and SIF for five items. C, Standard error function for five items.

Commonly applied IRT models make three fundamental assumptions about item-response data. First, they assume that there is a fully continuous dimensional latent variable (or variables, for multidimensional IRT models) that underlies the reliable item response variance. If there is no continuous underlying latent trait, then estimating a latent trait measurement model is a meaningless exercise because the model parameters would have no sensible interpretation. Second, IRT models assume that response probabilities are monotonically increasing: as individuals increase on the latent variable, their probabilities of endorsing a dichotomous item, or of responding in a higher response category of a polytomous item, increase. This is a necessary assumption because the parametric models described above fit (or force) monotonically increasing IRFs onto the data. Alternative nonparametric and parametric (e.g., unfolding) IRT models are available when this assumption is not met, but these are beyond the scope of the present discussion.
The most critical assumption, and the one that has drawn the most research attention, is that item responses be locally independent (uncorrelated) after controlling for the latent variable (or latent variables in multidimensional IRT models). In unidimensional IRT models, it must be assumed that all the common variance in an item set can be explained by a single common factor; this is analogous to having no correlated residuals in structural equation modeling. When the local independence assumption is not met (or at least well approximated), item parameter estimates can be biased because the latent trait is not properly specified. In turn, all functions derived from the item parameter estimates, such as the item or scale information and standard error, may also be erroneous to some degree, depending on the severity of the violation. The most serious consequence of a local independence violation is that IRT models may lose their most important property, namely, the invariance of item and person parameters. The concepts of item and person invariance are commonly misunderstood. Simply stated, item-parameter invariance means that an item's parameters do not depend on the other items that are included in the analysis or on the subsample of the population that is used to calibrate the item parameters, within a linear transformation. Person-parameter invariance means that an individual's standing on the latent variable does not depend on which items are administered, again within a linear transformation. These item- and person-parameter invariance properties depend entirely on meeting the IRT model's assumptions. When the assumptions described above are not met, especially local independence, all the applications of IRT modeling, including those described in the next section, are questionable.

Item Response Theory Applications

Beyond providing a more informed basis for basic psychometric analysis, the increasing popularity of IRT models is driven by their utility.
For example, in large-scale aptitude and achievement testing, IRT models are used to link the scales for different versions of a test administered to different examinee subgroups so that scores (latent trait estimates) are on the same scale (i.e., comparable). More generally, across a wide range of disciplines, IRT models have been used to form the basis for computerized adaptive testing (CAT) and for the examination of measurement equivalence across socio-demographic groups. These two topics, which depend critically on the assumption of item and person parameter invariance, are briefly reviewed below. The creation of a precalibrated item pool (i.e., a set of items measuring the same trait with known IRT model parameters) and the efficient administration of a subset of items tailored to an individual's trait level is an attractive alternative to the CTT counterpart of short-form creation. A simple CAT algorithm may begin by administering one or more items with location parameters in the middle of the trait range. The individual's responses are then used to estimate that person's position on the latent trait continuum. If, for example, the individual is estimated to be relatively high on the latent variable, a new item with a higher location parameter is administered and scored, and the response is used to update the estimate of the individual's trait standing. This process continues until either the individual has responded to a predetermined number of items or the standard error of the trait estimate falls below some threshold. The key to CAT is that individuals are administered the items most relevant to differentiating among people in their trait range. In theory, high-trait individuals would receive only hard items, whereas low-trait individuals would receive only easy items. In this way, individuals do not waste their time responding to items that are not discriminating because their endorsement probability is either nearly 0 or close to 1.
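The adaptive loop just described can be sketched as a toy simulation. Everything here is hypothetical: the item pool, the nearest-location selection rule, and the crude step-size update are stand-ins for the information-based selection and maximum likelihood scoring an operational CAT would use.

```python
import math
import random

def irf_2pl(theta, a, b):
    """2PL item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def simple_cat(true_theta, pool, n_items=10, seed=0):
    """Toy adaptive test: repeatedly give the unused item whose location
    is closest to the current trait estimate, then nudge the estimate up
    after a keyed response and down otherwise, with shrinking steps.
    (A real CAT would select on information and re-estimate theta by
    maximum likelihood; this sketch only illustrates the logic.)
    """
    rng = random.Random(seed)
    est, step = 0.0, 1.0
    remaining = list(pool)
    for _ in range(n_items):
        a, b = min(remaining, key=lambda item: abs(item[1] - est))
        remaining.remove((a, b))
        response = rng.random() < irf_2pl(true_theta, a, b)  # simulate
        est += step if response else -step
        step *= 0.7  # smaller corrections as evidence accumulates
    return est

# Hypothetical pool: 17 items with locations spread from -2.0 to 2.0.
pool = [(1.5, b / 4.0) for b in range(-8, 9)]
estimate = simple_cat(true_theta=1.0, pool=pool)
```

Note how the selection rule drives the administered locations toward the examinee's region of the trait continuum, which is the defining behavior of CAT.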
A second popular application of IRT models is that they form the basis for modern explorations of measurement invariance hypotheses, traditionally called item-bias analysis but now known as differential item functioning analysis
(DIF analysis). Because of parameter invariance (within a linear transformation), items may be calibrated in two sociodemographic samples that differ in mean and variance on the latent trait, and the IRT item parameter estimates in one sample can be placed onto the same scale as the item parameter estimates in the second sample. The IRFs estimated separately in the two groups may then be tested for equivalence. If equivalence is found, a researcher may conclude that the item functions the same as a trait indicator across the groups, and a common set of item parameters may be used. If, on the other hand, the IRFs differ in slope or location after being placed onto a common metric, a researcher may conclude that the item functions differently for the two groups. In other words, if the IRFs for the same item estimated in the two samples are not equal, then conditional on any trait level, one group will have a higher (or lower) expected score on the item. Depending on the severity of DIF, it may be difficult to validly apply the measure in different groups of examinees.

SEE ALSO: Coefficient Alpha and Coefficient Omega (hierarchical); Item Response Theory, Approach to Test Construction; Measurement Invariance; Reliability; Scale Development; Structural Equation Modeling

References

Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46.

Cella, D., Yount, S., Rothrock, N., Gershon, R., Cook, K., Reeve, B., ... Rose, M. (2007). The patient-reported outcomes measurement information system (PROMIS): Progress of an NIH roadmap cooperative group during its first two years. Medical Care, 45(5 Suppl.).

Embretson, S. E. (1996). The new rules of measurement. Psychological Assessment, 8.

Lawley, D. N. (1943). The application of the maximum likelihood method to factor analysis. British Journal of Psychology, General Section, 33.

Lord, F. M. (1980).
Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.

Reise, S. P., & Waller, N. G. (2009). Item response theory and clinical measurement. Annual Review of Clinical Psychology, 5.

Samejima, F. (1969). Estimation of ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 34(4, Pt. 2).

Further Reading

de Ayala, R. J. (2009). The theory and practice of item response theory. New York, NY: Guilford Press.

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.

Millsap, R. E. (2011). Statistical approaches to measurement invariance. New York, NY: Routledge.

Thissen, D., & Wainer, H. (2001). Test scoring. Mahwah, NJ: Erlbaum.

Wainer, H. (2000). Computerized adaptive testing: A primer (2nd ed.). Mahwah, NJ: Erlbaum.
The Western Aphasia Battery (WAB) (Kertesz, 1982) is used to classify aphasia by classical type, measure overall severity, and measure change over time. Despite its near-ubiquitousness, it has significant
More informationAdaptive EAP Estimation of Ability
Adaptive EAP Estimation of Ability in a Microcomputer Environment R. Darrell Bock University of Chicago Robert J. Mislevy National Opinion Research Center Expected a posteriori (EAP) estimation of ability,
More informationMeasuring mathematics anxiety: Paper 2 - Constructing and validating the measure. Rob Cavanagh Len Sparrow Curtin University
Measuring mathematics anxiety: Paper 2 - Constructing and validating the measure Rob Cavanagh Len Sparrow Curtin University R.Cavanagh@curtin.edu.au Abstract The study sought to measure mathematics anxiety
More informationLikelihood Ratio Based Computerized Classification Testing. Nathan A. Thompson. Assessment Systems Corporation & University of Cincinnati.
Likelihood Ratio Based Computerized Classification Testing Nathan A. Thompson Assessment Systems Corporation & University of Cincinnati Shungwon Ro Kenexa Abstract An efficient method for making decisions
More informationDevelopment, Standardization and Application of
American Journal of Educational Research, 2018, Vol. 6, No. 3, 238-257 Available online at http://pubs.sciepub.com/education/6/3/11 Science and Education Publishing DOI:10.12691/education-6-3-11 Development,
More informationScoring Multiple Choice Items: A Comparison of IRT and Classical Polytomous and Dichotomous Methods
James Madison University JMU Scholarly Commons Department of Graduate Psychology - Faculty Scholarship Department of Graduate Psychology 3-008 Scoring Multiple Choice Items: A Comparison of IRT and Classical
More informationAndré Cyr and Alexander Davies
Item Response Theory and Latent variable modeling for surveys with complex sampling design The case of the National Longitudinal Survey of Children and Youth in Canada Background André Cyr and Alexander
More informationThe Influence of Test Characteristics on the Detection of Aberrant Response Patterns
The Influence of Test Characteristics on the Detection of Aberrant Response Patterns Steven P. Reise University of California, Riverside Allan M. Due University of Minnesota Statistical methods to assess
More informationItem Response Theory (IRT): A Modern Statistical Theory for Solving Measurement Problem in 21st Century
International Journal of Scientific Research in Education, SEPTEMBER 2018, Vol. 11(3B), 627-635. Item Response Theory (IRT): A Modern Statistical Theory for Solving Measurement Problem in 21st Century
More informationType I Error Rates and Power Estimates for Several Item Response Theory Fit Indices
Wright State University CORE Scholar Browse all Theses and Dissertations Theses and Dissertations 2009 Type I Error Rates and Power Estimates for Several Item Response Theory Fit Indices Bradley R. Schlessman
More informationThe Psychometric Development Process of Recovery Measures and Markers: Classical Test Theory and Item Response Theory
The Psychometric Development Process of Recovery Measures and Markers: Classical Test Theory and Item Response Theory Kate DeRoche, M.A. Mental Health Center of Denver Antonio Olmos, Ph.D. Mental Health
More informationRasch Versus Birnbaum: New Arguments in an Old Debate
White Paper Rasch Versus Birnbaum: by John Richard Bergan, Ph.D. ATI TM 6700 E. Speedway Boulevard Tucson, Arizona 85710 Phone: 520.323.9033 Fax: 520.323.9139 Copyright 2013. All rights reserved. Galileo
More informationUsing the Rasch Modeling for psychometrics examination of food security and acculturation surveys
Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Jill F. Kilanowski, PhD, APRN,CPNP Associate Professor Alpha Zeta & Mu Chi Acknowledgements Dr. Li Lin,
More informationTermination Criteria in Computerized Adaptive Tests: Variable-Length CATs Are Not Biased. Ben Babcock and David J. Weiss University of Minnesota
Termination Criteria in Computerized Adaptive Tests: Variable-Length CATs Are Not Biased Ben Babcock and David J. Weiss University of Minnesota Presented at the Realities of CAT Paper Session, June 2,
More informationUtilizing the NIH Patient-Reported Outcomes Measurement Information System
www.nihpromis.org/ Utilizing the NIH Patient-Reported Outcomes Measurement Information System Thelma Mielenz, PhD Assistant Professor, Department of Epidemiology Columbia University, Mailman School of
More informationAssessing Measurement Invariance in the Attitude to Marriage Scale across East Asian Societies. Xiaowen Zhu. Xi an Jiaotong University.
Running head: ASSESS MEASUREMENT INVARIANCE Assessing Measurement Invariance in the Attitude to Marriage Scale across East Asian Societies Xiaowen Zhu Xi an Jiaotong University Yanjie Bian Xi an Jiaotong
More informationItem Response Theory and Health Outcomes Measurement in the 21st Century
MEDICAL CARE Volume 38, Number 9 Supplement II, pp II-28 II-42 2000 Lippincott Williams & Wilkins, Inc. Item Response Theory and Health Outcomes Measurement in the 21st Century RON D. HAYS, PHD,* LEO S.
More informationA Bayesian Nonparametric Model Fit statistic of Item Response Models
A Bayesian Nonparametric Model Fit statistic of Item Response Models Purpose As more and more states move to use the computer adaptive test for their assessments, item response theory (IRT) has been widely
More informationItem Response Theory. Author's personal copy. Glossary
Item Response Theory W J van der Linden, CTB/McGraw-Hill, Monterey, CA, USA ã 2010 Elsevier Ltd. All rights reserved. Glossary Ability parameter Parameter in a response model that represents the person
More informationUsing Analytical and Psychometric Tools in Medium- and High-Stakes Environments
Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Greg Pope, Analytics and Psychometrics Manager 2008 Users Conference San Antonio Introduction and purpose of this session
More informationaccuracy (see, e.g., Mislevy & Stocking, 1989; Qualls & Ansley, 1985; Yen, 1987). A general finding of this research is that MML and Bayesian
Recovery of Marginal Maximum Likelihood Estimates in the Two-Parameter Logistic Response Model: An Evaluation of MULTILOG Clement A. Stone University of Pittsburgh Marginal maximum likelihood (MML) estimation
More informationMultidimensional Item Response Theory in Clinical Measurement: A Bifactor Graded- Response Model Analysis of the Outcome- Questionnaire-45.
Brigham Young University BYU ScholarsArchive All Theses and Dissertations 2012-05-22 Multidimensional Item Response Theory in Clinical Measurement: A Bifactor Graded- Response Model Analysis of the Outcome-
More informationA Comparison of Methods of Estimating Subscale Scores for Mixed-Format Tests
A Comparison of Methods of Estimating Subscale Scores for Mixed-Format Tests David Shin Pearson Educational Measurement May 007 rr0701 Using assessment and research to promote learning Pearson Educational
More informationTHE NATURE OF OBJECTIVITY WITH THE RASCH MODEL
JOURNAL OF EDUCATIONAL MEASUREMENT VOL. II, NO, 2 FALL 1974 THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL SUSAN E. WHITELY' AND RENE V. DAWIS 2 University of Minnesota Although it has been claimed that
More informationThe Use of Unidimensional Parameter Estimates of Multidimensional Items in Adaptive Testing
The Use of Unidimensional Parameter Estimates of Multidimensional Items in Adaptive Testing Terry A. Ackerman University of Illinois This study investigated the effect of using multidimensional items in
More informationSensitivity of DFIT Tests of Measurement Invariance for Likert Data
Meade, A. W. & Lautenschlager, G. J. (2005, April). Sensitivity of DFIT Tests of Measurement Invariance for Likert Data. Paper presented at the 20 th Annual Conference of the Society for Industrial and
More informationRunning head: NESTED FACTOR ANALYTIC MODEL COMPARISON 1. John M. Clark III. Pearson. Author Note
Running head: NESTED FACTOR ANALYTIC MODEL COMPARISON 1 Nested Factor Analytic Model Comparison as a Means to Detect Aberrant Response Patterns John M. Clark III Pearson Author Note John M. Clark III,
More informationGENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS
GENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS Michael J. Kolen The University of Iowa March 2011 Commissioned by the Center for K 12 Assessment & Performance Management at
More informationINVESTIGATING FIT WITH THE RASCH MODEL. Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form
INVESTIGATING FIT WITH THE RASCH MODEL Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form of multidimensionality. The settings in which measurement
More informationConfirmatory Factor Analysis and Item Response Theory: Two Approaches for Exploring Measurement Invariance
Psychological Bulletin 1993, Vol. 114, No. 3, 552-566 Copyright 1993 by the American Psychological Association, Inc 0033-2909/93/S3.00 Confirmatory Factor Analysis and Item Response Theory: Two Approaches
More informationThe Patient-Reported Outcomes Measurement Information
ORIGINAL ARTICLE Practical Issues in the Application of Item Response Theory A Demonstration Using Items From the Pediatric Quality of Life Inventory (PedsQL) 4.0 Generic Core Scales Cheryl D. Hill, PhD,*
More informationMantel-Haenszel Procedures for Detecting Differential Item Functioning
A Comparison of Logistic Regression and Mantel-Haenszel Procedures for Detecting Differential Item Functioning H. Jane Rogers, Teachers College, Columbia University Hariharan Swaminathan, University of
More informationUsing the Score-based Testlet Method to Handle Local Item Dependence
Using the Score-based Testlet Method to Handle Local Item Dependence Author: Wei Tao Persistent link: http://hdl.handle.net/2345/1363 This work is posted on escholarship@bc, Boston College University Libraries.
More informationAn item response theory analysis of Wong and Law emotional intelligence scale
Available online at www.sciencedirect.com Procedia Social and Behavioral Sciences 2 (2010) 4038 4047 WCES-2010 An item response theory analysis of Wong and Law emotional intelligence scale Jahanvash Karim
More informationInfluences of IRT Item Attributes on Angoff Rater Judgments
Influences of IRT Item Attributes on Angoff Rater Judgments Christian Jones, M.A. CPS Human Resource Services Greg Hurt!, Ph.D. CSUS, Sacramento Angoff Method Assemble a panel of subject matter experts
More informationCenter for Advanced Studies in Measurement and Assessment. CASMA Research Report
Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 39 Evaluation of Comparability of Scores and Passing Decisions for Different Item Pools of Computerized Adaptive Examinations
More informationInformation Structure for Geometric Analogies: A Test Theory Approach
Information Structure for Geometric Analogies: A Test Theory Approach Susan E. Whitely and Lisa M. Schneider University of Kansas Although geometric analogies are popular items for measuring intelligence,
More informationDescription of components in tailored testing
Behavior Research Methods & Instrumentation 1977. Vol. 9 (2).153-157 Description of components in tailored testing WAYNE M. PATIENCE University ofmissouri, Columbia, Missouri 65201 The major purpose of
More informationThe Classification Accuracy of Measurement Decision Theory. Lawrence Rudner University of Maryland
Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, April 23-25, 2003 The Classification Accuracy of Measurement Decision Theory Lawrence Rudner University
More informationScaling TOWES and Linking to IALS
Scaling TOWES and Linking to IALS Kentaro Yamamoto and Irwin Kirsch March, 2002 In 2000, the Organization for Economic Cooperation and Development (OECD) along with Statistics Canada released Literacy
More informationDoes factor indeterminacy matter in multi-dimensional item response theory?
ABSTRACT Paper 957-2017 Does factor indeterminacy matter in multi-dimensional item response theory? Chong Ho Yu, Ph.D., Azusa Pacific University This paper aims to illustrate proper applications of multi-dimensional
More informationResearch and Evaluation Methodology Program, School of Human Development and Organizational Studies in Education, University of Florida
Vol. 2 (1), pp. 22-39, Jan, 2015 http://www.ijate.net e-issn: 2148-7456 IJATE A Comparison of Logistic Regression Models for Dif Detection in Polytomous Items: The Effect of Small Sample Sizes and Non-Normality
More informationEvaluating the quality of analytic ratings with Mokken scaling
Psychological Test and Assessment Modeling, Volume 57, 2015 (3), 423-444 Evaluating the quality of analytic ratings with Mokken scaling Stefanie A. Wind 1 Abstract Greatly influenced by the work of Rasch
More informationAdaptive Testing With the Multi-Unidimensional Pairwise Preference Model Stephen Stark University of South Florida
Adaptive Testing With the Multi-Unidimensional Pairwise Preference Model Stephen Stark University of South Florida and Oleksandr S. Chernyshenko University of Canterbury Presented at the New CAT Models
More informationIntroduction to Item Response Theory
Introduction to Item Response Theory Prof John Rust, j.rust@jbs.cam.ac.uk David Stillwell, ds617@cam.ac.uk Aiden Loe, bsl28@cam.ac.uk Luning Sun, ls523@cam.ac.uk www.psychometrics.cam.ac.uk Goals Build
More informationComparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria
Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria Thakur Karkee Measurement Incorporated Dong-In Kim CTB/McGraw-Hill Kevin Fatica CTB/McGraw-Hill
More informationCYRINUS B. ESSEN, IDAKA E. IDAKA AND MICHAEL A. METIBEMU. (Received 31, January 2017; Revision Accepted 13, April 2017)
DOI: http://dx.doi.org/10.4314/gjedr.v16i2.2 GLOBAL JOURNAL OF EDUCATIONAL RESEARCH VOL 16, 2017: 87-94 COPYRIGHT BACHUDO SCIENCE CO. LTD PRINTED IN NIGERIA. ISSN 1596-6224 www.globaljournalseries.com;
More informationJason L. Meyers. Ahmet Turhan. Steven J. Fitzpatrick. Pearson. Paper presented at the annual meeting of the
Performance of Ability Estimation Methods for Writing Assessments under Conditio ns of Multidime nsionality Jason L. Meyers Ahmet Turhan Steven J. Fitzpatrick Pearson Paper presented at the annual meeting
More informationCopyright. Kelly Diane Brune
Copyright by Kelly Diane Brune 2011 The Dissertation Committee for Kelly Diane Brune Certifies that this is the approved version of the following dissertation: An Evaluation of Item Difficulty and Person
More informationABERRANT RESPONSE PATTERNS AS A MULTIDIMENSIONAL PHENOMENON: USING FACTOR-ANALYTIC MODEL COMPARISON TO DETECT CHEATING. John Michael Clark III
ABERRANT RESPONSE PATTERNS AS A MULTIDIMENSIONAL PHENOMENON: USING FACTOR-ANALYTIC MODEL COMPARISON TO DETECT CHEATING BY John Michael Clark III Submitted to the graduate degree program in Psychology and
More informationMCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2
MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and Lord Equating Methods 1,2 Lisa A. Keller, Ronald K. Hambleton, Pauline Parker, Jenna Copella University of Massachusetts
More informationEFFECTS OF OUTLIER ITEM PARAMETERS ON IRT CHARACTERISTIC CURVE LINKING METHODS UNDER THE COMMON-ITEM NONEQUIVALENT GROUPS DESIGN
EFFECTS OF OUTLIER ITEM PARAMETERS ON IRT CHARACTERISTIC CURVE LINKING METHODS UNDER THE COMMON-ITEM NONEQUIVALENT GROUPS DESIGN By FRANCISCO ANDRES JIMENEZ A THESIS PRESENTED TO THE GRADUATE SCHOOL OF
More informationBuilding Evaluation Scales for NLP using Item Response Theory
Building Evaluation Scales for NLP using Item Response Theory John Lalor CICS, UMass Amherst Joint work with Hao Wu (BC) and Hong Yu (UMMS) Motivation Evaluation metrics for NLP have been mostly unchanged
More informationOn indirect measurement of health based on survey data. Responses to health related questions (items) Y 1,..,Y k A unidimensional latent health state
On indirect measurement of health based on survey data Responses to health related questions (items) Y 1,..,Y k A unidimensional latent health state A scaling model: P(Y 1,..,Y k ;α, ) α = item difficulties
More informationItem-Rest Regressions, Item Response Functions, and the Relation Between Test Forms
Item-Rest Regressions, Item Response Functions, and the Relation Between Test Forms Dato N. M. de Gruijter University of Leiden John H. A. L. de Jong Dutch Institute for Educational Measurement (CITO)
More informationModeling DIF with the Rasch Model: The Unfortunate Combination of Mean Ability Differences and Guessing
James Madison University JMU Scholarly Commons Department of Graduate Psychology - Faculty Scholarship Department of Graduate Psychology 4-2014 Modeling DIF with the Rasch Model: The Unfortunate Combination
More informationThe Impact of Item Sequence Order on Local Item Dependence: An Item Response Theory Perspective
Vol. 9, Issue 5, 2016 The Impact of Item Sequence Order on Local Item Dependence: An Item Response Theory Perspective Kenneth D. Royal 1 Survey Practice 10.29115/SP-2016-0027 Sep 01, 2016 Tags: bias, item
More informationCOMPARING THE DOMINANCE APPROACH TO THE IDEAL-POINT APPROACH IN THE MEASUREMENT AND PREDICTABILITY OF PERSONALITY. Alison A. Broadfoot.
COMPARING THE DOMINANCE APPROACH TO THE IDEAL-POINT APPROACH IN THE MEASUREMENT AND PREDICTABILITY OF PERSONALITY Alison A. Broadfoot A Dissertation Submitted to the Graduate College of Bowling Green State
More informationNonparametric DIF. Bruno D. Zumbo and Petronilla M. Witarsa University of British Columbia
Nonparametric DIF Nonparametric IRT Methodology For Detecting DIF In Moderate-To-Small Scale Measurement: Operating Characteristics And A Comparison With The Mantel Haenszel Bruno D. Zumbo and Petronilla
More informationDifferential Item Functioning from a Compensatory-Noncompensatory Perspective
Differential Item Functioning from a Compensatory-Noncompensatory Perspective Terry Ackerman, Bruce McCollaum, Gilbert Ngerano University of North Carolina at Greensboro Motivation for my Presentation
More informationThe effects of ordinal data on coefficient alpha
James Madison University JMU Scholarly Commons Masters Theses The Graduate School Spring 2015 The effects of ordinal data on coefficient alpha Kathryn E. Pinder James Madison University Follow this and
More informationIncorporating Measurement Nonequivalence in a Cross-Study Latent Growth Curve Analysis
Structural Equation Modeling, 15:676 704, 2008 Copyright Taylor & Francis Group, LLC ISSN: 1070-5511 print/1532-8007 online DOI: 10.1080/10705510802339080 TEACHER S CORNER Incorporating Measurement Nonequivalence
More informationExploring rater errors and systematic biases using adjacent-categories Mokken models
Psychological Test and Assessment Modeling, Volume 59, 2017 (4), 493-515 Exploring rater errors and systematic biases using adjacent-categories Mokken models Stefanie A. Wind 1 & George Engelhard, Jr.
More informationA DIFFERENTIAL RESPONSE FUNCTIONING FRAMEWORK FOR UNDERSTANDING ITEM, BUNDLE, AND TEST BIAS ROBERT PHILIP SIDNEY CHALMERS
A DIFFERENTIAL RESPONSE FUNCTIONING FRAMEWORK FOR UNDERSTANDING ITEM, BUNDLE, AND TEST BIAS ROBERT PHILIP SIDNEY CHALMERS A DISSERTATION SUBMITTED TO THE FACULTY OF GRADUATE STUDIES IN PARTIAL FULFILMENT
More informationPROMIS ANXIETY AND MOOD AND ANXIETY SYMPTOM QUESTIONNAIRE PROSETTA STONE ANALYSIS REPORT A ROSETTA STONE FOR PATIENT REPORTED OUTCOMES
PROSETTA STONE ANALYSIS REPORT A ROSETTA STONE FOR PATIENT REPORTED OUTCOMES PROMIS ANXIETY AND MOOD AND ANXIETY SYMPTOM QUESTIONNAIRE SEUNG W. CHOI, TRACY PODRABSKY, NATALIE MCKINNEY, BENJAMIN D. SCHALET,
More informationIssues That Should Not Be Overlooked in the Dominance Versus Ideal Point Controversy
Industrial and Organizational Psychology, 3 (2010), 489 493. Copyright 2010 Society for Industrial and Organizational Psychology. 1754-9426/10 Issues That Should Not Be Overlooked in the Dominance Versus
More informationNonparametric IRT analysis of Quality-of-Life Scales and its application to the World Health Organization Quality-of-Life Scale (WHOQOL-Bref)
Qual Life Res (2008) 17:275 290 DOI 10.1007/s11136-007-9281-6 Nonparametric IRT analysis of Quality-of-Life Scales and its application to the World Health Organization Quality-of-Life Scale (WHOQOL-Bref)
More informationThe Effect of Review on Student Ability and Test Efficiency for Computerized Adaptive Tests
The Effect of Review on Student Ability and Test Efficiency for Computerized Adaptive Tests Mary E. Lunz and Betty A. Bergstrom, American Society of Clinical Pathologists Benjamin D. Wright, University
More informationTHE MANTEL-HAENSZEL METHOD FOR DETECTING DIFFERENTIAL ITEM FUNCTIONING IN DICHOTOMOUSLY SCORED ITEMS: A MULTILEVEL APPROACH
THE MANTEL-HAENSZEL METHOD FOR DETECTING DIFFERENTIAL ITEM FUNCTIONING IN DICHOTOMOUSLY SCORED ITEMS: A MULTILEVEL APPROACH By JANN MARIE WISE MACINNES A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF
More informationComputerized Adaptive Testing With the Bifactor Model
Computerized Adaptive Testing With the Bifactor Model David J. Weiss University of Minnesota and Robert D. Gibbons Center for Health Statistics University of Illinois at Chicago Presented at the New CAT
More informationMultidimensional Modeling of Learning Progression-based Vertical Scales 1
Multidimensional Modeling of Learning Progression-based Vertical Scales 1 Nina Deng deng.nina@measuredprogress.org Louis Roussos roussos.louis@measuredprogress.org Lee LaFond leelafond74@gmail.com 1 This
More informationHow Many IRT Parameters Does It Take to Model Psychopathology Items?
Psychological Methods Copyright 2003 by the American Psychological Association, Inc. 2003, Vol. 8, No. 2, 164 184 1082-989X/03/$12.00 DOI: 10.1037/1082-989X.8.2.164 How Many IRT Parameters Does It Take
More informationThe Use of Item Statistics in the Calibration of an Item Bank
~ - -., The Use of Item Statistics in the Calibration of an Item Bank Dato N. M. de Gruijter University of Leyden An IRT analysis based on p (proportion correct) and r (item-test correlation) is proposed
More informationItem Selection in Polytomous CAT
Item Selection in Polytomous CAT Bernard P. Veldkamp* Department of Educational Measurement and Data-Analysis, University of Twente, P.O.Box 217, 7500 AE Enschede, The etherlands 6XPPDU\,QSRO\WRPRXV&$7LWHPVFDQEHVHOHFWHGXVLQJ)LVKHU,QIRUPDWLRQ
More informationCHAPTER 7 RESEARCH DESIGN AND METHODOLOGY. This chapter addresses the research design and describes the research methodology
CHAPTER 7 RESEARCH DESIGN AND METHODOLOGY 7.1 Introduction This chapter addresses the research design and describes the research methodology employed in this study. The sample and sampling procedure is
More informationPROMIS DEPRESSION AND CES-D
PROSETTA STONE ANALYSIS REPORT A ROSETTA STONE FOR PATIENT REPORTED OUTCOMES PROMIS DEPRESSION AND CES-D SEUNG W. CHOI, TRACY PODRABSKY, NATALIE MCKINNEY, BENJAMIN D. SCHALET, KARON F. COOK & DAVID CELLA
More informationlinking in educational measurement: Taking differential motivation into account 1
Selecting a data collection design for linking in educational measurement: Taking differential motivation into account 1 Abstract In educational measurement, multiple test forms are often constructed to
More informationComputerized Mastery Testing
Computerized Mastery Testing With Nonequivalent Testlets Kathleen Sheehan and Charles Lewis Educational Testing Service A procedure for determining the effect of testlet nonequivalence on the operating
More information11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES
Correlational Research Correlational Designs Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are
More informationCenter for Advanced Studies in Measurement and Assessment. CASMA Research Report. Assessing IRT Model-Data Fit for Mixed Format Tests
Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 26 for Mixed Format Tests Kyong Hee Chon Won-Chan Lee Timothy N. Ansley November 2007 The authors are grateful to
More informationEffect of Violating Unidimensional Item Response Theory Vertical Scaling Assumptions on Developmental Score Scales
University of Iowa Iowa Research Online Theses and Dissertations Summer 2013 Effect of Violating Unidimensional Item Response Theory Vertical Scaling Assumptions on Developmental Score Scales Anna Marie
More informationAN ANALYSIS OF THE ITEM CHARACTERISTICS OF THE CONDITIONAL REASONING TEST OF AGGRESSION
AN ANALYSIS OF THE ITEM CHARACTERISTICS OF THE CONDITIONAL REASONING TEST OF AGGRESSION A Dissertation Presented to The Academic Faculty by Justin A. DeSimone In Partial Fulfillment of the Requirements
More informationUsing Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items
University of Wisconsin Milwaukee UWM Digital Commons Theses and Dissertations May 215 Using Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items Tamara Beth
More informationPROMIS ANXIETY AND KESSLER 6 MENTAL HEALTH SCALE PROSETTA STONE ANALYSIS REPORT A ROSETTA STONE FOR PATIENT REPORTED OUTCOMES
PROSETTA STONE ANALYSIS REPORT A ROSETTA STONE FOR PATIENT REPORTED OUTCOMES PROMIS ANXIETY AND KESSLER 6 MENTAL HEALTH SCALE SEUNG W. CHOI, TRACY PODRABSKY, NATALIE MCKINNEY, BENJAMIN D. SCHALET, KARON
More informationItem Analysis: Classical and Beyond
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Modified for EPE/EDP 711 by Kelly Bradley on January 8, 2013 Why is item analysis relevant? Item analysis provides
More information