A Note on Explaining Away and Paradoxical Results in Multidimensional Item Response Theory


Research Report ETS RR
A Note on Explaining Away and Paradoxical Results in Multidimensional Item Response Theory
Peter W. van Rijn
Frank Rijmen
August 2012


Peter W. van Rijn and Frank Rijmen
ETS, Princeton, New Jersey

As part of its nonprofit mission, ETS conducts and disseminates the results of research to advance quality and equity in education and assessment for the benefit of ETS's constituents and the field. To obtain a PDF or a print copy of a report, please visit:
Associate Editor: Matthias von Davier
Reviewers: Sandip Sinharay and Frederic Robin
Copyright 2012 by Educational Testing Service. All rights reserved. ETS, the ETS logo, and LISTENING. LEARNING. LEADING. are registered trademarks of Educational Testing Service (ETS).

Abstract

Hooker and colleagues addressed a paradoxical situation that can arise in the application of multidimensional item response theory (MIRT) models to educational test data. We demonstrate that this MIRT paradox is an instance of the explaining-away phenomenon in Bayesian networks, and we attempt to enhance the understanding of MIRT models by placing the paradox in a broader statistical modeling perspective.

Key words: multidimensional IRT, paradoxical results, explaining away, Bayesian networks

Acknowledgments

The authors would like to thank Matthias von Davier, Shelby Haberman, and Bob Mislevy for helpful comments.

Hooker, Finkelman, and Schwartzman (2009) addressed a paradoxical situation that can arise in the application of multidimensional item response theory (MIRT) models to educational test data. The paradox boils down to the fact that a correct response on an additional item can lead to a lower estimate for one of the latent ability variables, whereas an incorrect response can lead to a higher estimate (Van der Linden, 2012). Hooker et al. (2009) argued that this is unfair to test takers. Various appearances, generalizations, and implications of the paradox have been studied by numerous authors over the past few years (Finkelman, Hooker, & Wang, 2010; Hooker, 2010; Hooker & Finkelman, 2010; Jordan & Spiess, 2012; Van der Linden, 2012). The stated paradoxical situation is related to the explaining-away phenomenon in Bayesian networks (Pearl, 2009; Wellman & Henrion, 1993), which in statistics is known as Berkson's paradox (Berkson, 1946). In this report, we demonstrate that the MIRT paradox is an instance of this phenomenon, and we attempt to enhance the understanding of MIRT models by placing the paradox in a broader statistical modeling perspective, namely, that of graphical models and Bayesian networks (Mislevy, 1994; Pearl, 2009; Williamson, 2005). These frameworks provide a shorthand for the probabilistic relationships of interest and can help in understanding the properties of these relationships. We discuss a small number of MIRT modeling examples in these frameworks, illustrating the relation between the MIRT paradox and the explaining-away phenomenon, and we end with some concluding remarks.
1 Examples

In the following examples, we adhere to parametric IRT in the framework of generalized nonlinear mixed models (Mellenbergh, 1994; Rijmen, Tuerlinckx, De Boeck, & Kuppens, 2003) and make additional assumptions only as needed; that is, we do not make assumptions about the types of items (continuous or discrete; dichotomous or polytomous), the types of latent variables (continuous or discrete), or the response functions (linear, normal, or logistic). We assume that both item response variables and latent variables are random and that item response variables can be observed, whereas latent variables cannot. (Because we make as few assumptions as possible, standard linear factor models are included here as well.) An important assumption in both unidimensional and multidimensional IRT models is monotonicity. Monotonicity requires the probabilities for the item variables to be strictly increasing or decreasing in each latent variable, and MIRT models are monotone if and only if the latent variables are compensatory (Holland & Rosenbaum, 1986; Van der Linden, 2012). Strictly speaking, we do not need to make the monotonicity assumption, but without it a unidimensional IRT model for which local independence holds can always be specified for a set of item variables (Suppes & Zanotti, 1981). Therefore we need to keep the assumption of monotonicity and will illustrate other assumptions, such as local independence, through the examples. In all our examples, we have chosen to use six items to keep things simple yet nontrivial. Furthermore, we assume that the first five items are already observed so that the sixth item is always the focal additional item that possibly creates the paradoxical situation.

Figure 1 displays a partially directed acyclic graph (DAG) of a MIRT model with two latent variables θ_1 and θ_2 and six item response variables X_1, X_2, ..., X_6. (It is called partially directed because not all the lines in the graph have arrowheads. A partial DAG is also referred to as a chain graph.) This model is said to be of simple structure, also referred to as a between-item two-dimensional IRT model, because every item response variable is linked to a single latent variable only. In the graph, the nodes correspond to random variables, and the directed edges represent conditional dependency relations. An advantage of using graphical models is that there is a correspondence between the property of separation of the nodes in the graph and conditional independence of the random variables in the statistical model.
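As a small illustration of what simple structure means in terms of the graph, the following Python sketch encodes only the directed part of a model as a mapping from each item to the latent variables that point to it. The dictionary layout and the function names are hypothetical choices made for this sketch; the assignment of items 1 to 3 to θ_1 and items 4 to 6 to θ_2 follows the factorization given for Figure 1.

```python
# Hypothetical encoding of the directed edges of Figure 1: each item response
# variable is mapped to the set of latent variables with an arrow into it.
# The undirected theta1 - theta2 edge plays no role in this check.
FIGURE_1_PARENTS = {
    "X1": {"theta1"}, "X2": {"theta1"}, "X3": {"theta1"},
    "X4": {"theta2"}, "X5": {"theta2"}, "X6": {"theta2"},
}


def has_simple_structure(parents):
    """Return True if every item is linked to exactly one latent variable."""
    return all(len(latents) == 1 for latents in parents.values())


def within_item_variables(parents):
    """Return, in sorted order, the items linked to more than one latent variable."""
    return sorted(item for item, latents in parents.items() if len(latents) > 1)
```

Calling `has_simple_structure(FIGURE_1_PARENTS)` returns `True`; adding a second latent parent to any item makes it `False`, and `within_item_variables` then lists the items responsible.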
For example, the path X_1 ← θ_1 → X_2 in Figure 1 illustrates an instance of so-called d-separation (Pearl, 2009, pp ); that is, the only path from X_1 to X_2 runs through θ_1, and the arrows do not meet head to head at θ_1. The fact that X_1 and X_2 are d-separated by θ_1 in the graph implies that they are conditionally independent given θ_1. We can generalize this to all six items in the example and obtain the familiar IRT assumption of local independence: the joint probability of X_1, X_2, ..., X_6 conditional on θ_1 and θ_2 can be written as a simple product:

$$\Pr(X_1, X_2, \ldots, X_6 \mid \theta_1, \theta_2) = \prod_{j=1}^{3} \Pr(X_j \mid \theta_1) \prod_{j=4}^{6} \Pr(X_j \mid \theta_2).$$

Because of the correspondence between d-separation and conditional independence, it is possible to determine all entailed conditional independence relations solely by working with the graph.

Now, the MIRT paradox revolves around the beliefs about θ_1 and θ_2 in different situations. In describing the paradox, Hooker et al. (2009) always seemed to condition implicitly on X_1, X_2, ..., X_5. Keeping this in mind, the MIRT paradox cannot arise for the model in Figure 1 because the only path between θ_1 and θ_2 is the undirected edge; that is, conditional on X_1, X_2, ..., X_5, the additional observation of X_6 does not affect the belief about θ_1 in an unexpected manner.

Figure 1. Partially directed acyclic graph of two-dimensional item response theory model with between-item multidimensionality.

Figure 2 shows the DAG of a two-dimensional IRT model for six items with so-called within-item multidimensionality for items 3 and 4. In this figure, the paths θ_1 → X_3 ← θ_2 and θ_1 → X_4 ← θ_2 are so-called inverted forks and contain the first and foremost step of explaining what happens in the MIRT paradox. These paths between θ_1 and θ_2 are not blocked by conditioning on X_3 and X_4 because the edges on these paths meet head to head. Therefore θ_1 and θ_2 are not d-separated by X_3 and X_4, and conditional independence between θ_1 and θ_2 given X_3 and X_4 is not implied. We note that this kind of conditional independence is different from that typically used in IRT because we condition here on observed variables instead of on unobserved variables. Now, even if θ_1 and θ_2 are independent a priori, they become dependent when we condition on X_1, ..., X_5. Furthermore, the observation of X_6 can affect the belief about θ_1 in an unanticipated fashion. This phenomenon, counterintuitive at first sight, is called the explaining-away effect. We refrain from giving substantive examples to be concise and because intuitive examples of this phenomenon are described by many authors (e.g., Berkson, 1946; Bishop, 2006, p. 378; Hooker & Finkelman, 2010, p. 251; Pearl, 2009, p. 17).

Figure 2. Directed acyclic graph of two-dimensional item response theory model with within-item structure.

We emphasize that this explaining-away phenomenon can arise as long as there is at least one inverted fork on the paths between θ_1 and θ_2 through X_1, X_2, ..., X_5 and that it does not depend on the particular relation of θ_1 and θ_2 with X_6. We illustrate this by two other instances of the phenomenon. The first case is illustrated in Figure 3, in which the focal sixth variable is not an item response but the variable gender, where gender is related to θ_2. Obviously, observing gender changes the belief about θ_2, but the belief about θ_1 can be affected in an unexpected manner owing to the inverted forks. Again, this dependency can arise when θ_1 and θ_2 are a priori independent and when θ_1 is unrelated to gender (as is the case in Figure 3).
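The d-separation statements made for the within-item model can be checked mechanically. The sketch below is one way to do so, not a procedure taken from the report: it uses the standard criterion that two nodes are d-separated by a conditioning set exactly when they are disconnected in the moralized ancestral graph after the conditioning nodes are removed. The graph encoding (a dictionary from each node to its set of parents), the function names, and the assignment of items 1 and 2 to θ_1 and items 5 and 6 to θ_2 in `FIGURE_2` are assumptions made for illustration; only the double loading of items 3 and 4 is stated in the text.

```python
from itertools import combinations


def ancestors(dag, nodes):
    """Return `nodes` together with all of their ancestors.

    `dag` maps each node to the set of its parents; root nodes may be absent.
    """
    seen, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(dag.get(node, ()))
    return seen


def d_separated(dag, x, y, given=()):
    """Check whether x and y are d-separated by `given` in `dag`."""
    given = set(given)
    keep = ancestors(dag, {x, y} | given)
    # Moralize the ancestral subgraph: connect each node to its parents and
    # "marry" every pair of parents that share a child.
    neighbours = {node: set() for node in keep}
    for child in keep:
        parents = dag.get(child, set())
        for parent in parents:
            neighbours[child].add(parent)
            neighbours[parent].add(child)
        for p, q in combinations(parents, 2):
            neighbours[p].add(q)
            neighbours[q].add(p)
    # x and y are d-separated iff removing `given` disconnects them.
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False
        if node in seen or node in given:
            continue
        seen.add(node)
        stack.extend(neighbours[node] - given)
    return True


# Assumed edge set for the within-item model of Figure 2.
FIGURE_2 = {
    "X1": {"theta1"}, "X2": {"theta1"},
    "X3": {"theta1", "theta2"}, "X4": {"theta1", "theta2"},
    "X5": {"theta2"}, "X6": {"theta2"},
}
```

With this encoding, `d_separated(FIGURE_2, "theta1", "theta2")` holds (the latent variables are independent a priori), whereas `d_separated(FIGURE_2, "theta1", "theta2", {"X3", "X4"})` does not, and neither does `d_separated(FIGURE_2, "theta1", "X6", {"X1", "X2", "X3", "X4", "X5"})`. The last check is the graphical counterpart of the statement that observing X_6 can change the belief about θ_1.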
The example in Figure 3 is particularly interesting because many applications of multidimensional IRT models with background variables are found in large-scale assessments such as the Programme for International Student Assessment (PISA; Adams, Wilson, & Wang, 1997) and the National Assessment of Educational Progress (NAEP; Mislevy, 1985). (However, we note that the current MIRT models in PISA and NAEP have a between-item structure, as in Figure 1.) A second instance can be constructed when we relate gender to an item response variable instead of to a latent variable. This situation is given in Figure 4, where gender-related differential item functioning (DIF) appears on the fifth item. Observing gender affects the belief about θ_2 through X_5 as well as the belief about θ_1 because of the inverted forks. To reiterate, paradoxical results in all these instances are not to be attributed to the focal sixth variable but to the inverted forks in other parts of the model.

Figure 3. Directed acyclic graph of two-dimensional item response theory model with within-item structure and relation between gender and θ_2.

Figure 4. Directed acyclic graph of two-dimensional item response theory model with within-item structure and gender-related differential item functioning for X_5.
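A small numerical sketch may help to make the role of the inverted forks concrete. Everything specific in it is an assumption chosen for illustration and not taken from the report: a compensatory logistic response function with unit loadings and zero difficulties, independent standard normal priors approximated on a coarse grid, and an arbitrary response pattern for the first five items. The function compares the posterior mean of θ_1 after a correct and after an incorrect response on the sixth item, which loads on θ_2 only.

```python
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical loadings (a1, a2) on (theta1, theta2); all difficulties are 0.
WITHIN_ITEM = [(1, 0), (1, 0), (1, 1), (1, 1), (0, 1), (0, 1)]   # items 3 and 4 load on both
BETWEEN_ITEM = [(1, 0), (1, 0), (1, 0), (0, 1), (0, 1), (0, 1)]  # simple structure

GRID = [-4.0 + 0.25 * k for k in range(33)]  # ability grid from -4 to 4


def posterior_mean_theta1(loadings, responses):
    """Posterior mean of theta1 given 0/1 responses.

    Uses independent standard normal priors for theta1 and theta2,
    approximated by point masses on GRID.
    """
    total = weighted = 0.0
    for t1 in GRID:
        for t2 in GRID:
            weight = math.exp(-0.5 * (t1 * t1 + t2 * t2))  # unnormalized prior
            for (a1, a2), x in zip(loadings, responses):
                p = logistic(a1 * t1 + a2 * t2)
                weight *= p if x == 1 else 1.0 - p
            total += weight
            weighted += t1 * weight
    return weighted / total


first_five = [1, 0, 1, 0, 1]  # arbitrary responses to items 1 to 5
within_correct = posterior_mean_theta1(WITHIN_ITEM, first_five + [1])
within_incorrect = posterior_mean_theta1(WITHIN_ITEM, first_five + [0])
between_correct = posterior_mean_theta1(BETWEEN_ITEM, first_five + [1])
between_incorrect = posterior_mean_theta1(BETWEEN_ITEM, first_five + [0])
```

Under these assumptions, `within_correct` is smaller than `within_incorrect`: the correct response on item 6 raises the belief about θ_2, which in turn explains away part of the evidence that items 3 and 4 provide for θ_1. For the simple-structure loadings with independent priors, `between_correct` and `between_incorrect` coincide up to rounding, because no inverted fork connects the two latent variables. Note that this contrast uses independent priors, whereas Figure 1 includes an edge between θ_1 and θ_2.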
Hooker and Finkelman (2010) considered the MIRT paradox in models for item bundles. They focused on two models: the bifactor model and the testlet model. In the bifactor model, every item loads on a general dimension and on an item bundle dimension. Hooker and Finkelman discussed two cases, one in which all latent variables are assumed to be independent and one in which the item bundle dimensions are correlated. Independent latent variables are typically assumed to identify the bifactor model, which is the situation that we consider. An example of the bifactor model is represented in a DAG in Figure 5. Hooker and Finkelman consider a result to be paradoxical if answering an additional item (X_6) correctly results in a lower estimate for the general ability (θ_1) than when the additional item is answered incorrectly. From Figure 5, it follows that θ_1 and θ_3 are not d-separated by the observed item responses; that is, there are paths between θ_1 and θ_3 that contain an inverted fork (in fact, all paths do). Hence the explaining-away phenomenon can occur, and paradoxical results are possible for this bifactor model.

Hooker and Finkelman (2010) derived mathematically the specific conditions under which paradoxical results occur for the more general bifactor model. From their mathematical derivations, it follows that paradoxical results are not possible when the loadings of the bifactor model are restricted according to the so-called testlet model (a testlet model is a restricted bifactor model; see Rijmen, 2010). The fact that paradoxical results cannot occur for the testlet model (with independent nuisance dimensions) can be shown directly by looking at the corresponding DAG, alleviating the need for mathematical derivations. First, one should realize that the testlet model is a Schmid-Leiman transformed second-order model (see, e.g., Yung, Thissen, & McLeod, 1999). Then, the conditional independence relations can be observed from the DAG of the equivalent second-order model, which is presented in Figure 6. In this figure, it is easily seen that θ_1 and θ_3 are always dependent because the path from θ_1 to θ_3 has a directed edge. However, θ_1 is independent of X_4, X_5, and X_6 conditional on θ_3; that is, conditional on θ_3, the observation of X_6 does not change the belief about θ_1 in an unexpected manner. Therefore, as long as monotonicity holds, paradoxical results cannot occur in this case.

Figure 5. Directed acyclic graph of bifactor three-dimensional item response theory model.

Figure 6. Directed acyclic graph of second-order (or testlet) three-dimensional item response theory model.

2 Concluding Remarks

We have shown that the MIRT paradox described by Hooker et al. (2009) is an instance of the explaining-away phenomenon. Specifically, the so-called inverted fork in the path between latent variables is the main cause of the phenomenon. In many of the MIRT paradox papers, intuitions are built up from an educational measurement perspective, which causes the result to be surprising. However, we made use of the frameworks of graphical models and Bayesian networks in which this phenomenon is well established. We chose these frameworks because the conditional dependencies between the variables in a specific model can be derived directly from its graph, independent of different parameterizations and link functions.

The work of Hooker et al. (2009) is nevertheless to be lauded because they described the exact mechanics of the paradox in MIRT in great detail. We disagree, however, with the somewhat pessimistic conclusions of Jordan and Spiess (2012) and Van der Linden (2012) on the usefulness of MIRT models. The MIRT paradox is a general statistical paradox that holds for many models with multiple competing explanatory variables and is accepted in many contexts other than psychometrics such as biostatistics and artificial intelligence. We find that the issue of test fairness raised by Hooker et al. (2009) and Jordan and Spiess (2012) results from confounding different views on the purpose of tests. For example, Holland (1994) distinguished between tests as contests and tests as measurement. The contest view can result in a firm belief that more items correct should result in a higher score, a feature that nevertheless pertains to relatively few IRT models (Van der Linden, 2012).
In the measurement view, model selection is perhaps the most important issue so that test-based inferences are sound. A third view on tests, raised by Mislevy (1994), suggests that tests can be used as sources of information for evidentiary reasoning about students, for example, as in models for cognitive diagnosis. Preventing paradoxical results might be relevant in the contest perspective on tests, but we argue that it is less relevant in the latter two perspectives on the purposes of educational tests.

References

Adams, R. J., Wilson, M., & Wang, W.-C. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21.
Berkson, J. (1946). Limitations of the application of fourfold tables to hospital data. Biometrics Bulletin, 2.
Bishop, C. M. (2006). Pattern recognition and machine learning. New York: Springer.
Finkelman, M., Hooker, G., & Wang, J. (2010). Prevalence and magnitude of paradoxical results in multidimensional item response theory. Journal of Educational and Behavioral Statistics, 35.
Holland, P. W. (1994). Measurements or contests? Comments on Zwick, Bond and Allen/Donoghue. In Proceedings of the Social Statistics Section of the American Statistical Association (pp ). Alexandria, VA: American Statistical Association.
Holland, P. W., & Rosenbaum, P. R. (1986). Conditional association and unidimensionality in monotone latent variable models. Annals of Statistics, 14.
Hooker, G. (2010). On separable tests, correlated priors, and paradoxical results in multidimensional item response theory. Psychometrika, 75.
Hooker, G., & Finkelman, M. (2010). Paradoxical results and item bundles. Psychometrika, 75.
Hooker, G., Finkelman, M., & Schwartzman, A. (2009). Paradoxical results in multidimensional item response theory. Psychometrika, 74.
Jordan, P., & Spiess, M. (2012). Generalizations of paradoxical results in multidimensional item response theory. Psychometrika, 77.
Lord, F. M. (1962). Cutting scores and errors of measurement. Psychometrika, 27.
Mellenbergh, G. J. (1994). Generalized linear item response theory. Psychological Bulletin, 115.
Mislevy, R. J. (1985). Estimating latent group effects. Journal of the American Statistical Association, 80.
Mislevy, R. J. (1994). Evidence and inference in educational assessment. Psychometrika, 59.
Pearl, J. (2009). Causality: Models, reasoning, and inference (2nd ed.). New York: Cambridge University Press.
Rijmen, F. (2010). Formal relations and an empirical comparison among the bi-factor, the testlet, and a second-order multidimensional IRT model. Journal of Educational Measurement, 47.
Rijmen, F., Tuerlinckx, F., De Boeck, P., & Kuppens, P. (2003). A nonlinear mixed model framework for item response theory. Psychological Methods, 8.
Suppes, P., & Zanotti, M. (1981). When are probabilistic explanations possible? Synthese, 48.
Van der Linden, W. J. (2012). On compensation in multidimensional response modeling. Psychometrika, 77.
Wellman, M. P., & Henrion, M. (1993). Explaining explaining away. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15.
Williamson, J. (2005). Bayesian nets and causality. Oxford: Oxford University Press.
Yung, Y.-F., Thissen, D., & McLeod, L. D. (1999). On the relationship between the higher-order factor model and the hierarchical factor model. Psychometrika, 64.

Linking Errors in Trend Estimation in Large-Scale Surveys: A Case Study

Linking Errors in Trend Estimation in Large-Scale Surveys: A Case Study Research Report Linking Errors in Trend Estimation in Large-Scale Surveys: A Case Study Xueli Xu Matthias von Davier April 2010 ETS RR-10-10 Listening. Learning. Leading. Linking Errors in Trend Estimation

More information

An Investigation of the Efficacy of Criterion Refinement Procedures in Mantel-Haenszel DIF Analysis

An Investigation of the Efficacy of Criterion Refinement Procedures in Mantel-Haenszel DIF Analysis Research Report ETS RR-13-16 An Investigation of the Efficacy of Criterion Refinement Procedures in Mantel-Haenszel Analysis Rebecca Zwick Lei Ye Steven Isham September 2013 ETS Research Report Series

More information

Technical Specifications

Technical Specifications Technical Specifications In order to provide summary information across a set of exercises, all tests must employ some form of scoring models. The most familiar of these scoring models is the one typically

More information

Supplement 2. Use of Directed Acyclic Graphs (DAGs)

Supplement 2. Use of Directed Acyclic Graphs (DAGs) Supplement 2. Use of Directed Acyclic Graphs (DAGs) Abstract This supplement describes how counterfactual theory is used to define causal effects and the conditions in which observed data can be used to

More information

Test Reliability Basic Concepts

Test Reliability Basic Concepts Research Memorandum ETS RM 18-01 Test Reliability Basic Concepts Samuel A. Livingston January 2018 ETS Research Memorandum Series EIGNOR EXECUTIVE EDITOR James Carlson Principal Psychometrician ASSOCIATE

More information

André Cyr and Alexander Davies

More information

T. Kushnir & A. Gopnik (2005 ). Young children infer causal strength from probabilities and interventions. Psychological Science 16 (9):

T. Kushnir & A. Gopnik (2005 ). Young children infer causal strength from probabilities and interventions. Psychological Science 16 (9): Probabilities and Interventions 1 Running Head: PROBABILITIES AND INTERVENTIONS T. Kushnir & A. Gopnik (2005 ). Young children infer causal strength from probabilities and interventions. Psychological

More information

International Journal of Software and Web Sciences (IJSWS)

International Journal of Software and Web Sciences (IJSWS) International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) ISSN (Print): 2279-0063 ISSN (Online): 2279-0071 International

More information