A Note on Explaining Away and Paradoxical Results in Multidimensional Item Response Theory



Research Report ETS RR

Peter W. van Rijn
Frank Rijmen

August 2012

ETS Research Report Series

EIGNOR EXECUTIVE EDITOR
James Carlson, Principal Psychometrician

ASSOCIATE EDITORS
Brent Bridgeman, Distinguished Presidential Appointee
Marna Golub-Smith, Principal Psychometrician
Shelby Haberman, Distinguished Presidential Appointee
Donald Powers, Managing Principal Research Scientist
John Sabatini, Managing Principal Research Scientist
Joel Tetreault, Managing Research Scientist
Matthias von Davier, Director, Research
Xiaoming Xi, Director, Research
Rebecca Zwick, Distinguished Presidential Appointee

PRODUCTION EDITORS
Kim Fryer, Manager, Editing Services
Ruth Greenwood, Editor

Since its 1947 founding, ETS has conducted and disseminated scientific research to support its products and services, and to advance the measurement and education fields. In keeping with these goals, ETS is committed to making its research freely available to the professional community and to the general public. Published accounts of ETS research, including papers in the ETS Research Report series, undergo a formal peer-review process by ETS staff to ensure that they meet established scientific and professional standards. All such ETS-conducted peer reviews are in addition to any reviews that outside organizations may provide as part of their own publication processes.

The Daniel Eignor Editorship is named in honor of Dr. Daniel R. Eignor, who from 2001 until 2011 served the Research and Development division as Editor for the ETS Research Report series. The Eignor Editorship has been created to recognize the pivotal leadership role that Dr. Eignor played in the research publication process at ETS.

Peter W. van Rijn and Frank Rijmen
ETS, Princeton, New Jersey

As part of its nonprofit mission, ETS conducts and disseminates the results of research to advance quality and equity in education and assessment for the benefit of ETS's constituents and the field.

To obtain a PDF or a print copy of a report, please visit:

Associate Editor: Matthias von Davier
Reviewers: Sandip Sinharay and Frederic Robin

Copyright 2012 by Educational Testing Service. All rights reserved. ETS, the ETS logo, and LISTENING. LEARNING. LEADING. are registered trademarks of Educational Testing Service (ETS).

Abstract

Hooker and colleagues addressed a paradoxical situation that can arise in the application of multidimensional item response theory (MIRT) models to educational test data. We demonstrate that this MIRT paradox is an instance of the explaining-away phenomenon in Bayesian networks, and we attempt to enhance the understanding of MIRT models by placing the paradox in a broader statistical modeling perspective.

Key words: multidimensional IRT, paradoxical results, explaining away, Bayesian networks

Acknowledgments

The authors would like to thank Matthias von Davier, Shelby Haberman, and Bob Mislevy for helpful comments.

Hooker, Finkelman, and Schwartzman (2009) addressed a paradoxical situation that can arise in the application of multidimensional item response theory (MIRT) models to educational test data. In short, a correct response on an additional item can lead to a lower estimate for one of the latent ability variables, whereas an incorrect response can lead to a higher estimate (Van der Linden, 2012). Hooker et al. (2009) argued that this is unfair to test takers. Various manifestations, generalizations, and implications of the paradox have been studied by a number of authors over the past few years (Finkelman, Hooker, & Wang, 2010; Hooker, 2010; Hooker & Finkelman, 2010; Jordan & Spiess, 2012; Van der Linden, 2012).

This paradoxical situation is related to the explaining-away phenomenon in Bayesian networks (Pearl, 2009; Wellman & Henrion, 1993), which in statistics is known as Berkson's paradox (Berkson, 1946). In this report, we demonstrate that the MIRT paradox is an instance of this phenomenon, and we attempt to enhance the understanding of MIRT models by placing the paradox in a broader statistical modeling perspective, namely, that of graphical models and Bayesian networks (Mislevy, 1994; Pearl, 2009; Williamson, 2005). These frameworks provide a shorthand for the probabilistic relationships of interest and can help in understanding the properties of these relationships. We discuss a small number of MIRT modeling examples in these frameworks, illustrating the relation between the MIRT paradox and the explaining-away phenomenon, and we end with some concluding remarks.
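The explaining-away phenomenon can already be seen in the smallest possible Bayesian network. The following Python sketch is a hypothetical toy example, not taken from the report: two independent binary causes A and B, each present with probability 0.3 (an arbitrary choice), and an effect C that occurs whenever at least one cause is present, giving the inverted fork A → C ← B. Exact enumeration shows that A and B are independent a priori, but once C has been observed, additionally learning that B is present lowers the belief in A.

```python
from itertools import product

# Hypothetical toy network (not from the report): A -> C <- B,
# with A and B independent and C = A OR B (deterministic).
P_A = 0.3
P_B = 0.3


def joint(a: int, b: int, c: int) -> float:
    """Joint probability Pr(A=a, B=b, C=c)."""
    pa = P_A if a else 1 - P_A
    pb = P_B if b else 1 - P_B
    c_implied = 1 if (a or b) else 0
    return pa * pb * (1.0 if c == c_implied else 0.0)


def prob_a(evidence: dict) -> float:
    """Pr(A=1 | evidence) by brute-force enumeration of all 8 states."""
    num = den = 0.0
    for a, b, c in product((0, 1), repeat=3):
        state = {"A": a, "B": b, "C": c}
        if any(state[k] != v for k, v in evidence.items()):
            continue  # state is inconsistent with the evidence
        p = joint(a, b, c)
        den += p
        if a == 1:
            num += p
    return num / den


p_prior = prob_a({})                   # Pr(A=1)
p_given_b = prob_a({"B": 1})           # Pr(A=1 | B=1): unchanged, A and B independent
p_given_c = prob_a({"C": 1})           # Pr(A=1 | C=1): raised by the effect
p_given_cb = prob_a({"C": 1, "B": 1})  # Pr(A=1 | C=1, B=1): B "explains away" C
print(f"{p_prior:.3f} {p_given_b:.3f} {p_given_c:.3f} {p_given_cb:.3f}")
# prints: 0.300 0.300 0.588 0.300
```

Observing B alone leaves the belief in A at 0.3, but once C is known, observing B drops it from about 0.588 back to 0.3: the two causes compete to explain the same evidence.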
1 Examples

In the following examples, we adhere to parametric IRT in the framework of generalized nonlinear mixed models (Mellenbergh, 1994; Rijmen, Tuerlinckx, De Boeck, & Kuppens, 2003), and we make additional assumptions only as needed; in particular, we do not make assumptions about the types of items (continuous or discrete; dichotomous or polytomous), the types of latent variables (continuous or discrete), or the response functions (linear, normal, or logistic). We assume that both item response variables and latent variables are random and that item response variables can be observed, whereas latent variables cannot. (Because we make as few assumptions as possible, standard linear factor models are included here as well.)

An important assumption in both unidimensional and multidimensional IRT models is monotonicity. Monotonicity requires the probabilities for the item variables to be strictly increasing or decreasing in each latent variable, and MIRT models are monotone if and only if the latent variables are compensatory (Holland & Rosenbaum, 1986; Van der Linden, 2012). Strictly speaking, we do not need to make the monotonicity assumption, but without it a unidimensional IRT model for which local independence holds can always be specified for a set of item variables (Suppes & Zanotti, 1981), which would make the distinction between unidimensional and multidimensional models vacuous. Therefore, we keep the assumption of monotonicity and illustrate other assumptions, such as local independence, through the examples.

In all our examples, we use six items to keep things simple yet nontrivial. Furthermore, we assume that the first five items have already been observed, so that the sixth item is always the focal additional item that possibly creates the paradoxical situation.

Figure 1 displays a partially directed acyclic graph (DAG) of a MIRT model with two latent variables, θ₁ and θ₂, and six item response variables, X₁, X₂, ..., X₆. (It is called partially directed because not all the lines in the graph have arrowheads. A partially directed acyclic graph is also referred to as a chain graph.) This model is said to have simple structure, and it is also referred to as a between-item two-dimensional IRT model, because every item response variable is linked to a single latent variable only. In the graph, the nodes correspond to random variables, and the directed edges represent conditional dependency relations. An advantage of using graphical models is that there is a correspondence between the separation of nodes in the graph and the conditional independence of the corresponding random variables in the statistical model.
For example, the path X₁ ← θ₁ → X₂ in Figure 1 illustrates an instance of so-called d-separation (Pearl, 2009); that is, the only path from X₁ to X₂ runs through θ₁, and the arrows do not meet head to head at θ₁. The fact that X₁ and X₂ are d-separated by θ₁ in the graph implies that they are conditionally independent given θ₁. We can generalize this to all six items in the example and obtain the familiar IRT assumption of local independence: the joint probability of X₁, X₂, ..., X₆ conditional on θ₁ and θ₂ can be written as a simple product:

$$\Pr(X_1, X_2, \ldots, X_6 \mid \theta_1, \theta_2) = \prod_{j=1}^{3} \Pr(X_j \mid \theta_1) \prod_{j=4}^{6} \Pr(X_j \mid \theta_2).$$

Because of the correspondence between d-separation and conditional independence, it is possible to determine all conditional independence relations entailed by the model solely by working with the graph.

Now, the MIRT paradox revolves around the beliefs about θ₁ and θ₂ in different situations. In describing the paradox, Hooker et al. (2009) seem to condition implicitly on X₁, X₂, ..., X₅ throughout. Keeping this in mind, the MIRT paradox cannot arise for the model in Figure 1, because the only path between θ₁ and θ₂ is the undirected edge; that is, conditional on X₁, X₂, ..., X₅, the additional observation of X₆ does not affect the belief about θ₁ in an unexpected manner.

Figure 1. Partially directed acyclic graph of two-dimensional item response theory model with between-item multidimensionality.

Figure 2 shows the DAG of a two-dimensional IRT model for six items with so-called within-item multidimensionality for items 3 and 4. In this figure, the paths θ₁ → X₃ ← θ₂ and θ₁ → X₄ ← θ₂ are so-called inverted forks, and they contain the first and most important step in explaining what happens in the MIRT paradox. These paths between θ₁ and θ₂ are not blocked by X₃ and X₄, because the edges on these paths meet head to head there: conditioning on a head-to-head node opens, rather than blocks, the path through it.
Therefore, θ₁ and θ₂ are not d-separated by X₃ and X₄, and conditional independence between θ₁ and θ₂ given X₃ and X₄ is not implied. We note that this kind of conditional independence is different from that typically used in IRT, because here we condition on observed variables instead of on unobserved variables. Now, even if θ₁ and θ₂ are independent a priori, they become dependent when we condition on X₁, ..., X₅. Furthermore, the observation of X₆ can affect the belief about θ₁ in an unanticipated fashion. This phenomenon, counterintuitive at first sight, is called the explaining-away effect. We refrain from giving substantive examples, to be concise and because intuitive examples of this phenomenon are described by many authors (e.g., Berkson, 1946; Bishop, 2006, p. 378; Hooker & Finkelman, 2010, p. 251; Pearl, 2009, p. 17).

Figure 2. Directed acyclic graph of two-dimensional item response theory model with within-item structure.

We emphasize that this explaining-away phenomenon can arise as long as there is at least one inverted fork on the paths between θ₁ and θ₂ through X₁, X₂, ..., X₅; it does not depend on the particular relation of θ₁ and θ₂ with X₆. We illustrate this with two other instances of the phenomenon. The first case is illustrated in Figure 3, in which the focal sixth variable is not an item response but the variable gender, where gender is related to θ₂. Obviously, observing gender changes the belief about θ₂, but the belief about θ₁ can be affected in an unexpected manner owing to the inverted forks. Again, this dependency can arise when θ₁ and θ₂ are a priori independent and when θ₁ is unrelated to gender (as is the case in Figure 3).

Figure 3. Directed acyclic graph of two-dimensional item response theory model with within-item structure and relation between gender and θ₂.
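Because d-separation is a purely graphical criterion, the separation claims for Figures 2 and 3 can be checked mechanically. The following Python sketch is illustrative only: the parent lists are an assumed reading of the two figures (X₁ and X₂ on θ₁, X₃ and X₄ on both dimensions, X₅ and X₆ on θ₂, and, in Figure 3, an arrow from gender to θ₂ in place of X₆), since the figures themselves are not reproduced here. It tests d-separation with the moralized ancestral graph criterion, which is equivalent to the path-blocking definition for DAGs; the function and variable names are hypothetical.

```python
from itertools import combinations


def ancestral_set(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def d_separated(parents, xs, ys, zs):
    """True if xs and ys are d-separated by zs in the DAG given by `parents`.

    Moralization criterion: keep the ancestral set of xs | ys | zs, marry
    parents that share a child (inverted forks), drop edge directions,
    delete zs, and check whether any undirected path still joins xs and ys.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestral_set(parents, xs | ys | zs)
    adj = {v: set() for v in keep}
    for child in keep:
        pas = list(parents.get(child, ()))
        for p in pas:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(pas, 2):  # marry co-parents
            adj[p].add(q)
            adj[q].add(p)
    frontier = list(xs - zs)
    seen = set(frontier)
    while frontier:
        v = frontier.pop()
        if v in ys:
            return False
        for w in adj[v] - zs - seen:
            seen.add(w)
            frontier.append(w)
    return True


# Assumed reading of Figure 2 (within-item structure for X3 and X4).
FIG2 = {"X1": ["theta1"], "X2": ["theta1"], "X3": ["theta1", "theta2"],
        "X4": ["theta1", "theta2"], "X5": ["theta2"], "X6": ["theta2"]}
# Assumed reading of Figure 3: X6 replaced by gender, with gender -> theta2.
FIG3 = {k: v for k, v in FIG2.items() if k != "X6"}
FIG3["theta2"] = ["gender"]

FIRST_FIVE = {"X1", "X2", "X3", "X4", "X5"}
checks = {
    "X1 _||_ X2 | theta1": d_separated(FIG2, {"X1"}, {"X2"}, {"theta1"}),
    "theta1 _||_ theta2": d_separated(FIG2, {"theta1"}, {"theta2"}, set()),
    "theta1 _||_ theta2 | X3, X4": d_separated(FIG2, {"theta1"}, {"theta2"}, {"X3", "X4"}),
    "theta1 _||_ X6 | X1..X5": d_separated(FIG2, {"theta1"}, {"X6"}, FIRST_FIVE),
    "theta1 _||_ gender": d_separated(FIG3, {"theta1"}, {"gender"}, set()),
    "theta1 _||_ gender | X1..X5": d_separated(FIG3, {"theta1"}, {"gender"}, FIRST_FIVE),
}
for claim, holds in checks.items():
    print(f"{claim}: {holds}")
```

With this reading of the figures, the script reports True for local independence of X₁ and X₂ given θ₁ and for the two a priori independence checks, and False for the three checks that condition on observed items: the inverted forks at X₃ and X₄ link θ₁ to θ₂, and through θ₂ to X₆ or to gender.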
The gender example in Figure 3 is particularly interesting because many applications of multidimensional IRT models with background variables are found in large-scale assessments such as the Programme for International Student Assessment (PISA; Adams, Wilson, & Wang, 1997) and the National Assessment of Educational Progress (NAEP; Mislevy, 1985). (However, we note that the current MIRT models in PISA and NAEP have a between-item structure, as in Figure 1.) A second instance can be constructed when we relate gender to an item response variable instead of to a latent variable. This situation is given in Figure 4, where gender-related differential item functioning (DIF) appears on the fifth item. Observing gender affects the belief about θ₂ through X₅, as well as the belief about θ₁ because of the inverted forks. To reiterate, paradoxical results in all these instances are not to be attributed to the focal sixth variable but to the inverted forks in other parts of the model.

Figure 4. Directed acyclic graph of two-dimensional item response theory model with within-item structure and gender-related differential item functioning for X₅.
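The direction of these effects can also be illustrated numerically. The Python sketch below is a hypothetical illustration, not a computation from the report. It assumes compensatory logistic items with unit loadings and zero difficulties (items 3 and 4 loading on both dimensions in the within-item model), standard normal priors (with correlation 0.5 standing in for the undirected edge of Figure 1), a gender effect of 0.5 on either the prior mean of θ₂ (Figure 3) or the logit of X₅ (Figure 4), and a made-up response pattern for X₁, ..., X₅. The posterior mean of θ₁ is computed on a grid, with the likelihood taken as the local-independence product over items.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Quadrature grid shared by theta1 and theta2 (an arbitrary choice).
GRID = [-5.0 + 0.125 * k for k in range(81)]

# Hypothetical loadings (a_j1, a_j2) of compensatory logistic items.
BETWEEN = [(1, 0), (1, 0), (1, 0), (0, 1), (0, 1), (0, 1)]  # Figure 1
WITHIN = [(1, 0), (1, 0), (1, 1), (1, 1), (0, 1), (0, 1)]   # Figures 2-4
FIRST_FIVE = [1, 0, 1, 1, 0]  # made-up responses to X1..X5
DELTA = 0.5                   # made-up size of the gender effect


def eap_theta1(loadings, responses, prior, shifts=None):
    """Grid posterior mean of theta1.

    Pr(X_j = 1 | theta) = sigmoid(a_j1*theta1 + a_j2*theta2 + shift_j); by local
    independence, the likelihood is the product over the observed items.
    """
    shifts = shifts or [0.0] * len(responses)
    num = den = 0.0
    for t1 in GRID:
        for t2 in GRID:
            w = prior(t1, t2)
            for (a1, a2), c, x in zip(loadings, shifts, responses):
                p = sigmoid(a1 * t1 + a2 * t2 + c)
                w *= p if x == 1 else 1.0 - p
            num += w * t1
            den += w
    return num / den


def normal_prior(rho=0.0, mean2=0.0):
    """Unnormalized bivariate normal prior: unit variances, correlation rho,
    mean 0 for theta1 and mean2 for theta2."""
    def prior(t1, t2):
        d2 = t2 - mean2
        q = (t1 * t1 - 2.0 * rho * t1 * d2 + d2 * d2) / (1.0 - rho * rho)
        return math.exp(-0.5 * q)
    return prior


results = {
    # Figures 1 and 2: the focal variable is X6, answered 0 or 1.
    "fig1": {x: eap_theta1(BETWEEN, FIRST_FIVE + [x], normal_prior(rho=0.5)) for x in (0, 1)},
    "fig2": {x: eap_theta1(WITHIN, FIRST_FIVE + [x], normal_prior()) for x in (0, 1)},
    # Figure 3: the focal variable is gender g, which shifts the prior mean of theta2.
    "fig3": {g: eap_theta1(WITHIN[:5], FIRST_FIVE, normal_prior(mean2=DELTA * g)) for g in (0, 1)},
    # Figure 4: gender makes X5 easier (uniform DIF); the priors ignore gender.
    "fig4": {g: eap_theta1(WITHIN[:5], FIRST_FIVE, normal_prior(), [0.0] * 4 + [DELTA * g])
             for g in (0, 1)},
}
for name, est in results.items():
    print(name, {k: round(v, 3) for k, v in est.items()})
```

Under these assumptions, a correct response to X₆ raises the posterior mean of θ₁ in the between-item model of Figure 1 but lowers it in the within-item model of Figure 2. In Figure 3, observing the gender value with the higher θ₂ mean lowers the posterior mean of θ₁; in Figure 4, observing the gender value for which X₅ is easier raises it (X₅ was answered incorrectly in the made-up pattern), even though θ₁ has no edge to gender in either graph.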
Hooker and Finkelman (2010) considered the MIRT paradox in models for item bundles. They focused on two models: the bifactor model and the testlet model. In the bifactor model, every item loads on a general dimension and on an item bundle dimension. Hooker and Finkelman (2010) discussed two cases, one in which all latent variables are assumed to be independent and one in which the item bundle dimensions are correlated. Independent latent variables are typically assumed to identify the bifactor model, which is the situation that we consider. An example of the bifactor model is represented in a DAG in Figure 5. Hooker and Finkelman consider a result to be paradoxical if answering an additional item (X₆) correctly results in a lower estimate for the general ability (θ₁) than answering it incorrectly. From Figure 5, it follows that θ₁ and θ₃ are not d-separated given X₁, ..., X₅; that is, there are paths between θ₁ and θ₃ that contain an inverted fork at an observed item (in fact, all paths do). Hence, the explaining-away phenomenon can occur, and paradoxical results are possible for this bifactor model.

Hooker and Finkelman (2010) derived mathematically the specific conditions under which paradoxical results occur for the more general bifactor model. From their derivations, it follows that paradoxical results are not possible when the loadings of the bifactor model are restricted according to the so-called testlet model (a testlet model is a restricted bifactor model; see Rijmen, 2010). The fact that paradoxical results cannot occur for the testlet model (with independent nuisance dimensions) can be shown directly by looking at the corresponding DAG, without the need for mathematical derivations. First, one should realize that the testlet model is a Schmid-Leiman transformed second-order model (see, e.g., Yung, Thissen, & McLeod, 1999). Then, the conditional independence relations can be read from the DAG of the equivalent second-order model, which is presented in Figure 6. In this figure, it is easily seen that θ₁ and θ₃ are always dependent, because there is a directed edge from θ₁ to θ₃. However, θ₁ is independent of X₄, X₅, and X₆ conditional on θ₃; that is, conditional on θ₃, the observation of X₆ does not change the belief about θ₁ in an unexpected manner. Therefore, as long as monotonicity holds, paradoxical results cannot occur in this case.

Figure 5. Directed acyclic graph of bifactor three-dimensional item response theory model.

Figure 6. Directed acyclic graph of second-order (or testlet) three-dimensional item response theory model.

2 Concluding Remarks

We have shown that the MIRT paradox described by Hooker et al. (2009) is an instance of the explaining-away phenomenon. Specifically, the so-called inverted fork in the path between latent variables is the main cause of the phenomenon. In many of the MIRT paradox papers, intuitions are built up from an educational measurement perspective, which causes the result to be surprising. However, we made use of the frameworks of graphical models and Bayesian networks, in which this phenomenon is well established. We chose these frameworks because the conditional dependencies between the variables in a specific model can be derived directly from its graph, independent of different parameterizations and link functions.

The work of Hooker et al. (2009) is nevertheless to be lauded because they described the exact mechanics of the paradox in MIRT in great detail. We disagree, however, with the somewhat pessimistic conclusions of Jordan and Spiess (2012) and Van der Linden (2012) on the usefulness of MIRT models. The MIRT paradox is a general statistical paradox that holds for many models with multiple competing explanatory variables, and it is accepted in many contexts other than psychometrics, such as biostatistics and artificial intelligence. We find that the issue of test fairness raised by Hooker et al. (2009) and Jordan and Spiess (2012) results from confounding different views on the purpose of tests. For example, Holland (1994) distinguished between tests as contests and tests as measurement. The contest view can result in a firm belief that answering more items correctly should result in a higher score, a property that nevertheless holds for relatively few IRT models (Van der Linden, 2012). In the measurement view, model selection is perhaps the most important issue, so that test-based inferences are sound. A third view on tests, raised by Mislevy (1994), holds that tests can be used as sources of information for evidentiary reasoning about students, as, for example, in models for cognitive diagnosis. Preventing paradoxical results might be relevant in the contest perspective on tests, but we argue that it is less relevant in the latter two perspectives on the purposes of educational tests.
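The graphical argument for the testlet model can also be checked numerically. The Python sketch below is a hypothetical illustration, not the authors' computation. It assumes the second-order form of Figure 6 with θ₁ → θ₂ and θ₁ → θ₃, items X₁ to X₃ on θ₂ and X₄ to X₆ on θ₃, loadings of 0.7 for both bundle factors on θ₁ (residual variances chosen so that all factors have unit variance), unit-loading logistic items with zero difficulty, and a made-up response pattern. Because θ₂ and θ₃ are conditionally independent given θ₁, each bundle factor can be summed out separately, which keeps the grid computation two-dimensional.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


GRID = [-5.0 + 0.125 * k for k in range(81)]
LAMBDA = 0.7  # hypothetical loading of theta2 and theta3 on theta1


def normal_kernel(x: float, mean: float, var: float) -> float:
    # Unnormalized normal density. The omitted constant depends only on var,
    # which is the same for every grid value of theta1, so it cancels.
    return math.exp(-0.5 * (x - mean) ** 2 / var)


def bundle_lik(t: float, responses) -> float:
    """Likelihood of one bundle's responses; every item is sigmoid(t)."""
    out = 1.0
    for x in responses:
        p = sigmoid(t)
        out *= p if x == 1 else 1.0 - p
    return out


def eap_general(bundle2, bundle3) -> float:
    """Grid posterior mean of the general factor theta1 (second-order model)."""
    resid = 1.0 - LAMBDA ** 2  # keeps Var(theta2) = Var(theta3) = 1
    num = den = 0.0
    for t1 in GRID:
        # theta2 and theta3 are conditionally independent given theta1,
        # so each bundle factor is summed out on its own.
        m2 = sum(normal_kernel(t2, LAMBDA * t1, resid) * bundle_lik(t2, bundle2) for t2 in GRID)
        m3 = sum(normal_kernel(t3, LAMBDA * t1, resid) * bundle_lik(t3, bundle3) for t3 in GRID)
        w = normal_kernel(t1, 0.0, 1.0) * m2 * m3
        num += w * t1
        den += w
    return num / den


# Made-up responses: X1..X3 form the theta2 bundle, X4..X6 the theta3 bundle.
testlet = {x6: eap_general([1, 0, 1], [1, 0, x6]) for x6 in (0, 1)}
print({k: round(v, 3) for k, v in testlet.items()})
```

Under these assumptions, the posterior mean of θ₁ after a correct X₆ exceeds that after an incorrect X₆, consistent with the observation that X₆ can reach θ₁ only through θ₃.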

References

Adams, R. J., Wilson, M., & Wang, W.-C. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21.
Berkson, J. (1946). Limitations of the application of fourfold tables to hospital data. Biometrics Bulletin, 2.
Bishop, C. M. (2006). Pattern recognition and machine learning. New York: Springer.
Finkelman, M., Hooker, G., & Wang, J. (2010). Prevalence and magnitude of paradoxical results in multidimensional item response theory. Journal of Educational and Behavioral Statistics, 35.
Holland, P. W. (1994). Measurements or contests? Comments on Zwick, Bond and Allen/Donoghue. In Proceedings of the Social Statistics Section of the American Statistical Association. Alexandria, VA: American Statistical Association.
Holland, P. W., & Rosenbaum, P. R. (1986). Conditional association and unidimensionality in monotone latent variable models. Annals of Statistics, 14.
Hooker, G. (2010). On separable tests, correlated priors, and paradoxical results in multidimensional item response theory. Psychometrika, 75.
Hooker, G., & Finkelman, M. (2010). Paradoxical results and item bundles. Psychometrika, 75.
Hooker, G., Finkelman, M., & Schwartzman, A. (2009). Paradoxical results in multidimensional item response theory. Psychometrika, 74.
Jordan, P., & Spiess, M. (2012). Generalizations of paradoxical results in multidimensional item response theory. Psychometrika, 77.
Lord, F. M. (1962). Cutting scores and errors of measurement. Psychometrika, 27.
Mellenbergh, G. J. (1994). Generalized linear item response theory. Psychological Bulletin, 115.
Mislevy, R. J. (1985). Estimating latent group effects. Journal of the American Statistical Association, 80.
Mislevy, R. J. (1994). Evidence and inference in educational assessment. Psychometrika, 59.
Pearl, J. (2009). Causality: Models, reasoning, and inference (2nd ed.). New York: Cambridge University Press.
Rijmen, F. (2010). Formal relations and an empirical comparison among the bi-factor, the testlet, and a second-order multidimensional IRT model. Journal of Educational Measurement, 47.
Rijmen, F., Tuerlinckx, F., De Boeck, P., & Kuppens, P. (2003). A nonlinear mixed model framework for item response theory. Psychological Methods, 8.
Suppes, P., & Zanotti, M. (1981). When are probabilistic explanations possible? Synthese, 48.
Van der Linden, W. J. (2012). On compensation in multidimensional response modeling. Psychometrika, 77.
Wellman, M. P., & Henrion, M. (1993). Explaining "explaining away." IEEE Transactions on Pattern Analysis and Machine Intelligence, 15.
Williamson, J. (2005). Bayesian nets and causality. Oxford: Oxford University Press.
Yung, Y.-F., Thissen, D., & McLeod, L. D. (1999). On the relationship between the higher-order factor model and the hierarchical factor model. Psychometrika, 64.
