Sensitivity of DFIT Tests of Measurement Invariance for Likert Data


Meade, A. W., & Lautenschlager, G. J. (2005, April). Sensitivity of DFIT Tests of Measurement Invariance for Likert Data. Paper presented at the 20th Annual Conference of the Society for Industrial and Organizational Psychology, Los Angeles, CA.

Sensitivity of DFIT Tests of Measurement Invariance for Likert Data

Adam W. Meade, North Carolina State University
Gary J. Lautenschlager, University of Georgia

While popular, few studies have assessed the efficacy of the Differential Functioning of Items and Tests (DFIT) methodology for assessing measurement invariance with Likert data. Monte Carlo analyses indicate a lack of sensitivity of the DFIT methodology for identifying a lack of measurement invariance under some conditions of differential functioning.

Likert scales are routinely used in educational and psychological research as measures of constructs of interest. If sound scale development procedures are followed, the resulting scale can reliably and validly measure a construct. However, if a given scale is used to make comparisons among different populations of respondents (e.g., cultures; Riordan & Vandenberg, 1994), over time in longitudinal measurements (Golembiewski, Billingsley, & Yeager, 1976), or across different mediums of data collection (Ployhart, Weekley, Holtz, & Kemp, 2003), measurement invariance must be established before meaningful comparisons in observed data can be made (Raju, Laffitte, & Byrne, 2002; Taris, Bok, & Meijer, 1998; Vandenberg, 2002).

Traditionally, confirmatory factor analytic (CFA) methods have been used to assess measurement invariance in Likert-type data. Recently, however, IRT methods of establishing measurement invariance for these data have gained greater acceptance. IRT methods of establishing measurement invariance typically have used the nomenclature of differential item functioning (DIF). Originally developed for identifying biased test items in dichotomous data, DIF assessments have been adapted to polytomous data. DIF is said to occur when the relationship between levels of examinees' latent trait (θ) and the probability of responses for a particular item differs between two groups (Camilli & Shepard, 1994). DIF can also be thought of as multidimensionality, or as a factor other than participants' θ level affecting item responses (Camilli & Shepard, 1994). For polytomous items, DIF can be thought of as differences in item true score functions across two groups (Cohen, Kim, & Baker, 1993).

The Differential Functioning of Items and Tests (DFIT) framework (Raju, van der Linden, & Fleer, 1995) has been advanced for assessing both DIF and differential test functioning (DTF). Although the DFIT methodology is relatively new, it has been used in several studies published in prestigious journals (e.g., Collins, Raju, & Edwards, 2000; Donovan, Drasgow, & Probst, 2000; Ellis & Mead, 2000; Facteau & Craig, 2001; Flowers, Oshima, & Raju, 1999; Maurer, Raju, & Collins, 1998; Raju et al., 2002). Articles published in these journals (including Applied Psychological Measurement, Educational and Psychological Measurement, and the Journal of Applied Psychology) can have considerable influence on researchers undertaking similar studies and lend a high level of credibility to the DFIT methodology. Despite its high-profile use in the past few years, the DFIT methodology is still under development, and few Monte Carlo studies have been published that examine the efficacy of the DFIT program for detecting DIF in Likert data.
In this study, we simulated data with known DIF in order to determine the efficacy of the DFIT program for detecting DIF and DTF.

DFIT Method Overview

The DFIT methodology developed by Raju et al. (1995) is a system designed to test for differential functioning at both the item and test level. The method provides three indices in total: an index of differential test functioning (DTF), an index of compensatory differential item functioning (CDIF), and an index of non-compensatory differential item functioning (NCDIF). However, only the NCDIF and DTF indices are used to assess the equivalence of a measure across samples. While a brief overview of these statistics is provided below, more detail and accompanying formulas are provided in Raju et al. (1995) and Flowers et al. (1999).

In order to detect overall differential functioning at the test level, two sets of expected test scores are compared. As an initial step, item parameters are estimated separately for the focal (i.e., minority) and referent (i.e., majority) groups. Next, an expected test score is computed for the focal group using item parameters estimated for that group. Additionally, an expected test score is computed for the focal group using item parameters estimated with the referent group sample. In these latter analyses, expected test scores are computed for the focal group as if they had been members of the referent group. A test is said to function differently if the expected test score for the focal group is not equal to the expected test score of the focal group estimated as though they had been members of the referent group.

The premise behind the DTF concept is that differences in item parameters for different items on the same scale can cancel out, leading to the same theta estimate at the scale level for two groups of respondents despite differing item parameter estimates for the two groups. For example, Item 2 may favor American respondents such that a given level of a latent trait, θ, is associated with a higher probability of a high observed response than for a German sample. If, however, Item 5 favors the German sample in a similar manner, the differences in item parameters may cancel out across these two items, leading to comparable estimates for these populations. Though individual items may function differently across samples, the theta estimates based on the scale as a whole should not differ for a given level of the latent trait.

The non-compensatory DIF index (NCDIF) is an item-level index and is not used to draw conclusions regarding the scale as a whole. This index is more similar to traditional indices of DIF, namely Lord's (1980) χ² and Raju's (1988) unsigned area index (Raju et al., 1995; Flowers et al., 1999). The NCDIF index assumes all items other than the item being examined are free from DIF. The index is referred to as non-compensatory because it does not take into account differential functioning of other scale items.

Though DTF and NCDIF can be evaluated by parametric χ² tests, these tests are thought to erroneously over-identify items as exhibiting DIF (Fleer, 1993, cited in Raju et al., 1995). As a result, Raju et al. recommend using cutoff values to identify DIF items. In their original 1995 article, Raju et al. recommended a cutoff value of .016 for the non-compensatory DIF (NCDIF) index for polytomous data with five response options. In the documentation accompanying the most recent version of the software, the recommended NCDIF cutoff value was .096 for data with the same properties (Raju, 2000).
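A minimal sketch of how these two indices can be computed for five-option graded response model items follows. This is our own illustrative Python, not the DFIT program's code; the array names and shapes are assumptions, and the focal and referent parameter estimates are assumed to already be on a common metric.

    import numpy as np

    def grm_expected_scores(theta, a, b):
        # Expected item scores under the graded response model.
        # theta: (n,) focal-group trait values; a: (items,) discriminations;
        # b: (items, 4) boundary thresholds for items scored 1-5.
        pstar = 1.0 / (1.0 + np.exp(-a[None, :, None]
                                    * (theta[:, None, None] - b[None, :, :])))
        # With categories scored 1-5, E[score] = 1 + the sum of the four
        # boundary response probabilities P(X >= k), k = 2..5.
        return 1.0 + pstar.sum(axis=2)

    def ncdif_dtf(theta_focal, a_foc, b_foc, a_ref, b_ref):
        # d: per-person, per-item difference in expected scores for the
        # focal group under its own vs. the referent parameter estimates.
        d = (grm_expected_scores(theta_focal, a_foc, b_foc)
             - grm_expected_scores(theta_focal, a_ref, b_ref))
        ncdif = (d ** 2).mean(axis=0)        # item level: E[d_i^2]
        dtf = (d.sum(axis=1) ** 2).mean()    # test level: E[(sum_i d_i)^2]
        return ncdif, dtf

An item would then be flagged when its NCDIF value exceeds the cutoff in use (e.g., .096).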
The DTF cutoff value is computed as the NCDIF cutoff value multiplied by the number of scale items (Raju, 2000), while the CDIF values are used to select which items might be removed from a scale in order to eliminate DTF. Thus, all three indices are affected by the change in cutoff value.

Empirical Investigation of the DFIT Methodology

Though several published studies have used the DFIT program, few published studies have examined the efficacy of the DFIT methodology with Likert data known to show a lack of measurement invariance. In the only published article that we could locate, Flowers et al. (1999) simulated Likert scale data with either 20 or 40 scale items with five response options and found adequate DIF detection with the methodology. Importantly, Flowers et al. simulated two thousand items with no DIF in order to determine the 99th percentile of the NCDIF index to guard against Type I error. As a result, they used an empirically established cutoff value of .016 for the NCDIF index, which corresponds to that originally recommended by Raju et al. (1995).

While the results of the Flowers et al. (1999) study were promising, we believe that a scale length of 20 or 40 items is unrealistically long for many unidimensional attitudinal and personality-type constructs measured in practice. Moreover, we were interested in simulating both more complicated types of DIF as well as DIF that affected only a few item parameters for items exhibiting DIF. Lastly, while we applaud Flowers et al. for empirically establishing their own NCDIF critical value, it would seem that in practice most researchers would be more likely to use the cutoff value of .096 (for 5-response-option items) recommended by the software creators (i.e., Raju, 2000). Thus, we simulated data under a variety of conditions in order to examine the usefulness of the DTF and item-level NCDIF indices as they are likely to be used in practice.

Method

Data were simulated to represent 500 participant responses on a twelve-item Likert scale with five response options for both a referent group and a focal group. These data were meant to represent a single scale measuring a single unidimensional construct rather than a multi-factor survey. This smaller number of items is in sharp contrast to the 20- and 40-item measures simulated by Flowers et al. (1999). However, these data have the potential benefit of extending our understanding to unidimensional constructs frequently measured by Likert scales with twelve (or fewer) items.

Initial Item Properties

In this study, item a and b parameters were manipulated in order to simulate DIF. In order to manipulate the amount of DIF present, referent group item parameters were simulated under the graded response model (Samejima, 1969); these item parameters were then changed in various ways in order to simulate focal group item parameters. A random normal distribution, N[μ = -1.7, σ = .45], was sampled in order to generate the threshold values for the lowest boundary response function (BRF) for the referent group data. Constants of 1.2, 2.4, and 3.6 were then added to the lowest threshold in order to generate the threshold parameters of the other three BRFs necessary to generate Likert-type data with five category response options. These constants were chosen in order to provide adequate separation between BRF threshold values and to result in probable BRF values ranging from approximately -2.0 to 2.0. Actual values of the referent group item parameters can be found in Table 1. The a parameter for each item was also sampled from a random normal distribution, N[μ = 1.25, σ = .07]. This distribution was chosen in order to create item a parameters with a probable range of 1.0 to 1.5. Data were generated using the GENIRV item response generator (Baker, 1994).

Simulating DIF

Either 2, 4, or 8 items were simulated to exhibit varying amounts of DIF between groups. One hundred data samples were simulated for each condition described below.

A parameter DIF. Items' a parameters were simulated to be different across groups by subtracting .25 from the referent group's a parameter for each DIF item in order to create the focal group's a parameter. There was also a condition in which the a parameters for DIF items were not changed, so that the effect of different b parameters could be examined in the absence of DIF in a parameters.

B parameter DIF. In addition to a parameters varying between groups, DIF items' b parameters also were varied. Although the manner in which the items' b parameters changed varied in several different ways (to be discussed below), the overall magnitude of the variation was the same for each condition. Specifically, for each DIF item in which b parameters varied between groups, a value of .40 was either added to, or subtracted from, the referent group b parameters in order to create the focal group b parameters. These changes are large enough to be detectable, yet not so large as to potentially cause overlap with other item parameters (which are 1.2 units apart).

There were three conditions simulated in which items' b parameters varied. In the first condition, only one b parameter differed for each DIF item. In general, this was accomplished by adding .40 to the referent group's largest b parameter value in order to create the focal group's largest b parameter value.
This condition represents a case in which the most extreme option (i.e., a Likert rating of 5) is more likely to be used by the referent group than the focal group for persons with the same theta values.

In a second condition, each DIF item's largest two b parameters were set to differ between groups. Again, this was accomplished by adding .40 to the referent group's largest two b parameters in order to create the focal group's item parameters for the DIF items. This situation represents the case in which the highest two response options (e.g., 4 and 5) are more likely to be used by the referent group than the focal group.

In a third condition, each DIF item's two most extreme b parameters were simulated to differ between groups. This was accomplished by adding .40 to the item's largest b parameter while simultaneously subtracting the same value from the item's lowest b parameter. This situation represents the case in which the focal group is more prone to central tendency in its ratings than is the referent group. Such tendencies have been observed in cross-cultural studies in which some cultures are less likely to use extreme response options (e.g., Clarke, 2000; Cheung & Rensvold, 2000; Hui & Triandis, 1989; Watkins & Cheung, 1995).

Lastly, a condition was also simulated in which DIF items' b parameters did not vary between groups. This condition allows the impact of differences in a parameters to be detected in the absence of differences in b parameters.
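To make these manipulations concrete, the following is a brief hypothetical sketch (ours; in the study itself, responses were generated from such parameters with GENIRV) of deriving focal group parameters from the referent group values described above:

    import numpy as np

    rng = np.random.default_rng(2005)
    n_items = 12

    # Referent group parameters, per the distributions described above.
    b1 = rng.normal(-1.7, 0.45, n_items)          # lowest BRF thresholds
    b_ref = np.column_stack([b1, b1 + 1.2, b1 + 2.4, b1 + 3.6])
    a_ref = rng.normal(1.25, 0.07, n_items)

    # The focal group starts as an exact copy; DIF items are then perturbed.
    a_foc, b_foc = a_ref.copy(), b_ref.copy()
    dif_items = [0, 1]                            # e.g., the 2-DIF-item cell
    for item in dif_items:
        a_foc[item] -= 0.25                       # a-parameter DIF
        b_foc[item, 3] += 0.40                    # "highest b" condition
        # "highest two" condition: also b_foc[item, 2] += 0.40
        # "extremes" condition: b_foc[item, 3] += 0.40 and
        #                       b_foc[item, 0] -= 0.40

For the cancellation conditions described below, the sign of the .40 shift (and of the a-parameter change) would alternate across DIF items.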

Note that while these levels of DIF are somewhat less than those simulated in the Flowers et al. (1999) study, they are nearly identical to those simulated by Meade and Lautenschlager (2004). In that study, these levels of DIF were readily detected both by IRT likelihood ratio tests (Thissen, Steinberg, & Wainer, 1988, 1993) and by confirmatory factor analysis tests of measurement invariance.

Cancellation of DIF. The last set of conditions simulated allows for the possibility of DIF canceling out across items. We chose to simulate canceling DIF because the NCDIF index does not take into account the DIF of other scale items, while the DTF index allows for such cancellation. Thus, we expected cancellation of DIF to have little effect on the performance of the NCDIF indices (which were expected to identify all DIF items), but to result in no perceptible DTF due to the compensatory nature of that index. For these conditions, DIF was simulated either to cancel across items or not to cancel across items. If DIF was simulated to cancel across items, then items' a and b parameters were set to vary in opposite ways for different items. For example, consider the scenario in which two DIF items were simulated such that the items' a parameters do not differ but the items' largest b parameters differ between the referent and focal groups. If cancellation of DIF was simulated, then .40 was added to the largest b parameter for the first of the two DIF items while .40 was subtracted from the largest b parameter of the second DIF item. Similarly, if four of the items in the scenario described above were simulated to have DIF, and DIF was simulated to cancel across items, then two of the items had values added to their b parameters while two had values subtracted from their b parameters. An overview of the conditions simulated in this study can be found in Table 2.

Data Analysis

In order to detect differences in item parameters between the referent group and focal group using the DFIT method, item parameters must be estimated and put onto the same metric using a linking procedure. Item parameters were estimated using the MULTILOG program (Thissen, 1991), while linking was accomplished by using Baker's modified test characteristic curve method to estimate equating coefficients with the Equate 2.1 program (Baker, 1995). To evaluate the efficacy of the item-level NCDIF index, true positive (TP) rates for each replication sample were assessed by calculating the number of items generated to have DIF that were successfully detected as DIF items by the NCDIF statistic, divided by the total number of DIF items generated for the sample. True negative (TN) rates were calculated by taking the number of items not flagged as DIF items divided by the total number of items simulated to not contain DIF for a given sample (false positive rates can be computed as 1 - TN and false negative rates as 1 - TP; thus, these indices are not reported). These TP and TN rates were then averaged across all 100 replications simulated for each condition.
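The following minimal sketch (our hypothetical code) mirrors this TP/TN computation for a single replication sample, assuming the NCDIF values and the true DIF status of each item are in hand:

    import numpy as np

    CUTOFF = 0.096   # recommended NCDIF cutoff for 5-option items (Raju, 2000)

    def tp_tn_rates(ncdif, is_dif_item, cutoff=CUTOFF):
        # ncdif: (n_items,) NCDIF values for one replication sample.
        # is_dif_item: (n_items,) boolean array marking items simulated with DIF.
        flagged = ncdif > cutoff
        tp = flagged[is_dif_item].mean() if is_dif_item.any() else np.nan
        tn = (~flagged)[~is_dif_item].mean()
        return tp, tn

    # The per-sample rates are then averaged over the 100 replications of
    # each condition, e.g., np.nanmean([tp for tp, tn in sample_rates]).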
Results

Our results indicated that the recommended NCDIF index cutoff value of .096 (Raju, 2000) is currently set far too high to detect the levels of DIF simulated in this study. While the TN rate for the index was perfect (1.0) for all conditions simulated, the TP rate was less than 1% (.01) for all conditions. For all but four of the 43 conditions simulated, the TP rate was zero, indicating that no DIF items were detected across 100 data sample replications with 12 items per sample. These results were quite surprising and suggest the need for further consideration of the critical value associated with NCDIF in order to increase sensitivity to DIF.

As DTF cutoff scores for indicating significant DTF are computed as the product of the NCDIF cutoff value and the number of scale items (Raju, 2000), not surprisingly, not a single DTF value was considered significant in any of the 4,300 data sets analyzed.

Discussion

An empirical investigation using data with simulated DIF suggests that the TP rates of the DFIT program indices are too low to be of practical value to researchers with the current cutoff values. Both the NCDIF and DTF indices were very poor at identifying simulated differences. It is particularly interesting that both TN and TP rates were somewhat dependent on whether simulated DIF cancelled. Searcy and Lautenschlager (1999) reported a similar dependency in another Monte Carlo study of the DFIT framework using the graded response model. A focal point of the DFIT approach is that it allows for compensatory handling of DIF; however, this strength is itself dependent upon a need for such compensation. We found that the effectiveness of the DTF approach was better in those conditions where DIF was simulated to cancel.

As with all simulation studies, our study was limited in scope. We simulated relatively minor, but significant, amounts of DIF in order to determine the efficacy of the DFIT program for detecting this DIF. While we attempted to simulate relatively minor DIF in order to establish the boundary conditions at which NCDIF is sensitive to simulated DIF, we did not simulate more extreme DIF as a comparison.

However, the amount of DIF simulated in this study is nearly identical to that of other studies in which other measurement invariance tests, such as likelihood ratio tests and confirmatory factor analysis tests, were able to detect the simulated differences (Meade & Lautenschlager, 2004). Thus, though somewhat less DIF was simulated in this study as compared to the Flowers et al. (1999) study, these levels of DIF have been shown to be detectable by other methods. However, the specifics of the simulation conditions represent only a very small number of possible Likert scale scenarios. Thus, future research is needed both to determine the optimal cutoff score to be used with the NCDIF index and to identify the conditions under which DFIT is likely to indicate that DIF and DTF exist.

Conclusions and Recommendations

Empirically, DFIT analyses indicate which items favor and penalize the focal group. We caution, however, that it is also too frequently observed that no rationale can be given for why any given item should favor or penalize a given group. From a practical standpoint, then, the use of DFIT to empirically balance DIF items may give some comfort, but it does not advance our understanding of why items operate differently for different groups. From a theoretical standpoint, we need more attention devoted to why items operate differently. This seems especially true when one considers that the DFIT framework allows for two undefined (and often unknown) sources of differences between groups to cancel out as reciprocal, offsetting influences on responses that minimize DTF. Given this very empirical basis for the DFIT framework and our results, we encourage some amount of caution and increased examination of rationales when using the DFIT framework. Raju and his colleagues have made many significant contributions to the IRT literature and have introduced a promising methodology with the DFIT program. However, we hope that this paper initiates further refinement of the methodology.

References

Baker, F. B. (1994). GENIRV: Computer program for generating item response theory data. Madison: University of Wisconsin, Laboratory of Experimental Design.

Baker, F. B. (1995). Equate 2.1: Computer program for equating two metrics in item response theory. Madison: University of Wisconsin, Laboratory of Experimental Design.

Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Thousand Oaks, CA: Sage.

Cheung, G. W., & Rensvold, R. B. (2000). Assessing extreme and acquiescence response sets in cross-cultural research using structural equations modeling. Journal of Cross-Cultural Psychology, 31.

Clarke, I., III. (2000). Extreme response style in cross-cultural research: An empirical investigation. Journal of Social Behavior and Personality, 15.

Cohen, A. S., Kim, S., & Baker, F. B. (1993). Detection of differential item functioning in the graded response model. Applied Psychological Measurement, 17.

Collins, W. C., Raju, N. S., & Edwards, J. E. (2000). Assessing differential functioning in a satisfaction scale. Journal of Applied Psychology, 85.

Donovan, M. A., Drasgow, F., & Probst, T. M. (2000). Does computerizing paper-and-pencil job attitude scales make a difference? New IRT analyses offer insight. Journal of Applied Psychology, 85.

Ellis, B. B., & Mead, A. D. (2000). Assessment of the measurement equivalence of a Spanish translation of the 16PF Questionnaire. Educational & Psychological Measurement, 60.
Facteau, J. D., & Craig, S. B. (2001). Are performance appraisal ratings from different rating sources comparable? Journal of Applied Psychology, 86(2).

Fleer, P. F. (1993). A Monte Carlo assessment of a new measure of item and test bias (Doctoral dissertation, Illinois Institute of Technology). Dissertation Abstracts International, 54-04, 2266B.

Flowers, C. P., Oshima, T. C., & Raju, N. S. (1999). A description and demonstration of the polytomous-DFIT framework. Applied Psychological Measurement, 23.

Golembiewski, R. T., Billingsley, K., & Yeager, S. (1976). Measuring change and persistence in human affairs: Types of change generated by OD designs. Journal of Applied Behavioral Science, 12.

Hui, C. H., & Triandis, H. C. (1989). Effects of culture and response format on extreme response style. Journal of Cross-Cultural Psychology, 20.

Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates.

Maurer, T. J., Raju, N. S., & Collins, W. C. (1998). Peer and subordinate performance appraisal measurement equivalence. Journal of Applied Psychology, 83.

Meade, A. W., & Lautenschlager, G. J. (2004). A comparison of item response theory and confirmatory factor analytic methodologies for establishing measurement equivalence/invariance. Organizational Research Methods, 7(4).

Ployhart, R. E., Weekley, J. A., Holtz, B. C., & Kemp, C. (2003). Web-based and paper-and-pencil testing of applicants in a proctored setting: Are personality, biodata and situational judgment tests comparable? Personnel Psychology, 56.

Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53.

Raju, N. S. (2000). Notes accompanying the differential functioning of items and tests (DFIT) computer program.

Raju, N. S., Laffitte, L. J., & Byrne, B. M. (2002). Measurement equivalence: A comparison of methods based on confirmatory factor analysis and item response theory. Journal of Applied Psychology, 87.

Raju, N. S., van der Linden, W. J., & Fleer, P. F. (1995). An IRT-based internal measure of test bias with applications for differential item functioning. Applied Psychological Measurement, 19.

Riordan, C. M., & Vandenberg, R. J. (1994). A central question in cross-cultural research: Do employees of different cultures interpret work-related measures in an equivalent manner? Journal of Management, 20.

Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement.

Searcy, C. A., & Lautenschlager, G. J. (1999). A Monte Carlo investigation of DIF assessment for polytomously scored items. Paper presented at the 14th Annual Meeting of the Society for Industrial/Organizational Psychology, Atlanta, GA.

Taris, T. W., Bok, I. A., & Meijer, Z. Y. (1998). Assessing stability and change of psychometric properties of multi-item concepts across different situations: A general approach. Journal of Psychology, 132.

Thissen, D. (1991). MULTILOG user's guide: Multiple categorical item analysis and test scoring using item response theory [Computer program]. Chicago: Scientific Software International.

Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. I. Braun (Eds.), Test validity. Hillsdale, NJ: Erlbaum.

Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning. Hillsdale, NJ: Erlbaum.

Vandenberg, R. J. (2002). Toward a further understanding of and improvement in measurement invariance methods and procedures. Organizational Research Methods, 5.

Watkins, D., & Cheung, S. (1995). Culture, gender, and response bias: An analysis of responses to the Self-Description Questionnaire. Journal of Cross-Cultural Psychology, 26.

Author Contact Information:

Adam W. Meade
Department of Psychology
North Carolina State University
Campus Box 7801
Raleigh, NC
adam_meade@ncsu.edu

Gary J. Lautenschlager
Department of Psychology
University of Georgia
Athens, GA
garylaut@uga.edu

Table 1
Item Parameters for Referent Group
(Columns: Item #, a, b1, b2, b3, b4. The parameter values were not preserved in this transcription.)

Table 2
Conditions Overview for Simulated Data

Number of scale items: 12
Number of DIF items: 2, 4, 8
Cancellation of DIF: No, Yes
A parameters different: Yes, No
B parameters different: 0, 1, 2, 2X

Note: 0 represents no b parameters differing across groups; 1 represents the largest b parameter differing across groups; 2 represents the largest two b parameters differing across groups; 2X represents both the largest and smallest b parameters differing across groups.
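Crossing the factors above, dropping the cells in which neither a nor b parameters differ (i.e., no DIF at all), and adding the single 0-DIF baseline yields 43 conditions, matching the count reported in the Results. A small illustrative enumeration, under our reading of Table 2:

    from itertools import product

    conditions = [(0, None, None, "0")]    # baseline: no DIF items
    for n_dif, cancel, a_dif, b_dif in product(
            (2, 4, 8), ("No", "Yes"), ("Yes", "No"), ("0", "1", "2", "2X")):
        if a_dif == "No" and b_dif == "0":
            continue    # a and b parameters both invariant: nothing to detect
        conditions.append((n_dif, cancel, a_dif, b_dif))

    print(len(conditions))    # -> 43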

Table 3
TP and TN Rates for the Chi-Square Test Associated with the NCDIF Index
(Columns: Condition, # DIF Items, Cancel DIF?, A DIF?, B DIF, TP, TN, for the 43 conditions described in Table 2. The TP and TN values were not preserved in this transcription.)


More information

Bruno D. Zumbo, Ph.D. University of Northern British Columbia

Bruno D. Zumbo, Ph.D. University of Northern British Columbia Bruno Zumbo 1 The Effect of DIF and Impact on Classical Test Statistics: Undetected DIF and Impact, and the Reliability and Interpretability of Scores from a Language Proficiency Test Bruno D. Zumbo, Ph.D.

More information

Journal of Applied Psychology

Journal of Applied Psychology Journal of Applied Psychology Effect Size Indices for Analyses of Measurement Equivalence: Understanding the Practical Importance of Differences Between Groups Christopher D. Nye, and Fritz Drasgow Online

More information

A 37-item shoulder functional status item pool had negligible differential item functioning

A 37-item shoulder functional status item pool had negligible differential item functioning Journal of Clinical Epidemiology 59 (2006) 478 484 A 37-item shoulder functional status item pool had negligible differential item functioning Paul K. Crane a, *, Dennis L. Hart b, Laura E. Gibbons a,

More information

Computerized Mastery Testing

Computerized Mastery Testing Computerized Mastery Testing With Nonequivalent Testlets Kathleen Sheehan and Charles Lewis Educational Testing Service A procedure for determining the effect of testlet nonequivalence on the operating

More information

A Broad-Range Tailored Test of Verbal Ability

A Broad-Range Tailored Test of Verbal Ability A Broad-Range Tailored Test of Verbal Ability Frederic M. Lord Educational Testing Service Two parallel forms of a broad-range tailored test of verbal ability have been built. The test is appropriate from

More information

An Alternative to the Trend Scoring Method for Adjusting Scoring Shifts. in Mixed-Format Tests. Xuan Tan. Sooyeon Kim. Insu Paek.

An Alternative to the Trend Scoring Method for Adjusting Scoring Shifts. in Mixed-Format Tests. Xuan Tan. Sooyeon Kim. Insu Paek. An Alternative to the Trend Scoring Method for Adjusting Scoring Shifts in Mixed-Format Tests Xuan Tan Sooyeon Kim Insu Paek Bihua Xiang ETS, Princeton, NJ Paper presented at the annual meeting of the

More information

The Modification of Dichotomous and Polytomous Item Response Theory to Structural Equation Modeling Analysis

The Modification of Dichotomous and Polytomous Item Response Theory to Structural Equation Modeling Analysis Canadian Social Science Vol. 8, No. 5, 2012, pp. 71-78 DOI:10.3968/j.css.1923669720120805.1148 ISSN 1712-8056[Print] ISSN 1923-6697[Online] www.cscanada.net www.cscanada.org The Modification of Dichotomous

More information

Measures of children s subjective well-being: Analysis of the potential for cross-cultural comparisons

Measures of children s subjective well-being: Analysis of the potential for cross-cultural comparisons Measures of children s subjective well-being: Analysis of the potential for cross-cultural comparisons Ferran Casas & Gwyther Rees Children s subjective well-being A substantial amount of international

More information

Methodological Issues in Measuring the Development of Character

Methodological Issues in Measuring the Development of Character Methodological Issues in Measuring the Development of Character Noel A. Card Department of Human Development and Family Studies College of Liberal Arts and Sciences Supported by a grant from the John Templeton

More information

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n. University of Groningen Latent instrumental variables Ebbes, P. IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document

More information

A Comparison of Four Test Equating Methods

A Comparison of Four Test Equating Methods A Comparison of Four Test Equating Methods Report Prepared for the Education Quality and Accountability Office (EQAO) by Xiao Pang, Ph.D. Psychometrician, EQAO Ebby Madera, Ph.D. Psychometrician, EQAO

More information

Incorporating Measurement Nonequivalence in a Cross-Study Latent Growth Curve Analysis

Incorporating Measurement Nonequivalence in a Cross-Study Latent Growth Curve Analysis Structural Equation Modeling, 15:676 704, 2008 Copyright Taylor & Francis Group, LLC ISSN: 1070-5511 print/1532-8007 online DOI: 10.1080/10705510802339080 TEACHER S CORNER Incorporating Measurement Nonequivalence

More information

Comprehensive Statistical Analysis of a Mathematics Placement Test

Comprehensive Statistical Analysis of a Mathematics Placement Test Comprehensive Statistical Analysis of a Mathematics Placement Test Robert J. Hall Department of Educational Psychology Texas A&M University, USA (bobhall@tamu.edu) Eunju Jung Department of Educational

More information

Adaptive EAP Estimation of Ability

Adaptive EAP Estimation of Ability Adaptive EAP Estimation of Ability in a Microcomputer Environment R. Darrell Bock University of Chicago Robert J. Mislevy National Opinion Research Center Expected a posteriori (EAP) estimation of ability,

More information

Personality Traits Effects on Job Satisfaction: The Role of Goal Commitment

Personality Traits Effects on Job Satisfaction: The Role of Goal Commitment Marshall University Marshall Digital Scholar Management Faculty Research Management, Marketing and MIS Fall 11-14-2009 Personality Traits Effects on Job Satisfaction: The Role of Goal Commitment Wai Kwan

More information

Likelihood Ratio Based Computerized Classification Testing. Nathan A. Thompson. Assessment Systems Corporation & University of Cincinnati.

Likelihood Ratio Based Computerized Classification Testing. Nathan A. Thompson. Assessment Systems Corporation & University of Cincinnati. Likelihood Ratio Based Computerized Classification Testing Nathan A. Thompson Assessment Systems Corporation & University of Cincinnati Shungwon Ro Kenexa Abstract An efficient method for making decisions

More information

External Variables and the Technology Acceptance Model

External Variables and the Technology Acceptance Model Association for Information Systems AIS Electronic Library (AISeL) AMCIS 1995 Proceedings Americas Conference on Information Systems (AMCIS) 8-25-1995 External Variables and the Technology Acceptance Model

More information

A Modified CATSIB Procedure for Detecting Differential Item Function. on Computer-Based Tests. Johnson Ching-hong Li 1. Mark J. Gierl 1.

A Modified CATSIB Procedure for Detecting Differential Item Function. on Computer-Based Tests. Johnson Ching-hong Li 1. Mark J. Gierl 1. Running Head: A MODIFIED CATSIB PROCEDURE FOR DETECTING DIF ITEMS 1 A Modified CATSIB Procedure for Detecting Differential Item Function on Computer-Based Tests Johnson Ching-hong Li 1 Mark J. Gierl 1

More information

The Matching Criterion Purification for Differential Item Functioning Analyses in a Large-Scale Assessment

The Matching Criterion Purification for Differential Item Functioning Analyses in a Large-Scale Assessment University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Educational Psychology Papers and Publications Educational Psychology, Department of 1-2016 The Matching Criterion Purification

More information

Reliability and Validity of the Hospital Survey on Patient Safety Culture at a Norwegian Hospital

Reliability and Validity of the Hospital Survey on Patient Safety Culture at a Norwegian Hospital Paper I Olsen, E. (2008). Reliability and Validity of the Hospital Survey on Patient Safety Culture at a Norwegian Hospital. In J. Øvretveit and P. J. Sousa (Eds.), Quality and Safety Improvement Research:

More information

Item-Rest Regressions, Item Response Functions, and the Relation Between Test Forms

Item-Rest Regressions, Item Response Functions, and the Relation Between Test Forms Item-Rest Regressions, Item Response Functions, and the Relation Between Test Forms Dato N. M. de Gruijter University of Leiden John H. A. L. de Jong Dutch Institute for Educational Measurement (CITO)

More information

A Comparison of Methods of Estimating Subscale Scores for Mixed-Format Tests

A Comparison of Methods of Estimating Subscale Scores for Mixed-Format Tests A Comparison of Methods of Estimating Subscale Scores for Mixed-Format Tests David Shin Pearson Educational Measurement May 007 rr0701 Using assessment and research to promote learning Pearson Educational

More information

Linking Assessments: Concept and History

Linking Assessments: Concept and History Linking Assessments: Concept and History Michael J. Kolen, University of Iowa In this article, the history of linking is summarized, and current linking frameworks that have been proposed are considered.

More information

OLS Regression with Clustered Data

OLS Regression with Clustered Data OLS Regression with Clustered Data Analyzing Clustered Data with OLS Regression: The Effect of a Hierarchical Data Structure Daniel M. McNeish University of Maryland, College Park A previous study by Mundfrom

More information