A Multilevel Testlet Model for Dual Local Dependence


Journal of Educational Measurement
Spring 2012, Vol. 49, No. 1

Hong Jiao, University of Maryland
Akihito Kamata, University of Oregon
Shudong Wang, Northwest Evaluation Association
Ying Jin, United BioSource Corporation

Copyright © 2012 by the National Council on Measurement in Education

The applications of item response theory (IRT) models assume local item independence and that examinees are independent of each other. When a representative sample for psychometric analysis is selected using a cluster sampling method in a testlet-based assessment, both local item dependence and local person dependence are likely to be induced. This study proposed a four-level IRT model to simultaneously account for dual local dependence due to item clustering and person clustering. Model parameter estimation was explored using the Markov Chain Monte Carlo method. Model parameter recovery was evaluated in a simulation study in comparison with three other related models: the Rasch model, the Rasch testlet model, and the three-level Rasch model for person clustering. In general, the proposed model recovered the item difficulty and person ability parameters with the least total error. The choice of estimation model did not affect bias in item and person parameter estimation, but it did affect the standard error (SE). In some simulation conditions, the difference in classification accuracy between models was as large as 11%. The illustration using real data generally supported the model performance observed in the simulation study.

Local independence is one of the central assumptions that underlie item response theory (IRT) models (e.g., Hambleton & Swaminathan, 1985). As Embretson and Reise (2000, p. 48) have stated, "Essentially, local independence is obtained when the relationship among items (or persons) is fully characterized by the IRT model." According to Reckase (2009, p. 13), the local independence assumption has two parts: local item independence and local person independence. Local item independence implies that a person's response to an item will not affect the probability of the person's response to another item; it can be represented mathematically as

$$P(\mathbf{u} = u \mid \theta) = \prod_{i=1}^{I} P(u_i \mid \theta) = P(u_1 \mid \theta)\,P(u_2 \mid \theta) \cdots P(u_I \mid \theta). \tag{1}$$

This equation indicates that the probability of a response pattern, u, to multiple items for a person with latent trait level θ, $P(\mathbf{u} = u \mid \theta)$, equals the product of the probabilities of the individual's responses, $u_i$, to the $i$th item on a test, $P(u_i \mid \theta)$. Local person independence implies that a person's response to a specific item is independent of any other person's response to that item and can be mathematically represented as

$$P(\mathbf{u}_i = u_i \mid \boldsymbol{\theta}) = \prod_{j=1}^{n} P(u_{ij} \mid \theta_j) = P(u_{i1} \mid \theta_1)\,P(u_{i2} \mid \theta_2) \cdots P(u_{in} \mid \theta_n). \tag{2}$$

This equation indicates that the probability of a set of responses to any item i by n persons with abilities $\theta_j$ in the vector $\boldsymbol{\theta}$ is the product of the probabilities of each individual person's response to that item. When the local independence assumption is evaluated, both of these facets should be considered, because the assumption is violated whenever either local item dependence or local person dependence is present. Further, when local independence does not hold, IRT model parameter estimation, which is based on the likelihood function, will be affected (Baker & Kim, 2004). However, the research that has investigated the effects of these violations typically has addressed the effects of local item dependence and local person dependence on IRT model parameter estimation separately (e.g., Bradlow, Wainer, & Wang, 1999; Fox & Glas, 2001; Jiao, Wang, & Kamata, 2005; Kamata, 1998, 2001; Wang & Wilson, 2005; Yen, 1984).

Local Item Dependence

IRT models are not robust to the violation of the local item independence assumption. It has been demonstrated that local item dependence affects model parameter estimation, equating, and estimation of test reliability (e.g., Ackerman, 1987; Bradlow et al., 1999; Chen & Thissen, 1997; Jiao et al., 2005; Wainer, Bradlow, & Wang, 2007; Wang & Wilson, 2005; Yen, 1984). Possible causes of local item dependence include passage dependence; item chaining; explanation of previous answers, such as clueing; item or response format (multiple-choice vs. constructed-response items); scoring rubrics; fatigue; speededness; and practice effects (e.g., Yen, 1984). Passage dependence is a commonly studied cause of local item dependence in educational assessments. A passage does not necessarily mean a reading passage. It could alternatively refer to such contextual effects as a scenario in a science assessment or a graph or table in a mathematics test. When items are constructed around such a common stimulus, items associated with the same stimulus are connected by the common context. Thus, item connection or clustering may affect an examinee's performance on those items due to the common contextual effects. As a result, local item dependence or testlet effects may be induced (Wainer & Kiely, 1987). Recent studies have proposed models to deal with local item dependence due to testlets. For example, Bradlow et al. (1999) proposed a two-parameter Bayesian random-effects testlet model indicating the interaction between person and item cluster by incorporating a random-effect parameter into unidimensional two-parameter item response models. Extensions of this model have been made to a three-parameter IRT model as well as to the graded-response model (Du, 1998; Wang, Bradlow, & Wainer, 2002). In another attempt to model testlet effects, Wang and Wilson (2005) proposed the Rasch testlet model as a special case of the multidimensional random coefficients multinomial logit model. Similarly, Jiao et al. (2005) developed a three-level one-parameter testlet model from the hierarchical generalized linear modeling framework for item analysis (Kamata, 2001). These testlet models account for local item dependence due to item clustering; however, they only deal with one facet of the local independence assumption, and none of these papers takes into account local person dependence due to person clustering.

Local Person Dependence

As pointed out earlier, local person independence is an important facet of the local independence assumption for IRT models. Violations of this assumption are likely to occur as a result of person clustering due to factors such as cluster sampling, external assistance or interference, differential opportunity to learn, or different problem-solving strategies. A failure to incorporate person clustering effects in model parameter estimation may jeopardize measurement precision and lead to biased parameter estimates, as the dependence among individuals within clusters reduces the effective sample size (Cochran, 1977; Cyr & Davies, 2005; Kish, 1965). In the educational assessment setting, it is typical that students are nested within classes and classes are nested within schools. Students in a given classroom or school may respond to items in a more consistent manner than students from different classrooms or schools as a result of having received more similar instruction than what would be expected across classrooms or schools. Therefore, when the sampling unit is classrooms or schools, a person clustering effect may have an increased likelihood of occurrence. For standardized achievement tests, such as the Stanford Achievement Tests, psychometric analyses often are based on a representative sample that is selected using a cluster sampling method. Another example of this kind of clustering is the K-12 state tests required by the No Child Left Behind Act of 2001. States with a population of more than 100,000 students per grade may select a representative sample for analyses using a cluster sampling method. A third example is the Race to the Top assessment. It is likely that the common core assessments will use a cluster sampling method to select a representative sample per grade from all participating states. A multilevel modeling framework can be used to account for dependence among the clustered individuals in psychometric analysis (e.g., Bryk & Raudenbush, 1992; de Leeuw & Kreft, 1986; Goldstein, 1995). Some researchers have reformulated IRT models into multilevel models in which items are nested within persons, and extended these IRT models to reflect the fact that persons are nested within higher contextual levels (e.g., Adams, Wilson, & Wu, 1997; Fox & Glas, 2001; Kamata, 1998, 2001). Other researchers have included person-level covariates to improve the estimation of item and person parameters (e.g., Mislevy, 1987; Mislevy & Bock, 1989). Another attempt was made by Cohen, Chan, Jiang, and Seburn (2008) to use a nonparametric method to study Rasch modeling under the complex sample designs typically found in state testing programs. In addition, multilevel IRT models (Johnson & Jenkins, 2005; Li, Oranje, & Jiang, 2009) have been applied in large-scale survey assessments such as the National Assessment of Educational Progress. However, these multilevel IRT models do not explain the effects due to item clustering.

[Figure 1. The hierarchy of the four-level model for item clustering and person clustering: items nested within testlets; persons crossed with testlets (and hence with items); persons nested within groups.]

A Four-Level IRT Model for Dual Local Dependence

Currently available IRT models do not simultaneously account for both local item dependence due to item clustering and local person dependence due to person clustering. Testlet models, including random-effects testlet models, multidimensional testlet models, and multilevel testlet models, ignore the effects of person clustering; multilevel IRT models, which account for person clustering, ignore item clustering effects. Thus, the local independence assumption is not fully addressed under any of the above-mentioned modeling approaches. To simultaneously model both item and person dependence, this study proposes a four-level IRT model from the multilevel measurement modeling framework (Jiao, Kamata, Wang, & Jin, 2010b). By extending the hierarchy of the three-level one-parameter testlet model by Jiao et al. (2005), the four-level model can be schematically represented as depicted in Figure 1. The first level models item effects and the second level models testlet effects. The items are nested within the testlets. Level three models the effects of persons, who are fully crossed with testlets and ultimately with items. This indicates that every person answers every item, but each item is associated with only one testlet. The fourth level models the examinee group effects. Persons are nested within groups such as classes, schools, or school districts. Each person is associated with only one level-four unit. This hierarchy represents testing situations where testlets are used for assessment purposes and a cluster sampling method is used for selecting a representative sample for psychometric analysis.

The four-level IRT model originally was constructed within a hierarchical generalized linear modeling framework (Jiao, Kamata, Wang, & Jin, 2010b). (Information about the model setup is available to the reader upon request.) By following the convention used in the IRT framework, the combined four-level IRT model for dual dependence can be expressed as

$$p_{jdig} = \frac{1}{1 + \exp[-(\theta_j + \theta_g - b_i + \gamma_{jd(i)})]}, \tag{3}$$

where $\theta_j$ is the person-specific ability for person j, $\theta_g$ is the group-specific ability for group g, $b_i$ is the item difficulty for item i, and $\gamma_{jd(i)}$ is the testlet effect for person j on testlet d. The interpretations of $\theta_j$ and $b_i$ follow those in conventional unidimensional IRT models. The group ability represented by $\theta_g$ is the same for all individuals in a group but differs for individuals from different groups. The variability of $\theta_g$ indicates the extent of group effects. The item-clustering effects, the person-clustering effects, and the person ability are assumed to be mutually independent. It also is assumed that the residuals are uncorrelated after controlling for the variances from the above-mentioned three sources.

Model Identification

Like the conventional Rasch model, the proposed four-level model for dual dependence cannot be identified without imposing some constraints. The common practice used to identify the Rasch model is to constrain either the mean ability or the mean of the item difficulty parameters to be 0. When considering the Rasch model as a multilevel model assuming that the ability and item difficulty parameters follow a normal distribution, Gelman and Hill (2007) suggested fixing the mean of θ or fixing the mean of the item difficulty parameters to be 0, but not doing both at the same time. This restriction approach for producing model identification is essentially the same as the current practice used in Rasch model applications. Another method to identify the Rasch model scale is to allow the item and person ability parameters to float but to adjust the floating parameters to newly defined values (Bafumi, Gelman, Park, & Kaplan, 2005; Gelman & Hill, 2007). The new adjusted quantities replace the model parameters to make the model well identified but preserve the logit scale of the model. The adjusted quantities can be defined as

$$\theta_j^{\text{adj}} = \theta_j - \bar{\theta}, \quad \text{for } j = 1, 2, \ldots, J, \qquad \text{and} \qquad b_i^{\text{adj}} = b_i - \bar{\theta}, \quad \text{for } i = 1, 2, \ldots, I. \tag{4}$$

(A small numerical sketch of this adjustment is given at the end of this section.) As discussed in Fox and Glas (2001), in multilevel IRT models with the scale of the latent dimension made up of several variance components, one often fits a hierarchical set of models with various decompositions of the ability variance. Fixing one of these variance components is not practical. An alternative is to impose the identifying restrictions on the item parameters. For the two-parameter normal ogive multilevel model, they fixed one discrimination parameter to 1 and one difficulty to 0. In this study, the mean item difficulty is constrained to be 0 to produce identifiability for the proposed model, with no further adjustment needed. This approach to scale identification is in fact consistent with one of the approaches provided in popularly utilized software for identifying the Rasch model (such as WINSTEPS).
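As a toy illustration of the floating-parameter adjustment in Equation (4) (our own sketch with made-up values, not the authors' code), centering both parameter sets by the mean ability leaves every logit-scale difference θ_j − b_i, and hence every modeled response probability, unchanged:

```python
import numpy as np

# Hypothetical floating estimates for five persons and four items.
theta = np.array([0.8, -0.2, 1.1, 0.4, -0.5])
b = np.array([0.6, 1.3, -0.1, 0.9])

# Equation (4): shift both sets of parameters by the mean ability.
theta_adj = theta - theta.mean()
b_adj = b - theta.mean()

# Every theta_j - b_i difference, and thus every response probability,
# is preserved; only the location of the scale changes.
assert np.allclose(theta[:, None] - b[None, :],
                   theta_adj[:, None] - b_adj[None, :])
```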

Model Estimation

To our knowledge, none of the currently available IRT or multilevel software packages can analyze data using the proposed four-level IRT model for dual local dependence. (Note that it is possible to take a two-level bi-factor modeling approach with a complex sampling design in software like Mplus.) Therefore, this study explored model parameter estimation using the Markov Chain Monte Carlo (MCMC) method implemented in WinBUGS (Lunn, Thomas, Best, & Spiegelhalter, 2000). The advantages of using the MCMC method for multilevel modeling have been clearly illustrated in Draper (2008, p. 91). In general, the MCMC procedure makes use of the prior information on the model parameters and the data to obtain the posterior estimates of the parameters. The proposed four-level IRT model as presented in (3) can be expressed as follows:

$$y_{jdig} \sim \text{Bernoulli}(p_{jdig}),$$
$$\text{logit}(p_{jdig}) = \theta_j + \theta_g - b_i + \gamma_{jd(i)},$$
$$i = 1, 2, \ldots, I, \quad d = 1, 2, \ldots, T, \quad j = 1, 2, \ldots, J, \quad g = 1, 2, \ldots, G,$$
$$b_i \overset{\text{iid}}{\sim} N(0, 1), \quad \theta_j \overset{\text{iid}}{\sim} N(0, \sigma^2_{\theta}), \quad \theta_g \overset{\text{iid}}{\sim} N(0, \sigma^2_{\theta_g}), \quad \gamma_{jd} \overset{\text{iid}}{\sim} N(0, \sigma^2_{\gamma_d}). \tag{5}$$

These expressions specify the distributions for the item difficulty parameters, person abilities, group abilities, and testlet effect parameters. The item difficulty parameters are assumed to have a standard normal distribution. The random effects are assumed to be normally distributed with means of 0 and respective variances to be estimated. Gibbs sampling generates random variables from a joint distribution indirectly, without calculating the density (Casella & George, 1992, p. 167). MCMC with Gibbs sampling starts with the initialization of the model parameters (Draper, 2008). In this study, the model parameters to be estimated are the item difficulty parameters, the person ability parameters, the ability variance, the group ability variance, and the testlet variance for each testlet. The initial values for these parameters are generated from the prespecified prior distributions. After initialization of the model parameters, and given the observed item response data, testlet indicators, and group indicators, each parameter to be estimated is sampled in turn from its conditional distribution given the latest values of the other parameters. When the Markov chains reach equilibrium, the burn-in process ends and the monitoring process starts. The results from each iteration after the burn-in phase are used for model inferences and summarized to obtain the estimates of the model parameters. Gibbs sampling requires that samples be drawn from the full conditional distributions derived from the target distribution. However, the full conditional distributions for multilevel logistic regression models do not have an analytic closed form (Chaimongkol, Huffer, & Kamata, 2006). Gilks and Wild (1992) suggested using adaptive rejection sampling for log-concave full conditional distributions. This approach is implemented in the WinBUGS 1.4 software and is used to fit the models considered in this study.
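To make the generating process in (3) and (5) concrete, the following minimal simulation sketch (our own illustration, not the authors' code; the dimensions and variance values follow the simulation design described in the Method section below) draws a binary response matrix under the four-level model:

```python
import numpy as np

rng = np.random.default_rng(42)

# Dimensions from the simulation design below: 4 testlets x 9 items,
# 50 clusters (classes) of 20 examinees each.
I, T, G, n_per_g = 36, 4, 50, 20
J = G * n_per_g

testlet = np.repeat(np.arange(T), I // T)   # d(i): the testlet of item i
group = np.repeat(np.arange(G), n_per_g)    # g(j): the group of person j

b = rng.normal(0.0, 1.0, size=I)            # item difficulties, N(0, 1)
theta = rng.normal(0.0, 1.0, size=J)        # person abilities, N(0, 1)
sigma2_g, sigma2_gamma = 0.25, 0.25         # small-dependence condition
theta_g = rng.normal(0.0, np.sqrt(sigma2_g), size=G)         # group effects
gamma = rng.normal(0.0, np.sqrt(sigma2_gamma), size=(J, T))  # person-by-testlet effects

# Equation (5): logit(p_jdig) = theta_j + theta_g - b_i + gamma_jd(i)
eta = (theta[:, None] + theta_g[group][:, None]
       - b[None, :] + gamma[:, testlet])
p = 1.0 / (1.0 + np.exp(-eta))
y = rng.binomial(1, p)                      # J x I binary response matrix
```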

Method

Simulation Study

A simulation study was carried out to examine the performance of the proposed four-level IRT model for dual dependence. Item response data that mimicked a large-scale passage-based reading comprehension test were simulated by assuming four passages with nine items associated with each passage. A cluster sampling method using class as the sampling unit was used to select simulated examinees. The true values of the person ability and the item difficulty parameters were both randomly generated from a standard normal distribution. The cluster size was assumed to be an average class size of 20, which reflected a typical class sample size in the National Assessment of Educational Progress state assessments (Li et al., 2009). The number of clusters was set to 50. According to Binici (2007), a cluster number of 50 with a cluster size of 20 produced a reasonable and acceptable level of estimation error for a multilevel IRT model. This combination of the cluster number and the cluster size resulted in a total of 1,000 students. The true ability and item difficulty parameters were kept the same across simulation conditions, while the magnitudes of local item dependence and local person dependence were manipulated. Local item dependence parameter values were simulated from a normal distribution $N(0, \sigma^2_\gamma)$ by specifying different magnitudes of the variance, $\sigma^2_\gamma$. The magnitude of local item dependence was set at two levels by specifying $\sigma^2_\gamma$ as .25 or 1.00. Two levels of the person clustering variance, $\sigma^2_{\theta_g}$, were set at .25 or 1.00 to represent low or moderate person clustering effects, respectively. The group-specific effect parameters ($\theta_g$) then were simulated from a normal distribution $N(0, \sigma^2_{\theta_g})$. For each joint level of local item dependence and local person dependence, item responses were generated by incorporating the true ability, item difficulty, group-specific ability, and local item dependence parameters in (3). Once item responses were generated, the MCMC method in WinBUGS 1.4 was used to estimate parameters for the proposed four-level IRT model for dual dependence as well as for the Rasch model, the Rasch testlet model, and the three-level Rasch model (modeling person clustering effects) for comparison purposes. The combination of the different levels of item dependence (testlet variance), person dependence (group variance), and the estimation model resulted in 16 study conditions. Twenty-five replications were run, and parameter recovery was evaluated in terms of bias, standard error (SE), and root mean squared error (RMSE) for the item and person parameters and the variances of the three random effects. The bias, SE, and RMSE were computed by

$$\text{Bias}(\hat{\beta}) = \frac{1}{N} \sum_{r=1}^{N} (\hat{\beta}_r - \beta), \tag{6}$$

$$SE(\hat{\beta}) = \sqrt{\frac{1}{N} \sum_{r=1}^{N} \left( \hat{\beta}_r - \frac{1}{N} \sum_{t=1}^{N} \hat{\beta}_t \right)^2}, \tag{7}$$

$$RMSE(\hat{\beta}) = \sqrt{\frac{1}{N} \sum_{r=1}^{N} (\hat{\beta}_r - \beta)^2}, \tag{8}$$

where β is the true model parameter, $\hat{\beta}_r$ is the estimated model parameter for the rth replication, and N is the number of replications.
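Equations (6) through (8) translate directly into code; a short sketch under our own naming (not the authors' scripts):

```python
import numpy as np

def recovery_indices(beta_hat, beta_true):
    """Bias, SE, and RMSE over replications, Equations (6)-(8).

    beta_hat: array of N replication estimates of one parameter.
    beta_true: the generating (true) value of that parameter.
    """
    bias = np.mean(beta_hat - beta_true)                     # Equation (6)
    se = np.sqrt(np.mean((beta_hat - beta_hat.mean())**2))   # Equation (7)
    rmse = np.sqrt(np.mean((beta_hat - beta_true)**2))       # Equation (8)
    return bias, se, rmse
```

Under these definitions the three indices satisfy RMSE² = Bias² + SE², which is why the RMSE serves as the summary of total (systematic plus random) estimation error.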

Classification accuracy, computed as the percentage of correct classifications of examinees into pass and fail groups, was compared across the four calibration models under each simulation condition. The binary classification decisions were made by comparing the relative position of a person's θ estimate and the θ cut score obtained from standard setting conducted for a state reading comprehension test for high school graduation.

Prior Setting

Within the Bayesian framework, a prior distribution needs to be specified for each of the model parameters. When the prior distribution is noninformative, the range of the uncertainty should be wider than the range of reasonable values of the parameters (Gelman & Hill, 2007). When the prior knowledge is reliable, the use of an informative prior with smaller uncertainty may facilitate the estimation of the posterior. When a noninformative proper prior is supplied, the posterior is estimated by relying heavily on the data. Various noninformative prior distributions have been suggested for the variance parameters in the literature. For example, Gelman and Hill (2007) found that some noninformative prior distributions may unduly affect inferences when the number of groups is small and the group-level variability is close to zero in multilevel modeling. They demonstrated three different types of priors for variance estimation. Their results showed that the inverse-gamma (α = .001, β = .001) prior distribution distorted the posterior distribution. On the other hand, the inverse-gamma (α = 1, β = 1) prior distribution generally concentrated the prior mass in the range (.5, 5), and the posterior closely matched the prior distribution. In addition, they found that a uniform prior distribution with a wide range functioned close to a noninformative prior distribution and did not appear to constrain the posterior inferences. In general, Gelman and Hill (2007) recommend the uniform prior for the standard deviation with a wide range such as (0, 100). They do not recommend using the inverse-gamma distribution as a noninformative prior, but they do recommend the inverse-gamma as a proper prior distribution when the group sample size is small and the group variance is near zero. It is noted, however, that the inverse-gamma prior family may be preferred by some other researchers due to its conditional conjugacy, which provides cleaner mathematical properties. In fact, we experimented with all three options suggested by Gelman and Hill (2007) as the prior distributions for the variance parameters: the inverse-gamma with α = .001 and β = .001, the inverse-gamma with α = 1 and β = 1, and the uniform prior with range (0, 100). However, only the inverse-gamma (α = 1, β = 1) prior distribution worked for our model. In addition, for our simulation data, the group variance was substantially different from zero and our group sample size was not small. Therefore, the prior distributions for the ability variance, group variance, and testlet variances all were set to an inverse-gamma (α = 1, β = 1) distribution. In other applications, different priors may function differently; other choices should be explored and the resulting inferences compared.
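For intuition about where the inverse-gamma (α = 1, β = 1) prior places its mass, a small sketch (ours, not part of the original study) draws variance values by inverting gamma variates:

```python
import numpy as np

rng = np.random.default_rng(0)

# If X ~ Gamma(shape=alpha, rate=beta), then 1/X ~ Inverse-Gamma(alpha, beta).
# With alpha = beta = 1, rate 1 corresponds to scale 1 in NumPy's convention.
precision = rng.gamma(shape=1.0, scale=1.0, size=100_000)
variance = 1.0 / precision

# The interquartile range lands roughly between .7 and 3.5, consistent with
# Gelman and Hill's (2007) remark that this prior concentrates in about (.5, 5).
print(np.quantile(variance, [0.25, 0.50, 0.75]))
```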

Convergence Check

The MCMC procedure is an iterative algorithm. Several Markov chains are run in parallel, each starting from different initial values, and the algorithm is typically run until the simulations from the different initial values converge to a common distribution. Multiple chains usually are recommended for checking the proper mixing of the chains (Gelman & Hill, 2007). Thus, in this study, four chains were used for the MCMC runs. Convergence was checked against multiple criteria to make sure that it was achieved before the model parameter estimates were monitored. First, the Gelman-Rubin statistic (R), as modified by Brooks and Gelman (1998), was used. Convergence was assessed by comparing within-chain (W) and between-chain (B) variability over the second half of the chains. The ratio R = B/W was expected to be greater than 1 if the starting values were sufficiently different, and to approach 1 as convergence was reached. For practical purposes, convergence can be assumed if R < 1.05 (Lunn et al., 2000). A sample check over replications under the simulation conditions indicated that R generally was close to 1 and smaller than 1.05. Brooks and Gelman (1998) emphasized the importance of ensuring not only that R has converged to 1 but also that B and W have converged to stability. The Brooks-Gelman Ratio (BGR) diagnostic plots indicated that stability and convergence usually occurred between iteration 1,000 and 2,000. The trace plots also indicated that each parameter of interest became stationary after about 700 iterations, which further supported that convergence was reached before 2,000 iterations. The quantile plots showed the running mean with 95% confidence intervals against iteration number; the running means and the 95% confidence intervals from the four chains mixed very well and reached equilibrium before 2,000 iterations. Other plots, including the history and density plots, all indicated that the four chains mixed well before 2,000 iterations and had reached equilibrium by then. Based on these observations, we decided to discard the first 2,000 iterations as burn-in iterations. After the burn-in phase, an additional 3,000 iterations were monitored for each chain. The model parameter inferences were made based on these 3,000 monitoring iterations from each of the four chains, which yielded a total of 12,000 samples.

Real Data Analysis

The proposed model was further fit to data from a reading comprehension test for high school graduation in a mid-south state in the United States. The test contained four passages with 8, 8, 9, and 7 items, respectively. School districts were used as the group-level units. The original data set contained a total of 1,644 students nested within 424 school districts, with the number of students in each school district ranging from 1 to 397. School districts with fewer than 15 students were excluded, resulting in a sample of 1,534 students in 20 school districts, with district sizes ranging from 16 to 397. The same prior distributions as those in the simulation study were used for the four models, and the same procedure was followed to check model parameter convergence in the WinBUGS runs. As in the simulation study, 2,000 burn-in iterations and 3,000 monitoring iterations were used for four chains to evaluate the models.
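The convergence criterion described above can be made concrete with a short sketch of the Gelman-Rubin diagnostic in a commonly used simplified form (our own illustration of the within/between-chain comparison, not the authors' code):

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for one scalar parameter.

    `chains` is an (m, n) array: m parallel chains with n retained draws
    each (e.g., the second half of every chain). Values near 1 suggest the
    chains have mixed; R < 1.05 is the practical cutoff used in the text.
    This is the common simplified form without the (m + 1)/m correction.
    """
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()      # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)    # between-chain variance
    var_pooled = (n - 1) / n * W + B / n       # pooled variance estimate
    return np.sqrt(var_pooled / W)

# Example with four well-mixed chains: R should be very close to 1.
rng = np.random.default_rng(7)
print(gelman_rubin(rng.normal(size=(4, 1500))))
```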

Results

Simulation Study

The Deviance Information Criterion (DIC) produced by WinBUGS was used to assess model fit, with the smallest DIC indicating the best fitting model. The proposed model accounting for dual dependence and the testlet model performed similarly, with better fit, while the multilevel model and the Rasch model performed similarly, with worse fit. Based on the DIC, the four models were ranked consistently across simulation conditions and replications, from best to worst fitting, as follows: the proposed model, the testlet model, the multilevel model, and the Rasch model.

Table 1
Standard Error in Item Difficulty Parameter Estimation
[The table reports the N, minimum, maximum, mean, and SD of the SEs for the Dual, Multilevel, Rasch, and Testlet models under each of conditions S1-S4; the numeric entries were not preserved in this transcription.]
Note. S1 = small local item dependence and small local person dependence, S2 = moderate local item dependence and small local person dependence, S3 = small local item dependence and moderate local person dependence, S4 = moderate local item dependence and moderate local person dependence.

Item difficulty recovery was evaluated and compared in terms of bias, SE, and RMSE. A univariate three-way analysis of variance was conducted by specifying each of the error indexes as the dependent variable and local item dependence (2 levels), local person dependence (2 levels), and model (4 levels) as the three factors. Based on the analysis of variance results, none of the factors significantly impacted bias. This is due to the fact that all models were identified by constraining the mean item difficulty to zero. Further, no interaction effect had a statistically significant impact on bias. However, the SE (see Table 1) was significantly affected by the local person dependence and model factors, with small (f = .16) and moderate (f = .26) effect sizes, respectively. The interaction between local item dependence and model was significant with a small effect size (f = .13). The average SEs were about the same for the multilevel model and the Rasch model, while those for the proposed model and the testlet model were close to each other, with the latter pair having higher average SEs. The higher average SEs for the proposed model and the testlet model might have been due to the increased number of parameters estimated for these models.

Table 2
Root Mean Squared Error in Item Difficulty Parameter Estimation
[The table reports the N, minimum, maximum, mean, and SD of the RMSEs for the Dual, Multilevel, Rasch, and Testlet models under each of conditions S1-S4; the numeric entries were not preserved in this transcription.]

Similarly, local item dependence, the estimation model, and their interaction all had significant effects on the RMSE, each with a moderate effect size (f = .32, f = .33, and f = .27, respectively). The average RMSEs were smaller for the proposed dual model and the testlet model than for the multilevel model and the Rasch model (see Table 2). The differences in RMSE between the two pairs of models were smaller under small local item dependence than under moderate local item dependence. The lower RMSE for the better fitting models was consistent with the expectation that a better fitting model usually has less total estimation error.

The impact of the calibration model on bias in ability parameter estimation was statistically significant but with a negligible effect size (f = .08). No other factors had a significant impact on bias in ability parameter estimation. All factors significantly influenced the SE in ability parameter estimation; however, only the model factor had a moderate effect size (f = .35). The local person dependence factor, the local item dependence factor, and the interaction between the local person dependence and model factors had small effect sizes (f = .12, f = .19, and f = .20, respectively). All other effects were negligible. No consistent patterns in the SE were observed for the four models as the magnitudes of local item and person dependence changed (see Table 3). In general, the mean SEs of the ability parameter estimates for the proposed model and the multilevel model were slightly higher than those for the Rasch and testlet models. A possible explanation is that the proposed model and the multilevel model estimated both individual ability and group ability parameters simultaneously, which increased the difficulty of separating the effects of individual persons and the group. In addition, the number of clusters (50) and the cluster size (20), which are not very large, may have contributed to the difficulty in separating the individual ability and group ability effects, thus increasing the random error in ability parameter estimation.

Table 3
Standard Error in Ability Parameter Estimation
[The table reports the N, minimum, maximum, mean, and SD of the SEs for the Dual, Multilevel, Rasch, and Testlet models under each of conditions S1-S4; the numeric entries were not preserved in this transcription.]

Regarding the RMSE, all effects except the three-way interaction were statistically significant at the α level of .05. The three studied factors (model, local item dependence, and local person dependence) and the interaction between the model and local person dependence factors had small effect sizes (f = .20, f = .10, f = .13, and f = .10, respectively), while the other effects were negligible. The average RMSEs were smaller for the proposed dual model and the multilevel model than for the testlet model and the Rasch model, regardless of the magnitudes of local item dependence and local person dependence (see Table 4). This can be explained by the proper separate modeling of the person and group effects in these two models.

The local item and person dependence factors had no significant impact on the ability variance recovery, while the effects of the model factor on the three types of error were significant, with large effect sizes. The proposed model and the multilevel model were effective in recovering the true ability variance, while the testlet model and the Rasch model overestimated the true values. No factor other than the local person dependence factor had an impact on the group variance recovery: as the magnitude of local person dependence increased, the SE increased. On the other hand, the local item dependence factor significantly affected the SE and the RMSE in the testlet variance estimation, with large effect sizes; an increase in the magnitude of local item dependence increased the SE and the RMSE in testlet variance estimation. The local person dependence factor also impacted the SE in testlet variance recovery with a large effect size: an increase in the magnitude of local person dependence increased the SE in testlet variance estimation.

Table 4
Root Mean Squared Error in Ability Parameter Estimation
[The table reports the N, minimum, maximum, mean, and SD of the RMSEs for the Dual, Multilevel, Rasch, and Testlet models under each of conditions S1-S4; the numeric entries were not preserved in this transcription.]

In general, the simulation results demonstrated that for both item and person parameter estimation, bias was not affected but the SE was. These results are consistent with the general theory of the mixed-effects model, in which the fixed-effect parameter estimates are the best linear unbiased estimators and the random-effect parameter estimates are the best linear unbiased predictors. Therefore, ignoring the dependence will not affect bias but will only affect efficiency.

The effects of the three studied factors on person parameter estimation were further investigated by examining the accuracy of classifying students into passing and failing categories based on the comparison of their estimated abilities with the θ cut score obtained from a state reading comprehension test. All three factors had significant effects on classification accuracy, with large effect sizes. Regardless of the magnitude of local item dependence or local person dependence, the proposed dual model always led to the most accurate classification of examinees, while the Rasch model showed the worst performance. The mean classification accuracy difference between the proposed model and the Rasch model was about 5.5%, ranging from 1.8% to 11%. In sum, this simulation study demonstrated that the proposed four-level IRT model accounting for dual local dependence performed better in terms of model parameter recovery and classification accuracy than the three alternative models considered.

Real Data Analysis

The proposed model and the three comparison models were fit to the real data set. Model fit assessed by DIC indicated that the proposed model was the best fitting model (DIC = 48,374.6), followed by the multilevel model (DIC = 48,512.0), the testlet model (DIC = 48,636.9), and the Rasch model (DIC = 48,780.6).
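The classification comparison just described reduces to a simple agreement computation between true and estimated pass/fail decisions; a minimal sketch (our own, with hypothetical values and cut score):

```python
import numpy as np

def classification_accuracy(theta_true, theta_hat, cut):
    """Percentage of examinees whose pass/fail status, judged against a
    cut score, agrees between the true and estimated abilities."""
    return 100.0 * np.mean((theta_true >= cut) == (theta_hat >= cut))

# Hypothetical values for illustration only.
rng = np.random.default_rng(1)
theta_true = rng.normal(size=1000)
theta_hat = theta_true + rng.normal(scale=0.5, size=1000)  # noisy estimates
print(classification_accuracy(theta_true, theta_hat, cut=0.0))
```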

The estimated testlet effects were relatively small, but the person clustering effects were relatively large. It therefore made sense that the multilevel model provided better fit than the testlet model. The estimated testlet variances for the four testlets were .1557, .1836, .2656, and .1884 using the proposed model, and .2125, .176, .1894, and .1831 using the testlet model. In general, the testlet variances were small (even smaller than the small values in the simulation study). The estimated group variances for the proposed model and the multilevel (three-level) model were close, with values of 1.41 and 1.332, respectively, and the ability variance estimates based on these models were .545 and .567. The estimated ability variances for the testlet model and the Rasch model were considerably larger. A possible explanation for these much larger ability variance estimates is that ignoring a large person clustering effect may have increased the errors in ability parameter estimation. All four models were identified by constraining the mean item difficulty to be zero. The mean item parameter estimates were not significantly different across models. The estimated person parameters were highly correlated between the proposed model and the multilevel model, with a Pearson product-moment correlation coefficient near 1.0. The same was true for the correlation between the Rasch model and the testlet model estimates. The correlations between models across these two pairs were about .9. The means of the ability estimates based on the two models that incorporated the local person dependence effect were close to zero, while the two models that ignored the local person dependence effect had means of around .7. The classification consistency between the two models that accounted for local person dependence was 99.3%, while that between the two models that ignored local person dependence was 99.7%. The classification consistencies between the proposed model and the two models that ignored local person dependence were clearly lower, at around 79%.

To better understand the real data analysis results, one replication from the simulation condition with small local item dependence and moderate local person dependence effects, which was close to the characteristics of the real data, was examined in detail. The findings related to the item parameter estimates and the correlations between the estimated person parameters among models were nearly identical to those in the real data analyses. The means of the ability estimates based on the two models that incorporated the local person dependence effect were close to zero, while the means based on the two models that ignored it were around .13. The classification accuracy was higher for the two models that incorporated local person dependence (87% correct classification) than for the two models that ignored it (78% correct classification). The classification consistency between the two models that incorporated local person dependence was 98%, while that between the two models that ignored it was 99.5%. The classification consistency between the proposed model and each of the two models that ignored local person dependence was around 82%. In general, the real data analysis revealed similar results to those found in the simulation study. The item difficulty parameter estimates were not greatly affected by the estimation models.

Classification consistency between the proposed model and the multilevel model was very high, while the classification consistency between the proposed model and the two models that ignored local person dependence was lower. The differences in correlations among ability estimates from the different models, together with the reported classification consistencies based on those estimates, provide evidence that the differences in ability parameter estimates among the models were practically important. In the real data set, the local item dependence effects were relatively small, but the local person dependence effects were large.

Discussion

Local independence is one of the important assumptions of traditional IRT models. This assumption implies both local item independence and local person independence. Different indexes and models have been proposed for detecting and modeling local dependence due to item and person clustering. However, little research has addressed both local item dependence and local person dependence concurrently. This study proposed a four-level IRT model for dual local dependence that incorporates both local item dependence and local person dependence simultaneously. Model parameter estimation was explored using the MCMC algorithm in WinBUGS.

The proposed four-level IRT model can be extended to two- and three-parameter IRT models (Jiao, Kamata, & Binici, 2010a), as well as to polytomous item response data (Jiao, Mislevy, & Zhang, 2011), when the test form is built upon testlets and a representative sample is selected from a large student population based on a cluster sampling method. It is relevant to K-12 state assessment programs, such as the common core assessments, and to large-scale national and international assessment programs. When matrix sampling is used in large-scale assessments, it is not clear whether the findings in this study can be extended to missing data cases; further exploration could address this scenario.

A possible criticism of the four-level IRT model is that it provides better fit simply because it has more parameters than the other models. This issue has been accounted for in three ways. First, the proposed model is intended for more complicated testing situations in which both item clustering due to testlets and person clustering due to cluster sampling are present. If there is no such structure in the test, the proposed model does not need to be considered. Second, DIC as a model fit index does not always favor the model with more parameters. In general, an information-criterion-based model selection strategy does not always select the model with the larger number of parameters, since model complexity is penalized in the index. DIC contains two terms: one for the expectation of the deviance and a second for the effective number of parameters in the model. As the number of model parameters increases, the expectation of the deviance decreases and the model fit improves; however, the term for the effective number of parameters compensates for this effect by favoring models with fewer parameters. This implies that DIC does not always favor a model with a larger number of parameters, a point that was in fact supported in the real data analysis, where the multilevel IRT model was favored over the testlet model, which had a larger number of model parameters.
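In the standard formulation used by WinBUGS, the two DIC terms combine as

$$\mathrm{DIC} = \bar{D} + p_D, \qquad p_D = \bar{D} - D(\bar{\theta}),$$

where $\bar{D}$ is the posterior mean of the deviance, $D(\bar{\theta})$ is the deviance evaluated at the posterior means of the parameters, and the effective number of parameters $p_D$ is the penalty that keeps DIC from automatically rewarding larger models.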

Third, our simulation study results indicated that model choice had a moderate effect on ability parameter estimation error and a large effect on classification decisions. This implies that the inferences from different models bear practical significance. If the clustering effects, especially the person clustering effect, are not properly modeled, both the model parameters and the classifications will be affected significantly.

Although they were not reported in this article, two other information criterion indexes, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), were explored in our simulation study. However, neither index could properly identify the simulated true model; they could not effectively distinguish between the proposed model and the testlet model. Only DIC was able to pick the true model under all simulation conditions. We determined that DIC, an information criterion index directly extracted from the WinBUGS output, was sufficient for choosing the best fitting model under the various simulation conditions. Thus, we did not report the model fit results based on AIC and BIC. Nevertheless, a comparison of various information indexes for choosing the best fitting model is an important issue and worthy of future exploration.

For any given issue in psychometrics, multiple perspectives or approaches are always possible. For example, as one reviewer pointed out, one alternative way to deal with clustering effects would be to correct the SEs using replication methods such as jackknife repeated replication or balanced repeated replication. Alternatively, the quasi-pseudo maximum likelihood approach (Asparouhov, 2004) could be considered; this approach can be implemented in some off-the-shelf software packages such as Mplus. All of these options are possible solutions to clustering effects in IRT applications. However, since no previous study has explicitly modeled dual clustering effects, this study focused on describing one possible solution to this problem rather than comparing multiple approaches.

This study explored the MCMC algorithm for model parameter estimation. This algorithm is simple to implement, but it is computationally intensive and the computer running time can be very long. (As the computational power of personal computers is increasing dramatically, this may not be a concern in the near future.) Other model estimation methods could be further explored and compared with the results obtained with the MCMC algorithm. For example, Mplus would allow specifying a testlet model while simultaneously correcting for cluster sampling using a quasi-pseudo maximum likelihood method; this combination also would account for dual dependence (as one reviewer pointed out). The MCMC estimation presented in this article could be compared with this specific maximum likelihood approach. In addition, comparisons should be conducted with other estimation methods, including marginal maximum likelihood estimation with an expectation-maximization algorithm, the sixth-order Laplace method, and the Gauss-Hermite quadrature method (Rabe-Hesketh, Skrondal, & Pickles, 2005).

Gelman and Hill (2007) stated that inferences from any statistical analysis should reflect the factors used in the design of the data collection; multilevel modeling is a direct way to include indicators of clusters at all levels of a design. This study presented only the simplest model for studying the effects of item and person clustering. Future work could expand the model to include covariates at all four levels to better explain variance in the observed data. Such covariates include item characteristics, testlet characteristics, person characteristics, and group characteristics, which may improve model parameter estimation (Mislevy, 1987).

Based on the findings in this study, it is recommended that the magnitude of person clustering effects be evaluated, in addition to item clustering effects, when a cluster sampling method is used for psychometric analyses in testlet-based assessments. The analysis of real data clearly revealed that the mean ability could differ by as much as .7 logit units when both item and person clustering effects are ignored. Ignoring these effects could lead to misleading inferences related to group-level change from year to year, in addition to inflated error in model parameter estimation and a decrease in classification accuracy. It is further suggested that the effects of item clustering and person clustering on equating, vertical scaling, and standard setting be investigated in future studies.

Acknowledgments

The authors would like to thank the editor, Dr. Brian Clauser, and the reviewers for their valuable advice and suggestions, which greatly improved the manuscript. The authors are indebted to Dr. Robert Mislevy and Dr. Robert Lissitz for insightful discussions. Thanks are also due to Dr. George Macready for editing suggestions. All errors remain the responsibility of the authors.

References

Ackerman, T. (1987). The robustness of LOGIST and BILOG IRT estimation programs to violations of local independence (ACT Research Report Series). Iowa City, IA: American College Testing.

Adams, R. J., Wilson, M., & Wu, M. (1997). Multilevel item response models: An approach to errors in variables regression. Journal of Educational and Behavioral Statistics, 22.

Asparouhov, T. (2004). Stratification in multivariate modeling (Web Notes No. 9). (Accessed April 2010.)

Bafumi, J., Gelman, A., Park, D. K., & Kaplan, N. (2005). Practical issues in implementing and understanding Bayesian ideal point estimation. Political Analysis, 13.

Baker, F. B., & Kim, S. (2004). Item response theory: Parameter estimation techniques. New York, NY: Marcel Dekker.

Binici, S. (2007). Random-effects differential item functioning via hierarchical generalized linear model and generalized linear latent mixed model: A comparison of estimation methods. Unpublished doctoral dissertation, Florida State University.

Bradlow, E. T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64.

Brooks, S. P., & Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7.

Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage.

Casella, G., & George, E. (1992). Explaining the Gibbs sampler. The American Statistician, 46.

Chaimongkol, S., Huffer, F., & Kamata, A. (2006). A Bayesian approach for fitting a random effect differential item functioning across group units. Thailand Statistician, 4.

Chen, W., & Thissen, D. (1997). Local dependence indexes for item pairs using item response theory. Journal of Educational and Behavioral Statistics, 22.


Parameter Estimation with Mixture Item Response Theory Models: A Monte Carlo Comparison of Maximum Likelihood and Bayesian Methods Journal of Modern Applied Statistical Methods Volume 11 Issue 1 Article 14 5-1-2012 Parameter Estimation with Mixture Item Response Theory Models: A Monte Carlo Comparison of Maximum Likelihood and Bayesian

More information

ABSTRACT. Professor Gregory R. Hancock, Department of Measurement, Statistics and Evaluation

ABSTRACT. Professor Gregory R. Hancock, Department of Measurement, Statistics and Evaluation ABSTRACT Title: FACTOR MIXTURE MODELS WITH ORDERED CATEGORICAL OUTCOMES: THE MATHEMATICAL RELATION TO MIXTURE ITEM RESPONSE THEORY MODELS AND A COMPARISON OF MAXIMUM LIKELIHOOD AND BAYESIAN MODEL PARAMETER

More information

Item Response Theory: Methods for the Analysis of Discrete Survey Response Data

Item Response Theory: Methods for the Analysis of Discrete Survey Response Data Item Response Theory: Methods for the Analysis of Discrete Survey Response Data ICPSR Summer Workshop at the University of Michigan June 29, 2015 July 3, 2015 Presented by: Dr. Jonathan Templin Department

More information

A Comparison of Item and Testlet Selection Procedures. in Computerized Adaptive Testing. Leslie Keng. Pearson. Tsung-Han Ho

A Comparison of Item and Testlet Selection Procedures. in Computerized Adaptive Testing. Leslie Keng. Pearson. Tsung-Han Ho ADAPTIVE TESTLETS 1 Running head: ADAPTIVE TESTLETS A Comparison of Item and Testlet Selection Procedures in Computerized Adaptive Testing Leslie Keng Pearson Tsung-Han Ho The University of Texas at Austin

More information

THE MANTEL-HAENSZEL METHOD FOR DETECTING DIFFERENTIAL ITEM FUNCTIONING IN DICHOTOMOUSLY SCORED ITEMS: A MULTILEVEL APPROACH

THE MANTEL-HAENSZEL METHOD FOR DETECTING DIFFERENTIAL ITEM FUNCTIONING IN DICHOTOMOUSLY SCORED ITEMS: A MULTILEVEL APPROACH THE MANTEL-HAENSZEL METHOD FOR DETECTING DIFFERENTIAL ITEM FUNCTIONING IN DICHOTOMOUSLY SCORED ITEMS: A MULTILEVEL APPROACH By JANN MARIE WISE MACINNES A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF

More information

Bayesian and Frequentist Approaches

Bayesian and Frequentist Approaches Bayesian and Frequentist Approaches G. Jogesh Babu Penn State University http://sites.stat.psu.edu/ babu http://astrostatistics.psu.edu All models are wrong But some are useful George E. P. Box (son-in-law

More information

MISSING DATA AND PARAMETERS ESTIMATES IN MULTIDIMENSIONAL ITEM RESPONSE MODELS. Federico Andreis, Pier Alda Ferrari *

MISSING DATA AND PARAMETERS ESTIMATES IN MULTIDIMENSIONAL ITEM RESPONSE MODELS. Federico Andreis, Pier Alda Ferrari * Electronic Journal of Applied Statistical Analysis EJASA (2012), Electron. J. App. Stat. Anal., Vol. 5, Issue 3, 431 437 e-issn 2070-5948, DOI 10.1285/i20705948v5n3p431 2012 Università del Salento http://siba-ese.unile.it/index.php/ejasa/index

More information

Using the Distractor Categories of Multiple-Choice Items to Improve IRT Linking

Using the Distractor Categories of Multiple-Choice Items to Improve IRT Linking Using the Distractor Categories of Multiple-Choice Items to Improve IRT Linking Jee Seon Kim University of Wisconsin, Madison Paper presented at 2006 NCME Annual Meeting San Francisco, CA Correspondence

More information

Decision consistency and accuracy indices for the bifactor and testlet response theory models

Decision consistency and accuracy indices for the bifactor and testlet response theory models University of Iowa Iowa Research Online Theses and Dissertations Summer 2014 Decision consistency and accuracy indices for the bifactor and testlet response theory models Lee James LaFond University of

More information

How few countries will do? Comparative survey analysis from a Bayesian perspective

How few countries will do? Comparative survey analysis from a Bayesian perspective Survey Research Methods (2012) Vol.6, No.2, pp. 87-93 ISSN 1864-3361 http://www.surveymethods.org European Survey Research Association How few countries will do? Comparative survey analysis from a Bayesian

More information

A Hierarchical Linear Modeling Approach for Detecting Cheating and Aberrance. William Skorupski. University of Kansas. Karla Egan.

A Hierarchical Linear Modeling Approach for Detecting Cheating and Aberrance. William Skorupski. University of Kansas. Karla Egan. HLM Cheating 1 A Hierarchical Linear Modeling Approach for Detecting Cheating and Aberrance William Skorupski University of Kansas Karla Egan CTB/McGraw-Hill Paper presented at the May, 2012 Conference

More information

Data Analysis Using Regression and Multilevel/Hierarchical Models

Data Analysis Using Regression and Multilevel/Hierarchical Models Data Analysis Using Regression and Multilevel/Hierarchical Models ANDREW GELMAN Columbia University JENNIFER HILL Columbia University CAMBRIDGE UNIVERSITY PRESS Contents List of examples V a 9 e xv " Preface

More information

Detection of Unknown Confounders. by Bayesian Confirmatory Factor Analysis

Detection of Unknown Confounders. by Bayesian Confirmatory Factor Analysis Advanced Studies in Medical Sciences, Vol. 1, 2013, no. 3, 143-156 HIKARI Ltd, www.m-hikari.com Detection of Unknown Confounders by Bayesian Confirmatory Factor Analysis Emil Kupek Department of Public

More information

Technical Specifications

Technical Specifications Technical Specifications In order to provide summary information across a set of exercises, all tests must employ some form of scoring models. The most familiar of these scoring models is the one typically

More information

Mediation Analysis With Principal Stratification

Mediation Analysis With Principal Stratification University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 3-30-009 Mediation Analysis With Principal Stratification Robert Gallop Dylan S. Small University of Pennsylvania

More information

Running head: NESTED FACTOR ANALYTIC MODEL COMPARISON 1. John M. Clark III. Pearson. Author Note

Running head: NESTED FACTOR ANALYTIC MODEL COMPARISON 1. John M. Clark III. Pearson. Author Note Running head: NESTED FACTOR ANALYTIC MODEL COMPARISON 1 Nested Factor Analytic Model Comparison as a Means to Detect Aberrant Response Patterns John M. Clark III Pearson Author Note John M. Clark III,

More information

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES Correlational Research Correlational Designs Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 39 Evaluation of Comparability of Scores and Passing Decisions for Different Item Pools of Computerized Adaptive Examinations

More information

Likelihood Ratio Based Computerized Classification Testing. Nathan A. Thompson. Assessment Systems Corporation & University of Cincinnati.

Likelihood Ratio Based Computerized Classification Testing. Nathan A. Thompson. Assessment Systems Corporation & University of Cincinnati. Likelihood Ratio Based Computerized Classification Testing Nathan A. Thompson Assessment Systems Corporation & University of Cincinnati Shungwon Ro Kenexa Abstract An efficient method for making decisions

More information

Bayesian Logistic Regression Modelling via Markov Chain Monte Carlo Algorithm

Bayesian Logistic Regression Modelling via Markov Chain Monte Carlo Algorithm Journal of Social and Development Sciences Vol. 4, No. 4, pp. 93-97, Apr 203 (ISSN 222-52) Bayesian Logistic Regression Modelling via Markov Chain Monte Carlo Algorithm Henry De-Graft Acquah University

More information

Linking Errors in Trend Estimation in Large-Scale Surveys: A Case Study

Linking Errors in Trend Estimation in Large-Scale Surveys: A Case Study Research Report Linking Errors in Trend Estimation in Large-Scale Surveys: A Case Study Xueli Xu Matthias von Davier April 2010 ETS RR-10-10 Listening. Learning. Leading. Linking Errors in Trend Estimation

More information

A Modified CATSIB Procedure for Detecting Differential Item Function. on Computer-Based Tests. Johnson Ching-hong Li 1. Mark J. Gierl 1.

A Modified CATSIB Procedure for Detecting Differential Item Function. on Computer-Based Tests. Johnson Ching-hong Li 1. Mark J. Gierl 1. Running Head: A MODIFIED CATSIB PROCEDURE FOR DETECTING DIF ITEMS 1 A Modified CATSIB Procedure for Detecting Differential Item Function on Computer-Based Tests Johnson Ching-hong Li 1 Mark J. Gierl 1

More information

Centre for Education Research and Policy

Centre for Education Research and Policy THE EFFECT OF SAMPLE SIZE ON ITEM PARAMETER ESTIMATION FOR THE PARTIAL CREDIT MODEL ABSTRACT Item Response Theory (IRT) models have been widely used to analyse test data and develop IRT-based tests. An

More information

Statistics for Social and Behavioral Sciences

Statistics for Social and Behavioral Sciences Statistics for Social and Behavioral Sciences Advisors: S.E. Fienberg W.J. van der Linden For other titles published in this series, go to http://www.springer.com/series/3463 Jean-Paul Fox Bayesian Item

More information

Multidimensionality and Item Bias

Multidimensionality and Item Bias Multidimensionality and Item Bias in Item Response Theory T. C. Oshima, Georgia State University M. David Miller, University of Florida This paper demonstrates empirically how item bias indexes based on

More information

POLYTOMOUS IRT OR TESTLET MODEL: AN EVALUATION OF SCORING MODELS IN SMALL TESTLET SIZE SITUATIONS

POLYTOMOUS IRT OR TESTLET MODEL: AN EVALUATION OF SCORING MODELS IN SMALL TESTLET SIZE SITUATIONS POLYTOMOUS IRT OR TESTLET MODEL: AN EVALUATION OF SCORING MODELS IN SMALL TESTLET SIZE SITUATIONS By OU ZHANG A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT

More information

The Classification Accuracy of Measurement Decision Theory. Lawrence Rudner University of Maryland

The Classification Accuracy of Measurement Decision Theory. Lawrence Rudner University of Maryland Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, April 23-25, 2003 The Classification Accuracy of Measurement Decision Theory Lawrence Rudner University

More information

Bayesian Mediation Analysis

Bayesian Mediation Analysis Psychological Methods 2009, Vol. 14, No. 4, 301 322 2009 American Psychological Association 1082-989X/09/$12.00 DOI: 10.1037/a0016972 Bayesian Mediation Analysis Ying Yuan The University of Texas M. D.

More information

Bayesian Statistics Estimation of a Single Mean and Variance MCMC Diagnostics and Missing Data

Bayesian Statistics Estimation of a Single Mean and Variance MCMC Diagnostics and Missing Data Bayesian Statistics Estimation of a Single Mean and Variance MCMC Diagnostics and Missing Data Michael Anderson, PhD Hélène Carabin, DVM, PhD Department of Biostatistics and Epidemiology The University

More information

Jason L. Meyers. Ahmet Turhan. Steven J. Fitzpatrick. Pearson. Paper presented at the annual meeting of the

Jason L. Meyers. Ahmet Turhan. Steven J. Fitzpatrick. Pearson. Paper presented at the annual meeting of the Performance of Ability Estimation Methods for Writing Assessments under Conditio ns of Multidime nsionality Jason L. Meyers Ahmet Turhan Steven J. Fitzpatrick Pearson Paper presented at the annual meeting

More information

Contents. What is item analysis in general? Psy 427 Cal State Northridge Andrew Ainsworth, PhD

Contents. What is item analysis in general? Psy 427 Cal State Northridge Andrew Ainsworth, PhD Psy 427 Cal State Northridge Andrew Ainsworth, PhD Contents Item Analysis in General Classical Test Theory Item Response Theory Basics Item Response Functions Item Information Functions Invariance IRT

More information

THE APPLICATION OF ORDINAL LOGISTIC HEIRARCHICAL LINEAR MODELING IN ITEM RESPONSE THEORY FOR THE PURPOSES OF DIFFERENTIAL ITEM FUNCTIONING DETECTION

THE APPLICATION OF ORDINAL LOGISTIC HEIRARCHICAL LINEAR MODELING IN ITEM RESPONSE THEORY FOR THE PURPOSES OF DIFFERENTIAL ITEM FUNCTIONING DETECTION THE APPLICATION OF ORDINAL LOGISTIC HEIRARCHICAL LINEAR MODELING IN ITEM RESPONSE THEORY FOR THE PURPOSES OF DIFFERENTIAL ITEM FUNCTIONING DETECTION Timothy Olsen HLM II Dr. Gagne ABSTRACT Recent advances

More information

OLS Regression with Clustered Data

OLS Regression with Clustered Data OLS Regression with Clustered Data Analyzing Clustered Data with OLS Regression: The Effect of a Hierarchical Data Structure Daniel M. McNeish University of Maryland, College Park A previous study by Mundfrom

More information

Selection of Linking Items

Selection of Linking Items Selection of Linking Items Subset of items that maximally reflect the scale information function Denote the scale information as Linear programming solver (in R, lp_solve 5.5) min(y) Subject to θ, θs,

More information

Dimensionality of the Force Concept Inventory: Comparing Bayesian Item Response Models. Xiaowen Liu Eric Loken University of Connecticut

Dimensionality of the Force Concept Inventory: Comparing Bayesian Item Response Models. Xiaowen Liu Eric Loken University of Connecticut Dimensionality of the Force Concept Inventory: Comparing Bayesian Item Response Models Xiaowen Liu Eric Loken University of Connecticut 1 Overview Force Concept Inventory Bayesian implementation of one-

More information

Scaling TOWES and Linking to IALS

Scaling TOWES and Linking to IALS Scaling TOWES and Linking to IALS Kentaro Yamamoto and Irwin Kirsch March, 2002 In 2000, the Organization for Economic Cooperation and Development (OECD) along with Statistics Canada released Literacy

More information

Investigating the robustness of the nonparametric Levene test with more than two groups

Investigating the robustness of the nonparametric Levene test with more than two groups Psicológica (2014), 35, 361-383. Investigating the robustness of the nonparametric Levene test with more than two groups David W. Nordstokke * and S. Mitchell Colp University of Calgary, Canada Testing

More information

André Cyr and Alexander Davies

André Cyr and Alexander Davies Item Response Theory and Latent variable modeling for surveys with complex sampling design The case of the National Longitudinal Survey of Children and Youth in Canada Background André Cyr and Alexander

More information

Understanding Uncertainty in School League Tables*

Understanding Uncertainty in School League Tables* FISCAL STUDIES, vol. 32, no. 2, pp. 207 224 (2011) 0143-5671 Understanding Uncertainty in School League Tables* GEORGE LECKIE and HARVEY GOLDSTEIN Centre for Multilevel Modelling, University of Bristol

More information

GENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS

GENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS GENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS Michael J. Kolen The University of Iowa March 2011 Commissioned by the Center for K 12 Assessment & Performance Management at

More information

Computerized Mastery Testing

Computerized Mastery Testing Computerized Mastery Testing With Nonequivalent Testlets Kathleen Sheehan and Charles Lewis Educational Testing Service A procedure for determining the effect of testlet nonequivalence on the operating

More information

linking in educational measurement: Taking differential motivation into account 1

linking in educational measurement: Taking differential motivation into account 1 Selecting a data collection design for linking in educational measurement: Taking differential motivation into account 1 Abstract In educational measurement, multiple test forms are often constructed to

More information

The Use of Unidimensional Parameter Estimates of Multidimensional Items in Adaptive Testing

The Use of Unidimensional Parameter Estimates of Multidimensional Items in Adaptive Testing The Use of Unidimensional Parameter Estimates of Multidimensional Items in Adaptive Testing Terry A. Ackerman University of Illinois This study investigated the effect of using multidimensional items in

More information

Bayesian Model Averaging for Propensity Score Analysis

Bayesian Model Averaging for Propensity Score Analysis Multivariate Behavioral Research, 49:505 517, 2014 Copyright C Taylor & Francis Group, LLC ISSN: 0027-3171 print / 1532-7906 online DOI: 10.1080/00273171.2014.928492 Bayesian Model Averaging for Propensity

More information

A Brief Introduction to Bayesian Statistics

A Brief Introduction to Bayesian Statistics A Brief Introduction to Statistics David Kaplan Department of Educational Psychology Methods for Social Policy Research and, Washington, DC 2017 1 / 37 The Reverend Thomas Bayes, 1701 1761 2 / 37 Pierre-Simon

More information

Modelling Spatially Correlated Survival Data for Individuals with Multiple Cancers

Modelling Spatially Correlated Survival Data for Individuals with Multiple Cancers Modelling Spatially Correlated Survival Data for Individuals with Multiple Cancers Dipak K. Dey, Ulysses Diva and Sudipto Banerjee Department of Statistics University of Connecticut, Storrs. March 16,

More information

MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2

MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2 MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and Lord Equating Methods 1,2 Lisa A. Keller, Ronald K. Hambleton, Pauline Parker, Jenna Copella University of Massachusetts

More information

Methods Research Report. An Empirical Assessment of Bivariate Methods for Meta-Analysis of Test Accuracy

Methods Research Report. An Empirical Assessment of Bivariate Methods for Meta-Analysis of Test Accuracy Methods Research Report An Empirical Assessment of Bivariate Methods for Meta-Analysis of Test Accuracy Methods Research Report An Empirical Assessment of Bivariate Methods for Meta-Analysis of Test Accuracy

More information

Impact and adjustment of selection bias. in the assessment of measurement equivalence

Impact and adjustment of selection bias. in the assessment of measurement equivalence Impact and adjustment of selection bias in the assessment of measurement equivalence Thomas Klausch, Joop Hox,& Barry Schouten Working Paper, Utrecht, December 2012 Corresponding author: Thomas Klausch,

More information

The matching effect of intra-class correlation (ICC) on the estimation of contextual effect: A Bayesian approach of multilevel modeling

The matching effect of intra-class correlation (ICC) on the estimation of contextual effect: A Bayesian approach of multilevel modeling MODERN MODELING METHODS 2016, 2016/05/23-26 University of Connecticut, Storrs CT, USA The matching effect of intra-class correlation (ICC) on the estimation of contextual effect: A Bayesian approach of

More information

Day Hospital versus Ordinary Hospitalization: factors in treatment discrimination

Day Hospital versus Ordinary Hospitalization: factors in treatment discrimination Working Paper Series, N. 7, July 2004 Day Hospital versus Ordinary Hospitalization: factors in treatment discrimination Luca Grassetti Department of Statistical Sciences University of Padua Italy Michela

More information

Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria

Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria Thakur Karkee Measurement Incorporated Dong-In Kim CTB/McGraw-Hill Kevin Fatica CTB/McGraw-Hill

More information

Adaptive Testing With the Multi-Unidimensional Pairwise Preference Model Stephen Stark University of South Florida

Adaptive Testing With the Multi-Unidimensional Pairwise Preference Model Stephen Stark University of South Florida Adaptive Testing With the Multi-Unidimensional Pairwise Preference Model Stephen Stark University of South Florida and Oleksandr S. Chernyshenko University of Canterbury Presented at the New CAT Models

More information

For general queries, contact

For general queries, contact Much of the work in Bayesian econometrics has focused on showing the value of Bayesian methods for parametric models (see, for example, Geweke (2005), Koop (2003), Li and Tobias (2011), and Rossi, Allenby,

More information

Impact of Violation of the Missing-at-Random Assumption on Full-Information Maximum Likelihood Method in Multidimensional Adaptive Testing

Impact of Violation of the Missing-at-Random Assumption on Full-Information Maximum Likelihood Method in Multidimensional Adaptive Testing A peer-reviewed electronic journal. Copyright is retained by the first or sole author, who grants right of first publication to Practical Assessment, Research & Evaluation. Permission is granted to distribute

More information

Item Analysis: Classical and Beyond

Item Analysis: Classical and Beyond Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Modified for EPE/EDP 711 by Kelly Bradley on January 8, 2013 Why is item analysis relevant? Item analysis provides

More information

11/24/2017. Do not imply a cause-and-effect relationship

11/24/2017. Do not imply a cause-and-effect relationship Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are highly extraverted people less afraid of rejection

More information

A Comparison of Several Goodness-of-Fit Statistics

A Comparison of Several Goodness-of-Fit Statistics A Comparison of Several Goodness-of-Fit Statistics Robert L. McKinley The University of Toledo Craig N. Mills Educational Testing Service A study was conducted to evaluate four goodnessof-fit procedures

More information

The Hierarchical Testlet Response Time Model: Bayesian analysis of a testlet model for item responses and response times.

The Hierarchical Testlet Response Time Model: Bayesian analysis of a testlet model for item responses and response times. The Hierarchical Testlet Response Time Model: Bayesian analysis of a testlet model for item responses and response times By Suk Keun Im Submitted to the graduate degree program in Department of Educational

More information

Data Analysis in Practice-Based Research. Stephen Zyzanski, PhD Department of Family Medicine Case Western Reserve University School of Medicine

Data Analysis in Practice-Based Research. Stephen Zyzanski, PhD Department of Family Medicine Case Western Reserve University School of Medicine Data Analysis in Practice-Based Research Stephen Zyzanski, PhD Department of Family Medicine Case Western Reserve University School of Medicine Multilevel Data Statistical analyses that fail to recognize

More information

A structural equation modeling approach for examining position effects in large scale assessments

A structural equation modeling approach for examining position effects in large scale assessments DOI 10.1186/s40536-017-0042-x METHODOLOGY Open Access A structural equation modeling approach for examining position effects in large scale assessments Okan Bulut *, Qi Quo and Mark J. Gierl *Correspondence:

More information

Designing small-scale tests: A simulation study of parameter recovery with the 1-PL

Designing small-scale tests: A simulation study of parameter recovery with the 1-PL Psychological Test and Assessment Modeling, Volume 55, 2013 (4), 335-360 Designing small-scale tests: A simulation study of parameter recovery with the 1-PL Dubravka Svetina 1, Aron V. Crawford 2, Roy

More information

Mantel-Haenszel Procedures for Detecting Differential Item Functioning

Mantel-Haenszel Procedures for Detecting Differential Item Functioning A Comparison of Logistic Regression and Mantel-Haenszel Procedures for Detecting Differential Item Functioning H. Jane Rogers, Teachers College, Columbia University Hariharan Swaminathan, University of

More information

Individual Differences in Attention During Category Learning

Individual Differences in Attention During Category Learning Individual Differences in Attention During Category Learning Michael D. Lee (mdlee@uci.edu) Department of Cognitive Sciences, 35 Social Sciences Plaza A University of California, Irvine, CA 92697-5 USA

More information

Ecological Statistics

Ecological Statistics A Primer of Ecological Statistics Second Edition Nicholas J. Gotelli University of Vermont Aaron M. Ellison Harvard Forest Sinauer Associates, Inc. Publishers Sunderland, Massachusetts U.S.A. Brief Contents

More information

Adjusting for mode of administration effect in surveys using mailed questionnaire and telephone interview data

Adjusting for mode of administration effect in surveys using mailed questionnaire and telephone interview data Adjusting for mode of administration effect in surveys using mailed questionnaire and telephone interview data Karl Bang Christensen National Institute of Occupational Health, Denmark Helene Feveille National

More information

Type I Error Rates and Power Estimates for Several Item Response Theory Fit Indices

Type I Error Rates and Power Estimates for Several Item Response Theory Fit Indices Wright State University CORE Scholar Browse all Theses and Dissertations Theses and Dissertations 2009 Type I Error Rates and Power Estimates for Several Item Response Theory Fit Indices Bradley R. Schlessman

More information

Understanding and Applying Multilevel Models in Maternal and Child Health Epidemiology and Public Health

Understanding and Applying Multilevel Models in Maternal and Child Health Epidemiology and Public Health Understanding and Applying Multilevel Models in Maternal and Child Health Epidemiology and Public Health Adam C. Carle, M.A., Ph.D. adam.carle@cchmc.org Division of Health Policy and Clinical Effectiveness

More information

An Empirical Assessment of Bivariate Methods for Meta-analysis of Test Accuracy

An Empirical Assessment of Bivariate Methods for Meta-analysis of Test Accuracy Number XX An Empirical Assessment of Bivariate Methods for Meta-analysis of Test Accuracy Prepared for: Agency for Healthcare Research and Quality U.S. Department of Health and Human Services 54 Gaither

More information

MEA DISCUSSION PAPERS

MEA DISCUSSION PAPERS Inference Problems under a Special Form of Heteroskedasticity Helmut Farbmacher, Heinrich Kögel 03-2015 MEA DISCUSSION PAPERS mea Amalienstr. 33_D-80799 Munich_Phone+49 89 38602-355_Fax +49 89 38602-390_www.mea.mpisoc.mpg.de

More information

Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation study

Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation study STATISTICAL METHODS Epidemiology Biostatistics and Public Health - 2016, Volume 13, Number 1 Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation

More information

Factors Affecting the Item Parameter Estimation and Classification Accuracy of the DINA Model

Factors Affecting the Item Parameter Estimation and Classification Accuracy of the DINA Model Journal of Educational Measurement Summer 2010, Vol. 47, No. 2, pp. 227 249 Factors Affecting the Item Parameter Estimation and Classification Accuracy of the DINA Model Jimmy de la Torre and Yuan Hong

More information

Effect of Noncompensatory Multidimensionality on Separate and Concurrent estimation in IRT Observed Score Equating A. A. Béguin B. A.

Effect of Noncompensatory Multidimensionality on Separate and Concurrent estimation in IRT Observed Score Equating A. A. Béguin B. A. Measurement and Research Department Reports 2001-2 Effect of Noncompensatory Multidimensionality on Separate and Concurrent estimation in IRT Observed Score Equating A. A. Béguin B. A. Hanson Measurement

More information

Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination

Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination Timothy N. Rubin (trubin@uci.edu) Michael D. Lee (mdlee@uci.edu) Charles F. Chubb (cchubb@uci.edu) Department of Cognitive

More information

Bayesian Bi-Cluster Change-Point Model for Exploring Functional Brain Dynamics

Bayesian Bi-Cluster Change-Point Model for Exploring Functional Brain Dynamics Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'18 85 Bayesian Bi-Cluster Change-Point Model for Exploring Functional Brain Dynamics Bing Liu 1*, Xuan Guo 2, and Jing Zhang 1** 1 Department

More information