
ORIGINAL ARTICLE

Practical Issues in the Application of Item Response Theory: A Demonstration Using Items From the Pediatric Quality of Life Inventory (PedsQL) 4.0 Generic Core Scales

Cheryl D. Hill, PhD,* Michael C. Edwards, PhD, David Thissen, PhD, Michelle M. Langer, MA, R. J. Wirth, MA, Tasha M. Burwinkle, PhD, and James W. Varni, PhD

Background: Item response theory (IRT) is increasingly being applied to health-related quality of life instrument development and refinement. This article discusses results obtained using categorical confirmatory factor analysis (CCFA) to check IRT model assumptions and the application of IRT in item analysis and scale evaluation.

Objectives: To demonstrate the value of CCFA and IRT in examining a health-related quality of life measure in children and adolescents.

Methods: This illustration uses data from 10,241 children and their parents on items from the 4 subscales of the PedsQL 4.0 Generic Core Scales. CCFA was applied to confirm domain dimensionality and identify possible locally dependent items. IRT was used to assess the strength of the relationship between the items and the constructs of interest and the information available across the latent construct.

Results: CCFA showed generally strong support for 1-factor models for each domain; however, several items exhibited evidence of local dependence. IRT revealed that the items generally exhibit favorable characteristics and are related to the same construct within a given domain. We discuss the lessons that can be learned by comparing alternate forms of the same scale, and we assess the potential impact of local dependence on the item parameter estimates.

Conclusions: This article describes CCFA methods for checking IRT model assumptions and provides suggestions for using these methods in practice. It offers insight into ways information gained through IRT can be applied to evaluate items and aid in scale construction.
From the *RTI Health Solutions, Research Triangle Park, North Carolina; Department of Psychology, The Ohio State University, Columbus; Department of Psychology, University of North Carolina, Chapel Hill; Department of Pediatrics, Texas A&M University College of Medicine, Temple; Department of Pediatrics, College of Medicine; and Department of Landscape Architecture and Urban Planning, College of Architecture, Texas A&M University, College Station.

Supported by National Institutes of Health Grant 1U01AR.

Presented at the annual meeting of the International Society for Quality of Life Research on October 20, 2005, in San Francisco, CA.

Reprints: Cheryl D. Hill, PhD, RTI Health Solutions, 200 Park Offices Drive, P.O. Box 12194, Research Triangle Park, NC. E-mail: cdhill@rti.org.

Copyright 2007 by Lippincott Williams & Wilkins

Medical Care, Volume 45, Number 5 Suppl 1, May 2007

Key Words: IRT, factor analysis, instrument development, PedsQL

(Med Care 2007;45:S39-S47)

The Patient-Reported Outcomes Measurement Information System (PROMIS) project aims to assemble health-related quality of life (HRQoL) item banks for developing both adaptive (ie, computerized adaptive testing [CAT]) and nonadaptive (ie, linear) patient-reported outcomes instruments.1 This process of item banking and test assembly relies heavily on item response theory (IRT) to assess the properties of the candidate items that inform the assignment of items to domain banks and the selection of appropriate items for instruments. As with any model, the use of IRT implies a number of assumptions about the data.2 This article discusses methods that can be used to check 2 primary assumptions of many IRT models, unidimensionality and local independence. In presenting these methods, we work through an example using data on items from an existing HRQoL instrument; these items were considered for inclusion in the PROMIS item bank and were also used to inform the development of new items for use with PROMIS.
IRT models describe the probability of observing a particular pattern of responses given the respondent's level on the underlying construct (θ). With the 2-parameter logistic (2PL) model, which is appropriate for items measured in 2 response categories (eg, yes/no, true/false), this probability is modeled using a slope parameter (a_i) and a location parameter (b_i) for each item i. The slope parameter measures the strength of the relationship between the item and the underlying construct; higher slopes mean that the item discriminates more sharply between respondents above and below some level on the latent continuum. For dichotomous items, the location parameter is the point along the latent continuum at which the item is most discriminating or informative; a respondent whose level on the underlying construct is at this location has a 50% chance of endorsing the item. In fields such as educational measurement, the location parameter is known as the difficulty parameter, where higher values are associated with more difficult items (ie, the respondent must be higher on the latent trait to provide a correct response).
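As an illustration, the 2PL endorsement probability described above can be computed directly. This is a minimal sketch; the slope and location values are hypothetical, chosen only to show the behavior around θ = b.

```python
import numpy as np

def trace_2pl(theta, a, b):
    """2PL trace line: probability of a positive response,
    T(u_i = 1) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + np.exp(-a * (np.asarray(theta, dtype=float) - b)))

# At theta == b the endorsement probability is exactly 0.5; a steeper
# slope discriminates more sharply around that location.
print(trace_2pl(0.0, a=2.0, b=0.0))              # → 0.5
print(trace_2pl([-1.0, 0.0, 1.0], a=2.0, b=0.0))
```

Note that the function is monotone increasing in θ, so higher latent levels always imply a higher endorsement probability.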

The probability of endorsing an item is described by a function of these item parameters called a trace line, or item characteristic curve, which takes the form for the 2PL model of:

T(u_i = 1) = 1 / (1 + exp[−a_i(θ − b_i)]),   (1)

where u_i = 1 refers to a positive response to item i.3 An alternative model often used in health outcomes research is Samejima's graded response model (GRM),4,5 which generalizes the 2PL model to include multiple b_ij parameters per item (j from 1 to m − 1) to correspond to m response categories (eg, items with the response scale "Strongly Disagree," "Disagree," "Neutral," "Agree," and "Strongly Agree"). The formula for a GRM trace line is:

T(u_i = j) = 1 / (1 + exp[−a_i(θ − b_i,j−1)]) − 1 / (1 + exp[−a_i(θ − b_i,j)]),   (2)

which states that the probability of responding in category j is the difference between a 2PL trace line for the probability of responding in category j or higher and a 2PL trace line for the probability of responding in category j + 1 or higher. In the case of the GRM, a respondent with an underlying construct value of b_ij has an equal probability of choosing category j or lower and category j + 1 or higher. These trace lines can be plotted as the probability of endorsement along the continuum of the latent trait to provide a visual representation of location and discrimination. An expected score plot is an alternative to a trace line plot that collapses the lines for each category into 1 trajectory, showing the expected response score across the latent trait. Trace lines can also be used to calculate information curves that display the amount of information an item provides along the continuum of the latent trait. In health outcomes research, items are often scored so that higher scores indicate that the respondent is higher on the scale of the latent construct, or that the individual possesses more of the trait that the items are designed to measure.
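The GRM category trace lines and the item information they imply can be sketched numerically. This is a minimal sketch under the differences-of-cumulative-curves formulation above; the item parameters are hypothetical, and information is obtained by numerical differentiation rather than a closed form.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Samejima GRM: probability of each of the m response categories,
    formed as differences of adjacent cumulative 2PL curves.
    b holds the m - 1 ordered threshold parameters b_ij."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))[:, None]
    cum = 1.0 / (1.0 + np.exp(-a * (theta - np.asarray(b, dtype=float))))
    cum = np.hstack([np.ones_like(theta), cum, np.zeros_like(theta)])
    return cum[:, :-1] - cum[:, 1:]        # shape: (n_theta, m)

def item_information(theta, a, b, eps=1e-4):
    """Item information I(theta) = sum_j P_j'(theta)^2 / P_j(theta),
    with the derivative taken numerically."""
    p = grm_category_probs(theta, a, b)
    dp = (grm_category_probs(np.asarray(theta, dtype=float) + eps, a, b)
          - grm_category_probs(np.asarray(theta, dtype=float) - eps, a, b)) / (2.0 * eps)
    return (dp ** 2 / p).sum(axis=1)

# A 5-category item like the PedsQL response scale; parameters hypothetical.
thresholds = [-0.5, 0.3, 1.2, 2.0]
probs = grm_category_probs(0.5, a=2.5, b=thresholds)
print(probs.sum())                           # category probabilities sum to 1
print(item_information([0.0, 1.0, 3.0], 2.5, thresholds))
```

Plotting each column of `grm_category_probs` against θ reproduces the category trace lines, and plotting `item_information` gives the information curve discussed in the text.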
For example, a scale designed to assess quality of life would be scored so that higher scores correspond with higher quality of life. This would also mean that categories with larger b_ij parameters would be more likely to be endorsed by respondents with better quality of life than by those with worse quality of life. Because of the way IRT models combine information across items, 2 primary data requirements must be met. First, the scale must be unidimensional; that is, the pattern of item responses is best described by 1 dominant construct. When items that are related to multiple underlying constructs are forced to provide information for 1 construct alone, it is difficult to determine what construct is being represented in the ensuing scale score. Second, the items must be locally independent, which means that the probabilities of each item response are related only through the value of the latent variable. That is, after accounting for the respondent's latent variable value, there should be no relationship between the responses to different items. Items that do have a relationship apart from the latent variable can create their own second dimension that explains covariance between these items that is not shared with the other items on the scale. This becomes a specific factor that is common to the locally dependent items and is separate from the general factor common to all items on the scale. When this multidimensional scale is forced into a unidimensional model, if the locally dependent items are strongly defined (ie, high factor loadings) and the remaining items are weakly defined (ie, low factor loadings), the strength of the relationship between the locally dependent items can change the construct measured by the scale by causing the 1 factor to be a measure of the specific factor rather than the general factor of interest.
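The local dependence mechanism described above can be illustrated with simulated data: two items that share a specific factor remain correlated even after the general factor is removed. All loadings, noise scales, and the sample size below are hypothetical choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
general = rng.normal(size=n)     # the construct all items share
specific = rng.normal(size=n)    # a factor shared only by the last 2 items

# Hypothetical continuous item scores: items 1-3 load on the general
# factor alone; items 4-5 also share the specific factor, which is
# exactly the local dependence situation described in the text.
clean = [0.8 * general + rng.normal(scale=0.6, size=n) for _ in range(3)]
dependent = [0.8 * general + 0.6 * specific + rng.normal(scale=0.6, size=n)
             for _ in range(2)]
items = np.column_stack(clean + dependent)

# After removing the general factor, only the dependent pair stays correlated.
resid = items - 0.8 * general[:, None]
corr = np.corrcoef(resid, rowvar=False)
print(round(corr[0, 1], 2), round(corr[3, 4], 2))  # near 0.0 vs near 0.5
```

The residual correlation of the dependent pair is the covariance a unidimensional model cannot absorb, and it is what large modification indices for error covariances detect.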
The goal of this article is to outline the use of categorical confirmatory factor analysis (CCFA) and IRT in item selection and scale development as applied in the PROMIS project, using examples from data obtained on items from the PedsQL 4.0 Generic Core Scales.7 CCFA, a factor analytic approach that accounts for the non-normality of categorical data (which renders traditional confirmatory factor analysis methods inappropriate), will be used to assess domain dimensionality and to identify possible locally dependent items. Although there are other approaches for assessing local dependence and dimensionality, the use of CCFA was supported by the PROMIS psychometric team.8 IRT will be used to assess how well the items measure the construct of interest and the appropriateness of the set of items for various ranges of the latent construct.

METHODS

The use of CCFA and IRT will be demonstrated using data on items from the 4 subscales of the PedsQL 4.0 Generic Core Scales.7 This instrument consists of 23 items designed to measure HRQoL in children and adolescents. Four domains are assessed: (1) Physical Functioning, (2) Emotional Functioning, (3) Social Functioning, and (4) School Functioning. A number of instrument versions exist for various age ranges, different informants, and assorted languages; however, this example focuses only on the child self-report and parent proxy-report for children (ages 8-12 years old) and adolescents (ages years old) in English and Spanish. All analyses considered informant (self or parent), age (child or adolescent), and language (English or Spanish) separately, for a total of 8 replications of the analysis for each domain.
Items from the PedsQL 4.0 Generic Core Scales were examined during the initial stages of the PROMIS project to obtain information about the dimensionality of the domains of interest to the project, to provide preliminary information about some items being considered for inclusion in the PROMIS item bank, and to familiarize the research team with the analysis plan that will be applied to PROMIS data when they become available. Thus, this analysis is not intended to be an evaluation of the PedsQL 4.0 Generic Core

Scales, but rather an examination of individual items under consideration for use with PROMIS. Items are scored on a 5-point response scale (0 = never a problem, 1 = almost never a problem, 2 = sometimes a problem, 3 = often a problem, 4 = almost always a problem). Because of the natural direction of the response scale, models were fit with higher scores indicating higher severity. (It is important to note that the PedsQL 4.0 Generic Core Scales scoring instructions have the items reverse-scored and modeled with higher scores indicating higher quality of life.)7 Whereas the PROMIS project has an elaborate plan for selecting samples on which to calibrate item parameters that are appropriate for measurement in the intended populations, this example used a convenience sample obtained through the California State Children's Health Insurance Program, consisting primarily of healthy children. The PedsQL 4.0 Generic Core Scales were mailed to 20,031 English-, Spanish-, Vietnamese-, Korean-, or Cantonese-speaking families in California with children between 2 and 16 years old who enrolled in the California State Children's Health Insurance Program during the months of February and March. The number of returned surveys was 10,241, and only data from the families who spoke English or Spanish with children 8 years of age and older were considered for these analyses. Additional details about the sample and its characteristics can be found in a study by Varni et al.9 Item 5 on the Physical Functioning domain was omitted from the analyses because nearly everyone responded "never" to having problems with "taking a bath or shower by myself." This very strong floor effect would make parameter estimation difficult. With the removal of this item, there were 7 items on the Physical Functioning domain and 5 items on each of the Emotional, Social, and School Functioning domains.
CCFA remains an area of active development, with the estimation methods performing well under some circumstances (eg, small number of items, large samples)10 and failing under others (eg, sparse data, complex models),11 so a number of estimation methods were used with 2 software packages (PRELIS12/LISREL13 and Mplus14). Researchers generally agree that using Pearson product-moment correlations to fit factor analytic models with categorical variables induces bias in the parameters and fit statistics,15 so polychoric correlations among the items were obtained, using listwise deletion to eliminate missing data. Weighted least squares (WLS) estimation is currently the statistically optimal method for CCFA because it provides appropriate statistical estimates with categorical data.16,17 However, WLS uses a weight matrix (the asymptotic covariance matrix) that must be inverted, which can create numerical problems when the sample size is small or the model has many measured variables.12 Alternative approaches include diagonally weighted least squares (DWLS, available in LISREL) and robust weighted least squares with a mean and variance correction (RWLSM/V, available in Mplus), which is WLS with a diagonal weight matrix. Thus, WLS and DWLS or RWLSM/V methods were used in each of the software packages. Unweighted least squares (ULS) estimation was also used as a conservative, though least desirable, alternative. These estimation methods cover a range of weighting from full to partial to none. WLS is the ideal estimation method for CCFA because it has the potential to provide estimates and fit statistics that are closest to the truth. However, when the data are inappropriate for WLS (eg, small samples, complex models), WLS will produce unstable results. Under such conditions, DWLS, RWLSM/V, or ULS estimation will provide good approximations to the parameter estimates.
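The two-step logic behind a polychoric correlation (thresholds estimated from the margins, then the correlation by maximum likelihood under an underlying bivariate normal) can be sketched for the dichotomous special case, the tetrachoric correlation. The 2x2 table below is hypothetical, and this is an illustrative sketch rather than the estimator PRELIS or Mplus implements.

```python
import numpy as np
from scipy import optimize, stats

def tetrachoric(table):
    """Two-step tetrachoric correlation for a 2x2 table of counts:
    estimate the thresholds from the margins, then find the underlying
    bivariate-normal correlation by maximum likelihood."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    t1 = stats.norm.ppf(table[0].sum() / n)      # row-variable threshold
    t2 = stats.norm.ppf(table[:, 0].sum() / n)   # column-variable threshold

    def negloglik(rho):
        bvn = stats.multivariate_normal(mean=[0.0, 0.0],
                                        cov=[[1.0, rho], [rho, 1.0]])
        p00 = bvn.cdf([t1, t2])          # both latent values below threshold
        p0 = stats.norm.cdf(t1)
        q0 = stats.norm.cdf(t2)
        cell = np.array([p00, p0 - p00, q0 - p00, 1.0 - p0 - q0 + p00])
        return -(table.ravel() * np.log(np.clip(cell, 1e-12, 1.0))).sum()

    res = optimize.minimize_scalar(negloglik, bounds=(-0.99, 0.99),
                                   method="bounded")
    return res.x

table = [[40, 10], [10, 40]]    # hypothetical item-by-item 2x2 table
print(round(tetrachoric(table), 2))   # ≈ 0.81
```

The full polychoric case extends this by estimating several thresholds per variable from the cumulative marginal proportions and summing the likelihood over all cells of the contingency table.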
Disadvantages of these methods include that fit statistics obtained with DWLS or RWLSM/V have unknown properties at this time, and ULS provides limited fit information. WLS should be used when possible, but these alternative estimation methods may be considered when necessary. We were fortunate to have an adequate sample size for the current application, which allowed us to use WLS throughout our analyses. In many cases, issues of sample size will prevent users from being able to use WLS, so we report ULS and DWLS here in an attempt to better understand how they perform relative to WLS. It is hoped that these comparisons will provide greater confidence in understanding ULS and DWLS (but especially DWLS) results when WLS becomes infeasible. One-factor solutions were obtained for the items on each of the 4 domains to assess the unidimensionality of each domain and the local independence of the items within each domain. Particular attention was given to the size of the factor loadings, fit statistics, and modification indices. In a desirable solution, the items load substantially on the factor, the χ² statistic is nonsignificant, the root mean square error of approximation (RMSEA) is less than 0.05,18 the Non-Normed Fit Index (NNFI) is greater than 0.92,19 the root mean square residual (RMR) is near 0, and the modification indices for the error covariances are all of small magnitude. Consistency across the 8 forms of the domain is ideal. The goal of small modification indices is based on the idea that a modification index is an estimate of the change that would be seen in the size of the χ² statistic if that constrained parameter were unconstrained. The size of a modification index should be considered relative to the other modification indices (ie, is this modification index abnormally large compared with others in this model?) and relative to the magnitude of the χ² statistic (ie, would model fit substantially improve if this parameter were allowed to vary?).
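The RMSEA criterion above can be computed directly from the χ² statistic, its degrees of freedom, and the sample size, using the standard formula RMSEA = sqrt(max(χ² − df, 0) / [df(N − 1)]). The sample size below is hypothetical; the example shows why a significant χ² can coexist with acceptable RMSEA when N is large.

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation:
    sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# A chi-square of 55 with df = 14 is highly significant, yet with a
# hypothetical sample of 1000 the RMSEA is near the 0.05 cutoff.
print(round(rmsea(chi2=55, df=14, n=1000), 3))   # → 0.054
```

Because N appears in the denominator, the same χ²/df ratio yields ever smaller RMSEA values as the sample grows, which is why RMSEA is preferred over the raw χ² test in large samples.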
It is important to note that perfect model fit is rarely achieved; instead, the goal is to find a set of items that are essentially unidimensional.20 That is to say, a set of items that is defined by 1 dominant dimension may meet the requirement of unidimensional IRT models. It is less important for the model to be perfect than for it to be useful. An additional 4-factor model was obtained for the entire set of 22 PedsQL 4.0 Generic Core Scales items (excluding item 5 on the Physical Functioning domain) in which the items on each domain loaded only on the factor corresponding to that domain, and the factors were correlated. This model could offer additional support for the assumption of unidimensionality if restricting the association of the items

to their respective domains does not result in a poorly fitting model. Large modification indices between items on different domains might be an indication of multidimensionality within domains, whereas large modification indices between items on the same domain might be an indication of local dependence. Once the dimensionality of the domains and local independence among the items had been considered, this information was used to inform IRT parameter estimation. Although many IRT models are available for use with polytomous data, Samejima's GRM was chosen by the PROMIS psychometric team8 and was applied in these analyses. Maximum marginal likelihood estimation was used to fit the items on each of the domains using Multilog.21 Plots of the trace lines, the expected score, and the information curve were made for each item, facilitating comparisons across informant, age, and language. When local dependence was suspected, parameters for that domain were estimated with and without the problematic items, and comparisons were made. Support for local dependence was found when the item slopes were substantially different between the model that included the locally dependent items and the model in which the potential dependencies were removed.

RESULTS

Categorical Confirmatory Factor Analysis

Using 8 samples and 3 estimation methods (in 2 computer programs), we made a number of comparisons on each domain to assess the fit of the model and to gain confidence in the results. The results were compared across estimation methods to identify the method that produced the most consistent, reasonable results. Then the results obtained with the chosen method were used to assess IRT model assumptions. The Physical Functioning domain presents a clear example of what CCFA results look like when a scale meets the assumptions of IRT.
Results for ULS, DWLS, and WLS estimation methods for this domain (item 5 omitted) are compared in Table 1 (factor loadings, standard errors, and χ² statistics) and Table 2 (modification indices) using data from the English child self-report sample. Factor loadings are similar across estimation methods, though they increase slightly from ULS to DWLS (or RWLSM/V) to WLS. It cannot be determined using empirical data whether the ULS estimates are deflated or the WLS estimates are inflated. The fit indices provide a relatively consistent picture of omnibus fit across the various estimation methods, although some caution must be taken in deciding which fit indices to consider (see note for Table 1). Here, χ² fit indices are presented because they are important in interpreting modification indices. However, as with this example, χ² statistics tend to indicate significance when the sample is large,22 and alternative fit indices should also be considered. Modification indices were not entirely consistent across methods. Those obtained using ULS and DWLS (or RWLSM/V) are more erratic than those obtained using WLS (when comparing estimation methods across computer software). In fact, 2 of the modification indices obtained under ULS are impossibly large, as they are greater in magnitude than the omnibus χ². Theoretically, DWLS (or RWLSM/V)

TABLE 1. Factor Loadings (Standard Errors) and χ² Statistics From LISREL v8.54 for 3 Estimation Methods Using Items From the Physical Functioning Domain With English Child Self-Report Data

                    ULS          DWLS          WLS
Item 1           … (0.03)    0.80 (0.03)   0.81 (0.03)
Item 2           … (0.02)    0.84 (0.02)   0.86 (0.02)
Item 3           … (0.02)    0.87 (0.02)   0.89 (0.02)
Item 4           … (0.03)    0.68 (0.03)   0.71 (0.03)
Item 6           … (0.03)    0.59 (0.03)   0.60 (0.03)
Item 7           … (0.03)    0.71 (0.02)   0.76 (0.02)
Item 8           … (0.03)    0.71 (0.03)   0.77 (0.02)
C1 χ² (df = 14)     55           …             …
C2 χ² (df = 14)     …            …             …
C3 χ² (df = 14)     …            …             …
C4 χ² (df = 14)     …            …             …

Note: LISREL produces up to 4 different χ² values in the output.
C1 is the minimum fit function χ² and is the only measure of fit provided if WLS is used. C2 is the normal theory weighted least squares χ², which assumes normality in the measured variables. With categorical data, under ULS or DWLS estimation, C2 does not follow a χ² distribution and should not be used. C3 is the Satorra-Bentler scaled χ², and C4 is the χ² corrected for non-normality. Both are theoretically correct, the difference being that C3 does not require the inversion of the asymptotic covariance matrix.

TABLE 2. Modification Indices (Rounded to the Nearest Whole Number to Facilitate Comparisons) for Various Estimation Methods Using Items From the Physical Functioning Domain With English Child Self-Report Data

                        ULS   DWLS   WLS
Item 2 with item 1       …     …     …
Item 3 with item 1       …     …     …
Item 3 with item 2       …     …     …
Item 4 with item 1       …     …     …
Item 4 with item 2       …     …     …
Item 4 with item 3       …     …     …
Item 6 with item 1       …     …     …
Item 6 with item 2       …     …     …
Item 6 with item 3       …     …     …
Item 6 with item 4       …     …     …
Item 7 with item 1       …     …     …
Item 7 with item 2       …     …     …
Item 7 with item 3       …     …     …
Item 7 with item 4       …     …     …
Item 7 with item 6       …     …     …
Item 8 with item 1       …     …     …
Item 8 with item 2       …     …     …
Item 8 with item 3       …     …     …
Item 8 with item 4       …     …     …
Item 8 with item 6       …     …     …
Item 8 with item 7       …     …     …

Note: WLS was chosen as the appropriate estimation method for these data, and no modification index stands out as particularly large. An example of a noteworthy modification index would be that between items 2 and 3 or between items 7 and 8 under DWLS, and had that estimation method been chosen, an alternative model with these error covariances unconstrained would have been examined. However, both ULS and DWLS show modification indices that are larger than the χ² statistic for that method, so there is some doubt as to the usefulness of these indices in these cases.

and WLS should provide accurate estimates and fit statistics in this situation. It is somewhat disturbing to see these differences in the modification indices provided under each method. These findings were consistent across domain and sample. As WLS is the gold-standard estimator and seems to perform in a stable fashion, it was chosen as the method on which the evaluation of the domains is based. The factor loadings for the Physical Functioning domain were similar across samples. The suggestion for evaluating RMSEA values is that values less than 0.05 indicate close fit, values less than 0.08 indicate reasonable fit, and values greater than 0.10 indicate an unfavorable model.18 For the Physical Functioning domain, the RMSEA values ranged from 0.05 to 0.10, indicative of an acceptable model. This was supported by NNFIs between 0.95 and 0.98 and RMRs between 0.06 and Most modification indices were similar across forms. Items 7 ("I hurt or ache") and 8 ("I have low energy") had somewhat larger modification indices on the parent proxy-report forms than on the self-report forms (ie, these were between 10 and 94, whereas those for other items were between 0 and 20), indicating correlated error variances between these items for parents. These items were the only 2 that did not ask about activities that could be observed by a proxy reporter (eg, walking, lifting, doing chores), so it makes sense that these items are more related to each other on the proxy-report forms, given that they are not externally observable behaviors. The modification indices were not sufficiently large to conclude that the items are locally dependent; therefore, the Physical Functioning domain was considered to be essentially unidimensional with no local dependence. Thus, the 7 items on the Physical Functioning domain could be considered together in IRT analyses.
The School Functioning domain is an example of CCFA results that raise questions about the validity of the IRT assumptions. The factor loadings for the School Functioning domain, though not necessarily identical across samples, were large in magnitude for all forms. The modification indices were consistent across samples; items 4 ("I miss school because of not feeling well") and 5 ("I miss school to go to the doctor or hospital") showed very large modification indices on all 8 forms (ie, these were between 65 and 293, whereas those for other items were between 0 and 231). In contrast to the other items, which concern paying attention in class, forgetting things, and keeping up with schoolwork, these 2 items clearly have the concept of missing school in common. When these items were allowed to have correlated errors, the fit of the model improved substantially. For example, on the teen self-report form in Spanish, allowing these items to have correlated errors reduced the RMSEA from 0.16 to 0.02, the RMR from 0.14 to 0.02, and the largest modification index from 13 to 3, whereas this change increased the NNFI from 0.89 to 1.0. Because of the magnitude of the modification indices and the content of the items, items 4 and 5 were suspected of local dependence. Additionally, the modification indices for items 1 ("It is hard to pay attention in class") and 3 ("I have trouble keeping up with my schoolwork") were also elevated on the parent proxy-reports (eg, the modification index of 231 mentioned above). This is likely because these items concern behavior that the parent might not necessarily observe (ie, the parent is not at school with the child when these behaviors occur). Because the PROMIS project focuses on self-report measurement, the potential local dependence between these items for parent-report data was noted but was not of primary concern in these analyses.
Thus, IRT analyses could proceed on the 5 School Functioning items as long as the potential dependence between items 4 and 5 was further examined within an IRT model. The remaining domains, Emotional Functioning and Social Functioning, were also found to be unidimensional with no obvious local dependence and consistent across samples, and the planned IRT analyses seemed appropriate for the items within each domain. Finally, the parameters of a 4-factor model were estimated, and the results were consistent with those found in the unidimensional models. The factor loadings were large in magnitude, and the modification indices agreed with those found in the unidimensional models. Modification indices for items loading on the factors for other domains were all small, suggesting that the model would not be substantially improved by allowing items on 1 domain to load on the factor associated with another domain. The factors were highly correlated (correlations ranging from 0.75 to 0.88). The unidimensionality of each domain was supported by the results of the 4-factor model, which showed reasonably good fit. For example, the child self-report sample in Spanish produced an RMSEA of 0.05 and an NNFI of 0.95, along with a small RMR.

Item Response Theory

Based on the results of CCFA, the 4 domains were each modeled using a unidimensional IRT model. Because the possibility of local dependence between items 4 and 5 on the School Functioning domain was suggested by CCFA, this domain was modeled 3 times: with all 5 items, with item 4 removed, and with item 5 removed. The Emotional Functioning domain is a case with reasonable item and test characteristics and consistency across test forms. For example, Figure 1 shows item 2 ("I feel sad or blue") for both the self-report and proxy-report, child and teen forms in Spanish. The sets of trace lines are nearly coincident, and the information obtained using this domain is similar across forms.
The steepness of the trace lines (slopes range from 2.4 to 3.9, depending on the sample) indicates that the item is highly discriminating for examinees who are somewhat above average in their symptom severity, and an item with trace lines of this magnitude would be a desirable addition to an item bank. However, this item would not provide much information for examinees with good quality of life (ie, those to the left of 0 on the scale) because the trace lines suggest that they would be very likely to respond "never" (thresholds range from 0.2 to 3.3, depending on the sample). In other words, a CAT should select this item for someone indicating above-average severity, but this item would be less useful for someone indicating good quality of life. The parameters of the other items on the Emotional Functioning domain were consistent with item 2 (slopes range from 1.6 to 3.9, thresholds range from 0.9 to 4.0). All

FIGURE 1. Item 2 from the Emotional Functioning domain for Spanish child self-report (solid lines in left panel), Spanish child parent-proxy report (dashed lines in left panel), Spanish teen self-report (solid lines in right panel), and Spanish teen parent-proxy report (dashed lines in right panel).

5 items have desirable properties for the PROMIS item bank, though additional items should be written for measuring examinees with good quality of life. In contrast, the School Functioning domain contains 2 items with relatively poor characteristics. The results of CCFA suggested that item 4 ("I miss school because of not feeling well") and item 5 ("I miss school to go to the doctor or hospital") might be locally dependent. For this domain, the parameters of the IRT model were estimated in 3 ways, once with all 5 items (solid lines in Fig. 2) and then removing either item 4 or item 5 (dashed lines in Fig. 2). Often an indicator of local dependence is that when both items are included, their slopes may be high whereas those for the other items are lower.6 When either item from a locally dependent pair is removed, the slope of the item remaining from the pair decreases and the slopes of the other items increase. However, this pattern was not found here because item 4 and item 5 are poorly related to the outcome of interest (slopes range from 0.7 to 1.2). Regardless of their inclusion in or exclusion from the model, the parameter estimates for these items and the other items on the domain did not change. Despite the excess relationship between items 4 and 5, including both on the domain did not damage the fit of the model, though they do not add much information for scoring (bottom panels of Fig. 2). This is a case in which a violation of IRT assumptions, specifically multidimensionality, must be investigated but may be ignored if it turns out to have no impact on measurement.
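The contrast between well-discriminating items and flat items such as School Functioning items 4 and 5 can be quantified with the 2PL information function, I(θ) = a²P(θ)[1 − P(θ)], used here as a simplified dichotomous stand-in for the graded items; the slopes below are hypothetical but chosen to mirror the slope ranges reported in the text.

```python
import numpy as np

def info_2pl(theta, a, b):
    """2PL item information, I(theta) = a^2 * P(theta) * (1 - P(theta));
    it peaks at theta = b with maximum a^2 / 4."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

theta = np.linspace(-3.0, 3.0, 601)
steep = info_2pl(theta, a=2.5, b=0.5)   # like a well-discriminating item
flat = info_2pl(theta, a=0.9, b=0.5)    # like School items 4-5 (slopes 0.7-1.2)

# The flat item's information curve is low everywhere: its peak is lower
# than the steep item's by a factor of (2.5 / 0.9)^2, about 7.7.
print(round(steep.max() / flat.max(), 2))   # → 7.72
```

Because information is additive across locally independent items, an item with a near-flat curve contributes almost nothing to score precision wherever it is administered, which is why such items are unlikely CAT selections.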
Still, because of the limited information available from the items, they would add little measurement value to an item bank and would be a poor choice for administration on a CAT. School Functioning items 4 and 5 discriminate poorly among examinees on the latent construct. For example, an examinee who responds in the second category ("almost never") is as likely to have good quality of life as she is to have poor quality of life. Because the trace lines are very flat, the information curve is also very low and flat, indicating that responses to these items provide little information about the examinee's overall quality of life. These curves are based on a sample of primarily healthy children, and it is possible that missing school because of not feeling well or to go to the doctor/hospital would be more related to school functioning for children with chronic diseases. In this population, the frequency of missing school may be more detrimental to school performance than is occasional school absence by healthy children. This possibility is examined in a study by Langer et al.23 It may be appropriate to calibrate all 5 items on the School Functioning domain together and include them in the PROMIS item bank, though items 4 and 5 would be unlikely selections for CAT administration. It is also possible that the properties of items 4 and 5 may change in a different population, and this should be considered when these items are calibrated using samples chosen for PROMIS. Knowing that a pattern of inflated and deflated slopes may occur when locally dependent items are included in IRT parameter estimation led to the discovery of local dependence that was not apparent in the CCFA. Based on the CCFA, all 7 items included from the Physical Functioning domain were modeled using IRT. However, the slopes for items 1 ("It is hard for me to walk more than 1 block"), 2 ("It is hard for me to run"), and 3 ("It is hard for me to do sports activity or

FIGURE 2. Item 4 (left panel) and item 5 (right panel) from the School Functioning domain for English teen self-report when all items are calibrated together (solid lines). Dashed lines are item 4 calibrated with items 1–3 (left panel) and item 5 calibrated with items 1–3 (right panel).

FIGURE 3. Item 1 (left panel), item 2 (middle panel), and item 3 (right panel) from the Physical Functioning domain for Spanish teen parent-proxy report when all items are calibrated together (solid lines). Dashed lines are item 1 calibrated with items 4 and 6–8 (left panel), item 2 calibrated with items 4 and 6–8 (middle panel), and item 3 calibrated with items 4 and 6–8 (right panel).

exercise") were found to be surprisingly large as compared with those of the other items in the model (slope values from around 4 to 6 vs. near 2, depending on the sample). These items seem to be concerned with performing an exercise or activity, whereas the other items refer to lifting, doing chores, having aches, and having low energy. The curves for these items from the Spanish teen parent-proxy report are presented as solid lines in Figure 3, and it is apparent that the items are highly related to the construct being measured, especially items 2 and 3, which have very steep curves. The large information values suggest that these items are very useful for measuring physical functioning. However, because they turn out to be locally dependent, they essentially turn the construct into a measure of exercise performance; their inclusion on the scale may narrow the scope of the scale. The evidence for this conclusion is shown as dashed lines in Figure 3, which are the curves for each of the 3 items as estimated with the other 2 items removed from the model. The difference between the solid lines and the dashed lines is striking; the curves become much flatter and the information

function is greatly reduced when the other items involved in the local dependence are removed (slopes range between 2 and 3). This reflects the fact that each of the 3 items is weakly related to the remaining 4 items on the scale but the 3 items are strongly related to each other. When 2 of the 3 items are removed, the remaining scale measures a broader construct (physical functioning); leaving the triplet in results in a scale that primarily measures exercise performance. If the slopes in the original model had not been carefully scrutinized, local dependence could have persisted in the scale, changing the construct being measured through IRT scoring. Although all 7 Physical Functioning items may be included together in an item bank, the parameters for items 1–3 would have to be calibrated using separate samples and linked to the same scale. Further, logic would have to be included in the CAT item selection algorithm to prohibit more than 1 of these 3 items from being administered to the same examinee.

DISCUSSION

IRT can be a powerful tool in health outcomes assessment and will play a crucial role in PROMIS's item banking and linear and adaptive test assembly.1 Assumptions of the IRT model must be checked before proceeding with item calibration. Without checking assumptions, the appropriateness of the model and the accuracy of the parameter estimates may be jeopardized. This article provides ways that CCFA and IRT analysis can be used in questionnaire development. CCFA facilitates identification of potential locally dependent items and evaluates domain dimensionality. Different estimation methods and even different software packages may produce different results, so it is important that the user be aware of the appropriateness of the estimation method for the particular type of data and be familiar with the software in use.
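The separate-calibration-and-linking step mentioned above can be sketched with a simple mean/sigma transformation, which estimates the linear rescaling between two calibrations' metrics from items common to both. All parameter values below are invented for illustration, and operational work would more likely use a characteristic-curve method such as Stocking-Lord:

```python
import numpy as np

# Hypothetical threshold estimates for anchor items calibrated in two samples
b_ref = np.array([-0.4, 0.3, 1.1, 1.9])   # reference-sample metric
b_new = np.array([-0.9, -0.2, 0.6, 1.4])  # same items, new-sample metric

# Mean/sigma linking: find A, B such that b_ref is approximately A * b_new + B
A = b_ref.std() / b_new.std()
B = b_ref.mean() - A * b_new.mean()

def to_reference_metric(a_new, b_new_item):
    """Place new-sample slope/threshold estimates on the reference metric."""
    return a_new / A, A * b_new_item + B

a_lnk, b_lnk = to_reference_metric(2.0, 0.1)
print(round(float(A), 3), round(float(B), 3), round(float(b_lnk), 3))
```

Once every item's parameters sit on one common metric, items calibrated in different samples can coexist in a single bank and be scored together.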
In this example, having 8 samples facilitated assumption checking because trends across samples could be observed. It is reassuring when the results across all samples agree, but confidence slips with mixed results. It is possible for a scale to be unidimensional in 1 sample and multidimensional in another, and for 2 items to be locally dependent in some but not all samples. Cases such as these would require separate item calibrations for different samples; therefore, it is best to check assumptions for all available samples before proceeding with IRT. It has been our experience with CCFA that when sample sizes are large, WLS, DWLS, and ULS all obtain similar factor loadings. When the sample is inadequate for use with WLS, factor loadings tend to disagree between WLS and DWLS. Researchers who are uncertain about which method to utilize should try both, using WLS when the factor loadings agree and DWLS when they disagree. However, even with large samples, we have found fit indices and modification indices to be inconsistent across methods. When DWLS is applied, these indices should be used with caution. IRT provides evidence about the information from each item response for inferences about the respondent's standing on the construct being measured. Item parameters can be used to inform scoring rules and to determine for which populations the scale is most appropriate. Item parameters are key in selecting items for administration to a particular examinee on a CAT such as PROMIS. Again, in our example, multiple samples facilitated a better understanding of the nature of the items. When discrepancies are found between item parameters for different samples, the item content and the sample characteristics often explain the results. Careful examination of the item parameters can also identify locally dependent items that are not revealed in CCFA.
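Reference 6 (Chen and Thissen) formalizes this kind of check with pairwise local dependence indices. The simulation below is an illustrative sketch of one such diagnostic, a Q3-style residual correlation: items 4 and 5 share an invented nuisance dimension, so their residuals from a unidimensional 2PL correlate while other pairs' do not. Every parameter here is made up, and for clarity the residuals condition on the simulated trait itself (a real analysis would condition on a trait estimate):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 4000

theta = rng.normal(size=N)       # target trait
nuisance = rng.normal(size=N)    # extra dimension shared by items 4 and 5

# Invented 2PL parameters: slope a, difficulty b, nuisance loading
a = np.array([2.0, 2.0, 2.0, 1.0, 1.0])
b = np.array([-0.5, 0.0, 0.5, 1.0, 1.2])
nload = np.array([0.0, 0.0, 0.0, 2.0, 2.0])   # the source of local dependence

p_true = 1 / (1 + np.exp(-(a * (theta[:, None] - b) + nload * nuisance[:, None])))
x = (rng.uniform(size=(N, 5)) < p_true).astype(int)

# Q3-style check: correlate residuals (observed response minus the
# probability implied by the unidimensional model) between item pairs
p_uni = 1 / (1 + np.exp(-a * (theta[:, None] - b)))
resid = x - p_uni
q3 = np.corrcoef(resid.T)
print(q3[3, 4].round(2))   # locally dependent pair: clearly positive
print(q3[0, 1].round(2))   # conditionally independent pair: near zero
```

A markedly positive residual correlation for one pair, against near-zero values elsewhere, is the item-level signature of local dependence that CCFA modification indices may or may not flag.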
Items with very high slopes can dominate the domain, in which case local dependence must be eliminated for the scale to take on its intended meaning. Graphics facilitate item evaluation because it is easy to inspect the curves to identify items that provide substantial information in the range of the latent construct that the test is designed to measure. When locally dependent items are identified, 1 or both items do not necessarily have to be excluded from the item bank. Often, 1 item will be informative for examinees in 1 area of the latent trait, and the other will be similarly informative in another region. Such items must be calibrated separately so that they do not take over the latent trait of the scale, and the parameters can subsequently be linked. CAT programmers would then need to specify that these items should not be administered to the same examinee. In this example, the domains were found to be essentially unidimensional, with some evidence of local dependence between some items. It is important that this local dependence be flagged before item calibration is finalized so that adjustments can be made to ensure that the slope parameters are on the metric of the construct of interest. All items seemed appropriate for inclusion in the PROMIS item bank, though some (eg, those with large slopes) are more likely to be selected by the CAT algorithm than others. The threshold parameters for these items were generally positive, indicating that these items provide the most information for examinees with moderate to poor quality of life. This knowledge can be used to write PROMIS items that are designed to fill in the measurement gaps along the continuum of the latent traits. With the recent interest in CAT in health outcomes research, CCFA and IRT will soon become staples in the health outcomes researcher's statistical tool bag.
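The rule that locally dependent items "should not be administered to the same examinee" can be encoded directly in the item-selection step of a CAT. This is a sketch with an invented five-item 2PL bank and maximum-information selection plus an enemy list; none of the parameters come from the article:

```python
import numpy as np

# Hypothetical 2PL item bank: slope a and difficulty b per item
a = np.array([1.8, 2.2, 1.5, 2.8, 2.6])
b = np.array([-1.0, 0.0, 0.5, 1.0, 1.1])
enemies = {3: {4}, 4: {3}}   # locally dependent pair: never give both

def info(theta):
    """2PL Fisher information for every item at trait value theta."""
    p = 1 / (1 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1 - p)

def next_item(theta_hat, administered):
    """Most informative not-yet-given item whose enemies were not given."""
    blocked = set(administered)
    for i in administered:
        blocked |= enemies.get(i, set())
    vals = info(theta_hat)
    candidates = [i for i in range(len(a)) if i not in blocked]
    return max(candidates, key=lambda i: vals[i])

print(next_item(1.0, administered=[]))    # item 3: steep, well targeted at 1.0
print(next_item(1.0, administered=[3]))   # item 4 blocked as item 3's enemy;
                                          # item 2 is next best
```

In an operational CAT the enemy constraint would sit alongside content-balancing and exposure-control rules, but the mechanism is the same: locally dependent items are simply removed from the candidate pool once their partner is given.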
Certainly, for the PROMIS project, CCFA and IRT will be invaluable for evaluating candidate items and assembling quality patient-reported outcomes instruments. We hope that this example emphasizes that these tools are powerful when used correctly but can produce unintended results when model assumptions are not verified.

ACKNOWLEDGMENTS

This work was funded by the National Institutes of Health through the NIH Roadmap for Medical Research. Information on this RFA (Dynamic Assessment of Patient-Reported Chronic Disease Outcomes) can be found at (clinicalresearch/overview-dynamicoutcomes.asp).

REFERENCES

1. Cella D, Yount S, Rothrock N, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap Cooperative Group during its first two years. Med Care. 2007;45(Suppl 1):S3–S11.
2. Hambleton RK. Emergence of item response modeling in instrument development and data analysis. Med Care. 2000;38:II-60–II-65.
3. Birnbaum A. Some latent trait models and their use in inferring an examinee's ability. In: Lord FM, Novick MR, eds. Statistical Theories of Mental Test Scores. Reading, MA: Addison-Wesley; 1968:397–479.
4. Samejima F. Estimation of Latent Ability Using a Response Pattern of Graded Scores. Iowa City, IA: Psychometric Society; 1969. Psychometric Monograph No. 17.
5. Samejima F. Graded response model. In: van der Linden WJ, Hambleton RK, eds. Handbook of Modern Item Response Theory. New York, NY: Springer-Verlag; 1997:85–100.
6. Chen W, Thissen D. Local dependence indexes for item pairs using item response theory. J Educ Behav Stat. 1997;22:265–289.
7. Varni JW, Seid M, Kurtin PS. The PedsQL 4.0: reliability and validity of the Pediatric Quality of Life Inventory Version 4.0 Generic Core Scales in healthy and patient populations. Med Care. 2001;39:800–812.
8. Reeve BB, Hays RD, Bjorner JB, et al. Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Med Care. 2007;45(Suppl 1):S22–S31.
9. Varni JW, Burwinkle TM, Seid M, et al. The PedsQL 4.0 as a pediatric population health measure: feasibility, reliability, and validity. Ambul Pediatr. 2003;3:329–341.
10. Oranje A. Comparison of estimation methods in factor analysis with categorized variables: applications to NAEP data. Paper presented at the Annual Meeting of the American Educational Research Association; Chicago, IL; April.
11. Flora DB, Curran PJ. An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data. Psychol Methods. 2004;9:466–491.
12. Jöreskog KG, Sörbom D. PRELIS 2 User's Reference Guide: A Program for Multivariate Data Screening and Data Summarization; A Preprocessor for LISREL. Chicago, IL: Scientific Software International; 1996.
13. Jöreskog KG, Sörbom D. LISREL 8: User's Reference Guide. Chicago, IL: Scientific Software International; 1996.
14. Muthén LK, Muthén BO. Mplus User's Guide. 3rd ed. Los Angeles, CA: Muthén & Muthén; 2004.
15. Jöreskog KG. New developments in LISREL: analysis of ordinal variables using polychoric correlations and weighted least squares. Qual Quantity. 1990;24:387–404.
16. Browne MW. Asymptotically distribution-free methods for the analysis of covariance structures. Br J Math Stat Psychol. 1984;37:62–83.
17. Muthén B. A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika. 1984;49:115–132.
18. Browne MW, Cudeck R. Alternative ways of assessing model fit. Sociol Methods Res. 1992;21:230–258.
19. Tucker LR, Lewis C. A reliability coefficient for maximum likelihood factor analysis. Psychometrika. 1973;38:1–10.
20. McDonald RP. Test Theory: A Unified Treatment. Mahwah, NJ: Lawrence Erlbaum Associates; 1999.
21. Thissen D, Chen W-H, Bock RD. Multilog (Version 7) [computer software]. Lincolnwood, IL: Scientific Software International; 2003.
22. Schumacker RE, Lomax RG. A Beginner's Guide to Structural Equation Modeling. Mahwah, NJ: Erlbaum.
23. Langer MM, Hill CD, Thissen D, et al. Detection and evaluation of differential item functioning using item response theory: an application to the Pediatric Quality of Life Inventory (PedsQL) 4.0 Generic Core Scales. J Clin Epidemiol. In press.


More information

SUPPLEMENTAL MATERIAL

SUPPLEMENTAL MATERIAL 1 SUPPLEMENTAL MATERIAL Response time and signal detection time distributions SM Fig. 1. Correct response time (thick solid green curve) and error response time densities (dashed red curve), averaged across

More information

The Development of Scales to Measure QISA s Three Guiding Principles of Student Aspirations Using the My Voice TM Survey

The Development of Scales to Measure QISA s Three Guiding Principles of Student Aspirations Using the My Voice TM Survey The Development of Scales to Measure QISA s Three Guiding Principles of Student Aspirations Using the My Voice TM Survey Matthew J. Bundick, Ph.D. Director of Research February 2011 The Development of

More information

The Use of Unidimensional Parameter Estimates of Multidimensional Items in Adaptive Testing

The Use of Unidimensional Parameter Estimates of Multidimensional Items in Adaptive Testing The Use of Unidimensional Parameter Estimates of Multidimensional Items in Adaptive Testing Terry A. Ackerman University of Illinois This study investigated the effect of using multidimensional items in

More information

The Effect of Review on Student Ability and Test Efficiency for Computerized Adaptive Tests

The Effect of Review on Student Ability and Test Efficiency for Computerized Adaptive Tests The Effect of Review on Student Ability and Test Efficiency for Computerized Adaptive Tests Mary E. Lunz and Betty A. Bergstrom, American Society of Clinical Pathologists Benjamin D. Wright, University

More information

Adaptive EAP Estimation of Ability

Adaptive EAP Estimation of Ability Adaptive EAP Estimation of Ability in a Microcomputer Environment R. Darrell Bock University of Chicago Robert J. Mislevy National Opinion Research Center Expected a posteriori (EAP) estimation of ability,

More information

Sensitivity of DFIT Tests of Measurement Invariance for Likert Data

Sensitivity of DFIT Tests of Measurement Invariance for Likert Data Meade, A. W. & Lautenschlager, G. J. (2005, April). Sensitivity of DFIT Tests of Measurement Invariance for Likert Data. Paper presented at the 20 th Annual Conference of the Society for Industrial and

More information

Impact of Violation of the Missing-at-Random Assumption on Full-Information Maximum Likelihood Method in Multidimensional Adaptive Testing

Impact of Violation of the Missing-at-Random Assumption on Full-Information Maximum Likelihood Method in Multidimensional Adaptive Testing A peer-reviewed electronic journal. Copyright is retained by the first or sole author, who grants right of first publication to Practical Assessment, Research & Evaluation. Permission is granted to distribute

More information

Section 5. Field Test Analyses

Section 5. Field Test Analyses Section 5. Field Test Analyses Following the receipt of the final scored file from Measurement Incorporated (MI), the field test analyses were completed. The analysis of the field test data can be broken

More information

The Use of Item Statistics in the Calibration of an Item Bank

The Use of Item Statistics in the Calibration of an Item Bank ~ - -., The Use of Item Statistics in the Calibration of an Item Bank Dato N. M. de Gruijter University of Leyden An IRT analysis based on p (proportion correct) and r (item-test correlation) is proposed

More information

Linking Assessments: Concept and History

Linking Assessments: Concept and History Linking Assessments: Concept and History Michael J. Kolen, University of Iowa In this article, the history of linking is summarized, and current linking frameworks that have been proposed are considered.

More information

Confirmatory Factor Analysis of the Group Environment Questionnaire With an Intercollegiate Sample

Confirmatory Factor Analysis of the Group Environment Questionnaire With an Intercollegiate Sample JOURNAL OF SPORT & EXERCISE PSYCHOLOGY, 19%. 18,49-63 O 1996 Human Kinetics Publishers, Inc. Confirmatory Factor Analysis of the Group Environment Questionnaire With an Intercollegiate Sample Fuzhong Li

More information

A Bayesian Nonparametric Model Fit statistic of Item Response Models

A Bayesian Nonparametric Model Fit statistic of Item Response Models A Bayesian Nonparametric Model Fit statistic of Item Response Models Purpose As more and more states move to use the computer adaptive test for their assessments, item response theory (IRT) has been widely

More information

Bayesian Tailored Testing and the Influence

Bayesian Tailored Testing and the Influence Bayesian Tailored Testing and the Influence of Item Bank Characteristics Carl J. Jensema Gallaudet College Owen s (1969) Bayesian tailored testing method is introduced along with a brief review of its

More information

Computerized Adaptive Testing for Classifying Examinees Into Three Categories

Computerized Adaptive Testing for Classifying Examinees Into Three Categories Measurement and Research Department Reports 96-3 Computerized Adaptive Testing for Classifying Examinees Into Three Categories T.J.H.M. Eggen G.J.J.M. Straetmans Measurement and Research Department Reports

More information

Item-Response-Theory Analysis of Two Scales for Self-Efficacy for Exercise Behavior in People With Arthritis

Item-Response-Theory Analysis of Two Scales for Self-Efficacy for Exercise Behavior in People With Arthritis Journal of Aging and Physical Activity, 2011, 19, 239-248 2011 Human Kinetics, Inc. Item-Response-Theory Analysis of Two Scales for Self-Efficacy for Exercise Behavior in People With Arthritis Thelma J.

More information

USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION

USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION Iweka Fidelis (Ph.D) Department of Educational Psychology, Guidance and Counselling, University of Port Harcourt,

More information

Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria

Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria Thakur Karkee Measurement Incorporated Dong-In Kim CTB/McGraw-Hill Kevin Fatica CTB/McGraw-Hill

More information

The Classification Accuracy of Measurement Decision Theory. Lawrence Rudner University of Maryland

The Classification Accuracy of Measurement Decision Theory. Lawrence Rudner University of Maryland Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, April 23-25, 2003 The Classification Accuracy of Measurement Decision Theory Lawrence Rudner University

More information

Sources of Comparability Between Probability Sample Estimates and Nonprobability Web Sample Estimates

Sources of Comparability Between Probability Sample Estimates and Nonprobability Web Sample Estimates Sources of Comparability Between Probability Sample Estimates and Nonprobability Web Sample Estimates William Riley 1, Ron D. Hays 2, Robert M. Kaplan 1, David Cella 3, 1 National Institutes of Health,

More information

Michael C. Edwards 1827 Neil Ave

Michael C. Edwards 1827 Neil Ave Michael C. Edwards 1827 Neil Ave 614.688.8030 Columbus, OH 43210 edwards.134@osu.edu http://faculty.psy.ohio-state.edu/edwards/ Educational Background 2005 PhD, Quantitative Psychology L.L. Thurstone Psychometric

More information

Why the Major Field Test in Business Does Not Report Subscores: Reliability and Construct Validity Evidence

Why the Major Field Test in Business Does Not Report Subscores: Reliability and Construct Validity Evidence Research Report ETS RR 12-11 Why the Major Field Test in Business Does Not Report Subscores: Reliability and Construct Validity Evidence Guangming Ling June 2012 Why the Major Field Test in Business Does

More information

Chapter 9. Youth Counseling Impact Scale (YCIS)

Chapter 9. Youth Counseling Impact Scale (YCIS) Chapter 9 Youth Counseling Impact Scale (YCIS) Background Purpose The Youth Counseling Impact Scale (YCIS) is a measure of perceived effectiveness of a specific counseling session. In general, measures

More information

PROSETTA STONE ANALYSIS REPORT A ROSETTA STONE FOR PATIENT REPORTED OUTCOMES

PROSETTA STONE ANALYSIS REPORT A ROSETTA STONE FOR PATIENT REPORTED OUTCOMES PROSETTA STONE ANALYSIS REPORT A ROSETTA STONE FOR PATIENT REPORTED OUTCOMES PROMIS V2.0 COGNITIVE FUNCTION AND FACT-COG PERCEIVED COGNITIVE IMPAIRMENT DAVID CELLA, BENJAMIN D. SCHALET, MICHAEL KALLEN,

More information

Analyzing Teacher Professional Standards as Latent Factors of Assessment Data: The Case of Teacher Test-English in Saudi Arabia

Analyzing Teacher Professional Standards as Latent Factors of Assessment Data: The Case of Teacher Test-English in Saudi Arabia Analyzing Teacher Professional Standards as Latent Factors of Assessment Data: The Case of Teacher Test-English in Saudi Arabia 1 Introduction The Teacher Test-English (TT-E) is administered by the NCA

More information

Detection Theory: Sensitivity and Response Bias

Detection Theory: Sensitivity and Response Bias Detection Theory: Sensitivity and Response Bias Lewis O. Harvey, Jr. Department of Psychology University of Colorado Boulder, Colorado The Brain (Observable) Stimulus System (Observable) Response System

More information

Maximum Marginal Likelihood Bifactor Analysis with Estimation of the General Dimension as an Empirical Histogram

Maximum Marginal Likelihood Bifactor Analysis with Estimation of the General Dimension as an Empirical Histogram Maximum Marginal Likelihood Bifactor Analysis with Estimation of the General Dimension as an Empirical Histogram Li Cai University of California, Los Angeles Carol Woods University of Kansas 1 Outline

More information

Basic concepts and principles of classical test theory

Basic concepts and principles of classical test theory Basic concepts and principles of classical test theory Jan-Eric Gustafsson What is measurement? Assignment of numbers to aspects of individuals according to some rule. The aspect which is measured must

More information

Issues That Should Not Be Overlooked in the Dominance Versus Ideal Point Controversy

Issues That Should Not Be Overlooked in the Dominance Versus Ideal Point Controversy Industrial and Organizational Psychology, 3 (2010), 489 493. Copyright 2010 Society for Industrial and Organizational Psychology. 1754-9426/10 Issues That Should Not Be Overlooked in the Dominance Versus

More information

Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys

Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Jill F. Kilanowski, PhD, APRN,CPNP Associate Professor Alpha Zeta & Mu Chi Acknowledgements Dr. Li Lin,

More information

Bruno D. Zumbo, Ph.D. University of Northern British Columbia

Bruno D. Zumbo, Ph.D. University of Northern British Columbia Bruno Zumbo 1 The Effect of DIF and Impact on Classical Test Statistics: Undetected DIF and Impact, and the Reliability and Interpretability of Scores from a Language Proficiency Test Bruno D. Zumbo, Ph.D.

More information

[3] Coombs, C.H., 1964, A theory of data, New York: Wiley.

[3] Coombs, C.H., 1964, A theory of data, New York: Wiley. Bibliography [1] Birnbaum, A., 1968, Some latent trait models and their use in inferring an examinee s ability, In F.M. Lord & M.R. Novick (Eds.), Statistical theories of mental test scores (pp. 397-479),

More information

PROSETTA STONE ANALYSIS REPORT A ROSETTA STONE FOR PATIENT REPORTED OUTCOMES PROMIS GLOBAL HEALTH-PHYSICAL AND VR-12- PHYSICAL

PROSETTA STONE ANALYSIS REPORT A ROSETTA STONE FOR PATIENT REPORTED OUTCOMES PROMIS GLOBAL HEALTH-PHYSICAL AND VR-12- PHYSICAL PROSETTA STONE ANALYSIS REPORT A ROSETTA STONE FOR PATIENT REPORTED OUTCOMES PROMIS GLOBAL HEALTH-PHYSICAL AND VR-12- PHYSICAL DAVID CELLA, BENJAMIN D. SCHALET, MICHAEL KALLEN, JIN-SHEI LAI, KARON F. COOK,

More information

Sample Sizes for Predictive Regression Models and Their Relationship to Correlation Coefficients

Sample Sizes for Predictive Regression Models and Their Relationship to Correlation Coefficients Sample Sizes for Predictive Regression Models and Their Relationship to Correlation Coefficients Gregory T. Knofczynski Abstract This article provides recommended minimum sample sizes for multiple linear

More information

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Greg Pope, Analytics and Psychometrics Manager 2008 Users Conference San Antonio Introduction and purpose of this session

More information

A General Procedure to Assess the Internal Structure of a Noncognitive Measure The Student360 Insight Program (S360) Time Management Scale

A General Procedure to Assess the Internal Structure of a Noncognitive Measure The Student360 Insight Program (S360) Time Management Scale Research Report ETS RR 11-42 A General Procedure to Assess the Internal Structure of a Noncognitive Measure The Student360 Insight Program (S360) Time Management Scale Guangming Ling Frank Rijmen October

More information

Description of components in tailored testing

Description of components in tailored testing Behavior Research Methods & Instrumentation 1977. Vol. 9 (2).153-157 Description of components in tailored testing WAYNE M. PATIENCE University ofmissouri, Columbia, Missouri 65201 The major purpose of

More information

To link to this article:

To link to this article: This article was downloaded by: [Vrije Universiteit Amsterdam] On: 06 March 2012, At: 19:03 Publisher: Psychology Press Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered

More information

Scaling TOWES and Linking to IALS

Scaling TOWES and Linking to IALS Scaling TOWES and Linking to IALS Kentaro Yamamoto and Irwin Kirsch March, 2002 In 2000, the Organization for Economic Cooperation and Development (OECD) along with Statistics Canada released Literacy

More information

Empowered by Psychometrics The Fundamentals of Psychometrics. Jim Wollack University of Wisconsin Madison

Empowered by Psychometrics The Fundamentals of Psychometrics. Jim Wollack University of Wisconsin Madison Empowered by Psychometrics The Fundamentals of Psychometrics Jim Wollack University of Wisconsin Madison Psycho-what? Psychometrics is the field of study concerned with the measurement of mental and psychological

More information

Measuring and Assessing Study Quality

Measuring and Assessing Study Quality Measuring and Assessing Study Quality Jeff Valentine, PhD Co-Chair, Campbell Collaboration Training Group & Associate Professor, College of Education and Human Development, University of Louisville Why

More information

Impact and adjustment of selection bias. in the assessment of measurement equivalence

Impact and adjustment of selection bias. in the assessment of measurement equivalence Impact and adjustment of selection bias in the assessment of measurement equivalence Thomas Klausch, Joop Hox,& Barry Schouten Working Paper, Utrecht, December 2012 Corresponding author: Thomas Klausch,

More information

The Relative Performance of Full Information Maximum Likelihood Estimation for Missing Data in Structural Equation Models

The Relative Performance of Full Information Maximum Likelihood Estimation for Missing Data in Structural Equation Models University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Educational Psychology Papers and Publications Educational Psychology, Department of 7-1-2001 The Relative Performance of

More information

By Hui Bian Office for Faculty Excellence

By Hui Bian Office for Faculty Excellence By Hui Bian Office for Faculty Excellence 1 Email: bianh@ecu.edu Phone: 328-5428 Location: 1001 Joyner Library, room 1006 Office hours: 8:00am-5:00pm, Monday-Friday 2 Educational tests and regular surveys

More information

Item Selection in Polytomous CAT

Item Selection in Polytomous CAT Item Selection in Polytomous CAT Bernard P. Veldkamp* Department of Educational Measurement and Data-Analysis, University of Twente, P.O.Box 217, 7500 AE Enschede, The etherlands 6XPPDU\,QSRO\WRPRXV&$7LWHPVFDQEHVHOHFWHGXVLQJ)LVKHU,QIRUPDWLRQ

More information

Empirical Formula for Creating Error Bars for the Method of Paired Comparison

Empirical Formula for Creating Error Bars for the Method of Paired Comparison Empirical Formula for Creating Error Bars for the Method of Paired Comparison Ethan D. Montag Rochester Institute of Technology Munsell Color Science Laboratory Chester F. Carlson Center for Imaging Science

More information

Computerized Adaptive Testing With the Bifactor Model

Computerized Adaptive Testing With the Bifactor Model Computerized Adaptive Testing With the Bifactor Model David J. Weiss University of Minnesota and Robert D. Gibbons Center for Health Statistics University of Illinois at Chicago Presented at the New CAT

More information

Construct Invariance of the Survey of Knowledge of Internet Risk and Internet Behavior Knowledge Scale

Construct Invariance of the Survey of Knowledge of Internet Risk and Internet Behavior Knowledge Scale University of Connecticut DigitalCommons@UConn NERA Conference Proceedings 2010 Northeastern Educational Research Association (NERA) Annual Conference Fall 10-20-2010 Construct Invariance of the Survey

More information

Turning Output of Item Response Theory Data Analysis into Graphs with R

Turning Output of Item Response Theory Data Analysis into Graphs with R Overview Turning Output of Item Response Theory Data Analysis into Graphs with R Motivation Importance of graphing data Graphical methods for item response theory Why R? Two examples Ching-Fan Sheu, Cheng-Te

More information

11/24/2017. Do not imply a cause-and-effect relationship

11/24/2017. Do not imply a cause-and-effect relationship Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are highly extraverted people less afraid of rejection

More information

Understanding and Applying Multilevel Models in Maternal and Child Health Epidemiology and Public Health

Understanding and Applying Multilevel Models in Maternal and Child Health Epidemiology and Public Health Understanding and Applying Multilevel Models in Maternal and Child Health Epidemiology and Public Health Adam C. Carle, M.A., Ph.D. adam.carle@cchmc.org Division of Health Policy and Clinical Effectiveness

More information