Difficulty and Discrimination Parameters of Boston Naming Test Items in a Consecutive Clinical Series


Archives of Clinical Neuropsychology 26 (2011)

Difficulty and Discrimination Parameters of Boston Naming Test Items in a Consecutive Clinical Series

Otto Pedraza*, Bonnie C. Sachs, Tanis J. Ferman, Beth K. Rush, John A. Lucas

Department of Psychiatry and Psychology, Mayo Clinic, Jacksonville, FL, USA

*Corresponding author at: Department of Psychiatry and Psychology, Mayo Clinic, Jacksonville, FL 32224, USA. E-mail address: otto.pedraza@mayo.edu (O. Pedraza).

Accepted 21 April 2011

Abstract

The Boston Naming Test is one of the most widely used neuropsychological instruments, yet there has been limited use of modern psychometric methods to investigate its properties at the item level. The current study used item response theory to examine each item's difficulty and discrimination properties, as well as the test's measurement precision across the range of naming ability. Participants included 300 consecutive referrals to the outpatient neuropsychology service at Mayo Clinic in Florida. Results showed that successive items do not necessarily reflect a monotonic increase in psychometric difficulty, some items are inadequate to distinguish individuals at various levels of naming ability, multiple items provide redundant psychometric information, and measurement precision is greatest for persons within a low-average range of ability. These findings may be used to develop short forms, improve reliability in future test versions by replacing psychometrically poor items, and analyze profiles of intra-individual variability.

Keywords: Boston Naming Test; Item response theory; Item difficulty; Item discriminability

Introduction

The Boston Naming Test (BNT) (Kaplan, Goodglass, & Weintraub, 1983) is the most frequently used instrument for the assessment of visual naming ability (Rabin, Barr, & Burton, 2005). Its validity and reliability are well established and reviewed elsewhere (Strauss, Sherman, & Spreen, 2006).
Briefly, internal consistency for the 60-item version ranges from r = .78 to .96 across studies. Test-retest stability in cognitively normal adults varies as a function of time interval and sample composition, but generally ranges from r = .59 to .92. Moreover, the BNT correlates highly (r = .76 to .86) with other naming tests, such as the Visual Naming Test from the Multilingual Aphasia Examination.

Although the psychometric properties of the BNT have been established at the global test level, few studies have used modern psychometric methods to evaluate the BNT at the item level (some studies have considered item characteristics at a descriptive level, e.g., Tombaugh & Hubley, 1997). This information can be helpful to develop new short forms, improve test reliability by replacing psychometrically poor items, analyze error patterns or profiles of intra-individual variability, or take into account regional or cultural influences on individual item responses. For instance, Graves, Bezeau, Fogarty, and Blair (2004) used a one-parameter (Rasch) model to analyze the difficulty of BNT items and develop a new short form. Items were excluded from the short form if they were too easy, failed to fit the Rasch model, or had poor loadings on the first component of a principal components analysis.

Item response theory (IRT) is a state-of-the-art measurement approach that uses examinees' item responses to simultaneously estimate each person's underlying (latent) ability and the characteristics of the test items used to measure that ability (Embretson & Reise, 2000; Hambleton & Swaminathan, 1985; Hambleton, Swaminathan, & Rogers, 1991). In this framework, a person's ability level is considered a function of the pattern of unique item responses as well as the parametric properties of the test items. It thus becomes possible to estimate an item's discrimination (a), or the degree to which the item distinguishes persons with higher ability from those with lower ability, and its difficulty (b), the point on the ability scale at which a person has a 50% chance of responding correctly to the item. Models that estimate both item discrimination and difficulty parameters are well suited for the investigation of cognitive tests and abilities (Teresi, 2006).

© The Author. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com. doi: /arclin/acr042. Advance Access publication on 18 May 2011.

In IRT, item characteristic curves (ICCs) trace the probability of a correct item response as a function of the underlying ability construct, and can be thought of as the regression of an item score on the person's latent ability. Item difficulty is depicted by the location along the x-coordinate where the probability of a correct response for a binary item is 50%, and item discrimination is represented by the slope of the trace line at that location. For instance, Fig. 1 depicts a theoretical test item with a difficulty parameter equal to zero; in this case, a person with average ability has a 50% chance of responding correctly to the item. In contrast, Fig. 2 depicts a theoretical item with a lower difficulty parameter. Because a lesser degree of latent ability is required to obtain a 50% chance of responding correctly, the item in Fig. 2 is considered less difficult than that in Fig. 1. Note also the differences in the discrimination parameters between the two items: the steeper slope (i.e., higher discrimination) for the item in Fig. 2 indicates that it is better at distinguishing persons within a very narrow range of ability. When item discrimination is zero, every person has an equal probability of providing a correct response; in this case, the ICC is flat and the item should be flagged for deletion or replacement from the pool of test items. An advantage of IRT over classical test theory methods is that reliability is not constrained to a single coefficient, but instead can be measured continuously over the entire ability spectrum.
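Under the two-parameter logistic (2PL) model described above, the ICC is P(theta) = 1 / (1 + exp(-a(theta - b))). As a minimal illustration of the difficulty and discrimination parameters (a generic sketch, not the code used in the study, which fit its models in MULTILOG):

```python
import math

def icc(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic ICC: probability of a correct response
    at latent ability theta, given discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# The theoretical item in Fig. 1: a = 2.0, b = 0.
# A person of average ability (theta = 0) has a 50% chance:
print(icc(0.0, a=2.0, b=0.0))   # 0.5

# A steeper slope (higher a) discriminates more sharply near b:
print(icc(0.5, a=2.0, b=0.0))   # ~0.73
print(icc(0.5, a=3.0, b=0.0))   # ~0.82
```

The difficulty parameter b is the point where the curve crosses 0.5; shifting b left (more negative) makes the item easier, as less latent ability is needed for a 50% chance of success.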
Reliability in IRT is equivalent to the concept of information and is inversely related to the standard error of measurement (Embretson & Reise, 2000). Item, and hence test, information is maximized by higher discrimination parameters and an adequate match between item difficulty and a person's ability level. A further attractive property of IRT models is that item information can be summed to yield a global test information function, which represents the degree of precision for the test at each level of the latent ability.

Recently, Pedraza and colleagues (2009) used an IRT approach to evaluate the differential response pattern for BNT items in cognitively normal Caucasian and African American adults. Results showed that successive BNT items do not necessarily reflect an increase in psychometric difficulty, many items do not discriminate persons with low versus high naming ability, and a subset of items demonstrates comparable difficulty or discrimination properties, suggesting that these items may be psychometrically redundant. In addition, the BNT showed the greatest measurement precision for individuals with mild naming difficulty. The current study extends Pedraza and colleagues (2009) by investigating the item-level properties of the BNT in a prospective series of adult patients with a broad range of naming ability.

Fig. 1. Theoretical item with discrimination (a) = 2.0 and difficulty (b) = 0.

Fig. 2. Theoretical item with discrimination (a) = 3.0 and a lower difficulty (b).

Method

Participants

Study participants included 300 consecutive referrals to the outpatient clinical neuropsychology service at Mayo Clinic in Florida. Patients were referred predominantly by the Departments of Neurology and Neurosurgery (65%) and Internal Medicine and its subspecialties (19%). Approximately half of the patients were referred for dementia evaluations, with the remainder including epilepsy, normal pressure hydrocephalus, depression, poststroke status, and other medical and neurologic conditions. All patients were evaluated between July 2009 and January. Only those patients whose primary language was English were considered for inclusion. All data were obtained in full compliance with a study protocol approved by the Mayo Clinic Institutional Review Board.

Materials

The BNT was administered in ascending order, beginning with item 1 and proceeding through item 60. Items were scored as correct or incorrect following standardized instructions (Kaplan et al., 1983). For the purposes of the current investigation, the BNT total raw score represents the sum of all correct items regardless of basal or discontinuation rules. A separate score using basal and discontinuation rules was recorded for the purposes of the clinical examination and will not be considered in this study.

Statistical Analyses

A fundamental assumption in IRT is that the set of test items should measure a single dimension or construct. The dimensionality of the BNT was evaluated using multiple approaches. First, internal consistency was examined using Cronbach's alpha coefficient. Although adequate internal consistency (i.e., alpha > 0.70) does not preclude the presence of multiple constructs, it represents a necessary but insufficient component of unidimensionality and is considered in that context (Gardner, 1995; Schmitt, 1996). Second, an exploratory factor analysis was performed using unweighted least squares extraction, followed by confirmatory factor analysis (CFA) in LISREL (Jöreskog & Sörbom, 1997, 2006) on the tetrachoric covariance matrix using an asymptotic distribution-free (ADF) estimator. A limitation of ADF estimators, however, is that substantially large sample sizes are necessary to generate admissible solutions (Boomsma & Hoogland, 2001). Non-admissible solutions can result from parameter estimates that fail to converge after multiple iterations or from negative variance estimates due to sampling fluctuations. Given our sample size, as well as our prior experience of non-admissible solutions (Pedraza et al., 2009), robust maximum-likelihood estimation was also considered; the asymptotic covariance matrix was generated using PRELIS 2.0. Model fit was evaluated with the comparative fit index (CFI; values > 0.90 indicate better fit) and the root-mean-square error of approximation (RMSEA; values < 0.10 indicate better fit), as well as the Satorra-Bentler scaled chi-square statistic for the robust model (Satorra & Bentler, 1988). Lastly, unidimensionality was evaluated further using DIMTEST 2.0, a non-parametric conditional covariance-based test (Nandakumar & Stout, 1993; Stout, 1987; Stout, Froelich, & Gao, 2001).

Item difficulty and discriminability parameters, standard errors, and summary statistics were obtained using marginal maximum-likelihood estimation in MULTILOG (Thissen, 2003). The characteristic curves for each item were plotted for visual inspection, and the overall test information was calculated to measure reliability across the range of naming ability.
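The first dimensionality check, Cronbach's alpha over dichotomously scored (0/1) items, can be sketched in a few lines. The response matrix below is an invented toy example, not the study's data:

```python
# Cronbach's alpha for dichotomous (0/1) item scores.
# Rows are examinees, columns are items.

def cronbach_alpha(responses: list[list[int]]) -> float:
    k = len(responses[0])                 # number of items
    n = len(responses)                    # number of examinees

    def variance(xs):                     # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(cronbach_alpha(data))   # 0.8 for this toy matrix
```

For binary items this formula coincides with KR-20; as noted above, a high alpha is necessary but not sufficient evidence of unidimensionality, which is why the study also ran factor analyses and DIMTEST.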

Table 1. Demographic characteristics and BNT data for 300 patients (mean, SD, and range for age, education, and BNT total score; sex: 53.0% men). Note: BNT = Boston Naming Test.

Fig. 3. Mean percent correct item responses on the BNT.

Results

Demographic characteristics and mean BNT data are presented in Table 1. Participants ranged in age from 22 to 92 years, and the majority were Caucasian (>95%). BNT scores were significantly correlated with age (r = -.21, p < .001) and years of education (r = .28, p < .001), but not with sex (r = -.10, p = .10). As expected, internal consistency was high (alpha = 0.91). Exploratory factor analysis revealed a 5.3:1 ratio between the first and second eigenvalues. A single-factor CFA using ADF estimators returned non-admissible solutions, but robust maximum-likelihood estimation yielded a well-fitting single-factor model (CFI = 0.97; RMSEA = ; Satorra-Bentler scaled chi-square = , p < .001). A two-factor model did not result in improved fit. Moreover, the result from DIMTEST (T statistic = 0.99, p = .16) was consistent with the prior dimensionality assessments. Altogether, these findings suggest that the BNT was sufficiently unidimensional to proceed with IRT modeling.

BNT total scores ranged from 22 to 60. As shown in Fig. 3, all participants responded correctly to four items (BNT item numbers denoted in parentheses): bed (1), tree (2), toothbrush (10), and hanger (15). Protractor (59) had the fewest correct responses (19%). The graph in Fig. 3 also illustrates multiple dips, or points at which there is a prominent decline in the percent of correct responses for consecutive items. For example, 92% of participants responded correctly to wreath (28) and 88% responded correctly to harmonica (30), but only 63% responded correctly to beaver (29). Similarly, 81% responded correctly to asparagus (49), yet only 41% responded correctly to the following item, compass (50).
Table 2 presents the IRT item discrimination and difficulty parameters. As expected, there was no variance associated with the four items with 100% correct responses. The standard errors for items with highly skewed response patterns (e.g., scissors, broom) could not be defined under maximum-likelihood estimation. Protractor (59) had a negative, near-zero discrimination parameter, suggesting that it was a poor item that yielded minimal psychometric information for the IRT model.

Table 2. Item discrimination (a) and difficulty (b) parameters, with standard errors, for the BNT (numeric estimates not reproduced here). Items in administration order: 1. Bed; 2. Tree; 3. Pencil; 4. House; 5. Whistle; 6. Scissors; 7. Comb; 8. Flower; 9. Saw; 10. Toothbrush; 11. Helicopter; 12. Broom; 13. Octopus; 14. Mushroom; 15. Hanger; 16. Wheelchair; 17. Camel; 18. Mask; 19. Pretzel; 20. Bench; 21. Racquet; 22. Snail; 23. Volcano; 24. Seahorse; 25. Dart; 26. Canoe; 27. Globe; 28. Wreath; 29. Beaver; 30. Harmonica; 31. Rhinoceros; 32. Acorn; 33. Igloo; 34. Stilts; 35. Dominoes; 36. Cactus; 37. Escalator; 38. Harp; 39. Hammock; 40. Knocker; 41. Pelican; 42. Stethoscope; 43. Pyramid; 44. Muzzle; 45. Unicorn; 46. Funnel; 47. Accordion; 48. Noose; 49. Asparagus; 50. Compass; 51. Latch; 52. Tripod; 53. Scroll; 54. Tongs; 55. Sphinx; 56. Yoke; 57. Trellis; 58. Palette; 59. Protractor; 60. Abacus. Note: BNT = Boston Naming Test.

Fig. 4. Matrix of ICCs for the BNT (Note: ICCs not available for items 1, 2, 10, and 15).

Among the remaining items, comb (7) showed the highest magnitude of discrimination, followed by racquet (21), saw (9), canoe (26), and wheelchair (16). The least discriminating items were flower (8), scissors (6), latch (51), yoke (56), and trellis (57). These findings are most clearly visualized in Fig. 4, where the items with the highest degree of discrimination show the steepest slopes and those with the lowest discrimination have relatively flat slopes.

Fig. 4. Continued.

In terms of difficulty, abacus (60) exhibited the highest parameter, followed by compass (50), yoke (56), palette (58), and sphinx (55). As noted earlier, although 81% of participants responded incorrectly to protractor (59), its IRT difficulty parameter could not be properly estimated because the likelihood of responding correctly was nearly equal for individuals at any point along the ability spectrum. Besides the four items to which all participants responded correctly, the next five easiest items were flower (8), scissors (6), broom (12), camel (17), and house (4). Several items had difficulty parameters that suggested a notable discrepancy from their ordered placement on the test. For instance, acorn (32) was the 19th easiest item and harp (38) the 22nd easiest item. In contrast, octopus (13) was the 36th easiest item and seahorse (24) the 48th easiest item. These results highlight the lack of monotonic increase in psychometric difficulty among successive items.

Fig. 4. Continued.

Figure 5 displays the global test information curve. The BNT provided the most information (reliability) for individuals in the low-average range of naming ability, at approximately -1.0 standardized units. Measurement error increased considerably when assessing individuals with at least a high-average degree of naming ability.

Discussion

The present study explored the item-level psychometric properties of the BNT in a clinical outpatient sample; the findings suggest the following. First, each successive BNT item does not necessarily confer a stepwise increase in psychometric difficulty. Easier items generally do group together in the first half of the test and harder items in the second half, but there is marked variability in difficulty levels within smaller clusters of successive items. Second, some BNT items do not discriminate well between individuals at close levels of naming ability, and a few items are simply inadequate to make such distinctions.

Fig. 5. Test information and standard error curves for the BNT.

For instance, scissors, flower, and protractor do not discriminate well within any range of naming ability, and this lack of discrimination is independent of their difficulty level. Third, a subset of items exhibits a comparable degree of difficulty or discrimination, which suggests that these items provide redundant psychometric information. For instance, there is a high degree of redundancy between the following item pairs: octopus and asparagus, racquet and canoe, wreath and harp, and igloo and volcano. Excluding one item from each of these pairs could be expected to result in negligible psychometric loss, which may be helpful for the future derivation of shorter naming tasks from BNT items without a loss of discrimination characteristics. Fourth, the BNT yields the highest degree of measurement precision near the low-average to mildly impaired range of naming ability (i.e., between -1.0 and -1.5 standardized units). Measurement precision remains acceptable in the moderate range of impairment, but declines markedly in the high-average to above-average ability range, likely due to the test's measurement ceiling. In practical terms, this suggests that the BNT is most precise for adults who present to an outpatient clinical practice with an early or mild naming deficit. These findings in a neurologic and medical outpatient sample are consistent with those reported by Pedraza and colleagues (2009) among cognitively normal older adults. This detailed item-level psychometric information may be useful to supplement the test's total score and more clearly delineate the nature of a patient's naming deficit.
A briefer short form could be created empirically for a clinical trial by selecting highly discriminating items located at equidistant intervals along the entire range of difficulty and retaining only those without differential item functioning. For instance, a brief 10-item form could include the following items, ordered from easiest to hardest: saw, comb, mushroom, racquet, harmonica, pyramid, seahorse, beaver, sphinx, and abacus. These items demonstrate relatively equidistant difficulty parameters, relatively high discriminability, and no differential item functioning between Caucasian and African American adults. These data could also be used to construct alternate test forms in which items have equivalent difficulty and discrimination, which may be helpful in rehabilitation or other settings requiring repeated evaluations. Moreover, examining a person's pattern of BNT item responses as part of a forensic examination could yield a symptom validity index if the person makes a disproportionate number of errors on psychometrically easy items yet responds correctly to difficult items.

A few limitations are worth noting. First, although a key advantage of IRT over classical test theory is that item parameters are invariant across populations, this property holds only when the range of the sampled ability is maximized. Participants in this study obtained BNT total scores ranging from 22 to 60, and only 10 participants had scores between 22 and 29. Thus, the findings may not generalize to patients with acute, language-dominant hemisphere stroke or advanced semantic dementia, who may be expected to make a substantially greater number of errors on the BNT. Also, the clinical sample in this study has limited ethnic minority representation, and our past findings from cognitively normal adults suggest that slight differences in item parameters exist between ethnic groups (Pedraza et al., 2009).
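The short-form heuristic described above (pick highly discriminating items whose difficulties sit at roughly equidistant points across the difficulty range) can be sketched as a small greedy selection. The (a, b) values below are placeholder numbers for illustration, not the study's estimates:

```python
# Greedy short-form selection: for each of n_items equally spaced target
# difficulties, take the remaining item whose difficulty is closest to
# the target (ties broken toward higher discrimination).

def pick_short_form(params, n_items):
    """params: dict name -> (a, b); returns item names, easiest first."""
    lo = min(b for _, b in params.values())
    hi = max(b for _, b in params.values())
    step = (hi - lo) / (n_items - 1)
    targets = [lo + i * step for i in range(n_items)]
    remaining = dict(params)
    chosen = []
    for t in targets:
        name = min(remaining, key=lambda k: (abs(remaining[k][1] - t),
                                             -remaining[k][0]))
        chosen.append(name)
        del remaining[name]
    return sorted(chosen, key=lambda k: params[k][1])

demo = {"saw": (2.2, -2.8), "comb": (2.6, -2.5), "racquet": (2.3, -1.2),
        "pyramid": (1.9, 0.1), "sphinx": (1.7, 1.4), "abacus": (1.6, 2.3)}
print(pick_short_form(demo, 4))   # → ['saw', 'racquet', 'pyramid', 'abacus']
```

A production version would additionally filter out items flagged for differential item functioning before the selection step, as the text proposes.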
Although these results demonstrate a lack of incremental or monotonic difficulty among ordered items, the extent to which this factor may contribute to variation in total test scores under standard basal and discontinuation criteria is unknown. It seems reasonable to assume, however, that such an effect may be negligible because normative values (e.g., MOANS/MOAANS age-scaled scores; Heaton T-scores) generally comprise a range of raw scores rather than a single score. Lastly, it bears noting that these results do not negate the utility of error pattern analyses as originally intended by the BNT authors. In summary, these results offer additional information regarding the psychometric properties of the BNT that may be useful in clinical practice, research, and future test development or refinement.

Funding

This work was supported by the National Institutes of Health (NS to O.P.).

Conflict of Interest

None declared.

Acknowledgements

We would like to thank Dan Mungas, Ph.D., for helpful comments on an earlier portion of the manuscript. We also extend our gratitude to our wonderful team of psychometrists: Diana Achem, Cameron Griffin, Ashley Marshall, Jill McBride, Wendy Mercer, and Sonya Prescott.

References

Boomsma, A., & Hoogland, J. J. (2001). The robustness of LISREL modeling revisited. In R. Cudeck, S. du Toit, & D. Sörbom (Eds.), Structural equation models: Present and future. Lincolnwood, IL: Scientific Software International.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates.
Gardner, P. L. (1995). Measuring attitudes to science: Unidimensionality and internal consistency revisited. Research in Science Education, 25(3).
Graves, R. E., Bezeau, S. C., Fogarty, J., & Blair, R. (2004). Boston Naming Test short forms: A comparison of previous forms with new item response theory based forms. Journal of Clinical and Experimental Neuropsychology, 26(7).
Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston: Kluwer-Nijhoff Publishing.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage Publications.
Jöreskog, K. G., & Sörbom, D. (1997). LISREL 8: User's reference guide (2nd ed.). Chicago, IL: Scientific Software International.
Jöreskog, K. G., & Sörbom, D. (2006). LISREL. Chicago, IL: Scientific Software International.
Kaplan, E., Goodglass, H., & Weintraub, S. (1983). The Boston Naming Test. Philadelphia: Lea & Febiger.
Nandakumar, R., & Stout, W. (1993). Refinements of Stout's procedure for assessing latent trait unidimensionality. Journal of Educational Statistics, 18.
Pedraza, O., Graff-Radford, N. R., Smith, G. E., Ivnik, R. J., Willis, F. B., Petersen, R. C., et al. (2009). Differential item functioning of the Boston Naming Test in cognitively normal African American and Caucasian older adults. Journal of the International Neuropsychological Society, 15(5).
Rabin, L. A., Barr, W. B., & Burton, L. A. (2005). Assessment practices of clinical neuropsychologists in the United States and Canada: A survey of INS, NAN, and APA Division 40 members. Archives of Clinical Neuropsychology, 20(1).
Satorra, A., & Bentler, P. M. (1988). Scaling corrections for chi-square statistics in covariance structure analysis. In American Statistical Association 1988 proceedings of the business and economics section. Alexandria, VA: American Statistical Association.
Schmitt, N. (1996). Uses and abuses of coefficient alpha. Psychological Assessment, 8(4).
Stout, W. (1987). A nonparametric approach for assessing latent trait unidimensionality. Psychometrika, 52(4).
Stout, W., Froelich, A., & Gao, F. (2001). Using resampling methods to produce an improved DIMTEST procedure. In A. Boomsma, M. A. J. van Duijn, & T. A. B. Snijders (Eds.), Essays on item response theory. New York: Springer-Verlag.
Strauss, E., Sherman, E. M. S., & Spreen, O. (2006). A compendium of neuropsychological tests: Administration, norms, and commentary (3rd ed.). New York: Oxford University Press.
Teresi, J. A. (2006). Different approaches to differential item functioning in health applications: Advantages, disadvantages and some neglected topics. Medical Care, 44(11 Suppl. 3), S152-S170.
Thissen, D. (2003). MULTILOG 7.0: Multiple, categorical item analysis and test scoring using item response theory. Chicago: Scientific Software International.
Tombaugh, T. N., & Hubley, A. M. (1997). The 60-item Boston Naming Test: Norms for cognitively intact adults aged 25 to 88 years. Journal of Clinical and Experimental Neuropsychology, 19(6).


More information

André Cyr and Alexander Davies

André Cyr and Alexander Davies Item Response Theory and Latent variable modeling for surveys with complex sampling design The case of the National Longitudinal Survey of Children and Youth in Canada Background André Cyr and Alexander

More information

Item Response Theory: Methods for the Analysis of Discrete Survey Response Data

Item Response Theory: Methods for the Analysis of Discrete Survey Response Data Item Response Theory: Methods for the Analysis of Discrete Survey Response Data ICPSR Summer Workshop at the University of Michigan June 29, 2015 July 3, 2015 Presented by: Dr. Jonathan Templin Department

More information

Running head: NESTED FACTOR ANALYTIC MODEL COMPARISON 1. John M. Clark III. Pearson. Author Note

Running head: NESTED FACTOR ANALYTIC MODEL COMPARISON 1. John M. Clark III. Pearson. Author Note Running head: NESTED FACTOR ANALYTIC MODEL COMPARISON 1 Nested Factor Analytic Model Comparison as a Means to Detect Aberrant Response Patterns John M. Clark III Pearson Author Note John M. Clark III,

More information

ITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION SCALE

ITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION SCALE California State University, San Bernardino CSUSB ScholarWorks Electronic Theses, Projects, and Dissertations Office of Graduate Studies 6-2016 ITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION

More information

Item Analysis: Classical and Beyond

Item Analysis: Classical and Beyond Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Modified for EPE/EDP 711 by Kelly Bradley on January 8, 2013 Why is item analysis relevant? Item analysis provides

More information

Empowered by Psychometrics The Fundamentals of Psychometrics. Jim Wollack University of Wisconsin Madison

Empowered by Psychometrics The Fundamentals of Psychometrics. Jim Wollack University of Wisconsin Madison Empowered by Psychometrics The Fundamentals of Psychometrics Jim Wollack University of Wisconsin Madison Psycho-what? Psychometrics is the field of study concerned with the measurement of mental and psychological

More information

On the Performance of Maximum Likelihood Versus Means and Variance Adjusted Weighted Least Squares Estimation in CFA

On the Performance of Maximum Likelihood Versus Means and Variance Adjusted Weighted Least Squares Estimation in CFA STRUCTURAL EQUATION MODELING, 13(2), 186 203 Copyright 2006, Lawrence Erlbaum Associates, Inc. On the Performance of Maximum Likelihood Versus Means and Variance Adjusted Weighted Least Squares Estimation

More information

Research and Evaluation Methodology Program, School of Human Development and Organizational Studies in Education, University of Florida

Research and Evaluation Methodology Program, School of Human Development and Organizational Studies in Education, University of Florida Vol. 2 (1), pp. 22-39, Jan, 2015 http://www.ijate.net e-issn: 2148-7456 IJATE A Comparison of Logistic Regression Models for Dif Detection in Polytomous Items: The Effect of Small Sample Sizes and Non-Normality

More information

Influences of IRT Item Attributes on Angoff Rater Judgments

Influences of IRT Item Attributes on Angoff Rater Judgments Influences of IRT Item Attributes on Angoff Rater Judgments Christian Jones, M.A. CPS Human Resource Services Greg Hurt!, Ph.D. CSUS, Sacramento Angoff Method Assemble a panel of subject matter experts

More information

GENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS

GENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS GENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS Michael J. Kolen The University of Iowa March 2011 Commissioned by the Center for K 12 Assessment & Performance Management at

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 39 Evaluation of Comparability of Scores and Passing Decisions for Different Item Pools of Computerized Adaptive Examinations

More information

The Patient-Reported Outcomes Measurement Information

The Patient-Reported Outcomes Measurement Information ORIGINAL ARTICLE Practical Issues in the Application of Item Response Theory A Demonstration Using Items From the Pediatric Quality of Life Inventory (PedsQL) 4.0 Generic Core Scales Cheryl D. Hill, PhD,*

More information

Techniques for Explaining Item Response Theory to Stakeholder

Techniques for Explaining Item Response Theory to Stakeholder Techniques for Explaining Item Response Theory to Stakeholder Kate DeRoche Antonio Olmos C.J. Mckinney Mental Health Center of Denver Presented on March 23, 2007 at the Eastern Evaluation Research Society

More information

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Greg Pope, Analytics and Psychometrics Manager 2008 Users Conference San Antonio Introduction and purpose of this session

More information

validscale: A Stata module to validate subjective measurement scales using Classical Test Theory

validscale: A Stata module to validate subjective measurement scales using Classical Test Theory : A Stata module to validate subjective measurement scales using Classical Test Theory Bastien Perrot, Emmanuelle Bataille, Jean-Benoit Hardouin UMR INSERM U1246 - SPHERE "methods in Patient-centered outcomes

More information

A Comparison of Several Goodness-of-Fit Statistics

A Comparison of Several Goodness-of-Fit Statistics A Comparison of Several Goodness-of-Fit Statistics Robert L. McKinley The University of Toledo Craig N. Mills Educational Testing Service A study was conducted to evaluate four goodnessof-fit procedures

More information

Chapter 9. Youth Counseling Impact Scale (YCIS)

Chapter 9. Youth Counseling Impact Scale (YCIS) Chapter 9 Youth Counseling Impact Scale (YCIS) Background Purpose The Youth Counseling Impact Scale (YCIS) is a measure of perceived effectiveness of a specific counseling session. In general, measures

More information

Confirmatory Factor Analysis of Preschool Child Behavior Checklist (CBCL) (1.5 5 yrs.) among Canadian children

Confirmatory Factor Analysis of Preschool Child Behavior Checklist (CBCL) (1.5 5 yrs.) among Canadian children Confirmatory Factor Analysis of Preschool Child Behavior Checklist (CBCL) (1.5 5 yrs.) among Canadian children Dr. KAMALPREET RAKHRA MD MPH PhD(Candidate) No conflict of interest Child Behavioural Check

More information

The Modification of Dichotomous and Polytomous Item Response Theory to Structural Equation Modeling Analysis

The Modification of Dichotomous and Polytomous Item Response Theory to Structural Equation Modeling Analysis Canadian Social Science Vol. 8, No. 5, 2012, pp. 71-78 DOI:10.3968/j.css.1923669720120805.1148 ISSN 1712-8056[Print] ISSN 1923-6697[Online] www.cscanada.net www.cscanada.org The Modification of Dichotomous

More information

Multidimensional Modeling of Learning Progression-based Vertical Scales 1

Multidimensional Modeling of Learning Progression-based Vertical Scales 1 Multidimensional Modeling of Learning Progression-based Vertical Scales 1 Nina Deng deng.nina@measuredprogress.org Louis Roussos roussos.louis@measuredprogress.org Lee LaFond leelafond74@gmail.com 1 This

More information

During the past century, mathematics

During the past century, mathematics An Evaluation of Mathematics Competitions Using Item Response Theory Jim Gleason During the past century, mathematics competitions have become part of the landscape in mathematics education. The first

More information

Exploratory Factor Analysis Student Anxiety Questionnaire on Statistics

Exploratory Factor Analysis Student Anxiety Questionnaire on Statistics Proceedings of Ahmad Dahlan International Conference on Mathematics and Mathematics Education Universitas Ahmad Dahlan, Yogyakarta, 13-14 October 2017 Exploratory Factor Analysis Student Anxiety Questionnaire

More information

Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz

Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz This study presents the steps Edgenuity uses to evaluate the reliability and validity of its quizzes, topic tests, and cumulative

More information

Does factor indeterminacy matter in multi-dimensional item response theory?

Does factor indeterminacy matter in multi-dimensional item response theory? ABSTRACT Paper 957-2017 Does factor indeterminacy matter in multi-dimensional item response theory? Chong Ho Yu, Ph.D., Azusa Pacific University This paper aims to illustrate proper applications of multi-dimensional

More information

Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys

Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Jill F. Kilanowski, PhD, APRN,CPNP Associate Professor Alpha Zeta & Mu Chi Acknowledgements Dr. Li Lin,

More information

Running head: CFA OF TDI AND STICSA 1. p Factor or Negative Emotionality? Joint CFA of Internalizing Symptomology

Running head: CFA OF TDI AND STICSA 1. p Factor or Negative Emotionality? Joint CFA of Internalizing Symptomology Running head: CFA OF TDI AND STICSA 1 p Factor or Negative Emotionality? Joint CFA of Internalizing Symptomology Caspi et al. (2014) reported that CFA results supported a general psychopathology factor,

More information

Description of components in tailored testing

Description of components in tailored testing Behavior Research Methods & Instrumentation 1977. Vol. 9 (2).153-157 Description of components in tailored testing WAYNE M. PATIENCE University ofmissouri, Columbia, Missouri 65201 The major purpose of

More information

Survey Sampling Weights and Item Response Parameter Estimation

Survey Sampling Weights and Item Response Parameter Estimation Survey Sampling Weights and Item Response Parameter Estimation Spring 2014 Survey Methodology Simmons School of Education and Human Development Center on Research & Evaluation Paul Yovanoff, Ph.D. Department

More information

Development, Standardization and Application of

Development, Standardization and Application of American Journal of Educational Research, 2018, Vol. 6, No. 3, 238-257 Available online at http://pubs.sciepub.com/education/6/3/11 Science and Education Publishing DOI:10.12691/education-6-3-11 Development,

More information

USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION

USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION Iweka Fidelis (Ph.D) Department of Educational Psychology, Guidance and Counselling, University of Port Harcourt,

More information

REPORT. Technical Report: Item Characteristics. Jessica Masters

REPORT. Technical Report: Item Characteristics. Jessica Masters August 2010 REPORT Diagnostic Geometry Assessment Project Technical Report: Item Characteristics Jessica Masters Technology and Assessment Study Collaborative Lynch School of Education Boston College Chestnut

More information

Analyzing Teacher Professional Standards as Latent Factors of Assessment Data: The Case of Teacher Test-English in Saudi Arabia

Analyzing Teacher Professional Standards as Latent Factors of Assessment Data: The Case of Teacher Test-English in Saudi Arabia Analyzing Teacher Professional Standards as Latent Factors of Assessment Data: The Case of Teacher Test-English in Saudi Arabia 1 Introduction The Teacher Test-English (TT-E) is administered by the NCA

More information

The Functional Outcome Questionnaire- Aphasia (FOQ-A) is a conceptually-driven

The Functional Outcome Questionnaire- Aphasia (FOQ-A) is a conceptually-driven Introduction The Functional Outcome Questionnaire- Aphasia (FOQ-A) is a conceptually-driven outcome measure that was developed to address the growing need for an ecologically valid functional communication

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria

Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria Thakur Karkee Measurement Incorporated Dong-In Kim CTB/McGraw-Hill Kevin Fatica CTB/McGraw-Hill

More information

Mantel-Haenszel Procedures for Detecting Differential Item Functioning

Mantel-Haenszel Procedures for Detecting Differential Item Functioning A Comparison of Logistic Regression and Mantel-Haenszel Procedures for Detecting Differential Item Functioning H. Jane Rogers, Teachers College, Columbia University Hariharan Swaminathan, University of

More information

Alternative Methods for Assessing the Fit of Structural Equation Models in Developmental Research

Alternative Methods for Assessing the Fit of Structural Equation Models in Developmental Research Alternative Methods for Assessing the Fit of Structural Equation Models in Developmental Research Michael T. Willoughby, B.S. & Patrick J. Curran, Ph.D. Duke University Abstract Structural Equation Modeling

More information

Construct Invariance of the Survey of Knowledge of Internet Risk and Internet Behavior Knowledge Scale

Construct Invariance of the Survey of Knowledge of Internet Risk and Internet Behavior Knowledge Scale University of Connecticut DigitalCommons@UConn NERA Conference Proceedings 2010 Northeastern Educational Research Association (NERA) Annual Conference Fall 10-20-2010 Construct Invariance of the Survey

More information

Confirmatory Factor Analysis of the Group Environment Questionnaire With an Intercollegiate Sample

Confirmatory Factor Analysis of the Group Environment Questionnaire With an Intercollegiate Sample JOURNAL OF SPORT & EXERCISE PSYCHOLOGY, 19%. 18,49-63 O 1996 Human Kinetics Publishers, Inc. Confirmatory Factor Analysis of the Group Environment Questionnaire With an Intercollegiate Sample Fuzhong Li

More information

INTERPRETING IRT PARAMETERS: PUTTING PSYCHOLOGICAL MEAT ON THE PSYCHOMETRIC BONE

INTERPRETING IRT PARAMETERS: PUTTING PSYCHOLOGICAL MEAT ON THE PSYCHOMETRIC BONE The University of British Columbia Edgeworth Laboratory for Quantitative Educational & Behavioural Science INTERPRETING IRT PARAMETERS: PUTTING PSYCHOLOGICAL MEAT ON THE PSYCHOMETRIC BONE Anita M. Hubley,

More information

MEASURING MIDDLE GRADES STUDENTS UNDERSTANDING OF FORCE AND MOTION CONCEPTS: INSIGHTS INTO THE STRUCTURE OF STUDENT IDEAS

MEASURING MIDDLE GRADES STUDENTS UNDERSTANDING OF FORCE AND MOTION CONCEPTS: INSIGHTS INTO THE STRUCTURE OF STUDENT IDEAS MEASURING MIDDLE GRADES STUDENTS UNDERSTANDING OF FORCE AND MOTION CONCEPTS: INSIGHTS INTO THE STRUCTURE OF STUDENT IDEAS The purpose of this study was to create an instrument that measures middle grades

More information

Bruno D. Zumbo, Ph.D. University of Northern British Columbia

Bruno D. Zumbo, Ph.D. University of Northern British Columbia Bruno Zumbo 1 The Effect of DIF and Impact on Classical Test Statistics: Undetected DIF and Impact, and the Reliability and Interpretability of Scores from a Language Proficiency Test Bruno D. Zumbo, Ph.D.

More information

A Comparison of Pseudo-Bayesian and Joint Maximum Likelihood Procedures for Estimating Item Parameters in the Three-Parameter IRT Model

A Comparison of Pseudo-Bayesian and Joint Maximum Likelihood Procedures for Estimating Item Parameters in the Three-Parameter IRT Model A Comparison of Pseudo-Bayesian and Joint Maximum Likelihood Procedures for Estimating Item Parameters in the Three-Parameter IRT Model Gary Skaggs Fairfax County, Virginia Public Schools José Stevenson

More information

Investigating the Reliability of Classroom Observation Protocols: The Case of PLATO. M. Ken Cor Stanford University School of Education.

Investigating the Reliability of Classroom Observation Protocols: The Case of PLATO. M. Ken Cor Stanford University School of Education. The Reliability of PLATO Running Head: THE RELIABILTY OF PLATO Investigating the Reliability of Classroom Observation Protocols: The Case of PLATO M. Ken Cor Stanford University School of Education April,

More information

Bipolar items for the measurement of personal optimism instead of unipolar items

Bipolar items for the measurement of personal optimism instead of unipolar items Psychological Test and Assessment Modeling, Volume 53, 2011 (4), 399-413 Bipolar items for the measurement of personal optimism instead of unipolar items Karl Schweizer 1, Wolfgang Rauch 2 & Andreas Gold

More information

Naming Test of the Neuropsychological Assessment Battery: Convergent and Discriminant Validity

Naming Test of the Neuropsychological Assessment Battery: Convergent and Discriminant Validity Archives of Clinical Neuropsychology 24 (2009) 575 583 Naming Test of the Neuropsychological Assessment Battery: Convergent and Discriminant Validity Brian P. Yochim*, Katherine D. Kane, Anne E. Mueller

More information

International Conference on Humanities and Social Science (HSS 2016)

International Conference on Humanities and Social Science (HSS 2016) International Conference on Humanities and Social Science (HSS 2016) The Chinese Version of WOrk-reLated Flow Inventory (WOLF): An Examination of Reliability and Validity Yi-yu CHEN1, a, Xiao-tong YU2,

More information

linking in educational measurement: Taking differential motivation into account 1

linking in educational measurement: Taking differential motivation into account 1 Selecting a data collection design for linking in educational measurement: Taking differential motivation into account 1 Abstract In educational measurement, multiple test forms are often constructed to

More information

A Modified CATSIB Procedure for Detecting Differential Item Function. on Computer-Based Tests. Johnson Ching-hong Li 1. Mark J. Gierl 1.

A Modified CATSIB Procedure for Detecting Differential Item Function. on Computer-Based Tests. Johnson Ching-hong Li 1. Mark J. Gierl 1. Running Head: A MODIFIED CATSIB PROCEDURE FOR DETECTING DIF ITEMS 1 A Modified CATSIB Procedure for Detecting Differential Item Function on Computer-Based Tests Johnson Ching-hong Li 1 Mark J. Gierl 1

More information

Rapidly-administered short forms of the Wechsler Adult Intelligence Scale 3rd edition

Rapidly-administered short forms of the Wechsler Adult Intelligence Scale 3rd edition Archives of Clinical Neuropsychology 22 (2007) 917 924 Abstract Rapidly-administered short forms of the Wechsler Adult Intelligence Scale 3rd edition Alison J. Donnell a, Neil Pliskin a, James Holdnack

More information

The MHSIP: A Tale of Three Centers

The MHSIP: A Tale of Three Centers The MHSIP: A Tale of Three Centers P. Antonio Olmos-Gallo, Ph.D. Kathryn DeRoche, M.A. Mental Health Center of Denver Richard Swanson, Ph.D., J.D. Aurora Research Institute John Mahalik, Ph.D., M.P.A.

More information

Running head: CPPS REVIEW 1

Running head: CPPS REVIEW 1 Running head: CPPS REVIEW 1 Please use the following citation when referencing this work: McGill, R. J. (2013). Test review: Children s Psychological Processing Scale (CPPS). Journal of Psychoeducational

More information

Presented By: Yip, C.K., OT, PhD. School of Medical and Health Sciences, Tung Wah College

Presented By: Yip, C.K., OT, PhD. School of Medical and Health Sciences, Tung Wah College Presented By: Yip, C.K., OT, PhD. School of Medical and Health Sciences, Tung Wah College Background of problem in assessment for elderly Key feature of CCAS Structural Framework of CCAS Methodology Result

More information

Jason L. Meyers. Ahmet Turhan. Steven J. Fitzpatrick. Pearson. Paper presented at the annual meeting of the

Jason L. Meyers. Ahmet Turhan. Steven J. Fitzpatrick. Pearson. Paper presented at the annual meeting of the Performance of Ability Estimation Methods for Writing Assessments under Conditio ns of Multidime nsionality Jason L. Meyers Ahmet Turhan Steven J. Fitzpatrick Pearson Paper presented at the annual meeting

More information

Elderly Norms for the Hopkins Verbal Learning Test-Revised*

Elderly Norms for the Hopkins Verbal Learning Test-Revised* The Clinical Neuropsychologist -//-$., Vol., No., pp. - Swets & Zeitlinger Elderly Norms for the Hopkins Verbal Learning Test-Revised* Rodney D. Vanderploeg, John A. Schinka, Tatyana Jones, Brent J. Small,

More information

Proceedings of the 2011 International Conference on Teaching, Learning and Change (c) International Association for Teaching and Learning (IATEL)

Proceedings of the 2011 International Conference on Teaching, Learning and Change (c) International Association for Teaching and Learning (IATEL) EVALUATION OF MATHEMATICS ACHIEVEMENT TEST: A COMPARISON BETWEEN CLASSICAL TEST THEORY (CTT)AND ITEM RESPONSE THEORY (IRT) Eluwa, O. Idowu 1, Akubuike N. Eluwa 2 and Bekom K. Abang 3 1& 3 Dept of Educational

More information

Nonparametric DIF. Bruno D. Zumbo and Petronilla M. Witarsa University of British Columbia

Nonparametric DIF. Bruno D. Zumbo and Petronilla M. Witarsa University of British Columbia Nonparametric DIF Nonparametric IRT Methodology For Detecting DIF In Moderate-To-Small Scale Measurement: Operating Characteristics And A Comparison With The Mantel Haenszel Bruno D. Zumbo and Petronilla

More information

Confirmatory Factor Analysis and Item Response Theory: Two Approaches for Exploring Measurement Invariance

Confirmatory Factor Analysis and Item Response Theory: Two Approaches for Exploring Measurement Invariance Psychological Bulletin 1993, Vol. 114, No. 3, 552-566 Copyright 1993 by the American Psychological Association, Inc 0033-2909/93/S3.00 Confirmatory Factor Analysis and Item Response Theory: Two Approaches

More information

ORIGINAL CONTRIBUTION. Detecting Dementia With the Mini-Mental State Examination in Highly Educated Individuals

ORIGINAL CONTRIBUTION. Detecting Dementia With the Mini-Mental State Examination in Highly Educated Individuals ORIGINAL CONTRIBUTION Detecting Dementia With the Mini-Mental State Examination in Highly Educated Individuals Sid E. O Bryant, PhD; Joy D. Humphreys, MA; Glenn E. Smith, PhD; Robert J. Ivnik, PhD; Neill

More information

Parallel Forms for Diagnostic Purpose

Parallel Forms for Diagnostic Purpose Paper presented at AERA, 2010 Parallel Forms for Diagnostic Purpose Fang Chen Xinrui Wang UNCG, USA May, 2010 INTRODUCTION With the advancement of validity discussions, the measurement field is pushing

More information

Test review. Comprehensive Trail Making Test (CTMT) By Cecil R. Reynolds. Austin, Texas: PRO-ED, Inc., Test description

Test review. Comprehensive Trail Making Test (CTMT) By Cecil R. Reynolds. Austin, Texas: PRO-ED, Inc., Test description Archives of Clinical Neuropsychology 19 (2004) 703 708 Test review Comprehensive Trail Making Test (CTMT) By Cecil R. Reynolds. Austin, Texas: PRO-ED, Inc., 2002 1. Test description The Trail Making Test

More information

Differential Item Functioning

Differential Item Functioning Differential Item Functioning Lecture #11 ICPSR Item Response Theory Workshop Lecture #11: 1of 62 Lecture Overview Detection of Differential Item Functioning (DIF) Distinguish Bias from DIF Test vs. Item

More information

Examining the Validity and Fairness of a State Standards-Based Assessment of English-Language Arts for Deaf or Hard of Hearing Students

Examining the Validity and Fairness of a State Standards-Based Assessment of English-Language Arts for Deaf or Hard of Hearing Students Examining the Validity and Fairness of a State Standards-Based Assessment of English-Language Arts for Deaf or Hard of Hearing Students Jonathan Steinberg Frederick Cline Guangming Ling Linda Cook Namrata

More information

The Ego Identity Process Questionnaire: Factor Structure, Reliability, and Convergent Validity in Dutch-Speaking Late. Adolescents

The Ego Identity Process Questionnaire: Factor Structure, Reliability, and Convergent Validity in Dutch-Speaking Late. Adolescents 33 2 The Ego Identity Process Questionnaire: Factor Structure, Reliability, and Convergent Validity in Dutch-Speaking Late Adolescents Koen Luyckx, Luc Goossens, Wim Beyers, & Bart Soenens (2006). Journal

More information

The Psychometric Properties of Dispositional Flow Scale-2 in Internet Gaming

The Psychometric Properties of Dispositional Flow Scale-2 in Internet Gaming Curr Psychol (2009) 28:194 201 DOI 10.1007/s12144-009-9058-x The Psychometric Properties of Dispositional Flow Scale-2 in Internet Gaming C. K. John Wang & W. C. Liu & A. Khoo Published online: 27 May

More information

Validation of the Patient Perception of Migraine Questionnaire

Validation of the Patient Perception of Migraine Questionnaire Volume 5 Number 5 2002 VALUE IN HEALTH Validation of the Patient Perception of Migraine Questionnaire Kimberly Hunt Davis, MS, 1 Libby Black, PharmD, 1 Betsy Sleath, PhD 2 1 GlaxoSmithKline, Research Triangle

More information

accuracy (see, e.g., Mislevy & Stocking, 1989; Qualls & Ansley, 1985; Yen, 1987). A general finding of this research is that MML and Bayesian

accuracy (see, e.g., Mislevy & Stocking, 1989; Qualls & Ansley, 1985; Yen, 1987). A general finding of this research is that MML and Bayesian Recovery of Marginal Maximum Likelihood Estimates in the Two-Parameter Logistic Response Model: An Evaluation of MULTILOG Clement A. Stone University of Pittsburgh Marginal maximum likelihood (MML) estimation

More information

A Broad-Range Tailored Test of Verbal Ability

A Broad-Range Tailored Test of Verbal Ability A Broad-Range Tailored Test of Verbal Ability Frederic M. Lord Educational Testing Service Two parallel forms of a broad-range tailored test of verbal ability have been built. The test is appropriate from

More information

Graphical Representation of Multidimensional

Graphical Representation of Multidimensional Graphical Representation of Multidimensional Item Response Theory Analyses Terry Ackerman University of Illinois, Champaign-Urbana This paper illustrates how graphical analyses can enhance the interpretation

More information

was also my mentor, teacher, colleague, and friend. It is tempting to review John Horn s main contributions to the field of intelligence by

was also my mentor, teacher, colleague, and friend. It is tempting to review John Horn s main contributions to the field of intelligence by Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179 185. (3362 citations in Google Scholar as of 4/1/2016) Who would have thought that a paper

More information

An item response theory analysis of Wong and Law emotional intelligence scale

An item response theory analysis of Wong and Law emotional intelligence scale Available online at www.sciencedirect.com Procedia Social and Behavioral Sciences 2 (2010) 4038 4047 WCES-2010 An item response theory analysis of Wong and Law emotional intelligence scale Jahanvash Karim

More information

School Administrators Level of Self-Esteem and its Relationship To Their Trust in Teachers. Mualla Aksu, Soner Polat, & Türkan Aksu

School Administrators Level of Self-Esteem and its Relationship To Their Trust in Teachers. Mualla Aksu, Soner Polat, & Türkan Aksu School Administrators Level of Self-Esteem and its Relationship To Their Trust in Teachers Mualla Aksu, Soner Polat, & Türkan Aksu What is Self-Esteem? Confidence in one s own worth or abilities (http://www.oxforddictionaries.com/definition/english/self-esteem)

More information

Reliability and Validity of the Divided

Reliability and Validity of the Divided Aging, Neuropsychology, and Cognition, 12:89 98 Copyright 2005 Taylor & Francis, Inc. ISSN: 1382-5585/05 DOI: 10.1080/13825580590925143 Reliability and Validity of the Divided Aging, 121Taylor NANC 52900

More information

Running head: CFA OF STICSA 1. Model-Based Factor Reliability and Replicability of the STICSA

Running head: CFA OF STICSA 1. Model-Based Factor Reliability and Replicability of the STICSA Running head: CFA OF STICSA 1 Model-Based Factor Reliability and Replicability of the STICSA The State-Trait Inventory of Cognitive and Somatic Anxiety (STICSA; Ree et al., 2008) is a new measure of anxiety

More information

Item Response Theory (IRT): A Modern Statistical Theory for Solving Measurement Problem in 21st Century

Item Response Theory (IRT): A Modern Statistical Theory for Solving Measurement Problem in 21st Century International Journal of Scientific Research in Education, SEPTEMBER 2018, Vol. 11(3B), 627-635. Item Response Theory (IRT): A Modern Statistical Theory for Solving Measurement Problem in 21st Century

More information

An Assessment of the Mathematics Information Processing Scale: A Potential Instrument for Extending Technology Education Research

An Assessment of the Mathematics Information Processing Scale: A Potential Instrument for Extending Technology Education Research Association for Information Systems AIS Electronic Library (AISeL) SAIS 2009 Proceedings Southern (SAIS) 3-1-2009 An Assessment of the Mathematics Information Processing Scale: A Potential Instrument for

More information

Multidimensionality and Item Bias

Multidimensionality and Item Bias Multidimensionality and Item Bias in Item Response Theory T. C. Oshima, Georgia State University M. David Miller, University of Florida This paper demonstrates empirically how item bias indexes based on

More information

Anumber of studies have shown that ignorance regarding fundamental measurement

Anumber of studies have shown that ignorance regarding fundamental measurement 10.1177/0013164406288165 Educational Graham / Congeneric and Psychological Reliability Measurement Congeneric and (Essentially) Tau-Equivalent Estimates of Score Reliability What They Are and How to Use

More information

alternate-form reliability The degree to which two or more versions of the same test correlate with one another. In clinical studies in which a given function is going to be tested more than once over

More information

Construct Validity of Mathematics Test Items Using the Rasch Model

Construct Validity of Mathematics Test Items Using the Rasch Model Construct Validity of Mathematics Test Items Using the Rasch Model ALIYU, R.TAIWO Department of Guidance and Counselling (Measurement and Evaluation Units) Faculty of Education, Delta State University,

More information