
Running Head: GENDER DIF

Substantive and Cognitive Interpretations of Gender DIF on a Fractions Concept Test

Robert H. Fay and Yi-Hsin Chen
University of South Florida

Yuh-Chyn Leu
National Taipei University of Education

Presented at the annual meeting of the National Council on Measurement in Education, San Diego, California, April 13-17, 2009

Please address correspondence concerning this manuscript to: Robert Fay, Department of Educational Measurement and Research, University of South Florida, 4202 E. Fowler Ave., EDU 162, Tampa, FL. rfay@mail.usf.edu

Substantive and Cognitive Interpretations of Gender DIF on a Fractions Concept Test

Background

Much previous research has analyzed gender differential item functioning (DIF), most of it concerning differences in performance on quantitative items. DIF studies date to Eells, Davis, Havighurst, Herrick, and Tyler (1951), who tried to distinguish test items that favored children from high-SES backgrounds from items that favored children from low-SES households. Since then, dozens of studies have been published on the subject. One of the first gender DIF studies was done by Hambleton and Traub (1974), who explored the multidimensional effects between gender and item order on math tests, but the results were inconclusive.

Some subsequent gender DIF studies reached firmer conclusions, though with certain dissimilarities among their results. In general, it was determined that boys and girls do indeed perform differently on different types of questions. Plake and associates (1982) found that boys did better than girls on timed tests in which the difficult items were back-loaded. Word problems were found to be differentially easier for boys, as were problems requiring higher-level thinking skills, including geometry and arithmetic, while girls and boys did equally well on test items involving spatial components (Ryan & Chiu, 2001). Alternatively, Ryan and Fan (1996) found that algebra and computation items were differentially easier for females and geometry items differentially easier for males. Their results also indicated that arithmetic items were differentially easier for females, while the applied items were, once again, differentially more difficult for females. Berberoglu (1995), who studied items of a mathematics test across both gender and SES groups, determined that males had the advantage on computation items, but that females actually performed better on

word problems and geometry items, indicating that females had better verbal and spatial ability and males had better overall computational skills. Inabi and Dodeen (2006) found that 37 of 124 items on an 8th-grade math test displayed gender DIF. In that study, all the measurement-type items with DIF favored boys, while most of the DIF items in the algebra and data analysis areas favored girls. Most of the items favoring boys were unfamiliar to girls and had open-ended answers involving risk, with some estimations, expectations, or approximations but often no finite answer. In contrast, most of the DIF items that favored girls were familiar items with one specific correct answer. Bielinski and Davison (2001) reported a sex difference by item difficulty interaction in which easy items tended to be easier for girls than boys, while hard items tended to be harder for girls and easier for boys. Lane, Wang, and Magone (1996) used a procedure based on logistic discriminant function analysis on a math test with open-ended tasks taken by middle school students and concluded that, with regard to uniform gender DIF, four tasks favored female students and two tasks favored male students. The two tasks that favored boys had figures not drawn to scale, while the four tasks that favored girls did not. One task involving geometry skills displayed severe DIF, favoring males. Hamilton (1999) showed that gender differences were largest on items that involved visualization and called on knowledge acquired outside of school to explain gender DIF. Walstad and Robson (1997) analyzed items from the Test of Economic Literacy (TEL), using item response theory (IRT) to identify test questions with large gender DIF. Although there was a statistically significant difference between girls' and boys' scores before DIF items were removed, there were indications of sources besides DIF to explicate the gender differences. Among the suggestions were differential reasoning, differences in socialization skills, and different instructional methods or testing formats. Roussos and Stout

(1996) developed a multidimensionality paradigm that combined substantive DIF processes (those without statistical confirmation), such as cognitive differences and other descriptive content, with statistical DIF analysis. Even though this concept had been elucidated in previous work, including that of Cronbach (1990), Jensen (1980), Messick (1989), and Wiley (1990), Roussos and Stout expanded on the paradigm by being the first to try to increase statistical power by using bundles of items measuring similar dimensions.

Multidimensionality-Based DIF Analysis Paradigm

The simultaneous item bias test (SIBTEST) was created using Roussos and Stout's multidimensional paradigm to detect differential item and bundle functioning (DIF and DBF). Ryan and Chiu (2001) used SIBTEST to test word problem items for DBF against the total score of non-word-problem items in order to determine whether girls and boys were differentially affected by a change in item position within the test. Walker and Beretvas (2001) tested the hypothesis that open-ended math test items were multidimensional, measuring math communication skills as well as general math ability, between proficient and non-proficient fourth- and seventh-grade writers. Finally, SIBTEST was used on items from a curriculum-based math achievement test to quantify the effect size of DIF and to test general DIF hypotheses, such as whether the data actually possess two distinct dimensions as defined by Roussos and Stout (Gierl, Bisanz, Bisanz, & Boughton, 2003). Items said to measure two dimensions were matched against items said to measure only one dimension. In this study, we intended to make similar explanations for gender DIF items and to offer these as possible remediation tools for future test takers.

Purpose of the Research

This study investigated whether gender differential item functioning (DIF) was present in

any of the twenty-three items, and whether gender differential bundle functioning (DBF) was present in parcels of items, on a fractions concept test for Taiwanese elementary school students. That is, both DIF and DBF were measured. The objective was to determine the reasons behind the gender DIF, when present, and to proffer a solution for eliminating its sources. Did the contextual properties of specific items lead to one gender or the other being more familiar with the particular framework of the items, thereby producing differential problem-solving strategies for the two groups and, concomitantly, different performance levels as well? Both exploratory single-item and confirmatory bundle DIF analyses were carried out.

Methods

Participants

The data were collected from 2612 fifth- and sixth-grade students in Taiwan: 1283 fifth graders (49.12%) and 1329 sixth graders (50.88%). There were 1330 girls (50.92%) and 1282 boys (49.08%). Efforts were taken to obtain a representative sample of the population by first separating Taiwan into six school regions. Schools were then grouped by size within each region: schools with fewer than 13 classrooms were considered small, those with 13 to 35 classrooms medium-sized, and schools with more than 35 classrooms big. Stratified random sampling was then done within the regions to obtain a representative distribution of school size.

Instrument

A twenty-three-item multiple-choice test on fractions concepts, developed by Chan and Leu (2004), was administered to the sample of 2612 students. It was designed to measure three major concepts of fractions, known as the equal sharing concept, the units concept, and the

equivalent fraction concept. A fourth concept, the basic fraction concept, was also considered crucial for students to understand in order to solve fraction test items correctly, so items of this type were also incorporated into the test. Some test items require knowledge in more than one of these conceptual areas to be solved. Table 1 presents the four major fractions concepts, along with the relevant test items for each. The Cronbach's alpha for the entire test was .

Task Analysis

In addition to the four content categories, the fraction items were also classified by the knowledge, skills, and abilities (KSAs; Gorin, 2006; Chen, Gorin, Thompson, & Tatsuoka, 2008) related to solving the items. Seven cognitive component item categories were constructed by MacDonald, Chen, Li, and Leu (2009), based on the mathematics cognitive skills of Corter and Tatsuoka (2002) and a variation of the cognitive component factor structure developed by Chan, Leu, and Chen (2007). The seven cognitive component item categories were validated with the linear logistic test model (LLTM) for the sample used in this study (MacDonald, Chen, Li, & Leu, 2009). The seven bundles tested had items clustered together according to the following types of cognitive components: 1) using illustrations; 2) providing a written interpretation; 3) judgmental application; 4) computation; 5) checking options; 6) spatial unfolding; and 7) solving routine problems.

Statistical Analysis

A series of statistical DIF and DBF analyses was conducted. First, individual gender DIF analyses were performed using the logistic regression (LR) approach. A variation of the logistic regression code developed by Zumbo (1999) was used to determine whether there was either uniform or non-uniform DIF for any of the twenty-three test items with regard to the binary gender variable.
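The LR screening amounts to comparing nested logistic models: ability only, ability plus gender (uniform DIF), and ability plus gender plus their interaction (non-uniform DIF), tracking the likelihood-ratio chi-square and Nagelkerke R-squared changes between steps. The following is a minimal sketch on simulated data; the variable names and the simulated responses are ours for illustration, not the authors' actual code or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (illustrative only, not the study's): total score, gender,
# and one dichotomous item with deliberate uniform DIF built in.
n = 2000
total = rng.integers(0, 24, n).astype(float)
gender = rng.integers(0, 2, n).astype(float)           # 0 = girls, 1 = boys
true_logit = -3.0 + 0.25 * total + 0.6 * gender
y = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)

def loglik(X):
    """Fit a logistic regression by Newton-Raphson; return its log-likelihood."""
    X = np.column_stack([np.ones(n), X]) if X is not None else np.ones((n, 1))
    b = np.zeros(X.shape[1])
    for _ in range(25):
        p = 1 / (1 + np.exp(-X @ b))
        H = (X * (p * (1 - p))[:, None]).T @ X        # observed information
        b += np.linalg.solve(H, X.T @ (y - p))        # Newton step
    p = np.clip(1 / (1 + np.exp(-X @ b)), 1e-12, 1 - 1e-12)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

ll0 = loglik(None)                                     # intercept-only model
ll1 = loglik(total[:, None])                           # step 1: ability
ll2 = loglik(np.column_stack([total, gender]))         # step 2: + gender
ll3 = loglik(np.column_stack([total, gender, total * gender]))  # step 3: + interaction

def nagelkerke(ll):
    # Nagelkerke's rescaled R-squared relative to the intercept-only model
    return (1 - np.exp(2 * (ll0 - ll) / n)) / (1 - np.exp(2 * ll0 / n))

chi2_uniform = 2 * (ll2 - ll1)      # LR chi-square change for the uniform-DIF step
chi2_nonuniform = 2 * (ll3 - ll2)   # LR chi-square change for the interaction step
print(chi2_uniform, nagelkerke(ll2) - nagelkerke(ll1))
print(chi2_nonuniform, nagelkerke(ll3) - nagelkerke(ll2))
```

Because the simulated item has a genuine gender effect but no interaction, the uniform-DIF chi-square change comes out large and the non-uniform change small, mirroring how flagged and unflagged items separate in the study's tables.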

Zumbo's LR DIF approach uses a three-step process that yields the Nagelkerke R-squared measure of DIF effect size for both uniform and non-uniform DIF. The total scale score for each student and the binary gender variable were used as independent variables in the logistic regression equation for determining uniform DIF, while an interaction term between the two was added for the non-uniform DIF determination. Chi-square changes, with two degrees of freedom for the uniform DIF and one degree of freedom for the non-uniform DIF, were calculated along with the R-squared measures of effect size. If either of the p-values for an item was 0.05 or less, the item was flagged for DIF. Ultimately, researcher judgmental analysis was applied to each of the DIF items in order to determine why the flagged items showed DIF. If boys and girls were learning differently or being exposed variably to certain key learning stimuli, it would be crucial to be able to explain the reason(s) for the gender DIF between them.

Next, SIBTEST, a non-parametric statistical method, was used to assess differential item functioning (DIF) for individual items (to be compared with the results of the logistic regression above) and differential bundle functioning (DBF) for bundles of items. The statistic SIB and the bias estimator β̂ were calculated (for a discussion of the theory of SIB and β̂, see Shealy & Stout, 1993). The amount of DIF in the studied subtest in SIBTEST is captured by the parameter β_UNI, as explained by Shealy and Stout:

β_UNI = ∫ β(θ) f_F(θ) dθ,

where β(θ) = P(θ, R) − P(θ, F) is the difference in the probabilities of a correct response for test-takers from the reference and focal groups, respectively, conditional on θ, and f_F(θ) is the density function for θ in the focal group. β(θ) is

integrated over θ, weighted by the focal-group density, to produce an expected score difference between reference- and focal-group examinees of the same ability on an item or bundle of items. SIBTEST assesses this parameter with the test statistic SIB = β̂_UNI / σ̂(β̂_UNI), where σ̂(β̂_UNI) is the estimated standard error of β̂_UNI. Shealy and Stout (1993) showed that SIB is asymptotically normal with mean 0 and variance 1 under the null hypothesis of no DIF.

To examine DBF, items on the fraction test were first separated into the studied subtest and the matching subtest based on the four item content groupings (see Gierl et al., 2003 for more detailed procedures). Before the DBF analysis (confirmatory DIF analysis), an individual item DIF analysis (exploratory DIF analysis) was conducted with SIBTEST, to be compared with the results from the LR approach, in an effort to improve the task analysis of all the test items. A Crossing-SIBTEST was also run for each of the twenty-three items to compare its non-uniform DIF results with those obtained with the LR approach. Finally, the cognitive-skills bundles were examined for DIF using SIBTEST. This was done by dividing the items from each theorized cognitive component bundle into studied subtest items that were believed to be multidimensional (based on their theorized constructs), and therefore expected to show DIF, and matching subtest items thought to contain only one dimension. The matching subtest places the boys and girls into subgroups at each score level so that their performances on the studied subtest items can be compared (Gierl et al., 2003).

Results

Uniform and non-uniform item DIF

Eleven of the twenty-three items on the fractions concept test displayed uniform gender

DIF under the LR approach, while nine showed non-uniform gender DIF. The concomitant R-squared values for the DIF items' effect sizes had larger incremental increases than the same values for test items without DIF, a clear indication of which items behaved differentially. Table 2 shows the results of this analysis.

It was expected that the SIBTEST DIF results would be similar to those found using the LR approach above. These expectations were realized, increasing the validity of both methods in the precision of the DIF analysis of the fraction items (see Table 3). Only for one of the most borderline items was there a discrepancy between the results of the two methods. Ten of the eleven items found to have DIF in the Nagelkerke logistic regression approach also showed DIF when SIBTEST was used. One borderline DIF item, Item 21, showed uniform DIF using LR (its change in chi-square of 4.36 was the smallest of the eleven items showing DIF), but was not quite significant under SIBTEST (.071). Six of the ten DIF items favored males (Items 1, 2, 7, 14, 18, and 20), while the other four (Items 9, 11, 16, and 19), along with Item 21, displayed DIF favoring girls. An explanation of some possible reasons for these gender differences follows.

First, the male DIF items: Items 1 and 2 use spatial representation, a trait favoring boys (Gallagher et al., 2000). Both of these items also use sweets for context, a subject slightly more motivating for boys of this age than for girls (Allesen-Holm, Bredie, & Frost, 2008). Item 7 also concerns sweets and involves another skill favoring boys, again taken from Gallagher: the transformation of information from one spatial format to another.

Items 14 and 18 require multiple solution paths, which Gallagher et al. identify as a problem-solving skill that favors boys. Item 18 also involves sweets. Item 20 requires the conversion of a word problem to a spatial representation, yet another skill that Gallagher and colleagues say favors boys.

The girl-dominant DIF items can be explained similarly. Items 9 and 11 posed situations in which solving a fraction required picking the solution from a series of pictures. Hamilton (1999) did say that problems involving visualization favored girls and that, furthermore, gender DIF can really only be explained by looking at things outside the classroom environment. This may be a skill more commonly found in girls of this age than in boys, but there is no affirmation of that in the literature. In this case, the content of the DIF items did not explicitly manifest distinct themes that would help identify the source of the DIF. Item 16 is easier for girls because they read word problems more carefully (Berberoglu, 1995) and thus can more easily convert the words into algebraic solutions (Gallagher et al., 2000). There are both marbles and buttons in the scenario, but only the number of marbles is needed to solve the problem; careful readers would notice that. Item 19 contains a reference to fruit, a type of food that appeals to girls, who don't have the wilder tastes of boys (Allesen-Holm et al., 2008). Also, the wording of the question could be misinterpreted more easily than most.

The non-uniform DIF results were also similar for the two methods. This time Item 21 did not differ, with no DIF being indicated (just barely) by Crossing-SIBTEST, as the results

there revealed a p-value of 0.76, or by the LR method, as the change in chi-square was 4.58, not quite high enough to indicate DIF. The other borderline DIF item, Item 16, did differ between the two analytical methods. The LR approach showed its chi-square change to be just above that for Item 21, but still just below the DIF range (the lowest chi-square change for any of the non-uniform DIF items was over 6; Table 2). Meanwhile, the Crossing-SIBTEST produced a p-value right at 0.5 for this item, affirming its borderline status. The other nine non-uniform DIF items were commonly unveiled by both methods.

Content and Cognitive Differential Bundle Functioning

The analyses of DBF for the four content bundles and seven cognitive bundles were performed exclusively with SIBTEST, using the option for grouping of studied items. Two of the four content bundles were revealed to have DBF (see Table 4). The Equal Sharing conceptual grouping, consisting of items that ask the student to figure out what proportion of some product remains after some fraction is shared with others, uses illustrations to show the separated fractions. This group consists of Items 1, 2, 3, and 10 (see Table 1). This grouping showed DBF favoring boys, with a positive beta-estimate of (p <.01). A probable explanation, according to the modified Gallagher et al. (2000) taxonomy, is that at least one of these items, Item 10, requires multiple solution paths, and the ability to follow these is more often found in boys of this age than in girls. The Equivalent Fraction conceptual grouping is much larger, containing Items 3, 5, 6, 8, 9, 11, 12, 17, 18, 19, 20, and 21, but it also shows DBF. Girls are favored on the items in this bundle, with a negative beta-estimate of (p <.01). The other two content bundles, Basic Fraction Concepts and Units Concepts, were

bereft of DBF, having beta-estimates of and 0.003, respectively. With two of the four content bundles showing DBF among their item groups, the unidimensionality of those two major concept areas appears questionable. Although the four content-area groups are known major concept areas of fractions, some of the items on this fractions test may measure secondary or tertiary concepts or abilities, with items favoring males dominating the Equal Sharing Concept group and items favoring females dominating the Equivalent Fraction Concept group. Since Items 1 and 2 are positive for male DIF, the Equal Sharing group was bound to show this as well. The Equivalent Fraction group contains four items that favor females and just two with male-dominant DIF, so it is no surprise that this group has bundle DIF favoring females.

Showing DBF in a test can illuminate the fact that there are many individual items with DIF. It can show how items should not be conceptually bundled, or which content areas are not valid constructs, if unidimensional component clustering is the goal. In this case, it was determined that fraction items involving sharing pieces of food favor boys, so this type of item should use some other tangibly shared object that is more gender neutral (food shows itself to be differentially pleasing to boys over girls of this age) if the Equal Sharing concept items are to form a unidimensional bundle. Likewise, the Equivalent Fraction group could be constructed better if it is to be an illustrative collection of items representing that concept.

A much different set of bundles were those defining the seven cognitive components required to solve the items correctly (Table 5). The SIBTEST analyses showed all seven bundles to be free of DBF, indicating that the items in each were defining their respective groups precisely and without significant gender differentiation. No multidimensionality was seen, although it had been posited.
The cognitive bundles used here were not informative in this way. However, these specific cognitive bundles, by not showing DBF, should influence the design of future cognitive

component bundles in future studies, further elucidating the dimensions of cognition and enabling the creation of psychometric tests that better incorporate different cognitive skills and processes into their items.

Educational Significance

The findings from this study should help confirm the reasons for gender DIF among girls and boys by elucidating which conceptual areas within a fractions test show differences by gender. A further significant finding of this study is that psychometric tests and cognitive psychology content areas can be bridged to form precise cognitive component bundles from test items. The implications of this are potentially enormous, as testing can be made much more specific to the cognitive skills to be measured, and more cognitive skills and processes can be understood and included on psychometric tests. Other studies have proffered reasons for gender DIF and DBF on other types of math tests, but the conceptual differences involved in solving fraction test items have not previously been elaborated upon. Test makers and teachers should be able to avail themselves of this knowledge and develop more focused tests and teaching techniques that address these issues in this subject area.

References

Berberoglu, G. (1995). Differential item functioning (DIF) analysis of computation, word problem and geometry questions across gender and SES groups. Studies in Educational Evaluation, 21(4),

Bielinski, J., & Davison, M. L. (2001, Spring). A sex difference by item difficulty interaction in multiple-choice mathematics items administered to national probability samples. Journal of Educational Measurement, 38(1),

Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Thousand Oaks, CA: Sage.

Chan, W. H., & Leu, Y. C. (2004). The design of the rating scale of fraction for 5th and 6th graders. Chinese Journal of Science Education, 12,

Chan, W., Leu, Y., & Chen, C. (2007). Exploring group-wise conceptual deficiencies of fractions for fifth and sixth graders in Taiwan. Journal of Experimental Education, 76,

Chen, Y.-H., Gorin, J. S., Thompson, M. S., & Tatsuoka, K. K. (2008). Cross-cultural validity of the TIMSS-1999 mathematics test: Verification of a cognitive model. International Journal of Testing.

Cronbach, L. J. (1990). Essentials of psychological testing (5th ed.). New York: Harper and Row.

Eells, K. W., Davis, A., Havighurst, R. J., Herrick, V. E., & Tyler, R. W. (1951). Intelligence and cultural differences. Chicago: University of Chicago Press.

Gierl, M. J., Bisanz, J., Bisanz, G. L., & Boughton, K. A. (2003). Identifying content and cognitive skills that produce gender differences in mathematics: A demonstration of the multidimensionality-based DIF analysis. Journal of Educational Measurement, 40(4), 281-

Gorin, J. S. (2006). Test design with cognition in mind. Educational Measurement: Issues and Practice, 25,

Hambleton, R. K., & Traub, R. E. (1974). The effects of item order on test performance and stress. Journal of Experimental Education, 43,

Hamilton, L. S. (1999). Detecting gender-based differential item functioning on a constructed-response science test. Applied Measurement in Education, 12(3),

Inabi, H., & Dodeen, H. (2006, December). Content analysis of gender-related differential item functioning TIMSS items in mathematics in Jordan. School Science & Mathematics, 106(8),

Jensen, A. R. (1980). Bias in mental testing. New York: MacMillan.

Lane, S., Wang, N., & Magone, M. (1996). Gender-related differential item functioning on a middle-school mathematics performance assessment. Educational Measurement, 15,

MacDonald, G., Chen, Y.-H., & Leu, Y.-C. (2009). Exploring cognitive sources of item difficulty of mathematic fraction items. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, California.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. ). New York: MacMillan.

Plake, B. S., Ansorge, C. J., Parker, C. S., & Lowry, S. R. (1982). Effects of item arrangement, knowledge of arrangement, test anxiety, and sex on test performance. Journal of Educational Measurement, 19,

Roussos, L., & Stout, W. (1996, December). A multidimensionality-based DIF analysis paradigm. Applied Psychological Measurement, 20(4),

Ryan, K. E., & Chiu, S. (2001). An examination of item context effects, DIF, and gender DIF. Applied Measurement in Education, 14(1),

Ryan, K. E., & Fan, M. (1996). Examining gender DIF on a multiple-choice test of mathematics: A confirmatory approach. Educational Measurement: Issues and Practice, 15(4),

Shealy, R., & Stout, W. (1993). An item response theory model for test bias and differential test functioning. In P. Holland & H. Wainer (Eds.), Differential item functioning (pp. ). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.

Tatsuoka, K. K., Linn, R. L., Tatsuoka, M. M., & Yamamoto, K. (1988, Winter). Differential item functioning resulting from the use of different solution strategies. Journal of Educational Measurement, 25(4),

Walker, C. M., & Beretvas, S. N. (2001, Summer). An empirical investigation demonstrating the multidimensional DIF paradigm: A cognitive explanation for DIF. Journal of Educational Measurement, 38(2),

Walstad, W. B., & Robson, D. (1997, Spring). Differential item functioning and male-female differences on multiple-choice tests in economics. The Journal of Economic Education, 28,

Wiley, D. E. (1990). Test validity and invalidity reconsidered. In R. Snow & D. E. Wiley (Eds.), Improving inquiry in social science (pp. ). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.

Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, ON: Directorate of Human Resources.

University of Copenhagen (2008, December 18). Girls have superior sense of taste to boys.

ScienceDaily. Retrieved April 6, 2009, from /releases/2008/12/ htm

Table 1
Four Major Content Areas and Their Items in the Fraction Test

1. Basic Fraction Concept: Items 4, 5, 7, 13,
2. Equal Sharing Concept: Items 1, 2, 3, 10
3. Units Concept: Items 6, 7, 9, 13, 14, 15, 16, 22,
4. Equivalent Fraction Concept: Items 3, 5, 6, 8, 9, 11, 12, 17, 18, 19, 20, 21

Table 2
DIF Results from the Logistic Regression Approach
(The chi-square and effect-size values for Steps 1 and 2 did not survive transcription; the uniform-DIF flags, in item order, were:)

Item 1: Yes; Item 2: Yes; Item 3: No; Item 4: No; Item 5: No; Item 6: No; Item 7: Yes; Item 8: No; Item 9: Yes; Item 10: No; Item 11: Yes; Item 12: No; Item 13: No; Item 14: Yes; Item 15: No; Item 16: Yes; Item 17: No; Item 18: Yes; Item 19: Yes; Item 20: Yes; Item 21: Yes; Item 22: No; Item 23: No

Table 3
DIF Results from SIBTEST Analyses
(The β̂_UNI, standard error, and p-value columns did not survive transcription; the DIF flags, in item order, were:)

Item 1: Yes; Item 2: Yes; Item 3: No; Item 4: no usable score cells; Item 5: no usable score cells; Item 6: no usable score cells; Item 7: Yes; Item 8: No; Item 9: Yes; Item 10: No; Item 11: Yes; Item 12: No; Item 13: No; Item 14: Yes; Item 15: No; Item 16: Yes; Item 17: No; Item 18: Yes; Item 19: Yes; Item 20: Yes; Item 21: No; Item 22: No; Item 23: No
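The β̂_UNI and SIB statistics behind Table 3 can be sketched roughly as follows. This is a simplified illustration on simulated data (the variable names and data are ours, and it omits the regression correction Shealy and Stout apply to the matching-score means), not the SIBTEST program itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data (illustrative only): one studied item, a matching-subtest
# score in 0..22, and a reference-group (1) / focal-group (0) indicator.
n = 2000
group = rng.integers(0, 2, n)
match = rng.integers(0, 23, n)
p_correct = 0.15 + 0.03 * match + 0.10 * group   # built-in DIF favoring reference
item = (rng.random(n) < p_correct).astype(float)

num = den = var = 0.0
for k in range(23):                              # condition on matching score k
    ref = item[(group == 1) & (match == k)]
    foc = item[(group == 0) & (match == k)]
    if len(ref) < 2 or len(foc) < 2:
        continue                                 # skip unusable score cells
    w = len(foc)                                 # weight by focal-group frequency
    num += w * (ref.mean() - foc.mean())
    den += w
    var += w ** 2 * (ref.var(ddof=1) / len(ref) + foc.var(ddof=1) / len(foc))

beta_uni = num / den             # weighted expected score difference
se = np.sqrt(var) / den          # its estimated standard error
sib = beta_uni / se              # compare to a standard normal under no DIF
print(beta_uni, sib)
```

The `continue` branch is where entries like Table 3's "no usable score cells" arise: a score level with too few examinees in either group contributes nothing to the estimate.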

Table 4
DBF Results for the Four Major Content Areas
(Most of the β̂ and p-value entries did not survive transcription.)

1. Basic Fraction Concept: Items 4, 5, 7, 13, (no DBF)
2. Equal Sharing Concept: Items 1, 2, 3, 10 (positive β̂, p <.01; DBF favoring boys)
3. Units Concept: Items 6, 7, 9, 13, 14, 15, 16, 22, (β̂ = 0.003, no DBF)
4. Equivalent Fraction Concept: Items 3, 5, 6, 8, 9, 11, 12, 17, 18, 19, 20, 21 (negative β̂, p <.01; DBF favoring girls)

Table 5
DBF Results from SIBTEST Analyses for Cognitive Components
(The p-value column did not survive transcription; no bundle showed DBF.)

1. Using Illustrations: Items 1-6, 8-11, 13-15, 19, (No)
2. Written Explanations: Items 8, 13, 15, 18, (No)
3. Judgment Application: Items 12-14, 16, (No)
4. Computation: Items 4-6, 12, 13, 15, (No)
5. Checking Options: Items 1, 7, 12, (No)
6. Spatial Unfolding: Items 13, 16, (No)
7. Solving Routine Problems: Items 4, 5, (No)

Substantive Interpretation of Cognitive Components

A1 - Using Illustrations, found in items Q1, Q2, Q3, Q5, Q6, Q8, Q9, Q10, Q11, Q13, Q14, Q15, Q19, and Q21.

Fourteen items had illustrations in the stem or as part of the distracters. Regardless of item difficulty, the presence of an illustration changed the fraction item in a significant way. Theoretical support for the concept of illustration was subsequently found in the Manual of Attribute-Coding for General Mathematics in TIMSS Studies (Tatsuoka, Corter, & Gererro, 1995), attributes P7 and S3, where P7 is defined as "Be able to generate, visualize figures and graphs" and S3 as "Be able to work with figures, tables, charts and graphs." In general, an item became more difficult to solve given the presence of an illustration. This leads, by inductive reasoning, to the speculative conclusion that the human mind must marshal different resources in order to solve a fraction item when an illustration is included, regardless of how easy or difficult the item may be.

A2 - Providing an Interpretation when solving the fraction item: Q8, Q13, Q15, Q18, Q20, Q21, Q22, and Q23.

Eight items required more than simply solving the fraction item and/or choosing the right multiple-choice answer. In questions Q18-Q20, with the exception of Q19, the students were required to explain their multiple-choice answer in writing. In Q8 there was no correct answer among the distracters, so the student was required to write in the correct response. In Q13 and Q15 the students had to properly interpret the question before trying to solve it. In all cases, once again independent of item difficulty, the questions required interpretation or the answer would not be correct. Subsequent theoretical support was found in the same manual's attribute S10, defined as "Be able to work with open-ended questions." In general, an item became more difficult to solve given the requirement to interpret the fraction item while solving the question. This leads, by inductive reasoning, to the speculative conclusion that the human mind must marshal different resources in order to solve a fraction item and provide an interpretation at the same time, regardless of how easy or difficult the item may be.

A3 - Judgmental Application: Q12, Q13, Q14, and Q23

In these questions a judgment is required for the student to get the item correct. There is no option for the student to compute the item and then give a right answer. The student must analyze the problem, restructure it, and then make a judgment about the right answer. The requirement to make a judgment made the item significantly more difficult. Further, this cognitive skill tended to be found only in the more advanced questions.

A4 - Computation: Q4, Q5, Q6, Q12, Q13, Q15, and Q17 (P2)

In these questions the students were required to compute the fraction. Generally speaking, on this conceptual fraction test students had to possess strong reading and conceptual ability, and being strong computationally often would not help. In these questions, however, being able to compute the right answer was a help. This was one of only two cognitive components that actually made an item easier: when students encountered these questions, they generally found them to be easier.

A5 - Checking Options: Q1, Q7, Q12, & Q17

This component was found in only four of the items, but in order to solve those problems the students had to check the multiple-choice answers. There was no other way to solve the problem; they could not reason it out or abstractly come to the right conclusion. The students had to use the multiple-choice distracters or they could not answer the question. This requirement made the items more difficult.

A6 - Spatial Folding: Q13, Q16, & Q23

These three items required the student to be able to mentally unfold the item. In item 13 the student needed to fold and unfold the ribbon in their mind; in item 16 the student was required to remove the three purple buttons from the jar in their mind; and in item 23 the student had to consider the possibility of one quarter being bigger or smaller than one half. This requirement to reflect upon and spatially process the item in the mind made the question more difficult. In this case it made no difference whether the processing involved spatial displacement (mental rotation, reflection, and exchange) or spatial distortion (adding, removing, and/or shading).

A7 - Solving Routine Problems: Q4, Q5, & Q18

This twenty-three-item conceptual fraction test offered the students only two skills whose use made items easier; this is the second. These questions were routine in nature, and solving them followed known algorithms. When students encountered these questions, the items became easier.

Table 6
A Q-Matrix for 7 Cognitive Components (A1 = Using Illustrations, A2 = Written Interpretations, A3 = Judgmental Application, A4 = Computation, A5 = Checking Options, A6 = Spatial Unfolding, A7 = Solving Routine Problems) with Presence (1) or Absence (0) in 23 Questions

Item    A1   A2   A3   A4   A5   A6   A7
Q1       1    0    0    0    1    0    0
Q2       1    0    0    0    0    0    0
Q3       1    0    0    0    0    0    0
Q4       0    0    0    1    0    0    1
Q5       1    0    0    1    0    0    1
Q6       1    0    0    1    0    0    0
Q7       0    0    0    0    1    0    0
Q8       1    1    0    0    0    0    0
Q9       1    0    0    0    0    0    0
Q10      1    0    0    0    0    0    0
Q11      1    0    0    0    0    0    0
Q12      0    0    1    1    1    0    0
Q13      1    1    1    1    0    1    0
Q14      1    0    1    0    0    0    0
Q15      1    1    0    1    0    0    0
Q16      0    0    0    0    0    1    0
Q17      0    0    0    1    1    0    0
Q18      0    1    0    0    0    0    1
Q19      1    0    0    0    0    0    0
Q20      0    1    0    0    0    0    0
Q21      1    1    0    0    0    0    0
Q22      0    1    0    0    0    0    0
Q23      0    1    1    0    0    1    0
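The item lists given for each cognitive component determine the presence/absence entries of the Q-matrix. The following Python sketch is illustrative only (it is not code from the study): it assembles the 23-item-by-7-component binary matrix from those lists and recovers the per-component item counts stated in the text (e.g., fourteen illustration items, eight interpretation items).

```python
# Illustrative sketch: build the binary Q-matrix from the component item lists.
# components maps each cognitive component to the item numbers that require it,
# exactly as listed in the text above.
components = {
    "A1 Using Illustrations":      [1, 2, 3, 5, 6, 8, 9, 10, 11, 13, 14, 15, 19, 21],
    "A2 Written Interpretations":  [8, 13, 15, 18, 20, 21, 22, 23],
    "A3 Judgmental Application":   [12, 13, 14, 23],
    "A4 Computation":              [4, 5, 6, 12, 13, 15, 17],
    "A5 Checking Options":         [1, 7, 12, 17],
    "A6 Spatial Unfolding":        [13, 16, 23],
    "A7 Solving Routine Problems": [4, 5, 18],
}

N_ITEMS = 23

# Q[i][k] = 1 if item i+1 requires component k+1, else 0.
Q = [[0] * len(components) for _ in range(N_ITEMS)]
for k, items in enumerate(components.values()):
    for i in items:
        Q[i - 1][k] = 1

# Column sums give the number of items measuring each component.
col_sums = [sum(row[k] for row in Q) for k in range(len(components))]
print(col_sums)  # [14, 8, 4, 7, 4, 3, 3]
```

A row of Q then reads off an item's full attribute profile; for example, item 13 loads on five of the seven components (A1, A2, A3, A4, and A6), consistent with its appearance in five of the lists above.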


More information

Examining the Psychometric Properties of The McQuaig Occupational Test

Examining the Psychometric Properties of The McQuaig Occupational Test Examining the Psychometric Properties of The McQuaig Occupational Test Prepared for: The McQuaig Institute of Executive Development Ltd., Toronto, Canada Prepared by: Henryk Krajewski, Ph.D., Senior Consultant,

More information

2013 Supervisor Survey Reliability Analysis

2013 Supervisor Survey Reliability Analysis 2013 Supervisor Survey Reliability Analysis In preparation for the submission of the Reliability Analysis for the 2013 Supervisor Survey, we wanted to revisit the purpose of this analysis. This analysis

More information

Rasch Versus Birnbaum: New Arguments in an Old Debate

Rasch Versus Birnbaum: New Arguments in an Old Debate White Paper Rasch Versus Birnbaum: by John Richard Bergan, Ph.D. ATI TM 6700 E. Speedway Boulevard Tucson, Arizona 85710 Phone: 520.323.9033 Fax: 520.323.9139 Copyright 2013. All rights reserved. Galileo

More information

Doctoral Dissertation Boot Camp Quantitative Methods Kamiar Kouzekanani, PhD January 27, The Scientific Method of Problem Solving

Doctoral Dissertation Boot Camp Quantitative Methods Kamiar Kouzekanani, PhD January 27, The Scientific Method of Problem Solving Doctoral Dissertation Boot Camp Quantitative Methods Kamiar Kouzekanani, PhD January 27, 2018 The Scientific Method of Problem Solving The conceptual phase Reviewing the literature, stating the problem,

More information

Item Analysis Explanation

Item Analysis Explanation Item Analysis Explanation The item difficulty is the percentage of candidates who answered the question correctly. The recommended range for item difficulty set forth by CASTLE Worldwide, Inc., is between

More information

alternate-form reliability The degree to which two or more versions of the same test correlate with one another. In clinical studies in which a given function is going to be tested more than once over

More information

REPORT. Technical Report: Item Characteristics. Jessica Masters

REPORT. Technical Report: Item Characteristics. Jessica Masters August 2010 REPORT Diagnostic Geometry Assessment Project Technical Report: Item Characteristics Jessica Masters Technology and Assessment Study Collaborative Lynch School of Education Boston College Chestnut

More information

Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14

Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14 Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14 Still important ideas Contrast the measurement of observable actions (and/or characteristics)

More information

Introduction to statistics Dr Alvin Vista, ACER Bangkok, 14-18, Sept. 2015

Introduction to statistics Dr Alvin Vista, ACER Bangkok, 14-18, Sept. 2015 Analysing and Understanding Learning Assessment for Evidence-based Policy Making Introduction to statistics Dr Alvin Vista, ACER Bangkok, 14-18, Sept. 2015 Australian Council for Educational Research Structure

More information

Carrying out an Empirical Project

Carrying out an Empirical Project Carrying out an Empirical Project Empirical Analysis & Style Hint Special program: Pre-training 1 Carrying out an Empirical Project 1. Posing a Question 2. Literature Review 3. Data Collection 4. Econometric

More information

Making a psychometric. Dr Benjamin Cowan- Lecture 9

Making a psychometric. Dr Benjamin Cowan- Lecture 9 Making a psychometric Dr Benjamin Cowan- Lecture 9 What this lecture will cover What is a questionnaire? Development of questionnaires Item development Scale options Scale reliability & validity Factor

More information

Still important ideas

Still important ideas Readings: OpenStax - Chapters 1 11 + 13 & Appendix D & E (online) Plous - Chapters 2, 3, and 4 Chapter 2: Cognitive Dissonance, Chapter 3: Memory and Hindsight Bias, Chapter 4: Context Dependence Still

More information

A critical look at the use of SEM in international business research

A critical look at the use of SEM in international business research sdss A critical look at the use of SEM in international business research Nicole F. Richter University of Southern Denmark Rudolf R. Sinkovics The University of Manchester Christian M. Ringle Hamburg University

More information

Models in Educational Measurement

Models in Educational Measurement Models in Educational Measurement Jan-Eric Gustafsson Department of Education and Special Education University of Gothenburg Background Measurement in education and psychology has increasingly come to

More information

Proceedings of the 2011 International Conference on Teaching, Learning and Change (c) International Association for Teaching and Learning (IATEL)

Proceedings of the 2011 International Conference on Teaching, Learning and Change (c) International Association for Teaching and Learning (IATEL) EVALUATION OF MATHEMATICS ACHIEVEMENT TEST: A COMPARISON BETWEEN CLASSICAL TEST THEORY (CTT)AND ITEM RESPONSE THEORY (IRT) Eluwa, O. Idowu 1, Akubuike N. Eluwa 2 and Bekom K. Abang 3 1& 3 Dept of Educational

More information

Optimization and Experimentation. The rest of the story

Optimization and Experimentation. The rest of the story Quality Digest Daily, May 2, 2016 Manuscript 294 Optimization and Experimentation The rest of the story Experimental designs that result in orthogonal data structures allow us to get the most out of both

More information

linking in educational measurement: Taking differential motivation into account 1

linking in educational measurement: Taking differential motivation into account 1 Selecting a data collection design for linking in educational measurement: Taking differential motivation into account 1 Abstract In educational measurement, multiple test forms are often constructed to

More information

Detection of Differential Test Functioning (DTF) and Differential Item Functioning (DIF) in MCCQE Part II Using Logistic Models

Detection of Differential Test Functioning (DTF) and Differential Item Functioning (DIF) in MCCQE Part II Using Logistic Models Detection of Differential Test Functioning (DTF) and Differential Item Functioning (DIF) in MCCQE Part II Using Logistic Models Jin Gong University of Iowa June, 2012 1 Background The Medical Council of

More information