Parallel Forms for Diagnostic Purpose


Fang Chen and Xinrui Wang
UNCG, USA

Paper presented at AERA, May 2010

INTRODUCTION

With the advancement of validity discussions, the measurement field is pushing research from the initial stage of test development all the way to the final stage of score interpretation and use. Strictly speaking, a test designed for a specific purpose should not be used for other, unintended purposes. In practice, however, a test is expected to serve so many goals for so many different audiences that compromises are frequent. One strong argument in favor of using one test for more than one purpose is to minimize interference with, and time taken from, classroom teaching. This is especially true for achievement tests in many parts of the world. With this in mind, how to maximize the information obtained from a single test is a topic of interest for many measurement researchers and practitioners. One approach is to analyze the data from different perspectives for different purposes: for example, using Item Response Theory (IRT) to find the best cut score for selection, and using Cognitive Diagnostic Models (CDM) to produce profiles of skill mastery for diagnostic, placement, or program evaluation purposes.

Another issue of interest to the measurement field is parallel test forms. Generating parallel test forms is important when measuring achievement: it helps with test security, and it enables multiple testing windows so that every test taker has a fair chance at his or her best performance. It matters even more when the same test result gates access to limited further educational opportunities. Generating parallel forms is also a challenging task, however, because tests have to satisfy content and measurement specifications at the same time (Gibson & Weiner, 1998). Under classical test theory (CTT), item difficulty and item discrimination are used to judge whether test forms are parallel. With modern test theories such as item response theory (IRT), one selects items from pre-calibrated banks according to the test information function under constraints; tests then need not be parallel in item or test difficulty, although comparable content coverage is still regarded as desirable.

While IRT has become the norm for modern testing, it cannot provide the more refined information that would benefit teachers and students for diagnosis and instruction. CDMs were developed for this purpose. A CDM can produce a detailed analysis of a person's ability profile and thus help maximize the information extracted from a test beyond what IRT provides. For this emerging family of models, however, the question of whether test forms are parallel has received little coverage in the literature. It is therefore interesting to explore procedures for evaluating the parallelism of test forms from a cognitive diagnostic perspective. A detailed introduction to CDM is beyond the scope of this paper; interested readers can refer to Leighton and Gierl (2007) and Rupp, Templin and Henson (2010). Chinese readers can also refer to a non-technical introduction by Chen (2011).

CDMs involve latent classes: we regard the correct and incorrect answers to an item as a manifestation of a group of latent attributes working together on that item. Different attribute patterns can lead to different probabilities of correct or incorrect responses, and these patterns are called latent classes.
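To make this concrete: with K binary attributes there are 2^K candidate attribute patterns. A minimal sketch (our illustration, not part of the original analysis) enumerates them for three attributes:

    import itertools

    K = 3  # e.g., Knowing, Applying, Reasoning
    # Each latent class corresponds to one pattern of mastered (1) vs.
    # not-mastered (0) attributes.
    latent_classes = list(itertools.product((0, 1), repeat=K))
    print(len(latent_classes))  # 8 patterns, from (0, 0, 0) to (1, 1, 1)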
Within latent classes is the idea of conditional independence: given an examinee's latent class, the responses to different items are independent of one another (Rupp et al., 2010). CDM comprises a whole family of sub-models, each with its own constraints. One model in particular, the noncompensatory reduced reparameterized unified model (NCRUM), assumes that the probability of a correct response decreases as the number of mastered attributes required by the item decreases (Rupp et al., 2010). Put simply, the NCRUM requires that a person master simpler attributes before more complex ones. A model such as this may be especially useful in achievement testing, where there are several content areas that range from rudimentary to complex, i.e., where hierarchical learning is expected (Bloom, 1956).

Benjamin S. Bloom developed his taxonomy of learning resulting from instruction, known as the Taxonomy of Educational Objectives, as an easier approach to developing examinations (Krathwohl, 2002). The taxonomy is used as a tool to measure learning in six main cognitive domains: knowledge, comprehension, application, analysis, synthesis, and evaluation (Krathwohl, 2002). Furthermore, the taxonomy is arranged as a hierarchy: in order to progress to the next level of thinking, one must possess the skills of the levels that precede it (Bloom, 1956). For example, one cannot progress to comprehension without having acquired the skills of knowledge. Although Bloom's taxonomy has been extended and developed into numerous new frameworks, the central concept remains: the cognitive skills are hierarchical. Is this assumption supported by real achievement test data? If so, is the cognitive diagnostic information for test takers based on traditionally defined parallel forms also consistent from a CDM perspective? How should parallelism be evaluated when the test purpose is diagnostic? We decided to explore the parallelism of test forms in terms of cognitive assessment and diagnosis, which is closely related to the proposal of maximizing information from achievement tests through CDMs. We demonstrate the considerations needed to evaluate parallelism for the purpose of diagnosis and explore indices that can help with the judgment.

METHOD

Data

We used data from the 2007 Trends in International Mathematics and Science Study (TIMSS). TIMSS is an international program designed to improve students' mathematical and science skills (Olson, Martin, & Mullis, 2009). It measures trends in math and science every four years at the fourth and eighth grade levels in fifty-nine countries (Olson et al., 2009). Sample selection in TIMSS 2007 follows a systematic, two-stage probability proportional-to-size (PPS) design, in which schools are selected first and then classes within sampled (and participating) schools. This sampling method is a natural match with the hierarchical structure of the population, where classes of students are nested within schools. Schools are sampled to mirror the variety of school types, and classes within schools are sampled to mirror the diversity among classes.

For our study, we used the United States student sample, a total of 545 students, of whom 50.6% were girls and 49.4% were boys. For this exploration we focused on the mathematics test, where the cognitive hierarchy is clear to define. For the mathematics test, students are given a booklet of questions to measure achievement (Olson et al., 2009). The booklets are divided into two blocks (Block 1 and Block 2). While each student takes two different blocks, each block is also shared across two groups of test takers for linking purposes. The test questions are divided into three cognitive domains: knowing, applying, and reasoning, similar to Bloom's taxonomy. We can therefore examine whether the two blocks of the TIMSS mathematics achievement test are parallel from a diagnostic perspective, i.e., whether they yield similar estimates of students' ability profiles according to the test blueprint. The NCRUM was chosen based on our theory that the three cognitive domains are hierarchical in nature.
Put simply, if reasoning is the skill required to respond to a question correctly, knowing the concept and being able to apply the knowledge are not enough to ensure a correct answer. The chosen mathematics test is divided into two blocks, with 13 and 16 questions respectively. The design of the blocks makes it clear that the blocks are assumed to be parallel in terms of structure, content, and quality; that is, they can be exchanged with each other and provide reliable score interpretation for any group of test takers. The cognitive skills are the focus of this paper, and the coverage of the skills measured by the blocks was defined by the test development team and is summarized in Table 1.

Table 1. Distribution of Cognitive Skills

               Block 1    Block 2
    Knowing       3          6
    Applying      9          6
    Reasoning     1          4
    Total        13         16

There are three relevant research questions:

1. How well can the TIMSS items discriminate between students with high and low cognitive abilities?
2. Can the two test forms (blocks) give consistent and reliable classification of students in terms of cognitive abilities?
3. Do student responses reflect a hierarchy of the three cognitive skills? In other words, is the assumed hierarchy of Bloom's taxonomy supported by the data?

Model

A cognitive diagnostic model (DCM/CDM), the NCRUM, was chosen for several reasons. First, we wanted more detailed information than a total score. A regular unidimensional item response model is enough to provide overall item quality and person ability estimates, but it does not differentiate between test takers on each cognitive skill required by each item; a DCM provides this type of information. When combined with analyses of math attributes such as Algebra and Number, as classified by TIMSS, users will be able to explain differences in student responses in terms of both math ability and cognitive ability. Second, as a subtype of the DCM family, the NCRUM assumes that the attributes measured by an item cannot compensate for the lack of other attributes the item requires (Rupp et al., 2010). As previously mentioned, this is consistent with Bloom's taxonomy and justifies our choice of this particular model within the DCM family.

A Q-matrix relevant to our research purpose was retrofitted to the data for analysis. To create the Q-matrix, we re-specified the cognitive skills to match Bloom's taxonomy. Thus, an item meant to measure Knowing was coded (1,0,0), meaning a correct response requires only level 1 of the cognitive hierarchy; an item measuring Applying was coded (1,1,0), requiring level 2; and an item measuring Reasoning was coded (1,1,1). We used the program RUM, written by Dr. Robert Henson, for this analysis. It was an easy tool to implement, and it provided the attribute-level item parameters π* and r* (Rupp et al., 2010). These parameters were then used to calculate attribute-level discrimination parameters for item evaluation.
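As a rough illustration of the Q-matrix coding and the noncompensatory response function described above, here is a minimal Python sketch. It follows our reading of the reduced RUM in Rupp et al. (2010); the actual analysis used Dr. Henson's RUM program, and the parameter values below are hypothetical.

    import numpy as np

    # Bloom-style Q-matrix coding described above:
    # Knowing -> (1,0,0), Applying -> (1,1,0), Reasoning -> (1,1,1).
    CODING = {"Knowing": (1, 0, 0), "Applying": (1, 1, 0), "Reasoning": (1, 1, 1)}

    def build_q_matrix(domains):
        """One Q-matrix row per item, from its cognitive domain label."""
        return np.array([CODING[d] for d in domains])

    def ncrum_p_correct(alpha, q_row, pi_star, r_star):
        """Reduced RUM / NCRUM: P(X=1 | alpha) = pi* * prod_a r*_a^(q_a(1-alpha_a)).
        Every required-but-unmastered attribute multiplies in a penalty
        r*_a < 1, so mastered attributes cannot compensate for missing ones."""
        return pi_star * np.prod(r_star ** (q_row * (1 - alpha)))

    # Block 1 under Table 1: 3 Knowing, 9 Applying, 1 Reasoning item.
    q_block1 = build_q_matrix(["Knowing"] * 3 + ["Applying"] * 9 + ["Reasoning"])

    # Hypothetical parameters for one Applying item:
    q = np.array(CODING["Applying"])
    r = np.array([0.4, 0.5, 1.0])
    print(ncrum_p_correct(np.array([1, 1, 0]), q, 0.85, r))  # full master: 0.85
    print(ncrum_p_correct(np.array([1, 0, 0]), q, 0.85, r))  # lacks Applying: 0.425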

Analysis procedures

We calculated discrimination parameters using Equation 1, following the notation in Rupp et al. (2010):

    d_i = π*_i − π*_i ∏_{a=1}^{A} (r*_{ia})^{q_{ia}}        (Equation 1)

that is, the gap between the correct-response probability of a master of all attributes an item requires and that of a master of none. High π* and low r* values indicate good items. We also used the traditional difficulty index, the p-value of classical test theory, and compared the results between the two approaches. Next, we classified the students into categories. Although there were eight possible attribute categories, Bloom's taxonomy allows only four of them because of its hierarchical nature; we nevertheless summarized both sets of categories in our study to explore question 3. Finally, we compared each student's attribute profile across the two blocks, and the percentage of profile change was probed and analyzed. If the blocks were parallel and the items good, we expected a small percentage change for each attribute, and vice versa.
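A small sketch of how Equation 1 and its attribute-level analogue can be computed from the RUM output; this is our code under the reconstruction of Equation 1 given above, with hypothetical parameter values.

    import numpy as np

    def item_discrimination(pi_star, r_star, q_row):
        """Equation 1 (as reconstructed above): the gap between the success
        probability of a master of all required attributes (pi*) and that
        of a master of none (pi* times the product of the r* penalties)."""
        return pi_star - pi_star * np.prod(r_star ** q_row)

    def attribute_discrimination(pi_star, r_star, q_row):
        """Attribute-level analogue: for each required attribute (q_a = 1),
        high pi* and low r*_a yield a large discrimination value."""
        return q_row * pi_star * (1 - r_star)

    q = np.array([1, 1, 0])        # an Applying item
    r = np.array([0.4, 0.5, 1.0])  # hypothetical r* values
    print(item_discrimination(0.85, r, q))       # 0.85 * (1 - 0.2) = 0.68
    print(attribute_discrimination(0.85, r, q))  # [0.51  0.425  0.]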

RESULTS

Item analyses

Item discrimination analyses based on the NCRUM and on classical test theory are shown in Tables 2 and 3.

Table 2. Attribute-Level Item Discrimination for Block 1 (NCRUM; columns: Item, Knowing, Applying, Reasoning, Mean)

Table 3. Attribute-Level Item Discrimination for Block 2 (NCRUM; columns: Item, Knowing, Applying, Reasoning, Mean)

There is little literature to guide the evaluation of item quality under the NCRUM, so we adopted .20 as a reasonable cut point: if the proportion of masters answering an item correctly exceeds the proportion of non-masters doing so by at least .20, the item discriminates well between masters and non-masters. By this rule, the two blocks discriminated masters from non-masters moderately well: of the twenty-four attribute measures that could be evaluated for Block 1, eighteen were discriminating, and of the thirty possible attribute measures for Block 2, twenty-four were discriminating.

Classical test theory (CTT) was also used to examine the quality of Blocks 1 and 2; the reliability statistics appear in Table 4. According to the item statistics, item 22 (p-value = 0.14) and item 12 (p-value = 0.17) were the hardest items on the test, and item 27 (p-value = 0.84) was the easiest. The least discriminating of the 29 items were item 17 (r_pb = 0.162) and item 6 (r_pb = 0.164).

Table 4. Classical Test Theory: Reliability Statistics (N = 545)

    Block      Cronbach's Alpha    # of Items
    Block 1                            13
    Block 2                            16
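The CTT statistics reported here, p-values and corrected item-total (point-biserial) correlations, are standard; a minimal sketch, assuming a persons-by-items 0/1 response matrix X:

    import numpy as np

    def ctt_item_stats(X):
        """p-value (proportion correct) and corrected item-total
        (point-biserial) correlation for each column of a 0/1 matrix."""
        p_values = X.mean(axis=0)
        total = X.sum(axis=1)
        r_pb = np.array([np.corrcoef(X[:, j], total - X[:, j])[0, 1]
                         for j in range(X.shape[1])])  # item j excluded from total
        return p_values, r_pb

    # e.g., p, r = ctt_item_stats(booklet1_responses)  # hypothetical 545 x 29 array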

Block 1 consisted of 13 items (Table 4). Its easiest item was item 1 (p-value = 0.81) and its hardest was item 12 (p-value = 0.17); item 6 (r_pb = 0.159) and item 1 (r_pb = 0.195) had the lowest discrimination values. Block 2 had a total of 16 items. Item 27, with a p-value of 0.84, was the easiest item in Block 2; item 22, with a p-value of 0.14, appeared to be the hardest; and item 17 (r_pb = 0.158) did not discriminate well among examinees. These results are also listed in Table 5.

Table 5. Classical Test Theory: Item Statistics (N = 545; for each item in Block 1 and Block 2: item mean and corrected item-total correlation)

Block 1 and Block 2 were also broken down by Bloom's taxonomy to examine the item analyses within each subcategory (Table 6): Knowing, Applying, and Reasoning. In Block 1, Knowing had a total of 3 items; Applying included 9 items with a reliability of 0.71; and item analyses could not be conducted for the Reasoning category because that section contains only one item.

In Block 2, Knowing included 6 items (α = 0.51) and Applying included 6 items (α = 0.49); the Reasoning section included 4 items (Table 6).

Table 6. Classical Test Theory: Reliability Statistics by Cognitive Domain (N = 545)

    Block                  Cronbach's Alpha    # of Items
    Block 1    Knowing                              3
               Applying           0.71              9
               Reasoning          n/a               1
    Block 2    Knowing            0.51              6
               Applying           0.49              6
               Reasoning                            4

When a higher-level cognitive skill is required for an item, the item usually discriminates the lower-level skills better than the higher ones. This is in line with Bloom's taxonomy: more guessing may be involved on an item requiring a higher skill, making responses to it more variable and its discrimination less clear-cut. In addition, more items, or more highly discriminating items, would improve the quality of both Block 1 and Block 2.

When the items were analyzed for the cognitive attributes specifically, we found that the item that discriminated Knowing best was item 9 in Block 2; it measured Knowing, Applying, and Reasoning, and only 14% of students answered it correctly. Conversely, item 6 in Block 1 discriminated Knowing least; it measured Knowing and Applying, and 73% of students answered it correctly. The item that discriminated Applying best was item 13 in Block 1 (Knowing and Applying; 26% correct), and the item that discriminated Applying least was item 2 in Block 1 (Knowing and Applying; 76% correct). Interestingly, item 2 is a word problem rather than a multiple-choice question, which suggests that the attributes the question was intended to measure may not have been clearly defined (for the questions in both blocks, see Appendix A). The item that discriminated Reasoning best was item 8 in Block 2 (all three attributes; 32% correct), while the least discriminating item for Reasoning was item 10 in Block 2 (37% correct). For the Knowing and Applying attributes, easy items showed low discrimination, which may suggest a ceiling effect: most students knew the correct answer, so these items lost their ability to discriminate. For the Reasoning attribute, no such pattern was found.
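The subscale reliabilities above are ordinary Cronbach's alphas computed within each cognitive category; a minimal sketch of that computation (our code, with hypothetical variable names):

    import numpy as np

    def cronbach_alpha(X):
        """Cronbach's alpha for a persons x items 0/1 matrix of one
        (sub)scale; undefined for a single item, as with Block 1 Reasoning."""
        k = X.shape[1]
        item_var = X.var(axis=0, ddof=1).sum()
        total_var = X.sum(axis=1).var(ddof=1)
        return k / (k - 1) * (1 - item_var / total_var)

    # e.g., cronbach_alpha(block2_responses[:, knowing_cols])  # hypothetical indices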

Attribute Profiles

A probability of mastery higher than 0.60 suggests that an examinee has mastered an attribute, while a probability lower than 0.40 suggests that he or she has not (Rupp et al., 2010). We deleted the cases with any mastery probability between 0.40 and 0.60, for whom more information would have been needed for an accurate classification; 402 of the 545 individuals provided valid data. The possible latent class profiles for three attributes are shown in Table 7.

Table 7. Possible Student Profile Classifications

    Latent Class    Attribute Profile
    1               (0,0,0)
    2               (1,0,0)
    3               (1,1,0)
    4               (1,1,1)
    5               (0,1,0)
    6               (0,0,1)
    7               (0,1,1)
    8               (1,0,1)

Only the first four classes are reasonable under the hierarchy described in Bloom's taxonomy. In our results, each block generated five different attribute profiles; the profile for each latent class and its estimated probability are shown in Table 8.

Table 8. The Probability of Each Attribute Profile

                   Block 1                       Block 2
    Latent class   Attribute profile     p      Attribute profile     p
    1              α11 = (0,0,0)       .493     α21 = (0,0,0)
    2              α12 = (0,1,0)       .032     α22 = (1,0,0)
    3              α13 = (1,0,0)       .313     α23 = (1,0,1)
    4              α14 = (1,1,0)       .159     α24 = (1,1,0)
    5              α15 = (1,1,1)       .002     α25 = (1,1,1)       .170

According to the table, Bloom's taxonomy is generally supported. Two attribute profiles that appeared, (0,1,0) and (1,0,1), are unreasonable under Bloom's taxonomy, and their low probabilities in the data support the hypothesized hierarchy. Excluding them leaves exactly four latent classes, (0,0,0), (1,0,0), (1,1,0), and (1,1,1), which are precisely the classes Bloom's taxonomy predicts. Among these four classes, the percentages of students mastering each attribute were also reasonable: the probability decreases as the number of mastered attributes increases, again suggesting a hierarchy among the three attributes. However, the probability of latent class 5 in Block 1 was extremely low, which may reflect the fact that Block 1 contains only one item measuring Reasoning, so the results for that class may not be accurate.
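The .60/.40 classification rule can be expressed directly; a sketch (our code), with -1 marking the indeterminate cases we dropped:

    import numpy as np

    def classify_mastery(post, hi=0.60, lo=0.40):
        """post: persons x attributes posterior mastery probabilities.
        Returns 1 (master), 0 (non-master), or -1 (between lo and hi:
        not enough information for an accurate classification)."""
        out = np.full(post.shape, -1, dtype=int)
        out[post > hi] = 1
        out[post < lo] = 0
        return out

    # Keep only examinees classified on every attribute (402 of 545 here):
    # mask = (classify_mastery(post_probs) != -1).all(axis=1)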

Block Comparison

Our hypothesis was that Block 1 and Block 2, being parallel, should generate the same mastery profile for each person; if the mastery profiles generated by the two blocks differ, the blocks are not parallel in practice. Comparing each examinee's attribute mastery profiles across Block 1 and Block 2 revealed discrepancies on all three attributes: the percentage of students switching between master and non-master status was 29.7% for Knowing, 17.4% for Applying, and 12.2% for Reasoning. This means our blocks could not categorize the students reliably. Though the discrepancy appears to decrease for the higher cognitive domains, note that the probability of answering the Applying and Reasoning items correctly also decreases. Evidently, these two blocks were not parallel tests for classification and diagnostic purposes. A second reason for the discrepancy between the diagnostic results may be the unbalanced item distribution in Block 1, which contains only one item measuring Reasoning; the judgment of students' reasoning ability obtained from Block 1 was therefore not reliable.

Relating CTT and CDM Indices

Generally speaking, both blocks discriminated among students well. For Block 1, the discrimination index ranged over (0.01, 0.75) based on the CDM and (0.15, 0.53) based on CTT; for Block 2, over (0.02, 0.85) based on the CDM and (0.15, 0.52) based on CTT. All the item discrimination indices were positive, which shows that the items were reasonable. Whereas the CTT index measures discrimination at the item level, the CDM index measures it at the attribute level. When an item measured only one attribute, the CDM and CTT discrimination indices were consistent. When all the attributes of an item had small discrimination indices, the item had a small CTT discrimination index as well (e.g., item 4 in Block 2). When the attribute discrimination indices varied across the attributes of an item, however, the CTT item discrimination index appeared to be a balance of all the attribute discrimination indices.
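The switching percentages reported in the block comparison above amount to a per-attribute disagreement rate between the two blocks' mastery classifications; a minimal sketch, assuming both mastery matrices are restricted to the determinately classified examinees:

    import numpy as np

    def switch_rate(mastery_1, mastery_2):
        """Percentage of examinees whose master/non-master status differs
        between two forms, per attribute (both arrays persons x attributes)."""
        return 100 * (mastery_1 != mastery_2).mean(axis=0)

    # In our data: roughly [29.7, 17.4, 12.2] for Knowing, Applying, Reasoning.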
SUMMARY

This paper evaluated the efficacy of Booklet 1 of TIMSS 2007 in measuring the cognitive ability of eighth-grade students. We used methods based on both cognitive diagnostic modeling (CDM) and classical test theory (CTT) to examine how well the items discriminated among students of different ability levels, and we demonstrated how various indices can be used to evaluate parallelism from a CDM perspective and compared them with CTT indices. More empirical research should be done to explore the relationship between item difficulty in CTT and attribute discrimination in CDM.

We also compared the two blocks in Booklet 1 with regard to their ability to classify students accurately. The reliability evaluation based on CTT showed that both blocks had good internal consistency. However, after using the CDM to categorize students into latent classes, we did not obtain evidence that the two blocks give consistent and reliable classifications: the percentage of students switching between master and non-master status was 29.7% for Knowing, 17.4% for Applying, and 12.2% for Reasoning. This raises concerns about the validity of test score interpretation whenever a diagnostic feature is built in at the design stage and is expected to be shared with score users. Fortunately, TIMSS was not designed for diagnostic purposes. If a test is designed for a diagnostic purpose, concepts such as parallel forms and reliability will have to differ from those of non-diagnostic tests. This paper explores this issue and casts these traditional concepts in a new light from a CDM perspective. That is relevant not only to the validity of test score interpretation but also to the initial stage of test development, where the test blueprint will have to consider new dimensions to ensure test quality. Of course, many other concepts tied to the current trend toward computer-based testing, such as test-assembly engineering, will also change. This is a worthwhile field for further exploration in diagnostic assessment.

Finally, the CDM was used to examine the hierarchy among the three cognitive domains. Our results support the hypothesis that student responses reflect a hierarchy of the three cognitive skills on the TIMSS math measure: if a higher-level cognitive skill was required by an item, all the lower-level skills had to be present to produce a correct response. The three attributes fall in the same order as defined in Bloom's taxonomy, with Knowing the lowest skill and Reasoning (Evaluation) the highest.

References

Bloom, B. S. (1956). Taxonomy of educational objectives, handbook 1: The cognitive domain. New York: David McKay.

Chen, F. (2011). Diagnostic classification models: A new tool for the testing field [诊断分类模型: 测试领域的新工具]. Foreign Language Learning Theory and Practice (外语教学理论与实践), No. 2.

Gibson, W. M., & Weiner, J. A. (1998). Generating parallel test forms using CTT in a computer-based environment. Journal of Educational Measurement, 35.

Hambleton, R. K., & Swaminathan, H. (1990). Item response theory: Principles and applications. Norwell, MA: Kluwer Academic Publishers.

Krathwohl, D. R. (2002). A revision of Bloom's taxonomy: An overview. Theory Into Practice, 41, 212-218.

Leighton, J. P., & Gierl, M. J. (Eds.). (2007). Cognitive diagnostic assessment for education: Theory and applications. New York, NY: Cambridge University Press.

Olson, J. F., Martin, M. O., & Mullis, I. V. S. (2009). TIMSS 2007 technical report. TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.

Olson, J. F., & Foy, P. (2009). TIMSS 2007 user guide for the international database. TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.

Rupp, A. A., Templin, J., & Henson, R. (2010). Diagnostic measurement: Theory, methods, and applications. New York: Guilford Press.
