MULTIPLE-CHOICE ITEMS ANALYSIS USING CLASSICAL TEST THEORY AND RASCH MEASUREMENT MODEL


Man In India, 96 (1-2). Serials Publications.

Adibah Binti Abd Latif*, Ibnatul Jalilah Yusof, Nor Fadila Mohd Amin, Wilfredo Herrera Libunao and Siti Sarah Yusri
Faculty of Education, Universiti Teknologi Malaysia, Malaysia
* p-adibah@utm.my

The purpose of this study is to analyze item difficulty and person ability using two measurement frameworks, Classical Test Theory (CTT) and the Rasch Measurement Model (RMM). A total of 100 undergraduate students from the Faculty of Education responded to a final examination paper in Research Methodology consisting of 60 multiple-choice questions (MCQ). The Cronbach's alpha (CTT) obtained is 0.62, the Person Reliability (RMM) is 0.59, and the Item Reliability is 0.95. This study found a slight difference between the item difficulty levels and person abilities obtained from CTT and RMM. However, there is no significant difference (p > .05) between the item difficulty index (CTT) and the item measure (RMM), and likewise no significant difference (p > .05) between the person ability estimates from CTT and RMM. Although RMM is theoretically considered the superior measurement framework over CTT, this study found that item and person statistics appear similar across the two frameworks. Interpretations beyond the underlying philosophies are therefore discussed.

Keywords: Item analysis, Classical Test Theory, Rasch Measurement Model

Introduction

Musial et al. (2009) defined assessment as the art of placing learners in a setting that clarifies what learners experience and can do, as well as what they may not recognize or cannot perform. Assessment provides a picture of a student's advancement and achievements. The data obtained from an assessment are used as part of high-stakes decision making: placement decisions such as choosing a program of study, promotion decisions such as tracking learning progress, and determining whether students obtain certificates or other qualifications that empower them to achieve their objectives (Riley & Cantu, 2000; Braun et al., 2006).

The Malaysian educational system is now presented with the challenge of developing appropriate and meaningful ways to evaluate the extent to which students are meeting the standards. Tests and examinations can accurately or inaccurately reflect the current level of students' learning. A test can, however, be studied from different angles, and its items can be evaluated according to different theories or models that provide a better perspective on the relationship that may exist between the observed score on an examination and the underlying, generally unobserved, capability in the domain (Champlain, 2010). Two main test theory models that have been proposed for creating and evaluating test items are Classical Test Theory (CTT) and Item Response Theory (IRT).

These two theories are currently popular measurement frameworks for addressing measurement problems such as test-score equating, test development, and the identification of biased items (Hambleton & Jones, 1993; Lawson, 2006). To date, many educators in Malaysia still use the CTT approach in analyzing test items. Theoretically, CTT is simple and easy to apply: its straightforward and weak theoretical assumptions, easily met by test data, make it extensively used in item analysis (Hambleton & Jones, 1993; Champlain, 2010). However, many researchers have begun questioning its utility in the modern era (Amir et al., 2008). CTT has the limitation of circular dependency in estimating the test item parameters, namely item difficulty and item discrimination (Fan, 1998; Adedoyin & Adedoyin, 2013; Lawson, 2006; Stage, 2003). Circular dependency means, for example, that an easy test can overestimate the ability estimates of students while a difficult test can do the reverse, underestimating the abilities of examinees (Fan, 1998; Amir et al., 2008). An individual will appear to be of low ability when the test is difficult, yet appear to be of high ability when the test is easy; it is thus difficult to compare the relative abilities of students taking two different tests (McAlpine, 2002). CTT also considers students who gain the same total marks to have the same ability, regardless of whether they answered easy or difficult items. This affects the interpretation of students' grading, ranking, and reporting. In contrast to CTT, IRT generates a rank ordering of students on the underlying trait rather than on the test scores; students should be placed in the correct rank order regardless of which items they chose to answer (McAlpine, 2002). Consequently, IRT has witnessed exponential growth in recent decades as it is used to overcome the limitations of CTT (Neşe et al., 2013). Thus, this paper compares item analysis under both approaches: CTT and IRT's Rasch Measurement Model (RMM). There are four objectives in this study:

(i) To investigate the level of item difficulty using the CTT and RMM approaches.
(ii) To analyze the statistical significance of the difference between item difficulties obtained from CTT and RMM.
(iii) To investigate the level of students' ability using the CTT and RMM approaches.
(iv) To analyze the significance of the difference between students' abilities obtained from CTT and RMM.

Classical Test Theory

CTT introduces three concepts: test score, true score, and error score (Hambleton & Jones, 1993; Kline, 2005). The test score is often identified as the observed score, while the true score and error score are unobserved, or latent. An individual's test score (X) is proposed to consist of a true score (T) and an error score (E), as depicted in the equation below:

X = T + E

From this formula, it can be concluded that an individual's test score is influenced by the true score and the error score. The true score is the expected score obtained by taking the mean of the scores an individual would get across equivalent or parallel forms (Hambleton & Jones, 1993; Kline, 2005), while Harvill (1991) describes the true score as an individual's score uninfluenced by any random events. The true score, according to Miller et al. (2011), can never be known; it is simply the expected score an individual would obtain across parallel forms (Hambleton & Jones, 1993). By the definition of Gronlund and Linn (1990), parallel forms are tests administered to the same group of individuals in close succession, whose scores are then correlated, while Hambleton and Jones (1993) suggest that parallel forms are tests measuring the same content for which the true score and the size of the error score of all students are equal. The error score, also known as the error of measurement, is the difference between the obtained score and the true score. It is random in nature: unsystematic, due to chance, and driven by uncontrolled and unspecified factors that influence an individual's test score (Harvill, 1991; Miller et al., 2011). An individual's score could therefore be high or low because of the error score. Over an infinite number of testings, the error score will increase and decrease an individual's score by exactly the same amount because of its random character (Miller et al., 2011).

One-Parameter Logistic Model (Rasch Measurement Model)

There are three widely used IRT models, the One-Parameter Logistic Model (1-PL), the Two-Parameter Logistic Model (2-PL) and the Three-Parameter Logistic Model (3-PL), each with its own parameters. One of the key components that distinguishes these models is the Item Characteristic Curve (ICC), which graphically displays the information of each item generated by IRT (Kline, 2005; Gleason, 2008). The One-Parameter Logistic Model (1-PL), also known as the Rasch Model (Gleason, 2008; Kline, 2005; Adedoyin & Adedoyin, 2013), is the most basic model in IRT, estimating only one parameter, the difficulty parameter (b) (Kline, 2005). In the 1-PL, the level of item discrimination (a) and the guessing probability (c) are assumed to be constant (Magno, 2009). In the 1-PL model, the ICC for each item is given by the equation below:

P_i(θ) = e^(θ - b_i) / (1 + e^(θ - b_i))

where P_i(θ) represents the probability that a student with ability θ responds to the i-th item correctly, and b_i is the difficulty of the i-th item. The b_i value typically ranges from -2 to 2 but can take more extreme values (Sick, 2008). As noted in Kline (2005), b and θ are scaled using a normal distribution with a standard deviation of 1.0 and a mean of 0.0; hence Magno (2009) draws two summaries from this equation: (i) the easier the item, the higher the probability that students will answer it correctly; (ii) students with high ability are more likely to answer items correctly than students with less ability.
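To make the model concrete, here is a minimal Python sketch of the 1-PL equation above; the function name and the example ability and difficulty values are illustrative choices, not taken from the paper.

import math

def rasch_probability(theta, b):
    # 1-PL (Rasch) item characteristic curve: probability of a correct
    # response given person ability theta and item difficulty b, in logits.
    return math.exp(theta - b) / (1.0 + math.exp(theta - b))

# An average-ability student (theta = 0) on an easy item (b = -1)
# versus a hard item (b = +1):
print(round(rasch_probability(0.0, -1.0), 3))  # 0.731
print(round(rasch_probability(0.0, 1.0), 3))   # 0.269

The two printed values illustrate Magno's (2009) summaries: the probability of success rises as items become easier relative to the student's ability.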

Materials and Methods

This study used a quantitative survey research and item analysis approach. The population was the undergraduate students of Semester II (2013/2014) of the Faculty of Education in one of the public universities in Malaysia, numbering 520 students. A hundred students who took the Research Methodology paper, which consists of 60 multiple-choice questions, were purposively taken as the sample for this study.

Item Difficulty

Items were analyzed to see which were more difficult than others based on the value of the item difficulty index (p). Mitra et al. (2009) suggest that an item is considered difficult if its p-value is less than 0.3 and easy if its p-value is more than 0.7. For item difficulty under CTT, the index was calculated as the total number of correct responses divided by the total number of responses. Item difficulty under the Rasch model was analyzed using Winsteps, which produced the item map and item measures; the estimates of ability and difficulty calculated from this analysis are referred to as logits or measures (Ludlow & Haley, 1995).

Person Ability

This study also investigates the differences in students' ability under CTT and IRT. Under CTT, a student's ability is based on the total score obtained, regardless of the difficulty of the items: students with higher scores are regarded as high-ability students, and students with lower scores as low-ability students. Under IRT, a student's ability depends on the difficulty of the items answered: students who answer more difficult items correctly are considered of higher ability than students who answer the same items wrongly. The significance of the differences in item difficulty and students' ability was tested using t-test analysis.
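As a rough illustration of the CTT computations just described, the sketch below derives item difficulty indices and total person scores from a small, invented 0/1 response matrix, with cut-offs following Mitra et al. (2009); the Rasch measures themselves came from Winsteps and are not reproduced here. All data and names in the sketch are hypothetical.

def item_difficulty(responses):
    # p for each item: number of correct responses / number of respondents.
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_students
            for i in range(n_items)]

def classify(p):
    # Mitra et al. (2009): difficult below 0.3, easy above 0.7.
    if p < 0.3:
        return "difficult"
    if p > 0.7:
        return "easy"
    return "moderate"

# Toy data: 4 students x 3 items (the study itself used 100 x 60).
responses = [[1, 0, 1],
             [1, 1, 0],
             [1, 0, 0],
             [0, 0, 1]]

for i, p in enumerate(item_difficulty(responses), start=1):
    print(f"Q{i}: p = {p:.2f} ({classify(p)})")

# Under CTT, person ability is simply the total score per row:
print([sum(row) for row in responses])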

Results

Person Reliability under RMM was 0.56, indicating low consistency, while Item Reliability was 0.95, indicating a wide range of item measures or an adequate sample. Prior to item analysis using RMM, the examination paper was checked against the assumption of unidimensionality. Table 1.0 shows that the raw variance explained by measures was below 40%, the minimum accepted value for using RMM (Azrilah et al., 2013). The unexplained variance in the first contrast, at 5.6%, was a good value; it should not exceed 15%, which would indicate too much noise (Azrilah et al., 2013). Thus, the dimensionality results showed that the examination paper needs to be revised, especially in the weighting of the important content asked in the questions.

Level of Item Difficulty

Table 2.0 and Table 3.0 show the details of the classification of difficulty levels under RMM and CTT respectively. Items Q17, Q49 and Q60 were at a moderate difficulty level according to the Rasch analysis, whereas under CTT these items were at a high difficulty level. Items Q2, Q27 and Q58 were at a moderate level under RMM but at a low level under CTT. The CTT and RMM item difficulty values were standardized by transforming them to z-scores, and the comparison was analyzed using a t-test. The result shows there was no significant difference [t(59), p > .05] between the item difficulty indices from the CTT approach and the RMM approach.

Person Ability

As can be seen from Table 4.0 and Table 5.0, no student was placed in the high-ability category. Students S45, S44, S1, S36, S37, S40, S49, and S8 were placed in the moderately high ability category under both RMM and CTT. Students S39, S12, S19, S35, S61, S88, S18, S21, S38, S48, S10, S22, S28, S34, S46, S69, S9, S24, S42, S57, S11, S20, S4, S47, S55, S59, S6, S62, S81, S89, S93 and S96 were moderately high under RMM but moderately low under CTT. Under RMM, students S2, S23, S33, S41, S50, S64, S7, S70, S74, S83, S84, S100, S14, S16, S26, S27, S3, S30, S56, S58, S66, S72, S73, S77, S79, S87, S94, S95, S31, S51, S78, S86, S92, S13, S15, S29, S43, S54, S63, S71, S80, S85, S90, S97, S98, S99, S17, S5, S65, S91, S25, S32, S52, S75, S76, S82, S53, S60, S67 and S68 were placed in the moderately low category, while under CTT they were placed in the low-ability category. The CTT marks and the RMM person measures were likewise standardized to z-scores and compared using a t-test. The finding shows there was no significant difference [t(99), p > .05] in person ability between the CTT and RMM analyses.
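In outline, the comparison step works as follows. The reported t(59) and t(99) are consistent with paired t-tests over the 60 items and 100 persons, so this sketch assumes a paired design; SciPy is my choice of tool here, not software named by the authors, and the five item values are invented.

import numpy as np
from scipy import stats

def z_scores(x):
    # Standardize a set of values to mean 0 and standard deviation 1.
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

# Hypothetical values for five items: CTT difficulty indices (higher = easier)
# and Rasch difficulty measures in logits (higher = harder).
ctt_p = [0.25, 0.40, 0.55, 0.70, 0.85]
rasch_b = [1.2, 0.5, 0.0, -0.6, -1.4]

t, p_value = stats.ttest_rel(z_scores(ctt_p), z_scores(rasch_b))
# Note: standardizing both series to mean 0 equates their means by
# construction, so a mean-difference test like this will not reject.
print(f"t({len(ctt_p) - 1}) = {t:.3f}, p = {p_value:.3f}")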

TABLE 1: UNIDIMENSIONALITY

Assumption of Unidimensionality: Percentage (%)
Raw variance explained (empirical): 21.6
Raw variance explained (model): 21.3
Unexplained variance (1st contrast): 5.6

TABLE 2: CLASSIFICATION OF ITEM DIFFICULTY LEVEL, SUBJECT B (RMM)

Level of Difficulty: Items
High (above logit 0.82): Q32, Q1, Q10, Q50, Q34, Q31, Q46, Q14, Q28, Q11
Moderately High (logit 0.82 to 0.00): Q17, Q49, Q60, Q30, Q40, Q45, Q5, Q29, Q36, Q44, Q47, Q57, Q18, Q39, Q41, Q25, Q42, Q43, Q52
Moderately Low (logit 0.00 to -1.18): Q26, Q3, Q37, Q15, Q24, Q53, Q55, Q7, Q38, Q56, Q16, Q20, Q22, Q6, Q19, Q48, Q21, Q33, Q13, Q4, Q8
Low (below logit -1.18): Q51, Q59, Q9, Q12, Q35, Q54, Q23, Q2, Q27, Q58

TABLE 3: CLASSIFICATION OF ITEM DIFFICULTY LEVEL, SUBJECT B (CTT)

Level of Difficulty: Items
High (p ≤ 0.30): Q32, Q1, Q10, Q50, Q34, Q31, Q46, Q14, Q28, Q11, Q17, Q60, Q49
Moderate (0.31 ≤ p ≤ 0.79): Q30, Q40, Q45, Q5, Q29, Q36, Q44, Q47, Q57, Q18, Q39, Q41, Q25, Q42, Q43, Q52, Q26, Q3, Q37, Q15, Q24, Q53, Q55, Q7, Q38, Q56, Q16, Q20, Q22, Q6, Q19, Q48, Q21, Q33, Q13, Q4, Q8, Q51, Q59, Q9, Q12, Q35, Q54, Q23
Low (p ≥ 0.80): Q2, Q27, Q58

TABLE 4: CLASSIFICATION OF PERSON ABILITY FOR SUBJECT B (RMM)

Level of Person Ability: Person
Moderately High (logit 0.82 to -0.37): S45, S44, S1, S36, S37, S40, S49, S8, S39, S12, S19, S35, S61, S88, S18, S21, S38, S48, S10, S22, S28, S34, S46, S69, S9, S24, S42, S57, S11, S20, S4, S47, S55, S59, S6, S62, S81, S89, S93, S96
Moderately Low (logit -0.37 to -1.18): S2, S23, S33, S41, S50, S64, S7, S70, S74, S83, S84, S100, S14, S16, S26, S27, S3, S30, S56, S58, S66, S72, S73, S77, S79, S87, S94, S95, S31, S51, S78, S86, S92, S13, S15, S29, S43, S54, S63, S71, S80, S85, S90, S97, S98, S99, S17, S5, S65, S91, S25, S32, S52, S75, S76, S82, S53, S60, S67, S68

TABLE 5: CLASSIFICATION OF PERSON ABILITY FOR SUBJECT B (CTT)

Level of Person Ability: Person
Moderately High (Marks: 74 to 60; Grade Point: 3.33 to 2.67): S45, S44, S1, S36, S37, S40, S49, S8
Moderately Low (Marks: 59 to 45; Grade Point: ): S39, S12, S19, S35, S61, S88, S18, S21, S38, S48, S10, S22, S28, S34, S46, S69, S9, S24, S42, S57, S11, S20, S4, S47, S55, S59, S6, S62, S81, S89, S93, S96
Low (Marks: 44 to 00; Grade: ): S2, S23, S33, S41, S50, S64, S7, S70, S74, S83, S84, S100, S14, S16, S26, S27, S3, S30, S56, S58, S66, S72, S73, S77, S79, S87, S94, S95, S31, S51, S78, S86, S92, S13, S15, S29, S43, S54, S63, S71, S80, S85, S90, S97, S98, S99, S17, S5, S65, S91, S25, S32, S52, S75, S76, S82, S53, S60, S67, S68

Discussion

The findings show there were no significant differences in item difficulty and students' ability between RMM and CTT. Research by Idowu et al. (2011) likewise indicated that item statistics derived from the two measurement frameworks are quite comparable and appear similar for CTT and IRT. However, when categorizing item difficulty and students' ability by cut-off score, some items fell under different difficulty levels and some persons were categorized under different ability levels. These findings are supported by Dibu (2013), who found that person statistics derived by CTT and IRT produce similar results. Amir et al. (2008) also found that analyses of the ability level of individual examinees lead to similar results across the different measurement theories. Fan (1998) examined the behavior of item and person statistics under IRT and CTT and showed that there was little difference between the item and person statistics from CTT and the 1-PL, 2-PL and 3-PL models. The similarity of these findings shows that, when the total score is used, the probability of ranking students at the same level is high under both CTT and IRT. This is because both IRT and CTT begin from the total score without considering the students' patterns and processes in answering the questions; hence the possibility of ranking them at the same ability is high. For example, if two different students obtain the same marks in an exam, say 80 marks, both CTT and IRT will place them at the same ability based on their raw scores. In CTT, the interpretation of this achievement ends there, but not in IRT. In IRT, the interpretation of students' answers is based on their responses to easy and difficult items: two students with the same marks will be interpreted as having different abilities if one of them scores more on easier items while the other scores more on difficult items.
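A toy sketch makes this concrete: two invented response patterns with identical raw scores over items ordered from easiest to hardest. The item labels are borrowed from Tables 2 and 3 (Q2, Q27 and Q58 were among the easiest items, Q17, Q49 and Q60 among the hardest); the response patterns themselves are hypothetical.

# Items ordered from easiest to hardest (labels from Tables 2 and 3).
items = ["Q2", "Q27", "Q58", "Q17", "Q49", "Q60"]

student_a = [1, 1, 1, 0, 0, 0]  # Guttman-consistent: passes easy, fails hard
student_b = [1, 0, 0, 0, 1, 1]  # same raw score, succeeds on hard items only

assert sum(student_a) == sum(student_b)  # identical CTT total score of 3

for name, pattern in (("A", student_a), ("B", student_b)):
    print(f"Student {name}: {pattern} raw score = {sum(pattern)}")

# CTT treats the two students as identical; the scalogram reading discussed
# below distinguishes them by which items they answered correctly.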

The student who answers more difficult items correctly will be classified as the student with higher ability. In IRT, scalogram analysis using the Guttman scale is the best way to differentiate students according to their ability to answer difficult items. For example, if a student with high marks answers more of the difficult items correctly, this shows a positive direction; but if a student obtains high marks by answering more of the easy items correctly while getting many difficult items wrong, the direction is negative and the ability will be considered lower than that of the previous type of student. From the scalogram, the pattern of students' answers can be predicted. For example, we can predict whether a student made lucky guesses in answering some items correctly or really has the knowledge to answer them. Predictions can also be made when students leave items unanswered: IRT can help determine whether a student really does not know the answer, did not have enough time to answer the item, or intentionally left the item unanswered. By analyzing all of these patterns through the Guttman scale in the scalogram, fair judgment of students' performance and accurate decision making can be achieved.

IRT is theoretically considered the superior measurement framework over CTT. Although this study found no significant differences in item and person statistics between these two measurement frameworks, interpretation using IRT gives richer information for judging students' achievement.

Acknowledgment

This research was funded by the Ministry of Education and the Research Management Centre, UTM, through the Fundamental Research Grant Scheme, Vote Number 4f381.

References

Adedoyin, O. O., & Adedoyin, J. A. (2013). Assessing the comparability between classical test theory (CTT) and item response theory (IRT) models in estimating test item parameters. Herald Journal of Education and General Studies, 2.

Amir, Z., Atiq-Ur-Rehman, K., Mamoon, M., & Arshad, A. (2008). Students' ranking based on their abilities on objective type tests: Comparison of CTT and IRT. Proceedings of the EDU-COM 2008 International Conference.

Azrilah, A. A., Saidfudin, M. M., & Azami, Z. (2013). Asas Model Pengukuran Rasch: Pembentukan Skala & Struktur Pengukuran. Malaysia: Penerbit Universiti Kebangsaan Malaysia.

Braun, H., Kanjee, A., Bettinger, E., & Kremer, M. (2006). Improving Education Through Assessment, Innovation, and Evaluation. Cambridge: American Academy of Arts and Sciences.

Champlain, A. F. (2010). A primer on classical test theory and item response theory for assessments in medical education. Medical Education, 44(1).

Dibu, O. O. (2013). Classical Test Theory (CTT) vs Item Response Theory (IRT): An evaluation of the comparability of item analysis results. Lecture presentation, Abuja, Nigeria (May 23).

Fan, X. (1998). Item response theory and classical test theory: An empirical comparison of their item/person statistics. Educational and Psychological Measurement.

Gleason, J. (2008). An evaluation of mathematics competitions using item response theory. Notices of the AMS, 55(1).

Gronlund, N. E., & Linn, R. L. (1990). Measurement and Evaluation in Teaching (6th ed.). New York: Macmillan Publishing Company.

Hambleton, R. K., & Jones, R. W. (1993). Comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice.

Hambleton, R. K., & Swaminathan, H. (1995). Item Response Theory: Principles and Applications. Norwell: Kluwer Academic Publishers.

Harvill, L. M. (1991). Standard error of measurement. Educational Measurement: Issues and Practice, 10.

Idowu, E. O., Eluwa, A. N., & Abang, B. K. (2011). Evaluation of mathematics achievement test: A comparison between classical test theory (CTT) and item response theory (IRT). Journal of Educational and Social Research, 1(4).

Kline, T. (2005). Psychological Testing: A Practical Approach to Design and Evaluation. Thousand Oaks: Sage Publications.

Lawson, D. M. (2006). Applying the item response theory to classroom examinations. Journal of Manipulative and Physiological Therapeutics.

Ludlow, L. H., & Haley, S. M. (1995). Rasch model logits: Interpretation, use, and transformation. Educational and Psychological Measurement, 55.

Magno, C. (2009). Demonstrating the difference between classical test theory and item response theory using derived test data. The International Journal of Educational and Psychological Assessment, 1(1).

McAlpine, M. (2002). A Summary of Methods of Item Analysis. University of Glasgow: Robert Clark Centre for Technological Education.

Miller, L. A., McIntire, S. A., & Lovler, R. L. (2011). Foundations of Psychological Testing: A Practical Approach (3rd ed.). Thousand Oaks: Sage Publications.

Mitra, N. K., Nagaraja, H. S., Ponnudurai, G., & Judson, J. P. (2009). The levels of difficulty and discrimination indices in type A multiple choice questions of pre-clinical Semester 1 multidisciplinary summative tests. IeJSME, 3(1), 2-7.

Musial, D., Nieminen, G., Thomas, J., & Burke, K. (2009). Foundations of Meaningful Educational Assessment. New York: McGraw-Hill.

Neşe, G., Gülden, K. U., & Gülşen, T. T. (2013). Comparison of classical test theory and item response theory in terms of item parameters. European Journal of Research on Education, 2(1), 1-6.

Riley, R., & Cantu, N. (2000). The Use of Tests as Part of High-Stakes Decision-Making for Students: A Resource Guide for Educators and Policy Makers. Washington, DC: U.S. Department of Education, Office of Civil Rights.

Stage, C. (2003). Classical test theory or item response theory: The Swedish experience. Centro de Estudios Públicos, 42.

