MULTIPLE-CHOICE ITEMS ANALYSIS USING CLASSICAL TEST THEORY AND RASCH MEASUREMENT MODEL
Man In India, 96 (1-2). Serials Publications.

Adibah Binti Abd Latif 1*, Ibnatul Jalilah Yusof 1, Nor Fadila Mohd Amin 1, Wilfredo Herrera Libunao 1 and Siti Sarah Yusri 1

Abstract: The purpose of this study is to analyze item difficulty level and person ability using two measurement frameworks, Classical Test Theory (CTT) and the Rasch Measurement Model (RMM). A total of 100 undergraduate students from the Faculty of Education responded to a final examination paper in Research Methodology consisting of 60 multiple-choice questions (MCQ). The Cronbach's alpha (CTT) obtained is 0.62, the Person Reliability (RMM) is 0.59, and the Item Reliability is 0.95. This study found a slight difference between the item difficulty levels and person abilities obtained from CTT and RMM. However, there is no significant difference (p > .05) between the item difficulty index (CTT) and the item measure (RMM), and no significant difference (p > .05) between the person ability estimates of CTT and RMM. Although RMM is theoretically considered the superior measurement framework over CTT, this study found that item and person statistics appeared similar across these two frameworks. Thus, interpretations beyond the philosophy are discussed.

Keywords: Item analysis, Classical Test Theory, Rasch Measurement Model

Introduction

Musial et al. (2009) defined assessment as the art of placing learners in a setting that clarifies what learners experience and can do, as well as what learners may not recognize or cannot perform. It provides a picture of a student's advancements and achievements.
The data obtained from an assessment are used as part of high-stakes decision making: placement decisions, such as choosing a program of study, and promotion decisions, such as tracking learning progress and determining whether students obtain certificates or other qualifications that empower them to achieve their objectives (Riley & Cantu, 2000; Braun et al., 2006). Malaysia's educational system is now presented with the challenge of developing appropriate and meaningful ways to evaluate the extent to which students are meeting the standards. Tests and examinations can accurately or inaccurately reflect the current level of students' learning. However, a test can be studied from different angles, and the items in the test can be evaluated according to different theories or models that provide a better perspective on the relationship that may exist between the observed score on an examination and the underlying capability in the domain, which is generally unobserved (Champlain, 2010). Two main test theory models that have been proposed for creating and evaluating test items are Classical Test Theory (CTT) and Item Response Theory (IRT). These two theories are currently popular measurement frameworks for addressing measurement problems such as test-score equating, test development and the identification of biased items (Hambleton & Jones, 1993; Lawson, 2006). To date, many educators in Malaysia still use the CTT approach in analyzing test items. Theoretically, CTT is simple and easy to apply. Its straightforward and weak theoretical assumptions, easily met by test data, make it extensively used in analyzing items (Hambleton & Jones, 1993; Champlain, 2010). However, many researchers have started questioning its utility in the modern era (Amir et al., 2008). CTT has the limitation of circular dependency in estimating the test item parameters, namely item difficulty and item discrimination (Fan, 1998; Adedoyin & Adedoyin, 2013; Lawson, 2006; Stage, 2003). Circular dependency means, for example, that an easy test can overestimate the ability estimates of the students while a difficult test can do the reverse by underestimating the abilities of examinees (Fan, 1998; Amir et al., 2008). An individual will appear to be of low ability when the test is difficult but of high ability when the test is easy. It is thus difficult to compare the relative abilities of students taking two different tests (McAlpine, 2002). CTT considers the same total marks gained by students to indicate that they have the same abilities, regardless of whether the marks come from easy items or difficult items. Therefore, this affects the interpretation of students' grading, ranking and reporting. In contrast to CTT, IRT generates a rank ordering of students on the underlying trait rather than on the test scores. Students should be placed in the correct rank order regardless of which items they chose to answer (McAlpine, 2002). Consequently, IRT has witnessed exponential growth in recent decades as it is used to overcome the limitations of CTT (Neşe et al., 2013).

1 Faculty of Education, Universiti Teknologi Malaysia, Malaysia
* p-adibah@utm.my
Thus, this paper intends to compare item analysis using both approaches: CTT and IRT's Rasch Measurement Model (RMM). There are four objectives in this study: (i) to investigate the level of item difficulty using the CTT and RMM approaches; (ii) to analyze the statistical significance of the difference between item difficulties under the CTT and RMM approaches; (iii) to investigate the level of students' ability using the CTT and RMM approaches; and (iv) to analyze the significance of the difference between students' abilities under the CTT and RMM approaches.

Classical Test Theory

CTT introduces three concepts: test score, true score and error score (Hambleton & Jones, 1993; Kline, 2005), where the test score is often identified as the observed score while the true score and error score are identified as unobserved, or latent, scores. These concepts propose that an individual test score (X) consists of a true score (T) and an error score (E), which can be depicted in the equation below:
X = T + E

From this formula, it can be concluded that an individual's test score is influenced by the true score and the error score. The true score is the expected score obtained by taking the mean score that an individual would get across equivalent or parallel forms (Hambleton & Jones, 1993; Kline, 2005), while Harvill (1991) claims that the true score represents an individual's score uninfluenced by any random events. The true score, according to Miller et al. (2011), can never be known; it is just shown to be an expected score that can be obtained by an individual through parallel forms (Hambleton & Jones, 1993). In the definition provided by Gronlund and Linn (1990), parallel forms are tests administered to the same group of individuals in close succession, with the resulting test scores correlated, while Hambleton and Jones (1993) suggest that parallel forms are tests measuring the same content for which the true score and the size of the error score of all students are equal. The error score, also known as the error of measurement, is the difference between the obtained score and the true score. The error score is random in nature: unsystematic, due to chance, and caused by uncontrolled, unspecified factors that influence the individual test score (Harvill, 1991; Miller et al., 2011). This means that an individual's score could be high or low because of the error score. Over an infinite number of testings, the error score will increase and decrease an individual's score by exactly the same amount because of its random characteristic (Miller et al., 2011).

One-Parameter Logistic Model (Rasch Measurement Model)

There are three widely used IRT models: the One-Parameter Logistic Model (1-PL), the Two-Parameter Logistic Model (2-PL) and the Three-Parameter Logistic Model (3-PL), each of which has its own parameters.
One of the key components that distinguishes these models is the Item Characteristic Curve (ICC), which graphically displays the information of each item generated by IRT (Kline, 2005; Gleason, 2008). The One-Parameter Logistic Model (1-PL), also known as the Rasch Model (Gleason, 2008; Kline, 2005; Adedoyin & Adedoyin, 2013), is the most basic model in IRT and estimates only one parameter, the difficulty parameter (b) (Kline, 2005). In 1-PL, the item discrimination (a) and guessing probability (c) are assumed to be constant (Magno, 2009). In the 1-PL model, the ICC for each item is given by the equation below:

P_i(θ) = e^(θ − b_i) / (1 + e^(θ − b_i))

where P_i(θ) represents the probability that a student with ability θ responds to the i-th item correctly, and b_i is the difficulty of the i-th item. The b_i value typically ranges from -2 to 2 but can take more extreme values (Sick, 2008).
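As a quick sketch (not part of the original study), the 1-PL equation above can be evaluated directly; the ability and difficulty values below are illustrative:

```python
import math

def rasch_probability(theta, b):
    # P_i(theta) = e^(theta - b_i) / (1 + e^(theta - b_i)):
    # probability that a student of ability theta answers item i of difficulty b_i correctly.
    return math.exp(theta - b) / (1.0 + math.exp(theta - b))

# A student whose ability equals the item difficulty has a 50% chance of success.
print(rasch_probability(0.0, 0.0))             # 0.5
# The further ability exceeds difficulty, the higher the probability.
print(round(rasch_probability(1.0, -1.0), 4))  # 0.8808
```

This matches Magno's (2009) two summaries: lowering b_i (an easier item) or raising θ (a more able student) both increase P_i(θ).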
As noted in Kline (2005), b and θ are scaled using a normal distribution with a mean of 0.0 and a standard deviation of 1.0; hence Magno (2009) presents two summaries from this equation: (i) the easier the item, the higher the probability that students will answer it correctly; and (ii) students with high ability are more likely to answer the items correctly than students with less ability.

Materials and Methods

This study was conducted using a quantitative survey research and item analysis approach. The population of this study was the undergraduate students of Semester II (2013/2014) from the Faculty of Education in one of the public universities in Malaysia, numbering 520 students. A hundred students who took the Research Methodology paper, which consists of 60 multiple-choice questions, were purposively taken as the sample for this study.

Item Difficulty

Items were analyzed to see which were more difficult than others based on the item difficulty index (p). Mitra et al. (2009) suggest an item is considered difficult if the p-value is less than 0.3 and easy if the p-value is more than 0.7. For item difficulty level (CTT), data were calculated using the item difficulty index formula: the total of correct responses divided by the total of responses. Item difficulty under Rasch was analyzed using Winsteps, producing the Item Map and Item Measure. The estimates of ability and difficulty calculated from the analysis are referred to as logits/measures (Ludlow & Haley, 1995).

Person Ability

This study also intends to investigate the differences in students' ability under CTT and IRT. Ability of students under CTT is based on the total scores obtained by students regardless of the difficulty of items.
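The CTT item difficulty index and the Mitra et al. (2009) cut-offs described above can be sketched as follows; the scored responses here are hypothetical:

```python
def difficulty_index(scores):
    # p = number of correct responses / total responses (scores coded 1 = correct, 0 = wrong)
    return sum(scores) / len(scores)

def classify(p):
    # Mitra et al. (2009): difficult if p < 0.3, easy if p > 0.7, otherwise moderate
    if p < 0.3:
        return "difficult"
    if p > 0.7:
        return "easy"
    return "moderate"

item_scores = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]  # one item answered by ten students
p = difficulty_index(item_scores)
print(p, classify(p))  # 0.4 moderate
```

Note that a low p marks a difficult item (few students answered it correctly), which is why the "High difficulty" band in the tables below corresponds to the smallest p-values.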
Under CTT, students with higher scores are regarded as high-ability students while students with lower scores are regarded as low-ability students. Students' ability under IRT, in contrast, is based on the level of difficulty of the items: students who were able to answer more difficult items correctly are considered higher-ability students compared to students who answered those items wrongly. The significance of the differences in item difficulty and students' ability was tested using t-test analysis.

Results

Person Reliability was tested using RMM and the value was 0.56, which indicates low consistency, while Item Reliability for this test is 0.95, which indicates a wide range of item measures or an adequate sample. Prior to item analysis using RMM, the examination paper was analyzed to check whether it fulfilled the assumption of unidimensionality. Table 1 shows that the raw variance explained by measures was less than 40%, the minimum accepted value for using RMM (Azrilah et al., 2013). The Unexplained Variance in the 1st Contrast had a good value of 5.6%; the value here should not exceed 15%, which would indicate too much noise (Azrilah et al., 2013). Thus, the dimensionality results show that the examination paper needs to be revised, especially in terms of the weightage of important content asked in the questions.

Level of Item Difficulty

Table 2 and Table 3 show the details of the classification of difficulty levels using RMM and CTT respectively. It can be seen that items Q17, Q49 and Q60 were at a moderate difficulty level according to the Rasch analysis, whereas under CTT these items were at a high difficulty level. Items Q2, Q27 and Q58 were at a moderate level under RMM but at a low level under CTT. Both the CTT item difficulty index and the RMM item measure values were standardized by transforming them to z-scores, and the comparison was analyzed using a t-test. The result shows there was no significant difference [t(59), p > .05] between the item difficulty index from the CTT approach and the item measure from the RMM approach.

Person Ability

As can be seen from both Table 4 and Table 5, no student was placed under high ability. Students S45, S44, S1, S36, S37, S40, S49 and S8 were placed at moderately high ability in both RMM and CTT. Students S39, S12, S19, S35, S61, S88, S18, S21, S38, S48, S10, S22, S28, S34, S46, S69, S9, S24, S42, S57, S11, S20, S4, S47, S55, S59, S6, S62, S81, S89, S93 and S96 were moderately high in RMM; in contrast, under CTT they were placed at moderately low ability.
In RMM, students S2, S23, S33, S41, S50, S64, S7, S70, S74, S83, S84, S100, S14, S16, S26, S27, S3, S30, S56, S58, S66, S72, S73, S77, S79, S87, S94, S95, S31, S51, S78, S86, S92, S13, S15, S29, S43, S54, S63, S71, S80, S85, S90, S97, S98, S99, S17, S5, S65, S91, S25, S32, S52, S75, S76, S82, S53, S60, S67 and S68 were placed at moderately low ability, while under CTT they were placed as students with low ability in responding to the test. Both the CTT marks and the RMM person measure values were standardized by transforming them to z-scores, and the comparison was analyzed using a t-test. The finding shows there was no significant difference [t(99), p > .05] in person ability according to the CTT and RMM analyses.
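The standardize-then-compare procedure reported above can be sketched as follows. The scores are illustrative rather than the study's data, and a paired t-test is assumed from the reported degrees of freedom (df = 59 for 60 items, df = 99 for 100 persons):

```python
from math import sqrt
from statistics import mean, stdev

def z_scores(xs):
    # Standardize values to mean 0 and (sample) standard deviation 1.
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]

def paired_t(a, b):
    # Paired-samples t statistic with df = n - 1.
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

# Hypothetical difficulty estimates for the same six items under CTT and RMM
ctt = z_scores([0.25, 0.40, 0.55, 0.62, 0.71, 0.80])
rmm = z_scores([1.10, 0.60, 0.10, -0.20, -0.55, -1.05])
t = paired_t(ctt, rmm)
print(abs(round(t, 6)))  # 0.0: standardizing each set separately equalizes the two means
```

In practice the resulting t statistic would be compared against the critical value for the stated degrees of freedom to obtain the p-value.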
TABLE 1: UNIDIMENSIONALITY

Assumption of Unidimensionality: Percentage (%)
Raw variance explained (empirical): 21.6
Raw variance explained (model): 21.3
Unexplained variance (1st contrast): 5.6

TABLE 2: CLASSIFICATION OF ITEM DIFFICULTY LEVEL, SUBJECT B (RMM)

High (above logit 0.82): Q32, Q1, Q10, Q50, Q34, Q31, Q46, Q14, Q28, Q11
Moderately High (logit 0.82 to 0.00): Q17, Q49, Q60, Q30, Q40, Q45, Q5, Q29, Q36, Q44, Q47, Q57, Q18, Q39, Q41, Q25, Q42, Q43, Q52
Moderately Low (logit 0.00 to -1.18): Q26, Q3, Q37, Q15, Q24, Q53, Q55, Q7, Q38, Q56, Q16, Q20, Q22, Q6, Q19, Q48, Q21, Q33, Q13, Q4, Q8
Low (below logit -1.18): Q51, Q59, Q9, Q12, Q35, Q54, Q23, Q2, Q27, Q58

TABLE 3: CLASSIFICATION OF ITEM DIFFICULTY LEVEL, SUBJECT B (CTT)

High (p ≤ 0.30): Q32, Q1, Q10, Q50, Q34, Q31, Q46, Q14, Q28, Q11, Q17, Q60, Q49
Moderate (0.31 ≤ p ≤ 0.79): Q30, Q40, Q45, Q5, Q29, Q36, Q44, Q47, Q57, Q18, Q39, Q41, Q25, Q42, Q43, Q52, Q26, Q3, Q37, Q15, Q24, Q53, Q55, Q7, Q38, Q56, Q16, Q20, Q22, Q6, Q19, Q48, Q21, Q33, Q13, Q4, Q8, Q51, Q59, Q9, Q12, Q35, Q54, Q23
Low (p ≥ 0.80): Q2, Q27, Q58

TABLE 4: CLASSIFICATION OF PERSON ABILITY FOR SUBJECT B (RMM)

Moderately High (logit 0.82 to -0.37): S45, S44, S1, S36, S37, S40, S49, S8, S39, S12, S19, S35, S61, S88, S18, S21, S38, S48, S10, S22, S28, S34, S46, S69, S9, S24, S42, S57, S11, S20, S4, S47, S55, S59, S6, S62, S81, S89, S93, S96
Moderately Low (logit -0.37 to -1.18): S2, S23, S33, S41, S50, S64, S7, S70, S74, S83, S84, S100, S14, S16, S26, S27, S3, S30, S56, S58, S66, S72, S73, S77, S79, S87, S94, S95, S31, S51, S78, S86, S92, S13, S15, S29, S43, S54, S63, S71, S80, S85, S90, S97, S98, S99, S17, S5, S65, S91, S25, S32, S52, S75, S76, S82, S53, S60, S67, S68
TABLE 5: CLASSIFICATION OF PERSON ABILITY FOR SUBJECT B (CTT)

Moderately High (Marks: 74 to 60; Grade Point: 3.33 to 2.67): S45, S44, S1, S36, S37, S40, S49, S8
Moderately Low (Marks: 59 to 45): S39, S12, S19, S35, S61, S88, S18, S21, S38, S48, S10, S22, S28, S34, S46, S69, S9, S24, S42, S57, S11, S20, S4, S47, S55, S59, S6, S62, S81, S89, S93, S96
Low (Marks: 44 to 00): S2, S23, S33, S41, S50, S64, S7, S70, S74, S83, S84, S100, S14, S16, S26, S27, S3, S30, S56, S58, S66, S72, S73, S77, S79, S87, S94, S95, S31, S51, S78, S86, S92, S13, S15, S29, S43, S54, S63, S71, S80, S85, S90, S97, S98, S99, S17, S5, S65, S91, S25, S32, S52, S75, S76, S82, S53, S60, S67, S68

Discussion

Findings show there were no significant differences in item difficulty and students' ability between RMM and CTT. Research by Idowu et al. (2011) likewise indicated that item statistics derived from the two measurement frameworks are quite comparable and appear similar for CTT and IRT. However, in categorizing item difficulty and students' ability based on cut-off scores, some of the same items fell under different difficulty levels and the same persons were categorized under different abilities. These findings are supported by research by Dibu (2013), in which person statistics derived by CTT and IRT produced similar results. Amir et al. (2008) also found that an analysis of the ability level of individual examinees led to similar results across the different measurement theories. Fan (1998), examining the behavior of item and person statistics under IRT and CTT, showed that there was little difference between item and person statistics using CTT and the 1-PL, 2-PL and 3-PL models. The likeness of these findings shows that, by using the total score of the marks, the probability of ranking students at the same level is high under both CTT and IRT.
This is because both IRT and CTT in the first place use the total score without making any adjustment for students' pattern and process in answering the questions. Hence, the possibility of ranking them at the same ability level is high. For example, if two different students got the same marks in an exam, say 80 marks, both CTT and IRT will place them at the same ability level based on their raw scores. In CTT, the interpretation of this achievement will conclude that they are the same, but not in IRT. In IRT, the interpretation of a student's answers is based on the student's responses to easy and difficult items. Students with the same marks will be interpreted as having different abilities if one of them scores more on easier items while the other scores more on difficult items. The student who answers more difficult items correctly will be classified as the student with higher ability. In IRT, analysis of the scalogram using the Guttman scale is the best way to differentiate students according to their ability in answering difficult items. For example, if a student with high marks answers more difficult items correctly, it shows a positive direction; but if a student gets high marks with more easy items correct while many difficult items are wrong, the direction is negative and the ability will be considered lower than that of the previous type of student. From the scalogram, the pattern of students' answers can be predicted. For example, we can predict whether a student made a lucky guess in answering some items correctly or really has good knowledge in answering the items. Predictions can also be made when students do not answer items: IRT can help determine whether a student really did not know the answer, did not have enough time to answer the item, or intentionally did not answer it. By analyzing all of these patterns through the Guttman scale in a scalogram, fair judgment of students' performance and accurate decision making can be achieved. IRT is theoretically considered the superior measurement framework over CTT. Although this study found no significant differences between the item and person statistics of these two measurement frameworks, interpretation using IRT gives richer information in judging students' achievement.

Acknowledgment

This research was funded by the Ministry of Education and the Research Management Centre UTM through the Fundamental Research Grant Scheme, Vot Number 4f381.

References

Adedoyin, O. O., & Adedoyin, J. A. (2013). Assessing the comparability between classical test theory (CTT) and item response theory (IRT) models in estimating test item parameters. Herald Journal of Education and General Studies, 2.
Amir, Z., Atiq-Ur-Rehman, K., Mamoon, M., & Arshad, A. (2008). Students' ranking, based on their abilities on objective type tests: Comparison of CTT and IRT. Proceedings of the EDU-COM 2008 International Conference.
Azrilah, A. A., Saidfudin, M. M., & Azami, Z. (2013). Asas Model Pengukuran Rasch: Pembentukan Skala & Struktur Pengukuran [Fundamentals of the Rasch Measurement Model: Scale Formation & Measurement Structure]. Malaysia: Penerbit Universiti Kebangsaan Malaysia.
Braun, H., Kanjee, A., Bettinger, E., & Kremer, M. (2006). Improving Education Through Assessment, Innovation, and Evaluation. Cambridge: American Academy of Arts and Sciences.
Champlain, A. F. (2010). A primer on classical test theory and item response theory for assessments in medical education. Medical Education, 44(1).
Dibu, O. O. (2013). Classical Test Theory (CTT) vs Item Response Theory (IRT): An evaluation of the comparability of item analysis results. Lecture presentation, Abuja, Nigeria (May 23).
Fan, X. (1998). Item response theory and classical test theory: An empirical comparison of their item/person statistics. Educational and Psychological Measurement.
Gleason, J. (2008). An evaluation of mathematics competitions using item response theory. Notices of the AMS, 55(1).
Gronlund, N. E., & Linn, R. L. (1990). Measurement and Evaluation in Teaching (6th ed.). New York: Macmillan Publishing Company.
Hambleton, R. K., & Jones, R. W. (1993). Comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice.
Hambleton, R. K., & Swaminathan, H. (1995). Item Response Theory: Principles and Applications. Norwell, MA: Kluwer Academic Publishers.
Harvill, L. M. (1991). Standard error of measurement. Educational Measurement: Issues and Practice, 10.
Idowu, E. O., Eluwa, A. N., & Abang, B. K. (2011). Evaluation of mathematics achievement test: A comparison between classical test theory (CTT) and item response theory (IRT). Journal of Educational and Social Research, 1(4).
Kline, T. (2005). Psychological Testing: A Practical Approach to Design and Evaluation. Thousand Oaks: Sage Publications.
Lawson, D. M. (2006). Applying the item response theory to classroom examinations. Journal of Manipulative and Physiological Therapeutics.
Ludlow, L. H., & Haley, S. M. (1995). Rasch model logits: Interpretation, use, and transformation. Educational and Psychological Measurement, 55.
Magno, C. (2009). Demonstrating the difference between classical test theory and item response theory using derived test data. The International Journal of Educational and Psychological Assessment, 1(1).
McAlpine, M. (2002). A Summary of Methods of Item Analysis. University of Glasgow: Robert Clark Centre for Technological Education.
Miller, L. A., McIntire, S. A., & Lovler, R. L. (2011). Foundations of Psychological Testing: A Practical Approach (3rd ed.). Thousand Oaks: Sage Publications.
Mitra, N. K., Nagaraja, H. S., Ponnudurai, G., & Judson, J. P. (2009). The levels of difficulty and discrimination indices in type A multiple choice questions of pre-clinical Semester 1 multidisciplinary summative tests. IeJSME, 3(1), 2-7.
Musial, D., Nieminen, G., Thomas, J., & Burke, K. (2009). Foundations of Meaningful Educational Assessment. New York: McGraw-Hill.
Neşe, G., Gülden, K. U., & Gülşen, T. T. (2013). Comparison of classical test theory and item response theory in terms of item parameters. European Journal of Research on Education, 2(1), 1-6.
Riley, R., & Cantu, N. (2000). The Use of Tests as Part of High-Stakes Decision-Making for Students: A Resource Guide for Educators and Policy Makers. Washington, DC: U.S. Department of Education, Office of Civil Rights.
Stage, C. (2003). Classical test theory or item response theory: The Swedish experience. Centro de Estudios Públicos, 42.
More informationPsychometrics for Beginners. Lawrence J. Fabrey, PhD Applied Measurement Professionals
Psychometrics for Beginners Lawrence J. Fabrey, PhD Applied Measurement Professionals Learning Objectives Identify key NCCA Accreditation requirements Identify two underlying models of measurement Describe
More informationDiscrimination Weighting on a Multiple Choice Exam
Proceedings of the Iowa Academy of Science Volume 75 Annual Issue Article 44 1968 Discrimination Weighting on a Multiple Choice Exam Timothy J. Gannon Loras College Thomas Sannito Loras College Copyright
More informationDescription of components in tailored testing
Behavior Research Methods & Instrumentation 1977. Vol. 9 (2).153-157 Description of components in tailored testing WAYNE M. PATIENCE University ofmissouri, Columbia, Missouri 65201 The major purpose of
More informationPublished by European Centre for Research Training and Development UK (
DETERMINATION OF DIFFERENTIAL ITEM FUNCTIONING BY GENDER IN THE NATIONAL BUSINESS AND TECHNICAL EXAMINATIONS BOARD (NABTEB) 2015 MATHEMATICS MULTIPLE CHOICE EXAMINATION Kingsley Osamede, OMOROGIUWA (Ph.
More informationlinking in educational measurement: Taking differential motivation into account 1
Selecting a data collection design for linking in educational measurement: Taking differential motivation into account 1 Abstract In educational measurement, multiple test forms are often constructed to
More informationCYRINUS B. ESSEN, IDAKA E. IDAKA AND MICHAEL A. METIBEMU. (Received 31, January 2017; Revision Accepted 13, April 2017)
DOI: http://dx.doi.org/10.4314/gjedr.v16i2.2 GLOBAL JOURNAL OF EDUCATIONAL RESEARCH VOL 16, 2017: 87-94 COPYRIGHT BACHUDO SCIENCE CO. LTD PRINTED IN NIGERIA. ISSN 1596-6224 www.globaljournalseries.com;
More informationItem Analysis Explanation
Item Analysis Explanation The item difficulty is the percentage of candidates who answered the question correctly. The recommended range for item difficulty set forth by CASTLE Worldwide, Inc., is between
More informationHaving your cake and eating it too: multiple dimensions and a composite
Having your cake and eating it too: multiple dimensions and a composite Perman Gochyyev and Mark Wilson UC Berkeley BEAR Seminar October, 2018 outline Motivating example Different modeling approaches Composite
More informationLinking Assessments: Concept and History
Linking Assessments: Concept and History Michael J. Kolen, University of Iowa In this article, the history of linking is summarized, and current linking frameworks that have been proposed are considered.
More informationTHE NATURE OF OBJECTIVITY WITH THE RASCH MODEL
JOURNAL OF EDUCATIONAL MEASUREMENT VOL. II, NO, 2 FALL 1974 THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL SUSAN E. WHITELY' AND RENE V. DAWIS 2 University of Minnesota Although it has been claimed that
More informationLikelihood Ratio Based Computerized Classification Testing. Nathan A. Thompson. Assessment Systems Corporation & University of Cincinnati.
Likelihood Ratio Based Computerized Classification Testing Nathan A. Thompson Assessment Systems Corporation & University of Cincinnati Shungwon Ro Kenexa Abstract An efficient method for making decisions
More informationDiagnostic Classification Models
Diagnostic Classification Models Lecture #13 ICPSR Item Response Theory Workshop Lecture #13: 1of 86 Lecture Overview Key definitions Conceptual example Example uses of diagnostic models in education Classroom
More informationBruno D. Zumbo, Ph.D. University of Northern British Columbia
Bruno Zumbo 1 The Effect of DIF and Impact on Classical Test Statistics: Undetected DIF and Impact, and the Reliability and Interpretability of Scores from a Language Proficiency Test Bruno D. Zumbo, Ph.D.
More informationAN ANALYSIS ON VALIDITY AND RELIABILITY OF TEST ITEMS IN PRE-NATIONAL EXAMINATION TEST SMPN 14 PONTIANAK
AN ANALYSIS ON VALIDITY AND RELIABILITY OF TEST ITEMS IN PRE-NATIONAL EXAMINATION TEST SMPN 14 PONTIANAK Hanny Pradana, Gatot Sutapa, Luwandi Suhartono Sarjana Degree of English Language Education, Teacher
More informationWorld Academy of Science, Engineering and Technology International Journal of Psychological and Behavioral Sciences Vol:8, No:1, 2014
Validity and Reliability of Competency Assessment Implementation (CAI) Instrument Using Rasch Model Nurfirdawati Muhamad Hanafi, Azmanirah Ab Rahman, Marina Ibrahim Mukhtar, Jamil Ahmad, Sarebah Warman
More informationBrent Duckor Ph.D. (SJSU) Kip Tellez, Ph.D. (UCSC) BEAR Seminar April 22, 2014
Brent Duckor Ph.D. (SJSU) Kip Tellez, Ph.D. (UCSC) BEAR Seminar April 22, 2014 Studies under review ELA event Mathematics event Duckor, B., Castellano, K., Téllez, K., & Wilson, M. (2013, April). Validating
More informationUsing the Score-based Testlet Method to Handle Local Item Dependence
Using the Score-based Testlet Method to Handle Local Item Dependence Author: Wei Tao Persistent link: http://hdl.handle.net/2345/1363 This work is posted on escholarship@bc, Boston College University Libraries.
More informationA Comparison of Several Goodness-of-Fit Statistics
A Comparison of Several Goodness-of-Fit Statistics Robert L. McKinley The University of Toledo Craig N. Mills Educational Testing Service A study was conducted to evaluate four goodnessof-fit procedures
More informationComparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria
Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria Thakur Karkee Measurement Incorporated Dong-In Kim CTB/McGraw-Hill Kevin Fatica CTB/McGraw-Hill
More informationStudents' perceived understanding and competency in probability concepts in an e- learning environment: An Australian experience
University of Wollongong Research Online Faculty of Engineering and Information Sciences - Papers: Part A Faculty of Engineering and Information Sciences 2016 Students' perceived understanding and competency
More informationApplication of Logistic Regression Model in Physics Education
Application of Logistic Regression Model in Physics Education Shobha Kanta Lamichhane Tribhuvan University, Prithwi Narayan Campus, Pokhara, Nepal sklamichhane@hotmail.com Abstract This paper introduces
More informationEVALUATING AND IMPROVING MULTIPLE CHOICE QUESTIONS
DePaul University INTRODUCTION TO ITEM ANALYSIS: EVALUATING AND IMPROVING MULTIPLE CHOICE QUESTIONS Ivan Hernandez, PhD OVERVIEW What is Item Analysis? Overview Benefits of Item Analysis Applications Main
More informationHow Many Options do Multiple-Choice Questions Really Have?
How Many Options do Multiple-Choice Questions Really Have? ABSTRACT One of the major difficulties perhaps the major difficulty in composing multiple-choice questions is the writing of distractors, i.e.,
More informationMEASURING MIDDLE GRADES STUDENTS UNDERSTANDING OF FORCE AND MOTION CONCEPTS: INSIGHTS INTO THE STRUCTURE OF STUDENT IDEAS
MEASURING MIDDLE GRADES STUDENTS UNDERSTANDING OF FORCE AND MOTION CONCEPTS: INSIGHTS INTO THE STRUCTURE OF STUDENT IDEAS The purpose of this study was to create an instrument that measures middle grades
More informationShiken: JALT Testing & Evaluation SIG Newsletter. 12 (2). April 2008 (p )
Rasch Measurementt iin Language Educattiion Partt 2:: Measurementt Scalles and Invariiance by James Sick, Ed.D. (J. F. Oberlin University, Tokyo) Part 1 of this series presented an overview of Rasch measurement
More informationIssues That Should Not Be Overlooked in the Dominance Versus Ideal Point Controversy
Industrial and Organizational Psychology, 3 (2010), 489 493. Copyright 2010 Society for Industrial and Organizational Psychology. 1754-9426/10 Issues That Should Not Be Overlooked in the Dominance Versus
More informationRegistered Radiologist Assistant (R.R.A. ) 2016 Examination Statistics
Registered Radiologist Assistant (R.R.A. ) Examination Statistics INTRODUCTION This report summarizes the results of the Registered Radiologist Assistant (R.R.A. ) examinations developed and administered
More informationA Bayesian Nonparametric Model Fit statistic of Item Response Models
A Bayesian Nonparametric Model Fit statistic of Item Response Models Purpose As more and more states move to use the computer adaptive test for their assessments, item response theory (IRT) has been widely
More informationResearch and Evaluation Methodology Program, School of Human Development and Organizational Studies in Education, University of Florida
Vol. 2 (1), pp. 22-39, Jan, 2015 http://www.ijate.net e-issn: 2148-7456 IJATE A Comparison of Logistic Regression Models for Dif Detection in Polytomous Items: The Effect of Small Sample Sizes and Non-Normality
More informationValidating Measures of Self Control via Rasch Measurement. Jonathan Hasford Department of Marketing, University of Kentucky
Validating Measures of Self Control via Rasch Measurement Jonathan Hasford Department of Marketing, University of Kentucky Kelly D. Bradley Department of Educational Policy Studies & Evaluation, University
More informationDifferential Item Functioning
Differential Item Functioning Lecture #11 ICPSR Item Response Theory Workshop Lecture #11: 1of 62 Lecture Overview Detection of Differential Item Functioning (DIF) Distinguish Bias from DIF Test vs. Item
More informationEmpirical Formula for Creating Error Bars for the Method of Paired Comparison
Empirical Formula for Creating Error Bars for the Method of Paired Comparison Ethan D. Montag Rochester Institute of Technology Munsell Color Science Laboratory Chester F. Carlson Center for Imaging Science
More informationThe Use of Unidimensional Parameter Estimates of Multidimensional Items in Adaptive Testing
The Use of Unidimensional Parameter Estimates of Multidimensional Items in Adaptive Testing Terry A. Ackerman University of Illinois This study investigated the effect of using multidimensional items in
More informationBy Hui Bian Office for Faculty Excellence
By Hui Bian Office for Faculty Excellence 1 Email: bianh@ecu.edu Phone: 328-5428 Location: 1001 Joyner Library, room 1006 Office hours: 8:00am-5:00pm, Monday-Friday 2 Educational tests and regular surveys
More informationA Comparison of Pseudo-Bayesian and Joint Maximum Likelihood Procedures for Estimating Item Parameters in the Three-Parameter IRT Model
A Comparison of Pseudo-Bayesian and Joint Maximum Likelihood Procedures for Estimating Item Parameters in the Three-Parameter IRT Model Gary Skaggs Fairfax County, Virginia Public Schools José Stevenson
More informationRATER EFFECTS AND ALIGNMENT 1. Modeling Rater Effects in a Formative Mathematics Alignment Study
RATER EFFECTS AND ALIGNMENT 1 Modeling Rater Effects in a Formative Mathematics Alignment Study An integrated assessment system considers the alignment of both summative and formative assessments with
More informationDetecting Suspect Examinees: An Application of Differential Person Functioning Analysis. Russell W. Smith Susan L. Davis-Becker
Detecting Suspect Examinees: An Application of Differential Person Functioning Analysis Russell W. Smith Susan L. Davis-Becker Alpine Testing Solutions Paper presented at the annual conference of the National
More informationTHE USE OF CRONBACH ALPHA RELIABILITY ESTIMATE IN RESEARCH AMONG STUDENTS IN PUBLIC UNIVERSITIES IN GHANA.
Africa Journal of Teacher Education ISSN 1916-7822. A Journal of Spread Corporation Vol. 6 No. 1 2017 Pages 56-64 THE USE OF CRONBACH ALPHA RELIABILITY ESTIMATE IN RESEARCH AMONG STUDENTS IN PUBLIC UNIVERSITIES
More informationReliability, validity, and all that jazz
Reliability, validity, and all that jazz Dylan Wiliam King s College London Introduction No measuring instrument is perfect. The most obvious problems relate to reliability. If we use a thermometer to
More informationOn indirect measurement of health based on survey data. Responses to health related questions (items) Y 1,..,Y k A unidimensional latent health state
On indirect measurement of health based on survey data Responses to health related questions (items) Y 1,..,Y k A unidimensional latent health state A scaling model: P(Y 1,..,Y k ;α, ) α = item difficulties
More informationABERRANT RESPONSE PATTERNS AS A MULTIDIMENSIONAL PHENOMENON: USING FACTOR-ANALYTIC MODEL COMPARISON TO DETECT CHEATING. John Michael Clark III
ABERRANT RESPONSE PATTERNS AS A MULTIDIMENSIONAL PHENOMENON: USING FACTOR-ANALYTIC MODEL COMPARISON TO DETECT CHEATING BY John Michael Clark III Submitted to the graduate degree program in Psychology and
More informationItem Response Theory. Steven P. Reise University of California, U.S.A. Unidimensional IRT Models for Dichotomous Item Responses
Item Response Theory Steven P. Reise University of California, U.S.A. Item response theory (IRT), or modern measurement theory, provides alternatives to classical test theory (CTT) methods for the construction,
More informationITEM ANALYSIS OF MID-TRIMESTER TEST PAPER AND ITS IMPLICATIONS
ITEM ANALYSIS OF MID-TRIMESTER TEST PAPER AND ITS IMPLICATIONS 1 SARITA DESHPANDE, 2 RAVINDRA KUMAR PRAJAPATI 1 Professor of Education, College of Humanities and Education, Fiji National University, Natabua,
More informationAuthor s response to reviews
Author s response to reviews Title: The validity of a professional competence tool for physiotherapy students in simulationbased clinical education: a Rasch analysis Authors: Belinda Judd (belinda.judd@sydney.edu.au)
More informationDoes factor indeterminacy matter in multi-dimensional item response theory?
ABSTRACT Paper 957-2017 Does factor indeterminacy matter in multi-dimensional item response theory? Chong Ho Yu, Ph.D., Azusa Pacific University This paper aims to illustrate proper applications of multi-dimensional
More informationConstruct Invariance of the Survey of Knowledge of Internet Risk and Internet Behavior Knowledge Scale
University of Connecticut DigitalCommons@UConn NERA Conference Proceedings 2010 Northeastern Educational Research Association (NERA) Annual Conference Fall 10-20-2010 Construct Invariance of the Survey
More informationPsychological testing
Psychological testing Lecture 12 Mikołaj Winiewski, PhD Test Construction Strategies Content validation Empirical Criterion Factor Analysis Mixed approach (all of the above) Content Validation Defining
More informationThe Impact of Item Sequence Order on Local Item Dependence: An Item Response Theory Perspective
Vol. 9, Issue 5, 2016 The Impact of Item Sequence Order on Local Item Dependence: An Item Response Theory Perspective Kenneth D. Royal 1 Survey Practice 10.29115/SP-2016-0027 Sep 01, 2016 Tags: bias, item
More informationBayesian Tailored Testing and the Influence
Bayesian Tailored Testing and the Influence of Item Bank Characteristics Carl J. Jensema Gallaudet College Owen s (1969) Bayesian tailored testing method is introduced along with a brief review of its
More informationExamining the efficacy of the Theory of Planned Behavior (TPB) to understand pre-service teachers intention to use technology*
Examining the efficacy of the Theory of Planned Behavior (TPB) to understand pre-service teachers intention to use technology* Timothy Teo & Chwee Beng Lee Nanyang Technology University Singapore This
More informationINVESTIGATING FIT WITH THE RASCH MODEL. Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form
INVESTIGATING FIT WITH THE RASCH MODEL Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form of multidimensionality. The settings in which measurement
More informationWRITING CR ITEMS. Anjuran: Unit Hal Ehwal Kurikulum BHEA
Anjuran: Unit Hal Ehwal Kurikulum WRITING CR ITEMS PART I: WHAT IS A TEST ITEM? The test item has the following key points: Unit of measurement A stimulus and a prescriptive form for answering The response
More informationRESEARCH ARTICLES. Brian E. Clauser, Polina Harik, and Melissa J. Margolis National Board of Medical Examiners
APPLIED MEASUREMENT IN EDUCATION, 22: 1 21, 2009 Copyright Taylor & Francis Group, LLC ISSN: 0895-7347 print / 1532-4818 online DOI: 10.1080/08957340802558318 HAME 0895-7347 1532-4818 Applied Measurement
More informationImpact of Differential Item Functioning on Subsequent Statistical Conclusions Based on Observed Test Score Data. Zhen Li & Bruno D.
Psicológica (2009), 30, 343-370. SECCIÓN METODOLÓGICA Impact of Differential Item Functioning on Subsequent Statistical Conclusions Based on Observed Test Score Data Zhen Li & Bruno D. Zumbo 1 University
More informationExploratory Factor Analysis Student Anxiety Questionnaire on Statistics
Proceedings of Ahmad Dahlan International Conference on Mathematics and Mathematics Education Universitas Ahmad Dahlan, Yogyakarta, 13-14 October 2017 Exploratory Factor Analysis Student Anxiety Questionnaire
More informationBasic concepts and principles of classical test theory
Basic concepts and principles of classical test theory Jan-Eric Gustafsson What is measurement? Assignment of numbers to aspects of individuals according to some rule. The aspect which is measured must
More informationReliability, validity, and all that jazz
Reliability, validity, and all that jazz Dylan Wiliam King s College London Published in Education 3-13, 29 (3) pp. 17-21 (2001) Introduction No measuring instrument is perfect. If we use a thermometer
More informationThe Lens Model and Linear Models of Judgment
John Miyamoto Email: jmiyamot@uw.edu October 3, 2017 File = D:\P466\hnd02-1.p466.a17.docm 1 http://faculty.washington.edu/jmiyamot/p466/p466-set.htm Psych 466: Judgment and Decision Making Autumn 2017
More informationAnalyzing Teacher Professional Standards as Latent Factors of Assessment Data: The Case of Teacher Test-English in Saudi Arabia
Analyzing Teacher Professional Standards as Latent Factors of Assessment Data: The Case of Teacher Test-English in Saudi Arabia 1 Introduction The Teacher Test-English (TT-E) is administered by the NCA
More informationStructural Equation Modeling (SEM)
Structural Equation Modeling (SEM) Today s topics The Big Picture of SEM What to do (and what NOT to do) when SEM breaks for you Single indicator (ASU) models Parceling indicators Using single factor scores
More informationMeasuring the External Factors Related to Young Alumni Giving to Higher Education. J. Travis McDearmon, University of Kentucky
Measuring the External Factors Related to Young Alumni Giving to Higher Education Kathryn Shirley Akers 1, University of Kentucky J. Travis McDearmon, University of Kentucky 1 1 Please use Kathryn Akers
More informationProperties of Single-Response and Double-Response Multiple-Choice Grammar Items
Properties of Single-Response and Double-Response Multiple-Choice Grammar Items Abstract Purya Baghaei 1, Alireza Dourakhshan 2 Received: 21 October 2015 Accepted: 4 January 2016 The purpose of the present
More informationA Comparison of Three Measures of the Association Between a Feature and a Concept
A Comparison of Three Measures of the Association Between a Feature and a Concept Matthew D. Zeigenfuse (mzeigenf@msu.edu) Department of Psychology, Michigan State University East Lansing, MI 48823 USA
More informationNonparametric DIF. Bruno D. Zumbo and Petronilla M. Witarsa University of British Columbia
Nonparametric DIF Nonparametric IRT Methodology For Detecting DIF In Moderate-To-Small Scale Measurement: Operating Characteristics And A Comparison With The Mantel Haenszel Bruno D. Zumbo and Petronilla
More informationMEASURING AFFECTIVE RESPONSES TO CONFECTIONARIES USING PAIRED COMPARISONS
MEASURING AFFECTIVE RESPONSES TO CONFECTIONARIES USING PAIRED COMPARISONS Farzilnizam AHMAD a, Raymond HOLT a and Brian HENSON a a Institute Design, Robotic & Optimizations (IDRO), School of Mechanical
More informationAN EMPIRICAL COMPARISON OF ITEM RESPONSE THEORY AND CLASSICAL TEST THEORY ITEM/PERSON STATISTICS. A Dissertation TROY GERARD COURVILLE
AN EMPIRICAL COMPARISON OF ITEM RESPONSE THEORY AND CLASSICAL TEST THEORY ITEM/PERSON STATISTICS A Dissertation by TROY GERARD COURVILLE Submitted to the Office of Graduate Studies of Texas A&M University
More informationIntroduction to Reliability
Reliability Thought Questions: How does/will reliability affect what you do/will do in your future job? Which method of reliability analysis do you find most confusing? Introduction to Reliability What
More information