Comprehensive Statistical Analysis of a Mathematics Placement Test


Robert J. Hall, Department of Educational Psychology, Texas A&M University, USA (bobhall@tamu.edu)
Eunju Jung, Department of Educational Psychology, Texas A&M University, USA (doduli@tamu.edu)
Michael S. Pilant, Department of Mathematics, Texas A&M University, USA (mpilant@tamu.edu)

Abstract: As part of an NSF Science, Technology, Engineering and Mathematics Talent Expansion Project (STEP) grant, a math placement exam (MPE) has been developed at Texas A&M University for the purpose of evaluating the pre-calculus mathematical skills of entering students. It consists of 33 multiple-choice questions and is taken online. A score of 22 or higher is required in order to register for Calculus I. Following admission to the University and prior to registering for classes, approximately 4,500 students take the placement exam each fall. To date, more than 15,000 students have taken the MPE. This paper focuses on the psychometric properties of the MPE based on analysis of test score results over a four-year period from 2008 to 2011. An item response theory (IRT) analysis has been performed. Various other statistical tests, such as a confirmatory factor analysis and the computation of Cronbach's alpha (α), have also been performed. The value of Cronbach's α (using the entire test sample) exceeds 0.9. This is comparable to high-stakes placement tests such as the SAT and the Advanced Placement tests. A detailed description of the statistical analyses performed on the MPE, as well as the results and their interpretation, is presented in this paper. Finally, a brief analysis of the degree to which student performance in Calculus I can be predicted using MPE scores is presented.

Background

Texas A&M University has the second largest engineering program in the United States, with over 8,000 undergraduate engineering majors. Traditionally, students in this program take calculus during their freshman year, along with physics and other science, technology, engineering, and mathematics (STEM) courses. Some students are not sufficiently prepared and have difficulty passing their mathematics courses. In order to identify students with potential problems, a Math Placement Exam (MPE) was developed. Our goal was to develop a reliable, robust measure of the preparedness of incoming students for college-level calculus. Beginning in 2008, almost all entering freshmen were required to take one of two math placement exams, either for engineering calculus or for business math. More than 4,000 students took the exams in summer 2011 prior to entering Texas A&M, and 1,781 students were enrolled in Calculus I during the fall 2011 semester. As of January 2012, data (MPE scores and grades) for more than 15,000 students in the science and engineering programs had been collected. This study focuses on three research areas:

1. Internal consistency and robustness of the test instrument;
2. Number of latent variables present; and
3. Difficulty and discrimination of each test item.

Conceptual Design of the Mathematics Placement Exam (MPE)

The MPE consists of 33 multiple-choice items covering the following areas: polynomials, functions, graphing, exponentials and logarithms, and trigonometric functions. Questions were designed by two veteran faculty members who are experienced in teaching both pre-calculus and calculus. Each problem has 15 variants constructed from a template with different parameters; consequently, each of the variants is of equal difficulty. When an exam is created, a question is selected and one of the 15 variants of that question is delivered online to the student. Questions are delivered in the same order in every exam, which ensures that each exam is of equal difficulty. After the student selects a response, another question is delivered until all questions are completed. Students have an opportunity to review questions and answers before submitting the test. Following submission, questions are graded (correct or incorrect), and scores are returned to students. There are 15^N different versions of the MPE, where N is the number of questions on the exam; consequently, students each receive a unique set of questions on their version of the MPE.

Based on cumulative performance data, a cutoff score of 22 was established to help ensure basic algebra and pre-calculus skills in Calculus I. Historically, more than 70% of students with placement scores of 22 or greater pass Calculus I, where passing is defined as receiving a grade of A, B, or C. Under current guidelines, a student must score 22 or higher on the MPE in order to enroll in Calculus I. Students who score below 22 may take the exam again (after waiting a month) or enroll in a summer program called the Personalized Pre-Calculus Program (PPP). Until fall 2011, students with scores below 22 could still self-enroll in Calculus I, but as of fall 2011, the registration system blocks registration in Calculus I if the MPE score is less than 22.

Statistical Analysis

Reliability

Under classical test theory, reliability is defined as the proportion of the true score variance over the observed score variance (Raykov & Marcoulides, 2011). Practically, we can define reliability as the consistency of test scores across different administrations or different populations. Among the many indexes of reliability, Cronbach's α is the most widely used, since internal consistency among test items can be measured with only a single test administration. Cronbach's α is calculated by taking the mean of all possible split-half coefficients, which are computed using Rulon's method (Crocker & Algina, 1986). Given the entire data set (over 15,000 tests), the Cronbach's α coefficient for the test items is 0.901, which indicates that the internal consistency among the items is very good.

One important feature of the Cronbach's α analysis is the re-calculation of α when one question is removed. Removing the question that reduces α the least gives a reduced set of questions with the maximal α. This can be done repeatedly until one has a minimal number of questions with an α above a certain level. The results of this procedure are shown below in Figure 1.

Figure 1. Values of Cronbach's α Removing One Question at a Time from the MPE (n = 15,128)
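As an illustration of this leave-one-out procedure, here is a minimal sketch in Python, assuming a 0/1 response matrix with one row per examinee and one column per item; the function names and the stopping rule are illustrative, not taken from the authors' implementation. It uses the standard item-variance form of Cronbach's α, which is equivalent to the mean of all split-half (Rulon) coefficients:

```python
import numpy as np

def cronbach_alpha(X):
    """Cronbach's alpha for an examinees-by-items 0/1 score matrix."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]                          # number of items
    item_vars = X.var(axis=0, ddof=1)       # variance of each item
    total_var = X.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def greedy_item_removal(X, min_items=10):
    """Repeatedly drop the item whose removal leaves alpha highest."""
    remaining = list(range(X.shape[1]))
    history = [(tuple(remaining), cronbach_alpha(X))]
    while len(remaining) > min_items:
        # alpha of the test with each candidate item deleted
        alphas = {j: cronbach_alpha(X[:, [i for i in remaining if i != j]])
                  for j in remaining}
        drop = max(alphas, key=alphas.get)  # removal that hurts alpha least
        remaining.remove(drop)
        history.append((tuple(remaining), alphas[drop]))
    return history
```

Plotting the α values in `history` against the number of retained items would reproduce a curve of the kind shown in Figure 1.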

In general, good internal consistency among items on a test is indicated when the Cronbach's α coefficient is greater than 0.8. The MPE has Cronbach's α values which exceed 0.8 both for the combined-year data (N = 15,128) and for data separated by year (Table 1). Therefore, we can say that the MPE has very good internal consistency.

Table 1. Values of Cronbach's α for the MPE by Year (columns: Year, Number of Students, Mean Difficulty, Standard Deviation, Cronbach's α; the numeric entries were not preserved in this transcription)

Construct Validity

Under the traditional confirmatory factor analysis (CFA) model, only continuous observed variables can be handled. Often, however, questions are scored dichotomously, correct or incorrect. Advances in CFA modeling allow researchers to test factor structures using measures made up of dichotomous items. The CFA model for categorical data does not use the item scores themselves, but depends on the assumption of an underlying, normally distributed variable behind each discrete item or instrument component (Raykov & Marcoulides, 2011). Confirmatory factor analysis has been widely used to test factor structures underlying tests or instruments with multiple items. For example, cognitive tests are usually designed to measure one or more theoretical latent constructs, such as math proficiency, reading comprehension, or problem-solving skills. The primary function of CFA models is to relate observed variables to unobserved latent constructs. CFA allows researchers to test hypotheses about a particular factor structure by examining both the dimensionality of the test (i.e., the number of underlying factors) and the pattern of relations between the items and factors (Brown, 2006).

To evaluate the construct validity of the MPE, various categorical data analysis models were tested using the software package Mplus 6 (Muthén & Muthén, 1998–2010). Confirmatory factor analysis of categorical variables uses the concept of latent variables: variables that are not directly observable but are assumed to give rise to the categorical responses. For the MPE, the observed variables are scores on the algebra test items, and the latent construct is thought to be higher-order algebra skills. Even though the observed scores in our data are dichotomous, the latent variables are thought to be continuous and are assumed to be normally distributed. The estimation method used is weighted least squares. Fundamentally, the CFA model under Mplus 6 is equivalent to the two parameter item response theory (2-P IRT) model. For this analysis, we chose to fit a two parameter normal-ogive model rather than a two parameter logistic model (2-PL). The normal-ogive model is mathematically more difficult to fit but provides a more robust, direct interpretation than does the logistic model. Mplus 6 provides indicators of overall model fit and parameter estimates (i.e., difficulty and discrimination) for the two parameter IRT model. Ultimately, however, we would like to be able to fit the more restrictive one parameter model, as we are primarily interested in whether uni-dimensionality (i.e., one factor underlying the set of items) is plausible. We want to establish that the MPE can be characterized by a single latent higher-order algebra factor. We begin by fitting the two parameter, less restrictive model, and then proceed to examining fit using a one parameter model.
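In the normal-ogive formulation referenced here, the probability of a correct response to item i is Φ(a_i(θ − b_i)), where Φ is the standard normal cumulative distribution function. A minimal sketch of that response function follows; the item parameters are invented for illustration and are not estimates from the MPE:

```python
from scipy.stats import norm

def p_correct_normal_ogive(theta, a, b):
    """Two-parameter normal-ogive model: P(correct) = Phi(a * (theta - b))."""
    return norm.cdf(a * (theta - b))

# Hypothetical item with discrimination a = 1.2 and difficulty b = 0.5.
# An examinee whose latent skill equals the item difficulty answers
# correctly with probability 0.5, matching the ICC shown in Figure 2.
print(p_correct_normal_ogive(theta=0.5, a=1.2, b=0.5))  # -> 0.5
```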
The overall fit of the two parameter model can be evaluated by computing a chi-square statistic and accompanying fit indices, the root mean squared error of approximation (RMSEA) and the comparative fit index (CFI). The items of the MPE were fitted to the two parameter model across years and by year. Results are shown in Table 2. Although the chi-square values were statistically significant (indicating that our data did not fit the two parameter model across years or within years), the MPE data samples were large, and large samples often result in significant chi-square values (Brown, 2006). In such cases, it is both necessary and appropriate to check other fit statistics, such as the CFI and RMSEA indices, before judging model fit. Conceptually, the comparative fit index (CFI) measures the distance of the proposed model from a bad model, one that does not pose any relationship among variables (i.e., covariances among all input indicators are fixed to zero). Hence, a larger CFI value, 0.95 or above, is deemed acceptable. For the MPE, the individual year data had CFIs of 0.977, 0.979, 0.982, and 0.980, respectively, and the combined-year CFI was comparably high. The CFIs, therefore, offer a compelling argument for the two parameter model. The RMSEA measures the discrepancy between the model and the observed data; therefore, smaller values of the RMSEA are better. Typically, an RMSEA less than 0.05 is thought to represent a good fit between the proposed model and the observed data. For the MPE, the RMSEA values for years 2008, 2009, 2010, and 2011 were 0.027, 0.025, 0.024, and 0.025, respectively, and the combined-year value was similarly small. All RMSEAs were well below 0.05, indicating very good model fit.
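For reference, both indices are simple functions of the model and baseline chi-square statistics and the sample size. The sketch below uses the standard formulas; the inputs are placeholders, not the MPE values:

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation for a fitted model."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index relative to the zero-covariance baseline model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, 0.0, d_model)
    return 1.0 - d_model / d_base if d_base > 0 else 1.0

# Placeholder values: a significant chi-square can still yield CFI > 0.95
# and RMSEA < 0.05 when the sample is very large.
print(rmsea(chi2=1500.0, df=495, n=15128))          # small RMSEA
print(cfi(1500.0, 495, 45000.0, 528))               # CFI near 1
```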

To summarize, the CFA analyses were interpreted to indicate a good fit for a two parameter (one latent variable) model, whether the data were combined into one large group or separated into year cohorts. To further characterize this model, we turned to item response theory (IRT).

Table 2. Uni-dimensionality Confirmatory Factor Analysis Results for the Two Parameter Model (columns: Year, N, χ², df, CFI, RMSEA, with a Total row; the numeric entries were not preserved in this transcription)

Item Response Theory (IRT) Analysis

Item response theory is called a latent trait model because it uses a mathematical model that relates a theorized latent construct (or trait) to the observed item responses through item parameters (Hambleton, Swaminathan, & Rogers, 1991). In order to fit observed data to an IRT model, two assumptions should be met. First, the dimensionality of the items (i.e., the number of latent variables) should be confirmed. Second, the items should not be correlated after accounting for the latent factor (or factors); this assumption is called local independence. Based on the CFA analysis, the uni-dimensionality of the complete set of MPE items is supported. A large modification index, suggesting correlation between unique factor scores under the CFA, would indicate a violation of the local independence assumption. In our analysis, there was no evidence of such a violation: the modification indices did not suggest adding any error covariances to reduce the overall χ². Therefore, the use of a uni-dimensional, single latent variable IRT model is justified.

Often, cognitive test items are scored dichotomously (i.e., correct or incorrect). The IRT model allows for analysis of dichotomous data by relating a theorized latent trait to the probability of a correct response on each item. Basically, the dichotomous IRT model is an S-shaped curve: the horizontal axis represents the values of the latent trait, scaled to have a mean of zero, and the vertical axis represents the probability of a correct response (see Figure 2).

Figure 2. Item Characteristic Curve: Dichotomous IRT Model

The IRT model places item difficulty and individual ability on the same continuum; for this reason, we can compare item difficulty and an individual's ability level directly. There are three generally accepted dichotomous IRT models. The first (one parameter) model postulates a single item parameter, item difficulty. This corresponds to equation (1) with c_i = 0 and the a_i the same for each question, and it is closely related to the classical Rasch model (Rasch, 1961). In terms of a person's ability, the difficulty parameter implies that one needs more ability to endorse or pass a more difficult item. In Figure 2, the x-value corresponding to a 0.5 probability of passing an item indicates the difficulty parameter for that item. The second IRT model involves two parameters and is referred to as the 2-PL model. In this model, two parameters are fitted to the observed data: item discrimination and difficulty. The discrimination parameter represents how well the item differentiates individuals according to their ability level; in other words, the slope of the item response curve at the difficulty location is the discrimination parameter. This corresponds to equation (1) with c_i = 0 for each question. The third IRT model is a three-parameter (3-PL) model that allows the parameter c_i to vary in addition to the difficulty and discrimination parameters. It is called a pseudo-guessing parameter and represents the probability that an examinee with a very low ability level passes the item. The three parameter IRT model can be written as:

P_i(\theta) = c_i + (1 - c_i) \frac{e^{a_i(\theta - b_i)}}{1 + e^{a_i(\theta - b_i)}}    (1)

In equation (1), b_i is the difficulty parameter of the i-th item, a_i is the discrimination parameter of the i-th item, and c_i is the pseudo-guessing parameter of the i-th item. As the skill (latent trait) θ increases, the probability of a correct response goes to 1; as the skill decreases, the probability asymptotically approaches that of guessing. The effect of guessing is thus reduced as the value of the latent variable (skill) increases.

In the case of the MPE, we chose not to fit a three parameter model. Conceptually, the pseudo-guessing parameter is problematic because it does not take into account differential option attractiveness; the random-guessing assumption is therefore not reflected in the response data. Given potential problems with the pseudo-guessing parameter, de Ayala (2009) suggests that a two parameter model "may provide a sufficiently reasonable representation of the data" (p. 126). To handle random guessing, we based our analyses on protocols with scores of 7 or higher: on the 33-item MPE, with each question having 5 possible responses, the expected score under pure random guessing is 33/5 ≈ 6.6, so scores of 6 or below are consistent with chance performance.

Mplus 6 was used to fit both one and two parameter IRT models to the MPE data. Data were fitted to both models as a whole (i.e., combining years) and by year. Mplus 6 also allows us to fit the one and two parameter IRT models using either normal-ogive (inverse normal cumulative distribution) or logistic functions; in this analysis, we used the normal-ogive function. First, the two parameter model, which is equivalent to the confirmatory factor analysis model tested previously, was found to be feasible with a single dimension. Next, a one parameter IRT model was tested by fixing all the factor loadings to be the same.
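A direct transcription of equation (1) in Python, showing how the one- and two-parameter models arise as special cases; the example item parameters are invented for illustration:

```python
import numpy as np

def irt_prob(theta, a, b, c=0.0):
    """3-PL model of equation (1): P = c + (1 - c) * logistic(a * (theta - b)).
    With c = 0 this is the 2-PL model; with c = 0 and a common value of a
    across items it reduces to the one-parameter (Rasch-type) model."""
    z = a * (np.asarray(theta) - b)
    return c + (1.0 - c) * (1.0 / (1.0 + np.exp(-z)))

# Hypothetical item with pseudo-guessing c = 0.2 (a five-option item):
# a very low-ability examinee's success probability approaches 0.2,
# while high ability drives it toward 1.
print(irt_prob(theta=-4.0, a=1.0, b=0.0, c=0.2))  # ~0.21
print(irt_prob(theta=4.0, a=1.0, b=0.0, c=0.2))   # ~0.99
```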
Table 3 presents the results of the one parameter IRT analyses of data combined across years and separated by year. The combined data (N = 15,128) show good fit on both the CFI and RMSEA indices. For the two parameter model, the by-year fit is not quite as good: all the RMSEAs indicate good fit, while the CFIs indicate adequate fit. The one parameter model, by contrast, showed acceptable fit statistics for the combined-year as well as for the by-year MPE data.

Table 3. Confirmatory Factor Analysis Results for the One Parameter IRT Model (columns: Year, N, χ², df, CFI, RMSEA, with a Total row; the numeric entries were not preserved in this transcription)

In summary, further analyses of the difficulty and discrimination parameters will follow as needed, but at this point we are confident that the MPE instrument is psychometrically sound, measuring a single latent variable and sensitive to how well items differentiate individuals according to their ability level. This is fundamentally true whether we look at the group as a whole or break the group into year-based cohorts.

Predicting Student Performance Using the MPE

Given that the outcome variable is categorical (i.e., A, B, C, D, F, or pass/fail), the Pearson correlation coefficient is not a good indicator of the relationship between grades and MPE scores. In the case of our data, for example, the resulting Pearson r² values were low, ranging upward from 0.15. To better understand the relationship between MPE scores and grades, we switched to an odds-ratio (probability) analysis. Using historical data, we can compute the frequency of students passing the course with MPE scores in a given range. This cumulative frequency distribution function can be modeled very accurately by a two parameter logistic function. Consequently, instead of using grades as the outcome variable, we compute the probability of passing given an MPE score in a given range. The r² value between the output of this model and the actual historical data is very high. Using this model, we can identify an MPE score such that 70% of the students with this score or higher pass Calculus I. The MPE cutoff score of 22 originated from this analysis.

MPE Scoring Across Years

Using the logistic curve-fitting model, it was determined that a cutoff score of 22 on the 33-item MPE could be used to indicate who would be successful in the beginning calculus class, Calculus I. Psychometrically, the MPE instrument does a good job of discriminating performance on one latent variable (measuring "calculus readiness" or "pre-calculus ability") across item difficulty levels. The instrument was designed to measure the latent variable higher-order algebraic ability, and a cutoff score of 22 indicates a probability of about 0.7 that students majoring in engineering or the sciences will be successful in the gateway calculus class. The following analyses look at the relationship between student grades in Calculus I and measured ability on the Math Placement Exam.

Table 4 provides descriptive statistics for the MPE by year for 15,160 students. Average scores for the MPE are similar across years, but there is a noticeable upward (creeping) trend in the scores. An ANOVA with year as the independent variable and MPE score as the dependent variable produced a statistically significant main effect for year (F(3, 15156) = 32.44, p < .001). Subsequent post hoc testing (Tukey HSD and Bonferroni) indicated that the mean MPE score for 2008 was lower than the mean score for any other year. The mean MPE scores for 2009 and 2010 grouped together (no difference between means), as did the mean scores for 2010 and 2011. Overall, there is evidence that might be interpreted to suggest that MPE scores are increasing over time. What does this mean for using a firm cutoff point for the MPE? First, even though the cohort differences appear to be getting larger, the largest cohort mean difference is only 1.5 questions. Although the difference is statistically significant, it may not be meaningfully significant. Meaningful significance is addressed through the concept of explained or common variance, the effect size.
Partial eta squared for the year main effect was 0.006, or 0.6 of one percent. Knowing a student's MPE score therefore tells us essentially nothing about the student's cohort year. An artifact of large samples is small standard error values, which make small mean differences statistically significant.

Table 4. Descriptive Statistics for the MPE by Year (columns: Year, Mean, N, S.D., with a Total row; the numeric entries were not preserved in this transcription). Note: at the time of this analysis, grades for 32 additional students were paired with MPE scores, increasing the n from 15,128 to 15,160.
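As a concrete illustration of the cutoff analysis described above, the following sketch fits a two-parameter logistic curve to historical pass rates by MPE score and solves for the score at which the predicted pass probability reaches 0.70. The data here are invented placeholders, not the actual Texas A&M records:

```python
import numpy as np
from scipy.optimize import curve_fit

def pass_prob(score, k, x0):
    """Two-parameter logistic: probability of passing Calculus I vs. MPE score."""
    return 1.0 / (1.0 + np.exp(-k * (score - x0)))

# Invented example: observed pass rate for each MPE score bin.
scores = np.arange(10, 34)
observed = 1.0 / (1.0 + np.exp(-0.35 * (scores - 20.5)))  # stand-in data

(k, x0), _ = curve_fit(pass_prob, scores, observed, p0=[0.3, 20.0])

# Smallest MPE score whose predicted pass probability is at least 0.70:
# solving pass_prob(s) = 0.7 gives s = x0 + ln(0.7 / 0.3) / k.
cutoff = x0 + np.log(0.7 / 0.3) / k
print(np.ceil(cutoff))
```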

Although the mean increases are small and not of major concern, there is a trend upward. The creep in scoring, however, makes sense given the evolution of the instrument. Initially, in 2008, students were asked to complete the MPE before beginning their freshman year. At that time, the MPE was in a development phase; since nothing was at stake, students tended not to take the MPE more than once. As MPE scores have become tied to placement in freshman math classes, students are now more aware that low scores on the MPE will prevent them from taking Calculus I as entering freshmen. Since the highest score across repeated administrations of the MPE is the officially registered MPE value, we might expect some upward movement like that seen in the current data set. Students are also made aware, through academic counseling, that poor performance on the MPE reflects relative weakness in a skill area, complex algebra, that is related to poor performance in the four-course engineering math sequence. Students who take time to refresh their algebraic skills may improve their performance on the MPE, thus contributing to the slight elevation in overall performance observed in the data set.

Summary and Conclusions

The purpose of IRT and Rasch modeling is to provide a framework for evaluating how well assessments work and how well individual items on assessments work (Embretson & Reise, 2000). Typically they are used in conjunction with test development, not to test whether assessments are psychometrically sound after development. We built the MPE using subject-matter experts, then used Rasch and IRT modeling to validate the MPE, thus lending support to the product development process. By performing a CFA analysis of the MPE data, we found that we could model the MPE data using a uni-dimensional, single latent variable model. From this information, we were justified in applying an item response theory analysis, which supported a two parameter model; RMSEA and CFI values confirmed the fit of the two parameter model. In addition, we examined the internal consistency of the 33-item MPE and found Cronbach's α values of approximately 0.9, both cumulatively and by year, indicating the high internal consistency of the MPE placement test. Finally, when the relationship between grades in Calculus I and MPE scores is expressed as the probability of passing Calculus I given an MPE score range, the result is an accurate prediction of the success (retention rate) of students in Calculus I.

In summary, results from these analyses, Cronbach's α, CFA, and IRT, attest to the psychometric soundness of the instrument for this sample of students. Moreover, the relationship between grades and MPE scores suggests that knowledge of performance on the MPE can be used to predict the probability that a student will experience difficulties in the first-year engineering calculus sequence. We recognize, however, that what may be a good instrument for measuring the pre-calculus mathematical skills of students entering this particular university may not work well for measuring math readiness at other institutions with differing student populations. Nevertheless, the MPE developed at Texas A&M University for a particular engineering mathematics course provides some measure of confidence that placement exams with similar statistical properties can be developed for other courses at other institutions.
References

Brown, T. A. (2006). Confirmatory factor analysis for applied research. New York, NY: The Guilford Press.

Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. Fort Worth, TX: Harcourt College.

de Ayala, R. J. (2009). The theory and practice of item response theory. New York, NY: The Guilford Press.

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates.

Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.

Muthén, L. K., & Muthén, B. O. (1998–2010). Mplus user's guide (6th ed.). Los Angeles, CA: Muthén & Muthén.

Rasch, G. (1961). On general laws and the meaning of measurement in psychology. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. IV. Berkeley, CA: University of California Press.

Raykov, T., & Marcoulides, G. A. (2011). Introduction to psychometric theory. New York, NY: Routledge.

Acknowledgements

Research supported in part by a grant from the National Science Foundation (NSF-DUE). Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the National Science Foundation.


More information

A Brief Introduction to Bayesian Statistics

A Brief Introduction to Bayesian Statistics A Brief Introduction to Statistics David Kaplan Department of Educational Psychology Methods for Social Policy Research and, Washington, DC 2017 1 / 37 The Reverend Thomas Bayes, 1701 1761 2 / 37 Pierre-Simon

More information

Validating Measures of Self Control via Rasch Measurement. Jonathan Hasford Department of Marketing, University of Kentucky

Validating Measures of Self Control via Rasch Measurement. Jonathan Hasford Department of Marketing, University of Kentucky Validating Measures of Self Control via Rasch Measurement Jonathan Hasford Department of Marketing, University of Kentucky Kelly D. Bradley Department of Educational Policy Studies & Evaluation, University

More information

Effects of the Number of Response Categories on Rating Scales

Effects of the Number of Response Categories on Rating Scales NUMBER OF RESPONSE CATEGORIES 1 Effects of the Number of Response Categories on Rating Scales Roundtable presented at the annual conference of the American Educational Research Association, Vancouver,

More information

Description of components in tailored testing

Description of components in tailored testing Behavior Research Methods & Instrumentation 1977. Vol. 9 (2).153-157 Description of components in tailored testing WAYNE M. PATIENCE University ofmissouri, Columbia, Missouri 65201 The major purpose of

More information

Centre for Education Research and Policy

Centre for Education Research and Policy THE EFFECT OF SAMPLE SIZE ON ITEM PARAMETER ESTIMATION FOR THE PARTIAL CREDIT MODEL ABSTRACT Item Response Theory (IRT) models have been widely used to analyse test data and develop IRT-based tests. An

More information

Analysis of Variance (ANOVA)

Analysis of Variance (ANOVA) Research Methods and Ethics in Psychology Week 4 Analysis of Variance (ANOVA) One Way Independent Groups ANOVA Brief revision of some important concepts To introduce the concept of familywise error rate.

More information

Still important ideas

Still important ideas Readings: OpenStax - Chapters 1 13 & Appendix D & E (online) Plous Chapters 17 & 18 - Chapter 17: Social Influences - Chapter 18: Group Judgments and Decisions Still important ideas Contrast the measurement

More information

Adaptive Testing With the Multi-Unidimensional Pairwise Preference Model Stephen Stark University of South Florida

Adaptive Testing With the Multi-Unidimensional Pairwise Preference Model Stephen Stark University of South Florida Adaptive Testing With the Multi-Unidimensional Pairwise Preference Model Stephen Stark University of South Florida and Oleksandr S. Chernyshenko University of Canterbury Presented at the New CAT Models

More information

Encoding of Elements and Relations of Object Arrangements by Young Children

Encoding of Elements and Relations of Object Arrangements by Young Children Encoding of Elements and Relations of Object Arrangements by Young Children Leslee J. Martin (martin.1103@osu.edu) Department of Psychology & Center for Cognitive Science Ohio State University 216 Lazenby

More information

References. Embretson, S. E. & Reise, S. P. (2000). Item response theory for psychologists. Mahwah,

References. Embretson, S. E. & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, The Western Aphasia Battery (WAB) (Kertesz, 1982) is used to classify aphasia by classical type, measure overall severity, and measure change over time. Despite its near-ubiquitousness, it has significant

More information

linking in educational measurement: Taking differential motivation into account 1

linking in educational measurement: Taking differential motivation into account 1 Selecting a data collection design for linking in educational measurement: Taking differential motivation into account 1 Abstract In educational measurement, multiple test forms are often constructed to

More information

Anumber of studies have shown that ignorance regarding fundamental measurement

Anumber of studies have shown that ignorance regarding fundamental measurement 10.1177/0013164406288165 Educational Graham / Congeneric and Psychological Reliability Measurement Congeneric and (Essentially) Tau-Equivalent Estimates of Score Reliability What They Are and How to Use

More information