Models in Educational Measurement

1 Models in Educational Measurement Jan-Eric Gustafsson Department of Education and Special Education University of Gothenburg

2 Background Measurement in education and psychology has increasingly come to rely on explicitly formulated statistical models. Statistical models offer power, precision and flexibility by focusing on a few essential aspects of the phenomenon under study. But if the model assumptions do not agree with the phenomenon, inferences may be incorrect. Therefore, issues of model fit are centrally important. However, it is not always easy to determine whether a model fits the data, or what the consequences of model misfit are.

3 The Rasch model According to the Rasch model, the probability of a correct response to a test item is a function of the ability of the person and of the difficulty of the item: As the ability of the person increases, the probability of a correct answer increases. As the difficulty of the item increases, the probability of a correct answer decreases. Under certain assumptions, it is possible to estimate the difficulty of items independently of the ability of persons, and to estimate the ability of persons independently of the difficulty of the items.
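The relation described on this slide can be sketched in a few lines of Python. The slide gives no formula, so the logistic form below is the standard textbook statement of the Rasch model, and the names `rasch_probability`, `ability` and `difficulty` are illustrative choices rather than notation from the presentation.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response; both arguments are on the logit scale."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def log_odds(probability: float) -> float:
    """Log of the odds of a correct response."""
    return math.log(probability / (1.0 - probability))


# A person whose ability equals the item difficulty succeeds half of the time.
p_equal = rasch_probability(0.0, 0.0)
# Higher ability raises the probability; higher difficulty lowers it.
p_more_able = rasch_probability(1.0, 0.0)
p_harder_item = rasch_probability(0.0, 1.0)

# Comparing two items (difficulties 0.5 and -0.5): the difference in log-odds
# is the same for a low-ability and a high-ability person.
gap_low = log_odds(rasch_probability(-2.0, -0.5)) - log_odds(rasch_probability(-2.0, 0.5))
gap_high = log_odds(rasch_probability(2.0, -0.5)) - log_odds(rasch_probability(2.0, 0.5))
```

The last two lines hint at why item difficulties can be compared independently of person ability: in this sketch the log-odds gap between the two items does not depend on who answers them.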

4 The Rasch model, cont. Once the difficulty parameters of a set of items have been estimated, the items can be used to construct different tests, which all measure the same ability. Different test-takers can be given different items to measure one and the same ability. This offers great advantages for solving practical measurement problems, such as adaptive testing, horizontal and vertical linking of tests, and constructing matrix-sampling test designs.

5 Does the model fit data? The Rasch model is attractive, because if it fits data, the properties of the model guarantee simple and powerful solutions of practical measurement problems. Assumptions: unidimensionality, and homogeneous discrimination of all items. These assumptions may be tested with statistical tests constructed within the framework of the Rasch model.

6 Gustafsson, J. E. (1980). Testing and obtaining fit of data to the Rasch model. British Journal of Mathematical and Statistical Psychology, 33(2), Two categories of statistical tests: ICC-tests investigate whether item parameters are invariant across subsets of persons (e.g., high scorers vs. low scorers; boys vs. girls); PCC-tests investigate whether ability parameters are invariant across subsets of items. Some results: ICC-tests do not detect multidimensionality. Muthén (1978) analyzed 15 locus of control items with a new method for factor analysis of dichotomous variables and found three weakly correlated factors. However, according to an ICC-test the Rasch model had good fit to these data. The PCC-test supported the conclusion that there were three separate dimensions. The statistical power of the ICC-tests is strongly dependent on sample size and on the heterogeneity of the sample. The Rasch model does not fit speeded tests or tests which allow guessing.

7 Gustafsson, J. E. & Lindblad, T. (1978). The Rasch model for dichotomous items: A solution of the conditional estimation problem for long tests and some thoughts about item screening procedures. Paper presented at the European Conference on Psychometrics and Mathematical Psychology, Uppsala, June 15-17, The Rasch model was used to analyze a test of English grammar for Swedish students. The model had poor fit, which was primarily due to a set of items measuring knowledge of irregular verbs having too high discrimination. In separate analyses, good fit was found for the irregular verb items, as well as for the other items, after some poorly constructed items had been excluded.

8 What to do when model fit is poor? Exclude the offending items: This would have caused unacceptable construct underrepresentation. It also would have been illogical, because too high or too low discrimination typically is not an intrinsic characteristic of an item, but rather depends on whether the other items have similar discrimination or not. Put the problematic items in a separate scale: This would have been impractical, unless we aimed to differentiate between different domains of English grammar. But to do that reliably we would need more items testing irregular verbs. Turn to another, less restrictive model, such as Verhelst's OPLM, which is a Rasch model that allows different but fixed discrimination parameters: This model had not been developed at the time. Keep the items in the test and accept the poor fit: This could imply loss of credibility.

9 Robustness George Box: "Essentially, all models are wrong, but some are useful." Many applications of the Rasch model and other IRT models, such as TIMSS and PISA, define both an overall score and subscores for different domains or processes. This must be a violation of the unidimensionality assumption. Still, this practice seems meaningful and useful. Coefficient α is often described as not being based on any strict assumptions, but the formula is in fact based on the same assumptions as the Rasch model: unidimensionality and homogeneous item discrimination. If the assumptions are violated, α underestimates the reliability. However, even in the presence of large variation in item discrimination, the underestimation is marginal (e.g., Reuterberg & Gustafsson, 1992).
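The claim about coefficient α can be illustrated with a small population-level calculation. This is a sketch under assumptions that are not in the slides: a one-factor model with standardized items, uncorrelated unique parts, and illustrative loadings; the function names are hypothetical. Under that model the reliability of the sum score is the share of its variance due to the common factor, and α can be computed from the same quantities.

```python
def total_variance(loadings):
    """Variance of the sum score under a one-factor model with standardized items."""
    common = sum(loadings) ** 2
    unique = sum(1.0 - loading ** 2 for loading in loadings)
    return common + unique


def coefficient_alpha(loadings):
    """Population coefficient alpha; each standardized item has variance 1."""
    k = len(loadings)
    return k / (k - 1) * (1.0 - k / total_variance(loadings))


def sum_score_reliability(loadings):
    """Share of sum-score variance due to the common factor (assumed model)."""
    return sum(loadings) ** 2 / total_variance(loadings)


# Homogeneous discrimination: alpha equals the reliability of the sum score.
equal_loadings = [0.6, 0.6, 0.6, 0.6]
alpha_equal = coefficient_alpha(equal_loadings)
reliability_equal = sum_score_reliability(equal_loadings)

# Strongly varying discrimination (illustrative values): alpha falls below it.
varying_loadings = [0.3, 0.5, 0.7, 0.9]
alpha_varying = coefficient_alpha(varying_loadings)
reliability_varying = sum_score_reliability(varying_loadings)
underestimation = reliability_varying - alpha_varying
```

With these made-up loadings the gap between α and the model-based reliability stays below 0.05, which is in line with the slide's point that the underestimation is small even when discrimination varies considerably.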

10 Some conclusions It is difficult to assess the fit of the Rasch model. It is even more difficult to develop well-fitting models. There is a risk of conflict between the model requirements and the validity of the test. Use of the model needs to rely on trust in robustness. The capability of the Rasch model to deal with issues of multidimensionality is limited.

11 Dimensionality of cognitive abilities Factor analysis was invented to investigate the dimensionality of variables: Spearman invented factor analysis to test the hypothesis that individual differences in cognition can be captured by a g-factor. Thurstone invented exploratory factor analysis and demonstrated that there are seven primary mental abilities. Followers of Thurstone extended this number to at least 100 primary abilities. Cattell applied factor analysis to correlations among factors to identify second- and third-order factors, and introduced the distinction between Fluid Intelligence (Gf) and Crystallized Intelligence (Gc). Jöreskog developed confirmatory factor analysis and structural equation modeling, allowing flexible and powerful building and testing of latent variable models.

12 Gustafsson, J. E. (1984). A unifying model for the structure of intellectual abilities. Intelligence, 8(3),

13 Does the model hold? The g = Gf relation was replicated in several studies, but far from all. Carroll's (1993) meta-analysis did not replicate the perfect relation, but it showed that Gf was the broad ability most highly related to g.

14 Valentin Kvist, A., & Gustafsson, J. E. (2008). The relation between fluid intelligence and the general factor as a function of cultural background: A test of Cattell's investment theory. Intelligence, 36(5), The correlation between g and Gf was .83 in a heterogeneous group of adults, but correlations of were found within each of the three sub-groups: non-immigrants, European immigrants, and non-European immigrants. Explanation: Gf is a determinant of learning in all domains, along with motivation, effort and opportunity to learn. When everyone has equal opportunity to learn, Gf influences learning and development in all domains, and so it becomes a general factor. If subgroups of persons have had different opportunities to learn certain domains, the generality of Gf breaks down. These results provide general support for Cattell's Investment theory.

15 The Investment theory The Investment theory basically says that Gf is a causal factor in the development of individual differences in learning. If we knew the mechanisms through which Gf influences development of fundamental skills such as decoding and vocabulary we would have a better basis for educational interventions.

16 Methodological aspects of the hierarchical model The standard view of measurement implies that the phenomenon can be described in terms of a set of correlated dimensions, which all are unidimensional. However, some constructs are broad and encompass a very wide range of phenomena (e.g., g), others are broad and encompass wide domains of phenomena (e.g., Gc) while other constructs are narrow and encompass a more limited range of phenomena (e.g., knowledge of irregular verbs). The constructs differ in referent generality. When the unidimensionality requirement is imposed this has implied a focus on constructs with narrow referent generality. Typically it has had the consequence that broad constructs have been splintered into more and more narrow constructs, as happened in the research on cognitive abilities.

17 Gustafsson, J.E. (2002). Measurement from a hierarchical point of view. In H. I. Braun, D. N. Jackson, & D. E. Wiley (Eds.), The role of constructs in psychological and educational measurement (pp ). London: Lawrence Erlbaum Associates, Publishers. Three propositions: To measure constructs with high referent generality it is necessary to use heterogeneous measurement devices. A homogeneous test always measures several dimensions. To measure constructs with low referent generality it is also necessary to measure constructs with high generality.

18 Measurement from a hierarchical point of view, cont. The principle of aggregation: Aggregation causes the general factor to account for a larger proportion of variance in the sum of scores than it does in each observed measure. Each observed variable is complex, but aggregate scores may be essentially unidimensional. Aggregation over broad domains of performance is a way to approximate unidimensionality, so that robust use of the Rasch model and other IRT models may still be possible.
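A small calculation shows how the principle of aggregation works. It is a deliberately simplified sketch and not the hierarchical model from the presentation: every measure is assumed to be standardized, to load equally on the general factor, and to have a remainder that is uncorrelated across measures. The function name and the loading of 0.5 are illustrative.

```python
def general_factor_share(loading: float, n_measures: int) -> float:
    """Proportion of variance in a sum of measures that is due to the general factor."""
    # The general-factor parts add up coherently, so their variance grows with n squared.
    general_variance = (n_measures * loading) ** 2
    # The remaining parts are assumed uncorrelated, so their variance grows only with n.
    remaining_variance = n_measures * (1.0 - loading ** 2)
    return general_variance / (general_variance + remaining_variance)


share_single = general_factor_share(0.5, 1)
share_sum_of_10 = general_factor_share(0.5, 10)
shares_by_size = [general_factor_share(0.5, n) for n in range(1, 21)]
```

In this sketch the general factor accounts for a quarter of the variance in a single measure but for roughly three quarters of the variance in a sum of ten such measures, so the aggregate is much closer to unidimensional than any of its parts.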

19 Grading in Sweden In Sweden grades have always been high-stakes, because they have been the primary instrument for eligibility and selection to the next level of the educational system. Teachers have always been trusted to grade their students. Exams were abolished in the 1960s, and standardized testing has traditionally had a comparatively limited role. Up until the mid-1990s the grading system was norm-referenced, but in 1998 a criterion-referenced grading system was introduced.

20 The norm-referenced grading system The norm-referenced grading system was developed in the 1940s by Frits Wigforss (SOU 1942:11), after it had been observed that the grades used for admission to grammar school (realskola) lacked severely in comparability across schools and teachers. The proposed system specified that grades should be normally distributed in the population, with a specified percentage of the students at each step of the grading scale. So-called standard tests were developed to guide the teachers' grading at the class level. With the introduction of the comprehensive school (grundskola) in 1962, a five-step grading scale (1-5) was introduced, without any pass level.

21 Critique of the norm-referenced grading system The norm-referenced grading system was criticized on many grounds: It inspired competition rather than cooperation. It was unfair to students in different classes ("There are no 5s left"). Because the grade distribution was specified to be Normal (3,1) in the population, the grades could not be used to describe change in levels of knowledge and skills. Along with a curriculum reform in 1994, the norm-referenced grades were abolished, and a criterion-referenced system was introduced.
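To make the Normal (3,1) specification concrete, the expected share of students at each step of the five-step scale can be computed. The cut points halfway between adjacent grades (1.5, 2.5, 3.5, 4.5) are an assumption made for this sketch; the slides do not state how the distribution was mapped to grades.

```python
from statistics import NormalDist

# Population grade distribution as specified on the slide: mean 3, standard deviation 1.
grade_distribution = NormalDist(mu=3.0, sigma=1.0)

# Assumed boundaries between the grades 1|2, 2|3, 3|4 and 4|5.
cut_points = [1.5, 2.5, 3.5, 4.5]

# Cumulative probabilities at the boundaries, padded with 0 and 1 at the ends.
cumulative = [0.0] + [grade_distribution.cdf(cut) for cut in cut_points] + [1.0]

# Share of the population falling in each grade interval.
shares = [upper - lower for lower, upper in zip(cumulative, cumulative[1:])]
percentages = [round(100 * share) for share in shares]
# percentages == [7, 24, 38, 24, 7] for grades 1 to 5
```

Because these shares are fixed by definition, a real improvement in knowledge across the population would leave the grade distribution unchanged, which is the limitation the slide points to.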

22 The criterion-referenced grading system The system which was first put to use in 1998 had a scale with 4 steps: Pass with Special Distinction (MVG), Pass with Distinction (VG), Pass (G) and Fail (IG). In 2011 a new scale with six steps was introduced (A-F), where F = fail. According to the original plans, the number of failed students was expected to be a few percentage points, but the first results showed the percentage of failed students to be much higher (9 %). It has since increased to 14 %. The grading is guided by verbally formulated knowledge requirements for the different steps of the grading scale.

23 Knowledge requirements (partial) for grades /E/C/A/ at the end of year 9 Grade E: Pupils can choose and use basically functional mathematical methods with some adaptation to the context in order to make calculations and solve routine tasks in arithmetic, algebra, geometry, probability, statistics, and also relationships and change with satisfactory results. Grade C: Pupils can choose and use appropriate mathematical methods with relatively good adaptation to the context in order to make calculations and solve routine tasks in arithmetic, algebra, geometry, probability, statistics, and also relationships and change with good results. Grade A: Pupils can choose and use appropriate and effective mathematical methods with good adaptation to the context in order to make calculations and solve routine tasks in arithmetic, algebra, geometry, probability, statistics, and also relationships and change with very good results.

24 Gustafsson, J.-E., Cliffordson, C., & Erickson, G. (2014). Likvärdig kunskapsbedömning i och av den svenska skolan: problem och möjligheter [Equitable knowledge assessment in and of the Swedish school: problems and possibilities]. Stockholm: SNS Förlag. Substantial grade inflation, particularly in upper secondary school. Considerable variation in grading practices among teachers and schools. Instability in the national tests across years and subjects. These problems, along with several others, seem to be due to the lack of precision in the verbally formulated knowledge requirements for the different steps on the grading scale. Wigforss (1942) concluded that it is not possible to achieve sufficient comparability in grading based on verbally formulated criteria, which was why he developed the norm-referenced grading system.

25 Olsen, R.V., & Nilsen, T. (in press). Standard setting in PISA and TIMSS. In S. Blömeke & J.E. Gustafsson (Eds.), Standard Setting in Education - The Nordic Countries in an International Perspective. New York: Springer Publishing. The authors compare and discuss similarities and differences in the way PISA and TIMSS set and formulate descriptions of standards or do scale anchoring (International Benchmarks in TIMSS based on a curriculum model; Proficiency Levels in PISA based on a competence model). Focus on the empirical basis for development of performance-level descriptors. Their interest in standard setting and performance-level descriptions was partially driven by the fact that the Norwegian grading system has problems of comparability in grading.

26 A generic example of an item map (from Olsen & Nilsen, in press) Decide on the number and location of cut-scores to be used. Develop Performance-Level Descriptors (PLDs) based on descriptions of the clusters of items identified and on the general description of the construct stated in the framework. This typically requires lots of items, given that the PLDs should not be formulated in item-specific terms.

27 TIMSS International Benchmarks (partial) Low (400): Students have some knowledge of whole numbers and decimals, operations, and basic graphs. Intermediate (475): Students can solve problems involving decimals, fractions, proportions, and percentages in a variety of settings. For example, they can determine proportions of a whole in order to construct pie charts and calculate unit prices to solve a problem. High (550): Students can use information from several sources to solve problems involving different types of numbers and operations. Students can relate fractions, decimals, and percents to each other. They can solve problems with fractions, proportions, and percentages. Students show understanding of whole number exponents. They can identify the prime factorization of a given number. Advanced (625): Students can solve a variety of fraction, proportion, and percent problems and justify their conclusions. They can reason with different types of numbers, including whole numbers, negative numbers, fractions, and percentages in abstract and non-routine situations. For example, given two points on a number line representing unspecified fractions, students can identify the point that represents their product.
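Once cut-scores are fixed, placing a scale score at a benchmark is a simple lookup. The sketch below uses the four cut-scores listed on this slide; the function name, the label "Below Low" for scores under 400, and the convention that a score equal to a cut-score reaches that benchmark are assumptions made for illustration.

```python
import bisect

# Cut-scores and labels as listed on the slide.
BENCHMARKS = [(400, "Low"), (475, "Intermediate"), (550, "High"), (625, "Advanced")]
CUT_SCORES = [cut for cut, _ in BENCHMARKS]


def benchmark_reached(scale_score: float) -> str:
    """Return the highest benchmark whose cut-score the scale score meets or exceeds."""
    # bisect_right counts how many cut-scores are less than or equal to the score.
    n_reached = bisect.bisect_right(CUT_SCORES, scale_score)
    if n_reached == 0:
        return "Below Low"  # hypothetical label for scores under the lowest cut-score
    return BENCHMARKS[n_reached - 1][1]


example_levels = [benchmark_reached(score) for score in (399.9, 400, 549, 625, 700)]
```

The lookup itself is trivial; the substantive work lies in deciding where the cut-scores go and in writing descriptors that hold for every item clustered around each one.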

28 PLDs and grading Empirically based PLDs could potentially provide a more stable foundation to support classroom-based assessment and criterion-referenced grading than the currently used knowledge requirements, if formulated at an appropriate level of abstraction. Linking national tests to PISA, TIMSS and the other international studies could provide a broader basis for constructing PLDs. Dimensionality?

29 Individual differences versus development of competence The PLDs could, perhaps, be developed into empirically and theoretically based descriptions of learning trajectories, which could inform curricula, instruction and assessment. The classical measurement models focus on individual differences: The notions of dimensionality, discrimination, reliability and validity are defined in terms of variance and covariance, and their application requires that the population of persons is defined. These models have limited applicability for the study of individual growth. The main aim of education is to support development of competence, so in educational measurement issues of development should be a central concern. The tensions between differential and developmental psychology illustrate the difficulty of integrating research on individual differences and development. However, progress has lately been made through growth curve modeling and applications of IRT to solve measurement problems. Hopefully we will see more integration of differential and developmental approaches in the future.


More information

Numerical Integration of Bivariate Gaussian Distribution

Numerical Integration of Bivariate Gaussian Distribution Numerical Integration of Bivariate Gaussian Distribution S. H. Derakhshan and C. V. Deutsch The bivariate normal distribution arises in many geostatistical applications as most geostatistical techniques

More information

Connexion of Item Response Theory to Decision Making in Chess. Presented by Tamal Biswas Research Advised by Dr. Kenneth Regan

Connexion of Item Response Theory to Decision Making in Chess. Presented by Tamal Biswas Research Advised by Dr. Kenneth Regan Connexion of Item Response Theory to Decision Making in Chess Presented by Tamal Biswas Research Advised by Dr. Kenneth Regan Acknowledgement A few Slides have been taken from the following presentation

More information

Measuring mathematics anxiety: Paper 2 - Constructing and validating the measure. Rob Cavanagh Len Sparrow Curtin University

Measuring mathematics anxiety: Paper 2 - Constructing and validating the measure. Rob Cavanagh Len Sparrow Curtin University Measuring mathematics anxiety: Paper 2 - Constructing and validating the measure Rob Cavanagh Len Sparrow Curtin University R.Cavanagh@curtin.edu.au Abstract The study sought to measure mathematics anxiety

More information

Validating Measures of Self Control via Rasch Measurement. Jonathan Hasford Department of Marketing, University of Kentucky

Validating Measures of Self Control via Rasch Measurement. Jonathan Hasford Department of Marketing, University of Kentucky Validating Measures of Self Control via Rasch Measurement Jonathan Hasford Department of Marketing, University of Kentucky Kelly D. Bradley Department of Educational Policy Studies & Evaluation, University

More information

Methodological Issues in Measuring the Development of Character

Methodological Issues in Measuring the Development of Character Methodological Issues in Measuring the Development of Character Noel A. Card Department of Human Development and Family Studies College of Liberal Arts and Sciences Supported by a grant from the John Templeton

More information
