Evaluating a Special Education Teacher Observation Tool through G-theory and MFRM Analysis

Size: px
Start display at page:

Download "Evaluating a Special Education Teacher Observation Tool through G-theory and MFRM Analysis"

Transcription

1 Evaluating a Special Education Teacher Observation Tool through G-theory and MFRM Analysis Project RESET, Boise State University Evelyn Johnson, Angie Crawford, Yuzhu Zheng, & Laura Moylan NCME Annual Conference San Antonio, April 2017

2 RESET Purpose - Create a special education teacher (SET) observation tool that is: 1) aligned to evidence-based instructional practices (EBPs) 2)provides feedback to SETs to improve their implementation of EBPs 3)is connected to student outcomes

3 Evidence-Centered Design* *Mislevy, Steinberg & Almond, 2003

4 RESET: Conceptual Framework 1. Instructional Delivery Subscale Explicit Instruction Cognitive Strategy Instruction Peer-Mediated Instruction 2. Content Subscale Reading Math Writing 3. Individualization Subscale

5 Assessment Development To create the items for each rubric we: conducted a review of research on that practice synthesized across the common elements of the EBP across studies drafted rubric items that capture the common elements of that EBP

6 Explicit Instruction Rubric 3 - Implemented 2 - Partially Implemented 1 - Not Implemented 1 The goals of the lesson are clearly communicated to the students. The goals of the lesson are not clearly communicated to the students. The goals of the lesson are not communicated to the students. 2 The stated goal is specific. The stated goal is broad or vague. There is no stated goal. 3 The teacher clearly explains the relevance of the stated goal to the students. The teacher tries to explain the relevance of the stated goal to the students, but the explanation is unclear or lacks detail. The teacher does not explain the relevance of the stated goal to the students. 4 Instruction is completely aligned to the stated or implied goal. 5 All of the examples or materials selected are aligned to the stated or implied goal. Instruction is partially or loosely aligned to the stated or implied goal. Some of the examples or materials are aligned to the stated or implied goal; OR examples and materials are somewhat aligned to the stated or implied goal. Instruction is not aligned to the stated or implied goal. Examples or materials selected are not aligned to the stated or implied goal.

7 Assessment Delivery (Validation) There are numerous sources of error that influence the precision of observation results (Hill, Charalambous, & Kraft, 2012 ). Generalizability studies are useful for understanding the relative importance of various sources of error to assist in the design of more efficient measurement procedures (Brennan, 1992; Shavelson & Webb, 1991).

8 G-study purpose to estimate the variance attributable to different facets of measurement: teachers, lessons (occasions), raters, and items

9 Observation and Estimation Design Three facet, mixed model, partially nested design EduG v. 6.1 software Measurement Design: TL/RI

10 G-Study Table Grand mean for levels used: Variance error of the mean for levels used: Standard error of the grand mean: Minimum threshold (Nunnaly & Bernstein, 1994)

11 Analysis of Variance Source Variance SE % Teachers (T) Lessons (L) nested within (:) T Raters (R) Items (I) TR TI LR:T LI:T RI TRI LRI:T, Error

12 G-theory views the data largely from a group-level perspective - disentangling the sources of measurement error and estimating their magnitude - Many-Facets Rasch Measurement MFRM analysis focuses on individual level information and allows for substantive investigation into the individual element of the facets under consideration

13 Measr -Items +Teacher -Lesson -Raters Scale 2 (3) 3 Teacher 9 Teacher 2 Teacher 6 13 Teacher Teacher 7 Teacher , 8 Teacher 4 4 2, 9, 7 24 Teacher 1 Teacher 3 15, Teacher ,22 4, 17, , Figure 1. Variable map of the RESET facets items, teachers, lessons and raters.

14 Item Measure Report Item Number Difficulty (Logits) Model SE Infit MNSQ Outfit MNSQ Mean SD

15 Teacher Measure Report from Many-Facet Rasch Measurement Analysis Teacher Number Ability (Logits) Model SE Infit MNSQ Outfit MNSQ Mean (count = 10) SD Note. Root mean square error (model) =.08; adjusted SD =.61; separation = 7.39; reliability =.98; fixed chi-square = 492.7; df = 9; significance =.00.

16 Lesson Measure Report from Many- Facet Rasch Measurement Analysis Lesson Number Difficulty (Logits) Model SE Infit MNSQ Outfit MNSQ Mean (count = 4) SD Note. Root mean square error (model) =.05; adjusted SD =.19; separation = 3.72; reliability =.93; fixed chi-square = 43.4; df = 3; significance =.00.

17 Rater Measure Report from Many- Facet Rasch Measurement Analysis Rater Number Severity (Logits) Model SE Infit MNSQ Outfit MNSQ Mean (count = 4) SD Note. Root mean square error (model) =.05; adjusted SD =.47; separation = 9.07; reliability =.98; fixed chi-square = 232.7; df = 3; significance =.00.

18 THANK YOU

RATER EFFECTS AND ALIGNMENT 1. Modeling Rater Effects in a Formative Mathematics Alignment Study

RATER EFFECTS AND ALIGNMENT 1. Modeling Rater Effects in a Formative Mathematics Alignment Study RATER EFFECTS AND ALIGNMENT 1 Modeling Rater Effects in a Formative Mathematics Alignment Study An integrated assessment system considers the alignment of both summative and formative assessments with

More information

Validation of an Analytic Rating Scale for Writing: A Rasch Modeling Approach

Validation of an Analytic Rating Scale for Writing: A Rasch Modeling Approach Tabaran Institute of Higher Education ISSN 2251-7324 Iranian Journal of Language Testing Vol. 3, No. 1, March 2013 Received: Feb14, 2013 Accepted: March 7, 2013 Validation of an Analytic Rating Scale for

More information

Using Generalizability Theory to Investigate the Psychometric Property of an Assessment Center in Indonesia

Using Generalizability Theory to Investigate the Psychometric Property of an Assessment Center in Indonesia Using Generalizability Theory to Investigate the Psychometric Property of an Assessment Center in Indonesia Urip Purwono received his Ph.D. (psychology) from the University of Massachusetts at Amherst,

More information

The Impact of Statistically Adjusting for Rater Effects on Conditional Standard Errors for Performance Ratings

The Impact of Statistically Adjusting for Rater Effects on Conditional Standard Errors for Performance Ratings 0 The Impact of Statistically Adjusting for Rater Effects on Conditional Standard Errors for Performance Ratings Mark R. Raymond, Polina Harik and Brain E. Clauser National Board of Medical Examiners 1

More information

Investigating the Reliability of Classroom Observation Protocols: The Case of PLATO. M. Ken Cor Stanford University School of Education.

Investigating the Reliability of Classroom Observation Protocols: The Case of PLATO. M. Ken Cor Stanford University School of Education. The Reliability of PLATO Running Head: THE RELIABILTY OF PLATO Investigating the Reliability of Classroom Observation Protocols: The Case of PLATO M. Ken Cor Stanford University School of Education April,

More information

GENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS

GENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS GENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS Michael J. Kolen The University of Iowa March 2011 Commissioned by the Center for K 12 Assessment & Performance Management at

More information

Examining Factors Affecting Language Performance: A Comparison of Three Measurement Approaches

Examining Factors Affecting Language Performance: A Comparison of Three Measurement Approaches Pertanika J. Soc. Sci. & Hum. 21 (3): 1149-1162 (2013) SOCIAL SCIENCES & HUMANITIES Journal homepage: http://www.pertanika.upm.edu.my/ Examining Factors Affecting Language Performance: A Comparison of

More information

Examining the Validity of an Essay Writing Test Using Rasch Analysis

Examining the Validity of an Essay Writing Test Using Rasch Analysis Secondary English Education, 5(2) Examining the Validity of an Essay Writing Test Using Rasch Analysis Taejoon Park (KICE) Park, Taejoon. (2012). Examining the validity of an essay writing test using Rasch

More information

A Comparison of Rubrics and Graded Category Rating Scales with Various Methods Regarding Raters Reliability

A Comparison of Rubrics and Graded Category Rating Scales with Various Methods Regarding Raters Reliability KURAM VE UYGULAMADA EĞİTİM BİLİMLERİ EDUCATIONAL SCIENCES: THEORY & PRACTICE Received: May 13, 2016 Revision received: December 6, 2016 Accepted: January 23, 2017 OnlineFirst: February 28, 2017 Copyright

More information

Author s response to reviews

Author s response to reviews Author s response to reviews Title: The validity of a professional competence tool for physiotherapy students in simulationbased clinical education: a Rasch analysis Authors: Belinda Judd (belinda.judd@sydney.edu.au)

More information

R E S E A R C H R E P O R T

R E S E A R C H R E P O R T RR-00-13 R E S E A R C H R E P O R T INVESTIGATING ASSESSOR EFFECTS IN NATIONAL BOARD FOR PROFESSIONAL TEACHING STANDARDS ASSESSMENTS FOR EARLY CHILDHOOD/GENERALIST AND MIDDLE CHILDHOOD/GENERALIST CERTIFICATION

More information

Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz

Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz This study presents the steps Edgenuity uses to evaluate the reliability and validity of its quizzes, topic tests, and cumulative

More information

On the Construct Validity of an Analytic Rating Scale for Speaking Assessment

On the Construct Validity of an Analytic Rating Scale for Speaking Assessment On the Construct Validity of an Analytic Rating Scale for Speaking Assessment Chunguang Tian 1,2,* 1 Foreign Languages Department, Binzhou University, Binzhou, P.R. China 2 English Education Department,

More information

Using Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items

Using Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items University of Wisconsin Milwaukee UWM Digital Commons Theses and Dissertations May 215 Using Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items Tamara Beth

More information

Understanding the influence of different cycles of an OSCE exam on students scores using Many Facet Rasch Modelling.

Understanding the influence of different cycles of an OSCE exam on students scores using Many Facet Rasch Modelling. Hawks, Doves and Rasch decisions Understanding the influence of different cycles of an OSCE exam on students scores using Many Facet Rasch Modelling. Authors: Peter Yeates Stefanie S. Sebok-Syer Corresponding

More information

Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys

Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Jill F. Kilanowski, PhD, APRN,CPNP Associate Professor Alpha Zeta & Mu Chi Acknowledgements Dr. Li Lin,

More information

RATER EFFECTS & ALIGNMENT 1

RATER EFFECTS & ALIGNMENT 1 RATER EFFECTS & ALIGNMENT 1 Running head: RATER EFFECTS & ALIGNMENT Modeling Rater Effects in a Formative Mathematics Alignment Study Daniel Anderson, P. Shawn Irvin, Julie Alonzo, and Gerald Tindal University

More information

Published online: 16 Nov 2012.

Published online: 16 Nov 2012. This article was downloaded by: [National Taiwan Normal University] On: 12 September 2014, At: 01:02 Publisher: Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered

More information

Introduction. 1.1 Facets of Measurement

Introduction. 1.1 Facets of Measurement 1 Introduction This chapter introduces the basic idea of many-facet Rasch measurement. Three examples of assessment procedures taken from the field of language testing illustrate its context of application.

More information

A Comparison of Traditional and IRT based Item Quality Criteria

A Comparison of Traditional and IRT based Item Quality Criteria A Comparison of Traditional and IRT based Item Quality Criteria Brian D. Bontempo, Ph.D. Mountain ment, Inc. Jerry Gorham, Ph.D. Pearson VUE April 7, 2006 A paper presented at the Annual Meeting of the

More information

Reliability and Validity of a Task-based Writing Performance Assessment for Japanese Learners of English

Reliability and Validity of a Task-based Writing Performance Assessment for Japanese Learners of English Reliability and Validity of a Task-based Writing Performance Assessment for Japanese Learners of English Yoshihito SUGITA Yamanashi Prefectural University Abstract This article examines the main data of

More information

Math Released Item Grade 3. Find the Area and Identify Equal Areas 1749-M23082

Math Released Item Grade 3. Find the Area and Identify Equal Areas 1749-M23082 Math Released Item 2018 Grade 3 Find the Area and Identify Equal Areas 1749-M23082 Anchor Set A1 A6 With Annotations Prompt 1749-M23082 Rubric Part A Score Description 1 This part of the item is machine

More information

A Many-facet Rasch Model to Detect Halo Effect in Three Types of Raters

A Many-facet Rasch Model to Detect Halo Effect in Three Types of Raters ISSN 1799-2591 Theory and Practice in Language Studies, Vol. 1, No. 11, pp. 1531-1540, November 2011 Manufactured in Finland. doi:10.4304/tpls.1.11.1531-1540 A Many-facet Rasch Model to Detect Halo Effect

More information

Construct Invariance of the Survey of Knowledge of Internet Risk and Internet Behavior Knowledge Scale

Construct Invariance of the Survey of Knowledge of Internet Risk and Internet Behavior Knowledge Scale University of Connecticut DigitalCommons@UConn NERA Conference Proceedings 2010 Northeastern Educational Research Association (NERA) Annual Conference Fall 10-20-2010 Construct Invariance of the Survey

More information

Validating Measures of Self Control via Rasch Measurement. Jonathan Hasford Department of Marketing, University of Kentucky

Validating Measures of Self Control via Rasch Measurement. Jonathan Hasford Department of Marketing, University of Kentucky Validating Measures of Self Control via Rasch Measurement Jonathan Hasford Department of Marketing, University of Kentucky Kelly D. Bradley Department of Educational Policy Studies & Evaluation, University

More information

The following is an example from the CCRSA:

The following is an example from the CCRSA: Communication skills and the confidence to utilize those skills substantially impact the quality of life of individuals with aphasia, who are prone to isolation and exclusion given their difficulty with

More information

A tale of two models: Psychometric and cognitive perspectives on rater-mediated assessments using accuracy ratings

A tale of two models: Psychometric and cognitive perspectives on rater-mediated assessments using accuracy ratings Psychological Test and Assessment Modeling, Volume 60, 2018 (1), 33-52 A tale of two models: Psychometric and cognitive perspectives on rater-mediated assessments using accuracy ratings George Engelhard,

More information

World Academy of Science, Engineering and Technology International Journal of Psychological and Behavioral Sciences Vol:8, No:1, 2014

World Academy of Science, Engineering and Technology International Journal of Psychological and Behavioral Sciences Vol:8, No:1, 2014 Validity and Reliability of Competency Assessment Implementation (CAI) Instrument Using Rasch Model Nurfirdawati Muhamad Hanafi, Azmanirah Ab Rahman, Marina Ibrahim Mukhtar, Jamil Ahmad, Sarebah Warman

More information

Key words: classical test theory; many-facet Rasch measurement; reliability; bias analysis

Key words: classical test theory; many-facet Rasch measurement; reliability; bias analysis 2010 年 4 月中国应用语言学 ( 双月刊 ) Apr. 2010 第 33 卷第 2 期 Chinese Journal of Applied Linguistics (Bimonthly) Vol. 33 No. 2 An Application of Classical Test Theory and Manyfacet Rasch Measurement in Analyzing the

More information

Released Test Answer and Alignment Document Mathematics Grade 3 Spring 2018

Released Test Answer and Alignment Document Mathematics Grade 3 Spring 2018 Released Test Answer and Alignment Document Mathematics Grade 3 Spring 2018 Item Number Answer Key Evidence Statement Key 1. 3.OA.1 OR 2. 816 3.NBT.2 3. 26 3.MD.1-1 4. Part A: See Rubric Part B: See Rubric

More information

RESEARCH ARTICLES. Brian E. Clauser, Polina Harik, and Melissa J. Margolis National Board of Medical Examiners

RESEARCH ARTICLES. Brian E. Clauser, Polina Harik, and Melissa J. Margolis National Board of Medical Examiners APPLIED MEASUREMENT IN EDUCATION, 22: 1 21, 2009 Copyright Taylor & Francis Group, LLC ISSN: 0895-7347 print / 1532-4818 online DOI: 10.1080/08957340802558318 HAME 0895-7347 1532-4818 Applied Measurement

More information

THE COURSE EXPERIENCE QUESTIONNAIRE: A RASCH MEASUREMENT MODEL ANALYSIS

THE COURSE EXPERIENCE QUESTIONNAIRE: A RASCH MEASUREMENT MODEL ANALYSIS THE COURSE EXPERIENCE QUESTIONNAIRE: A RASCH MEASUREMENT MODEL ANALYSIS Russell F. Waugh Edith Cowan University Key words: attitudes, graduates, university, measurement Running head: COURSE EXPERIENCE

More information

Pediatrics Milestones and Meaningful Assessment Translating the Pediatrics Milestones into Assessment Items for use in the Clinical Setting

Pediatrics Milestones and Meaningful Assessment Translating the Pediatrics Milestones into Assessment Items for use in the Clinical Setting Pediatrics Milestones and Meaningful Assessment Translating the Pediatrics Milestones into Assessment Items for use in the Clinical Setting Ann Burke Susan Guralnick Patty Hicks Jeanine Ronan Dan Schumacher

More information

The MHSIP: A Tale of Three Centers

The MHSIP: A Tale of Three Centers The MHSIP: A Tale of Three Centers P. Antonio Olmos-Gallo, Ph.D. Kathryn DeRoche, M.A. Mental Health Center of Denver Richard Swanson, Ph.D., J.D. Aurora Research Institute John Mahalik, Ph.D., M.P.A.

More information

Validation the Measures of Self-Directed Learning: Evidence from Confirmatory Factor Analysis and Multidimensional Item Response Analysis

Validation the Measures of Self-Directed Learning: Evidence from Confirmatory Factor Analysis and Multidimensional Item Response Analysis Doi:10.5901/mjss.2015.v6n4p579 Abstract Validation the Measures of Self-Directed Learning: Evidence from Confirmatory Factor Analysis and Multidimensional Item Response Analysis Chaiwichit Chianchana Faculty

More information

Research Analysis MICHAEL BERNSTEIN CS 376

Research Analysis MICHAEL BERNSTEIN CS 376 Research Analysis MICHAEL BERNSTEIN CS 376 Last time What is a statistical test? Chi-square t-test Paired t-test 2 Today ANOVA Posthoc tests Two-way ANOVA Repeated measures ANOVA 3 Recall: hypothesis testing

More information

A Rasch Analysis of the Statistical Anxiety Rating Scale

A Rasch Analysis of the Statistical Anxiety Rating Scale University of Wyoming From the SelectedWorks of Eric D Teman, J.D., Ph.D. 2013 A Rasch Analysis of the Statistical Anxiety Rating Scale Eric D Teman, Ph.D., University of Wyoming Available at: https://works.bepress.com/ericteman/5/

More information

Methodological Issues in Measuring the Development of Character

Methodological Issues in Measuring the Development of Character Methodological Issues in Measuring the Development of Character Noel A. Card Department of Human Development and Family Studies College of Liberal Arts and Sciences Supported by a grant from the John Templeton

More information

Measuring change in training programs: An empirical illustration

Measuring change in training programs: An empirical illustration Psychology Science Quarterly, Volume 50, 2008 (3), pp. 433-447 Measuring change in training programs: An empirical illustration RENATO MICELI 1, MICHELE SETTANNI 1 & GIULIO VIDOTTO 2 Abstract The implementation

More information

Measuring measuring: Toward a theory of proficiency with the Constructing Measures framework

Measuring measuring: Toward a theory of proficiency with the Constructing Measures framework San Jose State University SJSU ScholarWorks Faculty Publications Secondary Education 1-1-2009 Measuring measuring: Toward a theory of proficiency with the Constructing Measures framework Brent M. Duckor

More information

REPORT. Technical Report: Item Characteristics. Jessica Masters

REPORT. Technical Report: Item Characteristics. Jessica Masters August 2010 REPORT Diagnostic Geometry Assessment Project Technical Report: Item Characteristics Jessica Masters Technology and Assessment Study Collaborative Lynch School of Education Boston College Chestnut

More information

Research. Reports. Monitoring Sources of Variability Within the Test of Spoken English Assessment System. Carol M. Myford Edward W.

Research. Reports. Monitoring Sources of Variability Within the Test of Spoken English Assessment System. Carol M. Myford Edward W. Research TEST OF ENGLISH AS A FOREIGN LANGUAGE Reports REPORT 65 JUNE 2000 Monitoring Sources of Variability Within the Test of Spoken English Assessment System Carol M. Myford Edward W. Wolfe Monitoring

More information

Instrument equivalence across ethnic groups. Antonio Olmos (MHCD) Susan R. Hutchinson (UNC)

Instrument equivalence across ethnic groups. Antonio Olmos (MHCD) Susan R. Hutchinson (UNC) Instrument equivalence across ethnic groups Antonio Olmos (MHCD) Susan R. Hutchinson (UNC) Overview Instrument Equivalence Measurement Invariance Invariance in Reliability Scores Factorial Invariance Item

More information

Using Direct Behavior Ratings in a Middle School Setting

Using Direct Behavior Ratings in a Middle School Setting Using Direct Behavior Ratings in a Middle School Setting P R E S E N T E R S : R I V K A H R O S E N, U N I V E R S I T Y O F C O N N E C T I C U T N I C H O L A S C R O V E L L O, U N I V E R S I T Y

More information

The Functional Outcome Questionnaire- Aphasia (FOQ-A) is a conceptually-driven

The Functional Outcome Questionnaire- Aphasia (FOQ-A) is a conceptually-driven Introduction The Functional Outcome Questionnaire- Aphasia (FOQ-A) is a conceptually-driven outcome measure that was developed to address the growing need for an ecologically valid functional communication

More information

Constructing a described achievement scale for the International Civic and Citizenship Education Study

Constructing a described achievement scale for the International Civic and Citizenship Education Study Australian Council for Educational Research ACEReSearch Civics and Citizenship Assessment National and International Surveys 9-2008 Constructing a described achievement scale for the International Civic

More information

Proceedings of the 2011 International Conference on Teaching, Learning and Change (c) International Association for Teaching and Learning (IATEL)

Proceedings of the 2011 International Conference on Teaching, Learning and Change (c) International Association for Teaching and Learning (IATEL) EVALUATION OF MATHEMATICS ACHIEVEMENT TEST: A COMPARISON BETWEEN CLASSICAL TEST THEORY (CTT)AND ITEM RESPONSE THEORY (IRT) Eluwa, O. Idowu 1, Akubuike N. Eluwa 2 and Bekom K. Abang 3 1& 3 Dept of Educational

More information

leadership REPORT Sam Sample Other Raters (3), Family/Friends (3), Direct Reports (3), Peers (4), and Manager (3)

leadership REPORT Sam Sample Other Raters (3), Family/Friends (3), Direct Reports (3), Peers (4), and Manager (3) Coach leadership EQ 360 REPORT Sam Sample Other Raters (3), Family/Friends (3), Direct Reports (3), Peers (4), and Manager (3) Sample Report Multi-Health Systems Inc. December 05, 2014 Copyright 2014 Multi-Health

More information

Emmanuel Kuntsche, Ph.D.

Emmanuel Kuntsche, Ph.D. Family, peers, school, and culture Social and environmental factors of adolescent substance use and problem behaviors Emmanuel Kuntsche, Ph.D. s, Research Department, Lausanne Who I am / who we are...

More information

Revisiting the Halo Effect Within a Multitrait, Multimethod Framework

Revisiting the Halo Effect Within a Multitrait, Multimethod Framework Revisiting the Halo Effect Within a Multitrait, Multimethod Framework Annual meeting of the American Educational Research Association San Francisco, CA Emily R. Lai Edward W. Wolfe Daisy H. Vickers April,

More information

Basic concepts and principles of classical test theory

Basic concepts and principles of classical test theory Basic concepts and principles of classical test theory Jan-Eric Gustafsson What is measurement? Assignment of numbers to aspects of individuals according to some rule. The aspect which is measured must

More information

The Psychometric Development Process of Recovery Measures and Markers: Classical Test Theory and Item Response Theory

The Psychometric Development Process of Recovery Measures and Markers: Classical Test Theory and Item Response Theory The Psychometric Development Process of Recovery Measures and Markers: Classical Test Theory and Item Response Theory Kate DeRoche, M.A. Mental Health Center of Denver Antonio Olmos, Ph.D. Mental Health

More information

Alignment in Educational Testing: What it is, What it isn t, and Why it is Important

Alignment in Educational Testing: What it is, What it isn t, and Why it is Important Alignment in Educational Testing: What it is, What it isn t, and Why it is Important Stephen G. Sireci University of Massachusetts Amherst Presentation delivered at the Connecticut Assessment Forum Rocky

More information

Brent Duckor Ph.D. (SJSU) Kip Tellez, Ph.D. (UCSC) BEAR Seminar April 22, 2014

Brent Duckor Ph.D. (SJSU) Kip Tellez, Ph.D. (UCSC) BEAR Seminar April 22, 2014 Brent Duckor Ph.D. (SJSU) Kip Tellez, Ph.D. (UCSC) BEAR Seminar April 22, 2014 Studies under review ELA event Mathematics event Duckor, B., Castellano, K., Téllez, K., & Wilson, M. (2013, April). Validating

More information

An Alternative to the Trend Scoring Method for Adjusting Scoring Shifts. in Mixed-Format Tests. Xuan Tan. Sooyeon Kim. Insu Paek.

An Alternative to the Trend Scoring Method for Adjusting Scoring Shifts. in Mixed-Format Tests. Xuan Tan. Sooyeon Kim. Insu Paek. An Alternative to the Trend Scoring Method for Adjusting Scoring Shifts in Mixed-Format Tests Xuan Tan Sooyeon Kim Insu Paek Bihua Xiang ETS, Princeton, NJ Paper presented at the annual meeting of the

More information

Research Questions and Survey Development

Research Questions and Survey Development Research Questions and Survey Development R. Eric Heidel, PhD Associate Professor of Biostatistics Department of Surgery University of Tennessee Graduate School of Medicine Research Questions 1 Research

More information

LAPORAN AKHIR PROJEK PENYELIDIKAN TAHUN AKHIR (PENILAI) FINAL REPORT OF FINAL YEAR PROJECT (EXAMINER)

LAPORAN AKHIR PROJEK PENYELIDIKAN TAHUN AKHIR (PENILAI) FINAL REPORT OF FINAL YEAR PROJECT (EXAMINER) UMK/FSB/FYP-R-C2-EX (EDITION 2017) LAPORAN AKHIR PROJEK PENYELIDIKAN TAHUN AKHIR (PENILAI) FINAL REPORT OF FINAL YEAR PROJECT (EXAMINER) PART A (FINAL REPORT): (30.0 %) Title Category Excellent (5) Informative,

More information

Hanne Søberg Finbråten 1,2*, Bodil Wilde-Larsson 2,3, Gun Nordström 3, Kjell Sverre Pettersen 4, Anne Trollvik 3 and Øystein Guttersrud 5

Hanne Søberg Finbråten 1,2*, Bodil Wilde-Larsson 2,3, Gun Nordström 3, Kjell Sverre Pettersen 4, Anne Trollvik 3 and Øystein Guttersrud 5 Finbråten et al. BMC Health Services Research (2018) 18:506 https://doi.org/10.1186/s12913-018-3275-7 RESEARCH ARTICLE Open Access Establishing the HLS-Q12 short version of the European Health Literacy

More information

The Youth Experience Survey 2.0: Instrument Revisions and Validity Testing* David M. Hansen 1 University of Illinois, Urbana-Champaign

The Youth Experience Survey 2.0: Instrument Revisions and Validity Testing* David M. Hansen 1 University of Illinois, Urbana-Champaign The Youth Experience Survey 2.0: Instrument Revisions and Validity Testing* David M. Hansen 1 University of Illinois, Urbana-Champaign Reed Larson 2 University of Illinois, Urbana-Champaign February 28,

More information

Running head: PRELIM KSVS SCALES 1

Running head: PRELIM KSVS SCALES 1 Running head: PRELIM KSVS SCALES 1 Psychometric Examination of a Risk Perception Scale for Evaluation Anthony P. Setari*, Kelly D. Bradley*, Marjorie L. Stanek**, & Shannon O. Sampson* *University of Kentucky

More information

Construct Validity of Mathematics Test Items Using the Rasch Model

Construct Validity of Mathematics Test Items Using the Rasch Model Construct Validity of Mathematics Test Items Using the Rasch Model ALIYU, R.TAIWO Department of Guidance and Counselling (Measurement and Evaluation Units) Faculty of Education, Delta State University,

More information

Clarence D. Kreiter, PhD. University of Iowa, Carver College of Medicine

Clarence D. Kreiter, PhD. University of Iowa, Carver College of Medicine Clarence D. Kreiter, PhD University of Tokyo, IRCME University of Iowa, Carver College of Medicine Three Broad Areas in Medical Education Assessment 1. Clinical Knowledge 2. Clinical Skills (technical

More information

Introduction to Multilevel Models for Longitudinal and Repeated Measures Data

Introduction to Multilevel Models for Longitudinal and Repeated Measures Data Introduction to Multilevel Models for Longitudinal and Repeated Measures Data Today s Class: Features of longitudinal data Features of longitudinal models What can MLM do for you? What to expect in this

More information

PSYC 3018 Abnormal Psychology

PSYC 3018 Abnormal Psychology PSYC 3018 Abnormal Psychology Unit of Study Code: Coordinator: Other Teaching Staff: PSYC3018 Dr Marianna Szabó Office: 417 Brennan-MacCallum Building Phone: 9351 5147 E-mail: Marianna.szabo@sydney.edu.au

More information

Evaluating the quality of analytic ratings with Mokken scaling

Evaluating the quality of analytic ratings with Mokken scaling Psychological Test and Assessment Modeling, Volume 57, 2015 (3), 423-444 Evaluating the quality of analytic ratings with Mokken scaling Stefanie A. Wind 1 Abstract Greatly influenced by the work of Rasch

More information

Exploring rater errors and systematic biases using adjacent-categories Mokken models

Exploring rater errors and systematic biases using adjacent-categories Mokken models Psychological Test and Assessment Modeling, Volume 59, 2017 (4), 493-515 Exploring rater errors and systematic biases using adjacent-categories Mokken models Stefanie A. Wind 1 & George Engelhard, Jr.

More information

Rasch Validation of the Falls Prevention Strategies Survey

Rasch Validation of the Falls Prevention Strategies Survey ORIGINAL ARTICLE Rasch Validation of the Falls Prevention Strategies Survey Marcia L. Finlayson, PhD, Elizabeth W. Peterson, MPH, Ken A. Fujimoto, MFA, Matthew A. Plow, PhD ABSTRACT. Finlayson ML, Peterson

More information

Nature and significance of the local problem

Nature and significance of the local problem Revised Standards for Quality Improvement Reporting Excellence (SQUIRE 2.0) September 15, 2015 Text Section and Item Section or Item Description Name The SQUIRE guidelines provide a framework for reporting

More information

Crossing boundaries between disciplines: A perspective on Basil Bernstein s legacy

Crossing boundaries between disciplines: A perspective on Basil Bernstein s legacy Crossing boundaries between disciplines: A perspective on Basil Bernstein s legacy Ana M. Morais Department of Education & Centre for Educational Research School of Science University of Lisbon Revised

More information

Detecting and Correcting for Rater Effects in Performance Assessment

Detecting and Correcting for Rater Effects in Performance Assessment A C T Research Report Series 90-14 Detecting and Correcting for Rater Effects in Performance Assessment Mark R. Raymond Walter M. Houston December 1990 For additional copies write: ACT Research Report

More information

University of Connecticut Professional Athletic Training Program Prospective Athletic Training Student Admission Assessment

University of Connecticut Professional Athletic Training Program Prospective Athletic Training Student Admission Assessment Personal Statement of Interest Grading Rubric Please rate the Personal Statement on the following items by circling the most appropriate number: 1=Poor; 2=Acceptable; 3=Good; 4=Excellent Criteria Rating

More information

Evaluating and restructuring a new faculty survey: Measuring perceptions related to research, service, and teaching

Evaluating and restructuring a new faculty survey: Measuring perceptions related to research, service, and teaching Evaluating and restructuring a new faculty survey: Measuring perceptions related to research, service, and teaching Kelly D. Bradley 1, Linda Worley, Jessica D. Cunningham, and Jeffery P. Bieber University

More information

Presented By: Yip, C.K., OT, PhD. School of Medical and Health Sciences, Tung Wah College

Presented By: Yip, C.K., OT, PhD. School of Medical and Health Sciences, Tung Wah College Presented By: Yip, C.K., OT, PhD. School of Medical and Health Sciences, Tung Wah College Background of problem in assessment for elderly Key feature of CCAS Structural Framework of CCAS Methodology Result

More information

Validation of the Behavioral Complexity Scale (BCS) to the Rasch Measurement Model, GAIN Methods Report 1.1

Validation of the Behavioral Complexity Scale (BCS) to the Rasch Measurement Model, GAIN Methods Report 1.1 Page 1 of 36 Validation of the Behavioral Complexity Scale (BCS) to the Rasch Measurement Model, GAIN Methods Report 1.1 Kendon J. Conrad University of Illinois at Chicago Karen M. Conrad University of

More information

Validation of the HIV Scale to the Rasch Measurement Model, GAIN Methods Report 1.1

Validation of the HIV Scale to the Rasch Measurement Model, GAIN Methods Report 1.1 Page 1 of 35 Validation of the HIV Scale to the Rasch Measurement Model, GAIN Methods Report 1.1 Kendon J. Conrad University of Illinois at Chicago Karen M. Conrad University of Illinois at Chicago Michael

More information

Diagnostic Opportunities Using Rasch Measurement in the Context of a Misconceptions-Based Physical Science Assessment

Diagnostic Opportunities Using Rasch Measurement in the Context of a Misconceptions-Based Physical Science Assessment Science Education Diagnostic Opportunities Using Rasch Measurement in the Context of a Misconceptions-Based Physical Science Assessment STEFANIE A. WIND, 1 JESSICA D. GALE 2 1 The University of Alabama,

More information

Competency Rubric Bank for the Sciences (CRBS)

Competency Rubric Bank for the Sciences (CRBS) Competency Rubric Bank for the Sciences (CRBS) Content Knowledge 1 Content Knowledge: Accuracy of scientific understanding Higher Order Cognitive Skills (HOCS) 3 Analysis: Clarity of Research Question

More information

The Rasch Model Analysis for Statistical Anxiety Rating Scale (STARS)

The Rasch Model Analysis for Statistical Anxiety Rating Scale (STARS) Creative Education, 216, 7, 282-2828 http://www.scirp.org/journal/ce ISSN Online: 2151-4771 ISSN Print: 2151-4755 The Rasch Model Analysis for Statistical Anxiety Rating Scale (STARS) Siti Mistima Maat,

More information

PLS 506 Mark T. Imperial, Ph.D. Lecture Notes: Reliability & Validity

PLS 506 Mark T. Imperial, Ph.D. Lecture Notes: Reliability & Validity PLS 506 Mark T. Imperial, Ph.D. Lecture Notes: Reliability & Validity Measurement & Variables - Initial step is to conceptualize and clarify the concepts embedded in a hypothesis or research question with

More information

Comparing standard toughness through weighted and unweighted scores by three standard setting procedures

Comparing standard toughness through weighted and unweighted scores by three standard setting procedures Comparing standard toughness through weighted and unweighted scores by three standard setting procedures Abstract Tsai-Wei Huang National Chiayi University, Taiwan Ayres G. D Costa The Ohio State University

More information

A simple guide to IRT and Rasch 2

A simple guide to IRT and Rasch 2 A Simple Guide to the Item Response Theory (IRT) and Rasch Modeling Chong Ho Yu, Ph.Ds Email: chonghoyu@gmail.com Website: http://www.creative-wisdom.com Updated: October 27, 2017 This document, which

More information

Perception-Based Evidence of Validity

Perception-Based Evidence of Validity Perception-Based Evidence of Validity Tzur M. Karelitz National Institute for Testing & Evaluation (NITE), Israel Charles Secolsky Measurement and Evaluation Consultant Can public opinion threaten validity?

More information

School orientation and mobility specialists School psychologists School social workers Speech language pathologists

School orientation and mobility specialists School psychologists School social workers Speech language pathologists 2013-14 Pilot Report Senate Bill 10-191, passed in 2010, restructured the way all licensed personnel in schools are supported and evaluated in Colorado. The ultimate goal is ensuring college and career

More information

RUNNING HEAD: EVALUATING SCIENCE STUDENT ASSESSMENT. Evaluating and Restructuring Science Assessments: An Example Measuring Student s

RUNNING HEAD: EVALUATING SCIENCE STUDENT ASSESSMENT. Evaluating and Restructuring Science Assessments: An Example Measuring Student s RUNNING HEAD: EVALUATING SCIENCE STUDENT ASSESSMENT Evaluating and Restructuring Science Assessments: An Example Measuring Student s Conceptual Understanding of Heat Kelly D. Bradley, Jessica D. Cunningham

More information

Validity Arguments for Alternate Assessment Systems

Validity Arguments for Alternate Assessment Systems Validity Arguments for Alternate Assessment Systems Scott Marion, Center for Assessment Reidy Interactive Lecture Series Portsmouth, NH September 25-26, 26, 2008 Marion. Validity Arguments for AA-AAS.

More information

Long Term: Systematically study children s understanding of mathematical equivalence and the ways in which it develops.

Long Term: Systematically study children s understanding of mathematical equivalence and the ways in which it develops. Long Term: Systematically study children s understanding of mathematical equivalence and the ways in which it develops. Short Term: Develop a valid and reliable measure of students level of understanding

More information

Using a Multifocal Lens Model and Rasch Measurement Theory to Evaluate Rating Quality in Writing Assessments

Using a Multifocal Lens Model and Rasch Measurement Theory to Evaluate Rating Quality in Writing Assessments Pensamiento Educativo. Revista de Investigación Educacional Latinoamericana 2017, 54(2), 1-16 Using a Multifocal Lens Model and Rasch Measurement Theory to Evaluate Rating Quality in Writing Assessments

More information

Improving Measurement of Ambiguity Tolerance (AT) Among Teacher Candidates. Kent Rittschof Department of Curriculum, Foundations, & Reading

Improving Measurement of Ambiguity Tolerance (AT) Among Teacher Candidates. Kent Rittschof Department of Curriculum, Foundations, & Reading Improving Measurement of Ambiguity Tolerance (AT) Among Teacher Candidates Kent Rittschof Department of Curriculum, Foundations, & Reading What is Ambiguity Tolerance (AT) and why should it be measured?

More information

RASCH ANALYSIS OF SOME MMPI-2 SCALES IN A SAMPLE OF UNIVERSITY FRESHMEN

RASCH ANALYSIS OF SOME MMPI-2 SCALES IN A SAMPLE OF UNIVERSITY FRESHMEN International Journal of Arts & Sciences, CD-ROM. ISSN: 1944-6934 :: 08(03):107 150 (2015) RASCH ANALYSIS OF SOME MMPI-2 SCALES IN A SAMPLE OF UNIVERSITY FRESHMEN Enrico Gori University of Udine, Italy

More information

Measuring Academic Misconduct: Evaluating the Construct Validity of the Exams and Assignments Scale

Measuring Academic Misconduct: Evaluating the Construct Validity of the Exams and Assignments Scale American Journal of Applied Psychology 2015; 4(3-1): 58-64 Published online June 29, 2015 (http://www.sciencepublishinggroup.com/j/ajap) doi: 10.11648/j.ajap.s.2015040301.20 ISSN: 2328-5664 (Print); ISSN:

More information

Long Essay Question: Evidence/Body Paragraphs

Long Essay Question: Evidence/Body Paragraphs Long Essay Question: Evidence/Body Paragraphs Part B Argument Development: Using Targeted Skill 2 points Revisiting the Rubric Comparison Causation CCOT Periodization 1 Point: Describes similarities AND

More information

linking in educational measurement: Taking differential motivation into account 1

linking in educational measurement: Taking differential motivation into account 1 Selecting a data collection design for linking in educational measurement: Taking differential motivation into account 1 Abstract In educational measurement, multiple test forms are often constructed to

More information

Chapter 3. Psychometric Properties

Chapter 3. Psychometric Properties Chapter 3 Psychometric Properties Reliability The reliability of an assessment tool like the DECA-C is defined as, the consistency of scores obtained by the same person when reexamined with the same test

More information

Providing Evidence for the Generalizability of a Speaking Placement Test Scores

Providing Evidence for the Generalizability of a Speaking Placement Test Scores Providing Evidence for the Generalizability of a Speaking Placement Test Scores Payman Vafaee 1, Behrooz Yaghmaeyan 2 Received: 15 April 2015 Accepted: 10 August 2015 Abstract Three major potential sources

More information

Structural Equation Modeling of Multiple- Indicator Multimethod-Multioccasion Data: A Primer

Structural Equation Modeling of Multiple- Indicator Multimethod-Multioccasion Data: A Primer Utah State University DigitalCommons@USU Psychology Faculty Publications Psychology 4-2017 Structural Equation Modeling of Multiple- Indicator Multimethod-Multioccasion Data: A Primer Christian Geiser

More information

Linking Assessments: Concept and History

Linking Assessments: Concept and History Linking Assessments: Concept and History Michael J. Kolen, University of Iowa In this article, the history of linking is summarized, and current linking frameworks that have been proposed are considered.

More information

On the purpose of testing:

On the purpose of testing: Why Evaluation & Assessment is Important Feedback to students Feedback to teachers Information to parents Information for selection and certification Information for accountability Incentives to increase

More information

The Effect of Option Homogeneity in Multiple- Choice Items

The Effect of Option Homogeneity in Multiple- Choice Items Manuscripts The Effect of Option Homogeneity in Multiple- Choice Items Applied Psychological Measurement 1 12 Ó The Author(s) 2018 Reprints and permissions: sagepub.com/journalspermissions.nav DOI: 10.1177/0146621618770803

More information

Optimizing distribution of Rating Scale Category in Rasch Model

Optimizing distribution of Rating Scale Category in Rasch Model Optimizing distribution of Rating Scale Category in Rasch Model Han-Dau Yau, Graduate Institute of Sports Training Science, National Taiwan Sport University, Taiwan Wei-Che Yao, Department of Statistics,

More information

Assessing the Validity and Reliability of Dichotomous Test Results Using Item Response Theory on a Group of First Year Engineering Students

Assessing the Validity and Reliability of Dichotomous Test Results Using Item Response Theory on a Group of First Year Engineering Students Dublin Institute of Technology ARROW@DIT Conference papers School of Civil and Structural Engineering 2015-07-13 Assessing the Validity and Reliability of Dichotomous Test Results Using Item Response Theory

More information