Demonstrating validity

1 Demonstrating validity
Nivja de Jong & Jelle Goeman

2 What is validity?
- Construct validity
- Criterion validity
- Face validity
- Content validity
- Consequential validity

3 Validity, back to basics
- Cattell, 1946; Kelley, 1927; Borsboom et al., 2004: whether an instrument actually measures what it sets out to measure
- Borsboom et al. (2004): a test is valid for measuring an attribute if (a) the attribute exists and (b) variations in the attribute causally produce variation in the measurement outcomes

4 What is validity?
- Whether an instrument actually measures what it sets out to measure
- To demonstrate validity, we need theory that specifies the processes through which variations in the attribute cause variation in the measurement outcome
- Item difficulty should be theoretically predictable

5 What does a valid test look like?
- Item difficulty should be theoretically predictable
- Some form of unidimensionality of a test or sub-test is important because we want to summarize over all items
- Purposeful construction of items: sum-score should summarize

6 How do we check validity?
- In practice: what do we do about validity?
- Post hoc, relate item difficulty to item characteristics?
- Correlation between test scores and scores from other tests?
- Unidimensionality: Cronbach's alpha?
- Is this OK?

7 Correlation between test scores and scores from other tests?
- A new test of English language proficiency should be strongly related to our old (previously validated) test of English language proficiency
- A new scale to measure weight should be strongly related to our old (previously validated) scale that measured weight

8 r = .8?
- We are measuring the same construct?

9 How do we check validity?
- In practice: what do we do about validity?
- Post hoc, relate item difficulty to item characteristics?
- Correlation between test scores and scores from other tests?
- Unidimensionality: Cronbach's alpha?
- Is this OK?

10 Cronbach's alpha: persisting confusion
- Cronbach's alpha is intended to measure reliability (test-retest)
- Reliability is not the same as unidimensionality / internal consistency

11 Cronbach's alpha: persisting confusion
- Sijtsma, 2009: A single number, alpha, that expresses both reliability and internal consistency (conceived of as an aspect of validity that suggests that items measure the same thing) is a blessing for the assessment of test quality. In the meantime, alpha only is a lower bound to the reliability and not even a realistic one

12 Cronbach's alpha: persisting confusion
- Cronbach's alpha is used as a measure of:
  - Test-retest reliability
  - Agreement between judges
  - Unidimensionality across items
- In textbooks and handbooks: alpha > .x allows you to sum the scores
- But is this true? Does alpha reveal anything about summability?

13 Simulation 1
- We simulate subjects with either one skill or two unrelated skills
- Each skill is tested with k items
- We make a test with all 1k or 2k items
- Item score = ability + noise
- Ability explains 50% of the variance of each item
- k ranges from 2 to 50
- We calculate Cronbach's alpha for one and two skills within one test
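The simulation described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: normally distributed abilities and noise with unit variance (so ability explains 50% of each item's variance), 2000 simulated subjects, and the function names `cronbach_alpha` and `simulate_items` are all hypothetical choices.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (subjects x items) score matrix."""
    k = scores.shape[1]
    cov = np.cov(scores, rowvar=False)
    v = np.trace(cov)   # sum of all item variances
    c = cov.sum()       # sum of all item variances and covariances
    return k / (k - 1) * (1 - v / c)

def simulate_items(n_subjects, k, n_skills, rng):
    """k items per skill; item score = ability + noise, both with variance 1."""
    blocks = []
    for _ in range(n_skills):
        ability = rng.normal(size=(n_subjects, 1))   # one ability per subject
        noise = rng.normal(size=(n_subjects, k))     # independent item noise
        blocks.append(ability + noise)
    return np.hstack(blocks)                         # test with 1k or 2k items

rng = np.random.default_rng(0)
results = {}
for k in (2, 5, 10, 20, 50):
    alpha_one = cronbach_alpha(simulate_items(2000, k, 1, rng))
    alpha_two = cronbach_alpha(simulate_items(2000, k, 2, rng))
    results[k] = (alpha_one, alpha_two)
    print(f"k={k:2d}  one skill: {alpha_one:.2f}  two skills: {alpha_two:.2f}")
```

In this sketch alpha climbs toward 1 as k grows in both conditions, including the test that mixes two unrelated skills.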

14 Cronbach's alpha for one or two constructs within one test

15 Cronbach's alpha is an excellent measure of test length!

16 The concept of summability
- Unidimensionality? Alpha is not about unidimensionality but about reliability
- Factor analysis is a good alternative
- BUT do we actually look for unidimensionality?
- Example: a miscoded multiple-choice item will happily load onto the strongest first factor
- Summability: we need the sum-score to summarize, not just any factor! Items are purposefully constructed

17 Measuring summability
- How much of the variance of the item scores is captured by the sum-score?
- Our definition of summability: percentage of total item variance explained by the sum-score
- Like an R² in regression
- Comes in unadjusted and adjusted form

18 Summability formula
- Unadjusted:
- Adjusted:
- v is the sum of all item variances
- c is the sum of all item variances and covariances
- k is the number of items
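The two formulas themselves are not reproduced in this transcript. One reconstruction, offered as an assumption rather than as the authors' exact notation, reads the definition on the previous slide literally: the sum-score direction captures c/k of the total item variance v, and the adjusted form subtracts the value 1/k that uncorrelated items would give. The symbols s_unadj and s_adj are labels introduced here for illustration.

```latex
\[
s_{\text{unadj}} = \frac{c}{k\,v},
\qquad
s_{\text{adj}} = \frac{s_{\text{unadj}} - 1/k}{1 - 1/k} = \frac{c - v}{(k-1)\,v}
\]
% Relation to Cronbach's alpha under the same reconstruction:
\[
\alpha = \frac{k}{k-1}\left(1 - \frac{v}{c}\right)
       = \frac{k\,s_{\text{adj}}}{1 + (k-1)\,s_{\text{adj}}}
\]
```

Under this reading, a summability of .28 with k = 90 items gives an alpha of about .97, which agrees with the values reported for Application 1 later in the talk.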

19 Simulation 2
- Either one skill or two unrelated skills
- Each skill is tested with k items
- Test with all 1k or 2k items
- Item score = ability + noise
- Ability explains 50% of the variance of each item
- k ranges from 2 to 50
- We compare Cronbach's alpha with summability
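A sketch of this comparison is given below. It rests on several assumptions: the summability functions use a reconstruction from the slide's definitions (unadjusted c / (k v), adjusted (c - v) / ((k - 1) v)), not formulas copied from the slides; abilities and noise are standard normal; and all function names are hypothetical.

```python
import numpy as np

def item_stats(scores):
    """Return k, v (sum of item variances) and c (sum of variances and covariances)."""
    cov = np.cov(scores, rowvar=False)
    return scores.shape[1], np.trace(cov), cov.sum()

def cronbach_alpha(scores):
    k, v, c = item_stats(scores)
    return k / (k - 1) * (1 - v / c)

def summability(scores, adjusted=True):
    # ASSUMPTION: reconstructed formulas, see the lead-in above.
    k, v, c = item_stats(scores)
    if adjusted:
        return (c - v) / ((k - 1) * v)
    return c / (k * v)

def simulate_items(n_subjects, k, n_skills, rng):
    """k items per skill; item score = ability + noise, both with variance 1."""
    blocks = [rng.normal(size=(n_subjects, 1)) + rng.normal(size=(n_subjects, k))
              for _ in range(n_skills)]
    return np.hstack(blocks)

rng = np.random.default_rng(1)
table = {}
for n_skills in (1, 2):
    for k in (2, 10, 50):
        scores = simulate_items(2000, k, n_skills, rng)
        table[(n_skills, k)] = (cronbach_alpha(scores), summability(scores))
        a, s = table[(n_skills, k)]
        print(f"skills={n_skills} k={k:2d}  alpha={a:.2f}  summability={s:.2f}")
```

With these settings alpha approaches 1 as k grows in both conditions, while the adjusted summability stays near .5 for one skill and near .25 for two unrelated skills at large k.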

20 Simulation 2: comparing Cronbach's alpha with summability for one construct

21 Simulation 2: comparing Cronbach's alpha with summability for two constructs

22 Recap: valid test characteristics
- Item difficulty should be theoretically predictable
- Some form of unidimensionality of a test or sub-test is important because we want to summarize over all items
- Purposeful construction of items: sum-score should summarize

23 Variation between items
- Item difficulty should relate to the theoretically grounded item characteristics
- Example: in a multiple-choice reading comprehension test
  - texts differ in (linguistic) difficulty (sentence length, ...)
  - answer options differ in plausibility
- In a valid MC reading test, the (linguistic) difficulty of the texts predicts item difficulty
- This is easy to check with a correlation or regression analysis
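The correlation or regression check mentioned on this slide could look like the sketch below. Everything in it is hypothetical: the data are simulated, sentence length stands in for whatever text characteristic the theory names, and the variable names are illustrative. Item difficulty is computed as the proportion of correct answers per item.

```python
import numpy as np

rng = np.random.default_rng(2)
n_subjects, n_items = 200, 30

# Hypothetical item characteristic: mean sentence length of each text.
sentence_length = rng.uniform(8, 30, size=n_items)

# Simulated truth: longer sentences lower the probability of a correct answer.
logit = 2.5 - 0.12 * sentence_length + rng.normal(0, 0.3, size=n_items)
p_correct = 1 / (1 + np.exp(-logit))
responses = rng.random((n_subjects, n_items)) < p_correct   # True = correct

# Item difficulty: number of correct answers / number of all answers.
prop_correct = responses.mean(axis=0)

# Simple linear regression of item difficulty on the item characteristic.
slope, intercept = np.polyfit(sentence_length, prop_correct, 1)
r = np.corrcoef(sentence_length, prop_correct)[0, 1]
r2 = r ** 2
print(f"slope={slope:.3f}  R^2={r2:.2f}")
```

A clearly non-zero slope in the theoretically expected direction, together with a sizeable R², would support the claim that the identified item characteristic drives item difficulty.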

24 Application 1: productive vocabulary knowledge
- 90 vocabulary items completed by 198 participants (binomial score)
- Item example: Het was gisteren dinsdag, dus is het v woensdag. (Dutch; roughly "Yesterday was Tuesday, so today is Wednesday", with the target word cued by its first letter)
- Tested words taken from 10 frequency bands (by frequency rank)
- Is this a valid test of vocabulary knowledge?
  - Can we sum the scores?
  - Can we predict item difficulty in a theoretically grounded manner?
- De Jong et al., 2012; Hulstijn et al., 2012

25 Application 1: productive vocabulary knowledge
- Summability of item-scores:
  - Summability: .28 (Cronbach's alpha: .97)
- Item difficulty:
  - Calculated as number of correct answers / number of all answers
  - Related to (log) tested word frequency: R² = .45
- De Jong et al., 2012; Hulstijn et al., 2012

26 Application 2: human ratings
- 100 judges (5 groups of 20 judges) rating 90 speech samples on 5 different aspects:
  1. Fluency (pauses, speed, and repairs)
  2. Pausing
  3. Speed
  4. Repairs
  5. Accentedness
- Are these valid measures of fluency/accent?
  - Can we sum the scores for each group of 20 judges?
  - Can we predict item difficulty in a theoretically grounded manner?
- Bosker et al., 2013; Pinget et al., submitted

27 Application 2: human ratings
- Summability of judge-scores for group 1 (fluency):
  - Summability: .56 (Cronbach's alpha: .97)
- NB: collapsing over the 80 judges in groups 2-5 (pausing, speed, repairs, accentedness):
  - Summability: .25 (Cronbach's alpha: .97)
- "Item difficulty": mean judge score
  - Calculated as mean score over all judges
  - Related to a combination of objectively measured fluency characteristics of the speech samples: R² = .84
- Bosker et al., 2013; Pinget et al., submitted
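Why alpha stays near .97 while summability ranges from .25 to .56 can be illustrated with a few lines of arithmetic. The sketch assumes the relation alpha = k s / (1 + (k - 1) s) between alpha and adjusted summability s, which follows from a reconstruction of the summability formula rather than from the slides themselves. It also assumes that k equals the number of items or judges in each analysis (90, 20 and 80), and the function name is hypothetical. Because the reported summability values are rounded, the results are only approximate.

```python
def alpha_from_summability(s, k):
    """ASSUMED relation between alpha and adjusted summability s for k items."""
    return k * s / (1 + (k - 1) * s)

# (label, reported summability, assumed number of items or judges)
cases = [
    ("vocabulary test", 0.28, 90),
    ("fluency judges", 0.56, 20),
    ("judges, groups 2-5", 0.25, 80),
]

implied = {}
for label, s, k in cases:
    implied[label] = alpha_from_summability(s, k)
    print(f"{label:20s} s={s:.2f}  k={k:2d}  implied alpha={implied[label]:.3f}")
```

All three implied alphas land between .96 and .98, close to the reported .97: a long test or a large panel of judges is enough to push alpha up even when summability is modest.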

28 Discussion
- On summability:
  - In the end a (sub)test is reduced to a single sum score; validity is relevant for that sum score
  - Concept of summability: useful in language testing practice whenever scores are summarized with a single sum-score
  - Whether .28 and .56 are high or low: more experience needed
- On validity:
  - Purposeful construction of items: item characteristics that are identified beforehand must relate to post-hoc item difficulty

29 Questions?

In many fields, such as education or second language acquisition,

In many fields, such as education or second language acquisition, Educational Measurement: Issues and Practice xxxx 2018, Vol. 00, No. 0, pp. 1 10 How Well Does the Sum Score Summarize the Test? Summability as a Measure of Internal Consistency J. J. Goeman and N. H.

More information

11-3. Learning Objectives

11-3. Learning Objectives 11-1 Measurement Learning Objectives 11-3 Understand... The distinction between measuring objects, properties, and indicants of properties. The similarities and differences between the four scale types

More information

Basic concepts and principles of classical test theory

Basic concepts and principles of classical test theory Basic concepts and principles of classical test theory Jan-Eric Gustafsson What is measurement? Assignment of numbers to aspects of individuals according to some rule. The aspect which is measured must

More information

Measurement and Descriptive Statistics. Katie Rommel-Esham Education 604

Measurement and Descriptive Statistics. Katie Rommel-Esham Education 604 Measurement and Descriptive Statistics Katie Rommel-Esham Education 604 Frequency Distributions Frequency table # grad courses taken f 3 or fewer 5 4-6 3 7-9 2 10 or more 4 Pictorial Representations Frequency

More information

Running head: CPPS REVIEW 1

Running head: CPPS REVIEW 1 Running head: CPPS REVIEW 1 Please use the following citation when referencing this work: McGill, R. J. (2013). Test review: Children s Psychological Processing Scale (CPPS). Journal of Psychoeducational

More information

Psychometric Properties of the Mean Opinion Scale

Psychometric Properties of the Mean Opinion Scale Psychometric Properties of the Mean Opinion Scale James R. Lewis IBM Voice Systems 1555 Palm Beach Lakes Blvd. West Palm Beach, Florida jimlewis@us.ibm.com Abstract The Mean Opinion Scale (MOS) is a seven-item

More information

Answers to end of chapter questions

Answers to end of chapter questions Answers to end of chapter questions Chapter 1 What are the three most important characteristics of QCA as a method of data analysis? QCA is (1) systematic, (2) flexible, and (3) it reduces data. What are

More information

DATA is derived either through. Self-Report Observation Measurement

DATA is derived either through. Self-Report Observation Measurement Data Management DATA is derived either through Self-Report Observation Measurement QUESTION ANSWER DATA DATA may be from Structured or Unstructured questions? Quantitative or Qualitative? Numerical or

More information

By Hui Bian Office for Faculty Excellence

By Hui Bian Office for Faculty Excellence By Hui Bian Office for Faculty Excellence 1 Email: bianh@ecu.edu Phone: 328-5428 Location: 1001 Joyner Library, room 1006 Office hours: 8:00am-5:00pm, Monday-Friday 2 Educational tests and regular surveys

More information

Title: The Theory of Planned Behavior (TPB) and Texting While Driving Behavior in College Students MS # Manuscript ID GCPI

Title: The Theory of Planned Behavior (TPB) and Texting While Driving Behavior in College Students MS # Manuscript ID GCPI Title: The Theory of Planned Behavior (TPB) and Texting While Driving Behavior in College Students MS # Manuscript ID GCPI-2015-02298 Appendix 1 Role of TPB in changing other behaviors TPB has been applied

More information

LANGUAGE TEST RELIABILITY On defining reliability Sources of unreliability Methods of estimating reliability Standard error of measurement Factors

LANGUAGE TEST RELIABILITY On defining reliability Sources of unreliability Methods of estimating reliability Standard error of measurement Factors LANGUAGE TEST RELIABILITY On defining reliability Sources of unreliability Methods of estimating reliability Standard error of measurement Factors affecting reliability ON DEFINING RELIABILITY Non-technical

More information

STATISTICS AND RESEARCH DESIGN

STATISTICS AND RESEARCH DESIGN Statistics 1 STATISTICS AND RESEARCH DESIGN These are subjects that are frequently confused. Both subjects often evoke student anxiety and avoidance. To further complicate matters, both areas appear have

More information

Internal Consistency: Do We Really Know What It Is and How to. Assess it?

Internal Consistency: Do We Really Know What It Is and How to. Assess it? Internal Consistency: Do We Really Know What It Is and How to Assess it? Wei Tang, Ying Cui, CRAME, Department of Educational Psychology University of Alberta, CANADA The term internal consistency has

More information

Types of Tests. Measurement Reliability. Most self-report tests used in Psychology and Education are objective tests :

Types of Tests. Measurement Reliability. Most self-report tests used in Psychology and Education are objective tests : Measurement Reliability Objective & Subjective tests Standardization & Inter-rater reliability Properties of a good item Item Analysis Internal Reliability Spearman-Brown Prophesy Formla -- α & # items

More information

Correlational Research. Correlational Research. Stephen E. Brock, Ph.D., NCSP EDS 250. Descriptive Research 1. Correlational Research: Scatter Plots

Correlational Research. Correlational Research. Stephen E. Brock, Ph.D., NCSP EDS 250. Descriptive Research 1. Correlational Research: Scatter Plots Correlational Research Stephen E. Brock, Ph.D., NCSP California State University, Sacramento 1 Correlational Research A quantitative methodology used to determine whether, and to what degree, a relationship

More information

Self Report Measures

Self Report Measures Self Report Measures I. Measuring Self-Report Variables A. The Survey Research Method - Many participants are randomly (e.g. stratified random sampling) or otherwise (nonprobablility sampling) p.140-146

More information

Global Perspective Inventory (GPI) Report

Global Perspective Inventory (GPI) Report Global Perspective Inventory (GPI) 2012-2013 Report Executive Summary display higher levels of global competence than freshmen in all of the GPI scales except for the interpersonal social responsibility

More information

Testing and Intelligence. What We Will Cover in This Section. Psychological Testing. Intelligence. Reliability Validity Types of tests.

Testing and Intelligence. What We Will Cover in This Section. Psychological Testing. Intelligence. Reliability Validity Types of tests. Testing and Intelligence 10/19/2002 Testing and Intelligence.ppt 1 What We Will Cover in This Section Psychological Testing Reliability Validity Types of tests. Intelligence Overview Models Summary 10/19/2002

More information

VARIABLES AND MEASUREMENT

VARIABLES AND MEASUREMENT ARTHUR SYC 204 (EXERIMENTAL SYCHOLOGY) 16A LECTURE NOTES [01/29/16] VARIABLES AND MEASUREMENT AGE 1 Topic #3 VARIABLES AND MEASUREMENT VARIABLES Some definitions of variables include the following: 1.

More information

Internal Consistency and Reliability of the Networked Minds Social Presence Measure

Internal Consistency and Reliability of the Networked Minds Social Presence Measure Internal Consistency and Reliability of the Networked Minds Social Presence Measure Chad Harms, Frank Biocca Iowa State University, Michigan State University Harms@iastate.edu, Biocca@msu.edu Abstract

More information

commentary Time is a jailer: what do alpha and its alternatives tell us about reliability?

commentary Time is a jailer: what do alpha and its alternatives tell us about reliability? commentary Time is a jailer: what do alpha and its alternatives tell us about reliability? Rik Psychologists do not have it easy, but the article by Peters Maastricht University (2014) paves the way for

More information

Lab 4: Alpha and Kappa. Today s Activities. Reliability. Consider Alpha Consider Kappa Homework and Media Write-Up

Lab 4: Alpha and Kappa. Today s Activities. Reliability. Consider Alpha Consider Kappa Homework and Media Write-Up Lab 4: Alpha and Kappa Today s Activities Consider Alpha Consider Kappa Homework and Media Write-Up Reliability Reliability refers to consistency Types of reliability estimates Test-retest reliability

More information

2013 Supervisor Survey Reliability Analysis

2013 Supervisor Survey Reliability Analysis 2013 Supervisor Survey Reliability Analysis In preparation for the submission of the Reliability Analysis for the 2013 Supervisor Survey, we wanted to revisit the purpose of this analysis. This analysis

More information

Having your cake and eating it too: multiple dimensions and a composite

Having your cake and eating it too: multiple dimensions and a composite Having your cake and eating it too: multiple dimensions and a composite Perman Gochyyev and Mark Wilson UC Berkeley BEAR Seminar October, 2018 outline Motivating example Different modeling approaches Composite

More information

Overview of Experimentation

Overview of Experimentation The Basics of Experimentation Overview of Experiments. IVs & DVs. Operational Definitions. Reliability. Validity. Internal vs. External Validity. Classic Threats to Internal Validity. Lab: FP Overview;

More information

Multiple Act criterion:

Multiple Act criterion: Common Features of Trait Theories Generality and Stability of Traits: Trait theorists all use consistencies in an individual s behavior and explain why persons respond in different ways to the same stimulus

More information

WWC STUDY REVIEW STANDARDS

WWC STUDY REVIEW STANDARDS WWC STUDY REVIEW STANDARDS INTRODUCTION The What Works Clearinghouse (WWC) reviews studies in three stages. First, the WWC screens studies to determine whether they meet criteria for inclusion within the

More information

CHAPTER III RESEARCH METHODOLOGY

CHAPTER III RESEARCH METHODOLOGY CHAPTER III RESEARCH METHODOLOGY In this chapter, the researcher will elaborate the methodology of the measurements. This chapter emphasize about the research methodology, data source, population and sampling,

More information

Internal Consistency and Reliability of the Networked Minds Measure of Social Presence

Internal Consistency and Reliability of the Networked Minds Measure of Social Presence Internal Consistency and Reliability of the Networked Minds Measure of Social Presence Chad Harms Iowa State University Frank Biocca Michigan State University Abstract This study sought to develop and

More information

VALIDITY OF QUANTITATIVE RESEARCH

VALIDITY OF QUANTITATIVE RESEARCH Validity 1 VALIDITY OF QUANTITATIVE RESEARCH Recall the basic aim of science is to explain natural phenomena. Such explanations are called theories (Kerlinger, 1986, p. 8). Theories have varying degrees

More information

University of Wollongong. Research Online. Australian Health Services Research Institute

University of Wollongong. Research Online. Australian Health Services Research Institute University of Wollongong Research Online Australian Health Services Research Institute Faculty of Business 2011 Measurement of error Janet E. Sansoni University of Wollongong, jans@uow.edu.au Publication

More information

Strategies to Develop Food Frequency Questionnaire

Strategies to Develop Food Frequency Questionnaire Strategies to Develop Food Frequency www.makrocare.com Food choices are one of the health related behaviors that are culturally determined. Public health experts and nutritionists have recognized the influence

More information

Reliability and Validity

Reliability and Validity Reliability and Today s Objectives Understand the difference between reliability and validity Understand how to develop valid indicators of a concept Reliability and Reliability How accurate or consistent

More information

Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys

Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Jill F. Kilanowski, PhD, APRN,CPNP Associate Professor Alpha Zeta & Mu Chi Acknowledgements Dr. Li Lin,

More information

Reliability AND Validity. Fact checking your instrument

Reliability AND Validity. Fact checking your instrument Reliability AND Validity Fact checking your instrument General Principles Clearly Identify the Construct of Interest Use Multiple Items Use One or More Reverse Scored Items Use a Consistent Response Format

More information

Version 1.1 Edition date 07 February 2018 ELPAC. English Language Proficiency for Aeronautical Communication ELPAC paper 1 test specifications

Version 1.1 Edition date 07 February 2018 ELPAC. English Language Proficiency for Aeronautical Communication ELPAC paper 1 test specifications Version 1.1 Edition date 07 February 2018 ELPAC English Language Proficiency for Aeronautical Communication ELPAC paper 1 test specifications ELPAC is developed in cooperation with: ENOVATE is responsible

More information

Technical Specifications

Technical Specifications Technical Specifications In order to provide summary information across a set of exercises, all tests must employ some form of scoring models. The most familiar of these scoring models is the one typically

More information

Critical Thinking Assessment at MCC. How are we doing?

Critical Thinking Assessment at MCC. How are we doing? Critical Thinking Assessment at MCC How are we doing? Prepared by Maura McCool, M.S. Office of Research, Evaluation and Assessment Metropolitan Community Colleges Fall 2003 1 General Education Assessment

More information

A Brief (very brief) Overview of Biostatistics. Jody Kreiman, PhD Bureau of Glottal Affairs

A Brief (very brief) Overview of Biostatistics. Jody Kreiman, PhD Bureau of Glottal Affairs A Brief (very brief) Overview of Biostatistics Jody Kreiman, PhD Bureau of Glottal Affairs What We ll Cover Fundamentals of measurement Parametric versus nonparametric tests Descriptive versus inferential

More information

Collecting & Making Sense of

Collecting & Making Sense of Collecting & Making Sense of Quantitative Data Deborah Eldredge, PhD, RN Director, Quality, Research & Magnet Recognition i Oregon Health & Science University Margo A. Halm, RN, PhD, ACNS-BC, FAHA Director,

More information

Global Perspective Inventory (GPI) - Pilot Report

Global Perspective Inventory (GPI) - Pilot Report Global Perspective Inventory (GPI) - Pilot 2010-11 Report Introduction The Global Perspectives Inventory is a nationally recognized instrument designed to measure a student s global perspective. The GPI

More information

CHAPTER FOUR. Any scientific research involves the application of various methods. (also referred to as strategies or approaches) and procedures to

CHAPTER FOUR. Any scientific research involves the application of various methods. (also referred to as strategies or approaches) and procedures to CHAPTER FOUR 4. RESEARCH METHODOLOGY Any scientific research involves the application of various methods (also referred to as strategies or approaches) and procedures to create scientific knowledge (welman

More information

Smarter Balanced Interim Assessment Blocks Total Number of Items and hand scoring Requirements by Grade and Subject.

Smarter Balanced Interim Assessment Blocks Total Number of Items and hand scoring Requirements by Grade and Subject. Smarter Balanced Interim Assessment Blocks of Items and hand scoring Requirements by Grade and Subject. The following tables are intended to assist coordinators, site coordinators, and test administrators

More information

CHAPTER VI RESEARCH METHODOLOGY

CHAPTER VI RESEARCH METHODOLOGY CHAPTER VI RESEARCH METHODOLOGY 6.1 Research Design Research is an organized, systematic, data based, critical, objective, scientific inquiry or investigation into a specific problem, undertaken with the

More information

GENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS

GENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS GENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS Michael J. Kolen The University of Iowa March 2011 Commissioned by the Center for K 12 Assessment & Performance Management at

More information

HPS301 Exam Notes- Contents

HPS301 Exam Notes- Contents HPS301 Exam Notes- Contents Week 1 Research Design: What characterises different approaches 1 Experimental Design 1 Key Features 1 Criteria for establishing causality 2 Validity Internal Validity 2 Threats

More information

02a: Test-Retest and Parallel Forms Reliability

02a: Test-Retest and Parallel Forms Reliability 1 02a: Test-Retest and Parallel Forms Reliability Quantitative Variables 1. Classic Test Theory (CTT) 2. Correlation for Test-retest (or Parallel Forms): Stability and Equivalence for Quantitative Measures

More information

Why Mixed Effects Models?

Why Mixed Effects Models? Why Mixed Effects Models? Mixed Effects Models Recap/Intro Three issues with ANOVA Multiple random effects Categorical data Focus on fixed effects What mixed effects models do Random slopes Link functions

More information

PLS 506 Mark T. Imperial, Ph.D. Lecture Notes: Reliability & Validity

PLS 506 Mark T. Imperial, Ph.D. Lecture Notes: Reliability & Validity PLS 506 Mark T. Imperial, Ph.D. Lecture Notes: Reliability & Validity Measurement & Variables - Initial step is to conceptualize and clarify the concepts embedded in a hypothesis or research question with

More information

THE TEST STATISTICS REPORT provides a synopsis of the test attributes and some important statistics. A sample is shown here to the right.

THE TEST STATISTICS REPORT provides a synopsis of the test attributes and some important statistics. A sample is shown here to the right. THE TEST STATISTICS REPORT provides a synopsis of the test attributes and some important statistics. A sample is shown here to the right. The Test reliability indicators are measures of how well the questions

More information

A Comparison of Methods of Estimating Subscale Scores for Mixed-Format Tests

A Comparison of Methods of Estimating Subscale Scores for Mixed-Format Tests A Comparison of Methods of Estimating Subscale Scores for Mixed-Format Tests David Shin Pearson Educational Measurement May 007 rr0701 Using assessment and research to promote learning Pearson Educational

More information

Intelligence. Intelligence Assessment Individual Differences

Intelligence. Intelligence Assessment Individual Differences Intelligence Intelligence Assessment Individual Differences Intelligence Theories of Intelligence Intelligence Testing Test Construction Extremes of Intelligence Differences in Intelligence Creativity

More information

EVALUATING AND IMPROVING MULTIPLE CHOICE QUESTIONS

EVALUATING AND IMPROVING MULTIPLE CHOICE QUESTIONS DePaul University INTRODUCTION TO ITEM ANALYSIS: EVALUATING AND IMPROVING MULTIPLE CHOICE QUESTIONS Ivan Hernandez, PhD OVERVIEW What is Item Analysis? Overview Benefits of Item Analysis Applications Main

More information

Importance of Good Measurement

Importance of Good Measurement Importance of Good Measurement Technical Adequacy of Assessments: Validity and Reliability Dr. K. A. Korb University of Jos The conclusions in a study are only as good as the data that is collected. The

More information

A framework for predicting item difficulty in reading tests

A framework for predicting item difficulty in reading tests Australian Council for Educational Research ACEReSearch OECD Programme for International Student Assessment (PISA) National and International Surveys 4-2012 A framework for predicting item difficulty in

More information

The Logic of Data Analysis Using Statistical Techniques M. E. Swisher, 2016

The Logic of Data Analysis Using Statistical Techniques M. E. Swisher, 2016 The Logic of Data Analysis Using Statistical Techniques M. E. Swisher, 2016 This course does not cover how to perform statistical tests on SPSS or any other computer program. There are several courses

More information

California Subject Examinations for Teachers

California Subject Examinations for Teachers California Subject Examinations for Teachers TEST GUIDE AMERICAN SIGN LANGUAGE SUBTEST III Subtest Description This document contains the World Languages: American Sign Language (ASL) subject matter requirements

More information

Survey Project Data Analysis Guide

Survey Project Data Analysis Guide Survey Project Data Analysis Guide I. Computing Scale Scores. - In the data file that I have given you, I have already done the following. - Reverse scored all of the appropriate items. For: Aggression

More information

Investigating the Reliability of Classroom Observation Protocols: The Case of PLATO. M. Ken Cor Stanford University School of Education.

Investigating the Reliability of Classroom Observation Protocols: The Case of PLATO. M. Ken Cor Stanford University School of Education. The Reliability of PLATO Running Head: THE RELIABILTY OF PLATO Investigating the Reliability of Classroom Observation Protocols: The Case of PLATO M. Ken Cor Stanford University School of Education April,

More information

Chapter 3 Psychometrics: Reliability and Validity

Chapter 3 Psychometrics: Reliability and Validity 34 Chapter 3 Psychometrics: Reliability and Validity Every classroom assessment measure must be appropriately reliable and valid, be it the classic classroom achievement test, attitudinal measure, or performance

More information

Data Analysis Using Regression and Multilevel/Hierarchical Models

Data Analysis Using Regression and Multilevel/Hierarchical Models Data Analysis Using Regression and Multilevel/Hierarchical Models ANDREW GELMAN Columbia University JENNIFER HILL Columbia University CAMBRIDGE UNIVERSITY PRESS Contents List of examples V a 9 e xv " Preface

More information

Theory, Models, Variables

Theory, Models, Variables Theory, Models, Variables Y520 Strategies for Educational Inquiry 2-1 Three Meanings of Theory A set of interrelated conceptions or ideas that gives an account of intrinsic (aka, philosophical) values.

More information

Introduction to Reliability

Introduction to Reliability Reliability Thought Questions: How does/will reliability affect what you do/will do in your future job? Which method of reliability analysis do you find most confusing? Introduction to Reliability What

More information

Comparing Vertical and Horizontal Scoring of Open-Ended Questionnaires

Comparing Vertical and Horizontal Scoring of Open-Ended Questionnaires A peer-reviewed electronic journal. Copyright is retained by the first or sole author, who grants right of first publication to the Practical Assessment, Research & Evaluation. Permission is granted to

More information

Test-Taking Strategies and Task-based Assessment: The Case of Iranian EFL Learners

Test-Taking Strategies and Task-based Assessment: The Case of Iranian EFL Learners Test-Taking Strategies and Task-based Assessment: The Case of Iranian EFL Learners Hossein Barati Department of English, Faculty of Foreign Languages, University of Isfahan barati@yahoo.com Zohreh Kashkoul*

More information

Chapter -6 Reliability and Validity of the test Test - Retest Method Rational Equivalence Method Split-Half Method

Chapter -6 Reliability and Validity of the test Test - Retest Method Rational Equivalence Method Split-Half Method Chapter -6 Reliability and Validity of the test 6.1 Introduction 6.2 Reliability of the test 6.2.1 Test - Retest Method 6.2.2 Rational Equivalence Method 6.2.3 Split-Half Method 6.3 Validity of the test

More information
