Empowered by Psychometrics: The Fundamentals of Psychometrics. Jim Wollack, University of Wisconsin–Madison


1 Empowered by Psychometrics: The Fundamentals of Psychometrics. Jim Wollack, University of Wisconsin–Madison

2 Psycho-what? Psychometrics is the field of study concerned with the measurement of mental and psychological traits, abilities, and attitudes.

3 Purpose of Session Introduce several key psychometric concepts and gain an appreciation for the theoretical underpinnings of standardized tests. Scales, Norms, and Equating Validity Reliability Test Theory (as time permits) Classical Test Theory Item Response Theory

4 Poll Question Because of my comfort level with psychometrics, I am ___ to discuss measurement-related issues with faculty and students. A. eager B. willing C. hesitant D. unwilling

5 SECTION I Scales, Norms, and Equating

6 Measurement and Scaling Measurement is the process of assigning scores in a systematic and coherent way For purposes of reporting, these scores are often transformed in some way to facilitate interpretations Scaling is the process of constructing a score scale that associates numbers or other ordered indicators with the performance of examinees.

7 Why Scale? Imagine three students (A, B, and C) take a standardized test Student A answered 20 of 30 items correct. Student B answered 20 of 29 items correct. Student C answered 21 of 30 items correct. What can we say about the achievement level of these three students?

8 What can we say about A, B, and C? A: 20/30 B: 20/29 C: 21/30 Suppose you learned that the students completed different test forms? that the 3 hardest items were all on Test A? that the 12 easiest items were all on Test C? The overall difficulty levels of two different tests are rarely identical. Even if tests are of equal average difficulty, they may be differentially difficult for students at different ability levels.

9 The Rationale Behind Scaling Raw scores (number correct scores) depend on the items on the test and do not have consistent meaning across forms. Same is true for percentage correct scores Makes score interpretations very difficult

10 Score Scales The score scale is the metric which is actually used for purposes of reporting scores to users. Moving from raw scores to the score scale involves either a linear or non-linear transformation The transformed scores themselves are called scaled scores or derived scores
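The linear case of the transformation described above can be sketched in a few lines: standardize each raw score, then stretch and shift it onto the reporting metric. The target mean (500) and standard deviation (100) below are illustrative assumptions, not values from the talk.

```python
# Sketch: a linear transformation from raw scores to a reporting scale.
# Target mean/SD (500, 100) are illustrative, not from the presentation.

def to_scaled(raw_scores, target_mean=500.0, target_sd=100.0):
    """Linearly transform raw scores to a scale with the given mean and SD."""
    n = len(raw_scores)
    mean = sum(raw_scores) / n
    sd = (sum((x - mean) ** 2 for x in raw_scores) / n) ** 0.5
    # Standardize each score, then stretch/shift to the reporting metric.
    return [target_mean + target_sd * (x - mean) / sd for x in raw_scores]

scaled = to_scaled([18, 20, 22, 25, 30])
print([round(s) for s in scaled])
```

Because the transformation is linear, the rank order of examinees is preserved; only the reporting metric changes.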

11 Common Scales

12 Common Scales [figure]

13 Advantage of Measurement Scales Standardization The scale must not measure differently depending on what it is that's being measured Pieces, bites, handfuls, and number/percent correct vs. pounds, inches, ºF, level on construct Without a standardized reporting metric, direct comparisons are impossible

14 Transforming between two scales Mean = 67 St. Dev = 8.5 Mean = 72 St. Dev = 5.7

15 Transforming between two scales Mean = 67 St. Dev = 8.5 Mean = 72 St. Dev = 5.7 Linear transformation 1. Make means equal: Add (72 − 67) = 5 to all blue scores

16 Transforming between two scales Mean = 72 St. Dev = 8.5 Mean = 72 St. Dev = 5.7 Linear transformation 1. Make means equal: Add (72 − 67) = 5 to all blue scores

17 Transforming between two scales Mean = 72 St. Dev = 8.5 Mean = 72 St. Dev = 5.7 Linear transformation 2. Make st. devs equal: Multiply all blue scores' deviations from the mean by (5.7/8.5)

18 Transforming between two scales Mean = 72 St. Dev = 5.7 Mean = 72 St. Dev = 5.7 Linear transformation 2. Make st. devs equal: Multiply all blue scores' deviations from the mean by (5.7/8.5)
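The two-step alignment above can be written as one function: shift the blue scores so the means match, then rescale deviations about the mean so the standard deviations match. The values (blue mean 67, SD 8.5; red mean 72, SD 5.7) come from the slides; the specific input scores are illustrative.

```python
# Minimal sketch of the slides' two-step linear alignment of score scales.
# Means/SDs (67, 8.5, 72, 5.7) are from the slides; inputs are made up.

def align(score, mean_from=67.0, sd_from=8.5, mean_to=72.0, sd_to=5.7):
    shifted = score + (mean_to - mean_from)                    # step 1: equalize means
    return mean_to + (sd_to / sd_from) * (shifted - mean_to)   # step 2: equalize SDs

print(align(67.0))   # a blue score at the blue mean lands at the red mean
print(align(75.5))   # one blue SD above the mean lands one red SD above (about 77.7)
```

Rescaling the *deviations* (rather than the scores themselves) is what keeps the already-matched means intact in step 2.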

19 Any set of test scores can easily be transformed to some other metric. This allows for direct norm-referenced comparisons: Candidate A scored 82 on the red test (96th percentile); Candidate B scored 82 on the blue test (96th percentile)

20 Transforming between two scales Mean = 72 St. Dev = 5.7 Mean = 72 St. Dev = 5.7 Score = 82 (96th percentile)

21 Transforming between two scales Are these two students comparable? Score = 82 (96th percentile)

22 Poll Question Are these two students comparable? A. Yes B. No C. It cannot be determined

23 Are the two students comparable? We don't know. Students' scaled scores and percentile ranks are relative to the other students who completed that same form. If the populations of test takers were different, it is quite likely that the examinees are not of equal ability.

24 SAT / GRE / GED scales [figure]: Is SAT = 600 comparable to GRE = 600?

25 Norming Perform initial scaling on a single (base) test form (calibration). The sample completing the base form should be large and representative of the target population. The sample taking the base form is used as a reference point for comparison with all subsequent samples (norming).

26 Norm Group The norm group is the group of individuals on whom the test scale was established. The SAT is scaled to have an average of 500 because the average of the norm group was 500.

27 Test Equating Need to transform the data so that the scores candidates receive are the same scores they would have received if they Were part of the normative sample, and Had been administered the base form This process, known as equating, ensures that test scores have identical meaning across administrations, even as items and populations change.

28 Equating The process of determining the transformation to convert between the raw score metric and the reporting metric (based on the norm group). Equating is a topic worthy of a full-length graduate-level course. Requires comparison across common elements between the base form and the new form: common items, or assumed randomly equivalent populations (a very difficult assumption to make across years).

29 Simple Equating Design [table: Base Test vs. New Test statistics on the common items: Average, Std. Dev., After Equating] In much the same way we did before, we can now align these two assessments using only data from the common items: Add 15 (the difference between the common-item averages) to all New Test scores; Multiply all New Test scores by 98/105 = 0.93
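A mean/sigma version of this common-item alignment can be sketched as below: standardize a New Test score against the common items' statistics on the new form, then express it on the base form's common-item metric. The sample statistics (55, 10.5, 70, 9.8) are illustrative assumptions chosen to echo the slide's +15 shift and 98/105 ratio, not data from the talk.

```python
# Hedged sketch of common-item (mean/sigma) equating: place a New Test
# score on the Base Test metric using only common-item statistics.
# All numeric values here are illustrative assumptions.

def equate(new_score, common_mean_new, common_sd_new,
           common_mean_base, common_sd_base):
    """Mean/SD alignment of a New Test score to the Base Test scale."""
    z = (new_score - common_mean_new) / common_sd_new
    return common_mean_base + common_sd_base * z

# Common items: average 55 (SD 10.5) on the new form, 70 (SD 9.8) on the base form.
print(round(equate(60, 55, 10.5, 70, 9.8), 2))
```

Rescaling deviations about the common-item mean (rather than literally adding a constant and then multiplying every score) keeps the shift and the SD adjustment from interfering with each other.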

30 SECTION I Scales, Norms, and Equating Questions?

31 SECTION II Validity

32 Definitions of Validity Formal Definition of Validity: Degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of interpretations and actions based on test scores. Less Formal Definition of Validity: Degree to which the inferences made from a test (e.g., that the student knows the material of interest, is competent to practice in the discipline, is likely to be successful in training/school, etc.) are justified and accurate. Informal Definition of Validity: A test is valid if it measures what it's supposed to measure

33 Problem with the Informal Definition Informal Definition of Validity: A test is valid if it measures what it's supposed to measure. Tests cannot be valid in the abstract; tests are valid only for specific purposes. Placement/Admissions exams used for exemption; VSA Testing (differences in target population). Validity is a matter of degree.

34 Measuring Educ/Psychological Variables Unlike physical attributes (height, weight, etc.), educational and psychological attributes cannot be directly observed, hence cannot be directly measured. To the extent that a variable is abstract and latent rather than concrete and observable, it is called a construct. In order to move from construct formation to measurement, the construct which is to be measured must first be operationally defined.

35 Embodiment of a Construct Operationalize What behaviors comprise the construct? How does this construct relate to or distinguish itself from other constructs? Plan How will samples of these behaviors be obtained? Instrument development Develop a standard procedure for obtaining the sample of behaviors from the specified domain. Measurement

36 Test Validation The degree to which the evidence supports the claim that the assessment measures the intended construct. Three types of validity? Construct Validity Criterion-Related Validity Content Validity No, just one. It s really all about Construct Validity

37 Assessing Construct Validity Target group differences Is there a logical differentiation between groups? e.g., placement test math scores for students who completed HS Calculus vs. those who only completed HS Algebra Correlational studies between test and related (or unrelated) measures Convergent validity: Does test correlate with other measures that are theoretically related? ACT and SAT, Compass & Accuplacer, Different IQ tests Divergent validity: Does test fail to correlate with other measures that are theoretically unrelated? ACT and Stanford Binet Math and English placement scores

38 Assessing Construct Validity Factor Analysis Statistical procedure to empirically test whether performance on observed variables (items) can be explained by a smaller number of unobserved constructs. Dimensionality assessment Does empirical structure match theoretical structure? Can also assess whether clusters of items are related in ways that are expected. Are items that are intended to measure the same subscores (e.g., trigonometry, algebra, etc.) more similar to each other than to other items?

39 Assessing Construct Validity Content Validity The extent to which the set of items on the test are representative of and relevant to the construct Items should cover the breadth and depth of the construct Weight assigned to each content area should reflect importance of that content area within construct For employment and certification exams, often necessary to conduct a practice analysis Panels of content experts are often utilized to assess relevance of items

40 Assessing Construct Validity Criterion-Related Validity Examines the relationship of the test results to other variables/criteria external to the test Predictive The extent to which an individual s future level on the criterion can be predicted from prior test performance Correlation between ACT/SAT scores and first year GPA Concurrent The extent to which test scores estimate an individual s present standing on the criterion. Correlation between Prior Learning Assessment and final course grade

41 SECTION II Validity Questions?

42 SECTION III Reliability

43 A Game of Darts Validity: Confidence that the test will hit the bullseye. Reliability: Confidence that any one dart is a good predictor of where the next dart would go (clustering the darts together).

44 Unreliable

45 Reliable, but not valid

46 Reasonably reliable and valid

47 Highly reliable and valid

48 Reliability and Validity A test cannot be valid (for any purpose) unless it is reliable. Validity: Confidence that the test will hit the bullseye Not that it will average out to the bullseye

49 Working Definitions of Reliability The degree to which a test is consistent and stable in measuring what it is intended to measure Measurement repeatability Will an examinee score similarly when administered an independent alternate form of the test administered under the same conditions and with no opportunity for learning (or forgetting)?

50 Understanding Reliability No two tests will consistently produce identical results. All test scores contain some random error Observed Score = True Score + Random Error = Signal + Noise This equation is often written as X = T + E

51 What is Random Error? Any non-systematic source of variance that is unrelated to the construct of interest. Examinee-specific factors Motivation Concentration Fatigue Boredom Test-specific factors Specific questions Ambiguous items Memory lapses Carelessness Luck in guessing Clarity of directions Reading load of items Scoring-specific factors Non-uniform scoring Carelessness Computational errors

52 Formal Definition of Reliability X = T + E A measure of the extent to which an examinee's score reflects their true score (as opposed to random measurement error) Reliability = Variance(True) / Variance(Observed) = 1 − Variance(Error) / Variance(Observed) A test with reliability of .80 contains 20% random error

53 Reliability and the SEM If reliability is a measure of the stability of measurement, the standard error of measurement (SEM) provides a measure of the instability of measurement. SEM = (st. dev.) × √(1 − reliability) Provides a measure of the expected variability in an individual's score (X_i) upon retesting. Score Interval / Probability of score falling in interval: X_i ± 1 SEM → 68%; X_i ± 2 SEM → 95%
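The SEM formula and the retest bands above can be sketched directly. The observed-score SD (10) and reliability (.91) used here are illustrative assumptions.

```python
import math

# Sketch of the SEM relationship: SEM = SD * sqrt(1 - reliability).
# The SD (10) and reliability (.91) are illustrative assumptions.

def sem(sd, reliability):
    return sd * math.sqrt(1.0 - reliability)

def score_band(observed, sd, reliability, width=2):
    """Interval expected to contain ~95% of retest scores when width=2."""
    e = width * sem(sd, reliability)
    return (observed - e, observed + e)

print(sem(10.0, 0.91))            # about 3.0
print(score_band(82, 10.0, 0.91)) # about (76.0, 88.0)
```

Note how quickly the band widens as reliability drops: at reliability .75 the same SD gives an SEM of 5, so the 95% band spans 20 points.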

54 Why Care About Reliability? Measurement error is random; its effect on a student's test score is unpredictable. In an unreliable test, students' scores consist largely of measurement error. An unreliable test offers no advantage over randomly assigning test scores to students. Reliability is a necessary precursor to validity

55 Estimating Reliability Test-Retest Reliability Administer the same exam to the same group of candidates and correlate the scores Interval should be short enough for no learning, and long enough for no remembering Parallel/Alternate Forms Reliability 1. Develop equivalent forms of the test. 2. Have examinees take both tests. 3. Correlate the scores.

56 Estimating Reliability: 1 Administration Split-Half Reliability 1. Split the exam into two random halves 2. Correlate scores across the two halves. 3. Apply a formula to estimate reliability Internal Consistency Cronbach's Coefficient α, KR-20, KR-21 Measures of the extent to which the items throughout a test are homogeneous α is the average split-half reliability across all possible split-halves. α and KR-20 are lower-bound estimates of reliability
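Coefficient α itself is a short formula over the item and total-score variances, sketched below for a small 0/1 response matrix. The data are invented for illustration.

```python
# Minimal sketch of Cronbach's coefficient alpha for a 0/1 item matrix
# (rows = examinees, columns = items). The data are made up.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)   # sample variance

def cronbach_alpha(rows):
    k = len(rows[0])                                        # number of items
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(data), 3))
```

For dichotomous (0/1) items like these, this computation coincides with KR-20.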

57 Reliability In Practice: High-stakes standardized testing vs. subscores or low-stakes tests

58 Improving Reliability Improve item quality Increase the number of points or item alternatives Increase the number of items

59 SECTION III Reliability Questions? End?

60 SECTION IV Test Theory Classical Test Theory Item Response Theory

61 Classical Test Theory X = T + E Person characteristics: the total test score serves as a proxy for the examinee's level on the construct Item characteristics: item difficulty is estimated as the proportion of examinees who answer an item correctly; item discrimination measures how effectively the item differentiates between high- and low-performing examinees (correlation between item score (1/0) and total score)
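The two CTT item statistics named on this slide are simple to compute: difficulty is the proportion correct, and discrimination is the correlation between the 0/1 item score and the total score. The response data below are invented for illustration.

```python
# Sketch of CTT item statistics: difficulty (p-value) and discrimination
# (item-total Pearson correlation). Response data are invented.

def item_difficulty(scores):                       # proportion correct
    return sum(scores) / len(scores)

def item_discrimination(item, totals):             # Pearson r of item vs total
    n = len(item)
    mi, mt = sum(item) / n, sum(totals) / n
    cov = sum((i - mi) * (t - mt) for i, t in zip(item, totals)) / n
    si = (sum((i - mi) ** 2 for i in item) / n) ** 0.5
    st = (sum((t - mt) ** 2 for t in totals) / n) ** 0.5
    return cov / (si * st)

item = [1, 1, 0, 1, 0, 0]                          # 0/1 scores on one item
totals = [28, 25, 14, 22, 9, 12]                   # total test scores
print(item_difficulty(item))                       # 0.5
print(round(item_discrimination(item, totals), 2))
```

In this toy data the higher-scoring examinees all answered the item correctly, so the discrimination comes out strongly positive.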

62 Item Response Theory Mathematical modeling approach to test scoring and analysis Less intuitive, but more sophisticated approach Solves many problems with CTT Sample-dependency of item/exam statistics Test-dependency of total scores Tough to compare people and items Equal item weighting No good way to account for guessing

63 Trait Level vs. Prob. Correct Response [figure: probability of a correct response (0 to 1.0) plotted against θ (examinee trait level)]

64 An Item Characteristic Curve [figure: probability vs. θ (examinee trait level)]

65 Sample Independent: Same Curve [figure: the same item characteristic curve holds across samples; probability vs. θ (examinee trait level)]

66 Item Response Theory Directly models the probability of a candidate getting an item correct based on their overall level on the construct and the item's characteristics: P_i(θ) = c_i + (1 − c_i) / (1 + e^(−a_i(θ − b_i))) θ is the person's level on the construct; a_i, b_i, and c_i are item parameters corresponding to the item's discrimination, difficulty, and guessing likelihood
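The model described here, with discrimination (a), difficulty (b), and guessing (c) parameters, is the standard three-parameter logistic (3PL), sketched below. The parameter values are illustrative.

```python
import math

# Sketch of the 3PL item response function: probability of a correct
# response given trait level theta and item parameters a, b, c.
# The parameter values used below are illustrative.

def p_correct(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# At theta = b the curve sits halfway between the guessing floor c and 1.
print(p_correct(0.0, a=1.2, b=0.0, c=0.2))   # about 0.6
```

The floor of the curve is c (a very low-ability examinee can still guess correctly), the location of the steep rise is b, and a controls how steep that rise is.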

67 Item Difficulty [figure: item characteristic curves; probability vs. θ (examinee trait level)]

68 Item Difficulty [figure: probability (0 to 1.0) vs. θ (examinee trait level)]

69 Item Discrimination [figure: probability vs. θ (examinee trait level)]

70 Accounting for Guessing [figure: probability vs. θ (examinee trait level)]

71 Putting it all Together [figure: probability vs. θ (examinee trait level)]

72 Test Characteristic Curve (TCC) Describes relationship between total test score and examinee trait level (θ) TCC is obtained by adding item characteristic curves across all values of θ Each test has its own TCC
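The TCC construction described above can be sketched directly: the expected total score at a trait level is the sum of the item probabilities at that level. The 3PL form and the item parameters below are illustrative assumptions.

```python
import math

# Sketch of a test characteristic curve: expected total score at theta
# is the sum of the item probabilities (here a 3PL for each item).
# Item parameters (a, b, c) are invented for illustration.

def p_correct(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

items = [(1.0, -1.0, 0.2), (1.3, 0.0, 0.2), (0.8, 1.0, 0.25)]

def expected_score(theta):
    return sum(p_correct(theta, a, b, c) for a, b, c in items)

# Expected score rises monotonically with theta, tracing the TCC.
print(round(expected_score(-2.0), 2), round(expected_score(2.0), 2))
```

Swapping in a different set of item parameters produces a different TCC, which is exactly why each form has its own curve.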

73 PT Test Characteristic Curve A form with slightly easier items will shift the TCC to the left, requiring the examinee to answer a greater number of items correctly in order to pass

74 PT Test Characteristic Curve A form with slightly harder items will shift the TCC to the right, requiring the examinee to answer a smaller number of items correctly in order to pass

75 3 Hypothetical TCCs [figure: projected test score vs. θ; Easier (top), Anchor (middle), Harder (bottom)] IRT is also independent of characteristics of the specific test form

76 IRT Summary Although dealing with raw scores is conceptually appealing, it is problematic in practice. IRT overcomes many of these problems. IRT item difficulty and person trait estimates are scaled together. Item and person parameters are properties of the items and people, and do not change across samples or test forms. The majority of testing programs use IRT scoring and linearly transform θ to the scale of interest.

77 SECTION IV Test Theory Classical Test Theory Item Response Theory Questions?

78 Thank you For more information, please contact Jim Wollack University of Wisconsin Madison


Lecture Week 3 Quality of Measurement Instruments; Introduction SPSS

Lecture Week 3 Quality of Measurement Instruments; Introduction SPSS Lecture Week 3 Quality of Measurement Instruments; Introduction SPSS Introduction to Research Methods & Statistics 2013 2014 Hemmo Smit Overview Quality of Measurement Instruments Introduction SPSS Read:

More information

3 CONCEPTUAL FOUNDATIONS OF STATISTICS

3 CONCEPTUAL FOUNDATIONS OF STATISTICS 3 CONCEPTUAL FOUNDATIONS OF STATISTICS In this chapter, we examine the conceptual foundations of statistics. The goal is to give you an appreciation and conceptual understanding of some basic statistical

More information

Likelihood Ratio Based Computerized Classification Testing. Nathan A. Thompson. Assessment Systems Corporation & University of Cincinnati.

Likelihood Ratio Based Computerized Classification Testing. Nathan A. Thompson. Assessment Systems Corporation & University of Cincinnati. Likelihood Ratio Based Computerized Classification Testing Nathan A. Thompson Assessment Systems Corporation & University of Cincinnati Shungwon Ro Kenexa Abstract An efficient method for making decisions

More information

Chapter 2--Norms and Basic Statistics for Testing

Chapter 2--Norms and Basic Statistics for Testing Chapter 2--Norms and Basic Statistics for Testing Student: 1. Statistical procedures that summarize and describe a series of observations are called A. inferential statistics. B. descriptive statistics.

More information

USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION

USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION Iweka Fidelis (Ph.D) Department of Educational Psychology, Guidance and Counselling, University of Port Harcourt,

More information

The Effect of Guessing on Item Reliability

The Effect of Guessing on Item Reliability The Effect of Guessing on Item Reliability under Answer-Until-Correct Scoring Michael Kane National League for Nursing, Inc. James Moloney State University of New York at Brockport The answer-until-correct

More information

Item Analysis Explanation

Item Analysis Explanation Item Analysis Explanation The item difficulty is the percentage of candidates who answered the question correctly. The recommended range for item difficulty set forth by CASTLE Worldwide, Inc., is between

More information

Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz

Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz This study presents the steps Edgenuity uses to evaluate the reliability and validity of its quizzes, topic tests, and cumulative

More information

alternate-form reliability The degree to which two or more versions of the same test correlate with one another. In clinical studies in which a given function is going to be tested more than once over

More information

AMERICAN BOARD OF SURGERY 2009 IN-TRAINING EXAMINATION EXPLANATION & INTERPRETATION OF SCORE REPORTS

AMERICAN BOARD OF SURGERY 2009 IN-TRAINING EXAMINATION EXPLANATION & INTERPRETATION OF SCORE REPORTS AMERICAN BOARD OF SURGERY 2009 IN-TRAINING EXAMINATION EXPLANATION & INTERPRETATION OF SCORE REPORTS Attached are the performance reports and analyses for participants from your surgery program on the

More information

Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria

Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria Thakur Karkee Measurement Incorporated Dong-In Kim CTB/McGraw-Hill Kevin Fatica CTB/McGraw-Hill

More information

Critical Thinking Assessment at MCC. How are we doing?

Critical Thinking Assessment at MCC. How are we doing? Critical Thinking Assessment at MCC How are we doing? Prepared by Maura McCool, M.S. Office of Research, Evaluation and Assessment Metropolitan Community Colleges Fall 2003 1 General Education Assessment

More information

Doing Quantitative Research 26E02900, 6 ECTS Lecture 6: Structural Equations Modeling. Olli-Pekka Kauppila Daria Kautto

Doing Quantitative Research 26E02900, 6 ECTS Lecture 6: Structural Equations Modeling. Olli-Pekka Kauppila Daria Kautto Doing Quantitative Research 26E02900, 6 ECTS Lecture 6: Structural Equations Modeling Olli-Pekka Kauppila Daria Kautto Session VI, September 20 2017 Learning objectives 1. Get familiar with the basic idea

More information

Chapter 9: Intelligence and Psychological Testing

Chapter 9: Intelligence and Psychological Testing Chapter 9: Intelligence and Psychological Testing Intelligence At least two major "consensus" definitions of intelligence have been proposed. First, from Intelligence: Knowns and Unknowns, a report of

More information

Bruno D. Zumbo, Ph.D. University of Northern British Columbia

Bruno D. Zumbo, Ph.D. University of Northern British Columbia Bruno Zumbo 1 The Effect of DIF and Impact on Classical Test Statistics: Undetected DIF and Impact, and the Reliability and Interpretability of Scores from a Language Proficiency Test Bruno D. Zumbo, Ph.D.

More information

MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2

MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2 MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and Lord Equating Methods 1,2 Lisa A. Keller, Ronald K. Hambleton, Pauline Parker, Jenna Copella University of Massachusetts

More information

Samantha Sample 01 Feb 2013 EXPERT STANDARD REPORT ABILITY ADAPT-G ADAPTIVE GENERAL REASONING TEST. Psychometrics Ltd.

Samantha Sample 01 Feb 2013 EXPERT STANDARD REPORT ABILITY ADAPT-G ADAPTIVE GENERAL REASONING TEST. Psychometrics Ltd. 01 Feb 2013 EXPERT STANDARD REPORT ADAPTIVE GENERAL REASONING TEST ABILITY ADAPT-G REPORT STRUCTURE The Standard Report presents s results in the following sections: 1. Guide to Using This Report Introduction

More information

SLEEP DISTURBANCE ABOUT SLEEP DISTURBANCE INTRODUCTION TO ASSESSMENT OPTIONS. 6/27/2018 PROMIS Sleep Disturbance Page 1

SLEEP DISTURBANCE ABOUT SLEEP DISTURBANCE INTRODUCTION TO ASSESSMENT OPTIONS. 6/27/2018 PROMIS Sleep Disturbance Page 1 SLEEP DISTURBANCE A brief guide to the PROMIS Sleep Disturbance instruments: ADULT PROMIS Item Bank v1.0 Sleep Disturbance PROMIS Short Form v1.0 Sleep Disturbance 4a PROMIS Short Form v1.0 Sleep Disturbance

More information

Intelligence. Exam 3. iclicker. My Brilliant Brain. What is Intelligence? Conceptual Difficulties. Chapter 10

Intelligence. Exam 3. iclicker. My Brilliant Brain. What is Intelligence? Conceptual Difficulties. Chapter 10 Exam 3 iclicker Mean: 32.8 Median: 33 Mode: 33 SD = 6.4 How many of you have one? Do you think it would be a good addition for this course in the future? Top Score: 49 Top Cumulative Score to date: 144

More information

On Test Scores (Part 2) How to Properly Use Test Scores in Secondary Analyses. Structural Equation Modeling Lecture #12 April 29, 2015

On Test Scores (Part 2) How to Properly Use Test Scores in Secondary Analyses. Structural Equation Modeling Lecture #12 April 29, 2015 On Test Scores (Part 2) How to Properly Use Test Scores in Secondary Analyses Structural Equation Modeling Lecture #12 April 29, 2015 PRE 906, SEM: On Test Scores #2--The Proper Use of Scores Today s Class:

More information

Introduction to the HBDI and the Whole Brain Model. Technical Overview & Validity Evidence

Introduction to the HBDI and the Whole Brain Model. Technical Overview & Validity Evidence Introduction to the HBDI and the Whole Brain Model Technical Overview & Validity Evidence UPDATED 2016 OVERVIEW The theory on which the Whole Brain Model and the Herrmann Brain Dominance Instrument (HBDI

More information

2016 Technical Report National Board Dental Hygiene Examination

2016 Technical Report National Board Dental Hygiene Examination 2016 Technical Report National Board Dental Hygiene Examination 2017 Joint Commission on National Dental Examinations All rights reserved. 211 East Chicago Avenue Chicago, Illinois 60611-2637 800.232.1694

More information

The Psychometric Principles Maximizing the quality of assessment

The Psychometric Principles Maximizing the quality of assessment Summer School 2009 Psychometric Principles Professor John Rust University of Cambridge The Psychometric Principles Maximizing the quality of assessment Reliability Validity Standardisation Equivalence

More information

Reliability, validity, and all that jazz

Reliability, validity, and all that jazz Reliability, validity, and all that jazz Dylan Wiliam King s College London Introduction No measuring instrument is perfect. The most obvious problems relate to reliability. If we use a thermometer to

More information

Chapter 12. The One- Sample

Chapter 12. The One- Sample Chapter 12 The One- Sample z-test Objective We are going to learn to make decisions about a population parameter based on sample information. Lesson 12.1. Testing a Two- Tailed Hypothesis Example 1: Let's

More information

Chapter 1 Applications and Consequences of Psychological Testing

Chapter 1 Applications and Consequences of Psychological Testing Chapter 1 Applications and Consequences of Psychological Testing Topic 1A The Nature and Uses of Psychological Testing The Consequences of Testing From birth to old age, people encounter tests at all most

More information

Validity and reliability of measurements

Validity and reliability of measurements Validity and reliability of measurements 2 Validity and reliability of measurements 4 5 Components in a dataset Why bother (examples from research) What is reliability? What is validity? How should I treat

More information

PÄIVI KARHU THE THEORY OF MEASUREMENT

PÄIVI KARHU THE THEORY OF MEASUREMENT PÄIVI KARHU THE THEORY OF MEASUREMENT AGENDA 1. Quality of Measurement a) Validity Definition and Types of validity Assessment of validity Threats of Validity b) Reliability True Score Theory Definition

More information

Variables in Research. What We Will Cover in This Section. What Does Variable Mean?

Variables in Research. What We Will Cover in This Section. What Does Variable Mean? Variables in Research 9/20/2005 P767 Variables in Research 1 What We Will Cover in This Section Nature of variables. Measuring variables. Reliability. Validity. Measurement Modes. Issues. 9/20/2005 P767

More information

THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL

THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL JOURNAL OF EDUCATIONAL MEASUREMENT VOL. II, NO, 2 FALL 1974 THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL SUSAN E. WHITELY' AND RENE V. DAWIS 2 University of Minnesota Although it has been claimed that

More information

Item Response Theory. Steven P. Reise University of California, U.S.A. Unidimensional IRT Models for Dichotomous Item Responses

Item Response Theory. Steven P. Reise University of California, U.S.A. Unidimensional IRT Models for Dichotomous Item Responses Item Response Theory Steven P. Reise University of California, U.S.A. Item response theory (IRT), or modern measurement theory, provides alternatives to classical test theory (CTT) methods for the construction,

More information

IDENTIFYING DATA CONDITIONS TO ENHANCE SUBSCALE SCORE ACCURACY BASED ON VARIOUS PSYCHOMETRIC MODELS

IDENTIFYING DATA CONDITIONS TO ENHANCE SUBSCALE SCORE ACCURACY BASED ON VARIOUS PSYCHOMETRIC MODELS IDENTIFYING DATA CONDITIONS TO ENHANCE SUBSCALE SCORE ACCURACY BASED ON VARIOUS PSYCHOMETRIC MODELS A Dissertation Presented to The Academic Faculty by HeaWon Jun In Partial Fulfillment of the Requirements

More information

Reliability and Validity

Reliability and Validity Reliability and Today s Objectives Understand the difference between reliability and validity Understand how to develop valid indicators of a concept Reliability and Reliability How accurate or consistent

More information

6. Assessment. 3. Skew This is the degree to which a distribution of scores is not normally distributed. Positive skew

6. Assessment. 3. Skew This is the degree to which a distribution of scores is not normally distributed. Positive skew 6. Assessment 1. Measurement: general process of determining the dimensions of an attribute or trait. Assessment: Processes and procedures for collecting information about human behavior. Assessment tools:

More information

Validity refers to the accuracy of a measure. A measurement is valid when it measures what it is suppose to measure and performs the functions that

Validity refers to the accuracy of a measure. A measurement is valid when it measures what it is suppose to measure and performs the functions that Validity refers to the accuracy of a measure. A measurement is valid when it measures what it is suppose to measure and performs the functions that it purports to perform. Does an indicator accurately

More information

Associate Prof. Dr Anne Yee. Dr Mahmoud Danaee

Associate Prof. Dr Anne Yee. Dr Mahmoud Danaee Associate Prof. Dr Anne Yee Dr Mahmoud Danaee 1 2 What does this resemble? Rorschach test At the end of the test, the tester says you need therapy or you can't work for this company 3 Psychological Testing

More information

DATA GATHERING. Define : Is a process of collecting data from sample, so as for testing & analyzing before reporting research findings.

DATA GATHERING. Define : Is a process of collecting data from sample, so as for testing & analyzing before reporting research findings. DATA GATHERING Define : Is a process of collecting data from sample, so as for testing & analyzing before reporting research findings. 2012 John Wiley & Sons Ltd. Measurement Measurement: the assignment

More information

Chapter -6 Reliability and Validity of the test Test - Retest Method Rational Equivalence Method Split-Half Method

Chapter -6 Reliability and Validity of the test Test - Retest Method Rational Equivalence Method Split-Half Method Chapter -6 Reliability and Validity of the test 6.1 Introduction 6.2 Reliability of the test 6.2.1 Test - Retest Method 6.2.2 Rational Equivalence Method 6.2.3 Split-Half Method 6.3 Validity of the test

More information

AP PSYCH Unit 11.2 Assessing Intelligence

AP PSYCH Unit 11.2 Assessing Intelligence AP PSYCH Unit 11.2 Assessing Intelligence Review - What is Intelligence? Mental quality involving skill at information processing, learning from experience, problem solving, and adapting to new or changing

More information

Reliability Theory for Total Test Scores. Measurement Methods Lecture 7 2/27/2007

Reliability Theory for Total Test Scores. Measurement Methods Lecture 7 2/27/2007 Reliability Theory for Total Test Scores Measurement Methods Lecture 7 2/27/2007 Today s Class Reliability theory True score model Applications of the model Lecture 7 Psych 892 2 Great Moments in Measurement

More information

The Current State of Our Education

The Current State of Our Education 1 The Current State of Our Education 2 Quantitative Research School of Management www.ramayah.com Mental Challenge A man and his son are involved in an automobile accident. The man is killed and the boy,

More information

ITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION SCALE

ITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION SCALE California State University, San Bernardino CSUSB ScholarWorks Electronic Theses, Projects, and Dissertations Office of Graduate Studies 6-2016 ITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION

More information

2 Types of psychological tests and their validity, precision and standards

2 Types of psychological tests and their validity, precision and standards 2 Types of psychological tests and their validity, precision and standards Tests are usually classified in objective or projective, according to Pasquali (2008). In case of projective tests, a person is

More information

11-3. Learning Objectives

11-3. Learning Objectives 11-1 Measurement Learning Objectives 11-3 Understand... The distinction between measuring objects, properties, and indicants of properties. The similarities and differences between the four scale types

More information

Psychological testing

Psychological testing Psychological testing Lecture 12 Mikołaj Winiewski, PhD Test Construction Strategies Content validation Empirical Criterion Factor Analysis Mixed approach (all of the above) Content Validation Defining

More information

A Comparison of Several Goodness-of-Fit Statistics

A Comparison of Several Goodness-of-Fit Statistics A Comparison of Several Goodness-of-Fit Statistics Robert L. McKinley The University of Toledo Craig N. Mills Educational Testing Service A study was conducted to evaluate four goodnessof-fit procedures

More information

THE ROLE OF PSYCHOMETRIC ENTRANCE TEST IN ADMISSION PROCESSES FOR NON-SELECTIVE ACADEMIC DEPARTMENTS: STUDY CASE IN YEZREEL VALLEY COLLEGE

THE ROLE OF PSYCHOMETRIC ENTRANCE TEST IN ADMISSION PROCESSES FOR NON-SELECTIVE ACADEMIC DEPARTMENTS: STUDY CASE IN YEZREEL VALLEY COLLEGE THE ROLE OF PSYCHOMETRIC ENTRANCE TEST IN ADMISSION PROCESSES FOR NON-SELECTIVE ACADEMIC DEPARTMENTS: STUDY CASE IN YEZREEL VALLEY COLLEGE Tal Shahor The Academic College of Emek Yezreel Emek Yezreel 19300,

More information

Underlying Theory & Basic Issues

Underlying Theory & Basic Issues Underlying Theory & Basic Issues Dewayne E Perry ENS 623 Perry@ece.utexas.edu 1 All Too True 2 Validity In software engineering, we worry about various issues: E-Type systems: Usefulness is it doing what

More information

Regression Discontinuity Analysis

Regression Discontinuity Analysis Regression Discontinuity Analysis A researcher wants to determine whether tutoring underachieving middle school students improves their math grades. Another wonders whether providing financial aid to low-income

More information

Chapter 4: Defining and Measuring Variables

Chapter 4: Defining and Measuring Variables Chapter 4: Defining and Measuring Variables A. LEARNING OUTCOMES. After studying this chapter students should be able to: Distinguish between qualitative and quantitative, discrete and continuous, and

More information