Basic concepts and principles of classical test theory


1 Basic concepts and principles of classical test theory Jan-Eric Gustafsson

2 What is measurement?
Assignment of numbers to aspects of individuals according to some rule. The aspect which is measured must be defined in theoretical terms. Measurement should be understood in a broad sense and also encompasses classification and assessment.

3 Why should one measure?
- Increases precision and comparability
- Gives access to a well-developed set of tools for:
  - developing measurement instruments
  - collecting data
  - summarizing and describing data
  - analyzing data
  - making inferences and generalizations
However: measurement suits certain purposes but not others.

4 Classical and modern theories of measurement
Theories of measurement to a large extent deal with how to put together components (items or subscales) into scales with known properties:
- Classical theories of measurement assume simple relations between the components and the dimension to be measured. Measures of test properties are typically group-dependent.
- Modern theories of measurement (IRT) are based on probabilistic models of relations between item scores and characteristics of persons and items. This allows for estimation of ability from different items for different persons, and for estimating item characteristics which are invariant over groups of persons.

5 An example: the IEA Reading Literacy study 1991
In 1991 some Swedish students in grade 3 participated in a study of reading literacy along with samples of students in about 30 other countries (RL 1991).
A large number of instruments:
- Reading literacy tests: 15 texts from three categories and 66 multiple-choice items.
- Student questionnaire: Questions about home background, attitudes towards reading, and reading habits.
- Parental questionnaire: Literacy activities in the home, economic and cultural resources, reading habits and attitudes, relations between home and school, education and occupation.
- Teacher questionnaire: Questions about the class, the teaching of reading, resources and the teacher.
- School questionnaire: Characteristics of the school, resources, school climate, and relations between home and school.

6 Starting points for the construction of the reading literacy test
Definition: Reading literacy is the ability to understand and use such forms of written language which are required in society and/or are of value for the individual.
Requirements on the texts:
- The students should not have met them before.
- The texts should be possible to use again after 10 years.
- The texts should be appropriate for all countries, languages, ethnic and socioeconomic groups and both genders.
- They should be possible to use stand-alone in such a way that they could provide a meaningful reading experience.
- They should not be formulated in such general terms that the students would be able to answer the questions without reading the texts.
- They should comprise different levels of difficulty.

7 The reading literacy test
Three types of texts:
- Narrative prose. These are continuous texts which aim to tell a story. The texts typically follow a linear time sequence, and are usually intended to entertain or to involve the reader emotionally. The texts ranged in length from short fables to longer stories.
- Expository prose. This category comprises continuous texts which aim to convey factual information or opinion to the reader.
- Documents. These are structured presentations of information, in the form of graphs, charts, maps, lists, or sets of instructions. The reader can process the information in a nonlinear fashion without reading the whole text, and typically the number of words is limited.
Items:
- In relation to each text between two and six questions were asked.
- Altogether there were 66 multiple-choice items and two open-ended questions. The latter were not included in the analysis because inter-rater agreement was too low.
Booklets:
- The 15 texts and the 66 multiple-choice items were distributed over two booklets (A and B).

8 Test components
This test, like most other tests, thus consists of different types of components:
- Single questions (items), which here are scored 0 (incorrect choice) and 1 (correct choice)
- Text passages (testlets, parcels, or item bundles) with a maximum score of between 2 and 6
- Booklets
If test components (items) are independent, there is more flexibility and power in designing tests than if there are dependencies among the components.

9 A minitest
A minitest has been created from the 10 items belonging to two of the narrative texts ("Bird" and "Shark").

10 Statistical measures for the minitest
[SPSS Descriptive Statistics table for the minitest score, with Valid N (listwise). Columns: N, Minimum, Maximum, Mean, Std. Deviation. Values missing.]

11 Distribution of scores for the minitest

12 Means and standard deviations of the items
[SPSS Item Statistics table. Rows: nbird1r to nbird5r and nshak1r to nshak5r. Columns: Mean, Std. Deviation, N. Values missing.]

13 Correlations among the items
[SPSS Inter-Item Correlation Matrix. Rows and columns: nbird1r to nbird5r and nshak1r to nshak5r. Values missing.]

14 Relations between items and the total score
[SPSS Item-Total Statistics table. Rows: nbird1r to nbird5r and nshak1r to nshak5r. Columns: Scale Mean if Item Deleted, Scale Variance if Item Deleted, Corrected Item-Total Correlation, Squared Multiple Correlation, Cronbach's Alpha if Item Deleted. Values missing.]

15 Reliability
- The precision of an instrument
- How well an instrument resists the influence of random variation
- Does the instrument give the same result upon repeated measurements?

16 Definition of reliability
- Observed score = True score + Error
- An instrument's correlation with itself
- The ratio between true score variance and observed score variance (observed score variance = true score variance + error variance)
What is a true score and what is error?
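The two definitions above (correlation of an instrument with itself, and the ratio of true score variance to observed score variance) can be illustrated with a small simulation. This is only a sketch in Python: the population values (true-score SD 3, error SD 2), the sample size and the helper names are hypothetical choices, not taken from the slides.

```python
import random

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def correlation(x, y):
    """Pearson correlation between two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (variance(x) ** 0.5 * variance(y) ** 0.5)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 20000

# Hypothetical population: true-score SD 3, error SD 2,
# so the theoretical reliability is 9 / (9 + 4).
true_scores = [rng.gauss(50, 3) for _ in range(n)]
form_a = [t + rng.gauss(0, 2) for t in true_scores]  # observed = true + error
form_b = [t + rng.gauss(0, 2) for t in true_scores]  # a second, parallel measurement

theoretical = 9 / (9 + 4)
variance_ratio = variance(true_scores) / variance(form_a)
parallel_corr = correlation(form_a, form_b)
print(round(theoretical, 3), round(variance_ratio, 3), round(parallel_corr, 3))
```

In a real data set the true scores are unknown, so only the correlation between two measurements can actually be computed; the simulation merely shows that the two definitions agree.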

17 Sources of variance in test scores (after Thorndike, 1951)
- Individual characteristics
- External/situational factors

18 Reasons for reliability loss
- Factors at test administration
- Rating of responses
- Guessing
- Selection of items for the test
- Variation in individuals' true scores

19 Ways to determine reliability
To determine reliability we would like to be able to compute the correlation between the test and itself, or to know the true scores and the error scores. This is not possible, so different approaches have been devised:
- Test-retest: Administer the same test twice (memory effects may be a problem; sensitive to temporal instability, but not to effects of item selection).
- Parallel test: Create an identical twin of the test (sensitive to effects of item selection; may or may not be sensitive to temporal instability).
- Split-half: Create two parallel tests by randomly splitting the items into two groups (sensitive to effects of item selection; not sensitive to temporal instability). The split-half correlation gives the reliability for a half test, and to get it for the full test it needs to be corrected with the Spearman-Brown prophecy formula.
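The split-half procedure can be sketched as follows in Python. The 0/1 score matrix is a small hypothetical example (6 persons, 4 items), not data from the study, and the split is passed in explicitly; a random split could instead be drawn with `random.sample`.

```python
def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def split_half(scores, half_a, half_b):
    """Return (half-test correlation, Spearman-Brown corrected full-test reliability).

    scores: one row of item scores per person.
    half_a, half_b: column indices of the items placed in each half.
    """
    a = [sum(row[i] for i in half_a) for row in scores]
    b = [sum(row[i] for i in half_b) for row in scores]
    r_half = pearson(a, b)
    # Spearman-Brown correction for doubling the test length
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full

# Hypothetical item scores (rows = persons, columns = items)
scores = [
    [1, 1, 1, 1],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
]
r_half, r_full = split_half(scores, [0, 1], [2, 3])
print(round(r_half, 3), round(r_full, 3))
```

The corrected value is always at least as large as the half-test correlation when that correlation is positive, which is why the uncorrected split-half correlation understates the reliability of the full test.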

20 Ways to determine reliability, cont.
Cronbach's α
- A measure of internal consistency among items
- Sensitive to effects of item selection; not sensitive to temporal instability
- The mean of all possible split-half coefficients
- Increases as a function of the correlation among the items and as a function of the number of items

$$\alpha = \frac{k}{k-1}\left[1 - \frac{\sum_{i=1}^{k}\operatorname{Var}(\text{item score}_i)}{\operatorname{Var}(\text{total score})}\right]$$

where k is the number of items.
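As an illustration of the formula, a minimal Python sketch is given below. The function name and the small 0/1 score matrix are hypothetical; the slides themselves compute α with SPSS.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of persons, each a list of k item scores."""
    k = len(scores[0])
    # Variance of each item across persons
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Variance of the total (sum) score across persons
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical item scores (rows = persons, columns = items)
scores = [
    [1, 1, 1, 1],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

Because the same variance estimator is used in the numerator and the denominator, it does not matter for α whether sample or population variances are used.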

21 Computation of Cronbach's α with SPSS

RELIABILITY
  /VARIABLES=nbird1r nbird2r nbird3r nbird4r nbird5r nshak1r nshak2r nshak3r nshak4r nshak5r
  /SCALE('ALL VARIABLES') ALL
  /MODEL=ALPHA.

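22 Reliability and test length
The reliability increases as a function of the test length according to the Spearman-Brown prophecy formula:

$$r(n) = \frac{n\,r}{1 + (n-1)\,r}$$

where r is the reliability of the original test and n is the factor by which the test is lengthened.
If we increase our minitest 6.5 times to 65 items we expect a reliability of

$$r(6.5) = \frac{6.5 \times .737}{1 + 5.5 \times .737} = .948$$

If we compute Cronbach's α from the 65 items we obtain: [value missing]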
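The calculation on this slide can be reproduced with a few lines of Python. The function name is an illustrative choice; the inputs .737 and 6.5 are the values used on the slide.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

# Minitest reliability .737, lengthened 6.5 times (10 items -> 65 items)
predicted = spearman_brown(0.737, 6.5)
print(round(predicted, 3))  # 0.948
```

The same function with `length_factor=2` gives the correction applied to a split-half correlation.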

23 Reliability as a function of test length

24 Split-half reliability (first 33 items versus last 32 items)
This analysis yields a much lower reliability estimate than Cronbach's α did!

Reliability Statistics
Cronbach's Alpha, Part 1: Value .888, N of Items 33
Cronbach's Alpha, Part 2: Value .913, N of Items 32
Total N of Items: 65
Correlation Between Forms: [value missing]
Spearman-Brown Coefficient, Equal Length: [value missing]
Spearman-Brown Coefficient, Unequal Length: [value missing]
Guttman Split-Half Coefficient: [value missing]

25 Cronbach's α for passage scores
This analysis too yields a much lower reliability estimate than Cronbach's α for item scores did!

26 Cronbach's α assumptions
1. All components measure the same underlying dimension
2. All components have the same relation to the underlying dimension
3. All components have the same error variance
If assumptions 2 and 3 are violated, α will underestimate reliability but will provide a lower bound to reliability.
If assumption 1 is violated, α misestimates reliability, but we also run into interpretational difficulties.
We need methods to test these assumptions.

27 The Birds passage

28 A congeneric latent variable model for the items in the Birds passage

29 Estimating the model
Needed: estimates of 10 parameters (4 regression coefficients, 5 error variances, 1 variance of the latent variable).
Available: 15 elements of the covariance matrix (10 covariances and 5 variances).
Express the known entities in terms of the unknown parameters through application of path rules, e.g.:
- $\mathrm{Cov}(\mathrm{NBIRD4R}, \mathrm{NBIRD5R}) = b_4 \cdot \mathrm{Var}(\mathrm{nbird}) \cdot b_3$
- $\mathrm{Cov}(\mathrm{NBIRD2R}, \mathrm{NBIRD1R}) = b_1 \cdot \mathrm{Var}(\mathrm{nbird}) \cdot 1$
- $\mathrm{Var}(\mathrm{NBIRD2R}) = b_1 \cdot \mathrm{Var}(\mathrm{nbird}) \cdot b_1 + 1 \cdot \mathrm{Var}(\mathrm{NBIRD2R\&}) \cdot 1$, where NBIRD2R& denotes the residual of NBIRD2R
Solve the 15 equations for the 10 unknown parameters.
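The counting of known and unknown quantities can be sketched in Python. The function names are illustrative. The parameter count for the parallel model is an assumption (all loadings fixed to 1 and a single shared error variance), chosen because it is consistent with the df = 13 reported for that model on a later slide.

```python
def covariance_moments(n_items):
    """Number of distinct variances and covariances among n_items variables."""
    return n_items * (n_items + 1) // 2

def congeneric_free_parameters(n_items):
    """One loading is fixed to 1 to set the scale of the latent variable."""
    free_loadings = n_items - 1
    error_variances = n_items
    latent_variance = 1
    return free_loadings + error_variances + latent_variance

def parallel_free_parameters():
    """ASSUMPTION: loadings fixed to 1, one shared error variance, one latent variance."""
    return 2

n_items = 5
df_congeneric = covariance_moments(n_items) - congeneric_free_parameters(n_items)
df_parallel = covariance_moments(n_items) - parallel_free_parameters()
print(df_congeneric, df_parallel)  # 5 13
```

The 5 degrees of freedom of the congeneric model are the surplus of equations over unknowns, and it is this surplus that makes a test of model fit possible.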

30 Unstandardized parameter estimates Standardized parameter estimates

31 Does the model fit the data?
Reproduce the covariance matrix from the estimated parameters (the implied matrix) and compare it with the observed matrix, e.g.:
- Cov(NBIRD4R, NBIRD5R) = 0.54 × 0.05 × 1.20 = [value missing] (observed value = 0.035)
- Cov(NBIRD1R, NBIRD2R) = 1.00 × 0.05 × 0.74 = [value missing] (observed value = 0.037)
A chi-square test of model fit may be computed: Chi-square = [value missing], df = 5, p < .00
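The comparison of implied and observed covariances can be sketched in Python using the rounded estimates printed on the slide. Because the inputs are rounded, the results may differ slightly from what the original slide showed; the function and variable names are illustrative, and the slide does not make clear which of the loadings 0.54 and 1.20 belongs to which of the two items.

```python
def implied_covariance(loading_a, loading_b, latent_variance):
    """Path rule for two indicators of the same latent variable."""
    return loading_a * latent_variance * loading_b

latent_variance = 0.05  # rounded estimate of Var(nbird) from the slide

# (loading, loading, observed covariance) for the two item pairs on the slide
pairs = {
    ("NBIRD4R", "NBIRD5R"): (0.54, 1.20, 0.035),
    ("NBIRD1R", "NBIRD2R"): (1.00, 0.74, 0.037),
}

residuals = {}
for items, (loading_a, loading_b, observed) in pairs.items():
    implied = implied_covariance(loading_a, loading_b, latent_variance)
    residuals[items] = observed - implied  # positive: model under-predicts
    print(items, round(implied, 4), round(residuals[items], 4))
```

Small residuals for all 15 elements of the covariance matrix indicate that the model reproduces the data well; the chi-square test summarizes these discrepancies in a single statistic.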

32 Problems with the Chi-square Goodness of Fit test
- The test is χ² distributed only when data has a multivariate normal distribution under maximum likelihood estimation.
- When the sample size is large even trivial deviations between model and data cause the χ² test to be significant.
- When the sample size is small even important deviations from the true model may be undetected.
- A model with many free parameters has a better χ² value than a model with few free parameters. However, models with few free parameters are generally to be preferred over models with many free parameters.

33 The Root Mean Square Error of Approximation (RMSEA)
- The RMSEA measures the amount of discrepancy between model and data in the population, taking model complexity (i.e., number of estimated parameters) into account. Values less than 0.05 indicate good fit, and values up to 0.07 or 0.08 may be accepted.
- The Test of Close Fit tests the hypothesis that RMSEA < 0.05.
- A 90 % confidence interval of RMSEA may be constructed. The lower limit of the interval should be less than .05 and the upper limit of the interval should not be higher than [value missing].
- The nbird model: RMSEA = 0.064, 90 % CI [limits missing]
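A commonly used point estimate of the RMSEA can be sketched as follows in Python. The slides do not state which formula the software used (some programs divide by N rather than N - 1), and the chi-square values and sample sizes are not preserved in this transcript, so the inputs below are purely hypothetical.

```python
import math

def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from a chi-square value, its df and sample size n.

    Uses sqrt(max(chi2 - df, 0) / (df * (n - 1))); this is one common variant.
    """
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))

# Hypothetical inputs, not taken from the slides
example = rmsea(chi_square=50.0, df=5, n=2001)
print(round(example, 3))
```

The formula shows why the RMSEA is less sensitive to sample size than the chi-square test: the excess of the chi-square over its degrees of freedom is divided by the sample size, and dividing by df rewards models with fewer free parameters.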

34 A parallel latent variable model for the items in the Birds passage

35 Unstandardized parameter estimates / Standardized parameter estimates
Chi-square = [value missing], df = 13, RMSEA = 0.226, 90 % CI [limits missing]

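36 Reliability calculations
- ρ for a single item = 0.04/(0.04 + 0.134) = 0.23
- ρ for 5 items according to S-B = 5*0.23/(1+4*0.23) = [value missing]
- ρ for the total score can also be computed with the formula:

$$\rho = \frac{\mathrm{Latvar}\cdot\left(\sum_i b_i\right)^2}{\mathrm{Latvar}\cdot\left(\sum_i b_i\right)^2 + \sum_i \mathrm{Resvar}_i}$$

- Parallel model: ρ = 0.04*25/(0.04*25 + 5*0.134) = [value missing]
- Congeneric model: ρ = 0.049*[value missing]/(0.049*[value missing] + [value missing]) = 0.598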
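The calculations for the parallel model can be reproduced in Python from the two estimates given on the slide (latent variance 0.04, residual variance 0.134, all five loadings equal to 1). The function names are illustrative. The congeneric calculation is not reproduced because the loadings and residual variances are missing from the transcript.

```python
def composite_reliability(latent_var, loadings, residual_vars):
    """Reliability of a sum score under a one-factor model."""
    true_part = latent_var * sum(loadings) ** 2
    return true_part / (true_part + sum(residual_vars))

def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

latent_var = 0.04     # parallel model, from the slide
residual_var = 0.134  # parallel model, from the slide
k = 5                 # number of items in the Birds passage

single_item = latent_var / (latent_var + residual_var)
five_items_sb = spearman_brown(single_item, k)
five_items_formula = composite_reliability(latent_var, [1.0] * k, [residual_var] * k)

print(round(single_item, 2), round(five_items_sb, 3), round(five_items_formula, 3))
```

For the parallel model the Spearman-Brown route and the composite formula are algebraically identical, so they agree up to rounding, and both are close to the 0.598 reported for the congeneric model.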

37 Some conclusions
Cronbach's α is based on several assumptions, and one of them is that items should be identical in terms of relation to the latent variable ("discrimination") and residual variance (the parallelity assumption). However, our results indicate that estimation of α is robust against deviations from the parallelity assumption. This seems to be quite a general finding. The unidimensionality assumption is another assumption, which we now turn to.

38 A two-dimensional model for the passages Bird and Shark

39 Standardized estimates for the two-dimensional model
Chi-square = [value missing], df = 35, RMSEA = 0.046, 90 % CI [limits missing]

40 Standardized estimates for a one-dimensional model
Chi-square = [value missing], df = 36, RMSEA = 0.056, 90 % CI [limits missing]

41 An orthogonal model with general and specific factors

42 Results from a model with one general factor and 15 passage factors
Fit statistics for the one-dimensional model: Chi-square = [value missing], df = 2015, RMSEA = 0.056, 90 % CI [limits missing]
Fit statistics for a model with one general factor and 15 passage factors: Chi-square = [value missing], df = 2000, RMSEA = 0.032, 90 % CI [limits missing]

Estimated components of variance in sum of scores
Component   Variance
READGEN     [value missing]
ECARD       0.03
NBIRD       0.33
DISLA       0.38
DMARIA      0.12
NDOG        0.64
EWLR        1.66
ESND        0.13
NSHK        0.38
DBTT        0.37
DBUS        0.35
DCNT        0.34
DTEMP       0.30
EMRM        0.30
NGRP        1.44
ETRE        1.84
Error       7.58

Estimated total variance: [value missing]
Sum of passage variances: 8.61
Estimated systematic variance: [value missing]
Estimated total reliability: [value missing]
Estimated reliability for ReadGen: 0.904
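The variance decomposition can be checked with a short Python sketch. The passage and error components are the values listed on the slide. The READGEN component is missing from the transcript, so the code back-solves it under the assumption that the reliability for ReadGen is the READGEN variance divided by the total variance; the resulting figures are implied by that assumption and are not values taken from the slide.

```python
# Passage-factor variance components as listed on the slide
passage_variances = {
    "ECARD": 0.03, "NBIRD": 0.33, "DISLA": 0.38, "DMARIA": 0.12, "NDOG": 0.64,
    "EWLR": 1.66, "ESND": 0.13, "NSHK": 0.38, "DBTT": 0.37, "DBUS": 0.35,
    "DCNT": 0.34, "DTEMP": 0.30, "EMRM": 0.30, "NGRP": 1.44, "ETRE": 1.84,
}
error_variance = 7.58
readgen_reliability = 0.904

# Agrees with the "Sum of passage variances" row (8.61)
sum_passages = sum(passage_variances.values())

# ASSUMPTION: reliability for ReadGen = G / (G + passages + error).
# Solving for the general-factor variance G:
non_general = sum_passages + error_variance
implied_readgen = readgen_reliability * non_general / (1 - readgen_reliability)

# Total reliability then treats general and passage variance as systematic
implied_total_variance = implied_readgen + non_general
implied_total_reliability = (implied_readgen + sum_passages) / implied_total_variance

print(round(sum_passages, 2), round(implied_readgen, 1), round(implied_total_reliability, 3))
```

The gap between the total reliability and the reliability for ReadGen reflects the passage-specific variance, which is systematic but does not belong to the general reading dimension.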

43 Definition of validity
Does the instrument measure what it intends to measure?
"Validity is an integrated evaluative judgement of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment." (Messick, 1989, p. 5)

44 The three classical forms of validity
- Content validity. How well do the items in a test cover a certain domain?
- Criterion-related validity. How well does the test predict a criterion?
- Construct validity. How well does the test function as an indicator of a construct?

45 Construct validity as the overarching validity construct
- Content validity and criterion-related validity are insufficient forms of validity and require construct validity. This has led to the view that construct validity is the only validity construct needed.
- The meaning of construct validity has been broadened, particularly by Messick through the introduction of consequential aspects of validity.

46 Threats against construct validity
- Construct underrepresentation. The instrument covers only parts of the construct, and leaves out important dimensions or facets.
- Construct-irrelevant variance. The instrument is influenced by sources of variation which have nothing to do with the construct.

47 Testing construct validity
"... test validation embraces all of the experimental, statistical, and philosophical means by which hypotheses and scientific theories are evaluated." (Messick, 1989, p. 6)

48 Sources of information about construct validity
- Internal structure (explorative and confirmatory factor analysis)
- Relations with other variables (external structure)
- Assessment of content
- Studies of processes
- Differences over time and between groups
- Effects of experimental interventions
- Value implications and social consequences, concerning both intended and unintended effects

49 A three-dimensional model for the RL-test

50 Messick's progressive matrix

CHAPTER VI RESEARCH METHODOLOGY

CHAPTER VI RESEARCH METHODOLOGY CHAPTER VI RESEARCH METHODOLOGY 6.1 Research Design Research is an organized, systematic, data based, critical, objective, scientific inquiry or investigation into a specific problem, undertaken with the

More information

Subescala D CULTURA ORGANIZACIONAL. Factor Analysis

Subescala D CULTURA ORGANIZACIONAL. Factor Analysis Subescala D CULTURA ORGANIZACIONAL Factor Analysis Descriptive Statistics Mean Std. Deviation Analysis N 1 3,44 1,244 224 2 3,43 1,258 224 3 4,50,989 224 4 4,38 1,118 224 5 4,30 1,151 224 6 4,27 1,205

More information

Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz

Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz This study presents the steps Edgenuity uses to evaluate the reliability and validity of its quizzes, topic tests, and cumulative

More information

Models in Educational Measurement

Models in Educational Measurement Models in Educational Measurement Jan-Eric Gustafsson Department of Education and Special Education University of Gothenburg Background Measurement in education and psychology has increasingly come to

More information

Subescala B Compromisso com a organização escolar. Factor Analysis

Subescala B Compromisso com a organização escolar. Factor Analysis Subescala B Compromisso com a organização escolar Factor Analysis Descriptive Statistics Mean Std. Deviation Analysis N 1 4,42 1,108 233 2 4,41 1,001 233 3 4,99 1,261 233 4 4,37 1,055 233 5 4,48 1,018

More information

Doing Quantitative Research 26E02900, 6 ECTS Lecture 6: Structural Equations Modeling. Olli-Pekka Kauppila Daria Kautto

Doing Quantitative Research 26E02900, 6 ECTS Lecture 6: Structural Equations Modeling. Olli-Pekka Kauppila Daria Kautto Doing Quantitative Research 26E02900, 6 ECTS Lecture 6: Structural Equations Modeling Olli-Pekka Kauppila Daria Kautto Session VI, September 20 2017 Learning objectives 1. Get familiar with the basic idea

More information

Measurement and Descriptive Statistics. Katie Rommel-Esham Education 604

Measurement and Descriptive Statistics. Katie Rommel-Esham Education 604 Measurement and Descriptive Statistics Katie Rommel-Esham Education 604 Frequency Distributions Frequency table # grad courses taken f 3 or fewer 5 4-6 3 7-9 2 10 or more 4 Pictorial Representations Frequency

More information

Statistics for Psychosocial Research Session 1: September 1 Bill

Statistics for Psychosocial Research Session 1: September 1 Bill Statistics for Psychosocial Research Session 1: September 1 Bill Introduction to Staff Purpose of the Course Administration Introduction to Test Theory Statistics for Psychosocial Research Overview: a)

More information

APÊNDICE 6. Análise fatorial e análise de consistência interna

APÊNDICE 6. Análise fatorial e análise de consistência interna APÊNDICE 6 Análise fatorial e análise de consistência interna Subescala A Missão, a Visão e os Valores A ação do diretor Factor Analysis Descriptive Statistics Mean Std. Deviation Analysis N 1 4,46 1,056

More information

Empowered by Psychometrics The Fundamentals of Psychometrics. Jim Wollack University of Wisconsin Madison

Empowered by Psychometrics The Fundamentals of Psychometrics. Jim Wollack University of Wisconsin Madison Empowered by Psychometrics The Fundamentals of Psychometrics Jim Wollack University of Wisconsin Madison Psycho-what? Psychometrics is the field of study concerned with the measurement of mental and psychological

More information

Confirmatory Factor Analysis of Preschool Child Behavior Checklist (CBCL) (1.5 5 yrs.) among Canadian children

Confirmatory Factor Analysis of Preschool Child Behavior Checklist (CBCL) (1.5 5 yrs.) among Canadian children Confirmatory Factor Analysis of Preschool Child Behavior Checklist (CBCL) (1.5 5 yrs.) among Canadian children Dr. KAMALPREET RAKHRA MD MPH PhD(Candidate) No conflict of interest Child Behavioural Check

More information

Instrument equivalence across ethnic groups. Antonio Olmos (MHCD) Susan R. Hutchinson (UNC)

Instrument equivalence across ethnic groups. Antonio Olmos (MHCD) Susan R. Hutchinson (UNC) Instrument equivalence across ethnic groups Antonio Olmos (MHCD) Susan R. Hutchinson (UNC) Overview Instrument Equivalence Measurement Invariance Invariance in Reliability Scores Factorial Invariance Item

More information

PÄIVI KARHU THE THEORY OF MEASUREMENT

PÄIVI KARHU THE THEORY OF MEASUREMENT PÄIVI KARHU THE THEORY OF MEASUREMENT AGENDA 1. Quality of Measurement a) Validity Definition and Types of validity Assessment of validity Threats of Validity b) Reliability True Score Theory Definition

More information

ESTABLISHING VALIDITY AND RELIABILITY OF ACHIEVEMENT TEST IN BIOLOGY FOR STD. IX STUDENTS

ESTABLISHING VALIDITY AND RELIABILITY OF ACHIEVEMENT TEST IN BIOLOGY FOR STD. IX STUDENTS International Journal of Educational Science and Research (IJESR) ISSN(P): 2249-6947; ISSN(E): 2249-8052 Vol. 4, Issue 4, Aug 2014, 29-36 TJPRC Pvt. Ltd. ESTABLISHING VALIDITY AND RELIABILITY OF ACHIEVEMENT

More information

Running head: CFA OF TDI AND STICSA 1. p Factor or Negative Emotionality? Joint CFA of Internalizing Symptomology

Running head: CFA OF TDI AND STICSA 1. p Factor or Negative Emotionality? Joint CFA of Internalizing Symptomology Running head: CFA OF TDI AND STICSA 1 p Factor or Negative Emotionality? Joint CFA of Internalizing Symptomology Caspi et al. (2014) reported that CFA results supported a general psychopathology factor,

More information

By Hui Bian Office for Faculty Excellence

By Hui Bian Office for Faculty Excellence By Hui Bian Office for Faculty Excellence 1 Email: bianh@ecu.edu Phone: 328-5428 Location: 1001 Joyner Library, room 1006 Office hours: 8:00am-5:00pm, Monday-Friday 2 Educational tests and regular surveys

More information

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug?

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug? MMI 409 Spring 2009 Final Examination Gordon Bleil Table of Contents Research Scenario and General Assumptions Questions for Dataset (Questions are hyperlinked to detailed answers) 1. Is there a difference

More information

PTHP 7101 Research 1 Chapter Assignments

PTHP 7101 Research 1 Chapter Assignments PTHP 7101 Research 1 Chapter Assignments INSTRUCTIONS: Go over the questions/pointers pertaining to the chapters and turn in a hard copy of your answers at the beginning of class (on the day that it is

More information

Personal Style Inventory Item Revision: Confirmatory Factor Analysis

Personal Style Inventory Item Revision: Confirmatory Factor Analysis Personal Style Inventory Item Revision: Confirmatory Factor Analysis This research was a team effort of Enzo Valenzi and myself. I m deeply grateful to Enzo for his years of statistical contributions to

More information

Technical Specifications

Technical Specifications Technical Specifications In order to provide summary information across a set of exercises, all tests must employ some form of scoring models. The most familiar of these scoring models is the one typically

More information

PLS 506 Mark T. Imperial, Ph.D. Lecture Notes: Reliability & Validity

PLS 506 Mark T. Imperial, Ph.D. Lecture Notes: Reliability & Validity PLS 506 Mark T. Imperial, Ph.D. Lecture Notes: Reliability & Validity Measurement & Variables - Initial step is to conceptualize and clarify the concepts embedded in a hypothesis or research question with

More information

Validity, Reliability, and Fairness in Music Testing

Validity, Reliability, and Fairness in Music Testing chapter 20 Validity, Reliability, and Fairness in Music Testing Brian C. Wesolowski and Stefanie A. Wind The focus of this chapter is on validity, reliability, and fairness in music testing. A test can

More information

Chapter 3. Psychometric Properties

Chapter 3. Psychometric Properties Chapter 3 Psychometric Properties Reliability The reliability of an assessment tool like the DECA-C is defined as, the consistency of scores obtained by the same person when reexamined with the same test

More information

Assessing Measurement Invariance in the Attitude to Marriage Scale across East Asian Societies. Xiaowen Zhu. Xi an Jiaotong University.

Assessing Measurement Invariance in the Attitude to Marriage Scale across East Asian Societies. Xiaowen Zhu. Xi an Jiaotong University. Running head: ASSESS MEASUREMENT INVARIANCE Assessing Measurement Invariance in the Attitude to Marriage Scale across East Asian Societies Xiaowen Zhu Xi an Jiaotong University Yanjie Bian Xi an Jiaotong

More information

Small Group Presentations

Small Group Presentations Admin Assignment 1 due next Tuesday at 3pm in the Psychology course centre. Matrix Quiz during the first hour of next lecture. Assignment 2 due 13 May at 10am. I will upload and distribute these at the

More information

alternate-form reliability The degree to which two or more versions of the same test correlate with one another. In clinical studies in which a given function is going to be tested more than once over

More information

Class 7 Everything is Related

Class 7 Everything is Related Class 7 Everything is Related Correlational Designs l 1 Topics Types of Correlational Designs Understanding Correlation Reporting Correlational Statistics Quantitative Designs l 2 Types of Correlational

More information

Internal structure evidence of validity

Internal structure evidence of validity Internal structure evidence of validity Dr Wan Nor Arifin Lecturer, Unit of Biostatistics and Research Methodology, Universiti Sains Malaysia. E-mail: wnarifin@usm.my Wan Nor Arifin, 2017. Internal structure

More information

The MHSIP: A Tale of Three Centers

The MHSIP: A Tale of Three Centers The MHSIP: A Tale of Three Centers P. Antonio Olmos-Gallo, Ph.D. Kathryn DeRoche, M.A. Mental Health Center of Denver Richard Swanson, Ph.D., J.D. Aurora Research Institute John Mahalik, Ph.D., M.P.A.

More information

International Conference on Humanities and Social Science (HSS 2016)

International Conference on Humanities and Social Science (HSS 2016) International Conference on Humanities and Social Science (HSS 2016) The Chinese Version of WOrk-reLated Flow Inventory (WOLF): An Examination of Reliability and Validity Yi-yu CHEN1, a, Xiao-tong YU2,

More information

EFFECTS OF ITEM ORDER ON CONSISTENCY AND PRECISION UNDER DIFFERENT ORDERING SCHEMES IN ATTITUDINAL SCALES: A CASE OF PHYSICAL SELF-CONCEPT SCALES

EFFECTS OF ITEM ORDER ON CONSISTENCY AND PRECISION UNDER DIFFERENT ORDERING SCHEMES IN ATTITUDINAL SCALES: A CASE OF PHYSICAL SELF-CONCEPT SCALES Item Ordering 1 Edgeworth Series in Quantitative Educational and Social Science (Report No.ESQESS-2001-3) EFFECTS OF ITEM ORDER ON CONSISTENCY AND PRECISION UNDER DIFFERENT ORDERING SCHEMES IN ATTITUDINAL

More information

Running head: CFA OF STICSA 1. Model-Based Factor Reliability and Replicability of the STICSA

Running head: CFA OF STICSA 1. Model-Based Factor Reliability and Replicability of the STICSA Running head: CFA OF STICSA 1 Model-Based Factor Reliability and Replicability of the STICSA The State-Trait Inventory of Cognitive and Somatic Anxiety (STICSA; Ree et al., 2008) is a new measure of anxiety

More information

SPSS output for 420 midterm study

SPSS output for 420 midterm study Ψ Psy Midterm Part In lab (5 points total) Your professor decides that he wants to find out how much impact amount of study time has on the first midterm. He randomly assigns students to study for hours,

More information

Validity and reliability of measurements

Validity and reliability of measurements Validity and reliability of measurements 2 3 Request: Intention to treat Intention to treat and per protocol dealing with cross-overs (ref Hulley 2013) For example: Patients who did not take/get the medication

More information

Modeling the Influential Factors of 8 th Grades Student s Mathematics Achievement in Malaysia by Using Structural Equation Modeling (SEM)

Modeling the Influential Factors of 8 th Grades Student s Mathematics Achievement in Malaysia by Using Structural Equation Modeling (SEM) International Journal of Advances in Applied Sciences (IJAAS) Vol. 3, No. 4, December 2014, pp. 172~177 ISSN: 2252-8814 172 Modeling the Influential Factors of 8 th Grades Student s Mathematics Achievement

More information

Proof. Revised. Chapter 12 General and Specific Factors in Selection Modeling Introduction. Bengt Muthén

Proof. Revised. Chapter 12 General and Specific Factors in Selection Modeling Introduction. Bengt Muthén Chapter 12 General and Specific Factors in Selection Modeling Bengt Muthén Abstract This chapter shows how analysis of data on selective subgroups can be used to draw inference to the full, unselected

More information

SPSS output for 420 midterm study

SPSS output for 420 midterm study Ψ Psy Midterm Part In lab (5 points total) Your professor decides that he wants to find out how much impact amount of study time has on the first midterm. He randomly assigns students to study for hours,

More information

Reliability. Internal Reliability

Reliability. Internal Reliability 32 Reliability T he reliability of assessments like the DECA-I/T is defined as, the consistency of scores obtained by the same person when reexamined with the same test on different occasions, or with

More information

ADMS Sampling Technique and Survey Studies

ADMS Sampling Technique and Survey Studies Principles of Measurement Measurement As a way of understanding, evaluating, and differentiating characteristics Provides a mechanism to achieve precision in this understanding, the extent or quality As

More information

VARIABLES AND MEASUREMENT

VARIABLES AND MEASUREMENT ARTHUR SYC 204 (EXERIMENTAL SYCHOLOGY) 16A LECTURE NOTES [01/29/16] VARIABLES AND MEASUREMENT AGE 1 Topic #3 VARIABLES AND MEASUREMENT VARIABLES Some definitions of variables include the following: 1.

More information

Having your cake and eating it too: multiple dimensions and a composite

Having your cake and eating it too: multiple dimensions and a composite Having your cake and eating it too: multiple dimensions and a composite Perman Gochyyev and Mark Wilson UC Berkeley BEAR Seminar October, 2018 outline Motivating example Different modeling approaches Composite

More information

CHAPTER-III METHODOLOGY

CHAPTER-III METHODOLOGY CHAPTER-III METHODOLOGY 3.1 INTRODUCTION This chapter deals with the methodology employed in order to achieve the set objectives of the study. Details regarding sample, description of the tools employed,

More information

CHAPTER 3 METHOD AND PROCEDURE

CHAPTER 3 METHOD AND PROCEDURE CHAPTER 3 METHOD AND PROCEDURE Previous chapter namely Review of the Literature was concerned with the review of the research studies conducted in the field of teacher education, with special reference

More information

Assessing the Validity and Reliability of the Teacher Keys Effectiveness. System (TKES) and the Leader Keys Effectiveness System (LKES)

Assessing the Validity and Reliability of the Teacher Keys Effectiveness. System (TKES) and the Leader Keys Effectiveness System (LKES) Assessing the Validity and Reliability of the Teacher Keys Effectiveness System (TKES) and the Leader Keys Effectiveness System (LKES) of the Georgia Department of Education Submitted by The Georgia Center

More information

Paul Irwing, Manchester Business School

Paul Irwing, Manchester Business School Paul Irwing, Manchester Business School Factor analysis has been the prime statistical technique for the development of structural theories in social science, such as the hierarchical factor model of human

More information

Title: The Theory of Planned Behavior (TPB) and Texting While Driving Behavior in College Students MS # Manuscript ID GCPI

Title: The Theory of Planned Behavior (TPB) and Texting While Driving Behavior in College Students MS # Manuscript ID GCPI Title: The Theory of Planned Behavior (TPB) and Texting While Driving Behavior in College Students MS # Manuscript ID GCPI-2015-02298 Appendix 1 Role of TPB in changing other behaviors TPB has been applied

More information

Psychologist use statistics for 2 things

• Summarize the information from the study/experiment • Measures of central tendency • Mean • Median • Mode • Make judgements and decisions about the data • See

A number of studies have shown that ignorance regarding fundamental measurement

10.1177/0013164406288165 Educational and Psychological Measurement. Graham / Congeneric Reliability. Congeneric and (Essentially) Tau-Equivalent Estimates of Score Reliability: What They Are and How to Use

Lec 02: Estimation & Hypothesis Testing in Animal Ecology

Parameter Estimation from Samples. Samples: We typically observe systems incompletely, i.e., we sample according to a designed protocol. We then

CHAPTER - 6 STATISTICAL ANALYSIS. This chapter discusses inferential statistics, which use sample data to

CHAPTER - 6 STATISTICAL ANALYSIS 6.1 Introduction This chapter discusses inferential statistics, which use sample data to make decisions or inferences about a population. Populations are group of interest

REPORT. Technical Report: Item Characteristics. Jessica Masters

August 2010 REPORT Diagnostic Geometry Assessment Project Technical Report: Item Characteristics Jessica Masters Technology and Assessment Study Collaborative Lynch School of Education Boston College Chestnut

On the Performance of Maximum Likelihood Versus Means and Variance Adjusted Weighted Least Squares Estimation in CFA

STRUCTURAL EQUATION MODELING, 13(2), 186–203 Copyright 2006, Lawrence Erlbaum Associates, Inc. On the Performance of Maximum Likelihood Versus Means and Variance Adjusted Weighted Least Squares Estimation

Lecture Week 3 Quality of Measurement Instruments; Introduction SPSS

Introduction to Research Methods & Statistics 2013–2014 Hemmo Smit Overview Quality of Measurement Instruments Introduction SPSS Read:

Development of self efficacy and attitude toward analytic geometry scale (SAAG-S)

Available online at www.sciencedirect.com Procedia - Social and Behavioral Sciences 55 (2012) 20–27 INTERNATIONAL CONFERENCE ON NEW HORIZONS IN EDUCATION INTE2012 Development of self efficacy and attitude

Confirmatory Factor Analysis of the Procrastination Assessment Scale for Students

10.1177/2158244015611456 SAGE Open, Yockey and Kralowec, research article, 2015. Article: Confirmatory Factor Analysis of the Procrastination Assessment Scale for Students SAGE Open October-December

Associate Prof. Dr Anne Yee. Dr Mahmoud Danaee

What does this resemble? Rorschach test. At the end of the test, the tester says you need therapy or you can't work for this company. Psychological Testing

Investigating the Invariance of Person Parameter Estimates Based on Classical Test and Item Response Theories

Kamla-Raj 2010 Int J Edu Sci, 2(2): 107-113 (2010) Investigating the Invariance of Person Parameter Estimates Based on Classical Test and Item Response Theories O.O. Adedoyin Department of Educational Foundations,

CHAPTER 3. Research Methodology

The research studies the youth's attitude towards Thai cuisine in Dongguan City, China in 2013. The researcher has selected survey methodology by operating under procedures as

Running Head: MULTIPLE CHOICE AND CONSTRUCTED RESPONSE ITEMS. The Contribution of Constructed Response Items to Large Scale Assessment:

The Contribution of Constructed Response Items to Large Scale Assessment: Measuring and Understanding their Impact Robert W. Lissitz 1 and Xiaodong

Answer Key to Problem Set #1

Two notes: q#4e: Please disregard. q#5e: The frequency tables of the total CESD scales of 94, 96 and 98 in question 5e should sum up to 328 observations, not 924 (the student

HPS301 Exam Notes- Contents

Week 1 Research Design: What characterises different approaches; Experimental Design; Key Features; Criteria for establishing causality; Validity; Internal Validity; Threats

Saville Consulting Wave Professional Styles Handbook

PART 4: TECHNICAL Chapter 19: Reliability This manual has been generated electronically. Saville Consulting do not guarantee that it has not been changed

Psychometric Instrument Development

Image source: http://commons.wikimedia.org/wiki/file:soft_ruler.jpg, CC-by-SA 3.0 Lecture 6 Survey Research & Design in Psychology James Neill, 2017 Creative Commons

Psychometric Instrument Development

Image source: http://commons.wikimedia.org/wiki/file:soft_ruler.jpg, CC-by-SA 3.0 Lecture 6 Survey Research & Design in Psychology James Neill, 2016 Creative Commons

Critical Thinking Assessment at MCC. How are we doing?

Prepared by Maura McCool, M.S. Office of Research, Evaluation and Assessment Metropolitan Community Colleges Fall 2003 General Education Assessment

A Brief Introduction to Bayesian Statistics

David Kaplan Department of Educational Psychology Methods for Social Policy Research and, Washington, DC 2017 The Reverend Thomas Bayes, 1701–1761 Pierre-Simon

How Good Are Our Measures? Investigating the Appropriate Use of Factor Analysis for Survey Instruments

Journal of MultiDisciplinary Evaluation, Volume 11, Issue 25, 2015. How Good Are Our Measures? Investigating the Appropriate Use of Factor Analysis for Survey Instruments. Megan Sanders, The Ohio State University; P. Cristian Gugiu

Business Research Methods. Introduction to Data Analysis

Data Analysis Process STAGES OF DATA ANALYSIS EDITING CODING DATA ENTRY ERROR CHECKING AND VERIFICATION DATA ANALYSIS Introduction Preparation of

Measurement is the process of observing and recording the observations. Two important issues:

Farzad Eskandanian Measurement is the process of observing and recording the observations. Two important issues: 1. Understanding the fundamental ideas: Levels of measurement: nominal, ordinal, interval

Research Questions and Survey Development

R. Eric Heidel, PhD Associate Professor of Biostatistics Department of Surgery University of Tennessee Graduate School of Medicine Research Questions 1 Research

Comprehensive Statistical Analysis of a Mathematics Placement Test

Robert J. Hall Department of Educational Psychology Texas A&M University, USA (bobhall@tamu.edu) Eunju Jung Department of Educational

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

Correlational Research Correlational Designs Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are

Connectedness DEOCS 4.1 Construct Validity Summary

DEFENSE EQUAL OPPORTUNITY MANAGEMENT INSTITUTE DIRECTORATE OF RESEARCH DEVELOPMENT AND STRATEGIC INITIATIVES Directed by Dr. Daniel P. McDonald, Executive

Analysis and Interpretation of Data Part 1

DATA ANALYSIS: PRELIMINARY STEPS 1. Editing Field Edit Completeness Legibility Comprehensibility Consistency Uniformity Central Office Edit 2. Coding Specifying

The Psychometric Properties of Dispositional Flow Scale-2 in Internet Gaming

Curr Psychol (2009) 28:194–201 DOI 10.1007/s12144-009-9058-x The Psychometric Properties of Dispositional Flow Scale-2 in Internet Gaming C. K. John Wang & W. C. Liu & A. Khoo Published online: 27 May

Midterm Exam MMI 409 Spring 2009 Gordon Bleil

Table of contents: (Hyperlinked to problem sections) Problem 1 Hypothesis Tests Results Inferences Problem 2 Hypothesis Tests Results Inferences Problem 3

CHAPTER NINE DATA ANALYSIS / EVALUATING QUALITY (VALIDITY) OF BETWEEN GROUP EXPERIMENTS

Chapter Objectives: Understand Null Hypothesis Significance Testing (NHST) Understand statistical significance and

The Effect of Guessing on Item Reliability

The Effect of Guessing on Item Reliability under Answer-Until-Correct Scoring Michael Kane National League for Nursing, Inc. James Moloney State University of New York at Brockport The answer-until-correct

Psychometric Instrument Development

Lecture 6 Survey Research & Design in Psychology James Neill, 2012 Readings: Psychometrics 1. Bryman & Cramer (1997). Concepts and their measurement. [chapter - ereserve]

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review

The description and presentation of results involves a number of topics. These include scales of measurement, descriptive statistics used to summarize

Validity and reliability of measurements

Components in a dataset. Why bother (examples from research)? What is reliability? What is validity? How should I treat

Running head: NESTED FACTOR ANALYTIC MODEL COMPARISON 1. John M. Clark III. Pearson. Author Note

Nested Factor Analytic Model Comparison as a Means to Detect Aberrant Response Patterns John M. Clark III Pearson Author Note John M. Clark III,

On Test Scores (Part 2) How to Properly Use Test Scores in Secondary Analyses. Structural Equation Modeling Lecture #12 April 29, 2015

PRE 906, SEM: On Test Scores #2: The Proper Use of Scores. Today's Class:

Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys

Jill F. Kilanowski, PhD, APRN, CPNP Associate Professor Alpha Zeta & Mu Chi Acknowledgements Dr. Li Lin,

ASSESSING THE UNIDIMENSIONALITY, RELIABILITY, VALIDITY AND FITNESS OF INFLUENTIAL FACTORS OF 8TH GRADES STUDENT'S MATHEMATICS ACHIEVEMENT IN MALAYSIA

International Journal of Advance Research, IJOAR.org Volume 1, Issue 2, MAY 2013, Online: ASSESSING THE UNIDIMENSIONALITY, RELIABILITY, VALIDITY AND FITNESS OF INFLUENTIAL FACTORS OF 8TH GRADES STUDENT

Knowledge as a driver of public perceptions about climate change reassessed

1. Method and measures 1.1 Sample In the cross-country study, the age of the participants ranged between 20 and 79 years, with

Examining the efficacy of the Theory of Planned Behavior (TPB) to understand pre-service teachers intention to use technology*

Timothy Teo & Chwee Beng Lee Nanyang Technology University Singapore This

André Cyr and Alexander Davies

Item Response Theory and Latent variable modeling for surveys with complex sampling design The case of the National Longitudinal Survey of Children and Youth in Canada Background André Cyr and Alexander

validscale: A Stata module to validate subjective measurement scales using Classical Test Theory

Bastien Perrot, Emmanuelle Bataille, Jean-Benoit Hardouin UMR INSERM U1246 - SPHERE "methods in Patient-centered outcomes

Contents. What is item analysis in general? Psy 427 Cal State Northridge Andrew Ainsworth, PhD

Psy 427 Cal State Northridge Andrew Ainsworth, PhD Contents Item Analysis in General Classical Test Theory Item Response Theory Basics Item Response Functions Item Information Functions Invariance IRT

RESULTS. Chapter INTRODUCTION

Chapter 8 RESULTS 8.1 INTRODUCTION The previous chapter provided a theoretical discussion of the research and statistical methodology. This chapter focuses on the interpretation and discussion of the

Collecting & Making Sense of

Collecting & Making Sense of Quantitative Data Deborah Eldredge, PhD, RN Director, Quality, Research & Magnet Recognition Oregon Health & Science University Margo A. Halm, RN, PhD, ACNS-BC, FAHA Director,

Packianathan Chelladurai Troy University, Troy, Alabama, USA.

DIMENSIONS OF ORGANIZATIONAL CAPACITY OF SPORT GOVERNING BODIES OF GHANA: DEVELOPMENT OF A SCALE Christopher Essilfie I.B.S Consulting Alliance, Accra, Ghana E-mail: chrisessilfie@yahoo.com Packianathan

Validity, Reliability and Classical Assumptions

Presented by Mahendra AN Sources: www-psych.stanford.edu/~bigopp/.ppt http://ets.mnsu.edu/darbok/ethn402-502/reliability.ppt http://5martconsultingbandung.blogspot.com/2011/01/uji-asumsi-klasik.html

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments

Greg Pope, Analytics and Psychometrics Manager 2008 Users Conference San Antonio Introduction and purpose of this session

1. Evaluate the methodological quality of a study with the COSMIN checklist

Answers 1. Evaluate the methodological quality of a study with the COSMIN checklist We follow the four steps as presented in Table 9.2. Step 1: The following measurement properties are evaluated in the

Discriminant Analysis with Categorical Data

John E. Overall and J. Arthur Woodward The University of Texas Medical Branch, Galveston A method for studying relationships among groups in terms of

On the purpose of testing:

Why Evaluation & Assessment is Important Feedback to students Feedback to teachers Information to parents Information for selection and certification Information for accountability Incentives to increase