PÄIVI KARHU THE THEORY OF MEASUREMENT


AGENDA
1. Quality of Measurement
   a) Validity
      - Definition and types of validity
      - Assessment of validity
      - Threats to validity
   b) Reliability
      - True Score Theory
      - Definition and types of reliability
      - Assessment of reliability
2. Levels of Measurement

Quality of Measurement - Validity - Definition and Types of validity
WHAT IS MEASUREMENT?
Measurement is the process of observing and recording the observations that are collected as part of a research effort.

Quality of Measurement - Validity - Definition and Types of validity
VALIDITY
Validity is the best available approximation of the truth of a given proposition, inference or conclusion.
VALIDITY QUESTIONS ARE CUMULATIVE:
- Conclusion: Is there a relationship between the cause and the effect?
- Internal: Is the relationship causal?
- Construct: Did you measure what you wanted to measure?
- External: How well can you generalize from your sample to other persons, places, times?

Quality of Measurement - Validity - Definition and Types of validity
VALIDITY TYPES
CONSTRUCT VALIDITY: an assessment of how well your actual programs or measures reflect your ideas or theories. Is the operationalization* a good reflection of the construct? Does the operationalization* behave the way it should?
- TRANSLATION VALIDITY: Face Validity, Content Validity
- CRITERION-RELATED VALIDITY: Predictive Validity, Concurrent Validity, Convergent Validity, Discriminant Validity
*) operationalization: the act of translating a construct into its manifestation

Quality of Measurement - Validity - Definition and Types of validity
TRANSLATION VALIDITY
Face Validity: you look at the operationalization and see whether "on its face" it seems like a good translation of the construct.
Content Validity: you essentially check the operationalization against the relevant content domain for the construct.

Quality of Measurement - Validity - Definition and Types of validity
CRITERION-RELATED VALIDITY
Predictive Validity: we assess the operationalization's ability to predict something it should theoretically be able to predict.
Concurrent Validity: we assess the operationalization's ability to distinguish between groups that it should theoretically be able to distinguish between.
Convergent Validity: we examine the degree to which the operationalization is similar to (converges on) other operationalizations that it theoretically should be similar to.
Discriminant Validity: we examine the degree to which the operationalization is not similar to (diverges from) other operationalizations that it theoretically should not be similar to.

Quality of Measurement - Validity - Definition and Types of validity
CONSTRUCT VALIDITY
[Diagram: the land of theory (the cause construct, the effect construct, and the cause-effect relationship you think about) is mapped onto the land of observation (the program you do, the observations you see, and the program-outcome relationship). Construct validity asks: can we generalize from what we observe to the constructs?]

Quality of Measurement - Validity - Definition and Types of validity
CONSTRUCT VALIDITY
Convergent Validity: to establish convergent validity, you need to show that measures that should be related are in reality related.
Discriminant Validity: to establish discriminant validity, you need to show that measures that should not be related are in reality not related.
Convergent correlations should be greater than discriminant correlations. The two work together: evidence for both is evidence for construct validity.

Quality of Measurement - Validity - Definition and Types of validity
CONSTRUCT VALIDITY - WHY?
The more complex your theoretical model, the more evidence you are providing that you know what you are talking about. Again, convergent correlations should exceed discriminant correlations; together they provide evidence for construct validity.

GREAT! This analysis provided evidence for both convergent and discriminant validity. It shows that the three self-esteem measures reflect the same construct, that the three locus-of-control measures reflect the same construct, and that these two sets of measures reflect two different constructs. BUT: what are the constructs measuring? How do you show that your measures are actually measuring self-esteem or locus-of-control?
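
A minimal Python sketch of this kind of check, using simulated data (the item scores, sample size, and noise levels below are illustrative assumptions, not the lecture's data): measures of the same construct should correlate highly with each other and only weakly with measures of the other construct.

import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulate two unrelated constructs, each tapped by three measures.
self_esteem = rng.normal(size=n)
locus_of_control = rng.normal(size=n)
se_items = np.column_stack([self_esteem + rng.normal(scale=0.5, size=n) for _ in range(3)])
loc_items = np.column_stack([locus_of_control + rng.normal(scale=0.5, size=n) for _ in range(3)])

def mean_offdiag_corr(x):
    # Average of the off-diagonal correlations among the columns of x.
    r = np.corrcoef(x, rowvar=False)
    return r[np.triu_indices_from(r, k=1)].mean()

def mean_cross_corr(a, b):
    # Average correlation between each column of a and each column of b.
    return np.mean([np.corrcoef(a[:, i], b[:, j])[0, 1]
                    for i in range(a.shape[1]) for j in range(b.shape[1])])

convergent = mean_offdiag_corr(se_items)          # should be high
discriminant = mean_cross_corr(se_items, loc_items)  # should be near zero
print(f"convergent (within self-esteem items):     {convergent:.2f}")
print(f"discriminant (self-esteem vs locus items): {discriminant:.2f}")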

Quality of Measurement - Validity - Assessment of Construct Validity
CONSTRUCT VALIDITY ASSESSMENT
Assessment tools:
- The Nomological Network
- The Multitrait-Multimethod Matrix (MTMM)
- Pattern Matching
- Structural Equation Modeling (SEM)

Quality of Measurement - Validity - Assessment of Construct Validity
THE NOMOLOGICAL NETWORK
Developed by Lee Cronbach and Paul Meehl (1955); nomological network = lawful network.
The network includes:
- a theoretical framework for what you are trying to measure
- an empirical framework for how you are going to measure it
- a specification of the linkages among and between those two frameworks
It is not practical; it is only useful as a philosophical foundation for construct validity.

Quality of Measurement - Validity - Assessment of Construct Validity
THE MULTITRAIT-MULTIMETHOD MATRIX (MTMM)
Developed by Campbell and Fiske (1959) as an attempt to provide a practical methodology; it introduced convergent and discriminant validity as subcategories of construct validity.
A matrix of correlations arranged to facilitate the assessment of construct validity.
Assumes that you have measured each construct (trait) with different methods in a fully crossed design (traits by methods).
A restrictive methodology.

Quality of Measurement - Validity - Assessment of Construct Validity
MTMM EXAMPLE
[Matrix figure: an MTMM for three traits (A, B, C), each measured with three methods (1, 2, 3). The correlations form three kinds of shapes to inspect: diagonals, triangles, and blocks.]

Quality of Measurement - Validity - Assessment of Construct Validity
INTERPRETATION OF THE MTMM
- Coefficients in the reliability diagonal should consistently be the highest in the matrix.
- Coefficients in the validity diagonals should be significantly different from zero and high enough to warrant further investigation.
- A validity coefficient should be higher than the values lying in its column and row in the same heteromethod block (a small check of this rule is sketched below).
- A validity coefficient should be higher than all coefficients in the heterotrait-monomethod triangles.
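
A minimal sketch of the third rule on a small hypothetical MTMM with three traits (A, B, C) and only two methods; all correlation values below are illustrative assumptions, not data from the lecture.

import pandas as pd

labels = ["A1", "B1", "C1", "A2", "B2", "C2"]   # trait A/B/C measured by method 1/2
r = pd.DataFrame(
    [[1.00, 0.30, 0.25, 0.60, 0.20, 0.15],
     [0.30, 1.00, 0.28, 0.22, 0.58, 0.18],
     [0.25, 0.28, 1.00, 0.17, 0.21, 0.55],
     [0.60, 0.22, 0.17, 1.00, 0.32, 0.27],
     [0.20, 0.58, 0.21, 0.32, 1.00, 0.30],
     [0.15, 0.18, 0.55, 0.27, 0.30, 1.00]],
    index=labels, columns=labels)

# Rule checked: each validity coefficient (same trait, different method) should
# exceed the other values in its row and column of the heteromethod block.
for trait in "ABC":
    validity = r.loc[f"{trait}1", f"{trait}2"]
    same_block_others = [r.loc[f"{trait}1", f"{o}2"] for o in "ABC" if o != trait] + \
                        [r.loc[f"{o}1", f"{trait}2"] for o in "ABC" if o != trait]
    verdict = "ok" if validity > max(same_block_others) else "check"
    print(f"trait {trait}: validity {validity:.2f} vs max other {max(same_block_others):.2f} -> {verdict}")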

Quality of Measurement - Validity - Assessment of Construct Validity
PROS AND CONS OF MTMM
+ An operational methodology for assessing construct validity
+ A rigorous framework for assessing construct validity
- Requires a fully crossed measurement design
- Its judgemental nature has hindered adoption
- It is impossible to quantify the degree of construct validity in a study

Quality of Measurement - Validity - Assessment of Construct Validity
PATTERN* MATCHING
Construct validity is assessed as the degree of correspondence between the theoretical pattern and the observed pattern.
*) A pattern is any arrangement of objects or entities; nonrandom and at least potentially describable.

Quality of Measurement - Validity - Assessment of Construct Validity
A PATTERN MATCHING EXAMPLE
[Figure: an example of matching a theoretical pattern against an observed pattern.]

Quality of Measurement - Validity - Assessment of Construct Validity
PROS AND CONS OF PATTERN MATCHING
+ Higher generality and flexibility than MTMM: it does not require that you measure each construct with multiple methods
+ Treats convergence and discrimination as a continuum
+ Estimates the overall construct validity for a set of measures in a specific context
+ Makes you specify what you think about the constructs
- Requires a precise specification of the theory of the constructs
- Requires the quantification of both patterns
- Requires that both patterns be described in matrices with the same constructs

Quality of Measurement - Validity - Threats of Validity
THREATS TO CONSTRUCT VALIDITY
Design threats:
- Inadequate preoperational explication of constructs
- Mono-operation bias: a single version of your independent variable
- Mono-method bias: a single method of measurement
- Interaction of different treatments: you are not able to isolate the effects of your program from other treatments
- Interaction of testing and treatment: the pretest makes participants more sensitive to the treatment
- Restricted generalizability across constructs: side effects of your treatment
- Confounding constructs and levels of constructs: the dosage level changes the results
Social threats:
- Hypothesis guessing: participants base their behavior on what they guess the real purpose of the study is
- Evaluation apprehension: participants are afraid of being evaluated
- Experimenter expectancies: the researcher communicates what the desired outcome is

Quality of Measurement - Reliability - True Score Theory
TRUE SCORE THEORY
Assumes that every observation is composed of two components: the true value plus error.
The error can be divided further into two subcomponents:
- random error
- systematic error

Quality of Measurement - Reliability - True Score Theory
MEASUREMENT ERROR
Random error:
- caused by random factors
- pushes scores up or down randomly
- random errors sum to 0
- adds variability to the data but does not affect the average performance of the group
Systematic error:
- caused systematically by a certain factor
- affects the whole sample
- pushes scores consistently either up or down
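
A minimal simulation sketch of how the two error types behave (all numbers are illustrative assumptions): random error leaves the group mean roughly unchanged but adds variance, while systematic error shifts every score in the same direction.

import numpy as np

rng = np.random.default_rng(1)
true_scores = rng.normal(loc=50, scale=10, size=1000)

random_error = rng.normal(loc=0, scale=5, size=true_scores.size)  # mean ~ 0
systematic_error = 3.0                                            # e.g. a miscalibrated instrument

observed_random = true_scores + random_error
observed_systematic = true_scores + systematic_error

print("true mean:                ", round(true_scores.mean(), 2))
print("mean with random error:   ", round(observed_random.mean(), 2))      # ~ same mean
print("mean with systematic error:", round(observed_systematic.mean(), 2))  # shifted up by ~3
print("variance, true vs random: ", round(true_scores.var(), 1), round(observed_random.var(), 1))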

Quality of Measurement - Reliability - Definition and Types of reliability
DEFINITION
Reliability is the repeatability or consistency of a measure: reliability = Var(T) / Var(X), the proportion of observed-score variance that is true-score variance.
We cannot calculate reliability directly because we cannot measure the true score component.
We can estimate it from the covariance or the correlation between two observations of the same measure, Cov(X1, X2) or Corr(X1, X2).
Reliability ranges between 0 and 1 (0 ≤ r ≤ 1); a measure is perfectly reliable (r = 1) when the random error equals 0.
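
A minimal simulation sketch of this definition (the true scores are only available because the data are simulated; sample size and error variance are illustrative assumptions): the correlation between two observations of the same measure recovers Var(T)/Var(X).

import numpy as np

rng = np.random.default_rng(2)
n = 5000
true = rng.normal(size=n)                   # unobservable true scores T
x1 = true + rng.normal(scale=0.7, size=n)   # first observation  X1 = T + e1
x2 = true + rng.normal(scale=0.7, size=n)   # second observation X2 = T + e2

theoretical = true.var() / x1.var()         # Var(T)/Var(X), known only in simulation
estimated = np.corrcoef(x1, x2)[0, 1]       # what we can actually compute from data

print(f"Var(T)/Var(X): {theoretical:.2f}   Corr(X1, X2): {estimated:.2f}")  # both around 0.67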

Quality of Measurement - Reliability - Definition and Types of reliability
RELIABILITY TYPES
Inter-Rater or Inter-Observer Reliability: the degree to which different raters/observers give consistent estimates of the same phenomenon; used to test how similarly people categorize and score items.
Test-Retest Reliability: the consistency of a measure from one time to another; good tests have less retest variation over time.
Parallel-Forms Reliability: the consistency of two tests constructed in the same way from the same content domain; evaluates different questions that seek to assess the same construct.
Internal Consistency Reliability: the consistency of results across items within a test; evaluates how consistent the results are for different items measuring the same construct within a measure.

Quality of Measurement - Reliability - Assessment of reliability
INTERNAL CONSISTENCY RELIABILITY
Consistency of results across items within a test. Different approaches:
- Split-Half Reliability
- Average Inter-Item Correlation
- Average Item-Total Correlation
- Cronbach's Alpha (α)

Quality of Measurement - Reliability - Assessment of reliability
SPLIT-HALF RELIABILITY
Randomly divide the items that measure the same construct into two sets. Administer the entire instrument to a sample of people and calculate the total score for each half. The correlation between the two total scores is the split-half reliability.
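
A minimal sketch with simulated item responses (the number of people, number of items, and noise level are illustrative assumptions): split the items at random, score each half, and correlate the two totals.

import numpy as np

rng = np.random.default_rng(3)
n_people, n_items = 300, 10
ability = rng.normal(size=(n_people, 1))
items = ability + rng.normal(scale=1.0, size=(n_people, n_items))  # 10 items tapping one construct

half = rng.permutation(n_items)                       # random split into two halves
total_a = items[:, half[:n_items // 2]].sum(axis=1)
total_b = items[:, half[n_items // 2:]].sum(axis=1)

split_half_r = np.corrcoef(total_a, total_b)[0, 1]
print(f"split-half reliability: {split_half_r:.2f}")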

Quality of Measurement - Reliability - Assessment of reliability
AVERAGE INTER-ITEM CORRELATION
Compares the correlations between all pairs of items that measure the same construct and takes the mean of all these paired correlations.
AVERAGE ITEM-TOTAL CORRELATION
Calculates a total score across the items (e.g. a seventh variable for six items), correlates each item with that total, and averages these item-total correlations.
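
A minimal sketch of both quantities for six simulated items measuring one construct (data and sample size are illustrative assumptions).

import numpy as np

rng = np.random.default_rng(4)
ability = rng.normal(size=(300, 1))
items = ability + rng.normal(scale=1.0, size=(300, 6))     # six items, as in the lecture example

r = np.corrcoef(items, rowvar=False)                        # 6 x 6 inter-item correlation matrix
avg_interitem = r[np.triu_indices_from(r, k=1)].mean()      # mean of the 15 pairwise correlations

total = items.sum(axis=1)                                   # the "seventh variable": the total score
item_total = [np.corrcoef(items[:, i], total)[0, 1] for i in range(items.shape[1])]
avg_item_total = np.mean(item_total)

print(f"average inter-item correlation: {avg_interitem:.2f}")
print(f"average item-total correlation: {avg_item_total:.2f}")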

Quality of Measurement - Reliability - Assessment of reliability
CRONBACH'S ALPHA
Conceptually equivalent to the average of all possible split-half correlations, but not calculated that way. Computation is quick, and it is the most widely used estimate of internal consistency.
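
A minimal sketch using the standard computational formula, α = k/(k-1) · (1 − Σ var(item_i) / var(total)), on the same kind of simulated item data as above (data illustrative).

import numpy as np

rng = np.random.default_rng(5)
ability = rng.normal(size=(300, 1))
items = ability + rng.normal(scale=1.0, size=(300, 6))

def cronbach_alpha(x):
    # alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

print(f"Cronbach's alpha: {cronbach_alpha(items):.2f}")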

Levels of Measurement
LEVELS OF MEASUREMENT
The level of measurement helps in deciding how to interpret the data from a variable and defines the kind of statistical analysis that is appropriate. Data can be classified by different levels of precision, i.e. levels of measurement; the higher, the better.

Levels of Measurement
LEVELS OF MEASUREMENT
Nominal: attributes are just named; no ordering of the attributes (categories) is implied. Example: jersey numbers in basketball.
Ordinal: attributes can be rank-ordered, but the distances between attributes do not have any meaning. Example: grades at school.
Interval: the distance between attributes does have a meaning. Example: temperature (in Fahrenheit).
Ratio: there is an absolute zero that is meaningful, so you can construct a meaningful fraction (or ratio). Example: weight.
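
A minimal sketch pairing each level with the summaries that are meaningful for it; the example variables come from the list above, and the pairings follow the standard convention.

levels = {
    "nominal":  {"example": "jersey number",     "meaningful": ["counts / mode"]},
    "ordinal":  {"example": "school grade",      "meaningful": ["counts / mode", "median / rank order"]},
    "interval": {"example": "temperature (F)",   "meaningful": ["counts / mode", "median", "mean / differences"]},
    "ratio":    {"example": "weight",            "meaningful": ["counts / mode", "median", "mean", "ratios (twice as heavy)"]},
}

for level, info in levels.items():
    print(f"{level:8s}  e.g. {info['example']:16s}  meaningful statistics: {', '.join(info['meaningful'])}")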

Quality of Measurement
VALIDITY AND RELIABILITY
Validity: the quality of the operationalization.
Reliability: the quality of the measurement.

EXERCISES

EXERCISE
What is the level of measurement of each of the following variables?
1. Gender
2. Hair colour
3. Pulse rate (in bpm)
4. Body temperature
5. Team number
6. Shoe size

HOW DO I REMEMBER ALL THIS? IT'S EASY!
The 5 C's of Validity
The DPEFT Validity
NoNeMuMuPaMaSEM Construct Validity
TIROPI Reliability
SACA Reliability Assessment
NOIR Levels of Measurement

THE 5 C'S OF VALIDITY
Concurrent validity
Construct validity
Content validity
Convergent validity
Criterion-related validity

THE DPEFT VALIDITY
Discriminant validity
Predictive validity
External validity
Face validity
Translation validity

NONEMUMUPAMASEM CONSTRUCT VALIDITY ASSESSMENT TOOLS
The Nomological Network
The Multitrait-Multimethod Matrix (MTMM)
Pattern Matching
Structural Equation Modeling (SEM)

TIROPI RELIABILITY
Test-Retest Reliability
Inter-Rater or Inter-Observer Reliability
Parallel-Forms Reliability
Internal Consistency Reliability

SACA RELIABILITY ASSESSMENT
Split-half correlation
Average inter-item correlation
Cronbach's alpha
Average item-total correlation

NOIR LEVELS OF MEASUREMENT
Nominal
Ordinal
Interval
Ratio