Validity and reliability of measurements



Components in a dataset

Outline
1. Why bother (examples from research)
2. What is reliability?
3. What is validity?
4. How should I treat missing values?
5. I cannot find an instrument: should I develop my own, translate, or adapt?

Random error
The difference, due to chance alone, between an observation on a sample and the true population value, leading to a lack of precision in the measurement (of association). Three major sources:
- Individual biological variation
- Sampling variation
- Measurement variation

Random error can never be completely eliminated, since we can study only a sample of the population. Random error can be reduced by careful measurement; sampling error can be reduced by optimal sampling procedures and by increasing sample size.

1. Why bother (examples from research)

Examples
- Unreliable histological score: the score is not measuring what we were planning to measure. A wrong conclusion about the effectiveness of treatment?
- Imprecise or inaccurate protein measurement: too much variation to be able to draw conclusions, or detection too low/high in our samples to be able to compare differences.

Examples
- Unreliable weighing scale: is my intervention aimed at weight loss really unsuccessful, or was the weighing scale unreliable?
- Imprecise heart rate monitor: did the medication really change heart rate?
- Inaccurate observation: did I really observe an improvement in scarring?
- Unstable depression questionnaire: did these patients really get worse after counselling?

2. What is reliability?

Components in a dataset

Reliable measures (instruments) maximize the true-score component and minimize the error component. Related terms: stability, consistency, accuracy, dependability.

Reliability
Reliability refers to the degree of consistency or accuracy with which an instrument measures the target attribute.

The idea behind reliability is that any significant result must be more than a one-off finding: it must be repeatable. Other people must be able to perform exactly the same test, under the same conditions, and generate the same results.

Reliability
- Stability
- Internal consistency
- Equivalence

Test-retest (stability)
Can be used with observational measures, physiologic measures, and questionnaires.


Questions with stability
- What is the best time frame between test and retest?
- How many people should do the test-retest?
- How should we ask/prepare people (EPN)?
- Same sample or a new sample?
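Test-retest stability is commonly quantified as the correlation between scores from the two administrations (an intraclass correlation can be used when absolute agreement matters). A minimal sketch in plain Python, with invented scores for illustration:

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

# Hypothetical questionnaire scores from the same 5 people, two weeks apart
test = [12, 18, 9, 22, 15]
retest = [13, 17, 10, 21, 16]
print(round(pearson_r(test, retest), 3))
```

A high correlation between the two administrations suggests the scores are stable over the chosen time frame.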

Internal consistency
Ensures that each part of the test generates similar results, and that each part of the test measures the correct construct. For example, a test of anxiety should measure anxiety only, and every single question must also contribute. If people's responses to Item 1 are completely unrelated to their responses to Item 2, then it does not seem reasonable to think that these two items are both measuring anxiety, so they probably should not be on the same test.

Cronbach's alpha

Or, in plain English: the most common way of assessing and reporting internal consistency for multi-item scales. It is a function of two things:
- The average inter-item correlation
- The number of items

A common interpretation of the α coefficient:
< 0.7  Unsatisfactory
≥ 0.7  Acceptable
≥ 0.8  Good
≥ 0.9  Excellent
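The computation can be sketched from the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). The item scores and variable names below are invented for illustration:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of scores per item, all of equal length
    (one score per respondent per item)."""
    k = len(items)
    sum_item_var = sum(pvariance(scores) for scores in items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))

# Hypothetical 3-item scale answered by 5 respondents
item1 = [3, 4, 3, 5, 2]
item2 = [2, 4, 4, 5, 1]
item3 = [3, 5, 4, 4, 2]
print(round(cronbach_alpha([item1, item2, item3]), 2))
```

Because these toy items move together across respondents, the resulting α is high; items unrelated to the rest of the scale pull α down.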

Other internal consistency statistics:
- Item-item correlation
- Item-total correlation (corrected)
- Split-half reliability


Reliability
- Stability
- Internal consistency
- Equivalence: the degree to which two or more independent observers or coders agree about scoring (inter-rater and intra-rater)

Inter-rater or inter-observer reliability
Simple proportion of agreement:
  number of agreements / (number of agreements + number of disagreements)
Cohen's kappa adjusts for chance agreement.
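A minimal sketch of both statistics in plain Python (the rater labels are invented for illustration):

```python
from collections import Counter

def proportion_agreement(r1, r2):
    """Fraction of cases on which two raters gave the same label."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(r1)
    p_o = proportion_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # expected chance agreement from each rater's marginal label frequencies
    p_e = sum(c1[label] * c2[label] for label in set(r1) | set(r2)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

rater1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
rater2 = ["yes", "no",  "no", "yes", "no", "yes", "yes", "no"]
print(proportion_agreement(rater1, rater2))       # 0.75
print(round(cohens_kappa(rater1, rater2), 2))     # 0.5
```

Note how kappa (0.5) is well below the raw agreement (0.75): with two roughly balanced labels, half of the agreement is expected by chance alone.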


Intraclass correlation (ICC)
The assessment of consistency or reproducibility of quantitative measurements made by different observers measuring the same quantity. The intraclass correlation coefficient gives a composite of intra-observer and inter-observer variability. Example: how correctly can ER doctors predict the severity of disease based on echo results, i.e. how consistent are their scores with each other?
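Several ICC variants exist (one-way vs. two-way models, single vs. average measures). As an illustration, the one-way single-measure ICC(1,1) can be computed from the between-subject and within-subject mean squares; the ratings below are invented:

```python
def icc_oneway(table):
    """One-way, single-measure ICC(1,1).
    table: one row per subject, one column per rater (complete data)."""
    n, k = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    # between-subjects and within-subjects mean squares from one-way ANOVA
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for row, m in zip(table, row_means)
                    for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical severity scores: 4 patients, each rated by 3 ER doctors
ratings = [
    [7, 8, 7],
    [3, 2, 3],
    [5, 5, 6],
    [9, 9, 8],
]
print(round(icc_oneway(ratings), 2))
```

Here the raters disagree only slightly relative to how much the patients differ, so the ICC is close to 1.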

3. What is validity?

Validity
Test validity is the degree to which an instrument measures what it is supposed to measure.

Does this instrument measure what I want to measure? (How is self-care?) Not valid for measuring self-care.

Does this instrument measure what I want to measure? (What is her temperature?) Not valid for measuring temperature.

Does this instrument measure what I want to measure? (What is the digoxin level in the blood?) Not valid for measuring digoxin level.

Does this histological count really reflect recovery?

Experimental examples of validity
- Adapting a new methodology to a new set of data or samples
- Is there a gold standard?
- Is it species dependent?

Validity
- Content validity
- Criterion-related validity
- Construct validity

Content validity
How well a measure represents every single element of a construct. Related: face validity.

Content validity is based on judgement: expert panels of users, or patient interviews.

Criterion-related validity
The degree to which scores on an instrument are correlated with some external criterion.

For example: compare measures on a newly developed personality test against an external criterion, such as a behavioral criterion on which psychologists agree.

Construct validity
The degree to which an instrument measures the construct under investigation. Constructs are abstractions that are deliberately created by researchers in order to conceptualize phenomena (e.g. happiness, quality of life, infection).

Assessing construct validity:
- Known-groups technique
- Factor analysis

Known groups

Factor analyses
Based on classical test theory (CTT), also called traditional test theory (TTT). Factor analyses are used to explore or confirm dimensionality in data. Different forms:
- Principal component analysis
- Exploratory factor analysis
- Confirmatory factor analysis

Example: HADS (the Hospital Anxiety and Depression Scale)

What might possibly affect validity in your research? (What might possibly affect validity in research you have heard about?)


4. How should I treat missing values?

Reasons for missing data:
- Participants can fail to respond to questions
- Equipment malfunction, power failure, sample loss
- Data collecting or recording mechanisms can malfunction
- Subjects can withdraw from studies before they are completed
- Incomplete registry data
- Data entry errors

Further sources of missingness:
- Attrition due to social/natural processes. Examples: an accident, snowfall during data collection, patients not able to drive to the clinic, death; a power failure in the lab, a sick lab technician.
- Skip patterns in surveys. Example: certain questions are only asked of respondents who indicate they have symptoms (see SCFHI).
- Random data collection issues.
- Respondent refusal/non-response. Is it the same patients who miss everything? Are certain items missed more often? (e.g. MLwHFQ)


Missing data mechanisms
- Missing completely at random (MCAR): uniform non-response
- Missing at random (MAR)
- Missing not at random (NMAR): non-ignorable

Missing completely at random (MCAR)
Missingness is independent both of observable variables and of unobservable parameters of interest. Example: a postal questionnaire goes missing, or a sample is dropped.

Missing at random (MAR)
Missingness is related to some other observed variable, but not to the value of the variable that has missing data. Example: male participants are more likely to refuse to fill out the depression survey, but this does not depend on the level of their depression. Example: a technical error of a machine, or handling of a sample during measurement.

Missing not at random (NMAR), non-ignorable
The probability of a missing value depends on the variable that is missing. Example: respondents with a high income are less likely to report income; if a particular treatment causes discomfort, a patient is more likely to drop out of the study. This missingness is not at random.
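The three mechanisms can be made concrete by simulating how the missingness is generated. The variable names and probabilities below are invented for illustration only:

```python
import random

random.seed(42)

# Simulated dataset: (sex, depression score) for 1000 participants
people = [(random.choice(["m", "f"]), random.gauss(10, 2)) for _ in range(1000)]

# MCAR: every score has the same 20% chance of being missing
mcar = [(sex, None if random.random() < 0.20 else score)
        for sex, score in people]

# MAR: missingness depends on sex (which is observed), not on the score itself
mar = [(sex, None if random.random() < (0.40 if sex == "m" else 0.05) else score)
       for sex, score in people]

# NMAR: missingness depends on the (unobserved) score itself
nmar = [(sex, None if score > 12 and random.random() < 0.50 else score)
        for sex, score in people]

def missing_rate(data):
    return sum(score is None for _, score in data) / len(data)

print(missing_rate(mcar), missing_rate(mar), missing_rate(nmar))
```

Under MAR, a complete-case analysis still loses the men disproportionately; under NMAR, the observed scores systematically understate depression, which no deletion strategy can repair.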

How to treat missing values/data?
- Deletion methods: listwise deletion, pairwise deletion
- Imputation methods: mean/mode substitution, dummy variable method, single regression
- Model-based methods: maximum likelihood, multiple imputation

Listwise deletion
Only analyzes cases with data on every variable. Simple, and gives comparability across all analyses. But: loss of power, loss of information, and bias if the data are not MCAR.

Pairwise deletion
Analyzes all cases in which the variables of interest are present. Keeps as many cases, and as much information, as possible. But: a different n per analysis.
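The difference between the two deletion strategies can be sketched in plain Python, using None to mark a missing value (the toy records are invented):

```python
# Toy records: (age, weight, score); None marks a missing value
rows = [
    (34, 70.5, 12),
    (29, None, 15),
    (41, 68.0, None),
    (52, 80.2, 9),
]

# Listwise deletion: keep only cases complete on every variable
listwise = [r for r in rows if all(v is not None for v in r)]
print(len(listwise))  # 2 complete cases left for every analysis

# Pairwise deletion: each analysis keeps the cases complete on just
# the variables it uses, so n differs per analysis
def pairwise_n(rows, i, j):
    return sum(1 for r in rows if r[i] is not None and r[j] is not None)

print(pairwise_n(rows, 0, 1))  # age-weight analysis: n = 3
print(pairwise_n(rows, 0, 2))  # age-score analysis: n = 3
```

Listwise deletion uses the same two cases everywhere; pairwise deletion keeps three cases per analysis here, at the cost of a different n each time.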

Imputation
Rather than removing variables or observations with missing data, another approach is to fill in, or impute, the missing values. Examples:
- Last value carried forward
- Mean imputation
- Imputation based on logical rules
- Using regression predictions to perform deterministic imputation
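Two of these, mean imputation and last value carried forward, can be sketched in a few lines (the weight series is invented; note that both single-value methods understate variability, which is why model-based approaches such as multiple imputation are often preferred):

```python
from statistics import mean

def mean_impute(values):
    """Replace each missing value with the mean of the observed values."""
    m = mean(v for v in values if v is not None)
    return [m if v is None else v for v in values]

def locf(values):
    """Last observation carried forward: fill each gap with the most
    recent observed value (assumes the first value is observed)."""
    filled, last = [], None
    for v in values:
        last = v if v is not None else last
        filled.append(last)
    return filled

# Hypothetical weekly weights with three missed measurements
weights = [82.0, 81.5, None, 80.0, None, None, 79.0]
print(mean_impute(weights))  # gaps become 80.625, the observed mean
print(locf(weights))         # gaps repeat the previous observed value
```

LOCF is common in longitudinal designs but silently assumes no change since the last visit; mean imputation preserves the mean but shrinks the variance.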


5. I cannot find an instrument: should I develop my own, translate, or adapt?
Discuss: advantages, disadvantages, consequences.

Developing a new questionnaire
- Phase 1: decision on a theoretical framework
- Phase 2: item development; systematic review or qualitative analysis of the literature; qualitative interviews of patients, physicians, nurses
- Phase 3: pilot testing; initial validity and reliability assessment; reduction or revision of items or scale structure; scoring
- Phase 4: further field testing; more advanced validity and reliability estimation

Translating

Is translation just translation? Translation = cultural adaptation? Translation = validation? Culture and language.

Concerns in translation: the use of expressions. E.g. a thirst scale item: "feels like cotton in the mouth".


Can I add/delete items, or change procedures?


Changing the scoring method?

Adding items / changing an instrument
Yes, you can do that, but consider:
- The revised version's validity and reliability
- Legal rights / copyright
- Consequences: comparability? Reliability and validity?


"Every line is the perfect length if you don't measure it." (Marty Rubin)