Figure 1: Design and outcomes of an independent blind study with gold/reference standard comparison. Adapted from DCEB (1981b)

Gold/reference standard indicates the patient has the condition:
  Diagnostic test investigated indicates the patient has the condition: True positive (TP)
  Diagnostic test investigated indicates the patient does not have the condition: False negative (FN)
Gold/reference standard indicates the patient does not have the condition:
  Diagnostic test investigated indicates the patient has the condition: False positive (FP)
  Diagnostic test investigated indicates the patient does not have the condition: True negative (TN)

Table 15: Results recommended for the evaluation of diagnostic test validity using an independent blind study with gold/reference standard comparison

Sensitivity
Description: The ratio of true positive results from the diagnostic test evaluated to the total number of positive results obtained with the gold/reference standard (TP/[TP+FN] using Figure 1) (DCEB, 1981b). Numerically, this is expressed as a value between 0 and 1. This is an index of the test's ability to detect the condition (DCEB, 1981b; Lalkhen and McCluskey, 2008; Altman and Bland, 1994a).
Desired values: Higher values indicate greater sensitivity.

Specificity
Description: The ratio of true negative results from the diagnostic test evaluated to the total number of negative results obtained with the gold/reference standard (TN/[TN+FP] using Figure 1) (DCEB, 1981b). Numerically, this is expressed as a value between 0 and 1. This is an index of the test's ability to detect the absence of the condition (DCEB, 1981b; Lalkhen and McCluskey, 2008; Altman and Bland, 1994a).
Desired values: Higher values indicate greater specificity.

Positive predictive value
Description: The ratio of true positive results to the total number of positive results obtained with the diagnostic test examined (TP/[TP+FP] using Figure 1) (DCEB, 1981b). Numerically, this is expressed as a value between 0 and 1. It indicates the proportion of positive test results that correctly diagnosed the presence of the condition (DCEB, 1981b; Lalkhen and McCluskey, 2008). This value can also be calculated from the sensitivity and specificity of the test and the prevalence of the condition in the population being tested: [sensitivity × prevalence]/[sensitivity × prevalence + (1 - specificity) × (1 - prevalence)] (Altman and Bland, 1994b).
Desired values: Higher values indicate a greater capability of correctly diagnosing the presence of the condition.

Negative predictive value
Description: The ratio of true negative results to the total number of negative results obtained with the diagnostic test examined (TN/[TN+FN] using Figure 1) (DCEB, 1981b). Numerically, this is expressed as a value between 0 and 1. It indicates the proportion of negative test results that correctly diagnosed the absence of the condition (DCEB, 1981b; Lalkhen and McCluskey, 2008). This value can also be calculated from the sensitivity and specificity of the test and the prevalence of the condition in the population being tested: [specificity × (1 - prevalence)]/[(1 - sensitivity) × prevalence + specificity × (1 - prevalence)] (Altman and Bland, 1994b).
Desired values: Higher values indicate a greater capability of correctly diagnosing the absence of the condition.

Accuracy
Description: The ratio of the total number of true positive and true negative results from the diagnostic test evaluated to the total number of tests conducted ([TP+TN]/n using Figure 1) (DCEB, 1981b). Numerically, this is expressed as a value between 0 and 1. It is the overall rate of agreement between the diagnostic test examined and the gold/reference standard (DCEB, 1981b; Bossuyt et al., 2003).
Desired values: Higher values indicate greater accuracy of the diagnostic test.
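The Table 15 ratios can be computed directly from the four cells of a 2 × 2 table like Figure 1, using the standard convention that a false negative is a gold/reference-standard positive that the test calls negative. The sketch below is a minimal Python illustration, not the original study's procedure. The `TwoByTwo` class, the two `*_from_prevalence` helpers and the example counts are hypothetical. It also checks that the prevalence-based formulas of Altman and Bland (1994b) reproduce the cell-based predictive values when the prevalence is taken from the same sample.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical container for the four cells of a 2 x 2 validity table."""

    tp: int  # gold/reference standard positive, diagnostic test positive
    fn: int  # gold/reference standard positive, diagnostic test negative
    fp: int  # gold/reference standard negative, diagnostic test positive
    tn: int  # gold/reference standard negative, diagnostic test negative

    @property
    def n(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)  # TP / all gold-standard positives

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)  # TN / all gold-standard negatives

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)  # TP / all positive test results

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)  # TN / all negative test results

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.n  # overall agreement with the gold standard

    def prevalence(self) -> float:
        return (self.tp + self.fn) / self.n  # proportion positive on the gold standard


def ppv_from_prevalence(sens: float, spec: float, prev: float) -> float:
    # [sensitivity x prevalence] / [sensitivity x prevalence + (1 - specificity) x (1 - prevalence)]
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))


def npv_from_prevalence(sens: float, spec: float, prev: float) -> float:
    # [specificity x (1 - prevalence)] / [(1 - sensitivity) x prevalence + specificity x (1 - prevalence)]
    return spec * (1 - prev) / ((1 - sens) * prev + spec * (1 - prev))


if __name__ == "__main__":
    table = TwoByTwo(tp=90, fn=10, fp=30, tn=170)  # made-up counts for illustration
    sens, spec, prev = table.sensitivity(), table.specificity(), table.prevalence()
    print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={table.accuracy():.3f}")
    print(f"PPV={table.ppv():.3f} NPV={table.npv():.3f}")
    # The prevalence-based formulas agree with the cell-based values for the same prevalence.
    assert math.isclose(table.ppv(), ppv_from_prevalence(sens, spec, prev))
    assert math.isclose(table.npv(), npv_from_prevalence(sens, spec, prev))
```

With these made-up counts the script prints sensitivity=0.900 specificity=0.850 accuracy=0.867 and PPV=0.750 NPV=0.944. Because predictive values depend on prevalence, the same test applied where the prevalence is 5% gives `ppv_from_prevalence(0.9, 0.85, 0.05)` of about 0.24.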

Table 16: Results used for the evaluation of diagnostic test reliability

Test-Retest Reliability

Pearson's r correlation coefficient
Description: The ratio of the sum of the products of the paired deviations of each test administrator's results from that administrator's mean to the square root of the product of the sums of squared deviations for each test administrator's results. This provides a measure of the strength of the linear relationship between the results obtained by the various test administrators (Osborn, 2006; Blaisdell, 1998).
Desired values: Values range from -1 to 1 (Blaisdell, 1998). An r-value of 1 or -1 indicates a perfect linear relationship, and an r-value of 0 indicates no linear relationship (Blaisdell, 1998).

Intraclass correlation coefficient (ICC)
Description: The ratio of the variance between subjects to the total variance in all the results collected, where the total variance also includes the variance among the results obtained on different occasions for the same subject (Streiner and Norman, 2008; Cleophas, Zwinderman and Cleophas, 2006). Numerically, this is expressed as a value between 0 and 1, with 1 indicating perfect reliability/reproducibility and 0 indicating no reliability/reproducibility (Streiner and Norman, 2008; Cleophas, Zwinderman and Cleophas, 2005). Also known as test-retest stability (Streiner and Norman, 2008), proportion of variance, or correlation ratio (Cleophas, Zwinderman and Cleophas, 2006).
Desired values: It is reasonable to demand values greater than 0.5 (Streiner and Norman, 2008).

Kuder-Richardson Formula 20 (KR-20)
Description: An index of the homogeneity of measurements for a given set of results, used to assess the internal consistency reliability of a measurement instrument (Thompson, 2010; Ramsey et al., 1991). Values range from 0 to 1, with a value of 1 representing perfect internal consistency (Thompson, 2010). The squared KR-20 value represents the proportion of score variance not resulting from error (Thompson, 2010).
Desired values: A KR-20 value >0.7 is acceptable. Tests that use 50 or more items should require values >0.8 (Thompson, 2010). KR-20 values <0.7 indicate that the majority of score variance results from error (Thompson, 2010).
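The two continuous test-retest statistics can be sketched in Python as follows. `pearson_r` implements the sum-of-cross-products ratio described for Pearson's r. For the ICC, the table does not say which form was used, so this sketch assumes the one-way random-effects form, often written ICC(1,1), with subjects as rows and occasions as columns. The function names and example data are hypothetical.

```python
import math
from statistics import fmean


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson's r: sum of cross-products over sqrt of the product of sums of squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mx, my = fmean(x), fmean(y)
    dx = [xi - mx for xi in x]
    dy = [yi - my for yi in y]
    sxy = sum(a * b for a, b in zip(dx, dy))  # sum of cross-products of deviations
    sxx = sum(a * a for a in dx)  # sum of squared deviations, first set of results
    syy = sum(b * b for b in dy)  # sum of squared deviations, second set of results
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either set of results is constant")
    return sxy / math.sqrt(sxx * syy)


def icc_oneway(scores: list[list[float]]) -> float:
    """One-way random-effects ICC(1,1); rows are subjects, columns are occasions."""
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 subjects, each with the same number (>= 2) of results")
    grand = fmean(v for row in scores for v in row)
    row_means = [fmean(row) for row in scores]
    # Between-subject mean square and within-subject (occasion-to-occasion) mean square.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for row, m in zip(scores, row_means) for v in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

With results `[1, 2, 3]` on the first occasion and `[2, 3, 4]` on the second, `pearson_r` returns 1.0 while `icc_oneway([[1, 2], [2, 3], [3, 4]])` returns 0.6. Pearson's r reflects only the linear relationship, whereas this ICC also penalises the systematic one-point shift between occasions.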
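KR-20 is usually computed as (k / (k - 1)) × (1 - Σ p_i q_i / σ²). Here k is the number of dichotomous items, p_i is the proportion of respondents scoring 1 on item i, q_i = 1 - p_i, and σ² is the variance of the respondents' total scores. That standard formula is not given in the table itself, so treat the Python sketch below as an illustration with a hypothetical `kr20` function and made-up data.

```python
from statistics import pvariance


def kr20(responses: list[list[int]]) -> float:
    """Kuder-Richardson Formula 20 for 0/1 items; rows are respondents, columns are items."""
    n = len(responses)
    k = len(responses[0])
    if n < 2 or k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need >= 2 respondents and >= 2 items per respondent")
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)  # population variance, matching the p*q item variances
    if var_total == 0:
        raise ValueError("KR-20 is undefined when every respondent has the same total")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion scoring 1 on item j
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)
```

For example, `kr20([[1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1]])` returns 0.75. If every respondent answers all items the same way, as in `[[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]`, it returns 1.0.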

Inter-Rater Reliability

Percent agreement
Description: The percentage of tests for which the test administrators obtained the same results.
Desired values: Cadogan et al. (2011) indicated that a percent agreement greater than 80% is required for a test to be considered appropriate for inclusion in a clinical examination.

Intraclass correlation coefficient (ICC)
Description: Similar to the test-retest description, except that the variance among the results of different test administrators on a single subject is used instead of the variance between several tests administered on different occasions (Streiner and Norman, 2008; Gulliford, 2005; Tzannes and Murrell, 2002; Cleophas, Zwinderman and Cleophas, 2006).
Desired values: Same as the test-retest description. In addition, Tzannes and Murrell (2002) determined an intraclass correlation coefficient of … to represent reasonable inter-rater reliability, and Tzannes et al. (2004) considered an intraclass correlation coefficient >0.65 to be good and <0.31 to be poor inter-rater reliability.

Cohen's kappa
Description: The ratio of the sum of observed agreements minus the sum of expected agreements to the total number of observations minus the sum of expected agreements (Sheskin, 2004). It indicates the degree of agreement between the results obtained by different (or the same) test administrators after correcting for agreement expected by chance (Sheskin, 2004). Calculations are based on results organized into contingency tables specific to the number of test administrators and outcomes measured (Sheskin, 2004).
Desired values: Landis and Koch (1977) suggest that kappa values of 0.00-0.20 indicate slight agreement; 0.21-0.40 fair; 0.41-0.60 moderate; 0.61-0.80 substantial; and 0.81-1.00 almost perfect agreement. Walsworth et al. (2008) indicated that a kappa value greater than 0.8 indicates strong agreement. Cadogan et al. (2011) indicated that a kappa value greater than 0.6 is required for a test to be considered appropriate for inclusion in a clinical examination.

Coefficient of interobserver variability
Description: The ratio of inter-rater variability to the total observer-related variability (Haber et al., 2005). Total observer-related variability is the sum of intra- and inter-rater variability (Haber et al., 2005). Inter-rater variability is the variance in replicated measurements made on the same subject with all methods by all observers (Barnhart, Song and Haber, 2005; Haber et al., 2005).
Desired values: A higher coefficient of interobserver variability indicates a lower level of inter-rater agreement (Haber et al., 2005).
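Percent agreement and Cohen's kappa can be sketched in Python for two raters who each classify the same subjects. The function names are hypothetical. `cohens_kappa` follows the (sum of observed agreements - sum of expected agreements) / (N - sum of expected agreements) form, taking the chance-expected count for each category as the product of the two raters' marginal totals divided by N. The same functions can be applied to one administrator's ratings from two sessions.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def _check(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> int:
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    return len(r1)


def percent_agreement(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> float:
    """Percentage of subjects on which both ratings are identical."""
    n = _check(r1, r2)
    return 100.0 * sum(a == b for a, b in zip(r1, r2)) / n


def cohens_kappa(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> float:
    """(sum O - sum E) / (N - sum E) from the two raters' contingency table."""
    n = _check(r1, r2)
    observed = sum(a == b for a, b in zip(r1, r2))  # diagonal of the contingency table
    m1, m2 = Counter(r1), Counter(r2)  # each rater's marginal totals per category
    # Chance-expected agreements: row total x column total / N, summed over categories.
    expected = sum(m1[c] * m2[c] / n for c in m1.keys() | m2.keys())
    if expected == n:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (observed - expected) / (n - expected)
```

For example, with ratings `["+", "+", "-", "-"]` and `["+", "-", "-", "-"]`, the raters agree on 3 of 4 subjects (75.0%). The chance-expected agreement is 2 of 4, so kappa is (3 - 2) / (4 - 2) = 0.5.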

Coefficient of interobserver agreement
Description: 1 minus the coefficient of interobserver variability (Haber et al., 2005).
Desired values: A higher coefficient of interobserver agreement indicates a greater level of inter-rater agreement (Haber et al., 2005).

Intra-Rater Reliability

Intraclass correlation coefficient (ICC)
Description: Similar to the test-retest description, except that the variability among the results of a single test administrator is used instead of the variability between several tests administered on different occasions (Streiner and Norman, 2008; Cleophas, Zwinderman and Cleophas, 2006).
Desired values: Same as the test-retest description.

Cohen's kappa
Description: Similar to the inter-rater description, except that the variability among the results of a single test administrator is used instead of the variability between test administrators.
Desired values: Same as the inter-rater description.

Intra-observer variability
Description: The variance in replicated measurements made on the same subject with the same method by the same observer (Barnhart, Song and Haber, 2005; Haber et al., 2005).
Desired values: See above.
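The two Haber et al. (2005) coefficients reduce to simple ratios once the variance components are available. The Python sketch below rests on stated assumptions. All names are hypothetical. `intra_observer_variance` uses the pooled within-subject sample variance of one observer's replicates, a standard estimator assumed here. The inter-rater variance is taken as an input rather than estimated. The sketch does not reproduce the estimation procedure of Haber et al. (2005).

```python
from statistics import variance


def intra_observer_variance(replicates: list[list[float]]) -> float:
    """Pooled variance of one observer's replicated measurements (one list per subject)."""
    if not replicates or any(len(r) < 2 for r in replicates):
        raise ValueError("each subject needs at least two replicated measurements")
    # Within-subject sums of squares divided by their total degrees of freedom.
    ss = sum(variance(r) * (len(r) - 1) for r in replicates)
    df = sum(len(r) - 1 for r in replicates)
    return ss / df


def coefficient_interobserver_variability(inter_var: float, intra_var: float) -> float:
    """Inter-rater variability divided by total observer-related variability."""
    total = inter_var + intra_var  # total = intra- plus inter-rater variability
    if total <= 0:
        raise ValueError("total observer-related variability must be positive")
    return inter_var / total


def coefficient_interobserver_agreement(inter_var: float, intra_var: float) -> float:
    """1 minus the coefficient of interobserver variability."""
    return 1.0 - coefficient_interobserver_variability(inter_var, intra_var)
```

For example, `intra_observer_variance([[1, 3], [2, 4]])` is 2.0. With an inter-rater variance of 1.0 and an intra-rater variance of 3.0, the coefficient of interobserver variability is 0.25 and the coefficient of interobserver agreement is 0.75.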


More information

Reliability Analysis: Its Application in Clinical Practice

Reliability Analysis: Its Application in Clinical Practice Reliability Analysis: Its Application in Clinical Practice NahathaiWongpakaran Department of Psychiatry, Faculty of Medicine Chiang Mai University, Thailand TinakonWongpakaran Department of Psychiatry,

More information

University of Bristol - Explore Bristol Research. Publisher's PDF, also known as Version of record

University of Bristol - Explore Bristol Research. Publisher's PDF, also known as Version of record Al-Janabi, H., Flynn, T. N., Peters, T. J., Bryan, S., & Coast, J. (2015). Testretest reliability of capability measurement in the UK general population. Health Economics, 24(5), 625-30..3100 Publisher's

More information

Chapter -6 Reliability and Validity of the test Test - Retest Method Rational Equivalence Method Split-Half Method

Chapter -6 Reliability and Validity of the test Test - Retest Method Rational Equivalence Method Split-Half Method Chapter -6 Reliability and Validity of the test 6.1 Introduction 6.2 Reliability of the test 6.2.1 Test - Retest Method 6.2.2 Rational Equivalence Method 6.2.3 Split-Half Method 6.3 Validity of the test

More information

Any phenomenon we decide to measure in psychology, whether it is

Any phenomenon we decide to measure in psychology, whether it is 05-Shultz.qxd 6/4/2004 6:01 PM Page 69 Module 5 Classical True Score Theory and Reliability Any phenomenon we decide to measure in psychology, whether it is a physical or mental characteristic, will inevitably

More information

Correlation and regression

Correlation and regression PG Dip in High Intensity Psychological Interventions Correlation and regression Martin Bland Professor of Health Statistics University of York http://martinbland.co.uk/ Correlation Example: Muscle strength

More information

A Cross-sectional, Randomized, Non-interventional Methods Study to Compare Three Methods of Assessing Suicidality in Psychiatric Inpatients

A Cross-sectional, Randomized, Non-interventional Methods Study to Compare Three Methods of Assessing Suicidality in Psychiatric Inpatients A Cross-sectional, Randomized, Non-interventional Methods Study to Compare Three Methods of Assessing Suicidality in Psychiatric Inpatients Eric A. Youngstrom, Ph.D., Ahmad Hameed, M.D., Michael Mitchell,

More information

Tubal subfertility and ectopic pregnancy. Evaluating the effectiveness of diagnostic tests Mol, B.W.J.

Tubal subfertility and ectopic pregnancy. Evaluating the effectiveness of diagnostic tests Mol, B.W.J. UvA-DARE (Digital Academic Repository) Tubal subfertility and ectopic pregnancy. Evaluating the effectiveness of diagnostic tests Mol, B.W.J. Link to publication Citation for published version (APA): Mol,

More information

Answer Key to Problem Set #1

Answer Key to Problem Set #1 Answer Key to Problem Set #1 Two notes: q#4e: Please disregard q#5e: The frequency tables of the total CESD scales of 94, 96 and 98 in question 5e should sum up to 328 observation not 924 (the student

More information

GATE CAT Diagnostic Test Accuracy Studies

GATE CAT Diagnostic Test Accuracy Studies GATE: a Graphic Approach To Evidence based practice updates from previous version in red Critically Appraised Topic (CAT): Applying the 5 steps of Evidence Based Practice Using evidence from Assessed by:

More information

Performance of intraclass correlation coefficient (ICC) as a reliability index under various distributions in scale reliability studies

Performance of intraclass correlation coefficient (ICC) as a reliability index under various distributions in scale reliability studies Received: 6 September 2017 Revised: 23 January 2018 Accepted: 20 March 2018 DOI: 10.1002/sim.7679 RESEARCH ARTICLE Performance of intraclass correlation coefficient (ICC) as a reliability index under various

More information

Victoria YY Xu PGY-3 Internal Medicine University of Toronto. Supervisor: Dr. Camilla Wong

Victoria YY Xu PGY-3 Internal Medicine University of Toronto. Supervisor: Dr. Camilla Wong Validity, Reliability, Feasibility and Acceptability of Using the Consultation Letter Rating Scale to Assess Written Communication Competencies Among Geriatric Medicine Postgraduate Trainees Victoria YY

More information

Addressing error in laboratory biomarker studies

Addressing error in laboratory biomarker studies Addressing error in laboratory biomarker studies Elizabeth Selvin, PhD, MPH Associate Professor of Epidemiology and Medicine Co-Director, Biomarkers and Diagnostic Testing Translational Research Community

More information

Running head: ATTRIBUTE CODING FOR RETROFITTING MODELS. Comparison of Attribute Coding Procedures for Retrofitting Cognitive Diagnostic Models

Running head: ATTRIBUTE CODING FOR RETROFITTING MODELS. Comparison of Attribute Coding Procedures for Retrofitting Cognitive Diagnostic Models Running head: ATTRIBUTE CODING FOR RETROFITTING MODELS Comparison of Attribute Coding Procedures for Retrofitting Cognitive Diagnostic Models Amy Clark Neal Kingston University of Kansas Corresponding

More information

The Patient and Observer Scar Assessment Scale: A Reliable and Feasible Tool for Scar Evaluation

The Patient and Observer Scar Assessment Scale: A Reliable and Feasible Tool for Scar Evaluation The Patient and Observer Scar Assessment Scale: A Reliable and Feasible Tool for Scar Evaluation Lieneke J. Draaijers, M.D., Fenike R. H. Tempelman, M.D., Yvonne A. M. Botman, M.D., Wim E. Tuinebreijer,

More information

HOW STATISTICS IMPACT PHARMACY PRACTICE?

HOW STATISTICS IMPACT PHARMACY PRACTICE? HOW STATISTICS IMPACT PHARMACY PRACTICE? CPPD at NCCR 13 th June, 2013 Mohamed Izham M.I., PhD Professor in Social & Administrative Pharmacy Learning objective.. At the end of the presentation pharmacists

More information

Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz

Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz This study presents the steps Edgenuity uses to evaluate the reliability and validity of its quizzes, topic tests, and cumulative

More information

Body Weight Behavior

Body Weight Behavior Body Weight Behavior U122015 Scale/Subscale Name: Body Weight Behavior Source: Youth Risk Behavior Survey 2009 Middle School Version Developers: Centers for Disease Control and Prevention (CDC) Year: First

More information

N Utilization of Nursing Research in Advanced Practice, Summer 2008

N Utilization of Nursing Research in Advanced Practice, Summer 2008 University of Michigan Deep Blue deepblue.lib.umich.edu 2008-07 N 536 - Utilization of Nursing Research in Advanced Practice, Summer 2008 Tzeng, Huey-Ming Tzeng, H. (2008, October 1). Utilization of Nursing

More information

Worksheet for Structured Review of Physical Exam or Diagnostic Test Study

Worksheet for Structured Review of Physical Exam or Diagnostic Test Study Worksheet for Structured Review of Physical Exam or Diagnostic Study Title of Manuscript: Authors of Manuscript: Journal and Citation: Identify and State the Hypothesis Primary Hypothesis: Secondary Hypothesis:

More information

Understanding CELF-5 Reliability & Validity to Improve Diagnostic Decisions

Understanding CELF-5 Reliability & Validity to Improve Diagnostic Decisions Understanding CELF-5 Reliability & Validity to Improve Diagnostic Decisions Senior Educational Consultant Pearson Disclosures Dr. Scheller is an employee of Pearson, publisher of the CELF-5. No other language

More information

Overview of Non-Parametric Statistics

Overview of Non-Parametric Statistics Overview of Non-Parametric Statistics LISA Short Course Series Mark Seiss, Dept. of Statistics April 7, 2009 Presentation Outline 1. Homework 2. Review of Parametric Statistics 3. Overview Non-Parametric

More information

Using Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items

Using Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items University of Wisconsin Milwaukee UWM Digital Commons Theses and Dissertations May 215 Using Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items Tamara Beth

More information

Victoria YY Xu PGY-2 Internal Medicine University of Toronto. Supervisor: Dr. Camilla Wong

Victoria YY Xu PGY-2 Internal Medicine University of Toronto. Supervisor: Dr. Camilla Wong Validity, Reliability, Feasibility, and Acceptability of Using the Consultation Letter Rating Scale to Assess Written Communication Competencies Among Geriatric Medicine Postgraduate Trainees Victoria

More information

Visual assessment of breast density using Visual Analogue Scales: observer variability, reader attributes and reading time

Visual assessment of breast density using Visual Analogue Scales: observer variability, reader attributes and reading time Visual assessment of breast density using Visual Analogue Scales: observer variability, reader attributes and reading time Teri Ang a, Elaine F Harkness b,c, Anthony J Maxwell b,c,d, Yit Y Lim b,c, Richard

More information

Meta-analysis of diagnostic research. Karen R Steingart, MD, MPH Chennai, 15 December Overview

Meta-analysis of diagnostic research. Karen R Steingart, MD, MPH Chennai, 15 December Overview Meta-analysis of diagnostic research Karen R Steingart, MD, MPH karenst@uw.edu Chennai, 15 December 2010 Overview Describe key steps in a systematic review/ meta-analysis of diagnostic test accuracy studies

More information

CHAPTER VI RESEARCH METHODOLOGY

CHAPTER VI RESEARCH METHODOLOGY CHAPTER VI RESEARCH METHODOLOGY 6.1 Research Design Research is an organized, systematic, data based, critical, objective, scientific inquiry or investigation into a specific problem, undertaken with the

More information

loudspeaker. When this occurs, respondents agree, on average, only half the time (as one would expect). Recalculate percentage agreement and for the s

loudspeaker. When this occurs, respondents agree, on average, only half the time (as one would expect). Recalculate percentage agreement and for the s ANNALS OF EMERGENCY MEDICINE JOURNAL CLUB A Consideration of the Measurement and Reporting of Interrater Reliability Answers to the July 2009 Journal Club Questions Frank C. Day, MD, MPH David L. Schriger,

More information

Performance Characteristics of the Daktari CD4 System

Performance Characteristics of the Daktari CD4 System Daktari Diagnostics, Inc. 85 Bolton Street Cambridge, MA 02140 USA Performance Characteristics of the Daktari CD4 System 2 February 2015 Scope The Daktari CD4 system has been evaluated through studies

More information

appstats26.notebook April 17, 2015

appstats26.notebook April 17, 2015 Chapter 26 Comparing Counts Objective: Students will interpret chi square as a test of goodness of fit, homogeneity, and independence. Goodness of Fit A test of whether the distribution of counts in one

More information

PTHP 7101 Research 1 Chapter Assignments

PTHP 7101 Research 1 Chapter Assignments PTHP 7101 Research 1 Chapter Assignments INSTRUCTIONS: Go over the questions/pointers pertaining to the chapters and turn in a hard copy of your answers at the beginning of class (on the day that it is

More information

Relationship, Correlation, & Causation DR. MIKE MARRAPODI

Relationship, Correlation, & Causation DR. MIKE MARRAPODI Relationship, Correlation, & Causation DR. MIKE MARRAPODI Topics Relationship Correlation Causation Relationship Definition The way in which two or more people or things are connected, or the state of

More information

Chest x-ray clues to osteoporosis : criteria, correlations, and consistency

Chest x-ray clues to osteoporosis : criteria, correlations, and consistency Yale University EliScholar A Digital Platform for Scholarly Publishing at Yale Yale Medicine Thesis Digital Library School of Medicine 2009 Chest x-ray clues to osteoporosis : criteria, correlations, and

More information

VU Biostatistics and Experimental Design PLA.216

VU Biostatistics and Experimental Design PLA.216 VU Biostatistics and Experimental Design PLA.216 Julia Feichtinger Postdoctoral Researcher Institute of Computational Biotechnology Graz University of Technology Outline for Today About this course Background

More information