4 Diagnostic Tests and Measures of Agreement


Diagnostic tests may be used for diagnosis of disease or for screening purposes. Some tests are more effective than others, so we need to be able to measure how useful a test is in a given set of circumstances. In practice, of course, we rarely know the true state of the individual and hence we evaluate the test in comparison with some other, more accurate classification. To simplify terminology here, we shall assume that the reference procedure (often called the "gold standard") indicates the true status of the subject.

4.1 Sensitivity and specificity

To measure the effectiveness of a test, we need to consider two measures:

    sensitivity (Se): the probability that if the disease is present the test is positive
    specificity (Sp): the probability that if the disease is absent the test is negative

Sensitivity is a measure of how good a test is at correctly identifying those who have the condition. If the test is not sensitive to the condition of interest then we would observe many false negatives. Specificity is a measure of how good a test is at correctly identifying those who do not have the condition. If the test is not specific to the condition of interest then we would observe many false positives. Sometimes the false negative (1 - Se) and false positive (1 - Sp) rates are given instead.

Setting aside wider issues, we can look at a simple measure of the efficiency of a screening test by comparing the prevalence in the whole population with the prevalence in the screen positive group. If the costs of the gold standard are high it may not be economically viable to apply the gold standard to the entire population, but it might be cost effective to apply it to a screen positive group.

4.1.1 Confidence Intervals

Sensitivity and specificity are both estimates, so we can find confidence intervals for them. They are both Binomial proportions; however, since they are often close to 1, the normal approximation may not be appropriate. Instead we should use exact Binomial methods.

4.2 Tests on a continuous scale

When a test result is expressed on a continuous scale, as in most haematological and biochemical tests, it is often convenient to think in terms of a cut-off point (or frequently both upper and lower cut-off points) beyond which the result will be regarded as being abnormal. This simplifies the test to a binary (positive/negative) result.

We need to define a critical value C as the cut-off point beyond which an individual would be referred for further investigation. Clearly the position of C is crucial. For example, suppose the cut-off point is such that most healthy individuals have values less than C and most diseased individuals have values greater than C; diseased individuals with values less than C are false negatives, missed by the test. Reducing C moves it closer to the mean of the healthy individuals and will reduce the number of false negatives. This improves the sensitivity of the test but at the expense of its specificity (and hence increases the number of false positives). The converse happens if the cut-off point is moved the other way.

Table 4: Radiation of pain and diagnosis of gallstones

                                Gallstones   Not gallstones   Total
    Pain radiates to shoulder
    Pain radiates to other site
    Pain does not radiate
    Total

A receiver operating characteristic (ROC) curve can be useful to examine the trade-off between sensitivity and specificity. We choose a number of different cut-off points, calculate the sensitivity and specificity for each cut-off point and then plot sensitivity against the false positive rate (1 - Sp). Tests with ROC curves which go furthest into the top left corner are usually best. The area under the ROC curve estimates the probability that a member of one population chosen at random will have a value greater than a member of the other population (similar to the Mann-Whitney U test). It can be useful in comparing different tests.

4.3 Positive and negative predictive value

Sensitivity and specificity give only part of the picture. In evaluating a test that might be used for screening purposes, we need a measure of the predictive power of either a positive or a negative test result. The predictive power of a positive test result, the positive predictive value (PPV), is the proportion of those with positive test results who turn out eventually to have the condition. The predictive value of a negative test result, the negative predictive value (NPV), is the proportion of those with negative test results who eventually turn out not to have the condition.

$$\mathrm{PPV} = \frac{Se \cdot p}{Se \cdot p + (1 - Sp)(1 - p)}$$

$$\mathrm{NPV} = \frac{Sp\,(1 - p)}{Sp\,(1 - p) + (1 - Se)\,p}$$

where p is the prevalence.

4.3.1 Positive and negative evidence

When deciding between alternative diagnoses, different items of information contribute more or less weight of evidence for or against particular diagnoses. The presence of guarding in a patient with acute abdominal pain, for example, carries considerable weight in favour of the diagnosis of acute appendicitis.
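Returning briefly to the predictive value formulas of section 4.3, the following is a minimal Python sketch of how Se, Sp, PPV and NPV might be computed. The function names and the counts are hypothetical, chosen only for illustration; they do not come from the notes or from Table 4.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Estimate Se and Sp from the four cells of a test-versus-gold-standard table."""
    se = tp / (tp + fn)  # proportion of diseased subjects who test positive
    sp = tn / (tn + fp)  # proportion of disease-free subjects who test negative
    return se, sp


def predictive_values(se, sp, p):
    """Return (PPV, NPV) for sensitivity se, specificity sp and prevalence p."""
    ppv = se * p / (se * p + (1 - sp) * (1 - p))
    npv = sp * (1 - p) / (sp * (1 - p) + (1 - se) * p)
    return ppv, npv


# Hypothetical counts, for illustration only.
se, sp = sensitivity_specificity(tp=90, fn=10, fp=50, tn=850)

# The same test applied at two different prevalences.
for p in (0.10, 0.01):
    ppv, npv = predictive_values(se, sp, p)
    print(f"prevalence={p:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With Se and Sp held fixed, the PPV falls as the prevalence falls, which is why a test that performs well diagnostically can still yield mostly false positives when used for screening in a low-prevalence population.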
The items of information that help to exclude a diagnosis may, however, be different from those that help to establish it. These considerations may be important when deciding which of several possible subsequent investigations are likely to be helpful.

4.4 Comparing Two Methods

There is a general class of problems relating to how one device which measures some continuous variable compares with a second device. The particular problem which occurs most frequently in medicine, and will be discussed here, is whether a (usually cheaper) device can satisfactorily substitute for a device which measures with no appreciable error; this is the method comparison problem. An apparently only slightly different problem is whether one method which measures with error can be substituted for another method which also measures with error. This has been dubbed the method conversion problem; it is quite different from the method comparison problem and will not be discussed here.

A common mistake is to calculate Pearson's correlation coefficient on the data, get a result which is very highly statistically significant and hence declare good agreement. However, this is not suitable because the null hypothesis, that the two measurement scales are unrelated, is not plausible; so showing that the results were unlikely to occur by chance under the null hypothesis of no agreement is not useful. Rather we need a method which shows how much the results deviate from total agreement.

- Plot the data as a scatterplot and add the line of equality (y = x). This gives a quick visual check of how closely the two methods agree.
- Perform a paired sample t-test on the data against the null hypothesis of no difference in the pairs of results. The mean difference is an estimate of the bias; a confidence interval will quantify the extent of the plausible bias, while the p value indicates the strength of evidence that a true difference exists.
- Plot the difference between the methods against the average. This gives an estimate of the size of the bias against the true value. A 95% range, based on the mean and standard deviation of the difference (mean ± 1.96 SD, assuming normality), is often added to the plot; these lines are sometimes called limits of agreement. Pearson's correlation coefficient can be calculated on these data to test the null hypothesis that the difference and mean are unrelated; that is, that the size of the bias is unrelated to the true value.
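The numerical part of the steps above can be sketched in Python as follows. This is only an illustrative sketch using the standard library: the function name, the returned keys and the sample readings are assumptions made for the example, and the plotting itself is left to whatever tool is preferred.

```python
import math
import statistics


def limits_of_agreement(new, reference):
    """Summarise paired readings from two methods.

    Returns the bias (mean difference), the SD of the differences, the
    paired t statistic, the 95% limits of agreement (bias +/- 1.96 SD),
    and the (average, difference) pairs for the difference-vs-average plot.
    Assumes at least two pairs and that the differences are not all equal.
    """
    diffs = [a - b for a, b in zip(new, reference)]
    averages = [(a + b) / 2 for a, b in zip(new, reference)]
    n = len(diffs)
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)          # sample SD (n - 1 denominator)
    t_stat = bias / (sd / math.sqrt(n))   # compare with t on n - 1 d.f.
    return {
        "bias": bias,
        "sd": sd,
        "t": t_stat,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
        "points": list(zip(averages, diffs)),
    }


# Hypothetical paired readings, for illustration only.
result = limits_of_agreement(new=[11, 12, 15, 18], reference=[10, 12, 14, 16])
print(result["bias"], result["lower"], result["upper"])
```

The differences here are taken as new minus reference, so a positive bias means the new method reads high; the opposite convention works equally well provided it is stated.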
4.5 Measures of Agreement

Suppose two observers are asked to rate the same subjects for the presence or absence of a disease. Cohen's kappa coefficient can be used to assess the agreement between the two raters.

                       Rater 2
    Rater 1     Present   Absent   Total
    Present     n_11      n_10     n_1+
    Absent      n_01      n_00     n_0+
    Total       n_+1      n_+0     n_++

Define I_o as the observed proportion of agreement and I_e as the proportion of expected agreement due to chance:

$$I_o = \frac{n_{11} + n_{00}}{n_{++}}, \qquad I_e = \frac{n_{+1}\,n_{1+} + n_{+0}\,n_{0+}}{n_{++}^{2}}$$

Then kappa, κ, is the excess agreement expressed as a fraction of the maximum possible excess:

$$\kappa = \frac{I_o - I_e}{1 - I_e}$$
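As a rough sketch of the definition above, kappa for a 2 x 2 table might be computed in Python as follows. The function name and the example counts are hypothetical; the argument names follow the cell labels of the table.

```python
def cohen_kappa(n11, n10, n01, n00):
    """Cohen's kappa for two raters and a present/absent rating.

    n11: both raters say present    n10: Rater 1 present, Rater 2 absent
    n01: Rater 1 absent, Rater 2 present    n00: both raters say absent
    """
    n = n11 + n10 + n01 + n00
    row_present, row_absent = n11 + n10, n01 + n00   # Rater 1 margins
    col_present, col_absent = n11 + n01, n10 + n00   # Rater 2 margins

    i_o = (n11 + n00) / n                                            # observed agreement
    i_e = (col_present * row_present + col_absent * row_absent) / n ** 2  # chance agreement
    if i_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (i_o - i_e) / (1 - i_e)


# Hypothetical counts, for illustration only.
print(cohen_kappa(n11=20, n10=5, n01=10, n00=15))
```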

If there is complete agreement, κ = 1; if observed agreement is equal to chance, κ = 0; if observed agreement is greater than chance, κ > 0.

An important assumption underlying the use of the kappa coefficient is that errors associated with the two sets of ratings are independent. This requires the subjects to be independent and Rater 1's ratings to be independent of Rater 2's. The kappa coefficient, therefore, is not appropriate for a situation in which one observer is required to either confirm or disconfirm a known previous rating from another observer.

When margin totals are not the same we may use $I_{\max} - I_e$ as the denominator, where $I_{\max}$ is the maximum possible agreement, keeping the margins fixed. Another alternative uses weighted observations (weighted kappa), attaching greater emphasis to large differences between ratings than to small differences.

4.6 Measurement Scales

4.6.1 Validity

A valid scale measures what it intends to measure. Validity can be judged in several ways: the scale should look as if it makes sense (face validity); all the items should be relevant and all aspects of the concept being measured should be included (content validity); the scale should be able to predict outcome (predictive validity); the scale should produce similar results to an established scale measuring the same concept, and dissimilar results to a scale measuring a different concept (convergent and divergent validity); finally, a scale should be able to distinguish groups of patients who, a priori, are deemed to be different (discriminant validity).

4.6.2 Sensitivity and specificity

When a scale is used to categorise people, it should be capable of categorising them accurately. For example, it would be most useful to detect patients with previously unrecognised problems or those with problems that are amenable to intervention. When screening, sensitivity may be more important than specificity; opportunities for clarifying the status of false positive patients will arise but the false negative patient is lost to further scrutiny.

4.6.3 Reliability

A reliable scale produces results which can be replicated with different observers (inter-observer reliability), when repeated (test-retest reliability), when using different sources of information and when administered by different means. Simple correlation between repeat tests is not adequate for the assessment of reliability; it is more appropriate to analyse the differences between scores to see if they are larger than might be expected by chance.

4.6.4 Responsiveness to change

A scale should be capable of detecting change due to interventions or over time at all levels of the scale. Floor and ceiling effects present particular difficulties; a scale may not be able to detect meaningful differences between subjects who score at the bottom or at the top of the scale.

4.6.5 Format and language

A scale should be well-designed, and in an appropriate format and language for the subjects and users of the scale, who may have differing knowledge and skills.

Question Sheet. Prospective Validation of the Pediatric Appendicitis Score in a Canadian Pediatric Emergency Department

Question Sheet. Prospective Validation of the Pediatric Appendicitis Score in a Canadian Pediatric Emergency Department Question Sheet Prospective Validation of the Pediatric Appendicitis Score in a Canadian Pediatric Emergency Department Bhatt M, Joseph L, Ducharme FM et al. Acad Emerg Med 2009;16(7):591-596 1. Provide

More information

Reliability and Validity checks S-005

Reliability and Validity checks S-005 Reliability and Validity checks S-005 Checking on reliability of the data we collect Compare over time (test-retest) Item analysis Internal consistency Inter-rater agreement Compare over time Test-Retest

More information

(true) Disease Condition Test + Total + a. a + b True Positive False Positive c. c + d False Negative True Negative Total a + c b + d a + b + c + d

(true) Disease Condition Test + Total + a. a + b True Positive False Positive c. c + d False Negative True Negative Total a + c b + d a + b + c + d Biostatistics and Research Design in Dentistry Reading Assignment Measuring the accuracy of diagnostic procedures and Using sensitivity and specificity to revise probabilities, in Chapter 12 of Dawson

More information

EPIDEMIOLOGY. Training module

EPIDEMIOLOGY. Training module 1. Scope of Epidemiology Definitions Clinical epidemiology Epidemiology research methods Difficulties in studying epidemiology of Pain 2. Measures used in Epidemiology Disease frequency Disease risk Disease

More information

Screening (Diagnostic Tests) Shaker Salarilak

Screening (Diagnostic Tests) Shaker Salarilak Screening (Diagnostic Tests) Shaker Salarilak Outline Screening basics Evaluation of screening programs Where we are? Definition of screening? Whether it is always beneficial? Types of bias in screening?

More information

Binary Diagnostic Tests Paired Samples

Binary Diagnostic Tests Paired Samples Chapter 536 Binary Diagnostic Tests Paired Samples Introduction An important task in diagnostic medicine is to measure the accuracy of two diagnostic tests. This can be done by comparing summary measures

More information

Diagnostic tests, Laboratory tests

Diagnostic tests, Laboratory tests Diagnostic tests, Laboratory tests I. Introduction II. III. IV. Informational values of a test Consequences of the prevalence rate Sequential use of 2 tests V. Selection of a threshold: the ROC curve VI.

More information

Importance of Good Measurement

Importance of Good Measurement Importance of Good Measurement Technical Adequacy of Assessments: Validity and Reliability Dr. K. A. Korb University of Jos The conclusions in a study are only as good as the data that is collected. The

More information

Research Questions, Variables, and Hypotheses: Part 2. Review. Hypotheses RCS /7/04. What are research questions? What are variables?

Research Questions, Variables, and Hypotheses: Part 2. Review. Hypotheses RCS /7/04. What are research questions? What are variables? Research Questions, Variables, and Hypotheses: Part 2 RCS 6740 6/7/04 1 Review What are research questions? What are variables? Definition Function Measurement Scale 2 Hypotheses OK, now that we know how

More information

Statistics, Probability and Diagnostic Medicine

Statistics, Probability and Diagnostic Medicine Statistics, Probability and Diagnostic Medicine Jennifer Le-Rademacher, PhD Sponsored by the Clinical and Translational Science Institute (CTSI) and the Department of Population Health / Division of Biostatistics

More information

Validity of measurement instruments used in PT research

Validity of measurement instruments used in PT research Validity of measurement instruments used in PT research Mohammed TA, Omar Ph.D. PT, PGDCR-CLT Rehabilitation Health Science Department Momarar@ksu.edu.sa A Word on Embedded Assessment Discusses ways of

More information

Chapter 10. Screening for Disease

Chapter 10. Screening for Disease Chapter 10 Screening for Disease 1 Terminology Reliability agreement of ratings/diagnoses, reproducibility Inter-rater reliability agreement between two independent raters Intra-rater reliability agreement

More information

Georgina Salas. Topics EDCI Intro to Research Dr. A.J. Herrera

Georgina Salas. Topics EDCI Intro to Research Dr. A.J. Herrera Homework assignment topics 51-63 Georgina Salas Topics 51-63 EDCI Intro to Research 6300.62 Dr. A.J. Herrera Topic 51 1. Which average is usually reported when the standard deviation is reported? The mean

More information

Glossary of Practical Epidemiology Concepts

Glossary of Practical Epidemiology Concepts Glossary of Practical Epidemiology Concepts - 2009 Adapted from the McMaster EBCP Workshop 2003, McMaster University, Hamilton, Ont. Note that open access to the much of the materials used in the Epi-546

More information

Questionnaire design. Questionnaire Design: Content. Questionnaire Design. Questionnaire Design: Wording. Questionnaire Design: Wording OUTLINE

Questionnaire design. Questionnaire Design: Content. Questionnaire Design. Questionnaire Design: Wording. Questionnaire Design: Wording OUTLINE Questionnaire design OUTLINE Questionnaire design tests Reliability Validity POINTS TO CONSIDER Identify your research objectives. Identify your population or study sample Decide how to collect the information

More information

Binary Diagnostic Tests Two Independent Samples

Binary Diagnostic Tests Two Independent Samples Chapter 537 Binary Diagnostic Tests Two Independent Samples Introduction An important task in diagnostic medicine is to measure the accuracy of two diagnostic tests. This can be done by comparing summary

More information

AP STATISTICS 2008 SCORING GUIDELINES (Form B)

AP STATISTICS 2008 SCORING GUIDELINES (Form B) AP STATISTICS 2008 SCORING GUIDELINES (Form B) Question 4 Intent of Question The primary goals of this question were to assess a student s ability to (1) design an experiment to compare two treatments

More information

STATISTICAL METHODS FOR DIAGNOSTIC TESTING: AN ILLUSTRATION USING A NEW METHOD FOR CANCER DETECTION XIN SUN. PhD, Kansas State University, 2012

STATISTICAL METHODS FOR DIAGNOSTIC TESTING: AN ILLUSTRATION USING A NEW METHOD FOR CANCER DETECTION XIN SUN. PhD, Kansas State University, 2012 STATISTICAL METHODS FOR DIAGNOSTIC TESTING: AN ILLUSTRATION USING A NEW METHOD FOR CANCER DETECTION by XIN SUN PhD, Kansas State University, 2012 A THESIS Submitted in partial fulfillment of the requirements

More information

Edinburgh Imaging Academy online distance learning courses. Statistics

Edinburgh Imaging Academy online distance learning courses. Statistics Statistics Semester 1 / Autumn 10 Credits Each Course is composed of Modules & Activities. Modules: Introduction to Statistics IMSc NI4R How to Read a Paper IMSc NI4R Assessing the Accuracy of Diagnostic

More information

Figure 1: Design and outcomes of an independent blind study with gold/reference standard comparison. Adapted from DCEB (1981b)

Figure 1: Design and outcomes of an independent blind study with gold/reference standard comparison. Adapted from DCEB (1981b) Page 1 of 1 Diagnostic test investigated indicates the patient has the Diagnostic test investigated indicates the patient does not have the Gold/reference standard indicates the patient has the True positive

More information

Empirical Knowledge: based on observations. Answer questions why, whom, how, and when.

Empirical Knowledge: based on observations. Answer questions why, whom, how, and when. INTRO TO RESEARCH METHODS: Empirical Knowledge: based on observations. Answer questions why, whom, how, and when. Experimental research: treatments are given for the purpose of research. Experimental group

More information

11-3. Learning Objectives

11-3. Learning Objectives 11-1 Measurement Learning Objectives 11-3 Understand... The distinction between measuring objects, properties, and indicants of properties. The similarities and differences between the four scale types

More information

An update on the analysis of agreement for orthodontic indices

An update on the analysis of agreement for orthodontic indices European Journal of Orthodontics 27 (2005) 286 291 doi:10.1093/ejo/cjh078 The Author 2005. Published by Oxford University Press on behalf of the European Orthodontics Society. All rights reserved. For

More information

Lecture Week 3 Quality of Measurement Instruments; Introduction SPSS

Lecture Week 3 Quality of Measurement Instruments; Introduction SPSS Lecture Week 3 Quality of Measurement Instruments; Introduction SPSS Introduction to Research Methods & Statistics 2013 2014 Hemmo Smit Overview Quality of Measurement Instruments Introduction SPSS Read:

More information

University of Wollongong. Research Online. Australian Health Services Research Institute

University of Wollongong. Research Online. Australian Health Services Research Institute University of Wollongong Research Online Australian Health Services Research Institute Faculty of Business 2011 Measurement of error Janet E. Sansoni University of Wollongong, jans@uow.edu.au Publication

More information

Item Analysis: Classical and Beyond

Item Analysis: Classical and Beyond Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Modified for EPE/EDP 711 by Kelly Bradley on January 8, 2013 Why is item analysis relevant? Item analysis provides

More information

Understandable Statistics

Understandable Statistics Understandable Statistics correlated to the Advanced Placement Program Course Description for Statistics Prepared for Alabama CC2 6/2003 2003 Understandable Statistics 2003 correlated to the Advanced Placement

More information

ADMS Sampling Technique and Survey Studies

ADMS Sampling Technique and Survey Studies Principles of Measurement Measurement As a way of understanding, evaluating, and differentiating characteristics Provides a mechanism to achieve precision in this understanding, the extent or quality As

More information

Data that can be classified as belonging to a distinct number of categories >>result in categorical responses. And this includes:

Data that can be classified as belonging to a distinct number of categories >>result in categorical responses. And this includes: This sheets starts from slide #83 to the end ofslide #4. If u read this sheet you don`t have to return back to the slides at all, they are included here. Categorical Data (Qualitative data): Data that

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

DATA is derived either through. Self-Report Observation Measurement

DATA is derived either through. Self-Report Observation Measurement Data Management DATA is derived either through Self-Report Observation Measurement QUESTION ANSWER DATA DATA may be from Structured or Unstructured questions? Quantitative or Qualitative? Numerical or

More information

CRITICAL APPRAISAL AP DR JEMAIMA CHE HAMZAH MD (UKM) MS (OFTAL) UKM PHD (UK) DEPARTMENT OF OPHTHALMOLOGY UKM MEDICAL CENTRE

CRITICAL APPRAISAL AP DR JEMAIMA CHE HAMZAH MD (UKM) MS (OFTAL) UKM PHD (UK) DEPARTMENT OF OPHTHALMOLOGY UKM MEDICAL CENTRE CRITICAL APPRAISAL AP DR JEMAIMA CHE HAMZAH MD (UKM) MS (OFTAL) UKM PHD (UK) DEPARTMENT OF OPHTHALMOLOGY UKM MEDICAL CENTRE MINGGU PENYELIDIKAN PERUBATAN & KESIHATAN PPUKM Lecture content Introduction

More information

Enumerative and Analytic Studies. Description versus prediction

Enumerative and Analytic Studies. Description versus prediction Quality Digest, July 9, 2018 Manuscript 334 Description versus prediction The ultimate purpose for collecting data is to take action. In some cases the action taken will depend upon a description of what

More information

Validity and responsiveness of the Core Outcome Measures Index (COMI) for the neck

Validity and responsiveness of the Core Outcome Measures Index (COMI) for the neck Validity and responsiveness of the Core Outcome Measures Index (COMI) for the neck C. D. Fankhauser 1 U. Mutter 1 E. Aghayev 2 A. F. Mannion 1 1, Schulthess Klinik, Zürich, Switzerland 2 Institute for

More information

SEED HAEMATOLOGY. Medical statistics your support when interpreting results SYSMEX EDUCATIONAL ENHANCEMENT AND DEVELOPMENT APRIL 2015

SEED HAEMATOLOGY. Medical statistics your support when interpreting results SYSMEX EDUCATIONAL ENHANCEMENT AND DEVELOPMENT APRIL 2015 SYSMEX EDUCATIONAL ENHANCEMENT AND DEVELOPMENT APRIL 2015 SEED HAEMATOLOGY Medical statistics your support when interpreting results The importance of statistical investigations Modern medicine is often

More information

Investigating the Reliability of Classroom Observation Protocols: The Case of PLATO. M. Ken Cor Stanford University School of Education.

Investigating the Reliability of Classroom Observation Protocols: The Case of PLATO. M. Ken Cor Stanford University School of Education. The Reliability of PLATO Running Head: THE RELIABILTY OF PLATO Investigating the Reliability of Classroom Observation Protocols: The Case of PLATO M. Ken Cor Stanford University School of Education April,

More information

SYMSYS 130: Research Methods in the Cognitive and Information Sciences (Spring 2013)

SYMSYS 130: Research Methods in the Cognitive and Information Sciences (Spring 2013) SYMSYS 130: Research Methods in the Cognitive and Information Sciences (Spring 2013) Take Home Final June 20, 2013 Instructor' s Responses Please respond to the following questions with short essays (300-500

More information

Measures. David Black, Ph.D. Pediatric and Developmental. Introduction to the Principles and Practice of Clinical Research

Measures. David Black, Ph.D. Pediatric and Developmental. Introduction to the Principles and Practice of Clinical Research Introduction to the Principles and Practice of Clinical Research Measures David Black, Ph.D. Pediatric and Developmental Neuroscience, NIMH With thanks to Audrey Thurm Daniel Pine With thanks to Audrey

More information

Chapter 23. Inference About Means. Copyright 2010 Pearson Education, Inc.

Chapter 23. Inference About Means. Copyright 2010 Pearson Education, Inc. Chapter 23 Inference About Means Copyright 2010 Pearson Education, Inc. Getting Started Now that we know how to create confidence intervals and test hypotheses about proportions, it d be nice to be able

More information

Psychology, 2010, 1: doi: /psych Published Online August 2010 (

Psychology, 2010, 1: doi: /psych Published Online August 2010 ( Psychology, 2010, 1: 194-198 doi:10.4236/psych.2010.13026 Published Online August 2010 (http://www.scirp.org/journal/psych) Using Generalizability Theory to Evaluate the Applicability of a Serial Bayes

More information

Reviewing IPA studies: Developing and evaluating a new tool

Reviewing IPA studies: Developing and evaluating a new tool Reviewing IPA studies: Developing and evaluating a new tool Dr Sherrill Snelgrove¹, Dr Annmarie Nelson ²,Dr Stephanie Sivell², Dr Mala Mann², Dr Bridie Evans ¹ ¹ Swansea University ² Cardiff University

More information

10 Intraclass Correlations under the Mixed Factorial Design

10 Intraclass Correlations under the Mixed Factorial Design CHAPTER 1 Intraclass Correlations under the Mixed Factorial Design OBJECTIVE This chapter aims at presenting methods for analyzing intraclass correlation coefficients for reliability studies based on a

More information

Introduction. We can make a prediction about Y i based on X i by setting a threshold value T, and predicting Y i = 1 when X i > T.

Introduction. We can make a prediction about Y i based on X i by setting a threshold value T, and predicting Y i = 1 when X i > T. Diagnostic Tests 1 Introduction Suppose we have a quantitative measurement X i on experimental or observed units i = 1,..., n, and a characteristic Y i = 0 or Y i = 1 (e.g. case/control status). The measurement

More information

Statistical Tools in Biology

Statistical Tools in Biology Statistical Tools in Biology Research Methodology Design protocol/procedure. (2 types) Cross sectional study comparing two different grps. e.g, comparing LDL levels between athletes and couch potatoes.

More information

Objectives. Quantifying the quality of hypothesis tests. Type I and II errors. Power of a test. Cautions about significance tests

Objectives. Quantifying the quality of hypothesis tests. Type I and II errors. Power of a test. Cautions about significance tests Objectives Quantifying the quality of hypothesis tests Type I and II errors Power of a test Cautions about significance tests Designing Experiments based on power Evaluating a testing procedure The testing

More information

7/17/2013. Evaluation of Diagnostic Tests July 22, 2013 Introduction to Clinical Research: A Two week Intensive Course

7/17/2013. Evaluation of Diagnostic Tests July 22, 2013 Introduction to Clinical Research: A Two week Intensive Course Evaluation of Diagnostic Tests July 22, 2013 Introduction to Clinical Research: A Two week Intensive Course David W. Dowdy, MD, PhD Department of Epidemiology Johns Hopkins Bloomberg School of Public Health

More information

PRACTICAL STATISTICS FOR MEDICAL RESEARCH

PRACTICAL STATISTICS FOR MEDICAL RESEARCH PRACTICAL STATISTICS FOR MEDICAL RESEARCH Douglas G. Altman Head of Medical Statistical Laboratory Imperial Cancer Research Fund London CHAPMAN & HALL/CRC Boca Raton London New York Washington, D.C. Contents

More information

Theory. = an explanation using an integrated set of principles that organizes observations and predicts behaviors or events.

Theory. = an explanation using an integrated set of principles that organizes observations and predicts behaviors or events. Definition Slides Hindsight Bias = the tendency to believe, after learning an outcome, that one would have foreseen it. Also known as the I knew it all along phenomenon. Critical Thinking = thinking that

More information

alternate-form reliability The degree to which two or more versions of the same test correlate with one another. In clinical studies in which a given function is going to be tested more than once over

More information

A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) *

A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) * A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) * by J. RICHARD LANDIS** and GARY G. KOCH** 4 Methods proposed for nominal and ordinal data Many

More information

PTHP 7101 Research 1 Chapter Assignments
