

Biostatistics and Research Design in Dentistry

Reading Assignment
"Measuring the accuracy of diagnostic procedures" and "Using sensitivity and specificity to revise probabilities," in Chapter 12 of Dawson & Trapp, Basic & Clinical Biostatistics.

Objectives of this chapter
The objective is to understand the diagnostic uses of a 2x2 table, and how to interpret terms associated with the table.

| | (true) Disease + | (true) Disease - | Total |
|---|---|---|---|
| Test + | a (True Positive) | b (False Positive) | a + b |
| Test - | c (False Negative) | d (True Negative) | c + d |
| Total | a + c | b + d | a + b + c + d |

Prevalence
Proportion of the population affected with the disease. = (a + c) / (a + b + c + d)

Sensitivity
Proportion of subjects with the disease who test positive (true positives). = a / (a + c)
A sensitive test is a good screening test because it identifies most of the people who have the disease, and perhaps a few who do not.

Specificity
Proportion of subjects without the disease who test negative (true negatives). = d / (b + d)
A specific test is a good diagnostic test because it returns a negative result for most of the people who do not have the disease, and maybe for a few who do, so a positive result is rarely a false alarm.

False Positive
Proportion of subjects without the disease who test positive. = b / (b + d) = 1 - specificity

False Negative
Proportion of subjects with the disease who test negative. = c / (a + c) = 1 - sensitivity

Positive Predictive Value (PPV)
Proportion of subjects with a positive test result who have the disease.
= (prevalence)(sensitivity) / ((prevalence)(sensitivity) + (1 - prevalence)(1 - specificity))
If the table reflects prevalence, then PPV = a / (a + b).

Negative Predictive Value (NPV)
Proportion of subjects with a negative test result who do not have the disease.
= (1 - prevalence)(specificity) / ((1 - prevalence)(specificity) + (prevalence)(1 - sensitivity))
If the table reflects prevalence, then NPV = d / (c + d).

Accuracy
Proportion of correct results, or the probability the test will detect true findings.
= (a + d) / (a + b + c + d) = (prevalence)(sensitivity) + (1 - prevalence)(specificity)

Likelihood Ratio For A Positive Test
How much more likely a positive result is in a subject who has the disease than in a subject who does not.
= (a / (a + c)) / (b / (b + d)) = sensitivity / (1 - specificity)

Likelihood Ratio For A Negative Test
How much more likely a negative result is in a subject who has the disease than in a subject who does not.
= (c / (a + c)) / (d / (b + d)) = (1 - sensitivity) / specificity
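The definitions above translate directly into a few lines of arithmetic. The following is a minimal sketch, not part of the original handout: the function name `diagnostic_summary` and the example counts are hypothetical and chosen only for illustration, and the PPV and NPV lines assume the table reflects the true prevalence.

```python
def diagnostic_summary(a, b, c, d):
    """Summarize a 2x2 diagnostic table.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    n = a + b + c + d
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return {
        "prevalence": (a + c) / n,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": b / (b + d),  # equals 1 - specificity
        "false_negative_rate": c / (a + c),  # equals 1 - sensitivity
        "ppv": a / (a + b),  # assumes the table reflects prevalence
        "npv": d / (c + d),  # assumes the table reflects prevalence
        "accuracy": (a + d) / n,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical counts, not taken from the handout.
summary = diagnostic_summary(a=90, b=30, c=10, d=170)
for name, value in summary.items():
    print(f"{name:20s} {value:.3f}")
```

With these made-up counts the sensitivity is 0.90 and the specificity is 0.85, so the likelihood ratio for a positive test is about 6.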

Estimating Sensitivity & Specificity

Table 12-3: ST elevation

Sensitivity = 19.4%
Specificity = 81.9%
False Positive = 18.1%
False Negative = 80.6%
PPV = 31.6%
NPV = 70.2%
Accuracy = 63.1%
Prevalence = 30.1%

Dawson gives this bottom line:

1. To rule out a disease, we want to be sure that a negative result is really negative; therefore, not very many false-negatives should occur. A sensitive test is the best choice to obtain as few false-negatives as possible if factors such as cost and risk are similar; that is, high sensitivity helps rule out if the test is negative. As a handy acronym, if we abbreviate sensitivity by SN, and use a sensitive test to rule OUT, we have SNOUT.

2. To find evidence of a disease, we want a positive result to indicate a high probability that the patient has the disease; that is, a positive test result should really indicate disease. We therefore want few false-positives. The best method for achieving this is a highly specific test; that is, high specificity helps rule in if the test is positive. Again, if we abbreviate specificity by SP, and use a specific test to rule IN, we have SPIN.

3. To make accurate diagnoses, we must understand the role of prior probability of disease. If the prior probability of disease is extremely small, a positive result does not mean very much and should be followed by a test that is highly specific. The usefulness of a negative result depends on the sensitivity of the test.
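The predictive values reported for Table 12-3 can be recovered from the prevalence, sensitivity, and specificity alone. The sketch below is an illustration rather than part of the handout; the function name `predictive_values` is hypothetical, and the inputs are the rounded percentages quoted above, so the results agree only to about one decimal place.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV, accuracy) from prevalence, sensitivity, specificity."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    true_neg = (1 - prevalence) * specificity
    false_neg = prevalence * (1 - sensitivity)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    accuracy = true_pos + true_neg
    return ppv, npv, accuracy


# Rounded summary values reported for Table 12-3 (ST elevation).
ppv, npv, accuracy = predictive_values(
    prevalence=0.301, sensitivity=0.194, specificity=0.819
)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}, accuracy = {accuracy:.1%}")
```

The low sensitivity (19.4%) is why a negative ST-elevation result does little to rule out disease here: the NPV of about 70% is barely above the 69.9% of subjects who are disease-free to begin with.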

Effect of Prevalence

Assume a prevalence of 20%, then change the prevalence to 50% while keeping the same test:

| Measure | Prevalence 20% | Prevalence 50% |
|---|---|---|
| Sensitivity | 78.0% | 78.0% |
| Specificity | 67.0% | 67.0% |
| False Positive | 33.0% | 33.0% |
| False Negative | 22.0% | 22.0% |
| PPV | 37.1% | 70.3% |
| NPV | 92.4% | 75.3% |
| Accuracy | 69.2% | 72.5% |
| Likelihood ratio for Pos test | 2.4 | 2.4 |
| Likelihood ratio for Neg test | 0.3 | 0.3 |
| Odds ratio | 7.20 | 7.20 |
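One way to see why PPV and NPV move with prevalence while the odds ratio does not is to rebuild the expected 2x2 table for each scenario. The sketch below assumes a hypothetical cohort of 1000 subjects (the handout's own cell counts are not available) and a hypothetical helper named `expected_table`; the odds ratio is computed as ad / (bc).

```python
def expected_table(n, prevalence, sensitivity, specificity):
    """Expected 2x2 cell counts (a, b, c, d) for a cohort of n subjects."""
    diseased = n * prevalence
    healthy = n - diseased
    a = diseased * sensitivity  # true positives
    c = diseased - a            # false negatives
    d = healthy * specificity   # true negatives
    b = healthy - d             # false positives
    return a, b, c, d


results = {}
for prevalence in (0.20, 0.50):
    # Cohort size of 1000 is an arbitrary illustrative choice.
    a, b, c, d = expected_table(1000, prevalence, 0.78, 0.67)
    ppv = a / (a + b)
    npv = d / (c + d)
    odds_ratio = (a * d) / (b * c)
    results[prevalence] = (ppv, npv, odds_ratio)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, "
          f"NPV {npv:.1%}, odds ratio {odds_ratio:.2f}")
```

Because sensitivity and specificity are fixed within the diseased and non-diseased columns, changing prevalence only rescales the columns, which cancels out of ad / (bc) but not out of the row-wise quantities PPV and NPV.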

Effect of accuracy

A more accurate test, at 20% prevalence and at 5% prevalence:

| Measure | Prevalence 20% | Prevalence 5% |
|---|---|---|
| Sensitivity | 95.0% | 95.0% |
| Specificity | 95.0% | 95.0% |
| False Positive | 5.0% | 5.0% |
| False Negative | 5.0% | 5.0% |
| PPV | 82.6% | 50.0% |
| NPV | 98.7% | 99.7% |
| Accuracy | 95.0% | 95.0% |
| Likelihood ratio for Pos test | 19.0 | 19.0 |
| Likelihood ratio for Neg test | 0.1 | 0.1 |
| Odds ratio | [...] | [...] |
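Likelihood ratios give a second route to the same predictive values: convert the pre-test probability (here, the prevalence) to odds, multiply by the likelihood ratio, and convert back. The sketch below is illustrative only; the function name `post_test_probability` is hypothetical, and the odds form of Bayes' rule is a standard identity rather than something spelled out in the handout.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Revise a probability of disease using a likelihood ratio."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


sensitivity = specificity = 0.95
lr_positive = sensitivity / (1 - specificity)   # about 19
lr_negative = (1 - sensitivity) / specificity   # about 0.05

for prevalence in (0.20, 0.05):
    after_positive = post_test_probability(prevalence, lr_positive)
    after_negative = post_test_probability(prevalence, lr_negative)
    print(f"prevalence {prevalence:.0%}: "
          f"P(disease | test +) = {after_positive:.1%}, "
          f"P(disease | test -) = {after_negative:.1%}")
```

The probability of disease after a positive test equals the PPV, and the probability after a negative test equals 1 - NPV, so even a test with 95% sensitivity and specificity yields only a 50% PPV when the prevalence is 5%.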

Agreement

Reading Assignment
"Measuring agreement," in Chapter 5 of Dawson & Trapp, Basic & Clinical Biostatistics.

Objectives of this section
The objective is to understand how to describe agreement between two imperfect measures, understand how chance can affect apparent agreement, and to interpret Kappa.

Variability is pervasive
All measurements will vary depending upon:
- Actual changes in the characteristics being measured
- Variation introduced by the examiner
- Variation of the measurement method

In addition, measurements by one individual may be affected by the measurements obtained by another. The process of measuring may change the characteristic. Expectancy: what you expect to see influences what you see. Bias will occur unless each examiner is blinded to all other examiners' results.

Reliability
Reliability is reproducibility. Whether or not there is a true gold standard of truth, do different measurements agree with each other? It has nothing to do with agreement with what's true, just with how close two (error-prone) measures are.

Intrarater reliability is the reproducibility of measures by the same examiner (one examiner measures the same characteristic twice). Sometimes called within-examiner reliability.

Interrater reliability is the agreement of measures by different examiners (two examiners measure the same characteristic). Sometimes called between-examiner reliability.

Test-retest reliability is used in the context of questionnaires. It's like intrarater reliability but, since questionnaires often have multiple items (supposedly) getting at the same construct, we can also get a measure of internal consistency.

[Reliability is not quantified by things like sensitivity, specificity, false positive rate, and false negative rate; these all assume you know the true value.]

It's often of interest to assess how well different classification methods agree.
The different classification methods may refer to multiple raters making a clinical diagnosis, to multiple software algorithms classifying digitized images, to the scores from different rating scales determining probable etiology, or to comparing any two methods that yield classifications of individuals. In these situations there is no true or known classification and so assessing reliability (repeatability, reproducibility, agreement) is of interest. This is in contrast to being interested in the validity of a classification scheme.

In the most typical case where each of N subjects is classified into one of R categories by two classification methods, the observations may be summarized in an R x R contingency table where rows describe classification by one method and columns describe classification by the other method. If $n_{ij}$ is the number of subjects classified into the row classification value i and the column classification value j, then one natural index of raw agreement is the proportion of subjects where the two classification methods agree,

$p_o = \sum_{i=1}^{R} n_{ii} / N$.

The problem with $p_o$ is that it reflects both chance agreement and agreement beyond chance. The fact that it reflects chance agreement is easily seen in the following example: Assume the prevalence in a population of interest of characteristic A is [...]. Further, assume that one rater uses information to classify subjects as A or not-A. Note that if the other rater simply always diagnoses every patient as A, the two will agree $p_o$ = [...]. Thus, a simple proportion-agreement score is insufficient to assess reliability.

The proportion agreement expected by chance, $p_e$, is easily calculated from the marginal proportions of the two raters, exactly as in the chi-square test of independence. So, to calculate a chance-corrected index of agreement Cohen [1] defined the kappa index:

$\kappa = \frac{p_o - p_e}{1 - p_e}$

He describes this as the proportion of agreement after chance agreement is removed from consideration. Landis and Koch [1977, The measurement of observer agreement for categorical data, Biometrics 33] suggest that κ < 0.40 reflects poor agreement, 0.40 ≤ κ < 0.75 reflects fair to good agreement, and κ > 0.75 reflects excellent agreement (also see the table on the bottom of page 119).

Example: A study was done to compare different methods for packing filling material. Two methods were used to assess voids in the fillings. A portion of the report stated, "Across all of the assessments, the agreement between the two methods (radiograph and microscope) was good, with over 80% of the assessments in complete agreement (see the table below). The largest disagreement occurred where no voids were evident with the microscope but more than half of the area had a void by the radiographic method (n = [...] cases). There were also n = 18 cases where the radiograph indicated no void but the microscope indicated larger than half of the area was incomplete."

Observed (percent of all assessments), rows = Radiograph, columns = Microscope:

| Radiograph | no voids | <50% incomplete | >50% incomplete | Total |
|---|---|---|---|---|
| no voids | (66.0) | (1.8) | (3.6) | (71.4) |
| <50% incomplete | (5.0) | (1.0) | (1.8) | (7.8) |
| >50% incomplete | (5.8) | (1.6) | (13.4) | (20.8) |
| Total (Percent) | (76.8) | (4.4) | (18.8) | (100.0) |

Observed agreement (diagonal cells) = 402 assessments, that is 66.0% + 1.0% + 13.4% = 80.4% of the total.

Kappa compares the observed agreement with the amount of agreement we would expect by chance alone. The expected level of agreement, computed from the row and column totals, is below.

Expected (percent of all assessments), rows = Radiograph, columns = Microscope:

| Radiograph | no voids | <50% incomplete | >50% incomplete | Total |
|---|---|---|---|---|
| no voids | (54.8) | (3.1) | (13.4) | (71.4) |
| <50% incomplete | (6.0) | (0.3) | (1.5) | (7.8) |
| >50% incomplete | (16.0) | (0.9) | (3.9) | (20.8) |
| Total (Percent) | (76.8) | (4.4) | (18.8) | (100.0) |

Expected agreement (diagonal cells) = 54.8% + 0.3% + 3.9% = 59.0% of the total.

So, the chance-corrected measure of agreement is:

Kappa = (0.804 - 0.590) / (1 - 0.590) ≈ 0.52 (SE = 0.039)

Question: What's your conclusion now? Do the microscope method and the radiograph method agree in their assessment of filling voids?
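The kappa calculation for the filling-void example can be reproduced from the observed percentages alone. The sketch below is an illustration, not the handout's own computation: the function name `cohen_kappa` is hypothetical, the inputs are the rounded percentages from the observed table, and the second table (a rater who always answers A against a rater who answers A for 90 of 100 subjects) uses made-up numbers, since the values in the handout's own always-A example are not available.

```python
def cohen_kappa(table):
    """Return (p_o, p_e, kappa) for a square agreement table.

    table[i][j] holds the count (or percent) of subjects placed in
    category i by the row method and category j by the column method.
    """
    k = len(table)
    total = sum(sum(row) for row in table)
    row_share = [sum(row) / total for row in table]
    col_share = [sum(table[i][j] for i in range(k)) / total for j in range(k)]
    p_o = sum(table[i][i] for i in range(k)) / total
    # Chance agreement: product of the marginal proportions, as in the
    # chi-square test of independence, summed over the diagonal.
    p_e = sum(row_share[i] * col_share[i] for i in range(k))
    return p_o, p_e, (p_o - p_e) / (1 - p_e)


# Observed percentages: rows = radiograph, columns = microscope.
voids = [
    [66.0, 1.8, 3.6],
    [5.0, 1.0, 1.8],
    [5.8, 1.6, 13.4],
]
p_o, p_e, kappa = cohen_kappa(voids)
print(f"voids: p_o = {p_o:.3f}, p_e = {p_e:.3f}, kappa = {kappa:.2f}")

# Hypothetical always-A rater: rows = rater 1 (A, not-A), columns = rater 2.
always_a = [
    [90, 0],
    [10, 0],
]
p_o_a, p_e_a, kappa_a = cohen_kappa(always_a)
print(f"always-A: p_o = {p_o_a:.2f}, p_e = {p_e_a:.2f}, kappa = {kappa_a:.2f}")
```

For the void data the raw agreement of about 80% shrinks to a kappa of roughly 0.52, which falls in the "fair to good" band quoted earlier. In the always-A table the raw agreement is 90% yet kappa is 0, because all of that agreement is what the marginal proportions predict by chance.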

Figure 1: Design and outcomes of an independent blind study with gold/reference standard comparison. Adapted from DCEB (1981b)

Figure 1: Design and outcomes of an independent blind study with gold/reference standard comparison. Adapted from DCEB (1981b) Page 1 of 1 Diagnostic test investigated indicates the patient has the Diagnostic test investigated indicates the patient does not have the Gold/reference standard indicates the patient has the True positive

More information

4 Diagnostic Tests and Measures of Agreement

4 Diagnostic Tests and Measures of Agreement 4 Diagnostic Tests and Measures of Agreement Diagnostic tests may be used for diagnosis of disease or for screening purposes. Some tests are more effective than others, so we need to be able to measure

More information

COMMITMENT &SOLUTIONS UNPARALLELED. Assessing Human Visual Inspection for Acceptance Testing: An Attribute Agreement Analysis Case Study

COMMITMENT &SOLUTIONS UNPARALLELED. Assessing Human Visual Inspection for Acceptance Testing: An Attribute Agreement Analysis Case Study DATAWorks 2018 - March 21, 2018 Assessing Human Visual Inspection for Acceptance Testing: An Attribute Agreement Analysis Case Study Christopher Drake Lead Statistician, Small Caliber Munitions QE&SA Statistical

More information

A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) *

A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) * A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) * by J. RICHARD LANDIS** and GARY G. KOCH** 4 Methods proposed for nominal and ordinal data Many

More information

2 Philomeen Weijenborg, Moniek ter Kuile and Frank Willem Jansen.

2 Philomeen Weijenborg, Moniek ter Kuile and Frank Willem Jansen. Adapted from Fertil Steril 2007;87:373-80 Intraobserver and interobserver reliability of videotaped laparoscopy evaluations for endometriosis and adhesions 2 Philomeen Weijenborg, Moniek ter Kuile and

More information

7/17/2013. Evaluation of Diagnostic Tests July 22, 2013 Introduction to Clinical Research: A Two week Intensive Course

7/17/2013. Evaluation of Diagnostic Tests July 22, 2013 Introduction to Clinical Research: A Two week Intensive Course Evaluation of Diagnostic Tests July 22, 2013 Introduction to Clinical Research: A Two week Intensive Course David W. Dowdy, MD, PhD Department of Epidemiology Johns Hopkins Bloomberg School of Public Health

More information

Validity and reliability of measurements

Validity and reliability of measurements Validity and reliability of measurements 2 Validity and reliability of measurements 4 5 Components in a dataset Why bother (examples from research) What is reliability? What is validity? How should I treat

More information

Reliability and Validity checks S-005

Reliability and Validity checks S-005 Reliability and Validity checks S-005 Checking on reliability of the data we collect Compare over time (test-retest) Item analysis Internal consistency Inter-rater agreement Compare over time Test-Retest

More information

PTHP 7101 Research 1 Chapter Assignments

PTHP 7101 Research 1 Chapter Assignments PTHP 7101 Research 1 Chapter Assignments INSTRUCTIONS: Go over the questions/pointers pertaining to the chapters and turn in a hard copy of your answers at the beginning of class (on the day that it is

More information

Screening (Diagnostic Tests) Shaker Salarilak

Screening (Diagnostic Tests) Shaker Salarilak Screening (Diagnostic Tests) Shaker Salarilak Outline Screening basics Evaluation of screening programs Where we are? Definition of screening? Whether it is always beneficial? Types of bias in screening?

More information

Lab 4: Alpha and Kappa. Today s Activities. Reliability. Consider Alpha Consider Kappa Homework and Media Write-Up

Lab 4: Alpha and Kappa. Today s Activities. Reliability. Consider Alpha Consider Kappa Homework and Media Write-Up Lab 4: Alpha and Kappa Today s Activities Consider Alpha Consider Kappa Homework and Media Write-Up Reliability Reliability refers to consistency Types of reliability estimates Test-retest reliability

More information

Lessons in biostatistics

Lessons in biostatistics Lessons in biostatistics : the kappa statistic Mary L. McHugh Department of Nursing, National University, Aero Court, San Diego, California Corresponding author: mchugh8688@gmail.com Abstract The kappa

More information

COMPUTING READER AGREEMENT FOR THE GRE

COMPUTING READER AGREEMENT FOR THE GRE RM-00-8 R E S E A R C H M E M O R A N D U M COMPUTING READER AGREEMENT FOR THE GRE WRITING ASSESSMENT Donald E. Powers Princeton, New Jersey 08541 October 2000 Computing Reader Agreement for the GRE Writing

More information

Validity and reliability of measurements

Validity and reliability of measurements Validity and reliability of measurements 2 3 Request: Intention to treat Intention to treat and per protocol dealing with cross-overs (ref Hulley 2013) For example: Patients who did not take/get the medication

More information

reproducibility of the interpretation of hysterosalpingography pathology

reproducibility of the interpretation of hysterosalpingography pathology Human Reproduction vol.11 no.6 pp. 124-128, 1996 Reproducibility of the interpretation of hysterosalpingography in the diagnosis of tubal pathology Ben WJ.Mol 1 ' 2 ' 3, Patricia Swart 2, Patrick M-M-Bossuyt

More information

DATA is derived either through. Self-Report Observation Measurement

DATA is derived either through. Self-Report Observation Measurement Data Management DATA is derived either through Self-Report Observation Measurement QUESTION ANSWER DATA DATA may be from Structured or Unstructured questions? Quantitative or Qualitative? Numerical or

More information

English 10 Writing Assessment Results and Analysis

English 10 Writing Assessment Results and Analysis Academic Assessment English 10 Writing Assessment Results and Analysis OVERVIEW This study is part of a multi-year effort undertaken by the Department of English to develop sustainable outcomes assessment

More information

Unequal Numbers of Judges per Subject

Unequal Numbers of Judges per Subject The Reliability of Dichotomous Judgments: Unequal Numbers of Judges per Subject Joseph L. Fleiss Columbia University and New York State Psychiatric Institute Jack Cuzick Columbia University Consider a

More information

Lecture 5. Contingency /incidence tables Sensibility, specificity Relative Risk Odds Ratio CHI SQUARE test

Lecture 5. Contingency /incidence tables Sensibility, specificity Relative Risk Odds Ratio CHI SQUARE test Lecture 5 Contingency /incidence tables Sensibility, specificity Relative Risk Odds Ratio CHI SQUARE test Contingency tables - example Factor 2 Present + Absent - Total Factor 1 Present + a b a+b Absent

More information

Chapter 10. Screening for Disease

Chapter 10. Screening for Disease Chapter 10 Screening for Disease 1 Terminology Reliability agreement of ratings/diagnoses, reproducibility Inter-rater reliability agreement between two independent raters Intra-rater reliability agreement

More information

Agreement Coefficients and Statistical Inference

Agreement Coefficients and Statistical Inference CHAPTER Agreement Coefficients and Statistical Inference OBJECTIVE This chapter describes several approaches for evaluating the precision associated with the inter-rater reliability coefficients of the

More information

Survey Question. What are appropriate methods to reaffirm the fairness, validity reliability and general performance of examinations?

Survey Question. What are appropriate methods to reaffirm the fairness, validity reliability and general performance of examinations? Clause 9.3.5 Appropriate methodology and procedures (e.g. collecting and maintaining statistical data) shall be documented and implemented in order to affirm, at justified defined intervals, the fairness,

More information

Measuring Performance Of Physicians In The Diagnosis Of Endometriosis Using An Expectation-Maximization Algorithm

Measuring Performance Of Physicians In The Diagnosis Of Endometriosis Using An Expectation-Maximization Algorithm Yale University EliScholar A Digital Platform for Scholarly Publishing at Yale Public Health Theses School of Public Health January 2014 Measuring Performance Of Physicians In The Diagnosis Of Endometriosis

More information

Evidence Based Medicine Prof P Rheeder Clinical Epidemiology. Module 2: Applying EBM to Diagnosis

Evidence Based Medicine Prof P Rheeder Clinical Epidemiology. Module 2: Applying EBM to Diagnosis Evidence Based Medicine Prof P Rheeder Clinical Epidemiology Module 2: Applying EBM to Diagnosis Content 1. Phases of diagnostic research 2. Developing a new test for lung cancer 3. Thresholds 4. Critical

More information

Psychology, 2010, 1: doi: /psych Published Online August 2010 (

Psychology, 2010, 1: doi: /psych Published Online August 2010 ( Psychology, 2010, 1: 194-198 doi:10.4236/psych.2010.13026 Published Online August 2010 (http://www.scirp.org/journal/psych) Using Generalizability Theory to Evaluate the Applicability of a Serial Bayes

More information

Package CompareTests

Package CompareTests Type Package Package CompareTests February 6, 2017 Title Correct for Verification Bias in Diagnostic Accuracy & Agreement Version 1.2 Date 2016-2-6 Author Hormuzd A. Katki and David W. Edelstein Maintainer

More information

CommonKnowledge. Pacific University. Gina Clark Pacific University. Lauren Murphy Pacific University. Recommended Citation.

CommonKnowledge. Pacific University. Gina Clark Pacific University. Lauren Murphy Pacific University. Recommended Citation. Pacific University CommonKnowledge PT Critically Appraised Topics School of Physical Therapy 2012 The diagnostic accuracy of patient subjective history compared to the gold standard of urodynamic testing

More information

Binary Diagnostic Tests Two Independent Samples

Binary Diagnostic Tests Two Independent Samples Chapter 537 Binary Diagnostic Tests Two Independent Samples Introduction An important task in diagnostic medicine is to measure the accuracy of two diagnostic tests. This can be done by comparing summary

More information

EPIDEMIOLOGY. Training module

EPIDEMIOLOGY. Training module 1. Scope of Epidemiology Definitions Clinical epidemiology Epidemiology research methods Difficulties in studying epidemiology of Pain 2. Measures used in Epidemiology Disease frequency Disease risk Disease

More information

SUPPLEMENTARY INFORMATION In format provided by Javier DeFelipe et al. (MARCH 2013)

SUPPLEMENTARY INFORMATION In format provided by Javier DeFelipe et al. (MARCH 2013) Supplementary Online Information S2 Analysis of raw data Forty-two out of the 48 experts finished the experiment, and only data from these 42 experts are considered in the remainder of the analysis. We

More information

Observer variation for radiography, computed tomography, and magnetic resonance imaging of occult hip fractures

Observer variation for radiography, computed tomography, and magnetic resonance imaging of occult hip fractures Observer variation for radiography, computed tomography, and magnetic resonance imaging of occult hip fractures Collin, David; Dunker, Dennis; Gothlin, Jan H.; Geijer, Mats Published in: Acta Radiologica

More information

Maltreatment Reliability Statistics last updated 11/22/05

Maltreatment Reliability Statistics last updated 11/22/05 Maltreatment Reliability Statistics last updated 11/22/05 Historical Information In July 2004, the Coordinating Center (CORE) / Collaborating Studies Coordinating Center (CSCC) identified a protocol to

More information

Data that can be classified as belonging to a distinct number of categories >>result in categorical responses. And this includes:

Data that can be classified as belonging to a distinct number of categories >>result in categorical responses. And this includes: This sheets starts from slide #83 to the end ofslide #4. If u read this sheet you don`t have to return back to the slides at all, they are included here. Categorical Data (Qualitative data): Data that

More information

Evaluating Quality in Creative Systems. Graeme Ritchie University of Aberdeen

Evaluating Quality in Creative Systems. Graeme Ritchie University of Aberdeen Evaluating Quality in Creative Systems Graeme Ritchie University of Aberdeen Graeme Ritchie {2007} Some Empirical Criteria for Attributing Creativity to a Computer Program. Minds and Machines 17 {1}, pp.67-99.

More information

BMI 541/699 Lecture 16

BMI 541/699 Lecture 16 BMI 541/699 Lecture 16 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Proportions & contingency tables -

More information

Closed Coding. Analyzing Qualitative Data VIS17. Melanie Tory

Closed Coding. Analyzing Qualitative Data VIS17. Melanie Tory Closed Coding Analyzing Qualitative Data Tutorial @ VIS17 Melanie Tory A code in qualitative inquiry is most often a word or short phrase that symbolically assigns a summative, salient, essence capturing,

More information

Lessons in biostatistics

Lessons in biostatistics Lessons in biostatistics The test of independence Mary L. McHugh Department of Nursing, School of Health and Human Services, National University, Aero Court, San Diego, California, USA Corresponding author:

More information

Victoria YY Xu PGY-3 Internal Medicine University of Toronto. Supervisor: Dr. Camilla Wong

Victoria YY Xu PGY-3 Internal Medicine University of Toronto. Supervisor: Dr. Camilla Wong Validity, Reliability, Feasibility and Acceptability of Using the Consultation Letter Rating Scale to Assess Written Communication Competencies Among Geriatric Medicine Postgraduate Trainees Victoria YY

More information

Psychometric qualities of the Dutch Risk Assessment Scales (RISc)

Psychometric qualities of the Dutch Risk Assessment Scales (RISc) Summary Psychometric qualities of the Dutch Risk Assessment Scales (RISc) Inter-rater reliability, internal consistency and concurrent validity 1 Cause, objective and research questions The Recidive InschattingsSchalen

More information

Mohegan Sun Casino/Resort Uncasville, CT AAPP Annual Seminar

Mohegan Sun Casino/Resort Uncasville, CT AAPP Annual Seminar Mohegan Sun Casino/Resort Uncasville, CT 06382 2016 AAPP Annual Seminar Low Base Rate Screening Survival Analysis 1 & Successive Hurdles Mark Handler 2 AAPP Research & Information Chair Greetings my fellow

More information

Comparison of the Null Distributions of

Comparison of the Null Distributions of Comparison of the Null Distributions of Weighted Kappa and the C Ordinal Statistic Domenic V. Cicchetti West Haven VA Hospital and Yale University Joseph L. Fleiss Columbia University It frequently occurs

More information

Introduction to ROC analysis

Introduction to ROC analysis Introduction to ROC analysis Andriy I. Bandos Department of Biostatistics University of Pittsburgh Acknowledgements Many thanks to Sam Wieand, Nancy Obuchowski, Brenda Kurland, and Todd Alonzo for previous

More information

Statistical Validation of the Grand Rapids Arch Collapse Classification

Statistical Validation of the Grand Rapids Arch Collapse Classification Statistical Validation of the Grand Rapids Arch Collapse Classification David Burkard, BS Michelle Padley, CRTM John Anderson, MD Donald Bohay, MD John Maskill, MD Daniel Patton, MD Orthopaedic Associates

More information

Victoria YY Xu PGY-2 Internal Medicine University of Toronto. Supervisor: Dr. Camilla Wong

Victoria YY Xu PGY-2 Internal Medicine University of Toronto. Supervisor: Dr. Camilla Wong Validity, Reliability, Feasibility, and Acceptability of Using the Consultation Letter Rating Scale to Assess Written Communication Competencies Among Geriatric Medicine Postgraduate Trainees Victoria

More information

A Cross-sectional, Randomized, Non-interventional Methods Study to Compare Three Methods of Assessing Suicidality in Psychiatric Inpatients

A Cross-sectional, Randomized, Non-interventional Methods Study to Compare Three Methods of Assessing Suicidality in Psychiatric Inpatients A Cross-sectional, Randomized, Non-interventional Methods Study to Compare Three Methods of Assessing Suicidality in Psychiatric Inpatients Eric A. Youngstrom, Ph.D., Ahmad Hameed, M.D., Michael Mitchell,

More information

Relationship Between Intraclass Correlation and Percent Rater Agreement

Relationship Between Intraclass Correlation and Percent Rater Agreement Relationship Between Intraclass Correlation and Percent Rater Agreement When raters are involved in scoring procedures, inter-rater reliability (IRR) measures are used to establish the reliability of measures.

More information

Questionnaire design. Questionnaire Design: Content. Questionnaire Design. Questionnaire Design: Wording. Questionnaire Design: Wording OUTLINE

Questionnaire design. Questionnaire Design: Content. Questionnaire Design. Questionnaire Design: Wording. Questionnaire Design: Wording OUTLINE Questionnaire design OUTLINE Questionnaire design tests Reliability Validity POINTS TO CONSIDER Identify your research objectives. Identify your population or study sample Decide how to collect the information

More information

An Exploratory Case Study of the Use of Video Digitizing Technology to Detect Answer-Copying on a Paper-and-Pencil Multiple-Choice Test

An Exploratory Case Study of the Use of Video Digitizing Technology to Detect Answer-Copying on a Paper-and-Pencil Multiple-Choice Test An Exploratory Case Study of the Use of Video Digitizing Technology to Detect Answer-Copying on a Paper-and-Pencil Multiple-Choice Test Carlos Zerpa and Christina van Barneveld Lakehead University czerpa@lakeheadu.ca

More information

Evidence-based Imaging: Critically Appraising Studies of Diagnostic Tests

Evidence-based Imaging: Critically Appraising Studies of Diagnostic Tests Evidence-based Imaging: Critically Appraising Studies of Diagnostic Tests Aine Marie Kelly, MD Critically Appraising Studies of Diagnostic Tests Aine Marie Kelly B.A., M.B. B.Ch. B.A.O., M.S. M.R.C.P.I.,

More information

Running head: ATTRIBUTE CODING FOR RETROFITTING MODELS. Comparison of Attribute Coding Procedures for Retrofitting Cognitive Diagnostic Models

Running head: ATTRIBUTE CODING FOR RETROFITTING MODELS. Comparison of Attribute Coding Procedures for Retrofitting Cognitive Diagnostic Models Running head: ATTRIBUTE CODING FOR RETROFITTING MODELS Comparison of Attribute Coding Procedures for Retrofitting Cognitive Diagnostic Models Amy Clark Neal Kingston University of Kansas Corresponding

More information

appstats26.notebook April 17, 2015

appstats26.notebook April 17, 2015 Chapter 26 Comparing Counts Objective: Students will interpret chi square as a test of goodness of fit, homogeneity, and independence. Goodness of Fit A test of whether the distribution of counts in one

More information
