Evaluation of a clinical test. I: Assessment of reliability
British Journal of Obstetrics and Gynaecology, June 2001, Vol. 108, pp. 562–567
© RCOG 2001
COMMENTARY

Introduction

Testing and screening are critical parts of the clinical process, since inappropriate testing strategies put patients at risk and entail a serious waste of resources 1,2. Based on our recent experience of evaluating the diagnostic literature 3–7, we have come to believe that there is much misunderstanding about the evaluation of clinical tests. Some tests, introduced into practice without proper evaluation, are so inefficient as to be almost useless. In our view, the absence of clear methodological guidelines for the evaluation of clinical tests is a major impediment. Just as robust research methods for assessing the effectiveness of treatments have been actively pursued over the last decade, so attention now needs to focus on how research on diagnostic tests, and on their impact on clinical practice, might be improved. Our commentary is prompted by the concern that there is a huge disparity between the number of clinical tests in use and the availability of robust research evidence to guide decisions about their most appropriate clinical application.

We must first ask why inefficiency in clinical testing leads to mismanagement of patients. The answer is quite simple. When a diagnosis is missed, early therapy cannot be undertaken, and morbidity is prolonged. Conversely, when a diagnosis is made in the absence of disease, unnecessary therapy may be undertaken, with the risk of adverse effects. But how does inefficiency in clinical testing arise in the first place? We need to understand that the results of our tests are the outcomes of clinical measurements, and it is errors in these clinical measurements that lead to inefficiency in clinical testing. Errors in clinical measurements 8–10 are of two sorts.

Firstly, a measurement may be inconsistent: the same attribute recorded by another observer (or recorded a second time by the same observer) may give a different reading. The term reliability refers to the extent to which measurements are free of this type of error. Secondly, the measurement obtained may not be accurate when compared with the 'true' state of the attribute, as estimated by a suitable reference standard. Freedom from this type of error is referred to as validity. The goal of research is to determine whether a clinical test measures what it is intended to measure (validity), but first it should be established that it measures something in a consistent fashion (reliability).

Based on these two types of error in clinical measurement, our commentary is divided into two parts. The first part focuses on appropriate strategies for conducting and analysing studies of the reliability of a clinical test. The second part will describe strategies for conducting and analysing studies of the validity of a clinical test.

Design of a study of reliability

Reliability studies are generally reported in the literature as observer variability studies. The study is designed to compare measurements obtained by two or more observers (inter-rater reliability) or by one observer on two or more different occasions (intra-rater reliability). Intra-rater reliability is a prerequisite for inter-rater reliability 8. We will restrict our description to inter-rater reliability. The objective of the study is to measure the same clinical attribute independently on at least two occasions and then to determine the agreement between these measurements. So that the reliability reported in a published study can be replicated, researchers should provide sufficient information about the manner in which the test was conducted 9.
The information should cover all important aspects of the conduct of the test, such as the preparation of the patients, the recording of biophysical measurements, details of laboratory assays, and the computation of results.

In studies of reliability, one possible source of bias is the use of measurements from a sample that is not representative of the population being studied 9. The estimated reliability will be optimistic if the researchers have deliberately discarded difficult or borderline cases from the study. Such omissions are more likely to occur with convenience or arbitrary methods of sampling; selection bias is less likely with consecutive or random sampling 6. Sadly, sampling is inadequate in many studies 4,6,7.

Studies of reliability also require the observers to be blinded to one another's measurements. Blind recording of measurements avoids bias, since the recordings made by one observer are not influenced by knowledge of the measurements obtained by other observers; yet blinding is usually inadequate in studies of reliability. In our systematic review of the reliability of ultrasonic measurement of bladder volume, we found that blinding of the measurements was adequate in only 40% of the studies 5. Unless the sample is representative of the population being studied, and unless the observers' recordings are concealed from one another, we can have no confidence in a study of the reliability of a clinical test.

The precise estimation of the reliability of a test requires an adequate sample size. The calculation of the sample size for studies of reliability can be quite complex, and although methods for estimating the appropriate sample size are available, such calculations are seldom performed.

Data analysis of a study of reliability

Table 1 shows the different types of measurement scale encountered in clinical practice, with some examples: nominal (dichotomous), ordinal (ranked) and dimensional (continuous). The important point is that in studies of the reliability of a clinical test, the measurements recorded by the two observers should be expressed on the same type of scale, and with the same number of categories if the data are ordinal. It is important to remember that the purpose of a study of reliability is to determine the agreement (or concordance) between the measurements obtained by two observers measuring the same clinical attribute independently 8.

Table 1. Some types of measurement scales and examples of different reliability studies.

Scale         Description                       Example
Nominal       Non-ranked categories             Presence or absence of hypertension (based on a particular cut-off level of blood pressure)
Ordinal       Three or more ranked categories   None, mild, moderate or severe hypertension (based on categories of a range of normal, low high, medium high and very high values of blood pressure)
Dimensional   Continuous or decimal scale       Exact blood pressure values expressed in mmHg

Nominal scale

When dealing with dichotomous data (for example, the presence or absence of hypertension), many researchers report the percentage agreement as the index of reliability.
From the hypothetical example in Table 2, the percentage agreement between the two midwives recording whether pregnant women are hypertensive or normotensive is 91.3%, a statistic that looks impressive because of its closeness to 100% (the value depicting perfect agreement). However, this statistic does not take into account the agreement expected to occur by chance alone. In the bottom rows of Table 2, we have calculated the chance-expected percentage agreement (84.1%). It is fairly close to the observed percentage agreement, so one may conclude that the agreement beyond chance is not very great.

Table 2. Agreement (disagreement) between two midwives recording mid-trimester diastolic blood pressure in a high-risk antenatal clinic and classifying it as normal or hypertension based on a cut-off level of 90 mmHg.

                              Midwife A
                              Hypertension   Normal      Total
Midwife B    Hypertension     10 (a)         10 (b)      20 (R1)
             Normal           10 (c)         200 (d)     210 (R2)
             Total            20 (C1)        210 (C2)    230 (N)

$$\text{Prevalence of hypertension} = \frac{R_1}{N} = \frac{C_1}{N} = \frac{20}{230} = 8.7\%$$

$$\text{Observed percentage agreement} = \frac{a + d}{N} \times 100 = \frac{10 + 200}{230} \times 100 = 91.3\%$$

$$\text{Chance-expected percentage agreement} = \left[\frac{R_1 C_1}{N} + \frac{R_2 C_2}{N}\right] \times \frac{100}{N} = \frac{R_1 C_1 + R_2 C_2}{N^2} \times 100 = \frac{(20 \times 20) + (210 \times 210)}{230^2} \times 100 = 84.1\%$$

$$\kappa = \frac{\text{observed} - \text{chance-expected percentage agreement}}{\text{perfect} - \text{chance-expected percentage agreement}} = \frac{91.3 - 84.1}{100 - 84.1} = 0.45 \quad (95\%\ \text{CI } 0.32\text{–}0.58)$$

The statistic of choice for estimating the agreement between observers using the same nominal or dichotomous scale of measurement is kappa 11, which corrects for the agreement expected by chance. Kappa is the observed agreement minus the agreement expected by chance, divided by perfect agreement minus the agreement expected by chance 9:

$$\kappa = \frac{P_o - P_e}{1 - P_e}$$

where $P_o$ is the observed agreement and $P_e$ is the agreement expected by chance. Kappa therefore gives more information than simple percentage agreement. Its maximum value is 1, representing perfect agreement; a value of 0 represents agreement no better than chance, and negative values indicate agreement worse than chance. The standard error of kappa allows us to estimate its statistical significance and its 95% confidence interval; these computations can be performed with statistical software 12. The magnitude of kappa is a far more important measure of agreement than its statistical significance. Guidelines 13 for interpreting the values of kappa are given in Table 3.

Table 3. Guidelines for interpretation of the kappa statistic 13.

Kappa value    Strength of agreement
< 0            Poor
0–0.20         Slight
0.21–0.40      Fair
0.41–0.60      Moderate
0.61–0.80      Substantial
0.81–1.00      Excellent

Using the example of the agreement between the blood pressure recordings of the two midwives in Table 2, the kappa value obtained is 0.45 (95% CI 0.32–0.58), indicating moderate agreement. We should note that the interpretation of kappa is subjective, and some investigators consider the values in Table 3 to be optimistic 14,15.

The value of kappa depends on the prevalence of the disorder being studied. Suppose the study in Table 2 was repeated in a different population, where the prevalence of hypertension was 40% (Table 4a). Kappa was found to be lower (κ = 0.17). Because kappa measures agreement relative to the agreement expected by chance, and the chance-expected agreement is itself determined by the prevalence, the same observed agreement can yield quite different values of kappa in populations with different prevalences 16. When both observers report the same prevalence p, the chance-expected agreement is p² + (1 − p)²: it is 84.1% at the prevalence of 8.7% in Table 2, but only 52% at a prevalence of 40%, and it is smallest at 50%. Therefore, kappa values generated from studies on disparate populations are not easily comparable 14.
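The arithmetic of Table 2 and the kappa formula can be checked with a short script. The sketch below is a minimal Python illustration under our own assumptions, not the software used in the commentary (which cites Arcus QuickStat 12); the function names `agreement_stats` and `chance_agreement_for_prevalence` are hypothetical. It also evaluates p² + (1 − p)², the chance-expected agreement when both observers share the same prevalence, to show how this quantity changes with prevalence.

```python
from typing import Sequence


def agreement_stats(table: Sequence[Sequence[int]]) -> tuple[float, float, float]:
    """Return (observed %, chance-expected %, Cohen's kappa) for a square
    agreement table laid out like Table 2: rows are observer B's categories,
    columns are observer A's categories."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]                               # R1, R2, ...
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]    # C1, C2, ...
    # Observed agreement: proportion of subjects on the diagonal, e.g. (a + d)/N.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Chance-expected agreement: sum of R_i * C_i divided by N squared.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance-expected agreement is 100%")
    kappa = (p_o - p_e) / (1 - p_e)
    return 100 * p_o, 100 * p_e, kappa


def chance_agreement_for_prevalence(prevalence: float) -> float:
    """Chance-expected agreement (as a proportion) when both observers
    diagnose the condition in the same proportion of subjects."""
    return prevalence**2 + (1 - prevalence) ** 2


# Table 2: a = 10, b = 10, c = 10, d = 200.
table2 = [[10, 10], [10, 200]]
observed, expected, kappa = agreement_stats(table2)
print(f"observed {observed:.1f}%, chance-expected {expected:.1f}%, kappa {kappa:.2f}")
for prev in (0.087, 0.40, 0.50):
    pe = chance_agreement_for_prevalence(prev)
    print(f"prevalence {prev:.1%}: chance-expected agreement {pe:.1%}")
```

Run as written, the first line reproduces the 91.3%, 84.1% and 0.45 of Table 2; the loop shows chance-expected agreement of 84.1%, 52.0% and 50.0% at prevalences of 8.7%, 40% and 50%. The confidence interval is deliberately not computed, since several standard-error formulae for kappa are in use.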
When there is a systematic difference between the two midwives in recording the presence or absence of hypertension, higher than expected values of the kappa statistic can also be obtained 14. In the two examples above, the prevalence of hypertension was identical for both midwives (8.7% and 40%, respectively). Let us now assume that the prevalence of hypertension diagnosed by Midwife A was 7% and that diagnosed by Midwife B was 11% (Table 4b). The kappa value is 0.46, which is almost identical to that obtained in the first study and better than that obtained in the second study, despite the systematic disagreement between the two midwives in the diagnosis of hypertension. The examples in Table 4 illustrate the paradoxes of kappa 17: in Table 4a, the agreement is moderate despite the lower value of kappa, and in Table 4b the agreement is poor despite the higher value of kappa. McNemar's test estimates the probability that the difference between the two observers in the number of disagreements in each direction could have occurred by chance 14. For the data in Table 4b, P = 0.04, suggesting that there is systematic disagreement between the two observers. McNemar's test is available in statistical software packages 12.

Table 4. Effect of different sample recruitment strategies and systematic difference in diagnosis of hypertension on the kappa (κ) statistic for antenatal blood pressure measurement by two midwives (Midwife A versus Midwife B; hypertension or normal).
A. Study 2: prevalence of hypertension = 40.0%; κ = 0.17 (95% CI …)
B. Study 3: prevalence of hypertension = 6.5% for Midwife A (under-diagnosis) and 10.9% for Midwife B (over-diagnosis); κ = 0.46 (95% CI …)

Ordinal scale

Here again percentage agreement is commonly reported in the literature, but it is best avoided because it does not take into account chance-expected agreement. If the two midwives in Table 2 were asked to classify pregnant women into four ordered categories of blood pressure (normal blood pressure, mild hypertension, moderate hypertension and severe hypertension), there would clearly be various levels of disagreement. The discrepancy between the normotensive and severe hypertension categories is much worse than that between the normotensive and mild hypertension categories. It is logical to allow some credit for partial agreement; simple percentage agreement fails to do this, so the observer agreement appears less favourable than it actually is. Again the kappa statistic comes to the rescue, but here the statistic of choice is the weighted version of kappa 18. Weighted kappa corrects for chance agreement and also allows credit for partial agreement. Again, statistical software packages are available to calculate weighted kappa, its precision (95% confidence interval) and its statistical significance 12. The guidelines in Table 3 can be used to assess the agreement. As before, the magnitude of weighted kappa is far more important clinically than its statistical significance 15.

Dimensional scale

Pearson's correlation coefficient between the measurements obtained by the two observers has been popular for assessing the reliability of clinical tests measured on a continuous scale 5,14. However, Pearson's correlation coefficient measures the association between two sets of measurements, not their agreement 8,19. Fig. 1 represents two sets of measurements obtained by two observers, A and B.
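Weighted kappa and McNemar's test, both mentioned above, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method of any particular package: the function names `weighted_kappa` and `mcnemar_exact` are hypothetical, the weights are the common linear and quadratic agreement weights, and McNemar's test is given in its exact binomial form (chi-square versions with or without continuity correction also exist). Because the cell counts of Table 4b are not given in this text, no attempt is made to reproduce its P value.

```python
from math import comb
from typing import Sequence


def weighted_kappa(table: Sequence[Sequence[int]], weights: str = "linear") -> float:
    """Cohen's weighted kappa for a k x k table of ordered categories (k >= 2).

    Agreement weights are 1 on the diagonal and fall to 0 for the most
    distant categories: linear 1 - |i - j|/(k - 1), or quadratic
    1 - ((i - j)/(k - 1))**2. With k = 2 both reduce to ordinary kappa.
    """
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    k = len(table)
    n = sum(sum(row) for row in table)
    row_props = [sum(row) / n for row in table]
    col_props = [sum(table[i][j] for i in range(k)) / n for j in range(k)]

    def w(i: int, j: int) -> float:
        d = abs(i - j) / (k - 1)
        return 1 - d if weights == "linear" else 1 - d**2

    # Weighted observed and chance-expected agreement; partial credit off the diagonal.
    p_o = sum(w(i, j) * table[i][j] / n for i in range(k) for j in range(k))
    p_e = sum(w(i, j) * row_props[i] * col_props[j] for i in range(k) for j in range(k))
    return (p_o - p_e) / (1 - p_e)


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P value from the two discordant cells b and c.

    Under the null hypothesis of no systematic disagreement, each of the
    b + c discordant subjects is equally likely to fall in either cell,
    so min(b, c) is referred to a Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


# Table 2 has b = c = 10 discordant pairs, so there is no evidence of systematic bias.
print(weighted_kappa([[10, 10], [10, 200]]), mcnemar_exact(10, 10))
```

For the 2 × 2 data of Table 2 the weighted kappa equals the unweighted value (0.45), and the McNemar P value is 1.0 because the two kinds of disagreement are equally frequent.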
Line 1 shows perfect association (correlation coefficient 1.0) and also perfect agreement, the measurements obtained by Observer A being the same as those obtained by Observer B. Line 2 shows perfect association (correlation coefficient 1.0) but no agreement, since the measurements obtained by Observer B are always two points greater than those obtained by Observer A.

Fig. 1. Plot of measurements obtained by observer A versus those obtained by observer B. Line 1 represents perfect correlation and agreement. Line 2 represents perfect correlation but not agreement.

To measure agreement, Bland and Altman recommend the method of limits of agreement 19. This involves a scatter plot of the difference between the measurements obtained by the two observers against the mean of the two measurements, for each subject in the study. The 95% limits of agreement are calculated as the mean difference ± 1.96 standard deviations of the differences, and form a range expected to encompass 95% of the differences between the measurements obtained by the two observers. An example of limits of agreement is the comparison of inter-observer agreement for two- and three-dimensional ultrasound measurements of the volume of a balloon 20. These measurements were performed independently by two observers on 30 balloons of different sizes. Fig. 2 shows the 95% limits of agreement for measurement of the balloon volume by two-dimensional ultrasound to be … mL to … mL. Fig. 3 shows the 95% limits of agreement for measurement of the balloon volume by three-dimensional ultrasound to be −24.5 mL to +18.9 mL. The range of the 95% limits of agreement for three-dimensional ultrasound (43.4 mL) is less than that for two-dimensional ultrasound (62.0 mL), suggesting that three-dimensional ultrasound may be the more reliable test.

Fig. 2. Limits of agreement with two-dimensional ultrasound.
Fig. 3. Limits of agreement with three-dimensional ultrasound.

Whether or not there is acceptable agreement between the two observers depends on a subjective comparison of the limits of agreement with the range of measurements normally encountered in clinical practice. If differences as large as the limits of agreement are judged not to be clinically important, the agreement is acceptable 19. One disadvantage of the method of limits of agreement is that it formally measures the variation between the observers but takes no formal account of the variation between the subjects in the study. Another method of measuring agreement for continuous data is the intra-class correlation coefficient, which uses analysis of variance to measure formally both the variation between the observers and the variation between the subjects 8,21. Mathematically, the intra-class correlation coefficient is the proportion of the total variance that is due to variation between the subjects. An intra-class correlation coefficient of 1 indicates that the total variance is due solely to variation between the subjects, with no contribution from variation between the observers; an intra-class correlation coefficient of 0 indicates that none of the total variance is due to variation between subjects, all of it being attributable to variation between observers.
As a proportion of variance, the intra-class correlation coefficient ranges from 0 to 1, where 0 shows no agreement and 1 shows perfect agreement; in this respect it resembles the kappa statistic 22. An intra-class correlation coefficient greater than 0.75 is considered to indicate good agreement 15. An approximate 95% confidence interval for the intra-class correlation coefficient can also be estimated 9.

There is considerable debate about the relative advantages and disadvantages of limits of agreement and the intra-class correlation coefficient 8. Until there is some consensus, we would encourage the use of both the limits of agreement and the intra-class correlation coefficient as measures of the reliability of tests using continuous data, and discourage the use of Pearson's correlation coefficient.

Khalid S. Khan, Patrick F.W. Chien*
Departments of Obstetrics and Gynaecology, Birmingham Women's Hospital, UK, and Ninewells Hospital, Dundee, UK

References

1. Koran LM. The reliability of clinical methods, data and judgements [two parts]. N Engl J Med 1975;293:642–
2. Department of Clinical Epidemiology and Biostatistics. Clinical disagreement: I. How often it occurs and why. Can Med Assoc J 1980;123:499–
3. Khan KS, Khan SF, Nwosu CR, Chien PFW. Misleading authors' inferences in obstetric diagnostic test literature. Am J Obstet Gynecol 1999;181:112–
4. Nwosu CR, Khan KS, Chien PFW, Honest MR. Is real-time ultrasonic bladder volume estimation reliable and valid? A systematic overview. Scand J Urol Nephrol 1998;32:325–
5. Khan KS, Chien PFW, Honest MR. Evaluating the measurement variability of clinical investigations: the case of ultrasonic estimation of urinary bladder volume. Br J Obstet Gynaecol 1997;104:1036–
6. Chien PFW, Khan KS, Ogston S, Owen P. The diagnostic accuracy of cervico-vaginal fetal fibronectin in predicting preterm delivery: an overview. Br J Obstet Gynaecol 1997;104:436–
7. Chien PFW, Arnott N, Gordon A, Owen P, Khan KS. How useful is uterine artery Doppler flow velocimetry in the prediction of pre-eclampsia, intrauterine growth retardation and perinatal death? An overview. Br J Obstet Gynaecol 2000;107:196–
8. Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to Their Development and Use. New York: Oxford University Press,
9. Dunn G, Everitt B. Clinical Biostatistics. An Introduction to Evidence-Based Medicine. London: Edward Arnold,
10. Healy MJ. Measuring measuring errors. Stat Med 1989;8:893–
11. Fleiss JL. Measuring agreement between two judges on the presence or absence of a trait. Biometrics 1975;31:651–
12. Buchan I. Arcus QuickStat (Biomedical) Version 1.2. Cambridge: Addison Wesley Longman,
13. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–
14. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491–
15. Kramer MS, Feinstein AR. The biostatistics of concordance. Clin Pharmacol Ther 1981;29:111–123.
16. Thompson WG, Walter DW. A reappraisal of the kappa coefficient. J Clin Epidemiol 1988;41:949–
17. Feinstein AR, Cicchetti DV. High agreement but low kappa: I. The problem of two paradoxes. J Clin Epidemiol 1990;43:543–
18. Cohen J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull 1968;70:213–
19. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;i:307–
20. Farrell T, Leslie JR, Chien PFW, Agustsson P. The reliability and validity of three dimensional ultrasound volumetric measurements using an in vitro balloon and in vivo uterine model. Br J Obstet Gynaecol 2001;108:??.
21. Bartko JJ. The intraclass correlation coefficient as a measure of reliability. Psychol Rep 1966;19:3–
22. Fleiss JL, Cohen J. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educ Psychol Meas 1973;2:113–117.
More informationClinical Chemistry / INTENSIVE INSULIN THERAPY AND GLUCOSE VALUES
Clinical Chemistry / INTENSIVE INSULIN THERAPY AND GLUCOSE VALUES Accuracy of Roche Accu-Chek Inform Whole Blood Capillary, Arterial, and Venous Glucose Values in Patients Receiving Intensive Intravenous
More informationWeek 2 Video 2. Diagnostic Metrics, Part 1
Week 2 Video 2 Diagnostic Metrics, Part 1 Different Methods, Different Measures Today we ll focus on metrics for classifiers Later this week we ll discuss metrics for regressors And metrics for other methods
More informationScaling the quality of clinical audit projects: a pilot study
International Journal for Quality in Health Care 1999; Volume 11, Number 3: pp. 241 249 Scaling the quality of clinical audit projects: a pilot study ANDREW D. MILLARD Scottish Clinical Audit Resource
More informationObserver variation for radiography, computed tomography, and magnetic resonance imaging of occult hip fractures
Observer variation for radiography, computed tomography, and magnetic resonance imaging of occult hip fractures Collin, David; Dunker, Dennis; Gothlin, Jan H.; Geijer, Mats Published in: Acta Radiologica
More informationChapter 5: Field experimental designs in agriculture
Chapter 5: Field experimental designs in agriculture Jose Crossa Biometrics and Statistics Unit Crop Research Informatics Lab (CRIL) CIMMYT. Int. Apdo. Postal 6-641, 06600 Mexico, DF, Mexico Introduction
More informationOn the usefulness of the CEFR in the investigation of test versions content equivalence HULEŠOVÁ, MARTINA
On the usefulness of the CEFR in the investigation of test versions content equivalence HULEŠOVÁ, MARTINA MASARY K UNIVERSITY, CZECH REPUBLIC Overview Background and research aims Focus on RQ2 Introduction
More informationNIH Public Access Author Manuscript Tutor Quant Methods Psychol. Author manuscript; available in PMC 2012 July 23.
NIH Public Access Author Manuscript Published in final edited form as: Tutor Quant Methods Psychol. 2012 ; 8(1): 23 34. Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial
More informationScientific Research. The Scientific Method. Scientific Explanation
Scientific Research The Scientific Method Make systematic observations. Develop a testable explanation. Submit the explanation to empirical test. If explanation fails the test, then Revise the explanation
More informationGlobal Clinical Trials Innovation Summit Berlin October 2016
Global Clinical Trials Innovation Summit Berlin 20-21 October 2016 BIOSTATISTICS A FEW ESSENTIALS: USE AND APPLICATION IN CLINICAL RESEARCH Berlin, 20 October 2016 Dr. Aamir Shaikh Founder, Assansa Here
More informationComparing Vertical and Horizontal Scoring of Open-Ended Questionnaires
A peer-reviewed electronic journal. Copyright is retained by the first or sole author, who grants right of first publication to the Practical Assessment, Research & Evaluation. Permission is granted to
More informationREVIEW ARTICLE. A Review of Inferential Statistical Methods Commonly Used in Medicine
A Review of Inferential Statistical Methods Commonly Used in Medicine JCD REVIEW ARTICLE A Review of Inferential Statistical Methods Commonly Used in Medicine Kingshuk Bhattacharjee a a Assistant Manager,
More informationValidity and reliability of measurements
Validity and reliability of measurements 2 3 Request: Intention to treat Intention to treat and per protocol dealing with cross-overs (ref Hulley 2013) For example: Patients who did not take/get the medication
More informationFurther data analysis topics
Further data analysis topics Jonathan Cook Centre for Statistics in Medicine, NDORMS, University of Oxford EQUATOR OUCAGS training course 24th October 2015 Outline Ideal study Further topics Multiplicity
More informationMain article An introduction to medical statistics for health care professionals: Describing and presenting data
218 Musculoskeletal Care Volume 2 Number 4 Whurr Publishers 2004 Main article An introduction to medical statistics for health care professionals: Describing and presenting data Elaine Thomas PhD MSc BSc
More informationSystematic Reviews. Simon Gates 8 March 2007
Systematic Reviews Simon Gates 8 March 2007 Contents Reviewing of research Why we need reviews Traditional narrative reviews Systematic reviews Components of systematic reviews Conclusions Key reference
More informationHow to Conduct a Meta-Analysis
How to Conduct a Meta-Analysis Faculty Development and Diversity Seminar ludovic@bu.edu Dec 11th, 2017 Periodontal disease treatment and preterm birth We conducted a metaanalysis of randomized controlled
More informationControl of Confounding in the Assessment of Medical Technology
International Journal of Epidemiology Oxford University Press 1980 Vol.9, No. 4 Printed in Great Britain Control of Confounding in the Assessment of Medical Technology SANDER GREENLAND* and RAYMOND NEUTRA*
More informationProper analysis in clinical trials: how to report and adjust for missing outcome data
DOI: 10.1111/1471-0528.12219 www.bjog.org Commentary Proper analysis in clinical trials: how to report and adjust for missing outcome data M Joshi, a, * A Royuela, b,c J Zamora b,c a Centre for Primary
More informationAssessing Agreement Between Methods Of Clinical Measurement
University of York Department of Health Sciences Measuring Health and Disease Assessing Agreement Between Methods Of Clinical Measurement Based on Bland JM, Altman DG. (1986). Statistical methods for assessing
More informationObservational Studies Week #2. Dr. Michelle Edwards October 24, 2018
Observational Studies Week #2 Dr. Michelle Edwards October 24, 2018 Paper review Dimensions? Levels of behaviour studied? Measures they chose? How they recorded their measures? Medium used? 3 Dimensions
More informationResearch Article Analysis of Agreement on Traditional Chinese Medical Diagnostics for Many Practitioners
Evidence-Based Complementary and Alternative Medicine Volume 202, Article ID 7808, 5 pages doi:055/202/7808 Research Article Analysis of Agreement on Traditional Chinese Medical Diagnostics for Many Practitioners
More informationAn introduction to power and sample size estimation
453 STATISTICS An introduction to power and sample size estimation S R Jones, S Carley, M Harrison... Emerg Med J 2003;20:453 458 The importance of power and sample size estimation for study design and
More informationDimensionality, internal consistency and interrater reliability of clinical performance ratings
Medical Education 1987, 21, 130-137 Dimensionality, internal consistency and interrater reliability of clinical performance ratings B. R. MAXIMt & T. E. DIELMANS tdepartment of Mathematics and Statistics,
More informationEstablishing Interrater Agreement for Scoring Criterion-based Assessments: Application for the MEES
Establishing Interrater Agreement for Scoring Criterion-based Assessments: Application for the MEES Mark C. Hogrebe Washington University in St. Louis MACTE Fall 2018 Conference October 23, 2018 Camden
More informationA profiling system for the assessment of individual needs for rehabilitation with hearing aids
A profiling system for the assessment of individual needs for rehabilitation with hearing aids WOUTER DRESCHLER * AND INGE BRONS Academic Medical Center, Department of Clinical & Experimental Audiology,
More informationThe reliability and validity of the Index of Complexity, Outcome and Need for determining treatment need in Dutch orthodontic practice
European Journal of Orthodontics 28 (2006) 58 64 doi:10.1093/ejo/cji085 Advance Access publication 8 November 2005 The Author 2005. Published by Oxford University Press on behalf of the European Orthodontics
More informationReview Statistics review 2: Samples and populations Elise Whitley* and Jonathan Ball
Available online http://ccforum.com/content/6/2/143 Review Statistics review 2: Samples and populations Elise Whitley* and Jonathan Ball *Lecturer in Medical Statistics, University of Bristol, UK Lecturer
More informationBiases in clinical research. Seungho Ryu, MD, PhD Kanguk Samsung Hospital, Sungkyunkwan University
Biases in clinical research Seungho Ryu, MD, PhD Kanguk Samsung Hospital, Sungkyunkwan University Learning objectives Describe the threats to causal inferences in clinical studies Understand the role of
More informationPrevalence of thyroid disorder in pregnancy and pregnancy outcome
Original Research Article Prevalence of thyroid disorder in pregnancy and pregnancy outcome Praveena K.R. 1, Pramod Kumar K.R. 2*, Prasuna K.R. 3, Krishna Kumar TV 4 1 Assistant Professor, Department of
More informationValidity and reliability of measurements
Validity and reliability of measurements 2 Validity and reliability of measurements 4 5 Components in a dataset Why bother (examples from research) What is reliability? What is validity? How should I treat
More informationA scoring system for the assessment of bowel and lower urinary tract symptoms in women
BJOG: an International Journal of Obstetrics and Gynaecology April 2002, Vol. 109, pp. 424 430 A scoring system for the assessment of bowel and lower urinary tract symptoms in women L. Hiller a, H.D. Bradshaw
More information4 Diagnostic Tests and Measures of Agreement
4 Diagnostic Tests and Measures of Agreement Diagnostic tests may be used for diagnosis of disease or for screening purposes. Some tests are more effective than others, so we need to be able to measure
More informationW e have previously described the disease impact
606 THEORY AND METHODS Impact numbers: measures of risk factor impact on the whole population from case-control and cohort studies R F Heller, A J Dobson, J Attia, J Page... See end of article for authors
More informationhow good is the Instrument? Dr Dean McKenzie
how good is the Instrument? Dr Dean McKenzie BA(Hons) (Psychology) PhD (Psych Epidemiology) Senior Research Fellow (Abridged Version) Full version to be presented July 2014 1 Goals To briefly summarize
More informationA practical tool for locomotion scoring in sheep: Reliability when used by veterinary surgeons and sheep farmers
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/272945558 A practical tool for locomotion scoring in sheep: Reliability when used by veterinary
More informationRunning head: ATTRIBUTE CODING FOR RETROFITTING MODELS. Comparison of Attribute Coding Procedures for Retrofitting Cognitive Diagnostic Models
Running head: ATTRIBUTE CODING FOR RETROFITTING MODELS Comparison of Attribute Coding Procedures for Retrofitting Cognitive Diagnostic Models Amy Clark Neal Kingston University of Kansas Corresponding
More informationPoint-of-service questionnaires can reliably assess patients experiences
European Journal for Person Centered Healthcare Vol 2 Issue pp 85-91 ARTICLE Point-of-service questionnaires can reliably assess patients experiences Stephen D. Gill PhD Head of Safety and Quality Improvement,
More informationSRDC Technical Paper Series How Random Must Random Assignment Be in Random Assignment Experiments?
SRDC Technical Paper Series 03-01 How Random Must Random Assignment Be in Random Assignment Experiments? Paul Gustafson Department of Statistics University of British Columbia February 2003 SOCIAL RESEARCH
More informationEvidence-Based Medicine and Publication Bias Desmond Thompson Merck & Co.
Evidence-Based Medicine and Publication Bias Desmond Thompson Merck & Co. Meta-Analysis Defined A meta-analysis is: the statistical combination of two or more separate studies In other words: overview,
More informationDoctoral Dissertation Boot Camp Quantitative Methods Kamiar Kouzekanani, PhD January 27, The Scientific Method of Problem Solving
Doctoral Dissertation Boot Camp Quantitative Methods Kamiar Kouzekanani, PhD January 27, 2018 The Scientific Method of Problem Solving The conceptual phase Reviewing the literature, stating the problem,
More informationReproducibility of childhood respiratory symptom questions
Eur Respir J 1992. 5, 90-95 Reproducibility of childhood respiratory symptom questions B. Brunekreef*, B. Groat**, B. Rijcken***, G. Hoek*, A. Steenbekkers*, A. de Boer* Reproducibility of childhood respiratory
More information10 Intraclass Correlations under the Mixed Factorial Design
CHAPTER 1 Intraclass Correlations under the Mixed Factorial Design OBJECTIVE This chapter aims at presenting methods for analyzing intraclass correlation coefficients for reliability studies based on a
More informationSurvey on repeat prescribing for acid suppression drugs in primary care in Cornwall and the Isles of Scilly
Aliment Pharmacol Ther 1999; 13: 813±817. Survey on repeat prescribing for acid suppression drugs in primary care in Cornwall and the Isles of Scilly R. BOUTET, M. WILCOCK & I. MACKENZIE 1 Department of
More informationInterpreting Kappa in Observational Research: Baserate Matters
Interpreting Kappa in Observational Research: Baserate Matters Cornelia Taylor Bruckner Sonoma State University Paul Yoder Vanderbilt University Abstract Kappa (Cohen, 1960) is a popular agreement statistic
More informationUniversity of Bristol - Explore Bristol Research. Publisher's PDF, also known as Version of record
Al-Janabi, H., Flynn, T. N., Peters, T. J., Bryan, S., & Coast, J. (2015). Testretest reliability of capability measurement in the UK general population. Health Economics, 24(5), 625-30..3100 Publisher's
More informationCRITICAL EVALUATION OF BIOMEDICAL LITERATURE
Chapter 9 CRITICAL EVALUATION OF BIOMEDICAL LITERATURE M.G.Rajanandh, Department of Pharmacy Practice, SRM College of Pharmacy, SRM University. INTRODUCTION Reviewing the Biomedical Literature poses a
More informationPain Assessment in Elderly Patients with Severe Dementia
48 Journal of Pain and Symptom Management Vol. 25 No. 1 January 2003 Original Article Pain Assessment in Elderly Patients with Severe Dementia Paolo L. Manfredi, MD, Brenda Breuer, MPH, PhD, Diane E. Meier,
More informationThis is a repository copy of Testing for asymptomatic bacteriuria in pregnancy..
This is a repository copy of Testing for asymptomatic bacteriuria in pregnancy.. White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/15569/ Version: Accepted Version Article:
More informationChapter 9. Ellen Hiemstra Navid Hossein pour Khaledian J. Baptist M.Z. Trimbos Frank Willem Jansen. Submitted
Chapter Implementation of OSATS in the Residency Program: a benchmark study Ellen Hiemstra Navid Hossein pour Khaledian J. Baptist M.Z. Trimbos Frank Willem Jansen Submitted Introduction The exposure to
More informationIntroduction & Basics
CHAPTER 1 Introduction & Basics 1.1 Statistics the Field... 1 1.2 Probability Distributions... 4 1.3 Study Design Features... 9 1.4 Descriptive Statistics... 13 1.5 Inferential Statistics... 16 1.6 Summary...
More informationEvaluating Quality in Creative Systems. Graeme Ritchie University of Aberdeen
Evaluating Quality in Creative Systems Graeme Ritchie University of Aberdeen Graeme Ritchie {2007} Some Empirical Criteria for Attributing Creativity to a Computer Program. Minds and Machines 17 {1}, pp.67-99.
More informationJournal of Biostatistics and Epidemiology
Journal of Biostatistics and Epidemiology Original Article Usage of statistical methods and study designs in publication of specialty of general medicine and its secular changes Swati Patel 1*, Vipin Naik
More informationSanjay P. Zodpey Clinical Epidemiology Unit, Department of Preventive and Social Medicine, Government Medical College, Nagpur, Maharashtra, India.
Research Methodology Sample size and power analysis in medical research Sanjay P. Zodpey Clinical Epidemiology Unit, Department of Preventive and Social Medicine, Government Medical College, Nagpur, Maharashtra,
More informationTypes of data and how they can be analysed
1. Types of data British Standards Institution Study Day Types of data and how they can be analysed Martin Bland Prof. of Health Statistics University of York http://martinbland.co.uk In this lecture we
More informationAAPOR Exploring the Reliability of Behavior Coding Data
Exploring the Reliability of Behavior Coding Data Nathan Jurgenson 1 and Jennifer Hunter Childs 1 Center for Survey Measurement, U.S. Census Bureau, 4600 Silver Hill Rd. Washington, DC 20233 1 Abstract
More informationExperimentalPhysiology
Exp Physiol 97.5 (2012) pp 557 561 557 Editorial ExperimentalPhysiology Categorized or continuous? Strength of an association and linear regression Gordon B. Drummond 1 and Sarah L. Vowler 2 1 Department
More informationTHE USE OF CRONBACH ALPHA RELIABILITY ESTIMATE IN RESEARCH AMONG STUDENTS IN PUBLIC UNIVERSITIES IN GHANA.
Africa Journal of Teacher Education ISSN 1916-7822. A Journal of Spread Corporation Vol. 6 No. 1 2017 Pages 56-64 THE USE OF CRONBACH ALPHA RELIABILITY ESTIMATE IN RESEARCH AMONG STUDENTS IN PUBLIC UNIVERSITIES
More information