Evaluation of a clinical test. I: Assessment of reliability

British Journal of Obstetrics and Gynaecology, June 2001, Vol. 108, pp. 562–567
© RCOG 2001 British Journal of Obstetrics and Gynaecology

COMMENTARY

Introduction

Testing and screening are critical parts of the clinical process, since inappropriate testing strategies put patients at risk and entail a serious waste of resources [1,2]. Based on our recent experience of evaluating the diagnostic literature [3–7], we have come to believe that there is much misunderstanding about the evaluation of clinical tests. Some tests, introduced into practice without proper evaluation, are so inefficient as to be almost useless. In our view, the absence of clear methodological guidelines about the evaluation of clinical tests is a major impediment. Just as robust research methods for assessing the effectiveness of treatments have been actively pursued over the last decade, so attention needs to be focused on how research on diagnostic tests and their impact on clinical practice might be improved. Our commentary is prompted by the concern that there is a huge disparity between the number of clinical tests and the availability of robust research evidence to help make decisions about their most appropriate clinical application.

We must first ask why inefficiency in clinical testing leads to mismanagement of patients. The answer is quite simple. When a diagnosis is missed, early therapy cannot be undertaken and morbidity is prolonged. On the other hand, when a diagnosis is made in the absence of disease, unnecessary therapy may be undertaken, with the risk of adverse effects. But how does inefficiency in clinical testing arise in the first place? We need to understand that the results of our tests are the outcomes of clinical measurements. It is the errors in these clinical measurements that lead to inefficiency in clinical testing. Errors in clinical measurements [8–10] are of two sorts.

Firstly, a measurement may be inconsistent if the same attribute recorded by another observer (or recorded a second time by the same observer) leads to a different reading. The term reliability refers to this type of measurement error. Secondly, the measurement obtained may not be accurate when compared with the 'true' state of the attribute estimated by a suitable reference standard. This type of measurement error is referred to as validity. The goal of research is to determine whether a clinical test measures what is intended (validity), but first it should be established that it measures something in a consistent fashion (reliability).

Based on these two types of errors in clinical measurement, our commentary is divided into two parts. In the first part, the focus is on appropriate strategies for conducting and analysing studies of the reliability of a clinical test. In the second part, strategies for conducting and analysing studies of the validity of a clinical test will be described.

Design of a study of reliability

Reliability studies are generally reported in the literature as observer variability studies. The study is designed to compare measurements obtained by two or more observers (inter-rater reliability) or by one observer on two or more different occasions (intra-rater reliability). Intra-rater reliability is a prerequisite for inter-rater reliability [8]. We will restrict our description to inter-rater reliability. The objective of the study is to measure the same clinical attribute independently on at least two occasions and then to assess the agreement between these measurements.

In order for the reliability of a test to be replicated from a published study, researchers should provide sufficient information about the manner in which the test was conducted [9].
The information should cover all important issues with regard to the conduct of the test, such as the preparation of the patients, measurements of biophysical recordings, details of laboratory assays, and computation of results.

In studies of reliability, one possible source of bias is the use of measurements from a sample which is not representative of the population being studied [9]. Reliability will appear better than it really is if the researchers have deliberately discarded difficult or borderline cases from the study. Such omissions are more likely to occur with convenience or arbitrary methods of sampling. Selection bias is less likely with consecutive or random sampling [6]. Sadly, sampling is inadequate in many studies [4,6,7].

Studies of reliability also require the observers to be blinded to one another's measurements. Blind recording of measurements avoids bias, since recordings made by one observer are then not influenced by knowledge of the measurements obtained by other observers. In practice, blinding is often lacking in studies of reliability. In our systematic review of the reliability of bladder volume measurement by ultrasound, we found that blinding of the ultrasonic bladder volume measurements was adequate in only 40% of the studies [5]. Unless the sample is representative of the population being studied and the recordings of the observers are kept from one another, we can have no confidence in a study of the reliability of a clinical test.

The precise estimation of the reliability of a test requires an adequate sample size. The calculation of the sample size for studies of reliability can be quite complex. Although methods for estimating the appropriate sample size for studies of reliability are available, such calculations are seldom performed.

Data analysis of a study of reliability

Table 1 shows the different types of measurement encountered in clinical practice, with some examples: nominal (dichotomous); ordinal (ranked); and dimensional (continuous). The important point is that in studies of the reliability of a clinical test, the measurements recorded by the two observers should be expressed on the same type of scale, and with the same number of categories if the data are ordinal. It is important to remember that the purpose of a study of reliability is to determine the agreement (or concordance) of the measurements obtained by two observers measuring the same clinical attribute independently [8].

Table 1. Some types of measurement scales and examples of different reliability studies.

Scale         Description                        Example
Nominal       Non-ranked categories              Presence or absence of hypertension (based on a particular cut-off level of blood pressure)
Ordinal       Three or more ranked categories    None, mild, moderate, or severe hypertension (based on categories of a range of normal, low high, medium high and very high values of blood pressure)
Dimensional   Continuous or decimal scale        Exact blood pressure values expressed in mmHg

Nominal scale

When dealing with dichotomous data (for example, the presence or absence of hypertension), many researchers will report the percentage agreement as the index of reliability.
From the hypothetical example in Table 2, the percentage agreement between the two midwives recording whether pregnant women are hypertensive or normotensive is 91.3%, a statistic that looks impressive because of its closeness to 100% (the value depicting perfect agreement). However, this statistic does not take into account the agreement that would be expected to occur by chance alone. In Table 2, we have also calculated the chance-expected percentage agreement. Its value (84.1%) is fairly close to the observed percentage agreement, so one may conclude that the agreement beyond chance is not very great.

Table 2. Agreement (disagreement) between two midwives recording mid-trimester diastolic blood pressure in a high-risk antenatal clinic and classifying it as normal or hypertension based on a cut-off level of 90 mmHg.

                            Midwife A
Midwife B        Hypertension    Normal      Total
Hypertension     10 (a)          10 (b)      20 (R1)
Normal           10 (c)          200 (d)     210 (R2)
Total            20 (C1)         210 (C2)    230 (N)

Prevalence of hypertension = (R1 or C1)/N = 20/230 = 8.7%
Observed percentage agreement = [(a + d)/N] × 100 = [(10 + 200)/230] × 100 = 91.3%
Chance-expected percentage agreement = {[(R1 × C1)/N] + [(R2 × C2)/N]} × 100/N = [(R1 × C1) + (R2 × C2)] × 100/N² = [(20 × 20) + (210 × 210)] × 100/230² = 84.1%
Kappa coefficient (κ) = (observed − chance-expected percentage agreement)/(perfect − chance-expected percentage agreement) = (91.3 − 84.1)/(100 − 84.1) = 0.45 (95% CI 0.32–0.58)

The statistic of choice for estimating the agreement between observers using the same nominal or dichotomous scale of measurement is kappa [11], which corrects for the agreement expected by chance. Kappa is the observed agreement minus the agreement expected by chance, divided by perfect agreement minus the agreement expected by chance [9]:

$$\kappa = \frac{P_o - P_e}{1 - P_e}$$

where P_o is the observed agreement and P_e is the agreement expected by chance, both expressed as proportions. Kappa therefore gives more information than simple percentage agreement. Its values usually range from 0 to 1, with 0 representing no agreement beyond chance and 1 representing perfect agreement (negative values arise when agreement is worse than chance). The standard error of kappa allows us to estimate its statistical significance and also its 95% confidence interval. These computations can be performed using a computer [12]. The magnitude of kappa is a far more important measure of agreement than its statistical significance. The guidelines [13] for interpretation of the values of kappa are given in Table 3. Using the example of the agreement between the blood pressure recordings of two midwives in Table 2, the kappa value obtained is 0.45 (95% CI 0.32–0.58), indicating moderate agreement. We should note that the interpretation of kappa is subjective and the values of kappa in Table 3 are considered to be optimistic by some investigators [14,15].

Table 3. Guidelines for interpretation of kappa statistic [13].

Kappa value    Strength of agreement
0              Poor
0–0.20         Slight
0.21–0.40      Fair
0.41–0.60      Moderate
0.61–0.80      Substantial
0.81–1.0       Excellent

The value of kappa depends on the prevalence of the disorder being studied. Suppose the study in Table 2 was repeated in a different population, where the prevalence of hypertension was 40% (Table 4). The observer agreement was found to be lower (kappa = 0.17) (Table 4a). This is because a high prevalence of hypertension results in a high level of chance-expected agreement and hence a lower kappa value; conversely a condition with a low prevalence will tend to give higher values of kappa [16]. Therefore, kappa values generated from studies on disparate populations are not easily comparable [14].
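The Table 2 arithmetic can be reproduced with a short sketch. The function name `cohen_kappa_2x2` and its argument order are illustrative choices, not part of the original commentary; the cell counts a, b, c and d are those printed in Table 2. The confidence interval is deliberately left out, since the commentary does not state which standard error formula was used.

```python
def cohen_kappa_2x2(a, b, c, d):
    """Return (observed agreement, chance-expected agreement, kappa)
    for a 2x2 table, all as proportions.

    a and d are the cells where the two observers agree;
    b and c are the cells where they disagree.
    """
    n = a + b + c + d
    r1, r2 = a + b, c + d          # row totals (second observer)
    c1, c2 = a + c, b + d          # column totals (first observer)
    p_o = (a + d) / n              # observed agreement
    p_e = (r1 * c1 + r2 * c2) / n ** 2   # agreement expected by chance
    kappa = (p_o - p_e) / (1 - p_e)
    return p_o, p_e, kappa


# Cell counts taken from Table 2 (two midwives, 230 women).
p_o, p_e, kappa = cohen_kappa_2x2(10, 10, 10, 200)
print(f"observed {p_o:.1%}, chance-expected {p_e:.1%}, kappa {kappa:.2f}")
```

Run on the Table 2 counts, this gives the same 91.3%, 84.1% and 0.45 quoted in the text, which makes it easy to see how little of the raw agreement survives the correction for chance.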
When there is a systematic difference between the two midwives in recording the presence or absence of hypertension, higher than expected values of the kappa statistic can also be obtained [14]. In the above two examples, the prevalences of hypertension diagnosed by both midwives were identical (9% and 40% respectively). Let us assume that the prevalence of hypertension diagnosed by Midwife A was 7% and the prevalence of hypertension diagnosed by Midwife B was 11% (Table 4b). The kappa value is 0.46, which is almost identical to that obtained in the first study and better than that obtained in the second study, despite the systematic disagreement in the diagnosis of hypertension between the two midwives. The examples in Table 4 illustrate the paradoxes of kappa [17]: in Table 4a, the agreement is moderate, despite the lower value of kappa; and in Table 4b the agreement is poor, despite the higher value of kappa. McNemar's test will estimate the probability that the difference in the number of disagreements between the two observers could have occurred by chance [14]. For the information in Table 4b, P = 0.04, suggesting that there is systematic disagreement between the two observers. McNemar's test is available in electronic statistical packages [12].

Table 4. Effect of different sample recruitment strategies and systematic difference in diagnosis of hypertension on the kappa (κ) statistic for antenatal blood pressure measurement by two midwives.

A. Study 2 (Midwife A versus Midwife B; Hypertension, Normal, Total)
Prevalence of hypertension = 40.0%
Kappa coefficient (κ) = 0.17 (95% CI ...)

B. Study 3 (Midwife A versus Midwife B; Hypertension, Normal, Total)
Prevalence of hypertension (for Midwife A) = 6.5% (under-diagnosis)
Prevalence of hypertension (for Midwife B) = 10.9% (over-diagnosis)
Kappa coefficient (κ) = 0.46 (95% CI ...)

Ordinal scale

Here again percentage agreement is commonly reported in the literature, but simple percentage agreement is best avoided since it does not take into account any chance-expected agreement. If the two midwives in Table 2 were asked to classify pregnant women into four ordered categories of blood pressure (i.e. normal blood pressure, mild hypertension, moderate hypertension, severe hypertension), then it is obvious that there are various levels of disagreement. The discrepancy between the normotensive and severe hypertensive categories is much worse than that between the normotensive and mild hypertensive categories. It is logical to allow some credit for partial agreement. Simple percentage agreement fails to do this, so the observer agreement appears less favourable than it actually is. Again the kappa statistic comes to the rescue, but here the statistic of choice is the weighted version of kappa [18]. Weighted kappa corrects for chance agreement and also allows credit for partial agreement. Again, electronic statistical packages are available to calculate weighted kappa, its precision (95% confidence intervals) and its statistical significance [12]. The guidelines in Table 3 can be used to assess the agreement. As before, the magnitude of weighted kappa is far more important clinically than its statistical significance [15].

Dimensional scale

Pearson's correlation coefficient of the measurements obtained by the two observers has been popular for the assessment of the reliability of clinical tests on a continuous scale [5,14]. However, Pearson's correlation coefficient measures the association between two sets of measurements, but not their agreement [8,19]. Fig. 1 represents two sets of measurements obtained by two observers, A and B.
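The gap between association and agreement can be made concrete with a small numerical sketch. The readings below are hypothetical, chosen so that Observer B always reads two points higher than Observer A; `pearson_r` is an illustrative helper written out from the textbook definition and is not taken from the commentary.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient from its textbook definition."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical readings: Observer B is always two points above Observer A.
observer_a = [2.0, 4.0, 6.0, 8.0, 10.0]
observer_b = [a + 2.0 for a in observer_a]

r = pearson_r(observer_a, observer_b)
differences = [b - a for a, b in zip(observer_a, observer_b)]
mean_difference = sum(differences) / len(differences)
identical_readings = sum(a == b for a, b in zip(observer_a, observer_b))

print(f"r = {r:.2f}, mean difference = {mean_difference:.1f}, "
      f"identical readings = {identical_readings}")
```

The correlation is perfect even though the two observers never record the same value, which is why a high correlation coefficient on its own says nothing about agreement.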
Line 1 shows perfect association, the correlation coefficient being 1.0, and also perfect agreement, the measurements obtained by Observer A being the same as those obtained by Observer B. Line 2 shows perfect association, the correlation coefficient being 1.0, but no agreement, since the measurements obtained by Observer B are always two points greater than the measurements obtained by Observer A.

Fig. 1. Plot of measurements obtained by Observer A versus those obtained by Observer B. Line 1 represents perfect correlation and agreement. Line 2 represents perfect correlation but not agreement.

To measure agreement, Bland and Altman recommend the method of limits of agreement [19]. This involves a scatter plot of the difference between the measurements obtained by the two observers against the mean of the measurements, for each subject in the study. The 95% limits of agreement are the 95% data interval of the differences between the measurements obtained by the two observers, expressed as a range which will encompass 95% of the differences.

An example of limits of agreement is the comparison between two- and three-dimensional measurements of the volume of a balloon using ultrasound [20]. These measurements were performed, independently, by two observers on 30 balloons of different sizes. Fig. 2 shows the 95% limits of agreement of measurement of the balloon volume by two-dimensional ultrasound to be ... mL to ... mL. Fig. 3 shows the 95% limits of agreement of measurement of the balloon volume by three-dimensional ultrasound to be −24.5 mL to +18.9 mL. The range of the 95% limits of agreement of three-dimensional ultrasound (43.4 mL) is less than the range of the 95% limits of agreement of two-dimensional ultrasound (62.0 mL), suggesting that three-dimensional ultrasound may be the more reliable test.

Fig. 2. Limits of agreement with two-dimensional ultrasound.

Fig. 3. Limits of agreement with three-dimensional ultrasound.

The interpretation of whether or not there is acceptable agreement between the two observers depends on subjective comparison of the limits of agreement with the range of the measurements normally encountered in clinical practice. As long as the range of the limits of agreement is considered not to be clinically important, the agreement is acceptable [19]. One disadvantage of the method of limits of agreement is that it measures formally the variation between the observers, but takes no formal account of the variation between the subjects in the study.

Another method of measuring agreement for continuous data is the intra-class correlation coefficient, which measures formally both the variation between the observers and the variation between the subjects in the study by analysis of variance [8,21]. Mathematically, the intra-class correlation coefficient is the proportion of the total variance which is due to the variation between the subjects. An intra-class correlation coefficient of 1 indicates that the total variance is due solely to the variation between the subjects, there being no contribution to the total variance from variation between the observers; an intra-class correlation coefficient of 0 indicates that none of the total variance is due to variation between subjects, all of the total variance being attributed to variation between observers.
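Both measures for continuous data can be sketched in a few lines. Everything here is an assumption made for illustration: the paired volumes are hypothetical, the limits of agreement are taken as the mean difference ± 1.96 standard deviations (which presumes roughly normally distributed differences), and the intra-class correlation coefficient is the one-way analysis of variance form, which is only one of several variants in use and may not be the one the cited references intend.

```python
from statistics import mean, stdev


def limits_of_agreement(x, y):
    """95% limits of agreement: mean difference +/- 1.96 SD of the differences."""
    diffs = [yi - xi for xi, yi in zip(x, y)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias - half_width, bias + half_width


def icc_one_way(ratings):
    """One-way ANOVA intra-class correlation coefficient.

    ratings holds one row per subject and one value per observer.
    """
    n = len(ratings)            # number of subjects
    k = len(ratings[0])         # number of observers per subject
    grand_mean = mean([v for row in ratings for v in row])
    subject_means = [mean(row) for row in ratings]
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for row, m in zip(ratings, subject_means) for v in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical volumes (mL) recorded by two observers on five subjects.
observer_a = [100.0, 150.0, 200.0, 250.0, 300.0]
observer_b = [104.0, 148.0, 205.0, 247.0, 306.0]

lower, upper = limits_of_agreement(observer_a, observer_b)
icc = icc_one_way([list(pair) for pair in zip(observer_a, observer_b)])
print(f"limits of agreement {lower:.1f} to {upper:.1f} mL, ICC {icc:.3f}")
```

In this made-up data set the subjects differ from one another far more than the observers do, so the intra-class correlation coefficient is close to 1 even though the limits of agreement span about 16 mL. That contrast is the reason for reporting both measures rather than either one alone.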
Therefore, like the kappa statistic, the intra-class correlation coefficient ranges from 0 to 1, where 0 shows no agreement and 1 shows perfect agreement [22]. An intra-class correlation coefficient greater than 0.75 is considered to represent good agreement [15]. An approximate 95% confidence interval for the intra-class correlation coefficient can also be estimated [9].

There is considerable debate about the advantages and disadvantages of limits of agreement and the intra-class correlation coefficient [8]. Until there is some consensus, we would encourage the use of both the limits of agreement and the intra-class correlation coefficient as measures of the reliability of tests using continuous data, and discourage the use of Pearson's correlation coefficient.

Khalid S. Khan, Patrick F.W. Chien *
Departments of Obstetrics and Gynaecology, Birmingham Women's Hospital, UK and Departments of Obstetrics and Gynaecology, Ninewells Hospital, Dundee, UK

References

1. Koran LM. The reliability of clinical methods, data and judgements [two parts]. N Engl J Med 1975;293:642–
2. Department of Clinical Epidemiology and Biostatistics. Clinical disagreement: I. How often it occurs and why. Can Med Assoc J 1980;123:499–
3. Khan KS, Khan SF, Nwosu CR, Chien PFW. Misleading authors' inferences in obstetric diagnostic test literature. Am J Obstet Gynecol 1999;181:112–
4. Nwosu CR, Khan KS, Chien PFW, Honest MR. Is real-time ultrasonic bladder volume estimation reliable and valid? A systematic overview. Scand J Urol Nephrol 1998;32:325–
5. Khan KS, Chien PFW, Honest MR. Evaluating the measurement variability of clinical investigations: the case of ultrasonic estimation of urinary bladder volume. Br J Obstet Gynaecol 1997;104:1036–
6. Chien PFW, Khan KS, Ogston S, Owen P. The diagnostic accuracy of cervico-vaginal fetal fibronectin in predicting preterm delivery: an overview. Br J Obstet Gynaecol 1997;104:436–
7. Chien PFW, Arnott N, Gordon A, Owen P, Khan KS. How useful is uterine artery Doppler flow velocimetry in the prediction of preeclampsia, intrauterine growth retardation and perinatal death? An overview. Br J Obstet Gynaecol 2000;107:196–
8. Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to Their Development and Use. New York: Oxford University Press,
9. Dunn G, Everitt B. Clinical Biostatistics. An Introduction to Evidence-Based Medicine. London: Edward Arnold,
10. Healy MJ. Measuring measuring errors. Stat Med 1989;8:893–
11. Fleiss JL. Measuring agreement between two judges on the presence or absence of a trait. Biometrics 1975;31:651–
12. Buchan I. Arcus QuickStat (Biomedical) Version 1.2. Cambridge: Addison Wesley Longman,
13. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–
14. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491–
15. Kramer MS, Feinstein AR. The biostatistics of concordance. Clin Pharmacol Ther 1981;29:111–123.
16. Thompson WG, Walter DW. A reappraisal of the kappa coefficient. J Clin Epidemiol 1988;41:949–
17. Feinstein AR, Cicchetti DV. High agreement but low kappa: I. The problem of two paradoxes. J Clin Epidemiol 1990;43:543–
18. Cohen J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull 1968;70:213–
19. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;i:307–
20. Farrell T, Leslie JR, Chien PFW, Agustsson P. The reliability and validity of three dimensional ultrasound volumetric measurements using an in vitro balloon and in vivo uterine model. Br J Obstet Gynaecol 2001;108:??.
21. Bartko JJ. The intraclass correlation coefficient as a measure of reliability. Psychol Rep 1966;19:3–
22. Fleiss JL, Cohen J. The equivalence of weighted kappa and intraclass correlation coefficient as measures of reliability. Educ Psychol Meas 1973;2:113–117.


COMPUTING READER AGREEMENT FOR THE GRE RM-00-8 R E S E A R C H M E M O R A N D U M COMPUTING READER AGREEMENT FOR THE GRE WRITING ASSESSMENT Donald E. Powers Princeton, New Jersey 08541 October 2000 Computing Reader Agreement for the GRE Writing

More information

Figure 1: Design and outcomes of an independent blind study with gold/reference standard comparison. Adapted from DCEB (1981b)

Figure 1: Design and outcomes of an independent blind study with gold/reference standard comparison. Adapted from DCEB (1981b) Page 1 of 1 Diagnostic test investigated indicates the patient has the Diagnostic test investigated indicates the patient does not have the Gold/reference standard indicates the patient has the True positive

More information

Unequal Numbers of Judges per Subject

Unequal Numbers of Judges per Subject The Reliability of Dichotomous Judgments: Unequal Numbers of Judges per Subject Joseph L. Fleiss Columbia University and New York State Psychiatric Institute Jack Cuzick Columbia University Consider a

More information

Bladder neck mobility in continent nulliparous women

Bladder neck mobility in continent nulliparous women British Journal of Obstetrics and Gynaecology March 2001, Vol. 108, pp. 320±324 Bladder neck mobility in continent nulliparous women Ursula M. Peschers a, *, Gabi Fanger b, Gabriel N. Schaer c, David B.

More information

reproducibility of the interpretation of hysterosalpingography pathology

reproducibility of the interpretation of hysterosalpingography pathology Human Reproduction vol.11 no.6 pp. 124-128, 1996 Reproducibility of the interpretation of hysterosalpingography in the diagnosis of tubal pathology Ben WJ.Mol 1 ' 2 ' 3, Patricia Swart 2, Patrick M-M-Bossuyt

More information

Comparison of the Null Distributions of

Comparison of the Null Distributions of Comparison of the Null Distributions of Weighted Kappa and the C Ordinal Statistic Domenic V. Cicchetti West Haven VA Hospital and Yale University Joseph L. Fleiss Columbia University It frequently occurs

More information

COMMITMENT &SOLUTIONS UNPARALLELED. Assessing Human Visual Inspection for Acceptance Testing: An Attribute Agreement Analysis Case Study

COMMITMENT &SOLUTIONS UNPARALLELED. Assessing Human Visual Inspection for Acceptance Testing: An Attribute Agreement Analysis Case Study DATAWorks 2018 - March 21, 2018 Assessing Human Visual Inspection for Acceptance Testing: An Attribute Agreement Analysis Case Study Christopher Drake Lead Statistician, Small Caliber Munitions QE&SA Statistical

More information

A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) *

A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) * A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) * by J. RICHARD LANDIS** and GARY G. KOCH** 4 Methods proposed for nominal and ordinal data Many

More information

An update on the analysis of agreement for orthodontic indices

An update on the analysis of agreement for orthodontic indices European Journal of Orthodontics 27 (2005) 286 291 doi:10.1093/ejo/cjh078 The Author 2005. Published by Oxford University Press on behalf of the European Orthodontics Society. All rights reserved. For

More information

alternate-form reliability The degree to which two or more versions of the same test correlate with one another. In clinical studies in which a given function is going to be tested more than once over

More information

Editorial. An audit of the editorial process and peer review in the journal Clinical Rehabilitation. Introduction

Editorial. An audit of the editorial process and peer review in the journal Clinical Rehabilitation. Introduction Clinical Rehabilitation 2004; 18: 117 124 Editorial An audit of the editorial process and peer review in the journal Clinical Rehabilitation Objective: To investigate the editorial process on papers submitted

More information

02a: Test-Retest and Parallel Forms Reliability

02a: Test-Retest and Parallel Forms Reliability 1 02a: Test-Retest and Parallel Forms Reliability Quantitative Variables 1. Classic Test Theory (CTT) 2. Correlation for Test-retest (or Parallel Forms): Stability and Equivalence for Quantitative Measures

More information

2 Philomeen Weijenborg, Moniek ter Kuile and Frank Willem Jansen.

2 Philomeen Weijenborg, Moniek ter Kuile and Frank Willem Jansen. Adapted from Fertil Steril 2007;87:373-80 Intraobserver and interobserver reliability of videotaped laparoscopy evaluations for endometriosis and adhesions 2 Philomeen Weijenborg, Moniek ter Kuile and

More information

English 10 Writing Assessment Results and Analysis

English 10 Writing Assessment Results and Analysis Academic Assessment English 10 Writing Assessment Results and Analysis OVERVIEW This study is part of a multi-year effort undertaken by the Department of English to develop sustainable outcomes assessment

More information

Repeatability of a questionnaire to assess respiratory

Repeatability of a questionnaire to assess respiratory Journal of Epidemiology and Community Health, 1988, 42, 54-59 Repeatability of a questionnaire to assess respiratory symptoms in smokers CELIA H WITHEY,' CHARLES E PRICE,' ANTHONY V SWAN,' ANNA 0 PAPACOSTA,'

More information

Teaching A Way of Implementing Statistical Methods for Ordinal Data to Researchers

Teaching A Way of Implementing Statistical Methods for Ordinal Data to Researchers Journal of Mathematics and System Science (01) 8-1 D DAVID PUBLISHING Teaching A Way of Implementing Statistical Methods for Ordinal Data to Researchers Elisabeth Svensson Department of Statistics, Örebro

More information

Agreement Coefficients and Statistical Inference

Agreement Coefficients and Statistical Inference CHAPTER Agreement Coefficients and Statistical Inference OBJECTIVE This chapter describes several approaches for evaluating the precision associated with the inter-rater reliability coefficients of the

More information

A novel approach to assess diagnostic test and observer agreement for ordinal data. Zheng Zhang Emory University Atlanta, GA 30322

A novel approach to assess diagnostic test and observer agreement for ordinal data. Zheng Zhang Emory University Atlanta, GA 30322 A novel approach to assess diagnostic test and observer agreement for ordinal data Zheng Zhang Emory University Atlanta, GA 30322 Corresponding address: Zheng Zhang, Ph. D. Department of Biostatistics

More information

Relationship Between Intraclass Correlation and Percent Rater Agreement

Relationship Between Intraclass Correlation and Percent Rater Agreement Relationship Between Intraclass Correlation and Percent Rater Agreement When raters are involved in scoring procedures, inter-rater reliability (IRR) measures are used to establish the reliability of measures.

More information

Lessons in biostatistics

Lessons in biostatistics Lessons in biostatistics : the kappa statistic Mary L. McHugh Department of Nursing, National University, Aero Court, San Diego, California Corresponding author: mchugh8688@gmail.com Abstract The kappa

More information

Statistical techniques to evaluate the agreement degree of medicine measurements

Statistical techniques to evaluate the agreement degree of medicine measurements Statistical techniques to evaluate the agreement degree of medicine measurements Luís M. Grilo 1, Helena L. Grilo 2, António de Oliveira 3 1 lgrilo@ipt.pt, Mathematics Department, Polytechnic Institute

More information

(true) Disease Condition Test + Total + a. a + b True Positive False Positive c. c + d False Negative True Negative Total a + c b + d a + b + c + d

(true) Disease Condition Test + Total + a. a + b True Positive False Positive c. c + d False Negative True Negative Total a + c b + d a + b + c + d Biostatistics and Research Design in Dentistry Reading Assignment Measuring the accuracy of diagnostic procedures and Using sensitivity and specificity to revise probabilities, in Chapter 12 of Dawson

More information

A study of adverse reaction algorithms in a drug surveillance program

A study of adverse reaction algorithms in a drug surveillance program A study of adverse reaction algorithms in a drug surveillance program To improve agreement among observers, several investigators have recently proposed methods (algorithms) to standardize assessments

More information

Introduction On Assessing Agreement With Continuous Measurement

Introduction On Assessing Agreement With Continuous Measurement Introduction On Assessing Agreement With Continuous Measurement Huiman X. Barnhart, Michael Haber, Lawrence I. Lin 1 Introduction In social, behavioral, physical, biological and medical sciences, reliable

More information

DATA is derived either through. Self-Report Observation Measurement

DATA is derived either through. Self-Report Observation Measurement Data Management DATA is derived either through Self-Report Observation Measurement QUESTION ANSWER DATA DATA may be from Structured or Unstructured questions? Quantitative or Qualitative? Numerical or

More information

Maltreatment Reliability Statistics last updated 11/22/05

Maltreatment Reliability Statistics last updated 11/22/05 Maltreatment Reliability Statistics last updated 11/22/05 Historical Information In July 2004, the Coordinating Center (CORE) / Collaborating Studies Coordinating Center (CSCC) identified a protocol to

More information

