6 Critical reading

As mentioned in previous chapters, population-based studies are always affected by chance, and systematic biases can also distort their results. This raises the question of how much explanatory value a single epidemiological study has. How should we evaluate, for example, a study on the relationship between dietary patterns and the development of colorectal cancer? If a relationship is found, dietary recommendations might be derived from it. If the relationship is not real and people change their diet according to the study results, unnecessary costs arise, and the change may even be harmful.

To evaluate whether a relationship is causal, a list of criteria has been developed. At least some of these criteria have to be fulfilled before a causal relationship can be accepted. The following considerations apply to analytic as well as to descriptive studies. To reach a final conclusion, single studies are evaluated first. In a further step, the results of different studies on the same subject are summarized, and finally the causality of the relationship is assessed.

Learning objectives
In this chapter, students learn (1) how to deal with evidence from epidemiologic studies, (2) about literature reviews and meta-analyses, and (3) how causality criteria can be used to evaluate whether study results reflect a causal relationship.

6.1 Explanatory value of a study
To evaluate a single epidemiological article, we need to know and discuss the methods used in the underlying study.

6.1.1 Effect of chance in epidemiological studies
Chance plays a role in all studies that are based on population samples. Its effect can be quantified with statistical tests. As an example, consider the question whether night-time traffic noise increases the risk of high blood pressure. The significance level of a statistical test is usually set at 5%.
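The p-value from the statistical analysis is compared with this significance level: with a 5% significance level, a result is called statistically significant if the p-value is below 0.05. This means that if there is in reality no association, we still have a 5% probability of wrongly concluding from our analysis that an association exists (a type I error, or false positive result). In other words: even if night-time traffic noise has no effect at all on blood pressure, about 5 out of 100 such studies would show a statistically significant association purely by chance. Note that a significance test only concerns an association; whether an association is causal has to be judged separately (see section 6.3).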
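The meaning of the 5% significance level can be checked with a small simulation. The sketch below is hypothetical throughout: it assumes two groups of 400 people each (exposed and unexposed to night-time traffic noise), a true risk of high blood pressure of 30% in both groups (so noise has no effect by construction), and a two-sided two-proportion z-test as the statistical test. None of these choices comes from the text; they only serve to count how often p < 0.05 occurs when no association exists.

```python
import math
import random


def two_proportion_p_value(cases_1, n_1, cases_0, n_0):
    """Two-sided p-value of a two-proportion z-test (normal approximation)."""
    p_1, p_0 = cases_1 / n_1, cases_0 / n_0
    pooled = (cases_1 + cases_0) / (n_1 + n_0)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_0))
    if se == 0:  # no variation at all (0% or 100% risk in both groups)
        return 1.0
    z = (p_1 - p_0) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def false_positive_rate(n_studies=2000, n_per_group=400, risk=0.30,
                        alpha=0.05, seed=1):
    """Share of simulated studies with p < alpha although the exposure has
    NO effect: exposed and unexposed share the same (hypothetical) risk."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_studies):
        cases_exposed = sum(rng.random() < risk for _ in range(n_per_group))
        cases_unexposed = sum(rng.random() < risk for _ in range(n_per_group))
        p = two_proportion_p_value(cases_exposed, n_per_group,
                                   cases_unexposed, n_per_group)
        if p < alpha:
            significant += 1
    return significant / n_studies


rate = false_positive_rate()
print(f"'significant' studies despite no real effect: {rate:.1%}")
```

Roughly 5% of the simulated studies come out "significant" even though no association exists; with another seed the share varies a little around 5%.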

46 Introduction in Epidemiology

Another way to evaluate effect estimates is to look at the confidence interval. The confidence interval gives a range of values for the true value in the source population that are compatible with the study data. Like the p-value, a confidence interval is linked to a probability, usually 95% (written as 95% CI). The 95% refers to the method, not to one single interval: if the study were repeated many times, the results would differ from study to study, and about 95% of the confidence intervals calculated in this way would contain the true value (for example, the true risk or the true proportion in the population). Whether one particular interval contains the true value cannot be known.

Even though the concepts behind significance tests and confidence intervals differ, they are connected: a 95% confidence interval corresponds to a 5% significance level. For a relative risk, for example, a 95% CI that does not include 1 generally corresponds to a result that is statistically significant at the 5% level. If we want to be more certain, we can choose a 1% significance level with a corresponding 99% confidence interval, at the price of a wider interval. The choice depends on the study question and on how precise the study result needs to be.

6.1.2 Effect of the number of cases/participants
The number of participants strongly affects how informative a study is. To explain this, we use the board game "Ludo", which is played with a die. Everybody who knows the game might also know the feeling: there is never a six on my die, only on everybody else's! As a first step, we want to test whether this die really has a shifted centre of gravity and never shows a six. You throw the die 4 times and there is no 6, which is of course quite likely. We can also calculate this: the probability of a 6 on a fair die is 1/6, so the probability of not getting a 6 is $1 - 1/6 = 5/6$. The probability of no 6 in 4 throws is $(5/6) \times (5/6) \times (5/6) \times (5/6) = (5/6)^4 \approx 0.48$. So even with a fair die, the probability of no 6 in four throws is about 50%. After these four throws, nobody would believe that the die has lost its six.
In the next step, we throw the die 16 more times, so that we have 20 throws altogether, and still no 6 shows up. The probability of this with a fair die is $(5/6)^{20} \approx 0.026$, or 2.6%, which is very low and below the usual significance level of 5%. So we have a statistically significant result in favour of an unfair die. This is exactly what significance means: the probability calculated here is the probability that the observed result (no 6) occurs by chance alone, i.e., even though the die is fair and there is no real effect. If this probability is below 0.05, we consider chance an unlikely explanation and call the result statistically significant. To learn more about how to calculate a suitable number of participants, see chapter 7.2.2.

In our last test we try even harder and throw the same die 100 times in total. The probability of no 6 in 100 throws of a fair die is $(5/6)^{100} \approx 0.000000012$. This is far below the significance level, so the result is highly significant. Now we can be practically certain that the die is unfair and can throw it away! But we might also ask whether it was really necessary to throw 100 times. In a game of Ludo, this would mean that all opponents already have their pieces in their home column and the game is over, while you are still trying to bring your first piece into play. Translated to an epidemiological study, this means that our study population was larger than necessary. If there is an association, such a study will find it, but many people were recruited and spent their time unnecessarily, and the study was more expensive and time-consuming than needed.
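The dice calculations above are easy to reproduce. The following minimal sketch (function and variable names are illustrative) computes the probability of no six for 4, 20 and 100 throws of a fair die, and also solves (5/6)^n < 0.05 for the smallest number of throws at which "no six at all" would already be significant at the 5% level.

```python
import math

P_NO_SIX = 5 / 6  # probability that one throw of a fair die is not a six


def prob_no_six(throws):
    """Probability that a fair die shows no six in `throws` independent throws."""
    return P_NO_SIX ** throws


for n in (4, 20, 100):
    print(f"{n:>3} throws: P(no six | fair die) = {prob_no_six(n):.2g}")

# Smallest n with (5/6)^n < alpha, i.e. n > log(alpha) / log(5/6).
alpha = 0.05
min_throws = math.floor(math.log(alpha) / math.log(P_NO_SIX)) + 1
print("throws needed for a significant result:", min_throws)
```

Run as written, it prints 0.48, 0.026 and 1.2e-08 for 4, 20 and 100 throws, matching the text, and 17 as the minimum: 17 throws without a six would already have been significant, so the 100 throws of the last step were far more than needed.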

We conclude that the number of participants is important information when evaluating the result of a study. When reading an article, you should also consider whether the study had enough power, i.e., whether it was large enough to detect an existing effect as statistically significant.

6.1.3 Effect of misclassification
Another important point is how, and how well, exposure and outcome were assessed in a study. As explained in chapter 5.2 (information bias), it is not always easy to obtain the desired information. For example, there can be random errors in classification (non-differential misclassification): exposed persons are classified as unexposed, or vice versa, with the same probability regardless of their disease status. In most situations, non-differential misclassification biases the effect estimate towards the null, i.e., the estimated association is weaker than the true one. In contrast, if, for example, diseased persons remember their exposure better than healthy persons, the errors depend on disease status; this is called differential misclassification. It can bias the effect estimate in either direction, which makes it very difficult to evaluate the real association. A very common example of differential misclassification is recall bias in case-control studies: cases might recall their exposure better than controls because they are more concerned with the topic and think more thoroughly about the exposure.

6.1.4 Effect of confounders
As explained earlier, in addition to random variation there are systematic factors, such as confounders, that can influence the result of a study. Unlike random variation, their effect cannot be estimated with the help of the significance level and the confidence interval. Expert knowledge of the field of interest helps to identify possible confounders that should be considered in a study (compare chapters 5.3 and 5.4).
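The two systematic errors described in 6.1.3 and 6.1.4 can be made concrete with small calculations on hypothetical, noise-free 2×2 tables; all counts, sensitivities and specificities below are invented for illustration. The first part applies an imperfect exposure assessment to a case-control study, once with the same error rates in cases and controls (non-differential) and once with better recall among cases (differential). The second part shows a cohort in which age confounds the exposure-disease relationship and compares the crude risk ratio with the age-adjusted Mantel-Haenszel risk ratio, one standard way of adjusting for a confounder by stratification.

```python
# Expected (noise-free) counts are used, so only the systematic error shows.

def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Exposure odds ratio of a case-control study."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)


def recorded(truly_exposed, truly_unexposed, sensitivity, specificity):
    """Expected numbers recorded as (exposed, unexposed) by an imperfect
    exposure assessment, e.g. a questionnaire."""
    rec_exposed = sensitivity * truly_exposed + (1 - specificity) * truly_unexposed
    rec_unexposed = truly_exposed + truly_unexposed - rec_exposed
    return rec_exposed, rec_unexposed


# Hypothetical truth: 200 cases (100 exposed), 200 controls (50 exposed).
true_or = odds_ratio(100, 100, 50, 150)  # (100*150) / (100*50) = 3.0

# Non-differential: the same sensitivity/specificity in cases and controls.
nondiff_or = odds_ratio(*recorded(100, 100, 0.80, 0.90),
                        *recorded(50, 150, 0.80, 0.90))

# Differential (recall bias): cases report past exposure more completely.
diff_or = odds_ratio(*recorded(100, 100, 0.95, 0.90),
                     *recorded(50, 150, 0.70, 0.90))


def crude_and_mh_risk_ratio(strata):
    """Crude and Mantel-Haenszel risk ratio for a cohort stratified by a
    confounder. Each stratum: (cases_exp, n_exp, cases_unexp, n_unexp)."""
    crude = (sum(s[0] for s in strata) / sum(s[1] for s in strata)) / (
        sum(s[2] for s in strata) / sum(s[3] for s in strata))
    numerator = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    denominator = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return crude, numerator / denominator


# Hypothetical cohort, confounder = age. Within each age group the risk is
# the same for exposed and unexposed (true RR = 1), but older people are
# more often exposed AND more often diseased.
young = (5, 100, 20, 400)  # risk 5% in both exposure groups
old = (80, 400, 20, 100)   # risk 20% in both exposure groups
crude_rr, adjusted_rr = crude_and_mh_risk_ratio([young, old])

print(f"true OR {true_or:.2f}, non-differential {nondiff_or:.2f}, "
      f"differential {diff_or:.2f}")
print(f"crude RR {crude_rr:.2f}, age-adjusted (MH) RR {adjusted_rr:.2f}")
```

In this example the true OR of 3.0 shrinks to about 2.2 under non-differential misclassification but grows to about 3.3 under differential misclassification. In the cohort, the crude RR of about 2.1 disappears completely (RR = 1) after stratifying by age, although nothing about chance or sample size has changed; a small p-value would not have revealed this bias.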
Confounders can be taken into account in the study design (for example by restriction or matching) and/or in the statistical analysis (for example by stratification or multivariable adjustment), and they should be named and explained in the article. Confounding is one reason why a single epidemiological study is not regarded as a very reliable source of evidence; an association is scientifically accepted only when several studies show similar results. If these studies were conducted in different settings, different confounders were probably considered, and it becomes less likely that one important confounder was neglected.

6.2 Literature reviews of epidemiological studies
An essential part of epidemiology is summarizing the results and discussions of different studies. This reduces the risk that confounders, random variation or measurement errors in single studies distort our view of an effect. There are different ways to summarize results. In a systematic literature review, existing literature is analyzed critically in a structured way to give a qualitative or semi-quantitative overview of the subject. A meta-analysis goes one step further and summarizes existing results quantitatively.

6.2.1 Systematic literature review
When planning a literature review, we need a clearly defined research question. To allow for a specific answer, the research question should be narrowed down as far as possible. The unit of analysis is the single article. It is important to formulate clear inclusion and exclusion criteria for the articles, for example regarding study design, data quality and study population. The literature search also requires a systematic approach: articles need to be identified and included systematically and in a transparent, reproducible way. Relevant literature databases, search terms (key words), language and year of publication need to be considered. A systematic approach also needs to be followed when selecting, evaluating and presenting data from the relevant articles.

6.2.2 Meta-analyses
Meta-analyses are sometimes categorized as a study design of their own. Usually, they do not use data on individual persons but work with the results of other studies. A meta-analysis summarizes several studies on the same question and generates an overall result that combines the results of those studies. Meta-analyses are based on the notion that single studies, without additional information from other sources, are not reliable enough to be used as scientific evidence. That is why several studies on one topic are usually combined. Considering the many factors that can influence a result, it is very unlikely that two studies find exactly the same result, even if they were conducted in the same way. Some variation occurs by chance, and the selected study populations may differ in their structure, which leads to different results. We therefore need to find out whether the results of two studies contradict each other, or whether the difference can be attributed to chance or to certain characteristics of the studies, so that the results are still compatible.
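Whether two study results are compatible can also be checked formally. The sketch below shows one common approach, not necessarily the one used in any particular meta-analysis: it recovers each study's log relative risk and standard error from its reported 95% CI (assuming the CI was computed symmetrically on the log scale) and tests whether the two log RRs differ by more than chance would explain. As input it uses two studies listed in figure 6.1 (Otani et al. 2003 and Wakai et al. 2005); the function names are illustrative.

```python
import math

Z_95 = 1.96  # standard normal quantile used for a 95% confidence interval


def log_rr_and_se(rr, lower, upper):
    """Log(RR) and its standard error recovered from a reported 95% CI,
    assuming the CI is symmetric on the log scale."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return math.log(rr), se


def compatibility_p_value(study_a, study_b):
    """Two-sided p-value for H0: both studies estimate the same true RR."""
    log_a, se_a = log_rr_and_se(*study_a)
    log_b, se_b = log_rr_and_se(*study_b)
    z = (log_a - log_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return math.erfc(abs(z) / math.sqrt(2))


otani = (1.44, 1.05, 1.98)  # RR (95% CI) of Otani et al. 2003, figure 6.1
wakai = (1.25, 0.90, 1.74)  # RR (95% CI) of Wakai et al. 2005, figure 6.1
p = compatibility_p_value(otani, wakai)
print(f"p = {p:.2f}")
```

For these two studies the sketch gives p ≈ 0.54, so the difference between RR 1.44 and RR 1.25 is well within what chance alone could produce. A non-significant result does not prove that two studies agree, though; with small studies such a test has little power.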
It is not necessary to find identical results, but the results should fit together. For most diseases, a study in an older population will find more diseased persons than a study in a younger population, regardless of the specific research question. The results of such studies might still be compatible, and a meta-analysis would try to formulate a combined result.

Example: A meta-analysis on alcohol consumption and colorectal cancer risk
A forest plot is a useful figure to display the results of a meta-analysis and to compare the results of the different studies together with their confidence intervals. The forest plot in figure 6.1 shows the relative risks from several cohort studies on the relationship between high alcohol consumption and colorectal cancer. The relative risks are fairly close to each other, and most of the confidence intervals overlap. This might lead to the conclusion that the differences between the study results are due to random variation in the single studies. Studies with wide confidence intervals are mostly small and carry a lot of uncertainty. They are given less weight in the estimation of the overall effect than larger studies with narrower confidence intervals. The weight of each study is indicated by the size of its square in the forest plot. This meta-analysis concludes that the results of the studies are compatible and that the combined relative risk is 1.62 (see the last line of figure 6.1). There is a significant relationship between alcohol consumption and risk of colorectal cancer. The meta-analysis compared people with high alcohol consumption with those with low or irregular consumption. The overall result shows that people with high consumption have a 62% higher risk of developing colorectal cancer than those with low or irregular consumption (RR 1.62; 95% CI 1.43 to 1.83).

Study                    Risk ratio (95% CI)    % Weight
Otani et al. 2003        1.44 (1.05, 1.98)      16.6
Pedersen et al. 2003     1.12 (0.84, 1.50)      21.4
Wakai et al. 2005        1.25 (0.90, 1.74)      16.6
Akhter et al. 2007       1.74 (1.21, 2.51)      12.9
Ferrari et al. 2007      2.25 (1.78, 2.85)      19.5
Lim & Parker 2008        1.52 (0.79, 2.95)       3.1
Thygesen et al. 2008     2.22 (1.57, 3.15)       9.9
Overall                  1.62 (1.43, 1.83)

Figure 6.1: Forest plot with the newly calculated pooled results from the meta-analysis on high alcohol consumption and risk of colorectal cancer (data taken from Fedirko et al. 2010; x-axis: risk ratio on a logarithmic scale).

Figure 6.2: Flow chart showing the selection of publications in a meta-analysis (Fedirko et al. 2010).
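The weighting idea described above (more precise studies count more) can be sketched with an inverse-variance fixed-effect model applied to the values listed in figure 6.1. This is an assumption for illustration: the text does not say which pooling model was used for figure 6.1, and the standard errors here are recovered from the rounded, published CIs.

```python
import math

Z_95 = 1.96

# (study, RR, lower 95% limit, upper 95% limit) as listed in figure 6.1
STUDIES = [
    ("Otani et al. 2003", 1.44, 1.05, 1.98),
    ("Pedersen et al. 2003", 1.12, 0.84, 1.50),
    ("Wakai et al. 2005", 1.25, 0.90, 1.74),
    ("Akhter et al. 2007", 1.74, 1.21, 2.51),
    ("Ferrari et al. 2007", 2.25, 1.78, 2.85),
    ("Lim & Parker 2008", 1.52, 0.79, 2.95),
    ("Thygesen et al. 2008", 2.22, 1.57, 3.15),
]


def fixed_effect_pool(studies):
    """Inverse-variance fixed-effect pooling on the log-RR scale.
    Returns pooled RR, its 95% CI limits and each study's % weight."""
    weights, log_rrs = [], []
    for _, rr, lower, upper in studies:
        se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
        weights.append(1 / se ** 2)  # narrower CI -> larger weight
        log_rrs.append(math.log(rr))
    total = sum(weights)
    pooled_log = sum(w * x for w, x in zip(weights, log_rrs)) / total
    pooled_se = math.sqrt(1 / total)
    pct = [100 * w / total for w in weights]
    return (math.exp(pooled_log),
            math.exp(pooled_log - Z_95 * pooled_se),
            math.exp(pooled_log + Z_95 * pooled_se),
            pct)


rr, lo, hi, pct = fixed_effect_pool(STUDIES)
print(f"pooled RR {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
for study, weight in zip(STUDIES, pct):
    print(f"{study[0]:<22}{weight:5.1f}%")
```

The sketch gives a pooled RR of about 1.64, close to but not identical with the 1.62 (1.43, 1.83) of figure 6.1, and its percentage weights differ from the % Weight column (here Ferrari et al. 2007 receives the largest weight). An exact match should not be expected: the published CIs are rounded, and random-effects models, which add a between-study variance to each study's variance, are a common alternative to the fixed-effect model used here.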

Conducting a meta-analysis is a lot of work. All relevant studies have to be identified in suitable databases, and their quality has to be evaluated. Figure 6.2 (taken from the same article) illustrates the necessary steps. Meta-analyses are important instruments for developing recommendations for action or prevention programs from the results of epidemiological studies. All relevant studies have to be analyzed, and their results have to be weighted according to their precision to obtain valid information in the end. It is important to check whether there are also studies that did not find a relationship between the exposure and the outcome of interest. Such studies are published less often and are therefore harder to find than studies with positive results (i.e., with a significant effect); this is known as publication bias.

6.3 Causality criteria for epidemiological studies
Meta-analyses considerably reduce the probability of a wrong conclusion, even though it cannot be avoided completely. Etiological research on the relationship between factors and diseases aims to confirm an association in several independent, carefully conducted studies. To evaluate whether an association is causal, the nine criteria proposed by Sir Austin Bradford Hill in 1965 are widely applied; they are presented here. The criteria have no fixed rank, and different textbooks present them in different orders. The following order is based on Rothman (2008); generally, the first five criteria are regarded as the most important ones.

1) Consistency: If several independent epidemiological studies show a similar effect, the results are consistent. It is an additional advantage if these studies were conducted in different study populations, because different populations are affected by different confounders, so a consistent effect is less likely to be explained by any single one of them.
If the study populations differ, we cannot expect exactly the same effect, but the results have to be compatible, as explained earlier in this chapter.

2) Strength: The strength of the association is another important indicator. The stronger an association (for example, the higher the relative risk), the less likely it is to be explained entirely by bias or by an unrecognized confounder. Today it is an accepted fact that smoking causes lung cancer: with an RR of 8, this effect is clear and strong, and such an increase in risk remains convincing even when we consider possible confounders and differences between study populations and study designs. But such strong effects are rather rare; in environmental epidemiology, for example, RRs of 1.5 or lower are more common. The strength of an association is therefore an important but not a necessary criterion. Some associations between an exposure and an outcome are not strong but still important, such as passive smoking and lung cancer.

3) Temporality: The temporal relationship between exposure and outcome is one of the most important causality criteria. The exposure has to come first and the outcome has to occur later. If the temporal order cannot be clearly established, this criterion is not fulfilled.

4) Dose-response relationship: Another strong indicator of causality is a dose-response relationship (also called biological gradient). The concept behind it is: the higher the dose of a pathogenic factor, the stronger its effect. However, this is not an absolute requirement. Some substances, for example, are beneficial at low doses but harmful at higher doses; this has been suggested for alcohol.

5) Biological plausibility: Even if epidemiological studies do not focus on biological mechanisms, their results should be compatible with existing biological and medical knowledge. If a relationship is biologically plausible, this is another hint of causality. If there is biological evidence that contradicts the epidemiological results, we should not assume causality. However, biological plausibility is not a necessary condition for accepting epidemiological evidence. One strength of epidemiology is that we do not need biological models to find relationships between variables. We can go beyond existing knowledge and open up new fields in which there is no biological research so far. Remember John Snow's study (explained in chapter 1) on polluted drinking water and cholera in London, which provided important evidence of a causal relationship before the biological mechanisms were known.

6) Specificity: Another argument for causality is a specific effect of a factor, meaning that the factor causes one particular disease only. Smoking, for example, would then cause lung cancer and nothing else. Even though the association between smoking and lung cancer is well established, the effect is not specific, since smoking also leads to other health impairments such as other kinds of cancer or cardiovascular diseases. This criterion should therefore be interpreted with care.

7) Coherence: Coherence is another argument for a causal relationship, i.e., all scientific results fit together into a common picture. Are there results that conflict with the relationship found? If all available evidence agrees, we can assume that the relationship is real.

8) Experiment: If we can confirm a hypothesized relationship with an experiment, we have another hint of a causal relationship. In epidemiology, however, it is mostly not feasible to conduct controlled experiments, because our data come from humans and exposing them to potentially harmful factors would be unethical. For example, it is not possible to conduct an intervention study in which one group of people is exposed to environmental pollutants and a control group is not. That is why this criterion is usually interpreted as the confirmation of a suspected dose-response relationship in animal experiments.
Higher doses are tested under controlled conditions to confirm the biological plausibility of a relationship.

9) Analogy: Hill also considered analogy as a criterion. Is there a similar, already known relationship that could explain the result? This criterion is useful for developing new research questions. One example is the question of a possible health risk from radiofrequency electromagnetic fields. Studies on this relationship have only been published since the end of the 20th century; before that, no research had been done in this area. Studies on other types of electromagnetic fields were used to explain a potential relationship.

These criteria are conservative and should be applied with caution. Given the wide range of consequences drawn from assumed causal relationships, this is good advice: if large prevention programs are to be designed with many staff and resources, and many people are to change their habits to reduce their risk of disease, the evidence should be very clear!

Remember
Every single study and its methods should be evaluated critically. Important aspects are the number of participants and the quality of data collection on exposure, outcome and confounders. A single epidemiological study has limited explanatory value for an overall effect; further studies are necessary to answer a question precisely. Systematic literature reviews and meta-analyses are important because they combine the results of different studies. Several criteria have been developed to evaluate whether a relationship between a factor and a disease is causal. The most important criteria are consistency, strength, temporality, dose-response relationship and biological plausibility.

Further reading and online resources

Further reading (1):
Aschengrau A, Seage GR. Essentials of Epidemiology in Public Health. 2nd edition. Jones and Bartlett Publishers 2008. Relevant chapters: 14, 15.
Bhopal R. Concepts of Epidemiology. 2nd edition. Oxford University Press 2009. Relevant chapter: 5.
Rothman KJ, Greenland S, Lash TL (2008). Modern Epidemiology. 3rd edition. Lippincott Williams & Wilkins. Relevant chapter: 2.

Online sources:
Good epidemiological practice: IEA guidelines for proper conduct in epidemiologic research (International Epidemiological Association) http://www.ieaweb.org/index.php?option=com_content&view=article&id=15&Itemid=43&showall=1
CONSORT Statement http://www.consort-statement.org/
STROBE Statement: STrengthening the Reporting of OBservational studies in Epidemiology http://www.strobe-statement.org
EQUATOR Network: Enhancing the Quality and Transparency of Health Research http://www.equator-network.org

Further reading (2):

Scientific publications:
Fedirko V et al.: Alcohol drinking and colorectal cancer risk: an overall and dose-response meta-analysis of published studies. Ann Oncol (2011) 128 (10): 2473-2484.
Blettner et al.: Critical reading of epidemiological papers: a guide. Eur J Public Health (2001) 11 (1): 97-101.

Scientific guidelines:
Observational studies: von Elm E et al. and the STROBE Initiative: The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE). Bulletin of the World Health Organization 2007; 85: 867-872.
Intervention studies: CONSORT Statement: updated guidelines for reporting parallel group randomised trials. BMJ 2010; 340: 698-702.
Des Jarlais DC et al. and the TREND Group: Improving the reporting quality of nonrandomized evaluations of behavioural and public health interventions: the TREND statement. Am J Public Health 2004; 94: 361-366.
Systematic literature review and meta-analysis: Liberati A et al.: The PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses of Studies That Evaluate Health Care Interventions: Explanation and Elaboration. PLoS Med 2009; 6(7): e1000100.
Moher D et al. and the PRISMA Group: Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLoS Med 2009; 6(7): e1000097.
Stroup DF et al. (2000): Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group. JAMA 283(15): 2008-2012.
