ORIGINAL CONTRIBUTION

Completeness of Safety Reporting in Randomized Trials: An Evaluation of 7 Medical Areas

John P. A. Ioannidis, MD; Joseph Lau, MD

See also p 444.

Context: Randomized trials with adequate sample size offer an opportunity to assess the safety of new medications in a controlled setting; however, generalizable data on drug safety reporting are sparse.

Objective: To scrutinize the completeness of safety reporting in randomized trials.

Design, Setting, and Patients: Survey of safety reporting in 192 randomized drug trials on 7 diverse topics with sample sizes of at least 100 patients and at least 50 patients in a study arm (N=130 074 patients). Trial reports were identified from comprehensive meta-analyses in 7 medical areas.

Main Outcome Measures: Adequate reporting of specific adverse effects and frequency and reasons for withdrawals due to toxic effects; article space allocated to safety reporting and predictors of such reporting.

Results: Severity of clinical adverse effects and laboratory-determined toxicity was adequately defined in only 39% and 29% of trial reports, respectively. Only 46% of trials stated the frequency of specific reasons for discontinuation of study treatment due to toxicity. For these 3 parameters, there was significant heterogeneity in rates of adequate reporting across topics (P=.003, P<.001, and P=.02, respectively). Overall, the median space allocated to safety results was 0.3 page. A similar amount of space was devoted to contributor names and affiliations (P=.16). On average, the percentage of space devoted to safety in the results section was 9.3% larger in trials involving dose comparisons than in those that did not (P<.001) and 3.8% smaller in trials reporting statistically significant results for efficacy outcomes (P=.047).

Conclusions: The quality and quantity of safety reporting vary across medical areas, study designs, and settings, but they are largely inadequate. Current standards for safety reporting in randomized trials should be revised to address this inadequacy.

JAMA. 2001;285:437-443 www.jama.com

Randomized trials with adequate sample size offer a unique opportunity to assess the frequency and severity of adverse events in a controlled and objective setting, with the most comprehensive and systematic accumulation of pertinent information. Such information is important in estimating benefit-harm ratios in the application of medical interventions. However, compared with the heightened scrutiny of the conduct, analysis, and reporting of randomized trials in general, [1-3] assessment of the reporting of adverse events and toxicity has only recently drawn some attention. In a preliminary analysis of the quality of safety reporting, we observed that toxicity data may sometimes be presented erratically or may be missing altogether. [4] A limitation of this preliminary work was that it considered trials in only 1 medical domain, ie, drug therapy for human immunodeficiency virus (HIV) infection. However, the adequacy of safety reporting may be different in other medical areas. Therefore, in the present study, we extended our evaluation of safety reporting to 7 different areas of drug therapy. In doing so, we also sought to understand the settings and predictors that lead to suboptimal safety reporting, and to gain insight for improving these important deficiencies.

METHODS

Trial Databases

Randomized trials of drug therapies qualified for the analysis if they had a sample size of at least 100 patients and at least 50 patients allocated to a study arm. Smaller trials give very uncertain estimates for even the most frequent adverse events, and may completely miss even relatively common toxicity.
With 100 patients, when no subjects are observed to experience or report a specific adverse effect, the upper limit of the 95% confidence interval (CI) for the true frequency of this unobserved effect is still 3%. With 50 patients assigned to an experimental arm, the upper limit of the 95% CI is as high as 6%. [5]

Author Affiliations: Clinical Trials and Evidence-Based Medicine Unit, Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece (Dr Ioannidis); and Division of Clinical Care Research, Department of Medicine, New England Medical Center, Boston, Mass (Drs Ioannidis and Lau).
Corresponding Author and Reprints: Joseph Lau, MD, Division of Clinical Care Research, New England Medical Center, 750 Washington St, Box 63, Boston, MA 02111 (e-mail: JLau1@Lifespan.org).
© 2001 American Medical Association. All rights reserved. (Reprinted) JAMA, January 24/31, 2001, Vol 285, No. 4.

We analyzed safety reporting in randomized drug trials in the following 7 medical areas: (1) HIV therapy (all therapeutic trials, excluding immunization [passive immunotherapy, vaccines] and treatment of complications); (2) antibiotic therapy for acute sinusitis (all comparisons of antibiotics among themselves or with placebo or no therapy); (3) thrombolysis for acute myocardial infarction (AMI) (all comparisons of different thrombolytic regimens against placebo or no therapy); (4) use of nonsteroidal anti-inflammatory drugs (NSAIDs) for rheumatoid arthritis (comparisons of NSAIDs among themselves or with placebo or no therapy); (5) treatment of hypertension in elderly persons (all comparisons of antihypertensive regimens among themselves or with placebo or no therapy in this age group); (6) treatment of Helicobacter pylori with antibiotics (all comparisons of any antibiotic regimen until 1994 and all comparisons involving proton-pump inhibitors [omeprazole] with antibiotics until 1999); and (7) selective decontamination of the gastrointestinal tract (SDGIT) (all comparisons against placebo or no therapy).

These topics represent a diverse set of medical questions that have a significant public health impact (ie, the diseases are very common and/or have major morbidity). Also, a comprehensive list of the pertinent trials would be easily retrievable from systematic databases and/or meta-analyses performed by either our group or prior investigators. We purposely included topics spanning the acute and chronic care settings and both inpatient and outpatient settings to maximize generalizability. However, we focused on drug therapies and excluded surgical interventions and vaccines, since safety may be assessed differently for such interventions.

We identified the pertinent trials for each topic using several sources.
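As an aside on the sample-size thresholds quoted above (3% with 100 patients, 6% with 50), the figures can be reproduced with a short calculation. The sketch below is illustrative only: the text cites a reference for these limits without stating a formula, so the one-sided exact bound and the "rule of three" approximation used here are assumptions, and the function names are hypothetical.

```python
def upper_limit_zero_events(n, confidence=0.95):
    """One-sided exact upper confidence limit for an event rate
    when 0 of n patients experienced the event (assumed method)."""
    # With true rate p, the chance of seeing no events in n patients is (1 - p)**n.
    # The upper limit is the p at which that chance falls to 1 - confidence.
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


def rule_of_three(n):
    """Common approximation to the same limit: 3 / n."""
    return 3.0 / n


for n in (100, 50):
    exact = upper_limit_zero_events(n)
    approx = rule_of_three(n)
    print(f"n={n}: exact {exact:.1%}, rule of three {approx:.1%}")
# n=100: exact 3.0%, rule of three 3.0%
# n=50: exact 5.8%, rule of three 6.0%
```

Under these assumptions, a trial arm of 50 patients with no observed cases of an adverse effect is still compatible with a true frequency of roughly 1 in 17.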
For the HIV, sinusitis, AMI, hypertension, and H pylori topics, we used comprehensive databases of randomized controlled trials that have been developed by members of our research group as part of previous work on evidence reports, meta-analyses, and methodologic studies. [4,6-11] Typically, MEDLINE and EMBASE searches using the names of specific medications and a large array of terms pertaining to randomized trials were complemented with handsearching of journals specializing in the given area, identification of trials referenced in retrieved publications, and communications with experts. For these 5 topics, databases cover until late 1997 or early 1998 (1999 for H pylori). For the H pylori topic, we also retrieved trials from a meta-analysis published by a different team that addressed all trials of antibiotic regimens published until 1994. [12] For the rheumatoid arthritis and SDGIT topics, we used the trial databases from meta-analyses published by other teams. [13-15] These cover randomized trials published until 1993 [13,14] and 1989, [15] respectively. For the topic of hypertension in the elderly, we also consulted the respective systematic review in the Cochrane Library. [16] We did not systematically update these meta-analyses, since the aim of the project was not a thorough meta-analysis to estimate the treatment effects based on the most current data.

Evaluation and Analysis

Qualitative and Quantitative Parameters of Safety Reporting. We decided to use both qualitative and quantitative components of adverse event reporting; these components may offer complementary information. The list of these parameters was originally developed for the evaluation of HIV-related drug trials. [4] Although it is difficult to specify which aspects of safety reporting are most important, some aspects are probably indispensable if the reported information is to be used and interpreted for clinical purposes. First, data should be given with numbers.
Generic statements (eg, "few patients had side effects") cannot be objectively appraised and may be misinterpreted. Second, the severity of adverse effects should be stated and, at minimum, the frequency of severe or life-threatening toxic events should be provided per study arm. Standardized scales for grading toxicity and definitions for severity are both important for this purpose. Third, data should be given separately for each specific type of severe adverse effect so clinicians can determine what kind of harm is involved.

Based on the above, we selected the following 2 qualitative components: (1) whether the number of withdrawals and discontinuations of study treatment due to toxicity was reported, and whether the number was given for each specific type of adverse effect leading to withdrawal; and (2) whether the severity of the described clinical adverse events or abnormalities of laboratory tests (laboratory-determined toxicity) was adequately defined, only partially defined, or inadequately defined.

Adequate definition of severity requires either detailed description of the severity or reference to a known scale of toxicity severity (typically with grades being 1=mild, 2=moderate, 3=severe, 4=life-threatening), with separate reporting of at least severe or life-threatening events. At least 2 adverse effects (clinical or laboratory) have to be defined in this way, with numbers or rates given for each study arm. Partial definition of severity means that reports of severity combine moderate with severe or life-threatening toxicity counts, or that the number of severe or life-threatening toxicity cases is separately specified for only 1 of many reported clinical adverse events and laboratory abnormalities per study arm.
Inadequate definition of severity includes protocols reporting the total number of severe clinical or laboratory-determined toxic effects without giving numbers on specific types of adverse events per arm, those that lumped numbers for all grades of toxicity without separating any grades for any specific adverse events, those providing only generic statements, and those not reporting adverse effects at all. The common characteristic of all these situations is that information is missing on the frequency of severe adverse effects, information that is directly relevant for the estimation of benefit-harm ratios.

Quantitative measures assess the relative emphasis given to safety in the results of published trials. We specified these measures as the extent of space (in printed pages) devoted to safety in the results section, and the proportion it represents of the whole results section; and the space devoted to safety as compared with the space devoted to the names and affiliations of authors, participants, and contributors in the same trial report. The space allocated to toxicity is not necessarily correlated with the quality of reporting, but it complements the qualitative assessment, because it is an objective estimate of the relative importance that safety has in the overall clinical trial report. In our study, we measured the space for each section with a resolution of 0.05 page. When there were N columns per page and a printed page had a length of Y centimeters, a section spanning a length of S centimeters in 1 such column was calculated as occupying S/(N × Y) pages. Y refers to the printed area, excluding upper and lower margins.

Other Trial Parameters. We collected information on trial parameters that may have affected safety reporting.
In particular, we wanted to evaluate: (1) whether dose-comparison studies may place more emphasis on safety; (2) whether high-impact journals [17] and articles with significant results for efficacy used less space for safety; (3) whether safety reporting was given more emphasis in larger trials, long-term trials, or masked trials; (4) whether sponsorship, type of population, and location of the trial affected reporting; (5) whether safety was less emphasized for drugs that had already been used for a different indication; and (6) whether the first trial for a new indication devoted more emphasis to safety. Furthermore, we evaluated whether the situation improved over time. In making these evaluations, we noted several trial characteristics (TABLE 1). Additional information about the characteristics of the considered trials, the data extracted from the trials, and the trials included in the database is available at: http://www.nemc.org/dccr/projects/safetyreporting/supplements.htm.

Table 1. Characteristics of Included Trial Databases*

| Characteristic | HIV (n = 60) | Sinusitis (n = 41) | AMI (n = 37) | Arthritis (n = 15) | Hypertension | Helicobacter pylori (n = 19) | SDGIT | All (N = 192) |
|---|---|---|---|---|---|---|---|---|
| Sample size, median (interquartile range) | 381 (217-829) | 286 (160-397) | 315 (145-670) | 194 (150-222) | 862 (434-3669) | 154 (120-231) | 192 (111-324) | 286 (155-479) |
| Sample size, total | 36 820 | 11 990 | 53 739 | 3019 | 18 479 | 3872 | 2155 | 130 074 |
| Double-blind, No. (%) | 49 (82) | 15 (37) | 19 (51) | 15 (100) | 6 (60) | 8 (42) | 5 (50) | 117 (61) |
| Dose comparison only, No. (%) | 7 (12) | 5 (12) | 0 | 0 | 0 | 1 (19) | 0 | 13 (7) |
| Dose comparison involved, No. (%) | 18 (30) | 6 (15) | 0 | 5 (33) | 0 | 4 (21) | 0 | 33 (17) |
| Significant results for efficacy, No. (%) | 32 (53) | 6 (15) | 23 (62) | 13 (87) | 8 (80) | 13 (68) | 7 (70) | 102 (53) |
| Mainly government funding, No. (%) | 27 (45) | 2 (5) | 9 (24) | 2 (13) | 6 (60) | 1 (5) | 0 | 47 (24) |
| Follow-up 1 y, No. (%) | 30 (50) | 1 (2) | 4 (11) | 0 | 10 (100) | 3 (16) | 0 | 48 (25) |
| Pediatric population, No. (%) | 6 (10) | 4 (10) | 0 | 0 | 0 | 0 | 0 | 10 (5) |
| Prior use for other indication, No. (%) | 11 (18) | 35 (85) | 0 | 7 (43) | 10 (100) | 19 (100) | 10 (100) | 92 (48) |
| Most patients in the US, No. (%) | 37 (62) | 10 (24) | 4 (11) | 9 (60) | 2 (20) | 2 (11) | 1 (10) | 65 (34) |
| Year of publication, mean (range) | 1994 (1987-1997) | 1990 (1971-1998) | 1984 (1969-1993) | 1980 (1967-1987) | 1985 (1972-1997) | 1996 (1991-1999) | 1991 (1989-1992) | 1990 (1967-1999) |
| Journal with impact factor >7, No. (%) | 34 (57) | 1 (2) | 17 (46) | 0 | 6 (60) | 3 (16) | 4 (40) | 65 (34) |
| First trial with 100 patients, No. (%) | 20 (33) | 16 (39) | 5 (37) | 13 (87) | 1 (10) | 9 (47) | 1 (10) | 65 (34) |

*HIV indicates human immunodeficiency virus; AMI, acute myocardial infarction; SDGIT, selective decontamination of the gastrointestinal tract. Significant results indicated by P<.05. Journal impact factor is the average number of citations received per year by an article published in a certain journal, within the 2 years following the year of its publication. Adults, children, and pregnant women count as different populations.

Regression Analyses

All characteristics listed in Table 1 were also used as predictors in least-squares regression models using the percentage of space devoted to safety reporting in the results section as the dependent variable. We estimated univariate models for each predictor adjusting for medical area (with dummy variables for medical areas). We also considered interaction terms between covariates and medical areas, but they did not improve model fit substantially. Multivariate models were also considered, either using all variables that were significant (P<.05) in univariate analyses by forced entry, or starting from all variables with P<.25 on the univariate analyses and stepwise eliminating variables with P>.10 in the resulting models. The results were similar when univariate regressions were performed separately in each field and regression coefficients for each predictor were combined with general variance models. [18]

We also performed logistic regressions to identify predictors of adequate reporting of clinical adverse events and laboratory-determined toxicity. All analyses were adjusted for medical area using dummy variables. Both univariate and multivariate models were considered. Again, we reached identical final results, whether considering multivariate models using all variables that were significant (P<.05) in univariate analyses by forced entry, or starting from all variables with P<.25 on the univariate analyses and stepwise eliminating variables with P>.10 in the resulting models.

Replication

Two independent data extractors separately evaluated the 60 trial reports of HIV drug therapy. We observed very high interrater agreement. For the assessment of the adequacy of reporting of clinical adverse effects and laboratory-defined toxicity, the coefficients were 0.72 and 0.85, respectively. There were no instances in which 1 extractor considered the reporting adequate and the other inadequate. Moreover, we observed no important discrepancies in the data extraction of quantitative parameters. Thereafter, 1 reviewer examined trial reports on the other 6 areas. Analyses were performed in SPSS (SPSS Inc, Chicago, Ill). P values are 2-tailed.

RESULTS

Characteristics of Eligible Trials

A total of 192 trials from 7 different medical areas (N=130 074 patients) were included (Table 1).
Trial characteristics differed across the selected areas. Trials with a sample size of more than 1000 patients had been performed only in HIV therapy and thrombolysis for AMI. A total of 117 trials were double blind, but the percentage of double-blind trials across areas varied from 37% to 100%. Trials involving dose comparisons were available in 4 areas. The large majority of the trials on acute sinusitis showed no statistically significant differences for efficacy, while statistically significant results were more common than nonsignificant results in the other areas. The percentage of trials with government funding varied from 0% to 60%. The percentage of trials with long-term follow-up ranged from 0% to 100%. Children were evaluated in the areas of HIV and sinusitis. There was wide variability in the proportion of trials where the drugs had been already used for another indication (0% to 100%). Trials had been conducted both in the United States and elsewhere and they covered a wide range of publication years. The percentage of trials published in journals with an impact factor higher than 7 ranged from 0% to 60% across medical areas. Overall, in accordance with our aim, the large diversity in trial characteristics across medical areas ensured the generalizability of the results.

Qualitative Assessment of Safety Reporting

Only 39% of trials had adequate reporting of clinical adverse effects and only 29% had adequate reporting of laboratory-determined toxicity. A further 11% (clinical adverse effects) and 8% (laboratory-determined toxicity) had partially adequate reporting. The numbers of discontinuations due to toxicity per study arm were mentioned in 75% of the trial reports, but specific reasons for these discontinuations were given only 46% of the time. For all these outcomes, there was statistically significant heterogeneity for the rates of adequate reporting (vs partially adequate and inadequate reporting combined) between the 7 medical areas (TABLE 2).
Table 2. Qualitative Parameters of Safety Reporting*

Values are No. (%) of trials.

| Safety Reporting | HIV (n = 60) | Sinusitis (n = 41) | AMI (n = 37) | Arthritis (n = 15) | Hypertension | Helicobacter pylori (n = 19) | SDGIT | All (N = 192) | P Value for Heterogeneity |
|---|---|---|---|---|---|---|---|---|---|
| Clinical adverse events | | | | | | | | | .003 |
| Adequate reporting | 19 (32) | 17 (42) | 23 (62) | 8 (53) | 2 (20) | 6 (32) | 0 | 75 (39) | |
| Partially adequate reporting | 12 (20) | 0 | 6 (16) | 1 (7) | 0 | 3 (16) | 0 | 22 (11) | |
| Inadequate reporting | 29 (48) | 24 (58) | 8 (22) | 6 (40) | 8 (80) | 10 (52) | 10 (100) | 95 (50) | |
| Laboratory-defined toxicity | | | | | | | | | <.001 |
| Adequate reporting | 37 (62) | 11 (27) | 2 (5) | 5 (33) | 1 (10) | 0 | 0 | 56 (29) | |
| Partially adequate reporting | 8 (13) | 2 (4) | 1 (3) | 3 (20) | 1 (10) | 0 | 0 | 15 (8) | |
| Inadequate reporting | 15 (25) | 28 (69) | 34 (92) | 7 (47) | 8 (80) | 19 (100) | 10 (100) | 121 (63) | |
| Discontinuations due to toxicity | | | | | | | | | |
| Number per arm given | 49 (82) | 39 (95) | 17 (46) | 15 (100) | 4 (40) | 16 (84) | 3 (30) | 143 (75) | .001 |
| Reasons per arm given | 23 (38) | 28 (68) | 15 (41) | 9 (60) | 3 (30) | 8 (42) | 2 (20) | 88 (46) | .02 |

*HIV indicates human immunodeficiency virus; AMI, acute myocardial infarction; SDGIT, selective decontamination of the gastrointestinal tract. P values reported for comparison of adequate vs partially adequate and inadequate reporting across the 7 topics.
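To make the heterogeneity comparison in Table 2 concrete, the sketch below tests adequate vs not adequate reporting of clinical adverse events across the 7 areas, using the counts from the table. The article does not say which test produced its P values, so the Pearson chi-square test used here is an assumption, and the function names are hypothetical. The tail probability uses the closed form that exists for an even number of degrees of freedom, which avoids any third-party dependency.

```python
import math

# Clinical adverse events counts from Table 2, in column order:
# HIV, sinusitis, AMI, arthritis, hypertension, H pylori, SDGIT.
adequate = [19, 17, 23, 8, 2, 6, 0]
partial = [12, 0, 6, 1, 0, 3, 0]
inadequate = [29, 24, 8, 6, 8, 10, 10]
totals = [a + p + i for a, p, i in zip(adequate, partial, inadequate)]


def chi_square_heterogeneity(successes, group_sizes):
    """Pearson chi-square statistic for a 2 x k table (assumed test)."""
    pooled = sum(successes) / sum(group_sizes)
    stat = sum(
        (s - n * pooled) ** 2 / (n * pooled * (1.0 - pooled))
        for s, n in zip(successes, group_sizes)
    )
    return stat, len(group_sizes) - 1


def chi_square_upper_tail(x, df):
    """Upper-tail probability of a chi-square variable; even df only."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # builds (x/2)**k / k!
        total += term
    return math.exp(-half) * total


stat, df = chi_square_heterogeneity(adequate, totals)
p_value = chi_square_upper_tail(stat, df)
print(f"chi-square = {stat:.2f}, df = {df}, P = {p_value:.4f}")
# chi-square = 19.44, df = 6, P = 0.0035
```

Under this assumed test the result is close to the P=.003 listed in the table, which suggests, but does not confirm, that a comparable test was used.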

Of the 95 trials with inadequate reporting of clinical adverse events, 3 gave the total number of serious or life-threatening events but failed to specify their types, 52 gave numbers for various adverse effects but without separating severe adverse events, 11 only offered generic statements without specific numbers, and 29 did not report specifically on clinical adverse effects (although 12 of these mentioned discontinuations due to adverse effects, but no other relevant information). Among the 121 trials with inadequate reporting of laboratory-determined toxicity, 1 gave the total number of serious or worse toxicity but failed to specify its types, 8 gave numbers for various adverse effects but without separating severe toxicity, 16 offered generic statements without specific numbers, and 96 did not report anything on toxicity.

Quantitative Parameters of Safety Reporting

The absolute space allocated to safety information was limited (median, 0.3 page; interquartile range, 0.1-0.7 page). The median was less than half a page in all areas except for arthritis trials (TABLE 3). A similar picture emerged when we studied the percentage of the results dedicated to safety. Overall, the space given to safety information was the same as or less than the space given for the names of authors and their affiliations. In 92 trials the authors/affiliations space was larger than the safety space, in 21 trials it was similar (within 0.05 of the length of a page), and in 79 trials safety reporting took more space (P=.16 by Wilcoxon test). Safety reporting took more space than the names of authors in trials of therapy for sinusitis (P<.001) and rheumatoid arthritis (P<.001), but it took less space than the names of authors in trials of SDGIT (P=.002) and HIV therapy (P=.06). More than half the trials included at least 1 table for safety information, while only 5% of reports included figures for such information.
Regression Analyses

In univariate analyses, the percentage of space in the results section devoted to safety was significantly larger in trials also making dose comparisons; similarly, the amount of space was larger in trials involving only dose comparisons. Conversely, emphasis on safety decreased when the trial found statistically significant results for efficacy (TABLE 4). There were also trends for more emphasis on safety in double-blind trials, and less emphasis on safety in trials studying drugs with a prior indication, but these were not significant. The results of multivariate models were consistent with the univariate findings (Table 4).

In univariate regressions, the odds of adequate reporting of clinical adverse effects were 2.83-fold higher (95% CI, 1.34-6.00) in double-blind trials vs single-blind or unmasked trials, and they increased 4.14-fold (95% CI, 1.57-10.9) for each 10-fold increase in sample size. Reporting also improved over time (the odds increased 1.07-fold every year [95% CI, 1.00-1.14]). On the contrary, long-term trials were probably less likely to have adequate reporting of clinical adverse events than short-term trials (odds ratio [OR], 0.40; 95% CI, 0.16-1.01). Trends were also observed for better clinical reporting in trials in which dose comparisons were involved (OR, 1.67; 95% CI, 0.73-3.81), and worse clinical reporting in trials where there was already a prior indication for the tested medication (OR, 0.50; 95% CI, 0.18-1.39). The results of the final multivariate model were similar (double-blinding: OR, 2.51; 95% CI, 1.13-5.57; per 10-fold increase in sample size: OR, 4.52; 95% CI, 1.51-13.6; per year: OR, 1.06; 95% CI, 0.99-1.13; long-term trials: OR, 0.27; 95% CI, 0.10-0.74).
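The odds ratios above are expressed per 10-fold increase in sample size and per calendar year, which can be hard to read at a glance. The sketch below rescales the two univariate point estimates to other increments. It assumes a standard logistic model in which sample size enters on a log10 scale and publication year enters linearly; the article does not spell out this coding, and the function name is hypothetical. Only point estimates are rescaled.

```python
import math


def rescale_odds_ratio(or_per_unit, units):
    """Odds ratio for a change of `units` in a predictor, given the odds
    ratio per 1-unit change (effects are multiplicative on the odds scale)."""
    return or_per_unit ** units


# Univariate point estimates quoted in the text.
OR_PER_TENFOLD_SAMPLE_SIZE = 4.14
OR_PER_YEAR = 1.07

# Doubling the sample size is log10(2) of a 10-fold step (assumed log10 coding).
or_doubling = rescale_odds_ratio(OR_PER_TENFOLD_SAMPLE_SIZE, math.log10(2))
# Ten publication years are 10 one-year steps (assumed linear coding).
or_decade = rescale_odds_ratio(OR_PER_YEAR, 10)

print(f"OR per doubling of sample size: {or_doubling:.2f}")  # 1.53
print(f"OR per 10 years of publication: {or_decade:.2f}")    # 1.97
```

Read this way, and under the stated assumptions, a trial twice as large had about 1.5 times the odds of adequate clinical adverse event reporting, and a trial published a decade later had roughly twice the odds.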
Adequate reporting of laboratory-determined toxicity was less likely when there was a prior indication for the studied medication (OR, 0.33; 95% CI, 0.13-0.88) and possibly when the efficacy results reached statistical significance (OR, 0.61; 95% CI, 0.26-1.42) and in pediatric trials (OR, 0.39; 95% CI, 0.09-1.74). Adequate reporting of laboratory-determined toxicity was more likely in trials performed mostly in the United States (OR, 2.29; 95% CI, 1.04-5.04) and possibly when there was government funding (OR, 1.87; 95% CI, 0.77-4.57). In multivariate modeling, prior indication (OR, 0.33; 95% CI, 0.12-0.90) and US location (OR, 2.29; 95% CI, 1.03-5.10) remained significant independent predictors.

Table 3. Quantitative Parameters of Safety Reporting*

| Safety Reporting | HIV (n = 60) | Sinusitis (n = 41) | AMI (n = 37) | Arthritis (n = 15) | Hypertension | Helicobacter pylori (n = 19) | SDGIT | All (N = 192) |
|---|---|---|---|---|---|---|---|---|
| Extent of space for safety reporting | | | | | | | | |
| Absolute pages, median (IQR) | 0.5 (0.2-0.8) | 0.4 (0.15-0.9) | 0.2 (0.1-0.4) | 1.2 (0.8-1.3) | 0.05 (0-0.3) | 0.2 (0.05-0.35) | 0.05 (0-0.05) | 0.3 (0.1-0.7) |
| Percentage of results, median (IQR) | 13 (7-19) | 17 (7-23) | 11 (4-17) | 22 (18-31) | 3 (0-9) | 11 (5-22) | 0 (0-2) | 12 (5-22) |
| Relative emphasis on safety reporting | | | | | | | | |
| Safety space < affiliations space, No. | 34 | 10 | 22 | 0 | 6 | 10 | 10 | 92 |
| Safety space = affiliations space, No. | 8 | 6 | 1 | 1 | 2 | 3 | 0 | 21 |
| Safety space > affiliations space, No. | 18 | 25 | 14 | 14 | 2 | 6 | 0 | 79 |
| Tables and figures for safety data | | | | | | | | |
| ≥1 Table for safety, No. (%) | 41 (68) | 23 (56) | 22 (59) | 12 (80) | 2 (20) | 10 (53) | 0 | 110 (57) |
| ≥1 Figure for safety, No. (%) | 8 (13) | 0 | 0 | 0 | 0 | 1 (5) | 0 | 9 (5) |

*HIV indicates human immunodeficiency virus; AMI, acute myocardial infarction; SDGIT, selective decontamination of the gastrointestinal tract; and IQR, interquartile range.

2001 American Medical Association. All rights reserved. (Reprinted) JAMA, January 24/31, 2001 Vol 285, No. 4 441

COMMENT
An evaluation of safety reporting in randomized trials across 7 different medical areas shows that safety reporting is often inadequate and neglected. Key information that would take minimal space to report is often missing. The extent of neglect varies significantly across medical areas. However, in the 7 medical areas we examined, we found no instances where safety reporting can be deemed satisfactory.
With 1 exception, safety reporting takes less than a half page in the average trial report; at least as much space is taken by the listing of the names and affiliations of the trial contributors and authors. Safety gets more space in trials in which dose comparisons are involved. This is not surprising, since such trials usually aim to show that a certain dose has similar efficacy but superior tolerability and fewer adverse effects. Otherwise, adverse effects are even more neglected in trials that report statistically significant results for efficacy.
Adequate reporting of clinical adverse events was seen in only 39% of trial reports. Many trials reported clinical adverse events without distinction of severity. There was some evidence that the situation may be improving over time, and that double-blind studies and large studies may pay more attention to clinical adverse events. Long-term trials were less likely to report such events adequately, perhaps because long-term trials are conducted with a strong emphasis on clinical efficacy rather than safety outcomes.
Adequate reporting of laboratory-determined toxicity occurred in only 29% of the trial reports. Half of the trial reports failed to mention laboratory-determined toxicity altogether.
Trials of drugs with prior indications fared significantly worse in this aspect, and there was a similar, though nonsignificant, trend for the reporting of clinical adverse events. For drugs with prior established indications, authors may not feel compelled to repeat what might be considered standard knowledge, even if the population and indication under study are different. The better reporting of laboratory-determined toxicity from trials performed in the United States is more difficult to explain.

Table 4. Regression Analyses for Predictors of the Percentage of Space Devoted to Safety in the Results*

Effect sizes are the increase in percentage of space devoted to safety in the results section. Univariate and multivariate analyses are adjusted for medical areas.

| Predictor | Univariate Effect Size (95% CI) | Univariate P Value | Multivariate Effect Size (95% CI) | Multivariate P Value |
|---|---|---|---|---|
| Sample size (per 10-fold increase) | 1.5 (-3.1 to 6.0) | .53 | ... | ... |
| Double-blind | 2.7 (-1.1 to 6.5) | .16 | ... | ... |
| Dose comparison only | 15.6 (9.2 to 22.1) | <.001 | 11.0 (2.9 to 19.1) | .008 |
| Dose comparison involved | 9.3 (4.7 to 13.8) | <.001 | 4.4 (-1.1 to 9.9) | .12 |
| Significant results for efficacy | -3.8 (-7.5 to -0.1) | .047 | -2.2 (-5.8 to 1.4) | .24 |
| Mainly government funding | 0.7 (-3.6 to 5.1) | .74 | ... | ... |
| Follow-up 1 y | -2.6 (-7.5 to 2.3) | .30 | ... | ... |
| Pediatric population | -1.9 (-9.6 to 5.9) | .63 | ... | ... |
| Prior use for other indication | -4.1 (-9.6 to 1.4) | .14 | ... | ... |
| Most patients in the US | 1.4 (-2.6 to 5.4) | .49 | ... | ... |
| Year of publication (per 10 years) | -0.8 (-4.1 to 2.5) | .62 | ... | ... |
| Journal with impact factor 7 | -2.4 (-6.5 to 1.7) | .24 | ... | ... |
| First trial with 100 patients | 1.4 (-2.5 to 5.3) | .69 | ... | ... |

*CI indicates confidence interval; ellipses, predictor not included in the multivariate model. The multivariate analysis is a model considering all variables with P<.05 in univariate analyses; stepwise elimination or stepwise entry models retain only the "Dose comparison only" variable as a significant predictor. Significant results are indicated by P<.05. See Table 1 for definition of impact factor. Adults, children, and pregnant women count as different populations.
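For readers checking Table 4, a 95% CI from a linear regression implies an approximate standard error and P value. The sketch below back-calculates them under a normal approximation, which is our assumption (the original analysis may have used t distributions), and rounding in the published limits adds further slack. Function and variable names are illustrative only.

```python
import math


def p_from_ci(estimate: float, lower: float, upper: float) -> float:
    """Approximate two-sided P value implied by an estimate and its 95% CI."""
    se = (upper - lower) / (2 * 1.96)  # CI half-width = 1.96 * SE
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area


# (predictor, estimate, lower, upper) from the univariate columns of Table 4,
# written with the minus signs that the CI limits require.
rows = [
    ("Dose comparison only", 15.6, 9.2, 22.1),
    ("Significant results for efficacy", -3.8, -7.5, -0.1),
    ("Double-blind", 2.7, -1.1, 6.5),
]
for name, est, lo, hi in rows:
    print(f"{name}: P ~ {p_from_ci(est, lo, hi):.3f}")
```

Under these assumptions the implied values fall close to the tabulated P<.001, P=.047, and P=.16 for these three rows.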
If the association with US location is not a chance finding due to the large number of associations examined in our study, it could reflect a difference in reporting across continents or a difference in the phase of trials performed in the United States vs other countries.
Our figures may be overestimating the actual emphasis that is given to safety as compared with efficacy. While subsequent publications for efficacy involving subgroup analyses, surrogate marker analyses, and specialized analyses are very frequent in pivotal randomized trials, we could not identify other follow-up publications from these trials focusing comprehensively on toxicity results (with 2 exceptions in the area of HIV therapy and 1 exception in thrombolysis for AMI). Moreover, since we examined 7 different medical areas, the results of our evaluation are likely to be generalizable across medical specialties. Other investigators also have presented preliminary data that safety reporting is neglected in trials of anesthesia and pain management [19].
We should acknowledge that comparative clinical trials such as those included in our evaluation have several limitations in providing information about medication adverse effects. They are unlikely to reveal uncommon but important adverse effects occurring in fewer than 1 in 1000 patients. Important adverse events are often recognized many years after a medication has passed the clinical trial stage and has been used extensively in the community [20]. Nevertheless, randomized clinical trials of adequate sample size offer the best (and only) opportunity for assessing the frequency and severity of common side effects from a new medication in a controlled setting.
While there have been major strides in standardizing the collection, analysis, and reporting of efficacy data in clinical trials [1-3], efforts to assess and improve the quality of analysis and reporting of safety data are lagging behind. This important deficiency needs to be corrected if we wish to use quantitative objective evidence both for the efficacy and the toxicity of specific treatments in making therapeutic decisions. Such information may complement the data obtained from postmarketing reporting [21]. Postmarketing reporting is very important, but it is highly dependent on the reporting of adverse events; such reporting can be spontaneous, sporadic, and erratic.
Standardization of safety reporting may allow the performance of more reliable meta-analyses of safety information that may complement the meta-analyses performed on efficacy parameters. Meta-analysis of toxicity data has been hampered in the past because of inadequate safety reporting, as also admitted by other authors [22].
In our experience, most high-quality trials amass an enormous amount of information about safety and adverse effects during their conduct as part of regulatory requirements. Yet, the selective filtering of all these data into a quarter of a page can hardly be adequate. The simple storage of such information in company archives does not help the educated clinician who wants to critically interpret the efficacy vs toxicity data in each large trial considering the specific trial population, dosing, concomitant medications used, setting, study design, and other factors. The reliability and generalizability of the information conveyed in medication brochures is uncertain and cannot be critically evaluated.
The set of parameters we have developed to evaluate safety reporting offers a standardized evaluation tool. It may complement the CONSORT statement that has been developed in an effort to standardize the reporting of the design and efficacy outcomes of randomized clinical trials.
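To make the idea of standardized safety reporting concrete, the sketch below shows one possible way to hold safety data per study arm, per type of adverse effect, and per severity grade, together with withdrawals, in line with the descriptors proposed for the CONSORT addendum in this section. The class, the field names, the grade cutoff for "severe", and all numbers are hypothetical and are not taken from any trial in this evaluation.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ArmSafety:
    """Safety counts for one study arm (hypothetical structure)."""
    n_randomized: int
    # events[(effect_type, severity_grade)] -> number of patients
    events: dict = field(default_factory=lambda: defaultdict(int))
    # withdrawals[effect_type] -> patients withdrawn because of that effect
    withdrawals: dict = field(default_factory=lambda: defaultdict(int))

    def add_event(self, effect: str, grade: int, count: int = 1) -> None:
        """Record `count` patients with `effect` at the given severity grade."""
        self.events[(effect, grade)] += count

    def severe_count(self, effect: str, min_grade: int = 3) -> int:
        """Patients with the effect at or above min_grade (cutoff is an assumption)."""
        return sum(c for (e, g), c in self.events.items()
                   if e == effect and g >= min_grade)


# Illustrative, made-up numbers for a two-arm trial.
arms = {"drug": ArmSafety(100), "placebo": ArmSafety(100)}
arms["drug"].add_event("nausea", grade=1, count=12)
arms["drug"].add_event("nausea", grade=3, count=2)
arms["drug"].withdrawals["nausea"] += 1
arms["placebo"].add_event("nausea", grade=1, count=5)

# Tabulate exact counts per arm, effect type, and severity grade.
for name, arm in arms.items():
    for (effect, grade), count in sorted(arm.events.items()):
        print(f"{name:8s} {effect:8s} grade {grade}: {count}/{arm.n_randomized}")
    print(f"{name:8s} withdrawn for nausea: {arm.withdrawals['nausea']}")
```

Keeping the arm, effect type, and grade as explicit keys means exact numerators and denominators are available for every cell, which is what a pooled analysis of safety data would need.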
Descriptors for such an addendum to the CONSORT statement [2] might be summarized as follows:
(a) Specify the number of patients withdrawn from the study because of adverse effects, per study arm and per type of adverse effect;
(b) Provide the number of specific adverse effects per study arm and per type of adverse effect. Give exact numbers, especially for high-grade (severe) clinical adverse events and laboratory-determined toxicity; and
(c) Tabulation of safety information per study arm and severity grade is encouraged, as well as detailed description of cases of unusual or previously unrecorded adverse effects.
This addendum may be used in future research and may offer a guide to investigators and journals for the concise, focused reporting of adverse effects and toxicity.
Author Contributions: Dr Ioannidis participated in study concept and design, acquisition of data, analysis and interpretation of data, drafting of the manuscript, and critical revision of the manuscript for important intellectual content; provided statistical expertise; and obtained funding. Dr Lau participated in study concept and design, acquisition of data, analysis and interpretation of data, and critical revision of the manuscript for important intellectual content; obtained funding; and provided administrative, technical, or material support.
Funding/Support: This work was supported in part by grant R03 HS10345 from the Agency for Healthcare Research and Quality of the US Public Health Service.
Acknowledgment: We would like to thank Peter Gøtzsche, MD, for providing the list of NSAID randomized controlled trials used in his meta-analysis. We also thank Despina G. Contopoulos-Ioannidis, MD, for her contribution in the set of HIV trials.
REFERENCES
1. Pocock SJ, Hughes MD, Lee RJ. Statistical problems in the reporting of clinical trials: a survey of three medical journals. N Engl J Med. 1987;317:426-432.
2. Begg C, Cho M, Eastwood S, et al. Improving the quality of reporting of randomized controlled trials: the CONSORT statement. JAMA. 1996;276:637-639.
3. The Asilomar Working Group on Recommendations for Reporting of Clinical Trials in the Biomedical Literature. Checklist of information for inclusion in reports of clinical trials. Ann Intern Med. 1996;124:741-743.
4. Ioannidis JPA, Contopoulos-Ioannidis DG. Reporting of safety data from randomized trials. Lancet. 1998;352:1752-1753.
5. Vollset SE. Confidence intervals for a binomial proportion. Stat Med. 1993;12:809-824.
6. Ioannidis JPA, Cappelleri JC, Sacks HS, Lau J. The relationship between study design, results, and reporting of randomized trials of HIV infection. Control Clin Trials. 1997;18:431-444.
7. Diagnosis and Treatment of Acute Bacterial Sinusitis: Evidence Report/Technology Assessment No. 17 [prepared by New England Medical Center Evidence-Based Practice Center, under contract No. 290-97-0019]. Rockville, Md: Agency for Health Care Policy and Research; 1998. AHCPR publication 99-E0017.
8. Lau J, Antman EM, Jimenez-Silva J, Kupelnick B, Mosteller F, Chalmers TC. Cumulative meta-analysis of therapeutic trials for myocardial infarction. N Engl J Med. 1992;327:248-254.
9. Lau J, Schmid CH, Chalmers TC. Cumulative meta-analysis of clinical trials builds evidence for exemplary medical care. J Clin Epidemiol. 1995;48:45-57.
10. Insua JT, Sacks HS, Lau TS, et al. Drug treatment of hypertension in the elderly: a meta-analysis. Ann Intern Med. 1994;121:355-362.
11. Schmid CH, Whiting G, Cory D, Ross SD, Chalmers TC. Omeprazole plus antibiotics in the eradication of Helicobacter pylori infection: a meta-regression analysis of randomized, controlled trials. Am J Ther. 1999;6:25-36.
12. Veldhuyzen SJ, van Zanten V, Sherman P. Indications for treatment of Helicobacter pylori infection: a systematic overview. CMAJ. 1994;150:189-198.
13. Selective Decontamination of the Digestive Tract Trialists Collaborative Group. Meta-analysis of randomised controlled trials of selective decontamination of the digestive tract. BMJ. 1993;307:525-532.
14. Vandenbroucke-Grauls CM, Vandenbroucke JP. Effect of selective decontamination of the digestive tract on respiratory tract infections and mortality in the intensive care unit. Lancet. 1991;338:859-862.
15. Gotzsche PC. Sensitivity of effect variables in rheumatoid arthritis: a meta-analysis of 130 placebo controlled NSAID trials. J Clin Epidemiol. 1990;43:1313-1318.
16. Mulrow C, Lau J, Cornell J, Brand M. Pharmacotherapy for hypertension in the elderly [Cochrane Review on CD-ROM]. Oxford, England: Cochrane Library, Update Software; 2000: issue 1.
17. Institute for Scientific Information. Journal Citation Index, 1998.
18. Petitti DB. Meta-analysis, Decision Analysis, and Cost-effectiveness Analysis. New York, NY: Oxford University Press; 1994.
19. Edwards JE, McQuay HJ, Moore AR, Collins SL. Reporting of adverse effects in clinical trials should be improved: lessons from acute postoperative pain. J Pain Symptom Manage. 1999;18:427-437.
20. Venning GR. Identification of adverse reactions to new drugs, II: how were 18 important adverse reactions discovered and with what delays? BMJ. 1983;286:289-292.
21. Kessler DA. Introducing MedWatch: a new approach to reporting medication and device adverse effects and product problems. JAMA. 1993;269:2765-2768.
22. Chalmers TC, Berrier J, Hewitt P, et al. Meta-analysis of randomized controlled trials as a method of estimating rare complications of non-steroidal anti-inflammatory drug therapy. Aliment Pharmacol Ther. 1988;2(suppl 1):9-26.