ORIGINAL CONTRIBUTION

Completeness of Safety Reporting in Randomized Trials
An Evaluation of 7 Medical Areas

John P. A. Ioannidis, MD
Joseph Lau, MD

See also p 444.

Context Randomized trials with adequate sample size offer an opportunity to assess the safety of new medications in a controlled setting; however, generalizable data on drug safety reporting are sparse.

Objective To scrutinize the completeness of safety reporting in randomized trials.

Design, Setting, and Patients Survey of safety reporting in 192 randomized drug trials on 7 diverse topics with sample sizes of at least 100 patients and at least 50 patients in a study arm (N=130 074 patients). Trial reports were identified from comprehensive meta-analyses in 7 medical areas.

Main Outcome Measures Adequate reporting of specific adverse effects and frequency and reasons for withdrawals due to toxic effects; article space allocated to safety reporting and predictors of such reporting.

Results Severity of clinical adverse effects and laboratory-determined toxicity was adequately defined in only 39% and 29% of trial reports, respectively. Only 46% of trials stated the frequency of specific reasons for discontinuation of study treatment due to toxicity. For these 3 parameters, there was significant heterogeneity in rates of adequate reporting across topics (P=.003, P<.001, and P=.02, respectively). Overall, the median space allocated to safety results was 0.3 page. A similar amount of space was devoted to contributor names and affiliations (P=.16). On average, the percentage of space devoted to safety in the results section was 9.3% larger in trials involving dose comparisons than in those that did not (P<.001) and 3.8% smaller in trials reporting statistically significant results for efficacy outcomes (P=.047).

Conclusions The quality and quantity of safety reporting vary across medical areas, study designs, and settings, but they are largely inadequate.
Current standards for safety reporting in randomized trials should be revised to address this inadequacy.

JAMA. 2001;285:437-443 www.jama.com

Randomized trials with adequate sample size offer a unique opportunity to assess the frequency and severity of adverse events in a controlled and objective setting, with the most comprehensive and systematic accumulation of pertinent information. Such information is important in estimating benefit-harm ratios in the application of medical interventions. However, compared with the heightened scrutiny of the conduct, analysis, and reporting of randomized trials in general, 1-3 assessment of the reporting of adverse events and toxicity has only recently drawn some attention. In a preliminary analysis of the quality of safety reporting, we observed that toxicity data may sometimes be presented erratically or may be missing altogether. 4 A limitation of this preliminary work was that it considered trials in only 1 medical domain, ie, drug therapy for human immunodeficiency virus (HIV) infection. However, the adequacy of safety reporting may be different in other medical areas. Therefore, in the present study, we extended our evaluation of safety reporting to 7 different areas of drug therapy. In doing so, we also sought to understand the settings and predictors that lead to suboptimal safety reporting, and to gain insight for improving these important deficiencies.

METHODS

Trial Databases

Randomized trials of drug therapies qualified for the analysis if they had a sample size of at least 100 patients and at least 50 patients allocated to a study arm. Smaller trials give very uncertain estimates for even the most frequent adverse events, and may completely miss even relatively common toxicity.
Author Affiliations: Clinical Trials and Evidence-Based Medicine Unit, Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece (Dr Ioannidis); and Division of Clinical Care Research, Department of Medicine, New England Medical Center, Boston, Mass (Drs Ioannidis and Lau).
Corresponding Author and Reprints: Joseph Lau, MD, Division of Clinical Care Research, New England Medical Center, 750 Washington St, Box 63, Boston, MA 02111 (e-mail: JLau1@Lifespan.org).
2001 American Medical Association. All rights reserved. (Reprinted) JAMA, January 24/31, 2001 Vol 285, No. 4 437

With 100 patients, when no subjects are observed to experience or report a specific adverse effect, the upper limit of the 95% confidence interval (CI) for the
true frequency of this unobserved effect is still 3%. With 50 patients assigned to an experimental arm, the upper limit of the 95% CI is as high as 6%. 5 We analyzed safety reporting in randomized drug trials in the following 7 medical areas: (1) HIV therapy (all therapeutic trials, excluding immunization [passive immunotherapy, vaccines] and treatment of complications); (2) antibiotic therapy for acute sinusitis (all comparisons of antibiotics among themselves or with placebo or no therapy); (3) thrombolysis for acute myocardial infarction (AMI) (all comparisons of different thrombolytic regimens against placebo or no therapy); (4) use of nonsteroidal antiinflammatory drugs (NSAIDs) for rheumatoid arthritis (comparisons of NSAIDs among themselves or with placebo or no therapy); (5) treatment of hypertension in elderly persons (all comparisons of antihypertensive regimens among themselves or with placebo or no therapy in this age group); (6) treatment of Helicobacter pylori with antibiotics (all comparisons of any antibiotic regimen until 1994 and all comparisons involving proton-pump inhibitors [omeprazole] with antibiotics until 1999); and (7) selective decontamination of the gastrointestinal tract (SDGIT) (all comparisons against placebo or no therapy). These topics represent a diverse set of medical questions that have a significant public health impact (ie, the diseases are very common and/or have major morbidity). Also, a comprehensive list of the pertinent trials would be easily retrievable from systematic databases and/or meta-analyses performed by either our group or prior investigators. We purposely included topics spanning the acute and chronic care settings and both inpatient and outpatient settings to maximize generalizability. However, we focused on drug therapies and excluded surgical interventions and vaccines, since safety may be assessed differently for such interventions. We identified the pertinent trials for each topic using several sources. 
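The zero-event bounds quoted above (3% for 100 patients, 6% for 50 patients) can be checked with a short calculation. The sketch below assumes that these figures correspond to the one-sided exact binomial bound, which the "rule of three" (3/n) approximates; the article cites reference 5 but does not state the formula, so the function names and the choice of bound are illustrative assumptions rather than the authors' method.

```python
def zero_event_upper_bound(n_patients, confidence=0.95):
    """One-sided exact upper bound for the true event rate when 0 of n patients had the event.

    Solves (1 - p) ** n = 1 - confidence for p.
    Assumption: one-sided bound; the article does not specify which bound it used.
    """
    if n_patients <= 0:
        raise ValueError("n_patients must be positive")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_patients)


def rule_of_three(n_patients):
    """Common shortcut for the same bound: roughly 3 / n when no events are seen."""
    if n_patients <= 0:
        raise ValueError("n_patients must be positive")
    return 3.0 / n_patients


if __name__ == "__main__":
    for n in (100, 50):
        exact = zero_event_upper_bound(n)
        approx = rule_of_three(n)
        print(f"n={n}: exact bound {exact:.1%}, rule of three {approx:.1%}")
```

Under this assumption the bounds come out near 3% and 6%, in line with the figures in the text. A two-sided exact interval would give somewhat higher upper limits, so the code should be read as one plausible reconstruction.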
For the HIV, sinusitis, AMI, hypertension, and H pylori topics, we used comprehensive databases of randomized controlled trials that have been developed by members of our research group as part of previous work on evidence reports, meta-analyses, and methodologic studies. 4,6-11 Typically, MEDLINE and EMBASE searches using the names of specific medications and a large array of terms pertaining to randomized trials were complemented with handsearching of journals specializing in the given area, identification of trials referenced in retrieved publications, and communications with experts. For these 5 topics, databases cover until late 1997 or early 1998 (1999 for H pylori). For the H pylori topic, we also retrieved trials from a meta-analysis published by a different team that addressed all trials of antibiotic regimens published until 1994. 12 For the rheumatoid arthritis and SDGIT topics, we used the trial databases from meta-analyses published by other teams. 13-15 These cover randomized trials published until 1993 13,14 and 1989, 15 respectively. For the topic of hypertension in the elderly, we also consulted the respective systematic review in the Cochrane Library. 16 We did not systematically update these meta-analyses, since the aim of the project was not a thorough meta-analysis to estimate the treatment effects based on the most current data.

Evaluation and Analysis

Qualitative and Quantitative Parameters of Safety Reporting. We decided to use both qualitative and quantitative components of adverse event reporting; these components may offer complementary information. The list of these parameters was originally developed for the evaluation of HIV-related drug trials. 4 Although it is difficult to specify which aspects of safety reporting are most important, some aspects are probably indispensable, if the reported information is to be used and interpreted for clinical purposes. First, data should be given with numbers.
Generic statements (eg, "few patients had side effects") cannot be objectively appraised and may be misinterpreted. Second, the severity of adverse effects should be stated and, at minimum, the frequency of severe or life-threatening toxic events should be provided per study arm. Standardized scales for grading toxicity and definitions for severity are both important for this purpose. Third, data should be given separately for each specific type of severe adverse effect so clinicians can determine what kind of harm is involved.

Based on the above, we selected the following 2 qualitative components: (1) whether the number of withdrawals and discontinuations of study treatment due to toxicity was reported, and whether the number was given for each specific type of adverse effect leading to withdrawal; and (2) whether the severity of the described clinical adverse events or abnormalities of laboratory tests (laboratory-determined toxicity) was adequately defined, only partially defined, or inadequately defined.

Adequate definition of severity requires either detailed description of the severity or reference to a known scale of toxicity severity (typically with grades being 1=mild, 2=moderate, 3=severe, 4=life-threatening), with separate reporting of at least severe or life-threatening events. At least 2 adverse effects (clinical or laboratory) have to be defined in this way, with numbers or rates given for each study arm.

Partial definition of severity means that reports of severity combine moderate with severe or life-threatening toxicity counts, or that the number of severe or life-threatening toxicity cases is separately specified for only 1 of many reported clinical adverse events and laboratory abnormalities per study arm.
Inadequate definition of severity includes protocols reporting the total number of severe clinical or laboratory-determined toxic effects without giving numbers on specific types of adverse events per arm, those that lumped numbers for all grades of toxicity without separating any grades for any specific adverse events, those providing
only generic statements, and those not reporting adverse effects at all. The common characteristic of all these situations is that information is missing on the frequency of severe adverse effects, information that is directly relevant for the estimation of benefit-harm ratios.

Quantitative measures assess the relative emphasis given to safety in the results of published trials. We specified these measures as the extent of space (in printed pages) devoted to safety in the results section, and the proportion it represents of the whole results section; and the space devoted to safety as compared with the space devoted to the names and affiliations of authors, participants, and contributors in the same trial report. The space allocated to toxicity is not necessarily correlated with the quality of reporting, but it complements the qualitative assessment, because it is an objective estimate of the relative importance that safety has in the overall clinical trial report. In our study, we measured the space for each section with a resolution of 0.05 page. When there were N columns per page and a printed page had a length of Y centimeters, a section spanning a length of S centimeters in 1 such column was calculated as occupying S/(N × Y) pages. Y refers to the printed area, excluding upper and lower margins.

Other Trial Parameters. We collected information on trial parameters that may have affected safety reporting.
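The page-space measure S/(N × Y) described above can be sketched in a few lines. This is an illustration under stated assumptions: the function names, the example dimensions, and the choice to round to the nearest 0.05 page are hypothetical, since the article gives the formula and the resolution but not the rounding rule.

```python
import math


def section_pages(length_cm, columns_per_page, printed_length_cm, resolution=0.05):
    """Convert a section's column length S (cm) into pages: S / (N * Y).

    The result is snapped to the nearest multiple of `resolution` (0.05 page in the article).
    Assumption: nearest-step rounding with halves rounded up.
    """
    if columns_per_page <= 0 or printed_length_cm <= 0:
        raise ValueError("columns_per_page and printed_length_cm must be positive")
    raw_pages = length_cm / (columns_per_page * printed_length_cm)
    steps = math.floor(raw_pages / resolution + 0.5)
    return round(steps * resolution, 2)


def percent_of_results(safety_pages, results_pages):
    """Share of the results section devoted to safety, as a percentage."""
    if results_pages <= 0:
        raise ValueError("results_pages must be positive")
    return 100.0 * safety_pages / results_pages


if __name__ == "__main__":
    # Hypothetical report: 2 columns per page, 24 cm of printed length per column.
    safety = section_pages(14.4, columns_per_page=2, printed_length_cm=24.0)
    results = section_pages(96.0, columns_per_page=2, printed_length_cm=24.0)
    print(f"safety: {safety} page, results: {results} pages, "
          f"share: {percent_of_results(safety, results):.0f}%")
```

A section that runs over several columns can be handled by summing its column lengths before calling `section_pages`.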
In particular, we wanted to evaluate: (1) whether dose-comparison studies may place more emphasis on safety; (2) whether high-impact journals, 17 and articles with significant results for efficacy, used less space for safety; (3) whether safety reporting was given more emphasis in larger trials, long-term trials, or masked trials; (4) whether sponsorship, type of population, and location of the trial affected reporting; (5) whether safety was less emphasized for drugs that had already been used for a different indication; and (6) whether the first trial for a new indication devoted more emphasis to safety. Furthermore, we evaluated whether the situation improved over time. In making these evaluations, we noted several trial characteristics (TABLE 1). Additional information about the characteristics of the considered trials, the data extracted from the trials, and the trials included in the database is available at: http://www.nemc.org/dccr/projects/safetyreporting/supplements.htm.

Regression Analyses

All characteristics listed in Table 1 were also used as predictors in least-squares regression models using the percentage of space devoted to safety reporting in the results section as the dependent variable. We estimated univariate models for each predictor adjusting for medical area (with dummy variables for medical areas). We also considered interaction terms between covariates and medical areas, but they did not improve model fit substantially. Multivariate models were also considered, either using all variables that were significant (P<.05) in univariate analyses by forced entry, or starting from all variables with P<.25 on the univariate analyses and stepwise eliminating variables with P>.10 in the resulting models. The results were similar when univariate regressions were performed separately in each field and regression coefficients for each predictor were combined with general variance models. 18

We also performed logistic regressions to identify predictors of adequate reporting of clinical adverse events and laboratory-determined toxicity. All analyses were adjusted for medical area using dummy variables. Both univariate and multivariate models were considered. Again, we reached identical final results, whether considering multivariate models using all variables that were significant (P<.05) in univariate analyses by forced entry, or starting from all variables with P<.25 on the univariate analyses and stepwise eliminating variables with P>.10 in the resulting models.

Replication

Two independent data extractors separately evaluated the 60 trial reports of HIV drug therapy. We observed very high interrater agreement. For the assessment of the adequacy of reporting of clinical adverse effects and laboratory-defined toxicity, the coefficients were 0.72 and 0.85, respectively. There were no instances in which 1 extractor considered the reporting adequate and the other inadequate. Moreover, we observed no important discrepancies in the data extraction of quantitative parameters. Thereafter, 1 reviewer examined trial reports on the other 6 areas. Analyses were performed in SPSS (SPSS Inc, Chicago, Ill). P values are 2-tailed.

Table 1. Characteristics of Included Trial Databases*
Characteristic | HIV (n = 60) | Sinusitis (n = 41) | AMI (n = 37) | Arthritis (n = 15) | Hypertension (n = 10) | Helicobacter pylori (n = 19) | SDGIT (n = 10) | All (N = 192)
Sample size, median (interquartile range) | 381 (217-829) | 286 (160-397) | 315 (145-670) | 194 (150-222) | 862 (434-3669) | 154 (120-231) | 192 (111-324) | 286 (155-479)
Sample size, total | 36 820 | 11 990 | 53 739 | 3019 | 18 479 | 3872 | 2155 | 130 074
Double-blind, No. (%) | 49 (82) | 15 (37) | 19 (51) | 15 (100) | 6 (60) | 8 (42) | 5 (50) | 117 (61)
Dose comparison only, No. (%) | 7 (12) | 5 (12) | 0 | 0 | 0 | 1 (19) | 0 | 13 (7)
Dose comparison involved, No. (%) | 18 (30) | 6 (15) | 0 | 5 (33) | 0 | 4 (21) | 0 | 33 (17)
Significant results for efficacy, No. (%) | 32 (53) | 6 (15) | 23 (62) | 13 (87) | 8 (80) | 13 (68) | 7 (70) | 102 (53)
Mainly government funding, No. (%) | 27 (45) | 2 (5) | 9 (24) | 2 (13) | 6 (60) | 1 (5) | 0 | 47 (24)
Follow-up 1 y, No. (%) | 30 (50) | 1 (2) | 4 (11) | 0 | 10 (100) | 3 (16) | 0 | 48 (25)
Pediatric population, No. (%) | 6 (10) | 4 (10) | 0 | 0 | 0 | 0 | 0 | 10 (5)
Prior use for other indication, No. (%) | 11 (18) | 35 (85) | 0 | 7 (43) | 10 (100) | 19 (100) | 10 (100) | 92 (48)
Most patients in the US, No. (%) | 37 (62) | 10 (24) | 4 (11) | 9 (60) | 2 (20) | 2 (11) | 1 (10) | 65 (34)
Year of publication, mean (range) | 1994 (1987-1997) | 1990 (1971-1998) | 1984 (1969-1993) | 1980 (1967-1987) | 1985 (1972-1997) | 1996 (1991-1999) | 1991 (1989-1992) | 1990 (1967-1999)
Journal with impact factor >7, No. (%) | 34 (57) | 1 (2) | 17 (46) | 0 | 6 (60) | 3 (16) | 4 (40) | 65 (34)
First trial with ≥100 patients, No. (%) | 20 (33) | 16 (39) | 5 (37) | 13 (87) | 1 (10) | 9 (47) | 1 (10) | 65 (34)
*HIV indicates human immunodeficiency virus; AMI, acute myocardial infarction; SDGIT, selective decontamination of the gastrointestinal tract. Significant results indicated by P<.05. Journal impact factor is the average number of citations received per year by an article published in a certain journal, within the 2 years following the year of its publication. Adults, children, and pregnant women count as different populations.

RESULTS

Characteristics of Eligible Trials

A total of 192 trials from 7 different medical areas (N=130 074 patients) were included (Table 1).
Trial characteristics differed across the selected areas. Trials with a sample size of more than 1000 patients had been performed only in HIV therapy and thrombolysis for AMI. A total of 117 trials were double blind, but the percentage of double-blind trials across areas varied from 37% to 100%. Trials involving dose comparisons were available in 4 areas. The large majority of the trials on acute sinusitis showed no statistically significant differences for efficacy, while statistically significant results were more common than nonsignificant results in the other areas. The percentage of trials with government funding varied from 0% to 60%. The percentage of trials with long-term follow-up ranged from 0% to 100%. Children were evaluated in the areas of HIV and sinusitis. There was wide variability in the proportion of trials where the drugs had been already used for another indication (0% to 100%). Trials had been conducted both in the United States and elsewhere and they covered a wide range of publication years. The percentage of trials published in journals with an impact factor higher than 7 ranged from 0% to 60% across medical areas. Overall, in accordance with our aim, the large diversity in trial characteristics across medical areas ensured the generalizability of the results. Qualitative Assessment of Safety Reporting Only 39% of trials had adequate reporting of clinical adverse effects and only 29% had adequate reporting of laboratory-determined toxicity. A further 11% (clinical adverse effects) and 8% (laboratory-determined toxicity) had partially adequate reporting. The numbers of discontinuations due to toxicity per study arm were mentioned in 75% of the trial reports, but specific reasons for these discontinuations were given only 46% of the time. For all these outcomes, there was statistically significant heterogeneity for the rates of adequate reporting (vs partially adequate and inadequate reporting combined) between the 7 medical areas (TABLE 2). 
Table 2. Qualitative Parameters of Safety Reporting*
Values are No. (%) of trials.
Safety Reporting | HIV (n = 60) | Sinusitis (n = 41) | AMI (n = 37) | Arthritis (n = 15) | Hypertension (n = 10) | Helicobacter pylori (n = 19) | SDGIT (n = 10) | All (N = 192) | P Value for Heterogeneity†
Clinical adverse events: adequate reporting | 19 (32) | 17 (42) | 23 (62) | 8 (53) | 2 (20) | 6 (32) | 0 | 75 (39) | .003
Clinical adverse events: partially adequate reporting | 12 (20) | 0 | 6 (16) | 1 (7) | 0 | 3 (16) | 0 | 22 (11) |
Clinical adverse events: inadequate reporting | 29 (48) | 24 (58) | 8 (22) | 6 (40) | 8 (80) | 10 (52) | 10 (100) | 95 (50) |
Laboratory-defined toxicity: adequate reporting | 37 (62) | 11 (27) | 2 (5) | 5 (33) | 1 (10) | 0 | 0 | 56 (29) | <.001
Laboratory-defined toxicity: partially adequate reporting | 8 (13) | 2 (4) | 1 (3) | 3 (20) | 1 (10) | 0 | 0 | 15 (8) |
Laboratory-defined toxicity: inadequate reporting | 15 (25) | 28 (69) | 34 (92) | 7 (47) | 8 (80) | 19 (100) | 10 (100) | 121 (63) |
Discontinuations due to toxicity: number per arm given | 49 (82) | 39 (95) | 17 (46) | 15 (100) | 4 (40) | 16 (84) | 3 (30) | 143 (75) | <.001
Discontinuations due to toxicity: reasons per arm given | 23 (38) | 28 (68) | 15 (41) | 9 (60) | 3 (30) | 8 (42) | 2 (20) | 88 (46) | .02
*HIV indicates human immunodeficiency virus; AMI, acute myocardial infarction; SDGIT, selective decontamination of the gastrointestinal tract.
†P values reported for comparison of adequate vs partially adequate and inadequate reporting across the 7 topics.
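The heterogeneity comparison in Table 2 (adequate vs partially adequate and inadequate reporting across the 7 topics) can be illustrated with a Pearson chi-square test on a 2 × 7 table. This is a sketch under assumptions: the article does not name the test it used, the group sizes of 10 for hypertension and SDGIT are inferred from the percentages in the table, and with several small expected counts the chi-square result is only approximate.

```python
import math


def chi_square_2xk(adequate, totals):
    """Pearson chi-square statistic for adequate vs not adequate across k groups."""
    if len(adequate) != len(totals) or len(totals) < 2:
        raise ValueError("need matching lists with at least 2 groups")
    pooled = sum(adequate) / sum(totals)  # overall proportion with adequate reporting
    statistic = 0.0
    for observed_yes, n in zip(adequate, totals):
        expected_yes = n * pooled
        expected_no = n * (1.0 - pooled)
        observed_no = n - observed_yes
        statistic += (observed_yes - expected_yes) ** 2 / expected_yes
        statistic += (observed_no - expected_no) ** 2 / expected_no
    return statistic, len(totals) - 1


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Counts from Table 2, in the order HIV, sinusitis, AMI, arthritis,
# hypertension, H pylori, SDGIT. The two sizes of 10 are inferred from the percentages.
TOTALS = [60, 41, 37, 15, 10, 19, 10]
CLINICAL_ADEQUATE = [19, 17, 23, 8, 2, 6, 0]

if __name__ == "__main__":
    stat, df = chi_square_2xk(CLINICAL_ADEQUATE, TOTALS)
    print(f"chi-square = {stat:.2f}, df = {df}, P = {chi2_sf_even_df(stat, df):.4f}")
```

For the clinical adverse events row this approach gives a P value of roughly .003, close to the value in the table, which suggests but does not confirm that a test of this kind was used.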
Of the 95 trials with inadequate reporting of clinical adverse events, 3 gave the total number of serious or life-threatening events but failed to specify their types, 52 gave numbers for various adverse effects but without separating severe adverse events, 11 only offered generic statements without specific numbers, and 29 did not report specifically on clinical adverse effects (although 12 of these mentioned discontinuations due to adverse effects, but no other relevant information). Among the 121 trials with inadequate reporting of laboratory-determined toxicity, 1 gave the total number of serious or worse toxicity but failed to specify its types, 8 gave numbers for various adverse effects but without separating severe toxicity, 16 offered generic statements without specific numbers, and 96 did not report anything on toxicity.

Quantitative Parameters of Safety Reporting

The absolute space allocated to safety information was limited (median, 0.3 page; interquartile range, 0.1-0.7 page). The median was less than half a page in all areas except for arthritis trials (TABLE 3). A similar picture emerged when we studied the percentage of the results dedicated to safety. Overall, the space given to safety information was the same as or less than the space given for the names of authors and their affiliations. In 92 trials the authors/affiliations space was larger than the safety space, in 21 trials it was similar (within 0.05 of the length of a page), and in 79 trials safety reporting took more space (P=.16 by Wilcoxon test). Safety reporting took more space than the names of authors in trials of therapy for sinusitis (P<.001) and rheumatoid arthritis (P<.001), but it took less space than the names of authors in trials of SDGIT (P=.002) and HIV therapy (P=.06). More than half the trials included at least 1 table for safety information, while only 5% of reports included figures for such information.
Regression Analyses

In univariate analyses, the percentage of space in the results section devoted to safety was significantly larger in trials also making dose comparisons; similarly, the amount of space was larger in trials involving only dose comparisons. Conversely, emphasis on safety decreased when the trial found statistically significant results for efficacy (TABLE 4). There were also trends for more emphasis on safety in double-blind trials, and less emphasis on safety in trials studying drugs with a prior indication, but these were not significant. The results of multivariate models were consistent with the univariate findings (Table 4).

In univariate regressions, the odds of adequate reporting of clinical adverse effects were 2.83-fold higher (95% CI, 1.34-6.00) in double-blind trials vs single-blind or unmasked trials, and they increased 4.14-fold (95% CI, 1.57-10.9) for each 10-fold increase in sample size. They also improved over time (increased 1.07-fold every year; 95% CI, 1.00-1.14). In contrast, long-term trials were probably less likely to have adequate reporting of clinical adverse events than short-term trials (odds ratio [OR], 0.40; 95% CI, 0.16-1.01). Trends were also observed for better clinical reporting in trials in which dose comparisons were involved (OR, 1.67; 95% CI, 0.73-3.81), and worse clinical reporting in trials where there was already a prior indication for the tested medication (OR, 0.50; 95% CI, 0.18-1.39). The results of the final multivariate model were similar (double-blinding: OR, 2.51; 95% CI, 1.13-5.57; per 10-fold increase in sample size: OR, 4.52; 95% CI, 1.51-13.6; per year: OR, 1.06; 95% CI, 0.99-1.13; long-term trials: OR, 0.27; 95% CI, 0.10-0.74).
Adequate reporting of laboratory-determined toxicity was less likely when there was a prior indication for the studied medication (OR, 0.33; 95% CI, 0.13-0.88) and possibly when the efficacy results reached statistical significance (OR, 0.61; 95% CI, 0.26-1.42) and in pediatric trials (OR, 0.39; 95% CI, 0.09-1.74). Adequate reporting of laboratory-determined toxicity was more likely in trials performed mostly in the United States (OR, 2.29; 95% CI, 1.04-5.04) and possibly when there was government funding (OR, 1.87; 95% CI, 0.77-4.57). In multivariate modeling, prior indication (OR, 0.33; 95% CI, 0.12-0.90) and US location (OR, 2.29; 95% CI, 1.03-5.10) remained significant independent predictors.

Table 3. Quantitative Parameters of Safety Reporting*
Safety Reporting | HIV (n = 60) | Sinusitis (n = 41) | AMI (n = 37) | Arthritis (n = 15) | Hypertension (n = 10) | Helicobacter pylori (n = 19) | SDGIT (n = 10) | All (N = 192)
Extent of space for safety reporting
Absolute pages, median (IQR) | 0.5 (0.2-0.8) | 0.4 (0.15-0.9) | 0.2 (0.1-0.4) | 1.2 (0.8-1.3) | 0.05 (0-0.3) | 0.2 (0.05-0.35) | 0.05 (0-0.05) | 0.3 (0.1-0.7)
Percentage of results, median (IQR) | 13 (7-19) | 17 (7-23) | 11 (4-17) | 22 (18-31) | 3 (0-9) | 11 (5-22) | 0 (0-2) | 12 (5-22)
Relative emphasis on safety reporting
Safety space < affiliations space, No. | 34 | 10 | 22 | 0 | 6 | 10 | 10 | 92
Safety space = affiliations space, No. | 8 | 6 | 1 | 1 | 2 | 3 | 0 | 21
Safety space > affiliations space, No. | 18 | 25 | 14 | 14 | 2 | 6 | 0 | 79
Tables and figures for safety data
≥1 Table for safety, No. (%) | 41 (68) | 23 (56) | 22 (59) | 12 (80) | 2 (20) | 10 (53) | 0 | 110 (57)
≥1 Figure for safety, No. (%) | 8 (13) | 0 | 0 | 0 | 0 | 1 (5) | 0 | 9 (5)
*HIV indicates human immunodeficiency virus; AMI, acute myocardial infarction; SDGIT, selective decontamination of the gastrointestinal tract; and IQR, interquartile range.

COMMENT

An evaluation of safety reporting in randomized trials across 7 different medical areas shows that safety reporting is often inadequate and neglected. Key information that would take minimal space to report is often missing. The extent of neglect varies significantly across medical areas. However, in the 7 medical areas we examined, we found no instances where safety reporting can be deemed satisfactory. With 1 exception, safety reporting takes less than a half page in the average trial report; at least as much space is taken by the listing of the names and affiliations of the trial contributors and authors. Safety gets more space in trials in which dose comparisons are involved. This is not surprising, since such trials usually aim to show that a certain dose has similar efficacy but superior tolerability and fewer adverse effects. Otherwise, adverse effects are even more neglected in trials that report statistically significant results for efficacy.

Adequate reporting of clinical adverse events was seen in only 39% of trial reports. Many trials reported clinical adverse events without distinction of severity. There was some evidence that the situation may be improving over time, and that double-blind studies and large studies may pay more attention to clinical adverse events. Long-term trials were less likely to report such events adequately, perhaps because long-term trials are conducted with a strong emphasis on clinical efficacy rather than safety outcomes.

Adequate reporting of laboratory-determined toxicity occurred only in 29% of the trial reports. Half of the trial reports failed to mention laboratory-determined toxicity altogether.
Trials of drugs with prior indications fared significantly worse in this aspect, just as they did for the reporting of clinical adverse events. For drugs with prior established indications, authors may not feel compelled to repeat what might be considered standard knowledge, even if the population and indication under study are different.

Table 4. Regression Analyses for Predictors of the Percentage of Space Devoted to Safety in the Results
Values are the increase in percentage of space devoted to safety in the results section. Both sets of analyses are adjusted for medical areas.
Predictor | Univariate Effect Size (95% CI) | Univariate P Value | Multivariate* Effect Size (95% CI) | Multivariate* P Value
Sample size (per 10-fold increase) | 1.5 (-3.1 to 6.0) | .53 | ... | ...
Double-blind | 2.7 (-1.1 to 6.5) | .16 | ... | ...
Dose comparison only | 15.6 (9.2 to 22.1) | <.001 | 11.0 (2.9 to 19.1) | .008
Dose comparison involved | 9.3 (4.7 to 13.8) | <.001 | 4.4 (-1.1 to 9.9) | .12
Significant results for efficacy | -3.8 (-7.5 to -0.1) | .047 | -2.2 (-5.8 to 1.4) | .24
Mainly government funding | 0.7 (-3.6 to 5.1) | .74 | ... | ...
Follow-up 1 y | -2.6 (-7.5 to 2.3) | .30 | ... | ...
Pediatric population | -1.9 (-9.6 to 5.9) | .63 | ... | ...
Prior use for other indication | -4.1 (-9.6 to 1.4) | .14 | ... | ...
Most patients in the US | 1.4 (-2.6 to 5.4) | .49 | ... | ...
Year of publication (per 10 years) | -0.8 (-4.1 to 2.5) | .62 | ... | ...
Journal with impact factor >7 | -2.4 (-6.5 to 1.7) | .24 | ... | ...
First trial with ≥100 patients | 1.4 (-2.5 to 5.3) | .69 | ... | ...
*Model considering all variables with P<.05 in univariate analyses; stepwise elimination or stepwise entry models retain only the "Dose comparison only" variable as a significant predictor. CI indicates confidence interval; ellipses, predictor not included in the multivariate model. Significant results indicated by P<.05. See Table 1 for definition of impact factor. Adults, children, and pregnant women count as different populations.

The better reporting of laboratory-determined toxicity from trials performed in the United States is more difficult to explain.
If not a chance finding due to the large number of associations examined in our study, it could reflect a difference in reporting across continents or a difference in the phase of trials performed in the United States vs other countries.

Our figures may be overestimating the actual emphasis that is given to safety as compared with efficacy. While further subsequent publications for efficacy involving subgroup analyses, surrogate marker analyses, and specialized analyses are very frequent in pivotal randomized trials, we could not identify other follow-up publications from these trials focusing comprehensively on toxicity results (with 2 exceptions in the area of HIV therapy and 1 exception in thrombolysis for AMI). Moreover, since we examined 7 different medical areas, the results of our evaluation are likely to be generalizable across medical specialties. Other investigators also have presented preliminary data that safety reporting is neglected in trials of anesthesia and pain management. 19

We should acknowledge that comparative clinical trials such as those included in our evaluation have several limitations in providing information about medication adverse effects. They are unlikely to reveal uncommon but important adverse effects occurring in fewer than 1 in 1000 patients. Important adverse events are often recognized many years after a medication has passed the clinical trial stage and has been used extensively in the community. 20 Nevertheless, randomized clinical trials of adequate sample size offer the best (and only) opportunity for assessing the frequency and severity of common side effects from a new medication in a controlled setting.

While there have been major strides in standardizing the collection, analysis, and reporting of efficacy data in clinical trials, 1-3 efforts to assess and improve
the quality of analysis and reporting of safety data are lagging behind. This important deficiency needs to be corrected, if we wish to use quantitative objective evidence both for the efficacy and the toxicity of specific treatments in making therapeutic decisions. Such information may complement the data obtained from postmarketing reporting. 21 Postmarketing reporting is very important, but is highly dependent on the reporting of adverse events; such reporting can be spontaneous, sporadic, and erratic. Standardization of safety reporting may allow the performance of more reliable meta-analyses of safety information that may complement the meta-analyses performed on efficacy parameters. Meta-analysis of toxicity data has been hampered in the past because of inadequate safety reporting, as also admitted by other authors. 22 In our experience, most high-quality trials amass an enormous amount of information about safety and adverse effects during their conduct as part of regulatory requirements. Yet, the selective filtering of all these data into a quarter of a page can hardly be adequate. The simple storage of such information in company archives does not help the educated clinician who wants to critically interpret the efficacy vs toxicity data in each large trial considering the specific trial population, dosing, concomitant medications used, setting, study design, and other factors. The reliability and generalizability of the information conveyed in medication brochures is uncertain and cannot be critically evaluated. The set of parameters we have developed to evaluate safety reporting offers a standardized evaluation tool. It may complement the CONSORT statement that has been developed in an effort to standardize the reporting of the design and efficacy outcomes of randomized clinical trials. 
2 Descriptors for such an addendum to the CONSORT statement might be summarized as follows: (a) Specify the number of patients withdrawn from the study because of adverse effects, per study arm and per type of adverse effect; (b) Provide the number of specific adverse effects per study arm and per type of adverse effect. Give exact numbers, especially for high-grade (severe) clinical adverse events and laboratory-determined toxicity; and (c) Tabulation of safety information per study arm and severity grade is encouraged, as well as detailed description of cases of unusual or previously unrecorded adverse effects. This addendum may be used in future research and may offer a guide to investigators and journals for the concise, focused reporting of adverse effects and toxicity.

Author Contributions: Dr Ioannidis participated in study concept and design, acquisition of data, analysis and interpretation of data, drafting of the manuscript, and critical revision of the manuscript for important intellectual content; provided statistical expertise; and obtained funding. Dr Lau participated in study concept and design, acquisition of data, analysis and interpretation of data, and critical revision of the manuscript for important intellectual content; obtained funding; and provided administrative, technical, or material support.

Funding/Support: This work was supported in part by grant R03 HS10345 from the Agency for Healthcare Research and Quality of the US Public Health Service.

Acknowledgment: We would like to thank Peter Gøtzsche, MD, for providing the list of NSAID randomized controlled trials used in his meta-analysis. We also thank Despina G. Contopoulos-Ioannidis, MD, for her contribution in the set of HIV trials.

REFERENCES
1. Pocock SJ, Hughes MD, Lee RJ. Statistical problems in the reporting of clinical trials: a survey of three medical journals. N Engl J Med. 1987;317:426-432.
2. Begg C, Cho M, Eastwood S, et al. Improving the quality of reporting of randomized controlled trials: the CONSORT statement. JAMA. 1996;276:637-639.
3. The Asilomar Working Group on Recommendations for Reporting of Clinical Trials in the Biomedical Literature. Checklist of information for inclusion in reports of clinical trials. Ann Intern Med. 1996;124:741-743.
4. Ioannidis JPA, Contopoulos-Ioannidis DG. Reporting of safety data from randomized trials. Lancet. 1998;352:1752-1753.
5. Vollset SE. Confidence intervals for a binomial proportion. Stat Med. 1993;12:809-824.
6. Ioannidis JPA, Cappelleri JC, Sacks HS, Lau J. The relationship between study design, results, and reporting of randomized trials of HIV infection. Control Clin Trials. 1997;18:431-444.
7. Diagnosis and Treatment of Acute Bacterial Sinusitis: Evidence Report/Technology Assessment No. 17 [prepared by New England Medical Center Evidence-Based Practice Center, under contract No. 290-97-0019]. Rockville, Md: Agency for Health Care Policy and Research; 1998. AHCPR publication 99-E0017.
8. Lau J, Antman EM, Jimenez-Silva J, Kupelnick B, Mosteller F, Chalmers TC. Cumulative meta-analysis of therapeutic trials for myocardial infarction. N Engl J Med. 1992;327:248-254.
9. Lau J, Schmid CH, Chalmers TC. Cumulative meta-analysis of clinical trials builds evidence for exemplary medical care. J Clin Epidemiol. 1995;48:45-57.
10. Insua JT, Sacks HS, Lau TS, et al. Drug treatment of hypertension in the elderly: a meta-analysis. Ann Intern Med. 1994;121:355-362.
11. Schmid CH, Whiting G, Cory D, Ross SD, Chalmers TC. Omeprazole plus antibiotics in the eradication of Helicobacter pylori infection: a meta-regression analysis of randomized, controlled trials. Am J Ther. 1999;6:25-36.
12. Veldhuyzen SJ, van Zanten V, Sherman P. Indications for treatment of Helicobacter pylori infection: a systematic overview. CMAJ. 1994;150:189-198.
13. Selective Decontamination of the Digestive Tract Trialists Collaborative Group. Meta-analysis of randomised controlled trials of selective decontamination of the digestive tract. BMJ. 1993;307:525-532.
14. Vandenbroucke-Grauls CM, Vandenbroucke JP. Effect of selective decontamination of the digestive tract on respiratory tract infections and mortality in the intensive care unit. Lancet. 1991;338:859-862.
15. Gotzsche PC. Sensitivity of effect variables in rheumatoid arthritis: a meta-analysis of 130 placebo controlled NSAID trials. J Clin Epidemiol. 1990;43:1313-1318.
16. Mulrow C, Lau J, Cornell J, Brand M. Pharmacotherapy for hypertension in the elderly [Cochrane Review on CD-ROM]. Oxford, England: Cochrane Library, Update Software; 2000: issue 1.
17. Institute for Scientific Information. Journal Citation Index, 1998.
18. Petitti DB. Meta-analysis, Decision Analysis, and Cost-effectiveness Analysis. New York, NY: Oxford University Press; 1994.
19. Edwards JE, McQuay HJ, Moore AR, Collins SL. Reporting of adverse effects in clinical trials should be improved: lessons from acute postoperative pain. J Pain Symptom Manage. 1999;18:427-437.
20. Venning GR. Identification of adverse reactions to new drugs, II: how were 18 important adverse reactions discovered and with what delays? BMJ. 1983;286:289-292.
21. Kessler DA. Introducing MedWatch: a new approach to reporting medication and device adverse effects and product problems. JAMA. 1993;269:2765-2768.
22. Chalmers TC, Berrier J, Hewitt P, et al. Meta-analysis of randomized controlled trials as a method of estimating rare complications of non-steroidal anti-inflammatory drug therapy. Aliment Pharmacol Ther. 1988;2(suppl 1):9-26.