REVIEW

Issues in Meta-Analysis: An Overview

Alfred A. Bartolucci, Charles R. Katholi, Karan P. Singh, and Graciela S. Alarcón

Arthritis Care and Research, Vol. 7, No. 3, September 1994

A summary is provided of the relevant issues that investigators must address when wishing to synthesize data via a meta-analysis. The analogy is made to conducting a clinical trial and its components. Both design and statistical issues are discussed with the goal of assuring that the proper elements are included in the meta-analysis. Further, global concerns are presented so that the researcher is aware of the clinical controversies faced when conducting a meta-analysis. Finally, a comparison is made of the advantages and disadvantages of a large-scale clinical trial versus a meta-analysis.

Key Words: Meta-analysis; Clinical trial; Statistical issues; Controversies.

Alfred A. Bartolucci, PhD, Charles R. Katholi, PhD, Karan P. Singh, PhD, and Graciela S. Alarcón, MD, MPH, are at the Department of Biostatistics, School of Public Health, and Division of Immunology and Rheumatology, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama. Address correspondence to Alfred A. Bartolucci, PhD, Department of Biostatistics, Room 101 Bishop Building, 900 19th Street South, University of Alabama at Birmingham, Birmingham, AL 35294-2030. Submitted for publication August 3, 1993; accepted March 15, 1994. © 1994 by the American College of Rheumatology.

Meta-analysis is defined as the statistical analysis of a collection of analytic results for the purpose of integrating the findings. In medical terms the focus is on combining the results of several different studies that may not render in themselves a definitive conclusion concerning the superiority of one treatment over another. One may then determine, after combining their results, whether superiority in one direction does in fact exist. Such analyses are becoming increasingly popular in the medical literature, where information
on the efficacy of a treatment is available from a number of clinical studies with similar treatment protocols. Dickersin and Berlin [1] believe that meta-analysis should be a standard tool. That is to say, it should be performed when practical before new studies are undertaken (i.e., funded) to ensure that the investigators have a proper understanding of the gaps in the existing literature and of the relevant methodologic issues in a given area. The traditional rationale for combining studies is that separate studies may have insufficient sample size or be too limited in scope to be conclusive about treatment effect. Difficulties abound when one wishes to integrate various studies. Some studies may be better designed than others; some may be controlled and others may not be. Thus, the two main functions of meta-analysis, i.e., data summarization and recognition of the pattern of results from various studies, have inherent difficulties, some of which are subjective. This will be seen more readily in the sections below. Despite these concerns, Mosteller and Chalmers [2], in an overview of meta-analyses, indicated that 16 meta-analyses per year were found in scientific journals from 1983 to 1990. These include social and behavioral as well as medical journals. The method of meta-analysis has definitely reached a stage of general acceptance in the scientific literature. As one might expect, researchers are continuing to explore this technique and to improve on the validity of the method.

Design Strategies

Two major concerns come to mind should one wish to undertake a meta-analysis. As in a clinical trial, one has to consider both the design and the analysis strategies. Sacks et al. [3] outlined over 20 points that must be considered in order to conduct a proper meta-analysis. For the abstracted studies or data, these include the following:
- a clear statement of eligibility criteria,
- accountability of all admissions,
- methods of treatment allocation,
- clearly stated hypotheses or aims,
- proper tracking of all patients,
- properly presented statistical analysis and statistical methods,
- accounting for all possible sources of bias in the analysis (for example, if the results differ by sex or socioeconomic status, are these categories analyzed separately?),
- a sample size sufficient to justify the conclusions,
- proper quality control of collected data,
- complete discussion of results, and
- complete and accurate citation of references.

One then scores the individual studies or publications according to their ability to meet the proper number of requirements to be used in the meta-analysis. Only those studies meeting the requisite score are considered for this type of integration. In a survey by Colditz et al. [4], the proportion of articles possessing any one of these required attributes ranged from 11% to 95%. One can thus see that not all of the published material on a particular treatment would qualify for the intended integrated analysis. This of course leads to another topic for the scientific literature: exactly what types of articles should be included in the meta-analysis. Usually that decision rests with more than one person doing the literature search; in other words, the old saying that there is safety in numbers. The more individuals who agree on which articles should be included in the meta-analysis, the more valid, presumably, will be the undertaking. Of course, this leaves an extra burden on investigators: they must determine an acceptable level of agreement among themselves before accepting material for the analysis. One must be aware that there are problems with the scoring or quality scheme. There has been no formal assessment of reliability between scorers, validity, or comparability of scoring methods.
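The agreement between reviewers (or between scorers) can itself be quantified before material is accepted. The text does not prescribe a particular statistic; Cohen's kappa is one common choice, sketched below with hypothetical include/exclude decisions on ten candidate articles:

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters making binary include (1) / exclude (0) calls.

    Kappa corrects the raw proportion of agreement for the agreement
    expected by chance alone under independent rating.
    """
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: both say "include" or both say "exclude" independently
    p_a = sum(rater_a) / n
    p_b = sum(rater_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)

# Hypothetical decisions by two reviewers on ten candidate articles
a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
b = [1, 1, 0, 1, 1, 1, 0, 0, 1, 0]
print(round(cohens_kappa(a, b), 2))  # → 0.58
```

A kappa near 1 indicates near-perfect agreement beyond chance; the investigators would fix their own acceptable threshold in advance, as the passage above suggests.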
Jenicek [5] discusses a variety of scoring methods. The basic difficulty has been in translating the quality of articles into a valid scoring system. For example, a well-designed study that is reported poorly may receive a poor score, and vice versa for a poorly designed study that is well reported, although the latter may seem a bit unlikely.

Synthesizing the Studies

Further problems and controversies concerning meta-analyses are discussed later. For the present, assume that one has sufficient articles and reports to conduct a meta-analysis. The next major concern, as in a clinical trial, is how to synthesize and analyze this information. The details of such an analysis are too extensive and technical to attempt to summarize in this short article. However, we will try to familiarize the reader with some of the concerns that have been expressed when approaching the data summary stage of this work. As stated earlier, the difficulty in integrating the results stems mainly from the diverse nature of the studies, both in the designs and in the analysis methods employed. For example, some studies may be randomized and stratified, whereas others may have only some combination of these features. Because of the differing sample sizes and patient populations (due to varying eligibility criteria), each study has a different level of sampling error. Also, because of the differing sample sizes of the studies, one may wish to consider a weighting scheme that attributes more importance to the larger studies. Various considerations have gone into the weighting scheme. For example, the weights applied to the within-study differences may be constant, a function of the sample size of individual studies, a function of within-strata sample size, or a function of the within-study differences themselves. Another important concern is that a meta-analysis by its very purpose is retrospective in nature.
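One widely used weighting scheme, offered here purely as an illustration (the text does not single out any one scheme, and the study estimates below are hypothetical), weights each within-study treatment-control difference by the inverse of its sampling variance, so that larger, more precise studies dominate the pooled average:

```python
import math

def fixed_effect_pool(effects, variances):
    """Inverse-variance weighted (fixed-effects) pooled estimate and its SE.

    effects:   within-study treatment-control differences
    variances: their sampling variances (smaller = larger/more precise study)
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))  # SE of the pooled estimate
    return pooled, se

# Hypothetical study results; the middle study has the smallest variance
# (largest sample) and therefore pulls the pooled estimate toward 0.10.
effects = [0.30, 0.10, 0.25]
variances = [0.04, 0.01, 0.09]
est, se = fixed_effect_pool(effects, variances)
print(round(est, 3), round(se, 3))  # → 0.149 0.086
```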
A meta-analysis attempts to synthesize and analyze data that have already been observed, which in itself has left the methodology susceptible to the pitfalls of retrospective research. Another issue that has concerned those wishing to analyze combined studies may be labeled heterogeneity. Chalmers [6] has addressed this issue. Heterogeneity occurs when the size or direction of the treatment-control difference, as well as its variance, differs across the studies being considered. This makes it difficult statistically to combine a function of these differences, called the effect size, and thus to say the studies share a common variance so that some traditional statistical tests may be performed. Clinicians should be aware that heterogeneity can be the result of phenomena such as how patients vary across studies in their pretreatment characteristics. Thus, there could be a legitimate objection to combining data from two different studies with nonhomogeneous patient characteristics. As a result, one of the first steps in performing the meta-analysis is to test for the homogeneity of the effect sizes among the studies being considered. Work is currently being performed on ways of calculating sample variances in meta-analyses for valid analysis purposes [6]. Subjectivity may abound in this context, in that one may ignore among-study variability in summarizing mean effect sizes, relative risks, or odds ratios; this is called assuming a fixed effects model. When one chooses to incorporate among-study variability, or heterogeneity, into these calculations, which may be more appropriate, one assumes a random effects model. The choice has consequences for the widths of the confidence limits of point estimates in a meta-analysis. This
idea of heterogeneity also affects the interpretation of the final determination of the effect size. That is to say, this one result may be misleading: if studies differ substantially in the size or direction of the effect, then one must recognize that the average result may not be representative of the individual components (studies) contributing to the average. Another problem is the issue of comprehensiveness. Simply stated, this concern presents itself in the literature search of relevant articles or reports on the treatment of interest if an insufficient amount of material is determined to be relevant for the meta-analysis. One would not want to perform a statistical analysis unless all the data were available, for fear of introducing a bias into the interpretation of the results due to missing or incomplete data. Of course, missing or incomplete data are a common phenomenon. However, clinicians and statisticians involved in modern-day clinical trials have learned to minimize this problem by such tactics as including all eligible patients and doing an intent-to-treat analysis. In a meta-analysis one does not have all the data at hand, either complete or incomplete; it is not a self-contained entity. As stated previously, one only has reports (published or unpublished). As a result, one has to take the precaution of ensuring that the bibliographic search is as comprehensive as possible. Even when dealing with abstracts, the person wishing to perform the meta-analysis should contact the authors of the abstract being considered for inclusion to determine whether appropriate information not covered in the abstract is available. This is also necessary to ensure that the original study was methodologically correct according to the criteria required for a proper meta-analysis. This in turn helps ensure that the results of the meta-analysis are statistically sound and not biased.
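The homogeneity test and the fixed- versus random-effects choice discussed earlier can be sketched with two conventional quantities that the text does not name explicitly: Cochran's Q statistic for homogeneity and the DerSimonian-Laird estimate of the among-study variance. The study values below are hypothetical:

```python
def cochran_q(effects, variances):
    """Cochran's Q: weighted sum of squared deviations from the pooled mean.

    Under homogeneity of effect sizes, Q follows (approximately) a
    chi-square distribution with k-1 degrees of freedom for k studies.
    """
    w = [1.0 / v for v in variances]
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    return sum(wi * (e - pooled) ** 2 for wi, e in zip(w, effects))

def dersimonian_laird_tau2(effects, variances):
    """Method-of-moments estimate of the among-study variance tau^2.

    tau^2 > 0 signals heterogeneity; adding it to each study's variance
    yields the random-effects weights and wider confidence limits.
    """
    w = [1.0 / v for v in variances]
    q = cochran_q(effects, variances)
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - df) / c)  # truncate at 0 when Q < df

# Hypothetical studies whose effects disagree in size
effects = [0.50, 0.05, 0.60]
variances = [0.04, 0.01, 0.09]
q = cochran_q(effects, variances)
tau2 = dersimonian_laird_tau2(effects, variances)
print(round(q, 2), round(tau2, 4))  # → 6.21 0.0737
```

Here Q exceeds the 0.95 quantile of a chi-square with 2 degrees of freedom (about 5.99), so tau² is positive and a random-effects summary of these three studies would carry wider confidence limits than the fixed-effects one.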
Felson [7] summarizes in detail many aspects of bias, one of which is publication bias. The goal is to be comprehensive in the inclusion of articles. However, publication bias can interfere with that goal in that authors, sponsors of a study, and the editors or reviewers of the journal to which a paper is submitted may be selective. Authors are less likely to submit manuscripts whose results are statistically null. Editors may encourage submission of manuscripts with positive results (see [8]). Finally, a drug company may well discourage publication of a null trial of a drug that it sponsors. Felson discusses these infractions and ways of assessing publication bias [7]. He also suggests that an important solution to publication bias may be the establishment of clinical trial registries that include both published and unpublished studies. One way to overcome, or at least honestly address, the comprehensiveness issue is to include relevant experts in the field of study to which the meta-analysis is addressed so that they can help direct one's search for data, published or unpublished, from various centers of excellence in that research area. One also has to make full use of computerized database searches, bibliographic reviews, and reference lists. Hogue [9], recognizing the inherent problems of database sharing, discusses why data should be shared and how data sharing should be conducted. She further elaborates on the need to reduce bias leading to inaccurate results in data presentations. Felson also points out that the best protection against this source of bias is a thorough description of the procedures used to locate the studies, so that the reader can make an intelligent assessment of the representativeness and completeness of the database for a meta-analysis [7].

Statistical Analysis

Up to this point we have been discussing design issues that are important to the proper conduct of a meta-analysis.
One may also wish to know what types of statistical analyses are used to analyze the synthesized data. It would be inappropriate to go into all the details of which conditions call for which types of analyses. Suffice it to say that there is nothing really magical or new in terms of statistical procedures in a meta-analysis. The standard statistical tests, such as the chi-square, t-test, z-test, and analysis of variance (ANOVA), are commonly used. Several authors have discussed these procedures. Most notable is the work of Hedges and Olkin [10]. These same two authors [11] outline the use of regression methods in meta-analyses. They have made the material accessible to the nonstatistician by avoiding cumbersome mathematical proofs in the text and presenting them instead in the commentary section at the end of each chapter [10]. This type of presentation is certainly appealing to the clinician, who may wish to discover relevant analyses for various forms of the synthesized data without the burden of having to make one's way through a morass of mathematics. The book [10] does not discuss techniques for the literature search or data-gathering strategies. Alarcón et al. [12] discuss techniques for a proper search and evaluation of articles for a meta-analysis. They discuss the comparison of methotrexate and nonmethotrexate disease-modifying anti-rheumatic drugs in rheumatoid arthritis patients. The materials and methods section deals with the selection of articles and the scoring of each for the meta-analysis.

Validity and Replicability

There are several general concerns beyond the scope of the design and analysis issues discussed here. One
must be aware that meta-analysis is not designed to provide overwhelming evidence for or against a particular treatment modality. That may be the purpose of a large clinical trial. Meta-analysis deals with confirming that minor increments or advantages of one treatment over another are, in fact, real. The combination of several small studies to achieve this end may provide the overall sample size that tips the scale one way or the other. One may come to the conclusion that a large clinical trial is more reliable than a meta-analysis combining many small trials, but there is no real evidence that this is the case. Both have their advantages. A meta-analysis of many small trials run in local clinics may reflect the heterogeneity that one would expect to find in the general population. Also, the large-scale clinical trial is usually made up of contributions from many clinics, so pooling all of these results, assuming homogeneity in a cooperative setting, may or may not be appropriate. Another issue, discussed by Cook et al. [13], is whether or not one should include unpublished data in a meta-analysis. This is certainly a point of discussion, because some individuals believe that peer review brings validity and consistency to the material needed in the meta-analysis. Others will argue that as long as it can be confirmed that unpublished material came from methodologically sound research, it should be included. Because of the recent surge in interest in and application of meta-analysis, the replicability of the technique has at times been discouraging. Chalmers et al. [14] reviewed replicate analyses of 20 different research questions. They pointed out that the meta-analyses agreed in terms of statistical results for 10 of the research questions and disagreed for 10. The reasons for disagreement between meta-analyses of the same research questions were not obvious.
It was noted, however, that the meta-analyses that disagreed tended more often to use crude methods of pooling than did those that agreed. Likewise, Chalmers et al. [15] compared meta-analyses of several trials with a single large cooperative trial for three different topics. Depending on the particular trial and endpoint, the meta-analysis agreed with the cooperative trial in some cases and disagreed in others. The explanation, for example, when the meta-analysis favors the treatment and the large trial favors the control, could be some form of bias, such as publication bias, in which small trials showing null results failed to reach publication. One would hope that enhanced scientific rigor in the conduct of studies, and a drastic reduction over time in the biases plaguing meta-analyses, will result in more consistent results across meta-analyses of similar research questions. Another point of concern that affects the validity of the meta-analyzed result is recording error bias. The error rate in transferring collected data to the actual trial report is fairly minimal, usually around 1%. Although this does not appear to be a problem of great magnitude in the individual observed trial, it is yet another element affecting the validity of a meta-analysis, which relies on reported rather than directly observed data. This obviously is something over which the meta-analyst has little or no control. In summary, we have discussed the major relevant quantitative issues facing meta-analysis. One wishing to embark on such a project should, as in conducting a clinical trial, gather a good team consisting of clinical, statistical, and other relevant scientific experts in the disease area of interest.

REFERENCES

1. Dickersin K, Berlin JA: Meta-analysis: state-of-the-science. Epidemiol Rev 14:154-176, 1992.
2. Mosteller F, Chalmers TC: Some progress and problems in meta-analysis of clinical trials. Stat Sci 7(2):227-236, 1992.
3. Sacks HS, Berrier J, Reitman D, Ancona-Berk VA, Chalmers TC: Meta-analysis of randomized controlled trials. N Engl J Med 316:450-455, 1987.
4. Colditz GA, Miller JN, Mosteller F: How study design affects outcomes in comparisons of therapy. I: medical. Stat Med 8:441-454, 1989.
5. Jenicek M: Meta-analysis in medicine: where we are and where we want to go. J Clin Epidemiol 42(1):35-44, 1989.
6. Chalmers TC: Problems induced by meta-analysis. Stat Med 10:971-980, 1991.
7. Felson DT: Bias in meta-analytic research. J Clin Epidemiol 45(8):885-892, 1992.
8. Dickersin K: The existence of publication bias and risk factors for its occurrence. JAMA 263:1385-1389, 1990.
9. Hogue CI: Ethical issues in sharing epidemiologic data. J Clin Epidemiol 44(Suppl 1):103S-107S, 1992.
10. Hedges LV, Olkin I: Statistical Methods for Meta-Analysis. New York, Academic Press, 1985.
11. Hedges LV, Olkin I: Regression models in research synthesis. Am Stat 37(2):137-140, 1983.
12. Alarcón GS, López-Méndez A, Walter J, Boerbooms A, Russell AS, Furst DE, Rau R, Drosos AA, Bartolucci AA: Radiographic evidence of disease progression in methotrexate treated and non-methotrexate disease modifying antirheumatic drug treated rheumatoid arthritis patients: a meta-analysis. J Rheumatol 19:1868-1873, 1992.
13. Cook DJ, Guyatt GH, Ryan G, Clifton J, Buckingham L, Wilan A, McIlroy W, Oxman AD: Should unpublished data be included in meta-analysis? JAMA 269(21):2749-2753, 1993.
14. Chalmers TC, Berrier J, Sacks HS, et al: Meta-analysis of clinical trials as a scientific discipline. II: replicate variability and comparison of studies that agree and disagree. Stat Med 6:733-744, 1987.
15. Chalmers TC, Levin H, Sacks HS, et al: Meta-analysis of clinical trials as a scientific discipline. I: control of bias and comparison with large cooperative trials. Stat Med 6:315-325, 1987.