SROC Curve. S. D. Walter McMaster University, Hamilton, Ontario, Canada. Petra Macaskill University of Sydney, NSW, Australia INTRODUCTION

Similar documents
Clinical Study Report Synopsis Drug Substance Naloxegol Study Code D3820C00018 Edition Number 1 Date 01 February 2013 EudraCT Number

XII. HIV/AIDS. Knowledge about HIV Transmission and Misconceptions about HIV

ENERGY CONTENT OF BARLEY

The Measurement of Interviewer Variance

Comparison of three simple methods for the

The step method: A new adaptive psychophysical procedure

Single-Molecule Studies of Unlabelled Full-Length p53 Protein Binding to DNA

Reducing the Risk. Logic Model

2. Hubs and authorities, a more detailed evaluation of the importance of Web pages using a variant of

Analytic hierarchy process-based recreational sports events development strategy research

STATISTICAL DATA ANALYSIS IN EXCEL

Quantifying perceived impact of scientific publications

Analysis of Regulatory of Interrelated Activity of Hepatocyte and Hepatitis B Viruses

BIOSTATISTICS. Lecture 1 Data Presentation and Descriptive Statistics. dr. Petr Nazarov

EVALUATION OF DIFFERENT COPPER SOURCES AS A GROWTH PROMOTER IN SWINE FINISHING DIETS 1

Math 254 Calculus Exam 1 Review Three-Dimensional Coordinate System Vectors The Dot Product

Supplementary Online Content

Review TEACHING FOR GENERALIZATION & MAINTENANCE

Assessment of Depression in Multiple Sclerosis. Validity of Including Somatic Items on the Beck Depression Inventory II

Chapter II. THE PREVALENCE METHOD John Bongaarts*

EFFECTS OF INGREDIENT AND WHOLE DIET IRRADIATION ON NURSERY PIG PERFORMANCE

Invasive Pneumococcal Disease Quarterly Report. July September 2017

Summary. Effect evaluation of the Rehabilitation of Drug-Addicted Offenders Act (SOV)

The accuracy of creatinine clearance with and without

ORIGINAL ARTICLE. Diagnostic Signs of Accommodative Insufficiency. PILAR CACHO, OD, ÁNGEL GARCÍA, OD, FRANCISCO LARA, OD, and M A MAR SEGUÍ, OD

Check your understanding 3

Appendix J Environmental Justice Populations

Geographical influence on digit ratio (2D:4D): a case study of Andoni and Ikwerre ethnic groups in Niger delta, Nigeria.

Teacher motivational strategies and student self-determination in physical education

Efficacy of Pembrolizumab in Patients With Advanced Melanoma With Stable Brain Metastases at Baseline: A Pooled Retrospective Analysis

CheckMate 153: Randomized Results of Continuous vs 1-Year Fixed-Duration Nivolumab in Patients With Advanced Non-Small Cell Lung Cancer

THE EVALUATION OF DEHULLED CANOLA MEAL IN THE DIETS OF GROWING AND FINISHING PIGS

Opioid Use and Survival at the End of Life: A Survey of a Hospice Population

PNEUMOVAX 23 is recommended by the CDC for all your appropriate adult patients at increased risk for pneumococcal disease 1,2 :

Journal of Personality

The Effects of Diet Particle Size on Animal Performance

Recently, the National Lung Screening Trial demonstrated that three annual low-dose

Infrared Image Edge Detection based on Morphology- Canny Fusion Algorithm

Linear Programming Approach to Diet Problem for Black Tiger Shrimp in Shrimp Aquaculture

Breast carcinoma grading by histologic features has

Estimating the impact of the 2009 influenza A(H1N1) pandemic on mortality in the elderly in Navarre, Spain

Addendum to the Evidence Review Group Report on Aripiprazole for the treatment of schizophrenia in adolescents (aged years)

HEMOGLOBIN STANDARDS*

Body mass index, waist-to-hip ratio, and metabolic syndrome as predictors of middle-aged men's health

Diabetes affects 29 million Americans, imposing a substantial

Metabolic Syndrome and Health-related Quality of Life in Obese Individuals Seeking Weight Reduction

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

8/1/2017. Correlating Radiomics Information with Clinical Outcomes for Lung SBRT. Disclosure. Acknowledgements

The potential future of targeted radionuclide therapy: implications for occupational exposure? P. Covens

METHOD 4010 SCREENING FOR PENTACHLOROPHENOL BY IMMUNOASSAY

Impact of Pharmacist Intervention on Diabetes Patients in an Ambulatory Setting

Management and Outcomes of Binge-Eating Disorder in Adults: Current State of the Evidence

27 June Bmnly L. WALTER ET AL.: RESPONSE OF CERVICAL CANCERS TO IRRADIATION

Safety and Tolerability of Subcutaneous Sarilumab and Intravenous Tocilizumab in Patients With RA

Inadequate health literacy is a

Finite-Dimensional Linear Algebra Errata for the first printing

1980, 133, NUMBER 4 (WINTER 1980) UNIVERSITY OF NOTRE DAME. be viewed as a response and defined a priori by

Extraction and Some Functional Properties of Protein Extract from Rice Bran

JOB DESCRIPTION. Volunteer Student Teacher. Warwick in Africa Programme. Warwick in Africa Programme Director

Interrelations of Age, Visual Acurty, and Cognitive Functioning

Paper-based skin patch for the diagnostic screening of cystic fibrosis

Sleep Disordered Breathing and Gestational Hypertension: Postpartum Follow-up Study

SUPPLEMENTARY INFORMATION

Will All Americans Become Overweight or Obese? Estimating the Progression and Cost of the US Obesity Epidemic

T.S. Kurki a, *,U.Häkkinen b, J. Lauharanta c,j.rämö d, M. Leijala c

The Effect of Substituting Sugar with Artificial. Sweeteners on the Texture and Palatability of Pancakes

Input from external experts and manufacturer on the 2 nd draft project plan Stool DNA testing for early detection of colorectal cancer

Rates of weight change for black and white Americans over a twenty year period

US Food and Drug Administration-Mandated Trials of Long-Acting b -Agonists Safety in Asthma. Samy Suissa, PhD ; and Amnon Ariel, MD, FCCP

changes used to indicate the aversiveness of

Risks for All-Cause Mortality: Stratified by Age, Estimated Glomerular Filtration Rate and Albuminuria

The relationship between women s subjective and physiological sexual arousal

Report of the Conference on Low Blood

University of Texas Health Science Center, San Antonio, San Antonio, Texas, USA

3/10/ Energy metabolism o How to best supply energy to the pig o How the pig uses energy for growth

Utilization of dental services in Southern China. Lo, ECM; Lin, HC; Wang, ZJ; Wong, MCM; Schwarz, E

Health-Related Quality of Life and Symptoms of Depression in Extremely Obese Persons Seeking Bariatric Surgery

Time trends in repeated spirometry in children

Using Load Research Data to Model Weather Response

distraction cleaning Peaks cages specifications

LATE RESULTS OF TRANSFER OF THE TIBIAL TUBERCLE FOR RECURRENT DISLOCATION OF THE PATELLA1

Effect of linear and random non-linear programming on environmental pollution caused by broiler production

BMI and Mortality: Results From a National Longitudinal Study of Canadian Adults

The Effects of High-Oil Corn or Typical Corn with or without Supplemental Fat on Diet Digestibility in Finishing Steers

Recall Bias in Childhood Atopic Diseases Among Adults in The Odense Adolescence Cohort Study

Agilent G6825AA MassHunter Pathways to PCDL Software Quick Start Guide

A review of the patterns of docetaxel use for hormone-resistant prostate cancer at the Princess Margaret Hospital

Consumer perceptions of meat quality and shelf-life in commercially raised broilers compared to organic free range broilers

Nicholas James Boddicker Iowa State University. Dorian J. Garrick Iowa State University, Raymond Rowland Kansas State University

Checks on inadvertently modified BAS-funseeking scale from BIS-BAS. Modified scale excluded 2 of the original 4 items: bisbas10, bisbas20.

Fertility in Norwegian testicular cancer patients

Meat and Food Safety. B.A. Crow, M.E. Dikeman, L.C. Hollis, R.A. Phebus, A.N. Ray, T.A. Houser, and J.P. Grobbel

Estimating the Cost to U.S. Health Departments to Conduct HIV Surveillance

Computer-Aided Learning in Insulin Pump Training

Feeding state and age dependent changes in melaninconcentrating hormone expression in the hypothalamus of broiler chickens

DXA: Can It Be Used as a Criterion Reference for Body Fat Measurements in Children?

SYNOPSIS Final Abbreviated Clinical Study Report for Study CA ABBREVIATED REPORT

Trends in antihypertensive and lipidlowering therapy in subjects with type II diabetes: clinical effectiveness or clinical discretion?

USE OF SORGHUM-BASED DISTILLERS GRAINS IN DIETS FOR NURSERY AND FINISHING PIGS

DIFFERENTIAL REINFORCEMENT OF VOCAL DURATION1

Transcription:

SROC Curve S. D. Wlter McMster University, Hmilton, Ontrio, Cnd Petr Mcskill University of Sydney, NSW, Austrli INTRODUCTION The summry receiver operting chrcteristic (SROC) curve hs been recommended to represent the performnce of dignostic test, bsed on dt from metnlysis. Under fied-effect, logit-threshold model, the position of the SROC curve cn be chrcterized in terms of the overll dignostic odds rtio nd the mgnitude of interstudy heterogeneity. The Are Under the Curve (AUC) nd n inde Q* re potentilly useful summry mesures. It is shown tht AUC is mimized when the study odds rtios re homogeneous, nd tht it is quite robust to heterogeneity. An eplicit upper bound is derived for AUC in the homogeneous sitution, nd lower bound bsed on the limit cse Q*, defined by the point where sensitivity equls specificity: Q* is invrint to heterogeneity. The stndrd error of AUC is derived for homogeneous studies, nd is shown to be resonble pproimtion with heterogeneous studies. AUC nd its stndrd error re esily computed in the homogeneous cse, nd void the need for numericl integrtion in the more generl cse. SE(AUC) nd SE(Q*) re numericlly close, with SE(Q*) being lrger if the odds rtio is very lrge. Motivtion for the use of the AUC nd Q* mesures s summries of the SROC curve re discussed. A multilevel mied model is lso described, in which test ccurcy nd threshold re llowed to vry between studies. This provides more generl frmework, within which the fied effect model is specil cse. The mied model provides for the direct estimtion of summry vlues of sensitivity, specificity, nd likelihood rtios. Byesin Mrkov Chin Monte Crlo (MCMC) methods re required to fit the mied model, nd pproprite softwre is becoming more vilble. META-ANALYSIS OF DIAGNOSTIC TEST DATA Systemtic reviews of primry studies re becoming incresingly importnt for summrizing evidence bout the ccurcy of dignostic tests. Guidelines for the conduct of such reviews [1 3] include defining the objectives, retrievl of the relevnt literture, dt etrction, met-nlytic methods for obtining summry estimtes of test ccurcy, nd investigting resons for vrition in test ccurcy cross studies. The receiver operting chrcteristic (ROC) curve is well estblished s method of summrizing the performnce of dignostic test within single study. [4 6] It indictes the reltionship between the true positive rte (TPR) nd the flse positive rte (FPR) of the test, s the threshold used to distinguish disese cses from noncses vries. For instnce, the threshold might be defined level of serum cholesterol s mrker of crdiovsculr disese, or prticulr level of cellulr bnormlity s mrker of mlignncy. The summry receiver operting chrcteristic (SROC) curve nd the re under the curve (AUC) hve been proposed s wy to describe dignostic dt in the contet of metnlysis. [1,2,7 10] A schemtic ROC curve is shown in Fig. 1. All of its dt points come from single study, nd re defined by the results rising when one of severl lterntive thresholds (or cut points) on the test results is used to discriminte between disese cses nd noncses. Liberl thresholds tht give high vlues of TPR will tend to incorrectly lbel reltively high frction of the noncses s cses, so the flse positive rte (FPR) will be high. Conversely, conservtive thresholds yield incorrect lbels for noncses rther infrequently, but t the cost of detecting smller frction of the true cses; in this sitution, the vlue of TPR nd FPR re both reltively low. Thus one generlly epects TPR nd FPR to be positively ssocited. In the ROC spce, points ner the lower-left corner correspond to conservtive thresholds (low TPR nd low FPR vlues), nd points ner the upperright corner correspond to liberl thresholds (high TPR nd high FPR vlues). In single study, chnging the threshold necessrily results in monotonic chnges in TPR nd FPR. Accordingly, the ROC curve cn lwys be empiriclly represented, in its simplest form by connecting the dt points s shown in Fig. 1. Alterntively, smoothed curves Encyclopedi of Biophrmceuticl Sttistics 1 DOI: 10.1081/E-EBS 120024189 Copyright D 2004 by Mrcel Dekker, Inc. All rights reserved.

2 SROC Curve discriminting cses from noncses. S is function of the dignostic threshold in study, with high vlues corresponding to liberl inclusion criteri for cses. S =0 when TPR=1 FPR, i.e., on the ntidigonl from the top-left to bottom-right corners of the SROC spce. The regression eqution D ¼ þ bs ð3þ Fig. 1 Schemtic ROC curve. cn lso be fitted using ltent vrible pproch, bsed on model tht the cse nd noncse test vlues follow norml or logistic distributions. [11 19] In met-nlysis, the units of nlysis re seprte studies. In the simplest cse, ech study contributes n estimte of TPR nd FPR. The SROC curve represents the reltionship between TPR nd FPR cross studies, recognizing they my hve used different thresholds. In contrst to the ROC nlysis, the set of (FPR, TPR) points does not necessrily yield unique, monotonic curve. Severl methods hve been proposed for fitting n SROC curve; [7,10,20,21] this pper will focus on the properties of the populr method developed by Moses et l. [10] An lterntive but more comple method proposed by Rutter nd Gtsonis [20,21] is lso outlined nd its potentil dvntges re discussed. LOGIT-THRESHOLD FIXED EFFECT MODEL FOR SUMMARY RECEIVER OPERATING CHARACTERISTIC CURVE The most populr method to obtin smoothed fit of the SROC curve is to use the regression model proposed by Moses et l. [10] The dependent nd independent vribles re TPR FPR D ¼ ln ln ð1þ 1 TPR 1 FPR nd TPR FPR S ¼ ln þ ln 1 TPR 1 FPR respectively. D is equivlent to the dignostic log odds rtio, ln(or), which conveys the test s ccurcy in ð2þ cn be fitted by stndrd lest squres methods, ssuming tht D is pproimtely normlly distributed for given vlue of S. Optionlly, weights cn be employed to reflect interstudy heterogeneity with respect to the smple vrince of D, or robust fitting procedure my lso be used. [10] The coefficient b in Eq. 3 represents the dependence of the test ccurcy on threshold. If b0, then the studies will be referred to s homogeneous; they cn then be summrized by n overll OR, noting tht =ln(or). If b 6¼ 0, then the studies re referred to s heterogeneous with respect to OR. In this cse, cn be thought of s the vlue of ln(or) when S=0. This model ssumes logistic distributions for the test vlues in the cses nd noncses, but norml distributions my lso constitute n dequte pproimtion. However, the model is generlly fitted without directly ppeling to the logistic distribution ssumption. If the underlying test results ctully hve logistic distributions, then S for given study cn be epressed s function of the cut point tht gives rise to the sensitivity nd specificity for tht study. [10] If the vrinces of the distributions of test results for the cses nd noncses re equl, then the SROC is symmetric bout the digonl line where TPR=1 FPR. If the vrinces re unequl, the resulting SROC will be symmetric. No ssumptions re mde bout the distribution of S. Covrites my be dded to the model to ssess whether test ccurcy vries systemticlly with other study-relted fctors. [22,23] If the estimte of test ccurcy for ech study is weighted by the inverse of its vrince, one is ssuming tht smpling error is the only source of vribility between studies. However, this ssumption my be unrelistic, given tht studies re likely to vry in number of respects, including the spectrum of disese, conditions under which the test ws dministered, nd other study nd ptient chrcteristics tht could ffect test ccurcy. [24 26] Although the prmeters nd b re both fied becuse they re ssumed to be constnt cross studies, pplying equl weights to studies (i.e., fitting n unweighted regression) hs been empiriclly shown to give results tht re consistent with ssuming rndom effect. [8,10,20] This is becuse both the within- nd between-study vrinces re tken into ccount, thereby giving reltively more weight to smller studies, s would occur in rndom effect model. Irwig et l. [8] recommend the

SROC Curve 3 unweighted nlysis, becuse weights bsed on smple vrinces my give too much emphsis to inccurte studies, nd hence give bised SROC curve. Moses et l. [10] lso recommend the unweighted pproch. Once the regression hs been fitted, one cn reverse the trnsformtions (Eqs. 1 nd 2) nd hence deduce the reltionship between TPR nd FPR s FPR ð1 þ bþ=ð1 bþ ep 1 b 1 FPR TPR ¼ 1 þ ep 1 b FPR 1 FPR ð1 þ bþ=ð1 bþ Epression 4 gives TPR t ny given vlue of FPR, nd hence defines the entire SROC curve. While there my be interest in identifying prticulr points on the curve, it is lso useful to hve n overll summry mesure of the curve s behvior. One pproprite mesure is the Are Under the Curve (AUC), which cn be clculted s ð1 þ bþ=ð1 bþ ep AUC ¼ Z 1 0 1 b 1 þ ep 1 b 1 1 ð4þ ð1 þ bþ=ð1 bþ d In generl, AUC must be numericlly obtined, becuse there is no closed form for epression 5. Adoption of summry inde is helpful for succinct reporting of given dt set, especilly when limited dt preclude the relible identifiction of prticulr points on the curve. The AUC mesure is widely used in ROC nlysis, where it cn be interpreted s the probbility tht the test vlues for rndom pir of disesed nd nondisesed individuls would be correctly rnked; it lso represents the (unweighted) verge of TPR over ll possible vlues of FPR. The AUC is lso nturl cndidte summry for n SROC nlysis. We will consider the lterntive inde Q*, which will be defined s point of indifference on the SROC curve, where the probbilities of n incorrect test result re equl for disese cses nd noncses. The prtil AUC hs lso been proposed s summry mesure, this being the re under some restricted portion of the curve corresponding to FPR vlues of clinicl interest, or in which study dt re locted. EMPIRICAL BEHAVIOR OF THE SUMMARY RECEIVER OPERATING CHARACTERISTIC CURVE Figure 2 shows set of three symmetric SROC curves with b=0, which occurs when the studies re homogeneous nd thus ehibit no reltionship between OR nd ð5þ Fig. 2 Summry receiver operting chrcteristic with vrious vlues of (b=0). (View this rt in color t www.dekker.com.) threshold (S). The curves were obtined by numericl evlution of Eq. 4 for =1.5, 2, nd 3, which correspond to OR=4.5, 7.4, nd 20.1, respectively. As increses, the SROC curve moves closer to its idel position ner the upper-left corner. If!1, then AUC!1, which would indicte perfect test hving 100% sensitivity nd specificity, nd no errors in distinguishing cses from noncses. In contrst, if 0 (or OR1), the curve lies close to the digonl TPR=FPR in the SROC spce; then, AUC=1/2 nd the test performs no better thn chnce. For completeness, we mention situtions where <0. These correspond to OR < 1, when the test discrimintes cses nd noncses in the wrong direction, nd worse thn t rndom. As OR!0, AUC!0, nd the curve lies close to the lower-right corner of the SROC spce. Such situtions re unlikely to occur in prctice. We now consider the cse of heterogeneous studies under model 3, where dignostic ccurcy depends on threshold (b6¼0). Fig. 3 shows three SROC curves derived from Eq. 4, ll with = 2 but different vlues of b. Nonzero vlues of b give symmetric curves. If b>0, the curve initilly rises less steeply thn the symmetric curve (with b=0), but then it rises more steeply nd crosses the symmetric curve to chieve reltively high vlues of TPR for high vlues of FPR. The SROC curves with b <0 ehibit the opposite behvior see, for emple, the curve with b= 0.5 in Fig. 3. Interestingly, s suggested by Fig. 3 nd s shown by Moses et l., [10] the fmily of curves defined by fied vlue of ll pss through common point, locted on the ntidigonl, where TPR=1 FPR, or sensitivity equls specificity. Tht point hs coordintes TPR ¼ epð=2þ 1 þ epð=2þ ð6þ

4 SROC Curve Fig. 3 Summry receiver operting chrcteristic with vrious vlues of b (=2). (View this rt in color t www.dekker.com.) nd FPR ¼ 1 1 þ epð=2þ Moses et l. [10] denote the vlue of TPR t this point by Q*. In ROC nlysis, Q* hs been suggested s further summry mesure. It corresponds to the point closest to the idel top-left corner of the SROC spce in symmetric curves. The fct tht ll SROC curves with given vlue of pss through the sme Q* point mens tht Q* conveys no dditionl sttisticl informtion beyond the odds rtio. However, use of Q* cn be motivted by its simple interpretbility, given tht Q* is the point on the SROC curve where TPR=1 FPR; thus it represents the dignostic threshold t which the probbility of correct dignosis is constnt for ll subjects. Also, s will be shown lter, the eistence of common vlue of Q* permits the derivtion of useful lower bound for AUC. As consequence of ssuming model 3, when b6¼0, the SROC curve hs region where TPR<FPR, which lies below the min digonl. This region my be seen, for emple, ner the lower-left corner in the curve for b=0.5 in Fig. 3. In this region, the test would theoreticlly be performing worse thn t rndom. However, in prctice, this region is very smll. Even if reltively strong heterogeneity (with b> 0) is present, the improper prt of the SROC curve (where TPR<FPR) includes very low vlues of TPR, nd these vlues correspond to test thresholds tht re unlikely to be cceptble in clinicl prctice. On the other hnd, if heterogeneity with b<0 occurs, the improper prt of the curve corresponds to ð7þ very high vlues of FPR, which would gin hve no clinicl relevnce. See Ref. [27] for numericl detils of this effect. We now briefly discuss the behvior of model 3 for etreme vlues of b. As b!1, the fitted SROC curve becomes progressively steeper, nd in the limit cse t b =1 it degenertes to verticl line. This could occur, for emple, if there ws no vrition in FPR between studies. However, the line still psses through the common Q* point defined by Eqs. 6 nd 7. Once b>1, the curve inverts nd shows negtive reltionship of TPR to FPR. This is implusible in prctice, ecept perhps by chnce in smll smples. Similr behvior is seen if b is ner or below 1. Ner b= 1, the curve is horizontl, indicting no reltionship of TPR to FPR. This could occur if there ws no vrition in TPR between studies. If b< 1, the curve gin inverts (to different shpe) nd suggests n implusible negtive reltionship between TPR nd FPR. Despite this, ll the curves for jbj>1 possess the common vlue of Q* given by Eqs. 6 nd 7. In prctice, dt yielding jbj >1 re unlikely. Empiricl eperience suggests tht prcticl met-nlysis dt sets often hve b close to 0, nd re rrely lrger thn 0.5 in bsolute vlue. SUMMARY MEASURES: AREA UNDER THE CURVE AND Q* The AUC is populr inde of the overll performnce of test. [4 6,18,19,28] As mentioned erlier, AUC rnges from 1 for perfect test tht lwys correctly dignoses, to 0 for test tht never correctly dignoses. In single studies, AUC cn be interpreted s the probbility tht the test will correctly rnk rndomly chosen cse/noncse pir with respect to their test vlues. [29] The AUC is intended to fulfill the sme function in met-nlyses, nd in effect one ssumes tht the SROC curve ccurtely conveys test performnce t the individul subject level. Although numericl integrtion is required in generl to obtin AUC, some specil cses re of interest becuse they yield ect nlytic epressions nd comprtive results, s we now discuss. As epected, AUC increses with, for fied b. By emining Eq. 5, one cn prove tht for given vlue of, AUC is mimized when b=0, implying tht AUC is optimlly lrge in homogeneous studies. [27] Furthermore, one my show tht AUC is symmetric in b, so tht negtive vlues of b yield the sme vlue of AUC s the equivlent positive vlue, this despite the very different shpes of their ssocited SROC curves. If =b=0, then from Eq. 5 AUC ¼ Z 1 0 d ¼ 1 2

SROC Curve 5 Tble 1 AUC hom, Q*, nd their difference for vrious vlues of the dignostic odds rtio: homogeneous cse Odds rtio AUC hom Q* AUC hom Q* 0.5 0.386 0.414 0.028 1 0.500 0.500 0.000 1.5 0.567 0.551 0.017 2 0.614 0.586 0.028 3 0.676 0.634 0.042 4 0.717 0.667 0.051 5 0.747 0.691 0.056 10 0.827 0.760 0.067 20 0.887 0.817 0.069 30 0.913 0.846 0.068 40 0.929 0.863 0.065 50 0.939 0.876 0.063 By symmetry, it is evident tht AUC=1/2 when =0 even if b6¼0, so AUC=1/2 indictes rndom overll performnce for ny set of studies. In the homogeneous cse b= 0, the generl epression 5 becomes epðþ AUC ¼ Z 1 0 1 1 þ epðþ 1 d In this cse, we cn obtin n ect solution OR AUC hom ¼ ½ðOR 1Þ lnðorþš 2 ðor 1Þ ð8þ where AUC hom indictes the AUC for homogeneous studies, nd OR = ep(). If =0 (or OR =1), then the specil vlue AUC hom =1/2 should be used in plce of Eq. 8, which is then degenerte. Epression 8 cn be used to evlute AUC for homogeneous studies, without the need for numericl integrtion. As demonstrted lter, epression 8 is lso useful upper bound nd close pproimtion for AUC in heterogeneous studies. For reference, Tble 1 shows the vlue of AUC, Q*, nd their difference for rnge of vlues of OR in the homogeneous cse. By noting tht AUC declines with incresing b, nd tht the limit curve with b!1 psses through the common Q* point, from Eqs. 6 nd 7, we my deduce tht lower bound for AUC in curves with given vlue of is pffiffiffiffiffiffiffi Q epð=2þ ¼ 1 þ epð=2þ ¼ OR p 1 þ ffiffiffiffiffiffiffi ð9þ OR which is equivlent to the TPR vlue given in Eq. 6. Also, Q* from Eq. 9 nd the mimum vlue AUC hom from Eq. 8 provide esily computble lower nd upper bounds, respectively, for AUC with ny given vlue of >0. Numericl Tbultions Numericl evlutions of epressions 8 nd 9 demonstrte tht in the homogeneous cse, the difference between AUC hom nd Q* increses for moderte vlues of OR, but is never more thn bout 7%. [27] The mimum difference occurs t = 2.85 (or OR = 17.3), which vlue is the solution to trnscendentl eqution. For lrger vlues of, the difference declines very slowly, with limit vlue AUC hom Q*=0 t =1. The numericl integrtion of Eq. 5 for the heterogeneous cse llows one to ssess the impct of heterogeneity on the vlue of AUC. For fied vlue of OR, AUC declines slowly s b increses from 0 (homogeneous studies) to lrger vlues (incresing heterogeneity). However, the dependence of AUC on b is wek, nd the dominnt effect on AUC is the vlue of OR (or ). For jbj<0.4, the percentge chnge in AUC compred to the homogeneous cse is less thn 2%. Accordingly, AUC hom provides good pproimtion to AUC even in heterogeneous studies. [27] STANDARD ERRORS OF AREA UNDER THE CURVE AND Q* We first consider the smple vrition in dauc. From Eq. 5, we see tht AUC is function of the regression prmeters nd b, nd hence the vribility in dauc is function of the smple vrition in â nd bˆ. Using the delt method, n pproimte vrince for dauc is vrð daucþ ¼ @AUC 2 vrð^þ @ þ @AUC 2 vrð^bþ þ2 @AUC @b @ @AUC covð^; ^bþ ð10þ @b where, from Eq. 5 @AUC 1 ¼ ep @ 1 b 1 b p Z 1 1 h 0 pep i 2 d 1 þ 1 1 b @AUC @b ¼ 1 2 ep 1 b 1 b p h i Z 1 þ 2ln 1 1 h 0 pep i 2 d 1 þ 1 1 b ð11þ ð12þ

6 SROC Curve nd p=(1+b)/(1 b). The vrinces nd covrince of â nd ^b cn be directly obtined from the stndrd regression softwre used to fit model 3 to the dt. In generl, evlution of vr( dauc) gin requires numericl integrtion. However, for the specil cse of homogeneous studies where b = 0, n pproimte, lrgesmple epression is possible. Using the delt method, we my then obtin SEð dauc hom Þ ¼ OR ½ðOR þ 1Þ ln OR 3 ðor 1Þ 2ðOR 1ÞŠSEð^Þ ð13þ Eq. 13 implies SE( dauc hom ) is symmetric in ln(or), lthough we would usully be concerned with vlues OR>1. When OR=1, epression 13 is degenerte, but by using L Hôpitl s rule, one cn show tht in the neighborhood of OR=1, SEð daucþ SEð^Þ=6 ð14þ The delt method lso yields n pproimte stndrd error for ^Q* s pffiffiffiffiffiffiffi SEð ^Q OR Þ ¼ pffiffiffiffiffiffiffi 2ð OR þ 1Þ 2 SEð^Þ ð15þ Moses et l. [10] give this result in n lterntive form involving cosh(/4). Numericl evlutions show tht the stndrd errors of Q* nd AUC re both mimized when OR=1, so the lest precise sitution for either inde is for tests tht hve close to rndom performnce. In the region of OR=1, SE( dauc)1/6 nd SE( ^Q*)1/8. Comprisons show SE( dauc)>se( ^Q*) for OR vlues between 1 nd 17.3, the sme vlue ssocited with the vlues of AUC nd Q*; for OR > 17.3, AUC hs the smller stndrd error. Both stndrd errors pproch 0 s!1. For heterogeneous studies, SE( dauc) is not necessrily mimized when = 0, becuse it involves vr(^b) nd cov(â, ^b) s well s vr(â). However, it is the lst of these terms tht domintes, with the other two mking reltively smll numericl contributions. Thus in prctice SE( dauc) is mimized pproimtely when OR=1, even for heterogeneous studies. For most vlues of OR, the stndrd error declines by up to bout 10% t the most etreme level of b. OTHER ISSUES WITH THE FIXED-EFFECT MODEL So fr, we hve estblished some bsic properties of the SROC curve under model 3, nd we found tht the vlue AUC hom ssocited with homogeneous studies is resonble upper-bound pproimtion even for the generl AUC with heterogeneous studies. The corresponding stndrd error lso provides good pproimtion for heterogeneous studies, nd is conservtively lrge ecept in some cses of etreme heterogeneity. In the presence of strong heterogeneity, the AUC would be n indequte summry of the dt nywy, nd it would then be preferble to emine the SROC curve in more detil, including specific TPR vlues for given FPR. Q* lso provides n esily computed lower bound for AUC, but empiriclly ppers to be not quite s good n pproimtion s AUC hom. The motivtion for Q* sn inde in its own right is tht it is locted where the SROC curve crosses the ntidigonl from (0,1) to (1,0) of the SROC spce. Hence TPR=1 FPR t Q*, nd so the probbility of n incorrect result from the test is the sme for cses nd noncses. Q* is therefore point of indifference between flse-positive nd flse-negtive dignostic errors. In homogeneous studies, Q* is the point on the SROC curve lying closest to the optiml upper-left corner, but this is not true with heterogeneous studies. Use of Q* s the summry mesure ssumes implicitly tht flse-negtive nd flse-positive test results re of equl vlue. In prctice, there my be different costs ssocited with these two types of error: One wishes to minimize flse-positive results becuse of the dditionl testing required to estblish the correct dignosis (noncse), nd becuse the dditionl tests tend to be more costly, invsive, or risky. On the other hnd, flsenegtive results led to disese cses being missed, with possible deteriortion in their prognosis. In generl, one must weigh the flse-positive nd flse-negtive errors to blnce the overll performnce of the test in popultion; the optiml dignostic threshold does not then correspond in generl to the Q* point. One cn motivte the use of AUC s n inde tht represents the probbility tht the test will correctly rnk cse/noncse pir of subjects. [4,29] It cn lso be thought of s the verge TPR over the entire rnge of FPR vlues. Becuse it summrizes the whole SROC curve, AUC hs symmetric interprettion with respect to either TPR or FPR. The AUC is ffected by the whole SROC curve, including regions with limited or no dt, or by sectors corresponding to TPR nd FPR vlues tht re unlikely to occur in prctice. Accordingly, it hs been suggested tht prtil SROC curves be dopted, by limiting ttention to those portions of the SROC curve of clinicl interest, or where dt re ctully observed. [30 33] There re some unresolved issues on the use of prtil SROC curves nd the corresponding prtil AUC. First, the prtil AUC my be thought of s the verge TPR within restricted rnge of FPR, but not vice vers. Hence the prtil AUC hs n symmetric interprettion with respect to TPR nd FPR; on the other hnd, the complete

SROC Curve 7 AUC enjoys symmetric interprettion with respect to both types of test error. Second, there my be some rbitrriness in which regions of the curve to select. One might emine the SROC curve within prespecified rnge of TPR vlues, for instnce by defining mimum cceptble level for clinicl prctice. Another pproch would be to choose the region ccording to the observed rnge of dt in the met-nlysis; the choice itself would then be ffected by smpling vrition, which might be substntil in met-nlyses involving smll studies. Furthermore, resonble choice for one test my be unresonble for nother test, thus complicting their comprison. The nlytic methods for AUC presented in this pper cn be etended to cover the prtil AUC nd its stndrd error. We my epect the effect of interstudy heterogeneity to be greter for the prtil AUC thn the wek effect seen for the full AUC. For instnce, if ttention is limited to vlues of FPR<0.2 when =2 (Fig. 3), the corresponding prtil AUC will be greter when b>0 thn when b<0. Hence the prtil AUC lcks symmetry with respect to b. On the other hnd, the complete AUC hs some compensting decreses in contributions to the re t higher vlues of FPR, so there re only modest chnges in the totl AUC s b vries (Fig. 3). We my lso recll tht AUC is symmetric with respect to b, so tht the sme summry vlue will be obtined for given strength of dependence of dignostic ccurcy on the test threshold. The prtil AUC does not possess these properties, nd hence it will show greter dependence on the degree of interstudy heterogeneity. Further work is needed to eplore the properties of the prtil AUC in more detil. Similr investigtion is needed for other summry indices, such s the ASC (Are Swept out by the Curve), PLC (Projected Length of the Curve), [34] nd the Gini/Lorenz coefficients. [35] Smpling vribility nd dependence between test ccurcy nd test threshold re unlikely to fully ccount for the observed heterogeneity in test ccurcy between studies. Additionl sources of heterogeneity cn be eplored by emining the ssocition between test ccurcy nd study level covrite informtion. [3,8,36] However, given the limited dt on ptient nd study design chrcteristics tht re typiclly vilble, it is unlikely tht study level vribles will fully eplin the ecess vribility in test ccurcy. [26] Inppropritely ssuming fied effect cn led to spurious precision nd result in covrites being incorrectly identified s significntly ssocited with test ccurcy. [26] A mied model tht includes rndom effects for test ccurcy cn tke into ccount uneplined vribility between studies. This pproch is used in the hierrchicl (mied) model outlined briefly below, which, unlike the Moses pproch, directly models the TPR nd FPR for ech study. HIERARCHICAL (MIXED) EFFECTS MODEL An lterntive pproch to fitting SROC curves hs been proposed by Rutter nd Gtsonis. [20,21] As with the Moses model, ech study (i) contributes n estimte of TPR nd FPR to the met-nlysis. For ech study, the number in the disesed group who hve positive test result (y i1 ) nd the number in the nondisesed group who hve positive test result (y i2 ) re both ssumed to follow binomil distribution such tht y ij b(n ij, p ij ), where p ij represents the probbility of positive test for group j ( j=1,2) in study i, nd n ij represents the totl number of positive nd negtive test results in group j. The model is bsed on n ordinl logistic regression model [37] nd tkes the form logit(p ij )=(y i + i dis ij )ep( b dis ij ) where dis ij represents the true disese sttus (coded s 0.5 for the nondisesed nd 0.5 for the disesed). The model prmeters estimte the implicit threshold (y i ) for positive test result in study i nd the dignostic ccurcy ( i ) for study i. The prmeter b llows for n ssocition between test ccurcy nd test threshold. The estimted SROC is symmetric when b =0. This hierrchicl (multilevel) model tkes into ccount both the within- nd between-study vribility. The within-study vribility is modeled t the first level. For the ith study, logit(tpr i ) nd logit(fpr i ) re modeled to estimte the implicit threshold (y i ) nd dignostic ccurcy ( i ) for tht study. Hence rndom effects re ssumed for both threshold nd ccurcy s these my vry cross studies. When the SROC is symmetric (i.e., b=0), the level 1 nlysis for ech study reduces to ordinry logistic regression model where i is estimted by logit(tpr i ) logit(fpr i ) (which is equivlent to the observed vlue D i in study i for the dependent vrible in model 3) nd y i is estimted by (logit(tpr i )+ logit(fpr i ))/2 (which is equivlent to S i /2, where S i is the observed vlue in study i for the independent vrible in model 3). The smpling vribility for both TPR i nd FPR i is tken into ccount. The vribility between studies is modeled t level 2. The rndom effects for test threshold nd ccurcy re ssumed to be independent (uncorrelted) nd normlly distributed with y i N(Y, t y 2 ) nd i N(L, t 2 ). The hyperprmeters Y nd L represent the epected threshold nd ccurcy, respectively, cross ll studies in the nlysis. If t y 2 =0 nd t 2 =0, the model reduces to fied-effect model. Becuse ech study contributes only one point in ROC spce to the nlysis, single study does not provide informtion on the shpe of the SROC. Hence the shpe prmeter (b) is ssumed to be fied effect, which is estimted from the study points jointly considered. The hierrchicl model is fitted using Mrkov Chin Monte Crlo (MCMC) Byesin methods. This requires third level to specify prior distributions for ll model prmeters.

8 SROC Curve A summry ROC curve cn be constructed by choosing rnge of vlues of FPR nd using the estimted model prmeters to compute the predicted vlues for sensitivity. The epected TPR t chosen FPR vlue is given by TPR(FPR)=1/(1+ep[ {Lep( 0.5b)+logit(FPR) ep( b)}). The epected operting point on the curve is estimted using E(TPR)=1/(1+ep[ {(Y+L/2)ep( b/2)}]) nd E(FPR)=1/(1+ep[ {(Y L/2)ep( b/2)}]). Covrites cn be dded to the model to ssess whether threshold, ccurcy, nd/or the shpe of the SROC vry with ptient or study chrcteristics. Such terms re generlly fitted s fied effects, but could lso be included s rndom effects. Advntges of the Hierrchicl Model The Rutter nd Gtsonis method provides generl frmework for the met-nlysis of dignostic test performnce. The Moses method does not directly model the TPR nd FPR vlues for ech study. Becuse D nd S re functions of both TPR nd FPR, the prmeters for the Moses model cnnot be used to obtin the epected operting point on the SROC, the corresponding likelihood rtios t tht point, or the stndrd errors of these estimtes. The hierrchicl model cn be used to model vrition in test threshold s well s test ccurcy through the inclusion of study level covrites, wheres the Moses pproch cn only be used to model vrition in test ccurcy. However, the Moses model does hve the dvntge tht no ssumption is mde bout the distribution of S. Lstly, the hierrchicl model incorportes both the within- nd between-study vribility, nd tkes ccount of uneplined heterogeneity between studies through the inclusion of rndom effects for test threshold nd ccurcy. Fitting the Model Despite the potentil dvntges of the hierrchicl model, it hs not been widely dopted. This is most likely becuse of the necessity to use Mrkov Chin Monte Crlo (MCMC) methods to fit it. [20] Rutter nd Gtsonis [21] hve demonstrted how the model cn be fitted using BUGS (Byesin inference Using Gibbs Smpling) softwre. However, they comment tht the process involves MCMC simultion, nd is still reltively comple. The recent vilbility of softwre for fitting nonliner mied models in SAS [38] provides n lterntive nd potentilly more strightforwrd pproch tht does not require specifiction of prior distributions. PROC NLMIXED in SAS llows for nonliner function of the model prmeters, nd for non-norml error distribution, but the rndom effects re restricted to be normlly distributed. Summry estimtes of TPR, FPR, likelihood rtios cn be computed nd their symptotic confidence intervls estimted using the delt method. The distribution of the rndom effects my be checked by emining histogrm nd norml probbility plot of their Empiricl Byes (EB) estimtes. This follows the sme pproch dopted for checking distributionl ssumptions for liner mied models. [39,40] CONCLUSION The mied model provides rigorous method of nlysis tht tkes proper ccount of the sources of vribility in test performnce between studies. The resulting stndrd errors (nd confidence intervls) reflect the uneplined heterogeneity tht is likely to be present in mny metnlyses. The mied model lso llows estimtion of cliniclly relevnt indictors of test performnce such s the epected TPR, FPR nd likelihood rtios. A rnge of lterntive fied-effect nd rndom effects methods re vilble for obtining these summry mesures. [22] However, they cn led to results tht re potentilly inconsistent within the sme met-nlysis. For instnce, for the sme dt, summry estimtes of TPR nd FPR could be computed tht re not consistent with summry estimtes of likelihood rtios or n SROC estimted using the Moses model. A useful feture of the mied model is tht other pproches cn be regrded s specil cses. The compleity of the Byesin (MCMC) method for fitting the mied effects model is likely to discourge its use. The NLMIXED procedure is more ccessible, but t present this softwre cn only del with two levels in the response vrible. For met-nlyses where studies contribute more thn one (FPR, TPR) pir to the nlysis, n dditionl level would be required. A Byesin pproch potentilly provides greter fleibility for fitting the mied effects model in tht it llows the met-nlyst to specify lterntive distributions for the rndom effects nd lso llows prior informtion to be tken into ccount. However, in prctice, norml distribution is often ssumed for the rndom effects nd noninformtive priors re commonly used. Likelihood-bsed methods for estimting mied model prmeters re lso consistent with current pproches used in the met-nlysis of clinicl trils. [39,41,42] Model checks re required to ssess the dequcy of the distributionl ssumption for the rndom effects, but ssessing normlity will clerly be difficult in smll met-nlyses. [39] The Moses fied-effect SROC model hs the dvntge of simplicity, nd it mkes no ssumptions bout the distribution of S. It is useful s preliminry step before

SROC Curve 9 fitting the mied model. Mjor differences in the vribles found to be ssocited with test ccurcy or the shpe of the SROC would wrrnt investigtion s this my be indictive of convergence problems. An issue tht potentilly ffects ll of the methods discussed here is tht reference test (or gold stndrd) hs been ssumed to be error-free. In fct, the reference stndrd is often itself subject to mesurement error, s illustrted, for emple, in the observble differences between pthologists nd rdiologists in their ssessments of the sme smple mteril. Severl methods hve been proposed to correct for referent errors, but they primrily pertin to dt from single studies. Rther little hs been done on this problem in the contet of combining studies in met-nlysis, but one proposed pproch [43] hs been to use ltent clss frmework to etend the logitthreshold model (Eq. 3), nd to recognize the fct tht both the cndidte nd referent tests re potentilly subject to error. The true disese stte of n individul cnnot be directly observed, but n estimte of the probbility of n individul hving disese cn be obtined from the ltent clss model. This then permits dettenution of the errors in the dt, nd the resulting djustment to the SROC curve tends to led to n improved estimte of test performnce. Further development of the SROC methods re required to llow comprison of SROC curves when not ll component studies in met-nlyses re independent of one nother. For emple, some of the component studies my mke direct comprisons of two tests while other studies only evlute one. Methods to tke such dependencies into ccount hve been proposed for therpeutic studies, [44] nd etensions to dignostic test comprisons would be useful. The techniques discussed here pply to situtions where one hs only summry mesures (TPR nd FPR) from ech study. Sometimes one hs ccess to the individul level dt in ech study; one would then be ble to crry out multilevel nlysis, tking both internd intrstudy vrition into ccount, s well s the effects of subject-specific covrites. However, in prctice, current reporting of dignostic studies nd met-nlyses is methodologiclly poor, nd this level of detil in the dt would often be difficult to chieve. [1,36,45,46] Accordingly, further work to understnd the properties of the SROC curve bsed on summry dt from ech study seems wrrnted. REFERENCES 1. Irwig, L.; Tosteson, A.N.A.; Gtsonis, G.; Lu, J.; Colditz, G.; Chlmers, T.C.; Mosteller, F. Guidelines for metnlyses evluting dignostic tests. Ann. Intern. Med. 1994, 120 (8), 667 676. 2. Vmvks, E.C. Met-nlyses of studies of the dignostic ccurcy of lbortory tests. Arch. Pthol. Lb. Med. 1998, 122 (8), 675 686. 3. Deville, W.L.; Buntin, F. Guidelines for Conducting Systemtic Reviews of Studies Evluting the Accurcy of Dignostic Tests. In The Evidence Bse of Clinicl Dignosis; Knottnerus, J.A., Ed.; BMJ Books: London, 2002; 145 165. 4. Hnley, J.A. Receiver Operting Chrcteristic (ROC) Curves. In Encyclopedi of Biosttistics; Armitge, P., Colton, T., Eds.; Wiley: Chichester, 1998; 3738 3745. 5. Beck, J.R.; Shultz, E.K. The use of reltive operting chrcteristic (ROC) curves in test performnce evlution. Arch. Pthol. Lb. Med. 1986, 110 (1), 13 19. 6. Metz, C.E.; Hermn, B.A.; Shen, J.H. Mimum likelihood estimtion of receiver operting chrcteristic (ROC) curves from continuously-distributed dt. Stt. Med. 1998, 17 (9), 1033 1053. 7. Krdun, J.W.P.F.; Krdun, O.J.W.F. Comprtive dignostic performnce of three rdiologicl procedures for the detection of lumbr disk hernition. Methods Inf. Med. 1990, 29 (1), 12 22. 8. Irwig, L.; Mcskill, P.; Glsziou, P.; Fhey, M. Metnlytic methods for dignostic test ccurcy. J. Clin. Epidemiol. 1993, 48 (1), 119 130. 9. Midgette, A.S.; Stukel, T.A.; Littenberg, B. A metnlytic method for summrizing dignostic test performnces: Receiver-operting-chrcteristic-summry point estimtes. Med. Decis. Mk. 1993, 13 (3), 253 256. 10. Moses, L.E.; Shpiro, D.E.; Littenberg, B. Combining independent studies of dignostic tests into summry ROC curve: Dt-nlytic pproches nd some dditionl considertions. Stt. Med. 1993, 12 (14), 1293 1316. 11. Dorfmn, D.D.; Alf, E. Mimum likelihood estimtion of prmeters of signl detection theory nd determintion of confidence intervls Rting method dt. J. Mth. Psychol. 1969, 6 (3), 487 496. 12. Hnley, J.A. The use of the binorml model for prmetric ROC nlysis of quntittive dignostic tests. Stt. Med. 1996, 15 (14), 1575 1585. 13. McCullgh, P. Regression models for ordinl dt (with discussion). J. R. Stt. Soc., Ser. B 1980, 42 (2), 109 142. 14. M, G.; Hll, W.J. Confidence bnds for receiver operting chrcteristics curves. Med. Decis. Mk. 1993, 13 (3), 191 197. 15. Green, D.M.; Swets, J. Signl Detection Theory nd Psychophysics; Wiley: New York, 1966. 16. Hnley, J.A. Receiver operting chrcteristic (ROC) methodology: The stte of the rt. Crit. Rev. Dign. Imging 1989, 29 (3), 307 335. 17. Ogilvie, J.C.; Creelmn, C.D. Mimum likelihood estimtion of ROC curve prmeters. J. Mth. Psychol. 1968, 5 (3), 377 391. 18. Hnley, J.A.; McNeil, B.J. The mening nd use of the re under receiver operting chrcteristic (ROC) curve. Rdiology 1982, 143 (1), 29 36.

10 SROC Curve 19. Hilden, J. The re under the ROC curve nd its competitors. Med. Decis. Mk. 1991, 11 (2), 95 101. 20. Rutter, C.M.; Gtsonis, G. Regression methods for metnlysis of dignostic test dt. Acd. Rdiol. 1995, 2 (Suppl. 1), S48 S56. 21. Rutter, C.M.; Gtsonis, G. A hierrchicl regression pproch to met-nlysis of dignostic test ccurcy evlutions. Stt. Med. 2001, 20 (19), 2865 2884. 22. Deeks, J.J. Systemtic Reviews of Evlutions of Dignostic nd Screening Tests. In Systemtic Reviews in Helth Cre: Met-Anlysis in Contet; Egger, M., Dvey- Smith, G., Altmn, D.G., Eds.; BMJ Publishing Group: London, 2001. 23. Fhey, M.T.; Irwig, L.; Mcskill, P. Met-nlysis of Pp smer ccurcy. Am. J. Epidemiol. 1995, 141 (7), 680 689. 24. Irwig, L.; Bossuyt, P.; Glsziou, P.; Gtsonis, G.; Lijmer, J. Designing Studies to Ensure tht Estimtes of Test Accurcy Will Trvel. In Designing Studies on Dignostic Tests; Knottnerus, A., Ed.; BMJ Books: London, 2001. 25. Rnsohoff, D.F.; Feinstein, A.R. Problems of spectrum nd bis in evluting the efficcy of dignostic tests. N. Engl. J. Med. 1978, 299 (17), 926 930. 26. Lijmer, J.G.; Bossuyt, P.M.M.; Heisterkmp, S.H. Eploring sources of heterogeneity in systemtic reviews of dignostic tests. Stt. Med. 2002, 21 (11), 1525 1537. 27. Wlter, S.D. Properties of the summry receiver operting chrcteristic (SROC) curve for dignostic test dt. Stt. Med. 2002, 21 (9), 1237 1256. 28. DeLong, E.R.; DeLong, D.M.; Clrke-Person, D.L. Compring the res under two or more correlted receiver-operting chrcteristic curves: A non-prmetric pproch. Biometrics 1988, 44 (3), 837 845. 29. Bmber, D. The re bove the ordinl dominnce grph nd the re below the receiver operting grph. J. Mth. Psychol. 1975, 12 (4), 387 415. 30. McClish, D.K. Anlyzing portion of the ROC curve. Med. Decis. Mk. 1989, 9 (3), 190 195. 31. Thompson, M.L.; Zucchini, W. On the sttisticl nlysis of ROC curves. Stt. Med. 1989, 8 (10), 1277 1290. 32. Wiend, S.; Gil, M.H.; Jmes, B.R. A fmily of nonprmetric sttistics for compring dignostic tests with pired or unpired dt. Biometrik 1988, 76 (3), 585 592. 33. Jing, Y.; Metz, C.E.; Nishikw, R.M. A receiver operting chrcteristic prtil re inde for highly sensitive dignostic tests. Rdiology 1996, 201 (3), 745 750. 34. Lee, W.C.; Hsio, C.K. Alternte summry indices for the receiver operting chrcteristic curve. Epidemiology 1996, 7 (6), 605 611. 35. Lee, W.C. Probbilistic nlysis of globl performnces of dignostic tests: Interpreting the Lorenz curve-bsed summry mesures. Stt. Med. 1999, 18 (4), 455 471. 36. Lijmer, J.G.; Mol, B.W.; Heisterkmp, S. Empiricl evidence of design relted bis in studies of dignostic tests. JAMA 1999, 282 (11), 1061 1066. 37. McCullgh, P.; Nelder, J.A. Generlized Liner Models, 2nd Ed.; Chpmn nd Hll: London, 1989. 38. SAS Institute. SAS/STAT User s Guide, Version 8; SAS Institute: Cry, 1999. 39. Turner, R.M.; Omr, R.Z.; Yng, M.; Goldstein, H.; Thompson, S.G. A multilevel model frmework for metnlysis of clinicl trils with binry outcomes. Stt. Med. 2000, 19 (24), 3417 3432. 40. Verbeke, G.; Molenberghs, G. Liner Mied Models for Longitudinl Dt; Springer: New York, 2000. 41. Normnd, S.T. Met-nlysis: Formulting, evluting, combining, nd reporting. Stt. Med. 1999, 18 (3), 321 359. 42. Vn Houwelingen, H.C.; Arends, L.R.; Stijnen, T. Advnced methods in met-nlysis: Multivrite pproch nd met-regression. Stt. Med. 2002, 21 (4), 589 624. 43. Wlter, S.D.; Irwig, L.; Glsziou, P.P. Met-nlysis of dignostic tests with imperfect reference stndrds. J. Clin. Epidemiol. 1999, 52 (10), 943 951. 44. Bucher, H.C.; Guytt, G.H.; Wlter, S.D.; Griffith, L. The results of direct nd indirect tretment comprisons in met-nlysis of rndomized clinicl trils. J. Clin. Epidemiol. 1997, 50 (6), 683 691. 45. Wlter, S.D.; Jdd, A.R. Met-nlysis of screening dt: A survey of the literture. Stt. Med. 1999, 18 (24), 3409 3424. 46. Reid, M.C.; Lchs, S.; Feinstein, A.R. Use of methodologicl stndrds in dignostic test reserch: getting better but still not good. JAMA 1995, 274 (8), 645 651.

Request Permission or Order Reprints Instntly! Interested in copying nd shring this rticle? In most cses, U.S. Copyright Lw requires tht you get permission from the rticle s rightsholder before using copyrighted content. All informtion nd mterils found in this rticle, including but not limited to tet, trdemrks, ptents, logos, grphics nd imges (the "Mterils"), re the copyrighted works nd other forms of intellectul property of Mrcel Dekker, Inc., or its licensors. All rights not epressly grnted re reserved. Get permission to lwfully reproduce nd distribute the Mterils or order reprints quickly nd pinlessly. Simply click on the "Request Permission/ Order Reprints" link below nd follow the instructions. Visit the U.S. Copyright Office for informtion on Fir Use limittions of U.S. copyright lw. Plese refer to The Assocition of Americn Publishers (AAP) website for guidelines on Fir Use in the Clssroom. The Mterils re for your personl use only nd cnnot be reformtted, reposted, resold or distributed by electronic mens or otherwise without permission from Mrcel Dekker, Inc. Mrcel Dekker, Inc. grnts you the limited right to disply the Mterils only on your personl computer or personl wireless device, nd to copy nd downlod single copies of such Mterils provided tht ny copyright, trdemrk or other notice ppering on such Mterils is lso retined by, displyed, copied or downloded s prt of the Mterils nd is not removed or obscured, nd provided you do not edit, modify, lter or enhnce the Mterils. Plese refer to our Website User Agreement for more detils. Request Permission/Order Reprints Reprints of this rticle cn lso be ordered t http://www.dekker.com/servlet/product/doi/101081eebs120024189