Blinded, independent, central image review in oncology trials: how the study endpoint impacts the adjudication rate

Poster No.: C-0200
Congress: ECR 2014
Type: Scientific Exhibit
Authors: O. Bohnsack, M. Lesch, A. Urbank; Berlin/DE
Keywords: Oncology, Computer applications, CT, MR, Computer Applications-Detection, diagnosis, Cancer, Image verification
DOI: 10.1594/ecr2014/C-0200
Aims and objectives

The Adjudication Challenge

A persistent concern with blinded, independent, central review of medical images is the adjudication rate between reviewers. Imaging in oncology clinical trials is not losing but rather gaining importance. However, with double reads by more than a single reviewer, one will always encounter adjudication; it seems that the adjudication rate will never be 0%. We therefore strive to answer these two questions: What is an acceptable adjudication rate? How does its mere existence have a decisive impact on trial endpoints?

Based upon the analysis of review data from oncology studies with different indications and endpoints, we determined:
- the adjudication rates,
- their different definitions and meanings,
- how they are derived,
- their value and use in these clinical trials,
- their impact on data validity for study teams.

While the occurrence of adjudication does not necessarily imply that one of the reviewers made a mistake in assessing a patient's radiographic images, it still reflects a discrepancy in opinion, in lesion selection, in tumor-burden evaluation, in lesion measurement, or in qualitative assessment, all of which lead to discrepant review results, which in turn impact the study results.

We focus on the review design and adjudication based on RECIST evaluation. RECIST is meant to rely on just three assessment aspects:
1. quantitative: measuring the diameters of target lesions;
2. qualitative: assessing non-target lesions, which are not measured;
3. identification of new lesions.
These three aspects are the basis of unwanted discrepancies.

One might assume that many years of radiology experience, ideally including experience as an independent reviewer, is the driving factor for a low adjudication rate. Our analysis shows that the experience and CV-based qualification of the reviewers seem not to have a major influence on the disagreement of these experts' opinions.
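The quantitative aspect above can be illustrated with a minimal sketch of RECIST 1.1 target-lesion response classification from sums of diameters. This is a deliberately simplified, hypothetical illustration (the function name and values are our own): it omits the lymph-node short-axis rules and the non-target and new-lesion components that the criteria also require.

```python
def recist_target_response(baseline_sum, nadir_sum, current_sum):
    """Classify target-lesion response from sums of diameters in mm.

    Simplified sketch of RECIST 1.1: lymph-node short-axis rules and
    the non-target / new-lesion components are deliberately left out.
    """
    if current_sum == 0:
        return "CR"  # complete disappearance of all target lesions
    # PD: >= 20% increase over the nadir AND an absolute increase >= 5 mm
    if nadir_sum > 0 and (current_sum - nadir_sum) / nadir_sum >= 0.20 \
            and (current_sum - nadir_sum) >= 5:
        return "PD"
    # PR: >= 30% decrease from the baseline sum
    if (baseline_sum - current_sum) / baseline_sum >= 0.30:
        return "PR"
    return "SD"

# Two readers with different target-lesion selections can both apply the
# rules correctly yet disagree: reader A's sum rose 25% over its nadir
# (progression), while reader B's sum stayed 35% below baseline (response).
print(recist_target_response(100, 60, 75))  # reader A: PD
print(recist_target_response(100, 60, 65))  # reader B: PR
```

The worked pair of calls shows how the same patient can legitimately yield discrepant reads, which is exactly what drives adjudication without any reviewer error.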
However, the pairing of tightly monitored reviewers can have such an impact. But does this mean that an artificially decreased adjudication rate is better, or that the results are more correct? What are ideas and options for reducing the adjudication rate in future oncology trials?
Images for this section:

Fig. 1: Tough choices for an adjudicator
Methods and materials

The Adjudication Method

The design of adjudication can vary by study, indication, and study design. In standard oncology studies with response evaluation according to RECIST, or to the IWG criteria for lymphoma, Perceptive Informatics uses the following model (see Figure 2): in case of a disagreement between the two primary readers, the readings are given to a third radiologist for adjudication. This adjudicator then decides which of the two primary readers is "more right". The adjudicator reviews the assessments of the two primary radiologists and determines the final radiologic outcome for the case, which must be one of the two primary reviewers' assessments. The adjudicator is not allowed to bring in a third opinion.

Scenario in Figure 2: the adjudication is triggered by differing timepoint response assessments between the green and the orange reviewer. In this example, both select different measurable target lesions from the total lesion burden at baseline. At timepoint 2, one single lesion increases, and it was selected as a target lesion by only one of them. The orange reviewer correctly calculates a progression, whereas the green reviewer correctly calculates a partial response. Both reviewers' assessments are correct. This is one example of how differently chosen lesions may trigger adjudication without any reviewer error.

Table 1 summarizes the Perceptive Informatics Imaging recommendations for adjudication triggers, based on the clinical endpoints defined by the currently applicable FDA guidelines from 2007. We recommend choosing these triggers in accordance with the defined imaging-related study endpoints. Note that we do not recommend a timepoint-by-timepoint adjudication, in order to ease the review design and to reduce the adjudication rate.

Images for this section:
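The endpoint-aligned triggering described above can be sketched as follows. This is a hypothetical illustration only; the field names, endpoint labels, and function are our own assumptions, not Perceptive's actual implementation.

```python
def adjudication_triggered(read_a, read_b, endpoint):
    """Decide whether a case goes to the third-reader adjudicator.

    read_a / read_b: per-reader summaries (hypothetical shape), e.g.
        {"best_response": "PR", "progression_date": "2013-05-14"}
    endpoint: "ORR" (response-rate study) or "PFS" (progression study).
    Rather than comparing every timepoint, the trigger is aligned with
    the imaging-related study endpoint.
    """
    if endpoint == "ORR":
        return read_a["best_response"] != read_b["best_response"]
    if endpoint == "PFS":
        return read_a["progression_date"] != read_b["progression_date"]
    raise ValueError(f"unknown endpoint: {endpoint!r}")

a = {"best_response": "PR", "progression_date": "2013-05-14"}
b = {"best_response": "PR", "progression_date": "2013-07-02"}
# Same best response -> no adjudication in an ORR study, but the
# differing progression dates would trigger it in a PFS study.
print(adjudication_triggered(a, b, "ORR"))  # False
print(adjudication_triggered(a, b, "PFS"))  # True
```

The design choice this sketch makes explicit: two reads that agree on the endpoint-relevant quantity never reach the adjudicator, even if they differ at individual timepoints, which is why endpoint-aligned triggers yield lower adjudication rates than timepoint-by-timepoint comparison.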
Fig. 2: Standard oncology adjudication model
Table 1: Adjudication based on imaging-related study endpoints (aligned with the FDA guidelines, 2007)
Results

The Adjudication Truth

Adjudication rates can be derived in different ways, with different meanings and, as such, with different impact. The four exemplary scenarios in Table 2 describe how various discrepancy computations, with or without the baseline, significantly affect published adjudication rates. Baseline timepoints should not be included in the adjudication-rate calculation, since no response assessment is made at this timepoint, only the selection of lesions. However, even those values are not yet "the real truth": to compute the truly discrepant timepoints, one would have to use only the timepoints the patient actually has.

Images for this section:

Table 2: Discrepancy computations, with and without baseline
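The effect of including or excluding the baseline on the computed rate can be reproduced with a small sketch. The per-patient flags below are hypothetical: a True entry marks a timepoint at which the two primary reads diverged, with index 0 representing the baseline lesion selection.

```python
def adjudication_rates(patients):
    """Fraction of discrepant timepoints, with and without the baseline.

    patients: one list of booleans per patient; True marks a timepoint
    (index 0 = baseline) at which the two primary reads diverged.
    """
    def rate(rows):
        total = sum(len(r) for r in rows)
        return sum(sum(r) for r in rows) / total if total else 0.0

    with_baseline = rate(patients)
    without_baseline = rate([r[1:] for r in patients])  # drop baseline
    return with_baseline, without_baseline

# Three hypothetical patients: the first diverges at baseline (different
# lesion selection) and at the second follow-up, the second at its only
# follow-up, the third never.
flags = [
    [True, False, True],
    [False, True],
    [False, False, False],
]
print(adjudication_rates(flags))  # (3/8, 2/5) = (0.375, 0.4)
```

Note that dropping the baseline changes both the numerator and the denominator, so the two published rates are not comparable unless the computation is stated, which is the point Table 2 makes.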
Conclusion

The Adjudication Message

Based on Table 1, we generated the Perceptive internal database analysis and summarize our findings in Table 3. The five most common cancer types in oncology studies were chosen to determine whether the indication could impact adjudication. An average of 2-3 studies per indication was used to establish an overall percentage per indication in relation to the study imaging endpoints. To discuss a standard adjudication rate across all indications, an average percentage adjudication rate was computed. The outcome of this analysis is presented for the two main imaging-related endpoints, to visualize the impact of the study endpoint on the adjudication rate:

1. timepoint-by-timepoint response assessment, and
2. progression vs. non-progression.

The adjudication rate is broken down by total number of patients, discrepant timepoint assessments including the baseline, and discrepant timepoint assessments excluding the baseline.

The database analysis shows a correlation with the indication and the study endpoint. A global, average adjudication rate may not reflect the truth, as the rate evidently depends on the indication itself. In particular, a higher adjudication rate is observed in ovarian cancer and breast cancer than in lung, prostate, renal, and colorectal cancer. Each indication has its own challenges: the evaluation of lymph nodes in breast cancer, distinguishing a benign cyst from a malignancy in ovarian cancer, or deciding whether lesions can be considered measurable or not.

Reasonable questioning of imaging-derived data and its validity is common; such data are routinely challenged for meaning, clarification, and understanding.

"Does a higher adjudication rate make the credibility of my data more questionable?" The rate is expected to be higher for a response-rate study than for a progression-free survival study.
"Is the data of a central review analysis more powerful than the local sites' assessments?" Relying purely on the investigators' analysis bears the risk that other factors play a decisive role in whether to keep a patient on study or to change treatment. It is very difficult to expect pure objectivity and purely image-based patient treatment decisions. The rigour needed for a robust data analysis with standardized, reproducible results per patient is nearly impossible in the daily routine of a hospital. Misinterpretations, different lesion selections, different approaches, or plain errors will simply not be captured in such detail as in a central review with unbiased, blinded reviewers who follow strict, Charter-defined rules.

"How can the adjudication rate be reduced?" We see different ways to reduce the adjudication rate; overall, the "preventive" approach starts with the selection of experienced readers. Nevertheless, this does not reduce the need for thorough reviewer training and reviewer oversight during the course of the study. The training should be representative of the imaging and review scenarios to be expected in the trial. A very good way to address and resolve differences in interpretive "style" is evaluation in consensus and individual sessions, which is fundamental to ensuring reviewer agreement and promotes a uniform assessment approach: it either raises or lowers the bar between highly conservative and less conservative readers. If you want your adjudication rate to be 0%, then do not have double reads; choose single reads instead.

The Unspoken Truth

"The adjudication rate will never be 0% as long as the evaluation is in human hands!"

Images for this section:
Table 3: Adjudication based on imaging-related endpoints per indication
Personal information

Bohnsack Oliver; Perceptive Informatics, Inc., A PAREXEL Company
Lesch Manuela; Perceptive Informatics, Inc., A PAREXEL Company
Urbank Anja; Perceptive Informatics, Inc., A PAREXEL Company

Perceptive Informatics, Imaging - Berlin/DE
oliver.bohnsack@parexel.com

Images for this section:
Fig. 3
Fig. 4
References

1. US Department of Health and Human Services, Food and Drug Administration. Clinical Trial Endpoints for the Approval of Cancer Drugs and Biologics. Rockville, MD: FDA; 2007.
2. Borradaile K, Ford R, O'Neal M, Byrne K. "Discordance Between BICR Readers." Applied Clinical Trials Online, Supplement, November 2010.
3. Ford R, Schwartz L, Dancey J, Dodd LE, Eisenhauer EA, Gwyther S, Rubinstein L, Sargent D, Shankar L, Therasse P, Verweij J. "Lessons learned from independent central review." Eur J Cancer. 2009;45(2):268-274. doi:10.1016/j.ejca.2008.10.031.
4. Therasse P, Arbuck SG, Eisenhauer EA, et al. "New Guidelines to Evaluate the Response to Treatment in Solid Tumors (RECIST Guidelines)." J Natl Cancer Inst. 2000;92(3):205-216.
5. Eisenhauer EA, Therasse P, Bogaerts J, et al. "New Response Evaluation Criteria in Solid Tumours: Revised RECIST Guideline (Version 1.1)." Eur J Cancer. 2009;45(2):228-247.