Geisinger Clinic
Annual Progress Report: 2011 Nonformula Grant

Reporting Period
July 1, 2012 - June 30, 2013

Nonformula Grant Overview
The Geisinger Clinic received $1,000,000 in nonformula funds for the grant award period June 1, 2012 through May 31, 2014. Accomplishments for the reporting period are described below.

Research Project: Project Title and Purpose
Diagnostic-Prognostic Testing in Patients at High Risk for Esophageal Cancer
The purpose of this project is to clinically validate a diagnostic-prognostic test for esophageal cancer, which will accurately diagnose at a premalignant stage and predict which patients are at high risk for esophageal cancer to enable early, preventative therapy. A prototype test has been developed and proof-of-concept of the testing technology has been established in collaborative work by Geisinger and Cernostics. The project aims to perform clinical validation studies in a training cohort and two independent validation cohorts of esophageal biopsies with clinical outcome data from Geisinger, University of Pittsburgh and University of Pennsylvania to select diagnostic and prognostic classifiers and to establish the sensitivity, specificity and positive and negative predictive values of the diagnostic-prognostic test for patients at high risk for esophageal cancer.

Anticipated Duration of Project
6/1/2012 - 5/31/2014

Project Overview
The broad objective of the research is to clinically validate a diagnostic and prognostic test that accurately assigns diagnosis and predicts risk of developing esophageal cancer. The test is a spatial systems biology-based approach to anatomic pathologic testing. The test employs multiplexed fluorescence labeling of tumor system biomarkers, including malignant, immune and stromal processes in anatomic pathology specimens with digital imaging and image analysis to quantify biomarker expression and spatial relationships between biomarkers in the context of tissue morphology.
This is coupled to classifier software to integrate biomarker data with morphology data and clinical data to produce diagnostic and prognostic scores. These scores will be used to accurately diagnose and predict the risk of developing esophageal cancer in individual patients to enable early treatment. A prototype test has been collaboratively developed by Geisinger (lead applicant) and Cernostics, Inc. (small business collaborator) as a proof-of-concept. As a next step, a consortium of investigators will perform retrospective clinical Therapeutics Page 1
validation studies of the test towards the long term goal of commercializing the test via a CLIA-certified laboratory. The test will be performed first in a training cohort of formalin-fixed paraffin-embedded esophageal biopsies with clinical data from Geisinger using Cernostics spatial systems biology technology, and diagnostic and prognostic classifiers will be developed. The test, including the classifiers, will then be performed in two independent validation patient cohorts from the University of Pittsburgh and the University of Pennsylvania to determine specificity, sensitivity and positive and negative predictive values of the diagnostic-prognostic test.

The specific research aims are: 1) Determine the performance of the prototype test in stratifying patients according to diagnosis and predicting risk for esophageal cancer in a retrospective training patient cohort; and 2) Validate the diagnostic and prognostic performance of the optimized diagnostic-prognostic test in two independent retrospective patient cohorts. The training and validation cohorts represent both urban and rural populations and are designed to reach the maximum number of the underserved and will ensure a significant statewide impact on the health of Pennsylvanians. Paralleling the proposed project, Cernostics and Geisinger will perform further analytical validation studies on the test. The test will be commercialized by Cernostics and will be offered as a service to pathologists and gastroenterologists to guide individualized patient management to help prevent the development of esophageal cancer.

Principal Investigator
Jeffrey W. Prichard, DO
Director of Surgical Pathology
Geisinger Medical Laboratory
100 North Academy Avenue
Danville, PA 17822

Other Participating Researchers
Jinhong Li, MD, PhD; David L. Diehl, MD - employed by Geisinger Clinic
Rebecca J. Critchley-Thorne, PhD; Bruce Campbell, MS - employed by Cernostics, Inc.
Gary W. Falk, MD, MSc; Anil K.
Rustgi, MD; Nirag Jhala, MD, PhD - employed by the University of Pennsylvania
Jon M. Davison, MD; Chakra Chennubhotla, Ph.D. - employed by the University of Pittsburgh
Blair A. Jobe, MD; Ali H. Zaidi, MD - employed by West Penn Allegheny Health System
Yi Zhang, Ph.D. - consultant statistician

Expected Research Outcomes and Benefits
The project employs a testing technology for which Geisinger Health System and Cernostics have demonstrated proof-of-concept. The investigators have selected a comprehensive panel of diagnostic and prognostic biomarkers, many of which have established significance in diagnosing the stages of Barrett's esophagus and in predicting risk for esophageal cancer. Therefore, the expected research outcomes of the project are classifiers based on optimal sets of biomarker, morphology and clinical data that can accurately assign diagnosis and predict whether a patient will develop high grade dysplasia or esophageal cancer and also estimate the
sensitivity, specificity and overall accuracy of the diagnostic-prognostic test. It is expected that the test will have high sensitivity and specificity and high positive and negative predictive values based on the known diagnostic and prognostic significance of the panel of biomarkers and based on the high stringency of feature selection for the classifiers. It is also expected that the research will identify a key set of biomarkers and related molecular pathways involved in the progression of Barrett's esophagus to esophageal cancer, which will lead to a better understanding of the biology and behavior of esophageal cancer and aid in the design of new therapeutic agents to prevent and treat esophageal cancer.

The diagnostic utility of the test will improve health status by increasing the accuracy of pathological diagnosis, thus reducing the number of repeat endoscopies and biopsies that patients with Barrett's esophagus must currently undergo, particularly for patients who are initially diagnosed as indefinite/indeterminate for dysplasia. The prognostic utility of the test will improve health status by identifying patients at high risk for developing esophageal cancer early in the disease progression when treatments such as endoscopic mucosal resection and radiofrequency ablation can be applied to effectively prevent development of cancer. The prognostic utility will also identify low risk patients, who will not develop esophageal cancer, and who can be spared unnecessary endoscopies, biopsies and treatments.

The expected benefits of the project include significant improvements in diagnostic and prognostic accuracy to prevent delays in treatment of patients at high risk for esophageal cancer, and a reduction in unnecessary and costly endoscopies and biopsies.
This individualized approach will benefit patients by reducing the incidence and mortality associated with esophageal adenocarcinoma and will benefit health care systems by targeting treatments and screenings to the high-risk patients who need them.

Summary of Research Completed
Specific Aim 1 - Milestones for period 7/1/12 - 6/30/13:
1: Biomarker panel stained, imaged and quantified on 220 esophagus biopsies
2: Selection of optimal panel of approximately 10-15 biomarker features, morphology and clinical data features to assign diagnosis and prognosis
3: Automated classifier algorithm to integrate optimal set of data features into diagnostic and prognostic indices

Progress towards achieving the milestones associated with specific aim 1 for this project period is described below.

Geisinger Clinic cohort for retrospective training study:
The investigators at Geisinger Clinic sent tissue sections from 211 Barrett's cases to Cernostics to be stained, imaged and analyzed during this project period. This brings the total cohort size to 470 cases from 215 patients with Barrett's esophagus (case collection for this cohort began before the start of the project). The cohort from Geisinger Clinic is summarized in Table 1. The tissues were reviewed by a gastrointestinal pathologist to confirm diagnosis. Elements of de-identified clinical and pathological data were also provided. Additional data elements are being extracted from the electronic medical records and will be de-identified for use in the study. The
investigators have also continued work on a relational database to store the de-identified Barrett's data extracted from various clinical databases at Geisinger.

Selection of Prognostic Features and Prognostic Classifier Algorithm:
Cernostics investigators have continued to develop and refine their image analysis software for use in the project. Figure 1A shows whole slide digital images of an example Barrett's biopsy labeled with the traditional histologic stain (Hematoxylin and Eosin) and labeled by 4-color multiplexed immunofluorescence for Barrett's biomarkers. Cernostics has developed image analysis masks that segment Barrett's biopsies into subcellular structures (nuclei, cytoplasm and plasma membrane) and tissue structures such as surface epithelium, glands, metaplastic cells and stroma, as shown in Figure 1B. These software tools allow specific biomarker and morphology measurements to be made within each of these subcellular and tissue compartments (summarized in Figure 1C).

The investigators have worked closely with an independent consultant statistician, Yi Zhang, Ph.D., to develop and test preliminary prognostic classifiers to stratify patients according to risk of developing high grade dysplasia (HGD) or esophageal adenocarcinoma (EAC). Fourteen protein biomarkers were measured by image analysis in whole slide images of 241 Barrett's biopsies from 192 patients. Multiple measurements were made per biomarker, resulting in a total of 270 biomarker measurements extracted from each biopsy. Each measurement was summarized as 7 statistics, so the 270 measurements produced 1,890 statistics per biopsy (270 × 7). Development of HGD or EAC was considered the high risk event, and patients in the low risk group were treated as censored. Patients with baseline HGD are more likely to develop EAC, and to develop it faster, than patients with a lower baseline diagnosis, which is accounted for in the statistical analyses of the data.
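
The feature bookkeeping described above can be illustrated with a minimal sketch. The report does not name the 7 summary statistics, so the set used here (mean, median, standard deviation, minimum, maximum, 5th and 95th percentiles) is purely an assumption, as are the function names and the input layout (one list of per-object values per measurement).

```python
import statistics


def percentile(sorted_values, q):
    """Linear-interpolation percentile of a pre-sorted list, q in [0, 100]."""
    if len(sorted_values) == 1:
        return sorted_values[0]
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def summarize(values):
    """Collapse one measurement into 7 statistics (hypothetical choice)."""
    s = sorted(values)
    return {
        "mean": statistics.fmean(s),
        "median": statistics.median(s),
        "sd": statistics.pstdev(s),
        "min": s[0],
        "max": s[-1],
        "p05": percentile(s, 5),
        "p95": percentile(s, 95),
    }


def biopsy_feature_vector(measurements):
    """Flatten {measurement name: values} into {"name:statistic": value}."""
    features = {}
    for name, values in measurements.items():
        for stat_name, stat_value in summarize(values).items():
            features[f"{name}:{stat_name}"] = stat_value
    return features


# Toy biopsy: 270 hypothetical measurements, each with three values.
toy_biopsy = {f"measurement_{i}": [1.0, 2.0, 3.0] for i in range(270)}
toy_features = biopsy_feature_vector(toy_biopsy)
print(len(toy_features))  # 270 measurements x 7 statistics
```

Each entry of the flattened dictionary corresponds to one "biomarker measurement/statistic" in the report's terminology.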
Three different Cox models were developed for each biomarker measurement/statistic:
i. Univariate with each biomarker measurement/statistic
ii. Univariate adjusted by diagnosis (HGD vs all others)
iii. Univariate adjusted by diagnosis, age, gender and race

783 biomarker measurement/statistics were significantly associated with risk, and 389 of the 783 significant biomarker measurement/statistics were associated with risk independently of diagnosis. All 783 biomarker measurement/statistics associated with risk were evaluated in multivariate analysis since diagnosis is associated with risk of progression in Barrett's. Age, gender and race were not significantly associated with risk of progression in this data set and were not adjusted for in the subsequent multivariate Cox modeling. The most significant biomarker measurement/statistics were then used in Cox proportional hazard modeling. The top 5, 10, 15, ..., 50 biomarker measurement/statistics were evaluated.

Leave-one-out cross validation (LOOCV) was performed by setting 1 test case aside and using all the remaining cases as a training set to select biomarker measurement/statistics as described above. The final prediction model was built using a Cox model that uses the selected biomarker measurement/statistics on the training set. This model was applied on 1 test case at a time to calculate the probability of remaining high risk free at 2, 3 and 4 years. This process was repeated until each case was treated as the test case once. At the end of LOOCV each case had an assigned probability of remaining high risk free at 2 years, 3 years and 4 years. C-indices were then calculated to assess the predictive power of the classifiers based on increasing numbers of biomarker measurement/statistics and at the 3 time points.

Leave-one-out cross validation (LOOCV) is a method to assess how the prognostic classifier will generalize to an independent set. Each biopsy is used as a validation case (i.e. the one left
out), while the remaining biopsies are used as training set. This is repeated until all biopsies have been used as the validation case (i.e. all biopsies have been the one left out). The validation results are averaged across the multiple rounds of cross validation to produce the C-index, which indicates overall discrimination power of the classifiers across the cohort. Kaplan-Meier survival curves were plotted with one probability cutoff, which stratifies patients into low and high risk groups, and with two cutoffs, which stratify patients into low, intermediate and high risk groups.

Results from the prognostic classifier testing are shown in Figure 2. A prognostic classifier based on 25 biomarker measurement/statistics derived from 5 biomarkers and DNA measurements with a single cutoff at 0.6 stratifies patients into two risk groups (low or high) with a hazard ratio of 7.2, p value <0.001, sensitivity of 64% and specificity of 87% (Figure 2B). This translates into 64% of the high risk cases being accurately classified and 87% of the low risk cases being accurately classified. This preliminary classifier identifies high risk patients who would be considered low risk by current standard pathology. The position of the classifier cutoff can be varied, which influences sensitivity and specificity, as shown in Figure 2A-C. The classifier with two cutoffs stratifies patients into 3 risk groups (low, intermediate or high) with hazard ratios of 12.2 (low versus high) and 3.58 (low versus intermediate) and p value of <0.0001 (Figure 2D). The receiver operating characteristic (ROC) curve is shown in Figure 2E. The ROC curve illustrates the performance of the classifier system as its cutoff/threshold is varied. The ROC curve was created by plotting the true positive rate versus the false positive rate at various cutoff/threshold settings.
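
The leave-one-out loop and the C-index described above can be sketched as follows. This is an illustration under stated assumptions, not the investigators' implementation: the `fit_stand_in` and `predict_stand_in` functions are hypothetical placeholders for the feature selection and Cox model fitting steps, the toy cases are invented, and the concordance function is a simplified Harrell's C that scores a higher predicted risk as concordant with a shorter time to the high risk event.

```python
def loocv_predictions(cases, fit, predict):
    """Hold each case out once, fit on the rest, predict the held-out case."""
    predictions = []
    for i, held_out in enumerate(cases):
        training = cases[:i] + cases[i + 1:]
        model = fit(training)
        predictions.append(predict(model, held_out))
    return predictions


def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index for right-censored data.

    A pair (i, j) is comparable when case i had an observed event strictly
    before case j's event or censoring time. Ties in risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored case cannot anchor a comparable pair
        for j in range(len(times)):
            if i != j and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical stand-in model: risk is the feature value relative to the
# training-set mean. A real analysis would fit a Cox model here instead.
def fit_stand_in(training):
    return sum(case[0] for case in training) / len(training)


def predict_stand_in(training_mean, case):
    return case[0] - training_mean


# Invented toy cases: (feature value, follow-up years, high risk event observed)
toy_cases = [
    (0.9, 1.0, True), (0.8, 2.0, True), (0.3, 5.0, False),
    (0.2, 6.0, False), (0.5, 3.0, True), (0.6, 4.0, False),
]
risk = loocv_predictions(toy_cases, fit_stand_in, predict_stand_in)
c_index = concordance_index(
    [c[1] for c in toy_cases], [c[2] for c in toy_cases], risk
)
print(round(c_index, 3))
```

If the model outputs a probability of remaining free of the high risk event, as in the report, one minus that probability can serve as the risk score passed to `concordance_index`.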
Specific Aim 2 - Milestone for period 7/1/12 - 6/30/13: Construction of validation cohort of approx Barrett's esophagus biopsies from 400 patients with relevant clinicopathological data. 400 existing patient samples and de-identified data will be collected 6/1/12 - 6/30/13.

Progress towards achieving the milestone associated with specific aim 2 for this project period is described below.

University of Pittsburgh Cohort for Validation Study:
The investigators at the University of Pittsburgh performed a retrospective search of the electronic medical record to identify patients who had undergone upper gastrointestinal endoscopy with biopsies by using natural language search capabilities in their laboratory information system, CoPath. A set of biopsies representing over 20,000 individual patients and over 32,000 individual encounters (an encounter is a set of biopsies) was identified. There were over 5,300 patients with more than one biopsy. The median clinical surveillance interval (time between first and last set of biopsies for a given patient with more than one biopsy) is approximately 2 years (mean, approximately 3.5 years). The diagnoses of all 32,000 encounters were classified based on the degree of dysplasia (no dysplasia up to carcinoma) and the presence or absence of Barrett's esophagus. Of these, there were over 9,600 patients with a diagnosis of Barrett's esophagus and over 3,000 patients with high grade dysplasia or adenocarcinoma (many with only a single biopsy). The investigators at University of Pittsburgh have sent samples from a total of 37 cases from 31 patients to Cernostics to be stained, imaged and analyzed. The cohort status is summarized in
Table 1. The investigators are currently utilizing the database that they constructed to select the remainder of the Barrett's cohort and have identified an additional 62 patients for the study, e.g. patients with Barrett's esophagus who did not progress to high grade dysplasia or carcinoma during surveillance; patients with prevalent high grade dysplasia or adenocarcinoma; patients who developed high grade dysplasia or adenocarcinoma during surveillance. The University of Pittsburgh investigators continue to query the dataset to identify additional patients/cases for the cohort.

University of Pennsylvania Cohort for Validation Study:
A query of Barrett's pathology results has been performed at the University of Pennsylvania. Twenty-one cases from 4 patients have been sent to Cernostics to be stained, imaged and analyzed thus far. The cohort status is summarized in Table 1. The following additional patients/cases have been identified and will be sent to Cernostics: 42 patients who are non-progressors after 5 or more years of follow up, 6 patients who have progressed from no dysplasia/indefinite and/or low grade dysplasia to high grade dysplasia and/or adenocarcinoma, and 13 patients with high grade dysplasia. The University of Pennsylvania investigators continue to query the dataset to identify additional patients/cases for the cohort.
Tables and Figures.

Table 1. Summary of Barrett's Cohorts at Each Clinical Institution

Institution                  Cases  Patients  Low Risk Patients  High Risk Patients  High Recur Patients  HGD/EAC Patients
Geisinger Clinic               470       215                135                  35                   10                35
University of Pittsburgh        37        31                  9                  12                    1                 9
University of Pennsylvania      21         4                  1                   2                    1                 0

Definitions: Case is a set of biopsies in a single paraffin block and some patients have multiple cases from different parts of the esophagus or from different surveillance time points in the cohorts; Low risk patients did not develop HGD/EAC over the course of 5 years surveillance; High risk patients developed HGD/EAC during surveillance; High recur patients were treated for HGD/EAC then entered surveillance and had a recurrence of HGD/EAC.
Figure 1. Summary of Cernostics Image Analysis Tools Developed and/or Modified During this Project Period. A: Whole slide images of histologically-stained Barrett's biopsy and multiplexed fluorescence-stained Barrett's biopsy, B: Tissue and cellular image analysis masks, C: Biomarker and morphology measurements made on each Barrett's biopsy.
Figure 2. Preliminary Prognostic Classifier Distinguishes High Risk from Low Risk Barrett's Patients. A prognostic classifier based on 25 measurements/statistics derived from 5 biomarkers and DNA measurements with a single cutoff stratifies patients according to two risk groups (low or high). The significance and performance of the classifier (measured by hazard ratio, p value, sensitivity and specificity) vary depending on the position of the classifier cutoff (A (cutoff 0.4), B (cutoff 0.6), C (cutoff 0.9)). The use of two cutoffs produces a 3 tier classifier (D) that stratifies patients according to three risk groups (low, intermediate and high). The receiver operating characteristic curve (E) has an area under the curve of 0.856.
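
How a cutoff determines sensitivity and specificity, and how sweeping the cutoff yields an ROC curve and its area, can be sketched as below. The scores and labels are invented toy data, not study results, and the sketch assumes that a higher score means higher risk; the report does not state the direction of the classifier score, so that orientation is an assumption.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Call a case high risk when its score is at or above the cutoff."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(scores, labels):
    """(false positive rate, true positive rate) at every observed cutoff."""
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(scores, labels, cutoff)
        points.append((1.0 - spec, sens))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Invented toy data: 1 = developed HGD/EAC (high risk), 0 = did not.
toy_scores = [0.95, 0.8, 0.7, 0.55, 0.5, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(toy_scores, toy_labels, 0.6)
auc = area_under_curve(roc_points(toy_scores, toy_labels))
print(sens, spec, auc)
```

Raising the cutoff in this sketch trades sensitivity for specificity, which is the behavior panels A to C illustrate for the study classifier.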