Impressions of a New NCI Director: Big Data Norman E. Sharpless, M.D. Cancer Informatics for Cancer Centers April 3, 2018
www.cancer.gov www.cancer.gov/espanol October 17, 2018
NCI: Leading the National Cancer Program
  Bethesda, Maryland
  NCI-Designated Cancer Centers
  Frederick, Maryland
  National Clinical Trials Network
Why Big Data Really Matters: A personal story
NCI Genomic Data Commons
Many Programs Generating Multimodal Data
  Clinical Proteomic Tumor Analysis Consortium
  Open Public Data
  Proteomic Data Coordinating Center
  The Cancer Imaging Archive (TCIA)
National Cancer Data Ecosystem
  Overarching goals
    Accelerate progress in cancer, including prevention & screening
      From cutting-edge basic research to wider uptake of standard of care
    Encourage greater cooperation and collaboration
      Within and between academia, government, and private sector
    Enhance data sharing
  Recommendations
    Build a National Cancer Data Ecosystem
      Enhanced cloud-computing platforms
      Services that link disparate information, including clinical, image, and molecular data
      Essential underlying data science infrastructure, standards, methods, and portals for the Cancer Data Ecosystem
Enhanced Data Sharing Working Group Recommendation: The Cancer Data Ecosystem
  SBG CGC
  Broad FireCloud
  ISB CGC
  Cancer Research Data Commons
Retrospective Characterization and Analysis of Biospecimens Collected from NCI-Sponsored Trials of the National Clinical Trials Network (NCTN) and NCI Community Oncology Research Program (NCORP)
  Program Announcement Released: December 4, 2017
  Receipt Date for Proposals: March 15, 2018
  Based on the BRP recommendations, projects of particular interest to accelerate our understanding of biologic response include:
    Analyses in clinical settings in which it usually takes many years for complete outcome data to become available from a trial
    Analyses in rare tumor types
    Analyses in special populations (e.g., children, adolescents and young adults, racial/ethnic minority groups, and underserved populations)
Retrospective Characterization and Analysis of Biospecimens Collected from NCI-Sponsored Trials of the National Clinical Trials Network (NCTN) and NCI Community Oncology Research Program (NCORP)
  Highest priority
    Hypothesis-driven proposals with detailed statistical plans. Exploratory or hypothesis-generating projects will be considered, particularly in cases of good clinical opportunity, high diversity sample representation, or building on data generated from prior analysis projects.
  Additional criteria
    Comprehensive molecular analyses of malignant and patient-matched normal samples could answer key clinical question(s)
    Feasibility given number and quality of biospecimens available
    Acceptable timelines for provision of biospecimens and data
    Appropriate consent for use of specimens and appropriate data sharing plans
NCI-MATCH and Pediatric MATCH: Molecular Analysis for Therapy Choice
NCI Molecular Analysis for Therapy Choice (NCI-MATCH)
  Precision oncology trial to explore treating patients based on the molecular profiles of their tumors
  1,089 sites in U.S. across NCTN and NCORP
NCI-MATCHBox
  NCI-MATCHBox Team Responsibilities
    Sequencing Pipeline Configuration
    Seamless Integration with Laboratory and Clinical Systems
    Biospecimen Tracking
    Parsing, Annotation and Variant Reporting
    Automated Patient Management Workflows
    Treatment Arm Management and Tracking
    Algorithm-Driven Treatment Assignment
    Proficiency and Competency Testing Support
    Data Analytics, Visualization and Reporting
NCI Molecular Analysis for Therapy Choice (NCI-MATCH)
  Rare Variant Initiative: patients with low-frequency mutations (< 2%) for which well-qualified drugs/targets are available
  Foundation Medicine, Caris Life Sciences, MDACC, and MSKCC will notify the treating physician at any of the MATCH sites when results of their NGS panel would make a patient eligible for a MATCH treatment arm
  Results verified centrally by NCI-MATCH Oncomine assay
  RFP for other NGS providers posted August 2017, with responses received January 2018, to broaden the base of patients available to enroll in precision oncology studies
NCI Molecular Analysis for Therapy Choice (NCI-MATCH)
  Time period | # enrolled | # first samples submitted | # first sample fail | # assay complete | # assigned to Rx | # enrolled on Rx
  Total Pre Pause | 794 | 739 | 116 | 645 | 54 | 27
  Total Post Pause | 5,602 | 5,222 | 428 | 4,913 | 938 | 662
  Overall Total Screening Cohort | 6,396 | 5,961 | 544 | 5,558 | 992 | 689
  Total Outside Assay | 104 | 59 | 3 | 102 | 88 | 71
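The screening-cohort counts in this table can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names and the choice of denominators (submitted samples, completed assays, assigned patients) are assumptions made here, not definitions taken from the trial.

```python
# Counts copied from the NCI-MATCH enrollment table.
# Tuple order: enrolled, first samples submitted, first sample fail,
# assay complete, assigned to Rx, enrolled on Rx.
pre_pause = (794, 739, 116, 645, 54, 27)
post_pause = (5602, 5222, 428, 4913, 938, 662)
overall = (6396, 5961, 544, 5558, 992, 689)

# Consistency check: pre-pause + post-pause should equal the overall row.
summed = tuple(a + b for a, b in zip(pre_pause, post_pause))
totals_match = summed == overall


def pct(numerator, denominator):
    """Return numerator / denominator as a percentage, rounded to 1 decimal."""
    return round(100 * numerator / denominator, 1)


enrolled, submitted, failed, completed, assigned, on_rx = overall

# Denominators below are illustrative assumptions, not trial definitions.
assay_completion_rate = pct(completed, submitted)  # of submitted samples
assignment_rate = pct(assigned, completed)         # of completed assays
treatment_uptake = pct(on_rx, assigned)            # of assigned patients

print(totals_match, assay_completion_rate, assignment_rate, treatment_uptake)
```

Under these assumed denominators, the overall row implies that most submitted samples yielded a completed assay, while fewer than one in five completed assays led to a treatment-arm assignment.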
First NCI-MATCH Efficacy Data: Nivolumab in MSI-high cancers
  Median cycles: 3.5 (range 1-13+ cycles)
  Median time to first response: 2.1 months (includes unconfirmed PRs)
  6-month PFS: 49% (95% CI: 32-67%)
  Median duration of response has not been reached (4-8+ months; 7/8 still under treatment at time of data cutoff)
  11 patients remain on therapy at time of data cutoff
NCI-COG Pediatric MATCH
Pediatric MATCH Active Therapeutic Arms
Pediatric MATCH Enrollment
  First 131 patients: 74 males, 57 females
  Age 1-21, median age 12 yrs
  35% of patients AYA
  Tumor sequencing completed on 94 patients
  At least one patient has matched to each of the treatment arms
  [Figure: Monthly Activity, 2017-07 through 2018-02; series: registration, specimen_received, assay_completed]
BRCA Challenge Program Overview
  Mission: Improve care of patients at risk of breast and ovarian cancer using global data sharing and collaboration in the analysis of BRCA1 and BRCA2.
    1. Share BRCA1 and BRCA2 variants publicly via a web portal
    2. Address social, ethical, and legal challenges to global data sharing
    3. Create a GA4GH model for all disease genes
  Major milestones
    BRCA Exchange
      >18,000 variants, multiple sites
      1/3 expert-classified with supporting rationale
    Coming soon: mobile app with alert function
BRCA Exchange Website: brcaexchange.org
  Flexible searching
  Drill down to extra info
  Tiled format
  Versioning
    Variant level
    Dataset level
NCI SEER Program: Surveillance, Epidemiology, and End Results
The SEER Program
  Funded by NCI to support research on the diagnosis, treatment and outcomes of cancer since 1973
  16 population-based registries covering 28% of the US population
  Registries collect information on all cancer cases for residents of the state or region
    Representing racial and ethnic minorities
    Various geographic subgroups
  450,000+ incident cases annually
  Approximately 85% of cases with real-time electronic pathology (e-path) reporting
Walgreens Data for Georgia: Table of frequency distribution of oral antineoplastic drugs by generic category (2013-2016)
  Initial pilot in GA; once the data are assessed, it will scale to the entire SEER program.
  20,000 total unique patients with 225,420 fills
  These types of real-world data will permit:
    Monitoring of patient compliance
    Assessing the use of these agents in the context of outcome
    Differences in use across subpopulations - disparity analysis
  Drug Category | Unique Patient / Prescription Count (2013-2016)
  Antineoplastic - Hormonal and Related Agents | 16,806
  Antimetabolites | 7,032
  Antineoplastics Misc. | 3,345
  Antineoplastic Enzyme Inhibitors | 1,642
  Alkylating Agents | 1,008
  Chemotherapy Rescue/Antidote Agents | 524
  Antineoplastic - Immunomodulators | 222
  Mitotic Inhibitors | 122
  Topoisomerase I Inhibitors | 26
  Antineoplastic - Antibodies | 17
  Antineoplastic or Premalignant Lesion Agents - Topical | 14
  Antineoplastic Angiogenesis Inhibitors | 4
  Diagnostic Drugs | 5
  Antineoplastic Antibiotics | 5
  Total | 30,772
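The kind of summary this frequency table supports can be sketched in a few lines. The example below is an illustration under stated assumptions: the four smallest categories are combined into a single figure (their counts are 4, 5, 5 and 14), the 20,000 patient figure is treated as exact, and the variable names are hypothetical.

```python
# Counts copied from the drug-category table (Georgia pilot, 2013-2016).
counts = {
    "Hormonal and Related Agents": 16806,
    "Antimetabolites": 7032,
    "Antineoplastics Misc.": 3345,
    "Enzyme Inhibitors": 1642,
    "Alkylating Agents": 1008,
    "Chemotherapy Rescue/Antidote Agents": 524,
    "Immunomodulators": 222,
    "Mitotic Inhibitors": 122,
    "Topoisomerase I Inhibitors": 26,
    "Antibodies": 17,
    # Assumption for this sketch: the four smallest categories combined.
    "Remaining four categories": 4 + 5 + 5 + 14,
}
stated_total = 30772

# Consistency check against the total printed in the table.
total_matches = sum(counts.values()) == stated_total

# Share of the table total for each category, in percent.
shares = {name: 100 * n / stated_total for name, n in counts.items()}
top_three = sorted(shares, key=shares.get, reverse=True)[:3]

# Average fills per patient, treating the 20,000 figure as exact.
fills_per_patient = 225420 / 20000

print(total_matches, top_three, round(fills_per_patient, 1))
```

A per-category share like this is the starting point for the disparity analysis the slide mentions, once the same counts are available by subpopulation.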
Trends in checkpoint inhibitor use in oncology practices
  Captured from Unlimited Systems claims (2013-2017)*
  Once scaled to SEER, linked claims data will permit:
    Evaluation of use in the context of demographics and outcome
    Monitoring diffusion of agents
    Measuring use across subgroups of the population (potential for disparities research)
  *Represents 12-35% of oncologists in 5 registries
Variation in genetic testing in breast and ovarian cancers by race/ethnicity (California and Georgia)
  Overall Testing Rates (2013-2015)
    26% of all 82,120 Breast Cancers
    33% of all 6,268 Ovarian Cancers
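The overall testing rates can be turned into approximate patient counts. This is a rough sketch: the slide gives only rounded percentages, so the counts below are estimates implied by those rates, not reported figures, and the variable names are hypothetical.

```python
# Figures copied from the slide (California and Georgia, 2013-2015).
cases = {"breast": 82120, "ovarian": 6268}
testing_rate_pct = {"breast": 26, "ovarian": 33}  # rounded percentages

# Approximate number of tested patients implied by the rounded rates.
approx_tested = {
    site: round(cases[site] * testing_rate_pct[site] / 100)
    for site in cases
}

# Approximate number of patients with no recorded genetic test.
approx_untested = {site: cases[site] - approx_tested[site] for site in cases}

print(approx_tested, approx_untested)
```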
Capturing outcomes other than survival: Two methods
  NLP/Machine learning solutions
    Working with Department of Energy partners to develop deep learning algorithms to extract recurrence as distant metastatic disease from unstructured text documents (pathology and radiology reports)
  Patient-generated data within the registry
    Working with partners to test solutions, e.g., patient portals, direct patient reporting, and patient-generated data sources (2 studies in process)
Department of Energy Pilot Project
  NLP / Machine learning solutions
    Develop deep learning algorithms to extract recurrence as distant metastatic disease from unstructured text documents (pathology and radiology reports)
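To make the extraction task concrete, the sketch below flags report text that mentions metastatic disease. It is not the deep learning approach described in the pilot: it is a much simpler keyword-and-negation rule, shown only to illustrate the input (free-text reports) and the output (a recurrence flag). The function name and the term lists are hypothetical and not clinically validated.

```python
import re

# Hypothetical term lists for illustration; not clinically validated.
METASTASIS_PATTERN = re.compile(r"\bmetasta(?:sis|ses|tic)\b", re.IGNORECASE)
NEGATION_PATTERN = re.compile(
    r"\b(?:no|not|without|negative for)\b", re.IGNORECASE
)


def flag_distant_recurrence(report_text):
    """Return True if a sentence mentions metastasis with no negation cue before it.

    Only the first metastasis mention in each sentence is examined.
    """
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", report_text)
    for sentence in sentences:
        match = METASTASIS_PATTERN.search(sentence)
        if match is None:
            continue
        # Skip mentions preceded by a negation cue in the same sentence.
        if NEGATION_PATTERN.search(sentence[:match.start()]):
            continue
        return True
    return False


reports = [
    "CT chest shows new metastatic lesions in the liver.",
    "No evidence of metastatic disease. Follow-up in 6 months.",
    "Benign nevus. Margins clear.",
]
flags = [flag_distant_recurrence(text) for text in reports]
print(flags)
```

Rules like these break down quickly on real pathology and radiology reports (hedged language, history sections, abbreviations), which is the motivation for learned models on this task.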
Big Issues in Big Data Facing NCI
  Workforce and career development
  EHR mining
  Storage: What? How long? Cloud?
  Security, privacy and de-identification
  Use of challenges / prizes
  CBIIT leadership