Clinical NLP, PubGene Clinical trials in Coremine Oncology Text processing and information extraction for surgery planning form


Transcription:

Clinical NLP, PubGene
Clinical trials in Coremine Oncology
Text processing and information extraction for surgery planning form
November 2017
Dag Are Steenhoff Hov, PubGene AS

PubGene, founded 2001
- ArrayIt H25K microarray
- Integration of structured and unstructured information
- Interpretation of biomedical analysis data
- General information; specialized information analysis
- Scientific literature; Coremine Networks
- COREMINE Oncology, COREMINE Medical, COREMINE Platform

Clinical NLP in PubGene: examples
- Clinical trials in Coremine Oncology
- PubGene in Ahus Optique
Courtesy of DNV-GL (Tore Hartvigsen)

Coremine Oncology
AIM: To enable oncologists to make better treatment decisions
HOW: Combine data from relevant sources to aid interpretation of oncogenomics data from NGS and other platforms
Input: Somatic mutations, copy number changes, gene expression, or similar quantity
Output: Gene/biomarker annotations, related drugs and drug sensitivity, pathways, clinical trials, etc.

Coremine Oncology: Our Scope
We focus on:
- Analysis of "called" events; normalization and data quality considerations are assumed to have been taken care of
- Collecting and integrating information for interpretation
- Linking to potentially relevant treatments
- Linking to clinical trials related to the input data

Coremine Oncology
Currently three types of input data:
- (Somatic) mutations
- Copy number changes
- Gene expression
Analysis/Interpretation module to display information (annotations) about:
- Mutation
- Gene/Protein
- Protein domains
Summary module to show patient-level information with respect to:
- Statistics on mutations
- Related drugs for targets with change (in progress: also biomarker and sensitivity info)
- Pathways for targets with change
- Relevant clinical trials for aberrations

Example: Somatic mutations input data
Input for Coremine Oncology, case from lung cancer:
- Chromosome number
- Position
- Reference nucleotide
- Alternate nucleotide
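The four input columns listed above can be read with a few lines of code. This is a minimal sketch only, assuming a tab-separated file with the columns in the order given on the slide. The separator, the comment convention, the `Variant` type, the `read_variants` function and the example row are all illustrative assumptions, not the documented Coremine Oncology import format.

```python
import csv
import io
from typing import NamedTuple


class Variant(NamedTuple):
    """One called somatic mutation (hypothetical record type)."""
    chromosome: str
    position: int
    ref: str
    alt: str


def read_variants(handle):
    """Read tab-separated rows: chromosome, position, reference, alternate."""
    variants = []
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment/header lines starting with '#'.
        if not row or row[0].startswith("#"):
            continue
        chrom, pos, ref, alt = row[:4]
        variants.append(Variant(chrom, int(pos), ref.upper(), alt.upper()))
    return variants


# Made-up example row; the coordinate is a placeholder, not a real case.
example = io.StringIO("#chrom\tpos\tref\talt\n7\t12345\tt\tG\n")
variants = read_variants(example)
print(variants)
```

Keeping the chromosome as a string avoids special handling for X, Y and MT.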

View of imported data file

Mutation annotation 1 patient - 1 missense mutation


Clinical Trials for Cetuximab

Clinical Trials for biomarkers
AIM: To map biomarkers from patient data to relevant clinical trials
METHOD: Identify how biomarkers are mentioned (referred to) in clinical trials
- Download and index data from clinicaltrials.gov
- Develop dictionaries of biomarkers and methods for detecting these in trial descriptions
- Focus on eligibility
CHALLENGES
- Text mining is difficult!
- Biomarkers are described, or referred to, in many ways
- Ultimately, we want to identify biomarkers related to eligibility, but this is not straightforward
- Complicated logic in inclusion/exclusion criteria, e.g., negation
- Also need to check title, description, and condition for biomarkers
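A biomarker mentioned under exclusion criteria means the opposite of the same mention under inclusion criteria, so a first step is to separate the two. The sketch below assumes the eligibility text contains "Inclusion Criteria" and "Exclusion Criteria" heading lines, which is common but not guaranteed. The function name `split_eligibility` and the sample criteria are invented for illustration, and this is not PubGene's implementation.

```python
import re

# Matches a heading line such as "Inclusion Criteria:" (assumed layout).
HEADER = re.compile(r"(inclusion|exclusion)\s+criteria", re.IGNORECASE)


def split_eligibility(criteria_text):
    """Split free-text eligibility criteria into inclusion and exclusion items."""
    sections = {"inclusion": [], "exclusion": []}
    current = None
    for line in criteria_text.splitlines():
        item = line.strip(" -*\t")  # drop bullet characters and padding
        if not item:
            continue
        header = HEADER.match(item)
        if header:
            current = header.group(1).lower()
        elif current is not None:
            sections[current].append(item)
    return sections


sample = """Inclusion Criteria:
- Histologically confirmed NSCLC
- EGFR T790M mutation
Exclusion Criteria:
- Known ALK rearrangement
"""
sections = split_eligibility(sample)
print(sections)
```

Negation inside a single criterion (for example "no prior EGFR inhibitor") is not handled here and would need separate rules.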

Clinical Trials text data mining
Compiled several lists of biomarkers of different types:
- Single-nucleotide mutations (COSMIC)
- Polymorphisms
- Fusion genes
- Gene regulation (Exp-up/down)
- Copy number changes
Several strategies for finding these in text:
- Detect explicit mentions
- Detect patterns based on gene name and marker type, e.g., "GENE amplification", "GENE activating mutation"
- Curated list of cancer types matched with conditions
Statistics for patterns:
- Expression: 135
- CNV: 32
- Other (positive/negative): 20/10
- Mutation: 37
- Fusion/rearrangement/translocation: 10
Indexing statistics:
- 5350 trials with at least one biomarker
- 855 different biomarkers with hits
- Top markers: BCR/ABL1 (907), ERBB2 positive (725), ERBB2 negative (603), ESR1 positive (467), ERBB2 exp-up (403)
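The "gene name plus marker type" strategy can be sketched with regular expressions. This is an illustration under stated assumptions: the gene list is a tiny hand-picked subset, the four pattern templates and their labels are invented here, and the real system uses far more patterns (the slide counts 135 for expression alone) plus curated dictionaries.

```python
import re

# Illustrative subset of gene symbols (assumption, not the real dictionary).
GENES = ["ERBB2", "EGFR", "BRAF", "MET"]

# Hypothetical pattern templates: marker label -> regex with a {gene} slot.
MARKER_PATTERNS = {
    "amplification": r"{gene}\s+amplification",
    "activating mutation": r"{gene}\s+activating\s+mutation",
    "positive": r"{gene}[\s\-]+positive",
    "negative": r"{gene}[\s\-]+negative",
}


def find_biomarkers(text):
    """Return the set of (gene, marker type) pairs mentioned in the text."""
    hits = set()
    for gene in GENES:
        for marker, template in MARKER_PATTERNS.items():
            pattern = r"\b" + template.format(gene=re.escape(gene)) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                hits.add((gene, marker))
    return hits


text = "Patients with ERBB2 amplification or EGFR activating mutation are eligible."
print(find_biomarkers(text))
print(find_biomarkers("HER2-negative breast cancer"))
```

The second call finds nothing because HER2, a common synonym for ERBB2, is not in the list, which shows why dictionaries of alternative names are needed alongside the patterns.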

Clinical Trials for example case: NSCLC and Erlotinib

Clinical Trials for copy number data (CNV)

Trials matching patient biomarkers and disease
[Diagram labels: Cancer type, e.g., NSCLC; Filter; Manual curation; Domain knowledge; Clinical Trials; GUI or command line; biomarker types CNA, SNP, EXP, INDEL, SNA, FUSION]

Clinical Trials for combined data
NSCLC: BRAF G469A, BRAF D594G, BRAF V600E, EGFR T790M, KIF5B/RET, CD74/ROS1, KIF5B/ALK, BCR/ABL1

Details from Clinical Trial information: NCT01922583

Clinical Trials matching to patient data
Various levels of stringency for matching a trial to a patient:
- Perfect match
- Other alteration (incl. same effect)
- Same gene (other biomarker)
- Related gene
S = weighted sum of scores
Biomarker-specific scoring models, because the relevance of other alterations is prioritized differently for different biomarkers
AIM: To better map/identify other alterations with the same or a similar effect, e.g., amplification/up-regulation with activating mutation
Example: Patient: ERBB2 Exp up. Trial:
1. Perfect match: ERBB2 Exp up
2. Same effect: ERBB2 CNV gain
3. Similar effect: ERBB2 Positive
4. Other alteration: ERBB2 mutation
5. Likely opposite effect: ERBB2 Neg.
6. Opposite effect: ERBB2 Exp down or ERBB2 CNV loss
7. Gene only: ERBB2
8. Related gene: EGFR
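The weighted sum S can be sketched as follows. The slide names the match levels but gives no numbers, so every weight below is a placeholder chosen for illustration, as are the trial identifiers and the function names. Negative weights for opposite effects are also an assumption.

```python
# Placeholder weights per match level (assumption: the slides give no values).
LEVEL_WEIGHTS = {
    "perfect match": 100,
    "same effect": 80,
    "similar effect": 60,
    "other alteration": 40,
    "gene only": 30,
    "related gene": 20,
    "likely opposite effect": -50,
    "opposite effect": -100,
}


def trial_score(match_levels, weights=LEVEL_WEIGHTS):
    """S = sum of the weights of all match levels found for one trial."""
    return sum(weights[level] for level in match_levels)


def rank_trials(trial_matches, weights=LEVEL_WEIGHTS):
    """Sort trials by descending score; returns a list of (trial, score)."""
    scored = {trial: trial_score(levels, weights) for trial, levels in trial_matches.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical trials matched against a patient with ERBB2 Exp up.
trial_matches = {
    "TRIAL-A": ["perfect match"],                   # trial asks for ERBB2 Exp up
    "TRIAL-B": ["similar effect", "related gene"],  # ERBB2 positive, plus EGFR
    "TRIAL-C": ["opposite effect"],                 # trial asks for ERBB2 Exp down
}
ranking = rank_trials(trial_matches)
print(ranking)
```

A biomarker-specific scoring model, as mentioned on the slide, would amount to passing a different `weights` dictionary per biomarker.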

Clinical NLP in PubGene: examples
- PubGene in Ahus Optique
Courtesy of DNV-GL (Tore Hartvigsen)

Akershus University Hospital (Ahus) Optique project
Increase patient safety by providing easier access to existing information
Ahus: "Human touch and empathy with professional skill"
Courtesy of DNV-GL (Tore Hartvigsen)

The Surgery Planning Form ("The Green Form") is completed in 3 stages
- Stage 1: Examination
- Stage 2: Preparations
- Stage 3: Check/QA
Source systems: DIPS Ahus (structured data, text); Metavision Ahus (Metavision O, Metavision I, Metavision DKS); additional systems
To complete the form, data must be collected from a number of systems! Today this is done manually.
Courtesy of DNV-GL (Tore Hartvigsen)

Leave the data in the source systems!
- Users: expert users, «ordinary» users, researchers/analysts
- A semantic IT solution and ontology for clinical use in health care
- Data warehousing is an option
- Ahus research databases
- Ahus production databases: DIPS (EPJ), Metavision O, Metavision I, Metavision DKS
Courtesy of DNV-GL (Tore Hartvigsen)

We want to «lift» the data out of the silos!
- Users: expert users, «ordinary» users
- A semantic IT solution and ontology for clinical use in health care
- Structured data; unstructured data (text)
- Solutions provided by the Optique project; text mining
Courtesy of DNV-GL (Tore Hartvigsen)

PubGene in Ahus Optique, information extraction
Unstructured information: "Height 1,83 m"
Structured information: name=height, type=int, unit=cm, value=183
Fields: ASA, BMI, height, weight, pulse, blood pressure, temperature, diagnosis codes, treatment codes
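The height example above can be reproduced with a small extraction rule. This is a sketch assuming the value follows the word "height" (or Norwegian "høyde") and is given in metres or centimetres with either a decimal comma or a decimal point. The pattern and the function name `extract_height` are illustrative, not the project's actual rules.

```python
import re

# Keyword, optional colon, number with decimal comma or point, then a unit.
HEIGHT_PATTERN = re.compile(
    r"\b(?:height|høyde)\s*:?\s*(\d+(?:[.,]\d+)?)\s*(cm|m)\b",
    re.IGNORECASE,
)


def extract_height(text):
    """Return a structured height record in centimetres, or None if absent."""
    match = HEIGHT_PATTERN.search(text)
    if match is None:
        return None
    number = float(match.group(1).replace(",", "."))
    unit = match.group(2).lower()
    # Normalize metres to whole centimetres.
    centimetres = round(number * 100) if unit == "m" else round(number)
    return {"name": "height", "type": "int", "unit": "cm", "value": centimetres}


record = extract_height("Height 1,83 m")
print(record)
```

Normalizing to one unit at extraction time means the downstream form only ever has to handle integers in centimetres.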

PubGene in Ahus Optique, allergy information

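PubGene in Ahus Optique, status on smoking
Sentence (Norwegian, with English gloss) and assigned status:
- "Røyker." (Smokes.): Yes
- "Røyker 15-20 om dagen." (Smokes 15-20 a day.): Yes
- "Ifølge datter er han også storrøyker, 40/dag siste 50 år." (According to his daughter he is also a heavy smoker, 40 a day for the last 50 years.): Yes
- "Røykeplaster?" (Nicotine patch?): Uncertain
- "Tidligere storrøyker." (Former heavy smoker.): Stopped
- "Ikke røyker og drikker ikke alkohol, tidligere, måteholdent alkoholbruk." (Does not smoke and does not drink alcohol; previously, moderate alcohol use.): No
- "Eks-røyker, lite alkohol." (Ex-smoker, little alcohol.): Stopped
Text analysis:
- Split the text into sentences and detect sentences containing røyke, røyki, or røykt
- Classify the sentences based on recognition of keywords and word or sentence patterns
NB: Based on a small database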
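The two-step analysis (find sentences with a smoking stem, then classify by keyword patterns) can be sketched as below. The three stems come from the slide; the specific rules, their order and the function names are assumptions written to reproduce the seven example sentences, and they will not generalize beyond such a small sample.

```python
import re

STEMS = ("røyke", "røyki", "røykt")  # stems named on the slide


def split_sentences(text):
    """Naive sentence splitter: break after '.', '?' or '!' plus whitespace."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]


def smoking_status(sentence):
    """Classify one sentence; returns None if it is not about smoking."""
    s = sentence.lower().strip()
    if not any(stem in s for stem in STEMS):
        return None
    if s.endswith("?"):                             # open question in the note
        return "Uncertain"
    if re.search(r"\bikke\s+røyk", s):              # "ikke røyker" = does not smoke
        return "No"
    if re.search(r"(tidligere|eks-?)\s*\w*røyk", s):  # former / ex-smoker
        return "Stopped"
    return "Yes"


sentences = [
    "Røyker.",
    "Røyker 15-20 om dagen.",
    "Ifølge datter er han også storrøyker, 40/dag siste 50 år.",
    "Røykeplaster?",
    "Tidligere storrøyker.",
    "Ikke røyker og drikker ikke alkohol, tidligere, måteholdent alkoholbruk.",
    "Eks-røyker, lite alkohol.",
]
statuses = [smoking_status(s) for s in sentences]
print(statuses)
```

The negation rule is checked before the "former smoker" rule because the sixth sentence contains "tidligere" in reference to alcohol, not smoking.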

Ahus Optique screenshots (courtesy of DNV-GL, Tore Hartvigsen)
[Screenshots; captions: Page for surgery planning form; BMI; Allergy; Smoking; Surgery planning form]

Further development, text processing/analysis
A large set of options and potential:
- Far more effective collection of more relevant information, e.g., by filling in surgery forms ("The Green Form")
- Improved quality through automatic detection of errors in documents and consistency checks against structured data
Further steps for Ahus Optique:
- Simple: extraction of more static fields, like lab results
- Information about medication
- Information on heart function, lung function
- Exploit document structure and information on document types