TIES Cancer Research Network Y2 Face to Face Meeting U24 CA October 29th, 2014, University of Pennsylvania


TIES Cancer Research Network Y2 Face to Face Meeting
U24 CA 180921
Session IV: The Future of TIES
October 29th, 2014, University of Pennsylvania

Afternoon: Other Uses of TIES / Future of TIES
- 12:45-1:15 TIES Radiology at University of Pittsburgh (Legowski and Crowley-Jacobson)
- 1:15-1:30 Cancer Deep Phenotyping (Mitchell and Crowley-Jacobson)
- 1:30-2:00 EMR corpus Deep Phenotyping (Feldman)
Year 2 Development Projects
- 2:00-2:30 Whole Slide Imaging inside TIES (Tseytlin)
- 2:30-3:00 Paraffin Archive (Chavan)
- 3:00-3:30 Prioritization of current feature requests (Chavan)
Wrap Up
- 3:30-4:00 Y2 Project Plan, Action Items and Wrap Up (Crowley-Jacobson)

Data in isolation
- Isolated laboratory information systems
- Isolated radiology information systems
- Text is usually last on the list for enterprise data warehousing efforts
- Ability to access data, tissue and images is limited to those who have clinical access (HIPAA)

Previous Work
- Significant previous work in developing NLP systems for processing clinical text (MedLEE, cTAKES)
- At least two preceding systems have brought together radiology and pathology data using NLP
  - RadBank/RadTF (Rubin et al.)
  - Presto/Montage (Langlotz et al.)
- Enhancing research and QI will probably require more sophisticated information extraction methods

Montage
- Combined radiology/pathology timeline
- Initial CT shows mass lesion in this patient evaluated for aphasia

Montage Final pathology result

RadTF
- Natural language report query (ontologically assisted)
- Linkage to images in PACS
- Do B, Wu A, Biswal S, Kamaya A, Rubin DL. Radiographics. 2010 Nov;30(7):2039-48

TIES Radiology
- TIES Radiology was deployed at University of Pittsburgh in January 2014.
- Currently contains over 19 million de-identified radiology reports across all UPMC hospitals from 2003 to present
- Fully integrated with the Radiology honest broker (HB) system. HBs use TIES to collect the accession list, and then provide de-identified images from the PACS system to investigators
- Approval for Radiology and Pathology is done by separate groups; governance model
- Prior to deployment, extensive QA was conducted to ensure accuracy of report coding and search results.

Key differences for Radiology
- Need to use imaging exam type derived from source system metadata.
  - Probably worse at UPMC than any of your institutions
  - Large number of vendor systems were involved
- More variation in section labeling required up-front work to map them.
  - How much will this generalize?
- Different vocabulary and semantic types used
  - Required a fair amount of trial and error and expert curation
- WSD becomes more and more important as you add new domains

Vocabulary Building
- Select and prioritize vocabularies
- Select semantic type filters
- Acronyms and stop words

Source vocabularies: UMLS or NCIM
- Radiology: 12 vocabularies, 45 semantic types
- Pathology: 14 vocabularies, 51 semantic types

Prioritized vocabulary lists shown on the slide:
- 1. RADLEX 2. NCI 3. FMA 4. SNOMEDCT 5. ICD10PCS 6. MSH 7. OMIM 8. ICD10CM
- 1. NCI 2. FMA 3. SNOMEDCT 4. RADLEX 5. CBO 6. ICD10PCS 7. MSH 8. OMIM

Semantic types shown on the slide (partial lists):
- Acquired Abnormality; Anatomical Abnormality; Anatomical Structure; Bacterium; Biologic Function; Body Part, Organ, or Organ Component; Body Space or Junction
- Biomedical or Dental Material; Element, Ion, or Isotope; Indicator, Reagent, or Diagnostic Aid; Medical Device
- Cell; Cell Component; Cell Function; Cell or Molecular Dysfunction; Gene or Genome; Molecular Biology Research Technique

Acronym example: CT. Would that be Computerized Tomography? Chest tube? Cardiothoracic? Clotting Time? Connecticut?
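The "CT" question is a word sense disambiguation (WSD) problem. A minimal sketch of one simple approach follows, assuming a hand-built sense inventory with context cue words. The inventory `CT_SENSES`, the cue words, and the function `disambiguate_ct` are all hypothetical illustrations, not how TIES resolves acronyms.

```python
import re

# Hypothetical sense inventory for "CT"; the cue words are illustrative only.
CT_SENSES = {
    "Computerized Tomography": {"scan", "contrast", "imaging", "axial", "abdomen"},
    "Chest tube": {"drainage", "pneumothorax", "inserted", "placement", "pleural"},
    "Clotting Time": {"coagulation", "seconds", "prolonged", "bleeding"},
}


def disambiguate_ct(sentence, default="Computerized Tomography"):
    """Pick the sense whose cue words overlap most with the sentence.

    Falls back to `default` when no cue word is present.
    """
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    best_sense, best_overlap = default, 0
    for sense, cues in CT_SENSES.items():
        overlap = len(words & cues)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


if __name__ == "__main__":
    print(disambiguate_ct("CT placement for left pneumothorax with good drainage"))
    print(disambiguate_ct("CT of the abdomen with contrast"))
```

A cue-word overlap like this is only a baseline; each new domain added to the vocabulary would need its own senses and cues, which is one reason WSD grows in importance as domains are added.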

RADIOLOGY DEMONSTRATION

Comparison against existing methods
- Before TIES was deployed in Radiology there were questions about adequacy:
  - Would we be able to match up patients?
  - How would the system compare against the current method: experts searching our MARS data repository, which includes free text?
- Used an iterative QA approach with queries from radiology leadership based on previous studies that they had done
- As such, these were hard queries where a MARS expert (with > 20 years of experience) had already done extensive work to identify the best possible query

Radiology QA
- Several QA queries were conducted comparing results from TIES searches with results from searches conducted in MARS.
- MARS is the database center at UPMC from which TIES reports come.
- MARS is searched using text terms. Searches can be limited to certain report types, particular header sections, etc.
- Results from MARS were treated as the gold standard.

Radiology QA Process
- TIES and MARS queries were constructed to be as analogous as possible.
- For each query, we first verified that all MARS results existed in the TIES database.
- Reports were scored as true positive (TP) or false positive (FP).
- Liz Legowski did initial scoring, which was validated by a radiologist.
- Precision and recall (in comparison to MARS) were computed.

QA Query #1
Query: Hepatocellular carcinoma found on abdominal/pelvis MRIs with contrast

System | # of Distinct Reports | TP | FP | Relative Recall (measured against MARS) | Precision
TIES | 361 | 219 | 142 | 0.94 | 0.61
MARS | 397 | 233 | 164 | N/A | 0.59

QA Query #1, Cont'd
- TP reports not returned by TIES (TIES FN): All 15 TP reports were missed due to negation (wordings such as "not typical for" or "hepatocellular carcinoma is considered unlikely"). Since HCC was not 100% ruled out, the reports were considered TP. This may have been too stringent.
- FP reports returned only by TIES: Reports were returned due to "hepatocellular carcinoma" appearing in the clinical history (section searching was not enabled at the time this query was conducted).
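The Query #1 figures can be reproduced from the reported counts. The sketch below assumes relative recall is TP / (TP + FN), where FN counts the MARS-confirmed TP reports that TIES missed; the slides do not state the formula, but this assumption matches the reported 0.94. The function names are illustrative.

```python
def precision(tp, fp):
    """Fraction of returned reports that were scored true positive."""
    return tp / (tp + fp)


def relative_recall(tp, fn):
    """Recall relative to the reference system (assumed formula, see lead-in)."""
    return tp / (tp + fn)


if __name__ == "__main__":
    # Counts taken from the QA Query #1 slides.
    print(round(precision(219, 142), 2))      # TIES precision
    print(round(precision(233, 164), 2))      # MARS precision
    print(round(relative_recall(219, 15), 2))  # TIES recall relative to MARS
```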

QA Query #2
Query: Pulmonary embolism found on chest/thorax CT
Due to the large number of reports, only discrepant reports (reports returned by only one system) were scored.

Final results:
System | # of Distinct Reports | TP | FP | Relative Recall (measured against MARS) | Precision
TIES | 187 | 52 | 135 | 0.79 | 0.28
MARS | 12200 (random sample of 115 scored) | 14 | 101 | N/A | 0.12

QA Query #2, Cont'd
- All reports returned only by TIES contained wording for pulmonary embolism that did not exactly match the MARS query (ex: "pulmonary thromboembolism").
- 128 of the 135 TIES-only FP reports were returned due to missed negation. The remaining 7 TIES-only FP reports stated the study was adequate to evaluate for pulmonary thromboembolism, but no PE was found.
- TP reports not returned by TIES (TIES FN): All 14 reports had PE wordings that were used in the MARS query but are not synonyms of the concepts used in TIES (ex: "pulmonary embolus").
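Most errors in both QA queries trace back to negation and uncertainty wording. As a rough illustration of the idea, here is a simplified trigger-window check in the spirit of NegEx-style rules. The trigger lists, the window size, and the function `is_negated` are assumptions made for this sketch, not the negation component used in TIES.

```python
import re

# Hypothetical trigger phrases; a curated rule set would be far larger.
PRE_TRIGGERS = ["no", "not typical for", "without", "negative for"]
POST_TRIGGERS = ["is considered unlikely", "was ruled out", "is not seen"]


def is_negated(sentence, concept, window=5):
    """True if a trigger phrase occurs within `window` words of the concept.

    Pre-triggers are searched before the concept, post-triggers after it.
    Only the first occurrence of the concept is checked.
    """
    text = sentence.lower()
    idx = text.find(concept.lower())
    if idx < 0:
        return False
    before = re.findall(r"[a-z]+", text[:idx])[-window:]
    after = re.findall(r"[a-z]+", text[idx + len(concept):])[:window]
    before_str = " " + " ".join(before) + " "
    after_str = " " + " ".join(after) + " "
    pre_hit = any(f" {t} " in before_str for t in PRE_TRIGGERS)
    post_hit = any(f" {t} " in after_str for t in POST_TRIGGERS)
    return pre_hit or post_hit


if __name__ == "__main__":
    print(is_negated("No pulmonary embolism is identified.", "pulmonary embolism"))
    print(is_negated("Acute pulmonary embolism in the right lower lobe.", "pulmonary embolism"))
```

Whether a hedged phrase such as "considered unlikely" should count as negated is a scoring decision; in Query #1 such reports were scored as TP because HCC was not fully ruled out.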

Challenges
- Coding on the scale required to create such massive text repositories
  - Schema changes to speed up database operations
  - JMS for coordination, enabling an arbitrary and dynamic number of coding machines
  - Further modularized system and parallelization of tasks (e.g. database operations and coding happen simultaneously)
- Differences in pathology and radiology sublanguages (for example nuances of uncertainty), level of maturity of controlled vocabulary, increased complications of WSD
- Architectures to support
  - Further information extraction
  - Data mining (and potentially in combination with structured data)
  - QI programs
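The coordination pattern described above (a shared queue feeding any number of coding workers) can be sketched in a few lines. This is an in-process analogy using Python threads and `queue.Queue`, not JMS, and `code_report` is a hypothetical stand-in for the NLP coding step.

```python
import queue
import threading


def code_report(text):
    """Stand-in for the NLP coding step (hypothetical): upper-cases the text."""
    return text.upper()


def run_coders(reports, n_workers=4):
    """Code every report using `n_workers` threads that pull from one queue."""
    tasks = queue.Queue()
    for item in reports.items():
        tasks.put(item)

    results = {}
    lock = threading.Lock()

    def worker():
        # Each worker drains the shared queue until it is empty.
        while True:
            try:
                report_id, text = tasks.get_nowait()
            except queue.Empty:
                return
            coded = code_report(text)
            with lock:
                results[report_id] = coded

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_coders({1: "ct chest", 2: "mri abdomen"}, n_workers=3))
```

Because workers only share the queue, the number of workers can be changed without touching the coding logic, which is the property the message-based design is after.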

Quality Measures
- 28 ACR performance measures
- 5 of which relate to mammography, including one requiring entry into a separate database

Foundation for QI
- Opportunity for Pathology/Radiology correlation, analysis and feedback beyond what we could easily do previously
- Currently working on BIRADS extraction and correlation with pathology
- 630K reports of various types (mammogram, Breast MRI, Breast US) with BIRADS term in Pitt TIES system
  - Represents 385K patients
  - Of these, 101K patients have pathology reports
  - And 16098 of those patients have pathology reports within 1 month of BIRADS classification including the term "breast"
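The last count above amounts to a date-window match between imaging and pathology reports for the same patient. A minimal sketch, assuming "within 1 month" means at most 30 days in either direction and that reports arrive as simple tuples; the record layout and the function name are hypothetical.

```python
from datetime import date


def patients_with_correlated_pathology(birads, pathology, max_days=30, term="breast"):
    """Return patient IDs with a pathology report near a BIRADS report.

    birads:    iterable of (patient_id, report_date)
    pathology: iterable of (patient_id, report_date, report_text)
    A pathology report counts only if its text contains `term`.
    """
    path_dates = {}
    for pid, when, text in pathology:
        if term in text.lower():
            path_dates.setdefault(pid, []).append(when)

    matched = set()
    for pid, when in birads:
        for path_date in path_dates.get(pid, []):
            if abs((path_date - when).days) <= max_days:
                matched.add(pid)
                break
    return matched


if __name__ == "__main__":
    birads = [("p1", date(2014, 3, 1)), ("p2", date(2014, 3, 1))]
    pathology = [
        ("p1", date(2014, 3, 20), "Left breast core biopsy"),
        ("p2", date(2014, 9, 1), "Right breast excision"),
    ]
    print(patients_with_correlated_pathology(birads, pathology))
```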

BIRADS Extraction
- Existing regular expression produced very high accuracy results in a recent publication
- Approach needs work, especially due to addendums, which are plentiful
- First evaluation of the published code on our corpus, with error analysis, is underway
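To make the regular expression approach concrete, here is a small sketch. The pattern below is an illustrative assumption written for this example; it is not the published expression referred to above, and it does not handle subcategories such as 4A.

```python
import re

# Hypothetical pattern: "BIRADS" or "BI-RADS", an optional label word,
# an optional ":" or "=", then a single category digit 0-6.
BIRADS_RE = re.compile(
    r"\bBI-?RADS\b(?:\s+(?:category|assessment|code))?\s*[:=]?\s*([0-6])\b",
    re.IGNORECASE,
)


def extract_birads(report_text):
    """Return every BIRADS category found in the report, in order of appearance."""
    return [int(m.group(1)) for m in BIRADS_RE.finditer(report_text)]


if __name__ == "__main__":
    report = "IMPRESSION: BI-RADS category 0. ADDENDUM: BI-RADS category 4."
    print(extract_birads(report))
```

Returning all matches rather than a single value makes the addendum problem visible: a report with an addendum can carry more than one category, and deciding which one is final needs logic beyond the pattern itself.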

Breast Imaging and Pathology
- Inclusion of data in data warehousing efforts may be particularly important in decreasing unnecessary testing
  - Quality metrics and dashboards
  - Audit and feedback efforts
  - Comparison of mammography reading to other imaging tests
  - Comparison of imaging tests to pathology results
  - Ability for clinicians to create their own reports
- Prototyping interactive system with group of design students from University of Pittsburgh School of Information Science

Questions for Discussion
- Interest in using Radiology in your local institutions (even if it is not used for TCRN)?
- What kinds of quality metrics would you be interested in?
- Does this dovetail with other efforts ongoing at your institutions? How?


Team and Funding
- University of Pittsburgh: Rebecca Crowley-Jacobson (MPI), Harry Hochheiser, Roger Day, Adrian Lee, Robert Edwards, John Kirkwood, Kevin Mitchell, Eugene Tseytlin, Girish Chavan, Liz Legowski
- Boston Children's Hospital/Harvard Medical School: Guergana Savova (MPI), Dmitriy Dligach, Sameer Pradhan, Timothy Miller, Sean Finan, David Harris, Pei Chen
- NCI 1U24CA184407-01; another NCIP ITCR grant
- Funding period 2014-2019; NCI PO is Kim Jessup

Specific Aims - Methods
- Specific Aim 1: Develop methods for extracting phenotypic profiles. Extract patient's deep phenotypes, and their attributes such as general modifiers (negation, uncertainty, subject) and cancer specific characteristics (e.g. grade, invasion, lymph node involvement, metastasis, size, stage)
- Specific Aim 2: Extract gene/protein mentions and clinically significant molecular information from the clinical narrative
- Specific Aim 3: Create longitudinal representation of disease process and its resolution. Link phenotypes, treatments and outcomes in temporal associations to create a longitudinal abstraction of the disease
- Specific Aim 4: Extract discourses containing explanations, speculations, and hypotheses, to support explorations of causality

Specific Aims - Design and Dissemination
- Specific Aim 5: Design and implement a computational platform for deep phenotype discovery and analytics for translational investigators, including integrative visual analytics.
- Specific Aim 6: Advance translational research in driving cancer biology research projects in breast cancer, ovarian cancer, and melanoma. Include research community throughout the design of the platform and its evaluation. Disseminate freely available software.

Use Cases and Scientific Experts
- Melanoma (John Kirkwood)
- Breast Cancer (Adrian Lee)
- Ovarian Cancer (Robert Edwards)

Software Dissemination
- Apache cTAKES: ctakes.apache.org
- TIES software: http://ties.pitt.edu/

Combining Structured and Unstructured Data
- Using clinical element models (Intermountain Health) as templates for information extraction
- Benefits us in several ways, including the potential to merge structured and unstructured data
- Models will be agnostic in the sense that they simply represent the kind of information that translational researchers want to use
- Can be populated through NLP but also from structured data sources
- Currently investigating i2b2 versus tranSMART as warehouse for data

Information modeling and template creation
- Instance level, document level and phenotype level annotations
- Methods for aggregating from instance level to document level and from document level to phenotype level
- These methods probably apply equally well to structured data as unstructured data
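The two aggregation steps can be pictured with a toy example. The rules used here (majority vote within a document, most recent document per patient) are assumptions chosen only to illustrate the instance, document, and phenotype levels; the slides do not say which aggregation rules the project uses.

```python
from collections import Counter


def document_level(instance_values):
    """Collapse instance-level mentions in one document to a single value.

    Assumed rule: most frequent value wins (first seen wins a tie).
    """
    if not instance_values:
        return None
    return Counter(instance_values).most_common(1)[0][0]


def phenotype_level(documents):
    """Collapse document-level values to one patient-level value.

    documents: list of (iso_date, value) pairs.
    Assumed rule: the value from the most recent document that has one.
    """
    dated = [(d, v) for d, v in documents if v is not None]
    if not dated:
        return None
    return max(dated, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    doc1 = document_level(["T1", "T1", "T2"])
    doc2 = document_level(["T2"])
    print(phenotype_level([("2014-01-10", doc1), ("2014-06-02", doc2)]))
```

The same two functions would work on values drawn from structured sources, which is the sense in which such methods are not specific to NLP output.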


Information extraction from selected cohort
[Diagram; labels: i2b2/tranSMART; target model; IE pipeline; case set; structured data; add to existing report data]

Assisted annotations
[Diagram; labels: i2b2/tranSMART; target model; annotator software; auto extracted data; final structured data]


EHR driven phenotyping
[Diagram; labels: True patient state; Recording; Discrete Phenotype Discovery; Raw EHR data; Phenotype Knowledge; Inform; Inform]
- Knowledge: Classify, Predict, Understand, Intervene
- Representation: bidirectional, frequently unidirectional
- Use case driven process model: real world, defined parameters, semantic ontology
- Incomplete; inaccurate; highly complex; bias; not developed with real world use cases, just old care delivery model

Techniques
- Concept extraction
- Coreference resolution
- Word sense disambiguation
- Temporal relationships (Bache)
- Spatial relationship
- Validation
- Standardization across approaches
- Results are annotated corpora, but is that enough or is it a start?

Purpose developed NLP
- Specific feature extraction
- High accuracy
- Semantic web/ontologies based on real world use cases (Pathak, Fernandez-Breis)
- Marry NLP with real world use cases to mine for features and develop machine learnable patterns within annotated corpora

NLP + discrete data
- NLP alone may not be the answer but combinations may be critical (Tien, Ludvigson)
- More precision
- Allow better modeling of extraction of NLP concepts
- Allow larger multidimensional data
- Fodder for machine learning algorithms
- High yield inputs, not entire corpus
- Cancer and cardiology are two low hanging fruits

Statistical approaches
- Active learning (Chen)
- Dimensionality reduction (Lyalina)
- Graph embedding (my idea)
- Graph theory (my idea)
- Bayesian networks (Klann)
- Conditional random fields (Deleger)
- Visual Phenome amongst the concept cloud (Warner)