How can Natural Language Processing help MedDRA coding?
April 16, 2018
Andrew Winter, Ph.D., Senior Life Science Specialist, Linguamatics

Summary
About NLP and NLP in life sciences
Uses of NLP with MedDRA
Examples in MedDRA coding of adverse events in FDA drug labels
How NLP could feed into MedDRA development

Use of NLP in Life Sciences
Advanced text analytics delivers value along the pipeline:
Gene-disease mapping; Regulatory Submission QC; HEOR; Target ID/selection; Trial site selection and study design; IDMP; Pharmacovigilance; Toxicity analysis and prediction; Safety; Competitive intelligence; Mutation/expression analysis; SAR; Biomarker discovery; Real World Evidence; Comparative Effectiveness; Drug repurposing; Patent analysis; KOL identification; Opportunity scouting; Voice of the Customer analysis; Social media analysis

NLP Turns Text into Actionable Insights
Transform unstructured or semi-structured data into insights to advance human health.
Turn text into structured data, using sophisticated queries, to drive analytics.
Natural Language Processing: Ontologies, Statistical Methods, Machine Learning, Chemistry, Regular Expressions, etc.
Enterprise Warehouse; Analytics

NLP finds information however it is expressed
Different word, same meaning: cyclosporine; ciclosporin; Neoral; Sandimmune
Different expression, same meaning: Non-smoker; Does not smoke; Does not drink or smoke; Denies tobacco use
Different grammar, same meaning: 5mg/kg of cyclosporine per day; 5mg/kg per day of cyclosporine; cyclosporine 5mg/kg per day
Same word, different context: Diagnosed with diabetes; Family history of diabetes; No family history of diabetes

Blend of powerful rule- and machine learning-based methods to transform unstructured data into structured data
Linguistic Processing: precise linguistic relationships, sentence co-occurrence; precise negation, e.g. "pressure" not "blood pressure"; multiple languages
Terminologies/Ontologies: search for concepts and their synonyms with spelling and optical character recognition (OCR) correction; out-of-the-box or custom ontologies
Quantitative Data: quantitative and pattern-based data extraction at scale, e.g. numerical data, dates, gene mutations; range search
Results Normalization: ontology- and rule-based normalization of results; essential for organizing structured output; enables indirect relations, filtering/faceting results, etc.
Chemistry: identify and extract chemicals in context based on substructure and chemical similarity
Table & Region Processing: unique capability to capture knowledge from tables embedded in documents; fielded search within regions of a document

Data normalization: always treat the same concept in the same way (the key to structured results)
Concept          Text                                                               Normalized Value
Diseases         breast cancer; carcinoma of the breast                             Breast Neoplasm
Genes            Raf-1; Raf I                                                       RAF1
Dates            27th Feb 2014; 2014/02/27                                          20140227
Measurements     0.2g; Two hundred milligrams                                       200 mg
Mutations        Val 158 Met; Val by Met at codon 158                               V158M
Behaviours       denies alcohol and tobacco use; is not a cigarette smoker          Non-smoker
Relationships    ...nimesulide, a selective COX2 inhibitor, inhibits                Entrez ID: 5743
Overview: data normalization converts text into a standard format. It is a fundamental component in transforming text into structured data and driving actionable insights.
Key benefits:
Find concepts however they are expressed
Join results to discover new indirect relationships
Cluster or facet results by concept or quantity
Compare measurements with different units, e.g. kg vs. lbs
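The normalization idea can be illustrated with a minimal Python sketch. It is not the product's implementation: the function names, the supported date formats, and the small amino-acid map are assumptions chosen only to reproduce three rows of the slide's table (dates, measurements, mutations).

```python
import re
from datetime import datetime

# Illustrative subset of three-letter to one-letter amino acid codes.
AA3_TO_1 = {"Val": "V", "Met": "M", "Ala": "A", "Gly": "G", "Leu": "L"}
UNIT_TO_MG = {"g": 1000.0, "mg": 1.0}


def normalize_date(text: str) -> str:
    """Normalize a few date spellings to YYYYMMDD."""
    # Drop ordinal suffixes such as "27th" -> "27".
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", text.strip())
    for fmt in ("%d %b %Y", "%d %B %Y", "%Y/%m/%d", "%Y-%m-%d"):
        try:
            return datetime.strptime(cleaned, fmt).strftime("%Y%m%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def normalize_mass(text: str) -> str:
    """Normalize a numeric mass in g or mg to milligrams."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(mg|g)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized mass: {text!r}")
    milligrams = float(match.group(1)) * UNIT_TO_MG[match.group(2)]
    return f"{milligrams:g} mg"


def normalize_mutation(text: str) -> str:
    """Normalize 'Val 158 Met' style substitutions to 'V158M'."""
    match = re.fullmatch(r"\s*([A-Z][a-z]{2})\s*(\d+)\s*([A-Z][a-z]{2})\s*", text)
    if match is None:
        raise ValueError(f"unrecognized mutation: {text!r}")
    ref, position, alt = match.groups()
    return f"{AA3_TO_1[ref]}{position}{AA3_TO_1[alt]}"


print(normalize_date("27th Feb 2014"), normalize_date("2014/02/27"))
print(normalize_mass("0.2g"))
print(normalize_mutation("Val 158 Met"))
```

Spelled-out quantities ("Two hundred milligrams") and free-text mutation descriptions ("Val by Met at codon 158") are deliberately out of scope here; handling them is where linguistic processing goes beyond simple patterns.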

Use of NLP with MedDRA
Errors in Regulatory Submissions
Social Media
Adverse Events in Drug Labels

Table: Most Frequently Reported Medical Conditions (≥5% in Any Treatment Group)
Number (%) of Subjects
                                     2000 Pooled Studies           2003 Pooled Study
Medical condition                    Rx N=997      Pbo N=927       Rx N=1021     Pbo N=956
Cardiac disorders                    70 (7.0)      32 (3..5)       108 (10.6)    101 (10.6)
Angina pectoris                      4 (0.4)       5 (0.5)         74 (7.2)      71 (7.4)
Dyspepsia                            174 (17.5)    120 (12.9)      3 (0.3)       2 (0.2)
GERD                                 83 (8.3)      52 (5.6)        30 (2.9)      27 (2.8%)
Metabolic / nutritional disorders    253 (25.4)    165 (17.8)      194 (19.0)    212 (22.2)
Dyslipidaemia                        1 (0.1)       0 (0)           15 (1.5)      19 (2.0)
Hypercholesterolaemia                65 (6.5)      50 (5.4)        88 (8.6)      103 (10.8)
Hyperlipidaemia                      147 (14.7)    79 (8.5)        56 (5.5)      66 (6.9)
Osteoarthritis                       102 (10.2)    57 (6.6)        12 (1.2)      11 (1.2)
Nervous system disorders             628 (63.0)    409 (44.1)      28 (2.7)      19 (2.0)
Headache                             413 (41.4)    280 (30.2)      9 (0.9)       7 (0.7)
Psychiatric disorders                137 (13.7)    81 (8.7)        14 (1.4)      15 (1.6)
Insomnia                             84 (8.4)      47 (5.1)        9 (0.9)       8 (0.8)
Commonly reported conditions included Seasonal allergies, Back pain, and Hypercholesterolaemia. The majority of AEs were considered treatment related in all cohorts and the relationship between treatment groups and between cohorts was similar to that observed for all-causality AEs. Permanent discontinuations were reported at higher rates in the Rx groups than in the placebo groups in the 3 pooled cohorts. The majority of AEs leading to permanent discontinuation were considered treatment related in both treatment groups in all cohorts. The single most frequently reported event was headache, which was reported in approximately 40% of Rx subjects and 20% of placebo subjects in the 2000 Pooled cohort. Other AEs reported across all cohorts at rates greater in Rx subjects than placebo subjects included Seasonal allergies and Insomnia (2000 8.4% vs 5.4%, 2003 0.9% vs 0.8%, 2006 14.0% vs 10.1%; Rx vs placebo respectively).
Key
Incorrect formatting: doubled period, incorrect number of decimal places, addition of percent sign
Incorrect calculation: number of patients divided by total number does not agree with percent term
Incorrect threshold: presence of row does not agree with table title
Text-Table inconsistency: numbers in the table do not agree with numbers in the accompanying text
Sample table and text highlighting, to show inconsistencies between data. The highlight colour lets the reviewer rapidly assess where the errors are and of what type, and then correct them appropriately.
Linguamatics 2016
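The first three error types in the key lend themselves to simple automated checks. The following Python sketch is an illustration only, not the tool shown on the slide: the function names, the "n (pct)" cell format, the one-decimal-place convention, and the 0.05 tolerance are all assumptions.

```python
import re


def check_cell(cell: str, group_size: int) -> list[str]:
    """Return the problems found in one 'n (pct)' table cell."""
    problems = []
    if ".." in cell:
        problems.append("formatting: doubled period")
    if "%" in cell:
        problems.append("formatting: percent sign")
    match = re.fullmatch(r"(\d+) \(([\d.]+)%?\)", cell.strip())
    if match is None:
        return problems + ["formatting: unparseable cell"]
    count = int(match.group(1))
    # Collapse doubled periods so the number can still be checked.
    pct_text = re.sub(r"\.+", ".", match.group(2))
    if re.fullmatch(r"\d+(\.\d+)?", pct_text) is None:
        return problems + ["formatting: unparseable cell"]
    if re.fullmatch(r"\d+\.\d", pct_text) is None:
        problems.append("formatting: expected one decimal place")
    # Recompute the percentage from the count and the group size.
    expected = round(100 * count / group_size, 1)
    if abs(float(pct_text) - expected) > 0.05:
        problems.append(f"calculation: expected {expected:.1f}")
    return problems


def row_meets_threshold(percentages: list[float], threshold: float = 5.0) -> bool:
    """A row belongs in the table only if some group reaches the threshold."""
    return any(p >= threshold for p in percentages)


print(check_cell("32 (3..5)", 927))
print(check_cell("57 (6.6)", 927))
print(row_meets_threshold([0.1, 0.0, 1.5, 2.0]))
```

Text-table inconsistencies are harder: the numbers first have to be extracted from the narrative and linked to the right row and column, which is the part that needs NLP rather than arithmetic.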

Use Case: Automated Blinded Data Review for Regulatory Submissions
Before unblinding a clinical trial, data are checked for errors and inconsistencies.
Among the many checks performed, MedDRA terms for Adverse Event Reports are verified, including:
Is the Preferred Term valid in any version of MedDRA? The reporter may have inserted the Investigator Entry in the wrong field, or used an LLT.
Are multiple MedDRA versions in use in the same trial? This may be a reporter error, or an error when generating the blinded data.
Does the specified version of MedDRA agree with the Preferred Terms being reported? The reporter may have used a more precise MedDRA term from a more recent version of MedDRA.
Does the Preferred Term agree with the declared System Organ Class?
Automation of this process is in use at large pharma.
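The four checks listed above can be sketched as lookups against versioned terminology tables. Everything in this Python sketch is hypothetical: the version labels "vA" and "vB", the tiny dictionary contents, and the record layout are stand-ins for illustration, and a real check would load the licensed MedDRA release files.

```python
from dataclasses import dataclass

# Hypothetical toy dictionary: version label -> {Preferred Term: System Organ Class}.
TOY_MEDDRA = {
    "vA": {
        "Headache": "Nervous system disorders",
        "Insomnia": "Psychiatric disorders",
    },
    "vB": {
        "Headache": "Nervous system disorders",
        "Insomnia": "Psychiatric disorders",
        "Angina pectoris": "Cardiac disorders",
    },
}


@dataclass
class AdverseEventRecord:
    preferred_term: str
    system_organ_class: str
    meddra_version: str


def check_record(record: AdverseEventRecord) -> list[str]:
    """Run the per-record checks and return any issues found."""
    if not any(record.preferred_term in terms for terms in TOY_MEDDRA.values()):
        return ["PT not valid in any version"]
    terms = TOY_MEDDRA.get(record.meddra_version, {})
    if record.preferred_term not in terms:
        return ["PT not in declared version"]
    if terms[record.preferred_term] != record.system_organ_class:
        return ["PT does not agree with declared SOC"]
    return []


def versions_in_use(records: list[AdverseEventRecord]) -> set[str]:
    """More than one element means mixed MedDRA versions in the trial."""
    return {record.meddra_version for record in records}


records = [
    AdverseEventRecord("Headache", "Nervous system disorders", "vA"),
    AdverseEventRecord("Angina pectoris", "Cardiac disorders", "vA"),
    AdverseEventRecord("Insomnia", "Cardiac disorders", "vB"),
    AdverseEventRecord("bad head", "Nervous system disorders", "vA"),
]
for record in records:
    print(record.preferred_term, check_record(record))
print(sorted(versions_in_use(records)))
```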

Use Case: Social Media Analysis
Social media: plenty of AEs mentioned
Language is informal
Linguistic patterns can find mentions of AEs without using a dictionary
Using MedDRA LLTs finds only one of the following 4 examples
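A dictionary-free linguistic pattern can be approximated, very roughly, with a regular expression. The sketch below is an assumption-laden illustration: the trigger verbs, the three-word window, and the example posts are invented for this sketch and are not taken from the slide.

```python
import re

# Capture up to three words following "made me" / "gave me" style triggers.
TRIGGER = re.compile(
    r"\b(?:made|makes|gave|gives) me\s+((?:[a-z']+\s?){1,3})",
    re.IGNORECASE,
)


def candidate_adverse_events(post: str) -> list[str]:
    """Return the short phrases that follow a trigger in one post."""
    return [match.group(1).strip() for match in TRIGGER.finditer(post)]


print(candidate_adverse_events("This med made me super dizzy."))
print(candidate_adverse_events("It gave me the worst headache ever, never again"))
```

The captured phrases ("super dizzy") are informal and would rarely match an LLT string directly, which is the point the slide makes; a fuller system would use proper grammatical parsing rather than a fixed word window.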

Use Case: Extraction of Adverse Events using MedDRA
Extraction of adverse events, MedDRA terms and frequency of occurrence, clustered by medicinal product
Structured results can be used to populate a database, e.g. IDMP
Different customers have different MedDRA requirements, e.g. PT vs LLT, which is easy to accommodate
Results table (background) and highlighted source document (foreground) are shown

Extraction of AEs from FDA Drug Labels
FDA drug labels are not structured
Goal: compare AEs found in Real World Evidence with known AEs
Find AEs within text and within tables
Pull out values in order to keep only AEs whose rate is greater than placebo

Use of NLP terminology features in extracting AEs
Increase recall with: morphological variants; spelling correction; matching across conjunctions; mapping multiple concepts to a MedDRA PT
Increase precision with: excluding inappropriate contexts; use of document sections to exclude inappropriate terms

Increase recall: morphological variants
MedDRA PT: Congenital anomaly
Additional hits when using morphological variants

Increase recall: spelling correction
MedDRA PT: Hypersensitivity
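Spelling correction can be approximated with fuzzy string matching. This sketch uses Python's standard `difflib.get_close_matches`; the three-term list, the misspelled inputs, and the 0.85 cutoff are assumptions for illustration, and the slide does not say which method the product uses.

```python
from difflib import get_close_matches

# Toy stand-in for a terminology; a real run would use the full term list.
TOY_TERMS = ["Hypersensitivity", "Hepatic neoplasm", "Congenital anomaly"]


def correct_to_term(token: str, terms=TOY_TERMS, cutoff: float = 0.85):
    """Map a possibly misspelled token to the closest term, or None."""
    by_lower = {term.lower(): term for term in terms}
    hits = get_close_matches(token.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[hits[0]] if hits else None


print(correct_to_term("hypersensitvity"))
print(correct_to_term("headache"))
```

A high cutoff keeps precision up: lowering it finds more misspellings but starts to merge genuinely different terms.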

Increase recall: MedDRA matching across conjunctions
MedDRA PT: Hepatic neoplasm OR Thyroid neoplasm
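Matching across conjunctions means distributing a shared head noun over coordinated modifiers, so that a phrase like "hepatic and thyroid neoplasms" yields both PTs. The sketch below is a simplified illustration: the example sentence, the single-word modifier assumption, and the crude plural stripping are all assumptions of this sketch.

```python
import re

# Lower-cased lookup for the two PTs named on the slide.
PREFERRED_TERMS = {
    "hepatic neoplasm": "Hepatic neoplasm",
    "thyroid neoplasm": "Thyroid neoplasm",
}

# "<modifier> and/or <modifier> <head>", each a single word.
COORDINATION = re.compile(r"\b(\w+)\s+(?:and|or)\s+(\w+)\s+(\w+)\b", re.IGNORECASE)


def match_across_conjunctions(text: str) -> list[str]:
    """Expand coordinated modifiers and look each expansion up as a PT."""
    found = []
    for match in COORDINATION.finditer(text):
        first, second, head = (part.lower() for part in match.groups())
        # Crude singularization: strip one trailing "s".
        singular = head[:-1] if head.endswith("s") else head
        for modifier in (first, second):
            candidate = f"{modifier} {singular}"
            if candidate in PREFERRED_TERMS:
                found.append(PREFERRED_TERMS[candidate])
    return found


print(match_across_conjunctions("Cases of hepatic and thyroid neoplasms were reported."))
```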

Increase recall: mapping multiple concepts to a MedDRA PT
Searching with MedDRA PT Blood creatinine increased (terms: Blood creatinine increased; Creatinine blood increased; Creatinine high; Creatinine increased; Creatinine serum increased; Increased serum creatinine; Plasma creatinine increased; Raised serum creatinine; Serum creatinine increased) has low recall.
Combining MedDRA PT Blood creatinine (terms: Blood creatinine; Creatinine; Plasma creatinine; Serum creatinine) with Relation Increase (terms: Increase; Elevate; Raise; ...) in a linguistic pattern that allows flexibility in expression gives significant additional recall.
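The difference between string-based synonym matching and a concept-plus-relation pattern can be shown with a small Python sketch. The regular expression below is an assumed approximation of a "linguistic pattern": the word stems, the three-word gap, and the example sentences are choices made for this sketch, and it does not handle negation ("no increase in creatinine" would match).

```python
import re

# Term strings listed on the slide for the PT "Blood creatinine increased".
SYNONYMS = [
    "Blood creatinine increased", "Creatinine blood increased", "Creatinine high",
    "Creatinine increased", "Creatinine serum increased", "Increased serum creatinine",
    "Plasma creatinine increased", "Raised serum creatinine", "Serum creatinine increased",
]

CONCEPT = r"(?:(?:blood|serum|plasma)\s+)?creatinine"
RELATION = r"(?:increas|elevat|rais)\w*|high\w*"
GAP = r"(?:\W+\w+){0,3}?\W+"  # allow up to three intervening words

PATTERN = re.compile(
    rf"\b(?:{CONCEPT}){GAP}(?:{RELATION})\b|\b(?:{RELATION}){GAP}(?:{CONCEPT})\b",
    re.IGNORECASE,
)


def exact_match(text: str) -> bool:
    """String-based matching against the listed term strings."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in SYNONYMS)


def pattern_match(text: str):
    """Concept + relation in either order; returns the PT or None."""
    return "Blood creatinine increased" if PATTERN.search(text) else None


sentence = "Elevation of blood creatinine was observed."
print(exact_match(sentence), pattern_match(sentence))
```

The sentence "Elevation of blood creatinine" is missed by every listed term string but caught by the pattern, which is the additional recall the slide describes.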

Increase precision: exclusion of hits in inappropriate contexts when searching for adverse events
Thousands of examples of MedDRA concepts that are not AEs.
Linguistic patterns can filter out inappropriate contexts.

Increase precision: using document regions
Exclusion of PTs that occur in Indications when searching for AEs
Hits elsewhere in the label can be removed when they share a PT with a hit in the Indications region
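Region-based exclusion can be sketched as: split the label into sections, find PTs in each, and drop any AE hit whose PT also appears under Indications. In this Python sketch the label text, "Drug X", and the three-term list are hypothetical, and plain substring search stands in for real term matching.

```python
import re

# Two standard drug label section headings, each on its own line.
HEADING = re.compile(r"^(INDICATIONS AND USAGE|ADVERSE REACTIONS)\s*$", re.MULTILINE)


def split_regions(label_text: str) -> dict[str, str]:
    """Map each recognized heading to the text that follows it."""
    matches = list(HEADING.finditer(label_text))
    regions = {}
    for index, match in enumerate(matches):
        end = matches[index + 1].start() if index + 1 < len(matches) else len(label_text)
        regions[match.group(1)] = label_text[match.end():end]
    return regions


def find_terms(text: str, terms: list[str]) -> set[str]:
    """Case-insensitive substring search for each term."""
    lowered = text.lower()
    return {term for term in terms if term.lower() in lowered}


def adverse_events(label_text: str, terms: list[str]) -> set[str]:
    """PTs found under Adverse Reactions, minus PTs found under Indications."""
    regions = split_regions(label_text)
    indicated = find_terms(regions.get("INDICATIONS AND USAGE", ""), terms)
    reported = find_terms(regions.get("ADVERSE REACTIONS", ""), terms)
    return reported - indicated


LABEL = """INDICATIONS AND USAGE
Drug X is indicated for the treatment of hypertension.
ADVERSE REACTIONS
The most common adverse reactions in patients with hypertension were headache and dizziness.
"""
TERMS = ["Hypertension", "Headache", "Dizziness"]
print(sorted(adverse_events(LABEL, TERMS)))
```

The trade-off of removing by PT is that a genuine adverse event identical to the indication would also be dropped, so this filter suits cases where precision matters more than recall.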

How NLP could feed into MedDRA development: improved coverage of terminology
Terms appearing with MedDRA terms in the same list
Explicit constructions such as "AEs such as", or from tables
Look for terms in appropriate contexts, e.g. "made me ?"

Noun phrases occurring in a list after "adverse events such as", and which are not already in MedDRA
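Mining candidate terms from an explicit list construction can be sketched as follows. This is a rough illustration under stated assumptions: the known-term set is a toy stand-in for MedDRA, the example sentence is invented, and list items are split on commas and "and"/"or" instead of being identified as noun phrases by a parser.

```python
import re

# Toy stand-in for terms already present in the terminology.
KNOWN_TERMS = {"nausea", "headache"}

# Capture the list that follows the trigger, up to the end of the clause.
LIST_TRIGGER = re.compile(r"adverse events such as ([^.;]+)", re.IGNORECASE)
ITEM_SEPARATOR = re.compile(r",\s*(?:and\s+|or\s+)?|\s+and\s+|\s+or\s+")


def candidate_terms(text: str, known: set[str] = KNOWN_TERMS) -> list[str]:
    """Return list items after the trigger that are not already known."""
    candidates = []
    for match in LIST_TRIGGER.finditer(text):
        for item in ITEM_SEPARATOR.split(match.group(1)):
            term = item.strip().lower()
            if term and term not in known and term not in candidates:
                candidates.append(term)
    return candidates


print(candidate_terms(
    "Patients reported adverse events such as nausea, brain fog, and jelly legs."
))
```

Candidates found this way are suggestions for human review, not automatic additions; the same approach extends to items that share a list with a known term.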

Noun phrases occurring in the same list as another MedDRA term

Summary
NLP is required to rule out inappropriate contexts, improving precision.
NLP techniques, e.g. morphological variants and OCR correction, improve recall.
String-based synonym matching cannot cope with all the variation found in real text, e.g. "Elevation of blood creatinine". Here linguistic patterns are required.
Region and table processing are often required to get the right context.