Case-based reasoning using electronic health records efficiently identifies eligible patients for clinical trials


Riccardo Miotto and Chunhua Weng
Department of Biomedical Informatics, Columbia University, New York City

Acknowledgments
The research was supported by two grants:
1. R01 LM009886, "Bridging the semantic gap between clinical research eligibility criteria and clinical data", from the National Library of Medicine (PI: Weng)
2. UL1 TR000040, Clinical and Translational Science Award to Columbia University (PI: Ginsberg)

The Top 5 Challenges for Clinical Research
1. Recruitment
2. Recruitment
3. Recruitment
4. Recruitment
5. Recruitment

The Opportunity
Samson W. Tu, Carol A. Kemper, Nancy M. Lane, Robert W. Carlson, and Mark A. Musen. A methodology for determining patients' eligibility for clinical trials. Journal of Methods of Information in Medicine, 32(4):317-325, 1993.
EHRs contain rich information for identifying eligible patients for clinical trials.
As of 2015, EHRs have become pervasive in medicine.

Our Initial Plan
To derive a computable representation of the clinical trial eligibility criteria and to align it with patient EHR records.
Workflow: free-text eligibility criteria, text-based knowledge acquisition, semantic alignment, potentially eligible patients.

and Attempts

and Challenges
Data quality issues (Weiskopf, JAMIA 2013): completeness, concordance, correctness, plausibility, currency.
Data representation heterogeneity: structured, unstructured.
Lack of common information models.
Concept granularity discrepancies even using the same T
Incomplete knowledge, imprecise disease classification.
Workflow and practical issues: recent visit, belief in research, etc.

An Alternative
To derive a computable representation of the target patient (instead of the clinical trial eligibility criteria) and to find patients similar to the target (instead of aligning the representation with patient EHR records).

Our Conceptual Framework
1. A set of eligible patients or clinical trial participants is manually identified; their EHRs are aggregated to derive the "target patient".
2. The target patient is applied to any unseen patient of a clinical data warehouse to check the eligibility status.
3. For each patient, the framework returns a relevance score: the higher the score, the more likely the patient is eligible for the trial.
4. Patients can be ranked by their relevance scores: potentially eligible patients are at the top of the list, so the manual identification performed by the investigator can be done quickly.

Example Related Work

Methods: comparison to related work
State of the art: train a binary classifier to determine whether a patient is eligible or not; the classifier is the "target patient".
Limitation of the state of the art: training a binary classifier requires a list of participants and also a list of ineligible patients, and finding ineligible patients can be as difficult and laborious as finding eligible patients.
Alternative approach: generate the target patient by modeling only the trial participants.
Advantages: relies only on a patient representation using comprehensive data, without formally representing eligibility criteria.

A Pilot Study
Study goal: to show the feasibility of using only a minimal set of trial participants to discover new potentially eligible patients.
Our implementation favored a simple design to ensure a focused and correct evaluation; more complicated implementations are more likely to introduce mistakes in the process.
Flexible customization of how to: process and summarize patient EHR data; represent the clinical trial participants; discover potentially eligible patients.

Patient EHR Processing (1)
EHR data types: medication orders, ICD-9 diagnoses, laboratory results, clinical notes.
Each participant is represented by four vectors, one per data type.
Medication orders: count the presence of each code in every participant.
Laboratory results: count the presence of each test if results were categorical or expressed with different scales; average the test result values if expressed using the same unit of measure.
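The counting and averaging rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the sparse dictionary layout, and the `same_unit_tests` argument (the set of laboratory tests assumed to be reported in a single unit) are all hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean


def code_count_vector(codes):
    """Sparse count vector: code -> number of times it appears for one patient."""
    return dict(Counter(codes))


def lab_vector(results, same_unit_tests):
    """Build one patient's laboratory vector.

    results: list of (test_name, value) pairs.
    same_unit_tests: hypothetical set of tests reported in a single unit;
    those are averaged, every other test is only counted.
    """
    values = defaultdict(list)
    counts = Counter()
    for test, value in results:
        if test in same_unit_tests:
            values[test].append(float(value))
        else:
            counts[test] += 1
    vector = {test: float(n) for test, n in counts.items()}
    vector.update({test: mean(vals) for test, vals in values.items()})
    return vector
```

Keeping the vectors sparse (dictionaries keyed by code) avoids fixing a vocabulary up front, which is convenient when only a handful of participants are available.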

Patient EHR Processing (2)
ICD-9 codes: count the presence of each code in every participant.
Clinical notes:
Extract relevant tags: limited presence of stop words, matching to UMLS.
Use UMLS to normalize tags: aggregate synonyms and semantically similar concepts under the same tag.
Remove negated tags.
Remove temporally consecutive similar notes.
Topic modeling: an unsupervised inference process that captures patterns of word co-occurrences within a heterogeneous set of notes to define topics; each note is represented as a vector of topic probabilities.

Target Patient (1)
Participant EHR representations are aggregated to derive the target patient.

Target Patient (2)
Target patient: for each data type, retain only the concepts frequently shared by the participants (motivated by the small number of participants available for some trials).
A concept is frequent when it appears in at least 60% of the participants.
Average the concept occurrences over all participants.
The target patient is represented by four vectors: aggregated common patterns among the trial's participants.
It has the same structure as a regular patient representation, which favors direct comparisons with patient data.
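For one data type, the frequency filter and averaging can be sketched as follows. This is an illustrative reading of the slide, not the authors' code: `derive_target` and `min_support` are hypothetical names, and dividing by the total number of participants (rather than only those who have the concept) is an assumption about what "average over all participants" means.

```python
from collections import Counter, defaultdict


def derive_target(participant_vectors, min_support=0.6):
    """Aggregate sparse participant vectors (concept -> value) into a target vector.

    A concept is kept only if it is present in at least `min_support` of the
    participants; its value is the average over ALL participants (assumption).
    """
    n = len(participant_vectors)
    if n == 0:
        raise ValueError("at least one participant is required")
    support = Counter()
    totals = defaultdict(float)
    for vector in participant_vectors:
        for concept, value in vector.items():
            if value:
                support[concept] += 1
                totals[concept] += value
    return {
        concept: totals[concept] / n
        for concept, seen in support.items()
        if seen / n >= min_support
    }
```

The same function would be applied once per data type, giving the four vectors of the target patient.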

Finding New Eligible Patients
EHR data of unseen patients are matched to the target patient of a clinical trial.
Relevance score, indicating the likelihood that the patient is eligible: pairwise cosine similarity within each data type, then aggregation of the scores using a weighted linear combination (w-comb).
The higher the relevance score, the more likely the patient is eligible.
The list of patients is sorted by score, with the most likely candidates at the top of the list to speed up manual review.
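A minimal sketch of the scoring step, assuming each patient and the target are dictionaries mapping a data type to a sparse vector. The slides do not give the weights used in w-comb, so the `weights` argument and the function names below are placeholders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors (concept -> value)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance(patient, target, weights):
    """Weighted linear combination of the per-data-type cosine similarities."""
    return sum(
        weight * cosine(patient.get(data_type, {}), target.get(data_type, {}))
        for data_type, weight in weights.items()
    )


def rank_patients(patients, target, weights):
    """Return patient ids sorted by decreasing relevance score."""
    scored = [(relevance(vectors, target, weights), pid)
              for pid, vectors in patients.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pid for _, pid in scored]
```

A data type missing from a patient record simply contributes a similarity of zero, which fits the incompleteness of EHR data mentioned earlier.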

Evaluation: study design
Dataset: 13 clinical trials from Columbia University; 262 unique participants; an additional 30k patients extracted at random from the data warehouse.
2-fold cross-validation: half of the participants are used to derive the target patient; the other half, plus the 30k random patients, are used to test.
Ranking experiment, for each trial: rank all the patients in the test set by their relevance score with the corresponding target patient; measure at which point of the list the participants in the test set are ranked.
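The protocol above can be sketched generically. Everything here is hypothetical scaffolding: `build_target` and `score` stand in for whatever target derivation and relevance function are used, patients are referred to by id, and the random split is only one way to form the two folds.

```python
import random


def two_fold_splits(participants, seed=0):
    """Split participant ids into two halves and return both (train, held_out) folds."""
    shuffled = list(participants)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    first, second = shuffled[:half], shuffled[half:]
    return [(first, second), (second, first)]


def held_out_ranks(train, held_out, background, build_target, score):
    """Rank held-out participants among the background patients.

    build_target(train_ids) -> target and score(patient_id, target) -> float
    are caller-supplied (hypothetical) functions.
    Returns the 1-based rank of each held-out participant.
    """
    target = build_target(train)
    test_set = list(held_out) + list(background)
    ranked = sorted(test_set, key=lambda pid: score(pid, target), reverse=True)
    return [ranked.index(pid) + 1 for pid in held_out]
```

Low ranks for the held-out participants mean they would surface early in a manual review of the list.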

Evaluation: 13 multi-site trials

Evaluation: comparisons
w-comb obtains the best results: it takes advantage of the heterogeneous data types.
AMIA 2014 Annual Symposium, 15-19 November 2014

Evaluation: AUC
Area under the ROC curve = 0.952
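For reference, the area under the ROC curve of a ranking equals the probability that a randomly chosen participant scores higher than a randomly chosen non-participant. The sketch below computes it directly from that definition; the slides do not say how the reported value was computed, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of patients,
    which is acceptable for a sketch but slow for very large test sets.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```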

Evaluation: results
Precision-at-5 for each trial in each fold: every trial has at least one potentially eligible patient within the top 5 positions of the ranking list (P5 ≥ 0.2).
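Precision-at-5 is the fraction of the top 5 ranked patients that are actual participants, so a single participant in the top 5 gives 1/5 = 0.2. A minimal sketch, with `precision_at_k` as a hypothetical helper name:

```python
def precision_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the first k ranked patients that are in relevant_ids."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    hits = sum(1 for pid in ranked_ids[:k] if pid in relevant)
    return hits / k
```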

Conclusions
EHR data of clinical trial participants can be used to recommend new eligible patients: ranking results are consistent among multiple trials of different medical conditions, and results are satisfactory regardless of the number of participants used for training.
Potential application scenarios:
A self-standing tool constantly monitoring a clinical data warehouse and alerting investigators when a new potentially eligible patient is identified.
A tool used on request to rank all patients in a data warehouse to find eligible patients for a specific trial.
A component to integrate with approaches that process eligibility criteria.

Ongoing Works
Patient representation: test more sophisticated techniques to represent EHR data; model laboratory results accounting for the temporal trends of the values; model diagnoses and medications using a well-chosen probability distribution to handle the incompleteness problem of EHR data.
Improve the target patient: a statistical model that can be trained by only estimating the distributions of participants associated with each trial, e.g., hidden Markov models, mixture models.
Extend the experimental framework: more and diverse clinical trials; more patients in the dataset.
Tackle the complexity of trial design, e.g., the need for > 1 target model.

Chunhua Weng, Ph.D. chunhua@columbia.edu