Evaluation of Clinical Text Segmentation to Facilitate Cohort Retrieval

Enhanced Cohort Identification and Retrieval, Session S105
Tracy Edinger, ND, MS
Oregon Health & Science University
Twitter: #AMIA2017

Co-Authors
- Dina Demner-Fushman, MD, PhD (National Library of Medicine)
- Aaron Cohen, MD, MS (Oregon Health & Science University)
- Steven Bedrick, PhD (Oregon Health & Science University)
- William Hersh, MD (Oregon Health & Science University)

Acknowledgements
- NLM 2 T15 LM 7088-21
- National Library of Medicine: NLM Scientists, Staff, and Fellows
- OHSU DMICE: Faculty, Staff, and Students

Disclosure
My spouse/partner and I have no relevant relationships with commercial interests to disclose.

Learning Objectives
After participating in this session, the learner should be better able to:
- Understand the importance of identifying document section headings for natural language processing
- Understand rule-based identification of document section headings

Use of Clinical Data
- Secondary use of EHR data:
  - Quality improvement
  - Disease surveillance
  - Regulatory reporting
  - Research
- To use this data, it is important to be able to retrieve specific patient cohorts
Image from http://epidemiologystudy.com/study.php

Structured and Unstructured Data for Cohort Retrieval
- Structured data, including diagnosis and procedure codes, are commonly used to identify clinical cohorts
- Relying solely on structured data may not retrieve the full cohort
Example (figure): Patients who had colonoscopies during the last 10 years
Source: Denny JC (2012). Chapter 13: Mining Electronic Health Records in the Genomics Era. PLoS Comput Biol 8(12): e1002823. doi:10.1371/journal.pcbi.1002823

Cohort Retrieval from Clinical Text
- Cohort retrieval from clinical text is difficult:
  - Terminology and spelling differences
  - Multiple meanings for terms
  - Temporality
  - Negation
  - References to illnesses in other people
- Clinical text may provide clues to help resolve some of these issues

Structure of Clinical Text - SOAP Format
S: Patient reports not much sleep last night; no complaints this morning.
O: T 99 F, HR 68, RR 16, BP 107/75. Chest CTA, bilateral breath sounds. CV RRR without murmur.
A: Ovarian carcinoma, POD #1 for staging laparotomy. Adequate UOP, incision in good condition.
P: Clear liquids today. D/C Foley catheter.
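Because the four SOAP labels are short, fixed markers, a note in this format can be split mechanically. The sketch below is a naive illustration, not the segmentation method used in this study. It assumes each section begins with its letter and a colon, at the start of the note or after whitespace. The helper name `split_soap` is hypothetical.

```python
import re

# Assumption: a section starts with "S:", "O:", "A:" or "P:" at the start of the
# note or right after whitespace ((?<!\S) = not preceded by a non-space character).
# Real notes vary far more than this.
SOAP_HEADER = re.compile(r"(?<!\S)([SOAP]):\s*")


def split_soap(note: str) -> dict[str, str]:
    """Split a SOAP note into {'S': ..., 'O': ..., 'A': ..., 'P': ...} (hypothetical helper)."""
    matches = list(SOAP_HEADER.finditer(note))
    sections: dict[str, str] = {}
    for i, match in enumerate(matches):
        # Each section runs until the next header, or to the end of the note.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note)
        sections[match.group(1)] = note[match.end():end].strip()
    return sections


note = (
    "S: Patient reports not much sleep last night; no complaints this morning. "
    "O: T 99 F, HR 68, RR 16, BP 107/75 Chest CTA, bilateral breath sounds "
    "CV RRR without murmur "
    "A: Ovarian carcinoma POD #1 for staging laparotomy. Adequate UOP, "
    "incision in good condition. "
    "P: Clear liquids today. D/C foley catheter."
)
print(split_soap(note)["A"])
# prints: Ovarian carcinoma POD #1 for staging laparotomy. Adequate UOP, incision in good condition.
```

Notes that use spelled-out headings, like the next example, need a longer list of indicators. That is the problem the section-identification rules in this study address.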

Structure of Clinical Text
Chief Complaint: Sent from NWH with left sided hemorrhage
History of Present Illness: The pt is a 44 year-old right handed woman with no significant PMH and family history significant for stroke (father, paternal uncle and sister @ 46 years) who was transferred from [**Hospital 1771**] Hospital with a left sided intraparenchymal hemorrhage. The patient was in her USOH...
Past Medical History: Had an ulcer at age 10
Social History: Works at the [**Last Name (un) 10457**] Laboratories in [**Location (un) 2997**]. Married. Has a son. No ETOH, TOBACCO, or Drugs.
Family History: Father died of multiple strokes at age 63. Paternal Uncle died of stroke. Patient sister died of stroke at age 46.

Facilitating Retrieval by Segmenting Clinical Text
Past Medical History: Had an ulcer at age 10
Family History: Father died of multiple strokes at age 63. Paternal Uncle died of stroke. Patient sister died of stroke at age 46.
Sections provide clues that may avoid some retrieval issues:
- Temporal differences
- References to illnesses in other people
Several algorithms have been published that segment clinical documents:
- The segmentation itself was validated
- No published studies evaluate whether segmenting improves recall and precision

Project Overview
- Segmented a set of clinical documents
- Developed topics for several patient cohorts
- Developed queries with and without sections
- Judged a subset of retrieved documents to evaluate performance
- Analyzed results

Methods - Data
MIMIC-II database:
- Neonatal and adult patients
- De-identified ICU records developed by MIT, Philips Medical Systems, and Beth Israel Deaconess Medical Center
- Relational database containing structured data and unstructured documents:
  - Discharge summaries
  - MD notes
  - Radiology reports
  - Nursing notes
- 25,000 patients

Methods - Segmenting Documents
Identified section indicators, including different forms of the same heading:
- Admission Date: [**3391-5-21**] Discharge Date: [**3391-6-1**] Sex: M Service: SURGERY Allergies: Penicillin Attending:[**First Name3 (LF) 2679**] Addendum: Pt is discharged to
- Admission Date: [**3391-5-21**] Discharge Date: [**3391-6-1**] Sex: M Service: SURGERY Allergies - penicillin Attending:[**First Name3 (LF) 2679**] Addendum: Pt is discharged to
- Admission Date: [**3391-5-21**] Discharge Date: [**3391-6-1**] Sex: M Service: SURGERY Allergic to penicillin Attending:[**First Name3 (LF) 2679**] Addendum: Pt is discharged to
Searched for indicators and inserted XML tags:
Admission Date: [**3391-5-21**] Discharge Date: [**3391-6-1**] Sex: M Service: SURGERY <allergies>allergic to penicillin</allergies> Attending:[**First Name3 (LF) 2679**] Addendum: Pt is discharged to

Methods - Segmenting Documents
Original format:
<TEXT>Admission Date: [**3391-5-21**] Discharge Date: [**3391-6-1**] Date of Birth: [**3312-11-5**] Sex: M Service: SURGERY Allergies: Penicillin Attending:[**First Name3 (LF) 2679**] Addendum: Pt is discharged to [**Hospital3 **] Hospital [**3391-6-1**]. This is an updated medication list, which has been faxed to [**Hospital3 **]. Discharge Medications: 1. Acetaminophen 325 mg Tablet Sig: 1-2 Tablets PO Q6H (every 6 hours) as needed. 2. Atorvastatin 20 mg Tablet Sig: One (1) Tablet PO DAILY (Daily). 3. Insulin Lispro 100 unit/ml Solution Sig: One (1) injection Subcutaneous ASDIR (AS DIRECTED). Discharge Disposition: Extended Care Facility: [**Hospital6 694**] [**Location (un) 695**] [**First Name11 (Name Pattern1) 531**] [**Last Name (NamePattern1) 2684**] MD [**MD Number 2685**]</TEXT>
Segmented text:
<TEXT>
<preamble>admission Date: [**3391-5-21**] Discharge Date: [**3391-6-1**] Date of Birth: [**3312-11-5**] Sex: M Service: SURGERY</preamble>
<allergies>allergies: Penicillin</allergies>
<addendum>addendum: Pt is discharged to [**Hospital3 **] Hospital [**3391-6-1**]. This is an updated medication list, which has been faxed to [**Hospital3 **]. </addendum>
<dc_meds>discharge Medications: 1. Acetaminophen 325 mg Tablet Sig: 1-2 Tablets PO Q6H (every 6 hours) as needed. 2. Atorvastatin 20 mg Tablet Sig: One (1) Tablet PO DAILY (Daily). 3. Insulin Lispro 100 unit/ml Solution Sig: One (1) injection Subcutaneous ASDIR (AS DIRECTED). </dc_meds>
<dc_disposition>discharge Disposition: Extended Care Facility: [**Hospital6 694**] [**Location (un) 695**] [**First Name11 (Name Pattern1) 531**] [**Last Name (NamePattern1) 2684**] MD [**MD Number 2685**] </dc_disposition>
</TEXT>
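The indicator search can be sketched with regular expressions. This is a simplified, hypothetical sketch, not the study's rule set. The table `SECTION_INDICATORS` and the function `segment` are illustrative. Of the tag names, `allergies`, `addendum`, `dc_meds`, and `dc_disposition` come from the slides; `attending` is invented here. Each section is assumed to run until the next recognized indicator, and any text before the first indicator becomes `<preamble>`.

```python
import re

# Hypothetical indicator table: tag -> case-insensitive pattern for the variant
# ways a section can begin. A real rule set would cover many more headings.
SECTION_INDICATORS = {
    "allergies": r"allerg(?:ies\s*[:\-]|ic\s+to\b)",  # "Allergies:", "Allergies -", "Allergic to"
    "attending": r"attending\s*:",                    # hypothetical tag, not on the slides
    "addendum": r"addendum\s*:",
    "dc_meds": r"discharge\s+medications\s*:",
    "dc_disposition": r"discharge\s+disposition\s*:",
}


def segment(text: str) -> str:
    """Wrap each section in <tag>...</tag>; a section ends where the next indicator starts."""
    # Collect (start offset, tag) for every indicator match, in document order.
    starts = sorted(
        (m.start(), tag)
        for tag, pattern in SECTION_INDICATORS.items()
        for m in re.finditer(pattern, text, flags=re.IGNORECASE)
    )
    parts = []
    first = starts[0][0] if starts else len(text)
    if text[:first].strip():
        parts.append(f"<preamble>{text[:first].strip()}</preamble>")
    for i, (pos, tag) in enumerate(starts):
        end = starts[i + 1][0] if i + 1 < len(starts) else len(text)
        parts.append(f"<{tag}>{text[pos:end].strip()}</{tag}>")
    return "<TEXT>" + " ".join(parts) + "</TEXT>"


variants = ["Allergies: Penicillin", "Allergies - penicillin", "Allergic to penicillin"]
for variant in variants:
    note = (
        "Admission Date: [**3391-5-21**] Discharge Date: [**3391-6-1**] Sex: M "
        f"Service: SURGERY {variant} Attending:[**First Name3 (LF) 2679**] "
        "Addendum: Pt is discharged to"
    )
    print(segment(note))
```

All three forms of the allergy heading end up in the same `<allergies>` element. A query can therefore target the section without listing every surface form.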

Methods - Search Engine
NLM's Essie:
- Developed to facilitate searching of medical literature by non-clinicians through use of the UMLS
- The UMLS relates terms by concept, allowing matches even if different words are used
- Maps the text corpus to the UMLS and indexes the corpus on these concepts
- Maps the search concepts to the UMLS
- Returns a ranked, scored list of documents
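The idea of indexing on concepts rather than words can be shown with a toy inverted index. This is a swapped-in illustration, not Essie's algorithm or any UMLS API. The tiny `LEXICON` and its concept labels are invented for the example (they are not UMLS CUIs), and the function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Toy, hypothetical lexicon standing in for UMLS concept mapping.
# Concept labels are made up; they are not UMLS CUIs.
LEXICON = {
    "hypertension": "HYPERTENSION",
    "high blood pressure": "HYPERTENSION",
    "htn": "HYPERTENSION",
    "myocardial infarction": "MYOCARDIAL_INFARCTION",
    "heart attack": "MYOCARDIAL_INFARCTION",
}
MAX_WORDS = max(len(term.split()) for term in LEXICON)


def to_concepts(text: str) -> list[str]:
    """Greedy longest match of lexicon terms over the lowercased word sequence."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    concepts, i = [], 0
    while i < len(words):
        for n in range(MAX_WORDS, 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in LEXICON:
                concepts.append(LEXICON[phrase])
                i += n
                break
        else:  # no lexicon term starts at this word
            i += 1
    return concepts


def build_index(docs: dict[str, str]) -> dict[str, Counter]:
    """Inverted index on concepts: concept -> Counter({doc_id: occurrences})."""
    index: dict[str, Counter] = defaultdict(Counter)
    for doc_id, text in docs.items():
        for concept in to_concepts(text):
            index[concept][doc_id] += 1
    return index


def search(index: dict[str, Counter], query: str) -> list[tuple[str, int]]:
    """Map the query to concepts and rank documents by matched-concept count."""
    scores: Counter = Counter()
    for concept in set(to_concepts(query)):
        scores.update(index.get(concept, Counter()))
    return scores.most_common()


docs = {
    "d1": "Pt with HTN, on lisinopril.",
    "d2": "History of high blood pressure and prior heart attack.",
    "d3": "No acute issues overnight.",
}
index = build_index(docs)
print(search(index, "hypertension"))  # d1 and d2 match, though neither contains "hypertension"
```

A real concept mapper also has to handle ambiguity and negation. Those are among the reasons for non-relevant retrievals reported in the Results.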

Methods - Clinical Topics
- Began with topics from TRECMed 2012 and adapted them to the MIMIC ICU data
- Modified or eliminated topics that retrieved few documents

Methods - Clinical Topic Examples
- Patients who develop thrombocytopenia in pregnancy
- Patients taking atypical antipsychotics without a diagnosis of schizophrenia or bipolar depression
- Patients with delirium, hypertension, and tachycardia
- Patients with thyrotoxicosis treated with beta-blockers
Final set included 22 topics

Methods - Query Development
1. Developed initial query without sections
2. Ran queries against data
3. Examined retrieved documents to refine query
4. Rewrote query using sections
5. Ran queries against data
6. Examined retrieved documents to refine query
7. Ran all queries and recorded documents returned and scores

Methods - Query Development
Topic: Patients with diabetes who also have thrombocytosis
Baseline query:
diabetes AND thrombocytosis
With sections, we could avoid matches in the Family History section:
thrombocytosis AND AREA[AdmissionDiagnosis] diabetes OR AREA[ChiefComplaint] diabetes OR AREA[Course] diabetes
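A minimal sketch can illustrate the difference between the two queries. Three assumptions apply:
- Segmented notes are stored as a dictionary mapping section name to text.
- Matching is plain substring search rather than Essie's concept mapping.
- The OR clauses are grouped together, since the query as written shows no parentheses.

The section keys mirror the AREA names in the query. The function names are hypothetical.

```python
# Hypothetical representation of a segmented note: section name -> section text.
Note = dict[str, str]


def in_section(note: Note, section: str, term: str) -> bool:
    """Rough analogue of AREA[section] term: substring match within one section."""
    return term.lower() in note.get(section, "").lower()


def anywhere(note: Note, term: str) -> bool:
    """Substring match over the whole document (all sections)."""
    return any(term.lower() in text.lower() for text in note.values())


def baseline_query(note: Note) -> bool:
    # diabetes AND thrombocytosis, searched over the whole document
    return anywhere(note, "diabetes") and anywhere(note, "thrombocytosis")


def sectioned_query(note: Note) -> bool:
    # Assumed grouping: thrombocytosis AND (AREA[AdmissionDiagnosis] diabetes
    # OR AREA[ChiefComplaint] diabetes OR AREA[Course] diabetes)
    patient_sections = ("AdmissionDiagnosis", "ChiefComplaint", "Course")
    return anywhere(note, "thrombocytosis") and any(
        in_section(note, s, "diabetes") for s in patient_sections
    )


family_only = {
    "AdmissionDiagnosis": "Thrombocytosis",
    "FamilyHistory": "Mother with diabetes.",
}
patient_dx = {
    "AdmissionDiagnosis": "Thrombocytosis",
    "Course": "Diabetes managed with insulin sliding scale.",
}
print(baseline_query(family_only), sectioned_query(family_only))  # True False
print(baseline_query(patient_dx), sectioned_query(patient_dx))    # True True
```

The baseline query matches both notes. The sectioned query rejects the note where diabetes appears only in the family history.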

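Methods - Document Sampling
- Samples were selected for each topic based on the difference in scores between the segmented-document and whole-document queries:
  - Retrieved only from segmented documents: 0-10 documents
  - Retrieved by both queries: 0-10 with a high score difference and 0-10 with a low score difference
  - Retrieved only from whole documents: 0-10 documents
- Total sample size was 574 documents
- Sample sizes ranged from 10 to 40 documents per topic
- Average sample size was 26 documents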
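The sampling diagram can be read as four regions with up to 10 documents each. The sketch below encodes that reading, which is an interpretation and not a published procedure. Within the single-query regions it takes the top-scoring documents; that choice is an assumption. The function name `sample_topic` is hypothetical.

```python
def sample_topic(
    seg_scores: dict[str, float], whole_scores: dict[str, float], k: int = 10
) -> set[str]:
    """Pick up to k documents from each retrieval region for relevance judging.

    Assumed regions: retrieved only by the segmented query; retrieved by both
    (the k largest and k smallest score differences); retrieved only by the
    whole-document query. Scores map document ID -> retrieval score.
    """
    both = seg_scores.keys() & whole_scores.keys()
    seg_only = seg_scores.keys() - whole_scores.keys()
    whole_only = whole_scores.keys() - seg_scores.keys()

    # Documents retrieved by both queries, ordered by (segmented - whole) score.
    by_diff = sorted(both, key=lambda d: seg_scores[d] - whole_scores[d])
    low = by_diff[:k]
    high = by_diff[max(len(by_diff) - k, 0):]

    # Assumption: within single-query regions, take the highest-scoring documents.
    top_seg = sorted(seg_only, key=seg_scores.__getitem__, reverse=True)[:k]
    top_whole = sorted(whole_only, key=whole_scores.__getitem__, reverse=True)[:k]

    # A set removes duplicates when the high and low lists overlap.
    return set(top_seg) | set(low) | set(high) | set(top_whole)


seg_scores = {"a": 5.0, "b": 3.0, "c": 1.0, "e": 2.0}
whole_scores = {"b": 1.0, "c": 4.0, "d": 2.0, "e": 2.0}
print(sorted(sample_topic(seg_scores, whole_scores, k=1)))  # ['a', 'b', 'c', 'd']
```

With k = 10, a topic contributes at most 40 documents, matching the stated maximum sample size. A topic can contribute fewer when a region is small or the high and low lists overlap.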

Methods - Document Evaluation
1. Was the document relevant to the topic?
2. Why were non-relevant documents retrieved?
3. Did segmentation help retrieval, and why?

Results - Document Relevance
574 documents analyzed:
- Retrieved only by queries of segmented documents: 26
- Retrieved by both queries: 328
- Retrieved only by queries of whole documents: 220

Results - Document Relevance
343 relevant documents:
- Segmented documents only: 20
- Both: 246
- Whole document only: 77
231 non-relevant documents:
- Segmented documents only: 6
- Both: 82
- Whole document only: 143
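The two relevance diagrams split each retrieval region by judgment. A small sketch shows how such counts can be tabulated from sets of document IDs. It assumes that only judged documents are passed in. The function name and the toy IDs are hypothetical.

```python
from collections import Counter


def venn_by_relevance(seg: set[str], whole: set[str], relevant: set[str]) -> Counter:
    """Count judged documents by retrieval region and relevance judgment.

    seg / whole: IDs of judged documents retrieved by each query.
    relevant: IDs judged relevant.
    """
    counts: Counter = Counter()
    for doc in seg | whole:
        if doc in seg and doc in whole:
            region = "both"
        elif doc in seg:
            region = "segmented only"
        else:
            region = "whole only"
        label = "relevant" if doc in relevant else "non-relevant"
        counts[(region, label)] += 1
    return counts


# Toy example with made-up document IDs.
counts = venn_by_relevance(
    seg={"d1", "d2", "d3"},
    whole={"d2", "d3", "d4", "d5"},
    relevant={"d1", "d2", "d4"},
)
for key in sorted(counts):
    print(key, counts[key])
```

Summing relevant and non-relevant counts per region reproduces the totals on the first Document Relevance slide: 20 + 6 = 26 (segmented only), 246 + 82 = 328 (both), and 77 + 143 = 220 (whole only).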

Results - Reasons for Retrieving Non-relevant Documents
- Non-relevant reference to condition: 84
- Past or possible future condition: 70
- Condition mentioned but not diagnosed: 23
- Condition denied or ruled out: 22
- Issue with term mapping: 20
- Query issue: 11

Results - Effect of Segmenting on Document Retrieval
- Segmenting avoided retrieval of a non-relevant document by avoiding specific sections: 132
- Segmenting allowed retrieval of a relevant document by focusing on specific sections: 20
- Performance unrelated to segmenting: 320
- Query error (did not look in the right section): 80
- Document not segmented correctly: 18
- Condition included in incorrect section of notes: 1

Results - Segmenting avoided retrieval of non-relevant documents
Topic: Patients who develop thrombocytopenia in pregnancy
Issue: Neonatal notes often document the mother's pregnancy history
Solution: Look in sections containing the patient's diagnosis

Results - Segmenting allowed retrieval of relevant documents by focusing on specific sections
Topic: Patients taking atypical antipsychotics without a diagnosis of schizophrenia or bipolar depression
Issue: Need to ignore mentions of these conditions in family members
Solution: Look in sections containing the patient's diagnosis; avoid the family history section

Quantitative Analysis
- Correlation (Matthews correlation coefficient) to indicate whether querying the segmented documents affected performance
- Precision and recall

Analysis - Matthews Correlation Coefficient
                        Segmented score higher than base    Segmented score lower than base
Document relevant       True Positive (TP)                  False Negative (FN)
Document not relevant   False Positive (FP)                 True Negative (TN)
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
Values range from -1 to 1.
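The calculation can be sketched as follows. Each judged document is assumed to carry a relevance label plus its segmented and baseline (whole-document) scores. Several conventions here are assumptions, not stated on the slide:
- A tie between the two scores counts as "lower".
- The MCC is returned as 0 when any row or column of the table is empty.
- How a document retrieved by only one query was scored is not given; a score of 0 for the other query would be one option.

The function names are hypothetical.

```python
import math
from typing import Iterable


def confusion(judged: Iterable[tuple[bool, float, float]]) -> tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) from (is_relevant, segmented_score, base_score) triples.

    Following the table, a 'positive' is a segmented score higher than the base
    score. Assumption: ties count as 'lower'.
    """
    tp = tn = fp = fn = 0
    for is_relevant, seg_score, base_score in judged:
        higher = seg_score > base_score
        if is_relevant and higher:
            tp += 1
        elif is_relevant:
            fn += 1
        elif higher:
            fp += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 if any marginal total is zero (assumed convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


judged = [(True, 2.0, 1.0), (True, 0.5, 1.0), (False, 3.0, 1.0), (False, 0.1, 0.2)]
print(mcc(*confusion(judged)))  # one document in each cell -> 0.0
```

A value near 1 would mean segmented scores rose mainly for relevant documents. A value near -1 would mean they rose mainly for non-relevant ones.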

Analysis - Matthews Correlation Coefficient
[Figure: MCC values with their average, on a scale from -0.2 to 1; * p<0.05, ** p<0.01]

Analysis - Recall and Precision
$$\text{Recall} = \frac{\text{number of relevant documents retrieved}}{\text{number of relevant documents judged}}$$
$$\text{Precision} = \frac{\text{number of relevant documents retrieved}}{\text{number of judged documents retrieved}}$$
Values range from 0 to 1.
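Applying these definitions to the pooled counts from the Document Relevance slides gives a rough cross-check. The Venn regions are read as segmented only, both, and whole only. These pooled values are simple arithmetic on the reported counts, not the per-topic averages plotted on the next two slides. Recall here is relative to the 343 relevant documents in the judged sample, not to all relevant documents in MIMIC-II.

```python
def recall(relevant_retrieved: int, relevant_judged: int) -> float:
    """Fraction of the judged relevant documents that the query retrieved."""
    return relevant_retrieved / relevant_judged


def precision(relevant_retrieved: int, retrieved_judged: int) -> float:
    """Fraction of the judged documents retrieved by the query that were relevant."""
    return relevant_retrieved / retrieved_judged


# Pooled counts from the Document Relevance slides (all topics combined).
RELEVANT_JUDGED = 343
segmented = {"relevant_retrieved": 246 + 20, "retrieved": 26 + 328}
whole = {"relevant_retrieved": 246 + 77, "retrieved": 328 + 220}

for name, q in (("segmented", segmented), ("whole document", whole)):
    r = recall(q["relevant_retrieved"], RELEVANT_JUDGED)
    p = precision(q["relevant_retrieved"], q["retrieved"])
    print(f"{name}: recall={r:.2f} precision={p:.2f}")
```

Pooled this way, segmented queries trade recall (about 0.78 vs. 0.94) for precision (about 0.75 vs. 0.59). This is in line with the Discussion's point that they retrieved fewer documents, which were more likely to be relevant.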

Analysis - Recall
[Figure: recall for Whole Document and Segmented Document queries, with average (Avg); scale 0 to 1]

Analysis - Precision
[Figure: precision for Whole Document and Segmented Document queries, with average (Avg); scale 0 to 1]

Discussion
- Queries of segmented documents retrieved fewer documents
- These documents were more likely to be relevant and less likely to be non-relevant
- Some queries performed better
- Some documents were easier to segment accurately

Limitations
- Small sample size
- Only one person writing queries and doing relevance judgments
- Inaccuracies in identifying note segments
- Some queries did not perform well

Future Work
- Use a validated algorithm to segment text
- Use a larger sample and independent relevance judges
- Develop queries for specific types of clinical notes
- Identify specific types of information that benefit from searching specific sections
- Search unstructured and structured data together to reflect real-world EHR data use

AMIA is the professional home for more than 5,400 informatics professionals, representing frontline clinicians, researchers, public health experts and educators who bring meaning to data, manage information and generate new knowledge across the research and healthcare enterprise.
@AMIAInformatics #WhyInformatics

Thank you! Email me at: edingert@ohsu.edu