Data and Text-Mining the Electronic Medical Record for Epidemiological Purposes

SESSION 2: MASTER COURSE
Data and Text-Mining the Electronic Medical Record for Epidemiological Purposes
Dr Marie-Hélène Metzger, Associate Professor, marie-helene.metzger@aphp.fr
1 Assistance Publique Hôpitaux de Paris, Hôpital Avicenne, Bobigny
2 Paris 13 University, LEPS
3 INSERM U 1018, CESP

Definitions: Data-Mining
Computational process of discovering patterns in large data sets, involving methods of artificial intelligence, machine learning, statistics, and database systems.
A step in the Knowledge Discovery in Databases (KDD) process, commonly defined with the following stages: Selection, Pre-processing, Transformation, Data Mining, Interpretation/Evaluation.
Ref: U. Fayyad et al. 1996, American Association for Artificial Intelligence: 37-54

Definitions: Text-Mining
Linguistic technologies that convert full text into a numeric vector (presence-absence or frequency). It is then possible to apply the same algorithms as those used in Data-Mining (e.g. Principal Component Analysis, Naive Bayes classifier, ...).
General applications: Information extraction; Information retrieval; Keyword-based association analysis; Document classification; Text clustering analysis.
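As a rough illustration of the text-to-vector step described above, the following minimal sketch builds presence-absence and frequency vectors using only the Python standard library. The function names (`tokenize`, `build_vocabulary`, `vectorize`) and the two toy sentences are assumptions made for this example; they do not come from the lecture, and a real pipeline would add linguistic preprocessing before this step.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Return a sorted list of every distinct token found in the corpus."""
    return sorted({tok for doc in documents for tok in tokenize(doc)})


def vectorize(text, vocabulary, mode="frequency"):
    """Map one document to a vector aligned with `vocabulary`.

    mode="frequency" counts occurrences of each term;
    mode="presence" yields 1 if the term occurs, else 0.
    """
    counts = Counter(tokenize(text))
    if mode == "presence":
        return [1 if counts[term] else 0 for term in vocabulary]
    return [counts[term] for term in vocabulary]


# Hypothetical toy corpus, invented for illustration only.
docs = [
    "Fever and purulent drainage from the scar",
    "No fever, scar is clean, no drainage",
]
vocab = build_vocabulary(docs)
freq = [vectorize(d, vocab) for d in docs]
pres = [vectorize(d, vocab, mode="presence") for d in docs]
```

Once every document is a fixed-length vector over the same vocabulary, the rows can be passed to any Data-Mining algorithm that expects numeric input, which is the point the slide makes.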

From text to epidemiology
Data sources: Electronic medical records; E-mails; Internet queries; Blogs; Social networks; Scientific literature; Newspapers; Medical consultations (voice recognition transcription).
Epidemiological use: Prevalence and incidence of a disease; Patient phenotyping; Surveillance, alert; Exploratory analysis of risk factors; Indicators of healthcare quality.

From medical text to epidemiological knowledge
Text-Mining → Data-Mining → Epidemiology
Ref: U. Fayyad et al. 1996, American Association for Artificial Intelligence: 37-54

Use case: quality of care
The SYNODOS project: SYstem for the Normalization and Organization of textual medical Data for Observation in Healthcare
http://www.synodos.fr
Detection of nosocomial infections, or automated description of the patient's care pathway for colon cancer

Data Selection
Hospital Information System: Metadata (dates, type of document); Structured Data (ADICAP, ...); Textual Data (discharge summaries, operative reports, ...)
Internet; HTTPS protocol; 3; LAN
1 Multi-terminology extractor; «PUSH» import, XML format; SOAP request; SYNODOS Mediator
2 Storage directory; Indexing; Semantic Analyzer (source, de-identified, terminology and semantics labeling)
4 Fact Base: Certain facts, Inferred facts
5 DMZ; User Interface; Rule Base

Multi-terminology Indexer
Columns: Terminology Code; Terminology Wording; Terminology Source; UMLS semantic type

Text-Mining Processing
Columns: Token; Part of Speech tags (17 universal tags); Dependency Parses (triplet): 40 universal relations

Token        POS tag  POS description          Relation  Relation description
The          DT       Determiner               det       Determiner
patient      NN       Common noun              nsubj     Nominal Subject
claims       VB       Verb: base form          null
not          RB       Adverb                   neg       Negation modifier
to           TO       To                       aux       Auxiliary Verb
have         VB       Verb: base form          aux       Auxiliary Verb
consumed     VBN      Verb: past participle    xcomp     Open clausal complement
this         DT       Determiner               det       Determiner
morning      NN       Common noun              tmod      Temporal Modifier
6            CD       Cardinal Number          num       Numeric modifier
es           NNS      Common noun              dobj      Direct Object
of           IN       Preposition              prep      Prepositional modifier
paracetamol  NNP      Proper Noun              pobj      Object of Preposition
+            SYM      Symbol                   dep       Dependency
other        JJ       Adjective                amod      Adjectival Modifier
drugs        NN       Common noun              dep       Dependency
.            PUNCT    Punctuation

Populating the database of facts
Patient's care pathway: Surgical report: T0; Discharge summary: T0 + 7 days; Consultation letter: T0 + 3 months
Relational database; BRMS (Business Rules Management System); Expert rules

Expert rules: linguistic
Token, Part of Speech tags (17 universal tags), Dependency Parses (triplet): 40 universal relations, as in the annotated example sentence under Text-Mining Processing.
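A linguistic expert rule over these annotations could look roughly like the sketch below, which checks whether a drug mention is governed by a negated verb. This is only an illustration: the slides give each token's tag and relation but not its head, so every `head` index here is an assumption, as are the `Token` class and the functions `governing_verb` and `is_negated`. The token "es" is reproduced as it appears in the example sentence.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    idx: int
    text: str
    pos: str
    rel: Optional[str]   # dependency relation to the head (None for the root)
    head: Optional[int]  # index of the head token: ASSUMED, not given on the slide


# Tags and relations follow the example sentence; head indices are hypothetical.
SENTENCE: List[Token] = [
    Token(0, "The", "DT", "det", 1),
    Token(1, "patient", "NN", "nsubj", 2),
    Token(2, "claims", "VB", None, None),
    Token(3, "not", "RB", "neg", 6),
    Token(4, "to", "TO", "aux", 6),
    Token(5, "have", "VB", "aux", 6),
    Token(6, "consumed", "VBN", "xcomp", 2),
    Token(7, "this", "DT", "det", 8),
    Token(8, "morning", "NN", "tmod", 6),
    Token(9, "6", "CD", "num", 10),
    Token(10, "es", "NNS", "dobj", 6),
    Token(11, "of", "IN", "prep", 10),
    Token(12, "paracetamol", "NNP", "pobj", 11),
]


def governing_verb(tokens: List[Token], idx: int) -> Optional[Token]:
    """Climb the head chain from tokens[idx] to the nearest verb (tag VB*)."""
    current = tokens[idx]
    while current.head is not None:
        current = tokens[current.head]
        if current.pos.startswith("VB"):
            return current
    return None


def is_negated(tokens: List[Token], idx: int) -> bool:
    """True if the verb governing tokens[idx] has a `neg` dependent."""
    verb = governing_verb(tokens, idx)
    if verb is None:
        return False
    return any(t.rel == "neg" and t.head == verb.idx for t in tokens)


drug_negated = is_negated(SENTENCE, 12)  # the "paracetamol" mention
```

Under these assumed heads, "paracetamol" climbs through "of" and "es" to "consumed", which carries the `neg` dependent "not", so the consumption would be recorded as a negated fact rather than a positive one.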

Expert rules: medical
Detection of nosocomial infection
Example of a medical expert rule: If "surgery" at T0 and "purulent drainage from the scar" tagged "> T0" then "surgical site infection" = "YES"
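The example rule can be sketched in a few lines of Python over a small in-memory fact base. This is an illustration under stated assumptions, not the SYNODOS implementation (which, per the slides, uses a relational database and a BRMS): the `Fact` class, the function name, the patient identifiers, and the dates are all hypothetical, and returning "NO" when the rule does not fire is an assumption, since the slide only states the "YES" outcome.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass(frozen=True)
class Fact:
    patient_id: str
    concept: str       # normalized concept extracted from a document
    event_date: date   # date the fact is anchored to


def surgical_site_infection(facts: List[Fact], patient_id: str) -> str:
    """Apply the rule: surgery at T0 and purulent drainage after T0 gives "YES"."""
    surgeries = [
        f.event_date
        for f in facts
        if f.patient_id == patient_id and f.concept == "surgery"
    ]
    if not surgeries:
        return "NO"  # assumption: no surgery means the rule cannot fire
    t0 = min(surgeries)  # T0 is taken as the earliest surgery date
    for f in facts:
        if (
            f.patient_id == patient_id
            and f.concept == "purulent drainage from the scar"
            and f.event_date > t0
        ):
            return "YES"
    return "NO"


# Hypothetical facts: surgical report at T0, discharge summary at T0 + 7 days.
T0 = date(2015, 1, 5)
FACTS = [
    Fact("A", "surgery", T0),
    Fact("A", "purulent drainage from the scar", T0 + timedelta(days=7)),
    Fact("B", "surgery", T0),
    Fact("C", "purulent drainage from the scar", T0 - timedelta(days=2)),
    Fact("C", "surgery", T0),
]
results = {p: surgical_site_infection(FACTS, p) for p in ("A", "B", "C")}
```

Patient "A" satisfies both conditions, "B" has surgery only, and "C" has drainage recorded before T0, so only "A" is flagged.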

Performances of detection

Use case: psychiatry
Ref: Int. J. Methods Psychiatr. Res. (2016)

Discussion: Critical steps
Linguistic issues
Different terminologies are used depending on whether the corpus under study was written by patients or by doctors.
Corpora of medical reports are difficult to analyze because they contain many unconventional abbreviations and typographical errors, and often lack punctuation.

Discussion: Critical steps
Availability of clinical text for secondary use
Ethical challenges: Barriers to accessing clinical text are linked to perceptions or misperceptions of risks to patient privacy. In fact, adopting NLP methods and automatic information extraction would reduce threats to patient privacy, because less human intervention is required.
Technical challenges: Quality of the text de-identification process; standardisation of metadata related to the type of documents available in the Health Information System.