Text mining for lung cancer cases over large patient admission data. David Martinez, Lawrence Cavedon, Zaf Alam, Christopher Bain, Karin Verspoor


1 Text mining for lung cancer cases over large patient admission data David Martinez, Lawrence Cavedon, Zaf Alam, Christopher Bain, Karin Verspoor

2 Opportunities for Biomedical Informatics
Increasing roll-out of EHR and HI systems is creating analysis opportunities in Biomedical Informatics and eHealth:
- Prediction: factors in disease and effective treatment
- Detection: observables indicative of disease
- Prevention: what factors circumvent those related to prediction
Linked hospital data allows multiple sources to be leveraged for complex analytic tasks:
- Radiology
- Pathology
- Pharmacology
- Admission
- Emergency Room
- etc.

5 Alfred Health's REASON Discovery Platform
Initiative by Alfred Health Informatics Department:
- Technologies, tools and large-scale data sources to support REsearch, AnalysiS, and OperatioNs
- Integrates data sources from multiple hospital departments
- Historical patient data linked by unique Unit Record (UR) number
Large-scale data architecture:
- (parts of) 14+ years of data from Cerner HI implementations
- 171,000+ updates to Clinical Events table each day
- 62.4 million updates per annum
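The annual figure follows from the daily one. A quick check, assuming a 365-day year and treating the quoted "171,000+" as a lower bound:

```python
# Sanity check of the update volume quoted on the slide.
UPDATES_PER_DAY = 171_000  # lower bound quoted on the slide ("171,000+")
DAYS_PER_YEAR = 365        # assumption: non-leap year

updates_per_year = UPDATES_PER_DAY * DAYS_PER_YEAR
print(f"{updates_per_year:,} updates per year")           # 62,415,000
print(f"{updates_per_year / 1e6:.1f} million per annum")  # 62.4
```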

6 Language Technology and Decision Support
Much information remains, and will remain, in text form:
- Inter-departmental reports
- Clinical notes and narratives
- In-patient and discharge summaries
- EHRs/EMRs
Such data can't be leveraged without language technology/NLP.
Tasks:
- Monitoring (adverse) clinical events, surveillance
- Providing best up-to-date evidence
- Creating knowledge bases
- Converting text into actionable data that can be mined

9 Disease Recognition from Clinical Reports
Task: classify records according to specified disease
- Enables retrieval of specific cases
- Detect patterns of disease occurrence
- Support creation of patient cohorts
- Prelude to automated ICD-encoding
Disease: Lung Cancer
- Identified by ICD-10 code C34: Malignant neoplasm of bronchus and lung
Classification Task: assign code to patient admission record

10 Hybrid Text Mining Processing Framework
Diagram components:
- Machine learning algorithm
- Annotated Data Set
- Classification Model
- Language processing
- Words and linguistic structure; names of entities; context features; domain concepts
- Biomedical knowledge sources

11 Previous Applications
- Fungal infection surveillance by classifying CT scan reports
- Extracting information from pathology reports

12 Method
Data: radiology reports for 2 (financial) years ( )
- 756,502 reports, plus associated metadata
- Each report linked to an admission record
- Metadata: ICD-10 (manually assigned) used as ground truth; demographics, reason for admission, etc.
- Extracted from REASON platform
Data pre-processed to remove ICD-10 codes and extract features
Challenge: the real distribution is highly skewed; only 0.8% of the data are positive for lung cancer
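To make the skew concrete, the sketch below derives an approximate positive count from the two figures on the slide. The exact count is not given in the source, so the result is only an estimate. It also shows why plain accuracy is uninformative here: a classifier that always answers "negative" would score about 99.2% accuracy while finding no cases.

```python
# Figures taken from the slide; the derived counts are estimates only.
TOTAL_REPORTS = 756_502
POSITIVE_RATE = 0.008  # "only 0.8%" positive for lung cancer

approx_positives = round(TOTAL_REPORTS * POSITIVE_RATE)
approx_negatives = TOTAL_REPORTS - approx_positives

# Hypothetical degenerate classifier that labels every report negative.
majority_accuracy = approx_negatives / TOTAL_REPORTS
majority_recall = 0 / approx_positives  # it never finds a positive case

print(f"approx. positives: {approx_positives:,}")
print(f"always-negative accuracy: {majority_accuracy:.3f}, recall: {majority_recall:.1f}")
```

This is the usual motivation for reporting precision, recall and F-score on the positive class rather than accuracy.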

13 Method
Features:
- Bags-of-Words from report text
- Bags-of-Phrases identified by MetaMap
- Negative context identified by NegEx
- Metadata associated with linked admission record:
  NAME, DOB, SEX, MARITALSTATUS, RELIGION, ...
  ADMISSIONREASON, ADMISSIONUNIT, ADMISSIONTYPE
  ALLERGIES, DRUGCODE, DRUGDESC, ...
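As a rough illustration of how bag-of-words features can carry negative context, the sketch below counts tokens per report and prefixes tokens that follow a negation trigger with NEG_. It is a toy stand-in, not the NegEx algorithm or MetaMap. The trigger list, the 4-token window and the function names are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, heavily simplified trigger list; the real NegEx lexicon
# and scope rules are considerably richer.
NEGATION_TRIGGERS = {"no", "not", "without"}
WINDOW = 4  # assumed number of tokens after a trigger treated as negated


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words_with_negation(text):
    """Count tokens, marking those inside a negation window with NEG_."""
    features = Counter()
    for sentence in re.split(r"[.;\n]", text):
        remaining = 0  # tokens still covered by the current trigger
        for token in tokenize(sentence):
            if token in NEGATION_TRIGGERS:
                remaining = WINDOW
                continue
            if remaining > 0:
                features["NEG_" + token] += 1
                remaining -= 1
            else:
                features[token] += 1
    return features


report = "No evidence of lung malignancy. Nodule in the right lung."
features = bag_of_words_with_negation(report)
print(features["NEG_malignancy"], features["lung"], features["malignancy"])
```

Keeping negated and affirmed mentions as separate features lets a classifier learn that "NEG_malignancy" and "malignancy" point in opposite directions.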

14 Method
Machine learning algorithms:
- Support Vector Machines (as implemented in Weka toolkit)
- Correlation-based feature subset selection filter
Baseline: term-matching approach
- lung cancer, lung malignancy, lung malignant, lung neoplasm, lung tumour, lung carcinoma
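The term-matching baseline can be sketched in a few lines. The six terms are the ones listed on the slide. The case-insensitive matching, the tolerance for whitespace between words and the function name term_match are assumptions, since the slides do not say how matching was implemented.

```python
import re

# The six baseline terms listed on the slide.
BASELINE_TERMS = [
    "lung cancer", "lung malignancy", "lung malignant",
    "lung neoplasm", "lung tumour", "lung carcinoma",
]

# Assumed matching rule: case-insensitive, any whitespace between the words.
PATTERN = re.compile(
    "|".join(term.replace(" ", r"\s+") for term in BASELINE_TERMS),
    re.IGNORECASE,
)


def term_match(report_text):
    """Return True if any baseline term occurs in the report text."""
    return PATTERN.search(report_text) is not None


# Sentences quoted on slide 20 of the deck.
print(term_match("Clinical indications: Surveillance of lung cancer"))
print(term_match("ill-defined nodular density superior to the right lung hilum"))
```

The first sentence matches even though it describes surveillance rather than a diagnosis, and the second does not match because no listed term appears. These are the two failure modes the deck later lists as false positives and false negatives.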

15 Results: different systems / configurations
Evaluation: stratified 10-fold cross-validation
Table columns: Classifier, Precision, Recall, F-score
Table rows: Text features only; Full feature set (including metadata); Term-matching baseline
* Results not using feature selection, which reduced performance
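The evaluation setup can be sketched as follows. The authors used Weka, so this standard-library Python version only illustrates the idea: assign examples to folds so that each fold keeps roughly the same class balance, then score predictions on the positive class. The function names, the toy data and the reading of "F-score" as balanced F1 are assumptions.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each example a fold index, spreading every class evenly over k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    fold_of = [None] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            fold_of[index] = position % k  # deal round-robin within each class
    return fold_of


def precision_recall_f1(gold, predicted, positive=1):
    """Precision, recall and balanced F-score for the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data (not from the study): 20 positives among 1,000 examples.
labels = [1] * 20 + [0] * 980
folds = stratified_folds(labels)
positives_per_fold = [
    sum(1 for i, fold in enumerate(folds) if fold == n and labels[i] == 1)
    for n in range(10)
]
print(positives_per_fold)  # every fold receives the same number of positives

gold = [1, 1, 1, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0]
p, r, f = precision_recall_f1(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Stratification matters with skewed data: with purely random folds, some folds could end up with very few positive cases, making per-fold recall unstable.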

16 Results: temporal-based training sets

18 Conclusions
- Promising results using straightforward machine-learning-based approach over heavily-skewed distributions
- Volume of data means training time can be lengthy: mainly feature extraction, due to MetaMap; however, this is off-line and record-classification is fast
- Use of metadata improved performance
- Data linking helps
Next Steps
- Expand to more ICD codes / diseases
- Incorporate more data sources into decision-making

19 Thanks! Questions?

20 Problematic sentences for phrase-matching baseline
False positives:
- Clinical indications: Surveillance of lung cancer
- A small primary lung neoplasm would have to be considered
- ? small primary lung neoplasm
- Clinical: Metastatic lung cancer
False negatives:
- I suspect this is more likely be carcinoma and lung primary are suspected in this highly likely
- ill-defined nodular density superior to the right lung hilum (?)


More information

CHAPTER 2 MAMMOGRAMS AND COMPUTER AIDED DETECTION

CHAPTER 2 MAMMOGRAMS AND COMPUTER AIDED DETECTION 9 CHAPTER 2 MAMMOGRAMS AND COMPUTER AIDED DETECTION 2.1 INTRODUCTION This chapter provides an introduction to mammogram and a description of the computer aided detection methods of mammography. This discussion

More information

Wikipedia-Based Automatic Diagnosis Prediction in Clinical Decision Support Systems

Wikipedia-Based Automatic Diagnosis Prediction in Clinical Decision Support Systems Wikipedia-Based Automatic Diagnosis Prediction in Clinical Decision Support Systems Danchen Zhang 1, Daqing He 1, Sanqiang Zhao 1, Lei Li 1 School of Information Sciences, University of Pittsburgh, USA

More information

Application of AI in Healthcare. Alistair Erskine MD MBA Chief Informatics Officer

Application of AI in Healthcare. Alistair Erskine MD MBA Chief Informatics Officer Application of AI in Healthcare Alistair Erskine MD MBA Chief Informatics Officer 1 Overview Why AI in Healthcare topic matters Is AI just another shiny objects? Geisinger AI collaborations Categories

More information

A Comparison of Collaborative Filtering Methods for Medication Reconciliation

A Comparison of Collaborative Filtering Methods for Medication Reconciliation A Comparison of Collaborative Filtering Methods for Medication Reconciliation Huanian Zheng, Rema Padman, Daniel B. Neill The H. John Heinz III College, Carnegie Mellon University, Pittsburgh, PA, 15213,

More information

Large-scale Histopathology Image Analysis for Colon Cancer on Azure

Large-scale Histopathology Image Analysis for Colon Cancer on Azure Large-scale Histopathology Image Analysis for Colon Cancer on Azure Yan Xu 1, 2 Tao Mo 2 Teng Gao 2 Maode Lai 4 Zhuowen Tu 2,3 Eric I-Chao Chang 2 1 Beihang University; 2 Microsoft Research Asia; 3 UCSD;

More information

An unsupervised machine learning model for discovering latent infectious diseases using social media data

An unsupervised machine learning model for discovering latent infectious diseases using social media data Journal of Biomedical Informatics j o u r n al homepage: www.elsevier.com/locate/yj b i n An unsupervised machine learning model for discovering latent infectious diseases using social media data ARTICLE

More information

Lung Cancer Diagnosis from CT Images Using Fuzzy Inference System

Lung Cancer Diagnosis from CT Images Using Fuzzy Inference System Lung Cancer Diagnosis from CT Images Using Fuzzy Inference System T.Manikandan 1, Dr. N. Bharathi 2 1 Associate Professor, Rajalakshmi Engineering College, Chennai-602 105 2 Professor, Velammal Engineering

More information

Mansourvar, Marjan; Andersen-Ranberg, Karen; Nøhr, Christian; Wiil, Uffe Kock

Mansourvar, Marjan; Andersen-Ranberg, Karen; Nøhr, Christian; Wiil, Uffe Kock Syddansk Universitet A Predictive Model for Acute Admission in Aged Population Mansourvar, Marjan; Andersen-Ranberg, Karen; Nøhr, Christian; Wiil, Uffe Kock Published in: Building Continents of Knowledge

More information

George Cernile Artificial Intelligence in Medicine Toronto, ON. Carol L. Kosary National Cancer Institute Rockville, MD

George Cernile Artificial Intelligence in Medicine Toronto, ON. Carol L. Kosary National Cancer Institute Rockville, MD George Cernile Artificial Intelligence in Medicine Toronto, ON Carol L. Kosary National Cancer Institute Rockville, MD Using RCA A system to convert free text pathology reports into a database of discrete

More information

Session 35: Text Analytics: You Need More than NLP. Eric Just Senior Vice President Health Catalyst

Session 35: Text Analytics: You Need More than NLP. Eric Just Senior Vice President Health Catalyst Session 35: Text Analytics: You Need More than NLP Eric Just Senior Vice President Health Catalyst Learning Objectives Why text search is an important part of clinical text analytics The fundamentals of

More information

Extraction of Adverse Drug Effects from Clinical Records

Extraction of Adverse Drug Effects from Clinical Records MEDINFO 2010 C. Safran et al. (Eds.) IOS Press, 2010 2010 IMIA and SAHIA. All rights reserved. doi:10.3233/978-1-60750-588-4-739 739 Extraction of Adverse Drug Effects from Clinical Records Eiji Aramaki

More information

Medical Knowledge Attention Enhanced Neural Model. for Named Entity Recognition in Chinese EMR

Medical Knowledge Attention Enhanced Neural Model. for Named Entity Recognition in Chinese EMR Medical Knowledge Attention Enhanced Neural Model for Named Entity Recognition in Chinese EMR Zhichang Zhang, Yu Zhang, Tong Zhou College of Computer Science and Engineering, Northwest Normal University,

More information

Automatic Pathology Software for Diagnosis of Non-Alcoholic Fatty Liver Disease

Automatic Pathology Software for Diagnosis of Non-Alcoholic Fatty Liver Disease Automatic Pathology Software for Diagnosis of Non-Alcoholic Fatty Liver Disease (OTT ID 1236) Inventors: Joseph Bockhorst and Scott Vanderbeck, Department of Computer Science and Electrical Engineering,

More information

Using Electronic Medical Records to Identify Complex Health Outcomes

Using Electronic Medical Records to Identify Complex Health Outcomes Using Electronic Medical Records to Identify Complex Health Outcomes Mary Anne Armstrong MA; Maqdooda Merchant, MA, MSc; Amy Alabaster, MS, MPH; Tina Raine-Bennett, MD, MPH; Debbie Postlethwaite RNP, MPH

More information

Multi-modal Patient Cohort Identification from EEG Report and Signal Data

Multi-modal Patient Cohort Identification from EEG Report and Signal Data Multi-modal Patient Cohort Identification from EEG Report and Signal Data Travis R. Goodwin and Sanda M. Harabagiu The University of Texas at Dallas Human Language Technology Research Institute http://www.hlt.utdallas.edu

More information

Cerner COMPASS ICD-10 Transition Guide

Cerner COMPASS ICD-10 Transition Guide Cerner COMPASS ICD-10 Transition Guide Dx Assistant Purpose: To educate Seton clinicians regarding workflow changes within Cerner COMPASS subsequent to ICD-10 transition. Scope: Basic modules and functionality

More information

Prevalence of adrenal incidentaloma a methodologic comparison of EMR query strategies

Prevalence of adrenal incidentaloma a methodologic comparison of EMR query strategies Prevalence of adrenal incidentaloma a methodologic comparison of EMR query strategies Michio Taya, BA 1 ; Viktoriya Paroder, MD, PhD 2 ; Eran Bellin, MD 3,4 ; Linda Haramati, MD, MS 2,3 2 Departments of

More information

National Academies Next Generation SAMPLE Researchers TITLE Initiative HERE

National Academies Next Generation SAMPLE Researchers TITLE Initiative HERE National Academies Next Generation SAMPLE Researchers TITLE Initiative HERE Dennis A. Dean, II, PhD Sanofi Auditorium July 13, 2017 sevenbridges.com A little about me Research Experience Analytics and

More information

Case-based reasoning using electronic health records efficiently identifies eligible patients for clinical trials

Case-based reasoning using electronic health records efficiently identifies eligible patients for clinical trials Case-based reasoning using electronic health records efficiently identifies eligible patients for clinical trials Riccardo Miotto and Chunhua Weng Department of Biomedical Informatics Columbia University,

More information

ITERATIVELY TRAINING CLASSIFIERS FOR CIRCULATING TUMOR CELL DETECTION

ITERATIVELY TRAINING CLASSIFIERS FOR CIRCULATING TUMOR CELL DETECTION ITERATIVELY TRAINING CLASSIFIERS FOR CIRCULATING TUMOR CELL DETECTION Yunxiang Mao 1, Zhaozheng Yin 1, Joseph M. Schober 2 1 Missouri University of Science and Technology 2 Southern Illinois University

More information

Does Machine Learning. In a Learning Health System?

Does Machine Learning. In a Learning Health System? Does Machine Learning Have a Place In a Learning Health System? Grand Rounds: Rethinking Clinical Research Friday, December 15, 2017 Michael J. Pencina, PhD Professor of Biostatistics and Bioinformatics,

More information

Symbolic rule-based classification of lung cancer stages from free-text pathology reports

Symbolic rule-based classification of lung cancer stages from free-text pathology reports Symbolic rule-based classification of lung cancer stages from free-text pathology reports Anthony N Nguyen, 1 Michael J Lawley, 1 David P Hansen, 1 Rayleen V Bowman, 2 Belinda E Clarke, 3 Edwina E Duhig,

More information

Tweet Location Detection

Tweet Location Detection Tweet Location Detection Bahareh Rahmanzadeh Heravi Insight Centre for Data Analytics National University of Ireland Galway, Ireland Bahareh.Heravi@insightcentre.org Ihab Salawdeh Insight Centre for Data

More information

Leveraging Expert Knowledge to Improve Machine-Learned Decision Support Systems

Leveraging Expert Knowledge to Improve Machine-Learned Decision Support Systems Leveraging Expert Knowledge to Improve Machine-Learned Decision Support Systems Finn Kuusisto, MS 1 ; Inês Dutra, PhD 2 ; Mai Elezaby, MD 1 ; Eneida Mendonça, MD, PhD 1 ; Jude Shavlik, PhD 1 ; Elizabeth

More information

Icd 9 code for small cell lung cancer

Icd 9 code for small cell lung cancer 1-10-2017 ICD -10-CM Diagnosis Code C34.. The two main types are small cell lung cancer and non- small cell lung cancer.. Convert C34.90 to ICD - 9 -CM. Code History. 25-8-2009 What is the ICD 9 code for

More information

Characteristics of Inpatient Fever

Characteristics of Inpatient Fever 2016 International Conference on Computational Science and Computational Intelligence Characteristics of Inpatient Fever A Case in Teaching Hospital Prof. SuFeng Tseng NCCU MIS Taipei, TW (ROC) Email:

More information

Improving Patients' Understanding of Radiology Reports: Comparing Coverage of a Lay-Language Radiology Glossary to MedlinePlus

Improving Patients' Understanding of Radiology Reports: Comparing Coverage of a Lay-Language Radiology Glossary to MedlinePlus Improving Patients' Understanding of Radiology Reports: Comparing Coverage of a Lay-Language Radiology Glossary to MedlinePlus American College of Radiology National Meeting May 2017 Teresa Martin-Carreras,

More information