Schema-Driven Relationship Extraction from Unstructured Text

1 Wright State University CORE Scholar, Kno.e.sis Publications, The Ohio Center of Excellence in Knowledge-Enabled Computing (Kno.e.sis), 2007
Schema-Driven Relationship Extraction from Unstructured Text
Cartic Ramakrishnan, Wright State University - Main Campus
Part of the Bioinformatics Commons, Communication Technology and New Media Commons, Databases and Information Systems Commons, OS and Networks Commons, and the Science and Technology Studies Commons
Repository Citation: Ramakrishnan, C. (2007). Schema-Driven Relationship Extraction from Unstructured Text.
This Presentation is brought to you for free and open access by The Ohio Center of Excellence in Knowledge-Enabled Computing (Kno.e.sis) at CORE Scholar. It has been accepted for inclusion in Kno.e.sis Publications by an authorized administrator of CORE Scholar. For more information, please contact corescholar@

2 Schema-Driven Relationship Extraction from Unstructured Text Cartic Ramakrishnan Kno.e.sis Center, Wright State University, Dayton, OH

3 Outline Motivation Problem Description & Approach Results Future Work

4 Anecdotal Example: UNDISCOVERED PUBLIC KNOWLEDGE
Discovering connections hidden in text
[Figure: graph of entities (Harry Potter, Nicolas Flamel, Nicolas Poussin, The Hunchback of Notre Dame, Holy Blood, Holy Grail, Victor Hugo, Priory of Sion, Et in Arcadia Ego, The Da Vinci Code, Leonardo Da Vinci, The Last Supper, The Mona Lisa, The Vitruvian Man, The Louvre, Santa Maria delle Grazie) connected by the relationships mentioned_in, member_of, painted_by, written_by, cryptic_motto_of, and displayed_at]

5 Motivation 1: Undiscovered public knowledge in biology
Swanson's discoveries
[Figure: Migraine and Magnesium connected through intermediate concepts in PubMed (Stress, Calcium Channel Blockers, Spreading Cortical Depression); the direct link is marked "?"]
These associations were discovered in 1986, based on keyword searches followed by manual analysis of text to establish possibly relevant relationships

6 Motivation 2: Hypothesis-driven retrieval of scientific literature
Keyword query: Migraine[MH] + Magnesium[MH]
Complex query: a graph pattern over Migraine, Stress, Magnesium, Patient, and Calcium Channel Blockers, connected by affects, inhibit, and isa
[Figure: both kinds of query are issued against PubMed, and supporting document sets are retrieved]

7 Motivation 3: Growth rate of public knowledge
Data captured per year = 1 exabyte (10^18 bytes) (Eric Neumann, Science, 2005)
How much is that? Compare it to the estimate of the total words ever spoken by humans = 12 exabytes
A small but significant portion is text data:
  PubMed: 16 million abstracts
  MedlinePlus: health information
  OMIM: catalog of human genes and genetic disorders
Undiscovered public knowledge may also have increased by a large amount

8 Our past work in Connection Discovery
Semantic Associations over RDF graphs: discovery and ranking of semantically connected entities
Assumption: rich semantic metadata containing entities related by a diverse set of relationships
It is therefore critical to bridge the gap between unstructured and structured data by extracting entities and the relationships between them, resulting in semantic metadata
[Figure: example graph with Migraine, Magnesium, Stress, Patient, and Calcium Channel Blockers connected by affects, inhibit, and isa]

9 Outline Motivation Problem Description & Approach Results Future Work

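10 Problem: Extracting relationships between MeSH terms from PubMed
[Figure: UMLS Semantic Network classes (Biologically Active Substance, Lipid, Disease or Syndrome) connected by complicates, causes, and affects; the MeSH terms Fish Oils and Raynaud's Disease are attached to the schema by instance_of; the relationship between Fish Oils and Raynaud's Disease is unknown (shown as "???????"); PubMed document counts shown: 9284 documents, 5 documents, 4733 documents]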

11 Background knowledge used
UMLS
  A high-level schema of the biomedical domain
  136 classes and 49 relationships
  Synonyms of all relationships obtained using variant lookup (tools from NLM)
  49 relationships + their synonyms = ~350, mostly verbs
  Example variants for relationship T147: effect, induce, etiology, cause, effecting, induced
MeSH
  22,000+ topics organized as a forest of 16 trees
  Used to query PubMed
PubMed
  Over 16 million abstracts
  Abstracts annotated with one or more MeSH terms
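
The relationship synonyms described above can be pictured as a lookup from surface verb forms to a schema relationship. The following is only a minimal sketch: the table layout and the function names are assumptions made for illustration, and the only data taken from the slide are the six T147 variants. It is not the NLM variant-lookup tooling.

```python
# Hypothetical synonym table: relationship ID -> surface forms seen in text.
# Only the T147 variants come from the slide; the structure is an assumption.
RELATIONSHIP_VARIANTS = {
    "T147": {"effect", "induce", "etiology", "cause", "effecting", "induced"},
}


def build_variant_index(variants):
    """Invert {relationship_id: {forms}} into {form: relationship_id}."""
    index = {}
    for rel_id, forms in variants.items():
        for form in forms:
            index[form.lower()] = rel_id
    return index


def lookup_relationship(token, index):
    """Return the relationship ID for a token, or None if it is not a listed variant."""
    return index.get(token.lower())


VARIANT_INDEX = build_variant_index(RELATIONSHIP_VARIANTS)
```

A call such as `lookup_relationship("Induced", VARIANT_INDEX)` returns `"T147"`, while a form that is not in the table returns `None`. A real variant lookup would also need to cover inflections that are not listed explicitly.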

12 Method: Parse sentences in PubMed
SS-Tagger (University of Tokyo)
SS-Parser (University of Tokyo)
Example parse:
  (TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) (JJ exogenous)) (NN stimulation)) (PP (IN by) (NP (NN estrogen)))) (VP (VBZ induces) (NP (NP (JJ adenomatous) (NN hyperplasia)) (PP (IN of) (NP (DT the) (NN endometrium)))))))
Entities (MeSH terms) in sentences occur in modified forms:
  "adenomatous" modifies "hyperplasia"
  "An excessive endogenous or exogenous stimulation" modifies "estrogen"
Entities can also occur as composites of 2 or more other entities:
  "adenomatous hyperplasia" and "endometrium" occur as "adenomatous hyperplasia of the endometrium"
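
To make the bracketed parser output concrete, here is a minimal sketch that reads a Penn-Treebank-style bracketed string into nested tuples. It is an illustrative stand-in written for this transcript, not the SS-Parser API; the function names are assumptions.

```python
def tokenize(text):
    """Split a bracketed parse into '(' , ')' and symbol tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse_tree(text):
    """Parse a bracketed string into (label, children) tuples; leaves are strings."""
    tokens = tokenize(text)
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())      # nested constituent
            else:
                children.append(tokens[pos])  # word
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the root constituent")
    return tree


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1] for word in leaves(child)]


SENTENCE = (
    "(TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) "
    "(JJ exogenous)) (NN stimulation)) (PP (IN by) (NP (NN estrogen)))) "
    "(VP (VBZ induces) (NP (NP (JJ adenomatous) (NN hyperplasia)) "
    "(PP (IN of) (NP (DT the) (NN endometrium)))))))"
)
```

Joining `leaves(parse_tree(SENTENCE))` with spaces gives back the original sentence, which is a quick check that the tree was read correctly.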

13 Method: Identify entities and relationships in the parse tree
[Figure: parse tree of the example sentence ("An excessive endogenous or exogenous stimulation by estrogen induces adenomatous hyperplasia of the endometrium") with modifiers, modified entities, and composite entities highlighted]

14 Entities: the simple, the modified, and the composite
To capture the various types of entities we define:
Simple entities as MeSH terms
Modifiers as siblings of entities that are:
  Determiners, e.g. "Y induces no X"
  Noun phrases, e.g. "An excessive endogenous or exogenous stimulation"
  Adjective phrases, e.g. "adenomatous"
  Prepositional phrases, e.g. "M is induced by the X in the Z"
Modified entities as any entity that has a sibling which is a modifier
Composite entities as any entity that has another entity as a sibling
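
The definitions above can be sketched as a bottom-up pass over noun and prepositional phrases. This is one possible reading, not the author's implementation: `MESH_TERMS` is a hypothetical three-word stand-in for a MeSH lookup, a bare `JJ` sibling is treated as a one-word adjective phrase, and entities found deeper in a phrase are assumed to propagate upward so that they can meet siblings at a higher level.

```python
MESH_TERMS = {"estrogen", "hyperplasia", "endometrium"}  # hypothetical MeSH stand-in
MODIFIER_LABELS = {"DT", "NP", "ADJP", "JJ", "PP"}       # JJ as adjective phrase: assumption
PHRASE_LABELS = {"NP", "PP"}                             # only these are analyzed


def words(node):
    """Return the words under a (label, children) node."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in words(child)]


def analyze(node):
    """Return a dict describing the entity under node, or None if there is none."""
    label, children = node
    if len(children) == 1 and isinstance(children[0], str):  # preterminal
        word = children[0]
        if word.lower() in MESH_TERMS:
            return {"type": "simple", "text": word}
        return None
    if label not in PHRASE_LABELS:
        return None
    found = [(child, analyze(child)) for child in children]
    entities = [entity for _, entity in found if entity is not None]
    if not entities:
        return None
    if len(entities) >= 2:  # an entity with another entity as sibling
        return {"type": "composite", "parts": entities}
    modifiers = [" ".join(words(child)) for child, entity in found
                 if entity is None and child[0] in MODIFIER_LABELS]
    if modifiers:           # an entity with a modifier as sibling
        return {"type": "modified", "head": entities[0], "modifiers": modifiers}
    return entities[0]      # nothing to attach here; pass the entity upward


# Subject and object noun phrases of the example sentence, written out by hand.
SUBJECT = ("NP", [
    ("NP", [("DT", ["An"]), ("JJ", ["excessive"]),
            ("ADJP", [("JJ", ["endogenous"]), ("CC", ["or"]), ("JJ", ["exogenous"])]),
            ("NN", ["stimulation"])]),
    ("PP", [("IN", ["by"]), ("NP", [("NN", ["estrogen"])])]),
])
OBJECT = ("NP", [
    ("NP", [("JJ", ["adenomatous"]), ("NN", ["hyperplasia"])]),
    ("PP", [("IN", ["of"]), ("NP", [("DT", ["the"]), ("NN", ["endometrium"])])]),
])
```

Under these assumptions `analyze(SUBJECT)` yields a modified entity headed by "estrogen", and `analyze(OBJECT)` yields a composite of two parts. Because determiners count as modifiers, this sketch also wraps "endometrium" with the modifier "the", which a fuller implementation might prefer to filter out.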

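15 Resulting RDF
[Figure: RDF graph for the example sentence. Nodes: modified_entity1, modified_entity2, composite_entity1, estrogen, hyperplasia, endometrium, and the modifiers "An excessive endogenous or exogenous stimulation" and "adenomatous". Edge labels: hasmodifier, haspart, and the named relationship in "modified_entity1 induces composite_entity1". Legend: Modifiers, Modified entities, Composite Entities]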
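
As a rough illustration of what such a graph looks like in serialized form, the sketch below writes a set of triples as N-Triples lines. The edge layout is an assumption read off the flattened figure (four haspart edges, two hasmodifier edges, one induces edge), and the `http://example.org/extraction#` namespace is hypothetical.

```python
BASE = "http://example.org/extraction#"  # hypothetical namespace

# (subject, predicate, object, object_is_literal); layout is an assumed reading.
TRIPLES = [
    ("modified_entity1", "haspart", "estrogen", False),
    ("modified_entity1", "hasmodifier",
     "An excessive endogenous or exogenous stimulation", True),
    ("modified_entity2", "haspart", "hyperplasia", False),
    ("modified_entity2", "hasmodifier", "adenomatous", True),
    ("composite_entity1", "haspart", "modified_entity2", False),
    ("composite_entity1", "haspart", "endometrium", False),
    ("modified_entity1", "induces", "composite_entity1", False),
]


def uri(name):
    """Wrap a local name in the hypothetical namespace."""
    return "<" + BASE + name + ">"


def literal(text):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_ntriples(triples):
    """Serialize the triples, one N-Triples statement per line."""
    lines = []
    for subject, predicate, obj, is_literal in triples:
        rendered = literal(obj) if is_literal else uri(obj)
        lines.append("%s %s %s ." % (uri(subject), uri(predicate), rendered))
    return "\n".join(lines)
```

The split between URI objects and literal objects mirrors the distinction the results slide draws between instance-property-instance and instance-property-literal statements.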

16 Outline Motivation Approach Results Future Work

17 Results: Dataset 1, Swanson's discoveries
Associations between Migraine and Magnesium [Hearst99]:
  stress is associated with migraines
  stress can lead to loss of magnesium
  calcium channel blockers prevent some migraines
  magnesium is a natural calcium channel blocker
  spreading cortical depression (SCD) is implicated in some migraines
  high levels of magnesium inhibit SCD
  migraine patients have high platelet aggregability
  magnesium can suppress platelet aggregability

18 Results: Creation of Dataset 1
Keyword pairs (e.g. stress + migraine) are run against PubMed, which returns the abstracts annotated (by NLM) with both terms
The 8 pairs of terms in this scenario result in 8 subsets of PubMed
Semantic metadata is represented in RDF, with:
  Complex entities and the relationships connecting them
  Pointers to the original document and sentence
Size: ~2MB of RDF for the Migraine-Magnesium subset of PubMed

19 Evaluating the Result of Extraction
Ideal method to evaluate the extraction method:
  Domain experts read a set of abstracts, given a set of relationship names and entities to look for
  In addition, they are given the extracted triples and entities
  For every abstract, the expert judge counts the correct, incorrect, and missed triples
  Measure precision and recall
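
The counts named in this procedure map onto precision and recall in the usual way: precision is correct / (correct + incorrect) and recall is correct / (correct + missed). The sketch below pools the counts over abstracts before dividing, which is an assumption, since the slide does not say how per-abstract counts would be aggregated. The numbers in the usage note are made up for illustration.

```python
def pool_counts(per_abstract):
    """Sum (correct, incorrect, missed) triple counts over all abstracts."""
    correct = sum(c for c, _, _ in per_abstract)
    incorrect = sum(i for _, i, _ in per_abstract)
    missed = sum(m for _, _, m in per_abstract)
    return correct, incorrect, missed


def precision_recall(correct, incorrect, missed):
    """Precision over extracted triples, recall over triples the expert expected."""
    extracted = correct + incorrect
    expected = correct + missed
    precision = correct / extracted if extracted else 0.0
    recall = correct / expected if expected else 0.0
    return precision, recall
```

For example, with hypothetical judgments of `(3, 1, 1)` and `(2, 0, 2)` for two abstracts, the pooled counts are 5 correct, 1 incorrect, and 3 missed, so precision is 5/6 and recall is 5/8.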

20 Evaluating the Result of Extraction
In the absence of a domain expert, we focus on getting a feel for the utility of the extracted data
We know the associations manually discovered between Migraine and Magnesium
We locate paths of various lengths between them and manually inspect these paths
If the paths are indicative of the manually discovered associations, the extracted data is useful

21 Paths between Migraine and Magnesium
Paths are considered interesting if they contain one or more named relationships other than haspart or hasmodifier
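
The interestingness criterion above can be sketched as a filter over enumerated paths. The graph below is a toy example with hypothetical node names (`e1`, `e2`, `e3`), and depth-first enumeration of simple paths with edges followed in both directions is an assumed choice; the slides do not say how paths were located.

```python
from collections import defaultdict

STRUCTURAL = {"haspart", "hasmodifier"}  # edges that do not make a path interesting

# Toy triples (subject, predicate, object); node names are hypothetical.
EDGES = [
    ("e1", "haspart", "migraine"),
    ("e1", "caused_by", "e2"),
    ("e2", "haspart", "magnesium"),
    ("e3", "haspart", "migraine"),
    ("e3", "haspart", "magnesium"),
]


def build_adjacency(triples):
    """Index edges by node, allowing traversal in both directions."""
    adjacency = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((predicate, obj))
        adjacency[obj].append((predicate, subject))
    return adjacency


def find_paths(adjacency, start, goal, max_edges):
    """Enumerate simple paths of at most max_edges edges as lists of (label, node)."""
    results = []

    def extend(node, visited, steps):
        if node == goal:
            results.append(list(steps))
            return
        if len(steps) == max_edges:
            return
        for label, neighbor in adjacency[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                steps.append((label, neighbor))
                extend(neighbor, visited, steps)
                steps.pop()
                visited.remove(neighbor)

    extend(start, {start}, [])
    return results


def is_interesting(path):
    """True if at least one edge is a named relationship, not haspart/hasmodifier."""
    return any(label not in STRUCTURAL for label, _ in path)
```

In the toy graph there are two paths from `migraine` to `magnesium` within three edges; only the one that passes through the `caused_by` edge survives the filter.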

22 An example of such a path
[Figure: a path linking migraine (D008881), platelet (D001792), collagen (D003094), and magnesium (D008274) through haspart edges and the named relationships stimulated and caused_by; extracted node labels shown: me_2286, _13%_and_17%_adp_and_collagen_induced_platelet_aggregation, me_3142, by_a_primary_abnormality_of_platelet_behavior]

23 Results: Dataset 2, Neoplasm (C04)
For the subtree of MeSH rooted at Neoplasms, all topics under this subtree are used as query terms against PubMed
The resulting dataset contains ~500,000 PubMed abstracts
The extraction process run on this data returns ~150MB
Processing the tagged and parsed sentences for Dataset 2 (Neoplasm) to generate RDF took approx. 5 minutes
Stats:
  211 different named relationships found
  500,000 instance-property-instance statements
  260,000 instance-property-literal statements
Currently setting up to extract RDF from all of PubMed

24 Outline Motivation Problem Description & Approach Results Future Work

25 Future Extensions to the Extraction Process
Short-term goals (1 month):
  MeSH qualifiers (blood pressure, contraindications)
  Curate and release the Migraine-Magnesium RDF
Long-term goals:
  More complex structures
    Conjunctions
    "X causes Y to inhibit Z"
  Rule-action language to test new extraction rules
  Finding new terms to enrich existing vocabularies
  Perhaps ontology enrichment

26 The projected future of research in Biology
From: hypothesis-driven wet lab experiments
To: data-driven reduction/pruning of the hypothesis space, leading to new insight and possibly discovery
What challenges does this transition bring?

27 Use of Generated Semantic Metadata
Semantic browsing of PubMed based on named relationships between MeSH terms
Path/hypothesis-based document retrieval
Knowledge discovery from literature:
  Corpus-based complex relationship discovery and ranking
  Corpus-based relevant connection subgraph discovery

28 Support such retrieval and discovery operations across multiple data sources
Extract semantic metadata about entities in all of these databases that might occur in PubMed text
The resulting metadata will contain relationships between genes (OMIM), diseases (MeSH), and nucleotide anomalies (SNP)
Hypothesis validation and knowledge discovery in biology

29 THANK YOU!


More information

Network Analysis of Toxic Chemicals and Symptoms: Implications for Designing First-Responder Systems

Network Analysis of Toxic Chemicals and Symptoms: Implications for Designing First-Responder Systems Network Analysis of Toxic Chemicals and Symptoms: Implications for Designing First-Responder Systems Suresh K. Bhavnani 1 PhD, Annie Abraham 1, Christopher Demeniuk 1, Messeret Gebrekristos 1 Abe Gong

More information

The Impact of Directionality in Predications on Text Mining

The Impact of Directionality in Predications on Text Mining The Impact of Directionality in Predications on Text Mining Gondy Leroy 1, Marcelo Fiszman 2, Thomas C. Rindflesch 2 1 School of Information Systems and Technology, Claremont Graduate University; 2 Lister

More information

Curriculum Vitae. Degree and date to be conferred: Masters in Computer Science, 2013.

Curriculum Vitae. Degree and date to be conferred: Masters in Computer Science, 2013. i Curriculum Vitae Name: Deepal Dhariwal. Degree and date to be conferred: Masters in Computer Science, 2013. Secondary education: Dr. Kalmadi Shamarao High School, Pune, 2005 Fergusson College, Pune 2007

More information

W3C Semantic Sensor Networks Ontologies, Applications, and Future Directions

W3C Semantic Sensor Networks Ontologies, Applications, and Future Directions IERC AC4: Semantic Interoperability Workshop On Semantic Interoperability (June 19-20, 2012) http://www.iot-week.eu/ W3C Semantic Sensor Networks Ontologies, Applications, and Future Directions Cory Henson

More information

Semantic Infrastructure for Automated Lipid Classification

Semantic Infrastructure for Automated Lipid Classification Semantic Infrastructure for Automated Lipid Classification Christopher J.O. Baker University of New Brunswick, Canada bakerc@unb.ca AWOSS 10.2 Moncton, NB Nov 10th, 2010 Lipid Ontology: a history 2007

More information

Leman Akoglu Hanghang Tong Jilles Vreeken. Polo Chau Nikolaj Tatti Christos Faloutsos SIAM International Conference on Data Mining (SDM 2013)

Leman Akoglu Hanghang Tong Jilles Vreeken. Polo Chau Nikolaj Tatti Christos Faloutsos SIAM International Conference on Data Mining (SDM 2013) Leman Akoglu Hanghang Tong Jilles Vreeken Polo Chau Nikolaj Tatti Christos Faloutsos 2013 SIAM International Conference on Data Mining (SDM 2013) What can we say about this list of authors? Use relational

More information

Data Mining in Bioinformatics Day 4: Text Mining

Data Mining in Bioinformatics Day 4: Text Mining Data Mining in Bioinformatics Day 4: Text Mining Karsten Borgwardt February 25 to March 10 Bioinformatics Group MPIs Tübingen Karsten Borgwardt: Data Mining in Bioinformatics, Page 1 What is text mining?

More information

Data and Text-Mining the ElectronicalMedicalRecord for epidemiologicalpurposes

Data and Text-Mining the ElectronicalMedicalRecord for epidemiologicalpurposes SESSION 2: MASTER COURSE Data and Text-Mining the ElectronicalMedicalRecord for epidemiologicalpurposes Dr Marie-Hélène Metzger Associate Professor marie-helene.metzger@aphp.fr 1 Assistance Publique Hôpitaux

More information

Building a Diseases Symptoms Ontology for Medical Diagnosis: An Integrative Approach

Building a Diseases Symptoms Ontology for Medical Diagnosis: An Integrative Approach Building a Diseases Symptoms Ontology for Medical Diagnosis: An Integrative Approach Osama Mohammed, Rachid Benlamri and Simon Fong* Department of Software Engineering, Lakehead University, Ontario, Canada

More information

Automatic Indexing and Retrieving Context-Specific Evidence to Support Physician Decision Making at the Point of Care

Automatic Indexing and Retrieving Context-Specific Evidence to Support Physician Decision Making at the Point of Care Automatic Indexing and Retrieving Context-Specific Evidence to Support Physician Decision Making at the Point of Care MET Research Group, University of Ottawa Craig Kuziemsky, University of Ottawa, Ottawa

More information

IBM Research Report. Automated Problem List Generation from Electronic Medical Records in IBM Watson

IBM Research Report. Automated Problem List Generation from Electronic Medical Records in IBM Watson RC25496 (WAT1409-068) September 24, 2014 Computer Science IBM Research Report Automated Problem List Generation from Electronic Medical Records in IBM Watson Murthy Devarakonda, Ching-Huei Tsou IBM Research

More information

Learning the Fine-Grained Information Status of Discourse Entities

Learning the Fine-Grained Information Status of Discourse Entities Learning the Fine-Grained Information Status of Discourse Entities Altaf Rahman and Vincent Ng Human Language Technology Research Institute The University of Texas at Dallas Plan for the talk What is Information

More information

Automatic coding of death certificates to ICD-10 terminology

Automatic coding of death certificates to ICD-10 terminology Automatic coding of death certificates to ICD-10 terminology Jitendra Jonnagaddala 1,2, * and Feiyan Hu 3 1 School of Public Health and Community Medicine, UNSW Sydney, Australia 2 Prince of Wales Clinical

More information

Phenobridge WP 7. Crossing the species bridge between mouse and human. 17 February 2015, Helmholtz Zentrum München

Phenobridge WP 7. Crossing the species bridge between mouse and human. 17 February 2015, Helmholtz Zentrum München Phenobridge WP 7 Crossing the species bridge between mouse and human 17 February 2015, Helmholtz Zentrum München Michael Raess on behalf of the WP7 collaborators Who is WP7? Helen Parkinson, Nathalie Conte,

More information

Combining unsupervised and supervised methods for PP attachment disambiguation

Combining unsupervised and supervised methods for PP attachment disambiguation Combining unsupervised and supervised methods for PP attachment disambiguation Martin Volk University of Zurich Schönberggasse 9 CH-8001 Zurich vlk@zhwin.ch Abstract Statistical methods for PP attachment

More information

Medical information: Where to find it, what to trust. Lewis H. Rowett Executive Editor Annals of Oncology

Medical information: Where to find it, what to trust. Lewis H. Rowett Executive Editor Annals of Oncology Medical information: Where to find it, what to trust Lewis H. Rowett Executive Editor Annals of Oncology Presentation structure The problem The question +Use Google ~better Trust and trust proxies Medline

More information

Information Extraction

Information Extraction Information Extraction Claire Cardie Cornell University Information Extraction Introduction Task definition Evaluation IE system architecture Acquiring extraction patterns Manually defined patterns Learning

More information

A Corpus of Clinical Narratives Annotated with Temporal Information

A Corpus of Clinical Narratives Annotated with Temporal Information A Corpus of Clinical Narratives Annotated with Temporal Information Lucian Galescu Nate Blaylock Florida Institute for Human and Machine Cognition (IHMC) Pensacola, Florida, USA {lgalescu, blaylock}@ihmc.us

More information

Chapter IR:VIII. VIII. Evaluation. Laboratory Experiments Logging Effectiveness Measures Efficiency Measures Training and Testing

Chapter IR:VIII. VIII. Evaluation. Laboratory Experiments Logging Effectiveness Measures Efficiency Measures Training and Testing Chapter IR:VIII VIII. Evaluation Laboratory Experiments Logging Effectiveness Measures Efficiency Measures Training and Testing IR:VIII-1 Evaluation HAGEN/POTTHAST/STEIN 2018 Retrieval Tasks Ad hoc retrieval:

More information

University of Pittsburgh Cancer Institute UPMC CancerCenter. Uma Chandran, MSIS, PhD /21/13

University of Pittsburgh Cancer Institute UPMC CancerCenter. Uma Chandran, MSIS, PhD /21/13 University of Pittsburgh Cancer Institute UPMC CancerCenter Uma Chandran, MSIS, PhD chandran@pitt.edu 412-648-9326 2/21/13 University of Pittsburgh Cancer Institute Founded in 1985 Director Nancy Davidson,

More information

A Descriptive Delta for Identifying Changes in SNOMED CT

A Descriptive Delta for Identifying Changes in SNOMED CT A Descriptive Delta for Identifying Changes in SNOMED CT Christopher Ochs, Yehoshua Perl, Gai Elhanan Department of Computer Science New Jersey Institute of Technology Newark, NJ, USA {cro3, perl, elhanan}@njit.edu

More information

Do Future Work sections have a purpose?

Do Future Work sections have a purpose? Do Future Work sections have a purpose? Citation links and entailment for global scientometric questions Simone Teufel University of Cambridge and Tokyo Institute of Technology August 11, 2017 1/30 Future

More information

cagrid, cabig, CVRG and NCIBI Joel Saltz MD, PhD Director Center for Comprehensive Informatics

cagrid, cabig, CVRG and NCIBI Joel Saltz MD, PhD Director Center for Comprehensive Informatics cagrid, cabig, CVRG and NCIBI Joel Saltz MD, PhD Director Center for Comprehensive Informatics Biomedical Middleware: cagrid cagrid Components Security (GAARDS) Language (metadata, ontologies) Semantic/Federated

More information

A Comparison of Collaborative Filtering Methods for Medication Reconciliation

A Comparison of Collaborative Filtering Methods for Medication Reconciliation A Comparison of Collaborative Filtering Methods for Medication Reconciliation Huanian Zheng, Rema Padman, Daniel B. Neill The H. John Heinz III College, Carnegie Mellon University, Pittsburgh, PA, 15213,

More information

Assessing the Implications for Close Relatives in the Event of Similar but Non-Matching DNA Profiles

Assessing the Implications for Close Relatives in the Event of Similar but Non-Matching DNA Profiles Wright State University CORE Scholar Biological Sciences Faculty Publications Biological Sciences 2-22-2007 Assessing the Implications for Close Relatives in the Event of Similar but Non-Matching DNA Profiles

More information

Clinical Coreference Annotation Guidelines (with excerpts from ODIE guidelines and modified for SHARP) Arrick Lanfranchi and Kevin Crooks

Clinical Coreference Annotation Guidelines (with excerpts from ODIE guidelines and modified for SHARP) Arrick Lanfranchi and Kevin Crooks Clinical Coreference Annotation Guidelines (with excerpts from ODIE guidelines and modified for SHARP) Arrick Lanfranchi and Kevin Crooks The following is a proposal/summary of the ODIE guidelines with

More information

Aligning Medical Domain Ontologies for Clinical Query Extraction

Aligning Medical Domain Ontologies for Clinical Query Extraction Aligning Medical Domain Ontologies for Clinical Query Extraction Pinar Wennerberg Siemens AG, Munich Germany TU Darmstadt, Darmstadt Germany pinar.wennerberg.ext@siemens.com Abstract Often, there is a

More information

PhenDisco: a new phenotype discovery system for the database of genotypes and phenotypes

PhenDisco: a new phenotype discovery system for the database of genotypes and phenotypes PhenDisco: a new phenotype discovery system for the database of genotypes and phenotypes Son Doan, Hyeoneui Kim Division of Biomedical Informatics University of California San Diego Open Access Journal

More information

Building Evaluation Scales for NLP using Item Response Theory

Building Evaluation Scales for NLP using Item Response Theory Building Evaluation Scales for NLP using Item Response Theory John Lalor CICS, UMass Amherst Joint work with Hao Wu (BC) and Hong Yu (UMMS) Motivation Evaluation metrics for NLP have been mostly unchanged

More information

Novel Integrative Bioinformatics Approaches to Biomedical Ontology Practice for Translational Informatics. Sirarat Sarntivijai

Novel Integrative Bioinformatics Approaches to Biomedical Ontology Practice for Translational Informatics. Sirarat Sarntivijai Novel Integrative Bioinformatics Approaches to Biomedical Ontology Practice for Translational Informatics by Sirarat Sarntivijai A dissertation submitted in partial fulfillment of the requirements for

More information

TIES Cancer Research Network Y2 Face to Face Meeting U24 CA October 29 th, 2014 University of Pennsylvania

TIES Cancer Research Network Y2 Face to Face Meeting U24 CA October 29 th, 2014 University of Pennsylvania TIES Cancer Research Network Y2 Face to Face Meeting U24 CA 180921 Session IV The Future of TIES October 29 th, 2014 University of Pennsylvania Afternoon Other Uses of TIES/Future of TIES 12:45-1:15 TIES

More information

Formalizing UMLS Relations using Semantic Partitions in the context of task-based Clinical Guidelines Model

Formalizing UMLS Relations using Semantic Partitions in the context of task-based Clinical Guidelines Model Formalizing UMLS Relations using Semantic Partitions in the context of task-based Clinical Guidelines Model Anand Kumar, Matteo Piazza, Silvana Quaglini, Mario Stefanelli Laboratory of Medical Informatics,

More information

Shades of Certainty Working with Swedish Medical Records and the Stockholm EPR Corpus

Shades of Certainty Working with Swedish Medical Records and the Stockholm EPR Corpus Shades of Certainty Working with Swedish Medical Records and the Stockholm EPR Corpus Sumithra VELUPILLAI, Ph.D. Oslo, May 30 th 2012 Health Care Analytics and Modeling, Dept. of Computer and Systems Sciences

More information

Rare Diseases Nomenclature and classification

Rare Diseases Nomenclature and classification Rare Diseases Nomenclature and classification Annie Olry ORPHANET - Inserm US14, Paris, France annie.olry@inserm.fr Using standards and embedding good practices to promote interoperable data sharing in

More information

SPICE: Semantic Propositional Image Caption Evaluation

SPICE: Semantic Propositional Image Caption Evaluation SPICE: Semantic Propositional Image Caption Evaluation Presented to the COCO Consortium, Sept 2016 Peter Anderson1, Basura Fernando1, Mark Johnson2 and Stephen Gould1 1 Australian National University 2

More information