Biomedical resources for text mining

Similar documents
UMLS and phenotype coding

Issues in Integrating Epidemiology and Research Information in Oncology: Experience with ICD-O3 and the NCI Thesaurus

Identifying anatomical concepts associated with ICD10 diseases

A Simple Pipeline Application for Identifying and Negating SNOMED CT in Free Text

Toward a Unified Representation of Findings in Clinical Radiology

How to code rare diseases with international terminologies?

A Study on a Comparison of Diagnostic Domain between SNOMED CT and Korea Standard Terminology of Medicine. Mijung Kim 1* Abstract

A Study of Abbreviations in Clinical Notes Hua Xu MS, MA 1, Peter D. Stetson, MD, MA 1, 2, Carol Friedman Ph.D. 1

Automatically extracting, ranking and visually summarizing the treatments for a disease

Erasmus MC at CLEF ehealth 2016: Concept Recognition and Coding in French Texts

Mapping the UMLS Semantic Network into General Ontologies

A Lexical-Ontological Resource forconsumerheathcare

Data Harmonization Efforts Across the Cancer Staging System. Donna M. Gress, RHIT, CTR AJCC Technical Specialist

Analyzing the Semantics of Patient Data to Rank Records of Literature Retrieval

PubMed Tutorial Author: Gökhan Alpaslan DMD,Ph.D. e-vident

Extracting Diagnoses from Discharge Summaries

These Terms Synonym Term Manual Used Relationship

Chapter 12 Conclusions and Outlook

KNOWLEDGE-BASED METHOD FOR DETERMINING THE MEANING OF AMBIGUOUS BIOMEDICAL TERMS USING INFORMATION CONTENT MEASURES OF SIMILARITY

Term variation in clinical records First insights from a corpus study

Building a Diseases Symptoms Ontology for Medical Diagnosis: An Integrative Approach

Latin Language course

Aligning Medical Domain Ontologies for Clinical Query Extraction

The Significance of SNODENT

Copyright Warning & Restrictions

A Survey on Semantic Similarity Measures between Concepts in Health Domain

Schema-Driven Relationship Extraction from Unstructured Text

Helping Healthcare Consumers Understand: An Interpretive Layer for Finding and Making Sense of Medical Information

SNOMED CT and Orphanet working together

Oncology Ontology in the NCI Thesaurus

GENETIC TESTING FOR NEUROFIBROMATOSIS

Facts from text: Automated gene annotation with ontologies and text-mining

Opposition principles and Antonyms in Medical Terminological Systems: Structuring Diseases Description with Explicit Existential Quantification

Clinical Genome Knowledge Base and Linked Data technologies. Aleksandar Milosavljevic

Chapter 8 BIOMEDICAL ONTOLOGIES

Heiner Oberkampf. DISSERTATION for the degree of Doctor of Natural Sciences (Dr. rer. nat.)

Causal Knowledge Modeling for Traditional Chinese Medicine using OWL 2

Phenotype analysis in humans using OMIM

Evaluation of SNOMED Coverage of Veterans Health Administration Terms

Summarizing Drug Information in Medline Citations

Extracting geographic locations from the literature for virus phylogeography using supervised and distant supervision methods

Rare Diseases Nomenclature and classification

Detecting Patient Complexity from Free Text Notes Using a Hybrid AI Approach

Knowledge networks of biological and medical data An exhaustive and flexible solution to model life sciences domains

Knowledge Representation of Traditional Chinese Acupuncture Points Using the UMLS and a Terminology Model

Query Refinement: Negation Detection and Proximity Learning Georgetown at TREC 2014 Clinical Decision Support Track

CS 6824: Tissue-Based Map of the Human Proteome

Communications Sciences & Disorders Course Descriptions

Ambiguity of Human Gene Symbols in LocusLink and MEDLINE: Creating an Inventory and a Disambiguation Test Collection

Authoring SNOMED CT Generic Authoring Principles. Penni Hernandez and Cathy Richardson Senior Terminologists

Automatic generation of MedDRA terms groupings using an ontology

The Impact of Belief Values on the Identification of Patient Cohorts

Formalizing UMLS Relations using Semantic Partitions in the context of task-based Clinical Guidelines Model

SNOMED CT in General Practice systems

Semantic Alignment between ICD-11 and SNOMED-CT. By Marcie Wright RHIA, CHDA, CCS

A Descriptive Delta for Identifying Changes in SNOMED CT

ONTOLOGY-BASED METHODS FOR DISEASE SIMILARITY ESTIMATION AND DRUG REPOSITIONING. A DISSERTATION IN Computer Science and Mathematics

TeamHCMUS: Analysis of Clinical Text

Towards an automated procedure for annotation of gene products through SAS Methods. Henrik Tveit Norwegian University of Science and Technology

Harvard-MIT Division of Health Sciences and Technology HST.952: Computing for Biomedical Scientists. Data and Knowledge Representation Lecture 6

ظظظ/ Omar Sami. Hussam Twaissi. Mousa Abbadi

(2017) Comparison of Ontology-Based Semantic-Similarity. in the Biomedical Text. Abstract. Keywords. 1. Introduction. Ahmad Fayez S.

EMBASE Find quick, relevant answers to your biomedical questions

KOS Design for Healthcare Decision-making Based on Consumer Criteria and User Stories

Looking at NLP approaches

Data driven Ontology Alignment. Nigam Shah

CDISC SHARE. CDISC Shared Health and Research Electronic Library Pilot Report. Date: 19 January 2010 Version: 1.0 CDISC, 2010

A framework for the study of diseases and adverse drug reactions

Using the CEN/ISO Standard for Categorial Structure to Harmonise the Development of WHO International Terminologies

Research Scholar, Department of Computer Science and Engineering Man onmaniam Sundaranar University, Thirunelveli, Tamilnadu, India.

School Curriculum and Standards Authority

How preferred are preferred terms?

EMBASE SEARCHING EMBASE

Hands-On Ten The BRCA1 Gene and Protein

LIGN171: Child Language Acquisition Developmental Disorders affecting language

Pilot Study: Clinical Trial Task Ontology Development. A prototype ontology of common participant-oriented clinical research tasks and

Telegraphic Language (Barrows00)

Weighted Ontology and Weighted Tree Similarity Algorithm for Diagnosing Diabetes Mellitus

EMOTION DETECTION FROM TEXT DOCUMENTS

International Classification of Diseases in the field of rare diseases Revision process. Ségolène Aymé Chair of the Topic Advisory Group At WHO

Neurofibromatosis 2: Genetics and Prenatal Diagnosis

Advancing methods to develop behaviour change interventions: A Scoping Review of relevant ontologies

USEFULNESS OF ONTOLOGIES FOR RARE DISEASES

Journal of Biomedical Informatics

How can Natural Language Processing help MedDRA coding? April Andrew Winter Ph.D., Senior Life Science Specialist, Linguamatics

5/22/2013. Alan Zuckerman 1, Swapna Abhyankar 1, Tiffany Colarusso 2, Richard Olney 2, Kristin Burns 3, Marci Sontag 4

Dynamic Knowledge Spaces in Dental Medicine

Sentiment Analysis of Reviews: Should we analyze writer intentions or reader perceptions?

The Adrenal Glands. I. Normal adrenal gland A. Gross & microscopic B. Hormone synthesis, regulation & measurement. II.

Curriculum Vitae. Degree and date to be conferred: Masters in Computer Science, 2013.

ShARe Guidelines for the Annotation of Modifiers for Disorders in Clinical Notes

Deliberating on Ontologies: The Present Situation. Simon Milton Department of Information Systems, The University of Melbourne

Formal ontologies in biomedical knowledge representation

Clinical terms and ICD 11

Elsevier ClinicalKey TM FAQs

Lung Cancer Concept Annotation from Spanish Clinical Narratives

Year 2003 Paper two: Questions supplied by Tricia

ONTOLOGY. elearnig INITIATIVE PRAISE: Version number 1.0. Peer Review Network Applying Intelligence to Social Work Education

The Impact of Directionality in Predications on Text Mining

Mining Medline for New Possible Relations of Concepts

Transcription:

August 30, 2005 Workshop Terminologies and ontologies in biomedicine: Can text mining help? Biomedical resources for text mining Olivier Bodenreider Lister Hill National Center for Biomedical Communications Bethesda, Maryland - USA

Overview An example Three types of resources Lexical resources Terminological resources Ontological resources Some issues 2

An example Neurofibromatosis 2

Neurofibromatosis 2 Neurofibromatosis type 2 (NF2) is often not recognised as a distinct entity from peripheral neurofibromatosis. NF2 is a predominantly intracranial condition whose hallmark is bilateral vestibular schwannomas. NF2 results from a mutation in the gene named merlin, located on chromosome 22. [Uppal, S., and A. P. Coatesworth. Neurofibromatosis Type 2. Int J Clin Pract, 57, no. 8, 2003, pp. 698-703.] 4

Entity recognition Neurofibromatosis type 2 (NF2) is often not recognised as a distinct entity from peripheral neurofibromatosis. NF2 is a predominantly intracranial condition whose hallmark is bilateral vestibular schwannomas. NF2 results from a mutation in the gene named merlin, located on chromosome 22. missed partial ambiguous Lexical resources Ontologies 5

Relation extraction Neurofibromatosis type 2 (NF2) is often not recognised as a distinct entity from peripheral neurofibromatosis. NF2 is a predominantly intracranial condition whose hallmark is bilateral vestibular schwannomas. NF2 results from a mutation in the gene named merlin, located on chromosome 22. vestibular schwannomas manifestation of neurofibromatosis 2 neurofibromatosis 2 associated with mutation of NF2 gene NF2 gene located on chromosome 22 Ontologies 6

Resources for text mining

Types of resources Lexical resources Collections of lexical items Additional information Part of speech Spelling variants Useful for entity recognition UMLS SPECIALIST Lexicon, WordNet Ontological resources Collections of kinds of entities (substances, qualities, processes) relations among them Useful for relation extraction UMLS Semantic Network, SNOMED CT 8

Types of resources (revisited) Lexical and terminological resources Mostly collections of names for biomedical entities Often have some kind or hierarchical organization (e.g., relations) Ontological resources Mostly collections of relations among biomedical entities Sometimes also collect names 9

Unified Medical Language System SPECIALIST Lexicon 200,000 lexical items Part of speech and variant information Metathesaurus 5M names from over 100 terminologies 1M concepts 16M relations Semantic Network 135 high-level categories 7000 relations among them Lexical resources Terminological resources Ontological resources 10

Lexical resources SPECIALIST Lexicon

SPECIALIST Lexicon Content English lexicon Many words from the biomedical domain 200,000+ lexical items Word properties morphology orthography syntax Used by the lexical tools 12

SPECIALIST Lexicon record { } base=hemoglobin (base form) spelling_variant=haemoglobin entry=e0031208 (identifier) cat=noun (part of speech) variants=uncount (no plural) variants=reg (plural: hemoglobins, haemoglobins) 13

Lexical tools To manage lexical variation in biomedical terminologies Major tools Normalization Indexes Lexical Variant Generation program (lvg( lvg) Based on the SPECIALIST Lexicon Used by noun phrase extractors, search engines 14

Normalization Remove genitive Remove stop words Lowercase Strip punctuation Uninflect Sort words Hodgkin s diseases, NOS Hodgkin diseases, NOS Hodgkin diseases, hodgkin diseases, hodgkin diseases hodgkin disease disease hodgkin 15

Normalization: Example Hodgkin Disease HODGKINS DISEASE Hodgkin's Disease Disease, Hodgkin's Hodgkin's, disease HODGKIN'S DISEASE Hodgkin's disease Hodgkins Disease Hodgkin's disease NOS Hodgkin's disease, NOS Disease, Hodgkins Diseases, Hodgkins Hodgkins Diseases Hodgkins disease hodgkin's disease Disease, Hodgkin normalize disease hodgkin 16

Normalization Applications Model for lexical resemblance Help find lexical variants for a term Terms that normalize the same usually share the same LUI Help find candidates to synonymy among terms Help map input terms to UMLS concepts 17

Terminological resources UMLS Metathesaurus

Source Vocabularies (2005AA) 134 source vocabularies 132 contributing concept names Broad coverage of biomedicine 5M names 1M concepts 16M relations Common presentation 19

Integrating subdomains Clinical repositories Genetic knowledge bases Other subdomains SNOMED OMIM NCBI Taxonomy UMLS MeSH Biomedical literature Model organisms UWDA GO Genome Anatomy annotations 20

Addison s s Disease: Concept SNOMED MeSH AOD Read Codes Disease or Syndrome Addison s Disease C0001403 ADRENAL INSUFFICIENCY (ADDISON'S DISEASE) ADRENOCORTICAL INSUFFICIENCY, PRIMARY FAILURE Addison melanoderma Melasma addisonii Primary adrenal deficiency Asthenia pigmentosa Bronzed disease Insufficiency, adrenal primary Primary adrenocortical insufficiency Addison's, disease MALADIE D'ADDISON - French Addison-Krankheit - German Morbo di Addison - Italian DOENCA DE ADDISON - Portuguese ADDISONOVA BOLEZN' - Russian ENFERMEDAD DE ADDISON - Spanish A disease characterized by hypotension, weight loss, anorexia, weakness, and sometimes a bronze-like melanotic hyperpigmentation of the skin. It is due to tuberculosis- or autoimmune-induced disease (hypofunction) of the adrenal glands that results in deficiency of aldosterone and cortisol. In the absence of replacement therapy, it is usually fatal. 21

Organize concepts Inter-concept relationships: hierarchies from the source vocabularies Redundancy: multiple paths One graph instead of multiple trees (multiple inheritance) A C B B D E H E F H D E G H A B C D E F G H 22

Metathesaurus concepts Examples Neurofibromatosis type 2 s C0027832 Neurofibromatosis 2 NF2 s C0085114 Neurofibromatosis 2 genes peripheral neurofibromatosis s C0027831 Neurofibromatosis 1 [bilateral] vestibular schwannomas a C0027859 Neuroma, Acoustic mutation / mutations s C0026882 Mutation gene s C0017337 Genes merlin m C0254123 Neurofibromin 2 chromosome 22 s C0008665 Chromosomes, Human, Pair 22 23

Metahesaurus relations Examples Neurofibromin 2 Multiple parent concepts Membrane proteins Tumor suppressor proteins Signaling protein 1 child concept Merlin, Drosophila [MeSH] [MeSH] [NCI Thesaurus] [MeSH] Co-occurring occurring concepts in MEDLINE Neurofibromatosis 2 [13] Membrane proteins [8] 24

Ontological resources UMLS Semantic Network

Semantic types (135) tree structure 2 major hierarchies Semantic Network Entity Physical Object Conceptual Entity Event Activity Phenomenon or Process 26

Semantic Network Semantic network relationships (54) hierarchical (isa = is a kind of) among types Animal isa Organism Enzyme isa Biologically Active Substance among relations treats isa affects non-hierarchical Sign or Symptom diagnoses Pathologic Function Pharmacologic Substance treats Pathologic Function 27

Biologic Function hierarchy (isa) Biologic Function Physiologic Function Pathologic Function Organism Function Organ or Tissue Function Cell Function Molecular Function Cell or Molecular Dysfunction Disease or Syndrome Experimental Model of Disease Mental Process Genetic Function Mental or Behavioral Dysfunction Neoplastic Process 28

Associative (non-isa) relationships part of Anatomical Structure property of Organism Attribute Organism evaluation of Finding Laboratory or Test Result process of Sign or Symptom evaluation of disrupts Physiologic Function Biologic Function Pathologic Function Embryonic Structure conceptual part of Congenital Abnormality Anatomical Abnormality Acquired Abnormality Body System conceptual part of Fully Formed Anatomical Structure contains, produces disrupts Body Substance Injury or Poisoning conceptual part of co-occurs with conceptual part of Body Space or Junction location of Body Location or Region adjacent to Body Part, Organ or Organ Component part of Tissue part of Cell Cell Component part of part of Gene or Genome location of 29

Relationships can inherit semantics Semantic Network Fully Formed Anatomical Structure isa Body Part, Organ, or Organ Component location of Disease or Syndrome Pathologic Function isa Biologic Function isa Adrenal Cortex Metathesaurus location of Adrenal Cortical hypofunction 30

Some issues related to these resources

Ambiguity Neurofibromatosis 2 [disease] NF2 Neurofibromin 2 [protein] Neurofibromatosis 2 gene [gene] 32

Limited coverage e.g., Gene and protein names Additional sources Additional identification methods Genew Entrez Gene UniProt http://www.gene.ucl.ac.uk/nomenclature/ http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene http://www.ebi.uniprot.org/index.shtml 33

Conclusions

Conclusions Lexical and terminological resources enable entity recognition Terminological and ontological resources enable relation extraction But Text mining techniques can also benefit Terminologies: term extraction Ontologies: : ontology population 35

UMLS documentation and support UMLS homepage http://umlsinfo.nlm.nih.gov umlsinfo.nlm.nih.gov/ with links to all other UMLS information UMLSKS homepage http://umlsks.nlm.nih.gov umlsks.nlm.nih.gov/ with links to the User s s and Developer s s guides Email address for support custserv@nlm.nih.gov 36

Medical Ontology Research Contact: Web: olivier@nlm.nih.gov mor.nlm.nih.gov Olivier Bodenreider Lister Hill National Center for Biomedical Communications Bethesda, Maryland - USA